Application of Machine Learning in Battery: State of Charge Estimation Using Feed Forward Neural Network for Sodium-Ion Battery

Estimating the State of Charge (SOC) of a battery accurately is important to avoid over- and undercharging and to protect the battery pack from a shortened cycle life. Current methods of SOC estimation use complex equations in the Extended Kalman Filter (EKF) and the equivalent circuit model. In this paper, we used a Feed Forward Neural Network (FNN) to estimate the SOC value accurately, where battery parameters such as current, voltage, and charge are mapped directly to the SOC value at the output. A FNN can self-learn the weights from the training data and update the model parameters (weights and biases) using Adam, a combination of two gradient descent techniques. The model uses the Dropout technique, which effectively samples many neural network architectures by dropping neurons/nodes at each epoch/training cycle while sharing the same weights and biases. Our FNN model was trained with data comprising different current rates and tested on different cycling data, for example, the 5th, 10th, 20th, and 50th cycles, and at a different cutoff voltage (4.5 V). The battery used for estimating the SOC value was a Na-ion based battery, which is highly non-linear, and it was fabricated in house using Na0.67Fe0.5Mn0.5O2 (NFM) as the cathode and Na metal as the reference electrode. The FNN successfully estimated the SOC for the highly non-linear Na-ion battery at different current rates (0.05 C, 0.1 C, 0.5 C, 1 C, 2 C), for different cycling data, and at a higher cut-off voltage of 4.5 V vs. Na+/Na, reaching R² values of ~0.97-0.99, ~0.99, and ~0.98, respectively.


Introduction
A recent report from the International Energy Agency (IEA), the Global Electric Vehicle (EV) Outlook 2020 [1], showed a surge in demand for electric mobility across the world in the coming decade. The Stated Policies Scenario, which incorporates existing government policies, estimates a rise in global battery capacity demand from 170 gigawatt-hours (GWh) per annum in 2019 to 1500 GWh per annum in 2030, whereas the Sustainable Development Scenario projects a battery capacity demand of 3000 GWh per annum in 2030, driven by rapid electrification and a rise in electric heavy-duty vehicles. In short, there is global pressure to implement policies that minimize CO2 emissions, and increasing the number of battery-powered electric vehicles will make a considerable contribution to achieving that target.
Batteries are among the most expensive and important components of an electric vehicle; therefore, they must be managed properly by electronics and software, i.e., by a reliable battery management system (BMS). A BMS maximizes the performance (power and energy) delivered by the battery and its service life, and it protects the battery pack [2,3]. To achieve this, sophisticated algorithms are implemented on specialized electronics. A BMS must be able to estimate two fundamental types of non-measurable battery-pack quantities: (1) states that change quickly (state of charge, diffusion voltage, hysteresis voltage) and (2) parameters that change slowly (cell capacities, resistances). This paper shows how a Feed Forward Neural Network (FNN) can accurately determine the state of charge (SOC) for a Na-ion battery comprising Na0.67Fe0.5Mn0.5O2 (NFM) as the cathode and Na metal as the reference electrode. The FNN approach offers the following novelties: (1) It can estimate the SOC despite the non-linear nature of a Na-ion battery, which has a higher number of redox activities than a Li-ion battery; (2) The FNN uses voltage, current, and charge as inputs to the network and estimates the SOC without a Kalman filter or equivalent circuit model; (3) The FNN can self-learn its weights using gradient descent and can be trained for a small number of epochs, which makes the computation faster; (4) The FNN, once trained on different current rates (0.05 C, 0.1 C, 0.5 C, 1 C, and 2 C for four cycles each) at a cutoff voltage of 4.2 V vs. Na+/Na, can estimate the SOC for a battery cycled at a different cutoff voltage (4.5 V vs. Na+/Na), at different current rates, and for various cycles (5th, 10th, 20th, and 50th cycles). This is unique because most trained models have been tested on data for the immediately following cycle [14] or for only one cycle [13] to evaluate the robustness of the model.
After this brief introduction, the second section discusses how a feed forward neural network is designed for estimating the SOC. The third section presents how the sodium layered cathode material was synthesized in house and fabricated into a coin cell, and how the training and testing data were collected from the battery station. In the fourth section, the performance of the FNN is evaluated on a variety of test datasets.

Methods
The fundamental block of deep learning is an artificial neuron. The neuron takes inputs ($x_1, x_2, x_3, \ldots, x_n$) and, based on feature importance, each input is assigned a corresponding weight ($w_1, w_2, w_3, \ldots, w_n$). The neuron aggregates the weighted inputs, applies a function, and gives the output, as shown in Figure 1. In the McCulloch-Pitts (MP) neuron model, inputs and outputs can only be Boolean and all weights are unity [18]. Because all the inputs are Boolean, adding them together amounts to counting the inputs that have a value of 1. This aggregation of the inputs is called the pre-activation 'a' (Equation (1)). The value of 'a' passes through the function 'h', called the activation function, which outputs 1 (the neuron fires) if the sum is greater than some threshold value 't', or outputs 0 if the sum is less than the threshold value (Equation (2)).
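The MP neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `mp_neuron` is hypothetical, and firing when the sum equals the threshold (`>=`) is an assumed convention, since the text only distinguishes "greater than" and "less than".

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: Boolean inputs, unit weights, hard threshold."""
    # Pre-activation 'a': with unit weights this is the count of inputs equal to 1
    a = sum(inputs)
    # Activation 'h': fire (1) when the count reaches the threshold (assumed >=), else 0
    return 1 if a >= threshold else 0


pairs = [(x1, x2) for x1 in (0, 1) for x2 in (0, 1)]
# With two inputs, threshold 2 behaves like a logical AND ...
and_outputs = [mp_neuron(p, threshold=2) for p in pairs]
# ... and threshold 1 behaves like a logical OR
or_outputs = [mp_neuron(p, threshold=1) for p in pairs]
```

The only free parameter is the threshold, which is why the model can only draw one kind of decision boundary.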
If the predicted output 'ŷ' is different from the true output 'y', then the error is the square of the difference between the true and predicted values. The difference is squared so that positive and negative differences do not cancel each other. Equation (3) shows the loss value for a single data point having numerous features, and Equation (4) shows the loss value summed over all the data points, whose features carry corresponding importance (weights).
To minimize the error/loss value, the best value of the threshold 't' is found by a brute-force search. With the minimum loss value, the model is then tested based on the accuracy it achieves (Equation (5)).
$\text{Accuracy} = \dfrac{\text{number of correct predictions}}{\text{total number of predictions}}$ (5)
The MP neuron model divides the output into two sections: One section consists of the predicted value 1 and the other section consists of the predicted value 0. The problems with this model are that it takes only binary values (0 and 1), it has a poor learning algorithm for finding a better threshold value 't', and it is a linear model.
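As a rough sketch of the brute-force search, every candidate threshold can be tried in turn and the one with the smallest squared-error loss kept. The toy data set and all function names below are hypothetical and serve only to illustrate the procedure; firing at `>=` is again an assumed convention.

```python
def mp_neuron(inputs, threshold):
    # Unit weights: the pre-activation is simply the number of active inputs
    return 1 if sum(inputs) >= threshold else 0


def squared_error_loss(X, y, threshold):
    # Sum of squared differences between true and predicted outputs
    return sum((yi - mp_neuron(xi, threshold)) ** 2 for xi, yi in zip(X, y))


def accuracy(X, y, threshold):
    # Number of correct predictions divided by the total number of predictions
    correct = sum(1 for xi, yi in zip(X, y) if mp_neuron(xi, threshold) == yi)
    return correct / len(y)


def best_threshold(X, y):
    # Brute force: for n inputs the sum lies in 0..n, so thresholds 0..n+1
    # cover every distinct behaviour (n+1 means the neuron never fires)
    n = len(X[0])
    return min(range(n + 2), key=lambda t: squared_error_loss(X, y, t))


# Hypothetical toy data: the label is 1 when at least two inputs are active
X = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 1, 1]]
y = [0, 1, 1, 0, 1]

t_best = best_threshold(X, y)
acc_best = accuracy(X, y, t_best)
```

The search is exhaustive and cheap here only because there is a single integer parameter; it does not scale to real-valued weights, which motivates the gradient-based learning discussed later.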
To overcome the limitations of the MP neuron model, a Sigmoid neuron model can be used as an alternative, with the logistic function as the activation function 'h'. The pre-activation 'a' is the same as in the MP neuron model, except that it is the sum of the weighted inputs (which can be n-dimensional) plus a bias 'b'; the output y is the sigmoid (logistic) function of this sum (Equation (6)). For the case of two inputs, the Sigmoid function is shown in Equation (7). For more than two inputs, the output is shown in Equation (8). The inputs to the sigmoid neuron can take any real value, and the output is a continuous value between 0 and 1, for example, 0.4, 0.6, 0.8, and so on. The loss value (L) calculated for the Sigmoid neuron model is the same as in Equation (4).
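A sigmoid neuron can be sketched as follows, assuming the standard logistic form $\hat{y} = 1 / (1 + e^{-(\sum_i w_i x_i + b)})$, which is presumably what Equations (6)-(8) express. The function name and the example inputs are illustrative only.

```python
import math


def sigmoid_neuron(x, w, b):
    """Sigmoid neuron: weighted sum plus bias, passed through the logistic function."""
    # Pre-activation 'a': sum of weighted inputs plus the bias
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Activation 'h': logistic function, always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-a))


# Real-valued inputs are allowed, unlike in the MP neuron
y_mid = sigmoid_neuron([0.0, 0.0], w=[1.0, 1.0], b=0.0)   # pre-activation 0
y_high = sigmoid_neuron([2.0, 1.5], w=[1.0, 1.0], b=0.0)  # positive pre-activation
y_low = sigmoid_neuron([2.0, 1.5], w=[-1.0, -1.0], b=0.0) # negative pre-activation
```

Because the output changes smoothly with the weights and the bias, the loss becomes differentiable with respect to the parameters.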
To minimize the loss value (L), the parameters of the sigmoid neuron model, 'w_i' and 'b_i', should be such that the difference between the predicted value (ŷ) and the true value (y) is minimal. Initially, the model parameters are assigned some random values. The model then predicts the output and computes the loss, as in the Perceptron model [19], and iterates by changing the values of 'w' and 'b' until the loss value is minimized. However, if the model parameters are changed by random guesswork, the loss value may decrease at one step and increase at the next. In practice, we want to start at a random point and move towards the minimal loss value with some learning algorithm. Instead of guessing the model parameters, this requires a principled way of changing 'w' and 'b' such that the loss value consistently decreases. With the Gradient Descent (GD) rule [20], the values of 'w' and 'b' are updated using the partial derivatives of the loss function: the algorithm passes over the entire data set, computes the predicted outputs, computes the loss, and then updates the parameters again. This loop iterates until good accuracy (a minimal loss value) is achieved. Frameworks such as PyTorch [21] and TensorFlow [22] provide functions that compute these parameter updates automatically.
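The gradient descent loop for a single-input sigmoid neuron with a squared-error loss can be sketched as below. The data points, learning rate, and epoch count are hypothetical values chosen for illustration, and the gradients are written out by hand rather than taken from a framework.

```python
import math


def predict(x, w, b):
    # Sigmoid neuron with a single input
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def loss(data, w, b):
    # Sum of squared errors over all data points
    return sum((y - predict(x, w, b)) ** 2 for x, y in data)


def gradient_descent(data, w=0.0, b=0.0, lr=0.1, epochs=2000):
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in data:
            y_hat = predict(x, w, b)
            # Chain rule: d/dw (y_hat - y)^2 = 2 (y_hat - y) * y_hat (1 - y_hat) * x
            common = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
            dw += common * x
            db += common
        # Move against the gradient, scaled by the learning rate
        w -= lr * dw
        b -= lr * db
    return w, b


# Hypothetical (input, target) pairs with targets between 0 and 1
data = [(0.0, 0.1), (1.0, 0.3), (2.0, 0.6), (3.0, 0.85)]

initial_loss = loss(data, 0.0, 0.0)
w_fit, b_fit = gradient_descent(data)
final_loss = loss(data, w_fit, b_fit)
```

Each pass uses the whole data set before updating, which matches the full-batch rule described above; mini-batches change only how many points contribute to each update.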
In the actual world, the data are not just linearly separable. Therefore, we need a complex function to fit the data. A single sigmoid neuron is not enough to represent such a function with high accuracy. Instead, combining several sigmoid neurons in various layers, as shown in Figure 2 (known as a Deep Neural Network (DNN)), can approximate a complex function between input and output [23,24]. The DNN is differentiable because its basic block is differentiable, which allows the model parameters to be learned. The final output ($\hat{y}$) is a function of ($x_1, x_2, x_3, \ldots, x_n$), and it is very complex because each input passes through many neurons with non-linear activation functions in different layers. Among different network architectures, the one that gives the minimal loss value is the best DNN for approximating the relationship between the inputs and the output. In summary, a DNN with a suitable number of hidden layers, activation function, number of neurons, and learning rate can approximate any function that exists between inputs and output.
In Figure 2, the very first layer is known as the input layer, comprising current, voltage, and charge, and the last layer is known as the output layer, which predicts the state of charge (SOC). All the layers between the input and output layers are known as intermediate/hidden layers. Each neuron has two parts: the pre-activation, denoted 'a', and the activation, denoted 'h'. As in the case of the simple sigmoid neuron, the pre-activation is the aggregation of the inputs, and the activation passes this aggregation through the sigmoid/logistic function. The weight is labeled $w_{ijk}$, where i = layer number, j = neuron number, and k = input number. For example, $w_{121}$ is the weight in the first layer that connects the second neuron to the first input. Likewise, $a_{ij}$ is the pre-activation and $h_{ij}$ is the activation of each neuron, where i is the layer number and j is the neuron number, and $b_L$ is the bias associated with layer L. Below, Equations (9) and (10) show the weight matrix '$W_1$' and the activation '$h_1$' for the first layer, respectively, and '$a_{1n}$' is the pre-activation of the n-th neuron in the first layer (Equation (11)).
For the second layer, output of the activation function of the first layer becomes the input for the pre-activation function of the second layer with the corresponding weights and bias. The general equations for 'L' number of layers are mentioned below, in Equations (12)-(14).
The pre-activation at layer 'i' is given by
$a_i = W_i h_{i-1} + b_i$ (12)
where $h_0 = x$ is the input vector. The activation at layer 'i' is given by
$h_i = g(a_i)$ (13)
where g is called the activation function.
The activation function at the output layer 'L' is given by
$\hat{y} = O(a_L)$ (14)
where 'O' is called the output activation function. The estimated output $\hat{y}$ is a highly composite function of all the inputs, which pass through many non-linearities along the way. Once we compute the loss value, we can feed it to the functions of a framework (PyTorch, TensorFlow), which update the parameters via backpropagation to minimize the overall loss value. The above network is also known as a Feed-forward Neural Network (FNN).
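The layer-by-layer computation (pre-activation, then activation, ending with an output activation) can be sketched as a plain forward pass. This is an illustrative sketch: the layer sizes, the zero-valued weights, and the choice of a sigmoid output activation are assumptions, not details confirmed by the text.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(h_prev, W, b, g):
    # Pre-activation: weighted sum of the previous layer's outputs plus bias,
    # computed per neuron (one row of W per neuron); activation: g applied to it
    return [g(sum(w * h for w, h in zip(row, h_prev)) + bj) for row, bj in zip(W, b)]


def fnn_forward(x, layers, g=sigmoid, output_activation=sigmoid):
    # 'layers' is a list of (W, b) pairs; the last pair is the output layer
    h = x
    for W, b in layers[:-1]:
        h = layer_forward(h, W, b, g)
    W_out, b_out = layers[-1]
    return layer_forward(h, W_out, b_out, output_activation)


# Hypothetical network: 3 inputs (current, voltage, charge) -> 2 hidden -> 1 output
zero_layers = [
    ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]),  # hidden layer
    ([[0.0, 0.0]], [0.0]),                             # output layer (SOC)
]
soc_zero = fnn_forward([0.1, 0.2, 0.3], zero_layers)

# Non-zero output weights shift the prediction away from 0.5
shifted_layers = [zero_layers[0], ([[1.0, 1.0]], [0.0])]
soc_shifted = fnn_forward([0.1, 0.2, 0.3], shifted_layers)
```

With all weights at zero every neuron outputs 0.5 regardless of the input, which also illustrates why identical initial weights are uninformative.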
There are many architectural designs for a FNN, depending on several variables. The activation function can be sigmoid, tanh, ReLU, or LeakyReLU. The number of hidden layers can be 2, 4, 5, and so on; the number of neurons in each layer can be 15, 16, 20, etc.; the learning rate can be 0.1, 0.01, 0.0001, etc.; the batch size can be 16, 32, 128, etc.; and different gradient descent techniques such as Adam, Adagrad, RMSProp, Momentum GD, etc. can be used. Different designs of a FNN yield different loss values. Choosing among them is known as hyper-parameter tuning, and it can be performed using GridSearchCV [25] or RandomizedSearchCV [26]. In GridSearchCV, every combination of the candidate values is used to design a FNN, and each design is run for a specific number of epochs. (Note: One full training epoch comprises one forward pass and one backward pass over the training data; the backward pass sends the loss signal backward to update the weights and biases.) Once all the combinations are run, the search returns the combination of parameters whose loss value is minimal. Similarly, RandomizedSearchCV randomly samples a predetermined number of combinations and returns the one whose loss value is minimal. Hyper-parameter tuning via GridSearchCV is computationally time consuming, but it finds the minimal loss value over the whole grid. However, in machine learning, the minimal loss value on the training data does not guarantee the best model, as the model also needs to be validated on the validation data. If a model with a minimal loss on the training data has a markedly higher loss on the validation data, the model is over-fitting; that is, over-fitting shows up as a large gap between the training loss and the validation loss.
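The difference between exhaustive and random search can be sketched without any library. The code below is not the scikit-learn API; it only enumerates candidate combinations, and `toy_validation_loss` is a made-up stand-in for training a network and measuring its validation loss.

```python
import itertools
import random

# Candidate values taken from the examples in the text
grid = {
    "hidden_layers": [2, 4, 5],
    "neurons": [15, 16, 20],
    "learning_rate": [0.1, 0.01, 0.0001],
    "batch_size": [16, 32, 128],
}


def grid_candidates(grid):
    # Exhaustive search: every combination of every candidate value
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]


def random_candidates(grid, n_iter, seed=0):
    # Random search: a fixed number of randomly sampled combinations
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n_iter)]


def toy_validation_loss(params):
    # Hypothetical stand-in: a real search would train and validate a model here
    return (abs(params["hidden_layers"] - 2)
            + abs(params["neurons"] - 15)
            + abs(params["learning_rate"] - 0.01)
            + abs(params["batch_size"] - 32) / 100)


all_combos = grid_candidates(grid)
sampled_combos = random_candidates(grid, n_iter=10)
best_from_grid = min(all_combos, key=toy_validation_loss)
best_from_random = min(sampled_combos, key=toy_validation_loss)
```

Selecting by validation loss rather than training loss is what guards the search against picking an over-fitted configuration.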
In this work, we designed a two-layer neural network with 15 neurons in each layer and used sigmoid as the activation function. Before data were fed to the neural network, the data were normalized using MinMaxScaler (Equation (15)).
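Min-max scaling maps each feature to the range [0, 1] via $x' = (x - x_{min}) / (x_{max} - x_{min})$, which is the default behaviour of scikit-learn's MinMaxScaler. A minimal pure-Python sketch follows; the function name and the example voltages are illustrative only.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical voltage readings spanning a 1.5-4.2 V window
voltages = [1.5, 2.85, 4.2]
scaled_voltages = min_max_scale(voltages)  # approximately [0.0, 0.5, 1.0]
```

In practice the minimum and maximum should be taken from the training data and reused for the test data, so that both are scaled consistently.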
A few optimization techniques were used in this model, such as a batch size of 16 and Adam. Traditionally, the model looks at all the data points, computes the partial derivatives of the loss over all of them, and only then updates the parameters; this is computationally expensive. With a batch size 'B', B data points are fed to the network at a time; the network accumulates the partial derivatives over those B points and then updates the weights and biases. One epoch is complete once all the data points have been fed to the network in batches. Thus, instead of updating the model parameters once per epoch, the weights and biases are updated once per batch (N/B times per epoch for N data points). The Adam algorithm (Equation (18)) combines two Gradient Descent (GD) rules: momentum-based GD, in which a history of the gradients is used to make the current update (Equation (16)), and RMSProp GD, in which a history of the squared gradients is used to adapt the learning rate (Equation (17)) [27].
The β1 and β2 are the exponential decay rates for the first and second moment estimates, respectively, and have values less than 1, for example, β1 = 0.9 and β2 = 0.999. The η is the learning rate, and ε is a tiny number that prevents any denominator from going to zero, for example, 10^−8.
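The Adam update can be sketched for a single scalar parameter, following the standard formulation with bias-corrected moment estimates. The function name, the quadratic test function, and the learning rate of 0.1 used in the demonstration are illustrative assumptions, not the settings of this work.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    # Momentum part: exponentially decaying average of past gradients
    m = beta1 * m + (1.0 - beta1) * grad
    # RMSProp part: exponentially decaying average of past squared gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Step scaled by the learning rate; eps keeps the denominator away from zero
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical demonstration: minimise f(w) = (w - 3)^2, whose gradient is 2 (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
```

The first step has magnitude close to the learning rate regardless of the gradient's scale, because the bias-corrected ratio of the two moments is then close to the sign of the gradient.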
If all the weights start with the same value, the neurons in a layer remain symmetric and learn the same thing; to break this symmetry, an initialization technique is used for the initial weights. The initialization technique is chosen based on the activation function; for example, Xavier uniform is preferred for the sigmoid and tanh functions, and He initialization is preferred for ReLU/Leaky ReLU. For the FNN, the weights were initialized using the Xavier uniform distribution with sampling interval [−r, r] (Equation (19) shows the equation of r) [28],
where
$r = \sqrt{\dfrac{6}{n_{in} + n_{out}}}$ (19)
The $n_{in}$ and $n_{out}$ are the number of input and output connections, respectively.
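Xavier uniform initialization can be sketched as sampling each weight uniformly from [−r, r] with $r = \sqrt{6 / (n_{in} + n_{out})}$. The function names and the 3-input, 15-neuron layer size are illustrative assumptions.

```python
import math
import random


def xavier_bound(n_in, n_out):
    # Half-width r of the sampling interval [-r, r]
    return math.sqrt(6.0 / (n_in + n_out))


def xavier_uniform(n_in, n_out, seed=None):
    """Return an n_out x n_in weight matrix sampled uniformly from [-r, r]."""
    rng = random.Random(seed)
    r = xavier_bound(n_in, n_out)
    return [[rng.uniform(-r, r) for _ in range(n_in)] for _ in range(n_out)]


# Hypothetical first layer: 3 inputs feeding 15 neurons
r_first = xavier_bound(3, 15)            # sqrt(6 / 18), about 0.577
W_first = xavier_uniform(3, 15, seed=0)  # 15 rows of 3 weights each
```

Wider layers get a narrower sampling interval, which keeps the scale of the pre-activations roughly comparable from layer to layer.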

Dropout Technique
The process of mitigating over-fitting is called regularization, which means that the learning algorithm is modified with the goal of reducing the generalization error/validation loss rather than the training error. A few such techniques are early stopping [29], data augmentation, L2 regularization [30], batch normalization, and the dropout technique [31]. In early stopping, training is stopped at the epoch where the validation loss is minimal, whereas in data augmentation, more training data or defect data are added for training the model. Among all of these, the dropout technique is an interesting way to minimize the validation error.
Consider the example of using 10 different model architectures to approximate the relation between input and output. Instead of relying on the output of one model, we could rely on the output of all 10 models by averaging their outputs. However, training all these models, or training on different subsets of the data with different numbers of neurons, layers, or activation functions, and then computing the loss value for every such neural network is computationally expensive. Instead, we can build a model in such a way that its sub-networks share the same weights and biases but have different numbers of active neurons, and each weight is updated only when its neuron is active. This configuration can be achieved via the Dropout technique, where a neuron can be dropped based on some threshold value. For example, if the mask value of a node is 0, the node/neuron is dropped, and if it is 1, the node/neuron is kept in the network. If the neural network comprises 15 neurons, the number of possible network combinations is 2^15. In general, for n neurons, 2^n different NN architectures can be designed. In a dropped-out architecture, the weights of the remaining nodes are the same as in the original network. The model then proceeds as usual: it computes the output, calculates the loss value, backpropagates the loss, and updates the weights that were used to compute the output, whereas the weights connected to a dropped node/neuron are not updated. In the next architecture, if the dropped neuron is connected again, its weights are updated starting from their values at the last iteration. In this way, weights and bias values are shared across multiple architectures, making the approach computationally workable and less time consuming. Each neuron is present in half of all the possible networks and, thus, it is updated for 50% of the period during training.
We used the dropout rate of 20% between the inputs and hidden layer, meaning one in five inputs will be randomly excluded from each layer update.
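The masking step can be sketched as follows. The 0/1 mask follows the description above; rescaling the kept values by 1/(1 − rate), so-called inverted dropout as done in common frameworks, is an assumption about the implementation and is not stated in the text. All names and values are illustrative.

```python
import random


def dropout_mask(n_nodes, drop_rate, rng):
    # 1 keeps the node, 0 drops it; each node is dropped with probability drop_rate
    return [0 if rng.random() < drop_rate else 1 for _ in range(n_nodes)]


def apply_dropout(activations, mask, drop_rate):
    # Dropped nodes output 0; kept nodes are rescaled so the expected sum
    # stays the same (inverted dropout, an assumed convention)
    keep_prob = 1.0 - drop_rate
    return [a * m / keep_prob for a, m in zip(activations, mask)]


rng = random.Random(0)
activations = [0.2, 0.4, 0.6, 0.8, 1.0]
mask = dropout_mask(len(activations), drop_rate=0.2, rng=rng)
dropped = apply_dropout(activations, mask, drop_rate=0.2)

# Number of distinct sub-networks obtainable from 15 droppable neurons
n_subnetworks = 2 ** 15
```

A fresh mask is drawn for each update during training, and no mask is applied at inference time.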

Material Synthesis
P2-type Na0.67Fe0.5Mn0.5O2 (NFM) was synthesized using the sol-gel technique. CH3COONa (10% excess), Fe(NO3)3·9H2O, (CH3COO)2Mn·4H2O, and citric acid as a chelating agent were dissolved in deionized water in the appropriate molar ratio. The mixed solution was heated at 80 °C and stirred until the deionized water had evaporated. The dried powder was ground and heated at 400 °C for 4 h in air (ramp rate 5 °C/min), followed by heating at 950 °C for 15 h in air (ramp rate 5 °C/min). The final calcined powder was stored in an Argon-filled glove box (H2O, O2 ≤ 0.1 ppm) to avoid exposure to humidity and air.

Material Characterization
X-Ray Diffraction (XRD) was performed using a Rigaku Ultima IV diffractometer with D/tex Ultra High Speed Detector and PANalytical powder diffractometers over the 2θ range from 10° to 80° with a scan speed of 2°/min with Cu Kα radiation (power setting 40 kV, 44 mA). Crystallographic evaluations of the sol-gel synthesized P2-type Na0.67Fe0.5Mn0.5O2 were performed using the XRD patterns, shown in Figure 3a. The patterns showed that the NFM powder samples had a hexagonal, layered structure with a P63/mmc space group, as reported in our previous paper [32,33]. The morphology of the powder samples was observed using an ultra-high-resolution Field Emission Scanning Electron Microscope (FE-SEM) Hitachi SU7000. The particles of the powdered samples were hexagonal and crystalline, with average particle sizes between ~0.8 and 2.5 µm (Figure 3b).

Electrochemical Characterization
Electrochemical performance was analyzed by fabricating the cathode (NFM) in a CR2032 coin cell (0.787-inch diameter × 0.125-inch height; United Minerals and Chemical Corporation). The half-cells were assembled using Na metal (dia.: 7/16 inch) as the counter electrode, two glass microfibers (Whatman DBS 30, dia.: 5/8 inch) as the separator, and a slurry-cast cathode (NFM) as the working electrode (dia.: 7/16 inch). The active material slurry was prepared by mixing the active material, Super P, and binder (Kynar PVDF) in the mass ratio of 80:10:10, respectively. N-Methyl-2-pyrrolidone (NMP) was used as the solvent for making the viscous slurry. Cathodes were prepared by casting the active material slurry on a carbon-coated aluminum current collector (active material loading: 2-3 mg/cm²), followed by drying under vacuum at 90 °C overnight. The electrolyte used was 1.0 M NaClO4 in Propylene Carbonate (PC) with 2% Fluoroethylene Carbonate (FEC) as an additive. The coin cells were assembled in an Argon-filled glove box (H2O, O2 ≤ 0.1 ppm). Galvanostatic cycling assessments were performed using a Maccor Series 4000 battery tester.

Training Data
For training the FNN model, the data collected comprised the current, voltage, and charge values at each time step, which were used as inputs to the FNN, with the calculated SOC as the target at the output layer. The current, voltage, and charge values were taken from the data set obtained by cycling the Na-based cathode coin cell (the fabrication of the coin cell is explained in Section 2.4). Figure 4a shows the cycling behavior of the NFM cathodes cycled between 1.5-4.2 V vs. Na/Na+ for five cycles each at different C-rates from 0.05 C to 2 C (1 C = 260 mA/g). The first four discharge cycles for each C-rate were combined in a single Excel file and used as the training data. These data were divided into training and validation data for tuning the parameters. The validation data comprised 10% of the training data.

Figure 4. (b) Galvanostatic discharge curves for NFM cycled between 1.5-4.2 V vs. Na/Na+ for the fifth cycle at each current rate from 0.05 C to 2 C. (c) Specific capacity as a function of cycle number for the NFM cathode cycled between 1.5-4.2 V vs. Na/Na+ at a 0.1 C rate for 50 cycles. (d) Galvanostatic discharge curves for NFM cycled between 1.5-4.5 V vs. Na/Na+ for the 2nd, 5th, and 10th cycles at a 0.05 C rate (1 C = 260 mA/g).

Testing Data
The testing data were divided into three sets.
(1) Figure 4b shows the charge-discharge profile curves of the NFM cathode cycled between 1.5-4.2 V vs. Na/Na+ at different C-rates. (Note: These are the fifth cycles at each C-rate, i.e., the 5th, 10th, 15th, 20th, and 25th cycles in Figure 4a.)
(2) Figure 4c shows the cycling data of the NFM cathode cycled between 1.5-4.2 V vs. Na/Na+ for 50 cycles at a 0.1 C rate, out of which the 5th, 10th, 20th, and 50th cycles were used for testing the FNN model.
(3) Figure 4d shows the charge and discharge voltage profile curves for the NFM cathode cycled between 1.5-4.5 V vs. Na/Na+ at a 0.05 C rate.

Results and Discussion
After training the FNN with the training data, as mentioned in Section 2.5.1, the FNN was tested on various test data sets collected at different current rates, cycle numbers, and cutoff voltages, none of which were part of the training data. The model was trained for 20 epochs with a batch size of 32, a learning rate of 0.001, a sigmoid activation function, Xavier uniform initialization, and dropout applied only to the first hidden layer.

Testing Data
The testing data were characterized into three sections.
(1) Figure 4b shows the chargedischarge profile curve of the NFM cathode cycled between 1.5-4.2 V vs. Na/Na + for different C-rate. (Note: The fifth cycle of each C-rate such as the 5th, 10th, 15th, 20th, and 25th cycles in Figure 4a). (2) Figure 4c shows the cycling data of the NFM cathode cycled between 1.5-4.2 V vs. Na/Na+ for 50 cycles at 0.1 C rate, out of which the 5th, 10th, 20th, and 50th cycles were used for testing the FNN model. (3) Figure 4d shows the charge and discharge voltage profile curves for the NFM cathode cycled between 1.5-4.5 V vs. Na/Na+ at a 0.05 C rate.

Results and Discussion
After the FNN was trained on the training data described in Section 2.5.1, it was tested on data sets collected at different current rates, cycle numbers, and cutoff voltages, none of which were part of the training data. The model was trained for 20 epochs with a batch size of 32 and a learning rate of 0.001, using a sigmoid activation function, Xavier uniform initialization, and dropout applied only to the first hidden layer.
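The configuration above can be sketched in plain Python as a forward pass. This is a minimal illustration and not the authors' implementation: the two hidden layers of 15 neurons follow the architecture stated in the Conclusions, while the class name `SocFnn`, the dropout rate of 0.2, the sigmoid on the output neuron, and the assumption that the three inputs (voltage, current, charge) are already scaled to comparable ranges are all hypothetical choices made for the example. Training with Adam is omitted.

```python
import math
import random

# Hyperparameters quoted in the text; listed for reference, training is not shown here.
TRAINING_CONFIG = {"epochs": 20, "batch_size": 32, "learning_rate": 0.001}


def xavier_uniform(fan_in, fan_out, rng):
    """Weights drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_sigmoid(weights, biases, inputs):
    """One fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def dropout(activations, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


class SocFnn:
    """Hypothetical 3-15-15-1 network mapping (voltage, current, charge) to SOC."""

    def __init__(self, n_inputs=3, n_hidden=15, dropout_rate=0.2, seed=0):
        self.rng = random.Random(seed)
        self.dropout_rate = dropout_rate  # assumed value, not given in the text
        sizes = [n_inputs, n_hidden, n_hidden, 1]
        self.weights = [
            xavier_uniform(fan_in, fan_out, self.rng)
            for fan_in, fan_out in zip(sizes, sizes[1:])
        ]
        self.biases = [[0.0] * fan_out for fan_out in sizes[1:]]

    def predict(self, features, training=False):
        h1 = dense_sigmoid(self.weights[0], self.biases[0], features)
        # Dropout is applied to the first hidden layer only, as in the text.
        h1 = dropout(h1, self.dropout_rate, self.rng, training)
        h2 = dense_sigmoid(self.weights[1], self.biases[1], h1)
        # Assumption: a sigmoid output keeps the SOC estimate between 0 and 1.
        return dense_sigmoid(self.weights[2], self.biases[2], h2)[0]


model = SocFnn()
soc_estimate = model.predict([0.5, 0.1, 0.3])  # untrained, so the value is arbitrary
```

With `training=False` the dropout step is a pass-through, so repeated calls to `predict` on the same input return the same value; during training each call would see a different random subset of first-layer neurons.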
Training the model took a few minutes. Figure 5 shows the training and loss values as a function of the number of epochs. Figure 6 shows the relationship between voltage and state of charge at the different current rates. The true SOC (solid line) was compared with the predicted SOC (dashed line) for the test data, and the agreement between the true and estimated SOC values was measured using the R² value. Most reports plot SOC against time, as in Figure 6a, but such a plot gives less detail than the voltage vs. SOC plot in Figure 6b. Thus, in this paper, most of the data were analyzed as voltage vs. SOC. The R² value for the test data at 0.05 C (Figure 6b) was 0.9960, i.e., the predictions account for 99.60% of the variance in the true SOC. (Note: Although the R² value was above 0.99, positive and negative deviations can offset each other, so this single number does not fully convey the true accuracy. The quality of the SOC estimate is seen more clearly when the true and estimated SOC values are plotted against voltage.) The model mostly predicted the correct SOC in the sloping regions, with a slight deviation in the plateau region between 40% and 85% SOC. At higher C-rates, the R² value decreased, for example to 0.9874 at 0.5 C, 0.9747 at 1 C, and 0.9780 at 2 C, as shown in Figure 6d-f. Figure 7 shows voltage vs. SOC for the test data run at a 0.1 C rate for different cycles (5th, 10th, 20th, and 50th); here the model estimated the SOC with an R² value of ~0.97-0.99. The battery degraded when it was cycled for a longer period because of crystal structure instability, an increase in impedance caused by the formation of a passivation layer on the electrode, decomposition of the electrolyte, and other reasons explained in our previous paper [32].
However, the model coped with the degradation of the battery and estimated the SOC for a higher cycle number (50th cycle) with an R² value of ~0.99, which indicates that the model is robust to cycling. Figure 8 compares the estimated SOC with the true value for the NFM cathode cycled to a higher cutoff voltage (4.5 V). The model was never trained at the higher voltage, but it still showed good accuracy, with R² values of ~0.994 for the 2nd cycle, 0.9916 for the 5th cycle, and 0.9840 for the 10th cycle. A battery run at a higher cutoff voltage behaves differently in terms of capacity and crystal structure stability, so the SOC profile differs when it is cycled to a higher potential; nevertheless, the model predicted the SOC with an R² value above 0.98. In summary, the model estimated the SOC with R² values of ~0.98-0.99 at a higher cutoff voltage (4.5 V), ~0.99 at higher cycle numbers, and 0.97-0.99 at the different current rates of the test data.
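As a small illustration of how the R² value quoted above is computed, and of why a single summary score can be complemented by other error measures, the following sketch evaluates R² together with the signed mean error and the mean absolute error. The function names and the five-point SOC arrays are made up for the example and are not data from this study.

```python
def r_squared(true_soc, predicted_soc):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true_soc) / len(true_soc)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_soc, predicted_soc))
    ss_tot = sum((t - mean_true) ** 2 for t in true_soc)
    return 1.0 - ss_res / ss_tot


def mean_error(true_soc, predicted_soc):
    """Signed mean error: over- and under-estimates cancel each other."""
    return sum(p - t for t, p in zip(true_soc, predicted_soc)) / len(true_soc)


def mean_absolute_error(true_soc, predicted_soc):
    """Mean absolute error: every deviation counts, regardless of sign."""
    return sum(abs(p - t) for t, p in zip(true_soc, predicted_soc)) / len(true_soc)


# Illustrative values only (SOC as a fraction between 0 and 1).
true_soc = [0.00, 0.25, 0.50, 0.75, 1.00]
predicted_soc = [0.02, 0.23, 0.52, 0.73, 1.01]

r2 = r_squared(true_soc, predicted_soc)
me = mean_error(true_soc, predicted_soc)
mae = mean_absolute_error(true_soc, predicted_soc)
```

In this toy case the signed mean error is much smaller than the mean absolute error because the alternating over- and under-estimates cancel, which is one reason to inspect the voltage vs. SOC curves in addition to a single score.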

Conclusions
In this paper, a two-layer feed forward neural network was designed with 15 neurons in each layer and a sigmoid activation function. The FNN mapped the measured battery signals (voltage, current, and charge) directly to the SOC and achieved competitive estimation performance. The FNN estimated the SOC of a highly non-linear Na-ion battery at different current rates (0.05 C, 0.1 C, 0.5 C, 1 C, 2 C) with R² values of ~0.97-0.99, at higher cycle numbers with an R² value of ~0.99, and at a higher cutoff voltage of 4.5 V vs. Na+/Na with R² values of ~0.98-0.99. The FNN learned its weights using the Adam gradient descent technique and was trained for 20 epochs. Future work is to train this model on data sets from various drive cycles, such as the Urban Dynamometer Driving Schedule (UDDS), the Highway Fuel Economy Driving Schedule (HWFET), the Unified Driving Schedule (LA92), and the Supplemental Federal Test Procedures, at various temperatures.
