Modelling and Computational Experiment to Obtain Optimized Neural Network for Battery Thermal Management Data

The focus of this work is to computationally obtain an optimized neural network (NN) model to predict battery average Nusselt number (Nuavg) data using four activation functions. The battery Nuavg is highly nonlinear, as reported in the literature, and depends mainly on flow velocity, coolant type, heat generation, thermal conductivity, battery length to width ratio, and space between the parallel battery packs. Nuavg is modeled at first using only one hidden layer in the network (NN1). The number of neurons in NN1 is varied from 1 to 10 with four activation functions (Sigmoidal, Gaussian, Tanh, and Linear) to get the optimized NN1. Similarly, a deep NN (NND) is analyzed with varying neurons and activation functions to find an optimized number of hidden layers to predict the Nuavg. RMSE (root mean square error) and R-squared (R2) are assessed to identify the optimized NN model. From this computational experiment, it is found that NN1 and NND both accurately predict the battery data. Six neurons in the hidden layer give the best predictions for NN1. In NND, the optimized model is obtained at different hidden layers and neurons for each activation function. The Sigmoidal and Gaussian functions outperformed the Tanh and Linear functions in the NN1 model. The Linear function, on the other hand, was unable to forecast the battery data adequately with NN1. The Gaussian and Linear functions outperformed the other two functions in the NND model. Overall, the deep NN (NND) model predicted better than the single-layered NN (NN1) model for each activation function.


Introduction
In modern electric cars, the battery thermal management system (BTMS) is a prominent aspect in the efficient design and long life of a lithium-ion module. To carry out a comprehensive analysis of this system, a large number of experiments are necessary. Owing to the complex nature of BTMS and a wide range of operating conditions, experimental investigations are both expensive and time-consuming. Artificial neural network (ANN) techniques, on the other hand, can be used to forecast performance based on a set of operational conditions. In the recent past, ANNs have been frequently employed by a number of researchers to represent complex and nonlinear BTM (battery thermal management) systems because of their general approximation abilities [1][2][3][4][5][6]. [...] process, as evidenced by the reasonable agreement between the current model results and those of another standard model. Wen et al. [19] developed a physics-driven machine learning (PD-ML) algorithm to predict the temperature, stress, and deformation behaviour of Li-metal. They found that the PD-ML model, when provided with minimal test data, can accurately predict the mechanical characteristics of Li-metal over a wide range of temperature and distortion rate, and can aid in the creation of a Li-metal deformation map. The PD-ML modelling approach was also integrated with the FEM (finite element method) model. In analyzing the temperature, stress, and mechanical behaviour of Li-metal, the coupling of PD-ML with FEM combines the advantages of FE (finite element) analysis with the precision of PD-ML. An electrochemical model integrated with a neural network (ETNN) was developed by Feng et al. [20] to predict the SOC (state of charge) and state of temperature (SOT). Based on rigorous testing, they found that this model can reliably predict battery voltage and core temperature at room temperatures with a 10-C rate.
The multi-physics model can effectively predict temperature; however, it requires more computational time and is hard to incorporate into optimization processes. To address this problem, a DNN (deep NN) model trained using a small set of simulation data was applied to the BTMS by Park et al. [21]. They found that the DNN model produced an accurate estimate with a total error of less than 1%. By decreasing the computation time needed to obtain the response of the multi-physics model, the DNN model provided a sufficient increase in system efficiency.
The modelling of BTMS data using different algorithms discussed above indicates the importance of data prediction, as the battery data depends upon a large number of parameters. Thermal modelling is quite challenging because the thermal characteristics are sensitive to changes in operating factors. The battery characteristics reported in the literature are highly sensitive and nonlinear, which complicates the predictions. Various algorithms such as SVR, ANN, LSTM, and ANFIS are commonly reported in the literature. These algorithms are sometimes combined into a hybrid algorithm to model the battery characteristics. However, a detailed investigation of NNs with single and deep layered networks to find an optimized model for a particular set of data is missing. Hence, the prime objective is to computationally analyze which NN models are optimal in their predictions using the different activation functions available. This study reports the use of single and deep layered NN models, exploring the potential of four activation functions. The NN model is tested with different numbers of neurons and hidden layers to obtain an optimized model with lower prediction errors. The rest of the work starts with Section 2, where the modelling of the battery system is explained briefly. Section 3 provides a detailed discussion of the ANN modelling of the BTMS data. The results of single-layered NN and deep NN modelling using different activation functions are provided in detail in Section 4. The work is concluded with its main insights in Section 5.

Battery Modelling
An effective numerical model is created to get insights into the thermal features of Li-ion battery packs and cooling systems by incorporating the continuity of heat flux at the junction. The physical model employed for the coupled analysis of battery packs is depicted in Figure 1. A battery unit, in general, is made up of a layer of battery modules (Figure 1a) whose temperature goes up as heat is produced during charging and discharging. As seen in Figure 1b, the prismatic battery module has a rectangular shape and transfers heat into the nearby coolant. It is worth noting here that the flow and thermal fields in the computational domain are also symmetric about the y-axis of the battery plate, in addition to the geometric symmetry. As a result, just the left or right half of the physical domain needs to be computed, which saves computation cost and time. So only 50% of the physical system is included for investigation (Figure 1b), with very few assumptions. According to the cell zone [22], the heat produced is taken to be uniform. Steady heat analysis, in which the heat produced during continuous charging and discharging over a longer duration is unchanged, has been widely published in the literature. The fluid flow varies from 0.01 m/s to as high as 12 m/s in a laminar flow coolant region. An air pump/fan is used to circulate the air flow, which constantly requires more energy. Thermal impacts are limited to a 2D model to reduce processing time and avoid the needless complexity of modelling a practical 3D system. Further details of the CPU time and memory requirements of this simulation can be found in [23,24].
The finite volume method is used to evaluate the cooling ability of the proposed fluid-based battery cooling system. The governing equations describe the heat transport mechanism during discharging/charging of a Li-ion battery pack.
The following extra approximations and assumptions are made in order to convert the physical model of the battery element provided above into a mathematical model:
(i) The battery element's substance is homogeneous and isotropic.
(ii) The battery element's conductivity is temperature independent.
(iii) The temperature gradient in the x-y plane is insignificant.
(iv) The flow is 2D, laminar, and incompressible.
(v) The coolant is viscous and Newtonian.
(vi) The coolant's thermophysical characteristics are constant.
Using these assumptions, the equation governing the steady, two-dimensional heat conduction in the battery element with non-uniform heat generation can be obtained as [22]

$$k_r \nabla^2 T_r + q'''_r = \rho c \,\frac{\partial T_r}{\partial t} \qquad (1)$$

To solve the energy conservation of the fluid domain using a SIMPLE algorithm, the continuity and momentum equations considered in 2-D are given below by Equations (2)-(4) [25],

$$\nabla \cdot \mathbf{u}_m = 0 \qquad (2)$$

$$(\mathbf{u}_m \cdot \nabla)\,\mathbf{u}_m = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u}_m \qquad (3)$$

$$\frac{\partial T_m}{\partial t} + \mathbf{u}_m \cdot \nabla T_m = \alpha \nabla^2 T_m \qquad (4)$$

In the above Equations (1)-(4), written in vector form, the subscript r denotes the battery and m denotes the coolant domain. T is the temperature of the fluid and battery, u is the velocity in the x direction, ρ and c represent the density and heat capacity of the fluid, p is the pressure, ν is the kinematic viscosity, α is the thermal diffusivity, and t is the time. The detailed non-dimensionalization, application of boundary conditions, grid independence study, validation, and detailed procedure have been reported in the literature recently [22,25-29]. The input parameters that affect the conjugated thermal behaviour of the battery, considered in this study for modelling the average Nusselt number, are shown in Figure 2.
Figure 2a depicts the proposed NN model with a single hidden layer used in this study to predict the battery pack's temperature, with the number of neurons varied from 1 to n. The six inputs to the NN and the one output from it are also clearly shown. At first, a single-layer model (NN1) is used, as shown in Figure 2a. In this NN1 model, the number of neurons is tested from 1 to 10. Then, as shown in Figure 2b, the NN is analyzed for multiple deep hidden layers (NND), where the neurons in each layer are also varied simultaneously and the number of hidden layers is varied from 3 to 10.

Artificial Neural Network (NN) Modelling
In addition to the input and output layers, the network has a hidden layer with a particular number of nodes. Each neuron's inputs are weighted according to their edge weights, and the weighted inputs are added together. Each neuron also has a bias added to it. An activation function 'f' maps the whole sum of weighted inputs and the bias non-linearly to the neuron output. In this study, the activation functions employed in the hidden layer are the sigmoid, tanh, Gaussian, and linear functions [30]; however, the sigmoid function produced the optimum results for this application. A simple linear function is used in the output layer. During training, the weights are modified in a direction that approaches the error minimum. The Levenberg-Marquardt (LM) algorithm [31], which is a mixture of gradient descent and the Newton technique, is used in the present work as it is the method that converges the fastest.
An artificial neural network model, as depicted in Figure 2, is made up of nodes/neurons that are organised into several layers: an input layer, one or more hidden layers, and an output layer. Each neuron has an activation function that calculates how much stimulation is applied to that neuron. The input variables are transformed at each layer by a collection of neurons and then passed on to the next layer, as defined by Equations (5)-(7),
where x is the 1st layer's input; A is the 1st layer's output; p and q are the NN node indexes; n is the layer's index; $m_{pq}(l)$ is the weight between the qth node in the 1st layer and the pth node in the input layer; b is the output layer's input; and $F(b_p^n)$ is the output value of the pth node in the (n + 1)th layer after activation.
The weight and bias in between the synapses, which evaluate the relevance of the data transmitted across the link, are represented by $m$ and $m_0$, respectively. F(b) is the activation function, which uses the combined output of the hidden layer to produce the output y.
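The layer-by-layer propagation described above can be illustrated with a minimal sketch. It assumes the usual form in which each node computes a weighted sum of its inputs plus a bias and then applies the activation function F; the function names, the toy weights, and the 2-2-1 network shape are hypothetical and chosen only for illustration, not taken from the study.

```python
import math

def sigmoid(b, k=1.0):
    """Sigmoid activation F(b) = 1 / (1 + exp(-k b)); k = 1 is an assumed default."""
    return 1.0 / (1.0 + math.exp(-k * b))

def identity(b):
    """Linear activation, as used for the output layer."""
    return b

def layer_forward(inputs, weights, biases, activation):
    """Return F(sum_q m[p][q] * x[q] + m0[p]) for every node p of one layer."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        b = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(b))
    return outputs

# Hypothetical toy network: 2 inputs -> 2 sigmoid hidden nodes -> 1 linear output.
hidden = layer_forward([1.0, 2.0], [[0.5, -0.25], [0.0, 0.0]], [0.0, 0.0], sigmoid)
y = layer_forward(hidden, [[1.0, 1.0]], [0.0], identity)[0]
```

Stacking further calls to `layer_forward` gives the deep (multi-hidden-layer) variant, since the output of one layer simply becomes the input of the next.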
The starting weights and biases are allocated at random, and the training cycle repeats till the required result is achieved, as determined by Equation (8) of the cost function.
where o is the intended output, E(m) denotes the cost function utilized to assess the training operation, m denotes the weight, and i denotes the cost function calculation index. The weight and bias of the NN model are modified throughout the training procedure, as shown in Equation (9), to reduce the error.
where J = ∂E/∂m is the Jacobian matrix with respect to m, I is the identity matrix, µ is the combining coefficient, and e is the prediction error. The forward calculation via Equation (10) is the first step in the LM method. Equation (10) is used to determine the output and hidden layer forecast errors.
The Jacobian is determined using a backpropagation technique, as indicated by the following Equation (11).
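As an illustration of the weight update described above, the following sketch applies the textbook Levenberg-Marquardt step, m_new = m - (J^T J + µI)^(-1) J^T e, to a two-parameter toy model y = m0 + m1 x. The toy model, the data, the fixed value µ = 0.1, and the function names are assumptions made for this example only; a practical implementation would compute J for the network weights by backpropagation and adapt µ between iterations.

```python
def lm_step(params, xs, targets, mu):
    """One Levenberg-Marquardt update for the toy model y = m0 + m1 * x."""
    m0, m1 = params
    # Prediction errors e_i = prediction - target.
    errors = [m0 + m1 * x - o for x, o in zip(xs, targets)]
    # Jacobian rows d e_i / d (m0, m1) = (1, x_i).
    jac = [(1.0, x) for x in xs]
    # A = J^T J + mu * I and g = J^T e.
    a00 = sum(r[0] * r[0] for r in jac) + mu
    a01 = sum(r[0] * r[1] for r in jac)
    a11 = sum(r[1] * r[1] for r in jac) + mu
    g0 = sum(r[0] * e for r, e in zip(jac, errors))
    g1 = sum(r[1] * e for r, e in zip(jac, errors))
    # Solve the 2x2 system A * delta = g by Cramer's rule.
    det = a00 * a11 - a01 * a01
    d0 = (a11 * g0 - a01 * g1) / det
    d1 = (a00 * g1 - a01 * g0) / det
    return (m0 - d0, m1 - d1)

def sum_squared_error(params, xs, targets):
    """Cost used to monitor the fit: sum of squared prediction errors."""
    return sum((params[0] + params[1] * x - o) ** 2 for x, o in zip(xs, targets))

xs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]  # generated from y = 1 + 2x
params = (0.0, 0.0)
for _ in range(50):
    params = lm_step(params, xs, targets, mu=0.1)
```

A large µ makes the step behave like a short gradient-descent step, while µ close to zero recovers the Gauss-Newton step, which is the sense in which the method combines the two techniques.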
The deep learning model is an NN model having deep hidden layers, as shown in Figure 2b. As explained earlier, the fundamental foundation is the same as that of the NN model, which is based on the brain's operation and biological architecture, allowing machines to reach human-like intelligence. The most basic kind of NND consists of hierarchical synapses that send information to adjacent neurons based on the input, building a complicated system that learns from feedback. The input data is supplied to the first layer's non-hidden synapses, and the outcome of this layer is fed as the input to the next layer, and so on, until the final outcome is attained. The outcome is expressed as a probability, resulting in a forecast of either "Yes" or "No". Each layer's synapses calculate a small function called an "activation function" that helps pass the relevant signal on to the next neurons. Weights link the synapses of two consecutive layers together. These weights govern the significance of each feature in forecasting the targeted value. The NND with multiple hidden layers is shown in Figure 2b. The weights are initially random, but are adjusted iteratively as the model is trained to learn and anticipate the outcome.
As shown in Figure 2, six operating parameters (flow velocity, coolant type, heat generation, thermal conductivity, battery length to width ratio, and space between the parallel battery packs) were set as the input parameters of the neural network models. The output parameter of the two neural network models (NN1 and NND) was set to the average Nusselt number. The training data is derived entirely from numerical experiments. For the training procedure, a total of 1750 sets of numerical data were gathered. Each model has at least 3 layers and is designed to have some activation function during the training phase. A hidden layer follows each input layer, which is comprised of the input variables. Each of the input layers has an effect on the output parameter prediction. The number of nodes in the input layer indicates how many input variables were used in the model. The hidden layers feed into another aggregation layer, which mixes the findings from the preceding layers and feeds them to the output layer, resulting in a continuous result. Several numbers of nodes/neurons in the hidden layer were tested to attain the desired regression efficiency and to reduce overfitting and underfitting.
A total of 85% of sets of numerical data were used for training and the remaining 15% of sets of numerical data were used for testing. The neural network regression training process was given 1000 epochs. For a more accurate prediction of the output, the root mean square error (RMSE) must be as low as possible. In Table 1, the four activation functions used for the NN1 and NND models are provided.

Table 1. Activation functions used in this work.

Activation Function | Equation
Sigmoid | F(q) = 1 / (1 + exp(−kq))
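For illustration, the four activation functions named in the text can be written as short functions. Only the sigmoid form F(q) = 1/(1 + exp(−kq)) is taken from the table; the forms used here for Tanh, Gaussian (exp(−q²)), and Linear (F(q) = q), as well as the default k = 1, are common textbook definitions assumed for this sketch and may differ from the exact expressions used in the study.

```python
import math

def sigmoid(q, k=1.0):
    """Sigmoid as given in Table 1; the slope k = 1 is an assumed default."""
    return 1.0 / (1.0 + math.exp(-k * q))

def tanh(q):
    """Hyperbolic tangent (assumed standard form)."""
    return math.tanh(q)

def gaussian(q):
    """Gaussian activation, assumed here to be exp(-q^2)."""
    return math.exp(-q * q)

def linear(q):
    """Linear (identity) activation, assumed to be F(q) = q."""
    return q

ACTIVATIONS = {
    "Sigmoidal": sigmoid,
    "Tanh": tanh,
    "Gaussian": gaussian,
    "Linear": linear,
}

# Evaluate every activation at q = 0 as a quick sanity check.
values_at_zero = {name: f(0.0) for name, f in ACTIVATIONS.items()}
```

Keeping the functions in a dictionary makes it straightforward to loop over all four when the same network architecture is trained once per activation.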

Results and Discussion
The modelling of Nuavg, which indicates the heat transfer from the conjugated battery (heat-generating) surface, is shown and discussed in detail. A neural network (NN) with a single hidden layer (NN1) is analyzed thoroughly first. The RMSE and R-squared are investigated for this NN1 to get an optimized model that predicts the battery Nuavg by experimenting with several numbers of neurons in the hidden layer and different activation functions. Later, the NND (deep NN) model is also analyzed with different numbers of hidden layers and neurons in each layer to obtain the optimized network for each activation function. As reported in a few works, the Nuavg data pertinent to battery thermal management is highly nonlinear [8,22,25,28,32,33]. The entire data set is sorted carefully and compiled to enable the intelligent algorithms to predict an essential aspect of battery systems. The optimized model is obtained in this work when the RMSE and R-squared values are at their best. These values are taken as the average of 5 computational cycles for each output, which amounts to a very large number of calculations and generated data. However, only selected data is presented in this section to limit the length of this article.
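The evaluation procedure described in this work (an 85%/15% split of the 1750 records, RMSE and R-squared as metrics, and averaging over 5 computational cycles) can be sketched as follows. The function names, the random shuffling, and the integer rounding of the split are assumptions of this sketch; the study does not state how the split was drawn or rounded.

```python
import math
import random

def train_test_split(records, train_percent=85, seed=0):
    """Shuffle and split records; integer arithmetic avoids float rounding issues."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_over_runs(values):
    """Average a metric over repeated computational cycles (5 in the study)."""
    return sum(values) / len(values)

# Record indices stand in for the 1750 numerical data sets.
train, test = train_test_split(list(range(1750)))
sizes = (len(train), len(test))
```

With 1750 records, 85% is 1487.5, so the sketch rounds down to 1487 training records and leaves 263 for testing. A lower RMSE and an R-squared closer to 1 both indicate a better fit.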

NN1 Modelling of Nuavg
At first, the convergence of RMSE with increasing iterations is demonstrated for the NN1 model during training and testing for different activation functions. In Figure 3, the number of epochs, each of which represents one pass of the data through the algorithm, is set to 1000, which leads to a converged RMSE in all cases; the results shown are for the NN1 model with six neurons in the hidden layer. Compared to the training sessions, the testing RMSE converges at higher values. Except for the Linear function, the remaining three functions converge at more or less the same RMSE, while the Linear function gives a higher RMSE value. The Linear testing curve converges at a higher value than all other scenarios. Between Sigmoidal and Gaussian, the Sigmoidal function did not provide much better testing convergence than the Gaussian function, and the difference between the testing RMSE of these two is not very high.
In Figure 4, the RMSE of the trained and tested NN1 model after completing more than 1000 epochs with increasing neurons in the hidden layer is shown. All the activation functions are employed to get the optimized model. From Figure 4a it is seen that the RMSE for the Sigmoidal function reduces when the neurons are increased from 1 to 6 in the hidden layer. From 6 neurons onwards, the RMSE for the Sigmoidal function shows slight increments. A similar trend can be noted for the Tanh function. The Gaussian function has also given a comparatively low RMSE at six neurons in the hidden layer. The Linear function has shown a higher and consistent error at all neuron counts. Hence, with six neurons in the hidden layer, the model training is optimized. Very similar behaviour is also found for the tested model (Figure 4b).
The rest of the computations are provided using the above optimized NN1 model with 6 neurons in the hidden layer. The NN1 model with the Sigmoidal activation function in the hidden layer is trained on the battery Nuavg data, and its training and predictions are depicted in Figure 5. The network predictions during both sessions are close to the actual (true) values, as the data is denser towards the centre trendline. A slight outlier in the testing region can be noted, which is reflected in testing R2 and RMSE values that differ slightly from those obtained in training. As the R2 value is above 0.9, the model can be considered successful and capable of predicting well. The RMSE value for training is 0.029, better than the RMSE of testing (0.04).
The optimized NN1 model with six neurons using Tanh as the activation function is used for training the data. It is noted from Figure 6a,b that the network-calculated battery output data is in line with the actual battery values. The highly sensitive Nuavg has significant outliers, as indicated in Figure 6b, slightly underfitting with the NN1 network. The trained computations are accurate, as the R2 is above 0.9 and the RMSE is close to zero. However, during testing, this model has slightly under-predicted the data because the orientation towards the centerline is not seen.
Using the Gaussian function for the NN1 model calculations has shown some improvement in predictions over the Tanh function calculations. Figure 7a,b provides an accessible comparison of the network predictions with the actual battery values. The 45° line drawn for the mapping of both sets of data indicates that the NN1 model with the Gaussian function is excellent at predicting the battery Nuavg data. The training is in line with the Tanh function, but the testing is far better than with the Tanh function, as the R2 is above 0.9 during both sessions.
In Figure 8, the NN1 model calculations using the Linear activation function are depicted. Both Figure 8a and Figure 8b clearly show that the Linear function cannot help the neural model to depict the sensitive data of the battery system based on thermal parameters. Figure 8a shows how the Nuavg values are out of bounds in the training session, leading to an R2 and RMSE of just 0.68 and 0.099, respectively, which is unacceptable. The predictions in testing also have high errors, as shown in Figure 8.
The overall trend of the actual battery Nuavg data and the four activation functions applied to the optimized NN1 model is given in Figure 9a,b for training and predictions, respectively. The actual trend, shown by the black line, almost overlaps with the Sigmoidal and Gaussian function NN1 models. The Tanh function differs somewhat from these two, followed by the Linear function, which shows a more significant difference than the others. Figure 9b illustrates the predictions made by the trained NN1 model, where the Sigmoidal and Gaussian predictions lie very close to the actual values compared with the Tanh and Linear function data points.

NND Modelling of Nuavg
The RMSE variations for different numbers of deep layers, varying from 3 to 10, in the backpropagating NN model are shown in Figure 10a,b. The neurons in each deep layer are simultaneously varied from 4 to 10 for each activation function. For brevity, only the RMSE analysis of the NND with the Sigmoidal activation function is illustrated. R2 values are omitted for the same reason. However, the optimized values of R2 are shown directly in the graphs of the optimized models achieved after experimentation. From Figure 10a,b, the RMSE from the trained model shows that six neurons in each layer of a six-layered deep NN model are the best from the entire set of computations. This is also confirmed by the R2 values, which are added in the forthcoming graphs as they are the optimal values. Though nine neurons have shown some promising results, this configuration is not considered the best because its RMSE fluctuates too much across different numbers of hidden layers and because it involves a higher computational cost. From similar sets of experiments for the Tanh, Gaussian, and Linear functions, it is concluded that 8 neurons in each of 4 deep layers, 6 neurons in each of 5 deep layers, and 7 neurons in each of 3 deep layers, respectively, provided the best RMSE and R2.
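To give a sense of the size of this search, the following sketch enumerates the hidden-layer/neuron grid described above and counts the trainable parameters of each reported optimal architecture. It assumes fully connected layers with one bias per node, six inputs, and one output; the function and variable names are hypothetical, and the parameter counts are derived from these assumptions rather than reported in the study.

```python
def parameter_count(n_inputs, hidden_layers, neurons, n_outputs=1):
    """Weights plus biases of a fully connected net with equal-width hidden layers."""
    sizes = [n_inputs] + [neurons] * hidden_layers + [n_outputs]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

# Grid explored for each activation: 3 to 10 hidden layers, 4 to 10 neurons per layer.
grid = [(layers, neurons) for layers in range(3, 11) for neurons in range(4, 11)]

# Optimal (hidden layers, neurons per layer) reported for each activation function.
best = {"Sigmoidal": (6, 6), "Tanh": (4, 8), "Gaussian": (5, 6), "Linear": (3, 7)}
counts = {
    name: parameter_count(6, layers, neurons)
    for name, (layers, neurons) in best.items()
}

# Single-hidden-layer NN1 with six neurons, for comparison.
nn1_count = parameter_count(6, 1, 6)
```

Under these assumptions the grid contains 56 architectures per activation function, and the optimal deep models have between 169 and 281 parameters, compared with 49 for the six-neuron NN1 model.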
The application of the Sigmoidal function for deep NN (NN D ) to train and predict the Nu avg data is shown in Figure 11. The least (optimized) R 2 and RMSE value was obtained for NN with six deep hidden layers having six neurons each. The respective R 2 and RMSE are 0.97 and 0.018 obtained from the trained model and 0.93 and 0.045 for the tested model. This deep model has proved to be much more efficient than the earlier model accuracy as the error is less and the mapping with actual value is much more closer. The training is more accurate while the tested model gives slightly more error, which is highly acceptable but may concern the difference in terms of percentage. This is again tested for other functions, which shows much less difference in the error between the training and testing model. Figure 12a,b shows the deep network predictions during the training/testing sessions using Tanh as the activation functions. The optimized error was obtained for this function applied to NN model having four hidden layers with eight neurons each. The R 2 and RMSE are 0.98 and 0.013 from this optimized model, which matches the actual values excellently. The training R 2 and RMSE are 0.91 and 0.048 during the testing, which is equally good as Sigmoidal function operated NN D .
The Gaussian-function deep NN model is optimal with six neurons in each of 5 hidden layers. The R2 and RMSE are 0.97 and 0.023 for the trained model, while in testing they are 0.94 and 0.036, as shown in Figure 13. This indicates that the Gaussian function performs better in testing than the previous two models. Another notable point is the small difference between the trained and tested R2 and RMSE values, which was not obtained in many of the previous regressions.
Next, unexpectedly, the Linear function, which was not acceptable in the single-layer NN model earlier, provided better results than the Sigmoidal and Tanh functions. Figure 14 depicts the Linear-function predictions, with R2 and RMSE values close to those of the Gaussian function.
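To make the four optimized architectures concrete, the sketch below builds each one as a fully connected network and runs a single forward pass. Several details are assumptions rather than statements from the paper: the functional forms of the activations (in particular exp(-x^2) for the Gaussian), the linear output neuron, the random weight initialization, and the helper names `init_network`, `forward`, and `count_parameters`. The six inputs correspond to the six operating parameters.

```python
import math
import random

# Assumed functional forms; the paper names the functions but the exact
# expressions (especially for the Gaussian) are assumptions here.
ACTIVATIONS = {
    "sigmoidal": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "gaussian": lambda x: math.exp(-x * x),
    "linear": lambda x: x,
}

# Optimized (hidden layers, neurons per layer) reported in the text.
OPTIMIZED = {
    "sigmoidal": (6, 6),
    "tanh": (4, 8),
    "gaussian": (5, 6),
    "linear": (3, 7),
}

N_INPUTS = 6  # six operating parameters


def init_network(layers, neurons, n_inputs=N_INPUTS, seed=0):
    """Random weights and zero biases for a fully connected net with one output."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [neurons] * layers + [1]
    net = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        net.append((weights, biases))
    return net


def forward(net, x, activation):
    """Propagate x through the hidden layers; the output layer is kept linear."""
    act = ACTIVATIONS[activation]
    for i, (weights, biases) in enumerate(net):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(net) - 1
        x = z if is_output else [act(v) for v in z]
    return x[0]


def count_parameters(net):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in net)


sample = [0.5] * N_INPUTS  # placeholder normalized input, not paper data
for name, (layers, neurons) in OPTIMIZED.items():
    net = init_network(layers, neurons)
    print(name, count_parameters(net), forward(net, sample, name))
```

Under these assumptions the trainable parameter counts are 259 (Sigmoidal), 281 (Tanh), 217 (Gaussian), and 169 (Linear), which gives a rough sense of the relative computational cost of the optimized models.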
In Figure 15, the predictions of the deep NN models, each optimized with the hidden layers and neurons mentioned earlier, are shown together with the actual values. The closeness between the actual values and the predictions of the different activation functions is indicated for training and testing. Empty circles represent the trained values, and empty squares represent the tested values. The scattered data points indicate that the predictions from the optimized models are close enough to the actual Nuavg for forecasting. Although all are acceptable, the functions other than Tanh provided very close predictions and are highly useful for this enormously sensitive battery thermal system data.
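The difference between training and testing performance can be tabulated directly from the R2 and RMSE values quoted in the text. The short sketch below does only that arithmetic; the helper name `generalization_gap` is hypothetical, and the Linear network is left out because its numerical values are not stated in the text.

```python
# (R2, RMSE) pairs quoted in the text for the optimized deep networks.
# The Linear network is omitted because its values are not stated in the text.
REPORTED = {
    "sigmoidal": {"train": (0.97, 0.018), "test": (0.93, 0.045)},
    "tanh": {"train": (0.98, 0.013), "test": (0.91, 0.048)},
    "gaussian": {"train": (0.97, 0.023), "test": (0.94, 0.036)},
}


def generalization_gap(metrics):
    """Return (drop in R2, rise in RMSE) from training to testing."""
    r2_train, rmse_train = metrics["train"]
    r2_test, rmse_test = metrics["test"]
    return round(r2_train - r2_test, 3), round(rmse_test - rmse_train, 3)


gaps = {name: generalization_gap(m) for name, m in REPORTED.items()}
for name, (r2_drop, rmse_rise) in gaps.items():
    print(f"{name}: R2 drop = {r2_drop}, RMSE rise = {rmse_rise}")

# Activation function with the smallest rise in RMSE from training to testing.
smallest = min(gaps, key=lambda name: gaps[name][1])
print("smallest train/test gap:", smallest)
```

On the reported numbers, the Gaussian network shows the smallest gap (R2 drop of 0.03, RMSE rise of 0.013) and the Tanh network the largest (0.07 and 0.035), which is consistent with Tanh being the weakest of the deep models in the combined comparison.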

Conclusions
Battery thermal management is crucial for the high capacity and long life of Li-ion modules, so careful modelling of the heat transfer phenomena is required. An attempt was made to model the highly sensitive battery data to predict the heat transfer characteristic, namely the average Nusselt number (Nuavg), which depends on six operating parameters, including the battery heat generation, coolant properties, and battery spacing. Single-layer and deep NN models with different activation functions were employed. A massive computational experiment (around 800 cycles of computations) was performed to obtain the optimized NN models for each activation function. In the single-layer model, the optimized NN was obtained with 6 neurons in the hidden layer for all activation functions. In the deep NN, the optimized model had 6 hidden layers of 6 neurons each for the Sigmoidal function, 4 hidden layers of 8 neurons each for Tanh, 5 hidden layers of 6 neurons each for Gaussian, and 3 hidden layers of 7 neurons each for the Linear function. The RMSE and R2 were assessed for all the models during the training and testing of the networks. In the single-layer NN model, the Sigmoidal and Gaussian functions outperformed the Tanh and Linear functions; the Linear function, however, failed to predict the battery data sufficiently. In the deep NN model, the Gaussian and Linear functions outperformed the other two functions. Overall, the deep NN provided a much better prediction than the single-layer NN model.