Estimating the Heat Capacity of Non-Newtonian Ionanofluid Systems Using ANN, ANFIS, and SGB Tree Algorithms

This work investigated the capability of multilayer perceptron artificial neural network (MLP–ANN), stochastic gradient boosting (SGB) tree, radial basis function artificial neural network (RBF–ANN), and adaptive neuro-fuzzy inference system (ANFIS) models to determine the heat capacity (Cp) of ionanofluids in terms of the nanoparticle concentration (x) and the critical temperature (Tc), operational temperature (T), acentric factor (ω), and molecular weight (Mw) of pure ionic liquids (ILs). To this end, a comprehensive database was compiled from the literature. The results of the SGB model were more satisfactory than those of the other models. Furthermore, an outlier analysis was performed, which showed that most of the experimental data points were located in a reliable zone for the development of the model. The mean squared error and R2 were 0.00249 and 0.987, 0.0132 and 0.9434, 0.0320 and 0.8754, and 0.0201 and 0.9204 for the SGB, MLP–ANN, ANFIS, and RBF–ANN, respectively. According to this study, the ability of the SGB to estimate the Cp of ionanofluids was greater than that of the other models. By eliminating the need for costly and time-consuming experiments, the SGB strategy showed its superiority compared with experimental measurements. The SGB also displayed great generalizability because of its stochastic element, and it can therefore be highly applicable to unseen conditions. Moreover, it can help chemical engineers and chemists by providing a model with few parameters that yields satisfactory results for estimating the Cp of ionanofluids. Additionally, the sensitivity analysis showed that Cp is directly related to T, Mw, and Tc, and has an inverse relation with ω and x; Mw and Tc had the highest impact and ω had the lowest impact on Cp.


Introduction
Using nanofluids is a novel technique for augmenting the efficiency of conventional heat-transfer fluids [1][2][3][4][5]. Throughout the development of heat transfer, nanofluids have been utilized in the production of heat-transfer fluids owing to their interesting thermal properties [6,7].
Ionic liquids (ILs) are a novel group of salt-like materials that are liquid under ambient conditions [8] and have great potential for use in different industries. Although their transport properties have been widely investigated in recent years, investigations of their thermal properties remain very limited in the literature [9][10][11][12]. Because of their excellent thermal properties, such as high thermal conductivity (TC) and high heat capacity (Cp), they are recognized as appealing candidates for heat-transfer applications [13,14].
In light of previous evidence, it is important to investigate the benefit of augmenting ILs with nanoparticles. Adding even a small amount of nanoparticles to pure ILs enhances their thermophysical properties [9,[15][16][17][18]. A similar idea has been observed in chemical-enhanced oil recovery methods, where the amount of recovered oil is increased by adding a very small amount of nanoparticles to the injected water [19]. This group of nanofluids, with ILs as the base fluids and enhanced thermophysical properties, is called ionanofluids or nanoparticle-enhanced ILs (NEILs) [16].
Paul et al. [20,21] carried out an experimental study and observed an increase in Cp of up to 49% compared with the base ILs when Al2O3 nanoparticles were added to pyrrolidinium- and imidazolium-based NEILs. Furthermore, silver nanofluid heat transfer through a tube with twisted tape inserts was tested by Waghole et al. [22], who concluded that the heat transfer rate was enhanced by dispersing nanoparticles in water. By adding multi-walled carbon nanotubes (MWCNTs) to different ionanofluids and measuring their thermophysical properties, Nieto de Castro et al. concluded that the TC and Cp of ionanofluids were enhanced by ≈8% and ≈9%, respectively [16,23].
What we know about the thermophysical properties of NEILs is largely based upon empirical studies, and these data are controversial regarding their accuracy; furthermore, there is little empirical data available in the literature. Therefore, the determination of the thermophysical properties of NEILs by utilizing theoretical methods is a necessary endeavor.
Computational investigations in particular have become vitally important [24][25][26][27][28][29][30][31][32][33], and many researchers have implemented intelligent methods based on neuro-fuzzy neural networks to model engineering processes [6,[34][35][36][37][38][39] and predict the thermophysical properties of different nanofluids [40][41][42]. In 2013, Salehi et al. [43] used an adaptive neuro-fuzzy inference system (ANFIS) modeling technique in a study that set out to predict the heat transfer coefficient of a nanofluid containing Al2O3 under a uniform heat flux condition. Mehrabi et al. [44] established a genetic algorithm-polynomial neural network and a fuzzy C-means (FCM) based neuro-fuzzy inference system to determine the TC ratio of Al2O3-based nanofluids based on the concentration and size of the nanoparticles, as well as the temperature. Golzar et al. [45] used artificial neural network (ANN) methods and general function approximation to calculate the thermophysical properties of quaternary ammonium-based ILs in terms of the critical temperature of the ILs and the water content. Soriano et al. determined the refractive index of binary solutions of IL systems based on ANN algorithms [46]. Lashkarblooki et al. calculated the viscosity of ILs by employing boiling temperatures based on ANN algorithms [47]. After that, Hezave et al. employed an ANN to determine the electrical conductivity of a ternary mixture of ILs [48].
Friedman suggested a robust decision-tree algorithm named stochastic gradient boosting (SGB), which can be applied to estimation and classification problems [49]. This strategy has found wide application as one of the more powerful schemes for various purposes [49][50][51][52][53][54][55].
The advantages of boosting (which combines several models) and of regression trees are exploited simultaneously in an SGB tree. Each regression tree is created using small incremental changes in the loss function relative to the previous tree. One improvement in the model is the ability to build each tree from a randomly selected subset of the data. Furthermore, the accuracy of the estimation is maximized and overfitting is minimized through the use of a small number of training data points at each step. Moreover, this algorithm reduces the need for input transformation or feature selection, which is an advantage when working in a high-dimensional space. This approach also has several other beneficial features, such as [49][50][51][56]:
• SGB-based methods display better prediction performance than competing composite-tree models, including boosting or bagging built with other approaches, such as AdaBoost.
• SGB-based methods are easy to create.
• SGB-based methods can use a large number of predictor parameters.
• SGB-based methods are developed quickly (100 times faster than neural networks for some problems).
• SGB-based methods are immune to outliers.
• SGB-based methods perform acceptably when solving regression and classification problems.
• SGB-based methods display an equivalent or better predictive ability than neural networks.
• Unrelated predictor parameters are detected automatically such that they do not affect the estimating model.
• Randomization of the elements guards against overfitting in this method.
This study aimed to address these issues through an examination and prediction of the Cp of ionanofluids as a function of the nanoparticle concentration (x) and the operational temperature (T), molecular weight (Mw), acentric factor (ω), and critical temperature (Tc) of pure ILs using a group of data-driven modeling techniques. We evaluated our forecasting models by comparing them with experimental data using three accuracy measures: R2, the mean relative error (MRE%), and the mean squared error (MSE).

Data Preparation
In this work, the predictive capability of four groups of intelligence models was evaluated regarding their ability to estimate the Cp of some NEILs. For this aim, MATLAB 2014 (version 2014, MathWorks, Natick, MA, USA) was used. After data preparation, the next step was the characterization of the input and output parameters of the models. The Cp of the ionanofluids was taken as the output, while the nanoparticle concentration (x) and the T, Mw, ω, and Tc of the pure ILs were chosen as the five input parameters. The first category, namely, the training set, contained 429 data points. The remaining 142 data points (i.e., 25% of the whole dataset) were employed to test the proficiency of the algorithms. First, all of the data were normalized to the [−1, 1] interval:

D_N = 2(D − D_min)/(D_max − D_min) − 1

where D, D_N, D_max, and D_min are the actual, normalized, maximum, and minimum data points, respectively.
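The [−1, 1] min-max normalization described above can be sketched as follows (a minimal illustration in Python rather than the MATLAB used in the study; the denormalize helper is an added convenience, not from the original):

```python
import numpy as np

def normalize(D):
    """Min-max scale raw data D into [-1, 1]:
    D_N = 2 * (D - D_min) / (D_max - D_min) - 1
    """
    D = np.asarray(D, dtype=float)
    d_min, d_max = D.min(), D.max()
    return 2.0 * (D - d_min) / (d_max - d_min) - 1.0

def denormalize(D_N, d_min, d_max):
    """Invert the scaling to recover values in physical units."""
    return (np.asarray(D_N, dtype=float) + 1.0) / 2.0 * (d_max - d_min) + d_min
```

Each input and output variable is scaled independently before training, and model outputs are mapped back with the inverse transform.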

Theory of an ANN
An ANN, which is a kind of intelligence model, is proficient at adapting to changes in the environment, learning from experience, and improving its performance [57][58][59]. An ANN is composed of neurons. Multi-layer perceptron (MLP) and radial basis function (RBF) networks are two common types of ANNs.
A common MLP structure has the following layers: (1) an input layer, (2) one or more hidden layers, and (3) an output layer. Each layer contains some elements (or neurons). The number of neurons used in the hidden layer should be determined using optimization algorithms [60].
RBF-ANNs are feed-forward networks built from localized basis functions and used as function approximation algorithms. RBF-ANNs and MLP-ANNs differ not only in design but also in their responses to patterns; owing to their simpler design and very precise responses to patterns not used during training, RBF-ANNs have an advantage over MLP-ANNs [61]. Because the RBF-ANN training process is much faster than that of an MLP-ANN and the structure of an RBF-ANN is simpler, it is an acclaimed alternative to the MLP-ANN [62]. The RBF-ANN structure contains three layers: (1) an input layer, (2) a hidden layer with a non-linear RBF activation function, and (3) an output layer. The following equation gives the RBF-ANN output:

y_i(x) = Σ_k w_ki φ(||x − c_k||)

where x represents an input pattern, y_i(x) denotes the ith output, w_ki refers to the connection weight between the kth hidden unit and the ith output, || || represents the Euclidean norm, and c_k represents the center of the kth hidden unit. The Gaussian function was chosen as the RBF (φ):

φ(||x − c||) = exp(−||x − c||² / r²)

The center (c) and radius (r) are known as the Gaussian parameters. The offered MLP-ANN approach utilizes the log-sigmoid transfer function (logsig) and the linear transfer function (purelin) in its hidden and output layers, respectively.
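As an illustration, a single forward pass of an RBF network with Gaussian basis functions might look like the following (a Python sketch; the array shapes and function name are assumptions for the example, not from the original):

```python
import numpy as np

def rbf_forward(x, centers, radii, weights):
    """One forward pass of an RBF network with Gaussian basis functions.

    phi_k(x) = exp(-||x - c_k||^2 / r_k^2)   (Gaussian RBF)
    y_i(x)   = sum_k w_ki * phi_k(x)         (linear output layer)

    centers: (K, d) hidden-unit centers c_k
    radii:   (K,)   Gaussian radii r_k
    weights: (K, m) connection weights w_ki
    """
    x = np.asarray(x, dtype=float)
    dist2 = np.sum((centers - x) ** 2, axis=1)   # squared Euclidean norms
    phi = np.exp(-dist2 / radii ** 2)            # hidden-layer activations
    return phi @ weights                         # outputs y_i(x)
```

Training such a network amounts to choosing the centers and radii, then solving for the output weights, which is what makes RBF training faster than MLP backpropagation.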

Theory of an ANFIS
An ANFIS usually contains five layers; Jang introduced the method in 1997 [63]. ANFIS models are commonly trained using optimization methods, and the approach combines the capabilities of fuzzy logic and neural networks. Algorithms such as particle swarm optimization (PSO) and the genetic algorithm (GA) can be employed with an ANFIS to determine the optimal model [64][65][66][67].
The structure of an ANFIS is shown in Figure 1, with two inputs (x, y) and one output (f_out). Accordingly, the first layer can be defined for node i as [63]:

O_1,i = μ_Ai(x)

All nodes are parameterized using a membership function with a range covering the interval (0, 1). We used the Gaussian function (given below) as the membership function in the ANFIS approach [63,68]:

μ(x) = exp(−(x − C)² / (2σ²))

where σ and C are the parameters of the Gaussian function. The second layer contains constant nodes that compute the weighted firing strengths [69]:

w_i = μ_Ai(x) μ_Bi(y)

The average (normalized) values of the weights are calculated using the following formula in the third layer:

w̄_i = w_i / (w_1 + w_2)

Each average weight is multiplied by its associated function in the fourth layer:

O_4,i = w̄_i f_i = w̄_i (p_i x + q_i y + r_i)

where p_i, r_i, and q_i are the resulting (consequent) parameters.
Eventually, by summing the previous outputs in the last layer, the overall output is calculated as follows:

f_out = Σ_i w̄_i f_i

Given the ten clusters used in the ANFIS, 120 membership function (MF) parameters had to be optimally determined; the number 120 is the product of the number of clusters (here, 10), the number of membership-function parameters (here, 2), and the total number of input and output parameters (here, 6): 10 × 2 × 6 = 120. PSO was used to optimize the ANFIS model.
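The five-layer evaluation described above can be sketched for a two-input, first-order Sugeno ANFIS (a minimal Python illustration; the Gaussian membership form and the p, q, r consequent parameters follow the text, while the function interface and shapes are assumptions):

```python
import numpy as np

def anfis_forward(x, y, sigmas, centers, p, q, r):
    """Forward pass of a two-input, first-order Sugeno ANFIS (sketch).

    Layer 1: Gaussian memberships mu = exp(-(u - C)^2 / (2 sigma^2))
    Layer 2: rule firing strengths  w_i = mu_Ai(x) * mu_Bi(y)
    Layer 3: normalized weights     wbar_i = w_i / sum(w)
    Layers 4-5: output f_out = sum(wbar_i * (p_i x + q_i y + r_i))

    sigmas, centers: (n_rules, 2) Gaussian parameters per input
    p, q, r:         (n_rules,)   consequent parameters
    """
    def gauss(u, c, s):
        return np.exp(-((u - c) ** 2) / (2.0 * s ** 2))
    mu_x = gauss(x, centers[:, 0], sigmas[:, 0])  # memberships of input x
    mu_y = gauss(y, centers[:, 1], sigmas[:, 1])  # memberships of input y
    w = mu_x * mu_y                               # firing strengths
    wbar = w / w.sum()                            # normalized weights
    return float(np.sum(wbar * (p * x + q * y + r)))
```

In the paper's setting, PSO searches over the σ and C values of the membership functions (the 120 MF parameters) to minimize the training error.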

Theory of the PSO
PSO is a population-based algorithm that begins with random solutions named particles [70]. Eberhart and Kennedy first introduced the concept, which was based on the group behavior of birds, and used it for the optimization of continuous non-linear functions [71]. PSO shares many similarities with other evolutionary, population-based algorithms. In an optimization problem, each particle can be considered a candidate solution. The first stage of the optimization process is the random distribution of the particles over the search space. The personal best (p_best) and global best (g_best) values are, respectively, the best solution found by a particle and the best solution found by the whole swarm. The particle velocity in the next step can therefore be calculated using p_best (the cognitive component), g_best (the social component), and the current particle velocity; the cognitive and social components are both randomly weighted [72]. The pth particle is introduced by:

X_p = {x_p1, x_p2, . . . , x_pD}

where x_pi stands for the ith coordinate in the D-dimensional space. G = {g_1, g_2, . . . , g_D} and P_p = {p_p1, p_p2, . . . , p_pD} represent the best position among all particles and the best position of the pth particle, respectively. The particle velocity is represented by V_p = {v_p1, v_p2, . . . , v_pD}. The particle position changes according to its velocity, and both are updated in each iteration. The velocity is calculated as follows:

v_pi(iter + 1) = ω v_pi(iter) + C_1 r_1 (p_pi − x_pi(iter)) + C_2 r_2 (g_i − x_pi(iter))

where r_1 and r_2 are random numbers in [0, 1]. The new position is computed as follows:

x_pi(iter + 1) = x_pi(iter) + v_pi(iter + 1)

The factor ω represents the inertia weight. The positive constants C_1 and C_2 are learning factors and help particles move toward a more appropriate region of the search space. PSO updates the inertia weight at each iteration using [73]:

ω_iter = ω_max − (ω_max − ω_min) × iter / iter_max

where ω_iter denotes the inertia weight at iteration iter and iter_max represents the maximum number of iterations. According to experimental reports, the values ω_min = 0.4 and ω_max = 0.9 are apt [72].
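The velocity, position, and inertia-weight updates can be condensed into one PSO iteration (a Python sketch; the learning-factor values c1 = c2 = 2.0 are common defaults and an assumption, not values taken from the text):

```python
import numpy as np

def pso_step(x, v, p_best, g_best, it, it_max,
             c1=2.0, c2=2.0, w_min=0.4, w_max=0.9, rng=None):
    """One PSO iteration for a single particle (sketch).

    The inertia weight decays linearly from w_max to w_min over
    iterations; c1 and c2 are the cognitive/social learning factors.
    """
    rng = rng or np.random.default_rng(0)
    w = w_max - (w_max - w_min) * it / it_max      # linearly decaying inertia
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x_new = x + v_new
    return x_new, v_new
```

In a full optimizer this step runs for every particle each iteration, after which p_best and g_best are refreshed from the objective values.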

Stochastic Gradient Boosting
Boosting is an approach for enhancing the precision of an estimating tool; it repeatedly applies a base function in series and combines the outputs with weights so as to control the overall estimation error. It has become one of the newest and most powerful learning algorithms applicable to regression and classification problems [74].
A new form of function approximation and statistical learning is Friedman's SGB approach, which comes from implementing boosting in regression trees [49]. In this method, a series of relatively simple trees is fitted, where each successive tree is built on the estimation residuals of the previous tree. The main complexity of these trees comes from a root node and two child nodes. In the SGB approach, the optimal data partitioning is calculated in a step-by-step process, after which the residuals of each partition are determined. Fitting a three-node tree to those residuals is the next stage, finding a new partition that reduces the residual variance of the data given the preceding sequence of trees. Each constructed tree is summed through this process, and the combined result reduces the sensitivity of the algorithm to suspect data points. Ensemble learning methods, which come from machine learning and data mining, combine estimations from several algorithms through bagging, boosting, and related methods. These approaches can be formulated as follows [75]:

F(x) = Σ_{k=1}^{K} f_k(x)

where K and f_k(x) are the ensemble size and base learners, respectively, for inputs x in the training dataset. In this equation, F(x) gives the ensemble estimation, which is determined via a linear combination of the individual estimations. Boosting approximations can be determined using an additive expansion form of the previous equation:

F(x) = Σ_{k=0}^{K} β_k g(x; a_k)

where g(x; a_k) is chosen as a simple function of x with parameters a_k. A forward-stagewise approach is employed to estimate the parameters a_k and the expansion coefficients β_k from the training dataset. The first stage in this process is the initial prediction F_0(x), with the other k values to follow:

(β_k, a_k) = arg min_{β,a} Σ_{i=1}^{N} L(y_i, F_{k−1}(x_i) + β g(x_i; a)), F_k(x) = F_{k−1}(x) + β_k g(x; a_k)

where L is Huber's loss function:

L(y, F) = (1/2)(y − F)² for |y − F| ≤ δ; L(y, F) = δ(|y − F| − δ/2) otherwise

Gradient boosting is used to solve Equation (16).
First, a least-squares criterion is used to fit the base learning function to the pseudo-residuals ỹ_ik (the negative gradients of the loss at F_{k−1}):

a_k = arg min_{a,ρ} Σ_{i=1}^{N} [ỹ_ik − ρ g(x_i; a)]²

The best coefficient is then determined as follows:

β_k = arg min_β Σ_{i=1}^{N} L(y_i, F_{k−1}(x_i) + β g(x_i; a_k))

In this way, a hard optimization problem under a general loss criterion is changed into an easier problem: one least-squares fit followed by a single-parameter optimization. The SGB algorithm focuses on observations located near the decision boundaries, which is expressed through the boosting operation of the model [49]. During the boosting process, an observation close to another class can be corrected for by an individual tree [76].
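The boosting procedure described above can be condensed into a minimal stochastic gradient boosting loop (a Python sketch using depth-1 trees (stumps) fitted to residuals on random subsamples, with squared-error loss for brevity instead of the Huber loss named in the text; all function names are illustrative):

```python
import numpy as np

def sgb_fit(X, y, n_trees=50, lr=0.1, subsample=0.5, rng=None):
    """Minimal stochastic gradient boosting with regression stumps.

    Each stump is fit by least squares to the current residuals on a
    random subsample of the data; predictions are accumulated with
    shrinkage (the learning rate lr).
    """
    rng = rng or np.random.default_rng(0)
    n = len(y)
    F0 = float(np.mean(y))                      # initial prediction F_0
    pred = np.full(n, F0)
    stumps = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=max(2, int(subsample * n)), replace=False)
        resid = y[idx] - pred[idx]              # pseudo-residuals
        best = None
        for j in range(X.shape[1]):             # greedy best single split
            for t in np.unique(X[idx, j]):
                left = X[idx, j] <= t
                if left.all() or not left.any():
                    continue
                lv, rv = resid[left].mean(), resid[~left].mean()
                sse = ((resid[left] - lv) ** 2).sum() + ((resid[~left] - rv) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, lv, rv)
        _, j, t, lv, rv = best
        stumps.append((j, t, lr * lv, lr * rv))
        pred += np.where(X[:, j] <= t, lr * lv, lr * rv)
    return F0, stumps

def sgb_predict(X, F0, stumps):
    pred = np.full(len(X), F0)
    for j, t, lv, rv in stumps:
        pred += np.where(X[:, j] <= t, lv, rv)
    return pred
```

The random subsample at each step is the "stochastic" element that gives SGB its resistance to overfitting, mirroring the randomization advantage listed earlier.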

Data Gathering
To apply the above methods and construct an accurate model, a set of experimental data points for network training was first assembled. A review of empirical studies was performed, and Cp values of some NEILs over wide ranges of nanoparticle concentrations and temperatures were found [20,21,[77][78][79]. The total collected experimental dataset (571 data points) was randomly split into two parts: training (75%) and testing (25%) subsets. The first part was used to calculate the model parameters for creating the best network, and the testing portion was used to validate the predictive power and performance of each model. The characteristics of the studied ILs with added nanoparticles under different temperatures are summarized in Table 1. Furthermore, the Mw, ω, and Tc of the pure ILs, which are model parameters, are given in Table S1 [80,81].

Model Evaluation Parameters
The efficiency of the aforesaid tools was evaluated using statistical indices, namely, the MSE, MRE, R2, standard deviation (STD), and root mean squared error (RMSE) between the empirical and predicted values. These indices are presented in Equations (22)-(26), respectively:

MSE = (1/N) Σ_{i=1}^{N} (X_i^actual − X_i^pred)²

MRE% = (100/N) Σ_{i=1}^{N} |X_i^actual − X_i^pred| / X_i^actual

R2 = 1 − Σ_{i=1}^{N} (X_i^actual − X_i^pred)² / Σ_{i=1}^{N} (X_i^actual − X̄^actual)²

STD = [(1/(N − 1)) Σ_{i=1}^{N} (e_i − ē)²]^{1/2}, with e_i = X_i^actual − X_i^pred

RMSE = (MSE)^{1/2}

In these relationships, X_i^actual, N, and X_i^pred denote the actual value, the number of data points, and the output of the network, respectively. Furthermore, X̄^actual is the average of the actual points.
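These indices can be computed as follows (a Python sketch; the exact STD definition used in the paper is not reproduced in the text, so a standard residual standard deviation is assumed here):

```python
import numpy as np

def evaluation_metrics(actual, predicted):
    """Standard regression evaluation indices (assumed definitions):

    MSE  = mean((y - yhat)^2)
    RMSE = sqrt(MSE)
    MRE% = 100/N * sum(|y - yhat| / |y|)
    R2   = 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
    STD  = standard deviation of the residuals
    """
    y = np.asarray(actual, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    err = y - yhat
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MRE%": 100.0 * np.mean(np.abs(err) / np.abs(y)),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "STD": np.std(err),
    }
```

Computing all indices from one residual vector keeps the training/testing comparisons in the tables consistent.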

Results and Discussion
The offered MLP-ANN approach utilized the log-sigmoid transfer function (logsig) and linear transfer function (purelin) in its hidden and output layers, respectively. The number of hidden layer neurons was optimized using trial and error, and twelve neurons were found to be optimal for the hidden layer of the MLP-ANN tool. Figure 2 illustrates the MLP-ANN model's performance on the training data over several iterations. The MLP-ANN was trained using the Levenberg-Marquardt (LM) algorithm. Table S2 shows the optimal values of the MLP-ANN structure. Moreover, the RBF was used in the hidden layer of the RBF-ANN model. Based on previous results, the number of hidden layer neurons of an ANN should be chosen to be less than one-tenth of the total number of training data points [82]. Therefore, the number of hidden layer neurons of this algorithm was taken as one-tenth of the number of training data points. Figure 3 shows the MSE of the LM algorithm over various iterations for the RBF-ANN. Figure S1 shows the membership functions obtained for the ANFIS model; note that all of the data (output and input) were normalized in the range [−1, 1]. Figure 4 demonstrates the RMSE between the actual and estimated Cp values for the training data. The highest number of iterations was 1000, and the best RMSE was 0.17416. To create the structure of the SGB, the different parameters of this algorithm needed to be determined: the number of additive terms, the learning rate, the minimum n in the child nodes, and the proportion of the sub-dataset. Table 2 gives details of the trained models. Figure S2 shows the predicted and actual (experimental) Cp values obtained with the different models. Figure 5 illustrates the regression diagrams of the four models for the computed and actual values.
According to these figures, both the training and testing results showed a remarkably good fit to a straight line for the SGB model; the fits obtained for the other models did not reach this benchmark. The MLP-ANN algorithm produced better estimations than the ANFIS and RBF-ANN algorithms. The R2 coefficient of the SGB method was 0.994 for both the training and testing datasets. The R2 values were 0.933 and 0.943, 0.8499 and 0.8754, and 0.9327 and 0.9204 for the training and testing datasets of the MLP-ANN, ANFIS, and RBF-ANN algorithms, respectively. Linear regression equations were also fitted to the testing datasets of the ANFIS, MLP-ANN, RBF-ANN, and SGB models. These regression equations express the accuracy, or the deviation, of the calculated Cp of the ionanofluids from the actual values: the closer an equation is to the bisector line (y = x), the more accurate the predictions. The relative deviations (%) between the predicted and actual Cp of the ionanofluids for these approaches are depicted in Figure S3. The MREs (%) of the testing and training data of the SGB model were 0.93789% and 0.81109%, respectively. These values were 1.9799% and 1.8353%, 3.5439% and 3.5403%, and 1.768% and 1.6416% for the MLP-ANN, ANFIS, and RBF-ANN, respectively. It is clear that the SGB predicted Cp better than the other models. The RBF-ANN and MLP-ANN showed similar results, with the RBF-ANN slightly better than the MLP-ANN. Table 3 summarizes the statistical indices, including the R2, MSE, MRE, RMSE, and STD for the training, testing, and total datasets of each model. The indices determined in the training phase show that the proposed models were trained to acceptable degrees of accuracy.
After the evaluation of the training phase, it was important to assess the performance of the models in determining the Cp of ionanofluids under unseen conditions. Therefore, the statistical indices in the testing phase were investigated to verify the generalization of the models, which was confirmed by the determined error values. The low error values show that the SGB model achieved notable accuracy for unseen points.

Outlier Detection
The correctness of the algorithms is strongly influenced by the precision of the laboratory values [83]. This study used a large amount of literature data, and some of these data may carry significant laboratory error. Outliers follow a trend distinct from the general trend, so an exact procedure for detecting them is needed to remove imprecise experimental data [84]. In this work, the Leverage method was used to find the outliers. After obtaining the residual values, a hat matrix (H) was created for the input values according to the following equation [85]:

H = X(XᵀX)⁻¹Xᵀ

where X is an m × n matrix, with m the number of samples and n the number of model parameters. The hat values are obtained from the main diagonal of H. Accordingly, a Williams plot, which shows the standardized residual values versus the hat values, can graphically detect outliers. Because the predictive power of the SGB was better than that of the ANFIS, the SGB was used for the outlier detection analysis. Figure 6 illustrates Williams plots for the various models investigated. The critical leverage value (H*) was calculated according to this equation:

H* = 3(n + 1)/m

The blue lines in the figures indicate the leverage limit; data points whose hat values exceed the critical leverage value (H*) fall outside the applicability domain. Moreover, the red lines y = ±3 are borders, and data points with standardized residuals outside these two lines are regarded as outliers.
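The hat-matrix diagnostics described above can be sketched as follows (a Python illustration of the Leverage method; the standard forms H = X(XᵀX)⁻¹Xᵀ and H* = 3(n + 1)/m are assumed, and the function names are illustrative):

```python
import numpy as np

def hat_values(X):
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T for design matrix X."""
    X = np.asarray(X, dtype=float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

def williams_outliers(X, std_residuals):
    """Flag points outside the Williams-plot validity domain.

    H* = 3 (n + 1) / m, with n model parameters (columns) and m samples;
    a point is suspect if its hat value exceeds H* or its standardized
    residual lies outside the +/-3 bands.
    """
    m, n = np.asarray(X).shape
    h = hat_values(X)
    h_star = 3.0 * (n + 1) / m
    r = np.asarray(std_residuals, dtype=float)
    return (h > h_star) | (np.abs(r) > 3.0), h_star
```

Plotting the standardized residuals against the hat values, with vertical line H* and horizontal lines ±3, reproduces the Williams plot used in Figure 6.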

Importance of the Input Parameters
To determine which inputs had the greatest impact on Cp, we used the relevancy factor (r), which lies in the range [−1, 1]. The r values are calculated using Equation (32) [84]:

r = Σ_{i=1}^{n} (X_{k,i} − X̄_k)(Y_i − Ȳ) / [Σ_{i=1}^{n} (X_{k,i} − X̄_k)² Σ_{i=1}^{n} (Y_i − Ȳ)²]^{1/2}

where X_{k,i} and Y_i are the ith value of the kth input and the ith output, respectively; X̄_k and Ȳ are the average values of the kth input and the output, respectively; and n denotes the total number of data points. As shown in Figure 7, Cp was directly related to T, Mw, and Tc, and had an inverse relation with ω and x. Mw and Tc had the highest impact and ω had the lowest impact on Cp (with r equal to 0.451).
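The relevancy factor can be computed as a Pearson-type correlation between each input and the output (a Python sketch under that standard assumption; the function name is illustrative):

```python
import numpy as np

def relevancy_factor(x_k, y):
    """Pearson-type relevancy factor r in [-1, 1]:

    r = sum((x_ki - xbar)(y_i - ybar)) /
        sqrt(sum((x_ki - xbar)^2) * sum((y_i - ybar)^2))
    """
    x = np.asarray(x_k, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))
```

Evaluating this factor for each of the five inputs against Cp yields the bar-chart-style ranking shown in Figure 7.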

Conclusions
In this work, the predictive capability of four groups of intelligence models for determining the Cp of ionanofluids over a wide range of conditions was evaluated based on a database containing 571 data points gathered from the literature. Cp was estimated by considering the properties of the ILs, the nanoparticle concentration, and the operational temperature as input parameters. The dependent parameters of the ANFIS were optimized using PSO, which displayed an excellent ability to determine the best values of the ANFIS parameters. The LM algorithm was used to determine the tuning parameters of the ANN. The outstanding aspects of this study are its easy and quick calculations and the low number of adjustable parameters in the calculation methods. The statistical analyses showed that the SGB method gave highly satisfactory predictions compared with the other models. The SGB model presented here also involves simple calculations; using it in commercial software, or as an alternative tool when no empirical data are available, is another of its applications.