A Hybrid Genetic Programming–Gray Wolf Optimizer Approach for Process Optimization of Biodiesel Production

: Biodiesel is one the most sought after alternate fuels in the current global need for sustainable and renewable energy sources due to their lower emissions and no major modiﬁcation requirement to existing engines. However, the performance and productivity of the biodiesel production process are signiﬁcantly dependent on the process parameters. In this regard, a novel hybrid genetic programming-gray wolf optimizer approach for the process optimization of biodiesel production is proposed in this paper. For an illustration of the proposed approach, kinematic viscosity is expressed as a symbolic regression metamodel to account for the inﬂuence of catalyst concentration, reaction temperature, alcohol-to-oil molar ratio, and reaction time. Then, the genetic programming-based symbolic regression metamodel is used as an objective function by the gray wolf optimizer to optimize the process parameters. The obtained results show that the proposed approach is simple, accurate, and robust.


Introduction
Biodiesel plays a vital role in the alternate fuel sector, as its main aim is not just to replace non-renewable (petrol, diesel, or other) fuel but to create the balance in creating energies. Biodiesel is a non-petroleum-based diesel fuel made by the combination of short-chain alkyl ester (methyl and ethyl) made through the transesterification process of animal and vegetable oil. There are a lot of benefits of using biodiesel in existing engines, as it can be directly used without much change and can be used in its pure form as well as in blended form with diesel in varying proportions. With an ever-increasing need for energy and the global initiatives taken toward reducing the carbon footprint, the demand for renewable sources of energy is felt like never before. Researchers across the globe have devoted tremendous time and effort to come up with alternate but sustainable sources to generate energy. Meng et al. used waste cooking oil to produce biodiesel as it contains a large amount of free fatty acid [1]. In this process, the oil is being heated to a particular temperature, which is followed by mixing methanol. Then, it is poured into the processor and is left undisturbed so that the oil gets separated in the form of layers of unwashed biodiesel and glycerine. Finally, the glycerine is separated from the oil to get biodiesel. In recent times, various methods have been used for the production of biodiesel from flaxseed [2], macroalgae, vegetable oil, food crop, soybean oil [3], palm oil [4], sunflower seed oil, etc. The focus of researchers is motivated toward biodiesel, 2 of 11 as it is less polluting, can be blended with diesel to reduce the consumption of diesel, and creates more employment opportunities for rural people [5]. The major problem of commercialization of biodiesel is the price of feedstocks as it occupies as much as 70% of the production cost of biodiesel [6].
Irrespective of the raw material used for biodiesel production, all biodiesel production plants are significantly affected by the underlying process parameters. The production process parameters ranging from molar ratio to temperature to catalyst percentage must be carefully controlled to achieve high yield and desirable biodiesel properties. In this regard, researchers often rely on optimization approaches to predict the optimum biodiesel production process parameters. Various approaches and different experimental design techniques have been used by several researchers such as Face-centred Central Composite Design (FCCD), Box-Behnken Design (BBD), and central composite design (CCD) to get the maximum yield of biodiesel. FCCD in conjunction with the RSM (response surface methodology) approach was used by Yongphet et al. to get the maximum yield of biodiesel through a transesterification reaction. The results obtained from the RSM study indicates that the electric field plays a vital role in the overall transesterification process of biodiesel and was a very efficient process for the fast production of biodiesel [7]. Ahmad et al. used an RSM approach to predict the maximum yield of biodiesel. They reported that the optimum condition for maximum biodiesel yield was at a methanol/oil molar ratio of 5.9:1, the catalyst weight percentage of 0.51%, reaction temperature of 59 • C, and reaction time of 33 min. A maximum yield of approximately 98% was achieved by them, whereas the RSM approach slightly overpredicted the yield as 99% [8]. Hameed et al. used RSM and predicted that the best yield of biodiesel can be achieved at a methanol/oil ratio of 11.43%, a reaction time of 9.72 h, and the amount of catalyst is 5.52 wt % [4]. Shin et al. too used an RSM approach to evaluate the relationship between the reaction parameters and fatty acid methyl ester. The results were taken at the optimum processing condition of 330 • C, 210 bar with a reaction time of 16 min and methanol-to-oil molar ratio as 50 mol/mol and got the resultant value of 93.3% methyl ester [9]. Wu et al. used an orthogonal array experimental design to predict the yield percentage of biodiesel produced from camelina seed oil using alkaline transesterification where the maximum yield was found at a methanol/oil molar ratio of 8:1 with a reaction time of 70 min, temperature of 50 • C, and catalyst wt % of 1. They predicted the optimization value of FAME (fatty acid methyl ester) yield as 98.4% [10]. Alviso et al. [11] developed a genetic programming predictive model based on literature data for estimating the physicochemical properties from its fatty acid composition. They reported significant improvement in the prediction power of the models as compared to the polynomial regression approach.
In this paper, a hybrid process parameter modeling and optimization approach is presented. Contrary to the conventional polynomial regression approach wherein often a fixed form polynomial model of second order is built, in this paper, a form-free approach is used. Typically, in polynomial regression, only the values of the coefficients for the predefined mathematical expression is computed. On the other hand, a symbolic regression may be defined as a regression approach where the form is not selected a priori. Both the form of the mathematical expression and the values of the coefficients are searched simultaneously. This is an advantage, as it allows the model to adapt to the problem's non-linearity and complexity. In this paper, genetic programming is used for symbolic regression-based process parameter modeling. Subsequently, the symbolic regression metamodel is deployed in conjunction with a powerful metaheuristic algorithm called the Gray Wolf Optimizer. The methodology is illustrated in the subsequent sections with the help of an experimental dataset from Gülüm et al. [12]. The kinematic viscosity of biodiesel is considered as the modeled response and is expressed as a function of catalyst concentration, reaction time, alcohol-to-oil molar ratio, and reaction time.

Methodology
The methodology used in this paper is shown in Figure 1 and is implemented by using an in-house FORTRAN program.

Methodology
The methodology used in this paper is shown in Figure 1 and is implemented by using an in-house FORTRAN program.

Genetic Programming
Genetic programming is a predictive model building or metamodeling approach that uses Darwin's theory of natural selection as an inspiration [13]. Genetic programming (GP) initiates by generating a host of random candidate solutions (i.e., GP trees). This set of solutions is collectively called the population. In this paper, the GP trees are constructed from a pool of pre-selected functions = {+, −, * ,/} and a set of terminals = { , , , , }. Here, , , , represent catalyst concentration, reaction temperature, alcohol-to-oil molar ratio, and reaction time, respectively. In the GP process, all the process parameters (i.e., the independent variables) are treated as terminals. In addition, the coefficients of regression (i.e., , , … , ) are considered as the terminals. can be any real number.
The fitness of each solution or GP tree is evaluated. In this paper, the coefficient of determination (R ) is considered as the fitness function. R is a measure of the amount of the variance in the dependent parameter that is predicted by the fitted metamodel. For an ideal case, R is 1, i.e., 100% of the variance in the dependent parameters in the training dataset is explained by the fitted metamodel. Thus, in other words, if the developed metamodel predicts the same values as those supplied in the training dataset, the R is 1. The GP trees with higher fitness have a larger probability of getting selected for breeding to generate the next population. If suppose say in the ith generation, two GP trees, T1 and T2 have R values of 0.6 and 0.8, respectively, then during the process of selection, T2 is more likely to be selected. The selection in the current paper is based on the tournament selection method, and the selected GP trees are recombined by using a single point crossover method. During crossover, some part of one GP tree is grafted over another GP tree to generate a completely new GP tree. A small random flipping of genes, i.e., the mutation is carried out to introduce new information. In addition, in the current paper elitism, i.e., copying some of the fittest solutions to the next generation, is also done. Thus, although

Genetic Programming
Genetic programming is a predictive model building or metamodeling approach that uses Darwin's theory of natural selection as an inspiration [13]. Genetic programming (GP) initiates by generating a host of random candidate solutions (i.e., GP trees). This set of solutions is collectively called the population. In this paper, the GP trees are constructed from a pool of pre-selected functions F = {+, −, * , /} and a set of terminals T = {real numbers, x 1 , x 2 , x 3 , x 4 }. Here, x 1 , x 2 , x 3 , x 4 represent catalyst concentration, reaction temperature, alcohol-to-oil molar ratio, and reaction time, respectively. In the GP process, all the process parameters (i.e., the independent variables) are treated as terminals. In addition, the coefficients of regression (i.e., β 1 , β 2 , . . . , β n ) are considered as the terminals. β i can be any real number.
The fitness of each solution or GP tree is evaluated. In this paper, the coefficient of determination (R 2 ) is considered as the fitness function. R 2 is a measure of the amount of the variance in the dependent parameter that is predicted by the fitted metamodel. For an ideal case, R 2 is 1, i.e., 100% of the variance in the dependent parameters in the training dataset is explained by the fitted metamodel. Thus, in other words, if the developed metamodel predicts the same values as those supplied in the training dataset, the R 2 is 1. The GP trees with higher fitness have a larger probability of getting selected for breeding to generate the next population. If suppose say in the ith generation, two GP trees, T1 and T2 have R 2 values of 0.6 and 0.8, respectively, then during the process of selection, T2 is more likely to be selected. The selection in the current paper is based on the tournament selection method, and the selected GP trees are recombined by using a single point crossover method. During crossover, some part of one GP tree is grafted over another GP tree to generate a completely new GP tree. A small random flipping of genes, i.e., the mutation is carried out to introduce new information. In addition, in the current paper elitism, i.e., copying some of the fittest solutions to the next generation, is also done. Thus, although the first generation in the GP process is randomly created, for the second generation onwards, the process of new GP tree generation is guided by the GP operators-selection, crossover, mutation, and elitism. After a predefined number of generations, the GP process is terminated, and the fittest individual so far is reported as the best GP tree.

Gray Wolf Optimizer
Gray wolf optimization is a popular swarm intelligent technique developed by Mirjalili et al. in 2014 [14], which mimics the leadership hierarchy of wolves [15]. The hunting techniques and the social hierarchy of wolves (i.e., alpha, beta, delta, and omega wolf) are mathematically modeled to develop gray wolf optimization (GWO). The GWO algorithm is implemented for optimization by mimicking three key steps of gray wolf behaviorhunting and searching for prey, encircling prey, and attacking prey [16].
In GWO, the fittest solution, the second-best solution and the third-best solution are called alpha (α), beta (β), and delta (δ), respectively. The remaining solutions are called omega (ω). In nature, the gray wolves hunt by encircling the prey, which is modeled in the GWO algorithm.
where t indicates the current iteration, → X p (t) and → X(t) represent the position vector of the prey and the gray wolf at iteration t.

→
A and → C are coefficient vectors, which are calculated as: where components of → a are linearly decreased from 2 to 0 throughout iterations and Any gray wolf at position (X, Y) can update its position as per the position of the prey (X*, Y*). Thus, the gray wolves can update their positions in the search space by using the above equations. Next, to mimic the hunting behavior of the gray wolves, the three best solutions are assigned the roles of α, β, and δ. The location of the prey (optimum) is searched by the α, β, and δ wolves. The ω wolves update their position by following the best agent by the following relations: Thus, the GWO works by initiating a random set of candidate solutions (i.e., a population of gray wolves). As iterations progress, the α, β, and δ wolf estimate the probable position of prey (i.e., optimum). By using the equations discussed above, each wolf updates its distance from the prey. This iterative process continues until the termination criteria are met.

Development of Genetic Programming Metamodel
Based on the training data from Gülüm et al. [12] and the genetic programming methodology depicted in Figure 1, symbolic regression metamodels are trained to indicate the kinematic viscosity as a function of catalyst concentration, reaction temperature, alcoholto-molar ratio, and reaction time. Based on a pilot study, the population size of the GP pool is selected as 1000, and the generation limit is considered as 500; i.e., a total of 500,000 GP tree evaluations are carried out. The elitism is kept as 1%, and crossover and mutation probabilities are considered as 0.9 and 0.1, respectively. Furthermore, to keep the GP metamodel from getting cumbersome, the GP tree length and depth are restricted to 35 and 8, respectively. Based on the GP process, the following metamodel is developed, where β 1 = 0.0410, β 2 = 0.0046, β 3 = 41.7427, β 4 = 11.9447, β 5 = 3.4662, and β 6 = 1.1379. The R 2 for the above metamodel is 94.1%, indicating a very high accuracy for the model. The mean squared error (MSE), mean absolute error (MAE), mean error (ME), mean relative error (MRE), and normalized mean squared error (NMSE) of the metamodel is 0.00073, 0.02297, −1.27306 × 10 −15 , 0.00551, and 0.057483 respectively.
The metamodel represented in Equation (12) can be visualized in the form of a GP tree, as shown in Figure 2. The variation in the relative frequency of the functions (i.e., +, −, * , /) and the terminals (i.e., constants and variables) over the 500 generations of the GP process is shown in Figure 3. It is seen that the constants (i.e., the β coefficients) have the most relative occurrence in the GP trees. Similarly, in Figure 4, the relative frequency of the individual process parameters in the GP trees over the entire generation limit is shown. It is seen that the catalyst concentration has the most occurrence in the GP trees. The impact of each process parameter is calculated on the GP trees, and it is found that the catalyst concentration, reaction temperature, alcohol-to-molar ratio, and reaction time have an impact of 90.60%, 8.43%, 0.54%, and 0.43% respectively. This finding is in sync with the inferences reported by Gülüm et al. [12] based on an RSM study where they conclude catalyst concentration and reaction temperature to be the most important parameters among the four. However, it is worth mentioning here that Gülüm et al. [12] had to carry out additional tests such as ANOVA (analysis of variance) to calculate the relative impact of the variables.       Figure 5 shows the variation in GP tree lengths of the entire population for a typical generation. It is seen that most of the generated GP trees during the process are well below the permissible GP tree length. The depth and length of the final GP tree represented in Figure 2 are 7 and 26, respectively. The predictive performance of the GP-based symbolic regression metamodel is compared in Figure 6 to experimental data. The figure indicates that the GP metamodel is fairly accurate and in some cases shows a small amount of underprediction. Further, the GP metamodel is deployed for the study of the influence of various process parameters on the response, as shown in Figure 7. From the figure, it can be ascertained that for low values of kinematic viscosity, a relatively larger value of catalyst concentration will be beneficial.  Figure 5 shows the variation in GP tree lengths of the entire population for a typical generation. It is seen that most of the generated GP trees during the process are well below the permissible GP tree length. The depth and length of the final GP tree represented in Figure 2 are 7 and 26, respectively. The predictive performance of the GP-based symbolic regression metamodel is compared in Figure 6 to experimental data. The figure indicates that the GP metamodel is fairly accurate and in some cases shows a small amount of underprediction. Further, the GP metamodel is deployed for the study of the influence of various process parameters on the response, as shown in Figure 7. From the figure, it can Processes 2021, 9, 442 7 of 11 be ascertained that for low values of kinematic viscosity, a relatively larger value of catalyst concentration will be beneficial. Figure 5 shows the variation in GP tree lengths of the entire population for a typical generation. It is seen that most of the generated GP trees during the process are well below the permissible GP tree length. The depth and length of the final GP tree represented in Figure 2 are 7 and 26, respectively. The predictive performance of the GP-based symbolic regression metamodel is compared in Figure 6 to experimental data. The figure indicates that the GP metamodel is fairly accurate and in some cases shows a small amount of underprediction. Further, the GP metamodel is deployed for the study of the influence of various process parameters on the response, as shown in Figure 7. From the figure, it can be ascertained that for low values of kinematic viscosity, a relatively larger value of catalyst concentration will be beneficial.

Optimization Using Gray Wolf Optimizer
The symbolic regression metamodel reported in Equation (1)  In the GWO process, the population size and iteration limit are considered as 500 and 100 respectively, i.e., 50,000 function evaluations are carried out. Due to the stochastic nature of the metaheuristic algorithms, GWO is run for 25 independent trials.
In the GWO process, the population size and iteration limit are considered as 500 and 100 respectively, i.e., 50,000 function evaluations are carried out. Due to the stochastic nature of the metaheuristic algorithms, GWO is run for 25 independent trials.  Figure 8 shows the convergence of the GWO for five typical trials. It is seen that the GWO rapidly converges across iterations. The overall 50,000 function evaluations for each of the 25 GWO trials are plotted in Figure 9. In general, it is seen that the median of the evaluated functions is much lower than the mean. This indicates that the majority of the evaluated functions have high fitness (i.e., lower kinematic viscosity value). This is indicative of a smaller number of function evaluations required to reach the optimal zone. The optimal values reported by the GWO algorithm for 25 independent trials are plotted in Figure 10. The best and the worst optimal response values reported by the GWO are 3.8745 cSt and 3.8758 cSt, i.e., a difference of approximately 0.035% to the best optimal solution. This negligible deviation in the optimal solution reported by the GWO while conducting independent trials indicates the high precision and repeatability of the GWO. Thus, by the current approach, the global optimal solution for minimizing the kinematic viscosity is obtained at catalyst concentration 1.25 wt %, reaction temperature 69.9986 °C, alcohol-tooil molar ratio of 10.2537:1, and reaction time of 96.3198 min with a kinematic viscosity of 3.8745 cSt. The current solution is better than the optimal solution reported by Gülüm et al. [12] using RSM as 3.878 cSt (0.09% better), that reported experimentally as 3.886 cSt (0.3% better), and that reported using cubic spline interpolation as 3.945 cSt (1.79% better).  Figure 8 shows the convergence of the GWO for five typical trials. It is seen that the GWO rapidly converges across iterations. The overall 50,000 function evaluations for each of the 25 GWO trials are plotted in Figure 9. In general, it is seen that the median of the evaluated functions is much lower than the mean. This indicates that the majority of the evaluated functions have high fitness (i.e., lower kinematic viscosity value). This is indicative of a smaller number of function evaluations required to reach the optimal zone. The optimal values reported by the GWO algorithm for 25 independent trials are plotted in Figure 10. The best and the worst optimal response values reported by the GWO are 3.8745 cSt and 3.8758 cSt, i.e., a difference of approximately 0.035% to the best optimal solution. This negligible deviation in the optimal solution reported by the GWO while conducting independent trials indicates the high precision and repeatability of the GWO. Thus, by the current approach, the global optimal solution for minimizing the kinematic viscosity is obtained at catalyst concentration 1.25 wt %, reaction temperature 69.9986 • C, alcohol-to-oil molar ratio of 10.2537:1, and reaction time of 96.3198 min with a kinematic viscosity of 3.8745 cSt. The current solution is better than the optimal solution reported by Gülüm et al. [12] using RSM as 3.878 cSt (0.09% better), that reported experimentally as 3.886 cSt (0.3% better), and that reported using cubic spline interpolation as 3.945 cSt (1.79% better).

Conclusions
In this paper, a novel approach to process modeling and optimization of the biodiesel production process is highlighted. As an illustrative example, the kinematic viscosity of the biodiesel is expressed as a function of the catalyst concentration, the reaction temperature, the alcohol-to-oil molar ratio, and the reaction time. Instead of the conventionally used fixed-form polynomial regression approach (by using RSM methodology), a formfree symbolic regression approach (by using GP, i.e., genetic programming) is used. In polynomial regression, often additional steps such as the transformation of data by methods such as Box-Cox transformation is needed so that it can model the non-linearity in the process better. Since the GP-based symbolic regression is not restricted by a predefined form, it can adapt to the training data and thus does not require additional data transformation steps. Further, unlike RSM approaches that require the use of additional statistical approaches such as ANOVA to account for the relative influence of each process parameter on the modeled response, the GP process by monitoring the relative occurrence frequency of the various variables in GP trees can give an estimate of the relative importance of the process parameters. The two points mentioned above make symbolic regression less labor intensive as compared to traditional polynomial regressions. The modeled response, obtained in form of a GP tree (or empirical relation) can be used as an inexpensive tool for parametric evaluation. Finally, the GP-based metamodel is linked to Gray Wolf Optimizer (GWO) to carry out process optimization. The GWO is seen to be very quick and robust in terms of global optimal prediction. The global optimal solution for minimizing the kinematic viscosity is obtained as catalyst concentration 1.25 wt %, reaction temperature 69.9986 °C, alcohol-to-oil molar ratio of 10.2537:1, and reaction time of 96.3198 min. The current optimal solution is found to be 0.09%, 0.3%, and 1.79% better than RSM, experimental, and cubic spline interpolation, respectively. Thus, by adopting the proposed hybrid genetic programming-gray wolf optimizer approach for the process optimization of biodiesel production, significant savings in time and effort in identifying and optimizing the process parameters can be achieved.

Conclusions
In this paper, a novel approach to process modeling and optimization of the biodiesel production process is highlighted. As an illustrative example, the kinematic viscosity of the biodiesel is expressed as a function of the catalyst concentration, the reaction temperature, the alcohol-to-oil molar ratio, and the reaction time. Instead of the conventionally used fixed-form polynomial regression approach (by using RSM methodology), a form-free symbolic regression approach (by using GP, i.e., genetic programming) is used. In polynomial regression, often additional steps such as the transformation of data by methods such as Box-Cox transformation is needed so that it can model the non-linearity in the process better. Since the GP-based symbolic regression is not restricted by a predefined form, it can adapt to the training data and thus does not require additional data transformation steps. Further, unlike RSM approaches that require the use of additional statistical approaches such as ANOVA to account for the relative influence of each process parameter on the modeled response, the GP process by monitoring the relative occurrence frequency of the various variables in GP trees can give an estimate of the relative importance of the process parameters. The two points mentioned above make symbolic regression less labor intensive as compared to traditional polynomial regressions. The modeled response, obtained in form of a GP tree (or empirical relation) can be used as an inexpensive tool for parametric evaluation. Finally, the GP-based metamodel is linked to Gray Wolf Optimizer (GWO) to carry out process optimization. The GWO is seen to be very quick and robust in terms of global optimal prediction. The global optimal solution for minimizing the kinematic viscosity is obtained as catalyst concentration 1.25 wt %, reaction temperature 69.9986 • C, alcohol-to-oil molar ratio of 10.2537:1, and reaction time of 96.3198 min. The current optimal solution is found to be 0.09%, 0.3%, and 1.79% better than RSM, experimental, and cubic spline interpolation, respectively. Thus, by adopting the proposed hybrid genetic programming-gray wolf optimizer approach for the process optimization of biodiesel production, significant savings in time and effort in identifying and optimizing the process parameters can be achieved.