Using SVM-RSM and ELM-RSM Approaches for Optimizing the Production Process of Methyl and Ethyl Esters

: The production of a desired product needs an effective use of the experimental model. The present study proposes an extreme learning machine (ELM) and a support vector machine (SVM) integrated with the response surface methodology (RSM) to solve the complexity in optimization and prediction of the ethyl ester and methyl ester production process. The novel hybrid models of ELM-RSM and ELM-SVM are further used as a case study to estimate the yield of methyl and ethyl esters through a trans-esteriﬁcation process from waste cooking oil (WCO) based on American Society for Testing and Materials (ASTM) standards. The results of the prediction phase were also compared with artiﬁcial neural networks (ANNs) and adaptive neuro-fuzzy inference system (ANFIS), which were recently developed by the second author of this study. Based on the results, an ELM with a correlation coefﬁcient of 0.9815 and 0.9863 for methyl and ethyl esters, respectively, had a high estimation capability compared with that for SVM, ANNs, and ANFIS. Accordingly, the maximum production yield was obtained in the case of using ELM-RSM of 96.86% for ethyl ester at a temperature of 68.48 ◦ C, a catalyst value of 1.15 wt. %, mixing intensity of 650.07 rpm, and an alcohol to oil molar ratio ( A / O ) of 5.77; for methyl ester, the production yield was 98.46% at a temperature of 67.62 ◦ C, a catalyst value of 1.1 wt. %, mixing intensity of 709.42 rpm, and an A / O of 6.09. Therefore, ELM-RSM increased the production yield by 3.6% for ethyl ester and 3.1% for methyl ester, compared with those for the experimental data.


Introduction
Nowadays, dependence on fossil fuels have caused an environmental crisis and a high production volume of pollutants [1,2]. Recently, the benefits of biofuels and comprehensive research in the field of converting renewable resources into the biofuels [3,4] have attracted much attention regarding the possibility of producing and using renewable fuels [5] such as biodiesel [6]. Waste oils, edible and nonedible oils, and animal fats are common feed-stocks to produce biodiesel [7][8][9]. The biodiesel is produced from a reaction between a short chain alcohol and oils through a transesterification process in the presence of an acidic or alkaline catalyst [10,11].
The numerous advantages of biodiesel as an alternative fuel has led many researchers to produce, evaluate, and use this fuel recently. Xue et al. [12] produced biodiesel from soybean and Jatropha oils using a magnetic catalyst. Ma et al. [13] produced biodiesel from unrefined methanol during a transesterification process. Román-Figueroa et al. [14] produced biodiesel from crude castor oil (Ricinus communis) during a non-catalytic supercritical methanol transesterification process. Raia et al. [15] produced biodiesel from Jatropha curcas L. oil during simultaneous esterification and transesterification mechanisms using sulfated zirconia. They found that the esterification mechanism is faster than transesterification in the beginning reaction. Ali et al. [16] produced biodiesel from waste cooking oil (WCO) using purified lipase of Pseudomonas aeruginosa. They reached a maximum biodiesel yield of 86% by optimizing the reaction conditions. Carvalho et al. [17], produced biodiesel from Mucor circinelloides in the presence of ethanol and impregnated heteropoly acid on alumina as a catalyst during one and two-step transesterification mechanisms.
The production of a desired product needs an effective use of an experimental model [18]. There are several modeling methods and the simplest method of modeling is a mathematical approach (or sometimes is remembered as a classic model) [19]. Modeling methods contain a wide range of methods from mathematical to soft computing methods, but modeling an ill-defined system, such as a biodiesel production process, becomes even more pronounced using mathematical and statistical techniques due to the complexity and the biological nature of biodiesel [20,21].
Soft computing methods are able to do complex modeling tasks without the need for relations governing the system. Adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs) are different types of soft computing methods. Neural networks are widely applied on biodiesel production by various researchers. Recently models developed with the basis of soft computing methods have been used for the study and prediction of processes. Many of them originated from ANN models. The mentioned models have a high performance and accuracy. The arrival of these models to studies related to the production and the use of biodiesel will improve efficiency.
Sharon et al. [22] studied trained ANN in a SIMULINK model for biodiesel production and the prediction of a diesel engine performance. They studied the effect of NaOH on transesterification of fried oils. Fahmi and Cremaschi [23] developed ANN as alternative models to replace the real operation, thermodynamics, and complex models. They found that the ANN method reduces the computation cost. Moradi et al. [24] studied transesterification of soybean oil to biodiesel using KOH in different conditions. They applied ANN to estimate the biodiesel yield. Chakraborty and Sahu [25] evaluated the biodiesel production from waste goat tallow using a response surface method (RSM) and ANN. Farobie et al. [26] employed an ANN method to model the biodiesel production process in the presence of methanol and ethanol. Maran and Priya [27] applied ANN and RSM methods to predict the biodiesel production process from Azadirachta indica oil during an ultrasound-assisted intensification process. They reported ANN has better performance compared to RSM for predicting biodiesel production in the mentioned process. Maran and Priya [28] in another study employed RSM and ANN methods for modeling biodiesel production from muskmelon oil using an ultrasound-assisted reactor. The prediction ability of the ANN model was much better than RSM. Betiku et al. [29] developed ANN with a genetic algorithm (GA) and RSM models to model and optimize the transesterification step from the production of biodiesel from shea oil. Sarve et al. [30] applied RSM and ANN methods in a comparative study to produce biodiesel from Sesamum indicum L. oil in the presence of barium hydroxide as a catalyst for the reaction. Based on the results, ANN showed better prediction capability than RSM.
In a study by Aghbashlo et al. [31], the ELM method with wavelet transform (WT) algorithm was applied to predict the performance of diesel engine running on diesel/biodiesel blends from the viewpoint of exergetic theory. They compared the performance of ELM-WT, ELM, genetic programming (GP), and ANN models for the exergetic modeling of the engine.
Nowadays, using hybrid methods has become popular in modeling and estimating studies. These methods combine more than one classifier, estimator, or uses one for estimating and the other for optimizing. This helps to improve the current system performance and accuracy to get better results in the biodiesel production process or to optimize the production process. Based on the developed studies, the hybrid approach performs better than the single approach. Srivastava et al. [32] employed a hybrid optimization method using RSM, ANN, and GA to optimize the biodiesel production during supercritical methanol transesterification. Saeidi et al. [33] used a modified data envelopment analysis (MDEA), as a hybrid of data envelopment analysis (DEA) with ANN to design a multi-response problem, which was validated by RSM to optimize the biodiesel production from waste cooking palm oil. Sajjadi et al. [34] used ANFIS-PSO, ANFIS-GA and ANFIS-DE methods to estimate the effect of ultrasound and mechanically stirring systems on biodiesel synthesis in the transesterification process.
The present study follows from our previous study in reference [35]. The main point of the present study is to the use ELM and SVR methods to estimate the yield of methyl and ethyl esters through a trans-esterification process from WCO based on American Society for Testing and Materials (ASTM) standards and link them with an RSM to optimize the process (as a hybrid ELM-RSM and ELM-SVR) to provide a novel hybrid approach. To our best knowledge, this is the first report on biodiesel production with these approaches. The results of this study can help achieve efficient production and develop automatic devices and processes for the production of biodiesel. In reference [35], the main platform of modeling was to employ the single ANN method, RSM, and hybrid ANFIS methods to estimate the yield and cost of biodiesel production. The present study completes the previous study by developing hybrid ELM-RSM and ELM-SVR methods that enable us to focus on both prediction and optimization of the complex production system (biodiesel production system).

Biodiesel Production and Properties Measuring
In the present study, the biodiesel was produced from WCO during a transesterification process in the presence of a NaOH catalyst using ethanol and methanol. Figure 1 shows the production stages graphically.
WCO is a low cost material for biodiesel production [36,37]. Biodiesel production yield (BPY) was measured in various conditions. Reaction temperature, alcohol type, molar ratio of alcohol to oil, amount of catalyst, mixing intensity, and reaction time were considered as independent variables of the tests (input variables of the model). Based on the definition, the biodiesel production efficiency is equal to the ratio of produced biodiesel weight to the initial weight of the oil [38][39][40][41]. The data of the biodiesel production and testing variables are tabulated on Table 1.

Extreme Learning Machine
Extreme learning machines (ELMs) have become an interesting topic for research in recent years. These methods were first introduced by Huang et al. [42]. The ELM model operates on generalized single-hidden-layer feed-forward networks (SLFNs) based on structural risk minimization principles. In the ELM, the hidden layer does not need to be set and the functions of this layer, which is a feature transfer to the new space, are already specified. Support vector machine (SVM), radial basis function (RBF), and single layer feed forward models are special cases of this model. Figure 2 presents the architecture of the ELM model [43].

Extreme Learning Machine
Extreme learning machines (ELMs) have become an interesting topic for research in recent years. These methods were first introduced by Huang et al. [42]. The ELM model operates on generalized single-hidden-layer feed-forward networks (SLFNs) based on structural risk minimization principles. In the ELM, the hidden layer does not need to be set and the functions of this layer, which is a feature transfer to the new space, are already specified. Support vector machine (SVM), radial basis function (RBF), and single layer feed forward models are special cases of this model. Figure 2 presents the architecture of the ELM model [43]. (1) The ELM model can provide an integrated template for a variety of types of broad features that can be used in a hidden layer, which can be directly used in multi-part and regression categorizations [31]. Practical results show that this model has better scalability, for regression and two-class categorization, it has similar generalizability, and for multi-class categorization, it has better generalizable and a much higher computation rate than the conventional SVM and LS-SVM methods [43].
ELM uses the same kernel as SVM, such as the linear kernel and the RBF kernel.
Linear kernel (core function): k (xi, xj) = xi.xj; RBF kernel (core function): Recent research showed that proximal support vector machine (PSVM), least square support vector machine (LS-SVM), and regularization algorithms can be further simplified and placed in an integrated format called ELM.
This model aims to provide an integrated model that incorporates all of the above-mentioned methods not initially referred to as ELMs. The ELM model was originally proposed for SLFNs and then was extended to the generalized SLFNs.
• wi presents the weight vector between input and hidden nodes; • βi presents the weight vector between output and hidden nodes; • bi presents the threshold of the ith hidden node.
Unlike the usual learning methods, the ELM not only tries to minimize the error of the training data, it also tries to minimize the norm of the weights of the output [44]. According to the Barlet theory for SLFNs, a soft weight loss along with the reduction of the training error is better for In this case, the researchers have focused on several main goals: (1) The ELM model can provide an integrated template for a variety of types of broad features that can be used in a hidden layer, which can be directly used in multi-part and regression categorizations [31]. Practical results show that this model has better scalability, for regression and two-class categorization, it has similar generalizability, and for multi-class categorization, it has better generalizable and a much higher computation rate than the conventional SVM and LS-SVM methods [43].
ELM uses the same kernel as SVM, such as the linear kernel and the RBF kernel. Linear kernel (core function): Recent research showed that proximal support vector machine (PSVM), least square support vector machine (LS-SVM), and regularization algorithms can be further simplified and placed in an integrated format called ELM.
This model aims to provide an integrated model that incorporates all of the above-mentioned methods not initially referred to as ELMs. The ELM model was originally proposed for SLFNs and then was extended to the generalized SLFNs. (1) estimates the SLFNs with k hidden nodes and h(x) activation function.
• w i presents the weight vector between input and hidden nodes; • β i presents the weight vector between output and hidden nodes; • b i presents the threshold of the ith hidden node.
Unlike the usual learning methods, the ELM not only tries to minimize the error of the training data, it also tries to minimize the norm of the weights of the output [44]. According to the Barlet theory for SLFNs, a soft weight loss along with the reduction of the training error is better for generalizability [45]. Therefore, the ELM target function that attempts to reduce the training error and the weights norm of the output is as Equation (2): where: • H is the hidden layer output matrix on the neural network.
The output weights can be calculated using the following equation: • H † is the inverse of the hidden layer output matrix H. More details about the ELM are available in references [42,43]. The user-defined function was obtained using a trial and error method. The parameters of this function are given in Table 2.

Support Vector Machine
SVM is considered to be a soft computing method, which i based on statistical learning theory with a wide application in many fields of science for classification and regression problems [46,47]. One of the advantages of this method is that the SVM reduces the generalized upper bound error rather than the local training error which makes it more particular compared to other machine learning methods.
For x i input data and y i target data with data size of n ({xi, di} n i ), Equations (7) and (8) give the SVM estimation: is defined as the high dimensional space feature that mapped the input space vector x; • w is defined as the normal vector; L(x i , y i ) is defined as the empirical error. The parameters w and b can be calculated by minimizing the R SVMs for a regular couple of (w, ξ * ).
• C is defined as positive regularization parameter for minimizing the empirical error; • ξ i , ξ * i are the penalties of the training error by the loss function; • ε is the chosen error tolerance.
In natural processes, the predictor variables (from the input space) and objective variables (from output space) have a nonlinear relation. Mapping the input space onto the feature space using a kernel function can solve this limitation. Lagrange multipliers solve Equation (10). Any function satisfying the Mercer condition qualifies as a kernel function. The generalization ability is affected by the kernel function. Linear, sigmoid, polynomial, and RBFs are four basic kernel functions. Among these four, the RBF is the simplest, and most efficient and adaptive function. The RBF kernel function is as follows: • (x i , x j ) are the vectors of the input space; • γ is the regularization parameter.
The accuracy and performance of the RBF kernel function depends on γ, C, and ε. In the present study, these parameters are determined during a trial and error method as 0.35, 1000 and 0.05 respectively for γ, C, and ε.

Evaluation of Developed Models
The performance of each model is calculated by comparing the outputs of the models and the target values. Comparisons were performed by employing mean absolute error (MAE), root mean square error (RMSE), and r.
where A is the target value, P is the predicted value, and N is the number of data points. MAE and RMSE calculate the difference of the target and predicted values and r is a measure of the linearity between target and predicted values [20,21].
Based on Table 1, the results of the experimental approach show that the high production yield for ethyl and methyl esters were 93.5% and 95.5%, respectively, which was accrued at a temperature of 70 • C, a process time of 90 min, an A/O of 6:1, a catalyst value of 1 wt. %, and a mixing intensity of 600 rpm. Selecting the input variables is one of the most important issues when developing prediction models. Tables 3 and 4 present the variance analysis of biodiesel production data for ethanol and methanol, respectively. The effect of all parameters on biodiesel production for both alcohol types were significant effects at probability level of 1%, therefore all parameters were effective in biodiesel production and were involved. Using them can provide a good prediction, therefore it can be presented as a model with one output and five inputs, as given in Equation (14). Therefore, data were divided into two categories: input data (including temperature ( • C), time (min), A/O, catalyst (wt. %), and mixing intensity) and output data (methyl ester/ethyl ester yield (%)). Accordingly, due to the correlation of input variables and output variables, it was possible to model the output variables against the input variables. In the following, the results of modeling are presented.

The Results of Modeling
In this section, the results of modeling using ELM and SVM methods are presented and compared with the results of a study previously developed by Najafi et al. [35]. Table 5 reports the results of the training process for the best predictor of both ELM and SVM methods. The training process was performed by choosing the best predictor and accordingly choosing the parameters of the user-defined functions. The training process was performed using 75% of data. The models were developed using MATLAB Software R2012a (MathWork Inc., New York, NY, USA). Based on Table 5, the best ELM-based predictor, which was obtained using 15 neurons on a hidden layer and a sigmoid function as an activation function, provided a correlation of 0.9815 and 0.9863 and RMSE of 1.78 and 1.7 for estimating methyl and ethyl esters, respectively; on the other hand, in the case of using the SVM methodology, the best predictor was obtained by using the RBF kernel with parameters of 1000, 0.05, and 0.35 for C, ε, and γ, respectively, with high correlation coefficients of 0.9656 and 0.9769 and low RMSEs of 2.407 and 2.148 for methyl and ethyl esters, respectively. As is clear from Table 5, the ability of the developed models in estimating ethyl ester was a little more than predicting methyl ester (based off high correlation values of estimating ethyl ester and low RMSE values for ELM, SVM, MLP, RBF, and ANFIS).
However, in the case of comparing models in the training stage, the ability of the ELM method to estimate both of the methyl and ethyl esters was significantly higher than that of the SVM method, which was developed in the present study (with a higher correlation coefficient of 0.9815 and 0.9863 and lower RMSE values of 1.78 and 1.7 for methyl and ethyl esters, respectively), and MLP, RBF and ANFIS techniques, which were developed by Najafi et al. [35].
In the case of testing the developed methods using the other 25% of the data, the results were a little different compared to the training stage (Table 6). More specifically, the ability of SVM in the testing stage for estimating methyl ester production was higher than that of ethyl ester production, unlike the training stage (with correlation coefficients of 0.915 and 0.8945 and RMSE values of 2.805 and 2.105 for methyl and ethyl esters, respectively). On the other hand, it can be seen that the ability of each predictor has been reduced in the testing stage compared to the training stage. To see the reduction of the modeling capability quantitatively, the factor of comparison of training and testing capability (CTTC) was employed (Equation (15)). In fact, this factor indicates the difference percentage of each parameter in testing and training stages. Based on these two factors, the CTTC r indicates the difference percentage for the correlation coefficient in the testing stage compared to the training stage where this parameter describes the reduction or increase of modeling performance of the testing stage compared to the training stage: where r train is the correlation coefficient of the training stage and r test is the correlation coefficient of the testing stage. As is clear from Table 6, both ELM and SVM had a reduction in modeling performance as seen by the negative sign for CTTC r . In the ELM model, the reduction of modeling performance in the case of methyl ester production was higher than that of ethyl ester production (−5.2% and −2.9% for methyl and ethyl esters, respectively), but in the case of the SVM model, the reduction of the modeling performance in the case of ethyl ester production was higher than that of methyl ester production (−5.2 and −8.4 for methyl and ethyl esters, respectively). Figure 3 also indicates the scatter plot of the testing stage for ELM and SVM for the production of methyl and ethyl esters. Based on Figure 3, it is clear that the ability of ELM in the case of ethyl ester production was higher than that of methyl ester production (with R 2 values of 0.9179 and 0.8649 for methyl and ethyl esters, respectively) but in the case of the SVM method, the result is the opposite such that the ability of the SVM in the case of

Hybrid Modeling
This section describes a novel hybrid method by merging RSM, SVM, and ELM methods (i.e., RSM-SVM and RSM-ELM). In statistics, RSM provides the relationships among several explanatory variables and response variables.  The general logic of this model was that the inputs were imported to both RSM and ELM or SVM predictor simultaneously, then the output of the predictor was imported to the RSM method as the response of the system. Finally, the RSM method exports the desired results.

Hybrid Modeling
This section describes a novel hybrid method by merging RSM, SVM, and ELM methods (i.e., RSM-SVM and RSM-ELM). In statistics, RSM provides the relationships among several explanatory variables and response variables.

Hybrid Modeling
This section describes a novel hybrid method by merging RSM, SVM, and ELM methods (i.e., RSM-SVM and RSM-ELM). In statistics, RSM provides the relationships among several explanatory variables and response variables.  The general logic of this model was that the inputs were imported to both RSM and ELM or SVM predictor simultaneously, then the output of the predictor was imported to the RSM method as the response of the system. Finally, the RSM method exports the desired results. The general logic of this model was that the inputs were imported to both RSM and ELM or SVM predictor simultaneously, then the output of the predictor was imported to the RSM method as the response of the system. Finally, the RSM method exports the desired results.
After preparing the model, data were imported to the model and the output of the model were extracted. In this stage, the mode was employed as a predictor model. The results are presented in Table 7. Using a hybrid method improved and increased the system prediction efficiency compared to the testing stage of each predictor (Table 6). In this case, the RSM-ELM method had a high prediction capability compared to that of the RSM-SVM method.
After ensuring the accurate operation of the developed hybrid methods, it was the time for entering the optimization phase. The optimization phase was performed in three stages, one with experimental data, a second with data generated using the ELM method, and the third with data generated using the SVM method. All three phases were performed to maximize the methyl or ethyl esters production yield. Figure 5 presents the optimization conditions of methyl ester production in the case of importing experimental data. This model provided the maximum methyl ester yield of 96.68% with a reaction time of 90 min, temperature of 67.66 • C, A/O of 6.09, catalyst of 1.1 wt. %, and mixing intensity of 712.7 rev/min.  entering the optimization phase. The optimization phase was performed in three stages, one with experimental data, a second with data generated using the ELM method, and the third with data generated using the SVM method. All three phases were performed to maximize the methyl or ethyl esters production yield. Figure 5 presents the optimization conditions of methyl ester production in the case of importing experimental data. This model provided the maximum methyl ester yield of 96.68% with a reaction time of 90 min, temperature of 67.66 °C, A/O of 6.09, catalyst of 1.1 wt. %, and mixing intensity of 712.7 rev/min.            The total results are collected in Table 8 with separated inputs and outputs. Based on Table 8, using the RSM and RSM-ELM provided a high production yield of methyl and ethyl esters compared to experimental condition, while in case of production of methyl ester, RSM-SVM provided the lowest methyl ester production yield. The maximum methyl ester production was related to the use of the RSM-ELM method with a yield of 96.86%, and maximum ethyl ester production yield was also related to the RSM-ELM method with a yield of 98.46%. Using the RSM-ELM method reduced the reaction temperature to 67.62 °C and 68.48 °C for methyl and ethyl esters, respectively, which reduced the required thermal energy for production, but increased other parameters including A/O, catalyst, and mixing intensity.   The total results are collected in Table 8 with separated inputs and outputs. Based on Table 8, using the RSM and RSM-ELM provided a high production yield of methyl and ethyl esters compared to experimental condition, while in case of production of methyl ester, RSM-SVM provided the lowest methyl ester production yield. The maximum methyl ester production was related to the use of the RSM-ELM method with a yield of 96.86%, and maximum ethyl ester production yield was also related to the RSM-ELM method with a yield of 98.46%. Using the RSM-ELM method reduced the reaction temperature to 67.62 • C and 68.48 • C for methyl and ethyl esters, respectively, which reduced the required thermal energy for production, but increased other parameters including A/O, catalyst, and mixing intensity.

Conclusions
The present study employed novel hybrid ELM-RSM and SVR-RSM techniques to obtain a maximum yield of methyl and ethyl esters through a trans-esterification process from WCO based on ASTM standards and also to explain the importance of hybrid approaches from the viewpoint of accuracy and functionality. Given that the biodiesel production process is a time-consuming and costly process, any trial and error in the production process will be detrimental to the manufacturer. Using the policy in this study can reduce additional errors and in fact increase production efficiency. This study can also encourage interested scholars to study further in the field of using hybrid methods for the optimization of processes.
Input variables were temperature ( • C), time (min), A/O, catalyst (wt. %), and mixing intensity (rev/min), and output variables were the production yield (%) of ethyl or methyl esters. Based on the results, the ability of both models (ELM and SVR) in estimating ethyl ester was a little more than that for the prediction of methyl ester, where this ability in the case of using the SVM methodology was more sensible. Using the hybrid method improved and increased the optimization of the system efficiency. Based on the results of the optimizing stage, the RSM-ELM technique could increase the methyl and ethyl esters production from 93.5% (in case of using experimental data) to 96.86% and from 95.5% (in case of using experimental data) to 98.46%, respectively.
In general, using the ELM method provided a high accuracy for estimation of the production yield and employing the RSM-ELM method increased the production yield, and the increase of the ethyl ester production was more sensible compared to that of the methyl ester production. Due to the time-consuming nature of the optimization process in using different methods, this study could provide a new approach for the optimum production of the ethyl or methyl esters for researchers. Therefore, by using the results of this study, there is no need for trial and error in modeling and optimizing the production process in different ways.