Optimizing Biodiesel Production from Waste Cooking Oil Using Genetic Algorithm-Based Support Vector Machines

The ever increasing fuel demands and the limitations of oil reserves have motivated research of renewable and sustainable energy resources to replace, even partially, fossil fuels, which are having a serious environmental impact on global warming and climate change, excessive greenhouse emissions and deforestation. For this reason, an alternative, renewable and biodegradable combustible like biodiesel is necessary. For this purpose, waste cooking oil is a potential replacement for vegetable oils in the production of biodiesel. Direct transesterification of vegetable oils was undertaken to synthesize the biodiesel. Several variables controlled the process. The alkaline catalyst that is used, typically sodium hydroxide (NaOH) or potassium hydroxide (KOH), increases the solubility and speeds up the reaction. Therefore, the methodology that this study suggests for improving the biodiesel production is based on computing techniques for prediction and optimization of these process dimensions. The method builds and selects a group of regression models that predict several properties of biodiesel samples (viscosity turbidity, density, high heating value and yield) based on various attributes of the transesterification process (dosage of catalyst, molar ratio, mixing speed, mixing time, temperature, humidity and impurities). In order to develop it, a Box-Behnken type of Design of Experiment (DoE) was designed that considered the variables that were previously mentioned. Then, using this DoE, biodiesel production features were decided by conducting lab experiments to complete a dataset with real production properties. Subsequently, using this dataset, a group of regression models—linear regression and support vector machines (using linear kernel, polynomial kernel and radial basic function kernel)—were constructed to predict the studied properties of biodiesel and to obtain a better understanding of the process. Finally, several biodiesel optimization scenarios were reached through the application of genetic algorithms to the regression models obtained with greater precision. In this way, it was possible to identify the best combinations of variables, both independent and dependent. These scenarios were based mainly on a desire to improve the biodiesel yield by obtaining a higher heating value, while decreasing the viscosity, density and turbidity. These conditions were achieved when the dosage of catalyst was approximately 1 wt %.


Introduction
With the never ending increase in world fuel demand, biodiesel has emerged in recent decades as one of the main alternatives to petroleum diesel [1].Biodiesel can be used in conventional diesel engines.Engine performance is comparable to that that provided by petroleum.No changes to fuel handling or the delivery systems are required.Diesel fuel is known as a substitute for, or an additive to, diesel fuel.Nevertheless, it is normally produced from fats and oils of plants and animals [2,3].In addition, waste cooking oil is a potential sustainable alternative for vegetable oils, as it solves the disposal problem and reduces the cost of raw material [4,5].A vast quantity of waste cooking oil is generated annually.The methods by which it is disposed of pose a problem, and may result in contamination of water in the environment.Thus, using waste cooking oil to produce biodiesel is an excellent way to use it economically, efficiently, and in an eco-friendly manner [6].
Biodiesel is produced from vegetable oils by direct transesterification.Triglycerides react with short-chain alcohols in the process.An alkaline catalyst, usually sodium hydroxide (NaOH) or potassium hydroxide (KOH) is used in the process to improve the solubility and hasten the reaction.(KOH) [7][8][9]; this increases the solubility and speed of the reaction.
The common method to produce biodiesel is by transesterification.In this method, according to stoichiometry, one mole of triglyceride and three moles of alcohol react in the presence of Sodium Hydroxide (NaOH) to produce a mixture of glycol and fatty acid alkali esters (biodiesel) [10,11] (Figure 1).This transesterification process is influenced by several variables that can be optimized.However, expensive and time-consuming laboratory tests are required to optimize those variables in an experimental fashion.The biodiesel yield is affected by such variables as process temperature, type and dosage of catalyst, agitation time and speed, water content and impurities [12][13][14].Also, the type and quantity of products that are created during frying affect the biodiesel properties and transesterification reaction.For example, the methyl ester yield is affected by the water that is contained in waste cooking oil; this aids in the saponification reaction [15][16][17].Pretreating the line prior to transesterification is necessary to remove the undesirable compounds that waste cooking oil contains.
Transesterification of waste cooking oil was involved in the present work.The process used NaOH as a base catalyst.Soft computing and machine learning formed the basis of the optimization process; these are useful in solving problems in engineering and optimization [18,19].Biodiesel production is influenced by several parameters.Therefore, the determination of the optimum values of these parameters is crucial for biodiesel production performance and scaling up.Mathematical and statistical-based models can provide vital information for the understanding, analysis and prediction of transesterification processes and are required for the optimization of parameters of these processes to improve biodiesel properties.Modelling and optimizing biodiesel production processes will contribute to a greater understanding of the transesterification process features like yield and production rate [20].
A successful optimization study for biodiesel manufacturing process could assist manufacturers in the future development of mass production facilities.Traditionally, modelling and optimizing biodiesel have been carried out using Design of Experiments (DoE) [21] and optimization techniques [22][23][24][25].These approaches have been used extensively and their concepts and limitations are well known.For example, the factorial DoE has been shown to be unappealing, since it is time consuming, resource demanding and requires intensive work when the numbers of input factors are increased [26].
To date, most of the methods used in studies to optimize biodiesel production process conditions are based on Response Surface Methodology (RSM) and Artificial Neural Networks (ANN).However, regression models that have been based on Support Vector Machines (SVM) have also been implanted in recent years.They have been used for the study of the study biodiesel production process conditions and alcoholysis reactions [27].Following this approach, some researchers modeled and optimized various factors that affect the transesterification process by using regression models that are based on data mining techniques.For example, the molar ratio, reaction temperature and amount of catalyst in biodiesel production were examined in Moradi, et al. [28] as operational conditions with the biodiesel yield estimated with the use of ANN.Fernandez et al. [29] used Genetic Algorithm (GA) and ANN to optimize biodiesel production process parameters.Both works used experimental data as the basis for modelling and optimizing.Corral et al. [23] employed RSM to optimize the biodiesel from waste cooking oil.Mumtaz et al. [24] used RSM to optimize biodiesel that was produced from rice bran and sunflower oil.RSM was used by Mansourpoor and Shariati [25] to optimize biodiesel from sunflower oil.Yuan et al. [21] examined the production of biodiesel from waste rapeseed oil that had a high level of free fatty acids for use as feedstock for biodiesel production.They optimize with RSM the conditions necessary for maximum conversion to biodiesel.They also sought to understand the importance of factors that are relevant to biodiesel production and their interaction.
In addition, many studies have proposed machine learning model approaches, such as ANN or SVM, as alternative methods to solve complex and ill-defined problems [30].ANN has become popular due to its usefulness in prediction capability, even with small databases.This methodology could be employed in simulators to control and optimize processes.For example, Moradi et al. [28] used ANN to obtain the optimal combination of inputs or process variables in biodiesel yield.Yuste and Dorado developed an ANN model to simulate biodiesel production by the transesterification of used frying olive oil [18].In addition to proposing the use of ANN for predicting aspects of biodiesel production, various studies have presented ANN combined with an optimization technique for different optimization purposes.For example, Rajendra et al. [29] used hybrid ANN-GA to optimize biodiesel production.In this case, the input parameters for the ANN to generalize the pretreatment process were catalyst concentration, methanol to oil molar ratio, the initial acid value of vegetable oil and reaction time.The output parameter was the final acid value of biodiesel.Betiku and Taiwo [31] used RSM and ANN to optimize bioethanol production using breadfruit as a potential substrate.Mathematical models were also employed in forecasting the bioethanol yield.More recently, Betiku et al. [2] used ANN, combined with GA and RSM, to model and optimize the process parameters of biodiesel production from the Shea tree (Vitellaria paradoxa).
However, although extensively used, both ANN and GA have deficiencies and other techniques can improve the results.Thus, more advances in the hybridization of prediction and optimization algorithms are still required to optimize the transesterification reaction of biodiesel.In one study, Kusumu et al. [27] investigated the variables that affect the acid-catalyzed transesterification process of Ceiba Pentandra oil, using the SVM approach to optimize five aspects of transesterification (catalyst weight, the molar ratio of methanol to oil, reaction temperature, agitation speed, and reaction time) in order to achieve a high yield, and reduce production costs.
The literature contains numerous studies concerning the production of biodiesel using waste cooking oil.However, there are few studies concerning the use of SVM models to model mathematically transesterification process conditions and subsequent optimization using GA.For this reason, the present work examines the effectiveness of using regression techniques that are based on SVM to find and identify the best combination of transesterification process characteristics when waste cooking is used for various biodiesel production optimization scenarios.In this work, process temperature, mixing time, mixing speed, impurities and humidity of waste cooking oil and dosage of catalyst, were considered to be input features.Density, viscosity, turbidity, high heating value, and yield were considered to be outputs features.(Figure 1).
Based on the dataset obtained from the proposed biodiesel production process, SVM regression techniques were applied with different kernels to model the final biodiesel properties features that were selected.Also, a comparison of ordinary Linear Regression (LR) and the previous nonlinear technique was made to determine the influence on non-linearity.Training of these regression models used the data that were obtained by following the DoE that was proposed and used repeated cross-validation [32,33].Next, the proposed models were tested to determine their degree of generalization.The additional data that were selected at random complete the range of possibilities.

Materials
Waste domestic cooking oils were collected from local restaurants to provide raw material for the production of biodiesel by transesterification using NaOH as the catalyst.The synthesis used the following methanol 98% (GR for analysis, Merck, Darmstadt, Germany) and NaOH (GR for analysis, Merck, Darmstadt, Germany) as reagents.

Design of Experiments and Design Matrix
A DoE is an experimentation method that is structured and systematized and in which all factors vary simultaneously throughout a set of experimental runs.This enables the determination of the relationship between factors that affect dependent responses of the process.DoE [34] is used to minimize the number of experiments required and to obtain sufficient detail to support a hypothesis.It is generally hypothesized that there are a number of variables that decide the value of responses (outputs) that have a differentiable function.These include independent and controllable variables (inputs or design factors) and uncontrollable variables (noise factors).The method that is proposed for development of the DoE entails constructing a design matrix (factor and levels) and measuring experimentally the responses [35].The input features in this study, and the experimentally selected optimization factors that were used were the following: the quantity of the NaOH catalyst (C), the methanol/oil molar ratio (M), the temperature of the reaction (T), the humidity (H), the stirring speed (S), time (t), and impurities (I).The outputs that were examined were: turbidity (Turb), yield (η), viscosity (µ), density (ρ), and high heating value (HHV).On the basis of the features that were selected, the DoE (Table 1) was undertaken with a Box-Behnken design [36].The latter consists of a fractional, three-level design that requires fewer experiments than a full three-level design.The reduction in the number of experiments saves time and expense.In addition, it lessens the difficultly in preparing the initial samples for each experiment.
The DoE that is defined in Table 1 indicates that a design matrix was generated with 56 experiments and their corresponding factors and levels (Table 2).The variable levels that appear in Table 1 address the intervals that commonly appear in the literature in regards to biodiesel production [37][38][39][40][41].

Materials
Waste domestic cooking oils were collected from local restaurants to provide raw material for the production of biodiesel by transesterification using NaOH as the catalyst.The synthesis used the following methanol 98% (GR for analysis, Merck, Darmstadt, Germany) and NaOH (GR for analysis, Merck, Darmstadt, Germany) as reagents.

Design of Experiments and Design Matrix
A DoE is an experimentation method that is structured and systematized and in which all factors vary simultaneously throughout a set of experimental runs.This enables the determination of the relationship between factors that affect dependent responses of the process.DoE [34] is used to minimize the number of experiments required and to obtain sufficient detail to support a hypothesis.It is generally hypothesized that there are a number of variables that decide the value of responses (outputs) that have a differentiable function.These include independent and controllable variables (inputs or design factors) and uncontrollable variables (noise factors).The method that is proposed for development of the DoE entails constructing a design matrix (factor and levels) and measuring experimentally the responses [35].The input features in this study, and the experimentally selected optimization factors that were used were the following: the quantity of the NaOH catalyst (C), the methanol/oil molar ratio (M), the temperature of the reaction (T), the humidity (H), the stirring speed (S), time (t), and impurities (I).The outputs that were examined were: turbidity (Turb), yield (η), viscosity (µ), density (ρ), and high heating value (HHV).On the basis of the features that were selected, the DoE (Table 1) was undertaken with a Box-Behnken design [36].The latter consists of a fractional, three-level design that requires fewer experiments than a full three-level design.The reduction in the number of experiments saves time and expense.In addition, it lessens the difficultly in preparing the initial samples for each experiment.
The DoE that is defined in Table 1 indicates that a design matrix was generated with 56 experiments and their corresponding factors and levels (Table 2).The variable levels that appear in Table 1 address the intervals that commonly appear in the literature in regards to biodiesel production [37][38][39][40][41].

Experimental Procedure
To remove insoluble impurities, the waste cooking oils that were collected were filtered.Then, they were heated to 100 • C to remove moisture.Experiments on a laboratory scale setup were conducted.The oil was heated to the necessary temperature by a water bath.A magnetic hot plate stirrer that had a temperature controller was used.A constant stirring speed was maintained during each experiment.Then, NaOH and methanol were added.When the reaction had reached the predetermined time, heating and stirring were ended.The biodiesel that was produced was transferred into glass containers and later analyzed to determine the ρ, µ, Turb, HHV and η.
In addition, the following variables were measured in the final biodiesel product: Turbidity by a 2100Q Turbidimeter (HACH, Loveland, CO, USA) and kinematic viscosity by a Cannon-Fenske viscometers (Cannon Instrument Co., State College, PA, USA) at 40 • C per the standard American Society of Testing Materials (ASTM) D445 method [43].Density was established by a pycnometer following ASTM D941 [44].The biodiesel HHV was determined by a PARR 1351 bomb calorimeter following the ASTM D240 standard method [45].

Support Vector Machines
SVM is one of the numerous nonlinear regression techniques.Studied intensively, it is applied for its use as a universal approximation [46][47][48].A kernel-based algorithm is the basis of this method.SVM has sparse solutions.Its predictions for new inputs depend on the kernel function evaluation at a subcategory of occurrences during a training stage.The objective of this method is to find a function to minimize the final error in Equation (2).
where y(x) is the predicted value, w is the vector with parameters that define the model, b is the value of the bias and φ(x) fixes the feature-space transformation.In this method, the error function that appears in the simple linear regression (Equation ( 3)) is replaced by an insensitive error function (Equation ( 4)).The latter assigns a zero to values when exceeds the difference between the target and the predicted value.If the difference is not less than , the error function maintains its value.(The foregoing appears to be a contradiction.)In order to minimize Equation ( 5), a cost (C) is also assigned to the difference between the target and predicted values.
where y(x) is the value that Equation (2) predicts, t is the searched target function, is the margin where the function does not penalize, and C is the penalty.The process is optimized, but the initial function (Equation ( 2)) increases in complexity (Equation ( 6)).
where α is one solution for the optimization problem that Lagrangian Theory makes possible.The data is transformed by the function to a higher dimensional feature space.This increases the accuracy of the nonlinear problem.Thus, the final function resembles Equation (7).
To program the methodology that was proposed, generate the design matrix and develop the regression models, R statistical software was selected [20].
To undertake this part of the method, the SVM models were trained by using 50 times repeated 10fold cross-validation.Their calculation times were not long.This made possible the use of the entire training dataset that came from the DoE, and was formed by 56 entries to create the models.However, it was necessary to first divide the initial database into 10 subsets, using nine subsets to build the model and using the tenth subset to calculate the error.Other errors were obtained when the procedure was repeated an additional 50 times.Finally, the arithmetic mean of all errors of the process was calculated.In addition, the various algorithms were trained, significant parameters for each algorithm were tuned, and their values that produced the best predictive performance were identified.The accuracy of the predictions were judged by means of the Root Mean Square Error (RMSE) and the correlation of real and predicted values.
Finally, after the most accurate models were built, selected and trained, they were tested in the laboratory in nine experiments and with new samples, to determine their real degree of generalization.

Genetic Algorithms for Optimization of Biodiesel Production Features
Regression models that had the best generalization capacity were selected to identify the best combinations of transesterification attributes that would yield desirable biodiesel production properties.The intention was to study and optimize nine different scenarios in biodiesel production from waste cooking oil.These were: maximizing η, minimizing Turb, minimizing ρ and µ, maximizing the HHV, minimizing C, minimizing S, minimizing t and minimizing M. The search for the best combination used GA-based evolutionary optimization techniques.Using these techniques for the optimization of industrial processes and devices [49][50][51][52][53][54] has been proven previously in the literature.GA conducts optimization with a population of several individuals in a way that resembles the behavior of biological process.The resulting solution approximates in general the global minimum of all possibilities [55].The optimization using GA involves the six following steps, (see Figure 2), [56,57].a new offspring.These portions have been selected on the basis of two positions and two longitudes.The majority (60%) of the new population consists of new offspring who were created by crossovers from selected parents and h the crossovers consisting of a change in a random number of bits of chromosomes of, alternatively, two randomly selected best individuals.(6) Mutation: Several individuals were selected from the 25% who had been chosen and the 60% who had been, were created by crossovers.This was done on the basis of a uniform probability and a random element of the chromosome that was flipped to create a new individual.Mutation is responsible for 15% of the new population.
The optimization process that provides in each scenario the best combination of transesterification attributes was the following: First, 1000 individuals transesterification parameters (M, C, T, S, t, H, I) or combinations thereof were randomly generated corresponding to the range that appears in Table 1.These first individuals or combinations formed the initial generation, generation 0. Based on these individuals, the variables that define biodiesel properties were predicted by the regression models that had the best generalization capacity in each case (SVM or LR models).Based on this prediction, an error obtained from a defined fitness function (J 1 ) permitted the selection of the individuals for the subsequent generations.In this case, the fitness function that was developed considered the maximization or minimization of the studied features, and the weights of the different proposed scenarios (Equation ( 11)).
where ω i are the weights that are applied to each component in the fitness function J 1 , and P j max,min defines the objective sought for each of the features.These weights ω i were so defined that the above-mentioned optimization scenarios would be achieved.Table 3 shows the weights ω i of each variable that are part of the fitness function J 1 to optimize the biodiesel production process for the nine different optimization scenarios.
After selecting 25% of the best individuals (i.e., 250) according to the fitness function (J 1 ) and the optimization scenarios that were selected, the crossover process was undertaken.It was first necessary for two selected parents to provide two complete chromosomes.Each chromosome was then coded in binary code.Then, a random number of crossings (1, 2, 3, or 4) was selected.Also, a number was randomly selected for each crossing to define for each crossing the initial bit's position.The longitude of the number of bits in the chromosome's crossing part was selected similarly.This information was used in selecting some of the first parent's bits.The second parent donated the remaining bits to create a new chromosome and generation.Positions 1 and 2 appear in Figure 2. Figure 2 also shows the lengths (longitude 1) and (longitude 2) that were used to cross two individuals from generation "0".Positions 1 and 2 were equal to "10" and "47" respectively.Longitudes 1 and 2 were equal to "9" and "10" respectively.(Values of all positions and longitudes are in bits).Finally, the binary code of new chromosomes was decoded in order to generate new individuals.The latter were formed by parameters that required adjustment to complete 60% of the new population (i.e., generation 1), namely 600 individuals.The remaining individuals (i.e., 15% or 150 individuals) were generated randomly to complete the new generation (i.e., generation 0).After selecting 25% of the best individuals (i.e., 250) according to the fitness function (J1) and the optimization scenarios that were selected, the crossover process was undertaken.It was first necessary for two selected parents to provide two complete chromosomes.Each chromosome was then coded in binary code.Then, a random number of crossings (1, 2, 3, or 4) was selected.Also, a number was randomly selected for each crossing to define for each crossing the initial bit's position.The longitude of the number of bits in the chromosome's crossing part was selected similarly.This information was used in selecting some of the first parent's bits.The second parent donated the remaining bits to create a new chromosome and generation.Positions 1 and 2 appear in Figure 2. Figure 2 also shows the lengths (longitude 1) and (longitude 2) that were used to cross two individuals from generation "0".Positions 1 and 2 were equal to "10" and "47" respectively.Longitudes 1 and 2 were equal to "9" and "10" respectively.(Values of all positions and longitudes are in bits).Finally, the binary code of new chromosomes was decoded in order to generate new individuals.The latter were formed by parameters that required adjustment to complete 60% of the new population (i.e., generation 1), namely 600 individuals.The remaining individuals (i.e., 15% or 150 individuals) were generated randomly to complete the new generation (i.e., generation 0).

Experimental Results
Table 4 shows the results obtained experimentally for the attributes of the transesterification process (ρ, µ, Turb, HHV, η) according to the Box-Behnken DoE design matrix (Table 2).

Experimental Results
Table 4 shows the results obtained experimentally for the attributes of the transesterification process (ρ, µ, Turb, HHV, η) according to the Box-Behnken DoE design matrix (Table 2).

Analysis of the Uncertainty
The analysis of variance (ANOVA) was examined to evaluate the uncertainty in the experimental measurements due to the proposed DoE.The five properties of the biodiesel that were studied experimentally were compared to the seven transesterification process attributes to identify any relationships (Table 5).
The p-values that were obtained had low values for two features, C and M.This indicates that the apparent relationships were statistically significant.Therefore, a p-value of less than 0.1 was considered to denote low uncertainty.Consequently, the model had no predictive usefulness and could be rejected.In this case, C satisfies this condition for each of µ, HHV, and η.

Regresion Models Based on Data Mining Tecniques
After selecting the features of the final data set, the next steps were to train, test and validate prediction models.The construction of regression models that predict aspects of a biodiesel production process borrowed from machine learning techniques that use SVM technique.In addition, LR was used to be able to compare linear and nonlinear techniques.The method in this case was to conduct 56 experiments according to the proposed DoE.The resulting data were normalized to between 0 and 1.The models were trained by using these 56 instances by 50 times repeated 10fold cross-validation.The RMSE and the real and predicted values correlation were obtained during training to permit the models' accuracy to be compared.Meanwhile, each algorithm's most important parameters were tuned to improve the prediction capabilities.For example, Figure 3 gives the RMSE for output feature 'µ' that is training an SVM that has an RBF kernel when parameters, such as cost and σ vary.The most accurate model in this case had a cost equal to 1.4 and an σ equal to 0.2.
Nine new experiments were conducted for the purpose of testing the models with data that had not been used previously during training (Tables 6 and 7).These new data were selected randomly and sought to cover all possibilities of the problem from previous DoE and, also, to avoid overtraining the models.The training and testing stages' results appear in Tables 8-12 (selected models are in are in bold letters).A lower RMSE in training and testing usually implied a higher correlation.As a result, these conditions determined the model selection process.Nevertheless, the experts in some cases selected a model based on results obtained, although also considering other criteria.For example, in the case of HHV and µ, the models that were selected did not have fewer errors.However, some had outliers (Figure 4) that, when removed, improved the predictions of the selected models.This caused their selection.

Optimization of Biodiesel Production Process Based on Genetic Algorithms
Once the models were selected on the basis of their accuracy and relevance to the problem, GA was applied to optimize the process.In this case, the regression models that had the best generalization capacity (SVM with polynomial kernel for Turb and SVM with RBF kernel for η, ρ, µ and HHV) were used to search for the values of the features that define biodiesel production in the hope of finding some fixed biodiesel properties.
Results of this search are shown in Table 13, in which each scenario is indicated by two columns.The first column provides the predicted value of the transesterification process features obtained according to the methodology, after fixing biodiesel properties.The second column provides the experimentally obtained values of biodiesel properties as methodology validation, using the same values that were obtained in the predicted case for the features that control the transesterification process.
For the nine scenarios that were defined by prefixed aim values of the features η, Turb, ρ, µ and HHV (Table 3), objective functions were performed.These objective functions manage the search of the needed values of the features M, C, T, S, t, H and I to produce biodiesel with the final searched prefixed values of biodiesel properties.The search for a suitable combination of values to control the transesterification process to reach the goal biodiesel properties involved applying GA-based evolutionary optimization techniques.This technique enables one to determine the values that are necessary to define the transesterification process and to produce biodiesel with final specific properties.After the values were determined for these nine scenarios, the results obtained with mathematical models were validated with new real values obtained from the laboratory.Experimental samples of biodiesel production were produced that were based on the values of the attributes that defines the transesterification process.Subsequently, the biodiesel properties were measured.In order to validate the method, the relative errors of all scenarios and all properties were calculated.The mean of this error was 4.14%.This indicates that the method can achieve accurate results when the transesterification process is optimized.
For example, in the third scenario, the aim values were the following: η = 100, Turb = 1.9, ρ = 0.81, µ = 7.12 and HHV = 44.79.In order to produce biodiesel with these properties, it was necessary to fix the values for the parameters that define the transesterification process.The following values were adopted for this purpose: M = 7.88, C = 1.18,T = 20, S = 989, t = 34.22,H = 0.95 and I = 0.19.Later, an experiment was conducted to validate the method.In this experiment, the last values were used in the laboratory to produce biodiesel.Then, the biodiesel properties were measured to validate the method.The values in this case were: η = 92, Turb = 1.78, ρ = 0.83, µ = 7.01 and HHV = 42.59, with a relative error between the real values and the predicted values of 3.97%.

Optimization of Biodiesel Production Process Based on Genetic Algorithms
Once the models were selected on the basis of their accuracy and relevance to the problem, GA was applied to optimize the process.In this case, the regression models that had the best generalization capacity (SVM with polynomial kernel for Turb and SVM with RBF kernel for η, ρ, µ and HHV) were used to search for the values of the features that define biodiesel production in the hope of finding some fixed biodiesel properties.
Results of this search are shown in Table 13, in which each scenario is indicated by two columns.The first column provides the predicted value of the transesterification process features obtained according to the methodology, after fixing biodiesel properties.The second column provides the experimentally obtained values of biodiesel properties as methodology validation, using the same values that were obtained in the predicted case for the features that control the transesterification process.
For the nine scenarios that were defined by prefixed aim values of the features η, Turb, ρ, µ and HHV (Table 3), objective functions were performed.These objective functions manage the search of the needed values of the features M, C, T, S, t, H and I to produce biodiesel with the final searched prefixed values of biodiesel properties.The search for a suitable combination of values to control the transesterification process to reach the goal biodiesel properties involved applying GA-based evolutionary optimization techniques.This technique enables one to determine the values that are necessary to define the transesterification process and to produce biodiesel with final specific properties.After the values were determined for these nine scenarios, the results obtained with mathematical models were validated with new real values obtained from the laboratory.Experimental samples of biodiesel production were produced that were based on the values of the attributes that defines the transesterification process.Subsequently, the biodiesel properties were measured.In order to validate the method, the relative errors of all scenarios and all properties were calculated.The mean of this error was 4.14%.This indicates that the method can achieve accurate results when the transesterification process is optimized.
For example, in the third scenario, the aim values were the following: η = 100, Turb = 1.9, ρ = 0.81, µ = 7.12 and HHV = 44.79.In order to produce biodiesel with these properties, it was necessary to fix the values for the parameters that define the transesterification process.The following values were adopted for this purpose: M = 7.88, C = 1.18,T = 20, S = 989, t = 34.22,H = 0.95 and I = 0.19.Later, an experiment was conducted to validate the method.In this experiment, the last values were used in the laboratory to produce biodiesel.Then, the biodiesel properties were measured to validate the method.The values in this case were: η = 92, Turb = 1.78, ρ = 0.83, µ = 7.01 and HHV = 42.59, with a relative error between the real values and the predicted values of 3.97%.

Conclusions
An alternative and useful combustible biodiesel can be derived from vegetable and waste oils.It is compatible with diesel engines as a substitute fuel or when blended with diesel.Thus, the knowledge of the process to produce biodiesel from waste cooking oil by transesterification using NaOH as the catalyst is highly significant.This paper presents a group of regression models based on SVM techniques to predict several biodiesel properties or (viscosity, density, turbidity, HHV and yield) from attributes of the transesterification process (dosage of catalyst, molar ratio, humidity and impurities, mixing speed, temperature, mixing time).Firstly, the experimental data was obtained according to a Box-Behnken DoE to study the effects of the process features on the production of biodiesel.Subsequently, several regression models were constructed using this experimental dataset.This was done to predict the biodiesel properties that were being studied and to obtain greater understanding of the process.The biodiesel property that was best predicted was turbidity (Turb), which presented an RMSE test of 10.43%, whereas the property that presented the greatest error was density (ρ) whose RMSE was 44.91%.The SVM models with Polynomial kernel and with RBF kernel were the regression models that were selected to model the stated biodiesel properties.Finally, evolutionary optimization techniques that are based on GA were used to search for optimum values in nine prefixed scenarios.This optimization showed how to control the transesterification process to manufacture biodiesel that has specific properties.From the results, it is noted that some of the attributes of the transesterification process show a more greatly reduced range than for any of the nine scenarios studied.Thus, for example, the range of values obtained for molar ratio was 6.0 to 8.4; for dosage of catalyst it was 1.00 to 1.38 wt %; and for time it was 20.00 to 26.94 min.In contrast, some of the attributes had a fairly wide range of values that almost encompass the range proposed in the initial DoE.For example, the range of values obtained for the mixing speed was from 500.00 to 999.99 rpm; for the temperature it was 28.75 to 37.5 • C; for the humidity it was 0 to 2.31 wt %; and for impurities it was 0 to 2.99 wt %.Also, the entire process was validated.The final error of 4.14% showed that the methodology was accurate and that the predicted values were very similar to the experimental data.

Figure 1 .
Figure 1.Properties and attributes of the transesterification process in Biodiesel production.

Figure 1 .
Figure 1.Properties and attributes of the transesterification process in Biodiesel production.

( 1 )
Coding/decoding: The information from feature values for each individual is changed to binary codes that join to form chromosomes, and vice versa.Values that are out of the range proposed by DoE is replaced.(2) Population initialization: The initial population is generated by a random set of individuals.(3) Evaluation: The responses of models are evaluated by a fitness function to determinate the individuals that will be part of the next generation.(4) Selection: Sorting of individuals uses the criterion that the fitness function provides.The top 25% of individuals are selected for the next generation.(5) Crossover: Portions of two parents of the current generation have been combined to form

Figure 2 .
Figure 2. Script development in R language to implement the optimization based on GA and details of the crossovers and mutations of the individuals.

Figure 2 .
Figure 2. Script development in R language to implement the optimization based on GA and details of the crossovers and mutations of the individuals.

Figure 3 .
Figure 3. Results of the training period (50 times repeated 10 fold cross-validation) using SVM with a RBF kernel to predict the feature 'µ'.Values of the most accurate configuration: Cost = 1.4 and sigma = 0.2.

Figure 3 .
Figure 3. Results of the training period (50 times repeated 10 fold cross-validation) using SVM with a RBF kernel to predict the feature 'µ'.Values of the most accurate configuration: Cost = 1.4 and sigma = 0.2.

Figure 4 .
Figure 4. Correlation of real and predicted values of HHV during the training stage in two cases: (a) LR and (b) SVM (RBF kernel).

Figure 4 .
Figure 4. Correlation of real and predicted values of HHV during the training stage in two cases: (a) LR and (b) SVM (RBF kernel).

Table 1 .
Factors and levels applicable to the Box-Behnken DoE.

Table 2 .
Design matrix for transesterification process.

Table 3 .
Different proposed scenarios with weights of each component of the fitness function.

Table 4 .
Experimental results based on the Box-Behnken DoE.

Table 5 .
Results of the analysis of the variance of the five output features.

Table 6 .
Test matrix for transesterification process.

Table 7 .
Results of the experiments performed in the testing stage.

Table 6 .
Test matrix for transesterification process.

Table 7 .
Results of the experiments performed in the testing stage.

Table 8 .
Results obtained during the training and testing stages for η.

Table 9 .
Results obtained during training and testing stage for Turb.

Table 10 .
Results obtained during the training and testing stages for ρ.

Table 11 .
Results obtained during the training and testing stages for µ.

Table 12 .
Results obtained during the training and testing stages for HHV.