Statistical Modeling and Optimization of Process Parameters for 2,4-Dichlorophenoxyacetic Acid Removal by Using AC/PDMAEMA Hydrogel Adsorbent: Comparison of Different RSM Designs and ANN Training Methods

Abstract: In this study, response surface methodology (RSM) and artificial neural networks (ANN) were employed to study the adsorption of 2,4-dichlorophenoxyacetic acid (2,4-D) by a modified hydrogel, i.e., activated carbon poly(dimethylaminoethyl methacrylate) (AC/PDMAEMA hydrogel). The effects of pH, the initial concentration of 2,4-D and the activated carbon content on the removal of 2,4-D and the adsorption capacity were investigated through the face-centered composite design (FCCD), optimal design and two-level factorial design. The response surface plots suggested that higher removal of 2,4-D and higher adsorption capacity could be achieved at a higher initial concentration of 2,4-D and at lower pH and activated carbon content. The modeling and optimization of the adsorption process were also carried out with different RSM design methods and different ANN training methods. Among the three RSM design methods, the optimal design had the highest accuracy for the prediction of 2,4-D removal and adsorption capacity (R² = 0.9958 and R² = 0.9998, respectively). The numerical optimization of the optimal design found that the maximum removal of 2,4-D and adsorption capacity of 65.01% and 65.29 mg/g, respectively, were obtained at a pH of 3, an initial 2,4-D concentration of 94.52 mg/L and 2.5 wt% of activated carbon. Apart from the optimization of the process parameters, the neural network architecture was also optimized by trial and error with different numbers of hidden neurons to obtain the best prediction of the responses. The optimization of the neural network was performed with different training methods. Among the three training methods of the ANN model, the Bayesian Regularization method had the highest R² and the lowest mean square error (MSE), with an optimum network architecture of 3:9:2. The optimum condition obtained from RSM was also simulated with the optimized neural network architecture to validate the responses and the adequacy of the RSM model.


Introduction
Recently, water pollution has become one of the most common issues in the world. Agricultural practices, which account for about 70% of surface water usage, are one of the major causes of water pollution due to the use of herbicides and pesticides by farmers [1,2]. Among different types of herbicides, 2,4-dichlorophenoxyacetic acid (2,4-D) [...] developed ANN model are able to predict and optimize the removal of chromium (VI) at various operating conditions with reasonably high accuracy.
Selecting an RSM design is important to get better prediction and optimum results as there are many different designs in RSM. On top of that, selection of architecture in the neural network especially the number of hidden layers or nodes could affect the performance of the process [19,20]. Nevertheless, none of the reported studies have compared the performance of the adsorption process with different RSM designs and ANN training methods. This is because most of the researchers only focused on the predictive capability between the commonly used design methods and training methods of both RSM and ANN. In addition, no information exists with respect to the modeling and optimization of 2,4-D adsorption by using activated carbon poly(dimethylaminoethyl methacrylate) (AC/PDMAEMA hydrogel), except for the study conducted by Taktak et al. [4]. Taktak et al. [4], however, only focused on the prediction and optimization of 2,4-D removal with modified hydrogel by using a face-centered composite design (FCCD) of RSM. It is important to ensure the optimum condition that can maximize the performance of the adsorption process by taking into account the selection of the right RSM design and the ANN training method. Therefore, the key objective of this study was to predict and optimize the 2,4-D removal by modified hydrogel (AC/PDMAEMA hydrogel) using different RSM designs and ANN training methods by employing the experimental data obtained by Taktak et al. [4]. The interaction of independent variables (such as pH, initial concentration of 2,4-D and activated carbon content) toward the removal of 2,4-D and adsorption capacity were also investigated.

Materials and Methods
The experimental data on the batch adsorption of 2,4-D with modified hydrogel by using a face-centered composite design (FCCD) of RSM were obtained from the study conducted by Taktak et al. [4]. The modified hydrogel was prepared using activated carbon (extracted from pomegranate husk), (dimethylamino) ethyl methacrylate (DMAEMA), N, N'-methylenebisacrylamide and ammonium persulfate. In their study, the percentage removal of 2,4-D (%) and adsorption capacity of 2,4-D (mg/g) were taken as the responses and were calculated using the following equations.
$$\text{Removal of 2,4-D (\%)} = \frac{C_0 - C_e}{C_0} \times 100 \tag{1}$$
$$\text{Adsorption capacity (mg/g)} = \frac{(C_0 - C_e)\,V}{m} \tag{2}$$
where $C_0$ and $C_e$ are the initial and equilibrium concentrations of 2,4-D (mg/L), respectively, $m$ is the mass of the hydrogel adsorbent (g), and $V$ is the solution volume (mL).
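As a minimal sketch of the two response definitions above, the calculation can be written as follows. The function names and the sample numbers are hypothetical and are not taken from the experimental data; the volume is assumed to be passed in litres so that mg/L × L / g yields mg/g (a volume recorded in mL would first be divided by 1000).

```python
def removal_percent(c0, ce):
    """Percentage removal of 2,4-D from initial (c0) and equilibrium (ce) concentrations, both in mg/L."""
    return (c0 - ce) / c0 * 100.0


def adsorption_capacity(c0, ce, volume_l, mass_g):
    """Adsorption capacity in mg/g. ASSUMPTION: volume is given in litres."""
    return (c0 - ce) * volume_l / mass_g


# Hypothetical illustration values, not measured data
c0, ce = 100.0, 35.0          # mg/L
volume_l, mass_g = 0.05, 0.05  # L, g

print(removal_percent(c0, ce))
print(adsorption_capacity(c0, ce, volume_l, mass_g))
```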

Response Surface Modeling
In the present study, the adsorption of 2,4-D was studied to determine the optimum condition for the maximum removal of 2,4-D and adsorption capacity. The influence of pH, initial concentration of 2,4-D and activated carbon content on the responses was also investigated through the RSM study. In general, the independent variable was varied at two levels, as shown in Table 1. The range for the independent variables was selected based on the previous study [4]. For the RSM modeling, Design Expert V12.0 (Stat-Ease, Inc., Minneapolis, MN, USA) software was used, and various design techniques were employed, i.e., face-centered composite design (FCCD), optimal design and two-level factorial design. Based on the data inserted in the design matrix, Design Expert V12.0 software generated an empirical model (i.e., quadratic model) as shown below to describe the relationship between the variables and the responses [14].
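$$\hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \varepsilon \tag{3}$$
where $\hat{y}$ is the response or output, $x_i$ and $x_j$ are the input factors, $k$ is the number of factors, $\beta_0$, $\beta_i$, $\beta_{ii}$ and $\beta_{ij}$ are the coefficients for the intercept, linear, quadratic and interaction terms, respectively, and $\varepsilon$ is the residual associated with the experiments [21]. Since the design methods differ in the steps used to design the experiment, the details of each are explained in the following sections.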
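To illustrate how a second-order model of this form is evaluated for three coded factors, a small sketch is given below. The coefficient values are hypothetical placeholders chosen only for illustration; they are not the coefficients fitted by Design Expert in this study.

```python
from itertools import combinations


def quadratic_response(x, b0, linear, quadratic, interaction):
    """Evaluate y_hat = b0 + sum(bi*xi) + sum(bii*xi^2) + sum_{i<j}(bij*xi*xj)."""
    y = b0
    y += sum(bi * xi for bi, xi in zip(linear, x))
    y += sum(bii * xi ** 2 for bii, xi in zip(quadratic, x))
    for i, j in combinations(range(len(x)), 2):
        y += interaction[(i, j)] * x[i] * x[j]
    return y


# Hypothetical coefficients for factors A (pH), B (initial concentration), C (AC content)
coeffs = dict(
    b0=30.0,
    linear=[-10.0, 8.0, -5.0],
    quadratic=[1.0, -2.0, 0.5],
    interaction={(0, 1): -1.5, (0, 2): 0.8, (1, 2): -0.4},
)

y_center = quadratic_response([0, 0, 0], **coeffs)   # all factors at mid level
y_corner = quadratic_response([-1, 1, -1], **coeffs)  # low pH, high conc., low AC
print(y_center, y_corner)
```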

Face-Centered Composite Design (FCCD)
The modeling was carried out by selecting the numeric factors as shown in Table 1. The generated experimental design matrix in the FCCD (with the coded distance (α) of 1) consisted of 20 runs where 8 of the runs were the factorial point, 6 of them were the axial point, and another 6 were the center point.
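The run structure described above (8 factorial, 6 axial and 6 center points for three factors with α = 1) can be reproduced in coded units with a short sketch. The function name is hypothetical, and the run order produced here is not the randomized order that the software would generate.

```python
from itertools import product


def face_centered_ccd(n_factors=3, n_center=6):
    """Build a face-centered central composite design in coded units (alpha = 1)."""
    # Factorial (corner) points: every combination of -1 and +1
    factorial = [list(p) for p in product((-1, 1), repeat=n_factors)]
    # Axial points: one factor at -1 or +1, the others at the center
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(point)
    # Replicated center points
    center = [[0] * n_factors for _ in range(n_center)]
    return factorial, axial, center


factorial, axial, center = face_centered_ccd()
design = factorial + axial + center
print(len(factorial), len(axial), len(center), len(design))
```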

Optimal Design
In this design, the constraint was built by considering the problem vertex and the constraint point for each factor. The problem vertex was selected based on the level of factors that gave low responses. Based on the previous experimental data [4], the responses were low when a pH of 9, 20 wt% activated carbon and 20 mg/L initial concentration of 2,4-D were used. Hence, these values were chosen as the problem vertex to be excluded from the design space. Based on the problem vertexes, the constraint points for each factor were chosen by considering the point that would be feasible to be run by the design. Table 2 summarizes the input parameters for each factor that was required to be inserted in the constraint tool. Based on the value of the input parameters, the constraint equation was built. Equation (4) below shows the constraint equation generated by the parameters from Table 2.
Prior to the final step of design, the search method of point and optimality for the design were chosen such that it could give the best performance to the data. There were 20 runs required where 10 of them were the required model point, 5 of the runs were the replicate point, and the rest were the lack-of-fit point. Since the custom design generated an unusual combination of factors in the experimental design, the response for the factors was generated based on an empirical model built by the FCCD [4].

Two-Level Factorial Design
The building of this design started by selecting the number of factors to be studied. Based on the selected number of factors, only 8 runs were needed in this design. The design of the experiment continued with the power calculation, which included the determination of the signal and the noise. The signal is the smallest change in the response that is considered worth detecting, and its value is defined by the user. Meanwhile, the noise was taken from the standard deviation stated in the previous study [4]. Table 3 summarizes the input parameters used for the design power calculation. It was important to ensure that the design power was greater than 80%, as it represents the probability of detecting the effect of interest in the study [22].
The performance of the responses predicted by the empirical model built by the software was analyzed using analysis of variance (ANOVA). The values of the coefficient of variation (C.V) and the coefficient of determination (R²) were used to identify the model with the best fit, i.e., the model with a lower C.V and a higher R². The adequacy of the model was determined from the significance of the model and of the lack of fit, both of which were judged from the p-value ('Prob > F') displayed in the ANOVA section of the software. The model must have a 'Prob > F' value of less than 0.05 for the relationship between the response and the factors to be considered significant. Meanwhile, the lack of fit must be insignificant (p > 0.05) for the model to fit the data well [22]. Data transformation or model reduction was performed to improve the statistical performance of the selected model, with the Box-Cox plot in the diagnostics tab of the Design Expert software used as a reference.
Numerical optimization was performed for each design method to obtain the optimum condition of the adsorption process. All the independent variables were kept in range, while the responses were maximized. The upper and lower limits of the responses were restricted so that the software suggested a unique optimum condition (only one solution) with high desirability (more than 0.9).
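As a rough illustration of the 8-run two-level factorial design for three factors, the sketch below builds the coded design matrix and estimates the main effects as the difference between the mean response at the high and low levels. The responses are generated from a hypothetical additive model, not from the data of Taktak et al. [4], and the function names are assumptions made for this example.

```python
from itertools import product


def full_factorial_two_level(n_factors):
    """All 2**n_factors combinations of the coded levels -1 and +1."""
    return [list(p) for p in product((-1, 1), repeat=n_factors)]


def main_effects(design, responses):
    """Main effect of each factor: mean(y at +1) minus mean(y at -1)."""
    effects = []
    for k in range(len(design[0])):
        high = [y for run, y in zip(design, responses) if run[k] == 1]
        low = [y for run, y in zip(design, responses) if run[k] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


design = full_factorial_two_level(3)
# Hypothetical responses from y = 30 - 10*A + 8*B - 5*C (illustration only)
responses = [30 - 10 * a + 8 * b - 5 * c for a, b, c in design]
effects = main_effects(design, responses)
print(len(design), effects)
```

In a coded two-level design each main effect equals twice the corresponding linear coefficient, which is why the effects printed here are double the coefficients of the hypothetical model.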

Neural Network Modeling
In the present study, Matlab R2021a (The MathWorks, Inc., Natick, MA, USA) software was used to build the ANN model, using a feedforward backpropagation network with the learning function 'learngdm' (gradient descent with momentum weight and bias learning function). The neural network consisted of an input layer (i.e., pH, initial concentration of 2,4-D and activated carbon content), a hidden layer and an output layer (i.e., removal of 2,4-D and adsorption capacity). The number of hidden nodes in the hidden layer was varied from 1 to 10. Trifonov et al. [23] suggested that the optimum number of neurons in the hidden layer could be estimated by N/2, where N is the number of input variables or experimental data. The tangent sigmoid transfer function (tansig) and the linear transfer function (purelin) were applied to the hidden layer and output layer, respectively. Three different training methods (i.e., Levenberg-Marquardt, Bayesian Regularization and Scaled Conjugate Gradient) were used in the ANN models, and the data division in the simulation was left at the default setting, which divides the data randomly.
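The structure of such a network (three inputs, one hidden layer with a tangent sigmoid transfer function and two linear outputs) can be sketched as a plain forward pass. This is not the MATLAB implementation used in the study: the weights below are random placeholders, training is omitted, and the function names are hypothetical. The tansig function is written as tanh, to which it is mathematically equivalent.

```python
import math
import random


def tansig(x):
    """Tangent sigmoid transfer function; 2/(1+exp(-2x)) - 1 equals tanh(x)."""
    return math.tanh(x)


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tansig hidden layer followed by a linear (purelin) output layer."""
    hidden = [tansig(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


def random_network(n_in=3, n_hidden=9, n_out=2, seed=0):
    """Untrained placeholder weights for a 3:9:2 architecture (illustration only)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b_out = [rng.uniform(-1, 1) for _ in range(n_out)]
    return w_hidden, b_hidden, w_out, b_out


params = random_network()
# Normalized inputs: pH, initial concentration, activated carbon content
outputs = forward([0.0, 0.5, 1.0], *params)
print(outputs)
```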
The experimental data taken from the previous study [4] were used to establish the optimum architecture of the ANN model, whereby the network was trained until a high overall correlation coefficient (R) was obtained for the training, testing, validation and overall data sets. In other words, the R value was the stopping point for the training of the network. According to Mourabet et al. [24], the value of the overall correlation coefficient (R) could be used as a measure of the network's predictive capability. Prior to the training of the network, the data were normalized within a range of 0 (new x min) to 1 (new x max) using the following equation to obtain fast convergence and minimal mean square error (MSE) values [25].
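$$x_n = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \left(\text{new } x_{\max} - \text{new } x_{\min}\right) + \text{new } x_{\min} \tag{5}$$
where $x_n$ is the normalized value of $x_i$, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of $x_i$, respectively. After a desirable result for a network was achieved, the weights, biases and predicted output data were recorded. Based on the experimental and predicted responses of the network, the values of the MSE and the coefficient of determination (R²) were calculated using the following equations.
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_{prd,i} - y_{exp,i}\right)^2 \tag{6}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_{prd,i} - y_{exp,i}\right)^2}{\sum_{i=1}^{N} \left(y_{exp,i} - y_m\right)^2} \tag{7}$$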
where $N$ is the number of data points, $y_{prd,i}$ is the $i$th predicted value, $y_{exp,i}$ is the $i$th measured value, and $y_m$ is the mean value of $y_{exp,i}$. The MSE and R² were calculated manually since the 'nntool' command was used instead of 'nnstart' to open the neural network toolbox. Note that each network was trained separately, and the best network for each training method, over the different numbers of hidden nodes, was chosen according to the values of the MSE, R² and overall R. The chosen optimized ANN architecture for each training method was used in the post analysis of the results to validate both the responses predicted by the RSM models and the adequacy of those models. The post analysis was performed by running the optimum condition generated by the RSM designs in the selected optimized ANN model. The results predicted under optimum conditions were entered in the confirmation tab of the Design Expert V12.0 software, and the average result was observed. If the average result fell within the 95% prediction interval (PI), the empirical models generated by the different RSM designs were considered usable even though they had a significant lack of fit [22].
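The normalization and the two error measures described above can be sketched as follows. The function names and sample values are hypothetical, and the R² shown uses the standard coefficient of determination (residual sum of squares over the total sum of squares about the mean of the measured values), which is an assumption about the exact form used in the study.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min(values) -> new_min and max(values) -> new_max."""
    x_min, x_max = min(values), max(values)
    return [(x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min
            for x in values]


def mean_squared_error(y_exp, y_prd):
    """Mean of the squared differences between predicted and measured values."""
    return sum((p - e) ** 2 for e, p in zip(y_exp, y_prd)) / len(y_exp)


def r_squared(y_exp, y_prd):
    """Coefficient of determination (ASSUMED standard form, see lead-in)."""
    y_m = sum(y_exp) / len(y_exp)
    ss_res = sum((p - e) ** 2 for e, p in zip(y_exp, y_prd))
    ss_tot = sum((e - y_m) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


# Hypothetical illustration values
ph_levels = [3.0, 6.0, 9.0]
y_exp = [1.0, 2.0, 3.0]
y_prd = [1.0, 2.0, 4.0]

print(min_max_normalize(ph_levels))
print(mean_squared_error(y_exp, y_prd), r_squared(y_exp, y_prd))
```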

Comparative Analysis of RSM and ANN Models
The comparisons were made with respect to different training methods and designs in the ANN and RSM, respectively. The values of calculated R 2 and MSE were used in determining the performance of the ANN model. However, the performance of RSM models was determined by observing the values of R 2 and the coefficient of variation (C.V, %) generated by the Design Expert V12.0 software.

Response Surface Methodology (RSM)
Design Expert V12.0 software was used to study the effect of pH, initial concentration of 2,4-D and the activated carbon content on the removal of 2,4-D and adsorption capacity of modified hydrogel. The predictive modeling and optimization were performed using different designs of RSM.

Predictive Modeling
All the empirical equations (suggested by the software) for predicting the removal of 2,4-D (Y1) and adsorption capacity (Y2) were expressed in coded form, where A is the pH, B is the initial concentration of 2,4-D, and C is the activated carbon content. The empirical equation generated for the FCCD is shown in Equation (8) as a reduced quadratic model for the removal of 2,4-D, while Equation (9) gives the corresponding model for the adsorption capacity. The empirical equations generated in the optimal design are shown in Equations (10) and (11). The reduced quadratic model was suggested by the software for the removal of 2,4-D, and the quadratic model was recommended for predicting adsorption capacity.
On top of that, the empirical model generated in the two-level factorial design is shown in Equations (12) and (13). The main effect model was suggested by the software for the removal of 2,4-D. Meanwhile, a reduced two-factor interaction (2FI) model was recommended for predicting adsorption capacity.
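Because the empirical equations are expressed in coded form, actual factor settings have to be mapped to the coded range of −1 to +1 before they are substituted. A minimal sketch of that conversion is shown below. The low and high levels (pH 3 to 9, 20 to 100 mg/L and 2.5 to 20 wt%) are taken from the ranges quoted elsewhere in the text and are assumed to be the Table 1 levels; the function and dictionary names are hypothetical.

```python
def to_coded(actual, low, high):
    """Map an actual factor value to coded units, with low -> -1 and high -> +1."""
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (actual - center) / half_range


def to_actual(coded, low, high):
    """Inverse mapping from coded units back to the actual factor value."""
    return (high + low) / 2.0 + coded * (high - low) / 2.0


# ASSUMED factor ranges (low, high) for A, B and C
RANGES = {
    "A_pH": (3.0, 9.0),
    "B_conc_mg_per_L": (20.0, 100.0),
    "C_AC_wt_pct": (2.5, 20.0),
}

# Optimum condition reported for the optimal design
optimum = {"A_pH": 3.0, "B_conc_mg_per_L": 94.52, "C_AC_wt_pct": 2.5}
coded = {name: to_coded(optimum[name], *RANGES[name]) for name in RANGES}
print(coded)
```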

Statistical Analysis
The statistical analysis for each design was performed using ANOVA to determine the adequacy of the models generated by the software as shown in Tables 4-6. In this study, both models (with respect to the removal of 2,4-D and adsorption capacity of modified hydrogel) for all the designs were significant (p < 0.05). The lack of fit for the FCCD was found to be significant (p < 0.05) for both models. The software cannot generate/calculate the lack-of-fit F-statistic for the other two designs. The lack of fit was found to be significant as the pure error was very low since the original data taken from the previous study [4] have very little variation in the responses for factors at the center points. This may indicate that the center points could not capture all normal process variations in the system. Despite all the effort in the transformation of the model and reduction of the model, the lack of fit for all models was still found to be significant. Therefore, the post analysis of the result predicted by the RSM model was performed to determine the model's usefulness.
Although the lack-of-fit analysis shows that the models do not fit the data well, the adequacy of the models could also be assessed from other fit statistics such as R² and C.V (as shown in Tables 4-6 above). The values of R² for the removal of 2,4-D were 0.9886, 0.9958 and 0.9960, while the values of R² for adsorption capacity were 0.9920, 0.9998 and 0.9516 in the FCCD, optimal design and two-level factorial design, respectively. According to Aklilu et al. [25], the value of R² must be at least 0.8 for the model to be a good fit. Therefore, the predicted data in all the models were in good agreement with the experimental data, since all three designs had R² values of more than 0.8. In addition, C.V is the ratio of the standard deviation of the residuals to the mean of the observed response, expressed as a percentage. In other words, C.V could be used to represent the repeatability of the model. The values of C.V for the removal of 2,4-D were 7.58, 2.39 and 3.22, while those for adsorption capacity were 8.25, 0.9651 and 31.89 in the FCCD, optimal design and two-level factorial design, respectively. A lower C.V indicates better precision and reliability of the experiment [25]. From the above statistical analysis, the optimal design was found to have the best performance in predicting both responses, because its standard deviation and C.V values were the lowest among the design methods. In addition, the R² values of the optimal design were higher than most of those of the other design methods. However, it should be noted that the result for the optimal design could be misleading, since the response data for the optimal design were obtained from the empirical model, while the data for the other designs were taken from the experiments conducted by Taktak et al. [4]. Data generated from the empirical model may contain less error than the raw experimental data. In addition, the empirical model was generated by the software by fitting the data such that its ability to predict the responses was as good as possible.
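For reference, the C.V statistic discussed above can be sketched as the residual standard deviation divided by the mean of the observed response, in percent. The helper name, the sample data and the choice of residual degrees of freedom (number of runs minus number of model terms, including the intercept) are assumptions made for this illustration and should be checked against the software's ANOVA output.

```python
import math


def coefficient_of_variation(y_exp, y_prd, n_model_terms):
    """C.V (%) = 100 * residual standard deviation / mean observed response.

    ASSUMPTION: residual degrees of freedom = runs - model terms (incl. intercept).
    """
    n = len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_prd))
    std_dev = math.sqrt(ss_res / (n - n_model_terms))
    mean_response = sum(y_exp) / n
    return 100.0 * std_dev / mean_response


# Hypothetical observed and predicted responses for a two-term model
y_exp = [10.0, 20.0, 30.0, 40.0]
y_prd = [11.0, 19.0, 31.0, 39.0]
cv = coefficient_of_variation(y_exp, y_prd, n_model_terms=2)
print(round(cv, 2))
```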

Analysis of Response Surface
To facilitate a straightforward examination of the effects of the independent variables and their interactions, the developed mathematical models were used to construct three-dimensional (3D) response surfaces. Since the effects of the process parameters on the responses followed the same pattern in all designs, as they originated from the same adsorption study, it was sufficient to analyze the response surface plots for one design method only. In this study, the 3D response surface plots of the two-level factorial design were chosen for analysis. This model was selected to distinguish it from the model (FCCD) already studied by Taktak et al. [4], as well as to cross-check the previous findings. Figure 1a,b show the interaction between the initial concentration of 2,4-D and pH on the removal of 2,4-D and the adsorption capacity of the adsorbent. Both responses increased when the initial concentration was raised from 20 to 100 mg/L. This is because the initial concentration of 2,4-D provides an important driving force to overcome the mass transfer resistance of the adsorbate between the solid and aqueous phases. According to Fick's first law, the rate of mass transfer is proportional to the concentration gradient, so increasing the driving force increases the rate of mass transfer. These results are in agreement with Taktak et al. [4] and Bazrafshan et al. [9]. In contrast, both responses tended to decrease when the pH of the solution was adjusted from 3 to 9. According to Bazrafshan et al. [9], pH is the key parameter in the removal of a pollutant by an adsorbent, as it controls the electrostatic force between the adsorbent and the adsorbate.
Their study on the adsorption of 2,4-D with single-wall carbon nanotubes as the adsorbent found that both the removal of 2,4-D and the adsorption capacity fluctuated as the pH was increased from 3 to 13. A similar study was reported by Safa and Bhatti [26], who studied the use of rice husk as an adsorbent for the removal of Everdirect Orange-3GL and Direct Blue-67 textile dyes. They found that as the pH was increased from 2 to 12, the adsorption capacity decreased from 20.5 to 18.8 mg/g.
The effect of the pH of the solution and the activated carbon content on the removal of 2,4-D and the adsorption capacity of the modified hydrogel is shown in Figure 2a,b. Both responses decreased when the activated carbon content was increased from 2.5 to 20 wt%. This is because the adsorbent was made by introducing activated carbon from pomegranate husk into the polymeric network of the hydrogel. As more activated carbon was introduced, the structure of the adsorbent became less porous. Consequently, the uptake capacity of the adsorbent as well as the removal of 2,4-D decreased. A similar result was reported by Xu et al. [27], whereby the adsorption capacities of a biosorbent prepared from rice husk toward heavy metals in simulated wastewater decreased with increasing adsorbent content.
Figure 3a,b show the interaction between the initial concentration of 2,4-D and the activated carbon content on the responses. As discussed previously, a high initial concentration of 2,4-D and a low activated carbon content have a positive effect on both responses. Hence, the maximum removal of 2,4-D and adsorption capacity were obtained when the lowest level of activated carbon and the highest level of initial concentration of 2,4-D were applied in the experiment. In summary, pH plays the most important role in the adsorption of 2,4-D, followed by the activated carbon content and the initial concentration of 2,4-D. This is supported by the F values and p values of the ANOVA shown previously.

Optimization
This study aimed to achieve the maximum removal of 2,4-D and the maximum adsorption capacity of the modified hydrogel. Thus, the goal chosen in the numerical optimization was to maximize the removal of 2,4-D and the adsorption capacity, while the process variables (pH, initial concentration of 2,4-D and activated carbon content) were set to be in range. In the first optimization, the software suggested too many solutions with the same desirability for each design. The limits for the goals were therefore restricted, and a second optimization was performed with increased upper and lower limits for the goals. Raising the limits stretches the maximization target, so that fewer candidate conditions reach the same desirability; otherwise, many potential optimum conditions may appear, as in the first optimization. Nevertheless, the range of the stretch had to be chosen carefully so that the desirability of the suggested optimum condition was not too low. In the current study, the range was adjusted such that the maximization of the responses was obtained at a unique optimum condition with a desirability of more than 0.9. The result of the second optimization is tabulated in Table 7.
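The effect of the goal limits on the number of tied solutions can be illustrated with a desirability function for a maximization goal, combined across responses by a geometric mean. This is a sketch under assumptions: it uses the commonly cited Derringer-type form with a linear weight of 1, and the limits and candidate responses are hypothetical numbers rather than the settings used in the software.

```python
def desirability_maximize(y, lower, upper, weight=1.0):
    """Desirability for a 'maximize' goal: 0 at or below lower, 1 at or above upper."""
    if y <= lower:
        return 0.0
    if y >= upper:
        return 1.0
    return ((y - lower) / (upper - lower)) ** weight


def overall_desirability(individual):
    """Geometric mean of the individual desirabilities."""
    prod = 1.0
    for d in individual:
        prod *= d
    return prod ** (1.0 / len(individual))


# Two hypothetical candidate conditions with predicted removal of 62% and 65%
candidates = [62.0, 65.0]

# With an upper limit of 60, both candidates tie at desirability 1 (many "optima")
tied = [desirability_maximize(y, 10.0, 60.0) for y in candidates]

# Raising the upper limit to 70 separates them while desirability stays high
stretched = [desirability_maximize(y, 10.0, 70.0) for y in candidates]

print(tied, stretched)
print(overall_desirability([stretched[1], 0.95]))
```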

Artificial Neural Network (ANN)
The architectures of the ANN for the different training methods (i.e., Levenberg-Marquardt, Scaled Conjugate Gradient and Bayesian Regularization) were constructed by varying the number of neurons in the hidden layer. The ANN model was optimized by training the network until a correlation coefficient (R) of more than 0.9 was obtained for the training, testing, validation and overall data sets. In other words, the R value served as the stopping point for training and as the measure of the network's predictive capability [24]. Meanwhile, the best architecture of the optimized ANN model for each training method was determined from the MSE and R² values calculated for both responses. Figures 4 and 5 show the MSE and R² values, respectively, for both responses over different numbers of hidden nodes for the different training methods. The results presented in Figures 4 and 5 show that the optimized network with the Levenberg-Marquardt training method had the best performance (MSE of 0.0003 and 0.0006; R² of 0.9928 and 0.9833 for the removal of 2,4-D and adsorption capacity, respectively) when eight hidden nodes were used. Meanwhile, the best performance of the optimized network with the Scaled Conjugate Gradient training method was obtained when nine hidden neurons were used. The MSE values for this network were 0.002 and 0.0019, while the R² values were 0.9576 and 0.9552 for the removal of 2,4-D and adsorption capacity, respectively. In addition, the best architecture for the Bayesian Regularization training method was found to be 3:9:2, as this model had a very low MSE (0.0004 and 0.0003) and a high R² (0.9910 and 0.9924) for the removal of 2,4-D and adsorption capacity, respectively. High R² and low MSE values show that the model predicts both responses with high accuracy.
To sum up, the architectures of the optimized networks for the Levenberg-Marquardt, Scaled Conjugate Gradient and Bayesian Regularization training methods were 3:8:2, 3:9:2 and 3:9:2, respectively. Hence, these networks were used in the post analysis of the results, whereby the responses under the optimum conditions generated by the different RSM designs were predicted. The spread plots comparing the experimental data with the data computed by the ANN in training, testing and validation for the selected networks are shown in Figure 6. Figure 6a shows R values of 0.99998, 0.97368, 0.98974 and 0.99451 for the training, testing, validation and overall data sets, respectively, for the optimized network with the Levenberg-Marquardt training method. The R values for the optimized network with the Scaled Conjugate Gradient training method are shown in Figure 6b, with values of 0.98161, 0.93959, 0.99975 and 0.97985 for the training, testing, validation and overall data sets, respectively. The high R values indicate that these optimized models have good precision, as the R value is a measure of the predictive ability of the networks [24]. Additionally, Figure 6c shows R values of 0.99999, 0.9922 and 0.99632 for the training, testing and overall data sets, respectively, for the optimized network with the Bayesian Regularization training method. No R value for the validation data set was available for this training method, since the validation stop was disabled by default in the training parameter settings. Nevertheless, this training method has its own validation built into its algorithm, in the form of regularization [22]. This result is analogous to that reported by Jazayeri et al. [28], for which no data were available on the validation performance when they estimated the output power of a photovoltaic (PV) module.

Post Analysis of Results
The post analysis was performed to validate the responses predicted by the RSM models under the optimum operating conditions recommended by the software. A similar study was reported by Xu et al. [27], where three independent experiments were performed under the predicted optimum condition to confirm the model prediction. In addition, according to Stat-Ease [22], in a case where the lack of fit is found to be significant, an additional experiment could be run under the optimal conditions. In the current study, the additional experiment was replaced with a simulation using the selected optimized ANN model. The comparison between the responses predicted by the RSM models and the selected optimized ANN model under the optimum conditions recommended by the Design Expert software is shown in Table 8. By using the confirmation tab in the Design Expert software, the average result for all RSM models was found to fall within the 95% prediction interval (PI). Hence, the models were usable, and the results predicted by the models were validated.

Conclusions
In this study, the modeling and optimization of the adsorption process of 2,4-D using a modified hydrogel (AC/PDMAEMA hydrogel) were conducted with different designs of RSM and different training methods of ANN. The 3D response surface plots in RSM show that the maximum removal of 2,4-D and adsorption capacity were achieved when a high initial concentration of 2,4-D was used. In contrast, a high pH of the solution and a high activated carbon content in the polymeric network of the adsorbent had a negative impact on both responses. Among the different design methods of RSM, the empirical model generated by the optimal design was found to have the highest value of R² for both responses with the lowest C.V value. However, further study needs to be conducted in the future to confirm the result, since this result could be misleading due to the different sets of experimental data used in this design. In the ANN modeling, the Bayesian Regularization training method (with an architecture of 3:9:2) was found to have the best performance among the different training methods, with a very low MSE and a high R² for both responses. From the numerical optimization, the maximum removal of 2,4-D and adsorption capacity in the optimal design were 65.01% and 65.29 mg/g, respectively, which were obtained at a pH of 3, an initial 2,4-D concentration of 94.52 mg/L and 2.5 wt% of activated carbon. The post analysis with the optimized ANN model found that the responses predicted by RSM under optimum conditions were validated and the generated models were usable even though they had a significant lack of fit.
Data Availability Statement: Data sets generated during the current study are available from the corresponding author on reasonable request.