Artificial Neural Network Approach for Modelling of Mercury Ions Removal from Water Using Functionalized CNTs with Deep Eutectic Solvent

Multi-walled carbon nanotubes (CNTs) functionalized with a deep eutectic solvent (DES) were utilized to remove mercury ions from water. An artificial neural network (ANN) technique was used to model the adsorption capacity of the functionalized CNTs. The adsorbent dosage, contact time, mercury ion concentration and pH were varied, and the effect of these parameters on the adsorption capacity of the functionalized CNTs was observed. A nonlinear autoregressive network with exogenous inputs (NARX), a feed-forward neural network (FFNN) and a layer recurrent (LR) neural network were used. The model performance was compared using different indicators, including the root mean square error (RMSE), relative root mean square error (RRMSE), mean absolute percentage error (MAPE), mean square error (MSE), correlation coefficient (R²) and relative error (RE). Three kinetic models were applied to the experimental and predicted data; the pseudo second-order model described the data best. The maximum RE, R² and MSE were 9.79%, 0.9701 and 1.15 × 10⁻³, respectively, for the NARX model; 15.02%, 0.9304 and 2.2 × 10⁻³ for the LR model; and 16.4%, 0.9313 and 2.27 × 10⁻³ for the FFNN model. The NARX model predicted the adsorption capacity more accurately than the FFNN and LR models.


Introduction
Mercury is the most toxic heavy metal and has a serious influence on the environment and human health [1]. The effects of mercury poisoning mostly include renal and neurological disorders; mercury easily passes through the blood-brain barrier and affects the brain. High mercury concentrations impair kidney and pulmonary function [2]. Mercury availability and toxicity depend on the chemical form in which the mercury is found. Mercury released from different sources is a concern because of its long-range transport, bio-accumulative properties and high toxicity [3]. The removal of mercury is therefore a substantial concern [4,5]. Elemental mercury (Hg⁰) is volatile and highly insoluble in water; therefore, Hg⁰ removal using traditional methods is problematic [6]. Major problems due to mercury (Hg²⁺) pollution exist in several countries, including Iraq, China, Brazil and Japan [7,8]. The pollution of water

Results and Discussion
This study focused on the removal of mercury from water solution and on the use of neural networks to model the adsorption capacity of the functionalized CNTs. Ak-CNTs were used as the adsorbent in three different amounts: 5, 20 and 30 mg. The efficiency of the adsorbent was examined using initial concentrations of 1, 3 and 5 mg/L and pH values of 3, 5.5, 6 and 8. The contact time lasted until equilibrium was reached. Samples were taken at different times under various combinations of mercury concentration, adsorbent dosage and pH; the total number of samples prepared was 176. The effect of the input parameters on the adsorption capacity of the adsorbent (Ak-CNTs) was studied. The model performance was evaluated using different indicators, including RRMSE, MSE, RE, RMSE and MAPE.

Hybrid Material Characterization
Investigating the electrical charge of the adsorbent is important because of its effect on the adsorbent's efficiency. The zeta potential is a measure of the electrical potential at the layer surrounding the surface of particles suspended in the bulk fluid or solution. This electric potential reflects the balance of the electrostatic forces that keep microparticles or nanoparticles stable in a suspension or emulsion. The zeta potential was measured for the P-CNTs and the Ak-CNTs; its absolute value increased from 5.5 mV for the P-CNTs to 52.3 mV for the Ak-CNTs. Furthermore, the Raman spectra showed an increase in the ID/IG ratio from 1.11 for the P-CNTs to 1.18 for the Ak-CNTs, which indicates the effect in the sp³ direction of the functional groups gained from A-DES. The FTIR results showed that the O-H stretching band disappeared after A-DES functionalization. The O-H band may have been present because of water adsorbed on the hydrophilic CNT surface, which was obvious in the k-CNT case. A-DES functionalization enhanced the sample drying process and decreased the hydrophilicity, which is why the O-H band disappeared from the P-CNT and Ak-CNT FTIR spectra. PO₄³⁻ was observed in the 500-600 cm⁻¹ range, and C-Br stretching was also detected in the 550-650 cm⁻¹ range. The adsorption process is influenced by the adsorbent surface area; the use of A-DES as a CNT functionalization agent increased the pore diameter of the adsorbent from 20.49 to 127.34 Å and the surface area from 123 to 199.366 m²/g. This increase improved the adsorption capacity of the Ak-CNTs [36].

pH Effect
Solution pH is one of the main operational variables in water treatment systems in industrial, commercial and urban areas. The effect of pH was studied to find the optimum value for adsorption. Deviations at this point in experiments and in parameter sensitivity analysis result in high uncertainty and poor performance. The effect of pH on the Ak-CNTs is presented in Figure 1. The pH effect was investigated by fixing the other parameters at an adsorbent dosage of 20 mg, a contact time of 55 min and an initial concentration of 3 mg/L. The pH values were 3, 5.5, 6 and 8. pH affects the protonation of functional groups in biomass, such as amino, phosphate and carboxyl groups, and the metal chemistry, including the solubility [37,38]. The results reveal that the adsorption capacity increases with increasing pH. Increasing the pH from 3 to 5.5 increased the adsorption capacity from 2.125 mg/g to 3.015 mg/g, while increasing the pH from 6 to 8 increased the adsorption capacity from 3.196 mg/g to 3.432 mg/g. This increase occurred due to the presence of negatively charged oxygen-containing functional groups, such as carboxylic groups, and the enhancement of the negative charge by the presence of OH⁻ in the solution. This negative charge was widely distributed on the adsorbent surface, which may determine metal sorption. At pH values greater than 7, the dominant Hg²⁺ species are Hg(OH)⁺ and Hg(OH)₂. This complexation occurs due to the presence of OH⁻ and results in precipitation [32]. The NARX network was used to model the adsorption capacity using the experimental data set. When the NARX outputs were compared with the experimental results, the NARX model showed high accuracy; Figure 1 presents the NARX and experimental results.

Initial Concentration Effect
The initial Hg²⁺ ion concentration was one of the parameters in the experimental work; the initial concentrations used were 1, 3 and 5 mg/L. The effect of the Hg²⁺ concentration on the Ak-CNTs is presented in Figure 2. The initial Hg²⁺ concentration had a favourable and prominent effect on the quantity of Hg²⁺ adsorbed on the Ak-CNT adsorbent. The effect of the initial concentration of mercury ions on the Ak-CNT adsorption capacity was investigated at a fixed contact time of 120 min, pH of 5.5 and adsorbent (Ak-CNT) dosage of 5 mg. The adsorption capacity improved as the mercury concentration increased. The adsorption capacity of the adsorbent (Ak-CNT) increased from 0.9935 to 7.495 mg/g when the concentration was increased from 1 to 3 mg/L, and from 7.495 to 11.214 mg/g when the concentration was increased from 3 to 5 mg/L. This increase was caused by more frequent collisions between the adsorbate ions and the active sites of the adsorbent (Ak-CNT) [39]. The experimental results were used to train the NARX neural network model, and the resulting model showed high accuracy compared to the experimental data. Figure 2 presents the NARX and experimental results.

Effect of Adsorbent Dosage
The adsorbent dosage has an important influence on the adsorption capacity. Ak-CNTs were used as the adsorbent for mercury removal in this study, at dosages of 5, 20 and 30 mg. To examine the effect of the adsorbent dosage, the other parameters were fixed at a contact time of 30 min, an initial mercury concentration of 5 mg/L and a pH of 6. The results presented in Figure 3 reveal that the adsorption capacity decreased as the adsorbent dosage increased. This decrease might have happened because some of the active sites remained unsaturated during the adsorption process [40,41]. The adsorption capacity was 13.56 mg/g at an adsorbent dosage of 5 mg; increasing the adsorbent dosage from 20 to 30 mg decreased the adsorption capacity from 8.49 to 6.47 mg/g. The experimental results were used to train the NARX neural network model. When the model results were compared with the experimental results, the NARX model showed high accuracy in predicting the adsorption capacity of the Ak-CNT adsorbent. The NARX model and experimental results are presented in Figure 3.

Kinetic Study
The kinetic study aimed to examine the behaviour of the adsorption reaction on the Ak-CNT adsorbent. Three kinetic models were applied: the pseudo first-order, pseudo second-order and intraparticle diffusion models. Different parameter conditions were used for the kinetic study. Figure 4A presents the kinetic study results for an adsorbent dosage of 20 mg, an initial mercury concentration of 3 mg/L, a pH of 3 and different time intervals until equilibrium was reached. In Figure 4B, the adsorbent dosage is 20 mg, the concentration of mercury ions is 5 mg/L, the pH is 6 and the contact time (time to equilibrium) is 162 min. In Figure 4C, the adsorbent dosage is 30 mg, the initial concentration of mercury ions is 5 mg/L, the pH is 6 and the contact time is 163 min. Figure 4A-C indicates that the Ak-CNT adsorption process fits the pseudo second-order model, with a higher correlation coefficient (R²) than the pseudo first-order and intraparticle diffusion models [42]. The NARX model was used to model and predict the experimental results, and the kinetic models were then applied to the NARX outputs. The experimental results fit the pseudo second-order model with R² values greater than 0.99 at different doses, initial concentrations and pH values. The NARX outputs also fit the pseudo second-order model with R² values greater than 0.99 at the same doses, initial concentrations and pH values, confirming that the NARX model is able to predict the experimental results. The kinetic model results are presented in Table 1.
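The pseudo second-order fit discussed above is commonly obtained from the linearized form t/q_t = 1/(k2·qe²) + t/qe, where a straight-line regression of t/q_t against t gives qe from the slope and k2 from the intercept. The sketch below illustrates that procedure under stated assumptions: the function names are hypothetical, the paper does not say which fitting routine was used, and the data are synthetic values generated from the model itself rather than measurements from this study.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def fit_pseudo_second_order(times, q_t):
    """Fit t/q_t = 1/(k2 * qe^2) + t/qe and return (qe, k2, r2)."""
    ys = [t / q for t, q in zip(times, q_t)]
    slope, intercept, r2 = linear_fit(times, ys)
    qe = 1.0 / slope                # slope = 1/qe
    k2 = slope ** 2 / intercept     # intercept = 1/(k2 * qe^2)
    return qe, k2, r2


# Synthetic illustration only (NOT data from the paper):
# q_t generated from the model with qe = 8.0 mg/g and k2 = 0.02 g/(mg min).
true_qe, true_k2 = 8.0, 0.02
times = [5, 10, 20, 40, 80, 160]  # contact time, min
q_t = [true_k2 * true_qe ** 2 * t / (1 + true_k2 * true_qe * t) for t in times]

qe, k2, r2 = fit_pseudo_second_order(times, q_t)
print(f"qe = {qe:.3f} mg/g, k2 = {k2:.4f} g/(mg min), R^2 = {r2:.4f}")
```

Because the synthetic points lie exactly on the model curve, the fit recovers the generating parameters; with measured data the R² value indicates how well the pseudo second-order model describes the kinetics.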

Neural Network Performance
The removal of mercury from water by the Ak-CNT adsorbent was modelled by ANNs using MATLAB R2014a software. The experimental data set, prepared at the lab scale, contains a total of 176 data points and was used to model the adsorption capacity of the Ak-CNT adsorbent. The parameters varied in the experimental work were also used as inputs in the modelling process: adsorbent dosage (5, 20 and 30 mg), concentration of mercury ions (1, 3 and 5 mg/L), contact time (varying intervals) and pH (3, 5.5, 6 and 8). Of the 176 records, 151 were used for network training and validation, and 25 were used for testing.
In this work, three neural network types were used and compared based on their productivity and performance. The best model was used to study the effect of the parameters, including pH, adsorbent dosage and initial concentration, and for the kinetic studies. The NARX model has a maximum RE of 9.79% (Figure 6) and an R² of 0.9701 (Figure 5). In contrast, the FFNN model has a maximum RE of 16.4% (Figure 6) and an R² of 0.9313 (Figure 5). The LR model has a maximum RE of 15.02% (Figure 6) and an R² of 0.9304 (Figure 5).

Comparison of the RE and R² values for the three models reveals that the NARX model has the best performance. Selecting a proper model structure with high accuracy and productivity is a challenging task because of the different parameters involved, such as the number of hidden layers, the type of transfer function and the number of neurons in each layer. In this study, different numbers of hidden layers, transfer function types and numbers of nodes were tried in order to choose a suitable structure. The structure was selected based on the performance and productivity of the network. Initially, the MSE was used during the training phase; then, the performance and productivity of the resulting model were tested on data that were not included in the training set using various indicators. The results for the three models are presented in Table 2 and show that the NARX model performed better than the FFNN and LR models.

Chemicals and Experiments Setup
The chemicals and materials utilized in this work included multi-walled CNTs with dimensions (D × L) of 6-9 nm × 5 µm and a carbon content of >95%, hydrochloric acid (36.5%-38%), sodium hydroxide pellets, Gly and potassium permanganate, all of which were provided by Sigma-Aldrich. APB and a 1000 mg/L mercury standard solution were provided by Merck.
The DES was synthesized by mixing Gly with APB using a magnetic stirring system at 400 rpm and a temperature of 80 °C. The mixing continued until the mixture reached the liquid state without precipitation; the product is referred to as A-DES in this study. The molar ratio details, synthesis and characterization were based on [43]. Primary oxidation was conducted to oxidize the pristine CNT (P-CNT) surface with KMnO₄ for 2 h at 65 °C [44]; the CNTs functionalized with KMnO₄ are denoted k-CNTs. Then, 200 mg of k-CNTs was mixed with 7 mL of the prepared A-DES using a sonication system for 3 h at a temperature of 65 °C to produce Ak-CNTs as the adsorbent. After the functionalization steps, a washing step was conducted using distilled water with a vacuum pump and a polytetrafluoroethylene (PTFE) 0.45 µm membrane until the pH of the filtered water became neutral. Then, the adsorbent (Ak-CNT) was dried overnight at a temperature of 100 °C.
The adsorption process was conducted with different values of the parameters involved, including pH, initial mercury concentration, contact time and adsorbent dosage. The Ak-CNT adsorbent was prepared to remove mercury ions from water solution; three different amounts of Ak-CNTs were used (10, 20 and 30 mg), the mercury ion concentrations were 1, 3 and 5 mg/L, the pH values were 3, 5.5, 6 and 8, and the contact time lasted until equilibrium was achieved. Different amounts of Ak-CNTs were mixed with 50 mL of Hg²⁺ stock solution in 250 mL flasks with different pH values and mercury concentrations. The prepared flasks were placed in a mechanical shaker at a shaking speed of 180 rpm at room temperature. In total, 176 samples were used in this work under various conditions. The samples were tested at different time intervals to study the adsorbent performance. Inductively coupled plasma-optical emission spectrometry (ICP-OES) (OPTIMA7000DV, PerkinElmer®, USA) was used to test the samples.
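The text does not state how the adsorption capacity was computed from the ICP-OES readings. A minimal sketch, assuming the standard batch mass balance q = (C0 − Ct)·V/m, is given below; the function names and the residual concentration in the example are hypothetical and serve only to show the arithmetic with the 50 mL solution volume mentioned above.

```python
def adsorption_capacity(c0_mg_per_l, ct_mg_per_l, volume_l, mass_g):
    """Batch mass balance q = (C0 - Ct) * V / m, returned in mg/g.

    Assumed standard relation; the paper does not give its formula.
    """
    if mass_g <= 0:
        raise ValueError("adsorbent mass must be positive")
    return (c0_mg_per_l - ct_mg_per_l) * volume_l / mass_g


def removal_percent(c0_mg_per_l, ct_mg_per_l):
    """Percentage of the initial mercury removed from solution."""
    return 100.0 * (c0_mg_per_l - ct_mg_per_l) / c0_mg_per_l


# Hypothetical reading: 5 mg/L initial, 3 mg/L remaining (illustrative, not measured),
# 50 mL of solution and 20 mg (0.020 g) of adsorbent.
q = adsorption_capacity(5.0, 3.0, volume_l=0.050, mass_g=0.020)
removed = removal_percent(5.0, 3.0)
print(f"q = {q:.2f} mg/g, removal = {removed:.1f}%")
```

Under this relation, a smaller adsorbent mass at the same concentration drop yields a larger capacity per gram, which is consistent with the dosage trend reported in the results.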
Zeta potential analysis, Raman spectroscopy and Fourier transform infrared (FTIR) spectroscopy were utilized to characterize the Ak-CNTs, k-CNTs and P-CNTs. A Zetasizer (Malvern, UK) was used to identify the surface charge of the adsorbent particles. A PerkinElmer® FTIR spectrometer (Akron, OH, USA) with a wavenumber range of 400-4000 cm⁻¹ was used to determine the surface chemical modification of the adsorbents. The Raman shift was obtained using a Renishaw System 2000 to identify the degree of functionalization.

Artificial Neural Networks (ANNs)
ANNs are a mathematical method based on biological neural systems and have the ability to learn, store and recall information. They are formed from single neurons that are connected and arranged in a way that is described by their architecture [45,46]. There are three stages of modelling: training, validation and testing. During the training stage, a dataset is fed to the network, and a selected algorithm fits the dataset by adjusting the weights that connect the neurons. Transfer functions, such as sigmoid or linear functions, determine the calculations that occur during the processing of the data. The nonlinear response of the sigmoid transfer function enables the ANN to detect nonlinear relationships in the dataset during training [47]. The multi-layer perceptron (MLP) neural network is the most common and simplest ANN and belongs to the class of FFNNs. FFNNs consist of multiple layers, and the connections between the layers permit information to pass forward to the output layer only. Some networks use feedback connections that permit information to pass backwards or laterally within the network; these networks are named recurrent neural networks (RNNs) [48]. The NARX network is a nonlinear recurrent network with an exogenous input [49]. During pre-processing, the data are normalized between 0 and 1 to avoid network over-fitting. In this work, an FFNN, an LR neural network and a NARX neural network are proposed to model the adsorption capacity of the functionalized CNTs.
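The pre-processing and data partition described in this work can be sketched as follows. The example scales a variable to the 0-1 range with min-max normalization and separates 176 records into 151 for training and validation and 25 for testing. The function names are hypothetical, and the random shuffle is an assumption: the text does not say how the 25 test records were chosen.

```python
import random


def min_max_scale(column):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to scale
    return [(value - lo) / (hi - lo) for value in column]


def split_records(records, n_test, seed=0):
    """Shuffle a copy of the records and hold out n_test of them for testing.

    Random selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# The four pH levels used in the experiments, scaled to [0, 1].
scaled_ph = min_max_scale([3.0, 5.5, 6.0, 8.0])

# Record indices stand in for the 176 experimental samples.
train_val, test = split_records(range(176), n_test=25)
print(scaled_ph, len(train_val), len(test))
```

In practice the scaling limits would be taken from the training and validation records and reused for the test records, so that the held-out data do not influence the pre-processing.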
FFNNs have several neurons, which transfer the input value to the next layer. FFNNs use a supervised learning method to select the optimum parameters, such as the weight and bias values [50]. In FFNNs, the neurons in the same layer do not connect to each other but are instead connected to the next layer. The strength of each connection between layers is expressed by a weight value [51]. FFNNs learn the relationship between the inputs and the target data by starting from random initial weights and then updating the values by comparing the network results with the experimental results. In research using neural computations, different transfer functions are used to design proper networks, depending on the nonlinearity of the problem and the complexity of the data. The selection of a proper network structure depends on different variables, such as the type of transfer function, the number of hidden layers and the number of neurons. The best FFNN model performance was obtained by using three hidden layers and 10 neurons with tansig as the transfer function.
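The forward pass of such a feed-forward network can be sketched in a few lines. The example below is not the MATLAB model used in this work; it is an illustration that assumes 10 neurons in each of the three hidden layers, a linear output neuron, and untrained random weights. MATLAB's tansig is mathematically equivalent to the hyperbolic tangent, so `math.tanh` is used. All function names and the input values are hypothetical.

```python
import math
import random


def tansig(z):
    """Hyperbolic tangent sigmoid; mathematically equivalent to MATLAB's tansig."""
    return math.tanh(z)


def init_layer(n_in, n_out, rng):
    """Random weights (one row per neuron) and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """Weighted sum plus bias for each neuron, followed by the activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs forward: tansig in hidden layers, linear output (assumed)."""
    hidden = inputs
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, tansig)
    weights, biases = layers[-1]
    return dense(hidden, weights, biases, lambda z: z)


rng = random.Random(0)
sizes = [4, 10, 10, 10, 1]  # 4 inputs, three hidden layers of 10 (assumed), 1 output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Hypothetical normalized inputs: concentration, pH, dosage, contact time.
sample = [0.4, 0.5, 0.6, 0.25]
prediction = forward(sample, layers)
print(prediction)
```

Training would adjust the weights and biases so that the output matches the measured adsorption capacity; only the information flow from input to output is shown here.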
The LR neural network is a recurrent network in which each layer, including the hidden layer and the output layer, has a recurrent connection. The tap delay associated with this connection permits the network to produce a dynamic response to input samples in a time series [52]. The LR neural network is similar to the time delay and distributed delay networks, which have a finite impulse response. The LR neural network is characterized by the backward connection from the hidden layer output to the hidden layer input, which acts as a context unit. The selection of a proper network structure depends on different variables, such as the type of transfer function, the number of hidden layers and the number of neurons in the hidden layer. The best LR neural network model performance was obtained by using two hidden layers and 15 neurons with a tansig transfer function.
NARX is a recurrent neural network. It is a nonlinear network with an exogenous input. NARX consists of an input layer, a hidden layer and an output layer with a feedback connection [49]. NARX has the highest degree of generalization and speed of convergence compared to the other networks. The NARX network uses an iterative training process; the weights and biases are iteratively updated to improve the performance of the model at each step. The network outputs are regressed on the actual target values during the training step, with the actual data being fed to the network during the training phase. This method results in better training and learning, and in better performance than static networks such as the FFNN.
Because it includes an exogenous input, the NARX model has a greater degree of freedom compared with other networks. This degree of freedom decreases the number of parameters required by the model and increases the model accuracy. The NARX output is given by Equation (1):

$$y(t) = f\big(y(t-1),\, y(t-2),\, \ldots,\, y(t-n_y),\; u(t-1),\, u(t-2),\, \ldots,\, u(t-n_u)\big) \qquad (1)$$

where $f$ is a non-linear function, $y(t)$ is the network output, $u(t)$ is the network input, and $n_y$ and $n_u$ are the output and input orders. When $f$ is a multi-layer perceptron, the resulting system is identified as a NARX network [49]. The best NARX structure was found to use three hidden layers, and the general structures of the models are presented in Figure 7. One input layer is used with four input nodes (initial concentration of mercury ions, pH, adsorbent dosage and contact time) and one output layer with one node ($Q_c$). In Figure 7, $z$ is the delay element, $w_{ij}$ is the network weight and $b_h$ is the network bias. The development of the NARX model comprises training and validation steps using the same data set, followed by testing of the model accuracy. In total, 176 runs were used for the training, validation and testing of the NARX model. The training and validation steps were carried out in parallel before switching to the testing step. For training and validation, 151 runs were used, while the remaining 25 runs were used for model testing. The trainbr training function (Bayesian regularization) gave the best performance for training the model. Different network structures were tried in order to select the structure with the lowest error. The selected optimum structure consists of one input layer with 4 nodes, three hidden layers with 10 neurons and one output layer with one neuron; tansig was the best transfer function and was used for the modelling.
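The NARX relation, in which the output y(t) is predicted from the previous n_y outputs and the previous n_u inputs, implies that the training data must be arranged as lagged regressors. The sketch below shows one way to build them for open-loop training, where the measured past outputs are fed back. The function name, the lag orders and the short series are hypothetical and chosen only for illustration; they are not values from this study.

```python
def build_narx_regressors(u, y, n_u, n_y):
    """Arrange series into NARX training pairs.

    Each row is [y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)] (inputs flattened),
    and the matching target is y(t). Measured past outputs are used (open loop).
    """
    start = max(n_u, n_y)  # first t for which every required lag exists
    rows, targets = [], []
    for t in range(start, len(y)):
        past_outputs = [y[t - k] for k in range(1, n_y + 1)]
        past_inputs = [value for k in range(1, n_u + 1) for value in u[t - k]]
        rows.append(past_outputs + past_inputs)
        targets.append(y[t])
    return rows, targets


# Illustrative series only: two exogenous inputs per time step and one output.
u = [[0.0, 1.0], [0.1, 1.0], [0.2, 1.0], [0.3, 1.0], [0.4, 1.0]]
y = [0.0, 0.5, 0.8, 0.9, 0.95]

rows, targets = build_narx_regressors(u, y, n_u=1, n_y=2)
for row, target in zip(rows, targets):
    print(row, "->", target)
```

Each row can then be passed to a multi-layer perceptron playing the role of f, which is what makes the resulting system a NARX network.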
Different indicators were selected in this work to evaluate the accuracy and performance of the models using the actual and predicted results. The indicators used are RE, RMSE, MSE, MAPE and RRMSE.
In these indicators, $D_f(t)$ denotes the predicted results and $D_a(t)$ denotes the actual results.
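The indicators named above can be computed as in the following sketch. It uses common textbook definitions, which are assumptions here because the defining equations are not reproduced in the text; in particular, RRMSE is taken as the RMSE divided by the mean of the actual values, and other normalizations exist. The function name and the sample numbers are illustrative, not results from this study.

```python
import math


def error_indicators(actual, predicted):
    """Compute MSE, RMSE, RRMSE, MAPE (%) and maximum absolute RE (%).

    Assumed definitions: RE = 100 * (Da - Df) / Da per sample,
    RRMSE = RMSE / mean(Da). Actual values must be non-zero.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    rrmse = rmse / (sum(actual) / n)
    relative_errors = [100.0 * e / a for e, a in zip(errors, actual)]
    mape = sum(abs(r) for r in relative_errors) / n
    max_re = max(abs(r) for r in relative_errors)
    return {"MSE": mse, "RMSE": rmse, "RRMSE": rrmse, "MAPE": mape, "maxRE": max_re}


# Illustrative values only (mg/g), not measurements from the paper.
actual = [2.0, 4.0, 5.0]
predicted = [2.2, 3.8, 5.0]
indicators = error_indicators(actual, predicted)
for name, value in indicators.items():
    print(f"{name}: {value:.4f}")
```

Reporting the maximum relative error alongside the averaged indicators, as done in this work, shows the worst-case deviation that the averages can hide.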

Conclusions
The CNTs functionalized with an APB-based DES were able to efficiently remove mercury ions from water solution. A comparative study of the NARX, LR and FFNN models was conducted based on their performance and accuracy; the NARX model performed better than the FFNN and LR models. The effect of various parameters, including adsorbent dosage, pH, contact time and mercury ion concentration, was investigated. Three kinetic models (intraparticle diffusion, pseudo first-order and pseudo second-order) were applied to the predicted and experimental data; the pseudo second-order model described the data best. For the FFNN model, the maximum RE was 16.4%, R² was 0.9313 and MSE was 2.27 × 10⁻³. The LR model had a maximum RE of 15.02%, an R² of 0.9304 and an MSE of 2.2 × 10⁻³. The NARX model had a maximum RE of 9.79%, an R² of 0.9701 and an MSE of 1.15 × 10⁻³.