A Stepwise-Cluster Inference Model for Phenanthrene Immobilization at the Aqueous/Modified Palygorskite Interface

A stepwise-cluster inference (SI) model was established by introducing stepwise-cluster analysis (SCA) into the phenanthrene immobilization process at the aqueous/modified palygorskite interface. SCA has the advantage of handling the nonlinear relationships between environmental factors and the phenanthrene sorption amount in the immobilization process. The essence of SCA is to form a tree-based classification through a series of cutting or merging procedures under given statistical criteria. The results indicated that SI could help develop a statistical relationship between environmental variables and the phenanthrene sorption amount where discrete and nonlinear complexities exist. During the experiment, data were randomly sampled 10 times for model calibration and verification. The R² values (close to one) and root mean squared error (RMSE) values (close to zero) supported the prediction accuracy of the model. Compared to other statistical methods, the R² and RMSE values showed that SI was more straightforward for describing the nonlinear relationships and for precisely fitting and predicting the immobilization of phenanthrene. Based on the calculated effects of the inputs on the output in the SI model, the influence of environmental factors on phenanthrene immobilization was ranked in descending order as: initial phenanthrene concentration, ionic strength, pH, added humic acid dose, and temperature. The results reveal that SCA can be used to map the nonlinear and discrete relationships and elucidate the transport patterns of phenanthrene at the aqueous/modified palygorskite interface.


Introduction
Polycyclic aromatic hydrocarbons (PAHs) are a group of nonpolar hydrophobic contaminants with two or more fused benzene rings that originate from natural as well as anthropogenic sources. PAHs can be transported and accumulated in groundwater and surface water over long periods of time, and are difficult to biodegrade. Immobilization of PAHs on solid sorbents has an important effect on their fate, transport, and bioavailability in natural aquatic environments [1]. Clay minerals, such as palygorskite, have received substantial interest as a potential alternative to conventional sorbents such as activated carbon. Due to its intrinsic negative charges, palygorskite has been widely utilized as a sorbent for heavy metals [2]. However, being intrinsically hydrophilic, palygorskite shows low immobilization capacity for hydrophobic organic contaminants. One effective way to modify palygorskite is organomodification, that is, replacing the natural inorganic exchangeable cations with large organic cations of surfactant molecules. Previous studies regarding the immobilization of organic pollutants by organopalygorskite have concentrated on polar organic compounds such as 2,4-D herbicide [3], atrazine pesticide [4], and phenolic compounds [5]. However, the removal of nonpolar compounds from the aqueous phase by organopalygorskite is not well documented.
It has been reported that increasing the surfactant chain length and the number of alkyl chains per surfactant molecule can increase the immobilization of organic contaminants by organoclays [6]. Containing two hydrophilic heads and two long hydrophobic tails in one surfactant molecule, cationic gemini surfactants have received growing interest recently. Compared to their single-chain counterparts, gemini surfactants have increased hydrophobicity and can endow solids with increased organic matter. The sorbed surfactants can act as an effective partitioning medium for nonpolar or weakly polar organic pollutants [7]. In our previous studies, the enhanced solubilization capabilities of cationic gemini surfactants towards PAHs were demonstrated [8]. The enhanced retention of phenanthrene by soils with the addition of gemini surfactants has been well established [9]. However, there are limited data on modifying palygorskite with gemini surfactants for the removal of PAHs from the aqueous phase. The effects of solution chemistry on PAH immobilization to modified palygorskite still need to be clarified. Moreover, it is of great importance to elucidate the environmental factors controlling the sorption process, including the PAH concentration, pH, ionic strength, and temperature.
Mathematical modeling is one of the essential approaches for tackling the complicated relationships between environmental factors and the sorption amount in the immobilization process. Among modeling approaches, factorial designs are extensively used to determine whether several initial conditions have an impact on specific immobilization characteristics [10]. However, factorial analysis is not straightforward when evaluating the relation between multiple factors and the sorption amount. In addition, many model parameters need calibration and verification in order to achieve a satisfactory fit [11]. More recently, a few studies have developed statistical tools, such as the fuzzy factorial method [1], fuzzy logic modeling [12], support vector regression [13], multiple linear regression and tree regression analysis [14], and artificial neural networks [15]. However, many variables in the immobilization system can be either continuous or discrete, and the relations among them are inherently nonlinear, which leads to difficulties in applying these methods.
Stepwise-cluster analysis (SCA) is a type of nonparametric regression technique. The essence of SCA is to form a classification tree in the sense of probability, based on a series of cutting or merging procedures under given statistical criteria. A cluster can be cut into two sub-clusters, while two clusters can be merged into a new cluster during the iterative training process. Step by step, a tree is established when no clusters can be further cut or merged. SCA can not only deal with nonlinear relationships among continuous and/or discrete variables, but also clearly show the significance levels of different branches. There are a few applications of SCA for the prediction of air quality in the atmospheric environment [16], groundwater remediation [17], hydrological processes [18][19][20][21], streamflow prediction [22,23], composting [11,24], and climate change [25,26]. However, the use of SCA to reveal the complicated relationships between environmental factors and process characteristics in the PAH immobilization process has not been reported. Moreover, the potential of SCA for tackling discrete and nonlinear complexities in the immobilization system deserves investigation.
Therefore, this study aims to develop a stepwise-cluster inference (SI) model by introducing SCA into the PAH immobilization process to tackle the nonlinear relationships between environmental parameters and the PAH sorption amount. Specifically, the objectives are: (1) to establish a stepwise-cluster inference model for predicting variations of PAH immobilization at the aqueous/modified palygorskite interface; (2) to verify the proposed modeling system based on the data obtained from the PAH immobilization process; and (3) to evaluate the effects of multiple factors on the distribution of PAHs in the water/modified palygorskite system.

Materials
Phenanthrene was selected as the representative PAH, and was purchased from Sigma-Aldrich Canada Co. (Oakville, ON, Canada), with a purity greater than 98%. Cationic gemini surfactant (N1-dodecyl-N1, N1, N2, N2-tetramethyl-N2-octylethane-1,2-diaminium bromide, 12-2-12) was obtained from Chengdu Organic Chemicals Co., Ltd. (Chengdu, China), with a purity of 98%. The water solubility of phenanthrene is 1.06 mg/L at 298 K and the octanol-water partition coefficient (log Kow) is 4.57. The stock solution of phenanthrene was made by dissolving the desired amount of pure phenanthrene in High Performance Liquid Chromatography (HPLC)-grade methanol, and was stored in a dark place at 277 K in an amber borosilicate bottle to minimize photodegradation and volatilization. Palygorskite was obtained from Huaiyuan Mining Co. Ltd. (Xuyi, China). The gemini surfactant modified palygorskite was prepared according to the method reported in our previous studies [27].

Sorption Studies
The batch sorption experiments were conducted in 20 mL glass vials. First, 0.1 g of clay sample was added to the glass vial, followed by the appropriate amount of deionized water. The background solution contained 0.01 M NaCl as an electrolyte and the pH was 7. Then a pre-calculated volume of phenanthrene stock solution was added to each vial, so that the initial concentration of phenanthrene was pre-determined for each sample. The vials were sealed with Teflon-lined screw caps, vortexed for 20 s, and then placed in a reciprocal shaker at 200 rpm for 24 h to reach the sorption equilibrium. Preliminary experiments showed that 24 h was sufficient for the sorption process to reach equilibrium and that the experimental loss of phenanthrene was negligible [28][29][30]. Before testing, the samples were left to stand for 10 min to separate the sorbent from the solution. An appropriate aliquot of supernatant was then carefully withdrawn with a volumetric pipette to determine the residual amount of phenanthrene. Meanwhile, control experiments without phenanthrene were conducted and the supernatant of the control samples was analyzed as the background concentration for phenanthrene. A quality assurance/quality control test was conducted and the phenanthrene recovery rate was between 96% and 103%.
To investigate the influence of temperature on the immobilization of phenanthrene, sorption experiments were carried out at 283, 293, and 303 K. The solutions contained 0.01 M NaCl and the pH was kept at 7; the solutions were then placed in a reciprocal shaker at 200 rpm for 24 h. To study the effect of pH on the immobilization of phenanthrene, the immobilization behaviors of phenanthrene were investigated at pH 3, 7, and 11. The pH value was adjusted with standard HCl or NaOH solution, and the ion concentration in the system was kept constant at 0.01 M. The initial concentrations of phenanthrene ranged from 0.3 to 1 mg/L. The temperature was kept constant at 293 K. Batch experiments were performed in the manner described for the immobilization studies. The effect of humic acid (HA) on the immobilization behaviors of phenanthrene was examined in a modified palygorskite/water system. The experiments were conducted in the presence of HA ranging from 0 to 80 mg organic carbon/L (OC/L). An appropriate volume of phenanthrene stock solution was added and the initial concentrations of phenanthrene varied from 0.3 to 1 mg/L. The temperature was kept constant at 293 K. Batch experiments were performed following the sorption test procedures. To investigate the influence of ionic strength, immobilization tests were conducted with NaCl added at different concentrations (0.01, 0.1, and 1 M). The initial concentrations of phenanthrene ranged from 0.3 to 1 mg/L and the pH value was 7.
The vials were placed on a reciprocal shaker at 293 K and 200 rpm for 24 h to reach the sorption equilibrium. After preliminary experiments, a 3^5 full factorial design approach was adopted to represent various factorial effects on phenanthrene immobilization, including the initial phenanthrene concentration, added HA dose, ionic strength, temperature, and pH. The three levels of each parameter, obtained from the preliminary experiments, are indicated in Table 1. A total of 3^5 × 2 = 486 experiments covering all possible combinations of variables were conducted, with each combination run in duplicate.
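The run count of the factorial design can be checked by enumerating the design points. The following is a minimal sketch, not the authors' procedure. The levels for ionic strength, temperature, and pH are the values quoted in the text; the levels for the initial phenanthrene concentration and HA dose are placeholders chosen within the reported ranges, since the actual levels are given in Table 1. All variable names are illustrative.

```python
from itertools import product

# Three levels per factor. Ionic strength, temperature, and pH use the values
# quoted in the text; the other two factors use HYPOTHETICAL levels within the
# reported ranges (the actual levels are listed in Table 1).
levels = {
    "phenanthrene_mg_per_L": [0.3, 0.65, 1.0],  # hypothetical levels
    "ha_mg_oc_per_L": [0, 40, 80],              # hypothetical levels
    "ionic_strength_M": [0.01, 0.1, 1.0],
    "temperature_K": [283, 293, 303],
    "pH": [3, 7, 11],
}
replicates = 2  # every combination is run in duplicate

# All 3^5 combinations of factor levels.
combinations = list(product(*levels.values()))

# One record per experimental run (combination x replicate).
runs = [
    dict(zip(levels, combo), replicate=r + 1)
    for combo in combinations
    for r in range(replicates)
]

print(len(combinations))  # 243
print(len(runs))          # 486
```

Enumerating the design in this way also gives a convenient run sheet when the order of the experiments is to be randomized.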

Analytical Methods
Phenanthrene was analyzed using HPLC. The HPLC instrument, an Agilent 1260 Infinity LC System (USA), was equipped with a vacuum degasser, binary pump, autosampler, thermostated column compartment (set to 303 K), diode array detector (DAD), and ZORBAX Eclipse PAH column (3.5 µm particle size, 4.6 mm ID × 150 mm). A mobile phase consisting of acetonitrile/water (75/25, v/v) was used at a flow rate of 1.0 mL/min. Phenanthrene was monitored with the DAD at 250 nm. The amount of phenanthrene/surfactant sorbed to the sorbent was calculated as the difference between the initial amount added and the amount remaining in the solution. The pH measurements were conducted with a SevenEasy S20K pH meter (Mettler-Toledo, Columbus, OH, USA). All tests were conducted in duplicate and the typical error in the measurement was less than ±10%.
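The mass balance described above is commonly written per unit mass of sorbent. The notation below is introduced here for illustration and does not appear in the original text: q_e is the sorbed amount (mg/g), C_0 and C_e are the initial and residual aqueous concentrations (mg/L), V is the solution volume (L), and m is the sorbent mass (g).

```latex
q_e = \frac{(C_0 - C_e)\, V}{m}
```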

Data Collection and Analysis
The candidate inputs (environmental parameters) were state variables, which had potential effects on phenanthrene immobilization; the inputs included the initial phenanthrene concentration (X1), added HA dose (X2), ionic strength (X3), temperature (X4), and pH (X5). The phenanthrene sorption amount (Y) was the output (dependent variable). A total of 486 samples were obtained. The data were divided into training and test sets. The training set was used for shaping the SCA trees and the test set for verifying the developed model. The model calibration and verification were conducted 10 times through SCA methods to assure the stability of the model. In each experimental run, 162 and 81 samples were randomly selected as the training and the test set, respectively. The correlation coefficient (R) and the root mean squared error (RMSE) were employed to evaluate the performance of the SCA model. The RMSE is represented as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(v_j - y_j\right)^2}$$

where n is the number of samples in the training or test set, and v_j and y_j are the predicted and observed values of the phenanthrene immobilization amount in the jth sample.
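The evaluation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the seed handling, and the toy numbers are hypothetical, and R is computed as the Pearson correlation between predicted and observed values.

```python
import math
import random

def rmse(predicted, observed):
    """Root mean squared error between predicted (v_j) and observed (y_j) values."""
    n = len(observed)
    return math.sqrt(sum((v - y) ** 2 for v, y in zip(predicted, observed)) / n)

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(observed)
    mean_v = sum(predicted) / n
    mean_y = sum(observed) / n
    cov = sum((v - mean_v) * (y - mean_y) for v, y in zip(predicted, observed))
    var_v = sum((v - mean_v) ** 2 for v in predicted)
    var_y = sum((y - mean_y) ** 2 for y in observed)
    return cov / math.sqrt(var_v * var_y)

def random_split(samples, n_train, n_test, seed):
    """Randomly draw disjoint training and test sets of the requested sizes."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

# Ten repeated random partitions with the set sizes stated in the text.
# Sample indices stand in for the experimental records.
sample_ids = list(range(162 + 81))
partitions = [random_split(sample_ids, 162, 81, seed=run) for run in range(10)]

# Toy predicted/observed sorption amounts (mg/g), purely illustrative.
observed = [0.18, 0.34, 0.36, 0.45]
predicted = [0.19, 0.33, 0.36, 0.44]
print(rmse(predicted, observed), pearson_r(predicted, observed))
```

Fixing the seed per run keeps each of the ten partitions reproducible, which helps when comparing different models on identical splits.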

Stepwise-Cluster Analysis
The classical methods are mainly classification methods associated with a few complexities. For example, the variable observations collected for classical methods may be uncertain due to causes such as instrumental or operational errors. The multiple variables selected to characterize the sorption process may be dependent on each other. A common condition for most statistical classification methods is that the variable samples should come from a normally distributed population. In addition, many classical methods rely on subjective judgements to support the screening of variables, selection of classification thresholds, and setting of sample sizes or numbers prior to classification practices. These issues challenge the effectiveness of existing classification methods, the reliability of classification results, and the soundness of impact studies or other related research. Recently, nonparametric statistical methods have received much attention because of their superior capability in capturing nonlinear and discrete relationships between state and response variables [17]. Stepwise-cluster analysis (SCA) is an emerging nonparametric regression technique. It consists of a series of cutting or merging operations according to given statistical criteria and finally generates a cluster tree in the sense of probability. SCA is capable of reflecting differences both between and within clusters, therefore improving the prediction accuracy [24].
The whole or part of the training set can be treated as one cluster (α), including n_α samples, m independent variables (X), and one dependent variable (Y), indicated as follows:

$$X_\alpha = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n_\alpha 1} & \cdots & x_{n_\alpha m} \end{pmatrix}, \qquad Y_\alpha = \left(y_1, \ldots, y_{n_\alpha}\right)^{T}$$

For the SCA flow chart with detailed descriptions, please refer to our previous study [11]. Let cluster α be cut into two sub-clusters β and γ (with n_β and n_γ samples, respectively). According to Wilks' likelihood-ratio criterion, the cutting point is optimal only if Wilks' Λ value is a minimum [31]: the smaller the Λ value, the bigger the difference between the sample means of β and γ. Since Λ is directly related to the F statistic, the sample means of the two sub-clusters can be compared for significant differences through an F test. Therefore, the criteria for cutting (or merging or not merging) clusters are based on F tests. All sub-clusters produced from the original dataset enter a set of iterative cutting (or merging) runs until all hypotheses of further cutting (or merging) are rejected or the minimum number of samples (N_min) within every cluster is reached. After all calculations and tests are completed, an SCA tree can be built, which indicates that the training is done. When a new sample (x_1, x_2, ..., x_m, y_1; y_1 is unknown) enters the tree, it follows the routes decided by its independent variables (x_1, x_2, ..., x_m) at each cutting point and finally drops into a tip cluster, which cannot be cut or merged further. The predicted value of y_1 is the mean of the dependent variable of the training samples in the tip cluster. Therefore, the SCA tree can predict new dependent variables when new samples enter the tree from top to bottom.
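The cutting criterion can be illustrated for the simplest case of one dependent variable and one candidate independent variable. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that, with a single dependent variable, Wilks' Λ reduces to the ratio of the within-cluster to the total sum of squares, and that for two sub-clusters the corresponding F statistic is (1 − Λ)/Λ × (n_β + n_γ − 2) with (1, n_β + n_γ − 2) degrees of freedom. The function names and toy data are hypothetical.

```python
import math

def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def wilks_lambda(y_beta, y_gamma):
    """Wilks' lambda for one dependent variable: within SS / total SS."""
    within = sum_sq(y_beta) + sum_sq(y_gamma)
    total = sum_sq(list(y_beta) + list(y_gamma))
    if total == 0.0:
        return 1.0  # all responses identical: no difference between sub-clusters
    return within / total

def f_statistic(lam, n_beta, n_gamma):
    """F statistic for two sub-clusters and one dependent variable."""
    if lam == 0.0:
        return math.inf  # perfect separation
    return (1.0 - lam) / lam * (n_beta + n_gamma - 2)

def best_cut(x, y, n_min=3):
    """Scan thresholds on one independent variable and return the
    (threshold, lambda, F) triple with the smallest lambda, or None."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(n_min, len(pairs) - n_min + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical x values
        y_beta = [p[1] for p in pairs[:i]]
        y_gamma = [p[1] for p in pairs[i:]]
        lam = wilks_lambda(y_beta, y_gamma)
        if best is None or lam < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, lam, f_statistic(lam, len(y_beta), len(y_gamma)))
    return best

# Toy data: initial concentration (mg/L) against sorbed amount (mg/g).
x = [0.3, 0.3, 0.3, 0.65, 0.65, 0.65, 1.0, 1.0, 1.0]
y = [0.10, 0.12, 0.11, 0.30, 0.31, 0.29, 0.45, 0.46, 0.44]
print(best_cut(x, y))
```

In a full SCA run the same scan would be repeated over every independent variable and every cluster, and the resulting F value would be compared with the critical value at the chosen significance level (a p value could be obtained, for example, with `scipy.stats.f.sf`) before a cut is accepted.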

Sorption Studies
The performance of gemini surfactant modified palygorskite for the removal of phenanthrene under the impacts of different state variables was investigated. From Figure 1a-d, it can be seen that the phenanthrene immobilization was affected by the initial concentration, with a significant sorption increase as the initial concentration increased from 0.3 to 1.0 mg/L. The initial concentration provided an important driving force for overcoming all mass transfer resistance of phenanthrene molecules between the aqueous and solid phases [1]. In Figure 1a, the effects of ionic strength on the immobilization of phenanthrene on modified palygorskite were evaluated, with phenanthrene from 0.3 to 1.0 mg/L, ionic strength from 0.01 to 1 M, and a constant pH of 7. The phenanthrene sorbed amounts were found to increase significantly with the increasing NaCl concentration from 0.01 to 1 M. The maximum phenanthrene immobilization was 0.476 mg/g at NaCl of 1 M. It could, to a certain extent, be explained by the "salting-out" effect, which refers to the reduced solubility of organic compounds in salt solutions [32]. Phenanthrene is a nonionic compound and the immobilization was partially due to hydrophobic interaction mechanisms. The decrease in solubility caused an increase in the immobilization capacity of the sorbents due to increasing hydrophobic interactions induced by the increasing NaCl concentration [33]. Generally, the amount of phenanthrene sorbed by modified palygorskite increased with the increasing NaCl concentrations at a given phenanthrene aqueous concentration. The addition of more salt ions, therefore, can enhance the retention of phenanthrene on modified palygorskite. This is of special interest for the removal of PAHs through modified palygorskite from an aqueous phase with a high salinity level.
Water 2017, 9, 590

The effects of pH on phenanthrene immobilization onto modified palygorskite were studied through batch experiments at pH 3, 7, and 11.
Figure 1b shows that the pH level can significantly affect phenanthrene immobilization on modified palygorskite. The maximum uptake of phenanthrene was 0.455 mg/L at pH 3 for the gemini surfactant modified palygorskite. Then the sorption reduced markedly from pH 3 to 7 at all given concentrations of phenanthrene. A further increase of pH from 7 to 11 had less pronounced effects on phenanthrene immobilization. Three mechanisms applicable to PAH sorption were proposed: the hydrophobic interactions, the electron donor-acceptor interaction, and the π-π interaction [34]. Because palygorskite clay minerals contained low amounts of humic substances [35], mechanisms such as enhanced dipole interaction between the charged surface (electron acceptors) and phenanthrene with electron-rich π systems (electron donors) might be the major cause of phenanthrene immobilization. In addition, the solution pH affected the surface properties of the sorbent and the degree of ionization of the sorbate [36]. The pHpzc of modified palygorskite was 7.9. The mineral surface would be covered with negative charges when the pH was higher than pHpzc. The modified palygorskite was subjected to protonation at low pH or deprotonation at elevated pH [37]. This would affect the sorption sites for PAHs on the sorbent surface. The above-mentioned mechanisms were to some extent related to solution pH, resulting in a dependence of PAH sorption on aqueous pH. The detailed mechanisms depended on the physical and chemical properties of the interactive sorbate-sorbent system and still needed further investigation.
The effects of HA on phenanthrene immobilization to gemini-surfactant-modified palygorskite as a function of HA dose (0-80 mg OC/L) were investigated and the results are shown in Figure 1c. The initial phenanthrene concentrations were 0.5, 0.75, and 1 mg/L, respectively. The immobilization of phenanthrene increased at the low HA level and then decreased as more HA was added. Specifically, when the added dose of HA was less than 4 mg OC/L, the sorbed phenanthrene amount varied from 0.442 to 0.460 mg/g. However, the binding affinity of phenanthrene to the modified palygorskite decreased with the further addition of HA. When the HA concentration was gradually increased to 80 mg OC/L, the retention of phenanthrene decreased from 0.460 to 0.416 mg/g. At a low HA dose, the initial phenanthrene sorption increase was presumably due to the binding of phenanthrene to HA along with the sorption of HA onto modified palygorskite, forming a complex of sorbed HA with phenanthrene [38]. The hydrophobic interaction mechanism was reported as the major factor responsible for the binding of PAHs to HA [39]. This process could also be produced by the π-π interactions between phenanthrene and humic substance acceptor groups in aqueous solutions [40]. The presence of HA at the low concentration facilitated the uptake of phenanthrene due to the cosorption of HA and the phenanthrene complex on the modified palygorskite surface. At higher HA concentrations, phenanthrene immobilization decreased with increasing HA concentration. Instead of cosorption, the HA molecules would inhibit phenanthrene retention through competition for limited sites on modified palygorskite surfaces [41]. With the increasing dose of HA, HA molecules would occupy the pores of modified palygorskite surfaces that were large enough to accommodate phenanthrene molecules, thus reducing the accessibility of phenanthrene molecules to sorption sites and reducing the phenanthrene sorption on modified palygorskite [42]. In addition, the high hydrophobicity of phenanthrene enhanced its ability to be dissolved into the hydrophobic HA molecules in the solution and impeded the phenanthrene retention on the modified palygorskite. The pHpzc value of modified palygorskite was 7.9. Under the test pH, the HA sorption would decrease and more HA molecules would enter the aqueous phase. This would also result in enhanced phenanthrene solubility in the solution and thus in reduced phenanthrene sorption on the sorbent.
The effects of temperature on phenanthrene immobilization are shown in Figure 1d. There was only a slight difference in phenanthrene sorption when the temperature varied from 283 to 293 K. However, the sorption decreased from 0.452 to 0.445 mg/g as the temperature increased from 293 K to 303 K. This indicated that temperature had a negative effect on phenanthrene immobilization. The uptake of phenanthrene was found to decrease with increasing temperature, indicating that phenanthrene immobilization on modified palygorskite was favored at lower temperatures. The reduction in phenanthrene immobilization capacity with increasing temperature from 293 to 303 K indicated an exothermic nature of the immobilization process. One explanation was that increasing temperature could enhance phenanthrene solubility and thus decrease immobilization [43].

SCA Trees
An SCA tree was established to reflect the relationships between the state variables and the dependent variable. There were several factors affecting the performance of the SCA model, including the quality of the original experimental data, the internal configuration parameters (α and N_min), the combination of state variables, and the data partition strategy [11]. The internal parameters had significant effects on the shape of the SCA trees, since SCA relied on these parameters for the cutting and merging procedures as well as for stopping the iterations. The criteria for cutting and merging clusters were: cut a cluster when p ≤ α and merge clusters when p > α, where the p values used at cutting and merging knots were significance levels of the F test. In this study, a default significance level of 0.01 was set for both cutting and merging exercises. In general, the higher the α value, the lower the F level (i.e., a decreased strictness in the cutting which would result in more cutting operations). Similarly, the lower the α value, the higher the F level (i.e., more merging operations because of the reduced strictness of mergence). N_min was set as 3 in this study and it also had significant effects on the scales of the cluster trees, since it was used as one of the end criteria in training the SCA tree.
As for experimental run No. 3, the corresponding SCA tree is described in Figure 2; it formed a forecasting system to reflect the phenanthrene immobilization process at the water/modified palygorskite interface. Based on the tree, the phenanthrene immobilization can be predicted. For example, let X1 = 0.9, X2 = 20, X3 = 0.3, X4 = 300, and X5 = 9 be new inputs. To predict the corresponding phenanthrene immobilization, we have: X1 > 0.6 for the first branch knot, so that the sample enters cluster 3; X1 ≤ 0.9, so that it enters cluster 6; X3 ≤ 0.325, so that it enters cluster 12; X5 > 8.75, so that it enters cluster 19; and X2 > 7, so that it finally enters cluster 29 with a prediction value of 0.337 ± 0.004. Let X1 = 0.4, X2 = 6, X3 = 0.2, X4 = 290, and X5 = 6.5 be new inputs. To predict the corresponding phenanthrene immobilization, we have: X1 ≤ 0.6 for the first branch knot, so that the sample enters cluster 2; X3 ≤ 0.325, so that it enters cluster 4; X5 > 5.25, so that it enters cluster 9; and X2 ≤ 7, so that it enters cluster 16 and finally merges into cluster 38 with a prediction value of 0.182 ± 0.011. Let X1 = 0.65, X2 = 12, X3 = 0.08, X4 = 310, and X5 = 8.5 be new inputs. To predict the corresponding phenanthrene immobilization, we have: X1 > 0.6 for the first branch knot, so that the sample enters cluster 3; X1 ≤ 0.9, so that it enters cluster 6; X3 ≤ 0.325, so that it enters cluster 12; X5 ≤ 8.75, so that it enters cluster 18; X3 > 0.075, so that it enters cluster 27; and X5 > 5.25, so that it enters cluster 37 and finally merges into cluster 40 with a prediction value of 0.359 ± 0.021. Similarly, we can obtain other prediction data with respect to any combination of X1-X5 through the SCA tree.
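The three worked predictions can be traced programmatically. The sketch below encodes only the branches quoted in the text for Run No. 3; it is a partial, illustrative reconstruction and not the full tree of Figure 2. The data layout, the function name, and the labeling of the root as cluster 1 are assumptions, and any branch not described in the text is left undefined.

```python
# Partial tree: only the branches quoted in the text for Run No. 3.
# Each cutting node maps to (variable, threshold, child if <=, child if >).
# None marks a branch that the text does not describe.
NODES = {
    1: ("X1", 0.6, 2, 3),       # root labeled cluster 1 (assumption)
    2: ("X3", 0.325, 4, None),
    3: ("X1", 0.9, 6, None),
    4: ("X5", 5.25, None, 9),
    6: ("X3", 0.325, 12, None),
    9: ("X2", 7, 16, None),
    12: ("X5", 8.75, 18, 19),
    18: ("X3", 0.075, None, 27),
    19: ("X2", 7, None, 29),
    27: ("X5", 5.25, None, 37),
}
MERGES = {16: 38, 37: 40}                  # clusters merged into a new cluster
TIPS = {29: 0.337, 38: 0.182, 40: 0.359}   # tip-cluster means (mg/g)

def predict(sample):
    """Drop a sample through the partial tree; return (prediction, path)."""
    node = 1
    path = [node]
    while node not in TIPS:
        if node in MERGES:
            node = MERGES[node]
        elif node in NODES:
            variable, threshold, child_leq, child_gt = NODES[node]
            node = child_leq if sample[variable] <= threshold else child_gt
            if node is None:
                raise ValueError("branch not described in the text")
        else:
            raise ValueError("cluster not described in the text")
        path.append(node)
    return TIPS[node], path

samples = [
    {"X1": 0.9, "X2": 20, "X3": 0.3, "X4": 300, "X5": 9},
    {"X1": 0.4, "X2": 6, "X3": 0.2, "X4": 290, "X5": 6.5},
    {"X1": 0.65, "X2": 12, "X3": 0.08, "X4": 310, "X5": 8.5},
]
for sample in samples:
    print(predict(sample))
```

Note that X4 (temperature) never appears on any of the quoted routes, which agrees with temperature being ranked as the least influential factor.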
Figure 2. The stepwise-cluster analysis (SCA) tree for phenanthrene immobilization at the aqueous/modified palygorskite interface (Run No. 3; the cutting nodes are highlighted in red while the merging nodes are highlighted in blue). X1: Initial phenanthrene concentration; X2: Added HA dose; X3: Ionic strength; X4: Temperature; X5: pH.
The tree clearly shows the role of every environmental parameter in mapping the relationship. However, the SCA tree could not directly quantify the effects of environmental factors on phenanthrene immobilization. To this end, the effect of each input (X_k) on the output (Y) in SCA is defined following [11] in terms of two quantities: NN, the total number of nodes in the SCA tree, and N_Xk(i), the number of patterns (samples) at node i of the SCA tree where the corresponding X_k variable is used as the cutting criterion. Correspondingly, N_Xk(i) represents the number of samples on which the X_k variable has influence at node i. To a certain extent, the layers (nodes) where X_k is located in the SCA tree represent the weight of the effect that X_k has on Y. This is because the higher an independent variable is in the SCA tree, the earlier the classification criteria depend on that variable, and the more data patterns on which X_k has an effect. Therefore, based on the obtained SCA tree, the effects of the environmental factors on phenanthrene immobilization were ranked in descending order as: initial phenanthrene concentration, ionic strength, pH, added HA dose, and temperature. The results are consistent with our previous studies using the fuzzy factorial method [1]. One of the advantages of SCA is that it has no requirement for the format of training and test data, while the factorial method requires a preliminary experimental design for the levels of input data. Figures 3-5 and Table 2 show the results obtained using least squares support vector machines (LSSVM) and random forest (RF) models for experimental run No. 3.
These models were chosen for comparison because the current models used for constructing relationships in sorption studies are mainly based on them. An LSSVM-related model showed a higher predictive capability than the linear method for the sorption of methylene blue [44], and RF proved more powerful than multiple linear regression in predicting the sorption of bromophenol blue on activated carbon sorbents [45]. For the immobilization of phenanthrene on modified palygorskite, the coefficients of determination ($R^2$) for the training and test sets of SCA (0.994 and 0.992) were higher than those of LSSVM (0.843 and 0.710) and those of RF (0.881 and 0.883). The RMSEs for the training and test sets of SCA (0.00837 and 0.0101) were lower than those of LSSVM (0.0454 and 0.0642) and those of RF (0.0577 and 0.0603). For RF, the calibration and verification $R^2$ values were higher than 0.8 and the dataset as a whole was distributed along a straight line; however, the points were unevenly distributed across the range and deviated from the line. Moreover, the calibration and verification ability of LSSVM was limited. These results confirmed the better fitting and predictive ability of SCA relative to LSSVM and RF when building nonlinear relationships between environmental factors and the sorption amount of phenanthrene in the immobilization process.
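The two goodness-of-fit measures used in this comparison can be sketched as follows. This is a minimal illustration assuming the usual definitions ($R^2 = 1 - SS_{res}/SS_{tot}$ and RMSE as the square root of the mean squared residual); the text does not state which form of $R^2$ was computed, and the function names and the four sample values are hypothetical, not data from this study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sorption amounts (mg/g), for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.31, 0.39]

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.4f}")
```

Applying the same two functions to the training and test predictions of each model would give a like-for-like comparison of the kind reported in Table 2.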
In this study, the data were sampled 10 times for model calibration and verification through SCA, so that calibration and verification were each conducted 10 times to assure the stability of the model. Each experimental run consisted of 243 samples, with 162 samples in the training set and 81 samples in the test set. The data were randomly sampled based on a uniform distribution. The $R^2$ and RMSE values of SCA from the ten experimental runs are listed in Table 3. All the $R^2$ values approach one and all the RMSE values approach zero, which confirms the excellent prediction performance of the model when building nonlinear relationships through SCA.

From the 10 sampled datasets, 10 SCA trees were built accordingly. The sequence of the significance of the factors is listed in Table 4. $X_1$ (initial phenanthrene concentration) and $X_3$ (ionic strength) are the two most significant factors influencing the sorption process in all the experimental runs. $X_2$ (added HA dose) and $X_5$ (pH) are less significant than $X_1$ and $X_3$, and their relative order may differ between runs: the sample size becomes smaller after several cutting/merging steps, and the system is less stable with a small dataset. The least significant of the five factors is $X_4$ (temperature), though it still has a certain effect on the immobilization process (Figure 1). Overall, the ordering of the effects of the input variables was basically the same across the multiple trees, indicating that SCA can be used to map the nonlinear and discrete relationships and elucidate the effects of factors on phenanthrene immobilization at the aqueous/modified palygorskite interface.
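The ranking of factors from an SCA tree can be sketched in code. The exact expression from [11] is not reproduced in this text, so the sketch below assumes the effect of $X_k$ is the plain sum of $N_{X_k}(i)$ over all nodes where $X_k$ is the cutting variable. The node list, its sample counts, and the function names are hypothetical and are not taken from the trees built in this study.

```python
from collections import defaultdict


def input_effects(nodes):
    """Sum the sample counts of the nodes cut on each input variable.

    `nodes` is a list of (cut_variable, n_samples) pairs; cut_variable is
    None for nodes that do not cut (merging or leaf nodes).
    Assumption: effect(X_k) = sum over cutting nodes i of N_Xk(i).
    """
    effects = defaultdict(int)
    for cut_variable, n_samples in nodes:
        if cut_variable is not None:
            effects[cut_variable] += n_samples
    return dict(effects)


def rank_inputs(nodes):
    """Return the input variables ordered from largest to smallest effect."""
    effects = input_effects(nodes)
    return sorted(effects, key=effects.get, reverse=True)


# Hypothetical tree: variables near the root act on more samples.
toy_nodes = [
    ("X1", 162),  # root node, cut on X1 over a full training set
    ("X3", 100),
    ("X1", 62),
    ("X5", 60),
    (None, 40),   # merging or leaf node, no cutting variable
    ("X2", 30),
    ("X4", 12),
]

print(input_effects(toy_nodes))
print(rank_inputs(toy_nodes))
```

Under this assumed form, a variable that cuts near the root accumulates a larger count than one that only cuts small subsets deep in the tree, which is the intuition given above for why tree position reflects the weight of a factor. Dividing each sum by a common total would not change the ordering.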

Conclusions
In this paper, a stepwise-cluster inference model was developed to analyze the nonlinear relationships between phenanthrene immobilization at the water/modified palygorskite interface and five environmental factors: initial phenanthrene concentration, pH, ionic strength, temperature, and added HA dose. The results indicated that SI could help establish a statistical relationship between environmental parameters and phenanthrene immobilization where discrete and nonlinear complexities exist. Compared to the LSSVM and RF methods, SI was more straightforward for describing the nonlinear relationships and precisely fitting and predicting the immobilization of phenanthrene. $R^2$ and RMSE were calculated to demonstrate the accuracy of the developed forecasting trees. Through the calculation of the input effects on the output in the SI model, the effects of the environmental factors on phenanthrene immobilization were ranked in descending order as: initial phenanthrene concentration, ionic strength, pH, added HA dose, and temperature. The maximum phenanthrene immobilization on modified palygorskite was 0.476 mg/g under the experimental conditions tested, and the corresponding removal efficiency of phenanthrene was higher than 94%. To the best of our knowledge, this study is the first attempt to introduce SCA into mapping the nonlinear relationships involved in PAH immobilization processes. It is expected that SCA can be extended to other complicated relationships in the immobilization processes of multiple types of sorbent/sorbate systems.

Figure 4. The calibration (a) and verification (b) of phenanthrene sorption data using the random forest (RF) method (Run No. 3).

Figure 5. The calibration (a) and verification (b) of phenanthrene sorption data using the least squares support vector machines (LSSVM) method (Run No. 3).

Table 1. The $3^5$ experimental design of phenanthrene immobilization at the aqueous/modified palygorskite interface.

Table 2. Comparisons among SCA, LSSVM, and random forest for the fitting of phenanthrene sorption data (Run No. 3).

Table 3. $R^2$ and RMSE of the SCA method over 10 experimental runs.

Table 4. The sequence of the significance of the factors in the SCA tree.