A Review on Progress in QSPR Studies for Surfactants

This paper reviews recent progress in quantitative structure-property relationship (QSPR) studies of surfactants and the application of various molecular descriptors. QSPR studies on the critical micelle concentration (cmc) and surface tension (γ) of surfactants are introduced. Studies of the charge distribution in ionic surfactants by quantum chemical calculations, and of its effects on the structures and properties of surfactant colloids, are also reviewed. Trends in QSPR studies on the cloud point (of nonionic surfactants), the biodegradation potential and some other properties of surfactants are evaluated.


Introduction
Surfactants are usually amphiphilic organic compounds, meaning that they contain both hydrophobic groups (their "tails") and hydrophilic groups (their "heads"). As a result, they have affinity for both organic solvents and water. Owing to their unique amphiphilic structures, surfactants have been widely used in traditional industries [1], and their applications in many fields of science and technology have recently expanded [2][3][4]. The preparation of size-controllable nanoparticles using micelles and microemulsions, and of porous materials using liquid crystals, has been well documented [5,6]. Various membrane structures consisting mainly of amphiphilic molecules can be applied in fields such as photochemical solar energy conversion, molecular recognition, pharmaceutical formulation, drug targeting and sustained release, the provision of unique micro-environments for substrates and enzymes, enzyme immobilization [7][8][9][10], and even the remediation of environmental contamination [11].
On the other hand, close attention has been paid to the environmental impact, especially on soils and waters, of the large quantities of surfactants in use [12]. It is particularly worth noting that perfluorooctane sulfonate (PFOS) and perfluorooctanoic acid (PFOA) surfactants are an important class of perfluorinated compounds (PFCs) and, because of their high chemical stability and slow degradation, also a class of emerging persistent organic pollutants (POPs). Fluorosurfactants are synthetic organofluorine compounds that contain multiple fluorine atoms; they can be polyfluorinated or fluorocarbon-based (perfluorinated) [13]. Fluorosurfactants have a fluorinated "tail" and a hydrophilic "head", and they are more effective at lowering the surface tension of water than comparable hydrocarbon surfactants: they can lower it to about half the value attainable with hydrocarbon surfactants [14]. Some fluorosurfactants, such as PFOS, have been detected in humans and wildlife, and fluorosurfactants tend to concentrate at the liquid-air interface because of the lipophobic nature of fluorocarbons [15]. Fluorocarbon chains interact only weakly through London dispersion forces, the basis of lipophilicity, because the electronegativity of fluorine reduces the polarizability of the fluorinated molecular surface. The attractive interactions arising from "fleeting dipoles" are therefore weaker than in hydrocarbon surfactants. At the same time, the strength of the carbon-fluorine bond makes fluorosurfactants more stable than hydrocarbon surfactants and better suited to harsh conditions; for the same reason, they can persist in the environment.
Consequently, studies on the quantitative structure-property relationships (QSPRs) of surfactants, and an understanding of how molecular structure affects their functions and properties, are becoming increasingly important [1,16]. In the course of research, development and application of surfactants, a broad range of data on their properties and activities has been accumulated. Using thermodynamic and other experimental data, widely applicable and accepted QSPR models have been established linking basic structures to physicochemical properties, applicable functions and some other special properties [17][18][19][20][21]. However, these QSPR models are generally based only on the summary of a large amount of experimental data, and the underlying interaction mechanisms have not been studied in detail. At present, QSPR methods based on the work of Hansch and of Free and Wilson [22] have been applied in a variety of fields, and related software and useful mathematical models have been developed [23,24]. In this paper, we mainly review recent progress in the development of QSPR models for surfactants.

Studies on Relationships between CMC and Molecular Structures
In chemistry, the critical micelle concentration (cmc) is defined as the surfactant concentration above which micelles form spontaneously. When surfactants are added to a system, they initially partition into the interface, reducing the system free energy both by lowering the interfacial energy and by removing the hydrophobic parts of the surfactant from contact with water. As the surface coverage increases and the surface free energy (surface tension) decreases, the surfactants start to aggregate into micelles, again decreasing the system free energy by reducing the contact area between the hydrophobic parts and water. Once the cmc is reached, any further added surfactant (in the ideal case) only increases the number of micelles. Because many physical properties of a surfactant solution change abruptly at the cmc, the cmc is readily measured and serves as a gauge of surface activity. In the early QSAR models for surfactants, mainly the length of the lipophilic chain was considered; e.g., Becher et al. [25] established an equation relating the cmc to the numbers of carbon atoms and ethoxyl groups. In recent years, topological and electronic descriptors have been adopted for this purpose. The program CODESSA (Comprehensive Descriptors for Structural and Statistical Analysis), which runs under Windows and was developed at the University of Florida, can generate more than 400 molecular descriptors (constitutional, topological, geometrical and electronic) and can be combined with the molecular orbital program MOPAC (Molecular Orbital PACkage) [26][27][28][29].
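Because the cmc shows up as a break in a property-concentration curve, it is often read off as the intersection of two straight lines fitted below and above the break (for example, specific conductivity versus concentration). The following is a minimal sketch of that idea, not a procedure taken from the cited works; the function name, the minimum of three points per segment and the synthetic data are assumptions for illustration.

```python
import numpy as np

def cmc_from_breakpoint(conc, prop, min_pts=3):
    """Hypothetical helper: estimate the cmc as the intersection of two
    least-squares lines fitted to the pre- and post-micellar points."""
    conc = np.asarray(conc, dtype=float)
    prop = np.asarray(prop, dtype=float)
    order = np.argsort(conc)
    conc, prop = conc[order], prop[order]
    best = None
    # Try every split that leaves at least `min_pts` points on each side
    for k in range(min_pts, len(conc) - min_pts + 1):
        m1, b1 = np.polyfit(conc[:k], prop[:k], 1)   # premicellar line
        m2, b2 = np.polyfit(conc[k:], prop[k:], 1)   # postmicellar line
        sse = (np.sum((prop[:k] - (m1 * conc[:k] + b1)) ** 2)
               + np.sum((prop[k:] - (m2 * conc[k:] + b2)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, m1, b1, m2, b2)
    _, m1, b1, m2, b2 = best
    if np.isclose(m1, m2):
        raise ValueError("segments are parallel; no break detected")
    # Intersection of y = m1*x + b1 and y = m2*x + b2
    return (b2 - b1) / (m1 - m2)

# Synthetic conductivity-like data (arbitrary units) with a break at 8 mM
conc = np.arange(1, 16, dtype=float)
prop = np.where(conc <= 8, 10 * conc, 80 + 4 * (conc - 8))
print(cmc_from_breakpoint(conc, prop))
```

With real, noisy data the choice of which points belong to each segment matters, which is why the split is chosen here by minimizing the combined squared error.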

QSPR Studies on CMC of Surfactants Based on Molecular Connectivity Index
Using the CODESSA program, QSAR models have been developed by Huibers et al. to predict cmc values for non-ionic and anionic surfactants [30,31]. For non-ionic surfactants, a dataset of 77 samples was selected and divided into seven categories based on the characteristics of their hydrophobic and hydrophilic moieties [30] (Figure 1).
Multiple linear regression (MLR) analyses of the molecular descriptors against the logarithm of the cmc were carried out using the heuristic algorithm, and the best model obtained is:

$$\log_{10}\mathrm{cmc} = -(1.80 \pm 0.16) - (0.567 \pm 0.009)\,\text{c-KH0} + (1.054 \pm 0.048)\,\text{c-AIC2} + (7.5 \pm 1.0)\,\text{RNNO} \qquad (1)$$

$$n = 77,\quad R^2 = 0.9833,\quad F = 1433,\quad s^2 = 0.0313$$

where n is the number of compounds used in the regression, R² the squared correlation coefficient, s² the squared standard error of the regression, and F the Fisher ratio of the regression. c-KH0 is the zeroth-order Kier & Hall molecular connectivity index of the hydrophobic fragment [32]; it is computed from the numbers of total electrons, valence electrons and hydrogen atoms of each non-hydrogen atom in the fragment and thus sums the contributions of all non-hydrogen groups. The second-order average information content (c-AIC2) reflects basic chemical characteristics of the hydrophobic moiety [33], and RNNO, the relative number of oxygen and nitrogen atoms, reflects the size of the hydrophilic moiety. The positive coefficient of c-AIC2 indicates that branching and other structural features of the hydrophobic moiety raise the cmc relative to a straight chain, whereas the negative coefficient of c-KH0 reflects the decrease of the cmc as the hydrophobic fragment grows.
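The description of c-KH0 in terms of total electrons, valence electrons and hydrogens matches the Kier-Hall valence vertex degree δv = (Zv − h)/(Z − Zv − 1), summed as δv^(−1/2) over non-hydrogen atoms. The sketch below assumes c-KH0 is this valence form; CODESSA's exact conventions (for example, how the atom bonded to the head group is treated) may differ, and all function names are hypothetical. `log_cmc_eq1` simply evaluates the central coefficients of Eq. (1), dropping the uncertainties, in whatever cmc units the original fit used.

```python
import math

def valence_delta(z_total, z_valence, n_h):
    """Kier-Hall valence vertex degree: (Zv - h) / (Z - Zv - 1)."""
    return (z_valence - n_h) / (z_total - z_valence - 1)

def kh0_valence(atoms):
    """Zero-order valence connectivity index of a fragment.
    atoms: iterable of (Z, Zv, number_of_attached_H) for non-hydrogen atoms."""
    return sum(valence_delta(z, zv, h) ** -0.5 for z, zv, h in atoms)

def log_cmc_eq1(c_kh0, c_aic2, rnno):
    """Central coefficients of Eq. (1); descriptor values must be supplied."""
    return -1.80 - 0.567 * c_kh0 + 1.054 * c_aic2 + 7.5 * rnno

# n-dodecane treated as an isolated hydrophobic fragment (an assumption):
# two CH3 groups (delta_v = 1) and ten CH2 groups (delta_v = 2)
dodecane = [(6, 4, 3)] * 2 + [(6, 4, 2)] * 10
print(kh0_valence(dodecane))   # 2 + 10 / sqrt(2)
```

Because every extra CH2 adds 1/√2 to the index and the coefficient of c-KH0 is −0.567, Eq. (1) predicts a drop of about 0.4 log units in cmc per methylene group, other descriptors held fixed.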
In the model obtained for the anionic surfactants, the most useful descriptor is t-sum-KH0, which represents the zeroth-order Kier & Hall index of the whole hydrophobic domain and is related to molecular volume and surface area. The second descriptor, TDIP, is the total molecular dipole moment obtained from atomic charges calculated with the MOPAC program [34]. Analysis of the total dipole moments of the anionic surfactants shows that the total dipole moment increases with the size of the alkyl chain if the hydrophilic moiety remains identical [30]. If the alkyl chain remains unchanged, moving the hydrophilic group towards the center of the molecule decreases the total dipole moment, but its effect on the cmc is insignificant. For double-tailed surfactants, the total dipole moment is determined by the longer hydrophobic chain. The third descriptor is the number of carbon atoms in the whole hydrophilic moiety and describes the variation of the hydrophilic moiety structure.

Figure 2.
Representative structures of the anionic surfactants, showing the diversity of the hydrophilic and hydrophobic domains [30].
When the hydrophilic moiety is a sulfate or sulfonate, the variation in cmc depends only on the hydrophobic moiety; the hydrophilic moiety varies so little that no descriptor is needed for it. The regression equation is [30]: In this model, the three descriptors are all related to variation of the hydrophobic moiety of the surfactants. KH1 is the first-order Kier & Hall molecular connectivity index and is related to molecular surface area and volume. KS3 is the third-order Kier shape index, which encodes information on molecular shape; it has a higher value for straight-chain molecules than for branched ones. HGP is the head group position on the longest chain, i.e., the number of the carbon atom to which the hydrophilic moiety is attached. This descriptor accounts for the observation that the cmc increases as the head group moves away from the α-carbon position.
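The statement that the third-order shape index is larger for straight chains can be checked with a small sketch of Kier's ³κ = (A − 1)(A − 3)²/(³P)² for odd A and (A − 3)(A − 2)²/(³P)² for even A, where A is the number of non-hydrogen atoms and ³P the number of three-bond paths. This assumes the unmodified index (no α correction for heteroatoms), and the helper names are hypothetical; CODESSA's implementation may differ in detail.

```python
def graph(bonds):
    """Hydrogen-suppressed molecular graph as an adjacency dict."""
    adj = {}
    for i, j in bonds:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

def count_paths(adj, length):
    """Count simple paths with `length` bonds."""
    count = 0

    def walk(node, visited, remaining):
        nonlocal count
        if remaining == 0:
            count += 1
            return
        for nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, remaining - 1)

    for start in adj:
        walk(start, {start}, length)
    return count // 2  # every path is found once from each end

def kier_kappa3(adj):
    """Unmodified third-order Kier shape index."""
    a = len(adj)
    p3 = count_paths(adj, 3)
    if p3 == 0:
        raise ValueError("molecule has no three-bond paths")
    if a % 2:  # odd number of heavy atoms
        return (a - 1) * (a - 3) ** 2 / p3 ** 2
    return (a - 3) * (a - 2) ** 2 / p3 ** 2

heptane = graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])
methylhexane3 = graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 6)])
print(kier_kappa3(heptane), kier_kappa3(methylhexane3))
```

For n-heptane the index is 6.0, while for its branched isomer 3-methylhexane it is 3.84, because the branch creates an extra three-bond path.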

QSPR Studies of CMC of Surfactants Based on Quantum Mechanical Descriptors
It is known that the dissolution of a surfactant in water and micelle formation are spontaneous processes that reduce the free energy of the system [17], and that the energy change comes mainly from interactions between surfactant molecules and between surfactant and water molecules. Coulombic interactions, intermolecular van der Waals interactions and hydrogen bonding all play important roles in micelle formation. Wang et al. [35] therefore additionally employed quantum mechanical descriptors such as molecular energies and the dipole moment, and obtained a better model relating molecular structure to cmc. The model for the nonionic surfactants is [35]: The model for the anionic surfactants is [27]: where E_T is the total energy of the molecule, ΔH_f the molar heat of formation, D the molecular dipole moment, E_HOMO the energy of the highest occupied molecular orbital, E_LUMO the energy of the lowest unoccupied molecular orbital, and KH0 the zeroth-order Kier & Hall molecular connectivity index. In the QSAR model for anionic surfactants, the descriptors affecting the cmc rank in the order KH0 > E_T > D > E_LUMO > ΔH_f > E_HOMO. Katritzky et al. also suggested that the most important descriptors in their selected QSPR models were topological, solvational and charge-related descriptors, reflecting the driving force of the intermolecular interactions between anionic surfactants and water [36]. The squared correlation coefficients (R²) between the values calculated with the above two models and the experimental values are 0.9965 [35] and 0.9989 [27]. These high correlations indicate the value of including electronic structure in QSAR studies of surfactants.
Wang et al. [37] carried out a further study deriving a quantitative structure-property relationship for 77 nonionic surfactants belonging to eight series; their best model contained four quantum-chemical descriptors (ΔH, D, E_HOMO and E_LUMO), two constitutional descriptors (the molecular weight of the surfactant, M, and the number of oxygen and nitrogen atoms, n_NO) and one topological descriptor (KH0). Wang et al. [38] used the model (Equation 5) to predict the cmc of three AE3SO3 compounds, and the calculated values agreed with the observed ones.
Li et al. optimized the geometries of the hydrophobic-hydrophilic segments of 98 anionic surfactants at the ab initio RHF/6-31G(d) level and obtained a quantum chemical dataset including charge densities, molecular orbital energies and dipole moments. The anionic surfactants include sodium alkyl sulfates, sodium alkyl sulfonates, sodium alkyl benzenesulfonates and potassium alkyl carboxylates with a wide variety of hydrophobic structures [29]. Based on one constitutional descriptor and two quantum chemical descriptors, a significant QSPR model for the cmc of anionic surfactants was obtained by MLR. The model they established is [29]: where N_T is the total number of atoms in the hydrophobic-hydrophilic segment, µ is the dipole moment of the surfactant segment, Q_C-max is the maximum net atomic charge on a C atom, and R²_CV is the squared correlation coefficient of the leave-one-out cross-validation procedure. Katritzky et al. explored a data set of 181 diverse anionic surfactants, relating the logarithm of the cmc to molecular structure with the CODESSA Pro software [36]. Their final regression equation involved five descriptors: the Kier & Hall index (order 1); the Kier shape index (order 2), defined for the hydrophobic fragment; the moment of inertia B, calculated for the hydrophilic fragment; the total point-charge component of the molecular dipole; and the image of the Born solvation energy, defined for the whole molecule. The cmc was influenced most strongly by the hydrophobic fragment, expressed through topological and geometrical descriptors, while the hydrophilic fragment was represented by constitutional, geometrical and charge-related descriptors.
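To make the difference between the fitted R² and the leave-one-out R²_CV concrete, here is a minimal MLR sketch on synthetic data. It assumes R²_CV is computed as 1 − PRESS/SS_tot, a common definition; the cited study may define it slightly differently, and the descriptor matrix here is random rather than real surfactant data.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]

def r2_and_q2_loo(X, y):
    """Fitted R^2 and leave-one-out R^2_CV = 1 - PRESS / SS_tot."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    resid = y - predict(fit_mlr(X, y), X)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop compound i
        coef = fit_mlr(X[keep], y[keep])       # refit without it
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return r2, 1.0 - press / ss_tot

# Synthetic "descriptors" and "log cmc" values, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=30)
print(r2_and_q2_loo(X, y))
```

For least squares, each left-out residual is at least as large as the corresponding fitted residual, so R²_CV never exceeds R²; a large gap between them is a warning sign of overfitting.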

QSPR Studies of CMC of Surfactants Using Neural Network
Using MLR and an artificial neural network (ANN) algorithm, Katritzky et al. derived linear and nonlinear predictive models from a data set of 162 nonionic surfactants [19]. An ANN is a computational model that attempts to simulate the structure and/or functional aspects of biological neural networks. As a nonlinear statistical modeling tool, it is popular for function learning because it can learn rather complicated functions, model complex relationships between inputs and outputs, and find patterns in data. The descriptors in the derived models relate to molecular shape and size and to the presence of heteroatoms participating in donor-acceptor and dipole-dipole interactions. Steric hindrance in the hydrophobic region also plays an important role in micellization. The QSAR models reported are expected to provide reliable estimates for the following surfactant classes: branched and linear alkyl ethoxylates, octylphenyl ethoxylates, linear ethoxylated alcohols, octylphenols, alkanediols, alkyl mono- and disaccharides, ethoxylated alkylamines and alkylamides, fluorinated alkyl ethoxylates, carbohydrate derivatives, and dimeric surfactants.
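As a rough illustration of how such a nonlinear model maps descriptors to a property, the sketch below trains a one-hidden-layer network (tanh hidden units, linear output) by full-batch gradient descent on the mean squared error. The architecture, learning rate and synthetic data are assumptions for illustration; they are not the network used by Katritzky et al.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Hypothetical one-hidden-layer regressor trained by backpropagation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.5, size=(p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # hidden activations, shape (n, hidden)
        out = H @ W2 + b2             # linear output, shape (n,)
        err = out - y
        history.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error
        g_out = 2.0 * err / n
        gW2 = H.T @ g_out
        gb2 = g_out.sum()
        g_H = np.outer(g_out, W2) * (1.0 - H ** 2)   # back through tanh
        gW1 = X.T @ g_H
        gb1 = g_H.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history

def mlp_predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic two-descriptor data with an interaction term
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(60, 2))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 0]
params, history = train_mlp(X, y)
print(history[0], history[-1])
```

In a real QSPR study the descriptors would be scaled, the network size tuned, and performance judged on a held-out test set rather than on the training loss.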
A QSPR study was also performed using a wavelet neural network (WNN) to relate the structures of 94 cationic Gemini surfactants to their cmc [16]. Wavelet neural networks are another novel approach to function learning: they combine wavelet theory with feedforward neural networks, using wavelets as the basis functions from which the network is constructed. The performance of the resulting QSPR model was evaluated on a test set, for which the average error was 0.105, better than that of the MLR model. In this work, the cmc of Gemini surfactants was related to 12 descriptors (seven topological, three WHIM, one geometrical and one functional group descriptor) by a WNN model for the first time.
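The core idea of a wavelet network, hidden units ("wavelons") that apply a dilated and translated mother wavelet to the input, can be sketched as follows. This is a deliberately simplified version with hypothetical names: it uses a radial Mexican-hat wavelet, fixes the translations and dilation in advance and fits only the output weights by linear least squares, whereas a full WNN usually also trains translations and dilations.

```python
import numpy as np

def mexican_hat(u):
    """Mexican-hat mother wavelet psi(u) = (1 - u^2) exp(-u^2 / 2)."""
    return (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)

def wavelon_features(X, centers, dilation):
    """Radial wavelet of the scaled distance from each sample to each translation."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) / dilation
    return mexican_hat(d)

def design(X, centers, dilation):
    # Bias column plus one column per wavelon
    return np.column_stack([np.ones(len(X)), wavelon_features(X, centers, dilation)])

def fit_wnn(X, y, centers, dilation):
    """Least-squares output weights for fixed wavelons."""
    w, *_ = np.linalg.lstsq(design(X, centers, dilation), y, rcond=None)
    return w

def predict_wnn(w, X, centers, dilation):
    return design(X, centers, dilation) @ w

# One-dimensional toy problem: approximate sin(x) on [0, 2*pi]
x = np.linspace(0, 2 * np.pi, 100)[:, None]
y = np.sin(x[:, 0])
centers = np.linspace(0, 2 * np.pi, 20)[:, None]
dilation = centers[1, 0] - centers[0, 0]
w = fit_wnn(x, y, centers, dilation)
rmse = np.sqrt(np.mean((predict_wnn(w, x, centers, dilation) - y) ** 2))
print(rmse)
```

For multi-descriptor inputs such as the 12 descriptors above, the same code works with `X` of shape (compounds, descriptors) once the descriptors are put on comparable scales.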

QSPR Studies of CMC of Surfactants Using Other Methods
Considering that the free energy change of micellization is related to the hydrophobic part of the surfactant, Robert et al. used the octanol/water partition coefficient (logP) to study cmc models for anionic surfactants systematically [39]. They regressed experimental cmc values of primary alcohol sulfates and primary alcohol ester sulfates (at 50 °C) against IIh (the logP fragment value of the hydrophobe, defined simply as the whole molecule minus the negatively charged fundamental fragment SO3− or OSO3−) and L (the length of the hydrophobe, in C-C single bond units), and obtained the following QSPR models [39]: where Pcmc is the negative logarithm of the cmc. IIh in Model 6 was calculated with the Leo and Hansch approach [40], and IIh in Model 7 was computed with a position-dependent branching factor [41]. Regression of the cmc values calculated with Model 7 against the experimental values showed a fairly good linear relationship. The correlations obtained are: for all anionic surfactants, n = 133, R² = 0.976, S = 0.12, F = 5360; for all anionic surfactants except SALS (secondary alcohol sulfates), LAS (linear alkylbenzene sulfonates) and β-branched PAS (primary alcohol sulfates), n = 75, R² = 0.988, S = 0.09, F = 6122; and for SALS, LAS and β-branched PAS, n = 58, R² = 0.982, S = 0.08, F = 3074. In addition, molecular mechanics has been used to predict the cmc of surfactants such as linear alkyl polyoxyethylene ethers and alkyl polyglucosides [42,43].
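The reported statistics can be cross-checked with a little arithmetic. Assuming each correlation is a simple regression of calculated against experimental cmc (one independent variable), the Fisher ratio follows from R² and n as F = R²(n − 2)/(1 − R²). The sketch below (function name hypothetical) recomputes F from the rounded R² values.

```python
def f_ratio_simple(r2, n):
    """Fisher ratio of a one-variable regression: F = R^2 (n - 2) / (1 - R^2)."""
    return r2 * (n - 2) / (1 - r2)

# (R^2, n, reported F) for the three correlations quoted above
REPORTED = [(0.976, 133, 5360), (0.988, 75, 6122), (0.982, 58, 3074)]
for r2, n, f_rep in REPORTED:
    f = f_ratio_simple(r2, n)
    print(f"n = {n}: F from rounded R2 = {f:.0f}, reported F = {f_rep}")
```

The recomputed values agree with the reported ones to within about 2%, and the remaining gap is what one expects from R² being rounded to three decimals, since F is very sensitive to 1 − R² when R² is close to 1. The two subsets (75 + 58) also add up to the full set of 133 compounds.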

Charge Distribution of Surfactants and Its Influence on Their Properties
Many properties of surfactants are related to the charge distribution in their molecules. A recent study showed that in colloidal dispersions (mainly systems with low particle concentrations and high surface charges) there is a long-range force arising purely from electrostatic interactions between particles, in addition to the short-range forces caused by van der Waals interactions [44]. The key to studying such electrostatic interactions is therefore the description and quantification of the charge distribution.

Computation of Charge Distribution in Ionic Surfactant Molecules and Their Effects
For ionic surfactants, the net charge carried by the headgroup has normally been treated as a point charge, although it is in fact spread over several atoms of the hydrophilic group and even extends into the tail (see Figure 3) [45]. Electrostatic potential surfaces calculated for dodecyl carboxylate and decyl sulfate with the ab initio 6-31G basis set showed that there is a partial negative charge on the alkyl chains of these surfactants [46]. Studies of the fluorescence quenching behavior and quenching constants of ionic surfactants indicate that single surfactant molecules adopt a dynamic coiled conformation in water [47], a conformation more closely tied to the molecular charge distribution than others. Consequently, the earlier assumption that the tail of a surfactant is nonpolar and electrically neutral is not physically justified.
Huibers recently studied the charge distribution in common ionic surfactants using four widely accepted semiempirical methods (MINDO/3, AM1, PM3 and MNDO/d) and developed QSAR models for some properties of these surfactants (Figure 4) [45]. The ionic surfactants studied include anionic (sulfate, sulfonate and carboxylate), cationic (trimethylammonium and pyridinium) and amphoteric (betaine and dimethylamine oxide) classes. Adding d-orbitals to the MNDO basis set [48] improves the results for third-period elements such as sulfur and phosphorus. The calculations show that partial charge is distributed over the α-CH2 and the alkyl chain. The terminal methyl group of all the surfactants carries a positive charge, which causes the neighboring CH2 to carry a partial negative charge. For the anionic surfactants, a partial negative charge of ca. 5% resides on the tail (for the sulfonate surfactants, a 5% partial positive charge is carried on the tail). The polar head of the amphoteric surfactants carries a ca. 6% negative charge, with the same amount of positive charge on the tail. The polar head of the cationic surfactants carries a positive charge, and their tail also carries a ca. 10% positive charge, the highest local charge found. Although some studies modeling the association behavior of surfactants [49] treated the α-CH2 as part of the polar head, no justification was given. NMR data from Zhao and Fung [50] also indicated that the chemical environment of the α-CH2 differs from that of the other methylene groups, which may lead to its association with water molecules. Huibers et al. quantified the charges on these groups and showed that a relatively large charge is located on the α-CH2, supporting the idea that the α-CH2 belongs to the polar head group [45].
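Percentages such as "ca. 5% of the charge on the tail" come from summing calculated atomic partial charges over groups of atoms and dividing by the net molecular charge. The sketch below shows that bookkeeping; the atom labels and charge values are invented to mimic an anionic surfactant with a 5% tail charge and are not taken from Huibers' calculations.

```python
def group_charge_fractions(charges, groups):
    """Sum partial charges per group and express each as a share of the net charge.
    charges: dict atom_label -> partial charge (units of e)
    groups:  dict group_name -> list of atom labels"""
    net = sum(charges.values())
    result = {}
    for name, atoms in groups.items():
        q = sum(charges[a] for a in atoms)
        result[name] = (q, q / net)
    return result

# Hypothetical charges for a sulfate-type head and a coarse-grained tail
charges = {
    "S": 1.20, "O1": -0.70, "O2": -0.70, "O3": -0.75,   # head: -0.95 e
    "C_alpha": -0.08, "C_mid": -0.02, "C_term": 0.05,   # tail: -0.05 e
}
groups = {"head": ["S", "O1", "O2", "O3"],
          "tail": ["C_alpha", "C_mid", "C_term"]}
for name, (q, frac) in group_charge_fractions(charges, groups).items():
    print(f"{name}: {q:+.2f} e ({100 * frac:.0f}% of net charge)")
```

Whether the α-CH2 is assigned to the head or the tail changes these fractions, which is why its assignment matters for the discussion above.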
It is also interesting that the charge on the α-CH2 can have either the same sign as or the opposite sign to the charge on the polar head. These results show that local charges on the alkyl tails of surfactants call for a reconsideration of the properties of micelle cores. Micelle formation and solubilization have normally been treated with the 'oil drop' model, without considering repulsive interactions between the alkyl chains [49,50]. Clearly, the migration of charge from the polar head of ionic surfactants to other parts of the molecule, especially to the α-CH2 and the terminal methyl group, has an important effect on their properties. The polar head charge and the charge on the α-CH2 affect the cmc differently for different surfactants. The unequal local charges on the alkyl chains can rationalize the polarity of micelles and the effect of the polar head charge on molecular self-assembly. Furthermore, using Hückel molecular orbital theory, Jacobs and Anacker computed the charges on the hydrophilic group of decylpyridinium chloride [51] and found that the micellar aggregation numbers are related to the atomic charges on the pyridinium ring attached to the alkyl chain. They sought to elucidate the role of charge delocalization in micelle formation by determining the aggregation numbers of 1-decylpyridinium bromide and three structurally similar isomers (2-, 3- and 4-decylpyridinium hydrobromide) in water, and suggested that the positive charge is not localized on the nitrogen but is spread over the entire polar head. Using AM1-calculated charges, Huibers and Jacobs [52] rationalized the effect of the charge distribution on the hydrophilic group on micellar aggregation numbers. The aggregation numbers were shown to increase as the residual partial charge on the alkyl tails decreases, suggesting a change in the packing of the surfactants.
The critical micelle concentration increases as the partial charge on the head groups decreases, indicating increased solubility of the surfactant molecule as the charge is more widely distributed throughout the molecule. Villamagna et al. [53] used AM1 calculations to study the structural design of surfactants for water-in-oil emulsions. Their molecular modeling analysis of surfactants currently used in water-in-oil emulsions suggests that an ideal surfactant has tail, polar head and hydrogen-bonding chains in the volume ratio 1:1:1, which provides a useful guideline for designing such surfactants.
It should be kept in mind, however, that the above calculations are based on gas-phase models. Although no environmental effects are included and only the intrinsic charge distribution is obtained, the results can still be used to compare different surfactants. Modeling how the environment, i.e., the solvation environment, perturbs the charge distribution is fairly complex, because such models must represent a fluid medium with a given dielectric constant. Further work on modeling theory and computational capability is therefore needed to predict systems as complex as surfactant micelles.

Descriptors Related to Molecular Surface Area-CPSA
To study intermolecular interactions, some researchers have used the solvent-accessible molecular surface area as a descriptor. Descriptors based on molecular surface area, atomic charges and charge scaling factors, and their applications, have been reported [54][55][56]. Stanton and Jurs [57] combined molecular surface area with atomic charges to define a new class of descriptors, the charged partial surface area (CPSA) descriptors, which encode polar interactions between molecules and can be correlated with physical quantities such as chromatographic retention, boiling point and surface tension.
The geometric model for the CPSA descriptors uses overlapping hard spheres defined by the van der Waals radii of the atoms. The descriptors were calculated on a UNIX system (Sun4/1102) with the ADAPT software. The solvent-accessible surface area was calculated with the SAVOL algorithm developed by Pearlman [58]. Atomic charges were obtained from the Abraham and Smith [59] algorithm, an empirical method including σ and π contributions that is parameterized to reproduce experimental dipole moments. Hydrogen atoms are included in the calculation of both surface areas and atomic charges. The CPSA descriptor system established in this study has 25 individual descriptors, including partial positive surface area descriptors (PPSAs), partial negative surface area descriptors (PNSAs), difference in charged partial surface area descriptors (DPSAs), fractional charged partial surface area descriptors (FPSA and FNSA), total-surface-weighted charged partial surface area descriptors (WPSA and WNSA), relative positive and relative negative charge descriptors (RPCG and RNCG), and relative positive and relative negative charged surface area descriptors (RPCS and RNCS).
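Given per-atom solvent-accessible areas and partial charges, several CPSA descriptors reduce to simple sums. The sketch below computes a subset (PPSA-1, PNSA-1, DPSA-1, FPSA-1, FNSA-1, WPSA-1, WNSA-1, the charge-weighted PPSA-3 and PNSA-3, RPCG and RNCG) following the commonly stated Stanton-Jurs definitions; it assumes areas and charges have already been obtained (for example from a SAVOL-type surface calculation and Abraham-Smith charges), uses toy numbers, and should be checked against the original paper before use.

```python
def cpsa_descriptors(atoms):
    """Subset of CPSA descriptors.
    atoms: list of (solvent_accessible_area, partial_charge), hydrogens included.
    Assumes at least one positive and one negative atom."""
    tmsa = sum(a for a, _ in atoms)                 # total molecular surface area
    pos = [(a, q) for a, q in atoms if q > 0]
    neg = [(a, q) for a, q in atoms if q < 0]
    ppsa1 = sum(a for a, _ in pos)                  # area of positive atoms
    pnsa1 = sum(a for a, _ in neg)                  # area of negative atoms
    total_pos = sum(q for _, q in pos)
    total_neg = sum(q for _, q in neg)
    return {
        "TMSA": tmsa,
        "PPSA-1": ppsa1,
        "PNSA-1": pnsa1,
        "DPSA-1": ppsa1 - pnsa1,
        "FPSA-1": ppsa1 / tmsa,
        "FNSA-1": pnsa1 / tmsa,
        "WPSA-1": ppsa1 * tmsa / 1000.0,
        "WNSA-1": pnsa1 * tmsa / 1000.0,
        "PPSA-3": sum(a * q for a, q in pos),       # charge-weighted areas
        "PNSA-3": sum(a * q for a, q in neg),
        "RPCG": max(q for _, q in pos) / total_pos, # most positive / total positive
        "RNCG": min(q for _, q in neg) / total_neg, # most negative / total negative
    }

# Toy four-atom example: (area in square angstroms, charge in e)
atoms = [(10.0, 0.2), (20.0, -0.3), (5.0, 0.1), (15.0, -0.1)]
for name, value in cpsa_descriptors(atoms).items():
    print(f"{name}: {value:.4f}")
```

The remaining descriptors of the set (for example RPCS and RNCS) combine the same ingredients in other ways and are not shown here.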

Surface Tension Prediction Models
Surfactants reduce surface tension by partially balancing the unsaturated force field at the interface. Differences in surfactant structure, and the resulting differences in intermolecular interactions, can be understood at different levels. The Wang group focused on structural parameters of surfactants [60]. They regressed descriptors such as the number of oxygen atoms in the hydrophilic group (NO), the zeroth-order Kier & Hall index (KH0), the quantum-mechanically calculated heat of formation (ΔH_f), the total molecular energy (E_T), the molecular mass (Ws) and the dipole moment (D) against the minimum surface tension at the cmc (γ0), and established several types of models. The best of these is [61]: Their data set of γ0 values contains 30 structurally diverse nonionic surfactants, and the descriptors ΔH_f, E_T and D were obtained from MNDO calculations with MOPAC 7.0. Through these energetic and electronic descriptors, the model establishes an indirect QSAR between surface tension and the electronic structure of the molecules.
Wang et al. also established quantitative models for 20 anionic surfactants of different structures, describing their surface tension reduction effect, expressed as surface pressure (Π), at different temperatures (t) and counterion concentrations (c) [62]. This effect is normally expressed as the maximum surface pressure, measured at the cmc. Besides temperature and counterion concentration, the variables of the optimal models include the molar heat of formation of the anionic surfactant (ΔH_f), the dipole moment (D) and the zeroth-order Kier & Hall index of the hydrophobic moiety (KH0). The model containing five descriptors is as follows [63]: The Π_cmc values calculated with this model correlate closely with the observed values, according to the regression equation:

$$\Pi_{\mathrm{cmc}}^{\mathrm{cal}} = 2.982 + 0.9199\,\Pi_{\mathrm{cmc}}^{\mathrm{obs}} \qquad (n = 34,\ R = 0.994,\ s^2 = 0.0688)$$

Stanton et al. [64] and Stanton and Jurs [65] focused on descriptors related to molecular surface area. They performed multiple linear regressions of the observed surface tensions of alkanes, alkyl esters, alcohols, etc. against these descriptors and established surface tension prediction models. A total of 146 compounds were used in the regressions: 74 alkanes (50.7% of the total), 44 α-esters (30.1%) and 28 alcohols (19.2%) [64]. The model contained 10 molecular descriptors [65]: six topological, two electronic and two hydrogen-bonding descriptors. The predicted surface tensions for these three classes correlated excellently with the experimental values (R² = 0.983, s = 0.4 dyn/cm).
After studying the intercorrelations of these descriptors, the authors showed that the topological descriptors correlate significantly with molecular surface area, and that intermolecular interactions increase with molecular surface area, resulting in a higher surface tension. Molecular surface area is thus the most influential factor for surface tension.
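The surface pressure used in the anionic-surfactant models is the reduction of the solvent's surface tension, Π = γ_solvent − γ_solution, evaluated at the cmc for Π_cmc. The sketch below computes Π_cmc and applies the reported calibration line between calculated and observed values. The value of about 72.0 mN/m for water at 25 °C is an assumption for illustration (the models span several temperatures, at which γ_water differs), and the γ_cmc value is hypothetical.

```python
GAMMA_WATER_25C = 72.0  # mN/m, approximate surface tension of water at 25 °C (assumed)

def surface_pressure(gamma_solution, gamma_solvent=GAMMA_WATER_25C):
    """Surface pressure Pi = gamma_solvent - gamma_solution, in mN/m."""
    return gamma_solvent - gamma_solution

def pi_cal_from_obs(pi_obs):
    """Calibration line reported for the anionic-surfactant model (n = 34)."""
    return 2.982 + 0.9199 * pi_obs

gamma_cmc = 35.0                      # hypothetical surface tension at the cmc
pi_obs = surface_pressure(gamma_cmc)  # 72.0 - 35.0 = 37.0 mN/m
print(pi_obs, pi_cal_from_obs(pi_obs))
```

Since the slope of the calibration line is slightly below 1 and the intercept positive, calculated and observed values coincide near Π_cmc ≈ 37 mN/m, with the model slightly overestimating lower surface pressures and underestimating higher ones.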

QSAR Studies on Cloud Point of Nonionic Surfactants
The cloud point of a nonionic surfactant solution is the temperature at which the mixture starts to separate into two phases and thus becomes cloudy. This behavior is characteristic of nonionic surfactants containing polyoxyethylene chains, which show inverse solubility-temperature behavior in water and therefore "cloud out" at some point as the temperature is raised. The cloud point is affected by salinity and is generally lower in more saline fluids. It is a critical factor in the performance of nonionic surfactants (such as those with polyoxyethylene chains as their hydrophilic moieties) in detergent formulations [66,67]. Nonionic surfactants show rich phase behavior in aqueous mixtures: below their cloud points (CPs), a number of isotropic phases exist, whereas above their CPs they form opaque suspensions that eventually separate into water-rich and surfactant-rich phases [68,69].
Bünz et al. studied the cloud points of 20 nonionic surfactants with alkyl zwitterion groups and obtained QSAR models (Table 1) [70]. These models contained four structural descriptors: two topological (average information content, order 2; Kier shape index, order 3) and two constitutional (relative molecular weight; relative number of rings).
where EO# is the number of ethylene oxide residues, ³κ is the third-order Kier shape index of the hydrophobic tail, ⁰ABIC is the zeroth-order average bonding information content of the tail, and ¹SIC is the first-order structural information content of the tail.
where logP is the octanol/water partition coefficient, A is the molecular area, M_r is the relative molecular mass, and μ_y and μ_z are the y and z components of the molecular dipole moment.
In 2006, Ren et al. developed QSPR models to predict cloud points and study the clouding phenomenon of nonionic surfactants in aqueous solution [69]. Four descriptors selected by the heuristic method were used as inputs for multiple linear regression and support vector machine (SVM) models. The basic idea of SVM regression is to map the input vectors into a higher-dimensional feature space through a kernel function, K(x_i, x_j), and then perform linear regression in that space. The SVM models performed better in both fitting and prediction: for the test set, they gave a predictive correlation coefficient of 0.9882, a root mean squared error of 4.2727 and an absolute average relative deviation of 9.5490. The models provide some insight into the structural features related to the cloud points of compounds, namely the molecular size, structure and isomerism of the hydrocarbon moiety and the degree of oxyethylation.
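The kernel idea and the two error measures quoted above can be illustrated in a few lines. For brevity, the sketch below swaps in kernel ridge regression with a Gaussian (RBF) kernel, which shares the kernel mapping but uses a squared-error loss rather than the ε-insensitive loss of SVM regression; a full SVR is available, for example, as `SVR` in scikit-learn's `sklearn.svm` module. The function names, kernel width and regularization value are assumptions, and AARD is taken as the mean absolute relative deviation in percent.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_ridge(X, y, gamma=1.0, lam=1e-3):
    """Dual coefficients for kernel ridge regression on centered targets."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    y_mean = y.mean()
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y - y_mean)
    return alpha, y_mean

def predict_kernel(model, X_train, X_new, gamma=1.0):
    alpha, y_mean = model
    K = rbf_kernel(np.asarray(X_new, float), np.asarray(X_train, float), gamma)
    return K @ alpha + y_mean

def rmse(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_obs) ** 2)))

def aard_percent(y_obs, y_pred):
    """Absolute average relative deviation, in percent."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs(y_pred - y_obs) / np.abs(y_obs)))

# One-descriptor toy problem
X = np.linspace(0, 1, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 2.0
model = fit_kernel_ridge(X, y, gamma=20.0, lam=1e-6)
pred = predict_kernel(model, X, X, gamma=20.0)
print(rmse(y, pred), aard_percent(y, pred))
```

As with the cited models, meaningful error figures come from a separate test set; the training-set numbers printed here only show that the mechanics work.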
More recently, QSPR analysis has been applied to a series of pure nonionic surfactants comprising linear alkyl, cyclic alkyl and alkyl phenyl ethoxylates [66]. The cloud points of these compounds were modeled as functions of theoretically derived descriptors by MLR and partial least squares (PLS) regression. PLS, which is related to factor analysis, is suited to problems with many variables but relatively few samples or observations. It has been applied with great success in many fields of applied science and is one of the favored methods of analysis in chemometrics. In this study, a genetic algorithm (GA) was employed for variable selection. GAs mimic some of the processes observed in natural evolution and provide an efficient strategy for searching for globally optimal solutions. The results indicate that the GA is a very effective variable selection approach for QSPR analysis, and the comparison of the two regression methods showed that PLS has better prediction ability than MLR.
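GA-based descriptor selection can be sketched as follows. This is not the implementation used in [66]. It assumes a binary chromosome (one bit per candidate descriptor), MLR with adjusted R² as the fitness, truncation selection, one-point crossover and bit-flip mutation, and it runs on synthetic data in which only descriptors 0 and 2 carry signal. All function names are hypothetical.

```python
import random


def solve_normal_equations(X, y):
    """Least-squares coefficients for y ~ X b via the normal equations,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fitness(mask, descriptors, y):
    """Adjusted R^2 of an MLR model using the descriptors switched on in mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    n, p = len(y), len(cols)
    if p == 0 or n - p - 1 <= 0:
        return float("-inf")
    X = [[1.0] + [row[j] for j in cols] for row in descriptors]
    try:
        b = solve_normal_equations(X, y)
    except ValueError:
        return float("-inf")
    pred = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def ga_select(descriptors, y, pop_size=20, generations=40, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    m = len(descriptors[0])
    pop = [[rng.random() < 0.5 for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, descriptors, y), reverse=True)
        parents = ranked[: pop_size // 2]             # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, m)                 # one-point crossover
            child = mom[:cut] + dad[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, descriptors, y))


# Synthetic data: 30 "compounds" x 6 descriptors; only descriptors 0 and 2 matter.
data_rng = random.Random(42)
X = [[data_rng.uniform(0.0, 1.0) for _ in range(6)] for _ in range(30)]
y = [3.0 + 2.0 * row[0] - 1.5 * row[2] + data_rng.gauss(0.0, 0.01) for row in X]
best = ga_select(X, y)
print([j for j, bit in enumerate(best) if bit])
```

Adjusted R² is used so that adding an uninformative descriptor is penalized; a PLS-based or cross-validated fitness would be a natural alternative.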

Studies on Degradation of Surfactants
QSPR studies of the cmc, charge distribution and surface tension of surfactants have received considerable attention, as described above. However, the biodegradation potential of surfactants is another highly important issue, as it largely determines their environmental impact. Reported studies on the biodegradation of surfactants have examined the effects of alkyl chain structure (straight or branched chains), the position of branching in the chain, and related factors on biodegradability [73][74][75].
Biodegradation is the process whereby organic (i.e., carbon-containing) matter is decomposed by micro-organisms present in the environment. The biodegradability of anthropogenic organic substances is an essential parameter for environmental risk assessment, and its evaluation is required under the relevant legislation.
For surfactants, three levels of biodegradation are defined: primary biodegradation, ultimate biodegradation and ready aerobic biodegradability. Primary biodegradation means the structural change (transformation) of a surfactant by micro-organisms, resulting in the loss of its surface-active properties owing to degradation of the parent substance. Ultimate biodegradation means the level of biodegradation achieved when the surfactant is completely used by micro-organisms, resulting in its breakdown to carbon dioxide, water and mineral salts of any other elements present (mineralization) and new microbial cellular constituents (biomass). Ready aerobic biodegradability is an arbitrary classification of surfactants that have passed certain specified screening tests for ultimate biodegradability; these tests are so stringent that it is assumed that such substances will rapidly and completely biodegrade in aquatic environments under aerobic conditions [76,77].
Siwiski et al. developed a modified river water die-away test for monitoring the biodegradability of the anionic and nonionic surfactants in detergent powders and investigated twelve powders. They found that the anionic surfactants were biodegraded much more easily than the nonionic surfactants, and that the nonionic surfactants differed widely in biodegradability [78]. Sales et al. studied the influence of several environmental factors on the biodegradation of a commercial anionic surfactant (LAS) in the waters and sediments of Cadiz Bay (southwest Iberian Peninsula). They concluded that degradation is essentially an aerobic process, so aeration of the solution favours it [79]. Li et al. sought to explain the differences in biodegradation between LAS and its coproducts in terms of electronic structure and to explore the mechanism of LAS biodegradation. They calculated electronic descriptors of LAS and its coproducts, including orbital energies, dipole moments, charge distributions and local electronic characteristics of the surfactant molecules, with the semiempirical PM3 method. They explained why biodegradation of the model compounds begins at the terminal CH3 group and gave two reasons: (1) the S-O bond has no transferable H atom; (2) the S-O bond orders are higher than those of the terminal C-H bonds in the model compounds [80].
PFC surfactants resist degradation by acids, bases, oxidants, reductants, photolytic processes, microbes and metabolic processes [81][82][83][84]. Monitoring studies indicate that fluorosurfactants are globally distributed, environmentally persistent and bioaccumulative [85,86]. To evaluate the environmental fate of PFCs, a set of principal transformations was developed and implemented in CATABOL (catabolite software engine), a simulator of microbial degradation [87]. The simulator was applied to generate metabolic pathways for 171 perfluorinated substances on Canada's Domestic Substances List. It was found that, although the extent of biodegradation of the parent compounds could reach 60%, persistent metabolites could be formed in significant quantities. During microbial degradation, PFCs tended to be transformed into more bioaccumulative and more toxic products. Perfluorooctanoic acid and perfluorooctanesulfonate were predicted to be the persistent biodegradation products of 17 and 27% of the perfluorinated sulphonic acid- and carboxylic acid-containing compounds, respectively. The structural, electronic and thermodynamic properties of linear perfluorooctane sulfonate (PFOS) and its trifluoromethyl-branched isomers (i.e., 1-CF3- to 6-CF3-PFOS) were investigated theoretically by density functional theory (DFT) calculations with the B3LYP functional and the 6-31++G(d,p) basis set [88]. The linear and branched PFOS ions were identified as the most suitable forms for interacting with charged species. Furthermore, in the linear anion the LUMO is delocalized along the whole fluorocarbon chain, whereas in the 4-CF3-, 5-CF3- and 6-CF3-PFOS isomers it is localized near the tertiary carbon.
The higher accessibility of the LUMO in these branched anions suggests that they are more likely than the linear isomer to react with free radicals. This finding agrees with the previously reported experimental observation that only the branched PFOS isomers were susceptible to reductive defluorination by reduced vitamin B12. A comparison of the calculated Gibbs free energies of the linear and branched PFOS in their different forms showed that 1-CF3-, 6-CF3- and linear PFOS are the most chemically stable structures.
In summary, QSPR studies on the degradation of surfactants are far from complete, and further studies should be initiated, especially on fluorosilicone and silicone surfactants. Because these surfactants have a higher degradation potential than PFOS and PFOA, they represent a promising direction in which QSPR methods can be actively applied to study degradation-related properties.

QSPR Studies on Other Properties of Surfactants
It is well known that surfactants are typically amphiphilic molecules containing both hydrophilic and lipophilic groups. The hydrophile-lipophile balance (HLB) is an indicator of the ratio of the hydrophilicity of a surfactant to its hydrophobicity. HLB values range from 0 to 60 and define the affinity of a surfactant for water or oil. Chen et al. established two QSPR models for the HLB values of anionic surfactants using quantum chemical descriptors obtained from semiempirical and density functional theory (DFT) calculations. The first multiple linear regression model (14) included 46 anionic surfactants from four series of alkyl sulfates and alkyl sulfonates, with an optimal squared correlation coefficient (R²) of 1.000; the second multiple linear regression model (15) covered 73 structures, including polyoxyethylene, acetate, propionate and fluorinated anionic surfactants, with an optimal R² of 0.993 [89].
Notably, these quantum mechanical descriptors can distinguish between the HLB values of isomeric surfactants, overcoming a difficulty of the Davies equation, HLB = 7 + ∑(hydrophilic group numbers) + ∑(hydrophobic group numbers), in which the hydrophobic group numbers are negative. Because the Davies equation only counts functional groups, isomers containing the same groups receive identical HLB values. Davies developed this group-number system in the 1950s. The "group number" characterizes the contribution of each functional group to the energy required to transfer the surfactant molecule from water to an organic solvent [90].
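The Davies group-contribution calculation can be sketched in a few lines. The group numbers below are the commonly tabulated Davies values and should be treated as assumptions to be checked against a primary table; the dictionary keys and the function name are hypothetical.

```python
# Commonly tabulated Davies group numbers (assumed values; verify before use).
DAVIES_GROUP_NUMBERS = {
    "sulfate_na": 38.7,    # -SO4- Na+
    "carboxylate_na": 19.1,  # -COO- Na+
    "cooh": 2.1,           # -COOH
    "oh_free": 1.9,        # free -OH
    "ether_o": 1.3,        # -O-
    "oxyethylene": 0.33,   # -(CH2CH2O)-
    "ch_x": -0.475,        # -CH3, -CH2-, -CH- and =CH- all take -0.475
}


def davies_hlb(group_counts):
    """HLB = 7 + sum(group number x count). Hydrophobic groups carry
    negative group numbers, so each chain carbon lowers the HLB."""
    return 7.0 + sum(DAVIES_GROUP_NUMBERS[g] * n for g, n in group_counts.items())


# Sodium dodecyl sulfate: one sulfate head group and 12 chain carbons.
sds = {"sulfate_na": 1, "ch_x": 12}
print(round(davies_hlb(sds), 2))  # 40.0
```

Because only group counts enter the sum, a branched dodecyl sulfate isomer receives the same value as the linear one, which is the limitation that isomer-sensitive quantum chemical descriptors address.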
Ghasem et al. established QSPR models to predict the solubility of nonionic solutes in anionic micelles, and tested them on an external prediction set of 11 compounds randomly chosen from 62 compounds. The squared correlation coefficients of prediction for the multiple linear regression and partial least squares regression models were 0.9679 and 0.9728, respectively [92].
Campbell et al. studied the effects of surfactants on the attachment of bacteria to cellulose acetate (CA) and aromatic polyamide (PA) reverse osmosis membranes [93]. They analyzed the effects of 23 classes of surfactants, including nonionic, anionic and amphoteric types, on the attachment of Mycobacterium sp. to CA and PA membranes. The results showed that 17 classes of surfactants inhibited attachment of the bacteria to the PA membrane, 25 classes inhibited attachment to the CA membrane, and 13 classes inhibited attachment to both membranes. ATR-FTIR (attenuated total reflectance Fourier-transform infrared) measurements of the adsorption of anionic surfactants on the CA membrane indicated that surfactant structure can be manipulated to optimize adsorption and thereby inhibit bacterial attachment to reverse osmosis membranes. Using the SciQSAR program (SciVision, Lexington, MA), 17 molecular descriptors of the surfactants were calculated, ranging from molecular weight to electronic and topological descriptors. The QSAR models obtained for the attachment force on the CA and PA membranes are [93]:
CA membrane:
$$\text{attachment force} = 1.06\times10^{-3} + 1.01\times10^{-5}\,\mathrm{CMC} + 4.52\times10^{-5}\,\mathrm{Dipole} + 4.35\times10^{-4}\,\mathrm{ABSQ_{on}} + 0.0187\,\mathrm{MaxQ_{pos}} - 8.00\times10^{-5}\,\mathrm{K_{a3}} \qquad (n = 23,\ R^2 = 0.377)$$
PA membrane:
$$\text{attachment force} = 0.13 - 6.40\times10^{-4}\,\mathrm{CMC} - 0.0407\,\log P - 1.89\times10^{-3}\,\mathrm{Dipole} \qquad (n = 23,\ R^2 = 0.771) \qquad (18)$$
where ABSQon, MaxQpos and Ka3 denote the sum of the absolute values of the charges on the nitrogen and oxygen atoms in a surfactant molecule, the largest positive charge on any atom, and the third-order Hall and Kier index, respectively. Sensitivity analysis showed that, for the CA membrane system, Ka3 and cmc were the most influential descriptors, while the dipole moment and the other descriptors had very limited effects. For the PA membrane system, cmc was the strongest factor, and the dipole moment and log P had very limited effects.
The structure of the alkyl chain, especially its length, has direct or indirect effects on the properties of a surfactant. Using quasielastic light scattering spectroscopy, Biz and Occelli derived the hydrodynamic radius (Rh) of alkyl sulphate micelles [94]. On this basis, Missel et al. performed a theoretical study of the transformation of spherical micelles into rod-shaped micelles [95] and derived the growth constant (K) controlling the growth of sodium dodecyl sulfate micelles. Their further study showed that K varies with chain length (nc = 8-12 carbon atoms): Rh increased approximately linearly with chain length, and its dependence on temperature became stronger as the chain lengthened.
At high NaCl concentrations, micellar growth depends strongly on temperature, and when the surfactant concentration is above the cmc, spherical micelles can associate with each other and transform into cylindrical micelles [96].
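Quasielastic light scattering measures a translational diffusion coefficient D rather than a size; the usual route to Rh, assumed here because the cited work is not quoted in detail, is the Stokes-Einstein relation Rh = kBT/(6πηD), valid for dilute, non-interacting, roughly spherical particles. The input values below are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def hydrodynamic_radius(diffusion_m2_s, temperature_k, viscosity_pa_s):
    """Stokes-Einstein: Rh = kB T / (6 pi eta D), returned in metres."""
    return BOLTZMANN * temperature_k / (6.0 * math.pi * viscosity_pa_s * diffusion_m2_s)


# Hypothetical inputs: D from a QELS fit; water at 25 °C (eta about 0.89 mPa s).
rh = hydrodynamic_radius(1.0e-10, 298.15, 0.89e-3)
print(f"Rh = {rh * 1e9:.2f} nm")
```

Since Rh scales as T/(ηD), temperature effects on micellar size must be separated from the temperature dependence of the solvent viscosity.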
By varying surfactant concentration, washing temperature and washing time, Lindgren et al. studied the correlation between the detergency of some nonionic surfactants and these physicochemical variables and established a QSPR model to predict the cleaning effect (Y) [97]:
$$Y_{\mathrm{obs}} = b_0 + b_1 C + b_2 t + b_3 T + b_{11} C^2 + b_{22} t^2 + b_{33} T^2 + b_{12} C t + b_{13} C T + b_{23} t T + e \qquad (19)$$
where C, t and T are the surfactant concentration, washing time and washing temperature, respectively, and e is the residual. The coefficient b0 is a constant; b1, b2 and b3 are the contributions of the respective variables to the cleaning effect; b11, b22 and b33 indicate whether the effect passes through a maximum or minimum with respect to each variable; and b12, b13 and b23 describe the interactions between variables. According to this model, the detergency of surfactants is influenced mainly by (in order of importance): the longest carbon chain in the hydrophobic part (redc); the critical packing parameter taking the branching of the hydrophobic part into account (redcpp); the hydrophilic-lipophilic balance (HLB); the derivative of the cloud point curve (dCP); the ratio of the longest carbon chain to the total number of carbon atoms in the hydrophobic part (redc/c); and the amount of nonethoxylated fatty alcohol (f-alcohol). It was also shown that the detergency of nonionic surfactants was related to the cmc, the molecular weight, the number of ethyleneoxy units in the hydrophilic part and the number of carbon atoms in the hydrophobic part. The effect of washing temperature was related to the number of different carbon chains present in the hydrophobe (chains), the cloud point and other factors. Together, these results indicate that the detergency of surfactants is governed by their molecular structures.
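The full quadratic (response-surface) form of Eq. (19) can be evaluated as sketched below, together with the stationary point in C obtained from ∂Y/∂C = b1 + 2b11C + b12t + b13T = 0. The coefficients and the coded variable levels are hypothetical, not Lindgren et al.'s fitted values, and the function names are illustrative.

```python
def quadratic_terms(c, t, temp):
    """Model terms of Eq. (19), in order: 1, C, t, T, C^2, t^2, T^2, Ct, CT, tT."""
    return [1.0, c, t, temp, c * c, t * t, temp * temp, c * t, c * temp, t * temp]


def predict_cleaning(b, c, t, temp):
    """Y = sum(b_k * term_k); the residual e is omitted (zero in expectation).
    b is ordered b0, b1, b2, b3, b11, b22, b33, b12, b13, b23."""
    terms = quadratic_terms(c, t, temp)
    if len(b) != len(terms):
        raise ValueError("expected 10 coefficients")
    return sum(bk * xk for bk, xk in zip(b, terms))


def optimal_concentration(b, t, temp):
    """Stationary point in C at fixed t and T; a maximum if b11 < 0,
    a minimum if b11 > 0."""
    b1, b11, b12, b13 = b[1], b[4], b[7], b[8]
    return -(b1 + b12 * t + b13 * temp) / (2.0 * b11)


# Hypothetical coefficients for coded variables in the range -1..+1.
b = [50.0, 5.0, 3.0, 4.0, -2.0, -1.0, -1.5, 0.5, 0.8, 0.2]
print(predict_cleaning(b, 0.0, 0.0, 0.0))   # centre point gives b0
print(optimal_concentration(b, 0.0, 0.0))   # b11 < 0, so this is a maximum
```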
Warszynski and Lunkenheimer analyzed experimental surface tension data for alkyl dimethylphosphine oxides with 7-13 carbon atoms in the alkyl chain and their homologues [98]. They found a repulsive interaction in the adsorption layer and showed that it results from a decrease in configurational free energy caused by closer coiling of the hydrophobic chains. The statistical-mechanical adsorption isotherm for surfactants at the air/water interface proposed by these authors explicitly accounts for the configurational free energy, and the results showed that, at constant surface pressure, the conformational free energy increases linearly with hydrocarbon chain length.
Wang et al. investigated the interaction of CH3(CH2)7OSO3− with 1 to 6 water molecules at the air-water interface by quantum mechanics [99]. DFT was employed at the B3LYP/6-311+G* level to optimize the geometries of the anionic surfactant complexes CH3(CH2)7OSO3−(H2O)n (n = 0-6) and to calculate their vibrational frequencies. The results showed that a hydration shell is formed through H-bonds between the hydrophilic group of CH3(CH2)7OSO3− and six water molecules. These H-bonds are of medium strength, and the binding free energy indicates that the hydration shell is stable. As the number of water molecules increases, the total charge of the hydrophilic group and the S10-O9-C8 bond angle increase, whereas the alkyl chain length and the S10-O11, S10-O12 and S10-O13 bond lengths decrease.
Based on linear solvation energy relationships (LSERs), Vitha and Carr studied the fundamental chemical interactions responsible for solute retention in micellar electrokinetic capillary chromatography (MEKC) [100]. The systems studied were the homologous series sodium dodecyl sulfate (SDS), sodium decyl sulfate (SdecS) and sodium octyl sulfate (SOS). No evident change in the solute-micelle interactions was found when the number of carbon atoms in the alkyl chain changed from 8 to 12; in fact, the chromatographic parameters for the micellar phases of the three systems were identical. Analysis of the LSERs and of the free energy of transfer of a methylene group from water to the micelles (ΔG°CH2) showed that the solvation free energy is essentially independent of alkyl chain length, so SDS, SdecS and SOS have similar solvation properties. An important conclusion was that solutes reside in the polar outer region of the micelles rather than in the nonpolar core.
The work also discussed how a solute's functional groups affect its position and orientation within the micelle.
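In MEKC the retention factor k of a neutral solute is proportional to its micelle/water partition coefficient (the proportionality constant being the phase ratio), so for a homologous solute series the slope of ln k against the number of methylene groups gives the per-CH2 transfer free energy, ΔG°CH2 = −RT·d(ln k)/dn. The sketch below assumes exactly that relation and uses hypothetical retention factors; it is not the procedure of [100].

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def delta_g_ch2(n_carbons, retention_factors, temperature_k=298.15):
    """Per-CH2 transfer free energy (J/mol) from water to the micelle,
    estimated as -RT times the least-squares slope of ln k versus n."""
    xs = list(n_carbons)
    ys = [math.log(k) for k in retention_factors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -R * temperature_k * slope


# Hypothetical homologous series in which k triples per added CH2.
print(delta_g_ch2([1, 2, 3, 4], [0.5, 1.5, 4.5, 13.5]) / 1000)  # about -2.72 kJ/mol
```

A constant ΔG°CH2 across SDS, SdecS and SOS micelles would then appear as parallel ln k lines with equal slopes.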
For predicting the interaction parameters  where ΔH f is the heat of formation of molecule, E t is the total electronic energy, E LUMO and E HOMO are the energies of the lowest unoccupied molecular orbit and the highest occupied molecular orbit, KH0 is the Kier and Hall index of zero order of the hydrophobic fragment of compound and RNNO is the relative number of oxygen and nitrogen atoms of the hydrophilic segment. R 2 , F, s 2 , and n are the correlation coefficient, the F-test, the standard error, and the number of the regression model, respectively. It is known that when more than two surfactants with different molecular structures are mixed, their surface activities often can be increased intensively (synergism) [101]. The interaction between two surfactants is mainly due to electrostatic forces. The strength of attractive electrostatic interaction decreases in the order anionic-cationic > anionic-zwitterionic capable of accepting a proton > cation-zwitterionic capable of losing a proton > anionic-POE nonionic > cationic-POE nonionic. Mixtures of surfactants of the same charge type can show significant interaction at other interfaces interaction, although they show very weak effect at the aqueous solution-air interface [102]. Some studies addressed on this phenomenon have also been conducted. Tu et al. established two kinds of equations of the surface tension vs. the concentration for ideal binary mixtures of surfactants with Newton iterative method [103]. Their accuracy was verified by comparison among the values obtained respectively from the iterative, the numerical and observable approaches for surface tension of the ideal binary mixed homology systems C1 2 H 25 PO(CH 3 ) 2 /C10H 21 PO(CH 3 ) 2 , C 12 H 25 (CH 3 ) 3 NBr/C 16 H 33 (CH 3 ) 3 NBr in aqueous solutions (25 °C), C 8 F 15 O 2 NH 4 /C 9 F 17 O 2 NH4 in 0.1 mol·L -1 ammonium chloride solution (25 °C). 
Their research also suggested that the astringency velocity of the two kinds of iterative methods was very fast and the relative error of expression was below 1%. Wang et al. have defined synergisms in surface tension reduction efficiency and mixed micelle formation of binary surfactant mixtures in aqueous solution by appointing the ideal mixture system of surfactants as the standard of comparison, and they have deduced the conditions and the corresponding optimum point values of these two kinds of synergisms based on the regular solution theory and the ideal solution theory [104].
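Wang et al.'s own conditions are not reproduced here, but the widely used regular-solution treatment gives a flavour of such synergism criteria. The sketch assumes Rubingh's equations for a binary mixed micelle, solving X² ln(α1c12/(Xc1)) = (1−X)² ln((1−α1)c12/((1−X)c2)) for the micellar mole fraction X and then β = ln(α1c12/(Xc1))/(1−X)², followed by Rosen's criteria for synergism in mixed micelle formation (β < 0 and |β| > |ln(c1/c2)|). The input cmc values are generated from assumed X and β purely to check the round trip.

```python
import math


def rubingh_beta(alpha1, c1, c2, c12, tol=1e-12):
    """Regular-solution analysis of a binary mixed micelle (Rubingh).
    alpha1: mole fraction of surfactant 1 in the total surfactant;
    c1, c2: cmc of the pure surfactants; c12: cmc of the mixture.
    Returns (X, beta), X being the micellar mole fraction of surfactant 1."""
    def g(x):
        return (x * x * math.log(alpha1 * c12 / (x * c1))
                - (1 - x) ** 2 * math.log((1 - alpha1) * c12 / ((1 - x) * c2)))

    lo, hi = 1e-9, 1 - 1e-9
    if g(lo) * g(hi) > 0:
        raise ValueError("root not bracketed; check the cmc values")
    while hi - lo > tol:  # bisection on the micellar mole fraction
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    beta = math.log(alpha1 * c12 / (x * c1)) / (1 - x) ** 2
    return x, beta


def micellar_synergism(beta, c1, c2):
    """Rosen's criteria: beta < 0 and |beta| > |ln(c1/c2)|."""
    return beta < 0 and abs(beta) > abs(math.log(c1 / c2))


# Round-trip check with assumed values X = 0.6, beta = -3 (mol/L for cmc).
c1, c2, x_true, beta_true = 1.0e-3, 1.0e-2, 0.6, -3.0
f1 = math.exp(beta_true * (1 - x_true) ** 2)  # activity coefficients
f2 = math.exp(beta_true * x_true ** 2)
c12 = x_true * f1 * c1 + (1 - x_true) * f2 * c2
alpha1 = x_true * f1 * c1 / c12
x, beta = rubingh_beta(alpha1, c1, c2, c12)
print(round(x, 4), round(beta, 3), micellar_synergism(beta, c1, c2))
```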

Conclusions and Prospects
In the preceding sections, progress in QSPR studies of surfactants relating to cloud points, charge distribution, surface tension, degradation and other properties has been reviewed (Table 2). In the earlier QSPR studies of surfactants cited in this review, topological descriptors were mostly used to correlate activities and properties. These studies may offer limited insight, however, because few of them examined how electronic motion within molecules (or at the intermolecular level) affects properties and functions, which is critical to the development and application of surfactants. Quantum mechanical descriptors have therefore been introduced to address this problem, and more recently higher levels of theory and larger basis sets have replaced semiempirical methods to obtain more accurate descriptors. Among statistical methods, PLS, PCA and neural networks are becoming increasingly important as supplements to MLR. Because the statistics of MLR models are very often too optimistic, model reliability should be checked with, for example, leave-one-out cross-validation, randomization or bootstrapping tests. In addition, three-dimensional quantitative structure-property relationship (3D-QSPR) models can also be applied to surfactants, especially for predicting their biodegradation properties. It is now understood that biological surfactants have some properties that synthetic surfactants lack, especially with respect to biodegradability, safety and physiological activity. For surfactants and other molecules with amphiphilic structures [105,106], the effects of structure on function depend mainly on interfacial behavior, which is inevitably related to intermolecular interactions between surfactant molecules and between surfactant and solvent molecules.
Presently, various approximate models based on classical electrostatic potential theory, such as the multipole expansion (MPE), image charge (IC) approximation and apparent surface charge (ASC) approaches, have been developed within the continuum medium model to treat intermolecular interactions in surfactant solutions [107]. In these models, intermolecular interactions, together with the corresponding energies and charge distributions, can be studied with ab initio and semiempirical quantum mechanical methods using various cavity models [108][109][110]. In addition, force field calculations of electrostatic (charge-field) interactions have been used to describe intermolecular interactions and to correlate the properties of small molecules and aggregates [111]. However, this method can currently be applied only to small systems, and applying more advanced models to the quantitative modeling of surfactants and other amphiphilic molecules still requires breakthroughs in the study of interaction mechanisms and in theoretical calculations.
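Of the validation checks mentioned above, leave-one-out cross-validation is the simplest to illustrate. The sketch below, assuming a one-descriptor linear model, refits the model n times with one compound held out and reports q² = 1 − PRESS/SStot next to the fitted R²; because PRESS can never be smaller than the residual sum of squares for least squares, q² ≤ R², and a large gap signals an over-optimistic fit. The chain-length and log cmc values are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def r2_and_q2_loo(xs, ys):
    """Fitted R^2 and leave-one-out q^2 = 1 - PRESS / SS_tot."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict it.
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


# Illustrative data: alkyl carbon number versus log10(cmc / mol L^-1).
xs = [8, 10, 12, 14, 16]
ys = [-0.90, -1.48, -2.09, -2.67, -3.25]
print(r2_and_q2_loo(xs, ys))
```

Randomization testing follows the same pattern: refit after shuffling ys many times and confirm that the shuffled R² values stay well below the real one.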