Modeling the Dispersibility of Single Walled Carbon Nanotubes in Organic Solvents by Quantitative Structure-Activity Relationship Approach

Knowledge of the physico-chemical properties of carbon nanotubes, including their behavior in organic solvents, is important for the design, manufacture, and use of their counterparts with improved properties. In the present study a quantitative structure-activity/property relationship (QSAR/QSPR) approach was applied to predict the dispersibility of single-walled carbon nanotubes (SWCNTs) in various organic solvents. A number of additive descriptors and quantum-chemical descriptors were calculated and used to build QSAR models. The best predictive ability was shown by a 4-variable model, which gave statistically good results (R²(training) = 0.797, Q² = 0.665, R²(test) = 0.807) with high internal and external correlation coefficients. The presence of the X0Av descriptor with a negative coefficient suggests that smaller solvent molecules disperse SWCNTs better. The mass-weighted descriptor ATS6m indicates that heavier (yet small) solvent molecules are most probably better solvents for SWCNTs. The presence of the Dipole Z descriptor indicates that higher polarity of the solvent molecule increases the solubility. The developed model and its descriptors can help to understand the mechanism of the dispersion process and to predict organic solvents that improve the dispersibility of SWCNTs.


Introduction
With the rapid development of nanoscience and nanotechnology, carbon nanotubes (CNTs) have attracted a great deal of attention due to their unique and versatile properties since single-walled carbon nanotubes (SWCNTs) were discovered by Iijima in 1993 [1]. SWCNTs are known for exhibiting unique mechanical, electrical, and thermal properties that are useful for a wide range of applications in materials. However, SWCNTs are quite unusual systems and possess extremely low solubility or poor dispersibility in water and many common organic solvents [2]. Because of their high polarizability, hydrophobic surface, and substantial van der Waals interactions, CNTs tend to aggregate with each other, as well as with other chemical and biological systems to give mixed aggregates, especially in water [3,4]. Treating CNTs with various active chemicals can change their surface properties and therefore their tendency to aggregate. This can be achieved by oxidizing CNTs with strong acids, such as refluxing in a mixture of sulfuric acid and nitric acid [5,6], "piranha" solution (sulfuric acid-hydrogen peroxide) [7], or boiling nitric acid [8], or by treating them with oxidative gases, such as ozone [4,9]. However, some active chemicals can damage the CNTs: treatment under such harsh conditions clearly deviates from green chemistry and results in the opening of the tube tips [5], shortening of the tubes [7], and fragmentation of the sidewalls [8]. Therefore, the stability of CNTs may decrease, along with other important properties.
In recent years several studies have described the preparation of stable suspensions of SWCNTs in a range of known solvents [9-16]. Hildebrand and Hansen solubility parameters have received close attention from researchers as a means of obtaining detailed structural information and dispersion values of SWCNTs in organic solvents [17-20]. The main purpose of these studies was to improve the dispersion of nanotubes and to understand the dispersion process.
Quantitative structure-activity relationships (QSARs) are often used to predict various physicochemical and biological properties of chemicals. Considering the difficulties in obtaining experimental data in CNT research, theoretical approaches, including quantitative structure-property relationships (QSPRs), can provide useful predictions of the dispersibility of carbon nanotubes directly from the structure. Discovering and developing effective new organic solvents for SWCNTs and C60 has attracted the attention of many researchers [21-26]. The first attempt to explain SWCNT dispersibility with a QSAR approach was made by Rofouei et al. [27], and the second by Salahinejad et al. [28]. Our study is focused on developing an improved QSPR model that is able to predict the dispersibility of SWCNTs in various organic solvents.

Data Set
QSAR modeling was applied to the dispersibility of single-walled carbon nanotubes in a pool of 29 different organic solvents (molecular structures are shown in Figure 1), which were selected from Bergin et al. [29]. The 29 solvents were randomly divided into training (22 compounds) and test (7 compounds) sets, and the split was balanced across the variables. The dispersibilities of SWCNTs in the studied solvents were expressed in terms of the concentration C (mg/mL) (Table 1). All original concentration data were converted to molar LogC(exp) values.
Table 1 notes: * Test set; ** Experimental data are taken from [29].
Figure 1. The structures of 29 organic solvents for single-walled carbon nanotube (SWCNT) derivatives.

Quantum Chemical Calculations
The initial structures of the investigated organic solvents were built using the HyperChem 7.5 package [30]. The structures were then pre-optimized with the Molecular Mechanics Force Field (MM+) procedure included in HyperChem. The semiempirical quantum-chemical descriptors (including total energy, binding energy, electronic energy, nuclear energy, heat of formation, total dipole moment, X, Y, and Z components of the dipole moment, EHOMO, ELUMO, surface area, volume, hydration energy, refractivity, LogP, polarizability, and mass) were calculated by the RM1 method implemented in HyperChem. An initial set of 258 theoretical descriptors was selected from the entire set generated by the DRAGON software [31] and used to describe the chemical diversity of the compounds. The software provides about 4000 descriptors corresponding to 0D-, 1D-, 2D-, and 3D-descriptor modules. These modules comprise 20 different classes of descriptors, namely, the constitutional, the topological, the walk and path counts, the connectivity indices, the information indices, the 2D autocorrelations, the edge adjacency indices, the Burden eigenvalues, the topological charge indices, the eigenvalue-based indices, the Randic molecular profiles, the geometrical descriptors, the RDF descriptors, the 3D-MoRSE descriptors, the WHIM descriptors, the GETAWAY descriptors, the functional groups, the atom-centered fragments, the charge descriptors, and the molecular properties descriptors [32,33]. In addition, density functional theory (DFT) calculations with the hybrid meta exchange-correlation functional M06-2X and the 6-311G(d,p) basis set [34] were applied to obtain another set of quantum-chemical physico-chemical parameters of the studied SWCNT solvents, including dipole moments (total dipole moment and its X, Y, and Z components), orbital energies (EHOMO, ELUMO), and heats of formation.
All DFT calculations were performed using the Gaussian 09 software [35].

QSAR Modeling and Statistical Analysis
The correlation between dispersibility and structural descriptors was obtained by using Genetic Algorithm (GA) variable selection and Multiple Linear Regression Analysis (MLRA). Preliminary model selection was performed by means of the GA-MLRA technique [36-38] as implemented in the BuildQSAR program [39]. Genetic algorithms have been applied in recent studies as a powerful tool to address many problems in QSAR [36-38]. The method is based on the mechanism of the evolution of species: descriptors with higher weights are preserved in the mathematical model, while those with lower weights are eliminated. In this way, the model that best represents the experimental activity is obtained [36-38,40,41]. We restricted the GA variable selection to models with 1-5 variables. The MLR technique was used to develop QSAR models since it is transparent, easily interpretable, and well suited to obtaining reproducible results. The QSAR models developed were evaluated statistically by the squared correlation coefficient R², the standard error s, the Fisher coefficient F, and the non-collinearity of the descriptors in the model. The final set of QSAR models was tested by the "leave-one-out" technique: each molecule in turn is removed from the training set, the model is refitted on the remaining molecules, and the removed molecule is predicted, which is repeated for the entire training set. The cross-validated Q² was then calculated from the predictive error sum of squares (PRESS).
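The leave-one-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration with NumPy, not the BuildQSAR implementation used in the study: the function names (`fit_mlr`, `predict_mlr`, `r_squared`, `loo_q2`) are hypothetical, the descriptor matrix is synthetic, and Q² is assumed to take the common form Q² = 1 - PRESS / SS_tot.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for descriptor rows X with fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(X, y):
    """Fitted (training) R^2 of the MLR model."""
    resid = y - predict_mlr(fit_mlr(X, y), X)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def loo_q2(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot (assumed definition)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # drop molecule i
        coef = fit_mlr(X[keep], y[keep])         # refit on the remaining n-1
        pred = predict_mlr(coef, X[i:i + 1])[0]  # predict the left-out molecule
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in for a 22-compound training set with 4 descriptors
# (illustrative numbers only, not the data of this study).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(22, 4))
y_demo = X_demo @ np.array([0.8, 0.5, 0.3, -0.6]) + 0.3 * rng.normal(size=22)
print("R2 =", round(r_squared(X_demo, y_demo), 3), " Q2 =", round(loo_q2(X_demo, y_demo), 3))
```

Because each left-out prediction is made by a model that never saw that molecule, Q² cannot exceed the fitted R² for an ordinary least-squares model; a large gap between the two is a common warning sign of overfitting.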
Selecting robust and predictive QSAR models on the basis of R², Q², and R²(pred) alone might mislead the search for the ideal predictive model, so additional statistical analysis was carried out on the basis of further parameters, namely "average rm²" and "delta rm²". For an acceptable QSAR model, the value of "average rm²" should be >0.5 and "delta rm²" should be <0.2 [42,43].
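As an illustration of how these two criteria can be evaluated, the sketch below uses the form of the rm² metrics commonly given in the literature, rm² = r²(1 - sqrt(|r² - r0²|)), where r0² is the squared correlation coefficient of the regression through the origin and the primed value rm'² is obtained by interchanging the observed and predicted axes. These formulas are an assumption here, since the text does not spell them out; the function names and the example values are hypothetical.

```python
import numpy as np

def r2_through_origin(y, x):
    """r0^2 for the regression of y on x forced through the origin (assumed form)."""
    k = np.sum(x * y) / np.sum(x * x)  # slope of the zero-intercept line
    return 1.0 - np.sum((y - k * x) ** 2) / np.sum((y - y.mean()) ** 2)

def rm2_metrics(y_obs, y_pred):
    """Return (average rm^2, delta rm^2) for observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    r02 = r2_through_origin(y_obs, y_pred)      # observed vs. predicted
    r02_rev = r2_through_origin(y_pred, y_obs)  # axes interchanged
    rm2 = r2 * (1.0 - np.sqrt(abs(r2 - r02)))
    rm2_rev = r2 * (1.0 - np.sqrt(abs(r2 - r02_rev)))
    return (rm2 + rm2_rev) / 2.0, abs(rm2 - rm2_rev)

# Hypothetical LogC values for a 7-compound test set (not taken from Table 1).
y_obs_demo = [-3.1, -2.4, -2.9, -1.8, -2.2, -3.5, -2.0]
y_pred_demo = [-2.9, -2.5, -3.0, -2.0, -2.1, -3.2, -2.3]
avg_rm2, delta_rm2 = rm2_metrics(y_obs_demo, y_pred_demo)
print("acceptable:", avg_rm2 > 0.5 and delta_rm2 < 0.2)
```

Both quantities are symmetric in the two axes, so the result does not depend on which set of values is treated as the reference.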

Results and Discussion
Our study was focused on developing a valid model that is able to predict the dispersibility of SWCNTs in various organic solvents. For this purpose we utilized a QSAR approach. Dragon software-generated additive descriptors, as well as semi-empirical and quantum mechanical descriptors were calculated and a total of 280 descriptors were used to build a QSAR model.
The correlation matrix for the most populated 2D-3D descriptors and LogC(cal) used in the present study is shown in Table 2. The sign of the correlation indicates whether two variables are positively (more X means more Y) or negatively (more X means less Y) related. LogC(cal) and SRW09 had a good positive correlation (r = 0.706), so SRW09 is strongly associated with dispersibility. In addition, LogC(cal) was found to be correlated with the Ram descriptor (r = 0.530) and with Dipole Z (r = 0.377). However, the correlation for either of these two descriptors taken as a single descriptor in the model was not sufficient to be considered significant in predicting dispersibility. Table 3 lists the performance of all developed models with 1-5 variables for the training and test sets. The 4-variable GA-MLRA based model showed the best predictive ability (R²(training) = 0.797, Q² = 0.665, R²(test) = 0.807), with high internal and external correlation coefficients. The R² values for the training set increase with the number of variables (1-variable model < 2-variable model < 3-variable model < 4-variable model), whereas for the test set they increase in the order 2-variable model < 4-variable model < 3-variable model < 1-variable model. On the basis of these results (Figure 2), the 4-variable model was chosen as the most predictive and robust model.
Table 3. Descriptor names and statistical values for the developed models (statistics are shown for the data set split into training (22 compounds) and test (7 compounds) sets).
Table 4 presents the descriptor values selected using GA-MLR variable selection for the dispersibility of SWCNTs in organic solvents. The one-, two-, and four-variable models include SRW09 (9th-order self-returning walk count). Organic solvents such as 1, 3-10, 13, and 20, which have 5-membered rings, show a high value of SRW09, with a positive impact on SWCNT dispersibility.

The 1-variable model is represented by Equation (1). The first descriptor in Equation (1) is SRW09 (from the MWC class). It is one of the 2D descriptors representing self-returning walk counts of different lengths. The SRW count of any even order reflects the length and shape (branching) of the entire molecular graph: it increases when the number of atoms increases, when a molecule becomes more branched, or when it contains even-membered rings. The SRW count of odd order, on the other hand, reflects only the local surroundings of odd-membered rings [44,45]. To illustrate this, the self-returning walks of the 9th order for the structures of molecules 1 and 4 are shown in Figure 3.
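The behavior of odd-order self-returning walks can be checked numerically. The sketch below assumes the standard graph-theoretical definition, in which the number of closed walks of length k equals the trace of the k-th power of the adjacency matrix of the hydrogen-depleted molecular graph; the normalization used by DRAGON may differ. The two graphs are plain five- and six-membered rings chosen for illustration, not molecules 1 and 4 of the data set, and the function names are hypothetical.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of an n-membered ring (hydrogen-depleted graph)."""
    A = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = A[j, i] = 1
    return A

def srw_count(A, k):
    """Number of self-returning (closed) walks of length k: trace(A^k)."""
    return int(np.trace(np.linalg.matrix_power(A, k)))

five_ring = cycle_adjacency(5)  # e.g. the carbon skeleton of cyclopentane
six_ring = cycle_adjacency(6)   # e.g. the carbon skeleton of cyclohexane

print(srw_count(five_ring, 9))  # 360: odd closed walks exist in an odd ring
print(srw_count(six_ring, 9))   # 0: no odd closed walks in an even ring
print(srw_count(five_ring, 2), srw_count(six_ring, 2))  # 10 12: even order grows with size
```

A graph with no odd-membered ring is bipartite and therefore has no closed walks of odd length, which is why a 9th-order count is nonzero only for solvents that contain an odd-membered ring.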
SRW09 is the self-returning walk count of the 9th order and represents the surroundings of odd-membered rings (five-membered in our case). It provides valuable insight into the relationship between structure and dispersibility.
The 2-variable model is represented by Equation (2). The second descriptor in Equation (2) is a dipole moment descriptor, a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. Both the magnitude and the components (X, Y, Z) of the dipole moment are calculated. The descriptor is estimated from partial atomic charges and atomic coordinates. The presence and sign of the dipole moment contribution indicate that the higher the polarity of the attached fragment, the higher the overall dispersibility value of the SWCNT derivative.
The 3-variable model is represented by Equation (3). Here one can see the presence of the Ram descriptor, which contributes positively to the dispersibility. The topological descriptor Ram addresses the branching in the molecule [46]. Its regression coefficient favors more branched molecular structures for increased activity. Another descriptor is the molecular multiple path count of order 05 (piPC05), which reflects the length of the molecule.
In model (4) one can see the presence of two other descriptors, ATS6m and X0Av. The walk and path counts descriptor SRW09 and the 2D autocorrelation descriptor ATS6m, an atomic mass weighted term, show the higher weights with positive influence. In this model, ATS6m showed the highest contribution. The activity exhibits a negative linear relationship with the connectivity index descriptor X0Av, which is the average valence connectivity index chi-0.
Analysis of these influential molecular descriptors can help to reveal the mechanism of the dispersion process, and the model can thus be used to predict new organic solvents that improve the dispersibility of SWCNTs. For example, the X0Av descriptor with its negative coefficient suggests that smaller solvent molecules have a better influence on the solubility of SWCNTs. Also, the mass-weighted descriptor ATS6m indicates that a heavier solvent (that is at the same time small in size) is most probably a better solvent for SWCNTs. The presence of the Dipole Z descriptor indicates that higher polarity of the solvent molecule increases the solubility.
The dependence of the R² values on the number of variables in the models, for the training and test sets, is displayed in Figure 2. The correlation graph of the best QSAR model (Equation (4)) is shown in Figure 4. The GA-MLRA based QSAR model with four variables showed better results than the other models (R²(training) = 0.797, Q² = 0.665, R²(test) = 0.807), with high internal and external correlation coefficients. The good agreement between the predictions and the experimental values confirmed the reliability of the QSAR model.

Conclusions
In the present work we investigated the influence of the characteristics of a series of organic solvents on the dispersibility of SWCNTs. For this purpose both the additive descriptors (DRAGON-software based) and quantum-chemical descriptors were generated, a total of 280 descriptors.
The 4-variable model also retains a good ratio between the number of descriptors and predictive ability. The GA-MLRA based model showed good results (R²(training) = 0.797, Q² = 0.665, R²(test) = 0.807), with high internal and external correlation coefficients. Model (4) developed here showed the highest performance with the following four descriptors: SRW09, ATS6m, Dipole Z, and X0Av. In this model the atomic mass weighted term (ATS6m) showed the highest contribution. The other significant descriptors are SRW09 (a walk and path counts descriptor) and Dipole Z (a 3D descriptor). The descriptors SRW09 and Dipole Z exhibit a positive influence on the dispersibility. The molecular multiple path count of order 05 (piPC05) and the average valence connectivity index of order 0 (X0Av) showed a negative influence on the considered activity. The X0Av descriptor with its negative coefficient suggests that smaller solvent molecules have a better influence on the solubility of SWCNTs. Also, the mass-weighted descriptor ATS6m indicates that a heavier solvent (that is at the same time small in size) is most probably a better solvent for SWCNTs. The presence of the Dipole Z descriptor indicates that higher polarity of the solvent molecule increases the solubility. Analysis of these influential molecular descriptors can reveal details of the mechanism of the dispersion process and thus enable predictions of new organic solvents that improve the dispersibility of SWCNTs.