Next Article in Journal
Multi-Step Usage of in Vivo Models During Rational Drug Design and Discovery
Previous Article in Journal
One-Dimensional Hydrogen-Bonded Infinite Chain from Nickel(II) Tetraaza Macrocyclic Complex and 1,2-Cyclopentanedicarboxylate Ligand
Article Menu

Export Article

Int. J. Mol. Sci. 2011, 12(4), 2242-2261; doi:10.3390/ijms12042242

Improving the Accuracy of Density Functional Theory (DFT) Calculation for Homolysis Bond Dissociation Energies of Y-NO Bond: Generalized Regression Neural Network Based on Grey Relational Analysis and Principal Component Analysis
Hong Zhi Li 1, Wei Tao 1, Ting Gao 2, Hui Li 2, Ying Hua Lu 1,2,* and Zhong Min Su 1,*
Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun, 130024, China
School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130017, China
Authors to whom correspondence should be addressed; Tel.: +86-0431-85099108; Fax: +86-431-85684009.
Received: 2 February 2011; in revised form: 21 February 2011 / Accepted: 31 March 2011 / Published: 1 April 2011


: We propose a generalized regression neural network (GRNN) approach based on grey relational analysis (GRA) and principal component analysis (PCA) (GP-GRNN) to improve the accuracy of density functional theory (DFT) calculation for homolysis bond dissociation energies (BDE) of Y-NO bond. As a demonstration, this combined quantum chemistry calculation with the GP-GRNN approach has been applied to evaluate the homolysis BDE of 92 Y-NO organic molecules. The results show that the ull-descriptor GRNN without GRA and PCA (F-GRNN) and with GRA (G-GRNN) approaches reduce the root-mean-square (RMS) of the calculated homolysis BDE of 92 organic molecules from 5.31 to 0.49 and 0.39 kcal mol−1 for the B3LYP/6-31G (d) calculation. Then the newly developed GP-GRNN approach further reduces the RMS to 0.31 kcal mol−1. Thus, the GP-GRNN correction on top of B3LYP/6-31G (d) can improve the accuracy of calculating the homolysis BDE in quantum chemistry and can predict homolysis BDE which cannot be obtained experimentally.
Y-NO bond; homolysis bond dissociation energy; density functional theory; grey relational analysis; principal component analysis; generalized regression neural network

1. Introduction

Nitric oxide (NO) is an important signaling and effector molecule that is key to many physiological functions of the body (e.g., blood pressure regulation, immune system and nerve conduction), and plays a vital role in the regulation of [19]. NO has a high chemical activity and rarely exists in the form of free radical in the body. It becomes stable by binding itself with the carrier molecules in the body in specific binding sites that enable its storage, transfer and release.

Inspired by the concepts of the “proton affinity” and “electron affinity”, Cheng et al. proposed the concept of NO affinity which can be defined as the measure of the strength for a receptor (X) to bind with the NO group [10]. It is characterized using the energies of the Y-NO bond (Y is the atom in the carrier molecules to which the NO group is attached, Y = N, S, O, C) in two different ways:

X N O X + N O
X N O X + N O +

The first reaction represents the homolysis which is chemical bond dissociation of a neutral molecule generating two free radicals. The energy during the reaction is referred to as homolysis Bond Dissociation Energy (BDE).The second reaction represents the heterolysis which is chemical bond cleavage of a neutral molecule generation an anions or cations. The energy during the reaction is referred to heterolysis BDE. In recent years, Cheng et al. developed a simple experimental approach to measure the homolysis and heterolysis bond dissociation energies (BDE) of the Y-NO (Y = C, N, O, S) bond in solution [1020]. Experimental data show that the heterolysis energy of Y-NO bond is generally higher than the homolysis BDE of Y-NO bond which implies that it is easier for the NO carrier to release NO than NO and NO+. The carrier molecule is a potential free radical to bond with NO. The study of the heterolysis and homolysis BDE of the carrier molecules containing Y-NO (Y = C, N, O, S) bond helps measure the bonding and release capacity of NO in the body and understand and predict the transfer direction and mechanism of NO in the body.

Quantum chemistry approaches are not only limited to the level of experimental validation, but also can predict the BDE without experimental results or with uncertain experimental. The main reason leading to this limitation is the similarity between the computational approaches. Therefore, in recent 10 years, many statistical approaches have been used to improve the accuracy of quantum chemistry. First, the molecular properties are obtained from the calculation of quantum chemistry approaches and then the statistical approaches are applied to establish the relation between the experimental and calculated values. These statistical improvements include linear approaches, such as multiple linear regression [21], and nonlinear approaches, such as neural networks, etc. [2224]. Although multiple linear regression approach is simple and intuitive, neural networks can better solve complex nonlinear problems which are difficult to model mathematically given the same physical parameters [2527]. If the training of the neural networks is based on the back propagation (BP) algorithm, it is vulnerable to the slow convergence rate, and gets stuck at the local minimum points [28]. The genetic algorithm is an efficient global search approach which has been adopted in problems with a large search state-space to explore the globally optimal results. Therefore, the genetic algorithm can be used to optimize the weights of neural network [29].

Roman M. Balabin et al. estimate the density functional theory (DFT) energy with large basis set using lower-level energy values and molecular descriptors [30]. A total of 208 different molecules were used for the artificial neural network (ANN) training, cross validation, and testing by applying BLYP, B3LYP, and BMK [31] density functionals. An expected error, mean absolute deviation, ANN approximation to DFT energies was 0.6 ±0.2 kcal mol−1. Wu and Xu proposed the X1 approach that combines the DFT (B3LYP) with the neural network correction for an accurate prediction of formation heat [25]. An error close to the G3 approach (1.34 versus 1.05 kcal mol−1 for the G3/99 molecule set) was reached.

Chen and co-workers proposed a DFT-NEURON approach to establish the quantitative relationship between the experimental data and the results computed from the first main principle [23]. This relationship was then used to reduce the error margin of the values of the computed absorption energy [32]. With the TDDFT/B3LYP/6-31G (d) approaches, the root-mean-square (RMS) for the absorption energies in 60 organic molecules was reduced from 0.33 to 0.09 eV. Recently, our research group proposed a successful improvement approach based on genetic algorithm and neural network (GANN) to correct the absorption energies of 150 organic molecules [33]. In addition, we also proposed a least squares support vector machine approach to correct the absorption energies of 160 organic molecules [34].

There are mainly two factors affecting the accuracy of the calculation of the homolysis BDE: (1) the selection of molecular descriptors and pretreatment; (2) the statistical approaches. Some researchers only focus on the statistical approach selection and ignore the molecular descriptors selection and pretreatment. The subjective choice of molecular descriptors, the distribution of the weights, the redundancy in the chosen molecular descriptors and the multiple correlations in molecular descriptors all affect the final results. Therefore, the molecular descriptors selection and pretreatment are significant.

In this paper, the Generalized Regression Neural Network (GRNN) based on the grey relational analysis (GRA) and principal component analysis (PCA) (GP-GRNN) approach is proposed to improve the accuracy of calculating the homolysis BDE of 92 organic molecules. The DFT B3LYP/6-31G (d) approach is first applied to optimize the carrier molecules and calculate their frequency in order to obtain the homolysis BDE value and relevant molecular descriptors of the Y-NO (Y = C, N, O, S) bond. GRA is used to select the appropriate molecular descriptors. PCA is used to optimize the selected molecular descriptors. Finally, GRNN is used to establish nonlinear model. Then the GP-GRNN is applied to reduce the RMS of homolysis BDE for the 92 organic molecules. The results show that GP-GRNN is a more accurate and informative correction technique in chemical physics.

2. Description of Approach

2.1. Grey Relational Analysis

If the index characteristics of a system are represented as a reference array x0 (n-dimensional), the array is used as reference index (experimental value of homolysis BDE in this paper). If an array that has multiple characteristics is related to the reference index, then the array is represented as xi(i = 1,2,…,m) (n-dimensional). To measure the relationship between the reference index and the arrays xi(i = 1,2,…,m), the concept of relational coefficient is introduced below:

r i ( k ) = min i min k | x 0 ( k ) x i ( k ) | + α * max i max k | x 0 ( k ) x i ( k ) | | x 0 ( k ) x i ( k ) | + α * max i max k | x 0 ( k ) x i ( k ) | ( i = 1 , 2 , , m ; k = 1 , 2 , n )
where α is the distinguishing coefficient, ri(k) is the relational coefficient between xi and x0 at the kth characteristic.

The following equation further processes the relational coefficients to obtain the ɛi which is the relation degree between xi and x0.

ɛ i = [ r i ( 1 ) + r i ( 2 ) + + r i ( n ) ] / n
ɛi is the average of the relational coefficients [3537].

2.2. Principal Component Analysis

PCA is a multivariate statistical approach used for reducing variables in the data [38]. Its basic content is to convert a set of original indices into a set of comprehensive indices which are new and uncorrelated with one another to replace the original. According to the actual needs, it is likely to select several fewer comprehensive indices which can reflect the information of original indices as much as possible to represent the total indices of original variables to achieve the purpose of variable reduction.

PCA is aimed at compressing the number of variables and making the model reflect the real situation better by using fewer variables to explain most of the variables in the original data and eliminating the redundancy. This means to convert a number of highly correlated variables into new, fewer and independent of one another variables, i.e., principal component which can explain most of the variance of the original data. This approach can erase the collinearity existing in the original variables, and overcome the problems, such as instability of calculation, ill-conditioned matrix and so on and so forth.

The following part shows the approach and calculation steps that PCA adopts to determine the weight.

We suppose that the number of sample (organic molecules) is n and the number of indicators’ (molecular descriptors) value in each sample is m. Then, we can organize the experimental data into a matrix.

X = [ X i j ] ( i = 1 , 2 , , n ; j = 1 , 2 , , m )

(a) Standardize the original data

The indicator in each sample is converted into the standardized indicator X*j according to the standardized Equation 6. Therein, Xj and Sj is respectively are the mean and standard deviation of Xj. The mean of X*j is 0 and the variance is 1.

X i j * = X i j X ¯ j S j ( i = 1 , 2 , , n ; j = 1 , 2 , , m )

(b) Calculate the correlation coefficient rij, of each standardized indicator X*j and write down the matrix of correlation coefficient: R = [rij]m*m. Therein,

r i j = 1 n 1 t = 1 n X t i X t j ( i , j = 1 , 2 , , m )

(c) Calculate the eigenvalue λi (I = 1,2,…,m) of related matrix R and then, arrange the eigenvalue λi according to the descending order λ1λ2λ3 ≥ … ≥ λm ≥ 0. α i = λ i m is called the variance contribution of the principal component Zi, i.e., the weight of the principal component Zi.

2.3. Generalized Regression Neural Network

GRNN was proposed by the American scholar D. F. Specht [39]. The approach uses vertical basis function as the basis of the hidden units to form the hidden layers. The hidden layers (include pattern layer and summation layer) transform the input vectors from the low-dimensional input data into a high dimensional space so that the problem can be separated linearly in the high dimensional space. It is good at function approximation and the network finally converges to the optimized regression plane which contains the most samples. It can predict well, even with very few sample data, and can handle instability in the data. The structure of GRNN is composed of four layers, input layer, pattern layer, summation layer and output layer (Figure 1).

The output is Y which corresponds to the net input X = [x1,x2,,xp]T. The number of neurons in the input layer is equal to the dimensions of input vector p in the study sample. The number of neurons in pattern layer is equal to the number of study sample n. The transfer function of neuron n is

p i = exp [ ( X X i ) T ( X X i ) 2 δ 2 ] ( i = 1 , 2 , n )
therein, X is the network’s input variable and δ is the smoothing factor which determines the shape of function. The larger the value is, the smoother the function is. Xi is the corresponding study sample of neuron i. Each unit in the pattern layer corresponds to a training sample and the Gaussian function is treated as the activation of kernel function. Two types of neurons are used for summation in the summation layer, one is
X = [ X i j ] ( i = 1 , 2 , , n ; j = 1 , 2 , , m )
whose connection weight with each neuron in the pattern layer is 1, and the other is
S N = i = 1 n y i p i
whose connection weight is each factor yi of the output sample in the pattern layer in which the weighted sum is adopted to work out the summation of the output of corresponding neurons. The output of neurons in the output layer is
Y = S N S D
i.e., the output of network. In our paper, Figure 2 shows a flow chart of GP-GRNN model calculation.

3. Computational

3.1. Data Set

Ninety-two (92) important organic carrier molecules containing NO are studied in this work. They are the four typical NO carrier molecules in the acetonitrile solution: N-nitrosamine compounds, O-nitrite, C-nitroso compounds, S-nitrosylation compounds (their molecular structures are shown in Table 1). The data set is randomly divided into a training set (80 molecules) and a test set (12 molecules). The training set used to adjust the model parameters and the test set is used to test the model's predictive ability.

3.2. Calculation of Molecular Descriptors

All of the calculations were done using Gaussian03 [40]. The geometry was optimized at the DFT/B3LYP level with 6-31G (d) basis set. Subsequently, vibrational frequencies were performed at the same theoretical level to confirm their local minima. The gas phase homolysis BDE is defined as the enthalpy change of the Equation 1 at 298 K in a vacuum [41]. The enthalpy of formation for each species was calculated using the following equation:

H 298 = E + Z P E + Δ H 298 0 + R T

The zero point energy correction was taken into account in the calculation. ΔH298–0 is the standard temperature correction term including Hvib, Hrot and Htrans.

Δ H hom o = H 298 ( X ) + H 298 ( N O ) H 298 ( X N O )

A set of molecular descriptors can be obtained from the geometry optimization and frequency calculations. The molecular descriptors include the calculated homolysis BDE value (ΔHhomo), the net charge (QY) on the Y(C, N, O, S) atom which is bonded with the NO, the net charge (QN) and (QO) on the atoms N, O in the NO molecule fragment, the number of electrons (NX) on the fragment X (excluding the NO radical), the molecule dipole moment (μ) and the molecule polarizability (α), the highest occupied molecular orbital and the lowest unoccupied molecular orbital energy (EHOMO) and (ELUMO), the second highest occupied molecular orbital and the second lowest unoccupied molecular orbital energy (EHOMO−1) and (ELUMO+1), the energy gap (ΔE) between EHOMO and ELUMO. The molecular descriptors can reflect covalent and ionic interactions in a molecule bond.

4. Results and Discussion

4.1. Calculation of Descriptor

For the 92 organic molecules containing the Y-NO (Y = C, N, O, S) bond, the B3LYP function is used to optimize the geometry at 6-31G (d) basis set level and the frequency is calculated to confirm the stable structure. Finally, the homolysis BDE of Y-NO (Y = C, N, O, S) bond and relevant molecular descriptors are obtained and shown in Table 2.

In addition, Table 2 also provides the homolysis BDE values that are observed in the experiment in the acetonitrile solution. The higher the homolysis BDE is, the stronger that the NO can bond with the carrier and vice versa. Lower homolysis BDE indicates that the carrier can serve as a good NO releasing agent. The four types of carrier molecules studied in this paper have lower homolysis BDE and are excellent NO radical carriers in the body. From Table 2, we can see that the carrier molecules containing N-NO, O-NO, S-NO and C-NO bonds have homolysis BDE at 12.4–43.8 kcal mol−1, 32.5–38.6 kcal mol−1, 17.2–29.2 kcal mol−1 and 27.5–31.4 kcal mol−1 respectively. The carrier molecule containing the S-NO bond has the lowest homolysis BDE which indicates that the carrier containing the S-NO bond is the best free NO radical carrier among the four compounds studied. The theoretical calculated homolysis BDE levels for the four compounds are at 12.63–35.57 kcal mol−1, 29.80–40.39 kcal mol−1, 17.49–27.96 kcal mol−1 and 17.69–24.62 kcal mol−1 respectively. Compared with the experimental values, the calculated value for the molecule carrier containing the N-NO bond is close to its experimental value with some underestimations and the calculated energy for the molecule carrier containing the O-NO and S-NO bonds is very close to their experimental values. The calculated energy for the molecule carrier containing the C-NO bond is lower than its experimental value.

4.2. Calculation Results of GRA

The experimenatal value of homolysis BDE of the 92 carrier molecules are used as the reference array. The 12 computed molecular descriptors are used as the contrast array. The closer the relation is to 1, the more relations the two arrays have and vice versa.

When the distinguishing coefficient is set to 0.5, ΔHhomo, QY, QN, QO, NX, µ, α, EHOMO−1, EHOMO, ELUMO, ELUMO+1, ΔE are 0.8902, 0.6137, 0.7946, 0.8626, 0.8079, 0.7899, 0.8096, 0.8889, 0.8925, 0.8359, 0.7020, 0.8827 respectively. The relational coefficients of the 12 parameters on the experimental values in decreasing order are EHOMO, ΔHhomo, EHOMO−1, ΔE, QO, ELUMO, α, NX, QN, μ, ELUMO+1, QY. Clearly, the EHOMO and EHOMO−1 have a big impact on the homolysis BDE. The ELUMO has a certain influence on the homolysis BDE. The ELUMO+1 has little impact on the homolysis BDE. The relational coefficient is 0.8902 which indicates a good match between the theoretical calculations and the experimental values. The difference between the EHOMO energy level and ELUMO energy level ΔE can measure the stability of the molecules and has a large impact on the homolysis BDE. In the NO molecular fragments, the oxygen’s electronegativity is greater than that of nitrogen and the net charge of oxygen has a larger impact than that of the nitrogen on the homolysis. The net charge on the Y(C, N, O, S) atoms which are connected to NO has the minimal impact on the homolysis which implies that the ionicity of chemical bonds is smaller. The polarization rate can measure the molecular deformation and affect the homolysis BDE. The number of electrons on the molecular fragment X which is connected to NO has a certain impact on the homolysis. The dipole moment is a vector that has little impact on the homolysis which indicates that the Y-NO direction has little impact on the molecular dipole moment.

4.3. Calculation Results of PCA

According to the results of GRA, the molecular descriptors that have a relation of greater than 0.8 are selected. The first eight molecular descriptors are used as the basic characteristics. A correlation matrix is generated after the correlation analysis on the eight molecular descriptors (Table 3). It can be seen that there is a certain correlation between them and the correlation between α and NX is as high as 0.9331. It is inevitable that increasing the complexity of data analysis, if these eight selected molecular descriptors are taken as the final attribute characteristics directly, there would be some problems such as instable calculation and ill-conditioned matrix which are caused by superposed information existing in the above eight molecular descriptors. Nevertheless, these problems can be avoided through PCA that can also make the weight distribution of the molecular descriptors more reasonable, avoid the redundant information, and eliminate the not useful information.

The PCA is performed on the eight selected molecular descriptors to obtain the eigenvalues, variance and cumulative variance contribution rate for each principle component (Table 4). It can be seen that the first six principal components can explain 99.63% of the total variance of all variables, which means that the six indicators of the new indicators system can reflect 99.63% differences in samples. The eigenvalue for the seventh principal component is already very small and the eighth eigenvalue is 0. The smaller the eigenvalue is, the less amount of information its principal component contains. Therefore, the seventh principal component contains very little useful information and the eighth principal component contains no useful information. Therefore, the first six principal components are selected to avoid redundancy of information and eliminate the interference information, namely no useful information.

The weights of the first six principal components and the corresponding molecular descriptors are shown in Table 5. From the analysis of the weights of the first six principal components, it can be seen that the EHOMO−1 and ELUMO have a higher load in the first principal component, the ΔHhomo and ΔE on the second principal component have a higher load, the NX and α on the third principal component have a higher load, the ΔHhomo on the fourth principal component has a higher load, the QO on the fifth principal component has a higher load and the EHOMO on the sixth principal component has a higher load. Thus the assignment to the weights has the theoretical basis. Meanwhile, the variance between either two of six principal components is 0 which means that the six principal components are unrelated to each other. Therefore, the instability and ill-conditioned matrix problems can be avoided in the computation.

After the GRA and PCA approaches are performed for the molecular descriptors selection and optimization, the first six principal components are used as the final GRNN inputs. To assess the GP-GRNN approach’s effect on calculating the homolysis BDE of 92 organic molecules, we compare the GP-GRNN correction results with the B3LYP/6-31G (d) correction results, the correction results of the full-descriptor GRNN without GRA and PCA (F-GRNN) and with GRA (G-GRNN), respectively. The differences between the experimental and calculated homolysis BDE for the F-GRNN, G-GRNN and GP-GRNN correction results are tabulated in Table 6.

Figure 3a is the scatter diagram of the B3LYP/6-31G (d) results and experimental results. The vertical coordinates are the experimental values and the horizontal coordinates are the B3LYP/6-31G (d) calculated values. The diagonal line represents that the vertical coordinate and horizontal coordinates are equal. In Figure 3b, Figure 3c and Figure 3d the horizontal coordinates represent the correction results of F-GRNN, G-GRNN and GP-GRNN, respectively. It can be seen that the correction results of GP-GRNN are closer to the experimental values. The insets are the histogram for the deviation of three approaches in Figures 3a–3d. It is obvious that B3LYP/6-31G (d) approach has large systematic deviation while the F-GRNN, G-GRNN and GP-GRNN approaches corrected have small systematic deviation.

For GRNN, the initialization is to determine the study process of training samples. Then, the connection weight between the network structure and each neuron is determined after the determination of the study samples. The training process of network is just the process of determining δ. In the training process, the learning algorithm is to adjust the transfer function of each unit to acquire the best results of regression estimation by changing δ, not by adjusting the connection weight between neurons.

In the transfer function, the value of δ is increased progressively from 0.02 to 1 by the constant of the variation of 0.02. The optimal output of neural network can be decided in the process of the variation of δ. For F-GRNN, G-GRNN and GP-GRNN approaches, when the values of δ are respectively 0.18, 0.08 and 0.10, the best results of regression estimation appear. For the training test, the RMS before correction is 5.40 kcal mol−1, and the RMS of F-GRNN and G-GRNN after correction is 0.48 and 0.38 kcal mol−1; however the RMS of GP-GRNN is 0.30 kcal mol−1. For the test set, the RMS respectively decreases from 4.69 to 0.55, 0.46 and 0.39 kcal mol−1 (Table 7). The GP-GRNN approach improved DFT calculation results in both the training set and the test set separately. After the correction of GP-GRNN, the deviation between the value of each sample and the experimental value of homolysis BDE in the test set is reduced. Nevertheless, two deviations (in sample 22 and 24) are much bigger than the other ten. The reasons why the above two deviations are much bigger lie in the following two facts. (1) When the smoothing factor δ is larger, the network output Y (predictive value of homolysis BDE) approaches the mean of experimental value of homolysis BDE in all samples. On the contrary, when the smoothing factor δ tends to 0, Y is close to training sample. When the points which need predicating are included in the training sample, the predictive value of the experimental value of homolysis BDE calculated by using Equation 11 approaches corresponding experimental value of homolysis BDE in sample. Nevertheless, the predictive results may be worse if there are some sample points excluded in the sample. The value of δ is not the larger the better, nor the smaller the better. When the value of δ is moderate, all the experimental value of homolysis BDE in the training sample are taken into account and the experimenatal value of homolysis BDE corresponding to the sample points close to the predictive points add more weight. Hence, the value of δ is selected in this paper after many experiments with the aim to acquire the best network output. (2) Sample data is few in the training set and the features cannot be extracted in the training procedure of neural network. The prediction accuracy of GP-GRNN approach can be further improved as more and better experimental data are available. The consistency between the training and test set implied that the GP-GRNN results could indeed predict the homolysis BDE with higher accuracy than F-GRNN and G-GRNN.

The B3LYP/6-31G (d) calculations are carried out to evaluate the homolysis BDE of the 92 organic molecules, and their overall resulting RMS from the experimental data is 5.31 kcal mol−1. Upon the traditional F-GRNN correction approach, the RMS of the calculated homolysis BDE of the 92 organic molecules is reduced from 5.31 to 0.49 kcal mol−1 for the B3LYP/6-31G (d) calculation. With the G-GRNN and GP-GRNN correction, the RMS is reduced from 5.31 to 0.39 and 0.31 kcal mol−1, respectively (Table 7).

From Table 7, it can be seen that the correction result acquired from G-GRNN is better than that acquired from F-GRNN. If the molecular descriptors are added to G-GRNN, the good fitting ability and the poor generalization ability appear on the training set and the test set, respectively. This denotes that over fitting happens with the F-GRNN. Similarly, the correction result from GP-GRNN is also better than that from G-GRNN. This better result is facilitated by PCA which can optimize selected molecular descriptors. From the above discussion, it can be deduced that the descriptors selection and optimization play a key role to obtain a perfect model, although GRNN theoretically shows a stronger capability of anti-redundancy.

5. Conclusions

GP-GRNN approach was successfully used to improve the homolysis BDE calculation’s accuracy. GRA is used to select the appropriate molecular descriptors. PCA is used to optimize the selection of molecular descriptors. GRNN is used to establish non-linear model. The GP-GRNN approach reduced the calculated RMS of 92 organic molecules from 5.31 to 0.31 kcal mol−1. Compared with the F-GRNN and G-GRNN, GP-GRNN is more feasible and effective. Further, the GP-GRNN correction on top of the B3LYP/6-31G (d) results is a better approach to correct homolysis BDE and can be used as the approximation of experimental results, when the experimental results are limited to measurement, with very high accuracy. GP-GRNN approach extends the B3LYP/6-31G (d)’s feasibility and applicability. The more experimental data the training set has, the more accurate the GP-GRNN approach will be. GP-GRNN approach can be not only used to calculate the homolysis BDE, but also it can be applied to calculate the heterolysis BDE, absorption energy, ionization energy, formation heat and so on. In summary, the GP-GRNN approach is an effective and predictive tool that can be used in the study of physical and chemical properties at the molecular level.


The authors gratefully acknowledge financial support from National Natural Science Foundation of China (Grant No(s). 20703008 and 20903020), the National Basic Research Program of China (Grant No(s). 2009CB623605), and the Science and Technology Development Planning of Jilin Province (No(s). 20100114).


  1. Butler, AR; Williams, DLH. The physiological role of nitric oxide. Chem. Soc. Rev 1993, 22, 233–241. [Google Scholar]
  2. Averill, BA. Dissimilatory nitrite and nitric oxide reductases. Chem. Rev 1996, 96, 2951–2964. [Google Scholar]
  3. Palmer, RMJ; Ferrige, AG; Moncada, S. Nitric oxide release accounts for the biological activity of endothelium-derived relaxing factor. Nature 1987, 327, 524–526. [Google Scholar]
  4. Ignarro, LJ. Biosynthesis and metabolism of endothelium-derived nitric oxide. Annu. Rev. Pharmacol. Toxicol 1990, 30, 535–560. [Google Scholar]
  5. Feldman, PL; Griffith, OW; Stuehr, DJ. The surprising life of nitric oxide. Chem. Eng. News 1993, 71, 26–38. [Google Scholar]
  6. Fukuto, JM; Ignarro, LJ. In vivo aspects of nitric oxide (NO) chemistry: Does peroxynitrite (OONO) play a major role in cytotoxicity? Acc. Chem. Res 1997, 30, 149–152. [Google Scholar]
  7. Moncada, S; Palmer, RMJ; Higgs, EA. Nitric oxide: Physiology, pathophysiology, and pharmacology. Pharmacol. Rev 1991, 43, 109–142. [Google Scholar]
  8. Ignarro, LJ. Signal transduction mechanisms involving nitric oxide. Biochem. Pharmacol 1991, 41, 485–490. [Google Scholar]
  9. Gnewuch, CT; Sosnovsky, GA. Critical appraisal of the evolution of N-nitrosoureas as anticancer drugs. Chem. Rev 1997, 97, 829–1014. [Google Scholar]
  10. Cheng, JP; Wang, K; Yin, Z; Zhu, X; Lu, Y. NO affinity. The driving force of nitric oxide (NO) transfer in biomimetic N-nitrosoacetanilide and N-nitrososulfoanilide systems. Tetrahedron Lett 1998, 39, 7925–7928. [Google Scholar]
  11. Cheng, JP; Xian, M; Wang, K; Zhu, X; Yin, Z; Wang, PG. Heterolytic and homolytic Y-NO bond energy scales of nitroso-containing compounds: Chemical origin of NO release and NO capture. J. Am. Chem. Soc 1998, 120, 10266–10267. [Google Scholar]
  12. Xian, M; Zhu, XQ; Lu, J; Wen, Z; Cheng, JP. The first O-NO bond energy scale in solution: heterolytic and homolytic cleavage enthalpies of O-nitrosyl carboxylate. Compd. Org. Lett 2000, 2, 265–268. [Google Scholar]
  13. Zhu, XQ; He, JQ; Li, Q; Xian, M; Lu, J; Cheng, JP. N-NO bond dissociation energies of N-nitroso diphenylamine derivatives (or analogues) and their radical anions: Implications for the effect of reductive electron transfer on N-NO bond activation and for the mechanisms of NO transfer to nitranions. J. Org. Chem 2000, 65, 6729–6735. [Google Scholar]
  14. Lü, JM; Wittbrodt, JM; Wang, K; Wen, Z; Schlegel, HB; Wang, PG; Cheng, JP. NO affinities of S-nitrosothiols: a direct experimental and computational investigation of RS-NO bond dissociation energies. J. Am. Chem. Soc 2001, 123, 2903–2904. [Google Scholar]
  15. Zhu, XQ; Hao, WF; Tang, H; Wang, CH; Cheng, JP. Determination of N-NO bond dissociation energies of N-methyl-N-nitrosobenzenesulfonamides in acetonitrile and application in the mechanism analyses on NO transfer. J. Am. Chem. Soc 2005, 127, 2696–2708. [Google Scholar]
  16. Zhu, XQ; Zhang, JY; Cheng, JP. Mechanism and driving force of NO transfer from S-nitrosothiol to cobalt(II) porphyrin: A detailed thermodynamic and kinetic study. Inorg. Chem 2006, 46, 592–600. [Google Scholar]
  17. Li, X; Zhu, XQ; Wang, XX; Cheng, JP. Determination of N-NO bond dissociation energies of N-nitrosoindoles and their radical anions in acetonitrile. Chem. J. Chin. Univ 2007, 28, 2295–2298. [Google Scholar]
  18. Li, X; Zhu, XQ; Cheng, JP. Determination of NO chemical affinities of benzyl nitrite in acetonitrile. Chem. J. Chin. Univ 2007, 29, 2327–2329. [Google Scholar]
  19. Li, X; Cheng, JP. Determination of S-NO bond dissociation energies of S-nitroso-N-acety-d,l-penicillamine dipeptides. Chem. J. Chin. Univ 2008, 29, 1569–1572. [Google Scholar]
  20. Li, X; Deng, H; Zhu, XQ; Wang, X; Liang, H; Cheng, JP. Establishment of the C-NO Bond dissociation energy scale in solution and its application in analyzing the trend of NO transfer from C-nitroso compound to thiols. J. Org. Chem 2009, 74, 4472–4478. [Google Scholar]
  21. Vapnik, VN. Statical Learning Theory; John Wiley&Sons: New York, NY, USA, 1998. [Google Scholar]
  22. Wang, X; Hu, L; Wong, L; Chen, GH. A combined first-principles calculation and neural networks correction approach for evaluating Gibbs energy of formation. Mol. Simul 2004, 30, 9–15. [Google Scholar]
  23. Wang, X; Wong, L; Hu, L; Chan, C; Su, ZM; Chen, GH. Improving the accuracy of density-functional theory calculation: the statistical correction approach. J Phys Chem A 2004, 108, 8514–8525. [Google Scholar]
  24. Duan, XM; Li, ZH; Song, GL; Wang, WN; Chen, GH; Fan, KN. Neural network correction for heats of formation with a larger experimental training set and new molecular descriptors. Chem. Phys. Lett 2005, 410, 125–130. [Google Scholar]
  25. Balabin, RM; Lomakina, EI; Safieva, RZ. Neural network (ANN) approach to biodiesel analysis: analysis of biodiesel density, kinematic viscosity, methanol and water contents using near infrared (NIR) spectroscopy. Fuel 2011, 90, 2007–2015. [Google Scholar]
  26. Eros, D; Keri, G; Kovesdi, I; Szantai-Kis, C; Meszaros, G; Orfi, L. Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS, and ANN methods. Mini Rev. Med. Chem 2004, 4, 167–177. [Google Scholar]
  27. Balabin, RM; Safieva, RZ; Lomakina, EI. Wavelet neural network (WNN) approach for calibration model building based on gasoline near infrared (NIR) spectra. Chemometr. Intell. Lab. Syst 2008, 93, 58–62. [Google Scholar]
  28. Wu, J; Xu, X. The X1 method for accurate and efficient prediction of heats of formation. J. Chem. Phys 2007, 127, 214105–2141058. [Google Scholar]
  29. Wu, J; Xu, X. Improving the B3LYP bond energies by using the X1 method. J. Chem. Phys 2008, 129, 164103–164111. [Google Scholar]
  30. Balabin, RM; Lomakina, EI. Neural network approach to quantum-chemistry data; Accurate prediction of density functional theory energies. J. Chem. Phys 2009, 131, 74104–74108. [Google Scholar]
  31. Boese, AD; Martin, JML. Development of density functionals for thermochemical kinetics. J. Chem. Phys 2004, 121, 3405–3416. [Google Scholar]
  32. Hu, L; Wang, X; Wong, L; Chen, GH. Combined first-principles calculation and neural-network correction approach for heat of formation. J. Chem. Phys 2003, 119, 11501–11507. [Google Scholar]
  33. Li, H; Shi, LL; Zhang, M; Su, ZM; Wang, X; Hu, L; Chen, GH. Improving the accuracy of density-functional theory calculation: The genetic algorithm and neural network approach. J. Chem. Phys 2007, 126, 144101–144108. [Google Scholar]
  34. Gao, T; Sun, SL; Shi, LL; Li, H; Li, HZ; Su, ZM; Lu, YH. An accurate density functional theory calculation for electronic excitation energies: The least-squares support vector machine. J. Chem. Phys 2009, 130, 184104–184107. [Google Scholar]
  35. Deng, JL. The theory and method of socioeconomic grey systems. Soc. Sci. China 1984, 6, 47–60. [Google Scholar]
  36. Deng, JL. Figure on difference information space in grey relational analysis. J. Grey Syst 2004, 16, 96–100. [Google Scholar]
  37. Huang, SJ; Chiu, NH; Chen, LW. Integration of the grey relational analysis with genetic algorithm for software effort estimation. Eur J. Oper. Res 2008, 188, 898–909. [Google Scholar]
  38. Jolliffe, IT. Principal Component Analysis; Springer: New York, NY, USA, 1986. [Google Scholar]
  39. Specht, DF. The general regression neural network-rediscovered. Neural Netw 1993, 6, 1033–1034. [Google Scholar]
  40. Gaussian 03; Revision B03; Gaussian, Inc: Pittsburgh, PA, USA, 2003.
  41. Fu, Y; Mou, Y; Lin, BL; Guo, QX. Structures of the X-Y-NO molecules and homolytic dissociation energies of the Y-NO bonds (Y = C, N, O, S). J. Phys. Chem. A 2002, 106, 12386–12392. [Google Scholar]
Figure 1. Structure of generalized regression neural network (GRNN).
Figure 1. Structure of generalized regression neural network (GRNN).
Ijms 12 02242f1 1024
Figure 2. Flow chart of GP-GRNN model calculation (GRA, grey relational analysis; PCA, principal component analysis).
Figure 2. Flow chart of GP-GRNN model calculation (GRA, grey relational analysis; PCA, principal component analysis).
Ijms 12 02242f2 1024
Figure 3. Calculated homolysis BDE versus experimental homolysis BDE for all 92 organic molecules; (a) B3LYP 6-31G (d) calculated homolysis BDE from the DFT approach; (b) full-descriptor GRNN corrected homolysis BDE for the F-GRNN approach; (c) The combined GRA and GRNN corrected homolysis BDE for the G-GRNN approach; (d) The combined GRA, PCA and GRNN corrected homolysis BDE for the GP-GRNN approach. Triangles (Δ) are for the training set and crosses (×) are for the test set. Insets are the histograms for the differences between the experimental and calculated homolysis BDE; All values are in units of kcal mol1.
Figure 3. Calculated homolysis BDE versus experimental homolysis BDE for all 92 organic molecules; (a) B3LYP 6-31G (d) calculated homolysis BDE from the DFT approach; (b) full-descriptor GRNN corrected homolysis BDE for the F-GRNN approach; (c) The combined GRA and GRNN corrected homolysis BDE for the G-GRNN approach; (d) The combined GRA, PCA and GRNN corrected homolysis BDE for the GP-GRNN approach. Triangles (Δ) are for the training set and crosses (×) are for the test set. Insets are the histograms for the differences between the experimental and calculated homolysis BDE; All values are in units of kcal mol1.
Ijms 12 02242f3 1024
Table 1. The structures of the 92 organic carrier molecules with Y-NO a.

a1–53 contain N-NO; 54–66 contain O-NO; 67–84 contain S-NO; 85–92 contain C-NO.

Table 2. Theoretical calculation of the Y-NO (Y = C, N, O, S) bond homolysis BDE (in the gas phase) and molecular descriptors a.
Table 2. Theoretical calculation of the Y-NO (Y = C, N, O, S) bond homolysis BDE (in the gas phase) and molecular descriptors a.

aUnit: ΔHhomo (kcal mol−1), Charge (e), Dipole Moment (debye), Polar (a.u.) and Energy (a.u.);b1–53 contain N-NO, 54–66 contain O-NO, 67–84 contain S-NO, 85–92 contain C-NO;cMeasured in CH3CN at 25 °C by titration calorimentry;dThe calculated homolysis BDE are with zero-point energy (ZPE) and thermal corrections to enthaply at 298 K by B3LYP/6-31G (d).

Table 3. Correlation matrix between the selected molecular descriptors.
Table 3. Correlation matrix between the selected molecular descriptors.
Table 4. Eigenvalues and cumulative contributions of variances.
Table 4. Eigenvalues and cumulative contributions of variances.
No.EigenvaluesVariances (%)Cumulative (%)
Table 5. First six principal components of the weight coefficients and the corresponding molecular descriptors.
Table 5. First six principal components of the weight coefficients and the corresponding molecular descriptors.
Table 6. The experimental homolysis BDE values and the differences between the experimental and calculated values of 92 organic molecules (in kcal mol−1).
Table 6. The experimental homolysis BDE values and the differences between the experimental and calculated values of 92 organic molecules (in kcal mol−1).
No.Expt. aDeviation bDeviation cDeviation dDeviation e

aExperimental data;bDifferences between the B3LYP calculated and experimental values;cDifferences between calculated and experimental values for DFT-F-GRNN calculation;dDifferences between calculated and experimental values for DFT-G-GRNN calculation;eDifferences between calculated and experimental values for DFT-GP-GRNN calculation;fOrganic molecules belong to the test set.

Table 7. RMS of B3LYP/6-31G (d), DFT-F-GRNN, DFT-G-GRNN, and DFT-GP-GRNN correction (in kcal mol−1).
Table 7. RMS of B3LYP/6-31G (d), DFT-F-GRNN, DFT-G-GRNN, and DFT-GP-GRNN correction (in kcal mol−1).
Training set5.400.480.380.30
Test set4.690.550.460.39
Int. J. Mol. Sci. EISSN 1422-0067 Published by MDPI AG, Basel, Switzerland RSS E-Mail Table of Contents Alert
Back to Top