Improving the Accuracy of Density Functional Theory (DFT) Calculation for Homolysis Bond Dissociation Energies of Y-NO Bond: Generalized Regression Neural Network Based on Grey Relational Analysis and Principal Component Analysis

We propose a generalized regression neural network (GRNN) approach based on grey relational analysis (GRA) and principal component analysis (PCA) (GP-GRNN) to improve the accuracy of density functional theory (DFT) calculation for homolysis bond dissociation energies (BDE) of Y-NO bond. As a demonstration, this combined quantum chemistry calculation with the GP-GRNN approach has been applied to evaluate the homolysis BDE of 92 Y-NO organic molecules. The results show that the ull-descriptor GRNN without GRA and PCA (F-GRNN) and with GRA (G-GRNN) approaches reduce the root-mean-square (RMS) of the calculated homolysis BDE of 92 organic molecules from 5.31 to 0.49 and 0.39 kcal mol−1 for the B3LYP/6-31G (d) calculation. Then the newly developed GP-GRNN approach further reduces the RMS to 0.31 kcal mol−1. Thus, the GP-GRNN correction on top of B3LYP/6-31G (d) can improve the accuracy of calculating the homolysis BDE in quantum chemistry and can predict homolysis BDE which cannot be obtained experimentally.


Introduction
Nitric oxide (NO) is an important signaling and effector molecule that is key to many physiological functions of the body (e.g., blood pressure regulation, immune system and nerve conduction), and plays a vital role in the regulation of [1][2][3][4][5][6][7][8][9]. NO has a high chemical activity and rarely exists in the form of free radical in the body. It becomes stable by binding itself with the carrier molecules in the body in specific binding sites that enable its storage, transfer and release.
Inspired by the concepts of the "proton affinity" and "electron affinity", Cheng et al. proposed the concept of NO affinity which can be defined as the measure of the strength for a receptor (X) to bind with the NO group [10]. It is characterized using the energies of the Y-NO bond (Y is the atom in the carrier molecules to which the NO group is attached, Y = N, S, O, C) in two different ways: The first reaction represents the homolysis which is chemical bond dissociation of a neutral molecule generating two free radicals. The energy during the reaction is referred to as homolysis Bond Dissociation Energy (BDE).The second reaction represents the heterolysis which is chemical bond cleavage of a neutral molecule generation an anions or cations. The energy during the reaction is referred to heterolysis BDE. In recent years, Cheng et al. developed a simple experimental approach to measure the homolysis and heterolysis bond dissociation energies (BDE) of the Y-NO (Y = C, N, O, S) bond in solution [10][11][12][13][14][15][16][17][18][19][20]. Experimental data show that the heterolysis energy of Y-NO bond is generally higher than the homolysis BDE of Y-NO bond which implies that it is easier for the NO carrier to release NO than NO − and NO + . The carrier molecule is a potential free radical to bond with NO. The study of the heterolysis and homolysis BDE of the carrier molecules containing Y-NO (Y = C, N, O, S) bond helps measure the bonding and release capacity of NO in the body and understand and predict the transfer direction and mechanism of NO in the body.
Quantum chemistry approaches are not only limited to the level of experimental validation, but also can predict the BDE without experimental results or with uncertain experimental. The main reason leading to this limitation is the similarity between the computational approaches. Therefore, in recent 10 years, many statistical approaches have been used to improve the accuracy of quantum chemistry. First, the molecular properties are obtained from the calculation of quantum chemistry approaches and then the statistical approaches are applied to establish the relation between the experimental and calculated values. These statistical improvements include linear approaches, such as multiple linear regression [21], and nonlinear approaches, such as neural networks, etc. [22][23][24]. Although multiple linear regression approach is simple and intuitive, neural networks can better solve complex nonlinear problems which are difficult to model mathematically given the same physical parameters [25][26][27]. If the training of the neural networks is based on the back propagation (BP) algorithm, it is vulnerable to the slow convergence rate, and gets stuck at the local minimum points [28]. The genetic algorithm is an efficient global search approach which has been adopted in problems with a large search state-space to explore the globally optimal results. Therefore, the genetic algorithm can be used to optimize the weights of neural network [29]. Roman M. Balabin et al. estimate the density functional theory (DFT) energy with large basis set using lower-level energy values and molecular descriptors [30]. A total of 208 different molecules were used for the artificial neural network (ANN) training, cross validation, and testing by applying BLYP, B3LYP, and BMK [31] density functionals. An expected error, mean absolute deviation, ANN approximation to DFT energies was 0.6 ± 0.2 kcal mol −1 . Wu and Xu proposed the X1 approach that combines the DFT (B3LYP) with the neural network correction for an accurate prediction of formation heat [25]. An error close to the G3 approach (1.34 versus 1.05 kcal mol −1 for the G3/99 molecule set) was reached.
Chen and co-workers proposed a DFT-NEURON approach to establish the quantitative relationship between the experimental data and the results computed from the first main principle [23]. This relationship was then used to reduce the error margin of the values of the computed absorption energy [32]. With the TDDFT/B3LYP/6-31G (d) approaches, the root-mean-square (RMS) for the absorption energies in 60 organic molecules was reduced from 0.33 to 0.09 eV. Recently, our research group proposed a successful improvement approach based on genetic algorithm and neural network (GANN) to correct the absorption energies of 150 organic molecules [33]. In addition, we also proposed a least squares support vector machine approach to correct the absorption energies of 160 organic molecules [34].
There are mainly two factors affecting the accuracy of the calculation of the homolysis BDE: (1) the selection of molecular descriptors and pretreatment; (2) the statistical approaches. Some researchers only focus on the statistical approach selection and ignore the molecular descriptors selection and pretreatment. The subjective choice of molecular descriptors, the distribution of the weights, the redundancy in the chosen molecular descriptors and the multiple correlations in molecular descriptors all affect the final results. Therefore, the molecular descriptors selection and pretreatment are significant.
In this paper, the Generalized Regression Neural Network (GRNN) based on the grey relational analysis (GRA) and principal component analysis (PCA) (GP-GRNN) approach is proposed to improve the accuracy of calculating the homolysis BDE of 92 organic molecules. The DFT B3LYP/6-31G (d) approach is first applied to optimize the carrier molecules and calculate their frequency in order to obtain the homolysis BDE value and relevant molecular descriptors of the Y-NO (Y = C, N, O, S) bond. GRA is used to select the appropriate molecular descriptors. PCA is used to optimize the selected molecular descriptors. Finally, GRNN is used to establish nonlinear model. Then the GP-GRNN is applied to reduce the RMS of homolysis BDE for the 92 organic molecules. The results show that GP-GRNN is a more accurate and informative correction technique in chemical physics.

Grey Relational Analysis
If the index characteristics of a system are represented as a reference array x 0 (n-dimensional), the array is used as reference index (experimental value of homolysis BDE in this paper). If an array that has multiple characteristics is related to the reference index, then the array is represented as x i (i = 1,2,…,m) (n-dimensional). To measure the relationship between the reference index and the arrays x i (i = 1,2,…,m), the concept of relational coefficient is introduced below: where α is the distinguishing coefficient, r i (k) is the relational coefficient between i x and 0 x at the kth characteristic.
The following equation further processes the relational coefficients to obtain the i  which is the relation degree between i x and 0 x .

Principal Component Analysis
PCA is a multivariate statistical approach used for reducing variables in the data [38]. Its basic content is to convert a set of original indices into a set of comprehensive indices which are new and uncorrelated with one another to replace the original. According to the actual needs, it is likely to select several fewer comprehensive indices which can reflect the information of original indices as much as possible to represent the total indices of original variables to achieve the purpose of variable reduction.
PCA is aimed at compressing the number of variables and making the model reflect the real situation better by using fewer variables to explain most of the variables in the original data and eliminating the redundancy. This means to convert a number of highly correlated variables into new, fewer and independent of one another variables, i.e., principal component which can explain most of the variance of the original data. This approach can erase the collinearity existing in the original variables, and overcome the problems, such as instability of calculation, ill-conditioned matrix and so on and so forth.
The following part shows the approach and calculation steps that PCA adopts to determine the weight. We suppose that the number of sample (organic molecules) is n and the number of indicators' (molecular descriptors) value in each sample is m. Then, we can organize the experimental data into a matrix.
[ ]( 1, 2,..., ; 1, 2,..., ) (a) Standardize the original data The indicator in each sample is converted into the standardized indicator X * j according to the standardized Equation 6. Therein, X j and S j is respectively are the mean and standard deviation of X j . The mean of X * j is 0 and the variance is 1.
is called the variance contribution of the principal component Z i , i.e., the weight of the principal component Z i .

Generalized Regression Neural Network
GRNN was proposed by the American scholar D. F. Specht [39]. The approach uses vertical basis function as the basis of the hidden units to form the hidden layers. The hidden layers (include pattern layer and summation layer) transform the input vectors from the low-dimensional input data into a high dimensional space so that the problem can be separated linearly in the high dimensional space. It is good at function approximation and the network finally converges to the optimized regression plane which contains the most samples. It can predict well, even with very few sample data, and can handle instability in the data. The structure of GRNN is composed of four layers, input layer, pattern layer, summation layer and output layer ( Figure 1).  ( 1, 2,..., ) 2 therein, X is the network's input variable and δ is the smoothing factor which determines the shape of function. The larger the value is, the smoother the function is. X i is the corresponding study sample of neuron i. Each unit in the pattern layer corresponds to a training sample and the Gaussian function is treated as the activation of kernel function. Two types of neurons are used for summation in the summation layer, one is whose connection weight with each neuron in the pattern layer is 1, and the other is whose connection weight is each factor y i of the output sample in the pattern layer in which the weighted sum is adopted to work out the summation of the output of corresponding neurons. The output of neurons in the output layer is i.e., the output of network. In our paper, Figure 2 shows a flow chart of GP-GRNN model calculation.

Data Set
Ninety-two (92) important organic carrier molecules containing NO are studied in this work. They are the four typical NO carrier molecules in the acetonitrile solution: N-nitrosamine compounds, O-nitrite, C-nitroso compounds, S-nitrosylation compounds (their molecular structures are shown in Table 1). The data set is randomly divided into a training set (80 molecules) and a test set (12 molecules). The training set used to adjust the model parameters and the test set is used to test the model's predictive ability.

Calculation of Molecular Descriptors
All of the calculations were done using Gaussian03 [40]. The geometry was optimized at the DFT/B3LYP level with 6-31G (d) basis set. Subsequently, vibrational frequencies were performed at the same theoretical level to confirm their local minima. The gas phase homolysis BDE is defined as the enthalpy change of the Equation 1 at 298 K in a vacuum [41]. The enthalpy of formation for each species was calculated using the following equation: The zero point energy correction was taken into account in the calculation. ΔH 298-0 is the standard temperature correction term including H vib , H rot and H trans .
A set of molecular descriptors can be obtained from the geometry optimization and frequency calculations. The molecular descriptors include the calculated homolysis BDE value (ΔH homo ), the net charge (Q Y ) on the Y(C, N, O, S) atom which is bonded with the NO, the net charge (Q N ) and (Q O ) on the atoms N, O in the NO molecule fragment, the number of electrons (N X ) on the fragment X (excluding the NO radical), the molecule dipole moment (μ) and the molecule polarizability (α), the highest occupied molecular orbital and the lowest unoccupied molecular orbital energy (E HOMO ) and (E LUMO ), the second highest occupied molecular orbital and the second lowest unoccupied molecular orbital energy (E HOMO−1 ) and (E LUMO+1 ), the energy gap (ΔE) between E HOMO and E LUMO . The molecular descriptors can reflect covalent and ionic interactions in a molecule bond.

Calculation of Descriptor
For the 92 organic molecules containing the Y-NO (Y = C, N, O, S) bond, the B3LYP function is used to optimize the geometry at 6-31G (d) basis set level and the frequency is calculated to confirm the stable structure. Finally, the homolysis BDE of Y-NO (Y = C, N, O, S) bond and relevant molecular descriptors are obtained and shown in Table 2. Table 2. Theoretical calculation of the Y-NO (Y = C, N, O, S) bond homolysis BDE (in the gas phase) and molecular descriptors a .   In addition, Table 2 also provides the homolysis BDE values that are observed in the experiment in the acetonitrile solution. The higher the homolysis BDE is, the stronger that the NO can bond with the carrier and vice versa. Lower homolysis BDE indicates that the carrier can serve as a good NO releasing agent. The four types of carrier molecules studied in this paper have lower homolysis BDE and are excellent NO radical carriers in the body. From Table 2

Calculation Results of GRA
The experimenatal value of homolysis BDE of the 92 carrier molecules are used as the reference array. The 12 computed molecular descriptors are used as the contrast array. The closer the relation is to 1, the more relations the two arrays have and vice versa.
When the distinguishing coefficient is set to 0. 5 The relational coefficient is 0.8902 which indicates a good match between the theoretical calculations and the experimental values. The difference between the E HOMO energy level and E LUMO energy level ΔE can measure the stability of the molecules and has a large impact on the homolysis BDE. In the NO molecular fragments, the oxygen's electronegativity is greater than that of nitrogen and the net charge of oxygen has a larger impact than that of the nitrogen on the homolysis. The net charge on the Y(C, N, O, S) atoms which are connected to NO has the minimal impact on the homolysis which implies that the ionicity of chemical bonds is smaller. The polarization rate can measure the molecular deformation and affect the homolysis BDE. The number of electrons on the molecular fragment X which is connected to NO has a certain impact on the homolysis. The dipole moment is a vector that has little impact on the homolysis which indicates that the Y-NO direction has little impact on the molecular dipole moment.

Calculation Results of PCA
According to the results of GRA, the molecular descriptors that have a relation of greater than 0.8 are selected. The first eight molecular descriptors are used as the basic characteristics. A correlation matrix is generated after the correlation analysis on the eight molecular descriptors (Table 3). It can be seen that there is a certain correlation between them and the correlation between α and N X is as high as 0.9331. It is inevitable that increasing the complexity of data analysis, if these eight selected molecular descriptors are taken as the final attribute characteristics directly, there would be some problems such as instable calculation and ill-conditioned matrix which are caused by superposed information existing in the above eight molecular descriptors. Nevertheless, these problems can be avoided through PCA that can also make the weight distribution of the molecular descriptors more reasonable, avoid the redundant information, and eliminate the not useful information. The PCA is performed on the eight selected molecular descriptors to obtain the eigenvalues, variance and cumulative variance contribution rate for each principle component (Table 4). It can be seen that the first six principal components can explain 99.63% of the total variance of all variables, which means that the six indicators of the new indicators system can reflect 99.63% differences in samples. The eigenvalue for the seventh principal component is already very small and the eighth eigenvalue is 0. The smaller the eigenvalue is, the less amount of information its principal component contains. Therefore, the seventh principal component contains very little useful information and the eighth principal component contains no useful information. Therefore, the first six principal components are selected to avoid redundancy of information and eliminate the interference information, namely no useful information. The weights of the first six principal components and the corresponding molecular descriptors are shown in Table 5. From the analysis of the weights of the first six principal components, it can be seen that the E HOMO−1 and E LUMO have a higher load in the first principal component, the ΔH homo and ΔE on the second principal component have a higher load, the N X and α on the third principal component have a higher load, the ΔH homo on the fourth principal component has a higher load, the Q O on the fifth principal component has a higher load and the E HOMO on the sixth principal component has a higher load. Thus the assignment to the weights has the theoretical basis. Meanwhile, the variance between either two of six principal components is 0 which means that the six principal components are unrelated to each other. Therefore, the instability and ill-conditioned matrix problems can be avoided in the computation. After the GRA and PCA approaches are performed for the molecular descriptors selection and optimization, the first six principal components are used as the final GRNN inputs. To assess the GP-GRNN approach's effect on calculating the homolysis BDE of 92 organic molecules, we compare the GP-GRNN correction results with the B3LYP/6-31G (d) correction results, the correction results of the full-descriptor GRNN without GRA and PCA (F-GRNN) and with GRA (G-GRNN), respectively. The differences between the experimental and calculated homolysis BDE for the F-GRNN, G-GRNN and GP-GRNN correction results are tabulated in Table 6.    For GRNN, the initialization is to determine the study process of training samples. Then, the connection weight between the network structure and each neuron is determined after the determination of the study samples. The training process of network is just the process of determining δ. In the training process, the learning algorithm is to adjust the transfer function of each unit to acquire the best results of regression estimation by changing δ, not by adjusting the connection weight between neurons.
In the transfer function, the value of δ is increased progressively from 0.02 to 1 by the constant of the variation of 0.02. The optimal output of neural network can be decided in the process of the variation of δ. For F-GRNN, G-GRNN and GP-GRNN approaches, when the values of δ are respectively 0.18, 0.08 and 0.10, the best results of regression estimation appear. For the training test, the RMS before correction is 5.40 kcal mol −1 , and the RMS of F-GRNN and G-GRNN after correction is 0.48 and 0.38 kcal mol −1 ; however the RMS of GP-GRNN is 0.30 kcal mol −1 . For the test set, the RMS respectively decreases from 4.69 to 0.55, 0.46 and 0.39 kcal mol −1 ( Table 7). The GP-GRNN approach improved DFT calculation results in both the training set and the test set separately. After the correction of GP-GRNN, the deviation between the value of each sample and the experimental value of homolysis BDE in the test set is reduced. Nevertheless, two deviations (in sample 22 and 24) are much bigger than the other ten. The reasons why the above two deviations are much bigger lie in the following two facts. (1) When the smoothing factor δ is larger, the network output Y (predictive value of homolysis BDE) approaches the mean of experimental value of homolysis BDE in all samples. On the contrary, when the smoothing factor δ tends to 0, Y is close to training sample. When the points which need predicating are included in the training sample, the predictive value of the experimental value of homolysis BDE calculated by using Equation 11 approaches corresponding experimental value of homolysis BDE in sample. Nevertheless, the predictive results may be worse if there are some sample points excluded in the sample. The value of δ is not the larger the better, nor the smaller the better. When the value of δ is moderate, all the experimental value of homolysis BDE in the training sample are taken into account and the experimenatal value of homolysis BDE corresponding to the sample points close to the predictive points add more weight. Hence, the value of δ is selected in this paper after many experiments with the aim to acquire the best network output. (2) Sample data is few in the training set and the features cannot be extracted in the training procedure of neural network. The prediction accuracy of GP-GRNN approach can be further improved as more and better experimental data are available. The consistency between the training and test set implied that the GP-GRNN results could indeed predict the homolysis BDE with higher accuracy than F-GRNN and G-GRNN. Table 7. RMS of B3LYP/6-31G (d), DFT-F-GRNN, DFT-G-GRNN, and DFT-GP-GRNN correction (in kcal mol −1 ). The B3LYP/6-31G (d) calculations are carried out to evaluate the homolysis BDE of the 92 organic molecules, and their overall resulting RMS from the experimental data is 5.31 kcal mol −1 . Upon the traditional F-GRNN correction approach, the RMS of the calculated homolysis BDE of the 92 organic molecules is reduced from 5.31 to 0.49 kcal mol −1 for the B3LYP/6-31G (d) calculation. With the G-GRNN and GP-GRNN correction, the RMS is reduced from 5.31 to 0.39 and 0.31 kcal mol −1 , respectively (Table 7). From Table 7, it can be seen that the correction result acquired from G-GRNN is better than that acquired from F-GRNN. If the molecular descriptors are added to G-GRNN, the good fitting ability and the poor generalization ability appear on the training set and the test set, respectively. This denotes that over fitting happens with the F-GRNN. Similarly, the correction result from GP-GRNN is also better than that from G-GRNN. This better result is facilitated by PCA which can optimize selected molecular descriptors. From the above discussion, it can be deduced that the descriptors selection and optimization play a key role to obtain a perfect model, although GRNN theoretically shows a stronger capability of anti-redundancy.

Conclusions
GP-GRNN approach was successfully used to improve the homolysis BDE calculation's accuracy. GRA is used to select the appropriate molecular descriptors. PCA is used to optimize the selection of molecular descriptors. GRNN is used to establish non-linear model. The GP-GRNN approach reduced the calculated RMS of 92 organic molecules from 5.31 to 0.31 kcal mol −1 . Compared with the F-GRNN and G-GRNN, GP-GRNN is more feasible and effective. Further, the GP-GRNN correction on top of the B3LYP/6-31G (d) results is a better approach to correct homolysis BDE and can be used as the approximation of experimental results, when the experimental results are limited to measurement, with very high accuracy. GP-GRNN approach extends the B3LYP/6-31G (d)'s feasibility and applicability. The more experimental data the training set has, the more accurate the GP-GRNN approach will be. GP-GRNN approach can be not only used to calculate the homolysis BDE, but also it can be applied to calculate the heterolysis BDE, absorption energy, ionization energy, formation heat and so on. In summary, the GP-GRNN approach is an effective and predictive tool that can be used in the study of physical and chemical properties at the molecular level.