This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
A DFT-SOFM-RBFNN method is proposed to improve the accuracy of DFT calculations of Y-NO (Y = C, N, O, S) homolysis bond dissociation energies (BDE) by combining density functional theory (DFT) with artificial intelligence/machine learning methods, namely self-organizing feature mapping neural networks (SOFMNN) and radial basis function neural networks (RBFNN). A descriptor refinement step comprising SOFMNN clustering analysis and correlation analysis is implemented. The SOFMNN clustering analysis classifies the descriptors, and a representative descriptor from each group is selected as a neural network input according to its correlation with the experimental values. This newly introduced step avoids redundant descriptors and intuitively biased choices of descriptors. Using RBFNN calculations with the selected descriptors, chemical accuracy (≤1 kcal·mol^{−1}) is achieved for all 92 organic Y-NO homolysis BDE calculated by DFT-B3LYP, and the mean absolute deviations (MADs) of the B3LYP/6-31G(d) and B3LYP/STO-3G methods are reduced from 4.45 and 10.53 kcal·mol^{−1} to 0.15 and 0.18 kcal·mol^{−1}, respectively. The improved results for the minimal basis set STO-3G reach the same accuracy as those for 6-31G(d); B3LYP calculations with the minimal basis set are therefore recommended to minimize the computational cost and to extend the method to large molecular systems. Further extrapolation tests are performed with six molecules (two containing Si-NO bonds and two containing fluorine), and the accuracy of the tests is within 1 kcal·mol^{−1}. This study shows that DFT-SOFM-RBFNN is an efficient and highly accurate method for Y-NO homolysis BDE. The method may be used as a tool to design new NO carrier molecules.
Over the past two decades, first-principles calculations have become an attractive complement or alternative to wet chemistry experiments for studying molecular properties and chemical reaction mechanisms. Great progress has been made: calculations have become faster, the accessible size of the target molecules has increased, and the computational accuracy has improved [
One first-principles method, hybrid density functional theory (DFT), has become very popular in recent years because of its efficiency and accuracy. With the introduction of exchange and correlation functionals, DFT costs much less than other high-level
However, only a few reports investigate preprocessing molecular descriptors [
Nitric oxide (NO) performs significant physiological functions in human life processes [
In this article, the DFT, SOFMNN and RBFNN methods are combined to improve the accuracy of DFT calculations of Y-NO homolysis BDE. The first section describes the neural network methods SOFMNN and RBFNN. The second section describes the calculations using the DFT B3LYP method with two basis sets, 6-31G(d) and STO-3G, and the collection of the calculated homolysis BDE and the relevant molecular descriptors of the Y-NO bond. The third section discusses the results from the DFT, SOFMNN and RBFNN methods, including the classification of appropriate molecular descriptors by the SOFMNN method, the construction of the RBFNN and the optimization of the nonlinear model for both sets of B3LYP results. The last section summarizes our conclusions.
The self-organizing feature mapping neural network (SOFMNN) was proposed by Kohonen in 1981 around the concept that an ordered arrangement of neurons could reflect certain physical properties of sensed external stimuli [
A characteristic of SOFMNN is that the topological distribution of the features of the input signal can be established on a one-dimensional or two-dimensional array of processing units, so that SOFMNN can extract features of the input signal. This is of great importance for correcting first-principles calculations with a neural network, because the network must extract precisely the essential information from the inputs obtained by first-principles methods. Calculations over the past few decades have shown that first-principles methods can capture the physical essence of molecules. These characteristics of SOFMNN are what allow our DFT-SOFM-RBFNN method to achieve high-accuracy calculations. The procedure of the SOFMNN learning algorithm is as follows:
(1) Network initialization
The input layer and competitive layer are composed of
(2) The winning neuron calculation
A training sample
where
If the
(3) Weight update
The weights of the winning neuron
(4) Learning rate and neighborhood neurons update
Once the weights of the winning neuron and the neighborhood neurons are updated, the learning rate and neighborhood neurons must be updated before the next iteration according to
where the operator ⌈ ⌉ represents rounding up.
(5) Iteration
If the learning process is not finished, another sample will be randomly chosen to continue the calculation, and the iteration returns to step (2), or if
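The learning loop described above can be sketched in Python as follows. This is only a minimal illustration of a generic Kohonen map, not the authors' implementation: the linear decay of the learning rate, the rounded-up linear decay of the neighborhood radius, the Chebyshev grid distance, and the parameter names (`lr0`, `radius0`, `n_steps`) are all assumptions introduced here. The 6 × 4 competitive layer matches the layout mentioned later in the text.

```python
import math
import numpy as np

def train_sofm(samples, grid=(6, 4), n_steps=1000, lr0=0.5, radius0=3, seed=0):
    """Train a small Kohonen map; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    rows, cols = grid
    # Initialization: one random weight vector per competitive-layer neuron.
    weights = rng.random((rows * cols, samples.shape[1]))
    # Grid coordinates of every neuron, used to define neighborhoods.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    for t in range(n_steps):
        # Assumed schedules: linear decay; the radius is rounded up.
        frac = 1.0 - t / n_steps
        lr = lr0 * frac
        radius = math.ceil(radius0 * frac)
        # Winner: the neuron whose weights are closest to a randomly chosen sample.
        x = samples[rng.integers(len(samples))]
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Update the winner and its grid neighbors toward the sample.
        grid_dist = np.abs(coords - coords[winner]).max(axis=1)
        hood = grid_dist <= radius
        weights[hood] += lr * (x - weights[hood])
    return weights

def winning_neuron(weights, x):
    """Index of the competitive-layer neuron that wins for input x."""
    x = np.asarray(x, dtype=float)
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

After training, inputs that excite the same neuron can be treated as members of one group, while clearly different inputs excite different neurons.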
In 1985, Powell proposed the radial basis function (RBF) method of multivariable interpolation [
The basic idea of RBFNN is to use RBFs as the “basis” of the neurons in the hidden layer to construct the hidden layer space. Thus, input vectors can be mapped directly to the hidden layer space without weights between the input layer and the hidden layer. Once the RBF central points are determined, the mapping relationship is determined. The mapping from the hidden layer space to the output layer space is linear,
The specific steps of the learning algorithm of the RBFNN are as follows:
Determining the RBF center of neurons in the hidden layer
The input matrix
where
The corresponding RBF center of
Determining the threshold value of neurons in the hidden layer
The corresponding threshold value of
where
Determining weights and threshold values between the hidden layer and the output layer
Once the RBF centers and threshold values of the neurons in the hidden layer are determined, the output of the neurons in the hidden layer can be obtained by
where
The connection weight
where
If the threshold value
The weight
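As a rough illustration of the three steps above, the following Python sketch builds an exact-interpolation RBF network. It rests on assumptions that the text does not confirm: every training sample is used as an RBF center, the hidden-layer threshold is taken as sqrt(ln 2)/spread (about 0.8326/spread, a common convention that gives a response of 0.5 at a distance of one spread), and the output weights and output threshold are obtained by linear least squares. The function names are hypothetical.

```python
import numpy as np

def rbf_hidden(inputs, centers, spread):
    """Gaussian hidden-layer outputs for each (input, center) pair."""
    bias = np.sqrt(np.log(2.0)) / spread  # assumed threshold convention
    dist = np.linalg.norm(inputs[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(dist * bias) ** 2)

def fit_rbfnn(train_x, train_y, spread):
    """Return (centers, coefficients); the last coefficient is the output threshold."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    centers = train_x.copy()  # assumption: one hidden neuron per training sample
    hidden = rbf_hidden(train_x, centers, spread)
    # Append a column of ones so the output threshold is fitted with the weights.
    design = np.hstack([hidden, np.ones((len(train_x), 1))])
    coef, *_ = np.linalg.lstsq(design, train_y, rcond=None)
    return centers, coef

def predict_rbfnn(model, x, spread):
    """Linear output layer applied to the hidden-layer responses."""
    centers, coef = model
    hidden = rbf_hidden(np.asarray(x, dtype=float), centers, spread)
    return hidden @ coef[:-1] + coef[-1]
```

Because the hidden-to-output mapping is linear, no iterative training is needed in this design: once the centers and the spread are fixed, the output layer follows from a single least-squares solve.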
In total, 98 organic molecules were used in the dataset for this study. Six molecules were added to the set of molecules used in our previous study [
Molecular descriptors should represent typical characteristics of molecules and correlate closely with the quantity of concern. Because we intended to develop an easy-to-use method, simple descriptors were favored. Because a DFT calculation is performed for each molecule whose result is to be corrected, quantum chemical descriptors are ready-made. In addition to quantum chemical descriptors, constitutional descriptors such as the molecular weight, number of atoms and number of electrons are also attractive because they are easy to generate. All DFT calculations were performed using the Gaussian 03 software package [
The homolysis BDE are calculated using the DFT B3LYP method with two basis sets, 6-31G(d) and STO-3G. The minimal basis set STO-3G consists of 1 function for H, 5 functions for Li to F and 9 functions for Na to Cl; the 6-31G(d) basis set consists of 2 functions for H, 15 functions for Li to F and 19 functions for Na to Cl. For most organic molecules, STO-3G therefore contains fewer than half as many basis functions as 6-31G(d), which saves considerable time in the DFT calculations. For example, the B3LYP frequency calculation for molecule 85 takes 114 minutes with the 6-31G(d) basis set but only 13 minutes with the STO-3G basis set. This makes applications to fairly large molecules feasible. The calculated values of the Y-NO (Y = C, N, O, S) homolysis BDE, the experimental data and the corresponding molecular descriptors of the 92 molecules are listed in
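The basis-set sizes quoted above can be turned into a small worked example. The sketch below counts basis functions for a molecule from the per-atom numbers given in the text; nitrosomethane (CH3NO) is used purely as a hypothetical illustration and is not claimed to be part of the dataset, and the function and table names are invented for this example.

```python
# Basis functions per atom, taken from the counts quoted in the text.
PER_ATOM = {
    "STO-3G": {"H": 1, "Li-F": 5, "Na-Cl": 9},
    "6-31G(d)": {"H": 2, "Li-F": 15, "Na-Cl": 19},
}
LI_TO_F = {"Li", "Be", "B", "C", "N", "O", "F"}
NA_TO_CL = {"Na", "Mg", "Al", "Si", "P", "S", "Cl"}

def count_basis_functions(atom_counts, basis):
    """Total number of basis functions for a molecule given as {element: count}."""
    table = PER_ATOM[basis]
    total = 0
    for element, n_atoms in atom_counts.items():
        if element == "H":
            key = "H"
        elif element in LI_TO_F:
            key = "Li-F"
        elif element in NA_TO_CL:
            key = "Na-Cl"
        else:
            raise ValueError(f"no count quoted for element {element!r}")
        total += n_atoms * table[key]
    return total

# Hypothetical example molecule: nitrosomethane, CH3NO.
nitrosomethane = {"C": 1, "H": 3, "N": 1, "O": 1}
sto3g = count_basis_functions(nitrosomethane, "STO-3G")      # 3*5 + 3*1 = 18
pople = count_basis_functions(nitrosomethane, "6-31G(d)")    # 3*15 + 3*2 = 51
```

For this example the minimal basis set uses 18 functions against 51 for 6-31G(d), consistent with the statement that STO-3G needs fewer than half as many.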
By analyzing the molecular descriptors, we find that, in the B3LYP/6-31G(d) results, the charge on the N atom of NO does not change with the charge on Y. The electronegativity of Y itself is most likely the key factor determining the charge on N, because the charge on the N atom changes only with the type of Y atom. Neither the structure of the molecular fragments connected to Y nor the charge on Y has much effect on the charge on N. When Y = N, O, S and C, the charges on the N atom of Y-NO lie in the ranges 0.21 to 0.25e, 0.38 to 0.44e, −0.01 to 0.08e and 0.13 to 0.17e, respectively, whereas the charges on Y range from −0.63 to 0.33e, as determined by the rest of the molecule. The charges on the O atoms do not fluctuate very much and show no clear pattern. In
Structural analysis indicates that the conformation of the molecules and the functional groups on the aromatic rings affect the homolysis BDE. Conformational effects reported by the Guo group show that syn and anti conformations induce BDE differences between isomers [
To study the correlation between the molecular descriptors and the experimental Y-NO homolysis BDE, a correlation analysis was performed. The results show that the homolysis BDE values calculated by B3LYP/6-31G(d) (ΔH_{homo}) are the most relevant to the experimental homolysis BDE, with a correlation coefficient of 0.64, which indicates that the DFT calculations capture the essential physics. For this reason, the DFT-calculated homolysis BDE (ΔH_{homo}) is taken as the primary descriptor. The correlation coefficients of the other strongly related molecular descriptors are E_{HOMO} (0.51), Q_{N} (0.49), Q_{Y} (0.46) and E_{HOMO−1} (0.43). The remaining descriptors have numerically weaker relationships with the experimental homolysis BDE: α (0.28), ΔE (0.27), μ (0.18), E_{LUMO} (0.17), N_{X} (0.12), E_{LUMO+1} (0.05) and Q_{O} (0.02). For the molecular descriptors calculated by B3LYP/STO-3G, the correlation coefficients in decreasing order are E_{HOMO} (0.48), Q_{N} (0.43), E_{HOMO−1} (0.40), Q_{Y} (0.39), ΔH_{homo} (0.35), ΔE (0.34), α (0.31), N_{X} (0.12), E_{LUMO} (0.06), E_{LUMO+1} (0.05), Q_{O} (0.05) and μ (0.03). These coefficients show that ΔH_{homo} calculated by B3LYP/STO-3G is more weakly related to the experimental homolysis BDE than that calculated by B3LYP/6-31G(d), owing to its poorer accuracy. In addition, the set of molecular descriptors most strongly related to the experimental homolysis BDE is largely the same for both basis sets. This suggests that the B3LYP/STO-3G results essentially agree with the B3LYP/6-31G(d) results, but with larger deviations.
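A correlation analysis of this kind can be sketched in a few lines of Python. The example below ranks descriptors by the absolute value of the Pearson correlation coefficient with the experimental values; using the absolute value is an assumption (the coefficients reported above are all positive), and the descriptor names and numbers in the usage lines are synthetic, not taken from the dataset.

```python
import numpy as np

def rank_descriptors(descriptors, target):
    """Return (name, |r|) pairs sorted by decreasing absolute Pearson correlation."""
    target = np.asarray(target, dtype=float)
    scores = {}
    for name, values in descriptors.items():
        r = np.corrcoef(np.asarray(values, dtype=float), target)[0, 1]
        scores[name] = abs(r)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic illustration: three made-up descriptors for five made-up molecules.
experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_descriptors = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],   # anticorrelated, r = -0.8
    "c": [1.0, 0.0, 1.0, 0.0, 1.0],   # uncorrelated
}
ranking = rank_descriptors(toy_descriptors, experimental)
```

The ranking lists `a` first, then `b`, then `c`, which mirrors how the descriptors above are ordered by their relevance to the experimental BDE.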
The deviations of all the methods are listed in
Descriptor selection is a significant step for neural networks, but reports on this topic are scarce [
SOFMNN clustering analysis for the molecular descriptors is illustrated with the B3LYP/6-31G(d) calculation results. When twelve molecular descriptors (ΔH_{homo}, Q_{Y}, Q_{N}, Q_{O}, N_{X}, μ, α, E_{HOMO−1}, E_{HOMO}, E_{LUMO}, E_{LUMO+1} and ΔE) are taken as the input of SOFMNN, the input layer of SOFMNN contains twelve neurons, and a 6 × 4 pattern is adopted in the network structure of the competitive layer (
In the SOFMNN calculation, only one neuron wins each time. Its weight and the corresponding weights of its peripheral neurons are adjusted synchronously, and the weights of the neurons change in favor of winning the competition. At the same time, SOFMNN reduces the neighborhood area gradually and starts to repulse its neighbor neurons. The mode combining cooperation with competition allows SOFMNN to acquire superior performance and significantly improves the learning ability and generalization of the neural network. After running the SOFMNN program, the resulting labels are likely different because the excited neurons are different each time, but the final clustering result does not change no matter which neuron is excited.
As mentioned above, eight descriptors (ΔH_{homo}, Q_{Y}, N_{X}, μ, α, Q_{N}, E_{HOMO} and E_{LUMO}) for B3LYP/6-31G(d) and nine descriptors (ΔH_{homo}, Q_{Y}, Q_{N}, E_{HOMO}, N_{X}, μ, α, E_{LUMO} and ΔE) for B3LYP/STO-3G, selected by SOFMNN clustering analysis and correlation analysis, were taken as the final RBFNN inputs. These inputs must be normalized to make the learning and training process easier, because the magnitudes of the raw descriptors vary widely. If the raw data were input directly, descriptors with large fluctuations might monopolize the RBFNN learning process, and the network might fail to reflect small changes in the data.
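The normalization scheme is not specified in the text; the sketch below assumes a column-wise min-max scaling to the interval [−1, 1], which is one common choice. The function name and the sample numbers are hypothetical.

```python
import numpy as np

def minmax_normalize(matrix, low=-1.0, high=1.0):
    """Scale each descriptor column to [low, high]; returns (scaled, col_min, col_max)."""
    matrix = np.asarray(matrix, dtype=float)
    col_min = matrix.min(axis=0)
    col_max = matrix.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    scaled = low + (high - low) * (matrix - col_min) / span
    return scaled, col_min, col_max

# Hypothetical raw inputs: rows are molecules, columns are two descriptors
# whose magnitudes differ by two orders of magnitude.
raw = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
scaled, col_min, col_max = minmax_normalize(raw)
```

After scaling, both columns span the same interval, so neither descriptor dominates the distance calculations inside the RBF layer. The stored minima and maxima would be reused to scale new molecules consistently.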
In RBFNN, the value of spread is increased from 0.2 to 3 in constant steps of 0.2, and the optimal neural network output is identified during this scan. For the DFT-RBFNN and DFT-SOFM-RBFNN methods, the best regression results are achieved when the values of spread are 0.6 and 0.8, respectively.
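The scan over spread can be illustrated with the following Python sketch. The grid (0.2 to 3 in steps of 0.2) follows the text; everything else is an assumption made for illustration: the compact RBF model (all training samples as centers, threshold sqrt(ln 2)/spread, least-squares output weights), the use of the mean absolute deviation on a held-out set as the selection criterion, and the function names.

```python
import numpy as np

def rbf_predict(train_x, train_y, test_x, spread):
    """Fit a Gaussian RBF model on the training data and predict test_x."""
    bias = np.sqrt(np.log(2.0)) / spread  # assumed threshold convention

    def hidden(x):
        dist = np.linalg.norm(x[:, None, :] - train_x[None, :, :], axis=2)
        return np.exp(-(dist * bias) ** 2)

    weights, *_ = np.linalg.lstsq(hidden(train_x), train_y, rcond=None)
    return hidden(test_x) @ weights

def scan_spread(train_x, train_y, val_x, val_y):
    """Try spread = 0.2, 0.4, ..., 3.0 and keep the value with the lowest MAD."""
    spreads = np.round(np.arange(0.2, 3.0 + 1e-9, 0.2), 1)
    mads = [
        np.mean(np.abs(rbf_predict(train_x, train_y, val_x, s) - val_y))
        for s in spreads
    ]
    best = int(np.argmin(mads))
    return float(spreads[best]), float(mads[best]), spreads
```

A call such as `scan_spread(train_x, train_y, val_x, val_y)` with two-dimensional input arrays returns the selected spread, its validation MAD and the grid that was scanned.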
If we use fewer molecular descriptors, how many descriptors should we choose and which ones should be chosen? These questions can be answered by SOFMNN coupled with correlation analysis.
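One simple way to combine the two analyses is to keep, from each SOFMNN group, the descriptor with the highest correlation coefficient. The Python sketch below applies this rule to the 1000-step row of the clustering table for B3LYP/6-31G(d) and to the correlation coefficients quoted in the text. The one-per-group rule and the ASCII descriptor names are assumptions made for illustration.

```python
def select_representatives(cluster_of, correlation):
    """Keep the best-correlated descriptor of each cluster, sorted by correlation."""
    best = {}
    for name, cluster in cluster_of.items():
        current = best.get(cluster)
        if current is None or correlation[name] > correlation[current]:
            best[cluster] = name
    return sorted(best.values(), key=lambda n: correlation[n], reverse=True)

# Winning-neuron labels after 1000 training steps, B3LYP/6-31G(d).
cluster_of = {
    "dH_homo": 16, "Q_Y": 13, "Q_N": 20, "Q_O": 13, "N_X": 23, "mu": 2,
    "alpha": 24, "E_HOMO-1": 13, "E_HOMO": 13, "E_LUMO": 14,
    "E_LUMO+1": 14, "dE": 20,
}
# Correlation coefficients with the experimental BDE, B3LYP/6-31G(d).
correlation = {
    "dH_homo": 0.64, "E_HOMO": 0.51, "Q_N": 0.49, "Q_Y": 0.46,
    "E_HOMO-1": 0.43, "alpha": 0.28, "dE": 0.27, "mu": 0.18,
    "E_LUMO": 0.17, "N_X": 0.12, "E_LUMO+1": 0.05, "Q_O": 0.02,
}
selected = select_representatives(cluster_of, correlation)
```

Applied to these labels, the rule returns seven descriptors (ΔH_{homo}, E_{HOMO}, Q_{N}, α, μ, E_{LUMO} and N_{X}). The selection reported in the text contains eight, additionally including Q_{Y}, so this sketch should not be read as reproducing the authors' exact procedure.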
During this study, we considered the extrapolation of the method to larger molecules and molecules with more types of elements, as well as to Y-NO bonds other than the four types in this dataset, so we preferred descriptors that were independent of the elemental types. After establishing the DFT-SOFM-RBFNN method, some molecules were used to test its ability to extrapolate. The structures of the molecules and the calculation results are shown in
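The extrapolation results can be summarized with a short calculation. The Python sketch below uses the experimental, B3LYP/6-31G(d) and corrected 6-31G(d) values of the six test molecules from the extrapolation table and computes the mean absolute deviation (MAD) before and after correction. Taking the deviation as calculated minus experimental is an assumed sign convention; it does not affect the MAD.

```python
# Values in kcal/mol for the six extrapolation molecules (6-31G(d) columns).
EXPERIMENT = [31.6, 41.1, 39.9, 50.5, 37.8, 44.8]
B3LYP_RAW = [29.02, 38.55, 37.67, 50.34, 27.04, 34.65]
CORRECTED = [30.49, 40.95, 39.90, 50.48, 37.85, 44.76]

def deviations(calculated, reference):
    """Signed deviations, calculated minus reference (assumed convention)."""
    return [c - r for c, r in zip(calculated, reference)]

def mean_absolute_deviation(calculated, reference):
    """Average of the absolute deviations."""
    devs = deviations(calculated, reference)
    return sum(abs(d) for d in devs) / len(devs)

mad_raw = mean_absolute_deviation(B3LYP_RAW, EXPERIMENT)
mad_corrected = mean_absolute_deviation(CORRECTED, EXPERIMENT)
```

For these six molecules the MAD falls from about 4.74 kcal·mol^{−1} for raw B3LYP/6-31G(d) to about 0.23 kcal·mol^{−1} after the neural network correction.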
The excellent performance of the DFT-SOFM-RBFNN method benefits from the combined advantages of all the methods. DFT molecular descriptors represent the physical essence of the homolysis BDE; the RBFNN is independent of the initial weights and thresholds, converges quickly to global minima, has few parameters that must be adjusted, shows great capacity for reverse redundancy and fault tolerance and possesses a built-in nonlinear model capable of carrying out calculations with a partial response. As a result of the SOFMNN cluster analysis, the significant features of the descriptors have been discovered and the number of descriptors can be narrowed down, so that the accuracy and efficiency of RBFNN calculations are improved. The combined DFT-SOFM-RBFNN method improves the DFT calculations and develops new applications in chemistry for SOFMNN and RBFNN.
To compare the DFT-SOFM-RBFNN calculations with more sophisticated DFT calculations using a larger basis set, M06-2X/6-311+G(2d,p) calculations with and without the solvent effect were performed for the smallest molecule of each of the four types of Y-NO molecule. The results are listed in
Recently, artificial intelligence, or “machine learning,” has begun to be employed to solve first-principles/quantum chemical calculation problems in a simple and efficient manner; it can therefore address statistically interesting problems rather than simply solving wave functions. In this study, the accuracy of the DFT calculations for the homolysis BDE of 92 organic NO carrier molecules was improved by the proposed DFT-SOFM-RBFNN method, which combines first-principles DFT with the artificial intelligence methods SOFMNN and RBFNN. The DFT computes the molecular descriptors/quantum mechanical descriptors, the SOFMNN performs cluster analysis to classify the molecular descriptors, and the correlation analysis selects a descriptor from each classified group; thus, subjective opinions on and biases toward particular molecular descriptors can be avoided. Thereafter, RBFNN uses the selected molecular descriptors as inputs to correct the DFT-calculated homolysis BDE. The DFT calculations are performed by B3LYP with two basis sets, the minimal basis set STO-3G and the medium-size basis set 6-31G(d). In total, twelve descriptors are obtained, and SOFMNN categorizes them into eight and nine groups for the descriptors acquired with the B3LYP/6-31G(d) and B3LYP/STO-3G methods, respectively. After the final RBFNN calculations, chemical accuracy (≤1 kcal·mol^{−1}) is achieved for all DFT-calculated homolysis BDE of the 92 NO carrier molecules. The overall MADs of the homolysis BDE calculated by the B3LYP method with the 6-31G(d) and STO-3G basis sets decrease from 4.45 to 0.15 kcal·mol^{−1} and from 10.53 to 0.18 kcal·mol^{−1}, respectively. Although the raw MAD of B3LYP/STO-3G was much worse than that of B3LYP/6-31G(d), high accuracy was still obtained for B3LYP/STO-3G after correction.
The minimal-basis-set DFT-SOFM-RBFNN method can be applied to fairly large molecules. In addition, the molecular descriptors used are general, which makes the method easy to use and to extrapolate to various systems; the extrapolation tests showed that high-accuracy results can be achieved for molecules with different types of Y-NO bond and for systems including atoms not already in the database. In particular, the high accuracy obtained in this study is of practical importance for the design of new types of NO-releasing drug molecules. We firmly believe that DFT-SOFM-RBFNN can be used to calculate not only homolysis BDE but also other properties of interest, such as bond heterolysis energies, optical properties and power conversion efficiencies; further research is ongoing.
The authors gratefully acknowledge financial support from the Program for Changjiang Scholars and Innovative Research Team in University (IRT0714), the National Basic Research Program of China (973 Program2009CB623605), the National Natural Science Foundation of China (20903020), the Science and Technology Development Planning of Jilin Province (20100114, 20110364 and 20125002), and the Fundamental Research Funds for the Central Universities (11QNJJ008 and 11QNJJ027).
The structure of the self-organizing feature mapping neural network (SOFMNN).
The structure of the radial basis function neural network (RBFNN).
The histograms of deviations between the different calculated homolysis BDE and the experimental values for 92 organic molecules, (
Deviations between experimental and calculated values of 92 organic molecules from different methods, reported in kcal·mol^{−1}.
No.  B3LYP/6-31G(d)  B3LYP/STO-3G  DFT-RBFNN 6-31G(d)  DFT-RBFNN STO-3G  DFT-SOFM-RBFNN 6-31G(d)  DFT-SOFM-RBFNN STO-3G
1  −17.17  −6.89  −0.12  −1.84  −0.04  −1.18 
2  −7.88  2.66  0.46  0.12  0.38  0.22 
3  −9.31  0.85  −0.48  −0.65  −0.38  −0.58 
4  −9.29  1.27  −0.03  −0.02  −0.01  −0.01 
5  −9.77  0.14  0.00  −0.01  0.00  0.00 
6^{a}  −9.13  1.04  −0.40  −0.53  −0.34  −0.46
7  −9.01  1.11  0.05  −0.01  0.03  0.01 
8  −12.53  0.28  −0.03  0.00  −0.01  0.00 
9  −13.13  −3.06  0.00  0.00  0.00  0.00 
10  −10.9  −0.51  −0.01  −0.01  0.00  0.00 
11  2.16  12.31  0.07  0.02  0.04  0.01 
12  2.70  13.23  0.58  0.81  0.55  0.68 
13  1.72  12.17  −0.34  0.42  −0.34  0.23 
14  −0.39  10.26  −0.10  0.03  −0.06  0.01 
15  −1.56  10.1  0.00  0.01  0.00  0.01 
16  1.69  11.63  0.00  0.01  0.00  0.00 
17  2.00  12.39  −0.20  0.25  −0.23  0.13 
18  −8.37  2.73  −0.16  0.05  −0.06  0.03 
19  −7.30  4.12  −0.28  −0.02  −0.21  −0.01 
20^{a}  −6.93  4.16  −0.22  −0.47  −0.21  −0.41
21  −7.68  3.96  0.29  0.01  0.27  0.00 
22  −10.58  0.56  0.00  0.00  0.00  0.00 
23  −2.11  8.33  0.01  −0.93  0.06  −0.75 
24  3.45  12.33  0.35  0.67  0.19  0.45 
25  −8.07  3.05  −0.53  −0.21  −0.51  −0.18 
26  −7.90  3.23  0.28  0.18  0.29  0.17 
27^{a}  −8.60  2.58  −0.42  −0.01  −0.38  −0.01
28  −8.22  4.07  0.01  0.00  0.00  0.00 
29  −4.97  6.77  0.00  0.00  0.00  0.00 
30  1.87  −11.2  0.00  0.02  0.00  0.01 
31^{a}  1.97  −11.27  −0.05  0.00  −0.04  0.00
32  0.33  −12.53  −0.01  −0.03  0.00  −0.02 
33^{a}  1.91  −6.79  0.04  −0.03  0.03  −0.03
34  0.74  −11.6  0.00  0.00  0.00  0.00 
35  1.92  −10.83  0.18  0.01  0.15  0.01 
36  0.62  −14  −0.18  0.00  −0.15  0.00 
37  1.16  10.52  0.00  0.00  0.00  0.00 
38  0.76  11.2  0.14  0.12  0.10  0.10 
39  0.29  11.06  −0.05  −0.09  −0.07  −0.08 
40  −0.36  10.68  −0.06  −0.39  −0.05  −0.36 
41  −0.41  11.52  0.00  0.00  0.00  0.00 
42  −0.04  11.72  0.02  0.40  0.01  0.37 
43  −0.26  10.28  0.04  −0.05  0.04  −0.03 
44^{a}  −1.14  11.08  1.01  0.95  0.92  0.84
45  −0.97  9.89  0.00  0.00  0.00  0.00 
46  0.03  12.03  0.00  0.00  0.00  0.00 
47  0.87  10.84  0.02  0.04  0.01  0.02 
48  −1.67  8.65  0.00  0.00  0.00  0.00 
49  −3.41  8.59  −0.01  −0.03  0.00  −0.02 
50  7.47  −0.71  −0.01  0.01  0.01  0.01 
51  5.60  −0.55  0.00  0.00  0.00  0.00 
52  7.03  −1.38  0.03  0.00  0.01  0.00 
53  6.33  −2.14  −0.01  −0.01  −0.01  −0.01 
54  −2.62  15.71  0.00  0.00  0.00  0.00 
55  −2.88  15.23  0.12  0.28  0.08  0.25 
56  −3.88  14.1  −0.12  −0.28  −0.08  −0.25 
57  −3.89  13.76  0.00  −0.01  0.00  −0.01 
58^{a}  −7.57  9.35  0.00  0.00  0.00  0.00
59  −4.88  12.76  1.26  1.19  1.20  1.14 
60  −7.33  9.84  −1.20  −1.15  −1.16  −1.12 
61  −6.90  10.9  0.17  0.26  0.20  0.28 
62  6.39  18.5  0.00  0.00  0.00  0.00 
63  4.12  17.94  0.00  0.38  0.00  0.35 
64  −9.96  16.41  0.00  −0.37  0.00  −0.34 
65  4.19  15.06  0.00  −0.01  0.00  −0.01 
66  0.55  14.42  0.00  0.00  0.00  0.00 
67  −3.51  19.3  −0.60  −0.52  −0.47  −0.43 
68  −2.46  21.15  −0.93  −0.93  −0.85  −0.90 
69  0.27  22.96  0.51  0.57  0.44  0.54 
70  0.05  22.7  0.07  0.50  0.04  0.47 
71^{a}  2.43  22.6  0.19  0.18  0.16  0.14
72  0.20  19.63  0.01  0.00  0.00  0.00 
73  −0.88  20.53  −0.16  −0.52  −0.09  −0.48 
74  7.91  19.5  0.02  0.03  0.01  0.02 
75  −0.36  22.56  0.38  0.39  0.39  0.40 
76  2.96  21.38  0.00  0.00  0.00  0.00 
77  1.69  22.06  0.83  0.53  0.61  0.43 
78  2.77  21.23  0.00  0.01  0.00  0.01 
79  2.52  20.27  0.21  0.00  0.13  0.00 
80  0.84  19.65  0.01  −0.01  0.00  −0.01 
81  1.17  21.22  0.00  0.00  0.00  0.00 
82  0.68  20.49  −0.21  0.00  −0.13  0.00 
83^{a}  −2.03  16.73  −0.27  −0.57  −0.26  −0.56
84^{a}  −0.24  18.15  0.27  0.57  0.26  0.56
85^{a}  −7.63  2.33  −0.04  0.02  −0.03  0.02
86  −4.58  6.59  0.00  0.00  0.00  0.00 
87  −7.16  5.16  0.48  0.16  0.36  0.12 
88  −8.00  2.5  0.02  0.10  0.01  0.07 
89  −3.70  11.26  0.00  0.00  0.00  0.00 
90  −10.85  0.62  −0.49  −0.26  −0.37  −0.18 
91^{a}  −8.77  5.98  −0.16  −0.17  −0.13  −0.13
92  −8.61  1.34  0.00  0.00  0.00  0.00 
^{a} The molecules belong to the test set.
SOFMNN clustering analysis results for twelve molecular descriptors.
DFT  Training Steps  ΔH_{homo}  Q_{Y}  Q_{N}  Q_{O}  N_{X}  μ  α  E_{HOMO−1}  E_{HOMO}  E_{LUMO}  E_{LUMO+1}  ΔE
B3LYP/6-31G(d)  10  24  1  1  1  24  4  24  1  1  1  1  1
B3LYP/6-31G(d)  30  5  13  13  13  24  19  24  13  13  13  13  13
B3LYP/6-31G(d)  50  4  12  6  12  1  21  1  12  12  12  12  12
B3LYP/6-31G(d)  100  19  12  10  12  3  22  1  12  12  11  11  10
B3LYP/6-31G(d)  200  16  1  8  1  11  19  24  1  1  2  2  8
B3LYP/6-31G(d)  500  16  13  1  19  12  8  24  19  19  20  20  1
B3LYP/6-31G(d)  1000  16  13  20  13  23  2  24  13  13  14  14  20
B3LYP/STO-3G  10  2  1  1  1  24  1  24  1  1  1  1  1
B3LYP/STO-3G  30  23  1  7  1  24  5  24  1  1  1  2  7
B3LYP/STO-3G  50  21  1  1  1  6  13  12  1  1  1  1  1
B3LYP/STO-3G  100  21  7  19  7  24  3  12  7  7  14  19  19
B3LYP/STO-3G  200  5  7  19  1  24  3  22  1  1  13  14  15
B3LYP/STO-3G  500  4  16  19  21  24  8  12  21  21  20  19  13
B3LYP/STO-3G  1000  10  13  15  19  24  2  12  19  19  20  15  21
The extrapolation test for the DFT-SOFM-RBFNN method (kcal·mol^{−1}).
No.  Expt.  B3LYP/6-31G(d)  DFT-SOFM-RBFNN 6-31G(d)  B3LYP/STO-3G  DFT-SOFM-RBFNN STO-3G
1  31.6  29.02  30.49  48.10  30.56
2  41.1  38.55  40.95  49.4  40.59
3  39.9  37.67  39.90  32.47  39.90
4  50.5  50.34  50.48  60.6  51.12
5  37.8  27.04  37.85  48.8  37.98
6  44.8  34.65  44.76  50.58  44.62
The deviations of the calculation methods (kcal·mol^{−1}).
No.  DFT-SOFM-RBFNN^{a}  M06-2X/6-311+G(2d,p)  M06-2X/6-311+G(2d,p) (PCM)  B3LYP/6-31G(d)
39  −0.1  3.6  2.4  0.29
59  1.2  1.5  0.8  −4.9
76  0.0  4.2  4.2  3.0
91  −0.1  −2.2  −4.1  −8.7
^{a} DFT-SOFM-RBFNN is based on B3LYP/6-31G(d) calculations.