A Promising Tool to Achieve Chemical Accuracy for Density Functional Theory Calculations on Y-NO Homolysis Bond Dissociation Energies

A DFT-SOFM-RBFNN method is proposed to improve the accuracy of DFT calculations of Y-NO (Y = C, N, O, S) homolysis bond dissociation energies (BDE) by combining density functional theory (DFT) with artificial intelligence/machine learning methods, namely self-organizing feature mapping neural networks (SOFMNN) and radial basis function neural networks (RBFNN). A descriptor refinement step comprising SOFMNN clustering analysis and correlation analysis is implemented. The SOFMNN clustering analysis classifies the descriptors into groups, and the representative descriptors of the groups are selected as neural network inputs according to how closely they correlate with the experimental values. This newly introduced step avoids redundant descriptors and intuitively biased choices of descriptors. Using RBFNN calculations with the selected descriptors, chemical accuracy (≤1 kcal·mol−1) is achieved for all 92 organic Y-NO homolysis BDE calculated by DFT-B3LYP, and the mean absolute deviations (MADs) of the B3LYP/6-31G(d) and B3LYP/STO-3G methods are reduced from 4.45 and 10.53 kcal·mol−1 to 0.15 and 0.18 kcal·mol−1, respectively. The corrected results for the minimal basis set STO-3G reach the same accuracy as those for 6-31G(d); B3LYP calculations with the minimal basis set are therefore recommended to minimize the computational cost and to extend the approach to large molecular systems. Further extrapolation tests are performed with six molecules (two containing Si-NO bonds and two containing fluorine), and the accuracy of the tests is within 1 kcal·mol−1. This study shows that DFT-SOFM-RBFNN is an efficient and highly accurate method for Y-NO homolysis BDE. The method may be used as a tool to design new NO carrier molecules.


Introduction
Over the past two decades, first-principles calculations have become an attractive complement or alternative to wet chemistry experiments for studying molecular properties and chemical reaction mechanisms. Great progress has been made: calculations have become faster, the size of the target molecules has increased, and the computational accuracy has improved [1][2][3]. The applications of first-principles methods are extensive. In some studies, they have already gone beyond testing and verifying experiments to predicting the properties of molecules under conditions that have not yet been examined experimentally [4][5][6][7][8]. However, current first-principles calculations cannot yet deliver the high accuracy needed for databases containing large numbers of medium or large molecules. Deviations in calculations arise from various sources: some from the inherent approximations and simplifications in the underlying formulas and programs, and some from the choice of software, methods, basis sets, and so forth. In addition, each molecule is unique, and a computational program cannot fully capture that uniqueness, so some deviations induced by a unified treatment are unavoidable. These deviations can be corrected to improve calculations. Computational theory can be improved, for example, by modifying functions, avoiding approximations, and using an infinite basis set. However, these corrections are time-consuming, and the effect might be insignificant. An alternative is to correct calculation results through statistical methods, which may improve the calculations significantly in a simple and fast way and simplify the prediction of new compounds [9][10][11][12][13][14][15][16][17][18]. This approach is useful for functional molecule design and can guide synthetic chemists in choosing potential target compounds. In particular, machine learning methods have recently become a new option for solving wave function problems [12,19].
One first-principles method, hybrid density functional theory (DFT), has become very popular in recent years because of its efficiency and accuracy. With the introduction of exchange and correlation functionals, DFT costs much less than other high-level ab initio methods (such as MP2 and CI), and its accuracy can be as good as that of those methods. Nevertheless, DFT calculations need further improvement to achieve highly accurate results, especially for medium or large molecules [7][8][9]. The DFT-NEURON method from the Chen group combines neural networks and DFT methods, using a neural network to set up a quantitative relationship between experimental values and DFT calculation results and thereby improve the accuracy of the DFT calculations. The first application of this method was to the heats of formation of 180 organic molecules; the root mean square errors were reduced from 21.4 kcal·mol−1 to 3.3 kcal·mol−1 for B3LYP/6-311+G(d,p) calculations [9]. In that study, the neural network or machine learning method showed substantial potential to improve the efficiency of first-principles calculations. The method has since been successfully applied to other properties [10][11][12][13][14][15][16][17][18], and the concept is applicable to other quantum chemical methods.
However, only a few reports investigate preprocessing molecular descriptors [14,15] (the inputs of a neural network). These inputs are crucial for calculations because they greatly influence the capability of the network. Molecular descriptors can be obtained from the structure or properties of systems and can be diverse, including constitutional, topological, electrostatic, geometrical, and quantum chemical descriptors [20]. Without a selection procedure, molecular descriptors are usually selected subjectively according to the knowledge and experience of researchers, who may overlook very important information related to the quantity of interest or inadvertently overlay this information with noise. Chemists may think some descriptors are trivial when they are actually critical for the statistical calculations. With hundreds of molecular descriptors, it is difficult to make prudent choices relying only on intuition and experience. Therefore, in this study, we introduce SOFMNN clustering analysis and correlation analysis to refine molecular descriptors as the inputs of neural networks.
Nitric oxide (NO) performs significant physiological functions in human life processes [21][22][23][24][25][26][27][28][29][30]. The highly active free radical NO must be carried by a linear molecular precursor, so NO homolysis (formation/breaking of the bond between NO and the rest of the molecule) BDE is of interest for the medicinal study of NO-release diseases. Because the experiments are complicated, homolysis BDE of NO carrier molecules is difficult to measure with high accuracy. Recently, the Cheng group has focused much effort on measurements of homolysis BDE of the Y-NO bond in solution [31][32][33][34][35][36][37][38][39][40][41], which has greatly contributed to NO molecular carrier design in silico.
In this article DFT, SOFMNN and RBFNN methods are combined to improve the accuracy of the calculations of homolysis Y-NO BDE by DFT. The first section describes the neural network methods SOFMNN and RBFNN; the second section describes calculations using the DFT B3LYP method with two basis sets, 6-31G(d) and STO-3G, and the collection of the calculated homolysis BDE and relevant molecular descriptors of Y-NO bond; the third part discusses the calculation results from the DFT, SOFMNN and RBFNN methods, as well as classifying appropriate molecular descriptors by the SOFMNN method, setting up RBFNN and optimizing the non-linear model for both B3LYP results. In the last section, our conclusions are summarized.

Self-Organizing Feature Mapping Neural Network
The self-organizing feature mapping neural network (SOFMNN) was proposed by Kohonen in 1981 around the concept that an ordered arrangement of neurons could reflect certain physical properties of sensed external stimuli [42]. The main idea is to gradually reduce the interaction areas of neurons during the learning process and to strengthen the activation of the central neurons according to the relevant learning rules, allowing neural connections to be removed so as to model the real brain nervous system, which "excited the nearby neurons while retaining the far-away ones." The structure of SOFMNN consists of an input layer and a competitive layer (also known as the mapping layer), as shown in Figure 1.

Figure 1.
The structure of self-organizing feature mapping neural network (SOFMNN).
A characteristic of SOFMNN is that the topological feature distribution of the input signal can be established on a one-dimensional or two-dimensional array of processing units, so that SOFMNN can extract features of the input signal. This is of great importance for correcting first-principles calculations with a neural network, because the network must extract precisely the essential information from inputs obtained by first-principles methods. Calculations over the past few decades have shown that first-principles methods can, for the most part, capture the physical essence of molecules. These characteristics of SOFMNN are the strength of our DFT-SOFM-RBFNN method in achieving high-accuracy calculations. The procedures of the SOFMNN learning algorithm are as follows:

(1) Network initialization
The input layer and competitive layer are composed of $R$ and $S^1$ neurons, respectively. The weights of each neuron in the competitive layer are initialized to small random numbers $IW_{ij}^{1,1}$ ($i = 1, 2, \ldots, S^1$; $j = 1, 2, \ldots, R$), where $IW_{ij}^{1,1}$ represents the connection weight between the $i$th neuron in the competitive layer and the $j$th neuron in the input layer. $N_c$ is set as the initial neighborhood, $\eta$ as the initial learning rate, $T$ as the maximum number of iterations, and $N = 1$ as the initial iteration.

(2) The winning neuron calculation
A training sample $p$ is selected randomly and the input of the neurons in the competitive layer is calculated according to Equation (1), where $n_i^1$ and $b_i^1$ represent the output and the threshold value of the $i$th neuron in the competitive layer, respectively, and $p_j$ stands for the value of the $j$th input variable of sample $p$. If the $k$th neuron in the competitive layer is the winning neuron, it should meet the requirement in Equation (2).

(3) Weight update
The weights of the winning neuron $k$ and of all neurons in the neighborhood $N_c(t)$ are updated according to Equation (3).

(4) Learning rate and neighborhood neurons update
Once the weights of the winning neuron and the neighborhood neurons are updated, the learning rate and the neighborhood must be updated before the next iteration according to Equations (4) and (5), where the operator $\lceil\;\rceil$ represents rounding up.

(5) Iteration
If the learning process is not finished, i.e., if $N < T$, then $N = N + 1$, another sample is chosen at random, and the iteration returns to step (2). Otherwise, the iteration concludes.
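The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this study: the function names `train_sofm` and `winning_neuron`, the linear decay of the learning rate and neighborhood radius, the Euclidean winner criterion without thresholds, and the square (Chebyshev) neighborhood on the grid are all assumptions chosen to mirror steps (1) to (5).

```python
import numpy as np

def train_sofm(samples, grid_shape=(6, 4), max_iter=500, eta0=0.5, seed=0):
    """Train a small self-organizing feature map (illustrative sketch).

    samples: array of shape (n_samples, n_inputs).
    Returns the weight matrix of shape (n_neurons, n_inputs).
    The linear decay schedules below are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # (1) Initialization: small random weights and grid coordinates.
    weights = rng.uniform(0.0, 0.1, size=(rows * cols, samples.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    radius0 = max(rows, cols) - 1
    for n in range(1, max_iter + 1):
        # (2) Random sample; the winner is the neuron closest to it.
        p = samples[rng.integers(len(samples))]
        winner = np.argmin(np.linalg.norm(weights - p, axis=1))
        # (4) Decayed learning rate and neighborhood radius (rounded up).
        frac = 1.0 - n / max_iter
        eta = eta0 * frac
        radius = int(np.ceil(radius0 * frac))
        # (3) Move the winner and its grid neighbors toward the sample.
        grid_dist = np.abs(coords - coords[winner]).max(axis=1)
        in_hood = grid_dist <= radius
        weights[in_hood] += eta * (p - weights[in_hood])
        # (5) The loop itself plays the role of N = N + 1 until N = T.
    return weights

def winning_neuron(weights, p):
    """Return the 1-based index of the neuron that wins for input p."""
    return int(np.argmin(np.linalg.norm(weights - p, axis=1))) + 1

# Hypothetical data: 30 samples with 3 input variables each.
demo_samples = np.random.default_rng(1).random((30, 3))
demo_weights = train_sofm(demo_samples)
demo_label = winning_neuron(demo_weights, demo_samples[0])
```

Inputs that end up with the same winning-neuron label are treated as members of one cluster, which is how the clustering numbers are read later in the descriptor analysis.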

Radial Basis Function Neural Network
In 1985, Powell proposed the radial basis function (RBF) method of multivariable interpolation [43]. In 1988, Moody and Darken introduced a neural network structure, the RBFNN, which can approximate any continuous function to a chosen accuracy. RBFNN is a three-layered feed-forward network. The network structure is shown in Figure 2. The basic idea of RBFNN is to use RBFs as the "basis" of the neurons in the hidden layer to construct the hidden layer space. Thus, input vectors can be mapped directly to the hidden layer space without weights between the input layer and the hidden layer. Once the RBF central points are determined, the mapping relationship is determined. The mapping from the hidden layer space to the output layer space is linear, i.e., the output is the linearly weighted sum of the outputs of the neurons in the hidden layer, where the weights are the adjustable parameters of the network. In general, the network mapping from input to output is non-linear, while the output is linear in the adjustable parameters. In this way, the weights of the neural network can be solved directly from linear equations, so that the learning speed improves significantly and local minimum problems are avoided.
The specific steps of the learning algorithm of the RBFNN are as follows:

(1) Determining the RBF center of neurons in the hidden layer
The input matrix $P$ and output matrix $T$ for the training set are described in Equation (6), where $p_{ij}$ represents the $i$th input variable of the $j$th training sample; $t_{ij}$ represents the $i$th output variable of the $j$th training sample; $M$ is the dimension of the input variables; $N$ is the dimension of the output variables; and $Q$ is the number of samples in the training set. The corresponding RBF centers of the $Q$ neurons in the hidden layer are given by Equation (7).

(2) Determining the threshold value of neurons in the hidden layer
The corresponding threshold values of the $Q$ neurons in the hidden layer are given by Equation (8), $b_1 = [b_{11}, b_{12}, \cdots, b_{1Q}]$, where $b_{11} = b_{12} = \cdots = b_{1Q} = 0.8326/\mathrm{spread}$ and spread is the expanding coefficient of the RBF.

(3) Determining weights and threshold values between the hidden layer and the output layer
Once the RBF centers and threshold values of the neurons in the hidden layer are determined, the output of the neurons in the hidden layer can be obtained by Equation (9), where $p_i = [p_{i1}, p_{i2}, \cdots, p_{iM}]$ is the $i$th vector of the training set, and the matrix $A$ is set to $A = [a_1, a_2, \cdots, a_Q]$. The connection weight $W$ between the hidden layer and the output layer is defined in Equation (10), where $w_{ij}$ represents the connection weight between the $j$th neuron in the hidden layer and the $i$th neuron in the output layer. With the threshold value $b_2$ of the $N$ neurons in the output layer written as in Equation (11), the weight $W$ and threshold value $b_2$ between the hidden layer and the output layer can be obtained from the linear Equation (12).
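The three RBFNN steps can be illustrated with a short NumPy sketch. It is a hedged reconstruction under stated assumptions rather than the code used in this work: the names `fit_rbf`, `hidden_output` and `predict_rbf` are hypothetical, the Gaussian form exp(−(b·d)²) is assumed because it gives a hidden output of about 0.5 at a distance equal to spread when b = 0.8326/spread, and the linear system for the output layer is solved with a least-squares routine.

```python
import numpy as np

def hidden_output(P, centers, b1):
    """Gaussian hidden-layer outputs, shape (n_centers, n_samples).

    P: (M, Q) inputs with samples as columns; centers: (n_centers, M).
    The exp(-(b*d)^2) form is an assumption of this sketch.
    """
    d = np.linalg.norm(centers[:, None, :] - P.T[None, :, :], axis=2)
    return np.exp(-(d * b1) ** 2)

def fit_rbf(P, T, spread=1.0):
    """Exact-design RBF network. P: (M, Q) inputs, T: (N, Q) targets."""
    centers = P.T                  # step (1): one center per training sample
    b1 = 0.8326 / spread           # step (2): common hidden-layer threshold
    A = hidden_output(P, centers, b1)
    # step (3): solve [W b2] @ [A; 1] = T for the output-layer parameters.
    A_aug = np.vstack([A, np.ones((1, A.shape[1]))])
    Wb = np.linalg.lstsq(A_aug.T, T.T, rcond=None)[0].T
    return centers, b1, Wb[:, :-1], Wb[:, -1:]

def predict_rbf(model, P):
    """Evaluate the fitted network on inputs P of shape (M, n_samples)."""
    centers, b1, W, b2 = model
    return W @ hidden_output(P, centers, b1) + b2

# Hypothetical one-dimensional training set with six samples.
P_demo = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]])
T_demo = np.sin(P_demo)
model_demo = fit_rbf(P_demo, T_demo, spread=1.0)
fitted_demo = predict_rbf(model_demo, P_demo)
```

Because every training sample is a center, the network reproduces the training targets essentially exactly; the spread then controls how smoothly it interpolates between them.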

Data Set
In total, 98 organic molecules were used in the dataset for this study. Six molecules were added to the set of molecules used in our previous study [15] to validate the predictive ability of the neural network. Chemical elements in these molecules include H, C, N, O, F, Si, P, S, Cl and Br, and the number of non-hydrogen atoms in the molecules varies from 8 to 25 for these small or medium molecules. The final RBFNN models are attained according to the relatively stable estimation results of the testing set. Once the neural network is established, the calculations for these data require negligible time to perform, which shows the efficiency of this correction approach.

Molecular Descriptor Calculations
Molecular descriptors should represent typical characteristics of molecules and correlate closely with the quantity of concern. Because we intended to develop an easy-to-use method, simple descriptors were favored. Because DFT calculations are performed for each molecule and their results are the quantities being corrected, quantum chemical descriptors are readily available. In addition to quantum chemical descriptors, constitutional descriptors such as the molecular weight, number of atoms, and number of electrons are also attractive because they are easy to generate. All DFT calculations were performed using the Gaussian03 software package [44]. The DFT calculations of the homolysis BDE and twelve molecular descriptors with the hybrid functional B3LYP and the 6-31G(d) basis set were described in [15], and the corresponding results from the B3LYP/STO-3G method are given in the Supplementary materials.

Calculating Y-NO Homolysis BDE with DFT Method
The homolysis BDE are calculated using the DFT B3LYP method with two basis sets, 6-31G(d) and STO-3G. The minimal basis set STO-3G consists of 1 function for H, 5 functions for Li to F and 9 functions for Na to Cl; the basis set 6-31G(d) consists of 2 functions for H, 15 functions for Li to F and 19 functions for Na to Cl. For most organic molecules, STO-3G therefore contains less than half as many basis functions as 6-31G(d), and much time can be saved in the DFT calculations by using the STO-3G basis set.
By analyzing the molecular descriptors, we find that, in the B3LYP/6-31G(d) results, the charge on the N atom of NO does not change with the charge on Y. The electronegativity of Y itself is most likely the key factor determining the amount of charge on N because the charge on the N atom only changes with the type of Y atom. Neither the structure of the molecular fragments connected to Y nor the amount of charge on Y has much effect on the charge of N.
Structural analysis indicates that the conformation of the molecules and the functional groups on the aromatic rings affect the homolysis BDE. Conformational effects reported by the Guo group show that syn and anti conformations induce BDE differences between isomers [45]. In our data set, most molecules contain aromatic rings and functional groups that include −CH3, −CH3O, −Cl, −Br and −NO2, among which −CH3 and −CH3O are electron-donating groups and −Cl, −Br and −NO2 are electron-withdrawing groups. Electron-donating groups at the meta- and para-positions of the benzene ring decrease the BDE of the Y-NO bond, while electron-withdrawing groups at these positions increase the BDE. Electron-donating groups at the ortho-position decrease the BDE of the Y-NO bond, but the effects of electron-withdrawing groups are stronger than those of electron-donating groups.
The substitution effects are smaller for molecules with multiple rings (e.g., indole, dibenzo-azepine) than for benzene rings due to the longer distance between the substituent group and the Y-NO bond.
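The basis-set sizes quoted earlier in this section (1/5/9 functions per atom for STO-3G and 2/15/19 for 6-31G(d)) make it easy to estimate how much smaller a minimal-basis calculation is. The sketch below is only an illustration: the helper `count_basis_functions` is hypothetical, it covers only the elements for which the text gives counts (H, Li to F, Na to Cl), and the C6H5NO composition is an arbitrary example rather than a molecule taken from the dataset.

```python
# Basis functions per atom as quoted in the text: (H, Li-F, Na-Cl).
FUNCTIONS_PER_ATOM = {"STO-3G": (1, 5, 9), "6-31G(d)": (2, 15, 19)}
PERIOD_2 = {"Li", "Be", "B", "C", "N", "O", "F"}
PERIOD_3 = {"Na", "Mg", "Al", "Si", "P", "S", "Cl"}

def count_basis_functions(atom_counts, basis):
    """Total number of basis functions for a composition such as {"C": 6}."""
    n_h, n_p2, n_p3 = FUNCTIONS_PER_ATOM[basis]
    total = 0
    for element, count in atom_counts.items():
        if element == "H":
            total += n_h * count
        elif element in PERIOD_2:
            total += n_p2 * count
        elif element in PERIOD_3:
            total += n_p3 * count
        else:
            # Elements beyond Cl (e.g., Br) are not covered by the quoted counts.
            raise ValueError(f"no basis-function count given for {element}")
    return total

# Illustrative composition C6H5NO (an assumption, not a dataset entry).
example = {"C": 6, "H": 5, "N": 1, "O": 1}
n_small = count_basis_functions(example, "STO-3G")    # 8*5 + 5*1 = 45
n_large = count_basis_functions(example, "6-31G(d)")  # 8*15 + 5*2 = 130
ratio = n_small / n_large
```

For this composition the minimal basis uses 45 functions against 130, a ratio of about 0.35, in line with the statement that STO-3G contains less than half of the 6-31G(d) functions for typical organic molecules.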
To study the correlation between the molecular descriptors and the experimental Y-NO homolysis BDE, a correlation analysis was performed. The results show that the B3LYP/6-31G(d)-calculated homolysis BDE values ($\Delta H_{\mathrm{homo}}$) are the most relevant to the experimental homolysis BDE, with a correlation coefficient of 0.64, which indicates that the DFT calculations do capture the essential physics. For this reason, the DFT-calculated homolysis BDE ($\Delta H_{\mathrm{homo}}$) are considered the primary descriptor. Other molecular descriptors, $E_{\mathrm{HOMO}}$ among them, are also strongly related to the experimental values. For B3LYP/STO-3G, the correlation coefficient shows that the calculated $\Delta H_{\mathrm{homo}}$ has a weaker relationship with the experimental homolysis BDE than that of B3LYP/6-31G(d) because of its poorer accuracy. In addition, the types of molecular descriptors strongly related to the experimental homolysis BDE do not change greatly between the two basis sets. This suggests that the B3LYP/STO-3G calculation results essentially agree with the B3LYP/6-31G(d) results, but with larger deviations.
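A correlation analysis of this kind can be sketched as follows, assuming that the Pearson coefficient is the measure used (the text does not name it). The function names and all numerical values are hypothetical placeholders chosen only to show the mechanics; they are not data from this study.

```python
import numpy as np

def descriptor_correlations(descriptors, experimental):
    """Pearson r between each descriptor and the experimental values.

    descriptors: dict mapping a descriptor name to a sequence of values,
    one per molecule; experimental: experimental BDE, same order.
    """
    return {name: float(np.corrcoef(values, experimental)[0, 1])
            for name, values in descriptors.items()}

def rank_by_strength(correlations):
    """Descriptor names sorted by decreasing absolute correlation."""
    return sorted(correlations, key=lambda name: abs(correlations[name]),
                  reverse=True)

# Hypothetical values for five molecules (illustration only).
exp_bde = np.array([30.0, 35.0, 40.0, 45.0, 50.0])
demo_descriptors = {
    "dH_homo": np.array([26.0, 33.0, 35.0, 43.0, 44.0]),
    "mu": np.array([1.0, 3.0, 2.0, 5.0, 1.5]),
}
demo_r = descriptor_correlations(demo_descriptors, exp_bde)
demo_ranking = rank_by_strength(demo_r)
```

Ranking by the absolute value keeps strongly anti-correlated descriptors in play, since a neural network can exploit a negative relationship as readily as a positive one.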
The deviations of all the methods are listed in Table 1. The total MADs for the two basis sets 6-31G(d) and STO-3G are 4.45 and 10.53 kcal·mol−1, respectively. For B3LYP/6-31G(d), the deviations between the DFT-calculated and experimental homolysis BDE for all four types of carriers span a wide range, from −17.17 to 7.91 kcal·mol−1. The calculated homolysis BDE vary according to the type of Y atom in the Y-NO bond, and the deviation distributions also change with the type of Y-NO bond. The DFT-calculated homolysis BDE of the S-NO bond carrier molecules agree best with the measured values, with a MAD of 1.83 kcal·mol−1. The DFT results are in particularly good agreement with the experimental data for molecules with amino acid groups (78-84): although the introduced amino acid groups make these molecules the largest in the dataset, the MAD is only 1.46 kcal·mol−1. This may be good news for theoretical studies on the mechanism of physiological release of NO in the human body. The MAD of the DFT-calculated homolysis BDE for N-NO bond carrier molecules is 4.75 kcal·mol−1, which is much larger than that of the S-NO bond carrier molecules, and the deviation distribution shows two extremes: the deviations of 20 molecules exceed 7 kcal·mol−1, whereas those of another 27 molecules are less than 3 kcal·mol−1 (there are 53 N-NO bond molecules in total). In addition, for some molecules, the homolysis BDE are dramatically underestimated (the absolute deviations of the DFT-calculated homolysis BDE exceed 10 kcal·mol−1). The deviations between the calculated and experimental homolysis BDE for the O-NO bond carrier molecules are relatively large, with a MAD of 5.01 kcal·mol−1. The deviations for the C-NO bond carrier molecules are the largest, and all of these homolysis BDE are underestimated (the MAD is 7.41 kcal·mol−1).
This is consistent with the results of the Guo group [45], who found that DFT calculations in a vacuum tend to underestimate the homolysis BDE of Y-NO bond carrier molecules. The results from B3LYP/STO-3G are clearly worse than those from B3LYP/6-31G(d), especially for the S-NO and O-NO bond molecular carriers. The S-NO homolysis BDE have the largest MAD (20.67 kcal·mol−1), which is exactly opposite to the 6-31G(d) results, where the S-NO bond has the smallest MAD among the four types of Y-NO bonds. This indicates that polarization functions may be obligatory for S-NO BDE calculations.
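The statistics quoted in this discussion (MAD and the range of deviations) can be computed as in the following sketch. The sign convention, deviation = calculated − experimental, is an assumption consistent with negative deviations being described as underestimates; the function name and the three-molecule example are hypothetical and are not values from Table 1.

```python
import numpy as np

def deviation_summary(calculated, experimental):
    """MAD and range of the signed deviations (calculated - experimental)."""
    dev = np.asarray(calculated, dtype=float) - np.asarray(experimental, dtype=float)
    return {
        "mad": float(np.mean(np.abs(dev))),   # mean absolute deviation
        "min": float(dev.min()),              # most underestimated value
        "max": float(dev.max()),              # most overestimated value
    }

# Hypothetical BDE values in kcal/mol for three molecules.
demo_summary = deviation_summary([30.0, 25.0, 41.0], [31.0, 27.0, 40.0])
```

For these placeholder numbers the signed deviations are −1, −2 and +1, so the MAD is 4/3 ≈ 1.33 and the range is −2 to 1.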

SOFMNN Calculation Results
Descriptor selection is a significant step for neural networks, but reports on this topic are scarce [14,15]. In this study, twelve molecular descriptors are used for each molecule. Twelve may seem a small number, but if we exhaust all combinations of these descriptors, there are $\binom{12}{1} + \binom{12}{2} + \cdots + \binom{12}{12} = 2^{12} - 1 = 4095$ options. Therefore, if there are hundreds of descriptors ($n$), it is impossible to consider all of the combinations ($2^n - 1$) without appropriate methods. The SOFMNN clustering analysis is able to classify similar molecular descriptors into a group; one or several typical molecular descriptors are then selected to represent the group according to the correlation analysis between the descriptors and the experimental values, considerably reducing the number of descriptors. Through SOFMNN clustering analysis and correlation analysis, subjective selection of and bias toward particular molecular descriptors can be avoided, and molecular descriptors with the same properties will not be chosen repeatedly. Signals extracted from the molecular descriptors can stand out from the noise; therefore, the neural network is more efficient and accurate than a neural network with the full set of molecular descriptors.
SOFMNN clustering analysis of the molecular descriptors is illustrated with the B3LYP/6-31G(d) calculation results. When twelve molecular descriptors ($\Delta H_{\mathrm{homo}}$, $Q_Y$, $Q_N$, $Q_O$, $N_X$, $\mu$, $\alpha$, $E_{\mathrm{HOMO-1}}$, $E_{\mathrm{HOMO}}$, $E_{\mathrm{LUMO}}$, $E_{\mathrm{LUMO+1}}$ and $\Delta E$) are taken as the input of SOFMNN, the input layer of SOFMNN contains twelve neurons, and a 6 × 4 pattern is adopted for the network structure of the competitive layer (Figure 3a). The neuron index increases from the bottom left to the top right, i.e., the neuron at the bottom left is number 1 and the neuron at the top right is number 24. In Figure 3b, the blue neurons are those that won in competition, and the numbers indicate how many times each neuron has won. The clustering analysis results are reported in Table 2. When the number of training steps is set to 10, $\Delta H_{\mathrm{homo}}$, $N_X$ and $\alpha$ belong to one group, $\mu$ forms a group by itself, and all other molecular descriptors are clustered into one group. Similarly, when the number of training steps is set to 30, 50, and 100, a preliminary clustering of the descriptors is obtained, but the clustering is not accurate enough because the training steps are insufficient and the results are not stable. When the number of training steps increases to 1,000, the results of SOFMNN show only small differences from those with 200 or 500 training steps, i.e., once the number of training steps reaches 500, the clustering results of SOFMNN become steady, and the corresponding clustering numbers of the twelve molecular descriptors computed by SOFMNN are 16, 13, 1, 19, 12, 8, 24, 19, 19, 20, 20 and 1, respectively. This suggests that the SOFMNN classifies the twelve descriptors into eight groups in total: $\Delta H_{\mathrm{homo}}$, $Q_Y$, $N_X$, $\mu$ and $\alpha$ as five independent groups, $Q_N$ and $\Delta E$ as one group, $Q_O$, $E_{\mathrm{HOMO-1}}$ and $E_{\mathrm{HOMO}}$ as one group, and $E_{\mathrm{LUMO}}$ and $E_{\mathrm{LUMO+1}}$ as another group.
For groups with more than one descriptor, the selection is made according to the correlation analysis results, so $Q_N$, $E_{\mathrm{HOMO}}$ and $E_{\mathrm{LUMO}}$ are chosen because of their higher correlation coefficients. These three descriptors, together with the five independent molecular descriptors ($\Delta H_{\mathrm{homo}}$, $Q_Y$, $N_X$, $\mu$ and $\alpha$), are chosen to represent the major characteristics of the Y-NO bond homolysis BDE and are taken as the final inputs of RBFNN. With the same procedure, the nine descriptors $\Delta H_{\mathrm{homo}}$, $Q_Y$, $Q_N$, $E_{\mathrm{HOMO}}$, $N_X$, $\mu$, $\alpha$, $E_{\mathrm{LUMO}}$ and $\Delta E$ obtained by B3LYP/STO-3G are selected as the final inputs of RBFNN. In the SOFMNN calculation, only one neuron wins each time. Its weight and the corresponding weights of its peripheral neurons are adjusted synchronously, and the weights of the neurons change in favor of winning the competition. At the same time, SOFMNN gradually reduces the neighborhood area and starts to repulse its neighbor neurons. This mode of combining cooperation with competition allows SOFMNN to achieve superior performance and significantly improves the learning ability and generalization of the neural network. Different runs of the SOFMNN program are likely to produce different labels because the excited neurons differ each time, but the final clustering result does not change no matter which neuron is excited.
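The selection rule described above (group descriptors by their winning-neuron label, then keep the member of each group with the highest correlation) can be sketched as follows. The labels are the clustering numbers reported in the text for B3LYP/6-31G(d); the absolute correlation values, apart from 0.64 for the calculated BDE, are hypothetical placeholders chosen for illustration, and the ASCII descriptor names and the function `select_representatives` are likewise assumptions of this sketch.

```python
from collections import defaultdict

NAMES = ["dH_homo", "Q_Y", "Q_N", "Q_O", "N_X", "mu", "alpha",
         "E_HOMO-1", "E_HOMO", "E_LUMO", "E_LUMO+1", "dE"]
# Winning-neuron labels reported for 500 or more training steps.
LABELS = [16, 13, 1, 19, 12, 8, 24, 19, 19, 20, 20, 1]

def select_representatives(names, labels, abs_corr):
    """Keep, per cluster label, the descriptor with the largest |r|."""
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    chosen = [max(members, key=lambda m: abs_corr[m])
              for members in groups.values()]
    return sorted(chosen, key=names.index)   # restore the original order

# Placeholder |r| values; only 0.64 for dH_homo comes from the text.
ABS_CORR = {"dH_homo": 0.64, "Q_Y": 0.30, "Q_N": 0.50, "Q_O": 0.20,
            "N_X": 0.30, "mu": 0.10, "alpha": 0.20, "E_HOMO-1": 0.30,
            "E_HOMO": 0.40, "E_LUMO": 0.35, "E_LUMO+1": 0.25, "dE": 0.15}

selected = select_representatives(NAMES, LABELS, ABS_CORR)
```

With these placeholder correlations the sketch returns eight descriptors, one per cluster, which is the same count as the eight groups found for B3LYP/6-31G(d).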

RBFNN Calculation Results
As mentioned above, eight descriptors ($\Delta H_{\mathrm{homo}}$, $Q_Y$, $N_X$, $\mu$, $\alpha$, $Q_N$, $E_{\mathrm{HOMO}}$ and $E_{\mathrm{LUMO}}$) for B3LYP/6-31G(d) and nine descriptors ($\Delta H_{\mathrm{homo}}$, $Q_Y$, $Q_N$, $E_{\mathrm{HOMO}}$, $N_X$, $\mu$, $\alpha$, $E_{\mathrm{LUMO}}$ and $\Delta E$) for B3LYP/STO-3G, selected by SOFMNN clustering analysis and correlation analysis, were taken as the final RBFNN inputs. These inputs must be normalized to make the learning and training process easier, because the magnitudes of the raw descriptors differ widely from one descriptor to another. If the raw data were input directly, descriptors with large fluctuations might monopolize the RBFNN learning process, and the network might fail to reflect small changes in the other data.
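The text does not state which normalization was applied, so the sketch below assumes a simple per-descriptor min-max scaling to the interval [0, 1]; the function names and the small example matrix are hypothetical. The scaling parameters are fitted on the training data and reused for any new molecule so that all inputs share one scale.

```python
import numpy as np

def minmax_fit(X):
    """Column-wise minimum and span of X (rows = molecules)."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Guard against constant descriptors to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def minmax_apply(X, lo, span):
    """Scale X with previously fitted parameters."""
    return (X - lo) / span

# Hypothetical descriptor matrix: 3 molecules, 3 descriptors of very
# different magnitude (the last one is constant).
X_demo = np.array([[1.0, 10.0, 5.0],
                   [2.0, 20.0, 5.0],
                   [3.0, 40.0, 5.0]])
lo_demo, span_demo = minmax_fit(X_demo)
X_scaled = minmax_apply(X_demo, lo_demo, span_demo)
```

After scaling, every non-constant descriptor spans the same range, so no single input dominates the distance calculation inside the radial basis functions.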
In RBFNN, the value of spread is increased from 0.2 to 3 in constant steps of 0.2, and the optimal neural network output is determined during this variation of spread. For the DFT-RBFNN and DFT-SOFM-RBFNN methods, the best regression estimation results are achieved when the values of spread are 0.6 and 0.8, respectively. Figure 4 shows the histograms of deviations between the computed homolysis BDE values and the experimental BDE values. If we use fewer molecular descriptors, how many descriptors should we choose and which ones should be chosen? These questions can be answered by SOFMNN coupled with correlation analysis. Figure 4(c,f) shows the histograms of deviations for B3LYP with the two basis sets corrected by RBFNN with the SOFMNN-classified descriptors as inputs; this method is denoted DFT-SOFM-RBFNN. Calculations are performed to improve the DFT results, employing these selected descriptors as inputs of RBFNN. In Figure 4(c,f), the deviations of DFT-SOFM-RBFNN are further improved compared with DFT-RBFNN, although the difference is slight. The ranges of deviations are −1.2 to 1.2 kcal·mol−1 and −1.2 to 1.1 kcal·mol−1, and the MADs are 0.15 and 0.18 kcal·mol−1 for the 6-31G(d) and STO-3G basis sets, respectively. In terms of accuracy alone, the significance of SOFMNN is unclear because DFT-RBFNN is already sufficiently accurate, but SOFMNN increases the calculation efficiency and handles the problem of large descriptor sets very well when many descriptors are used. Although the improvement in accuracy compared with DFT-RBFNN is slight, chemical accuracy (1 kcal·mol−1) is achieved for all 92 Y-NO homolysis BDE calculation results, which is a very important result. Surprisingly, the homolysis BDE from B3LYP/STO-3G after correction are comparable to those from B3LYP/6-31G(d), even though the raw MAD of STO-3G (10.53 kcal·mol−1) is much worse than that of 6-31G(d) (4.45 kcal·mol−1).
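The scan over spread described at the start of this discussion can be sketched as a simple grid search. The selection criterion used here, the MAD on a held-out set, is an assumption, as are the function names, the compact exact-design RBF fit with b = 0.8326/spread, and the synthetic one-dimensional data.

```python
import numpy as np

def rbf_predict(X_train, y_train, X_new, spread):
    """Fit an exact-design Gaussian RBF model and evaluate it on X_new."""
    b = 0.8326 / spread

    def phi(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return np.exp(-(d * b) ** 2)

    # Hidden outputs plus a column of ones for the output threshold.
    G = np.hstack([phi(X_train, X_train), np.ones((len(X_train), 1))])
    coef = np.linalg.lstsq(G, y_train, rcond=None)[0]
    G_new = np.hstack([phi(X_new, X_train), np.ones((len(X_new), 1))])
    return G_new @ coef

def scan_spread(X_train, y_train, X_test, y_test):
    """Try spread = 0.2, 0.4, ..., 3.0 and return the best by test MAD."""
    spreads = [round(0.2 * k, 1) for k in range(1, 16)]
    mads = [float(np.mean(np.abs(rbf_predict(X_train, y_train, X_test, s) - y_test)))
            for s in spreads]
    best = int(np.argmin(mads))
    return spreads[best], mads[best], spreads

# Synthetic illustration: samples are rows, one input variable.
X_tr = np.arange(8.0).reshape(-1, 1)
y_tr = np.sin(X_tr[:, 0])
X_te = X_tr + 0.5
y_te = np.sin(X_te[:, 0])
best_spread, best_mad, tried = scan_spread(X_tr, y_tr, X_te, y_te)
```

A small spread makes each basis function narrow, so the network reproduces the training points but interpolates poorly, while a large spread smooths the response; scanning the grid balances the two.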
With the minimal basis set STO-3G, we can save much time and many resources while retaining the ability to perform calculations for large molecules. During this study, we considered the extrapolation of the method to larger molecules and molecules with more types of elements as well as to different Y-NO bonds in addition to the four types in this dataset, so we preferred descriptors that were independent of the elemental types. After establishing the DFT-SOFM-RBFNN method, some molecules were used to test the ability to extrapolate. The structures of the molecules and the calculation results are shown in Table 3. Six extrapolation test molecules contained Si-NO bonds and fluorine, which were not included in original dataset. The DFT-SOFM-RBFNN results show that deviations of the DFT calculations for test molecules are reduced dramatically and reach the same accuracy as the 92 organic Y-NO bond molecules, particularly for B3LYP/STO-3G calculation results with large calculation deviations, which gives us more confidence in the predictive ability of this method. The excellent performance of the DFT-SOFM-RBFNN method benefits from the combined advantages of all the methods. DFT molecular descriptors represent the physical essence of the homolysis BDE; the RBFNN is independent of the initial weights and thresholds, converges quickly to global minima, has few parameters that must be adjusted, shows great capacity for reverse redundancy and fault tolerance and possesses a built-in nonlinear model capable of carrying out calculations with a partial response. As a result of the SOFMNN cluster analysis, the significant features of the descriptors have been discovered and the number of descriptors can be narrowed down, so that the accuracy and efficiency of RBFNN calculations are improved. The combined DFT-SOFM-RBFNN method improves the DFT calculations and develops new applications in chemistry for SOFMNN and RBFNN.
To compare the DFT-SOFM-RBFNN calculations with more sophisticated DFT calculations using a larger basis set, M06-2X/6-311+G(2d,p) calculations with and without the solvent effect are performed for the four smallest molecules from each type of Y-NO molecule. The results are listed in Table 4. As shown in Table 4, the BDE calculations are improved by M06-2X/6-311+G(2d,p) compared with B3LYP/6-31G(d), but high accuracy is not reached. The solvent effect on the BDE is included through the polarizable continuum model (PCM). The results show that the solvent effects are small (<2 kcal·mol−1) and do not consistently improve the BDE calculations, and chemical accuracy is not reached even when the solvent effects are considered. This further demonstrates the high efficiency and accuracy of the proposed DFT-SOFM-RBFNN method.