QSAR Assessing the Efficiency of Antioxidants in the Termination of Radical-Chain Oxidation Processes of Organic Compounds

Using the GUSAR 2013 program, the quantitative structure–antioxidant activity relationship has been studied for 74 phenols, aminophenols, aromatic amines and uracils having lgk7 = 0.01–6.65 (where k7 is the rate constant for the reaction of antioxidants with peroxyl radicals generated upon oxidation). Based on the atomic descriptors (Quantitative Neighborhood of Atoms (QNA) and Multilevel Neighborhoods of Atoms (MNA)) and molecular (topological length, topological volume and lipophilicity) descriptors, we have developed 9 statistically significant QSAR consensus models that demonstrate high accuracy in predicting the lgk7 values for the compounds of training sets and appropriately predict lgk7 for the test samples. Moderate predictive power of these models is demonstrated using metrics of two categories: (1) based on the determination coefficients R2 (R2TSi, R20, Q2(F1), Q2(F2), RmTSi2¯) and based on the concordance correlation coefficient (CCC)); or (2) based on the prediction lgk7 errors (root mean square error (RMSEP), mean absolute error (MAE) and standard deviation (S.D.)) The RBF-SCR method has been used for selecting the descriptors. Our theoretical prognosis of the lgk7 for 8-PPDA, a known antioxidant, based on the consensus models well agrees with the experimental value measure in the present work. Thus, the algorithms for calculating the descriptors implemented in the GUSAR 2013 program allow simulating kinetic parameters of the reactions underling the liquid-phase oxidation of hydrocarbons.


KINETIC CLASSIFICATION OF ANTIOXIDANTS DEPENDING ON THE DEACTIVATION MODE OF OXIDATION
1. Antioxidants terminating the chains by their reactions with peroxyl radicals (phenols, naphthols, hydroquinones, aromatic amines, aminophenols, diamines) resulting in the formation of radical intermediates with low activity.
Such antioxidants are efficient at very low concentrations of dioxygen and in solid polymers).
4. Metal-deactivating antioxidants (diamines, hydroxy acids, and other bifunctional compounds) interacting with metal ions and forming the complexes inactive towards hydroperoxides.
6. Inhibitors with combined action. Such a mechanism is realized when (1) the inhibitor molecule has two and more functional groups undergoing their own reaction; and (2) the original inhibitor and its products of its transformation possess the inhibitory activities through different inhibition modes (e.g., the phenolic group of phenol sulfide reacts with peroxyl radical whereas its sulfide group is reactive towards hydroperoxide). 7. Synergetic inhibition is implemented when two inhibitors mutually enhance their inhibitory effects (e.g., in the case of 'phenol + sulfide' mixtures, in which phenol reacts with the peroxyl radical and sulfide reduces the degenerate chain branching by non-radical decomposition of hydroperoxide).
In the aspect above, a quantitative study of the antioxidant properties of natural and synthetic substances in various model systems is an important task. Assessing the antioxidant activity of individual substances and compositions may be performed with various physicochemical and biochemical methods is used [18][19][20]. This can be done according to their influence on the oxygen absorption (lipid peroxidation, aromatic hydrocarbons, secondary and tertiary alcohols, oxidation of crocin, chemiluminescence with luminol, oxidation of Rphycoerythrin, sensitivity of erythrocytes to hemolysis, recovery of the activity of iron ions, lipid peroxides). Some authors measure the antioxidant activity of enzymes, e.g., ascorbate-S4 peroxidase, glutathione reductase, dehydroascorbate reductase and mono-dehydroascorbate reductase. Herewith, in some cases, the antioxidant status of the organism correlates with the intensity of the pathology, e.g., the growth of malignant tumor cells MK-1.
Despite the diverse photometric, chromatographic and electrochemical methods, a study of the antioxidant activity (AOA) of individual compounds usually starts from the methods of chemical kinetics. In these methods, AOA compounds are involved to the model reactions such as oxidation of aliphatic and alkyl-aromatic hydrocarbons, fatty acid esters. Here, the    PRESS is the sum of the squares of the prediction errors (predictive sum of squares);

PARAMETERS FOR ASSESSING THE DESCRIPTIVE AND PREDICTIVE POTENTIAL OF QSAR MODELS
TSS is the total sum of squares (is sum of squared deviations from the data set mean); and are the total sum of squares of the external set calculated using the training set mean and external set mean, respectively.

CALCULATION OF STRUCTURAL DESCRIPTORS
Here is a description of the GUSAR program necessary to understand the text of the article.
A detailed description of the ideology of calculating descriptors and constructing QSAR models using this program is given in the articles listed in the list of references and in the site http://www.pharmaexpert.ru (http://www.pharmaexpert.ru/passonline/downloads/articles/Filimonov-and-Poroikov-Chapter-6.pdf).
From a general point of view, the assessment of the activity of an organic molecule in the GUSAR2013 program is carried out according to the equation (1): where a0, a1,… different functions of organic molecule's structure S.

S8
In classic QSAR methods, the functions f1(S), f2(S), … represent physical-chemical parameters or other quantitative characteristics of molecular structure, and the coefficients a0, a1,... are determined using multiple linear regression (MLR), partial least squares (PLS) analysis, or support vector regression (SVR), etc. [1]. QSAR methods based on the similarity between a certain molecule Si with known biological activity and the molecule S use the value fiðSÞ of their similarity.
In the GUSAR 2013 program, the description of the structure and the calculation of the regression coefficients for the further construction of QSAR models is based on the use of two types of substructural descriptors of atomic neighborhoods: MNA (Multilevel Neighborhoods of Atoms) and QNA (Quantitative Neighborhoods of Atoms) [39,40]. They are automatically deduced from the matrices of molecular connectivity, standard ionization potentials (IP) and electron affinities (EA). The QNA descriptors are defined by two functions, P and Q. The P and Q values for each atom i are calculated using the following formulae [39]: where k is the remaining atoms in the molecule, IP is the first ionization potential, EA is the electron affinity for each atom (in eV), and C is the connectivity matrix for the molecule as a whole [46]. The standard values IP and EA of atoms in a molecule were collected from the literature. Although the value P-Q can be considered by convention as the partial atomic charge, where  is the chemical potential, in general the P and Q values are not the estimate of partial atomic charges or hardness, etc.
Any atom influences the others, although the influence decreases with the increase of the distance between them. The algorithm of the QNA descriptor calculation is really very simple due to the uselessness of the matrix Exp(-1/2C) itself, the fact that the product of Exp(-1/2C) by a vectoris needed only, and the fact that the matrix C consists of 0 and 1 only. A detailed description of QNA descriptors is represented in [45].
Thus, the QNA descriptors are calculated taking into account the relationships between all atoms of the structure. These values describe each atom of the molecule but, at the same time, depend on the structure of the molecule as a whole [45,46]. In the future, based on the functions P and Q, the fi(S) functions are calculated. Each function of the structure of the S9 molecule fi (S) is calculated according to equation (4) as the average value of the function gi (P, Q) for those m atoms of the molecule that have two or more immediate neighbors: Substitution of expression (5) into equation (1) and permutation of the sums allows one to obtain equation (6): Thus, in accordance with equation (6), the estimate of the parameter ypred for a molecule is the average of the predicted values for specific atoms in the molecule. Formally, QNA descriptors represent the structure of a molecule with only two descriptors (P and Q), in contrast to the many traditional descriptors used in QSAR.
However, the developers of the GUSAR program found that the P and Q values are highly correlated with each other (r = 0.903). Since the values of P and Q have different scales (standard deviations are 0.023 and 0.208, respectively), the developers of the GUSAR program carried out normalization to optimize the family of functions gi(P, Q). Normalization was performed by calculating mean values (EP and EQ), standard deviations (DP and DQ), and correlation between P and Q values (RPQ): The orthonormal U and V have zero mean, unit variance, and they are uncorrelated [45,46].
The QNA values are the basic information for calculating the Chebyshev 2D polynomials.
where the integers u, v=0, 1, 2,...define the 2D Chebyshev polynomial degree. The final equation for estimate ypred using QNA descriptors is , ( 1 0 0 (10) S10 Thus, the regression equations constructed in the GUSAR 2013 program take into account both the specificity and physicochemical properties of each atom entering the training set [41, 43-46, 66, 67]. However, QNA descriptors cannot be physically interpreted due to the peculiarities of their calculation. In this regard, they are not explicitly displayed under calculations.
The MNA descriptors are computed using the PASS algorithm (Prediction of Activity Spectra for Substances) [39,40], which predicts approximately 6,400 "biological activities" with an accuracy threshold of an average prediction of at least 95%. These descriptors are generated based on the structural formulae of chemical compounds without using any precompiled list of structural fragments [39][40][41]46]. The authors of the GUSAR 2013 program report that "MNA-descriptors are based on the molecular structure representation, which includes hydrogens according to the valences and partial charges of other atoms and does not specify the types of bonds." They are generated as "a recursively defined sequence: • zero-level MNA descriptor for each atom is the mark A of the atom itself; • any next-level MNA descriptor for the atom is the substructure notation A (D1D2…Di…), where Di is the previous-level MNA descriptor for i-th immediate neighbor of the atom A.
The neighbor descriptors D1D2…Di… are arranged in a unique manner. This may be, for example, a lexicographic sequence. MNA descriptors are generated using an iterative procedure, which results in the formation of structural descriptors that include the first, second, etc. neighborhoods of each atom. The label contains not only information about the type of atom, but also additional information about its belonging to a cyclic or acyclic system, etc. For example, an atom that does not enter a ring is marked with a "-".
Based on the MNA descriptors using B-statistics, calculated in the PASS program, the biological activity spectrum of a chemical compound is predicted [35,36,[42][43][44].
The output of the PASS program is the probabilities of the activity (Pa) and of inactivity (Pi) of each prognostic result. The difference between these two values (Pa-Pi) for a randomly selected subset of predicted activities is used as independent variables for regression analysis in GUSAR. GUSAR2013 incorporates a PASS version that pedicts 4130 types of biological activity. The developers of the GUSAR 2013 program report that the list of predictable biological activities currently includes 501 pharmacotherapeutic effects, 3295 mechanisms of action, 57 adverse and toxic effects, 199 metabolic terms, 49 transporter proteins and 29 activities related to gene expression [46]. The average accuracy of a reliable prediction of biological activity, calculated by leave-one-out cross-validation procedure is approximately S11 95% [68]. However, the regression equation constructed based on the MNA descriptors reveals the specificity of the action of the compound but does not explicitly reflects the physicochemical parameters of chemical compounds [46].
In addition, the GUSAR 2013 program calculates the QSAR descriptors of an entire molecule such as topological length, topological volume, lipophilicity, and physicochemical descriptors (numbers of positive and negative charges, number of donors and acceptors of the hydrogen bond, number of aromatic atoms, molecular weight and number of halogen atoms) [39,40]. Therefore, these parameters were added to the QNA descriptors. The topological length of a molecule was calculated as the maximal distance between any two atoms and the volume of a molecule as the sum of each atom's volume, 4/3R 3 , where R is the atomic radius.
The authors of the GUSAR 2013 program report that "in GUSAR, the scale of QNAand PASS-based descriptors ranges from −1 to 1. Therefore, no additional normalization is required for these types of descriptors. Only whole-molecule descriptors are normalized using a standard Z-score normalization procedure" [40].
It should be noted that the program is able to construct QSAR models both relying solely on one of these types of descriptors, and on their combination in terms of the consensus approach [42][43][44]. At the same time, based on the consensus approach methodology, models for

S12
The SCR and RBF-SCR methods are the most preferable. The SCR method is correctly applied to modeling compounds with a rather high degree of similarity. The other two methods of selecting the optimal number of descriptors show good results when modeling structurally dissimilar compounds.
It was previously shown [39-44, 46, 66, 67] that self-consistent regression (SCR) can be successfully applied to various QSAR problems. The SCR method is resistant to noise in the data and allows deleting the variables that poorly describe the target value. This is a regularized method of the least squares. Independent parameters a are calculated in this method according to the equation (4) [43]: where a is the regression coefficient, n is the number of objects, yi is the response value of the i-th object, m is the number of independent variables, xik is the value of the k-th independent variable of the i-th object, ak is the k-th value of the regression coefficients, and vk is the k-th value of the regularization parameters. Equation (4) hasthefollowingsolution: where X T is the transposed regression matrix X, and V is the diagonal matrix of the regularization parameters. The regression coefficients obtained from the SCR reflect the contribution of each particular descriptor (variable) to the final equation. The higher the absolute value of the coefficient, the greater its contribution. Thus, the regression coefficients obtained after the SCR can be used to weight the descriptors (variables) depending on their importance.
The second method used implemented in the GUSAR 2013 program for selecting the optimal number of descriptors is the interpolation method for radial basis functions RBF [39].
The authors of the GUSAR 2013 program reports [39] that, unlike the RBF network, this method uses each input variable as a center of gravity. The learning process is performed on all input variables of the training set. As can be seen from equation (5), the approximating function y(x) in the case of the RBF interpolation is represented as the sum of N radial basis functions, each of which is related to another center xi and weighted by the corresponding coefficient wi.
If the points xi are different then the interpolation matrix Φ in the above equation is nonsingular. The weights w are calculated as:

S13
Assessing the weights is based on the simple least squares method [39].
The RBF-SCR method is the third tool of the GUSAR 2013 program for selecting the optimal number of descriptors. It has a 3-step algorithm: 1) selecting descriptors using the SCR method; 2) calculating the radial basis functions using the weighted coefficient of SCR as a criterion of similarity; 3) calculating the weighting coefficients RBF by the least squares.
The RBF-SCR method can be expressed as [39]: where a is taken from equation (4). Weights ai are a new elements as compared to equation (5).
The RBF and RBF-SCR interpolation is based on a linear radial basis function that allows modeling a variety of training sets with a high level of dissimilarity between the objects.
Additionally, the GUSAR program allows visualizing the contribution of each atom into the predicted value [37][38][39][40][41][42][43][44][45][46]. This capability is implemented in the QSAR models based on the QNA descriptors and, accordingly, in the consensus combination of the QSAR models designed in different modes. It opens opportunities to identify "strong" and "weak" points in the biologically active molecules and, consequently, to rationalize the conclusions about the replacement of certain fragments upon molecular design directed to enhancing/weakening the target property.

CONSTRUCTING OF THE QSAR MODELS
The QSAR models were designed in the GUSAR 2013 program as follows. To describe the structures of compounds within the program, two types of atom-centered descriptors were used, viz.substructural MNA, electrotopological QNA, and, additionally, three descriptors of the whole molecule (topological length, topological volume, and lipophilicity).
The optimal set of the descriptors for constructing particular regression equations was automatically selected by the self-consistent regression [39] and sliding control procedures [37,39,40,[42][43][44][45][46]. The GUSAR 2013 program allows constructing any single QSAR models and consensus models based on them. In this study, we use the consensus approach to construct the QSAR models. This allows reducing the variability of the predictions. Consensus models were designed in GUSAR 2013 automatically based on the principle of common similarity of particular regression dependencies [37,[39][40][41][42][43][44][45][46].

S14
The final predicted values for lgk7 were calculated using a weighted average of the predictions from several selected QSAR models. Each model is based on a different set of QNA and MNA descriptors. Its predictions for each compound are weighted according to the similarity value as calculated during the applicability domain assessment. Note that each of these partial models involved by the consensus model was made independently based on either QNA or MNA descriptors. As a result, 9 consensus QSAR models were designed. These models included 140 partial models. However, not all of them had acceptable statistical parameters. To select the most predictive models, a 20-fold crosscheck was performed for each model. These

ASSESSMENT OF THE RANGE OF THE APPLICABILITY
To assess the applicability of models, GUSAR 2013 provides three different approaches based on similarity, leverage, and accuracy previously described in detail [43,46].

Similarity.
Using the Pearson correlation coefficients for each compound, we calculated the distances toward its nearest neighbors in the training set in the space of independent variables obtained after SCR. The compound is considered in the range of the model's applicability if the average value of these three distances is lower or equal to 0.7.
Leverage. The calculation of leverage allows estimating the contribution of each molecule to its own predicted value [43,46]: where x is the vector of descriptors of the tested compound and X is the matrix made up with rows corresponding to the descriptors of all the molecules of the training set [43]. The compound is considered out of the applicability range if its leverage is larger than 99 % in the distribution of the leverage values of the training set.

Accuracy degree (AD).
Here, the prediction of the applicability range for each compound is calculated based on the prediction error for the three most similar compounds in the test set relative to the training set as a whole [43,46]: In the present study, a threshold value of 1 was used for AD. S16    Replacing the H atom by OH in position R3 decreases AOA of componds V (Figure 1).

RESULTS
A similar effect is observed in the case of introducing the amino group. At the same time, introducing one or two methyl groups in R2 and R4 with the presence of OH in R3 enhances S33 AOA. The next introduction of methyl in R1 with the presence of two methyl groups in R2 and R4 and OH in R3 decreases lgk7. At the same time, in the case of amino group instead of OH in R3, the effect is opposite.    .