Synthesis and in Vitro Antioxidant Activity Evaluation of 3-Carboxycoumarin Derivatives and QSAR Study of Their DPPH• Radical Scavenging Activity

The in vitro antioxidant activities of eight 3-carboxycoumarin derivatives were assayed by the quantitative 1,1-diphenyl-2-picrylhydrazyl (DPPH•) radical scavenging activity method. 3-Acetyl-6-hydroxy-2H-1-benzopyran-2-one (C1) and ethyl 6-hydroxy-2-oxo-2H-1-benzopyran-3-carboxylate (C2) presented the best radical-scavenging activity. A quantitative structure-activity relationship (QSAR) study was performed and correlated with the experimental DPPH• scavenging data. We used structural, geometrical, topological and quantum-chemical descriptors selected with Genetic Algorithms in order to determine which of these parameters are responsible for the observed DPPH• radical scavenging activity. We constructed a back propagation neural network with the hydrophilic factor (Hy) descriptor to generate an adequate architecture of neurons for the system description. The mathematical model showed a multiple determination coefficient of 0.9196 and a root mean squared error of 0.0851. Our results show that the presence of hydroxyl groups on the ring structure of 3-carboxycoumarins is correlated with the observed DPPH• radical scavenging activity.


Introduction
Antioxidants play important roles in preventing diseases induced by reactive oxygen species, which result in oxidative damage, including protein denaturation, mutagenesis and degenerative or pathological events, such as aging, asthma, and cancer. The diversity of structural characteristics in the natural and synthetic coumarins offers a vast field of research for new biological properties of these compounds.
Here we set out to measure the in vitro antioxidant activity of eight 3-carboxycoumarin derivatives with different structural variations for modular replication by the quantitative 1,1-diphenyl-2-picrylhydrazyl (DPPH•) radical scavenging activity method. This is the first time that this measurement has been performed on these compounds, although a similar type of coumarin was reported by Lin et al. in 2008 [8].
Quantitative Structure-Activity Relationship (QSAR/QSPR) methodologies are among the most powerful tools for describing the relationships between biological activity and the physicochemical characteristics of molecules. Current literature demonstrates that almost every area of chemical and life sciences, as well as technology, utilizes quantitative structure-activity/property relationships (QSAR/QSPR) to accelerate product development and increase efficiency. The design of pharmaceuticals, agrochemicals, and consumer products, as well as the assessment of their toxicity and environmental impact, have become major areas of application of QSAR/QSPR techniques, whose methods also penetrate into relatively new applications such as materials science and nanotechnology. In terms of methodology development, the new trend is the integration of QSAR/QSPR with related computational methods such as virtual screening and molecular dynamics. Such a synergy offers unique opportunities and heralds a new era of computer-aided molecular design [9]. QSAR/QSPR modeling usually consists of four main operations: calculating or measuring a pool of descriptors or other input variables; choosing a small subset of these descriptors that are relevant to the biological activity being modeled (in some cases this step may not be required); generating the often nonlinear relationship between the descriptors and the global material property; and validating the model to assess its reliability, robustness, predictivity, and domain of applicability [10]. Almost all QSPR modeling methods involve some sort of regression. This can be simple least-squares, multiple linear regression (MLR) or, where the structure-property relationship is not linear, a polynomial, bilinear, or neural network method. The simplest QSPR modeling method is multiple linear regression, which assumes that the property being modeled is a linear function of the descriptors [11].
To develop a QSAR, a more significant number of compounds is required to develop a meaningful relationship. An often asked question is "how many compounds are required to develop a QSAR?" There is no direct and simple response to this question other than "as many as possible!" To provide some guide, it is widely accepted that between five and ten compounds are required for every descriptor in a QSAR [12]. This does suggest that a one descriptor regression-based QSAR could be developed on five compounds. This is possible, but is very reliant on issues such as data distribution and range. Ideally "many more" compounds are required to obtain statistically robust QSARs, with some modelling techniques being considerably more data hungry than regression analysis. In our case, we have only eight compounds whose biological activities have been determined experimentally in our laboratory.
Molecular descriptors are formal mathematical representations of a molecule, obtained by a well-specified algorithm, and applied to a defined molecular representation or a well-specified experimental procedure: the molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment. A general consideration about the use of molecular descriptors in modeling problems concerns their information content. This depends on the type of molecular representation used and the defined algorithm for their calculation. There are simple molecular descriptors derived by counting some atom types or structural fragments in the molecule, as well as physicochemical and bulk properties such as, for example, molecular weight, number of hydrogen bond donors/acceptors, number of OH-groups, and so on. Other molecular descriptors are derived from algorithms applied to a topological representation. These are usually termed topological, or 2D-descriptors. Other molecular descriptors are derived from the spatial (x, y, z) coordinates of the molecule, usually called geometrical, or 3D-descriptors; another class of molecular descriptors, called 4D-descriptors, is derived from the interaction energies between the molecule, imbedded into a grid, and some probe. Single indexes derived from a molecular graph are called topological indexes. These are numerical quantifiers of molecular topology that are mathematically derived in a direct and unambiguous manner from the structural graph of a molecule, usually an H-depleted molecular graph. On the other hand many of those descriptors are based directly on the results of quantum-mechanical calculations or can be derived from the electronic wave function or electrostatic field of the molecule [13]. 
Since the electrophilicity index is a chemical reactivity descriptor and its definition has a strong foundation in density functional theory [14,15], it is appropriate to make use of this descriptor in the QSAR parlance. Recently the electrophilicity index has been used as a possible descriptor of biological activity, confirming that the electrophilicity properly quantifies the biological activity. Although there is no one-to-one agreement between AM1 and B3LYP values, the B3LYP method in general provides better estimates of biological activity when compared to the corresponding AM1 values [15]. Within the density functional theory framework, some quantum chemical descriptors such as the softness, chemical potential and electrophilicity index were used here because of the good correlation they have shown in the prediction of radical scavenging antioxidant activity [16][17][18][19].
Genetic Algorithms (GA) are powerful computational tools that have been used in many areas of investigation because of their reliable mathematical models. This method is based on the mechanism of evolution of species: the higher the weight of a descriptor (gene), the more likely it is to be preserved in the mathematical model, while descriptors with lower weights are eliminated. In this manner, the best mathematical models which represent the observed biological activity (phenotype) are obtained [20,21]. Furthermore, Artificial Neural Networks (ANN) are a computational tool used in rational drug design. An ANN tries to simulate the mechanism of the human brain. In this method the basic unit is the neuron, and the interconnection of all the neurons forms the architecture of the neural network. There is also a variation of this method called back propagation ANN. In this variation, the output of the network is compared to the real value and the network weights are then adjusted in order to minimize the error. This type of neural network is the most frequently used to develop QSAR and QSPR studies [22,23].

Scheme 1.
Reaction for the formation of coumarins.

DPPH• Radical Scavenging Activity
Antioxidant compounds play an important role as a health-protecting factor. The interaction of the examined compounds with the stable free radical DPPH• was studied. Results of the assays are summarized in Figures 1-3. Compounds C1 and C2 showed the highest radical scavenging activity (Figure 1). For both compounds the interaction was time and concentration dependent (Figures 2 and 3). The time course of the DPPH• interaction depends on the concentration used. In general, this interaction expresses their ability to scavenge free radicals [27,28]. The discoloration of DPPH• at 60 min at different concentrations of compounds C1 and C2, measured in order to verify the dose dependence of the entrapment of the DPPH• radical by these compounds [29], is shown below.

Computational Details and Results
A conformational study was performed over the eight coumarins (Table 1) using the PM3 semi-empirical method as implemented in the SPARTAN′08 code [30,31]. The structures of all conformers of minimum energy were fully optimized without symmetry constraints within the density functional theory methodologies and the resulting ground states were characterized via frequency analysis. In the present work, we have used the hybrid B3LYP [32] functional and the 6-31+G(d,p) basis set [33]. We have included the influence of DMSO solvent using the SMD solvation model [34] implemented in the Gaussian 09 program [35]. Molecular descriptors of all optimized structures were calculated from the DFT context and the DRAGON05 program [36]. This software includes 20 families of descriptors in the code. Here, we have selected group account, geometrical and molecular property families. These families include a total of 257 descriptors, but the DRAGON program only gave us 73 descriptors based on the molecular characteristics of our compounds. We calculated the correlation matrix of these 73 descriptors using the data analyzer within the Molegro Virtual Docker (MVD) software [37] and obtained nine non-correlated descriptors (see Table 2).

The SPH (spherosity) is an anisometry descriptor calculated as a function of the eigenvalues of the covariance matrix calculated from the molecular matrix:

$$\mathrm{SPH} = \frac{3\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3} \tag{1}$$

where $\lambda_1 \geq \lambda_2 \geq \lambda_3$ are those eigenvalues. The spherosity index varies from zero for flat molecules, such as benzene, to one for totally spherical molecules [38]. The Ui (unsaturation index) is a simple information index for unsaturated bonds defined as:

$$\mathrm{Ui} = \log_2\left(1 + nDB + nTB + nAB\right) \tag{2}$$

where nDB, nTB and nAB are the number of double, triple and aromatic bonds, respectively [36]. The Hy is the hydrophilic factor descriptor and it is calculated from Equation (3):

$$\mathrm{Hy} = \frac{(1 + N_{Hy})\log_2(1 + N_{Hy}) + nC\left(\frac{1}{nSK}\log_2\frac{1}{nSK}\right) + \sqrt{\frac{N_{Hy}}{nSK^2}}}{\log_2(1 + nSK)} \tag{3}$$

where $N_{Hy}$ is the number of hydrophilic groups (-OH, -SH and -NH2), nC represents the number of carbon atoms and nSK stands for all atoms excluding hydrogen [39].
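As a rough illustration of how the two count-based descriptors respond to structure, the following Python sketch evaluates the commonly cited expressions Ui = log2(1 + nDB + nTB + nAB) and the Todeschini-type hydrophilic factor Hy. The function names and the atom counts in the usage lines are assumptions introduced here for illustration (hand counts for a 3-acetylcoumarin skeleton with and without one -OH group); they are not values taken from Table 2 or from the DRAGON output.

```python
import math


def unsaturation_index(n_double, n_triple, n_aromatic):
    """Ui = log2(1 + nDB + nTB + nAB), from bond counts."""
    return math.log2(1 + n_double + n_triple + n_aromatic)


def hydrophilic_factor(n_hy, n_c, n_sk):
    """Hy from the number of hydrophilic groups (n_hy), carbon atoms (n_c)
    and non-hydrogen atoms (n_sk). Assumes n_sk >= 1."""
    term_groups = (1 + n_hy) * math.log2(1 + n_hy)
    term_carbon = n_c * (1 / n_sk) * math.log2(1 / n_sk)
    term_ratio = math.sqrt(n_hy / n_sk ** 2)
    return (term_groups + term_carbon + term_ratio) / math.log2(1 + n_sk)


# Illustrative hand counts (assumptions, not data from the paper):
# C11H8O4 with one -OH group versus C11H8O3 without it.
hy_with_oh = hydrophilic_factor(n_hy=1, n_c=11, n_sk=15)
hy_without_oh = hydrophilic_factor(n_hy=0, n_c=11, n_sk=14)
print(f"Hy with -OH:    {hy_with_oh:.3f}")
print(f"Hy without -OH: {hy_without_oh:.3f}")
```

Under these assumed counts, adding a single -OH group raises Hy, which is the direction of change the QSAR discussion relies on.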
The AMR (molar refractivity) descriptor is calculated according to the Ghose-Crippen model, based on a group contribution method [40]. The ALOGP descriptor (Ghose-Crippen-Viswanadhan octanol-water partition coefficient) is calculated from the ALOGP model, consisting of a regression equation based on the hydrophobicity contribution of 120 atom types [41]. The TPSA (Topological Polar Surface Area) descriptor, originally proposed by Ertl et al. [42], is calculated from Equation (4):

$$\mathrm{TPSA} = \sum_i n_i C_i \tag{4}$$

where the $C_i$ term is the contribution of atom i to the molecular surface, $n_i$ is the frequency of atom i in the molecule and the sum runs over all types of polar fragments. The TPSA calculation takes into account the contribution of the functional groups containing oxygen and nitrogen atoms to the polarization of the molecular surface as implemented in the DRAGON code [36]. Additionally, we calculated quantum chemical descriptors from DFT (Table 2): total energy (E), dipole moment, hardness (η), electrophilicity index (ω), chemical potential (µ), softness (S) and HOMO-LUMO gap. In this work, E corresponds to the ground state energy of our coumarin molecules and the dipole moment was calculated as implemented in Gaussian 09 [35]. The chemical potential (µ), which is widely used as a descriptor of chemical reactivity, indicates the escape tendency of the electrons and is calculated from:

$$\mu = \left(\frac{\partial E}{\partial N}\right)_{v(\mathbf{r})} \tag{5}$$

where E is the energy of the system and N is the number of electrons [14].
Here we used the finite difference approximation:

$$\mu \approx -\frac{I + A}{2} \tag{6}$$

where I is the vertical ionization potential, defined as the difference of total energy between the cationic structures in the optimized geometry of the neutral compounds and the optimized neutral structures:

$$I = E_{cation} - E_{neutral} \tag{7}$$

A is the vertical electron affinity, defined as the difference of the total energy between the optimized neutral structures and the corresponding anions in the optimized geometry of the neutral compounds:

$$A = E_{neutral} - E_{anion} \tag{8}$$

The hardness (η) is a global property of the molecular system and measures the resistance imposed by it to any change in its electron distribution:

$$\eta = \left(\frac{\partial^2 E}{\partial N^2}\right)_{v(\mathbf{r})} \tag{9}$$

In the finite difference approximation the above equation is:

$$\eta \approx I - A \tag{10}$$

The softness (S) is the inverse of hardness:

$$S = \frac{1}{\eta} \tag{11}$$

The electrophilicity index (ω) can be determined from the chemical potential (µ) and hardness (η) [14] as:

$$\omega = \frac{\mu^2}{2\eta} \tag{12}$$

where ω represents the stabilization energy of the molecular system when it is saturated by electrons coming from the surroundings [43].
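The chain from total energies to reactivity descriptors can be sketched in a few lines of Python. This is a minimal illustration, assuming the convention µ ≈ −(I + A)/2, η ≈ I − A, S = 1/η and ω = µ²/(2η); some authors include a factor of 1/2 in the hardness, which would change η, S and ω. The function name, the dataclass and the three energies in the usage lines are hypothetical and are not values from Table 2.

```python
from dataclasses import dataclass


@dataclass
class ReactivityDescriptors:
    ionization_potential: float
    electron_affinity: float
    chemical_potential: float
    hardness: float
    softness: float
    electrophilicity: float


def reactivity_descriptors(e_neutral, e_cation, e_anion):
    """Finite-difference descriptors from three single-point energies,
    all evaluated at the optimized geometry of the neutral molecule.
    Energies must share one unit (e.g. eV); outputs use the same unit."""
    i = e_cation - e_neutral          # vertical ionization potential
    a = e_neutral - e_anion           # vertical electron affinity
    mu = -(i + a) / 2                 # chemical potential
    eta = i - a                       # hardness (assumed convention, no 1/2)
    if eta <= 0:
        raise ValueError("hardness must be positive for S and omega")
    s = 1 / eta                       # softness
    omega = mu ** 2 / (2 * eta)       # electrophilicity index
    return ReactivityDescriptors(i, a, mu, eta, s, omega)


# Hypothetical energies in eV, relative to the neutral molecule.
example = reactivity_descriptors(e_neutral=0.0, e_cation=8.0, e_anion=-1.0)
print(example)
```

Keeping all six quantities in one record makes it straightforward to add them as columns to a descriptor table alongside the DRAGON descriptors.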

Genetic Algorithms (GA)
We introduced all 13 descriptors into the NeuroShell Predictor program [44]. Using the GeneHunter Genetic Algorithm [45] implemented in this program, we obtained the weights of the molecular descriptors (see Figure 4). Figure 5 shows the linear correlation between log $Y_{exp}$ (actual) and log $Y_{pred}$ calculated by GA analysis (predicted). We obtained a coefficient of multiple determination (R squared) of 0.9313, a correlation coefficient (r) of 0.9658 and a root mean squared error (RMSE) of 0.0786. R squared is a statistical indicator usually used in multiple regression analysis to compare the reliability of the model with respect to reference points. R squared is defined as:

$$R^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} \tag{13}$$

where $y$ is the experimental value, $\hat{y}$ is the value predicted by the model, and $\bar{y}$ is the average of all the output values. Furthermore, r is a measure of the linear correlation between experimental and predicted values in terms of direction, namely:

$$r = \frac{\sum (y - \bar{y})(\hat{y} - \bar{\hat{y}})}{\sqrt{\sum (y - \bar{y})^2 \sum (\hat{y} - \bar{\hat{y}})^2}} \tag{14}$$

where $\bar{\hat{y}}$ is the average of the predicted values. RMSE is defined as the root of the mean of the quadratic terms, which correspond to the differences between experimental and predicted data values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum (y - \hat{y})^2} \tag{15}$$

where n is the number of compounds. Experimental and calculated antiradical activity, error and percent error are shown in Table 3. The error is calculated from the difference between experimental ($Y_{exp}$) and calculated ($Y_{cal}$) antiradical activity. Percent error is calculated as:

$$\%\,\mathrm{error} = \frac{\left|Y_{exp} - Y_{cal}\right|}{Y_{exp}} \times 100 \tag{16}$$

The highest error value was 8.69% and the lowest one 0%. The average percent error was 3.77%. We propose the construction of a Back Propagation Neural Network (BPNN) with the most important descriptor, Hy (Table 4), in order to obtain a mathematical model that fits QSAR theory, that is, one descriptor per 4 to 10 molecules.
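The fit statistics described above are simple to reproduce outside NeuroShell Predictor. The sketch below, in plain Python, computes R squared as one minus the ratio of residual to total sum of squares, the Pearson correlation r, the RMSE, and per-compound percent errors. The function name and the four data points are assumptions used only to exercise the code; they are not the values of Table 3, and the exact conventions used internally by the commercial software may differ.

```python
import math


def fit_statistics(y_exp, y_pred):
    """Return R squared, Pearson r, RMSE and per-point percent errors."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    mean_pred = sum(y_pred) / n

    ss_res = sum((y - p) ** 2 for y, p in zip(y_exp, y_pred))
    ss_tot = sum((y - mean_exp) ** 2 for y in y_exp)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((y - mean_exp) * (p - mean_pred) for y, p in zip(y_exp, y_pred))

    return {
        "r_squared": 1 - ss_res / ss_tot,
        "r": cov / math.sqrt(ss_tot * ss_pred),
        "rmse": math.sqrt(ss_res / n),
        # Percent error relative to the experimental value (must be nonzero).
        "percent_error": [abs(y - p) / abs(y) * 100
                          for y, p in zip(y_exp, y_pred)],
    }


# Made-up values for illustration only.
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({k: v for k, v in stats.items() if k != "percent_error"})
print("mean % error:", sum(stats["percent_error"]) / len(stats["percent_error"]))
```

Note that R squared defined this way is not the square of r in general; the two coincide only when the predictions come from an ordinary least-squares fit evaluated on its own training data.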

Backpropagation Neural Network
NeuroShell Predictor software [44] was used to build and train our BPNN. The BPNN framework was formed with one input neuron, five hidden neurons and one output neuron (see Figure 6). The BPNN model showed that in all the analyzed compounds the Hy descriptor is the most important variable in the antiradical activity. The Hy descriptor indicates that antiradical activity increases as hydrophilic groups are incorporated into the coumarin molecules.
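To make the 1-5-1 architecture concrete, here is a minimal back-propagation sketch in plain Python: one input (standing in for Hy), five tanh hidden neurons and one linear output, trained by full-batch gradient descent. It is not the NeuroShell Predictor implementation, whose activation functions and training rule are not described here; the activation choice, learning rate, epoch count, function name and the eight (x, y) pairs are all assumptions, and the data are synthetic.

```python
import math
import random


def train_bpnn(xs, ys, n_hidden=5, epochs=2000, lr=0.05, seed=0):
    """Train a 1-n_hidden-1 network; return (predict, loss_history)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # input -> hidden
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden -> output
    b2 = 0.0
    n = len(xs)
    history = []

    for _ in range(epochs):
        gw1 = [0.0] * n_hidden
        gb1 = [0.0] * n_hidden
        gw2 = [0.0] * n_hidden
        gb2 = 0.0
        sq_err = 0.0
        for x, y in zip(xs, ys):
            # Forward pass.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            out = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            err = out - y
            sq_err += err * err
            # Backward pass: propagate the output error to every weight.
            gb2 += err
            for j in range(n_hidden):
                gw2[j] += err * h[j]
                back = err * w2[j] * (1 - h[j] ** 2)  # tanh derivative
                gw1[j] += back * x
                gb1[j] += back
        history.append(sq_err / n)  # mean squared error before the update
        # Gradient step on half the mean squared error.
        for j in range(n_hidden):
            w1[j] -= lr * gw1[j] / n
            b1[j] -= lr * gb1[j] / n
            w2[j] -= lr * gw2[j] / n
        b2 -= lr * gb2 / n

    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return sum(w2[j] * h[j] for j in range(n_hidden)) + b2

    return predict, history


# Synthetic descriptor/activity pairs (not data from the paper).
xs = [-0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1]
ys = [0.10, 0.15, 0.20, 0.30, 0.45, 0.60, 0.80, 0.90]
predict, history = train_bpnn(xs, ys)
print(f"MSE first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

With only eight compounds and sixteen adjustable parameters, a network of this size can fit the training data closely without generalizing, which is why external validation statistics matter for a model like this.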
The linear correlation between log $Y_{exp}$ and log $Y_{pred}$ antiradical activity of the coumarins was very good. The plot is shown in Figure 7. Here we obtained an R squared of 0.9196, r = 0.959 and RMSE = 0.0850.
Experimental and calculated antiradical activity, error and percent error are shown in Table 5. The highest percent error was 14.29% and the lowest 1.26%. In our opinion the high errors should decrease as the number of molecules is increased. In the BPNN methodology the average percent error was 7.18%, which is 3.41 percentage points higher than that calculated from GA.
The reliability of our QSAR model was determined by calculating the statistical parameters $\overline{r_m^2}$ and $\Delta r_m^2$ proposed by Roy et al. [46,47]. For this mathematical model, $\overline{r_m^2}$ was 0.8687 and $\Delta r_m^2$ was 0.0759. For an acceptable QSAR model the average $r_m^2$ must be >0.5 and $\Delta r_m^2$ < 0.2; in these terms the QSAR model proposed here is good. In contrast, the $\overline{r_m^2}$ and $\Delta r_m^2$ values for our GA model were 0.9014 and 0.056, respectively, but we have to consider that 13 descriptors were used in the GA analysis and only one in the ANN. These results show the importance of combining ANN with the GA methodology. A previous QSAR study [48], performed with Multiple Linear Regression on 15 more complex coumarin derivatives, found that the HOMO, LUMO and partial charges on the OH, N and S were the most important descriptors for the development of the antiradical scavenging activity. Those results agree with ours in that Hy takes into account the functional groups OH, NH2 and SH. In our study we also validated our model with the statistical parameters $\overline{r_m^2}$ and $\Delta r_m^2$ [44,45], which are a rigorous method for QSAR evaluation. It is important to mention that the C1 and C2 compounds show the highest antiradical activities because both possess an -OH hydrophilic group. This functional group increases the Hy value in such a way that we could say that the -OH group is crucial for the antiradical activity of coumarins.

General
All chemicals and solvents were of reagent grade and used as received. Melting points were measured on an Electrothermal IA 9100 apparatus and were uncorrected. IR spectra were recorded neat using a Varian 3100 FT-IR with ATR system Excalibur Series spectrophotometer. Mass spectra were obtained in a Bruker Esquire 6000 spectrometer with an electron ionization mode. 1 H and 13 C-NMR spectra were recorded on a Varian Mercury 300 ( 1 H, 300.08; 13

Antiradical Activity Measurement with the DPPH• Assay
The antiradical activity of compounds A1-D2 was estimated according to a slight modification of the procedure reported by Morales and Jimenez-Perez [27]. Dilutions of the eight compounds in DMSO at 10 mg/mL were prepared. An aliquot of each sample (50 µL) was added to a solution of 1,1-diphenyl-2-picrylhydrazyl (DPPH•) radical (250 µL), prepared fresh daily at a concentration of 74 mg/L in ethanol. The mixtures (200 µL) were placed in a 96-well microplate and the absorbance at time zero was immediately measured at a wavelength of 520 nm. Measurements were performed every 5 min for 60 min. The antiradical activity of the compounds was measured in terms of the decrease in absorbance at 520 nm of the DPPH• ethanolic solution, produced by each compound as a result of its ability to donate a hydrogen, giving the reduced form of DPPH•. 6-Hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid (Trolox) was used as the standard molecule. The antiradical activity of each compound was determined as Trolox equivalent antioxidant capacity (TEAC). The DPPH• solution in the presence of DMSO and in the absence of coumarins was tested and used as a negative control. A null DPPH• free radical scavenging for DMSO was verified. In all experiments, samples were analyzed in triplicate, and mean values ± SD were recorded in order to present the activity of each compound and to evaluate the structure-activity relationships.
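The absorbance readings from such a kinetic run can be reduced with a short script. The sketch below expresses each reading as a percent decrease relative to the time-zero absorbance, a common way of reporting DPPH• discoloration, and summarizes triplicates as mean ± SD. It does not perform the conversion to TEAC, which would require the Trolox calibration data; the function names and all absorbance values are hypothetical.

```python
import statistics


def percent_scavenging(absorbances):
    """Percent decrease in absorbance relative to the first (time-zero)
    reading, for one well followed over time."""
    a0 = absorbances[0]
    if a0 <= 0:
        raise ValueError("time-zero absorbance must be positive")
    return [(a0 - a) / a0 * 100 for a in absorbances]


def mean_and_sd(replicates):
    """Mean and sample standard deviation of replicate values."""
    return statistics.mean(replicates), statistics.stdev(replicates)


# Hypothetical 520 nm readings for one well at 0, 30 and 60 min.
curve = percent_scavenging([0.80, 0.60, 0.40])
print([round(v, 1) for v in curve])

# Hypothetical 60 min percent-scavenging values from a triplicate.
mean, sd = mean_and_sd([49.0, 50.0, 51.0])
print(f"{mean:.1f} ± {sd:.1f} %")
```

In practice the same reduction would be applied to the DMSO negative control, so that any drift in the DPPH• solution itself can be checked before comparing compounds.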

Conclusions
In the GA analysis we obtained an average percent error of 3.77%, while in the BPNN the average percent error was 7.18%. This result indicates that the combination of the two methodologies optimizes the creation of QSAR models. The GA allows finding the most important descriptor for the development of the antiradical activity, and the ANN improves our model by using only one molecular descriptor to obtain accurate prediction values. The presence of hydroxyl groups on the ring structure of 3-carboxycoumarins is correlated with their DPPH• radical scavenging effects. The mixed QSAR model showed that, according to Hy, antiradical activity would increase as hydroxyl groups are incorporated into the coumarin molecules. According to the $\overline{r_m^2}$ and $\Delta r_m^2$ values obtained for the ANN, the mathematical model proposed in this work has good predictive ability.