QSAR Study and Molecular Design of Open-Chain Enaminones as Anticonvulsant Agents

The present work employs the QSAR formalism to predict the ED50 anticonvulsant activity of ringed enaminones, and then applies these relationships to predict the activity of untested open-chain compounds containing the same types of functional groups in their molecular structure. Two different modeling approaches are applied in order to compare the consistency of our results: (a) the search of molecular descriptors via multivariable linear regressions; and (b) the calculation of flexible descriptors with the CORAL (CORrelation And Logic) program. Among the results found, we propose some potent candidate open-chain enaminones having predicted ED50 values lower than 10 mg·kg−1 for corresponding pharmacological studies. These compounds are classified as Class 1 and Class 2 according to the Anticonvulsant Selection Project.

Biologically active enaminones may be classified into two types, according to the layout of the functional group [13][14][15]: (a) open-chain enaminones (OCEs), where the characteristic group is part of a chain (and therefore has the flexibility to adopt different conformers); and (b) ringed enaminones (REs), where the characteristic group is part of a ring and the enaminone group is not flexible. In recent years, a group of REs has been reported as anticonvulsant. The mechanism of action of these biomolecules would be similar to that of many classic antiepileptics and second-generation drugs: they act on ion channels by blocking the passage of ions through them [2][3][4][5][6][7][8][9][10]. Among the bioactive REs are DM5 (methyl 4-(4-chlorophenylamino), 6-methyl,2-oxocyclohex-3-ene carboxylate) (Figure 1a) and ON2 (ethyl 6-methyl,4-(5-methylisoxazol-3-ylamino), 2-oxocyclohex-3-ene carboxylate) (Figure 1b) [6,7]. Another family of biologically active enaminones is derived from benzylamine enaminones (Figure 1c) [9]. These have anticonvulsant activity similar to that of DM5 (an aniline enaminone derivative) and ON2 (an isoxazole enaminone derivative). The distance between the carbonyl oxygen and the aromatic ring is of great importance during the binding of the molecule to the sodium channel [16]. The conformation that an RE adopts influences this distance and may therefore result in different activities [2][3][4][5][6][7][8][9]. In a previous study, we performed a QSAR analysis of the activity of various REs in the active conformation [17].
A comparison between both enaminone families shows the similarity of the molecular structures and functional groups involved in the binding to the sodium channel, as evidenced by the different pharmacophore models reported in the literature [16][18][19][20] (Figure 2). In this way, an OCE could bind to the receptor in a similar way as the REs do. Moreover, for the OCE, the flexible open chain and a greater ability to cross biological membranes would allow a more precise fit to its site of action. Accordingly, it is feasible to formulate the following question: could an open-chain enaminone have anticonvulsant activity, as is the case for ringed enaminones? Several techniques have been developed to elucidate relationships between structure and biological activity, such as SAR, QSAR [21] and S-SAR [22][23][24]. The main objective of this work is to study a molecular set of OCEs and predict their antiepileptic activity using the QSAR methodology, which would allow us to provide some guidelines on the anticonvulsant properties of this class of molecules.

Experimental Data
The experimental antiepileptic activities of the molecular structures are taken from various recent publications, and were obtained by previously reported methods [4][5][6][7][8][9][10]. Due to the scarcity of experimental information and the need for sufficient data to build QSAR models, it is necessary to collect data from different authors [4][5][6][7][8][9][10]. However, we take care that the activity parameter (ED50), which represents the dose at which 50% of individuals reach the desired effect, is obtained with the same assay in all cases. It is determined in the "Anticonvulsant Selection Project" (ASP) by the experimental method "Maximal electroshock seizure" (MES) [2,7,8,25]. For modeling purposes, we use log10 ED50 to get a more standardized property.

Geometry Optimization and Molecular Descriptors Calculation
The structures of all the examined compounds are optimized with the semiempirical method PM3 (Parametric Method-3) included in the HyperChem 6.03 software [26]. By means of the software Dragon [27], we calculate a set of 1307 molecular descriptors [28], which includes: 0D: constitutional descriptors; 1D: functional groups, empirical descriptors, atom-centred fragments; 2D: topological descriptors, molecular walk counts, Galvez charge indices, BCUT descriptors; 3D: charge descriptors, aromaticity indices, Randic molecular profiles, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors and GETAWAY descriptors. In addition, 5 descriptors obtained from the semiempirical calculation are added (molecular dipole moment, energy of the HOMO and LUMO and HOMO-LUMO gap). Therefore, the set of descriptors contains D = 1312 variables.

[Figure 2 panel labels: Aromatic site; Electrostatic site H-binding; Ringed Enaminone; Open-chain Enaminone]

Model Development
The QSAR models established in this work are obtained via two different modeling approaches, in order to compare the consistency of our results: (a) the search of molecular descriptors via multivariable linear regressions; and (b) the calculation of flexible descriptors with the CORAL (CORrelation And Logic) program.

Linear Descriptors Search
In the search for the best model we use Matlab 7.0 [29]. Our aim is to find, from the set of D descriptors, a subset of d descriptors (d << D) with the minimum standard deviation (S), for which we use the Replacement Method (RM) [30][31][32]. The standard deviation is defined as follows:

$$S = \sqrt{\frac{1}{N-d-1}\sum_{i=1}^{N} res_i^{2}}$$

where N is the number of molecules in the calibration set (the molecular set used for calibration of the model) and res_i is the residual of molecule i (the difference between the experimental and predicted property of i).
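The descriptor search can be illustrated with a short Python sketch. This is not the authors' Matlab implementation: it assumes that S takes the usual regression form with N − d − 1 degrees of freedom, it uses a simplified replacement-style sweep (the published Replacement Method [30][31][32] includes details not reproduced here), and it runs on synthetic data. The names `fit_linear`, `std_dev` and `replacement_search` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef  # coefficients and residuals res_i

def std_dev(X, y):
    """Assumed form: S = sqrt(sum(res_i^2) / (N - d - 1))."""
    n, d = X.shape
    _, res = fit_linear(X, y)
    return float(np.sqrt(np.sum(res ** 2) / (n - d - 1)))

def replacement_search(X, y, d, seed=0, max_sweeps=20):
    """Simplified replacement-style search for d columns of X with minimal S."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    subset = list(rng.choice(n_desc, size=d, replace=False))  # random start
    best = std_dev(X[:, subset], y)
    for _ in range(max_sweeps):
        improved = False
        for pos in range(d):              # position in the subset to replace
            for cand in range(n_desc):    # try every descriptor not yet used
                if cand in subset:
                    continue
                trial = subset.copy()
                trial[pos] = cand
                s = std_dev(X[:, trial], y)
                if s < best - 1e-12:      # keep the replacement only if S drops
                    best, subset, improved = s, trial, True
        if not improved:                  # a full sweep without gain: stop
            break
    return sorted(int(j) for j in subset), best

# Synthetic demonstration data (not from the paper): y depends on columns 2 and 7.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 12))
y_demo = 1.5 * X_demo[:, 2] - 2.0 * X_demo[:, 7] + 0.05 * rng.normal(size=40)
subset_demo, s_demo = replacement_search(X_demo, y_demo, d=2)
print(subset_demo, round(s_demo, 3))
```

With D = 1312 descriptors an exhaustive search over all subsets of size d is impractical, which is why a stepwise replacement strategy of this kind is attractive; like any local search, it may depend on the starting subset.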
QSAR theory searches for the best predictions of the activity, but as a rule of practice the models should be simple, interpretable, and have one descriptor per six or seven molecules in order to achieve satisfactory results [33]. Accordingly, the maximum number of descriptors (d_nm) to be included in the linear regression equation is calculated from this ratio. In addition, the Kubinyi function FIT [34,35] is used to get the optimum number of descriptors (d_opt) of each linear regression established. The FIT criterion is a very effective method for obtaining the optimal number of descriptors of a particular model [32][33][34].
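As an illustration of how a FIT-type criterion selects d_opt, the sketch below assumes the commonly quoted form of Kubinyi's function, FIT = R²(N − d − 1) / [(N + d²)(1 − R²)], which is not written out in the text above. The R² values are hypothetical and serve only to show the behaviour; the function names are likewise illustrative.

```python
def kubinyi_fit(r2, n, d):
    """Assumed FIT form: R^2 (N - d - 1) / ((N + d^2) (1 - R^2))."""
    return r2 * (n - d - 1) / ((n + d ** 2) * (1.0 - r2))

def optimal_descriptor_count(r2_by_d, n):
    """Return the number of descriptors d whose model has the largest FIT."""
    return max(r2_by_d, key=lambda d: kubinyi_fit(r2_by_d[d], n, d))

# Hypothetical R^2 values for the best models with d = 1..6 descriptors.
r2_demo = {1: 0.40, 2: 0.55, 3: 0.66, 4: 0.72, 5: 0.75, 6: 0.76}
d_opt_demo = optimal_descriptor_count(r2_demo, n=46)

for d, r2 in r2_demo.items():
    print(d, round(kubinyi_fit(r2, 46, d), 3))
print("d_opt =", d_opt_demo)
```

Because the denominator grows with d², FIT rises while additional descriptors still improve R² appreciably and falls once the gain becomes marginal, so the maximum marks a compromise between fit quality and model size.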

Calculation of Flexible Descriptors
CHEMPREDICT/CORAL (CORrelation And Logic) version 1.4 [36] is freeware for Windows. Each molecular structure must be represented in SMILES (Simplified Molecular Input Line Entry System) notation, generated with the ACD/ChemSketch software [37]. The CORAL approach is based on the presence of certain SMILES attributes in the molecule, which can be associated with the activity of the molecule under evaluation [38][39][40][41]. The SMILES attributes used are the symbols representing chemical elements, cycles, branching of the molecular skeleton, charges, etc. More specific details on the CORAL algorithm can be found in the recent literature [38][39][40][41].
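The idea of a SMILES-based flexible descriptor can be sketched as follows. This is a simplified illustration and not the CORAL program itself: it assumes the descriptor is a plain sum of correlation weights over one-symbol SMILES attributes, uses a hypothetical weight table (in CORAL the weights are optimized by Monte Carlo), and takes an example molecule that does not belong to the data set of this work.

```python
import re

# SMILES attributes: bracket atoms, 'Cl' and 'Br' are kept whole;
# every other character (atoms, bonds, branches, ring digits) is one attribute.
_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|.")

def smiles_attributes(smiles):
    """Split a SMILES string into its one-symbol attributes."""
    return _TOKEN.findall(smiles)

def dcw(smiles, weights, default=0.0):
    """Sum of correlation weights over all attributes of the SMILES string."""
    return sum(weights.get(tok, default) for tok in smiles_attributes(smiles))

# Hypothetical correlation weights, for illustration only.
weights_demo = {"C": 0.5, "c": 0.25, "O": -1.0, "N": 0.75, "Cl": 1.5,
                "=": 0.25, "(": 0.0, ")": 0.0, "1": 0.125}

smiles_demo = "CC(=O)Nc1ccc(Cl)cc1"  # illustrative molecule, not from the paper
tokens_demo = smiles_attributes(smiles_demo)
dcw_demo = dcw(smiles_demo, weights_demo)
print(tokens_demo)
print(dcw_demo)
```

Once such a descriptor is available, the activity is modeled by a one-variable linear regression on it, which is why the quality of the model depends entirely on how the weights are optimized.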

Model Validation
The next step of the analysis is to validate (verify the predictive capability of) the QSAR relationships established on a calibration set of chemical structures. These must be predictive and able to perform equally well on new structures (test set) that do not participate in the training of the model. We choose the well-known leave-one-out (loo) and leave-more-out (l-%-o) cross-validation procedures, where % represents the percentage of molecules removed from the calibration set. For l-%-o, we generate 1,000,000 cases of random molecule removal, with % = 10 (five compounds). The standard deviations S_test and S_l-%-o are calculated in this step.
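Both cross-validation schemes can be sketched in Python as below. This is a minimal illustration on synthetic data, under two stated assumptions: the cross-validated deviation is taken as the root mean square of the out-of-sample residuals (the exact normalization used in this work is not given above), and only 1000 random removals are generated instead of 1,000,000 to keep the example fast. All function names are hypothetical.

```python
import numpy as np

def fit_predict(X_train, y_train, X_new):
    """Fit a linear model with intercept on the training data, predict X_new."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def rms(res):
    """Root mean square of a collection of residuals."""
    return float(np.sqrt(np.mean(np.square(res))))

def leave_one_out(X, y):
    """Remove each molecule in turn, refit, and predict the removed one."""
    res = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        res[i] = y[i] - fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    return rms(res)

def leave_many_out(X, y, fraction=0.10, n_cases=1000, seed=0):
    """Repeatedly remove a random fraction of molecules, refit, and predict them."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(1, round(fraction * n))  # 10% of 51 molecules -> 5
    res = []
    for _ in range(n_cases):
        out = rng.choice(n, size=n_out, replace=False)
        keep = np.ones(n, dtype=bool)
        keep[out] = False
        res.extend(y[out] - fit_predict(X[keep], y[keep], X[out]))
    return rms(res)

# Synthetic demonstration data (not from the paper): 51 molecules, 3 descriptors.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(51, 3))
y_demo = X_demo @ np.array([0.8, -0.5, 0.3]) + 1.0 + 0.1 * rng.normal(size=51)
s_loo_demo = leave_one_out(X_demo, y_demo)
s_lmo_demo = leave_many_out(X_demo, y_demo, fraction=0.10, n_cases=1000)
print(round(s_loo_demo, 3), round(s_lmo_demo, 3))
```

A cross-validated deviation that stays close to the calibration value suggests that the model does not depend strongly on any small group of molecules.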

QSAR on Ringed-Enaminones
In a previous work we developed a mathematical model for the prediction of ED50 in REs [17]. This model contains five molecular descriptors and involves a calibration set of 46 compounds. For this model (Equation 3), validation is performed with a set of five molecules, leading to S_test = 0.232 and R_test = 0.835. For the second model (Equation 4), the calibration is established with 51 compounds, including all compounds belonging to Equation 3. Thus, Equation 4 contains more biochemical information and its predictive power may be higher. This last model is also applied to the same calibration and test sets of Equation 3.
The highest intercorrelation coefficient for the five descriptors of Equation 3 is 0.733. This is because the BELe6 and BELp8 descriptors belong to the same BCUT family. In general, QSAR models accept intercorrelations up to 0.98, but an orthogonalization process can be used to give a better analysis when necessary [42,43]. Equation 4 has low intercorrelations between descriptors, the highest value being 0.561. Only the descriptor R4e+ (R maximal autocorrelation of lag 4/weighted by atomic Sanderson electronegativities) appears in both equations, and it has low intercorrelations with the remaining ones.
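The intercorrelation check can be reproduced with a few lines of Python. The sketch below computes the Pearson correlation coefficient for every pair of descriptor columns and reports the largest absolute value; the descriptor names `d1`, `d2`, `d3` and their values are hypothetical placeholders, not data from this work.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def max_intercorrelation(columns):
    """columns: dict name -> values. Returns (largest |r|, descriptor pair)."""
    return max((abs(pearson(columns[p], columns[q])), (p, q))
               for p, q in combinations(columns, 2))

# Hypothetical descriptor values for five molecules.
columns_demo = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 11.0],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
r_max_demo, pair_demo = max_intercorrelation(columns_demo)
print(pair_demo, round(r_max_demo, 3))
print("above the 0.98 limit:", r_max_demo > 0.98)
```

In this toy case the pair `d1`/`d2` exceeds the 0.98 limit quoted above, so one of the two descriptors would have to be replaced or the set orthogonalized.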
Table 2 lists the compounds of both models, together with the experimental and predicted ED50 values. Figure 3 shows the plot of experimental versus predicted log10 ED50 for the calibration and validation sets. From this figure it can be noted that the two enaminones of the validation set, 47 and 51, are very well predicted. Dispersion plots of the residuals for the calibration and test sets are provided in the supplementary material. These plots reveal that the residuals are randomly distributed with respect to the predictions, in accordance with the assumption involved in linear regression analysis. No molecule in the set exhibits a residual larger than the value of S.
We next examine whether the statistical performance of Equations 3 and 4 can be improved by using models established via flexible descriptor definitions calculated with the CORAL program. We run a Monte Carlo simulation to obtain the DCW 3 descriptor, achieving the QSAR model of Equation 5. Figure 4 plots the predicted activities as a function of the experimental data. The predictions achieved by Equation 5 are included in Table 2. It is readily appreciated from the statistical parameters of calibration and leave-one-out validation that the quality of Equations 3 and 4 outperforms that of Equation 5. However, we decide to include Equation 5 in order to compare the predictions.
Another crucial problem to consider is the definition of the Applicability Domain (AD) of a QSAR model [44][45][46]. In other words, not even a robust, significant, and validated QSAR model can be expected to reliably predict the modeled property for the entire universe of molecules. In fact, only the predictions for molecules falling within this AD can be considered reliable and not just model extrapolations. The AD is a theoretical region in chemical space, and depends upon the set of chemical structures and the experimental property analyzed; hence the AD is different for each QSAR model established. We define the AD for each QSAR in terms of the ranges of variation of the numerical values of its descriptors: a molecular structure would be, in principle, reliably predicted if its numerical descriptor values fall within such ranges. Thus, for Equation
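A range-based applicability domain of this kind is simple to check programmatically. The following Python sketch is one possible reading of the definition: it stores the minimum and maximum of each descriptor over the calibration set and flags the descriptors of a new molecule that fall outside those limits. The descriptor names `x1`, `x2` and all numerical values are hypothetical.

```python
def descriptor_ranges(calibration):
    """calibration: list of dicts (descriptor name -> value), one per molecule.
    Returns a dict name -> (min, max) over the calibration set."""
    names = calibration[0].keys()
    return {k: (min(m[k] for m in calibration),
                max(m[k] for m in calibration)) for k in names}

def outside_domain(molecule, ranges):
    """Names of the descriptors whose value lies outside the calibration range."""
    return [k for k, (lo, hi) in ranges.items() if not lo <= molecule[k] <= hi]

# Hypothetical calibration set with two descriptors.
calibration_demo = [
    {"x1": 0.2, "x2": 3.1},
    {"x1": 0.9, "x2": 2.4},
    {"x1": 0.5, "x2": 4.0},
]
ranges_demo = descriptor_ranges(calibration_demo)

inside_demo = outside_domain({"x1": 0.6, "x2": 3.0}, ranges_demo)
outside_demo = outside_domain({"x1": 1.4, "x2": 3.0}, ranges_demo)
print(ranges_demo)
print(inside_demo, outside_demo)
```

An empty list means that the molecule lies inside the domain for every descriptor of the model; any listed descriptor marks the prediction as an extrapolation.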

QSAR on Open-Chain Enaminones
The selected OCEs are structurally related to the REs used in the calibration and validation sets. For this selection, an analysis of molecular modulation is carried out, based on an active molecule. Thus, the molecules 1A, 1B, 1C and 1D are obtained from molecules 3, 51, 43 and 41 (Figure 5). This figure shows the conformers of the OCEs. Molecules 3 and 51 belong to the family of aniline derivatives, 43 to the family of benzylamine derivatives and 41 to the family of isoxazole derivatives. The structural similarity between the molecules used in the models and the OCEs suggests that the models developed in this work would serve to predict the ED50 of these molecules. Since no experimental values are available, one way to check the predictions is to verify that Equations 3 and 4 do not lead to contradictory results (markedly different predictions for the same molecule). As shown in Table 3, the predictions are similar for both models. Both equations predict that 1B is the most active, while the enaminone with the lowest activity is 3A. We therefore argue that the predictions obtained are not random, and that the predicted values of ED50 obtained with both models should be close to the experimental observations.

Conclusions
A linear QSAR model is developed to predict ED50 in REs and applied to the prediction of OCEs. In addition, an alternative linear model based on the flexible descriptor definition is obtained for the same purpose. The developed models allow the prediction of the antiepileptic activities of 16 OCEs. These compounds are presented as candidate structures for corresponding pharmacological studies. The 16 enaminones would be classified as Class 1 and Class 2 according to the ASP. Several of the ED50 values predicted here are less than 10 mg·kg−1. Accordingly, conformational flexibility in OCEs is a crucial factor to be considered during the study of antiepileptic activity.

Figure 2. Pharmacophore models reported in the literature and structures of ringed and open-chain enaminones.

Figure 4. Experimental and predicted log10 ED50 plot using the flexible descriptors model: ○ calibration set; • test set.

Figure 5. Structure of the 16 conformers of open-chain enaminones. Scheme for the selection of the compounds.

N = 51; p < 10^−4; R_cal = 0.864; S_cal = 0.209; R_test = 0.947; S_test = 0.204; R_loo = 0.847; S
In Equation 3, BELe6 and BELp8 are BCUT descriptors, RDF025v is a Radial Distribution Function descriptor, Mor15e is a 3D-MoRSE descriptor and R4e+ is a 3D GETAWAY descriptor. The structural variables appearing in Equation 4 combine multidimensional aspects of the molecular structure and are classified as follows: Radial Distribution Function descriptors (RDF025m and RDF115m), geometrical (G(O..Cl)), GETAWAY (R4e+) and HOMO-LUMO energy gap (Homo-Lumo). A brief explanation of the descriptors participating in both equations is provided in Table 1.

Table 1. Symbols and description for molecular descriptors involved in QSAR.

Table 2. Experimental and predicted log10 ED50 antiepileptic activity values of the compounds of the calibration and test sets.
* Molecules of test set.