Some QSAR studies on a group of sulfonamide Schiff bases as carbonic anhydrase CA II inhibitors

In the present work, a quantitative structure-activity relationship (QSAR) study of a group of sulfonamide Schiff-base inhibitors of the carbonic anhydrase (CA) enzyme has been carried out using the Codessa Pro methodology and software. Linear regression QSAR models of the biological activity (Ki) of 38 inhibitors of the CA II isozyme were established with 12 different molecular descriptors, which were selected from hundreds of geometrical, topological, quantum-mechanical, and electronic descriptors calculated with Codessa Pro. Among the models presented in this study, the statistically most significant one is a five-parameter equation with a squared correlation coefficient R² of ca. 0.840 and a cross-validated correlation coefficient R²_CV of ca. 0.777. The models reveal several physicochemical and structural factors that are strongly correlated with the biological activity of the compounds.


Introduction
The metalloprotein carbonic anhydrase (CA, EC 4.2.1.1) is one of the most widespread biological catalysts across the phylogenetic tree. In humans, isozymes I, II, and IV are involved in respiration and in the regulation of acid/base homeostasis. These complex processes include the transport of CO2/bicarbonate between metabolizing tissues and excretion sites (lungs, kidneys), the facilitation of CO2 elimination in capillaries and the pulmonary microvasculature, the elimination of H+ ions in the renal tubules and collecting ducts, and the reabsorption of bicarbonate in the brush border and the thick ascending limb of the loop of Henle in the kidneys. By producing the bicarbonate-rich aqueous humor secretion within the eye (mediated by isozymes CA I, II, IV, and XII of the ciliary processes), CAs are involved in vision, and their malfunction leads to high intraocular pressure and glaucoma. CA II is also involved in bone development, in functions such as the differentiation of osteoclasts and the provision of acid for bone resorption in osteoclasts [1][2][3][4]. The presence of these isozymes in so many tissues and in a number of different isoforms makes them attractive targets for the design of inhibitors with biomedical applications.
In this study, we investigated the QSAR of 38 sulfanilamide Schiff-base inhibitors of the physiologically relevant isozyme CA II using the Codessa Pro approach [29]. To the best of our knowledge, all previous QSAR studies of sulfonamides bearing a Schiff base have been performed on the same data set. Most of the molecules in our set have been newly synthesized and have not yet been included in any QSAR study. The results of this study may help estimate the inhibitory activity of sulfonamide Schiff bases of this series prior to synthesis.

Computational details
For all the molecules studied, 3-D modeling and the calculation of quantum mechanical descriptors were performed with the Gaussian 03 quantum chemistry package [30]. To save computational time, initial geometry optimizations were carried out with the molecular mechanics (MM) method using the MM+ force field. The lowest-energy conformations obtained by the MM method were further optimized by the DFT method [31], employing Becke's three-parameter hybrid functional (B3LYP) [32] and the 6-31G(d) basis set. Vibrational frequencies were also calculated at the same level of theory to confirm that the optimized structures were true minima. All computations were carried out for the ground states of the molecules as singlet states. Note that, although we used a high level of theory (DFT/B3LYP) to obtain more precise optimized 3-D geometries and quantum mechanical descriptors, no geometric, quantum mechanical, or thermodynamic descriptors appear in the final models presented in the following section.

Codessa Pro was used for the statistical analysis. This program applies diverse statistical structure-property/activity correlation techniques to experimental data in combination with the calculated molecular descriptors. The heuristic method (HM) [33] implemented in Codessa Pro was employed to select the 'best' regression model. HM can either quickly give a good estimate of the quality of correlation to expect from the data or derive several best regression models. It also shows which descriptors have bad or missing values, which are insignificant, and which are highly inter-correlated. This information helps reduce the number of descriptors involved in the search for the best QSAR model. HM performs the pre-selection of descriptors as follows.
All descriptors are checked to ensure that (a) a value of each descriptor is available for every structure and (b) these values vary across the data set. Descriptors whose values are not available for every structure are discarded, as are descriptors with a constant value for all structures, and a printout listing the discarded descriptors is provided. Thereafter, the one-parameter correlation equation for each descriptor is calculated. To further reduce the number of descriptors in the “starting set”, a descriptor is eliminated if: (a) the F-test value of its one-parameter correlation is below 1.0; (b) the squared correlation coefficient of the one-parameter equation is less than R²_min (0.01 by default); (c) the t-value of the parameter is less than t₁ (1.5 by default), where R²_min and t₁ are user-defined values; or (d) the descriptor is highly inter-correlated with another descriptor (above r_full, a user-specified value that is 0.80 by default). The remaining descriptors are then listed in decreasing order of the correlation coefficient of the corresponding one-parameter equation. All two-parameter regression models with the remaining descriptors are developed and ranked by the squared correlation coefficient R². A stepwise addition of further descriptors is then performed to find the best multi-parameter regression models with optimal values of the statistical criteria (highest values of R², of the cross-validated R²_CV, and of F). In addition to the descriptors calculated with Gaussian 03 and Codessa Pro, we added two physicochemical properties of the compounds to the descriptor pool, namely the logarithm of the 1-octanol/water partition coefficient (logP) and the logarithm of the aqueous solubility (logS). These two properties were calculated with the ALOGPS 2.1 web tool [34].
This web tool calculates logP and logS with six different programs and algorithms, including ALOGPS 2.1, and reports the result of each program as well as the average over all six. We used the average values of logP and logS from the six programs in the regression procedures.
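The pre-selection criteria described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not Codessa Pro's code: the function names (`one_param_stats`, `pearson_r`, `preselect`) are hypothetical, the default thresholds (R²_min = 0.01, t₁ = 1.5, r_full = 0.80) are those quoted in the text, and keeping the better-ranked member of an inter-correlated pair is our assumption.

```python
import math
from statistics import mean


def one_param_stats(x, y):
    """Least-squares fit y = a + b*x; return (R^2, F, |t| of the slope)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    ss_res = max(syy * (1.0 - r2), 0.0)
    if ss_res == 0.0:                      # perfect fit: F and t are unbounded
        return r2, math.inf, math.inf
    f = r2 / (1.0 - r2) * (n - 2)
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    return r2, f, abs(sxy / sxx) / se_slope


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def preselect(descriptors, y, r2_min=0.01, t_min=1.5, r_full=0.80):
    """Screen a descriptor pool in the spirit of the HM pre-selection criteria.

    descriptors: dict mapping descriptor name -> list of values (None = missing).
    Returns the surviving names ranked by one-parameter R^2 (best first).
    """
    candidates = []
    for name, x in descriptors.items():
        if any(v is None for v in x):      # value missing for some structure
            continue
        if max(x) == min(x):               # constant over the whole data set
            continue
        r2, f, t = one_param_stats(x, y)
        if f < 1.0 or r2 < r2_min or t < t_min:   # criteria (a)-(c)
            continue
        candidates.append((r2, name, x))
    candidates.sort(key=lambda c: c[0], reverse=True)

    selected, kept_values = [], []
    for r2, name, x in candidates:
        # criterion (d): drop if |r| > r_full with an already kept (better-ranked) descriptor
        if all(abs(pearson_r(x, other)) <= r_full for other in kept_values):
            selected.append(name)
            kept_values.append(x)
    return selected


# Hypothetical toy data set of six compounds.
y = [1, 2, 3, 4, 5, 6]
pool = {
    "a": [1, 2, 3, 4, 5, 6.5],
    "a_scaled": [2, 4, 6, 8, 10, 13],      # exactly 2 * a
    "const": [3, 3, 3, 3, 3, 3],
    "missing": [1, None, 3, 4, 5, 6],
    "noise": [1, -1, 1, -1, 1, -1],
}
print(preselect(pool, y))
```

With this toy pool, the constant, incomplete, and noise descriptors are discarded, and only one of the two perfectly correlated copies of `a` survives.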

Results
The structures of the 38 Schiff-base sulfonamide compounds are shown in Figure 1. The experimental inhibitory activities of the compounds against the CA II isozyme were taken from three references [35][36][37]. Table 1 lists (i) the values of the molecular descriptors involved in the models, (ii) the experimental Ki (nM) values taken from the original references, and (iii) the Ki (nM) values calculated with the best model obtained in this study (model 5). The plot of observed versus calculated Ki for CA II using model 5 is shown in Figure 2. The inter-correlation of the descriptors is shown in Table 3.
Using the HM, several regression equations were obtained. Among them, five equations (the best one-, two-, three-, four-, and five-parameter equations) were selected as models and are given in Table 2. In these models, the squared correlation coefficient, R², measures the fit of the regression equation. F, the Fisher test value, reflects the ratio of the variance explained by the model to the variance due to the error in the model; higher F values indicate a more significant equation. s² is the squared standard error (residual variance) of the regression. R²_CV, the leave-one-out (LOO) cross-validated coefficient, is a practical and reliable measure of the predictive performance and stability of a regression model. The LOO approach involves developing a series of models, each with one compound omitted. After each model is developed, the activity of the omitted compound is predicted and the difference between its experimental and predicted values is calculated. R²_CV is then calculated according to the following formula [38]:

$$R^2_{CV} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

where y_i is the experimental activity of compound i, ȳ is the average experimental activity, and ŷ_i is the activity of compound i predicted by the regression equation obtained after leaving out that compound.
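A minimal sketch of the LOO procedure and the R²_CV formula, assuming an ordinary least-squares multilinear model. The helper names (`fit_ols`, `predict`, `r2_cv_loo`) are hypothetical, and the small normal-equation solver merely stands in for whatever solver Codessa Pro uses internally.

```python
from statistics import mean


def fit_ols(X, y):
    """Return [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp (least squares).

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of descriptors.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k, n = len(A[0]), len(A)
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def predict(coef, row):
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


def r2_cv_loo(X, y):
    """Leave-one-out R^2_CV = 1 - PRESS / sum((y_i - y_mean)^2)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # model without compound i
        press += (y[i] - predict(coef, X[i])) ** 2
    y_mean = mean(y)
    return 1.0 - press / sum((yi - y_mean) ** 2 for yi in y)


# Hypothetical data lying exactly on y = 1 + 2*x1 - x2.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y = [1 + 2 * a - b for a, b in X]
print(fit_ols(X, y), r2_cv_loo(X, y))
```

Because the toy data lie exactly on a plane, every left-out compound is predicted exactly and R²_CV equals 1 up to rounding; with noisy data R²_CV falls below the fitted R², as with the 0.777 versus 0.840 reported for the five-parameter model.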
In the present work, more than two hundred descriptors were computed. In Codessa Pro, descriptors are divided into groups such as constitutional, topological, geometrical, electrostatic, quantum chemical, thermodynamic, and constructed descriptors. Constitutional descriptors are related to the numbers of atoms and bonds in each molecule. Topological descriptors include valence and non-valence molecular connectivity indices calculated from the hydrogen-suppressed formula of the molecule, encoding information about its size, composition, and degree of branching. Geometrical descriptors are calculated from the 3-D atomic coordinates of the molecule and comprise moments of inertia, shadow indices, molecular volumes, molecular surface areas, and gravitation indices. Electrostatic descriptors reflect characteristics of the charge distribution of the molecule. Quantum chemical descriptors encode polar intermolecular interactions, chemical reactivity, and the activation energy of the corresponding chemical reactions. Thermodynamic descriptors are calculated quantum mechanically from the total partition function of the molecule, Q, and its electronic, translational, rotational, and vibrational components. Codessa Pro also allows new descriptors to be constructed from existing ones; in this way, we constructed some common quantum chemical indices, namely chemical hardness, electronegativity, and electrophilicity, from the HOMO and LUMO energies.

The results shown in Table 2 were therefore quite surprising, because no quantum chemical index appears in any of our models. In our previous studies [13][14], QSAR models were built from the quantum mechanical descriptors of a group of diverse aromatic and heterocyclic sulfonamides and the inhibitory activity of these compounds against the CA II isozyme.
For comparison, we tried to correlate the inhibitory activity (CA II Ki) of the Schiff-base sulfonamides in this study with the same quantum mechanical descriptors that appeared in the QSAR models of our previous work. The correlation was very poor (R < 0.1). This result indicates that the inhibition mechanism of Schiff-base sulfonamides differs from that of the aromatic and heterocyclic sulfonamides.
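The text does not give the formulas used for the constructed hardness, electronegativity, and electrophilicity indices. A common frontier-orbital (Koopmans-type) convention is sketched below as an assumption; some authors define hardness without the factor 1/2, and the orbital energies in the example are hypothetical.

```python
def reactivity_indices(e_homo, e_lumo):
    """Frontier-orbital estimates of conceptual-DFT indices (energies in eV).

    Convention assumed here:
      hardness          eta   = (E_LUMO - E_HOMO) / 2
      electronegativity chi   = -(E_HOMO + E_LUMO) / 2
      electrophilicity  omega = chi**2 / (2 * eta)
    """
    if e_lumo <= e_homo:
        raise ValueError("LUMO must lie above HOMO")
    eta = (e_lumo - e_homo) / 2.0
    chi = -(e_homo + e_lumo) / 2.0
    omega = chi ** 2 / (2.0 * eta)
    return eta, chi, omega


# Hypothetical orbital energies, not values from this study.
print(reactivity_indices(-6.0, -2.0))  # (2.0, 4.0, 4.0)
```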
The preliminary regression analysis showed that two compounds behaved unusually in all the models. When the heuristic method was run with default settings on all 38 compounds, the best one-, two-, three-, four-, and five-parameter equations were obtained as the program output. In all five equations, compounds 29 and 38 had the largest standardized residuals (almost twice the mean residual). After these two compounds were treated as outliers, the statistical quality of the one- to five-parameter equations increased dramatically; for the five-parameter equation, for example, R² rose from 0.71 to 0.84, F from 15.96 to 31.54, and s² fell from 0.061 to 0.034. It is worth mentioning that the descriptors in the best equations obtained for the 38-compound set and the 36-compound set are not the same. The best one- to five-parameter equations obtained from the 36 compounds are presented as models below.
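The residual-based screening can be illustrated with a one-descriptor sketch on hypothetical data: fit the model, rank compounds by the magnitude of their standardized residuals (residual divided by the regression standard error s), and compare R² before and after removing the top-ranked compound. The ranking rule and the helper names are illustrative assumptions; the text does not state the exact criterion Codessa Pro applies.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def r_squared(x, y):
    a, b = fit_line(x, y)
    my = mean(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / sum((yi - my) ** 2 for yi in y)


def rank_by_standardized_residual(x, y):
    """Indices of compounds ordered by |residual / s|, largest first."""
    a, b = fit_line(x, y)
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in res) / (len(res) - 2))
    return sorted(range(len(res)), key=lambda i: abs(res[i] / s), reverse=True)


# Hypothetical data: an exact line y = 2x except for one deviating compound (index 5).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 10, 17, 14, 16]
worst = rank_by_standardized_residual(x, y)[0]
x_kept = [v for i, v in enumerate(x) if i != worst]
y_kept = [v for i, v in enumerate(y) if i != worst]
print(worst, round(r_squared(x, y), 3), round(r_squared(x_kept, y_kept), 3))
```

In the toy data, index 5 is ranked first, and removing it restores a perfect fit.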
Table 2 shows that twelve descriptors in total are involved across the five models. HM yielded the following best one-parameter regression equation:
$$\log K_i(\mathrm{CA\,II}) = 0.180 + 32.767\,\mathrm{RNBR} \qquad (1)$$

N = 36; R² = 0.527; R²_CV = 0.488; F = 37.98; s² = 0.090

Here and below, N is the number of compounds, R² is the squared correlation coefficient, R²_CV is the leave-one-out (LOO) cross-validated coefficient, F is the Fisher statistic, and s² is the squared standard error of the regression equation. In this model, as in the models presented below, compounds 29 and 38 are treated as outliers. RNBR is the relative number of benzene rings, i.e., the number of benzene rings divided by the total number of atoms in the molecule. In equation (1), RNBR has a positive coefficient, which means that an increase in RNBR favors the inhibitory activity (CA II Ki).
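Equation (1) can be evaluated directly. The sketch below assumes that the total atom count includes hydrogens and that log Ki is the base-10 logarithm of Ki in nM (the units used in Table 1); the molecule and the helper names are hypothetical.

```python
def rnbr(n_benzene_rings, n_atoms):
    """Relative number of benzene rings: benzene-ring count / total atom count."""
    return n_benzene_rings / n_atoms


def log_ki_model1(rnbr_value):
    """Equation (1): log Ki(CA II) = 0.180 + 32.767 * RNBR."""
    return 0.180 + 32.767 * rnbr_value


# Hypothetical molecule: 2 benzene rings, 32 atoms in total (hydrogens included).
value = rnbr(2, 32)            # 0.0625
log_ki = log_ki_model1(value)  # about 2.228
ki_nM = 10 ** log_ki           # assumes log10 of Ki expressed in nM
print(round(value, 4), round(log_ki, 3), round(ki_nM))
```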
Among the two-parameter models, the statistically best one is model 2 in Table 2. In this model, NBR is the number of benzene rings and NCA is the number of C atoms in the molecule; both descriptors are constitutional. NBR has a positive coefficient and NCA a negative one. Models 1 and 2 highlight the same molecular features. According to these models, substituting the benzene ring with nitro or hydroxy groups rather than methyl groups, and reducing the number of CH2 groups bonded to the imine nitrogen, favor an increase in the inhibitory activity of the compounds.
Among the three-parameter models, the statistically best one is model 3 in Table 2. In this model, the coefficients of NBR and NCA have the same signs as in model 2 and thus carry the same meaning. NNA, the number of N atoms, is a constitutional descriptor with a positive coefficient, which means that an increase in NNA favors the inhibitory activity. ALP, the average logarithm of the 1-octanol/water partition coefficient, reflects the hydrophobicity of the compounds; its positive coefficient indicates that log CA II Ki increases with increasing average logP. ²AIC, the second-order average complementary information content, can be considered an index of the heterogeneity of a molecule [38]. The positive coefficient of ²AIC indicates that an increase in ²AIC favors an increase in the inhibitory activity of the compounds.
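The information-content indices are Shannon entropies of a partition of the atoms into equivalence classes; for the second-order indices, atoms (hydrogens included) fall into the same class when their second-order topological neighbourhoods are identical. The sketch assumes the class sizes are already known (building the partition is not shown) and uses the common definitions IC = −Σ (n_i/n) log₂(n_i/n) and CIC = log₂ n − IC; which of these Codessa Pro reports under the name used for ²AIC is not confirmed here, and the partition is hypothetical.

```python
import math


def information_content(class_sizes):
    """Shannon information content IC = -sum (n_i/n) * log2(n_i/n) of an atom partition."""
    n = sum(class_sizes)
    return -sum((k / n) * math.log2(k / n) for k in class_sizes if k > 0)


def complementary_information_content(class_sizes):
    """CIC = log2(n) - IC: the gap between IC and its maximum (all atoms distinct)."""
    return math.log2(sum(class_sizes)) - information_content(class_sizes)


# Hypothetical partition of 8 atoms into equivalence classes of sizes 4, 2, 1, 1.
sizes = [4, 2, 1, 1]
print(information_content(sizes), complementary_information_content(sizes))  # 1.75 1.25
```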
By default, HM uses at most five descriptors to construct a regression model. We changed this default setting to allow more than five descriptors, which yielded several six- and seven-parameter regression equations. None of them had better overall statistics than the five-parameter equation (model 5): although their R² values were greater than that of model 5, they were not selected as models in this study because of their relatively low F values.
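The effect described above follows from the definition of F for a model with p descriptors fitted to N compounds, F = (R²/p) / ((1 − R²)/(N − p − 1)): each added descriptor must raise R² enough to offset the larger p and the smaller N − p − 1. The sketch below applies this standard formula to the reported statistics; the six-parameter R² of 0.850 is hypothetical.

```python
def f_statistic(r2, n, p):
    """Fisher F for a multilinear model with p descriptors fitted to n compounds."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


print(round(f_statistic(0.840, 36, 5), 2))   # model 5, reported F = 31.54
print(round(f_statistic(0.527, 36, 1), 2))   # equation (1), reported F = 37.98
# A hypothetical six-parameter model with a higher R^2 but a lower F:
print(round(f_statistic(0.850, 36, 6), 2))
```

With N = 36 the function gives 31.5 for model 5 and about 37.9 for equation (1), matching the reported values to within the rounding of R², whereas the hypothetical six-parameter model drops to about 27.4 despite its higher R².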

Discussion
Twelve descriptors were involved in the models. Two of them are physicochemical properties, namely ALS (average logS) and ALP (average logP). In model 4, average logS contributes negatively to the inhibitory activity, whereas in model 5 average logP contributes positively. This is expected, because the average logP is inversely related to the average logS. Five of the involved descriptors are constitutional. RNBR, the relative number of benzene rings (model 1), and NBR, the number of benzene rings (models 2 and 5), contribute positively to the inhibitory activity. In contrast, NCA, the number of C atoms, contributes negatively in models 2 and 3. Taken together, these results suggest that, when designing Schiff-base sulfonamides with increased inhibitory activity, one should avoid substituting the benzene rings with C-containing groups and adding CH2 groups to the imine nitrogen. Another constitutional descriptor is NNA, the number of N atoms, which contributes positively to the inhibitory activity in model 5. Consequently, substituting the benzene ring with nitro groups or introducing an N-containing ring into a Schiff-base sulfonamide may help increase the inhibitory activity. Two of the twelve descriptors, ³χ and ²AIC, are topological indices. ³χ, the third-order Randić index, contributes negatively to the inhibitory activity in model 4. This means that a high degree of third-order branching is unfavorable when designing Schiff-base sulfonamides with high inhibitory activity. The other topological index is ²AIC.

Figure 1. Molecular structure of Schiff-base sulfonamides used in the present study

a) From Ref. [37]; b) From Ref. [35]; c) From Ref. [36]; d) Compounds 29 and 38 are outliers and were deleted from the regression procedure of model 5.

Table 2. Regression parameters and statistical quality of the correlations of the activity log Ki-CA II in the present study.

Inhibition of CA II
The inhibition values (log Ki for CA II) were taken from references [35][36][37].

Topological indexes
Two topological indices were involved in our models. They are defined according to the Codessa Pro Reference Manual [33]. For the Randić indices (order 0-3), the general formula is expressed in terms of q, the number of edges in the structural graph of the molecule.
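Because the general formula did not survive here, the sketch below uses the standard Kier-Hall path definition: the order-m index is the sum, over all simple paths of m bonds, of (δ_i δ_j …)^(−1/2), where δ is the number of heavy-atom neighbours in the hydrogen-suppressed graph. Whether Codessa Pro's Randić indices match this simple (non-valence) variant exactly is an assumption, and `path_chi` is a hypothetical helper.

```python
def path_chi(adjacency, order):
    """Path connectivity (Randic/Kier-Hall) index of the given order.

    adjacency: dict mapping each heavy atom (an int label) to the set of its
    bonded heavy-atom neighbours (hydrogen-suppressed graph). Isolated atoms
    (degree 0) are not supported.
    """
    degree = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    paths = set()

    def walk(path):
        if len(path) == order + 1:
            # A path and its reverse are the same subgraph: store one canonical form.
            paths.add(min(tuple(path), tuple(reversed(path))))
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:          # simple paths only, no repeated atoms
                walk(path + [nxt])

    for atom in adjacency:
        walk([atom])

    total = 0.0
    for p in paths:
        product = 1
        for atom in p:
            product *= degree[atom]
        total += product ** -0.5
    return total


butane = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
benzene = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(path_chi(butane, 3), path_chi(benzene, 3))   # 0.5 1.5
```

For benzene every atom has δ = 2, and each of the six paths of three bonds contributes 16^(−1/2) = 0.25.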

Electrostatic indexes
Electrostatic indices were also involved in our models; they are calculated according to the Codessa Pro Reference Manual [33].
Charged partial surface area descriptors. Charged partial surface area (CPSA) descriptors were introduced by Jurs et al. [45][46]; they are defined both in terms of the whole surface area of the molecule and in terms of functional-group portions.
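A few CPSA descriptors can be computed from per-atom surface areas and partial charges, as sketched below under the usual Stanton-Jurs definitions (PPSA-1, PPSA-2, PPSA-3, their negative counterparts, DPSA-1, FPSA-1, FNSA-1). The surface areas and charges would come from separate calculations, the numbers in the example are hypothetical, and which CPSA descriptors enter the models is not stated in this section.

```python
def cpsa_descriptors(areas, charges):
    """A few Jurs-style charged partial surface area (CPSA) descriptors.

    areas:   surface-area contribution of each atom (e.g. solvent-accessible, in A^2)
    charges: partial charge of each atom, in the same order
    """
    pos = [(a, q) for a, q in zip(areas, charges) if q > 0]
    neg = [(a, q) for a, q in zip(areas, charges) if q < 0]
    ppsa1 = sum(a for a, _ in pos)               # partial positive surface area
    pnsa1 = sum(a for a, _ in neg)               # partial negative surface area
    total_area = sum(areas)
    return {
        "PPSA-1": ppsa1,
        "PNSA-1": pnsa1,
        "PPSA-2": ppsa1 * sum(q for _, q in pos),  # area x total positive charge
        "PNSA-2": pnsa1 * sum(q for _, q in neg),  # area x total negative charge
        "PPSA-3": sum(a * q for a, q in pos),      # atomic charge-weighted areas
        "PNSA-3": sum(a * q for a, q in neg),
        "DPSA-1": ppsa1 - pnsa1,
        "FPSA-1": ppsa1 / total_area,
        "FNSA-1": pnsa1 / total_area,
    }


# Hypothetical three-atom example.
d = cpsa_descriptors([10.0, 20.0, 30.0], [0.2, -0.3, 0.1])
print(d["PPSA-1"], d["PNSA-1"], round(d["FPSA-1"], 3))  # 40.0 20.0 0.667
```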
The descriptors involved in this study are the minimum and maximum partial charges for particular atom types (e.g., C, O, N). The empirical partial charges in the molecule are calculated with the approach proposed by Zefirov [47][48]. This method is based on the Sanderson electronegativity scale and represents the molecular electronegativity as the geometric mean of the atomic electronegativities.
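The geometric-mean idea can be illustrated as follows. This is not Zefirov's procedure: as a stand-in, it uses Sanderson's classical charge estimate, (S_mol − S_atom) / (2.08 √S_atom), with approximate Sanderson electronegativities for H, C, N, and O, and then extracts the minimum and maximum charge per atom type. The constants and helper names are assumptions for illustration only.

```python
import math
from collections import defaultdict

# Approximate Sanderson electronegativities, for illustration only.
SANDERSON = {"H": 2.592, "C": 2.746, "N": 3.194, "O": 3.654}


def molecular_electronegativity(elements):
    """Geometric mean of the atomic electronegativities."""
    return math.exp(sum(math.log(SANDERSON[e]) for e in elements) / len(elements))


def sanderson_charges(elements):
    """Crude partial charges: (S_mol - S_atom) / (2.08 * sqrt(S_atom))."""
    s_mol = molecular_electronegativity(elements)
    return [(s_mol - SANDERSON[e]) / (2.08 * math.sqrt(SANDERSON[e])) for e in elements]


def min_max_charge_by_element(elements, charges):
    """Minimum and maximum partial charge for each atom type."""
    by_type = defaultdict(list)
    for e, q in zip(elements, charges):
        by_type[e].append(q)
    return {e: (min(qs), max(qs)) for e, qs in by_type.items()}


# Water as a toy molecule: O, H, H.
atoms = ["O", "H", "H"]
q = sanderson_charges(atoms)
print(min_max_charge_by_element(atoms, q))
```

Atoms more electronegative than the molecular geometric mean (here O) receive negative charges, and the others positive ones.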