Hologram QSAR Studies of Antiprotozoal Activities of Sesquiterpene Lactones

Infectious diseases such as trypanosomiasis and leishmaniasis are considered neglected tropical diseases because of the long-standing lack of research and development into new drug treatments, their high mortality, and the absence of safe and effective current drug therapies. Natural products such as sesquiterpene lactones have shown activity against T. brucei and L. donovani, the parasites responsible for these neglected diseases. To evaluate structure-activity relationships, HQSAR models were constructed to relate the structures of a series of 40 sesquiterpene lactones (STLs) to their activity against T. brucei, T. cruzi, L. donovani and P. falciparum and to their cytotoxicity. All constructed models showed good internal (leave-one-out q² values ranging from 0.637 to 0.775) and external validation coefficients (r²test values ranging from 0.653 to 0.944). The HQSAR contribution maps revealed several differences between the most and least potent compounds. The fragment contributions of the PLS-generated models confirmed the results of previous QSAR studies that the presence of α,β-unsaturated carbonyl groups is fundamental to biological activity. QSAR models for the activity of these compounds against T. cruzi, L. donovani and P. falciparum are reported here for the first time. The constructed HQSAR models are suitable for predicting the activity of untested STLs.


Introduction
Several diseases caused by protozoan parasites, such as leishmaniases, trypanosomiases (Chagas disease and African sleeping sickness) and malaria, represent major health risks in developing countries. Leishmaniases and trypanosomiases have few available drug therapies, and the development of anti-malarial compounds is also urgently needed because of rapidly emerging resistance of the parasites to existing drugs. It is estimated that infections by Trypanosoma and Leishmania are responsible for over a million deaths per year. Their treatment is complicated by severe side effects due to the high toxicity of the available drugs. Owing to the lack of research and development of new drugs over many decades, these diseases are considered "neglected tropical diseases" [1-5]. There is thus an urgent need for the development of new therapeutics against these diseases. Many classes of chemicals have been tested against these parasites. Among them, natural products and, particularly, sesquiterpene lactones (STLs) have shown interesting activities [6,7].
In a previous study [8], in vitro activity data for 40 sesquiterpene lactones (STLs) against Trypanosoma brucei rhodesiense (the etiologic agent of East African sleeping sickness; Tbr), T. cruzi (Chagas disease; Tcr), Leishmania donovani (visceral leishmaniasis, Kala-Azar; Ldon) as well as Plasmodium falciparum (tropical malaria; Pfc) were reported. Quantitative structure-activity relationship (QSAR) models for the activity against T. brucei rhodesiense and for the cytotoxic activity of these compounds against L6 rat skeletal myoblast cells were presented. It was found that the biological effects against the protozoan parasites were all correlated significantly with cytotoxicity against the mammalian control cells. With the QSAR modelling methods used at that time, it was not possible to clearly define a structural basis for selectivity against the parasites [8]. QSAR approaches are considered powerful tools in lead identification as well as optimization [9] in cases where the bioactivity of congeneric sets of compounds is known. Even though QSAR methods have been applied successfully to STLs for several bioactivities [10-13], it remained a challenge to construct validated models of the antiprotozoal activity of STLs against T. cruzi, L. donovani and P. falciparum [8,14].
The main objective of the present work is therefore to apply the hologram quantitative structure-activity relationship (HQSAR) approach to construct comparable models for all four protozoa mentioned above and for cytotoxicity. The molecular fragment information of the generated models is then used to analyze the structural basis for the antiprotozoal activity and cytotoxicity of the compounds in this data set, in order to find possible reasons for the selectivity observed with some of the STLs.

Results and Discussion
As the biological activities against T. brucei and L6 cover a range of at least three logarithmic units, as shown in Figure 1, and all data within each activity set were determined under identical experimental conditions [8], the dataset is deemed suitable for QSAR studies. The biological activities against T. cruzi, L. donovani and P. falciparum cover only 2.25, 1.90 and 1.92 logarithmic units, respectively. This is not the ideal scenario for constructing HQSAR models, but we decided to construct these three models in order to support the results generated by the Tbr and L6 models. Initially, HQSAR models with 16 fragment distinction combinations and a fixed fragment size (4 to 7 atoms) were generated for each set of biological activity data (T. brucei rhodesiense, T. cruzi, L. donovani, P. falciparum and L6 cytotoxicity). The five best models for each dependent variable are presented in Table 1.
The initial search for the fragment distinction that best represents each biological activity shows that the model employing fragments based on atoms, bonds and connections (A/B/C) provides the best description of anti-T. brucei activity (q² = 0.637). The best models for T. cruzi and P. falciparum were obtained employing fragments based on atoms and connections (A/C), with cross-validated correlation coefficients (q²) equal to 0.721 and 0.703, respectively. Finally, the best HQSAR models for L. donovani (q² = 0.775) and cytotoxicity (q² = 0.647) employed a fragment distinction based on atoms, connections, chirality and H-bond donor/acceptor groups (A/C/Ch/DA). In general, these initial results indicate that both anti-Ldon activity and cytotoxicity could be influenced more strongly by H-bond interactions and stereoselectivity, since the best Ldon and L6 models were the only ones constructed with the Ch and DA flags in the fragment distinction. After this step, the fragment distinction of the best models was fixed and the fragment size was varied in order to analyze the influence of this parameter on the statistical results. For each model (Tbr, Tcr, Ldon, Pfc and L6), we tested the following fragment sizes: 1 to 4 atoms, 2 to 5 atoms, 3 to 6 atoms, 4 to 7 atoms (tested in the first step), 5 to 8 atoms, 6 to 9 atoms, 7 to 10 atoms and 8 to 11 atoms. All results of this second step are shown in Table 2. After analyzing the influence of fragment distinction and size, hologram length and number of PCs on the statistical parameters, we evaluated the quality of the constructed models by internal and external validation.
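The second step of the search described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and selecting the setting with the highest q² is an assumption about how the "best" model was chosen, not a statement of the authors' exact procedure.

```python
def fragment_size_windows(first_min=1, last_min=8, width=4):
    """Return (min_atoms, max_atoms) windows: (1, 4), (2, 5), ..., (8, 11)."""
    return [(lo, lo + width - 1) for lo in range(first_min, last_min + 1)]


def best_setting(q2_by_setting):
    """Pick the setting with the highest cross-validated q2 (assumed criterion)."""
    return max(q2_by_setting, key=q2_by_setting.get)


# The eight fragment size windows tested for each activity.
windows = fragment_size_windows()
```

Given a mapping from each window to the q² of the corresponding model, `best_setting` returns the window that would be carried forward to validation.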
The robustness test (Figure 2) suggests that all constructed models have acceptable internal consistency, since all average q² values for each number of cross-validation groups were higher than 0.6. In order to ensure that all models are fully validated, the r²test value was calculated for each model and the prediction residuals were also considered in the external validation. Table 3 summarizes all parameters of the constructed HQSAR models as well as all statistical results of the internal and external validations. Figure 3 shows the experimental versus predicted pIC50 values of the training and test sets. Compounds that behaved as outliers were removed from the respective data sets and the modelling was repeated without them, in order to avoid distortions in the models. Manifold reasons may lead particular compounds to behave as outliers [15]; speculating on them in detail for each case does not appear useful here. The external validation results show that all constructed models have acceptable external validation correlation coefficients and that the prediction residuals for all test set compounds are lower than 1 logarithmic unit (Supplementary Table S6). All generated models, including the fragment distinction search and the fragment size evaluation for the five sets of biological data, are available in Supplementary Tables S1-S5.
Therefore, both the internal validation methods (LOO and CV) and the external validation provide results which indicate that all constructed HQSAR models and their respective fragment information are suitable to explain the antiprotozoal and cytotoxic activities.
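The validation statistics discussed above are not defined explicitly in the text. As a sketch, assuming the definitions commonly used in QSAR work (the symbols below are introduced here for illustration only), the cross-validated and external coefficients can be written as:

```latex
q^2 = 1 - \frac{\sum_{i \in \mathrm{train}} \left( y_i - \hat{y}_i^{\mathrm{CV}} \right)^2}
               {\sum_{i \in \mathrm{train}} \left( y_i - \bar{y}_{\mathrm{train}} \right)^2},
\qquad
r^2_{\mathrm{test}} = 1 - \frac{\sum_{j \in \mathrm{test}} \left( y_j - \hat{y}_j \right)^2}
                               {\sum_{j \in \mathrm{test}} \left( y_j - \bar{y}_{\mathrm{train}} \right)^2}
```

Here $y$ is the experimental pIC50, $\hat{y}_i^{\mathrm{CV}}$ is the value predicted for compound $i$ by a model built without it (or without its cross-validation group), $\hat{y}_j$ is the prediction of the final model for test compound $j$, and $\bar{y}_{\mathrm{train}}$ is the mean pIC50 of the training set. Other conventions exist for the external coefficient, for example the squared correlation between experimental and predicted values, so the exact form used by a given software package should be checked.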
From the contribution maps of compound 2, one of the most potent compounds in each HQSAR model (Figure 4), it becomes clear that the 7-membered ring with one of the attached methyl groups is assigned a positive contribution to biological activity by each of the HQSAR models. Quite notably, the oxygen atom in the butyrolactone ring shows a positive contribution only in the cytotoxicity model, indicating that this atom (or the butyrolactone moiety) could be related to an important difference between the antiprotozoal and toxic activities of the compounds in this data set. The lactone carbonyl oxygen atom contributes positively to the Tbr and Tcr models. In the contribution maps of the five constructed models (Figure 5), the 6-membered ring contributes negatively to the T. brucei, P. falciparum and cytotoxic activities. The butyrolactone moiety (except the oxygen atom of the carbonyl group) contributes negatively to anti-T. cruzi activity. The oxygen atom of the oxirane group contributes negatively in the L. donovani HQSAR model.
Against the background of previous QSAR analyses of this data set, it could be expected that all HQSAR models would be influenced by similar parameters and lead to similar contribution maps, since the pairwise correlation between the sets of biological activity values is quite high (higher than 69%, Supplementary Table S7) [8]. However, this is not the case, so the information provided by the contribution maps of the individual models could be useful to identify differences, especially between cytotoxicity and the antiprotozoal activities. Even though the differences between the models for the antiparasitic and cytotoxic activities may be subtle and difficult to interpret in detail because of the complexity of the applied descriptors, it is noteworthy that these differences exist and thus offer a possibility to rationalize the structural reasons for the selectivity of some compounds against the parasites. We calculated the maximum common structure (MCS) with the HQSAR module (Figure 6, MCS colored in cyan). This MCS comprises the butyrolactone moiety along with two carbon atoms of the attached ring system. The α-methylene group, although present in most compounds, is not part of the MCS since compounds 5, 6, 7 and 35 are 11,13-dihydro derivatives, i.e., they have a methyl group instead of the =CH2 group. In addition, compound 23 has a cyclic substituent at this position. Compounds 5, 6 and 7 are pseudoguaianolides bearing another α,β-unsaturated carbonyl system, i.e., a cyclopentenone moiety located on the opposite side of the molecule. Compounds 23 and 35 do not contain any α,β-unsaturated carbonyl system, and both show very low activity against Tbr and no significant cytotoxicity (pIC50 values equal to 3.79 and 4.31, respectively). Therefore, our HQSAR studies indicate that the α,β-unsaturated carbonyl system could be considered a common scaffold which is generally related to biological activity, while the fragments with positive and negative contributions (Figures 4 and 5) could be related to the differences in pIC50 in each model.
In order to analyze the anti-Tbr HQSAR model in terms of the statistical influence of particular fragments on biological activity, we extracted from this model the fragments with the highest positive and negative contributions to biological activity (Table 4). Two of the three fragments with the highest contribution to biological activity (fragments 01 and 03) have two sp² carbon atoms bonded directly to each other, which would be characteristic of an α,β-unsaturated carbonyl system. Fragment 04 is the fragment with an explicit α,β-unsaturated carbonyl system that has the highest positive contribution to the model, and this fragment is exactly the substructure present in compounds 01-08, which have the highest anti-T. brucei activities. Fragment 05 is one example of a fragment encoding the butyrolactone moiety, indicating that this group also contributes positively to biological activity. There are also fragments containing an α,β-unsaturated carbonyl system that show a negative contribution to the model (fragments 09, 10) but, in general terms, their contributions to biological activity are smaller in magnitude than the positive ones, indicating that the positive contributions have a higher statistical significance in this HQSAR model. The fragments with negative contributions (fragments 07, 08) show that an epoxide group contributes negatively to anti-Tbr activity. As previously described, α,β-unsaturated carbonyl systems such as the α-methylene-γ-lactone and cyclopentenone moieties are of major influence on the biological activity of STLs, not only with respect to their antiprotozoal and cytotoxic activity [8,10,11,14,16-18].
In comparison to recent descriptor-based QSAR models for T. brucei activity and cytotoxicity constructed by Schmidt et al. [7], the HQSAR results suggest similar physicochemical interpretations. The positive contribution of methylcycloheptane (as part of a pseudoguaianolide skeleton) to all models suggests a positive influence of this ring system on activity that may be due to steric or hydrophobic factors, since the cyclohexane system present in the eudesmanolides showed a negative contribution to biological activity in both the Tbr and L6 models.
The two fragments with the highest contribution to the Tbr model represent alkene structures which are also hydrophobic groups.These results corroborate the positive contribution of hydrophobicity to anti-Tbr activity.
In summary, our HQSAR models showed once more that α,β-unsaturated carbonyl groups are fundamental to the biological activity of STLs, in accordance with several previous works [7-13]. Furthermore, the methylcycloheptane ring as well as further hydrophobic groups appear to be responsible for higher levels of biological activity, indicating that the potency of the studied compounds could be related to cellular permeation mechanisms.
After analyzing the HQSAR maps and the influence of fragments for the most and least potent compounds, we also analyzed the HQSAR maps of the compounds with the highest selectivity indices (SI) for T. brucei (compounds 19, 24 and 32) and the lowest SI (compounds 26, 25 and 28). We generated these maps with the Tbr and L6 models in order to examine the influence of fragments on both biological activities as a strategy to study selectivity. In the maps of compounds 24, 25, 26 and 28, we could not identify significant differences that could explain the selectivity or the lack of it (Supplementary Figure S1). The maps of compounds 19 and 32 are shown in Figure 7. From Figure 7, we can note the following: (i) the contribution maps of compound 19, the most T. brucei-selective compound, indicate that the C-atoms of the α,β-unsaturated carbonyl system and the 7-membered ring contribute positively to the Tbr model but negatively to toxicity. Therefore, this compound could be considered a lead for the development of new chemical entities with antiprotozoal activity and low toxicity; (ii) the contribution map for compound 32 indicates that the 6-membered ring contributes positively to toxicity. This fragment is thus present in compounds with lower antiprotozoal activity and could also lead to increased toxicity; (iii) the O atom of the hydroxy group on the distal ring of compound 32 contributes positively to anti-T. brucei activity, indicating that compounds with an -OH group at this position could be tested, given the low influence of this fragment on toxicity.
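The text does not state how the selectivity index was computed. A minimal sketch, assuming the common definition SI = IC50(L6) / IC50(parasite), shows how it follows directly from the two pIC50 values; the function name is hypothetical.

```python
def selectivity_index(pic50_parasite: float, pic50_l6: float) -> float:
    """Selectivity index, assumed here to be IC50(L6) / IC50(parasite).

    Because pIC50 = -log10(IC50), the ratio of IC50 values equals
    10 ** (pIC50_parasite - pIC50_L6).
    """
    return 10 ** (pic50_parasite - pic50_l6)
```

Under this definition, a compound whose pIC50 against the parasite is one logarithmic unit higher than its pIC50 against L6 cells is ten times more active against the parasite than against the mammalian cells.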

Data Set
The data set used for the HQSAR studies contains 40 sesquiterpene lactones with their antiprotozoal activity against Trypanosoma brucei rhodesiense (Tbr), Trypanosoma cruzi (Tcr), Leishmania donovani (Ldon) and Plasmodium falciparum (Pfc), as well as their cytotoxicity against L6 rat skeletal myoblasts (Table 5) [8]. The biological activity data were reported as micromolar IC50 values, which were converted to molar pIC50 (−log IC50) and used as dependent variables in the QSAR model development (Table 5). The chemical structures were drawn in 2D format and converted to 3D using the Sybyl X 2.0 package [19]. The studied compounds were divided into training and test sets containing 80% and 20%, respectively, of the total number of compounds of each dataset (the set of compounds with a measured value for a specific biological activity) in order to construct the HQSAR models and to perform external validations. The dataset split was performed in such a manner that the entire range of pIC50 values was covered by the test set compounds, also taking into account the structural homogeneity of the training and test sets. Thus, both training and test set compounds were inside the two-dimensional Y (biological activity) and X (fragment) spaces.
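The unit conversion and the 80/20 split described above can be illustrated with a short Python sketch. The function names are hypothetical, and the activity-ranked selection is only one simple way to spread test compounds over the pIC50 range; it does not reproduce the structural-homogeneity check mentioned in the text.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 given in micromol/L to molar pIC50 = -log10(IC50 [mol/L])."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


def activity_ranked_split(pic50_by_compound, test_fraction=0.2):
    """Rank compounds by pIC50 and send every k-th one to the test set.

    With test_fraction=0.2 every fifth compound (starting from the third)
    is selected, so the test set is spread across the activity range while
    the least and most active compounds stay in the training set.
    """
    ranked = sorted(pic50_by_compound, key=pic50_by_compound.get)
    step = round(1 / test_fraction)
    test = ranked[step // 2::step]
    test_set = set(test)
    train = [name for name in ranked if name not in test_set]
    return train, test
```

For a dataset of 40 compounds this yields 32 training and 8 test compounds, matching the 80%/20% proportions stated above.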

Fragment-Based Strategy
The HQSAR technique was chosen as the fragment-based drug design strategy [20-23]. This technique has been successfully employed in drug design studies, obtaining good agreement with experimental data for several different compound datasets [24-27]. The HQSAR technique consists of the decomposition of each molecule in the dataset into a molecular hologram, i.e., the linear, branched, and overlapping fragments of the molecule are assigned to the bins of a fixed-length array (53 to 401 bins). The bin occupancies encode compositional and topological molecular information and are used as independent (X) variables in QSAR modeling. The hologram length, fragment size and fragment distinction (atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), and H-bond donor/acceptor groups (DA)) are the parameters that affect hologram generation and consequently the statistical evaluation of the constructed HQSAR models. Initially, several models applying different combinations of fragment distinctions were generated using the default fragment size of 4-7 atoms over the 13 default hologram lengths. Next, the influence of fragment size was further investigated for the best model. All models in this study were generated using the Partial Least Squares (PLS) method. Each model was fully cross-validated by the Leave-One-Out (LOO) method.
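To make the idea of a molecular hologram concrete, the following Python sketch counts fragments in the bins of a fixed-length array. It is a simplified illustration and not the Sybyl implementation: only linear fragments are enumerated, fragments are distinguished by atom symbols alone (roughly the "A" flag), and `zlib.crc32` is used as a stand-in hash function. All names are hypothetical.

```python
import zlib


def linear_fragments(atoms, bonds, min_size=4, max_size=7):
    """Enumerate linear atom paths with min_size..max_size atoms as strings."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    fragments = []

    def extend(path):
        if len(path) >= min_size and path[0] < path[-1]:
            # Each path is found from both ends; keep it once, using the
            # lexicographically smaller reading direction as its label.
            labels = [atoms[i] for i in path]
            fragments.append("-".join(min(labels, labels[::-1])))
        if len(path) == max_size:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return fragments


def molecular_hologram(atoms, bonds, length=53, min_size=4, max_size=7):
    """Hash every fragment into one of `length` bins and count occupancies."""
    bins = [0] * length
    for fragment in linear_fragments(atoms, bonds, min_size, max_size):
        bins[zlib.crc32(fragment.encode("ascii")) % length] += 1
    return bins


# Toy example: a C-C-C-C-O chain given as atom symbols and index pairs.
hologram = molecular_hologram(["C", "C", "C", "C", "O"],
                              [(0, 1), (1, 2), (2, 3), (3, 4)])
```

The resulting list of bin counts plays the role of one row of the X matrix used in the PLS analysis. Because different fragments can collide in the same bin, the hologram length influences the model, which is consistent with several lengths being screened.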

QSAR Model Validation
After obtaining an optimum HQSAR model for each biological activity, we carried out a robustness test and external validation, with a test set of compounds which were not considered for the

Figure 1. pIC50 distribution of the dataset of 40 STLs over the five biological activities under study. Each graph shows the number of compounds (N) with measured pIC50 values in a particular range for each tested parasite and for cytotoxicity (L6).

Figure 2. Robustness test of the five constructed HQSAR models.

Figure 3. Experimental versus predicted pIC50 values of training and test sets of all constructed HQSAR models.

Figure 4. HQSAR maps of positive contribution for all five constructed HQSAR models.

Figure 5. HQSAR maps of negative contribution for all five constructed HQSAR models.

Figure 6. Maximum common structure (cyan atoms of compound 01) of the dataset calculated by all HQSAR models.

Table 1. Best HQSAR models with fragment size equal to 4 to 7 atoms.

[Table 2 column headers: HQSAR models with Fdist = A/B/C, Fsize (atoms); Ldon HQSAR models with Fdist = A/C/Ch/DA]

Table 3. Comparison of statistical results of all five constructed HQSAR models.
Fdist: fragment distinction; Fsize: fragment size; HL: hologram length; PC: number of PLS principal components; N: number of compounds in the training set; SEV: standard error of validation; SEE: standard error of estimation.

Table 4. List of fragments with the highest positive and negative contributions to the Tbr HQSAR model; X atoms are connectivity flags and are not considered part of the fragment.