Modeling Natural Anti-Inflammatory Compounds by Molecular Topology

One of the main pharmacological problems today in the treatment of chronic inflammatory diseases is that anti-inflammatory drugs usually exhibit side effects. Natural products offer great hope for identifying bioactive lead compounds and developing them into drugs for treating inflammatory diseases. Computer-aided drug design has proved to be a very useful tool for discovering new drugs and, specifically, Molecular Topology has become a good technique for this goal. A topological-mathematical model, obtained by linear discriminant analysis, has been developed for the search of new anti-inflammatory natural compounds. An external validation was carried out with the remaining compounds (those not used to build the model). Finally, a virtual screening of natural products was performed, and 74 compounds were predicted to show anti-inflammatory activity. Of them, 54 had been previously described as anti-inflammatory in the literature. This can be seen as a plus in the model validation and as reinforcement of the role of Molecular Topology as an efficient tool for the discovery of new anti-inflammatory natural compounds.


Introduction
One of the biggest pharmacological problems today is the treatment of chronic inflammation. Diseases such as chronic asthma, rheumatoid arthritis, multiple sclerosis, inflammatory bowel disease (IBD) and psoriasis are highly debilitating and are becoming increasingly common in our aging society. Rheumatoid arthritis and osteoarthritis are the major inflammatory diseases affecting people worldwide. Increases in life expectancy and aging populations are expected to make osteoarthritis the fourth leading cause of disability by the year 2020. Moreover, epidemiological studies have identified chronic infections and inflammation as major risk factors for various types of cancer [1].
Several classes of drugs, such as corticosteroids, NSAIDs and biologics, are used to treat inflammatory disorders. The main problem is that these drugs have several adverse effects or are too expensive. Corticosteroids have long been used to manage rheumatoid arthritis and IBD, but they have serious adverse effects such as Cushing's habitus, hypertension, hyperglycemia, muscular weakness, increased susceptibility to infection, osteoporosis, glaucoma, psychiatric disturbances, growth arrest, etc.
Likewise, the side effects associated with NSAIDs, such as gastrointestinal ulceration and bleeding and platelet dysfunction, are numerous and common, and because of the widespread use (and abuse) of this class of drugs, they currently represent a major problem in treating chronic inflammation.
The coxibs also exhibited cardiovascular side effects due to inhibition of prostacyclin formation in the infarcted heart, tipping the balance of prostacyclin/thromboxane, coupled with a diminution in prostacyclin in heart muscle. Therefore, it is quite clear that the clinically used anti-inflammatory drugs suffer from the disadvantage of side effects and high cost of treatment (in case of biologics) [1].
Natural products represent a valid alternative to these drugs, offering great hope for the identification of bioactive lead compounds and their development into drugs for treating inflammatory diseases [1]. It is known that plants have been the basis of many traditional medicine systems throughout the world for thousands of years, and they represent an inexhaustible source of "raw materials" for finding and synthesizing new molecules with pharmacological activity [1].
Natural product drugs are classified into three groups: NPs, semi-synthetic NPs and NP-derived drugs [2]. The value of natural products can be assessed by the rate at which they introduce new chemical entities of wide structural diversity, including those that serve as templates for semisynthetic and total synthetic modification.
An analysis of the origin of the drugs developed between 1981 and 2002 showed that natural products or natural product-derived drugs comprised 28% of all new chemical entities (NCEs) launched onto the market. In addition, 24% of these NCEs were synthetic or natural mimic compounds, based on the study of pharmacophores related to natural products. This combined percentage (52% of all NCEs) suggests that natural products are important sources for new drugs and are also good lead compounds suitable for further modification during the drug development process. Scrutiny of medical indications by source of compounds has demonstrated that natural products and related drugs are used to treat 87% of all categorized human diseases (48/55) [3].
It is noteworthy that natural products have played a pivotal role in immunosuppressive drug discovery, as shown by the launch of the NPs cyclosporin (1983), tacrolimus (1993), sirolimus (1999) and mycophenolate sodium (2003), and of the semi-synthetic NPs mycophenolate mofetil (1995), everolimus (2004) and fingolimod (2010). In addition, the NP-derived aspirin (acetylsalicylic acid), discovered in the late 1890s, is still widely used as an analgesic and anti-inflammatory, while corticosteroids and β2 agonists modeled on adrenaline (e.g., salbutamol and salmeterol) are used to help control asthma [2]. A total of 13 NP and NP-derived drugs were approved for marketing worldwide from 2005 to 2007, with 5 classified as NPs, 6 as semi-synthetic NPs and 2 as NP-derived drugs [2].
Despite these statistics, pharmaceutical companies have embraced the era of combinatorial chemistry, neglecting the development of natural products as potential drug candidates in favor of high-throughput synthesis of large compound libraries [4]. The main reasons include the incompatibility of natural product libraries with high-throughput screening and the marginal improvement in core technologies for natural product screening in the late 1980s and early 1990s [5]. Fortunately, in recent years, the development of new technologies has revolutionized the screening of natural products. Applying these technologies compensates for the inherent limitations of natural products and offers a unique opportunity to re-establish natural products as a major source for drug discovery [5].
The natural product landscape offers not only the direct introduction of natural products into the drug discovery process; more often, natural products themselves serve as lead agents, providing the chemist with a structural platform that can be elaborated upon, or simplified, to yield a therapeutically valuable pharmaceutical [6]. They offer unmatched chemical diversity together with structural complexity and biological potency. Natural product resources, especially those from the marine environment, are rich and largely unexplored [7].
Another key point is that, starting from the original natural product, an analog strategy can be developed to create or modify molecules with biological or pharmacological activity. Anti-inflammatory natural products, in particular, have been discovered from ethnopharmacological observations and through new strategies in chemical investigation. In this regard, topological descriptors could help identify new potential anti-inflammatory drug candidates.
In the late 1980s, computational chemistry sped up the drug discovery process. In the mid-1990s, combinatorial chemistry (including molecular evolution, multiple parallel synthesis, etc.) arrived, combined with High Throughput Screening (HTS).
Virtual Screening, or in silico screening, is an approach attracting increasing interest in the pharmaceutical industry as a productive and cost-effective technology for the search of novel hit or lead compounds [8][9][10].
The principle involves the computational analysis of chemical databases to identify the compounds most likely to show a given biological activity. These ideas are not new; they have been pursued for years by groups working in drug design and discovery. However, the availability of inexpensive high-performance computing platforms has transformed these processes so that increasingly complex and accurate analyses can now be performed on very large data sets.
Topological virtual screening is based on the analysis of a chemically diverse set of molecules [8], which enables the selection of the best potential candidates. In principle, the molecules are not classified according to their biological activity but are described by their topological indices (TIs); after a computational study of their structures, only those complying with a desired topological model are chosen for further development.
The model is obtained by linear discriminant analysis (LDA) of two sets of structures: one with a well-defined pharmacological activity, and the other made up of structures lacking this activity.
The resulting model, associated with the desired pharmacological activity, consists of a set of topological descriptors capable of differentiating potentially active compounds from inactive ones.
This method provides a detailed and relevant framework for searching for leads, prioritizing the compounds that should be tested in a biological assay. It offers a powerful new option in the hunt for new targets and new lead compounds that can ultimately yield new drugs.
This report deals with the search of natural anti-inflammatory compounds by using a database of natural products. The research team has gained experience in discovering new drugs applying Molecular Topology, and has developed several models in the field of anti-inflammatory compounds [11][12][13].

Analyzed Compounds
The model for searching for new natural anti-inflammatory compounds was built from 412 natural compounds, 123 active as anti-inflammatories and 289 inactive. Almost all the active compounds were taken from a paper by Kontogiorgis et al. [14], and the remaining active and inactive compounds from the Pure Natural Products collection of the MicroSource database [15]. The compounds making up the test set and those used in the virtual screening were also obtained from these sources. The compounds forming the training set are shown in Supplementary material, Annex I. All sets of compounds are characterized by a large structural diversity.

Molecular Descriptors
The 2D structure of each compound was drawn using the ChemDraw Ultra package [16]. Each compound was characterized by a set of 436 topological indices, among them topological charge indices, quotients and differences between nonvalence and valence connectivity indices, and topological and 2D autocorrelation descriptors. All indices were calculated with Dragon software [17]. The TI values for all compounds of the model are given in Supplementary material, Annex I.

Modeling Techniques
Linear discriminant analysis (LDA) is a pattern recognition method that provides a classification model based on the combination of variables that best predicts the category or group to which a given compound belongs. We built a natural compounds database in which all compounds were allocated to an active or an inactive group according to their anti-inflammatory activity. LDA was then applied to these two groups to obtain a discriminant function (DF) with the statistical software Statistica 9.0 [18]. The independent variables were the TIs, and the discriminating property was the anti-inflammatory activity. The discriminant capability was assessed as the percentage of correct classifications in each set of compounds. The classification criterion was the minimal Mahalanobis distance (the distance of each case to the centroid of a category). The quality of the discriminant function was evaluated using Wilks' parameter, λ, which was obtained by a multivariate analysis of variance that tests the equality of the group means for the variables in the discriminant model.
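The two-group classification described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the Statistica 9.0 procedure used in the study: it pools the within-group covariance, assigns a compound to the group whose centroid is nearest in Mahalanobis distance, and reports Wilks' λ as det(W)/det(T), the ratio of the determinants of the within-group and total sums-of-squares-and-cross-products matrices. The helper names (`fit_lda`, `classify`, etc.) and the two-descriptor toy data are hypothetical.

```python
def mean_vec(rows):
    """Column means of a list of descriptor vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sscp(rows, mu):
    """Sums of squares and cross-products about the vector mu."""
    p = len(mu)
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return S


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def solve(M, v):
    """Solve M x = v by Gauss-Jordan elimination (M assumed non-singular)."""
    n = len(M)
    A = [M[i][:] + [v[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                for k in range(c, n + 1):
                    A[r][k] -= f * A[c][k]
    return [A[i][n] / A[i][i] for i in range(n)]


def fit_lda(active, inactive):
    """Group centroids, pooled covariance and Wilks' lambda = det(W)/det(T)."""
    mu_a, mu_i = mean_vec(active), mean_vec(inactive)
    Wa, Wi = sscp(active, mu_a), sscp(inactive, mu_i)
    p = len(mu_a)
    W = [[Wa[r][c] + Wi[r][c] for c in range(p)] for r in range(p)]
    dof = len(active) + len(inactive) - 2
    pooled = [[W[r][c] / dof for c in range(p)] for r in range(p)]
    everything = active + inactive
    T = sscp(everything, mean_vec(everything))
    return mu_a, mu_i, pooled, det(W) / det(T)


def mahalanobis2(x, mu, pooled):
    """Squared Mahalanobis distance of x to centroid mu."""
    d = [x[j] - mu[j] for j in range(len(mu))]
    z = solve(pooled, d)
    return sum(a * b for a, b in zip(d, z))


def classify(x, model):
    """Assign x to the group with the smaller Mahalanobis distance."""
    mu_a, mu_i, pooled, _ = model
    d_a = mahalanobis2(x, mu_a, pooled)
    d_i = mahalanobis2(x, mu_i, pooled)
    return "active" if d_a < d_i else "inactive"


# Toy data: two hypothetical descriptors per compound
active = [[2.0, 3.1], [2.5, 2.9], [3.0, 3.4], [2.2, 3.8]]
inactive = [[0.5, 1.0], [1.0, 1.4], [0.8, 0.6], [1.2, 1.1]]
model = fit_lda(active, inactive)
print(classify([2.4, 3.0], model), classify([0.9, 1.0], model))
```

A Wilks' λ close to 0 indicates well-separated group means, and a value close to 1 indicates overlapping groups.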
The method used to select the descriptors was based on the Fisher-Snedecor parameter (F), which determines the relative importance of candidate variables. The variables used to compute the linear classification function are chosen in a stepwise manner: at each step, the variable that makes the largest contribution to the separation of the groups is entered into the discriminant equation (or the variable that makes the smallest contribution is removed).
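A forward-only sketch of this stepwise selection, assuming the usual partial F-to-enter computed from the change in Wilks' λ: with n compounds, g groups and p variables already in the model, F = ((n − g − p)/(g − 1))·(1 − Λ_partial)/Λ_partial, where Λ_partial = Λ_{p+1}/Λ_p. The removal step mentioned above is omitted for brevity, and the threshold `f_to_enter=4.0`, the function names and the toy data are hypothetical; Statistica's exact implementation may differ.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting (1.0 for 0x0)."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def sscp(rows, cols):
    """Sums of squares and cross-products about the mean, for columns `cols`."""
    n = len(rows)
    mu = [sum(r[j] for r in rows) / n for j in cols]
    S = [[0.0] * len(cols) for _ in cols]
    for r in rows:
        dev = [r[j] - m for j, m in zip(cols, mu)]
        for a in range(len(cols)):
            for b in range(len(cols)):
                S[a][b] += dev[a] * dev[b]
    return S


def wilks_lambda(groups, cols):
    """Wilks' lambda = det(W) / det(T) for the descriptor subset `cols`."""
    if not cols:
        return 1.0
    p = len(cols)
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        S = sscp(g, cols)
        for a in range(p):
            for b in range(p):
                within[a][b] += S[a][b]
    dt = det(sscp([r for g in groups for r in g], cols))
    return det(within) / dt if dt else 0.0


def forward_stepwise(groups, n_vars, f_to_enter=4.0):
    """Enter, one at a time, the descriptor with the largest partial F."""
    n, g = sum(len(x) for x in groups), len(groups)
    selected = []
    while True:
        lam_now, p = wilks_lambda(groups, selected), len(selected)
        best, best_f = None, f_to_enter
        for j in range(n_vars):
            if j in selected:
                continue
            lam_new = wilks_lambda(groups, selected + [j])
            if lam_new <= 0.0:  # singular matrices: skip this candidate
                continue
            partial = lam_new / lam_now
            f = (n - g - p) / (g - 1) * (1.0 - partial) / partial
            if f > best_f:
                best, best_f = j, f
        if best is None:
            return selected
        selected.append(best)


# Toy data: descriptor 0 separates the groups, descriptors 1 and 2 do not
active = [[5.0, 1.0, 0.3], [5.5, 1.2, 0.9], [6.0, 0.8, 0.5], [5.2, 1.1, 0.1], [5.8, 0.9, 0.7]]
inactive = [[1.0, 1.1, 0.4], [1.5, 0.9, 0.8], [0.8, 1.0, 0.2], [1.2, 1.2, 0.6], [1.4, 0.8, 0.5]]
selected = forward_stepwise([active, inactive], n_vars=3)
print(selected)
```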
The selected function was validated using an external test set. The compounds in the test set, approximately 20% of the data, were selected at random and were not used to set up the DF equation.
Another important parameter that usually provides a balanced evaluation of the model's prediction is the Matthews correlation coefficient (MCC) [19]. This coefficient is based on the fact that any prediction process has four possible outcomes to account for: true positives (TP, active compounds classified as active), true negatives (TN, inactive compounds classified as inactive), false positives (FP, inactive compounds classified as active) and false negatives (FN, active compounds classified as inactive). Therefore, any single number that represents the predictive power of the method must account for all four possibilities, and the MCC fulfils this requirement. Matthews' coefficient is defined as shown in Equation (1):
$$ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (1) $$
The Matthews correlation coefficient ranges from −1 ≤ MCC ≤ 1. A value of MCC = 1 indicates the best possible prediction, in which every compound in the model is correctly classified, whereas MCC = −1 corresponds to the worst possible case (anti-correlation), in which every compound is misclassified. Finally, MCC = 0 is what would be expected for a random prediction.

Pharmacological-Activity Distribution Diagrams
A pharmacological distribution diagram (PDD) is a graphical representation that provides a straightforward way of visualizing the regions of minimum overlap between active and inactive compounds, as well as the regions in which the probability of finding active compounds is at a maximum [20].
In practice, a PDD is a frequency distribution diagram of a dependent variable in which the ordinate represents the expectancy (probability of activity) and the abscissa represents the DF value intervals. For an arbitrary range of values of a given function, an "expectancy of activity" can be defined as Ea = a/(i + 1), where "a" is the number of active compounds in the range divided by the total number of active compounds and "i" is the number of inactive compounds in the interval divided by the total number of inactive compounds. The expectancy of inactivity is defined symmetrically as Ei = i/(a + 1). With these diagrams, it is easy to visualize the intervals in which there is a maximum probability of finding new active compounds and a minimum probability of finding inactive compounds.
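The expectancies defined above can be tabulated per DF interval with a few lines of code. This is a minimal sketch assuming half-open intervals [lo, hi); the function name `pdd`, the interval edges and the DF values are hypothetical and serve only to illustrate the arithmetic.

```python
def pdd(df_active, df_inactive, edges):
    """Expectancies of activity (Ea) and inactivity (Ei) per DF interval.

    a = fraction of all actives whose DF falls in [lo, hi)
    i = fraction of all inactives whose DF falls in [lo, hi)
    Ea = a / (i + 1), Ei = i / (a + 1)
    """
    n_a, n_i = len(df_active), len(df_inactive)
    table = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        a = sum(lo <= v < hi for v in df_active) / n_a
        i = sum(lo <= v < hi for v in df_inactive) / n_i
        table.append((lo, hi, a / (i + 1), i / (a + 1)))
    return table


# Hypothetical DF values, for illustration only
df_active = [1.5, 2.5, 3.0, 4.2]
df_inactive = [-3.0, -1.0, -0.5, 2.1]
table = pdd(df_active, df_inactive, [-4, -2, 0, 2, 4, 6])
for lo, hi, ea, ei in table:
    print(f"[{lo:>3}, {hi:>3})  Ea = {ea:.2f}  Ei = {ei:.2f}")
```

In this toy example the interval [2, 4) has the highest Ea (0.40), even though it also contains one inactive compound; plotting Ea and Ei against the intervals gives the PDD.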

Topological Virtual Screening
The topological model resulting from the DF was used to find new natural anti-inflammatory compounds. A group of compounds from the MicroSource Pure Natural Products Collection database that had been used neither in the training set nor in the test set was screened in search of potential new anti-inflammatory natural compounds.

Similarity Study
A study of compound similarity was first carried out to ensure that no simple or evident structural features discriminate between the molecules that make up the data set. Thus, molecular weight (MW), the partition coefficient (logP, estimated with Dragon software [17]) and the Randić index (¹χ) were calculated for all compounds in the database. These descriptors provide information about molecular size, lipophilicity and molecular branching, respectively. Similar values were obtained for the active and the inactive compounds. Hence, the set is well balanced, and no obvious structural differences are expected to distort the study. The results obtained with the test set are similar to those of the training set (see Figure 1).
If we compare these values with those obtained for the anti-inflammatory natural compounds selected by the virtual screening, i.e., 276.75 (MW), 2.00 (logP) and 8.12 (¹χ), we can see that the predicted anti-inflammatory natural compounds show lower values of all three parameters; therefore, the structures selected from natural compounds differ from those already known and used in the training set.
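As a reminder of why ¹χ reflects branching, the Randić index is commonly defined on the hydrogen-suppressed molecular graph as ¹χ = Σ over bonds (δ_i·δ_j)^(−1/2), where δ_i is the number of heavy-atom neighbours of atom i. The sketch below (hypothetical helper `randic_index`, bonds given as atom-index pairs) does not reproduce Dragon's implementation; it only shows that n-butane and isobutane, with the same number of atoms, get different values.

```python
import math
from collections import defaultdict


def randic_index(bonds):
    """First-order connectivity (Randic) index of a hydrogen-suppressed graph.

    bonds: list of (i, j) atom-index pairs, one entry per bond.
    """
    degree = defaultdict(int)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


n_butane = [(0, 1), (1, 2), (2, 3)]   # linear C4 chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # branched C4
print(round(randic_index(n_butane), 3), round(randic_index(isobutane), 3))
# prints: 1.914 1.732
```

The branched isomer gets the lower value, which is the sense in which ¹χ is read as a branching descriptor.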

Mathematical Modeling
The mathematical model was developed from a training set of 412 compounds with heterogeneous molecular structures. Although the numbers of active (123) and inactive (289) compounds in the training set were not similar, this was offset by building the model so that every compound has the same statistical weight.
By applying this criterion to the training set (412 compounds; see Supplementary material, Annex I, for details), 61 out of 123 experimentally active compounds were correctly classified as such (50% accuracy), and 284 out of 289 experimentally inactive compounds were also correctly classified (98% accuracy), as can be seen in Table 2. Altogether, the average correct classification for the entire set of compounds (the mean of the active and inactive accuracies) was 74%. The percentage of correctly classified compounds within a particular category (active or inactive) was calculated with Equation (3): Classification accuracy (%) = CCC × 100/TNC, where CCC is the number of correctly classified compounds and TNC is the total number of compounds in that category.
Regarding the Matthews correlation coefficient, which ranges from −1 to +1, our model gives a value of 0.6, which supports its reliability.
Furthermore, the Matthews correlation coefficient was also expressed on a shifted scale by adding 1 to its value, so that the outcome can be read as a percentage accuracy: 0 corresponds to complete disagreement (MCC = −1), 1 to a random prediction (50%) and 2 to perfect prediction (100%). On this scale, our model's yield was 80% (modified MCC = 1.6).
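The training-set figures quoted above can be checked with a short calculation. The sketch below takes the confusion counts given in the text (TP = 61, FN = 62, TN = 284, FP = 5) and implements Equation (3), Equation (1) and the shifted MCC scale; the function names are illustrative only.

```python
import math


def accuracy(correct, total):
    """Equation (3): classification accuracy (%) = CCC x 100 / TNC."""
    return correct * 100.0 / total


def mcc(tp, tn, fp, fn):
    """Equation (1): Matthews correlation coefficient (0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Training-set counts reported in the text
tp, fn = 61, 123 - 61    # actives: correctly / incorrectly classified
tn, fp = 284, 289 - 284  # inactives: correctly / incorrectly classified

acc_active = accuracy(tp, tp + fn)
acc_inactive = accuracy(tn, tn + fp)
mean_acc = (acc_active + acc_inactive) / 2
m = mcc(tp, tn, fp, fn)
modified = m + 1  # shifted scale: 0 (MCC = -1) to 2 (MCC = 1)
print(round(acc_active), round(acc_inactive), round(mean_acc),
      round(m, 2), round(modified * 50))
# prints: 50 98 74 0.6 80
```

Note that the 74% figure is the mean of the two per-class accuracies; the pooled fraction of correct classifications, (61 + 284)/412, would be higher because inactive compounds dominate the set.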
To establish the adequate range of activity, we analyzed the pharmacological distribution diagram obtained with the discriminant function, DF.
Figure 2 shows that the training-set compounds have DF values in the range −7 < DF < 8. Outside this range, a compound's classification is uncertain and it is labeled as not classified (NC), i.e., an outlier.
An easy way to evaluate the quality of the discriminant function is to apply it to an external group. In our case, this group was made up of 84 compounds (41 active and 43 inactive) that had not been included in the DF calculation, about 20% of the data. Table 3 outlines the prediction obtained for every compound of the test set. As can be seen in Table 2, the success rate in the active group increases to 59% (24 of the 41 compounds analyzed were correctly classified). In the inactive group of the test set, only six compounds were misclassified, a correct-classification rate of 86% (37 of 43). This indicates that the DF has a high specificity, i.e., it reliably recognizes inactive compounds as inactive, so the number of false actives is expected to be minimized. Although the DF will miss some active compounds, the important point is that the risk of including false actives is low when a database is screened for anti-inflammatory natural compounds.
As illustrated in Table 3, there is just one outlier or uncertain compound in the inactive group, namely Tropine, whose DF value exceeds the range of application of the model.
Equation (2) contains topological descriptors that evaluate the molecular bonds (TI1), the atomic masses (ATS7m), the van der Waals volumes (ATS4v and ATS7v) and, finally, the atomic polarizabilities of the molecules (ATS1p).
Although it is not easy to unravel the structural features underlying the discriminant equation, some insight can be gained from its most relevant indices, namely TI1, ATS7m, ATS4v, ATS7v and ATS1p. Each of these indices refers to a specific physical or chemical property of the molecule.
For example, the Moreau-Broto (ATS) autocorrelation descriptors represent the interactions between atoms at topological distance k (lag k) for a particular atomic property (weighting factor). In our case, the weighting factors are the atomic polarizability, the van der Waals volume and the atomic mass. These descriptors seem to be sensitive to molecular branching and cyclicity.
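A minimal sketch of the basic Moreau-Broto autocorrelation at lag k: the sum of the products w_i·w_j over all atom pairs whose topological distance (shortest-path bond count) equals k. Assumptions: a hydrogen-suppressed graph, each unordered pair counted once and no scaling. Descriptor packages such as Dragon may count pairs differently or apply a logarithmic transform, so the values here will not match the ATS values of Equation (2); the function names and weights are hypothetical.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """All-pairs topological distances (bond counts) via breadth-first search."""
    adj = [[] for _ in range(n_atoms)]
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for s in range(n_atoms):
        d = [-1] * n_atoms  # -1 marks atoms not reached (disconnected)
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[w] < 0:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist.append(d)
    return dist


def moreau_broto(weights, bonds, lag):
    """Sum of w_i * w_j over unordered atom pairs at topological distance `lag`."""
    n = len(weights)
    dist = distance_matrix(n, bonds)
    return sum(weights[i] * weights[j]
               for i in range(n) for j in range(i + 1, n)
               if dist[i][j] == lag)


# Toy example: a three-atom chain with hypothetical weights 1, 2, 3
chain = [(0, 1), (1, 2)]
print(moreau_broto([1, 2, 3], chain, 1), moreau_broto([1, 2, 3], chain, 2))
# prints: 8 3
```

With unit weights, a six-membered ring has three atom pairs at lag 3, whereas a six-atom open chain has its longest pair at lag 5, which illustrates how these descriptors pick up cyclicity.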
A general overview of the active and inactive compounds reveals some differences. For example, the active compounds typically bear hydroxyl groups (low mass, electron acceptors) which, unlike in the inactive compounds, are placed far away from the carbonyl groups. On the other hand, the inactive set includes compounds bearing methoxy groups (higher mass, electron donors). In general, there are fewer cyclic compounds in the inactive set. See, as examples, capsaicin and p-hydroxycinnamaldehyde among the actives, and rhodinyl acetate and theanine among the inactives (Figure 3).
The active compounds often show a higher polarizability (accounted for by the index ATS1p) than the inactive ones, which is compatible with their larger molecular volume and the presence of hydroxyl groups. The hydroxyl groups would also play a key role in molecular solubility and in the molecule's capability to form hydrogen bonds, both well-known factors influencing activity. Given the high structural heterogeneity of the molecules used to build the model, it can be applied to large databases of natural compounds to search for new active compounds.

Topological Virtual Screening
Based on the model described above, a virtual screening was carried out on a database of heterogeneous natural compounds. We used some of the compounds of the library MicroSource Pure Natural Products Collection, that were not used for the construction of the model or for the external validation and we performed a virtual screening searching for anti-inflammatory natural compounds. The library composition can be obtained from the MicroSource Discovery Systems website [15].
As shown in Table 4, 74 natural compounds with DF values in the range 2 < DF < 6 were selected as potential anti-inflammatories. Almost all of them were commercially available. In Table 4, nr indicates that the compound is not referenced as an anti-inflammatory in the literature.
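The screening logic can be sketched as follows. Because the coefficients of Equation (2) are not reproduced in this text, the coefficients and intercept below are placeholders only, and the compound names and descriptor values are invented. The sketch just shows how compounds outside the range of application (−7 < DF < 8) are flagged as not classified and how those inside the selection window 2 < DF < 6 are kept and ranked.

```python
# PLACEHOLDER coefficients: the real ones are those of Equation (2), not given here
COEFFS = {"TI1": 0.5, "ATS7m": -1.2, "ATS4v": 0.8, "ATS7v": 1.1, "ATS1p": 0.9}
INTERCEPT = -3.0


def discriminant(desc):
    """Linear discriminant function DF = intercept + sum(coef * descriptor)."""
    return INTERCEPT + sum(c * desc[name] for name, c in COEFFS.items())


def screen(library, window=(2.0, 6.0), domain=(-7.0, 8.0)):
    """Return (hits inside the DF window, compounds outside the model's range)."""
    hits, not_classified = [], []
    for name, desc in library.items():
        df = discriminant(desc)
        if not domain[0] < df < domain[1]:
            not_classified.append(name)  # outlier, labeled NC
        elif window[0] < df < window[1]:
            hits.append((name, df))      # predicted anti-inflammatory
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits, not_classified


# Invented descriptor values, for illustration only
zero = {"TI1": 0, "ATS7m": 0, "ATS4v": 0, "ATS7v": 0, "ATS1p": 0}
library = {
    "cmpd_A": {"TI1": 4, "ATS7m": 1, "ATS4v": 2, "ATS7v": 1, "ATS1p": 2},
    "cmpd_B": dict(zero),
    "cmpd_C": dict(zero, TI1=20),
    "cmpd_D": dict(zero, TI1=30),
}
hits, nc = screen(library)
print(hits, nc)
```

Here cmpd_B and cmpd_C fall inside the range of application but outside the selection window, so they are neither selected nor flagged.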
As Table 4 shows, most of the selected compounds (55/74) had previously been described as anti-inflammatory in the literature (see column 9), which is highly encouraging and provides additional support for the model's performance. There are clearly many ways of applying the model described herein to the search for new anti-inflammatory natural compounds. Although 19 of the 74 compounds have no reported anti-inflammatory activity, it cannot be determined whether this reflects true inactivity or simply the absence of laboratory testing. Testing these compounds to see whether some of them do show anti-inflammatory activity will therefore be an attractive challenge for us.
Virtual screening is increasingly accepted in the pharmaceutical industry as a cost-effective and timely strategy for analyzing very large chemical data sets. Although screening large databases can be computationally demanding, it provides a rational basis for deciding which compounds are likely to be potent hits or leads. The results outlined here show not only that Topological Virtual Screening can accurately reproduce well-known pharmacological activity, but also that it is a further step toward demonstrating the efficiency of in silico methods based on Molecular Topology.

Conclusions
The joint use of topological-structural descriptors and a statistical treatment based on discriminant analysis has proved to be a very efficient methodology for selecting new natural compounds with anti-inflammatory activity. The mathematical model obtained can readily be applied to the search for new natural compounds in large databases or even to drug design. These results confirm the usefulness of Molecular Topology as a powerful tool in the search for new drugs.