Chemoinformatic Analysis as a Tool for Prioritization of Trypanocidal Marine-Derived Lead Compounds

Marine trypanocidal natural products are most often reported with their trypanocidal activity and their selectivity relative to human cell lines. Triaging such hits, however, also requires consideration of their chemical tractability for drug development. We combined Lipinski's rule-of-five, chemical clustering and ChemGPS-NP principal component analysis to analyze the drug-like properties and chemical space of a set of 40 antitrypanosomal natural products. The analyses identified 16 chemical clusters, 11 of which are well positioned within drug-like chemical space. This study demonstrates that the combined analysis can be used as an important strategy for prioritizing active marine natural products for further investigation.


Introduction
Human African Trypanosomiasis (HAT), also known as African Sleeping Sickness, is a fatal disease caused by two subspecies of the protozoan parasite Trypanosoma brucei: T. brucei rhodesiense and T. brucei gambiense. T. b. rhodesiense causes the acute form of the disease, which prevails in Eastern and Southern Africa. T. b. gambiense causes the chronic form of the disease in Western and Central Africa. According to the latest figures from the World Health Organization (WHO), African Sleeping Sickness threatens 70 million people in resource-poor regions of Africa and is the world's third most devastating parasitic disease [1]. Because the disease predominantly afflicts the very poor, it is designated a neglected tropical disease. The registered drugs suramin and pentamidine are effective only against the first stage of the disease. Melarsoprol is effective against the second stage, but it is associated with toxicity that has been reported to be lethal in up to 12% of cases [2]. Drug resistance has also been reported in HAT cases [2,3]. There is therefore an urgent need for new, safer and more effective drugs to fight African Sleeping Sickness.
The search for antitrypanosomal agents has focused predominantly on synthetic efforts. A series of purine nitriles synthesized by combinatorial chemistry showed potent trypanocidal activity and a high degree of selectivity [4]. Most recently, Sanofi-Aventis and the Drugs for Neglected Diseases initiative (DNDi) announced an agreement for the development, manufacture and distribution of fexinidazole, a promising new drug for the treatment of African Sleeping Sickness [5]. Natural product research has not played a central role in the search for antitrypanosomal therapeutics. Even so, a growing number of compounds from plants and marine organisms show promising activity against trypanosomiasis [6–8].
We have previously reported a series of marine natural products active against T. b. brucei [9–13]. In most cases, natural products research stops once new structures and their associated biological activities are published. We wished to develop a strategy to prioritize these molecules for further investigation. We therefore evaluated the drug-like properties and chemical space of these and other compounds. In this paper, we discuss the chemoinformatic methods used for this analysis and their results. The methods are Lipinski's rule-of-five, chemical clustering and ChemGPS-NP principal component analysis.

Results and Discussion
The overall outline of the natural product discovery program is shown in Figure 1. The program had three objectives. The first was to front-load the crude extracts and subsequent fractions with constituents that have desirable physicochemical properties. The second was to rapidly isolate natural products located principally within biologically relevant chemical space. The third was to prioritize the isolated compounds for further chemical and biological investigation.

Marine Fraction Library
A pre-fractionated library was constructed using a proprietary lead-like enhanced extraction and fractionation protocol developed in-house [14,15]. The crude CH2Cl2 and MeOH extracts from Australian marine organisms were first loaded onto a solid-phase adsorbent, poly(divinylbenzene-N-vinylpyrrolidone) copolymer (Waters Oasis HLB). They were eluted with MeOH/H2O (70:30) containing 1% trifluoroacetic acid (TFA). This MeOH/H2O (70:30) eluent was shown to contain constituents with calculated log P < 5. The fraction library was then generated under reverse-phase conditions (MeOH/H2O/0.1% TFA) on a C18 monolithic HPLC column. Eleven fractions were collected per extract between 2 and 7 min of the chromatogram, where constituents had calculated log P < 5 (Figure 2). This fractionation step provided a second log P filter, removing constituents with high log P. It also separated the complex crude extracts into fractions containing a small number of compounds, which facilitated rapid identification of active molecules.
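The collection window above can be made concrete with a small sketch. The source states only that eleven fractions were collected between 2 and 7 min. The sketch assumes equal fraction widths of 5/11 min (about 27 s each), which is an illustration rather than a documented instrument setting. The function names are hypothetical.

```python
# Hypothetical sketch: divide the 2-7 min collection window into eleven
# fractions. Equal fraction widths are an assumption, not a documented setting.
from typing import List, Optional, Tuple


def fraction_windows(start_min: float = 2.0, end_min: float = 7.0,
                     n_fractions: int = 11) -> List[Tuple[float, float]]:
    """Return (start, end) times in minutes for each fraction."""
    width = (end_min - start_min) / n_fractions
    return [(start_min + i * width, start_min + (i + 1) * width)
            for i in range(n_fractions)]


def fraction_for_time(t_min: float, start_min: float = 2.0,
                      end_min: float = 7.0, n_fractions: int = 11) -> Optional[int]:
    """Return the 1-based fraction collecting eluent at t_min,
    or None if t_min lies outside the collection window."""
    if not (start_min <= t_min < end_min):
        return None
    width = (end_min - start_min) / n_fractions
    return int((t_min - start_min) // width) + 1


if __name__ == "__main__":
    for i, (a, b) in enumerate(fraction_windows(), start=1):
        print(f"F{i:02d}: {a:.2f}-{b:.2f} min")
```

Eluent outside the window returns `None`. This mirrors the intent that earlier-eluting and later-eluting (high log P) material is not collected into the library.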
A total of 4765 Australian marine organisms, representing over 200 families and 420 genera, were extracted and fractionated to construct a marine library of 52,415 fractions. The organisms were collected from tropical and sub-tropical Queensland waters and temperate Tasmanian waters in Australia.
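The library size is a useful consistency check on the fractionation scheme. The check assumes one combined crude extract per organism, as in the library construction protocol, where the CH2Cl2/MeOH and MeOH extracts are collected in the same tube. It also uses the eleven fractions per extract stated above:

```latex
4765 \ \text{extracts} \times 11 \ \text{fractions per extract} = 52\,415 \ \text{fractions}
```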

Lipinski's Rule-of-Five
The drug- and lead-like physicochemical properties of these natural products were calculated using Instant JChem (version 6.03) [40]. Four parameters were assessed against Lipinski's rule-of-five (Table 1 and Figure 6): molecular weight (MW), log P, number of hydrogen bond acceptors (HBA) and number of hydrogen bond donors (HBD). The results suggested that the majority of the isolated natural products obeyed the rule-of-five. The proportions were MW < 500 Da (92%), log P < 5 (87.5%), HBA < 10 (97.5%) and HBD < 5 (97.5%). We have, however, previously reported that log D 5.5 is a more useful parameter for classifying the lipophilicity of ionisable natural products [14].
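The per-criterion percentages reported above can be tallied with the minimal sketch below. The `Compound` record and the example values are hypothetical. In practice the descriptors would come from a property calculator such as Instant JChem.

The cut-offs follow the strict inequalities used in the text. Lipinski's original formulation flags values above the limits: MW over 500, log P over 5, more than 10 acceptors or more than 5 donors. Boundary cases may therefore need `<=` instead.

```python
# Minimal sketch of a rule-of-five tally on hypothetical descriptor values.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Compound:
    name: str
    mw: float    # molecular weight, Da
    logp: float  # calculated log P
    hba: int     # hydrogen bond acceptors
    hbd: int     # hydrogen bond donors


# Thresholds as written in the text (strict inequalities).
CRITERIA = {
    "MW < 500 Da": lambda c: c.mw < 500,
    "log P < 5": lambda c: c.logp < 5,
    "HBA < 10": lambda c: c.hba < 10,
    "HBD < 5": lambda c: c.hbd < 5,
}


def violations(c: Compound) -> int:
    """Count how many of the four criteria a compound fails."""
    return sum(not test(c) for test in CRITERIA.values())


def pass_rates(compounds: List[Compound]) -> Dict[str, float]:
    """Percentage of compounds satisfying each criterion."""
    n = len(compounds)
    return {label: 100.0 * sum(test(c) for c in compounds) / n
            for label, test in CRITERIA.items()}


if __name__ == "__main__":
    demo = [
        Compound("A", 350.4, 2.1, 5, 2),
        Compound("B", 612.7, 5.8, 11, 6),
        Compound("C", 480.0, 4.2, 7, 3),
    ]
    print(pass_rates(demo))
    print({c.name: violations(c) for c in demo})
```

Lipinski's rule predicts poor absorption when more than one criterion is violated. `violations(c) > 1` is therefore the usual flag, alongside the per-criterion rates.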

Chemical Clustering
Cluster analysis of the isolated natural products was undertaken to identify congeneric chemical series. Canvas (version 1.6) was used to calculate 32-bit linear, path-based chemical fingerprints using Daylight invariant atom types [41]. To enhance the discriminating power of the chemical fingerprint, bits present in more than 95% or less than 5% of compounds were discarded.
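The bit-frequency filter and the Tanimoto similarity used for clustering can be sketched in plain Python. Fingerprints are modelled here as sets of "on" bit indices, an assumption made for illustration. Canvas's own fingerprint generation with Daylight invariant atom types is not reproduced, and the bit identifiers are hypothetical.

```python
# Sketch: discard fingerprint bits that are nearly always or nearly never set,
# then compute pairwise Tanimoto similarities on the remaining bits.
from collections import Counter
from typing import List, Set


def filter_bits(fps: List[Set[int]], low: float = 0.05,
                high: float = 0.95) -> List[Set[int]]:
    """Keep only bits set in at least `low` and at most `high` of molecules."""
    n = len(fps)
    # Each fingerprint is a set, so every bit is counted once per molecule.
    freq = Counter(bit for fp in fps for bit in fp)
    keep = {bit for bit, count in freq.items() if low <= count / n <= high}
    return [fp & keep for fp in fps]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for binary fingerprints.
    Two empty fingerprints are treated as identical (a convention choice)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_matrix(fps: List[Set[int]]) -> List[List[float]]:
    """Full symmetric matrix of pairwise Tanimoto similarities."""
    return [[tanimoto(a, b) for b in fps] for a in fps]
```

Removing a bit set in every molecule does not change which molecules are most similar. It does reduce the inflated similarity that shared, uninformative substructures contribute to the Tanimoto score.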
Hierarchical clustering was then performed on a Tanimoto similarity matrix calculated from the fingerprints, using the average distance between all inter-cluster pairs. The Kelley criterion [41] suggested that 11 clusters were statistically optimal. Nevertheless, the merging distance was manually decreased until the resulting clusters formed more structurally homogeneous groups. The hierarchical clustering grouped the 40 natural products into 11 chemical classes and 5 singletons, as shown in Figure 7; cluster memberships are shown in red text in Figure 5. The structural classes identified by the hierarchical clustering showed an excellent correlation with the partitioning of the observed antitrypanosomal activity [1][2][3][4][5]. The most active chemical classes identified were:

- the pyridoacridine alkaloids (12–14) in cluster 1;
- the cinnamoyl amino acids (10, 11) in cluster 6;
- the aryl amines (8, 9) in cluster 7;
- the cyclic peroxides (1, 2) in cluster 16.
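Average-linkage agglomerative clustering with a merging-distance cut-off can be sketched as follows. The sketch assumes a Tanimoto distance of 1 - similarity, and the function names are hypothetical. The Kelley criterion is not implemented.

Lowering `max_merge_distance` stops merging earlier and yields more, tighter clusters, which mirrors the manual adjustment described above. For larger data sets, SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` followed by `fcluster(..., criterion="distance")` offers an optimized implementation of the same linkage and cut.

```python
# Sketch: average-linkage (UPGMA-style) agglomerative clustering on
# Tanimoto distance, stopped at a chosen merging distance.
from itertools import combinations
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def average_linkage(fps: List[Set[int]], max_merge_distance: float) -> List[List[int]]:
    """Return clusters as lists of compound indices.

    Starting from singletons, repeatedly merge the pair of clusters with the
    smallest mean pairwise distance. Stop once that distance exceeds
    `max_merge_distance`.
    """
    n = len(fps)
    dist = [[1.0 - tanimoto(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]

    def mean_distance(a: List[int], b: List[int]) -> float:
        # Average over all inter-cluster pairs (average linkage).
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (p, q), d = min(
            (((p, q), mean_distance(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_merge_distance:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p remains valid
    return clusters


if __name__ == "__main__":
    demo = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
    print(average_linkage(demo, max_merge_distance=0.6))
```

This brute-force version recomputes all cluster-pair distances at each step. That is adequate for tens of compounds but not for large libraries.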

ChemGPS-NP Analysis
Rather than considering each physicochemical property in isolation, we were also interested in how these properties combine to influence the drug-likeness of the isolated natural products. ChemGPS-NP is a "global map" representing the limits of biologically relevant chemical space. Its individual coordinates are t-scores from a principal component analysis (PCA) of 35 descriptors calculated for 1779 chemical structures [42]. ChemGPS-NP comprises eight coordinate dimensions (principal components, PCs). The four most significant PCs explain 77% of the variance in the training data and can be interpreted as representing broad physical properties:

- size, shape and polarizability (PC1);
- aromatic- and conjugation-related properties (PC2);
- lipophilicity, polarity and H-bond capacity (PC3);
- flexibility and rigidity (PC4).
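ChemGPS-NP coordinates are t-scores, that is, projections of scaled descriptors onto principal components. The sketch below fits a PCA by singular value decomposition and projects compounds into the resulting score space. It relies on two assumptions: random, hypothetical descriptor data and unit-variance scaling. It does not reproduce the published ChemGPS-NP loadings, which are applied through the ChemGPS-NP web service. The sign of each principal component is arbitrary.

```python
# Sketch: PCA via SVD on autoscaled descriptors, then projection of
# compounds onto the first few components to obtain t-scores.
# The data are random and hypothetical; these are not ChemGPS-NP loadings.
import numpy as np


def fit_pca(X: np.ndarray, n_components: int):
    """Fit PCA on autoscaled data (zero mean, unit variance per descriptor)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of total variance per PC
    return mean, std, Vt[:n_components].T, explained[:n_components]


def t_scores(X_new: np.ndarray, mean, std, loadings) -> np.ndarray:
    """Project compounds into the fitted principal-component space."""
    return ((X_new - mean) / std) @ loadings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 35))  # 200 compounds x 35 descriptors
    mean, std, P, ratio = fit_pca(train, n_components=4)
    print("variance explained by PC1-PC4:", ratio.sum())
    print(t_scores(train[:3], mean, std, P))
```

New compounds must be scaled with the training mean and standard deviation, not their own. Otherwise their coordinates are not comparable with the reference map.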

General Experimental Procedures
All solvents used for SPE, HPLC and MS were Lab-Scan HPLC grade, and the H2O was Millipore Milli-Q PF filtered. Dimethyl sulfoxide (DMSO, 99.9%) and TFA (99%) were obtained from Fluka. Oasis HLB was obtained from Waters. Commercially available Oasis HLB cartridges (400 mg) were employed to generate the fraction library. HPLC separations were performed on a Phenomenex C18 monolithic HPLC column (4.6 mm × 100 mm).
A Bio-line orbital shaker was used for large-scale extractions. Alltech Davisil 40–60 μm 60 Å C18-bonded silica was used for pre-adsorption work. A Waters 600 pump equipped with a Waters 996 PDA detector and a Waters 717 autosampler were used for HPLC. A Gilson 215 liquid handler (5 mL syringe, 200 µL Rheodyne sample loop) was used for injection and fraction collection. The liquid handler was controlled by Gilson 735 software (version 6.00). A ThermoElectron C18 Betasil 5 μm 143 Å column (21.2 mm × 150 mm) was used for semi-preparative HPLC separations [11].

Animal Material
The marine samples were collected in Queensland and Tasmania, Australia, by SCUBA diving. Samples were kept frozen prior to freeze-drying and extraction. Voucher samples have been lodged at the Queensland Museum, Brisbane, Australia.

Construction of Fraction Library
Freeze-dried and ground marine invertebrate samples (300 mg) were extracted with n-hexane (7 mL). The n-hexane extract was discarded. Each sample was then extracted with CH2Cl2/MeOH (80:20, 7 mL), and the extract was dried. A second extract using MeOH (13 mL) was collected in the same glass test tube and dried to afford the crude extract. Further details of the extraction and fractionation protocols are given in our previous publication [14].

Extraction and Isolation
The freeze-dried and ground marine organism (10 g) was placed in a conical flask (1 L). n-Hexane (250 mL) was added, and the flask was shaken at 200 rpm for 2 h. The n-hexane extract was gravity filtered and discarded. CH2Cl2/MeOH (4:1, 250 mL) was added to the defatted sponge material in the conical flask, and the flask was shaken at 200 rpm for 2 h. The resulting extract was gravity filtered and set aside. MeOH (250 mL) was then added, and the mixture was shaken for a further 2 h at 200 rpm. Following gravity filtration, the biota was extracted with another volume of MeOH (250 mL) with shaking at 200 rpm for 16 h. All CH2Cl2/MeOH extracts were combined and dried under reduced pressure to yield the crude extract. A portion of this material (1.0 g) was pre-adsorbed onto C18-bonded silica (1 g) and packed into a stainless steel cartridge (10 × 30 mm). The cartridge was then attached to a C18 semi-preparative HPLC column. The elution program was as follows, all at a flow rate of 9 mL/min:

- isocratic 90% H2O (0.1% TFA)/10% MeOH (0.1% TFA) for the first 10 min;
- a linear gradient to 100% MeOH (0.1% TFA) over 40 min;
- isocratic MeOH (0.1% TFA) for a further 10 min.

Sixty 1-min fractions were collected from the start of the HPLC run. Fractions of interest were analyzed by LC-MS and bioassay. Further purification was carried out predominantly by reverse-phase C18 HPLC, eluting with MeOH/H2O gradients containing 0.1% TFA, to yield pure natural products.
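The semi-preparative gradient program can be expressed as a piecewise-linear function of time. The three segments total 10 + 40 + 10 = 60 min, which matches the sixty 1-min fractions. At 9 mL/min, each fraction has a nominal volume of 9 mL. The function names are hypothetical. The sketch also ignores system dwell volume, so the composition actually reaching the column lags the programmed values.

```python
# Sketch of the programmed mobile-phase composition for the semi-preparative
# run: 10 min at 10% MeOH, a 40 min linear ramp to 100% MeOH, then 10 min at
# 100% MeOH (both eluents contain 0.1% TFA). Dwell volume is ignored.

FLOW_ML_PER_MIN = 9.0
FRACTION_MINUTES = 1.0
FRACTION_VOLUME_ML = FLOW_ML_PER_MIN * FRACTION_MINUTES  # nominal 9 mL


def percent_meoh(t_min: float) -> float:
    """Programmed %MeOH at time t_min (0-60 min)."""
    if not 0.0 <= t_min <= 60.0:
        raise ValueError("time outside the 60 min run")
    if t_min <= 10.0:
        return 10.0
    if t_min <= 50.0:
        # Linear ramp from 10% at 10 min to 100% at 50 min.
        return 10.0 + (100.0 - 10.0) * (t_min - 10.0) / 40.0
    return 100.0


def fraction_number(t_min: float) -> int:
    """1-based index of the 1 min fraction collecting eluent at t_min."""
    if not 0.0 <= t_min < 60.0:
        raise ValueError("time outside the collection window")
    return int(t_min // FRACTION_MINUTES) + 1


if __name__ == "__main__":
    for t in (0, 10, 30, 50, 59.5):
        print(t, percent_meoh(t), fraction_number(t))
```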
Canvas (version 1.6) [41] was used to calculate 32-bit linear fingerprints with Daylight invariant atom types, excluding bits set in less than 5% or more than 95% of molecules. Hierarchical clustering with average linkage was performed on a Tanimoto similarity matrix calculated from the chemical fingerprints. The clustering level was manually adjusted (merging distance 0.7569), and the 16 resulting cluster centroids were exported for ChemGPS-NP analysis.
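After the dendrogram is cut at the chosen merging distance (0.7569), one representative per cluster is exported. The sketch below picks, for each cluster, the member with the highest mean Tanimoto similarity to the other members. This medoid-style definition of "centroid" is an assumption, and Canvas may define cluster centroids differently. The function names are hypothetical.

```python
# Sketch: pick one representative per cluster as the member with the
# highest mean Tanimoto similarity to the rest of its cluster.
# This medoid-style rule is an assumption, not necessarily Canvas's definition.
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def cluster_centroid(members: List[int], fps: List[Set[int]]) -> int:
    """Return the index of the most central member of one cluster."""
    if len(members) == 1:
        return members[0]  # a singleton is its own representative

    def mean_sim(i: int) -> float:
        others = [j for j in members if j != i]
        return sum(tanimoto(fps[i], fps[j]) for j in others) / len(others)

    # Ties are resolved in favour of the member listed first.
    return max(members, key=mean_sim)


def centroids(clusters: List[List[int]], fps: List[Set[int]]) -> List[int]:
    """One representative compound index per cluster."""
    return [cluster_centroid(c, fps) for c in clusters]
```

Because singletons are returned unchanged, 11 multi-member clusters plus 5 singletons would yield 16 representatives.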
ChemGPS-NP coordinates for all structures were calculated using the ChemGPS-NP online web service.

Conclusions
A combined Lipinski's rule-of-five, cluster analysis and ChemGPS-NP principal component analysis of 40 marine natural products identified 16 chemical clusters, 11 of which are positioned within drug-like chemical space. The results demonstrate that initial enrichment of the screening library based on physicochemical profiling can translate into the isolation of natural products with physicochemical properties desirable for oral bioavailability. This combined analysis of Lipinski's rule-of-five, chemical clustering and ChemGPS-NP can therefore be employed as a beneficial strategy for prioritizing active marine natural products for further investigation.