PLS-Prediction and Confirmation of Hydrojuglone Glucoside as the Antitrypanosomal Constituent of Juglans spp.

Naphthoquinones (NQs) occur naturally in a large variety of plants. Several NQs are highly active against protozoans, amongst them the causative pathogens of neglected tropical diseases such as human African trypanosomiasis (sleeping sickness), Chagas disease and leishmaniasis. Prominent NQ-producing plants can be found among Juglans spp. (Juglandaceae) with juglone derivatives as known constituents. In this study, 36 highly variable extracts were prepared from different plant parts of J. regia, J. cinerea and J. nigra. For all extracts, antiprotozoal activity was determined against the protozoans Trypanosoma cruzi, T. brucei rhodesiense and Leishmania donovani. In addition, an LC-MS fingerprint was recorded for each extract. With each extract’s fingerprint and the data on in vitro growth inhibitory activity against T. brucei rhodesiense a Partial Least Squares (PLS) regression model was calculated in order to obtain an indication of compounds responsible for the differences in bioactivity between the 36 extracts. By means of PLS, hydrojuglone glucoside was predicted as an active compound against T. brucei and consequently isolated and tested in vitro. In fact, the pure compound showed activity against T. brucei at a significantly lower cytotoxicity towards mammalian cells than established antiprotozoal NQs such as lapachol.


Introduction
Protozoal infections are a major cause of suffering and death in tropical regions and especially in developing countries. The WHO lists human African trypanosomiasis (HAT or sleeping sickness), Chagas disease and different leishmaniases, including visceral leishmaniasis (Kala-Azar), as neglected tropical diseases with protozoan parasites as causative pathogens (the hemoflagellates Trypanosoma brucei rhodesiense, T. cruzi and Leishmania donovani, respectively) [1,2]. Naphthoquinones (NQs) constitute a class of secondary plant metabolites with frequently reported activity against these pathogens. NQs are inherently susceptible to oxidative reactions as well as to the addition of nucleophiles via Michael addition, and products of such reactions have been described as plant constituents. These include, for example, dimeric NQs like diospyrin, which result from oxidative coupling of monomers and may also exhibit antiprotozoal activities [3][4][5][6]. NQs and other quinones may disturb the cellular redox status and thereby provoke the generation of reactive oxygen species (ROS) by redox cycling. Because their redox metabolism differs significantly from that of other eukaryotic cells, protozoans of the family Trypanosomatidae are strongly impaired by ROS. Yet, the poor bioavailability of NQs and their toxicity to mammalian cells have limited their application, and the necessity to find related structures with improved activity and/or reduced side effects has been stated [5][6][7].
ROS mainly arising from the photosynthetic apparatus of herbal materials under conditions of stress or senescence have been shown to significantly affect secondary metabolites [8][9][10]. Because NQs are susceptible to chemical modification, e.g., by oxidative coupling, we hypothesized that the oxidative environment provided in drying or senescing leaves of NQ-bearing plants should yield new NQ derivatives with possibly improved pharmacological properties.
We chose a chemometric approach using Partial Least Squares (PLS) regression in order to readily detect such compounds in extracts of Juglans spp. (Juglandaceae), a genus that generates the NQ juglone from its precursor hydrojuglone [11]. PLS was first introduced to phytochemistry as a predictive model for the anti-plasmodial activity of Artemisia annua L., Asteraceae [12] and was later used as a reflective model allowing for the identification of analytical signals correlated with bioactivity [13][14][15]. Our study aims at reflecting on PLS models in order to identify compounds from Juglans spp. that show antiprotozoal activity by correlating the bioactivities of multiple samples with their LC-MS fingerprints.

Results and Discussion
Due to their high content of NQ-related compounds and abundant local availability, Juglans spp. appeared to be an interesting group of plants to investigate for antiprotozoal activity. To this end, samples of different plant parts (leaves, pericarps, bark, male flowers) in varying developmental stages, including senescent parts, were collected from Juglans regia L. (JR), Juglans cinerea L. (JC) and Juglans nigra L. (JN) (all Juglandaceae) and either freeze- or air-dried. Extracts were prepared with solvents of different polarity and different extraction methods (see Experimental), yielding a total of 36 different samples. This procedure was chosen in order to maximize the chemical variability in the examined samples, which is a prerequisite for successful application of multivariate data analysis methods. All extracts were tested for in vitro growth-inhibitory activity against the Trypanosomatidae Trypanosoma brucei rhodesiense Plimmer & Bradford (TB), T. cruzi Chagas (TC) and Leishmania donovani (Laveran et Mesnil) Ross (LD). Each sample was tested against each parasite at two different concentrations, i.e., 2 and 10 μg/mL. The results, along with the main characteristics of each extract, are listed in Table 1. Activities against TC and LD were quite low and hardly ever exceeded 20% or 50% inhibition, respectively. In contrast, several extracts from the sample collection reached inhibition rates >90% against TB, while several others showed poor or no observable activity under the test conditions. In order to test our hypothesis that senescent material should contain new active NQs, we compared by paired t-test the activities of those extracts of the same kind whose original materials differed only in their physiological state (e.g., No. 7 with No. 28). Senescent samples showed a significantly lower activity against TB than green samples (p-value 0.04). Consequently, the hypothesis was rejected.
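The paired comparison described above can be illustrated with a minimal sketch. The inhibition values below are hypothetical placeholders, not the data from Table 1, and the helper name `paired_t` is our own. The sketch computes only the t statistic and the degrees of freedom. A statistics library routine such as `scipy.stats.ttest_rel` would additionally return the p-value.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the pairwise differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical % inhibition against TB for matched extract pairs
# (same plant part and extraction protocol, different physiological state).
green = [95.0, 80.0, 60.0, 99.0, 70.0]
senescent = [85.0, 65.0, 55.0, 90.0, 50.0]

t_stat, dof = paired_t(green, senescent)
print(f"t = {t_stat:.2f}, df = {dof}")
```

A positive t statistic here means that the green samples were, on average, more active than their senescent counterparts. Whether the difference is significant depends on the t distribution with the stated degrees of freedom.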
In order to identify chemical constituents in the plant materials which are responsible for the variability in the biological response data (i.e., compounds accounting for high activity of certain samples in comparison with others), we chose to apply a method of multivariate statistics, namely, Partial Least Squares (PLS) regression, correlating biological activity with LC-MS analytical data. As for any correlative analysis, a meaningful PLS regression model should benefit from an adequate number of samples sufficiently varying within both the independent and dependent variables, i.e., in this case the chemical profile and bioactivity, respectively, which was assumed to be the case in the present set of anti-TB data.
To this end, each extract's chemical fingerprint was recorded by HPLC/ESI-QTOF-MS and processed into a data matrix suitable for PLS using Bruker Daltonics ProfileAnalysis 2.0 (Bruker Daltonik GmbH, Bremen, Germany, 2010). In particular, processing involved a bucketing procedure based on the detection of isotope patterns appearing as chromatographic peaks (program function "Find Molecular Features"). This matrix of independent variables, consisting of 401 variables, each representing an m/z value:retention time pair, was then exported to The Unscrambler v. 9.2 (Camo Process AS, Oslo, Norway, 2005), subjected to unit variance scaling and amended by adding the bioactivity data as dependent variables before constructing a PLS model. The resulting PLS model for activity against TB is depicted in Figure 1. High values for the explained variance R² and the variance predicted by leave-one-out cross validation Q² indicate that a well-fitting regression model was obtained (R² = 0.908/Q² = 0.886). Reflection on this model revealed that the buckets No. 68 (11.0 min: m/z 177.06) and 70 (11.0 min: m/z 339.12), along with some buckets of minor intensity (Table 2), show strong positive loading weights on PC1. Consequently, these buckets can be considered important predictors of bioactivity. In fact, all of these buckets represent signals of pseudomolecular ions or fragments in the mass spectrum of the same compound, which is thus very likely to possess antitrypanosomal activity.

Figure 1. PLS model for activity against TB. [...] Table 1. Colour code in (a) indicates the samples' inhibitory activity from the lowest (8.8%, blue) to the highest value (100%, red). Red colour in (d) indicates cross-validated data. Numbers in regression coefficient plot (b) and loadings plot (c) relate to bucket numbers. Buckets representing (1) are highlighted by circles.
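To illustrate how loading weights point to buckets that co-vary with bioactivity, the following sketch implements a one-component PLS1 model with unit variance scaling in plain Python. It is not the algorithm as implemented in The Unscrambler, the toy matrix is invented for illustration, and all function names are our own. The first loading weight vector is taken as the normalized covariance between each scaled bucket and the centred activity.

```python
import math
from statistics import mean, stdev

def autoscale(X):
    """Centre each column and scale it to unit variance (1/SDev)."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def pls1_one_component(X, y):
    """Return loading weights w, scores t and fitted y for one PLS component."""
    Xs = autoscale(X)
    y_mean = mean(y)
    yc = [v - y_mean for v in y]
    # Loading weights: direction in x-space with maximal covariance to y
    w = [sum(row[j] * yi for row, yi in zip(Xs, yc)) for j in range(len(Xs[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(xj * wj for xj, wj in zip(row, w)) for row in Xs]
    # Regress centred y on the scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    y_hat = [y_mean + q * ti for ti in t]
    return w, t, y_hat

def r_squared(y, y_hat):
    """Explained variance of the fitted model."""
    y_mean = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Toy data: 5 samples x 3 buckets; bucket 0 tracks the activity by design.
X = [[1.0, 5.0, 2.0], [2.0, 3.0, 2.5], [3.0, 4.0, 1.5],
     [4.0, 6.0, 2.2], [5.0, 2.0, 1.8]]
y = [10.0, 30.0, 50.0, 70.0, 90.0]  # hypothetical % inhibition

w, t_scores, y_hat = pls1_one_component(X, y)
r2 = r_squared(y, y_hat)
top_bucket = max(range(len(w)), key=lambda j: w[j])
print("loading weights:", [round(v, 3) for v in w])
print("bucket with strongest positive weight:", top_bucket)
```

In this toy case the bucket constructed to follow the activity receives the largest positive loading weight. In the study, buckets No. 68 and 70 were singled out by the same kind of reasoning.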
Compound 1 was isolated in three steps (see Experimental Section) from a methanol extract of green JR leaves, which was the most abundant source material. The molecular formula was established as C16H18O8 from the exact m/z and isotope pattern of the [M + H]⁺ ion; fragmentation indicates loss of a hexose moiety (neutral fragment loss, NFL, of 162.0547 amu) to yield the aglycone fragment at m/z 177.0570 (C10H8O3). Further, all fragments indicated by PLS were present in the MS² spectrum and could be explained by MetFrag [16]. The hexose was identified as β-glucose by the coupling constant of its anomeric proton (³J1,2 = 7.6 Hz) and its ¹³C-NMR shift values [17]. The remaining ¹H-NMR data indicate five aromatic protons for the aglycone, allocated to an AMX and an AX spin system, each showing at least one coupling constant between 6.7 and 8.4 Hz. HSQC and ¹³C-NMR data are consistent with five aromatic methine carbons and two quaternary carbons resonating between 108.21 and 128.86 ppm as well as three hydroxylated quaternary carbons with signals around 150 ppm. The HMBC spectrum showed a cross signal between the anomeric proton and a hydroxylated aromatic carbon in the vicinity of the AX spin system and without an aromatic proton in peri-position. Thus, the aglycone was assigned as hydrojuglone and the entire molecule 1 as hydrojuglone β-glucoside (Figure 2). All spectroscopic data are in agreement with published values (NMR: [18]; UV and MS: [19]). Pure 1 was subsequently tested in the cellular assays and found to be active against TB (IC50 = 6.1 μM, SI = 20.0) and LD (IC50 = 16.7 μM, SI = 7.3). The cytotoxicity of 1 against mammalian cells was low (L6 rat skeletal myoblasts; IC50 = 122.4 μM). In comparison to the activity data of selected NQs and the chemically related benzohydroquinone glucoside arbutin, which were also tested (Table 3), 1 shows the best combination of antitrypanosomal activity and low cytotoxicity.
Even though its IC50 value against TB is considerably higher than those of the quinones juglone, 1,4-naphthoquinone, plumbagin and shikonin,
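The selectivity indices quoted above follow from the ratio of the cytotoxic IC50 (L6 cells) to the antiparasitic IC50. A minimal sketch of this arithmetic, using the rounded IC50 values given in the text (the function name is our own):

```python
def selectivity_index(ic50_host_um, ic50_parasite_um):
    """SI = IC50 against host cells / IC50 against the parasite (same units)."""
    if ic50_parasite_um <= 0:
        raise ValueError("parasite IC50 must be positive")
    return ic50_host_um / ic50_parasite_um

IC50_L6 = 122.4  # μM, L6 rat skeletal myoblasts
si_tb = selectivity_index(IC50_L6, 6.1)   # T. brucei rhodesiense
si_ld = selectivity_index(IC50_L6, 16.7)  # L. donovani
print(f"SI(TB) = {si_tb:.2f}, SI(LD) = {si_ld:.2f}")
```

Recomputed from the rounded IC50 values, the ratios agree with the reported SI values of 20.0 and 7.3 to within about 0.1. The small deviation for TB presumably reflects rounding of the underlying IC50 values.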

General Experimental Procedures
D2O was purchased from Merck Chemicals (Merck KGaA, Darmstadt, Germany). Chromatographic eluents and their additives were purchased from VWR International (VWR International GmbH, Darmstadt, Germany). Any other solvents or chemicals were purchased from Fisher Chemicals (Fisher Scientific, Schwerte, Germany).

[...] Table 1). Senescent leaves, flowers or fruits were collected in October and were characterized by chlorosis and significantly lowered detachment force. Plant material was either air-dried at ambient temperature, or freeze-dried after being ground in liquid nitrogen.

Extract Preparation
Plant material was extracted using seven different protocols (A-G).
A: one part of plant material was twice extracted with 10 parts of water at ambient temperature for 30 min in an ultrasonic bath. The extracts were pooled and freeze dried.
B: one part of plant material was extracted twice with 10 parts of methanol at ambient temperature for 30 min in an ultrasonic bath. The extracts were pooled and dried using a rotary evaporator.
C: one part of plant material was extracted twice with 10 parts of dichloromethane at ambient temperature for 30 min in an ultrasonic bath. The extracts were pooled and dried using a rotary evaporator.
D: one part of plant material was twice extracted with 10 parts of methanol at ambient temperature for 30 min in an ultrasonic bath. The extracts were pooled, mixed with the same volume of water and kept in the fridge for 12 h. The extract was filtered, the methanol removed using a rotary evaporator and then freeze dried.
E: one part of plant material was extracted twice with 10 parts of dichloromethane at ambient temperature for 30 min in an ultrasonic bath. The residual plant material was air dried and then extracted twice with 10 parts of 50% methanol at ambient temperature for 30 min in an ultrasonic bath. The 50% methanol extracts were pooled, the methanol removed using a rotary evaporator and then freeze dried.
F: one part of plant material was extracted twice with 10 parts of dichloromethane at ambient temperature for 30 min in an ultrasonic bath. The residual plant material was air dried and then extracted twice with 10 parts of methanol at ambient temperature for 30 min in an ultrasonic bath. The methanol extracts were pooled, mixed with the same volume of water and kept in the fridge for 12 h. The extract was filtered, the methanol removed using a rotary evaporator and then freeze dried.
G: one part of plant material was twice extracted with 10 parts of methanol at ambient temperature for 30 min in an ultrasonic bath. The extracts were pooled, mixed with the same volume of water and kept in the fridge for 12 h. The extract was filtered and subjected to rotary evaporation; water was added during the evaporation until the distillate became colorless. This procedure G removes juglone and other steam-volatile compounds from the extract.

LC-MS Measurements
For the preparation of LC-MS samples, each extract was dissolved in methanol to a concentration of 20 mg/mL by 5-min treatment in an ultrasonic bath. The solutions were centrifuged for 5 min at 14,680 rpm and the supernatant was subjected to LC-MS analysis. Five random samples were injected before valid data were recorded in order to saturate the column with sample matrix. Each extract was then measured twice in one sequence without interruption. The order of the samples was changed randomly after the first measurement. Separation was performed on an Ultimate 3000 RS Liquid Chromatography System (Dionex Softron GmbH, Germering, Germany) over a Dionex Acclaim RSLC 120, C18 column (2.1 mm × 100 mm, 2.2 μm) with a binary gradient (A: water with 0.1% formic acid; B: acetonitrile with 0.1% formic acid) at 0.4 mL/min. 0 to 5 min: isocratic at 5% B; 5 to 37 min: linear from 5% to 100% B; 37 to 47 min: isocratic at 100% B; 47 to 48 min: linear from 100% to 5% B; 48 to 55 min: isocratic at 5% B. The injection volume was 2 μL. Eluted compounds were detected using a Dionex Ultimate DAD-3000 RS over a wavelength range of 200-400 nm and a micrOTOF-QII time-of-flight mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with an Apollo electrospray ionization source in positive mode at 2 Hz over a mass range of m/z 50-1000 using the following instrument settings: nebulizer gas nitrogen, 4 bar; dry gas nitrogen, 9 L/min, 200 °C; capillary voltage −4500 V; end plate offset −500 V; transfer time 100 μs, prepulse storage 6 μs; collision energy and collision RF settings were combined to each single spectrum of 2500 summations as follows: 1250 summations with 80 eV collision energy and 130 Vpp + 625 summations with 16 eV collision energy and 130 Vpp + 625 summations with 16 eV collision energy and 130 Vpp. MS/MS scans were triggered by AutoMS2 settings within a range of m/z 150-1000 (toggled off for fingerprints to be used in PLS models).
Internal dataset calibration (HPC mode) was performed for each analysis using the mass spectrum of a 10 mM solution of sodium formate in 50% isopropanol that was infused during LC re-equilibration using a divert valve equipped with a 20 μL sample loop.
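For orientation, the programmed gradient can be written as a piecewise-linear function of run time. The sketch below encodes the breakpoints stated above and returns the programmed percentage of eluent B. It ignores the dwell volume of the system, so the composition actually reaching the column lags behind the programmed value. The names `GRADIENT` and `percent_b` are our own.

```python
# (time in min, % B) breakpoints of the binary gradient described in the text
GRADIENT = [(0.0, 5.0), (5.0, 5.0), (37.0, 100.0),
            (47.0, 100.0), (48.0, 5.0), (55.0, 5.0)]

def percent_b(t_min):
    """Programmed % B at run time t_min, by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-55 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

# Programmed composition at the retention time of buckets No. 68 and 70
print(f"{percent_b(11.0):.1f}% B at 11.0 min")
```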

Antiprotozoal Assays
Antiprotozoal testing of all extracts was performed at the Swiss TPH in Basel, Switzerland. For activity against TB, the strain STIB 900 (bloodstream trypomastigotes) was tested. For TC, the strain Tulahuen C2C4 (intracellular amastigotes in L6 cells) was used. For LD, activity against the strain MHOM/ET/67/L82 (axenic amastigotes) was determined. For control measurements of cytotoxicity, L6-cells (rat skeletal myoblasts, also used as host cells in the TC assay) were used. All extracts were tested for percent growth inhibition at two concentrations, namely, 2 and 10 μg/mL. IC50 values were determined for pure compounds as previously published. For more detailed protocols, see [20].

Calculation of PLS Models
Before calculating the PLS model, the LC-MS data of all extracts were converted to a bucket table with the software ProfileAnalysis (Version 2.0, Bruker Daltonik GmbH), using the Find Molecular Features function for data selection with an s/n threshold of 3, a correlation coefficient of 0.6, a minimum compound length of 4 spectra, a smoothing width of 2 and additional smoothing activated. Buckets were generated from 1 to 30 min and from m/z 100 to 1000. Advanced bucketing was applied with a retention time tolerance of 0.3 min and an m/z tolerance of 100 mDa. Data were not normalized within each analysis. Buckets with a value count of less than 50% were excluded. Afterwards, the bucket table was transferred from ProfileAnalysis to the software The Unscrambler (Version 9.2, Camo Process AS): the bucket table was exported from ProfileAnalysis in Simca-P+ format and imported into MS Excel, where the data were transposed, sorted and exported as CSV. The CSV data were in turn imported into The Unscrambler as ASCII. With The Unscrambler, all buckets were scaled to unit variance (1.00/SDev) and the PLS model was calculated with full cross validation, using the samples' bucket tables as independent x-data and their bioactivity (expressed as percent inhibition at 10 μg/mL) as dependent y-data. To refine the initial model, three of the extracts leading to large residuals were left out of the model (No. 19, 34 and 36). As a final step, the model was recalculated with the significant x-variables (buckets) selected by Martens' Uncertainty Test.
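The bucket filtering and transposition steps can be sketched as follows. This is not the ProfileAnalysis implementation. We assume here that "value count" means the fraction of samples in which a bucket has a non-zero intensity, and the intensities and the third bucket label are invented for illustration.

```python
def filter_buckets(bucket_table, min_fraction=0.5):
    """Keep buckets detected (intensity > 0) in at least min_fraction of samples."""
    kept = {}
    for label, intensities in bucket_table.items():
        detected = sum(1 for v in intensities if v > 0)
        if detected / len(intensities) >= min_fraction:
            kept[label] = intensities
    return kept

def to_sample_rows(bucket_table):
    """Transpose bucket-wise columns into one row of x-variables per sample."""
    labels = list(bucket_table)
    rows = [list(r) for r in zip(*(bucket_table[name] for name in labels))]
    return labels, rows

# Hypothetical intensities for four samples; labels follow "RT: m/z"
buckets = {
    "11.0min:177.06": [1200.0, 0.0, 800.0, 950.0],
    "11.0min:339.12": [600.0, 0.0, 420.0, 500.0],
    "23.4min:455.10": [0.0, 0.0, 0.0, 75.0],  # detected in 1 of 4 samples
}

kept = filter_buckets(buckets)
labels, X = to_sample_rows(kept)
print(labels)
print(X)
```

The resulting rows correspond to the x-data of one sample each and would subsequently be scaled to unit variance before model calculation.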

Isolation and Structure Elucidation of (1)
Green, freeze-dried leaves of JR (396 g, collected in September 2012 in the garden of the IPBP, University of Münster) were macerated with methanol at ambient temperature to yield 80 g of dry extract after evaporation under reduced pressure. A portion (50 g) of this extract was separated over a glass column (60 cm × 5.2 cm, V = 1274 mL) filled with 500 g of Sephadex LH20 as stationary phase. The column was eluted with 3 L of methanol at a flow rate of 1 mL/min. The eluate was collected in portions of 10 mL. The fraction eluting between 1580 and 1650 mL yielded 3 g. Of this fraction, 2.5 g were separated further on a glass column (60 cm × 4 cm, V = 754 mL) packed with 250 g of silica gel 60 using a step gradient of 1 L of ethyl acetate:methanol:water (9:1:1), 0.5 L of ethyl acetate:methanol:water (5:5:1) and 0.5 L of methanol at a flow rate of 1 mL/min. The eluate was collected in portions of 10 mL. The fraction eluting between 430 and 1480 mL yielded 1.49 g. Of this fraction, 50 mg were dissolved in 5 mL of methanol and subjected to preparative HPLC. Separation was performed on an HPLC system consisting of two

Conclusions
In our PLS model, compound 1 was predicted to account for the observed activity against TB. Antitrypanosomal activity of 1 was confirmed with the isolated compound. Consequently, the reflection on PLS models with chemical fingerprints as independent and bioactivity data as dependent variables has proven suitable for the identification of bioactive compounds from complex extracts, thereby recommending itself as an alternative to bioassay-guided isolation.
The PLS approach outclasses bio-guided isolation in that biological testing does not have to be repeated between successive isolation steps; instead, analytically guided targeted isolation can be applied once an active compound has been predicted by PLS. This was of particular advantage in this study, as the biotesting was performed in the laboratory of one cooperation partner, remote from the chemical laboratories of the other, where LC-MS measurements and isolation took place. Thus, the prediction of active compounds from multiple variable samples by PLS or comparable methods of multivariate statistics represents an ideal platform for international cooperation. The required sample variability was successfully achieved by combining different plant materials from Juglans spp. with different sample preparation techniques. Although senescent material did not prove to contain NQs with improved bioactivity as initially hypothesized, its inclusion aided the successful bioactivity prediction of 1, as many of the extracts less active against TB were derived from senescent source materials and contained lower amounts of 1.
As cytotoxicity is a limiting factor for the application of many NQs, a particularly interesting property of 1 is its superior selectivity index towards TB, which outclasses that of established antiprotozoal NQs like lapachol. The much lower activity of the benzohydroquinone glucoside arbutin suggests that 1 does not act via unspecific toxic effects attributable to its phenolic character, while differences in redox potential might play a role. Compared to juglone, the hydroquinone glucoside 1 shows an increased selectivity index. As TB is known to possess β-glycosidase activity [21], it seems reasonable that 1 enters the parasite cells as a prodrug and finally yields the NQ juglone, which is almost four times more active against TB and still displays an SI value > 10. This may explain the improved selectivity index in the cell culture models, even though the exact fate of 1 in trypanosomes and mammalian cells remains to be elucidated.
To our knowledge, this is the first description of a glucosylated NQ-related compound showing activity against protozoans. The low cytotoxicity of the glycosylated compound, which leads to higher selectivity against the protozoans than that of simple NQs, is of particular interest and suggests screening further glycosidic NQ-related compounds for antiprotozoal activity.