A Phenotarget Approach for Identifying an Alkaloid Interacting with the Tuberculosis Protein Rv1466

In recent years, there has been a revival of interest in phenotypic-based drug discovery (PDD) due to target-based drug discovery (TDD) falling below expectations. Both PDD and TDD have their unique advantages and should be used as complementary methods in drug discovery. The PhenoTarget approach combines the strengths of the PDD and TDD approaches. Phenotypic screening is conducted initially to detect cellular active components and the hits are then screened against a panel of putative targets. This PhenoTarget protocol can be equally applied to pure compound libraries as well as natural product fractions. Here we described the use of the PhenoTarget approach to identify an anti-tuberculosis lead compound. Fractions from Polycarpa aurata were identified with activity against Mycobacterium tuberculosis H37Rv. Native magnetic resonance mass spectrometry (MRMS) against a panel of 37 proteins from Mycobacterium proteomes showed that a fraction from a 95% ethanol re-extraction specifically formed a protein-ligand complex with Rv1466, a putative uncharacterized Mycobacterium tuberculosis protein. The natural product responsible was isolated and characterized to be polycarpine. The molecular weight of the ligand bound to Rv1466, 233 Da, was half the molecular weight of polycarpine less one proton, indicating that polycarpine formed a covalent bond with Rv1466.


Introduction
Drug discovery research and development has experienced two periods with different centric strategies, namely phenotypic-based drug discovery (PDD) and target-based drug discovery (TDD) (Figure 1). Commonly PDD refers to an approach without prior knowledge of the target. In phenotypic-based screening, compounds that modify a phenotype to generate a positive outcome in cell culture or in a whole organism are identified. TDD examines a specific drug target which is hypothesized to play an important role in disease. Adapted from [1,2].
Before the 1980s, the era without recombinant DNA technology, PDD was the primary approach in drug discovery. Most drugs at that time were discovered serendipitously by phenotypic assays in live animals or isolated tissues [3]. Examples are penicillin isolated from a Penicillium species in 1928 [4] [2] and ivermectin isolated from Streptomyces avermitilis in 1975 [5]. From the 1990s, the development of genomics allowed the identification of drug target proteins. The advent of highthroughput screening and combinatorial libraries enabled the screening of target proteins in high throughput. Due to the development of technologies including X-ray crystallography, computational modeling and screening (virtual docking), visualization of the interaction of target protein and compound greatly facilitated the later stage of structure-based development. The development of these technologies appealed to the pharmaceutical industry and academic researchers who then switched to focus on TDD during the last three decades.
In the era that mainly focused on TDD, the total number of new molecular entities (NMEs) and new biologics approved by the Food and Drug Administration was far below expectations [6]. The timeline of cumulative NME approvals from 1950 to 2008 was contributed to by the three most productive companies in the industry and showed almost straight lines, indicating that productivity continued at a constant rate for almost 60 years [1]. The introduction of new molecular biology tools such as recombinant DNA technology, deep sequencing, mining of Expressed Sequence Tagged (cDNA) libraries and the draft human genome did not facilitate drug innovation as expected. This was also indicated in a later NMEs analysis with a timeline spanning over five years to 2013 [7]. More dishearteningly, while the number of NMEs per year has remained relatively constant for the past four decades, the investment in pharmaceutical research and development (R&D) has increased dramatically to over 50 billion USD per year. Today the number of NMEs launched per billion dollars of investment is well below the return for an equivalent billion-dollar investment 50 years ago [1]. The asymmetrical output raises questions about the limitation of the popular target centric strategy of R&D in recent decades.
Consequently, there is a revival of interest in PDD. A significant analysis by Swinney demonstrated that the majority of NMEs approved by the FDA during the 10-year period between 1999 and 2008 were discovered using phenotypic assays, where 28 came from phenotypic screening approaches and 17 came from target-based approaches [8]. It was suggested that the lack of successful NMEs in the post-genomic era was mainly due to the limited use of phenotypic screening. Because an organism is a complex biological system, a simplified single protein assay may not efficiently represent the disease pathogenesis [8] The effects on a single protein may not be translated to meaningful therapeutic efficacy. Conversely, phenotypic screening is an approach for unbiased targets. Phenotypic-based screening is, therefore, being reconsidered for screening compounds due to the realization that results for a single target protein may not fully correlate in the context of a complex biological process [9]. Phenotypic screening also holds the unique promise to uncover new mechanisms of action for currently untreated diseases such as rare diseases and/or neglected diseases [2]. Before the 1980s, the era without recombinant DNA technology, PDD was the primary approach in drug discovery. Most drugs at that time were discovered serendipitously by phenotypic assays in live animals or isolated tissues [3]. Examples are penicillin isolated from a Penicillium species in 1928 [2,4] and ivermectin isolated from Streptomyces avermitilis in 1975 [5]. From the 1990s, the development of genomics allowed the identification of drug target proteins. The advent of high-throughput screening and combinatorial libraries enabled the screening of target proteins in high throughput. Due to the development of technologies including X-ray crystallography, computational modeling and screening (virtual docking), visualization of the interaction of target protein and compound greatly facilitated the later stage of structure-based development. The development of these technologies appealed to the pharmaceutical industry and academic researchers who then switched to focus on TDD during the last three decades.
In the era that mainly focused on TDD, the total number of new molecular entities (NMEs) and new biologics approved by the Food and Drug Administration was far below expectations [6]. The timeline of cumulative NME approvals from 1950 to 2008 was contributed to by the three most productive companies in the industry and showed almost straight lines, indicating that productivity continued at a constant rate for almost 60 years [1]. The introduction of new molecular biology tools such as recombinant DNA technology, deep sequencing, mining of Expressed Sequence Tagged (cDNA) libraries and the draft human genome did not facilitate drug innovation as expected. This was also indicated in a later NMEs analysis with a timeline spanning over five years to 2013 [7]. More dishearteningly, while the number of NMEs per year has remained relatively constant for the past four decades, the investment in pharmaceutical research and development (R&D) has increased dramatically to over 50 billion USD per year. Today the number of NMEs launched per billion dollars of investment is well below the return for an equivalent billion-dollar investment 50 years ago [1]. The asymmetrical output raises questions about the limitation of the popular target centric strategy of R&D in recent decades.
Consequently, there is a revival of interest in PDD. A significant analysis by Swinney demonstrated that the majority of NMEs approved by the FDA during the 10-year period between 1999 and 2008 were discovered using phenotypic assays, where 28 came from phenotypic screening approaches and 17 came from target-based approaches [8]. It was suggested that the lack of successful NMEs in the post-genomic era was mainly due to the limited use of phenotypic screening. Because an organism is a complex biological system, a simplified single protein assay may not efficiently represent the disease pathogenesis [8] The effects on a single protein may not be translated to meaningful therapeutic efficacy. Conversely, phenotypic screening is an approach for unbiased targets. Phenotypic-based screening is, therefore, being reconsidered for screening compounds due to the realization that results for a single target protein may not fully correlate in the context of a complex biological process [9]. Phenotypic screening also holds the unique promise to uncover new mechanisms of action for currently untreated diseases such as rare diseases and/or neglected diseases [2].
The current phenotypic based approach should not be regarded as a step back to the classical phenotypic screening but as a new discipline [7]. Currently, in vivo and in vitro approaches are involved in phenotypic screening, of which in vitro cell-based phenotypic assays can be easily adapted to a high-throughput format for automated phenotypic analysis. New technologies such as gene expression, genetic modifier screening, resistance mutation and computational inference are increasingly being applied in phenotypic screening. These sophisticated phenotypic screening methodologies enhance the identification of novel compounds as well as their mechanism of action [10].
Target identification is a crucial part of drug discovery. Most large pharmaceutical companies strongly recommend target identification because failing to assign the mechanism of action is frequently regarded as a major risk factor for clinical development and regulatory approval [8]. Even with the most advanced phenotypic screens, in most cases, it is still difficult to determine the mechanism of action. Target identification is the key value of TDD [6]. With the target protein in hand, detailed drug-protein characterization is possible and this provides a better understanding of structure-activity relationships, which is a challenge in phenotypic screening without a known target [11].
Considering the strengths and weaknesses of PDD and TDD, these two approaches should not be treated monolithically but as complementary approaches that can work together to increase the productivity of drug discovery and development. This article describes an approach that combines PDD and TDD: PhenoTarget screening to identify active compounds from natural resources. As a proof of principle, we apply this combination approach to probe fractions of natural product libraries for compounds active against Mycobacterium tuberculosis H37Rv, the etiological agent responsible for tuberculosis (TB). The active fraction was then screened against a panel of 37 unique mycobacterial proteins to identify the potential compound and protein responsible for the activity against M. tuberculosis.

Results
A combined approach of phenotypic screening and target screening, which we refer to as a PhenoTarget approach, was used to identify lead compounds against Mycobacterium tuberculosis from natural resources. The PhenoTarget approach started from the phenotypic screening of a natural product fraction library to identify fractions that were active in an HTS screen against M. tuberculosis H37Rv. Following phenotypic screening, native mass spectrometry was used to simultaneously identify the target protein and the molecular weight of the compound bound to the protein in pooled fractions using a panel of 37 purified mycobacterial proteins. The proteins in the panel were selected because they were all essential enzymes or virulence factors, their three-dimensional structures had been solved by the Seattle Structural Genomics Center for Infectious Diseases (SSGCID); [12,13] and SSGCID was able to supply purified proteins for the PhenoTarget screens. Screening of the pooled fractions against this protein panel of 37 putative anti-TB targets was conducted by a magnetic resonance mass spectrometry (MRMS) system equipped with an automated chip-based Nanomate system. A summary of the PhenoTarget approach is described in Figure 2.  [14]. A high throughput phenotypic screening of 202,983 NB LLE fractions against M. tuberculosis H37Rv was initially performed. Active fractions with an MIC value of less than 6.1 µge/µL were identified and chosen for protein screening against a panel of 37 putative anti-TB targets from Mycobacteria species. To lower sample consumption, especially protein, nine active fractions were pooled (Pool Fractions 1 to 40) and incubated with each of the target proteins. Native mass spectrometry was then used to identify free target protein (P, blue) and proteinligand (P-L) complexes. The mass shift between the P (black) and the P-L (red) peaks provided the molecular weight of the bound ligand (L, purple) and this facilitated the isolation and identification of the active compound. Figure 3 compares the native MRMS spectra for free Rv1466, a protein associated with [Fe-S] complex assembly and repair (a) and NB LLE Pool Fraction 4 (b) and 5 (c). All three spectra contain clusters of ions with three different charged states (+8, +7, and +6). However, the spectra with Pool Fractions 4 and 5 both contain a cluster of ions shifted to high m/z values by identical amounts relative to the free Rv1466 ions. These new ions are also of higher intensity than the free Rv1466 ions and correspond to Rv1466-ligand (P-L) complexes. From the charge and m/z differences between the cluster of P and P-L ions it is possible to obtain the molecular weight of the ligand-bound to Rv1466, 233 Da. Because the ligand that was bound to Rv1466 from Pool Fraction 4 and 5 had the same molecular weight, and likely the same compound, and appeared to have a high affinity for Rv1466 as deduced by the P:P-L intensity ratio, we chose to pursue the identity of this compound.  [14]. A high throughput phenotypic screening of 202,983 NB LLE fractions against M. tuberculosis H37Rv was initially performed. Active fractions with an MIC value of less than 6.1 µge/µL were identified and chosen for protein screening against a panel of 37 putative anti-TB targets from Mycobacteria species. To lower sample consumption, especially protein, nine active fractions were pooled (Pool Fractions 1 to 40) and incubated with each of the target proteins. Native mass spectrometry was then used to identify free target protein (P, blue) and protein-ligand (P-L) complexes. The mass shift between the P (black) and the P-L (red) peaks provided the molecular weight of the bound ligand (L, purple) and this facilitated the isolation and identification of the active compound. Figure 3 compares the native MRMS spectra for free Rv1466, a protein associated with [Fe-S] complex assembly and repair (a) and NB LLE Pool Fraction 4 (b) and 5 (c). All three spectra contain clusters of ions with three different charged states (+8, +7, and +6). However, the spectra with Pool Fractions 4 and 5 both contain a cluster of ions shifted to high m/z values by identical amounts relative to the free Rv1466 ions. These new ions are also of higher intensity than the free Rv1466 ions and correspond to Rv1466-ligand (P-L) complexes. From the charge and m/z differences between the cluster of P and P-L ions it is possible to obtain the molecular weight of the ligand-bound to Rv1466, 233 Da. Because the ligand that was bound to Rv1466 from Pool Fraction 4 and 5 had the same molecular weight, and likely the same compound, and appeared to have a high affinity for Rv1466 as deduced by the P:P-L intensity ratio, we chose to pursue the identity of this compound.

Target Fraction Confirmation
A feature common to Pool Fractions 4 and 5 was that they contained LLE fractions (LLE-2 and LLE-3, respectively) from the same marine biota, Polycarpa aurata, suggesting that a compound from P. aurata had interacted with Rv1466. A detailed high-resolution mass spectrometry (HRMS) investigation of all nine fractions from Pool Fraction 5 indicated that no fractions showed an ion at 233 m/z. In this case, MS investigation of the fractions did not confirm the presence of a ligand, so native MS was used to identify the correct fraction for isolation. Thus, five fractions were generated from a re-extraction of P. aurata by 95% ethanol and named Fractions A, B, C, D and E (Figure 4a). HRMS analysis of the constituents of Fraction C by a quadrupole-time-of-flight (Q-TOF) mass spectrometer identified an ion at m/z 236 as well as a higher mass ion at m/z 469 corresponding to a 468 + H + ion (Figure 4b). Another round of native MRMS screening confirmed that a compound in Fraction C interacted with Rv1466 and the molecular weight of the bound species was 233 Da ( Figure  4c). Because the molecular weight of the ligand bound to Rv1466 was approximately half the molecular weight of the major species in Fraction C, our attention was drawn towards a potentially symmetrical parent compound with a molecular weight of 468 Da as the agent reacting with Rv1466.

Target Fraction Confirmation
A feature common to Pool Fractions 4 and 5 was that they contained LLE fractions (LLE-2 and LLE-3, respectively) from the same marine biota, Polycarpa aurata, suggesting that a compound from P. aurata had interacted with Rv1466. A detailed high-resolution mass spectrometry (HRMS) investigation of all nine fractions from Pool Fraction 5 indicated that no fractions showed an ion at 233 m/z. In this case, MS investigation of the fractions did not confirm the presence of a ligand, so native MS was used to identify the correct fraction for isolation. Thus, five fractions were generated from a re-extraction of P. aurata by 95% ethanol and named Fractions A, B, C, D and E (Figure 4a). HRMS analysis of the constituents of Fraction C by a quadrupole-time-of-flight (Q-TOF) mass spectrometer identified an ion at m/z 236 as well as a higher mass ion at m/z 469 corresponding to a 468 + H + ion ( Figure 4b). Another round of native MRMS screening confirmed that a compound in Fraction C interacted with Rv1466 and the molecular weight of the bound species was 233 Da (Figure 4c). Because the molecular weight of the ligand bound to Rv1466 was approximately half the molecular weight of the major species in Fraction C, our attention was drawn towards a potentially symmetrical parent compound with a molecular weight of 468 Da as the agent reacting with Rv1466.

Binding Compound Isolation and Structure Elucidation
Separation of P. aurata extracts by reverse phase semi-preparative HPLC ( Figure 5) gave a compound identified as polycarpine, C22H24N6O2S2, a dimeric disulfide alkaloid previously identified

Binding Compound Isolation and Structure Elucidation
Separation of P. aurata extracts by reverse phase semi-preparative HPLC ( Figure 5) gave a compound identified as polycarpine, C 22 H 24 N 6 O 2 S 2 , a dimeric disulfide alkaloid previously identified from P. autara [15] and in a related species, Polycarpa clavata [16] (Figure 6). As illustrated in Figure 5a, the resonances observed in the one-dimensional 1 H spectrum of pure polycarpine (bottom, red) are present in the one-dimensional 1 H NMR spectrum for Fraction C (top, black). Moreover, pure polycarpine elutes with a retention time identical to a band in the LC profile for Fraction C (Figure 5b). The mass spectra of both bands are identical (Figure 5c) and consistent with a species with a molecular weight of 468 Da. from P. autara [15] and in a related species, Polycarpa clavata [16] (Figure 6). As illustrated in Figure  5a, the resonances observed in the one-dimensional 1 H spectrum of pure polycarpine (bottom, red) are present in the one-dimensional 1 H NMR spectrum for Fraction C (top, black). Moreover, pure polycarpine elutes with a retention time identical to a band in the LC profile for Fraction C (Figure 5b).  . Figure 6. The structure of polycarpine.

Pseudo-KD Value Determination of Polycarpine with Rv1466
An advantage of native mass spectrometry is that in addition to the qualitative identification of protein-ligand formation, the technique also provides quantitative information on the strength of the interaction [17,18]. This is because it is possible to estimate the dissociation constant, KD, from the ratio of free and bound protein observed in the MS chromatograms. It is accomplished by collecting MRMS data at a fixed protein concentration with increasing concentrations of the ligand (Figure 7a). Figure 7b displays twelve mass spectra of samples containing 9 μM Rv1466 and increasing amounts of polycarpine (0.1-300 μM). A ligand concentration was reached where the intensity of the proteinligand complex reached a plateau. The ratios of the intensity of protein-ligand peak and sum of protein peak and protein-ligand peak were plotted against the concentration of polycarpine ( Figure  7c). Using these ratios and Equations 1 and 2, a pseudo-KD of 5.3 ± 0.4 μΜ was calculated for polycarpine binding to Rv1466. A real KD cannot be calculated because after binding a covalent bond forms between the ligand and Rv1466 in a time-dependent manner, pushing the equilibrium to the product. . Figure 6. The structure of polycarpine.

Pseudo-KD Value Determination of Polycarpine with Rv1466
An advantage of native mass spectrometry is that in addition to the qualitative identification of protein-ligand formation, the technique also provides quantitative information on the strength of the interaction [17,18]. This is because it is possible to estimate the dissociation constant, KD, from the ratio of free and bound protein observed in the MS chromatograms. It is accomplished by collecting MRMS data at a fixed protein concentration with increasing concentrations of the ligand (Figure 7a). Figure 7b displays twelve mass spectra of samples containing 9 μM Rv1466 and increasing amounts of polycarpine (0.1-300 μM). A ligand concentration was reached where the intensity of the proteinligand complex reached a plateau. The ratios of the intensity of protein-ligand peak and sum of protein peak and protein-ligand peak were plotted against the concentration of polycarpine ( Figure  7c). Using these ratios and Equations 1 and 2, a pseudo-KD of 5.3 ± 0.4 μΜ was calculated for polycarpine binding to Rv1466. A real KD cannot be calculated because after binding a covalent bond forms between the ligand and Rv1466 in a time-dependent manner, pushing the equilibrium to the product.

Pseudo-K D Value Determination of Polycarpine with Rv1466
An advantage of native mass spectrometry is that in addition to the qualitative identification of protein-ligand formation, the technique also provides quantitative information on the strength of the interaction [17,18]. This is because it is possible to estimate the dissociation constant, K D , from the ratio of free and bound protein observed in the MS chromatograms. It is accomplished by collecting MRMS data at a fixed protein concentration with increasing concentrations of the ligand (Figure 7a). Figure 7b displays twelve mass spectra of samples containing 9 µM Rv1466 and increasing amounts of polycarpine (0.1-300 µM). A ligand concentration was reached where the intensity of the protein-ligand complex reached a plateau. The ratios of the intensity of protein-ligand peak and sum of protein peak and protein-ligand peak were plotted against the concentration of polycarpine (Figure 7c). Using these ratios and Equations 1 and 2, a pseudo-K D of 5.3 ± 0.4 µM was calculated for polycarpine binding to Rv1466. A real K D cannot be calculated because after binding a covalent bond forms between the ligand and Rv1466 in a time-dependent manner, pushing the equilibrium to the product.

Discussion
Using the PhenoTarget approach, a ligand from a natural product fraction library was identified that bound to the M. tuberculosis protein Rv1466. The ligand molecular weight from the native MRMS spectrum was 233 Da. The compound was shown to be polycarpine with a molecular mass of 468 Da. The ligand MW of 233 Da was half the molecular weight of polycarpine less one proton, indicating that polycarpine formed a covalent bond with Rv1466. Under mild silica chromatography, the polycarpine disulfide bond has been shown to be cleaved [15,16]. Rv1466 has a single cysteine and the structure of Rv1466 (5IRD) indicates this residue is exposed on the surface of the protein so that it could attack the disulfide bond of polycarpine to form a covalent disulfide linked complex. The polycarpine-Rv1466 pseudo-KD was determined to be 5.3 ± 0.4 μM. P. aurata, a species of tunicate in the family Styelidae, is rich in alkaloids containing sulfur such as polycarpine, polycarpaurine A, polycarpaurine B and polycarpaurine C [15,16,19,20].
In the PhenoTarget approach, phenotypic screening is conducted initially to detect cellular active components. In our case a natural product fraction library was used, however, this could be equally applied to pure compound libraries. Mass spectrometry using a protein panel of putative targets provided sufficient throughput to analyze the phenotypic hits. Native mass spectrometry offers the further advantage that it can be used for cloned, expressed and purified proteins that lack structural information. This opens the option to explore the proteome of many species, in our case, the Mycobacterium proteome. As well as its high sensitivity, high-throughput capability and in time observation of protein-ligand interaction, it can also provide exact mass information of the binding compound [21]. The mass information is useful for isolating the compound from the fraction mixture using mass-guided isolation. In addition, mass spectrometry has been reported as an alternative technique for the quantitative characterization of protein-ligand interactions in solution, that is, the determination of equilibrium constants for protein-ligand association (Ka) or dissociation (Kd = 1 / Ka) [22,23]. Among all the mass spectrometers, MRMS provides the highest accuracy and resolution.

Discussion
Using the PhenoTarget approach, a ligand from a natural product fraction library was identified that bound to the M. tuberculosis protein Rv1466. The ligand molecular weight from the native MRMS spectrum was 233 Da. The compound was shown to be polycarpine with a molecular mass of 468 Da. The ligand MW of 233 Da was half the molecular weight of polycarpine less one proton, indicating that polycarpine formed a covalent bond with Rv1466. Under mild silica chromatography, the polycarpine disulfide bond has been shown to be cleaved [15,16]. Rv1466 has a single cysteine and the structure of Rv1466 (5IRD) indicates this residue is exposed on the surface of the protein so that it could attack the disulfide bond of polycarpine to form a covalent disulfide linked complex. The polycarpine-Rv1466 pseudo-K D was determined to be 5.3 ± 0.4 µM. P. aurata, a species of tunicate in the family Styelidae, is rich in alkaloids containing sulfur such as polycarpine, polycarpaurine A, polycarpaurine B and polycarpaurine C [15,16,19,20].
In the PhenoTarget approach, phenotypic screening is conducted initially to detect cellular active components. In our case a natural product fraction library was used, however, this could be equally applied to pure compound libraries. Mass spectrometry using a protein panel of putative targets provided sufficient throughput to analyze the phenotypic hits. Native mass spectrometry offers the further advantage that it can be used for cloned, expressed and purified proteins that lack structural information. This opens the option to explore the proteome of many species, in our case, the Mycobacterium proteome. As well as its high sensitivity, high-throughput capability and in time observation of protein-ligand interaction, it can also provide exact mass information of the binding compound [21]. The mass information is useful for isolating the compound from the fraction mixture using mass-guided isolation. In addition, mass spectrometry has been reported as an alternative technique for the quantitative characterization of protein-ligand interactions in solution, that is, the determination of equilibrium constants for protein-ligand association (Ka) or dissociation (Kd = 1/Ka) [22,23]. Among all the mass spectrometers, MRMS provides the highest accuracy and resolution. Nanoelectrospray was used for sample delivery. Compared to conventional electrospray, its benefits include an increased sensitivity signal, lower consumption of ligand and protein, and no sample-to-sample cross-contamination, a faster speed and a higher degree of nozzle-to-nozzle reproducibility [24]. A high degree of nozzle-to-nozzle reproducibility is important for quantitative studies such as K D determination. It has been proven that K D results from chip-based electrospray are consistent with conventional techniques [17].
While there are advantages in the phenotypic drug discovery approach it should be noted that there are also some disadvantages. Foremost is the procurement of pure protein panels for target identification. For example, the genome of M. tuberculosis contains over 4000 genes [25] and our exploratory panel contained only 37 different proteins. This bottleneck can be partly circumvented by using orthologous proteins from related species [26]. Indeed, in addition to M. tuberculosis (12 proteins), our mycobacterial panel contained proteins from Mycobacterium smegmatis (5), Mycobacterium abscessus In summary, while the PhenoTarget approach has some limitations, this article is an example of the powerful potential of the PhenoTarget approach in identifying new lead compounds, from a natural product fraction library, as well as the target protein of the lead compound. In the PhenoTarget approach, phenotypic screening is conducted initially to detect cellular active components. The hits are then screened against a panel of putative targets. This PhenoTarget protocol can be equally applied to pure compound libraries as well as natural product fractions and may prove to be a valuable alternative strategy for developing new intervention therapies.

General Experimental Procedures
NMR spectra were recorded in DMSO-d 6 (δ H 2.50 and δ C 39.5), CD 3  urich, Switzerland) equipped with a triple resonance cryoprobe. The low-resolution LC-MS was acquired using Ultimate 3000 RS UHPLC coupled to a Thermo Fisher MSQ Plus single quadruple ESI mass spectrometer (Waltham, MA, USA) with a Thermo Accucore C 18 column (2.6 µm, 2.1 × 150 mm). High-resolution mass spectra (HRESIMS) were recorded on a Bruker maXis II ETD ESI-qTOF (Bremen, Germany). The HPLC system for LLE fractions for phenotypic screening included a Waters 600 pump (Milford, MA, USA) fitted with a 996-photodiode array detector and Gilson FC204 fraction collector (Middleton, WI, USA). The HPLC system for Fractions A~E from re-extraction by ethanol was the Thermo Ultimate 3000 system (Waltham, MA, USA). Semi-preparative HPLC was also programmed on Thermo Ultimate 3000 with a PDA detector. A Phenomenex C 18 Monolithic column (5 µm, 4.6 × 100 mm) was used for analytical HPLC; a Thermo Electron Betasil C 18 column (5 µm, 21.2 × 150 mm) was used for semi-preparative HPLC.
The fraction library (202,983 fractions) was from Nature Bank (Griffith Institute for Drug Discovery, Brisbane, Australia). All the solvents used for extraction, chromatography and MS were Lab-Scan HPLC grade, and the H 2 O was Millipore Milli-Q PF filtered.
Ammonium acetate was purchased from. Sigma-Aldrich (St Louis, O.M., USA). Rv1466 and the other 36 unique proteins in the target panel were supplied by the Seattle Structural Genomics Center for Infectious Diseases (SSGCID, Seattle, WA, USA). These were "well behaved" proteins whose structures have been solved by SSGCID and are publicly available in the PDB. All proteins contained a non-natural, N-terminal tag for purification (MAHHHHHHMGTLEAQTQGPGS-). For protein buffer exchange, Nalgene NAP-5 size G25, from GE Healthcare (Parramatta, NSW, Australia) was used (purchased through Thermo Fisher Scientific Australia Pty Ltd., Scoresby, VIC, Australia). The screening of protein-ligand complexes was conducted using Bruker Daltonics SolariX 12 T Magnetic Resonance Mass Spectrometry (Bruker Daltonics Inc., Billerica, MA, USA) equipped with automatic Nanoelectrospray system (TriVersa NanoMate, Advion Biosciences, Ithaca, NY, USA).

Lead-Like Enhanced Fraction Library
The lead-like enhanced fraction library was prepared as previously described [14].

Phenotypic Screening-Activity against Replicating M. tuberculosis H37Rv
Growth inhibition of M. tuberculosis was monitored using a 1 µL of fraction (250 µge/µL) dispensed into each well of a 384-well plate. To this, 40 µL of M. tuberculosis ATCC 27294 H37Rv (3-5 × 10 5 /mL in Middlebrook 7H9 broth with 0.05% Tween 80, 10 v/v ADC and Casamino acids) was added with a Multidrop dispenser. The plates were then incubated at 37 • C for 7 days. A 10 µL solution of Resazurin (20 mg/100 mL diluted 1:1 with 10% Tween 80) was added and incubated further for an additional 24 h at 37 • C for color development. Absorbance was monitored at two wavelengths (575 and 610 nm) using Spectramax and the ratios were determined to calculate the % inhibition. Growth controls in the absence of compound as well as media controls served as inhibition~0% and −100% respectively.

Automatic Chip-Based MRMS High Throughtput Screening
Proteins were buffer-exchanged into a suitable volatile buffer (ammonium acetate) under near physiological conditions using size exclusion chromatography prior to MRMS analysis. Depending on the protein, the buffer and its concentration were chosen to obtain the highest sensitivity in the mass spectrometer. Final concentration was 10 µM protein in 10 mM ammonium acetate after buffer change.
All screening samples were prepared fresh on the day for analyzing data to avoid the precipitation or decomposition in the sample.
Each 9 LLE fractions (9 × 1 µL) were combined as a Pool Fraction. Every Pool Fraction was dried, re-suspended in 9 µL MeOH. For the screening of Pool Fractions, 1 µL MeOH solution of pool fraction was mixed with 9 µL of 10 µM protein and then incubated for 1 h at room temperature before being analyzed by MRMS.
For screening the incubation of Fraction C of P. aurata with Rv1466-Putative uncharacterized protein, Fraction C was dried and re-suspended in 400 µL MeOH. An aliquot solution (1 µL) was added to protein (9 µL at 10 µM) for 15 min at room temperature and then analyzed by MRMS.
For the screening of pure compound-polycarpine, the stock solution of polycarpine, 3000 µM dissolved in DMSO-d 6 , was further diluted to additional varied concentrations between 1-1000 µM for dose-response measurement. Polycarpine solution in DMSO-d 6 was dried and then dissolved in MeOH. A 40 µL sample of these polycarpine solutions was lyophilized and resuspended in 40 µL of MeOH. An aliquot (1 µL) of varied concentration of polycarpine in MeOH was incubated with 9µL of protein (10 µM) for 15 min at room temperature and then analyzed by MRMS.
Experiments were performed on a Bruker SolariX XR 12T MRMS (Bruker Daltonics Inc., Billerica, MA) equipped with an external automated chip-based nanoelectrospray. The ESI mass spectra were recorded in positive mode with a mass range from 294 to 10,000 m/z. Each spectrum was composed of 2 M data points. All the aspects of instrument parameter and data acquisition were controlled by Solarix control software under Windows operating system. Instrument parameters were tuned as follows for Pool Fraction screening: sample flow rate 120 µL/min, nebulizer gas (N 2 ) pressure 3, end plate offset voltage 0 V, capillary voltage 1000 V, drying gas (N 2 ) flow rate 1.5 L/min, drying gas temperature 100 All sample solutions were injected by fully automated chip-based nanoelectrospray, which was mounted to the mass spectrometer. A Triversa NanoMate incorporating ESI chip technology (Advion Biosciences, Ithaca, NY) was used as an automatic chip system. The automated chip system consists of a 384-well plate, a rack of 96 disposable, conductive pipette tips, and the chip, which was positioned a few millimeters from the sampling cone. The system was programmed by ChipSoft software (Version 8.3.1, Advion BioSciences), which sequentially picks up a new pipette tip, aspirates 3 µL of sample from the 384-well plate followed by 1.5 µL of air and then delivers to the inlet side of ESI chip. Nanoelectrospray was carried out by applying a 1.60 kV spray voltage and a 1.0 psi gas pressure to the sample in the pipette tip. Sample solutions for screening were transferred into the 384-well plate. For every sample, a fresh tip and nozzle were used, thus preventing cross-contamination of samples. Following sample infusion and MS analysis, the pipet tip was disposed.
When the protein-ligand complex was found, the molecular weight of the binding ligand was estimated from the spectrum using the following equation: MW ligand = ∆ m/z × z.

Isolation
Crude ethanolic extract of P. aurata (200 mg) was dissolved in MeOH. HPLC separation was conducted on a Thermo Electron Betasil C 18 column (5 µm, 21.2 × 150 mm) at a flow rate of 9 mL/min: using gradient solvent system (10-100% MeOH in 50 min, 100% MeOH from 50-60min) and 60 fractions were collected by minutes. Fraction 33 and 34 were further purified on the same column using gradient solvent from 30-80% MeOH to afford polycarpine.

Dose-Response Data Analysis
The relative abundances of protein-ligand complex to total protein in the mass spectra correlated to the relative equilibrium concentrations of ligand to total protein in solution. The pseudo K D of polycarpine with Rv1466 was determined using the following equations [18]: Experimental relative ratios of protein-ligand complex and total protein ion abundances were plotted against the total concentration of ligand. The pseudo K D could be obtained as a parameter of a nonlinear least-squares curve fitting.