Feature-Based Molecular Network-Guided Dereplication of Natural Bioactive Products from Leaves of Stryphnodendron pulcherrimum (Willd.) Hochr

Stryphnodendron pulcherrimum is a species known to have a high content of tannins. Accordingly, its preparations are used in southern Pará, Brazil, for their anti-inflammatory and antimicrobial activities, but so far its chemical composition has remained essentially unknown. We herein describe the compounds present in a hydro-acetonic extract from S. pulcherrimum leaves as revealed by dereplication via ultra-high performance liquid chromatography coupled to high-resolution mass spectrometry. The data were combined with spectral organization, spectral matching through the Global Natural Products Social Molecular Networking (GNPS) platform, in silico annotation and taxonomical weighting. Several types of phenolic compounds were identified, such as gallic acid derivatives, flavan-3-ols and flavone-like compounds. Of these, 5 have been recently reported by our group, whereas 44 are reported here for the first time in this tree species, and 41 (out of 49) for this genus. The results highlight the possible role of Stryphnodendron pulcherrimum as a renewable source of natural bioactive products with potential pharmaceutical applications.


Introduction
Stryphnodendron pulcherrimum (Willd.) Hochr. belongs to the Fabaceae family and is usually found in the Amazon and Atlantic forests and in the southern part of Bahia state in Brazil. This tree is commonly known in Brazil as "fake-barbatimão", paricazinho, paricarana, jubarbatimão, juerana-branca or cowboy [1]. In southern Pará, its leaves, fruits and bark are used in traditional medicine, being particularly indicated as an anti-inflammatory agent [2]. A recent study [3] of several plant extracts from the Amazon region highlighted the antimicrobial activity of the aerial parts of S. pulcherrimum against strains of Enterococcus faecalis, proposing this plant as a potential source of natural bioactive compounds.
The use of natural products as drugs or as inspiration to develop new active principles currently plays an important role in the pharmaceutical industry [4]. This role has motivated researchers to develop new strategies to study plant metabolites in a more comprehensive and rational manner. Plants offer vast, renewable and sustainable sources of new pharmacophores with chemical structures exquisitely designed for biological functions, and many of these are yet to be discovered [4-6].
The comprehensive chemical description of complex natural mixtures remains an extremely challenging task [7] because such mixtures often contain several hundred compounds with contrasting properties [8,9]. This task has, however, been facilitated by continuous improvements in analytical techniques such as nuclear magnetic resonance (NMR) spectroscopy and liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) [10-12]. Recent innovative strategies to study metabolites from complex matrices, such as the organization of spectral information using Global Natural Products Social Molecular Networking (GNPS), have also been very useful. This latter methodology is capable of simultaneously visualizing the chemical space from non-targeted mass spectrometry data and identifying compounds through mass spectral matching [13].
Herein, in the search for bioactive compounds, we used UHPLC-MS/MS combined with feature-based molecular networking-Global Natural Products Social (FBMN-GNPS) and other state-of-the-art bioinformatic tools to further decipher the chemical composition of the hydro-acetonic leaf extract of Stryphnodendron pulcherrimum.

Stryphnodendron Pulcherrimum Extract Characterization by UHPLC-MS/MS
The hydro-acetonic extract was cleaned up using a C-18 SPE cartridge to remove highly lipophilic compounds such as chlorophylls. It was subjected to UHPLC-MS/MS in the negative ion (NI) mode to favor the ionization of phenolic compounds known to be abundant in plants from the genus. This high-resolution MS analysis using a QTof analyzer provided accurate molecular masses (MS1) and the corresponding fragmentation patterns (MS2) of the molecular ions [32]. To obtain a molecular network, the MS data were treated using MZmine software [33] and the MS1 and MS2 data were then uploaded to the GNPS platform [34]. The molecular network (MN) obtained consisted of 284 features (Figure 1). The resulting MN was then subjected to dereplication against experimental MS2 data from the GNPS databases and against in silico fragmentation spectra created from a large database of natural products, followed by a re-ranking of the putative identities based on taxonomy [35,36].
According to the Dictionary of Natural Products (DNP, v.29.1), 42 compounds have been reported in the genus Stryphnodendron. Of these, 36 correspond to saponin derivatives, mostly stryphnosides, which have been reported in fruits from Stryphnodendron fissuratum [37,38] and Stryphnodendron coriaceum [39]. The remaining six compounds include polyphenolic derivatives from Stryphnodendron adstringens [14,15,40]. Since the working sample is from leaves, not fruits, an extensive literature search was conducted to create an in-house database (Table S1). It included 74 compounds reported for the Stryphnodendron genus in different parts of the plants [14-17,23,24,29,40-47]. Based on the molecular formula (MF) and heuristic filtering [48], this database was used to further identify compounds in the extract.
Figure 1. Molecular network (MN) of the extract. Numbers inside the nodes correspond to the accurate mass for each precursor ion. Node size is proportional to the intensity of the ion peaks in the total ion chromatogram. Node colors represent the three different groups: gallic acid derivatives (G1, blue), flavan-3-ol derivatives (G2, orange) and flavone derivatives (G3, green). Rhomboid shapes correspond to a match against an experimental MS2; square shapes are identifications from in silico spectra matching.
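The molecular-formula lookup against an in-house database can be illustrated with a minimal Python sketch. It is not the heuristic filtering of [48]: it only computes the theoretical [M−H]− m/z of each database formula and keeps the entries within a ppm window. The three database entries, the function names and the 10 ppm tolerance are assumptions chosen for illustration.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "N": 14.00307401}
PROTON = 1.00727646  # mass of a proton (u)

# Hypothetical excerpt of an in-house database (name -> molecular formula).
IN_HOUSE_DB = {
    "gallic acid": "C7H6O5",
    "gallocatechin": "C15H14O7",
    "myricitrin": "C21H20O12",
}


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C7H6O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def deprotonated_mz(formula: str) -> float:
    """Theoretical m/z of the singly charged [M-H]- ion."""
    return monoisotopic_mass(formula) - PROTON


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_formulas(measured_mz: float, database: dict, tol_ppm: float = 10.0):
    """Return (name, formula, ppm error) for entries within tol_ppm (assumed value)."""
    hits = []
    for name, formula in database.items():
        err = ppm_error(measured_mz, deprotonated_mz(formula))
        if abs(err) <= tol_ppm:
            hits.append((name, formula, round(err, 2)))
    return hits


if __name__ == "__main__":
    # 169.0140 is an illustrative precursor m/z, not a value from the study.
    print(match_formulas(169.0140, IN_HOUSE_DB))
```

Isomers share a molecular formula, so a formula hit of this kind only narrows the candidate list; the MS2 evidence is still needed to choose among them.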

Note: 1: gallic acid group; 2: flavan-3-ols group; 3: flavones group. N/A: not available, meaning this annotation is a spectral match against the in silico database.

Structural Identification of Compounds Present in the Extract of S. pulcherrimum
According to our annotation results, 30 out of 49 compounds had a spectral match against the experimental database from GNPS, while the remaining 19 had a match against the in silico database. To corroborate the structural proposals for these 19 features, their individual fragmentation patterns were explained. Figure 3 displays examples of fragmentation patterns for each structural group. Group 1 includes conjugates of gallic acid with sugars and small organic acids (Figure 4). Fragmentation patterns of these compounds are characterized by the ions of m/z 169 (gallate) and 125 (3,4,5-trihydroxybenzene, the ion derived from decarboxylation of the gallate) [50,51]. If a sugar unit is present, common losses of H2O, consecutive losses of C2H4O2 units from the residual hexose and the heterolytic cleavage of the sugar are observed [52]. For Group 2 (flavan-3-ol derivatives, Figure 5), the most characteristic fragmentation includes quinone methide fission (QM), heterolytic ring fission (HRF) and retro-Diels-Alder (RDA) cleavage, which inform on the hydroxylation patterns in the different rings (A, B and C) [53,54] and the identity of the monomeric units. Group 3 is mainly composed of glycosylated flavones (Figure 6). Their fragmentation usually yields a fragment ion of m/z 316 [C15H9O8• − H]•−, which points to homolytic cleavage and loss of the glycoside radical (sugar unit) [55,56]. Fragment ions of m/z 301 and 300 are indicative of heterolytic and homolytic cleavage, respectively, with loss of the glycoside [56]. The fragment ion [(Aglycone−H)−CO2]− of m/z 271 is also usually formed. The relatively light fragment ion of m/z 151 is produced by RDA cleavage of the aglycone [54,57].
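As an illustration of how the diagnostic ions listed above can be checked in practice, the following Python sketch flags which nominal diagnostic ions occur in an MS2 peak list. The nominal m/z values are those quoted in the text; the 0.5 Da window, the dictionary layout and the example spectrum are assumptions, and such a check supports rather than replaces manual interpretation.

```python
# Nominal diagnostic fragment ions quoted in the text (negative ion mode).
DIAGNOSTIC_IONS = {
    "G1 gallic acid derivatives": [169, 125],
    "G3 glycosylated flavones": [316, 301, 300, 271, 151],
}


def diagnostic_hits(fragment_mzs, tol=0.5):
    """Return, per group, the nominal diagnostic ions found in a peak list.

    tol is an assumed matching window in Da around each nominal mass.
    """
    hits = {}
    for group, ions in DIAGNOSTIC_IONS.items():
        found = [
            ion for ion in ions
            if any(abs(mz - ion) <= tol for mz in fragment_mzs)
        ]
        if found:
            hits[group] = found
    return hits


if __name__ == "__main__":
    # Hypothetical MS2 peak list of a galloyl conjugate (illustrative values).
    spectrum = [125.0244, 169.0142, 331.0671]
    print(diagnostic_hits(spectrum))
```

Group 2 is left out of the table on purpose: the QM, HRF and RDA fragments of flavan-3-ols depend on the monomer hydroxylation pattern, so no fixed m/z list is given for them in the text.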
The gallic acid derivative group, contributing to the main part of the cluster of Figure 4, covers a wide range of polarities (RT from 1 to 17 min, Figure 2). [...] proposed that the C-O-C bridge between the hexose and a gallic acid moiety is linked at C4 of the benzoate. In such connectivity, this bond is weaker, which could explain the variation in abundance of fragment ions with respect to 9 and [...].
Metabolites 2021, 11
Figure 4. Gallic acid derivatives in the largest cluster of the network. Numbers inside the nodes correspond to the accurate mass for each precursor. Node size is proportional to the intensity of the ions in the total ion chromatogram. Node colors represent structural groups: gallic acid derivatives (G1, blue), flavan-3-ol derivatives (G2, orange) and flavone derivatives (G3, green). Rhomboid shapes correspond to a match against an experimental MS2; square shapes are identifications from in silico spectra matching.
We identified 10 compounds, as illustrated in Figure 5, in the flavan-3-ols category, presenting a 2-phenyl-3,4-dihydro-2H-chromen-3-yl backbone [58]. These types of compounds are commonly used as functional/nutritional agents for beverages, fruits and vegetables, food grains, herbal remedies, dietary supplements and dairy products [59], or as ligands for protein inactivation, which prevents the growth of microorganisms [31].
Figure 5. Compounds assigned as flavan-3-ol derivatives (Group 2). Numbers inside the nodes correspond to the accurate mass for each precursor. Node size is proportional to the intensity of the ions in the total ion chromatogram. Node colors represent the three different groups: gallic acid derivatives (G1, blue), flavan-3-ol derivatives (G2, orange) and flavone derivatives (G3, green). Rhomboid shapes correspond to a match against an experimental MS2; square shapes are identifications from in silico spectra matching.
For the last group of chemical structures, a total of 24 flavone derivatives were detected and annotated (Figure 6). Flavones have a double bond between C2 and C3 (C ring) in the flavonoid structure, are oxidized at C4 and are reported to have a variety of biological activities [62,63].
Figure 6. Compounds assigned mainly as flavone derivatives in the cluster belonging to Group 3. Numbers inside the nodes correspond to the accurate mass for each precursor. Node size is proportional to the intensity of the ions in the total ion chromatogram. Node colors represent the three different groups: gallic acid derivatives (G1, blue), flavan-3-ol derivatives (G2, orange) and flavone derivatives (G3, green). Rhomboid shapes correspond to a match against an experimental MS2; square shapes are identifications from in silico spectra matching.
Compound 5 was assigned to myricetin 3-sorboside, based on its fragmentation pattern. From dereplication on ISDB, we suggest that its CH2OH group is not at the C-5 position of the hexose, as in myricetin 3-galactoside, but at C-1. This assignment is justified by the uncommon fragment ions of m/z 250 and 121. This structural difference may explain why 5 clustered with the gallic acid derivatives (Figure 1 and Figure S1). [...] Figures S4-S8.
In this study, we aimed to comprehensively describe the composition of the leaves of Stryphnodendron pulcherrimum based on the annotation of all metabolites detected in the hydro-acetonic extract by UHPLC-MS/MS. The combination of spectral organization through MN [34] and the search against both experimental and in silico spectral databases considerably reduces the time and resources needed for the putative identification of a large number of metabolites. Combining the MN with matching of MS/MS spectra against in silico databases [35] made it possible to re-weight the putative candidates based on the biological source and taxonomy [36]. In addition, filtering with MF assignment and manually verifying major fragmentation patterns allowed us to significantly increase the confidence of the metabolite annotation.
All compounds reported here, mainly small phenolic conjugates and polyphenols, derive from the upstream part of the shikimate, phenylpropanoid or phenylpropanoid-acetate pathways. The MN presents two major clusters. The largest one is mainly constituted by compounds from G1 (Figure 4), which are derivatives and conjugates of gallic acid (1). Note that gallic acid displays several biological activities and is considered a biomarker for the chemotaxonomy and differentiation of Stryphnodendron species because of its recurrence and concentration [43,64-67].
The second major cluster, G3, composed entirely of flavone derivatives (Figure 6), contains the most abundant compounds according to their ion intensities in the TIC trace (Figure 2). Myricitrin (31) displays a wide range of biological activities and is remarkable as an anti-inflammatory and antioxidant agent [68-73]. Additionally, a smaller group of compounds bearing a flavan-3-ol core structure was putatively identified. We unequivocally detected and identified gallocatechin and (epi)gallocatechin (4, 6). These, along with catechin and other monomeric units, are the precursors of most condensed tannins [59,74]. The added value of this type of study is supported by previous findings in which some of the identified compounds were reported in other biological sources as promising bioactive agents for the pharmaceutical industry.
Myricitrin (31) was the most intense compound found in the extract. It was recently proposed as a pre-treatment for liver ischemia/reperfusion injury [69], and it also presents antithrombotic [70], anti-osteoarthritic [71], antioxidant, anti-inflammatory and antifibrotic [72] activities. Quercitrin (41) was recently described to reduce the infarct volume in stroke in mice [85], to display anti-inflammatory [86], antiproliferative and anti-apoptotic effects on lung cancer cells [87] and to have cytoprotective effects against free radicals [88].
The metabolites present in the leaves of S. pulcherrimum therefore have great biological potential, some of it already proven and some yet to be discovered, making this plant a promising new, natural and renewable Amazonian source of bioactive molecules. The dereplication process employed proved to be efficient, producing valuable information about the chemical composition in a considerably shorter time than classical phytochemical studies. Finally, the description of the various structurally new secondary metabolites.

Plant Material Collection and Extraction
Stryphnodendron pulcherrimum leaves were collected in September 2019 in Vigia (PA, Brazil). The collection is associated with the research project registered in SISGEN under number A678D8C. An exsiccate (Voucher IAN 199608) was deposited in the herbarium of Embrapa Amazônia Oriental, Pará, Brazil. The leaves were washed with water and then disinfected with a sodium hypochlorite solution (NaOCl, 0.1%), followed by ethyl alcohol (70% v/v). Then, the leaves were dried in an air circulation oven at 45 °C until constant weight. The dried leaves were crushed in a Fritsch pulverisette 14 ball mill (Idar-Oberstein, RP, Germany), obtaining 1.866 kg of powder with a 60-100 µm particle size.
A total of 20 g of powder was extracted with 100 mL of H2O/acetone (3:7) in a Branson 2510 ultrasound bath (Danbury, CT, USA) for 40 min at 25 °C, as previously reported [28]. The liquid was vacuum filtered, and the solvent was reduced under vacuum in a Büchi Syncore rotary evaporator (Flawil, Switzerland). A total of 6 g of dry extract (HCOE) was obtained. The extract was kept at −18 °C in an Indrel freezer (Londrina, PR, Brazil) until analysis.

Sample Preparation for UHPLC-MS/MS
HCOE (1 mg) was solubilized in 1 mL of H2O/methanol (2:8 v/v) and passed through a 50 mg C18 solid phase extraction (SPE) cartridge (Phenomenex, Torrance, CA, USA), previously conditioned with 1 mL of methanol and 1 mL of water. The filtrate was dried, and the solid residue was dissolved in 1 mL of methanol, filtered through a 0.22 µm hydrophilic filter (Millipore, Merck, Darmstadt, Germany) and diluted to 100 µg/mL.

UHPLC-MS/MS
The analyses were performed on a UHPLC coupled to an ESI-QTof Xevo G2-S Tof mass spectrometer (Waters Corp., Milford, MA, USA) with an electrospray ionization (ESI) probe operating in the negative ion mode. The m/z range was 100-1200, and leucine-enkephalin was used as the LockSpray reference compound. Analysis by UHPLC was carried out on a Waters BEH C18 column (50 × 2.1 mm, 1.7 µm). The column and autosampler were kept at 40 and 25 °C, respectively. Elution was performed with ultra-pure water (solvent A) and methanol acidified with 0.1% formic acid (solvent B). The gradient method was set as follows: 0-2 min, 2% B; 2-10 min, 2-10% B; 10-20 min, 10-20% B; 20-25 min, 20-30% B; 25-30 min, 30-40% B. The flow rate was 250 µL/min. The total ion chromatogram was acquired using MassLynx V4.1 software (Waters Corp., Milford, MA, USA). The mass spectrometry parameters were set as follows: desolvation gas flow (N2) at 600 L/h, desolvation temperature at 150 °C, cone gas flow (N2) at 50 L/h and source temperature at 120 °C. The capillary and sampling cone voltages were adjusted to 1.0 kV and 40 V, respectively.
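The gradient program above can be written as a small lookup, which is convenient for relating a retention time to the mobile-phase composition at elution. The following Python sketch assumes linear ramps between the stated breakpoints and ignores the system dwell volume; the names GRADIENT and percent_b are illustrative.

```python
# (time in min, % solvent B) breakpoints taken from the gradient description.
GRADIENT = [
    (0.0, 2.0),
    (2.0, 2.0),
    (10.0, 10.0),
    (20.0, 20.0),
    (25.0, 30.0),
    (30.0, 40.0),
]


def percent_b(t_min: float) -> float:
    """Programmed % B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-30 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")


if __name__ == "__main__":
    for t in (1.0, 6.0, 17.0, 22.5, 30.0):
        print(f"{t:5.1f} min -> {percent_b(t):.1f}% B")
```

Under these assumptions, a compound eluting at 17 min, the upper end of the retention window reported for the gallic acid derivatives, leaves the column at a programmed composition of 17% B.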
The data-dependent experiments (DDA, MS/MS) were performed on the five most abundant ions detected in full-scan MS (top 5 experiments per scan). The ion peaks were detected at +1 and +2 charge states with the inclusion of the 10 most intense ion peaks with 0.2 Da (m/z) charge state tolerance and 2 Da extraction tolerance. The differentiation of molecular ions, adducts and fragment ions was conducted by chromatographic deconvolution with 3 Da de-isotope tolerance and 6 de-isotope extraction tolerance. The MS/MS isolation window width was 1 Da, and the stepped normalized collision energy (NCE) was set to 10, 20, 30, 40 and 50 eV units.

Mass Spectrometry Data Treatment Parameters
The UHPLC-MS/MS data were converted from the RAW (Waters Corp., Milford, MA, USA) standard data format to the mzXML format using MSConvert 3.0.2 [89]. The resulting file was treated using MZmine v2.53 [33]. For mass detection at the MS1 and MS2 levels, noise levels of 5.0E2 and 0.0E0 were used, respectively. The ADAP chromatogram builder algorithm was used with a minimum group size of 3 scans, a minimum group intensity threshold of 5.0E2 and a minimum highest intensity of 5.0E2, with an m/z tolerance of 12.0 ppm. The ADAP algorithm (wavelets) was used for chromatogram deconvolution. The intensity window S/N was used as the S/N estimator, with a signal-to-noise ratio set to 5, a minimum feature height of 5.0E2, a coefficient area threshold of 50, a peak duration range from 0.01 to 1.0 min and an RT wavelet range from 0.01 to 0.08 min. Isotopes were detected using the isotope peak grouper with an m/z tolerance of 5.0 ppm, an RT tolerance of 0.05 min (absolute) and the maximum charge set at 1; the representative isotope used was the most intense. The resulting peak list was filtered using the peak-filter option (height: 1.0E4 to 1.0E7; data points: 8 to 50; tailing factor: 0.5 to 2.00). Last, using the peak-list rows filter option, features without an associated MS2 spectrum were removed. The resulting peak list, containing 283 features, was exported using the GNPS Export/Submit module to an .mgf file and a quantitation table in csv format.
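The last two filtering steps can be summarized in a few lines of Python. This is a minimal sketch in plain Python, not MZmine code or its API: the Feature class and its field names are hypothetical, while the numeric limits are the peak-filter settings quoted above.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical container for one row of a peak list."""
    mz: float
    rt_min: float
    height: float
    data_points: int
    tailing_factor: float
    has_ms2: bool


def passes_filters(f: Feature) -> bool:
    """Apply the peak-filter limits and the MS2 requirement from the text."""
    return (
        1.0e4 <= f.height <= 1.0e7
        and 8 <= f.data_points <= 50
        and 0.5 <= f.tailing_factor <= 2.0
        and f.has_ms2
    )


if __name__ == "__main__":
    # Illustrative features only; none of these values come from the study.
    peak_list = [
        Feature(169.0142, 1.8, 2.4e5, 14, 1.1, True),   # kept
        Feature(463.0882, 15.2, 3.0e3, 12, 1.0, True),  # too low
        Feature(305.0667, 6.4, 8.0e4, 20, 1.3, False),  # no MS2 spectrum
    ]
    kept = [f for f in peak_list if passes_filters(f)]
    print(f"{len(kept)} of {len(peak_list)} features kept")
```

Requiring an MS2 spectrum for every retained feature is what makes the exported .mgf file and the quantitation table line up row for row in the subsequent networking step.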

Feature-Based Molecular Networking and Taxonomically Informed Metabolite Annotation
From the .mgf file obtained from the MZmine treatment, an MN was created using the Feature-Based Molecular Networking workflow [13] on the GNPS platform (https://gnps.ucsd.edu). The precursor ion mass and the MS/MS fragment ion tolerances were set to 0.02 and 0.05 Da, respectively. An MN was then created in which the edges were filtered to have a cosine score above 0.6 and more than 3 matched peaks. The edges between two nodes were kept in the network if and only if each of the nodes appeared in the other's respective top 10 most similar nodes. The molecular family size was set to a maximum of 100. The spectra in the network were searched against the GNPS spectral libraries [90]. The work is available at the following link: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=9e5e99ef164a4fa0bbe218c73de3ff26. Visualization of the results was carried out in Cytoscape 3.8.0 [91]. The output of GNPS was used to annotate against the in silico ISDB-DNP [35], and the script for taxonomically informed metabolite annotation [36] was then used to re-rank and clean the output based on taxonomy.
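The edge-filtering rules described above can be sketched in Python as follows. This is an illustration of the logic only, not the GNPS implementation: it takes precomputed pairwise scores as input, follows the wording of the text for the thresholds (strictly above 0.6, strictly more than 3 matched peaks) and omits the step that caps the molecular family size at 100. The function name and input layout are assumptions.

```python
from collections import defaultdict


def filter_edges(pairs, min_cosine=0.6, min_peaks=3, top_k=10):
    """Filter candidate edges given as (node_a, node_b, cosine, matched_peaks).

    Step 1 keeps pairs with cosine > min_cosine and matched_peaks > min_peaks.
    Step 2 keeps an edge only if each node is among the other's top_k most
    similar neighbours (mutual top-K rule).
    """
    kept = [
        (a, b, cos) for a, b, cos, n in pairs
        if cos > min_cosine and n > min_peaks
    ]

    # Collect, for every node, its neighbours with their scores.
    neighbours = defaultdict(list)
    for a, b, cos in kept:
        neighbours[a].append((cos, b))
        neighbours[b].append((cos, a))

    # Keep the top_k highest-scoring neighbours of every node.
    top = {
        node: {other for _, other in sorted(scored, reverse=True)[:top_k]}
        for node, scored in neighbours.items()
    }

    return [(a, b, cos) for a, b, cos in kept if b in top[a] and a in top[b]]


if __name__ == "__main__":
    # Hypothetical pairwise scores between four features.
    candidates = [
        ("A", "B", 0.90, 6),
        ("A", "C", 0.80, 5),
        ("B", "C", 0.70, 4),
        ("A", "D", 0.95, 2),  # too few matched peaks
        ("C", "D", 0.50, 8),  # cosine too low
    ]
    print(filter_edges(candidates))           # top 10, as in the text
    print(filter_edges(candidates, top_k=1))  # stricter, to show the mutual rule
```

The mutual top-K rule prunes edges around highly connected nodes, which keeps large compound families such as the gallic acid cluster readable without raising the cosine threshold.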

Conclusions
The metabolite profiling workflow based on UHPLC-MS/MS provided a comprehensive survey of the phenolic composition of the hydro-acetonic extract of Stryphnodendron pulcherrimum leaves. This workflow identified 30 compounds through spectral matching against experimental data from GNPS. Additionally, 19 compounds were putatively assigned via the same spectral matching but against in silico databases. Compounds 1, 4, 6, 13, 24, 31, 42 and 45 have been previously reported in the genus Stryphnodendron. Gallic acid (1), (epi)gallocatechin (6), (epi)gallocatechin gallate (13), myricitrin (31) and myricetin (42) were reported previously in the species by our group [49]. To the best of our knowledge, 44 of these compounds are reported herein for the first time in the species Stryphnodendron pulcherrimum and 41 are described for the first time in the genus. The significant and diverse biological activities reported for many of these compounds indicate that this species represents a potentially important biological source of interesting bioactive phenolic compounds.