2.1. Collection, Extraction, LC-MS2 Analysis, and Construction of the Molecular Network
A sample of the marine sponge Stylissa caribica, collected along the coast of Compass Cay Island in the Exuma Islands (Bahamas), was extracted with MeOH and CHCl3 mixtures. The MeOH extract was partitioned between H2O and n-BuOH, and the n-BuOH layer was combined with the other organic extracts. The total organic extract was fractionated by reversed-phase column chromatography on RP-18 silica gel.
One way to improve the quality of untargeted metabolic profiling is to use a wider separation space, which helps to keep the number of co-eluting metabolites low. Therefore, the RP-18 fractions (rather than the crude organic extract) were analyzed by liquid chromatography coupled with high-resolution tandem mass spectrometry (LC-HRMS2) on an LTQ Orbitrap instrument with an electrospray (ESI) source. A pentafluorophenyl (PFP) high-performance liquid chromatography (HPLC) column was used because its retention selectivity is partially orthogonal to that of the RP-18 stationary phase. After each full MS scan, the five most intense ions in the spectrum were fragmented in subsequent MS2 scans. From these data, a molecular network was generated by combined use of MZmine2 and MetGem.
The preprocessing of LC-MS2 data with MZmine was the key to obtaining a clear and informative network and is discussed in some detail here. In our view, the final goal of feature-based molecular networking (FBMN) is a one-to-one correspondence between nodes and compounds. In classical molecular networking, a number of obstacles prevent this. On one hand, the same compound can give rise to several nodes, because of the presence of isotope peaks and the frequent formation of different adduct ions (e.g., [M+H]+ and [M+Na]+), and because two noisy MS2 spectra of the same compound can be mistaken for spectra of different compounds when MS2 spectra are clustered. On the other hand, isomeric compounds can collapse into the same node if they show similar MS2 spectra and chromatographic information is not taken into account. To circumvent these problems, the following scheme for the preprocessing of LC-MS2 data was used.
After standard initial data processing (mass detection, chromatogram building, and chromatogram deconvolution), data from the LC-MS2 runs of individual fractions were joined in a single feature list using the Join aligner module. The Adduct search module was then used not only to identify peaks of [M+Na]+ and [M+K]+ adduct ions, but also to identify 13C (mass difference 1.0033) and 81Br (mass difference 1.9979) isotope peaks. These peaks were subsequently all removed using the Row filter module. As a result, most compounds in the extract gave only a single entry in the feature list. Finally, the Export to GNPS module was used to export the MS2 spectra into an .mgf file and quantitative data into a .csv file, which were used for the construction of the molecular network. Detailed information on data processing can be found in Stylissa_MZmine.xml in the Supplementary Materials section.
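The redundancy-removal step described above can be illustrated with a minimal Python sketch. It is not the MZmine implementation: the function names, the tolerances, and the feature list are hypothetical. The 13C and 81Br mass differences are those quoted in the text, the [M+Na]+ and [M+K]+ differences are calculated relative to [M+H]+, and the two m/z values 817.3876 and 839.3694 are those reported for compound 1 in Section 2.2.

```python
# Illustrative sketch only; not the MZmine Adduct search or Row filter code.
# Mass differences (Da) between a redundant peak and its parent peak.
DELTAS = {
    "[M+Na]+ adduct": 21.98194,  # Na minus H, relative to [M+H]+
    "[M+K]+ adduct": 37.95588,   # K minus H, relative to [M+H]+
    "13C isotope": 1.0033,       # value quoted in the text
    "81Br isotope": 1.9979,      # value quoted in the text
}


def flag_redundant(features, mz_tol=0.005, rt_tol=0.1):
    """Return {index: label} for features explained by a co-eluting feature.

    features: list of (mz, retention_time_min) for singly charged ions.
    mz_tol and rt_tol are hypothetical tolerances chosen for this example.
    """
    flagged = {}
    for i, (mz_i, rt_i) in enumerate(features):
        for j, (mz_j, rt_j) in enumerate(features):
            if i == j or abs(rt_i - rt_j) > rt_tol:
                continue
            for label, delta in DELTAS.items():
                if abs((mz_i - mz_j) - delta) <= mz_tol:
                    flagged[i] = label
    return flagged


def row_filter(features, flagged):
    """Keep only the features that were not flagged as redundant."""
    return [f for k, f in enumerate(features) if k not in flagged]


# Hypothetical feature list: retention times and the last two m/z values
# are invented for illustration.
features = [
    (817.3876, 12.0),  # [M+H]+ of compound 1
    (839.3694, 12.0),  # [M+Na]+ of compound 1
    (818.3909, 12.0),  # hypothetical 13C isotope peak
    (500.0000, 5.0),   # hypothetical unrelated feature
]
flagged = flag_redundant(features)
kept = row_filter(features, flagged)
```

In this toy list, the sodium adduct and the isotope peak are flagged and removed, so that the compound is left with a single entry, which is the outcome the preprocessing scheme aims for.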
Construction of a molecular network requires the selection of a few parameters that can dramatically affect the resulting network, and whose optimal values are strongly dependent on the nature of the sample, on the technology of the MS instrument, and on the settings used for the LC-MS2 runs. The three most important networking parameters are the mass tolerance for peak matching, the minimum number of matched peaks for a cosine score to be calculated, and the minimum cosine score for two nodes to be connected. Optimization of these parameters was pursued using the program MetGem, which for small datasets is far faster than the GNPS website (a few seconds vs. at least a few minutes). We found that setting the mass tolerance to 0.01 Da for both the parent and the fragment ions, the minimum number of matched peaks to eight, and the minimum cosine score to 0.55 produced the largest and most informative set of clusters, while still keeping the number of false positives low.
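How the three networking parameters interact can be sketched in Python as follows. This is a simplified cosine score with greedy peak matching, written only for illustration: the scoring actually used by MetGem and GNPS may differ in details such as intensity scaling and the handling of precursor mass shifts, and the function names are hypothetical. The default values are the optimized parameters reported above.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.01):
    """Simplified cosine similarity between two MS2 peak lists.

    spec_a, spec_b: lists of (mz, intensity).
    Returns (score, number_of_matched_peaks).
    """
    # All candidate peak pairs whose m/z values agree within the tolerance.
    pairs = [
        (ia * ib, x, y)
        for x, (ma, ia) in enumerate(spec_a)
        for y, (mb, ib) in enumerate(spec_b)
        if abs(ma - mb) <= tol
    ]
    # Greedy assignment: strongest products first, each peak used once.
    pairs.sort(reverse=True)
    used_a, used_b, dot = set(), set(), 0.0
    for prod, x, y in pairs:
        if x not in used_a and y not in used_b:
            used_a.add(x)
            used_b.add(y)
            dot += prod
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    score = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return score, len(used_a)


def connected(spec_a, spec_b, tol=0.01, min_matched=8, min_cosine=0.55):
    """Two nodes are linked only if both thresholds are met."""
    score, n_matched = cosine_score(spec_a, spec_b, tol)
    return n_matched >= min_matched and score >= min_cosine


# Hypothetical spectra with eight and seven well-separated peaks.
spec8 = [(100.0 + 10 * k, float(k + 1)) for k in range(8)]
spec7 = spec8[:7]
shifted = [(mz + 0.02, inten) for mz, inten in spec8]
```

With these toy spectra, `connected(spec8, spec8)` holds, whereas `connected(spec7, spec7)` fails on the minimum number of matched peaks even though the cosine score is perfect, and `connected(spec8, shifted)` fails because a 0.02 Da shift exceeds the 0.01 Da tolerance. This shows why the three parameters have to be tuned together.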
The .mgf and .csv files were then submitted to the GNPS website to produce the final, public version of the network. The Dereplicator tool in GNPS was then used to identify some of the nodes in the network. Unexpectedly, the network obtained using the new Feature-Based Molecular Networking workflow, combined with the optimized parameters discussed above, was remarkably different from the network produced by MetGem, with smaller clusters and many missing nodes (including the node of stylissamide L) (Figure S1 and Table S3). In contrast, the network produced using the older Metabolomics workflow and the same parameters was identical to the MetGem network. We were not able to determine the reason for this unexpected outcome and proceeded with the Metabolomics workflow. The feature-based network was constructed and visualized using the Cytoscape software, importing the relevant features directly from the quantitation file exported from MZmine.
The resulting network is shown in Figure 2. In the network, the color of each node is mapped to the relevant retention time to give a visual indication of the polarity of the metabolite, and the size of the node is related to the amount of the metabolite. In addition, nodes annotated by Dereplicator with a putatively identified metabolite are represented as hexagons.
Most clusters in the network were related to brominated compounds, which are abundant and diverse in S. caribica, but the largest cluster in the network was the cluster of cyclic peptides. Five of the nodes in this cluster could be putatively annotated as known peptides, two of which were not previously reported from S. caribica, but the remaining 13 nodes could not be associated with any known natural peptide, indicating the presence of new compounds. Interestingly, the most abundant unknown peptide (m/z 817.39) showed a much shorter retention time than the other peptides in the cluster; it was not present in the RP-18 fraction (fraction F4) where most of the other peptides were eluted, but in the earlier fraction F3. This peptide was isolated as a pure compound (7.2 mg) in a single step of reversed-phase HPLC and named stylissamide L (1).
2.2. Structure Elucidation of Stylissamide L (1)
The high-resolution ESI mass spectrum of stylissamide L (1) showed [M+H]+ and [M+Na]+ ion peaks at m/z 817.3876 and m/z 839.3694, respectively, which defined its molecular formula as C41H52N8O10 (the [M+H]+ ion being C41H53N8O10+), with 20 unsaturations. The fragmentation pattern observed in the MS2 spectrum of compound 1 confirmed a cyclic peptide structure, with fragments originating from the loss of H2O and CO and of one phenylalanine, one glutamine, one tyrosine, and one proline residue. The molecular formula was accounted for by one serine and two further proline residues in addition to the four residues above, thus defining the amino acid composition of compound 1, which was later confirmed by NMR analysis. Considering that these seven amino acids accounted for 19 degrees of unsaturation, the 20 unsaturations determined by the molecular formula confirmed the cyclic structure of compound 1.
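The arithmetic behind this assignment can be checked with a short Python sketch. It takes the neutral formula as C41H52N8O10, that is, the cyclic sum of three Pro, one Ser, one Tyr, one Gln, and one Phe residue (one hydrogen fewer than the protonated ion). The atomic masses are standard monoisotopic values; the function names and the per-residue unsaturation table are written out here for illustration.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858
PROTON = MASS["H"] - ELECTRON      # mass of H+
NA_CATION = 22.98976928 - ELECTRON  # mass of Na+

# Neutral formula assumed for compound 1 (cyclic sum of the seven residues).
FORMULA = {"C": 41, "H": 52, "N": 8, "O": 10}

# Rings plus double bonds contributed by each amino acid residue.
RESIDUE_UNSATURATION = {"Pro": 2, "Ser": 1, "Tyr": 5, "Gln": 2, "Phe": 5}
COMPOSITION = {"Pro": 3, "Ser": 1, "Tyr": 1, "Gln": 1, "Phe": 1}


def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses for a CHNO formula."""
    return sum(MASS[element] * count for element, count in formula.items())


def degrees_of_unsaturation(formula):
    """Rings plus double bonds for a neutral CHNO formula."""
    return (2 * formula["C"] + 2 + formula["N"] - formula["H"]) / 2


def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


neutral = monoisotopic_mass(FORMULA)
err_h = ppm_error(817.3876, neutral + PROTON)      # [M+H]+
err_na = ppm_error(839.3694, neutral + NA_CATION)  # [M+Na]+
total = degrees_of_unsaturation(FORMULA)
from_residues = sum(
    RESIDUE_UNSATURATION[res] * n for res, n in COMPOSITION.items()
)
```

Both observed ions fall within 1 ppm of the calculated values, and the formula gives 20 unsaturations against 19 from the residues alone; the remaining one corresponds to the macrocycle.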
A full set of homonuclear and heteronuclear two-dimensional NMR spectra (COSY, TOCSY, NOESY, HSQC, and HMBC) was recorded (Figures S3–S11). The proton spectrum showed four amide NH signals and seven α-proton signals, as expected for a cyclic heptapeptide with three proline residues. The aliphatic protons of each residue were identified from their cross peaks with the corresponding α-proton or amide NH signals in the TOCSY spectrum, and their assignment was achieved using the COSY and HSQC spectra (Table 1 and Figure S8).
The amino acid sequence in the peptide was determined from HMBC data. In addition to the standard HMBC experiment, a band-selective HMBC experiment was used to improve resolution in the 13C dimension and allow for discrimination of CO signals with very close 13C chemical shifts, such as ProII-C1 and ProIII-C1 (Figure S12). The most significant HMBC correlations used to elucidate the amino acid sequence are shown in Figure 3. The carbonyl 13C signals of each amino acid were assigned (except for Ser) based on their HMBC correlations with one or both protons at the respective β methylene (i.e., at position 3) (blue arrows in Figure 3). Inter-residue linkages were established by the HMBC correlations of the four amide protons (Ser-NH with ProII-C1, Tyr-NH with Ser-C1, Gln-NH with ProIII-C1, Phe-NH with Gln-C1) and of proline δ protons (ProI-5b with Phe-C1 and ProII-5b with ProI-C1) (red arrows in Figure 3), thus defining the sequence as cyclo(Pro-Pro-Ser-Tyr-Pro-Gln-Phe).
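The way the listed inter-residue correlations fix the ring order can be made explicit with a small Python sketch. Each pair encodes one HMBC correlation from the text (the carbonyl C1 of the first residue with the NH or H-5 of the second). The text lists six correlations for seven peptide bonds; closing the remaining Tyr/ProIII bond by elimination is an assumption of this sketch, and the function name is hypothetical.

```python
# (a, b): C1 of residue a correlates with NH (or proline H-5) of residue b,
# so residue a immediately precedes residue b in the sequence.
LINKS = [
    ("ProII", "Ser"),
    ("Ser", "Tyr"),
    ("ProIII", "Gln"),
    ("Gln", "Phe"),
    ("Phe", "ProI"),
    ("ProI", "ProII"),
]
RESIDUES = ["ProI", "ProII", "ProIII", "Ser", "Tyr", "Gln", "Phe"]


def assemble_cycle(links, residues, start):
    """Order the residues of a cyclic peptide from pairwise linkages."""
    successor = dict(links)
    # In a ring every residue has one successor and one predecessor, so a
    # single unobserved peptide bond can be closed by elimination.
    no_successor = [r for r in residues if r not in successor]
    no_predecessor = [r for r in residues if r not in successor.values()]
    if len(no_successor) == 1 and len(no_predecessor) == 1:
        successor[no_successor[0]] = no_predecessor[0]
    order, current = [], start
    while current not in order:
        if current not in successor:
            raise ValueError("linkages are insufficient to close the ring")
        order.append(current)
        current = successor[current]
    if current != start or len(order) != len(residues):
        raise ValueError("linkages do not define a single cycle")
    return order


order = assemble_cycle(LINKS, RESIDUES, start="ProI")
# Drop the roman-numeral labels that distinguish the three prolines.
sequence = "cyclo(" + "-".join(name.rstrip("I") for name in order) + ")"
```

Starting from ProI, the assembled order reproduces the sequence given in the text, cyclo(Pro-Pro-Ser-Tyr-Pro-Gln-Phe).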
The absolute configuration of the seven amino acid residues was defined by an advanced Marfey’s methodology, using the Orbitrap high-resolution MS instrument as the detector to improve sensitivity and specificity and to perform the analysis using only a few µg of sample [5]. Compound 1 (32 μg) was subjected to total hydrolysis by treatment with 6 N HCl/AcOH (1:1) at 120 °C for 18 h and then derivatized with the D-enantiomer of Marfey’s reagent (1-fluoro-2,4-dinitrophenyl-5-D-alanine amide, or D-FDAA), adding 100 μL of 1% D-FDAA. Under the total hydrolysis conditions used, the glutamine residue is transformed into glutamic acid. The resulting D-FDAA derivatives of Pro, Ser, Tyr, Glu, and Phe were analyzed by high-resolution LC-MS, and their retention times were compared with those of authentic standards prepared by reaction of L- and D-FDAA with L-Phe. LC-MS analysis revealed the L configuration for all amino acids, based on the retention times of the Marfey’s derivatives; the exclusive presence of L amino acids was in accordance with the other cyclic heptapeptides of the stylissamide class.
The NOESY spectrum of stylissamide L (1) showed many cross peaks between topologically distant protons (e.g., Tyr-NH with Phe-NH or Tyr-NH with ProI-H2; see also Table S2), suggesting a highly structured conformation, as in other stylissamides [21]. The electronic circular dichroism (ECD) spectrum (Figure S13) showed a rather complex band structure, with a positive Cotton effect at 236 nm and negative Cotton effects at 219 and 202 nm. It has been shown that configurational isomerism about proline peptide bonds is possible in strained cyclic peptides such as stylissamide H and euryjanicin A [21]. Therefore, the cis or trans geometry of the bond between each proline residue and the preceding amino acid should be considered a configuration rather than a conformation in such compounds, and needed to be clarified to complete the structural elucidation of stylissamide L. ProII was determined to be cis because of the NOESY cross peak between ProII-H2 and ProI-H2, and because the difference between the 13C NMR chemical shifts of ProII-C3 and ProII-C4 was greater than 8.0 ppm, with ProII-C4 below 23.3 ppm, in accordance with an empirical rule discussed in ref. [19]. Likewise, ProI and ProIII were deduced to be trans because the respective differences (3.8 and 3.7 ppm) between C-3 and C-4 chemical shifts were well below the 8.0 ppm threshold. Additionally, no NOESY cross peaks conflicting with this assignment were detected.
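The empirical 13C rule applied above can be written as a short Python function. This is only a sketch of the rule as it is stated in the text (difference above 8.0 ppm together with C-4 below 23.3 ppm for cis, difference below 8.0 ppm for trans); the function name and the "ambiguous" outcome for borderline cases are choices made here, and the chemical shifts in the example calls are hypothetical placeholders, not the values of Table 1.

```python
def proline_geometry(c3_ppm, c4_ppm, diff_threshold=8.0, c4_threshold=23.3):
    """Classify the Xaa-Pro peptide bond from proline C-3 and C-4 shifts.

    Returns "cis", "trans", or "ambiguous" when the two criteria disagree.
    """
    diff = c3_ppm - c4_ppm
    if diff > diff_threshold and c4_ppm < c4_threshold:
        return "cis"
    if diff < diff_threshold:
        return "trans"
    return "ambiguous"


# Hypothetical chemical shifts (ppm), chosen only to exercise each branch.
example_cis = proline_geometry(31.5, 22.0)        # large gap, low C-4
example_trans = proline_geometry(29.0, 25.2)      # gap of about 3.8 ppm
example_unclear = proline_geometry(32.0, 23.8)    # large gap, C-4 too high
```

In practice the rule is supporting evidence rather than proof, which is why the assignment in the text is also cross-checked against the NOESY correlations.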
From the structural point of view, stylissamide L is analogous in many ways to the other members of the stylissamide family, which are all heptapeptides rich in proline (from two to four proline residues); however, it is the first example of a stylissamide containing a serine residue. The reason why stylissamide L is poorly retained by the RP-18 stationary phase has no easy explanation. Stylissamide L lacks aliphatic amino acids other than proline, but this feature is common to other analogues like stylissamide F, which showed remarkably longer retention times; on the other hand, compounds with apparently similar polarity, like stylissamide A, are retained even less than stylissamide L by the RP-18 stationary phase (Table S1). It is possible that RP-18 retention times are strongly dependent on the conformation of the peptide, which may prevent non-polar regions of the molecule from interacting with the hydrophobic chromatographic stationary phase.
2.3. Cell Proliferation and Migration Assays
The peculiar conformational features of stylissamide L and the cytotoxic activity reported for some stylissamides prompted evaluation of the growth inhibitory effects of stylissamide L (1). Assays were conducted using MCF-7 breast cancer and BxPC-3 pancreatic cancer cells, through impedance-based dynamic monitoring of cell proliferation after drug exposure, following a previously described procedure [23]. After 72 h incubation with different concentrations (6.25, 12.5, 25, and 50 µM) of 1, MCF-7 and BxPC-3 cell growth remained substantially unaffected even at the highest dose tested (Figure S14).
Based upon structural similarity with the known cell-migration inhibitor stylissamide X [24], stylissamide L (1) was then evaluated for its ability to affect cell motility. Cell migration consists of chemoattractant-induced movement of cells from one location to another and is a crucial step in tumour cell dissemination and formation of metastases, making it an attractive target in cancer therapy. Migration of MCF-7 breast cancer cells and 3AB-OS osteosarcoma stem cells was monitored for 20 h after exposure to 10 and 50 µM of compound 1. Migratory activity of MCF-7 and 3AB-OS cells was unaffected or even slightly increased at 50 µM of 1.
In spite of the disappointing results of the assays described above, the structural diversity of the cyclic heptapeptides found in Stylissa sponges and the biological activity shown by some of them make this group of metabolites worthy of further examination. A more complete study of the biological activity of all cyclic peptides isolated from S. caribica, also aimed at determining structure–activity relationships, is in progress, and the results will be reported in due course.