Computational Metabolomics Tools Reveal Subarmigerides, Unprecedented Linear Peptides from the Marine Sponge Holobiont Callyspongia subarmigera

A detailed examination of a unique molecular family, restricted to the Callyspongia genus, in a molecular network obtained from an in-house Haplosclerida marine sponge collection (including Haliclona, Callyspongia, Xestospongia, and Petrosia species) led to the discovery of subarmigerides, a series of rare linear peptides from Callyspongia subarmigera, a species of a genus mainly known for polyacetylenes and lipids. The structure of the sole isolated peptide, subarmigeride A (1), was elucidated through extensive 1D and 2D NMR spectroscopy and HRMS/MS, and its absolute configuration was assigned using Marfey’s method. The putative structures of seven additional linear peptides were proposed by an analysis of their respective MS/MS spectra and a comparison of their fragmentation patterns with that of the heptapeptide 1. Surprisingly, several structurally related analogues of subarmigeride A (1) occurred in one distinct cluster from the molecular network of the cyanobacteria strains of the Guadeloupe mangroves, suggesting that the true producer of this peptide family might be the microbial sponge-associated community, i.e., the sponge-associated cyanobacteria.


Introduction
In our previous study [1], a comprehensive metabolomic strategy, integrating 1H NMR- and HRMS-based multiblock modelling in conjunction with taxonomically informed molecular networking, was used for the study of 33 Haplosclerida marine sponge samples of three different families (Callyspongiidae, Chalinidae, and Petrosiidae) and four different genera (Callyspongia Duchassaing and Michelotti 1864, Haliclona Grant 1841, Petrosia Vosmaer 1885, and Xestospongia de Laubenfels 1932). To inspect the chemical space of the 33 marine sponge extracts, a feature-based molecular network [2] was generated from the layering of their acquired LC-MS/MS and taxonomical data, allowing us to assess the families of compounds that are shared between several species or genera or that are unique to a group. This strategy has proven to be a powerful way to map large dataset collections in order to perform natural product prioritization [3-6]. As a way to highlight the unique chemistries within this taxonomically homogeneous set of samples, the whole molecular network (175 mass features) was mapped at the genus level using a typical color tag (Figure S1A, Supporting Information). Remarkably, a unique molecular family, containing 56 related nodes, seemed entirely restricted to the Callyspongia genus (Figure S1B), and the MS/MS fragmentations suggested the presence of linear peptides. This observation stimulated our interest as, so far, the only peptides known from the Callyspongia genus are cyclic [7]. For this reason, and despite no biological activities being observed in the crude extract, the compound at m/z 857.4914 from the species Callyspongia subarmigera (Cladochalina) was isolated on the basis of its HPLC profile. Its structure was determined through extensive 1D and 2D NMR spectroscopy, coupled with HRMS/MS data, as an unusual linear heptapeptide that we named subarmigeride A (1).
The advanced Marfey's method was used to assign the absolute configuration of the seven amino acids. The putative structures of the seven linear peptides, belonging to the same family, were proposed by an analysis of their respective MS/MS spectra and a comparison of their fragmentation patterns with that of the heptapeptide subarmigeride A (1). Moreover, using the metabolomics tool MASST [8], the annotations matched with the peptide analogues from cyanobacteria, in particular the Synechocystis sp. PCC 6803 strain. On the basis of these findings, we have re-examined the cluster of ten linear peptides of the molecular network in cyanobacteria from the mangroves of Guadeloupe [9] (Figure S2) that includes species of the orders Synechococcales and Spirulinales. A comparison of the MS2 spectra revealed these peptides to be analogues of subarmigeride A (1) and allowed us to putatively identify seven of them, suggesting a microbial origin for the unusual linear peptide series found in Callyspongia subarmigera. The identification of this unusual linear peptide series appeared as a challenge in these Callyspongia sponges.
.1160 originating from the loss of two leucine or isoleucine, two phenylalanine, and three proline residues, in conjunction with the presence of characteristic iminium ion peaks at m/z 70.0645, 86.0963, and 120.0802, accounted for the presence of three proline, two leucine, and two phenylalanine units. This hypothesis was then confirmed by an NMR analysis (Figures S3-S9), as described below.
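The diagnostic value of the three iminium ions can be checked with a short calculation. The following is a minimal sketch, assuming the usual immonium compositions for proline (C4H8N+), leucine or isoleucine (C5H12N+), and phenylalanine (C8H10N+) and standard monoisotopic masses; the dictionary and function names are illustrative, and the 10 ppm window is borrowed from the tolerance quoted elsewhere in this article rather than from the original fragment analysis.

```python
# Standard monoisotopic masses (u) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740}
ELECTRON = 0.00054858

# Assumed immonium ion compositions for the three residue types.
IMMONIUM = {
    "Pro": {"C": 4, "H": 8, "N": 1},
    "Leu/Ile": {"C": 5, "H": 12, "N": 1},
    "Phe": {"C": 8, "H": 10, "N": 1},
}

# Observed peaks reported in the text.
OBSERVED = {"Pro": 70.0645, "Leu/Ile": 86.0963, "Phe": 120.0802}


def cation_mz(formula):
    """m/z of a singly charged cation: atom masses minus one electron."""
    return sum(MONO[element] * count for element, count in formula.items()) - ELECTRON


def ppm_error(observed, expected):
    """Signed mass error in parts per million."""
    return (observed - expected) / expected * 1e6


theoretical = {name: cation_mz(formula) for name, formula in IMMONIUM.items()}
errors = {name: ppm_error(OBSERVED[name], theoretical[name]) for name in IMMONIUM}

for name in IMMONIUM:
    print(f"{name}: theoretical {theoretical[name]:.4f}, error {errors[name]:+.1f} ppm")
```

Under these assumptions, all three observed peaks fall within 10 ppm of the calculated values. Note that the iminium ion alone cannot distinguish leucine from isoleucine.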

Results and Discussion
In the 1H NMR spectrum, four distinct amide NH signals and seven distinct α-proton signals were present (Table 1). Only four α-protons showed relevant correlation in the TOCSY spectrum with the corresponding amide NH signals (Table 1), confirming the presence of three proline units. The combined analyses of the two-dimensional NMR spectra, i.e., COSY, TOCSY, and HSQC, confirmed the presence of two leucine and two phenylalanine residues. An analysis of the HMBC data led to the assignment of the CO signal of each amino acid (except for L-Pro II and L-Leu II) through its cross peak with the relevant proton in position 2 and/or 3. The HMBC and NOESY spectra (Table 1) led to the determination of the inter-residue linkages through the correlations between the four amide protons with the carbonyl 13C signals of the subsequent amino acids (Leu I-NH with Pro I-CO, Phe I-NH with Phe II-CO, Phe II-NH with Pro III-CO, and Leu II-NH with Leu II-1). Regarding the proline units, the HMBC data were used to correlate the protons in position 5 with the α-proton of the subsequent proline (Pro I-5 with Pro I-2) and the carbonyl 13C signals of the subsequent phenylalanine and leucine, respectively (Pro II-5 with Phe I-1 and Pro III-5 with Leu II-1).
In addition, three signals (brs) at δ 7.96 (Leu II-1), 6.97 (Leu I-NH2a), and 7.17 (Leu I-NH2b) were observed in the proton spectrum. The HMBC correlations between the proton Leu II-1 with the CO-Leu II-1, and Leu I-NH2a and b with Leu I-CO suggested the presence of one terminal formamide and two terminal amide protons, respectively. This agreed with the molecular formula of compound 1 and allowed us to define the planar structure of compound 1 as being the linear heptapeptide NH2-Leu I-Pro I-Pro II-Phe I-Phe II-Pro III-Leu II-CHO, as depicted in Figure 1, that was named subarmigeride A (1). As the ∆δC3-C4 values of the three Pro residues were below 8.0 ppm (4.1, 3.7, and 4.5 ppm, respectively), they were determined to be trans. In addition, according to the empirical rule, no NOESY correlation was observed between the Hα of the Pro residues and the Hα of their vicinal amino acid [10]. The structure of the linear heptapeptide 1 was further confirmed by the examination of its MS/MS fragmentation pattern (Table 2, Figure S10). Using the advanced Marfey's method [11], the configuration of each amino acid was determined to be L-configuration (Figure S11).
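The consistency between the proposed planar structure and the observed m/z 857.4914 can be illustrated numerically. The sketch below is not the authors' workflow: it assumes that the observed ion is the protonated molecule, reads the sequence from the formylated end (Leu II) to the amidated end (Leu I), and uses standard monoisotopic residue masses. It also generates a theoretical b/y fragment ladder of the kind compared against in MS/MS analysis; the values it produces have not been checked against Table 2.

```python
# Standard monoisotopic residue masses (u).
RESIDUE = {"Leu": 113.084064, "Pro": 97.052764, "Phe": 147.068414}
PROTON = 1.007276
FORMYL = 27.994915  # CO gained when the N-terminal H is replaced by CHO
NH3 = 17.026549     # terminal H plus amide NH2 (instead of H + OH in a free acid)

# Assumed reading of the planar structure, formyl end first.
SEQUENCE = ("Leu", "Pro", "Phe", "Phe", "Pro", "Pro", "Leu")
OBSERVED_MZ = 857.4914


def precursor_mz(seq):
    """[M+H]+ of an N-formylated, C-terminally amidated linear peptide."""
    return sum(RESIDUE[r] for r in seq) + FORMYL + NH3 + PROTON


def b_ions(seq):
    """b1..b(n-1): N-terminal fragments carrying the formyl group."""
    total, ladder = FORMYL + PROTON, []
    for residue in seq[:-1]:
        total += RESIDUE[residue]
        ladder.append(total)
    return ladder


def y_ions(seq):
    """y1..y(n-1): C-terminal fragments carrying the amide."""
    total, ladder = NH3 + PROTON, []
    for residue in reversed(seq[1:]):
        total += RESIDUE[residue]
        ladder.append(total)
    return ladder


theoretical_mz = precursor_mz(SEQUENCE)
error_ppm = (OBSERVED_MZ - theoretical_mz) / theoretical_mz * 1e6
b_ladder = b_ions(SEQUENCE)
y_ladder = y_ions(SEQUENCE)

print(f"[M+H]+ calculated {theoretical_mz:.4f}, error {error_ppm:+.2f} ppm")
for i, (b, y) in enumerate(zip(b_ladder, y_ladder), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

With these assumptions the calculated [M+H]+ is about 857.4920, less than 1 ppm from the observed value. For an analogue, a mass shift that appears from a given b or y ion onward localizes the residue that differs from compound 1.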

Putative Structure of Seven Additional Linear Peptides
A detailed examination of the cluster of the molecular networking revealed that additional related peptides were shared with subarmigeride A (1) from the marine sponge C. subarmigera (Figure 2). We decided to explore their chemical structure thanks to their respective MS/MS fragmentation spectra (Figures S12-S18 and Tables S1-S7). The identification of the putative seven additional peptides was obtained according to their fragmentations, compared with the linear heptapeptide 1 (Table 3).
Mar. Drugs 2022, 20, x FOR PEER REVIEW 6 of 13

Occurrence of the Linear Peptide Subarmigeride A (1) in the Previously Studied Cyanobacterium PMC 1052.18 from Guadeloupe
To further clarify the occurrence of subarmigeride A (1), its MS/MS spectrum was queried across all public GNPS datasets [12] using the recently introduced metabolomics tool MASST [8]. Interestingly, among the matched datasets, some analogous compounds occurred in cyanobacterial genera such as Synechocystis (Synechocystis sp. PCC 6803, order Synechococcales).
These results prompted us to re-examine a distinct peptide cluster (Figure S2) in our previous study on cyanobacteria strains from the mangroves of Guadeloupe that showed the occurrence of twelve known peptides from marine sponges in the molecular network annotated with the DEREPLICATOR tool [9]. Satisfyingly, the extracted ion chromatograms of the Guadeloupe cyanobacteria strain PMC 1052.18, corresponding to cyanobacterium gen. nov. 3, sp. nov. 1, revealed the presence of a coincidental feature at m/z 857.4914 (tolerance < 10 ppm) with a close retention time (RT = 5.974 min, ∆ = 0.011 min). The MS/MS spectra of both features appeared similar (Figure S19). The strain PMC 1052.18 was isolated from a large benthic bacterial mat. It was initially assigned to a novel genus and species with only limited similarity to other cultured cyanobacterial strains, with a 16S rRNA sequence 90.7% similar to that of Synechocystis sp. PCC 6803 [9]. Recently published 16S rRNA sequences available in the GenBank database, as well as further microscopy, allowed us to refine the identification. PMC 1052.18 is closely related to one cyanobacterium assigned to the genus Spirulina from soil (strain HSDM2). Indeed, it displays 99% 16S rRNA sequence similarity as well as the helically coiled morphology typical of the genus Spirulina (Figure S20).
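The notion of a coincidental feature can be made concrete as a joint test on m/z and retention time. The sketch below is only an illustration: the 10 ppm tolerance and m/z 857.4914 come from the text, whereas the retention time window of 0.05 min, the second feature's m/z, and the function names are hypothetical.

```python
def ppm_difference(mz_a, mz_b):
    """Absolute m/z difference in parts per million, relative to mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def features_coincide(mz_a, rt_a, mz_b, rt_b, ppm_tol=10.0, rt_tol=0.05):
    """True when two LC-MS features agree in m/z (ppm) and retention time (min).

    rt_tol is a hypothetical window; the article reports only an observed
    difference of 0.011 min, not the tolerance that was applied.
    """
    return ppm_difference(mz_a, mz_b) <= ppm_tol and abs(rt_a - rt_b) <= rt_tol


# Sponge feature from the text versus a hypothetical cyanobacterial feature.
sponge = (857.4914, 5.974)
cyano = (857.4920, 5.985)  # illustrative values only
print(features_coincide(*sponge, *cyano))
```

A match on both criteria does not establish identity: isomers share the same m/z and may co-elute, which is why the MS/MS spectra also have to be compared.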
Despite the similarities in the retention time and fragmentation pattern with the cyanobacteria strain PMC 1052.18, the presence of subarmigeride A (1) could not be confirmed. However, a structurally related isomer of this peptide has been detected. This result suggests a potential microbial origin for this unusual linear peptide series found in the marine sponge Callyspongia subarmigera, possibly through some associated symbionts, since cyanobacteria have been reported to associate with the genus Callyspongia [13].

General Experimental Procedures
Mass spectra were recorded on a MAXIS II ETD ultra-high-resolution ESI-QTOF mass spectrometer. NMR spectra were obtained on either a Bruker Avance 400 or 600 spectrometer using standard pulse sequences. Flash chromatography was carried out on a Buchi C-615, C-601, C-605 pump system (Rungis, France). Analytical reversed-phase HPLC (Luna C18 column, 250 × 4.6 mm, 5 µm, Phenomenex, Torrance, CA, USA) was performed with an Agilent Infinity system (model 1220 LC), equipped with a photodiode array detector (model 1220 DAD Infinity LC) and the software OpenLab CDS. The data station recorded the wavelengths at 280, 254, and 210 nm. Column chromatography (CC) was performed using silica gel (200~400 mesh; Merck, Darmstadt, Germany) and Sephadex® LH-20 (Amersham Pharmacia, Uppsala, Sweden). The Marfey's experiment was performed using a Thermo LTQ Orbitrap XL mass spectrometer coupled to a Thermo Ultimate 3000 RS system (Thermo Fisher Scientific Spa, Rodano, Italy), which included a solvent reservoir, in-line degasser, ternary pump, column thermostat, and refrigerated autosampler.

Sponge Material
The sponge sample from this study was identified as Callyspongia (Cladochalina) subarmigera, code name SS18. It belonged to the in-house collection of sponge extracts of the order Haplosclerida, which was collected on South Sulawesi Island (Indonesia). Voucher specimens were deposited at the Naturalis Biodiversity Center.

Sponge Extract Preparation
As previously described [1], the sponge samples (500 g) were cut into small pieces and immediately immersed in MeOH (1 L) after collection. After filtration of an aliquot (20 mL), the solvent was evaporated, and 150 mg of each dry extract was mixed with 2 g of C18 and deposited as a powder on a C18 Sep-Pack cartridge (Phenomenex 200 mg/10 mL) to be eluted first with H2O (20 mL) in order to eliminate salt and second with MeOH (20 mL) to obtain the desalted extracts. After solvent evaporation, an aliquot of 200 µg was dissolved in 200 µL MeOH for mass analyses.

NMR Data Acquisition and Processing
Proton spectra were acquired at 600 MHz and 298 K on a Bruker Avance III HD spectrometer with a 5 mm reversed TCI cryoprobe. One-dimensional free induction decays (FID) were acquired with a single 90° pulse sequence on 64K data points for 10.0 ppm spectral width, with a 1 s relaxation delay and 256 scan accumulations.
Signal processing was automatically performed in TopSpin software including the Fourier transform with a 0.3 Hz line broadening, baseline correction, and chemical shift calibration (DMSO at δ H 2.50 ppm).
2D TOCSY experiment was performed on 2K data points for F2 and 0.5K data points for F1 with 10.0 ppm spectral width in both dimensions, spin lock of 80 ms, and 16 scan accumulations. 2D COSY experiment was performed on 2K data points for F2 and 0.5K data points for F1 with 10.0 ppm spectral width in both dimensions and 16 scan accumulations.
2D NOESY experiment was performed on 2K data points for F2 and 0.5K data points for F1 with 10.0 ppm spectral width in both dimensions, mixing time 500 ms, and 16 scan accumulations.
2D HSQC experiment was performed on 1K data points for F2 and 0.5K data points for F1 with 10.0 ppm spectral width for F2, 190 ppm spectral width for F1, and 16 scan accumulations.
2D HMBC experiment was performed on 2K data points for F2 and 0.5K data points for F1 with 10.0 ppm spectral width for F2, 230 ppm spectral width for F1, and 16 scan accumulations.

LC-MS 2 Analyses of Extracts
LC-ESI-HRMS2 analyses were performed using an ultra-high-performance LC system (Ultimate 3000 RSLC, Thermo Scientific, Waltham, MA, USA) coupled to a high-resolution electrospray ionization quadrupole time-of-flight (ESI-Q-TOF) mass spectrometer (MaXis II ETD, Bruker Daltonics, Billerica, MA, USA). An Acclaim RSLC Polar Advantage II column (2.2 µm, 2.1 × 100 mm, Thermo Scientific) was used for LC separation with a flow rate of 0.3 mL/min and a linear gradient from 5% B (A: H2O + 0.1% formic acid, B: ACN + 0.08% formic acid) to 100% B in 10 min and then 100% B over 1 min, followed by a decrease to 5% in 1 min for a total runtime of 20 min. The mass range m/z from 50 to 1300 in positive ion mode was acquired. Injection volume was set at 10 µL. Source parameters were set as follows: nebulizer gas 2. ESI source, operating in positive ion mode. A Sunfire analytical C18 column (150 × 2.1 mm; i.d. 3.5 µm, Waters, Milford, MA, USA) was used, with a flow rate of 250 µL/min and a linear gradient from 5% B (A: H2O + 0.1% formic acid, B: ACN) to 100% B in 20 min and then 100% B over 10 min for a total runtime of 30 min. Injection volume was set at 10 µL. Source parameters were set as follows: capillary temperature at 320 °C, source voltage at 3500 V, and sheath gas flow rate at 10 L/min. The divert valve was set to waste for the first 3 min. MS scans were operated in full-scan mode from m/z 100 to 1700 (0.1 s scan time) with a mass resolution of 11,000 at m/z 922. MS1 scan was followed by MS2 scans of the five most intense ions above an absolute threshold of 5000 counts. Selected parent ions were fragmented at a collision energy fixed at 45 eV and an isolation window of 1.3 amu. Calibration solution contained two internal reference masses (purine, C5H4N4, m/z 121.050873; and HP-921 [hexakis-(1H,1H,3H-tetrafluoropentoxy) phosphazene], C18H18O6N3P3F24, m/z 922.0098).
A permanent MS/MS exclusion list criterion was set to prevent oversampling of the internal calibrant. LC-UV and MS data acquisition and processing were performed using MassHunter Workstation software (Agilent Technologies, Massy, France).
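The data-dependent acquisition rule described above (the five most intense MS1 ions above 5000 counts, with the internal calibrant masses permanently excluded) can be sketched as a simple selection function. This is a conceptual illustration, not instrument control code: the function name, the peak-list format, and the 0.01 Da exclusion window are assumptions.

```python
# Internal reference masses from the text, used here as the exclusion list.
CALIBRANT_MZ = (121.050873, 922.0098)


def select_precursors(ms1_peaks, top_n=5, threshold=5000.0,
                      exclusion=CALIBRANT_MZ, exclusion_tol=0.01):
    """Pick precursors for MS2 from one MS1 scan.

    ms1_peaks: iterable of (m/z, intensity) pairs.
    exclusion_tol: hypothetical half-width (Da) of the exclusion window.
    Returns the m/z values of the selected ions, most intense first.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if intensity > threshold
        and not any(abs(mz - excluded) <= exclusion_tol for excluded in exclusion)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Illustrative scan: intensities are made up for the example.
scan = [
    (857.4914, 90000.0),
    (121.050873, 500000.0),  # calibrant, excluded despite its intensity
    (300.1, 4000.0),         # below the threshold
    (400.2, 6000.0),
    (500.3, 7000.0),
    (600.4, 8000.0),
    (700.5, 5500.0),
    (800.6, 5200.0),
]
selected = select_precursors(scan)
print(selected)
```

Without the exclusion list, the calibrant would be selected in every cycle and would take one of the five MS2 slots from sample ions.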

Mass Spectrometry: LC-MS/MS Data Processing
The MS2 data file was converted from the .d standard data format to .mzXML format using the MSConvert software, part of the ProteoWizard package [14]. All .mzXML files were then imported in MZmine 2 v.53 [15]. The mass detection was performed on exact masses with mass level 1 and centroided masses with mass level 2 by keeping the noise level at 1.2 × 10³ at MS1 and at 2 × 10¹ at MS2, respectively. The ADAP chromatogram builder was used to build a chromatogram with a minimum group size of scans of 2, a group intensity threshold of 2 × 10³, a minimum highest intensity of 2 × 10³, and m/z tolerance of 10 ppm [16]. For chromatogram deconvolution, the local minimum search algorithm was employed with the following settings: chromatographic threshold = 1%, search minimum in RT range (min) = 0.1, minimum relative height = 5%, minimum absolute height = 2 × 10³, min ratio of peak top/edge = 1.4, and peak duration range (min) = 0.05-2. 13C]+ adducts were filtered out by setting the maximum relative height at 100%. The resulting peak list was filtered to keep only rows with MS2 features. The .mgf and .csv files were generated using the dedicated "Export/Submit to GNPS/FBMN" option.

Mass Spectrometry: Molecular Networking
A molecular network was created using the online FBMN workflow (version release_28.2) on GNPS (https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=8a40068370b44e21855c1e14647ff23a, accessed on 24 May 2022) (Figure S21). The parent mass tolerance was 0.02 Da, and the MS/MS fragment ion tolerance was 0.02 Da. A network was then created where edges were filtered to have a cosine score above 0.65 and more than 6 matched peaks. Further edges between two nodes were kept in the network if, and only if, each of the nodes appeared in the respective top 10 most similar nodes of each other. The spectra in the network were then searched against GNPS spectral libraries. All matches kept between network spectra and library spectra were required to have a score above 0.65 and at least 6 matched peaks. The molecular networking data were analyzed and visualized using Cytoscape (ver. 3.9.1) [17].
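The two edge-filtering rules described above can be sketched in a few lines. This is a simplified illustration and not the GNPS implementation: it assumes that pairwise cosine scores and matched-peak counts have already been computed, and it ranks each node's neighbours only among the edges that pass the thresholds. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def filter_edges(pairs, min_cosine=0.65, min_matched=6, top_k=10):
    """Keep edges that pass the score thresholds and are mutual top-k neighbours.

    pairs: list of (node_a, node_b, cosine, matched_peaks) tuples.
    """
    # Rule 1: cosine score above 0.65 and more than 6 matched peaks.
    kept = [p for p in pairs if p[2] > min_cosine and p[3] > min_matched]

    # Collect the scored neighbours of every node.
    neighbours = defaultdict(list)
    for a, b, score, _ in kept:
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))

    # Rule 2: each node must be in the other's top_k most similar nodes.
    top = {
        node: {name for _, name in sorted(scored, reverse=True)[:top_k]}
        for node, scored in neighbours.items()
    }
    return [(a, b, s, m) for a, b, s, m in kept if b in top[a] and a in top[b]]


# Illustrative scores only; node names are placeholders.
example_pairs = [
    ("A", "B", 0.90, 8),
    ("A", "C", 0.80, 7),
    ("B", "C", 0.70, 7),
    ("A", "D", 0.95, 5),  # too few matched peaks
    ("C", "D", 0.60, 9),  # cosine too low
]
print(filter_edges(example_pairs))
print(filter_edges(example_pairs, top_k=1))
```

Lowering top_k prunes weaker connections from densely connected nodes, which keeps large molecular families from merging into a single cluster that is hard to interpret.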

Advanced Marfey's Analysis
According to a previously described experiment [10], compound 1 (200 µg) was hydrolyzed with 6 N HCl/AcOH (1:1) at 120 °C for 12 h. The residual HCl fumes were removed under an N2 stream. The hydrolysate of 1 was dissolved in TEA/acetone (2:3, 100 µL), and the solution was treated with 100 µL of 1% 1-fluoro-2,4-dinitrophenyl-5-D-alaninamide (D-FDAA) in ACN/acetone (1:2). The vial was heated at 50 °C for 2 h. The mixture was dried, and the resulting D-FDAA derivatives of Leu, Phe, and Pro were redissolved in MeOH (100 µL) for subsequent analysis. Authentic standards of L-Pro, L-Phe, and L-Leu were treated with L-FDAA and D-FDAA as described above and yielded the L-FDAA and D-FDAA standards. Marfey's derivatives of 1 were analyzed using HPLC-ESI-HRMS, and their retention times were compared with those of the derivatives of the authentic standards. A Kinetex C18 column (Phenomenex, 150 × 2.1 mm, 5 µm) was used. The gradient conditions were set as follows: 35 min prerun with 5% ACN, 5% ACN 3 min, 5%→50% ACN over 30 min, 50% ACN 1 min, 50%→90% ACN 1 min, and 90% ACN 6. Mass spectra were acquired in positive ion detection mode, and the data were analyzed using the suite of programs XCalibur. The precursor ion mass tolerance was set to 2.0 Da and the MS/MS fragment ion tolerance to 0.5 Da. The library spectra were filtered in the same manner as the input data. All matches kept between input spectra and library spectra were required to have a score above 0.2 and at least 3 matched peaks. The job is accessible here: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c118cea33a4443478caf118c82d4ec98, accessed on 2 September 2022.

Conclusions
A detailed examination of the whole molecular network of an in-house collection of Haplosclerida marine sponges led to the selection and deep analysis of the crude extract of the marine sponge Callyspongia subarmigera (Cladochalina) and to the first discovery of one unusual linear peptide that we named subarmigeride A (1). So far, only cyclic peptides have been reported in the literature from marine sponges of the genus Callyspongia (261 described species), which are mainly known for their polyacetylenes and lipids [7]. Furthermore, the seven additional putative peptides, revealed by fragmentation pattern analyses, contribute to the knowledge of the chemical diversity of marine sponges of the genus Callyspongia.
Although the biological activities of Callyspongia marine sponges have been previously reported, the evaluation of the crude extract of C. subarmigera at 10 µg/mL against the human lung adenocarcinoma (A549), colorectal carcinoma (HCT116), and leukemia (HL60) cell lines showed no cytotoxic activity. Furthermore, no environmental activities, including antibiofilm activity against Pseudomonas aeruginosa and antihelminthic activity, have been detected. Consequently, the role of the isolated peptide subarmigeride A (1) and its congeners within the marine sponge Callyspongia subarmigera remains to be elucidated. Furthermore, the purification of subarmigeride A and its analogues is in progress in order to obtain pure linear peptides that could be evaluated for biological and environmental activities. Interestingly, the occurrence of structurally related analogues produced by cyanobacteria was revealed using MASST. These results suggest that the reported linear peptides might originate from cyanobacteria, which are well-known producers of linear peptides [18]. The present study confirms that the marine sponge holobiont is a prolific source of unusual molecules and underlines the value of sharing well-informed omics datasets in the context of microbiome research.
Figure S3: 1H-NMR spectrum of subarmigeride A (1) (600 MHz, DMSO-d6). Figure S4: DEPTQ-NMR spectrum of subarmigeride A (1) (150 MHz, DMSO-d6). Figure S5: COSY-NMR spectrum of subarmigeride A (1) (600 MHz, DMSO-d6). Figure S6: TOCSY-NMR spectrum of subarmigeride A (1) (600 MHz, DMSO-d6). Figure S7: HSQC-NMR spectrum of subarmigeride A (1) (600 MHz, DMSO-d6). Figure S8: HMBC-NMR spectrum of subarmigeride A (1) (600 MHz, DMSO-d6). Figure S9: NOESY-NMR spectrum of subarmigeride A (1) (600 MHz, DMSO-d6). Figure S10 Figure S20: Helically coiled morphology of Spirulina sp. PMC 1052.18. Figure S21: Molecular networks obtained using the feature-based molecular network workflow on GNPS (https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=8a40068370b44e21855c1e14647ff23a).