A Mass Spectrometry Database for Sea Cucumber Triterpene Glycosides

Sea cucumber triterpene glycosides are a class of secondary metabolites that possess distinctive chemical structures and exhibit a variety of biological and pharmacological activities. The application of MS-based approaches for the study of triterpene glycosides allows rapid evaluation of the structural diversity of metabolites in complex mixtures. However, the identification of the detected triterpene glycosides can be challenging. The objective of this study is to establish the first spectral library containing the mass spectra of sea cucumber triterpene glycosides using ultraperformance liquid chromatography-quadrupole time-of-flight mass spectrometry. The library contains the electrospray ionization tandem mass spectra and retention times of 191 triterpene glycosides previously isolated from 15 sea cucumber species and one starfish at the Laboratory of the Chemistry of Marine Natural Products of the G.B. Elyakov Pacific Institute of Bioorganic Chemistry. In addition, the chromatographic behavior and some structure-related neutral losses in tandem MS are discussed. The obtained data will accelerate the accurate dereplication of known triterpene glycosides and the annotation of novel compounds, as we demonstrated by the processing of LC-MS/MS data of Eupentacta fraudatrix extract.


Introduction
Sea cucumbers are a source of a wide range of secondary metabolites, including triterpene glycosides, which are of particular interest due to their diverse biological and pharmacological effects, including cytotoxic, hemolytic, antiviral, antifungal, and immunomodulatory properties [1][2][3][4][5]. The sea cucumber triterpene glycosides are characterized by unique chemical structures with a conserved structural framework but significant natural diversity in some features, leading to a vast potential for structural variability. Triterpene glycosides have lanostane- and norlanostane-type aglycons, and most of them contain an 18(20)-lactone, although aglycons with an 18(16)-lactone or without a lactone cycle are not uncommon (Figure 1). The polycyclic nuclei of the aglycons usually contain a 7(8)- or 9(11)-double bond and may have oxygen-containing substituents at C-12, C-17, or C-16. The side chains are a major source of aglycon structural diversity due to chain length reduction and modification by oxygen-containing substituents and double bonds. The carbohydrate moiety of triterpene glycosides is attached to C-3 of the aglycon and can contain up to six monosaccharide units, with glucose (Glc), quinovose (Qui), xylose (Xyl), 3-O-methylglucose (MeGlc), and 3-O-methylxylose (MeXyl) being the most common monosaccharides. Xylose is always the first unit of the oligosaccharide chain.
Due to the extreme complexity of sea cucumber extracts and the enormous diversity of triterpene glycoside structures, conventional approaches in natural product research, which require the isolation of individual compounds by a combination of chromatographic techniques followed by structure elucidation by nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), or other analytical techniques, are time- and labor-intensive procedures.
The application of MS-based methods, such as liquid chromatography-electrospray mass spectrometry (LC-ESI MS), to the investigation of complex fractions of sea cucumber triterpene glycosides enables the acquisition of qualitative and quantitative information about the chemical composition, which can subsequently be used to facilitate the isolation of biologically active compounds as well as the study of biosynthetic pathways and biological roles of target metabolites. A recent review demonstrated the advantages of MS-based metabolomics as a powerful research tool for assessing the biochemical diversity of sea cucumbers and starfish [11].
The application of MS-based approaches to the study of complex mixtures of natural products can be fraught with difficulties, of which metabolite identification is a current bottleneck [12]. High-confidence identification can be achieved by comparing data, including monoisotopic mass, MS/MS spectra, and retention times, with data from a reference standard [13], which requires either an extensive chemical library or specific databases. Chemical libraries of natural products are very limited and cover only a small part of the total scope of biochemical diversity, especially in the field of marine natural products, due to the inaccessibility of sources and the enormously labor-intensive processes of synthesis or isolation of individual compounds and the maintenance of chemical collections. Therefore, the availability of comprehensive, open-access databases is extremely important for the successful application of mass spectrometry to the analysis of natural products [12].
The advancement of metabolomics as a powerful research tool in the field of natural products over the last two decades has promoted the development of open-access resources and databases containing experimental data and MS/MS spectra, such as MassBank [14], Metlin [15], MassBank of North America (https://mona.fiehnlab.ucdavis.edu, accessed on 1 April 2023), BMDMS-NP [16], and GNPS [17]. The latter is a web-based mass spectrometry platform that combines research tools for storing, analyzing, annotating, and sharing experimental data with an open-access MS/MS database combining several dozen spectral libraries, including several specialized natural product libraries such as the Lichen DataBase (LDB) [18], the Monoterpene Indole Alkaloids DataBase (MIADB) [19], the IsoQuinoline and Annonaceous Metabolites Data Base (IQAMDB) [20], the phytochemical database Sam Sik Kang Legacy Library [21], and others.
However, despite the increasing coverage of natural products in existing spectral libraries, data on marine natural products remain very limited. To our knowledge, there are currently no open spectral libraries containing MS/MS spectra of sea cucumber triterpene glycosides. This limitation, coupled with the significant structural diversity of these compounds, restricts the broad application of mass spectrometry in the investigation of triterpene glycosides.
Over the past decades, more than 700 new marine natural products, including a large series of sea cucumber triterpene glycosides, have been isolated and structurally elucidated at the G.B. Elyakov Pacific Institute of Bioorganic Chemistry. Herein, we report on the establishment of a sea cucumber triterpene glycoside database comprising MS/MS spectra and retention times for 191 compounds previously isolated from 15 species of Holothuroidea and one starfish at the Laboratory of the Chemistry of Marine Natural Products of the G.B. Elyakov Pacific Institute of Bioorganic Chemistry. The dataset and spectral library are open-access and can be downloaded as Supplementary Material or from MetaboLights using the identifier MTBLS7506.

Chemicals
Acetonitrile (UHPLC grade) was purchased from Panreac (Barcelona, Spain), and methanol (HPLC grade) was purchased from J.T. Baker (Deventer, The Netherlands). Water was obtained from an aquaMAX Ultra 370 Series water purification system (YoungIn Chromass, Anyang, Republic of Korea). MS-grade formic acid was purchased from Merck (Darmstadt, Germany).
The chemical library of triterpene glycosides is maintained in the Laboratory of the Chemistry of Marine Natural Products of the G.B. Elyakov Pacific Institute of Bioorganic Chemistry, FEB RAS. All the triterpene glycosides have previously been isolated as individual compounds from various sea cucumber species by a combination of column chromatography and HPLC, and the structures of these compounds have been elucidated by several independent methods, including NMR spectroscopy and MS. Specimens of sea cucumbers and starfish were collected from various locations, including the South China Sea, the Sea of Okhotsk, the Sea of Japan, the Bering Sea, the Arabian Sea, and the Eastern Weddell Sea, between 1982 and 2019, by scuba diving, Sigsbee trawling, or scallop dredging. Sampling and identification were performed by Prof. Valery S. Levin, Dr. Igor Yu. Dolmatov, Dr. Vadim G. Stepanov, Dr. Alexey V. Smirnov, Dr. Salim Sh. Dautov, and Boris B. Grebnev; voucher specimens are kept in the collection of the G.B. Elyakov Pacific Institute of Bioorganic Chemistry, FEB RAS. Most of the triterpene glycosides were obtained between 2011 and 2022 and stored in the dried state at a low temperature (−20 °C).

Sample Preparation
The dried samples of triterpene glycosides were dissolved in 50% methanol in water (v/v) (for non-sulfated, mono-, and disulfated compounds) or in water (for tri- and tetrasulfated compounds) at a concentration of 0.2 mg/mL. If necessary, samples were centrifuged at 10,000× g for 5 min. A 0.5 mL aliquot of the solution was transferred to a 2 mL autosampler vial for LC-MS analysis.

Data Acquisition
Data were acquired using a Bruker Elute UHPLC system (Bruker Daltonics, Bremen, Germany) consisting of an Elute Pump HPG 1300, an Elute Autosampler UHPLC, and an Elute Column Oven coupled to a Bruker Impact II Q-TOF mass spectrometer (Bruker Daltonics, Bremen, Germany). Chromatographic separation was performed using an InfinityLab Poroshell 120 SB-C18 column (2.1 × 150 mm, 1.9 µm, Agilent Technologies, Santa Clara, CA, USA) at a flow rate of 0.3 mL/min and a temperature of 40 °C. The mobile phase consisted of water (eluent A) and acetonitrile (eluent B), both acidified with 0.1% formic acid. The gradient program was as follows: eluent B was increased from 15 to 30% over the first 3 min, from 30 to 60% from 3 to 20 min, and from 60 to 100% from 20 to 21 min, held isocratically at 100% from 21 to 25 min, and finally reduced from 100 to 15% from 25 to 25.2 min. After the return to the initial conditions, the column was equilibrated for 2.8 min. The injection volume was 2 µL in the partial loop injection mode.
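The gradient program above can be written down as a small lookup, which is convenient when relating a retention time to the mobile-phase composition at elution. The following is a minimal sketch, not part of the acquisition method itself: it assumes the gradient starts at t = 0 min, that each segment is linear, and that the run ends after the 2.8 min equilibration (28.0 min in total). The names `GRADIENT` and `percent_b` are hypothetical.

```python
from bisect import bisect_right

# (time in min, % eluent B) breakpoints taken from the gradient described above.
# Assumptions: the program starts at t = 0 and the run ends at 25.2 + 2.8 = 28.0 min.
GRADIENT = [
    (0.0, 15.0),
    (3.0, 30.0),
    (20.0, 60.0),
    (21.0, 100.0),
    (25.0, 100.0),
    (25.2, 15.0),
    (28.0, 15.0),
]


def percent_b(t_min):
    """Return % eluent B at time t_min by linear interpolation between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:  # exactly at the end of the run
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

For instance, `percent_b(11.5)` gives the composition halfway through the 30 to 60% segment.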
Mass spectrometry detection was performed using an ESI ionization source in negative ion mode. Optimized ionization parameters for ESI were as follows: a capillary voltage of 4 kV, nebulization with nitrogen at 2.5 bar, and a dry gas flow of 6 L/min at a temperature of 215 °C. Mass spectra were recorded in the m/z range of 80-2000 at 1.5 Hz. The isCID energy was increased to 90.0 eV to avoid unnecessary adduct and dimer formation.
Collision-induced dissociation (CID) product ion mass spectra were recorded in auto-MS/MS mode. The precursor ions were isolated with an isolation width of 4 Th. The collision energy was optimized to obtain the most informative MS/MS spectra by adjusting it according to the precursor ion charge, utilizing values of 120, 60, 43, and 40 eV for precursor ion charges of 1, 2, 3, and 4, respectively, based on early preliminary experiments. Mass spectrometry acquisitions were split into four scan events, with the first being an MS scan, followed by three MS/MS scans of the precursor ion with the highest intensity detected during the first scan event. The MS/MS spectra were obtained using the MultiCE option, which allowed for acquisition with varying collision energies. During the acquisition cycle, the first MS/MS scan was obtained at 75% of the target collision energy, followed by a second scan at 100%, and a third scan at 120%. Detailed mass spectra acquisition parameters are provided as supplementary material (Table S1). If the substance was unsuitable for reversed-phase LC, the sample was directly injected into the ESI ion source using a syringe pump at a flow rate of 0.1 mL/min.
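The charge-dependent collision energies and the MultiCE stepping described above amount to simple arithmetic, sketched below. This is an illustration only and not the instrument's own logic: the base energies and the 75/100/120% steps are taken from this paragraph, while the names `BASE_CE_EV`, `MULTI_CE_FRACTIONS`, and `multi_ce_energies` are hypothetical. The fractions are kept as a parameter so that other stepping schemes can be tried.

```python
# Target collision energy (eV) for each absolute precursor charge, as stated above.
BASE_CE_EV = {1: 120.0, 2: 60.0, 3: 43.0, 4: 40.0}

# MultiCE steps as fractions of the target energy (75%, 100%, 120%), as stated above.
MULTI_CE_FRACTIONS = (0.75, 1.00, 1.20)


def multi_ce_energies(charge, fractions=MULTI_CE_FRACTIONS):
    """Return the collision energies (eV) of the three MS/MS scans for a precursor.

    The sign of the charge is ignored, so -2 and 2 give the same result.
    """
    base = BASE_CE_EV[abs(charge)]
    return [round(base * f, 1) for f in fractions]
```

With these values, the 75% step for a singly charged precursor works out to 90 eV.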
The mass spectrometer was operated using otofControl (ver. 4.1, Bruker Daltonics, Bremen, Germany). The instrument was calibrated using the ESI-L Low Concentration Tuning Mix (Agilent Technologies, Santa Clara, CA, USA).

Data Processing and Spectral Library Constitution
A set of 191 files in Bruker .d format containing raw data was generated as a result of the analysis of triterpene glycoside reference standards. Raw data were manually inspected, and extracted ion chromatograms were constructed for the corresponding calculated m/z of the precursor ions using DataAnalysis software (Ver. 4.4, Bruker Daltonics, Bremen, Germany). The most intense MS and informative MS/MS scans obtained at different collision energies of each compound were added to the spectral library using DataAnalysis and LibraryEditor software (ver. 4.4, Bruker Daltonics, Bremen, Germany). The resulting library in .mlb format contained 191 MS and 494 MS/MS spectra, along with information about the name, structure, molecular formula, retention time, and acquisition parameters, and is available as Supplementary Material.
To constitute the MS spectral library in .mgf format, the LC-MS raw files were converted to the open mzML format using the MSConvert utility [22]. The mzML files were then processed using the MZmine 2.53 software [23], which allowed the scans associated with the signal of interest to be exported into separate mzML files. The files containing the most informative MS/MS scans were processed using MZmine to generate a list of masses from the raw MS data. Mass detection was performed using the wavelet transform algorithm with the following parameters: noise level, 20; scale level, 10; and wavelet window size, 100%. The resulting peak lists from each file were exported as separate mgf files, which were then combined into a single mgf file using a custom script. A set of mzML and mgf files containing raw and derived MS and MS/MS spectra at different collision energies, as well as a merged mgf file containing all MS/MS data, are deposited on MetaboLights [24] under the identifier MTBLS7506 and are also available as Supplementary Material.
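The custom script used to combine the mgf files is not described in the text. As a hedged illustration of what such a step can look like, the sketch below concatenates several mgf files into one and counts the spectra by their `END IONS` terminators. The function name `merge_mgf` and its behavior (blank lines dropped, one blank line written after each spectrum) are assumptions, not the authors' implementation.

```python
from pathlib import Path


def merge_mgf(input_paths, output_path):
    """Concatenate MGF files into one file and return the number of spectra written.

    Blank lines in the inputs are dropped, and a single blank line is written
    after each "END IONS" so that spectra stay visually separated.
    """
    n_spectra = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in input_paths:
            text = Path(path).read_text(encoding="utf-8")
            for line in text.splitlines():
                stripped = line.strip()
                if not stripped:
                    continue
                out.write(stripped + "\n")
                if stripped.upper() == "END IONS":
                    out.write("\n")
                    n_spectra += 1
    return n_spectra
```

A call such as `merge_mgf(sorted(Path("mgf_dir").glob("*.mgf")), "merged.mgf")` would combine every file in a hypothetical `mgf_dir` folder.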
The mgf file containing the combined MS/MS data was used as input data to create a molecular network using the online molecular networking workflow (version release_28.2) at GNPS with a parent mass tolerance of 0.02 Da and an MS/MS fragment ion tolerance of 0.05 Da. A molecular network was created with edges filtered to have a cosine score above 0.7 and more than 6 matching peaks. Furthermore, edges between two nodes were kept in the network if and only if each node appeared in the top 10 most similar nodes to the other. The maximum size of a molecular family was set to 100, and the lowest-scoring edges were removed from molecular families until the molecular family size was below this threshold. The molecular networks were visualized using Cytoscape software [25]. The molecular networks can be accessed at [26].
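The edge-filtering rules described above (cosine score above 0.7, more than 6 matching peaks, and mutual membership in each other's top 10 neighbors) can be illustrated with a short sketch. This is not the GNPS implementation: the function name `filter_edges` and the input format are hypothetical, the top-10 ranking is computed here after thresholding, and the final pruning of molecular families to a maximum size of 100 is omitted.

```python
def filter_edges(edges, min_cosine=0.7, min_matched_peaks=6, top_k=10):
    """Filter candidate network edges.

    edges: iterable of (node_a, node_b, cosine, matched_peaks) tuples.
    Returns a list of (node_a, node_b, cosine) for the edges that are kept.
    """
    # Step 1: keep pairs with cosine above the threshold and enough matched peaks.
    kept = [
        (a, b, cos)
        for a, b, cos, n_peaks in edges
        if cos > min_cosine and n_peaks > min_matched_peaks
    ]

    # Step 2: for every node, collect its top_k most similar neighbors.
    neighbors = {}
    for a, b, cos in kept:
        neighbors.setdefault(a, []).append((cos, b))
        neighbors.setdefault(b, []).append((cos, a))
    top = {
        node: {other for _, other in sorted(pairs, key=lambda p: p[0], reverse=True)[:top_k]}
        for node, pairs in neighbors.items()
    }

    # Step 3: keep an edge only if each node is in the other's top_k set.
    return [(a, b, cos) for a, b, cos in kept if b in top[a] and a in top[b]]
```

Lowering `top_k` makes the network sparser, because an edge survives only when both of its nodes rank each other among their closest neighbors.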
Statistical analyses of the chromatographic data were performed with GraphPad Prism software (ver. 9.5.1, GraphPad Prism software Inc., La Jolla, CA, USA). The unpaired nonparametric Kruskal-Wallis test, followed by Dunn's post hoc test, was used for the comparison of several groups. Differences between groups were considered statistically significant when p < 0.05. The log P values were calculated using the Chemistry Development Kit (CDK) ver. 2.8 [27].

LC-MS Analysis of the Eupentacta fraudatrix Extract
Technical validation of the established database was achieved by using the spectral library to dereplicate the extract of the previously studied sea cucumber Eupentacta fraudatrix (Djakonov et Baranova). Ten animals collected in Amursky Bay (Peter the Great Gulf, the Sea of Japan) in April 2017 at a depth of 1.0-1.5 m were chopped and extracted with 250 mL of ethanol. After filtration, 2 mL of the extract was dried and reconstituted in 1 mL of 50% methanol in water (v/v). Purification and desalting were performed by solid-phase extraction (SPE) using an Oasis HLB extraction cartridge (60 mg, Waters, Medford, MA, USA). The sorbent was conditioned with 3 mL of methanol followed by 3 mL of water. A 250 µL aliquot of the extract was loaded onto the SPE cartridge, which was then washed with 1 mL of water. Triterpene glycosides were eluted with 1 mL of 50% methanol and subjected to LC-MS analysis. The analysis conditions were the same as those described above, except that the MultiCE option was disabled. The LC-MS raw file was converted to mzML format using MSConvert and subsequently processed using MZmine 2.53 to generate an mgf file containing the MS/MS data of the detected features (detailed MZmine parameters are provided as supplementary material, Table S2). The results were exported to GNPS for metabolite identification and feature-based molecular networking (FBMN) analysis (the parameters used were the same as those described above). The molecular network with the E. fraudatrix data can be accessed at [28].

Results and Discussion
A total of 15 sea cucumber species belonging to 5 families were the sources of the analyzed compounds (Table 1, Table S3). The family Cucumariidae was the most abundant, with 9 represented species yielding 68 triterpene glycosides. The families Psolidae and Sclerodactylidae were represented with 2 species each, providing 40 and 59 metabolites, respectively. The families Phyllophoridae and Caudinidae were represented by one species each. In addition, several triterpene glycosides were isolated from the starfish Solaster pacificus (Solasteridae, Valvatida).
The triterpene glycosides used in this study demonstrate a significant range of structural diversity, encompassing a broad spectrum of both oligosaccharide fragments and aglycon structures (the structures of all compounds are shown in Figures S1-S14). The set of triterpene glycosides analyzed included 61 non-sulfated and 130 sulfated metabolites, including 55 monosulfated, 43 disulfated, 23 trisulfated, and 9 of the rarest tetrasulfated compounds. The largest group of glycosides (76 substances) contained oligosaccharide chains comprising four monosaccharide units, while the penta- and hexaoside groups consisted of 65 and 45 compounds, respectively. Additionally, five compounds had shorter oligosaccharide chains: three glycosides had three monosaccharide units each, and two compounds were biosides. The pool of aglycon structures comprised both compounds with a lactone cycle, including holostane derivatives with an 18(20)-lactone (150 compounds) and the rarer 18(16)-lactone (16 compounds), as well as compounds without a lactone cycle (25 compounds). Some glycosides had oxygen-containing substituents in the polycyclic nuclei and the side chains, including acetoxy, hydroxy, and keto groups, as well as epoxy and hydroperoxy groups. Thus, the structures of the compounds used in this study encompass the majority of the known structural variants of sea cucumber triterpene glycosides, and the information acquired may facilitate the investigation of mass spectrometric behavior, fragmentation pathways, and the correlation of structures with retention times.
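The three breakdowns given above (by sulfation, by oligosaccharide chain length, and by aglycon type) should each add up to the 191 compounds in the library. A quick arithmetic check, using only the counts stated in this paragraph, can be sketched as follows; the names `BREAKDOWNS` and `check_breakdowns` are hypothetical.

```python
TOTAL_COMPOUNDS = 191

# Counts copied from the paragraph above, grouped by structural feature.
BREAKDOWNS = {
    "sulfation": {
        "non-sulfated": 61,
        "monosulfated": 55,
        "disulfated": 43,
        "trisulfated": 23,
        "tetrasulfated": 9,
    },
    "chain length": {
        "biosides": 2,
        "triosides": 3,
        "tetraosides": 76,
        "pentaosides": 65,
        "hexaosides": 45,
    },
    "aglycon type": {
        "18(20)-lactone": 150,
        "18(16)-lactone": 16,
        "no lactone": 25,
    },
}


def check_breakdowns(breakdowns=BREAKDOWNS, total=TOTAL_COMPOUNDS):
    """Return {feature: True/False} telling whether each breakdown sums to the total."""
    return {feature: sum(groups.values()) == total for feature, groups in breakdowns.items()}
```

Each of the three groupings sums to 191, so the reported counts are mutually consistent.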

Analysis of the Mass Spectrometry Data
The established sea cucumber triterpene glycoside spectral library contains MS/MS spectra of 191 compounds. In addition, 494 MS/MS spectra obtained at various collision energies are available as Supplementary Material. Supplementary Table S3 presents a comprehensive list of the analyzed triterpene glycosides, along with their retention times and accurate mass measurements of precursor ions.
Although the positive ion mode provides intense B- and C-type product ions arising from the cleavages of glycosidic bonds (following the nomenclature of Domon and Costello [29]) in sulfated and non-sulfated compounds [30], the negative ion mode is generally more sensitive, especially for compounds with multiple sulfate groups. Furthermore, it is often unfeasible to obtain the molecular ion for glycosides containing more than two sulfate groups in the positive ion mode. In the negative ion mode, monosulfated and non-sulfated compounds were detected as [M−Na]− and [M−H]−, respectively, whereas di-, tri-, and tetrasulfated glycosides produced multiply charged ions, [M−nNa]n−, where n represents the number of sulfate groups.
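The ion types listed above translate directly into expected precursor m/z values. The sketch below is a minimal illustration under stated assumptions: the input is the monoisotopic mass of the neutral compound (as the sodium salt for sulfated glycosides), the masses of the proton and the sodium cation are standard constants rather than values from this work, and the function name `precursor_mz` is hypothetical.

```python
PROTON_MASS = 1.007276          # Da, mass of H+
SODIUM_CATION_MASS = 22.989221  # Da, monoisotopic mass of Na+ (Na minus one electron)


def precursor_mz(neutral_mass, n_sulfates):
    """Expected negative-ion precursor m/z.

    n_sulfates == 0 -> [M-H]-
    n_sulfates >= 1 -> [M-nNa]n-, with charge n equal to the number of sulfate groups
    """
    if n_sulfates < 0:
        raise ValueError("n_sulfates must be non-negative")
    if n_sulfates == 0:
        return neutral_mass - PROTON_MASS
    return (neutral_mass - n_sulfates * SODIUM_CATION_MASS) / n_sulfates
```

For a hypothetical disulfated compound with a neutral mass of 1000.0 Da, the doubly charged ion would be expected near m/z 477.01.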
The MS/MS spectra were acquired for each compound by applying three different collision energy levels (75, 100, and 125%), which allowed spectra with different degrees of precursor ion fragmentation to be collected. However, it should be noted that for many sulfated compounds, MS/MS spectra obtained at a collision energy level of 75% show only the precursor ion, and the most informative spectra were obtained at a collision energy level of 125%. Conversely, for non-sulfated compounds, a collision energy level of 75% (90 eV) was found to be optimal for obtaining informative MS/MS spectra. Higher energy values led to excessive fragmentation, resulting in uninformative or low-intensity spectra.
The majority of tandem mass spectra in the negative ion mode included characteristic neutral losses of monosaccharide units (176 Da for MeGlc, 162 Da for Glc, 146 Da for Qui or MeXyl, and 132 Da for Xyl) and product ions arising from the cleavage of bonds in the polycyclic nucleus or side chain. Analysis of the obtained MS data allowed us to deduce the characteristic fragmentation pathways of triterpene glycosides under CID conditions. It was found that the presence and number of sulfate groups determine the optimal collision energy and primary fragmentation pathways. The product ion spectra of most non-sulfated triterpene glycosides contained an intense Y-type product ion series. In the MS/MS spectra of glycosides with a lactone cycle, ions related to the cleavage of lactone cycle bonds followed by the elimination of a CO2 molecule were observed. A neutral loss of 60 Da corresponds to the loss of an acetic acid molecule and is characteristic of glycosides having an acetoxy group, while a neutral loss of 104 Da corresponds to the loss of a C2H4O2 + CO2 fragment and is characteristic of glycosides that contain both an acetoxy group and an 18(20)-lactone cycle [30]. In addition, some compounds tend to lose H2 [...] (Figure 2). Weakly characteristic B- and C-type product ion series were also observed.
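The neutral losses listed above lend themselves to a simple lookup when annotating an MS/MS spectrum of a singly charged precursor. The following sketch is illustrative only: the text gives nominal masses, and the monoisotopic values in `NEUTRAL_LOSSES` are an assumption computed here from the elemental compositions of the monosaccharide residues, acetic acid, and CO2. The function name `annotate_losses` and the default tolerance are hypothetical.

```python
# Monoisotopic masses (Da) of the neutral losses named above. Nominal values come
# from the text; the exact values are computed from the elemental compositions.
NEUTRAL_LOSSES = {
    "MeGlc": 176.0685,              # C7H12O5
    "Glc": 162.0528,                # C6H10O5
    "Qui or MeXyl": 146.0579,       # C6H10O4
    "Xyl": 132.0423,                # C5H8O4
    "acetic acid + CO2": 104.0110,  # C2H4O2 + CO2
    "acetic acid": 60.0211,         # C2H4O2
    "CO2": 43.9898,
}


def annotate_losses(precursor_mz, fragment_mzs, tol=0.02):
    """Return (fragment m/z, loss name) pairs for singly charged ions.

    A fragment is annotated when precursor_mz - fragment m/z matches a
    tabulated neutral loss within tol (Da).
    """
    hits = []
    for mz in fragment_mzs:
        delta = precursor_mz - mz
        for name, mass in NEUTRAL_LOSSES.items():
            if abs(delta - mass) <= tol:
                hits.append((mz, name))
    return hits
```

Qui and MeXyl share one entry because they are isobaric and cannot be distinguished by the mass of the loss alone. Multiply charged precursors would need the charge state taken into account, which this sketch does not attempt.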
The product ion spectra of sulfated glycosides show an intense diagnostic ion at m/z 96.96, which indicates the presence of a sulfate group. The MS/MS spectra also exhibited a product ion series arising from the cleavage of glycosidic bonds, with the charge localized on the sulfate group. Other fragmentation products specific to all sulfated glycosides were the ions associated with sulfated monosaccharide units at m/z 255, 241, and 211 for sulfated methylglucose, glucose, and xylose, respectively. In addition to the B- and Y-type series, the product ion spectra of di-, tri-, and tetrasulfated glycosides contained A- and X-type product ion series, formed by cross-ring cleavages of monosaccharide units, as well as ions associated with the elimination of additional sulfate groups [...] (Figure S15). The MS/MS spectra of the trisulfated glycoside quadrangularisoside D2 also show the formation of B-, C-, Y-, and Z-type product ion series, as well as A3, A4, and X2 product ions resulting from cross-ring cleavages of the sulfated quinovose and xylose (Figure S16).
Thus, the information provided in CID spectra typically allows the straightforward determination of the structure and sequence of the glycone moiety. On the other hand, the elucidation of the structure of aglycon by fragmentation in CID spectra is often challenging because most of the glycosides exhibit limited aglycon fragmentation. However, detailed analysis and comparison of the fragmentation patterns of a series of monosulfated glycosides have resulted in the identification of characteristic structure-related neutral losses.
Cucumarioside H7 has a saturated side chain without substituents. The fragmentation of this compound shows a diagnostic series of neutral losses arising from cleavage of the C-20-C-22 bond (f-ions corresponding to the loss of the side chain, nomenclature proposed by Griffiths [31]), accompanied by the elimination of CO2, C2H4O2, and fragments of the D-ring, as well as b-type ions produced by retro-Diels-Alder fragmentation of the B-ring (Table 2). Product ion spectra of lefevreoside B and typicoside A1 displayed similar series of neutral losses, but all masses of neutral losses were shifted by 2 and 4 Da, respectively, indicating one and two double bonds in the side chains of these molecules. The spectrum of cucumarioside H5, which has a 22Z,24-diene system in the side chain, shows a series of neutral losses identical to those of typicoside A1, which has a 22E,24-diene system in the side chain.
The most interesting fragmentation pattern was observed in the spectra of glycosides having an aglycon with a ∆25 double bond. The MS/MS spectra revealed h-type fragment ions arising from the cleavage of the C-22-C-23 bond (Table 2). For example, the product ion spectra of colochiroside A1 and lefevreoside C, which have aglycons with a ∆25 double bond, showed neutral losses similar to those in the product ion spectrum of typicoside A1, together with a fragment peak at m/z 1123 corresponding to the loss of the C5H10 fragment. Similar h-type ions arising from the loss of 70 Da (C5H10) were observed in the MS/MS spectra of cucumarioside A2-2, philinopside E, and colochiroside A2, all of which contain a ∆25 double bond. These compounds, as well as colochiroside A3, do not have an acetoxy group at C-16, which is indicated by the shift in the mass of neutral losses of b1 and b2 ions and the production of a neutral loss of 170 Da (C9H14O3) (168 Da in colochiroside A3).
The spectrum of colochiroside B1, which has a 24-hydroxy-25-ene side chain, also shows neutral losses similar to those observed in the MS/MS spectrum of colochiroside A1, but the masses of the neutral losses were shifted by 16 Da. Colochiroside B2 and cucumarioside H2 show similar b1 and b2 ions but also exhibited neutral losses of 134 (C6H14O3) and 178 (C7H14O5) Da. Colochiroside B3 exhibited a neutral loss series shifted by 2 Da compared to that of colochiroside B1, which was caused by the replacement of a hydroxyl with a keto group. Product ion spectra of the glycosides with substitutions at C-23 (okhotoside A1-1 and cucumarioside A0-1 with 23-oxo side chains and frondoside D with a 23-hydroxy side chain) also exhibited a similar characteristic series of neutral losses.
Figure 4 shows the molecular networks generated with the sea cucumber triterpene glycoside spectral library as input. The resulting networks consisted of 834 edges and 191 nodes, which represent the analyzed compounds. Although the results were not homogeneous, since the view depends on the cosine threshold definition and other parameters, some effect of compound structures on the topology of molecular networks can be observed. Overall, the obtained molecular networks tended to cluster triterpene glycosides according to several structural features. The number of sulfate groups had the strongest influence on clustering. All the non-sulfated compounds were split into three clusters, as were the monosulfated compounds, which were mostly in separate clusters. Most of the small clusters were represented by nodes related to tri- and tetrasulfated triterpene glycosides. Other structural features that influenced clustering were the number of monosaccharides, the number of acetate groups, and the structure of the side chains of the aglycons.
Thus, clusters A and D (Figure 4) consisted mainly of non-sulfated cladolosides, but cluster A included cladolosides with two acetoxy groups at C-16 and C-22, whereas cluster D consisted of cladolosides without an acetoxy group. Cluster F contained the non-sulfated tetraosides, cucumariosides of the A-group. Clusters C and E consisted of monosulfated triterpene glycosides, whereas cluster B contained both mono- and disulfated compounds. Most of the compounds in cluster B were characterized by the presence of a ∆25 double bond, whereas the glycosides in cluster C had ∆22, ∆23, or ∆24 double bonds. Most triterpene glycosides forming cluster E had shortened side chains.

Analysis of the Chromatographic Behavior of Triterpene Glycosides
The analyzed metabolites had retention times that ranged from 4.3 to 18.9 min (Table S3). It should be noted that under the conditions used, chromatographic peaks were often observed to be broadened and asymmetrical for di- and trisulfated compounds. Reproducible elution profiles and retention times for tetrasulfated compounds were not achievable under the reversed-phase conditions used. Therefore, direct injection of the sample into the ESI ion source was applied for the analysis of psolusosides P and Q, chilensosides D, E, F, and G, and chitonoidosides K1 and L, along with two trisulfated quadrangularisosides D and D1.

Retention times in reversed-phase chromatography are known to be highly dependent on the lipophilicity of a compound. However, it was found that the correlation between retention times and log P, which may be considered a function of compound lipophilicity [32], is weak (R² = 0.52) (Figure S19). Moreover, a considerable number of triterpene glycosides with similar calculated log P values displayed significant variations in retention times. For instance, cucumarioside A14 and chitonoidoside A, despite having comparable calculated log P values (−0.41 and −0.383), exhibited distinct chromatographic behaviors with retention times of 6.4 and 16.3 min, respectively. We therefore attempted to examine the effect of individual substituents and structural features on the chromatographic behavior of triterpene glycosides.
Compared to non-sulfated triterpene glycosides, sulfated compounds exhibited shorter mean retention times (p = 0.0002, Figure S20a). The appearance of the first sulfate group had no significant effect, but the presence of a di- or trisulfated oligosaccharide chain led to a significant reduction in retention time (p = 0.007 and p < 0.0001, respectively, Figure S20b). For example, the non-sulfated holotoxin A1 had a retention time of 13.1 min, while its monosulfated analog, cladoloside L1, had a retention time of 12.3 min. The addition of the second and third sulfate groups reduced the retention time more markedly. Monosulfated cucumarioside H3 exhibited a chromatographic peak at 8.8 min, whereas its disulfated analog, cucumarioside I4, had a retention time of 7.4 min. Similarly, disulfated chilensoside B displayed a peak at 10.8 min, whereas its trisulfated analog, chilensoside C, eluted at 9.8 min.
The introduction of an additional monosaccharide unit caused a slight reduction in retention time (Figure S20c). For example, the additional glucose unit in the hexasaccharide chain of kuriloside H (6.6 min) only slightly altered the chromatographic behavior compared to its precursor with a pentasaccharide chain, kuriloside I1 (7.0 min). The impact was greater when a branching monosaccharide was added: the addition of xylose to C-2 of the quinovose residue of okhotoside A1-1 (12.3 min) reduced the retention time of the resulting pentaoside, cucumarioside A0-1 (11.2 min). Replacing a monosaccharide unit also altered the retention time, as several examples demonstrate. For instance, substituting the quinovose residue in cladoloside F1 with a xylose residue in cladoloside E1 decreased the retention time by 0.6 min, from 15.1 to 14.5 min. In contrast, replacing two glucose units in cladoloside M2 with xylose units in cladoloside D1 increased the retention time from 12.1 to 13.3 min. These observations suggest that monosaccharide replacement increases retention time in the order glucose, xylose, quinovose.
The effect of the double bond position in the polycyclic nucleus was found to be negligible (Figure S20d), whereas the presence of a lactone cycle had a significant impact; in particular, the formation of an 18(20)-lactone led to an increase in retention time (p < 0.0001, Figure S20e). Cucumarioside A9, which has two hydroxy groups at positions C-20 and C-18, showed a chromatographic peak at 8.4 min, whereas its analog with an 18(20)-lactone cycle (cucumarioside A7) had a retention time of 9.7 min. The presence of acetoxy groups (especially two groups) was found to increase retention time (Figure S20f). For example, kurilosides K and K1, which differ in the presence of the acetoxy group at C-16, exhibited a notable difference in their chromatographic behavior, with retention times of 4.8 and 6.1 min, respectively. Similarly, cladoloside K2, which has an acetate group at C-16, had a retention time of 8.3 min, while its analog with an additional acetate group at C-22 (cladoloside K1) eluted at 12.7 min (similar to the case of cladolosides D1 and D2).
The elution of triterpene glycosides under reversed-phase conditions is significantly affected by the structure of the aglycon side chain and the presence of oxygen-containing substituents in the side chain (Figure S20g). The compounds with short side chains had the shortest retention times (e.g., fallaxosides C1, C2, D2, and D7), while cucumarioside A3 (with a butoxy group attached to C-25) had the longest retention time. The addition of a double bond in the side chain did not result in a statistically significant change in retention time (Figure S20h), but a comparison of closely related compounds shows that glycosides with saturated side chains have longer retention times. Cucumarioside A15, which possesses a saturated side chain, exhibited a retention time of 18.7 min, whereas its analog, cucumarioside A1, containing a Δ24 double bond, exhibited a retention time of 17.1 min. The addition of a double bond at the C-25 position resulted in a smaller reduction of the retention time, by approximately 1.1 min, as observed in the pairs of cladolosides E1/E2, F1/F2, P/P1, and D/D1. The position of the double bond seems to have little influence on retention time (the retention times of magnumosides A3 with a Δ25 double bond and A4 with a Δ24 double bond differ by 0.3 min). However, isomers with different double bond configurations in the side chain show distinct differences in chromatographic behavior. For example, compounds with a 22Z,24-diene fragment in the side chain (such as pacificusoside G and cucumarioside C1) elute about 0.4 min earlier than analogs with a 22E,24-diene fragment in the side chain (such as pacificusoside E and cucumarioside C2).
The presence of an oxygen-containing substituent in the side chain was observed to significantly decrease the retention time (Figure S20i). The introduction of a hydroxyl group, a hydroperoxy group, or a keto group had the most pronounced impact on chromatographic behavior (Figure S20j). Conversely, the presence of an ester bond significantly increased the retention time. The presence of a hydroxy group at C-24 in magnumoside C2 (5.4 min) significantly changed the retention time compared to a similar compound without a substituent in the side chain (magnumoside C3, 9.1 min). The replacement of the hydroxyl group with a keto group also altered the retention time (e.g., frondoside D, with a 23-hydroxy fragment in the side chain, showed a peak at 9.1 min, while the similar okhotoside A1-1, with a 23-oxo fragment in the side chain, eluted at 12.3 min). The position of the oxygen-containing substituent did not strongly influence the chromatographic behavior, but metabolites with substituents at the C-25 position had longer retention times than their isomers with substituents at the C-24 position, e.g., quadrangularisosides A (9.0 min) and A1 (8.8 min).
Thus, the retention times of triterpene glycosides in reversed-phase chromatography are affected by various structural features, and different structural changes have different effects on chromatographic behavior. Some of them, such as the presence of sulfate groups, changes in the aglycon side chain structure, lactone cycle formation, and the presence of substituents in the side chain, significantly affect the chromatographic behavior of triterpene glycosides, while other structural variations result in only small alterations.
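The pairwise comparisons discussed in this section can be summarized as retention time shifts. The sketch below simply tabulates compound pairs and retention times quoted in the text and computes the difference for each structural change. The names `PAIRS` and `rt_shifts` are hypothetical, and single pairs are illustrative only; they do not replace the statistical comparisons shown in Figure S20.

```python
# (structural change, reference compound, RT in min, modified analog, RT in min)
# All retention times are the values quoted in the text above.
PAIRS = [
    ("first sulfate group", "holotoxin A1", 13.1, "cladoloside L1", 12.3),
    ("second sulfate group", "cucumarioside H3", 8.8, "cucumarioside I4", 7.4),
    ("third sulfate group", "chilensoside B", 10.8, "chilensoside C", 9.8),
    ("18(20)-lactone formation", "cucumarioside A9", 8.4, "cucumarioside A7", 9.7),
    ("additional C-22 acetate", "cladoloside K2", 8.3, "cladoloside K1", 12.7),
    ("24-hydroxy group", "magnumoside C3", 9.1, "magnumoside C2", 5.4),
    ("side-chain 24-double bond", "cucumarioside A15", 18.7, "cucumarioside A1", 17.1),
]


def rt_shifts(pairs):
    """Return (change, shift in min) tuples, sorted from most negative shift upward."""
    shifts = [
        (change, round(rt_modified - rt_reference, 1))
        for change, _, rt_reference, _, rt_modified in pairs
    ]
    return sorted(shifts, key=lambda item: item[1])


if __name__ == "__main__":
    for change, shift in rt_shifts(PAIRS):
        print(f"{change:<28s}{shift:+.1f} min")
```

Sorting by shift makes it easy to see which of the quoted modifications move a compound earlier and which move it later in the chromatogram.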

LC-MS Analysis of the Eupentacta fraudatrix Extract
Technical validation was achieved by using the created spectral library to dereplicate triterpene glycosides in the extract of the previously studied Far Eastern sea cucumber, E. fraudatrix. Previous studies of the chemical composition of E. fraudatrix have led to the isolation of 37 triterpene glycosides, including 19 non-sulfated glycosides, 12 monosulfated glycosides, and 6 disulfated triterpene glycosides [30,33–37]. Earlier LC-ESI MS profiling of triterpene glycosides detected a total of 54 compounds, including 44 structurally annotated compounds, by comprehensive analysis of retention times, MS/MS data, elemental compositions, and biogenetic hypotheses. The identification of detected compounds was primarily based on the obtained data, including molecular formulae, MS/MS spectra, and chromatographic behavior. However, the structural identification was limited to the compounds previously isolated from this sea cucumber. The comparison with corresponding standards enabled the identification of 12 glycosides, while the remaining 8 glycosides were annotated based on elemental compositions and MS/MS data only [30].
LC-ESI MS profiling of E. fraudatrix followed by data processing in MZmine 2.53 yielded a total of 552 features. Dereplication of triterpene glycosides followed two strategies: (1) dereplication of the profiled compounds using GNPS against the mgf spectral library, based on MS data only, and (2) identification of compounds against the local database using MZmine and Bruker Library Editor, by comparing MS/MS spectra and retention times with the data obtained for the authentic chemical standards.
The use of the MS data for library search enabled the annotation of 39 features. Notably, the metabolome of sea cucumbers contains a large proportion of isomeric triterpene glycosides, and several of these isomers differ only minimally in structure and have closely related MS/MS spectra. Relying solely on mass spectrometry data for identification in such cases may lead to erroneous results. Indeed, the analyzed E. fraudatrix extract contained many isomeric triterpene glycosides with identical m/z and similar MS/MS spectra. Specifically, among five compounds with a molecular ion at m/z 1307, three were annotated as cucumarioside H5 based on the close similarity of their MS/MS spectra. Using retention times enabled a more accurate determination of the metabolites in the sample, resulting in their complete identification. Identification via a database containing not only mass spectra but also chromatographic data allowed us to refine the results and identify 27 compounds (Table 3).
Among the identified metabolites, 16 compounds had previously been isolated from E. fraudatrix, including 7 non-sulfated glycosides (cucumariosides A1, A7, A11, A15, C1, C2, and D), 7 sulfated pentaosides (cucumariosides H2, H3, H4, H5, H6, and H8), and 2 disulfated pentaosides (cucumariosides I3 and I4). Additionally, 12 compounds that had previously been isolated from other species were identified for the first time in E. fraudatrix. These included colochirosides A1, B1, and B2, magnumoside B3, pacificusosides A, B, E, G, and J, quadrangularisoside A, and typicosides A1 and C2. Notably, the structures of these compounds correspond to the common structural patterns observed in known triterpene glycosides of E. fraudatrix. According to the literature, the triterpene glycosides found in E. fraudatrix contain 3-O-MeXyl or 3-O-MeGlc as the terminal monosaccharide unit.
Colochirosides A1, B1, and B2, magnumoside B3, quadrangularisoside A, and typicoside A1 have a linear oligosaccharide chain of four monosaccharides: methylglucose as the terminal unit, xylose, quinovose, and sulfated xylose. Typicoside C2 has a similar oligosaccharide chain pattern, with sulfated glucose as the third unit. The non-sulfated glycosides, pacificusosides A and B, share a similar oligosaccharide chain pattern with known cucumariosides of the C-group, consisting of five monosaccharide units, including methylxylose, glucose, quinovose, and xylose in the main chain and xylose as the branching unit at the second monosaccharide. Pacificusoside J has a similar oligosaccharide chain structure with methylated glucose as the terminal monosaccharide unit. Pacificusosides G and E are tetraosides without terminal methylated units. Furthermore, pacificusosides were previously found in the Far Eastern starfish Solaster pacificus and were considered food markers. The identification of pacificusosides in E. fraudatrix supports the hypothesis that the starfish obtains these substances through its diet and that they can persist in the starfish organism without undergoing significant metabolic transformations.
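The benefit of adding retention time to the library search, as described above for the isomers at m/z 1307, can be sketched as a two-stage filter. This is only an illustration of the principle, not the matching procedure implemented in MZmine or Bruker Library Editor. The class `LibraryEntry`, the function `match_feature`, the tolerances, and all library values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    precursor_mz: float
    rt_min: float


def match_feature(mz, rt_min, library, mz_tol=0.01, rt_tol=0.2):
    """Return (candidates matched by m/z only, candidates matched by m/z and RT).

    mz_tol (in m/z units) and rt_tol (in min) are assumed values for illustration.
    """
    by_mass = [e for e in library if abs(e.precursor_mz - mz) <= mz_tol]
    by_mass_and_rt = [e for e in by_mass if abs(e.rt_min - rt_min) <= rt_tol]
    return by_mass, by_mass_and_rt


if __name__ == "__main__":
    # Hypothetical isomeric entries sharing one precursor m/z (placeholder values).
    library = [
        LibraryEntry("isomer 1", 1307.5, 8.1),
        LibraryEntry("isomer 2", 1307.5, 9.4),
        LibraryEntry("isomer 3", 1307.5, 10.2),
    ]
    by_mass, by_mass_and_rt = match_feature(1307.5, 9.45, library)
    print("m/z only:", [e.name for e in by_mass])
    print("m/z + RT:", [e.name for e in by_mass_and_rt])
```

With mass alone, all three isomers remain candidates; the retention time window narrows the list, which is the effect exploited in the second dereplication strategy.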
The sea cucumber metabolome is characterized by the presence of triterpene glycosides that share a common main structural pattern but vary in minor structural details. An approach based on identifying metabolites with similar MS/MS spectra and characterizing their structures via differences in their fragmentation patterns can significantly expand the scope of annotated metabolites in the LC-MS analysis of complex samples. Indeed, using the GNPS analog search tool for E. fraudatrix significantly extended the list of annotated metabolites, revealing 113 structural analogs whose MS/MS fragmentation patterns are similar to those of known compounds in the spectral library. Several typical mass differences were observed that correspond to specific structural changes. An increase or decrease in the m/z of the precursor ion by 176 Da corresponds to a methylglucose unit, while changes of 146 and 132 Da correspond to quinovose (or methylated xylose) and xylose units, respectively. An increase in mass of 58 Da corresponds to the addition of an acetate group, while a common difference of 30 Da suggests the substitution of glucose with xylose. Smaller differences of 18, 16, 14, 12, and 2 Da correspond to the loss or addition of H2O, O, CH2, C, and double bonds, respectively.
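The mass differences listed above lend themselves to a simple lookup when comparing an unknown analog with a library compound. The following is a minimal sketch, assuming singly charged precursor ions and nominal mass shifts with an arbitrary tolerance of 0.5 Da. The names `NOMINAL_SHIFTS` and `explain_shift` are hypothetical, and the result is only a hint that still has to be confirmed from the MS/MS spectrum.

```python
# Nominal precursor mass differences (Da) and their interpretation, as listed in the text.
NOMINAL_SHIFTS = {
    176: "methylglucose unit",
    146: "quinovose or methylxylose unit",
    132: "xylose unit",
    58: "acetate group",
    30: "substitution between glucose and xylose",
    18: "H2O",
    16: "O",
    14: "CH2",
    12: "C",
    2: "double bond",
}


def explain_shift(mz_known, mz_analog, tol=0.5):
    """Suggest a structural interpretation for the precursor m/z difference.

    Returns (nominal shift, "heavier" or "lighter", interpretation), where the
    direction describes the analog relative to the known compound, or None if
    no listed shift lies within the tolerance. Singly charged ions are assumed.
    """
    delta = mz_analog - mz_known
    for nominal, meaning in NOMINAL_SHIFTS.items():
        if abs(abs(delta) - nominal) <= tol:
            return nominal, ("heavier" if delta > 0 else "lighter"), meaning
    return None


if __name__ == "__main__":
    # Placeholder m/z values chosen only to illustrate the lookup.
    print(explain_shift(1307.5, 1483.5))
    print(explain_shift(1307.5, 1277.5))
```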
To highlight the chemical diversity of the E. fraudatrix extract, a molecular network was generated using the MS/MS spectra as input data. The resulting network consisted of 834 edges and 552 nodes, which formed 14 clusters containing more than four nodes (Figure 5). The majority of the compounds fell into five clusters (A-E). Most of the identified and annotated features were found in clusters B, C, and E. Cluster B was mostly composed of non-sulfated triterpene glycosides, while clusters C and E predominantly comprised monosulfated glycosides. On the other hand, clusters A and D mostly consisted of unannotated metabolites, with cluster D mainly containing features related to disulfated triterpene glycosides.
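Counting clusters in such a network amounts to finding the connected components of a graph whose nodes are features and whose edges link similar MS/MS spectra. The sketch below shows one way to do this with a union-find structure on a toy graph. The function names are hypothetical, and the code is not the networking algorithm itself; it only illustrates how "clusters containing more than four nodes" could be counted from an exported edge list.

```python
from collections import Counter


def cluster_sizes(n_nodes, edges):
    """Return the sizes of all connected components, largest first.

    Nodes are integers 0..n_nodes-1; edges are (node_a, node_b) pairs.
    """
    parent = list(range(n_nodes))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    sizes = Counter(find(i) for i in range(n_nodes))
    return sorted(sizes.values(), reverse=True)


def count_large_clusters(n_nodes, edges, min_size=5):
    """Count components with at least min_size nodes (5 means 'more than four')."""
    return sum(1 for size in cluster_sizes(n_nodes, edges) if size >= min_size)


if __name__ == "__main__":
    # Toy graph: one chain of five nodes, one pair, and one isolated node.
    toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
    print(cluster_sizes(8, toy_edges))
    print(count_large_clusters(8, toy_edges))
```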
Closely related compounds were represented by closely spaced nodes in the molecular network. For example, the node related to feature 601 was identified as cucumarioside H6, which had several neighbors that exhibited similar fragmentation patterns. The MS/MS spectra of six compounds (242, 265, 273, 633, 639, and 657; the numbers correspond to the feature numbers defined by MZmine), as well as the MS/MS spectrum of cucumarioside H6, revealed a neutral loss series of 128, 202, 228, 360, and 372 Da (Figure S21), which is characteristic of triterpene glycosides containing a holostane aglycon with a Δ24 side chain (Table 2). Furthermore, a characteristic fragment peak Y1 was detected at m/z 723 in the MS/MS spectra of all these compounds, indicating a similar aglycon and sulfated xylose as the first monosaccharide unit. However, the masses of the molecular ions and the product ion series were shifted compared to cucumarioside H6.
Thus, the use of the obtained database of tandem mass spectra and retention times of 191 triterpene glycosides allowed the accurate dereplication of known triterpene glycosides and the annotation of some novel compounds.
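Screening spectra for the diagnostic neutral loss series mentioned above (128, 202, 228, 360, and 372 Da) can be sketched as follows. The sketch assumes that the losses are counted from a singly charged precursor ion and uses an arbitrary tolerance of 0.5 Da; the function name `matched_losses` and the synthetic spectrum in the usage example are hypothetical and do not reproduce any measured spectrum.

```python
# Neutral losses (Da) reported in the text for the neighbors of cucumarioside H6.
DIAGNOSTIC_LOSSES = (128, 202, 228, 360, 372)


def matched_losses(precursor_mz, fragment_mzs, losses=DIAGNOSTIC_LOSSES, tol=0.5):
    """Return the losses for which a fragment is found at precursor_mz - loss.

    Assumes singly charged precursor and fragment ions; tol is in m/z units.
    """
    found = []
    for loss in losses:
        target = precursor_mz - loss
        if any(abs(mz - target) <= tol for mz in fragment_mzs):
            found.append(loss)
    return found


if __name__ == "__main__":
    # Synthetic spectrum built for illustration only (not measured data).
    precursor = 1000.0
    fragments = [872.0, 798.1, 771.9, 640.0, 628.0, 500.0]
    hits = matched_losses(precursor, fragments)
    print(f"{len(hits)} of {len(DIAGNOSTIC_LOSSES)} diagnostic losses found: {hits}")
```

A feature showing the complete series would be a candidate for the same aglycon type, while the shift of its precursor mass relative to the library compound points to where the structures differ.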
The example shown demonstrates the reliability of the database created, but at the same time, it has certain limitations that are common to such databases. The dataset is limited to the available compounds, but it will be expanded in the future as new compounds are isolated. Another limitation relates to data inconsistency. It is important to recognize that tandem mass spectra are heavily influenced by the instrument used and fragmentation settings, similar to how retention times are strongly influenced by the chromatographic setup and column used. Although the database includes MS/MS spectra obtained with different fragmentation energy settings, this limitation complicates spectral comparisons and may lead to the erroneous annotation of isomeric or structurally similar compounds.

Conclusions
This study presents a comprehensive database of sea cucumber triterpene glycosides, consisting of 191 compounds. The data were generated using ultra-performance liquid chromatography-quadrupole time-of-flight mass spectrometry and include retention times and MS/MS spectra. The compounds used were isolated from 15 species of the class Holothuroidea and one starfish species and comprise the majority of known structural variants of sea cucumber triterpene glycosides. By analyzing the fragmentation patterns, characteristic structure-related neutral losses were identified, which aided in the identification of isomeric and novel compounds. The retention time data provide important insights into the factors that influence the chromatographic behavior of triterpene glycosides in RP LC-MS analysis. To the best of our knowledge, this is the first MS database of sea cucumber triterpene glycosides. It will be valuable to researchers in the field of natural products as a tool for dereplication and for streamlining the isolation of novel bioactive compounds. The spectral library provided will greatly expedite the annotation of metabolites in MS-based research on echinoderms, including targeted or untargeted metabolomic studies, investigations into the biosynthesis and biological functions of triterpene glycosides, and chemotaxonomic analyses of newly reported or unstudied species. Additionally, the data obtained can serve as a reference dataset for investigating the MS fragmentation patterns of natural products.

Data Availability Statement:
The obtained data (raw scans in .mzML format, separate mgf files for every compound, a merged mgf file, and the Bruker .mlb spectral library) are available as Supplementary Materials. MS scans containing raw and derived MS and MS/MS spectra were also deposited in the MetaboLights public repository (www.ebi.ac.uk/metabolights/MTBLS7506, accessed on 20 June 2023).