Secondary or Specialized Metabolites, or Natural Products: A Case Study of Untargeted LC–QTOF Auto-MS/MS Analysis

The large structural diversity of specialized metabolites represents a substantial challenge in untargeted metabolomics. Modern LC–QTOF instruments can provide hundreds to thousands of auto-MS/MS spectra from a sample set. This case study uses twelve structurally closely related flavonol glycosides, characteristic specialized metabolites of plant tissues, some of them isomeric and isobaric, to illustrate the possibilities and limitations of their identification. This process requires specific software tools that perform peak picking and feature alignment after spectral deconvolution and that facilitate molecular structure database searching with subsequent in silico fragmentation to obtain initial ideas about possible structures. As long as spectral databases are not complete enough, the final assignment of a putative identification requires structure searches in a chemical reference database, such as SciFinder-n, to obtain additional information about specific product ions of a metabolite candidate or to check its feasibility. The highlighted problems apply not only to specialized metabolites in plants but also to those occurring in other organisms. This case study aims to provide guidelines for all researchers who obtain data from such analyses and are interested in deeper information than just Venn diagrams of the feature distribution in their sample groups.


Introduction
Secondary, or specialized, metabolites, according to more recent suggestions [1], comprise all those that are not essential to sustain life but rather contribute to adaptation and survival. Primary or central metabolites, by contrast, represent all those that are indispensable to maintain life. Whereas the number of central metabolites is estimated at around 10,000, specialized metabolites probably amount to several hundreds of thousands [1]. Other names for specialized metabolites are natural products or phytochemicals, terms that are used in pharmaceutical sciences to distinguish drugs of natural origin from synthetic ones. Specialized metabolites occur in all prokaryotic and eukaryotic organisms, with a tendency to become scarcer in those organism groups that developed an advanced immune system, such as vertebrates [2].
When the term 'metabolomics' was coined, the first published studies focused on the central metabolism. The predominantly used method was GC-MS, gas chromatography linked to a mass spectrometer with electron impact ionization (EI) [3,4]. The digitization of data acquisition facilitated access to commercial and freely available spectral libraries. Among the latter, the Golm Metabolome Database (GMD) [5] and the FiehnLib [6] provide the most extensive mass spectral data collections. GC is especially suitable for separating sugars, but it also allows the analysis of amino and organic acids as well as many other central metabolites. Its major limitation is that it fails to separate larger molecules, for example, larger peptides and oligosaccharides, the upper limit being 600-650 Da. Volatility represents a further major constraint. Very small terpenoids and simple aromatic specialized metabolites, all of them highly volatile and components of odors that are produced by [...]
Flavonol trisaccharides with masses of 756 and 772 Da will be focused on to illustrate the identification problems of isomeric metabolites. In three of the exemplary flavonoid glycosides, a phenylpropanoid acid replaces one sugar, which only in two cases increases the mass to 816 Da. The MS/MS spectra of the exemplary flavonol glycosides derive from an ongoing study that focuses on physiological traits that are associated with the development of apomixis (asexual propagation) in polyploid species of goldilocks (Ranunculus auricomus L.), a herbaceous species that can become very common in less extensively used meadow habitats, especially in Europe and adjacent Asia [36].
This case study aims to provide a guideline for all those who want to start with auto-MS/MS-based identification of specialized metabolites. In this context, different issues will be explored: (1) the suitability of auto-MS/MS spectra for identification; (2) limiting structural candidates with the help of molecular structure databases and in silico fragmentation; and (3) SciFinder-n as a tool for finding information about structural candidates, which is ultimately also necessary.

Plant Material
The origin of the plant material, buds of Ranunculus auricomus, is described elsewhere [36]. Bud material was stored frozen at −20 °C.

Extraction and Fractionation
Several buds, 5-8(10), from one individual were freeze-dried (ZM200, Retsch, Haan, Germany) and extracted with 1 mL MeOH. After drying on a SpeedVac (RVC 2-25, Martin Christ Gefriertrocknungsanlagen, Osterode am Harz, Germany), the residue was dissolved in 0.7 mL water. To achieve optimal dissolution, the samples were incubated in the dark for 48 h. Then, 0.7 mL ethyl acetate (Etac, p.a.) was added and the sample was thoroughly mixed. For efficient phase separation, all samples were incubated for 24 h in the dark. Glass vials were standard autosampler vials (Macherey Nagel, Düren, Germany). The MeOH and Etac phases were dried separately on the SpeedVac and stored in an argon atmosphere at −20 °C until further processing. The yields of the Etac and water phases varied, 0.1-2.9 mg for the former and 0.7-12.8 mg for the latter.

Sample Preparation
The Etac phase was dissolved in MeOH:water:formic acid (25:75:0.1, v/v/v) to yield a final concentration of 0.1 mg/mL. Samples were injected without further filtration.

LC-TOFMS Analysis
Samples were analyzed by an Agilent 1290 Infinity II LC system coupled to an Agilent G6545A quadrupole time-of-flight mass spectrometer (QTOF, Agilent Technologies, Waldbronn, Germany). The LC system consisted of a G7167A multisampler, a G7104A quaternary pump, and a G7117B DAD UV detector (Agilent Technologies, Waldbronn, Germany). The column was an Agilent Zorbax Eclipse C18 column, 1.8 µm particle size, 100 × 2.1 mm (Agilent Technologies, Waldbronn, Germany). The MS interface was an orthogonal electrospray ionization source (ESI).
The solvent gradient started from 95% A (water + 0.1% formic acid) and changed to 100% B (MeOH + 0.1% formic acid) within 15 min; 100% B was then held for a further 5 min. The flow rate was 0.4 mL/min. The column compartment temperature was set to 35 °C. The DAD recorded spectra from 220 to 580 nm. The injection volume was 8 µL.
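The gradient program can be written down as a small function, which is convenient when retention times are to be related to the eluent composition. The following is a minimal sketch that assumes a linear ramp and ignores the dwell volume of the LC system; the function name `percent_b` is hypothetical and not part of any instrument software.

```python
def percent_b(t_min: float) -> float:
    """Return the programmed %B at time t_min (minutes).

    Assumption: linear ramp from 5% B (= 95% A) to 100% B within 15 min,
    followed by a 5 min hold at 100% B. Dwell volume is ignored.
    """
    start_b, end_b = 5.0, 100.0
    ramp_min, hold_min = 15.0, 5.0
    if t_min < 0 or t_min > ramp_min + hold_min:
        raise ValueError("time outside the 20 min gradient program")
    if t_min <= ramp_min:
        return start_b + (end_b - start_b) * t_min / ramp_min
    return end_b


if __name__ == "__main__":
    for t in (0.0, 7.5, 15.0, 18.0):
        print(f"{t:5.2f} min -> {percent_b(t):6.2f} % B")
```

The value returned is the programmed composition at the pump, not the composition that actually reaches the column at that moment.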
All samples were analyzed in the positive and negative ion mode separately with otherwise identical parameters. The acquisition range was set to m/z 100-1700 with a scan rate of 2 spectra/s. The ESI parameters were set as follows: sheath gas flow 11 L/min, sheath gas temperature 350 °C, nebulizer pressure 35 psig, gas flow 8 L/min, and the scan source parameters VCap 3500 V, nozzle voltage 1000 V, fragmentor 175 V, and skimmer1 65 V. Parameters for the auto-MS/MS mode were as follows: m/z 20-1000, scan rate 3 spectra/s. The collision energy was set according to mass: 100 (10 eV), 300 (26 eV), 500 (45 eV), 700 (60 eV). The precursor selection was defined as follows: max precursor per cycle 1, threshold abs. 1, threshold (rel., %) 0.01, purity stringency (%) 100, purity cutoff (%) 30, isotope model common, active exclusion enabled but released after 2 spectra and after 0.5 min, precursors sorted by abundance only, and isolation width narrow (~1.3 Da). In the positive mode, reference masses were 121.0508773 and 922.009798, and, in the negative mode, 119.03632 and 966.000725. The choice of the parameters followed recommendations of the Agilent technician with a slight modification (max precursor per cycle) [37].
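To see which collision energy a given precursor would receive under the mass-dependent table above, the four listed points can be interpolated. This is only a sketch: it assumes linear interpolation between the listed points and a constant energy outside the listed range, which may differ from what the acquisition software actually does. The names `CE_TABLE` and `collision_energy` are hypothetical.

```python
from bisect import bisect_right

# (m/z, collision energy in eV) pairs as listed in the method description
CE_TABLE = [(100.0, 10.0), (300.0, 26.0), (500.0, 45.0), (700.0, 60.0)]


def collision_energy(mz: float) -> float:
    """Estimate the collision energy (eV) for a precursor m/z.

    Assumptions: linear interpolation between table points; values below
    the first or above the last point are clamped to the end values.
    """
    xs = [point[0] for point in CE_TABLE]
    ys = [point[1] for point in CE_TABLE]
    if mz <= xs[0]:
        return ys[0]
    if mz >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, mz) - 1  # index of the table point at or below mz
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)


if __name__ == "__main__":
    for mz in (287.0, 400.0, 755.2):
        print(f"m/z {mz:7.1f} -> {collision_energy(mz):5.1f} eV")
```

Under these assumptions, all flavonol triglycoside precursors above m/z 700 would be fragmented at the highest listed energy, which is consistent with spectra dominated by product ions of the flavonoid ring system.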

Identification and Nomenclature
Pure trivial names of putatively identified flavonoid glycoside structures are avoided except for well-known flavonoid ring structures (isorhamnetin, luteolin, kaempferol, and quercetin). Generally, it is more or less impossible to differentiate between isomeric hexoses (glucose and galactose) and between isomeric pentoses (xylose, arabinose, and apiose). If [M + Na]+ MS/MS spectra are available, glucose and galactose and even different linkage types between them and rhamnose can be differentiated [39]. In all other cases, hexoses are assumed to be glucose, deoxyhexoses to be rhamnose, and pentoses to be xylose. Sugars are abbreviated (gal, galactose; glu, glucose; rha, rhamnose; xyl, xylose). Bonds between sugars are not specifically indicated if they are (1→6) (glucose) or (1→4) (rhamnose). Positions on the first sugar linked to the flavonoid are indicated by ″ and positions on the second sugar by ‴.
Substitution patterns on the flavonoid ring system were elucidated based on the preva- [40]. MS/MS fragmentation of flavonoid rings was compared to full collision energy ramp MS/MS spectra [41]. Trisaccharide fragmentation was compared to available CID studies [39,42–44]. The auto-MS/MS method generates spectra in which the product ions of the flavonoid ring system are very conspicuous and hint at the substitution pattern of the ring system. The number of product ions is much higher for the glycoside part of the precursor ion, and they are thus less clearly visible and more difficult to interpret. Table 1 lists the identified product ions that contributed to the identification of the 12 flavonol glycosides. All were detected in the negative mode, and only four of them in the positive mode. In all of these cases, both [M + H]+ and [M + Na]+ precursor ions were detectable. Only one sodium adduct failed to yield an MS/MS spectrum. Sugar isomers cannot be differentiated on the basis of MS/MS spectral data, and reporting them as canonical SMILES does justice to this deficiency. Glucose, rhamnose, and xylose represent the most commonly reported sugars in flavonol triglycosides [45], and thus a hexose was assumed to be a glucose, a deoxyhexose a rhamnose, and a pentose a xylose in those cases in which no information from [M + Na]+ MS/MS spectra was available. The main reason for this approach was to facilitate database indexing of all compounds.
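The relation between a molecular formula, the expected precursor ions, and the Δ ppm value can be illustrated with a short calculation. The sketch below uses standard monoisotopic masses; the example formula C33H40O20 is an assumption derived from kaempferol plus two hexoses and one deoxyhexose and is not taken from Table 1. All function names are hypothetical.

```python
# Monoisotopic masses (Da) of the elements needed here
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276467              # mass of H+
NA_ION = 22.98976928 - 0.00054858  # Na atom minus one electron


def mono_mass(formula: dict) -> float:
    """Monoisotopic mass of a neutral molecule, e.g. {"C": 33, "H": 40, "O": 20}."""
    return sum(MONO[element] * count for element, count in formula.items())


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """m/z of singly charged [M+H]+, [M+Na]+, or [M-H]- ions."""
    shifts = {"[M+H]+": PROTON, "[M+Na]+": NA_ION, "[M-H]-": -PROTON}
    return neutral_mass + shifts[adduct]


def ppm_error(observed: float, calculated: float) -> float:
    """Mass deviation in parts per million."""
    return (observed - calculated) / calculated * 1e6


if __name__ == "__main__":
    # Assumed formula: kaempferol (C15H10O6) + 2 hexose + 1 deoxyhexose residues
    m = mono_mass({"C": 33, "H": 40, "O": 20})
    for adduct in ("[M-H]-", "[M+H]+", "[M+Na]+"):
        print(f"{adduct:8s} {adduct_mz(m, adduct):.4f}")
```

Isomeric glycosides share the same formula and therefore the same calculated m/z, so a small Δ ppm value supports a molecular formula but never distinguishes between the isomers discussed here.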

Results
Table S1 contains all the important information for database indexing, an overview of the structures, trivial names used in the text, semisynthetic IUPAC names, identification level, SMILES, and the most closely resembling CAS registry numbers. In the figures, tables, and the text, a modified trivial nomenclature is used that aims to keep the names as short as possible without losing structure information (see Section 2.7).
Features at 283 Da (this one especially), 271 Da, and 151 Da provided support for the first sugar being a rhamnose [41]. The feature at 446 Da provided further evidence that sugars 2 and 3 are both glucoses. The two remaining glucose moieties may be connected 1→4, similar to what was shown for synthetic flavonol glycosides [51]. Neither the finally assigned structure nor the possible isomer quercetin-7-(2″-glu)-rha-glu is reported in the literature.
Table 1. Positive and negative MS/MS data for 12 selected flavonoids: retention times, putative identification, precursor ion masses, calculated precursor ion masses, ∆ ppm values, adduct types, and literature/database reference spectra. Abbreviations: glu, glucose; rha, rhamnose; xyl, xylose; Y0, flavonoid. Serial letters refer to Figure 1.

The structure of kaempferol-3-glu-rha-7-glu was supported by the observed in-source fragmentation. A feature at 593 Da indicated the loss of a glucose, and a feature at 447 Da the further loss of a rhamnose. The features at 357 and 241 Da characterize a kaempferol-3-glu structure [41]. The proposed structure represents one of the more commonly occurring flavonol glycosides [46,49,52,53].
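The reasoning from mass differences to sugar losses can be sketched in a few lines. The residue masses are standard monoisotopic values for anhydro-sugar units; the nominal precursor value of 755 for the [M − H]− ion is an assumption derived from the 756 Da molecular mass, and the function name `annotate_losses` is hypothetical. Because the text reports nominal masses, a wide tolerance of 0.5 Da is used here; with accurate masses it would be set much lower.

```python
# Monoisotopic masses (Da) of sugar residues lost as neutral fragments
SUGAR_LOSSES = {
    "hexose": 162.0528,       # C6H10O5, e.g. glucose or galactose
    "deoxyhexose": 146.0579,  # C6H10O4, e.g. rhamnose
    "pentose": 132.0423,      # C5H8O4, e.g. xylose, arabinose, apiose
}


def annotate_losses(ions, tol=0.5):
    """Annotate consecutive ion pairs whose mass difference matches a sugar loss.

    ions: iterable of m/z values (any order).
    Returns a list of (higher m/z, lower m/z, residue name) tuples.
    """
    ordered = sorted(ions, reverse=True)
    hits = []
    for hi, lo in zip(ordered, ordered[1:]):
        delta = hi - lo
        for name, mass in SUGAR_LOSSES.items():
            if abs(delta - mass) <= tol:
                hits.append((hi, lo, name))
    return hits


if __name__ == "__main__":
    # Assumed nominal [M-H]- precursor (755) and the two in-source features
    for hi, lo, name in annotate_losses([755, 593, 447]):
        print(f"{hi} -> {lo}: loss of a {name}")
```

The annotation only names the residue class. Whether a hexose is glucose or galactose, and at which position it is linked, cannot be concluded from the mass difference alone.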
Despite reports of 255 and 241 features [41], the observed 256 and 241 Da features can be regarded as congruent with a kaempferol-glu-rha feature. The additional glucose linked to the first glucose sugar could have caused this change. Consequently, the proposal from SIRIUS might be correct, but an isomer in the sugar part of the molecule is also possible. Kaempferol-3-(2″-glu)-glu-rha is mentioned in several studies [49,52,54].
Features at 255 and 271 Da concurred with assumptions of a quercetin-glu structure, and a feature at 283 Da with a quercetin-rha structure [41]. A loss of glucose and rhamnose causes a 446 Da feature in the negative spectrum. This flavonoid glycoside was also detected as [M + H]+ and [M + Na]+. The [M + Na]+ MS/MS spectrum showed characteristic fragments for an rhaα6glu linkage [39]. The latter also showed a glu-rha feature at 347 Da. Quercetin-3-glu-rha-7-rha represents an often reported flavonol glycoside [55–57].
In this case, fortunately, several in-source features were visible in the positive mode, and the [M + H]+ adducts at 611, 465, and 303 Da suggested a quercetin-7-glu-glu-rha structure. Rather prominent features at 255 and 271 Da provided support for a quercetin-glu structure [41]. The MS/MS spectrum of the in-source [M + H]+ adduct at 303 Da showed high similarity to an MS/MS spectrum of quercetin deposited in MassBank (MB: PR309259). Tricetin, an isomer, was not found (HMDB0029620, [61]). The [M + Na]+ MS/MS spectrum contained fragments supporting a (2″-glu)-glu-rha trisaccharide structure [39]. The proposed structure is thus quercetin-7-(2″-glu)-glu-rha, which is not described in the literature.
The remaining question concerned the sugar linkage. In this context, the presence of a feature at 489 Da pointed to quercetin-3-(2″-rha)-glu-rha [56], which also represents the most commonly reported one in the literature [58–60].

Luteolin-7-(2″-glu)-glu-rha (8.56 min)
The negative MS/MS spectrum showed a prominent Y0 feature at 285 Da that first suggested a 7-substituted kaempferol [40]. The MS/MS spectral search in MS-DIAL afforded kaempferol-3-glu-rha-(3 -rha). MS-FINDER proposed 3,7-disubstituted kaempferol derivatives, and SIRIUS the same structure as the MS-DIAL search. The flavonoid glycoside was also detectable in the positive mode, as was the in-source fragmentation product ion at 287 Da, which originally was thought to be kaempferol. The MS-DIAL search, however, revealed the spectrum to show the closest similarity to luteolin (BMDMS-NP 29252). The absence of features at 117 and 135 Da pointed to a linkage of the glycoside moiety to carbon 7 instead of 4 [42].
The sugar moiety was assumed to be (2″-glu)-glu-rha, in analogy to quercetin-7-(2″-glu)-glu-rha, for which even a literature report exists on the basis of NMR data [62]. The [M + Na]+ MS/MS spectrum contained fragment ions that are characteristic of a 6α linkage between rhamnose and glucose and a 2β linkage between two glucoses. Fragments that characterize galactose were missing [39]. No reports on this flavonoid glycoside exist in the literature.
A feature at 357 Da provided some evidence for a glucose at position 3 [41]. There are now three possibilities for how the additional glucose and the ferulic acid are linked to the glucose. One possibility is isorhamnetin-3-glu-glu-ferulic acid [63] and another isorhamnetin-3-(2″-glu-ferulic acid)-glu [64], both of which were elucidated by 2D-NMR techniques. However, in consideration of a simple way to convert this ferulic acid-isorhamnetin glycoside to the structure of the one eluting at 9.50 min, isorhamnetin-3-glu-ferulic acid-7-glu, the structure of isorhamnetin-3-(2″-glu)-glu-ferulic acid is proposed. A similar linkage pattern was found in quercetin and kaempferol glycosides that form esters with coumaric, caffeic, and sinapic acid, the formation of which seems to be catalyzed by a specific flavonol-phenylacyltransferase [60]. All of these proposals pin the phenolic acid to position 6 of the glucose. A feature at 175 Da adds support for a ferulic acid moiety.
The spectrum showed a product ion at 524 Da, for which, despite a high probability of deriving from the flavonol glycoside parent ion, no structure could be assigned.
In-source fragmentation led to a feature at 609 Da that limited the linkage of the p-coumaroyl moiety to the glucose linked directly to quercetin. A feature at 395 Da even pinned the position of the p-coumaric acid to carbon 2 on the glucose ring (for product ion structures, see Figure S1). Consequently, the proposed structure is quercetin-3-(2″-p-coumaric a.)-glu-rha [59].
A feature at 175 Da indicated the presence of ferulic acid, similar to isorhamnetin-3-(2 -glu-6 -feruloyl)-glu. The same mass pointed to an isomeric structure, for which isorhamnetin-3-(6 -feruloyl)-glu-7-glu is the most likely candidate. A series of similar kaempferol and quercetin glycosides in which sinapic acid serves as the phenylpropanoid acid occurs in Arabidopsis [60].
The spectrum showed a product ion at 544 Da, for which, despite a high probability of being a derivative from the flavonol glycoside parent ion, no putative structure could be assigned.

UV-Spectra
Despite UV-DAD spectra being acquired in all the analyses, the analyte concentrations turned out to be too low to obtain interpretable spectra.

Structure Identification and Feature Sorting on Basis of Auto-MS/MS Spectra
The exact masses obtained at the MS1 level do not suffice for structure elucidation. More complex specialized metabolites, such as flavonol glycosides, but also other specialized metabolite classes, e.g., saponins or steroid glycosides, require MS/MS spectra for a more or less reliable putative identification. The loss or addition of a hydroxyl or a methyl group can happen both on the aglycone and the glycoside moiety, which results in a series of isomeric derivatives. Depending on which collision energies or energy ramps are used in CID or auto-MS/MS methods, the obtained spectra look different. Lower collision energies generate spectra with prominent precursor ions; higher collision energies cause a shift to product ions in the lower mass range [14]. For flavonol glycosides, low collision energies reveal the glycoside structure, and higher collision energies the flavonoid structure. The latter is the case for auto-MS/MS spectra, in which substitution patterns are more conspicuously visible than the sugar linkages (Figure 1). In this study, a multistage energy ramp was used, depending on the molecular weight of the MS1 ion, to generate a pseudo-MS3 spectrum that corresponds to an MS3 or MS4 CID spectrum [65]. The retrieved library spectra for luteolin-7-(2″-glu)-glu-rha document that comparable spectra exist in MS/MS spectral databases (Figure 2). They are, however, much less represented in them than CID spectra and rarely sufficient for identification. CID spectra are acquired at much lower collision energies and show product ions from the gradual fragmentation of the glycoside part of the flavonol glycoside. Such fragments were detected in some of the spectra of other flavonol glycosides in this study, but not in all. In this context, data from full collision energy ramp MS2 spectra can prove helpful [41].
In two cases, kaempferol-3-glu-rha-7-glu (7.35 min) and quercetin-7-(2″-glu)-glu-rha (7.93 min), in-source fragmentation occurred that provided more specific information about the glycoside structure, for the former in the negative and for the latter in the positive mode. Because in-source fragments appear more or less accidentally, it can only be speculated which factors contribute to these phenomena, but they always have to be taken into account. In-source fragmentation is recognized as a useful phenomenon in auto-MS/MS spectral analysis [66–68], and one study even explored it in the context of flavonoids [69]. The authors proposed fragmentor voltages that were even higher than those used in this study, 230 and 330 V. At this point, it is worth recalling that complex mixtures of specialized metabolites, as analyzed in untargeted metabolomics, contain different compound classes, and the pre-chosen analysis conditions will only be optimal for a few of them. To detect in-source fragmentation, the application of isotope labeling was recommended [68], but the focus of that study was on central metabolites. In the case of specialized metabolites, this approach would probably not be feasible because the structural diversity is much higher. In the case of luteolin-7-(2″-glu)-glu-rha, the in-source fragmentation leading to the detection of the flavonoid moiety product ion was fortunate because the MS/MS spectrum allowed the identification of luteolin, contrary to previous assumptions of kaempferol, an isomer of luteolin. In the present data, the existence of pronouncedly different sample groups, in which different flavonol glycosides occur, helped to recognize in-source fragmentation in the ion table (Table 2). This approach may not be methodically elegant but turned out to be practically efficient. The aligned ion table of MS-DIAL provides bar charts that inform about the signal intensities in different experimental groups.
Similar bar chart patterns that point to related features are highlighted in red. These patterns, together with similarity in other parameters, such as signal-to-noise ratio, fold change, or p-values from an ANOVA of the sample groups, can provide hints to related adducts or in-source fragments, the feasibility of which still requires checking on the basis of a tentative structure. Depending on the filter values, an alignment analysis can yield 2000-20,000 features, and the challenge is to sort out those that are more prominent in terms of abundance and that offer MS/MS fragmentation for their identification. In-source fragments, however, may lack MS/MS spectra but still provide important information for structure identification. Recently, additional procedures have been proposed to clean feature lists, for example, the R-based tool MS-CleanR [70]. The latter could not be applied to the present dataset because of missing quality control analyses. The most recent guideline for untargeted LC-MS/MS analyses mentions quality controls, but not in the context of feature filtering [29].
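The visual comparison of bar chart patterns can be approximated numerically by correlating the group intensities of co-eluting features. The following is a minimal sketch, not the procedure used in this study: the feature records, the retention time tolerance, and the correlation threshold are hypothetical, and the example values are invented for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long intensity lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no pattern information
    return sxy / sqrt(sxx * syy)


def related_features(features, rt_tol=0.05, min_r=0.95):
    """Return id pairs of co-eluting features with similar group patterns.

    features: list of dicts with keys "id", "rt" (min), "groups"
    (mean intensity per sample group). Thresholds are assumptions.
    """
    pairs = []
    for a, b in combinations(features, 2):
        if abs(a["rt"] - b["rt"]) > rt_tol:
            continue
        if pearson(a["groups"], b["groups"]) >= min_r:
            pairs.append((a["id"], b["id"]))
    return pairs


# Invented example: f2 co-elutes with f1 and follows the same group pattern
EXAMPLE = [
    {"id": "f1", "rt": 7.35, "groups": [100.0, 10.0, 50.0, 5.0]},
    {"id": "f2", "rt": 7.35, "groups": [20.0, 2.0, 10.0, 1.0]},
    {"id": "f3", "rt": 7.36, "groups": [5.0, 80.0, 3.0, 60.0]},
    {"id": "f4", "rt": 9.50, "groups": [100.0, 10.0, 50.0, 5.0]},
]

if __name__ == "__main__":
    print(related_features(EXAMPLE))
```

A high correlation only flags candidates. As stated above, whether a flagged feature is an adduct or an in-source fragment still has to be checked against a tentative structure, and true isomers that co-elute within the tolerance would be flagged as well.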

UV-Spectra
Along with MS/MS spectra, some studies also present UV spectra that can be obtained in the same analysis. There are two inherent problems. One is that LC-MS/MS analyses require much lower concentrations than are suitable for LC-DAD analyses. This problem may become especially apparent with newer instrumentation. An efficient solution can be a split after the DAD in the ratio of 1:8 [71].
Another problem is solvent gradient times. In order to benefit from the information of UV spectra, they have to be pure. Often, slight shifts in the maxima or the appearance and disappearance of shoulders are highly indicative. Pure spectra can be achieved with longer analysis times, but analysis times are often kept short by operators to facilitate higher sample throughput. In this study, the minimum elution difference of two flavonol glycosides is 0.02 min or lower. The alignment procedure makes it difficult to report exact values. A DAD would have failed to resolve the co-eluting analytes even in the case of sufficient analyte concentrations. However, not only UV spectroscopy benefits from longer analysis times and the resulting improved chromatographic resolution; auto-MS/MS spectrometry would also yield better results because fewer chemical reactions occur at the same time in the ion source.

Limiting Candidate Structures with Molecular Structure Databases
The previous section discusses some important aspects of MS/MS spectra that merit consideration during the identification process. MS/MS spectral libraries are not yet complete enough to provide precise automated identification of MS/MS spectra, but hopes exist that the situation will improve in the future [23]. The use of structural databases in attempts to ameliorate this situation certainly has its merits. Even if the provided structures are not congruent or correct, the application of this software provides starting ideas on which structures are possible (Table 3). Only in one case, kaempferol-3-(2″-glu)-glu-rha, did the CSI:FingerID module in SIRIUS list the finally assigned structure as the first hit. In the other cases, MS-FINDER often came closer in terms of list ranking, but the hit with the highest match factor never concurred with the finally proposed structure. Furthermore, MS-FINDER presents fragmentation ion products as chemical structures. One has to keep in mind that, in most cases, the shown structure represents only one of many possible isomers. Moreover, it certainly pays off to use more than one software tool. The implemented in silico fragmentation algorithms differ: MS-FINDER uses a rule-based method, CSI:FingerID in SIRIUS a combination of fragmentation trees and machine learning [20]. According to best-reporting practices [29], only kaempferol-3-(2″-glu)-glu-rha earns a C(ii) identification level and all others only C(iii) (Table S1). In addition to a survey of potential structures, 'in silico fragmentation' methods [23] can provide fragmentation tables of corresponding hits to the MS/MS spectrum in question, which proved extremely helpful for identifying product ions that contribute to assigning a putative structure to the analyte. Figure 3 illustrates the user interface of both software tools.

Searching the Literature and Making Yourself Searchable
The most frequent question that arises during the identification process of MS/MS spectra is whether papers exist that report MS/MS product ions of the candidate structure. This requires specific structure search capabilities, and the most widely used and comprehensive databases are SciFinder-n [31], CAS Registry via STN [72], and Reaxys [32].
Figure 3. [...] [25–28], isorhamnetin-3-(2″-xyl)-glu-7-rha. The figure merely aims to give an idea of what the program interfaces of the two software tools look like. The analysis data of the 12 flavonol glycosides are available [74,75] and can be viewed with the publicly available software tools.
SciFinder-n and Reaxys allow structure input by SMILES, which can be generated by many different chemical structure drawing programs or the web tool PubChem Sketcher [76]. The advantage of SciFinder-n is that it represents both a reference database (Chemical Abstracts) and a substance database (CAS Registry), both of which are linked. By contrast, Reaxys is more of a substance database that was developed from the discontinued Beilstein structure database [77].
When you start to work with SciFinder-n, you will find out that the findings of quite a number of metabolomic studies are registered incompletely; i.e., the list of substances is incomplete or sometimes missing totally. The reasons for this deficiency are manifold: (1) often, metabolite lists are only provided as supplementary files and often hidden in spreadsheet files with several data sheets or in multipage PDF files; (2) the metabolites are designated by ambiguous trivial names, on the basis of which it is very difficult to conclude an unambiguous structure; and (3) if the glycoside moiety is only indicated as a simple sequence of hexoses, deoxyhexoses, or pentoses, the structure information is regarded as too ambiguous for registration. Figure 4 summarizes the nomenclatural dilemma of specialized metabolites and points out which types of names should be avoided for registration. In former times, IUPAC names had to be used. Though their generation can be achieved by software tools, they are too long to be practical.
As a result, structure searches retrieve far fewer hits than would be possible in theory. Many of the retrieved references comprise studies that report the classical combination of spectral information: UV, IR, MS, and NMR (1H, 13C, and various 2D experiments) [22]. In recent years, however, they have become scarcer. This development may lead to a gap between the traditional exploratory research in organismic metabolite diversity and MS/MS-based metabolomics of central and specialized metabolites. To improve structure elucidation accuracy, LC-MS/MS can be combined with LC-NMR [15], but such state-of-the-art instrumentation is only available to a very small fraction of the scientific community that is engaged in untargeted plant metabolomics and research involving specialized metabolites in general [78]. In this study, none of the proposed structures for the twelve flavonol glycosides (Figure 1) could have been developed without consulting SciFinder-n. What can be done to avoid incomplete registration of analysis results? A clear indication in the Results section of the existence of a table in the Supplementary Materials that tabulates the detected metabolites should suffice [79]. This table, named Table S1, should contain an overview of the identified structures with SMILES. They are the best identifier codes because InChI keys do not work for novel structures. Canonical SMILES are sufficient because MS/MS data provide no stereochemical information. Caution is always important in the assignment of structures to analytes. However, it is better to present distinct proposals than overly cautious ones, which might fail to be registered in databases. Wrong structures, once registered, can be corrected; cautious ones, which will remain unregistered, will also remain anonymous.

Conclusions
This case study aims to provide insights into which problems can arise when analyzing auto-MS/MS spectra. They differ from those that are obtained by traditional CID experiments (Figure 1), in which the measurement parameters, specifically the collision energy, are optimized for the analyte. This study demonstrates that auto-MS/MS spectra can be used for identification. However, existing spectral and molecular structure databases alone are not sufficient for the identification process. Additionally, access to chemical reference databases is required (1) to compare product ions in MS/MS spectra with literature data; (2) to obtain more information about specific structural features of the compound class; and (3) to find out whether the putatively identified metabolite is described in the literature or not. More detailed knowledge about the specialized metabolite analyte profiles in LC-MS/MS analyses can substantially contribute to improved visualization and deeper analysis of compound class patterns by molecular networking [80].
Funding: This research received no external funding.
Data Availability Statement: Raw and analysis data are available at Goettingen Research Online. They include spectral data in MassBank [81], NIST [82], and Mascot format [83]. Additionally, for all 12 flavonoid glycosides, the project folders of the MS-FINDER [74] and SIRIUS [75] analyses are provided.