Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction

We introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters (accurate mass, MS/MS (MS2), and NMR) into a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature and then uses the experimental MS/MS and NMR information of the unknown to filter out false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture and then identify an uncatalogued secondary metabolite in Arabidopsis thaliana. The NMR/MS2 approach is well suited to the discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases.


Introduction
Metabolomics has become a key discipline in small molecule research. It has been widely applied to generate new hypotheses for unresolved scientific questions, such as in metabolism-related disorders, environmental carbon cycling studies, complex biofuel analysis, nutritional and food sciences, and synthetic biology optimization [1][2][3][4][5]. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the two most powerful experimental methods for metabolomics [6,7]. This is primarily because of their high resolving power, which allows individual molecular species to be detected in a short acquisition time with minimal sample preparation. Connecting the detected signals to a scientific interpretation requires assigning individual signals to their corresponding metabolite identities. This is often done by querying detected signals against one or several experimental databases such as BMRB [8], HMDB [9], METLIN [10] and COLMAR [11]. For this approach to give a positive identification, the spectra of the metabolite must be present in the database. Although excellent progress has been made in growing the number of metabolites in databases, many of the signals detected in metabolomics experiments cannot be identified this way because their spectra are absent from metabolomics databases.
Identification of these unknowns is a major bottleneck. Traditionally, identification of unknowns requires extensive isolation of the molecule from a complex mixture in sufficient amount for detailed

Identification of a Known Metabolite without the Use of a Database for Matching
Our NMR/MS-based approach works as follows: we first determine the chemical formula of an unknown LC-MS feature by analyzing its accurate mass and isotopic distribution. Next, we generate all possible candidate structures consistent with the chemical formula. We predict the MS/MS and NMR spectra of each candidate structure. Meanwhile, we collect the experimental NMR spectrum of the same sample. We compare the predicted MS/MS and NMR data against the experimental MS/MS and NMR data, and rank the candidates according to the level of agreement to determine the best-matching candidate. Figure 1 shows the approach for the simplest case, in which there is only one metabolite in the NMR spectrum, so the NMR spectrum can be paired directly with the LC-MS spectrum. We used the MS2 and NMR spectra of valine. The high-resolution LC-MS spectrum of valine allowed the determination of its molecular formula, C5H11NO2. All possible structures consistent with C5H11NO2 were determined using the ChemSpider database [21]. There were 453 possible structures. We queried all 453 MS2 predictions against the experimental MS2 of valine by using the MetFrag web server [15]. This gave a score for each predicted MS2 match. The valine MS2 prediction was the 6th-best match to the experimental MS2 of valine among the 453 predictions. We predicted the 2D 13C-1H HSQC NMR spectra [22] of all 453 structures by using MestReNova software, and queried them against the experimental 2D 13C-1H HSQC of valine. The valine NMR prediction was the best match to the experimental NMR spectrum of valine. By using the same strategy, we analyzed nine other common metabolites. Table 1 (column 7) and Table 2 (column 5) show the performance of MS2 and NMR predictions for these metabolites, respectively. With one exception, the NMR predictions were always either the best or second-best match to their experimental signals; the NMR prediction of pantothenate was the fourth-best match.
On the MS2 side, the predictions matched less successfully to their corresponding experimental MS2 (Table S1). For instance, nicotinate and glutamine predictions were the 7th- and 10th-best matches, respectively. Phenylalanine was an exception, with the MS2 prediction being the best match to its experimental data.
Overall, NMR predictions turned out to be a more effective filter than MS2 predictions. Thus, the structural assignments obtained with NMR prediction tend to provide the most accurate structural information, with MS2-based predictions providing an orthogonal method that can distinguish between molecules that yield similar NMR chemical shifts. For instance, the proline NMR prediction was the second-best match to its experimental NMR spectrum, with chemical shift differences of 0.3077 and 1.0908 ppm along the 1H and 13C dimensions. The best match was 5-hydroxy-2-piperidinone, with chemical shift differences of 0.1072 and 2.1928 ppm along the 1H and 13C dimensions. In contrast, the MS2 ranks of proline and 5-hydroxy-2-piperidinone were 26 (score: 0.8515) and 167 (score: 0.5964), respectively (Figure S1). Therefore, the false positive candidate based on NMR prediction (5-hydroxy-2-piperidinone) can be eliminated by MS2 prediction, showing the advantage of combining these two orthogonal parameters for metabolite identification.
a Number of structures for a given molecular formula (obtained with ChemSpider); b Average 1H chemical shift difference (in units of ppm) between predicted and experimental chemical shifts for a given metabolite; c Average 13C chemical shift difference (in units of ppm) between predicted and experimental chemical shifts for a given metabolite; d Rank-ordered agreement between predicted and experimental chemical shifts of a given metabolite when the metabolite is the only unknown in the NMR spectrum; e Rank-ordered agreement between predicted and experimental chemical shifts of a given metabolite when the metabolite is one of the ten unknowns in the NMR spectrum.
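The proline case can be restated as a small two-stage filter: shortlist candidates by NMR rank, then order the shortlist by MS2 score. The following is a minimal Python sketch and not the authors' implementation; the function name, the shortlist size of two, and the dictionary layout are assumptions made for illustration. The ranks and scores are the ones quoted above for proline and 5-hydroxy-2-piperidinone.

```python
def rerank_by_ms2(candidates, nmr_top_n=2):
    """Keep the nmr_top_n best NMR matches, then order them by MS2 score.

    candidates: list of dicts with keys 'name', 'nmr_rank' (1 = best)
    and 'ms2_score' (higher = better). The two-stage scheme and the
    default shortlist size are illustrative assumptions.
    """
    shortlist = sorted(candidates, key=lambda c: c["nmr_rank"])[:nmr_top_n]
    return sorted(shortlist, key=lambda c: c["ms2_score"], reverse=True)


# NMR ranks and MS2 scores quoted in the text for the proline example.
candidates = [
    {"name": "5-hydroxy-2-piperidinone", "nmr_rank": 1, "ms2_score": 0.5964},
    {"name": "proline", "nmr_rank": 2, "ms2_score": 0.8515},
]

best = rerank_by_ms2(candidates)[0]
print(best["name"])
```

With these inputs the NMR-only false positive falls behind proline once the MS2 score is taken into account. How the two scores are best combined in general is not specified in the text.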

Identification of Known Metabolites in a Test Mixture without the Use of a Database for Matching
In the example above, which considered one unknown metabolite at a time, each NMR spectrum was paired with a single LC-MS feature. However, it is common in metabolomics to analyze complex mixtures consisting of multiple metabolites within unfractionated samples. In these cases, multiple LC-MS features and multiple deconvoluted NMR spectra are generated, without knowing which pairs correspond to the same metabolite. This increases the challenge of metabolite identification. The NMR/MS2 approach can also be applied to such mixtures. When multiple metabolites are present, the workflow is modified to consider all pairwise combinations: NMR predictions are generated for all structural isomers of each LC-MS feature, and each prediction is compared to every deconvoluted experimental NMR spectrum. Figure 2 shows the approach on a sample consisting of three metabolites: valine, methionine and glutamine. Deconvolution of the HSQC spectrum by connectivity information from TOCSY [23] and HMBC [24] resulted in three HSQC peak lists, colored magenta, light-blue and green (Figure 2).
The LC-MS feature of methionine was first converted to the chemical formula C5H11NO2S. There were 212 possible isomers. We queried the HSQC predictions of the 212 structures against the magenta, light-blue and green HSQC peak lists individually. Among them, the best match was the one between the predicted HSQC of methionine and the light-blue peaks, corresponding to the experimental HSQC of methionine (Figure 2). The predicted HSQC of methionine provided a better match to the light-blue peaks than any of the other 211 methionine isomers did to any of the three deconvoluted peak lists.
To demonstrate how the approach would work in a more complex mixture, we expanded the list of metabolites from three to ten: thymidine, proline, phenylalanine, pantothenate, nicotinate, methionine, isoleucine, glutamine, leucine and valine. Table 2, column 6 shows the performance of NMR prediction in the ten-metabolite mixture. The ranks of thymidine, proline, phenylalanine, nicotinate, methionine, isoleucine, glutamine and valine did not differ between being the only metabolite in the NMR spectrum (column 5) and being one of the ten metabolites in the mixture (column 6). On the other hand, the ranks of pantothenate and leucine worsened from 4 to 16 and from 1 to 3, respectively. This is because in the single-metabolite case, NMR predictions of pantothenate and leucine isomers were limited to matching only the experimental NMR spectra of pantothenate and leucine, whereas in the mixture they can also match the experimental NMR signals of the other mixture metabolites. Figure S2 shows this in detail for leucine. When NMR predictions of 962 C6H13NO2 isomers (including leucine and isoleucine) were queried against the experimental HSQC peak lists of the ten metabolites, the best-matching NMR prediction belonged to isoleucine, which found the experimental NMR of isoleucine as the best match (true positive). The second-best match was 3-methylaminopropyl acetate, which found the experimental NMR of proline as a match (false positive). Finally, the third-best match was between the predicted NMR of leucine and the experimental NMR of leucine (true positive).
3-methylaminopropyl acetate can be eliminated by MS2 prediction: when compared to the experimentally detected MS2 of isoleucine, the MS2 prediction of 3-methylaminopropyl acetate was ranked 92nd (score: 0.6665), whereas the MS2 prediction of isoleucine was ranked 22nd (score: 0.8723). Likewise, when compared to the experimentally detected MS2 of leucine, the MS2 prediction of 3-methylaminopropyl acetate was ranked 585th (score: 0.2079), whereas the MS2 prediction of leucine was ranked 148th (score: 0.7002). This again shows the advantage of combining NMR and MS2 predictions for metabolite identification.
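The pairwise comparison described in this section, in which every candidate prediction is scored against every deconvoluted peak list, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names, the "lower is better" score convention, and the toy scoring function are not from the paper, and the peak values are made up for demonstration. Only the 10:1 weighting of 1H to 13C differences follows the Materials and Methods.

```python
from itertools import product


def best_pairings(predicted, experimental, score):
    """Score every (candidate, peak list) pair and sort best-first.

    predicted: dict mapping candidate name -> predicted peak list
    experimental: dict mapping peak-list label -> experimental peak list
    score: callable(predicted_peaks, experimental_peaks) -> float,
           lower meaning better agreement (an assumed convention).
    """
    pairs = [
        (score(p_peaks, e_peaks), cand, label)
        for (cand, p_peaks), (label, e_peaks) in product(
            predicted.items(), experimental.items()
        )
    ]
    return sorted(pairs)


def mean_abs_shift_error(pred, expt):
    """Toy score: mean distance from each predicted peak to its nearest
    experimental peak, with 1H differences weighted 10x relative to 13C."""
    total = 0.0
    for h_p, c_p in pred:
        total += min(10 * abs(h_p - h_e) + abs(c_p - c_e) for h_e, c_e in expt)
    return total / len(pred)


# Made-up (1H ppm, 13C ppm) peaks, for illustration only.
predicted = {
    "isomer_A": [(1.0, 20.0), (3.6, 60.0)],
    "isomer_B": [(2.1, 15.0), (2.6, 30.0)],
}
experimental = {
    "magenta": [(1.05, 19.5), (3.55, 61.0)],
    "light-blue": [(2.12, 14.8), (2.63, 29.6)],
}

ranking = best_pairings(predicted, experimental, mean_abs_shift_error)
for score_value, cand, label in ranking:
    print(f"{cand} vs {label}: {score_value:.2f}")
```

Because every candidate is allowed to match every peak list, a wrong isomer can outrank the true metabolite by matching a different compound's signals, which is the effect reported above for leucine and pantothenate.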

Identification of Uncatalogued Metabolite in Arabidopsis thaliana
Many of the metabolites in plants are unknown or uncatalogued; therefore, plants are excellent samples to demonstrate the power of the NMR/MS2 approach. We recorded the LC-MS and LC-MS/MS spectra of an Arabidopsis extract and queried these data against the METLIN database [10], but could not identify many of the high-abundance peaks. Similarly, we recorded the 2D 13C-1H HSQC spectrum of the sample and queried it against the COLMAR HSQC database [25]. Again, many of the high-abundance peaks could not be identified, as these metabolites were not in either database. One of the highly abundant metabolites had an m/z of 436.0332 in negative mode at a retention time of ~47 min (Figure S3). The challenge of identifying unknown metabolites starts at the chemical formula determination step. Previous studies have shown that even 1 ppm mass accuracy is often insufficient to unambiguously determine the chemical formula without additional constraints from isotopic patterns [26]. For the accurate mass 436.0332, there are 4291 possible structures in ChemSpider within 30 ppm mass error, 3194 within 20 ppm, 1909 within 10 ppm, 907 within 5 ppm, 528 within 3 ppm and 379 within 1 ppm. By including the isotopic distribution, we narrowed down the possible chemical formula to C12H23NO10S3. The next challenge was to determine which of the many isomers of that formula matches the structure of the unknown molecule. We queried the MS2 predictions of all possible isomers of C12H23NO10S3 against the experimental MS2 by using the MetFrag web server. MetFrag hits with their matching scores are shown in Figure 3. Structure c was the best match with a matching score of 1.00, structure b was the second-best match (score of 0.87), and structure a was the third-best match (score of 0.79).
The main MS/MS peak differentiating c from a and b was at m/z 178.0166 (Figure 3), which corresponds to the theoretical mass of the fragment [C5H9NO4S−H]−, which includes the SO4 group and part of the aliphatic chain (Figure S4).
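The mass-error tolerances quoted above (1 to 30 ppm around m/z 436.0332) translate into absolute m/z windows as in the short Python sketch below. The helper names are hypothetical; the arithmetic is the standard parts-per-million definition, and the sketch does not query ChemSpider or reproduce the structure counts.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a +/- ppm tolerance."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def ppm_error(observed, theoretical):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


mz = 436.0332  # precursor m/z reported in the text (negative mode)
for ppm in (30, 20, 10, 5, 3, 1):
    low, high = ppm_window(mz, ppm)
    print(f"{ppm:>2} ppm: {low:.4f} to {high:.4f}")
```

Each tighter tolerance narrows the search window linearly, which is why the number of candidate structures drops as the allowed mass error decreases.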
For NMR, we simplified the plant extract into sub-mixtures by performing LC-fractionation at 1 min intervals. At a retention time of ~47 min, we obtained the 2D 13C-1H HSQC spectrum shown in Figure S5. Querying the HSQC predictions of all possible isomers against the experimental HSQC (Figure 3) also returned structure c as the best match, with an average chemical shift error of 0.2081 ppm in the 1H dimension and 1.4187 ppm in the 13C dimension. The key contributor to the differentiation of molecule c from a and b was the methyl group at the end of the aliphatic chain. In molecule c, the methyl was attached to a sulfoxide group, while in molecules a and b it was attached to a thioether. The predicted chemical shifts of the methyl group in structure c were 2.55 ppm in 1H and 39.57 ppm in 13

Discussion
The NMR/MS2 approach introduced here is compelling for the identification of unknown and uncatalogued metabolites for several reasons. First, it combines highly selective, universal and orthogonal structure elucidation parameters (accurate mass, MS2 and NMR) in a single analysis platform to accurately identify metabolites. Second, the approach does not require any experimental NMR or MS metabolomics database for identification. Instead, it generates in silico NMR and MS2 spectra for all possible isomers of a chemical formula and queries them directly against the experimental NMR and MS2 spectra of the unknown of interest. As a proof of principle, we used the ChemSpider database [21] as a structure generator, assuming it to contain a sizeable portion of the possible isomers of a chemical formula. To expand the structural space, one could replace ChemSpider with a structure generator such as MOLGEN [27], or one could use a more biological structure resource such as KEGG [28] to focus on biological compounds. Third, the approach has a robust workflow: it works not only in single-metabolite samples but also in mixtures.
This cheminformatics approach provided a unique platform to compare the performance of NMR and MS/MS predictions for structure elucidation. NMR predictions turned out to be a more effective filter than MS/MS predictions in the model mixture study. Overall, the complementarity of NMR and MS/MS predictions increased the accuracy of metabolite identification compared to using either technique alone. We used only a single collision energy (30 V) for fragmentation. Using multiple collision energies or optimizing the collision energy for each metabolite, and comparing the results with other in silico MS/MS tools such as combined-energy competitive fragmentation modeling (CFM) [29] and CSI:FingerID [14], may provide better results, but this needs to be tested further. To improve the selectivity on the MS side, one could include additional parameters such as collision cross-sections from ion mobility mass spectrometry [30] and LC retention time, although the latter is highly dependent on experimental conditions. Determination of the correct molecular formula, or at least a small set of possible formulas that includes the correct one, is crucial for downstream analysis. Accurate mass significantly lowers the number of possible structures that are consistent with the detected mass. For cases where accurate mass is still not enough to narrow the candidates down to a single formula, the NMR/MS2 approach still works, as the combined MS2 and NMR predictions are strong enough to filter out the majority of false positive structures arising from incorrect chemical formulas. MS systems with higher resolution than TOF, such as Orbitrap and FTICR, would provide better performance, especially for larger metabolites.
With NMR, we used 2D 13C-1H HSQC [22] predictions as an effective filter; the predictions were performed by empirical calculations. The advantage of empirical calculations is speed: each prediction takes about 10 s on a desktop computer. To get more accurate chemical shift values, one could use quantum chemical calculations at the expense of calculation time [31,32]. To make the NMR parameters even more selective, one could include additional NMR parameters, in particular the connectivity information retrieved from 2D 1H-1H TOCSY [23], 2D 13C-1H HSQC-TOCSY [33] or 2D 13C-1H HMBC [24]. Metabolites with high quaternary carbon content may require direct carbon detection experiments such as 1D 13C NMR and comparison with 1D 13C NMR predictions. Successful implementation of the workflow requires detection of the metabolite of interest by both MS and NMR. Therefore, a metabolite should be present at a concentration of at least the low micromolar range to be detected in NMR experiments. Moreover, it must be ionizable in a mass spectrometer. Signal overlap is another challenge for the success of the approach. Here we showed that the most straightforward way to resolve this challenge in real mixtures is to perform partial or complete LC-fractionation before NMR analysis. As we have shown with examples, our NMR/MS2 approach is designed to work in all three conditions: switching from an unfractionated to a partially fractionated to a completely fractionated sample, depending on the experimental conditions, does not significantly affect the protocol. To the best of our knowledge, this is the first study developed to facilitate the identification of unknown metabolites by combining NMR and MS/MS predictions, which supports previous findings that, in addition to HRMS, NMR is required for de novo identification of plant metabolites [34]. We used this approach to identify glucoraphanin, a glucosinolate, in A. thaliana.
Glucosinolates are biologically active secondary metabolites that occur in Brassicaceae species [35], including food crops such as broccoli and cauliflower. These metabolites influence plant-insect interactions, and glucoraphanin has been shown to be an important chemical for insect pest resistance in Arabidopsis [36]. Glucoraphanin in Brassicaceae vegetables is associated with a lower risk of lung and colorectal cancer [37]. Application of the NMR/MS2 approach to plant-derived metabolites is ongoing. The unknown/uncatalogued metabolites identified will be added to metabolomics databases for database expansion and dereplication [38]. The NMR/MS2 approach is applicable to a wide range of samples, from plant extracts to microbes, soils, dissolved organic matter, food extracts, biofuels and biomedical samples.

Materials and Methods
The model mixture was prepared by dissolving thymidine, proline, phenylalanine, pantothenate, nicotinate, methionine, isoleucine, glutamine, leucine and valine in 5 µL D2O, 45 µL H2O and 950 µL acetonitrile. The final concentration of each metabolite was 100 µM. 100 µL of this material was injected into the LC-MS. The NMR sample of the three-compound mixture was prepared by dissolving glutamine, valine and methionine in 180 µL D2O; the final concentration of each metabolite was 1 mM. The A. thaliana metabolite extract was prepared as described previously [39]. The plant extract was dissolved in 200 µL D2O with 0.5 mM DSS, and 180 µL of this sample was used for NMR. 5 µL of this sample was diluted with 45 µL H2O and 950 µL acetonitrile, and 100 µL of this dilution was injected into the LC-MS.
2D 13C-1H HSQC spectra [22] were collected using a Varian (VNMRS) 600 MHz solution-state NMR spectrometer equipped with a Varian z-gradient triple resonance HCN cold probe. The number of scans per t1 increment was 16 for the three-compound mixture, 64 for the unfractionated Arabidopsis extract, 256 for the fractionated Arabidopsis extract, and 128 for the glucoraphanin standard (Cayman Chemicals, Ann Arbor, MI, USA). 2D 1H-1H TOCSY spectra for the unfractionated and fractionated Arabidopsis extracts were collected with 32 and 256 scans per t1 increment, respectively. The TOCSY mixing time was 90 ms. The 2D 13C-1H HSQC, 2D 1H-1H TOCSY [23] and 2D 13C-1H HMBC [24] spectra of the ten metabolites were downloaded from the BMRB database [8]. All data were zero-filled, Fourier transformed, and phase and baseline corrected using NMRPipe [40].
Mass spectrometry studies were conducted using an HPLC system (1200 series, Agilent Technologies, Santa Clara, CA, USA) coupled to an Agilent 6538 UHD ESI Q-TOF instrument with a mass resolution of 40K. The instrument was externally calibrated with Agilent low-concentration tuning mix (part no. G1969-85000) before sample analysis, achieving a mass accuracy of ±5 ppm. LC-MS and collision-induced dissociation LC-MS/MS data were collected together in auto MS/MS mode with a fixed collision energy of 30 V in both positive and negative ionization modes. Samples were separated using a normal-phase approach with a Phenomenex Luna NH2 column (150 mm × 2 mm i.d., 3 µm particle size). A gradient method was used for separation. Mobile phase A was 10 mM NH4Ac plus 10 mM NH4OH in 95% water and 5% acetonitrile, and mobile phase B was 95% acetonitrile and 5% water. The flow rate was 0.1 mL/min, with a gradient from 100% B to 0% B over 60 min followed by a hold at 0% B for 5 min. The flow rate was then increased to 0.2 mL/min for re-equilibration with 100% B for 54 min, and returned to 0.1 mL/min for an additional 10 min. The settings for the mass spectrometer were as follows: capillary voltage, 4000 V (positive ion mode) and −4000 V (negative ion mode); drying gas flow (N2), 5 L/min; drying gas temperature, 330 °C; and nebulizer gas (N2), 30 psig. The raw LC-MS data were converted to mzdata.xml format by using MassHunter Qualitative Analysis (Agilent Technologies) and analyzed by mzMine 2 [41]. LC-fractionation of the plant extract was performed by an Agilent HPLC system (1200 series) coupled to an Agilent fraction collector (1100 series). The LC column, elution gradient, flow rates, solvents and injection volume were the same as in the LC-MS experiments above. The fractions were collected in a 96-well deep plate (96DeepNunc31mm) at 1-min retention time intervals.
Candidate structures for the chemical formulas were extracted from the ChemSpider database [21] by using the 'advanced' and 'intrinsic properties' search options. 2D 13C-1H HSQC spectra for the candidate structures were predicted using MestReNova software (Santiago de Compostela, Spain). The predicted HSQC spectra of the candidate structures were compared with the experimental HSQC spectrum and ranked by using a custom-developed MATLAB algorithm. The ranking took into account the chemical shift differences between experimental and predicted chemical shifts, and the matching ratio. The chemical shift differences were calculated by using a distance matrix as previously described [19,20]. The weights assigned to 1H and 13C chemical shifts were 10 and 1, respectively. The matching ratio is defined as the ratio of the number of matched peaks to the total number of peaks. For example, if a certain metabolite has 6 cross-peaks based on HSQC prediction, and 5 of them have corresponding matched peaks in the experimental HSQC spectrum, the matching ratio for this metabolite is 0.83. Sorting the metabolite candidates by minimum chemical shift difference and maximum matching ratio provided the quantitative ranking. Running the algorithm takes less than one second on a desktop computer. Generation of the MS/MS predictions for the ChemSpider candidates and their comparison with the experimental MS/MS spectrum were performed on the MetFrag web server [15] using 30 ppm MS2 mass accuracy.
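The ranking described above can be approximated by the following Python sketch. It is not the authors' MATLAB code: the weighted city-block form of the distance, the matching cutoff, the tie-breaking order of the two sort keys, and all function names are assumptions made for illustration, and the sketch does not enforce one-to-one peak assignment. Only the 1H and 13C weights (10 and 1) and the definition of the matching ratio are taken from the text.

```python
def score_candidate(predicted, experimental, w_h=10.0, w_c=1.0, cutoff=3.0):
    """Return (matching_ratio, mean_distance) for one candidate.

    predicted, experimental: lists of (1H ppm, 13C ppm) cross-peaks.
    w_h, w_c: the 1H and 13C weights given in the text (10 and 1).
    cutoff: maximum weighted distance for a peak to count as matched;
            this value is a hypothetical placeholder.
    """
    distances = []
    for h_p, c_p in predicted:
        # Nearest experimental peak under a weighted city-block distance.
        d = min(w_h * abs(h_p - h_e) + w_c * abs(c_p - c_e)
                for h_e, c_e in experimental)
        if d <= cutoff:
            distances.append(d)
    ratio = len(distances) / len(predicted)
    mean_d = sum(distances) / len(distances) if distances else float("inf")
    return ratio, mean_d


def rank_candidates(candidates, experimental):
    """Sort candidate names by highest matching ratio, then lowest distance."""
    scored = {name: score_candidate(peaks, experimental)
              for name, peaks in candidates.items()}
    return sorted(scored, key=lambda n: (-scored[n][0], scored[n][1]))
```

A candidate with 6 predicted cross-peaks of which 5 fall within the cutoff of an experimental peak receives a matching ratio of 5/6, or 0.83, as in the example given in the text.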
Supplementary Materials: The following are available online at www.mdpi.com/2218-1989/8/1/8/s1, Figures S1 and S2: chemical structures of the molecules whose NMR predictions matched best to the experimental NMR spectra, Figure S3: total ion chromatogram of Arabidopsis metabolite extract, Figure S4: the MS/MS peak differentiating the MS/MS prediction of glucoraphanin from the other isomers, Figures S5 and S6: overlay of the experimental 2D 13C-1H HSQC spectra of glucoraphanin, unfractionated and fractionated Arabidopsis extracts, Table S1: the list of precursor ions that were selected for fragmentation.