Metabolomics as a Potential Chemotaxonomical Tool: Application on the Selected Euphorbia Species Growing Wild in Serbia

Chemotaxonomy presents various challenges that need to be overcome in order to obtain valid and reliable results. Individual genetic and environmental variations can give a false picture and lead to wrong conclusions. Applying a holistic approach, based on multivariate data analysis, these challenges can be overcome. Thus, a metabolomics approach has to be optimized depending on the subject of research. We used 1H NMR-based metabolomics as a potential chemotaxonomic tool on the selected Euphorbia species growing wild in Serbia. Principal components analysis (PCA), soft independent modeling by class analogy (SIMCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) were used to analyze obtained NMR data in order to reveal chemotaxonomic biomarkers. The standard protocol for plant metabolomics was optimized aiming to extract more specific metabolites, which are characteristic for the Euphorbia genus. The obtained models were validated, which revealed that variables unique for each species were associated with certain classes of molecules according to literature data. In E. salicifolia, acacetin-7-O-glycoside (not found before in the species) was detected, and the structure of the aglycone part was solved based on 2D NMR data. In the presented paper, we have shown that metabolomics can be successfully used in Euphorbia chemotaxonomy.


Introduction
Metabolomics analysis has been applied in numerous areas of research related to plant biology and chemistry. Its role could be defined as the extension of traditional phytochemical investigation following the identification of biomarkers that can be statistically correlated to bioactivity changes but also as an attempt to understand complex plant mechanisms as a part of systems biology [1]. In several studies, metabolomics has been helpful in the fingerprinting of species, genotypes or ecotypes for taxonomic or biochemical (gene discovery) purposes [2][3][4][5][6][7][8][9]. For example, the PCA analysis of 1 H NMR metabolite fingerprinting was used to discriminate five Verbascum species. Among these species, V. xanthophoeniceum and V. nigrum have higher amounts of the pharmaceutically important harpagoside and verbascoside, forsythoside B and leucosceptoside B [10]. Hierarchical analysis of NMR data was well correlated to the phylogenetic data [11], showing that metabolomics data can be used to make a link between chemotaxonomy and phylogeny. Great care must be taken in such analyses to avoid the influence of environmental and genetic variations of the investigated plants. An unbiased LC-MS-based metabolomics standardization of the extraction protocol is needed. In this paper, we propose an optimized extraction protocol for this purpose and report 1 H NMR based metabolomics studies of some Euphorbia species growing wild in Serbia as a potential chemotaxonomic tool.

Optimization of the Extraction for the NMR-Based Metabolomic Analysis of Euphorbia Species
In order to obtain as comprehensive a picture as possible of the metabolites present in Euphorbia plants, we tested several solvents for their extraction. For the optimization, we used a standard protocol described by H. K. Kim et al. [25], changing solvents and running 1 H NMR spectra with water signals suppression ( Figure 1). Solvents for the extraction were selected according to the known literature data. In order to extend the polarity range as much as possible, the following combinations of solvents were tested: 1:1 mixture of deuterated methanol and potassium phosphate buffer in deuterated water (MeOD:KP-D 2 O); deuterated methanol (MeOD); 1:1 mixture of deuterated methanol and deuterated chloroform (MeOD:CDCl 3 ); and deuterated chloroform (CDCl 3 ). As a model system for these studies, we randomly used finely ground freeze-dried aerial parts (containing stems, leaves and flowers) of E. salicifolia. The criteria for the best solvent for the extraction were the number of the signals in spectra, their intensity and resolution. Additionally, different regions of spectra were integrated, and areas were normalized using the same scale for calibration in comparison to proton signals at oxygenated carbons (from δ H 2.9 to 4.8) excluding solvents, in MeOD:KP-D 2 Figures S1-S4). In CDCl 3 extracts, dominant signals were those of waxes, fatty acids, di-and triterpenes in the region from δ H 0.2 to 1.8. More polar metabolites such as sugars and aminoamides were rather sparse, and thus, they were excluded from further investigation. The MeOD:KP-D 2 O extracts contained a large variety of metabolites, with sugar and amino acid signals dominating (in the region from δ H 0.8 to 4.7). They also contained a modest amount and intensity of proton signals from sp 2 hybridized carbons as well as protons from sp 3 hybridized carbons from di-and tri terpenes skeletons and other non-polar molecules. The "sugar" spectral area contained the most intense signals, whereas those in the aromatic region have a ca. five times smaller area. Comparing the spectra obtained after MeOD and MeOD:CDCl 3 extraction showed that more intense and clearly defined signals are observed below δ H 0.8 in the latter, and the ratio of areas between the sugar and aromatic areas became much more uniform. According to these data, we had chosen 1:1 mixture of MeOD:CDCl 3 as a solvent for this metabolomics study. overcome this problem, especially in cases where the sample contains compounds of a wide range of polarity as is the case with Euphorbia species. Consequently, it can be assumed that the application of multivariate analysis would be suitable for a chemotaxonomic study of Euphorbia species, but an optimization and standardization of the extraction protocol is needed. In this paper, we propose an optimized extraction protocol for this purpose and report 1 H NMR based metabolomics studies of some Euphorbia species growing wild in Serbia as a potential chemotaxonomic tool.

Optimization of the Extraction for the NMR-Based Metabolomic Analysis of Euphorbia Species
In order to obtain as comprehensive a picture as possible of the metabolites present in Euphorbia plants, we tested several solvents for their extraction. For the optimization, we used a standard protocol described by H. K. Kim et al. [25], changing solvents and running 1 H NMR spectra with water signals suppression ( Figure 1). Solvents for the extraction were selected according to the known literature data. In order to extend the polarity range as much as possible, the following combinations of solvents were tested: 1:1 mixture of deuterated methanol and potassium phosphate buffer in deuterated water (MeOD:KP-D2O); deuterated methanol (MeOD); 1:1 mixture of deuterated methanol and deuterated chloroform (MeOD:CDCl3); and deuterated chloroform (CDCl3). As a model system for these studies, we randomly used finely ground freeze-dried aerial parts (containing stems, leaves and flowers) of E. salicifolia. The criteria for the best solvent for the extraction were the number of the signals in spectra, their intensity and resolution. Additionally, different regions of spectra were integrated, and areas were normalized using the same scale for calibration in comparison to proton signals at oxygenated carbons (from δH 2.9 to 4.8) excluding solvents, in MeOD:KP-D2O extracts (Supplementary Figures S1-S4). In CDCl3 extracts, dominant signals were those of waxes, fatty acids, di-and triterpenes in the region from δH 0.2 to 1.8. More polar metabolites such as sugars and aminoamides were rather sparse, and thus, they were excluded from further investigation. The MeOD:KP-D2O extracts contained a large variety of metabolites, with sugar and amino acid signals dominating (in the region from δH 0.8 to 4.7). They also contained a modest amount and intensity of proton signals from sp 2 hybridized carbons as well as protons from sp 3 hybridized carbons from di-and tri terpenes skeletons and other non-polar molecules. The "sugar" spectral area contained the most intense signals, whereas those in the aromatic region have a ca. five times smaller area. Comparing the spectra obtained after MeOD and MeOD:CDCl3 extraction showed that more intense and clearly defined signals are observed below δH 0.8 in the latter, and the ratio of areas between the sugar and aromatic areas became much more uniform. According to these data, we had chosen 1:1 mixture of MeOD:CDCl3 as a solvent for this metabolomics study.

Multivariate Data Analysis
Principal components analysis (PCA) was performed on the NMR metabolomics fingerprints of 60 samples originating from six distinct Euphorbia species. Since PCA is a technique for pattern recognition and unsupervised variable reduction, a smaller number of new variables were formed containing the majority of the original variables' variation. This analysis yielded a model with eleven principle components that explains 96% of the total variance in the data. E. maculata and E. salicifolia were clearly distinguished from the remaining samples based on the PCA score plots of the first two principal components ( Figure 2a). Samples of E. panonica and E. cyparissias were separated along the third component (Figure 2b), whereas E. amygdaloides was separated along the fourth component ( Figure 2c). There were no outliers found in any of the score plots.

Multivariate Data Analysis
Principal components analysis (PCA) was performed on the NMR metabolomics fingerprints of 60 samples originating from six distinct Euphorbia species. Since PCA is a technique for pattern recognition and unsupervised variable reduction, a smaller number of new variables were formed containing the majority of the original variables' variation. This analysis yielded a model with eleven principle components that explains 96% of the total variance in the data. E. maculata and E. salicifolia were clearly distinguished from the remaining samples based on the PCA score plots of the first two principal components ( Figure 2a). Samples of E. panonica and E. cyparissias were separated along the third component (Figure 2b), whereas E. amygdaloides was separated along the fourth component ( Figure 2c). There were no outliers found in any of the score plots. The soft independent modeling by class analogy (SIMCA) model was used to confirm the difference between the analyzed Euphorbia species. This is a supervised pattern recognition algorithm based on the PCA of each class individually. The data were separated  The soft independent modeling by class analogy (SIMCA) model was used to confirm the difference between the analyzed Euphorbia species. This is a supervised pattern recognition algorithm based on the PCA of each class individually. The data were separated into training and prediction sets. The logarithmic averaged distances between each class model (DModX) were measured, and class membership was determined by comparing DModX to the critical distance (DCrit). As depicted in Figure 3a-f, all of the samples in the prediction dataset were properly identified, indicating that the model achieved 100% sensitivity and specificity.
Plants 2023, 12, x FOR PEER REVIEW 6 of 14 into training and prediction sets. The logarithmic averaged distances between each class model (DModX) were measured, and class membership was determined by comparing DModX to the critical distance (DCrit). As depicted in Figure 3a-f, all of the samples in the prediction dataset were properly identified, indicating that the model achieved 100% sensitivity and specificity. Six OPLS-DA models were used in order to identify metabolites unique to each Euphorbia species investigated. Samples belonging to the species for which distinctive metabolites are to be identified are defined as belonging to one class, while the remaining samples from all other species are defined as belonging to another class. Using this method, novel variables will account for the greatest possible separation between two previously defined classes. Since the systematic variation of variables in the orthogonal model is divided into two components, one of which is linearly related to the class information and the other is orthogonal to it, model interpretation is facilitated [26]. Therefore, OPLS-DA is appropriate for identifying variables with the highest discriminatory power between two preset groups. Cross-validation, permutation testing, and CV-ANOVA were Six OPLS-DA models were used in order to identify metabolites unique to each Euphorbia species investigated. Samples belonging to the species for which distinctive metabolites are to be identified are defined as belonging to one class, while the remaining samples from all other species are defined as belonging to another class. Using this method, novel variables will account for the greatest possible separation between two previously defined classes. Since the systematic variation of variables in the orthogonal model is divided into two components, one of which is linearly related to the class information and the other is orthogonal to it, model interpretation is facilitated [26]. Therefore, OPLS-DA is appropriate for identifying variables with the highest discriminatory power between two preset groups. Cross-validation, permutation testing, and CV-ANOVA were utilized to evaluate the model's quality (see Supplementary Tables S1-S7 and Figures S5-S10). The most influential variables were chosen based on their impact on the projection scores of the predictive components (VIPpred). Variables having a VIPpred score greater than 1.4 deemed crucial for the separation are shown in Figures 4-9.               The variables that were characteristic and unique for E. seguieriana (Figure 4.) in the region from δH 7.3 to 8.8 originate from benzoate substituents of the myrsinol diterpenes type skeleton previously reported by F. Jeske et al. [27]. This assumption is further substantiated with a variable at δH 4.05 from the tetrahydrofuran ring and those in the regions ΔδH = 4.8-4.95 (exomethylene double bonds), ΔδH = 5.85-6.3 (other olefinic protons), acetate esters at δH ca. 2.0 and methylenes at δH 1.8 characteristic for myrsinol diretpenes isolated from E. seguieriana (see Supplementary Figures S11-S14). The same biomarkers were   The variables that were characteristic and unique for E. seguieriana (Figure 4.) in the region from δH 7.3 to 8.8 originate from benzoate substituents of the myrsinol diterpenes type skeleton previously reported by F. Jeske et al. [27]. This assumption is further substantiated with a variable at δH 4.05 from the tetrahydrofuran ring and those in the regions ΔδH = 4.8-4.95 (exomethylene double bonds), ΔδH = 5.85-6.3 (other olefinic protons), acetate esters at δH ca. 2.0 and methylenes at δH 1.8 characteristic for myrsinol diretpenes isolated from E. seguieriana (see Supplementary Figures S11-S14). The same biomarkers were   The variables that were characteristic and unique for E. seguieriana (Figure 4.) in the region from δH 7.3 to 8.8 originate from benzoate substituents of the myrsinol diterpenes type skeleton previously reported by F. Jeske et al. [27]. This assumption is further substantiated with a variable at δH 4.05 from the tetrahydrofuran ring and those in the regions ΔδH = 4.8-4.95 (exomethylene double bonds), ΔδH = 5.85-6.3 (other olefinic protons), acetate esters at δH ca. 2.0 and methylenes at δH 1.8 characteristic for myrsinol diretpenes isolated from E. seguieriana (see Supplementary Figures S11-S14). The same biomarkers were were recognized by A. I. Elshamy et al., using agglomerative hierarchical clustering of 32 Euphorbia species based on literature data [28].
According to the literature data, diterpene polyesters and bishomoditerpene lactones are present in E. salicifolia. The three euphosalicins as well as a salicinolide were identified in E. salicifolia by Hohmann et al. [29]. Nevertheless, the highest VIP predictive values exhibited variables at δ H 7.94, 7.92, 7.10, 7.08, 6.74, 6.72, 6.65, 6.54, 5.00 and 3. 91 ( Figure 5). After further investigation of 2D NMR spectra (see Supplementary Figures S15-S20), it was confirmed that these signals belong to acacetin-7-O-glycoside (Table 1), which can be considered a chemotaxonomic marker for E. salicifolia. Apigenin was previously reported in the roots of E. salicifolia by College et al. [30], but for the first time, apigenin derivative was detected in this species. Other variables from the loading plots were correlated to signals originating from the diterpene skeleton protons, as detected in this species [27].  [28]. According to the literature data, diterpene polyesters and bishomoditerpene lactones are present in E. salicifolia. The three euphosalicins as well as a salicinolide were identified in E. salicifolia by Hohmann et al. [29]. Nevertheless, the highest VIP predictive values exhibited variables at δH 7.94, 7.92, 7.10, 7.08, 6.74, 6.72, 6.65, 6.54, 5.00 and 3. 91 ( Figure 5). After further investigation of 2D NMR spectra (see Supplementary Figures S15-S20), it was confirmed that these signals belong to acacetin-7-O-glycoside (Table 1), which can be considered a chemotaxonomic marker for E. salicifolia. Apigenin was previously reported in the roots of E. salicifolia by College et al. [30], but for the first time, apigenin derivative was detected in this species. Other variables from the loading plots were correlated to signals originating from the diterpene skeleton protons, as detected in this species [27]. The loading plot of E. amigdaloides contains variables spread across the spectrum (Figure 6) originating from the specific jatrophane diterpenes named amygdaloidins. They contained nicotinate and angeloyl moieties as well as olefinic protons resonating in the region from δH 6.2 to 9.6 (see Supplementary Figures S21-S24). In addition, variables in the area from δH 3.0 to 4.0 are recognized as biomarkers and originate from protons at the oxygenated carbons from the jatrophane skeletons [31,32].
The E. panonnica also known as E. glareosa and E. nicaeensis is rich in diterpenes of the jatrophane and tigliane type [23]. These skeletal types were characterized by a five-membered ring condensed with a twelve-membered ring or seven-membered and six-membered fused rings containing different substituents belonging to jatrophane and tigliane series, respectively. A variable, which corresponds to the chemical shifts the nodal protons of the five-membered ring of tigliane type skeletons (Figure 7), was recognized as a potential biomarker for E. panonnica (see Supplementary Figures S25-S28). These derivatives mostly contain benzoates as aromatic substituents in comparison to jatrophans containing nicotinates. Variables that are not recognized as significant for biomarkers, but are certainly present in E. panonica, are proton signals from the cyclopropane ring which further supports the assumption that benzoate-substituted tiglianes could be used as chemotaxonomic markers for E. panonica.
In contrast to the previously mentioned Euphorbia species, where the highest values of VIP-predictive scores were attributed to signals exhibiting chemical shifts of protons from different diterpene skeletons, in the loading plot of E. cyparissias, the most significant The loading plot of E. amigdaloides contains variables spread across the spectrum ( Figure 6) originating from the specific jatrophane diterpenes named amygdaloidins. They contained nicotinate and angeloyl moieties as well as olefinic protons resonating in the region from δ H 6.2 to 9.6 (see Supplementary Figures S21-S24). In addition, variables in the area from δ H 3.0 to 4.0 are recognized as biomarkers and originate from protons at the oxygenated carbons from the jatrophane skeletons [31,32].
The E. panonnica also known as E. glareosa and E. nicaeensis is rich in diterpenes of the jatrophane and tigliane type [23]. These skeletal types were characterized by a five-membered ring condensed with a twelve-membered ring or seven-membered and six-membered fused rings containing different substituents belonging to jatrophane and tigliane series, respectively. A variable, which corresponds to the chemical shifts the nodal protons of the five-membered ring of tigliane type skeletons (Figure 7), was recognized as a potential biomarker for E. panonnica (see Supplementary Figures S25-S28). These derivatives mostly contain benzoates as aromatic substituents in comparison to jatrophans containing nicotinates. Variables that are not recognized as significant for biomarkers, but are certainly present in E. panonica, are proton signals from the cyclopropane ring which further supports the assumption that benzoate-substituted tiglianes could be used as chemotaxonomic markers for E. panonica.
In contrast to the previously mentioned Euphorbia species, where the highest values of VIP-predictive scores were attributed to signals exhibiting chemical shifts of protons from different diterpene skeletons, in the loading plot of E. cyparissias, the most significant are the variables belonging to the triterpene signals ( Figure 8). Previously mentioned protons from the cyclopropane ring (δ H 0.41) together with methyl groups of triterpenes (from δ H 0.93 to 1.1), methylene skeleton signals at δ H 1.72 and 1.92-1.95 as well as a broad singlet signal at δ H 4.57 were recognizable in the spectra of acetylated cycloartane [18]. Additionally, two signals in the aromatic region at δ H 8.11 and 8.06 are characteristic for E. cyparissias and responsible for the separation (see Supplementary Figures S29-S32).
Accordant to the literature data, E. maculata is rich in the triterpene type of special metabolites [33] and polyphenols [34]. The variables from δ H 0.75 to 1.07 and 1.66 characteristic for the E. maculata (Figure 9) match to the methyl groups signals from triterpene skeletons. Variables from δ H 3.0 to 4.5 were from proton bonded to oxygenated carbons, and those from δ H 5.1 to 7.05 belong to protons on sp 2 -hybridized carbons with chemical shifts corresponding tannin-type polyphenols (see Supplementary Figures S33-S36) described by I. Agata et al. [34].

Chemicals, Samples and Extraction Protocol
The deuterated methanol, deuterated water, KH 2 PO 4 and deuterated chloroform were purchased from Sigma-Aldrich (Saint Louis, MO, USA  (BEOU17882)). From each species, ten samples from different locations containing aerial parts of plants were dried and stored on silica gel separately after collection. The samples were ground using a laboratory mill (IKA ® A11, IKA ® -Werke GmbH & Co. KG, Staufen, Germany) frozen by the addition of liquid nitrogen and stored at −20 • C until analysis. Each sample (50 mg) was measured in 2 mL microtubes and extracted with 0.7 mL of solvent or solvents mixtures on an ultrasonic bath (BANDELIN SONOREX); then, they were centrifuged on 13,400 rpm (MiniSpin Eppendorf) and 500 µL was transferred into 5 mm NMR tubes.

NMR Measurements and Multivariate Data Analysis
NMR spectra were measured on a Bruker 500 AVANCE III NMR, Fällanden, Switzerland, system equipped with a 5 mm BBI probe head and BVT unit at 298K. All spectra were measured using noesypr1d pulse sequence with 16 scans for optimization and 64 scans for metabolomic studies. For each sample, the transmitter frequency was optimized, and for recorded spectra, the phase and baseline were corrected manually using TopSpin 3.6.pl7 Bruker Biospin, Rheinstetten, Germany software.
The phased 1 H-NMR spectra were further processed using the online tool NMR-ProcFlow v1.4.16, INRA UMR 1332 BFP, Bordeaux Metabolomics Facility, France (https: //nmrprocflow.org, accessed on 20 October 2020) for ppm calibration, global baseline correction, and local alignment. To avoid signals of the residual water, MeOH-d 4 , and CDCl 3 , respectively, the regions of δ H 1.24-1.37, 3.33-3.36, and 7.24-7.26 were removed from the study. The Intelligent Binning approach (resolution factor 0.6, SNR > 10) was used to divide each spectrum into variable size buckets. To build the dataset matrix, the data were normalized to overall spectrum intensity using NMRProcFlow. SIMCA software (version 17, Sartorius Stedim Biotech, Goettingen, Germany) was then utilized for the multivariate data analysis.

Conclusions
Presented data confirmed that NMR-based metabolomics involving molecules with a wide range of polarities can be used for the chemotaxonomy of genus Euphorbia. The differences between the analyzed Euphorbia species were confirmed using SIMCA models and distance measurements between each class model. All of the species-specific samples in the prediction dataset were properly identified, and we confirmed model sensitivity and specificity by 100%.
The most important step was the optimization of extraction protocol for plants metabolomics on Euphorbia species. Furthermore, it was demonstrated that the presented optimization has great potential in the more efficient extraction of special metabolites characteristic for this genus and biomarkers for chemotaxonomic classification [29].
As a result of this study, a potentially new metabolite acacetin-7-O-glycoside was detected in E salicifolia, and its presence was confirmed with 2D NMR data. The variables characteristic for myrsinol diterpenes were recognized as biomarkers for E. seguieriana, those characteristic for jatrophanes were recognized as biomarkers for E. amigdaloides, and tiglianes bearing benzoate esters as biomarkers for E. panonnica. The triterpenes with a cyclopropane ring were identified as biomarkers for E. cyparissias, and in E. maculata, these were triterpenes and tannin-type polyphenols.
In conclusion, our experimental data conform to a previously published theoretical study [25] and confirm that our proposed protocol for NMR-based metabolomics could be used in Euphorbia chemotaxonomy.