Metabolomics for Biomarker Discovery: Key Signatory Metabolic Profiles for the Identification and Discrimination of Oat Cultivars

The first step in crop introduction—or breeding programmes—requires cultivar identification and characterisation. Rapid identification methods would therefore greatly improve registration, breeding, seed, trade and inspection processes. Metabolomics has proven to be indispensable in interrogating cellular biochemistry and phenotyping. Furthermore, metabolic fingerprints are chemical maps that can provide detailed insights into the molecular composition of a biological system under consideration. Here, metabolomics was applied to unravel differential metabolic profiles of various oat (Avena sativa) cultivars (Magnifico, Dunnart, Pallinup, Overberg and SWK001) and to identify signatory biomarkers for cultivar identification. The respective cultivars were grown under controlled conditions up to the 3-week maturity stage, and leaves and roots were harvested for each cultivar. Metabolites were extracted using 80% methanol, and extracts were analysed on an ultra-high performance liquid chromatography (UHPLC) system coupled to a quadrupole time-of-flight (qTOF) high-definition mass spectrometer analytical platform. The generated data were processed and analysed using multivariate statistical methods. Principal component analysis (PCA) models were computed for both leaf and root data, with PCA score plots indicating cultivar-related clustering of the samples and pointing to underlying differential metabolic profiles of these cultivars. Further multivariate analyses were performed to profile differential signatory markers, which included carboxylic acids, amino acids, fatty acids, phenolic compounds (hydroxycinnamic and hydroxybenzoic acids, and associated derivatives) and flavonoids, among the respective cultivars. Based on the key signatory metabolic markers, the cultivars were successfully distinguished from one another in profiles derived from both leaves and roots. 
The study demonstrates that metabolomics can be used as a rapid phenotyping tool for cultivar differentiation.


Introduction
Food demand has been rapidly increasing with the overall growth in the world population, which is expected to reach around 9.7 billion by the year 2050 [1]. Now more than ever, crop improvement and plant breeding studies have become imperative in ensuring food security and sustainability [2]. The primary step involved in plant breeding, inspection, registration, trade and seed production requires the identification of cultivars and varieties, and therefore a rapid and effective method for cultivar fingerprinting is required [3]. Over the years, plant breeding has been greatly improved by genomic analyses and next-generation sequencing methods that unravel the molecular basis of complex traits [4]. Currently, plant breeding methods have integrated phenotypic traits with a range of marker-assisted selection techniques to more efficiently determine trait outcomes [5]. Although genetic markers have been at the forefront of plant breeding efforts, …
Figure 1. Triangular arrangement illustrating the genotype × environment × phenotype interactions, with the metabolome at the core, bridging the gap between the genotype and phenotype. The metabolome is the final recipient of biological information flow and carries imprints of genetic and environmental factors. It is more sensitive to perturbations in both metabolic fluxes and enzyme activity than either the transcriptome or proteome and is thus a reflection of the phenotype. Quantitative, global measurements of the metabolome therefore provide an exploration of cellular metabolism, revealing patterns and functional signatures of the biochemical landscape and cellular physiology of the system under consideration [8][9][10].

Metabolomics, defined as the comprehensive qualitative and quantitative analysis of all metabolites in a biological system, is an established omics technology that holds promise in agricultural research; therefore, metabolomics has become an indispensable tool in various plant sciences studies [11,12]. Due to the diverse and large variety of metabolites found in plants, an extensive array of analytical techniques has been developed to obtain sufficient coverage for plant metabolomics. Liquid chromatography-mass spectrometry (LC-MS)-based plant metabolomics, compared to that of gas chromatography-mass spectrometry (GC-MS) and nuclear magnetic resonance (NMR), has been advantageous in detecting a wide range of secondary metabolites with higher sensitivity and selectivity (compared to NMR) and has the ability to detect and identify a broader range of compounds whilst being less time-consuming in the preparation of samples (compared to GC-MS which requires derivatisation) [13][14][15].
In the past, metabolomics has proven crucial for studying plant × environment interactions (e.g., adaptive responses towards biotic and abiotic stresses) and has been applied in metabolomics-assisted breeding of crops. So far, great progress has been made in the development of metabolomics tools for crop improvement. However, there are still bottlenecks with regard to analytical technologies and tools used for data mining and interpretation. Some of these limit metabolome coverage, the maximisation of metabolomics data and the annotation of extracted metabolites [16][17][18]. In plants, metabolites are known to play important roles in crop yield, nutritional quality, growth and development, as well as in plant defence against environmental stresses [19,20]. Different metabolomic applications have therefore been developed to elucidate plant responses and mechanisms under different conditions to determine metabolic profiles for use in crop improvement [16]. As such, metabolomics allows the predictive discovery of biomarkers, independent of genetic and environmental variation. These metabolite biomarkers provide invaluable information on biochemical mechanisms that underlie phenotypic traits and can be used in the development of targeted methods for breeding programmes [17,21,22]. Plant metabolites are increasingly incorporated into breeding programmes for the prediction of phenotypic traits and thus provide an early detection tool for identifying favourable traits.
In this study, metabolomics tools and approaches were applied in order to develop a profiling methodology able to discriminate between various oat (Avena sativa L.) cultivars. Oat belongs to the monocotyledonous Poaceae family along with other cereals such as wheat, rice, barley, rye, maize, sorghum and millet [23]. Of these, oat has recently attracted renewed interest due to numerous health and nutritional benefits involved in both human and livestock consumption [24,25]. Oat is also considered a superior cereal crop due to its hardiness and ability to thrive and withstand environmentally poor conditions where other cereals seem to be lacking [26].

Differential Chromatographic-Mass Spectrometric Analyses of Respective Oat Cultivars
Methanolic extracts of leaf and root tissues of the respective cultivars were separated on an ultra-high performance liquid chromatography system coupled to a quadrupole time-of-flight high-definition mass spectrometer (UHPLC-qTOF-MS) and detected in both positive and negative electrospray ionisation (ESI) modes. Initial optimisation studies indicated that the majority of extractable metabolites ionised more effectively in the ESI (−) mode; accordingly, only these datasets are further presented and illustrated. The chromatographically distinct base peak intensity (BPI) chromatograms of leaf and root extracts (Figure 2A,B) provide a visual presentation/description of the similarities and differences between the respective cultivars and reflect the complexity of their metabolic profiles. Although chromatography is extremely useful in separating the components based on their polarity, and high-definition mass spectrometry enables accurate mass determination in order to generate empirical formulae to aid in compound annotation, further chemometric analyses were performed to obtain biologically useful information.

Chemometric Analyses for Profiling the Oat Cultivar Metabolomes
Metabolites 2021, 11, 165
Due to the complexity and multi-dimensionality of metabolomic data, appropriate statistical and chemometric tools are required to obtain chemical information and convert it into biological knowledge [27]. Chemometrics is the science of extracting useful information from complex datasets through pattern recognition and machine learning algorithms [28,29]. Principal component analysis (PCA) is a multivariate technique that increases the interpretability and minimises the loss of biological information by reducing the dimensionality of complex datasets [30]. The underlying structures and characteristics of the data are thus revealed by this unsupervised, explorative method. The illustrated PCA models (Figure 3) show distinct clustering of the five respective cultivars (Magnifico, Dunnart, Pallinup, Overberg and SWK001), which points to underlying differential metabolic profiles from the leaf (Figure 3A) and root (Figure 3B) tissues. The model illustrates both similarities and differences within (PC2/3) and between (PC1) the cultivar groupings. This differential clustering revealed by PCA relates to the differences previously visualised by the chromatographic separation (Figure 2).
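To make the dimensionality reduction concrete, the following is a minimal sketch of how a first principal component can be obtained from a small peak-area matrix. It is not the procedure used in the study: the function name, the toy values and the use of power iteration on the covariance matrix (rather than the software's own algorithm and scaling) are assumptions made purely for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores, explained-variance ratio) for PC1.

    Illustrative only: mean-centres the data, builds the sample covariance
    matrix and finds its leading eigenvector by power iteration.
    """
    n, p = len(rows), len(rows[0])
    # Mean-centre each variable (column).
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: multiply by the covariance matrix and renormalise.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v, relative to the total variance.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    # Scores: projection of each centred sample onto the loading vector.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores, eigenvalue / total_variance

# Hypothetical peak areas: 4 samples x 3 features (two samples per group).
peak_areas = [[10.0, 1.0, 5.0], [11.0, 1.2, 5.1],
              [20.0, 3.0, 5.0], [21.0, 3.1, 4.9]]
loadings, scores, ratio = first_principal_component(peak_areas)
```

In this toy matrix the two groups fall on opposite sides of zero along PC1, which is the kind of between-group separation a score plot visualises.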

In addition to PCA modelling, another unsupervised technique, namely, hierarchical cluster analysis (HiCA), was used to cluster high-dimensional data into a dendrogram based on the dissimilarity and similarity of the samples [31].
In a bottom-up representation (Figure 3C,D), the algorithm starts with each observation as its own cluster and proceeds by joining the most similar clusters at each step in an iterative manner. The resulting dendrogram illustrates that the metabolic profiles of the leaf tissues (Figure 3C) of 'Magnifico' and 'Dunnart' appear to be closely related; the same holds for 'Pallinup' and 'Overberg', with 'SWK001' appearing to be most metabolically different and clustering at the far left. The cultivars also cluster separately based on their profiles extracted from root tissues (Figure 3D); in this case, however, 'Dunnart' and 'Pallinup' seem to be most similar metabolically and, in turn, are grouped with 'Overberg'. 'Magnifico' and 'SWK001', in this case, are the most metabolically different from the other cultivars and similar to each other. It is of interest that the unsupervised, explorative method not only underscored differences between cultivars but also highlighted differences between the extracts from roots and leaves of these cultivars. The source-to-sink model describes differences between various plant tissues based on their environment as well as the synthesis and transport of various nutrients required for growth and development [32]. Source tissues are often described as net exporters of resources required for plant growth, such as carbon or nitrogen, while sink tissues are net importers responsible for resource absorption. Mature leaves are net sources of carbon but sinks for nitrogen, while root tissues are net sources of nitrogen but sinks for carbon [33]. Another example contributing to the differences among the respective tissues is the presence of secondary metabolites. These may be uniquely synthesised by either the leaves or roots, as is the case for avenacins, which are triterpenoid saponins found in roots, whilst the leaves contain steroidal saponins known as avenacosides. Both compound classes serve a similar purpose but are confined to their respective plant tissues [34]. Once differences are apparent, more in-depth information can be obtained through supervised methods such as orthogonal projection to latent structures discriminant analysis (OPLS-DA).
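The bottom-up clustering described for the dendrograms can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: Euclidean distance and average linkage are chosen here for simplicity (the text does not specify either), and the sample labels and values are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two metabolic profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(profiles):
    """Average-linkage agglomerative clustering.

    Starts with every sample as its own cluster and repeatedly merges the
    two closest clusters; returns the merges in the order they occur.
    """
    clusters = {name: [name] for name in profiles}  # label -> member samples
    merges = []

    def linkage(pair):
        a, b = pair
        dists = [euclidean(profiles[i], profiles[j])
                 for i in clusters[a] for j in clusters[b]]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        # Replace the two merged clusters with their union.
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical two-feature profiles for four samples.
profiles = {"S1": [1.0, 1.0], "S2": [1.2, 0.9],
            "S3": [5.0, 5.0], "S4": [5.1, 5.3]}
merges = agglomerate(profiles)
for left, right, distance in merges:
    print(f"{left} + {right} at distance {distance:.3f}")
```

The merge heights recorded in `merges` are what a dendrogram draws on its vertical axis: samples joined at a small distance are the most similar metabolically.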
OPLS-DA modelling showed sample classification in the score space between different experimental groups, as depicted in Figure 4A. With the 'SWK001' and 'Dunnart' cultivars, clear clustering and group separation are shown. As a supervised method, OPLS-DA is often considered prone to overfitting; therefore, rigorous model validation methods are used to ensure the validity and reliability of the computed model [35]. The reliability of the models was tested using cross-validation analysis of variance (CV-ANOVA), where significant models had p-values of <0.05. Furthermore, the performance of the OPLS-DA models was evaluated using receiver operating characteristic (ROC) curves, where the ROC curve passed through the top left corner, indicating perfect sensitivity and specificity (Supplementary Figure S1). Finally, permutation tests were performed, in which the OPLS-DA models were statistically shown to be better than the generated permutation models, with R² and Q² being higher for the OPLS-DA model (Figure 4C). The loadings S-plot (Figure 4B) was used to target and select statistically significant discriminatory ions among the different cultivars. Furthermore, each selected variable from the S-plot was evaluated using a dot plot (Figure 4D) that computes each observation as a unit and subsequently sorts each component into "bins" that represent sub-ranges. Strong discriminating variables show no overlap between the groups, as can be seen in Figure 4D. OPLS-DA models and their corresponding loadings S-plots were similarly constructed for all cultivars for both leaf and root tissue extracts (20 in total for each tissue type; model infographics are available on request).
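The logic of a permutation test can be illustrated with a small sketch. Because OPLS-DA is not available in the Python standard library, a nearest-centroid classifier with leave-one-out accuracy stands in for the OPLS-DA model and its Q² statistic; this substitution, the function names and the toy data are all assumptions and do not reproduce the validation performed in the study.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in model)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            members = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if members:
                centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        predicted = min(centroids, key=lambda c: sq_dist(X[i], centroids[c]))
        correct += predicted == y[i]
    return correct / len(X)

def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Compare the observed statistic with those from label-shuffled models."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loo_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Hypothetical two-feature data for two well-separated groups of six samples.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.3], [1.1, 2.0], [1.3, 2.2], [0.8, 1.8],
     [4.0, 5.1], [4.2, 4.8], [3.9, 5.3], [4.1, 5.0], [4.3, 4.9], [3.8, 5.2]]
y = [0] * 6 + [1] * 6
observed, p_value = permutation_p_value(X, y)
```

If the real class labels carry genuine structure, models built on shuffled labels should rarely perform as well, so the resulting p-value is small; a model that merely overfits noise would perform similarly on permuted labels.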

Differential Metabolic Profiles Based on Discriminatory Ions
Following the selection of discriminant ions from the respective loadings S-plots, a list of putatively identified metabolites was compiled and is presented in Table 1. The statistically significant variables were annotated as described in experimental Section 4.6. The possible chemical structures were then explored by further inspection of the generated fragmentation patterns under various collision energies (MSᴱ) (Figure 5). The annotated metabolites thus represent the discriminatory compounds that allowed for differentiation among the different cultivars. Datasets from all five cultivars were compared to one another. In Table 1, the asterisks within the coloured cells indicate metabolites that were discriminatory for the respective cultivars compared to the other four cultivars and are indicated when detected against one or more of the other cultivars. These compounds were placed in the following metabolite classes: carboxylic acids, amino acids, fatty acids, phenolics and flavonoids. In addition, a steroidal saponin (avenacoside A) was annotated in extracts from leaves and a triterpenoid saponin (avenacin A-1) in extracts of roots.
Data visualisation tools were used to illustrate the magnitude and presence of the respective metabolites in various cultivars with heatmap analysis (Figures 6 and 7). Here, the average integrated peak areas of the respective metabolites were used to construct heatmaps using statistical analysis software available on MetaboAnalyst https://www.metaboanalyst.ca/ (accessed on 12 March 2021) [36]. Five well-defined clusters are illustrated that relate to the five different experimental groups. These infographics show clear differences among the cultivars with respect to their various metabolic profiles. These profiles could prove useful in not only discriminating among the various cultivars but also providing useful information on possible links to stress resistance or susceptibility capabilities between them. Among the identified metabolites (Table 1), the differential metabolic profiles based on discriminatory ions present in the hydromethanolic extracts of the various cultivars were as follows: 'Magnifico' contained 9 flavonoids, 5 phenols and avenacoside A in the leaves, and 3 amino acids, 1 carboxylic acid, 3 fatty acids, 2 flavonoids and 1 phenol in the roots. 'Dunnart', on the other hand, had 7 flavonoids, 4 phenols and avenacoside A in the leaves, and 1 amino acid derivative, 1 carboxylic acid, 2 fatty acids, 2 flavonoids, 1 phenol and avenacin A-1 in the roots. 'Pallinup' showed a metabolic profile containing 3 fatty acids, 9 flavonoids and 6 phenols in the leaves, and 1 amino acid, 1 fatty acid, 1 flavonoid, 1 phenol and avenacin A-1 in the roots. 'Overberg' had 1 amino acid, 10 flavonoids, 4 phenols and avenacoside A in the leaves, and 3 amino acids, 1 carboxylic acid, 3 fatty acids, 2 flavonoids and 5 phenols in the roots. Lastly, 'SWK001' showed a metabolic profile containing 1 amino acid, 2 fatty acids, 5 flavonoids, 5 phenols and avenacoside A in the leaves, and 3 amino acids, 1 carboxylic acid, 1 fatty acid, 1 flavonoid, 4 phenols and avenacin A-1 in the roots. Based on these differential metabolic profiles, clear overlap and differences can be seen among the cultivars in the form of a Venn diagram (Figure 8).
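The unique and shared regions of a Venn diagram correspond to simple set operations, as the sketch below illustrates. The marker sets are deliberately partial: they contain only leaf markers named as examples in the text of this article and do not reproduce Table 1 or Figure 8; the function names are hypothetical.

```python
from itertools import combinations

# Partial, illustrative leaf marker sets (examples named in the text only;
# NOT the full content of Table 1).
leaf_markers = {
    "Magnifico": {"avenacoside A", "caffeoylshikimic acid"},
    "Dunnart": {"avenacoside A", "feruloylquinic acid"},
    "Pallinup": {"hydroxyoctadecatrienoic acid", "sinapic acid glucoside",
                 "isoquercetin"},
    "Overberg": {"avenacoside A", "xeractinol", "neocarlinoside"},
    "SWK001": {"avenacoside A", "caffeoylshikimic acid"},
}

def unique_markers(sets):
    """Markers that appear in exactly one cultivar's set."""
    result = {}
    for name, markers in sets.items():
        others = set().union(*(m for n, m in sets.items() if n != name))
        result[name] = markers - others
    return result

def shared_markers(sets):
    """Markers shared by each pair of cultivars (non-empty overlaps only)."""
    return {(a, b): sets[a] & sets[b]
            for a, b in combinations(sets, 2) if sets[a] & sets[b]}

unique = unique_markers(leaf_markers)
shared = shared_markers(leaf_markers)
```

Within this partial example, `unique["Dunnart"]` holds feruloylquinic acid, while the 'Magnifico' and 'SWK001' sets overlap completely; a full analysis would use the complete marker lists.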

Figure 8. … Table 1) that are unique to the respective cultivars and, conversely, also shared between the cultivars.
Metabolic pathway analyses were performed using the chemometrically extracted metabolites and revealed significant and impactful metabolic pathways. The relative intensities of the different metabolites are illustrated via pie charts among the different pathways. Additionally, colour-coded PCA score plots were used to visually display the presence and abundance of selected discriminant metabolites among the different cultivars (expressed as integrated peak areas in the X data matrix) using vector continuous properties available in SIMCA software (Supplementary Figure S2). The most significant pathways included: aromatic amino acid (Phe, Tyr and Trp) biosynthesis, the phenylpropanoid and flavonoid pathways and the stilbenoid biosynthesis pathway. The linoleic acid pathway was illustrated to be most impactful, followed by phenylalanine (Figure 9). Phenylpropanoid metabolic pathways involve some of the most widely occurring plant secondary metabolites which exhibit a range of biological functions involved in development, defence against biotic and abiotic stresses and modulation of biochemical processes. Additionally, phenylpropanoids are also important for the biosynthesis of key compounds such as flavonoids, coumarins and lignans [37]. Linoleic acids (C18:2) are unsaturated fatty acids that are abundant in plant membranes, important for plant cell structure and maintaining water permeability. Additional desaturation leads to linolenic acid, a precursor molecule in the synthesis of jasmonates, which act as signalling molecules in response to tissue damage caused by pathogens, insects, herbivores or mechanical stress [38].
Figure 9. Summarised pathway analyses of all MetaboAnalyst-computed metabolic pathways displayed according to their significance or pathway impact. The figure illustrates all the matched pathways arranged by p-values (y-axis; pathway enrichment analysis) and the pathway impact values (x-axis; pathway topology analysis). Each node is coloured according to its corresponding p-values, with the node sizes determined according to their impact values. The graph thus illustrates the pathways with high impact: linoleic acid (C18:2, n-6) pathway, phenylalanine and stilbenoid biosynthesis, and the pathways with high statistical significance: phenylpropanoid, phenylalanine, tyrosine and tryptophan biosynthesis.
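The enrichment p-values on the y-axis of a pathway plot are commonly obtained from an over-representation test. The sketch below uses a hypergeometric test as one such choice; whether this exact test and these settings were used in the study is not stated in the text, so the function name and all numbers are illustrative assumptions.

```python
from math import comb

def hypergeom_enrichment_p(total, in_pathway, selected, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    total      -- compounds in the reference set
    in_pathway -- reference compounds belonging to the pathway
    selected   -- discriminatory compounds submitted for analysis
    hits       -- submitted compounds that fall in the pathway
    """
    upper = min(in_pathway, selected)
    favourable = sum(comb(in_pathway, k) * comb(total - in_pathway, selected - k)
                     for k in range(hits, upper + 1))
    return favourable / comb(total, selected)

# Hypothetical example: 10 of 100 reference compounds lie in a pathway and
# 5 of 10 submitted compounds hit it.
p_enriched = hypergeom_enrichment_p(total=100, in_pathway=10, selected=10, hits=5)
print(f"enrichment p-value: {p_enriched:.2e}")
```

A small p-value indicates that more of the submitted metabolites map to the pathway than would be expected by chance. Pathway impact, the x-axis of such a plot, is a separate topology-based measure and is not computed here.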
Differences with regard to the relative intensities of putative signatory biomarkers among the cultivars were thus explored using heatmaps (Figures 6 and 7), colour-coded PCA score plots (Supplementary Figure S2), pie charts (Figure 10) and radar charts (Figure 11). Among the cultivars, 'Pallinup' extracts contained a number of metabolites that uniquely presented as discriminatory for this cultivar (hydroxyoctadecatrienoic acid, sinapic acid glucoside, oxalate derivative and isoquercetin in leaves, and trihydroxyoctadecadienoic acid in roots). However, this does not necessarily always indicate absence in the other cultivars but only a greater relative intensity, as can be seen in the generated heatmap (Figure 6). 'Overberg' had two uncommon flavonoids that presented as discriminatory for this cultivar: xeractinol (a flavanol C-glucoside) and neocarlinoside (a tetrahydroxyflavone C-glycoside), as can be seen in the illustrated heatmap (Figure 6). The radar chart (Figure 11A) reiterates neocarlinoside as discriminatory in 'Overberg' based on its intensity.
Caffeoylshikimic acid is another metabolite that was detected as discriminatory for 'Magnifico' and 'SWK001'. The Figure 6 heatmap illustrates the presence of caffeoylshikimic acid among these cultivars and demonstrates the abundance to be greater in the 'SWK001' cultivar compared to all the other cultivars, followed by 'Magnifico'. This information is further confirmed by Figure 10A (pie chart), illustrating the pathways in which this metabolite is involved and how it is distributed amongst the cultivars, Figure 11B (radar chart) and Figure S2 (colour-coded PCA score plot). 'Dunnart' showed a greater abundance of feruloylquinic acid in both leaf and root tissue (Figures 6 and 7), thus making this compound a discriminatory ion for this cultivar. Based on these examples, it is clear that this method of cultivar profiling is sensitive enough to detect the presence of specific secondary metabolites among the different cultivars and generate relative intensity values, thus making it useful in cultivar profiling and comparison.
The averaged peak intensities of each metabolite were also combined to produce radar charts (Figure 11), comparatively displaying features from the metabolomes of the leaf and root tissues. A radar chart is a graphical method used to display multivariate data in a two-dimensional plane and illustrates several quantitative variables on axes originating from the same point. These charts are informative as they sort the variables into different positions that show distinct correlations between the different groups [39]. In the respective radar plots, a range of metabolites are presented and plotted based on their averaged peak intensities. In Figure 11A, clear differences and correlations can be seen. Isovitexin 2''-O-glucoside was least abundant in 'SWK001' and most abundant in 'Dunnart'. Isorhamnetin glucoside, on the other hand, was least abundant in 'Dunnart' and most abundant in 'SWK001'. These charts are therefore informative in distinguishing between the various cultivars based on the respective discriminatory metabolites.
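The geometry of a radar chart can be sketched in a few lines: each variable is given its own axis at an equal angular spacing, and each value is placed along its axis in proportion to its magnitude. The function name, the scaling to the largest value and the example numbers below are assumptions for illustration and are not taken from Figure 11.

```python
import math

def radar_vertices(values, max_value=None):
    """Map one sample's values to (x, y) vertices of a radar-chart polygon.

    Axes start at the top and proceed clockwise; radii are scaled so that
    max_value (or the largest value, if omitted) lies at radius 1.
    """
    k = len(values)
    scale = max_value if max_value is not None else max(values)
    points = []
    for i, value in enumerate(values):
        angle = math.pi / 2 - 2 * math.pi * i / k
        radius = value / scale
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

# Hypothetical averaged peak intensities for four metabolites in one sample.
vertices = radar_vertices([2.0, 1.0, 2.0, 1.0])
```

Joining the vertices of each cultivar in order yields one polygon per cultivar, so that differences in a single metabolite appear as differences along a single axis.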
To summarise, the results show clear cultivar-related differences with regard to the respective underlying metabolic profiles. Metabolomics as a tool for cultivar discrimination would thus provide a quicker view of the metabolome that could be applied in plant breeding studies to not only differentiate but also elucidate possible predictive stress-associated resistance or susceptibility traits among the cultivars. The results show metabolic differences for carboxylic acids, amino acids, fatty acids, phenolic acids (hydroxycinnamic acids and hydroxybenzoic acids and associated derivatives), flavonoids and saponins. Figure 12 summarises the distribution of the discriminatory metabolic markers from the respective metabolite classes (represented are phenolic acids, saponins, flavonoids and fatty acids) and their associated biological functions. Based on the graphical summary, clear differences and overlap can be seen among the different cultivars. For instance, 'Overberg' contains a greater number of phenolics and flavonoids as discriminatory markers; therefore, this cultivar could exhibit a multitude of beneficial traits related to the presence of metabolites from these classes such as antioxidant and antipathogenic activity. The 'Dunnart' and 'SWK001' cultivars can be seen as containing both avenacoside A and avenacin A-1 as discriminatory, which could suggest comparative greater defence-associated capabilities in leaves and roots based on the biological activity of these compounds. Ultimately, these metabolic features and their differences contribute to biological variances and could affect how the respective cultivars respond to abiotic and biotic factors.
… Table 1). The respective metabolite codes (e.g., C00079-Phenylalanine) indicate KEGG unique identifiers.

Discussion
When compared to other cereal crops, oat has been greatly underrated, despite containing a range of unique compounds and nutrients that are beneficial for human health and reduce the incidence of certain degenerative diseases [40]. Oat is also considered superior due to its hardiness and its ability to thrive under poor environmental conditions in which other cereals perform poorly [25]. These benefits are largely attributed to the rich diversity of secondary metabolites that oat contains, such as phenolic acids, flavonoids, phytosterols, carotenoids, avenanthramides, avenacosides and avenacins [41,42]. Among the groups of metabolites identified, carboxylic acids are widely distributed in nature and are involved in primary metabolism, which is responsible for growth and development [43]. For example, citric acid was annotated among the discriminatory ions present in the root extracts. This primary metabolite forms part of the tricarboxylic acid (TCA) cycle and is a pivotal part of energy synthesis, and it provides precursors for the biosynthesis of a range of secondary metabolites and amino acids in plants [44]. Its detection as a discriminatory metabolite by OPLS-DA is therefore important for interpreting the metabolic processes that occur and its role in the synthesis of the secondary metabolites that were detected as discriminatory in the respective cultivars.
Common substrates for the synthesis of amino acids include not only intermediates from the TCA cycle but also intermediates from glycolysis and the pentose phosphate pathway. The latter primarily produces intermediates involved in the synthesis of Phe, Tyr and Trp [45]. Among these, phenylalanine and tryptophan presented as discriminatory metabolites in the leaves and roots. Tryptophan is involved in two distinct pathways to produce secondary metabolites. One such pathway starts with the decarboxylation of tryptophan by tryptophan decarboxylase (TDC) to initiate the synthesis of indole alkaloids. The tryptophan pathway also branches from the shikimate pathway at chorismate, where anthranilate (another discriminatory metabolite) is formed by anthranilate synthase and subsequently converted to tryptophan. The secondary metabolites produced via these pathways are known to play pivotal roles in the defence systems of various members of the grass family (Poaceae) [46,47]. Anthranilate also plays an important role in the synthesis of avenanthramides (Ava), which are oat phytoalexins that are produced in response to pathogen infection. Ava have been found to form dimers and are incorporated into plant cell walls for reinforcement; thus, they function in both the chemical and physical defence of oat against pathogens [48,49].
Flavonoids are synthesised through the phenylpropanoid pathway (Figure 10A), where 4-coumaroyl-CoA is formed from cinnamic acid and finally enters the flavonoid biosynthesis pathway (Figure 10B) [50]. The first enzyme specific for the flavonoid pathway, chalcone synthase, produces chalcone scaffolds from which all flavonoids derive. The flavonoids most common in oat include apigenin, luteolin, tricin, kaempferol, quercetin and their glycoside derivatives [51,52]. The majority of metabolites identified were classified as flavonoids, with most being glycoside derivatives of apigenin, quercetin, kaempferol and tricin. Flavonoids have a range of biological activities in plants, such as antioxidant, antimicrobial, signalling and allelopathic activity, as well as defence against environmental stressors [53].
Phenolic acids are synthesised via the phenylpropanoid pathway from phenylalanine through a process that commonly involves deamination, hydroxylation and methylation [54]. Structurally, all phenolic acids are hydroxylated derivatives of cinnamic acid or benzoic acid. Hydroxycinnamic acid (HCA) derivatives commonly include ferulic acid, caffeic acid, sinapic acid and coumaric acid. Correspondingly, hydroxybenzoic acid derivatives include protocatechuic acid, gallic acid, vanillic acid and syringic acid [55]. HCAs were abundantly identified among the various cultivars, with derivatives of coumaric acid, ferulic acid and sinapic acid commonly present. These phenolics are generally known to be significant in plant development, particularly in lignin and pigment biosynthesis, and provide structural and scaffolding support to plants [56].
Plants also produce a range of fatty acids, some of which presented as discriminatory metabolites among the cultivars. Commonly, plants produce palmitic, oleic, linoleic and linolenic acids, and in this study, oleic acid derivatives were frequently identified among the cultivars. Oleic acid is converted to linoleic acid which, in turn, is converted to linolenic acid [57]. Oleic and linoleic acids are known to constitute the two major unsaturated fatty acids in plants and are involved in a range of biological activities, including antifungal activity and the synthesis of important defence signalling molecules such as jasmonates [58,59].
Two saponin molecules were also identified in the leaves (avenacoside A) and the roots (avenacin A-1) of the respective cultivars. Avenacosides are biologically inactive phytoanticipins that are converted into biologically active 26-desglucoavenacosides by an avenacosidase enzyme in response to tissue damage or pathogen attack [60]. The major mechanism of activity against pathogens is due to their ability to complex with sterols in the pathogen membrane and cause disruption in the membrane integrity. This process is thought to result in the formation of transmembrane pores by aggregation of the saponin with the sterol groups. The remaining sugar moieties of the active molecules are also known to play an essential role in membrane permeabilisation, and therefore the removal of these sugar residues could result in loss of biological activity [60–63]. Avenacins have a similar mechanism of action against pathogens that attack the roots; however, they are already present in biologically active forms. Ultimately, these saponins are responsible for defence against pathogens via the formation of micelle-like aggregations between the saponins and sterols in the membrane [64,65].

Plant Cultivation
Seeds of five oat (Avena sativa L.) cultivars, 'Magnifico', 'Dunnart', 'Pallinup', 'Overberg' (Agricol, Pretoria, South Africa) and 'SWK001' (ARC Small Grain Institute, Bethlehem, South Africa), were obtained and cultivated in triplicate. All cultivars were grown in germination mixture (Culterra, Muldersdrift, South Africa) under greenhouse conditions: a light/dark cycle of 12 h/12 h, with a light intensity of about 84 µmol/m²/s and a temperature between 25 and 28 °C. Once the plants reached the 3-week maturity stage (seedling stage or three-leaf stage), the leaves and roots were harvested, frozen in liquid nitrogen to quench metabolic activity and stored at −80 °C until metabolite extraction. The experimental design included three independent biological replicates and the experiments were repeated twice.

Metabolite Extraction and Sample Preparation
Liquid nitrogen was added to the leaf and root materials, which were then crushed into powder form using a mortar and pestle. One gram per sample was weighed into a clean 50 mL Falcon tube and 10 mL of 80% cold aqueous methanol (4 °C) was added (m/v ratio of 1:10). The methanol used was analytical grade (Rochelle Chemicals, Johannesburg, South Africa). The mixture was then homogenised using a probe sonicator (Bandelin Sonopuls, Berlin, Germany) set to 55% power for 10 s per sample. Equipment was cleaned between samples to prevent cross-contamination. The homogenates were centrifuged at 5100 × g for 20 min at 4 °C in a benchtop centrifuge, after which the supernatants were kept and concentrated by evaporating the methanol under vacuum to approximately 1 mL using a rotary evaporator set to 55 °C. The concentrated samples were transferred to 2 mL Eppendorf microcentrifuge tubes and dried in a centrifugal evaporator under vacuum. The dried extracts were then reconstituted by dissolving in 500 µL of 50% aqueous methanol (LC-grade, Romil Pure Chemistry, Cambridge, UK). The samples were subsequently filtered through nylon syringe filters (0.22 µm) into chromatography vials fitted with 500 µL inserts, capped and kept at 4 °C until analysis.

Ultra-High Performance Liquid Chromatography (UHPLC) Analyses
An Acquity UHPLC system (Waters Corporation, Manchester, UK) was used to analyse 2 µL of each sample, which was separated into its components using a binary solvent system on an HSS T3 reverse-phase column (2.1 × 150 mm, 1.7 µm; Waters Corporation, Billerica, MA, USA). The solvents used were MilliQ water (solvent A) and acetonitrile (solvent B) (Romil Chemistry, Cambridge, UK), both containing 0.1% formic acid (Sigma, Munich, Germany) and 2.5% isopropanol (IPA, Romil, Cambridge, UK). Each 2 µL injection was run for 30 min with gradient elution at a flow rate of 0.4 mL/min. The initial conditions were 95% A and 5% B, held for 1 min. A gradient was then applied to change the chromatographic conditions to 10% A and 90% B at 25 min, followed by a change to 5% A and 95% B at 25.10 min. These conditions were held for 2 min and then returned to the initial conditions at 28 min. The analytical column was allowed to equilibrate for 2 min before the next injection. Pooled quality control (QC) samples were also prepared to condition the LC-MS system and to assess the reliability and reproducibility of each analysis [66]. Additionally, blank samples (50% MeOH) were randomly included in the run to monitor potential carry-over and background noise. Each sample was analysed in triplicate (technical replicates), and together with the three biological replicates, this generated n = 9, in order to account for analytical variability.
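The gradient programme described above can be written down as a small table of breakpoints, which makes it easy to check that the steps add up to the stated 30 min run. The following is a minimal sketch only: the breakpoints are transcribed from the text, whereas the linear interpolation between them and the names GRADIENT, percent_b and percent_a are assumptions made for illustration and do not reproduce the instrument method file.

```python
# Gradient breakpoints as (time in min, % solvent B), transcribed from the text.
# Linear change between breakpoints is an assumption of this sketch.
GRADIENT = [
    (0.0, 5.0),
    (1.0, 5.0),    # initial conditions (95% A, 5% B) held for 1 min
    (25.0, 90.0),  # 10% A, 90% B at 25 min
    (25.1, 95.0),  # 5% A, 95% B at 25.10 min
    (27.1, 95.0),  # held for 2 min
    (28.0, 5.0),   # back to initial conditions at 28 min
    (30.0, 5.0),   # final 2 min on the column before the next injection
]


def percent_b(t_min):
    """Return % solvent B (acetonitrile) at time t_min, in minutes."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-30 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min):
    """Return % solvent A (water); A and B always sum to 100%."""
    return 100.0 - percent_b(t_min)
```

Under these assumptions, percent_b(13.0) gives the composition halfway along the main ramp, and the last breakpoint at 30 min confirms that the programme matches the stated run length.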

Quadrupole Time-of-Flight Mass Spectrometry (q-TOF-MS)
A high-definition SYNAPT G1 q-TOF mass spectrometry system, controlled by MassLynx XS™ software (Waters Corporation, Manchester, UK), was coupled to the chromatography system to detect metabolites and acquire data in both positive and negative electrospray ionisation (ESI) operation modes. A reference calibrant, leucine enkephalin (554.2615 Da), was set as the lockmass and allowed for typical mass accuracies of 1 to 3 mDa. The capillary and sampling cone voltages were set to 2.5 kV and 30 V, respectively. The desolvation temperature used was 450 °C, with the source temperature set to 120 °C, cone gas flow set to 50 L/h and desolvation gas flow set to 550 L/h. An m/z range of 50-1200 Da was set with a scan time of 0.1 s. High-purity nitrogen was used as the desolvation, collision and cone gas at a flow rate of 700 L/h. Data were acquired using five different collision energies (MS^E), ramping from 0 to 50 eV, to fragment the initial ions and thereby obtain as much structural information as possible for downstream structural elucidation and metabolite annotation [67,68].

Data Analyses
The datasets obtained were explored and processed using MarkerLynx XS™ software (Waters Corporation, Manchester, UK). The software makes use of a patented algorithm called ApexTrack. The following parameters were used for processing: retention time (Rt) range 2-25 min and m/z range 150-1200 Da. The Rt window was set to 0.20 min and the mass window to 0.05 Da. The mass tolerance was 0.05 Da and the intensity threshold was set to 150 counts. The generated data matrices were exported into "soft independent modelling of class analogy" (SIMCA) software, version 14 (Umetrics, Umeå, Sweden), for multivariate data analysis (MVDA). Unsupervised models, namely, principal component analysis (PCA) and hierarchical clustering analysis (HiCA), were used to reduce the dimensionality of the datasets and to explore the underlying structures and characteristics of the data. Supervised orthogonal projection to latent structures discriminant analysis (OPLS-DA) was used for binary classification analyses of cultivars, thereby identifying discriminatory ions among the different cultivars. The OPLS-DA models were validated using rigorous methods [10,69,70]. The roles of these MVDA tools in the metabolomics workflow are further described in Section 2.2.
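To make the processing parameters above concrete, the sketch below applies the stated Rt range, m/z range and intensity threshold to a feature list, and uses the Rt and mass windows to decide whether two detected peaks would be treated as the same marker. It is an illustration only and is not the ApexTrack algorithm: the Feature type, the function names and the reading of each window as a maximum absolute difference are assumptions introduced here.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    rt_min: float     # retention time, minutes
    mz: float         # mass-to-charge ratio
    intensity: float  # peak intensity, counts


# Values taken from the processing parameters in the text.
RT_RANGE = (2.0, 25.0)
MZ_RANGE = (150.0, 1200.0)
INTENSITY_THRESHOLD = 150.0
RT_WINDOW = 0.20    # min; treated here as a maximum absolute difference
MASS_WINDOW = 0.05  # Da; treated here as a maximum absolute difference


def keep(feature):
    """True if the feature lies inside the processed Rt and m/z ranges
    and reaches the intensity threshold."""
    return (RT_RANGE[0] <= feature.rt_min <= RT_RANGE[1]
            and MZ_RANGE[0] <= feature.mz <= MZ_RANGE[1]
            and feature.intensity >= INTENSITY_THRESHOLD)


def same_marker(a, b):
    """True if two peaks fall within both the Rt and the mass window."""
    return (abs(a.rt_min - b.rt_min) <= RT_WINDOW
            and abs(a.mz - b.mz) <= MASS_WINDOW)
```

For example, two hypothetical peaks at (Rt 10.02 min, m/z 355.10) and (Rt 10.10 min, m/z 355.12) would be grouped as one marker under these assumptions, whereas a peak eluting at 1.5 min would be excluded by the Rt range.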

Metabolite Annotation and Semi-Quantitative Comparisons
Metabolites were putatively identified based on their respective (measured) accurate masses (from which elemental compositions were computed using the MarkerLynx XS software tool) and fragmentation information (for structural elucidation). Each suggested empirical formula was exported and searched for in various databases such as MetaCyc [71], Plant Metabolic Network (PMN) [72], ChemSpider, MassBank of North America [73], Dictionary of Natural Products [74] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [75]. Processed data matrices were also exported from MarkerLynx XS software to the "Taverna workbench" containing an in-house library and allowing a high-throughput automated assignment of putative metabolite identities based on measured accurate masses and other collected spectral features (for a detailed description, see [76]). Metabolites were putatively identified to level 2 of the Metabolomics Standards Initiative (MSI) unless specified otherwise [77]. Avenacoside A was identified using an authentic standard (Sigma-Aldrich, Muenchen, Germany).
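The accurate-mass step of such an annotation can be sketched as follows: a monoisotopic mass is computed from each candidate formula, converted to the expected ion m/z, and compared with the measured value within a tolerance in mDa. This is a simplified illustration and not the MarkerLynx or Taverna workflow used in the study. The function names, the small LIBRARY (formulas of compounds named in this paper), the restriction to [M-H]- and [M+H]+ ions and the 3 mDa default (taken from the upper end of the stated mass accuracy) are all assumptions of the sketch.

```python
import re

# Monoisotopic masses of the elements needed for the example formulas (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton (Da)

# Hypothetical mini-library: compound name -> molecular formula.
LIBRARY = {
    "citric acid": "C6H8O7",
    "phenylalanine": "C9H11NO2",
    "tryptophan": "C11H12N2O2",
    "ferulic acid": "C10H10O4",
}


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H8O7'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def ion_mz(formula, mode):
    """Expected m/z of [M-H]- (mode='neg') or [M+H]+ (mode='pos')."""
    mass = mono_mass(formula)
    return mass - PROTON if mode == "neg" else mass + PROTON


def candidates(measured_mz, library, mode="neg", tol_mda=3.0):
    """Return (name, error in mDa) for library entries within tolerance,
    smallest absolute error first."""
    hits = []
    for name, formula in library.items():
        error_mda = (measured_mz - ion_mz(formula, mode)) * 1000.0
        if abs(error_mda) <= tol_mda:
            hits.append((name, round(error_mda, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

A mass match alone cannot separate isomers that share a formula, which is why the fragmentation information and, where available, an authentic standard remain necessary for the annotation levels reported here.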
Furthermore, MetaboAnalyst 4.0 (https://www.metaboanalyst.ca/home.xhtml, accessed on 12 March 2021) [36] was utilised for additional integrative data analyses. Data pre-treatment (integrity, missing values, filtering and normalisation) was performed prior to downstream chemometric and statistical modelling. A comparison of the magnitude and presence of the identified metabolites among the various cultivars was performed via heatmap analyses using a Pearson distance measure and the Ward clustering algorithm [36,78]. Partial least squares discriminant analysis (PLS-DA) was also used to mine the data via MetaboAnalyst for the comparison and visualisation of the relative abundances of the identified metabolites across the various cultivars. "Variable importance in projection" (VIP) score plots, derived from the OPLS-DA models, were generated to indicate the key discriminatory metabolites; VIP scores of >0.5 were considered significant in discriminating between the cultivars. Additionally, to further visualise changes among the discriminatory metabolites across the various cultivars, radar plots were constructed based on the averages of the relative intensities and illustrated as log-transformed values (Section 2.3).
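Two of the quantities mentioned above are simple enough to sketch directly: the Pearson distance between two intensity profiles, as used for the heatmap clustering, and the log-transformed mean intensity plotted on the radar charts. The sketch assumes the common definition of Pearson distance as 1 - r and a base-10 logarithm; neither detail is stated in the text, and the function names are illustrative rather than part of MetaboAnalyst.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_distance(x, y):
    """Distance assumed here as 1 - r: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated profiles."""
    return 1.0 - pearson_r(x, y)


def log_mean_intensity(replicate_intensities):
    """Base-10 log of the mean relative intensity across replicates
    (the log base is an assumption of this sketch)."""
    mean = sum(replicate_intensities) / len(replicate_intensities)
    return math.log10(mean)
```

With this definition, two metabolites whose intensities rise and fall together across the cultivars are placed close to each other in the heatmap regardless of their absolute abundance, which is the usual reason for preferring a correlation-based distance over a Euclidean one.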

Conclusions
Metabolomics has been widely applied in crop plant sciences and has shown great progress in understanding how the phenotype links to the metabolome and, by extension, elucidating the active role of metabolites under normal and stress conditions. Metabolomics could therefore provide insights into understanding crop physiology and biochemistry as well as underlying metabolic events. This could greatly improve crop breeding which is currently based on gene and marker-assisted selection. Although the latter has shown success in crop improvement, it is also faced with many limitations such as the fact that the presence of a gene does not necessarily ensure the expression of a trait. Metabolomics has the potential to overcome this limitation and provide useful insights about metabolites involved in resistance, growth and stress responses, which, in turn, can be applied to crop improvement. Thus, in this study, LC-MS-based metabolomics was applied to interrogate the metabolomes of five different oat cultivars. This multidisciplinary omics approach allowed the elucidation and characterisation of differential metabolic profiles that define natural variation among the oat metabolomes under consideration. The identified metabolic classes were carboxylic acids, amino acids, fatty acids, phenolic compounds (hydroxycinnamic acids and hydroxybenzoic acids and associated derivatives) and flavonoids. Further, a steroidal saponin (avenacoside A) was annotated in extracts from leaves and a triterpenoid saponin (avenacin A-1) was annotated in extracts from roots. The differences in the metabolic profiles indicate that untargeted metabolomics can be used to distinguish between cultivars. The results further indicate that to discriminate between the different cultivars, the presence or absence of specific metabolites cannot be the only concluding factor, and the relative intensities or ratios of the metabolites also need to be considered as distinguishing criteria. 
The secondary metabolite classes that were mentioned have various biological roles that are important in plant growth and development, preventing pathogen infections and maintaining the plant under various environmental conditions. Ultimately, an untargeted LC-MS-based metabolomics approach can be used to detect the underlying metabolites that contribute to phenotypic and physiological traits. This will greatly contribute to a more holistic comprehension of the oat plant metabolome which can ultimately be applied in crop improvement and breeding strategies.

Data Availability Statement: The study design information, LC-MS data, data processing and analyses are reported on and incorporated into the main text. Raw data, analyses and data processing information and the meta-data have been deposited into the EMBL-EBI metabolomics repository-MetaboLights50, with the identifier MTBLS2478 (http://www.ebi.ac.uk/metabolights/MTBLS2478) (accessed on 12 March 2021).