Comprehensive Metabolomic Fingerprinting Combined with Chemometrics Identifies Species- and Variety-Specific Variation of Medicinal Herbs: An Ocimum Study

Identification of plant species is a crucial process in natural products research. Ocimum, often referred to as the queen of herbs, is one of the most versatile and widely used medicinal herbs worldwide owing to its broad range of pharmacological activities. Despite significant global demand for this medicinal herb, rapid and comprehensive metabolomic fingerprinting approaches for species- and variety-specific classification are limited. In this study, metabolomic fingerprinting of five Ocimum species (Ocimum basilicum L., Ocimum sanctum L., Ocimum africanum Lour., Ocimum kilimandscharicum Gurke., and Hybrid Tulsi) and their varieties was performed using LC-MS, GC-MS, and the rapid fingerprinting approach FT-NIR combined with chemometrics. The aim was to distinguish species- and variety-specific variation with a view to developing a quality assessment of Ocimum species. Discrimination of species and varieties was achieved using principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), data-driven soft independent modelling of class analogy (DD-SIMCA), random forest, and K-nearest neighbours, with a specificity of 98% and a sensitivity of 99%. Phenolics and flavonoids were found to be the major markers contributing to species-specific variation. The present study established comprehensive metabolomic fingerprinting, consisting of rapid screening and confirmatory approaches, as a highly efficient means of identifying the species and variety of Ocimum that can also be applied to the quality assessment of other natural medicinal herbs.


Introduction
Plants are effective biochemists, producing potent biomolecules that have been used since ancient times [1] to cure a range of diseases in traditional systems of medicine [2]. According to a World Health Organisation report, more than half of the global population depends on traditional medicines for health care [3]. The global demand for medicinal plants has increased considerably due to their proven effectiveness in treating various diseases [4]. The beneficial medicinal effects of plant materials result from the combination of secondary metabolites present in the plants, according to the specific chemical structure of each biochemically active compound [5]. The medicinal activity conferred by this wide variety of phytochemicals is specific to the plant, tissue, species, or variety (i.e., taxonomy specific). In many parts of the world, medicinal herbs form a substantial part of the diet, as they are key ingredients in many foods and beverages, as well as in pharmaceuticals and cosmetics [4,6].
Ocimum, a genus of aromatic annual and perennial herbs in the family Lamiaceae, has been used for thousands of years owing to its diverse medicinal properties [7]. It is widely distributed across the warmer regions of the world [8]. It has been used for its antioxidant and anti-inflammatory effects, as its extracts protect nerves and tissues by preventing the generation of free radicals [7,9]. Furthermore, the plant is well known for its anti-depressant and anti-ageing properties due to the presence of a wide variety of secondary metabolites [10]. It is also highly popular in the culinary world, where it is used in many types of cuisine and for food flavouring and preservation [11,12]. Ocimum is known as the queen of herbs and is prominently featured in cuisines across the world, including Italian, Vietnamese, Chinese, Thai, Laotian, and Indian [11][12][13]. Because it contains a wide variety of antioxidants and has anti-microbial properties, it is widely used in food and beverages [7,14]. It is also an aromatic and medicinal plant of high economic value used in the pharmaceutical and aroma industries [15]. Compared with most other herbs, the taxonomy of the genus Ocimum is considered very complex, and more than 100 species have been recognised within the genus [16]. The quality control of such medicinal plants is extremely challenging because it is not limited to the botanical level: there are also significant variations in chemical profiles within the same species [17]. Indeed, secondary metabolite expression in a given plant is a function of the biotic, abiotic, and genetic factors that define species, varieties, and cultivars [18]. Effective identification systems and robust methods for the species and variety classification of medicinal plants are therefore needed.
Most taxonomic classification is based on morphology (macroscopic and microscopic identification) and leaf colour [19]. These morphological properties frequently depend on a range of environmental conditions, leading to ambiguity in classification within the genus, and leaf shape and colour vary enormously both between species and within varieties. In addition to these techniques, DNA-based methods have proven robust for the unambiguous identification of the medicinal plant genus; however, these methods fail to detect the mixing of species that lowers the quality of medicinal plant products [16]. Furthermore, classification based on volatile oil composition requires the distillation and fractionation of oils, and chemotype classification based on only one major volatile oil constituent is erroneous, as one plant may contain two or more chemical compounds in nearly equal amounts [15,20,21]. Instead, the overall oil profile of major constituents above a fixed threshold of 20% of the total essential oil content should be considered.
Plant-metabolome-based fingerprinting methods are gaining attention as a way to address the pitfalls of the techniques mentioned above by determining metabolite fingerprints of species-specific and variety-specific variation [22][23][24][25]. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the most commonly used metabolomics platforms. Although NMR is non-destructive and straightforward, only a limited number of metabolites can be identified because the wider plant metabolome, consisting of primary and secondary metabolites, yields complex spectra [26]. In MS-based platforms, the separation and identification of complex metabolite profiles are improved by coupling to separation techniques such as gas chromatography and liquid chromatography, the latter of which not only enhances separation efficiency but also improves MS performance. Liquid chromatography and gas chromatography coupled to mass spectrometry have become central platforms in metabolomics, and their application in natural products and plant sciences has recently gained attention [27]. The ability to resolve hundreds of metabolites with analyte-specific detection and to identify unknowns makes these MS-based platforms suitable tools for fingerprinting studies. Spectroscopic methods are also gaining attention for metabolic fingerprinting because they are rapid and non-destructive [28]. Spectroscopy has been used as a screening tool for herbs and spices; for example, Fourier transform infrared (FT-IR) spectroscopy has been applied to a range of medicinal herbs and natural medicines. Recently, medicinal herb metabolome fingerprinting has been used to assess species and authenticity [28].
A significant number of studies have described Ocimum essential oils [21,29,30], but relatively few have examined Ocimum leaves, for which a comprehensive and rapid fingerprinting approach is lacking [16,31,32]. Given that Ocimum leaves are used for various medicinal purposes and to add distinctive flavours and aromas in the food and pharmaceutical industries, there is a need for comprehensive and rapid metabolomic fingerprinting of Ocimum species and varieties.
In the present study, an untargeted, comprehensive multi-metabolomics approach was used to investigate the phenotypes of five Ocimum species, Ocimum basilicum L., Ocimum sanctum L., Ocimum africanum Lour., Ocimum kilimandscharicum Gurke., and Hybrid Tulsi, which are the major species used globally. GC-MS- and LC-MS/MS-based metabolomics combined with multivariate pattern recognition analysis was used to identify fingerprints with potential marker metabolites of Ocimum species, and the abundances of nine chemical classes of metabolites were compared between samples. Further, FT-NIR-based metabolic fingerprinting combined with the one-class classifier DD-SIMCA and with K-nearest neighbour classification was used to classify the Ocimum species and varieties. In addition, eight different varieties of Ocimum species were studied. The aim of the study was to identify metabolite markers through comprehensive and rapid metabolomic fingerprinting of Ocimum species and varieties, because they are used for numerous therapeutic purposes and to impart distinctive flavour, fragrance, and aroma in the food and pharmaceutical industries.
(Table S1). CSIR-CIMAP has a history of Ocimum cultivation spanning more than fifty years, and all Ocimum varieties planted were confirmed by expert botanists. Ocimum plants with similar growth and without disease were randomly selected. Fresh Ocimum leaves are rarely available on the market; they are usually dried and then packaged for storage, transportation, and processing. Freeze-drying (lyophilisation) is an effective drying process that does not compromise quality [33]. Therefore, after harvest, the fresh Ocimum leaves were rinsed with water and freeze-dried. The dried leaves were ground to a powder, passed through a mesh, sealed in tubes, and stored at −80 °C until further use.

Sample Preparation and LC-MS/MS Conditions
The dried powder of Ocimum leaves from each sample was weighed precisely and extracted with 1.5 mL of 70% methanol (HPLC grade, Sigma Aldrich, St. Louis, MO, USA) containing uniformly labelled U-13C6-glucose (internal standard, Cambridge Isotope Laboratories, Andover, MA, USA). Briefly, the samples were sonicated in an ultrasonic water bath for 10 min under ice-cold conditions, vortex-mixed for a further 30 min, and centrifuged for 20 min at 13,523 rcf. The supernatant was collected into fresh tubes. This extraction was repeated twice, and the combined extracts were concentrated by lyophilisation (Alpha 2-4 LD plus, Christ, Germany). The dried samples were reconstituted in 400 µL of 70% methanol (HPLC grade) for LC-MS analysis.
The UPLC analysis was performed on an Agilent 1290 series instrument (Agilent Technologies, Santa Clara, CA, USA) comprising a binary pump (G7120A), autosampler (G7129A), column oven (G7130A), and diode array detector (G4212B). Separation was achieved on an Acquity UPLC BEH C18 column (1.7 µm, 2.1 mm × 150 mm; Waters, Milford, MA, USA) maintained at 40 °C. Gradient elution used two solvents, 0.1% (v/v) formic acid in water (A) and 0.1% (v/v) formic acid in acetonitrile (B), at a flow rate of 0.2 mL/min. The 30 min gradient programme was as follows: (i) 0–20 min, 75% B; (ii) 25 min, 75% B; (iii) 26–30 min, 5% B. The injection volume was 2 µL. MS analyses were performed on an Agilent 6545 QTOF-MS/MS instrument (G6545A) connected to the Agilent 1290 UPLC (Agilent Technologies, Santa Clara, CA, USA) through a dual AJS ESI interface. Nitrogen was used as the drying and collision gas. The ion source parameters were as follows: drying gas flow rate, 10 L/min; heated capillary temperature, 330 °C; nebuliser pressure, 35 psi; and VCap, fragmentor, skimmer, and octopole RF peak voltages of 4000, 180, 45, and 750 V, respectively. Detection was carried out in positive and negative electrospray ionisation modes, and spectra were recorded by MS scanning over m/z 80–1500. MS/MS analyses were carried out by data-dependent acquisition with collision energies of 10–30 eV. MassHunter software version B.07.00 (Agilent Technologies) was used for instrument control, data acquisition, and processing.

Sample Preparation and GC-MS Conditions
Lyophilised samples of the different Ocimum varieties were treated with liquid nitrogen to stop metabolic activity and stored in air-tight containers at −80 °C. Before extraction, the dried leaves were ground using a mortar and pestle. The dried leaf powder from each sample was weighed precisely, and 1.5 mL of 70% methanol (HPLC grade) containing 10 µg/mL uniformly labelled 13C6-glucose (internal standard for relative quantification) was added. The mixture was vortexed vigorously for 10 min and then sonicated for 30 min at 20 °C in an ultrasonic water bath (53 kHz). Extracts were then vortexed vigorously for 40 min and centrifuged at 13,523 rcf for 20 min at 21 °C to remove plant debris. The supernatant of each sample was transferred into a clean tube and concentrated by lyophilisation. QC samples were prepared by pooling aliquots (10 µL) from all extracted samples and were dried completely in the lyophiliser. The dried samples were resuspended in 60 µL of methoxyamine hydrochloride in pyridine (20 mg/mL), vortexed for thorough mixing, and incubated in a thermomixer for 2 h at 37 °C at 60 rcf. MSTFA (110 µL; Sigma Aldrich, St. Louis, MO, USA) was then added to each sample, and the samples were vortexed, incubated in a thermomixer for 30 min at 37 °C at 60 rcf, and injected into the GC-MS for analysis.
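The 13C6-glucose spike is described as an internal standard for relative quantification, but the calculation itself is not given. The following is only a minimal sketch of the usual response-ratio approach, assuming each feature's peak area is divided by the internal-standard peak area of the same sample; the function name and array layout are hypothetical, not taken from the authors' workflow.

```python
import numpy as np


def relative_response(feature_areas, is_areas):
    """Express peak areas relative to the internal standard of the same sample.

    feature_areas: (n_samples, n_features) raw peak areas (hypothetical layout)
    is_areas:      (n_samples,) peak area of the 13C6-glucose internal standard
    """
    feature_areas = np.asarray(feature_areas, dtype=float)
    is_areas = np.asarray(is_areas, dtype=float)
    if feature_areas.shape[0] != is_areas.shape[0]:
        raise ValueError("one internal-standard area is needed per sample")
    if np.any(is_areas <= 0):
        raise ValueError("internal-standard area must be positive in every sample")
    # Broadcasting divides every feature in row i by is_areas[i]
    return feature_areas / is_areas[:, None]


# Two hypothetical samples with two features each
ratios = relative_response([[200.0, 50.0], [300.0, 30.0]], [100.0, 150.0])
# ratios -> [[2.0, 0.5], [2.0, 0.2]]
```

Dividing by the internal standard compensates for sample-to-sample differences in recovery and injection; turning these ratios into absolute concentrations would additionally require response factors, which the text does not report.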
GC-MS analysis was performed on an Agilent Technologies 7890A gas chromatography system with a 5977A mass selective detector (Santa Clara, CA, USA). Peak separation was achieved on an HP-5MS column (30 m × 250 µm, 0.25 µm film thickness). Helium was used as the carrier gas at a flow rate of 1.5 mL/min with a split ratio of 3:1. The oven was held at 70 °C for 2 min, ramped at 12.5 °C/min to 295 °C, and then ramped at 25 °C/min to 320 °C, with a final hold of 5 min. The transfer line and ion source were maintained at 230 °C and the quadrupole at 150 °C; the ionisation energy was 70 eV, and the scan range was m/z 40–700. Compounds were identified by comparing their retention indices with those of known compounds, using a homologous series of C7–C30 n-alkanes analysed on the same column under the same chromatographic conditions.
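Identification relied on retention indices referenced to the C7–C30 alkane series, but the formula is not stated. For temperature-programmed runs such as this one, a common choice is the linear (van den Dool and Kratz) index, RI = 100 × [n + (N − n)(t_x − t_n)/(t_N − t_n)], where n and N are the carbon numbers of the alkanes eluting just before and after the analyte. The sketch below assumes that formula; the function name and retention times are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear (van den Dool-Kratz) retention index of a peak.

    rt:         retention time of the analyte (min)
    alkane_rts: {carbon number: retention time (min)} for the n-alkane ladder
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    # Position of the first alkane eluting strictly after the analyte
    i = bisect_right(times, rt)
    if i == len(times):  # analyte co-elutes with the last alkane
        return 100.0 * carbons[-1]
    n, t_n = carbons[i - 1], times[i - 1]
    big_n, t_big_n = carbons[i], times[i]
    # Linear interpolation between the bracketing alkanes
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))


ladder = {10: 5.0, 11: 6.0, 12: 7.5}  # hypothetical retention times (min)
print(linear_retention_index(5.5, ladder))  # 1050.0
```

The (N − n) factor keeps the result correct even if an alkane is missing from the ladder. The computed index would then be compared with library values within some tolerance window; the tolerance used in the study is not reported.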

FT-NIR Spectroscopy Analysis
FT-NIR spectra were acquired using an ANTARIS II FT-NIR spectrometer (Thermo Scientific Co., Waltham, MA, USA) equipped with an interferometer and an integrating sphere. Approximately 1 g of each weighed powdered sample was placed in a glass vial, and spectra were recorded over 10,000–4000 cm−1 with 64 scans. The raw spectra were recorded at a spectral resolution of 4 cm−1, giving 1557 variables. The FT-NIR reflectance spectra were expressed as log(1/R), where R is the reflectance. To avoid systematic variation in the model, the samples were measured in random order. All spectral measurements were carried out at room temperature (26 ± 1 °C).
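The log(1/R) transformation converts diffuse reflectance into apparent absorbance. A minimal sketch follows, assuming R is supplied as a fraction between 0 and 1. It also shows one way to randomise the acquisition order, which is how the randomisation step above is interpreted here (an assumption, since the procedure is not described). Both function names and the sample IDs are hypothetical.

```python
import numpy as np


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance log10(1/R) from fractional diffuse reflectance R."""
    r = np.asarray(reflectance, dtype=float)
    if np.any((r <= 0) | (r > 1)):
        raise ValueError("reflectance must lie in (0, 1]")
    return np.log10(1.0 / r)


def randomised_run_order(sample_ids, seed=None):
    """Return the sample IDs shuffled so that acquisition order is random."""
    rng = np.random.default_rng(seed)
    return [sample_ids[i] for i in rng.permutation(len(sample_ids))]


print(reflectance_to_absorbance([1.0, 0.1, 0.01]))  # approximately [0, 1, 2]
# Hypothetical sample IDs, one per species
print(randomised_run_order(["OB-1", "OS-1", "OA-1", "OK-1", "HT-1"], seed=42))
```

Fixing the seed makes the randomised order reproducible, so the same acquisition sequence can be documented and repeated.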

Data Processing and Analysis
The raw LC-MS and GC-MS files were processed using Agilent MassHunter software and Progenesis. Automated peak detection, retention time alignment, and peak matching were performed, and the resulting data matrix of features (Rt/(m/z) pairs), samples, and intensities was used for statistical analysis [24,34]. Total area normalisation of samples and scaling were performed to make features more comparable in magnitude. Multivariate statistical analysis was performed in R. Principal component analysis (PCA) was used to visualise the information contained in the dataset. To identify the differential metabolites accounting for the separation between groups, supervised PLS-DA was used [35]. The PLS-DA model was validated by leave-one-out cross-validation, and its quality was assessed by the R² and Q² scores [36]. The model was further validated by a permutation test with 1000 permutations [36]. The PLS-DA model also generates variable importance in projection (VIP) scores, and metabolites with VIP values greater than 1 were considered potential differential compounds [37]. Univariate analysis was performed with the unpaired t-test. Hierarchical cluster analysis was performed to identify relatively homogeneous clusters of sample groups on the basis of the measured characteristics. For LC-MS, metabolites with a mass error of less than 5 ppm were retained and further identified by MS/MS using the Metlin, HMDB, PubChem, and KEGG libraries and an in-house library. For GC-MS, the Automated Mass Spectral Deconvolution and Identification System (AMDIS, version 2.0, NIST, Gaithersburg, MD, USA) was used for deconvolution, which extracts clean mass spectra from complex chromatograms and aids correct metabolite identification.
The spectra obtained after deconvolution were compared with spectra of pure standards, when available, and with the NIST library (NIST 2014, v2.2g), using a minimum match quality of 90% as the identification criterion [38].
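The preprocessing and mass-accuracy filtering steps described above can be sketched as follows. The text says only that total area normalisation and "scaling" were applied, so Pareto scaling is an assumption here (autoscaling, which divides by the standard deviation itself, is an equally common choice); the 5 ppm cut-off is taken from the text. All function names are hypothetical.

```python
import numpy as np


def total_area_normalise(X):
    """Divide each sample (row) by its total feature intensity."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def pareto_scale(X):
    """Mean-centre each feature, then divide by the square root of its SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # constant features stay at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / np.sqrt(sd)


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ppm(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True when the absolute mass error is below the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm


# Hypothetical intensities: normalise per sample first, then scale per feature
X = [[1.0, 3.0], [2.0, 2.0], [4.0, 1.0]]
X_ready = pareto_scale(total_area_normalise(X))
print(within_ppm(181.0712, 181.0707))  # True (about 2.8 ppm)
```

Normalisation acts on rows (samples) and scaling on columns (features), so the order matters: normalising first removes differences in total signal before features are put on a comparable scale.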

One Class Classification Model
MATLAB R2021a (version 9.10; MathWorks, Inc., Natick, MA, USA) was used to build the DD-SIMCA models [35]. This one-class classification method distinguishes objects of one target class from all other objects and classes. R version 3.4.1 was used for K-nearest neighbour classification [39,40].
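Unlike DD-SIMCA, K-nearest neighbour classification is a multi-class method: a new spectrum is assigned to the class that is most common among its k closest training spectra. The sketch below shows that rule with Euclidean distance, together with the sensitivity and specificity figures quoted in the abstract, computed for one target class against all others. The value of k, the distance metric, the toy data, and the function names are assumptions, since the study's R implementation is not described; DD-SIMCA itself is not reproduced here.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    predictions = []
    for x in X_test:
        distances = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(distances, kind="stable")[:k]
        # Counter keeps first-seen order on ties, so the class whose member is closest wins
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity and specificity for one target class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy two-class data standing in for preprocessed spectra (hypothetical)
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]]))  # ['A', 'B']
```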

Results and Discussion
Plant species identification is an essential aspect of the quality control of natural medicinal herbs. Identifying the species or variety of a medicinal herb is of great interest for selecting plants with specific pharmacologically active secondary metabolite profiles, ensuring the absence of adulteration along the complex production chain, and protecting the product's commercial value. A growing body of evidence shows that metabolic fingerprinting can be used to characterise the species and variety of natural medicines. Although Ocimum is widely used across the world as a traditional medicine and in the pharmaceutical and food industries to add distinctive flavour and aroma, comprehensive and rapid metabolic fingerprinting of its species and varieties has not been well defined. Therefore, a comprehensive LC-MS-, GC-MS-, and FT-NIR-based metabolic fingerprinting approach was employed to identify marker compounds and spectral fingerprints and to classify samples by species and variety.

LC-MS-Based Metabolic Profiling of Ocimum Leaves Identified Species-Specific Variation
Untargeted metabolite profiling using LC-QTOF-MS was performed in both ESI (+) and ESI (−) ionisation modes to cover as much of the Ocimum metabolome as possible. The pooled quality control (QC) samples clustered tightly, as shown in Figure S1A,B. The total ion chromatograms (TICs) of the metabolite profiles acquired in ESI (−) and ESI (+) ionisation modes are shown in Figures S2 and S3, respectively. PCA was performed to visualise the information in the datasets, as projecting the samples into the space of the first few principal components makes differences among sample groups apparent. The unsupervised PCA of the LC-MS data of all samples revealed the general structure of the complete dataset, with the first two principal components accounting for 47.4% and 50.8% of the total variation in negative and positive ionisation modes, respectively (Figure S4A). The corresponding loading plot responsible for the clustering of the five Ocimum species is shown in Figure S6. Tenfold cross-validation was performed to assess the predictive accuracy and fit of the PLS-DA model (Figure S6A,B). The cumulative PLS-DA values (accuracy = 1.0, R² = 0.9982, Q² = 0.9949 in negative ionisation mode; accuracy = 1.0, R² = 0.9993, Q² = 0.9965 in positive ionisation mode) indicated a good model fit. To assess the statistical significance of these apparently highly predictive multivariate models, permutation tests with 1000 permutations were conducted (Figure S7C,D). Based on these permutation distributions, the predictive power of the optimal models for the Ocimum metabolite profiles was significant (p < 0.001).
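The permutation test reported above compares a model's cross-validated performance with a null distribution obtained by refitting after randomly shuffling the class labels. The sketch below illustrates that logic. To stay short and dependency-free, it swaps in leave-one-out nearest-centroid accuracy as the test statistic instead of the PLS-DA Q² used in the study, so the statistic, the toy data, and the function names are assumptions. The p-value uses the common (count + 1)/(n_perm + 1) form, which never returns exactly zero; with 1000 permutations and no permuted statistic reaching the observed one, it equals 1/1001, just below 0.001.

```python
import numpy as np


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-centroid classifier."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        X_tr, y_tr = X[train], y[train]
        classes = np.unique(y_tr)
        centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
        predicted = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(predicted == y[i])
    return correct / len(y)


def permutation_p_value(X, y, n_perm=1000, seed=0):
    """Observed statistic and permutation p-value, (count + 1) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    observed = loo_nearest_centroid_accuracy(X, y)
    null = np.array([
        loo_nearest_centroid_accuracy(X, rng.permutation(np.asarray(y)))
        for _ in range(n_perm)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Two well-separated toy groups (hypothetical data)
group = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2]])
X = np.vstack([group, group + 10.0])
y = ["A"] * 5 + ["B"] * 5
print(permutation_p_value(X, y, n_perm=200))
```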
To compare the Ocimum sample groups, Pearson correlations were calculated between sample groups; each Ocimum species sample group showed a strong positive correlation with its corresponding sample group (Figure 1C). In total, 3626 metabolite features were detected across positive and negative ionisation modes. The identified metabolites were grouped into nine chemical classes: terpenes, amino acids, phenolics, lipids, flavonoids, anthraquinones, sterols, sugars, and aromatic compounds. Unsupervised hierarchical cluster analysis was performed on the quantities of these nine metabolite classes across the five Ocimum species (Figure 2A). O. basilicum, O. africanum, Hybrid Tulsi, and O. kilimandscharicum clustered together, whereas O. sanctum samples formed a separate cluster. O. basilicum samples contained relatively high levels of terpenes and phenolics, and O. africanum contained a high concentration of amino acids. In contrast, O. sanctum samples contained high levels of flavonoids, phenolics, anthraquinones, sterols, sugars, and aromatic compounds. Importantly, according to their VIP values, the phenolic and flavonoid classes were responsible for the species-specific discrimination of Ocimum samples (Figure 2B).
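VIP scores summarise how much each variable contributes to the PLS-DA components, weighting each component by the Y variance it explains: VIP_j = sqrt(p · Σ_a SS_a w_ja² / Σ_a SS_a), where p is the number of variables, w_a is the unit-length weight vector of component a, and SS_a = (t_aᵀt_a)(c_aᵀc_a). Because the squared VIP scores average to 1, VIP > 1 flags variables with above-average influence, which is the cut-off used in this study. The sketch below fits a small NIPALS PLS-DA on one-hot class labels and computes VIP from it. It is an illustrative reimplementation on toy data, not the R routine used by the authors, and it assumes X has already been normalised and scaled; the function name is hypothetical.

```python
import numpy as np


def plsda_vip(X, y, n_components=2, max_iter=500, tol=1e-10):
    """VIP scores from a NIPALS PLS-DA model fitted to one-hot class labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(y)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot class membership
    X = X - X.mean(axis=0)  # mean-centre; scaling is assumed to be done beforehand
    Y = Y - Y.mean(axis=0)
    n_samples, n_vars = X.shape
    W = np.zeros((n_vars, n_components))  # unit-length X weights per component
    ss = np.zeros(n_components)           # Y sum of squares explained per component
    for a in range(n_components):
        u = Y[:, np.argmax((Y ** 2).sum(axis=0))]  # start from the largest Y column
        t_old = np.zeros(n_samples)
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w                 # X scores
            c = Y.T @ t / (t @ t)     # Y weights
            u = Y @ c / (c @ c)       # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p_load = X.T @ t / (t @ t)    # X loadings used for deflation
        X = X - np.outer(t, p_load)
        Y = Y - np.outer(t, c)
        W[:, a] = w
        ss[a] = (t @ t) * (c @ c)
    return np.sqrt(n_vars * (W ** 2) @ ss / ss.sum())


# Toy data: only the first variable separates the two classes (hypothetical)
X = np.array([[0.0, 1.0, 0.3], [0.2, -1.0, 0.1], [0.1, 0.5, -0.2],
              [3.0, -0.5, 0.2], [3.2, 1.0, -0.1], [2.9, -1.0, 0.0]])
y = ["A", "A", "A", "B", "B", "B"]
vip = plsda_vip(X, y)
print(vip > 1.0)
```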
In the present study, key flavonoids were identified as significant marker metabolites for the species-specific differentiation of Ocimum samples (Figure 3). These flavonoids are derived from the phenylpropanoid and flavone biosynthesis pathways [41]. The crucial function of flavonoids in plants is protection against various abiotic stresses (salt, drought, UV radiation, and heat) and biotic stresses such as pathogen and herbivore attack [42].
Rosmarinic acid, an ester of caffeic acid and 3,4-dihydroxyphenyllactic acid, is abundant in plants of the Lamiaceae family, where it is biosynthesised via the phenylpropanoid- and tyrosine-derived pathways [43]. Plants use this antioxidant phenolic compound in their defence system [43]. In the present study, O. sanctum samples contained higher concentrations of rosmarinic acid than the other species.
Carvacrol is a monoterpenoid phenol that is abundant in many aromatic plants and is known for its high antimicrobial activity. It is derived mainly from the plastidial methylerythritol phosphate (MEP) pathway, which supplies the precursors isopentenyl diphosphate (IDP) and dimethylallyl diphosphate (DMADP) [44]. In the present study, O. kilimandscharicum samples contained a relatively high concentration of carvacrol compared with the other Ocimum species. Salvianolic acid, a phenolic metabolite derived from the phenylpropanoid pathway, promotes plant survival under osmotic stress. In the present study, relatively high levels of this metabolite were found in O. basilicum compared with the other Ocimum species. Caftaric acid, a phenolic acid, provides plants with improved UV-B protection [45].
In the present study, a high concentration of caftaric acid was found in O. africanum, O. basilicum, and O. sanctum samples. Further, the concentrations of the phenolic metabolites coniferaldehyde (a precursor of eugenol) and protocatechuic acid were relatively higher in O. sanctum and Hybrid Tulsi samples, respectively, than in the other samples. The random forest model classified all samples correctly, with an out-of-bag (OOB) error of 0 (Figure S8). Untargeted metabolite profiling also discriminated variety-specific variation within the Ocimum species (Figure S9).

GC-MS-Based Metabolic Fingerprinting Identified Ocimum Species-Specific Variation
The GC-MS chromatographic profiles of the five Ocimum species are shown in Figure 4A. Tight clustering of the QC samples indicated the repeatability of the analytical system used for untargeted metabolite profiling (Figure S9A). PCA, an unsupervised multivariate method, was first used to project the GC-MS data into a lower-dimensional space so that the inherent structure of the data could be revealed. The PCA model revealed the general structure of the complete dataset, with the first two principal components cumulatively accounting for 50% of the total variation and PC1 accounting for 29… (Figure 4B). The corresponding loading plot responsible for the observed separation between Ocimum species samples is shown in Figure 4C. Fivefold cross-validation was performed to assess the predictive accuracy and fit of the PLS-DA model (Figure S11A). A permutation test was performed to assess the statistical significance of these apparently highly predictive multivariate models, for which the supervised models were validated with 1000 permutations (Figure S11B). Based on these permutation distributions, the predictive power of the optimal models for the metabolic profiles of the sample groups was significant (p < 0.001). Specific metabolites responsible for the species-specific discrimination of Ocimum samples were then identified from the VIP values of the model (Table S3). Nine metabolites were considered discriminatory markers for identifying species-specific Ocimum samples (Figure 4D): four primary metabolites (malic acid, citramalic acid, ribose, and fructose) and five secondary metabolites (quininic acid, eugenol, gluconic acid, quercetin, and shikimic acid). Malic acid, a TCA cycle intermediate, plays a crucial role in stomatal opening and closing in plant leaves [46].
Furthermore, malic acid produced in the mitochondria acts as a common reserve anion in plant vacuoles [46]. Citramalic acid, an analogue of malic acid, was also detected in Ocimum samples; it is a methyl derivative of malic acid derived from the C5-branched dibasic acid metabolism that takes place in the chloroplast stroma [46]. In the present study, O. kilimandscharicum contained higher concentrations of malic acid and citramalic acid than the other Ocimum species.
Ribose is a monosaccharide produced in plant cells through the pentose phosphate pathway and is essential for ATP production [47]. In addition to its critical role in energy production, this five-carbon sugar is a vital component in the synthesis of biomolecules such as DNA, RNA, and acetyl coenzyme A [46,47]. External supplementation of ribose in the soil enhances plant growth, and this metabolite helps plants withstand additional stress or shock following transplantation [48]. A high concentration of ribose was found in Ocimum sanctum compared with the other Ocimum species. Fructose, a six-carbon monosaccharide, is a secondary product of plant photosynthesis after glucose. It is the sweetest naturally occurring sugar, estimated to be about twice as sweet as sucrose. It also functions as a regulatory sugar and interacts with plant hormone signalling [49]. In the present study, a high concentration of fructose was found in O. basilicum samples compared with the other species.
Eugenol, a phenolic phenylpropene derived from the phenylpropanoid pathway, is usually found in aromatic herbal plants [43]. Plants use this secondary metabolite as a defence molecule against microorganisms and pests and as a floral attractant for pollinators. This molecule has high economic value, as it is widely used in essential oils, as an aroma ingredient, and as a food flavouring [50]. In addition, it exhibits effective antioxidant activity. In the present study, a high concentration of eugenol was found in Ocimum sanctum samples in comparison to other Ocimum species. Shikimic acid is a crucial metabolite for the synthesis of the aromatic amino acids (tyrosine, tryptophan, and phenylalanine) and vitamins, and the corresponding shikimate pathway is a major link between primary and secondary metabolism, responsible for the synthesis of various secondary metabolites [46]. In the present study, a high concentration of shikimic acid was found in Hybrid Tulsi samples. Quercetin, a pentahydroxylated flavonol derived from the phenylpropanoid pathway, confers on plants tolerance to several abiotic and biotic stresses. In the present study, a high concentration of quercetin was found in O. sanctum leaf samples in comparison to other Ocimum species samples. Further, the random forest classification model classified the samples with an out-of-bag (OOB) error of 0 (Figure S12).

Rapid Metabolic Fingerprinting with FT-NIR Identified Species- and Variety-Specific Variation of Ocimum Samples
FT-NIR is a spectroscopic technique that has been widely used in the authentication of herbal and agricultural products, as well as in numerous natural product analyses [25,28]. However, to date, no rapid spectroscopic method combined with chemometrics had been developed and applied to the authentication of Ocimum species or varieties. Ocimum samples from the different species and varieties (…-v2) were analysed using FT-NIR. Overall, 280 samples (35 replicates for each species/variety) were analysed in the range of 10,000-4000 cm⁻¹, and the average spectral profiles of all samples are shown in Figure 5. For all models, the best-performing pre-processing was quantile normalisation followed by Pareto scaling. The differences between samples lay essentially in the absorption intensities. The FT-NIR bands, attributed to stretching and bending vibrations of characteristic functional groups, fell in five regions: (i) 4200-4800 cm⁻¹; (ii) 4800-5250 cm⁻¹; (iii) 5400-6000 cm⁻¹; (iv) 6300-7200 cm⁻¹; and (v) 8000-8800 cm⁻¹. Because the spectral profiles showed differences in the absorbance intensities of the Ocimum samples, chemometric models were built to discriminate between and classify the samples according to their species/variety.
Initially, a preliminary exploratory analysis of the data was performed using PCA. The unsupervised PCA model obtained from the FT-NIR spectra of all samples revealed the general structure of the complete dataset: the first two principal components cumulatively accounted for 62.5% of the total variation. PC1 accounted for 39% of the variance and discriminated HT-v2, OS-v1, and OK from OA and OS-v2 samples, whereas PC2 accounted for 23.5% of the variance and discriminated OS-v2 and HT-v2 samples from all other Ocimum samples (Figure S13).
Supervised PLS-DA was then performed to find a small number of linear combinations of the original variables that predict class membership and describe most of the variability of the FT-NIR metabolic profiles of all Ocimum groups. As presented in Figure 5C, eight distinct clusters were identified in the PLS-DA scores plot, in which the first two components cumulatively accounted for 49.1% of the total variation: the first component (27.4%) separated OB-v1, OB-v2, HT-v1, and HT-v2 from OS-v1, OS-v2, and OA, and the second component explained 21.7% of the variation. The PLS-DA models were validated with two-thirds of the samples as the calibration set and the remaining one-third as the validation set (Figure 6). The PLS-DA model achieved 100% sensitivity, 100% specificity, 100% accuracy, and 100% reliability for both the training and validation sets of all Ocimum groups. Table 1 shows the figures of merit for the complete PLS-DA model, and Figure 6 illustrates the corresponding predictions for the Ocimum sample groups.
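The pre-processing and exploratory steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: quantile normalisation is assumed to act across samples (every spectrum is mapped onto a common reference distribution, the mean of the sorted spectra), Pareto scaling mean-centres each wavenumber and divides it by the square root of its standard deviation, and PCA is computed by singular value decomposition. The function names and the random demo matrix are hypothetical; with real data the input would be the 280-sample FT-NIR absorbance table (1557 variables per spectrum, as stated for the KNN model).

```python
import numpy as np


def quantile_normalise(X):
    """Quantile-normalise the rows (samples) of X so that every spectrum has
    the same value distribution: the mean of the sorted rows.
    Ties are broken by sort order (a simplification)."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=1)                # column index of each rank, per row
    reference = np.sort(X, axis=1).mean(axis=0)  # mean value at each rank
    Xn = np.empty_like(X)
    rows = np.arange(X.shape[0])[:, None]
    Xn[rows, order] = reference                  # write reference values back by rank
    return Xn


def pareto_scale(X):
    """Mean-centre each variable and divide by the square root of its SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                            # guard against constant variables
    return (X - X.mean(axis=0)) / np.sqrt(sd)


def pca(X, n_components=2):
    """PCA of an already-centred matrix via SVD.
    Returns the scores and the fraction of total variance per component."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Hypothetical demo: 40 simulated "spectra" with 100 variables each
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 100)) + np.linspace(0, 1, 100)
scores, evr = pca(pareto_scale(quantile_normalise(X_demo)), n_components=2)
print(scores.shape, evr.round(3))
```

Pareto scaling leaves each variable centred, so the SVD of the scaled matrix directly gives the PCA decomposition; the explained-variance fractions play the role of the 39% and 23.5% figures quoted for PC1 and PC2.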
The calculated values for the DD-SIMCA model using FT-NIR data are presented in Table 2, with 100% sensitivity for all Ocimum training sample groups. For the test set, 100% specificity was obtained for all groups of Ocimum samples.
Note: PCs denotes the number of principal components; DoF denotes the degrees of freedom; SD denotes the score distance; OD denotes the orthogonal distance; SEN denotes the sensitivity of the model; SPE denotes the specificity of the model; α and γ denote the type I error and the outlier significance level, respectively.
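The sensitivity and specificity values reported for the PLS-DA and DD-SIMCA models are commonly computed per class in a one-versus-rest fashion. The sketch below assumes the standard definitions SEN = TP/(TP + FN) and SPE = TN/(TN + FP); the exact formulas used in the study (including the one behind the "reliability" figure) are not given in the text, so reliability is omitted. The function name and the predictions are hypothetical.

```python
def class_metrics(y_true, y_pred, target):
    """One-vs-rest sensitivity, specificity and accuracy for one class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == target:            # sample truly belongs to the target class
            if p == target:
                tp += 1
            else:
                fn += 1
        else:                      # sample belongs to another class
            if p == target:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"SEN": sensitivity, "SPE": specificity, "ACC": accuracy}


# Hypothetical predictions for three of the Ocimum groups
y_true = ["OB-v1", "OB-v1", "OS-v1", "OS-v1", "OA", "OA"]
y_pred = ["OB-v1", "OS-v1", "OS-v1", "OS-v1", "OA", "OA"]
for group in ["OB-v1", "OS-v1", "OA"]:
    print(group, class_metrics(y_true, y_pred, group))
```

In this toy example one OB-v1 sample is assigned to OS-v1, so OB-v1 has SEN = 0.5 and OS-v1 has SPE = 0.75, while OA is classified perfectly; a model with no false positives or false negatives, as reported in the study, would score 1.0 on every class.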
DD-SIMCA, a one-class classification method, was also used to identify species- and variety-specific Ocimum samples. The method consists of two steps: first, the training data matrix is decomposed by PCA; second, new samples are classified using the derived principal components, with the acceptance area represented in an orthogonal distance (OD) vs. score distance (SD) acceptance plot whose boundary is set by the α value. The α value specifies the type I error, i.e., the rate of false-negative decisions. For external validation, 70% of the target-class samples from each species/variety were used as the calibration set and the remaining samples as the test set. The acceptance plots for the training and test sets are shown in Figure 7. One hundred percent sensitivity and specificity were obtained for samples from all four groups. The performance of DD-SIMCA is summarised in Table 2.
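The two DD-SIMCA steps can be illustrated with a compact NumPy sketch of one common formulation: the score distance (SD) is computed from the PCA scores scaled by their variances, the orthogonal distance (OD) is the squared residual, the scale and degrees of freedom of each distance are estimated by the method of moments, and a sample is accepted when its combined distance falls below a chi-square limit at significance level α. This is not the software used in the study; the chi-square quantile uses the Wilson-Hilferty approximation so that only NumPy and the standard library are needed, and the class name, helper names, and simulated data are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def mom_dof(d):
    """Method-of-moments estimates: scale d0 = mean(d), DoF N = 2*d0^2/var(d)."""
    d0 = float(d.mean())
    var = float(d.var(ddof=1))
    n = int(round(2.0 * d0**2 / var)) if var > 0 else 1
    return d0, max(n, 1)


class DDSIMCA:
    def __init__(self, n_components=2, alpha=0.05):
        self.k = n_components
        self.alpha = alpha

    def _distances(self, X):
        Xc = X - self.mean_
        T = Xc @ self.P_                      # PCA scores
        h = np.sum(T**2 / self.lam_, axis=1)  # score distance (SD)
        E = Xc - T @ self.P_.T                # residuals outside the PC subspace
        v = np.sum(E**2, axis=1)              # orthogonal distance (OD)
        return h, v

    def fit(self, X):
        """Step 1: PCA of the target-class training matrix."""
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.P_ = Vt[: self.k].T                   # loadings
        self.lam_ = s[: self.k] ** 2 / X.shape[0]  # score variances
        h, v = self._distances(X)
        self.h0_, self.Nh_ = mom_dof(h)
        self.v0_, self.Nv_ = mom_dof(v)
        self.limit_ = chi2_quantile(1.0 - self.alpha, self.Nh_ + self.Nv_)
        return self

    def full_distance(self, X):
        h, v = self._distances(np.asarray(X, dtype=float))
        return self.Nh_ * h / self.h0_ + self.Nv_ * v / self.v0_

    def accepts(self, X):
        """Step 2: accept samples lying inside the acceptance area."""
        return self.full_distance(X) < self.limit_


# Hypothetical target class: 60 simulated spectra with 50 variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 50))
target = latent @ loadings + 0.1 * rng.normal(size=(60, 50))
model = DDSIMCA(n_components=2, alpha=0.05).fit(target[:42])  # 70% calibration
outsider = target[42:] + 3.0                                  # shifted "other class"
print("accepted test targets:", model.accepts(target[42:]).mean())
print("accepted outsiders:", model.accepts(outsider).mean())
```

With α = 0.05, roughly 95% of genuine target samples are expected to fall inside the acceptance area, which is why α is read as the type I (false-negative) error rate; samples from another class typically have a large OD and are rejected.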
Another supervised model, K-nearest neighbours (KNN), was used to classify the Ocimum samples. Initially, the complete model (built with all 1557 variables obtained from the FT-NIR spectra) did not perform well. Classification using the top 18 variables (less than 2% of the original number of variables) gave better results for all Ocimum groups. In this case, all 280 samples were considered: 105 samples were randomly selected for model validation (the test set), and the remaining 175 samples formed the training set. K = 16 was used for classification in the region of 7500 to 4300 cm⁻¹. The model classified all samples correctly, with 100% sensitivity and specificity and no false positives or false negatives. The corresponding results are presented in Table S4.
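A KNN workflow of the kind described above can be sketched with NumPy. The sketch is illustrative only: the Euclidean distance, the majority vote, and the ranking of variables by a between-/within-class variance ratio (used here to pick a small subset such as the 18 variables mentioned above) are assumptions, because the text does not state the distance metric or the variable-selection criterion. The function names and the simulated spectra are hypothetical; only the 280/175/105 sample counts, K = 16, and the 7500-4300 cm⁻¹ window come from the text.

```python
from collections import Counter

import numpy as np


def region_mask(wavenumbers, high=7500.0, low=4300.0):
    """Boolean mask selecting variables inside [low, high] cm^-1."""
    w = np.asarray(wavenumbers)
    return (w <= high) & (w >= low)


def rank_variables(X, y, n_keep):
    """Indices of the n_keep variables with the highest between-/within-class
    variance ratio (an F-like score; an assumed selection criterion)."""
    y = np.asarray(y)
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    score = between / np.maximum(within, 1e-12)
    return np.argsort(score)[::-1][:n_keep]


def knn_predict(X_train, y_train, X_test, k=16):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    y_train = np.asarray(y_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([Counter(y_train[idx]).most_common(1)[0][0] for idx in nearest])


# Hypothetical data: 8 groups x 35 replicates = 280 spectra, 1557 variables
rng = np.random.default_rng(42)
groups = ["OB-v1", "OB-v2", "OS-v1", "OS-v2", "HT-v1", "HT-v2", "OA", "OK"]
wn = np.linspace(10000, 4000, 1557)
y = np.repeat(groups, 35)
centres = rng.normal(scale=0.5, size=(8, 1557))
X = centres[np.repeat(np.arange(8), 35)] + rng.normal(scale=0.3, size=(280, 1557))

X = X[:, region_mask(wn)]                             # keep 7500-4300 cm^-1
perm = rng.permutation(280)
test, train = perm[:105], perm[105:]                  # 105 test / 175 training
keep = rank_variables(X[train], y[train], n_keep=18)  # rank on training data only
y_hat = knn_predict(X[train][:, keep], y[train], X[test][:, keep], k=16)
print("test accuracy:", np.mean(y_hat == y[test]))
```

Ranking the variables on the training set only keeps the 105 test spectra out of the selection step, so the reported test accuracy is not inflated by information leaking from the validation samples.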
Upon analysing the profiles, we noted that the selected variables lay in spectral regions related to hydroxyl groups (4817 cm⁻¹ and 4913 cm⁻¹) and to the C-O plus O-H combination and first-overtone region (5210 to 5314 cm⁻¹) [51]. Variations in the absorption intensities of these regions reflect the main compositional differences among the eight Ocimum species/varieties, justifying the selection of these variables.
Overall, the work presented here involves a comprehensive approach including both FT-NIR-based rapid fingerprinting and mass-spectrometry-based confirmatory methods for the identification of Ocimum species. The key strength of the present work is the rapid screening of Ocimum samples without sample destruction and without the need to isolate essential oils from Ocimum leaves for species identification. Further, complementary mass-spectrometry-based approaches requiring minimal sample preparation are presented for confirming the species and variety of Ocimum samples. Identification of Ocimum species in real time in the field remains challenging; the present approach needs to be further explored for the real-time screening of Ocimum leaves in the field, which would greatly help farmers.

Conclusions
Natural medicinal herbs contain many diverse metabolites, and the classification of their species, varieties, and blends is difficult to accomplish. Comprehensive metabolomic fingerprinting approaches may offer rapid and confirmatory metabolite fingerprinting for the classification of medicinal herbs. In this study, we presented for the first time the species and variety discrimination of the medicinal herb Ocimum using rapid FT-NIR-based metabolic fingerprinting and LC-MS- and GC-MS-based untargeted metabolomics, aided by chemometrics including multivariate and one-class models. The high predictive ability of models including PLS-DA, DD-SIMCA, and KNN, in which all samples were correctly classified, was demonstrated. Untargeted LC-MS-based metabolomics identified flavonoids and phenolics as the major classes of metabolites distinguishing the species-specific variation of Ocimum samples. Moreover, metabolic fingerprinting enabled classification at the variety level within the different Ocimum species. The key advantage of the comprehensive metabolic fingerprinting system is that a large number of samples can be screened using NIR-based fingerprinting, and non-conforming samples can be referred for confirmatory analysis using LC-MS-based metabolite marker analysis. The present work demonstrated that a two-tiered system, combining the rapid fingerprinting method with a confirmatory method, is appropriate for classifying Ocimum species and varieties. The present strategy can also be used for the classification of other natural products and herbs with various species, varieties, and chemotypes.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/metabo13010122/s1: Figures S1-S13; Tables S1-S4. … Figure S8: Random forest classification models for Ocimum samples. Figure S9: Variety-specific variations of Ocimum samples. Figure S10: QC plot and PCA plot for Ocimum species samples obtained by GC-MS. Figure S11: PLS-DA cross-validation results for Ocimum samples analysed by GC-MS. Figure S12: Random forest classification models for Ocimum samples acquired using GC-MS. Figure S13: PCA of FT-NIR spectral profiles of Ocimum samples from different species and varieties. Table S1: Ocimum species and varieties used in the present study. Table S2: Discriminatory markers based on LC-MS/MS for Ocimum samples from different species. Table S3: Discriminatory markers for Ocimum samples from different species. Table S4: Classification results for test samples with K-Nearest Neighbors.
Funding: The authors thank the Department of Science and Technology-SERB research grant (SRG/2021/000750-G) and the Department of Biotechnology, Government of India, for the Ramalingaswami grant (BT/RLF/Re-entry/21/2020) for funding. Prof. C.E. thanks the Bualuang ASEAN Chair Professor Fund. The authors also thank the Aroma Mission (HCP0007).
Institutional Review Board Statement: Not applicable. No institutional review board or ethics committee was involved in this study, as the work did not involve humans, animals, or their samples. CSIR-CIMAP publication communication number CIMAP/PUB/2022/139.

Informed Consent Statement:
Not applicable as the present study did not involve any human subjects.

Data Availability Statement:
The data presented in this study are available in this article and in the supplementary information.