Assessing the Impact of Geographical Distribution and Genetic Diversity on Metabolic Profiles of a Medicinal Plant, Embelia ribes Burm. f.

The extensive use of Embelia ribes Burm. f. (Embelia) in tribal medicine has drawn global attention to it as a promising candidate in complementary and alternative medicine. Knowledge of chemical blends is a prerequisite for the selection of raw materials for herbal medicine formulations; however, the influence of geographical distance and genetic diversity on the metabolome of Embelia fruits is unknown. Therefore, we collected Embelia fruits from four locations across the Western Ghats of India and analyzed their metabolic profiles and genotypic diversity by liquid chromatography-tandem mass spectrometry (LC-MS/MS) and inter simple sequence repeats (ISSR), respectively. LC-MS/MS analysis yielded 583 compounds, which trimming reduced to 149 compounds. Further, MS/MS analysis identified 36 compounds, of which 30 are reported here for the first time from Embelia. These compounds belong to 11 compound classes and suggest location-specific chemical blends of Embelia fruits. Multivariate analysis showed 94% compound diversity across the accessions. ISSR analysis suggests 95% polymorphism across the accessions. A significant positive correlation (80%) between the metabolomic and genotypic data matrices supports the genotype's influence in tuning Embelia's metabolic profiles. We conclude that the chemical profiles of Embelia are location-specific, which can inform the sustainable selection of raw material for the herbal trade.


Introduction
Embelia ribes Burm. f. (Embelia) [1] is a liana that belongs to the family Primulaceae. Since ancient times, the use of Embelia fruits has been widespread in the form of the drug Vidanga. Embelia's distribution in India ranges from the outer Himalayas to the Western Ghats, at an elevation of up to 1500 m [2] in fragmented populations. The sparse distribution of Embelia in the evergreen to moist deciduous forests of Maharashtra, Karnataka, and Kerala in the Western Ghats and Tamil Nadu in the Eastern Ghats, is evident. The different plant parts of Embelia are used in herbal formulations, as it is rich in various medicinally important molecules, namely embelin and its derivatives, embeliol, 5-O-Methylembelin, and vilangin that have high commercial value. Hence, the pharmacology of Embelia has attracted global attention as a promising candidate in various traditional, complementary, and alternative systems of medicine. In traditional and tribal medicine, the fruits, stems, roots, and leaves of Embelia are used as anthelmintic, antipyretic, and antimicrobial agents. In addition, they are also used to treat diarrhea, kidney stones, snake bites, and bronchitis. However, the fruits of Embelia are more common in traditional medicine and herbal formulations than their leaves and roots. Interestingly, the use of the same plant parts varies from region to region. For example, fruit decoction of Embelia is given for intestinal worms in Kerala [3], antimicrobial medication in Karnataka [4], diarrhea in Arunachal Pradesh, and influenza and snakebite in the Khandesh region of Maharashtra [5]. However, as the climatic condition of the Khandesh region is not suitable for Embelia growth, its use in this region questions the authentication of the used fruits.
The geographical origin and climatic conditions are notable factors that affect the metabolome of a plant. Plants adapt to different geographical, climatic, and edaphic conditions through genotypic and phenotypic alterations. Genotypic alteration also influences the production and accumulation of secondary metabolites in plants [6,7]. Though the specialized metabolite profile is unique to individuals within a species or a close taxonomic group, it may change if its biosynthetic pathways are influenced by environmental conditions such as climate, soil, pathogen infection, and pest infestation. Therefore, regional variation in the use of the same plant parts can be due to the presence of different blends or proportions of active compounds in them, which in turn links to the geography and climate of the habitat of Embelia. In 2011, Saurabh demonstrated that geographical topography and climatic conditions profoundly affect the levels of polyphenols in the bark of Bridelia retusa [8]; similarly, the effect of geographical and climatic conditions on the camptothecin content in Nothapodytes nimmoniana has also been demonstrated [9]. Recently, a comparative study of the concentrations of embelin, Embelia's principal component, in fruits from different geographical regions underlined that the geographic location of a plant could govern the concentration of its principal component [10]. Previously, comparative analyses of the fatty acid composition in the seeds of E. schimperi [11] and of fatty acids and esters from E. basal [12] have been attempted; however, the non-targeted metabolomics of Embelia fruits from different geographical regions is yet to be revealed.
Metabolomics has been used for the identification of marker compounds to detect physiological changes, genotype differences, geographic origin, and quality control [13,14]. In a study with the fruit of Butia sp., rutin, epicatechin, isorhamnetin, and stilbene were identified as the significant markers in discriminating the geographic origins and species [15]. Moreover, with the globalization of traditional medicine systems such as Ayurvedic medicine and traditional Chinese medicine, many Asian medicinal plant species are being introduced to cultivation outside of their geographical origins, particularly in the EU and US [16]. As these ecosystems are significantly different from their native origin, these species suffer from measurable differences in chemical composition. In general, studies focus on a single compound or a class of compounds to evaluate the effect of climatic or geographical conditions [17]; however, a plant's whole metabolome is rarely considered for intensive analysis in response to changes in its geographical conditions. To improve global health through traditional and complementary medicine, the World Health Organization (WHO) emphasizes the rational use of raw materials based on evidence and strategic research in traditional and complementary medicine. Embelia, a tremendously important medicinal plant of traditional and complementary medicine, is widely exploited. Therefore, determining the metabolic profile of Embelia is a prerequisite for developing new drugs. However, the metabolome of Embelia and the impact of different geographical regions on it is not known. Determining Embelia's metabolic profile from different locations will provide its complete metabolic blend and reveal new compounds with potential medicinal importance. Moreover, it will allow the sustainable use of Embelia fruits. 
Therefore, in the current study, we analyzed the non-targeted metabolomic profiles of Embelia fruits collected from different regions of the Western Ghats of India using liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) to identify potential compounds for drug development, and attempted to reveal the relationship between the metabolomic and genotypic diversity.

Molecular Identification of Fruit Samples Collected from Different Geographical Regions
Fruit samples collected across all the accessions (Figure 1A) were evaluated for their size and texture. Interestingly, fruits from Kodkani, Karnataka, and Wayanad, Kerala, are significantly larger and smaller than those from other accessions, respectively (Figure 1B). The fruit size of the Kodkani accession is significantly greater than that of the other accessions (one-way ANOVA, F4,18 = 12.91; p = 0.0002). Moreover, the fruits from Kodkani were spherical, whereas fruits from other locations were obovate. On the other hand, Wayanad fruits are the least wrinkled among the accessions (Figure 1B inset). Dissimilarities in Embelia fruit morphology necessitate the molecular identification of the collected samples. The chloroplast maturase K gene (matK) is one of the most variable coding genes of angiosperms and is used for the identification of plant species. Molecular identification by matK amplification is often used for the accurate genetic identification of an organism. Therefore, all the samples were evaluated by comparing the sequence homology of the matK gene with the reported sequence of E. ribes in the NCBI database (Table S1). The maximum percent identity and query cover thresholds were more than 98% and 85%, respectively. Further, a phylogenetic tree was constructed with the matK sequences from all the Embelia accessions and the E. tsjeriam-cottam (Roem. & Schult.) A. DC. and Maesa indica (Roxb.) A. DC. sequences (Figure 1C); E. tsjeriam-cottam and M. indica were used as outgroups. Embelia accessions collected across the Western Ghats did not show any genetic distances, whereas the outgroups showed a significant genetic distance from the Embelia accessions (Figure 1C). Our results confirmed the identity of the collected fruits as E. ribes.

Metabolic Profiles of Embelia Fruits Vary across the Accessions
The ethnopharmacological uses of Embelia fruits differ across geographical regions. The use of Embelia fruits for different ailments suggests that fruits from different locations may have a different blend of compounds that are useful for specific diseases. Therefore, we analyzed the metabolic profiles of Embelia fruits from different accessions. A total of 583 compounds were obtained from negative and positive modes from all the accessions (Data S1). The maximum number of compounds was present in the Kodkani accession (380 compounds); the Manoli accession comprised 352 compounds; the Wayanad and Nadpal accessions comprised 346 and 341 compounds, respectively (Data S1).
The compounds present in all five replicates of any of the accessions were tentatively identified based on their formula. This trimmed data set contained 149 compounds (Table S2), and the identity of 36 compounds was determined in MS/MS mode (Table 1, Figures S1 and S2). The identity of embelin, a prime compound of Embelia, in the fruit samples was confirmed by HPLC (Figure S3). All further analysis was done with the 36 identified compounds. As in the raw data, the Kodkani accession possesses the maximum number of compounds (34), followed by Wayanad (27), Manoli (8), and Nadpal (6). Interestingly, out of the 36 identified compounds, 30 are reported for the first time from Embelia.
A heatmap was created with the identified compounds to visualize the compound diversity (Figure 2). Principal coordinate analysis (PCoA) and cluster analysis determined the relationship between the accessions. PCoA ordination captured maximum variation in coordinates 1 and 2, where the Kodkani and Wayanad accessions form a group, but the Manoli and Nadpal accessions are placed separately (Figure 3A). Like the PCoA, the cluster analysis also placed the Kodkani and Wayanad accessions in the same clade (Figure 3B); the Manoli accession is close to this cluster but in a separate clade. The three accessions are grouped as one large cluster that is further connected to a separate clade of the Nadpal accession. The high bootstrap value (99) of the cluster of the Kodkani and Wayanad accessions signifies their metabolic similarity compared to the other accessions.
We looked further for compound similarities across the accessions by creating a Venn diagram (Figure 3C, Table 1). We found that only two compounds, embelin and n-propyl sec-butyl disulfide, are present in all the accessions. The Manoli and Wayanad accessions each possess one unique compound (hexenyl-(3z)-hexenoate (3z-) and bolegrivialol, respectively); however, the Kodkani accession possesses six unique compounds (Figure 3C, Table 1) that belong to the phytophenols, namely 3,4,5-Trihydroxyflavanone, Dihydro 3-coumaric acid, quercetin, isoquercitrin, naringenin, and piceatannol (Table 1). In contrast, the Nadpal accession did not show any unique compounds. Moreover, through the pairwise analysis of similarity percentage (SIMPER test; Bray-Curtis dissimilarity index) in the compounds at a 50% cut-off, we identified 15 compounds responsible for the grouping of the accessions (Table S3).
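The Venn-diagram comparison described above amounts to set intersections and differences over per-accession compound lists. The following is a minimal sketch of that logic; the `presence` dictionary is a small illustrative subset assembled from the compounds named in the text, not the full Table 1, and the function names are hypothetical.

```python
from functools import reduce

# Toy presence lists: an illustrative subset only, NOT the full Table 1 data.
presence = {
    "Kodkani": {"embelin", "n-propyl sec-butyl disulfide", "quercetin", "piceatannol"},
    "Wayanad": {"embelin", "n-propyl sec-butyl disulfide", "bolegrivialol"},
    "Manoli": {"embelin", "n-propyl sec-butyl disulfide", "hexenyl-(3z)-hexenoate"},
    "Nadpal": {"embelin", "n-propyl sec-butyl disulfide"},
}


def shared_by_all(presence):
    """Compounds detected in every accession (the Venn diagram's centre)."""
    return reduce(set.intersection, presence.values())


def unique_to(presence, accession):
    """Compounds detected in one accession and in no other."""
    others = set().union(
        *(compounds for name, compounds in presence.items() if name != accession)
    )
    return presence[accession] - others


if __name__ == "__main__":
    print("Shared by all:", sorted(shared_by_all(presence)))
    for name in presence:
        print(name, "unique:", sorted(unique_to(presence, name)))
```

With the complete identification table in place of the toy subset, the same two functions would reproduce the counts of shared and accession-specific compounds shown in the Venn diagram.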

Identified Compounds Represent Major Chemical Groups and Potential Medicinal Properties
Compounds identified by MS/MS analysis were classified according to their chemical classes, which include quinones, phytophenols, terpenoids, organosulfur compounds, organic acids, aliphatic hydrocarbons, lipids, oxygenated hydrocarbons, organic oxygen compounds, organic heterocyclic compounds, and amino acids (Figure 4). Interestingly, the Kodkani accession possesses all 11 groups of compounds, which suggests that it is the most diverse accession of all. Compounds present in the Manoli, Nadpal, and Wayanad accessions represent six, four, and 10 compound groups, respectively. Quinones and organosulfur compounds are present across all the accessions; however, different accessions show a different blend of compounds (Figure 4). Reports suggest that most identified compounds are active against different ailments, mainly cancer, cardiac disorder, and oxidative stress (Table S4).

Accessions of Embelia Are Genetically Diverse
Differences in fruit phenotype hinted towards the genetic diversity of Embelia across the accessions. Moreover, genetic variability is one of the prime factors that tunes plants' metabolic profiles. Therefore, to check if the genotypic diversity orchestrates the metabolic variations across the Embelia accessions, we examined the genotypic diversity of Embelia accessions with inter simple sequence repeat (ISSR) markers. As demonstrated in a previous study, we used nine ISSR markers (Table S5) that are polymorphic for Embelia plants [18]. All the markers yielded robust and reproducible polymorphic amplification patterns. Ninety-four bands were generated, with an average of 10.4 products per primer. Among these, 90 bands (95.74%) were polymorphic and four (4.25%) were monomorphic (Table S6). The number of polymorphic bands ranged from six (in the case of primer 816) to 13 (in the case of primer 857) (Table S6). While primers 809, 857, and 881 had a high percentage of polymorphic bands (PPB), primer 816 showed the lowest PPB (6.38%) (Table S6). A hierarchical cluster was generated with the binary matrix using an unweighted neighbor-joining method with Jaccard's coefficient of dissimilarity, which clustered the accessions into two main clades; the clade containing the Embelia accessions was further divided into three clades (Figure 5). The presence of the Wayanad and Kodkani accessions in the same clade suggests that they share the maximum similarity. Interestingly, the Manoli accession shares similarities with the Wayanad, Kodkani, and Nadpal accessions. E. tsjeriam-cottam was used as an outgroup; it showed significant differences from the Embelia accessions and clustered separately. In the Embelia cluster, the high percentage of polymorphism suggests high genetic variability among the accessions under study.
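The band statistics reported above follow from simple counting on the binary scoring matrix. The sketch below illustrates that arithmetic under stated assumptions: the function name and the toy matrix are hypothetical, and a band is treated as polymorphic when it is present in some but not all scored samples.

```python
def polymorphism_summary(band_matrix):
    """Summarise a band-by-sample presence (1) / absence (0) matrix.

    Assumption: a band is polymorphic if it is present in at least one
    sample and absent in at least one other; otherwise it is monomorphic.
    """
    total = len(band_matrix)
    polymorphic = sum(1 for row in band_matrix if 0 < sum(row) < len(row))
    return {
        "total": total,
        "polymorphic": polymorphic,
        "monomorphic": total - polymorphic,
        "ppb": 100.0 * polymorphic / total,
    }


# Hypothetical toy matrix: 4 bands (rows) scored in 5 samples (columns).
toy_bands = [
    [1, 1, 1, 1, 1],  # monomorphic
    [1, 0, 1, 0, 0],  # polymorphic
    [0, 0, 1, 1, 0],  # polymorphic
    [1, 1, 0, 1, 1],  # polymorphic
]

if __name__ == "__main__":
    print(polymorphism_summary(toy_bands))
    # Arithmetic check against the totals reported in the text.
    print(round(100 * 90 / 94, 2))  # share of polymorphic bands
    print(round(94 / 9, 1))         # average bands per primer
    print(round(100 * 6 / 94, 2))   # primer 816: six bands out of 94
```

The last three lines use only the totals given in the text (94 bands, 90 polymorphic, nine primers); the value for primer 816 is consistent with its six polymorphic bands being expressed as a share of all 94 bands.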

Figure 5. Genotypic diversity among Embelia ribes accessions from different geographical regions.
After scoring the products amplified from Embelia and E. tsjeriam-cottam by nine ISSR primers, the binary data obtained were used to generate a phylogenetic tree. E. tsjeriam-cottam was used as an outgroup. The Wayanad and Kodkani accessions share the same clade, whereas the Manoli accession connects the Nadpal accession with the Wayanad and Kodkani accessions. The outgroup forms a separate clade. The scale at the bottom of the phylogenetic tree quantifies the Jaccard distance between the genotypes. Numbers at each tree node indicate the bootstrap value of the respective cluster computed with 1000 bootstrap replicates.

Metabolic Diversity in Embelia Fruits Correlates with Genotypic Diversity
As the cluster analyses of the metabolite and ISSR matrices showed a similar grouping of the accessions, we hypothesized that the metabolomic variation among Embelia accessions is due to their genotypic diversity. Therefore, we performed Mantel's test with both matrices. Mantel's test showed a significant positive correlation (R = 0.799; p = 0.043) between the genotypic and metabolic data matrices. Therefore, we conclude that genotype governs the metabolic diversity in Embelia fruits across the Western Ghats accessions.
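For readers unfamiliar with Mantel's test, the sketch below shows its general logic: the Pearson correlation between the off-diagonal entries of two distance matrices, with significance assessed by permuting the rows and columns of one matrix. It is a generic illustration, not the PAST implementation used in this study; the two toy matrices are hypothetical, and all permutations are enumerated exactly, which is only practical for a handful of objects.

```python
import itertools
import math


def upper_triangle(matrix, order):
    """Off-diagonal entries of `matrix` after relabelling objects by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel_exact(dist_a, dist_b):
    """Return (r, one-sided p) using every permutation of the objects."""
    n = len(dist_a)
    identity = tuple(range(n))
    reference = upper_triangle(dist_b, identity)
    observed = pearson(upper_triangle(dist_a, identity), reference)
    permutations = list(itertools.permutations(range(n)))
    # Count permutations whose correlation is at least as large as observed.
    hits = sum(
        1
        for perm in permutations
        if pearson(upper_triangle(dist_a, perm), reference) >= observed - 1e-12
    )
    return observed, hits / len(permutations)


# Hypothetical distance matrices for four objects (NOT the study data).
metabolite_dist = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.6, 0.8],
    [0.5, 0.6, 0.0, 0.4],
    [0.9, 0.8, 0.4, 0.0],
]
genetic_dist = [
    [0.0, 0.3, 0.5, 0.8],
    [0.3, 0.0, 0.7, 0.9],
    [0.5, 0.7, 0.0, 0.6],
    [0.8, 0.9, 0.6, 0.0],
]

if __name__ == "__main__":
    r, p = mantel_exact(metabolite_dist, genetic_dist)
    print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With larger matrices, the exhaustive enumeration would normally be replaced by a random sample of permutations.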

Discussion
The different parts of Embelia plants, especially the fruits, are predominantly used in folklore medicine. However, the active principles and their benefits and risks need to be studied in detail to integrate traditional medicine practices into healthcare systems. Limited information on the chemical profiles of Embelia fruits has made it challenging to predict the compound or blend of compounds presumably responsible for a specific ailment's remedy. The geographical region-specific use of Embelia fruits against various ailments prompted us to study their metabolic profiles across different geographical regions. Our study is the first attempt at the non-targeted metabolic profiling of Embelia fruits, and it reveals Embelia's chemical diversity in relation to geographical location, genotypic diversity, and ethnopharmacology. Moreover, the new compounds can be explored further for their functionality against ailments.
Our study highlighted that the metabolic blend of Embelia fruit varies across geographical regions. The climatic and edaphic differences restricted to a particular geographical area could govern the chemistry of a plant [8]. Chemical diversity in the plants also validates the region-specific use of the Embelia fruits against various ailments. Unfortunately, the metabolic profile of Embelia fruit is mainly unknown, as most of the studies focused on the major chemical groups, namely, quinones, flavonoids, essential oils, etc. [10,19,20]. We demonstrate a spectrum of varied compounds across the different accessions from the Western Ghats of India. On a geographical scale, two Karnataka accessions, Nadpal and Kodkani, are situated in proximity with an aerial distance of 132.35 km. Therefore, these accessions share fewer climatic and topographical variations. However, the multivariate analysis of the chemical compositions did not group Kodkani and Nadpal; surprisingly, Kodkani is grouped with Wayanad accession, which is 361.38 km (aerial distance) away from the Kodkani accession. Interestingly, the Manoli accession is 361.38 km and 632.65 km away from Kodkani and Wayanad accession, respectively, and showed similar percent similarity in the compound profile with both locations. These observations question the contribution of geographical parameters to the chemical diversity among different locations. Plant populations may respond genetically to differential selection pressures brought by environmental factors, such as climatic heterogeneity and geographic isolation [21], and plants' genetic diversity could be the major player in their observed metabolic diversity. An analysis of genetic diversity with ISSR markers revealed high genetic variation across the accessions. 
Embelia requires particular geographical conditions to grow and distribute in patches; therefore, high genetic variability among the Embelia accessions contradicts the fact that geographically restricted species tend to have less genetic variation than the standard widespread species [22]. On the other hand, reports also suggest that species distributed in patches showed more significant differentiation than the more continuously distributed species [23]. Moreover, species with small populations and less genetic variability are vulnerable to extinction [24]. Therefore, presumably, being a sparsely distributed and threatened species, Embelia showed high genetic variation among the accessions, making it less prone to extinction. A significant correlation between the matrices of ISSR and metabolic data confirms that metabolic diversity in Embelia fruit is attributed to its genetic variability.
In traditional and folklore medicine, Embelia fruits are mainly used as antihelmintic, antidyspepsia, appetizer, mild-laxative, carminative, alexiteric, and antipyretic agents. However, to treat a particular disorder successfully, the presence of a specific compound or a blend of compounds is necessary. Chemical variation in the plant sample can influence the effectiveness of the formulated drugs against a particular disease. Therefore, the selection of raw materials based on their chemical composition is a prerequisite. Variations in traditional use may rely on this hypothesis. Therefore, we categorized the identified compounds from Embelia fruits according to their functionality. The compounds having antipyretic, antihelmintic, and antioxidant activity are fairly equally distributed throughout all the accessions (Table S4); this supports the use of Embelia fruits as antipyretic and antihelmintic agents across all regions. In folklore medicine, Embelia fruits are used for the treatment of cancer, mainly in Karnataka [25]. Interestingly, Embelia fruits from Kodkani, one of the Karnataka accessions, are dominated by compounds with anticancerous properties. Previous reports have designated embelin and its derivatives from Embelia as anticancerous compounds; however, we identified other compounds with similar properties but different mechanisms of action. Moreover, piceatannol, a unique compound present in the Kodkani accession, is a hypoglycemic agent [26,27]; this correlates with the use of Embelia fruits against polyuria, which may be caused by the hyperglycemic condition. Compounds from the Karnataka accessions that control cardiac rhythm and have anticoagulant properties are also consistent with the local use of Embelia fruits against cardiac ailments. Phytochemicals are gaining importance as male contraceptives because their anti-fertility effects are reversible. For instance, sterility induced by embelin in male albino rats was reversed within 15-30 days [28]. Therefore, chemical constituents that show anti-fertility properties are clinically crucial for developing male contraceptives [29]. Polyphenols from Mucuna urens can inhibit the endogenous gonadotrophic activity in male albino rats [30]; therefore, the different phytophenols present can be examined as potential candidates for developing male contraceptives.
The current study is the first non-targeted metabolomics study of Embelia fruits, and it highlights region-specific compound blends governed by intraspecific genetic variations. We found a few compounds, namely 1-pentanesulfenothioic acid, 2-formylglutarate, ethyl 1-methylpropyl disulfide, galactonic acid, 3-isopropylmalic acid, and 3-furoic acid, whose functions are unknown. Testing these compounds against mammalian cell lines may reveal new potential uses. Moreover, the generated metabolomic profiles can be utilized as a metabolic fingerprint of the accessions during the formulation of new herbal drugs. The identification of favorable chemotypes will support quality control and the sustainable use of the plant materials.

Plant Material
Embelia fruits were used for metabolic and genetic diversity analysis. Fruits collected by the local villagers from the forest around Manoli, Kolhapur, were procured from local roadside sellers; fruits were also collected from the forests of Nadpal, Karnataka (Nadpal accession), Kodkani, Karnataka (Kodkani accession), and Wayanad, Kerala (Wayanad accession). The leaf tissues of E. tsjeriam-cottam and Maesa indica were collected from the Savitribai Phule Pune University (SPPU) medicinal garden and used for genetic diversity analysis as an outgroup. Specimen samples from each location and genotype are deposited at the Department of Botany, SPPU.

Metabolite Extraction and LC-MS/MS Acquisition
Dried Embelia fruits were pulverized, and ~40 mg of tissue was extracted in 400 µL of 70% methanol (v/v) spiked with the internal standard formononetin (2 µg mL⁻¹) by vortexing overnight. The extracts were centrifuged for 20 min at 13,000 rpm at 4 °C, and the supernatant was collected. Further, the supernatant was filtered through microfilters (20 µm) and subjected to LC-QTOF-MS/MS (Agilent Technologies, Stuttgart, Germany) for analysis.
A Zorbax C18 (1.7 µm, 2.1 mm × 100 mm) column was used to separate metabolites. The mobile phases used were 0.1% formic acid (solvent A) and acetonitrile containing 0.1% formic acid (solvent B). Separation was achieved at a flow rate of 0.3 mL min⁻¹ and a column temperature of 25 °C. The gradient began with an isocratic hold at 95% A for 1 min, followed by a gradient to 5% A and 95% B at 12.0 min and a return to 95% A and 5% B by 12.5 min, which was held until the 16th minute. The mass spectrometer was used in centroid mode for both negative and positive ionization with a pump limit of 1 min, a draw speed of 200 µL min⁻¹, and an eject speed of 400 µL min⁻¹. Samples were analyzed with an injection volume of 10 µL; the pressure limit in the column was maintained at 0 to 800 bar, the retention time (RT) exclusion tolerance was maintained at ±0.2 min, and the ion source (dual ESI) was operated with a limit of two precursors per minute.

LC-MS/MS Data Analysis
The personal compound database and library (PCDL) and the MassHunter Qualitative Analysis B.07.00 tool (Agilent Technologies) were used to analyze the compound spectra. The PCDL library was built mainly from the METLIN database, together with available libraries of medicinally important and general plant metabolites and of phytochemicals associated with medicinally important plants. MS/MS acquisition was performed with five replicate fruits to examine the biological variation within each accession and the reproducibility. To eliminate background contaminant compounds from the analysis, only the compounds with an abundance greater than 10,000 counts and a score of more than 70 were considered. These trimmed compounds were screened against the PCDL library, which resulted in 3574 compounds. Compounds were identified using the 'find by formula' function in the software package with a mass threshold of 7 ppm and a peak distance threshold of 10 ppm in MS mode. The resulting 583 compounds were further trimmed based on their presence in all five replicates of any accession, resulting in 149 compounds across the accessions. These compounds were analyzed in MS/MS mode for the presence of the daughter ions to confirm their identity. Finally, 36 compounds were identified in MS/MS mode (Table 1, Figures S1 and S2).
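The filtering criteria above can be expressed as three small checks: a mass error in parts per million, the abundance and score cut-offs, and the requirement that a compound occur in all five replicates of at least one accession. The sketch below is an illustration of those rules only; it does not reproduce the MassHunter workflow, and the function names, data layout, and example values are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_quality(abundance, score, min_abundance=10_000, min_score=70):
    """Abundance above 10,000 counts and score above 70, as stated in the text."""
    return abundance > min_abundance and score > min_score


def keep_consistent(detections, n_replicates=5):
    """Keep compounds detected in all replicates of at least one accession.

    `detections` maps compound -> accession -> set of replicate ids in
    which the compound was detected (a hypothetical layout for this sketch).
    """
    return sorted(
        compound
        for compound, by_accession in detections.items()
        if any(len(reps) == n_replicates for reps in by_accession.values())
    )


if __name__ == "__main__":
    # Hypothetical m/z values chosen only to illustrate the 7 ppm threshold.
    error = ppm_error(300.0015, 300.0000)
    print(f"{error:.2f} ppm, within threshold: {abs(error) <= 7}")

    toy = {
        "compound_A": {"Kodkani": {1, 2, 3, 4, 5}, "Nadpal": {1, 2}},
        "compound_B": {"Kodkani": {1, 2, 3}, "Wayanad": {2, 3, 4, 5}},
    }
    print(keep_consistent(toy))
```

In this toy input, only the compound found in all five replicates of one accession survives the replicate filter, which mirrors the trimming step from 583 to 149 compounds.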

Genomic DNA Isolation, and Polymerase Chain Reaction (PCR)
DNA was isolated using the CTAB-STE method of Doyle and Doyle with modifications [31,32]. Isolated DNA was dissolved in TE buffer and quantified in a Nanodrop (ND-1000 Spectrophotometer). A total of 100 ng of isolated DNA was amplified using the matK gene-specific primer pair (forward 5′-TCCGCTACTGGGTAAAAGATG-3′, reverse 5′-ATATCGCCCCAAATCGGTCA-3′) and nine inter simple sequence repeat (ISSR) primers (Table S5) in a thermocycler (Biorad) with one cycle of initial denaturation (95 °C for 2 min), 40 cycles (matK) or 34 cycles (ISSR) of denaturation (95 °C for 15 s), annealing (60 °C for matK or 50-60 °C for ISSR) (Table S5) for 1 min, and extension (72 °C for 30 s), and one cycle of final elongation (72 °C for 7 min). The amplified DNA was separated on a 1.5% agarose gel and documented on a gel documentation system (Biorad).

Sequencing and Data Analysis
Products amplified with the matK primers were outsourced for Sanger sequencing (Bioserve). The sequences obtained were analyzed using the Chromas (version 2.6.6) software and aligned using multiple sequence comparison by log expectation (MUSCLE; www.ebi.ec.uk (accessed on August 2021)). Homology searches were performed within the GenBank non-redundant database using the BLAST algorithm (http://www.ncbi.nim.nih.gov./BLAST/ (accessed on 31 August 2021)). All the matK sequences generated have been deposited in the NCBI GenBank database (Accession numbers: BankIt2604113: Seq1-OP081086, Seq2-OP081087, Seq3-OP081088, Seq4-OP081089, Seq5-OP081090, and Seq6-OP081091). Evolutionary analyses were conducted in MEGA X (version 10.0.5) to generate an optimal tree with a sum of branch length of 6.743 and a bootstrap of 1000 replicates. The evolutionary distances were computed using the maximum composite likelihood method with the Tamura 3-parameter model and are in units of the number of base substitutions per site. All ambiguous positions were removed for each sequence pair using the pairwise deletion option. There was a total of 831 positions in the final dataset. The tree was drawn to scale, with branch lengths in the same units as the evolutionary distances.

Analysis of Genetic Diversity and Construction of Phylogenetic Tree
Amplicons obtained from each ISSR primer were scored for their presence (1) and absence (0) across the accessions to analyze the genetic diversity. The percentage of polymorphism and the polymorphic bands were calculated across the accessions and the primers used, respectively [18]. The binary data matrix was converted into a genetic similarity matrix, and a neighbor-joining tree was obtained with maximum likelihood using the Jaccard coefficient with the DARwin 6 (6.0.21) genetic analysis tool. The length of the obtained tree branches was verified with the unweighted neighbor-joining method.
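The conversion of the binary band matrix into pairwise distances can be illustrated with the standard Jaccard dissimilarity, which ignores bands absent from both samples. The sketch below is a generic implementation, not the DARwin routine itself; the function names and the toy band profiles are hypothetical.

```python
def jaccard_dissimilarity(profile_a, profile_b):
    """1 - (bands shared) / (bands present in at least one of the two samples)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    if either == 0:
        return 0.0  # no bands in either sample: treat the profiles as identical
    return 1.0 - shared / either


def distance_matrix(profiles):
    """Pairwise Jaccard dissimilarities for a dict of name -> 0/1 band profile."""
    names = list(profiles)
    matrix = [
        [jaccard_dissimilarity(profiles[p], profiles[q]) for q in names]
        for p in names
    ]
    return names, matrix


# Hypothetical band profiles (1 = band present, 0 = band absent).
toy_profiles = {
    "sample_1": [1, 1, 0, 1],
    "sample_2": [1, 0, 0, 1],
    "sample_3": [0, 0, 1, 0],
}

if __name__ == "__main__":
    names, matrix = distance_matrix(toy_profiles)
    for name, row in zip(names, matrix):
        print(name, [round(value, 3) for value in row])
```

A symmetric matrix of this kind is the input that a neighbor-joining procedure would take to build the tree.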

Statistical Analysis
Data were analyzed by one-way ANOVA followed by Tukey's post hoc test, and the significance was determined at p ≤ 0.05. Multivariate analysis (PCoA, clustering, SIMPER) of metabolomics data was done in PAST 3 [33]. DARwin 6 genetic analysis tool [34] and MEGA X were used for genetic diversity analysis.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants11212861/s1. Data S1: compounds obtained from negative and positive modes from all the accessions in LC-MS/MS; Table S1: NCBI blast results for matK sequences; Table S2: compounds identified tentatively in both positive and negative mode of LC-MS; Table S3: contribution of differentially regulated compounds in groupings of the accessions; Table S4: reported functions of the identified compounds from Embelia accessions; Table S5: list of ISSR primers with their nucleotide sequences and annealing temperatures used for amplification; Table S6: analysis of genetic diversity in Embelia accessions by ISSR primers; Figure S1: representative LC-MS chromatograms (BPC) from fruit samples from all accessions; Figure S2: MS and MS/MS spectra for 36 identified compounds from different accessions of Embelia fruits; Figure S3: confirmation of embelin from fruit samples.

Data Availability Statement: All the matK sequences generated have been deposited in the NCBI GenBank database (Accession numbers: BankIt2604113: Seq1-OP081086, Seq2-OP081087, Seq3-OP081088, Seq4-OP081089, Seq5-OP081090, and Seq6-OP081091).