DNA Barcoding of Wild Plants with Potential Medicinal Properties from Faifa Mountains in Saudi Arabia

Wild medicinal plants are the main source of active ingredients and provide a continuous natural source for many folk medicinal products, a role that is important for society's health and has an impressive record of utilization. Thus, surveying, conserving, and precisely identifying wild medicinal plants is required. The current study aimed to precisely identify fourteen wild-sourced medicinal plants from the Faifa Mountains in Jazan province, southwest Saudi Arabia, using the DNA barcoding technique. Two DNA regions (nuclear ITS and chloroplast rbcL) were sequenced and analyzed for the collected species using BLAST-based and phylogeny-based identification methods. Based on our analysis, ten of the fourteen species were successfully identified by DNA barcoding, five were identified as morphologically inspected, and three were morphologically indifferent. The study was able to distinguish some key medicinal species and highlights the importance of combining morphological observation with DNA barcoding to ensure the precise identification of wild plants, especially if they are medicinally relevant and associated with public health and safety.


Introduction
There is no doubt that medicinal plants are used by a substantial portion of society, and their use is widely recognized. The use of herbs for maintaining human health, especially for chronic diseases, has been practiced worldwide for centuries [1]. In developed countries, people are increasingly using traditional medicine as an alternative to, or alongside, modern medicine [2]. Despite the commercialization of traditional medicine over the last few decades, many medicinal plants remain gathered from the wild [3].
It is imperative that medicinal plants are correctly identified if they are to be used in a safe manner [4,5]. The majority of medicinal plants are classified based on morphological characteristics by expert botanists or by the use of analytical techniques to determine their quality (for example, organoleptic, macroscopic, microscopic, and chemical profiling methods) [4,5]. Nevertheless, neither morphological features nor previous methods can easily identify related species, particularly in cases involving powders or processed products obtained from plants [4,5]. In this regard, species adulteration and the use of spurious materials have become increasingly important concerns for health and safety reasons [4,5].
The technique of identifying biological specimens using short DNA sequences is called DNA barcoding [6,7]. A DNA barcode method for global species identification was first developed by Hebert et al. [8], attracting worldwide attention [9-11]. Plant DNA barcoding is imperative for the conservation and utilization of plants, as well as to identify species

Study Area
The study examined the flora inhabiting the Faifa Mountains (17°

Sample Collection
Fourteen different species of plants belonging to 12 families were collected from their high-altitude natural habitats during the summer of 2021. A leaf sample was collected from each species (approximately 25 g), and all samples were labeled with a site code and dried immediately with silica gel at room temperature for DNA extraction. Species identification and assignment were independently confirmed prior to the molecular studies and were based on an assessment of morphological descriptors (Tropicos.org).

DNA Extraction, PCR Amplification, and Sequencing
The total genomic DNA of each sample was isolated from ~200 mg of dried leaves using the WizPrep™ gDNA Mini Kit (Cell/Tissue; Wizbiosolutions Inc, Gyeonggi-do, Republic of Korea), according to the manufacturer's instructions, with a final elution volume of 50 µL. The isolated DNA was tested for quality by 1% gel electrophoresis and visualized under UV light using the Ingenius3 Gel documentation system (Syngene, Cambridge, UK). Extracted DNA was stored at −20 °C until required for PCR.

Sequence Alignment and Data Analysis
After sequencing, the chromatograms obtained were further analyzed using Geneious R10 [28]. To check the quality of each sequence, the peaks corresponding to each nucleotide were examined, and a consensus sequence was produced after trimming the poor-quality sequence ends and aligning the forward and reverse sequences. The consensus sequences were identified using the BLAST search tool against the NCBI database with default parameters. The rbcL and ITS sequences were each aligned separately with their BLAST query results using the MAFFT aligner [29], as implemented in Geneious R10. The phylogenies for each gene region were generated using the maximum likelihood (ML) method. The ML trees were reconstructed in MEGA X with 1000 bootstrap replicates [30]. To investigate the influence of the phylogenetic estimation method on our results, we also carried out Bayesian analysis with the parallel version of MrBayes 3.2 [31], using the HKY85 model implemented in Geneious R10.
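The end-trimming and quality check described above can be illustrated with a minimal sketch. It assumes per-base Phred scores are available as a list of integers and uses a simple threshold rule at each end; this is not the trimming algorithm used by Geneious, and the function names and the cutoff of 20 are illustrative assumptions.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Remove bases from both ends until a base with Phred >= min_q is reached.

    Simplified stand-in for chromatogram end trimming (assumption:
    a plain per-base threshold, not the rule used by Geneious).
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def q20_fraction(quals, threshold=20):
    """Fraction of bases whose Phred score is at least `threshold`."""
    if not quals:
        return 0.0
    return sum(q >= threshold for q in quals) / len(quals)


# Toy read (hypothetical values, not data from this study)
read = "ACGTACGT"
phred = [5, 10, 30, 35, 40, 12, 30, 8]
trimmed_seq, trimmed_q = trim_low_quality_ends(read, phred)
```

Only the ends are trimmed in this sketch, so a low-quality base inside the read is retained and lowers the Q20 fraction of the trimmed sequence.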

Morphological Observation and Provisional Identification
All the observed plant samples were identified as flowering plants belonging to the class Magnoliopsida (Angiosperms). The 14 plants represented two major clades, the Asterids (four species) and the Rosids (eight species), as well as two species of the unclassified order Caryophyllales. In the case of the Asterids, all the samples were identified as Lamiids, in which three orders were represented. The order Lamiales was represented by two species of the families Acanthaceae and Orobanchaceae, namely Barleria prionitis and Lindenbergia siniaca, respectively. The other orders were each represented by a single species: Trichodesma boissieri (family Boraginaceae, order Boraginales) and Withania somnifera (family Solanaceae, order Solanales; Figure 2).
In the case of the Rosids, two clades were identified. The Malvids were represented by Hibiscus micranthus (family Malvaceae, order Malvales) and Myrtus communis (family Myrtaceae, order Myrtales). The Fabids were represented by four orders: Zygophyllales with the single species Tribulus terrestris (family Zygophyllaceae), Rosales with the single species Sageretia thea (family Rhamnaceae), Malpighiales with two species, Acalypha sp. and Ricinus communis (family Euphorbiaceae), and Fabales with two species of the family Fabaceae, namely Crotalaria incana and Vachellia tortilis (Figure 2).

Chloroplast rbcL Gene

A. BLAST-based identification
The retrieved rbcL sequences ranged from 450 to 454 bp with an average of 452 ± 2 bp; the sequence quality was Q20 = 99.6%. The rbcL sequence of each sample was used to perform BLAST independently to retrieve the top hits available in the database, filtered by > 95% pairwise identity (PI). The BLAST search found that species (01) matched Rumex nepalensis (KX015758; PI = 99.7%), species (02) [...] (Table 1).

B. Phylogeny-based identification
The rbcL sequences, along with the top five BLAST hits, were aligned together and trimmed to equal length. The retained total nucleotide alignment was 452 bp, with 334 identical sites (73.9%), PI = 91.8%, and Q20 of at least 99.6% of the retained nucleotides. The nucleotide frequencies of non-gapped sites were 26%, 22.2%, 23.7%, and 28.1% for A, C, G, and T, respectively, with GC% = 45.9%. The maximum likelihood tree was constructed based on the aligned sequences and visualized as a rooted circular cladogram (Figure 3).
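The alignment summary statistics reported above (identical sites, base frequencies of non-gapped sites, and GC%) can be reproduced in principle with a short sketch such as the following. It assumes that an "identical site" is an alignment column in which every sequence carries the same unambiguous base; the exact definition used by Geneious may differ, and the toy alignment is hypothetical.

```python
from collections import Counter


def alignment_stats(aligned):
    """Summarize a list of equal-length aligned sequences ('-' marks a gap)."""
    seqs = [s.upper() for s in aligned]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Assumption: a column is identical if all sequences share one A/C/G/T base.
    identical = sum(
        1 for col in zip(*seqs) if len(set(col)) == 1 and col[0] in "ACGT"
    )

    # Base frequencies are computed over non-gapped, unambiguous sites only.
    counts = Counter(b for s in seqs for b in s if b in "ACGT")
    total = sum(counts.values())
    freqs = {b: counts[b] / total for b in "ACGT"}

    gaps = sum(s.count("-") for s in seqs)
    return {
        "length": length,
        "identical_sites": identical,
        "identical_fraction": identical / length,
        "base_frequencies": freqs,
        "gc_fraction": freqs["C"] + freqs["G"],
        "gap_fraction": gaps / (length * len(seqs)),
    }


# Toy alignment (hypothetical, not data from this study)
stats = alignment_stats(["ACGT-A", "ACGTTA", "ACCT-A"])
```

As a consistency check on the reported rbcL values, 334 identical sites in a 452 bp alignment give 334/452 ≈ 73.9%, and the C and G frequencies sum to 22.2% + 23.7% = 45.9%.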

The phylogenetic signals were highly in accordance with the taxonomical information registered in the NCBI taxonomy database, and the two major clades corresponding to the Asterids and the Rosids were distinguished. However, the two species belonging to the unranked Polygonoideae subfamily (species 01 and 02) were grouped with the Fabaceae family and formed part of the Fabids. The monophyletic observation for species (01) prevented its clear classification at the species level, in contrast to species (02), which can confidently be identified as Oxygonum sinuatum (bootstrap value = 100). Species (03) was correctly clustered with other members of the genus Withania (family Solanaceae) but was not closely related to any specific accession. The same case was observed for species (10, 12, and 14), representing the genera Tribulus (family Zygophyllaceae), Acalypha (family Euphorbiaceae), and Senegalia (family Fabaceae), respectively (all bootstrap values > 0.97). Species (04) showed a different clustering than the matched BLAST hit: it clustered with Trichodesma calycosum rather than T. africanum. The correct taxonomical assignment was observed for species (05, 06, 07, and 08), which were closely or clearly clustered with the matched species at high bootstrap support (> 0.80). The monophyletic case was also found for species (09), which impeded the assignment of this sample to a particular species of the genus Sageretia (family Rhamnaceae; Figure 3).
Species (13) was clustered with three unidentified species and one known species of the genus Crotalaria; thus, the whole clade, including the sample under study, was assigned as C. incana (family Fabaceae).

Nuclear ITS Region

A. BLAST-based identification
The retrieved ITS sequences ranged from 518 to 684 bp with an average of 601 ± 83 bp; the sequence quality was Q20 = 97.5%. The ITS sequence of each sample was used to perform BLAST independently to retrieve the top hits for each plant sample. Limited by the database and filtered by > 95% pairwise identity (PI), the species (01) [...] (Table 2).
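The hit-filtering step can be sketched as follows. This is a minimal illustration that assumes BLAST hits have already been parsed into dictionaries with `accession`, `species`, and `identity` (percent pairwise identity) keys; these field names are assumptions, and apart from the Rumex nepalensis record quoted in the text, the example hits are hypothetical placeholders.

```python
def filter_hits(hits, min_identity=95.0, top_n=5):
    """Keep hits with pairwise identity strictly above `min_identity`,
    sorted from best to worst, and return at most `top_n` of them."""
    kept = [h for h in hits if h["identity"] > min_identity]
    kept.sort(key=lambda h: h["identity"], reverse=True)
    return kept[:top_n]


# Example input: only the first record comes from the text; the rest are
# hypothetical placeholders used to exercise the filter.
example_hits = [
    {"accession": "KX015758", "species": "Rumex nepalensis", "identity": 99.7},
    {"accession": "HYPO_0001", "species": "Rumex sp. (placeholder)", "identity": 98.9},
    {"accession": "HYPO_0002", "species": "Rumex sp. (placeholder)", "identity": 95.0},
    {"accession": "HYPO_0003", "species": "Rumex sp. (placeholder)", "identity": 93.2},
]

best_hits = filter_hits(example_hits)
```

The comparison is strict, so a hit at exactly 95.0% identity is discarded, which matches the "> 95%" wording of the filter. The default of five retained hits follows the top-five hits used for the alignments.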

B. Phylogeny-Based Identification
The ITS sequences, along with the top five BLAST hits, were aligned together and trimmed to equal length. The retained total nucleotide alignment was 762 bp, with 187 identical sites (24.5%), PI = 54.5%, and Q20 of at least 95.7% of the retained nucleotides. The nucleotide frequencies of non-gapped sites were 19%, 29.3%, 31.3%, and 20.4% for A, C, G, and T, respectively, with GC% = 60.6%, while gaps accounted for 22.5% of the total alignment.
Based on the aligned sequences, the maximum likelihood tree was constructed and visualized as a rooted circular cladogram (Figure 4). The phylogenetic clustering is in accordance with the published taxonomical information only at the high ranks. In detail, the two major clades corresponding to the Asterids and the Rosids were defined, as well as the unranked Polygonoideae subfamily. All species of the same family were clustered together correctly, but the families were not grouped in accordance with known taxonomical information. Species (01 and 02) were correctly clustered with other members of the Polygonoideae subfamily, were closely related to each other, and were identified in agreement with the BLAST results. Equally, the correct taxonomical assignment at the family level was observed for species (03, 04, 05, and 06), representing the Asterids, which clustered with Withania somnifera, Trichodesma boissieri, Lindenbergia species, and Barleria bispinosa, respectively. Species (07 and 08) were assigned correctly to the Malvids, but species (07) was not clustered with a single species of the genus Hibiscus, in contrast to species (08), which was clustered clearly with Myrtus communis. Species (09) showed a monophyletic status with two Sageretia species; thus, its identity remains uncertain. Species (10), which represents the Fabids, was incorrectly placed above the Malvids but was nonetheless correctly distinguished as Tribulus terrestris (bootstrap values > 0.80). The family Euphorbiaceae was represented by two genera, Acalypha sp. (species 12) and Ricinus communis (species 11). Species (13 and 14) of the Fabaceae family were clearly identified with high bootstrap values (> 0.80) as Crotalaria incana and Acacia tortilis (syn. Vachellia tortilis; Figure 4).

Integrative Comparative Analysis
The comparison between morphological inspection and DNA barcoding identification showed agreements as well as disagreements. Based on the BLAST results, the molecular identification using both loci agreed with the morphological inspection for species 02, 08, 09, 10, and 13, identified as Oxygonum sinuatum, Myrtus communis, Sageretia thea, Tribulus terrestris, and Crotalaria incana, respectively. Species 01, 03, 06, 11, and 14 were identified consistently by rbcL and ITS in most cases (except for species 14), but these identifications differed from the morphological inspection. Those species were identified as Rumex nepalensis, Withania somnifera, Barleria prionitis, Ricinus communis, and Vachellia tortilis, respectively.
Total disagreement between the morphological inspection, ITS, and rbcL was found at the species level for species 04, 05, 07, and 12. In the case of species 04, the markers confusingly matched three different species: Trichodesma calycosum (rbcL phylogeny), Trichodesma boissieri (ITS), and Trichodesma africanum (rbcL BLAST-based identification). Species 05 and 12 belonged to unknown species of the genera Lindenbergia and Acalypha, respectively, and neither marker matched a species with certainty. For species 07, the rbcL phylogenetic analysis showed enough genetic variation to delimit the species by paraphyletic clustering, in contrast to the ITS monophyletic clustering for this species, which was identified as Hibiscus sabiensis (Table 3).
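The three outcomes described in this comparison (full agreement, barcodes agreeing with each other but not with morphology, and barcodes disagreeing) can be expressed as a simple classification rule. The sketch below is illustrative only: the category labels and function name are assumptions, the first two example records follow the identifications given in the text, and the third uses a placeholder for the morphological name.

```python
def classify_concordance(morphology, rbcl, its):
    """Classify one specimen by comparing three species-level identifications."""
    if rbcl == its == morphology:
        return "full agreement"
    if rbcl == its:
        return "barcodes agree, morphology differs"
    return "barcodes disagree"


# (morphology, rbcL, ITS) per specimen. Species 02 and 01 follow the text;
# the morphological entry for species 04 is a placeholder.
specimens = {
    "02": ("Oxygonum sinuatum", "Oxygonum sinuatum", "Oxygonum sinuatum"),
    "01": ("Rumex nervosus", "Rumex nepalensis", "Rumex nepalensis"),
    "04": ("Trichodesma sp. (placeholder)", "Trichodesma africanum",
           "Trichodesma boissieri"),
}

summary = {
    code: classify_concordance(*names) for code, names in specimens.items()
}
```

Exact string matching is used here, so synonyms such as Acacia tortilis and Vachellia tortilis would need to be normalized to one accepted name before comparison.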

Discussion
The efficacy of a herbal drug decreases if it is adulterated, and in some cases, it can be lethal if it is substituted with toxic adulterants [16]. The adulteration of herbal materials usually occurs because the materials lack readily distinguishable morphological characters or because economically valuable materials are substituted with inexpensive herbs [16]. Hence, genetic identification of medicinal plants may enable the quick and precise identification of species of economic interest.
To standardize the international use of DNA barcodes, the scientific community has made considerable efforts to search for suitable DNA regions to barcode every species [32]. After an extensive inventory of gene regions in the mitochondrial, plastid, and nuclear genomes, the nuclear ITS region and the chloroplast genes rbcL and/or matK have generally been agreed upon as the standard DNA barcodes of choice and were recommended by the Consortium for the Barcode of Life (CBOL) as a standard two-locus barcode for global plant databases because of their combined species discrimination ability [33]. Indeed, by comparing the ITS and rbcL phylogenetic analyses, we found that the ITS tree was better at defining taxonomy at the family or lower levels, in contrast to rbcL, which was able to define higher taxonomical ranks. Although ITS was more efficient in differentiating species, using it alone is not recommended due to the variation within species [22,34-36]. Combining both regions, guided by morphological inspection, helped identify the wild species. In the current analysis, we identified 10 out of the 14 species, five of which were identified as morphologically inspected, in contrast to three species where the morphological inspection was indifferent. In one case, species (04), the morphological inspection differed and the two barcodes were incongruent. Two species were morphologically inspected as Lindenbergia siniaca and Acalypha fruticosa, but the barcodes could not be identified to species based on the NCBI database. Database coverage and sequence search strategies are the essential limits to the success of the barcoding technique for species identification [37].
Wild plants are often listed as medicinal and ethnobotanical "folk medicine" plants. Our study found several species that can be identified as medicinal plants. However, as previously mentioned, the correct identification of these plants is a prerequisite for their safe application. For example, a plant of medicinal value (containing flavonoids, anthraquinones, and gallic acid) has been reported from the sampling region in Saudi Arabia as Rumex nervosus [38], an identification that we also reached by morphological inspection. However, DNA barcoding using both markers indicated that the species is Rumex nepalensis, which raises doubts about its safe application without an effective identification tool such as DNA barcoding. Similarly, another sample was identified by DNA as Barleria prionitis, even though it was morphologically identified as Barleria bispinosa, a species native to the Arabian Peninsula [39]. This again raises doubts about the identity of the common wild plants in the region, as well as the safe application of this species, since several, but not all, species of Barleria are known for their medicinal or ornamental value [40]. Vachellia (syn. Acacia) etbaica is a woody wild plant growing in the deserts of Egypt and neighboring regions [41]. We identified a sample as this species morphologically, but DNA showed it to be the closely related species Vachellia tortilis, a medicinal tree that produces an edible gum that can be used as gum arabic [42]. An interesting finding is the presence of Trichodesma boissieri, which was identified only by the ITS phylogenetics. This plant has been reported from the northern parts of the Arabian Peninsula [39] but has never been reported this far south. The effect of climate change on the sampling area may contribute to the vegetation diversity detected in the Faifa Mountains. Climate change affects species distributions through changes in plant growth and reproduction; it can act directly (e.g., drought, wind) and indirectly (e.g., temperature and disease outbreaks) [43].
Based on our findings, we recommend an in situ conservation plan for these wild medicinal plant species; such a plan has a valuable role to play in maintaining genetic resources for folk medicinal plants and allowing for the continued adaptation and evolution of migrated plant genotypes.