Essential Oil Chemotypes and Genetic Variability of Cinnamomum verum Leaf Samples Commercialized and Cultivated in the Amazon

Cinnamomum verum (Lauraceae), also known as “true cinnamon” or “Ceylon cinnamon”, has long been widely used in traditional folk medicine and cuisine. The systematics of C. verum presents some difficulties due to genetic variation and its morphological similarity to other Cinnamomum species. The present work aimed to find chemical and molecular markers of C. verum samples from the Amazon region of Brazil. The leaf essential oils (EOs) and the genetic material (DNA) were extracted from cultivated and commercial samples. The chemical compositions of the essential oils from cultivated (Cve1-Cve5) and commercial (Cve6-c to Cve9-c) samples of C. verum were grouped by Principal Component Analysis (PCA), a multivariate statistical method. The oils were rich in benzenoids and phenylpropanoids, and the major compounds were eugenol (0.7–91.0%), benzyl benzoate (0.28–76.51%), (E)-cinnamyl acetate (0.36–32.1%), and (E)-cinnamaldehyde (1.0–19.73%). DNA barcodes were developed for phylogenetic analysis using the chloroplast regions of the matK and rbcL genes and the psbA-trnH intergenic spacer. The psbA-trnH sequences showed the greatest nucleotide diversity, and matK confirmed the identity of C. verum. The combination of DNA barcodes and volatile profiles proved to be an important tool for discriminating C. verum varieties and examining the authenticity of industrial sources.


Introduction
The genus Cinnamomum Schaeff. belongs to the Lauraceae family and comprises 336 species of evergreen aromatic trees and shrubs distributed in Asia, Australia, and the Pacific Islands [1][2][3]. Many of these species have high economic importance and are used as ingredients in food products to provide flavor and aroma [4,5].
Cinnamomum verum J. Presl. (syn: C. zeylanicum Blume), also known as "true cinnamon" or "Ceylon cinnamon", is native to Sri Lanka and southern India but is also distributed in Southeast Asia, China, Burma, Indonesia, Madagascar, the Caribbean, Australia, and Africa [6]. Sri Lanka is the leading producer of C. verum, accounting for approximately 70% of global production [7]. C. verum has long been widely used in traditional folk medicine and as a spice in culinary practices [8]. This species has received more attention in the last decades due

Chemical Composition and Multivariate Analysis
The yields and volatile compositions of the C. verum oils of cultivated and commercial samples are displayed in Table 1.
(E)-Cinnamaldehyde and cinnamyl acetate were also significant constituents in the leaves and flowers of C. verum from Benin and India, respectively [43,44], and in other Cinnamomum species, such as C. osmophloeum from Taiwan [45]. In addition, these compounds are commonly among the major constituents of C. verum bark oil [46,47].
The concentration of linalool in our study varied from 2.66% to 3.69%. The higher the concentration of this compound, the greater the flavor and fragrance, which makes the oil more commercially valuable [49]. For comparison, the leaves of a specimen collected in Manaus (AM, Brazil) presented a linalool content of 7.0% [41]. The presence of caryophyllene oxide in the Cve3 sample collected in Belém (PA) may indicate plant maturation, since (E)-caryophyllene may oxidize into caryophyllene oxide [29].
Cinnamon is a natural product with a wide range of pharmacological functions. Among the biological properties assigned to its major compounds, the chemotypes rich in eugenol and benzyl benzoate have antifungal and antioxidant potential [18]. C. verum oil, especially when rich in cinnamaldehyde, can act in synergism with commercial antibiotics to increase antimicrobial potential [15]. Cinnamaldehyde and its derivatives act as potent anticarcinogenic agents [9]. Finally, antidiabetic, antioxidant, and antimicrobial activities were reported for (E)-cinnamyl acetate [44].

DNA Authentication and Genetic Variability of Cinnamomum verum Samples
A DNA barcode was incorporated to address the challenge of chemical plasticity, as the genetic makeup of a particular species should be more stable under various environmental conditions [52]. In plants, the establishment and refinement of DNA barcodes have been more challenging due to the distinct genetic diversity among different species [53]. PCR amplification and sequencing success are important factors for selecting ideal barcode loci [54,55]. The sequences of the matK, rbcL, and psbA-trnH regions were obtained for all studied samples, including five specimens of C. verum (Cve1, Cve2, Cve3, Cve4, Cve5), which had confirmed morphological identity, and four commercial samples (Cve6-c, Cve7-c, Cve8-c, Cve9-c). The PCR success rate was 100% for all the analyzed loci except matK, which failed to amplify in two commercial samples (Cve7-c and Cve9-c). All PCR products were successfully sequenced, and high-quality bidirectional sequences were obtained.
The success of species identification depends on the quality of the barcode sequence and the taxonomic coverage of reference sequences in the GenBank database [56]. The sequence with the highest homology, maximum query coverage, and maximum score was used as a reference to assign the identity of the species. The analysis of the market samples revealed that all C. verum samples were authentic. Using authenticated raw materials is the basic starting point in developing safe and high-quality natural health products. The possibility of adulteration through misidentification is high because collectors often lack the taxonomic expertise to differentiate morphologically similar species [57].
Homology searches using the NCBI BLAST program matched the matK sequences to C. verum, resulting in species-level identification for this region. In contrast, the psbA-trnH and rbcL sequences provided identification only at the genus level. Another important criterion of an ideal barcode is its discriminatory power [55,58].
The psbA-trnH sequences had the highest nucleotide diversity (π: 0.01449) and the most polymorphic sites (20 bp) and parsimony-informative characters (6 bp). This intergenic spacer is described as a DNA barcode rich in simple sequence repeats and small insertions and deletions (INDELs) [59,60]. In contrast, the sequences of the matK and rbcL coding regions were phylogenetically conserved, with low nucleotide diversity (0.00000-0.000123), few polymorphic sites (0 and 3 bp), and no parsimony-informative characters (0 bp) (Table 2). The alignment of the concatenated matrix (rbcL+matK+psbA-trnH) comprised a total of 1699 bp, of which 6 bp are parsimony-informative and 23 bp are polymorphic sites (Table 2).
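The three statistics reported above can be illustrated with a minimal Python sketch. It assumes an alignment of equal-length sequences, skips every column containing a gap or ambiguous base, and takes π as the mean number of pairwise differences per site. The toy alignment is hypothetical and is not data from this study, and the gap handling may differ from that of DnaSP.

```python
from collections import Counter
from itertools import combinations


def site_columns(seqs):
    """Yield alignment columns made only of A, C, G, T (gapped columns are skipped)."""
    for col in zip(*seqs):
        if all(base in "ACGT" for base in col):
            yield col


def polymorphic_sites(seqs):
    """Count columns that show more than one nucleotide state."""
    return sum(1 for col in site_columns(seqs) if len(set(col)) > 1)


def parsimony_informative_sites(seqs):
    """Count columns with at least two states that each occur in two or more sequences."""
    count = 0
    for col in site_columns(seqs):
        freq = Counter(col)
        if sum(1 for n in freq.values() if n >= 2) >= 2:
            count += 1
    return count


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per compared site (pi)."""
    cols = list(site_columns(seqs))
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(1 for i, j in pairs for col in cols if col[i] != col[j])
    return diffs / (len(pairs) * len(cols))


# Hypothetical toy alignment (four sequences, ten sites), for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print(polymorphic_sites(toy), parsimony_informative_sites(toy), nucleotide_diversity(toy))
```

In the toy alignment, the fourth site is shared by two sequences per state and is therefore parsimony-informative, whereas the ninth site is a singleton and counts only as polymorphic.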
A multi-locus approach was used to establish the DNA barcode signatures of C. verum samples commercialized and cultivated in the Amazon. Interestingly, samples cultivated in different locations (Benevides, Belém, Curuçá and Maranhãozinho) showed great genetic similarity to the commercial samples. The alignment of the concatenated matrix showed greater genetic variability in the psbA-trnH intergenic region than in matK and rbcL (Figure 2).
Due to its greater nucleotide diversity, we used the psbA-trnH region to check the distances between sequences. The genetic distances were low, with a mean of 0.015 (Supplementary Materials: Table S1), indicating little genetic variability between specimens of C. verum. DNA barcodes have also been suggested to discriminate species and identify adulterants in Cinnamomum. For example, adulterants in commercial cinnamon samples were identified as C. aromaticum and C. malabathrum using sequences of rbcL, matK, and psbA-trnH [27]. Nevertheless, these same regions, used individually or in combination, did not show sufficient genetic variation to discriminate C. capparu-coronde, C. citriodorum, C. litseifolium, C. sinharajaense, C. ovalifolium, and C. verum in Sri Lanka [61].

Molecular and Chemical Methods
Most Cinnamomum plants are tree species of high economic value. However, Cinnamomum species share similar morphological features, which complicates their taxonomy. Thus, a rapid and feasible method for identifying Cinnamomum plants is needed to prevent adulteration [62]. DNA barcodes can be incorporated to address the challenge of chemical plasticity, as the genetic makeup of a particular species should be more stable under various environmental conditions [52].
The DNA barcode can only authenticate the medicinal plant, whereas the chemical profile provides information on the presence and concentration of compounds with pharmacological activity [63]. This diversity of compounds is generally determined by the genetic constitution of the plant, although environmental factors may also influence the type, amount, and concentrations of the compounds present in the essential oil [64].
Similar chemical compounds commonly occur in members of the same phylogenetic clade, and their presence or absence may indicate a common origin and, therefore, lineage [65]. The differences and fluctuations in the composition of secondary metabolites could be due to genetic modifications linked to the adaptation of these plant species to their environment [66]. Our phylogenetic analysis allowed taxonomic identification down to the level of C. verum varieties.
The complementary use of chemical and molecular markers for the quality control of C. verum and other plant materials should be tested in commercialized leaves and in herbal preparations. The identification of adulterants, fillers and/or substitutes can be accomplished only if the molecular databases of medicinal plants are enriched with more studies [67,68].
Four commercial samples of cinnamon leaves were purchased from companies in local markets in Belém (PA, Brazil) and labeled as Cve6-c to Cve9-c to check the authenticity of the commercialized product (Table 3).

Essential Oil Extraction
The leaves were dried for two days at room temperature and then subjected to essential oil distillation. The dry leaves were pulverized and submitted to hydrodistillation using a Clevenger-type apparatus (3 h). The oils were dried over anhydrous sodium sulfate, and the yields were calculated based on the dry weight of the plant material. The moisture content of each sample was measured using an infrared moisture balance ID50 with a heat source (Marte®, Santa Rita do Sapucaí, MG, Brazil). The procedure was performed in triplicate.
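A yield calculation on a dry-weight basis can be sketched as follows. The formula is an assumption: it takes the yield as oil volume per moisture-free mass of plant material, expressed in % (v/w), which is a common convention but is not spelled out in the text. The function name and the numerical values are hypothetical.

```python
def oil_yield_percent(oil_volume_ml, sample_mass_g, moisture_percent):
    """Essential oil yield in % (v/w) relative to the moisture-free sample mass.

    Assumed convention: dry mass = sample mass x (1 - moisture / 100).
    """
    if not 0 <= moisture_percent < 100:
        raise ValueError("moisture_percent must be in the range [0, 100)")
    dry_mass_g = sample_mass_g * (1 - moisture_percent / 100)
    return 100 * oil_volume_ml / dry_mass_g


# Hypothetical example: 1.2 mL of oil from 50 g of leaves with 10% moisture.
print(round(oil_yield_percent(1.2, 50.0, 10.0), 2))
```

With triplicate distillations, the same function would be applied to each replicate and the results averaged.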

GC-MS and GC(FID) Analysis
The oil samples were analyzed on a GCMS-QP2010 Ultra system (Shimadzu Corporation, Tokyo, Japan), equipped with an auto-injector (AOC-20i). The parameters of analysis were: a silica capillary column Rxi-5ms (30 m [69]. The components of the oils were identified by comparing their retention indices and mass spectra (molecular mass and fragmentation pattern) with data stored in the [28,29,70] libraries.
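Because identification relies on retention indices, a short sketch of how such an index can be computed may be useful. It assumes the linear (temperature-programmed) retention index, obtained by interpolating between two consecutive n-alkanes that bracket the analyte; the text does not state which index definition was used. The alkane retention times below are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at time rt.

    alkane_rts maps the carbon number of consecutive n-alkanes to their
    retention times (same units as rt), which must increase with carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if len(times) < 2 or not times[0] <= rt <= times[-1]:
        raise ValueError("rt must be bracketed by the alkane series")
    # Index of the alkane eluting at or just before the analyte.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, t_n, t_next = carbons[i], times[i], times[i + 1]
    return 100 * (n + (rt - t_n) / (t_next - t_n))


# Hypothetical alkane series: C10 at 8.0 min, C11 at 10.0 min, C12 at 12.5 min.
alkanes = {10: 8.0, 11: 10.0, 12: 12.5}
print(linear_retention_index(9.0, alkanes))
```

The computed index would then be compared with library values alongside the mass spectrum of the same peak.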

Multivariate Statistical Analysis of Chemical Composition
Constituents with a content above 3% in the leaf samples were used as variables in the multivariate analysis. First, the data matrix was standardized by subtracting the mean and dividing by the standard deviation. Principal Component Analysis was then applied to verify the interrelation among the oil components (OriginPro trial version, OriginLab Corporation, Northampton, MA, USA).
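The standardization and PCA steps can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the OriginPro procedure: each column (compound) is standardized with the sample standard deviation, and the components are obtained by singular value decomposition of the standardized matrix. The composition matrix is hypothetical.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores: subtract the mean, divide by the sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(Z):
    """PCA of an already standardized matrix via singular value decomposition.

    Returns sample scores, variable loadings, and the explained variance ratio.
    """
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                      # coordinates of the samples on each component
    loadings = Vt.T                     # contribution of each variable to each component
    explained = S**2 / np.sum(S**2)     # fraction of total variance per component
    return scores, loadings, explained


# Hypothetical matrix: rows are oil samples, columns are compound percentages.
X = np.array([
    [90.0, 1.0, 2.0],
    [10.0, 70.0, 3.0],
    [85.0, 2.0, 3.5],
    [5.0, 60.0, 20.0],
])
Z = standardize(X)
scores, loadings, explained = pca(Z)
print(np.round(explained, 3))
```

Standardizing first gives every compound equal weight, so that a constituent present at very high percentages does not dominate the components merely because of its scale.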

DNA Isolation, PCR Amplification, and Sequencing
Genomic DNA was extracted from 100 mg of dried leaf tissue of each plant using a plant DNA isolation kit (PureLink™ Genomic DNA, Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol and stored at −20 °C. Three chloroplast DNA regions were used for amplification: rbcL, matK, and the intergenic spacer psbA-trnH. The plant working group of the Consortium for the Barcode of Life (CBOL) recommended using a core two-locus combination of rbcL + matK as the plant barcode, with psbA-trnH as a complementary sequence [55]. Table 4 presents the primer sequences for each fragment and its PCR amplification conditions.

Sequence Identity and Distance Genetics Analysis
The forward and reverse sequences of each amplified region (matK, rbcL, and psbA-trnH) were edited and aligned using the MUSCLE algorithm [75] implemented in MEGA 7 [76]. Sequences were compared with those available in the National Center for Biotechnology Information (NCBI) GenBank database (http://www.ncbi.nlm.nih.gov/, accessed on 1 June 2022) using the BLASTN tool. DNA sequences generated in this study were deposited in the NCBI GenBank, and accession numbers are listed in the Supporting Information (Table 5). The sequences of rbcL, matK, and psbA-trnH were analyzed in DnaSP v6 [77] to describe the genetic variability of each marker: the median length (bp) and total alignment length (bp), both discounting gaps, the number of sites with gaps, and nucleotide diversity (π). The sequences were concatenated using the program PhyloSuite [78] and aligned with CLUSTAL W [79] in MEGA. The alignment was edited in the BioEdit program [80]. The psbA-trnH region was used to estimate pairwise distances using the Kimura two-parameter (K2P) model [81].
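The K2P distance between two aligned sequences can be sketched as follows. With P the proportion of transitions and Q the proportion of transversions, the distance is d = −(1/2) ln[(1 − 2P − Q)·√(1 − 2Q)]. The sketch assumes that sites with a gap or ambiguous base in either sequence are ignored (pairwise deletion), which may differ from the settings used in MEGA. The sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical pair differing by one transversion (T/A) and one transition (C/T).
print(round(k2p_distance("ACGTACGTAC", "ACGAACGTAT"), 4))
```

Applying the function to every pair of sequences yields the pairwise distance matrix, from which a mean distance can be taken.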

Conclusions
This pioneering study integrated volatile profiles and molecular sequences for the rapid authentication and discrimination of C. verum samples. The essential oils of the samples occurring in the Amazon were rich in benzenoids and phenylpropanoids. The wide array of volatile chemical structures identified in the samples and their distribution patterns were used to differentiate chemotypes characterized by compounds such as (E)-cinnamyl acetate, benzyl benzoate, (E)-cinnamaldehyde, caryophyllene oxide, spathulenol, linalool, and eugenol. The species identity was confirmed using barcode sequences, which is crucial for commercial samples with limited morphological data.