Identification of Cumin (Cuminum cyminum) MicroRNAs through Deep Sequencing and Their Impact on Plant Secondary Metabolism

The pharmacological properties of plants lie in the content of secondary metabolites that are classified into different categories based on their biosynthesis, structures, and functions. MicroRNAs (miRNAs) are small non-coding RNA molecules that play crucial post-transcriptional regulatory roles in plants, including development and stress-response signaling; however, information about their involvement in secondary metabolism is still limited. Cumin is one of the most popular seeds from the plant Cuminum cyminum, with extensive applications in herbal medicine and cooking; nevertheless, no previous studies focus on the miRNA profile of cumin. In this study, the miRNA profile of C. cyminum and its association with the biosynthesis of secondary metabolites were determined using NGS technology. The sequencing data yielded 10,956,054 distinct reads with lengths ranging from 16 to 40 nt, of which 349 miRNAs were found to be conserved and 39 to be novel miRNAs. Moreover, this work identified 1959 potential target genes for C. cyminum miRNAs. It is interesting to note that several conserved and novel miRNAs have been found to specifically target important terpenoid backbone, flavonoid biosynthesis, and lipid/fatty acid pathways enzymes. We believe this investigation will aid in elucidating the implications of miRNAs in plant secondary metabolism.


Introduction
Cuminum cyminum L. (Apiaceae), commonly known as cumin, is an annual, aromatic, herbaceous plant [1][2][3][4] native to Egypt, the Mediterranean, and the countries of South Asia. This plant has been widely grown in India, China, Turkey, Pakistan, Morocco, Egypt, Syria, Mexico, Iran, and Chile [5]. Cumin is a commercially important seed spice known for its distinct aromatic impact in a variety of culinary cultures around the world [4][5][6][7][8], pharmacological attributes in both traditional and current medicine [1,3,5,6], as well as in beverage, perfumery, alimentary [4,5], and beauty industries [2]. Asia, the Middle East, and North Africa have been identified as the largest producers and exporters of cumin seeds and products [4]. The world production of cumin in 2012 was approximately 300,000 tons, with approximately 70% of the production coming from India. The main import markets for cumin seeds are the United States (15.8%), Egypt (13%), Brazil (6%), the United Kingdom were obtained (NCBI SRA accession number: SRP376517). Subsequently, the adaptors, low-quality reads, and other short RNAs like rRNA, snoRNA, snRNA, and tRNA were removed (Table 1), acquiring a total of 2,126,409 distinct reads with lengths ranging from 16 to 40 nt. According to the size distribution of unique C. cyminum reads, 24 nt was found to be the most prevalent one (36.44%), followed by 23 nt (8.59%), 21 nt (6.25%), and 22 nt (5.40%) (Figure 1).

Identification of Conserved and Novel miRNAs in C. cyminum
Unique reads obtained in this study were first matched against miRbase-22 (https://www.mirbase.org/ accessed on 10 January 2022) using the BLASTn program, and a total of 142 conserved cumin miRNAs spanning 63 miRNA families were identified (Table 2 shows the conserved miRNAs with the highest read counts for each family; for further info, please refer to Supplementary Materials). The frequency of the conserved miR-NAs varied greatly between families, and all showed substantial homology with their

Identification of Conserved and Novel miRNAs in C. cyminum
Unique reads obtained in this study were first matched against miRbase-22 (https://www.mirbase.org/ accessed on 10 January 2022) using the BLASTn program, and a total of 142 conserved cumin miRNAs spanning 63 miRNA families were identified ( Table 2 shows the conserved miRNAs with the highest read counts for each family; for further info, please refer to Supplementary Materials). The frequency of the conserved miRNAs varied greatly between families, and all showed substantial homology with their respective homologs (maximum of single mismatch). Conserved miRNA data showed that the most abundant miRNA families were miR156, miR166, miR396, and miR159, with 42, 28, 24, and 23 members, respectively ( Figure 2); moreover, the read counts of the miRNA families varied from 1 to 400,512, where the miR156 family had the highest number of reads (400,512), followed by the miR159 (184,400) and miR166 (144,536) families (Supplementary Table S1).  Following the identification of conserved miRNAs, the remaining unaligned 1,057,102 reads were subjected to predict novel C. cyminum miRNA candidates by applying strict filtering criteria. A total of 39 novel cumin miRNAs were identified in this study (Table 3), and their secondary hairpin structures were deduced ( Figure 3). The read counts of these novel miRNA candidates varied from 2 to 1382. The cci-miRN10-3p displayed the highest number of reads (1382), followed by cci-miRN2-3p (975) and cci-miRN20-5p (262). The precursor sequences of these novel miRNA candidates had high Minimum Folding Free Energy Index (MFEI) values ranging from 0.70 to 1.41 with an average of 0.98 ± 0.14, distinguishing them from other types of RNAs such as tRNAs (0.64), rRNAs (0.59), and mRNAs (0.62-0.66) [21].

Experimental Validation of C. cyminum miRNAs by qPCR
In order to validate the data obtained by high-throughput sequencing, a qPCR analysis was performed. Five conserved and five novel C. cyminum miRNAs were randomly

Experimental Validation of C. cyminum miRNAs by qPCR
In order to validate the data obtained by high-throughput sequencing, a qPCR analysis was performed. Five conserved and five novel C. cyminum miRNAs were randomly selected for the experiments. The results showed similar relative expression patterns with respect to the read counts for all miRNAs analyzed (Figure 11). Figure 10. Target enzymes of novel miRNAs of C. cyminum in the carotenoid biosynthesis pathway. EC:1.14.15.24-Beta-carotene 3-hydroxylase. The red boxes represent the targeted enzymes of the corresponding novel miRNAs.

Discussion
Secondary metabolites are synthesized by plants to adapt and respond to the environment's biotic and abiotic stressors [22]. Moreover, these secondary metabolites have multiple applications in a number of industries that aroused commercial interest and a field of study to explore overproduction alternatives [22]. MiRNAs are short non-coding RNAs that control the plant's secondary metabolism by targeting the mRNA of crucial enzymes involved in the biosynthetic pathways [11]. Therefore, it becomes imperative to identify the miRNAs and their targets in unexplored medicinal plant species. The present study investigated the miRNA profile of cumin (C. cyminum) by Illumina small RNA sequencing technology to better understand their involvement in secondary metabolism.
It is well established that most of the plant miRNA families are highly conserved in the plant kingdom, while some others are specific to certain species (novel miRNAs). The degree of conservation of a miRNA indicates its evolutionary status and is closely related to its expression patterns and functions [23]. In this study, the conserved C. cyminum miRNAs from young cumin seedlings (15-day-old seedlings) were distributed in 63 families, from which miR156 was the most persistent family with 42 members and 400,512 read counts. Liu et al. (2018) reported that miR156 controls the phase transition in plant development and accumulates during the juvenile stage, decreasing when the plant ages, which is consistent with our results of cumin miRNAs from the seedling stage [23].
The pharmacological potential of C. cyminum is attributed to the presence of bioactive compounds such as terpenoids, phenols, and flavonoids [24]. Several studies reported that cuminaldehyde is the most prominent chemical compound in C. cyminum seeds, followed by cuminic alcohol, γ-terpinene, p-cymene, and β-pinene [25,26]. Therefore, terpenoids play a vital role in cumin's secondary metabolism. Terpenoids backbone is produced through successive condensations of isopentenyl diphosphate (IPP) and its isomer, dimethylallyl diphosphate (DMAPP). After different enzymatic steps, the backbone is arranged and modified to give rise to a morphological and functional diversity of terpenoids [27,28]. Specifically, terpenoids can be classified depending on the number of carbons (or isoprenoid units, C5H8) in C10 monoterpenes, C15 sesquiterpenes, C20 diterpenes, C30 triterpenes, and C40 tetraterpenes. Particularly, C30 triterpenoids contain a carbon skeleton of four or five rings of varied configurations, while steroids are tetracyclic compounds produced from terpenoid precursors and contain a perhydro-1,2-cyclopentane-phenanthrene moiety [29]. For both triterpenoids and steroids, a common step is needed in their biosynthesis pathway, which consists of the cyclization of squalene (C30 compound) into cycloartenol by the enzyme cycloartenol synthase [27]. This enzyme belongs to the oxidosqualene cyclase gene family [30]. Interestingly, the present study revealed that cycloartenol synthase is targeted by the conserved cumin miRNAs cci-miR156q, cci-miR157b-3p, cci-miR159c, cci-miR172c-5p, cci-miR172d-5p, cci-miR5564b, and cci-miR6476a and the novel cumin miRNAs cci-miRN3-5p, cci-miRN19-3p, cci-miRN32-5p, and cci-miRN34-5p. It suggests these cumin miRNAs can target a key regulator enzyme in both sterol and triterpenoid production due to squalene cyclization being a crucial branch point in their biosynthesis pathway [30]. Our results coincide with the findings of Srivastava et al. (2018), who reported that miR5140 and miR159 families were involved in the gene regulation of cycloartenol synthase and sterol delta-7 reductase 1, which are directly involved in the biosynthesis of withanolides, a group of steroids produced by the medicinal plant Withania somnifera [17]. Similarly, Jeena et al. (2021) identified the cleavage site of cycloartenol synthase by Bm-miR172c-5p from the medicinal plant Bacopa monnieri and suggested that this miRNA could be a key regulator in the plant response to abiotic stress [31].
Additionally, our study showed that the conserved cumin miRNAs cci-miR166d-5p, cci-miR319d-3p, cci-miR396a-3p, and cci-miR6476a target both enzymes homogentisate phytyltransferase and homogentisate genarylgeranyltransferase. These enzymes are responsible for condensing homogentisic acid to phytyl-pyrophosphate (Phytyl-PP) and geranylgeranyl pyrophosphate (GGPP) for later production of tocopherols and tocotrienols in the Ubiquinone and Other Terpenoid-Quinone Biosynthetic Pathway. Importantly, these enzymes perform the first committed reaction in their biosynthesis due to an irreversible enzymatic reaction in which the product must continue through the pathway [32]. Our data are consistent with the report of Zheng et al. (2019), in which several miRNAs of soybean (Glycine max) targeted genes of the same biosynthetic pathway under biotic stress [33].
Moreover, in this investigation, we showed that the conserved cumin miRNAs cci-miR159, cci-miR394b-5p, cci-miR397, and cci-miR2118 target the enzyme ent-kaurene synthase, responsible for producing ent-Kaur-16-ene from ent-copalyl diphosphate, playing a key bifunctional role in the biosynthesis of phytohormones and terpenoids [34,35]. Additionally, this enzyme has been conserved in all land plants and reported in higher concentrations in rapidly dividing plant tissues [36]. Likewise, Barozai et al. (2011) found in a computational study that the miRNA families miR158 and miR393 from Picea glauca and Picea sitchensis targeted ent-kaurene synthase [37].
Regarding carotenoids, they are C40 tetraterpenoids that play an important role in the physical condition of plants and are biomolecules of great commercial interest due to their different applications in fragrances, food, cosmetics, and biofuels [38]. In addition, they have antioxidant, anti-inflammatory, and anti-allergic properties [39]. Carotenoids are made by two C20 moieties of GGPP, linked together at their tails to give up a C40 linear hydrocarbon skeleton adaptable to different structural changes [39]. Each GGPP moiety is produced by the linear sequential addition of three IPP molecules to one DMAPP molecule [39][40][41]. We noticed that the novel miRNA cci-miRN2-3p targets the enzyme beta-carotene 3-hydroxylase, which plays a vital role in several sequential steps for the synthesis of astaxanthins. Astaxanthins are carotenoids with high pharmaceutical importance synthesized by ketolase and beta-carotene 3-hydroxylase enzymes from β-carotene [42]. Importantly, beta-carotene 3-hydroxylase and other related enzymes are responsible for ratelimiting steps in β-carotene catalysis [43]. Therefore, the cumin novel miRNA cci-miRN2-3p could represent a key regulator in astaxanthins biosynthesis. Similarly, another study of potato seedling miRNAs revealed that the novel miRNA stu-novel-miR5125 targeted beta-carotene 3-hydroxylase, which could be upregulated by low-temperature stress [44]. Furthermore, a miRNA regulatory network from Cryptomeria fortunei, a medicinal plant from China, indicated that several miRNAs, including pde-miR159, mdm-miR396a, and hbr-miR6173 were related to the carotenoid biosynthesis [45].
Finally, the qPCR experiment to validate the sequencing analysis was done with five conserved and five novel C. cyminum miRNAs. The results demonstrated that both the Illumina sequencing and the qPCR assays displayed similar expression patterns. This behavior was also observed in the leaves of the medicinal plant C. camphora [18], 15-dayold seedlings of wild-type A. thaliana [46], and 15-day-old seedlings of Tibetan barley (Hordeum vulgare L. var. nudum) [47].

Plant Materials, RNA Extraction and Quality Control
C. cyminum seeds were collected from a local supermarket and washed thoroughly with 2% sodium hypochlorite and colloidal silver (10 drops of Microdacyn ® in 100 mL of distilled water). For seed germination, Petri dishes were prepared with sterile filter paper, 2 mL of MS 1/2 medium, and a thin layer of cumin seeds, which were left incubating at 25 • C for 15 days with a 12-h light/dark photoperiod. Seedling samples of healthy 15-days-old-C. cyminum were collected and instantly frozen in liquid nitrogen. Total RNA (including small RNA) was isolated from 100 mg of the seedling samples using the Direct-zol RNA micro kit (Zymo Research, Irvine, CA, USA) following the manufacturer's instructions. RNA concentration and purity were quantified utilizing a Nanodrop2000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The integrity of the samples was assessed on Tapestation (Agilent, Santa Clara, CA, USA).

Small RNA Library Construction and Sequencing
A single small RNA sequencing library from pooled RNA samples (3 biological replicates) was prepared using QIAseq ® miRNA Library Kit (Qiagen, Germantown, MD, USA). Briefly, 63 ng of total RNA was used as starting material, and 3 adapters were ligated to the specific 3 OH group of miRNAs, followed by ligation of 5 adapter. Subsequently, the adapter-ligated fragments were reverse transcribed with Unique Molecular Index (UMI) assignment, and the cDNA was enriched and barcoded by PCR amplification. The resultant cDNA library was then quantified by Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), and the fragment size distribution of the library was evaluated on Agilent 2200 TapeStation system. Finally, high-throughput single-ended sequencing was performed using an Illumina NovaSeq 6000 for 75 cycles, following the manufacturer's instructions.

Small RNA Sequencing Data Analysis
Illumina GA raw data was processed by srna-workbench (V3.0_ALPHA). For this step, 3 adaptors and low-quality reads (<Q30) were removed, and reads matching other ncRNAs such as rRNAs, tRNAs, snRNAs, and snoRNAa, as well as sequences smaller than 16 bp and larger than 40 bp were eliminated. The remaining small RNA sequences were aligned against miRbase22 (http://www.mirbase.org, accessed on 10 January 2022) in order to identify conserved C. cyminum miRNAs. Those sequences that did not show homology were considered for the prediction of novel miRNAs using bowtie (https://bowtie-bio. sourceforge.net/index.shtml, accessed on 10 January 2022) and Mireap_0.22b [48]. Conserved miRNAs were identified by the homology approach against mature Viridiplantae miRNA sequences due to the unavailability of the C. cyminum genome. In this study, only novel miRNAs with suitable precursor secondary structures and with MFEI values ≥ 0.70 were considered. The secondary structures of the precursors were predicted using the The psRNATarget tool (https://www.zhaolab.org/psRNATarget/, accessed on 10 January 2022) was used to predict the target genes of both novel and conserved miRNAs with copy numbers equal to or greater than five. Potential Cuminum cyminum miRNA targets were annotated using the Gene Ontology (GO) in the bioprocesses (BP) domain. Additionally, a biological network of the miRNAs and their targets was built using the MFE values of the miRNA-target interaction and visualized using Cytoscape 3.2 (https: //cytoscape.org/releasenotes320.html, accessed on 10 January 2022) in order to identify the coregulation of possible targets. Finally, a thorough analysis of miRNA targets linked to pathways for the biosynthesis of secondary metabolites was performed.

Extraction of Small RNA and Experimental Validation of C. cyminum miRNAs by qPCR
In order to validate the identified conserved and novel miRNAs, small RNA was extracted from frozen C. cyminum seedlings using a mini miRNeasy kit (Qiagen, Maryland, USA) following the manufacturer's instructions. Quality and quantity of small RNA samples were measured using NanoDrop One UV-Vis microvolume spectrophotometer (Thermo Scientific, Wilmington, DE, USA). Resulting small RNAs were then polyadenylated and reverse transcribed using the Mir-X miRNA First-Strand Synthesis kit (Clontech, Mountain View, CA, USA), and finally, the qPCR was performed using the TB Green ® Advantage ® qPCR Premix (Takara Bio USA, Inc., San José, CA, USA) in a StepOne™ Real-Time PCR System (Applied Biosystems, Carlsbad, CA, USA). The primers for cumin novel miRNAs were listed in Supplementary Table S2. The reactions were performed in a 48-well optical plate with the following conditions: initial polymerase activation step for 10 s at 95 • C, denaturation by 45 cycles of 5 s at 95 • C, annealing and extension for 20 s at 55 • C. Following the amplification cycle, a melting curve analysis with temperature ranges of 56-95 • C and increments of 0.5 • C every 10 s was performed. For each sample, the reactions were run with three technical and six biological replicates, and the relative expression of the miRNAs was normalized to U6, the endogenous control.

Supplementary Materials:
The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/plants12091756/s1, Table S1: Conserved miRNAs in Cuminum cymin um, Table S2: Sequence of primers of C. cyminum novel miRNAs. Funding: The authors would like to thank the Regional Department of Bioengineering, South-Central Region of Tecnologico de Monterrey, for their support regarding the equipment and required infrastructure for this project.
Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/search/all/?term=SRP376517.