Genome-Wide Analysis of Alternative Splicing and Non-Coding RNAs Reveal Complicated Transcriptional Regulation in Cannabis sativa L.

It is of significance to mine the structural genes related to the biosynthetic pathway of fatty acid (FA) and cellulose as well as explore the regulatory mechanism of alternative splicing (AS), microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) in the biosynthesis of cannabinoids, FA and cellulose, which would enhance the knowledge of gene expression and regulation at post-transcriptional level in Cannabis sativa L. In this study, transcriptome, small RNA and degradome libraries of hemp ‘Yunma No.1’ were established, and comprehensive analysis was performed. As a result, a total of 154, 32 and 331 transcripts encoding key enzymes involved in the biosynthesis of cannabinoids, FA and cellulose were predicted, respectively, among which AS occurred in 368 transcripts. Moreover, 183 conserved miRNAs, 380 C. sativa-specific miRNAs and 7783 lncRNAs were predicted. Among them, 70 miRNAs and 17 lncRNAs potentially targeted 13 and 17 transcripts, respectively, encoding key enzymes or transporters involved in the biosynthesis of cannabinoids, cellulose or FA. Finally, the crosstalk between AS and miRNAs or lncRNAs involved in cannabinoids and cellulose was also predicted. In summary, all these results provided insights into the complicated network of gene expression and regulation in C. sativa.


Introduction
Recently, Cannabis sativa L. has attracted worldwide attention due to its wide use in medicine, food, health products, textiles and other fields [1,2]. The main products of C. sativa are derived from cannabinoids, fatty acid (FA) and cellulose [3][4][5]. Cannabinoids are types of C. sativa-specific secondary metabolites rich in female flowers [6,7]. To date, more than 100 cannabinoid compounds have been found in C. sativa, of which ∆9-tetrahydrocannabinolic acid (THCA) and cannabidiolic acid (CBDA) are the major components [8]. Both THCA and CBDA have functions in relieving pain, keeping calm, preventing vomiting, regulating immunity and reducing artery occlusion [9,10]. However, THCA has hallucinogenic and addictive effects [11,12]. Content and composition of cannabinoids are highly variable among C. sativa plants. According to the content and ratio of THCA and CBDA, C. sativa can be mainly divided into marijuana and hemp. Marijuana contains high THCA and low CBDA, while hemp is the opposite. Besides CBDA, hemp is cultivated for fiber and oil. Hemp seed is a kind of nutritious food, rich in FA [13]. Moreover, the proportion of ω-6 and ω-3 FAs is 3:1, consistent with the recommended balance for human health [14]. The cultivation area and the export of products of hemp in China account for about 50% and 25% of the world, respectively [15].
With the release of the draft genome of C. sativa, the biosynthetic pathway of cannabinoids has been deciphered and is involved in the pathways of hexanoate, plastid-localized 2-C-methyl-d-erythritol 4-phosphate (MEP) and geranyl pyrophosphate (GPP) [16][17][18]. The biosynthetic pathway of FA has been studied in some plants [19,20], but there is no report on C. sativa so far. The biosynthesis of cellulose is very complex in plants. Besides cellulose synthase (CESA), some non-CESA proteins are also involved in the process [21]. However, the genes related to the biosynthesis of cellulose in C. sativa have not been mined yet. In brief, the regulation in the biosynthesis of cannabinoids, FA and cellulose in C. sativa is still unknown to date.
It is known that gene expression and regulation consist of many layers. Alternative splicing (AS) is a significant mechanism for regulating the expression of genes in eukaryotes [22]. AS refers to one mRNA precursor producing diverse transcripts, which is thought of as an important post-transcriptional mechanism for increasing the diversity of transcripts and proteins [23]. AS plays a major role in affecting protein function; therefore, it is necessary to systematically identify AS events. However, AS has not been reported in C. sativa.
Non-coding RNAs, especially for microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), have been comprehensively studied using bioinformatic and experimental approaches in plants [24,25]. However, they are not reported in C. sativa. MiRNAs are types of 20-24 nucleotide non-coding RNAs, usually originating from single-stranded transcripts with imperfect stem-loop structure [26], which cleavage or inhibit translation of target genes at posttranscriptional level or through methylation modification at the transcriptional level [27][28][29]. MiRNAs function as gene regulators during the plant growth process [30][31][32] and are also involved in adaptive responses to abiotic stresses [33,34] and synthesis of secondary metabolites [35][36][37]. LncRNAs have become a popular research topic in molecular biology in recent years. LncRNAs display a key regulatory role in plant cell development, which can be trans-acting for target genes in a sequence complementary manner [38] or cis-acting in other mechanisms [39].
It was reported that AS and miRNAs could crosstalk, which increased the complexity of gene expression and regulation, emerging as one of the hot topics in molecular biology. On the one hand, the AS isoform affected the biogenesis of miRNA and lncRNA. For example, there is a kind of tissue-specific AS regulating the biogenesis of miR-412 involved in cell death [40]. MiR846 and miR842 are regulated by abscisic acid, originating from different AS isoforms [41]. On the other hand, miRNA targeted partial AS isoforms, decreasing the regulation by miRNA [42].
In this study, the structural and transporter-coding genes related to the biosynthetic pathway of FA and cellulose were mined in C. sativa for the first time. Then, AS events, miRNAs and lncRNAs were analyzed, and the potential regulation in the biosynthesis of cannabinoids, FAs and cellulose was predicted. Moreover, the crosstalk between AS and miRNAs or lncRNAs was revealed. Taken together, all these results will offer candidate genes related to the biosynthetic pathway of cannabinoids, FA and cellulose and enhance the knowledge of gene expression and regulation in hemp.

RNA-Seq Library Construction and Analysis
'Yunma No.1' is the first industrial hemp cultivar bred by the Yunnan Academy of Agricultural Sciences. Its THCA content is <0.3%, complying with the internationally adopted EU standards for hemp [43]. To obtain transcriptomic information of 'Yunma No.1', a strand-specific RNA-Seq library was constructed and sequenced by 90 bp pairedend (PE) sequencing. After filtering the low-quality reads, in total,~66 million clean reads were obtained. More than 83% of them could be mapped to the published C. sativa 'Purple Kush' genome using tophat with the default parameter [44]. Then, the 735,445,652 mapped reads were used for assembly by Cufflinks software [45]. To predict the gene functions, the 67,035 assembly transcripts were annotated by NCBI nonredundant protein sequences (NR) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases [46,47].

Structural and Transporter Genes Related to the Biosynthetic Pathway of Cannabinoids, FA and Cellulose
The genes related to cannabinoids, FA and cellulose metabolism were focused on because the main products of C. sativa are derived from cannabinoids, FA and cellulose [3][4][5].

Genome-Wide Identification of AS
AS events were identified using ASTALAVISTA with default options [73]. Results showed that 27,907 AS events were predicted, including 9773 intron retention (IntronR), 3404 exon skipping (ExonS), 15,970 alternative 3 last exon (AltA) and 6388 alternative 5 first exon (AltD) (Table S4) [74]. These results were consistent with the four AS types that account for the majority in plants [75]. To validate the authenticity of the predicted AS, five AS events were verified by RT-PCR. The electrophoresis results were in concordance with the RNA-Seq data ( Figure S1).
Then, the AS events that occurred in structural genes related to cannabinoids, FA and cellulose were analyzed. Results showed that AS occurred in the cannabinoid biosynthetic structural genes LOX, AAE, PT, DXR, MCT and HDR (Table S5). AS also occurred in the FA biosynthetic structural genes, including PDHC and ACCase occurred AS (Table S6) as well as the cellulose biosynthetic structural genes, including CESA, CSL, UDP-galactose transporter, UDP-glucose transporter, kinesin-like protein, kinesin-4, COBRA-like protein, CTL, katanin-like protein, FRA, SuSy and MAPKKK (Table S7).

Identification of Regulatory Proteins in C. sativa
Regulatory proteins play a key role in various biological processes, such as vegetation growth, stress response and the synthesis of bioactive compounds [76,77]. To predict the regulatory proteins in C. sativa, the 48,799 assembly transcripts were compared with the sequences in the iTAK database. In total, 2012 tfs were identified including 63 families in which myb-> myb-related (152), bhlh (128), myb-> myb (118), c3h (114) and garp-> garp-g2-like (99) were the top five families. A total of 900 trs were found including 24 families, in which set (152), snf2 (128), others (127), phd (79) and gnat (74) were the top five families. A total of 2161 pks were identified, which included 114 families. Among them, the largest family is rlk-pelle_dlsv with 354 members (Table S8). Among them, some regulatory proteins were potentially involved in the cellulose metabolism. For example, the mitogen-activated protein kinase kinase (MAPKKK) could promote cell division, differentiation and the synthesis of cellulose in Arabidopsis [78], which contains 48 transcripts. Myb domain protein 46 (myb46) enhanced the content of cellulose by targeting several secondary wall nac regulators and cesas in Arabidopsis [79], which contained one transcript (Table S3).

Identification of miRNAs in C. sativa
The small RNA library was constructed using pooled RNA of stems, roots and leaves of C. sativa and sequenced by the HiSeq 2000 platform. After filtering, a total of 22,164,487 clean reads ranging from 18 to 30 nucleotide (nt) in size were obtained. Among them, the clean reads with counts ≥5 were further analyzed by the psRobot and Mireap software to predict the miRNAs in C. sativa. After checking manually, a total of 563 miRNAs were predicted. Among them, 183 miRNAs were identified as conserved miRNAs by comparing with miRbase, which could be divided into 36 families in which miR156 was the largest family, with 16 members. The remaining 380 miRNAs were adjudged to be C. sativa-specific miRNAs, which were grouped into 60 families. Among them, CsmiRNA-n4 was the largest family that possessed 34 members (Table S9).

Target Prediction of miRNAs in C. sativa
The psRNATarget software was used to predict the target genes of the miRNAs in C. sativa. As a result, the 178 conserved and 365 C. sativa-specific miRNAs were found to potentially target 1822 and 3951 transcripts, respectively (Table S10). These transcripts were involved in multiple functions. Among these target transcripts, 2673 of them were annotated by the KEGG database. KEGG enrichment showed that 15, 25, 160, 271 and 29 transcripts were involved in cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems, respectively, in which the top five enrichment pathways were RecQ-mediated genome instability protein 1, CCR4-NOT transcription complex subunit 4, KAS II, GPI ethanolamine phosphate transferase 3 subunit O and alpha-mannosidase ( Figure 3).

Cleavage Targets Guided by miRNAs
miRNAs could act on their targets by cleavage [80]. In order to validate the authentic cleavage targets of miRNAs, a degradome library was sequenced, and psRobot software was used to analyze the cleavage site of target genes and the reads of degradation fragments [81]. As a result, a total of 10,810,702 clean reads represented by 5,623,324 unique sequences were obtained after high-quality filtering and removing the adapters and null inserts. Most of the miRNAs in plants target genes at the 9th, 10th and 11th nucleotides of the complementary region between miRNA and mRNA, occasionally at the other site [82]. The cleavage sites of miRNA targets were predicted by psRobot software, as shown in Table S13.
In total, 145 transcripts were identified, which were targeted by 62 miRNAs, including 40 conserved miRNAs and 22 C. sativa-specific miRNAs (Table S14). The major targets of conserved miRNAs were TFs involved in development, responding to abiotic stresses and metabolism. In aspects of development, for example, miR156 targeted 34 squamosa promoter binding protein like transcripts (SPLs). In Arabidopsis thaliana L. SPLs played an important part in flowering development [83]. miR160 targeted six auxin response factor 16 (ARF16) transcripts. It was reported that miR160 regulated auxin mediated development by ARF10/16/17 in tomato through posttranscriptional regulation [84]. MiR164 targeted 10 CUP SHAPED COTYLEDON 2 (CUC2) transcripts. MiR164 regulated the axillary meristem formation by targeting CUC1 and CUC2, participating in the development of cotyledons and floral organs in Arabidopsis [85]. In response to abiotic stresses, 20 transcripts encoding NAC TFs were identified as targets of miR164. NAC TFs were validated for response to salinity as well as high temperature and drought stresses in Brassica juncea L. Czern [86]. In aspects of metabolism, miR393 targeted two transport inhibitor response 1 (TIR1) genes and two auxin signaling f-box 2 (AFB2) genes. In Arabidopsis, TIR1 and AFB2 had an effect on the auxin sensitivity [87]. In addition, 16 ubiquitin-conjugating E2 enzyme transcripts were targeted by miR399. A previous study showed that miR399 targeted ubiquitin-conjugating E2 enzyme genes functioning in the control of inorganic phosphate homeostasis in Arabidopsis [88].

Identification of lncRNAs in C. sativa
To decipher the functions of lncRNAs in C. sativa, a pipeline was firstly established to predict the lncRNAs. After removing 48,800 transcripts without Nr annotations, the 1375 transcripts less than 300 bp in length were discarded. Then, the remaining transcripts were filtered by ORF ≤ 100 aa using ESTScan, leaving 8260 genes. Next, the CPAT was used to filter the 430 possible protein-coding genes, leaving a total of 7830 transcripts. Finally, BLASTN was used to remove the housekeeping npcRNAs (47) from the alignment with the Rfam database, thereby obtaining 7783 lncRNA candidates and 47 housekeeping npcRNAs (Table S15, Figure 4A). After that, an assay of the length distribution of lncRNAs was conducted. It was found that 5834 lncRNAs ranged from 300 to 1000 bp. Most of the lncRNAs were distributed between 1000 and 2000 bp. The others included 525 lncRNA candidates with 2000-4000 bp and 41 over 4000 bp in length ( Figure 4B).

Target Prediction for lncRNAs in C. sativa
BLASTN was used to predict the targets of lncRNAs to reveal their regulatory role in C. sativa. Among the 7783 lncRNAs, 1516 lncRNAs were revealed to target 3452 transcripts (Table S16). Among the 3452 transcripts, 1536 were annotated by the KEGG database. KEGG enrichment showed that 26, 25, 102, 224 and 25 genes were involved in cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems, respectively, in which the top five enrichment pathways were 1,4-alpha-glucan branching enzyme, uroporphyrinogen-III synthase, RecQ-mediated genome instability protein 1, structure-specific endonuclease subunit SLX1 and flap endonuclease-1 ( Figure S2).
Then, the targets related to the biosynthetic pathway of cannabinoids, FA and cellulose were focused on. Therefore, lncTar software was used to further predict the interactions between lncRNAs and target genes. As a result, nine lncRNAs were found to target the LOX, GPPS, AAE and DXS genes in the cannabinoid biosynthetic pathway, and eight lncRNAs were found to target the COBL, kinesin-like and MAPKKK genes in the cellulose metabolism (Table 2). In addition, 111 TFs, 59 TRs and 141 PKs were also identified, which were targeted by 92, 90 and 135 lncRNAs, respectively (Table S17).

QRT-PCR Analysis of the Expression Files of lncRNAs and Target Genes
Seven lncRNA/structural gene pairs were selected for qRT-PCR validation, and the primer sequences are shown in Table S19. As a result, three lncRNA/structural gene pairs were identified with negative expression files in at least four tissues, including lncR880 (T_00090880)/AAE (T_00058852), lncR682 (T_00043682)/kinesin-like protein (T_00043674) and lncR578 (T_00030548)/COBL (T_00016031) (Figure 5), indicating the negative regulation of these lncRNAs.

Discussion
In this study, transcriptome, small RNA and degradome libraries of hemp 'Yunma No.1' were constructed and comprehensively analyzed. Compared with previous studies in C. sativa, three advancements were made. Firstly, the protein-coding genes related to cannabinoids, FA and cellulose were mined, and the transporter and regulatory protein genes were systematically predicted.
Secondly, to reveal the gene expression and regulation at post-transcriptional level, the AS, miRNAs and lncRNAs in C. sativa firstly were predicted. A total of 27,907 AS events in C. sativa were identified, and five AS events were validated by RT-PCR at random. Among them, AS occurred in some structural genes of the cannabinoids, FA and cellulose biosynthetic pathway.
MiRNAs and lncRNAs functioned in multiple pathways. We focused on the top five KEGG enrichment pathways of miRNA/lncRNA targets. For miRNAs, it was reported that miR-362 targeted RecQ-mediated genome instability protein 1 [94]. miR-125b targeted ER alpha-1, 2-mannosidase and played a critical role in maintaining protein homeostasis in the mammalian secretory pathway [95], while, KAS II, GPI ethanolamine phosphate transferase 3 subunit O and CCR4-NOT transcription complex subunit 4 potentially targeted by miRNAs were not reported. KAS II was involved in fatty acid elongation from 16:0-ACP to 18:0-ACP [96]. GPI ethanolamine phosphate transferase 3 subunit O functioned in GPI-anchor biosynthesis [97]. The top five KEGG enrichment pathways targeting lncRNAs have not been revealed to date. For example, 1,4-alpha-glucan branching enzyme played a significant role in the biosynthesis of branched polysaccharides, glycogen and amylopectin [98]. All these results indicate the important regulation of miRNAs and lncRNAs in C. sativa.
MiRNAs and lncRNAs potentially regulate the biosynthesis of cannabinoids, FA and cellulose in two aspects. On the one hand, they directly target the structural genes (Tables 1 and 2, Figure 5). On the other hand, they targeted the structural gene related regulatory protein genes. Previously, we determined some GA-or isoflavone-related TFs, TRs and PKs potentially targeted by miRNAs, indicting the significant roles of regulatory protein genes in metabolism [77,99]. Based on the known biosynthetic pathways of cannabinoids, Luo et al. successfully biosynthesized cannabinoids along with the unnatural analogues in yeast [100], but the different isoforms of structural genes, regulatory protein genes and non-coding RNAs were not considered. In this study, only a strand-specific RNA-Seq library was constructed. In the future, more libraries should be sequenced, and co-expression analysis will be carried out between the interested structural genes and regulatory genes to screen out and validate highly relevant regulators and make full use of them.
Additionally, the crosstalk between AS and miRNAs or lncRNAs in the biosynthesis of cannabinoids, FA and cellulose was studied for the first time in C. sativa (Tables S12 and S18). AS could regulate miRNA and lncRNA-mediated gene regulation by producing mRNA isoforms, which contained miRNAs/non-coding RNAs target sites or not [101]. A previous study revealed that the 44 of 354 identified miRNA binding sites were affected by AS in Arabidopsis [42]. Some lncRNA genes also possessed introns, leading to the production of lncRNA isoforms [102]. In Arabidopsis, Calixto et al. identified 135 coldresponsive lncRNAs, of which AS occurred in one-third [103]. In this research, the crosstalk between 17 lncRNAs and their target transcripts was predicted in C. sativa (Table S18).
During our work, a PacBio SMRT and a nanopore-based assembly genome of C. sativa was released in the NCBI under the accession numbers PRJNA73819 and GCA_900626175.2, providing good sources for transcriptome assembly of C. sativa for the future [16,17].
Taken together, all these results enhance the knowledge of gene expression and regulation in C. sativa.

Plant Materials and RNA Extraction
Hemp 'Yunma No.1' plants were cultivated in field conditions at the Yuanjiang experimental station of Bast Fiber Crops, Chinese Academy of Agricultural Sciences. Different tissues including flowers, fruits, stems, leaves and roots were collected from three female hemp plants in September 2014 and quickly frozen in liquid nitrogen. Total RNAs of fruits and roots were extracted using Trizol Reagent (Invitrogen, Carisbad, CA, USA). Total RNAs of flowers, stems and leave were extracted using Lysis Buffer PL (Bioteke, Wuxi, China). Then, these RNAs were treated by RQ1 DNase (Promega, Madison, WI, USA) to eliminate the genome DNA. Briefly, about 100 µg total RNA of each sample was incubated with 20 U DNase I at 37 • C for 30 min. Then, equal volumes of phenol/chloroform/isoamyl alcohol (25:24:1) were added and mixed. After centrifugation at 12,000 rpm for 5 min at room temperature, the upper layer was transferred to a new tube. Subsequently, 1/10 volume of 3 M sodium acetate and 2 volumes of chilled ethanol were added, mixed and kept at for 20 min at −80 • C. Finally, after centrifugation at 12,000 rpm for 10 min at 4 • C, the precipitation was washed with chilled 70% ethanol, dried and dissolved in a suitable amount of DEPCtreated water. Equal quantities of total RNAs from different tissues were pooled together, then their integrity and concentrations were examined using a NanoDrop 2000c bioanalyzer (Thermo Fisher Scientific, Waltham, MA, USA). The pooled RNA with an integrity number of more than seven was used for the following RNA-Seq, small RNA and degradome library construction.

RNA-Seq Library Construction and Analysis
The RNA-Seq library was built by using the dUTP method according to the manufacturer's protocol [104]. First, mRNA was isolated from 2 µg total RNA by using Oligo (dT) magnetic beads. Then, about 20 ng of mRNA was used for library construction. After the RNA sample was qualified, the mRNA was enriched with magnetic beads with Oligo (dT). Fragmentation buffer was used for breaking mRNA into short fragments, which were used as a template to synthesize one-strand cDNA with six base random primers (random hexamers). Then, we added buffer, dNTPs (the dTTP in dNTP was replaced by dUTP) and DNA Polymerase I and RNase H to synthesize two-strand cDNA. The double-strand cDNA was purified by AMPure XP beads. The second strand of cDNAs containing Us was degraded by USER enzyme and further repaired, A-tailed and connected to the sequencing adapter. Finally, PCR amplification was performed, and the final library was purified by AMPure XP beads. After the library was constructed, Qubit was used for preliminary quantification, and then Agilent 2100 was used to detect the size of the insert in the library. High-throughput sequencing was carried out by using a HiSeq 2000 platform. Raw reads were first processed using Trimmomatic with the following options: ILLUMINACLIP: TruSeq3-PE. fa: 2:30:10 LEADING:5 TRAILING:5 to remove adapters and discard reads containing low quality or N bases (below quality 5). After filtering, the clean reads were aligned with the C. sativa 'Purple Kush' genome [18] using tophat with the -a/-min-anchor 8 option, which required that the anchor length should be more than 8bp for splicing-junction reads [18,44]. Then, the mapped reads were assembled by Cufflinks software using -F 0.05 -A 0.01 -I 15000 [45]. Subsequently, the assembled files were used to identify the AS events and types by ASTALAVISTA with default options using the following command: /usr/java/jdk1.6.0_45/jre/bin/java -Xmx4G -jar/srv/web/cgi -bin/astal avista/jar/gphase5.jar astalavista.gtf -nonmd -genome csa -output gtf -html -html 2 > &1 [73]. To determine gene functions, the assembly transcripts were annotated by NR and KEGG databases.

Validation of AS through Reverse Transcriptional PCR
Total RNA (2 µg) as described for RNA-Seq library construction was used for reverse transcription PCR (RT-PCR) to validate the AS events. Briefly, RT was carried out using 2 µg of total RNA by 200 U M-MLV Reverse Transcriptase (TaKaRa, Dalian, China) in a 20 µL volume under the following the conditions: 65 • C for 5 min; 25 • C for 10 min; 42 • C for 60 min and 70 • C for 15 min. The resulting cDNA was used for PCR amplification, which was carried out under the following conditions: 95 • C for 3 min; 45 cycles of 94 • C for 30 s; 58 • C for 30 s and 72 • C for 15 s. The PCR products were separated by electrophoresis with a 3% agarose gel. The primers are listed in Table S19.

Small RNA Library Construction and Analysis
The small RNA library was constructed as described [105]. Briefly, 10-30-nt small RNAs were purified from 1 µg Total RNA by a 15% denaturing polyacrylamide gel and then ligated to the adapters. After reverse transcription by M-MLV (TaKaRa, Dalian, China), these small RNAs were amplified by PCR and isolated by 6% polyacrylamide gel, and suitable sizes were recovered and checked by qPCR. Finally, the library was sequenced by the HiSeq 2000 platform at Beijing HiTseq Technology Co. Ltd. (Beijing, China).
After sequencing, the raw reads were first processed using Trimmomatic software [106] with the option LEADING:5 TRAILING:5 to discard reads containing low quality or N bases (below quality 5). Then, adaptor and shorter reads (less than 16 nt) were removed by fastx_clipper in the FASTX-Toolkit v0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/) (2 February 2010) using the option fastx_clipper -Q 33 -a CTGTAGGCACCATCAATCA -l 16 -d 0 -n -v -M 4. Then, the clean small RNAs with counts ≥5 ranging from 18 to 30 nt were further analyzed. These small RNAs were compared with the C. sativa 'Purple Kush' genome sequences by the psRobot software (v1.2) [81] and mireap software [107] using the default parameters. Then, these predicted miRNAs were checked manually according to the criterion of miRNAs [108]. Minimum free energy index (MFEI) was also calculated [109], and a cutoff >0.85 was used to distinguish miRNAs from protein-coding genes because the MFEI of more than 90% of miRNA precursors is >0.85.

Prediction of MiRNA Targets
Through the psRNATarget Server using an expectation penalty score of ≤3, targets of miRNAs were predicted [110].

Degradome Library Construction, Sequencing and Analysis
Briefly, about 150 µg pooled total RNA was used for degradome library construction by BGI (Shenzhen, China) following a published protocol [111]. In brief, mRNAs were isolated using oligo(dT) magnetic beads. Then, single-stranded 5' RNA adaptors were ligated to mRNA fragments using T4 RNA ligase (Ambion, Austin, TX, USA) and then reverse-transcribed into cDNA using Superscript III reverse transcriptase (Invitrogen, Carlsbad, CA, USA). After digestion with MmeI, 3' DNA adaptors were ligated to the digested DNA fragments. Finally, the products were amplified by PCR and sequenced using the HiSeq 2000 system (BGI, Shenzhen, China). The cleavage sites and the targets of miRNAs were identified using the psRobot software (v1.2) using the options -ts 2.5 -fp 2 -tp 17 -gl 17 -p 1 -gn 1 [81].

Prediction of LncRNAs
A pipeline was established for identification of lncRNAs in hemp ( Figure 4). First, the assembly transcripts without Nr annotation were further analyzed. Then, the transcripts less than 300 base pairs (bp) in length were removed. Subsequently, the remaining transcripts were filtered by ESTScan with a cutoff of ORF ≥ 100 amino acids (aa) [112]. Next, the coding potential assessment tool (CPAT) was used to filter the possible protein coding genes [113]. Finally, BLASTN program was used to remove the housekeeping npcRNAs annotated by the Rfam database.

Prediction of LncRNA Targets
Targets of lncRNAs were predicted by using BLASTN to search the lncRNAs with regions significantly complementary to the protein-coding genes using the cutoff 1 × 10 −5 [114,115]. Then, the interesting lncRNA-RNA interactions were further analyzed by LncTar using the cutoff of the normalized deltaG (ndG) ≤ 0.1 [116].

QRT-PCR Analysis of the Expression Files of LncRNAs and Targets
RNA purity, concentration and integrity of each sample treated by DNase I were examined using a NanoDrop 2000c bioanalyzer (Thermo Fisher Scientific, Waltham, MA, USA). Then, RT was carried out using 1 µg total RNA and 200 U M-MLV Transcriptase (TaKaRa, Dalian, China) in a 20 µL volume. The reaction was conducted at 70 • C for 10 min, 42 • C for 60 min and 70 • C for 15 min. The RT product was diluted 40 times with sterile water. The qPCR was conducted on the BIO-RAD CFX system in a 20 µL volume containing 4 µL diluted cDNA, 250 nM forward primer, 250 nM reverse primer, and 1 × SYBR Premix Ex Taq II (TaKaRa, Dalian, China) using the following conditions: 95 • C for 3 min, 40 cycles of 95 • C for 15 s, 59 • C for 15 s and 72 • C for 15 s. Each test was performed in triplicate. The 40S ribosomal protein S8 (T_00004056) was used as an internal reference because its expression level is relatively stable in different tissues. Melting curves were analyzed to verify the specificity using the Bio-Rad CFX Manager software. The relative expression levels were calculated using the 2 −∆∆CT method [117]. The primer sequences are listed in Table S20.  Data Availability Statement: All the raw data of the transcriptome and small RNAs are deposited in the NCBI short read archive (SRA) under the accession numbers SRR12904745 and SRR12886428.

Conflicts of Interest:
The authors declare no conflict of interest.