Genome-Wide Identification and Characterization of UTR-Introns of Citrus sinensis

Introns exist not only in coding sequences (CDSs) but also in untranslated regions (UTRs) of a gene. Recent studies in animals and model plants such as Arabidopsis have revealed that the UTR-introns (UIs) are widely presented in most genomes and involved in regulation of gene expression or RNA stability. In the present study, we identified introns at both 5′UTRs (5UIs) and 3′UTRs (3UIs) of sweet orange genes, investigated their size and nucleotide distribution characteristics, and explored the distribution of cis-elements in the UI sequences. Functional category of genes with predicted UIs were further analyzed using GO, KEGG, and PageMan enrichment. In addition, the organ-dependent splicing and abundance of selected UI-containing genes in root, leaf, and stem were experimentally determined. Totally, we identified 825 UI- and 570 3UI-containing transcripts, corresponding to 617 and 469 genes, respectively. Among them, 74 genes contain both 5UI and 3UI. Nucleotide distribution analysis showed that 5UI distribution is biased at both ends of 5′UTR whiles 3UI distribution is biased close to the start site of 3′UTR. Cis- elements analysis revealed that 5UI and 3UI sequences were rich of promoter-enhancing related elements, indicating that they might function in regulating the expression through them. Function enrichment analysis revealed that genes containing 5UI are significantly enriched in the RNA transport pathway. While, genes containing 3UI are significantly enriched in splicesome. Notably, many pentatricopeptide repeat-containing protein genes and the disease resistance genes were identified to be 3UI-containing. RT-PCR result confirmed the existence of UIs in the eight selected gene transcripts whereas alternative splicing events were found in some of them. Meanwhile, qRT-PCR result showed that UIs were differentially expressed among organs, and significant correlation was found between some genes and their UIs, for example: The expression of VPS28 and its 3UI was significantly negative correlated. This is the first report about the UIs in sweet orange from genome-wide level, which could provide evidence for further understanding of the role of UIs in gene expression regulation.


Introduction
The existence of intron in genes was initially discovered in late 1970 [1]. Gradually, it has come to be identified as a common event in all eukaryotic genomes. In the past, it was recognized as an important gene structure that was eliminated from transcripts by a complex molecular mechanism 2 of 23 called a spliceosome [2][3][4]. Recently, however, the intron retention (or alternative splicing, AS) in transcripts has also been widely identified, suggesting that they are not only components in gDNA but may also play roles in gene expression regulation or even functioning.
Introns are important for the gene expression process and are thought to go through five phases before the formation of mature mRNA: Genomic intron, transcribed intron, spliced intron, excised intron and exon-junction complex (EJC)-harboring transcript [4]. The presence and removal of introns in gene transcripts affect almost every step of gene expression, including transcription and polyadenylation, mRNA export, localization, translation and attenuation [5,6]. Leon et al. [7] reported that the maize Adhl intron 1 enhanced gene expression in the legume species for about 10-fold. The Sh1 intron 1 enhances chimeric gene expression in rice and maize protoplasts approximately 100-fold [8]. These have revealed that the introns in the transcripts showed positive effect on gene expression, i.e., intron-mediated enhancement (IME) [9][10][11].
In eukaryotes, mature mRNAs have a tripartite structure consisting of a 5 -untranslated region (5 UTR), a coding region (CDS) and a 3 -untranslated region (3 UTR). It has been widely proven that UTRs play crucial roles in the control of mRNA translation efficiency, stability and subcellular localization [12]. There are also introns in the 5 and 3 UTRs of many protein-coding genes. The 5 UTR intron (5UI) is usually longer than the intron within the CDS and may affect gene expression, mRNA stability or mRNA export [13]. Reports have shown that, like CDS introns close to the 5 end of a gene, introns located in the 5 UTR also often exert positive effects on gene expression by affecting the basal promoter activity [14][15][16][17][18]. In the study of Norris et al. [19], the 5UIs of Arabidopsis thaliana polyubiquitin genes (UBQ3, UBQ10, and UBQ11) were identified to be quantitative determinants of chimeric gene expression. Grant et al. [16] reported that the 5UI might contribute to the high activity of the soybean polyubiquitin promoter. Sivamani and Qu [20] reported that the rice polyubiquitin gene (rubi3) promoter containing the 5UI conferred approximately 20-fold higher GUS gene expression than the intron-less promotor in bombarded rice suspension cells. Chaubet-Gigot et al. [21] demonstrated that the 5UI of Arabidopsis replacement histone H3 genes together with their endogenous promoters could produce higher expression of the H3 genes. Kamo et al. [15] also revealed that transgenic Gladiolus (monocot) and Arabidopsis (dicotyledon) plants overexpressing the Gladiolus polyubiquitin promoter (GUBQ1) containing 5UI showed higher GUS expression level compared with the transgenic plants overexpressing GUBQ1 promoter without it.
Unlike 5UI, 3UI was thought to usually downregulate gene expression levels [22,23] through a nonsense-mediated decay (NMD) pathway [24,25]. The NMD inhibition affects a much higher proportion of 3UI containing gene transcripts than non-3UI containing gene transcripts, and the most significantly enriched NMD-affected gene transcripts are those that encode RNA-binding proteins [23,26,27]. The transcripts containing 3UIs were previously thought to be nonfunctional because of their rapid degradation through NMD. Recently, however, some 3UI containing transcripts were also proved to be functional [23].
The genome-wide identification of UIs can provide information for the understanding of gene expression regulation and alternative splicing network in the species from the whole genome level. Cenik et al. [28] studied the UI distribution in human genome and found that the highly expressive genes often had a short 5UI, indicating the 5UI in the human genome enhanced the expression of certain genes in a length-dependent manner. Hong et al. [29] analyzed the intron size, abundance, and distribution in A. thaliana, Drosophila melanogaster, human, and mouse UTRs, found that 5UIs were approximately twice as large as introns in CDSs and 3UIs, and there was a sharp drop in intron size at the 5 UTR-CDS boundary. Chung et al. [30] analyzed the UI distribution in A. thaliana genes, found that 5UIs from Arabidopsis were more proximal to the UTR end or closer to the transcriptional start site. Moreover, they also found that the 5UI in EF1a-A3 gene could enhance its expression in a size dependent manner.
Sweet orange (Citrus sinensis) is an important fruit with high nutrition and economical values. It accounts for approximately 60% of citrus production for both fresh fruit and processed juice consumption (FAO, http://faostat.fao.org/default.aspx). Its genome has been published and the UTR information was well addressed [31], which could be used for the genome-wide identification of UIs. Based on the C. sinensis genome, we identified and characterized the distribution of introns, including 5UI, 3UI, and introns in CDSs. The UI existence conditions in several genes were determined using RT-PCR and their expression pattern in different organs were investigated using qRT-PCR. To our knowledge, this is the first report of horticultural plant UIs from genome-wide level.

UTR Introns in the Citrus sinensis Genome
Totally, 29,655 protein-coding genes were identified from the sweet orange genome, of which 15,823 (53.36%) contained both 5 UTR and 3 UTR, 1093 (3.69%) only contained 5 UTR, 1585 (5.34%) only contained 3 UTR, and 11,154 (37.61%) contained neither. In Arabidopsis, more genes were found to lack 5 UTR annotation rather than 3 annotation [30], which is similar to the result found in our present study. To show the intron density in UTRs and CDSs, we normalized the intron density to the average number of introns per nucleotide of each gene transcript sequence. The intron density followed the order: CDS > 5 UTR > 3 UTR, which was 1.42 × 10 −4 , 2.98 × 10 −3 and 6.36 × 10 −5 respectively ( Table 1). The intron density in 5 UTRs is~2.2 times the intron density in 3 UTRs and is only~4.8% of the intron density in CDSs. The intron density in the UTRs and CDSs of Arabidopsis gene transcripts also follow the same order, but with much higher density [30]. Notably, the intron density in the Arabidopsis 5 UTRs (~1.6 × 10 −3 , about 60% of CDSs) was > 10 times higher than that of sweet orange, indicating that the 5UI regulation of gene expression in Arabidopsis is more frequent. The high 5 UTR intron density might be due to the fact that the Arabidopsis genome is better sequenced and annotated with less unknown sequences (Ns). In both sweet orange and Arabidopsis, the 3 UTR intron density is the lowest (the 3UI density is only), suggesting that plants may utilize a similar NMD pathway like mammals [30,32,33]. A total of 965 5UIs and 745 3UIs respectively were identified from 825 5 UTR-and 570 3 UTR-containing gene transcripts, corresponding to 617 and 469 genes (Supplemental Materials  Table S1), respectively. Among these UI-containing gene transcripts (UI-Ts), 77 (74 genes) were found to contain both 5UI and 3UI. Around 86.43% of the 5UI-Ts and 78.60% of the 3UI-Ts only contain one 5UI or 3UI ( Table 2). The proportion of UI-Ts with two or more UIs dropped greatly, which is consistent with previous reports in other organisms [28,29,34]. Only 10.91% of the 5UI-Ts contain two 5UIs, and 15.79% of the 3UI-Ts contain two 3UIs. The percentages of UI-Ts containing more than three 5UIs or 3UIs were both less than 4%. Notably, we also identified some UI-Ts were of several UIs, for example, a A20/AN1-like zinc finger family protein gene (Cs7g06380.1) and an unknown gene (orange1.1t06039.1) had five 5UIs; a zinc finger protein gene (Cs2g17870.1) and another unknown gene (Cs5g25765.1) had four 5UIs; two NB-ARC domain-containing disease resistance protein genes (Cs1g15550.1, Cs1g17000.1) had six 3UIs, four pentatricopeptide repeat (PPR)-containing protein genes (Cs2g11780.1, orange1.1t01460.1, orange1.1t03486.1,) and a NB-ARC domain-containing disease resistance protein gene (Cs1g16990.1) had five 3UIs, eight unknown genes (Cs2g08495.1, Cs3g12715.1, orange1.1t05956.1, orange1.1t03536.1), one S-phase kinase-associated protein 1 (Cs3g10260.1), one pentatricopeptide repeat (PPR)-containing protein gene (Cs5g26550.1), one NB-ARC domain-containing disease resistance protein gene (Cs5g28645.2) and one tetratricopeptide repeat (TPR)-like superfamily protein gene (Cs7g15390.1) had four 3UIs. Interestingly, by searching the expression data of sweet orange genes in the genome database, we found that the UI-Ts containing multiple 3UIs usually tend to record low-expression levels (average FPKM < 4). We further mapped the UI-Ts to the sweet orange chromosomes ( Figure 1). Results showed that the UI-Ts were unevenly distributed in all the sweet orange chromosomes. Chromosome 2 showed the highest 5UI-T distribution (16.79%) and the highest density (5.26 × 10 −6 ), while chromosome 1 showed the highest 3UI-T distribution (13.83%) and the highest density (3.58 × 10 −6 ) ( Table 3). Chromosome 9 contained the least 5UI-Ts and 3UI-Ts, respectively accounting for 4.96% and 3.62% (Table 3). The 5UI-Ts densities in chromosome 2, 3, 4, 5, 6, 7, and 9 were all higher than the 3UI-Ts densities. Notably, the 5UI-T density is about 2.75 times higher than the 3UI-T density in chromosome 2, and 2.57 times higher in chromosome Moreover, the UI-Ts distribution in the same chromosome is also uneven. For example, significant more 5UI-Ts were found in the 3 ends of chromosome 3, 5, and 6 and in the 5 end of chromosome 4, and significantly more 3UI-Ts in the 3 ends of chromosome 1, 2, 5, and 8.
We further calculated the UI distribution in the sweet orange chromosomes. Like the UI-T distribution result, the highest 5UI density was found in Chromosome 2 and the highest 3UI density was found in Chromosome 1, and the lowest 3UI density (2.38 × 10 −6 ) was found in chromosome 9 ( Table 3). The 5UI density in chromosome 2 and chromosome 6 were both~2 times of 3UI density. However, unlike the 5UI-T distribution condition, the lowest 5UI density was found in chromosome 8, which might be due to the chromosome length differences.
Intron number and size distribution analysis result showed that the number and length of introns within 5 UTRs, CDSs and 3 UTRs varied (Figure 2A We further calculated the UI distribution in the sweet orange chromosomes. Like the UI-T distribution result, the highest 5UI density was found in Chromosome 2 and the highest 3UI density was found in Chromosome 1, and the lowest 3UI density (2.38 × 10 -6 ) was found in chromosome 9 ( Table 3). The 5UI density in chromosome 2 and chromosome 6 were both ~2 times of 3UI density. However, unlike the 5UI-T distribution condition, the lowest 5UI density was found in chromosome 8, which might be due to the chromosome length differences.   By comparing the intron size distributions within 5 UTRs, 3 UTRs, and CDSs, it is obvious that short introns accounted for a lot, which is similar to the result in Arabidopsis [30]. Most abundant short introns <50 nucleotides were found in the 3 UTRs, followed by in the 5 UTRs, and in CDSs short introns were very rare. The relative frequency of introns with length ranging from 100 to 300 nucleotides in CDSs was significantly higher than that in 5UIs and 3UIs. Longer introns above 300 nucleotides in UIs were more than that in CDSs, indicating that CDS intron length was more conservative. Figure 2B,C respectively represents the distribution of intron position within the 5 UTRs and 3 UTRs, relative to the beginning and end of the corresponding UTRs. It appears that 5UIs are preferentially located close to the stop ends of 5 UTRs (nearby the translation initiation site of genes), although the location preference close to the beginning of 5 UTR is also relatively high. The splicing-dependent complex mRNA-protein (mRNP) component, which showed positive influence on rapid export and translation of newly synthesized mRNA, is also deposited as close as possible to the 5 end of the mRNA [35]. Studies have also shown that 5UI would lead to large accumulation of EJCs, which interact with the translation initiation site and result in IME [36,37]. The proximity of 5UI to the end of 5 UTR (which is also close to the translation start site and 5 end of mRNA) is well consistent with the 5UI regulatory role in gene translation [30].
It was reported that an EJC downstream of the stop codon should persist and stimulate NMD [38]. In our present study, we found that 3UIs are more frequently located at the beginning of 3 UTRs, i.e., close to the stop codon. The proximity of 3UI to the stop codon might cause the stop codon from being recognized as premature and triggering NMD [39].
Intron removal is influenced by many splicing signals and factors [40]. The splice sites (SS) in the exon-exon junction, including 5 donor site and 3 acceptor site, were usually conserved. However, in some instances the SS alternate, resulting in AS event [41]. By comparing the nucleotide preferences surrounding the splicing junction using sequence logos [42], the nucleotide bias around the donor and acceptor site of 5 UTR, CDS and 3 UTR introns were found. Figure 3A,B respectively shows the aggregation of nucleotides around the splice donor (GT) and splice acceptor (AG) junctions in 5 UTRs and 3 UTRs. About 97.92% of the 5UIs were with splice site pair of GT-AG, followed by GC-AG (1.45%), AT-AC (0.20%), CT-AC (0.20%), CT-GG (0.10%), and TG-AT (0.10%). Of the 3UIs, 97.18% were with splice site pair of GT-AG, followed by GC-AG (2.14%), AT-AC (0.26%), GT-TG (0.13%), TA-AG (0.13%), and TT-AG (0.13%). The splice site pair category results of sweet orange UIs were similar to the findings in Arabidopsis and mammalian genomes [30,43]. The sequence logos showed that both the UTR introns and the CDS introns exhibit A/T-rich element in both donor sites and receptor sites. The early recognition of introns is thought to be mediated by UA-binding proteins [40]. Thus, it was suspected that the A/T-rich element might contribute to intron recognition. It was reported that U-rich sequence in the 5UI can bind to the RNA-binding protein, which can promote the translation initiation of uAUG by interacting with transcription initiation factors [44][45][46][47]. This suggests that the A/T-rich element in the 5UI contribute to gene translation initiation. Additionally, Chung et al. [30] identified a significant C-rich region near the donor site of Arabidopsis 5UI, which was predicted to be necessary for the spliceosomal recognition of introns within non-coding sequences. Although a C-rich region (ranges from +5 to +25 bases after intron start) can also be found in sweet orange 5UIs, the frequency of C is much lower.

Functional Implications of Genes with UTR Introns
To explore the functional characteristics of 5UI-Ts and 3UI-Ts, GO enrichment analysis was performed. From the aspect of biological processes, 107 and 7 GO terms were significantly enriched for 5UI-Ts and 3UI-Ts, respectively (Supplemental Materials Tables S2 and S3). For the 5UI-Ts, genes involved in "translation", "peptide biosynthetic process", "metabolic process", and "gene expression" accounted a lot. While, 3UI-Ts were mainly involved in "defense response", "response to stress", "response to stimulus", "histone methylation", "peptidyl-lysine methylation", "histone lysine methylation", and "peptidyl-lysine modification". From the aspect of cellular component, 41 GO terms, including genes involved in "organelle", "ribosome", "intracellular ribonucleoprotein complex", "intracellular part", "cell part", and "cell" were significantly enriched for 5UI-Ts. For the 3UI-Ts, there are just 4 terms, i.e., "exocyst", "cell cortex", "cell cortex part", and "cytoplasmic region", were identified to be significantly enriched. From the aspect of molecular function, 45 and 12 GO terms were significantly enriched respectively for 5UI-Ts and 3UI-Ts. For the 5UI-Ts, genes were mainly involved in "protein binding", "zinc ion binding", "methionine adenosyltransferase activity", "glycylpeptide N-tetradecanoyltransferase activity", "ribonuclease inhibitor activity" "myristoyltransferase activity", "structural constituent of ribosome", "translation factor activity", "RNA binding", and so on. For the 3UI-Ts, genes were found to involved in "ADP binding", "protein binding", "binding", "histone binding" and "translation factor activity, RNA binding", and so on. The GO enrichment analysis results indicated that the 5UI-Ts were highly correlated with the gene expression, while many 3UI-Ts were mainly involved in stress responses. Moreover, the main functions of 5UI-Ts and 3UI-Ts differed and their roles in different cell parts varied.
By using KEGG enrichment analysis, we investigated the pathway enrichment of both 5UI-Ts and 3UI-Ts (Supplemental Materials Table S4 and S5). It has been confirmed that the presence or absence of a 5UIs can determine the mRNA export mechanism [23,48]. Consistently, the only significantly enriched pathway for 5UI-Ts is "RNA transport". Genes such as translation initiation factor protein EIF1 (Cs4g10310.1, Cs4g10310.2), EIF1A (Cs4g10310.1, Cs4g10310.2) and EIF5 (Cs3g18950.1, Cs6g18000.1, Cs6g18000.2), protein transport proteins (SEC13, Cs1g15600.1 and Cs2g28780.1), nonsense-mediated mRNA decay protein 3 (NMD3, Cs6g17980.1), and Ran GTPase-activating protein 1 (RANGAP1) (Cs9g06440.1, Cs9g06440.2, and Cs9g06440.3) which were identified to be RNA transport related. Notably, EIF1 and EIF5 have been revealed to function in stimulating the assembly of the translation initiation complex by interacting with the 40S ribosomal subunit [49]. SEC13, as a component of the nuclear pore complex (NPC) and the COPII coat, has been proved to be required for efficient mRNA export from the nucleus to the cytoplasm and for correct nuclear pore biogenesis and distribution [50]. NMD3, however, was found to be involved in NMD of mRNAs containing premature stop codons [51]. The RANGAP1 converts cytoplasmic GTP-bound RAN to GDP-bound RAN, which is essential for RAN-mediated nuclear import and export [52]. From these reports, it can be concluded that these RNA transport related genes were all involved in gene expression regulation. The existence of UI in these genes indicated that UI play roles in regulating gene expression.
We further applied PageMan to show the pathway enrichment using the corresponding genes of the 5UI-Ts, 3UI-Ts and UI-Ts with both 5UI and 3UI (respectively 617, 469 and 74 genes). PageMan enrichment analysis showed that for the 5UI and 3 UI containing genes, most genes were categorized into "not assigned" pathway (Supplemental Materials Tables S6 and S7).

UTR Introns and Transcriptional Enhancers
Transcriptional enhancers have been identified in intron sequences by computational methods [64][65][66]. Cenik et al. [28] found genes with regulatory roles are particularly enriched with 5UIs compared to genes without 5UI. Therefore, UIs can be considered as important cis-regulatory elements that regulate the multiple levels of gene expression. In this study, we predicted the cis-acting elements in UI sequences. Totally, 39893 and 28702 cis-acting elements are respectively identified in 5UIs and 3UIs. Among these cis-acting elements, "core promoter elements around -30 of transcription start" and "common cis-acting element in promoter and enhancer regions" take the largest part, and more than 88% 5UIs and 3UIs contain these elements.
It has been found that introns present in the 5 UTR often lead to increased expression of transgenes [64][65][66][67]. Grant et al. [16] found that the synthetic 5 UTR intron fragments of the Glycine max polyubiquitin (Gmubi) gene placed downstream of the 35S core promoter could enhance the expression of transgene, suggesting that these fragments can function as promoter regulatory elements and contribute to increased expression. The existence of promoter elements in UIs suggested that UIs function similarly to promoters in regulating gene expression. Besides, "light responsive element", "cis-acting regulatory elements essential for the anaerobic induction", and many phytohormone responsive elements were also widely found in UI sequences (Supplemental data Tables S9 and S10), suggesting that UIs might function in the light or phytohormone responses.

Experimental Validation of 5UI and 3UI Splicing
By using leaf gDNA and cDNAs of root, leaf and stem as templates, PCR reactions were performed to verify the UI existence in UTRs. The results proved the existence of introns in the UTRs and intron retention events were found in some UI-Ts (Figure 4).
Two 3UIs, one 5UI and one 3UIs were annotated respectively in PPR (Cs6g01290.1), VPS28 (Cs2g06750.1) 5 UTR, R gene (Cs5g21990.1) 3 UTR, our RT-PCR result showed that the cDNA sequence did not contain these introns in all the three organs, which meant they were all removed during the forming of mature mRNA. While, for the one 3UI containing LTP gene (Cs5g09070.2), intron retention was found in all the organs. Moreover, there was one 5UI, one 5UI and one 3UI annotated in the PPR gene (Cs6g01290.1) 5 UTR, DUF247 gene (Cs2g24990.1) 5 UTR, and GRAS gene (Cs8g18700.1) 3 UTR respectively. However, by using RT-PCR, two bands were amplified from cDNA. Among the two bands, one was the same as the gDNA sequence and the other one was shorter than the gDNA sequence. The removed sequence length was the same as the length of the UI. This indicated that, for these three UTRs, intron retention happened in some gene transcripts.
The VPS28 gene also contained one 3UI. According to the RT-PCR result, we found that 3 UTR amplified from root cDNA is the same length as the gDNA sequence. The 3 UTR amplified from stem cDNA, however, was shorter than the gDNA, and the missed length was same to the length of the 3UI. The result indicated that intron transcription of the VPS28 3 UTR differed in these two organs. Moreover, no band was amplified from the leaf cDNA, suggesting that the expression of VPS28 3 UTR in leaf was so low that it could not be successfully amplified by RT-PCR.
Two 5UIs, one was 231 bp and the other 168 bp, were annotated in the EIN3 gene (Cs2g29100.1). By using RT-PCR, we amplified tree bands from the cDNAs of the three organs. The length of largest band was same as the gDNA, the lacked length of the middle one compared with gDNA was the length of the small intron, while the missing length of smallest one was the length of the big intron. To further study the expression of UIs in different organs and to investigate the correlation between the expression of UIs and their corresponding genes, qRT-PCR analysis of the 8 selected genes and all the UIs annotated in them in three sweet orange organs, i.e., root, stem and leaf, was performed ( Figure 5). The qRT-PCR results revealed expression of all the UIs in all the three organs, which is different from the RT-PCR results. This might be caused by the sensitivity difference between the two techniques.
PPR gene contained one 5UI and two 3UIs, the gene's expression in leaf was significantly higher than in the root, the expression of 3UI-1 in leaf and stem was very significantly lower than in the root, the expression of 3UI-2 in stem was significantly lower than in the root. VPS28 gene contained one 5UI and 3UI, the gene's expression in leaf was significantly higher than in the root, while the expression of 3UI in leaf is significantly lower than in the root. DUF247 gene contains one 5UI, the There are three 3UIs annotated in the TPR gene (Cs8g15200.1), and their length was 1006 bp, 208 bp and 158 bp, respectively. By using RT-PCR, we amplified two target bands, one was 1164 bp (the length of the largest 3UI plus the smallest 3UI) shorter than gDNA and the other 1372 bp (the total length of all the three 3UIs) shorter than gDNA.
To further study the expression of UIs in different organs and to investigate the correlation between the expression of UIs and their corresponding genes, qRT-PCR analysis of the 8 selected genes and all the UIs annotated in them in three sweet orange organs, i.e., root, stem and leaf, was performed ( Figure 5). The qRT-PCR results revealed expression of all the UIs in all the three organs, which is different from the RT-PCR results. This might be caused by the sensitivity difference between the two techniques. PPR gene contained one 5UI and two 3UIs, the gene's expression in leaf was significantly higher than in the root, the expression of 3UI-1 in leaf and stem was very significantly lower than in the root, the expression of 3UI-2 in stem was significantly lower than in the root. VPS28 gene contained one 5UI and 3UI, the gene's expression in leaf was significantly higher than in the root, while the expression of 3UI in leaf is significantly lower than in the root. DUF247 gene contains one 5UI, the gene's expression in leaf and stem was respectively very significantly and significantly higher than in the root, while the 5UI expression in stem is very significantly lower than in the root. EIN3 gene contains two 5UIs, the gene's expression in leaf is very significantly higher than in the root. The expression of 5UI-1in both leaf and stem were significantly higher than in the root. The expression of 5UI-2 in leaf was significantly higher than in the root. LTP gene contains one 3UI, the gene's expression in leaf was significantly lower than in the root, and the 3UI's expression significantly lower than in the root. GRAS gene contains one 3UI, the expression of the gene and 3UI showed no significant difference among the three organs. The R gene contains one 3UI, the gene's expression showed no significant difference among the three organs, but the expression of its 3UI in leaf and stem was significantly higher than in the root. TPR gene contains three 3UIs, the gene's expression in leaf and stem was significantly higher than in the root, and the expression of 3UI-1 in leaves significantly higher than in the root, but other two UIs showed no significant difference in the three organs.
It has been reported that the 5UI expression function in enhancing gene expression, and 3UI expression usually lead to the degradation of its corresponding gene [13,[68][69][70][71]. Consistently, in our present study, we found that the expression of VPS28 and its 3UI was significantly negative correlated. Although no significant correlation was identified, the relative coefficient between the expression of EIN3 and its two 5UIs was positive. However, we also found that the expression of R gene and its 3 UI were significantly positive correlated. The relative coefficient between the expression of LTP and its 3UI, TPR and its three 3UIs was positive, the expression of PPR and its 5UI, VPS28 and its 5UI, DUF247 and its 5UI, was negative. This might be due to the very complex nature of gene expression regulation and its involvement of many sequence elements, noncoding RNAs or factors [72][73][74]. Thus, the detailed regulatory role of UIs in gene expression needs to be further studied by experimental researches.

Genome-Wide Identification of Sweet Orange UIs
Genome data of C. sinensis was downloaded from the Citrus sinensis annotation project (http: //citrus.hzau.edu.cn/orange/download/index.php/) [31,75]. Based on the genome annotation data, introns in CDSs, 5 UTR and 3 UTR were extracted. As many genes having alternative splicing events, intron retention in any transcript was excluded in order to identify all the UTR introns. Statistical analysis of UI density, position preference, length and nucleotide composition was performed using Perl. And the ggplot2 software was used for the figures drawing.

Enrichment Analysis of Genes Containing UIs
To illustrate the genes containing UIs, we conducted GO and KEGG pathway enrichment analyses of the 5UI containing and 3UI containing genes on the Dynamic GO Enrichment Analysis online website (https://www.omicshare.com/tools/Home/Report/goenrich) and on the dynamic KEGG enrichment analysis online website (https://www.omicshare.com/tools/Home/report/koenrich), respectively. Moreover, to better show the pathway enrichment result, MapMan and its embedded PageMan software were used [76].

Cis-Acting Element Prediction of UI Sequences
Intron regulatory elements play important roles in regulating gene expression [16]. In this study, the cis-acting elements in UIs were analyzed and retrieved using PlantCARE (http://bioinformatics.psb. ugent.be/webtools/plantcare/html/) to show the distribution of cis-acting elements in UI sequences.
Leaf, stem and root samples of Four-month-old Valencia sweet orange (Citrus sinensis cv. Valencia) were collected, pre-cooled in liquid nitrogen and then stored at −80 • C for further gDNA and RNA extraction. For DNA isolation, E.Z.N.A. ® HP Plant DNA kit (Omega, Norcross, GA, USA) was used. For total RNA isolation, the total RNA extraction kit (TIANGEN, Beijing, China) was used. After RNA integrity and quality check using 1% agarose gel electrophoresis and spectrophotometry, high quality RNA samples were reverse-transcribed into cDNA using the First Strand cDNA Synthesis Kit (Thermo Scientific, Wilmington, DE, USA). The results of the validation showed that none of the UIs of some genes were successfully cloned. The extracted gDNA and reverse-transcribed cDNA of different tissues were used as templates to amplify by RT-PCR verification. Primers used for UI structure verification were designed according to the UTR sequences of each UI containing gene transcripts to amplify sequence containing all the possible UIs. Primer information was listed in Supplemental Materials Table S1.
To study the expression level of UI in different organs of sweet orange, qRT-PCR was performed to show the expression pattern of UIs of the selected eight genes using the same RNA for UI verification. cDNA for qRT-PCR was obtained using the RNA samples using Hifair ® II 1st Strand cDNA Synthesis SuperMix kit. Reactions were carried out on the LightCycler480 Real-time PCR in a final volume of 20 µL containing 1 µL of cDNA, 10 µL Hieff ® qPCR SYBR ® Green Master Mix (No Rox/ Low Rox/ High Rox), 0.8 µL each of the forward and the reverse primers (2 µM), and 7.4 µL of sterile water. The thermocycler was programmed as: 95 • C for 5 min followed by 40 cycles of 95 • C for 10 s, 54~58 • C for 20 s and 72 • C for 20 s. The expression was calculated by 2 −∆∆Ct method [77] with β-actin gene as internal control [78]. Data processing and differential significance analysis were performed using Excel2016 and SPSS17.Information of primers used for qRT-PCR analysis was also shown in Table S11.

Conclusions
In this study, we identified a total of 965 5UI sequences and 745 3UI sequences from 825 5UI-Ts (corresponding to 617 genes) and 570 3UI-Ts (469 genes). Among these UI-Ts, 77 (74 genes) contain both 5UI and 3UI. The density of 5UI and 3UI was respectively the highest in chromosome 2 and chromosome 1, and were both the lowest in chromosome On chromosome 2 and chromosome 6, the density of 5UI was found to be significantly greater than 3UI. The average length of 5UIs and 3UIs were similar and were both larger than CDS introns. 5UIs were more biased towards the both ends especially close to the stop end of the 5 UTR, and 3UIs were biased towards the start of 3 UTR. Both the UTR introns and the CDS introns exhibit A/T-rich element, which is considered to be an efficient splicing intron [40,79]. Function enrichment analysis revealed that genes containing 5UI were significantly enriched in the RNA transport pathway, which supported their role in mRNA export [23]. Genes containing 3UI were significantly enriched in splicesome, suggesting that the 3UIs to be widely involved in the splicing of pre-mRNA or NMD [69,80,81]. Notably, many PPRPs and R genes were identified to be UI-containing, indicating that UIs contribute to the expression regulation of these two gene families and play roles in the plant responses to biotic and abiotic stresses during translation and post-translational processes [62,82,83]. The regulatory role of UIs in the expression of genes containing UIs, especially the R genes, can be further studied. Moreover, many promoter enhancing related elements were identified in the UIs. The existence of these sequences demonstrated their regulatory roles in gene expression. Additionally, the expression of UIs in gene transcripts was confirmed by using RT-PCR and qRT-PCR. To our knowledge, this is the first report about the identification and characterization of horticultural plants from genome-wide level. The results obtained from this study could provide evidences for further understanding of the role of UIs in regulating gene expression.
Supplementary Materials: The following are available online at http://www.mdpi.com/1422-0067/21/9/3088/s1, Table S1: Information of the 5UIs and 3UIs identified in the sweet orange (Citrus sinensis) genome and the gene transcripts containing them. Table S2: Gene Ontology (GO) enrichment analysis result of 5UI-Ts. Table S3: Gene Ontology (GO) enrichment analysis result of 3UI-Ts. Table S4: KEGG pathway enrichment analysis result of 5UI-Ts. Table S5: KEGG pathway enrichment analysis result of 3UI-Ts. Table S6: PageMan pathway enrichment analysis of the 617 genes containing 5UI. Table S7: PageMan pathway enrichment analysis of the 469 genes containing 3UI. Table S8: PageMan pathway enrichment analysis of the 74 genes containing both 5UI and 3UI. Table S9: Information of the identified cis-acting elements in both 5UI sequences. Table S10: Information of the identified Cis-acting elements in both 3UI sequences. Table S11: Information of the 5UIs and 3UIs identified in the sweet orange (Citrus sinensis) genome and the gene transcripts containing them.