Genome-Scale Computational Identification and Characterization of UTR Introns in Atalantia buxifolia

Accumulated evidence has shown that CDS introns (CIs) play important roles in regulating gene expression. However, research on UTR introns (UIs) is limited. In this study, UIs (including 5′UTR and 3′UTR introns (5UIs and 3UIs)) were identified from the Atalantia buxifolia genome. The length and nucleotide distribution characteristics of both 5UIs and 3UIs and the distributions of cis-acting elements and transcription factor binding sites (TFBSs) in 5UIs were investigated. Moreover, PageMan enrichment analysis was applied to show the possible roles of transcripts containing UIs (UI-Ts). In total, 1077 5UIs and 866 3UIs were identified from 897 5UI-Ts and 670 3UI-Ts, respectively. Among them, 765 (85.28%) 5UI-Ts and 527 (78.66%) 3UI-Ts contained only one UI, and 94 (6.38%) UI-Ts contained both a 5UI and a 3UI. The UI density was lower than that of CDS introns, but the mean and median UI sizes were ~2 times those of the CDS introns. The A. buxifolia 5UIs were rich in gene-expression-enhancement-related elements and contained many TFBSs for BBR-BPC, MIKC_MADS, AP2 and Dof TFs, indicating that 5UIs play a role in regulating or enhancing the expression of downstream genes. Enrichment analysis revealed that UI-Ts involved in ‘not assigned’ and ‘RNA’ pathways were significantly enriched. Notably, 119 (85.61%) of the 3UI-Ts were genes encoding pentatricopeptide (PPR) repeat-containing proteins. These results will be helpful for the future study of the regulatory roles of UIs in A. buxifolia.


Introduction
Introns, the genomic sequences removed from the corresponding RNA transcripts, have been intensely studied since their discovery [1,2]. They can generally be divided into three groups (Groups I–III). Group I and II introns are self-splicing and are widely found in some bacterial and organellar genomes [3,4], while group III introns are spliceosomal introns mainly found in the nuclear genomes of eukaryotes, and their excision is spliceosome-dependent [5]. Accumulated evidence has shown that the presence of introns and the behavior of spliceosomes affect almost every step of gene expression [6,7]. Some introns boost gene expression [8–11], an effect called intron-mediated enhancement (IME). The addition of the alcohol dehydrogenase-1 (Adh1) first intron increased the expression of a maize chimeric chloramphenicol acetyltransferase (CAT) gene by 100-fold [12]. The Shrunken-1 (Sh1) intron 1 could enhance chimeric gene expression by approximately 100-fold, and the combined Sh1 first exon and intron 1 could enhance reporter gene expression by more than 1000-fold [13]. In transgenic tobacco, the expression of the petunia ribulose bisphosphate carboxylase small subunit gene (rbcS) was about five-fold higher in plants overexpressing its gDNA than in plants overexpressing its cDNA [14]. The first intron of the Arabidopsis elongation factor 1 beta gene (AteEF-1β) was shown to be required for the gene's high expression because of an enhancer-like element located in this coding sequence intron (CI) [15].

Genome-Wide Identification of A. buxifolia UIs
Based on the annotation information of the A. buxifolia genome, introns in CDSs, 5′ UTRs and 3′ UTRs were extracted separately. The 5UIs and 3UIs were then identified according to the method described by Shi et al. [42]. Introns between UTR exons were extracted according to the genome annotation file, and introns that were retained in the exon region of any transcript were excluded to ensure that the identified UIs lie strictly within intron regions. Information on the identified 5UIs and 3UIs is given in Additional file Tables S1 and S2, respectively. Statistical analyses of UI density, position preference, length and nucleotide composition were performed using Perl, and figures were drawn using ggplot2 [42].
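The classification step described above can be illustrated with a minimal Python sketch. It is not the authors' Perl pipeline or the exact procedure of Shi et al. [42]. It assumes a plus-strand transcript with 1-based inclusive exon coordinates, and it interprets "retained" as "overlapping an exon of any other transcript". The function names (`introns_from_exons`, `classify_introns`, `drop_retained`) and all coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); plus strand assumed


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons, sorted by start position."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
            if b_start - a_end > 1]


def classify_introns(exons: List[Interval], cds_start: int, cds_end: int):
    """Label each intron as '5UI', 'CI' or '3UI'.

    cds_start and cds_end are the first and last coding positions.
    For minus-strand transcripts the 5UI/3UI labels would have to be swapped.
    """
    labels = []
    for start, end in introns_from_exons(exons):
        if end < cds_start:        # intron lies entirely upstream of the CDS
            labels.append(((start, end), "5UI"))
        elif start > cds_end:      # intron lies entirely downstream of the CDS
            labels.append(((start, end), "3UI"))
        else:
            labels.append(((start, end), "CI"))
    return labels


def drop_retained(introns: List[Interval], other_exons: List[Interval]):
    """Discard introns that overlap an exon of any other transcript
    (an assumed reading of the retention filter)."""
    return [(s, e) for s, e in introns
            if not any(xs <= e and s <= xe for xs, xe in other_exons)]


# Toy transcript: four exons, CDS spanning positions 250..500.
toy_exons = [(1, 100), (201, 300), (401, 600), (701, 800)]
toy_labels = classify_introns(toy_exons, cds_start=250, cds_end=500)
```

In practice the exon and CDS coordinates would be read from the genome annotation file; parsing is omitted here to keep the sketch self-contained.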

Gene Pathway-Enrichment Analysis of UI-Ts
To annotate and illustrate the A. buxifolia genes containing UIs, we conducted PageMan pathway-enrichment analysis separately for all UI-Ts, 5UI-Ts and 3UI-Ts. Briefly, the genes containing a 5UI and/or 3UI were first subjected to MapMan analysis based on the abovementioned A. buxifolia mapping file. Then, pathway-enrichment analysis was performed using PageMan embedded in MapMan [47]. After Benjamini and Hochberg adjustment, pathways with a corrected p value < 0.05 were considered significantly enriched in UI-Ts.
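For readers unfamiliar with the Benjamini and Hochberg adjustment, the following Python sketch shows how corrected p values can be computed from raw ones. It is a generic illustration of the procedure, not PageMan's internal code. The function name and the example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(raw)
enriched = [q < 0.05 for q in corrected]
```

A pathway is then called enriched when its corrected value falls below the 0.05 threshold used in this study.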

Identification of Introns in A. buxifolia CDSs and 5′ and 3′ Untranslated Regions (UTRs)
In total, we identified 16,218 5′ UTRs, 16,337 3′ UTRs and 28,412 CDSs from the A. buxifolia genome. Of these, 597 5′ UTRs (3.68% of 5′ UTRs), 452 3′ UTRs (2.77% of 3′ UTRs) and 21,005 CDSs (73.93% of CDSs) were found to contain introns. The intron-harboring ratios for 5′ UTRs and 3′ UTRs were significantly lower than that for CDSs, which may explain why UIs are often overlooked. After normalizing the intron density to the average number of introns per nucleotide of each gene transcript sequence, we found that the intron density followed the order CDS > 5′ UTR > 3′ UTR (3.11 × 10⁻³, 8.42 × 10⁻⁵ and 3.08 × 10⁻⁵, respectively; Table 1). The 5UI density was ~2.7 times that of 3UIs but only ~2.7% of that of CIs.
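The percentages and density ratios quoted above follow directly from the reported counts and densities. The short Python check below reproduces the arithmetic. The variable names are chosen here for illustration only.

```python
# Counts reported in the text: (intron-containing regions, total regions).
counts = {
    "5UTR": (597, 16218),
    "3UTR": (452, 16337),
    "CDS": (21005, 28412),
}
harboring_pct = {k: round(100 * n / total, 2) for k, (n, total) in counts.items()}

# Intron densities reported in the text (introns per nucleotide).
density = {"CDS": 3.11e-3, "5UTR": 8.42e-5, "3UTR": 3.08e-5}

ratio_5ui_to_3ui = density["5UTR"] / density["3UTR"]       # about 2.7-fold
pct_5ui_of_ci = 100 * density["5UTR"] / density["CDS"]     # about 2.7%
```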

Intron Sizes and Distributions within UTRs and CDSs
The introns within 5′ UTRs, CDSs and 3′ UTRs of A. buxifolia varied greatly in number and length (Figure 1). The length distributions of A. buxifolia 5UIs and 3UIs were similar to each other (5′ UTR: n = 1077; the mean, median, lower quartile (LQ), upper quartile (UQ) and SD of length were 823.74, 409, 165, 851 and 2599.52 nucleotides, respectively; 3′ UTR: n = 866; the mean, median, LQ, UQ and SD of length were 838.85, 452, 141, 932.5 and 1823.39 nucleotides, respectively). Their mean and median intron sizes were ~2 times those of the CIs (n = 319,907, mean = 430.4 nucleotides, median = 175 nucleotides, LQ = 102 nucleotides, UQ = 464 nucleotides and SD = 1698.36 nucleotides). The frequency of CIs with lengths ranging from 100 to 300 nucleotides was significantly higher than that of 5UIs and 3UIs. However, the relative frequencies of short introns (< 50 nucleotides) and long introns (> 300 nucleotides) were higher among 3UIs and 5UIs than among CIs. Similar to sweet orange [42], the A. buxifolia 5UIs and 3UIs were preferentially located at the stop ends of 5′ UTRs and at the beginning of 3′ UTRs, respectively.
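The summary statistics and length classes used above can be computed as in the following Python sketch. It is an illustration only. The quartile method (`inclusive`) and the use of the sample standard deviation are assumptions, since the text does not state which definitions were used, and the toy lengths are made up.

```python
import statistics


def length_summary(lengths):
    """Mean, median, lower/upper quartile and sample SD of intron lengths."""
    lq, med, uq = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": med,
        "LQ": lq,
        "UQ": uq,
        "SD": statistics.stdev(lengths),
    }


def length_class_fractions(lengths):
    """Fractions of introns in the three length classes discussed in the text."""
    n = len(lengths)
    return {
        "<50": sum(x < 50 for x in lengths) / n,
        "100-300": sum(100 <= x <= 300 for x in lengths) / n,
        ">300": sum(x > 300 for x in lengths) / n,
    }


# Hypothetical intron lengths in nucleotides.
toy_lengths = [80, 120, 200, 450, 900, 3000]
toy_summary = length_summary(toy_lengths)
toy_classes = length_class_fractions(toy_lengths)
```

Applying the same two functions to the 5UI, 3UI and CI length lists would give values directly comparable across the three intron classes.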

Nucleotide Conservation around the Splice Junctions
To show the nucleotide bias around the donor and acceptor sites of 5UIs, CIs and 3UIs, sequence logos were used [49]. The results showed that both the A. buxifolia UIs and CIs possess A/T-rich elements around both donor and acceptor sites [42]. Moreover, GT-AG was the major splice site pair in both A. buxifolia 5′ UTRs (98.32%) and 3′ UTRs (98.26%), followed by the GC-AG splice site pair (1.67% of 5′ UTRs and 1.73% of 3′ UTRs) (Figure 2).
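Splice site pair frequencies such as those reported above can be tallied from intron sequences by reading the first two and last two nucleotides of each intron. The Python sketch below illustrates the idea. It assumes introns are given 5′ to 3′ on the transcribed strand. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def splice_pair(intron_seq):
    """Return the donor-acceptor dinucleotide pair, e.g. 'GT-AG'."""
    s = intron_seq.upper()
    return f"{s[:2]}-{s[-2:]}"


def splice_pair_percentages(introns):
    """Percentage of introns carrying each splice site pair."""
    # Sequences shorter than 4 nt cannot carry two distinct dinucleotides.
    counts = Counter(splice_pair(s) for s in introns if len(s) >= 4)
    total = sum(counts.values())
    return {pair: 100 * n / total for pair, n in counts.most_common()}


# Hypothetical intron sequences, written 5' to 3'.
toy_introns = ["GTAAGTTTTCAG", "gtatgtttttag", "GCAAGTATTTAG", "GTAAATTTGCAG"]
toy_pairs = splice_pair_percentages(toy_introns)
```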

Cis-Acting Elements and TFBS Prediction Analysis of 5UIs
In total, 46,543 cis-acting elements belonging to 47 element types were identified from all 5UI sequences; each 5UI contained 43.22 elements on average (Table 3, Additional file Table S3). About 82.92% and 88.02% of 5UIs contained 'core promoter element around −30 of transcription start' and 'common cis-acting element in promoter and enhancer regions', respectively. These two element types were also the most abundant, accounting for 26.01% and 11.47% of the total elements, respectively. More than 73.00% of 5UI-Ts contained both elements. In addition, many light-related elements were identified in 5UI sequences.

Discussion
In the present study, based on the A. buxifolia genome data, we identified the introns in the UTR and CDS regions. Similar to sweet orange [42], more than 70% of A. buxifolia CDSs were found to contain introns. Unlike the CDSs, few UTRs contained introns: only 3.68% of 5′ UTRs and 2.77% of 3′ UTRs were intron-containing. This might explain why UTR introns are often neglected. Bioinformatic analysis of UIs and UI-containing transcripts (UI-Ts) was then performed, and the main findings are discussed below.

The Lengths of A. buxifolia UIs Were Less Conserved Than CIs, and Most UI-Ts Contain Only One UI
Although the density of UIs was significantly lower than that of CIs, their mean and median intron sizes were ~2 times those of CIs. Similar to Arabidopsis and C. sinensis [39,42], the frequency of A. buxifolia CIs with 100~300 nucleotides was significantly higher than that of 5UIs and 3UIs, but the relative frequencies of short introns (<50 nucleotides) and long introns (>300 nucleotides) in 5′ UTRs and 3′ UTRs were higher than those in CDSs, indicating that the UI lengths were less conserved. Moreover, in accordance with previous studies [16,38,41], we found that most 5UI-Ts and 3UI-Ts contained only one UI.

A/T-Rich Elements around Both Donor Sites and Acceptor Sites of A. buxifolia UTRs Are Important for UI Recognition and Removal
Accumulated evidence has shown that many splicing signals and factors influence intron removal and mRNA transcription [51]. Among these factors, splice site pairs greatly influence the efficiency of recruiting the splicing machinery [52]. Consistent with previous studies [40,42], the 5′ donor sites and 3′ acceptor sites in A. buxifolia UTRs were also highly conserved, and GT-AG and GC-AG were the two major splice site pairs for both A. buxifolia 5′ UTRs and 3′ UTRs. Moreover, it has been suggested that A/T-rich elements play a role in intron recognition [42,53–57]. In the present study, A/T-rich elements were found around both donor sites and acceptor sites of A. buxifolia UTRs, indicating that they might function in UI recognition and removal.

A. buxifolia 5UIs Were Rich in Gene-Expression-Enhancement-Related Elements and TFBSs, Indicating That They Might Contribute Greatly to Gene Expression Regulation
Evidence has shown that UTR introns, especially 5UIs, contribute greatly to gene expression regulation [58–60]. Some 5UIs and other 5′-proximal introns with cis-elements have been shown to enhance gene transcription [6,61,62]. Kamo et al. [24] demonstrated that the 5UI of GUBQ1 acted as the promoter core sequence to increase GUS translation efficiency in transgenic plants. Lu et al. [58] and Samadder et al. [63] reported that the 5UI of the rice rubi3 gene could improve the gene's expression at both the transcriptional and post-transcriptional levels. In the present study, the A. buxifolia 5UIs were found to be rich in gene-expression-enhancement-related elements (such as 'core promoter element around −30 of transcription start' and 'common cis-acting element in promoter and enhancer regions'), indicating that these 5UIs may play roles in enhancing the expression of their corresponding genes.
Cenik et al. [41] reported a particular enrichment of 5UIs in genes with regulatory roles. The transcription of eukaryotic genes is regulated by transcription factors (TFs), which interact specifically with sequences in the promoter regions of the genes they regulate [53,54]. Consistently, regulatory genes tend to have more transcription factor binding sites (TFBSs) in their 5UIs [21]. In the present study, we found that the A. buxifolia 5UIs contain many TFBSs for BBR-BPC, MIKC_MADS, AP2 and Dof TFs. Functional analyses have revealed the indispensable role of BBR/BPC proteins in controlling the expression of TF genes [55–57]. Several Dof proteins have also been shown to contribute to gene expression activation by interacting with other regulatory proteins [64,65]. The enrichment of these TFBSs in A. buxifolia 5UIs indicates that they play roles in regulating the expression of their corresponding downstream genes.

Many UI-Ts Are Involved in RNA Metabolism
Pathway-enrichment analysis revealed that UI-Ts involved in 'RNA' pathways were significantly enriched. Moreover, these 'RNA'-pathway-related UI-Ts were more inclined to contain a 5UI, which supports the finding of Cenik et al. [41] that genes with regulatory roles are more prone to possess a 5UI. In this study, 40.00% (12/30) of the 'RNA.regulation of transcription.unclassified'-pathway-related 5UI-Ts were genes encoding A20/AN1-like zinc finger (ZnF) family proteins. Plant A20/AN1-ZnF family proteins have been implicated in plant responses to various abiotic and biotic stresses [66], suggesting that UIs might function in A. buxifolia stress responses. The bHLH TFs contribute greatly to regulating multiple plant cellular and biological processes [67]. In this study, genes encoding bHLH transcription factors were also enriched among 5UI-Ts. In addition, the 'RNA binding' pathway was significantly enriched among 3UI-Ts. Correlations between 3UI-Ts and RNA binding have been reported in previous studies. The expression of 3UI-Ts was more likely to be inhibited by NMD than that of gene transcripts with no 3UI, and the most significantly enriched NMD-affected gene transcripts were those encoding RNA-binding proteins [36,68]. Eukaryotic RNA-binding proteins (RBPs) play crucial roles in almost all aspects of post-transcriptional gene expression regulation [69]. In this study, two UBP1-associated protein 2A (UBA2A) genes were identified as 3UI-Ts. It was reported that UBA2A could bind to RNA molecules containing U-rich sequences in 3′ UTRs and might contribute to the stabilization of mRNAs in the nucleus [70]. Thus, we hypothesize that the 3UIs of the two UBA2As contribute to mRNA stabilization in A. buxifolia.

Most A. buxifolia 3UI-Ts Are Members of PPRP Gene Family, and Many UI-Ts Are Stress-Response-Related or with Unknown Function
Notably, 119 (85.61%) of the 3UI-Ts belong to genes encoding pentatricopeptide (PPR) repeat-containing proteins. In Arabidopsis, rice and Populus trichocarpa, 441, 477 and 626 PPRP gene members were identified, respectively [50,71,72]. The PPRP gene family is widespread in plants, especially terrestrial plants, and plays a crucial role in plant growth, development and stress responses [73–75]. Usually, PPRP genes contain no introns in their CDSs [76,77]. In the present study, we also found that all the UI-containing PPRPs contained no introns in their CDSs, suggesting that alternative splicing events of PPRPs mainly occur in UTRs and that UIs may contribute greatly to the tissue- or organ-specific expression and the regulatory functions of PPRPs.
Stress can affect the efficiency or patterns of splicing and intron retention, which can stabilize transcripts or modify their biological functions [78–80]. The 3UI-Ts of sweet orange were significantly enriched in the stress pathway [42]. In this study, we also identified many UI-Ts related to 'stress' and 'signaling', suggesting that UIs may play an important role in plant defense responses [42,78]. Additionally, we found many UI-Ts that were categorized into the 'unclassified' pathway. The functions of these UIs and their corresponding UI-Ts need to be studied further.

Conclusions
In this study, we performed a genome-scale computational analysis of UIs in A. buxifolia, investigated their size and nucleotide distribution characteristics, explored the regulatory cis-elements and TFBSs in the 5UI sequences, and explored the possible functions of UI-Ts. In total, we identified 1077 5UIs and 866 3UIs from the A. buxifolia genome data. The density of 5UIs and 3UIs was lower than that of CIs, but their mean and median lengths were about twice those of CIs. The 5′ donor sites and 3′ acceptor sites in A. buxifolia UTRs were highly conserved, and GT-AG was the most common splice site pair for both 5′ UTRs and 3′ UTRs. A/T-rich elements, which may function in intron recognition and removal, were found around both donor sites and acceptor sites of A. buxifolia UTRs. Most UI-Ts contained only one 5UI or 3UI. Many gene-expression-enhancement-related elements and TFBSs were discovered in the A. buxifolia 5UIs, indicating that 5UIs play a role in regulating or enhancing the expression of their corresponding downstream genes. Consistently, pathway-enrichment analysis revealed that UI-Ts involved in 'RNA' pathways were significantly enriched. Notably, more than 85% of the 3UI-Ts were genes encoding pentatricopeptide (PPR) repeat-containing proteins. The results obtained in this study are of great significance for further understanding the roles of UIs in regulating A. buxifolia gene expression.