Coupling and Coordination in Gene Expression Processes with Pre-mRNA Splicing

RNA processing is a tightly regulated and highly complex pathway that includes transcription, splicing, editing, transport, translation and degradation. It is well documented that splicing of nascent transcripts produced by RNA polymerase II occurs co-transcriptionally and is functionally coupled to other RNA processing steps. Recently, a growing body of experimental evidence has indicated that pre-mRNA splicing influences RNA degradation and vice versa. In this review, we summarize recent findings demonstrating the coupling of these two processes. In addition, we highlight the importance of splicing in the production of intronic miRNAs and circular RNAs, and hence in the discovery of novel mechanisms regulating gene expression.


Introduction
Most eukaryotic protein-coding genes contain introns. Human primary pre-mRNAs contain on average approximately 27,000 nucleotides and 9 exons, but an average mature mRNA contains only 3,500 nucleotides [1]. In other words, more than 85% of the nucleotides are intronic sequences that must be removed before the mRNA is translated. Why cells spend so many resources generating this "junk" during transcription remains a mystery. Undoubtedly, however, an effective system to recognize and remove introns is essential for preventing the production of abnormal proteins, which may function in a dominant negative manner and competitively inhibit the activity of their full-length native form [2].
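The "more than 85%" figure follows directly from the two average lengths quoted above. A minimal sketch of the arithmetic is given below; the function and constant names are illustrative, and the calculation treats every nucleotide absent from the mature mRNA as intronic, which is a simplifying assumption.

```python
# Average lengths quoted in the text, in nucleotides.
PRE_MRNA_NT = 27_000
MATURE_MRNA_NT = 3_500


def removed_fraction(pre_mrna_nt: int, mature_nt: int) -> float:
    """Fraction of the primary transcript absent from the mature mRNA."""
    return 1 - mature_nt / pre_mrna_nt


fraction = removed_fraction(PRE_MRNA_NT, MATURE_MRNA_NT)
print(f"{fraction:.1%} of the average pre-mRNA is removed")
```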
Pre-mRNA splicing is a succession of two transesterification reactions (Figure 1). The reactions are catalyzed by a complex named the spliceosome, which comprises both RNA molecules (small nuclear RNAs, assembled into small nuclear ribonucleoproteins) and proteins. The spliceosome is found throughout the nucleus [3], where transcription and many other RNA processing pathways take place. The spliceosome recognizes a donor splice site and an acceptor splice site located at the 5' and 3' ends of the intron, respectively. For the 5' splice site, the only highly conserved cis-element is the first dinucleotide (GU) of the intron. For the 3' splice site, however, three separate cis-elements are required: the branch site, the polypyrimidine tract and the 3' splice site dinucleotide (AG). In brief, in the first transesterification reaction, the 2' hydroxyl group of the conserved adenosine at the branch site attacks the conserved guanine of the 5' splice site at the exon-intron junction. A 2'-5' phosphodiester bond is formed and the exon-intron junction is cleaved, producing an RNA lariat structure containing the 2'-5' phosphodiester bond and a free 3'-OH (leaving group) on the upstream exon. After rearrangement of the spliceosome components, the second transesterification reaction begins with another nucleophilic attack: the 3'-OH of the released exon attacks the scissile phosphodiester bond at the conserved guanine of the 3' splice site at the intron-exon junction. Finally, the two exons are ligated together and the intron is released as a stable lariat product [4]. The lariats need to be debranched by debranching enzymes before they are degraded or processed into useful RNAs such as intronic snoRNAs and mirtrons [4]. Intronic lariats accumulate in the cytoplasm in the absence of Dbr1 enzymatic activity [5].

Figure 1.
Pre-mRNA splicing includes intron excision and exon ligation. In most cases, introns start with the sequence GU at the 5' splice site and end with the sequence AG at the 3' splice site. A highly conserved nucleotide A at the branch site is located approximately 20-50 bases upstream of the 3' splice site. The lariat was previously considered an unstable intermediate. Recent findings suggest that these intron products have unexpectedly long half-lives and are precursors for other RNAs, such as miRNAs from mirtrons [6]. The factors determining the stability and fate of intron products are largely unknown.
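The sequence signals described above (GU at the 5' splice site, AG at the 3' splice site, and a branch-site A roughly 20-50 nucleotides upstream of the AG) can be illustrated with a naive scan. This is only a sketch under stated assumptions: the function name, the toy sequence and the exhaustive GU...AG pairing are hypothetical, and the scan ignores the polypyrimidine tract, enhancers, silencers and everything else a real spliceosome uses to choose sites.

```python
def candidate_introns(rna: str, min_bp: int = 20, max_bp: int = 50):
    """Return (start, end, branch_positions) for every GU...AG span.

    start and end are 0-based, end-exclusive coordinates of the putative
    intron. branch_positions lists each A located min_bp..max_bp nt
    upstream of the A of the 3' splice site AG.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for start in range(len(rna) - 1):
        if rna[start:start + 2] != "GU":
            continue
        for end in range(start + 4, len(rna) + 1):
            if rna[end - 2:end] != "AG":
                continue
            ag_pos = end - 2
            lo = max(start + 2, ag_pos - max_bp)  # stay inside the intron
            hi = ag_pos - min_bp
            branches = [i for i in range(lo, hi + 1) if rna[i] == "A"]
            if branches:
                hits.append((start, end, branches))
    return hits


# Toy transcript (hypothetical): exon, intron with one branch-site A, exon.
exon1, exon2 = "CCC", "CCC"
intron = "GU" + "C" * 5 + "A" + "C" * 24 + "AG"
print(candidate_introns(exon1 + intron + exon2))
```

On real sequences such a scan would report many spurious GU...AG pairs, which is one way to see why the additional cis-elements matter for splicing fidelity.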
In addition to the 5' and 3' splice sites mentioned above, additional cis-elements named exonic/intronic splice enhancers or silencers can also influence the overall fidelity of pre-mRNA splicing [7][8][9]. An analysis focusing on mutations near splice junctions revealed that approximately 15% of disease-causing mutations lead to RNA splicing defects [10,11]. With the advent of advanced strategies for predicting the effects of sequence variations on splicing and cryptic splice sites, more diseases caused by splicing defects are likely to be identified [12][13][14]. Defects in pre-mRNA splicing are considered a primary cause of many diseases, such as neurodegenerative diseases and cancers [15][16][17][18][19][20]. Hence, targeting pre-mRNA splicing could be a potential treatment for those diseases [5][21][22][23][24][25].
On the other hand, pre-mRNA splicing requires some degree of flexibility [26]. Exons and introns can be either retained or removed to generate a diversity of splicing variants, a process known as alternative splicing [27,28]. Alternative splicing is essential for regulating gene expression and for increasing proteome complexity. For example, alternative splicing can introduce a premature stop codon, which suppresses expression of the gene because the transcript is degraded by nonsense-mediated decay (NMD) during cytoplasmic translation [29]. In addition, alternatively spliced mRNA variants can produce protein isoforms with altered amino acid sequences and domains, resulting in changes in enzymatic activity, cellular localization and/or binding partners [1]. Therefore, alternative splicing is considered to be the most important source of structural and functional diversity at the protein level. It is estimated that about 95% of transcripts from multi-exon genes undergo alternative splicing, in some instances in a tissue-specific manner and/or under specific cellular conditions [30,31]. There are four main types of alternative splicing events (Figure 2): exon skipping, intron retention, alternative 3' splice site selection and alternative 5' splice site selection [27]. More complex alternative splicing events such as mutually exclusive exons, exon/intron scrambling, alternative promoter usage and alternative polyadenylation are less frequent [27,32].
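The four main event types can be made concrete with a toy model in which exons are upper-case strings and introns are lower-case strings. This is a hedged sketch: the sequences, function names and the `shift` parameter are hypothetical, and real alternative splice sites depend on cryptic GU/AG signals that this model does not check.

```python
# Toy pre-mRNA (hypothetical): three exons separated by two introns.
EXONS = ["AAAA", "CCCC", "GGGG"]
INTRONS = ["guccag", "guuuag"]


def constitutive(exons):
    """All exons joined, all introns removed."""
    return "".join(exons)


def exon_skipping(exons, skip):
    """Omit the exon at index `skip`."""
    return "".join(e for i, e in enumerate(exons) if i != skip)


def intron_retention(exons, introns, keep):
    """Leave the intron at index `keep` in the mature transcript."""
    parts = []
    for i, exon in enumerate(exons):
        parts.append(exon)
        if i == keep:
            parts.append(introns[i])
    return "".join(parts)


def alternative_5ss(exons, introns, intron_idx, shift):
    """Move the donor site: shift > 0 extends the upstream exon into the
    intron, shift < 0 trims the upstream exon."""
    out = list(exons)
    if shift >= 0:
        out[intron_idx] = exons[intron_idx] + introns[intron_idx][:shift]
    else:
        out[intron_idx] = exons[intron_idx][:shift]
    return "".join(out)


def alternative_3ss(exons, introns, intron_idx, shift):
    """Move the acceptor site: shift > 0 extends the downstream exon into
    the intron, shift < 0 trims the downstream exon."""
    out = list(exons)
    nxt = intron_idx + 1
    if shift > 0:
        out[nxt] = introns[intron_idx][-shift:] + exons[nxt]
    else:
        out[nxt] = exons[nxt][-shift:]
    return "".join(out)


for name, variant in [
    ("constitutive", constitutive(EXONS)),
    ("exon skipping", exon_skipping(EXONS, 1)),
    ("intron retention", intron_retention(EXONS, INTRONS, 0)),
    ("alternative 5' site", alternative_5ss(EXONS, INTRONS, 0, 2)),
    ("alternative 3' site", alternative_3ss(EXONS, INTRONS, 1, -1)),
]:
    print(f"{name:20s} {variant}")
```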

Figure 2.
Many splicing variants can be formed from the same pre-mRNA by alternative splicing. Circular RNA, generated by splicing, is a new member of the family of splicing variants. Several mechanisms for the formation of circular RNAs have been proposed, including the circularization of exons facilitated by the presence of adjacent repetitive sequences [33][34][35][36].
Although splicing is tightly regulated [37][38][39][40], several lines of evidence suggest that the splicing of many pre-mRNAs is suboptimal [41] and that unspliced nascent transcripts and aberrant splicing intermediates can be detected, especially when intracellular RNA degradation activities are inhibited [42][43][44][45][46][47][48]. The recognition and degradation of unspliced or mis-spliced transcripts and of excised introns are therefore crucial steps in maintaining proper cellular growth and even survival. In this review, we summarize recent findings on the coupling and coordination of gene expression processes with pre-mRNA splicing. We also discuss the splicing "by-products" that escape RNA degradation.

Splicing and Nuclear RNA Surveillance
So far, most studies have focused on the recognition and degradation of unspliced mRNA by nonsense-mediated mRNA decay (NMD) [49][50][51]. NMD is an important RNA surveillance system that detects and degrades RNAs carrying a premature stop codon, thereby preventing the expression of erroneous or truncated proteins in the cytoplasm. A typical branchpoint sequence usually harbors a translation termination codon. Without proper splicing, this codon remains in the unspliced RNA and triggers NMD [46]. The stop codon within the splicing signal therefore helps to guarantee the cytoplasmic degradation of unspliced transcripts by NMD.
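The claim that a branchpoint sequence can carry a termination codon is easy to check on a concrete motif. The sketch below uses the budding yeast branchpoint consensus UACUAAC, which is an assumption brought in for illustration and is not quoted in the text above; the function name is likewise hypothetical. Whether the stop codon is actually read depends on the reading frame of the retained intron, which the sketch reports only relative to the start of the motif.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_codons_in(seq: str):
    """Return (position, codon, frame) for each stop codon in any forward
    frame; frame is the position modulo 3 relative to the start of seq."""
    seq = seq.upper().replace("T", "U")
    return [
        (i, seq[i:i + 3], i % 3)
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Assumed motif: S. cerevisiae branchpoint consensus (not from the text).
YEAST_BRANCHPOINT = "UACUAAC"
print(stop_codons_in(YEAST_BRANCHPOINT))
```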
Nevertheless, a number of observations lead to the idea that the nuclear RNA surveillance system not only plays a key role in eliminating aberrant unspliced transcripts and splicing intermediates, but is also directly involved in regulating the splicing process. Firstly, most unspliced mRNAs are trapped in the nucleus [52,53]. Secondly, unspliced transcripts and splicing intermediates are hardly detected in wild-type cells unless nuclear RNA surveillance is inactivated [42][43][44]. Thirdly, certain nuclear exosome components are recruited to intronic regions of actively transcribed genes [54][55][56]. Fourthly, a number of RNA binding factors, such as shuttling Ser-Arg-rich (SR) RNA-binding proteins and the cap binding complex (CBC), which are recruited cotranscriptionally and exhibit physical or genetic interactions with nuclear RNA surveillance components, are directly involved in splicing [57][58][59][60][61][62][63]. Finally, splice-site mutations can cause Rrp6p-mediated nuclear retention of the unspliced RNAs and transcriptional down-regulation of the splicing-defective genes [43,64].
The exosome is a multi-subunit protein complex involved in RNA surveillance by degrading aberrantly processed RNAs and RNA processing intermediates [65]. Nuclear and cytoplasmic exosomes share the same core components, but are decorated with a variety of different peripheral proteins (such as Rrp6p, Dis3p, and the TRAMP and SKI complexes) [66]. According to the current model, substrates of the nuclear exosome are recognized and subsequently recruited to the nuclear exosome by its cofactor, the TRAMP complex [67][68][69]. The TRAMP complex is itself a multi-protein complex comprising the RNA helicase Mtr4p, a poly(A) polymerase (either Trf4p or Trf5p) and a zinc knuckle RNA binding protein (either Air1p or Air2p) [70]. The TRAMP complex cooperates with the nuclear exosome of eukaryotic cells and is involved in the 3' end processing of snoRNAs and ribosomal RNA. The TRAMP complex is cotranscriptionally recruited to the nascent RNA transcript [71] and physically interacts with spliced-out introns [72] and splicing factors [71,73], thereby facilitating their degradation by the exosome. Deletion of TRAMP components leads to further accumulation of unspliced pre-mRNAs even in a yeast strain defective in nuclear exosome activity, suggesting a novel stimulatory role of TRAMP in splicing [71]. The cotranscriptional recruitment of TRAMP before or during splicing may function as a fail-safe mechanism that prepares spliced-out introns for rapid degradation by the nuclear exosome [71,73].
Consistent with the hypothesis above, a recent study demonstrated that two shuttling SR proteins, Gbp2p and Hrb1p, are necessary for quality control of spliced mRNAs [74]. Gbp2p and Hrb1p stabilize the binding between the TRAMP complex and spliceosome-bound transcripts [74]. Unspliced RNAs are retained in the nucleus and channeled to TRAMP/exosome-mediated degradation by Gbp2p and Hrb1p [74]. Taken together, Gbp2p and Hrb1p function as part of the fail-safe mechanism that ensures the cotranscriptional recruitment of TRAMP before or during splicing, in preparation for the subsequent targeting of spliced-out introns for rapid degradation by the nuclear exosome. However, it remains unclear when the nuclear exosome and TRAMP are recruited and how they recognize unspliced pre-mRNAs or spliced-out introns.

Spliceosome-Mediated Decay
Spliceosome-mediated decay (SMD) was first proposed in 2013, when it was observed that the expression of ~1% of mRNAs without any intron was upregulated in yeast cells defective in the splicing factor PRP40 [75]. The spliceosome associates with those intronless mRNAs, probably through cis-elements similar to the 5' splice site and branchpoint splice signals (Figure 3). The spliceosome endonucleolytically cleaves those intronless mRNAs and the products are degraded by a nuclear RNA surveillance system [75]. The existence of SMD provides a plausible explanation for the coordinated regulation of the expression levels of the homologous genes bromodomain factor (BDF) 1 and BDF2 in yeast under different stress conditions [76]. Interestingly, the expression level of BDF2 is also subject to an additional layer of post-transcriptional control through RNase III-mediated decay (RMD) [77]. The RNase III Rnt1p cleaves a stem-loop structure within the BDF2 mRNA to down-regulate its expression [77]. The SMD and RMD pathways acting on the BDF2 mRNA are differentially activated or repressed under specific environmental conditions [77]. The crosstalk between the SMD and RMD pathways remains to be further explored. Possibly because these mRNAs lack the proper 3' splice site required for canonical pre-mRNA splicing as shown in Figure 1, the spliceosome only cleaves the intronless mRNA at the 5' splice site without proceeding to the second transesterification. The incompletely spliced products are degraded by the nuclear exosome. Ineffective transition from the first to the second step of splicing could also direct the pre-mRNA to nuclear degradation [75].

Splicing and microRNA Processing
MicroRNAs (miRNAs) are categorized as "intergenic" or "intronic" according to their genomic locations. Large-scale bioinformatic analyses identified that many pre-microRNAs (pre-miRNAs) are located in introns (named mirtrons) [78][79][80] or across exon-intron junctions [81]. As intronic miRNAs share common regulatory mechanisms with their host genes, the expression patterns of intronic miRNAs and their host genes are similar, whereas intergenic miRNAs are known to be transcribed as independent transcription units [82]. As shown in Figure 4, coupling between the splicing and microRNA processing machineries within a supraspliceosome context has been proposed [83][84][85][86]. The supraspliceosome is a huge (21 MDa) nuclear ribonucleoprotein (RNP) complex in which numerous pre-mRNA processing steps take place [87]. Two key components of microRNA processing (the ribonuclease (RNase) III enzyme Drosha and the RNA binding protein DGCR8) and pre-miRNAs co-sediment with supraspliceosomes in glycerol gradient fractionation [85]. Other splicing factors such as serine/arginine-rich splicing factor 1 (SRSF1; formerly SF2/ASF), heterogeneous nuclear ribonucleoprotein (hnRNP) A1 and K homology (KH) domain RNA binding protein (KSRP) have been proposed to have moonlighting functions in microRNA processing [88][89][90][91]. Processed pri-miRNAs are also found in supraspliceosomes [87]. Recent findings support the model that the initiation of spliceosome assembly at the 5' splice site promotes microRNA processing by recruiting Drosha to intronic miRNAs [92]. Knockdown of U1 splicing factors globally reduces intronic miRNAs. This is consistent with the notion that the first step in the processing of mirtrons is splicing rather than microRNA processing, and that the debranched introns mimic the structural features of pre-miRNAs to enter the miRNA-processing pathway without Drosha-mediated cleavage [93]. Interestingly, Drosha may function as a splicing enhancer and promote exon inclusion [94].
Drosha binds to the exon and stimulates splicing in a cleavage-independent but structure-dependent manner [94]. To sum up, the expression of mirtrons is positively regulated by both splicing and microRNA processing. Studies have suggested that splicing and microRNA processing are more closely associated than previously thought. Drosha is recruited to the splice site together with the spliceosome as part of the supraspliceosome [84,85]. Drosha may play a key role in coordinating mirtronic microRNA biogenesis and splicing.
Interestingly, some intronic miRNAs in humans can be transcribed independently of their host genes. A model of competition between the spliceosome and the microRNA processing complex has been proposed, especially for miRNAs located across exon-intron junctions [81,95]. It was suggested that nearby cis-elements and pre-miRNA secondary structure interfere with splice site recognition [81,95]. In addition, inhibition of splicing by spliceostatin A upregulates the levels of intronic miRNAs [85], whereas overexpression of Drosha increases the levels of both intronic and exonic miRNAs [81]. These findings strongly support the view that Drosha itself, rather than miRNAs acting through the canonical miRNA gene silencing pathway, directly represses gene expression by cleaving the mRNAs [81].

Splicing and Circular RNAs
Circular RNAs are widely expressed noncoding RNAs that are generated cotranscriptionally by a non-canonical mode of RNA splicing [32,83,96,97]. As mentioned above, during splicing the spliceosome produces a free OH group at the 3' end of the upstream exon. This free OH group attacks the phosphodiester bond between the intron and the downstream exon. A debranching failure and "back-splicing" (a process in which downstream exons are spliced to upstream exons in reverse order [33][83][98][99][100]) produce circular intronic long non-coding RNAs [101]. Recent deep sequencing studies have clearly revealed that thousands of circular RNAs are generated from protein-coding genes in many organisms, including humans, and that the number of circular RNAs per cell can far exceed that of their linear protein-coding RNA counterparts [83][102][103][104][105][106][107]. The accumulation of circular RNAs in cells may be attributed to their higher resistance to endogenous exoribonucleases and hence their longer half-life [100,107,108].
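Back-splicing joins the 3' end of a downstream exon to the 5' end of an upstream exon, which creates a junction sequence that is absent from the linear mRNA and leaves the circle without free ends. The sketch below illustrates this with hypothetical exon sequences and function names; it is a toy model of the junction only, not of any published detection pipeline.

```python
def linear_junctions(exons, k=4):
    """k nt from each side of every canonical exon-exon junction."""
    return [exons[i][-k:] + exons[i + 1][:k] for i in range(len(exons) - 1)]


def back_splice_junction(exons, first, last, k=4):
    """Junction formed when the 3' end of exon `last` is joined to the
    5' end of exon `first` (first <= last)."""
    return exons[last][-k:] + exons[first][:k]


def circular_rna(exons, first, last):
    """Sequence of the circle, written starting at exon `first`."""
    return "".join(exons[first:last + 1])


# Hypothetical four-exon transcript; exons 1-2 (0-based) circularize.
exons = ["AUGGCCAA", "CCGGUUAA", "GGAACCUU", "UUGACUGA"]
linear_mrna = "".join(exons)
junction = back_splice_junction(exons, 1, 2)
circle = circular_rna(exons, 1, 2)

print("canonical junctions:", linear_junctions(exons))
print("back-splice junction:", junction)
print("present in linear mRNA:", junction in linear_mrna)
# Doubling the circle sequence exposes the junction that closes the ring.
print("present in circle:", junction in circle + circle)
```

Because the back-splice junction does not occur in the linear transcript, a sequence spanning it is specific to the circular product in this toy model.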
Although circular RNAs are produced during splicing, competition between the production of circular RNAs and canonical pre-mRNA splicing has also been observed [96]. The production of these circular RNAs is mediated by intronic sequences [96,102,103,109]. A recent study demonstrated that the expression of a subset of circular RNAs is regulated by the splicing factor muscleblind [96]. Therefore, circular RNAs may not merely be products of defective pre-mRNA splicing and nuclear RNA surveillance; they may be actively produced [34]. Interestingly, the production of circular RNAs seems to be responsible for a decline in the efficiency of canonical linear splicing. Circular RNAs accumulate in the nervous system and increase with age in Drosophila [110]. The mechanism and function of the age-related modulation of circular RNA accumulation remain to be explored.
The function of most circular RNAs remains unclear, although their expression levels are closely related to diseases [105,111]. As circular RNAs are mainly found in the nucleus rather than the cytoplasm [103], and as they lack proper start and/or stop codons, it is unlikely that circular RNAs code for proteins. However, a number of mechanisms by which circular RNAs may regulate gene expression have been proposed. Certain circular RNAs regulate the expression of their host genes [103]. These circular RNAs accumulate at their sites of transcription, associate with elongating RNA polymerase II (RNAP II), and act as positive regulators of RNAP II transcription [103]. Some circular RNAs have been shown to act as molecular sponges by competing for and/or sequestering miRNAs, and hence regulate miRNA levels [112]. The potential functions of circular RNAs in gene expression, their association with diseases in humans and their implications for therapeutic applications remain to be further explored [34,113].

Conclusions and Perspectives
In summary, the interactions between splicing and other RNA processing systems are more complicated and dynamic than previously thought. How does the exosome distinguish its target splicing intermediates from fully spliced RNAs? How is the expression of selected splicing variants, intronic miRNAs and circular RNAs regulated through the coordination of pre-mRNA splicing and other RNA processing pathways? These fundamental questions remain unanswered. Through advances in technologies [114][115][116], the development of new strategies [117][118][119][120][121][122][123], and the establishment of databases for sharing information [124][125][126], these questions will hopefully be addressed in the near future.