Sequence and Evolutionary Features for the Alternatively Spliced Exons of Eukaryotic Genes

Alternative splicing of pre-mRNAs is a crucial mechanism for maintaining protein diversity in eukaryotes without requiring a considerable increase in the number of genes. Due to rapid advances in high-throughput sequencing technologies and computational algorithms, it is anticipated that alternative splicing events will be studied more intensively to address different kinds of biological questions. The occurrence of alternative splicing means that all exons can be classified as either constitutively or alternatively spliced, depending on whether they are included in virtually all mature mRNAs. From an evolutionary point of view, therefore, alternatively spliced exons are expected to show distinctive biological characteristics in comparison with constitutively spliced exons. In this paper, we first outline the representative types of alternative splicing events and exon classification, and then review the sequence and evolutionary features of alternatively spliced exons. The main purpose is to facilitate understanding of the biological implications of alternative splicing in eukaryotes. This knowledge is also helpful for establishing computational approaches to predict the splicing pattern of exons.


Introduction
It is well acknowledged that the variable transcription of genes contributes considerably to phenotypic and functional diversity in eukaryotes [1]. Gene transcription can vary in terms of both mRNA molecules and expression levels. In the former case, the same gene can be transcribed into more than one mRNA molecule with similar but not identical functions, and this one-to-several relationship between a gene and its mRNAs mainly results from alternative splicing of pre-mRNA [2]. Beyond the well-known roles of alternative splicing in regulating individual development and driving species evolution, increasing evidence also supports the view that disturbances of alternative splicing can be causes or consequences of many diseases in humans [3]. Furthermore, economically important traits, such as reproduction, disease resistance and environmental fitness, have been successfully explained by alternative splicing events in animals and plants [4,5]. Therefore, the evolutionary dynamics and regulatory mechanisms of alternative splicing of eukaryotic pre-mRNAs have received considerable attention during the past decade [6][7][8].
In earlier studies, detection of alternative splicing events mainly depended on expressed sequence tags (ESTs), which led to an estimate that about 50% of human genes are subject to alternative splicing [9]. With the aid of high-throughput RNA sequencing (RNA-Seq), two pioneering studies comprehensively investigated transcriptome complexity in humans and suggested that almost 95% of multiexon genes undergo alternative splicing [10,11]. A recent study also revealed that noncoding exons are universally alternatively spliced [7]. It seems reasonable to expect, therefore, that almost all multiexon genes have the potential to be alternatively spliced and translated into multiple protein isoforms. However, this picture may be challenged by the recent observation that most human genes actually have only a single main protein isoform [12]. Furthermore, computational analyses of alternative splicing events have been greatly facilitated in recent years by sophisticated bioinformatic tools, including reference genome-guided and de novo approaches [13,14]. The application of full-length transcript sequencing [15], such as that offered by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has largely improved the accuracy, robustness and reliability of detecting alternative splicing events. Using the PacBio RS II platform, our lab successfully identified a large number of novel alternative splicing events in rabbits, a non-model organism [16].
A direct consequence of alternative pre-mRNA splicing is that all exons can be classified as either constitutively or alternatively spliced, depending on whether they are included in virtually all mature mRNAs. Because alternatively spliced exons are not ubiquitously used, they would be less evolutionarily constrained than constitutively spliced exons. Therefore, we may anticipate that they are associated with distinct biological characteristics owing to these differential evolutionary constraints. A better understanding of these differences will help to explain the evolutionary origins, regulatory mechanisms, and biological implications of alternative splicing. In this paper, we focus on reviewing both the sequence and evolutionary features of alternatively spliced exons.

Representative Types of Alternative Splicing Events
In the classic understanding, eukaryotic genes are continuously transcribed from the transcriptional start site to the stop site to produce pre-mRNAs, from which the noncoding introns must be precisely spliced out and the coding exons ligated to form mature mRNAs. However, the two processes of gene transcription and pre-mRNA splicing can be coupled, so that splicing is also regulated by transcription factors [6,17,18]. The splicing of pre-mRNAs is primarily carried out by the spliceosome, a large macromolecular complex composed of five small nuclear ribonucleoprotein particles (U1, U2, U4, U5 and U6 snRNPs) and hundreds of non-snRNP factors [19]. Recently, cryo-electron microscopy structures of the spliceosome have been solved in human and yeast, which are very helpful for understanding the precise splicing mechanism [20,21].
For a single pre-mRNA, multiple splice sites can be alternatively recognized and used to produce different transcript isoforms, and this process is formally defined as alternative splicing. Alternative splicing events have generally been classified into at least four main types: exon skipping (also called alternative cassette exon), alternative 3′ splice site (SS), alternative 5′ SS, and intron retention [22]. We schematically illustrate a four-exon gene in Figure 1A and first show that all exons are constitutively spliced to produce transcript i0. In general, transcript i0 has the maximum number of exons and the longest sequence in comparison with the other isoforms. In comparison with i0, transcript i1 represents exon skipping, in which the third exon is entirely spliced out. Similarly, alternative 3′ SS (i2) and 5′ SS (i3) events are illustrated by splice site changes on the 3′ end of the second exon and the 5′ end of the first exon, respectively. In transcript i4, the second intron is not spliced out, which is called intron retention. A minor type of alternative splicing event, named the exonic intron, was recently suggested [23]; it refers to the alternative inclusion of an internal region of an exon (not shown). It should be noted, of course, that more than one alternative splicing event can be observed simultaneously in a single transcript isoform. In a broader sense, there are two additional types of transcript isoforms that differ on the 3′ end of the first exon and the 5′ end of the last exon, which are introduced at the initial and terminal stages of gene transcription, respectively.

Two Kinds of Exon Classification
According to the definition of alternative splicing, all exons can be classified as either constitutively or alternatively spliced. Although this classification appears simple, different types of exons are described ambiguously in some of the literature. For example, exons with alternative 3′/5′ SS are inconsistently treated as either constitutively or alternatively spliced exons [24,25]. Also, the term "alternatively spliced exon" is sometimes restricted to alternative cassette exons only [26]. Therefore, here we concisely summarize the two kinds of exon classification. First, all exons can be classified into four types in accordance with the different alternative splicing events: the constitutive exon, alternative cassette exon, alternative 3′ SS exon, and alternative 5′ SS exon (Figure 1B). The latter three types can be collectively called alternative exons. Because this classification keeps each exon intact, it faces an uncertainty in selecting the representative sequence among the several forms of different lengths for alternative 3′/5′ SS exons. It has also been suggested in parts of the literature that both constitutive and alternative exons can be further classified into simple, multiple and complex forms according to their observed occurrences among all transcript isoforms [25,27]. Second, a single exon is divisible and can, more simply, be separately classified into constitutive and alternative exon regions (Figure 1C), which means that an exon is not treated as an intact unit. Which of the two kinds of exon classification should be used depends mainly on the biological question under study. Of course, there is an ongoing debate on whether all exons are actually alternatively spliced [25]. We should also keep in mind that an exon's classification may change when different sets of transcripts are used for the analysis of alternative splicing.

Annotating Exons
Due to the wide application of RNA-Seq, a large number of transcript sequences have been assembled and are typically stored in the standard gene transfer format (GTF) or general feature format (GFF) files. Therefore, it is sometimes necessary to extract the constitutive and alternative exons that are suitable for specific analyses, such as robust quantification of isoform expression [28]. However, it is not a trivial task for non-bioinformatic researchers to distinguish and use different types of exons. To address this issue, we prepared a bioinformatic script (available upon request) for annotating constitutive and alternative exons based on a custom set of transcripts. This script was written in the Python language and designed to separately address the two kinds of exon classification. First, each exon is kept intact and directly annotated as constitutive or alternative. Second, all exons can be divided and then classified into constitutive and alternative exon regions. The script outputs a BED-like file that fully retains the original information from the input file (Figure 1D).
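The two kinds of exon classification can be illustrated with a minimal Python sketch. This is not the script mentioned above; it is a simplified illustration under stated assumptions: transcripts of a single gene are given as lists of half-open (start, end) exon coordinates on one strand, an intact exon is called constitutive only if exactly the same coordinates occur in every transcript, and an exon region is called constitutive only if it is exonic in every transcript. The function names and the toy transcripts (i0, i1, i2) are hypothetical.

```python
from itertools import chain


def classify_intact_exons(transcripts):
    """First classification: keep each exon intact.

    An exon (start, end) is 'constitutive' if every transcript contains
    exactly that exon, otherwise 'alternative' (assumed definition).
    """
    n = len(transcripts)
    counts = {}
    for exons in transcripts.values():
        for exon in set(exons):
            counts[exon] = counts.get(exon, 0) + 1
    return {exon: ("constitutive" if c == n else "alternative")
            for exon, c in sorted(counts.items())}


def classify_exon_regions(transcripts):
    """Second classification: split exons into regions.

    Exon boundaries from all transcripts are used as breakpoints; a region
    is 'constitutive' if it is exonic in every transcript, 'alternative' if
    it is exonic in only some, and skipped if it is never exonic.
    """
    n = len(transcripts)
    breaks = sorted(set(chain.from_iterable(
        (s, e) for exons in transcripts.values() for s, e in exons)))
    regions = []
    for left, right in zip(breaks, breaks[1:]):
        covered = sum(any(s <= left and right <= e for s, e in exons)
                      for exons in transcripts.values())
        if covered == 0:
            continue  # intronic in all transcripts
        label = "constitutive" if covered == n else "alternative"
        regions.append((left, right, label))
    return regions


# Toy four-exon gene (hypothetical coordinates).
transcripts = {
    "i0": [(0, 100), (200, 300), (400, 500), (600, 700)],
    "i1": [(0, 100), (200, 300), (600, 700)],              # third exon skipped
    "i2": [(0, 100), (220, 300), (400, 500), (600, 700)],  # shorter second exon
}
intact = classify_intact_exons(transcripts)
regions = classify_exon_regions(transcripts)
for left, right, label in regions:
    print(left, right, label, sep="\t")  # BED-like output
```

In this toy case the second exon is alternative as an intact unit, whereas the region-based classification splits it into an alternative part (200 to 220) and a constitutive part (220 to 300), which shows why the choice of classification depends on the question being asked.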

Core Splicing Signals and Regulatory Elements
The spliceosome discriminates between exonic and intronic sequences and determines the 5′ and 3′ SSs primarily by recognizing core splicing signals (Figure 2A). The 5′ SS is selected by complementary recognition between a 9-nt motif and the U1 snRNP. This motif spans the boundary between exon (−3 to −1) and intron (+1 to +6), and its nucleotide composition is the major determinant of 5′ SS selection, as recently revealed by a massively parallel splicing assay [29]. For the great majority of introns, the two nucleotides at the +1 and +2 positions of the 5′ SS are almost always GU (more than 98%) or GC (less than 1%) [30]. Although both have been considered canonical types with normal splicing, a recent study reported that no more than 20% of GU-type 5′ SSs retain their capacity to generate normal transcripts when GU is substituted by GC [31]. The nucleotide composition of the 5′ SS motif has also recently been updated by analyzing more than 1000 species/lineages [32].
In contrast to the 5′ SS, selection of the 3′ SS is more complicated and jointly involves three motifs that are generally located within the last ~50 nts of the intron [33]: the branch point (BP), the polypyrimidine tract (PPT), and the AG dinucleotide at the intron/exon junction (Figure 2A). The BP, which binds SF1/BBP, is an extremely degenerate motif with the consensus sequence YUNAY in humans. Most BPs are located 14 to 50 nts from the 3′ SS, whereas some are located as far as 350 nts away [34]. The PPT is a ~20-nt pyrimidine-rich motif that generally lies adjacent to the consensus AG dinucleotide, at a distance of only 2 nts; the PPT and the AG dinucleotide are recognized by the large subunit (U2AF65) and the small subunit (U2AF35) of the U2 auxiliary factor, respectively. However, a recent study revealed that purine-rich elements are widely inserted between the PPT and the AG dinucleotide and play positive roles in regulating pre-mRNA splicing [35]. Two adjacent splice sites are recognized interdependently by the spliceosome to achieve higher precision. Therefore, the basic recognition process can be described by two models, "Exon Definition" and "Intron Definition" (Figure 2B), in which the spliceosome recognizes the pairing between two adjacent splice sites across an exon or an intron, respectively [36]. The choice of recognition model depends mainly on the relative length of exons and introns, and this exon-intron architecture is an important evolutionary feature [37].
In addition to direct contacts between the spliceosome and core splicing signals, pre-mRNA splicing is regulated by various splicing regulatory elements (SREs), which are short motifs enriched within both exons and introns. These SREs regulate the splicing process by recruiting sequence-specific RNA-binding proteins (RBPs), such as the SR proteins or hnRNPs, that either activate or inhibit the recognition and use of the adjacent splice sites [38]. Therefore, SREs have been conventionally classified as exonic/intronic splicing enhancers (ESEs/ISEs) and exonic/intronic splicing silencers (ESSs/ISSs) according to their locations and functional roles (Figure 2C). Both high-throughput computational and experimental approaches have been employed for identifying SREs. Castle et al. (2008) conducted the first genome-wide screen for 4-mer to 7-mer words and computationally identified a large number of SREs [39]. Many in silico methods for predicting SREs have been developed in recent years and were recently reviewed [40]. An experimental approach based on a fluorescence-based splicing reporter was also developed to systematically identify SREs [41]. The genome-wide discovery of SREs has been significantly advanced by combining high-throughput sequencing with immunoprecipitation, such as crosslinking immunoprecipitation sequencing (CLIP-seq) and RNA immunoprecipitation sequencing (RIP-seq) [42][43][44]. On the whole, hundreds of SREs have been computationally or experimentally discovered, and most of them are involved in regulating tissue-specific alternative splicing of pre-mRNAs [45]. Recently, it was revealed that SREs are also responsible for controlling oscillating alternative splicing [46].

Strength of Core Splicing Signals
The sequence degeneracy of the motifs of core splicing signals is the main basis for determining which alternative splice sites are recognized and used. Therefore, it has been widely observed that alternative exons have clearly different sequence features in comparison with constitutive exons. Splice sites can be quantified as having strong or weak splicing strength depending on how closely their core splicing signal motifs resemble the optimal consensus sequences. In practice, a position weight matrix can be generated from the nucleotide frequencies at each position of the motif sequences and used for scoring the splicing strength of 3′/5′ SSs [6,37]. Additionally, physical-chemical properties, intra-motif dependencies and machine learning models have recently been adopted into in silico methods for predicting 3′/5′ SS strength [47][48][49].
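The position weight matrix scoring described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a uniform background frequency of 0.25 per nucleotide, a pseudocount of 1, and a log2-odds score summed over positions; the four training motifs are made-up 9-nt 5′ SS sequences (positions −3 to +6), and the function names are hypothetical rather than taken from any published tool.

```python
import math

BASES = "ACGU"


def build_pwm(motifs, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from aligned motifs."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for m in motifs:
            counts[m[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background)
                    for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum the per-position log2-odds; higher means closer to consensus."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix")
    return sum(column[base] for column, base in zip(pwm, seq))


# Made-up training set of 5' SS motifs: 3 exonic nts followed by 6 intronic nts.
training = ["CAGGUAAGU", "AAGGUGAGU", "CAGGUAAGA", "GAGGUAAGU"]
pwm = build_pwm(training)

strong = score(pwm, "CAGGUAAGU")  # GU at positions +1/+2
weak = score(pwm, "CAGGCAAGU")    # GC at positions +1/+2
print(round(strong, 2), round(weak, 2))
```

With a real training set drawn from annotated splice sites, the same score could be compared between constitutive and alternative exons; the toy matrix here only demonstrates the calculation.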
In general, strong splice sites facilitate unambiguous recognition by the spliceosome and hence result in constitutive splicing, whereas weak splice sites are more easily subjected to alternative splicing. Accordingly, alternative cassette exons and alternative 3′/5′ SS exons have weaker splice sites than constitutive exons [50,51]. However, for alternative 3′/5′ SS exons, the relatively weaker splice site strength was observed only at the variable ends [24]. Grau-Bové et al. (2018) comprehensively investigated the alternative splicing landscape across 65 eukaryotic species and confirmed a significant and consistent relationship between alternative cassette exons and weaker strength of both the 3′ and 5′ SSs [52]. Furthermore, mutations within the recognized motifs of core splicing signals can markedly change splice site strength and thereby influence the splicing pattern. Recently, about 10% of exonic pathogenic mutations were found to disrupt spliceosome assembly [53]. Jaganathan et al. (2019) employed a 32-layer deep neural network to predict pre-mRNA splicing, which, more importantly, could accurately and robustly predict the effects of synonymous and intronic mutations on alternative splicing [54]. These sequence features, along with the evolutionary features that are stated below, are summarized in Table 1.

Distribution of SREs
Besides the core splicing signals, alternative and constitutive exons can also differ significantly in their exonic and intronic SREs. Yeo et al. (2007) first identified 314 conserved intronic SREs by a comparative genomic approach and found that SREs located between two competitive splice sites are much more likely to generate alternative 3′/5′ SS exons [55]. By analyzing the alternative splicing landscape among 48 human tissues and cell lines, six clusters of SREs, represented by UCUCU, UGCAUG, UGCU, UGUGU, UUUU and AGGG, were found to be enriched near alternative cassette exons and to show distinct patterns in terms of genomic location and tissue specificity [39]. Rosenberg et al. (2015) systematically measured the splicing patterns of synthetic mini-genes and found that the vast majority of SREs within alternative exons influence the choice of splice sites in an additive manner [56].
Splicing enhancers play predominant roles in constitutive splicing, while splicing silencers mainly regulate alternative splicing [38,57]. On the one hand, mutations of splicing enhancers can convert constitutive exons into alternative exons. It was recently observed in human chronic granulomatous disease that exon 5 of the cytochrome b beta chain (CYBB) gene was skipped because of a site mutation in its ESEs [58]. On the other hand, disrupted recognition of ESEs caused by mutations of the corresponding RBPs was also observed to induce mis-splicing of key hematopoietic regulators in myelodysplasia [59]. Splicing silencers are more abundant in alternative exons than in constitutive exons, and the exclusion of alternative exons frequently requires cooperative regulation by multiple silencer elements [60]. In addition, the positions of SREs within exons also have different influences on constitutive or alternative splicing [61]. On the whole, it is difficult to distinguish SREs that are specific to constitutive or alternative exons, because most of them function in a spatiotemporally regulated manner.

Table 1.
Exon-intron architecture: Alternative cassette exons are shorter and flanked by longer introns, which leads to higher intron-to-exon length ratios. Constitutive and alternative exons have differential GC contents at the exon-intron boundaries. Short constitutive exons require additional splicing enhancers from the adjacent introns.
Origin: Evolutionarily young exons are more likely to be alternatively spliced and have high inclusion levels only in specific tissue(s). Evolutionary conversion from constitutive to alternative exons is associated with decreased splicing strength. Changes in exon inclusion level are more likely to be functionally relevant.
Selective constraint: Alternative cassette exons show faster evolution at the amino acid level and higher conservation of nucleotide sequence. Alternative 3′/5′ SS exons have differential selective constraints between the variable and fixed ends. Alternative 3′/5′ SS exons have high symmetry levels for the alternative region between two competitive splice sites.
Regulatory and coding roles: Evolutionarily young exons are more likely to be located within UTRs and play regulatory roles. Ancient alternative exons are more likely to be involved in producing distinct protein isoforms.

Exon-Intron Architecture
Because the exon-intron architecture significantly affects the recognition models of the spliceosome, the relative length of exons and introns is an important characteristic for distinguishing constitutive and alternative exons. It has long been recognized that alternative cassette exons are flanked by longer upstream and downstream introns than constitutive exons [52], whereas alternative 3′/5′ SS exons do not show such an obvious difference [62]. For example, in humans the mean lengths of the upstream and downstream introns are ~4070 and ~3470 nts for constitutive exons and ~5580 and ~5020 nts for alternative cassette exons, respectively [37]. However, length differences of the flanking introns between constitutive and alternative exons were less obvious in lower vertebrates, which suggests that this feature is a consequence of evolution [37].
Besides the flanking introns, the shorter length of alternative cassette exons was previously observed in mammals [37,63] and was recently confirmed by a comprehensive analysis of 65 eukaryotic species [52]. Therefore, alternative cassette exons are significantly associated with higher intron-to-exon length ratios. It was also observed that alternative cassette exons have lower GC contents than constitutive exons [51]. However, the association between GC content and exon type displays species-specific differences [52]. In addition, the GC contents of exon-intron boundaries can differ between constitutive and alternative exons [64]. Also, the differential GC contents between exons and introns can interact with intron length in regulating alternative splicing [65]. Furthermore, the constitutive splicing of short exons requires additional enhancers from the adjacent introns [66], which may explain why alternative cassette exons tend to be shorter.

Evolutionary Ages and Inclusion Levels of Exons
Within an existing gene, one or more exons can be newly created or lost during the evolutionary process (Figure 3A). Of the two, exon creation events are much more widespread than exon loss and hence contribute significantly to the diversification of protein function [67]. New exons can derive from the external insertion or tandem duplication of existing exons and also from de novo exonization of intronic sequences [68]. Exonization mainly originates from transposable elements, such as long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), because they often carry consensus motifs resembling real splice sites [69]. When exons' splicing patterns are put in an evolutionary context, they become much more complex, as shown in Figure 3B. First, exons can be assigned different evolutionary ages according to the extent to which they are evolutionarily conserved, such as species-specific (or recently created), lineage-specific (or early evolved), and ancient (or fully conserved) exons. Similarly, the splicing pattern of orthologous exons may or may not change when compared among different organisms. Therefore, every exon should be described by its conservation level for both evolutionary origin and splicing pattern, for which six representative exons are exemplified (Figure 3B).
For an alternative exon, another critical issue that should be taken into consideration is the inclusion level, which was defined as the fraction of the gene's transcripts that include this exon [70]. In practice, the exon's inclusion levels could be quantified by counting the mapped cDNA fragments, such as RNA-Seq reads and ESTs, in support of their respective splice junctions ( Figure 3C). According to the quantified inclusion levels, an alternative exon could be subjectively classified into major and minor isoforms, both of which would have distinct and important biological implications.
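The junction-based quantification of inclusion level can be sketched as follows for an alternative cassette exon. This is a minimal illustration under stated assumptions: reads supporting inclusion are taken as the mean of the counts on the upstream and downstream inclusion junctions, reads supporting exclusion are the counts on the skipping junction, and the 0.5 cutoff separating major from minor forms is a hypothetical choice, since the classification is described above as subjective. The function names are illustrative only.

```python
def inclusion_level(up_junction, down_junction, skip_junction):
    """Fraction of junction-supporting reads that include the exon.

    up_junction / down_junction: reads spanning the two junctions that
    join the cassette exon to its neighbours; skip_junction: reads
    spanning the junction that joins the neighbours directly.
    Returns None when no junction read is available.
    """
    inclusion = (up_junction + down_junction) / 2  # average of both sides
    total = inclusion + skip_junction
    if total == 0:
        return None
    return inclusion / total


def isoform_form(level, cutoff=0.5):
    """Label an exon as 'major' or 'minor' (cutoff is an assumption)."""
    return "major" if level >= cutoff else "minor"


level = inclusion_level(up_junction=30, down_junction=50, skip_junction=10)
print(level, isoform_form(level))
```

Averaging the two inclusion junctions keeps the inclusion and exclusion counts comparable, because an included exon contributes two junctions whereas a skipped exon contributes only one.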

Evolutionary Origins
The systematic investigation of exon origin and evolution was first conducted in rodents by comparative genomic analysis, which revealed that species-specific exons are more likely to be alternatively spliced and are characterized by low inclusion levels [71]. Subsequent similar studies analyzing more vertebrate species also supported a clear relationship between an exon's evolutionary age and both its potential for alternative splicing and its inclusion level [72,73]. However, these studies were less reliable in inferring the presence or absence of orthologous exons and in estimating the inclusion levels of alternative exons, because they employed comparative genomic approaches and EST data. One later study comprehensively sequenced cDNA molecules from nine tissues of five vertebrates and found that the degree of evolutionary conservation of alternative splicing patterns varied substantially among tissues, and that ancient alternative exons had the weakest splice sites [74]. Furthermore, exons recently converted from constitutive to alternative splicing had splice sites of decreased strength, whereas the inverse conversions were not associated with such changes [74]. A recent reanalysis of these RNA-Seq data further found that species-specific alternative exons have high inclusion levels in specific tissue(s), which are also associated with increased gene expression [75]. A study focusing on the primate lineage [76] suggested that changes in exon inclusion level are more likely to be functionally relevant than conversions of splicing pattern. Overall, these studies support the contention that alternative exons are associated with younger evolutionary ages and greater tissue-specific differences in inclusion level in comparison with constitutive exons.
The above conclusions have mainly been drawn from analyses of alternative cassette exons because they are more easily and accurately detected by comparative genomic approaches. Although it is well known that alternative 3′/5′ SS exons result from competitive usage of cryptic splice sites, their evolutionary origins and conservation levels have not yet been systematically analyzed. A recent report studying the alternative splicing landscape in response to infection in humans suggested that cryptic splice sites are generally not conserved [77]. Furthermore, cryptic splice sites are generally thought to be associated with lower inclusion levels than the nearby canonical splice sites [78]. However, this conclusion should be treated with caution, because the inclusion levels of alternative 3′/5′ SS exons are similarly expected to be highly variable among different tissues.

Selective Constraints
Alternative cassette exons are the most common type of alternative splicing event in vertebrates and are thought to be less evolutionarily constrained because they are not included in virtually all mature mRNAs [79]. Higher non-synonymous substitution rates (Ka) were previously observed in alternative cassette exons than in constitutive exons by analyzing human-mouse orthologous exons, which indicates faster evolution at the amino acid level and a significant contribution to protein functional diversification [80,81]. By contrast, alternative cassette exons have lower synonymous substitution rates (Ks) and increased conservation of nucleotide sequences [82]. Accordingly, species-specific alternative cassette exons that originated relatively recently have only slightly increased conservation and Ka/Ks ratio [63]. In addition, the peptide sequences encoded by alternative cassette exons are also less likely to be located within the essential structural units of proteins [83].
Alternative 3′/5′ SS exons were previously suggested to be intermediate states, because the variable ends are more similar to alternative cassette exons whereas the fixed ends resemble constitutive exons in terms of sequence conservation level and Ka/Ks ratio [24]. Although the entire sequences of alternative 3′/5′ SS exons are, like constitutive exons, less symmetrical (i.e., less often divisible by 3), the regions between the two competitive splice sites show high symmetry levels and hence are more similar to alternative cassette exons [24,63]. High symmetry levels were also observed for the alternatively spliced internal regions of protein-coding exons [23]. A recent study further revealed that non-synonymous mutations are preferentially located within the alternatively spliced coding regions specific to the minor transcript isoforms [84]. All of these results support the view that constitutive and alternative exons are subject to different evolutionary constraints.
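The symmetry criterion mentioned above (length divisible by 3, so that inclusion or exclusion preserves the reading frame) can be checked with a short Python sketch. It assumes half-open (start, end) coordinates and two forms of an alternative 3′/5′ SS exon that share exactly one end; the function names and coordinates are hypothetical.

```python
def is_symmetric(start, end):
    """True if the region length is a multiple of 3 (frame-preserving)."""
    return (end - start) % 3 == 0


def alternative_region(form_a, form_b):
    """Region between two competitive splice sites.

    form_a and form_b are (start, end) forms of the same exon that share
    one fixed end and differ at the variable end.
    """
    (s1, e1), (s2, e2) = form_a, form_b
    if e1 == e2 and s1 != s2:
        return (min(s1, s2), max(s1, s2))
    if s1 == s2 and e1 != e2:
        return (min(e1, e2), max(e1, e2))
    raise ValueError("the two forms must share exactly one end")


long_form, short_form = (200, 300), (221, 300)
region = alternative_region(long_form, short_form)
print(region, is_symmetric(*region))      # the 21-nt alternative region
print(is_symmetric(*long_form))           # the whole 100-nt exon
```

In this toy example the whole exon is not symmetric, but the region between the two competing splice sites is, which mirrors the pattern described in the text.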

Regulatory and Coding Roles
Alternative splicing events are not evenly distributed throughout mRNA molecules and are hence differentially involved in regulatory and coding roles. Alternative exons, especially evolutionarily young exons [72], are more likely to lie within or adjacent to the untranslated regions (UTRs), which is consistent with the recent observation in humans that alternative splicing of UTRs is very common and often highly complex [7]. Therefore, the inclusion or exclusion of alternative exons within 5′ and 3′ UTRs can play regulatory roles by influencing mRNA translational efficiency, secondary structure, stability, and subcellular localization, as recently reviewed [85]. However, more ancient alternative exons are more likely to be located within coding regions and to produce distinct protein isoforms [74]. In addition to encoding additional amino acid segments, inclusion of alternative exons can also provide alternative translation start/end sites that yield truncated proteins [86]. These differences indicate the important consequences of evolutionary regulation, because a large proportion of species- and lineage-specific alternative exons are expressed only in specific tissues and developmental stages.

Conclusions
Here, we provide an overview of several issues in relation to alternative splicing of eukaryotic genes, mainly focusing on sequence and evolutionary features for the alternatively spliced exons. Nevertheless, some interesting topics still remain to be specially addressed in the future, such as bioinformatic approaches for identifying allele-specific alternative splicing events from RNA-Seq data.

Funding: This work was financially supported by Sichuan Province ("Sichuan Agriculture Research System").