Mechanistic Insights of Aberrant Splicing with Splicing Factor Mutations Found in Myelodysplastic Syndromes

Pre-mRNA splicing is an essential process for gene expression in higher eukaryotes and requires a high degree of accuracy. Mutations in splicing factors or in regulatory elements of pre-mRNAs often result in human disease. Myelodysplastic syndrome (MDS) is a heterogeneous group of chronic myeloid neoplasms characterized by many symptoms and a high risk of progression to acute myeloid leukemia. Recent findings indicate that mutations in splicing factors represent a novel class of driver mutations in human cancers and affect about 50% of MDS patients. Somatic mutations in MDS patients are frequently found in the genes SF3B1, SRSF2, U2AF1, and ZRSR2. Interestingly, all four are involved in the recognition of 3′ splice sites and exons. It has been reported that mutations in these splicing regulators result in aberrant splicing of many genes. In this review article, we first describe the molecular mechanism of pre-mRNA splicing as an introduction and then focus on these four splicing factors to describe their mutations and the associated aberrant splicing patterns.


Splicing Signals and Splicing Reaction
Pre-mRNA splicing is a critical step for protein expression in higher eukaryotes [1]. In constitutive splicing, all exons are ligated in order without any insertion or deletion of nucleotides. The essential signals for the splicing reaction mostly reside at both ends of introns (Figure 1) [1]. At the 5′ end, the consensus sequence GURRGU (R stands for purine) is found in most mammalian introns (Figure 1). This site is called the 5′ splice site (5′ ss). The consensus sequence CAG is often found at the 3′ end of introns (Figure 1) and is called the 3′ splice site (3′ ss). In addition, a stretch of pyrimidine (Y) residues precedes the 3′ splice site and supports its recognition in mammals (Figure 1, (Y)nNCAG). A branch point sequence (BP), at which lariat formation occurs through 2′-5′ phosphodiester bond formation with the guanine residue at the 5′ splice site, resides 20-30 nucleotides upstream of the 3′ splice site (Figure 1). Although the branch point sequence in budding yeast is well conserved among introns as UACUAAC (the branch point is the A nearest the 3′ end), the conserved sequence around the branch point in mammals is YUNAY (the A is the branch point; Y and N stand for pyrimidine and any nucleotide, respectively), which is more diverse (Figure 1) [2]. The pyrimidine stretch also supports recognition of the branch point sequence (Figure 1). The splicing reaction consists of two steps. In the first step, cleavage at the 5′ ss and formation of the lariat structure in the intron occur. The second step includes cleavage at the 3′ ss and ligation of the exons to produce mRNA. Both steps require ATP and divalent cations in vitro. Of the divalent cations, magnesium is the most efficient in the in vitro splicing reaction.

Figure 1.
A scheme for splicing reaction with two steps. Schematic representation of sequences required for splicing reaction. Boxes show exons, and lines between boxes represent introns. Conserved sequence elements of metazoan pre-mRNAs. R and Y stand for purine and pyrimidine residues, respectively. N indicates any nucleotides. Conserved 5′ and 3′ splice sites, and Adenosine residue used for branch nucleotide are underlined.
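
The consensus elements described above (GURRGU, YUNAY, and (Y)nNCAG) can be written as simple patterns. The following is a minimal sketch, not a splice site predictor: the toy intron is hypothetical, the minimum pyrimidine tract length of 6 is an assumption chosen only for illustration, and real mammalian sites deviate from the consensus far more often than a strict pattern match allows.

```python
import re

# Symbols used in the text: R = purine, Y = pyrimidine, N = any nucleotide.
IUPAC = {"R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def consensus_to_regex(consensus):
    """Translate a consensus such as 'GURRGU' into a regular expression string."""
    return "".join(IUPAC.get(base, base) for base in consensus)


FIVE_SS = re.compile(consensus_to_regex("GURRGU"))
BRANCH_POINT = re.compile(consensus_to_regex("YUNAY"))
# (Y)nNCAG: the minimum tract length of 6 is an illustrative assumption.
THREE_SS = re.compile("[CU]{6,}" + consensus_to_regex("NCAG"))


def find_sites(pattern, rna):
    """Return (start, matched_text) for each non-overlapping match (0-based)."""
    return [(m.start(), m.group()) for m in pattern.finditer(rna)]


# Hypothetical toy intron assembled from labelled pieces (not a real sequence):
# 5' ss + spacer + branch point + spacer + pyrimidine tract + NCAG.
toy_intron = "GUAAGU" + "AGAGA" + "CUGAC" + "AGAGAGAG" + "UCUUUCUUUC" + "GCAG"

five_ss_hits = find_sites(FIVE_SS, toy_intron)
bp_hits = find_sites(BRANCH_POINT, toy_intron)
three_ss_hits = find_sites(THREE_SS, toy_intron)
```

In this toy sequence the branch point adenosine lies 23 nucleotides upstream of the last nucleotide of the intron, which falls inside the 20-30 nucleotide window mentioned in the text.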

Spliceosome Formation with Splicing Factors
The splicing reaction takes place in a large ribonucleoprotein complex termed the spliceosome [1]. The spliceosome assembles on pre-mRNA through stepwise association of the uridine (U)-rich small nuclear RNPs (snRNPs) (U1, U2, U4, U5, and U6) (Figure 2) and a multitude of non-snRNP splicing factors [1]. Each U snRNP consists of a short RNA, Sm proteins, and a few proteins specific to that snRNP. As the first step of assembly, the 5′ splice site is recognized by U1 snRNP through RNA-RNA base pairing. U2 snRNP then associates with the branch point sequence with the help of the U2 snRNP auxiliary factor (U2AF) complex, a heterodimer of U2AF1 and U2AF2. The RNA component of U2 snRNP also hybridizes with the pre-mRNA to recognize the BP. The tri-snRNP, U4/U5/U6, then joins the spliceosome. Within it, U4 and U6 snRNPs are held together by base pairing between their RNA components. The spliceosome is activated by removal of U1 and U4 snRNPs, which remodels the pre-mRNA-snRNP and snRNP-snRNP interactions, and the first step reaction, cleavage at the 5′ splice site and formation of a lariat structure, takes place. Then, cleavage at the 3′ splice site and ligation of the two exons occur as the second step reaction. Several lines of evidence suggest that U6 snRNA has catalytic activity for the splicing reaction.
There is another type of intron, called a minor intron or ATAC intron, since AT and AC are often found at its 5′ and 3′ splice sites in genomic DNA, respectively (Figure 2). There are about 800 minor introns, which corresponds to 0.4% of all introns in humans [3,4]. Although minor introns are thought to be spliced less efficiently than major introns, they have been retained in the genomes of higher eukaryotes. It is possible that regulating gene expression through minor intron splicing is important for those organisms.
Formation of the minor spliceosome uses a different set of U snRNPs (U11, U12, U5, U4atac, and U6atac; Figure 2) [3,4]. The 5′ splice site and the branch point sequence are recognized by U11 snRNP and U12 snRNP, respectively, which act together as a heterodimer. Instead of U4 and U6 snRNPs, U4atac and U6atac snRNPs are recruited to the minor spliceosome. Interestingly, U5 snRNP is common to both major and minor spliceosomes (Figure 2). For both major and minor spliceosomes, assembly proceeds stepwise through complexes designated H, E, A, B, and C (Figure 2). The spliceosome is dissociated into two complexes, the mRNP complex and the intron complex, by an RNA helicase RHX34/HRH1 [5]. The mRNP complex, which contains Exon Junction Complexes on the mRNA [6,7], is exported to the cytoplasm, where the mRNA serves as a template for translation. The intron complex, on the other hand, is thought to be retained and degraded in the nucleus after the removal of U snRNPs and other splicing factors. Post-splicing intron complexes are degraded via the Intron Large (IL) and Intron Small (IS) complexes [8]. The IL complex contains U2, U5, and U6 snRNPs, whereas those U snRNPs are barely detected in the IS complex [8]. The IL complex also contains hPrp19 complex factors, whose homologs in budding yeast are involved in both mRNA splicing and DNA repair [8]. These findings suggest an interplay between mRNA splicing and transcription-coupled DNA repair (TCR). Removal of U snRNPs from the IL complex is mediated by the hPrp43/TFIP11 complex in humans [8]. After dissociation of splicing factors, introns in the IS complex become susceptible to RNA lariat debranching enzyme 1 (Dbr1), which linearizes introns by resolving the 2′-5′ phosphodiester bond [8,9].
Interestingly, hDbr1 can shuttle between the nucleus and the cytoplasm [10], suggesting that it is involved in the RNA quality control process by linearizing intron-lariat containing RNAs accidentally exported from the nucleus.
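
Two statements above lend themselves to a quick check: the terminal dinucleotides that give ATAC introns their name, and the figure of about 800 minor introns making up 0.4% of human introns. The sketch below is deliberately naive and rests on assumptions: it labels an intron only by its first and last two nucleotides in genomic DNA, which is not a sufficient criterion for assigning an intron to the major or minor spliceosome, and the function names are hypothetical.

```python
def classify_by_termini(intron_dna):
    """Label an intron by its terminal dinucleotides only (a naive heuristic)."""
    seq = intron_dna.upper()
    termini = (seq[:2], seq[-2:])
    if termini == ("GT", "AG"):
        return "GT-AG termini (typical of major introns)"
    if termini == ("AT", "AC"):
        return "AT-AC termini (the pattern that names ATAC introns)"
    return "other termini"


def implied_total_introns(minor_count=800, minor_fraction=0.004):
    """Total intron count implied by 'about 800 minor introns = 0.4%'."""
    return round(minor_count / minor_fraction)


# Hypothetical toy sequences, not real introns.
major_like = classify_by_termini("GTAAGTCCTTTTCTGACTTTTCCCTTTCAG")
minor_like = classify_by_termini("ATATCCTTTTCCTTAACTTTTTTTCCAC")
total = implied_total_introns()
```

Taken at face value, the two numbers quoted in the text imply roughly 200,000 introns in total.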

Alternative Splicing
The splicing described above is termed constitutive splicing, which utilizes all exons in the pre-mRNA molecule. In contrast, another type of splicing, alternative splicing, selectively includes or excludes alternative exons [11,12]. There are several types of alternative splicing, such as alternative 5′ or 3′ splice sites, exon skipping/inclusion, intron retention, and mutually exclusive exons (Figure 3). Alternative splicing produces many mRNA variants that are translated into proteins with different structures and functions. Therefore, this step likely contributes to generating the diversity of proteins expressed in higher eukaryotes from a limited number of genes [13]. The Human Genome Project also revealed that over 95% of human genes undergo alternative splicing [14,15]. Abnormalities of both constitutive and alternative splicing in humans may cause dysfunction or absence of the encoded proteins, often resulting in hereditary diseases [16].

Figure 3.
In intron retention alternative splicing, the intron shown in a thick line is recognized as an alternative exon.
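
To illustrate how exon skipping/inclusion expands the set of mRNA variants, the following sketch enumerates the isoforms of a gene in which some exons are constitutive and others are cassette (optionally included) exons. It is a toy model under stated assumptions: exon names are hypothetical, each cassette exon is treated as independent, and the other types of alternative splicing listed above are not modelled.

```python
from itertools import product


def enumerate_isoforms(exons):
    """List all isoforms for exons given as (name, is_cassette) pairs.

    Constitutive exons are always kept; each cassette exon is either
    included or skipped, independently of the others.
    """
    choices = [(True, False) if is_cassette else (True,) for _, is_cassette in exons]
    isoforms = []
    for keep_flags in product(*choices):
        isoforms.append(
            tuple(name for (name, _), keep in zip(exons, keep_flags) if keep)
        )
    return isoforms


# Hypothetical three-exon gene with one cassette exon (E2).
one_cassette = enumerate_isoforms([("E1", False), ("E2", True), ("E3", False)])

# Hypothetical four-exon gene with two cassette exons (E2 and E3).
two_cassettes = enumerate_isoforms(
    [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
)
```

With n independent cassette exons the model yields 2^n isoforms, which gives a sense of how a limited number of genes can produce a much larger number of transcripts.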

Cis-Regulatory Elements and Trans-Acting Factors in Splicing
In vertebrates, introns are much longer than in lower eukaryotes. Thus, it is assumed that exon recognition, rather than intron recognition, is the major mechanism for splicing [11,17]. For exon recognition, both the upstream and downstream intronic regions flanking the exon play important roles. The upstream intronic region includes the branch point (BP), the polypyrimidine stretch, and the 3′ splice site AG dinucleotide, while the downstream intronic region carries a 5′ splice site sequence. The BP is recognized by U2 snRNP in cooperation with the SF3B complex [1,18,19]. The polypyrimidine stretch and the AG dinucleotide are bound by U2AF2 and U2AF1, respectively, which form a heterodimer [1,18,19]. The 5′ splice site downstream of the exon is associated with U1 snRNP. In addition to those intronic elements, some exons contain exonic regulatory elements called exonic splicing enhancers (ESEs). ESEs are often bound by Serine-Arginine-rich splicing factor (SRSF) family proteins, which promote exon recognition [18,20,21]. To date, twelve SR proteins have been found in humans [21]. SRSF family proteins share a common architecture: one or two RNA binding domains (RBDs) at the amino terminus and an Arginine-Serine-rich (RS) domain at the carboxy terminus [18,20,21]. The RS domain mediates protein-protein interactions among RS domain-containing proteins [18,20,21]. U2AF1/U2AF35, U2AF2/U2AF65, and U170K, a component of U1 snRNP, also harbor an RS domain. SRSF proteins bind to ESEs through their RBDs and interact with both U2AF and U170K through the RS domain, which bridges the 3′ splice site and the 5′ splice site across the exon (Figure 4). This crosstalk between the 3′ and 5′ splice sites promotes the recognition of exons that have ESEs [11,17]. In contrast, another type of regulatory sequence reduces exon recognition. This element is called an exonic splicing silencer (ESS). ESSs are bound by another class of splicing regulators, heterogeneous nuclear ribonucleoproteins (hnRNPs).
The hnRNP protein family consists of 20 proteins, named from A1 to U, which have many different RNA binding domains [22]. hnRNPs are abundant nuclear proteins with many cellular functions, including roles in transcription, splicing, RNA transport/localization, translation, and RNA stability [22]. It was demonstrated that one member of the family, hnRNP A1, mediates silencing by binding initially to a high-affinity site in the exon, which then promotes further hnRNP A1 association with the upstream and downstream regions of the exon [23]. This spreading inhibits the binding of SR proteins to ESEs (Figure 4). The elements described above are also found in introns, where they are called Intronic Splicing Enhancers (ISEs) and Intronic Splicing Silencers (ISSs) [11]. These regulatory elements and factors are involved not only in constitutive splicing, but also in tissue- and stage-specific alternative splicing [11]. Both SR proteins and hnRNP proteins are involved in alternative splicing. SR proteins tend to promote exon inclusion by binding to ESEs and ISEs, whereas hnRNP proteins, especially hnRNP A/B proteins, tend to enhance exon skipping by binding to ESSs and ISSs [11]. The ratio between SR protein and hnRNP A/B protein levels varies among tissues [24], which likely contributes to tissue-specific alternative splicing. Several tissue-specific splicing regulators have also been reported. For example, Nova, KSRP, and RbFox1 are neuron-specific RNA binding proteins that mediate neuron-specific alternative splicing [11]. RNA binding motif protein 24 (Rbm24) and Rbm38 are required for muscle differentiation through splicing modulation [11]. Epithelial cells also express epithelial-specific splicing regulators, ESRP1 and ESRP2, which carry out epithelial-specific alternative splicing [11].
It is likely that the combination of expression levels of SR proteins, hnRNP proteins, and tissue-specific splicing regulators determines which alternative splicing pattern is selected in each tissue.

Human Diseases Caused by RNA Processing Defects
Mutations in those cis-regulatory elements and trans-acting splicing regulators cause aberrant splicing patterns that often result in human disease. Diseases with defects in RNA metabolism have recently been called 'RNA diseases', and it is estimated that up to 15% of all point mutations that result in human genetic disease cause an RNA splicing defect [16]. Ten years ago, it was demonstrated that splicing factors are frequently mutated in myelodysplastic syndrome (MDS). MDS is a heterogeneous group of chronic myeloid neoplasms characterized by many symptoms, such as ineffective hematopoiesis, peripheral blood cytopenia, and a high risk of progression to acute myeloid leukemia [25]. Next-generation sequencing of patients' genomic DNA revealed that SF3B1, U2AF1, SRSF2, and ZRSR2 are the most frequently mutated splicing factors [26-28]. Since the same particular amino-acid residues of SF3B1, U2AF1, and SRSF2 are mutated in many patients, those mutations are thought to be 'gain-of-function' mutations. In contrast, many different mutations were found in the ZRSR2 gene, suggesting that they result in 'loss-of-function' of the ZRSR2 gene product. Therefore, it was assumed that aberrant splicing caused by splicing factor mutations leads to the onset of MDS. Several lines of evidence have since accumulated showing how mutations in certain splicing factors cause MDS onset.
In this review article, we mainly introduce four of the most frequently mutated splicing factor genes (SF3B1, U2AF1, SRSF2, and ZRSR2) in MDS with the aberrant splicing mechanism caused by those mutations and an outline of MDS from a splicing point of view.

SF3B1
SF3B1 is one of the components of the SF3B complex, which stabilizes U2 snRNP binding to the branch point sequence during pre-mRNA splicing (Figure 5) [2,29]. The SF3B1 gene is located on chromosome 2q33.1. Approximately 20-28% of MDS patients harbor SF3B1 mutations [27,30-32]. Surprisingly, SF3B1 mutations are responsible for the ring sideroblast (RS) phenotype in ∼98% of cases [31]. In a study using NOD scid gamma (NSG) mice, animals transplanted with hematopoietic stem cells (HSCs) from SF3B1 mutant MDS-RS patients developed the characteristic ring sideroblast phenotype [33]. Most recently, the International Working Group for the Prognosis of Myelodysplastic Syndromes (IWG-PM) provided evidence supporting the recognition of SF3B1-mutant MDS as a distinct diagnostic entity [34]. The dataset they used includes 3479 patients with known SF3B1 mutation status and represents the largest MDS data set with genetic data reported to date. Their validation strongly supports the correlation of SF3B1 mutations with clinical phenotype in MDS. SF3B1 mutations in MDS patients cluster at a 'hot spot', Lysine 700, which is changed to glutamic acid (K700E) and resides in the HEAT domain repeats [32,35]. Other hotspots (R625, H662, and K666) are assumed to have a similar functional impact because of their close spatial proximity in the HEAT repeats [32,36]. Many studies have demonstrated that SF3B1 mutations cause aberrant splicing via cryptic 3′ splice site usage [37-41]. Since the SF3B complex is involved in recognition of the branch point sequence, these mutations most likely cause recognition of a cryptic branch point sequence and, in turn, usage of a cryptic 3′ splice site. The aberrant splicing caused by cryptic 3′ splice site usage often creates premature termination codons in the mRNA, resulting in transcript degradation by nonsense-mediated decay (NMD) [42,43]. In SF3B1 mutant samples, a reduction of intron-retaining isoforms was consistently reported [44,45].
These results suggest that reduced intron retention is due to the ability of SF3B1 mutants to select an upstream aberrant 3′ splice site [44]. Decreased intron retention was more prominent in the cytoplasm of SF3B1 mutant cells, suggesting that nuclear export of intron-retaining transcripts was impaired [45]. Using RNA sequencing, many dysregulated gene isoforms and aberrantly spliced target genes have been identified in SF3B1 mutant MDS [37,39,40,46-50]. It is therefore of great interest to identify and functionally characterize specific target genes of mutant SF3B1 in order to discover drug targets. As for drug candidates, it has been suggested that SF3B1 inhibitors have potential in treating the preleukemic state and related myeloid disorders. Since SF3B1 is essential for splicing, SF3B1 inhibitors were expected to completely block the splicing reaction for all introns. However, inhibition of splicing by spliceostatin A (SSA) is partial in cultured cells and produces shorter transcripts that are translated into truncated proteins in tumor cells [51,52]. Another compound, pladienolide B, is an antitumor macrolide that was also found to interact with SF3b to inhibit splicing [53]. E7107, a derivative of pladienolide D, displayed strong antitumor activity [53]. It turned out that E7107 blocks spliceosome assembly by preventing tight binding of U2 snRNP to pre-mRNA [52]. It would be worthwhile to design small chemical compounds that bind tightly to and inhibit SF3B1 as drug candidates for MDS and other cancers such as breast cancer and lung adenocarcinoma.
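
The link between cryptic 3′ splice site usage and premature termination codons can be made concrete with a small reading-frame check. The sketch below is illustrative only and rests on assumptions: the exon and intron-tail sequences are hypothetical, the sequences are written in the DNA alphabet, the open reading frame is taken to start at the first nucleotide of the upstream exon, and no attempt is made to model the rules that decide whether NMD is actually triggered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def introduces_ptc(exon1, exon2, retained_tail):
    """Check whether using an upstream cryptic 3' splice site creates a PTC.

    `retained_tail` is the intronic sequence between the cryptic and the
    canonical 3' splice site, which becomes part of the mRNA.
    """
    canonical_stop = first_stop(exon1 + exon2)
    aberrant_stop = first_stop(exon1 + retained_tail + exon2)
    if canonical_stop is None or aberrant_stop is None:
        return False
    # The normal stop codon is shifted downstream by the retained nucleotides;
    # any in-frame stop before that position counts as premature here.
    return aberrant_stop < canonical_stop + len(retained_tail)


# Hypothetical toy exons: ATG GCT GCA | GGT GAA TGC TAA
exon1, exon2 = "ATGGCTGCA", "GGTGAATGCTAA"

frameshift_case = introduces_ptc(exon1, exon2, "CTTTCAG")  # 7 nt retained
in_frame_case = introduces_ptc(exon1, exon2, "CTGCAG")     # 6 nt retained
```

In the first case the seven retained nucleotides shift the reading frame and expose an in-frame TGA within the downstream exon; in the second, six retained nucleotides add two codons without introducing a stop.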

SRSF2
SRSF2, originally called SC35 [54], is a member of the SR protein family that is involved in both constitutive and alternative splicing [11,21]. The SRSF2 gene is located on chromosome 17q25.2. SRSF2 mutations have been found in about 14% of patients with MDS [55]. SRSF2 contains an RBD for RNA binding and an RS domain for interaction with other proteins. SRSF2 promotes exon recognition by binding to ESEs in pre-mRNA through its RBD (Figure 5). Through its RS domain, SRSF2 interacts with the U2AF heterodimer and U170K, which promotes the association of those factors with the upstream 3′ splice site and the downstream 5′ splice site, respectively [11,18,56]. In MDS patients, the mutations in SRSF2 are clustered at a 'hot spot', Proline 95 [28,57]. As with the SF3B1 and U2AF1 mutations in MDS, it was assumed that these hot spot mutations cause gain-of-function of the mutant proteins. Since this Proline residue lies slightly outside the RNA binding domain of SRSF2, it was assumed that the MDS mutations do not affect the RNA binding activity of SRSF2. However, changes in splicing patterns associated with SRSF2 mutations were reported in cultured cells, mouse models, and primary human samples [57-59]. Analysis and comparison of the binding sequence motifs of wild-type and mutant SRSF2 proteins revealed that mutant SRSF2 proteins have higher binding affinity for GGNG and CCNG motifs (C, cytosine; G, guanine; N, any nucleotide) in addition to the purine-rich motif, which the wild type binds efficiently [57-60]. This change in affinity results in differential splicing of many genes, including EZH2, a gene implicated in the pathogenesis of MDS [61]. EZH2 is a SET domain-containing histone methyltransferase that is a component of the Polycomb Repressive Complex 2 (PRC2). PRC2 catalyzes tri-methylation of histone H3 at Lys 27 (H3K27me3) to regulate gene expression.
Since aberrant splicing of EZH2 in the presence of mutant SRSF2 includes exon 9.5, which contains a stop codon, the protein level of EZH2 is likely reduced in MDS patient cells [45,57,62,63]. A mutation in another epigenetic factor points to pathogenic crosstalk between altered states of the epigenome and splicing in a subset of leukemias. Yoshimi et al. demonstrated that aberrant splicing of INTS3 contributed to leukemogenesis in concert with mutant IDH2 and was dependent on mutant SRSF2 binding to cis-regulatory elements in INTS3 pre-mRNA and on increased DNA methylation of INTS3 [64].
It has also been demonstrated that MDS-associated mutations in SRSF2 and U2AF1 increase R-loop formation [65,66]. Increased R-loop formation results in activation of the DNA damage response pathway [66]. R loops may form efficiently when rearrangement of mRNA-protein complexes during or after splicing is slowed down. How splicing factors take part in R-loop formation and/or resolution is of great interest.
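
The GGNG and CCNG motifs mentioned above for SRSF2 can be located in an exon sequence with a short pattern search. This is a minimal sketch under stated assumptions: the example exon is hypothetical, overlapping occurrences are counted, and a motif count says nothing by itself about binding affinity or about which exons respond to mutant SRSF2.

```python
import re


def count_motif(sequence, motif):
    """Count occurrences of a motif (overlaps included); N matches any base."""
    rna = sequence.upper().replace("T", "U")
    # A lookahead matches at every start position without consuming characters,
    # so overlapping occurrences are all counted.
    pattern = "(?=" + motif.replace("N", "[ACGU]") + ")"
    return len(re.findall(pattern, rna))


def srsf2_motif_profile(sequence):
    """Summarise GGNG and CCNG counts for one sequence."""
    return {
        "GGNG": count_motif(sequence, "GGNG"),
        "CCNG": count_motif(sequence, "CCNG"),
    }


# Hypothetical toy exon fragment, not taken from EZH2 or any real gene.
profile = srsf2_motif_profile("GGAGCCUGGGUG")
```

Comparing such profiles between exons that are differentially spliced and exons that are not is one simple way to explore the motif preferences described in the text.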

U2AF1
U2AF1 was originally identified as a component of the U2 snRNP auxiliary factor complex (U2AF), which facilitates association of U2 snRNP with branch point sequences [67,68]. U2AF1 is the small subunit of the U2AF heterodimer and is responsible for recognition of the AG dinucleotide at pre-mRNA 3′ splice sites (Figure 5) [1,11,18]. The other subunit, U2AF2, recognizes the pyrimidine stretch residing between the branch point sequence and the 3′ splice site [67,68]. Since U2AF1 and U2AF2 contain an RS domain, they can interact with the U170K protein, which also has an RS domain, to establish an interaction between the 3′ splice site and the downstream 5′ splice site across the exon. The U2AF1 gene is located on 21q22.3, and its mutations occur in approximately 7-11% of MDS patients [28,30,32,69]. U2AF1 mutations also have 'hot spots', at S34 and Q157, which are located in the conserved zinc finger domains Zn1 and Zn2 [26,30,32,70,71]. Both S34 and Q157 mutations in U2AF1 have been shown to affect splicing through RNA binding activity, but they have different effects on 3′ splice site recognition. It was demonstrated that U2AF1 S34 mutants tend to promote aberrant exon inclusion when the 3′ splice site sequence is CAG or AAG [71-77]. In contrast, Q157 mutants affect recognition of the nucleotide immediately downstream of the 3′ splice site AG dinucleotide, promoting exon inclusion when a guanine is at this position [73]. Recently, it was found that the U2AF1 S34 mutant induces inclusion of exon 4 in alternative splicing of interleukin-1 receptor-associated kinase 4 (IRAK4). This isoform encodes the IRAK4-L protein, which causes innate immune activation [78]. IRAK4 activates the NF-κB and MAPK pathways by mediating signaling downstream of the Toll-like receptor (TLR) superfamily [78]. IRAK4 has two spliced isoforms, depending on whether exon 4 is included or excluded: IRAK4-L and IRAK4-S [78].
IRAK4-S could control the innate immune response in normal hematopoietic cells, while IRAK4-L mediates maximal NF-κB activation, resulting in an uncontrolled innate immune response in malignant hematopoietic cells [78]. IRAK4-L is also highly expressed in breast and colon cancer cells, indicating its association with oncogenicity [78]. Furthermore, mutant U2AF1(S34F) AML cells acquire a dependency on IRAK4-L and are sensitive to IRAK4 inhibitors, which suggests a therapeutic strategy [78]. Most recently, crystal structure analysis of yeast U2AF1 revealed that the 3′ splice site AG dinucleotide is strongly recognized by the two Zn finger domains and indicated how aberrant alternative splicing occurs with MDS mutations [79]. Using Förster resonance energy transfer (FRET), the influence of wild-type or S34F mutant U2AF1 on the conformational dynamics of U2AF2-RNA complexes was also determined. Warnasooriya et al. demonstrated that the U2AF heterodimer (U2AF1 + U2AF2) binds weak pyrimidine tracts as a mixture of closed and open U2AF2 conformations, and that the S34F mutation of U2AF1 modulates shifts between open and closed U2AF2 [80]. These findings may help in designing chemical compounds that target the mutated Zn finger domains to inhibit or modify their RNA binding activity as therapeutic approaches to MDS.
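
The sequence contexts reported for the two U2AF1 hot spots can be restated as a lookup on the nucleotides surrounding the 3′ splice site AG. The sketch below is a literal transcription of the statements in the text, not a predictive model: the function name and return labels are hypothetical, the S34 rule is read as "C or A immediately upstream of AG", and the Q157 rule as "G as the first nucleotide after AG".

```python
def u2af1_mutant_contexts(intron_3p_end, first_exon_nt):
    """Return the U2AF1 hot spots whose reported context matches this 3' ss.

    intron_3p_end: the last nucleotides of the intron (must end in AG).
    first_exon_nt: the first nucleotide of the downstream exon.
    """
    tail = intron_3p_end.upper().replace("T", "U")
    if len(tail) < 3 or not tail.endswith("AG"):
        raise ValueError("expected at least 3 nt ending in the AG dinucleotide")
    matches = []
    # S34 mutants: aberrant inclusion reported for CAG or AAG 3' splice sites.
    if tail[-3] in "CA":
        matches.append("S34")
    # Q157 mutants: inclusion reported when G follows the AG dinucleotide.
    if first_exon_nt.upper() == "G":
        matches.append("Q157")
    return matches


# Hypothetical toy 3' splice site contexts.
both = u2af1_mutant_contexts("UUUCAG", "G")
neither = u2af1_mutant_contexts("UUUUAG", "A")
s34_only = u2af1_mutant_contexts("CCAAG", "U")
```

Keeping the two rules as separate tests reflects the point made above that the S34 and Q157 mutations act on different positions around the AG dinucleotide.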

ZRSR2
The ZRSR2 gene is located on chromosome Xp22.2 and is mutated in about 5% of MDS patients, predominantly males [28]. Relatively little is known about this protein's function in mRNA splicing. It is another member of the SR-rich family of splicing factors and was shown to be responsible for recognition of the 3′ splice acceptor site of both major and minor introns in vitro (Figure 5) [81]. Since the ZRSR2 mutations in MDS patients were found throughout the coding region as out-of-frame insertions, deletions, nonsense mutations, and missense mutations, they are likely loss-of-function mutations [82]. ZRSR2 mutations cause abnormal splicing via retention of U12-dependent minor introns [82]. Humans have a limited number of minor introns (about 800, or 0.4% of all introns) [3,4]. Among the genes that contain them, several, including some E2F transcription factors and several genes in the MAPK/ERK pathway, show aberrant splicing in ZRSR2 mutant MDS samples [82].
Most recently, it was demonstrated that impaired minor intron excision caused by knock-out of ZRSR2 enhances hematopoietic stem cell self-renewal, and mutations in minor introns are suggested to be potential cancer drivers [83]. However, the precise molecular mechanism by which ZRSR2 mutations affect minor intron splicing remains to be elucidated. Elucidating the role of ZRSR2 in minor intron splicing may provide useful information both for the basic understanding of minor intron splicing and for the identification of therapeutic targets in MDS with ZRSR2 mutations.

Other Splicing Factor Mutations in MDS
In addition to these four splicing factors, rare mutations in several other splicing factors have also been identified. One of the mutated splicing factors is PRPF8. Studies of the yeast Prp8 protein revealed that PRPF8 is an essential splicing factor that interacts with U5 snRNA to align the 5′ and 3′ splice sites in the spliceosome [84]. PRPF8 mutation causes missplicing [85], most likely through altered splice site selection. The precise mechanism by which PRPF8 mutations affect exon recognition remains to be analyzed. Another factor is LUC7L2. LUC7L2 is an ortholog of the splicing factor LUC7, which is involved in recruitment of splicing factors. The LUC7L2 protein is assumed to be involved in the recognition of non-consensus splice donor sites in association with the U1 snRNP [86]. Interestingly, one RNA helicase protein, DDX41, was shown to be mutated in MDS and AML [87]. Several RNA helicases are known to be involved in splicing steps, likely by changing the conformation of the spliceosome through their ATP-dependent helicase activity. Indeed, the most common DDX41 mutation is R525H, which is assumed to affect adenosine triphosphate (ATP) binding [87]. Very rare mutations have also been found in SF3A1, SF1, PRPF40B, and U2AF2 [88], which are mainly involved in recognition of the branch point and the 3′ splice site. How these mutations affect the splicing reaction is of great interest to RNA scientists. Such analyses will shed light not only on MDS pathogenesis but also on the basic splicing mechanism.

Conclusions and Future Perspectives
In this review, we introduced the four major splicing factors mutated in MDS and the aberrant splicing caused by their mutations. Interestingly, most of the proteins described in this review are involved in 3′ splice site recognition (Figure 5). In addition, SRSF2 is involved in exon recognition through ESE binding. Taken together, it is likely that a splicing mode called exon recognition (Figure 6) [17] participates in aberrant splicing in MDS. In higher eukaryotes, the average length of introns is much longer than that of lower eukaryotes. In fact, the average length of human introns is 5849 nucleotides, while that of nematodes is 335 nucleotides [89]. In contrast, the average length of internal exons, which is no longer than 300 nucleotides, does not differ between vertebrates and lower eukaryotes. Therefore, exon recognition is likely a major mode for splicing in vertebrates, whose introns are long, while intron recognition is dominant in lower eukaryotes, in which introns are relatively short (Figure 6). The fact that 5′ splice site mutations result in skipping of adjacent exons and cause human diseases also supports the exon recognition model. For 5′ terminal exons, it is assumed that the cap structure serves as a substitute for the 3′ splice site. The cap structure is recognized by a nuclear cap binding protein complex that consists of NCBP1/2 proteins in the nucleus [90][91][92]. As NCBP1 was demonstrated to associate with U2 snRNP [92], it is possible that NCBP1-U2 snRNP interacts with U1 snRNP at the 5′ splice site to define the first exon. As for 3′ terminal exons, the poly(A) addition signal and the poly(A) addition machinery are assumed to interact with U2 snRNP at the branch point in the last intron. As supporting evidence, mutation of the 3′ splice site inhibits the polyadenylation cleavage reaction in vitro [93].
In the exon recognition model, definition of the 3′ splice site region most likely takes place first, and this step is critical for exon recognition. Although many excellent studies have provided information on the mechanism of vertebrate exon recognition, it remains unclear whether different factors or mechanisms are involved in the recognition of different exons. Precise analyses of the aberrant splicing mechanism in MDS with mutant splicing factors are expected to also contribute to uncovering the regulation of alternative splicing through exon recognition in vertebrates.
Figure 6. Schematic representation of exon recognition and intron recognition models during splicing. In lower eukaryotes, whose intron size is small, intron recognition is a dominant mode for splicing (upper panel). Introns are recognized by crosstalk between U1 snRNP and U2 snRNP that bind to the 5′ splice site and a branch point, respectively. On the other hand, exon recognition is major for splicing in vertebrates, in which introns are long. In this type, exons are recognized via interaction over exons between U2 snRNP and U1 snRNP that bind to a branch point and the 5′ splice site, respectively (lower panel).
Although some aberrant splicing patterns in dysregulated genes have been identified as being involved in MDS onset, as described above, it is still under investigation how different mutations in different splicing factors cause different MDS phenotypes. To date, there seems to be no common gene whose aberrant splicing is responsible for MDS onset caused by mutations in the four main splicing factors SF3B1, SRSF2, U2AF1, and ZRSR2. It is assumed that the hot spot mutations in SF3B1, SRSF2, and U2AF1 do not reduce the amount of the encoded proteins, whereas mutations in ZRSR2 reduce the amount of functional protein. Splicing pattern analyses suggest that the common pathways affected by mutations of those factors are epigenetic and signal transduction pathways. These points have to be addressed in future analyses. Mechanistic analyses of aberrant splicing caused by mutated splicing factors should shed light on research into therapies for MDS by identifying drug targets.
Author Contributions: All the authors contributed to the work by writing and proofreading the manuscript. N.K. took the primary responsibility for the final content. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by Grants-in-Aid for Scientific Research (18K06012 to N.K.).
Acknowledgments: This work was supported by Grants-in-Aid for Scientific Research (Grant No. 18K06012) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan. We apologize to all scientists whose research could not be properly discussed and cited in this review owing to space limitations.

Conflicts of Interest: The authors declare no conflicts of interest.