Widespread Alternative Splicing Changes in Metastatic Breast Cancer Cells

Aberrant alternative splicing (AS) is a hallmark of cancer and a potential target for novel anti-cancer therapeutics. Breast cancer-associated AS events are known to be linked to disease progression, metastasis, and survival of breast cancer patients. To identify altered AS programs occurring in metastatic breast cancer, we perform a global analysis of AS events by using RNA-mediated oligonucleotide annealing, selection, and ligation coupled with next-generation sequencing (RASL-seq). We demonstrate that, relative to low-metastatic, high-metastatic breast cancer cells show different AS choices in genes related to cancer progression. Supporting a global reshape of cancer-related splicing profiles in metastatic breast cancer we found an enrichment of RNA-binding motifs recognized by several splicing regulators, which have aberrant expression levels or activity during breast cancer progression, including SRSF1. Among SRSF1-regulated targets we found DCUN1D5, a gene for which skipping of exon 4 in its pre-mRNA introduces a premature termination codon (PTC), thus generating an unstable transcript degraded by nonsense-mediated mRNA decay (NMD). Significantly, distinct breast cancer subtypes show different DCUN1D5 isoform ratios with metastatic breast cancer expressing the highest level of the NMD-insensitive DCUN1D5 mRNA, thus showing high DCUN1D5 expression levels, which are ultimately associated with poor overall and relapse-free survival in breast cancer patients. Collectively, our results reveal global AS features of metastatic breast tumors, which open new possibilities for the treatment of these aggressive tumor types.


Introduction
Following transcription, introns are removed from the pre-mRNA transcript and exons are joined to each other to produce the mature messenger RNA (mRNA), a process known as splicing [1]. The splicing reaction is carried out in the nucleus by the spliceosome, a dynamic and large ribonucleoprotein (RNP) complex composed of proteins and RNAs [1,2]. Different from constitutive splicing, where an exon is always included in the final mRNA, alternative splicing (AS) produces more than one mRNA isoform from a single pre-mRNA using various combinations of 5 and 3 splice-sites to achieve proteome diversity. In humans, at least 95% of pre-mRNAs undergo AS [3]. Different types of AS events arise from the different recognition of splice sites and include: Cassette exons or skipped exons (SE), alternative 5 splice-site (A5SS), alternative 3 splice-site (A3SS), alternative transcription start sites (AltS), alternative transcription termination sites (AltT), mutually exclusive exons (MEE), multi-exon skipping (MES), and intron retention (IR). These different types of AS high expression levels of DCUN1D5 were associated with a poor 5-year overall survival rate and relapse-free survival in breast cancer patients. Collectively, our data contribute to defining the impact of AS changes in breast cancer thus leading to a better comprehension of the molecular underpinnings involved in the malignant transformation. Total RNAs were extracted using RixoEX reagent (GeneAll, Seoul, Korea, Cat: 301-001) or Rneasy Mini kit (Qiagen, Hilden, Germany, Cat: 74106) according to the manufacture's instruction. Reverse transcription was performed using M-MLV reverse transcriptase (ELPIS, Daejeon, Korea, Cat: EBT-1028) and 1 µg RNA for cDNA synthesis. Then, 0.5 µL cDNA was used for PCR amplification. Quantitative RT-PCR (RT-qPCR) was performed with iQ™ SYBR ® Green Supermix kit (BioRad, Hercules, CA, USA, Cat: 1708880) or QuantiTect SYBR Green PCR (Qiagen, Hilden, Germany, Cat: 204145) using a LyghtCycler 480 (Roche, Basel, Switzerland), according to the manufacturer's instructions using GAPDH, B2M, or RPLP0 as an internal control. Primers used in this study are listed in Supplementary Table S1.

Alternative Splicing Analysis with RASL-Seq
A pool of oligonucleotides was designed to detect 5530 AS events. RASL reaction was performed as previously described. Two oligonucleotide sets were designed to detect mRNA isoforms of one gene with cassette exon included and excluded. The mixture of oligonucleotides was hybridized with RNAs and selected with biotin-labeled oligo dT. Two nearby oligos were then ligated and barcoded for high-throughput sequencing using Illumina Hiseq 2500 apparatus (Illumina, San Diego, CA, USA). Splicing events were filtered for a minimum of 5 read counts in all biologic triplicates. AS changes were filtered using the following criteria: Ratio change of at least 2 and p-value < 0.05. Gene enriched in up-, down-and non-differentially regulated (ndiff) AS events in high-metastatic breast cancer cells are listed in Supplementary Table S2.

Calculation of Splicing Score
3 splice site sequences including 20 nt of 3 end intron and 3 nt of 5 exon were used to analyze 3 splice-site score using MaxEntScan [37,38]. 9 nt sequences containing 3 nt of 3 exon and 6 nt of 5 intron were used to calculate 5 splice-site scores using MaxEntScan. Cassette exon length, intron length, median transcript length, and GC content were calculated after extracting sequences using BEDTools [37]. RNA-binding motifs were analyzed using Discriminative Regular Expression Motif Elicitation (DREME) algorithm [39].

Gene Expression and Splicing Analyses of Transcriptomic Data
Expression analysis of RBPs, whose binding motifs were enriched in up and down identified cassette exons, in breast specimens (normal breast; luminal breast cancer; HER2 positive breast cancer; TNBC) of the TCGA-BRCA dataset was performed by using UAL-CAN web-tool (http://ualcan.path.uab.edu; accessed on 3 March 2021) [44].
DCUN1D5 mRNA splicing patterns (skipping of exon 4) were analyzed using TCGA SpliceSeq (http://projects.insilico.us.com/TCGASpliceSeq; accessed on 3 March 2021) [45], a web-based resource known to provide Percent Splice-In (PSI) values for splicing events retrieved in specimens from the TCGA-BRCA level 3 dataset. TCGA-BRCA cancer omics data were also analyzed using UALCAN web-tool (http://ualcan.path.uab.edu; accessed on 3 March 2021) to determine total DCUN1D5 expression levels. Transcripts per million (TPM) data were used to generate boxplots. Interquartile ranges (minimum, 25th percentile, median, 75th percentile, maximum) are shown. UALCAN also provides an estimate of statistically significant differences in gene expression level between groups employing TPM values and a student's t-test. Gene expression data were also retrieved using the Oncomine tool (http://oncomine.org/resource; accessed on 3 March 2021) [46]. Breast cancer microarray databases with altered DCUN1D5 gene expression were filtered considering their significance (p-value < 0.05). Log 2 -median centered intensity is shown. Additional breast cancer datasets (Perou GSE3521 and GSE10893) were retrieved using the Human Cancer Metastasis Database (HCMDB) (https://hcmdb.i-sanger.com; accessed on 3 March 2021) [47]. Gene expression data are shown as Log 2 intensity. HCMDB also provides breast transcriptome data classified based on metastasis status.
Gene and transcript expression levels in breast cancer cell lines were retrieved in the Cancer Cell Line Encyclopedia database (CCLE; https://portals.broadinstitute.org/ccle/; accessed on 3 March 2021) [48]. Breast cancer cell lines were classified as luminal, HER2 positive, or TNBC according to Dai et al., 2017 [49]. Percentage of cassette exon inclusion of each analyzed event in the CCLE was calculated considering all the annotated transcripts in which the cassette exon is present of total transcripts in which the alternative region is transcribed (at least one constitutive exon should be present upstream and downstream the alternative exon).

Survival Analysis
Overall survival curves of breast cancer patients were generated by analyzing clinical data (survival time and survival status) of patients with breast cancer of the TCGA-BRCA dataset. Patients were stratified (split by median) according to DCUN1D5 RSEM expression. Clinical data and RNA-seq data of DCUN1D5 expression were obtained from TSVdb webtool (http://www.tsvdb.com; accessed on 3 March 2021) [50]. Log-rank Mantel-Cox test was employed to determine any statistical difference between the survival curves. Additional 5-years relapse-free survival curves were obtained from a Kaplan-Meier Plotter (https://kmplot.com/analysis; accessed on 3 March 2021) [51] with indicated probes for DCUN1D5. Hazard ratio (and 95% confidence intervals) and log-rank P are displayed.

Plasmids
T7-tagged SRSF1 vector was generated as described previously [52]. SRSF3-T7 plasmid was previously described [52]. DCUN1D5-GFP expression plasmid was produced by PCRamplification using human genomic DNA as a template and inserting in pEGFP-C1 vector with HindIII/BamHI enzymes. Primers used are listed in Supplementary Table S1.

Plasmid Transfection and siRNA Silencing
MCF7 and HEK-293 cells were transiently transfected with lipofectamine LTX with Plus reagent (Invitrogen, Cat: 15338100), according to the manufacturer's protocol. Briefly, cells were seeded in a 12-well or 96-well plate in order to reach 70-90% of confluence the day of the transfection. SRSF1 silencing in MCF7 cells was performed by using three consecutive RNAiMax Lipofectamine (Life Technologies, Waltham, MA, USA, Cat: 13778075) transfection of 30 pmol of SRSF1 siRNA (gcaacagcaggagucgcaguu/cugcgacuccugcuguugcuu) or a control siRNA.

Statistical Analyses and Plotting
Two-sided t-tests were performed for all splice-site strength analyses [38]. Ordinary one-way ANOVA for multiple comparisons was performed to compare more than 2 groups. Mann-Whitney two-sided U test was performed for transcript median length, exon length, upstream intron length, and GC content [53]. Plots in Figure 3 are plotted with R (http://www.R-project.org/; accessed on 25 February 2021).

Identification of Widespread AS Changes in High-Metastatic Breast Cancer Cells
To identify pre-mRNAs undergoing abnormal AS regulation in high-metastatic breast cancer cells compared to low-metastatic cancer cells, we performed a RASL-seq using RNAs extracted from MDA-MB-231 and MCF7 cell lines, respectively. RASL oligonucleotide pool was designed to detect 5530 AS events in the human genome. Collectively, we detected 1558 AS events that expressed both isoforms in MCF7 and MDA-MB-231 cells with a minimum of 5 read counts for each isoform, allowing calculating isoform ratio changes. By analyzing biological triplicates, AS switches of these two cell lines with ratio changes ≥2 and ≤−2 were considered to be significantly (p < 0.05) up-regulated and down-regulated splicing events, respectively. Conversely, AS events with the ratio change comprised between 2 and −2 were considered to be non-differentially regulated (ndiff). Using this cutoff, we identified 925 AS events-~59.4% of a total of 1558 identified-significantly altered in MDA-MB-231 cells compared to MCF7 cells, suggesting a striking difference in splicing programs between these two cell lines (Supplementary Table S2). In MDA-MB-231 cells, 552 and 268 SEs exhibited significantly increased or decreased ratios compared to those in MCF7 cells ( Figure 1A). RASL-seq mostly represents AS of SEs, although it can also be used to investigate other types of AS events. Accordingly, we were also able to detect 39 A5SS, 31 A3SS, 15 MEE, 3 MES, 9 AltS, and 8 AltT ( Figure 1A) (Supplementary Table S2). Importantly, most of the regulated AS events (98.2%) affected protein-coding genes, further supporting the importance of AS in the generation of human protein diversity ( Figure 1B). Taken together, RASL-seq results reveal global AS switches associated with the metastatic potential of MDA-MB-231 cells.
To identify the functional impact of the altered AS events in metastatic cancer cells, we next applied Gene Ontology (GO) analysis to locate functional categories enriched in the set of genes showing increased or decreased cassette exons splicing. As shown in Figure 1C
To further prove that our RASL-seq in MCF7/MDA-MD-231 was also effective in the identification of differently regulated AS events in highly metastatic breast cancer cells, we have also validated the 18 cassette exons shown in Figure 2 in another model of lowmetastatic versus high-metastatic breast cancer, namely T47D and BT-549 cells. As shown in the Supplementary Figure S1, we found that 16 out of 18 events (MARK3 exon 15; DST exon 93; TBC1D13 exon 3; MYOF exon 17; DCUN1D5 exon 4; FGFR1OP2 exon 4; FNBP1 exon 10; SYNE2 exon 106; SLK exon 13; ADD3 exon 13; USO1 exon 15; KIF13A exon 38; ATP11C exon 29; SMARCA1 exon 13; ACIN1 exon 4; SMG7 exon 18) were in the same direction (inclusion or skipping) also in low-metastatic T47D compared to high-metastatic BT-549 breast cancer cell lines. Moreover, we also exploited the CCLE [48] database to analyze inclusion levels of the aberrantly regulated cassette exons in a total of 46 breast cancer cell lines classified according to their major subtype (luminal; HER2 positive, or TNBC). Notably, a significant increase in exon inclusion in high-metastatic TNBC was found for MARK3, MYOF, FGFR1OP2, and FNBP1 events. Conversely, increased skipping of cassette exons in TNBC cells was observed for SLK, ADD3, USO1, and KIF13A (Supplementary Figure S2 and Supplementary Table S4). Collectively, our results showed a global splicing modification of metastatic breast cancer cells.

Distinct Sequence Properties of AS Cassette Exons Differently Regulated in High-vs. Low-Metastatic Breast Cancer Cells
As widespread differences of AS events were observed in high-metastatic MDA-MB-231 cells, we wanted to understand sequence features of enhanced or repressed AS cassette exons in these metastatic cells. The outcome of AS reaction depends on the coordinated action of trans-acting RBPs that bind cis-regulatory sequences within the nascent pre-mRNA and promote inclusion or skipping of specific AS exons. Thus, we first looked for enrichment of RNA motifs associated with the metastasis potential of MDA-MB-231 cells across AS cassette exons and their flanking intronic sequences (200 nt). In up-regulated cassette exons, we found that numerous motifs were enriched in upstream (RAAAUG, CACAG, UU-UNUUUU, UYUCUSU, AAAUAY) and downstream (UUUUWAAA, SCCAGGC, GURAG, UUUUMU) introns, but not inside the AS exons ( Figure 3A). Additionally, in downregulated cassette exons, we found enriched motifs located in the regulated exons (YUACA, GGAGRA) or in the upstream intron (ACAG) ( Figure 3B). In ndiff cassette exons, we could detect only a motif (GCCUGGS) located at upstream intron ( Figure 3C). RBPs that might bind to these motifs are listed in Supplementary Table S5. Importantly, we found that several RBPs, which are able to recognize the enriched sequences at 5 and 3 splice-sites and inside the AS cassette exons, were differentially expressed in breast cancer and during breast tumor progression, including PTBP1 [54,55], SRSF1 [28,30], and SRSF9 [30].
We further compared splice site strength of up-regulated (n = 552) and down-regulated (n = 268) cassette exons in MDA-MB-231 cells vs. MCF7 cells as well as non-differentially regulated exons (ndiff) (n = 307). To this aim, we compared 5 and 3 splice-site scores of cassette exon (Alt) and its flanking constitutive exons (C1 and C2) by obtaining splicesite scores of each of the four splice-sites (Supplementary Figure S3A). As shown in Supplementary Figure S3B, flanking exons included similar strength of 5 and 3 splicesites. In addition, we were not able to observe differences in 3 splice-site strength of cassette exons (in either up-and down-regulated exons) and 5 splice-site strength in the down-regulated exons (Supplementary Figure S3B). However, we found that upregulated cassette exons have a stronger 5 splice-site than non-differentially regulated exons (Supplementary Figure S3B).
It has been also shown that intron-exon architecture affects splicing site recognition. For example, short introns favor the "intron definition" model for exon skipping decisions, whereas long introns favor the "exon definition" and exon skipping in some genes [56]. Additionally, differential exon-intron GC content also regulates inclusion levels of alternatively spliced exons by affecting nucleosome occupancy and recruitment of splicing factors on the nascent pre-mRNA [57]. Thus, we analyzed the length of introns and exons as well as GC contents of flanking introns of cassette exons. As shown in Supplementary Figure  S3C, up-regulated exons had shorter exon length and lower GC contents. We also observed that down-regulated cassette exons contained shorter upstream intron and higher GC contents, but not the differences in exon length (Supplementary Figure S3C). Collectively, these pre-mRNA intrinsic features could contribute to splice site selection and AS determination.

DCUN1D5 Expression Levels Are Significantly Associated with Breast Cancer Metastasis, and Survival
Among the validated AS events (Figure 2), we focused on exon 4 (ENSE00003623357) of Defect in cullin neddylation 1 domain containing 5 (DCUN1D5, a.k.a., squamous cell carcinoma related oncogene 5, SCCRO5), a member of the DCUN protein family characterized by the presence of a conserved C-terminal potentiating of neddylation (PONY) domain required for neddylation, a post-translational modification interesting a complex network of protein substrates [58][59][60]. DCUN1D5 was found to be up-regulated in different cancer types, whereas its elevated expression levels were reported to increase cell proliferation, migration, and invasion [35], thus suggesting the involvement of DCUN1D5 in cancer progression. DCUN1D5 exon 4 was preferentially included in high-metastatic MDA-MB-231 cells compared to low-metastatic MCF7 ( Figure 2F). As shown in Figure 4A, skipping of DCUN1D5 exon 4 introduces in the mature mRNA a PTC that could generate an unstable transcript (that we called NMD+ DCUN1D5) degraded through the NMD pathway. Thus, the coupling of AS of DCUN1D5 exon 4 and NMD (AS-NMD) could be an important mechanism that controls the total DCUN1D5 expression levels in the cell. In line with this hypothesis, Tani and colleagues [61] found that DCUN1D5 mRNA is one of the targets of UPF1, a key NMD player [62]. In support of the AS-NMD involvement, we found that the inclusion of DCUN1D5 exon 4 was accompanied by higher DCUN1D5 total mRNA expression in MDA-MB-231 cells compared to MCF7 cells ( Figure 4B). In addition, we also found that DCUN1D5 splicing profiles parallel DCUN1D5 expression levels in a wide panel of breast cancer cell lines present in the Cancer Cell Line Encyclopedia (CCLE) database (Supplementary Figure S4). In particular, breast cancer cell lines with high metastatic potential (TNBC subtype) showed high levels of DCUN1D5 exon 4 inclusion and high DCUN1D5 expression levels compared to breast cancer cell lines with low metastatic potential (luminal subtype) (Supplementary Figure S4A,B). Based on these results, we next wondered whether the AS-NMD might be involved in regulating DCUN1D5 expression in metastatic breast cancer cells. To demonstrate the involvement of the NMD pathway in regulating DCUN1D5 expression, we treated MCF7 breast cancer cells with cycloheximide (CHX), a well-known NMD inhibitor [63]. As shown in Figure 4C,D, CHX significantly stabilized the expression of the NMD+ DCUN1D5 transcript as detected by RT-PCR and RT-qPCR ( Figure 4C,D). These results were also confirmed in two other cell lines (HEK-293 and HeLa) known to express DCUN1D5 (Supplementary Figure S5A,D). Notably, we also found a significant negative correlation (r = −0.35; p-value = 0.016) between the percentage of NMD+ transcripts and total DCUN1D5 expression in breast cancer cell lines of the CCLE database (Supplementary Figure S4C). To support the involvement of DCUN1D5 in breast cancer cell biology, we overexpressed GFP-tagged DCUN1D5 in low-metastatic MCF7 cells. As shown in Supplementary Figure S6, we confirmed that recombinant DCUN1D5-GFP localized in the nucleus (as previously reported) where it is required for its oncogenic function [58] (Supplementary Figure S6A,B). Importantly, we found that overexpression of DCUN1D5 increased the proliferation of MCF7 cells (Supplementary Figure S6C).
To further investigate DCUN1D5 expression in breast cancer, we used The Cancer Genome Atlas (TCGA-BRCA) [64] to compare non-pathological samples with primary breast tumors. As shown in Figure 5H, we confirmed DCUN1D5 up-regulation in tumor specimens compared to normal ones using UALCAN web-tool (http://ualcan. path.uab.edu; accessed on 3 March 2021) [44]. RNA-seq TCGA dataset also allows the quantification of AS events that can be retrieved by using the TCGA Spliceseq web-tool (https://bioinformatics.mdanderson.org/TCGASpliceSeq; accessed on 3 March 2021) [45]. Importantly, we found a reduction of DCUN1D5 exon 4 skipping, which sustains the production of the DCUN1D5 NMD+ transcript, in tumors compared to normal breast tissues ( Figure 5I). Box blots indicate DCUN1D5 expression as Log 2 -median centered intensity, Log 2 Intensity, or TPM as indicated. **** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05. We next wondered whether DCUN1D5 was expressed differently in metastatic breast tumors compared to non-metastatic tumors. Strikingly, as showed in Figure 6A, we found high DCUN1D5 mRNA expression levels in high-metastatic triple-negative (TNBC) and HER2 positive breast cancers compared to low-metastatic luminal tumors annotated in the TCGA-BRCA dataset. Notably, protein expression of DCUN1D5 was also up-regulated in both TNBC and HER2 positive breast tumors compared to luminal ones as determined using data from the Clinical Proteomic Tumor Consortium (CPTAC) Confirmatory/Discovery dataset retrieved by UALCAN web-tool ( Figure 6B). Furthermore, DCUN1D5 exon 4 skipping is highly reduced in highly metastatic TNBC and HER2 positive breast tumors than luminal breast tumors ( Figure 6C). Accordingly, we found high DCUN1D5 expression in breast cancer tissues with metastasis at distant sites compared to breast tumor without metastasis [Perou breast cancer dataset (GSE3521)] ( Figure 6D). Collectively, these results indicate that DCUN1D5 exon 4 skipping and as a consequence, the expression of the NMD+ DCUN1D5 transcript inversely correlates with total (mRNA and protein) DCUN1D5 levels in breast cancer specimens. Remarkable, low DCUN1D5 exon 4 skipping and high DCUN1D5 expression levels are found in highly metastatic breast tumors such as TNBC and HER2 positive breast tumors. Nevertheless, even if we could not rule out the existence of additional mechanisms regulating total DCUN1D5 expression levels, our data suggest that an AS-NMD program activated by DCUN1D5 exon 4 skipping could contribute, at least in part, to control DCUN1D5 expression in breast cancer cells. Finally, we compared 5-year overall survival rate of breast cancer patients with high and low DCUN1D5 expression levels. As shown in Figure 6E, patients who expressed high DCUN1D5 levels in breast tumors from the TCGA-BRCA dataset exhibited a significantly lower survival rate than patients with lower DCUN1D5 expression. Notably, 5-year relapse-free survival rate was also decreased in breast cancer patients with high DCUN1D5 expression retrieved from Kaplan-Maier Plotter (https://kmplot.com/analysis/; accessed on 3 March 2021) [51] (Figure 6F).
Collectively, our results strongly support a relevant role of high DCUN1D5 expression levels and reduced exon 4 skipping during metastatic progression of breast cancer.

Molecular Mechanisms Regulation DCUN1D5 Exon 4 Splicing
In order to decipher the molecular mechanism regulating DCUN1D5 splicing, we analyzed the sequence of exon 4 to identify splicing enhancers (Exonic Splicing Enhancer or ESE) that could represent the binding sites for RBPs showed in Supplementary Table S5. By using ESEFinder 3.0 web tool [43], we were able to identify a putative ESE responsive to the human SRSF1 ( Figure 7A; Supplementary Table S6), a member of the SR family of splicing regulators, which is frequently upregulated in different cancers [65]. Importantly, putative SRSF1 binding motifs were also predicted by using two additional tools such as RBPmap [40] and SpliceAid2 [41] (Figure 7A; Supplementary Table S6). To test the hypothesis that SRSF1 could regulate DCUN1D5 exon 4 splicing we have transiently transfected MCF7 cells with a plasmid overexpressing the T7-tagged SRSF1 protein or depleted SRSF1 by using a siRNA-mediated knockdown approach. As shown in Figure 7B,C, we found that SRSF1 overexpression was able to promote DCUN1D5 exon 4 inclusion in MCF7 cells, whereas its depletion increases the production of the NMD+ transcript ( Figure 7D,E). Similarly, this result was also confirmed in another cell line (HEK-293) (Supplementary Figure S7A,B). Notably, overexpression of different SR proteins, such as SRSF3, was not able to alter the DCUN1D5 exon 4 splicing profile in breast cancer cells (Supplementary Figure S7C,D). Collectively, our results indicate SRSF1 as a novel regulator of DCUN1D5 exon 4 splicing.

Discussions
Breast cancer is one of the most common cancer types among women, affecting 2.1 million women each year (https://www.who.int/cancer/prevention/diagnosis-screening/ breast-cancer/en/; accessed on 9 January 2021). Despite remarkable progress, our understanding of the molecular mechanisms involved in the development and progression of this tumor is still limited. In this study, we performed a highly specific, quantitative, and cost-effective RASL-seq of high-and low-metastatic breast cancer cells in order to identify AS programs that could characterize very aggressive breast cancer tumors. Our RASL-seq was able to detect 5530 annotated AS events. We found that 925 AS events, mainly affecting protein-coding genes, were altered in high-compared to low-metastatic cells, thus supporting a widespread role of AS in generating the proteomic diversity contributing to breast cancer progression. Moreover, our RT-PCR analysis, confirmed that RASL-seq shows a highly accurate and reproducible validation rate (90%, 18/20). Since RASL-seq provides an accurate and meaningful comparison of AS events between two groups, we are able to detect metastatic-specific AS without fruitless efforts. GO analysis of differentially spliced genes showed enrichment for biological processes including apoptosis, Wnt signaling, cell cycle, DNA replication, cell-cell adhesion, RNA processing, and chromatin modifications. These results suggest that perturbation of AS profiles of these genes could play important roles in different aspects of breast cancer progression. Importantly, whereas up-and down-regulated AS cassette exons shared common biological pathways such as cell-cell adhesion, mRNA processing, positive regulation of GTPase activity, and apoptosis, biological processes specific to each category were also found.
Global analyses based on ESTs, whole genome-seq, and RNA-seq have been shown that breast cancers are characterized by numerous aberrant AS events [7,11,12,14,23,[66][67][68][69][70]. Accordingly, our RASL-seq results identified a large number of AS events that were altered in high-metastatic breast cancer cells compared to low-metastatic breast cancer cells using the MDA-MB-231/MCF7 in vitro model for TNBC/ER positive breast tumors [71]. Notably, among AS events differentially expressed in MDA-MB-231 vs. MCF7 cells we found genes previously annotated as alternatively spliced in breast tumor tissues and between different breast cancer subtypes [27]. In particular, we found genes with key roles in breast oncogenesis and progression (for example RAC1, MAP3K, and CTNND1).
However, since MDA-MB-231 and MCF7 cells originated from different patients, we cannot exclude the possibility of individual differences in AS events. Nevertheless, we were able to confirm a high percentage (16/18) of aberrant AS events associated with high-metastatic MDA-MB-231 in another cellular model of low-and high-metastatic breast cancer cell lines (T47D vs. BT-549).
Furthermore, our analysis of annotated CCLE transcripts in 46 breast cancer cell lines clearly indicates that a large fraction of our identified AS changes are associated with the most aggressive and highly metastatic "triple-negative" breast cancer subtype.
Among differentially regulated AS cassette exons identified by our RASL-seq analysis, we found Defect in cullin neddylation 1 domain containing 5 (DCUN1D5, a.k.a., squamous cell carcinoma related oncogene 5, SCCRO5) exon 4. DCUN1D5 is a key component of the neddylation pathway, a post-translational modification involved in a number of biological processes such cell cycle progression, metabolism, immunity, and tumorigenesis [72]. Notably, it has been reported that DCUN1D5 has oncogenic activity [35,58]. We found that DCUN1D5 exon 4 was preferentially included in high metastatic breast cancer cells. We also demonstrated that an AS-NMD program, which occurs at the level of DCUN1D5 exon 4, regulates the DCUN1D5 steady-state mRNA levels. Accordingly, DCUN1D5 expression was increased in high-metastatic compared to non-metastatic tumor tissues. Importantly, the oncogenic potential of DCUN1D5 has been reported in two papers [35,58], showing that increased expression of DCUN1D5 promoted migration, invasion, and transformation. Consistent with these results, we found that DCUN1D5 showed higher expression in breast tumors than in normal breast tissues. Consistently, breast cancer patients in the high DCUN1D5 expression group demonstrated a lower 5-year overall survival rate than those in the lower expression group.
To gain insights into the molecular mechanisms underlying DCUN1D5 splicing, we focused on SRSF1 since binding motifs for this AS regulator are located within exon 4. Remarkably, we found that SRSF1 stimulated DCUN1D5 exon 4 inclusion. SRSF1 is a prototypical member of the SR family and a proto-oncogene, whose expression levels are increased in different tumor types probably as a consequence of gene amplification [65,[73][74][75]. Importantly, SRSF1 is a direct target of the proto-oncogene c-Myc and SRSF1 overexpression promotes tumor formation in mice [65,75]. SRSF1-mediated AS regulation generates protein isoforms promoting cell migration and EMT [7,52], which have pro-oncogenic capabilities or lack tumor suppressor activity [65,76,77], and influence angiogenesis [78]. Our data add DCUN1D5 exon 4 to the list of SRSF1 splicing targets. Moreover, they support the conclusion that SRSF1 activation could contribute to the malignant progression of breast cancers by stabilizing DCUN1D5 expression through the involvement of the AS-NMD pathway. Due to the relevance of DCUN1D5 expression in metastatic breast cancer, a better characterization of DCUN1D5 activity and its association with neddylation post-translation modifications will provide novel insights into breast cancer biology.
Aberrant AS has emerged as a key feature of breast cancer [11,12]. However, characterization of the functional roles for the majority of AS events associated to highly metastatic breast cancer cells is very limited. RASL-seq represents a rapid and cost-effective tool allowing the identification of AS programs associated with metastatic potential of breast cancer cells. In this regard, our work contributes to revealing AS switches associated with breast cancer metastasis that have the potential to serve as novel tools for diagnostic, prognostic, or therapeutic applications for breast cancer patients. Here, we focused on DCUN1D5 splicing, however, future studies should understand how other identified AS events impact breast cancer progression and metastasis formation.  Table S1: List of primers used for RT-PCR, RT-qPCR and cloning, Table S2: List of up-: down-and non-differentially regulated AS events of genes in high-metastatic breast cancer cells. List of genes in each category shown in Figure 1A, Table S3: List of genes shown in Figure 1C,D, Table S4: List of breast cancer cell lines and Ensembl transcripts with or without the RASL-seq validated cassette exon shown in Supplementary Figure S2, Table S5: List of RBPs that potentially interact with RNA sequence motifs, Table S6: SRSF1 binding motifs predicted with ESEFinder 3.0, RBPMap, and SpliceAid2.