3.1. Framework for Studying Alternative Splicing Outcomes in lncRNA Knockdown RNA-Sequencing Experiments
In this study, we investigated alternative splicing events, the functions of alternatively spliced gene sets, and RBP-lncRNA binding patterns across 39 lncRNA knockdowns in HeLa (cervical cancer), K562 (myeloid leukemia), and U87 (glioblastoma) cell lines. An overview of the three-step analysis is illustrated in Figure 1 (see Materials and Methods). First, we collected and processed the raw RNA-Seq data for multiple lncRNA knockdown samples in human cancer cell lines and identified splicing events using the replicate multivariate analysis of transcript splicing (rMATS) [22] pipeline. Second, alternative splicing summaries were analyzed, and functional enrichment analysis of the affected gene sets was performed. Third, we analyzed the binding profiles of RNA-binding proteins (RBPs) with respect to the skipped exon (SE) and retained intron (RI) events across the 39 lncRNA knockdowns and found that the RBP binding profiles around SE and RI events were also unique to each cell line. Additionally, we observed a substantial RBP binding preference for the knocked-down lncRNAs and propose that certain lncRNAs exhibit sponge-like RBP binding activity.
3.2. Skipped Exon Events Followed by Retained Intron Events Are the Most Prominent Alternative Splicing Events Occurring Due to the Loss of lncRNAs across Multiple Human Cell Lines
We collected RNA sequencing data from a dCas9-based lncRNA knockdown study [4], which contained 39 lncRNA knockdowns in U87, HeLa, and K562 cell lines (see Materials and Methods). Reads from the lncRNA knockdown replicates and control samples were aligned to the human reference genome (hg38) using HISAT2 [20]. The overall alignment percentages are reported in Supplementary Table S1 and show that a high fraction of reads aligned to the reference genome (average alignment rate ≥ 91%). Alternative splicing (AS) events were identified by running the rMATS pipeline on the three cell lines. A filter of FDR < 0.1 and p-value < 0.05 was applied to extract the most likely alternative splicing events (see Materials and Methods). A total of 26,167 high-confidence AS events were extracted from the rMATS output across the three cell lines. Skipped exon (SE) events were the most common, constituting 44.4% of the total events, followed by retained intron (RI) events at 22.5%. Sample-wise distributions of splice events are provided in Supplementary Table S2. The rMATS output was parsed and summarized into a total of 17,525 unique, statistically significant splice events (11,630 SE and 5895 RI); their AS event frequencies across the 39 lncRNA knockdowns in the three cell lines are available in Supplementary Table S3.
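The significance filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes a tab-separated rMATS-style table with `PValue` and `FDR` columns, and the rows, gene symbols, and the `filter_events` helper are hypothetical.

```python
import csv
import io

# Toy stand-in for an rMATS-style output table (tab-separated).
# The rows and gene symbols are invented for illustration only.
TOY_RMATS = (
    "GeneID\tgeneSymbol\tPValue\tFDR\n"
    "G1\tGENE_A\t0.001\t0.02\n"
    "G2\tGENE_B\t0.04\t0.30\n"
    "G3\tGENE_C\t0.20\t0.09\n"
    "G4\tGENE_D\t0.03\t0.08\n"
)


def filter_events(handle, max_fdr=0.1, max_pvalue=0.05):
    """Keep only rows that pass both the FDR and the p-value cutoffs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if float(row["FDR"]) < max_fdr and float(row["PValue"]) < max_pvalue
    ]


kept = filter_events(io.StringIO(TOY_RMATS))
kept_symbols = [row["geneSymbol"] for row in kept]
print(kept_symbols)
```

Both cutoffs must hold at the same time, so an event with a small p-value but a large FDR (GENE_B in the toy table) is discarded.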
3.3. Hierarchically Clustered Heatmaps of AS Event Frequency in lncRNA Knockdowns Reveal Cell-Type-Specific Alternative Splicing Signatures
Annotated splicing summaries were compiled into matrices of the frequency of alternative splicing events occurring in lncRNA knockdowns across the three cell lines. To visualize the AS events in these matrices, heatmaps were generated for SE and RI events. The SE and RI heatmaps became legible once missing data were filtered out. Across RI and SE events, we observed that the alternatively spliced gene clusters were predominantly unique to each cell line (Supplementary Figures S4 and S5). Consequently, even when the same lncRNA was knocked down in two cell lines, the genes that were alternatively spliced were cell-type specific. SE events showed the most pronounced cell-type specificity compared with other AS events. We also observed that the cellular and phenotypic functions affected by genes alternatively spliced through SE and RI events were specific to each cell line.
Within the lncRNA knockdown dataset described in the Materials and Methods, we found only 12 lncRNAs to be common across the three cell lines. Among all the AS events in the three cell lines, no single gene was affected in the same way by a given knockdown. The alternatively spliced genes resulting from the 12 lncRNA knockdowns that were present in at least two cell types were cell-type specific. Figure 2 illustrates how the frequency of SE events in specific genes depended on the cell line in which a lncRNA was knocked down. The genes affected by RI events were also altered in a cell-type-specific manner; however, the signal intensity was lower for RI events than for SE events.
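As a rough illustration of how an AS event frequency matrix of this kind could be assembled before plotting a heatmap, the sketch below counts events per gene for each knockdown and cell line pair. The record layout, the lncRNA and gene names, and the `frequency_matrix` helper are assumptions made for this example and are not taken from the study.

```python
from collections import Counter

# Hypothetical (knockdown, cell_line, gene) records, one per significant SE event.
events = [
    ("lncA", "HeLa", "GENE_A"),
    ("lncA", "HeLa", "GENE_A"),
    ("lncA", "K562", "GENE_B"),
    ("lncB", "U87", "GENE_A"),
    ("lncB", "U87", "GENE_C"),
]


def frequency_matrix(records):
    """Return (genes, samples, matrix) with matrix[i][j] = event count
    for gene i in sample j, where a sample is a (knockdown, cell_line) pair."""
    counts = Counter((gene, (kd, cell)) for kd, cell, gene in records)
    genes = sorted({gene for gene, _ in counts})
    samples = sorted({sample for _, sample in counts})
    matrix = [[counts[(gene, sample)] for sample in samples] for gene in genes]
    return genes, samples, matrix


genes, samples, matrix = frequency_matrix(events)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Rows of such a matrix correspond to genes and columns to knockdown samples, so hierarchical clustering of the rows would group genes with similar event frequencies across samples.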
In the AS event frequency heatmap, the SE events formed well-defined clusters that grouped in a distinct cell-type-specific manner (see Supplementary Figure S5). Genes annotated with SE events were rarely shared between cell lines. Out of the total 759 alternatively spliced genes, only 7.3% (37 genes) were alternatively spliced across at least two cell lines. When enrichment analysis was performed on the few SE-affected genes shared across cell lines, there was too little information for a statistically significant functional annotation to emerge. The number of SE-affected genes shared across cell lines is shown in the Venn diagram in Figure 2. Similarly, a heatmap of RI event frequencies was generated. As with the SE events, the gene sets with RI events were mostly cell-type specific (see Supplementary Figure S5).
The gene sets affected by RI events were cell-type specific, but these gene sets overlapped slightly more among the cell lines, relative to SE distributions. Given the small number of genes alternatively spliced and RI events shared (maximum of 30 shared events), the scope for significant functional annotation was low. The 162 genes affected by RI events in the K562 cell line had a distinct signal from the other cell lines, further emphasizing their cell-type specificity.
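The overlap counts behind a Venn diagram of affected genes can be computed with plain set operations. The following is a minimal sketch with invented gene sets; the `shared_gene_summary` helper and all gene names are hypothetical.

```python
from itertools import combinations

# Hypothetical sets of alternatively spliced genes per cell line.
genes_by_cell_line = {
    "HeLa": {"GENE_A", "GENE_B", "GENE_C"},
    "K562": {"GENE_C", "GENE_D"},
    "U87": {"GENE_A", "GENE_E", "GENE_F"},
}


def shared_gene_summary(gene_sets):
    """Return all genes, the genes found in at least two cell lines,
    and the fraction of all genes that are shared."""
    all_genes = set().union(*gene_sets.values())
    shared = set()
    for first, second in combinations(gene_sets, 2):
        shared |= gene_sets[first] & gene_sets[second]
    return all_genes, shared, len(shared) / len(all_genes)


all_genes, shared, fraction = shared_gene_summary(genes_by_cell_line)
print(sorted(shared), round(fraction, 3))
```

A small shared set relative to the union, as in this toy example, is the pattern that the text describes as cell-type specificity.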
3.5. Analysis of lncRNA AS Events Proximal to RBP Binding Sites Reveals Cell-Type-Specific Interactions and Supports a lncRNA–RBP Sponge Model
To understand the role of lncRNAs in alternative splicing, we extracted lncRNA interactions with RBPs. RBP binding profiles for 22 lncRNAs were obtained from documented eCLIP experiments in the ENCODE database. Supplementary Material 11 highlights the binding profiles that overlapped with proximal (±500 bp) alternative splice site locations, revealing 4,261,897 RBP binding locations for 148 RBPs on 22 lncRNAs. The significance of the relative frequencies of bound and unbound RBP sites was assessed by applying Fisher’s exact test to each lncRNA’s RBP binding preferences. Figure 4 shows the intensity of each RBP’s interactions with 11 lncRNAs in the K562 cell line (Figure 4A) and 14 lncRNAs in the HeLa cell line (Figure 4B). Additionally, we observed lncRNAs that acted as RBP-sequestering sponges, based on their extensive interactions with RBPs; this model is illustrated in Figure 4C. Figure 4D demonstrates how sponging lncRNAs such as LINC00909 interact with a wide variety of RBPs.
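The proximity criterion (±500 bp around an alternative splice site) amounts to an interval overlap check. The sketch below shows one way to express it, assuming half-open genomic coordinates; the RBP names, coordinates, and helper functions are hypothetical and do not reproduce the study's actual overlap procedure.

```python
def overlaps_window(site_start, site_end, splice_pos, flank=500):
    """True if the binding site [site_start, site_end) intersects the
    window of +/- flank bases around splice_pos."""
    window_start = splice_pos - flank
    window_end = splice_pos + flank
    return site_start < window_end and site_end > window_start


def proximal_sites(binding_sites, splice_sites, flank=500):
    """Keep binding sites lying within flank bases of any splice site
    on the same chromosome."""
    hits = []
    for rbp, chrom, start, end in binding_sites:
        if any(
            chrom == splice_chrom and overlaps_window(start, end, pos, flank)
            for splice_chrom, pos in splice_sites
        ):
            hits.append((rbp, chrom, start, end))
    return hits


# Hypothetical eCLIP-style peaks (RBP, chromosome, start, end) and splice sites.
binding_sites = [
    ("RBP_X", "chr1", 1000, 1050),
    ("RBP_Y", "chr1", 5000, 5040),
    ("RBP_X", "chr2", 1200, 1260),
]
splice_sites = [("chr1", 1400), ("chr2", 9000)]

hits = proximal_sites(binding_sites, splice_sites)
print(hits)
```

This brute-force comparison is quadratic in the number of sites; for millions of binding locations, a sorted-interval or interval-tree approach would be the more practical choice.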
The clustering analysis based on lncRNA–RBP binding profiles across both cell lines was not very informative. However, it revealed that certain lncRNAs (e.g., LINC00909, LINC00263, and LINC00910) had many binding events across many RBPs. We therefore propose that the downstream alternative splicing caused by the loss of a lncRNA is induced by the absence of an RBP binding sponge. As highlighted in the model shown in Figure 4C, in cancer cells a lncRNA might bind to many RBPs, where its expression level could facilitate extensive RBP interactions. However, upon lncRNA knockdown or loss of lncRNA function, an abundance of RBPs would be available to interact with pre-mRNA targets (Figure 4C), thereby inducing alternative splicing.
As highlighted in Figure 4A, most lncRNAs in the K562 cell line bound generally to RBPs such as KHSRP, CSTF2T, YBX3, ZNF622, SAFB2, SRSF1, and QKI. The lncRNAs LINC00910, LINC00680, RP11-392P7.6, and LINC00909 showed a very high number of RBP interactions in the K562 cell line (see Supplementary Figure S4 and Figure 4D) and exemplify the proposed RBP sponge binding model. Other interesting patterns of lncRNA–RBP binding included distinct RBP binding preferences for lncRNAs from the same family, namely XLOC_042889 and XLOC_038702.
Although only 10 RBPs bound proximal to AS events in the HeLa cell line, the RBP interactions of lncRNA LINC00909 further reinforced our proposed lncRNA–RBP sponge model. As illustrated in Figure 4B, the RBPs ELAVL1 and HNRNPU had many null values across lncRNA knockdown samples, and one RBP (HNRNPC) had many significant binding associations across all lncRNA knockdowns. Fisher’s exact test requires both bound and unbound frequencies of the genes, as shown in the contingency table (Table 1); thus, any RBP with no reported unbound frequencies will yield a null result. Both ELAVL1 and HNRNPU were manually checked across all lncRNAs in the HeLa cell line, and they had values only for the bound category. Thus, the binding of ELAVL1 and HNRNPU showed no discernible preference, and these RBPs were not considered contributors to the RNA-binding protein sponge model; however, their binding specificity could be revealed as more interaction data are collected across other lncRNAs.
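To make the role of the unbound counts concrete, the sketch below implements a two-sided Fisher’s exact test for a 2×2 table using only the standard library. The table layout is an assumption for illustration (rows: the lncRNA of interest versus all other lncRNAs; columns: bound versus unbound) and may differ from Table 1; the counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    total = sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


# Assumed layout: rows = (target lncRNA, other lncRNAs), columns = (bound, unbound).
p_example = fisher_exact_two_sided(3, 1, 1, 3)

# An RBP with no unbound counts at all: only one table is possible with
# these margins, so the test cannot distinguish any binding preference.
p_degenerate = fisher_exact_two_sided(5, 0, 7, 0)
print(round(p_example, 4), p_degenerate)
```

When the unbound column is empty, this sketch returns a p-value of 1, which is uninformative; that is one way to read the null entries reported for RBPs observed only in the bound state.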