With technological improvements and the application of integrated methodologies, significant progress has been achieved in uncovering new lncRNA molecules. Some of these practical strategies can be further applied to achieve new insights into lncRNA functions.
3.1. Application of Chromatin Signatures to Determine LncRNAs from Intergenic Regions
Several individual studies have applied a systematic and integrative strategy with multiple biological features to identify lncRNAs, mainly in intergenic regions (lincRNAs), first in mouse [
27] and then in zebrafish [
46] and human [
28] genomes. Unlike earlier attempts, all three studies used a new feature, the “H3K4me3-H3K36me3” chromatin signature: the histone 3 Lys 4 trimethylation (H3K4me3) mark identifies lncRNA promoters, and the histone 3 Lys 36 trimethylation (H3K36me3) mark then identifies the actively transcribed lncRNA regions. By differentiating the “H3K4me3-H3K36me3” chromatin signatures of lncRNAs from those of known coding genes/microRNAs/endogenous siRNAs, these analyses reliably identified lncRNA-expressed genomic sequences, largely in intergenic regions (
Figure 1b). In addition, other stringent criteria have also been taken into account for lncRNA characterization, including the identification of poly(A) sites, transcription initiation signals, expression patterns among tissues and potential coding capacity. Loss-of-function and gain-of-function of certain conserved lncRNAs demonstrated crucial biological roles of lncRNAs in zebrafish [
46], indicating functional conservation despite limited sequence conservation. More importantly, some lincRNAs have been shown to play important roles in multiple layers of biological processing, including epigenetic regulation and pluripotency maintenance (reviewed by Guttman [
14], Rinn [
13] and their colleagues).
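The signature logic described in this section can be illustrated with a minimal sketch. It assumes that H3K4me3 peaks, H3K36me3 domains and annotated genes are already available as (start, end) intervals on a single chromosome; the function names, the `max_gap` threshold and the example coordinates are hypothetical and are not taken from the cited studies, which applied additional criteria (poly(A) sites, coding potential, expression patterns).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def k4_k36_candidates(
    k4_peaks: List[Interval],
    k36_domains: List[Interval],
    annotated: List[Interval],
    max_gap: int = 1000,  # assumed tolerance between promoter mark and gene body mark
) -> List[Interval]:
    """Pair each H3K4me3 peak with a nearby H3K36me3 domain and keep
    the merged unit only if it does not overlap an annotated gene."""
    candidates = set()
    for peak in k4_peaks:
        for dom in k36_domains:
            # Distance between the two intervals (0 if they touch or overlap).
            gap = max(dom[0] - peak[1], peak[0] - dom[1], 0)
            if gap > max_gap:
                continue
            unit = (min(peak[0], dom[0]), max(peak[1], dom[1]))
            if not any(overlaps(unit, gene) for gene in annotated):
                candidates.add(unit)
    return sorted(candidates)


# Hypothetical coordinates: the first K4-K36 unit is intergenic,
# the second lies inside an annotated gene and is discarded.
k4 = [(1000, 1500), (50000, 50600)]
k36 = [(1600, 9000), (51000, 60000)]
genes = [(49000, 61000)]
result = k4_k36_candidates(k4, k36, genes)
```

In this toy input only the first unit survives the annotation filter. A real analysis would also need strand information and per-chromosome indexing, which are omitted here for brevity.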
3.2. Development of a Non-Polyadenylated RNA Enrichment Strategy to Uncover LncRNAs from Introns
Most RNA polymerase II transcripts, including mRNAs and lncRNAs, are polyadenylated (poly(A)+) at their 3’ ends. The application of transcriptome analysis of poly(A)+ RNA by high-throughput deep sequencing (mRNA-seq) has revealed a digital map of poly(A)+ transcripts from both known and previously un-annotated genes [
67]. However, the transcriptome comprises more than poly(A)+ transcripts: there are a large number of non-polyadenylated transcripts (poly(A)− transcripts), including ribosomal RNAs (rRNAs) generated by RNA polymerases I and III, other small RNAs generated by RNA polymerase III, replication-dependent histone mRNAs [
68] and some lncRNAs [
24,
69] transcribed by RNA polymerase II. Depletion of ribosomal RNAs (RiboMinus) from total RNA leaves both poly(A)+ and poly(A)− transcripts available for deep sequencing analysis. This has led to the discovery of many new poly(A)− transcripts when compared with poly(A)+ RNA deep sequencing [
70,
71]. However, rRNA-depletion methods cannot physically separate poly(A)− transcripts from poly(A)+ RNAs, so it is difficult to directly annotate poly(A)− transcripts using only the rRNA-depletion method. Recently, a combination of both rRNA and poly(A)+ RNA removal was applied to obtain a largely pure population of poly(A)− RNAs for high-throughput deep sequencing [
34]. This type of poly(A)− RNA-seq of human cell transcriptomes surprisingly revealed many previously un-annotated RNA transcripts, including a new family of lncRNAs from introns in humans [
35] (
Figure 1b). In addition, with the same separation strategy for poly(A)− transcripts followed by deep sequencing analyses, additional poly(A)− lncRNAs from intronic regions were also found in various human cell lines [
38]. Interestingly, RNA fractionation from nuclear homogenates also indicated the presence of stable intronic sequence RNAs in
X. tropicalis [
72]. As most lncRNAs are tissue/cell-specific and species-specific, further application of poly(A)− RNA-seq for different tissues and species may result in the identification of additional intron-derived lncRNAs.
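Once separate poly(A)+ and poly(A)− libraries have been sequenced, transcripts can be assigned to one fraction or the other by comparing their normalized abundances. The following is only a sketch of that comparison: the pseudocount, the two-fold threshold and the label "ambiguous" are assumptions made for illustration and do not reproduce the criteria used in the cited studies.

```python
import math


def classify_polyadenylation(
    minus_abundance: float,
    plus_abundance: float,
    pseudocount: float = 1.0,    # assumed; avoids division by zero
    min_log2_fold: float = 1.0,  # assumed; 1.0 means at least two-fold enrichment
) -> str:
    """Label a transcript by its relative enrichment in the poly(A)-
    versus the poly(A)+ library (inputs are normalized abundances)."""
    log2_ratio = math.log2(
        (minus_abundance + pseudocount) / (plus_abundance + pseudocount)
    )
    if log2_ratio >= min_log2_fold:
        return "poly(A)-"
    if log2_ratio <= -min_log2_fold:
        return "poly(A)+"
    return "ambiguous"


# Hypothetical abundances for three transcripts.
labels = [
    classify_polyadenylation(90.0, 4.0),   # strongly enriched in poly(A)-
    classify_polyadenylation(3.0, 120.0),  # strongly enriched in poly(A)+
    classify_polyadenylation(50.0, 40.0),  # present in both fractions
]
```

Transcripts detected at similar levels in both fractions are deliberately left unassigned here, since a simple ratio cannot distinguish incomplete separation from genuinely mixed populations.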
What mechanism(s) can generate RNA transcripts without canonical poly(A) tails at their 3' ends? For most of the replication-dependent histone pre-mRNAs, evolutionarily conserved stem-loop structures in their 3’ UTRs direct U7 snRNA-mediated 3’ end formation to stabilize mature mRNAs and confer cell cycle dependent regulation of their accumulation [
67]. For
MALAT1 and
Menε/β lncRNAs, their 3' end maturation depends on RNase P cleavage [
24,
69], and the resulting 3' ends are stabilized by highly conserved A- and U-rich motifs that form a triple-helical structure [
73,
74]. For telomerase RNA in
S. pombe, incomplete rather than complete splicing generates a functional
TER1 transcript [
75]. However, none of the above mechanisms appears to explain the biogenesis of lncRNAs from introns, as introns are generally rapidly degraded after splicing. Yin
et al. recently demonstrated that intron-derived
sno-lncRNAs depend on the snoRNA machinery at both ends for their processing and on snoRNP complexes at both ends to protect intronic sequences from exonucleolytic trimming [
35]. Genome-wide analysis of poly(A)− RNAs from introns has revealed a large number of lncRNAs from intron regions [
34,
38]; however, only some are capped with snoRNAs. The biogenesis of others needs to be further addressed. Finally, in addition to poly(A)− RNA-seq, the development of more specific experimental and computational approaches will help to understand other poly(A)− lncRNAs matured by RNase P cleavage or incomplete splicing.
3.3. Determination of Co-Factors to Study LncRNA Biogenesis and Function
It is now clear that lncRNAs play important roles in a variety of biological processes [
13,
14,
63]. So far, only a handful of mechanisms have been identified to explain how lncRNAs function
in vivo. Accumulating evidence suggests that lncRNAs often function by recruiting and assembling other co-factors, which are usually proteins but possibly other RNAs [
51,
76,
77] or DNAs [
78]. Clearly, identifying these co-factors is of key importance for understanding lncRNA function.
The lncRNA
Xist is capable of recruiting Polycomb Repressive Complex 2 (PRC2) to remodel chromatin modifications [
79], resulting in transcriptional inactivation of one X chromosome. Similarly,
Air and
Kcnq1ot1 lncRNAs achieve transcriptional silencing by recruiting chromatin-remodeling complexes during genomic imprinting [
80,
81]. Indeed, many lncRNAs have been identified to bind with PRC2 or other chromatin-modifying complexes for transcriptional repression [
32,
82]. In addition, lncRNAs can also activate gene transcription by binding specific protein factors. For instance,
Evf-2 binds the Dlx-2 protein, which in turn increases the activity of the Dlx-5/6 enhancer [
83]. Interestingly, one specific lncRNA might play complementary roles in gene expression regulation by selectively recruiting either Polycomb group proteins (PcG) for repression [
84] or Trithorax group proteins (TrxG) for activation [
85].
In addition, lncRNAs can act as molecular scaffolds. For example, telomerase RNA component (TERC) acts as a flexible scaffold for bridging protein subunits together to promote telomerase activity [
86].
NEAT1 lncRNA is crucial for the integrity of paraspeckles [
21,
22,
23,
24], and a recent study revealed that
NEAT1 can initiate the
de novo formation of paraspeckles [
87].
Moreover, lncRNAs can also function as molecular sponges or decoys to affect gene regulation mediated by protein cofactors. For example,
Gas5 lncRNA binds the glucocorticoid receptor (GR) and competes with glucocorticoid response DNA elements for GR binding, resulting in functional repression of GR [
88]. PWS region
sno-lncRNAs trap Fox family members to alter local Fox protein concentration and, subsequently, modulate Fox-regulated alternative splicing events [
35]. Meanwhile, lncRNAs can also act as competing endogenous decoys, using their microRNA response elements (MREs) to titrate the availability of miRNAs for other RNA molecules [
30,
51,
76,
77]. Finally, promoter-associated lncRNAs can interact directly with enhancer DNA elements, forming DNA:RNA triplexes to carry out their regulatory function [
78].
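To make the notion of a microRNA response element concrete, the sketch below locates candidate MREs in a transcript as perfect matches to the reverse complement of a miRNA seed. It assumes the commonly used definition of the seed as nucleotides 2 to 8 of the miRNA; the sequences and function names are invented for illustration, and real MRE prediction takes many more features into account.

```python
from typing import List

# RNA complement table (A<->U, C<->G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the target-site sequence (5'->3') that pairs with the
    miRNA seed, assumed here to be nucleotides 2-8 of the miRNA."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def find_mres(transcript: str, mirna: str) -> List[int]:
    """Return 0-based start positions of perfect seed matches,
    including overlapping ones."""
    site = seed_site(mirna)
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        start = transcript.find(site, start + 1)
    return positions


# Toy sequences (hypothetical), written 5'->3' in RNA alphabet.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
transcript = "AAACUACCUCGGGCUACCUCUU"
sites = find_mres(transcript, mirna)
```

A transcript carrying several such sites for the same miRNA is, under this simplified view, a stronger candidate for acting as a decoy than one carrying a single site.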
Taken together, these studies suggest that the functional specificity of a given lncRNA is largely dependent on the association with its co-factors, mainly protein partners. Hence, it is important to find associated protein co-factors in order to fully understand the functional roles of lncRNAs. While the potential binding capacity can be predicted by computationally searching for consensus RNA sequences/motifs, direct lncRNA-protein interactomes can also be retrieved from cross-linking immuno-precipitation coupled with high-throughput sequencing (CLIP-seq) (
Figure 1b), or using labeled lncRNAs as baits to pull down protein partners.
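The computational search for consensus motifs mentioned above can be sketched as a scan for a short consensus, allowing IUPAC ambiguity codes and overlapping matches. The hexamer UGCAUG is used here only as an assumed example of a consensus site; the input sequence and the function names are hypothetical, and a motif match alone does not demonstrate protein binding.

```python
import re
from typing import List, Pattern

# Subset of IUPAC nucleotide codes, expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "W": "[AU]", "S": "[CG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus: str) -> Pattern[str]:
    """Compile a consensus into a lookahead regex so that overlapping
    occurrences are all reported."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def motif_positions(rna: str, consensus: str) -> List[int]:
    """Return 0-based start positions of every consensus match."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(rna.upper())]


# Hypothetical RNA fragment with three copies of the assumed hexamer,
# two of which overlap.
rna = "GGUGCAUGAAUGCAUGCAUGCC"
hits = motif_positions(rna, "UGCAUG")
```

Counting hits per kilobase and comparing against shuffled sequences would be one way to judge whether a lncRNA is enriched for a given motif; that step is not shown.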
How do lncRNAs bind to their protein co-factors? Several mechanisms are known.
Xist contains at least two distinct domains. One is the RepC domain, which is bound by YY1 and hnRNP U for its localization; the other is the RepA domain, which recruits PRC2 for in
cis regulation of gene expression [
89,
90]. Unlike
Xist, the PWS region
sno-lncRNAs contain multiple consensus hexamer motifs for Fox family splicing regulators [
91], which sequester Fox proteins and thereby alter the patterns of Fox-regulated alternative splicing [
35]. Interestingly, lncRNAs with low evolutionary conservation have been found to associate with the same proteins. For example, human
NEAT1 and mouse
Men ε/β share low primary sequence similarity, but both are associated with DBHS proteins [
21,
22,
23,
24]. This suggests that RNA structural features may sometimes play important roles in determining protein partners. Indeed, the recent application of genome-wide structural analysis that determines ncRNA secondary structure has begun to decipher the functional elements of the yeast transcriptome [
92]. Similar studies in higher eukaryotes will help to reveal the structures of lncRNAs and their biological significance, possibly including how they engage their protein co-factors.
Figure 1.
Schematic diagram of long noncoding RNA discovery and function analysis using genome-wide methods. (a) Genomic locations for long noncoding RNA (lncRNA) transcription. Boxes represent annotated genes and exons. Arrows indicate the direction of transcription. (b) Methodology for lncRNA discovery and functional association with proteins. The H3K4me3 signature defines transcription initiation. The H3K36me3 signature defines transcription elongation. Signals of poly(A)+ RNA-seq indicate polyadenylated RNAs (including most annotated mRNAs and lncRNAs). Signals of poly(A)− RNA-seq indicate non-polyadenylated RNAs, including recently identified intronic transcripts. Signals of CLIP-seq/RIP-seq reveal the association of RNA transcripts with RNA binding proteins.