1. A Surprising New Entrant into the Repertoire of Expressed Genes
A role for RNAs as regulatory molecules, rather than just as templates (mRNAs) and components of the machinery (ribosomal RNAs, transfer RNAs, spliceosomal RNAs) for the production of proteins, was established in the 1990s with the surprising discovery of RNA interference (RNAi) and related pathways, which use small RNA guides to regulate mRNA stability and translation, and to control transposons [1].
At the same time, seminal long non-protein-coding RNAs (lncRNAs), involved notably in X inactivation (Xist) [8] and in genomic imprinting (H19) [12] (later shown to play a role in cancer [13]), were also discovered.
The subsequent revelation from high-throughput cDNA sequencing (RNAseq) in the 2000s that tens of thousands of long intronic, intergenic and antisense lncRNAs are transcribed from the genomes of mammals [14] and other complex organisms [28] was also surprising. The range of cellular non-coding RNAs may be underestimated by sequencing strategies that target polyadenylated RNA, as many are derived from retrotransposons [29] or introns [24], and/or by processing of longer transcripts [31], including circular lncRNAs [33]. Some lncRNAs are extremely long [35]. Indeed, it appears that the vast majority of the genome of every organism, irrespective of the proportion that is protein-coding, is transcribed, mainly to produce non-coding RNAs [28].
2. An Unwelcome Player
However, despite the RNAi precedent, and with few exceptions [37], the existence of these uncharacterized lncRNAs was initially ignored or dismissed as “transcriptional noise”. Not only was it unclear how they might fit into the existing conceptual framework of genetic information and gene regulation, assumed to be transacted by proteins acting in combinatorial fashion, but their sheer number, if functional, threatened the primacy of this framework, which has long been an article of faith in molecular, cellular and developmental biology.
The possibility that lncRNAs might be functional also contradicted the widely held belief, dating from the late 1970s, that the intronic and ‘intergenic’ sequences from which they are transcribed, and which dominate the real estate of the mammalian genome, are largely evolutionary debris (‘junk’) [40], composed of hangovers from the prebiotic assembly of “genes” [41] expanded by accumulation of retrotransposon parasites (“selfish DNA”) [42].
The idea that most non-coding RNAs are noise from biologically inert regions of the genome was superficially bolstered by the observation that most are expressed at low levels and are generally less conserved than protein-coding sequences (although there are notable exceptions [43]), similar to ancient retrotransposon-derived sequences, which are assumed to be non-functional and evolving ‘neutrally’ [44]. This is a circular argument of dubious merit [45], since there is increasing evidence that retrotransposon-derived sequences have been exapted for various functions and co-opted as mobile modules to alter the patterns of gene expression [30]. The comparison of the rate of divergence of an extant set of ancient repeats also does not include the (unknown) number that have diverged to the point of unrecognizability; it therefore underestimates the rate of their presumed neutral evolution, and the extent of evolutionary selection on the genome [45].
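The sampling bias in the ancient-repeat comparison can be illustrated with a toy simulation. This is a minimal sketch and not an analysis from the cited work: the function name, the Gaussian spread of divergence values, and the detection limit are all hypothetical, chosen only to show that averaging over the repeats that remain recognizable yields a lower apparent divergence than the true average over all copies.

```python
import random


def simulate_repeat_divergence(n_repeats=100_000, true_mean=0.35, spread=0.15,
                               detection_limit=0.40, seed=0):
    """Toy model of survivorship bias in ancient-repeat divergence estimates.

    All parameter values are illustrative assumptions, not measured quantities.
    Returns (mean over all copies, mean over recognizable copies,
    fraction of copies still recognizable).
    """
    rng = random.Random(seed)
    # Per-copy divergence (fraction of sites changed), clipped to [0, 1].
    divergences = [min(max(rng.gauss(true_mean, spread), 0.0), 1.0)
                   for _ in range(n_repeats)]
    # Only copies below the (hypothetical) detection limit can still be
    # aligned and identified as ancient repeats.
    recognizable = [d for d in divergences if d <= detection_limit]
    mean_all = sum(divergences) / len(divergences)
    mean_seen = sum(recognizable) / len(recognizable)
    return mean_all, mean_seen, len(recognizable) / n_repeats


if __name__ == "__main__":
    mean_all, mean_seen, kept = simulate_repeat_divergence()
    print(f"mean divergence, all copies:          {mean_all:.3f}")
    print(f"mean divergence, recognizable copies: {mean_seen:.3f}")
    print(f"fraction of copies still recognizable: {kept:.2f}")
```

Under these assumptions the mean over recognizable copies is always lower than the mean over all copies, because the most diverged copies are excluded from the sample by construction. A 'neutral' rate calibrated on the recognizable set is biased downwards in the same way.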
The (lack of high) conservation argument also fails to take into account that adaptive radiation occurs mainly by the relatively rapid evolution of regulatory sequences under positive selection, that such sequences have quite different structure–function constraints to proteins, and that they are subject to rapid turnover [45]. Thus, many lncRNAs are likely to be lineage-specific.
At this point it is important to remember that the metazoan proteome is remarkably static. Both the nematode and human genomes contain ~20,000 protein-coding genes, most of which are functionally orthologous, despite orders-of-magnitude differences in developmental (and cognitive) complexity. By contrast, the proportion of the genome that is non-protein-coding, and the number and range of non-coding RNAs expressed therefrom, increase with developmental complexity [28], raising the obvious possibility that these sequences are responsible for specifying developmental complexity and phenotypic diversity.
3. Evidence of Long Non-Coding RNA Functionality
Indeed, there are many different rate classes of sequence evolution in mammals, indicating that at least 45% of the alignable regions of mammalian genomes are not evolving neutrally [53], with at least 18% of the mammalian genome conserved at the level of predicted RNA structure [54].
There are also many indices of lncRNA functionality [55], including conservation of promoters [16], regulation by canonical transcription factors [58], and chromatin signatures of active gene expression [57]. Moreover, lncRNA exons have been found to be more conserved than neutrally evolving ancestral repeat sequences, albeit at lower levels than protein-coding genes [57].
The case for lncRNA functionality is also supported by their dynamic expression patterns in differentiating cells and their highly specific spatial (including subcellular) localization [57], especially in the brain [63], which also explains their low abundance in RNAseq analyses of whole tissues [26]. Indeed, high-resolution analyses using RNA capture technologies have revealed an extraordinary diversity of lncRNAs, most of which are likely to be cell-specific, and which have yet to be catalogued or characterized [27]. Perhaps the most intriguing are the 3′UTR-derived lncRNAs that are expressed separately from, and appear to convey differentiation signals independently of, their normally associated mRNAs [66].
There have been many studies examining lncRNA biology over the past decade (too many to reference here, but see http://www.lncrnadb.org and http://www.noncode.org) linking lncRNAs with cellular processes, including the formation of specialized subnuclear organelles [72], chromatin domains [75], regulation of splicing [76], enhancer action [78], and binding to chromatin-modifying proteins such as polycomb [81], trithorax [61] and Dnmt1 [86]. Some of these functions may not be mediated by the lncRNA itself, but through mechanisms associated with their biogenesis [88]. There are also many studies linking lncRNAs with differentiation and development [92], and with diseases, including coronary artery disease and diabetes [33], schizophrenia [96] and cancer (see e.g., [43]).
Knockdown of lncRNAs by small interfering RNA (siRNA)-related methodologies frequently results in observable changes in cellular behavior or characteristics in culture [55]. On the other hand, chromosomal deletions of lncRNA sequences often show no overt phenotypic consequences. For example, only 5 of 18 lncRNA mouse knockouts resulted in lethality or growth defects [99]. However, most phenotypic screens do not examine behavioral or cognitive effects. For example, deletion of BC1, a non-coding RNA widely expressed in the brain, showed no developmental consequences [101], but later tests showed that the mutant mice, although having normal brain morphology and no obvious neurological deficits, exhibited decreased exploratory ability and increased anxiety [102].
In this context, it is worth noting that deletion of a subset of the most highly conserved sequences in the mammalian genome, ultraconserved elements (UCEs) [103], which are surely functional on the evolutionary evidence, also did not result in obvious abnormalities [105], although a later study showed subtle neurological alterations [106].
While skeptics remain, the most likely interpretation is that the documented functional examples are emblematic of an army of regulatory RNAs that guide epigenetic trajectories and specify cell state during a very complex and precise developmental ontogeny (from a single fertilized cell to a mobile, cognizant adult), and that most of the human genome is devoted to this purpose [37]. Indeed, the proportion of the mammalian genome devoted to cognitive function, rather than body plan development, may be considerably underestimated, given the preponderance of lncRNA expression in the brain [63]. Not surprisingly then, many lncRNAs are primate-specific [57].
Indeed, the growing body of evidence is now leading to a general acceptance of the relevance of (many or most) lncRNAs to cell and developmental biology [92], and increasingly to neurobiology [63], with the debate, such as it is, having shifted to the proportion of lncRNAs that may be biologically relevant. For me, the best indicator, although by no means proof, is their precise expression patterns [26], on which basis one can project that most are likely to be functional.
If so, the current protein-centric framework for understanding the genetic programming of differentiation and development is incomplete, a legacy of the mechanical worldview that held sway at the birth of molecular biology. Reconsideration of this framework to incorporate not only proteins but also structural and regulatory RNAs [109] is overdue.
4. Long Non-Coding RNA Structure–Function Relationships
The most pressing challenges now are to determine the structure–function relationships in lncRNAs and to parse their functional repertoire. This should resolve lingering questions and place lncRNAs into an integrated conceptual framework, together with small regulatory RNAs, transcription factors and signaling pathways, among others, for understanding the decisional hierarchies that control the four-dimensional ontogeny of complex multicellular organisms [119].
There is logic and experimental evidence to suggest that lncRNAs have a modular architecture, given their likely role as scaffolds and epigenetic guide molecules [26]. This is strengthened by a recent high-depth sequencing study that found, unexpectedly and in contrast to the limited information that had been previously available [57], that the internal exons of lncRNAs are almost universally alternatively spliced [27], which clearly implies modularity.
If this is correct, the establishment of the exon as the primary unit of lncRNA structure–function, combined with the observation of conservation of lncRNA structure [54] and the presence of structural orthologs around the genome [121], should provide a framework for determining which structural RNA modules associate with which effector proteins [121]. It is envisaged that such studies will lead to expanded structure–function databases [122] whereby specific protein (e.g., polycomb) binding domains in regulatory RNAs can be identified genome- and transcriptome-wide, and with them the roles of, and effector pathways for, different lncRNAs and their alternatively spliced isoforms. It may be much harder, as exemplified by snoRNAs, to determine the RNA and DNA targets of lncRNAs, and which modules impart this function.
This framework should also allow parsing of the different types and roles of lncRNAs in establishing chromatin territories, enhancer looping, guidance of epigenetic modifier proteins that impose DNA and histone modifications, and the formation of subcellular domains, among others.
In addition, while most lncRNAs are nuclear and associated with chromatin, some are cytoplasmically localized [57], with functions yet to be discovered. There is increasing evidence that RNAs are involved in the nucleation of liquid crystal domains in conjunction with disordered RNA-binding proteins [125], potentially an entirely new dimension of cell biology beyond that of the well-characterized membrane-bound organelles. High-resolution imaging will be required, along with high-resolution RNA sequencing and oligonucleotide or antibody capture, to dissect the components of the structures where lncRNAs are localized.
5. From Hard- to Soft-Wiring
A new and rapidly emerging frontier is the role of RNA editing [126] and RNA modification [128] in modulating RNA signaling pathways in response to developmental cues and environmental signals, which may lie at the heart of the epigenetic plasticity seen in physiological adaptation, complex diseases such as cancer and diabetes, and brain function [130].
There is still much to do to understand the role of small RNAs, especially the ti/spliRNAs that are derived from transcription start sites and exonic borders [131], and fragments of tRNAs [133] and snoRNAs [136], some of which may function as miRNAs [137], as well as to decipher their evolutionary links and the regulatory networks in which they participate.
Finally, and most intriguingly, there is the role of RNA in intercellular and transgenerational inheritance (soft-wired inheritance), for which there is not only evolutionary logic [140] but also increasing evidence [143]. The emerging picture is not (simply) of RNA as a transient intermediate between ‘gene’ and protein, but rather as the central computational engine of cell biology, differentiation and development, brain function and perhaps even evolution itself. Many textbooks may have to be rewritten once the full dimensions of regulatory RNA biology are revealed.