Regulatory Roles for Long ncRNA and mRNA

Recent advances in high-throughput sequencing technology have revealed transcription of a much larger portion of the genome than previously anticipated. Especially in the context of cancer, it has become clear that aberrant transcription of both protein-coding and long non-coding RNAs (lncRNAs) is a frequent event. The current dogma of RNA function holds that mRNA is responsible for the synthesis of proteins, whereas non-coding RNA can have regulatory or epigenetic functions. However, this distinction between the protein-coding and regulatory abilities of transcripts may not be that strict. Here, we review the increasing body of evidence for the existence of multifunctional RNAs that have both protein-coding and trans-regulatory roles. Moreover, we demonstrate that coding transcripts bind to components of the Polycomb Repressive Complex 2 (PRC2) with affinities similar to those of non-coding transcripts, revealing potential epigenetic regulation by mRNAs. We hypothesize that studies on the regulatory ability of disease-associated mRNAs will form an important new field of research.


Introduction
RNA molecules are best known for their ability to convey genetic information encoded in the DNA into the synthesis of specific proteins. This messenger function makes RNA an essential player in today's DNA/RNA/protein world. It is commonly believed that our current DNA/RNA/protein world was preceded by a so-called RNA-world, a term first used by Gilbert in 1986 [1]. This world was based primarily on RNA molecules, which stored genetic information similar to DNA, and catalyzed chemical reactions similar to enzyme proteins in today's world [2,3]. The RNA-world hypothesis implies a crucial role for RNA in the origin of life. In today's DNA-based life, too, the function of RNA molecules is not limited to being a messenger for protein synthesis. In fact, only about 1-2% of the RNA present within a human cell is protein-coding, the remainder being non-coding RNA (ncRNA). The vast majority of this ncRNA is ribosomal RNA (rRNA) and transfer RNA (tRNA), both involved in the process of translation [4], as well as mitochondrial RNA (mtRNA) transcribed from DNA present in the mitochondria. In addition, and especially thanks to recent advances in massively parallel sequencing, nearly the entire repertoire of RNA molecules has now been identified. Important work by the ENCODE Consortium on the characterization of the complete RNA profile of human cells has shown that about 62% of genomic bases are represented in RNA molecules [5]. To date, this has resulted in the annotation of 13,249 unique long non-coding RNAs (lncRNAs) versus the 20,447 known protein-coding loci (GENCODE v15), with lncRNA numbers likely to increase further in later releases of GENCODE [6]. From an ever-increasing number of functional studies it has become apparent that lncRNAs (transcripts over 200 nucleotides in size) are involved in the regulation of gene expression at many levels, ranging from changing the epigenetic state of genes to influencing mRNA stability and translation.
Also in the context of cancer, many lncRNAs have been shown to possess tumor suppressive or oncogenic properties [7][8][9][10][11][12][13][14][15][16][17]. This implies there is a much more complex role for RNA in cancer than previously anticipated. This review highlights both the differences and similarities between protein-coding and long non-coding transcripts. The roles of short RNA molecules (such as miRNAs) and their involvement in cancer are excellently reviewed elsewhere (e.g., [18][19][20][21][22]). Importantly, we summarize evidence for multifunctional roles for protein-coding transcripts. These multifunctional roles warrant a further (re-)investigation of deregulated transcripts in cancer, at the protein level and at the regulatory level.

Non-Coding versus Coding RNA
For most mRNAs, ample evidence for their protein-coding ability exists. Likewise, an ever-growing list of publications demonstrates the involvement of lncRNAs in diverse aspects of gene regulation. Despite this major difference in function, lncRNAs are in many ways very similar to mRNAs. The majority of active lncRNA genes carry the same histone modifications as protein-coding genes, and their transcripts are synthesized by the same RNA polymerase II transcriptional machinery, are 5' capped, and are often spliced with similar exon/intron lengths [23,24]. Moreover, most long non-coding transcripts are polyadenylated [25][26][27]. Some lncRNAs, however, are generated via alternative pathways: they are, for example, not polyadenylated and likely to be expressed by RNA polymerase III [25,28], or are excised during splicing [29]. Still, most known lncRNAs and their biogenesis pathways are indistinguishable from those of mRNAs. Global analyses of long non-coding transcripts did reveal a general bias towards a two-exon structure and localization in the chromatin and nucleus [30]. They are also expressed at lower levels and more frequently in a cell-type-specific manner than mRNAs [31]. Still, there is a significant overlap between coding and non-coding RNA in transcript expression levels and distribution. Only their lack of protein-coding ability and of conservation differentiates lncRNAs from mRNAs [26,32]. These are therefore the main criteria for telling the two types of transcripts apart.
Protein-coding ability: Proof of protein-coding ability can be obtained from experiments such as Western blotting using specific antibodies or via mass spectrometry. For example, in 2012, about one-third of all annotated human protein-coding genes were supported by peptide hits derived from mass spectrometry spectra submitted to PeptideAtlas [6]. This still leaves a large gap of evidence for many supposedly translated mRNAs. In contrast, finding proof of the inability of non-coding RNA to be translated into proteins is much harder. Bánfai and colleagues have shown that many annotated lncRNAs that are expressed at levels similar to mRNAs indeed lack mass spectrometry evidence, but some still did reveal peptides, indicating they may be wrongly annotated as non-coding [33].
Theoretically, each open reading frame (ORF) containing a start and stop codon can give rise to a polypeptide or protein. To discriminate protein-coding from non-coding transcripts, a minimum ORF length is generally used. For example, the FANTOM consortium, which analyzed the mouse transcriptome, defined coding RNA as having an ORF of at least 300 nucleotides (nt; i.e., 100 amino acids) [34]. Similarly, the human transcriptome was analyzed by another consortium, H-Invitational, which used a cutoff of 60 nt (20 amino acids) [35]. Unfortunately, these arbitrary cutoffs are far from ideal and have resulted in numerous incorrectly annotated RNAs, for several reasons. Firstly, ncRNAs are likely to have an ORF by chance [36]. For example, a group of well-documented lncRNAs including H19, Xist, Mirg, Gtl2, and Kcnq1ot1 all contain ORFs longer than 100 codons, while they do not code for protein [37]. Secondly, transcripts with an experimentally proven ability to encode proteins shorter than 100 amino acids will be falsely considered non-coding. Many such known short proteins are involved in critical pathways in immunity, cell signaling and metabolism [38]. In fact, about five percent of all currently annotated proteins are less than 100 amino acids in size, and all of these would be incorrectly annotated using this cutoff (Figure 1). Lowering the threshold below 100 amino acids would allow the inclusion of very small known human proteins such as sarcolipin (SLN) [39] or ribosomal protein L41 (RPL41), with protein sizes of 31 and 25 amino acids, respectively [40]. Noncanonical, yet functional ORFs down to 11 amino acids have now been reported, indicating the possible existence of a new class of mRNAs [41]. However, setting the ORF threshold at a very low number of amino acids would obviously misclassify many ncRNAs as coding RNA.
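The effect of such a length cutoff can be illustrated with a minimal sketch. The code below is not the annotation pipeline of FANTOM or H-Invitational; it simply scans the three forward reading frames for the longest ATG-to-stop ORF and applies a threshold in codons. The function names, the toy sequence, and the choice to count codons without the stop codon are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop codon excluded) of the longest ATG...stop ORF.

    Only the three forward reading frames are scanned; an ORF without an
    in-frame stop codon is ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def classify(seq: str, min_codons: int = 100) -> str:
    """Label a transcript by ORF length alone (hypothetical helper)."""
    return "coding" if longest_orf_codons(seq) >= min_codons else "non-coding"


# Hypothetical toy transcript with a 31-codon ORF, the size of sarcolipin.
toy = "ATG" + "GCT" * 30 + "TAA"
print(longest_orf_codons(toy))        # 31
print(classify(toy, min_codons=100))  # non-coding under a 100-codon cutoff
print(classify(toy, min_codons=20))   # coding under a 20-codon cutoff
```

The same sequence changes class depending on the threshold, which is the weakness of length-based annotation described above: a strict cutoff misses short proteins, while a permissive one admits chance ORFs.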
Sequence conservation: Instead of measuring the length of the ORF, one could also examine the evolutionary conservation of the ORF. If the ORF of a novel transcript shows homology with other known proteins, this indicates that the RNA could function as mRNA, while novel, non-conserved ORFs are likely to occur by chance and often do not function as protein-coding [42]. However, more recent research has revealed a frequent lack of conservation in newly identified protein-coding exons [43]. A further complicating factor is the common evolution of protein-coding genes, or copies thereof, into ncRNAs, such as pseudogenes. For example, the Xist gene evolved from a protein-coding gene and therefore still shows great overlap with mRNA features and a strong conservation [44]. Other pseudogenes have even been shown to be resurrected into protein-coding genes, further complicating the feature discrimination between mRNAs and lncRNAs [45].
Figure 1. [Beginning of caption missing] ... 20,640). In this analysis, only the largest protein size was included when multiple isoforms were listed for a single gene ID.
LncRNAs versus untranslated regions of mRNAs: Interestingly, a recent study revealed significant similarities between lncRNAs and the 3' untranslated regions (3' UTRs) of protein-coding RNAs in structural features and sequence composition [46]. Both lncRNAs and 3' UTRs obviously lack protein-coding capacity and are intron-poor. Importantly, the secondary structure predictions were also highly similar between lncRNAs and the 3' UTRs of protein-coding transcripts, most likely due to a similar (lower) GC content. Thermodynamically, too, lncRNAs were more similar to UTRs than to coding sequences [47]. Moreover, direct sequence comparisons revealed highly similar hexamer compositions in lncRNAs and 3' UTRs, which differed significantly from those of 5' UTRs or ORFs [46].
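A hexamer-composition comparison of this kind can be sketched as follows. The statistic used in [46] is not described here, so cosine similarity between hexamer frequency profiles is substituted purely as an illustrative measure; the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 6) -> dict:
    """Relative frequency of every overlapping k-mer in a sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine of the angle between two sparse frequency profiles (0 to 1)."""
    dot = sum(value * q.get(kmer, 0.0) for kmer, value in p.items())
    norm_p = sqrt(sum(value * value for value in p.values()))
    norm_q = sqrt(sum(value * value for value in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical toy sequences, not real transcripts.
at_rich_a = "ATTTATATTTAATATTTATTAAATATTTAT"
at_rich_b = "TTTATATTTAATATTTATTAAATATTTATA"
gc_rich = "GCCGCGGCCGCCGGCGCGCCGGCCGCGGCC"

print(cosine_similarity(kmer_profile(at_rich_a), kmer_profile(at_rich_b)))
print(cosine_similarity(kmer_profile(at_rich_a), kmer_profile(gc_rich)))
```

In a real analysis the profiles would be built from pooled sets of lncRNAs, 3' UTRs, 5' UTRs and ORFs rather than from single short strings, and a statistical test would be needed before calling two compositions different.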
In conclusion, although lncRNAs and mRNAs do differ in their protein-coding ability, the findings above reveal a high degree of similarity between lncRNAs and mRNAs, or parts thereof. LncRNAs have been shown to play critical regulatory roles in diverse cellular processes including chromatin remodeling, transcription, post-transcriptional processing, as well as intracellular trafficking [48][49][50]. These intriguing parallels between lncRNAs and mRNAs raise the question of whether protein-coding transcripts may be able to fulfill regulatory functions similar to those of lncRNAs.

Regulatory Functions of lncRNAs and mRNAs
LncRNAs appear to be involved in nearly all aspects of gene regulation, including X-inactivation, imprinting, epigenetic regulation, nuclear and cytoplasmic trafficking, transcription, mRNA splicing and translation [51]. Through these involvements, lncRNAs have been shown to be important players in a wide range of biological processes, such as proliferation, cell cycle, apoptosis, differentiation and maintenance of pluripotency [52]. The participation of lncRNAs in this wide range of processes can be explained by the ability of transcripts to fold into stable secondary structures, which in many cases dictate their functions [51]. Based on known examples, several functions have been proposed for lncRNAs. At the simplest level, lncRNAs can serve as decoys, preventing the access of transcription factors and other proteins to the chromatin [53,54]. In a scaffold model, lncRNAs can bring together multiple protein partners to form ribonucleoprotein complexes. Importantly, the concept of RNA as a molecular scaffold is likely to be a more common mode of action, as hundreds of lncRNAs have been found to form ribonucleoprotein interactions with multiple protein partners [15,55,56,57]. Finally, lncRNAs can function as guides for the proper localization of specific regulatory protein complexes in cis (on neighboring genes) or in trans (on distantly located genes). The protein complexes brought in by the lncRNAs can act as epigenetic repressors and activators, as well as transcription factors [58].
Knowledge on how lncRNAs search for selective sites in the genome and how they interact with chromatin or target RNAs is slowly accumulating. LncRNAs can interact with RNA molecules via the formation of complementary hybrids [8,59,60]. They can also directly bind DNA by forming stable triplex structures via base-pairing [53,61] or by displacing one of the DNA strands and forming so-called R loops [62]. Alternatively, sequence-specific DNA-binding proteins can guide lncRNAs to target regions in the genome [63]. Recently, a novel mechanism of lncRNA targeting via chromosomal looping has been described for HOTTIP lncRNA [64].
For more detailed information about the mechanisms of lncRNA action we refer to excellent reviews by others [65][66][67][68]. Their involvement in gene deregulation in cancer has also been thoroughly reviewed elsewhere [9,10,69]. However, such regulatory roles are not solely attributed to non-coding transcripts. Protein-coding transcripts have also been shown to be involved in a number of regulatory mechanisms. Of course, many examples of cis-regulatory functions of mRNAs are known, mostly residing in the non-coding regulatory elements (untranslated regions, or UTRs); these involve the regulation of stability, splicing and translation of the transcript [70][71][72]. Regulatory elements in the 5' UTR can play an important role in the control of translation initiation. Length, GC-content and secondary structures all affect translation efficiency [73,74]. Likewise, the 3' UTR can contain elements that are important in transcript cleavage, stability, translation and mRNA localization. The 3' UTR serves as a binding site for numerous regulatory proteins as well as miRNAs [75][76][77][78].

Structural Function
LncRNAs can serve as structural scaffolds involved in the formation of nuclear domains. The first described non-coding RNA with a structural role is Satellite III (SATIII) [79]. SATIII is involved in the formation of nuclear stress bodies (nSBs) when cells are subjected to thermal, hypertonic or chemical stresses [123]. These cellular stresses change the heterochromatin state of SATIII repeats on chromosome 9q11-12 to a euchromatin state. After transcription, SATIII RNA remains within the locus and recruits serine-arginine rich splicing factor SF2/ASF and several heat shock transcription factors like HSF1 and SAF-B to form nSBs [124]. SATIII was even shown to be sufficient for the formation of nSBs in the absence of a stress trigger [81]. A second lncRNA with an architectural role within the nucleus is nuclear-enriched autosomal transcript (NEAT1). NEAT1 is a 3.7 kb long unspliced, polyadenylated transcript that is localized at the edges of SC35 domains in paraspeckles, which are found in all cells in interphase [125,126]. NEAT1 was concluded to be essential for the assembly, maintenance and structural integrity of these paraspeckles [80,126,127].
Not only ncRNAs, but also mRNAs have been shown to perform architectural roles for cellular substructures. Two of these nuclear structures are the histone locus bodies (HLBs) and the associated Cajal bodies. The HLBs are known to harbor large amounts of histone pre-mRNA and histone 3'-end processing components [128], whereas the Cajal bodies contain small nuclear ribonucleoproteins (snRNPs) and are suggested to generate and recycle these proteins [129,130]. The de novo formation of both these nuclear components was shown to be induced by histone 2b (H2B) pre-mRNA [81]. In the same paper, spliced RNA Polymerase II transcripts are suggested to contribute to the morphogenesis of splicing speckles by functioning as a scaffold for pre-mRNA splicing factors. Another good example of an mRNA with a structural role is VegT, found in Xenopus laevis oocytes [131]. The VegT transcript was shown to be an integral part of the cytokeratin cytoskeleton at the vegetal cortex of the oocytes and responsible for the localization of Vg1, Bicaudal-C and Wnt11 mRNAs at this position. Depletion of VegT mRNA therefore resulted in the delocalization of these mRNAs [131]. Furthermore, the acquired disruption in the cytokeratin cytoskeleton network could be rescued by injecting exogenous VegT mRNA [82].

Transcriptional Control
A second level of lncRNA-directed regulation is by (co-)transcriptional control. Here, the recruitment of RNA polymerase II, transcription factors and/or co-factors to gene promoters is facilitated or prevented by long non-coding RNAs. The lncRNA MEG3 activates the p53 tumor suppressor gene and the growth differentiation factor 15 (GDF15) gene by enhancing p53 binding to the GDF15 gene promoter, thereby inhibiting cell proliferation [83]. While MEG3 is expressed in many normal human tissues, reduced levels of MEG3 are frequently observed in a variety of cancers and associated with hyper-proliferation [14,132,133]. Another example is the abundant lncRNA MALAT1, which is frequently upregulated in many cancers and can regulate alternative splicing by modulating the phosphorylation of serine/arginine-rich splicing factors (SRSFs) [12,134,135,136]. Depletion of MALAT1 altered the localization and activity of these splicing factors, leading to altered splicing patterns for a set of pre-mRNAs [84]. The lncRNA GAS5 contains a hairpin sequence motif, mimicking a DNA binding site of the glucocorticoid receptor, thereby serving as a decoy to release the receptor from the DNA and preventing transcription of metabolic genes [54]. In the case of the human dihydrofolate reductase (DHFR) gene, a lncRNA initiated from the upstream DHFR-minor promoter inhibits the assembly of the pre-initiation complex at the major promoter by forming a stable triple helix complex with promoter sequences, as well as through direct interactions with the general transcription factor IIB (TFIIB), resulting in the silencing of the DHFR gene [53,66].
The human Steroid Receptor RNA Activator (SRA) transcript was initially identified as a ncRNA that co-activates the Progesterone Receptor [86]. More recently, SRA RNA has been confirmed to co-activate many nuclear receptors, including estrogen (α and β), androgen, glucocorticoid, retinoic acid (α), peroxisome proliferator activated receptors (δ and γ), thyroid and vitamin D receptors [87,88,89,137], reviewed in [85]. Additionally, it was shown that SRA RNA can enhance the activity of transcription factors like MyoD and GATA3 [90,91]. It is thought that SRA ncRNA functions as a scaffold for nucleoprotein complexes with both positive regulators (e.g., receptor co-activator SRC-1, RNA helicases p68 and p72, pseudo-uridine synthases Pus1p and Pus3p [86,88,90,92,93,138]) and negative regulators (such as the SMRT/HDAC1 Associated Repressor Protein SHARP or the SRA stem-loop interacting RNA-binding protein SLIRP [89,94,139]). With the discovery of three new isoforms of SRA it was shown that these could also be translated into the protein SRAP [140]. Because these longer SRA isoforms include the same core sequence that is needed for the regulatory RNA function, this RNA was concluded to be bi-functional. Deregulated SRA RNA levels have been implicated in a variety of cancers [141][142][143][144][145][146]. Interestingly, high expression levels of the SRAP protein were shown to be a predictor for positive outcome in breast cancer [147].

Transcription Elongation
Transcriptional pausing is a well-known phenomenon, where RNA polymerase II (RNAPII) becomes trapped downstream of the transcriptional start site (TSS) and is unable to escape into productive elongation [148]. P-TEFb, the positive transcription elongation factor, plays an essential role in facilitating RNAPII escape from this paused state. When recruited to promoters, P-TEFb phosphorylates the C-terminal domain (CTD) of RNAPII, allowing the escape into productive elongation [148]. In vivo, P-TEFb is present in two states: an active form, associated with Brd4 and other factors, and an inactive ribonucleoprotein form, referred to as 7SK snRNP, containing a 331-nt non-coding RNA known as 7SK snRNA. RNase footprinting and mutagenesis experiments have indicated that 7SK contains a high degree of secondary structure, with stem-loops at both the 5' and 3' ends [96,148,149,150]. The 5' stem-loop binds P-TEFb as well as the Hexim1 protein, which acts to inhibit the kinase activity, while the 3' stem-loop binds the Larp7/PIP7S protein, which, in addition to a methylphosphate capping enzyme (Mepce), stabilizes the RNA [95,96,97,98,99,151,152]. For a long time the mechanism of P-TEFb release from the inhibitory complex was not known. However, a recent study has demonstrated the important role of HIC mRNA in P-TEFb activation [100]. The 3' UTR of HIC mRNA binds to and activates P-TEFb by displacing 7SK RNA from the inhibitory complex. Analysis of the secondary structure of the 3' terminal region of HIC mRNA revealed hairpins resembling similar structures within 7SK RNA [100]. It is speculated that other mRNAs with similar secondary structure may exert the same function and that multiple P-TEFb-containing RNPs exist [100].

miRNA Sponge
MicroRNAs, a large class of small ncRNAs, have emerged as a critical element in gene regulation by interacting with incompletely complementary sequences in target messenger RNAs [66,153,154]. They function by annealing to complementary sites on the coding sequences or 3' UTRs of target gene transcripts, where they promote the recruitment of protein complexes that impair translation and/or decrease the stability of mRNA, ultimately leading to a decreased target protein abundance [153,154]. Aberrant expression of miRNAs has been linked to many cancer types as well as other human diseases [155,156]. There is now evidence that the inverse mechanism may also take place, whereby mRNA levels can affect the distribution of miRNAs. Such RNA molecules can compete for miRNA binding, thereby acting as a miRNA sponge or decoy independent of a possible protein-coding function (reviewed in [157]). Natural miRNA sponges were first discovered in plants [158] and more recently also in virally infected primate cells [159] and in human cells [101]. The miRNA sponge/decoy function has recently been described for a number of lncRNAs. Specifically, the 3' region of the PTEN-P1 lncRNA was found to bind the same set of regulatory miRNA sequences that normally target the tumor-suppressor gene PTEN, alleviating the PTEN mRNA repression and allowing its translation into the tumor-suppressor protein PTEN [66,101]. Another interesting example is lncRNA HULC, which may act as an endogenous miRNA sponge that down-regulates a series of miRNAs, including miR-372. Inhibition of miR-372 by HULC led to reduced translational repression of its target gene, PRKACB, which in turn induced phosphorylation of transcription factor CREB [102,160].
Similarly, two mRNA transcripts were recently shown to act as miRNA sponges: the 3' UTR regions of Versican (VCAN) mRNA in hepatocellular carcinoma (HCC) and of CD44 mRNA in breast cancer cells [103,104]. The elevated levels of VCAN mRNA in HCC and HepG2 cells sequester miR-133a, miR-199a*, miR-144 and miR-431, thereby increasing the protein levels of, amongst others, CD34 and fibronectin (FN1), which have similar miRNA binding sites in their 3' UTRs [103]. Increased levels of the 3' UTR of VCAN increased proliferation, survival, migration, invasion, colony formation, and enhanced endothelial cell growth, but decreased apoptosis [103]. Similarly, CD44 mRNA is elevated in breast cancer cells and its 3' UTR harbors binding sites for miR-328, miR-512-3p, miR-491 and miR-671 [104]. Elevated CD44 (3' UTR) levels sequester these miRNAs, thereby increasing the protein levels of, amongst others, COL1α1 and fibronectin 1 (FN1), and enhancing cell motility, invasion, cell adhesion and metastasis. Figure 2 shows a schematic representation of the miRNA sponge function of mRNA molecules. Importantly, by binding these miRNAs, the UTR sequences not only regulate their own transcript level homeostasis, they may also affect other transcripts by changing the available pool of these miRNAs through their decoy function [161]. Dynamics in this mode of regulation can be obtained by changing the length of the 3' UTR. For example, rapidly proliferating cells express shortened 3' UTRs, thereby decreasing the available positions for miRNA to bind [162].
Figure 2. miRNA sponge function for mRNA. In a normal cell, a specific miRNA can target a number of mRNAs, resulting in the inhibition of translation and/or degradation of these transcripts. When the expression level of one of the mRNAs targeted by this miRNA is changed, a redistribution of the specific miRNA will cause a change in protein translation for multiple transcripts. In this schematic figure, the overexpressed yellow mRNA functions as a sponge for the red miRNA, yielding increased green and blue protein levels. In contrast, a depletion of the yellow miRNA sponge would result in a decrease in green, blue and yellow protein levels.
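The redistribution logic of the sponge model can be made concrete with a deliberately simple toy calculation. It is not taken from the cited studies. It assumes that a fixed miRNA pool spreads evenly over all binding sites, that every site has the same affinity, and that translation per transcript falls hyperbolically with the miRNA load on that transcript; the function name, the transcript labels and all numbers are hypothetical.

```python
def protein_output(mirna_total: float, copies: dict, sites_per_copy: dict) -> dict:
    """Toy protein output per transcript species under miRNA competition.

    Assumptions (illustrative only): the miRNA pool is shared evenly over
    all binding sites, and translation efficiency is 1 / (1 + load), where
    load is the amount of miRNA bound per transcript copy.
    """
    total_sites = sum(copies[name] * sites_per_copy[name] for name in copies)
    if total_sites == 0:
        # No binding sites at all: nothing is repressed.
        return {name: float(copies[name]) for name in copies}
    mirna_per_site = mirna_total / total_sites
    output = {}
    for name in copies:
        load = sites_per_copy[name] * mirna_per_site
        output[name] = copies[name] / (1.0 + load)
    return output


sites = {"yellow": 1, "green": 1, "blue": 1}
baseline = protein_output(300, {"yellow": 100, "green": 100, "blue": 100}, sites)
sponge_up = protein_output(300, {"yellow": 700, "green": 100, "blue": 100}, sites)
sponge_gone = protein_output(300, {"yellow": 0, "green": 100, "blue": 100}, sites)

print(baseline["green"], sponge_up["green"], sponge_gone["green"])
```

Under these assumptions, raising the yellow transcript dilutes the miRNA over more sites and the green and blue outputs rise, while removing it concentrates the miRNA on the remaining targets and their outputs fall, which matches the qualitative behavior described in the Figure 2 caption.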

RNA Degradation
Global transcriptome analyses have provided evidence that a large proportion of the genome can simultaneously produce transcripts from both strands, and that antisense transcripts commonly link "neighboring genes" in complex loci into chains of linked transcriptional units [163]. According to data generated by the FANTOM3 project, 4,520 pairs of full-length transcripts detected in the mouse genome were able to form sense/antisense pairs on exons. Among them, 1,687 pairs were formed between protein-coding genes, 2,478 by protein-coding/non-coding gene pairs and 355 by non-coding genes only [163]. Expression profiling revealed frequent concordant regulation of these sense/antisense pairs. One of the possible mechanisms for this transcript-mediated gene regulation is based on sense-antisense RNA duplex formation. These sense-antisense transcript pairs can be regarded as Natural Antisense Transcripts (NATs). NATs are simply RNAs containing sequences that are complementary to other endogenous RNAs [105]. These can occur in cis, as described above, but they can also be transcribed in trans from separate loci (trans-NATs). Both cis- and trans-NATs can affect gene expression at the level of transcription, maturation, transport, stability and translation [105]. Numerous examples of cis- and trans-acting lncRNAs that base-pair with mRNA molecules and affect their stability or translation have been described so far [8,59,106,164,165,166].
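The notion of sequence complementarity that defines a NAT can be sketched in a few lines. The code below is a simplified illustration rather than a method from the cited work: it reports the longest stretch of perfect Watson-Crick pairing between two RNAs and ignores G-U wobble pairs, mismatches and bulges, all of which occur in real duplexes. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA (DNA letters are converted to RNA)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def longest_duplex(sense: str, antisense: str) -> int:
    """Length of the longest perfectly base-paired stretch between two RNAs.

    A region of `antisense` pairs with `sense` when its reverse complement
    appears in `sense`, so this is a longest-common-substring search between
    `sense` and the reverse complement of `antisense`.
    """
    a = sense.upper().replace("T", "U")
    b = reverse_complement(antisense)
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


# Hypothetical toy pair: the antisense RNA fully covers the sense RNA.
sense = "AUGGCCAUUG"
antisense = reverse_complement(sense)
print(longest_duplex(sense, antisense))  # 10
```

For cis-NATs the overlap follows directly from genomic coordinates on opposite strands, so a sequence search of this kind is mainly relevant for trans-NATs transcribed from separate loci.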
A recently discovered group of trans-acting lncRNAs, termed half-STAU1-binding site RNAs (½-sbsRNAs), can activate the decay of specific target mRNAs. Staufen 1 (STAU1)-mediated messenger RNA decay (SMD) involves the degradation of translationally active mRNAs upon STAU1 binding to the 3' UTR via double-stranded RNA [60]. STAU1-binding sites are formed by imperfect base-pairing between an Alu element in the 3' UTR of an mRNA target and an Alu element in a cytoplasmic lncRNA [60]. Evidently, Alu elements are required to form the RNA duplexes between mRNA and lncRNA that are recognized by STAU1. As many mRNAs contain Alu elements in their 3' UTRs, it is highly plausible that direct mRNA-mRNA base-pairing may also be a substrate for STAU1-mediated decay. A bioinformatic analysis revealed many stretches of imperfect base-pairing between Alu sequences localized within 5' and 3' UTR regions of mRNAs, similar to the mode of action of the ½-sbsRNAs [105]. Whether such putative mRNA-mRNA pairings are functional and act via the SMD pathway will be the topic of future research.

Translational Control
LncRNAs are best known for their roles as regulators of transcription. However, recent studies have shown an important role of long non-coding RNAs in mRNA translation [8,106,108,164]. LncRNAs can modulate translation by two different mechanisms. As mentioned above, cis- and trans-acting lncRNAs are capable of pairing with mRNA molecules, forming double-stranded RNA structures and thus inhibiting mRNA translation [8,106]. Alternatively, lncRNAs can act by affecting the general translation machinery [108]. LincRNA-p21 is an example of a trans-acting lncRNA involved in translation inhibition [8,167]. The transcripts CTNNB1 and JUNB (encoding β-catenin and JunB, respectively) base-pair imperfectly with lincRNA-p21 at several places throughout the coding regions and UTRs. The resulting lincRNA-p21-mRNA complex further interacts with the translation repressors Rck and Fmrp, suggesting that lincRNA-p21 can repress the translation of target mRNAs via multiple mechanisms [8,167]. An example of a cis-acting lncRNA is the antisense RNA of the PU.1 gene [106,168]. The processed antisense RNA in the cytoplasm can bind to the sense PU.1 transcript and stall translation between the initiation and elongation steps [106,168].
Protein-coding antisense mRNA transcripts are also capable of forming RNA duplexes with sense mRNA molecules, leading to translation inhibition. Antisense BCMA RNA is transcribed from the same locus as BCMA and has typical mRNA features, e.g., polyadenylation, splicing, a Kozak consensus sequence and an open reading frame encoding an experimentally proven 115 amino acid peptide: the p12 protein [107]. Experimental data suggest that antisense BCMA inhibits the expression of BCMA protein, while it does not affect the expression level of BCMA mRNA. The inhibition of BCMA expression is obtained through the action of the antisense RNA and not of the p12 protein, although the exact mechanism is not fully understood [107].
A ncRNA that acts by affecting the general translation machinery is the Xenopus laevis transcript BC1. The BC1 transcript, which is expressed in neurons and germ cells, inhibits the assembly of the translation initiation complex [169]. The 3' region of the BC1 RNA interacts with eIF4A and PABP and disrupts the functional link between the two factors, which is necessary for efficient translation in Xenopus oocytes [108]. A near-complete restoration of translation occurs after introduction of excess eIF4A and PABP, indicating that translation repression by BC1 happens via eIF4A and PABP [108].
The ability to inhibit the general translation machinery has also been identified for several mRNAs. These transcripts mainly act through the interaction of their UTRs with the RNA-dependent protein kinase (PKR). PKR is a serine-threonine protein kinase that is activated by intermolecular autophosphorylation upon binding to RNA molecules. The 3' UTR regions of cytoskeletal muscle mRNAs can act as trans-regulators by inhibiting translation through the activation of PKR [109]. Specifically, the 3' UTRs of tropomyosin, troponin and cardiac actin mRNAs can induce muscle cell differentiation and appear to function as tumor suppressors. These RNA sequences are predicted to form secondary structures with extended duplex stretches. It was shown that the 3' UTRs of cytoskeletal mRNAs interact with the RNA-binding domain of PKR [109]. Once activated, PKR phosphorylates its substrates, including translation initiation factor eIF2α, which results in sequestration of another initiation factor, eIF2B, ultimately leading to inhibition of protein synthesis [109]. An important observation from this study is that full-length mRNA transcripts are more efficient at inhibiting translation than their 3' UTR regions alone, suggesting the entire transcript is required for proper functioning [109]. Similarly, the P23/TCTP full-length mRNA, but not a truncated version thereof, was able to bind and activate PKR, resulting in the inhibition of translation [110]. Several other protein-coding transcripts have been reported to interact with PKR through their structured UTRs: the 5' UTRs of VEGFA mRNA [111] and IFN-γ mRNA [112], and the 3' UTRs of TPM1 mRNA [113] and TNF-α mRNA [114]. In all cases PKR activation caused inhibition of translation, which can have a cis effect on the translation level of the mRNA itself as well as a more general trans effect on the translation level of other transcripts.
Another mRNA involved in translational control is that of the tumor suppressor gene p53 [115]. This gene is mutated in about half of all cancers and is therefore considered a cancer driver gene [170,171]. The p53 protein works mainly as a transcription factor that responds to cellular stresses such as DNA damage, endoplasmic reticulum (ER) stress, hypoxia and telomere erosion [172]. When p53 is induced by such cellular stress, it can trans-activate a variety of target genes that promote cell cycle arrest, senescence or apoptosis [173,174]. Another p53 target with a different function is the MDM2 gene. Its protein product is an E3 ubiquitin ligase that promotes polyubiquitination and proteasomal degradation of p53, thereby forming a negative regulatory feedback loop [175-177]. Interestingly, MDM2 is also involved in a positive regulatory feedback loop of p53. The p53 mRNA can interact with the RING domain of MDM2, which inhibits the E3 ligase activity of MDM2 and furthermore stimulates translation of the p53 mRNA [115]. At first, the interaction between the MDM2 protein and p53 mRNA was considered to control the function of MDM2 [115]. Later, it was demonstrated that phosphorylation of the Ser395 residue of MDM2 is required for the p53 mRNA-MDM2 interaction and thereby acts as the switch that determines whether MDM2 is a negative or a positive regulator [178].

Unknown Function
Recently, a regulatory lncRNA in prostate cancer was described that has a proven function but acts through an as yet unknown mechanism [116]. In this high-throughput RNA-sequencing study on clinical prostate cancer samples, a panel of 121 transcriptionally deregulated lncRNAs (Prostate Cancer-Associated Transcripts, or PCATs) was identified, representing potentially functional lncRNAs associated with prostate cancer. One of these transcripts, called PCAT-1, was selectively upregulated only in prostate cancer and was shown to function predominantly as a transcriptional repressor by facilitating trans-regulation of genes preferentially involved in mitosis and cell division, including known tumor suppressor genes such as BRCA2 [116].
Several mRNAs, and more specifically their UTRs, have also been reported to function as regulators (riboregulators) that suppress tumor formation through unknown mechanisms. Results from Rastinejad and Blau suggest that the 3' UTRs of certain differentiation-specific RNAs are trans-acting regulators in feedback loops that inhibit cell division and promote differentiation [179]. More recently, the 3' UTRs of several other transcripts were shown to reduce proliferation and induce differentiation of both myogenic cells and fibroblasts. The 3' UTR of prohibitin (PHB), an inhibitor of cell proliferation, significantly suppresses the tumorigenic properties and metastatic phenotype of transformed MCF7 cells [117]. Similarly, the 3' UTR of ribonucleotide reductase (RNR), a key rate-limiting enzyme in DNA synthesis, significantly suppresses the tumorigenic properties and metastatic phenotype of transformed fibroblast cells [118]. The 5' UTR can also fulfill such a role: the 5' UTR of the human c-myc P0 transcript suppresses the malignant phenotype of human breast cancer cells, with decreased anchorage-independent proliferation, enhanced susceptibility to programmed cell death, and complete loss of the ability to form tumors in the intact animal [119]. In all of the cases mentioned above, it is clear that the UTRs harbor trans-regulatory functions, but the exact mechanism of their action is still unknown.

Epigenetic Regulatory Potential of Protein-Coding RNA
It is well known that many lncRNAs are involved in the regulation of gene expression at the epigenetic level. Approximately 20-30% of all lncRNAs have been shown to physically interact with specific epigenetic enzymes, which control the reversible modification of histone residues and DNA methylation, thereby influencing the activity of genes [120,180]. Upon binding, the lncRNAs can guide chromatin-modifying complexes to their target regions. Such lncRNAs can guide either gene activators (e.g., the lncRNAs HOTTIP [64] or Mistral [181]) or gene repressors (e.g., HOTAIR [55], HOTAIRM1 [120], ANRIL [15,57], Kcnq1ot1 [56], Air [121], Xist [182] or pRNA [61]). LncRNAs can even function as a scaffold, bringing together multiple protein partners to form ribonucleoprotein complexes, which are subsequently guided to their genomic target locations. For example, HOTAIR can simultaneously bind to both the polycomb repressive complex 2 (PRC2) and the LSD1-CoREST complex using specific domains of the RNA molecule [55], while ANRIL and HOTAIRM1 directly interact with proteins from both the PRC1 and PRC2 complexes [15,57,120]. Similarly, the lncRNA Kcnq1ot1 interacts with both PRC2 and G9a (EHMT2) to lay down the silencing histone marks H3K27me3 and H3K9me2, respectively [56]. In recent attempts to characterize all RNA molecules that interact with the PRC2 complex, RNA immunoprecipitation experiments combined with next-generation sequencing have been conducted by us and others [29,122,183]. Thus far, these studies have mainly focused on the interactions between lncRNAs and PRC2 complex components. Zhao and colleagues focused mainly on imprinted non-coding transcripts and MEG3 in particular, which directs PRC2 to the reciprocally imprinted Dlk1 coding gene [122]. Guil et al. only describe results for non-coding intronic RNA sequences [29]. They report several intronic RNA regions capable of interacting with PRC2 components and inducing repression of the host gene in cis. One of their examples is the SMYD3 intronic RNA, which can bind to EZH2, a component of the PRC2 complex, thereby targeting this repressive complex to the SMYD3 gene. SMYD3 is a SET domain-containing H3K4 methyltransferase with oncogenic properties, which is frequently overexpressed in colorectal, breast and liver cancer [184,185]. Reducing the levels of SMYD3 by means of the SMYD3 intronic RNA resulted in reduced tumor growth, revealing that the SMYD3 intronic RNA has tumor suppressive abilities [29]. Similarly, several other intronic RNAs with stand-alone regulatory functions were recently described in mice, suggesting that this is a common type of multi-functionality within mammalian (primary) transcripts [186]. Finally, in experiments from our own laboratory, we analyzed the ability of transcripts over 200 nucleotides in size to bind SUZ12, one of the PRC2 complex components, in prostate cancer cells [183]. Both SUZ12 and EZH2 are part of the PRC2 complex, contain RNA-binding domains and have been shown to interact with RNA molecules [55,57,182,187].
To gain insight specifically into the binding of protein-coding RNA molecules to the PRC2 complex, we initially compared results for both mRNAs and lncRNAs in experiments from our own laboratory. In these experiments, we determined the SUZ12-bound RNA fraction in the human prostate cancer cell line LNCaP after formaldehyde fixation (RNA-IP) via next-generation sequencing and compared these results to input material [183]. To our surprise, protein-coding transcripts appeared to bind to the PRC2 complex with affinities similar to those of lncRNAs. In fact, a substantial portion of mRNAs (and lncRNAs) bound to PRC2 with even stronger affinities than previously reported lncRNA-PRC2 interactors, including HOTAIRM1, ANRIL and KCNQ1OT1 (Figure 3A). Independent replicates reproduced our initial findings. Next, we reanalyzed the raw data from similar experiments from the Esteller laboratory [29]. In these experiments, EZH2-RNA interactions were studied in the human colorectal cancer cell line HCT116 by UV cross-linking (iCLIP) and next-generation sequencing. We compared the levels of EZH2-bound transcripts with background levels (IgG-bound fraction) to calculate fold-enrichment values. This reanalysis confirmed the findings from our own experimental data and showed similar enrichment levels for mRNAs and lncRNAs, again with many transcripts binding more strongly than known lncRNA interactors (Figure 3B). The (re-)analysis of data from both the Esteller lab and our lab yielded very similar results, even though the two studies were conducted in different cancer cell lines, targeted different PRC2 complex components and used different experimental set-ups. Finally, we included results from the study by Zhao et al., in which mouse embryonic stem cells were used to identify RNAs that interacted with the PRC2 complex component EZH2 via immunoprecipitation and next-generation sequencing [122]. There, over 9,000 transcripts were detected that interacted with EZH2, including many transcripts from protein-coding genes (Table 2). Even though the depth of sequencing in this study was much lower than in the study by Guil et al. and in our study, their data also showed frequent enrichment of protein-coding transcripts, in particular those encoding oncogenes and tumor suppressors, similar to transcripts from imprinted genes.
Figure 3. RNA binding to PRC2 complex components. (A) Analysis of data from our lab showed that both mRNAs and lncRNAs bind to the PRC2 complex component SUZ12 with similar binding affinities [183]. For comparison, known lncRNA-PRC2 interactions and their fold enrichments are shown in red. Here, the RNA-IP experiments were performed on the prostate cancer cell line LNCaP after formaldehyde fixation; (B) Reanalysis of the raw data from Guil et al. confirmed our finding that both protein-coding and non-coding RNA can bind with high affinity to the PRC2 complex, in this case the EZH2 subunit [29]. These data were obtained from UV cross-linking experiments (iCLIP) in the colorectal cancer cell line HCT116.
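As an illustration of the fold-enrichment comparison described above, the following minimal Python sketch computes a log2 ratio of the bound fraction over a background fraction (input material or IgG) for each transcript and summarizes it per transcript class. The scaling to counts per million, the pseudocount, and all function names and toy read counts are assumptions made for this illustration; they are not taken from the cited studies, whose exact normalization is not specified here.

```python
import math
from statistics import median


def fold_enrichment(ip_counts, bg_counts, pseudocount=1.0):
    """Return log2(IP / background) per transcript.

    Both libraries are scaled to counts per million (an assumed
    normalization) and a pseudocount avoids division by zero.
    """
    ip_total = sum(ip_counts.values())
    bg_total = sum(bg_counts.values())
    if ip_total == 0 or bg_total == 0:
        raise ValueError("both libraries must contain at least one read")
    enrichment = {}
    for tx, ip_reads in ip_counts.items():
        ip_cpm = ip_reads / ip_total * 1e6
        bg_cpm = bg_counts.get(tx, 0) / bg_total * 1e6
        enrichment[tx] = math.log2((ip_cpm + pseudocount) / (bg_cpm + pseudocount))
    return enrichment


def median_by_class(enrichment, biotype):
    """Group log2 enrichments by transcript class and return the median of each."""
    grouped = {}
    for tx, value in enrichment.items():
        grouped.setdefault(biotype[tx], []).append(value)
    return {cls: median(values) for cls, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only (not real data).
    ip = {"mRNA_1": 120, "mRNA_2": 40, "lncRNA_1": 90, "lncRNA_2": 30}
    background = {"mRNA_1": 60, "mRNA_2": 50, "lncRNA_1": 45, "lncRNA_2": 35}
    classes = {"mRNA_1": "mRNA", "mRNA_2": "mRNA",
               "lncRNA_1": "lncRNA", "lncRNA_2": "lncRNA"}
    log2_fe = fold_enrichment(ip, background)
    print(median_by_class(log2_fe, classes))
```

Comparing the per-class medians (or the full distributions) is one simple way to check whether mRNAs and lncRNAs show similar enrichment; a real analysis would also need replicate handling and a statistical test.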
In conclusion, all three studies described above imply extensive interaction between proteins of the PRC2 complex and protein-coding RNAs. These results are also in line with recent mRNA-proteome interaction studies, in which mRNAs appear to interact with regulatory enzymes and proteins. In these large proteome studies, hundreds of mRNA-binding proteins were identified [188,189]. As expected, the list of RNA binders was enriched for already known RNA-binding proteins involved in mRNA splicing, localization, processing and translation. However, proteins functioning in transcription regulation were also clearly identified, including transcription factors and co-activators (such as MYBBP1A and EDF) [188]. What functions these RNA-protein interactions have, and by what mechanism these proteins may modulate transcription, remains to be determined. Here, we hypothesize that mRNAs such as those binding to the PRC2 complex can indeed have additional regulatory functions (Figure 4). Currently, we cannot rule out the possibility that these mRNA-PRC2 interactions are non-specific events, but their levels of enrichment in all three studies are similar to or even stronger than those of known functional lncRNA-PRC2 interactions. Further studies are needed to prove a functional role for these mRNA-PRC2 interactions.
Figure 4. Proposed guide function for mRNA. Many mRNAs have here been shown to interact with PRC2 complex components. Similar to lncRNAs, we propose that mRNAs are involved in guiding the PRC2 complex to its target locations in the genome, where it can repress genomic regions by depositing a trimethyl mark on the lysine 27 residue of histone H3 (H3K27me3). Which part of the mRNA directly interacts with the PRC2 complex is currently not known.

Conclusions
From the large number of published studies, it is clear that long non-coding RNAs can have a variety of important roles in gene deregulation in cancer. Evidence of similar roles for protein-coding transcripts is now slowly accumulating. Here, we have combined, reviewed and extended the current knowledge of trans-regulatory roles for mRNA. We have compared lncRNA and mRNA examples with similar regulatory functions side by side. We have shown that mRNAs are frequently associated with PRC2 complex components, and we hypothesize a common guiding role for mRNA molecules. Future experiments are needed to substantiate these speculations. Lastly, conclusions from loss-of-function experiments for mRNAs may need to be reinterpreted, as the effects cannot automatically be attributed solely to the function of the encoded protein and may also be partially due to affected regulatory functions of the transcript. Again, further experimentation will show the extent of these regulatory roles for coding RNA.