Understanding Long Noncoding RNA and Chromatin Interactions: What We Know So Far

With the evolution of technologies, from global detection of RNAs to probing of lncRNA-chromatin interactions and lncRNA-mediated regulation of chromatin structure, we now have a comprehensive repertoire of chromatin interacting lncRNAs, their genome-wide chromatin binding regions and their modes of action. Evidence from these new technologies emphasizes that chromatin targeting of lncRNAs is a prominent mechanism and that these chromatin targeted lncRNAs exert their functionality by fine tuning chromatin architecture, resulting in an altered transcriptional readout. Currently, there are no unifying principles that define chromatin association of lncRNAs; however, evidence from a few chromatin-associated lncRNAs shows the presence of a short common sequence for chromatin targeting. In this article, we review how technological advancements have contributed to characterizing chromatin associated lncRNAs, and discuss the potential mechanisms by which chromatin associated lncRNAs execute their functions.

RNA, considered a potential primordial molecule of life, has served as a central molecule in molecular biology research owing to its ability to store information and execute catalytic functions in diverse biological contexts. It has long been argued that RNA mediated informational activities slowly evolved into the more stable and easily replicable DNA, while the catalytic functions evolved into highly versatile polypeptides [1]. However, experiments by Beadle and Tatum led to the proposition of the "one gene-one enzyme" hypothesis (simplified to the more popular "one gene-one protein" assumption) [2], which formed the basis for the formulation of the central dogma of molecular biology [3,4]. This concept, that genes solely encoded the functional components of cells in the form of proteins (i.e., the 'enzymes'), was shaped by the experimental and technological limitations of that era. This theory reduced RNA to a molecule that merely bridges DNA and protein. Accordingly, an increase in the number of protein coding genes was proposed to correlate positively with the increase in organismal and molecular complexity. However, the most convincing evidence against this theory came with the advancement of technology that allowed generation of high throughput transcriptome sequencing data from several model organisms across the evolutionary ladder. These data demonstrated that the number of protein coding genes does not correlate with organismal complexity [5]. Moreover, while the genomes of higher eukaryotes are pervasively transcribed into RNA, only a small percentage of the transcribed RNAs were found to be translated into proteins [6]. This noncoding portion of the transcribed genome (termed non-coding RNA) has consistently increased with organismal complexity [7].
Consistent with the latter notion, recent evidence demonstrated that long noncoding RNAs play a critical role in tissue- and development-dependent biological functions, suggesting that the noncoding portion of the genome plays a significant role in rewiring the multi-layered gene expression that controls organismal development.
The above classifications of lncRNAs represent one of the most simplistic approaches to subclassification and might not include all categories. Moreover, many of these classifications overlap; for example, a linear lncRNA that is localized in the nucleus can act as a scaffold in recruiting chromatin modifying complexes for the regulation of its target genes in cis (Table 1). Several lncRNAs are also known to exhibit pleiotropic effects [20,42], making such categorizations error prone. [55,56] It has recently been shown that several lncRNAs regulate gene expression in critical cellular contexts by organizing chromatin into active and inactive domains through direct interaction with different chromatin modifying enzymes [28,33,57]. For example, the sense (Xist) and antisense (Tsix) pair of lncRNAs from the mammalian X chromosome inactivation center (XIC) work exclusively in cis to regulate chromatin structure and bring about X chromosome inactivation. Xist and Tsix lncRNAs have served as paradigms for chromatin dependent gene regulation by lncRNAs at the whole chromosome and single gene level, respectively. In particular, Xist, by scaffolding several chromatin modifying enzymes, has been shown to regulate the higher order (3D) chromatin structure of the X chromosome during the onset of X chromosome inactivation (XCI) [31,58,59]. Likewise, several antisense lncRNAs from imprinted clusters have also been shown to regulate gene expression in large chromosomal domains by organizing higher order chromatin structure (Table 1). Although we are beginning to understand the functional role of lncRNAs and their interacting proteins in the regulation of chromatin structure, how they are targeted to chromatin is not well understood.
In this review, we will focus on the regulatory functions of chromatin associated lncRNAs and how the emergence of various technological advances has enabled comprehensive annotation and functional identification of these transcripts. Finally, we will discuss the mechanisms by which chromatin associated lncRNAs make contacts with their target genes.

Approaches to Define RNA-Chromatin Interactions
In the early 1980s, elegant genetic and molecular studies identified a phenomenon of parent-of-origin-specific allelic expression called genomic imprinting, which explained in part the molecular mechanism of dosage compensation [60]. Independently, two imprinted genes were identified: the paternally expressed protein-coding gene Igf2 and the maternally expressed lncRNA H19. Both genes map to the distal end of mouse chromosome 7, where they lie in proximity to each other, forming the H19/IGF2 cluster [61,62]. H19 lncRNA was among the first lncRNAs to be functionally characterized in various biological contexts, including genomic imprinting. This era witnessed a burst in the identification and functional characterization of several imprinted lncRNAs with chromatin regulatory functions, resulting in a steady expansion of techniques over the next decade to address the mechanism of lncRNA interaction with chromatin modifiers or other proteins and with chromatin (Figure 1). One of the most defining pieces of experimental evidence implicating lncRNAs in chromatin organization came in 1991, when Xist lncRNA was shown to localize to the inactivated X chromosome [63]. This observation was followed up by several other studies in which imprinted lncRNAs were all found to execute their actions in close interaction with chromatin [42,64]. Mechanistic studies of the role of imprinted lncRNAs in the regulation of imprinted gene clusters were based on locus- or gene-specific experimental approaches, in which localization and binding protein partners were identified for a given lncRNA (Table 1). These mechanistic studies of imprinted lncRNAs inspired the development of experimental approaches that can identify lncRNAs that bind to a given protein, in particular to different chromatin modifiers such as PRC2 [65][66][67], YY1 [68,69], CTCF [70] and others [71].
These "protein centric" approaches (see also Box 1) led to the global identification of lncRNAs that bind to several chromatin modifiers and thus possibly to chromatin. These approaches have identified potential chromatin interacting lncRNAs; however, with few exceptions, direct targeting of lncRNAs to chromatin was not validated. This led to the next wave of experimental approaches, which focused on obtaining more direct evidence for lncRNA-chromatin interactions. These approaches can be broadly divided into RNA centric and non-RNA centric approaches, as discussed below.
Box 1. Methods to study global RNA-protein interactions.
• RIP-seq: RNA immunoprecipitation (RIP) exploits antibodies to pull down RNA bound to a given protein, and the immunoprecipitated RNA is subjected to high throughput sequencing (RIP-seq), thereby enabling global identification of RNAs bound to the protein of interest. Technical variants of this method include native RIP-seq [72] and formaldehyde cross-linked fRIP-seq [71].
• RIPiT-Seq: RNA:protein immunoprecipitation in tandem (RIPiT) is suitable for RBPs with poor inherent ultraviolet (UV) cross-linking ability. This method yields highly specific RNA binding footprints of any cellular RNPs, and the resulting RNA footprints can then be combined with high-throughput sequencing (RIPiT-Seq), thereby providing a means to map the RNA binding sites of such RBPs [73]. This method has been used to identify and validate an RNA binding pocket within the WDR5 chromatin modifier [74].
• CLIP: Improves the specificity of RIP by UV cross-linking of RNA/protein complexes before extraction. This allows the removal of weakly bound RNA through stringent washing. The remaining RNA can then undergo reverse transcription and PCR amplification (or next generation sequencing). The main drawback of this method is the loss of a significant proportion of transcripts for which reverse transcription stalls at the cross-linking site, resulting in truncated cDNAs. UV cross-linking can also introduce some bias, as its ability to link RNA to protein varies depending on the base and the proximity of the reactive amino acids mediating the interaction. The method is termed HITS-CLIP when CLIP is combined with high throughput next generation sequencing [75].
• iCLIP: Individual-nucleotide-resolution CLIP (iCLIP) was developed to enable the recovery of truncated cDNAs lost in conventional CLIP. The iCLIP protocol employs UV irradiation as a cross-linking source that preserves in vivo RNA-protein interactions by promoting covalent bonds at the sites of protein-RNA interactions. Following mild RNase treatment to obtain RNA fragments in an optimal size range, RNA-protein complexes are immunoprecipitated. The immunoprecipitated RNA is dephosphorylated to enable adapter ligation to the 3′ end of the RNA and radioactive labelling at the 5′ end. This method includes SDS-PAGE separation and transfer to a nitrocellulose membrane to capture radiolabelled, immunoprecipitated, cross-linked RNA-protein complexes. The captured RNA is then reverse transcribed into cDNA. Following cDNA circularization, restriction enzyme digestion linearizes the cDNA prior to PCR amplification and library preparation for high-throughput sequencing. Truncated cDNAs represent the majority of the cDNA library, and the position of the preceding nucleotide, after mapping to the genome, corresponds to the cross-linking site (Huppertz et al., 2014).
• eCLIP: Enhanced CLIP improves the library preparation and circular ligation steps of iCLIP, allowing greater power in filtering and mapping truncated sequences. eCLIP replaces the 5′ adaptor ligation with a 3′ cDNA ligation [76], whereas a further improved eCLIP protocol, Monitored eCLIP (meCLIP), uses both a 5′ ligation and a 3′ cDNA ligation [77].
• irCLIP: This is similar to iCLIP, except that it makes use of a biotinylated, fluorescent 3′ DNA adaptor [78].
• BrdU-CLIP: BrdU-CLIP is built on the same principle as CLIP and iCLIP but employs the nucleotide analogue BrdUTP in reverse transcription to capture truncated and non-truncated cDNA products using a BrdU antibody [79].
• GoldCLIP: A shortened iCLIP protocol that removes the SDS-PAGE separation and membrane transfer steps. RNPs are tagged with Halo-tag and overexpressed in the cell line of interest. The Halo-tagged protein-RNA complex is affinity purified using Halo-ligand. Following denaturing washes, the purified RNAs are subjected to high-throughput sequencing [80].
• PAR-CLIP: Photo-Activatable Ribonucleoside enhanced Cross-Linking and Immunoprecipitation (PAR-CLIP) is a modified CLIP method in which photo-activatable nucleosides added to the media are taken up by cells and subsequently used for protein-RNA cross-linking, which offers the following advantages. First, PAR-CLIP shows in general 100- to 1,000-fold higher RNA recovery in comparison to conventional cross-linking at 254 nm. Secondly, UV irradiation induces T-to-C mutations that are characteristic of cross-linked sites that have incorporated photo-activatable nucleoside analogues. Based on this, PAR-CLIP exploits mutation analysis to improve the identification of precise RBP binding positions or footprints [81]. Studies using PAR-CLIP have identified that both EZH2 and JARID2 can directly interact with RNA in cells [82,83]. Their interaction with RNA is mutually exclusive and antagonistic to their ability to interact with and bind to chromatin.
• fCLIP: Formaldehyde cross-linking, immunoprecipitation and sequencing (fCLIP) uses formaldehyde as a cross-linking reagent for CLIP to characterize the binding regions of RNA binding proteins on double stranded RNAs. dsRNAs are inefficiently cross-linked by UV, making it difficult to study the interactions between dsRNA binding proteins and their substrates. It has been used to map in vivo DROSHA cleavage sites at single nucleotide resolution [84].
• [...] is an improved protocol of RIC, which fine-maps the protein domains that interact with mRNAs. UV irradiated cells were given stringent denaturing washes to purify the resulting covalently linked RBP-RNA complexes with oligo(dT) magnetic beads. As a defining modification to RIC, post elution the RBPs were subjected to partial proteolysis to retain only those protein regions that are bound to the RNA, which are separated by a second oligo(dT) selection from the non-interacting peptides that are released into the supernatant.
Mass-spectrometric analysis of the eluted and released peptides to calculate peptide intensity ratios between these fractions will determine the RNA-binding regions [86].
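The iCLIP read-out described in Box 1 can be illustrated with a minimal Python sketch that infers cross-link positions from truncated cDNA alignments, using the stated rule that the nucleotide preceding the cDNA start corresponds to the cross-linking site. The `Alignment` record, its 0-based half-open coordinates, the assumption that reads are oriented to the RNA strand, and the toy reads are illustrative assumptions and not part of any published iCLIP pipeline.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for a mapped cDNA read.

    Coordinates are 0-based and half-open; strand is that of the RNA.
    """
    chrom: str
    start: int   # leftmost aligned genomic position (inclusive)
    end: int     # rightmost aligned genomic position (exclusive)
    strand: str  # "+" or "-"


def crosslink_site(aln: Alignment) -> Tuple[str, int, str]:
    """Return the position one nucleotide upstream of the read's 5' end."""
    if aln.strand == "+":
        # 5' end is the leftmost base; upstream is one position to the left.
        pos = aln.start - 1
    elif aln.strand == "-":
        # 5' end is the rightmost base (end - 1); upstream is one to the right.
        pos = aln.end
    else:
        raise ValueError(f"unknown strand: {aln.strand!r}")
    return aln.chrom, pos, aln.strand


def count_crosslinks(alignments: Iterable[Alignment]) -> Counter:
    """Tally how many truncated cDNAs support each inferred cross-link site."""
    return Counter(crosslink_site(aln) for aln in alignments)


if __name__ == "__main__":
    # Toy reads with made-up coordinates, for illustration only.
    reads = [
        Alignment("chr1", 100, 130, "+"),
        Alignment("chr1", 100, 125, "+"),
        Alignment("chr1", 200, 230, "-"),
    ]
    for site, n_reads in count_crosslinks(reads).items():
        print(site, n_reads)
```

Sites supported by many independent truncation events would be the natural candidates for protein contact points; read-through (non-truncated) cDNAs are not distinguished in this simplified sketch.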

Chromatin Oligoaffinity Precipitation (ChOP)
ChOP is one of the first techniques developed to affinity purify chromatin associated lncRNAs using biotinylated oligonucleotides. This technique was first used to determine the occupancy of Alu and B2 RNAs at the promoters of repressed genes during heat shock [88]. Later, this technique was successfully employed to characterize the occupancy of the imprinted lncRNA Kcnq1ot1 across the 1 Mb Kcnq1 imprinted cluster [32]. This technique set the stage for the development of several other techniques that use biotinylated oligonucleotides on cross-linked chromatin to characterize lncRNA binding proteins as well as genome-wide binding sites for lncRNAs.

RNA Antisense Purification (RAP)
This technique captures a target RNA of interest through hybridization with antisense biotinylated oligos. By cross-linking endogenous macromolecular complexes prior to RNA capture with different cross-linking agents, such as 4′-aminomethyltrioxsalen (AMT, a psoralen-derivative crosslinker), formaldehyde and disuccinimidyl glutarate (DSG), RAP allows for the identification of RNAs, proteins and DNA loci that cross-link to and co-purify with the target RNA. While the RAP-RNA approach elucidated the functions of U1 small nuclear RNA and Malat1 in RNA processing, the RAP-DNA technique enables genome-wide mapping of RNA-DNA interactions. Thus, RAP provides an important tool for systematic interrogation of lncRNA function and mechanism. RAP was used to study the localization of Xist during the onset of XCI, showing that Xist lncRNA exploits the three-dimensional genome architecture to spread along the inactive X chromosome [31]. RAP also helped in dissecting the mechanism by which Firre lncRNA spatially organizes inter-chromosomal interactions [89,90].

Chromatin Isolation by RNA Purification (ChIRP)
In ChIRP, tiling antisense biotinylated oligonucleotides are used to retrieve a specific lncRNA of interest together with its bound chromatin and proteins, which can be assayed separately, thus generating a complete interaction profile for that particular lncRNA. Drosophila roX2 lncRNA, human telomerase RNA TERC and HOTAIR lncRNA were found to have precise but distinctly different trans-genomic targets [34]. Like ChOP, this technique enabled the identification of both cis and trans genomic targets of lncRNAs along with the identification of their interacting protein partners. The specificity and robustness of this technique enabled efficient interrogation of lncRNA functions.
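The idea of tiling antisense oligonucleotides across a target RNA can be sketched in a few lines of Python. This is only an illustration of the principle: the probe length, the spacing between probes and the toy sequence are assumptions chosen for the example and are not taken from the ChIRP protocol, and real probe design would also filter for specificity and sequence composition.

```python
from typing import List, Tuple

# Map each RNA/DNA base to its DNA complement (U is treated like T).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement_dna(seq: str) -> str:
    """Return the antisense DNA sequence for an RNA or DNA input."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_antisense_probes(
    rna: str, probe_len: int = 20, step: int = 100
) -> List[Tuple[int, str]]:
    """Tile antisense DNA probes along an RNA.

    Returns (start, probe) pairs, where start is the 0-based position on the
    RNA of the region each probe hybridizes to. probe_len and step are
    hypothetical defaults for illustration.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    probes = []
    for start in range(0, len(rna) - probe_len + 1, step):
        target = rna[start:start + probe_len]
        probes.append((start, reverse_complement_dna(target)))
    return probes


if __name__ == "__main__":
    toy_rna = "AUGGCCAUUG"  # made-up sequence, not a real lncRNA
    for start, probe in tile_antisense_probes(toy_rna, probe_len=4, step=3):
        print(start, probe)
```

Each probe is written 5′ to 3′ as DNA, so it can be read directly as the sequence of an oligonucleotide that would carry the biotin label.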

Capture Hybridization Analysis of RNA Targets (CHART)
Like ChOP, ChIRP and RAP-DNA, CHART was developed to map the genome-wide binding profile of chromatin-associated RNAs. CHART, ChOP, ChIRP and RAP-DNA are built on the same principle, i.e., purifying RNA associated chromatin regions using biotinylated antisense oligonucleotides. The CHART protocol uses a small number of 22-28 nucleotide antisense oligonucleotides, complementary to single stranded regions of a target RNA that are accessible for hybridization, to purify RNAs from a cross-linked chromatin extract. RNA-chromatin complexes immobilized on beads are eluted using RNase H, and the eluted genomic DNA is subsequently sequenced using high-throughput sequencing technologies and mapped to the reference genome to identify RNA-chromatin associations. The technique was initially used to determine the genome-wide binding profile of the roX2 ncRNA, a regulator of dosage compensation, in Drosophila S2 cells. It was later applied to map genome-wide chromatin occupancy for several mammalian long ncRNAs, including Xist, Neat1 and Malat-1 [91][92][93].
The key differences between CHART and the other three techniques, i.e., RAP, ChOP and ChIRP, are as follows:
1. CHART uses a two-step formaldehyde cross-linking approach to fix nuclei.
2. RNase H sensitivity assay is used to identify regions in the target RNA that are accessible for hybridization with antisense oligonucleotides. A small number of short oligonucleotides that have been predetermined to interact with the RNA target are then used in CHART to enrich for RNA-chromatin complexes.
3. Antisense oligonucleotide bound RNA-chromatin complexes are eluted using RNase H. This reduces nonspecific false positive binding events generated by direct binding of antisense oligonucleotide probes to DNA [92].
The overall limitation of these oligonucleotide-based approaches is that their efficacy depends on the size of the lncRNA in question. Cellular localization of lncRNAs also affects the efficiency of these methods. Transcript abundance, along with cis- versus trans-action, can also lead to inherent differences in the retrieval efficiencies of purified RNAs. These issues limit investigations to specific lncRNAs, and thus there is a need for alternative approaches that overcome these limitations, at least in part. This need possibly led to the gradual development of several techniques for identifying global targeting of RNAs to chromatin, as we discuss next.

Chromatin RNA Isolation by Sucrose Gradient Fractionation
In one of the initial efforts to globally characterize ncRNAs associated with chromatin, human skin fibroblast (HF) cells were treated with micrococcal nuclease (MNase), followed by separation of chromatin fragments of different lengths on a sucrose gradient. The soluble chromatin fraction was collected from the gradient, and RNA was isolated and subjected to high throughput sequencing [57]. This effort led to the initial characterization of several evolutionarily conserved chromatin associated RNAs (CARs), which mostly mapped to intronic genomic regions. One of the intergenic CARs, namely Intergenic10, was functionally characterized and shown to positively regulate the transcription of non-overlapping nearby genes in cis. Importantly, the ncRNA field at that time was dominated by studies on the functional significance of ncRNAs in the regulation of genomic imprinting [32,45,67]. Interestingly, in all these contexts, ncRNAs regulated imprinted status by transcriptional repression of target genes. Thus, positive transcriptional regulation of neighbouring genes by the Intergenic10 CAR also opened up the prospect of other ncRNA mediated and/or assisted gene activation mechanisms.

Chromatin RNA Immunoprecipitation (ChRIP)
The pre-high-throughput sequencing era witnessed robust application of this technique to validate the enrichment of lncRNAs in different chromatin compartments, using antibodies against different histone modifications to pull down the soluble chromatin fraction. RT-qPCR with specific primers was used to validate candidate lncRNA enrichment in the pulled-down chromatin fractions. Using this technique, both human and mouse Kcnq1ot1 lncRNAs were found to be enriched in repressive chromatin fractions [29,30].
With advances in RNA sequencing technologies and improved efficiency of library preparation methods, it became possible to combine RNA sequencing with ChRIP pulled-down RNAs (ChRIP-seq). Eventually, this procedure was modified by combining photoactivatable ribonucleoside-enhanced crosslinking with high-throughput RNA sequencing to characterize lncRNAs that are associated with different chromatin fractions [94]. Using this modified ChRIP assay, repressive chromatin was purified using antibodies against H3K27me3 and EZH2, and sequencing of RNA purified from these fractions identified 276 commonly enriched chromatin-associated lncRNAs. EZH2, a component of the PRC2 complex, catalyses trimethylation of lysine 27 of histone H3 (H3K27me3), a repressive histone modification associated with gene silencing [33]. Thus, the lncRNAs that were commonly enriched in both the H3K27me3 and EZH2 immunopurified chromatin fractions were referred to as repressive chromatin associated lncRNAs. This study identified MEG3 lncRNA as one of the top and highly conserved inactive chromatin associated lncRNAs, which was found to regulate TGF-β pathway genes through formation of RNA-DNA triplex structures. This approach provides the flexibility to study lncRNAs that are functionally enriched in distinct chromatin domains as defined by signature histone modifications. In addition, using the ChRIP assay, active chromatin was purified with antibodies against H3K4me2 and WDR5. WDR5 is a component of the MLL1/MLL complexes, which catalyse the formation of H3K4me2 and H3K4me3. RNA isolated from the immunopurified chromatin fractions identified 209 chromatin associated lncRNAs that were commonly enriched in both the H3K4me2 and WDR5 immunopurified chromatin fractions, and these lncRNAs were named active lncCARs [28]. Of these active lncCARs, 43% mapped to divergent (XH) transcription units. Active XH transcription units were found to be enriched with H3K4me2, H3K4me3 and WDR5. Depletion of active XH CARs resulted in the loss of expression of the corresponding protein coding genes, along with loss of H3K4me2, H3K4me3 and WDR5 at the active XH promoters. This study unravelled a new aspect of chromatin-based regulation at divergent XH transcription units by this newly identified class of H3K4me2/WDR5 chromatin enriched lncRNAs. This approach of identifying EZH2 and WDR5 interacting lncRNAs, enriched in the H3K27me3 and H3K4me2 chromatin regions, respectively, was one of the initial approaches of mechanism-based screening for functional chromatin bound lncRNAs.
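The selection logic behind "commonly enriched" lncRNAs, i.e., transcripts enriched in both the histone-mark pulldown and the chromatin-modifier pulldown, amounts to intersecting two enrichment lists. The following Python sketch illustrates this under stated assumptions: the log2 enrichment values, the cutoff of 1.0 and the transcript names are all hypothetical placeholders, and the statistical treatment used in the original ChRIP-seq studies is not reproduced here.

```python
from typing import Dict, Set


def enriched(log2_enrichment: Dict[str, float], min_log2_fc: float) -> Set[str]:
    """Return transcripts whose log2(IP / input) meets the cutoff."""
    return {name for name, fc in log2_enrichment.items() if fc >= min_log2_fc}


def commonly_enriched(
    pulldown_a: Dict[str, float],
    pulldown_b: Dict[str, float],
    min_log2_fc: float = 1.0,  # hypothetical cutoff, for illustration only
) -> Set[str]:
    """Return transcripts passing the cutoff in both pulldowns."""
    return enriched(pulldown_a, min_log2_fc) & enriched(pulldown_b, min_log2_fc)


if __name__ == "__main__":
    # Made-up enrichment values standing in for, e.g., an H3K27me3 ChRIP
    # and an EZH2 ChRIP; these are not data from any study.
    histone_mark_ip = {"lncRNA_1": 2.3, "lncRNA_2": 0.4, "lncRNA_3": 1.8}
    modifier_ip = {"lncRNA_1": 1.6, "lncRNA_2": 2.1, "lncRNA_3": 0.2}
    print(sorted(commonly_enriched(histone_mark_ip, modifier_ip)))
```

Requiring enrichment in both fractions is a conservative choice: a transcript that co-purifies only with the histone mark or only with the modifier is excluded, which narrows the list to candidates associated with both the enzyme and the chromatin state it establishes.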

Profiling Interacting RNAs on Chromatin by Deep Sequencing (PIRCh-seq)
PIRCh-seq was developed to identify the chromatin-associated transcriptome using antibodies recognizing histone H3 and six distinct histone modifications associated with both active and repressive chromatin states [95]. This study additionally integrated the profiles of RNA secondary structure and RNA m6A modification to identify RNA sequences that are in contact with chromatin. Further, the authors characterized single nucleotide variants that define allele-specific RNA-chromatin interactions. This study has many parallels with the previous studies published using ChRIP-seq [28,33]. Both PIRCh-seq and ChRIP-seq are based on the same principle, i.e., antibody-based chromatin RNA immunoprecipitation. Thus, it would be interesting to compare the data obtained from these techniques to identify common chromatin associated lncRNAs, which may play an important role in organizing different chromatin compartments.

GRID-seq
Global RNA interactions with DNA by deep sequencing (GRID-seq) comprehensively characterizes all potential chromatin associated RNAs and their cognate DNA binding regions in an unbiased fashion [96]. This method exploits the principle of proximity ligation, where a bivalent linker is used to ligate RNA to DNA in situ on fixed nuclei. This approach identified distinct classes of cis- and trans-acting chromatin associated RNAs, including large sets of both coding mRNAs and ncRNAs that bind to active promoters and enhancers, especially super-enhancers (Table 2).

MARGI-seq
Mapping RNA-genome interactions (MARGI-seq) is also based on proximity ligation where chromatin associated RNAs were ligated to target DNA using specially designed RNA and DNA linker sequences. Successfully ligated products, in the form of RNA-linker-DNA, are selected and converted to cDNA and subjected to paired-end sequencing [97]. By using human pluripotent embryonic stem cells and human embryonic kidney cell lines, MARGI-seq has identified several chromatin-associated RNAs, including well characterized lncRNAs with chromatin regulatory properties such as Xist, NEAT1 and MALAT1. Most of the MARGI-reported lncRNA attachment regions across the genome are enriched with active histone modifications such as H3K27ac and H3K4me3.
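Proximity ligation methods of this kind produce chimeric molecules in which an RNA-derived segment and a DNA-derived segment flank a known linker. A minimal Python sketch of the first computational step, separating the two segments so that each can be mapped independently, is shown below. The linker sequence, the minimum segment length and the single-read RNA-linker-DNA layout are illustrative assumptions; the actual linker designs and paired-end read structures of GRID-seq and MARGI-seq differ and are not reproduced here.

```python
from typing import Optional, Tuple

# Placeholder linker sequence, invented for this example.
HYPOTHETICAL_LINKER = "CTAGTAGCCCATGCAATGCGAGGA"


def split_chimeric_read(
    read: str,
    linker: str = HYPOTHETICAL_LINKER,
    min_len: int = 15,  # assumed minimum length for a mappable segment
) -> Optional[Tuple[str, str]]:
    """Split an RNA-linker-DNA read into (rna_part, dna_part).

    Returns None when the linker is absent or when either flanking segment
    is shorter than min_len.
    """
    idx = read.find(linker)
    if idx == -1:
        return None
    rna_part = read[:idx]
    dna_part = read[idx + len(linker):]
    if len(rna_part) < min_len or len(dna_part) < min_len:
        return None
    return rna_part, dna_part


if __name__ == "__main__":
    toy_read = "A" * 20 + HYPOTHETICAL_LINKER + "C" * 20  # synthetic read
    print(split_chimeric_read(toy_read))
```

After splitting, the RNA-derived segment would be aligned to the transcriptome and the DNA-derived segment to the genome, and each pair then records one RNA-chromatin contact.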

ChAR-seq
This is yet another high-throughput method to characterize RNA-chromatin interactions based on proximity ligation, in which RNA-DNA ends are preserved in the context of chromatin, followed by in situ ligation of RNA with an oligonucleotide bridge containing a biotin modification and a DpnII restriction site. After second strand synthesis using the oligonucleotide bridge, DpnII digestion and a ligation reaction are performed to capture DNA-RNA contact points [98]. This method was employed in Drosophila cells and characterized three types of RNAs: nascent RNA transcripts in close proximity to their start sites, small RNAs involved in transcription elongation and RNA processing, and RNAs involved in dosage compensation.
In all of the above methodologies, the RNAs found to be chromatin bound derived mostly from exonic or intronic mRNA populations rather than from lncRNAs/ncRNAs. In the majority of these technologies, except ChRIP-seq and PIRCh-seq, nascent RNAs are not excluded from the pulldowns. A comparative analysis of the data generated from these various approaches is summarized in Table 2.
One explanation is that all these techniques (except ActD treated ChRIP-seq) identify transcription dependent or transcriptionally coupled targeting of lncRNAs to chromatin. Arguably, any positive correlation between the nascent and steady state levels of an RNA and its chromatin enrichment is indicative of transcriptional background. Although nascent transcripts and/or the act of transcription play an important role in gene regulation, in this review we focus our discussion mainly on mature functional chromatin bound lncRNAs. Moreover, each of these studies has re-validated chromatin targeting of already well characterized, abundantly expressed lncRNAs, rather than identifying new targets. This argues for strategies that might help to reduce co-transcriptional purification of RNAs with chromatin. RNA centric and non-RNA centric approaches identified genomic targets of individual lncRNAs and global targets of chromatin bound lncRNAs, respectively. Cumulatively, this evidence emphasizes that, on a global scale, chromatin targeting of lncRNAs is a prominent mechanism and that these chromatin targeted lncRNAs exert their functionality mainly by fine tuning chromatin architecture, resulting in an altered transcriptional readout. With the identification of such large-scale chromatin association of several lncRNAs, the next pertinent question is the mechanism of chromatin targeting of lncRNAs.

Mechanisms by Which lncRNAs Are Targeted to Chromatin
The most persistent and pertinent question in this field has been to identify and establish unifying principles that can define and predict the functionality of lncRNAs and, in a sense, provide "specificity" to their modus operandi. Recent computational approaches have tried to find signature sequence motifs in lncRNAs that can define or predict their interacting protein partners, localization and function [101][102][103][104]. Since there is no sequence conservation among the functionally conserved lncRNAs across the evolutionary spectrum, it is not advisable to look for sequence motifs that can dictate the association with DNA or transcription factors. However, by comparing the transcriptomes of 17 species, short patches of evolutionarily conserved sequence were identified in lncRNAs, along with full length orthologous lincRNA sequences from different species [102]. The latter effort has opened up a new way of looking into possible sequence information to identify conserved motifs that signify functionality. A more recent study used a sequence comparison method to deconstruct linear sequence relationships in lncRNAs and evaluated similarity based on the abundance of short motifs called k-mers. Despite a lack of sequence homology, lncRNAs with related functions had similar k-mer profiles, and k-mer profiles also correlated with protein binding and subcellular localization of lncRNAs [104]. Interestingly, a recent investigation that functionally screened libraries of short fragments tiling across nuclear enriched lncRNAs and mRNAs identified short sequences from Alu repeats and C rich motifs that dictate nuclear localization. Furthermore, the study implicated hnRNPK protein in the nuclear accumulation of lncRNAs and mRNAs [103].
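The notion of comparing lncRNAs by k-mer content rather than by alignment can be made concrete with a short Python sketch. It counts all k-mers in each sequence, converts the counts to frequencies, and compares two profiles with a Pearson correlation. This is a simplified illustration, not the published method: the choice of k, the plain frequency normalization and the toy sequences are assumptions, and the original study's normalization across a reference set of transcripts is not reproduced.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import List


def kmer_profile(seq: str, k: int = 3) -> List[float]:
    """Return k-mer frequencies over all 4**k k-mers in a fixed order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers with other letters
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    seq_a = "GAGAGAGAAGAGGAGAGA"
    seq_b = "AGAGGAGAGAAGAGAGAG"
    seq_c = "CCCUCCCACCCUCCCGCC"
    prof_a, prof_b, prof_c = (kmer_profile(s, k=2) for s in (seq_a, seq_b, seq_c))
    print("a vs b:", pearson(prof_a, prof_b))
    print("a vs c:", pearson(prof_a, prof_c))
```

Because the profile ignores where each k-mer occurs, two transcripts with no alignable similarity can still score as related if they are built from the same short motifs, which is the property the k-mer approach exploits.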
Certainly, the latter evidence constitutes a significant advance in identifying RNA sequences that dictate function and localization, but we are far from understanding how lncRNAs are targeted to chromatin in a sequence-specific fashion. The binding sites of several chromatin regulatory RNAs have been characterized on a genome-wide scale. In addition, as explained earlier, several novel technologies were developed to characterize global RNA-DNA contacts. Surprisingly, none of these studies provided functional sequences that dictate chromatin targeting. However, global chromatin occupancy of the MEG3, HOTAIR and roX lncRNAs revealed enrichment of GA-rich sequences, which are potential landing sites for these lncRNAs. Although common GA-rich sequences were identified among the three RNA pulldowns, how the three RNAs achieve specificity for their respective target genes is currently unknown [33,34,101]. Based on the published literature, we summarize here the mechanisms by which lncRNAs are targeted to chromatin. In principle, RNAs may associate with chromatin via one of three modes (Figure 2).

1.
Histone modifications, chromatin and DNA modifiers in the chromatin enrichment of lncRNAs: lncRNAs that act as a scaffold and/or guide are targeted to chromatin by proteins with dual RNA- and DNA-binding capabilities, such as hnRNPK [71], PGC1α [105], PRC2 [65,66], YY1 [68,69], CTCF [70] and DNMTs [106]. Alternatively, lncRNAs are targeted to chromatin by interacting with RNA-binding proteins (RBPs), such as hnRNPU [89], that facilitate interaction with additional DNA-binding proteins (Figure 2). It is important to emphasize here that both cis- and trans-acting lncRNAs can be targeted in this way, contrary to the prevailing view that cis-acting chromatin-bound lncRNAs are mostly coupled to transcription [107][108][109][110]. The actual definition of cis action is "on the same chromosome", but over time it has been erroneously conceptualized as "action restricted to the site of synthesis/transcription". The best-studied examples of cis regulation by chromatin-bound lncRNAs come from classical genomic imprinting loci, where imprinted lncRNAs are monoallelically transcribed and targeted to silence multiple genes on the same chromosome, as exemplified by studies of mouse and human Kcnq1ot1 [29,30,32], Airn [42,111,112] and Xist [42,112,113]. Chromatin targeting of H3K4me2- and WDR5-bound lncCARs (active XH lncCARs) has been shown to be essential for maintaining active transcription of neighboring protein-coding genes [28]. Chromatin targeting of active XH lncCARs occurs in part via WDR5, which has the potential to interact with both RNA and H3K4me2, an active histone mark. Thus, divergent transcription units enriched with H3K4me2 could recruit active XH lncCARs via WDR5. Similarly, recruitment of inactive CARs to their target genes could occur in part via EZH2, a PRC2 component with the potential to interact with both RNA and histone H3K27me3 [33].

2.
RNA:DNA triplex: Formation of triple-helix nucleic acid structures involves Hoogsteen base-pairing interactions between RNA and the major groove of double-stranded DNA [114,115]. This RNA-DNA interaction has stringent requirements for both a polypurine sequence in the DNA and a length restriction. Triplexes can form both in vitro and in vivo, and factors such as GC content, extent of sequence complementarity, histone H3 tails, proximity of the triplex target site (TTS) to the nucleosome entry site and open chromatin structure influence triplex stability [116]. Multiple lncRNAs (carrying triplex-forming sequences called Triplex Forming Oligonucleotides or TFOs) appear to use this mechanism to directly target specific complementary sequences across the genome (Triplex Forming Regions or TFRs) to exert their regulatory functions (Figure 2, Table 3). The best examples of DNA:RNA triplex formation between lncRNAs and specific DNA sequences include pRNA, which represses transcription of rRNA genes in cis by targeting DNMT3b to their promoters [56]; Fendrr, which regulates developmental genes by recruiting the PRC2 complex [50]; PARTICLE, which regulates the expression of MAT2A in response to low-dose radiation [117]; MEG3, which guides PRC2 to the regulatory regions of TGF-β pathway genes [33]; and PAPAS, which guides the CHD4/NuRD complex to the rDNA promoter [118]. Recently, a global approach mapped RNA:DNA triplexes genome-wide using protein-free nucleic acids isolated from chromatin. This approach re-validated known triplex-forming lncRNAs and also identified several novel candidate lncRNAs that may act via triplex formation [119]. Besides this experimental approach, a computational method called Triplex Domain Finder (TDF) has been developed to detect triplex-forming regions in lncRNAs and triplex target regions across the human genome.
This method successfully validated the DNA-binding domains of known triplex-forming lncRNAs such as Fendrr, HOTAIR and MEG3 [101]. Two important aspects need to be considered regarding the specificity of triplex-mediated targeting of lncRNAs to chromatin. Firstly, a generic sequence feature in lncRNAs (a polypurine stretch, or TFO) dictates their ability to form triplexes at genomic regions containing TFRs, so this mode still lacks one-to-one specificity. The question then is whether any lncRNA with triplex-forming capability can be targeted to all "triplex-targetable" genomic locations, i.e., TFRs. Secondly, which factors initiate, promote and maintain triplex formation at target locations, and is there, in principle, any difference between cis and trans targeting of triplex-forming lncRNAs (Figure 2)?

3.
R-loop formation: R-loops are three-stranded RNA/DNA structures that form co-transcriptionally at guanine-rich clusters (G-clusters) in the template strand during gene transcription [145,146]. It has been shown that RNAs containing four or more consecutive guanine residues near the 5' end facilitate R-loop formation. In the mammalian genome, R-loops are predominantly seen at promoters and enhancers associated with GC-skewed sequences [147,148], and their formation and dynamics have been linked to transcriptional activity under physiological conditions [149,150] (Figure 2, Table 3). Recent evidence suggests that R-loop formation by lncRNAs affects gene expression in cis through diverse mechanisms. For example, transcription of VIM-AS1 promotes the formation of an R-loop structure that was found to promote transcriptional activation of its neighboring VIM gene, and destabilization of the R-loop structure affected VIM expression [140].
In another context, the lncRNA GATA3-AS1 was found to be required for the deposition of the permissive chromatin marks H3K27 acetylation and H3K4 di/tri-methylation at the GATA3-AS1-GATA3 locus. Mechanistically, GATA3-AS1 interacts with the MLL1 methyltransferase and is tethered to this gene locus via formation of a DNA-RNA hybrid (R-loop) [151]. R-loop formation is part of a co-transcriptional process that targets nascent transcripts to chromatin in cis. If, theoretically, any RNA with GC-skewed sequences has the potential to form an R-loop, then the pertinent question is how, and in combination with which specific cis- or trans-acting factors, the cis and/or trans mechanism of action is defined (Figure 2).
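The sequence features named in modes 2 and 3 can be made concrete with a small Python sketch that scans a sequence for polypurine tracts (candidate triplex-forming stretches), runs of four or more guanines (G-clusters), and GC skew. This is only a naive illustration and not Triplex Domain Finder or any published R-loop predictor: the minimum tract length of 12 nt is an assumption (the text states only that a length restriction exists), the skew formula (G - C) / (G + C) is the conventional definition rather than one given here, and the toy sequence is hypothetical.

```python
import re


def polypurine_tracts(seq, min_len=12):
    """Return (start, end, tract) for runs of purines (A/G) of at least min_len.

    min_len=12 is an illustrative assumption, not a value from the text.
    """
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def g_clusters(seq, min_run=4):
    """Return (start, end) for runs of at least min_run consecutive guanines."""
    pattern = re.compile(r"G{%d,}" % min_run)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def gc_skew(seq):
    """(G - C) / (G + C); positive when G exceeds C on the given strand."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


# Hypothetical toy sequence assembled from labelled parts (0-based coordinates).
toy = (
    "TTC"
    + "AGGAGAAGGAGAGAAGG"  # 17-nt GA-rich polypurine tract
    + "CTTAC"
    + "GGGG"               # G-cluster
    + "TC"
    + "A"
    + "GGGGG"              # G-cluster
    + "ACT"
)

if __name__ == "__main__":
    print("polypurine tracts:", polypurine_tracts(toy))
    print("G-clusters:", g_clusters(toy))
    print("GC skew:", round(gc_skew(toy), 3))
```

A scan of this kind flags every sufficiently long polypurine or G-rich stretch, which illustrates the specificity problem raised above: such generic features alone cannot explain why a given lncRNA occupies only a subset of the sites it could in principle bind.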

lncRNA-Dependent Mechanisms in Chromatin Organization
The first clue linking RNA and chromatin came from mammalian X chromosome inactivation, wherein Xist localizes along the X chromosome undergoing inactivation. A decade after the initial observation, and with the development of new technologies such as chromatin immunoprecipitation (ChIP), RNA immunoprecipitation (RIP), ChRIP, ChIRP and ChOP, we began to understand RNA-dependent chromatin changes at the onset of X inactivation and during imprinted gene silencing. Efforts to identify common contact points along the X chromosome using ChIRP and RAP-DNA seq were not informative; rather, these techniques showed that the overall 3D structure allows Xist to spread across the inactive X chromosome (Figure 3A). Interestingly, factors such as hnRNPU and YY1, which have dual RNA- and DNA-binding specificity, have been implicated in the cis function of Xist. Like Xist, the Kcnq1ot1 lncRNA has also been shown to employ 3D contacts at the genomic level in executing allele-specific gene silencing [152,153] (Table 1) (Figure 3B). In contrast, active XH lncCARs from divergent transcription units interact with the WDR5-methyl transferase complex through the RNA-binding pocket of WDR5 and are recruited to neighbouring protein-coding gene promoters in cis via H3K4me2 (as WDR5 reads H3K4me2). Alternatively, active XH lncCARs bind directly to H3K4me2-enriched chromatin at the neighbouring protein-coding promoters and act as a scaffold for the efficient docking of the WDR5-methyl transferase complex, which is necessary to maintain H3K4me2 levels and to convert H3K4me2 to H3K4me3 (Figure 3C). In this case, cis targeting of lncCARs is required to maintain transcriptional activation of neighbouring genes and does not involve spreading of the lncRNA, as observed in the context of Xist- and Kcnq1ot1-mediated transcriptional repression (Figure 3A,B).
Consistent with a role for lncRNAs in the organization of higher-order chromatin structure, a recent investigation implicated the CTCF-RNA interaction in 3D organization of the genome: CTCF association with DNA depends on RNA, and deletion of the RNA-binding zinc-finger motifs from CTCF resulted in loss of its interaction with RNA. These results document the interdependence of RNA and chromatin architectural proteins in higher-order chromatin organization. This phenomenon was also noticed in the context of eRNAs. eRNAs, which are bidirectionally transcribed from active enhancers, have been implicated in chromatin looping between enhancers and their cognate promoters. It is not clear whether this chromatin looping involves interaction between enhancer-binding proteins and eRNAs. However, a recent study suggested that a regulatory motif in CBP, which is enriched at active enhancers, makes specific contacts with eRNAs and that this RNA interaction is required for CBP HAT activity [154]. lncRNAs have also been implicated in trans-chromosomal interactions, as exemplified by the Firre lncRNA. By interacting with hnRNPU, Firre acts as a regulatory framework that promotes inter-chromosomal interactions. Considering that lncRNA is an important constituent of interphase [57] and metaphase [155] chromatin compartments, and that it plays a functional role in gene expression by promoting intra- and inter-chromosomal interactions, we can expect greater insight into lncRNA-mediated chromatin organization as technologies that probe RNA-chromatin interactions continue to evolve.

Figure 3. [...] (B) Kcnq1ot1 is transcribed (arrows depict transcription) from an unmethylated paternal ICR (imprinted control region) (sky blue box), located within intron 10 of its sense partner gene, Kcnq1. It functions in cis to repress (blunted arrows represent transcriptional repression) lineage-specific imprinted genes. Kcnq1ot1 (light green) interacts with and recruits the G9a-PRC1-PRC2 complex to the promoters of placental lineage genes (blue boxes), while it additionally interacts with DNMT1 and targets the G9a-PRC1-PRC2/DNMT1 complex to the promoters of genes that are silenced in all tissues in a lineage-independent fashion (red boxes). The targeting and spreading to specific promoters across a 1-megabase region, unlike the whole-X-chromosome spreading of Xist, is mediated by the three-dimensional folding of the chromatin. (C) Active XH lncCARs exemplify cis targeting of lncRNAs to specific promoter regions of neighbouring protein-coding genes to maintain their transcriptional activation. In the model, the XH lncCARs either first bind to the WDR5-methyl transferase complex through the RNA-binding pocket of WDR5 and are then targeted (dashed black arrows) to chromatin at H3K4me2 (WDR5 reads H3K4me2), or they bind directly to H3K4me2-enriched chromatin (dashed red arrows) and act as a scaffold for the efficient docking of the WDR5-methyl transferase complex, which is necessary to maintain H3K4me2 levels and catalyse the conversion of H3K4me2 to H3K4me3. The maintenance of H3K4me2 marks is possibly mediated by a different WDR5-methyl transferase complex that is independent of the role of XH lncCARs.

Conclusions and Future Outlook
With the development of novel technologies to probe lncRNA-chromatin interactions and lncRNA-dependent regulation of chromatin structure, we now realize the extent of lncRNA involvement in chromatin organization. Despite improvements in our understanding of the role of lncRNAs in chromatin-based gene regulation, biochemical and molecular details are still missing. For example, the RNA-binding specificity of PRC2 and its regulation by lncRNAs have been debated [156][157][158] because of contradictory experimental evidence. Additionally, conclusions about the role of lncRNAs in interacting with or recruiting chromatin modifiers that are based on existing methodologies should be drawn with more caution, as exemplified by the recent observation that HOTAIR-mediated repression does not depend on interaction with PRC2 [159,160]. One essential question arising from these studies is whether there is a precise unifying mechanism for chromatin targeting of lncRNAs. There are two major impediments to understanding chromatin targeting of lncRNAs. Firstly, there is no unified effort to consolidate chromatin-associated lncRNA data generated by different methodologies and to follow this up with systematic validation of interactions for novel lncRNAs, rather than for the same handful of well-characterized lncRNAs such as HOTAIR, MALAT1, NEAT1 and MEG3. Mechanistic validation of more novel chromatin-bound lncRNAs would possibly lead to the identification of robust unifying principles that dictate these different modes of chromatin targeting. Secondly, barring a handful of lncRNAs, chromatin interaction maps for the majority of functional lncRNAs are lacking. Motif analyses of the chromatin interaction maps of a few lncRNAs reveal enrichment of GA-rich sequences. These polypurine-rich sequences (TFRs) are suggested to form triplexes with purine-rich RNA fragments (TFOs).
Two important aspects need to be considered regarding the specificity of triplex-mediated targeting of lncRNAs to chromatin. Firstly, a polypurine stretch in a lncRNA dictates its ability to form a triplex at genomic regions containing TFRs. One major drawback of this suggestion is that it lacks one-to-one specificity. The question then is whether any lncRNA with triplex-forming capability can be targeted to all "triplex-targetable" genomic locations, i.e., TFRs. Secondly, which factors initiate, promote and maintain triplex formation at target locations, and is there, in principle, any difference between cis and trans targeting of triplex-forming lncRNAs? In addition, it would be interesting to know whether lncRNAs with the potential to form triplexes have hitherto unknown conserved motifs or secondary structures that aid in targeting to chromatin. Addressing these issues would significantly enhance our understanding of the mechanisms that dictate lncRNA association with chromatin.
One of the interesting outcomes of the high-throughput approaches developed to probe RNA-chromatin interactions was the detection of large amounts of mRNA from protein-coding transcription units. The significance of this chromatin association of mRNAs from protein-coding genes is currently unclear. These mRNAs were also enriched in the chromatin fraction in techniques where transcription was inhibited using Actinomycin D. Many mRNAs are known to have noncoding variants, but the enrichment of noncoding variants was not much higher than that of protein-coding variants. Hence, it will be interesting to characterize the noncoding functions of protein-coding mRNAs in chromatin organization and gene expression. Another interesting possibility involves the role of transcription, whereby the participating protein complexes associate with RNAs in a sequence-dependent (the transcript per se) or sequence-independent manner and target them to chromatin, facilitating efficient formation of regulatory condensates via phase separation. Both active and inactive phase-separated condensates have been shown to coexist physiologically [161] and to carry out gene activation or inactivation, respectively [162][163][164]. Hence, exploring the functional role of RNA in the formation of phase-separated regulatory condensates is a very interesting possibility. Thus, it is now evident that, following the high-throughput technology revolution, RNA is taking center stage in modern biology research owing to its functional versatility.