3.1. Antibody-Independent Profiling of R-Loop Complexes: A General Approach
We devised a simplified, quick, and inexpensive method for the detection of DNA/RNA hybrid regions in a complex genome (
Supplementary Figure S1). We first validated it with telomere-associated TERRA, taking advantage of its abundance as a repetitive element with multiple sites of detection. Unlike the current DRIP assay, this is a direct approach that does not rely on antigenic recognition or RNase H recognition of the hybrid, and thus avoids the artifacts these may introduce. We developed a variant method to recover the stable DNA/RNA hybrids that are maintained during the TRIzol chloroform extraction procedure [
20,
21]. The TRIzol reagent is established to permit the sequential recovery of RNA, DNA, and proteins. Cells were first lysed with TRIzol buffer; adding chloroform then separated the aqueous and phenol/chloroform phases into a clear upper aqueous layer containing RNAs, an interphase, and a red lower layer containing DNA and protein. RNAs were recovered from the aqueous phase by isopropanol precipitation, while DNA was recovered from the interphase by ethanol precipitation. In the most generally used TRIzol chloroform extraction procedure [
20,
21], a small but reproducible amount of RNA is retained at the chloroform-water interface together with high molecular weight genomic DNA, while free RNA molecules of various sizes are found in the upper aqueous phase. The material recovered from the chloroform-water interface as a whole (genomic DNA with DNA/RNA hybrids) was dissolved in 10 mM Tris-HCl, 1 mM EDTA, and immediately ethanol precipitated to remove traces of TRIzol and chloroform. The pellet was then dissolved in proteinase K hydrolysis buffer to remove all proteins. DNA and DNA/RNA complexes were then recovered by chromatography on a DNA-binding Zymo-Spin™ column (Zymo Research Corp., Irvine, CA, USA) to eliminate the remaining impurities: free RNAs, proteins, and degraded nucleic acids. As illustrated below for the telomeric complex, a specified genomic region can be analyzed separately by cleaving genomic DNA with an appropriate restriction enzyme, followed by gel electrophoresis and direct transfer from the gel to a nitrocellulose filter, without a nucleic acid denaturation step. Negatively charged single-stranded nucleic acids (single-stranded DNA and RNAs) are efficiently retained on the positively charged nitrocellulose membrane, while double-stranded DNA fragments are poorly retained or not retained at all. These conditions, which allow the transfer of single-stranded RNA or DNA fragments while double-stranded fragments are not retained on the membrane, were previously documented in T. Maniatis’ classic molecular biology manual (Molecular Cloning: A Laboratory Manual). After transfer to the nitrocellulose membrane, the signals are revealed with a specific, validated telomere probe of (CCCTAA) [
8] repeats, which detects the repetitive telomeric motifs of the leading-strand DNA and of TERRA fragments. Alternatively, whole preparations of the DNA-associated RNA were obtained by extensive hydrolysis with pancreatic DNase followed by RNA purification on a Zymo-Spin™ RNA-binding column (Zymo Research Corp., Irvine, CA, USA). This yields the DNA-associated fraction of the RNAs, here called the D-RNA fraction, which was compared by RNA sequencing with the R-RNAs of the free fraction (see
Supplementary Figure S1).
3.2. The Telomeric TERRA Complex
The ends of the linear chromosomal DNA molecules of human and mouse consist of long TTAGGG/CCCTAA repeats. After the discovery of the homologous UUAGGG-repeat TERRA [
4,
5,
6,
7,
8,
9,
10,
11,
20,
21,
22], the DNA telomeric repeats were thought to be engaged in three-stranded R-loops with TERRA. Attempts to directly show evidence of R-loop structures in normal tissues have so far been only partially successful. R-loops could be revealed by DRIP assay mostly in cell cultures from two human pathologies, namely telomerase-negative (ALT) cancers [
23] and ICF syndrome cells [
24]. The structure of the telomeres of healthy human cells, as well as of any murine cells, remains to be directly shown. One main site of TERRA transcription was identified in mouse and human cells [
9,
25], and it has even been considered that transcription from individual sub-telomeric promoters on every chromosome is a unique characteristic of a class of malignancies (ALT cancers).
Here, we generated results in parallel on human and mouse sperm cells. Sperm cells were chosen for the first trial because they are transcriptionally silent and most cytoplasmic RNAs have already been removed at the compaction stage of spermiogenesis, which minimizes the risk of contamination by cytoplasmic RNAs. The total nucleic acid fraction recovered from the TRIzol chloroform-water interface was ethanol precipitated and after purification (see
Supplementary Figure S1 and Methods for protocol) was subjected to Msp1 (CCGG) restriction enzyme cleavage to resect the telomeric DNA sequences at the first upstream CCGG site, on average less than 2000 bp from the repeats on every chromosome. Each sample was then subjected to gel electrophoresis in 8% acrylamide, which produced a sharper band, followed by electro-transfer to nitrocellulose under conditions that bind single-stranded RNA and DNA. Without the NaOH step before transfer, fully double-stranded nucleic acids are not retained on the nitrocellulose membrane; only single-stranded nucleic acid fragments (DNA or RNA) or hybrid molecules that combine a single-stranded DNA or RNA region with double-stranded segments, such as telomeric R-loop structures, are retained. Hybridization with oligonucleotide probes complementary to the telomeric repeats and TERRA revealed a strong, unique, homogeneous signal. Identical results were generated in every cell type tested, exemplified in
Figure 1, for three healthy laboratory mouse lines (
C57BL/6,
B6D2, and
129/sv) and the three tissues analyzed, brain (B), total testis (T), and purified epididymal sperm (S).
Figure 1a shows DNA/RNA complex detection: the pattern of the material from the chloroform-water interface (the DNA-associated RNA fraction), obtained by the assay that we named “Tatar-blot” for Telomere-associated TERRA. The probe (CCCTAA)n used in
Figure 1a reveals leading-strand (5′-3′) telomeric DNA and, if present, UUAGGG TERRA (see
Figure 2 for evidence). Moreover, only complexes containing single-stranded DNA, RNA, or DNA/RNA hybrid segments allow the DNA fragments to be retained on the nitrocellulose membrane in the absence of a step denaturing double-stranded fragments. With the same conditions and protocol applied to human sperm, we also observed one major signal. Visualization of the complex requires resection from chromosomal sequences. The clearest results were generated by cleavage with the Msp1 restriction enzyme at CCGG sites, which are present at least once in the immediate subtelomeric region, less than 1–2 kb from the ends of all mouse and human chromosomes. The contribution of chromosomal sequences is thus minimal, which facilitates transfer and retention on the nitrocellulose membrane. The other restriction endonucleases tested, BamH1, Bgl2, Alu1, Mse1, and Dpn1 (
Figure 1b) did not generate comparable results, most probably because they cleave at greater and variable distances from the telomeres, thus generating larger double-stranded fragments, a limiting factor for efficient retention on nitrocellulose.
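The reasoning behind the choice of enzyme can be illustrated with a short sketch. Assuming a hypothetical chromosome-end sequence supplied as a 5′→3′ string that ends in the TTAGGG tract, the function below estimates how much subtelomeric DNA would remain attached to the repeats after cleavage at the nearest upstream recognition site. The function names, the threshold of four repeats used to define the start of the tract, and the decision to ignore the exact cut position within the site are illustrative assumptions and not part of the protocol.

```python
# Recognition sequences of some of the enzymes mentioned in the text.
SITES = {"Msp1": "CCGG", "BamH1": "GGATCC", "Alu1": "AGCT"}


def tract_start(seq, unit="TTAGGG", min_repeats=4):
    """Index of the first run of at least `min_repeats` consecutive units, or None."""
    pos = seq.find(unit * min_repeats)
    return pos if pos != -1 else None


def flank_length(seq, enzyme, unit="TTAGGG", min_repeats=4):
    """Length of subtelomeric DNA between the nearest upstream site and the repeats.

    Measured from the end of the last recognition site located upstream of the
    repeat tract to the first repeat. Returns None when no tract or no upstream
    site is found.
    """
    start = tract_start(seq, unit, min_repeats)
    if start is None:
        return None
    site = SITES[enzyme]
    # rfind restricted to seq[0:start] only reports sites lying fully upstream.
    pos = seq.rfind(site, 0, start)
    if pos == -1:
        return None
    return start - (pos + len(site))
```

On a toy sequence, an enzyme whose site lies close to the repeats leaves a short flank, whereas one cutting further upstream leaves a longer double-stranded fragment, which the text above identifies as a limiting factor for retention on nitrocellulose.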
Figure 1b shows digestion with the BamH1, Msp1, HpaII, and Alu1 restriction enzymes of nucleic acid material collected from mouse tissues (brain, testis, and sperm) and human saliva. In agreement with previous observations of increased methylation of CpG sites in subtelomeric regions [
26], cleavage with HpaII, a methylation-sensitive isoschizomer of Msp1, failed to show evidence of the complex (
Figure 1b). Specificity was determined by hybridization of the transfers to radioactive probes for other repetitive (SINE, Xist-A) or single-copy genomic sequences (sequences can be seen in
Supplementary Table S2), none of them generating a significant signal. Electrophoretic migration of the complexes was always equivalent to that of a high molecular weight linear DNA molecule (
Figure 1a,b) but was clearly not informative as to the actual size. With a universal probe that reveals all telomeric repeats, the sizes of the fragments were not studied, both because of the complex structure of R-loops with variable repeat length and because R-loop migration under these conditions does not separate the fragments sufficiently to visualize individual chromosomes. In addition, omitting the NaOH and HCl treatments in the “Tatar-blot” analysis is the only way to keep the R-loop structure of the DNA/RNA hybrid intact. This is why the size of the fragments could not be determined at this level of analysis.
When the same extracts were further processed for Southern blot analysis on agarose gels (0.8%) followed by 45 min in 0.5 N sodium hydroxide (a condition that denatures double-stranded DNA and removes all RNA), transfer and hybridization (
Figure 1b) revealed high molecular weight signals and, below them, a weaker smeary profile (right, with longer exposure of the autoradiogram) starting at the expected large size of the mouse telomere (>12 kb). Mouse telomeres are known to be long and of variable length, and separating such long DNA fragments requires pulsed-field gel electrophoresis; routine agarose gel electrophoresis does not resolve them. To reveal large DNA fragments after pulsed-field gel electrophoresis, NaOH and HCl treatment of the agarose gel is required before transfer to the membrane. Because NaOH and HCl treatment destroys the R-loop hybrids, it could not be used in the “Tatar-blot” protocol.
In addition, these complexes did not appear to depend on telomerase, as they were observed in three generations of telomerase-negative mouse sperm cells
Terc−/− (
Figure 1c). The same profile was generated by the probe CCCTAAn (
Figure 1c top), which revealed TERRA and a single-stranded leading telomeric DNA strand, and the TTAGGGn probe (
Figure 1c bottom), which revealed a lagging DNA strand. These results indicate that the transferred complexes bound to the membrane contained both single-stranded telomeric DNA and RNA. The overall signals between the first and third generation of
Terc−/− did not vary much, possibly because the timing of telomere shortening in sperm cells differs from that reported for somatic cells; this last point requires further investigation. The same results were also obtained with
Tert−/− sperm (not shown). In the case of the
Tert−/− and
Terc−/− mutants, the complexes could also be detected in embryonic fibroblast cultures from homozygous crosses during the first three generations, in which the mutants maintain telomeres [
27,
28] and after immortalization in a culture of mutant cell lines (M.R., unpublished).
Complexes with the same electrophoretic profiles were identified in different somatic cell types. To minimize contamination by cytoplasmic RNA, TRIzol extraction was applied twice to the somatic cells. DNA-associated RNAs remain stably associated with genomic DNA even after two rounds of TRIzol treatment. Comparable profiles and amounts were found in every mouse cell type tested, from embryonic fibroblasts to adult tissues (testis, brain, liver, and kidney) as well as cultivated cell lines and short-term cultures of differentiated cells; the profiles in
Figure 1a,b are exemplified for mouse brain and testis extracts. This was also the case for human tissues (
Supplementary Table S1), including saliva, blood, and sperm (
Figure 1a,b). Identical electrophoresis profiles were generated for murine and human sperm extracts by probes against the CCCTAA motif (
Figure 1a). To evaluate nucleoplasmic free TERRA molecules, RNAs were precipitated with isopropanol from the water phase of the TRIzol extraction and analyzed in
Figure 1d by Northern blot assay (1% denaturing agarose gel with formaldehyde); the oligo probe (CCCTAA motif) revealed the major 3 kb TERRA fragments recovered from the TRIzol aqueous fraction. The quantity of free TERRA per cell number (10⁶ cells/lane) in the extract was much higher in the testes and brains than in sperm cells.
Properties of the complexes were compatible with the R-loop structure schematically drawn in
Figure 2a, in which the G-rich strand of telomeric DNA is displaced by the TERRA hybrid. To evaluate nucleic acid complexes in the Tatar assay (
Figure 2b), nucleolytic attack was performed by incubating extracts (20 µL) for 30 to 60 min with either RNase A at 0.5 µg/µL (which removed only free strands of RNA and not RNA hybridized with DNA), RNase H at 1 U/µL (which removed only RNA hybridized with DNA), or DNase at 10 U/µL (
Figure 2b), which removed DNA. The electrophoretic signal disappeared after incubation with DNase (removal of all DNA), pancreatic RNase (removal of all RNA), formaldehyde (complete denaturation of R-loop complexes), or RNase H, while it was not affected by RNase A (
Figure 2b). The higher signal with RNase A could be explained by the removal of free RNA strands not hybridized with DNA, which could facilitate migration and transfer of the complexes to the membrane. Another feature of the complex was its slower rate of migration during gel electrophoresis performed in the presence of 20 µM PhenDC3(4), which reacts with G-quadruplexes [
29] that blocked the structure (
Figure 2b) as the band was shifted to the top of the gel.
The same results were also generated in parallel on a series of human and mouse cells and ex vivo organ extracts, mostly from normal cells but including cultured cell lines (
Supplementary Table S1), and the human U-2 OS cancer cell line (ATCC® HTB-96™) as a positive control, in which DNA/RNA telomeric hybrids have previously been identified by DRIP [
23].
3.3. Transcription of Individual Telomeres from Local Subtelomeric Promoters
A key aim of this paper was to establish whether the observed structure was present in only one or a few telomeres or, on the contrary, was a general feature of the genome. It has been generally assumed that R-loops are generated by transcription from adjacent promoters. On the other hand, a major part of TERRA has been reported in cell culture to be transcribed from chromosome 18 in mouse cells and from chromosome 20q in humans [
9,
25] so that our data are so far compatible either with a unique R-loop on these chromosomes or with binding in trans to other chromosomes of RNAs from the mouse 18 and human 20q loci. However, unique subtelomeric sequences of different chromosomes appeared to be represented in the RNA of the DNA-associated fraction. We generated probes for the sequences between the first 5′ Msp1 site and the G-rich telomeric repeats of chromosomes 9, 17, 18, 19, X, and Y, and probes for the complementary strand up to the C-rich repeats (indicated “xxx/x’x’x’” in
Figure 2a). By using the UCSC genome browser Blat tool [
30], we ensured that these flanking regions are unique in the whole genome. All the probes (
Supplementary Table S2) generated positive hybridization signals in the complex revealed by “Tatar blot” as illustrated in
Figure 3 for the sub-telomeric sequences of chromosomes 9, 17, 18, and Y. As controls, neither chromosomal S6 (a repetitive sequence) nor any of the probes for sequences on the centromere side of the terminal Msp1 restriction sites generated signals. “Tatar blot” analysis thus indicates specific nascent TERRA from each chromosome end.
We further ascertained by Illumina high-throughput analysis of the DNA-bound RNA molecules (see
Supplementary Figure S1b for RNA preparation) that their sequences are compatible with transcription in each telomere from a local promoter (
Figure 4;
Supplementary File S2). Starting from purified mouse sperm, we first confirmed by RNA-seq that a fraction of the DNA-associated RNA molecules, amounting to 0.1 percent of the reads, showed the characteristic UUAGGG TERRA repeats (minimum four repeats,
Supplementary Figure S3c). We then searched the RNA sequence libraries for chromosomal sequences next to the repeats and extended the search in the 5′ direction to collect upstream flanking regions in
Figure 4a. We again ensured that these flanking regions are unique in the whole genome. As exemplified for chromosomes 2 and 10 in
Figure 4b, their 5′ extension in the mouse genome indicated that nascent TERRA originates in each case from promoters in the immediate sub-telomeric region. Complementary evidence for the synthesis of these RNAs from distinct promoters is shown in
Figure 4b, with the heat map showing the percent identity matrix between each pair of sequences of chromosomes 2 and 10. The maximum identity is 93%, thus making it impossible that these reads could originate from a unique chromosome (
Figure 4b). The available sequence data and libraries allowed us to reach the same conclusion for chromosomes 2, 3, 5, 10, 12, 13, 14, 16, 19, and X (see
Supplementary Figure S2). R-loop structures involving the products of local transcription therefore appear to be a general feature, especially considering some constraints of the analysis: for instance, in the latest mouse assembly (mm10), chromosomes 4, 6, X, and Y remained to be sequenced up to the repetitive telomeric tract. Testes samples TD1 and TD3 also contain nascent TERRA in their DRNA fractions, as shown in
Supplementary Figure S2. In conclusion, Illumina high-throughput analysis of the DNA-bound RNA molecules completely confirmed the results of the molecular analysis of the “Tatar blot”.
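The percent identity comparison described above can be sketched as follows. This is a minimal illustration, assuming the subtelomeric flanking sequences have already been aligned to equal length by an external aligner, with "-" marking gaps; the function names, the gap convention, and the choice to skip columns where both sequences carry a gap are assumptions made here and do not describe the pipeline actually used for Figure 4b.

```python
def percent_identity(a, b, gap="-"):
    """Percent of aligned columns in which two pre-aligned sequences agree.

    Columns where both sequences carry a gap are skipped; a gap opposite a
    base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Pairwise percent identity for a dict mapping sequence name to aligned sequence."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in names for q in names}


def max_offdiagonal(matrix):
    """Highest identity between two different sequences, or None if there is only one."""
    values = [v for (p, q), v in matrix.items() if p != q]
    return max(values) if values else None
```

A maximum off-diagonal identity clearly below 100% is the kind of observation the text uses to argue that the reads derive from distinct chromosome ends rather than from a single locus.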
DNA-associated TERRA found at every telomeric end in mouse and human sperm cells is consistent with the idea that TERRA is expressed from subtelomeric promoters and remains associated with the telomere ends. The TERRAs that remained bound to the sperm genome were generated during the last wave of transcription before the compaction step at the late stage of spermiogenesis. Our results need to be compared with previous reports of major TERRA promoters on chromosomes 18 (mouse) and 20q (human) [
9,
25]. The contradiction is, in fact, only apparent, since these promoters were identified in experiments in which TERRA was prepared by standard TRIzol extraction from abnormal cells such as cancer cells, and thus corresponds to the nucleoplasmic RNA fraction (free RNA fraction in the aqueous phase) thought to be involved in other, non-telomeric functions [
31,
32]. Thus, it is important to recover TERRA from both nucleoplasmic and DNA-associated fractions.
3.4. Extension of the Analysis to Other DRNAs with UUAGGG Repeats
Since the association of TERRA transcripts with telomeres could be revealed by the proposed method, we also asked if TERRA hybrids located at defined non-telomeric sites could be detected from data generated by Illumina high-throughput analysis of the DNA-bound RNA molecules. In order to try to distinguish stable complexes of R-loops [
8,
33] and the transient association of the nascent RNAs during the process of transcription, we attempted to verify the data from the mouse DRNA-seq (mouse sperm and testes). For a general evaluation of the RNA molecules associated with DNA (indicated “DRNAs”), the RNA-seq analysis was extended to all genomic regions containing transcripts in mouse sperm (indicated D1 and D3) and in total testicular tissue (TD1 and TD3) (
Supplementary Figure S3a,b).
Genome-wide profiling shows representative landscapes of expression signals of chromosomes 2, 10, 12, and 18 (see
Supplementary Figure S4). For each chromosome, the green bar shows the location of TERRA-homologous regions (minimum four consecutive TTAGGG repeats, necessary to form respective G-quadruplex structures), which are present at several chromosomal locations in DRNA fractions.
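The criterion of at least four consecutive repeats can be expressed as a simple pattern search. The sketch below is illustrative only: it assumes sequences are plain strings, accepts either T or U so that both DNA-style read sequences and RNA notation match, and searches a single strand without considering the reverse complement. The function names are hypothetical and the code is not the analysis pipeline used for the supplementary figures.

```python
import re

# At least four consecutive TTAGGG (or UUAGGG) units.
TERRA_RE = re.compile(r"(?:[TU]{2}AGGG){4,}")


def terra_tracts(seq):
    """Return (start, end, n_repeats) for each run of >= 4 consecutive repeats."""
    seq = seq.upper()
    return [(m.start(), m.end(), (m.end() - m.start()) // 6)
            for m in TERRA_RE.finditer(seq)]


def terra_fraction(reads):
    """Fraction of reads that contain at least one qualifying repeat run."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if terra_tracts(read))
    return hits / len(reads)
```

Applied to a chromosome sequence, `terra_tracts` gives the coordinates of candidate TERRA-homologous regions; applied to a read library, `terra_fraction` gives the proportion of reads meeting the four-repeat criterion.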
In addition, research on known non-telomeric TERRA sites has identified a fraction of TERRA transcripts near the sex chromosome that undergo pairing in both sexes, the inactive X chromosome (Xi) of female cells, and the Y [
32] chromosome (see
Figure 5a). The Asmt locus on the X chromosome and Erdr1 locus on the Y chromosome have been proposed to pair together via UUAGGG repeats. An example of transcripts is visualized (
Figure 5a) within the pairing center in the testes and sperm RNA data. As shown in
Figure 5a, DRNA transcripts from the Erdr1 locus with UUAGGG repeats were present on the Y chromosome, whereas the Asmt transcripts were present on the X chromosome. These results indicate retention of PAR-TERRA transcripts from the pseudoautosomal regions of X (Asmt) and Y (Erdr1) in the DNA fraction of mouse sperm and testes. Other regions with non-telomeric TERRA sites are exemplified in
Figure 5b, on chromosomes 2, 9, 14, and Y.