Variability of Human rDNA and Transcription Activity of the Ribosomal Genes

Human ribosomal DNA is represented by hundreds of repeats in each cell. Every repeat consists of two parts: a 13 kb long 47S DNA with genes encoding 18S, 5.8S, and 28S RNAs of ribosomal particles, and a 30 kb long intergenic spacer (IGS). Remarkably, transcription does not take place in all the repeats. The transcriptionally silent genes are characterized by the epigenetic marks of the inactive chromatin, including DNA hypermethylation of the promoter and adjacent areas. However, it is still unknown what causes the differentiation of the genes into active and silent. In this study, we examine whether this differentiation is related to the nucleotide sequence of IGS. We isolated ribosomal DNA from the nucleoli of human-derived HT1080 cells, and separated methylated and non-methylated DNA by chromatin immunoprecipitation. Then, we used PCR to amplify a 2 kb long region upstream of the transcription start and sequenced the product. We found that six SNVs and a series of short deletions in a region of simple repeats correlated with the DNA methylation status. These data indicate that variability of IGS sequence may initiate silencing of the ribosomal genes. Our study also suggests a number of pathways to this silencing that involve micro-RNAs and/or non-canonical DNA structures.


Introduction
Human ribosomal DNA is represented by hundreds of repeats or units in each cell. Every repeat consists of two parts: a 13 kb long 47S DNA with genes encoding 18S, 5.8S, and 28S RNAs of ribosomal particles, and a 30 kb long intergenic spacer (IGS) [1][2][3]. The transcription of rDNA effected by RNA polymerase I (pol I) is very intensive; however, it does not take place in all the repeats. The transcriptionally active genes are characterized by the epigenetic marks of the active chromatin, including hypomethylation of the promoter and adjacent IGS areas [4][5][6][7]. A significant part in the regulation of the transcription activity is played by IGS. It has a complex structure and contains various functional or supposedly functional regions producing non-coding RNAs, which participate in pol I activity regulation, as well as stress reactions and other processes in the cell nucleus and cytoplasm [8,9]. Especially important for rDNA silencing is a 2 kb long regulatory region upstream of the transcription start site (URR). This region includes sequences producing two long non-coding RNAs (lncRNA), promoter RNA (pRNA), and promoter and pre-rRNA antisense RNA (PAPAS). The first of these molecules is transcribed by pol I from a region beginning at about −2 kb and stretching into the 47S DNA. The primary transcript is processed into a set of 150-300 nt functional molecules, which can inhibit pol I transcription initiation through the recruitment of the nucleolar remodeling complex (NoRC) [10,11]. PAPAS is produced in the anti-sense direction by RNA polymerase II (pol II) from a region covering the URR and a large part of the ribosomal genes; this lncRNA is actually represented by a heterogenous pool of 12-16 kb long transcripts arising from multiple start sites. PAPAS can inhibit rDNA transcription in different ways, mainly by recruiting the histone methyltransferase Suv4-20H2 to the rDNA promoter [12][13][14]. Apparently, both IGS and 47S DNA encode many other transcripts contributing to the regulation of pol I activity. They include various types of small RNAs, especially micro-RNAs (miRs) and RNAs produced by the transposable Alu elements [8]. Recently, it has been found that non-canonical structures, particularly abundant in the rDNA locus, such as R-loops, G-quadruplexes, intercalated motifs, and triplexes, may also be important factors in the silencing/activating of ribosomal genes [15][16][17][18][19]. E.g., pRNA and PAPAS were found to form triplexes in IGS, which seem to be an indispensible condition for the silencing effected by these lncRNAs [19].
However, although many factors engaged in the locus regulation have been discovered, the essential question remains unanswered: how are the rDNA repeats differentiated into active and silent? That is to say, why do some repeats pass into a state favorable for transcription, i.e., acquire the proper chromatin structure and mode of regulatory noncoding RNA expression, while the other repeats in the same cell remain silent?
The answer is probably suggested by the fact that the rDNA locus is highly variable, probably because of the multiplicity of the units, which causes instability [9]. In this study, we examine whether the differentiation of rDNA into active and silent units is related to the nucleotide sequence of IGS. We isolated ribosomal DNA from the nucleoli of human fibrosarcoma cells (HT1080), and separated methylated and non-methylated DNA by chromatin immunoprecipitation with antibodies against 5-methylcytosine. Then, we used PCR to amplify the regulatory region URR and sequenced the products to find the frequencies of all the significant variants. Our study reveals a correlation between the sequence variability in the region and its methylation status, indicating that certain single nucleotide substitutions and short deletions may play a key role in silencing ribosomal genes. The results of this work also suggest a number of pathways to this silencing that involve micro-RNAs and/or non-canonical DNA structures.

General Characteristics of the Observed Variability in the HT1080 Cells
In order to enrich the rDNA content, the DNA extracted from the isolated nucleoli was separated into the fractions of predominantly methylated and predominantly nonmethylated with subsequent PCR amplification of the 2 kb long regulatory region upstream of the transcription start (URR) ( Figure 1). Then, both fractions were subjected to deep sequencing (see Methods), which revealed multiple variants determined in reference to the human rDNA GenBank sequence U13369.1 (https://www.ncbi.nlm.nih.gov/nuccore/U1 3369.1?report=graph&from=19514, accessed on 1 October 2022).  unit including genes encoding 18S, 5.8S, and 28S ribosomal RNAs; intergenic spacer (IGS); promoter with upstream control element (UCE) and core promotor element (CPE); promoter and pre-rRNA antisense (PAPAS) and promoter-associated RNA (pRNA). (C) The regulatory region upstream of the transcription start site (URR); the fragments amplified by PCR are shown below. Variants with significant difference of methylated and non-methylated state are indicated by red arrowheads. Yellow rectangles represent regions of simple repeats Q1 and Q2. The green rectangle corresponds to the gene encoding hsa-mir-6724. Two green semicircles indicate regions corresponding to miRs produced outside the rDNA locus.
The results are presented in Tables 1 and 2. Among the variants that appeared with the frequency >5%, we found totally: 33 SNVs, 5 simple inserts and deletes (indels), and a number of variously sized (from 1 to 34 bp) deletions in two regions of simple repeats (CCCT)n: Q1 (41667-41686; n = 5) and Q2 (41842-41877; n = 9), both flanked by CT and GC upstream and downstream, respectively. One SNV and one insert (at 41006 and 41008) belonged to the Alu element; one SNV at 42945 is situated between the UCE and CPE of the rDNA promoter. In all, 21 variants occurred with low frequency (5-30%). In addition, 13 variants seem to be a specific feature of the studied cell line, since they appeared in 70-100% reads. Moreover, three variants were found in 30-70% reads and, accordingly, the respective positions may be called hot spots of variability. Table 2. Deletions in the methylated/non-methylated regions of simple repeats Q1 and Q2. The frequencies were counted from the total number of deletions. To assess the correlation of these deletions with methylation status, we selected only the reads that covered entire Q1 or Q2 regions plus 5 nucleotides upstream and downstream.

Frequency
Selected

Comparing Variants in the Methylated and Non-Methylated Pools
Most variants did not significantly correlate with the methylation status. Among SNVs, significant correlation was observed at only seven positions: highly variable SNV at 41574; and low variability SNV at 41574, 41675, 41679, 41680, 41831, and 42329. All these variants appeared more frequently in the methylated than in the non-methylated fractions of rDNA.
Since the repeats with CCCT motif seem to be optimal for the formation of noncanonical DNA structures, especially G-quadruplexes, we assembled the deletions of various lengths in the regions Q1 and Q2 as separate groups of data (Table 2). Comparing the deletion frequency in the methylated and non-methylated fractions, we observed a significant difference in Q1, but not in Q2 (Table 2). In the latter, almost all the repeats had deletions independently of the methylation status. Q1 and Q2 differed by the length of the deletions; however, in both regions, this length was expressed predominantly in fourfold numbers from 4 to 32 (Table 3). Each deletion in Q1 and Q2 typically had on its ends the motif CTCC. , also belonged to the Q1 region and reduced the number of repeats in it. It was, thus, the general tendency that the predominantly methylated fraction of rDNA was more variable, in particular, with less CCCT repeats than the predominantly non-methylated sequences.

Search for Micro-RNAs (miRs) Corresponding to the Discovered Variants
Looking for plausible pathways that may connect variations of the rDNA sequence with its transcription activity, we used the publicly available miRBase (https://www. mirbase.org/search.shtml, accessed on 1 October 2022), and searched for RNA and DNA sequences matching with areas containing the discovered variants. Two methods were employed for this purpose: (1) For each studied position n we selected (from the reference database) a series of 20 bp long sequences containing n and beginning at n − 19, n − 18, . . . , n. These 20 sequences were subsequently entered in the database. (2) For each variable position n, we selected one 49 bp long sequence situated between n − 24 and n + 24, which was searched in the database with BLASTN and SSEARCH. Both searches were run with default parameters and a species restriction set to "human". Search results were sorted and evaluated based on e-value; however, match length and percent identity were taken into consideration as well.
Applying these procedures to the six discovered SNVs significantly correlating with rDNA methylation, we found three miRs matching the small areas that contained some of these variants (Table 4). One miR thus corresponded to three SNVs; the other two miRs each corresponded to one SNV. Remarkably, hsa-miR-6724 is transcribed from the 92 bp long gene lodged within IGS, upstream of the promoter. SNV42329 appearing in about 30% of reads alters this gene. The other two miRs are transcribed from the sequences outside the rDNA. Hsa-miR-4769 partly matches to one of the (CCCT) boxes, namely Q1. This region includes three SNVs as well as various deletions. Hsa-miR-6739 matches to a region downstream of the position 41831; however, the discovered variant G>T (C>A in the anti-sense strand) moves this position into the miR consensus region.

Discussion
In this study, we established a correlation between the sequence variability and methylation status of the rDNA regulatory region. Since this status reflects the state of transcription activity [7,20], our data suggest that variability of the IGS sequence in individual rDNA units can initiate selective silencing/activation of the ribosomal genes (Tables 1-3). SNVs in six positions and deletions in the region Q1 may be the first links in the chain of events resulting in the inhibition of the transcription. Moreover, our data seem to indicate how the sequence variability can be translated into a crucial shift of methylation and transcription status. In this respect, the region Q1 is particularly remarkable; its variability included deletions as well as three SNVs (41675; 41679; 41680) (Tables 1 and 2), all of which reduced the number of CCCT repeats. Importantly, these repeats are favorable for the formation of non-canonical DNA structures. The regular patterns with abundant cytosines often generate intercalated motifs (i-motifs), the quadruplex structures, which may participate in the regulation of gene expression [21][22][23]. Still more probable in the Q1 region is the formation of another quadruplex structure, G-quadruplex (G4), based on the complementary strand region (GGGA) n . Indeed, we find here a perfect agreement with the formula used for the prediction of G4 structures: G a N 1 G b N 2 G c N 3 G d N 4 , where G signifies guanine; a, b, c ≥ 3, N 1,2,3,4 are intermediate loop sequences of 1 to 7 nt [24]. In our case, a, b, c = 3; N i = 1. Recent studies have shown that G4 structures not only destabilize the function of the genome, but may also contribute to the transcription regulation in rDNA and other regions of the genome [25]. Accordingly, it seems probable that G4 in the intact Q1 region should inhibit production of the pRNA transcript. On the other hand, the deletions and SNVs reducing the number of CCCT repeats would prevent G4 formation and, thus, facilitate pRNA expression. Now, since pRNA causes the hypermethylation of rDNA and silencing of the ribosomal genes, the discovered variants in the Q1 region are apparently likely to contribute to, or even initiate, the transcription suppression. The same variability may similarly affect the expression of PAPAS, though perhaps only in a fraction of cells, for this lncRNA is usually transcribed in stress conditions [26].
The situation of two other SNVs (41831 and 42329) suggests additional pathways connecting the sequence variability and gene silencing. These variants are positioned in the areas well matched to certain miRs and there are reasons to believe that miR functions may be disrupted by single-nucleotide replacement [27,28]. SNV42329 is located within the gene of hsa-miR-6724 (42320-42342); and the putative targets of this miR seem to be related, for the most part, to cancer or cancer-linked pathways, such as MAPK signaling and ERBB signaling, which are closely connected to the rDNA methylation and transcription [29]. This indicates that SNV42329 may have an impact on the expression of ribosomal genes. SNV41831 makes the upstream end of an area matched to hsa-miR-6739. This miR is produced outside of the rDNA locus. Nevertheless, it still can have an impact on the transcription activity of rDNA, because R-loops, which are often generated in the locus by pol II and other factors, may function as regulators of gene expression or primers for replication [16,19,30,31]. The single DNA strands of the R-loops would enable miRs to bind to their consensus sequence in IGS. Thus, SNV41831 may also affect rDNA transcription through interaction with hsa-miR-6739, although the particulars of the process are still not clear. Finally, hsa-miR-4769 partly matches with the quadruplex favorable region Q1, where we find three SNVs: 41675; 41679, and 41680 (Table 3). However, the quadruplexes, especially G4, are mutually convertible with R-loops, and the two non-canonical states sometimes exist in a dynamic equilibrium [32,33]. This would enable hsa-miR-4769 to form a new R-loop or to stabilize an existing one by binding to the anti-sense strand of the region, when it is released from the quadruplex state. Another obstacle for the transcription of pRNA would thus be created. However, the three SNVs would prevent the interaction of the miR with Q1 and facilitate pRNA expression, which would result in silencing of the ribosomal genes.
Taken together, our results indicate that a number of SNVs and short deletions, which correlate with DNA methylation of the rDNA regulatory region, may contribute to or initiate silencing of the locus. All these variants, except for one SNV, proved to be related to the functional/potentially functional regions of rDNA, which allowed us to suggest certain hypothetical pathways connecting this silencing with the DNA variability: by suppressing non-canonical structures generated in IGS and facilitating pRNA transcription; -by altering the structure of an miR produced in IGS; -by interfering with the activity of miRs transcribed outside the rDNA locus.

Cell Culture
The human fibrosarcoma cell line, HT1080, was obtained from American Type Culture Collection (Rockville, MD, USA). The cells were maintained in complete DMEM Dulbecco's modified Eagle medium containing 10% FBS and 1% penicillin-streptomycin. All the chemicals were obtained from Merck Life Science (Prague, Czech Republic) unless otherwise stated. Cell cultures were maintained at 37 • C, in a humidified atmosphere of 5% CO 2 .

Extraction of the Nucleoli
Nucleoli of HT1080 cells were isolated as follows [34,35]. In short, the cells were homogenized in a hypotonic buffer (10 mM HEPES, 10 mM KCl, 1.5 mM MgCl 2 , and 0.5 mM dithiothreitol; pH 7.9) using a pre-cooled glass Dounce homogenizer. The nuclear pellet was resuspended in 0.25 M sucrose (plus 10 mM MgCl 2 ) and cleared by centrifugation over a cushion of S2 solution (0.35 M sucrose and 0.5 mM MgCl 2 ). The nuclei were resuspended in S2 solution and sonicated at high amplitude in 15 s intervals, followed by phase-contrast monitoring for nuclear disruption. The sonicated sample was layered onto a cushion of 0.88 M sucrose and 0.5 mM MgCl 2 , and the nucleolar pellet was recovered after centrifugation. All the steps were performed at 4 • C.

Methylated DNA Immunoprecipitation
The DNA was extracted from the nucleoli by a QIAmp DNA Mini Kit (QIAGEN, Hilden, Germany). Methylated DNA immunoprecipitation (MeDIP) was performed using the Methylated DNA Immunoprecipitation Kit (Abcam, Cambridge, UK) according to the manufacturer's protocol. The total isolated DNA was sonicated (five pulses for 25 s at high amplitude with 45 s intervals between pulses while resting on ice) to yield DNA fragments of 500-1000 bp in length. We used 1 µg of fragmented DNA for the immunoprecipitation. After heat denaturation, the DNA was incubated for 2 h at room temperature with a 1 µL anti-5 mC antibody containing 1 µL of normal mouse IgG as the negative control. The supernatant was collected as a nonmethylated DNA fraction. The methylated DNA was released from the DNA/antibody complex using proteinase K at 65°C for 1 h. Subsequently, the samples were purified in spin columns and eluted. Both DNA fractions were amplified by PCR with the primers listed in Table 5.

Sequencing and Bioinformatic Analysis
PCR products were sequenced on an Illumina MiSeq using the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, USA). The sequencing data were uploaded to the Galaxy web platform and we used the public server at usegalaxy.org for bioinformatic analysis [36]. Mapping was conducted by BWA-MEM. [37]. FreeBayes was used to identify sequence variants [38]. An IGV (integrated genomics viewer) was used for data visualization [39]. The numbers of reads overlapping Q1 and Q1 repeats ( Figure 1, Table 2) were obtained with scripts employing samtools, bedtools, and other in-house scripts.