Frequent Germline and Somatic Single Nucleotide Variants in the Promoter Region of the Ribosomal RNA Gene in Japanese Lung Adenocarcinoma Patients

Ribosomal RNA (rRNA), the most abundant non-coding RNA species, is a major component of the ribosome. Impaired ribosome biogenesis causes the dysfunction of protein synthesis and diseases called “ribosomopathies,” including genetic disorders with cancer risk. However, the potential role of rRNA gene (rDNA) alterations in cancer is unknown. We investigated germline and somatic single-nucleotide variants (SNVs) in the rDNA promoter region (positions −248 to +100, relative to the transcription start site) in 82 lung adenocarcinomas (LUAC). Twenty-nine tumors (35.4%) carried germline SNVs, and eight tumors (9.8%) harbored somatic SNVs. Interestingly, the presence of germline SNVs between positions +1 and +100 (n = 12; 14.6%) was associated with significantly shorter recurrence-free survival (RFS) and overall survival (OS) by univariate analysis (p < 0.05, respectively), and was an independent prognostic factor for RFS and OS by multivariate analysis. LUAC cell line PC9, carrying rDNA promoter SNV at position +49, showed significantly higher ribosome biogenesis than H1650 cells without SNV. Upon nucleolar stress induced by actinomycin D, PC9 retained significantly higher ribosome biogenesis than H1650. These results highlight the possible functional role of SNVs at specific sites of the rDNA promoter region in ribosome biogenesis, the progression of LUAC, and their potential prognostic value.


Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide, and non-small cell lung cancer (NSCLC) accounts for most cases (approximately 85%) [1]. Lung adenocarcinoma (LUAC) is the most common histological subtype of NSCLC [1][2][3]. Numerous studies have identified biologically significant genetic alterations involved in various aspects of NSCLC, particularly of LUAC, including carcinogenesis, cancer progression, prognosis and treatment. Recent comprehensive studies showed that mutations in TP53, STK11, or SMARCA4 are associated with poor prognosis [4,5]. The identification of several types of genetic alterations, such as mutations in EGFR or BRAF, and gene fusion events involving ALK, ROS1 and RET, provided potential therapeutic targets [1,[4][5][6]. Besides protein-coding mRNA, there are different types of non-coding RNA (ncRNA) species comprising RNA molecules that are not translated into proteins. While most cancer genome studies focused exclusively on variants that alter the amino acid sequences of protein-coding genes [7], in recent years, a new and crucial role of ncRNA in cancer has been highlighted. Although initially ncRNAs were thought to be "junk", with no functional purpose, trailblazing research has discovered their diverse functions in coding, decoding, transfer, regulation, gene expression, and complex integrated networks [7][8][9][10]. Now the huge contribution of ncRNAs in cancer development and progression has been recognized. Two of the most extensively studied ncRNA classes, long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), particularly play critical roles in human cancer. Numerous large-scale studies have shown that many lncRNAs and miRNAs may represent possible promising diagnostic and prognostic biomarkers [10].
The ribosomal RNA gene (rDNA) encodes ribosomal RNA (rRNA), which is the most abundant ncRNA, accounting for~90% of the total RNA present in a cell [8,9]. The main rDNA, i.e., 45S rDNA, comprises several hundred transcription units organized in tandem repeats and clustered on five chromosomal loci on chromosomes 13, 14, 15, 2, and 22 in humans [11]; the transcription units are transcribed solely by RNA polymerase I (Pol I) in the nucleolus to produce the 47S/45S precursor rRNA (pre-rRNA). This pre-rRNA transcript is then processed into three mature rRNAs (18S, 5.8S, and 28S rRNAs). The last small molecule, 5S rRNA, is synthesized outside the nucleolus by RNA polymerase III (Pol III) [12][13][14][15][16]. Concurrent with transcription and ribosomal protein-pre-rRNA assembly, pre-rRNAs are extensively processed and modified, ultimately forming ribosomes, which are cellular organelles responsible for the machinery of protein synthesis. These coordinated processes, from the initial rDNA transcription to the creation of mature ribosomes, are called ribosome biogenesis [16,17]. Impaired ribosome biogenesis causes a series of diseases known as "ribosomopathies," including genetic disorders such as Diamond-Blackfan anemia, Shwachman-Diamond syndrome, X-linked dyskeratosis congenita and cartilage hair hypoplasia, which stem from impaired protein synthesis resulting from Cells 2020, 9,2409 3 of 22 dysfunctional ribosomes. Some data suggest an active role for ribosome biogenesis in carcinogenesis through its effect in balancing protein translation, thereby altering the synthesis of proteins that play an important role in the genesis of cancer and resulting in uncontrolled cancer cell proliferation [15,16]. Most reported ribosomopathies are caused by ribosomal protein gene mutations; however, rDNA alterations could also cause defects in ribosome biogenesis and generate dysfunctional ribosomes.
Only a few studies have examined genetic aberrations in rDNA in lung cancer. Shiao et al. [18] reported the occurrence of single nucleotide polymorphisms (SNPs) in the rRNA gene among human LUAC cell lines in sites located upstream and downstream of the rRNA transcription start site. A recent study demonstrated that coupled 5S rDNA copy number amplification and 45S rDNA loss is associated with the presence of certain somatic genetic alterations, as well as with an increased estimated cancer cell proliferation rate and nucleolar activity across several cancers, including LUAC, gastric adenocarcinoma, and head and neck squamous cell carcinoma [19]. However, the occurrence of germline and somatic DNA alterations in the rDNA promoter region in human LUAC tissues and their clinicopathological relevance have not yet been elucidated. In this study, we examined the frequency of germline and somatic DNA single-nucleotide variants (SNVs) in the rDNA promoter region and their clinicopathological and prognostic relevance in LUAC patients. Our findings suggest the prognostic implications of rDNA promoter genomic alterations in LUAC, and thier potential for future individualized therapies.

Patients and Samples
We retrospectively reviewed 82 LUAC samples that were surgically resected at the Division of Thoracic and Cardiovascular Surgery, Niigata University Medical & Dental Hospital between April 2007 and December 2010. Eligible individuals were required to receive complete resection (surgical margin negative), have pathologically confirmed primary lung cancer, and have a sufficient amount of cancer tissue for genetic analyses. The following clinical data were available: age at surgery, sex, smoking history, Union for International Cancer Control (UICC) tumor-node-metastasis (TNM) classification stage [20], EGFR mutation status, and prognosis. EGFR mutation testing was performed as previously described [21]. The patients with a missing or too-short follow-up period (less than 30 days) were excluded from the study. Written informed consent for the use of resected tissue for genetic analysis was obtained from each patient. The study was approved by the Ethics Committee on Genetics of Niigata University School of Medicine (G2015-0679, H13-0012) and was conducted in accordance with the Declaration of Helsinki.
The surgically resected specimens were fixed routinely in 10% formalin (4% formaldehyde) solution, and the lung tissues obtained from the maximal cut surface area of the cancer tissues were processed into paraffin blocks for histopathological examination. Tissue sections were prepared (3 µm thickness) and subjected to hematoxylin and eosin (H&E) and Elastica van Gieson staining to diagnose histological subtype and pathological T stage. Histological classification was performed according to the World Health Organization (WHO) classification [22,23]. The pathological stage was determined according to the TNM classification of malignant tumors (UICC) 8th edition [20]. All slides were reviewed by two pathologists (R.O. and H.U.) blinded to the clinical data and sequence results. If the two pathologists had differing opinions regarding the diagnosis, they observed the slides together, and consensus was reached through discussion.

Cell Culture
The human non-small cell lung carcinoma cell lines PC9, NCI-H23, NCI-H441, NCI-H1650, NCI-H2935 and HCC4006, and colon adenocarcinoma cell lines Caco-2, SW480 and COLO205, were purchased from American Type Culture Collection (ATCC; Manassas, VA, USA). The human LUAC cell line A549, colon adenocarcinoma cell lines HCT116 and DLD-1, gastric adenocarcinoma  lines Kato III and MKN45, Burkitt lymphoma cell line Raji, urinary bladder carcinoma cell line  T24, hepatoblastoma cell line HepG2, and cervical carcinoma cell line HeLa were obtained from the Japanese Collection of Research Bioresources (JCRB) Cell Bank (Osaka, Japan). Jurkat cells, a human T cell leukemia cell line, were obtained from the RIKEN Cell Bank (Tsukuba, Japan). The MKN74 cell line was derived from patients with intestinal type gastric carcinoma [24]. All cell lines were cultured in a suitable medium according to the information on culture conditions provided by the American Type Culture Collection (ATCC), the Japanese Collection of Research Bioresources Cell Bank (JCRB), and the RIKEN Cell Bank. Cells were maintained at 37°C in humidified air with 5% CO 2 .
The Pol I-specific inhibitor actinomycin D (ActD) (#A9415, Sigma-Aldrich, St. Louis, Missouri, USA) was initially dissolved in dimethyl sulfoxide (DMSO) (Sigma-Aldrich) and then diluted in the suitable media. Cells were treated with a low concentration of actinomycin D (50 ng/mL) for 2 h at 37°C in humidified air with 5% CO 2 . As a negative control, supplementation with an equivalent amount of DMSO was used.

Microdissection and DNA Extraction
Formalin-fixed paraffin -embedded (FFPE) samples of lung cancers were enriched for neoplastic cellularity to a minimum of 70% of three consecutive 10 µm sections by manual microdissection or laser capture microdissection with a Leica LMD 7000 (Leica Microsystems GmbH, Wetzlar, Germany). For laser capture microdissection, the FFPE sections were placed onto polyethylene naphthalate membrane slides (Leica Microsystems GmbH, Wetzlar, Germany). Microdissected samples were deparaffinized using xylene and 100% ethanol or Deparaffinization Solution (Qiagen, Hilden, Germany) according to the manufacturer's protocol. For paired normal tissue DNA preparation, lung tissue or resected lymph nodes without cancer cells were used.
DNA from FFPE samples was isolated using the QIAamp DNA FFPE Tissue Kit (Qiagen) on a QIACube (Qiagen) according to the manufacturer's instructions with the following modifications. FFPE samples were incubated in buffer ATL with proteinase K for 3-6 h at 56°C with occasional vortexing pulses. Genomic DNA was eluted in a 30 µL volume. DNA from cultured cells was extracted using the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions. The isolated DNA in the 1.5 mL tube was stored at −20°C until assayed. Before PCR, the isolated DNA was diluted with sterile nuclease-free distilled water (1:10 v/v).

Sanger Sequencing
The specific primer pairs for 45S rDNA (GenBank Accession No. U13369 and AL592188) and sizes of the expected amplicons are presented in Table S1. Primers were synthesized by Sigma Aldrich (St. Louis, MO, USA). PCR conditions were as follows: the reaction mixtures contained 5 µL of 5X Colorless GoTaq ® Flexi Buffer, 1 µL of MgCl 2 (25 mM), 0.5 µL of dNTPs (10 mM), 0.5 µL of each primer (10 µM), 0.125 µL of GoTaq ® Flexi DNA polymerase (Promega, Madison, WI, USA), 2 µL of diluted (1:10 v/v) template DNA, and 15.375 µL of sterile nuclease-free distilled water in a total volume of 25 µL. All PCR reactions were carried out as follows: 1 cycle at 95°C for 10 min, followed by 40 cycles each consisting of 95°C for 1 min, 55°C for 1 min, 72°C for 1 min, and a final cycle at 72°C for 10 min. Five microliters of the amplified products was resolved on 2% TBE agarose gels and visualized by ethidium bromide staining. PCR products were purified using ExoSAP-IT (Affymetrix, Cleveland, OH, USA) and subsequently directly sequenced with the BigDye Terminator v1.1 Kit (Applied Biosystems, Foster City, CA, USA). The sequencing products were purified using the NucleoSEQ Kit (MACHEREY-NAGEL, Düren, Germany) and analyzed on a 3500 Genetic Analyzer (Applied Biosystems). The sequence data were analyzed using the GENETYX (Software Development Co. Ltd., Tokyo, Japan) and Chromas software (version 2.6.1; Technelysium Pty, Ltd., South Brisbane, QLD, Australia).

Deep Sequencing Using Illumina MiSeq
Forward (5 -TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG) and reverse (5 -GTCTCGT GGGCTCGGAGATGTGTATAAGAGACAG) Illumina adapter overhangs were added to the 5 ends of the primers listed in Table S1 to enable Illumina Nextera XT DNA indexing of the PCR products. Primers were synthesized by FASMAC Co. Ltd. (Kanagawa, Japan). Primary PCR was carried out using the GoTaq ® Flexi DNA polymerase (Promega) under the same conditions as for Sanger sequencing. Secondary PCR was performed with the Nextera XT Index Kit (Illumina, San Diego, CA, USA) and KAPA HiFi polymerase (Kapa Biosystems, Wilmington, MA, USA). The amplicons were purified after both PCR steps using Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA). To confirm the amplicon size and determine the concentration, DNA separation was performed using the DNA 1000 Lab-on-a-Chip Assay Kit with an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) following the manufacturer's protocol. The resulting library was sequenced on the Illumina MiSeq with the PhiX Control Kit and MiSeq Reagent Kit Nano v2 (300 cycles) (Illumina). Alignment to the GRCh38 genome was performed using MiSeq Reporter Software v2.5. Integrative Genomics Viewer version 2.3 (Broad Institute, Cambridge, MA, USA) [25] was used to visualize the read alignment. A sequencing coverage of 100× (bidirectional) and a minimum VAF of 5% in the wild-type background were used as cutoffs.

Quantitative Real-Time Reverse Transcription Polymerase Chain Reaction (qRT-PCR)
Total RNA was extracted from PC9, A549, NCI-H441, NCI-H1650, MKN45 and MKN74 using the PureLink™ RNA Mini Kit with PureLink™ DNase Set (Thermo Fisher Scientific, Waltham, MA, USA). Reverse transcription was performed using the SuperScript™ IV VILO™ Master Mix with ezDNase (Thermo Fisher Scientific). Quantitative real-time PCR was performed using SsoAdvanced™ Universal SYBR Green Supermix (Bio-Rad, Hercules, CA, USA) on a LightCycler ® 96 Real-Time PCR System (Roche, Basel, Switzerland). Sequences of the primer sets are shown in Table S1. The amplification conditions were as follow: 94 • C for 3 min, followed by 45 amplification cycles at 94 • C for 15 s, 60 • C for 20 s and 72 • C for 30 s. Relative fold changes were determined by the comparative threshold (∆∆CT) method using cyclophilin as endogenous normalization control.

Statistical Analysis
All statistical analyses were performed using R version 3.4.1 (R Foundation for Statistical Computing, Vienna, Austria) and EZR version 1.37 (Saitama Medical Center, Jichi Medical University, Saitama, Japan) [26]. Statistical evaluation of the associations between two categorical variables was based on Fisher's exact test. In the qRT-PCR analysis, significant differences were calculated from the average value of 3 wells. Data are presented as mean ± SD and the statistical analysis involved p-values calculated by paired two-tailed Student's t tests. Kaplan-Meier survival curves were constructed for recurrence-free survival (RFS) and overall survival (OS). Differences between the groups were estimated using the log-rank test. From univariate analysis, we selected variables with p-value < 0.05 (statistical criterion) by Cox regression and assessed multicollinearity between variables. A multivariate analysis was undertaken through Cox regression with Firth's penalized likelihood [27,28]. All statistical tests were two-sided; a p-value < 0.05 was regarded as statistically significant.

Assessment and Validation of the Assay Design for Identification of rDNA Target Region
The human rRNA promoter contains two essential elements: the upstream control element (UCE; −156 to −107) and the core promoter (CP; −45 to +18). The UCE serves as the binding target for cellular trans-activating proteins, such as upstream binding factor (UBF), and is involved in transcription initiation of Pol I, whereas the CP overlaps the transcription start site (+1). The entire promoter region of rDNA is contained in an intergenic spacer region (IGS) between rDNA units, which are transcribed In total, 20 human cancer cell lines were screened by both a PCR-based targeted next-generation sequencing (NGS) method and Sanger sequencing. The SNVs identified by NGS with a minimum variant allele frequency (VAF) of 5% are summarized in Table 1. Recurrent SNVs at positions −206, −96, −72, and +49 were observed in more than one cell line, respectively. Other SNVs were observed in only one of the cell lines (Figures 2 and S1, and Table 1). None of the cells had SNVs in rDNA coding regions, including 18S rRNA or 28S rRNA. Table 1. SNVs around rDNA promoter region in 20 human cancer cell lines.
In total, 20 human cancer cell lines were screened by both a PCR-based targeted next-generation sequencing (NGS) method and Sanger sequencing. The SNVs identified by NGS with a minimum variant allele frequency (VAF) of 5% are summarized in Table 1. Recurrent SNVs at positions −206, −96, −72, and +49 were observed in more than one cell line, respectively. Other SNVs were observed in only one of the cell lines ( Figure 2 and Figure S1, and Table 1). None of the cells had SNVs in rDNA coding regions, including 18S rRNA or 28S rRNA.  Next, we compared the Sanger sequencing and NGS results. Generally, recent descriptions of NGS analysis pipelines for laboratory testing for detecting alleles of clinical samples have recommended using a detection threshold of 5% to minimize technical artifacts and give reliability [32]. When a minimum VAF of 5% was used as the cutoff, all variants found in all cultured cells demonstrated 100% concordance between Sanger sequencing and NGS (Figures 2 and S1). Based on the data obtained from the cell lines, we decided to focus on the rDNA promoter region for subsequent analyses of cancer tissue samples using the more cost-effective Sanger sequencing.

Patient Characteristics
The demographic and clinicopathological characteristics of 82 primary LUAC patients are listed in Table 2. Histologically, there were 8 (9.8%) minimally invasive adenocarcinoma (MIA) cases, 3 (3.7%) lepidic predominant adenocarcinoma, 8(9.8%) acinar predominant adenocarcinoma, 33 (41.5%) papillary predominant adenocarcinoma, 2 (2.4%) micropapillary predominant adenocarcinoma, 15 (18.3%) solid predominant adenocarcinoma, 8 (9.8%) invasive mucinous adenocarcinoma, 2 (2.4%) fetal adenocarcinoma, and 2 (2.4%) pleomorphic carcinoma (1 was acinar predominant and 1 was solid predominant). In total, 3 cases (2 of 34 papillary and 1 of 8 acinar predominant) fulfilled the criteria of adenosquamous carcinoma; adenocarcinoma with focal squamous cell carcinoma component (more than 10% and less than 40%). There was no adenocarcinoma in situ case ( Table 2). The median follow-up period for surviving patients was 113.9 months (range, 10.7-145.9). Next, we compared the Sanger sequencing and NGS results. Generally, recent descriptions of NGS analysis pipelines for laboratory testing for detecting alleles of clinical samples have recommended using a detection threshold of 5% to minimize technical artifacts and give reliability [32]. When a minimum VAF of 5% was used as the cutoff, all variants found in all cultured cells demonstrated 100% concordance between Sanger sequencing and NGS ( Figure 2 and Figure S1). Based on the data obtained from the cell lines, we decided to focus on the rDNA promoter region for subsequent analyses of cancer tissue samples using the more cost-effective Sanger sequencing.

Survival Analysis of LUAC Patients
The 5-year and 10-year RFS rates in LUAC patients were 63.9% and 59.1%, respectively; the 5-year and 10-year OS rates were 79.1% and 59.1%, respectively.
The presence of germline SNVs at position +1 to +100 in the rDNA promoter region, lesions of higher stages (III and IV), nodal and/or distant metastasis, and invasive histological subtypes was significantly associated with worse prognosis in terms of both RFS ( Figure 6A and Figure S3A-C; p = 0.030, p < 0.001, p < 0.001, and p = 0.022, respectively) and OS ( Figure 7A and Figure S4A-C; p = 0.004, p < 0.001, p < 0.001, and p = 0.032, respectively) after surgery. Germline SNVs at position +49 and +52 were significantly associated with shorter OS (p = 0.001 and p = 0.035, respectively), whereas no significant association with RFS was detected (Table S3). None of the somatic SNVs had a significant correlation with survival. Although germline and/or somatic mutations at sites extending from position +1 to +100 were associated with worse OS (p = 0.025), the association did not reach statistical significance for RFS. No association was found between clinical outcome and other clinicopathological factors, including sex, age, EGFR mutation, smoking, or presence of other types of mutations (Table S3).
stage, nodal and/or distant metastasis, EGFR mutations, and MIA versus invasive histological subtype.

Survival Analysis of LUAC Patients
The 5-year and 10-year RFS rates in LUAC patients were 63.9% and 59.1%, respectively; the 5year and 10-year OS rates were 79.1% and 59.1%, respectively.
The presence of germline SNVs at position +1 to +100 in the rDNA promoter region, lesions of higher stages (III and IV), nodal and/or distant metastasis, and invasive histological subtypes was significantly associated with worse prognosis in terms of both RFS (Figures 6A and S3A-C; p = 0.030, p < 0.001, p < 0.001, and p = 0.022, respectively) and OS (Figures 7A and S4A-C; p = 0.004, p < 0.001, p < 0.001, and p = 0.032, respectively) after surgery. Germline SNVs at position +49 and +52 were significantly associated with shorter OS (p = 0.001 and p = 0.035, respectively), whereas no significant association with RFS was detected (Table S3). None of the somatic SNVs had a significant correlation with survival. Although germline and/or somatic mutations at sites extending from position +1 to +100 were associated with worse OS (p = 0.025), the association did not reach statistical significance for RFS. No association was found between clinical outcome and other clinicopathological factors, including sex, age, EGFR mutation, smoking, or presence of other types of mutations (Table S3).
The results of univariate and multivariate Cox regression analyses are shown in Table 4. The variable nodal and/or distant metastasis was excluded from the final model to avoid collinearity with stage. In multivariate Cox regression analysis, the presence of germline rDNA promoter SNVs between position +1 and +100 was an independent prognostic factor for unfavorable RFS (hazard ratio (HR), 3.669; 95% confidence interval (CI), 1.545-8.373, p = 0.004) and OS (HR, 4.870; 95% CI, 2.004-11.56, p < 0.001) after adjustment for age, pathologic stage, and invasive/MIA histological subtype. From the results of the multivariate analysis, we further evaluated the prognostic impact of the presence of germline rDNA promoter SNVs between position +1 and +100 in Stage I and Stage I/II patients after excluding MIA cases (n = 42 and n = 60, respectively). The presence of germline SNVs between position +1 and +100 was significantly associated with a worse prognosis for RFS (p = 0.004 in Stage I; p = 0.001 in Stage I/II) and OS (p < 0.001 in Stage I and Stage I/II) in both subgroups by log-rank test ( Figures 6B,C and 7B,C).  The results of univariate and multivariate Cox regression analyses are shown in Table 4. The variable nodal and/or distant metastasis was excluded from the final model to avoid collinearity with stage. In multivariate Cox regression analysis, the presence of germline rDNA promoter SNVs between position +1 and +100 was an independent prognostic factor for unfavorable RFS (hazard ratio (HR), 3.669; 95% confidence interval (CI), 1.545-8.373, p = 0.004) and OS (HR, 4.870; 95% CI, 2.004-11.56, p < 0.001) after adjustment for age, pathologic stage, and invasive/MIA histological subtype. From the results of the multivariate analysis, we further evaluated the prognostic impact of the presence of germline rDNA promoter SNVs between position +1 and +100 in Stage I and Stage I/II patients after excluding MIA cases (n = 42 and n = 60, respectively). The presence of germline SNVs between position +1 and +100 was significantly associated with a worse prognosis for RFS (p = 0.004 in Stage I; p = 0.001 in Stage I/II) and OS (p < 0.001 in Stage I and Stage I/II) in both subgroups by log-rank test ( Figure 6B,C and Figure 7B,C).  SNV, single-nucleotide variant, rDNA promoter SNV, germline SNV from the position +1 to +100; n.s., not significant; * p < 0.05. 1 This variable was excluded from the final multivariate Cox regression model to avoid multicollinearity in relation to the Stage (III-IV/I-II). 2 Firth's correction was used because of quasi-complete separation; there was no event in one of the subgroups.

rDNA Promoter SNVs in Publicly Available Databases
To validate the prognostic value of rDNA promoter SNVs, we searched the publicly available germline and somatic variant databases derived from whole-exome sequences (WXS) and wholegenome sequences (WGS), including the International Cancer Genome Consortium (ICGC) [33], the National Bioscience Database Center Human Database [34], the European Genome-phenome Archive [35], the Cancer Genome Atlas (TCGA) [36], and the Genome Aggregation Database (gnomAD) [37]. In the ICGC data portal [33], there were two sets of open-access data comprised of somatic variants. GRCh37 (hg19) was the reference genome used for the alignment. In hg19 (National Center for Biotechnology Information; NCBI), 45S rDNA was unplaced, unlocated, and unlocalized, and it was

rDNA Promoter SNVs in Publicly Available Databases
To validate the prognostic value of rDNA promoter SNVs, we searched the publicly available germline and somatic variant databases derived from whole-exome sequences (WXS) and whole-genome sequences (WGS), including the International Cancer Genome Consortium (ICGC) [33], the National Bioscience Database Center Human Database [34], the European Genome-phenome Archive [35], the Cancer Genome Atlas (TCGA) [36], and the Genome Aggregation Database (gnomAD) [37]. In the ICGC data portal [33], there were two sets of open-access data comprised of somatic variants. GRCh37 (hg19) was the reference genome used for the alignment. In hg19 (National Center for Biotechnology Information; NCBI), 45S rDNA was unplaced, unlocated, and unlocalized, and it was newly mapped in GRCh38 (RNA45SN1, N2, N3, N4, N5 in NCBI Gene). Therefore, the 45S rDNA sequence has not been analyzed in the ICGC. In the TCGA data portal [36], the TCGA lung adenocarcinoma (TCGA-LUAD) open-access data comprised only somatic variants by using GRCh38 as the reference genome. TCGA-LUAD is a WXS dataset which was sequenced using the Agilent SureSelect Human All Exon 50Mb kit (Agilent Technologies, Palo Alto, CA, USA) [38,39]. We obtained the BED file for the kit containing the coordinates of capture target genomic positions from Agilent. The original BED file was designed based on hg19, and was converted from the GRCh37/hg19 genome assembly to GRCh38/hg38 with UCSC Genome Browser's LiftOver [40], using the default webtool parameters according to the manufacturer's instruction. We confirmed that TCGA-LUAD WXS did not analyze the rDNA gene promoter regions (positions −248 to +100 of RNA45SN1 at chromosome (chr) 21: 8432974-8433321, RNA45SN2 at chr21:8205740-8206087, RNA45SN3 at chr21:8388787-8389134) by design. The gnomAD is a large population genomics database which compiles germline WGS and WXS studies around the world [37]. One of its datasets, the gnomAD v2.1, contains data from 125,748 exomes and 15,708 whole genomes, including the Exome Aggregation Consortium (ExAC) dataset from blood ("germline") samples from TCGA. However, the gnomAD v2.1 was based on hg19. Another dataset, gnomAD v3, was based on GRCh38 and includes 71,702 WGS data. The ribosomal RNA gene promoter region showed "No coverage", i.e., the sequence of this region was not read in all WGS studies presented in gnomAD v3 ( Figure S5). Thus, we were not able to find suitable information that can be used to validate our findings obtained from our patient cohort.

Evaluation of rRNA Expression
To investigate whether the rDNA promoter SNVs between position +1 and +100 influence rRNA synthesis, we performed qRT-PCR to compare rRNA transcription levels between the LUAC cell lines carrying the SNVs and those without the SNVs. PC9 cells with the SNV at +49, A549 cells with the SNV outside +1 to +100, and the H1650 cells without the SNVs were treated with vehicle control (DMSO) or low concentrations of ActD (50 ng/mL), which has been shown to selectively inhibit the Pol I-driven rRNA transcription [41,42]. As shown in Figure 8, the PC9 and A549 cells showed significantly higher transcription levels of endogenous 47/45S pre-rRNA than H1650 (p < 0.05), whereas there was no significant difference in 47/45S pre-rRNA transcripts between PC9 and A549 cells. No significant differences were observed in endogenous 28S mature rRNA transcription levels between the cells with SNV and those without SNVs. When the cells were treated with low-dose ActD, a dramatic decrease in the level of 47/45S pre-rRNA was observed in all the three cell lines (p < 0.05). As for 28S mature rRNA expression, ActD did not affect the 28S mature rRNA transcription level in PC9. In A549 and H1650 cells, ActD caused a mild to moderate reduction in 28S mature rRNA, but the difference was not significant (p = 0.099 and p = 0.053, respectively). ActD-treated PC9 with the SNV at +49 retained significantly higher 47/45S pre-rRNA and 28S mature rRNA expressions compared to ActD-treated H1650 without SNV (p < 0.05).

Discussion
In this study, we investigated the clinical and prognostic features of germline and somatic single nucleotide genetic alterations in the rDNA promoter region among Japanese LUAC patients. Our study identified frequent and recurrent genomic alterations in the rDNA promoter region in normal and LUAC tissues. Importantly, the presence of germline SNVs between positions +1 and +100 in the rDNA promoter region demonstrated a significant association with worse prognosis in terms of both RFS and OS.
Recent technical advances in large-scale sequencing and genomics methods have enabled the comprehensive analysis of somatic SNVs in human long ncRNAs (lncRNAs), impacting lncRNA expression [43,44], and studies have been conducted to unveil the role of lncRNAs in the regulation of rRNA synthesis in human cancer [45,46]. However, few studies have focused on the existence of SNVs and/or SNPs in human rDNA and their functional role in the regulation of rRNA synthesis. Shiao et al. identified hotspot SNPs at positions −234, −233, −181, −104, −96, −72, +52, +139, +144, +207, +225 and +290 (relative to the transcription start site at position +1) using sequencing products amplified from the region −388 to +306 of the rRNA in six human LUAC cell lines and a nontransformed, immortalized line from human peripheral lung epithelium. Importantly, no SNP was identified in the UCE or CP [18]. In our work, several SNVs were detected in the rDNA promoter region in the A549, H23, PC9, Caco-2, SW480, COLO205, DLD-1, Raji, Jurkat, and HeLa cell lines. Consistent with the previous study [18], the SNV at position −96 was found in the A549 LUAC cell line, and no SNVs were detected in PCR-amplified products from H441 or H23 cell lines. As expected, all 20 cell lines lacked SNVs in UCE or CP regions.
Given the unique array of rDNA genes with several hundreds of transcription units organized in tandem repeats and clustered on five chromosomal loci [11], it might be challenging to reliably detect variants with low VAF using the Sanger method if the primers bind to the promoters of all

Discussion
In this study, we investigated the clinical and prognostic features of germline and somatic single nucleotide genetic alterations in the rDNA promoter region among Japanese LUAC patients. Our study identified frequent and recurrent genomic alterations in the rDNA promoter region in normal and LUAC tissues. Importantly, the presence of germline SNVs between positions +1 and +100 in the rDNA promoter region demonstrated a significant association with worse prognosis in terms of both RFS and OS.
Recent technical advances in large-scale sequencing and genomics methods have enabled the comprehensive analysis of somatic SNVs in human long ncRNAs (lncRNAs), impacting lncRNA expression [43,44], and studies have been conducted to unveil the role of lncRNAs in the regulation of rRNA synthesis in human cancer [45,46]. However, few studies have focused on the existence of SNVs and/or SNPs in human rDNA and their functional role in the regulation of rRNA synthesis. Shiao et al. identified hotspot SNPs at positions −234, −233, −181, −104, −96, −72, +52, +139, +144, +207, +225 and +290 (relative to the transcription start site at position +1) using sequencing products amplified from the region −388 to +306 of the rRNA in six human LUAC cell lines and a nontransformed, immortalized line from human peripheral lung epithelium. Importantly, no SNP was identified in the UCE or CP [18]. In our work, several SNVs were detected in the rDNA promoter region in the A549, H23, PC9, Caco-2, SW480, COLO205, DLD-1, Raji, Jurkat, and HeLa cell lines. Consistent with the previous study [18], the SNV at position −96 was found in the A549 LUAC cell line, and no SNVs were detected in PCR-amplified products from H441 or H23 cell lines. As expected, all 20 cell lines lacked SNVs in UCE or CP regions.
Given the unique array of rDNA genes with several hundreds of transcription units organized in tandem repeats and clustered on five chromosomal loci [11], it might be challenging to reliably detect variants with low VAF using the Sanger method if the primers bind to the promoters of all rDNA genes on different chromosomes. For the assessment of the whole population of molecules and validation of the results obtained using the Sanger method, pyrosequencing was utilized by Shiao et al. [18], while we used NGS, which is the more recent revolutionary and reliable method. Although recent advances in NGS employing unique molecular identifiers have improved the sequencing accuracy, the current inherent error rates of NGS library preparation approaches, sequencing chemistry and platforms complicate the reliable detection of low-frequency variants without compromising specificity. Generally, the lower limit of VAF that has clear clinical utility is generally in the range of 5% [32]. Considering a minimum VAF cutoff of 5% in NGS, the sequence results detected in all cultured cells showed 100% agreement between Sanger and NGS data. These results support the reproducibility of our assay, despite the complicated repetitive genomic structure of the rDNA genes [11]. While we focused on the significant common variants with ≥5% VAF in this study, recent genetic studies have shown the importance of the low-frequency variants with less than 5% VAF in the pathogenesis of diseases, such as somatic mutations in cancer [47], genetic disorders [48], as well as the hereditary risks of diseases and their progression [49,50], including lung cancer [51]. Sequencing methods and bioinformatics pipelines that enable the distinction of true variants from artifacts are improving [50][51][52][53][54]. In comparison to oncogenes and tumor suppressor genes (2 copies), the interpretation of VAFs thresholds for rDNA (hundreds of copies) is much more challenging. Future advances in sequencing technology may allow the accurate identification of low-frequency pathogenic variants in the rDNA genes, and the understanding of their extensive role in lung cancer.
Of note, Shiao et al. also found that differences in the frequency of C residue at position +139, which is located downstream of the rRNA transcription start site, correlated negatively with rRNA abundance [18], albeit relying on a limited data set from a small number of cell lines. Another study by Zhang et al. [31] investigated the frequency of SNPs at CpG sites in the region surrounding the transcription start site for rRNA (from position −248 to +100) in a human keratinocyte cell line, HaCaT. They also demonstrated that the binding site of Pol I regulatory proteins such as UBF and basonuclin is located downstream and upstream of UCE and CP, respectively, which were initially identified as Pol I regulatory sites by their association with subunits of Pol I. The CpG island methylation level at the promoters of tumor suppressor genes is widely known to contribute to tumorigenesis and tumor progression in many types of cancer by changing the promoter activity [55]. The methylation status of the rDNA promoter region and its relationship with 45S pre-rRNA expression have been evaluated in several types of tumors, including breast, colon, oral, prostate, uterine, and cervical cancers, and myelodysplastic syndrome, but the results vary among tumor types [56][57][58][59][60][61]. In the only study available (to our knowledge) on the effect of CpG site-specific methylation status on promoter activity, Ghoshal et al. [62] reported that most CpGs between positions −136 and −58 were significantly hypomethylated, and the methylation of the rDNA promoter region resulted in a decline in the expression of rRNA genes in human hepatocellular carcinoma. The methylation of CpG sites at −347, −102 and −9 inhibited rRNA promoter transcription activity, probably by recruiting methyl-CpG binding proteins, whereas the CpG site at +152 did not play a role in the promoter activity. These observations suggest that a much broader region than UCE and CP may serve as the rDNA promoter, but whether the methylation status at specific CpG sites in the rDNA coding region has an effect on rDNA promoter activity is still unclear.
Previous studies have identified hypermethylation of the rDNA promoter region as a prognostic factor in breast [63], endometrial [64], and ovarian cancers [65]. The increased expression of 45S pre-rRNA is linked to the poor prognosis of patients with rhabdomyosarcoma [66] and colon cancer [67]. In this study, recurrent rDNA promoter SNVs at positions −206, −96, and −72 alone had no significant prognostic impact regardless of whether they were germline or somatic alterations, whereas germline SNVs at positions +49 and +52 alone, as well as the germline alterations found at position +1 to +100, were significantly associated with worse prognosis. In particular, the single nucleotide alteration at position +49 observed in CpG sites may affect the methylation of the rDNA promoter. The correlation between 45S pre-rRNA expression levels and rDNA promoter SNVs in LUAC and corresponding non-tumor tissues was not assessed because the RNA quality was not consistent among FFPE tissue samples due to degradation. Such an analysis would be valuable because it could provide functional insights into the role of rDNA promoter SNVs. Instead, to determine the possible effect of the rDNA promoter SNVs on ribosomal biogenesis and under ribosomal stress (also called nucleolar stress) [68], we used low-dose ActD to inhibit ribosome biogenesis in LUAC cell lines. Interestingly, the endogenous 47/45S pre-rRNA transcription level was significantly higher in PC9 cells carrying the rDNA promoter SNV, and A549 with the SNV outside the range of +1 to +100, than H1650 without SNV, although low-dose ActD significantly reduced the 47/45S pre-rRNA synthesis and no significant decrease in the 28S mature rRNA level was observed in all three cell lines. PC9 cells carrying the rDNA promoter SNV at +49 retained a transcription level of 47/45S pre-rRNA and 28S mature rRNA that was significantly higher than H1650 without the SNV. These results suggest that the rDNA promoter SNVs may be involved in ribosomal biogenesis, ribosomal/nucleolar stress tolerance, drug resistance and cell proliferation. A549 cells with the SNV outside the range +1 to +100 did not retain 45S pre-rRNA and 28S mature rRNA expression as much as PC9, but showed a mild decrease in 28S mature rRNA compared to H1650, suggesting that the ribosome biogenesis rate and the effect of ActD may differ depending on the location of SNVs. Overall, 28S rRNA production was less affected by ActD than de novo rRNA production represented by 47/45S pre-rRNA. This was expected because 28S rRNA is much more stable to low-dose ActD than 47/45S pre-rRNA, but can be reduced in several cell lines depending on the ActD concentration and treatment time [69][70][71][72]. Further validation experiments using multiple cell lines are warranted to confirm the suppressive effect of the SNV on ribosomal biogenesis and ribosomal/nucleolar stress. Ribosome biogenesis is a complex metabolic process that involves a large number of molecules, including ribosomal proteins [11,14,16,17], ncRNAs [11,73], UBF [14,15], and other suggested Pol I regulatory factors, such as Runx2 [74] and RasL11a [75]. Additionally, ribosomal stress response is regulated by numerous molecules in both p53-dependent and p53-independent manners [76,77]. Whether the SNV in the PC9 cell line is somatic or germline remains unknown. It will be important and informative to investigate the p53 and UBF phosphorylation status, the methylation status of the rDNA promoter, and the affinity of UBF and other regulatory factors to the rDNA promoter region in cells carrying SNVs and those without SNVs via chromatin immunoprecipitation (ChIP) and NGS-based ChIP-sequencing. In order to understand why germline rDNA promoter SNVs had more predictive power than somatic SNVs, and to elucidate the particular effects of individual alterations on methylation status and ribosomal biogenesis, will require further investigation.
The prognostic power of somatic SNVs from positions +1 to +100 was equivocal, probably because of their rarity in our data set (n = 2, at position +52) ( Figure S3 and Table 3) and due to the small sample size. Interestingly, germline and somatic rDNA promoter SNVs were nearly mutually exclusive, except in two LUAC cases (Table 2, Table 3, and Table S2). The limited sample size also caused an unbalanced distribution of some clinical characteristics, such as sex, number of stage I tumors at surgery, and histological subtype of LUAC. In addition, the cohort solely consisted of Japanese patients. In an attempt to confirm the prognostic implications of the rDNA promoter germline SNVs of the patient cohort used in this study, we searched public databases containing the WGS or WXS data of LUAC to validate our results in another independent cohort. Unfortunately, the rDNA promoter SNVs were not available in the open-access data of the WGS or WXS databases, including TCGA [36] and gnomAD [37]. Future large-scale studies are needed to validate the robustness of the SNV signatures in rDNA and to elucidate their prognostic implications.
Importantly, our survival analysis revealed a clear association of the presence of germline SNVs between +1 to +100 in the rDNA promoter region with an adverse outcome, independent of stage and histology, particularly in the patients with pathological early-stage invasive LUAC. Clinically, adjuvant chemotherapy is considered the global standard of care for patients with completely resected NSCLC, including adjuvant tegafur/uracil chemotherapy for stage I NSCLC patients in Japan [78,79], and global standard cisplatin-based adjuvant chemotherapy for the patients at stages II to III of the disease [79][80][81]. Nevertheless, the considerable acute and late toxicities could be associated with adjuvant chemotherapy [79][80][81]. The better identification of the patient subgroups that are at greater risk of disease progression and are more likely to benefit from adjuvant chemotherapy is needed. No biomarker currently exists to predict the effects of adjuvant chemotherapy on OS [81]. If the prognosis could be improved by adding chemotherapy to the poor prognosis group with rDNA promoter germline SNVs, germline rDNA promoter SNV status could be a valuable molecular marker to refine patient selection for adjuvant chemotherapy.
Ribosome biogenesis takes place in the nucleoli, which have long been linked to cancer; an increase in the number and size of nucleoli has been used as a marker of malignancy for over a century [14,[82][83][84]. Based on the finding that an increase in rRNA transcription affects cancer progression by not only promoting protein synthesis, and thus proliferative capacity, but also controlling cellular checkpoints and chromatin structure, targeted therapeutic approaches with first-in-class selective inhibitors of rDNA transcription, CX-5461 and PMR-116, have been explored [85,86]. The in vitro screening of the cytotoxic activity of CX-5461 revealed a broad spectrum of activity in cell lines derived from various cancers, including the A549 LUAC cell line [87]. In 2018, a phase I clinical trial of CX-5461 for hematological malignancies was completed, and phase I/II trials for solid tumors, including breast cancer and non-melanomatous skin cancer, are underway (Clinical trial identification, NCT02719977) [85,86]. Further analysis is needed to investigate the effects of rDNA promoter SNVs on ribosome biogenesis and prognosis.

Conclusions
We have identified frequent germline and somatic rDNA promoter SNVs in human LUAC tissues. The presence of the germline rDNA promoter SNVs located in the region between position +1 and +100 (with respect to the transcription start site at +1) was significantly associated with an adverse outcome in our LUAC patient cohort. Additional studies are required to validate our findings with larger cohorts and to investigate the functional role of SNVs at specific sites of the rDNA promoter region in cancer progression.