Gene Expression Analysis of Aggressive Clinical T1 Stage Clear Cell Renal Cell Carcinoma for Identifying Potential Diagnostic and Prognostic Biomarkers

The molecular characteristics of early-stage clear cell renal cell carcinomas (ccRCCs) measuring ≤7 cm associated with poor clinical outcomes remain poorly understood. Here, we sought to validate genes associated with ccRCC progression and identify candidate genes to predict ccRCC aggressiveness. From among 1069 nephrectomies performed on patients, RNA sequencing was performed for 12 ccRCC patients with aggressive characteristics and matched pairs of 12 ccRCC patients without aggressive characteristics. Using a prospective cohort (ClinicalTrials.gov Identifier: NCT03694912), the expression levels of nine genes (PBRM1, BAP1, SETD2, KDM5C, FOXC2, CLIP4, AQP1, DDX11, and BAIAP2L1) were measured by reverse-transcription polymerase chain reaction from frozen tissues, and their relation to Fuhrman grade was investigated in 70 patients with small ccRCC (≤4 cm). In total, 251 genes were differentially expressed and presented fold changes with p-values < 0.05; moreover, 10 genes with the greatest upregulation or downregulation in aggressive ccRCC remained significant even after adjustment. We validated previously identified genes that were associated with ccRCC progression and identified new candidate genes that reflected the aggressiveness of ccRCC. Our study provides new insight into the tumor biology of ccRCC and will help stratify patients with early-stage ccRCC by molecular subtyping.


Introduction
Approximately one-third of patients treated for localized clear cell renal cell carcinoma (ccRCC), the most common subtype of RCC, relapse following surgery; moreover, 15% of these patients exhibit metastatic potential, which can lead to death [1][2][3]. Tumor growth and changes in radiographic images

Results from the RNA-Seq Analysis
RNA-seq analysis generated 8266 × 10 6 base pairs (bp) from ccRCC with aggressive characteristics and 7583 × 10 6 bp from ccRCC without aggressive characteristics. In total, 71.68 × 10 6 reads in ccRCC with aggressive characteristics and 66.69 × 10 6 reads in ccRCC without aggressive characteristics were mapped. There were no differences in the number of reads between the two groups, among which a total of 12,636 genes were identified.

Differentially Expressed Genes
In total, 251 genes were differentially expressed with fold changes and p-values < 0.05; 153 upregulated genes and 98 downregulated genes had fold changes ≥2 in ccRCC with aggressive characteristics when compared with matched ccRCC without aggressive characteristics (Figure 1 and Figure S1). The ten genes that were most upregulated or downregulated in patients with aggressive ccRCC and retained significance even after adjustment are summarized in Table 2 and represented as box plots in Figure S2.

Results from the RNA-Seq Analysis
RNA-seq analysis generated 8266 × 10 6 base pairs (bp) from ccRCC with aggressive characteristics and 7583 × 10 6 bp from ccRCC without aggressive characteristics. In total, 71.68 × 10 6 reads in ccRCC with aggressive characteristics and 66.69 × 10 6 reads in ccRCC without aggressive characteristics were mapped. There were no differences in the number of reads between the two groups, among which a total of 12,636 genes were identified.
For the ten selected genes, a supervised hierarchical clustering analysis of patients with aggressive ccRCC versus matched pairs of patients with non-aggressive ccRCC was performed and revealed clustering of Pt1-4 and Pt1-9 in the non-aggressive ccRCC group ( Figure S3).
We further verified the expression of 16 genes from the TCGA ccRCC database using UALCAN ( Figure S5). Our results showed that 15 of the 16 genes (all except RGPD8) were differentially expressed in ccRCC samples, compared to the normal kidney samples. Moreover, the expression levels of 13 out of 16 genes (except RGPD8, NPR3, and KDM5C) differed according to the cancer stage.

Association between Oncological Outcomes and Expression of the 10 Newly Selected Genes
Cancer-specific survival (CSS) and recurrence-free survival (RFS) and their association with the ten newly selected genes were analyzed. For CSS, patients with cancer-specific death presented higher expression patterns of RGPD8, BAIAP2L1, and DDX11 and lower expression patterns of SLC16A9, FRAS1, AQP1, TMEM38B, and PRUNE2, in both the univariate and multivariate analyses. For RFS, those with cancer recurrence presented higher expression patterns of DDX11 and lower expressions of TMEM38B and PRUNE2, in both the univariate and multivariate analyses. Genes that were significantly associated with both CSS and RFS included DDX11, TMEM38B, and PRUNE2 (Table 3).

GO Analysis and KEGG Analysis of DEGs
DEGs were functionally classified into biological process (BP), cellular component (CC), and molecular function (MF) categories ( Figure S7). In the BP category, the top three most enriched terms were "single-organism process," "cellular process," and "single-organism cellular process" ( Figure S7A). In the CC category, the top three most enriched terms were "cell part," "cell," and "organelle" (Figure S7B). In the MF category, the top three most enriched terms were "binding," "protein binding," and "ion binding" ( Figure S7C). Moreover, the top three most enriched terms in the KEGG analysis were "metabolic pathways," "cell cycle," and "complement and coagulation cascades" ( Figure S7D).

Validation of Target Genes Using Frozen Tissue PCR
Of the ten genes newly identified through RNA-seq analysis, AQP1, DDX11, and BAIAP2L1 were previously reported to be potent indicators of outcome in patients with ccRCC. We analyzed the expression of these three genes along with that of the six genes previously identified by the qRT-PCR analysis of frozen tissue samples. Our results showed that DDX11 expression and tumor size were significantly greater in patients with high Fuhrman grade (3 and 4), both in the univariate and multivariate analyses (Table S5).
A panel of six genes (PBRM1, BAP1, SETD2, KDM5C, FOXC2, and CLIP4) was chosen for mutational frequency analysis based on the clinical relevance of significantly mutated genes across all ccRCC cohorts [12,13]. However, these six genes were not differentially expressed, perhaps because our patient cohort included unique cases for which cancers exhibited aggressive characteristics, such as synchronous metastasis, recurrence, or cancer-specific death, even though the clinical stage was low. Moreover, small sample sizes would have affected this result. As in other large NGS studies, PBRM1 mutation was identified in 45.8% patients. However, BAP1 mutation was observed in 25% of patients, which is slightly higher than the value obtained in other studies, perhaps because our cohort comprised patients with more aggressive ccRCC; we intentionally collected aggressive ccRCC to pair with non-aggressive ccRCC. BAP1, a critical gatekeeper for disease progression, has a mutually exclusive interaction with PBRM1, whereas the BAP1 mutation is associated with worse prognosis and a higher Fuhrman grade than the PBRM1 mutation [7,14]. This finding was further supported by the MSKCC cohort study where BAP1 mutations were associated with poor prognostic factors (higher T stage, higher nuclear grade, large size, more necrosis), and the presence of metastatic disease at presentation) [15].
SETD2 mutation was observed in all cases of ccRCC in our study, which diverges from other NGS studies in which the frequency of SETD2 mutation was substantially lower (approximately 11%) [12,13]. However, Liu et al. reported a multicenter study in China that had evaluated radical nephrectomy cases, wherein the SETD2 protein deficiency rate was 34.1%, although there may have been a discrepancy between protein deficiency and gene mutation rates [16]. KDM5C mutations were found in less than 40% of the study population, compared to 3.8-6.8% reported in other NGS studies [12,13]. We believe that this difference in KDM5C mutation frequency could result from selection bias because of the unique group of patients who were defined as aggressive. However, owing to the intratumoral heterogeneity (ITH), there could also have been a sampling bias. Moreover, gene mutation status was based on multiple types of sequencing that only reflect the gene status of very circumscribed cancer lesions, and thus could underestimate mutational rates [16].
The frequency of PBRM1 mutation was not different in ccRCC with and without aggressive characteristics. BAP1, KDM5C, FOXC2, and CLIP4 mutations were enriched in aggressive ccRCC, which was not statistically significant because of small sample sizes. This finding is similar to the results of other studies that have reported an association between BAP1 and KDM5C mutations and aggressiveness of tumors [9,10]. FOXC2, a gene known to be associated with tumor aggressiveness and synchronous metastasis in ccRCC, based on our previous studies [2,11], was found to be associated with aggressive characteristics of tumors in early-stage ccRCC, in the present study. The frequency of PBRM1 mutation was similar between patients with aggressive and non-aggressive ccRCC, perhaps because it is an early, essential event in tumorigenesis that does not impact clinical outcome, but instead plays a principal role in tumor initiation. BAP1 mutation was more frequently observed in aggressive ccRCC; BAP1 is believed to be associated with disease progression and a worse prognosis [17].
With regard to metastasis, the lung and bone were the two most common sites, with up to 60% and 40% of patients having lung and bone metastasis, respectively [18]. A recent study suggests that patients with bone metastases have an unfavorable prognosis; however, another study found no prognostic differences between metastatic sites [19], although this remains controversial. According to this study, synchronous bone metastatic patients were grouped with non-aggressive ccRCC patients, while synchronous lung metastatic patients were grouped with the aggressive ccRCC group. These results show that the early bone metastasis group genetically clusters with the less aggressive ccRCC group and suggest that the prognosis of early bone metastasis ccRCC patients might not be bad, although future studies are needed to verify this finding. The exact role of MOCOS and BAIAP2L1 in lung metastasis needs to be studied further, as these genes may help us understand differences in metastatic potential in ccRCC and the prognostic value of different metastatic sites.
We included AQP1, DDX11, and BAIAP2L1 from the newly identified gene set because these were previously associated with ccRCC or have a known relation to other tumors [20][21][22]. A prospective study examining frozen tissue qRT-PCR of AQP1, DDX11, BAIAP2L1, and six previously identified genes (PBRM1, BAP1, SETD2, KDM5C, FOXC2, and CLIP4) showed that expression of DDX11 was a significant factor for predicting tumor aggressiveness based on Fuhrman grade. Bhattacharya et al. demonstrated that inhibiting DDX11 in melanoma cells decreased proliferation and rapidly increased apoptosis [21]. In the future, we will perform an in vitro and in vivo study using DDX11 since ccRCC has similar tumor characteristics as melanoma.
We only included DDX11 and TMEM38B in our final analysis of CSS and RFS since although using three genes increased the statistical significance (Table S4), the combination of DDX11 and PRUNE2 was not associated with recurrence in the multivariate analysis (Table 4). Therefore, although combining three genes would of course increase the statistical significance compared to two genes in clinical settings, it is also important to reduce the number of genes used. Therefore, we tried to select the most important gene that is both related to CSF and RFS.
We propose that the factors involved in tumor aggressiveness and metastasis or recurrence are different. Previous studies have focused on using Fuhrman grade as the golden standard and have compared gene expression or mutation based on Fuhrman grade. However, Fuhrman grade only reflects cell differentiation and does not reflect metastatic and recurrence potential. Recent studies have reported clonal and subclonal mutations in ccRCC, whereas clonal mutations are present in all tumor regions, and subclonal mutations are present in some, but not all, tumors [23]. Clonal cancer evolution is also known to be involved in ccRCC through Darwinian selection [23]. TRACERx Renal classified tumors into seven distinct evolutionary subtypes [6,7], and primary tumors with low ITH, in addition to a low fraction of the tumor genome affected by somatic copy-number alterations (SCNAs), had an overall low metastatic potential. In contrast, primary tumors with low ITH but elevated SCNAs were associated with rapid progression at multiple sites [7]. Thus, our knowledge regarding potential ccRCC biomarkers for guiding intervention and surveillance has recently been dramatically expanded. Recent article by Wach et al. reported the RNA sequencing of collecting duct carcinoma, a rare renal cell carcinoma subtype with a very poor prognosis and solute carrier (SLC) gene family (SLC3A1, SLC9A3, SLC26A7, and SLC47A1) was included in the most downregulated genes [24]. According to this study trend, our newly identified biomarkers may further aid in unraveling the evolutionary trajectories of ccRCC tumor growth.
This study has a few limitations. First, this study intentionally collected aggressive cases of ccRCC; thus, selection bias may have resulted in statistical differences from other NGS studies. However, this collection method has benefits, including the examination of unique cases of interest to clinicians. Second, because of the retrospective nature of this study, the study sample size was small; this limited our ability to further investigate the ITH of primary tumors, although the ITH in ccRCC tumors, including even small renal masses, may be substantial [1]. In future studies, we plan to overcome this ITH limitation by analyzing the circulating tumor DNA in plasma. Third, RNA-seq was based only on surgical FFPE samples and we could not provide validation results from a different independent dataset. To overcome these limitations and validate our findings, a prospective clinical trial is now underway (ClinicalTrials.gov Identifier: NCT03694912) that will include an examination of the mutation profiles of the selected genes (PBRM1, BAP1, SETD2, KDM5C, FOXC2, CLIP4, MOCOS, BAIAP2L1, DDX11, and AQP1) for aggressive ccRCC. However, our preliminary data obtained using frozen tissue PCR show promising results for DDX11, and in vivo and in vitro studies on each gene are ongoing. The demonstration of immunohistochemistry in selected genes would have provided the association between RNA and protein. However, we have focused on the use of RNA expression profiles since it is easier and faster to use in the clinical settings compared to immunohistochemistry. Nevertheless, the protein expression profiles would provide more specific information regarding the role of selected genes. Therefore, we are planning to include protein expression analysis in the future study.

Patients and Tissues
For this retrospective study, data were gathered from 1132 patients with ccRCC (≤7 cm) who underwent radical and partial nephrectomy between January 2008 and December 2014. An aggressive tumor was defined as a tumor exhibiting synchronous metastasis, recurrence, or cancer-specific death; synchronous metastasis was defined as metastasis detected at or within three months of the primary RCC diagnosis [25]. Twelve patients with aggressive ccRCC and 12 matched patients in terms of gender, BMI, percentage of radical surgeries performed, clinical tumor sizes, and Fuhrman grades with non-aggressive ccRCC were examined in order to adjust the confounding factors other than aggressiveness of ccRCC. This study was approved by the Institutional Review Board of the Yonsei University Health System (project no: 4-2013-0742).
We included patients with ccRCC (≤7 cm) treated with nephrectomy alone and for whom formalin-fixed paraffin-embedded (FFPE) tumor tissue samples were available; we also included patients who did not exhibit the typical characteristics of ccRCC. Specimens that were inappropriate for molecular analysis were excluded, and clinicopathologic features were recorded for each patient (Supplementary Method 1). Diameters of the primary tumors were measured using imaging modalities.
For validation analyses, we used a prospective cohort (ClinicalTrials.gov Identifier: NCT03694912) sampled between November 2018 and July 2019 and included cases with small ccRCCs (≤4 cm). Data on clinicopathologic features were included, as previously described in our retrospective study group. The levels of six previously identified genes (PBRM1, BAP1, SETD2, KDM5C, FOXC2, and CLIP4), plus three genes (AQP1, DDX11, and BAIAP2L1) among the ten newly identified genes, were considered potent indicators of outcome in ccRCC patients according to a literature search. Gene expression was measured by reverse-transcription polymerase chain reaction (qRT-PCR).

Tissue Preparation
Formalin-fixed, paraffin-embedded (FFPE) sections from patients with ccRCC were obtained from the archives of the Department of Pathology at Yonsei University College of Medicine (Seoul, Korea). All cases were reviewed and classified by a urologic pathologist (N.H.C.). Non-tumor elements were identified based on hematoxylin and eosin-stained slides from each sample by a urologic pathologist (N.H.C.). The samples were subsequently cut into 20-µm sections before being transferred to an extraction tube.

RNA Extraction and Sequencing
Briefly, for FFPE tissues, RNA was extracted from 20 µm sections using an FFPE RNeasy kit (Qiagen, Gaithersburg, MD, USA), and 100 ng of RNA was used to construct a sequencing library using a TruSeq RNA Access library prep kit (Illumina, San Diego, CA, USA) according to the manufacturer's protocols (Supplementary Method 2).
The sequencing libraries were quantified by the Kapa Biosystems Library Quantification Kit for Illumina Sequencing platforms according to the qPCR Quantification Protocol Guide (KapaBiosystems, Wilmington, MA, USA Cat #KK4854) and qualified by the TapeStation D1000 ScreenTape (Agilent Technologies, Palo Alto, CA, USA Cat # 5067-5582). Indexed libraries were then submitted to Illumina NovaSeq 6000 (Illumina), and paired-end (2 × 101 bp) sequencing was performed by Macrogen Incorporated.

Analysis of RNA Sequencing-Differentially Expressed Gene (DEG) Selection
Paired-end sequencing reads from cDNA libraries (101 bp) were generated and the trimmed reads were aligned to the reference human genome ([University of California Santa Cruz] UCSC hg19). Transcriptome assembly of known transcripts, novel transcripts, and alternative splicing transcripts was processed.
Based on the results of this processing, the abundance of transcript and gene expression was calculated as read count (the number of reads mapping to a gene) or FPKM value (fragments per kilobase of exon per million fragments mapped) per sample. To identify DEGs from aggressive and non-aggressive ccRCC groups, genes with more than one sample of zero read count were pre-filtered. We then measured the expression levels of 27,685 RefSeq genes and 12,636 genes (Supplementary Method 3).
For DEGs, unsupervised hierarchical clustering analysis was performed with the complete linkage method and Euclidean distance as a measure of similarity. Principal component analysis (PCA) was performed to reduce the dimensionality of the data set by transforming it into a new set of variables to summarize the data features process (Supplementary Method 3). All data analysis and visualization of DEGs were conducted using R.3.5.1 (www.r-project.org).

Variant Calling
For variant calling of RNA-seq data, trimmed reads were aligned to the human genome (UCSC hg19). The alignment of the reads used in this analysis was created through Split 'N' Trim, mapping quality reassignment, indel realignment, and base recalibration process (Supplementary Method 4). For our target gene list (PBRM1, BAP1, SETD2, KDM5C, FOXC2, and CLIP4), the frequency of alterations in all samples was represented on the heatmap. Individual genes are represented as rows, and individual patients are represented as columns.

qRT-PCR
Total RNA was extracted from frozen tissue samples using TRIzol ® reagent (Ambion, Life Technologies, Carlsbad, CA, USA), and 1 µg of total RNA was reverse-transcribed into first-strand cDNA using an iNtRon Maxime RT PreMix (Intronbio, Sungnam, Korea Cat No. 25081) according to the manufacturer's protocol (Supplementary Method 5).

GO Analysis and KEGG Analysis
Gene Ontology (GO) analysis per biological process (BP), cellular component (CC), and molecular function (MF) for DEGs were performed based on Gene Ontology (http://geneontology.org/). KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis for DEGs was also performed based on the KEGG pathway (https://www.genome.jp/kegg/) database (Supplementary Method 6).

Validation of Gene Expression, Survival Analyses, and Statistical Analyses
UALCAN (http://ualcan.path.uab.edu) was used to validate the expression of ten newly identified genes and six target genes according to the stage of the cancer. The Kaplan-Meier plotter database (http://kmplot.com) was used to analyze the associations between 16 genes (10 newly identified and six target genes) and overall patient survival. Additional details for statistical analyses are provided in Supplementary Method 7.

Conclusions
In summary, we identified DEGs in aggressive early-stage ccRCC, defined as those with synchronous metastasis, recurrence, or cancer-specific death, and found that MOCOS, RGPD8, BAIAP2L1, and DDX11 showed significantly higher expression levels, whereas SLC16A9, FRAS1, AQP1, TMEM38B, and PRUNE2 showed significantly lower expression levels compared to those in non-aggressive ccRCC. Moreover, we verified previously known genes that are associated with tumor aggressiveness by analyzing mutational frequency. Further research on these molecular biomarkers may help stratify patients with early-stage ccRCC by molecular subtyping and aid in making clinical decisions accordingly.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/1/222/s1. Table S1: PCR primer sequences of target genes. Table S2: Patient baseline characteristics. Table S3: Mutational frequencies of the candidate aggressiveness-associated genes. Table S4. Expression of combination of DDX11, TMEM38B, and PRUNE2 by oncological outcomes (cancer-specific death and recurrence) (in addition to Table 4). Table S5. Comparison of expression levels of target genes according to Fuhrman grade in clear cell renal cell carcinoma (≤4 cm). Figure S1. Volcano plots showing a comparison of gene expression levels in aggressive ccRCC and non-aggressive ccRCC. Figure S2. Box plots showing a comparison of expression levels of ten selected genes in aggressive and non-aggressive ccRCC. Figure S3. Supervised hierarchical clustering analysis (red, high relative expression; green, low relative expression) of aggressive ccRCC patients (n = 12, red) versus matched non-aggressive ccRCC patients (n = 12, blue) based on the ten selected genes. Figure S4. Expression levels of BAIAP2L1 and MOCOS according to the synchronous metastatic sites (lung and bone) shown as box plots. Figure S5. Validation of the expression of 16 genes in the TCGA database according to stage. Figure S6

Conflicts of Interest:
The authors declare no conflict of interest.