Comprehensive Profiling of Primary and Metastatic ccRCC Reveals a High Homology of the Metastases to a Subregion of the Primary Tumour

While intratumour genetic heterogeneity of primary clear cell renal cell carcinoma (ccRCC) is well characterized, the genomic profiles of metastatic ccRCCs are seldom studied. We profiled the genomes and transcriptomes of a primary tumour and matched metastases to better understand the evolutionary processes that lead to metastasis. In one ccRCC patient, four regions of the primary tumour, one region of the thrombus in the inferior vena cava, and four lung metastases (including one taken after pegylated (PEG)-interferon therapy) were analysed separately. Each sample was analysed for copy number alterations and somatic mutations by whole exome sequencing. We also evaluated gene expression profiles for this patient and 15 primary tumour and 15 metastasis samples from four additional patients. Copy number profiles of the index patient showed two distinct subgroups: one consisted of three primary tumours with relatively minor copy number changes, the other of a primary tumour, the thrombus, and the lung metastases, all with a similar copy number pattern and tetraploid-like characteristics. Somatic mutation profiles indicated parallel clonal evolution with similar numbers of private mutations in each primary tumour and metastatic sample. Expression profiling of the five patients revealed significantly changed expression levels of 57 genes between primary tumours and metastases, with enrichment in the extracellular matrix cluster. The copy number profiles suggest a punctuated evolution from a subregion of the primary tumour. This process, which differentiated the metastases from the primary tumours, most likely occurred rapidly, possibly even before metastasis formation. The evolutionary patterns we deduced from the genomic alterations were also reflected in the gene expression profiles.


Patient RC-1 Medical History and Sample Origin
A male patient with diagnosis of clear cell renal cell carcinoma (ccRCC) pT3bNoM1 underwent surgery for a primary tumour resection in the right kidney. Four samples were taken from the primary tumour; a tumour region with tumour grade 4 (Pr1), tumour grade 3 (Pr2), tumour grade 4sarcomatoid differentiation (Pr4), and tumour grade 3 (Pr3). Also a tumour thrombus reaching inferior vena cava was resected (sample VT with tumour grade 2). In the same year the patient underwent wedge excision for the metastatic lesions in the right and the left lungs. Three samples were taken from this procedure; a metastasis in the lingula of the left lung with tumour grade 4sarcomatoid differentiation (M1), a metastasis in the dorsal apex, lower lobe of the left lung with tumour grade 4 (M2), a metastasis in the lateral basal, the under lobe of the right lung with tumour grade 3 (M3). A year later a recurrent metastatic lesion was found in the left lung. The patient received conventional immunotherapy of PEG-interferon with a good response. Two years later, the patient underwent resection of the whole upper lobe of the left lung (M4 with tumour grade 4-rhabdoid).

DNA and RNA Isolation of FFPE Blocks
The FFPE blocks were serially cut in 10 µm slides, in which the first and the last slides (3 µm in thickness) were stained with H&E and used as reference in identifying tumour regions with a different WHO/ISUP grade. The odd cut slides were processed for DNA isolation and the even slides were processed for RNA isolation. The isolation of DNA was performed following the Adaptive Focused Acoustics TM -based DNA extraction of FFPE/ truXTRAC TM FFPE DNA kit protocol (Covaris, Woburn, MA, USA) and RNA isolation was performed following the Adaptive Focused Acoustics TMbased RNA extraction of FFPE/ truXTRAC TM FFPE microTUBE RNA kit protocol (Covaris, Woburn, MA, USA).

Correction Based on Tumour Cell Purity and Somatic Mutation Identification
The mutant read frequency (MRF) of each variant detected by WES was corrected for the normal cell admixture. The approximate percentage of normal cells in each sample was calculated based on the read counts for personal variants on the short arm of chromosome 3. As arrayCGH data indicated loss of 3p in all samples except M4, all tumour cells contribute only one copy of the personal variant, whereas all normal cells contribute one copy of both alleles for a certain personal variant. M4 appears to have two identical copies of chromosome 3. Thus the imbalance between both alleles can be used to estimate the percentage of tumour cells. The MRF of all variants in each sample was recalculated based on the percentage of total reads belonging to the tumour cells.
For each patient, the mutations present in each tumour area were classified into major clonal, minor clonal, absent, or inconclusive [5]. If the total number of mutant reads was ≥5 and the MRF ≥ 25%, the mutation was considered to be a major clone. If the total number of mutant reads was ≥5 and the MRF < 25%, or the total number of mutant reads was ≥3 and the total read count was ≥ 10, the mutation was defined as a minor clone. If the total number of mutant reads was < 3 and the total read count was ≥ 10, the mutation was defined as absent. Every mutation with total read count < 10 was considered as inconclusive. Matched normal kidney samples were included to remove personal variants in WES and TS data. The Integrative Genomic Viewer (IGV) was used to confirm the authenticity of the identified somatic mutations [6].

Ploidy Estimations Based on Median BAFs of the Germline Variants
We used the SNV allele counts for the germline variants to get more information on the ploidy state of each tumour sample. The germline variants were selected based on having a B-allele frequency (BAF) 0.4-0.5 in the normal sample. We calculated the BAF of all variants in tumour samples and determined the median BAF for a selection of chromosome, or chromosomal segments (supplementary table 1). The median BAF for the germline variants in the normal sample is 0.46 and is representative of an even number of chromosomal copies. If in a tumour sample the median BAF for any segment is close to this value, this indicates that also in the tumour cells the absolute copy number for this genomic segment is even. If this occurs for the lowest ploidy level in an array CGH plot this means that this level represents a copy number of 2. In contrast, if the median BAF in a tumour sample is very different from 0.46, this indicates either an odd number of copies or an isodisomic situation.

Somatic Mutation Validation by Targeted Sequencing for
To validate the somatic mutations previously detected by WES in Patient RC-1, a targeted sequencing assay based on Single Primer Enrichment Technology (SPET) has been designed using the Ovation TM Custom Target Enrichment System (NuGEN, San Carlos, CA, USA). Landing probes were designed close to, and on both sides of the selected mutations (see supplementary Table 3). The library preparation was done according to the FFPE-specific protocol from the manufacturer (NuGEN, San Carlos, CA). Single-end sequencing of enriched libraries was performed on the Illumina NextSeq 550 System (Illumina, San Diego, CA, USA).

Generation of Phylogenetic Trees
Unsupervised hierarchical clustering was done in R 3.4.0 with the ape 5.0 package using the binary distance matrix based on the presence or absence of amplification/deletion in each chromosome arm generated by arrayCGH.

RNAseq Reads Processing and Gene Expression Analysis
In the processing of RNAseq reads, FASTQ files were polyA and polyG trimmed. Then, the reads were aligned to the reference genome (grch37 1000 genomes reference build with decoy sequences from the GATK bundle ( [7,8] with ensembl version 75 transcript annotation [9] using hisat [10]. General read operations were performed using SAMtools [11]. The gene-level quantification was performed using Htseq-count [12]. The analysis of gene expression was done per patient in R 3.4.0 using DESeq2 package, which is based on negative binomial generalized linear models [13]. Only genes with a mean read count ≥50 across all samples of a patient were used for further analysis. Unsupervised gene clustering, based on the 500 most highly variable expressed genes across all samples in each patient, was done to compare the gene expression profiles in each patient. Regularized-logarithm transformation (rlog) was used to transform the count data to the log2 scale which minimizes differences between samples with small counts of genes, and which normalizes according to library size. The heat map was constructed based on the amount of deviation of each gene in a specific sample from the mean expression of that particular gene in all samples. The colour ranges for gene expression was adjusted with the rlog -2 to 2 as the core and rlog -6 to 6 as the extension.
Supervised gene clustering analysis was used to identify genes differentially expressed between primary tumours and metastases from all five patients, in which P-values were adjusted for multiple testing correction (Benjamin-Hochberg algorithm). A Padj-value ≤0.01 was considered statistically significant.