Clonal Evolution and Timing of Metastatic Colorectal Cancer

Simple Summary Half of all colorectal cancer (CRC) patients develop metastasis, despite current management. The aim of this study was to help guide precision medicine in metastatic CRC patients, by performing genomic characterisation of primary CRC and metastatic tumours, and revealing the effects of therapy on the metastatic process. We confirmed common ancestry between paired primary CRC and metastatic tumours, with most metastases seemingly having disseminated late (after acquiring most genomic diversity) from their corresponding primary tumour, via either a single clone (monoclonal spread) or multiple clones (polyclonal spread). Treatment prompted the selection for distinct resistant clones, through monoclonal seeding to distant metastatic sites. Overall, this study supports the importance of early clinical detection and surgical removal of the CRC tumour, whilst further highlighting the challenges for treating and managing metastatic CRC with increased intratumour heterogeneity (either due to early dissemination or metastatic spread through multiple clones) and the underlying risk of future therapeutic resistance in treated patients. Abstract Colorectal cancer (CRC) is the third most frequently diagnosed cancer worldwide, where ~50% of patients develop metastasis, despite current improved management. Genomic characterisation of metastatic CRC, and elucidating the effects of therapy on the metastatic process, are essential to help guide precision medicine. Multi-region whole-exome sequencing was performed on 191 sampled tumour regions of patient-matched therapy-naïve and treated CRC primary tumours (n = 92 tumour regions) and metastases (n = 99 tumour regions), in 30 patients. Somatic variants were analysed to define the origin, composition, and timing of seeding in the metastatic progression of therapy-naïve and treated metastatic CRC. High concordance, with few genomic differences, was observed between primary CRC and metastases. Most cases supported a late dissemination model, via either monoclonal or polyclonal seeding. Polyclonal seeding appeared more common in therapy-naïve metastases than in treated metastases. Whereby, treatment prompted for the selection of distinct resistant clones, through monoclonal seeding to distant metastatic sites. Overall, this study reinforces the importance of early clinical detection and surgical excision of the CRC tumour, whilst further highlighting the clinical challenges for metastatic CRC with increased intratumour heterogeneity (either due to early dissemination or polyclonal metastatic spread) and the underlying risk of future therapeutic resistance in treated patients.

Simple Summary: Half of all colorectal cancer (CRC) patients develop metastasis, despite current management. The aim of this study was to help guide precision medicine in metastatic CRC patients, by performing genomic characterisation of primary CRC and metastatic tumours, and revealing the effects of therapy on the metastatic process. We confirmed common ancestry between paired primary CRC and metastatic tumours, with most metastases seemingly having disseminated late (after acquiring most genomic diversity) from their corresponding primary tumour, via either a single clone (monoclonal spread) or multiple clones (polyclonal spread). Treatment prompted the selection for distinct resistant clones, through monoclonal seeding to distant metastatic sites. Overall, this study supports the importance of early clinical detection and surgical removal of the CRC tumour, whilst further highlighting the challenges for treating and managing metastatic CRC with increased intratumour heterogeneity (either due to early dissemination or metastatic spread through multiple clones) and the underlying risk of future therapeutic resistance in treated patients.
Abstract: Colorectal cancer (CRC) is the third most frequently diagnosed cancer worldwide, wherẽ 50% of patients develop metastasis, despite current improved management. Genomic characterisation of metastatic CRC, and elucidating the effects of therapy on the metastatic process, are essential to help guide precision medicine. Multi-region whole-exome sequencing was performed on 191 sampled tumour regions of patient-matched therapy-naïve and treated CRC primary tumours (n = 92 tumour regions) and metastases (n = 99 tumour regions), in 30 patients. Somatic variants were analysed to define the origin, composition, and timing of seeding in the metastatic progression of therapy-naïve and treated metastatic CRC. High concordance, with few genomic differences, was observed between primary CRC and metastases. Most cases supported a late dissemination model, via either monoclonal or polyclonal seeding. Polyclonal seeding appeared more common in therapy-naïve metastases than in

Introduction
Colorectal cancer (CRC) is the third most diagnosed cancer worldwide, and one of the top most in Saudi Arabia, with a distinctly earlier (≤10 years) disease onset compared to other ethnicities [1,2]. High mortality in CRC has been attributed to metastatic disease [1][2][3], whereby half of all patients develop metastasis, despite the improved management of metastatic CRC in recent years [4].
The evolutionary process of CRC is well documented. Whereby, a gradual multi-step model has been used to describe the transformation of intestinal epithelium into invasive adenocarcinoma, from which disseminating tumour cells can then migrate directly, or via haematogenous or lymphogenous spread and colonise regional or distant organs [5], such as the liver, lung, and peritoneum and less frequently the bone and brain [6].
Predominantly, seeding originates from the primary tumour and is spread via a single clone (monoclonal seeding) [7] or multiple subclones (either synchronous or asynchronous polyclonal seeding) to a metastatic site [8][9][10]. Recent studies have also shown metastatic cross-seeding, where one metastasis can seed secondary metastatic sites, via metastatic cascading [8,11].
Several studies have traditionally defined the process of CRC metastasis through a linear progression or late dissemination model, where metastatic divergence occurs late in the primary tumour relative to tumorigenesis [11][12][13][14]. However, few recent studies have challenged this model, suggesting that metastatic seeding might occur early [15,16], before clinical detectability (<0.01 cm 3 ) and years before diagnosis or surgery, proposing most CRCs may be "born to be bad" [15].
Understanding the complexity of the metastatic process has been very problematic, even in this advanced genomic era, with difficulties going beyond the procurement of patient-matched paired primary tumours and their distant metastases. Hence, it is of high clinical importance to clarify the origin, seeding composition, and timing of spread to help extricate the evolutionary metastatic process in CRC.
We sought to expand upon current knowledge on the metastatic process in CRC, in the Saudi population, and further highlight the effect of therapy. We used whole-exome sequencing (WES) analysis from multiple tumour regions of paired primary CRC and distant metastatic lesions, from 30 patients with metastatic CRC, to define the origin, composition, and timing of seeding in the metastatic progression of therapy-naïve and treated metastatic CRC tumours.
Overall, this study reinforces the importance of early detection and removal of the CRC tumour, whilst further highlighting the clinical challenges for metastatic CRC with increased intratumour heterogeneity (either due to early dissemination or polyclonal metastatic spread) and the underlying risk of future therapeutic resistance in treated patients.
Tumour heterogeneity was further assessed spatially, within tumours (intratumour heterogeneity; accounting for differences amongst multiple spatially distinct primary tumour regions or metastatic tumour regions), and temporally, between tumours (intertumour heterogeneity; accounting for differences between patient-matched primary tumour regions and metastatic tumour regions) in each patient, using Shannon's diversity index (SDI) [17,18]. Following dissemination, all patients demonstrated significantly more intertumour (median 2.0, range 0.48-3.57) rather than intratumour (primary tumour-specific median 1.5, range 0.48-2.46, and metastatic-specific median 1.2 mutations, range 0.12-3.01) heterogeneity, where the tumours (primary and metastasis) continued to evolve independently after the metastatic seeding event (BH adjusted p = 0.030 for primary tumours and p = 0.002 for metastatic tumours; Mann-Whitney U test) ( Figure S2a). Intertumour heterogeneity, and intratumour heterogeneity amongst metastases, were independent of clinicopathological variables, such as TNM (tumour-node-metastasis) classification and histological grading, where no association was observed (p > 0.05; ANOVA test). However, interestingly, less intratumour heterogeneity amongst primary tumours was significantly associated with a higher prevalence of multiple regional lymph node metastases (p = 0.0176; ANOVA test). Therefore, increased regional lymph node involvement may not necessarily reflect proportionally increased intratumour heterogeneity in the primary tumour.
Dynamic mutational processes were identified using the 30 published mutational signatures from the COSMIC database (https://cancer.sanger.ac.uk/cosmic/signatures) ( Figure S3). Whereby, mutational signature 1, associated with age at diagnosis and mutational signatures related to defective DNA mismatch repair (MMR; 6 and/or 15), were activated consistently, across the cohort. However, upon further stratification according to treatment, mutational signature 3, associated with failure of DNA double-strand break-repair by homologous recombination (HR) and mutational signature 17, were specifically activated late in the progression of treated metastases only. Recently, the COSMIC mutational signature 17 (SBS17) has further been split into SBS17a (of unknown aetiology) and SBS17b (an exogenous signature related to capecitabine and 5-fluorouracil, or 5-FU, chemotherapy, and damage by reactive oxygen species, with significantly more T > G nucleotide changes [19,20]). Treatment, which mainly involved a therapeutic regimen comprising capecitabine and 5-FU chemotherapy, was significantly associated with more T > G nucleotide changes in the current metastatic CRC cohort (treated 3.73% vs. therapy-naïve 2.64%) (BH adjusted p = 0.014, Chi squared test). Therefore, the intratumor heterogeneity, post-dissemination, in treated metastases is probably resultant of the collective dysregulation of DNA repair processes, and the capecitabine and 5-FU chemotherapy.
Although CNV patterns were also largely similar between primary and metastatic tumours (Table S4), CNVs were significantly biased towards being deletions, with 90% (range 0-100%) of all CNVs identified as deletions compared to 10% (range 0-100%) of amplifications (BH adjusted p < 0.0001; Mann-Whitney U test). Despite treatment, CNVs persisted, as previously reported [21], which may confer resistance to therapy. age, primary site, metastatic site, onset of metastasis, treatment status, number of sequenced regions, and genome doubling. The percentage of variants shared between primary and metastasis, specific to primary, or specific to metastasis have been indicated in the middle panel. "P" refers to primary CRC tumour and "M", refers to metastatic lesion. The treatment status of the primary tumour of patient 112-107, marked by an asterisk (*), was not available.
Evidence of chromosomal instability, generated through genome doubling (GD) or polyploidy, resulting in additional copies of the entire genome of a tumour cell, was also investigated. A region with mean ploidy ≥3 was considered to have undergone genome doubling. Whereby, 40% (12/30) of cases presented GD events, most of which were restricted to the metastatic lesion ( Figure 1 and Figure S1, Table S4), in agreement with previous reporting [22] albeit, this was not statistically significant (p > 0.05; Chi-squared test). Although chromosomal instability is characteristically generated in CRC, treatment and the metastatic organ site colonised appeared to play a minimal role on copy number evolution in metastatic CRC.

Driver Events
Potential driver events were identified using a list of cancer genes, from the COSMIC cancer gene census (v90), The Cancer Genome Atlas-human colorectal carcinoma (TCGA-CRC), and other large genomic studies, which were then further assessed for functional impact using several prediction tools, as previously described [23] (see Materials and methods, Tables S3 and S4 for further details of 'driver classification'). The distribution of driver events was further investigated in the multiple tumour regions sampled from each patient. A total of 131 driver mutations were identified, across the metastatic CRC cohort. Where, 94 were clonal (including shared clonal, primary-specific clonal, or metastatic-specific clonal) and 37 were subclonal (including shared subclonal, primary-specific subclonal, or metastatic-specific subclonal) ( Figure 2).
Recurrent driver mutations in the cohort were found encompassing cancer genes with known roles in CRC [24,25] [26][27][28][29]. High concordance in driver genes was observed between primary CRC and metastatic tumours, in agreement with previous reporting [30]; of which, 88% of shared (between the primary tumour and corresponding metastatic lesions) driver mutations were clonal.
If not readily clonal in its primary-metastatic pair, most driver mutations became clonal in their corresponding metastatic tumours, likely upon migration due to evolutionary bottlenecking. In contrast, driver mutations in PIK3CA were consistently subclonal, in agreement with previous reporting [31]. The pattern of disseminating subclonal driver mutations in the primary tumour, either becoming clonal or remaining subclonal upon seeding/colonising the metastatic site, was only observed in therapy-naïve tumours. Whereas, the pattern of disseminating clonal driver mutations in the primary tumour, becoming subclonal upon seeding/colonising the metastatic site, was only observed in treated tumours; thus re-emphasising the relevance of adjuvant therapy following tumour excision in CRC for effectively eradicating and/or preventing subclones in the primary tumour, from either successful seeding/colonising and/or thriving at the metastatic site.  (Table S4 and Figure S1).

Origin of Metastasis
Phylogenies were clustered spatially and temporally for each CRC tumour, lymph node, where relevant, and distant metastasis, based on mutational cancer cell fractions (CCFs) from both SNVs and Indels (Table S5 and Figure S4). All sampled regions indicated a common clonal ancestry, amongst matched primary-metastasis tumour pairs, whereby metastases were often seeded by major clones in the primary tumour, further supported by the increased driver homogeneity and fewer metastatic-specific driver mutations observed in the cohort.
Notably, distant metastasis-founding subclones were not detected in any primary region sampled from patient CRC-233, which instead underwent a metastatic cascade, from the regional lymph node to the liver. Despite the remaining two cases (CRC-141 and CRC-293) with available lymph node metastases having seemingly metastasised independent of lymphatic involvement to their corresponding distant metastatic sites (prostate and ovary), they may also have indeed spread lymphogenously by an alternative lymph node that was not sampled.
Phylogenetic analysis further revealed mixed seeding patterns, as previously reported in CRC [11,32], including monoclonal seeding (60.6%, 20/33 of all metastases), where only a single clone from the primary tumour/lymph node metastasis was involved in seeding the metastasis, and polyclonal seeding (39.4%, 13/33 cases), where more than one primary tumour/lymph node metastasis subclone contributed to seeding the metastasis. Thus, indicating that multiple subclones may be able to attain metastatic potential and colonise distant metastatic sites through mutual cooperation.

Metastatic Divergence
To estimate the timing of dissemination in relation to the founding of the primary tumour, the final molecular time at the primary tumour was approximated and further classified as late (final molecular time ≥ 50%), or early dissemination (final molecular time < 50%) ( Figure 3 and Figure S4).   Whereby, most clonal diversity emerged at the primary CRC site, which was distributed to distant metastatic sites, via metastatic divergence after a median of 79.6% (range 20.8-95.3) final molecular time at the primary tumour ( Figure 3). Contrastingly, across the cohort, 78.6% (81/103) of all primary tumour clusters were exclusive to primary tumours, either never having metastasised due to natural negative selection (in therapy-naïve metastases), having arisen after the colonisation of the metastatic lesion, having successfully been eradicated under therapy (in treated metastases), or having seemingly not been present due to block sampling.

ONSET OF METASTASIS
Nearly all patients exhibited linear metastatic progression patterns in their distant metastases (and regional lymph node metastases, where relevant), irrespective of treatment, clonal seeding, and site colonised, whereby all primary tumours disseminated relatively late in molecular time, gaining most of their genetic diversity at the primary tumour.
In contrast, four patients (083-32, 306-171, CRC-474, and CRC-477) incurred parallel metastatic progression to the liver, disseminating after a median final molecular time of 42.1% (range 20.8-46.7%) at the primary tumour. As expected, early dissemination was mostly found in synchronous metastases (75%), evident irrespective of treatment. Interestingly, early disseminating cases were significantly enriched for five altered (via mutation or copy-number deletion) tumour suppressor driver genes, including BRCA2 Unlike a previous report [11,32], where synchronous metastases were shown to disseminate significantly earlier than metachronous metastases, no significant difference was found between synchronous and metachronous metastases in the current metastatic CRC cohort. However, treated synchronous metastases (median 57.2%, range 41.8-83.9%) seemingly appeared to disseminate earlier (on average 23.4% earlier) than untreated synchronous metastases (median 79.5%, range 20.8-95.3%) and treated metachronous metastases (median 81.7%, range 46.7-94.3%). However, this did not reach statistical significance, possibly due to the small number of treated synchronous metastases analysed (p > 0.05, Chi-squared test).

Therapy-Resistant Subclones
Neutral selection (dN/dS, a ratio of the substitution rate at non-synonymous sites to those at synonymous sites,~1) was consistently observed throughout the cohort, when considering all exonic mutations in both primary tumours and metastatic lesions (Table S6 and Figure S5). However, upon stratification according to treatment, in the metastases, a sharp decline in selection (dN/dS < 0.75) was observed in metastatic-specific clonal mutations, which substantially increased (dN/dS ≥ 1.5) in the subclonal mutations. Whereas, therapy-naïve tumours remained consistently neutral. This suggests very stringent selective pressure from treatment, before successful colonisation at the metastatic site, presumably to select therapy-resistant 'fit' clones in the micrometastasis, whereby positive selection followed during late metastatic progression, to produce diversity that can adapt to the changing microenvironment.
In further support of this, the majority of all disseminating primary tumour subclones, across treated samples, showed significant resistance to therapy (see Materials and methods for quantification of 'therapy resistance'), with significant resistance in ARID1B  (Table 2 and Table S5, Figure 4). Moreover, patients showing parallel progression (CRC-474 and CRC-477) exclusively showed resistance in all primary tumour disseminating subclones, comprising APC, TP53, CREBBP, LEF1 (MIM: 153245), SPEN (MIM: 613484), and MAP2K4 driver mutations. Furthermore, metastatic-specific subclones, containing driver mutations in ELF4, ERCC4, FUS, PTPRT, and SMAD4, may also confer resistance to therapy. Although select variants in driver genes seemingly displayed resistance or were acquired following therapy, in particular genes including APC, KRAS, NF1, RNF43, and TP53 may have potential to be clinically actionable, by indicating the possible suitability of certain therapies in patients harbouring these variants (Table 2).

Discussion
Through systematic analysis of WES data from multiple spatial regions of 30 paired primary CRC and distant metastatic tumours, we describe the overall genomic concordance between primary CRC and metastasis, with predominantly late metastatic divergence, through monoclonal or polyclonal spread from metastatic 'fit' clones in the primary tumour. Treatment had a diminutive effect on chromosomal instability and most driver mutations, assumingly to attain resistance mechanisms for increased chances of tumour cell survival. Furthermore, the few existing metastatic-specific driver mutations were mostly found in treated cases, similar to previous studies [34][35][36]. The intratumour heterogeneity, post-dissemination, in treated metastases was likely resultant of the collective dysregulation of DNA repair processes, from defective DNA MMR and HR, and the capecitabine and 5-FU chemotherapy, comparable to recent studies [14,32,37].
Of great clinical importance, most cases in our metastatic CRC cohort supported late dissemination, in accordance with the Fearon-Vogelstein multistage progression MSS CRC model [11][12][13][14]. This was consistent, irrespective of treatment status and the organ site colonised. Thus, further highlighting the clinical relevance of current routine screening, and the ability of sigmoid scoping and colonoscopy to successfully detect early CRC lesions, to aid the prevention of metastatic disease. Despite all metastatic-specific driver mutations being present in late disseminating metastases, most driver events were shared between the primary tumour and metastasis, underlining the prospect of using a single diagnostic biopsy from the primary tumour to represent the majority of genomic variation at the metastatic site.
The few metastatic cases in the cohort following an early dissemination model (and similarly cases that incurred polyclonal seeding of the metastasis) can make it difficult to design effective therapeutic strategies using a single biopsy or limited regions, thus warranting the use of additional clinical intervention, such as liquid biopsies (including circulating cell-free tumour DNA profiling) for early detection and metastasis prevention [38]. However, these exceptional cases must not overshadow the importance of screening methods for early detection and the excision of the primary tumour, as suitable ways to reduce CRC mortality in most cases. Two studies have argued that although cancer evolution can be traced using phylogenetic approaches, the timing of dissemination cannot be fully resolved without the aid of other factors, such as chronological references and tumour size or volume, whereby they proposed very early dissemination of primary CRC tumours, i.e., parallel metastatic progression [15,16]. Therefore, metastatic progression in our CRC cohort may also have disseminated earlier than indicated using phylogenetic divergence.
Similar to our metastatic cohort, recent studies have also demonstrated both monoclonal and polyclonal seeding [11,32]. As expected [10,39], polyclonal seeding was more common in lymph node metastases (irrespective of treatment) and therapy-naïve distant metastases (55.6%) compared to treated distant metastases (20%). Thus, in the absence of therapy, multiple primary tumour disseminating subclones acquired metastatic potential for seeding regional and distant sites. Polyclonal seeding occurred both synchronously (at the same time) and asynchronously (over multiple waves), arising from distinct spatial regions in the primary tumour. All lymph node metastases that were seeded through polyclonal spread incurred consecutive waves of seeding, possibly due to their geographical proximity to the primary tumour and higher seeding frequency (as the draining lymph nodes encounter higher rates of tumour cells) compared to distant organ sites [33]. Nascent micrometastases may attract subsequent waves of recurrent seeding for the successful colonisation of a distinct metastatic site [40]. Therefore, the removal of primary tumour (or metastasis) after detection in some cases may be clinically necessary to prevent further metastatic seeding. This intervention has particularly been successful in the management of synchronous metastatic disease, whereby excision of the primary tumour with subsequent adjuvant therapy was associated with better overall survival in prostate cancer and non-small cell lung cancer [41,42].
Furthermore, although adjuvant therapy may effectively target micrometastases, preventing (or at least delaying) further metastatic progression for many patients, it selects for resistant subclones that can make subsequent management and treatment more challenging for patients that do inevitably develop metastasis.
This study was mostly (>70% of primary CRC and distant metastases) based on multi-region WES of precious patient-matched primary CRC and metastatic tumour samples, with additional lymph node metastases in three cases. Furthermore, ultra-depth targeted sequencing (median 2691×, range 1921-3842) was used to validate all somatic variants. Collectively, this allowed a higher resolution for the detection of subclonal mutations, more accurate phylogenetic analysis, and identification of polyclonal seeding patterns. However, the results of this study need to be validated in a larger cohort of multi-region patient-matched primary and metastatic high-depth sequenced data.

Patients and Tumour Samples
We sampled a total of 191 temporally and spatially distinct tumour regions with a median of 8 regions per patient (ranging 2-11), consisting of 92 primary tumour regions (median of 4 regions/patient, ranging 1-5) and 99 matched-metastatic lesions (median 4 regions/patient, ranging 1-7) from the liver (n = 18), ovary (n = 4), lung (n = 3), urinary bladder (n = 2), prostate (n = 1), adrenal gland (n = 1), and spleen (n = 1), with corresponding normal colorectal tissue from 30 patients with advanced-stage CRC from King Faisal Specialist Hospital and Research Centre. To assess intratumour heterogeneity, ≥2 spatially distinct tumour regions from each primary CRC and metastatic lesion, separated by a margin of 0.5-1.0 cm (depending on tumour size), were used in this study. Three patients had available metastatic lymph node tissue in addition to their distant metastases. Although 10 patients had incurred further distant metastases to additional organ sites, including the liver, lung, bone, and adrenal gland, the tumour tissue from these sites were not surgically excised due to widespread disease. The median age of CRC onset in Saudi Arabia is~10 years younger than reported for other ethnicities [1,2]. Clinical characteristics of the CRC cohort are provided in Table 1 and Table S1.

Ethics Statement
This study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (IRB) of the King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia under the Project RAC # 2190 016, on 8th October 2019. A waiver of consent was granted by the IRB for the use of archived formalin-fixed paraffin-embedded CRC clinical samples. All clinical data was de-identified.

Whole-Exome Sequencing
Agilent SureSelectXT Target Enrichment was used for library preparation. The quantification and quality of DNA was measured by PicoGreen and agarose gel electrophoresis. For each sampled biopsy, 1 µg of FFPE genomic DNA was diluted in EB Buffer and sheared to a target peak size of 150-300 bp, using the Covaris LE220 focused-ultrasonicator (Covaris, Woburn, MA, USA), according to the manufacturer's recommendations. The fragmented DNA was repaired, and adapters were ligated to the fragments. Upon ligation, the adapter ligated product was polymerase chain reaction (PCR) amplified. The final purified product was then quantified using the TapeStation DNA screentape D1000 (Agilent). The captured DNA was then washed and further amplified. The final purified product was then quantified using qPCR, according to qPCR Quantification Protocol Guide (KAPA Library Quantification kits for Illumina Sequencing platforms) and qualified using the TapeStation DNA screentape D1000 (Agilent). Whole-exome sequencing (WES) was performed at a median coverage of 166× (range 199-255), using Illumina NovaSeq 6000 platform (Illumina, San Diego, USA), for each tumour region (n = 191) and matched normal colorectal tissue (n = 30) (Table S2). Phred quality (Q) scores were calculated to check the accuracy of each base nucleotide called, with greater Q values indicating a greater accuracy of sequencing. The median Q value obtained at the quality score of 30 was 94.21, and at the quality score of 20 was and 97.51 (Table S2). A base nucleotide quality score of 30 indicates the chances of having a base call error to be 1 in 1000 (base call accuracy 99.9%).

Variant Calling
Short somatic variants, including single nucleotide variants (SNVs), dinucleotide variants, and small insertions and deletions (indels), were called via the short variant caller GATK v4.0.12.0 tool-MuTect2 [44,45], using tumour with matched normal, to exclude germline variants. Variants presenting with the following features were also excluded: common single nucleotide polymorphisms (SNPs; minor allele frequency of >0.01) found in dbSNP, the National Heart, Lung, and Blood Institute exome sequencing project, 1000 Genomes, Exome Aggregation Consortium (ExAC), and in our in-house data from exome sequencing of~800 normal samples. Only mutations with a variant allele frequency (VAF) ≥1%, minimum alternative reads as 4 and regional sequencing depth ≥15, where germline reads ≥8 with a VAF < 1%, were retained. Variants with more than 2 mapping quality zero reads were considered false-positives and removed from the analysis. In addition to sequencing depth, VAF and minimal counts of alternative reads, a minimal low log odds (LOD) threshold of 6.3 was used for somatic mutations to filter out artefacts.

Mutation Validation
Further target capture sequencing using SureSelect DNA Design at a median depth of 2691× (range 1921-3842) was performed to validate the putative somatic variants (Tables S2 and S3). All filters were applied as above. SNVs were further confirmed using BamReadCount, whereas all indels were manually verified using the Integrated Genomics Viewer (IGV) v2.4.10 [49,50] to filter out false positives.

Using SNPs for Patient Sample Mismatch or Swaps
All tumour regions and the corresponding germline sample from a single patient should show a highly similar SNP profile. The VAF of 24 SNPs identified by Pengelly et al. [51] were checked in each tumour region to ensure no sample swaps or contamination.

Microsatellite Stability Classification
All samples were devoid of germline mutations in DNA mismatch repair (MMR) genes, MLH1, MSH2, MSH6, and PMS2. To further validate the MSI status from whole-exome sequencing data, the MSIsensor tool [52] was used to determine the MSI score of each sample. All samples were confirmed to be MSS with a lower MSI score (MSI sensor score < 3.5) (Table S7). Finally, immunohistochemistry (IHC) analysis was performed on MMR proteins (MLH1, MSH2, MSH6, and PMS2), as previously described [53], whereby all samples corroborated positive expression of MMR proteins. Collectively, all samples were categorised as MSS.

Mutational Signature Analysis
Mutational signature profiles were predicted using the deconstructSigs package in R, by comparing 30 published mutational signatures from the COSMIC database reported in different cancer types.

Gene Copy Number Profiling, Cancer Cell Fraction, and Genome Doubling
FACETS v0.5.13 [54] was used to determine copy number variations (CNVs) and regions of loss of heterozygosity (LOH). Mean allelic frequency (MAF) was produced from mutation data. ABSOLUTE v1.0.6 [55] defined the integer copy number and cancer cell fractions (CCFs) of mutations and CNVs. CNVs were further defined as amplifications and deletions, in relation to the average ploidy of all sampled regions from a given patient using the copy number for each segment from facets. Gene-level amplifications and deletions were classified by mean gene copy number ≥2× ploidy +1 copy or ≤2× ploidy −1 copy, respectively [23]. Genome doubling was calculated based on the mean ploidy of the sample. If the mean ploidy reported by FACETS was greater than 3, the sample was termed to have undergone genome doubling [56].

Clonality
If all tumour regions acquired the equivalent mutation, the mutation was classified as clonal/trunk. Absence of the mutation in any one region was classified as subclonal/branched. Similarly, if all tumour regions harboured the equivalent CNV, the gene was classified as clonally amplified or clonally deleted. Absence of the CNV in any one region was classified as subclonal.

Driver Mutation Classification
Non-silent variants were classified based on a list of potential driver cancer genes (n = 786), from the COSMIC cancer gene census (v90), The Cancer Genome Atlas (TCGA)-CRC [25], and other large genomic studies [35,57]. Putative driver mutations were classified: If the gene was identified as a tumour suppressor by COSMIC and the variant was found to be deleterious (stop-gain or predicted deleterious in two of the three computational tools-SIFT, PolyPhen-2 and MutationTaster); or if the gene was classified as tumour oncogene and an exact variant was found ≥3 times in COSMIC [23].

Driver Copy Number Classification
Copy number events were classified based on the same list of potential driver CNVs detailed above for driver mutation classification. Putative driver CNVs were classified: If the gene was identified as tumour suppressor by COSMIC and found to be a deletion; or if the gene was classified as tumour oncogene and was found to be amplified [23].

Phylogenetic Trees
Cancer cell lineages were reconstructed using LICHeE [58], by clustering all validated (at a median coverage of 2691×) somatic variants from all biopsied tumour regions from a single patient, whilst accounting for presence/absence and CCFs of a somatic variant across the sampled tumour sites and regions.

Molecular Time at Dissemination
Final molecular time at the primary tumour was estimated from the phylogenetic analysis of all mutations in genomic regions with consistent CCF across all sampled regions, by calculating the percentage of shared clustered mutations in the primary tumour and metastasis for each patient, where additional private mutations in either tumour are assumed to have occurred post-dissemination [59]. Molecular time at dissemination was further defined as late dissemination (linear progression), if the percentage of shared clustered mutations between the primary tumour and metastatic lesions ≥50%, or early dissemination (parallel progression), if the percentage of shared clustered mutations was <50% [60].

Quantifying Therapy Resistance
To ascertain the altering subclonal patterns of metastatic progression during treatment, the average CCF of disseminating primary tumour subclones were measured. Subclones found to be equivalent and/or significantly increasing in CCF during metastatic progression (if the average subclone CCF increase between the primary tumour and metastasis yielded p < 0.05; Chi-squared test) were considered to be 'significantly resistant'. Furthermore, significantly resistant driver mutations (≥20% CCF difference) [61] contributing to the overall resistance of the subclone were also described, where relevant (Table 2 and Figure 4). In treated primary-metastasis pairs, all mutations in the primary tumour were either present prior to therapy, and further selected for and expanded in the metastasis (adaptive resistance) or were induced by therapy (acquired resistance) in both the primary tumour and metastatic lesion [62].

dN/dS Analysis
The effect of selection was estimated using a dN/dS (the substitution rate at nonsynonymous sites to those at synonymous sites) ratio [63]. Maximum-likelihood method implemented in dNdScv R package was used to estimate the dN/dS values for missense (ωmis) and nonsense (ωnon) mutations exome-wide.

Heterogeneity Analysis
Intra-and inter-tumour heterogeneity were calculated using the Shannon diversity index (SDI), a measure of species diversity [17,18]. The SDI for each sample (primary, metastasis or primary-metastasis pair) was analysed based on the presence of different tumour clusters using the vegan R package. To confirm the monoclonal or polyclonal origin of metastases, we calculated the root diversity score (RDS), as previously described [33]. The RDS values were estimated using a python script by Reiter et al. (https://github.com/johannesreiter/rootdiversity).

Statistical Analysis
Where relevant, Mann-Whitney U test was utilized to compare continuous variables, Chi-squared test for categorical variables, and Wilcoxon signed-rank test for nonparametric paired groups were used to determine associations. Associations between clinicopathological parameters and heterogeneity (SDI) were performed using ANOVA in R. For all the statistical tests performed, p < 0.05 was considered statistically significant. However, to decrease the false discovery rate, p-values were corrected using the Benjamini-Hochberg (BH) method, adjusted at <5%.

Conclusions
This study provides valuable insight into our understanding of tumour evolution and the effect of therapy on metastatic CRC, especially in the Saudi population, with recognised earlier disease onset. Despite most of our cases supporting a late dissemination model, the presence of cases that either incurred prior treatment or have additional intratumour heterogeneity (due to early dissemination or polyclonal spread to the metastasis) may raise clinical challenges for targeted therapy and metastasis prevention. However, the results of this study need to be validated in a larger cohort.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/10/2938/s1, Figure S1: Raw copy number plots using FACETS, Figure S2: Box plot representation of tumourheterogeneity, via (a) Shannon Diversity Index (SDI) (b) Root Diversity Scores (RDS) of 21 samples with multiple sampled regions, Figure S3: Contribution of mutational signatures, Figure S4: Phylogenetic trees, Figure S5: Selection in primary CRC and metastatic tumour lesions, Table S1: Clinicopathological details of 30 CRC metastatic cases, Table S2: The quality control metrics of the whole exome and targeted capture sequencing., Table S3: Somatic mutations validated by target capture sequencing, Table S4: Copy number variations predicted by using FACETS algorithm, Table S5: Variant allele frequencies and cancer cell fraction of somatic mutations, Table S6: dN/dS scores for different types of mutations, Table S7: Microsatellite instability predicted using whole exome sequencing data on MSIsensor platform.