Plasma-Based Measurements of Tumor Heterogeneity Correlate with Clinical Outcomes in Metastatic Colorectal Cancer

Simple Summary Molecular characterization of circulating tumor DNA (ctDNA) can offer a window into tumor genetic heterogeneity, especially in metastatic cancers where different lesions may harbor different mutations. The presence of multiple tumor clones may be reflected by dispersed variant allele frequencies of mutations detected in a single ctDNA sample. We hypothesized that the degree of dispersion of somatic mutations detected with a targeted next-generation sequencing assay may correlate with clinical outcomes in metastatic colorectal cancer. We found that patients with high ctDNA-based tumor heterogeneity after first-line bevacizumab and chemotherapy had shorter progression-free survival and worse objective response. Plasma-based measurements of tumor heterogeneity may have prognostic value in various cancer types and should be further explored for assessing treatment response and other clinical applications. Abstract Sequencing circulating tumor DNA (ctDNA) from liquid biopsies may better assess tumor heterogeneity than limited sampling of tumor tissue. Here, we explore ctDNA-based heterogeneity and its correlation with treatment outcome in STEAM, which assessed efficacy and safety of concurrent and sequential FOLFOXIRI-bevacizumab (BEV) vs. FOLFOX-BEV for first-line treatment of metastatic colorectal cancer. We sequenced 146 pre-induction and 89 post-induction patient plasmas with a 198-kilobase capture-based assay, and applied Mutant-Allele Tumor Heterogeneity (MATH), a traditionally tissue-based calculation of allele frequency distribution, on somatic mutations detected in plasma. Higher levels of MATH, particularly in the post-induction sample, were associated with shorter progression-free survival (PFS). Patients with high MATH vs. low MATH in post-induction plasma had shorter PFS (7.2 vs. 11.7 months; hazard ratio, 3.23; 95% confidence interval, 1.85–5.63; log-rank p < 0.0001). These results suggest ctDNA-based tumor heterogeneity may have potential prognostic value in metastatic cancers.


Introduction
Colorectal cancer (CRC) is characterized by high molecular heterogeneity [1,2]. In CRC and other cancer types, analysis of tumor tissue samples, with a single biopsy or multiregional sequencing, can be limited by sampling and therefore not capture genetic heterogeneity within the same sample or between sites [3][4][5][6]. In the past few years, new studies have demonstrated the use of next-generation sequencing (NGS) of circulating tumor DNA (ctDNA) from plasma for molecular characterization of CRC [7,8]. ctDNA sequencing has the potential to reveal more information than tissue sequencing, particularly in the context of therapy resistance beyond the primary tumor [9,10]. Plasma sampling can overcome limitations of tumor biopsy by providing a noninvasive alternative for tumor genotyping and more fully capturing tumor heterogeneity [11].
Various methods have been developed to calculate tumor heterogeneity, though they have been based on whole exome sequencing (WES) results of a single tissue sample per patient and estimates can vary widely by method [12]. Here, we present exploratory analyses of tumor heterogeneity measurements using ctDNA sequencing data from patients with metastatic CRC receiving first-line chemotherapy and bevacizumab in the Sequencing Triplet With Avastin and Maintenance (STEAM) trial. We assessed the potential prognostic value of measuring ctDNA-based tumor heterogeneity with a single plasma sample before and after induction chemotherapy.

Samples and Sequencing
The STEAM study (NCT01765582) was a Phase II trial evaluating bevacizumab (BEV) with concurrent or sequential 5-fluorouracil/leucovorin/oxaliplatin (FOLFOX) or 5fluorouracil/leucovorin/irinotecan (FOLFIRI) vs. FOLFOX-BEV for the first-line treatment of patients with metastatic CRC. Patients received induction therapy for four months, with an optional addition of two months. Study protocols for STEAM were approved by the Institutional Review Boards at each participating study site and patients provided written informed consent to participate in the biomarker program. Tissue biomarker analysis with KRAS, NRAS, and BRAF as well as final clinical data were previously described [13]. For this analysis, sequencing data were available for 146 pre-induction and 89 post-induction plasma samples. Pre-induction samples were collected one day before or on the day of induction therapy. Post-induction samples were collected 60 to 356 days (median 133 days) after the start of induction therapy. All post-induction samples were collected prior to the start of second line therapy. cfDNA was extracted from 4 mL of plasma and sequenced using the hybrid-capture based AVENIO ctDNA Surveillance Kits (for research use only, not for use in diagnostic procedures) on the Illumina HiSeq 4000. The 198-kilobase AVENIO Surveillance panel covers regions of 197 genes that are recurrently mutated in CRC and lung cancer. The input mass ranged from 0.52 to 208 ng (median 50 ng) and the median deduplicated depth per sample ranged from 504 to 8101 (median 4495).

Variant Calling
Single nucleotide variants (SNVs) and insertion/deletions (indels) were called with the AVENIO ctDNA Analysis Software (for research use only, not for use in diagnostic procedures) [14]. The software includes bioinformatics methods based on CAPP-Seq (cancer personalized profiling by deep sequencing) [15] and iDES (integrated digital error suppression) [16], and reports AF and MMPM values for each detected variant. MMPM is calculated as the AF multiplied by the extracted mass (ng) and adjustment factor of 330 haploid human genome equivalents per ng, then divided by plasma volume (mL). For each sample, the mean AF or MMPM of detected somatic variants was calculated and used as an estimate of tumor content or ctDNA quantity in the plasma.
We differentiated somatic cancer variants from germline variants using a machine learning model [17] that takes into account the variation in allele fraction (AF) from the same mutation found in multiple samples of the same patient. Germline mutations will have stable AF levels over time, whereas somatic ctDNA mutations will have variable AF levels, especially over the course of treatment. In cases where the tumor is not responding to treatment, somatic mutations may also have stable AF levels; therefore, we applied one modification to the method by re-classifying mutations as somatic if they were listed in the Loci of Interest in the AVENIO ctDNA Analysis Software. The Loci of Interest includes variants previously curated for clinical significance in the AVENIO ctDNA Analysis Software [14], and covers cancer hotspots in the following genes in the AVENIO ctDNA  Surveillance panel: ALK, APC, BRAF, EGFR, ERBB2, KIT, KRAS, MET, NRAS, PDGFRA, RET, and ROS1.
Given that not all patients had more than one plasma sample available for this analysis, we also applied filters to further remove germline mutations by using public databases of common germline variants. In particular, we classified variants as germline single nucleotide polymorphisms (SNPs) if they were present in >0.001 population frequency in subpopulations in ExAC release 0.3.1 or 1000 Genomes phase3v5b, or annotated as a common SNP in dbSNP build 144.

Calculation of MATH
The formula for Mutant-Allele Tumor Heterogeneity (MATH) is: 100 × MAD/median(X), where MAD is the median absolute deviation, or median(|Xi − median(X)|). MATH for each sample was calculated on the VAFs of somatic variants (Xi) in the sample. For samples with 0 or 1 somatic mutation, a MATH score of 0 was assigned, and these samples were classified as low tumor heterogeneity. Sub-analyses were also performed to separate out these samples as an "undefined" MATH group, as it is possible that additional tumor mutations exist beyond the panel of genes sequenced.

Statistical Analysis
Distributions of patient characteristics were compared between the BEP and the remaining ITT population in STEAM by Wilcoxon rank-sum test, Pearson's chi-squared test, and Fisher's exact test as appropriate. Pre and post-induction BEPs were compared with the remaining ITT separately.
PFS was defined as the time from randomization to the date of disease progression or death from any cause in first-line treatment. Subjects without an event were censored based on the last evaluable date of tumor assessment during first-line treatment.
The Kaplan-Meier method and log-rank tests were used to estimate the median PFS and log-rank p-value. Unadjusted and adjusted Cox proportional hazards models were used to assess the association between PFS and MATH, either as a continuous or a categorical variable. Unadjusted Cox proportional hazards models were performed for each clinical factor to assess its association with PFS. The clinical variables evaluated were: age, sex, ECOG score at baseline (0 vs. 1), cancer type at diagnosis (colon vs. rectal), location of primary tumor (left vs. right), prior cancer surgery, extent of metastatic disease (liver limited vs. non-liver limited), treatment arm, and liver resection outcome in first line (R0+R1 vs. no resection). Models were assessed separately for pre and post-induction populations. Significant clinical variables (p < 0.05) were included in adjusted modeling between MATH and PFS.
ORR was defined as the proportion of patients in first-line treatment with unconfirmed complete response or partial response, following RECIST version 1.1. To assess the association between MATH groups and objective response (complete or partial), logistic regression was used to calculate odds ratio and the associated 95% CI.
Correlations between ctDNA-based metrics such as MATH, mean AF, and number of somatic mutations were assessed by Spearman's correlation test.
The top 10 most frequently mutated genes, in addition to NRAS, BRAF, and combined RAS status (KRAS or NRAS mutant), were included in the analysis of association between mutated genes and MATH group (high vs. low). Two-sided Fisher exact tests were used for each gene and corrected for multiple comparisons using the Bonferroni method.

Cohort Characteristics
STEAM (NCT01765582) evaluated bevacizumab (BEV) with concurrent or sequential 5-fluorouracil/leucovorin/oxaliplatin (FOLFOX) or 5-fluorouracil/leucovorin/irinotecan (FOLFIRI) vs. FOLFOX-BEV for the first-line treatment of patients with metastatic CRC. Final clinical data have been previously described [13]. Retrospective sequencing on available plasma samples was performed with the AVENIO ctDNA Surveillance Kits (for research use only, not for use in diagnostic procedures). Of the 280 patients in the intent-to-treat (ITT) population in STEAM, 146 had a pre-induction plasma sample and 89 had a post-induction plasma sample available for analysis, which defined the biomarkerevaluable populations (BEPs) ( Figure A1). We examined whether patients with evaluable plasma sequencing data had different baseline characteristics compared to the remaining ITT population. Distributions of baseline characteristics were generally not significantly different between the pre-induction or post-induction BEP and remaining ITT population (Tables 1 and 2). Progression-free survival (PFS) was also not significantly different between the pre/post-induction BEP and remaining ITT populations ( Figure A2).

MATH in Pre-Induction Plasma
We assessed plasma-based tumor heterogeneity using Mutant-Allele Tumor Heterogeneity (MATH), a measure of variant allele frequency (VAF) variability divided by the median, which was first applied to solid tumors using WES data on tissue samples with matched normal [18]. To determine whether plasma-based MATH could be prognostic of progression-free survival (PFS), we analyzed 146 pre-induction plasma, which were collected either one day prior to or on the day of starting induction therapy. With the AVENIO Surveillance panel, a median of seven somatic mutations (range 0 to 25) were detected per sample. No somatic mutations were detected in two samples, and only one somatic mutation was detected in 11 samples. Figure 1a,b shows the MATH score and respective AF distribution of variants for each sample.
As a continuous variable, an increase in pre-induction MATH was associated with shorter PFS (p = 0.0128) ( Table 3). Given that tumor content in plasma could influence the number of mutations detected and calculations of tumor heterogeneity, the following ctDNA metrics were explored in correlations with MATH and potential adjustment factors: number of somatic mutations detected, mean AF, and mean mutant molecules per milliliter (MMPM). There was moderate correlation between MATH and mean AF (Spearman rho = 0.542, p < 0.0001), mean MMPM (Spearman rho = 0.595, p < 0.0001), and number of somatic mutations (Spearman rho = 0.536, p < 0.0001) ( Figure A3a-c). Slight significant association of MATH with PFS was still seen in adjusted models with mean MMPM or number of somatic mutations but not mean AF ( Table 3). The association between pre-induction MATH and PFS was also no longer statistically significant (p = 0.1341) when adjusting for clinical factors, specifically ECOG, treatment arm, and liver resection during first-line treatment. This was also seen when adjusting for clinical factors with mean MMPM or number of somatic mutations. As a continuous variable, an increase in pre-induction MATH was associated with shorter PFS (p = 0.0128) ( Table 3). Given that tumor content in plasma could influence the number of mutations detected and calculations of tumor heterogeneity, the following ctDNA metrics were explored in correlations with MATH and potential adjustment factors: number of somatic mutations detected, mean AF, and mean mutant molecules per milliliter (MMPM). There was moderate correlation between MATH and mean AF (Spearman rho = 0.542, p < 0.0001), mean MMPM (Spearman rho = 0.595, p < 0.0001), and number of somatic mutations (Spearman rho = 0.536, p < 0.0001) ( Figure A3a-c). Slight significant association of MATH with PFS was still seen in adjusted models with mean MMPM or number of somatic mutations but not mean AF ( Table 3). The association between pre-induction MATH and PFS was also no longer statistically significant (p = 0.1341) when adjusting for clinical factors, specifically ECOG, treatment arm, and liver resection during first-line treatment. This was also seen when adjusting for clinical factors with mean MMPM or number of somatic mutations.   Next, we explored MATH as a categorical variable by classifying the top quartile (MATH greater than or equal to 123.9) as high tumor heterogeneity. High tumor heterogeneity was correlated with shorter PFS (median 9.30 vs. 11.70 months; log-rank p = 0.0301) (Figure 2a). When samples with 0 or 1 somatic mutation were separated out as an "undefined" MATH group, the difference in PFS was still statistically significant (high vs. low vs. undefined: median 9.30 vs. 11.53 vs. 13.04 months; log-rank p = 0.0333) (Figure 2b). However, MATH as a categorical variable (high vs. low) was not significant (hazard ratio (HR), 1.41; 95% confidence interval (CI), 0.91-2.19) when adjusting for ECOG, treatment arm, and liver resection in a multivariable Cox regression model for PFS. geneity was correlated with shorter PFS (median 9.30 vs. 11.70 months; log-rank p = 0.0301) (Figure 2a). When samples with 0 or 1 somatic mutation were separated out as an "undefined" MATH group, the difference in PFS was still statistically significant (high vs. low vs. undefined: median 9.30 vs. 11.53 vs. 13.04 months; log-rank p = 0.0333) (Figure 2b). However, MATH as a categorical variable (high vs. low) was not significant (hazard ratio (HR), 1.41; 95% confidence interval (CI), 0.91-2.19) when adjusting for ECOG, treatment arm, and liver resection in a multivariable Cox regression model for PFS. Categorical MATH (low vs. high) was also not significantly associated with an objective response in both unadjusted (p = 0.8691) and adjusted (p = 0.6835) logistic regression models. Of the most frequently mutated genes ( Figure A4a), none were significantly enriched in high vs. low MATH groups. KRAS or NRAS were mutated in 54.1% of patients with high MATH and 48.6% of patients with low MATH. BRAF was mutated in 5.4% of patients with high MATH and 4.6% of patients with low MATH. The distributions of RAS and BRAF between MATH low and high groups were not statistically significant (p = 0.5681 and p = 1.0000, respectively). BRAF mutation in pre-induction plasma was not significantly associated with PFS (HR, 1.44; 95% CI, 0.63-3.31). High vs. low MATH, adjusting for BRAF, was significantly associated with PFS (HR, 1.58; 95% CI, 1.03-2.44). Similar associations were seen for RAS mutation in pre-induction plasma. RAS mutation was not significantly associated with PFS, and high vs. low MATH, adjusting for RAS, was still significantly associated with PFS (HR, 1.61; 95% CI, 1.05-2.47).

MATH in Post-Induction Plasma
We then examined MATH in plasma collected after the start of induction therapy; collection ranged from 60 to 356 days after treatment start date. Eighty-nine patients had a post-induction plasma sample with evaluable sequencing data using the AVENIO Surveillance panel. A median of three somatic mutations (range 0 to 23) per sample were detected. No somatic mutations were detected in six samples, and one somatic mutation was detected in 20 samples. MATH scores and the corresponding distribution of variant AF for each post-induction sample are shown in Figure 1c,d.
Increase in post-induction MATH was associated with shorter PFS (estimated 6% increase in hazard for every five-unit increase in MATH: HR, 1.06; 95% CI, 1.03-1.09) ( Table  4). Post-induction MATH was moderately correlated with mean AF (Spearman rho = 0.493, p < 0.0001) and mean MMPM (Spearman rho = 0.489, p < 0.0001) but strongly correlated with number of somatic mutations (Spearman rho = 0.809, p < 0.0001) ( Figure A3d Categorical MATH (low vs. high) was also not significantly associated with an objective response in both unadjusted (p = 0.8691) and adjusted (p = 0.6835) logistic regression models. Of the most frequently mutated genes ( Figure A4a), none were significantly enriched in high vs. low MATH groups. KRAS or NRAS were mutated in 54.1% of patients with high MATH and 48.6% of patients with low MATH. BRAF was mutated in 5.4% of patients with high MATH and 4.6% of patients with low MATH. The distributions of RAS and BRAF between MATH low and high groups were not statistically significant (p = 0.5681 and p = 1.0000, respectively). BRAF mutation in pre-induction plasma was not significantly associated with PFS (HR, 1.44; 95% CI, 0.63-3.31). High vs. low MATH, adjusting for BRAF, was significantly associated with PFS (HR, 1.58; 95% CI, 1.03-2.44). Similar associations were seen for RAS mutation in pre-induction plasma. RAS mutation was not significantly associated with PFS, and high vs. low MATH, adjusting for RAS, was still significantly associated with PFS (HR, 1.61; 95% CI, 1.05-2.47).

MATH in Post-Induction Plasma
We then examined MATH in plasma collected after the start of induction therapy; collection ranged from 60 to 356 days after treatment start date. Eighty-nine patients had a post-induction plasma sample with evaluable sequencing data using the AVENIO Surveillance panel. A median of three somatic mutations (range 0 to 23) per sample were detected. No somatic mutations were detected in six samples, and one somatic mutation was detected in 20 samples. MATH scores and the corresponding distribution of variant AF for each post-induction sample are shown in Figure 1c,d.
Increase in post-induction MATH was associated with shorter PFS (estimated 6% increase in hazard for every five-unit increase in MATH: HR, 1.06; 95% CI, 1.03-1.09) ( Table 4). Post-induction MATH was moderately correlated with mean AF (Spearman rho = 0.493, p < 0.0001) and mean MMPM (Spearman rho = 0.489, p < 0.0001) but strongly correlated with number of somatic mutations (Spearman rho = 0.809, p < 0.0001) ( Figure A3d-f). However, when including mean AF, mean MMPM, or number of somatic mutations in an adjusted model with MATH, MATH remained significantly associated with PFS (Table 4). The association between post-induction MATH and PFS also remained statistically significant after adjusting for treatment arm and liver resection during first-line (5 unit increase in MATH: HR, 1.05; 95% CI, 1.02-1.08). Association of MATH with PFS was still significant after adjusting for clinical factors with mean AF, mean MMPM, or number of somatic mutations. To assess MATH as a categorical variable, we classified post-induction samples in the top quartile (MATH greater than or equal to 67.7) as high tumor heterogeneity. High tumor heterogeneity was associated with shorter PFS (median 7.16 vs. 11.70 months; log-rank p < 0.0001) (Figure 3a). When samples with 0 or 1 somatic mutation were designated as a third "undefined" MATH group, the difference in PFS was also statistically significant (median 7.16 vs. 10.74 vs. 17.68 months; log-rank p < 0.0001) (Figure 3b). Patients classified in the high MATH category in post-induction plasma had shorter PFS compared to patients classified in the low MATH category (HR, 3.23; 95% CI, 1.85-5.63). MATH (high vs. low) was still significantly associated with PFS after adjusting for mean AF, mean MMPM, or number of somatic mutations ( Table 4). The significant association was still observed after adjusting for clinical factors only (HR, 3.34; 95% CI, 1.90-5.86), or in combination with mean AF, mean MMPM, or number of somatic mutations (Table 4).
in the high MATH category in post-induction plasma had shorter PFS compared to patients classified in the low MATH category (HR, 3.23; 95% CI, 1.85-5.63). MATH (high vs. low) was still significantly associated with PFS after adjusting for mean AF, mean MMPM, or number of somatic mutations ( Table 4). The significant association was still observed after adjusting for clinical factors only (HR, 3.34; 95% CI, 1.90-5.86), or in combination with mean AF, mean MMPM, or number of somatic mutations (Table 4). Objective response rate (ORR) was significantly higher for patients with low MATH compared to high MATH (87.9% vs. 56.5%; p = 0.0026) ( Table 5). Odds ratio for objective response (complete response or partial response) favored patients with low tumor heterogeneity (OR, 5.58; 95% CI, 1.84-16.88); this was still observed after adjusting for treatment arm and liver resection during the first line (OR, 5.94; 95% CI,1.87-18.90) ( Table 5). Amongst the most frequently mutated genes in post-induction plasma ( Figure A4b), KRAS or NRAS were mutated in 43.5% of patients with high MATH and 12.1% of patients with low MATH; this difference was statistically significant after Bonferroni correction (adjusted p = 0.033). While patients with RAS mutations in the post-induction plasma had shorter PFS (median difference: 4.3 months; HR, 2.53; 95% CI, 1.42-4.53)), high vs. low MATH status was still significantly associated with PFS in an adjusted model with RAS mutation status (HR, 2.81; 95% CI, 1.58-4.99). Objective response rate (ORR) was significantly higher for patients with low MATH compared to high MATH (87.9% vs. 56.5%; p = 0.0026) ( Table 5). Odds ratio for objective response (complete response or partial response) favored patients with low tumor heterogeneity (OR, 5.58; 95% CI, 1.84-16.88); this was still observed after adjusting for treatment arm and liver resection during the first line (OR, 5.94; 95% CI,1.87-18.90) ( Table 5). Amongst the most frequently mutated genes in post-induction plasma ( Figure A4b), KRAS or NRAS were mutated in 43.5% of patients with high MATH and 12.1% of patients with low MATH; this difference was statistically significant after Bonferroni correction (adjusted p = 0.033). While patients with RAS mutations in the post-induction plasma had shorter PFS (median difference: 4.3 months; HR, 2.53; 95% CI, 1.42-4.53)), high vs. low MATH status was still significantly associated with PFS in an adjusted model with RAS mutation status (HR, 2.81; 95% CI, 1.58-4.99).

Change in MATH from Pre-Induction to Post-Induction
There were 87 patients who had both pre-induction and post-induction plasma samples available for analysis. Out of the 87 patients, four patients had MATH scores equal to 0 at both time points. For these patients, the change in number of somatic mutations was used to determine the MATH classifier. One patient had no change in the number of somatic mutations and was excluded from this analysis. We assessed the change in MATH in the remaining 86 paired plasma samples and found that patients with at least a 10-fold decrease in MATH had longer PFS (log-rank p = 0.0078; HR, 2.18; 95% CI, 1.21-3.91) ( Figure 4). All three patients with pre and post-induction MATH scores equal to 0 had a decrease of variant count from 1 to 0. These patients were classified in the >/= 10-fold decrease group. The association was no longer significant after adjusting for treatment arm and liver resection during first-line treatment (HR, 1.65; 95% CI, 0.90-3.05) ( Table 6). Figure A5 presents examples of changes in VAF in a patient with a greater than 10-fold drop in MATH and another patient with an increase in MATH.
ples available for analysis. Out of the 87 patients, four patients had MATH scores equal to 0 at both time points. For these patients, the change in number of somatic mutations was used to determine the MATH classifier. One patient had no change in the number of somatic mutations and was excluded from this analysis. We assessed the change in MATH in the remaining 86 paired plasma samples and found that patients with at least a 10-fold decrease in MATH had longer PFS (log-rank p = 0.0078; HR, 2.18; 95% CI, 1.21-3.91) (Figure 4). All three patients with pre and post-induction MATH scores equal to 0 had a decrease of variant count from 1 to 0. These patients were classified in the >/= 10-fold decrease group. The association was no longer significant after adjusting for treatment arm and liver resection during first-line treatment (HR, 1.65; 95% CI, 0.90-3.05) ( Table 6)

Discussion
Since sequencing of solid tumor tissue samples may underestimate overall tumor heterogeneity, we hypothesized that heterogeneity could be measured using the distribution of allele frequencies of detected variants from plasma. Using plasma samples collected before and after induction during the STEAM trial, we explored the association of tumor heterogeneity measurements with PFS and ORR. We found that high tumor heterogeneity in post-induction plasma correlated with shorter PFS and worse objective response.
To our knowledge, this is the first application of MATH to ctDNA NGS results, and the first description of plasma-based MATH correlating with clinical outcome. MATH was originally presented as a potential prognostic biomarker in head and neck squamous cell carcinoma [18,19]. Other studies have reported a correlation between MATH and survival in various cancer types, including breast cancer [20], lung adenocarcinoma [21], FGFR3-mutated muscle-invasive bladder cancer [22], melanoma [23], and uterine corpus endometrial carcinoma [24]. Our findings are consistent with studies in CRC that show association of higher MATH scores with poorer outcomes or other clinical and biological factors. For example, higher MATH correlated with a greater number of subclones, which was associated with shorter PFS in CRC Stages I-IV [25]. Higher MATH in male patients in TCGA data was an independent risk factor for shorter OS [26]. MATH correlated with the risk of metastases in stage II CRC [27], and low MATH could predict better response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer [28].
All previous studies with MATH were based on WES of tumor tissue samples. Our work demonstrates feasibility in targeted sequencing of plasma samples, which offers two advantages. First, even though costs of sequencing are declining, WES poses computational and variant interpretation challenges. The 198-kilobase Surveillance panel used in this work was optimized by panel design to maximize the number of mutations that can be detected in CRC and lung cancer. The median number of somatic mutations per patient was 7 and 3 in the pre and post-induction plasma, respectively. Second, liquid biopsy is a non-invasive alternative to tissue biopsy, and can also reflect the genetic heterogeneity of multiple tumor sites that may be undersampled and biased with the sequencing of a single tumor tissue biopsy [29]. Other methods to infer clone phylogeny require collecting multiple tissue samples and face their own analytical challenges [30]. Further studies will be needed to explore plasma-based MATH in other cancer types and treatment regimens.
As for the biological basis underlying the association between plasma-based MATH and response to chemotherapy regimens, more work is required to uncover possible mechanisms. Chemotherapy remains the standard of care for metastatic CRC for patients without targetable mutations even though drug resistance often develops. Chemotherapy is known to apply selective pressure on tumor clones, and ctDNA levels drop during 48-h application of FOLFOX in patients with stable disease or partial response [31]. We observed longer PFS in patients with a greater decrease, such as more than a 10-fold drop, in tumor heterogeneity after the four to six-month induction period. Patients with a smaller decrease, or even stable or increased tumor heterogeneity, may have tumors that have developed resistance to chemotherapy or bevacizumab. Preclinical data show that chronic exposure of CRC cells to bevacizumab leads to increased tumor cell migration, invasion, and metastatic potential [32]. Perhaps tumor cells migrate to other sites and continue to proliferate and evolve new clones, resulting in similar or higher plasma MATH values. Other mechanisms of resistance to bevacizumab treatment are possible. One example is amplification of POLR1D, which may increase expression of VEGFA [33], though we were unable to test this as this region was not sequenced in the panel we used. There may also be tumor evolution through crosstalk with the surrounding microenvironment, where chemotherapeutic agents may upregulate expression of inhibitory immune checkpoints [34,35]. Future studies that incorporate other data types, such as protein or gene expression analysis of immunoregulatory molecules and image-based morphological characterization of tumor heterogeneity [36], could help elucidate these findings.

Conclusions
Analyses with a 198-kilobase ctDNA NGS assay on the STEAM study enabled the first demonstration of a plasma-based MATH assessment. We found that lower tumor heterogeneity was associated with longer PFS and higher ORR in metastatic CRC treated with first-line chemotherapy and bevacizumab. ctDNA-based tumor heterogeneity may have potential prognostic value in other metastatic cancers and treatment settings.