Genomic Instability of Circulating Tumor DNA as a Prognostic Marker for Pancreatic Cancer Survival: A Prospective Cohort Study

Simple Summary This prospective cohort study showed that circulating tumor DNA-genomic instability (ctDNA-GI) I-scores, which was defined as the natural logarithm of the sum of LOESS-normalized Z-scores of sequenced reads in 1 Mb bins, are prognostic of the outcome of either localized or metastatic pancreatic adenocarcinoma. At baseline, 24.1% of patients had high genomic instability with I-score. Multivariable analyses demonstrated I-score was a significant factor for progression-free survival and overall survival. Abstract Genomic instability of circulating tumor DNA (ctDNA) as a prognostic biomarker has not been evaluated in pancreatic cancer. We investigated the role of the genomic instability index of ctDNA in pancreatic ductal adenocarcinoma (PDAC). We prospectively enrolled 315 patients newly diagnosed with resectable (n = 110), locally advanced (n = 78), and metastatic (n = 127) PDAC from March 2015 through January 2020. Low-depth whole-genome cell-free DNA sequencing identified genome-wide copy number alterations using instability score (I-score) to reflect genome-wide instability. Plasma cell-free and matched tumor tissue DNA from 15 patients with resectable pancreatic cancer was sequenced to assess the concordance of chromosomal copy number alteration profiles. Associations of I-score with clinical factors or survival were assessed. Seventy-six patients had high genomic instability with I-score > 7.3 in pre-treatment ctDNA; proportions of high I-score were 5.5%, 5.1%, and 52% in resectable, locally advanced, and metastatic stages, respectively. Correlation coefficients between Z-scores of plasma and tissue DNA at segment resolution were high (r2 = 0.82). Univariable analysis showed the association of I-score with progression-free survival in each stage. Multivariable analyses demonstrated that clinical stage-adjusted I-scores were significant factors for progression-free and overall survival. In these patients, ctDNA genomic I-scores provided prognostic information relevant to progression-free survival in each clinical stage.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is a public health problem because of its dismal prognosis and increasing incidence. Despite the improvements in diagnosis and therapy applied during the past few decades, the five-year survival rate for pancreatic cancer is 10% worldwide [1,2]. Treatment options depend on several factors, including the cancer type and stage, possible side effects, and patient preferences and overall health conditions. Thus, a better understanding of the biology pertinent to PDAC might lead to more efficacious therapeutic strategies.
Genomic instability is a typical hallmark of cancer. It promotes inter-and intra-tumor heterogeneity and enables cancer cell adaptation to environmental stress, thereby driving aggressive tumor behavior and resistance to cancer therapies [3,4]. Moreover, recently integrated whole-genome analysis uncovered that the molecular subtypes of pancreatic cancer are linked to specific copy number aberrations in genes such as mutant KRAS and GATA6 [5]. These data support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype associated with disease progression.
Liquid biopsies are of particular interest from a clinical perspective because they are non-invasive and assess biomarkers released by primary tumors and metastases that reflect tumor biology [6]. Most studies have assayed KRAS oncogene mutations to identify circulating tumor DNA (ctDNA), and several groups, including ours, have reported that the presence of a KRAS mutation has a negative influence on the prognosis of pancreatic cancer patients [7][8][9][10]. However, studies for genomic instability in ctDNA have not been thoroughly conducted.
Chromosomal structural variations such as chromosomal rearrangement, duplica-tion, and deletion are prominent mechanisms of genomic damage in pancreatic cancer [11,12]. Surrogate measures of defects in DNA maintenance have potential therapeutic selection implications. Irinotecan and nanoliposome-encapsulated irinotecan, combined with 5-fluorouracil and/or oxaliplatin, have become the main cytotoxic agents for pancreatic cancer. Defects in DNA repair and DNA damage checkpoints have been identified with enhanced sensitivity to topoisomerase 1 inhibitors [13]. Furthermore, the combination of tumor copy number alterations and mutation load suggested a better predictor for identifying patients most likely to respond to immunotherapy than the mutation burden alone [14,15]. Current studies have elucidated that poly ADP ribose polymerase (PARP) inhibitors and platinum agents might be effective for inducing tumor regression in solid tumors bearing an unstable genome, including pancreatic cancer [11].
However, there are significant hurdles to overcome as technical challenges of DNA sequencing using small diagnostic samples preserved in fixatives such as formalin, analytical demands, and the return of results within a clinically relevant timeframe [11]. Therefore, identifying genomic instability in ctDNA can predict outcomes more effectively and increase the efficacy of treatment chemotherapy. Here, we investigated genomic instability in ctDNA as a prognostic and predictive marker of survival and the therapeutic response in PDAC.

Study Design and Sample Collection
This study prospectively enrolled 315 patients newly diagnosed with PDAC between March 2015 and January 2020. Patient blood samples and clinical data were collected from three hospitals: The National Cancer Center; the Seoul National University Bundang Hospital; and the Gachon University Gil Medical Center, Republic of Korea. This study was approved by Institutional Review Board (IRB No. NCC2015-0054, NCC2016-001, and NCC2019-027), and the participating patients gave their informed consent. Patients were divided into three clinical-stage groups: patients with (1) surgical resection, (2) local but unresectable disease (locally advanced), and (3) metastatic disease. Blood samples were collected before and after treatment, every three months for resectable and locally advanced patients, and every two months for metastatic patients, with restaging imaging after initiation of anticancer treatment. We included patients who received FOLFIRINOX (FOL-folinic acid, F-fluorouracil, IRIN-irinotecan, OX-oxaliplatin) therapy as first-line treatment, with the aim to explore the application of genomic instability in ctDNA in monitoring tumor burden change following treatment. Treatment response was assessed every four cycles of chemotherapy via abdominal enhanced computerized tomography (CT) and/or enhanced magnetic resonance imaging (MRI) of the liver. According to the Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1. [16], tumor response was quantitatively defined as complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD).
Formalin-fixed paraffin-embedded (FFPE) tissues (n = 15) were obtained from the biobank of the National Cancer Center, Korea. Thirty-eight healthy controls were enrolled from the National Cancer Center (IRB No. NCC2017-0083).

Sample Processing and DNA Extraction
Up to 10 mL of peripheral blood was collected by venipuncture in two types of evacuated blood collection tubes: K2EDTA (BD #366643; Becton Dickinson and Company, Franklin Lakes, NJ, USA) or Cell-Free DNA Streck BCT (Streck #218962; Streck, Omaha, NE, USA). Plasma from blood collected in K2EDTA tubes was separated within two hours after drawing blood to ensure ctDNA integrity. Processing of blood from Cell-Free DNA Streck BCT tubes was carried out within five days. Whole blood was centrifuged at 1600× g for 10 min, after which the supernatant was centrifuged again at 16,000× g for 10 min to remove any remaining contaminating cells. Supernatants were immediately stored at −80 • C until use.

Library Preparation for Whole-Genome Sequencing of Cell-Free DNA
A Tapestation 4200 (Agilent Technologies, Santa Clara, CA, USA) was used to examine the size of cell-free DNA (cfDNA) fragments before library construction, and samples showing a proper size distribution were used for library construction (Supplementary Figure S1). DNA libraries were prepared using a TruSeqNano kit (Illumina Inc., Cat# FC-121-4003, San Diego, CA, USA). Briefly, approximately 5 ng of cell-free DNA (cfDNA) was subjected to end-repair, adenylation, and adaptor ligation. The pooled libraries of 28 samples per run were analyzed with a NextSeq 500 instrument (Illumina Inc.) using 75 bp single-end read mode.

Whole-Genome Sequencing of Tumor Tissue DNA
The extracted genomic DNA was sheared to 180-250 bp using an M220 focused ultrasonicator (Covaris, Woburn, MA, USA). The sheared genomic DNA sizes were verified with a Tapestation4200. Libraries were constructed using an Accel-NGS 2S plus DNA Library Kit (Swift Biosciences Inc., Ann Arbor, MI, USA). We sequenced an average of 3.1 million reads on a NextSeq 500 system (Illumina, San Diego, CA, USA) using 75 bp single-end read mode.

Genomic Instability Calculation (I-Score)
Sequenced reads were aligned to the hg19 human reference genome using the BWAmem algorithm (0.7.5.a) with default parameters [17]. PCR duplicates were removed with Picard release 1.96 (https://broadinstitute.github.io/picard/, accessed on 23 August 2014), and reads with mapping quality below 60 were excluded from further analysis. After splitting the whole genome into 2897 1 Mb bins, 163 bins located in low mapping regions, such as telomeres, were not used for genomic instability calculation. The relative frequency of sequencing reads mapped to each bin was calculated and corrected for GC content bias using the LOESS algorithm [18]. To measure the local instability in patients with PDAC, the Z-score was calculated by comparing the relative frequency for each bin with that of 38 healthy control subjects. A Z-score for each bin was calculated with the formula below (Equation (1), Figure 1a): where RF bin is the relative frequency of a bin in a patient with PDAC, M bin is the mean of relative frequencies of a bin in normal healthy subjects, and SD bin is the standard deviation of relative frequencies of a bin in normal healthy subjects. To measure the extent of genome-wide copy number instability, we developed an Iscore that summarizes the local Z-scores into a single value. High I-score means a high level of genomic instability. LOESS algorithm was applied to smooth Z-scores of adjacent bins before I-score calculation, which helps reduce noise. I-score was calculated as described below Equation (2): To measure the extent of genome-wide copy number instability, we developed an Iscore that summarizes the local Z-scores into a single value. High I-score means a high level of genomic instability. LOESS algorithm was applied to smooth Z-scores of adjacent bins before I-score calculation, which helps reduce noise. I-score was calculated as described below Equation (2):

Identification of Recurrent Copy Number Alterations Regions in Pancreatic Cancer Patients
Hierarchical clustering analysis was performed using the heatmap.2 function in the R software package gplots [19]. Genome segmentation analysis was carried out using the DNA copy R software package [20] to divide the genome into equal DNA copy number regions, which are called copy number segments. The genomic regions recurrently amplified or deleted across the 315 patients with PDAC were identified using the Genomic Identification of Significant Targets in Cancer (GISTIC2.0) algorithm [21]. The cutoff for statistical significance was set to a false-discovery rate (FDR) adjusted p-value (q-value) < 0.25. Default parameters were used for GISTIC analysis.

Gene Sum Score Calculation and Validation
The genomic regions identified by GISTIC analysis were divided into gene-level intervals, and Z-score for each gene was calculated. Each gene was scored 1 or 0 according to its Z-score. Genes in GISTIC amplification regions with a Z-score higher than 2 or genes in GISTIC deletion regions with a Z-score less than −2 were scored as 1. All other genes were scored as 0. To select the minimum subset of genes having a prognostic impact, we tested the overall survival (OS) difference between patients with gene scores 1 and 0 for all GISTIC genes using log-rank tests. Five-fold cross-validation within 315 patients with PDAC was carried out for internal validation. In each of the cross-validation training sets, genes were sorted by the log-rank p-values and the top N most significant genes were independently selected in one of the five CV training sets, and the gene sum score (GSS) was defined as the sum of scores of selected genes in each CV training set. Top N genes ranging from 1 to 50 were investigated and the best combination of top N genes and the cutoff value were finally set in each cross-validation training set. The optimal cutoff values for GSSs were set to the point with the most significant log-rank test split. Further external validation was conducted using a publicly available data set that analyzed putative gene-level copy number profiles from tumor tissue DNA in patients with PDAC [22,23]. Genes labeled as high-level amplification or homozygous deletion were scored to 1 for GSS calculation in the external data set. Five GSSs were calculated, and the prognostic impacts of GSS on OS were validated. Schematic illustration of the GSS calculation and validation workflow can be found in Supplementary Figure S2a.

Comparing Copy Number Aberration Profiles of ctDNA and Matched Tumor Tissue DNA
Plasma ctDNA and matched tumor tissue DNA were sequenced from fifteen resectable pancreatic cancer patients to assess the degree of concordance between chromosomal copy number alterations (CNA) profiles. The similarity between CNA profiles was measured with Pearson's correlation coefficients comparing Z-scores at segment resolution. Genomic regions segmented with tumor tissue DNA were assumed as the baseline, and the Z-score for each plasma DNA segment was calculated as the mean value of Z-scores of 1-Mb size bins covering the baseline regions. To calculate Z-scores for tissue DNA, each bin reference value was constructed using 2012 samples from healthy females.

Statistical Analysis
Associations between I-score and other clinical factors were tested with two-sample t-tests, Pearson chi-square tests, or Fisher's exact test for each variable. We aimed to test the hypothesis that the detection of the I-score is associated with clinical outcome. OS and progression-free survival (PFS) were the primary outcomes. OS was calculated from the day Cancers 2021, 13, 5466 6 of 14 of diagnosis to the day of last follow-up or death from any cause. PFS was measured from the day of diagnosis to the day of progression or death. The associations of I-score with PFS and OS were assessed using univariable or multivariable Cox proportional hazards models. The effects were presented as hazard ratio (HR) and 95% confidence interval (CI). After backward variable selection with an elimination criterion of p < 0.05, only the stage was adjusted in PFS and OS multivariable models. Survival curves were estimated with the Kaplan-Meier method, and the survival difference was tested using a log-rank test. The ability of factors to predict PFS and OS was assessed with Harrell's C-Index. All patients were categorized into binary I-score groups (low and high) using the method proposed by Contal and O'Quigley [24], based on the log-rank test statistic. A two-sided p-value < 0.05 was considered statistically significant. All statistical analyses were performed with SAS (version 9.4; SAS Institute Inc., Cary, NC, USA) and R statistical software (version 3.6.2, R Foundation for Statistical Computing, Vienna, Ausrtria).

Patient Demographics and Distribution of I-Score
The clinical details, including age, sex, performance status, and tumor status, are provided in Table 1. Of the 315 patients enrolled, 181 were men, the mean age was 65.4 ± 9.7 years, and the median follow-up period was 18.9 months (range, 0-55.6 months). Patients with resectable, locally advanced, and metastatic cancers accounted for 34.9% (n = 110), 24.8% (n = 78), and 40.3% (n = 127) of all patients, respectively (Supplementary Table S1). We employed an I-score cutoff value of 7.3 to divide patients into low and high groups, as determined by Contal and O'Quigley [24]. Among 315 patients, 76 (24.1%) had pretreatment ctDNA I-scores higher than the cutoff values (Table 1). Age, sex, and Eastern Cooperative Oncology Group (ECOG) performance status were not significantly different between the low and high I-score groups. The rate of high I-score in ctDNA was higher in the metastatic group (n = 66, 52.0%) than in resectable (n = 6, 5.5%) or locally advanced (n = 4, 5.1%) groups. The frequency of high carbohydrate antigen 19-9 (CA19-9) and carcinoembryonic antigen (CEA) levels was 31.2% and 42.1%, respectively, were higher in the high I-score group than the low I-score group. The I-score has a significant association with clinical stage, tumor location, CA19-9 level, and CEA level (p < 0.001).

ctDNA I-Score Is Associated with Response to Chemotherapy
The responses of 48 patients receiving FOLFIRINOX as first-line chemotherapy are summarized in Supplementary Table S3. The objective response rate was 46.7% in the high I-score group and 17.6% in the low I-score group (p = 0.076). Figure 3a shows the responses according to changes in I-score in 18 metastatic patients at 3 months. At 6 months, two patients with 37.2% and 9.6% increased I-score levels showed PD (Figure 3b). In contrast, two (40%) of five patients with a 13.1% or 15.9% decrease in I-score levels showed PD. Among those with decreased scores, one with a decrease of 25.1% had a PR, and two with a decrease of 26.7% and 32.3% had SD. A representative ctDNA I-score plot demonstrates a dynamic range within an individual patient whose liver metastasis had initially responded and progressed during the administration of gemcitabine-based chemotherapy, which was associated with an initial drop and subsequent increase in ctDNA I-score (Figure 3c).

ctDNA I-Score Is Associated with Response to Chemotherapy
The responses of 48 patients receiving FOLFIRINOX as first-line chemotherapy are summarized in Supplementary Table S3. The objective response rate was 46.7% in the high I-score group and 17.6% in the low I-score group (p = 0.076). Figure 3a shows the responses according to changes in I-score in 18 metastatic patients at 3 months. At 6 months, two patients with 37.2% and 9.6% increased I-score levels showed PD (Figure 3b). In contrast, two (40%) of five patients with a 13.1% or 15.9% decrease in I-score levels showed PD. Among those with decreased scores, one with a decrease of 25.1% had a PR, and two with a decrease of 26.7% and 32.3% had SD. A representative ctDNA I-score plot demonstrates a dynamic range within an individual patient whose liver metastasis had initially responded and progressed during the administration of gemcitabine-based chemotherapy, which was associated with an initial drop and subsequent increase in ctDNA I-score (Figure 3c). The responses according to changes in I-score in 18 metastatic patients. At six months post-treatment, two patients with 37.2% or 9.6% increased I-score showed progressive disease (PD). In contrast, two (40%) of five patients with a 13.1% or 15.9% decrease in I-score showed progressive disease. (c) Case presentation with computerized tomography (CT) images.

Identifying Recurrent CNAs in Pancreatic Cancer
According to their I-score, the unsupervised hierarchical clustering analysis showed that the 315 patients were clustered into two distinct groups (Figure 1c). All 60 patients in the first group had high I-scores ranging from 7.4 to 10.1, showing recurrent amplification and deletion patterns. In contrast, the second group consisted of 255 patients with relatively low I-scores, and only 16 patients had an I-score higher than 7.3. GISTIC analysis The responses according to changes in I-score in 18 metastatic patients. At six months post-treatment, two patients with 37.2% or 9.6% increased I-score showed progressive disease (PD). In contrast, two (40%) of five patients with a 13.1% or 15.9% decrease in I-score showed progressive disease. (c) Case presentation with computerized tomography (CT) images.

Identifying Recurrent CNAs in Pancreatic Cancer
According to their I-score, the unsupervised hierarchical clustering analysis showed that the 315 patients were clustered into two distinct groups (Figure 1c). All 60 patients in the first group had high I-scores ranging from 7.4 to 10.1, showing recurrent amplification and deletion patterns. In contrast, the second group consisted of 255 patients with relatively low I-scores, and only 16 patients had an I-score higher than 7.3. GISTIC analysis identified five amplification and three deletion regions statistically significant across 315 patients (Figure 1b, Supplementary Table S4). GISTIC amplification regions included 1698 genes, and 36 were reported to be frequently, more than 10% of the study cohort, amplified in pancreatic adenocarcinoma patients. Similarly, GISTIC deletion regions included 364 genes and 22, which have been frequently deleted (Supplementary Table S5). We found that these GISTIC regions overlapped cancer-related genes as well, such as oncogenes, tumor suppressor genes, and genes related to poor prognosis when amplified or deleted [25][26][27][28][29]. Among the five amplification regions, four overlapped with seven oncogenes and five candidate genes. Similarly, one deletion region contained four tumor suppressor genes and one oncogene (Table 3). Clear CNAs were identified in all tumor tissues from 15 patients. One patient showed a detectable copy number-gain and loss pattern in plasma ctDNA. It is noteworthy that the profile of CNAs observed in plasma is highly similar to CNAs in tissue (Supplementary Figure S3a). Correlation coefficients between Z-scores of plasma ctDNA and tissue DNA at segment resolution reached 0.82 (Supplementary Figure S3b).

GISTIC Genes and GSS
In each cross-validation set, the top N most significant genes ranging from 14 to 43 were selected to calculate GSSs. Then, the optimal cutoff values for each GSS were set to the point that maximizes the OS difference between GSS-high and -low groups in cross-validation training sets (Supplementary Table S6). The prognostic impact of each GSS was internally validated in the matched cross-validation test set. There were significant (p < 0.05) differences in OS rates in three of the five cross-validation sets (Supplementary Figure S2b). In addition, GSS-high groups had a higher HR than the GSS-low groups in OS (Supplementary Figure S2c). GSS_Overlap and GSS_Union were calculated in the external data set using 8 genes repeatedly selected across all five cross-validation sets and 79 genes selected at least once from all cross-validation sets, respectively (Supplementary Table S7). By applying the cutoff value of 1 and 8, both GSS_Overlap and GSS_Union showed a significant prognostic impact in OS (Supplementary Figure S2d,e).

Discussion
This study of 315 patients is the most extensive prospective study evaluating the role of plasma ctDNA in PDAC. Although the strategy of using KRAS mutations in ctDNA as a tumor marker may be theoretically optimal in a disease like PDAC where KRAS mutation rates exceed 90%, the stochastic nature of circulating ctDNAs may lead to underestimation of the true circulating tumor burden or nature if detection is limited to a single mutation [9]. In the present study, ctDNA I-scores were prognostic of the outcome and predictive of response to systemic chemotherapy. Although patients with PDAC with a high I-score are highly responsive to chemotherapy, progressions are commonly observed within months despite treatment. We found that ctDNA I-score demonstrated a strong correlation with PFS or OS across different disease stages. Moreover, ctDNA I-scores may provide unique predictive information on chemotherapy outcome in localized disease and the metastatic setting. In contrast to the challenges of repetitive tissue biopsies, serial ctDNA I-score monitoring may provide an attractive alternative strategy for monitoring drug resistance and guiding treatment.
To the best of our knowledge, this study is the first to suggest the predictive value of ctDNA in patients with PDAC. In 49 patients with advanced PDAC receiving FOLFIRINOX, radiologic responses were significantly associated with ctDNA I-score. However, ctDNA I-scores were not able to differentiate patients with SD from patients with PD. Further, there were no significant differences between high and low I-score groups in terms of response duration, PFS, and OS. Although it did not reach statistical significance, changes in total ctDNA I-score represent a potential marker to predict treatment response. Additional large prospective studies are needed to investigate whether ctDNA I-scores can predict treatment response, considering that total cfDNA is not reliable for treatment response prediction due to low specificity for the overall tumor burden [30].
Comparison of the matched plasma and tissue DNA showed that the chromosomal CNA profile obtained from plasma cfDNA closely reflected CNAs in a tumor tissue, which implies the potential ability of cfDNA as a minimally invasive surrogate marker for tumor tissue DNA. Even though abnormal plasma CNAs were detected in only one of 15 patients, this low detection rate was due to the nature of resectable PDAC [31]. Considering that even KRAS mutations-one of the most common oncogenic variants in pancreatic cancer-are hardly detectable with cfDNA in early-stage pancreatic cancer, a low detection rate in our study was inevitable [10].
Moreover, we found that the GSS approach using a small subset of genes, less than 79, could predict OS in both internal and external validation data. Profiling CNAs from only these small sets of genes instead of a genome-wide approach could reduce the test's cost. Some of these genes were related to pancreatic cancer prognosis. For example, the amplification of KRAS [32] and deletion of CDKN2A [33] are associated with poor prognosis. Elevated expression of CDCP1 [34] has also been reported to be correlated with poor prognosis. Several other genes, such as BCAT1 [35], BCL2 [36], and ATF6 [37], are related to prognosis in other types of cancers. One potential limitation of the external validation of GSS is that the DNA source of the external validation data was not plasma but solid tumor tissue. Even though plasma cfDNA can closely reflect the true CNA profile of tumor tissue, some CNAs could be missed out owing to the low level of ctDNA in plasma cfDNA. In this regard, the statistical significance calculated in external validation might be overestimated.
Focal CNAs correlate with proliferation markers and chromosomal arm level, and whole chromosome CNAs correlate with immune evasion markers [38], implying distinct underlying mechanisms [39]. Current challenges for the clinical application of genomic instability measures are two-fold [40]. The first challenge lies in developing an optimal algorithm to integrate clinical parameters with genomic instability measures, genomic data, and transcriptomic data (germline single-nucleotide polymorphisms, mutations, CNVs, somatic mutations, CNAs, and gene expression differences). The second challenge is the translation of these algorithms into clinical protocols. Despite studies demonstrating the potential of integrated genomic instability measures and clinical parameters, we believe that evaluating these clinical trial tools will be critical to improving pancreatic cancer treatment.

Conclusions
In conclusion, our findings in a relatively large cohort of patients with PDAC with either metastatic or localized pancreatic adenocarcinoma demonstrated the prognostic value of the ctDNA I-score for survival in this malignancy.