Shallow Whole-Genome Sequencing of Cell-Free DNA (cfDNA) Detects Epithelial Ovarian Cancer and Predicts Patient Prognosis

Simple Summary Despite the progress in diagnostics and therapeutics, epithelial ovarian cancer (EOC) remains a fatal disease. Using shallow whole-genome sequencing (WGS), we identified copy number variations (CNVs). In addition, we quantified chromosomal instability using genome-wide instability and found that it could detect newly diagnosed EOC. In addition, the data showed RAB25 amplification (alone or with CA125), and disease-free survival and overall survival. Our data demonstrated that cfDNA, detected by shallow WGS, represents a potential tool for diagnosing EOC and predicting its prognosis. Abstract Despite the progress in diagnostics and therapeutics, epithelial ovarian cancer (EOC) remains a fatal disease. Using shallow whole-genome sequencing of plasma cell-free DNA (cfDNA), we investigated biomarkers that could detect EOC and predict survival. Plasma cfDNA from 40 EOC patients and 20 healthy subjects were analyzed by shallow whole-genome sequencing (WGS) to identify copy number variations (CNVs) and determine the Z-scores of genes. In addition, we also calculated the genome-wide scores (Gi scores) to quantify chromosomal instability. We found that the Gi scores could distinguish EOC patients from healthy subjects and identify various EOC histological subtypes (e.g., high-grade serous carcinoma). In addition, we characterized EOC CNVs and demonstrated a relationship between RAB25 amplification (alone or with CA125), and disease-free survival and overall survival. This study identified RAB25 amplification as a predictor of EOC patient survival. Moreover, we showed that Gi scores could detect EOC. These data demonstrated that cfDNA, detected by shallow WGS, represented a potential tool for diagnosing EOC and predicting its prognosis.


Introduction
Globally, 239,000 new epithelial ovarian cancer (EOC) cases are diagnosed yearly. In addition, 152,000 ovarian cancer (OC)-related deaths occur annually, making it the second leading cause of cancer-related death in females. Standard EOC treatment strategies include surgery and conventional chemotherapy [1,2]. However, EOC recurs in most patients (~70%), with a poor prognosis [3]. Thus, it is critical to understand the molecular pathways affecting prognosis. Indeed, markers that predict poor prognosis in EOC could help establish therapeutic strategies. NE, USA) within one week before the surgery. All patients received systemic chemotherapy after the debulking surgery.

Preparation of cfDNA and NGS Data Preparation
cfDNA was extracted from plasma. For 30 EOC samples, extraction was performed with the QIAamp Circulating Nucleic Acid Kit (Qiagen, Valencia, CA, USA) using the manufacturer's protocol. For ten EOC samples, cfDNA was extracted using the Chemagic cfDNA 2K Kit (PerkinElmer, MA, USA). The NGS library was quantified using the Qubit dsDNA HS assay kit (Thermo Fisher, MA, USA). Library fragment size was determined with D1000 screen tape (Agilent, CA, USA) using the Tapestation4200 (Agilent, CA, USA). Sequencing was performed with PE100 using the DNBSEQ-G400RS High-throughput Rapid Sequencing kit (MGI Tech Co., Shenzhen, China) and DNBSEQ-G400 sequencer (MGI Tech Co., Shenzhen, China).

Data Processing for CNV Detection and Z-Score Grouping
Mosdepth v0.3.1 was used to check the read depth of OC-specific genes [29]. The LOESS algorithm was used to normalize GC bias [30]. The average and standard deviation of the gene body depth were calculated to determine the Z-scores using the 20 healthy subjects' samples. Z-scores were calculated using the gene body depth of the patients. A Z-score exceeding 2 indicated amplification, and a score less than -2 signified a deletion.
To investigate whether OC-specific genes affected disease recurrence/progression, we performed Z-score grouping and two-step disease-free survival (DFS) analysis. The genes in the TCGA dataset were divided into AMP and DEL genes. AMP genes were defined as those with more amplified samples than deletion samples; DEL genes consisted of the genes with more deletion samples than amplified samples (Tables S3 and S4). Z-scores were grouped according to DEL and AMP information. For the AMP genes, Z-scores greater than 2 were grouped as 1, and Z-scores less than 2 were grouped as 0. For the DEL genes, Z-scores less than −2 were grouped as 1, and Z-scores greater than −2 were grouped as 0 (Table S5).

Genome-Wide Instability Score
Genome-wide Z-scores were calculated to quantify chromosomal instability using the following equation [23]: Genome-wide instability score (Gi score) = ∑ absolute(Z − score)

Identification of OC-Specific Genes and CNV Validation Using the TCGA Dataset
The ovarian serous cystadenocarcinoma TCGA dataset (n = 489) was used to confirm the CNV pattern. The data were analyzed with the cBioPortal for Cancer Genomics (http://cbioportal.org, accessed on 10 November 2015) [31,32].
We evaluated the similarity of the copy number profiles of 33 genes obtained by shallow WGS using cfDNA and TCGA WGS using gDNA. Data for 95 ovarian tumor samples and 75 normal samples were downloaded from the TCGA database. The data/analyses presented in the current publication are based on the use of study data downloaded from the dbGaP website, under dbGaP accession number phs000178.v11.p8. The read depths of the target gene regions were determined using mosdepth v. 0.3.3. Normalization of the GC content was processed using the LOESS algorithm in R. Z-scores of the read depths were calculated using the mean and standard deviation from the normal samples. Genes with a Z-score that exceeded 2 were assumed to be amplified; those with a Z-score less than -2 were assumed to be deleted. Pearson correlation was used to investigate the correlation between the mean copy number from the cfDNA and that of the TCGA gDNA. All statistical analyses were performed with R-4.0.3 and visualized using the gg-plot2 and ggpubr R packages (https://www.R-project.org, accessed on 24 April 2020 and https://CRAN.R-project.org/package=ggpubr, accessed on 27 June 2020) [30,33].

Statistical Analysis
DFS represents the time from treatment to relapse/progression. OS is the time to death, regardless of disease relapse/progression. Relapse is defined as a disease recurring more than six months after surgery; progression refers to a disease recurring within six months of surgery. DFS and OS from our data were determined by the Kaplan-Meier method. The genes with CNVs were analyzed using univariate analysis. For the multivariate analysis, the genes with CNVs and median CA125 level were used after classifying the patients into four subgroups: (1) genes with CNVs and CA125 > median CA125 (U/mL); (2) genes with CNVs and CA125 ≤ median CA125 (U/mL); (3) genes without CNVs and CA125 > median CA125 (U/mL); (4) genes without CNVs and CA125 ≤ median CA125 (U/mL). Inter-group comparisons were made using the log-rank test. Significant differences were defined as p < 0.05. The Wilcoxon rank sum test was used to analyze the statistical significance of the genome-wide instability scores (Gi scores) between groups.

Clinical and Pathology Data of Subjects
This study included 40 epithelial EOC patients who received primary debulking surgery and adjuvant chemotherapy. The clinical-pathological characteristics of the patients are presented in Table 1. The median age at diagnosis was 54 years (range: 26-75). Nine patients had stage I disease, and two had stage II. The majority of the patients (73%, 29/40) were stage III/IV (III, n = 26; IV, n = 3). Most patients (57.5%, 23/40) were diagnosed with high-grade serous carcinoma (HGSC). The remaining patients were diagnosed with mucinous and clear cell carcinoma (n = 5 each), low-grade serous carcinoma (n = 3), and endometrioid carcinoma (n = 1).

Genome-Wide Z-Scores from Shallow WGS Detect Chromosomal Instability
Shallow whole-genome sequencing was performed to evaluate chromosomal instability in plasma cfDNA from EOC patients. For each patient sample, more than 79 M reads (175.56 ± 64.91 for all samples) were obtained; more than 110.6 M reads (200.52 ± 31.48 for all samples) were acquired for each normal sample. The coverage for each patient and normal subject was about 5.15× and 5.55×, respectively. To investigate the diagnostic performance of the shallow WGS CNV counts, the Z-scores for all CNVs were calculated for each sample. The heatmap shows the somatic amplifications in red (Z-score > 2) and deletions in blue (Z-score < −2) ( Figure 1A). The CN profile showed the CN state across the genome in a predefined number of genes. Red segments represent CN gain in the genomewide scatter plot using the Z-score, and green segments show CN loss ( Figure 1B,C). Each dot represents a gene for which the copy number was inferred. In addition, we investigated the correlation of the 1× and 5× in silico analysis and found that the shallower WGS data resulted in similarly reliable outcomes (R = 0.8, p < 2.2 × 10 −16 ) ( Figure 1D). In this analysis, we downsampled the reads of the original data to 1x and calculated the correlation using the Z-scores of target genes. Red are the segments with copy gain, and green are those with copy loss. (C) Target gene scatter plot using the Z-scores (OC11). Scatter plot using the Z-scores of the target genes for each sample (red circle, Z-score ≥ 2; green circle, Z-score < 2; black circle, other). (D) Correlation plot using an insilico method. Downsampling of the reads of the original data to 1X. The correlation was calculated Scatter plot using the Z-scores with the bins for the segmented genome set to 100,000 bases. Red are the segments with copy gain, and green are those with copy loss. (C) Target gene scatter plot using the Z-scores (OC11). Scatter plot using the Z-scores of the target genes for each sample (red circle, Z-score ≥ 2; green circle, Z-score < 2; black circle, other). (D) Correlation plot using an in-silico method. Downsampling of the reads of the original data to 1X. The correlation was calculated using the Z-scores of the target genes.

Genome-Wide Instability by Shallow Whole-Genome Sequencing Characterizes EOC
We determined the genome-wide instability in cfDNA using the sum of absolute Z-scores (abs [Z-score]) ( Table S6). The genome-wide instability score (Gi score) for the EOC patients was significantly elevated compared to that of the healthy subjects (p = 0.0007, Wilcoxon test) (Figure 2A). In addition, the median Gi score of the advanced EOC patients (stage III/IV) was significantly higher than the median Gi score of the healthy subjects (p = 0.000579), whereas the median Gi score of the healthy subjects did not significantly differ from that of the early-stage EOC patients (stage I/II) ( Figure 2B). There was a trend towards higher Gi scores for advanced-stage EOC patients compared to early-stage patients; however, the difference did not reach statistical significance. However, significant differences were observed between healthy subjects and HGSC (p = 0.02) ( Figure 2C). The Gi score differences were not significant between the no recurrence/progression and recurrence/progression groups and between the wild-type BRCA1/2 and BRCA1/2 pathogenic variant groups ( Figure S1).

Copy-Number Variations in cfDNA predict EOC Patient Survival
We selected 33 EOC-specific genes from the TCGA dataset (n = 489, ovarian adenocarcinoma patients) ( Table 2) [16]. Z-scores exceeding 2 indicated "amplification" (AMP); "deletion" (DEL) was defined by Z-scores less than −2 ( Figure 1A-C). MYC, MECOM, PRKCI, CCNE1, and EIF5A2 were the top five most commonly detected genes with CNVs in the TCGA dataset, and AKT1, RPS6KA2, PIK3R1, EIF5A2, and PRKCI were the top five from our data. Only EIF5A2 and PRKCI were presented in the top five most commonly detected genes in both datasets. To demonstrate the specificity of the observed CNVs identified in the cfDNA of EOC patients, we retrieved segmented CN data for ovarian tissue-derived gDNA in the TCGA database. The two datasets were in concordance based on the Z-scores, yielding a Spearman correlation coefficient of 0.44 (p = 0.0095) ( Figure S2 and Table S7).

Copy-Number Variations in cfDNA predict EOC Patient Survival
We selected 33 EOC-specific genes from the TCGA dataset (n = 489, ovarian adenocarcinoma patients) ( Table 2) [16]. Z-scores exceeding 2 indicated "amplification" (AMP); "deletion" (DEL) was defined by Z-scores less than −2 ( Figure 1A-C). MYC, MECOM, PRKCI, CCNE1, and EIF5A2 were the top five most commonly detected genes with CNVs in the TCGA dataset, and AKT1, RPS6KA2, PIK3R1, EIF5A2, and PRKCI were the top five from our data. Only EIF5A2 and PRKCI were presented in the top five most commonly detected genes in both datasets. We used Kaplan-Meier survival analysis to determine whether the results from the shallow WGS analysis of cfDNA were associated with clinical outcomes (i.e., DFS and OS). DFS analysis was performed using the 0/1 grouping information (0 = EOC patients with no relapse/progression, 1 = EOC patients with relapse/progression) (Table S1). Patients who did not relapse/progress at the time of sample collection were treated as censored data. An event occurred when the patient gene grouping matched the TCGA CNV results. Of the 33 genes from the TCGA data, 10 genes (RAB25, PIK3CA, MECOM, DLEC1, KIT, EGFR, CCNE1, PTEN, AKT1, and AKT2) had p-values less than 0.05 (Table S8). Four genes (DLEC1, KIT, EGFR, and CCNE1) occurred only once (event: n = 1) and were excluded ( Figure 3). OS analysis was performed using the following 0/1 grouping information: 0 = EOC patients who were alive; 1 = EOC patients who died. Three EOC patients were not available for follow-up. Therefore, only 37 of 40 EOC patients were analyzed (Table S1). Three of the thirty-three genes from the TCGA dataset (DPH1, ERBB2, and NOTCH3) were excluded because none of the EOC patients harboring CNVs in these genes died during the follow-up period. We found four genes with CNVs (RAB25, DAB2, TP53, and RPS6KA2) present in more than three EOC patients, that demonstrated statistical significance in the OS analysis (p < 0.05) (Figure 4, Table S9). In particular, we found that patients with amplified RAB25 (RAB25_AMP, n = 3) had shorter DFS (p = 0.0014) and OS (p = 0.00011).  OS analysis was performed using the following 0/1 grouping information: 0 = EOC patients who were alive; 1 = EOC patients who died. Three EOC patients were not available for follow-up. Therefore, only 37 of 40 EOC patients were analyzed (Table S1). Three of the thirty-three genes from the TCGA dataset (DPH1, ERBB2, and NOTCH3) were excluded because none of the EOC patients harboring CNVs in these genes died during the follow-up period. We found four genes with CNVs (RAB25, DAB2, TP53, and RPS6KA2) present in more than three EOC patients, that demonstrated statistical significance in the OS analysis (p < 0.05) (Figure 4, Table S9). In particular, we found that patients with amplified RAB25 (RAB25_AMP, n = 3) had shorter DFS (p = 0.0014) and OS (p = 0.00011).  OS analysis was performed using the following 0/1 grouping information: 0 = EOC patients who were alive; 1 = EOC patients who died. Three EOC patients were not available for follow-up. Therefore, only 37 of 40 EOC patients were analyzed (Table S1). Three of the thirty-three genes from the TCGA dataset (DPH1, ERBB2, and NOTCH3) were excluded because none of the EOC patients harboring CNVs in these genes died during the follow-up period. We found four genes with CNVs (RAB25, DAB2, TP53, and RPS6KA2) present in more than three EOC patients, that demonstrated statistical significance in the OS analysis (p < 0.05) (Figure 4, Table S9). In particular, we found that patients with amplified RAB25 (RAB25_AMP, n = 3) had shorter DFS (p = 0.0014) and OS (p = 0.00011).
For the DFS, the six genes with CNVs (RAB25, PIK3CA, MECOM, PTEN, AKT2, and RPS6KA2) and CA125 (subgroup 1) had the shortest DFS rate of the four subgroups (p < 0.05) (Figures 5A and S3). In contrast, all six genes without CNVs and CA125 (subgroup 4) had the longest DFS rate (p < 0.05). For subgroup 2, only five genes contained CNVs (PIK3CA, MECOM, PTEN, AKT2, and RPS6KA2). These genes had the second shortest DFS rate. Subgroup 3 had the third shortest DFS rate. We performed OS analysis using the six genes from the DFS analysis. The results showed that only "RAB25_AMP and CA125 > 575.5 U/mL" had the shortest OS survival rate reaching statistical significance (p = 0.00056) ( Figure 5B).
For the DFS, the six genes with CNVs (RAB25, PIK3CA, MECOM, PTEN, AKT2, and RPS6KA2) and CA125 (subgroup 1) had the shortest DFS rate of the four subgroups (p < 0.05) (Figures 5A and S3). In contrast, all six genes without CNVs and CA125 (subgroup 4) had the longest DFS rate (p < 0.05). For subgroup 2, only five genes contained CNVs (PIK3CA, MECOM, PTEN, AKT2, and RPS6KA2). These genes had the second shortest DFS rate. Subgroup 3 had the third shortest DFS rate. We performed OS analysis using the six genes from the DFS analysis. The results showed that only "RAB25_AMP and CA125 > 575.5 U/mL" had the shortest OS survival rate reaching statistical significance (p = 0.00056) ( Figure 5B).

Discussion
In this study, we investigated whether chromosomal instability in cfDNA, detected by shallow WGS, is a potential marker for EOC and a prognostic indicator. We found that plasma cfDNA could detect genome-wide instability in EOC. Our data also showed that CNVs could provide comprehensive genetic data for EOC patients. Moreover, RAB25 amplification predicted EOC patient survival. Our study is the first to reveal that RAB25 amplification in cfDNA, identified by shallow-WGS, can predict EOC patient prognosis.
Using shallow WGS, we demonstrated that detecting chromosomal instability in cfDNA from EOC patients is feasible. We investigated the correlation of 1× and 5× in silico analysis and found that the shallower WGS data produced similarly reliable outcomes ( Figure 1D). Next, we quantified genome-wide instability by developing a "genome-wide

Discussion
In this study, we investigated whether chromosomal instability in cfDNA, detected by shallow WGS, is a potential marker for EOC and a prognostic indicator. We found that plasma cfDNA could detect genome-wide instability in EOC. Our data also showed that CNVs could provide comprehensive genetic data for EOC patients. Moreover, RAB25 amplification predicted EOC patient survival. Our study is the first to reveal that RAB25 amplification in cfDNA, identified by shallow-WGS, can predict EOC patient prognosis.
Using shallow WGS, we demonstrated that detecting chromosomal instability in cfDNA from EOC patients is feasible. We investigated the correlation of 1× and 5× in silico analysis and found that the shallower WGS data produced similarly reliable outcomes ( Figure 1D). Next, we quantified genome-wide instability by developing a "genome-wide instability score" or Gi score using the sum of (abs[Z-score]). EOC patients had higher Gi scores than healthy subjects. In addition, we observed a trend towards higher Gi scores in advanced-stage EOC patients compared to early-stage patients. Our results are consistent with recent studies showing that cfDNA detected by shallow WGS signifies chromosomal instability in EOC patients [34][35][36].
Our data showed that CNV data from cfDNA, obtained from shallow WGS, provides comprehensive genetic data for EOC, and may be a highly specific biomarker reflecting chromosomal instability. In addition, we determined that CNVs from shallow WGS could predict EOC prognosis. Previous studies showed that WGS makes it possible to detect cancer-specific CNVs in cfDNA [34,35,37,38]; however, studies predicting survival outcomes by this method are limited. Vanderstchele et al., were the first to evaluate the potential of cfDNA for diagnosing EOC. Braicu et al. showed that cfDNA quantification using shallow WGS based on CNVs could be used for EOC diagnosis and treatment monitoring [34,36]. Some studies demonstrated an association between chromosomal instability in cfDNA detected by shallow WGS and EOC prognosis. However, one study could only demonstrate the clinical usefulness of shallow WGS in prognosis prediction when it was combined with whole exome sequencing (WES). Because of the high cost of WES, combining shallow WGS with WES for determining patient prognosis would negate the financial benefits offered by shallow WGS [39]. Another study suggested that shallow WGS may be a clinically feasible method for predicting survival; however, it did not identify an association between chromosomal instability and EOC prognosis on a genetic level [35].
In our study, RAB25 amplification was associated with poor DFS and OS. In addition, we found that the integrated value for RAB25 amplification and the known tumor marker CA125 was associated with EOC patient prognosis. RAB25 was also amplified in the EOC TCGA dataset [16] and is considered an oncogene in various malignancies, including EOC [40,41]. We have limitations that the ages of the healthy subjects and EOC patients were not matched, and the family histories of the healthy subjects were not provided. However, previous studies presented that~0.25% of the general population harbored germline BRCA1/2 mutations; therefore, it is suggested that the healthy subjects should be considered without germline BRCA1/2 mutations [42][43][44][45]. In addition, this study had a limited number of EOC patients with amplified RAB25 (7%; n = 7/40) and the overall sample size was small. However, the median follow-up period for the EOC patients in this study was longer (50 months) than previous studies, and this may demonstrate the reliability of the study [35].

Conclusions
We demonstrated, for the first time, that RAB25 amplification is predictive of EOC patient survival. It also showed that the genome-wide instability score could detect EOC. Shallow WGS is an updated tool that can use cfDNA, which can be obtained non-invasively. This approach has a cost-benefit, over WGS and WES. However, further validation of the approach with large cohort studies is required. Collectively, our data showed that cfDNA detected using shallow WGS may be a clinically applicable tool for diagnosing EOC and predicting patient prognosis.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author (Y.J.C.). The data are not publicly available due to ethical restrictions.