Molecular Characterization of Non-responders to Chemotherapy in Serous Ovarian Cancer

Nearly one-third of patients with high-grade serous ovarian cancer (HGSC) do not respond to initial treatment with platinum-based therapy. Genomic and clinical characterization of these patients may lead to potential alternative therapies. Here, the objective is to classify non-responders into subsets using clinical and molecular features. Using patients from The Cancer Genome Atlas (TCGA) dataset with platinum-resistant or platinum-refractory HGSC, we performed a genome-wide unsupervised cluster analysis that integrated clinical data, gene copy number variations, gene somatic mutations, and DNA promoter methylation. Pathway enrichment analysis was performed for each cluster to identify the targetable processes. Following the unsupervised cluster analysis, three distinct clusters of non-responders emerged. Cluster 1 had overrepresentation of the stage IV disease and suboptimal debulking, under-expression of miRNAs and mRNAs, hypomethylated DNA, “loss of function” TP53 mutations, and the overexpression of genes in the PDGFR pathway. Cluster 2 had low miRNA expression, generalized hypermethylation, MUC17 mutations, and significant activation of the HIF-1 signaling pathway. Cluster 3 had more optimally cytoreduced stage III patients, overexpression of miRNAs, mixed methylation patterns, and “gain of function” TP53 mutations. However, the survival for all clusters was similar. Integration of genomic and clinical data from patients that do not respond to chemotherapy has identified different subgroups or clusters. Pathway analysis further identified the potential alternative therapeutic targets for each cluster.


Introduction
Epithelial ovarian cancer is the fifth leading cause of cancer death among women in the United States and it has the highest mortality rate of all gynecologic cancers [1]. The majority of patients present with advanced disease at diagnosis. While most of the patients respond to the initial combination treatment of surgical debulking and platinum-based chemotherapy, nearly one-third of patients will not respond. Significant effort has been expended to define platinum resistance in epithelial ovarian cancers at both the histologic and biologic levels. For instance, increased chemoresistance has been Table 1. Clinical characteristics of 88 non-responder HSGC patients. Non-responders ‡ were divided by their resulting clusters after the iClusterPlus analysis. All of the clinical characteristics were not statistically different between the resulting clusters. Surgical Outcome 0.079 Optimal (<1 cm) 15 13 24 Suboptimal (>1 cm) 12 10 7  24 31 ‡ Non-responders were those who had progressed during the first platinum-based chemotherapy (platinum-refractory) or those who recurred within 6 months of treatment completion (platinum-resistant). * One patient had no information about drugs delivered; all other had initial platinum-based chemotherapy. ** Two patients had no information about drugs delivered; all other had initial platinum-based chemotherapy.
Results of the univariate analyses between responder and non-responder groups for all types of biological data are represented in Figure 1. All of the significant variables between groups were used for clustering analysis: 1023 differentially expressed genes, 960 differentially methylated promoters, 21 differentially expressed miRNAs, 56 somatic mutations, and more than 8000 gene copy number alterations (CNA). Initial evaluation with the clustering tool demonstrated that the miRNA variables did not influence the clustering process; thus, only the other four classes of biological data were included in the final analysis: gene expression, DNA methylation, somatic mutations, and gene copy number alteration.  24 31 ‡ Non-responders were those who had progressed during the first platinum-based chemotherapy (platinum-refractory) or those who recurred within 6 months of treatment completion (platinum-resistant). * One patient had no information about drugs delivered; all other had initial platinum-based chemotherapy. ** Two patients had no information about drugs delivered; all other had initial platinum-based chemotherapy.
Results of the univariate analyses between responder and non-responder groups for all types of biological data are represented in Figure 1. All of the significant variables between groups were used for clustering analysis: 1,023 differentially expressed genes, 960 differentially methylated promoters, 21 differentially expressed miRNAs, 56 somatic mutations, and more than 8,000 gene copy number alterations (CNA). Initial evaluation with the clustering tool demonstrated that the miRNA variables did not influence the clustering process; thus, only the other four classes of biological data were included in the final analysis: gene expression, DNA methylation, somatic mutations, and gene copy number alteration. Univariate Analysis between Responders and Non-Responders. Heatmaps and graphics of molecular variables that were different between groups of responders and non-responders, with color codes and significance levels for each class of data. These variables were used for the Figure 1. Univariate Analysis between Responders and Non-Responders. Heatmaps and graphics of molecular variables that were different between groups of responders and non-responders, with color codes and significance levels for each class of data. These variables were used for the integrative cluster analysis with iClusterPlus: (A) Differentially expressed genes; (B) Differentially methylated promoters; (C) Differentially expressed miRNAs; (D) Somatic mutations; (E) Altered gene copy numbers: green means gain of copy number, red is loss of copy number.

Integrative iClusterPlus Analysis
Eighty-eight non-responders from TCGA who had complete information for outcome, gene expression, mutation analysis, gene copy number, and DNA methylation were included in the cluster analysis. The optimization of the clustering method identified three differentiated clusters within non-responders ( Figure 2). To build the final model, we used a threshold, or cut-off value, which selected the most discriminative features for the three-cluster model solution. The threshold was the 95th percentile. Only those features that passed this threshold were included in the representation of the final three-cluster model. Clinical information, including surgical outcomes along with type of TP53 somatic mutation, and independently significant miRNAs were added to each cluster in a supervised manner for further characterization (Figure 3). integrative cluster analysis with iClusterPlus: (A) Differentially expressed genes; (B) Differentially methylated promoters; (C) Differentially expressed miRNAs; (D) Somatic mutations; (E) Altered gene copy numbers: green means gain of copy number, red is loss of copy number.

Integrative iClusterPlus Analysis.
Eighty-eight non-responders from TCGA who had complete information for outcome, gene expression, mutation analysis, gene copy number, and DNA methylation were included in the cluster analysis. The optimization of the clustering method identified three differentiated clusters within non-responders ( Figure 2). To build the final model, we used a threshold, or cut-off value, which selected the most discriminative features for the three-cluster model solution. The threshold was the 95 th percentile. Only those features that passed this threshold were included in the representation of the final three-cluster model. Clinical information, including surgical outcomes along with type of TP53 somatic mutation, and independently significant miRNAs were added to each cluster in a supervised manner for further characterization (Figure 3). To assess the number of clusters we plotted the number of tested clusters vs. percent of explained variation. Optimal k or cluster number is the point at which percent of explained variation begins to level off after initial rapid ascent. Here, k = 3 [15]. To assess the number of clusters we plotted the number of tested clusters vs. percent of explained variation. Optimal k or cluster number is the point at which percent of explained variation begins to level off after initial rapid ascent. Here, k = 3 [15]. A significant portion of Cluster 1 patients had > 2cm of residual visible disease following their primary debulking surgery. Additionally, they more often received suboptimal initial treatment. When classifying the TP53 somatic mutations within each cluster, the most loss-of-function TP53 mutations were seen in Cluster 1. Low levels of miRNA expression, significant hypomethylation, a high rate of somatic mutations, and CNA at chromosome 19 further characterized cluster 1.
Cluster 2 patients have mixed clinical characteristics and types of TP53 mutations. However, this group has extremely low levels of miRNA expression and higher expression of representative genes, as well as greater DNA hypermethylation, when compared to Cluster 1 and 3 patients. Further, unlike Clusters 1 and 3, there were no somatic mutations in the DNAH5 and ODZ1. Cluster 2 patients were more often diagnosed at stage III, received optimal primary treatment, and had optimal cytoreductive surgery, with the greatest percentage of completely cytoreduced patients as compared to other clusters. Cluster 3 patients had the greatest number of "oncogenic" gain-of-function (GOP) TP53 mutations. Cluster 3 was further characterized by high miRNA expression and DNA hypermethylation. Below them are the clinical profiles with the variable in the left margin (age, stage, optimal treatment, optimal surgery and residual disease after surgery) and the color-code for each category in the right margin. Underneath clinical information there is a representation of TP53 somatic mutation features: presence or status, variant, and mutation type. The last five heatmaps represent the top molecular features with specific color codes for their respective values at the right margin. Only molecular features that were most discriminating for this three-cluster model and passed a selection with a threshold value of > 95th percentile were included in the representation of the final 3-cluster model. Names of all features are detailed in Appendix A.
A significant portion of Cluster 1 patients had > 2cm of residual visible disease following their primary debulking surgery. Additionally, they more often received suboptimal initial treatment. When classifying the TP53 somatic mutations within each cluster, the most loss-of-function TP53 mutations were seen in Cluster 1. Low levels of miRNA expression, significant hypomethylation, a high rate of somatic mutations, and CNA at chromosome 19 further characterized cluster 1.
Cluster 2 patients have mixed clinical characteristics and types of TP53 mutations. However, this group has extremely low levels of miRNA expression and higher expression of representative genes, as well as greater DNA hypermethylation, when compared to Cluster 1 and 3 patients. Further, unlike Clusters 1 and 3, there were no somatic mutations in the DNAH5 and ODZ1. Cluster 2 patients were more often diagnosed at stage III, received optimal primary treatment, and had optimal cytoreductive surgery, with the greatest percentage of completely cytoreduced patients as compared to other clusters. Cluster 3 patients had the greatest number of "oncogenic" gain-of-function (GOP) TP53 mutations. Cluster 3 was further characterized by high miRNA expression and DNA hypermethylation.

Pathway Enrichment Analysis.
Pathway analysis findings were significant for the overrepresentation of the platelet derived growth factor receptor (PDGFR) in different pathways in Cluster 1 ( Table 2). Overrepresentation of the HIF-1 signaling pathway was seen in Cluster 2; the Wnt/β-catenin pathway was close but not statistically significant in this cluster (p-value = 0.089). Pathways that are involved in cellular Below them are the clinical profiles with the variable in the left margin (age, stage, optimal treatment, optimal surgery and residual disease after surgery) and the color-code for each category in the right margin. Underneath clinical information there is a representation of TP53 somatic mutation features: presence or status, variant, and mutation type. The last five heatmaps represent the top molecular features with specific color codes for their respective values at the right margin. Only molecular features that were most discriminating for this three-cluster model and passed a selection with a threshold value of > 95th percentile were included in the representation of the final 3-cluster model. Names of all features are detailed in Appendix A.

Pathway Enrichment Analysis
Pathway analysis findings were significant for the overrepresentation of the platelet derived growth factor receptor (PDGFR) in different pathways in Cluster 1 ( Table 2). Overrepresentation of the HIF-1 signaling pathway was seen in Cluster 2; the Wnt/β-catenin pathway was close but not statistically significant in this cluster (p-value = 0.089). Pathways that are involved in cellular senescence were found to be overrepresented in Cluster 3. Despite significant variation in clinical and molecular profiles between the clusters, survival differences were not observed among clusters ( Figure 4). Table 2. Pathway Enrichment Analysis. Given a list of genes, the pathway enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database will select those pathways that are overrepresented in the gene list for each one of the clusters. It will also compute a p-value for the resulting pathways. * Statistically non-significant.

Discussion
While understanding the biologic mechanism for platinum resistance in ovarian cancer is extremely valuable, this information has not yet been translated into effective therapeutic strategies. To date, the most widely used molecular inhibitors in the treatment of ovarian cancer are the VEGF inhibitor bevacizumab and poly (ADP-ribose) polymerase (PARP) inhibitors. Even with the incorporation of these drugs into the accepted treatment paradigm for ovarian cancers, molecular analysis of tumors is not routine. Here, our goal was not to assess the mechanism of platinum resistance but, rather, to characterize non-responders. Our unsupervised cluster analysis of HGSC patients that did not respond to primary platinum-based chemotherapy revealed three distinct groups.
Even though specific clusters did not predict survival, unique differences between the clusters could eventually be used for treatment planning. For example, non-responders in Cluster 1 were more often diagnosed at stage IV and receive suboptimal treatment. From the progression-free survival (PFS) and overall survival (OS) results of ICON7 and GOG 218, we now know that patients with poor prognosis disease, defined in the studies as those with stage IV disease, inoperable stage III disease, or suboptimally debulked stage III disease, have a slightly higher PFS and OS advantage with the addition of bevacizumab to standard chemotherapy [16,17].
We hypothesized that the combination of cluster analysis, clinical and mutational characteristics, and pathway analyses may reveal some of the biology behind each cluster. Indeed, pathway analysis of molecular characteristics can be used to gain insight into the common biological processes that contribute to disease pathophysiology for each cluster [18]. For example, Cluster 1 seems to overrepresent pathways that are involved in growth factor signaling (PDGFR and VEGF), which contributes to cell proliferation. Pathways that regulate cellular survival, particularly in the setting of stress (HIF-1 signaling pathway and cellular senescence), dominate Cluster 2. Finally, the significantly altered pathway in Cluster 3 was related to cellular senescence. Not only do these distinct alterations hint at the potential mechanisms of chemoresistance, but they can also be used to suggest alternative treatments to override the dominant signaling pathways.
Since Cluster 1 tumors have an overrepresentation of growth factor signaling pathways, a logical treatment choice for these patients would be a tyrosine kinase or angiokinase inhibitor. Indeed, a recent phase 3 trial using nintedanib (an oral triple angiokinase inhibitor of VEGF receptor, PDGFR, and fibroblast growth factor receptor), in addition to platinum based chemotherapy showed improved PFS for ovarian cancer (17.2 months vs 16.6 months; p = 0.024) [19]. Additionally, a recent phase II trial for platinum-resistant ovarian cancer patients has identified a group of patients with increased PFS after treatment with nintedanib [20]. In ICON6, the use of cediranib, another oral angiokinase inhibitor that targets VEGFR and PDGFR, in addition to platinum-based chemotherapy for platinum sensitive, recurrent ovarian cancer showed a PFS benefit of almost three months (p < 0.0001) [21]. Through the identification of Cluster 1 patients prior to initial treatment, we could potentially single out those patients that are likely to benefit from the addition of angiokinase inhibitors or bevacizumab to chemotherapy in the front-line setting and potentially improve survival.
The overrepresented pathways in Cluster 2 are involved in cell survival. Notably, the HIF-1 signaling pathway is intimately related to the PI3K/AKT/mTOR pathway, which is one of the most important signaling pathways in cell survival [22]. In pre-clinical studies, the chemoresistant ovarian cancer cells are highly sensitive to mTOR inhibitors [23]. However, Phase II trials in ovarian cancer, including resistant cases, have demonstrated a diverse range of responses to mTOR inhibitors, such as temsirolimus [24], indicating that better criteria are necessary to determine which patients are the best candidates for mTOR inhibitors. Our study may shed insight into this important clinical question. In the pathway enrichment analysis of Cluster 2, the Wnt/β-catenin signaling pathway was not significant (p-value = 0.09), but cellular senescence was (p-value = 0.001). It has been reported that Wnt signaling antagonizes oncogene-induced cellular senescence as a tumor suppression mechanism in vivo [25]. Suppressing Wnt signaling pathway may improve in response to treatment in some tumors. Not only are multiple Wnt inhibitors under preclinical development, but Wnt inhibitors, such as LGK974 and OMP18R5, have entered phase 1 clinical trials for solid tumors with aberrant Wnt signaling [26], and PRI-724 is currently being used in a phase II trial in metastatic colon cancer [27]. Additionally, recent phase 1 evaluation of the Wnt/β-catenin signaling pathway within melanoma has shown that overactivation of the Wnt pathway conveys resistance to immunotherapy with anti-PD-L1 monoclonal antibody [28], a treatment that has been recently extrapolated for use in gynecologic tumors [29,30].
The most significantly overrepresented pathway in Cluster 3 was the cellular senescence pathway. In cancer transformation, the cell needs to bypass cellular senescence to become immortal [31]. TP53, which is a key tumor suppressor gene, is one of the most well-known components of this pathway. Gain-of-function TP53 mutations were most frequent in Cluster 3 patients. In addition to targeted agents that are focused on the reactivation of null TP53, there are many investigational compounds that induce gain-of-function (GOF) mutant TP53 degradation, such as histone deacetylase (HDAC) inhibitors [32]. As described by Yan et al., the activity of HDAC inhibitors extended to decreasing not only GOF TP53, but also wild-type TP53 expression, indicating that knowledge of the mutant type of TP53 is critically important before the application of these targeted therapies [33].
This study is circumscribed by available TCGA data and the potential systematic biases from retrospective data involving selected patients. Therefore, before we can apply this classification model to patients with advanced stage HGSC and design new alternative treatments, it is necessary to validate the classification in a prospective study and within other, representative datasets. We are currently recruiting patients and processing tumors with this in mind. However, despite these limitations, our study has successfully classified patients that failed initial chemotherapy for HGSC based on clinical and molecular characteristics and postulated new targeted treatment strategies for patients with very poor outcomes that have few therapeutic options.

Outcomes Definition
Patients were categorized as responders or non-responders. The responders were defined as those with progression-free survival six months after the completion of six cycles of platinum-based chemotherapy. Non-responders were those who had progressed during the first platinum-based chemotherapy (platinum-refractory) or those who recurred within six months of treatment completion (platinum-resistant) [34][35][36]. Data from 450 patients with serous epithelial ovarian, fallopian tube, or primary peritoneal cancer were extracted from TCGA. The clinical characteristics of the study cohort are shown in Table 1. There was a total of 292 responder patients and 158 non-responders.

Source of Data
TCGA genomic data, including copy number variation, single nucleotide polymorphisms (SNPs), miRNA expression, gene expression (mRNA), and DNA methylation, as well as clinical and outcome information, were downloaded, normalized, formatted, and organized for the analysis, in accordance with the precepts of the TCGA data sharing agreements. All data collection and processing, including the consenting process, were performed after approval by the University of Iowa Institutional Review Board and they were in accord with the TCGA Human Subjects Protection and Data Access Policies, adopted by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

Copy Number Alterations (CNA)
Samples from Agilent Human Genome CGH Microarray 244A (Agilent Technologies, Santa Clara, CA) were processed and DNA sequences were aligned to NCBI Build 36 version of the human genome. Circular Binary Segmentation was used to identify the regions with an altered copy number in each chromosome [37]. The copy number at a particular genomic location was computed based on the segmentation mean log ratio data. We found regions with frequent CNA among all the samples by performing genomic identification of significant targets in cancer (or GISTIC) analysis [38]. The significance of CNA at a particular genomic location was determined based on false discovery rate (FDR), as previously described [39]. A total of 16,918 chromosomal loci were included in the analysis. CNA was available for 447 patients (CR = 290, IR = 157).

Mutation Analysis
Somatic mutation detection, calling, annotation, and validation have been extensively described elsewhere [39]. Somatic mutation information resulting from Illumina Genome Analyzer DNA Sequencing GAIIx platform (Illumina Inc., San Diego, CA) was downloaded and formatted for analysis. Mutation information was downloaded from TCGA Data Portal (Level 3, i.e., validated somatic mutations). Somatic mutation information was available from 171 samples from patients with CR and 89 with IR. For those patients, there were 6716 unique genes that presented some types of validated somatic mutation. These included: frame shift insertions and deletions, in-frame insertions or deletions, missense, nonsense and nonstop mutations, silence, splice site, and translation start site mutations.

Gene Expression
Raw gene expression data were downloaded from the TCGA Data Portal (Level 1), extracted, loaded, and normalized and annotated with the National Center for Biotechnology Information (NCBI) Build 36 of the human genome. There were 450 (CR = 292, IR = 158) Affymetrix HT Human Genome U133 arrays with gene expression and clinical information about the chemo-response. A total of 12,718 genes passed the filtering criteria for percentage of values missing (<50%) and they were included in the analysis.  [39]. A total of 14,473 DNA methylation probes passed filtering criteria for percentage of values missing (<50%) and they were included in the analysis.

MicroRNA (miRNA) Expression
Raw miRNA expression data were downloaded from the TCGA Data Portal (Level 1), extracted, loaded, and normalized with the analytical software, BRB-ArrayTools. There were 448 (CR = 290, IR158) Agilent Human miRNA Microarray Rel12.0 (Agilent Technologies Inc., Santa Clara, CA, USA) arrays from HGSC with clinical information regarding chemo-response [39]. A total of 619 miRNAs passed filtering criteria for the percentage of values missing (<50%) and they were included in the analysis.

Variable Selection
Our objective was to characterize HGSC patients that do not respond to initial chemotherapy by clustering clinical and biological data. Initially, we selected those variables that were associated with the defined outcome (non-response to chemotherapy) for all types of data: clinical, gene and miRNA expression, CNA, somatic mutations and DNA methylation. Variables that were not different between responders and non-responders do not inform the characterization of non-responder patients.
For a selection of variables that are associated with non-response, we performed a univariate two-sided t-test analysis, comparing the groups of chemo-response (responders versus non-responders) with respect to differential gene expression, DNA methylation, miRNA expression (cut off for significance p-value < 0.05), and CNA (p-value < 0.001). Logistic regression analyses were performed to determine the association between chemo-response, somatic gene mutations, and clinicopathological variables (cut off value for significance p < 0.05). A more stringent cut-off p-value was chosen for CNA (with elevated number of variables and overlap between them) to reduce false positives and minimize features in the cluster analysis. For all biological comparisons, 10,000 random permutations were performed to determine the p values. Significant variables in univariate analyses were included in the integrative cluster analysis, or iClusterPlus [40].

iClusterPlus Analysis
Only using data from the non-responder group, an unsupervised cluster analysis was performed with the iClusterPlus framework within the R statistical software package. The goal of this method is to generate a classification of tumors (or clusters) by capturing patterns from diverse classes of genomic data. The tumors were only clustered from non-responders. Initially, data from all biological classes were introduced in the model to optimize the clustering analysis. During the optimization process, no miRNAs made a significant contribution to the clustering solution; so, it was concluded that miRNA expression was not relevant to the clustering process. The remaining four biological/genomic classes were introduced for the final clustering analysis: gene expression, CNA, somatic mutations, and DNA methylation. All variables that were found to be significant in the univariate analysis were used. A total of 88 patients with information for all classes of biological data were used for the final analysis.
Before performing the final iClusterPlus analysis, we optimized or tuned the parameters to be applied. First, we determined the number of clusters (k) by repeatedly partitioning the samples into a learning and a test set. Subsequently, we evaluated the degree of agreement between the predicted and the observed cluster assignment [40]. For results visualization, we plotted the number of clusters vs. percent of explained variation. The optimal k is the point at which the curve of the percent explained variation levels off [15]. Afterwards, for each k, we used Bayesian information criteria to select the best sparse model with the optimal combination of penalty parameters, or lambdas (L) [40]. We ran the final model with the combination of optimized number of clusters (k), penalty parameters (L), and all classes of variables. Finally, we selected the top features (95th percentile) that were based on lasso coefficient estimates for the k-cluster solution.
Important clinical information, including demographic data, stage, surgical, and treatment information, along with TP53 mutation status was added to the representation of each cluster. The TP53 mutations were grouped into three categories based on predicted functional consequence: gain-of-function (GOF), loss-of-function (LOF), or wildtype (WT). The GOF mutations were those that have been shown to cause an oncogenic phenotype: P151S, Y163C, R175H, L194R, Y220C, R248Q, R248W, R273C, R273H, R273L, and R282W [41]; LOF mutations were those that resulted in the lack of protein expression and WT mutations were those that did not alter the amino acid sequence of TP53. The remaining mutations were single missense mutations, or "variants of unknown significance", in which the functional effect of the mutation is currently not known [42]. Residual disease after surgery was recorded in centimeters. Patients with residual disease >1 cm in largest diameter after surgery had suboptimal surgery. Optimal treatment was considered to be the sum of optimal surgery with the administration of six cycles of platinum-based chemotherapy.
Because miRNA expression did not contribute to the clustering process, in the representation of results, we only added miRNAs that were independently associated with chemo-response in the multivariate analysis. This analysis was performed using additive logistic regression modeling with backward elimination.

Pathway Analysis
To further characterize the molecular characteristics of each resulting cluster, a pathway enrichment analysis was performed using KEGG and clusterProfiler [18,43]. All of the analyses were performed using R environment for statistical computing and graphics (www.r-project.org) [44].

Conclusions
The integration of genomic and clinical data is a variable-based approach to cluster non-responders into distinct categories. Subsequent pathway analysis of the components of these clusters may be used to identify the potential alternative therapeutic targets and strategies for each cluster.
Author Contributions: M.E.M., E.A.S., and A.M.N. were involved in assembling the patient cohort. M.J.G. is responsible for assembling and maintaining the tumor bank utilized for this study. K.K.L. is one of the senior investigators overseeing this project, with D.P.B., and is the recipient of the N.I.H. grants that partially funded the study. B.J.S. and K.W.T. assisted with the biostatistics. M.E.M. was the major contributor in writing the manuscript; and E.A.S., D.P.B. and A.M.N. participated in editing the initial manuscript. E.J.D. interpreted pathway analysis and assessed possible targeted therapies for resulting cluster, in conjunction with K.W.T., J.G.B. is the senior investigator overseeing this project, analyzed and interpreted TCGA data, and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.
Funding: This work was supported in part by the NIH grant R01 CA99908 and R01 CA184101 to Kimberly K. Leslie, and the basic research fund from the Department of Obstetrics & Gynecology at the University of Iowa.

Acknowledgments:
We would like to thank "TCGA Research Network" for generating, curating and providing clinical data.

Conflicts of Interest:
Dr. Kristina W. Thiel is a co-owner of Immortagen, Inc. All other authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs), or conflicts of interest in the subject matter or materials discussed in this manuscript. Appendix A Table A1. Clinical-molecular characteristics from the resulting cluster analysis in Figure 3. The order of the variables is the same than in the figure.