Integrated Multi-Tumor Radio-Genomic Marker of Outcomes in Patients with High Serous Ovarian Carcinoma

Simple Summary Clinical responses to the initial treatment of high grade serous ovarian cancer (HGSOC) vary greatly. Widespread intra-site and inter-site genomic heterogeneity presents significant challenges for the development of predictive biomarkers based on pre-treatment sampling of select individual tumors. Non-invasive stratification of patients with HGSOC by risk of outcome could facilitate a higher level of intervention for those with the highest risk of a poor outcome. We developed and validated a machine learning-based integrated marker of HGSOC outcomes to standard chemotherapy that combines a previously developed intra-site and inter-site CT radiomics measure called cluster dissimilarity (cluDiss) with clinical and genomic measures using two retrospective cohorts of internal and external institution datasets. Our approach was more accurate than conventional clinical and average radiomics measures for prognosticating progression-free survival and platinum resistance. Abstract Purpose: Develop an integrated intra-site and inter-site radiomics-clinical-genomic marker of high grade serous ovarian cancer (HGSOC) outcomes and explore the biological basis of radiomics with respect to molecular signaling pathways and the tumor microenvironment (TME). Method: Seventy-five stage III-IV HGSOC patients from internal (N = 40) and external factors via the Cancer Imaging Archive (TCGA) (N = 35) with pre-operative contrast enhanced CT, attempted primary cytoreduction, at least two disease sites, and molecular analysis performed within TCGA were retrospectively analyzed. An intra-site and inter-site radiomics (cluDiss) measure was combined with clinical-genomic variables (iRCG) and compared against conventional (volume and number of sites) and average radiomics (N = 75) for prognosticating progression-free survival (PFS) and platinum resistance. Correlation with molecular signaling and TME derived using a single sample gene set enrichment that was measured. Results: The iRCG model had the best platinum resistance classification accuracy (AUROC of 0.78 [95% CI 0.77 to 0.80]). CluDiss was associated with PFS (HR 1.03 [95% CI: 1.01 to 1.05], p = 0.002), negatively correlated with Wnt signaling, and positively to immune TME. Conclusions: CluDiss and the iRCG prognosticated HGSOC outcomes better than conventional and average radiomic measures and could better stratify patient outcomes if validated on larger multi-center trials.


Introduction
Ovarian cancer accounts for approximately 239,000 new cases and 152,000 deaths worldwide annually [1]. High grade serous ovarian carcinoma (HGSOC) is the deadliest gynecologic malignancy and is associated with a very poor prognosis [2]. Although HGSOC shows marked sensitivity to initial platinum-based chemotherapy [3], most patients recur and become progressively resistant to subsequent treatments [4]. Acquisition of resistance may be related to specific mutational processes that drive genomic heterogeneity [5,6] and clonal evolution [7,8]. HGSOC exhibits marked intra-site and inter-site genomic heterogeneity across metastatic sites in the peritoneal cavity [6][7][8] with altered immunological infiltrates and a tumor micro-environment (TME) [9]. Detection of spatial or temporal heterogeneity by multiple sampling in a single patient is expensive, invasive, and often clinically impractical. Consequently, analysis of heterogeneity has only been performed as retrospective research studies on a limited number of patients with HGSOC [6][7][8]. There is a pressing need for facile and non-invasive measures for intra-site and inter-site radiomic heterogeneity that can be integrated into clinical pathways.
Computed tomography (CT) and serum CA-125 measurement are routinely used for the initial staging and treatment monitoring of patients with HGSOC, but standard imaging protocols do not provide information on tumor heterogeneity. Texture analysis of CT data is a radiomics method [10,11] that can provide detailed quantitative characterization of local variations in intensity levels throughout an image. The majority of radiomics methods compute average measures of tumor heterogeneity based on a single site of disease per patient even in those with metastatic disease [10,[12][13][14][15][16][17][18][19], including a recent study of patients with advanced ovarian cancer from preoperative CT images [12]. However, averaged radiomics measures do not capture the potential variability within different regions of a tumor and between multiple tumors in the same patient.
Prior studies by our group have demonstrated that radiomic features quantifying the heterogeneity between tumor sites are associated with shorter overall survival (OS) and incomplete surgical resection in HGSOC patients treated with chemotherapy [20] as well as with shorter progression-free survival (PFS) in a different cohort of HGSOC patients with BRCA1/2 mutation [21]. More recently, we extended these methods to incorporate both intra-site and inter-site radiomic heterogeneity (IISH) and showed that a single measure, known as cluster dissimilarity (cluDiss), was associated with an immunotherapy response in patients with recurrent HGSOC [22]. These results show that modeling radiomic heterogeneity between the different sites of disease can help to better stratify patients with HGSOC.
In this retrospective study, we validated the cluDiss marker to stratify outcomes in HGSOC patients before chemotherapy treatment. Furthermore, we developed an integrated marker combining intra-site and inter-site radiomics-clinical-genomic (iRCG) variables using machine learning to distinguish patients' outcomes. The aims of this study were to (i) validate cluDiss as a predictor of outcomes using an internal and external multi-institutional cohort, and (ii) evaluate whether an integrated iRCG measure of HGSOC outcomes was more accurate than average heterogeneity radiomic (aRCG) and conventional imaging (CCG) measures. Finally, we attempted to establish the biological basis of the prognostic radiomics measures by studying their correlation with underlying biological processes characterized by a well-defined molecular HALLMARK gene set pathways, stromal and immune scores of the tumor microenvironment (TME), and established 18 cell types of the TME extracted using the consensus TME method [23,24] by using patient-level single sample gene set analysis (ssgsea) [25] from RNA-sequencing data.

Patient and Tumor Characteristics
The REMARK diagram flowchart for selecting the patients is described in Figure S1. The patient clinical characteristics are shown in Table 1. The median follow-up was 41.9 mos (inter-quartile range [IQR] 22.9 months [mos]-56.3 mos) in the internal Memorial Sloan Kettering Cancer Center (MSKCC) cohort and 19.3 mos (IQR 6.3 mos-38.6 mos) in the Cancer Imaging Archive (TCIA) cohort. All but two patients in MSKCC and 17 patients in TCIA experienced progression during the follow-up period. The median number of tumor sites was 7 (IQR 6 to 9) for the MSKCC and 4 (IQR of 3 to 5) for the TCIA cohort. In total, 460 tumor volumes of interest (VOI) were analyzed to compute the intra-site and inter-site tumor heterogeneity (IITH) metric cluster dissimilarity (cluDiss) as well as several (N = 75) average heterogeneity radiomics measures. The total tumor burden volume (TTV) was 122.0 cc (IQR of 65.5 cc to 229.0 cc) in MSKCC and 331.0 cc (IQR of 158.2 cc to 595.0 cc) in the TCIA datasets. After excluding 14 patients who had no platinum resistance data, 61 patients were analyzed for platinum resistance classification. Forty-four cases had matched imaging and RNA-sequencing data and were used for radio-genomic analysis.
The conventional-clinical-genomic (CCG) CCG PFS score was computed with best tuning parameters of α = 1, λ = 0.809, as (2) CCG PFS = 2.86 × age + 1.85 × sites + 1.17 × CNB + 0.235 × resection (2) The average radiomic-clinical-genomic (aRCG) aRCG PFS score was computed with (α = 1, λ = 0.803), as (3) Both cluDiss and continuous iRCG PFS scores were associated with PFS in both univariate and multivariable analysis (after adjusting for clinical factors and copy number burden [CNB]). Total tumor volume (TTV), CCG PFS , and aRCG PFS scores were not associated with progression-free survival (PFS) ( Table 2). The number of sites was associated with PFS in both univariate and multivariable analysis for the TCIA and to PFS in the univariate model in the MSKCC dataset. These results showed that the cluDiss measure was able to stratify patients by PFS better than both conventional imaging and average heterogeneity radiomics measures. The iRCG PFS and cluDiss cut-off to dichotomize patients into high risk (≥cut-off) and low risk (<cut-off) were determined as 642.00 and 68.62, respectively, on the MSKCC dataset. Testing on the TCIA dataset with this same cut-off showed significantly longer PFS for the lower values of iRCG PFS (p = 0.0006) and lower values of cluDiss (p < 0.001), respectively ( Figure 1). The iRCGPFS and cluDiss cut-off to dichotomize patients into high risk (≥cut-off) and low risk (<cut-off) were determined as 642.00 and 68.62, respectively, on the MSKCC dataset. Testing on the TCIA dataset with this same cut-off showed significantly longer PFS for the lower values of iRCGPFS (p = 0.0006) and lower values of cluDiss (p < 0.001), respectively ( Figure 1).

Classification of Platinum Resistance
The iRCG linear SVM model (AUROC of 0.78, 95% CI 0.77 to 0.79) was significantly more accurate than CCG linear SVM (p < 0.001) and aRCG RFE-SVM (p = 0.004) ( Table 3). The receiver operating characteristic curves for all three classifiers are shown in Figure 2. The iRCG SVM had the highest sensitivity of the three methods for classifying patients likely to develop platinum resistance.

Classification of Platinum Resistance
The iRCG linear SVM model (AUROC of 0.78, 95% CI 0.77 to 0.79) was significantly more accurate than CCG linear SVM (p < 0.001) and aRCG RFE-SVM (p = 0.004) ( Table 3). The receiver operating characteristic curves for all three classifiers are shown in Figure 2. The iRCG SVM had the highest sensitivity of the three methods for classifying patients likely to develop platinum resistance. Table 3. Machine learning classifier accuracies using intra-inter site radiomic-clinical-genomic (iRCG), conventional-clinical-genomic (CCG), and average radiomic-clinical-genomic (aRCG) classifiers of platinum resistance.   All variables except CNB were relevant (importance > 0) in the iRCG and CCG models. Resection status was the most relevant feature for all three models (Importance score = 100), while cluDiss had a lower importance score of 34.4, clearly indicating the relevance of the clinical variables for predicting platinum resistance. Two radiomic measures, dependence counts non-uniformity (DCN), and the gray level non-uniformity (GLN) were found to be relevant in the aRCG model. CNB had a low importance score of 2.32 in the aRCG model. Improved prognostication of HGSOC outcomes using the iRCG measures could allow for better upfront and non-invasirve strafication of patients with HGSOC by risk of outcome than with the All variables except CNB were relevant (importance > 0) in the iRCG and CCG models. Resection status was the most relevant feature for all three models (Importance score = 100), while cluDiss had a lower importance score of 34.4, clearly indicating the relevance of the clinical variables for predicting platinum resistance. Two radiomic measures, dependence counts non-uniformity (DCN), and the gray level non-uniformity (GLN) were found to be relevant in the aRCG model. CNB had a low importance score of 2.32 in the aRCG model. Improved prognostication of HGSOC outcomes using the iRCG measures could allow for better upfront and non-invasirve strafication of patients with HGSOC by risk of outcome than with the conventional clinical or genomic measures alone. More accurate risk stratification could faciliate a higher level of intervention for those with the highest risk.

Robustness to the CT Scanner Manufacturer
The cluDiss measure did not show statistical difference between scanners (p = 0.06) (Table S1). Of the 75 average radiomic measures, 24 were robust to scanner differences and these same radiomic featuers were used in constructing the integrated aRCG models of PFS and platinum resistance. Note that the iRCG model only used cluDiss as the radiomic measure. Four out of 13 gray level run length matrix (GLRLM) (30.8%), 5 out of 13 gray level size zone matrix (GLSZM) (38.5%), 2 out of 5 neighborhood gray tone difference matrix (NGTDM) (40%), 4 out of 14 neighborhood gray level difference matrix (NGLDM) (28.6%), and 9 out of 20 edge features (45%) did not show statistical differences while both first order and gray level correlation matrix (GLCM) measures showed significant differences between scanners. Gray level non-uniformity (ρ = 0.717) and dependence count non-uniformity (DCN) (ρ = 0.606) were highly correlated with TTV. The cluDiss feature, which is designed to increase with the number of lesions, was highly correlated with the number of sites (ρ = 0.833) (Table S2).

Correlation of Cludiss to Biological Processes
We studied the differences in the molecular signaling pathways between the low-risk and high-risk patient groups using the 50 hallmark gene sets [26], extracted using single sample gene-set enrichment (ssgsea) analysis of the RNA samples [25]. The signaling pathways were categorized into immune, oncogenic, stromal, cellular, and other [24]. Patients were dichotimized using the median value (cluDiss = 68.6) of cluDiss (low-risk<median and vice versa). Principal component analysis (PCA) of the 50 gene sets showed that the MYC, MTORC1 pathways were relevant in the high-risk group but not in the low risk group (Figure 3) in the first two PC dimensions (>60% variation explained). The 50 gene set expression variations for both groups ( Figure S3) showed many gene-sets contributing to patient variability for the high-risk (in the first two dimensions) compared to the low-risk group.

Correlation of Cludiss to Biological Processes
We studied the differences in the molecular signaling pathways between the low-risk and highrisk patient groups using the 50 hallmark gene sets [26], extracted using single sample gene-set enrichment (ssgsea) analysis of the RNA samples [25]. The signaling pathways were categorized into immune, oncogenic, stromal, cellular, and other [24]. Patients were dichotimized using the median value (cluDiss = 68.6) of cluDiss (low-risk<median and vice versa). Principal component analysis (PCA) of the 50 gene sets showed that the MYC, MTORC1 pathways were relevant in the high-risk group but not in the low risk group (Figure 3) in the first two PC dimensions (>60% variation explained). The 50 gene set expression variations for both groups ( Figure S3) showed many gene-sets contributing to patient variability for the high-risk (in the first two dimensions) compared to the lowrisk group. The Spearman rank correlation of the features cluDiss, DCN, and GLN that were relevant for platinum resistance classification were measured against stromal and immune scores computed using the ESTIMATE method [27], the 50 hallmark gene sets [26], and the 18 consense TME cell types [23,24]. TTV and sites were also evaluated for correlation for completeness. Because of the reported widespread TME heterogeneity in HGSOC patients both within and between tumor sites [7,24], we computed two variations of the cluDiss measure using only the tumor sites in the pelvis and only the tumors in the abdominal sites to assess their correlation with the biological processes.
CluDiss computed from across all visible tumor sites was negatively correlated with the Wnt signaling pathway ( = −0.35, p = 0.02). CluDiss measure from the pelvic sites showed no significant correlation to any of the gene set pathways. CluDiss measure computed from the abdominal metastases was negatively correlated with Wnt and NOTCH signaling ( Figure 4, Table S3), positively The Spearman rank correlation of the features cluDiss, DCN, and GLN that were relevant for platinum resistance classification were measured against stromal and immune scores computed using the ESTIMATE method [27], the 50 hallmark gene sets [26], and the 18 consense TME cell types [23,24]. TTV and sites were also evaluated for correlation for completeness. Because of the reported widespread TME heterogeneity in HGSOC patients both within and between tumor sites [7,24], we computed two variations of the cluDiss measure using only the tumor sites in the pelvis and only the tumors in the abdominal sites to assess their correlation with the biological processes.
CluDiss computed from across all visible tumor sites was negatively correlated with the Wnt signaling pathway (ρ = −0.35, p = 0.02). CluDiss measure from the pelvic sites showed no significant  In short, the gene set expressions were different between the low and high-risk patient groups dichotomized using the cluDiss measure. Furthermore, cluDiss, which quantifies the textural heterogeneity between multiple tumor sites, was positively correlated with the immune cell type and negatively with Wnt signaling pathway, while the average texture heterogeneity measures known as DCN and GLN showed a negative correlation with immune gene sets and immune cell types. Prior work by our group [24] has shown that enrichment of Wnt and Myc is negatively correlated with immune infiltration.

Discussion
Non-invasive stratification of patients with HGSOC by risk of outcome could facilitate a higher level of intervention for those with the highest risk of poor outcome. Possible therapeutic/diagnostic interventions could include enrollment in clinical trials, addition of bevacizumab to first line chemotherapy, and more frequent follow-up imaging to evaluate progression. Building on prior reports [12] that highlight the relevance of radiomic measures for predicting HGSOC outcomes, we validated a novel IISH measure, cluDiss, that we previously showed to be associated with HGSOC outcomes in a different cohort of patients [22]. Unlike most radiomic studies [10,12,16] that compute an average measure of single tumor heterogeneity, cluDiss quantifies the heterogeneity within and between the entire tumor burden rather than just the ovarian mass. Its magnitude increases with the number of sites and the textural differences between the sub-regions within and between the tumor sites, reflecting larger imaging heterogeneity. It adds to the conventional number of sites measured by quantifying the radiomic heterogeneity in those lesions. In short, the gene set expressions were different between the low and high-risk patient groups dichotomized using the cluDiss measure. Furthermore, cluDiss, which quantifies the textural heterogeneity between multiple tumor sites, was positively correlated with the immune cell type and negatively with Wnt signaling pathway, while the average texture heterogeneity measures known as DCN and GLN showed a negative correlation with immune gene sets and immune cell types. Prior work by our group [24] has shown that enrichment of Wnt and Myc is negatively correlated with immune infiltration.

Discussion
Non-invasive stratification of patients with HGSOC by risk of outcome could facilitate a higher level of intervention for those with the highest risk of poor outcome. Possible therapeutic/diagnostic interventions could include enrollment in clinical trials, addition of bevacizumab to first line chemotherapy, and more frequent follow-up imaging to evaluate progression. Building on prior reports [12] that highlight the relevance of radiomic measures for predicting HGSOC outcomes, we validated a novel IISH measure, cluDiss, that we previously showed to be associated with HGSOC outcomes in a different cohort of patients [22]. Unlike most radiomic studies [10,12,16] that compute an average measure of single tumor heterogeneity, cluDiss quantifies the heterogeneity within and between the entire tumor burden rather than just the ovarian mass. Its magnitude increases with the number of sites and the textural differences between the sub-regions within and between the tumor sites, reflecting larger imaging heterogeneity. It adds to the conventional number of sites measured by quantifying the radiomic heterogeneity in those lesions.
The cluDiss measure did not show significant differences to CT scanners, as did 24 of the 75 average heterogeneity measures. However, both first-order histogram and all GLCM measures showed a significant difference between scanners.
Quantifying inter-site and intra-site imaging heterogeneity is important because HGSOC exhibits widespread genomic intra-site and inter-site heterogeneity. Multi-site genomic studies have shown intratumor genomic heterogeneity to correlate to poor survival [7]. In addition to clonal heterogeneity, the malignant cell spread within the peritoneal cavity is distinct and non-random [8,9], as some sites harbor genetically diverse clones [8]. These site-specific properties, including immunologic components of the TME, may modulate malignant cell invasion and expansion, thereby shaping evolutionary selection [28]. However, large scale multi-site genomic heterogeneity studies are difficult to do and are impractical for clinical practice. This motivated the development and validation of a non-invasive CT-based measure of intra-site and inter-site radiomic heterogeneity.
More importantly, our results show that the integrated model combining cluDiss, clinical, and genomic variables was more accurate than the conventional imaging (total tumor volume and number of sites) and average tumor heterogeneity radiomics models for predicting PFS. This finding is in line with other works that have shown integrated radio-genomic models to better predict outcomes in other solid cancers [29,30]. Although cluDiss was as good as the iRCG PFS measure for predicting PFS, the platinum resistance classification benefitted from the clinical measures, indicated by their higher importance over cluDiss for that model. On the other hand, CNB, while relevant for the iRCG PFS model, was not relevant for classifying platinum resistance. Both cluDiss and the clinical measures can be obtained in a non-invasive way and their combination could, thus, serve to non-invasively stratify patient outcomes without needing genomic measures. On the other hand, genomic measures could be used for obtaining a mechanistic drivers of risk in those patients determined to be high risk using the non-invasive measures, such as by finding activated or suppressed pathways.
Understanding the radio-genomic correlations are important for furthering their application as biomarkers of treatment response [11]. Multiple studies have reported the association of image-based qualitative [31] and quantitative radiomics features with genomic measures [32][33][34][35][36]. We measured the correlation of cluDiss and two radiomics measures known as DCN and GLN with the Hallmark gene sets and TME cell types extracted from ssgsea analysis of the RNA expressions. Gene sets are candidates for genes that may either drive genomic heterogeneity or are required for survival in the context of ongoing chromosomal instability. Patients dichotomized into low-risk and high-risk groups using cluDiss showed a difference in the gene set pathways. Additionally, cluDiss was negatively correlated with the Wnt signaling pathway and positively to 13 immune TME cell types including Tregs, Tgd, Bcells, CD4, CD8, and NK. Prior work by our group [24] showed a mutual exclusivity in the expression of Wnt and Myc gene pathways with respect to the immune cell types in untreated HGSOC patients. CluDiss has also been shown to correlate with CKB protein in a previous study using matched imaging and proteomic samples from 20 HGSOC patients by our group [37]. To our knowledge, this is the first report on the radio-genomic correlation of intra-site and inter-site radiomic heterogeneity in HGSOC.
Our study is limited by a small dataset with high class imbalance (e.g., higher prevalence of platinum sensitivity than resistance), which was partly mitigated through a synthetic minority oversampling technique [17] by using a linear SVM classifier. Additionally, the genomic samples were available from only a single primary tumor site. Hence, a study of variability in gene sets and CNB between tumor sites, or their impact on classifying outcomes was not possible. Also, a study of repeatability of the radiomic measures needs to be assessed with test-retest studies and under different CT acquisition protocols. Studying the potential for clinical translation would require larger multi-institutional cohorts.

Ethics and Consent
The Institutional Review Board approved this retrospective Health Insurance Portability and Accountability Act-compliant study and waived the requirement for written informed consent. The TCIA is a managed, publicly available, open-source repository of de-identified medical images of cancer and corresponding patient information that is sourced from 28 participating institutions [38].

Study Design and Patients
Two cohorts of patients with HGSOC: a single institution dataset from MSKCC (N = 45) and a multi-institution dataset (N = 38) from the ovarian-TCIA [31], which included patients treated at five different institutions ( Figure S1) were identified from which 75 patients were selected. The eligibility criteria included: (i) federation of international gynecologic oncology (FIGO) stage III-IV HGSOC, (ii) attempted primary cytoreductive surgery, (iii) standard of care contrast-enhanced CT of the abdomen and pelvis performed prior to surgery, (iv) at least two tumor sites identified on CT for computing cluDiss, and (v) molecular analysis performed within The Cancer Genome Atlas (TCGA) Research Network ovarian cancer pilot project. Patients who received neoadjuvant chemotherapy prior to surgery, five patients from MSKCC, one from TCIA who did not complete molecular analysis, and two from TCIA who did not have data regarding surgical resection were excluded. Sixty-seven patient scans (89%) were acquired with voltage 120 kVp (median 120, IQR 110 to 120), tube current (median 300 mA, IQR of 89 mA to 393 mA), and reconstructed with a standard convolutional kernel with the most common slice thickness of 5 mm for 54 (72%) patients (median 5 mm, IQR 2 mm to 5 mm).
All patients used in this study were previously used for qualitative radiologist CT assessments based on association with Classification of Ovarian Cancer transcriptomic profiles and survival [31,39]. Thirty-eight patients from the MSKCC dataset were used with inter-site heterogeneity measures for predicting OS and surgical resection status [20]. The cluDiss measure, developed in our prior work on an entirely different cohort of patients for analyzing response to immunotherapy treatment in HGSOC [22], combines both intra-site and inter-site radiomic heterogeneity.
Our study aims ( Figure 5) were to: (a) evaluate association of cluDiss and iRCG with PFS, (b) compare iRCG classifier of platinum resistance against models combining clinical and genomic variables with conventional imaging (tumor volume, number of sites) (CCG), and average heterogeneity radiomic (N = 75) features (aRCG). We evaluated the robustness of cluDiss and average heterogeneity radiomic measures against CT scan manufacturers. Finally, we explored the biological basis of these measures on a subset of patients (N = 44) with matched CT imaging and molecular RNA-sequencing data by measuring correlation with well-defined Hallmark gene set signaling pathways, an immune and stromal tumor micro-environment (TME), and 18 TME cell types.
Clinical variables such as patient age, FIGO stage, and cytoreductive outcome (complete gross resection, optimal (≤1 cm residual disease), or suboptimal resection (>1 cm residual disease)) were obtained from patient clinical records for the internal MSKCC dataset and were available through cbioportal [40] for the TCIA dataset.
The copy number burden (CNB) was computed as the fraction of the altered genome and is available from the cbioportal [40] for outcome classification. The CNB is a measure of genome instability and is computed as the length of segments (in log2 scale) with copy number alterations >0.2 and divided by the length of the measured segments [40]. CNB was computed from the genomic sample taken from a single primary tumor site using common guidelines in the TCGA ovarian cancer study [41]. Specifically, biospecimens were collected from newly diagnosed ovarian cancer serous adenocarcinoma patients undergoing surgery. One tumor and matched normal tissue specimen were collected for each patient. variables with conventional imaging (tumor volume, number of sites) (CCG), and average heterogeneity radiomic (N = 75) features (aRCG). We evaluated the robustness of cluDiss and average heterogeneity radiomic measures against CT scan manufacturers. Finally, we explored the biological basis of these measures on a subset of patients (N = 44) with matched CT imaging and molecular RNA-sequencing data by measuring correlation with well-defined Hallmark gene set signaling pathways, an immune and stromal tumor micro-environment (TME), and 18 TME cell types.  Platinum resistance was defined as a platinum-free interval of less than 6 months after initial therapy [42]. PFS was calculated as the time from the date of primary surgery to the date of documented first recurrence on the basis of findings on a CT scan, physical examination results, or death prior to recurrence.

Computation of Intra-Site and Inter-Site Tumor Radiomic Heterogeneity
The IISH cluster dissimilarity (cluDiss) measure was computed as follows. i.
All suspected primary and metastatic tumors in the abdomen and pelvis (>1 cm) were manually delineated by two oncologic imaging research fellows (4 and 6 years of experience, respectively) using 3DSlicer [43], thereby resulting in multiple volumes of interest (VOI). Two conventional imaging measures, total tumor volume (TTV), estimated as the total number of voxels within each VOI multiplied by the voxel size, and the number of anatomic sites corresponding to the number of radiologist-defined sites of disease on preoperative CT scans were computed. ii.
CT images were rescaled to 0-255 and discretized into 32 bins. Then, Haralick textures, energy, entropy, homogeneity, and contrast were computed [20] by sliding a fixed sized patch (11 × 11 × 1) centered around every voxel within all VOIs using in-house software implemented in C++ using the Insight ToolKit (ITK) [44]. iii.
Sub-regions of homogeneous texture were extracted from within VOIs by grouping voxels with similar texture values using kernel K-means clustering [45], which exploits the spatial relatedness of voxels to produce a computationally fast clustering. The appropriate number of clusters for each patient was determined using Akaike information criterion from an empirically chosen maximum of five clusters. The mean values of the four individual Haralick texture measures described the sub-regions. iv.
Sub-region textural differences were quantified using Euclidean distance and summarized into a dissimilarity matrix.
v. The group dissimilarity matrix (GDM), which is a 2D histogram that captures the number of sub-region pairs with similar levels of dissimilarity, was computed. The rows of the GDM correspond to the number of sub-regions with a similar dissimilarity and the columns correspond to the dissimilarity level. Ten bins were used to discretize the dissimilarities and the sub-region pairs sizes following min-max normalization. vi.
The cluDiss measure, which quantifies the peakedness in the distribution of dissimilarities by considering the relatedness between groups of subregions by sharing similar levels of dissimilarity within the GDM is computed as: where K is the number of dissimilarity levels and M is group size levels, µD and µA are the normalized mean of dissimilarity levels and group sizes, and G is the group dissimilarity matrix. The indicies i and j emphasize larger dissimilarities and larger group sizes. Higher cluDiss values result from the presence of many texturally distinct sub-regions ( Figure S2b), while fewer large texturally distinct groups with distinct dissimilarity will result in lower cluDiss values ( Figure S2a).

Computation of Average Radiomic Heterogeneity Measures
Average heterogeneity radiomic texture measures (N = 75) (Table S1) quantifying the textural heterogeneity across all disease sites were computed using the computational environment for radiological research (CERR) [46]  , and a bandwidth of √ 2 (16 features). All of the radiomic features were compliant with the imaging biomarker standardization initiate (IBSI) [47] and default parameter settings available in CERR that were tested for IBSI compliance were used.

Single Sample Gene Set Enrichment Analysis
Single sample gene set enrichment analysis (ssgsea) [25] was performed on the RNA measurements of each sample using the GSVA package version 1.28.0 in R version 3.5.0 [25]. Default settings of parameter τ = 0.25 as originally used in Reference [48] was employed to place a modest weight on the expression of genes in a gene set pathway. This parameter corresponds to the weight associated with the ranking of absolute expression of genes in a signature of pathway in relation to the expression of all other genes. Normalized enrichment scores were then generated and combined with the gene ontology MSigDB [26] to estimate the pathway enrichment for the 50 Hallmark gene sets. The estimation of stromal and immune cells in malignant tumor tissues using expression data (ESTIMATE) method [27] was used to quantify the immune and stromal signatures from the bulk tumor RNA-sequence data. The results of ssgsea was used to estimate the relative expression of 18 different TME cell types by using the Consensus TME [23,24], which integrates seven different methods of gene sets or TME cell type estimation methods.  [49] using elastic net feature selection constraints was fit using cluDiss, conventional clinical, and genomic variables to classify patient survival in months using MSKCC as training dataset. Best tuning parameters ∝ and λ (or the L1-norm penalty) were determined from the training set (MSKCC). The relative feature importance obtained from this model was used to combine cluDiss, clinical, and genomic variables into a single continuous iRCG PFS score. An optimal cut point was determined from the iRCG score using receiver operating characteristic curve (ROC) analysis (optimalCutpoints in R) on the MSKCC set and applied on the external TCIA dataset to dichotomize patients into low-risk and high-risk groups. The same approach was repeated by combining conventional imaging and average heterogeneity radiomic measures with clinical and genomic variables to extract CCG PFS and aRCG PFS scores, respectively.

Platinum Resistance Classification
Sixty-one of the 75 patients had platinum resistance information (Table 1) with 14 patients resistant while the remaining 47 are sensitive to platinum resistance chemotherapy. Since the number of patients for training a machine learning classifier were relatively small, and due to the large class imbalance, this analysis was performed using cross-validation instead of using the TCIA dataset as a hold-out testing set, as done for determining PFS association in Section 4.6.1. This approach is valid for the purpose of this work where the goal was to assess the utility of the cluDis measure in comparison to conventional clinical and radiomics measures.
iRCG-SVM: A linear support vector machine classifier (SVM) [50] was constructed by combining cluDiss with three clinical (age, stage, cytoreductive outcome), and genomic (CNB) variables. A linear SVM was used since it treats the individual factors independently and avoids overfitting in small datasets. The classifier was trained with three-fold cross validation with 100 repetitions. Class imbalance was handled by the synthetic minority oversampling (SMOTE) technique as used in our prior work for classifying prostate cancer aggressiveness using radiomics measures [17].
Conventional-Clinical-Genomic SVM (CCG-SVM): A conventional-clinical-genomic linear SVM classifier of platinum resistance was trained using the clinical, conventional imaging, and CNB variables.
aRCG-SVM: A recursive feature elimination linear SVM classifier (RFE-SVM) was trained with repeated (100 repetitions) and nested (three-fold outer and three-fold inner) cross validation using average heterogeneity radiomic features found to be robust to scanner differences (Table 1), clinical, and CNB variables. Nested cross-validation was done to select the appropriate number of features (N = 5, 10, 15, 20, 25). The SMOTE method was used to handle class imbalance. RFE-SVM was used to perform implicit feature selection from the relatively large number of features used within the classifier.

Feature Robustness to CT Manufacturer
Statistical differences to CT scanners (GE vs. non-GE) in cluDiss and average heterogeneity radiomic features were evaluated. All MSKCC patients and 24 out of 38 TCIA patients were scanned on GE. The remaining 12 were scanned on Siemens and one each on Toshiba and Philips scanners. The goal of this experiment was to evaluate whether the variability in the features was due to the different CT scanner manufacturers. This is because large feature variabilities between scanners can obscure the signal to differentiate between the outcomes, thereby reducing the performance of radiomics measures [51].

Statistical Analysis
Machine learning classifiers were trained with repeated three-fold cross-validation and nested cross-validation (where applicable) to reduce bias in classification. Accuracy was computed from samples not used in training in each fold. Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity with 95th percentile confidence intervals were computed. DeLong's method [52] was used to measure the differences in classifiers' AUC. Patient characteristics and texture measures were summarized using median and interquartile range (IQR). Data with missing variables or outcomes were excluded from the analysis. Two-sided Wilcoxon rank-sum test was used to test radiomics differences to scanners. Only p > 0.05 were considered significant.
Cox proportional hazard regression analysis was performed to test association with PFS using the iRCG, CCG, and aRCG scores. Hazard ratios (HR) and 95% confidence intervals were estimated. Dichotomized groups generated according to the cut points were used to compute Kaplan-Meier curves on the TCIA dataset. p Values < 0.05 were considered statistically significant. Concordance probability estimates (CPE) were computed for the individual predictors for determining the strength of association with survival measures.
Association between radiomic and gene set pathways were computed using Spearman rank correlation coefficients and principal component analysis of the Hallmark gene sets was performed using factoMineR software in R after scaling of the gene expressions.
All statistical analyses were performed in the software packages R, version 3.4 (The R Foundation for statistical computing). The code for textures and IISTH computation is available through open source software CERR [46].

Conclusions
We validated a previously developed intra-site and inter-site tumor heterogeneity measure (cluDiss) for predicting HGSOC outcomes. We showed that cluDiss, combined with known clinical and genomic variables, outperformed conventional clinical-genomic and standard radiomic-clinical-genomic models in predicting HGSOC outcomes. This measure was negatively correlated to Wnt signaling and positively to immune TME cell types. Validation on larger, multi-institutional cohorts is necessary to verify the potential for patient stratification.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/11/3403/s1, Figure S1: REMARK diagram showing patient selection, Figure S2: Differences between patients with low and high intra-site and inter-site tumor heterogeneity captured using the dissimilarity matrix and group dissimilarity matrix histogram for computing the IITH measures, Figure S3: Principal component analysis (PCA) factor loadings of the 50 Hallmark gene sets along the first three dimensions (70% variation explained) in the low-risk (cluDiss < median [68.6]) and high-risk (cluDiss ≥ median) groups, Table S1: Robustness of features to scanner differences. Wilcoxon rank-sum tests performed between GE vs. non-GE scanners. Robust features are indicated in bold font, Table S2: Spearman correlation of identified robust radiomics features with TTV and number of sites, Table S3  The funders had no role in the design of the study, the data collection, analyses or interpretation of data, in the writing of the manuscript, or in the decision to publish the results that must be declared in this section.

Conflicts of Interest:
Snyder reported prior research support from Bristol-Myers Squibb (last received June 2017). Levine reported receiving consulting fees/honoraria from Merck and Tesaro and research support from Splash Pharmaceuticals. Crispin-Ortuzar receives research support from Eli Lilly. Grisham reported consulting fees from Clovis, Regeneron, and Mateon. Brenton reported consulting fees/honoraria from GSK and Astra-Zeneca, research support from Aprea Therapeutics AB and is a cofounder and share holder of Inivata and Tailor Bio. Deasy is a co-founder and shareholder in PAIGE. AI. Sala reported consulting fees from Amazon and is a co-founder and shareholder of Lucia Medical. The funders had no role in the design of the study, the data collection, analyses or interpretation of data, in the writing of the manuscript, or in the decision to publish the results that must be declared in this section.