A Novel Predictive Multi-Marker Test for the Pre-Surgical Identification of Ovarian Cancer

Simple Summary
Ovarian cancer remains one of the most lethal malignancies for women, with a complex presentation and, typically, a late-stage diagnosis. Many common benign gynecological diseases can present with similar symptoms to malignancy, and exploratory surgery is required before a conclusive diagnosis can be made. We have developed a new biomarker panel to assist in pre-surgical diagnosis and improve the clinical decision-making process. In a retrospectively collected cohort of 334 women, a multi-biomarker panel measured in plasma correctly distinguished malignant from benign samples with 95% sensitivity/specificity and out-performed current clinical methods. This new panel may provide a useful clinical adjunct to improve clinical workflows for patients with suspected ovarian malignancy.

Abstract
Ovarian cancer remains the most lethal of gynecological malignancies, with the 5-year survival below 50%. Currently there is no simple and effective pre-surgical diagnosis or triage for patients with malignancy, particularly those with early-stage or low-volume tumors. Recently we discovered that CXCL10 can be processed to an inactive form in ovarian cancers and that its measurement has diagnostic significance. In this study we evaluated the addition of processed CXCL10 to a biomarker panel for the discrimination of benign from malignant disease. Multiple biomarkers were measured in retrospectively collected plasma samples (n = 334) from patients diagnosed with benign or malignant disease, and a classifier model was developed using CA125, HE4, IL-6 and CXCL10 (active and total). The model provided 95% sensitivity/95% specificity for discrimination of benign from malignant disease. Positive predictive performance exceeded that of "gold standard" scoring systems including CA125, RMI and ROMA% and was independent of menopausal status. In addition, 80% of stage I-II cancers in the cohort were correctly identified using the multi-marker scoring system. Our data suggest the multi-marker panel and associated scoring algorithm provide a useful measurement to assist in pre-surgical diagnosis and triage of patients with suspected ovarian cancer.


Introduction
Ovarian cancer remains one of the most lethal gynecological malignancies globally, with 314,000 new cases and 207,000 deaths in 2020 [1]. Appropriate surgical referral and initial management strategy are key indicators of outcome for ovarian cancer patients; in particular, the 5-year survival for patients with advanced-stage disease is improved when cytoreductive surgery is performed by a gynecological oncologist [2]. A key obstacle in developing appropriate triage protocols, however, is the lack of diagnostic certainty. Ovarian cancers are typically asymptomatic or present as potentially benign conditions. Definitive diagnosis usually occurs post-surgically and often following extensive tissue removal. As a result, fewer than half of cancer patients are appropriately referred to a gynecological oncology specialist for primary surgery [3,4].
Whilst there are no universally adopted guidelines, clinical work-up for a suspected ovarian malignancy to direct referral for surgery typically involves physical examination, transvaginal ultrasound (TVU) and measurement of serum biomarkers including cancer antigen 125 (CA125) and human epididymis protein 4 (HE4) [5,6]. These measurements are commonly used in the calculation of Risk of Malignancy Index (RMI) or Risk of Malignancy Algorithm (ROMA) scores, used to indicate likelihood of malignancy when an ovarian mass is present [7,8]. However, these modalities lack sufficient sensitivity and specificity for consistently reliable identification of malignancy (particularly for early-stage, low-volume disease) against the background of other benign gynecological conditions [5]. More recently, biomarker-based tests (e.g., OVA1™) have received FDA approval for pre-surgical triage; however, these have not been widely adopted to date.
Recently we reported that the measurement of inflammatory cytokines including Interleukin-6 (IL-6) and C-X-C Motif Chemokine 10 (CXCL10) in blood serum or plasma was able to discriminate between benign and malignant serous epithelial ovarian cancers [9,10]. In the case of CXCL10, this was achieved by evaluating the ratio of active:total CXCL10 in biological samples, termed the "active ratio" [10]. This "active ratio test" improved the identification of malignancy in a small retrospective patient cohort, particularly when combined with the measurement of CA125. Importantly, CXCL10 measurement was largely independent of stage, suggesting that it could provide a useful addition to standard testing workflows to improve triage for surgical staging of patients with early, low-volume cancers [10].
In this study we report the use of a multi-marker panel, including the measurement of IL-6 and the CXCL10 active ratio, for identification and differentiation of benign from malignant tumors. Our data suggest this biomarker panel provides improved differentiation of benign from malignant disease compared to CA125, ROMA or RMI, and can provide a useful measurement for the pre-surgical triage of patients diagnosed with an adnexal mass.

Reagents
Antibodies against intact and total CXCL10, full-length CXCL10 protein standard and all reagents were as previously described [10]. Luminex magnetic bead assay kits (IL-6, HE4) were from Thermo Fisher (cat# RDSLXSAHM05). All other reagents were of analytical grade.

Clinical Samples
Assays were performed on retrospectively collected EDTA-chelated plasma samples accessed from the OCRF-sponsored Ovarian Cancer Tissue Bank, housed at the Hudson Institute of Medical Research, Australia. Ethical approval was obtained from the Southern Health Human Research Ethics Committee (HREC #06032C, #02031B), with all participants providing prior informed written consent. All samples were collected from anaesthetized, chemo-naïve patients who underwent surgery for suspected gynecological malignancies. Patients were excluded if they were <18 years at the time of surgery; had a recent (<2 years) history of breast, ovarian, uterine or other gynecological cancer; had undergone chemo-, radio- or immune-therapy within the preceding 12 months; or were immunocompromised at the time of diagnosis. Histological assessment of tumor type, stage and grade, pre-surgical clinical markers (CA125, CEA, CA15.3 and CA19.9), pre-surgical pelvic imaging, age, self-reported menopausal status, pre-existing medical conditions and any prior history of malignancy were obtained from de-identified patient medical records. The imaging data were reviewed and scored according to the RMI2 schedule [11] by a gynecological oncology specialist. Details of the cohort are provided in Supplementary Table S1.
The Risk of Malignancy Algorithm (ROMA) was calculated according to the formula given in [12]. The Risk of Malignancy Index (RMI) was calculated according to [13] using the following formula: RMI = U × M × CA125, where U is the ultrasound score (1 or 4), M is the menopausal status (1 for pre-menopausal or 4 for post-menopausal) and CA125 is the serum CA125 concentration in units/mL. The RMI2 scoring system was used as recommended [11].
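The two comparison scores can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation. The RMI function takes the product of the ultrasound score, menopausal score and serum CA125 implied by the definitions above. The ROMA coefficients are the commonly published predictive-index values and are assumed here to match those used in [12]; HE4 is assumed to be in pmol/L. The function names are hypothetical.

```python
import math


def rmi(ultrasound_score: int, menopausal_score: int, ca125_u_per_ml: float) -> float:
    """RMI as the product U x M x CA125, with U and M each equal to 1 or 4."""
    if ultrasound_score not in (1, 4) or menopausal_score not in (1, 4):
        raise ValueError("U and M must each be 1 or 4")
    return ultrasound_score * menopausal_score * ca125_u_per_ml


def roma_percent(he4_pmol_l: float, ca125_u_per_ml: float, postmenopausal: bool) -> float:
    """ROMA (%) from a predictive index (PI).

    Assumption: coefficients are the commonly published values,
    not taken from this article.
    """
    ln_he4 = math.log(he4_pmol_l)
    ln_ca125 = math.log(ca125_u_per_ml)
    if postmenopausal:
        pi = -8.09 + 1.04 * ln_he4 + 0.732 * ln_ca125
    else:
        pi = -12.0 + 2.38 * ln_he4 + 0.0626 * ln_ca125
    # Logistic transform of the predictive index, expressed as a percentage.
    return 100.0 * math.exp(pi) / (1.0 + math.exp(pi))


# Illustrative inputs only (not patient data).
example_rmi = rmi(ultrasound_score=4, menopausal_score=4, ca125_u_per_ml=50.0)
example_roma = roma_percent(he4_pmol_l=70.0, ca125_u_per_ml=35.0, postmenopausal=False)
```

The resulting values would then be compared against the cutoffs quoted in the text (RMI ≥200; ROMA ≥13.1% pre-menopausal or ≥27.7% post-menopausal).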

ELISA
Biomarker measurements by ELISA were performed using EDTA-chelated plasma samples recovered from previously bio-banked specimens. Plasma samples were thawed on ice and clarified by centrifugation (16,000× g, 10 min at room temperature) prior to use. The CXCL10 active ratio ELISA was carried out as previously described [10]. Magnetic bead immunoassay for IL-6 and HE4 was carried out as previously described [9]. A total of 100 beads per analyte were counted and the median fluorescence intensity was determined. Quantitation was performed against a standard curve for each analyte using five-parameter logistic curve fitting.
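As an illustration of the quantitation step, the sketch below evaluates a five-parameter logistic (5PL) standard curve and inverts it to read a concentration back from a median fluorescence intensity. The parameterization shown is one common form of the 5PL model and is an assumption, as are all parameter values; in practice the five parameters would be fitted to the standards for each analyte.

```python
def five_pl(conc, a, b, c, d, g):
    """5PL response: a = response at zero dose, d = response at saturation,
    c = mid-range concentration, b = slope, g = asymmetry."""
    return d + (a - d) / (1.0 + (conc / c) ** b) ** g


def inverse_five_pl(response, a, b, c, d, g):
    """Concentration giving `response`; valid only strictly between a and d."""
    if not (min(a, d) < response < max(a, d)):
        raise ValueError("response lies outside the range of the standard curve")
    ratio = (a - d) / (response - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters for one analyte (illustrative values only).
params = dict(a=50.0, b=1.2, c=500.0, d=20000.0, g=0.8)

mfi = five_pl(120.0, **params)               # signal expected at 120 concentration units
recovered = inverse_five_pl(mfi, **params)   # back-calculated concentration
```

Readings outside the range spanned by the curve raise an error here rather than being extrapolated, which is a design choice of this sketch and not a statement about how the assay software handles them.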
Exploration and estimation of diagnostic performance for the multi-marker panel (combining CXCL10 active ratio, IL-6, CA125 and HE4) was as follows. Missing values for HE4 and IL-6 in a single sample were assigned as their respective data median in each case. Biomarker measurements were transformed using the Yeo-Johnson method [16], and analytes contributing the greatest linear separation between groups were identified by linear discriminant analysis. A classification model was then defined by fitting a multivariate logistic regression model to the Yeo-Johnson transformed data. Model performance was estimated using repeated stratified K-fold cross-validation (4 folds × 5 repeats), with performance estimated using the mean and standard deviation of AUC across all submodels. The model was then refit to the entire dataset of n = 334 cases, and its performance was re-evaluated. A scoring cutoff point was chosen according to Youden's J index [17]. Final estimates for the full-dataset model were compared to the cross-validation estimates to assess potential overfit in the model.
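The resampling scheme described above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' code: the fold assignment, random seed and helper names are hypothetical, and the model fitting step that would sit inside the loop is omitted.

```python
import random
from statistics import mean, stdev


def repeated_stratified_kfold(labels, n_splits=4, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every fold preserves the class balance."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal the shuffled members of each class round-robin into the folds.
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test


def auc(scores, labels):
    """Rank-based AUC: probability that a malignant sample (label 1)
    outscores a benign sample (label 0), with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(aucs):
    """Mean and sample standard deviation of the per-submodel AUC values."""
    return mean(aucs), stdev(aucs)


# Class sizes as reported for the cohort: 164 malignant, 170 benign.
labels = [1] * 164 + [0] * 170
splits = list(repeated_stratified_kfold(labels))  # 4 folds x 5 repeats
```

Each of the 20 train/test pairs would be used to fit one sub-model on the training indices and compute its AUC on the held-out indices, after which `summarise` gives the mean and standard deviation used as the cross-validation estimate.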

Patient Characteristics
The characteristics of the retrospective cohort used for testing are provided in Table 1 and Supplementary Table S1. A total of 334 patient samples met the requirements of this study and were included for analysis. All patients were recruited following referral to a gynecological oncologist for exploratory surgery. In total there were 164 ovarian malignancies (49%) and 170 benign (51%) cases included, with 34% and 66% from pre- and post-menopausal women, respectively. Malignancy was more common in post-menopausal women (~60% of samples) compared to pre-menopausal women (~28% of samples), and diagnosed malignancies were almost exclusively high-grade (grade 2-3) ovarian cancers of serous epithelial pathology. Amongst the samples included were 17 (~10%) stage I ovarian cancers, of which 14 (82%) were grade 2-3. Approximately 37% of patients (123/334) had known genetic abnormalities at the time of diagnosis, with relatively even distributions between pre-menopausal (~54%) and post-menopausal (~47%) cohorts. Approximately 50% of the cohort had unknown mutational status.

Individual Marker Performance
Median values for each biomarker according to disease (benign or malignant) are provided in Table 2. Unsurprisingly, significant differences were observed between benign and malignant samples for patient age at diagnosis and all individual markers measured (Figure 1). Each of CA125, HE4 and IL-6 increased in a stage-specific manner and were highest in late-stage (stages 2-4) disease. As previously reported [10], the CXCL10 active ratio was independent of cancer stage (Figure 1A), suggesting it may assist in the differentiation of benign disease from early-stage malignancy. RMI2 score also appeared independent of stage; however, only a limited number of stage I samples (n = 7) could be included for analysis due to the absence of imaging data. Linear discriminant analysis (LDA) was used to identify which individual markers contributed the greatest individual separation between benign and malignant samples. Amongst all biomarkers evaluated, discriminant coefficients with the largest magnitude were, in descending order, HE4, CA125, CXCL10 active ratio and IL-6 (Figure 1B). Total CXCL10 also provided discrimination but was excluded due to collinearity with CXCL10 active ratio.

Development of a Combined Biomarker Model
Commencing with individual biomarker measurements, we evaluated different regression models and marker combinations as follows. The data were first transformed to approximate a standard normal distribution using a Yeo-Johnson transformation [16], and a transformation parameter λ was determined for each biomarker. The transform for any individual marker measurement x_i was defined by the following:

$$x_i^{(\lambda)} = \begin{cases} \dfrac{(x_i + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ x_i \geq 0 \\ \ln(x_i + 1), & \lambda = 0,\ x_i \geq 0 \\ -\dfrac{(-x_i + 1)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ x_i < 0 \\ -\ln(-x_i + 1), & \lambda = 2,\ x_i < 0 \end{cases}$$

The transformed biomarker values were standardized according to x^(S) = (x^(λ) − µ)/σ, where x^(S) is the standardized individual measurement in each case and µ and σ are the mean and standard deviation of the transformed values. Model selection for combined biomarker analyses was then estimated using multiple model types (including support vector classifier, decision tree, naive Bayes and logistic regression), with performance assessed by repeated stratified k-fold cross-validation (4 folds × 5 repeats). Combinations of up to five biomarkers were analyzed. Primary metrics used for comparison were mean AUC ± SD across the 20 sub-models within each cross-validation estimate. A final logistic regression model combining four biomarkers (HE4, CA125, IL-6 and CXCL10 active ratio) was chosen, as it provided the highest AUC-SD. The multivariate logistic regression model was defined as described below and then fit to the transformed and scaled dataset using maximum likelihood estimation. This results in the determination of a set of coefficients β.

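$$\ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_{\mathrm{HE4}} + \beta_2 x_{\mathrm{CA125}} + \beta_3 x_{\mathrm{IL6}} + \beta_4 x_{\mathrm{CXCL10}}$$

where p(x) indicates probability of malignancy and each x term is the transformed, standardized value of the corresponding biomarker.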
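The path from a raw measurement to a probability of malignancy can be sketched end to end. This is a minimal illustration under explicit assumptions: the Yeo-Johnson function follows the standard published definition, the standardization uses the population standard deviation, and the λ values and β coefficients are hypothetical placeholders because the fitted values are not given in the text. The 0 to 10 risk score is the S = 10p(x) scaling that the text reports for the final score.

```python
import math
from statistics import mean, pstdev


def yeo_johnson(x, lam):
    """Standard Yeo-Johnson power transform of a single value."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def standardize(values):
    """Centre and scale: (x - mean) / SD (population SD is an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical placeholder coefficients; signs and magnitudes are illustrative only.
BETA = {"intercept": 0.0, "HE4": 1.0, "CA125": 1.0, "IL6": 0.5, "CXCL10_ratio": 0.5}


def probability_of_malignancy(features, beta=BETA):
    """Logistic model: p(x) = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    linear = beta["intercept"] + sum(beta[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


def risk_score(features, beta=BETA):
    """Risk score S = 10 * p(x), giving a value between 0 and 10."""
    return 10.0 * probability_of_malignancy(features, beta)


# Toy CA125 values, transformed with a hypothetical lambda and then standardized.
ca125_raw = [12.0, 35.0, 80.0, 400.0, 1500.0]
ca125_std = standardize([yeo_johnson(v, lam=0.0) for v in ca125_raw])
```

In a real analysis each λ would be estimated from the data for each biomarker, and the coefficients would come from the maximum likelihood fit.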
Cancers 2023, 15, 5267 7 of 13
Prediction of malignancy for any given observation was then obtained by applying the cutoff to the risk score S, calculated according to S = 10p(x). Using Youden's J index [17], an optimal risk score cutoff point of 3.684 was determined. Potential overfit was assessed by comparison between cross-validation and full model performance estimates. Good agreement was observed for all metrics (within <1% variation in every case; Table 3), indicating an acceptably low level of overfit and suggesting that performance estimates from the final model were reliable.
Model performance for discrimination between benign and malignant samples was then assessed, with comparisons of the multi-marker panel score made against standard cutoff values for CA125 (≥35 U/mL [18]), RMI (≥200 [7]) and ROMA (pre-menopausal ≥13.1%; post-menopausal ≥27.7% [13]). Metrics for comparison included the area under the curve (AUC), sensitivity/specificity and negative/positive predictive values (Table 4 and Figure 2). Receiver operator characteristic (ROC) curves were generated for each of the multi-marker panel, CA125, RMI and ROMA tests (Figure 2A). The multi-marker panel achieved a clear increase in overall efficacy, providing improved sensitivity/specificity characteristics for differentiation of benign from malignant samples compared to CA125, RMI2 or ROMA (Figure 2A).
The AUC, sensitivity/specificity, PPV/NPV and overall accuracy for each test were determined for the combined cohort, as well as separately for pre- and post-menopausal samples (Table 4). Overall AUC in the combined cohort was above 0.95 in every case, with the highest AUC (0.98) achieved using the multi-marker panel (Table 4). Sensitivities were similar in each case; however, the specificities of CA125 (0.82) and RMI2 (0.75) were reduced compared to the multi-marker panel and ROMA (Table 4). Corresponding PPV was also comparatively lower for each of CA125 (0.83) and RMI2 (0.66).
Overall, whilst NPV was similar across each marker score and group (combined, pre- or post-menopausal; between 0.93 and 0.98), the highest PPV was achieved by the multi-marker panel in each test (Table 4). The multi-marker panel was the only test to maintain specificity and PPV values above 90% in every case, demonstrating an overall performance that exceeded that of the other tests. Thus, within this cohort, the multi-marker panel provided improved capability to differentiate benign from malignant disease.
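The performance metrics compared in this section, and the selection of a cutoff by Youden's J index, can be sketched as below. This is an illustration on toy data rather than the study data; treating a score at or above the cutoff as a malignant call is an assumption, and the function names are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV and NPV when score >= cutoff is called malignant."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def youden_cutoff(scores, labels):
    """Observed score that maximises J = sensitivity + specificity - 1."""
    def j_index(cutoff):
        m = confusion_metrics(scores, labels, cutoff)
        return m["sensitivity"] + m["specificity"] - 1.0
    return max(sorted(set(scores)), key=j_index)


# Toy risk scores on a 0-10 scale (1 = malignant, 0 = benign); not study data.
toy_scores = [0.4, 1.1, 2.0, 3.9, 3.2, 6.5, 8.8, 9.7]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff = youden_cutoff(toy_scores, toy_labels)
metrics = confusion_metrics(toy_scores, toy_labels, cutoff)
```

Because PPV and NPV depend on the proportion of malignant cases in the sample, values computed this way describe the cohort at hand and do not transfer directly to populations with a different case mix.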

A Multi-Marker Panel Assists in the Identification of Early Stage Cancers
Differentiation of early-stage cancers (FIGO stages I and II) from non-malignant growths is particularly challenging, especially in the case of low-volume cancers where CA125 can be below the 35 U/mL threshold [19]. Amongst 17 stage I cancers in the dataset, the multi-marker panel correctly classified 13 (76%) as malignant. These included four stage I cancer samples (two pre- and two post-menopausal) with CA125 < 35 U/mL which were not identified by CA125 or ROMA index; whilst one additional sample (post-menopausal, CA125 = 9 U/mL) was correctly identified by the multi-marker panel. A further four stage II cancers were correctly classified by all scoring systems. RMI2 could not be compared as ultrasound information was not recovered for the majority of these samples.

Discussion
It is well established that patients diagnosed with ovarian cancer have significantly improved survival benefit when primary surgery is performed by a specialist gynecological oncology surgeon [20,21]. Appropriate pre-surgical triage is therefore highly desirable in a clinical setting, to ensure cancer patients at all stages derive maximal benefit from treatment [22]. Currently less than 50% of cancer patients receive primary surgical intervention provided by an appropriately trained specialist, due to the difficulties in diagnosis and differentiation from more common benign conditions [4,20,22–24]. More effective pre-surgical triage testing is thus required to ensure appropriate referrals occur as early in the clinical workflow as possible.
Our data demonstrate high accuracy using a multi-marker panel incorporating the CXCL10 "active ratio" as a method to discriminate benign from malignant disease. CXCL10 is produced early in cancer progression and can be modified through enzymatic cleavage to produce an inactivated protein [10,25]. We previously demonstrated that the measurement of active and total circulating CXCL10, and calculation of their relative ratio, provided a useful measurement for the differentiation of benign from malignant disease [10]. This study now extends those data in a new cohort of patients, diagnosed with either benign or malignant adnexal mass. By comparison to common clinically used methods including CA125, RMI and ROMA, our multi-marker panel achieved superior sensitivity and specificity for the classification of benign from malignant samples in this dataset. Whilst all tests examined achieved the 75% specificity/80% sensitivity suggested as a minimum requirement for clinical use [13], the multi-marker panel outperformed the other modalities (Figure 2 and Table 4). At a comparative specificity of 95%, the sensitivity of each of CA125 and RMI remained below 80%, whilst ROMA remained under 90% sensitivity at the same threshold. Only the multi-marker panel achieved a 95% sensitivity at this level, highlighting its superior ability to differentiate benign from malignant disease. Moreover, the multi-marker panel operated independently of ultrasound scoring, suggesting that a two-stage clinical workup, as is currently recommended under American College of Obstetricians and Gynecologists (ACOG) guidelines [26], should provide a practical improvement in the pre-surgical classification of adnexal masses. Our regression model also did not appear overly affected by menopausal status, suggesting this model may provide broad applicability for pre-surgical discrimination of benign from malignant disease.
At present there is no clinically routine pre-surgical method for reliable evaluation and differentiation of benign vs. malignant adnexal mass. Cytoreductive surgery is the cornerstone of cancer management, with complete resection desirable for optimal outcomes [6]. Complete hysterectomy is the norm, with bilateral oophorectomy performed in up to 80% of cases [27]. However, removal of the ovaries predisposes women to multiple co-morbidities including increased risk of cardiovascular disease, dementia and certain cancers amongst others [27–29]; current recommendations, therefore, suggest a cautious approach to ovarian removal [30]. In the case of benign disease, which outnumbers malignant diagnoses by ~9:1 [30,31], there is a clear need to differentiate pre-surgically to enhance patient outcomes, particularly in the case of pre-menopausal patients, where fertility preservation may be an important consideration and requires specialist input [32–35].
In the absence of accepted international guidelines, the most commonly used approaches for clinical work-up of patients with adnexal mass are the Risk of Malignancy Index (RMI) or Risk of Malignancy Algorithm (ROMA) scores [7,8]. Whilst RMI can achieve specificity up to ~95% (i.e., the ability to correctly differentiate between benign and malignant disease), its sensitivity is generally lower at 71-80% (i.e., the ability to correctly identify the presence of disease) [8]. For the detection of stage I-II ovarian cancers, sensitivity is further reduced to ~54% [19]. RMI is also dependent on the quality of ultrasound imaging. ROMA typically exhibits higher sensitivity but lower specificity than RMI [36]; in addition, ROMA cutoffs can differ between suppliers (e.g., pre/post-menopausal values of 11.4%/29.9% for Roche Diagnostics; 7.4%/25.3% for Abbott Diagnostics) [37]. An alternative, biomarker-based approach is an attractive option for improved identification of malignancy, particularly in the case of early-stage (FIGO stage I) and metastatic low-volume (FIGO stages II, IIIA1(i) and IIIA2) cancers [38], which can be challenging to correctly identify prior to surgery and often require additional and expensive radiological work-up prior to diagnostic laparotomy. Our multi-marker test correctly identified over 75% of FIGO stage I tumors within the dataset, compared to ~59% using ROMA, and all but four stage II-IV cancers. Accordingly, this multi-marker panel may enhance the rapid and effective triage of patients with early-stage and/or low-volume tumors to ultimately minimize overall health costs, reduce procedures and time associated with clinical work-up, and maximize treatment outcomes for these patients.
Several biomarker-based tests have been introduced for pre-surgical triage, and currently ROMA™ (Fujirebio, Tokyo, Japan), OVA1™ and OVERA™ (Aspira Women's Health, Austin, TX, USA) are FDA-approved as aids to assess whether a pre- or post-menopausal woman who presents with a suspicious adnexal mass has a low or high likelihood of malignancy being found at surgery (reviewed in [17]). Evaluated independently of CA125, OVA1 achieved a sensitivity of 92% but with a specificity of 35%; in combination with physician's assessment, a modest increase in sensitivity to 96% was observed [39]. Overall, OVA1 was not able to improve on CA125 alone [40]. The OVERA test, an iterative advancement on the original OVA1 test, exhibited improved sensitivity (91-94%)/specificity (69-74%) for the differentiation of benign from malignant adnexal masses [41]. Whilst it is not possible to directly compare our data with each of these biomarker-based tests, the high 95% sensitivity/95% specificity characteristics of our multi-marker panel suggest that it will have utility in the triage of adnexal masses. These findings now require validation using an independent cohort.
The overwhelmingly high mortality of ovarian cancers is in part due to the aggressive nature of the disease coupled with the lack of screening strategies [39,42]. In particular, the 5-year survival for patients diagnosed with early (FIGO stage I) disease is >90%; recurrence rates for these patients are below 20% [43]. Screening is therefore widely believed to be key in reducing mortality from ovarian cancer. Due to the low prevalence of ovarian cancer in the community, a screening test requires a minimum of 99.6% specificity at a minimum sensitivity of 75% [39,42]; currently no testing modalities meet this threshold. Our multi-marker panel achieved a sensitivity of 98.8% at a specificity of 75% in this cohort and correctly identified over 75% of all stage I cancers present, suggesting the potential of these markers to contribute to early-stage detection of ovarian cancers. Further development may present an opportunity to apply new biomarkers such as the CXCL10 active ratio in a future screening context for ovarian cancer.
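The reason a screening test needs such high specificity can be illustrated with Bayes' rule: positive predictive value falls sharply as prevalence falls. In the sketch below, the screening prevalence of 1 in 2500 is an illustrative assumption and is not a figure from this study, whereas the cohort prevalence (164 malignant of 334) and the 95%/95% operating point are taken from the text.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV by Bayes' rule: true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed community prevalence for illustration only.
screening_prevalence = 1.0 / 2500.0
# Proportion of malignant cases in the surgical cohort described in this study.
cohort_prevalence = 164.0 / 334.0

ppv_screening_threshold = positive_predictive_value(0.75, 0.996, screening_prevalence)
ppv_triage_in_cohort = positive_predictive_value(0.95, 0.95, cohort_prevalence)
ppv_triage_if_screening = positive_predictive_value(0.95, 0.95, screening_prevalence)
```

Comparing the last two values shows why performance in a referred surgical cohort, where roughly half of the cases are malignant, cannot be read directly as screening performance in the general population.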

Conclusions
Determination of the circulating CXCL10 active ratio contributes positively to the differentiation of benign from malignant disease and, when incorporated into a four-biomarker panel, out-performs existing modalities. The assembled biomarker panel and associated scoring algorithm provide a useful measurement to assist in pre-surgical diagnosis and triage of patients with suspected ovarian cancer.

Patents
Aspects of this study are covered by granted patent 2020404453 and provisional patent 540674PRV.

Figure 1. Median marker values and linear discriminants for individual parameters used in scoring. (A) Individual analytes and/or calculated scores (ROMA, RMI2) within the cohort, according to disease type and stage. Sample numbers are provided in Tables 1 and 2. * p ≤ 0.01; ** p ≤ 0.01; *** p ≤ 0.001; and **** p ≤ 0.0001. (B) Linear discriminant coefficients between groups for each parameter evaluated.


Figure 2. Performance of individual scoring systems for discrimination between benign and malignant disease. (A) ROC curves were constructed to assess each scoring system (multi-marker panel, CA125, RMI2 and ROMA). Cutoff values for each marker were as follows: multi-marker panel 3.68; CA125 > 35 U/mL, RMI > 200 and ROMA pre-menopausal > 13.1% or post-menopausal > 27.7%. (B) Violin plots demonstrating comparative scoring across all samples for each scoring system. Sample numbers are as indicated in Tables 1 and 2.


Table 1. Cohort characteristics of patient samples included in this study.

Table 2. Individual marker concentrations and calculations.

Table 3. Performance and cross-validation estimates for the multi-marker model.

Table 4. Overall performance metrics for classification of benign vs. malignant disease.
