Diagnostic Added-Value of Serum CA-125 on the IOTA Simple Rules and Derivation of Practical Combined Prediction Models (IOTA SR X CA-125)

Background: This study aimed to evaluate the diagnostic added-value of serum CA-125 to the International Ovarian Tumor Analysis (IOTA) Simple Rules in order to facilitate differentiation between malignant and benign ovarian tumors before surgery. Methods: A secondary analysis of a cross-sectional cohort of women scheduled for surgery in Maharaj Nakorn Chiang Mai Hospital between April 2010 and March 2018 was carried out. Demographic and clinical data were prospectively collected. Histopathologic diagnosis was used as the reference standard. Logistic regression was used for development of the model. Evaluation of the diagnostic added-value was based on the increment of the area under the receiver operating characteristic curve (AuROC). Results: One hundred and forty-five women (30.3%) out of a total of 479 with adnexal masses had malignant ovarian tumors. The model that included information from both the IOTA Simple Rules and serum CA-125 was significantly superior to the model that used only information from the IOTA Simple Rules (AuROC 0.95 vs. 0.89, p < 0.001 for pre-menopause and AuROC 0.98 vs. 0.83, p < 0.001 for post-menopause). Conclusions: The IOTA SR X CA-125 model showed high discriminative ability and is potentially useful as a decision tool for guiding patient referrals to oncologic specialists.


Introduction
General gynecologists are required to provide an accurate differentiation between benign and malignant adnexal pathologies to ensure an optimal starting point in the whole chain of care [1], as this would lead to appropriate decisions regarding the referral of patients to specialized oncologic care. Women with malignant masses should be referred to gynecologic oncologists for proper surgical staging and optimal debulking surgery [2]. In contrast, women with benign masses can be managed conservatively or with a minimally invasive approach (e.g., laparoscopic surgery), which can be safely performed by general gynecologists. Misclassification in either direction could ultimately lead to a decrease in patient survival, or serious morbidity and unnecessary infertility from overly radical surgery.
Transvaginal ultrasonography is generally the first modality used by gynecologists to characterize these masses in practice. It is currently the only imaging modality recommended by the American College of Obstetricians and Gynecologists (ACOG) to evaluate adnexal masses in women [3]. To date, it is widely accepted that the most accurate approach for the preoperative diagnosis of adnexal masses is subjective assessment (SA) by an experienced sonographer [4]. It has been shown to be superior to other widely advocated methods such as the Risk of Malignancy Index (RMI), the International Ovarian Tumor Analysis (IOTA) Simple Rules, and IOTA logistic regression models [4,5]. However, major limitations remain: the method is inherently subjective, and expert examiners are lacking in most settings.
The IOTA models carry important advantages over subjective assessments in terms of objectivity, simplicity, and applicability [6]. They provide easy-to-use guidance to nonexpert sonographers for making an accurate presurgical diagnosis. Many multi-national external validation studies have confirmed the robustness of the accuracy of the IOTA Simple Rules and the IOTA logistic regression models [7,8]. The main disadvantage of the IOTA Simple Rules is the possibility of inconclusive results when they do not apply. According to previous reports, the proportion of inconclusive results could be as high as 20% [9,10]. A "two-step strategy" using the IOTA Simple Rules with the addition of subjective assessment for masses with inconclusive results was proposed and proved to have excellent test performance comparable to that of subjective assessment alone [11]. However, this strategy requires the availability of experienced examiners in the same setting to avoid unnecessary referrals and to reduce the health care costs.
Serum cancer antigen 125 or CA-125 is a well-known biomarker for epithelial ovarian cancer. Although the role of CA-125 in the diagnosis of ovarian cancer is controversial due to its only fair level of sensitivity and poor specificity, it is still widely used in the assessment of women with adnexal masses and is routinely used in preoperative investigation. Other biomarkers have been investigated to improve specificity for the diagnosis of ovarian cancer, such as the Human Epididymis Protein 4 or HE4. Although HE4 is highly specific to ovarian malignancy, it is not as sensitive as serum CA-125 [12]. For this reason, the risk of malignancy algorithm (ROMA) was developed to incorporate both the sensitivity of serum CA-125 and the specificity of HE4 to yield a better diagnostic performance [13]. However, in Thailand, HE4 analysis is not generally used in the preoperative evaluation due to its relatively high cost and the fact that it is available at only a few institutions.
For pre-surgical diagnosis of women with adnexal masses, the American College of Obstetricians and Gynecologists recommends a multivariable approach that combines demographic, clinical, laboratory, and imaging parameters to achieve better diagnostic accuracy [3,14]. Even though the application of the IOTA Simple Rules has recently been shown to be acceptably accurate in our setting [9], we hypothesized that its accuracy could be further improved by including other relevant parameters such as serum CA-125. The primary aim of this study was to evaluate the diagnostic added-value of serum CA-125 to the IOTA Simple Rules, without subjective assessment, to differentiate between malignant and benign ovarian tumors before surgery. The evaluation was performed separately for premenopausal and postmenopausal women to account for effect modification by menopausal status. The secondary aim was to derive new prediction models based on the IOTA Simple Rules and serum CA-125 to enable the diagnostic prediction of ovarian malignancy in all patients, whether their Simple Rules results were conclusive or inconclusive, without relying on the presence of experienced sonographers.

Design and Setting
A secondary analysis of a cross-sectional cohort of women with adnexal masses was performed to evaluate the diagnostic added-value of serum CA-125 and to develop novel diagnostic models for the prediction of ovarian malignancy. The data of patients who were admitted for pelvic operation for adnexal masses at Maharaj Nakorn Chiang Mai Hospital were prospectively collected between April 2010 and March 2018. The hospital is a university-affiliated teaching hospital with a specialized oncologic center located in Chiang Mai Province.

Study Patients and Data Collection
Women with adnexal masses who were scheduled for surgery and met the following criteria were included in the analysis: (1) were diagnosed with an adnexal mass by either pelvic ultrasonographic examination or vaginal examination; (2) had no known diagnosis of the mass before surgery. For women who had more than one adnexal mass, the mass with the most complex ultrasonographic features, or the mass with the higher malignancy potential in the sonographer's judgment, was included. Patients whose mass was surgically removed more than 24 h after ultrasonographic examination and patients without preoperative CA-125 were excluded. Preoperative transvaginal ultrasound examination was performed in all included patients by non-expert sonographers. Sonographers were blinded to patients' clinical characteristics and preoperative laboratory results. During ultrasonographic examination, the morphology of the adnexal masses was characterized using 2D real-time and color Doppler ultrasound.
Demographic and clinical data including age, parity, menopausal status, and tumor marker level (i.e., CA-125) were prospectively collected. Ultrasonographic features of the adnexal masses based on the IOTA Simple Rules and the ultrasound score of the Risk of Malignancy Index (RMI) were recorded.

The IOTA Simple Rules
The International Ovarian Tumor Analysis (IOTA) Simple Rules is a classification system for preoperative diagnosis of ovarian cancer [15]. According to these rules, the adnexal masses are categorized into benign, malignant, or inconclusive tumors based on the presence of benign features (B-features) or malignant features (M-features). The B-features are as follows: (1) unilocular; (2) presence of solid components with largest diameter <7 mm; (3) presence of acoustic shadows; (4) smooth multilocular tumor with largest diameter <100 mm; and (5) no blood flow (color score 1). The M-features are as follows: (1) irregular solid tumor; (2) presence of ascites; (3) at least four papillary structures; (4) irregular multilocular-solid tumor with largest diameter ≥100 mm; and (5) very strong blood flow (color score 4). The mass would be categorized as benign if one or more B-features applied in the absence of an M-feature. Conversely, the mass would be categorized as malignant if one or more M-features applied in the absence of a B-feature. If both M-features and B-features applied or neither of the features applied, the mass was categorized as inconclusive.
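The three-way categorization described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the idea of passing counts of observed B-features and M-features are assumptions made for the example, not part of the IOTA publications or of this study's software.

```python
def classify_simple_rules(n_b_features: int, n_m_features: int) -> str:
    """Categorize a mass from the number of B- and M-features observed.

    Hypothetical helper: the inputs are counts of benign (B) and
    malignant (M) ultrasound features found for one adnexal mass.
    """
    if n_b_features < 0 or n_m_features < 0:
        raise ValueError("feature counts must be non-negative")

    has_b = n_b_features > 0
    has_m = n_m_features > 0

    if has_b and not has_m:
        return "benign"        # one or more B-features, no M-feature
    if has_m and not has_b:
        return "malignant"     # one or more M-features, no B-feature
    return "inconclusive"      # both kinds apply, or neither applies


print(classify_simple_rules(2, 0))
print(classify_simple_rules(0, 1))
print(classify_simple_rules(1, 1))
print(classify_simple_rules(0, 0))
```

Keeping "inconclusive" as its own return value mirrors the way the rules are later entered into the regression models as a three-category predictor.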

Reference Standard
Histopathologic diagnosis of the surgical specimen was used as the reference standard for definite diagnosis of the adnexal masses. In the case of some benign masses without pathological specimens, intraoperative diagnosis made by the surgeons was used as reference. All adnexal masses were classified into two groups, benign or malignant ovarian tumors. Borderline ovarian tumors were grouped with malignant ovarian tumors.

Statistical Analysis
Statistical analyses were carried out using Stata version 16 (StataCorp, College Station, TX, USA). Frequency and percentage were used to describe categorical data. Mean and standard deviation or median and interquartile range were used to express continuous data according to their distribution. An exact probability test was used to compare the differences in categorical data between groups. An independent t-test or a Mann-Whitney U test was used to compare the differences in continuous data as appropriate.

Evaluation of Diagnostic Added-Value
The following steps were performed to evaluate the diagnostic added-value of CA-125 to the IOTA Simple Rules in premenopausal and postmenopausal women. First, three logistic regression models were developed separately in each subgroup of women. The results of the IOTA Simple Rules were left in their three original categories, without exclusion of patients with inconclusive results. This allowed us to utilize the full data of the patients. The first model included only the information from the IOTA Simple Rules (PRE1 and POST1). The second model included only the information from the log-transformed CA-125 (PRE2 and POST2). The third model included the information from both the IOTA Simple Rules and the log-transformed CA-125 (PRE3 and POST3). For both the second and the third model, a multivariable fractional polynomials (MFP) algorithm was performed to best fit the continuous value of the log-transformed CA-125 into the binary logistic model. In this study, the log-transformed CA-125 was included as the first-degree fractional polynomial term (FP1), $[\log(\text{CA-125})]^{3}$.
The area under the receiver operating characteristic curve (AuROC) was used as the main measure of model performance. We employed the method proposed by DeLong and colleagues to check for significant differences in AuROCs [16]. If the AuROCs of the PRE3 and POST3 models were significantly better than the AuROC of the PRE1 and POST1 models, we concluded that serum CA-125 had added diagnostic value to the IOTA Simple Rules. In contrast, if the AuROCs of the PRE3 and POST3 models were not significantly better than the AuROC of the PRE1 and POST1, we concluded that serum CA-125 did not add any diagnostic value to the IOTA Simple Rules.
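As a reminder of what the AuROC measures, it equals the probability that a randomly chosen malignant case receives a higher predicted risk than a randomly chosen benign case, with ties counted as one half. The sketch below computes this concordance directly in Python; the function name and the predicted risks are hypothetical illustration values, not study data, and the DeLong comparison itself is not reproduced here.

```python
def auroc(risks_malignant, risks_benign):
    """Concordance estimate of the area under the ROC curve."""
    if not risks_malignant or not risks_benign:
        raise ValueError("both groups must contain at least one case")

    concordant = 0.0
    for m in risks_malignant:
        for b in risks_benign:
            if m > b:
                concordant += 1.0   # malignant case ranked above benign case
            elif m == b:
                concordant += 0.5   # ties count as half a concordant pair
    return concordant / (len(risks_malignant) * len(risks_benign))


# Hypothetical predicted risks for three malignant and four benign masses.
example = auroc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])
print(round(example, 3))
```

In this toy example 11 of the 12 malignant-benign pairs are ordered correctly, so the estimate is 11/12.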

Prediction Model Development
If the models that included both serum CA-125 and the IOTA Simple Rules outperformed the models that contained only the IOTA Simple Rules, the combined models were developed further into diagnostic prediction models and evaluated for diagnostic performance. Separate models were developed for premenopausal and postmenopausal women to account for the modifying effect of menopausal status. The models were then presented as logistic regression equations. The probability of ovarian malignancy can be estimated by the inverse logit transformation of the linear predictor (lp) as follows: $\text{probability} = e^{lp}/(1 + e^{lp})$, where $e$ is the base of the natural logarithm.

Prediction Model Performance
The performance of each prediction model was measured separately in terms of discrimination, calibration, and clinical utility. The measure of discrimination was the AuROC, as described earlier. To examine the agreement between the model-predicted risk of malignancy and the observed proportion of malignancy, calibration plots were constructed. Internal validation was performed using a bootstrap procedure with 500 replicates. The model optimism and shrinkage factor were estimated and presented. To evaluate the clinical utility of the combined IOTA Simple Rules and CA-125 models over the models using the IOTA Simple Rules alone, we conducted a decision curve analysis (DCA) [17]. This simple approach focuses on the net benefit (NB) gained from using the prediction models in making clinical decisions. The NB is calculated by subtracting harms (false positives) from benefits (true positives), in the same way that profit is calculated by subtracting expenditure from total income [18]. The decision curves were plotted to visualize the trend in NB of the prediction models across the range of threshold probabilities for patient referral. The threshold probability is the minimum probability of ovarian malignancy at which a general gynecologist would opt for referral to oncologists. The NB of the index models should be compared to the two default strategies of referring all patients or not referring any patients. A clinically useful prediction model should have a higher NB than the other models and the two default strategies across the entire range of threshold probabilities. In the context of pre-surgical diagnosis of ovarian cancer, we chose 5% to 50% as reasonable threshold probabilities. These decisions were made using the

Diagnostic Accuracy of the Models
Sensitivity, specificity, positive predictive values (PPV), negative predictive values (NPV), and positive likelihood ratios (LHR+) were calculated to compare the diagnostic ability of the IOTA Simple Rules and the newly derived prediction models at different levels of risk threshold. For binary classification of the IOTA Simple Rules, masses that were interpreted as malignant and inconclusive were classified as malignancy. In the case of the combined prediction models, we pre-specified the risk-thresholds for evaluation of diagnostic ability at ≥10%, ≥20%, ≥30%, ≥40%, and ≥50%. The analyses were done separately using the premenopausal and postmenopausal data.
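For readers who want to reproduce these measures from a 2 x 2 table, the standard definitions can be sketched as follows. The function name and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from Tables 4 or 5.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy measures from a 2 x 2 table.

    tp/fn: malignant masses classified as malignant/benign.
    fp/tn: benign masses classified as malignant/benign.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0 or tn + fn == 0:
        raise ValueError("every row and column total must be positive")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ is undefined (infinite) when there are no false positives.
    lr_positive = float("inf") if fp == 0 else sensitivity / (1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": lr_positive,
    }


# Hypothetical counts: 100 malignant and 200 benign masses.
result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When the Simple Rules are dichotomized as described above, masses with inconclusive results are counted on the test-positive side (tp or fp).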

Baseline Characteristics of the Study Patients
Out of a total of 479 women with adnexal masses included in this secondary analysis, 145 (30.3%) had malignant ovarian tumors and 334 (69.7%) had benign ovarian tumors. There were 115 (24.0%) postmenopausal women and 364 (76.0%) premenopausal women. The comparison of clinical characteristics, biomarkers, and features of the IOTA Simple Rules between women with malignant and benign ovarian tumors is presented in Table 1. There were significant differences in the proportions of nulliparous women (51.7% vs. 41.0%, p = 0.035) and postmenopausal women (40.0% vs. 17.1%, p < 0.001) between groups. All aspects of the IOTA Simple Rules (the M-features and the B-features) showed a significant difference between women with malignant and benign ovarian tumors. However, in this cohort, the IOTA Simple Rules yielded conclusive results in only 392 (81.8%) women. The proportion of inconclusive results was not significantly different between malignant and benign tumors (19.3% vs. 17.7%, p = 0.699). The differences in clinical characteristics between women with malignant and benign ovarian tumors for premenopausal and postmenopausal women are presented in Supplementary Materials Tables S1 and S2, respectively. Supplementary Materials Table S3 shows the histopathological classification of ovarian tumors in both premenopausal and postmenopausal women.

The Added-Value of CA-125
The evaluation of the diagnostic added-value of serum CA-125 to the IOTA Simple Rules for premenopausal and postmenopausal women is shown in Tables 2 and 3. In both premenopausal and postmenopausal women, the model that used the information from both the IOTA Simple Rules and CA-125 (PRE3 and POST3) was significantly superior to the model that used only the information from the IOTA Simple Rules (PRE1 and POST1) (Table 3).

Prediction Model Performance
In premenopausal women, the PRE3 model exhibited the best discriminative ability (AuROC 0.94, 95%CI 0.91, 0.98). It contained only two sets of predictors: the IOTA Simple Rules results and the transformed serum CA-125 (Table 2). For the premenopausal model, the apparent AuROC was 0.9481 and the test AuROC was 0.9442. The optimism was estimated at 0.0039 and the shrinkage factor was 0.9821. For the postmenopausal model, the apparent AuROC was 0.9771 and the test AuROC was 0.9761. The optimism was estimated at 0.0010 and the shrinkage factor was 0.9751. The detailed results of the internal bootstrap validation procedure are shown in Supplementary Materials Table S4. The agreement between the predicted probability of ovarian malignancy from both prediction models and the observed proportion of malignancy in each group of women is visualized in the calibration plots (Figure 2a,b).
The clinical utility of the prediction models is illustrated by the decision curves (Figure 3a,b). In premenopausal women, the NB of both the IOTA Simple Rules and the PRE3 model (the IOTA SR X CA-125 model) was higher than that of the default strategies, namely referring all patients or referring none, across the entire range of threshold probabilities for patient referral. The NB of the combined model was higher than that of the IOTA Simple Rules beyond a threshold probability of 15%. In postmenopausal women, the NB of the POST3 model (the IOTA SR X CA-125 model) was higher than that of both default strategies and the IOTA Simple Rules alone across the entire range of threshold probabilities for patient referral. The NB of the IOTA Simple Rules started to depart from that of the refer-all strategy only after a threshold probability of 20%.
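The net benefit plotted in a decision curve can be computed with the usual decision curve analysis formula, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below is a minimal illustration, not the analysis code used in this study. The function name is hypothetical, and the counts in the usage line (145 malignant and 334 benign tumors among 479 women) are the overall cohort figures reported earlier, used only to show the "refer all" default strategy; the subgroup-specific curves in Figure 3 are not reproduced.

```python
def net_benefit(true_pos: int, false_pos: int, n: int, threshold: float) -> float:
    """Net benefit of a referral strategy at a given threshold probability."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")

    # Each false positive is weighted by the odds of the threshold.
    harm_weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * harm_weight


# "Refer all": every malignant mass is a true positive and every benign
# mass is a false positive.
for pt in (0.05, 0.10, 0.20, 0.50):
    print(pt, round(net_benefit(145, 334, 479, pt), 4))

# "Refer none" has no true or false positives, so its net benefit is zero.
print(net_benefit(0, 0, 479, 0.10))
```

A model is clinically useful at a given threshold when its net benefit exceeds both of these default values.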

Comparative Validation of Diagnostic Performance
In premenopausal women, only 80.2% (292/364) had conclusive IOTA Simple Rules results. The sensitivity and specificity of the IOTA Simple Rules in women with conclusive results were 90.8% (95%CI 81.0%, 96.5%) and 91.6% (95%CI 87.2%, 94.9%), respectively. When patients with an inconclusive result were also considered for referral, the sensitivity of the IOTA Simple Rules increased to 93.1% (95%CI 85.6%, 97.4%), whereas the specificity dropped to 75.1% (95%CI 69.6%, 80.1%) (Table 4). The sensitivity and specificity of the IOTA SR X CA-125 model for premenopausal women differ depending on the risk threshold selected to predict malignancy. The sensitivity, specificity, positive predictive values, negative predictive values, and positive likelihood ratios of the IOTA SR X CA-125 model at each pre-specified risk threshold are presented in Table 4.
In postmenopausal women, only 87.0% (100/115) had conclusive IOTA Simple Rules results. The sensitivity and specificity of the IOTA Simple Rules in women with conclusive results were 75.0% (95%CI 61.1%, 86.0%) and 93.8% (95%CI 82.8%, 98.7%), respectively. When women with inconclusive results were considered for referral, in the same way as women with malignant results, the sensitivity of the IOTA Simple Rules increased to 77.6% (95%CI 64.7%, 87.5%), whereas the specificity dropped to 78.9% (95%CI 66.1%, 88.6%) (Table 5). The sensitivity and specificity of the IOTA SR X CA-125 model for postmenopausal women differ depending on the risk threshold selected to predict malignancy. The sensitivity, specificity, positive predictive values, negative predictive values, and positive likelihood ratios of the IOTA SR X CA-125 model at each pre-specified risk threshold are presented in Table 5.

Discussion
In this study, the addition of serum CA-125 to the IOTA Simple Rules was shown to increase the diagnostic value in preoperatively differentiating between malignant and benign ovarian tumors in women who presented with adnexal masses. The benefit of such an approach was identified in both premenopausal and postmenopausal women. However, the improvement in diagnostic performance seems to be larger in postmenopausal women. The prediction models that combine the information from both the IOTA Simple Rules and serum CA-125, or the IOTA Simple Rules X CA-125 models, might be comparable to the widely accepted two-step strategy of IOTA Simple Rules and the IOTA logistic regression models. The application of the combined models in practice might be a more practical and effective approach for the triage of women with adnexal masses in settings where experienced sonographers are not available to provide accurate subjective evaluations of the masses.
Within the past decade, the IOTA Simple Rules have gained popularity and have been continuously validated and implemented in many academic hospitals [20], including our institution [21,22]. According to one meta-analysis, the pooled sensitivity and specificity of the IOTA Simple Rules were 93.0% and 95.0%, respectively [23]. However, the rate of inconclusive results from the IOTA Simple Rules was relatively high, 10%-20% on average [9,23,24]. In non-academic hospitals, where experienced sonographers are not generally available, patients with inconclusive results still need to be referred to specialized oncologic centers (i.e., to be managed in the same way as patients with malignant results from the rules). In our previous validation study on the IOTA Simple Rules [9], the sensitivity and specificity of the rules when only patients with conclusive results were included were 83.8% and 92.0%, respectively. In this study, which used the same dataset, the accuracy changed substantially when inconclusive results were interpreted as malignant. In postmenopausal women, the sensitivity and specificity of the IOTA Simple Rules dropped to 77.6% and 78.9%, respectively. In contrast, the sensitivity increased to 93.1% and the specificity decreased to 75.1% in premenopausal women.
A recent systematic review and meta-analysis of 47 studies suggested that a two-step strategy should be used for patients with inconclusive results from the IOTA Simple Rules to achieve the highest level of diagnostic accuracy (sensitivity 91.0% and specificity 91.0%) [11]. In circumstances where an expert is not available, the IOTA logistic regression model 2 (LR2) can be used as an alternative to the IOTA Simple Rules with subjective assessment, as either approach would ultimately result in comparable diagnostic performance [1,11]. However, compared to the IOTA Simple Rules, the IOTA logistic models require more detailed information for each predictor and are not as easy to memorize and execute. We proposed an alternative to the IOTA logistic models that uses complete information from the IOTA Simple Rules results together with the serum CA-125 level to predict the probability of ovarian malignancy. With this approach, the risk can be predicted from the combined models in all patients regardless of their IOTA Simple Rules results.
The role of serum CA-125 in ovarian cancer diagnosis is subject to controversy [1]. Despite its limitations, serum CA-125 is still the most widely used marker for epithelial ovarian cancer. Studies have reported variation in the discriminative performance of serum CA-125. One study in Oman reported an AuROC of 0.75 for serum CA-125 in ovarian cancer diagnosis [25]. Another study, in Turkey, reported an AuROC of 0.78 [26]. A recent meta-analysis of 19 studies that examined the diagnostic performance of serum CA-125 in Chinese patients reported a high AuROC of 0.84 [27], which was considered acceptable. However, most studies examined the diagnostic accuracy and discriminative performance of serum CA-125 at specific cut-off values, most commonly at the standard established cut-off of ≥35 U/mL [28]. Dichotomization of continuous variables results in significant loss of information and may lead to a spurious predictor-outcome relationship, which could substantially affect the discriminative performance of the biomarkers [29,30]. In this study, we avoided dichotomizing the CA-125 values by employing a fractional polynomials procedure for flexible modeling of a potentially non-linear association between serum CA-125 and the probability of ovarian malignancy [31].
In this study, fractional-polynomial-transformed values of serum CA-125 alone had excellent discriminative ability in both premenopausal and postmenopausal women (AuROC 0.88 for both groups). This was not in accord with previous studies on the diagnostic value of CA-125 [1,24,32], which reported it to be higher in postmenopausal women and lower in premenopausal women. The discordance in the discriminative ability of serum CA-125 in our study may be explained by the clinical heterogeneity of patients recruited in each study and a different mix of ovarian tumors in premenopausal and postmenopausal women [32,33]. In a recent study in Italy that examined the influence of biomarkers on the diagnostic performance of the IOTA Simple Rules [24], the proportion of malignant masses was 10% in premenopausal women and 37% in postmenopausal women, whereas in our study it was 20.8% in premenopausal women and 46.9% in postmenopausal women. In addition, in our study the pattern of serum CA-125 in benign and malignant tumors was similar in premenopausal and postmenopausal women.
One study by a team investigating models and ovarian cancer examined the added-value of serum CA-125 to the mathematical prediction models in differentiating between benign and malignant adnexal tumors and concluded that the measurement of serum CA-125 was unnecessary, especially in premenopausal women [32]. The study reported an average AuROC of 0.81 for serum CA-125. The stratified analysis revealed that the AuROC was 0.63 in premenopausal women and 0.92 in postmenopausal women. The higher proportion of patients with borderline tumors in that study in comparison to our study (21.5% vs. 10.3%) might partially explain the difference in CA-125 performance.
Other tumor characteristics may also affect the discriminative ability of serum CA-125, such as the histologic grades, the presence of extraovarian invasions, and the proportion of patients with early-stage ovarian cancer [34]. Another important point raised by the authors was that other ultrasound predictors (e.g., presence of ascites, maximum tumor diameter, and maximum diameter of solid component) included within the model were significantly more informative than serum CA-125 in characterizing adnexal tumors [32].
A more recent study in Italy examined the added-value of serum CA-125 to the IOTA Simple Rules two-step approach and concluded that the addition of serum CA-125 to such strategies increased the net reclassification index and was cost-effective among postmenopausal women [24]. In our study, the diagnostic value of the combination of serum CA-125 with full information from the IOTA Simple Rules seemed to be more significant in postmenopausal women than in premenopausal women, as the IOTA Simple Rules was more effective in premenopausal women than postmenopausal women (AuROC 0.89 vs. 0.83). This agreed with the subgroup analyses of a recent meta-analysis in 2014, which showed a higher accuracy of the IOTA Simple Rules in premenopausal women [8]. In our study, the superiority of the IOTA Simple Rules in premenopausal women was obviously the result of a higher proportion of endometriotic cysts, mature cystic teratomas, and pseudocysts in premenopausal women. These benign pathologies were shown to be diagnosed more correctly by the IOTA Simple Rules, either with or without subjective assessment [24].
There were four primary strengths to our combined IOTA Simple Rules and serum CA-125 prediction models. First, the models were derived separately from data from premenopausal and postmenopausal women. As effect modification by menopausal status can substantially affect diagnostic model performance, development of a prediction model for each specific group of women may result in a more accurate prediction. Second, these models effectively utilize the full information from the IOTA Simple Rules by incorporating inconclusive results, as one diagnostic category, into the models. Third, the newly developed models combined the advantages of both ultrasound and biomarker approaches in the prediction of ovarian malignancy risk and achieved a high diagnostic accuracy comparable to that of the IOTA LR2 with far fewer predictors (AuROC 0.94 vs. 0.94 in premenopausal and AuROC 0.98 vs. 0.93 in postmenopausal women) [7]. Compared with the two-step strategy, our models had higher discriminative ability (AuROC 0.94 vs. 0.90 in premenopausal and AuROC 0.98 vs. 0.80 in postmenopausal women) [24]. Lastly, based on the decision curve analysis, the combined model with serum CA-125 was also shown to be clinically more useful than the model with the IOTA Simple Rules alone in both premenopausal and postmenopausal women.
There are some limitations to be addressed. First, data on the clinical staging of ovarian cancer were not available, as the study was originally intended to evaluate only the accuracy of IOTA ultrasound parameters for ovarian cancer diagnosis. Thus, we could not fully explain the discrepancy between our results and those of other studies in terms of staging. Second, a head-to-head comparative validation of the combined models against the IOTA logistic models or a two-step approach could not be performed in our dataset, as some of the essential predictors were not collected. Third, in our study, the ultrasound examiners were obstetrics and gynecology residents in training with varying levels of sonographic experience. For this reason, their IOTA Simple Rules interpretation might not have the same level of accuracy as that of experienced sonographers. However, we believe that the results could be viewed as pragmatic and could more closely resemble the real-world situation as regards clinical examiners. Fourth, this was a secondary analysis of a patient database that was not intended to be stratified by menopausal status. Therefore, the study size might not be adequate and might result in model overfitting. However, based on our post hoc estimation, the number of outcome events per variable (EPV) exceeded 10 for both premenopausal and postmenopausal groups [30,35]. Finally, the derivation of both models was based on a dataset from a single oncologic center, which might not be a representative sample generalizable across an entire population. A larger, multicenter external validation study is warranted before the models can be considered for clinical implementation.
For implementation, the clinical application of our diagnostic models is mathematically straightforward. The logistic regression models require data from only two main predictors, the result of the IOTA Simple Rules and serum CA-125, to estimate the probability of ovarian malignancy (Table 2). There are three possible interpretations of the IOTA Simple Rules: benign, inconclusive, and malignant. Each interpretation is assigned a specific log odds ratio for the calculation. Serum CA-125 has to be log-transformed (natural logarithm), cubed, and re-centered before being entered into the equation. Given a premenopausal woman with an inconclusive IOTA Simple Rules result and a serum CA-125 level of 325 U/mL, one can calculate the probability by first calculating the linear predictor from the premenopausal model: $\text{linear predictor} = -4.011 + 2.101\,(\text{inconclusive result}) + 0.023\,\left(\ln(325)^{3} - 72.7142\right) \approx 0.868$. The inverse logit function is then used to convert the linear predictor to the probability scale. The calculated probability from the premenopausal model is 70.4%. At a predicted probability ≥50%, the positive likelihood ratio for her mass being malignant is extremely high (LR+ 56.51). Therefore, she should be referred to a gynecologic oncologist for proper evaluation. In practice, our logistic models could be further developed into a user-friendly application for ease of use.
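The worked example above can be reproduced with a short script. The following is a minimal sketch, not the authors' software: the function and constant names are hypothetical, and only the premenopausal coefficients quoted in the text (intercept, inconclusive-result log odds ratio, CA-125 coefficient, and re-centering constant) are included. The logarithm is assumed to be the natural logarithm, since that choice reproduces the reported 70.4%. Coefficients for other IOTA Simple Rules results and for the postmenopausal model would have to be taken from Table 2.

```python
import math

# Premenopausal model values quoted in the worked example in the text.
INTERCEPT_PRE = -4.011
LOG_OR_INCONCLUSIVE_PRE = 2.101  # log odds ratio for an inconclusive IOTA result
COEF_CA125_PRE = 0.023           # coefficient of the transformed CA-125 term
CA125_CENTER_PRE = 72.7142       # re-centering constant


def transform_ca125(ca125_u_per_ml, center):
    """Log-transform (natural log, an assumption), cube, then re-center CA-125."""
    return math.log(ca125_u_per_ml) ** 3 - center


def inverse_logit(linear_predictor):
    """Convert a linear predictor (log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def malignancy_probability(iota_log_or, ca125_u_per_ml,
                           intercept=INTERCEPT_PRE,
                           ca125_coef=COEF_CA125_PRE,
                           center=CA125_CENTER_PRE):
    """Predicted probability of malignancy from the logistic model.

    iota_log_or is the log odds ratio for the observed IOTA Simple Rules
    category; it is passed in by the caller rather than hard-coded here.
    """
    linear_predictor = (intercept
                        + iota_log_or
                        + ca125_coef * transform_ca125(ca125_u_per_ml, center))
    return inverse_logit(linear_predictor)


# Premenopausal woman, inconclusive IOTA result, CA-125 of 325 U/mL.
prob = malignancy_probability(LOG_OR_INCONCLUSIVE_PRE, 325.0)
print(f"Predicted probability of malignancy: {prob:.1%}")  # 70.4%
```

Passing the IOTA log odds ratio as an argument keeps the sketch limited to the values stated in the text and avoids guessing the remaining Table 2 coefficients.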

Conclusions
In conclusion, this study demonstrated that serum CA-125 adds significant value to the IOTA Simple Rules in differentiating malignant from benign adnexal masses. We also developed diagnostic models, entitled IOTA SR X CA-125, by incorporating serum CA-125 levels into the well-accepted IOTA Simple Rules. We presented the models as logistic regression equations, which estimate the probability of malignancy for each mass. For clinical applicability, several probability cutoff points were proposed to guide gynecologists in patient referrals. Our simple models are able to provide an accurate presurgical diagnosis and are potentially useful in reducing inappropriate referrals. We support the implementation of our models in practice, especially in settings where there are no specialized oncologists or experienced sonographers to make final interpretations in cases of inconclusive results from the IOTA Simple Rules.
Supplementary Materials: The following are available online at https://www.mdpi.com/2075-4418/11/2/173/s1, Table S1: Patient clinical characteristics and IOTA Simple Rules results in premenopausal women (n = 364), Table S2: Patient clinical characteristics and IOTA Simple Rules results in postmenopausal women (n = 115), Table S3: Histopathological classification of ovarian tumors in premenopausal and postmenopausal women, Table S4: Results of internal validation with bootstrap resampling procedures.

Data Availability Statement:
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.