A Machine Learning Tool to Predict the Response to Neoadjuvant Chemotherapy in Patients with Locally Advanced Cervical Cancer

Despite several studies having identified factors associated with successful treatment outcomes in locally advanced cervical cancer, there is a lack of accurate predictive modeling for progression-free survival (PFS) in patients who undergo radical hysterectomy after neoadjuvant chemotherapy (NACT). Here we investigated whether machine learning (ML) may have the potential to provide a tool to predict neoadjuvant treatment response in terms of PFS. In this retrospective observational study, we analyzed patients with locally advanced cervical cancer (FIGO stages IB2, IB3, IIA1, IIA2, IIB, and IIIC1) who were followed in a tertiary center from 2010 to 2018. Demographic and clinical characteristics were collected at treatment baseline or at 24-month follow-up. Furthermore, we recorded data about magnetic resonance imaging (MRI) examinations and post-surgery histopathology. Feature selection was used to determine an attribute core set. Three different machine learning algorithms, namely Logistic Regression (LR), Random Forest (RFF), and K-nearest neighbors (KNN), were then trained and validated with 10-fold cross-validation to predict 24-month PFS. Our analysis included 92 patients. The attribute core set used to train the machine learning algorithms included the presence/absence of fornix infiltration at pre-treatment MRI as well as of both parametrium invasion and lymph node involvement at post-surgery histopathology. RFF showed the best performance (accuracy 82.4%, precision 83.4%, recall 96.2%, area under receiver operating characteristic curve (AUROC) 0.82). We developed an accurate ML model to predict 24-month PFS.


Introduction
Cervical cancer is the third most common cancer in women worldwide with 569,000 new cases each year [1].
Although early-stage forms are often asymptomatic, symptoms that may occur in locally advanced stages are abnormal vaginal bleeding, pelvic pain, hematuria, dysuria, or hematochezia [2].
The most common histopathologic type of cervical cancer is squamous cell carcinoma, accounting for more than 80% of cervical malignancies. The other histotypes are adenocarcinoma (up to 15%) and adenosquamous carcinoma (less than 5%) [3]. Uncommon histopathologic types are small cell or neuroendocrine, serous papillary, and clear cell. Non-squamous presentations are associated with a worse prognosis [4,5].
The most recent revision of the International Federation of Gynecology and Obstetrics (FIGO) staging system was announced in 2018, introducing imaging as a source of staging information [6,7].
For pretreatment local staging, pelvic magnetic resonance imaging (MRI) and/or transvaginal ultrasound are the gold standard examinations. These evaluations are useful to define pelvic tumor extent, allowing accurate assessment of tumor size, stromal invasion depth, and parametrial invasion.
MRI examination is a valuable imaging method in the diagnostic work-up of macroscopically visible cervical cancers (stage ≥ IB) and represents a tool for monitoring the cervical tumor response to chemotherapy [8][9][10].
According to the 2018 FIGO staging system [7], treatment of early-stage forms (IA, IB1, IB2, IB3, and IIA) typically consists of surgery because chemoradiation, despite being equally effective, exposes patients to more unpredictable long-term side effects and to menopause; patients may undergo surgery alone if no risk factors requiring adjuvant radiation treatment are identified [7]. Conversely, in locally advanced cervical cancer (FIGO stage ≥ IIB), definitive management with concomitant chemoradiation is the preferred treatment [2,11].
In patients with stage IB2, IB3, IIA, or IIB, the choice of neoadjuvant chemotherapy followed by radical hysterectomy can improve disease control and reduce toxicity [12,13].
Additionally, several studies report that radical surgery after neoadjuvant chemotherapy may lead to improved survival outcomes compared with radiotherapy [14][15][16].
In several fields of science, machine learning (ML) is emerging as a promising tool for the implementation of complex multi-parametric decision algorithms [17]. In this regard, an ML approach is a potential game changer. In fact, in addition to detecting linear patterns in the analyzed data, it can unravel complex non-linear relationships between patient attributes that cannot be resolved by traditional statistical methods, merging them to output a forecast or a probability for a given outcome [18].
ML is a step towards precision medicine, leading to the improvement of patient profiling and treatment personalization. Supervised ML algorithms have proven effective in predicting treatment responses and disease progression in patients affected with heterogeneous diseases [19,20].
Although several studies have identified factors correlated with successful treatment outcomes in locally advanced cervical cancer [21], there is a lack of accurate predictive modeling for long-term progression-free survival (PFS) after neoadjuvant therapy.
Here we investigated whether ML may have the potential to provide a tool to predict neoadjuvant treatment response in terms of PFS.

Materials and Methods
In this retrospective observational study, we analyzed patients with locally advanced cervical cancer who were followed in a tertiary center from 2010 to 2018. All patients in our cohort underwent pre-treatment MRI, from which a pre-treatment radiologic stage was established according to FIGO 2018 [7]. All patients had stage IB2, IB3, IIA1, IIA2, IIB, or IIIC1 disease (ordinal variable). They also received neoadjuvant chemotherapy followed by a post-treatment MRI. The treatment response was assessed by variation in tumor size according to the Response Evaluation Criteria In Solid Tumors (RECIST v. 1.1, ordinal variable) [22]. In case of complete response (CR), partial response (PR), or stable disease (SD), the patients underwent radical hysterectomy with pelvic and lumbo-aortic lymphadenectomy. Radical hysterectomy type was C1. All procedures were performed by open surgery.
Demographic features (age), clinical characteristics (body mass index (BMI), parity, menopause, neoadjuvant therapy regimen, and number of cycles), and progression-free survival (PFS) at 24 months were collected at treatment baseline and at 24-month follow-up. Furthermore, we recorded data about MRI examinations as well as information about post-surgery histopathology (histotypes, grading, lymph node involvement).
In pre- and post-treatment MRI, we recorded the largest diameter of the lesion and the presence/absence of lymph node involvement, fornix infiltration, parametrium infiltration, vesico-vaginal septum infiltration, and recto-vaginal septum infiltration.
In total, the original database included 92 patients and 24 variables. Feature selection was used to determine an attribute core set (see "Attributes Selection" paragraph for further details).
This study followed the STARD guidelines [23] and the TRIPOD statement [24]. The ML algorithms were aimed at forecasting PFS at 24-month follow-up. Student's t-test for paired samples or the Wilcoxon matched-pairs signed-rank test was used, as appropriate, to identify differences in continuous variables between observation periods. McNemar's test was used to identify differences in dichotomous (dummy) variables between observation periods. The significance level was set at α = 0.05.
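As an illustration of the paired comparison of dichotomous variables, the following is a minimal sketch of an exact (binomial) McNemar test. The text does not state whether the exact or the chi-square form was used, so the exact form is an assumption here; the function name and the counts are hypothetical and are not study data.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs positive at the first observation only
    c: pairs positive at the second observation only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, hence no evidence of change
    k = min(b, c)
    # Under H0, each discordant pair falls in either cell with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, for illustration only.
p_value = mcnemar_exact(b=2, c=10)
significant = p_value < 0.05  # compared against alpha = 0.05
```

Only the discordant pairs enter the test; pairs that are unchanged between the two observation periods carry no information about a shift.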
The attribute core set used to train the algorithms was determined using a recursive feature elimination (RFE) wrapper based on a decision tree algorithm with extreme gradient boosting (XGBoost) [25]; in brief, this algorithm automatically selects, among all 23 recorded attributes, the best number of features based on their importance for predicting the given outcome (PFS at 24 months). Feature selection may counteract overfitting and improve classification performance. RFE is one of the commonly used feature selection methods for small-sample problems [26][27][28] (for further details about RFE, see Supplementary Materials).
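The elimination loop behind RFE can be sketched in a few lines. This is not the XGBoost-based wrapper used in the study: the importance function below (absolute gap between class means) is a deliberately simple stand-in, and all names and records are hypothetical. The sketch only shows the generic structure, in which features are re-scored on the surviving subset and the weakest one is dropped at each step.

```python
def mean_gap_importance(rows, labels, features):
    """Toy importance (stand-in for model-based importances):
    absolute gap between the class means of each feature."""
    scores = {}
    for f in features:
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        scores[f] = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return scores

def recursive_feature_elimination(rows, labels, features, n_keep, importance):
    """Drop the least important feature until n_keep features remain."""
    kept = list(features)
    while len(kept) > n_keep:
        scores = importance(rows, labels, kept)  # re-score on the current subset
        weakest = min(kept, key=lambda f: scores[f])
        kept.remove(weakest)
    return kept

# Hypothetical binary records (1 = attribute present); not study data.
rows = [
    {"f1": 1, "f2": 0, "f3": 1},
    {"f1": 1, "f2": 1, "f3": 1},
    {"f1": 1, "f2": 0, "f3": 0},
    {"f1": 0, "f2": 1, "f3": 1},
    {"f1": 0, "f2": 0, "f3": 1},
    {"f1": 0, "f2": 1, "f3": 0},
]
labels = [1, 1, 1, 0, 0, 0]

top_two = recursive_feature_elimination(rows, labels, ["f1", "f2", "f3"], 2, mean_gap_importance)
top_one = recursive_feature_elimination(rows, labels, ["f1", "f2", "f3"], 1, mean_gap_importance)
```

With a tree-based model, the importance function would refit the model on the surviving features at each iteration, so the ranking can change as correlated attributes are removed.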
The whole analysis was implemented in a Python 3.6 environment using the scikit-learn (ver. 0.22.1) and XGBoost (ver. 1.1.0) libraries [25,29]. After z-score normalization, we ran a Bayesian ridge conditional imputation [30] for missing data. This method has proven to be the more accurate imputation method for obstetrics and gynecology datasets [31] (see Supplementary Materials for further details).
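A minimal sketch of z-score normalization applied before imputation is given below. It assumes that missing entries are represented as None, are ignored when estimating the mean and standard deviation, and are left in place for the later imputation step; the population standard deviation is also an assumption. The values are hypothetical.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize a column, leaving missing entries (None) untouched."""
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), pstdev(observed)
    if sigma == 0:
        # A constant column carries no spread; map observed values to 0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]

bmi = [22.0, None, 26.0, 30.0]  # hypothetical values with one missing entry
bmi_z = z_score(bmi)
```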
Three different classifiers, both linear and non-linear, were trained and validated with 10-fold cross-validation using the attribute core set retrieved by the RFE for predicting 24-month PFS.
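The partitioning underlying 10-fold cross-validation can be sketched as follows. Shuffling with a fixed seed and the absence of stratification are assumptions made for brevity, and the function name is hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(92, k=10))
fold_sizes = [len(test) for _, test in splits]
```

With 92 observations, each held-out fold contains 9 or 10 patients, and every patient is used for validation exactly once.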
While logistic regression (LR) has almost always been the algorithm of choice for finding independent predictors in multivariate models, it should be noted that study hypotheses have usually been based on the unrealistic assumption that the association between prognostic factors and clinical outcomes is direct and isolated. However, LR is not suitable for modeling non-independent variables. For this reason, along with the usual LR for linear modeling, we deployed the non-parametric K-nearest neighbors (KNN) and random forest (RFF) [30] algorithms for non-linear modeling. The latter models have recently proven able to accurately predict important outcomes for women's health, even in the presence of non-linear patterns in the data [32][33][34]. Additionally, we chose RFF as there is evidence of accurate performance in the case of imbalanced data, which is often the case for clinical datasets [35]. We also ran RFF using cost-sensitive training (using the argument class_weight = "balanced" in scikit-learn) to try to overcome the class imbalance issue.
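The "balanced" weighting in scikit-learn is documented as n_samples / (n_classes × class count), so the minority class receives a proportionally larger weight. The sketch below reproduces that formula on hypothetical labels; the class counts are illustrative and do not reflect the study cohort.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency:
    n_samples / (n_classes * count of the class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

labels = [1] * 8 + [0] * 2  # hypothetical imbalanced target
weights = balanced_class_weights(labels)
```

In this example, an error on the rarer class costs four times as much as an error on the more frequent class during training.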
A repeated grid-search with cross-validation was used for optimal hyperparameter tuning to maximize the classifiers' performance [36] (See Supplementary Material for hyperparameter fine-tuning).
For each classifier, we plotted the ROC curve and determined the area under the receiver operating characteristic curve (AUROC).
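The AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following sketch computes it directly from that definition, with ties counted as one half; it is an illustrative implementation on hypothetical scores, not the library routine used in the analysis.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0]            # hypothetical outcomes
scores = [0.9, 0.8, 0.4, 0.5, 0.2]  # hypothetical predicted probabilities
auc = auroc(labels, scores)
```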
Then, based on the optimal probability cut-off (Youden's index) [37], the classifiers' performance was compared in terms of accuracy, true positive rate (TPR, recall), and precision. In general, a classification model forecasts a binary outcome for a given observation and class. In the process of predicting, a model may output the probability of an observation belonging to each possible class. This provides some flexibility in the way predictions are both interpreted and presented, allowing the choice of a threshold, such as the above-mentioned Youden's index [38].
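Youden's index is J = sensitivity + specificity − 1, and the optimal cut-off is the threshold that maximizes J. The sketch below scans the observed scores as candidate thresholds and then reports accuracy, TPR, and precision at the chosen cut-off. The function names and the data are hypothetical, and the convention that scores at or above the threshold are called positive is an assumption.

```python
def confusion(labels, scores, threshold):
    """Counts (tp, fp, fn, tn) when scores >= threshold are called positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def youden_cutoff(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion(labels, scores, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def metrics(labels, scores, threshold):
    tp, fp, fn, tn = confusion(labels, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(labels),
        "tpr": tp / (tp + fn),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }

labels = [1, 1, 1, 1, 0, 0]              # hypothetical outcomes
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]  # hypothetical predicted probabilities
cutoff, j = youden_cutoff(labels, scores)
result = metrics(labels, scores, cutoff)
```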
For a model to be reliable, the estimated class probabilities should be reflective of the true underlying probability of the sample. To check these assumptions, a diagnostic calibration curve for the candidate best classifier was also plotted [38].
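A calibration curve of this kind can be sketched by grouping predictions into equal-width probability bins and comparing, within each bin, the mean predicted probability with the observed event frequency. The number of bins, the function name, and the data below are assumptions made for illustration.

```python
def calibration_curve(labels, probs, n_bins=5):
    """Return (mean predicted probability, observed frequency)
    for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[i].append((p, y))
    curve = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            curve.append((mean_pred, observed))
    return curve

labels = [0, 0, 1, 1, 1, 1]                # hypothetical outcomes
probs = [0.1, 0.3, 0.35, 0.7, 0.9, 1.0]    # hypothetical predicted probabilities
curve = calibration_curve(labels, probs, n_bins=2)
```

For a well-calibrated model, the points lie close to the diagonal, that is, the observed frequency in each bin approximates the mean predicted probability.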
Results
RECIST criteria showed CR in 19 patients. RFE retrieved an attribute core set used to train the machine learning algorithms, including the presence/absence of fornix infiltration at pre-treatment MRI as well as the presence/absence of both parametrium invasion and lymph node involvement at post-radical surgery histopathology.
The final dataset had a dimensionality of 92 rows × 4 columns (3 selected attributes plus 1 target class, PFS at 24 months, as mentioned above).

[Table: classifier performance. Columns: Youden's Index Cut-Off, Accuracy (%), TPR (%), Precision (%), AUROC]
In Figure 1, the ROC curves for the RFF (box A), LR (box B), and KNN (box C) models are reported. In Figure 2, the calibration diagnostic is plotted for RFF; PFS roughly occurred with an observed relative frequency consistent with the forecast value, showing an acceptable calibration curve. We would expect the match between predicted and observed frequencies to increase with a larger dataset.

Discussion
The pillar of survival analyses in oncologic research has historically been the Cox proportional hazards regression model, used as a surrogate for estimating treatment effectiveness and safety. This model is based on an assumption of linear association. However, many clinicopathologic features in medicine exhibit a nonlinear association [39].
Conversely, in the area of cervical cancer research, ML can be used to support the study of human papillomavirus-related disease through the evaluation of cervical cytology, colposcopy, and genomic analysis [40][41][42][43][44][45][46][47][48][49][50][51]. However, only a few studies have examined oncologic outcomes [52]. This is the first study to analyze the accuracy of ML modeling in predicting the response to neoadjuvant chemotherapy in patients with locally advanced cervical cancer.
Gadducci et al. [53] studied predictors of clinical outcome in patients with locally advanced cervical cancer treated with radical hysterectomy following neoadjuvant chemotherapy using traditional statistics. This study stated that an optimal pathological response was the most relevant predictor for disease-free survival (DFS) and overall survival (OS). The involvement of the parametria and/or surgical resection margins was the other independent predictor; vice versa, the lymph node status and the involvement of the lymphovascular spaces correlated with DFS and OS.
A study by Liang et al. established the prognostic value of pathological response to neoadjuvant chemotherapy in 204 patients affected by stage IB2-IIA cervical squamous cell cancer. Clinical response and FIGO stage were statistically associated with DFS, whereas patient age, histological grade, and chemotherapy regimen were not. An optimal pathological response to neoadjuvant chemotherapy was shown to be associated with improved long-term outcome.
In that study, tumor regression proved to be an independent prognostic factor for survival in multivariate analysis. Moreover, in patients without extra-cervical deposits, an association between survival rate and chemotherapy response grade was shown, while this was not true in patients with extra-cervical deposits. This implies that residual viable tumor cells have an impact on prognosis when the tumor is confined to the cervix, but not when tumor cells have spread outside the cervix [54].
The role of extra-cervical deposits (vaginal disease, nodal metastasis, parametrial involvement) in determining the prognosis of cervical cancer patients after NACT has been reported in previous studies. Uegaki et al. demonstrated that pelvic lymph node metastasis was the only histopathologically independent prognostic factor (p = 0.0029) [55].
Benedetti-Panici et al. showed that lymph node metastases and involved parametria were the only two independent factors for survival [56].
From all the recorded variables in our cohort, the automated attribute selection algorithm selected the presence/absence of fornix infiltration at pre-treatment MRI as well as the presence/absence of both parametrium invasion and lymph node involvement at post-radical surgery histopathology as the attribute core set to be used in ML model training and validation.
The presence/absence of parametrium invasion and of lymph node involvement at post-radical surgery histopathology have already been evaluated as predictors in previous studies [53,55,56]. On the other hand, the presence/absence of fornix infiltration at pre-treatment MRI has not been considered a classical predictor of response to neoadjuvant chemotherapy.
For predictive modeling, RFF, a non-linear algorithm, showed slightly higher performance than LR in terms of accuracy and precision.
The lack of balance in the target class (PFS at 24 months) may be responsible for the better performance of RFF, especially in terms of precision, when compared to KNN and LR.
The main strength of our model is its capability of predicting sustained remission based on easy-to-gather attributes that are widely available at the treatment baseline visit and come with no added cost.
Despite good performance, the main limitation of this study remains the sample size. Although our sample size for training and validation is similar to or larger than those recently published [54], it should be noted that ML algorithms perform dramatically better when huge cohorts (i.e., thousands of patients) are used for training.

Conclusions
In gynecologic oncology, ML is a step towards precision medicine, leading to the improvement of patient profiling and treatment personalization.
We developed an accurate model to predict 24-month PFS in patients with locally advanced cervical cancer on neoadjuvant therapy, based on an ML algorithm requiring few easy-to-collect attributes. Our results are promising but need to be tested prospectively.
Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of Azienda Ospedaliera Policlinico Consorziale-University of Bari, IT (protocol code 6398, date of approval 10 June 2020).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study at baseline consultation.
Data Availability Statement: Data are not freely available due to local Ethics Committee privacy issues. Authors will consider data sharing upon specific request to local Ethics Committee.