A Deep Learning Radiomics Nomogram to Predict Response to Neoadjuvant Chemotherapy for Locally Advanced Cervical Cancer: A Two-Center Study

Purpose: This study aimed to establish a deep learning radiomics nomogram (DLRN) based on multiparametric MR images for predicting the response to neoadjuvant chemotherapy (NACT) in patients with locally advanced cervical cancer (LACC). Methods: Patients with LACC (FIGO stage IB-IIIB) who underwent preoperative NACT were enrolled from center 1 (220 cases) and center 2 (independent external validation dataset, 65 cases). Handcrafted and deep learning-based radiomics features were extracted from T2WI, DWI and contrast-enhanced (CE)-T1WI, and radiomics signatures were built based on the optimal features. Two types of radiomics signatures and clinical features were integrated into the DLRN for prediction. The AUC, calibration curve and decision curve analysis (DCA) were employed to illustrate the performance of these models and their clinical utility. In addition, disease-free survival (DFS) was assessed by Kaplan–Meier survival curves based on the DLRN. Results: The DLRN showed favorable predictive values in differentiating responders from nonresponders to NACT with AUCs of 0.963, 0.940 and 0.910 in the three datasets, with good calibration (all p > 0.05). Furthermore, the DLRN performed better than the clinical model and handcrafted radiomics signature in all datasets (all p < 0.05) and slightly higher than the DL-based radiomics signature in the internal validation dataset (p = 0.251). DCA indicated that the DLRN has potential in clinical applications. Furthermore, the DLRN was strongly correlated with the DFS of LACC patients (HR = 0.223; p = 0.004). Conclusion: The DLRN performed well in preoperatively predicting the therapeutic response in LACC and could provide valuable information for individualized treatment.


Introduction
Cervical cancer is the fourth most frequently diagnosed cancer among women, with over 500,000 new diagnoses and 300,000 deaths per year [1,2]. In the past, radical hysterectomy (RH) and chemotherapy were suggested for the treatment of locally advanced cervical cancer (LACC); however, the likelihood of treatment failure is high [3,4]. Since the 1990s, neoadjuvant chemotherapy (NACT), a short-course chemotherapy administered before surgery, has been used as an alternative treatment option for cervical cancer [5]. Although the five-year overall and disease-specific survival rates are similar between these treatment modalities, their toxicity profiles differ [6]. An increasing number of findings suggest that NACT followed by radical hysterectomy or radiotherapy can improve survival. A meta-analysis of 21 randomized studies found that NACT substantially reduced the risk of death in patients with LACC [7], while another analysis of 6 randomized controlled trials demonstrated that NACT significantly improved both overall survival and progression-free survival [8]. The potential advantages of NACT lie in its ability to reduce tumor volume, eradicate subclinical distant metastasis, allow better delivery of radiation and improve prognosis [9,10]. Current research should focus on how best to identify the patients who will benefit most from this approach. Ideally, radiotherapy should be avoided as far as possible in young women to preserve ovarian and sexual function. Only individuals with an excellent response and negative lymph nodes should undergo radical surgery. This would decrease the need for postoperative chemoradiation and the increased morbidity associated with combining it with radical surgery.
MRI has wide utility in pelvic evaluation due to its unique benefits in soft tissue imaging; it is effective in detecting primary tumor size, local diffusion and lymph node metastasis and is considered the gold standard for monitoring the response of cervical tumors to chemotherapy and determining eligibility for eventual excision [11]. In recent years, MRI-based radiomics has received increasing attention due to the advantages of extracting a large number of quantitative features from imaging phenotypes and offering meaningful biologic information [12,13]. Recent efforts have highlighted that MRI-based radiomics features may hold potential in the prediction of the response to NACT in LACC preoperatively [14,15], confirming that radiomics can distinguish patients who are suitable for NACT from those who are not. While the performance of these radiomics models for the prediction of NACT response is within acceptable ranges, there are still some limitations. Handcrafted radiomics features are restricted to the current knowledge of medical imaging [16] and operators' prior knowledge and hence may be insufficiently representative. Because generic features might lead to poor classification model performance, most tasks typically require distinguishing features depending on individual context to produce better outcomes. However, such distinguishing characteristics may not be applicable to other activities [17].
Deep learning (DL) performs well in medical image classification and prediction, extracting and learning deep features directly in a data-driven manner [18][19][20], and has shown excellent performance in precision cancer diagnosis, therapeutic decision making and survival analysis [21][22][23]. Since DL provides more abstract and difficult-to-mine radiomics features, several researchers have combined handcrafted and DL-based radiomics features to develop prediction models in cancers [24][25][26]. Although radiomics is capable of quantifying interpretable tumor image features, DL-based features primarily focus on effective semantic information, making them suitable for large-scale image analysis [27,28]. Moreover, preoperative clinical parameters offer additional useful data for predicting the NACT response [29]. Combining clinical characteristics with the two sets of radiomics features can predict NACT effectiveness from multiple perspectives, providing a more comprehensive assessment.
Nevertheless, investigations on the development of a DL radiomic model for predicting the therapeutic response to NACT in patients with LACC are still lacking. Notably, the pathological response was an independent prognostic variable for disease-free survival (DFS) and overall survival in patients with stage Ib2-IIb disease who underwent radical hysterectomy combined with NACT. Patients who are nonresponsive to NACT have a higher risk of recurrence than those who do respond [30]. However, the application of deep learning radiomics to predict the pathological response to NACT has been infrequently described in LACC. In addition, it is essential to determine radiomics and clinical predictors of the NACT response to stratify cervical cancer patients based on relevant therapy.
In this study, we first built and validated a new preoperative deep learning radiomics nomogram (DLRN) that combines handcrafted and DL-based radiomics features and clinical predictors for the prediction of the clinical response to NACT in cervical cancers.

Patients
This retrospective study was approved by the Ethics Committee of our institution, which waived the need for informed consent from the participants. The study was conducted in accordance with the Declaration of Helsinki. Patients with histopathologically confirmed cervical cancer treated at two centers between January 2015 and June 2021 were enrolled (Figure 1 and Supplementary S1). The inclusion criteria were as follows: (1) conventional pelvic MR scanning examinations performed before NACT, including T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI) and contrast-enhanced (CE)-T1WI, in patients with pathologically confirmed cervical cancer; (2) FIGO stage (2018) from IB to IIIB; (3) use of the same NACT therapy; and (4) the presence of clinicopathological data. The exclusion criteria were as follows: (1) radiation therapy before NACT; (2) incomplete MR sequence or clinicopathological information; and (3) complications with other malignant tumors. Ultimately, a total of 220 patients from center 1 and 65 patients from center 2 were enrolled according to the selection criteria. The overall radiomics pipeline of this study is shown in Figure 2.
Regular follow-up was performed every 3-6 months for the first 2 years following treatment, every 6 months for the next 3-5 years and once a year thereafter. The endpoint of our study was DFS, defined as the period from the date of surgery to the date of the first locoregional recurrence, distant metastasis, death or the last follow-up visit. Gynecological examination and imaging modalities such as CT, MRI and positron emission tomography-computed tomography were applied to screen for locoregional recurrence and distant metastases.

Assessment of the NACT Response
All participants underwent one or two cycles of NACT and hysterectomy within 2 weeks. The short-term response was measured by the change in tumor size, which was evaluated using the Response Evaluation Criteria In Solid Tumors (RECIST v. 1.1) [31]. Before surgery, the clinical response was assessed and categorized as follows: patients who had a complete response (CR) or partial response (PR) were assigned to the pathological good responder (pGR) group, while those who had stable disease (SD) or progressive disease (PD) were assigned to the non-pGR group.

MRI Acquisition
All patients underwent a systematic pelvic MR examination with a 3.0 T MR scanner at two time points. The pre- and post-treatment scans were performed within one week before the initiation of NACT and within one week after the completion of NACT, respectively. The MRI parameters of the axial T2WI, DWI and CE-T1WI sequences are detailed in Table S1.

Image Segmentation and Processing
A radiologist with 9 years of experience in abdominal MRI diagnosis manually delineated regions of interest (ROIs) for radiomics analysis, slice by slice along the tumor boundary, using ITK-SNAP software (version 3.8.0, open-source software available for download at http://www.itksnap.org, accessed on 1 March 2022). All segmentations underwent quality control for proper annotation by an expert pelvic radiologist (with 15 years of experience in abdominal MRI diagnosis). After three months, 40 patients in the training set were selected randomly and resegmented by the radiologists to confirm the stability of the radiomics features.
For DL feature extraction, MR images were cropped by defining a rectangular ROI that encompassed the labeled tumor. Then, to satisfy the input size requirement of the pretrained convolutional neural network (CNN) model, the tumor patch was resized to a 224 × 224 square, and the intensity of the tumor patch images was rescaled to the range of 0-255 for training. To improve feature discrimination, several preprocessing procedures were applied before extracting quantitative features. Nonlinear intensity normalization and gray-level quantization were used to transform MR images to standardized intensity ranges and to map the whole intensity range of the tumor region to discrete gray levels [32,33]. Finally, images were resampled to an isotropic pixel size using bilinear interpolation. Supplementary S2 contains detailed information about image processing.
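The patch preparation described above (crop to the rectangle enclosing the labeled tumor, rescale intensities to 0-255, resize to 224 × 224) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, images are plain nested lists, and nearest-neighbour resizing is an assumption made for brevity, since the interpolation used for the patch resize is not stated here.

```python
def crop_to_mask(image, mask):
    """Crop a 2D image (list of rows) to the bounding rectangle of a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask is empty")
    r0, r1, c0, c1 = rows[0], rows[-1], cols[0], cols[-1]
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


def rescale_0_255(patch):
    """Linearly map patch intensities onto the integer range 0-255."""
    flat = [v for row in patch for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant patch carries no contrast; map it to zeros.
        return [[0 for _ in row] for row in patch]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in patch]


def resize_nearest(patch, size=224):
    """Resize to size x size by nearest-neighbour sampling (assumed method)."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def prepare_patch(image, mask, size=224):
    """Crop, rescale and resize one slice into a CNN input patch."""
    return resize_nearest(rescale_0_255(crop_to_mask(image, mask)), size)
```

In practice an array library would replace the nested lists; the order of operations is the point of the sketch.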

Feature Extraction
Two types of features, handcrafted and DL-based radiomics features, were extracted. The open-source Python (v.3.10; https://www.python.org/, accessed on 1 May 2022) toolbox PyRadiomics was used for all handcrafted radiomics features. In total, 1223 handcrafted radiomics features were extracted from each MR sequence, including shape, first-order statistical, texture and transformation features. The details of the feature extraction are shown in Supplementary S3. In this study, DL-based features were extracted using DenseNet-121, which was pretrained on the natural picture dataset ImageNet. All convolutional layers were produced with a constant kernel size of 3 × 3 with stride and padding set to 1. The network consists of 5 max-pooling layers with a window size of 2 × 2 and a nonlinear activation function of rectified linear units, with the final fully connected layer having 1001 features with SoftMax enabled.

Feature Selection and Radiomics Signature Construction
Handcrafted radiomics and DL-based features were normalized individually using Z scores so that each feature had zero mean and unit variance. The following steps were conducted in the training dataset for feature selection and signature establishment. First, the intraclass correlation coefficient (ICC) was calculated to confirm the stability of the radiomics features, and features with an ICC < 0.85 were removed. Second, univariate analysis with the Mann-Whitney U test was applied to select handcrafted and DL-based radiomics features that differed significantly between the pGR and non-pGR groups. Third, considering that class imbalance (pGR group to non-pGR group: 113:35 = 3.23) may lead to inaccurate results, the synthetic minority oversampling technique (SMOTE) was used to oversample the minority class in the training dataset so that the ratio of the two groups was 1.10:1 (118 patients in the pGR group and 107 in the non-pGR group) in the SMOTE training dataset [34]. SMOTE creates synthetic samples from the k-nearest neighbours of each minority class sample to balance the number of samples in the two classes [35,36]; further details are given in Supplementary S4. Fourth, least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was used for feature selection. The selected handcrafted and DL-based radiomics features were each passed to a support vector machine (SVM) classifier to develop the handcrafted radiomics signature and the DL-based radiomics signature, respectively, for predicting the response to NACT.
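The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is a simplified illustration, not the authors' code (their procedure is in Supplementary S4); the function name, the default k = 5 and the fixed seed are assumptions. Going from 35 to 107 non-pGR samples, as reported above, corresponds to 72 synthetic samples.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class. Oversampling should be applied to the training data only, as described above, so that validation data remain untouched.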

Clinical Model and DLRN Development and Validation
To investigate the relationship between clinical variables and NACT response, differences in clinical characteristics were evaluated using the chi-square test or Fisher's exact test for categorical variables and the t test or Mann-Whitney U test for continuous data as applicable. The significant clinical characteristics were then subjected to stepwise backward multivariate logistic regression to determine the independent predictors of the NACT response. The criterion for variable inclusion was p < 0.1 for main effects and p < 0.05 for interactions. The clinical model comprising only the significant clinical information was constructed.
In this study, the DLRN was developed as a hybrid model that combines the clinical model, the handcrafted radiomics signature and the DL-based radiomics signature by multivariate logistic regression. Because logistic regression is readily interpretable, the contribution of each component model to the final prediction can be explained by its weight. Logistic regression is also well suited to binary classification and is efficient to train. The DLRN not only predicts whether a patient will respond to NACT but also provides an estimated probability of response.
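As a worked illustration of how such a logistic combination produces a probability, the sketch below applies the coefficients reported with the DLRN in the Results (1.297, 1.087, 1.157 and an intercept of −0.547) to three component-model probabilities. The function and constant names are hypothetical, and the input values in the usage line are made up for illustration.

```python
import math

# Regression coefficients as reported for the combined model in the Results.
COEF_CLINICAL = 1.297
COEF_HANDCRAFTED = 1.087
COEF_DL = 1.157
INTERCEPT = -0.547


def dlrn_probability(p_clinical, p_handcrafted, p_dl):
    """Combine three model probabilities into one predicted response probability."""
    z = (COEF_CLINICAL * p_clinical
         + COEF_HANDCRAFTED * p_handcrafted
         + COEF_DL * p_dl
         + INTERCEPT)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


# Hypothetical patient: all three component models favour a response.
print(round(dlrn_probability(0.8, 0.7, 0.9), 3))
```

Each coefficient is positive, so raising any one component probability raises the combined probability, which is what makes the weights directly readable.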

Statistical Analysis
Receiver operating characteristic (ROC) curves were drawn to assess the performance of the handcrafted and DL-based radiomics signatures, the clinical model and the DLRN in the training and validation cohorts. DeLong's test was used to compare the AUC estimates of these prediction models. Performance measurements included sensitivity, specificity and accuracy. Calibration curves and the Hosmer-Lemeshow (H-L) test, which assess how closely the models' predictions match the observed outcomes, were used to verify calibration performance. To assess the clinical value of the models, decision curve analysis (DCA) was performed by quantifying the net benefit across a range of threshold probabilities in both the internal and external validation datasets. To investigate the association between the DLRN score and DFS, patients were classified into high-risk or low-risk groups and Kaplan-Meier curves were plotted. The DFS prediction model was built using univariate and multivariate Cox proportional hazards regression. The scaled Schoenfeld residual test was used to check the models' proportional hazards assumption. Statistical analyses were conducted using R studio software (R Studio Inc., version 4.0.0, R Foundation for Statistical Computing, Vienna, Austria) and SPSS 24.0 software (SPSS Inc., Chicago, IL, USA). The R software packages are listed in Supplementary S5. A two-sided p < 0.05 was considered statistically significant.
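The discrimination and clinical-utility measures named above have compact definitions, sketched below in plain Python. This is an illustration under stated assumptions, not the R code used in the study: the function names are hypothetical, the AUC is computed by pairwise comparison (equivalent to the Mann-Whitney statistic), and the net benefit uses the standard decision-curve formula TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def confusion_metrics(scores, labels, threshold=0.5):
    """Sensitivity, specificity and accuracy at a given cut-off."""
    tp, fp, tn, fn = _counts(scores, labels, threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes are required")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


def net_benefit(scores, labels, threshold):
    """Decision-curve net benefit at one threshold probability (0 < threshold < 1)."""
    tp, fp, _, _ = _counts(scores, labels, threshold)
    n = len(labels)
    return tp / n - fp / n * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.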

Patient Characteristics
From January 2015 to June 2021, 220 patients from center 1 were included and split into training (n = 148; mean age, 52.3 ± 9.54) and validation datasets (n = 72; mean age, 51.2 ± 10.52). In addition, 65 patients (mean age, 53.3 ± 10.72) were enrolled from another center as an external validation dataset. The NACT response rate was consistent throughout the three groups of patients, with 76.35% for the training and 63.89% and 72.31% for the internal and external validation datasets, respectively. The baseline clinical characteristics are summarized in Table 1. No significant difference was observed in menopausal status, maximum tumor diameter, pathological type, neutrophil-to-lymphocyte ratio (NLR) or platelet-to-lymphocyte ratio (PLR) between the pGR and non-pGR groups in the datasets (p > 0.05). Moreover, age, FIGO stage, serum SCC-Ag (squamous cell carcinoma antigen) level, lymphovascular space invasion (LVSI) status, lymph node metastasis and parametrial invasion (PMI) showed significant differences between the two groups in the training dataset (p < 0.05).

Feature Selection and Radiomics Signature Development
In this study, two types of radiomics features were extracted, including handcrafted radiomics and DL-based radiomics features. For each patient, 3669 handcrafted radiomics features were derived. By univariate analysis, 989 handcrafted radiomics features were found to be significantly different between patients in the pGR and non-pGR groups in the training dataset. As shown in Figure 3a-c, the LASSO classifier selected 10 features (presented in Table 2), including four T2WI features, two DWI features and four CE-T1WI features.
A total of 3000 DL-based features were extracted using DenseNet-121 (Figure 4 and Supplementary Table S2), and 7 optimal DL-based features were acquired through feature selection, including three T2WI features, two DWI features, and two CE-T1WI features (Figure 3d,e and Table 2). Then, we developed a DL-based radiomics signature to differentiate responders from nonresponders in the training data.

Performance of Handcrafted Radiomics Signatures
The handcrafted radiomics signature achieved an AUC of 0.884 (95% CI, 0.827-0.942) in the training set and 0.858 (95% CI, 0.763-0.953) and 0.810 (95% CI, 0.707-0.912) in the internal and external validation datasets, respectively. The resulting radiomics signature performed stably in all three datasets, and the inaccuracy was within acceptable limits. The internal validation dataset had an accuracy, sensitivity and specificity of 0.781, 0.846 and 0.619, respectively, whereas the corresponding values in the external validation dataset were 0.764, 0.824 and 0.619, respectively.

Performance of the DL-Based Radiomics Signatures
The DL-based radiomics signature yielded AUCs of 0.871 (95% CI 0.756-0.901) in the training dataset, and 0.893 (95% CI 0.814-0.972) and 0.829 (95% CI 0.680-0.936) in the internal and external validation datasets, respectively. The prediction performance of the DL-based signature was slightly higher than that of the handcrafted radiomics signature in both validation datasets.

Performance of the Clinical Model
FIGO stage, serum SCC-Ag level, LVSI status and PMI were identified as independent predictors based on univariate and multivariate logistic regression analyses (p < 0.05) in the training dataset and were used to build the clinical model (Table 3). The prediction efficiency was determined for the clinical model in the training dataset (AUC = 0.711; 95% CI: 0.620-0.801), internal validation dataset (AUC = 0.665; 95% CI: 0.538-0.792) and external validation dataset (AUC = 0.689; 95% CI 0.750-0.901).

Performance of the DLRN and Model Comparison
The handcrafted and DL-based radiomics signatures and the abovementioned independent predictors were incorporated to build the DLRN (Figure 5a and Table 4), which showed good performance for NACT response prediction, achieving AUCs of 0.963 (95% CI 0.932-0.995), 0.940 (95% CI 0.877-1.000) and 0.910 (95% CI: 0.859-0.961) in the training, internal validation and external validation datasets, respectively (Table 5 and Figure 6). The calibration curve showed good agreement between the DLRN-predicted response and the actual response to NACT (Figure 5b,c), which was also supported by the Hosmer-Lemeshow test. Compared with the clinical model and the handcrafted and DL-based radiomics signatures, the DLRN provided the best performance in predicting the response to NACT (p < 0.05) and was externally validated. Representative patients with CR and SD after NACT are shown in Figure 7. Because it combines the three models above, the DLRN predicted the response more accurately than the clinical model, the handcrafted radiomics signature or the DL-based signature alone. This suggests that the DLRN built in this study is more generalizable and allows for more personalized clinical decision making.

Combined model probability = sigmoid(1.297 × Clinical model probability + 1.087 × Handcrafted radiomics signature probability + 1.157 × DL-based radiomics signature probability − 0.547). β indicates the regression coefficient; * indicates p < 0.05.

Furthermore, we examined the prognostic value of the DLRN in 101 patients with LACC. The median duration of follow-up was 29 (range, 16-40) months. As shown in Figure 8, higher DLRN scores were significantly associated with improved DFS (hazard ratio (HR), 0.223; 95% CI, 0.073-0.667; log-rank test, p = 0.004).
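For readers less familiar with the survival analysis behind Figure 8, the sketch below shows the Kaplan-Meier product-limit estimate for a single group. It is a generic illustration, not the study's analysis code: the function name is hypothetical, and in the study one such curve would be estimated for each DLRN risk group and the curves compared with the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time of each patient (e.g. months of DFS)
    events : 1 if recurrence, metastasis or death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censored patients leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

Censored patients lower the number at risk without producing a step in the curve, which is how incomplete follow-up is accounted for.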
The DLRN indicates deep learning radiomics nomogram, which integrated handcrafted signature, DL-based signature and clinical model; a Sen, Spe, Acc and AUC indicate the sensitivity, specificity, accuracy and area under the curve of the receiver operating characteristic curve, respectively.

Discussion
In this study, two sets of noninvasive radiomics signatures were developed by handcrafted and DL-based radiomics features based on multiparametric MRI to preoperatively predict the clinical response to NACT in patients with LACC. Furthermore, we incorporated these radiomics signatures with clinical independent predictors, namely, serum SCC-Ag level, PMI, LVSI status and FIGO stage, to build a DL radiomics model, which was presented in the form of a nomogram (DLRN). To the best of our knowledge, this is the first study to explore the predictive value of three ensemble models: clinical model, handcrafted radiomics signature and DL-based radiomics signature. The nomogram might be utilized for multidisciplinary consultation between radiologists and oncologists to predict the NACT response preoperatively and propose appropriate tailored therapy alternatives to these patients, rather than depending exclusively on clinical assessment. Furthermore, the DLRN score was substantially correlated with DFS in patients with LACC, yielding essential supplemental information for prognosis.
Predicting the response to NACT before treatment would be extremely beneficial in terms of preventing unnecessary chemotherapy before surgery, as well as for providing treatment options, individualization and effective patient stratification. An increasing number of methods have been shown to predict the response to NACT in patients with LACC. Tian et al. [15] found that CT-based machine learning analysis could be employed to identify whether NACT is effective for patients with LACC, and their hybrid model, which included radiomics features and clinical features, achieved discriminatory performance with AUCs of 0.803 and 0.821 in the training and validation datasets, respectively. However, when compared to CT, MR is a more effective technique for pelvic examinations in terms of tracking disease progression and the effectiveness of treatment [37,38]. Sun et al. [14] showed that a model based on MRI data could yield excellent accuracy in determining the response to NACT by radiomics analysis. Their model achieved a high AUC (0.999 [95% CI 0.997-1]) and specificity (100% [95% CI 99-100%]) in the validation cohort. The results were higher than the result of the handcrafted radiomics signature in our study. The multicenter statistics in their analysis improved the model's stability, which is worth exploring in future studies.
Previous studies on quantitative image-based prediction have shown that DL-based features, which capture in-depth information in the hidden layers of neural networks without the need for prespecified features, are complementary to radiomics features [39,40]. Our study demonstrates that this remains true when using DL-based features extracted by DenseNet-121, with AUCs that were slightly higher than those of the handcrafted radiomics signature in both validation datasets. These features have the potential to improve diagnostic value beyond the basic quantification of information currently available in MR images, since they are not limited to previously established image properties [41,42]. DL-based features provide abundant information on tumor geographic heterogeneity and the tumor microenvironment, both of which are linked to tumor chemosensitivity, and have utility for cervical cancer screening, detection and prognosis [43,44].
In our study, we extracted radiomics features from MRI, identified the optimal radiomics features, and created a radiomics signature. Likewise, we extracted deep semantic features that characterize the tumor tissue well to build a DL signature. Finally, the DLRN was developed by incorporating both radiomics signatures with clinical predictors, and its effectiveness was considerably greater than that of the clinical model in all datasets. Earlier studies have found an association between FIGO stage and serum SCC-Ag level and NACT effectiveness [29,45]. Our findings are in line with those of Ou et al. [29], who showed that PMI may have an effect on NACT efficacy in LACC. The number of invasive tumor cells is higher in cervical cancer with PMI, and these cells are frequently exposed to strong genetic stress under hypoxic conditions, leading to genetic instability and the induction of drug resistance via various mechanisms. Additionally, since cancer stem cells coexist with invasive tumor cells, invasion into surrounding tissue is a significant contributor to therapy response [46,47]. Similarly, the DLRN outperformed the handcrafted and DL-based radiomics signatures in prediction ability in most datasets. Purely DL-based classification algorithms are prone to overfitting, which may lead to instability; therefore, well-defined handcrafted features were introduced into our processing. The combination of handcrafted and DL-based radiomics signatures may therefore offer greater predictive value than either alone.
Intriguingly, the DLRN score developed in this study was strongly related to the DFS of LACC patients. Several studies have shown that the response to NACT is a predictor of prognosis and that patients who respond to NACT have a better prognosis. Individuals with a low likelihood of responding to NACT had a substantially lower chance of favorable long-term results than those with a high likelihood, even after NACT followed by surgery. Patients with higher DLRN scores had considerably improved DFS. Patients with low DLRN scores tended to have unsatisfactory outcomes; therefore, other treatment options should be provided in a timely manner to minimize excessive toxicity and improve patients' survival prospects. The high-precision online DLRN will not only aid in the evaluation of treatment options and administration of individualized treatment but also enable clinicians to examine patients remotely.
There are some limitations in our study. First, the number of patients enrolled in this research was still limited, and the patients were all Chinese. Therefore, while data used for modeling and validation were obtained from two centers, additional large-scale worldwide validation is required before the model may be used in patients of other ethnicities. Second, only LACC patients with NACT followed by radical hysterectomy were included; those with NACT followed by radiotherapy were excluded, and all those patients will be enrolled for further analysis for predicting the NACT response. Third, image segmentation was performed manually by radiologists in this study, which was time consuming and laborious. An automatic segmentation model of cervical cancer based on multiple MR sequences should be established. Finally, information included in pathological images and genomes may be valuable for prognostic prediction tasks, implying that larger well-designed prospective studies with numerous data points should be conducted in the future.

Conclusions
In summary, we developed and validated an MRI-based radiomics nomogram that integrated handcrafted and DL-based radiomics signatures and a clinical model for predicting the response to NACT for patients with LACC preoperatively, which has the potential to become an automatic tool for enabling individualized treatment. Additional validation is needed to generate stronger evidence. The proposed DLRN, incorporating MR-based radiomics signatures and clinical predictors, exhibited promising performance for predicting response and clinical outcomes, and provided valuable information for individualized treatment in LACC. Nevertheless, future prospective investigations with multicenter and large sample sizes are needed to prove the clinical applicability of this DLRN, and radiomics feature explanation at a biological level should be performed in future radiogenomics research.

Informed Consent Statement:
Patient consent was waived due to the retrospective nature of the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.