Computer-Aided Segmentation and Machine Learning of Integrated Clinical and Diffusion-Weighted Imaging Parameters for Predicting Lymph Node Metastasis in Endometrial Cancer

Simple Summary
Computer-aided segmentation and machine learning based on clinical parameters and diffusion-weighted imaging radiomics added value for predicting nodal metastasis in endometrial cancer, with diagnostic performance superior to criteria based on lymph node size or apparent diffusion coefficient.
Abstract
Precise risk stratification for lymphadenectomy is important in patients with endometrial cancer (EC) to balance the therapeutic benefit against operation-related morbidity and mortality. We aimed to investigate the added value of computer-aided segmentation and machine learning based on clinical parameters and diffusion-weighted imaging radiomics for predicting lymph node (LN) metastasis in EC. This prospective observational study included 236 women with EC (mean age ± standard deviation, 51.2 ± 11.6 years) who underwent magnetic resonance (MR) imaging before surgery between July 2010 and July 2018; they were randomly split into a training set (n = 165) and a test set (n = 71). A decision-tree model was constructed based on the mean apparent diffusion coefficient (ADC) value of the tumor (cutoff, 1.1 × 10⁻³ mm²/s), skewness of the relative ADC value (cutoff, 1.2), short-axis diameter of the LN (cutoff, 1.7 mm), and skewness of the LN ADC value (cutoff, 7.2 × 10⁻²), as well as tumor grade (1 vs. 2 and 3) and clinical tumor size (cutoff, 20 mm). The sensitivity and specificity of the model were 94% and 80%, respectively, for the training set and 86% and 78%, respectively, for the independent test set. The area under the receiver operating characteristic curve (AUC) of the decision tree was 0.85, significantly higher than that of the mean ADC model (AUC = 0.54) and the LN short-axis diameter criteria (AUC = 0.62) (both p < 0.0001). We conclude that combining clinical and MR radiomic parameters yields a prediction model for LN metastasis in EC whose diagnostic performance surpasses that of the conventional ADC and size criteria.


Introduction
Endometrial cancer (EC) is one of the most common gynecological malignancies worldwide, and its incidence has increased in successive generations in countries undergoing rapid socioeconomic transition [1]. Early-stage EC has favorable outcomes [2]; however, the prognosis for patients with lymph node (LN) involvement is considerably poorer. Lymphadenectomy is valuable for defining nodal status and tailoring adjuvant therapy [2]. However, routine lymphadenectomy in patients with EC remains controversial [3,4] because of potential postoperative morbidity and the technical difficulty of the procedure in obese patients. On the other hand, emerging evidence suggests a survival benefit of systematic lymphadenectomy in patients with EC at intermediate or high risk of nodal metastasis [5]. This evidence highlights the importance of precise risk stratification for lymphadenectomy to balance the therapeutic benefit against perioperative morbidity and mortality.
Magnetic resonance (MR) imaging is useful for defining the extent of nodal disease and thereby guiding the anatomic boundaries of lymphadenectomy [6]. However, conventional MR imaging, which uses a short-axis diameter of 10 mm or greater to identify suspicious LNs, achieves only a modest sensitivity of 48% [7]. Diffusion-weighted (DW) imaging has been shown to increase the conspicuity of pelvic LNs [8,9], but the role of apparent diffusion coefficient (ADC) values in predicting LN metastasis in EC remains debatable. The mean [10] and relative [11] ADC values can be considerably lower in metastatic nodes than in benign nodes, but contradictory results have also been reported [9]. These discordant results may be partly explained by the considerable interobserver and intraobserver variability in measuring LN ADC values [12]; because LNs are small, reliable ADC quantification is challenging. To optimize the diagnostic performance of DW imaging in LN staging, the analytical technique should be refined. To achieve reproducible segmentation, whole-tumor volumetric segmentation can be used instead of a selected focal region of interest (ROI), and LNs can be segmented with a computer-assisted method based on objective imaging characteristics. High-throughput radiomic ADC features combined with machine learning have the potential to build a prediction model that serves as a risk stratification tool for lymphadenectomy and guides the extent of surgery by localizing regions of potential LN metastasis.
The aim of this study was to investigate the added value of computer-aided segmentation and machine learning based on clinical parameters and diffusion-weighted imaging radiomics for predicting nodal metastasis in endometrial cancer.

Patients and Imaging Protocol
This prospective observational cohort study included patients diagnosed with EC between July 2010 and July 2018 and managed by a dedicated gynecologic oncology interdisciplinary team at a tertiary referral center. The study was approved by the local institutional review board (approval numbers: IRB101-2187B and IRB103-7316A3), and written informed consent was obtained from all patients. Inclusion criteria were (1) histologically proven, untreated EC for which surgery was scheduled and (2) age ≥ 18 years. Exclusion criteria were (1) MR contraindications (cardiac pacemaker, insulin pump, cochlear implant, or metal shrapnel), (2) pelvic or hip metal prostheses, (3) impaired renal function with an estimated glomerular filtration rate < 60 mL/min/1.73 m², and (4) inability to provide informed consent. A flow diagram of the cohort selection is presented in Figure 1. All imaging examinations were performed on a 3-T MR scanner (Tim Trio; Siemens, Erlangen, Germany) before surgery; the detailed imaging protocol is given in Appendix A.

Image Processing and Feature Extraction
Using in-house software written in MATLAB (version 8.3 R2014a; MathWorks, Natick, MA, USA), we manually contoured the ROIs of the main tumors on DW images. The largest LN in each pelvic region (left and right) was segmented using a computer-aided method (Figure 2); details are described in Appendix A. To improve the reliability of ADC comparisons, normalized ADC (nADC) was computed. Four classes of ADC parameters were extracted: tumor ADC (ADCt), LN ADC (ADCln), the absolute ADC difference between LN and tumor (rADC), and the absolute ADC difference between the tumor mean value and the LN histogram value (rmADC). Each class comprised 12 histogram-derived features: mean, minimum, and maximum pixel ADC (ADCmean, ADCmin, and ADCmax, respectively); 10th-, 25th-, 50th-, 75th-, and 90th-percentile pixel ADC (ADCp10, ADCp25, ADCp50,
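The normalization and histogram steps can be illustrated with a short Python sketch (the authors used MATLAB). It assumes the pixel ADC values of one ROI are available as a flat list. The nADC definition (ROI ADC divided by a reference ADC measured in bladder urine) follows Appendix A; the linear-interpolation percentile rule and the population (biased) skewness estimator are assumptions, because the paper does not state which estimators were used. All function names are hypothetical.

```python
import math
import statistics


def normalize_adc(pixels, reference_adc):
    """nADC: each ROI pixel ADC divided by the reference (bladder urine) ADC."""
    if reference_adc <= 0:
        raise ValueError("reference ADC must be positive")
    return [p / reference_adc for p in pixels]


def percentile(sorted_vals, q):
    """Percentile by linear interpolation between order statistics (assumed rule)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def skewness(vals):
    """Population (biased) Fisher-Pearson skewness; 0.0 for a constant ROI."""
    mean = statistics.fmean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in vals) / len(vals)


def histogram_features(pixels):
    """Histogram-derived features named after ADCmean, ADCmin, ADCmax, ADCp10, ..."""
    vals = sorted(pixels)
    feats = {
        "mean": statistics.fmean(vals),
        "min": vals[0],
        "max": vals[-1],
        "skewness": skewness(vals),
    }
    for q in (10, 25, 50, 75, 90):
        feats[f"p{q}"] = percentile(vals, q)
    return feats
```

For example, `histogram_features(normalize_adc(roi_pixels, urine_adc))` would give the nADC features of one ROI, where `roi_pixels` and `urine_adc` are hypothetical inputs.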

Histopathology
The reference standard is based on final histopathology. All patients underwent a standard surgical procedure. Surgeons with prior knowledge of the MR imaging findings carefully identified any possible metastasis during pelvic lymphadenectomy. The details are described in the Appendix A.

Statistical Analysis
Descriptive statistics were used to summarize the characteristics of the study population. We used the t-test for normally distributed variables, the Mann-Whitney U test for non-parametric continuous data, and the chi-square or Fisher's exact test for categorical data, as appropriate. A weighted decision-tree model based on the classification and regression tree (CART) method was used to build the prediction model for LN metastasis through training/validation and testing. The 236 patients were randomly split into a training set (70%; 16 patients with and 149 without LN metastasis) and a testing set (30%; 7 patients with and 64 without LN metastasis). A decision tree fitted to the region-based training data, initially including all MR parameters, was used for feature selection and to determine the cut-offs of the most appropriate model, the RadScore. Trees were fitted with the rpart [23] package in R, using the default complexity parameter (cp = 0.01) with minsplit = 5 and maxdepth = 4 to control tree size; 10-fold cross-validation repeated 10 times was performed to select the best-fitting tree. The resulting binary RadScore, which assigns each case to a class according to the tree rules, was then combined with clinical parameters to fit a composite tree model, the RadSignature. The success criteria for prediction were high sensitivity and negative predictive value (NPV) with specificity non-inferior to the standard of care, based on metrics in internal validation; model performance was then assessed independently in the testing set after the training/validation step. The quality metrics (sensitivity, specificity, and diagnostic accuracy) of the tree model and of conventional single-parameter models based on ADC values (ADC model) or LN short-axis diameter (SA model) were determined and are presented with 95% confidence intervals.
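The core of CART classification can be illustrated with a stdlib-Python sketch: recursive binary splits chosen to minimize weighted Gini impurity (rpart's default classification criterion), with growth stopped by minsplit and maxdepth. This is only an illustration of the technique, not rpart: cost-complexity pruning (cp), surrogate splits, and cross-validation are omitted, and the positive-class weight `w_pos` is a hypothetical parameter, not the weight used in the study.

```python
def gini(pos_w, neg_w):
    """Gini impurity 2p(1 - p) of a node from summed class weights."""
    total = pos_w + neg_w
    if total == 0:
        return 0.0
    p = pos_w / total
    return 2.0 * p * (1.0 - p)


def node_weights(rows, labels, w_pos):
    """Summed weight of positive (label 1) and negative (label 0) cases."""
    pos = sum(w_pos for i in rows if labels[i] == 1)
    neg = sum(1.0 for i in rows if labels[i] == 0)
    return pos, neg


def best_split(X, labels, rows, w_pos):
    """Return (feature, threshold, impurity decrease) of the best split x <= t."""
    pos, neg = node_weights(rows, labels, w_pos)
    parent = (pos + neg) * gini(pos, neg)
    best = None
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in rows})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0  # midpoint between consecutive observed values
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            lp, ln = node_weights(left, labels, w_pos)
            rp, rn = node_weights(right, labels, w_pos)
            child = (lp + ln) * gini(lp, ln) + (rp + rn) * gini(rp, rn)
            if best is None or parent - child > best[2]:
                best = (f, t, parent - child)
    return best


def grow(X, labels, rows, w_pos=1.0, minsplit=5, maxdepth=4, depth=0):
    """Grow a tree; a leaf predicts the class with the larger summed weight."""
    pos, neg = node_weights(rows, labels, w_pos)
    leaf = {"predict": 1 if pos > neg else 0}
    # Stop at maximum depth (root = depth 0), in small nodes, or in pure nodes.
    if depth >= maxdepth or len(rows) < minsplit or pos == 0 or neg == 0:
        return leaf
    split = best_split(X, labels, rows, w_pos)
    if split is None or split[2] <= 0:
        return leaf
    f, t, _ = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow(X, labels, left, w_pos, minsplit, maxdepth, depth + 1),
        "right": grow(X, labels, right, w_pos, minsplit, maxdepth, depth + 1),
    }


def predict(tree, x):
    """Follow the binary rules from the root to a leaf."""
    while "predict" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["predict"]
```

Raising `w_pos` above 1 makes leaves more likely to predict metastasis, which is how a weighted tree trades specificity for sensitivity on imbalanced data.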
The cut-off values for the ADC and SA models were chosen using the Youden index. The areas under the receiver operating characteristic (ROC) curves (AUCs) were compared among models using the DeLong method. All data were analyzed using SPSS (version 11; SPSS, Chicago, IL, USA), MedCalc for Windows (version 9.2.0.0; MedCalc Software, Mariakerke, Belgium), or R (version 3.4.1). All tests were two-sided, and p < 0.05 was considered statistically significant.
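For the single-parameter models, the Youden index J = sensitivity + specificity − 1 selects the cut-off that maximizes J. The sketch below assumes higher scores indicate metastasis; for ADC, where lower values suggest metastasis, negated values could be passed instead. The AUC is computed as the Mann-Whitney probability that a positive case scores higher than a negative one (ties count one half). The DeLong test for comparing correlated AUCs is not reproduced here, and the function names are hypothetical.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    best = None
    for c in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, c)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (c, j)
    return best


def auc_mann_whitney(scores, labels):
    """AUC = P(positive score > negative score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```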

Demographics
From July 2010 to July 2018, a consecutive cohort of 300 patients was enrolled, of whom 236 (mean age ± standard deviation, 51.2 ± 11.6 years) were eligible for the final analysis. Table 1 lists the clinical and demographic characteristics of the study population. The interval between MR examination and surgery was 27 ± 4 days.

Data Distribution
An average of 27 nodes per patient was harvested from the pelvic sidewalls (range: 0-83; total: 5078). Based on the final pathology reports, 33 of the 472 analyzed regions (7.0%) and 23 of the 236 patients (9.7%) were positive, suggesting sufficient positive and negative cases for model fitting. The 23 patients with pelvic LN metastasis differed significantly in age, histology, tumor grade, tumor size, deep myometrial invasion, and lower uterine segment involvement (Table 1). Patients with metastatic nodes tended to be older (p = 0.004) and to have non-endometrioid histology (p < 0.001), grade 3 tumors (p < 0.001), larger tumors (p = 0.009), deep myometrial invasion (p < 0.0001), and lower uterine segment involvement on MR imaging (p = 0.002). Stepwise multivariate analysis identified non-endometrioid histology and deep myometrial invasion as independent clinical risk factors. Positive LNs also had a significantly larger short-axis diameter (p < 0.0001) and short-to-long axis ratio (p < 0.0001), and the corresponding tumors had significantly lower ADCmean (p < 0.0001) and ADCmin (p = 0.003), but not ADCmax (p = 0.316).
The ADC values of metastatic LNs were significantly lower than those of benign LNs (ADCmean, p = 0.049; ADCmin, p = 0.017). The correlation matrix showed high correlations among the ADC parameters (Appendix A Figure A1). None of the LNs showed lobulated or spiculated margins suggestive of metastasis.

Model Comparison and Subgroup Analysis
A RadScore was built using decision-tree analysis based on the following radiomic parameters: mean ADC value of the tumor (ADCtmean; cutoff, 1.1 × 10⁻³ mm²/s), skewness of the relative ADC value (rADCskewness; cutoff, 1.2), short-axis diameter of the LN (cutoff, 1.7 mm), and skewness of the LN ADC value (ADClnskewness; cutoff, 7.2 × 10⁻²) (Figure 3a). The characteristics of patients according to the RadScore risk group are detailed in Table 2. The diagnostic performance of the models for detecting pelvic LN metastasis is summarized in Table 3. On a per-region basis, the sensitivity of the RadSignature for detecting LN metastasis (100%) was significantly higher than that of the ADC model (44%, p = 0.0001) or the SA model (76%, p = 0.0313). In the testing dataset, its per-region specificity (91%) was also significantly higher than that of the ADC model (75%, p < 0.0001) or the SA model (61%, p < 0.0001). On a per-patient basis, the sensitivity of the RadSignature (100%) was significantly higher than that of the ADC model (59%, p = 0.0156); it was also higher than that of the SA model (88%), although this difference was not statistically significant (p = 0.5). In the testing dataset, the per-patient specificity of the RadSignature (90%) was significantly higher than that of the ADC model (67%, p = 0.0001) or the SA model (41%, p < 0.0001). In ROC analysis, the RadSignature significantly outperformed the ADC and SA models on both a per-region and a per-patient basis.
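The paper does not name the test used to compare paired sensitivities and specificities. Several of the reported p-values (0.5, 0.0156, 0.0313) are, however, consistent with an exact (binomial) McNemar test on 2, 7, and 6 discordant pairs that all favor one model. A minimal sketch of that test follows, offered as an assumption rather than the authors' confirmed method; `b` and `c` are the counts of discordant pairs in each direction (case classified correctly by only one of the two models).

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test for paired binary outcomes.

    Under the null hypothesis, the smaller discordant count follows
    Binomial(b + c, 0.5); the two-sided p-value doubles the lower tail
    and is capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For example, `mcnemar_exact(0, 7)` gives 0.015625 and `mcnemar_exact(0, 6)` gives 0.03125, which round to the reported 0.0156 and 0.0313.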
Pairwise comparisons of the ROC curves for detecting pelvic LN metastasis are summarized in Table 4. The only two false-negative cases of the RadSignature had microscopic tumor nests of 0.8 mm and 3.3 mm, respectively (Figure 4).
Over-diagnosis leads to unnecessary LN dissection, particularly in low-risk patients. We therefore conducted a post-hoc subgroup analysis to investigate possible over- or under-diagnosis in specific risk groups. Subgroups were defined according to the European Society of Gynaecological Oncology-European Society for Medical Oncology (ESGO-ESMO) guidelines [24]: (1) low risk (stage IA, grade 1-2, endometrioid type); (2) intermediate risk (stage IA grade 3 EC, or stage IB grade 1-2 endometrioid type); and (3) high risk (stage IB grade 3 endometrioid type, or non-endometrioid type of any stage and grade). The RadSignature outperformed the ADC and SA models in all risk groups (Table 5). Notably, on a per-patient basis, the RadSignature retained a sensitivity of 100% for detecting LN metastasis in all groups, and its specificity was significantly higher than that of the ADC model (p = 0.0005) or the SA model (p < 0.0001) in the low-risk group. The specificity of the tree model (86%) was also significantly higher than that of the SA model in the intermediate-risk (p = 0.0078) and high-risk (p = 0.0313) groups.
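The subgroup definitions above amount to a small rule set. The following sketch encodes them exactly as written in this section; the function and argument names are hypothetical. The full ESGO-ESMO guideline contains further categories and criteria that are not modelled here, and endometrioid tumors beyond stage IB, which the three definitions do not cover, return `None`.

```python
def esgo_esmo_risk_group(stage, grade, endometrioid):
    """Risk group per the three subgroup definitions used in this study.

    stage: FIGO stage string such as "IA" or "IB"; grade: 1, 2 or 3;
    endometrioid: True for endometrioid histology.
    """
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2 or 3")
    if not endometrioid:
        return "high"  # non-endometrioid type, any stage, any grade
    if stage == "IA":
        return "low" if grade in (1, 2) else "intermediate"
    if stage == "IB":
        return "intermediate" if grade in (1, 2) else "high"
    return None  # not covered by the three definitions
```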

Discussion
In the present study, we combined all available clinical and MR imaging parameters to build a composite prediction model, the RadSignature. The major advantages of decision-tree analysis are the ease of interpreting the tree through binary splitting rules, which balance model accuracy against simplicity and interpretability, and end users' familiarity with the technique. The RadSignature model yielded an excellent NPV (98%); thus, a subset of low-risk patients with EC who may not benefit from lymphadenectomy can be reliably identified. For patients undergoing lymphadenectomy, the prediction model could guide surgery by localizing the likely laterality of nodal metastasis with reasonable accuracy. To the best of our knowledge, this is the first model to predict LN metastasis in EC based on the most comprehensive clinical and radiomic information obtained preoperatively.
Although they were not selected in the decision-tree model, the ADCmean and ADCmin of metastatic LNs were significantly lower than those of benign LNs. This finding is in line with a previous study showing that the ADCmean and ADCmin of metastatic LNs are significantly lower than those of non-metastatic LNs [10]. A recent publication supports this point, demonstrating that lymph node ADC metrics, including ADCmin, ADCmax, ADCmean, ADCSD, and rADC, had high diagnostic value for differentiating metastatic from non-metastatic lymph nodes [25]. However, other studies reported no significant difference in mean ADC values between metastatic and non-metastatic nodes, either at 1.5 T with b = 0 and 800 s/mm² [9] or at 3 T with b = 0 and 1000 s/mm² [8]. These conflicting results might be attributable to bias in manual measurement, which computer-aided segmentation, as used in the present study, may mitigate.
LN short-axis diameter is an important predictor of LN metastasis in EC [10,26] and was selected in the present decision-tree analysis. However, one study reported no size difference between metastatic and non-metastatic nodes on T2-weighted images, despite a significant difference on pathology slices [9]. Such discrepancies point to a potential pitfall of LN segmentation on MR imaging and underline that the computer-assisted segmentation applied in this study could reduce the bias introduced by selecting a small LN ROI. Notably, in our previous work, combining size and relative ADC values yielded higher sensitivity (25% vs. 83%) but similar specificity (98% vs. 99%) for detecting LN metastasis in gynecologic cancers compared with conventional MR imaging, with the smallest detected metastatic LN measuring 5 mm in short axis [11].
In the present study, grade 1 EC with a tumor size < 20 mm reliably excluded nodal metastasis, consistent with previous studies [6,15,17]. Studies have also shown that preoperative assessment based on MR imaging and tumor histological grade can identify patients at low risk of nodal metastasis, in whom lymphadenectomy may be omitted [17,27]. The Mayo-modified criteria (well or moderately differentiated endometrioid histology, <50% myometrial invasion, and tumor size < 20 mm) are also widely applied to assess nodal disease risk in patients with EC [28]. Our data and all of the aforementioned models suggest that tumor grade and size remain central to the preoperative assessment of LN metastasis.
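The low-risk rules discussed above are simple conjunctions of criteria. The sketch below encodes the Mayo-modified criteria as listed in this paragraph, together with the grade 1, < 20 mm subgroup in which this study observed no nodal metastasis. The function names and argument conventions (myometrial invasion as a fraction of myometrial thickness, tumor size in millimetres) are assumptions for illustration.

```python
def mayo_modified_low_risk(grade, endometrioid, invasion_fraction, tumor_size_mm):
    """True if all Mayo-modified low-risk criteria listed in the text are met:
    grade 1-2 endometrioid histology, <50% myometrial invasion, size < 20 mm."""
    return bool(
        endometrioid
        and grade in (1, 2)
        and invasion_fraction < 0.5
        and tumor_size_mm < 20
    )


def grade1_small_tumor(grade, tumor_size_mm):
    """Subgroup in which this study observed no nodal metastasis."""
    return grade == 1 and tumor_size_mm < 20
```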
Our proof-of-concept model, although promising, has several limitations. First, the model may be overfitted because the sample size is small relative to the number of extracted features. Although statistical power was sufficient, and the model was cross-validated and tested on an independent set in this prospective study, our preliminary results must be validated externally before wider adoption in clinical decision-making. Second, the radiomic features were limited to histogram analysis of ADC values and did not include higher-order texture analysis, because 97% of the lymph node regions contained <100 pixels. Third, some imaging characteristics of the LNs, such as LN margin, were not included in the algorithm. Lobulated and spiculated margins indicate metastatic LNs, whereas smooth margins suggest benign LNs; including this information might further improve the prediction model. Finally, the analysis was region-based, and we were unable to establish precise node-to-node radiologic-pathologic correlation. Nevertheless, a strength of the present study is that computer-assisted segmentation could reduce the bias introduced by selecting small lymph nodes on pelvic MR images, and the decision-tree method remains readily interpretable through its binary splitting rules.

Conclusions
In conclusion, computer-aided segmentation and machine learning based on clinical parameters and DW radiomics added value for predicting nodal metastasis in EC, with diagnostic performance superior to that of the current ADC and size criteria. High-throughput radiomic ADC features combined with machine learning have the potential to build a prediction model that serves as a risk stratification tool for lymphadenectomy and guides the extent of surgery by localizing regions of potential LN metastasis.
b-value diffusion-weighted images (DWIs) with reference to the apparent diffusion coefficient (ADC) maps and T2-weighted images. A second reader (G.L., a gynecologic radiologist with 10 years of experience) independently verified the ROIs. Both readers were blinded to clinical outcome. Care was taken to avoid ROIs contaminated by the adjacent normal cervical stroma or vascular structures, or by areas of fluid or Nabothian cysts in the cervix. Normalized ADC (nADC) was computed and used for comparison in this study, defined as nADC = ADC(tumor or LN)/ADC(reference), where the reference ADC was measured in urine with an ovoid ROI placed in the center of the bladder lumen. Because pelvic LNs are too small for accurate manual segmentation on MR images, in-house application software with a graphical user interface (Figure 2), developed specifically for LN segmentation, was used. The largest LNs in the left and right pelvic areas were identified on axial pelvic DWIs together with T2-weighted images (T.Y.S., a radiologist with two years of experience). Each identified LN was then segmented automatically using a region-growing algorithm (https://www.mathworks.com/matlabcentral/fileexchange/19084-region-growing/content/regiongrowing.m; accessed on 7 February 2020). Specifically, a pixel is first selected as the seed of the region, and the region is grown iteratively by comparing neighboring pixels with the region: in each iteration, the neighboring pixel with the smallest intensity difference from the region mean is added to the region. The process terminates when the intensity difference between the region mean and every neighboring pixel exceeds a predefined threshold. Because the procedure is deterministic for a given seed and threshold, the segmentation is fully (100%) reproducible. Segmentation was performed in 2D rather than 3D because the LNs are too small to be segmented in 3D.
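The region-growing procedure described above can be sketched in plain Python on a 2D image stored as a list of lists. It follows the description in the text: start from a seed pixel, repeatedly add the neighboring pixel whose intensity is closest to the current region mean, and stop when that smallest difference exceeds the threshold. The 4-connectivity, the tie-breaking rule, and the function name are assumptions; this is not the cited MATLAB File Exchange code.

```python
def region_grow(image, seed, threshold):
    """Grow a 2D region from seed (row, col); return the set of member pixels."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = float(image[seed[0]][seed[1]])
    frontier = set()

    def add_neighbours(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                frontier.add((nr, nc))

    add_neighbours(*seed)
    while frontier:
        mean = total / len(region)
        # Neighbour closest in intensity to the current region mean; ties are
        # broken by pixel position so that the result is deterministic.
        best = min(frontier, key=lambda p: (abs(image[p[0]][p[1]] - mean), p))
        if abs(image[best[0]][best[1]] - mean) > threshold:
            break  # every neighbour differs from the mean by more than threshold
        frontier.discard(best)
        region.add(best)
        total += image[best[0]][best[1]]
        add_neighbours(*best)
    return region
```

Because pixel selection depends only on the image, the seed, and the threshold, repeated runs with the same inputs return the same region, which is the property the text relies on for reproducibility.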

Appendix A.3. Surgical Procedure and Histopathology
Primary surgical treatment consisted of hysterectomy, bilateral salpingo-oophorectomy, and pelvic LN dissection. Para-aortic LN dissection was performed in patients with a high-risk histopathological type or clinically suspected deep myometrial invasion. The surgeons labeled the resected nodes anatomically as belonging to the left or right pelvic region. The LNs were then cut into parallel slices 2 to 3 mm thick. All nodal tissue was routinely processed, embedded in paraffin, and stained with hematoxylin and eosin. The histopathology report included the total number of harvested LNs and the number of metastatic LNs in each region. Histopathologic types and tumor grades were evaluated by consensus of a general pathologist and a specialized gynecological pathologist (R.C.W.), with relevant clinical information available. A consensus review by surgeons, pathologists, and radiologists, aimed at the most accurate correspondence between MR and histopathology findings, was performed at the weekly gynecologic oncology panel meeting.

Appendix A.4. Statistical Analysis
The prediction task was to predict pathological LN metastasis, with a categorical target variable (presence or absence). The decision tree in this study was grown by recursive binary partitioning; unlike deep learning, a decision tree identifies the important parameters and determines the order of the decision steps, yielding an actionable plan. In general, a decision tree does not achieve the best predictive accuracy among machine learning techniques, but it is interpretable, and its format is consistent with many clinical pathways. A decision tree based on the classification and regression tree method was used to identify the combination of clinical and MR parameters predictive of LN metastasis. Trees were constructed with the R package rpart (R Foundation, http://cran.r-project.org/web/packages/rpart/rpart.pdf; accessed on 8 August 2020). The classification rules are applied sequentially, each rule partitioning one predictor (attribute) into a binary response. Splits were chosen to minimize node impurity, so the selected root attribute could vary with the splitting rule and scaling. The classification and regression tree method automatically identified and removed redundant independent variables. Because underdiagnosis carries the greater practical cost, a weighted algorithm was introduced to learn a model with high sensitivity and acceptable accuracy and specificity from data with an imbalanced LN-positive rate; the selected weight achieved 90% sensitivity. The success criteria for prediction were high sensitivity and NPV while maintaining specificity non-inferior to the standard of care, based on metrics in internal validation. Sample size and power for machine learning methods must be estimated by numerical simulation.
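The impurity criterion and class weighting described above can be written out explicitly. As a sketch, assume the Gini index (rpart's default criterion for classification) and a single case weight w applied to every LN-positive case; the paper reports neither the form nor the value of its weight. A node t containing n_{t,1} positive and n_{t,0} negative cases then has total weight W_t, weighted positive proportion p_t, and impurity G(t):

$$
W_t = w\,n_{t,1} + n_{t,0}, \qquad p_t = \frac{w\,n_{t,1}}{W_t}, \qquad G(t) = 1 - p_t^2 - (1 - p_t)^2 = 2\,p_t\,(1 - p_t)
$$

A candidate split s sends the cases of t to children t_L and t_R, and the split with the largest weighted impurity decrease is chosen:

$$
\Delta G(s, t) = W_t\,G(t) - W_{t_L}\,G(t_L) - W_{t_R}\,G(t_R)
$$

A leaf is labeled positive when w n_{t,1} > n_{t,0}, so raising w above 1 makes leaves more likely to predict metastasis, trading specificity for sensitivity. In this formulation, w is the quantity one would tune until a target sensitivity is reached.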
In general, power can exceed 90% if the sample size per class is more than 4 at a significance level of 5%, for fewer than 10 groups [29]. In the present study, we constrained tree growth (minsplit = 5, maxdepth = 4) to control tree size and avoid over-fitting and low power. No data preprocessing, such as data cleaning, transformation, or outlier removal, was performed. Cases with missing data were excluded from the analysis (casewise deletion).
Cancers 2021, 13, x 13 of 1 Figure A1. The correlation matrix demonstrated a high correlation among the ADC parameters