Risk Prediction of Second Primary Endometrial Cancer in Obese Women: A Hospital-Based Cancer Registry Study

Due to the high effectiveness of cancer screening and therapies, the diagnosis of second primary cancers (SPCs) has increased in women with endometrial cancer (EC). However, adequate evidence to support screening for SPCs in endometrial cancer is lacking. This study aimed to develop effective risk prediction models of second primary endometrial cancer (SPEC) in women with obesity (body mass index (BMI) > 25 kg/m²), using data on the incidence of SPEC and its other risk factors in 4480 primary cancer survivors from a hospital-based cancer registry database. We found that obesity plays a key role in SPEC. We used 10 independent variables as predictors; those correlated with obesity should be monitored for the early detection of SPEC in endometrial cancer. Our proposed scheme is promising for SPEC prediction and demonstrates the important influence of obesity and clinical data representation in all cases following primary treatments. Our results suggest that obesity is still a crucial risk factor for SPEC in endometrial cancer.


Introduction
Endometrial cancer (EC) is the most common gynecological malignancy, and its incidence is rising alongside the growing prevalence of obesity [1]. Endometrial cancer affects women worldwide, resulting in an estimated 42,000 deaths annually [2]. EC most commonly occurs after menopause and is related to long-term exposure to unopposed estrogens. On average, the overall 5-year survival rate is around 80%. Overweight (defined as a body mass index (BMI) of at least 25 kg/m²) also represents an important risk factor in 50% of endometrial cancers. A BMI above 25 kg/m² doubles a woman's risk of endometrial cancer, and a BMI above 30 kg/m² triples the risk [3,4]. Therefore, understanding the key mechanisms driving endometrial carcinogenesis in primary endometrial cancer (PEC) may inform the diagnosis of second primary endometrial cancer (SPEC) when efforts are aimed at those at highest risk. An understanding of the correlation between obesity and SPEC is critical in developing such prevention strategies [1].
In the Taiwan Cancer Registry database, nine variables are recorded as clinical prognostic factors of EC: (1) age at diagnosis, (2) grade/differentiation, (3) tumor size, (4) clinical stage group, (5) pathologic stage group, (6) surgical margin involvement at the primary site, (7) date of first surgical procedure, (8) sequence of radiotherapy and surgery, and (9) sequence of locoregional therapy and systemic therapy. In this study, we hypothesized that these factors and BMI are important predictors of SPEC in endometrial cancers. Therefore, the purpose of the analysis was to identify the most important risk factors from the 10 predictors listed in Tables 1 and 2. We suggest potential prevention strategies and demonstrate the need for risk prediction models that identify specific groups of women at particularly high risk of endometrial cancer, for whom risk-reducing interventions are likely to have a significant impact.

Materials and Methods
A hospital-based cohort of 4480 patients diagnosed with endometrial cancer was identified from the database of the Taiwan Cancer Registry from 2009 to 2016. The risk of endometrial cancer by age, grade/differentiation, clinical or pathological stage, and therapy was compared between the obese and non-obese groups. Using different decision tree models, combinations of predictive factors for the conditions of interest were identified. Moreover, a comprehensive clinical prevention approach was associated with all factors.
We aimed to use data mining methods including support vector machine (SVM), linear discriminant analysis (LDA), logistic regression (LGR), C4.5, classification and regression tree (CART), random forest (RF), and C5.0 to predict second primary endometrial cancer in obese women with different variables (Table 3). The classification performance of the seven methods was evaluated using receiver operating characteristic curve analysis to estimate the area under the curve (AUC) (Table 4). Accuracy, sensitivity, and specificity were also considered in this study (Figure 1). SVM classifiers operate by separating two classes using a linear decision boundary called the hyperplane. The hyperplane is positioned to maximize the distance between itself and the nearest instances of each class [5,6]. LDA is a supervised learning algorithm used for dimensionality reduction and classification. It also serves as a feature extraction and data compression technique [7,8].
LGR is the most widely used modeling approach for binary outcomes in epidemiology and medicine. The model is part of the family of generalized linear models that explicitly model the relationship between the explanatory variable X and the response variable Y [9,10]. The C4.5 decision tree is a common machine learning algorithm that selects the attribute at each node of the decision tree based on the concept of information entropy. It adopts a greedy approach in which the decision trees are constructed in a top-down recursive divide-and-conquer manner [11,12]. RF is an ensemble learning method. It generates many classification trees by randomly selecting subsets of the given dataset and subsets of the predictor variables, and finally aggregates the results of all trees to obtain a random forest [13]. The C5.0 decision tree is a classification approach that generates the tree in a top-down scheme based on the given information using a recursive process [14]. CART is a decision tree system that uses a binary recursive procedure to partition the data into homogeneous subsets based on the Gini index. The resulting classification process follows a tree structure comprising a root, nodes, and leaves [15,16].
Several researchers have studied the use of machine learning technologies in developing predictive models for cancers. Shih et al. [17] utilized LDA, C4.5 decision trees, and CART to predict early chronic kidney disease in patients. Tseng et al. [18] investigated the use of SVM to predict the recurrence of cervical cancer. Tseng et al. [19] reported on the use of SVM and RF in predicting risk factors and the recurrence of ovarian cancer.
The important variables and coding data in Table 3, which were collected from the Taiwan hospital registry database, were used in this study.
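The two split criteria named above, information entropy (C4.5) and the Gini index (CART), can be illustrated with a minimal Python sketch. The label values and the example split are hypothetical and are not drawn from the registry data; the function names are ours. Note that C4.5 itself normalizes the information gain shown here into a gain ratio, which this sketch omits.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Sequence[str]) -> float:
    """Information entropy of a node, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent: Sequence[str], left: Sequence[str],
                      right: Sequence[str],
                      impurity: Callable[[Sequence[str]], float]) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node with 4 SPEC and 4 non-SPEC cases, split into two children.
parent = ["SPEC"] * 4 + ["non-SPEC"] * 4
left = ["SPEC"] * 3 + ["non-SPEC"] * 1
right = ["SPEC"] * 1 + ["non-SPEC"] * 3

print(round(impurity_decrease(parent, left, right, gini), 3))     # 0.125
print(round(impurity_decrease(parent, left, right, entropy), 3))  # 0.189
```

A tree-growing algorithm would evaluate many candidate splits in this way and keep the one with the largest decrease, then recurse on each child.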
Based on the literature and discussion with clinicians, we used 10 independent variables that were determined as the risk factors for SPEC as predicting variables.
CART achieved the highest AUC value among the seven methods and therefore provided the prediction model for the obese women (BMI > 25) (Figure 2) in this study.
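For readers unfamiliar with the evaluation measures used here, the following minimal Python sketch shows how accuracy, sensitivity, specificity, and AUC can be computed for a binary classifier. It is an illustration only: the labels and scores are made up, the 0.5 threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), not through the software used in the study.

```python
from typing import Sequence, Tuple


def accuracy_sensitivity_specificity(y_true: Sequence[int],
                                     y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity); assumes both classes occur."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = SPEC, 0 = non-SPEC.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc, sens, spec = accuracy_sensitivity_specificity(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"AUC={auc(y_true, scores):.3f}")
```

Unlike the other three measures, the AUC does not depend on the chosen threshold, which makes it a convenient basis for comparing classifiers.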

Results
During the study period, 520 patients with primary endometrial cancer were diagnosed with SPEC. Figure 2 shows the CART classification tree depicting the predictors of SPEC in endometrial cancer. For CART decision tree stratification, the status of the branches of the tree is based on the priority of all independent variables.
All subjects were divided into 11 subgroups, from the root node to leaf nodes, through different branches. As previously explained, the pathologic stage variable has a strong influence on the interpretation of the SPEC and was therefore identified as the root node of the classified decision tree.
The first-rule decision tree was obtained from the following determining factors: pathologic stage (<II) and surgical margin involvement (Yes); the accuracy obtained was 1.0 across 17 samples. The second-rule decision tree was obtained from the following determining factors: pathologic stage (<II), surgical margin involvement (No), tumor size (<2 cm), clinical stage (≥II), and age at diagnosis (≥50); the accuracy obtained was 1.0 across 32 samples. The fourth-rule decision tree was obtained from the following determining factors: pathologic stage (<II), surgical margin involvement (No), tumor size (<2 cm), clinical stage (<II), and sequence of radiotherapy/surgery (Yes); the accuracy obtained was 0.882 across 17 samples. The fifth-rule decision tree was obtained from the following determining factors: pathologic stage (<II), surgical margin involvement (No), tumor size (<2 cm), clinical stage (<II), sequence of radiotherapy/surgery (No), and sequence of locoregional/systemic therapy (Yes); the accuracy obtained was 0.7 across 20 samples. The eighth-rule decision tree was obtained from the following determining factors: pathologic stage (<II), surgical margin involvement (No), tumor size (≥2 cm), age at diagnosis (<50), sequence of locoregional/systemic therapy (Yes), and clinical stage (≥II); the accuracy obtained was 1.0 across 16 samples.
Therefore, the decision tree could be divided into abnormal (ABNL; SPEC) or normal (NL; non-SPEC) situations. The accuracy ranged from 68.5% to 100% (Figure 2). Five rules are related to the prediction models of SPEC in endometrial cancer in obese women (Table 5). For obese women (BMI > 25 kg/m²), age (≥50 years, p = 0.019), tumor size (≥2 cm, p < 0.001), clinical stage and pathological stage (<II, p < 0.001), surgery (Yes, p = 0.014), and sequence of radiotherapy/surgery (No, p < 0.001) increased the risk of SPEC in endometrial cancer (Table 6).
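The five rules described above can be written as a nested sequence of conditions. The following Python sketch is an illustration under stated assumptions: the `Patient` fields, the integer stage coding (1 = stage I, 2 = stage II, and so on), and the boolean coding of the Yes/No variables are hypothetical and are not the registry's coding scheme. The function only reports which rule a patient matches, together with the accuracy and sample count reported for that rule; it does not assign a SPEC or non-SPEC class.

```python
from dataclasses import dataclass
from typing import Optional

# Reported (accuracy, number of samples) for each rule described in the text.
REPORTED = {1: (1.0, 17), 2: (1.0, 32), 4: (0.882, 17), 5: (0.7, 20), 8: (1.0, 16)}


@dataclass
class Patient:
    pathologic_stage: int          # assumed coding: 1 = I, 2 = II, ...
    clinical_stage: int            # same assumed coding
    margin_involved: bool          # surgical margin involvement (Yes/No)
    tumor_size_cm: float
    age_at_diagnosis: int
    rt_surgery_sequence: bool      # sequence of radiotherapy/surgery (Yes/No)
    locoregional_systemic: bool    # sequence of locoregional/systemic therapy (Yes/No)


def match_rule(p: Patient) -> Optional[int]:
    """Return the number of the matching rule (1, 2, 4, 5, or 8), or None."""
    if p.pathologic_stage >= 2:           # all five rules require stage < II
        return None
    if p.margin_involved:
        return 1
    if p.tumor_size_cm < 2:
        if p.clinical_stage >= 2:
            return 2 if p.age_at_diagnosis >= 50 else None
        if p.rt_surgery_sequence:
            return 4
        return 5 if p.locoregional_systemic else None
    # Tumor size >= 2 cm
    if p.age_at_diagnosis < 50 and p.locoregional_systemic and p.clinical_stage >= 2:
        return 8
    return None


# Hypothetical patient matching the second rule.
example = Patient(1, 2, False, 1.5, 55, False, False)
rule = match_rule(example)
print(rule, REPORTED[rule])  # 2 (1.0, 32)
```

Because each rule corresponds to one root-to-leaf path, the conditions are mutually exclusive and a patient can match at most one rule.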

Discussion
Recent advances in diagnostic and therapeutic methods have increased the overall survival rate of patients with cancers. As cancer survival rates have increased, the incidence of second primary cancers has gradually increased. This phenomenon is attributable to multiple factors, such as genetic or environmental factors and the development of new anti-cancer drugs. In the present study, SPEC was observed in 11.6% of the 4480 patients who had ever been diagnosed with primary endometrial cancer. Obesity (BMI ≥ 30 kg/m²) is the strongest risk factor for primary EC. For every 5 kg/m² increase in BMI, there is a 60% increased risk of EC, with a BMI above 25 kg/m² doubling the risk and a BMI above 30 kg/m² tripling the risk [20]. However, obesity may not be a crucial risk factor in second primary endometrial cancers [21].
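The arithmetic behind these figures can be checked with a short Python sketch. The proportion uses the counts reported in this study (520 SPEC cases among 4480 patients). The extrapolation of the 60% increase per 5 kg/m² to other BMI differences assumes that the risk compounds multiplicatively, which is our assumption for illustration and is not stated in the cited source; the function names are hypothetical.

```python
def spec_proportion(cases: int, cohort: int) -> float:
    """Proportion of the cohort diagnosed with SPEC."""
    return cases / cohort


def relative_risk(delta_bmi: float, rr_per_5_units: float = 1.6) -> float:
    """Relative risk of EC for a BMI difference of delta_bmi kg/m^2.

    Assumes (for illustration only) that the 60% increase per 5 kg/m^2
    compounds multiplicatively across BMI units.
    """
    return rr_per_5_units ** (delta_bmi / 5.0)


print(f"{spec_proportion(520, 4480):.1%}")  # 11.6%
print(f"{relative_risk(5):.2f}")            # 1.60
print(f"{relative_risk(10):.2f}")           # 2.56
```

The threshold-based statements (doubling above 25 kg/m², tripling above 30 kg/m²) summarize risk relative to a reference group and need not coincide exactly with the per-unit figure under this simplified model.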
Currently, there is no benefit to early screening for endometrial cancer as screening is unable to decrease mortality from endometrial cancers; it mainly detects women with low-risk tumors [22]. In literature reviews, increasing age and long-term exposure to unopposed estrogens are strong risk factors for endometrial cancer. Metabolic syndrome (obesity, diabetes) is also a well-known risk factor. It alters the concentrations of insulin-like growth factor and its binding proteins [23]. Estrogen receptor transcriptional activity can be induced by signaling by insulin-like growth factor 1 even in the absence of estradiol, which increases the incidence of endometrial cancer [24][25][26][27]. In our study, obesity seems to be an independent risk factor of primary endometrial cancer. It also plays a key role in the incidence of second primary endometrial cancer [28].
The use of preoperative radiotherapy has been abandoned because it interferes with surgical staging and there is no benefit compared to postoperative radiotherapy [1]. Adjuvant radiotherapy targets the pelvic lymph-node regions that might contain microscopic metastasis, as well as the central pelvic region and the upper vagina. There is a consensus that patients with lesions of surgical stage IA or IB and grade 1 or 2 (low risk) can be treated without postoperative radiotherapy [29]. Isolated pelvic and vaginal recurrences of low-risk endometrial cancers can be successfully treated at the time of recurrence without radiotherapy. Therefore, radiotherapy is usually used in advanced endometrial cancer. In our study, postoperative radiotherapy was associated with an increased risk in the non-obese group but a decreased risk in the obese group.
Endometrial cancer is a surgically staged disease. The most important therapy for endometrial cancer is surgery. Surgical staging provides prognostic information for survivors. In our study, most patients (99.31%) had received surgical intervention for their endometrial cancer. All cases of second primary endometrial cancer occurred among these patients. In our study, in the obese group, patients with early endometrial cancer (stage < II) who had received surgery without radiotherapy and systemic therapy had a higher risk of second primary endometrial cancer at older age (≥50 years).
In the past, we successfully used data mining classification techniques for building a predictive model of early chronic kidney disease [17]. In this study, we successfully applied 10 prognostic factors to determine SPEC risk factors in obese women using data mining algorithms. However, using only Taiwan's local hospital registry database is a limitation, as it may not represent other ethnicities. The tree-based algorithm depended on local consensus to decide the variables used in the predictive modeling. Pooled analysis of international databases is suggested for future studies. Some clinicopathological factors, such as histological type, family history of cancer, timing of chemotherapy exposure, and regimen used, should be included in future analyses. The strength of our current study is its use of a comprehensive Taiwan hospital registry database. Our promising results could guide the creation of predictive models for other gynecologic cancers in the future.

Conclusions
Age (>50 years), BMI (>25 kg/m²), grade/differentiation, cancer stage, and adjuvant therapies were used as prognostic factors of endometrial cancer. In our study, we found that these factors can be used to predict second primary endometrial cancer. Obesity is an independent risk factor of second primary endometrial cancer.
Obese women have a higher risk of endometrial cancer. In this study, the decision tree could be divided into abnormal (SPEC) or normal (non-SPEC) situations in obese women with primary endometrial cancer, with accuracy ranging from 68.5% to 100%. In obese women, we also identified that age at diagnosis, tumor size, clinical stage and pathological stage, surgery, and the sequence of radiotherapy had important impacts on the predictivity of the models, whereas other predictors, such as grade/differentiation, surgical margin involvement and locoregional/systemic therapy, were less important.