Machine Learning Model to Stratify the Risk of Lymph Node Metastasis for Early Gastric Cancer: A Single-Center Cohort Study

Simple Summary
Endoscopic resection (ER) is a treatment option for clinically T1a early gastric cancer (EGC) without suspicion of lymph node metastasis (LNM). In patients with non-curative resection after ER, additional surgery is recommended owing to the LNM risk. However, among patients treated with additional surgery after ER, the actual rate of LNM is only about 5–10%; that is, the remaining patients undergo unnecessary surgery. Therefore, it is crucial to estimate LNM risk in EGC patients to determine additional management after ER. We derived a machine learning (ML) model to stratify the LNM risk in EGC patients and validated its performance. The constructed ML model, which showed good performance with an area under the receiver operating characteristic curve of 0.85 or higher, could stratify LNM risk into very low (<1%), low (<3%), intermediate (<7%), and high (≥7%) risk categories. These findings suggest that the ML model can stratify the LNM risk in EGC patients.

Abstract
Stratification of the risk of lymph node metastasis (LNM) in patients with non-curative resection after endoscopic resection (ER) for early gastric cancer (EGC) is crucial in determining additional treatment strategies and preventing unnecessary surgery. Hence, we developed a machine learning (ML) model and validated its performance for the stratification of LNM risk in patients with EGC. We enrolled patients who underwent primary surgery or additional surgery after ER for EGC between May 2005 and March 2021. Additionally, patients who underwent ER alone for EGC between May 2005 and March 2016 and were followed up for at least 5 years were included. The ML model was built based on a development set (70%) using logistic regression, random forest (RF), and support vector machine (SVM) analyses and assessed in a validation set (30%). In the validation set, LNM was found in 337 of 4428 patients (7.6%). Among the total patients, the area under the receiver operating characteristic curve (AUROC) for predicting LNM risk was 0.86 in the logistic regression, 0.85 in RF, and 0.86 in SVM analyses; in patients with initial ER, the AUROC for predicting LNM risk was 0.90 in the logistic regression, 0.88 in RF, and 0.89 in SVM analyses. The ML model could stratify the LNM risk into very low (<1%), low (<3%), intermediate (<7%), and high (≥7%) risk categories, and these categories were consistent with the actual LNM rates. We demonstrate that the ML model can be used to identify LNM risk. However, this tool requires further validation in EGC patients with non-curative resection after ER before actual application.


Introduction
Early gastric cancer (EGC) describes a gastric tumor confined to the mucosa or submucosa, with or without lymph node metastasis (LNM). Endoscopic resection (ER) is recommended as a minimally invasive treatment for clinically mucosal EGC without suspicion of LNM [1][2][3][4]. In cases of non-curative resection after ER, that is, resection that does not satisfy the expanded criteria of curative resection, additional surgery is recommended, considering the risk of LNM [5,6]; however, LNM is found in only 5–10% of those patients after surgery [7][8][9][10]. Therefore, overtreatment is a concern. To address this, the recently revised guidelines excluded piecemeal resection and a positive lateral margin from the factors of non-curative resection after ER for which additional surgery is primarily recommended [1,4,11].
Furthermore, in Japan, patients who have non-curative resection after ER, excluding piecemeal resection and a positive lateral margin, are classified as "endoscopic curability (eCura) C-2"; patients in the eCura C-2 category are further stratified into low (2.5%), intermediate (6.7%), and high (22.7%) LNM risk categories based on the eCura scoring system [2,12,13]. In the low-risk category, there is no difference in cancer recurrence or cancer-specific mortality between patients who undergo no additional treatment and those who undergo additional surgery [14]. Hence, this LNM risk stratification system suggests that additional surgery after non-curative resection may be determined on an individual basis, considering the LNM risk, the patient's condition, and the benefits and limitations of additional surgery [11,12,14].
Another area of concern is that some patients with confirmed non-curative resection after ER but without actual LNM may be unnecessarily exposed to surgery-related risks. The rates of postoperative complications and overall mortality after gastric cancer surgery are 10–26% and 0.3–2.3%, respectively, and comorbidities, body mass index, and lymph node dissection have been reported as risk factors [15][16][17][18][19][20][21]. In addition, the potential for long-term health problems after gastric cancer surgery, such as reflux, gastroparesis, gallstones, and osteoporosis, must be considered [22,23]. Therefore, it is clinically significant to predict the LNM risk among EGC patients who undergo non-curative resection after ER to prevent unnecessary surgery.
To stratify the LNM risk in EGC patients, we created a machine learning (ML) model for predicting LNM risk and validated its performance.

Patients
We included patients who underwent surgery for EGC between May 2005 and March 2021 at Samsung Medical Center. Additionally, patients who underwent additional surgery after ER owing to complications or non-curative resection were included. Moreover, patients who underwent ER alone for EGC without surgery between May 2005 and March 2016 were included and followed up for at least 5 years. After excluding patients with missing data, a total of 14,760 patients who underwent surgery (n = 12,631) or ER alone (n = 2129) were included (Figure 1). The patients were randomly divided into the development set (70%) and validation set (30%).
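The random 70/30 division described above can be illustrated with a minimal sketch. The function name `split_cohort`, the seed, and the integer patient identifiers are assumptions made for illustration; the text states only that the split was random.

```python
import random


def split_cohort(patient_ids, dev_fraction=0.7, seed=0):
    """Randomly split patient identifiers into development and validation sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


# Hypothetical identifiers 0..14759 standing in for the 14,760 included patients.
dev_ids, val_ids = split_cohort(range(14760))
print(len(dev_ids), len(val_ids))
```

With 14,760 patients, a 70% fraction gives set sizes that match the 10,332 and 4428 patients reported for the development and validation sets.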

Definition, Outcome, Data Sources, and Study Variables
LNM was defined based on surgical specimens of patients who underwent surgery. In patients who underwent ER alone, regional LN recurrence was determined based on computed tomography scans during follow-up.
The study outcomes were the establishment of the ML model for predicting LNM risk in EGC patients and the validation of its performance. We divided the entire cohort into a development set (70%) for derivation of the ML model and a validation set (30%) for validation. Since the actual target participants were patients treated with ER for EGC, the performance of the ML model was evaluated for total patients and initial ER patients, respectively, using three methods in the development set and validation set. First, the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity of the ML model were analyzed. Second, we assessed whether the ML model could stratify the risk of LNM into very low-, low-, intermediate-, and high-risk categories. In the development set, we listed the predicted values calculated by the ML model and selected cutoffs at the points where the actual LNM rates were 1%, 3%, and 7%. An actual LNM rate <1% was allocated to the very low-, <3% to the low-, <7% to the intermediate-, and ≥7% to the high-risk category. The 3% and 7% criteria for the low-, intermediate-, and high-risk categories were based on the previous literature [12]. Additionally, we set a very low-risk category with a predicted LNM risk of <1%. This ML model for stratifying LNM risk was applied to the total patients and patients with initial ER in the validation set. Third, we evaluated the ability of the ML model to discriminate patients with negligible risk of LNM at a high-sensitivity cutoff of 100% to predict LNM. From a clinical perspective, the utility of a risk score depends on its ability to discriminate patients at low risk for LNM; that is, ideally it separates patients who do not need surgery from those who do.
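The second evaluation method, assigning each patient to a risk category from the model's predicted value and then checking the observed LNM rate per category, can be sketched as follows. The cutoff values and toy data are hypothetical; the study's cutoffs were derived from the development set and are not reproduced here.

```python
from bisect import bisect_right

CATEGORIES = ("very low", "low", "intermediate", "high")


def assign_category(predicted, cutoffs):
    """Map a predicted value to a category using three ascending cutoffs.

    A value below the first cutoff is "very low"; a value at or above the
    last cutoff is "high".
    """
    return CATEGORIES[bisect_right(cutoffs, predicted)]


def observed_rates(predictions, outcomes, cutoffs):
    """Return {category: (n_patients, n_lnm, observed_rate)}."""
    counts = {c: [0, 0] for c in CATEGORIES}
    for p, y in zip(predictions, outcomes):
        c = assign_category(p, cutoffs)
        counts[c][0] += 1
        counts[c][1] += y
    return {c: (n, k, k / n if n else None) for c, (n, k) in counts.items()}


# Hypothetical cutoffs and toy data (1 = LNM present, 0 = absent).
cutoffs = (0.05, 0.15, 0.30)
preds = [0.01, 0.02, 0.10, 0.12, 0.20, 0.25, 0.40, 0.90]
lnm = [0, 0, 0, 0, 0, 1, 1, 1]
rates = observed_rates(preds, lnm, cutoffs)
for category in CATEGORIES:
    print(category, rates[category])
```

Comparing the observed rate in each category with its nominal bound (<1%, <3%, <7%, ≥7%) is the check applied to the validation set.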

Non-curative resection was defined as not satisfying the expanded criteria for curative resection.
The expanded criteria for curative resection were en bloc resection, negative horizontal and vertical margins, absence of lymphovascular invasion, and one of the following: (a) differentiated mucosal cancer without ulcerative lesions, regardless of the tumor size; (b) differentiated mucosal cancer with ulcerative lesions that were ≤3 cm in size; (c) undifferentiated mucosal cancer without ulcerative lesions that were ≤2 cm in size; or (d) differentiated cancer invasion to the submucosa <500 µm from the muscularis mucosa that was ≤3 cm in size.
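The expanded criteria listed above amount to a small decision rule, which the sketch below encodes. The `Lesion` record, its field names, and the depth labels are assumptions for illustration; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    en_bloc: bool                  # resected in one piece
    margins_negative: bool         # both horizontal and vertical margins negative
    lymphovascular_invasion: bool
    differentiated: bool           # differentiated vs. undifferentiated histology
    depth: str                     # "mucosa", "sm1" (<500 µm), or "sm2_3"
    ulcer: bool
    size_cm: float


def is_curative(lesion: Lesion) -> bool:
    """Return True if the lesion satisfies the expanded criteria for curative resection."""
    if not (lesion.en_bloc and lesion.margins_negative):
        return False
    if lesion.lymphovascular_invasion:
        return False
    if lesion.depth == "mucosa":
        if lesion.differentiated:
            # (a) no ulcer, any size; (b) ulcer and <=3 cm
            return (not lesion.ulcer) or lesion.size_cm <= 3.0
        # (c) undifferentiated, no ulcer, <=2 cm
        return (not lesion.ulcer) and lesion.size_cm <= 2.0
    if lesion.depth == "sm1":
        # (d) differentiated, submucosal invasion <500 µm, <=3 cm
        return lesion.differentiated and lesion.size_cm <= 3.0
    return False


example = Lesion(True, True, False, True, "mucosa", True, 3.5)
print(is_curative(example))  # differentiated mucosal cancer with ulcer, 3.5 cm
```

Under the definition in the text, any lesion for which this rule returns False counts as a non-curative resection.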

Establishment of the Machine Learning Model
The ML model was implemented using three methods to produce an optimal model based on the development set (70%): logistic regression, support vector machine (SVM), and random forest (RF). We constructed the ML model separately in the cohort of total patients and in patients with initial ER, because our actual target population was EGC patients for whom ER was feasible. For hyperparameter optimization of each method, a randomized search algorithm with fivefold nested cross-validation was conducted in the development set. The algorithm randomly sampled the given hyperparameter space 1000 times (Table S1). We selected this search algorithm rather than grid or Bayesian search algorithms because these three methods are fast enough to search all given spaces and have relatively few hyperparameters. For each method, the hyperparameters that yielded the highest AUROC were chosen as the best. The performance of the models with the best hyperparameters was evaluated in the validation set (30%). We set the weighting factor to 14.0 according to the imbalance ratio of the classes. We assessed feature importance by permuting each variable 100 times. The code and models are publicly available at https://github.com/YeongChanLee/Predict-LNM (accessed on 21 February 2022).
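The permutation-based feature importance mentioned above can be sketched in plain Python. This is an illustration, not the authors' implementation: the helper names, the toy model, and the choice of mean AUROC drop as the importance measure are assumptions. Scikit-learn offers a comparable utility, `sklearn.inspection.permutation_importance` with an `n_repeats` parameter, but the text does not say whether it was used.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, column, n_repeats=100, seed=0):
    """Mean drop in AUROC after shuffling one feature column n_repeats times."""
    rng = random.Random(seed)
    baseline = auroc([predict(r) for r in rows], labels)
    drops = []
    for _ in range(n_repeats):
        values = [r[column] for r in rows]
        rng.shuffle(values)
        # Rebuild each row (a tuple) with the shuffled value in the chosen column.
        permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, values)]
        drops.append(baseline - auroc([predict(r) for r in permuted], labels))
    return sum(drops) / n_repeats


def toy_predict(row):
    """Hypothetical model whose score depends only on feature 0."""
    return row[0]


rows = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.7, 2.0), (0.8, 3.0), (0.9, 0.0)]
labels = [0, 0, 0, 1, 1, 1]
importance_0 = permutation_importance(toy_predict, rows, labels, column=0)
importance_1 = permutation_importance(toy_predict, rows, labels, column=1)
print(importance_0, importance_1)
```

A feature the model ignores (feature 1 here) has zero importance, whereas shuffling a feature the model relies on lowers the AUROC.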

Statistical Analysis
Baseline characteristics were compared between the development and validation sets and presented as means (standard deviation) and frequencies (%) for continuous and categorical variables, respectively. The performance of the ML model was evaluated using AUROC, sensitivity, and specificity. The sensitivity and specificity were derived using Youden's index. The risk probability was calculated for the stratification of LNM risk based on the logistic regression, RF, and SVM analyses in the development set. Predicted LNM risk was classified into very low-, low-, intermediate-, and high-risk categories according to the actual LNM rate, with cutoffs of <1%, <3%, and <7%. We analyzed whether the categories of predicted LNM risk correlated with the actual LNM rate. As a subanalysis, the performance of the ML model was compared with that of the eCura system as a clinical model in cases defined as non-curative resection after ER for EGC in the validation set, using AUROC, net reclassification improvement (NRI), and specificity at a high-sensitivity cutoff of 95%. The ML model was developed using Scikit-learn 0.24.1 and Python 3.8.5. Statistical analyses were performed using R (version 3.5.1, Vienna, Austria).
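Youden's index, used above to derive sensitivity and specificity, selects the cutoff that maximizes J = sensitivity + specificity − 1. A minimal sketch on toy data follows; the function name and the rule of keeping the lowest cutoff on ties are assumptions.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A patient is predicted positive when score >= cutoff. On ties in J,
    the lowest cutoff is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]


# Toy predicted values and outcomes (1 = LNM present, 0 = absent).
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(youden_cutoff(scores, labels))
```

On this toy data the selected cutoff is 0.30, with sensitivity 1.0 and specificity 0.75.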

Baseline Characteristics
A total of 14,760 patients were eligible for analysis; 10,332 patients were randomly assigned to the development set and 4428 to the validation set. LNM was found in 794 of 10,332 patients (7.7%) in the development set and 337 of 4428 patients (7.6%) in the validation set. The baseline characteristics of the development and validation sets are shown in Table 1. They were comparable in most variables, including age, sex, number of tumors, size, gross type, differentiation, Lauren classification, depth of invasion, lymphatic invasion, venous invasion, and perineural invasion. However, the middle third of the stomach was the most frequent tumor location in the development set, whereas the lower third of the stomach was the most frequent tumor location in the validation set (p = 0.013).
Table 1 footnotes: † Mean ± standard deviation is presented for continuous variables. Values are expressed as n (%) unless otherwise specified. a p-value calculated using Student's t-test for continuous variables or Pearson's chi-square test for categorical variables for overall data. SM1: submucosal invasion <500 µm from the muscularis mucosa; SM2/3: submucosal invasion ≥500 µm from the muscularis mucosa.

Derivation of the Machine Learning Model
In the development set, LNM was found in 794 of 10,332 patients (7.7%) in the total patients, and in 42 of 2320 patients (1.8%) in patients with initial ER. The derived ML model showed good to excellent performance in the development set; in the total patients, the AUROC (95% CI) of logistic regression was 0. [...]
In the development set, LNM risk was predicted using the ML model (logistic regression, RF, and SVM), and the cutoffs for the very low-, low-, intermediate-, and high-risk categories were set at the predicted values corresponding to actual LNM rates of <1%, <3%, and <7% in the total patients and initial ER patients, respectively (Table 2). As an example, in the total patients, LNM risk was stratified using logistic regression into very low (<1%)-, low (<3%)-, intermediate (<7%)-, and high (≥7%)-risk categories, with the cutoffs determined by the actual LNM rate. These categories showed actual LNM rates of 0.2%, 1.4%, 4.1%, and 18.4%, respectively (Table 2).

(B) Patients with Initial ER (n = 1016) and LNM (n = 24)
In the total patients in the validation set, the specificities of the ML model at the high-sensitivity cutoff of 100% were 49%, 46%, and 49% in the logistic regression, RF, and SVM analyses, respectively. In patients with initial ER, the specificities of the ML model at the high-sensitivity cutoff of 100% were 71%, 57%, and 70% in the logistic regression, RF, and SVM analyses, respectively (Figure 4).
Cancers 2022, 14, x 9 of 12
In the validation set, as a subanalysis in the patients with non-curative resection after ER for EGC, LNM was found in 21 of 362 patients (5.8%). The AUROC of the ML model was 0.76, 0.73, and 0.75 in the logistic regression, RF, and SVM analyses, respectively, and the AUROC of the eCura system was 0.72. Logistic regression (NRI, 0.46) and SVM (NRI, 0.21) improved the performance compared to the eCura system. The specificities of the ML model at the high-sensitivity cutoff of 95% were 39%, 38%, and 38% in the logistic regression, RF, and SVM analyses, respectively, which were higher than the specificity of 9% for the eCura system (Figure S1).
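Specificity at a fixed high-sensitivity cutoff, as reported above for 100% and 95% sensitivity, can be computed by taking the highest cutoff that still reaches the target sensitivity. The sketch below uses toy data; the function name and the tie-breaking rule are assumptions.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity=1.0):
    """Return (cutoff, specificity) at the highest cutoff reaching the target sensitivity.

    A patient is predicted positive when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        if tp / n_pos >= target_sensitivity:
            tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
            return cutoff, tn / n_neg
    raise ValueError("target sensitivity cannot be reached")


# Toy predicted values and outcomes (1 = LNM present, 0 = absent).
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(specificity_at_sensitivity(scores, labels, 1.0))
print(specificity_at_sensitivity(scores, labels, 0.75))
```

At 100% sensitivity, the specificity is the share of LNM-negative patients scoring below the lowest-scoring LNM-positive patient, which is why it indicates how many patients could be spared surgery without missing any LNM.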

Discussion
Here, we demonstrated the utility of an ML model for predicting the LNM risk in EGC patients. In the validation set, each ML model showed good performance, with AUROCs ranging from 0.85 to 0.90. Furthermore, each ML model could stratify the LNM risk as very low, low, intermediate, and high risk, and the stratified groups showed actual LNM rates consistent with their categories. In addition, the models showed specificities of 46–71% at a matched sensitivity of 100%, indicating that they could discriminate patients with negligible risk of LNM while identifying, with 100% sensitivity, the patients who needed surgery owing to the LNM risk. This tool could readily be applied in clinical practice to categorize the LNM risk and identify patients with negligible LNM risk under the assumption of maximum sensitivity.
Non-curative resection after ER in EGC patients is a clinical concern. Physicians determine further strategies after careful consideration of the patient's comorbidities associated with surgical risk, individual preference, and the characteristics of the tumor and surgical procedure. Although additional surgery is performed owing to non-curative resection after ER, the rate of LNM is only 5–10%; hence, among the patients with non-curative resection, it is clinically significant to identify patients at low risk of LNM to prevent unnecessary surgery. The current guidelines have been revised to address these issues and recommend a more detailed strategy after non-curative resection [1,2,4,11]. In the JGCA guidelines (5th edition), among the factors of non-curative resection, piecemeal resection or a positive lateral margin is defined as eCura C-1, and other factors are described as eCura C-2. Based on these classifications, physicians can determine the appropriate therapeutic options, such as additional ER or coagulation for patients in eCura C-1. For eCura C-2, the eCura scoring system was built based on large-scale data and stratifies LNM risk as low (0–1 point), intermediate (2–4 points), or high (5–7 points) [11,12]. In patients in the low-risk category, there is no difference in cancer recurrence or cancer-specific mortality between patients who receive no additional treatment and those who undergo additional surgery [14]. Similarly, studies of patients with early colon cancer after ER have investigated LNM risk using AI systems and clinical guidelines to prevent unnecessary surgery or excess treatment [24][25][26][27]. This reflects the need for detailed guidance on additional strategies through the stratification of LNM risk in EGC patients with non-curative resection after ER; therefore, this study has clinical significance.
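The eCura point ranges quoted above map directly to the three risk categories. The sketch below encodes only that mapping, together with the category-level LNM rates cited in the Introduction; the function name is hypothetical, and the scoring of individual risk factors is not reproduced because it is not given in the text.

```python
# LNM rates per eCura category as cited in the Introduction.
ECURA_LNM_RATE = {"low": 0.025, "intermediate": 0.067, "high": 0.227}


def ecura_category(total_points: int) -> str:
    """Map a total eCura score (0-7 points) to its LNM risk category."""
    if not 0 <= total_points <= 7:
        raise ValueError("eCura total score must be between 0 and 7")
    if total_points <= 1:
        return "low"
    if total_points <= 4:
        return "intermediate"
    return "high"


for points in (0, 3, 6):
    category = ecura_category(points)
    print(points, category, ECURA_LNM_RATE[category])
```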
The strength of this study is that it is the first to develop an ML model to predict LNM in patients with EGC and validate its good performance. Furthermore, our study was based on a large sample size and investigated three models (logistic regression, RF, and SVM) to develop an optimal ML model. Considering that the target participants were patients who underwent ER for EGC, the performance of the ML model was verified not only for the total patients but also the patients who received ER as the initial treatment for EGC. In our study, the very low-risk group had an LNM rate of <1%. This is a stricter category than the classifications of previous reports that defined a low risk of LNM as <3%, including nomograms and the eCura system for predicting LNM in EGC patients [11,28]. In addition to the variables included in the nomogram and the eCura system, our ML model was constructed based on various variables, including the number of tumors, tumor location, Lauren classification, perineural invasion, age, sex, gross type, tumor size, differentiation, depth of invasion, lymphatic invasion, and venous invasion [12,28]. Moreover, we utilized the ability of the ML model to comprehensively interpret various factors by subdividing the data of the variables assessed in previous reports [12,28]. For example, the depth of invasion was subdivided into the lamina propria, muscularis mucosae, SM1, and SM2/3.
We evaluated the performance of the ML model using clinically relevant outcomes. In estimating LNM risk in patients with non-curative resection after ER for EGC, achieving a high sensitivity to predict LNM is essential for long-term outcomes. Furthermore, there is a need to identify patients at low risk for LNM to prevent unnecessary surgery. With logistic regression, our ML model showed specificities of 49% in the total patients and 71% in the patients with initial ER at the high-sensitivity cutoff of 100%. When examining only patients with non-curative resection after ER, our ML model showed specificities ranging from 38% to 39% at the high-sensitivity cutoff of 95%, which is substantially higher than the specificity of 9% for the eCura system. The sensitivity of 95% was set based on the highest sensitivity achieved by the eCura system. Therefore, the ML model has great clinical potential in that it had better specificity than the eCura system at a high-sensitivity cutoff, despite there being no significant difference in AUROC.
This study had several limitations. First, there may be selection bias due to the exclusion of missing data and the study's retrospective nature; however, this study was designed to develop the ML model, including major factors without missing data. Second, this was a single-center study, and the results need to be validated in other institutions. In addition, it is necessary to validate the performance of the ML model in patients undergoing non-