Recurrence Risk Evaluation in Patients with Papillary Thyroid Carcinoma: Multicenter Machine Learning Evaluation of Lymph Node Variables

Simple Summary The analytic appropriateness and general applicability of lymph node-related risk factors used to predict long-term outcomes in patients with papillary thyroid carcinoma need to be validated. This study aimed to assess detailed lymph node-related risk factors and suggest new risk categories. In the present study, using the K-means clustering algorithm, we determined new cutoffs for lymph node variables besides extranodal extension: 0.2 cm and 1.1 cm for the maximal diameter of metastatic lymph node foci, 4 and 13 for the number of metastatic lymph nodes, and 0.28 and 0.58 for the metastatic lymph node ratio; new lymph node risk categories were suggested. The recurrence-free survival curves of each subgroup classified by these newly determined cutoffs showed significant differences. These newly developed risk categories might be considered when redefining risk stratification or staging systems. Abstract Background: Lymph node (LN)-related risk factors have been updated to predict long-term outcomes in patients with papillary thyroid carcinoma (PTC). However, those factors’ analytic appropriateness and general applicability must be validated. This study aimed to assess LN-related risk factors, and suggest new LN-related risk categories. Methods: This multicenter observational cohort study included 1232 patients with PTC with N1 disease treated with a total thyroidectomy and neck dissection followed by radioactive iodine remnant ablation. Results: The median follow-up duration was 117 months. In the follow-up period, structural recurrence occurred in 225 patients (18.3%). Among LN-related variables, the presence of extranodal extension (p < 0.001), the maximal diameter of metastatic LN foci (p = 0.029), the number of retrieved LNs (p = 0.003), the number of metastatic LNs (p = 0.003), and the metastatic LN ratio (p < 0.001) were independent risk factors for structural recurrence. Since these factors showed a nonlinear association with the hazard ratio of recurrence-free survival (RFS) rates, we calculated their optimal cutoff values using the K-means clustering algorithm, selecting 0.2 cm and 1.1 cm for the maximal diameter of metastatic LN foci, 4 and 13 for the number of metastatic LN, and 0.28 and 0.58 for the metastatic LN ratio. The RFS curves of each subgroup classified by these newly determined cutoff values showed significant differences (p < 0.001). Each LN risk group also showed significantly different RFS rates from the others (p < 0.001). Conclusions: In PTC patients with an N1 classification, our novel LN-related risk estimates may help predict long-term outcomes and design postoperative management and follow-up strategies. After further validation studies based on independent datasets, these risk categories might be considered when redefining risk stratification or staging systems.

Lymph node (LN) metastasis and LN-related variables, such as the number or ratio of metastatic LNs (MLNs), maximal diameter of metastatic LN foci, and extranodal extension, have been investigated as risk factors for target outcomes [1,[3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. The 2015 American Thyroid Association (ATA) guidelines introduced a new risk stratification system based on the number of MLNs (five for low-and intermediate-risk categories) and the maximal diameter of MLN foci (0.2 cm for low-and intermediate-risk categories, and 3.0 cm for intermediate-and high-risk categories) [1]. Although central compartment LN metastases are prevalent in patients with PTC, most international guidelines do not recommend a prophylactic central compartment node dissection (CCND) [1,19,20]. Therefore, the studies identifying detailed LN-related risk factors, particularly for low-and intermediate-risk categories, must have been only from institutions where prophylactic CCND was preferred and might have been limited. Most management and follow-up strategies in patients with PTC are based on individualized estimates of the risk of structural recurrence or RFS rates. Therefore, it is crucial to examine the evidence for the use of existing LN-related risk factors and validate their generalizability.
This study aimed to redefine LN-related risk categories by determining the optimal cutoff values of LN-related variables for RFS rates and to analyze clinicopathological risk factors for structural recurrence in PTC patients with pathological N1 (pN1) who underwent total thyroidectomy (TT) and neck dissection followed by radioactive iodine (RAI) remnant ablation.

Study Population
This observational cohort study included a total of 1232 PTC patients with pN1 who underwent TT and neck LN dissection as an initial surgical procedure and subsequent RAI remnant ablation at Wonju Severance Christian Hospital (Wonju, Korea) (n = 225) and Asan Medical Center (Seoul, Korea) (n = 1007) between 2000 and 2010. Exclusion criteria were as follows: (1) distant metastases detected at initial presentation or within 12 months after initial surgery, (2) patients without a minimum follow-up of 2 years after initial treatment, and (3) patients who underwent thyroid lobectomy.
The clinicopathological characteristics of patients and risk factors associated with structural recurrence were assessed. The LN-related variables included in the statistical analyses were pN1 classification, the number of retrieved and MLNs, the MLN ratio (defined as the number of metastatic LNs divided by the number of retrieved LNs), the maximal diameter of MLN foci, and extranodal extension. Multivariate analysis was used to confirm which LN-related variables were significant. Next, if the significant LNrelated variables were continuous, we evaluated the linearity of the association between the variables and the hazard ratio (HR) of RFS rates to identify the statistical method to be used to determine the optimal LN-related variable cutoffs for the stratification of patients by target outcome.

Surgical Strategy
During the study period, we performed TT for almost all patients with PTC based on the 2009 American Thyroid Association guidelines and preferred prophylactic CCND, at a minimum unilateral CCND, even in the absence of suspicious LNs. We performed bilateral CCND only in patients with bilateral PTC or suspicious LNs on the contralateral side. Additionally, we performed lateral neck dissection only in patients with biopsy-proven lateral LN metastasis.

Radioactive Iodine Remnant Ablation Protocol
Radioactive iodine (RAI) remnant ablation was performed 4-6 weeks after the initial surgery with a thyroid-stimulating hormone (TSH) level higher than 30 mIU/L following thyroid hormone withdrawal or recombinant human TSH administration (2). An RAI activity of 1110 MBq was administered to patients with a multifocal tumor and/or a tumor >1.0 cm without extrathyroidal extension (ETE). An RAI activity of 2960-3700 MBq was administered to patients with any tumor measuring <4.0 cm with ETE and an RAI activity of 5550 MBq was administered to patients with a tumor measuring ≥4.0 cm with or without positive surgical resection margins. At the time of remnant ablation, the serum-stimulated thyroglobulin (Tg) level and the anti-Tg antibody (Ab) level were simultaneously measured. A post-ablation whole-body scan (WBS) was obtained 2 days after the administration of iodine-131.

Postoperative Follow-Up Protocol
All patients were followed at the outpatient clinic every 6 months or year, checking serum Tg/anti-Tg Ab levels and neck ultrasonography during the follow-up period. If the suppressed or stimulated Tg level at 6-12 months after remnant ablation was ≥0.2 ng/mL or ≥1 ng/mL, but neck ultrasonography showed no evidence of disease, 18F-deoxyglucose positron emission tomography (FDG-PET) or chest computed tomography (CT) imaging was considered to rule out the persistent or remnant disease.
We defined structural recurrence as newly identified malignant tissue after a period of no evidence of disease for a minimum of 1 year after initial treatment, confirmed by fine-needle aspiration cytology (FNAC) or surgical biopsy for locoregional lesions and imaging studies for distant lesions.

Statistical Analysis
Descriptive statistics were used to characterize baseline characteristics. Continuous variables are presented as mean ± standard deviation (SD) or median and interquartile range (IQR), and categorical variables as absolute numbers and percentages (%). The t-test and the Mann-Whitney U test were used to compare continuous variables between the two groups. The Chi-squared and Fisher exact tests were used to assess categorical variables. Univariate and multivariate Cox proportional hazard models were used to investigate the relationship between the variables and structural recurrence and to calculate HRs with 95% confidence intervals (CIs). Since the number of retrieved LNs and MLNs, and the MLN ratio were directly related, these were analyzed using separate models. Variance inflation factors (VIF) were used to assess multicollinearity between the independent variables. VIF values exceeding 10 indicated that the relevant variable was highly correlated with the other variables in the model. Restricted cubic spline analysis in the Cox proportional hazard model was used to confirm the linearity of the association between each significant continuous variable and the HR for the RFS rate [21,22]. The K-means clustering algorithm (Supplementary Materials, File S1) was adopted to determine the optimal cutoff values of LN-related continuous variables for the target outcome ( Figure 1) [23].

Baseline Demographic Characteristics
The clinicopathological characteristics of all patients are summarized in Table 1. The median follow-up duration was 117 months (53-134 months). Two hundred twenty-five patients (18.3%) had a structural recurrence. Of these, 199 patients (16.2%) showed locoregional recurrence, 15 patients (1.2%) showed distant metastasis, and 11 patients (0.9%) showed both locoregional recurrence and distant metastasis. Locoregional recurrence sites were the central compartment in 31 patients, previously undissected lateral LNs in 182 patients, and combined central and lateral LNs in 3 patients. Distant metastases were the lung in 15 patients, lung and bone in 3 patients, bone in 1 patient, and multiple organs in 7 patients. Thirteen patients (1.1%) died from PTC during the follow-up period. All patients with locoregional structural recurrence underwent reoperation, and those with distant metastasis underwent a therapeutic high-dose RAI treatment.

Risk Factor Analysis for Structural Recurrence
Multivariate analysis identified the following as independent risk factors for structural recurrence: male sex (HR, 1 Table 2). In addition, pathologic N classification was not an independent risk factor for structural recurrence (p = 0.521) ( Table 2). VIF values for all variables ranging from 1.070 to 3.122 showed no multicollinearity ( Table 2).

Determination of the Optimal Cutoff Values of LN-Related Risk Factors for RFS and Re-Analysis of Risk Factors for Structural Recurrence
The restricted cubic spline analysis showed nonlinear associations of the LN-related continuous variables with the HR for RFS ( Figure 2). Based on the Kaplan-Meier analyses' highest log-rank test score, the K-means clustering algorithm selects the optimal cutoff values of the LN-related risk factors, showing those with the most significant prognostic impact on the target outcome. The optimal cutoff values were 0.2 cm and 1.1 cm for the maximal diameter of MLN foci, 4 and 13 for the number of MLNs, and 0.28 and 0.58 for the MLN ratio ( Figure 2).

Determination of the Optimal Cutoff Values of LN-Related Risk Factors for RFS and Re-analysis of Risk Factors for Structural Recurrence
The restricted cubic spline analysis showed nonlinear associations of the LN-related continuous variables with the HR for RFS (Figure 2). Based on the Kaplan-Meier analyses' highest log-rank test score, the K-means clustering algorithm selects the optimal cutoff values of the LN-related risk factors, showing those with the most significant prognostic impact on the target outcome. The optimal cutoff values were 0.2 cm and 1.1 cm for the maximal diameter of MLN foci, 4 and 13 for the number of MLNs, and 0.28 and 0.58 for the MLN ratio ( Figure 2). When the LN-related continuous variables were converted into categorical variables using these optimal cutoff values, the following were found to be independent risk factors for structural recurrence: maximal diameter of MLN foci >1.1 cm (HR, 2.16 [CI, 1.21-3.87]; When the LN-related continuous variables were converted into categorical variables using these optimal cutoff values, the following were found to be independent risk factors for structural recurrence: maximal diameter of MLN

Comparison of Long-Term Outcome between the Subgroups Classified According to the Newly Determined Cutoff Values of LN-Related Risk Factors
The patients were categorized into three subgroups according to the newly generated values. Additionally, patients were categorized into three subgroups according to the existing cutoff values for the maximal diameter of MLN foci: ≤0.2 cm and >0.2 cm, and ≤3.0 cm and >3.0 cm.
There were significant differences in RFS rates among subgroups classified using the newly determined cutoff values for each LN-related risk factor mentioned above (all p < 0.001) (Figure 3). Subgroups classified by the presence or absence of extranodal extension also showed a significant difference in RFS rates (p < 0.001) ( Figure 3); however, no significant differences in RFS rates were observed for subgroups classified by the current cutoff values for the maximal diameter of MLN foci (Figure 4).

Proposal of New LN-Related Risk Categories for RFS
We have proposed novel detailed LN-related risk categories. These are summarized in Table 3. In addition, we identified a risk category for the presence of extranodal extension based on its HR compared with the HRs of the other LN-related variables classified as high-risk. There were significant differences in the RFS rates of the newly developed LN-related risk groups (p < 0.001) ( Figure 5). There were significant differences in RFS rates among subgroups classified using the newly determined cutoff values for each LN-related risk factor mentioned above (all p < 0.001) (Figure 3). Subgroups classified by the presence or absence of extranodal extension also showed a significant difference in RFS rates (p < 0.001) ( Figure 3); however, no signif icant differences in RFS rates were observed for subgroups classified by the current cutof values for the maximal diameter of MLN foci (Figure 4).

Proposal of New LN-Related Risk Categories for RFS
We have proposed novel detailed LN-related risk categories. These are summarized in Table 3. In addition, we identified a risk category for the presence of extranodal extension based on its HR compared with the HRs of the other LN-related variables classified as high-risk. There were significant differences in the RFS rates of the newly developed LN-related risk groups (p < 0.001) ( Figure 5).

Comparison of Long-Term Outcome between pN0 Patients and pN1 Patients Classified into Each LN Risk Category
We compared the clinicopathological variables of pN1 patients with those of pN0 patients who underwent TT and neck dissection followed by RAI remnant ablation at Wonju Severance Christian Hospital during the same period (Table 4). We matched the patients with pN0 and pN1 disease for the clinicopathological variables showing significant differences by 1:3 propensity score matching (Table 5). After propensity score matching, we compared the RFS rates of pN0 patients and pN1 patients classified into each LN-related risk category; the LN-related low-risk pN1 patients did not show significantly different RFS rates from pN0 patients, but the intermediate-and high-risk pN1 patients did ( Figure 6).

Discussion
This study proposes novel LN-related risk categories with optimal cutoff values for each detailed LN variable selected by the K-means clustering algorithm, based on a largescale, multicenter cohort of PTC patients with pN1 who underwent TT followed by RAI remnant ablation. Considering the nonlinear associations between LN-related continuous Figure 6. Comparison of the recurrence-free survival rates between pathologic N0 patients and pathologic N1 patients classified into the newly developed lymph node-related risk categories after 1:3 propensity score matching.

Discussion
This study proposes novel LN-related risk categories with optimal cutoff values for each detailed LN variable selected by the K-means clustering algorithm, based on a largescale, multicenter cohort of PTC patients with pN1 who underwent TT followed by RAI remnant ablation. Considering the nonlinear associations between LN-related continuous variables and the HR of RFS, the K-means clustering algorithm generated and internally validated the optimal cutoffs for the HR of RFS, which were 0.2 cm and 1.1 cm for the maximal diameter of MLN foci, 4 and 13 for the number of MLNs, and 0.28 and 0.58 for the MLN ratio. Significant differences in the RFS rates were found among the subgroups categorized according to these cutoff values.
The current ATA initial risk stratification system estimates additional LN risk factors, namely the maximal size of MLN (with cutoffs of 0.2 cm and 3.0 cm) and the number of MLNs (with a cutoff value of five) [1]. Despite not being included in the guidelines, other LN-related risk factors, such as the number of retrieved LNs, extranodal extension, and the MLN ratio, have been found to have a prognostic impact [3,4,14,18]. We reviewed the evidence used to identify the recent LN-related risk factors. We focused on the optimal inclusion of clinicopathological variables to adjust risk factors and aimed to use the appropriate tool for modeling continuous variables and the appropriate statistical method to determine the optimal cutoff values and their numbers.
For the maximal size of MLN foci, a study involving 604 patients undergoing initial surgery for PTC > 1 cm demonstrated that large nodal metastasis ≥ 3 cm was an independent risk factor for both DSS and disease-free survival (DFS) in patients aged ≥ 50 years [8].
Another study compared the prognoses of 621 patients with N1b disease with those of 4297 patients with N0 and 125 patients with N1a, revealing that MLNs > 3 cm independently affected both DSS and DFS [10]. In these studies, there were deficiencies in the modeling tool for continuous variables and the statistical method for determining the cutoff value and its number. These studies dichotomized continuous variables with each single cutoff value, including five for the number of MLNs and 3.0 cm for the largest size of MLN, but did not specify how to determine the cutoff values. As these methods are likely to either weaken the model's predictive ability or show a poor association between continuous variables and the target outcome, they are not recommended in the field of medicine [21,22]. Additionally, using a single cutoff may have resulted in the inadequate stratification of patients for the target outcome in models with a nonlinear association. Moreover, these studies did not include the extent of thyroidectomy as a variable in the risk factor analysis, and the latter study examined only four LN-related variables in its risk factor analysis [8,10].
Furthermore, large nodal metastasis ≥3 cm was an independent risk factor for DSS and DFS only in patients aged ≥ 50 years in the former study and only in the N1b subset in the latter study, resulting in issues with the general applicability of these results [8,10]. A further study reported that patients with MLN <0.2 cm showed significantly lower locoregional recurrence than their counterparts (5% vs. 32%, p = 0.015) [24]. However, this was a low-volume study that adopted the definition of LN micrometastases commonly used in breast cancer and other solid tumors but not used for thyroid cancer. A recent study based on 398 patients with pN1a classic PTC demonstrated that LN metastasis < 0.35 cm was an independent risk factor for long-term structural recurrence [3]. Although the investigators treated the variable as a continuous variable, the cutoff was determined based on a linear association between the variable and the target outcome, without considering the possibility of a nonlinear association. Therefore, the generalizability of the cutoff and its number may be challenged.
The prognostic impact of the number of MLNs on DSS and DFS has been previously investigated [8,10]. Two studies found that MLNs ≥5 were an independent risk factor for DFS only in younger patients (<50 and <55 years, respectively), but not for DFS in older patients and DSS [8,10]. Another study involving 148 consecutive patients with PTC with LN metastases examined the significance of the number of MLNs and concluded that a number of metastatic LNs >10 was an independent risk factor for persistent but not recurrent disease [11]. A previous study involving 398 patients with pN1a classic PTC analyzed the number of MLNs as a continuous variable, revealing that a number of MLNs ≥ 4 was an independent risk factor for long-term structural recurrence with significant differences in the RFS rate between the subgroups determined by the cutoff [3]. In addition to the fact that the prognostic significance of these studies was restricted to the specific patient cohort, there were further limitations concerning the modeling tool of continuous variables, the statistical method applied to determine the cutoff values, and the consideration of the possibility of a nonlinear association between variables and outcomes.
Another potential LN-related risk factor as a categorical variable is the MLN ratio. Most studies found a significant association between different MLN ratio thresholds ranging from 0.3 to 0.86 and poor RFS rates [5][6][7]9,12,[14][15][16][17]. However, these studies determined only one cutoff value for the MLN ratio without considering the possibility of a nonlinear association between variables and target outcomes.
A further study based on 305 patients with PTC compared the risk of persistent/recurrent disease (PRD) between pNx, pN0, microscopic pN1, and macroscopic pN1 groups focusing on the prognostic value of microscopic LN involvement (25). The study showed that macroscopic (p < 0.001) and microscopic (p < 0.02) pN1 classification systems were independent risk factors for PRD. Furthermore, the disease-free survival rate significantly decreased from pNx (98%), pN0 (93%), and microscopic pN1 (89%) to macroscopic pN1 patients (70%) (p < 0.001), demonstrating the presence of microscopic LN involvement is associated with an intermediate outcome. The study also comprehensively evaluates LN variables, showing that the size of the largest MLN foci (0.2-1.0 cm and >1.0 cm), the number of MLNs (1-5 and >5), and the presence of extranodal extension were independent risk factors for PRD. Although these results were similar to ours, the study did not avoid the modeling and statistical errors mentioned above. There were further limitations concerning the small patient number (n = 305), short follow-up duration (4 years), and ambiguous and overlapping classification of microscopic (0.2-1.1 cm) and macroscopic (0.7-7.0 cm) LN metastases.
The present study was a large-scale, multicenter study aiming to determine a novel LN-related risk stratification system. Unlike previous studies, the present study included sufficient clinicopathological variables to analyze risk factors for long-term structural recurrence and analyzed continuous variables as a continuous variable to avoid weakening the model's predictive ability. In addition, we excluded multicollinearity between the variables using the VIF test. As we evaluated the LN-related risk factors, we did not differentiate between N1a and N1b since the N classification was not an independent risk factor for structural recurrence, which was consistent with the 8th edition tumor, node, and metastasis staging system classifying N1a and N1b together as N1 [25]. We used the restricted cubic spline analysis to assess the association between the continuous variables and the HR for RFS and identified their nonlinear association [21,22].
Further, we determined the optimal cutoff values for generalizability and consistency with the existing risk stratification system using the K-means clustering algorithm, which automatically selected the optimal cutoff values and their numbers through machine learning [23]. The K-means clustering algorithm first determined the potential cutoff values of LN-related continuous variables which were able to stratify patients for the HR of RFS in one thousand sets of training and test groups that were randomly classified. The algorithm then selected the optimal cutoff values with the highest log-like test score for each LN-related continuous variable. This algorithm could simultaneously determine and validate the optimal cutoff values. Interestingly, some subgroups of LN risk factors sorted by their optimal cutoffs were not independent risk factors for structural recurrence, irrespective of the fact that each LN risk factor subgroup showed significantly different RFS rates. The Cox proportional hazard regression analysis used to assess risk factors imposes a stringent assumption that the association between variables and target outcomes must be linear [21,22]. Therefore, the significance of the continuous variable analyzed as a continuous variable may have been inconsistent with that found when the variable was analyzed as a categorical variable, particularly in models with a nonlinear association with the target outcome.
We acknowledge the limitations of our study, including inherent bias due to its retrospective nature and the preference of the institutions involved for prophylactic CCND, which many international guidelines do not recommend. However, prophylactic LN dissection allowed us to evaluate all of the LN variables in detail. Additionally, even though the exact causes were uncertain, the patient cohort in our study showed relatively high proportions of ETE and N1b classification, which may be indicative of aggressive disease, resulting in a relatively high recurrence rate. However, the clinicopathological status of actual patients undergoing thyroidectomy may be different from those observed in the patients in this study. Moreover, the extent of thyroidectomy and the use of RAI remnant ablation for patients with PTC have shifted toward a more conservative direction, particularly for low-and intermediate-risk patients. Therefore, it is essential to conduct further large-scale, multicenter studies to determine appropriate cutoff values for LN variables based on data from patients undergoing thyroid lobectomy only or total thyroidectomy without RAI therapy.
The present study was a large-scale, multicenter study based on adequate clinicopathological information, and all enrolled patients had a minimum follow-up duration of 2 years after the initial surgery. Additionally, we adopted a modeling tool suitable for continuous variables and used a new statistical method to assess the optimal cutoff values of continuous variables in the model with a nonlinear association with the target outcome. However, the clinical implications of our study and the generalizability of our results need to be corroborated using independent validation datasets.

Conclusions
In PTC patients with N1 classification, we propose novel LN-related risk categories, which could be used to predict long-term outcomes and design targeted postoperative management and follow-up strategies for thyroidectomy patients with N1 PTC. After additional validation studies based on independent datasets, this novel LN-related risk stratification system might be taken into consideration when risk stratification or staging systems are redefined.

Informed Consent Statement:
The requirement for informed consent was waived because there was no intervention.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to the privacy of enrolled patients.

Conflicts of Interest:
The authors declare no conflict of interest.