Next Article in Journal
DeepSIGNAL-ITS—Deep Learning Signal Intelligence for Adaptive Traffic Signal Control in Intelligent Transportation Systems
Previous Article in Journal
Double Receiving Coils Eccentricity Self-Compensating Small-Loop Transient Electromagnetic System Based on Robustness Analysis
Previous Article in Special Issue
Advanced Deep Learning Methods to Generate and Discriminate Fake Images of Egyptian Monuments
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction

1
Department of Computer Science, St. Francis Xavier University, Antigonish, NS B2G 2W5, Canada
2
Department of Physics, St. Francis Xavier University, Antigonish, NS B2G 2W5, Canada
3
Department of Physics, University of Toronto, Toronto, ON M5S 1A7, Canada
4
Nova Scotia Health Authority, Halifax, NS B3H 1V8, Canada
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9397; https://doi.org/10.3390/app15179397 (registering DOI)
Submission received: 8 May 2025 / Revised: 30 July 2025 / Accepted: 23 August 2025 / Published: 27 August 2025
(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization)

Abstract

Differentiated thyroid cancer (DTC) poses significant management challenges due to the variable risk of recurrence. This study uses a dataset comprising clinical, pathological, and treatment data from 383 patients to develop and validate machine learning models, combined with feature selection algorithms, for predicting differentiated thyroid cancer recurrence. We evaluated models based on a variety of machine learning technologies (light gradient boosting machine, random forest, k-nearest neighbor, logistic regression, stochastic gradient descent, and an emerging deep learner optimized for tabular data: Gandalf) combined with several feature selection methods. Our feature selection technologies include an emerging redundancy-aware wrapper-based feature selection technique, achieving thyroid cancer recurrence prediction accuracy of 94.8 to 95.9% across two validation methods, based only on whether the patient’s tumor’s response was structurally incomplete, whether their tumor’s stage was advanced (III, IVA, or IVB), and the patient’s age. The results underline the potential for machine learning to enhance the precision of recurrence prediction in DTC while developing technologies whose predictive capacity is more easily explained. Using the same dataset, machine learning and feature selection techniques, this study also provides an analysis on predicting American Thyroid Association (ATA) risk scores. The technologies developed as part of this study have potential for improving the personalization of healthcare through the creation of models based on detailed patient-specific clinical attributes.

1. Introduction

1.1. Overview

Thyroid cancer is the fifth most common cancer in women in 2015, and there are approximately 62,000 new cases a year for men with the most common type being differentiated thyroid cancer (DTC) [1]. The majority of thyroid cancers are well differentiated, having been derived from thyroid follicular cells [2]. Well-differentiated thyroid cancers occur in papillary, follicular, and oncocytic forms, representing 84%, 4%, and 2% of all thyroid cancers, respectively [2]. These cancers are usually curable when found early and in younger patients [2]. However, some patients may experience recurrence or metastasis (meaning it comes back after treatment, possibly having spread to another tissue), which can negatively affect patient survival and quality of life [3]. Technologies that predict recurrence in patients with well-differentiated thyroid cancer can be useful prognostic tools, which can help identify patients who would most benefit from rigorous surveillance and monitoring, and potentially more aggressive treatment strategies.

1.2. Clinical Management

The current method for treating thyroid cancer is surgery, radioactive iodine therapy, and thyroid-stimulating hormone (TSH) suppression therapy [4]. The extent of surgery, and the need for radioactive iodine therapy, depend on the size of the tumor, its location, how aggressively it grows, etc. Surveillance includes serum thyroglobulin (Tg) measurements, TSH-stimulated Tg test, ultrasound, and more [5,6]. The amount of surveillance relied upon depends on many clinical factors [6].
The current method for prognosis uses the American Thyroid Association (ATA) Risk Stratification System, which is often referred to as the ATA risk score. The 2015 ATA guidelines help doctors decide when to perform biopsies on thyroid nodules by classifying them into risk categories. However, there is some concern that these guidelines might underestimate the risk for more serious nodules, and doctors often vary in their assessment of risk [7]. A recent study found that a microRNA-based risk score is more accurate than the traditional ATA score at predicting the progression of thyroid cancer, suggesting it could be a better tool for prognosis [8]. Research also suggests that the ATA’s initial risk predictions can be significantly refined by monitoring how patients respond to their first two years of treatment, which helps tailor follow-up care more effectively [9].

1.3. Literature Review

Machine learning has been used in thyroid cancer diagnosis and management in applications such as detection, classification, prognosis, and treatment [10]. Machine learning models have also been used to predict the outcomes of thyroid cancer patients based on ultrasound-derived features [11]. AI models have been trained to predict the risk of recurrence in patients with well-differentiated thyroid cancer [3]. These models have shown promising improvements over traditional risk assessment methods [3]. AI has considerable potential to assist in thyroid cancer management, and AI technologies have helped with challenges like inaccuracy in predicting prognoses and intra-observer variability in ultrasound imaging. AI’s role in improving diagnostic accuracy and moving toward personalized treatment is becoming more and more prevalent as outlined in the literature [12]. Specifically, AI’s application in thyroid cancer diagnosis has improved sensitivity and specificity, providing new analytic approaches for indeterminate nodules [13]. AI has tremendous potential in improving our approach to personalized medicine, especially in differentiating follicular neoplasms and refining prognoses, where these technologies demonstrate potential to revolutionize thyroid cancer care [10]. AI can even match or outperform less experienced doctors in evaluating thyroid nodules through ultrasonographic diagnosis, making the resultant diagnoses less subjective [14]. AI not only supports radiologists in delivering more accurate diagnoses but can also improve diagnostic efficiency and reduce clinician workload [15]. Considerable research has focused on applications of AI on thyroid cancer recurrence [16,17,18,19,20,21,22,23,24,25,26,27,28,29], applying a variety of machine learning and sometimes feature selection algorithms.
The leading publication addressing the dataset relied upon in this study focused on machine learning for the prediction of thyroid cancer recurrence, “Machine learning for risk stratification of thyroid cancer patients: a 15-year cohort study” [3]. Their analysis included a comparison of many machine learning algorithms, including k-nearest neighbor (KNN), support vector machine (SVM) tree-based models, and artificial neural networks (ANNs) with 283 samples used for training and 100 for validation. This study trained the learners three times targeting recurrence: once with all the data available, once with all the data but without the American Thyroid Association (ATA) risk score, and once with just the ATA risk score. When training on all feature measurements available in this dataset, their learning machines achieved their top predictive performance for thyroid cancer recurrence with accuracies of 96% for the SVM, 93% for the k-nearest neighbor, 96% for the decision tree, 95% for the random forest, and 96% for the artificial neural network. When the ATA measurements were excluded, models achieved improved specificity but a decrease in sensitivity, whereas when using only the ATA score, they observed an increase in sensitivity and a decrease in specificity.
Thyroid cancer recurrence prediction with machine learning, based on this dataset, has been the focus of additional research articles [18,20,22,23,27,28,30,31]. One study was focused on the development of explainable deep learning technology for thyroid cancer prediction [30]; however, they did not report predictive accuracies on held-out testing data, and so their results cannot be compared with our analysis. An additional study found that the XGBoost algorithm achieved 98% accuracy in predicting thyroid cancer recurrence [31]. Ensemble methods have also been developed for this application [27,28]. Approaches have also been developed that utilize synthetic minority oversampling (SMOTE) methods [18,20,23]; however, it should be noted that SMOTE analyses create synthetic data, and so performance metrics, such as accuracy, should not be directly compared with studies that rely purely on real-world data. Finally, an approach that relies upon few features in order to support more explainable AI was developed, which relied on distance correlation to select for response, risk, tumor grade and nodal status upon which to inform thyroid cancer recurrence prediction, achieving accuracies of 94.8 to 96.1% [22].

1.4. Problem Statement, Hypotheses and Contributions

Thyroid cancer recurrence is a major health problem, and so the creation of technologies that may assist in accurate explainable predictions of thyroid cancer recurrence and risk have the potential to improve the standard of patient care. In this analysis, we hypothesize that a thorough assessment of a wide variety of machine learning and feature selection algorithms, inclusive of a novel and emerging redundancy-aware based feature selection approach, performed with public domain software, will assist in the creation of high-performing models for the prediction of thyroid cancer recurrence. We further hypothesize that by reducing the input features relied upon to inform prediction with feature selection algorithms, we can create high-performing models whose functionality is easier to explain through the reduced feature set. By identifying a reduced feature set relied upon for high-performing AI models, we can also inform the broader research community as to combinations of clinical variables that are highly predictive of thyroid cancer recurrence. We also hypothesize that machine learning can be relied upon to predict American Thyroid Association (ATA) risk scores. The main contributions of this thyroid cancer-focused manuscript are the consideration of novel redundancy-aware step-up feature selection, the consideration of an emerging deep learning algorithm optimized for tabular data (Gandalf), and the consideration of two applications in ATA risk score prediction in addition to the standard focus on thyroid cancer recurrence prediction.

2. Materials and Methods

As an overview, this manuscript will introduce the dataset the analysis is based upon (Section 2.1); introduce the learning technologies considered (Section 2.2); outline the statistical metrics employed for technology evaluation (Section 2.3); present the results for thyroid cancer recurrence prediction (Section 3.1) and ATA risk prediction (Section 3.2 and Section 3.3); discuss the thyroid cancer recurrence prediction results (Section 4.1), the binarized ATA risk prediction results (Section 4.2), and the regression based ATA risk prediction results (Section 4.3); comparatively discuss the learning machines addressed in this analysis (Section 4.4); and conclude with a discussion of strengths, limitations, and future work (Section 4.5).

2.1. Dataset Description

The dataset relied upon in this analysis is publicly available [32] under a Creative Commons 4.0 International Attribution License. The dataset was obtained from patients with differentiated thyroid cancer (DTC) and included a minimum of 10 years of follow-up, as part of a 15-year study timeframe, with additional details available in the literature [3]. The primary target variable in this dataset is the binary variable recurrence, indicating whether thyroid cancer recurred in a given patient within the 15-year timeframe, and a minimum of 10 years follow-up after treatment. The dataset was acquired from 383 patients, and thyroid cancer recurred in 108 patients (i.e., DTC did not recur in 275 patients) over the 15-year span of the study. Table 1 provides a detailed description of the specific categorical predictor features found in this dataset along with the different possible values for each categorical feature, the number of patients/samples in this dataset with a given categorical feature value (counts), as well as the proportions (percentages) of patients with each categorical feature value. Age is the only numerical variable with mean ± standard deviation (SD): 40.87 ± 15.13.

2.2. Machine Learning

The machine learning for this research was performed using df-analyze [33], which is a public domain software package that implements machine learning classifiers and regression learners, along with statistical validation techniques, on tabular datasets. This includes feature type inference, feature description (e.g., univariate associations and statistics), data cleaning (e.g., missing value handling and imputation), training, and test splitting, feature selection, hyperparameter tuning, and validation (e.g., K-fold and holdout validation) [33]. This benchmarking software package, which facilitates running a thorough competition between leading learning technologies for a given dataset, known as df-analyze, was first used in a neuroscience study on patients with schizophrenia from features extracted from magnetic resonance imaging examinations [34], and it has since been applied to mitigating bias in traffic stop outcomes [35], the diagnosis of chronic kidney disease [36], studying proteins potentially involved with learning in the cerebral cortex [37], and diagnosing, predicting treatment, and staging of pediatric appendicitis [38]. The software was configured to exhaustively compare all combinations of learning algorithms (light gradient boosting machine—lgbm, random forest—rf, logistic regression—lr, stochastic gradient descent—sgd, a tabular data optimized deep learner—gandalf [39], and the k-nearest neighbor algorithm—knn) and feature selection methods (no feature selection—none, filter-based association feature selection—assoc, filter-based prediction feature selection—pred, redundancy-aware step-up wrapper-based feature selection—wrap, embedded-based lgbm feature selection—embed_lgbm, and embedded-based linear feature selection—embed_linear) with full documentation available as part of the df-analyze software package, version 3.3.0 [33]. Note that the redundancy-aware feature selection algorithm is an emerging and novel technique that is a modification of forward stepwise feature selection designed to ensure that redundant features (ones that do not provide substantial additional predictive capacity relative to those features already included) are not added to the model [40]. The software is a single-phase selection technology that is publicly available with a detailed description online on github [40]. This is the first study to report on the performance of redundancy-aware feature selection [40] in thyroid cancer recurrence and risk prediction. This is also the first study to report on the performance of an emerging tabular data optimized deep learner, Gandalf [39], in thyroid cancer recurrence and risk prediction.

2.3. Statistical Analysis

Our statistical analysis was completed twice: implementing k-fold validation with 5 folds, as well as holdout validation, with a large 40% of samples reserved for the holdout set, to help ensure the reliability and reproducibility of the findings. Optuna hyperparameter tuning was completed with 50 runs. The first analysis was focused on predicting whether thyroid cancer would recur—a classification technology for which our statistics, outlined below, are defined in terms of the number of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). For the comparative classification analyses, we provide a thorough set of statistics, including the overall accuracy (acc = (TP + TN)/(TP + TN + FP + FN)), sensitivity (sens = TP/(TP + FN)), specificity (spec = TN/(TN + FP)), area under the receiver operating characteristic curve (auroc, the area under the profile of performance tradeoff between sensitivity and specificity), balanced accuracy (bal-acc = (sens + spec)/2), F1 Score (f1 = TP/(TP + 1/2(FP + FN))), negative predictive value (npv = TN/(TN + FN)), and positive predictive value (ppv = TP/(TP + FP)). The second and third analyses were blinded/agnostic to whether the thyroid cancer would recur (the target variable in the first analysis) and focused on predicting the ATA risk score. In the second analysis, we report on prediction of the ATA risk score as a classification learning task; however, it should be noted that there are insufficient samples in the higher risk categories of the ATA risk score in this dataset to address this task as a multi-class classification problem with three classes. As such, for this experiment, we grouped the intermediate and high ATA risk categories into a single class and addressed the problem as a binary classification task, predicting low vs. elevated (intermediate/high) ATA risk scores with machine learning. In the third analysis, we report on prediction of the ATA risk score as a regression learning task targeting an ordinal encoding of the three ATA risk score categories. For this regression analysis, we provide a thorough set of statistics, including the mean absolute error (mae, which is the average unsigned amount of error in predictions), median absolute error (mdae, which is the sorted median of error in predictions), mean squared error (msqe, the square of the signed differences between predictions and ground truth), coefficient of determination or R2 (r2 = 1 − Sum of Squares of Residuals/Total Sum of Squares), and the variance (var) explained (var-exp = 1 − var(target − predictions)/var(predictions)).

3. Results

3.1. Recurrence Prediction

The main findings of two types of validation, comprising a detailed comparison between all combinations of supported machine learning and feature selection technologies, applied to predicting thyroid cancer recurrence, are reported here. Table A1 provides the leading validation results on the holdout set for recurrence prediction. Table A2 provides the leading k-fold validation results on the holdout set. From the results provided in Table A1 and Table A2, we observe that the wrapper-based redundancy-aware step-up feature selection algorithm (wrap) produces particularly high-quality results when paired with a variety of learning machines (lgbm, lr, rf, sgd, knn). The wrapper-based feature selection results are provided in Table A3.
Note that our highly accurate models (with about 95% accuracy) based on this reduced feature set from redundancy-aware wrapper-based feature selection rely on just three features: age, whether their tumor’s stage was III, IVA, or IVB, and whether their tumor’s response was “Structurally Incomplete”. It is also noteworthy that in feature selection reports, such as the one provided in Table A3, the features are sorted in order of apparent predictive importance. Thus, this method implies that Response and Stage are more predictive of thyroid cancer recurrence than age, though age is assessed to still have value in thyroid cancer recurrence prediction.

3.2. Secondary Application: ATA Risk Score Prediction—Binary Classification

We performed an analysis predicting ATA risk scores as a binary classification task with the intermediate- and high-risk target classes grouped together to create an elevated risk category. This was performed as the elevated risk subcategories were too underrepresented in the dataset to run the df-analyze software employed. To summarize, our leading models achieved accuracies of 86.4 to 93.2% across our two validation methods based on the tabular data optimized deep learner, Gandalf [39], with no feature selection. Detailed results are provided in Table A4.

3.3. Tertiary Application: ATA Risk Score Prediction—Regression

Our thorough competition serves as a method for comparative validation between combinations of feature selection and machine learning algorithms for the task of ATA risk prediction. This task was addressed as a regression task with the target variable encoded as an ordinal, from a dataset that excluded knowledge of whether the patient’s thyroid cancer would recur, and the results are provided here. Table A5 provides the leading validation results on the holdout set, and Table A6 provides the leading K-fold validation results on the holdout set. The results demonstrate that it is possible to create continuous regression models based on an ordinal representation of ATA risk. The filter-based association (assoc) feature selection method, alongside no feature selection (none), produced the highest-performing models. The features selected by the algorithm (assoc) are provided in Table A7.
It is noteworthy that in feature selection reports, such as the one provided in Table A7, that the categorical features are sorted in order of predictive importance. Thus, in Table A7, Adenopathy and Response are highly predictive of ATA risk, whereas being a smoker or having a history of smoking are much less predictive. The final selected features from the filter-based association feature selection method were established from the ranking in Table A7, and they included Age, Adenopathy, Response, Thyroid_Function, Pathology, Physical Examination, Focality, Stage, History of Radiotherapy, Gender, Nodal status (N), History of Smoking, and Smoking. Figure 1 is provided to visualize the regression predictions of leading models (one LGBM, one ElasticNet) alongside the ground truth those learning machines were tasked with predicting.

4. Discussion

This study presents the results of a thorough assessment of many combinations of feature selection algorithms and learning machines for the prediction of three tasks: thyroid cancer recurrence, encoding of ATA risk score prediction as a binary classification task, and ordinal encoding of ATA risk score prediction as a regression target variable. The results demonstrate high-quality results with feature selection reports that assist in identifying leading factors that appear to contribute to thyroid cancer recurrence as well as technologies and underlying features that can potentially contribute to predictive assessments of ATA risk scores.
Once a patient has been diagnosed with thyroid cancer, our models can be relied upon clinically to predict whether or not the patient will experience tumor recurrence. This information would thus be available to the clinicians prior to commencement of the follow-up period. Patients for whom the AI predicts recurrence could theoretically, at the order of the managing clinician, be monitored for recurrence more aggressively (more frequent examinations, additional types of examinations applied, etc.). In turn, this additional monitoring could potentially assist in identifying thyroid cancer recurrence earlier on, thus supporting earlier treatment. Earlier treatment is broadly associated with better patient outcomes; as such, AI models for thyroid cancer recurrence prediction have the potential to improve the standard of patient care.

4.1. Thyroid Cancer Recurrence

We were able to create models predicting thyroid cancer recurrence with ~95% accuracy, using just three underlying feature measurements: age, stage, and response. Our feature selection method also further informs us that knowledge of whether the tumor’s stage was elevated (i.e., stages III, IVA, and IVB) or not is associated with improved accuracy of thyroid cancer recurrence prediction. Our feature selection method also further informs us that knowledge of whether the tumor response was structurally incomplete (or not) is also associated with improved accuracy of thyroid cancer recurrence prediction. These findings help with creating AI models whose functionality is far easier to explain to the clinicians and patients who might benefit from such technology. These findings also help illustrate the value of these three highly predictive features of tumor recurrence, which is potentially useful information for the broader medical community focused on thyroid cancer management. Our findings produce similar accuracies to those reported on in the literature [3,22,31]; however, our approach, based on the emerging redundancy-aware step-up feature selection algorithm, produced a small set of just three underlying features (age, stage and response) that together produce very high-quality recurrence prediction models, potentially helping to create more explainable AI technologies in this domain.

4.2. ATA Binarized Risk Prediction

Our analysis demonstrated two methods for predicting ATA risk scores, including binarized- and regression-based models. Results from binarized prediction provide early promising results for the emerging Gandalf deep learner [39], which is optimized for tabular data, and was the winning learning algorithm in this task. Historically, deep learners have been underperformers when basing predictions on tabular data; thus, these findings indicate considerable potential from future developments in deep learning applied to tabular datasets.

4.3. Regression Risk Prediction of Ordinally Encoded ATA Risk

For our regression-based models, the risk score was encoded as an ordinal variable or a sorted categorical variable. This involves assigning a numerical value to each of the risk scores (for example: low = 1, intermediate = 2, high = 3) and then training a regression-based learning algorithm to predict those values. The advantages of taking such an approach are multiple. This method allows the direct prediction of all categorical class values, even though the elevated risk categories (intermediate and high) are underrepresented in the dataset, making classical multi-class classification for three target classes impossible in our dataset. This method also allows for nuanced per-patient predictions of risk. For example, if a patient’s machine learning assessed risk is in between low and intermediate, or in between intermediate and high, the regression-based learners will communicate this assessment to the clinician and patient through predictions between 1 and 2, or 2 and 3, respectively. Thus, a predicted risk score between 1 and 2 implies a patient at elevated risk relative to those assessed as low risk but at reduced risk relative to those patients assessed as intermediate risk. Similarly, predicted scores between 2 and 3 imply reduced risk relative to the high-risk patients and elevated risk relative to the intermediate-risk patients. Thus, this approach to risk prediction provides potential benefits in patient-specific risk assessments, potentially supporting AI-based risk assessment technology to assist in improving approaches to personalized medicine. Technologies that can predict ATA risk scores have the potential to replace the sometimes subjective and clinician-specific assessment of patient risk. Predicting ATA risk scores was not the focus of previous literature studies; as such, our findings in these two applications are novel. Figure 1 was provided to visualize the distribution of predictions for leading regression models in this application. Note that although risk was encoded as 1, 2 and 3, df-analyze internally employs min–max normalization, which maps all targets and predictions to floating point values in the 0 to 1 range (hence the range of the y axis in Figure 1). The visualized figure implies that the models are struggling to correctly predict the high-risk category, which is severely under-sampled in this dataset. Future work will benefit from increased examples of patients with high-risk ATA categorization. Note that the leading models were created from either no feature selection or the filter-based association feature selection method. The filter-based association FS method tends to select for large numbers of features available; thus, these findings imply that the leading regression ATA risk score models are based on a large proportion of the feature measurements available, potentially implying that additional features that correlate with ATA risk will be needed to further optimize the performance of these models.

4.4. Machine Learning Algorithm Discussion

For recurrence prediction, the leading methods were tied: lr, sgd, knn, rf, and lgbm. This represents almost all machine learning algorithms considered with only the newly emerging and experimental deep learning based method, Gandalf, underperforming. In this situation, there are no obvious major advantages nor disadvantages in choosing any of the leading techniques in terms of model predictive performance/accuracy. However, the KNN classifier offers advantages in terms of example-based explainability, as the underlying technology can provide the user with the feature measurements paired with the target class labels of the k-nearest neighbor training samples to the test sample, providing a level of illustrative explainability not possible with the other techniques. Although the decision tree is a highly explainable learning machine, it consistently underperforms on its own, and so it is not included in the main trial. Conversely, the random forest and the light gradient boosting machine, which are each based on the decision tree, are strong performers; however, they continue to be a challenge to explain on a per-sample basis, as aggregating and interpreting the reason for predictions across many decision trees is difficult. Logistic regression and stochastic gradient descent provide statistical learning baselines to compare all techniques against. When performing regression-based risk prediction, elastic regression and LGBM were the leading techniques; however, Gandalf and KNN also performed well. It was noteworthy that for binarized risk prediction, the emergent deep learner built for tabular data, Gandalf, was the winner, demonstrating considerable potential from developments in deep learning technology applied to tabular data. Deep learners particularly benefit from large sample sizes for training; thus, Gandalf may be at a disadvantage in our study, which may help explain its underperformance in the first and third applications.

4.5. Strengths, Limitations and Future Work

The main limitation of this study is the limited number of samples in this dataset. Future work will apply these methods to larger datasets that are inclusive of more samples with high ATA risk scores. An additional limitation was the set of features considered for this study. Had additional feature measurements been acquired from each patient, we may have more interesting feature selection findings to report as well as potentially helping to produce more highly accurate predictive models. An additional limitation of this study was that we were unable to assess the generalization of the study findings due to a lack of public domain datasets derived from additional populations. Thus, future work will re-apply the methods outlined in this paper on independent datasets after they become available. An additional limitation of this study is that the technology developed is dependent on the availability of the input feature measurements relied upon for basic predictive functionality; as such, in situations where this information is not available (such as clinical assessment of tumor grade), data imputation methods would need to be relied upon, adding an uncharacterized source of error. Additionally, the hosting of AI models on the internet would involve the transmission of patient information to the website hosting the AI technology, which brings with it a risk of privacy violations. Fortunately, our models do not rely on sufficient patient identifying information to uniquely identify a patient from the predictor variables that inform model predictions; as such, it is expected that in future deployments of this technology, patient privacy should not be at risk. However, in order to be extra cautious, such a website could employ encryption technologies in order to help ensure that no third party is capable of intercepting any patient data, even if it is anonymized. An additional limitation of this study was that our regression analysis produced negative values for the coefficient of determination (R2) in multiple technologies, implying potentially poorly fitted models. This finding reinforces our primary evaluation statistic, MAE, which indicates that the average absolute error is approximately 20–30% of the range of the target variable. While this component of the study presents novel methods for a regression assessment of ATA risk, the results imply that future work will be needed. Adding new feature measurements predictive of risk in future studies could improve the quality of fit to the training data and thus support reliable risk predictions. Finally, false predictions are a major limitation of the technology created, of which there are two main types: falsely indicating to a patient that they will have recurrent thyroid cancer when they will not (a problem that might involve the patient receiving additional unnecessary surveillance as well as an unnecessary scare) as well as falsely indicating to a patient that they will not have recurrent thyroid cancer when they will (a much more dangerous problem, as potentially life-saving additional surveillance may not be applied to the patient). Future work will investigate the potential to create AI models with custom operating points that minimize the more serious second type of error at the expense of some additional cases of the earlier type of less serious error. This will involve a study thoroughly evaluating the tradeoffs of AI models with varying operating points and thus varying rates of each type of error.
Strengths of this study include that it was based on an open-source dataset that included a long follow-up on each patient to assess thyroid cancer recurrence (10 years minimum as part of an overall 15-year study). Strengths also include having devoted 40% of our dataset to the holdout testing group, which helps ensure reliability in study findings. However, as a result, fewer samples were available for training our models, potentially resulting in reduced predictive performances reported to what could have been obtained had we trained on larger training datasets and relied on fewer samples for our statistical assessment of predictive performance. This issue may help explain the modest discrepancies between our reported predictive accuracies and those in the literature [3,31]. Strengths also include the use of open-source software, which facilitates the assessment of the potential of a wide variety of feature selection and machine learning technologies, the identification of small subsets of features that together produce reliable highly predictive technologies, and the consideration of ATA risk prediction by multiple methods. Strengths also include the consideration of a novel emerging redundancy-aware step-up wrapper-based feature selection algorithm and the consideration of an emerging deep learner optimized for tabular data. Future work can address the potential for microRNA-based risk scores to inform thyroid cancer recurrence prediction as well as to correlate microRNA-based risk scores to established ATA risk scores.

Author Contributions

Conceptualization, M.A.P., and J.L.; methodology, M.A.P., D.B., X.G.; df-analyze software, D.B., X.G., and J.L.; Validation, D.B., J.L., X.G.; Formal analysis, M.A.P., D.B., X.G., J.L.; Investigation, M.A.P.; Resources, D.B. and J.L.; Data curation, M.A.P.; Writing—original draft preparation, M.A.P.; Writing—review and editing, M.A.P., and J.L.; Supervision, J.L.; Project administration, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by a Canada Foundation for Innovation grant, a Nova Scotia Research and Innovation Trust grant, an NSERC Discovery grant, a Compute Canada Resource Allocation, and a Nova Scotia Health Authority grant to J.L.

Institutional Review Board Statement

The original study detailing this dataset indicated that the “article was submitted with the ethical identifier IR.UMSHA.REC.1402.360 at Hamadan University of Medical Sciences, Hamadan, Iran” [3]. The results presented in this manuscript involved only secondary analysis of de-identified data. The dataset used in this study is publicly available and so institutional review board approval was not required to complete this retrospective analysis.

Informed Consent Statement

The original study detailing this dataset indicated that the “article was submitted with the ethical identifier IR.UMSHA.REC.1402.360 at Hamadan University of Medical Sciences, Hamadan, Iran” [3]. The results presented in this manuscript involved only secondary analysis of de-identified data, thus informed consent was not needed for this analysis.

Data Availability Statement

The dataset used in this study is publicly available and can be accessed from Kaggle at https://www.kaggle.com/datasets/joebeachcapital/differentiated-thyroid-cancer-recurrence (accessed on 7 May 2025). No new data were created or collected specifically for this study. Since this was a retrospective analysis of public domain data, no institutional review board approval was necessary for conducting this study.

Conflicts of Interest

Jacob Levman is the founder of Time Will Tell Technologies, Inc. The authors declare no relevant conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AIArtificial Intelligence
ANNArtificial Neural Network
ATAAmerican Thyroid Association
aurocArea under the Receiver Operating Characteristic Curve
DTCDifferentiated Thyroid Cancer
FNFalse Negative
FPFalse Positive
KNN/knnK Nearest Neighbor
lgbmLight Gradient Boosting Machine
lrLogistic Regression
MMetastasis
NNode
rfRandom Forest
SDStandard Deviation
sgdStochastic Gradient Descent
SMOTESynthetic Oversampling Method
SVMSupport Vector Machine
TTumor
TgThyroglobulin
TNTrue Negative
TPTrue Positive
TSHThyroid Stimulating Hormone

Appendix A

Table A1. Recurrence prediction competition on holdout set.
Table A1. Recurrence prediction competition on holdout set.
ModelSelectionEmbed_Selectoraccaurocbal-accf1npvppvsensspec
lrwrapnone0.9480.9450.9190.9330.9580.9440.9190.852
sgdwrapnone0.9480.9450.9190.9330.9580.9440.9190.852
knnwrapnone0.9480.9370.9190.9330.9580.9440.9190.852
rfwrapnone0.9480.9610.9190.9330.9580.9440.9190.852
lgbmwrapnone0.9480.9430.9190.9330.9580.9440.9190.852
lgbmprednone0.8850.9800.9140.8710.7160.9910.9140.981
lgbmembed_lgbmlgbm0.8850.9770.9140.8710.7160.9910.9140.981
rfprednone0.8850.9750.8970.8680.7350.9670.8970.926
lrprednone0.8800.9780.9100.8650.7070.9910.9100.981
lrembed_lgbmlgbm0.8800.9770.9100.8650.7070.9910.9100.981
lrnonenone0.8740.9750.9070.8600.6970.9910.9070.981
rfnonenone0.8740.9800.8960.8580.7080.9750.8960.944
lrassocnone0.8690.9750.9030.8550.6880.9910.9030.981
rfembed_lgbmlgbm0.8690.9770.8980.8540.6930.9830.8980.963
lgbmnonenone0.8640.9750.8990.8500.6790.9910.8990.981
lgbmassocnone0.8640.9750.8940.8480.6840.9830.8940.963
gandalfembed_lgbmlgbm0.8430.9400.8460.8190.6760.9350.8460.852
knnassocnone0.8320.9240.8660.8160.6370.9730.8660.944
knnnonenone0.8320.9240.8660.8160.6370.9730.8660.944
rfassocnone0.8120.9700.8520.7960.6070.9720.8520.944
knnembed_lgbmlgbm0.8060.9140.8480.7910.6000.9720.8480.944
knnprednone0.8060.9090.8540.7920.5980.9810.8540.963
sgdassocnone0.7750.8430.8430.7650.5571.0000.8431.000
sgdnonenone0.7750.8430.8430.7650.5571.0000.8431.000
sgdprednone0.7750.8430.8430.7650.5571.0000.8431.000
sgdembed_lgbmlgbm0.7750.8430.8430.7650.5571.0000.8431.000
dummyassocnone0.7170.5000.5000.418nan0.7170.5000.000
dummynonenone0.7170.5000.5000.418nan0.7170.5000.000
dummyembed_lgbmlgbm0.7170.5000.5000.418nan0.7170.5000.000
dummywrapnone0.7170.5000.5000.418nan0.7170.5000.000
dummyprednone0.7170.5000.5000.418nan0.7170.5000.000
gandalfassocnone0.6810.6700.5420.5380.3870.7380.5420.222
gandalfwrapnone0.6650.7880.7440.6580.4550.9510.7440.926
gandalfnonenone0.2930.9250.5070.2370.2861.0000.5071.000
gandalfprednone0.2830.8100.5000.2200.283nan0.5001.000
Table A2. Recurrence prediction competition 5-fold performance on holdout set.
Table A2. Recurrence prediction competition 5-fold performance on holdout set.
ModelSelectionEmbed_Selectoraccaurocbal-accf1npvppvsensspec
knnwrapnone0.9590.9400.9370.9430.9670.9630.9370.889
rfwrapnone0.9480.9500.9190.9290.9630.9490.9190.853
lrwrapnone0.9480.9600.9190.9290.9630.9490.9190.853
rfnonenone0.9480.9970.9240.9300.9500.9550.9240.871
lgbmwrapnone0.9480.9230.9190.9290.9630.9490.9190.853
sgdwrapnone0.9480.9110.9190.9290.9630.9490.9190.853
rfprednone0.9430.9940.9150.9230.9470.9490.9150.853
lrprednone0.9220.9950.8950.8940.9210.9460.8950.835
lgbmnonenone0.9170.9970.8970.8880.8900.9530.8970.853
lrnonenone0.9110.9960.8820.8810.9130.9390.8820.816
lrassocnone0.9110.9960.8820.8810.9130.9390.8820.816
rfembed_lgbmlgbm0.9110.9970.8980.8890.8820.9580.8980.871
lrembed_lgbmlgbm0.9110.9970.8980.8910.8930.9580.8980.871
gandalfassocnone0.9060.9930.8610.8600.9380.9240.8610.760
sgdembed_lgbmlgbm0.9010.8860.8860.8760.8590.9500.8860.853
knnnonenone0.8960.9180.8520.8570.9110.9200.8520.756
knnassocnone0.8960.9180.8520.8570.9110.9200.8520.756
gandalfembed_lgbmlgbm0.8960.9050.8480.8610.8950.9080.8480.740
lgbmprednone0.8900.9920.8730.8600.8480.9440.8730.835
lgbmassocnone0.8900.9930.8780.8650.8790.9530.8780.853
knnprednone0.8900.9480.8730.8610.8560.9440.8730.835
sgdassocnone0.8850.8630.8640.8480.8480.9400.8640.816
sgdprednone0.8850.8690.8690.8550.8380.9440.8690.835
sgdnonenone0.8800.8480.8480.8330.8660.9300.8480.778
knnembed_lgbmlgbm0.8800.9530.8760.8570.8300.9580.8760.871
rfassocnone0.8480.9890.8590.8310.8380.9630.8590.889
lgbmembed_lgbmlgbm0.8430.9970.8500.8210.8350.9580.8500.871
dummynonenone0.7170.5000.5000.418nan0.7170.5000.000
dummyembed_lgbmlgbm0.7170.5000.5000.418nan0.7170.5000.000
dummyassocnone0.7170.5000.5000.418nan0.7170.5000.000
dummywrapnone0.7170.5000.5000.418nan0.7170.5000.000
dummyprednone0.7170.5000.5000.418nan0.7170.5000.000
gandalfprednone0.6430.7800.5070.4090.6430.7120.5070.200
gandalfwrapnone0.5960.8160.6050.4650.6170.8160.6050.618
Table A3. Features selected by wrapper based step-up feature selection. Higher numbers imply more predictive value from that measurement.
Table A3. Features selected by wrapper based step-up feature selection. Higher numbers imply more predictive value from that measurement.
FeatureScore
Response_Structural Incomplete9.425 × 10−1
Stage_nan9.425 × 10−1
Age9.325 × 10−1
Table A4. ATA risk binary classification prediction results on the holdout set.
Table A4. ATA risk binary classification prediction results on the holdout set.
ModelSelectionEmbed_Selectoraccaurocbal-accf1npvppvsensspec
gandalfassocnone0.8900.8980.9150.8860.7611.0000.9151.000
gandalfnonenone0.8640.9330.8810.8580.7410.9620.8810.940
knnassocnone0.8530.9440.8730.8480.7240.9620.8730.940
knnnonenone0.8530.9440.8730.8480.7240.9620.8730.940
sgdassocnone0.8380.9740.8650.8330.6960.9700.8650.955
sgdnonenone0.8380.9740.8650.8330.6960.9700.8650.955
gandalfembed_lgbmlgbm0.8170.9320.8280.8080.6900.9160.8280.866
sgdprednone0.8010.9720.8430.7990.6410.9890.8430.985
knnprednone0.7910.9160.8350.7880.6290.9880.8350.985
gandalfwrapnone0.7910.9030.8040.7820.6550.9040.8040.851
sgdembed_lgbmlgbm0.7490.8360.8030.7480.5840.9870.8030.985
knnwrapnone0.7120.8050.7130.6990.5710.8220.7130.716
knnembed_lgbmlgbm0.7120.8330.7710.7120.5510.9730.7710.970
lrnonenone0.7120.9650.7710.7120.5510.9730.7710.970
lrassocnone0.7120.9670.7710.7120.5510.9730.7710.970
lrprednone0.7070.9660.7670.7070.5460.9720.7670.970
lgbmprednone0.7020.9330.7630.7010.5420.9720.7630.970
lgbmembed_lgbmlgbm0.6960.9110.7560.6960.5380.9580.7560.955
lgbmnonenone0.6960.9250.7560.6960.5380.9580.7560.955
lrembed_lgbmlgbm0.6960.9560.7590.6960.5370.9710.7590.970
rfwrapnone0.6910.9010.7310.6890.5370.8920.7310.866
rfnonenone0.6910.9070.7310.6890.5370.8920.7310.866
rfprednone0.6910.8970.7310.6890.5370.8920.7310.866
rfembed_lgbmlgbm0.6910.9310.7310.6890.5370.8920.7310.866
rfassocnone0.6910.9280.7310.6890.5370.8920.7310.866
lrwrapnone0.6910.9060.7310.6890.5370.8920.7310.866
lgbmassocnone0.6910.9310.7310.6890.5370.8920.7310.866
sgdwrapnone0.6910.8480.7310.6890.5370.8920.7310.866
lgbmwrapnone0.6910.9270.7310.6890.5370.8920.7310.866
dummyassocnone0.6490.5000.5000.394nan0.6490.5000.000
dummyembed_lgbmlgbm0.6490.5000.5000.394nan0.6490.5000.000
dummynonenone0.6490.5000.5000.394nan0.6490.5000.000
dummywrapnone0.6490.5000.5000.394nan0.6490.5000.000
dummyprednone0.6490.5000.5000.394nan0.6490.5000.000
gandalfprednone0.4240.8180.5290.4020.3660.7330.5290.881
Table A5. ATA risk prediction results on the holdout set. regression targeting of ordinal encoding of ATA risk.
Table A5. ATA risk prediction results on the holdout set. regression targeting of ordinal encoding of ATA risk.
ModelSelectionEmbed_Selectormaemdaemsqer2var-exp
lgbmnonenone0.2030.0290.1050.1250.277
lgbmembed_linearlinear0.2070.0750.0990.1780.298
lgbmprednone0.2340.1720.1030.1410.361
elasticnonenone0.2590.1610.124−0.0270.329
knnembed_linearlinear0.2930.2000.186−0.544−0.011
elasticassocnone0.2950.2060.153−0.2690.270
elasticprednone0.3110.2440.164−0.3610.250
knnprednone0.3130.3000.185−0.5340.140
knnnonenone0.3470.4000.220−0.8260.064
knnassocnone0.3470.4000.220−0.8260.064
lgbmassocnone0.3720.3970.220−0.8320.081
elasticembed_linearlinear0.4020.4650.268−1.2280.001
elasticembed_lgbmlgbm0.4040.4770.271−1.2480.000
elasticwrapnone0.4040.4770.271−1.2480.000
lgbmwrapnone0.4040.4770.271−1.2480.000
lgbmembed_lgbmlgbm0.4050.4720.272−1.263−0.009
sgdnonenone0.4060.4890.282−1.3440.016
knnembed_lgbmlgbm0.4070.5000.284−1.3570.000
sgdassocnone0.4090.4950.286−1.3730.008
sgdprednone0.4090.4940.284−1.3630.004
sgdembed_linearlinear0.4100.4990.288−1.3960.001
sgdwrapnone0.4110.5000.289−1.4030.000
rfprednone0.4110.5000.289−1.4040.000
rfwrapnone0.4110.5000.289−1.4040.000
rfnonenone0.4110.5000.289−1.4040.000
rfembed_linearlinear0.4110.5000.289−1.4040.000
dummyassocnone0.4110.5000.289−1.4040.000
rfassocnone0.4110.5000.289−1.4040.000
dummywrapnone0.4110.5000.289−1.4040.000
dummyprednone0.4110.5000.289−1.4040.000
dummynonenone0.4110.5000.289−1.4040.000
dummyembed_linearlinear0.4110.5000.289−1.4040.000
dummyembed_lgbmlgbm0.4110.5000.289−1.4040.000
rfembed_lgbmlgbm0.4110.5000.289−1.4040.000
knnwrapnone0.4110.5000.289−1.4040.000
sgdembed_lgbmlgbm0.4110.4990.289−1.4000.001
Table A6. ATA risk prediction results: five-fold performance on holdout set and regression targeting of ordinal encoding of ATA risk.
Table A6. ATA risk prediction results: five-fold performance on holdout set and regression targeting of ordinal encoding of ATA risk.
ModelSelectionEmbed_Selectormaemdaemsqer2var-exp
elasticnonenone0.2280.1900.078−0.5360.015
elasticassocnone0.2340.1900.081−0.5910.042
lgbmassocnone0.2360.2090.087−0.748−0.087
lgbmnonenone0.2430.2220.089−0.789−0.117
dummyassocnone0.2560.3000.128−1.2550.000
dummyembed_lgbmlgbm0.2560.3000.128−1.2550.000
dummyembed_linearlinear0.2560.3000.128−1.2550.000
dummynonenone0.2560.3000.128−1.2550.000
dummyprednone0.2560.3000.128−1.2550.000
dummywrapnone0.2560.3000.128−1.2550.000
elasticprednone0.2570.2440.096−0.906−0.087
knnnonenone0.2580.2700.099−1.101−0.290
knnassocnone0.2580.2700.099−1.101−0.290
lgbmembed_linearlinear0.2640.2450.101−1.037−0.319
elasticembed_linearlinear0.2760.2820.113−1.192−0.040
knnembed_linearlinear0.2840.2900.120−1.453−0.357
knnembed_lgbmlgbm0.2910.3000.124−1.458−0.121
lgbmprednone0.2970.2740.123−1.556−0.727
knnprednone0.3020.3000.124−1.732−0.613
sgdprednone0.3060.2950.132−1.596−0.277
sgdassocnone0.3120.2970.142−1.776−0.348
elasticwrapnone0.3190.3650.158−1.9490.000
lgbmwrapnone0.3190.3650.158−1.9490.000
elasticembed_lgbmlgbm0.3220.3430.151−1.935−0.182
sgdembed_linearlinear0.3220.3200.150−1.910−0.289
lgbmembed_lgbmlgbm0.3240.3730.158−2.037−0.196
knnwrapnone0.3550.4000.175−2.325−0.011
sgdnonenone0.3790.3660.214−3.081−0.426
rfprednone0.3980.3830.266−4.9050.000
rfwrapnone0.4120.4000.290−5.5680.000
sgdwrapnone0.4120.4000.290−5.572−0.000
Table A7. Continuous and categorical features selected for by filter-based association feature selection. Higher numbers imply more predictive value from that measurement.
Table A7. Continuous and categorical features selected for by filter-based association feature selection. Higher numbers imply more predictive value from that measurement.
### Continuous Features (Mutual Information: Higher = More important)

|   |  mut_info |
|:----|-----------:|
| Age | 1.952 × 10−2 |

### Categorical Features (Kruskal-Wallace H: Higher = More important)

|         |     H |
|:---------------------|----------:|
| Adenopathy      | 3.651 × 102 |
| Response       | 3.331 × 102 |
| Thyroid_Function   | 3.325 × 102 |
| Pathology       | 3.146 × 102 |
| Physical_Examination | 2.994 × 102 |
| Focality       | 2.759 × 102 |
| Stage        | 9.192 × 100 |
| Hx_Radiothreapy   | 6.477 × 100 |
| Gender        | 1.474 × 100 |
| N          | 5.113 × 10−1 |
| Hx_Smoking      | 3.327 × 10−1 |
| Smoking       | 1.034 × 10−2 |

References

  1. Cabanillas, M.E.; McFadden, D.G.; Durante, C. Thyroid Cancer. Lancet 2016, 388, 2783–2795. [Google Scholar] [CrossRef] [PubMed]
  2. Boucai, L.; Zafereo, M.; Cabanillas, M.E. Thyroid Cancer: A Review. JAMA 2024, 331, 425–435. [Google Scholar] [CrossRef] [PubMed]
  3. Borzooei, S.; Briganti, G.; Golparian, M.; Lechien, J.R.; Tarokhian, A. Machine Learning for Risk Stratification of Thyroid Cancer Patients: A 15-Year Cohort Study. Eur. Arch. Otorhinolaryngol. 2024, 281, 2095–2104. [Google Scholar] [CrossRef] [PubMed]
  4. Giuffrida, D.; Prestifilippo, A.; Scarfia, A.; Martino, D.; Marchisotta, S. New Treatment in Advanced Thyroid Cancer. J. Oncol. 2012, 2012, 391629. [Google Scholar] [CrossRef]
  5. Mazzaferri, E.; Robbins, R.; Spencer, C.; Braverman, L.; Pacini, F.; Wartofsky, L.; Haugen, B.; Sherman, S.; Cooper, D.; Braunstein, G.; et al. A Consensus Report of the Role of Serum Thyroglobulin as a Monitoring Method for Low-Risk Patients with Papillary Thyroid Carcinoma. J. Clin. Endocrinol. Metab. 2003, 88, 1433–1441. [Google Scholar] [CrossRef]
  6. Santhanam, P.; Ladenson, P. Surveillance for Differentiated Thyroid Cancer Recurrence. Endocrinol. Metab. Clin. N. Am. 2019, 48, 239–252. [Google Scholar] [CrossRef]
  7. Pandya, A.; Caoili, E.; Jawad-Makki, F.; Wasnik, A.; Shankar, P.R.; Bude, R.; Haymart, M.; Davenport, M. Limitations of the 2015 ATA Guidelines for Prediction of Thyroid Cancer: A Review of 1947 Consecutive Aspirations. J. Clin. Endocrinol. Metab. 2018, 103, 3496–3502. [Google Scholar] [CrossRef]
  8. Toraih, E.; Fawzy, M.; Hussein, M.; EL-Labban, M.; Ruiz, E.M.L.; Attia, A.-E.-A.; Halat, S.; Moroz, K.; Errami, Y.; Zerfaoui, M.; et al. MicroRNA-Based Risk Score for Predicting Tumor Progression Following Radioactive Iodine Ablation in Well-Differentiated Thyroid Cancer Patients: A Propensity-Score Matched Analysis. Cancers 2021, 13, 4649. [Google Scholar] [CrossRef]
  9. Tuttle, R.; Tala, H.; Shah, J.; Leboeuf, R.; Ghossein, R.; Gonen, M.; Brokhin, M.; Omry, G.; Fagin, J.; Shaha, A. Estimating Risk of Recurrence in Differentiated Thyroid Cancer after Total Thyroidectomy and Radioactive Iodine Remnant Ablation: Using Response to Therapy Variables to Modify the Initial Risk Estimates Predicted by the New American Thyroid Association Staging System. Thyroid. Off. J. Am. Thyroid. Assoc. 2010, 20, 1341–1349. [Google Scholar] [CrossRef]
  10. Li, L.-R.; Du, B.; Liu, H.-Q.; Chen, C. Artificial Intelligence for Personalized Medicine in Thyroid Cancer: Current Status and Future Perspectives. Front. Oncol. 2021, 10, 604051. [Google Scholar] [CrossRef]
  11. Paul, R.; Juliano, A.; Faquin, W.; Chan, A.W. An Artificial Intelligence Ultrasound Platform for Screening and Staging of Thyroid Cancer. Int. J. Radiat. Oncol. Biol. Phys. 2022, 112, e8. [Google Scholar] [CrossRef]
  12. Nagendra, L.; Pappachan, J.M.; Fernandez, C.J. Artificial Intelligence in the Diagnosis of Thyroid Cancer: Recent Advances and Future Directions. Artif. Intell. Cancer 2023, 4, 1–10. [Google Scholar] [CrossRef]
  13. Habchi, Y.; Himeur, Y.; Kheddar, H.; Boukabou, A.; Atalla, S.; Chouchane, A.; Ouamane, A.; Mansoor, W. AI in Thyroid Cancer Diagnosis: Techniques, Trends, and Future Directions. Systems 2023, 11, 519. [Google Scholar] [CrossRef]
  14. Ahn, J.; Lee, M.-C. Application of Artificial Intelligence to Evaluate Thyroid Nodules. J. Clin. Otolaryngol. Head Neck Surg. 2023, 34, 17–22. [Google Scholar] [CrossRef]
  15. Cao, C.-L.; Li, Q.; Tong, J.; Shi, L.; Li, W.-X.; Xu, Y.; Cheng, J.; Du, T.-T.; Li, J.; Cui, X. Artificial Intelligence in Thyroid Ultrasound. Front. Oncol. 2023, 13, 1060702. [Google Scholar] [CrossRef]
  16. Kim, S.Y.; Kim, Y.I.; Kim, H.J.; Chang, H.; Kim, S.M.; Lee, Y.S.; Kwon, S.S.; Shin, H.; Chang, H.S.; Park, C.S. New approach of prediction of recurrence of thyroid cancer patients using machine learning. Medicine 2021, 100, e27493. [Google Scholar] [CrossRef]
  17. Grani, G.; Gentili, M.; Siciliano, F.; Albano, D.; Zilioli, V.; Morelli, S.; Puxeddu, E.; Zatelli, M.C.; Gagliardi, I.; Piovesan, A.; et al. A data-driven approach to refine predictions of differentiated thyroid cancer outcomes: A prospective multicenter study. J. Clin. Endocrinol. Metab. 2023, 108, 1921–1928. [Google Scholar] [CrossRef] [PubMed]
  18. Clark, E.; Price, S.; Lucena, T.; Haberlein, B.; Wahbeh, A.; Seetan, R. Predictive Analytics for Thyroid Cancer Recurrence: A Machine Learning Approach. Knowledge 2024, 4, 557–570. [Google Scholar] [CrossRef]
  19. Park, Y.M.; Lee, B.-J. Machine learning-based prediction model using clinico-pathologic factors for papillary thyroid carcinoma recurrence. Sci. Rep. 2021, 11, 4948. [Google Scholar] [CrossRef]
  20. Firat Atay, F.; Yagin, F.H.; Colak, C.; Elkiran, E.T.; Mansuri, N.; Ahmad, F.; Ardigò, L.P. A hybrid machine learning model combining association rule mining and classification algorithmsto predict differentiated thyroid cancer recurrence. Front. Med. 2024, 11, 1461372. [Google Scholar] [CrossRef] [PubMed]
  21. Kim, G.H.; Lee, D.H.; Choi, J.W.; Jeon, H.J.; Park, S. Mulitmodal Neural Network for Recurrence Prediction of Papillary Thyroid Carcinoma. Adv. Intell. Syst. 2023, 5, 2200365. [Google Scholar] [CrossRef]
  22. Arslan, A.K.; Colak, C. Explainable Machine Learning Models for Prediting Recurrence in Differentiated Thyroid Cancer. Med. Rec. 2024, 6, 468–473. [Google Scholar] [CrossRef]
  23. Gurcan, F.; Soylu, A. Learning from Imbalanced Data: Integration of Advanced Resampling Techniques and Machine Learning Models for Enhanced Cancer Diagnosis and Prognosis. Cancers 2024, 16, 3417. [Google Scholar] [CrossRef]
  24. Thomas, J. Predicting Differentiated Thyroid Cancer Outcomes Using Machine Learning: A Move toward Precision Medicine. Clin. Thyroidol. 2024, 36, 64–66. [Google Scholar] [CrossRef]
  25. Gu, J.; Xie, R.; Zhao, Y.; Zhao, Z.; Xu, D.; Ding, M.; Lin, T.; Xu, W.; Nie, Z.; Miao, E.; et al. A machine learning-based approach to predicting the malignant and metastasis of thyroid cancer. Front. Oncol. 2022, 12, 938292. [Google Scholar] [CrossRef]
  26. Mourad, M.; Moubayed, S.; Dezube, A.; Mourad, Y.; Park, K.; Torreblanca-Zanca, A.; Torrecilla, J.S.; Cancilla, J.C.; Wang, J. Machine Learning and Feature Selection Applied to SEER Data to Reliably Assess Thyroid Cancer Prognosis. Sci. Rep. 2020, 10, 5176. [Google Scholar] [CrossRef] [PubMed]
  27. Setiawan, K.E. Predicting recurrence in differentiated thyroid cancer: A comparative analysis of various machine learning models including ensemble methods with chi-squared feature selection, Commun. Math. Biol. Neurosci. 2024, 2024, 55. [Google Scholar]
  28. Sibarani, I.J.B.; Loy, K.M.; Surharjito, S. Enhancing Predictive Accuracy for Differentiated Thyroid Cancer (DTC) Recurrence Through Advanced Data Mining Techniques. TIN Terap. Inform. Nusant. 2024, 5, 11–22. [Google Scholar] [CrossRef]
  29. Xi, N.M.; Wang, L.; Yang, C. Improving the diagnosis of thyroid cancer by machine learning and clinical data. Sci. Reports. 2022, 12, 11143. [Google Scholar] [CrossRef]
  30. Ahmad, M.A.; Haddad, J. An Explainable AI Model for Predicting the Recurrence of Differentiated Thyroid Cancer. In Proceedings of the 2024 Second Jordanian International Biomedical Engineering Conference (JIBEC), Amman, Jordan, 27–28 November 2024. [Google Scholar]
  31. Bharath, K.; Sai Sabatha, A. Predicting Recurrence in Differentiated Thyroid Cancer: A Machine Learning Approach. In Proceedings of the International Conference on Advances in Data Engineering and Intelligent Computing Systems, Chennai, India, 18–19 April 2024. [Google Scholar]
  32. Arvidsson, J. Differentiated Thyroid Cancer Recurrence. 2024. Available online: https://www.kaggle.com/datasets/joebeachcapital/differentiated-thyroid-cancer-recurrence (accessed on 7 May 2025).
  33. Stfxecutables Df-Analyze: AutoML Command-Line Tool 2024. Available online: https://github.com/stfxecutables/df-analyze (accessed on 7 May 2025).
  34. Levman, J.; Jennings, M.; Rouse, E.; Berger, D.; Kabaria, P.; Nangaku, M.; Gondra, I.; Takahashi, E. A Morphological Study of Schizophrenia with Magnetic Resonance Imaging, Advanced Analytics, and Machine Learning. Front. Neurosci. 2022, 16, 926426. [Google Scholar] [CrossRef] [PubMed]
  35. Saville, K.; Berger, D.; Levman, J. Mitigating Bias Due to Race and Gender in Machine Learning Predictions of Traffic Stop Outcomes. Information 2024, 15, 687. [Google Scholar] [CrossRef]
  36. Figueroa, J.; Etim, P.; Shibu, A.; Berger, D.; Levman, J. Diagnosing and Characterizing Chronic Kidney Disease with Machine Learning: The Value of Clinical Patient Characteristics as Evidenced from an Open Dataset. Electronics 2024, 13, 4326. [Google Scholar] [CrossRef]
  37. Huang, X.; Gauthier, C.; Berger, D.; Cai, H.; Levman, J. Identifying Cortical Molecular Biomarkers Potentially Associated with Learning in Mice Using Artificial Intelligence. Int. J. Mol. Sci. 2025, 26, 6878. [Google Scholar] [CrossRef] [PubMed]
  38. Kendall, J.; Gaspar, G.; Berger, D.; Levman, J. Machine Learning and Feature Selection in Pediatric Appendicitis. Tomography 2025, 11, 90. [Google Scholar] [CrossRef]
  39. Joseph, M.; Raj, H. GANDALF: Gated adaptive network for deep automated learning of features. arXiv 2022, arXiv:2207.08548. [Google Scholar]
  40. Berger, D. Redundancy-Aware Feature Selection. Available online: https://github.com/stfxecutables/df-analyze/tree/experimental?tab=readme-ov-file#redundancy-aware-feature-selection-new (accessed on 23 December 2024).
Figure 1. Boxplots of ground truth target variables and leading regression prediction models of ordinally encoded ATA risk.
Figure 1. Boxplots of ground truth target variables and leading regression prediction models of ordinally encoded ATA risk.
Applsci 15 09397 g001
Table 1. Categorical data.
Table 1. Categorical data.
Measurement NameValues with Counts and Proportions (%)
GenderFemale (312, 81.5%), Male (71, 18.5%)
SmokingNo (334, 82.2%), Yes (49, 12.8%)
History (Hx) of SmokingNo (355, 92.7%), Yes (28, 7.3%)
History of RadiotherapyNo (376, 98.2%), Yes 7 (1.8%)
Thyroid FunctionEuthyroid (332, 86.7%), Clinical Hyperthyroidism (20, 5.2%), Subclinical Hypothyroidism (14, 3.7%), Clinical Hypothyroidism (12, 3.1%), Subclinical Hyperthyroidism (5, 1.3%)
Physical ExaminationMultinodal goiter (140, 36.6%), single nodular goiter—right (140, 36.6%), single nodular goiter—left (89, 23.2%), normal (7, 1.8%), diffuse goiter (7, 1.8%)
AdenopathyNo (277, 72.3%), Right (48, 12.5%), Bilateral (32, 8.7%), Left (17, 4.4%), Extensive (7, 1.8%), Posterior (2, 0.5%)
PathologyPapillary (287, 74.9%), micropapillary (48, 12.5%), follicular (28, 7.3%), Hurthel cell (20, 5.2%)
FocalityUni-Focal (247, 64.4%), Multi-Focal (136, 35.5%)
RiskLow (246, 65.0%), Medium (102, 26.6%), High (32, 8.4%)
T—TumorT1a (49, 12.8%), T1b (43, 11.2%), T2 (151, 39.4%), T3a (96, 25.1%), T3b (16, 4.2%), T4a (20, 5.2%), T4b (8, 2.1%)
N—NodeN0 (268, 70.0%), N1b (93, 24.3%), N1a (22, 5.7%)
M—MetastasisM0 (365, 95.3%), M1 (18, 4.7%)
StageI (333, 87.0%), II (32, 8.4%), III (4, 1.0%), IVA (3, 0.8%), IVB (11, 2.9%)
ResponseBiochemical Incomplete (23, 6.0%), Excellent (208, 54.2%), Indeterminate (61, 15.9%), Structural Incomplete (91, 23.7%)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Penner, M.A.; Berger, D.; Guo, X.; Levman, J. Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Appl. Sci. 2025, 15, 9397. https://doi.org/10.3390/app15179397

AMA Style

Penner MA, Berger D, Guo X, Levman J. Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Applied Sciences. 2025; 15(17):9397. https://doi.org/10.3390/app15179397

Chicago/Turabian Style

Penner, Matthew A., Derek Berger, Xuchen Guo, and Jacob Levman. 2025. "Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction" Applied Sciences 15, no. 17: 9397. https://doi.org/10.3390/app15179397

APA Style

Penner, M. A., Berger, D., Guo, X., & Levman, J. (2025). Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Applied Sciences, 15(17), 9397. https://doi.org/10.3390/app15179397

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop