Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction
Abstract
1. Introduction
1.1. Overview
1.2. Clinical Management
1.3. Literature Review
1.4. Problem Statement, Hypotheses and Contributions
2. Materials and Methods
2.1. Dataset Description
2.2. Machine Learning
2.3. Statistical Analysis
3. Results
3.1. Recurrence Prediction
3.2. Secondary Application: ATA Risk Score Prediction—Binary Classification
3.3. Tertiary Application: ATA Risk Score Prediction—Regression
4. Discussion
4.1. Thyroid Cancer Recurrence
4.2. ATA Binarized Risk Prediction
4.3. Regression Risk Prediction of Ordinally Encoded ATA Risk
4.4. Machine Learning Algorithm Discussion
4.5. Strengths, Limitations and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
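Abbreviation | Definition |
---|---|
acc | Accuracy |
AI | Artificial Intelligence |
ANN | Artificial Neural Network |
ATA | American Thyroid Association |
auroc | Area under the Receiver Operating Characteristic Curve |
bal-acc | Balanced Accuracy |
DTC | Differentiated Thyroid Cancer |
elastic | Elastic Net |
f1 | F1 Score |
FN | False Negative |
FP | False Positive |
gandalf | Gated Adaptive Network for Deep Automated Learning of Features |
KNN/knn | k-Nearest Neighbors |
lgbm | Light Gradient Boosting Machine |
lr | Logistic Regression |
M | Metastasis |
mae | Mean Absolute Error |
mdae | Median Absolute Error |
msqe | Mean Squared Error |
N | Node |
npv | Negative Predictive Value |
ppv | Positive Predictive Value |
r2 | Coefficient of Determination (R²) |
rf | Random Forest |
SD | Standard Deviation |
sens | Sensitivity |
sgd | Stochastic Gradient Descent |
SMOTE | Synthetic Minority Oversampling Technique |
spec | Specificity |
SVM | Support Vector Machine |
T | Tumor |
Tg | Thyroglobulin |
TN | True Negative |
TP | True Positive |
TSH | Thyroid Stimulating Hormone |
var-exp | Explained Variance |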
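The classification columns in Appendix A (acc, sens, spec, ppv, npv, f1, bal-acc) are all functions of the four confusion-matrix counts TP, FP, TN and FN. What follows is a minimal plain-Python sketch of the textbook binary definitions. The function names are hypothetical, and df-analyze may average or encode classes differently; in the appendix tables, for example, the sens column matches bal-acc in every row shown. A metric whose denominator is zero is returned as NaN, which mirrors the nan entries in the tables.

```python
import math


def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else math.nan


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Textbook binary-classification metrics from confusion-matrix counts."""
    sens = safe_div(tp, tp + fn)  # sensitivity (recall, true-positive rate)
    spec = safe_div(tn, tn + fp)  # specificity (true-negative rate)
    ppv = safe_div(tp, tp + fp)   # positive predictive value (precision)
    npv = safe_div(tn, tn + fn)   # negative predictive value
    return {
        "acc": safe_div(tp + tn, tp + fp + tn + fn),
        "bal-acc": (sens + spec) / 2,              # mean of the two per-class recalls
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),  # harmonic mean of ppv and sens
        "npv": npv,
        "ppv": ppv,
        "sens": sens,
        "spec": spec,
    }


# Illustrative counts only, not taken from the study.
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

With these illustrative counts, acc is 0.85, ppv is 0.80 and npv is 0.90.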
Appendix A
Model | Selection | Embed_Selector | acc | auroc | bal-acc | f1 | npv | ppv | sens | spec |
---|---|---|---|---|---|---|---|---|---|---|
lr | wrap | none | 0.948 | 0.945 | 0.919 | 0.933 | 0.958 | 0.944 | 0.919 | 0.852 |
sgd | wrap | none | 0.948 | 0.945 | 0.919 | 0.933 | 0.958 | 0.944 | 0.919 | 0.852 |
knn | wrap | none | 0.948 | 0.937 | 0.919 | 0.933 | 0.958 | 0.944 | 0.919 | 0.852 |
rf | wrap | none | 0.948 | 0.961 | 0.919 | 0.933 | 0.958 | 0.944 | 0.919 | 0.852 |
lgbm | wrap | none | 0.948 | 0.943 | 0.919 | 0.933 | 0.958 | 0.944 | 0.919 | 0.852 |
lgbm | pred | none | 0.885 | 0.980 | 0.914 | 0.871 | 0.716 | 0.991 | 0.914 | 0.981 |
lgbm | embed_lgbm | lgbm | 0.885 | 0.977 | 0.914 | 0.871 | 0.716 | 0.991 | 0.914 | 0.981 |
rf | pred | none | 0.885 | 0.975 | 0.897 | 0.868 | 0.735 | 0.967 | 0.897 | 0.926 |
lr | pred | none | 0.880 | 0.978 | 0.910 | 0.865 | 0.707 | 0.991 | 0.910 | 0.981 |
lr | embed_lgbm | lgbm | 0.880 | 0.977 | 0.910 | 0.865 | 0.707 | 0.991 | 0.910 | 0.981 |
lr | none | none | 0.874 | 0.975 | 0.907 | 0.860 | 0.697 | 0.991 | 0.907 | 0.981 |
rf | none | none | 0.874 | 0.980 | 0.896 | 0.858 | 0.708 | 0.975 | 0.896 | 0.944 |
lr | assoc | none | 0.869 | 0.975 | 0.903 | 0.855 | 0.688 | 0.991 | 0.903 | 0.981 |
rf | embed_lgbm | lgbm | 0.869 | 0.977 | 0.898 | 0.854 | 0.693 | 0.983 | 0.898 | 0.963 |
lgbm | none | none | 0.864 | 0.975 | 0.899 | 0.850 | 0.679 | 0.991 | 0.899 | 0.981 |
lgbm | assoc | none | 0.864 | 0.975 | 0.894 | 0.848 | 0.684 | 0.983 | 0.894 | 0.963 |
gandalf | embed_lgbm | lgbm | 0.843 | 0.940 | 0.846 | 0.819 | 0.676 | 0.935 | 0.846 | 0.852 |
knn | assoc | none | 0.832 | 0.924 | 0.866 | 0.816 | 0.637 | 0.973 | 0.866 | 0.944 |
knn | none | none | 0.832 | 0.924 | 0.866 | 0.816 | 0.637 | 0.973 | 0.866 | 0.944 |
rf | assoc | none | 0.812 | 0.970 | 0.852 | 0.796 | 0.607 | 0.972 | 0.852 | 0.944 |
knn | embed_lgbm | lgbm | 0.806 | 0.914 | 0.848 | 0.791 | 0.600 | 0.972 | 0.848 | 0.944 |
knn | pred | none | 0.806 | 0.909 | 0.854 | 0.792 | 0.598 | 0.981 | 0.854 | 0.963 |
sgd | assoc | none | 0.775 | 0.843 | 0.843 | 0.765 | 0.557 | 1.000 | 0.843 | 1.000 |
sgd | none | none | 0.775 | 0.843 | 0.843 | 0.765 | 0.557 | 1.000 | 0.843 | 1.000 |
sgd | pred | none | 0.775 | 0.843 | 0.843 | 0.765 | 0.557 | 1.000 | 0.843 | 1.000 |
sgd | embed_lgbm | lgbm | 0.775 | 0.843 | 0.843 | 0.765 | 0.557 | 1.000 | 0.843 | 1.000 |
dummy | assoc | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | none | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | embed_lgbm | lgbm | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | wrap | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | pred | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
gandalf | assoc | none | 0.681 | 0.670 | 0.542 | 0.538 | 0.387 | 0.738 | 0.542 | 0.222 |
gandalf | wrap | none | 0.665 | 0.788 | 0.744 | 0.658 | 0.455 | 0.951 | 0.744 | 0.926 |
gandalf | none | none | 0.293 | 0.925 | 0.507 | 0.237 | 0.286 | 1.000 | 0.507 | 1.000 |
gandalf | pred | none | 0.283 | 0.810 | 0.500 | 0.220 | 0.283 | nan | 0.500 | 1.000 |
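The dummy rows in the table above behave like a constant classifier: acc equals ppv (0.717), bal-acc and auroc are 0.500, spec is 0.000 and npv is nan. The sketch below reproduces that pattern. It assumes the baseline always predicts the most frequent label and that this label is treated as the positive class. That assumption is chosen because it matches ppv = acc and npv = nan; the paper does not state it. The label vector is hypothetical. Note also that the plain recall implied here is 1.0, unlike the tabulated sens of 0.500.

```python
import math


def majority_baseline(labels: list[int]) -> dict[str, float]:
    """Metrics of a constant classifier that always predicts the most frequent label.

    Assumption: the most frequent label is treated as the positive class, so the
    model never makes a negative prediction.
    """
    majority = max(set(labels), key=labels.count)
    tp = labels.count(majority)   # majority cases: all predicted correctly
    fp = len(labels) - tp         # minority cases: all predicted as the majority
    tn = fn = 0                   # no negative predictions are ever made
    sens = tp / (tp + fn)         # always 1.0
    spec = tn / (tn + fp) if tn + fp else math.nan  # 0.0 when any minority case exists
    return {
        "acc": tp / len(labels),           # equals the majority-class rate
        "bal-acc": (sens + spec) / 2,      # 0.5 regardless of class imbalance
        "ppv": tp / (tp + fp),             # also equals the majority-class rate
        "npv": tn / (tn + fn) if tn + fn else math.nan,  # 0/0: undefined
        "spec": spec,
    }


# Hypothetical labels: 72 majority-class (1) and 28 minority-class (0) cases.
print(majority_baseline([1] * 72 + [0] * 28))
```

Because acc rewards always predicting the majority class, while bal-acc stays at 0.5 for any constant prediction, bal-acc and auroc are the more informative columns for comparing a model against its dummy row.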
Model | Selection | Embed_Selector | acc | auroc | bal-acc | f1 | npv | ppv | sens | spec |
---|---|---|---|---|---|---|---|---|---|---|
knn | wrap | none | 0.959 | 0.940 | 0.937 | 0.943 | 0.967 | 0.963 | 0.937 | 0.889 |
rf | wrap | none | 0.948 | 0.950 | 0.919 | 0.929 | 0.963 | 0.949 | 0.919 | 0.853 |
lr | wrap | none | 0.948 | 0.960 | 0.919 | 0.929 | 0.963 | 0.949 | 0.919 | 0.853 |
rf | none | none | 0.948 | 0.997 | 0.924 | 0.930 | 0.950 | 0.955 | 0.924 | 0.871 |
lgbm | wrap | none | 0.948 | 0.923 | 0.919 | 0.929 | 0.963 | 0.949 | 0.919 | 0.853 |
sgd | wrap | none | 0.948 | 0.911 | 0.919 | 0.929 | 0.963 | 0.949 | 0.919 | 0.853 |
rf | pred | none | 0.943 | 0.994 | 0.915 | 0.923 | 0.947 | 0.949 | 0.915 | 0.853 |
lr | pred | none | 0.922 | 0.995 | 0.895 | 0.894 | 0.921 | 0.946 | 0.895 | 0.835 |
lgbm | none | none | 0.917 | 0.997 | 0.897 | 0.888 | 0.890 | 0.953 | 0.897 | 0.853 |
lr | none | none | 0.911 | 0.996 | 0.882 | 0.881 | 0.913 | 0.939 | 0.882 | 0.816 |
lr | assoc | none | 0.911 | 0.996 | 0.882 | 0.881 | 0.913 | 0.939 | 0.882 | 0.816 |
rf | embed_lgbm | lgbm | 0.911 | 0.997 | 0.898 | 0.889 | 0.882 | 0.958 | 0.898 | 0.871 |
lr | embed_lgbm | lgbm | 0.911 | 0.997 | 0.898 | 0.891 | 0.893 | 0.958 | 0.898 | 0.871 |
gandalf | assoc | none | 0.906 | 0.993 | 0.861 | 0.860 | 0.938 | 0.924 | 0.861 | 0.760 |
sgd | embed_lgbm | lgbm | 0.901 | 0.886 | 0.886 | 0.876 | 0.859 | 0.950 | 0.886 | 0.853 |
knn | none | none | 0.896 | 0.918 | 0.852 | 0.857 | 0.911 | 0.920 | 0.852 | 0.756 |
knn | assoc | none | 0.896 | 0.918 | 0.852 | 0.857 | 0.911 | 0.920 | 0.852 | 0.756 |
gandalf | embed_lgbm | lgbm | 0.896 | 0.905 | 0.848 | 0.861 | 0.895 | 0.908 | 0.848 | 0.740 |
lgbm | pred | none | 0.890 | 0.992 | 0.873 | 0.860 | 0.848 | 0.944 | 0.873 | 0.835 |
lgbm | assoc | none | 0.890 | 0.993 | 0.878 | 0.865 | 0.879 | 0.953 | 0.878 | 0.853 |
knn | pred | none | 0.890 | 0.948 | 0.873 | 0.861 | 0.856 | 0.944 | 0.873 | 0.835 |
sgd | assoc | none | 0.885 | 0.863 | 0.864 | 0.848 | 0.848 | 0.940 | 0.864 | 0.816 |
sgd | pred | none | 0.885 | 0.869 | 0.869 | 0.855 | 0.838 | 0.944 | 0.869 | 0.835 |
sgd | none | none | 0.880 | 0.848 | 0.848 | 0.833 | 0.866 | 0.930 | 0.848 | 0.778 |
knn | embed_lgbm | lgbm | 0.880 | 0.953 | 0.876 | 0.857 | 0.830 | 0.958 | 0.876 | 0.871 |
rf | assoc | none | 0.848 | 0.989 | 0.859 | 0.831 | 0.838 | 0.963 | 0.859 | 0.889 |
lgbm | embed_lgbm | lgbm | 0.843 | 0.997 | 0.850 | 0.821 | 0.835 | 0.958 | 0.850 | 0.871 |
dummy | none | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | embed_lgbm | lgbm | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | assoc | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | wrap | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
dummy | pred | none | 0.717 | 0.500 | 0.500 | 0.418 | nan | 0.717 | 0.500 | 0.000 |
gandalf | pred | none | 0.643 | 0.780 | 0.507 | 0.409 | 0.643 | 0.712 | 0.507 | 0.200 |
gandalf | wrap | none | 0.596 | 0.816 | 0.605 | 0.465 | 0.617 | 0.816 | 0.605 | 0.618 |
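Unlike acc and bal-acc, auroc does not depend on a decision threshold. It depends only on how the predicted scores rank positive cases relative to negative ones, so it can order models differently. In the table above, for example, rf with no feature selection has auroc 0.997 but a lower acc than knn with wrapper selection, whose auroc is 0.940. The sketch below computes auroc through the equivalent Mann-Whitney formulation in plain Python. The function name and example scores are hypothetical. Constant scores, as produced by a dummy model, give exactly 0.5.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen positive case (label 1) receives a
    higher score than a randomly chosen negative case (label 0), counting ties as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:            # O(len(pos) * len(neg)) pairwise comparison,
        for n in neg:        # which is fine for a few hundred patients
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for four patients.
print(auroc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]))  # 0.75: 3 of 4 pairs ranked correctly
```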
Feature | Score |
---|---|
Response_Structural Incomplete | 9.425 × 10⁻¹ |
Stage_nan | 9.425 × 10⁻¹ |
Age | 9.325 × 10⁻¹ |
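Feature names such as Response_Structural Incomplete and Stage_nan follow a feature_level pattern. This suggests that categorical variables were one-hot encoded with an explicit indicator column for missing values. Below is a minimal sketch of that naming scheme, assuming missing entries are stored as None. The helper is hypothetical and is not df-analyze's encoder.

```python
from typing import Optional


def one_hot(values: list[Optional[str]], name: str) -> dict[str, list[int]]:
    """One-hot encode one categorical column as `<name>_<level>` indicator columns,
    plus a `<name>_nan` column flagging missing entries (stored here as None).

    Hypothetical helper illustrating the naming scheme, not df-analyze's encoder.
    """
    levels = sorted({v for v in values if v is not None})
    encoded = {f"{name}_{level}": [int(v == level) for v in values] for level in levels}
    encoded[f"{name}_nan"] = [int(v is None) for v in values]
    return encoded


# Hypothetical stage values, with one missing entry.
for column, indicator in one_hot(["I", "II", None, "I"], "Stage").items():
    print(column, indicator)
```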
Model | Selection | Embed_Selector | acc | auroc | bal-acc | f1 | npv | ppv | sens | spec |
---|---|---|---|---|---|---|---|---|---|---|
gandalf | assoc | none | 0.890 | 0.898 | 0.915 | 0.886 | 0.761 | 1.000 | 0.915 | 1.000 |
gandalf | none | none | 0.864 | 0.933 | 0.881 | 0.858 | 0.741 | 0.962 | 0.881 | 0.940 |
knn | assoc | none | 0.853 | 0.944 | 0.873 | 0.848 | 0.724 | 0.962 | 0.873 | 0.940 |
knn | none | none | 0.853 | 0.944 | 0.873 | 0.848 | 0.724 | 0.962 | 0.873 | 0.940 |
sgd | assoc | none | 0.838 | 0.974 | 0.865 | 0.833 | 0.696 | 0.970 | 0.865 | 0.955 |
sgd | none | none | 0.838 | 0.974 | 0.865 | 0.833 | 0.696 | 0.970 | 0.865 | 0.955 |
gandalf | embed_lgbm | lgbm | 0.817 | 0.932 | 0.828 | 0.808 | 0.690 | 0.916 | 0.828 | 0.866 |
sgd | pred | none | 0.801 | 0.972 | 0.843 | 0.799 | 0.641 | 0.989 | 0.843 | 0.985 |
knn | pred | none | 0.791 | 0.916 | 0.835 | 0.788 | 0.629 | 0.988 | 0.835 | 0.985 |
gandalf | wrap | none | 0.791 | 0.903 | 0.804 | 0.782 | 0.655 | 0.904 | 0.804 | 0.851 |
sgd | embed_lgbm | lgbm | 0.749 | 0.836 | 0.803 | 0.748 | 0.584 | 0.987 | 0.803 | 0.985 |
knn | wrap | none | 0.712 | 0.805 | 0.713 | 0.699 | 0.571 | 0.822 | 0.713 | 0.716 |
knn | embed_lgbm | lgbm | 0.712 | 0.833 | 0.771 | 0.712 | 0.551 | 0.973 | 0.771 | 0.970 |
lr | none | none | 0.712 | 0.965 | 0.771 | 0.712 | 0.551 | 0.973 | 0.771 | 0.970 |
lr | assoc | none | 0.712 | 0.967 | 0.771 | 0.712 | 0.551 | 0.973 | 0.771 | 0.970 |
lr | pred | none | 0.707 | 0.966 | 0.767 | 0.707 | 0.546 | 0.972 | 0.767 | 0.970 |
lgbm | pred | none | 0.702 | 0.933 | 0.763 | 0.701 | 0.542 | 0.972 | 0.763 | 0.970 |
lgbm | embed_lgbm | lgbm | 0.696 | 0.911 | 0.756 | 0.696 | 0.538 | 0.958 | 0.756 | 0.955 |
lgbm | none | none | 0.696 | 0.925 | 0.756 | 0.696 | 0.538 | 0.958 | 0.756 | 0.955 |
lr | embed_lgbm | lgbm | 0.696 | 0.956 | 0.759 | 0.696 | 0.537 | 0.971 | 0.759 | 0.970 |
rf | wrap | none | 0.691 | 0.901 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
rf | none | none | 0.691 | 0.907 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
rf | pred | none | 0.691 | 0.897 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
rf | embed_lgbm | lgbm | 0.691 | 0.931 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
rf | assoc | none | 0.691 | 0.928 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
lr | wrap | none | 0.691 | 0.906 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
lgbm | assoc | none | 0.691 | 0.931 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
sgd | wrap | none | 0.691 | 0.848 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
lgbm | wrap | none | 0.691 | 0.927 | 0.731 | 0.689 | 0.537 | 0.892 | 0.731 | 0.866 |
dummy | assoc | none | 0.649 | 0.500 | 0.500 | 0.394 | nan | 0.649 | 0.500 | 0.000 |
dummy | embed_lgbm | lgbm | 0.649 | 0.500 | 0.500 | 0.394 | nan | 0.649 | 0.500 | 0.000 |
dummy | none | none | 0.649 | 0.500 | 0.500 | 0.394 | nan | 0.649 | 0.500 | 0.000 |
dummy | wrap | none | 0.649 | 0.500 | 0.500 | 0.394 | nan | 0.649 | 0.500 | 0.000 |
dummy | pred | none | 0.649 | 0.500 | 0.500 | 0.394 | nan | 0.649 | 0.500 | 0.000 |
gandalf | pred | none | 0.424 | 0.818 | 0.529 | 0.402 | 0.366 | 0.733 | 0.529 | 0.881 |
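The ATA risk in the cohort table has three levels (Low, Medium, High), whereas Section 3.2 frames ATA risk prediction as binary classification. This part of the document does not state how the levels are grouped into two classes. The sketch below therefore assumes Low versus Medium-or-High, purely for illustration. It also computes the majority-class rate, which is the accuracy a dummy baseline would obtain on the resulting label. The mapping and function names are hypothetical.

```python
from collections import Counter

# Hypothetical grouping, not confirmed by the paper: Low vs. elevated (Medium or High).
BINARY_RISK = {"Low": 0, "Medium": 1, "High": 1}


def binarize_risk(risk: list[str]) -> list[int]:
    """Map the three ATA risk levels onto a 0/1 target using BINARY_RISK."""
    return [BINARY_RISK[level] for level in risk]


def majority_rate(labels: list[int]) -> float:
    """Accuracy of always predicting the most frequent label (the dummy baseline)."""
    return max(Counter(labels).values()) / len(labels)


y = binarize_risk(["Low", "Low", "Medium", "High", "Low"])  # hypothetical patients
print(y, majority_rate(y))  # [0, 0, 1, 1, 0] 0.6
```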
Model | Selection | Embed_Selector | mae | mdae | msqe | r2 | var-exp |
---|---|---|---|---|---|---|---|
lgbm | none | none | 0.203 | 0.029 | 0.105 | 0.125 | 0.277 |
lgbm | embed_linear | linear | 0.207 | 0.075 | 0.099 | 0.178 | 0.298 |
lgbm | pred | none | 0.234 | 0.172 | 0.103 | 0.141 | 0.361 |
elastic | none | none | 0.259 | 0.161 | 0.124 | −0.027 | 0.329 |
knn | embed_linear | linear | 0.293 | 0.200 | 0.186 | −0.544 | −0.011 |
elastic | assoc | none | 0.295 | 0.206 | 0.153 | −0.269 | 0.270 |
elastic | pred | none | 0.311 | 0.244 | 0.164 | −0.361 | 0.250 |
knn | pred | none | 0.313 | 0.300 | 0.185 | −0.534 | 0.140 |
knn | none | none | 0.347 | 0.400 | 0.220 | −0.826 | 0.064 |
knn | assoc | none | 0.347 | 0.400 | 0.220 | −0.826 | 0.064 |
lgbm | assoc | none | 0.372 | 0.397 | 0.220 | −0.832 | 0.081 |
elastic | embed_linear | linear | 0.402 | 0.465 | 0.268 | −1.228 | 0.001 |
elastic | embed_lgbm | lgbm | 0.404 | 0.477 | 0.271 | −1.248 | 0.000 |
elastic | wrap | none | 0.404 | 0.477 | 0.271 | −1.248 | 0.000 |
lgbm | wrap | none | 0.404 | 0.477 | 0.271 | −1.248 | 0.000 |
lgbm | embed_lgbm | lgbm | 0.405 | 0.472 | 0.272 | −1.263 | −0.009 |
sgd | none | none | 0.406 | 0.489 | 0.282 | −1.344 | 0.016 |
knn | embed_lgbm | lgbm | 0.407 | 0.500 | 0.284 | −1.357 | 0.000 |
sgd | assoc | none | 0.409 | 0.495 | 0.286 | −1.373 | 0.008 |
sgd | pred | none | 0.409 | 0.494 | 0.284 | −1.363 | 0.004 |
sgd | embed_linear | linear | 0.410 | 0.499 | 0.288 | −1.396 | 0.001 |
sgd | wrap | none | 0.411 | 0.500 | 0.289 | −1.403 | 0.000 |
rf | pred | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
rf | wrap | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
rf | none | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
rf | embed_linear | linear | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
dummy | assoc | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
rf | assoc | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
dummy | wrap | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
dummy | pred | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
dummy | none | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
dummy | embed_linear | linear | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
dummy | embed_lgbm | lgbm | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
rf | embed_lgbm | lgbm | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
knn | wrap | none | 0.411 | 0.500 | 0.289 | −1.404 | 0.000 |
sgd | embed_lgbm | lgbm | 0.411 | 0.499 | 0.289 | −1.400 | 0.001 |
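The regression columns (mae, mdae, msqe, r2, var-exp) can be sketched in plain Python as below. The function name and the four-case example are hypothetical. The sketch also shows why the dummy rows report var-exp 0.000 alongside a negative r2. Explained variance is computed from the variance of the errors, so it ignores a constant offset, whereas r2 penalizes that offset. For a constant prediction c, var-exp is exactly 0, and r2 equals −(mean − c)²/Var(y), which is negative whenever c differs from the test-set mean. More generally, var-exp − r2 = (mean error)²/Var(y) ≥ 0. This is consistent with var-exp being at least r2 in every row of the table above.

```python
import statistics


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """mae, mdae, msqe, r2 and explained variance (var-exp); y_true must not be constant."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "mae": statistics.fmean(abs_errors),
        "mdae": statistics.median(abs_errors),
        "msqe": ss_res / len(errors),
        "r2": 1 - ss_res / ss_tot,  # penalizes a constant bias in the predictions
        # Explained variance uses the variance of the errors, so a constant bias cancels.
        "var-exp": 1 - statistics.pvariance(errors) / statistics.pvariance(y_true),
    }


# Hypothetical ordinal targets and a constant prediction of 0 for every case.
print(regression_metrics([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

For this constant prediction, var-exp is 0.0 while r2 is about −0.33. This is the same qualitative pattern as the dummy rows.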
Model | Selection | Embed_Selector | mae | mdae | msqe | r2 | var-exp |
---|---|---|---|---|---|---|---|
elastic | none | none | 0.228 | 0.190 | 0.078 | −0.536 | 0.015 |
elastic | assoc | none | 0.234 | 0.190 | 0.081 | −0.591 | 0.042 |
lgbm | assoc | none | 0.236 | 0.209 | 0.087 | −0.748 | −0.087 |
lgbm | none | none | 0.243 | 0.222 | 0.089 | −0.789 | −0.117 |
dummy | assoc | none | 0.256 | 0.300 | 0.128 | −1.255 | 0.000 |
dummy | embed_lgbm | lgbm | 0.256 | 0.300 | 0.128 | −1.255 | 0.000 |
dummy | embed_linear | linear | 0.256 | 0.300 | 0.128 | −1.255 | 0.000 |
dummy | none | none | 0.256 | 0.300 | 0.128 | −1.255 | 0.000 |
dummy | pred | none | 0.256 | 0.300 | 0.128 | −1.255 | 0.000 |
dummy | wrap | none | 0.256 | 0.300 | 0.128 | −1.255 | 0.000 |
elastic | pred | none | 0.257 | 0.244 | 0.096 | −0.906 | −0.087 |
knn | none | none | 0.258 | 0.270 | 0.099 | −1.101 | −0.290 |
knn | assoc | none | 0.258 | 0.270 | 0.099 | −1.101 | −0.290 |
lgbm | embed_linear | linear | 0.264 | 0.245 | 0.101 | −1.037 | −0.319 |
elastic | embed_linear | linear | 0.276 | 0.282 | 0.113 | −1.192 | −0.040 |
knn | embed_linear | linear | 0.284 | 0.290 | 0.120 | −1.453 | −0.357 |
knn | embed_lgbm | lgbm | 0.291 | 0.300 | 0.124 | −1.458 | −0.121 |
lgbm | pred | none | 0.297 | 0.274 | 0.123 | −1.556 | −0.727 |
knn | pred | none | 0.302 | 0.300 | 0.124 | −1.732 | −0.613 |
sgd | pred | none | 0.306 | 0.295 | 0.132 | −1.596 | −0.277 |
sgd | assoc | none | 0.312 | 0.297 | 0.142 | −1.776 | −0.348 |
elastic | wrap | none | 0.319 | 0.365 | 0.158 | −1.949 | 0.000 |
lgbm | wrap | none | 0.319 | 0.365 | 0.158 | −1.949 | 0.000 |
elastic | embed_lgbm | lgbm | 0.322 | 0.343 | 0.151 | −1.935 | −0.182 |
sgd | embed_linear | linear | 0.322 | 0.320 | 0.150 | −1.910 | −0.289 |
lgbm | embed_lgbm | lgbm | 0.324 | 0.373 | 0.158 | −2.037 | −0.196 |
knn | wrap | none | 0.355 | 0.400 | 0.175 | −2.325 | −0.011 |
sgd | none | none | 0.379 | 0.366 | 0.214 | −3.081 | −0.426 |
rf | pred | none | 0.398 | 0.383 | 0.266 | −4.905 | 0.000 |
rf | wrap | none | 0.412 | 0.400 | 0.290 | −5.568 | 0.000 |
sgd | wrap | none | 0.412 | 0.400 | 0.290 | −5.572 | −0.000 |
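Section 4.3 refers to regression on an ordinally encoded ATA risk. One common way to do this maps Low, Medium and High to integers, treats those integers as a numeric target, and maps a continuous prediction back to the nearest level. The sketch below shows that approach under the assumption of an evenly spaced 0/1/2 coding; the paper's actual coding is not given in this section. All names are hypothetical.

```python
# Assumed evenly spaced coding; the paper's actual coding is not given here.
ORDINAL_RISK = {"Low": 0, "Medium": 1, "High": 2}
LEVELS = sorted(ORDINAL_RISK, key=ORDINAL_RISK.get)  # ["Low", "Medium", "High"]


def encode(risk: list[str]) -> list[int]:
    """Ordinal encoding: keeps the order Low < Medium < High as numbers."""
    return [ORDINAL_RISK[level] for level in risk]


def decode(prediction: float) -> str:
    """Map a continuous regression output back to the nearest risk level.

    Values outside the coded range are clipped. Python's round() sends exact
    halves (0.5, 1.5) to the nearest even integer.
    """
    index = min(max(round(prediction), 0), len(LEVELS) - 1)
    return LEVELS[index]


print(encode(["Low", "High", "Medium"]))      # [0, 2, 1]
print([decode(p) for p in (-0.3, 1.4, 2.7)])  # ['Low', 'Medium', 'High']
```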
### Continuous Features (Mutual Information: Higher = More Important)
Feature | mut_info |
---|---|
Age | 1.952 × 10⁻² |
### Categorical Features (Kruskal-Wallis H: Higher = More Important)
Feature | H |
---|---|
Adenopathy | 3.651 × 10² |
Response | 3.331 × 10² |
Thyroid_Function | 3.325 × 10² |
Pathology | 3.146 × 10² |
Physical_Examination | 2.994 × 10² |
Focality | 2.759 × 10² |
Stage | 9.192 × 10⁰ |
Hx_Radiothreapy | 6.477 × 10⁰ |
Gender | 1.474 × 10⁰ |
N | 5.113 × 10⁻¹ |
Hx_Smoking | 3.327 × 10⁻¹ |
Smoking | 1.034 × 10⁻² |
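The Kruskal-Wallis H statistic used to rank the categorical features tests whether the rank distribution of one variable differs across the levels of another. Below is a plain-Python sketch that uses average ranks for ties and the standard tie correction. It assumes, though the table does not state this, that the groups are the levels of a categorical feature and the ranked values are the target. Names and example data are hypothetical. scipy.stats.kruskal provides a tested implementation of the same statistic.

```python
from collections import defaultdict


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: list[str], values: list[float]) -> float:
    """Tie-corrected Kruskal-Wallis H of `values` split by the labels in `groups`.

    H = 12 / (N(N+1)) * sum_g(R_g^2 / n_g) - 3(N+1), divided by
    1 - sum(t^3 - t) / (N^3 - N), where t runs over the sizes of tied runs.
    """
    n = len(values)
    rank_sum: dict[str, float] = defaultdict(float)
    size: dict[str, int] = defaultdict(int)
    for g, r in zip(groups, average_ranks(values)):
        rank_sum[g] += r
        size[g] += 1
    h = 12 / (n * (n + 1)) * sum(rank_sum[g] ** 2 / size[g] for g in size) - 3 * (n + 1)
    tie_sizes: dict[float, int] = defaultdict(int)
    for v in values:
        tie_sizes[v] += 1
    correction = 1 - sum(t**3 - t for t in tie_sizes.values()) / (n**3 - n)
    return h / correction  # undefined (division by zero) if all values are tied


# Hypothetical data: a two-level feature against an ordinally coded target.
print(kruskal_wallis_h(["a", "a", "b", "b", "b"], [0, 0, 1, 1, 2]))
```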
References
- Cabanillas, M.E.; McFadden, D.G.; Durante, C. Thyroid Cancer. Lancet 2016, 388, 2783–2795.
- Boucai, L.; Zafereo, M.; Cabanillas, M.E. Thyroid Cancer: A Review. JAMA 2024, 331, 425–435.
- Borzooei, S.; Briganti, G.; Golparian, M.; Lechien, J.R.; Tarokhian, A. Machine Learning for Risk Stratification of Thyroid Cancer Patients: A 15-Year Cohort Study. Eur. Arch. Otorhinolaryngol. 2024, 281, 2095–2104.
- Giuffrida, D.; Prestifilippo, A.; Scarfia, A.; Martino, D.; Marchisotta, S. New Treatment in Advanced Thyroid Cancer. J. Oncol. 2012, 2012, 391629.
- Mazzaferri, E.; Robbins, R.; Spencer, C.; Braverman, L.; Pacini, F.; Wartofsky, L.; Haugen, B.; Sherman, S.; Cooper, D.; Braunstein, G.; et al. A Consensus Report of the Role of Serum Thyroglobulin as a Monitoring Method for Low-Risk Patients with Papillary Thyroid Carcinoma. J. Clin. Endocrinol. Metab. 2003, 88, 1433–1441.
- Santhanam, P.; Ladenson, P. Surveillance for Differentiated Thyroid Cancer Recurrence. Endocrinol. Metab. Clin. N. Am. 2019, 48, 239–252.
- Pandya, A.; Caoili, E.; Jawad-Makki, F.; Wasnik, A.; Shankar, P.R.; Bude, R.; Haymart, M.; Davenport, M. Limitations of the 2015 ATA Guidelines for Prediction of Thyroid Cancer: A Review of 1947 Consecutive Aspirations. J. Clin. Endocrinol. Metab. 2018, 103, 3496–3502.
- Toraih, E.; Fawzy, M.; Hussein, M.; EL-Labban, M.; Ruiz, E.M.L.; Attia, A.-E.-A.; Halat, S.; Moroz, K.; Errami, Y.; Zerfaoui, M.; et al. MicroRNA-Based Risk Score for Predicting Tumor Progression Following Radioactive Iodine Ablation in Well-Differentiated Thyroid Cancer Patients: A Propensity-Score Matched Analysis. Cancers 2021, 13, 4649.
- Tuttle, R.; Tala, H.; Shah, J.; Leboeuf, R.; Ghossein, R.; Gonen, M.; Brokhin, M.; Omry, G.; Fagin, J.; Shaha, A. Estimating Risk of Recurrence in Differentiated Thyroid Cancer after Total Thyroidectomy and Radioactive Iodine Remnant Ablation: Using Response to Therapy Variables to Modify the Initial Risk Estimates Predicted by the New American Thyroid Association Staging System. Thyroid. Off. J. Am. Thyroid. Assoc. 2010, 20, 1341–1349.
- Li, L.-R.; Du, B.; Liu, H.-Q.; Chen, C. Artificial Intelligence for Personalized Medicine in Thyroid Cancer: Current Status and Future Perspectives. Front. Oncol. 2021, 10, 604051.
- Paul, R.; Juliano, A.; Faquin, W.; Chan, A.W. An Artificial Intelligence Ultrasound Platform for Screening and Staging of Thyroid Cancer. Int. J. Radiat. Oncol. Biol. Phys. 2022, 112, e8.
- Nagendra, L.; Pappachan, J.M.; Fernandez, C.J. Artificial Intelligence in the Diagnosis of Thyroid Cancer: Recent Advances and Future Directions. Artif. Intell. Cancer 2023, 4, 1–10.
- Habchi, Y.; Himeur, Y.; Kheddar, H.; Boukabou, A.; Atalla, S.; Chouchane, A.; Ouamane, A.; Mansoor, W. AI in Thyroid Cancer Diagnosis: Techniques, Trends, and Future Directions. Systems 2023, 11, 519.
- Ahn, J.; Lee, M.-C. Application of Artificial Intelligence to Evaluate Thyroid Nodules. J. Clin. Otolaryngol. Head Neck Surg. 2023, 34, 17–22.
- Cao, C.-L.; Li, Q.; Tong, J.; Shi, L.; Li, W.-X.; Xu, Y.; Cheng, J.; Du, T.-T.; Li, J.; Cui, X. Artificial Intelligence in Thyroid Ultrasound. Front. Oncol. 2023, 13, 1060702.
- Kim, S.Y.; Kim, Y.I.; Kim, H.J.; Chang, H.; Kim, S.M.; Lee, Y.S.; Kwon, S.S.; Shin, H.; Chang, H.S.; Park, C.S. New approach of prediction of recurrence of thyroid cancer patients using machine learning. Medicine 2021, 100, e27493.
- Grani, G.; Gentili, M.; Siciliano, F.; Albano, D.; Zilioli, V.; Morelli, S.; Puxeddu, E.; Zatelli, M.C.; Gagliardi, I.; Piovesan, A.; et al. A data-driven approach to refine predictions of differentiated thyroid cancer outcomes: A prospective multicenter study. J. Clin. Endocrinol. Metab. 2023, 108, 1921–1928.
- Clark, E.; Price, S.; Lucena, T.; Haberlein, B.; Wahbeh, A.; Seetan, R. Predictive Analytics for Thyroid Cancer Recurrence: A Machine Learning Approach. Knowledge 2024, 4, 557–570.
- Park, Y.M.; Lee, B.-J. Machine learning-based prediction model using clinico-pathologic factors for papillary thyroid carcinoma recurrence. Sci. Rep. 2021, 11, 4948.
- Firat Atay, F.; Yagin, F.H.; Colak, C.; Elkiran, E.T.; Mansuri, N.; Ahmad, F.; Ardigò, L.P. A hybrid machine learning model combining association rule mining and classification algorithms to predict differentiated thyroid cancer recurrence. Front. Med. 2024, 11, 1461372.
- Kim, G.H.; Lee, D.H.; Choi, J.W.; Jeon, H.J.; Park, S. Multimodal Neural Network for Recurrence Prediction of Papillary Thyroid Carcinoma. Adv. Intell. Syst. 2023, 5, 2200365.
- Arslan, A.K.; Colak, C. Explainable Machine Learning Models for Predicting Recurrence in Differentiated Thyroid Cancer. Med. Rec. 2024, 6, 468–473.
- Gurcan, F.; Soylu, A. Learning from Imbalanced Data: Integration of Advanced Resampling Techniques and Machine Learning Models for Enhanced Cancer Diagnosis and Prognosis. Cancers 2024, 16, 3417.
- Thomas, J. Predicting Differentiated Thyroid Cancer Outcomes Using Machine Learning: A Move toward Precision Medicine. Clin. Thyroidol. 2024, 36, 64–66.
- Gu, J.; Xie, R.; Zhao, Y.; Zhao, Z.; Xu, D.; Ding, M.; Lin, T.; Xu, W.; Nie, Z.; Miao, E.; et al. A machine learning-based approach to predicting the malignant and metastasis of thyroid cancer. Front. Oncol. 2022, 12, 938292.
- Mourad, M.; Moubayed, S.; Dezube, A.; Mourad, Y.; Park, K.; Torreblanca-Zanca, A.; Torrecilla, J.S.; Cancilla, J.C.; Wang, J. Machine Learning and Feature Selection Applied to SEER Data to Reliably Assess Thyroid Cancer Prognosis. Sci. Rep. 2020, 10, 5176.
- Setiawan, K.E. Predicting recurrence in differentiated thyroid cancer: A comparative analysis of various machine learning models including ensemble methods with chi-squared feature selection. Commun. Math. Biol. Neurosci. 2024, 2024, 55.
- Sibarani, I.J.B.; Loy, K.M.; Surharjito, S. Enhancing Predictive Accuracy for Differentiated Thyroid Cancer (DTC) Recurrence Through Advanced Data Mining Techniques. TIN Terap. Inform. Nusant. 2024, 5, 11–22.
- Xi, N.M.; Wang, L.; Yang, C. Improving the diagnosis of thyroid cancer by machine learning and clinical data. Sci. Rep. 2022, 12, 11143.
- Ahmad, M.A.; Haddad, J. An Explainable AI Model for Predicting the Recurrence of Differentiated Thyroid Cancer. In Proceedings of the 2024 Second Jordanian International Biomedical Engineering Conference (JIBEC), Amman, Jordan, 27–28 November 2024.
- Bharath, K.; Sai Sabatha, A. Predicting Recurrence in Differentiated Thyroid Cancer: A Machine Learning Approach. In Proceedings of the International Conference on Advances in Data Engineering and Intelligent Computing Systems, Chennai, India, 18–19 April 2024.
- Arvidsson, J. Differentiated Thyroid Cancer Recurrence. 2024. Available online: https://www.kaggle.com/datasets/joebeachcapital/differentiated-thyroid-cancer-recurrence (accessed on 7 May 2025).
- Stfxecutables. df-analyze: AutoML Command-Line Tool. 2024. Available online: https://github.com/stfxecutables/df-analyze (accessed on 7 May 2025).
- Levman, J.; Jennings, M.; Rouse, E.; Berger, D.; Kabaria, P.; Nangaku, M.; Gondra, I.; Takahashi, E. A Morphological Study of Schizophrenia with Magnetic Resonance Imaging, Advanced Analytics, and Machine Learning. Front. Neurosci. 2022, 16, 926426.
- Saville, K.; Berger, D.; Levman, J. Mitigating Bias Due to Race and Gender in Machine Learning Predictions of Traffic Stop Outcomes. Information 2024, 15, 687.
- Figueroa, J.; Etim, P.; Shibu, A.; Berger, D.; Levman, J. Diagnosing and Characterizing Chronic Kidney Disease with Machine Learning: The Value of Clinical Patient Characteristics as Evidenced from an Open Dataset. Electronics 2024, 13, 4326.
- Huang, X.; Gauthier, C.; Berger, D.; Cai, H.; Levman, J. Identifying Cortical Molecular Biomarkers Potentially Associated with Learning in Mice Using Artificial Intelligence. Int. J. Mol. Sci. 2025, 26, 6878.
- Kendall, J.; Gaspar, G.; Berger, D.; Levman, J. Machine Learning and Feature Selection in Pediatric Appendicitis. Tomography 2025, 11, 90.
- Joseph, M.; Raj, H. GANDALF: Gated adaptive network for deep automated learning of features. arXiv 2022, arXiv:2207.08548.
- Berger, D. Redundancy-Aware Feature Selection. Available online: https://github.com/stfxecutables/df-analyze/tree/experimental?tab=readme-ov-file#redundancy-aware-feature-selection-new (accessed on 23 December 2024).
Measurement Name | Values with Counts and Proportions (%) |
---|---|
Gender | Female (312, 81.5%), Male (71, 18.5%) |
Smoking | No (334, 87.2%), Yes (49, 12.8%) |
History (Hx) of Smoking | No (355, 92.7%), Yes (28, 7.3%) |
History of Radiotherapy | No (376, 98.2%), Yes (7, 1.8%) |
Thyroid Function | Euthyroid (332, 86.7%), Clinical Hyperthyroidism (20, 5.2%), Subclinical Hypothyroidism (14, 3.7%), Clinical Hypothyroidism (12, 3.1%), Subclinical Hyperthyroidism (5, 1.3%) |
Physical Examination | Multinodular goiter (140, 36.6%), Single nodular goiter-right (140, 36.6%), Single nodular goiter-left (89, 23.2%), Normal (7, 1.8%), Diffuse goiter (7, 1.8%) |
Adenopathy | No (277, 72.3%), Right (48, 12.5%), Bilateral (32, 8.4%), Left (17, 4.4%), Extensive (7, 1.8%), Posterior (2, 0.5%) |
Pathology | Papillary (287, 74.9%), Micropapillary (48, 12.5%), Follicular (28, 7.3%), Hürthle cell (20, 5.2%) |
Focality | Uni-Focal (247, 64.5%), Multi-Focal (136, 35.5%) |
Risk | Low (249, 65.0%), Medium (102, 26.6%), High (32, 8.4%) |
T—Tumor | T1a (49, 12.8%), T1b (43, 11.2%), T2 (151, 39.4%), T3a (96, 25.1%), T3b (16, 4.2%), T4a (20, 5.2%), T4b (8, 2.1%) |
N—Node | N0 (268, 70.0%), N1b (93, 24.3%), N1a (22, 5.7%) |
M—Metastasis | M0 (365, 95.3%), M1 (18, 4.7%) |
Stage | I (333, 87.0%), II (32, 8.4%), III (4, 1.0%), IVA (3, 0.8%), IVB (11, 2.9%) |
Response | Biochemical Incomplete (23, 6.0%), Excellent (208, 54.3%), Indeterminate (61, 15.9%), Structural Incomplete (91, 23.8%) |
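The percentages in this table are category counts divided by the cohort size; the Gender counts (312 + 71) imply 383 patients. The small sketch below recomputes a row and checks that its counts sum to the cohort size, which is a quick way to catch transcription slips in tables like this one. The helper name is hypothetical.

```python
def proportions(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of `total` for each category, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError(f"counts sum to {sum(counts.values())}, expected {total}")
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


# Counts from the Gender row of the table above; 383 is the cohort size they imply.
print(proportions({"Female": 312, "Male": 71}, total=383))
```

Applied to the Smoking counts (334 and 49), the same check gives 87.2% and 12.8%.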
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Penner, M.A.; Berger, D.; Guo, X.; Levman, J. Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Appl. Sci. 2025, 15, 9397. https://doi.org/10.3390/app15179397
Penner MA, Berger D, Guo X, Levman J. Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Applied Sciences. 2025; 15(17):9397. https://doi.org/10.3390/app15179397
Chicago/Turabian Style: Penner, Matthew A., Derek Berger, Xuchen Guo, and Jacob Levman. 2025. "Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction" Applied Sciences 15, no. 17: 9397. https://doi.org/10.3390/app15179397
APA Style: Penner, M. A., Berger, D., Guo, X., & Levman, J. (2025). Machine Learning in Differentiated Thyroid Cancer Recurrence and Risk Prediction. Applied Sciences, 15(17), 9397. https://doi.org/10.3390/app15179397