A Machine Learning and Deep Learning Approach for the Classification of Thyroid Disorders Using Multi-Source Clinical Data

Andreou, Kypros; Georgakopoulos, Eleftherios; Toufexis, Costas; Papaloizou, Nikos L.; Exarchos, Themis P.; Vlamos, Panagiotis; Krokidis, Marios G.

doi:10.3390/biomedinformatics6030034


by Kypros Andreou¹, Eleftherios Georgakopoulos², Costas Toufexis³, Nikos L. Papaloizou⁴, Themis P. Exarchos¹, Panagiotis Vlamos¹ and Marios G. Krokidis¹,*

¹ Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49 100 Corfu, Greece
² FGA Center, 106 73 Athens, Greece
³ Hippocrateon Private Hospital, 2408 Nicosia, Cyprus
⁴ Clinical Laboratories Nikos Papaloizou, 2540 Dali, Cyprus
* Author to whom correspondence should be addressed.

BioMedInformatics 2026, 6(3), 34; https://doi.org/10.3390/biomedinformatics6030034

Submission received: 16 April 2026 / Revised: 25 May 2026 / Accepted: 29 May 2026 / Published: 2 June 2026

(This article belongs to the Section Applied Biomedical Data Science)


Abstract

The increasing prevalence of autoimmune thyroid diseases and thyroid cancer highlights the need for improved diagnostic support approaches. Traditional diagnostic methods often rely primarily on biochemical markers or qualitative imaging evaluations, which may delay accurate disease identification and timely treatment. The present study shows that machine learning models applied to biochemical, demographic, and ultrasound data can achieve strong performance in classifying thyroid disorders. Tree-based algorithms, particularly XGBoost and Random Forest, performed best on the structured clinical data for Hashimoto’s thyroiditis and Graves’ disease. On ultrasound images, an EfficientNet-B0 classifier achieved moderate accuracy (76%) in distinguishing benign from malignant nodules, and a U-Net model achieved 94% accuracy in nodule segmentation. Although the results highlight the potential of multi-source, data-driven approaches to support clinical decision-making, performance variability indicates the need for validation on larger and more diverse datasets. Future work should focus on expanding data sources, incorporating additional biomarkers, and improving model interpretability to facilitate clinical translation.

Keywords:

thyroid diseases; machine learning; early diagnosis; ultrasound imaging

1. Introduction

Thyroid diseases represent a major global health concern, with a notable rise in autoimmune thyroid disorders and thyroid cancer reported in recent decades [1,2]. The thyroid gland, located at the base of the neck, plays a crucial role in maintaining metabolic homeostasis through the secretion of triiodothyronine (T3) and thyroxine (T4), regulated by thyroid-stimulating hormone (TSH). These hormones control essential physiological functions such as metabolism, heart rate, thermoregulation, and energy expenditure [3]. Dysregulation in their production leads to common endocrine disorders, including hypothyroidism and hyperthyroidism, which can significantly affect quality of life and overall health.

Hypothyroidism results from insufficient secretion of thyroid hormones, leading to decreased metabolic activity and symptoms such as fatigue, cold intolerance, and weight gain [4]. In contrast, hyperthyroidism results from excessive hormone release, accelerating metabolism and causing clinical manifestations such as palpitations, anxiety, and unintended weight loss [5,6]. Among autoimmune thyroid conditions, Hashimoto’s thyroiditis is the most prevalent cause of hypothyroidism and is characterized by lymphocytic infiltration and thyroid-specific autoantibodies [7], as well as mitochondrial dysfunction and reduced micronutrient bioavailability [8]. Conversely, Graves’ disease is the leading cause of hyperthyroidism and is associated with glandular hyperactivity, goiter formation, and, in some cases, ophthalmopathy [9]. Thyroid nodules are also very common, affecting up to two-thirds of the population; while the majority are benign, a small but clinically significant proportion are malignant [10].

Although biochemical testing and ultrasound imaging constitute the cornerstone of thyroid disease evaluation, early and accurate diagnosis remains challenging. Overlapping clinical features, physiological variability between individuals, and limitations in subjective image interpretation can hinder precise clinical assessment. Furthermore, the increasing volume of diagnostic data generated in clinical practice highlights the need for automated systems and artificial intelligence-based approaches that can assist clinicians in synthesizing information and identifying subtle patterns not easily detectable by humans [11].

In recent years, machine learning and deep learning have emerged as powerful tools in medical research, demonstrating significant potential in disease prediction, image interpretation, and clinical decision support [12]. These methods are particularly well suited to endocrine disorders, where diagnosis often requires the integration of heterogeneous data types, including hormone measurements, antibody profiles, demographic factors, and ultrasound imaging characteristics [13,14]. Combining these data sources in computational models may provide a more comprehensive picture of thyroid disorders and help overcome the limitations of traditional diagnostic approaches. Here, we implement and evaluate machine learning models for the classification of thyroid disorders using biochemical, demographic, and ultrasound imaging data. Decision Tree, Random Forest, Gradient Boosting, and XGBoost classifiers were applied to structured clinical data, and two deep learning architectures, an EfficientNet-B0-based CNN and a U-Net, were applied to ultrasound images; performance was assessed using ROC AUC, F1-score, and accuracy. The findings suggest that machine learning may support clinical decision-making in thyroid disease management.

2. Materials and Methods

2.1. Data Sources and Study Population

The study utilized three independent datasets comprising a total of 17,782 data instances across all sources. The first dataset consists of 17,412 thyroid ultrasound images with corresponding segmentation masks, obtained from the Center for Artificial Intelligence in Medicine and Imaging at Stanford University (AIMI) thyroid ultrasound cine-clip repository (https://aimi.stanford.edu/datasets/thyroid-ultrasound-cine-clip, accessed on 20 June 2025).

The second dataset includes 180 samples related to Graves’ disease, comprising 89 confirmed patient cases and 91 healthy control subjects [15], obtained from Figshare (https://doi.org/10.6084/m9.figshare.20762900 and https://doi.org/10.6084/m9.figshare.5768562, both accessed on 5 July 2025).

The third dataset comprises 190 samples related to Hashimoto’s thyroiditis, including 94 confirmed patient cases and 96 healthy control subjects, provided by collaborating endocrinology specialists together with associated biochemical and clinical variables, including TSH, T3, T4, and thyroid-specific autoantibodies.

For both the Graves’ and Hashimoto’s datasets, an approximately balanced cohort was constructed by selecting an equal or near-equal number of disease cases and healthy control subjects from the available source data. No synthetic data generation or algorithmic class rebalancing techniques (e.g., oversampling, undersampling, or class weighting) were applied; class balance was achieved solely through dataset curation.

All data were fully anonymized with no personal identifiable information retained. Each dataset was used independently for task-specific model development, including ultrasound segmentation, Graves’ disease classification, and Hashimoto’s disease analysis.

2.2. Data Preprocessing

2.2.1. Structured Clinical and Endocrinology Data

The structured clinical datasets corresponding to Hashimoto’s thyroiditis and Graves’ disease comprised numerical laboratory measurements alongside categorical demographic and clinical characteristics. To prepare the attributes for algorithmic processing, categorical variables were converted into a binary format using Boolean encoding. For both patient groups, the datasets were partitioned using a reproducible randomized split, allocating 70% of the observations for model training and the remaining 30% for independent validation and testing to evaluate baseline generalizability.
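The encoding and split described above can be sketched in plain Python. This is a minimal illustration, not the authors’ code: the row format (a list of dicts), the function names `boolean_encode` and `split_70_30`, and the reading of "Boolean encoding" as one indicator column per category level are assumptions. The test-set size is rounded up, mirroring how scikit-learn’s `train_test_split` sizes the test set, but the shuffled order will not match scikit-learn’s, so the specific rows in each split will differ.

```python
import random


def boolean_encode(rows, categorical):
    """Replace each categorical column with one Boolean indicator per observed level.

    rows: list of dicts mapping column name -> value (hypothetical format).
    Numeric columns are copied unchanged.
    """
    levels = {c: sorted({str(r[c]) for r in rows}) for c in categorical}
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categorical}
        for c in categorical:
            for level in levels[c]:
                out[f"{c}={level}"] = str(r[c]) == level
        encoded.append(out)
    return encoded


def split_70_30(rows, seed=42, test_pct=30):
    """Reproducible randomized split; the test size is rounded up."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)          # fixed seed -> same split every run
    n_test = -(-len(rows) * test_pct // 100)  # integer ceiling of 30% of the rows
    train = [rows[i] for i in idx[n_test:]]
    test = [rows[i] for i in idx[:n_test]]
    return train, test
```

With 190 Hashimoto’s rows this yields 133 training and 57 test rows, and with 180 Graves’ rows 126 and 54, matching the totals of the confusion matrices reported in Section 3.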

Four machine learning algorithms were compared: Decision Trees, Random Forests, Gradient Boosting, and XGBoost classifiers. For the Hashimoto’s thyroiditis cohort, the models were configured with standard baseline hyperparameters; specifically, the Gradient Boosting model used 100 estimators, a learning rate of 0.1, and a maximum tree depth of 3, while the Random Forest used 100 estimators.

For the Graves’ disease cohort, the hyperparameters were adjusted to allow more complex decision boundaries. The Gradient Boosting Classifier was trained with 100 estimators, a learning rate of 0.1, and a larger maximum depth of 5, and the Random Forest Classifier used 100 estimators, a maximum tree depth of 10, and a minimum of 3 samples required to split a node. To ensure reproducibility and a consistent comparison across the Decision Tree, Random Forest, Gradient Boosting, and XGBoost implementations, a fixed random state of 42 was used for dataset splitting and model training.
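The reported hyperparameters can be collected in a small configuration, shown here with scikit-learn parameter names (`n_estimators`, `learning_rate`, `max_depth`, `min_samples_split`, `random_state`). This is a sketch assuming scikit-learn and the `xgboost` package; the text does not name the libraries used, and leaving the Decision Tree and XGBoost models at library defaults (apart from the seed) is also an assumption.

```python
# Hyperparameters as reported for each cohort (scikit-learn parameter names).
SEED = 42

HYPERPARAMS = {
    "hashimoto": {
        "gradient_boosting": {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3},
        "random_forest": {"n_estimators": 100},
    },
    "graves": {
        "gradient_boosting": {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 5},
        "random_forest": {"n_estimators": 100, "max_depth": 10, "min_samples_split": 3},
    },
}


def build_models(cohort):
    """Instantiate the four classifiers for one cohort (needs scikit-learn and xgboost)."""
    # Imported lazily so the configuration above can be inspected without these packages.
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    params = HYPERPARAMS[cohort]
    return {
        "decision_tree": DecisionTreeClassifier(random_state=SEED),
        "random_forest": RandomForestClassifier(random_state=SEED, **params["random_forest"]),
        "gradient_boosting": GradientBoostingClassifier(
            random_state=SEED, **params["gradient_boosting"]
        ),
        "xgboost": XGBClassifier(random_state=SEED),
    }
```

The Hashimoto’s Gradient Boosting and Random Forest settings coincide with scikit-learn’s defaults for `GradientBoostingClassifier` and `RandomForestClassifier`, so in this reading only the Graves’ configuration departs from the library baseline.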

Figure 1 describes the workflow of the machine learning models for Hashimoto’s and Graves’ disease prediction.

2.2.2. Ultrasound Image Data

The ultrasound data were used to train two models: a Convolutional Neural Network (CNN) for nodule classification and a U-Net architecture for region segmentation. For the nodule classification task, the Stanford Thyroid Ultrasound Dataset was employed to distinguish between benign and malignant nodules based on the Thyroid Imaging Reporting and Data System (TI-RADS) scoring system. Nodules labeled TI-RADS 2 and 3 were categorized as benign, while those labeled TI-RADS 4 and 5 were categorized as malignant.
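The TI-RADS binarization amounts to a small lookup. The function name is hypothetical, and rejecting categories outside TR2 to TR5 (such as TR1) is an assumption, since the text does not say how other categories were handled.

```python
def tirads_to_label(tirads):
    """Map a TI-RADS category to the binary label: 2-3 -> 0 (benign), 4-5 -> 1 (malignant)."""
    mapping = {2: 0, 3: 0, 4: 1, 5: 1}
    try:
        return mapping[int(tirads)]
    except (KeyError, ValueError, TypeError):
        # Categories outside 2-5 are not covered by the described labeling scheme.
        raise ValueError(f"TI-RADS category {tirads!r} has no binary label") from None
```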

The original dataset for the classification network contained 11,154 ultrasound images from 124 patients. To reduce class imbalance and improve generalization, class-balancing techniques were applied, yielding a final classification subset of 95 patients and 4750 images. To prevent individual patients from dominating this dataset, a maximum of 50 images per patient was enforced. Dataset partitioning for classification was performed strictly by patient ID, so that all images from a given patient were assigned to only one of the training, validation, or test sets, avoiding data leakage.
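A patient-level cap and split of this kind can be sketched as follows. The 70/15/15 train/validation/test fractions, the random choice of which images to keep above the cap, the seed, and the input format of `(patient_id, image_path)` pairs are assumptions for illustration; the text specifies only the 50-image cap and the grouping by patient ID.

```python
import random
from collections import defaultdict


def cap_and_split_by_patient(images, max_per_patient=50, fractions=(0.7, 0.15, 0.15), seed=42):
    """Cap images per patient, then assign whole patients to train/val/test.

    images: list of (patient_id, image_path) pairs.
    Returns three lists of pairs; no patient appears in more than one of them.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for pid, path in images:
        by_patient[pid].append(path)

    # Keep at most `max_per_patient` randomly chosen images for each patient.
    capped = {}
    for pid in sorted(by_patient):
        paths = sorted(by_patient[pid])
        capped[pid] = rng.sample(paths, min(max_per_patient, len(paths)))

    # Shuffle patient IDs (not images) and cut the list into three groups.
    patients = sorted(capped)
    rng.shuffle(patients)
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = (
        patients[:n_train],
        patients[n_train:n_train + n_val],
        patients[n_train + n_val:],
    )
    return tuple([(pid, p) for pid in group for p in capped[pid]] for group in groups)
```

If each of the 95 patients contributes at least 50 images, the cap yields exactly 95 × 50 = 4750 images, which matches the size of the classification subset reported above.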

For the U-Net segmentation task, a broader cohort of 146 patients was used. The patients were split 85%/15% into training and test sets: 124 patients (10,561 images) were allocated to training, while the remaining 22 patients (2554 images) were reserved exclusively for independent testing.

Prior to model training, all thyroid ultrasound images underwent a standardized preprocessing pipeline. Using the OpenCV library, images were cropped to isolate the region of interest (ROI) and eliminate extraneous background details.

Images were then resized to match the input size of each network: 256 × 256 pixels for the classification model and 128 × 128 pixels for the U-Net segmentation model. After resizing, pixel intensities in all subsets were normalized to the range [0, 1].
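The crop, resize, and normalization steps can be sketched without dependencies on grayscale images stored as lists of rows. The authors used OpenCV, where `cv2.resize` would be the usual choice; this sketch substitutes nearest-neighbour interpolation in pure Python, and the ROI coordinates, 8-bit input, and division by 255 are assumptions.

```python
def crop(image, top, left, height, width):
    """Crop a 2D grayscale image (list of rows) to a rectangular region of interest."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize: each output pixel copies the closest source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize_uint8(image):
    """Scale 8-bit intensities (0-255) to the range [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]


def preprocess(image, roi, size):
    """ROI crop -> resize to size x size -> scale to [0, 1]."""
    return normalize_uint8(resize_nearest(crop(image, *roi), size, size))
```

The same function serves both networks by changing `size`: 256 for the classifier and 128 for the U-Net.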

To increase data diversity and mitigate overfitting, the same data augmentation pipeline was applied to the training sets of both models. The transformations included horizontal and vertical flipping, rotation, and minor spatial translations, intended to make both architectures more robust to the variability of unseen clinical images.
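For segmentation, each geometric augmentation must be applied identically to the image and its mask, or the labels no longer line up with the pixels. The sketch below shows this for flips and integer-pixel translations; rotation is omitted for brevity, and the 50% flip probability, the ±2-pixel shift range, and the zero fill are assumptions rather than values from the study.

```python
import random


def hflip(a):
    """Mirror a 2D array left-right."""
    return [row[::-1] for row in a]


def vflip(a):
    """Mirror a 2D array top-bottom."""
    return a[::-1]


def translate(a, dy, dx, fill=0):
    """Shift a 2D array by (dy, dx) pixels, filling uncovered pixels with `fill`."""
    h, w = len(a), len(a[0])
    return [
        [a[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill for x in range(w)]
        for y in range(h)
    ]


def augment_pair(image, mask, rng, max_shift=2):
    """Apply the same random flips and translation to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:
        image, mask = vflip(image), vflip(mask)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return translate(image, dy, dx), translate(mask, dy, dx)
```

For the classifier, only the image is transformed, so the same functions can be applied to the image alone.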

The EfficientNet-B0 architecture was selected for the binary classification task and implemented via transfer learning. To retain pre-trained features while adapting to thyroid pathology, the first 20 layers of the network were frozen and the remaining 70 layers were fine-tuned. The classification network was trained for up to 40 epochs with a learning rate of 1 × 10⁻³. For regularization and to prevent overfitting, a dropout rate of 0.7 was applied together with early stopping with a patience of 10 epochs.
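The layer-freezing split and the early-stopping rule can be illustrated independently of any deep learning framework. This sketch assumes that early stopping monitors validation loss and that the 90 layers are counted in network order; the study does not name its framework. In Keras, a comparable callback would be `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)`.

```python
def trainable_flags(n_layers, n_frozen=20):
    """First `n_frozen` layers frozen (False); the rest are fine-tuned (True)."""
    return [i >= n_frozen for i in range(n_layers)]


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def run_schedule(val_losses, max_epochs=40, patience=10):
    """Return (last epoch run, best epoch) for a sequence of validation losses."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return min(len(val_losses), max_epochs) - 1, stopper.best_epoch
```

For example, a validation loss that stops improving after epoch 5 (counting from 0) ends training at epoch 15, well before the 40-epoch limit.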

The U-Net architecture was trained for pixel-level segmentation of thyroid structures. Given the size of the segmentation dataset, the U-Net model was trained for 5 epochs, which was sufficient to learn the ground-truth annotations while limiting computational cost.

3. Results and Discussion

3.1. Hashimoto’s Disease

For Hashimoto’s disease, the Random Forest algorithm achieved the highest performance, with F1-score, precision, recall, and accuracy all equal to 96%, demonstrating a strong ability to distinguish affected individuals from healthy controls. The XGBoost and Gradient Boosting models also performed well, with F1-scores of 93% and 91%, respectively, whereas the Decision Tree achieved a more modest accuracy of 88%. The Random Forest and XGBoost models both reached an AUC of 0.98, indicating excellent discrimination between patients and controls (Table S1).
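AUC summarizes how well a model ranks patients above controls across all decision thresholds, rather than sensitivity and specificity at a single threshold. A minimal pure-Python sketch (function name hypothetical), equal in value to scikit-learn’s `roc_auc_score` for binary labels, computes it as the fraction of patient-control pairs that the model orders correctly.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. The pairwise loop is O(n_pos * n_neg), which is
    adequate for test sets of a few dozen patients.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

An AUC of 0.98 therefore means that about 98% of patient-control pairs were ranked in the correct order by the model’s predicted probabilities.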

The confusion matrix for the XGBoost algorithm shows 26 correctly classified Hashimoto’s cases and 26 correctly classified healthy individuals, with only 3 false positives and 2 false negatives (Figure 2A). Overall, these findings indicate that tree-based and boosting algorithms, particularly Random Forest and XGBoost, are the most effective approaches for classifying Hashimoto’s thyroiditis, capturing complex relationships in biochemical and clinical data more accurately than linear or distance-based models.

Recent work has moved beyond the standard classification of Hashimoto’s thyroiditis to explore more specific subtypes using machine learning. Zhao et al. proposed a non-invasive ML model to distinguish IgG4-related Hashimoto’s thyroiditis from the non-IgG4 form. Using serological markers, especially IgG4-specific TgAb and TPOAb, the researchers compared Logistic Regression, SVM, and Random Forest; Random Forest performed best, with an AUC of 0.87–0.92, outperforming traditional statistical approaches [16]. Although that study addresses subtype differentiation rather than general disease detection, its findings align with those of the present study: Random Forest models consistently outperform linear and distance-based algorithms. This highlights the strength of tree-based ensembles in capturing complex immunological patterns and supports their suitability for both broad thyroid disease prediction and finer subtype analysis [16].

These observations are further supported by the comparison of biochemical levels (Figures S1–S4). Bar chart comparisons show that TPOAb and TgAb levels present the largest separation between Hashimoto’s cases and controls (Figures S3 and S4), while TSH and FT4 also contribute substantially to differentiating patients from healthy individuals (Figures S1 and S2). The SHAP analysis further indicates that antibody-related markers, together with FT4 and age, have the highest feature importance across the best-performing models (Figure 2B). This agrees with established clinical knowledge, as elevated levels of these autoantibodies are characteristic markers of autoimmune thyroiditis [17]. Conversely, low FT4 values (blue dots) also increase the likelihood of a disease prediction, reflecting the hypothyroid state typically observed in Hashimoto’s disease.

3.2. Graves’ Disease

For Graves’ disease, the XGBoost and Random Forest algorithms achieved the highest performance, with accuracy, precision, recall, and F1-score all equal to 98% and AUC scores of 1.00, indicating near-perfect discrimination between patients and controls on the test set. The Decision Tree model performed comparably, with identical metric values and an AUC of 0.98. The Gradient Boosting algorithm yielded slightly lower but still satisfactory results, with an accuracy of 94% and an AUC of 0.95 (Table S2).

The confusion matrix for the Random Forest algorithm (Figure 3A) shows 25 correctly classified cases of Graves’ disease and 28 correctly classified healthy individuals, with no false positives and one false negative.
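The counts in this confusion matrix determine the threshold-dependent metrics directly. The sketch below (function name hypothetical) computes them with exact fractions, treating Graves’ disease as the positive class.

```python
from fractions import Fraction


def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for the positive (disease) class of a 2x2 confusion matrix."""
    return {
        "accuracy": Fraction(tp + tn, tp + fn + fp + tn),
        "sensitivity": Fraction(tp, tp + fn),  # recall of the disease class
        "specificity": Fraction(tn, tn + fp),  # recall of the control class
        "precision": Fraction(tp, tp + fp),
        "f1": Fraction(2 * tp, 2 * tp + fp + fn),
    }


# Random Forest, Graves' disease test set: 25 TP, 1 FN, 0 FP, 28 TN.
m = binary_metrics(tp=25, fn=1, fp=0, tn=28)
for name, value in m.items():
    print(f"{name}: {float(value):.3f}")
```

This gives accuracy 53/54 ≈ 0.981, sensitivity 25/26 ≈ 0.962, and specificity and precision of 1.0. The disease-class recall is thus about 96%; the 98% recall reported above is consistent with a class-averaged value, for example the mean of the two per-class recalls, (25/26 + 28/28)/2 ≈ 0.981.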

These findings demonstrate that tree-based, boosting, and linear classifiers are highly effective in distinguishing individuals with Graves’ disease from healthy controls, with ensemble and boosting techniques providing particularly stable and accurate predictions.

In contrast to Duan et al., who focused on predicting post-radioiodine (RAI) hypothyroidism in patients with Graves’ disease using a multivariate logistic model derived from 138 clinical and laboratory variables (AUROC = 0.74, F1 = 0.74) [18], the present work focuses on the classification of Graves’ disease itself using multiple machine learning models. While the previous study emphasized post-treatment prognosis based on biochemical and demographic predictors, our approach utilized pre-treatment diagnostic data to distinguish patients with Graves’ disease from healthy individuals. Our models, particularly XGBoost, Random Forest, Logistic Regression, and SVM, achieved high classification performance, with accuracy, precision, recall, and F1-scores reaching 98% and AUC values ranging from 0.94 to 1.00.

Together, both studies highlight the growing potential of machine learning in Graves’ disease management, ranging from early diagnosis to personalized post-treatment prediction, thereby demonstrating complementary applications across the diagnostic and prognostic spectrum.

These results are consistent with the TSH, FT3, and FT4 measurements. The biomarker distribution plots clearly show the characteristic suppression of TSH (Figure S5) and elevation of FT3/FT4 in Graves’ disease (Figures S6 and S7). Correspondingly, the SHAP visualizations indicate that these hormone levels are the strongest drivers of model predictions (Figure 3B). Low TSH values (blue dots) strongly contribute to the prediction of Graves’ disease, consistent with the classical suppression of TSH in this condition. In contrast, high FT3, FT4, and TRAb levels (red dots) substantially increase the likelihood of a disease prediction, reflecting both the hyperthyroid state and the presence of stimulating autoantibodies characteristic of Graves’ disease. High TPOAb values also contribute to the prediction, albeit to a lesser extent. Additionally, age and gender show a noticeable influence, with women exhibiting a slightly higher predicted probability than men.

3.3. Classification of Thyroid Nodules and Nodule Segmentation

As shown in Table 1, the EfficientNet-B0 CNN achieved moderate performance in classifying thyroid nodules, with an overall accuracy of 76%. The U-Net model, evaluated on the separate segmentation task, reached an overall accuracy of 94%, reflecting strong agreement between the predicted segmentation masks and the ground-truth annotations.
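The text does not specify how segmentation accuracy was computed. Assuming it is pixel accuracy, an overlap metric such as the Dice coefficient is a useful complement, because background pixels usually dominate an ultrasound frame. The sketch below (function names hypothetical) computes both for binary masks given as lists of rows, with 1 marking nodule pixels.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)


def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if total == 0 else 2 * inter / total  # two empty masks agree perfectly
```

On a 10 × 10 frame containing a 4-pixel nodule, an empty prediction still scores 0.96 pixel accuracy but a Dice coefficient of 0.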

For the CNN EfficientNet-B0 model, the benign class achieved a precision of 85%, recall of 73%, and an F1-score of 78%. For the malignant class, the model obtained a precision of 66%, recall of 80%, and an F1-score of 73%. The higher recall observed for malignant nodules suggests that the model was more effective at identifying malignant cases, although this came at the cost of lower precision, indicating a higher number of false-positive predictions. Overall, the weighted average F1-score of 76% reflects balanced but still limited performance across both classes.

The confusion matrix further clarifies these results, showing that the model correctly classified 219 benign and 160 malignant nodules, while misclassifying 81 benign cases as malignant and 40 malignant cases as benign (Figure 4). This pattern reflects the model’s tendency to favor sensitivity for malignant cases, at the cost of misclassifying 27% of benign nodules (81 of 300) as malignant.
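Every entry of Table 1 can be recomputed from these four counts. The sketch below (a hypothetical helper, with rows as true classes and columns as predicted classes) derives per-class precision, recall, and F1 together with macro and support-weighted averages, the same layout as scikit-learn’s `classification_report`.

```python
def classification_report(cm, names):
    """Per-class (precision, recall, F1, support) plus accuracy, macro and weighted averages.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    k = len(cm)
    support = [sum(row) for row in cm]                                # true counts per class
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts per class
    per_class = []
    for i in range(k):
        tp = cm[i][i]
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (support + predicted)
        per_class.append((tp / predicted[i], tp / support[i], 2 * tp / (support[i] + predicted[i])))
    n = sum(support)
    report = {name: (*m, s) for name, m, s in zip(names, per_class, support)}
    report["accuracy"] = (sum(cm[i][i] for i in range(k)) / n, n)
    report["macro avg"] = (*(sum(m[c] for m in per_class) / k for c in range(3)), n)
    report["weighted avg"] = (
        *(sum(m[c] * s for m, s in zip(per_class, support)) / n for c in range(3)),
        n,
    )
    return report


# CNN test set: rows = true (benign, malignant), columns = predicted (benign, malignant).
report = classification_report([[219, 81], [40, 160]], ["benign", "malignant"])
for name, values in report.items():
    print(name, [round(v, 2) if isinstance(v, float) else v for v in values])
```

Rounded to two decimals, the output reproduces Table 1, for example benign precision 219/259 ≈ 0.85 and malignant recall 160/200 = 0.80.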

Overall, these findings indicate that the EfficientNet-B0 CNN can distinguish between benign and malignant thyroid nodules with reasonable effectiveness, although misclassifications remain, particularly in cases with overlapping imaging characteristics. The U-Net model performed well in the segmentation task, achieving 94% accuracy and robustly delineating thyroid nodule regions. As illustrated in Figure 5, the U-Net model accurately delineated the nodule boundary, with the predicted mask closely resembling the ground-truth annotation. The Grad-CAM heatmap further indicates that the model’s attention was concentrated on the nodule region, as shown by the high-intensity activations (red) overlaid on the corresponding anatomical area. Together, these results suggest that the segmentation model provides more reliable spatial characterization of nodules than the classification-based CNN approach.

4. Conclusions

The present study demonstrates the application of machine learning techniques to the classification of thyroid disorders using biochemical, imaging, and demographic data. The analysis incorporated not only thyroid hormone measurements (TSH, T3, and T4) but also demographic and clinical variables such as age, gender, and the presence of autoimmune conditions, which collectively contributed to model performance within the evaluated datasets. Tree-based machine learning algorithms, particularly XGBoost and Random Forest, achieved strong performance in distinguishing patients with Hashimoto’s thyroiditis or Graves’ disease from healthy controls. In addition, the convolutional neural network achieved moderate performance (76% accuracy) in classifying thyroid nodules from ultrasound images as benign or malignant within the study dataset. A U-Net architecture was also applied for thyroid nodule segmentation, allowing accurate delineation of regions of interest in ultrasound images. This work is a retrospective, proof-of-concept study and does not evaluate prospective clinical outcomes such as changes in diagnostic time, treatment decisions, or patient outcomes. Therefore, while the results highlight the potential of data-driven approaches for thyroid disorder classification, clinical applicability requires further validation in larger and more diverse cohorts and in prospective clinical studies. Future work will focus on expanding dataset size and diversity, incorporating additional relevant biomarkers, and exploring advanced deep learning approaches, including multimodal and transformer-based architectures. Further efforts will also aim to improve model interpretability and robustness to support future translational research in thyroid disease assessment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biomedinformatics6030034/s1, Figures S1–S4: Comparison between the average TSH, FT4, TgAb, and TPOAb levels in men and women with Hashimoto’s thyroiditis; Figures S5–S7: Comparison between the average TSH, FT3, and FT4 levels in men and women with Graves’ disease; Table S1: Comparison of algorithm performance for detecting Hashimoto’s disease; Table S2: Comparison of algorithm performance for detecting Graves’ disease.

Author Contributions

Conceptualization, K.A. and M.G.K.; methodology, K.A. and M.G.K.; software, K.A.; validation, T.P.E. and M.G.K.; formal analysis, K.A.; investigation, K.A.; resources, E.G., C.T. and N.L.P.; data curation, K.A., E.G., C.T. and N.L.P.; writing—original draft preparation, K.A. and M.G.K.; writing—review and editing, T.P.E. and P.V.; visualization, K.A.; supervision, M.G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Ionian University (protocol number 8812/27 May 2025) and the Bioethics Committee of the Republic of Cyprus (2025.01.90/21 March 2025).

Informed Consent Statement

Not applicable. The study used fully anonymized data for the development and evaluation of machine learning models.

Data Availability Statement

The data used in this study were obtained from publicly available repositories and anonymized clinical records. The publicly available datasets include the Stanford AIMI thyroid ultrasound dataset and additional open-access datasets from Springer/Figshare repositories, which are accessible through their respective official platforms (links provided in the manuscript). The clinical dataset was collected from certified laboratories and endocrinology specialists and was fully anonymized prior to analysis in accordance with applicable data protection regulations. Due to patient privacy and ethical restrictions, these data are not publicly available. Access to the clinical dataset may be granted upon reasonable request to the corresponding author, subject to institutional approval and compliance with ethical and data protection requirements. The code is available on GitHub: https://github.com/kyprosantreou/Thyroid_Disorders_Classification (accessed on 25 May 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Song, M.; Sun, W.; Liu, Q.; Wang, Z.; Zhang, H. Global scientific trends on thyroid disease in early 21st century: A bibliometric and visualized analysis. Front. Endocrinol. 2024, 14, 1306232.
2. Forma, A.; Kłodnicka, K.; Pająk, W.; Flieger, J.; Teresińska, B.; Januszewski, J.; Baj, J. Thyroid cancer: Epidemiology, classification, risk factors, diagnostic and prognostic markers, and current treatment strategies. Int. J. Mol. Sci. 2025, 26, 5173.
3. Sabatino, L.; Vassalle, C. Thyroid hormones and metabolism regulation: Which role on brown adipose tissue and browning process? Biomolecules 2025, 15, 361.
4. Jansen, H.I.; Boelen, A.; Heijboer, A.C.; Bruinstroop, E.; Fliers, E. Hypothyroidism: The difficulty in attributing symptoms to their underlying cause. Front. Endocrinol. 2023, 14, 1130661.
5. Lee, S.Y.; Pearce, E.N. Hyperthyroidism: A review. JAMA 2023, 330, 1472–1483.
6. Wiersinga, W.M.; Poppe, K.G.; Effraimidis, G. Hyperthyroidism: Aetiology, pathogenesis, diagnosis, management, complications, and prognosis. Lancet Diabetes Endocrinol. 2023, 11, 282–298.
7. Ralli, M.; Angeletti, D.; Fiore, M.; D’Aguanno, V.; Lambiase, A.; Artico, M.; De Vincentiis, M.; Greco, A. Hashimoto’s thyroiditis: An update on pathogenic mechanisms, diagnostic protocols, therapeutic strategies, and potential malignant transformation. Autoimmun. Rev. 2020, 19, 102649.
8. Sarandi, E.; Tsoukalas, D.; Rudofsky, G.; Fragoulakis, V.; Liapi, C.; Paramera, E.; Papakonstantinou, E.; Krueger Krasagakis, S.; Tsatsakis, A. Identifying the metabolic profile of Hashimoto’s thyroiditis from the METHAP clinical study. Sci. Rep. 2025, 15, 12410.
9. Subekti, I.; Pramono, L.A. Current diagnosis and management of Graves’ disease. Acta Med. Indones. 2018, 50, 177–182.
10. Alexander, E.K.; Cibas, E.S. Diagnosis of thyroid nodules. Lancet Diabetes Endocrinol. 2022, 10, 533–539.
11. Raza, A.; Eid, F.; Montero, E.C.; Noya, I.D.; Ashraf, I. Enhanced interpretable thyroid disease diagnosis by leveraging synthetic oversampling and machine learning models. BMC Med. Inform. Decis. Mak. 2024, 24, 364.
12. Andrès, E.; Escobar, C.; Doi, K. Machine Learning and Artificial Intelligence in Clinical Medicine—Trends, Impact, and Future Directions. J. Clin. Med. 2025, 14, 8137.
13. Gubbi, S.; Hamet, P.; Tremblay, J.; Koch, C.A.; Hannah-Shmouni, F. Artificial intelligence and machine learning in endocrinology and metabolism: The dawn of a new era. Front. Endocrinol. 2019, 10, 185.
14. Rashed, A.; Medhat, T.; Elgarayhi, A. Enhancing automatic diagnosis of thyroid nodules from ultrasound scans leveraging deep learning models. Sci. Rep. 2025, 15, 40364.
15. Pandiyan, B.; Merrill, S.J.; Di Bari, F.; Antonelli, A.; Benvenga, S. A patient-specific treatment model for Graves’ hyperthyroidism. Theor. Biol. Med. Model. 2018, 15, 1.
16. Zhao, C.; Sun, Z.; Yu, Y.; Lou, Y.; Liu, L.; Li, G.; Liu, J.; Chen, L.; Zhu, S.; Huang, Y.; et al. A machine learning-based diagnosis modeling of IgG4 Hashimoto’s thyroiditis. Endocrine 2024, 86, 672–681.
17. Tywanek, E.; Michalak, A.; Świrska, J.; Zwolak, A. Autoimmunity, new potential biomarkers and the thyroid gland—The perspective of Hashimoto’s thyroiditis and its treatment. Int. J. Mol. Sci. 2024, 25, 4703.
18. Duan, L.; Zhang, H.Y.; Lv, M.; Zhang, H.; Chen, Y.; Wang, T.; Li, Y.; Wu, Y.; Li, J.; Li, K. Machine learning identifies baseline clinical features that predict early hypothyroidism in patients with Graves’ disease after radioiodine therapy. Endocr. Connect. 2022, 11, e220119.

Figure 1. Flow of the machine learning models.

Figure 2. Model performance and interpretability for Hashimoto’s thyroiditis. (A) Confusion matrix. (B) SHAP visualization.

Figure 3. Model performance and interpretability for Graves’ disease. (A) Confusion matrix. (B) SHAP visualization.

Figure 4. Confusion Matrix for CNN model.

Figure 5. A sample of U-Net thyroid nodule segmentation result.

Table 1. Classification performance metrics for thyroid nodule categorization of CNN model.

Class	Precision	Recall	F1-Score	Support
CNN (EfficientNet-B0)
Benign	0.85	0.73	0.78	300
Malignant	0.66	0.80	0.73	200
Accuracy			0.76	500
Macro Avg	0.75	0.77	0.75	500
Weighted Avg	0.77	0.76	0.76	500

