ROI+Context Graph Neural Networks for Thyroid Nodule Classification: Baselines, Cross-Validation Protocol, and Reproducibility
Abstract
1. Introduction
2. Related Work
- Thyroid ultrasound datasets
- Dataset choice and class imbalance
- Thyroid ultrasound methods
- ROI-based classifiers
- General machine-learning considerations
- Graph neural networks in medical imaging
3. Materials and Methods
3.1. Method Selection: Alternatives and Rationale
- Rationale for a graph formulation
3.2. Why GNNs for Peri-Lesional Context?
3.3. Evaluation Protocol
3.4. Dataset
3.5. Preprocessing and Embeddings
3.6. Graph Construction and Readout
- Context sampling
- Node features
- ROI node as the anchor
- Edges and attributes
- Encoder and readout
- Reproducibility
3.7. Training Setup
3.8. Metrics
3.9. Calibration
4. Experiments and Results
4.1. Single-Split Baseline
4.2. Cross-Validation Performance
- Comparison to non-graph TN5000 classifiers
- Adding a threshold-based F1 for comparability
- Notes on comparability. Direct numeric comparisons with the prior works in Table 5 should be interpreted cautiously: those studies may differ in preprocessing, augmentation (including synthetic data), class rebalancing, and evaluation procedure, and they often report threshold-based metrics (accuracy, F1, sensitivity, specificity) rather than AUROC/AUPRC. Our results are means over cross-validation runs (5-fold × 3-seed; 15 runs) and focus on AUROC and AUPRC, which summarize ranking quality across all thresholds rather than at a single operating threshold, a distinction that matters under class imbalance.
| Metric | Mean ± SD | 95% CI (Half-Width) |
|---|---|---|
| AUROC | 0.906 ± 0.008 | ±0.004 |
| AUPRC | 0.954 ± 0.006 | ±0.003 |
| Sensitivity @ Specificity ≥ 0.90 | 0.707 ± 0.035 | ±0.018 |
| Specificity @ Sensitivity ≥ 0.90 | 0.735 ± 0.019 | ±0.010 |
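The half-widths in the table above appear consistent with a normal-approximation interval, 1.96 · SD / √n, over n = 15 runs. The sketch below reproduces them from the reported SDs and also shows one way an operating-point metric such as sensitivity at specificity ≥ 0.90 could be computed. The function names and the threshold sweep are assumptions made for illustration, not the authors' implementation.

```python
import math
import numpy as np

def ci_half_width(sd, n_runs, z=1.96):
    """Normal-approximation 95% CI half-width for a mean over n_runs runs."""
    return z * sd / math.sqrt(n_runs)

def sensitivity_at_specificity(y_true, scores, min_spec=0.90):
    """Best sensitivity over thresholds whose specificity is at least min_spec."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = 0.0
    # Predict positive when score >= t; +inf covers the "predict nothing" case.
    for t in np.append(np.unique(scores), np.inf):
        pred = scores >= t
        spec = np.mean(~pred[~y_true])  # true-negative rate
        sens = np.mean(pred[y_true])    # true-positive rate
        if spec >= min_spec:
            best = max(best, float(sens))
    return best

# SDs copied from the table above; 15 runs = 5 folds x 3 seeds.
for name, sd in [("AUROC", 0.008), ("AUPRC", 0.006),
                 ("Sens@Spec>=0.90", 0.035), ("Spec@Sens>=0.90", 0.019)]:
    print(f"{name}: +/-{ci_half_width(sd, 15):.3f}")
```

Because runs that share folds or seeds are not fully independent, an interval of this kind is best read as a descriptive summary of run-to-run spread.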
| Study (Method) | Model/Core Idea | Evaluation Protocol | Reported Performance | F1 (Reported/maxF1) |
|---|---|---|---|---|
| Bahmane et al. [40] (Hybrid CNN) | EfficientNet-B3 with SE/residual refinement; synthetic benign sample generation to mitigate imbalance | TN5000; GAN-based augmentation and class rebalancing; single-study evaluation | Acc 89.73%, Sens 90.01%, Prec 88.23% | 88.85% |
| Sujini et al. [41] (ViT+WGAN-GP) | Vision Transformer feature extractor combined with WGAN-GP data augmentation | TN5000; augmentation-driven training; single-study evaluation | Acc 96.8%, Sens 97.3%, Spec 96.4% | 96.5% |
| Ours (ROI+context GNN) | ResNet-50 ROI/context embeddings with GraphSAGE encoder and attention readout | Official TN5000 split; 5-fold × 3-seed cross-validation (15 runs) | AUROC 0.906, AUPRC 0.954 (mean) | maxF1 0.922 ± 0.001 |
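The maxF1 entry in the last row is a threshold-swept quantity, whereas the prior works report F1 at a fixed operating point. As a minimal sketch of how such a value could be obtained, assuming maxF1 means the largest F1 over all score thresholds, the hypothetical `max_f1` helper below sweeps every distinct score.

```python
import numpy as np

def max_f1(y_true, scores):
    """Return (best F1, threshold), predicting positive when score >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_f1, best_t = 0.0, None
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0  # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:
            best_f1, best_t = float(f1), float(t)
    return best_f1, best_t

# Toy example (not data from the paper).
f1, thr = max_f1([0, 0, 1, 1, 1], [0.2, 0.6, 0.5, 0.7, 0.9])
print(f"maxF1 = {f1:.3f} at threshold {thr}")
```

When the threshold is selected on the same data used to score it, maxF1 is an optimistic upper bound on the F1 achievable at a pre-specified threshold, which is one more reason to read the last column as indicative only.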
| Variant | AUROC | AUPRC |
|---|---|---|
| Full (ROI + k = 8 context + geometry) | 0.906 | 0.954 |
| No geometry (ROI + k = 8 context; visual only) | 0.900 | 0.949 |
| Fewer context nodes (ROI + k = 4 context + geometry) | 0.903 | 0.952 |
| ROI-only (k = 0; no context nodes) | 0.892 | 0.944 |
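To make the ablation variants above concrete, the following sketch builds an ROI-anchored star graph with k context nodes and an optional geometry block, so that k = 8, k = 4, k = 0, and the no-geometry case all come from the same code path. The ring-based placement, the `ring_scale` parameter, the four-value geometry encoding, and the function name are all assumptions made for illustration; the sampling scheme actually used in the paper is not reproduced here.

```python
import numpy as np

def build_star_graph(roi_box, image_size, k=8, use_geometry=True, ring_scale=1.5):
    """Hypothetical helper: star graph with node 0 = ROI and nodes 1..k = context.

    roi_box is (x0, y0, x1, y1) in pixels; image_size is (width, height).
    Returns (context_boxes, geometry, edge_index).
    """
    x0, y0, x1, y1 = roi_box
    img_w, img_h = image_size
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    w, h = float(x1 - x0), float(y1 - y0)

    # Evenly spaced angles: deterministic, no random number generator involved.
    angles = 2.0 * np.pi * np.arange(k) / max(k, 1)
    centers = np.stack([cx + ring_scale * w * np.cos(angles),
                        cy + ring_scale * h * np.sin(angles)], axis=1)
    # Shift patch centers so every ROI-sized context patch stays inside the image.
    half = np.array([w / 2.0, h / 2.0])
    centers = np.clip(centers, half, np.array([img_w, img_h]) - half)
    boxes = np.hstack([centers - half, centers + half])

    if use_geometry:
        # Per node: offset from the ROI center and patch size, normalised by image size.
        offsets = (centers - np.array([cx, cy])) / np.array([img_w, img_h])
        size = np.tile([w / img_w, h / img_h], (k, 1))
        roi_geo = np.array([[0.0, 0.0, w / img_w, h / img_h]])
        geometry = np.vstack([roi_geo, np.hstack([offsets, size])])
    else:
        geometry = np.zeros((k + 1, 0))  # visual-only variant: no geometry columns

    # Bidirectional ROI <-> context edges.
    ctx = np.arange(1, k + 1)
    roi = np.zeros(k, dtype=int)
    edge_index = np.stack([np.concatenate([roi, ctx]), np.concatenate([ctx, roi])])
    return boxes, geometry, edge_index

for k, geo in [(8, True), (8, False), (4, True), (0, True)]:
    boxes, geometry, edges = build_star_graph((100, 100, 200, 180), (640, 480), k, geo)
    print(k, geo, boxes.shape, geometry.shape, edges.shape)
```

In this formulation the k = 0 variant degenerates to a single-node graph with no edges, which is why it behaves like an ROI-only classifier.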
4.3. Ablation Study
- These results suggest that both adding context nodes and providing explicit geometry contribute to discrimination: removing geometry lowers AUROC from 0.906 to 0.900, reducing context from k = 8 to k = 4 lowers it to 0.903, and removing peri-lesional context entirely (k = 0) produces the largest drop, to 0.892, with AUPRC following the same ordering. In the final submission, we will report the same ablation under the identical 5-fold × 3-seed protocol as Table 4.
- Single-run validation snapshot
4.4. Calibration and Operating Points
- Notes
- Interpretation
- Fold-5 degradation analysis
- Interpretation
5. Discussion
- Clinical relevance of relational reasoning
- Role of an ROI-centric design
- Advantages of the GNN architecture
- Cross-validation and robustness
- Calibration and decision support
Limitations
6. Conclusions
- Summary and significance. This work presents a compact yet expressive ROI+context Graph Neural Network for thyroid ultrasound classification, motivated by the observation that malignancy assessment depends not only on intra-nodular appearance but also on structured peri-lesional cues. By representing each image as a small graph anchored at the lesion ROI and augmented with deterministic context nodes, the proposed model explicitly captures spatial relationships that are typically implicit or ignored in CNN- and MIL-based pipelines. On both the official TN5000 single split and a 5-fold × 3-seed cross-validation protocol (AUROC 0.906 ± 0.008, AUPRC 0.954 ± 0.006 over 15 runs), the method achieves strong and stable discrimination, indicating that relational modeling can improve robustness without resorting to complex segmentation or heavy architectural overhead.
- Reliability and clinical relevance. Beyond discrimination, we emphasize probability reliability through calibration analysis. After temperature scaling, the mean NLL falls from 0.6392 to 0.4074, the Brier score from 0.2231 to 0.1271, and the ECE from 0.2584 to 0.0991, indicating that the proposed GNN yields substantially better-calibrated malignancy probabilities once rescaled. This is a critical requirement for decision-support scenarios where thresholds directly influence biopsy or follow-up recommendations. The attention-based readout further provides a degree of interpretability by highlighting which peri-lesional regions contribute most to the final decision, aligning the model’s behavior with radiologist reasoning patterns.
- Dataset choice. We selected TN5000 because it aligns well with an ROI+context graph formulation (annotation quality, preserved peri-lesional tissue, and heterogeneity); the detailed rationale is provided in Section 3.4.
- Outlook. Together, these results suggest that graph-based image-level representations offer a promising and practical direction for thyroid ultrasound CAD. The proposed ROI+context GNN serves as a strong baseline that bridges classical ROI classifiers and more complex holistic models, while remaining reproducible and calibration-aware. Future work will explore external validation, alternative graph topologies, and multi-modal extensions incorporating clinical metadata, with the goal of further strengthening the role of relational learning in ultrasound-based decision support.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AUROC | Area Under the Receiver Operating Characteristic Curve |
| AUPRC | Area Under the Precision–Recall Curve |
| CAD | Computer-Aided Diagnosis |
| CNN | Convolutional Neural Network |
| CV | Cross-Validation |
| ECE | Expected Calibration Error |
| FNA | Fine-Needle Aspiration |
| GNN | Graph Neural Network |
| MIL | Multiple Instance Learning |
| NLL | Negative Log-Likelihood |
| RNG | Random Number Generator |
| ROI | Region of Interest |
| TI-RADS | Thyroid Imaging Reporting and Data System |
| US | Ultrasound |
Appendix A. Implementation and Reproducibility Details
- Hardware and runtime environment
- Random seeds and determinism
- Software
References
1. Tessler, F.N.; Middleton, W.D.; Grant, E.G. Thyroid imaging reporting and data system (TI-RADS): A user’s guide. Radiology 2018, 287, 29–36.
2. Boers, T.; Braak, S.J.; Rikken, N.E.; Versluis, M.; Manohar, S. Ultrasound imaging in thyroid nodule diagnosis, therapy, and follow-up: Current status and future trends. J. Clin. Ultrasound 2023, 51, 1087–1100.
3. David, E.; Aliotta, L.; Frezza, F.; Riccio, M.; Cannavale, A.; Pacini, P.; Di Bella, C.; Dolcetti, V.; Seri, E.; Giuliani, L.; et al. Thyroid Nodule Characterization: Which Thyroid Imaging Reporting and Data System (TIRADS) Is More Accurate? A Comparison Between Radiologists with Different Experiences and Artificial Intelligence Software. Diagnostics 2025, 15, 2108.
4. Radhachandran, A.; Kinzel, A.; Chen, J.; Sant, V.; Patel, M.; Masamed, R.; Arnold, C.W.; Speier, W. A multitask approach for automated detection and segmentation of thyroid nodules in ultrasound images. Comput. Biol. Med. 2024, 170, 107974.
5. Hou, X.; Hua, M.; Zhang, W.; Ji, J.; Zhang, X.; Jiang, H.; Li, M.; Wu, X.; Zhao, W.; Sun, S.; et al. An ultrasonography of thyroid nodules dataset with pathological diagnosis annotation for deep learning. Sci. Data 2024, 11, 1272.
6. Savelonas, M. An Overview of AI-Guided Thyroid Ultrasound Image Segmentation and Classification for Nodule Assessment. Big Data Cogn. Comput. 2025, 9, 255.
7. Yu, D.; Song, T.; Yu, Y.; Zhang, H.; Gao, F.; Wang, Z.; Wang, J. Risk assessment of thyroid nodules with a multi-instance convolutional neural network. Front. Oncol. 2025, 15, 1608963.
8. Zhang, H.; Liu, Q.; Han, X.; Niu, L.; Sun, W. TN5000: An Ultrasound Image Dataset for Thyroid Nodule Detection and Classification. Sci. Data 2025, 12, 1437.
9. Parisot, S.; Ktena, S.I.; Ferrante, E.; Lee, M.; Guerrero, R.; Glocker, B.; Rueckert, D. Disease Prediction using Graph Convolutional Networks: Application to Autism Spectrum Disorder and Alzheimer’s Disease. In Proceedings of the Medical Image Computing and Computer Assisted Intervention (MICCAI), Quebec City, QC, Canada, 11–13 September 2017.
10. Meng, X.; Zou, T. Clinical applications of graph neural networks in computational histopathology: A review. Comput. Biol. Med. 2023, 164, 107201.
11. Bronstein, M.M.; Bruna, J.; Cohen, T.; Veličković, P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv 2021, arXiv:2104.13478.
12. Kohavi, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the IJCAI, Quebec City, QC, Canada, 20–25 August 1995.
13. Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006, 7, 91.
14. Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4, 40–79.
15. Gong, H.; Chen, J.; Chen, G.; Li, H.; Li, G.; Chen, F. Thyroid region prior guided attention for ultrasound segmentation of thyroid nodules. Comput. Biol. Med. 2023, 155, 106389.
16. Dong, P.; Zhang, R.; Li, J.; Liu, C.; Liu, W.; Hu, J.; Yang, Y.; Li, X. An ultrasound image segmentation method for thyroid nodules based on dual-path attention mechanism-enhanced UNet++. BMC Med. Imaging 2024, 24, 341.
17. Li, X.; Fu, C.; Xu, S.; Sham, C.W. Thyroid ultrasound image database and marker mask inpainting method for research and development. Ultrasound Med. Biol. 2024, 50, 509–519.
18. Xu, Y.; Xu, M.; Geng, Z.; Liu, J.; Meng, B. Thyroid nodule classification in ultrasound imaging using deep transfer learning. BMC Cancer 2025, 25, 544.
19. Zhang, H.; Liu, Q.; Han, X.; Niu, L.; Sun, W. TN5000: An Ultrasound Image Dataset for Thyroid Nodule Detection and Classification (Data Release). figshare 2025.
20. Hu, M.; Zhang, Y.; Xue, H.; Lv, H.; Han, S. Mamba- and ResNet-Based Dual-Branch Network for Ultrasound Thyroid Nodule Segmentation. Bioengineering 2024, 11, 1047.
21. Chi, J.; Walia, E.; Babyn, P.; Wang, J.; Groot, G.; Eramian, M. Thyroid nodule classification in ultrasound images by fine-tuning deep convolutional neural network. J. Digit. Imaging 2017, 30, 477–486.
22. Aci, C.I.; Mutlu, G.; Ozen, M.; Sarac, E.; Uzel, V.N.K. A Feature Selection-Based Multi-Stage Methodology for Improving Driver Injury Severity Prediction on Imbalanced Crash Data. Electronics 2025, 14, 3377.
23. Chen, C.; Wu, Y.; Dai, Q.; Zhou, H.Y.; Xu, M.; Yang, S.; Han, X.; Yu, Y. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10297–10318.
24. Zhang, L.; Zhao, Y.; Che, T.; Li, S.; Wang, X. Graph neural networks for image-guided disease diagnosis: A review. Intell. Robot. Devices 2023, 1, 151–166.
25. Mienye, I.D.; Viriri, S. Graph Neural Networks in Medical Imaging: Methods, Applications and Future Directions. Information 2025, 16, 1051.
26. Chowa, S.S.; Azam, S.; Montaha, S.; Payel, I.J.; Bhuiyan, M.R.I.; Hasan, M.Z.; Jonkman, M. Graph neural network-based breast cancer diagnosis using ultrasound images with optimized graph construction integrating clinically significant features. J. Cancer Res. Clin. Oncol. 2023, 149, 18039–18064.
27. Wang, Y.; Jiang, C.; Luo, S.; Dai, Y.; Zhang, J. Graph Neural Network Enhanced Dual-Branch Network (GED-Net) for lesion segmentation in ultrasound images. Expert Syst. Appl. 2024, 256, 124835.
28. Agyekum, E.A.; Kong, W.; Ren, Y.Z.; Issaka, E.; Baffoe, J.; Xian, W.; Tan, G.; Xiong, C.; Wang, Z.; Qian, X.; et al. A comparative analysis of three graph neural network models for predicting axillary lymph node metastasis in early-stage breast cancer. Sci. Rep. 2025, 15, 13918.
29. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Proceedings of the NeurIPS, Long Beach, CA, USA, 8 December 2017.
30. Efron, B. Bootstrap methods: Another look at the jackknife. In Breakthroughs in Statistics: Methodology and Distribution; Springer: Berlin/Heidelberg, Germany, 1992; pp. 569–593.
31. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
32. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
34. Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated Graph Sequence Neural Networks. In Proceedings of the ICLR, San Juan, Puerto Rico, 2–4 May 2016.
35. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
36. Niculescu-Mizil, A.; Caruana, R. Predicting Good Probabilities with Supervised Learning. In Proceedings of the ICML, Bonn, Germany, 7–11 August 2005.
37. Naeini, M.P.; Cooper, G.; Hauskrecht, M. Obtaining Well Calibrated Probabilities Using Bayesian Binning. In Proceedings of the AAAI, Austin, TX, USA, 25–30 January 2015.
38. Brier, G.W. Verification of Forecasts Expressed in Terms of Probability. Mon. Weather Rev. 1950, 78, 1–3.
39. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On Calibration of Modern Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, NSW, Australia, 6–11 August 2017.
40. Bahmane, K.; Bhattacharya, S.; Chaouki, A.B. Evaluation of a Hybrid CNN Model for Automatic Detection of Malignant and Benign Lesions. Medicina 2025, 61, 2036.
41. Sujini, G.N.; Sivadi, S. Automated thyroid nodule classification in ultrasound imaging using a hybrid vision transformer and Wasserstein GAN with gradient penalty. Sci. Rep. 2025, 15, 40786.




| Model Family | Strength | Key Limitation for Peri-Lesional Context |
|---|---|---|
| ROI-only CNN | Strong local texture/margin cues; simple training | Discards surrounding tissue relationships |
| Whole-image CNN/ViT | Uses full anatomy; no cropping choices | Background dominates; requires learning to ignore non-informative regions |
| MIL/patch-attention | Pools multiple regions; weak localization | Instances typically unordered; spatial relations implicit or absent |
| GNN (ours) | Explicit nodes + geometry; message passing; attention readout | Requires defining nodes/edges |
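The last row of the table above names two mechanisms, message passing and an attention readout. A minimal NumPy sketch of both is given here, assuming a GraphSAGE-style mean aggregator and a softmax-gated weighted sum over nodes. The weight shapes, the random initialisation, and the function names are illustrative assumptions and do not reflect the trained model.

```python
import numpy as np

def sage_mean_layer(h, edge_index, w_self, w_neigh):
    """One GraphSAGE-style layer: ReLU(h W_self + mean_of_neighbours(h) W_neigh)."""
    n = h.shape[0]
    src, dst = edge_index
    agg = np.zeros_like(h, dtype=float)
    np.add.at(agg, dst, h[src])                        # sum incoming messages per node
    deg = np.bincount(dst, minlength=n).reshape(-1, 1)
    agg = agg / np.maximum(deg, 1)                     # mean; isolated nodes keep zeros
    return np.maximum(h @ w_self + agg @ w_neigh, 0.0)

def attention_readout(h, gate_w):
    """Softmax-weighted sum of node embeddings; returns (graph_vector, node_weights)."""
    logits = h @ gate_w
    logits = logits - logits.max()                     # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights @ h, weights

# Toy star graph: node 0 = ROI, nodes 1..8 = context (random features, not real embeddings).
rng = np.random.default_rng(0)
k, d = 8, 16
h = rng.normal(size=(k + 1, d))
ctx, roi = np.arange(1, k + 1), np.zeros(k, dtype=int)
edge_index = np.stack([np.concatenate([roi, ctx]), np.concatenate([ctx, roi])])
h1 = sage_mean_layer(h, edge_index, 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))
graph_vec, weights = attention_readout(h1, rng.normal(size=d))
print(graph_vec.shape, weights.round(3))
```

The per-node `weights` returned by the readout are the quantity one would inspect to see which context regions contribute most to an image-level prediction.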
| Split | Images | Grouping | Notes |
|---|---|---|---|
| Train | 3500 | disjoint | optimisation |
| Val | 500 | disjoint | model selection |
| Test | 1000 | disjoint | held-out; used for predictions |
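Alongside the fixed split above, the paper reports a 5-fold × 3-seed cross-validation protocol (15 runs). The sketch below shows one way such a schedule could be enumerated with disjoint, class-stratified folds. Stratification, the seed values, and the toy label vector are assumptions; which image pool the folds are drawn from is not specified here and is left to the caller.

```python
import numpy as np

def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample to one fold, keeping class ratios similar across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(len(idx)) % n_folds   # deal samples round-robin
    return fold_of

def cv_runs(labels, n_folds=5, seeds=(0, 1, 2)):
    """Yield (seed, fold, train_idx, val_idx) for every fold of every seed."""
    for seed in seeds:
        fold_of = stratified_folds(labels, n_folds, seed)
        for fold in range(n_folds):
            yield (seed, fold,
                   np.flatnonzero(fold_of != fold),
                   np.flatnonzero(fold_of == fold))

# Toy labels (not the TN5000 class counts).
toy_labels = np.array([1] * 70 + [0] * 30)
runs = list(cv_runs(toy_labels))
print(len(runs), "runs; first validation fold size:", len(runs[0][3]))
```

Reporting the mean and SD over all 15 runs then averages out both the fold assignment and the seed-dependent training noise.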
| Acc | AUROC | AUPRC | Min Val Loss | Note |
|---|---|---|---|---|
| 0.904 | 0.942 | 0.979 | 0.268 | epoch-wise best |
| Split | Phase | AUROC | AUPRC | NLL | Brier | ECE |
|---|---|---|---|---|---|---|
| mean | before | 0.8893 | 0.9468 | 0.6392 | 0.2231 | 0.2584 |
| mean | after | 0.8893 | 0.9468 | 0.4074 | 0.1271 | 0.0991 |
| fold1 | before | 0.8774 | 0.9421 | 0.6114 | 0.2096 | 0.2558 |
| fold1 | after | 0.8774 | 0.9421 | 0.4738 | 0.1566 | 0.1321 |
| fold2 | before | 0.8605 | 0.9241 | 0.6356 | 0.2214 | 0.2875 |
| fold2 | after | 0.8605 | 0.9241 | 0.3992 | 0.1211 | 0.0815 |
| fold3 | before | 0.8111 | 0.8977 | 0.6459 | 0.2265 | 0.2099 |
| fold3 | after | 0.8111 | 0.8977 | 0.4542 | 0.1443 | 0.0608 |
| fold4 | before | 0.8616 | 0.9317 | 0.6296 | 0.2184 | 0.2220 |
| fold4 | after | 0.8616 | 0.9317 | 0.3988 | 0.1248 | 0.0363 |
| fold5 | before | 0.3607 | 0.6260 | 0.6811 | 0.2440 | 0.2519 |
| fold5 | after | 0.3607 | 0.6260 | 0.6531 | 0.2285 | 0.2176 |
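The three calibration columns above follow standard definitions, sketched below for binary probabilities. The ECE variant shown bins by predicted positive-class probability with 10 equal-width bins; the bin count and binning convention are assumptions, since ECE values depend on them, and the function names are hypothetical.

```python
import numpy as np

def nll(y, p, eps=1e-12):
    """Mean negative log-likelihood (binary cross-entropy) of probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))

def ece(y, p, n_bins=10):
    """Expected calibration error: bin-weighted |mean probability - observed rate|."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.digitize(p, edges[1:-1])        # bin index in 0..n_bins-1
    total = 0.0
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            total += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(total)

# Toy example (not data from the paper).
y_toy, p_toy = [0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]
print(nll(y_toy, p_toy), brier(y_toy, p_toy), ece(y_toy, p_toy))
```

Temperature scaling is a monotone transform of the scores, so it leaves rankings unchanged; this is why AUROC and AUPRC are identical in the before and after rows while NLL, Brier, and ECE change.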
| Split | Temperature T |
|---|---|
| mean | 0.1000 |
| fold1 | 0.1866 |
| fold2 | 0.1000 |
| fold3 | 0.1000 |
| fold4 | 0.1000 |
| fold5 | 0.1594 |
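Temperature scaling divides the logits by a single scalar T fitted to minimise NLL on held-out data; T < 1 sharpens probabilities and T > 1 softens them. Several entries above equal exactly 0.1000, which would be consistent with a search bounded below at 0.1. That bound, the log-spaced grid search, and the function name in the sketch below are assumptions rather than the authors' procedure.

```python
import numpy as np

def fit_temperature(logits, y, t_min=0.1, t_max=10.0, n_grid=400):
    """Pick T in [t_min, t_max] that minimises the NLL of sigmoid(logits / T)."""
    logits = np.asarray(logits, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.exp(np.linspace(np.log(t_min), np.log(t_max), n_grid))

    def nll_at(t):
        z = logits / t
        # Binary cross-entropy in logit form: log(1 + exp(z)) - y * z.
        return np.mean(np.logaddexp(0.0, z) - y * z)

    losses = np.array([nll_at(t) for t in grid])
    return float(grid[np.argmin(losses)])

# Synthetic under-confident model: logits are half of what the labels warrant.
rng = np.random.default_rng(0)
true_logits = rng.normal(scale=3.0, size=5000)
labels = (rng.random(5000) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)
t_hat = fit_temperature(0.5 * true_logits, labels)
print(f"fitted T = {t_hat:.3f}")
```

A fitted value that sits on the lower bound suggests the optimum may lie outside the search range, so widening the range would be a natural check.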
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yavuz, M.; Yumuşak, N. ROI+Context Graph Neural Networks for Thyroid Nodule Classification: Baselines, Cross-Validation Protocol, and Reproducibility. Electronics 2026, 15, 151. https://doi.org/10.3390/electronics15010151

