An Overview of AI-Guided Thyroid Ultrasound Image Segmentation and Classification for Nodule Assessment
Abstract
1. Introduction
- A comprehensive taxonomy of AI-guided methods, systematically organizing approaches from traditional image processing to state-of-the-art DL models like ViTs. Figure 1 presents a visual roadmap of the topics discussed, organizing the landscape of AI-guided thyroid US image analysis into its core components.
- A critical and comparative evaluation through consolidated strengths and weaknesses tables, summarizing over 100 distinct methods to illuminate research gaps and guide future work.
- Practical guidance designed to bridge the gap between academic research and clinical implementation. This includes a detailed discussion of public datasets, checklists for evaluating model alignment with clinical standards like TIRADS, and actionable “playbook” boxes for operationalizing key concepts such as explainability and clinical workflow integration.
2. Thyroid US Image Segmentation for Nodule Boundary Extraction
- 1.
- Clinical benefits
  - Assessment of nodule vascularity: The primary improvement is the ability to visualize and quantify blood flow within a nodule after the injection of a microbubble contrast agent. Certain vascular patterns, such as irregular, chaotic, or peripheral vascularity, as well as rapid contrast uptake and wash-out, may increase the suspicion of malignancy.
  - Quantitative perfusion analysis: Models can be trained to extract quantitative features from the video, such as wash-in/wash-out curves and time-to-peak enhancement, providing objective and reproducible perfusion metrics that are difficult to quantify reliably through visual inspection alone (see the sketch after this list).
- 2.
- Practical limitations and challenges
  - Hardware and workflow disruption: CEUS requires specialized US equipment and the intravenous administration of a contrast agent, which is not part of routine, first-line thyroid nodule screening in most clinical settings.
  - Data scarcity and annotation complexity: The acquisition, storage, and, most importantly, annotation of video data—often requiring frame-level or temporal labeling of perfusion phases—are far more complex and labor-intensive than for static images. This has led to a profound lack of large, public datasets, hindering robust model development.
  - Model-specific challenges: While models like DpRAN can capture multiscale temporal features, the inherent variability in perfusion can be challenging to model. These systems can lack the flexibility to robustly distinguish truly salient perfusion differences from background noise in highly heterogeneous nodules.
- 3.
- When to use: Temporal and CEUS-based CAD methods are generally not intended for routine screening. Their ideal application is as a second-line diagnostic tool for characterizing equivocal or indeterminate nodules that are difficult to classify using B-mode features alone, or in specialized academic research settings focused on tumor vascularity.
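To make the perfusion metrics above concrete, the sketch below computes a time-intensity curve over a nodule ROI and derives time-to-peak, wash-in, and wash-out times from a CEUS clip. It is a minimal illustration, not code from the cited works; the `clip` array, `roi` mask, and synthetic usage data are all assumptions.

```python
import numpy as np

def perfusion_metrics(clip: np.ndarray, roi: np.ndarray, fps: float) -> dict:
    """Sketch: time-to-peak and wash-in/wash-out times from a CEUS clip.

    clip: (T, H, W) array of frames; roi: (H, W) boolean nodule mask (hypothetical inputs).
    """
    tic = clip[:, roi].mean(axis=1)             # time-intensity curve over the ROI
    base = tic[: max(1, int(fps))].mean()       # pre-contrast baseline (~first second)
    peak_idx = int(tic.argmax())
    half = base + 0.5 * (tic[peak_idx] - base)  # half-maximum enhancement level
    wash_in = int(np.argmax(tic >= half)) / fps           # first crossing of half-max
    post = np.nonzero(tic[peak_idx:] <= half)[0]          # decay back below half-max
    wash_out = (peak_idx + post[0]) / fps if post.size else float("nan")
    return {
        "time_to_peak_s": peak_idx / fps,
        "wash_in_s": wash_in,
        "wash_out_s": wash_out,
        "peak_enhancement": float(tic[peak_idx] - base),
    }

# Toy usage: 120 noisy frames at 10 fps with a synthetic perfusion bump at t = 5 s.
t = np.arange(120) / 10.0
clip = np.random.rand(120, 64, 64) * 0.1 + np.exp(-((t - 5) ** 2) / 4)[:, None, None]
print(perfusion_metrics(clip, np.ones((64, 64), bool), fps=10.0))
```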
Reference | Method | Strengths | Weaknesses | Description |
---|---|---|---|---|
Maroulis et al. (2007) [7] | Active contour | Addresses smooth/blurred boundaries, robust to intensity inhomogeneity, inherent denoising | Cannot be applied to isoechoic nodules | Introduces the variable background active contour (VBAC), a Chan–Vese model [22] variant, which mitigates the effects of inhomogeneous tissue in US images by selectively excluding outlier regions from the contour evolution equations. |
Savelonas et al. (2009) [23] | Active contour | Copes with isoechoic nodules, integrates textural features | Parameter adjustment (see Section 4.1) | Introduces the joint echogenicity-texture (JET) active contour, which incorporates statistical texture information into a modified energy functional, enabling it to cope with challenging isoechoic thyroid nodules. |
Du et al. (2015) [24] | Active contour | Noise robustness | Region-based information is not considered, parameter adjustment (see Section 4.1) | Presents a pipeline centered on a distance-regularized level set guided by a local phase symmetry feature. This approach is designed to suppress speckle noise and prevent boundary leakage. |
Koundal et al. (2016) [25,26] | Active contour/neutrosophic clustering | Integration of neutrosophic and clustering information, automated parameter adjustment | Dependency on rough region-of-interest (ROI) estimation, computational complexity/cost (see Section 4.1) | Integrates spatial information with neutrosophic L-means clustering to derive a robust region of interest (ROI), which then guides a distance-regularized level set [27] for precise delineation. Later, Koundal et al. [28] proposed a similar pipeline employing intuitionistic fuzzy c-means clustering. |
Mylona et al. (2014) [19] | Active contour | Automated parameter adjustment | Potential robustness issues for gradient orientation estimation based on structure tensor (see Section 4.1) | Employs orientation entropy (OE) to automatically adjust the regularization and data fidelity parameters of region-based active contours, adapting the model’s behavior to local image structure. |
Tsantis et al. (2006) [31] | Wavelets, local maxima chaining, Hough transform | Noise robustness, copes with isoechoic nodules | Prior knowledge is simply expressed as circular shapes | A three-stage traditional method that combines wavelet edge detection for speckle reduction, a multiscale structure model for contour representation, and a Hough transform to distinguish nodule boundaries. |
Chiu et al. (2014) [32] | Radial gradient | Noise robustness | User-defined ROI/boundary points (see Section 4.1) | A semi-automatic method that uses a radial gradient algorithm [33] and variance-reduction statistics to select cut points on the nodule boundary from a user-defined ROI, with additional filtering for outliers. |
Nugroho et al. (2015) [21] | Bilateral filtering, active contour | Addresses smooth/blurred boundaries, inherent denoising, and topological adaptability | Assumes nearly homogeneous foreground and background | A traditional pipeline that applies bilateral filtering for image denoising and uses the Chan–Vese model [22] for nodule delineation, leveraging its inherent denoising and topological adaptability. |
Le et al. (2015) [29] | Gradient and directional vector flow active contour | Increased contour flexibility, reduced computational time and complexity | Multiple parameters, no end-to-end optimization (see Section 4.1) | A variant of the dynamic directional gradient vector flow active contour [30] that redefines the energy functional by altering straight lines to fold lines in order to increase contour flexibility. It also introduces a vector field to reduce computational complexity. |
Ma et al. (2017) [35] | CNN | Fully automatic | Ignores global context, lacks interpretability (see Section 4.2) | An early CNN-based approach that formulates segmentation as a patch-level classification task. It uses a multi-view strategy on patches from normal and nodular glands to generate a probability map. |
Wang et al. (2023) [36] | CNN (DPAM-PSPNet) | Integrates multiscale context, brightness, contrast, and structural similarity | Model complexity (see Section 4.3) | Introduces a dual path attention mechanism (DPAM) into pyramid scene parsing network (PSPNet) [37]. This mechanism is designed to capture both global contextual information and fine-grained nodule edge structures simultaneously. |
Zhou et al. (2018) [38] | CNN (U-Net) | Requires only limited human effort | Not fully automatic | U-Net is accompanied by an interactive segmentation stage. The model is guided during training and inference by four manually determined endpoints of major and minor nodule axes. |
Nandamuri et al. (2019) [39] | FCN | Relatively efficient inference | Tested on a moderately sized dataset (see Section 4.4) | Introduces SUMNet, an FCN that learns the spatial relationship between classes. It uses feature concatenation and index-passing-based unpooling to enhance semantic segmentation. |
Abdolali et al. (2020) [40] | Mask R-CNN | No complex postprocessing required | Moderately sized dataset, concerns about overfitting (see Section 4.4) | Applies a mask R-CNN model, extending the faster R-CNN object detector with a specialized loss function to perform instance segmentation of thyroid US images. |
Kumar et al. (2020) [41] | VGG16 variant | Efficient inference, fully automatic | Moderate detection rate for cystic components, moderate accuracy | A VGG16 [42] variant that uses dilated convolutions to expand the receptive field. It features two separate outputs to simultaneously delineate both the normal thyroid gland and the nodule. |
Wu et al. (2020) [43] | DenseNet-121 combined with ASPP | Captures multiscale information | Limited exploitation of boundary information | Combines atrous spatial pyramid pooling (ASPP) with depth-wise separable convolutions. This approach is designed to better capture contextual information while managing the size of the feature maps. |
Webb et al. (2021) [44] | DeepLab v3+ with ResNet-101 variant | Preserves spatial resolution, improved low-level feature extraction | Computational and memory cost (see Section 4.3) | Adapts the DeepLabv3+ [45] architecture with a ResNet101 backbone for thyroid US. It features a dual-output design to delineate the thyroid gland and nodules/cysts separately, improving handling of overlapping classes. |
Nugroho et al. (2021) [46] | Res-UNet | Combines strengths of U-Net (preserves spatial resolution) and ResNet (robustness and depth) | Linear interpolation may degrade details | Employs a Res-UNet model, which integrates the residual connections of ResNet into the U-Net architecture, in order to combine the strengths of both (spatial preservation and deep feature extraction). |
Xiangyu et al. (2022) [47] | DPCNN | Effective localization | Hyperparameter adjustment (see Section 4.1), no deep feature representation | A method based on a pulse-coupled neural network that first performs a rough localization, which is then refined using variance and covariance criteria to identify and segment the final lesion area. |
Nguyen et al. (2022) [48] | U-Net | Refined segmentation obtained with the successive application of SN and EN | Computational cost (see Section 4.3), struggles in cases of small thyroid glands (see Section 4.1) | Introduces a dual network based on information fusion. It uses a “suggestion network” (SN) to generate an initial rough mask, which is then refined by an “enhancement network” (EN). |
Yang et al. (2022) [49] | U-Net variant | Contextual understanding via PAM, captures boundary details via MRM | Computational complexity/cost (see Section 4.3), small-sized dataset (see Section 4.4) | Introduces DMU-Net, a dual-subnet architecture with a U-shaped and an inverse U-shaped path. It incorporates a pyramid attention module (PAM) and a margin refinement module (MRM) to capture both context and fine details. |
Song et al. (2022) [50] | Faster R-CNN | Enhanced localization by adding a segmentation branch, fine-grained annotations are not required | Computational complexity/cost (see Section 4.3), the quality of pseudo-labels affects performance | Introduces FDnet, a feature-enhanced dual-branch network based on faster R-CNN. It adds a semantic segmentation branch and introduces a method for generating pseudo-labels (computationally generated masks) for training. |
Chen et al. (2022) [51] | Trident network | Captures multiscale information, suppresses false positives, handles complex textures | Computational complexity/cost (see Section 4.3), generalization concerns (see Section 4.4) | Introduces MTN-Net, a multi-task network based on the trident network. It uses trident blocks with different receptive fields to detect nodules of varying sizes and includes a specialized non-maximum suppression (TN-NMS). |
Gan et al. (2022) [52] | U-Net variant | Enhanced spatial attention via polarized self-attention | Computational complexity/cost (see Section 4.3), hyperparameter adjustment (see Section 4.1), generalization concerns (see Section 4.4) | A U-Net variant that uses an enhanced residual module with soft pooling in the encoder. It also incorporates a full-channel, attention-assisted skip connection based on polarized self-attention. |
Jin et al. (2022) [53] | Boundary field regression branch integrated with existing networks | Can be integrated with various network architectures (U-Net, DeepLab v3+, etc.), enhances the accuracy of boundary extraction | Depends on the network backbone and on the quality of the utilized segmentation mask | Addresses the boundary imbalance issue by introducing a boundary field (BF) regression branch that is trained on a heatmap generated from existing masks to provide explicit boundary information to the network. |
Shao et al. (2023) [54] | U-Net3+ variant | Reduced number of parameters | Inference time (see Section 4.3), secondary feature maps derived by ghost modules may lack diversity | Introduces a U-Net3+ [55] variant that employs the “ghost” bottleneck module. The latter uses ghost modules (GMs) [56] and depth-wise convolutions [57] to expand and then reduce channel numbers efficiently. |
Dai et al. (2024) [58] | U-Net++ variant | SK-based attention addresses diversity in nodules’ size and shape | Moderately sized dataset (see Section 4.4), oversegmentation in some cases | Introduces SK-Unet++, which starts from U-Net++ [59] and adds adaptive receptive fields based on the selective kernel (SK) attention mechanism to better handle nodules of varying sizes. |
Zheng et al. (2023) [60] | U-Net variant | ResNeSt copes with the presence of small nodules, contextual awareness via ASPP | Inference time (see Section 4.3) | Presents DSRU-Net, which enhances U-Net with a ResNeSt backbone [61], atrous spatial pyramid pooling (ASPP) [43] for context, and deformable convolution (DC v3) [62] for adaptability to irregular gland and nodule shapes. |
Wang et al. (2023) [63] | U-Net variant | MSAC captures both contextual and fine-grained information | Computational complexity/cost (see Section 4.3), concerns for generalization (see Section 4.4) | Introduces the multiscale attentional convolution (MSAC) module, which replaces standard convolutions in U-Net. This module uses cascaded convolutions and self-attention to concatenate features from various receptive fields. |
Chen et al. (2023) [64] | CNN | Addresses the diversity in nodules’ size and shape | Cannot cope with multiple nodules | Introduces FDE-Net, which combines a CNN with a traditional image omics method. It employs a segmental frequency domain enhancement to reduce noise and strengthen contour details in the feature maps. |
Ma et al. (2023) [65] | CycleGAN | Addresses domain shift due to multi-site US image collection | Computational complexity/cost (see Section 4.3) | A domain adaptation framework for multi-site datasets. It uses a CycleGAN-based [66] image translation module to unify image styles and a segmentation consistency loss to guide unsupervised learning on unlabeled target data. The segmentation module consists of two symmetrical parts built on EfficientNet [67]. |
Bouhdiba et al. (2024) [68] | U-Net and RefineUNet | Captures both fine-grained and contextual information | Computational complexity/cost (see Section 4.3) | A deep architecture that combines U-Net with multi-resolution RefineUNet modules. This combination of residual blocks and chained residual pooling (CRP) is aimed at exploiting both fine-scale and contextual features. |
Ma et al. (2024) [69] | CNN | Captures complex patterns via dense blocks, addresses small nodules and dataset imbalance | Generic backbone, not specialized for segmentation, struggles with heterogeneous nodules | Introduces MDenseNet, a densely connected CNN. The dense connectivity pattern is designed to capture complex patterns, address the small nodule problem, and mitigate dataset imbalance issues. |
Liu et al. (2024) [70] | U-Net and ConvNeXt | Captures both edge detail and contextual information | Computational complexity/cost (see Section 4.3) | Builds on U-Net using a ConvNeXt [71] backbone and introduces four specialized modules: a boundary feature guidance module (BFGM), a multiscale perception fusion module (MSPFM), a depthwise separable ASPP, and a refinement module. |
Xing et al. (2024) [72] | U-Net variant | Multiscale feature extraction, reduced computational complexity/cost | Struggles in the presence of calcifications (see Section 4.1) | A U-Net variant that integrates three key modifications for efficiency and performance: dense connectivity for feature reuse, dilated convolutions for redesigning layers, and factorized convolutions to improve efficiency. |
Yang et al. (2024) [73] | U-Net variant | Lightweight design reducing computational complexity/cost, dual attention aids capturing both local and global information | Has not been compared with some large models, such as SWIN U-Net | Introduces DACNet, a lightweight U-shaped network designed for efficiency. It uses depthwise convolutions with squeeze-and-excitation (DWSE) and split atrous with dual attention (ADA) to reduce parameters while capturing multi-scale features. |
Xiao et al. (2025) [74] | DeepLab v3+, EfficientNet-B7 | Captures contextual information | Struggles with speckle noise | Replaces the standard backbone of the DeepLabv3+ architecture with the more efficient EfficientNet-B7. This combination leverages atrous convolution to capture context while handling morphological variability. |
Ongole et al. (2025) [75] | FCM, ResNet101 variant | Addresses inhomogeneity, identifies calcium flecks | Computational complexity/cost (see Section 4.3) | Introduces Bi-MCS, a method combining fuzzy c-means and k-means clustering. This technique is incorporated into a ResNet101 variant (Bi-ResNet101) to enhance color sense-based segmentation by focusing on intensity variations. |
Xiang et al. (2025) [76] | U-Net variant | Addresses datasets obtained from multiple sites by means of FL | Computational complexity/cost (see Section 4.3), limitations in adapting to multi-center scenarios of high variability (see Section 4.4) | A federated learning (FL) method based on a multi-attention guided U-Net (MAUNet). It uses a multiscale cross-attention (MSCA) module to handle variations in nodule shape and size across different institutions. |
Yadav et al. (2022) [77] | SegNet, VGG16, U-Net | Addresses speckle noise | Computational complexity/cost (see Section 4.3) | Introduces Hybrid-UNet, a model that combines the architectural principles of SegNet, VGG16, and U-Net. The model is trained using transfer learning to delineate nodules and cystic components. |
Ali et al. (2024) [78] | U-Net-inspired encoder–decoder | Captures contextual information | Computational complexity/cost (see Section 4.3) | Introduces CIL-Net, an encoder–decoder architecture that uses dense connectivity and a triplet attention block in the encoder, as well as a feature improvement block with dilated convolutions in the decoder to capture global context. |
Sun et al. (2024) [79] | U-Net variant | Captures contextual information | Computational cost/complexity (see Section 4.3) | Introduces CRSANet, a U-Net-based network that uses class representations to describe the characteristics of thyroid nodules. A dual-branch self-attention module then refines the coarse segmentation results. |
Wu et al. (2025) [80] | SWIN ViT | Captures global information, relatively low computational complexity/cost (for a ViT) | Global-local attention could be further enhanced | A pure ViT-based model that uses a SWIN transformer variant. It integrates depthwise convolutions into the transformer blocks to enhance global-local feature representations and uses a multi-level patch embedding. |
Ma et al. (2023) [81] | SWIN ViT | Addresses blurred or uneven tissue regions, late fusion aids handling high-level features | Computational complexity/cost (see Section 4.3) | Introduces AMSeg, a SWIN transformer variant that exploits multiscale anatomical features for late-stage fusion. It uses an adversarial training scheme with separate segmentation and discrimination blocks to handle blurred tissue regions. |
Bi et al. (2023) [82] | U-Net, ViT | Captures both fine-grained and contextual information, integrates low and high frequency | Computational complexity/cost (see Section 4.3) | A U-Net/ViT hybrid architecture named BPAT. It introduces a boundary point supervision module (BPSM) and an assembled transformer module (ATM) to explicitly enhance and constrain nodule boundaries. |
Zheng et al. (2025) [83] | U-Net inspired, SWIN ViT | Addresses indistinct boundaries, captures both fine-grained and contextual information | Computational complexity/cost (see Section 4.3) | Introduces GWUNet, a U-shaped network combining a SWIN transformer with gated attention and a wavelet transform module. The latter is designed to aid in distinguishing indistinct nodule boundaries. |
Li et al. (2023) [84] | CNN, ViT | Captures both fine-grained and contextual information, integrates low and high frequency | Computational complexity/cost (see Section 4.3), small-sized dataset with bias towards clear, well-defined boundaries (see Section 4.4) | A ViT/CNN hybrid network (TCNet) with two branches: a large kernel CNN branch extracts shape information, whereas an enhanced ViT branch models long-range dependencies, with both branches fused by a multiscale fusion module (MFM). |
Ozcan et al. (2024) [85] | U-Net, ViT | Captures both fine-grained and contextual information | Computational complexity/cost (see Section 4.3) | Proposes Enhanced-TransUNet, which combines a ViT with U-Net. It adds an information bottleneck layer to the architecture, aiming to condense features and reduce overfitting. |
Li et al. (2023) [86] | CNN (DLA-34), level-set | Reduces the annotation burden, requires only polygon contours, level-set considers fine-grained boundary information | Computational complexity/cost (see Section 4.3) | A weakly supervised deep active contour model. It uses a deep layer aggregation (DLA) network to deform an initial polygon contour by regressing vertex offsets, guided by a level-set-based loss function. |
Sun et al. (2025) [87] | CLIP, ViT | Captures coarse-grained semantic features and fine-grained spatial details | Domain shift, since CLIP is trained on natural images (see Section 4.4) | Proposes CLIP-TNseg, which integrates a large, pre-trained CLIP model for coarse semantic features and a U-Net-style branch for fine-grained spatial features. |
Lu et al. (2022) [88] | GAN, ResNet | Nodule localization via CAM | Limited GAN-guided deformations used for training | A GAN-based method guided by online class activation mapping (CAM). It uses ResNet for feature extraction and a deformable convolution module to capture discriminative features of nodular regions. |
Kunapinun et al. (2023) [89] | GAN, DeepLab v3+, ResNet18 | Captures fine-grained information and maintains high-level consistency (via GAN), addresses GAN instability via PID | Computational complexity/cost (see Section 4.3) | Introduces StableSeg GAN, which combines supervised segmentation with unsupervised learning. It employs DeepLab v3+ as the generator, ResNet18 [90] as the discriminator, and a proportional-integral-derivative (PID) controller to stabilize GAN training and avoid mode collapse. |
Wan et al. (2023) [91] | Aggregation network | Captures perfusion dynamics via CEUS, models multiscale temporal features | Lacks flexibility to capture salient perfusion differences | Introduces DpRAN for dynamic contrast-enhanced US (CEUS). It focuses on modeling perfusion dynamics by introducing a perfusion excitation (PE) gate and a cross-attention temporal aggregation (CTA) module. |
Zhang et al. (2024) [92] | AKE, SSFA | Integrates domain knowledge, interpretability | Depends on the availability of pairs of radiologists’ reports and US images (see Section 4.4) | A hybrid method that exploits textual content from clinical reports. It uses an adversarial keyword extraction (AKE) module and a semantic-spatial feature aggregation (SSFA) module to integrate report information. |
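As a hands-on companion to the traditional active contour entries in the table above (e.g., the Chan–Vese variants used by Maroulis et al. [7] and Nugroho et al. [21]), the following sketch runs scikit-image’s morphological Chan–Vese implementation on a synthetic hypoechoic blob. It is a generic region-based level-set demonstration under stated assumptions, not a reimplementation of any cited method.

```python
import numpy as np
from skimage.segmentation import morphological_chan_vese

# Synthetic stand-in for a hypoechoic nodule on a speckled background.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:128, :128]
img = 0.7 + 0.1 * rng.standard_normal((128, 128))      # bright, noisy "tissue"
img[(yy - 64) ** 2 + (xx - 64) ** 2 < 30 ** 2] -= 0.4  # darker "nodule" disk

# Region-based evolution in the spirit of Chan–Vese [22]: no edge map is needed,
# so smooth or blurred boundaries are tolerated better than with gradient snakes.
mask = morphological_chan_vese(img, 100, init_level_set="checkerboard", smoothing=2)
print("segmented pixels:", int(mask.sum()))
```

Because the energy is region-based rather than gradient-based, the contour tolerates the smooth or blurred boundaries noted as a strength of these models; the `smoothing` argument plays the role of the regularization weight whose manual tuning Section 4.1 discusses.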
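On the DL side, most of the U-Net-family models in the table optimize overlap-based objectives. The snippet below gives a standard soft Dice loss in PyTorch as a generic reference point; the shapes and toy tensors are assumptions, and none of the cited training pipelines is being reproduced.

```python
import torch

def soft_dice_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Soft Dice loss for binary masks; probs and target are (N, 1, H, W)."""
    inter = (probs * target).sum(dim=(2, 3))
    denom = probs.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

# Toy usage with random sigmoid outputs and a sparse ground-truth mask.
probs = torch.rand(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.9).float()
print(soft_dice_loss(probs, target))  # scalar loss in [0, 1]
```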
3. Computational Assessment of Thyroid Nodules
4. Discussion
4.1. Methodological and Technical Challenges
4.2. Explainability, Interpretability, and Clinician Trust
- 1.
- Saliency/Heatmap Methods (achieving explainability)
  - Principle: These post-hoc methods provide a spatial explanation via a heatmap.
  - TIRADS Alignment: Indirect and inferred. The model itself remains a black box. The clinician must correlate the heatmap’s focus with suspicious TIRADS descriptors.
  - Clinical Utility: Validate the model’s spatial plausibility by confirming it is focusing on relevant regions (see the sketch after the examples below).
- 2.
- Concept-based and Prototypical Methods (achieving interpretability)
  - Principle: These methods quantify predefined concepts to drive predictions, providing a semantic explanation.
  - TIRADS Alignment: Direct and intrinsic. Concepts are explicitly mapped to TIRADS features. The output is a quantitative report, which directly mirrors the radiologist’s checklist.
  - Clinical Utility: Enable a semantic audit of the model’s internal logic. This strengthens clinician trust.
- Example 1—Clinical Use of an Explainable (Heatmap-based) System
- Example 2—Clinical Use of an Interpretable (Concept-based) System
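To ground Example 1, the sketch below computes a Grad-CAM heatmap with forward and backward hooks in PyTorch. The ResNet-18 backbone and the random input are placeholders standing in for a trained nodule classifier and a preprocessed B-mode image.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()  # placeholder: load trained weights in practice
store = {}
model.layer4.register_forward_hook(lambda m, i, o: store.update(act=o.detach()))
model.layer4.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0].detach()))

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed US image
logit = model(x)[0].max()        # score of the predicted class
model.zero_grad()
logit.backward()

weights = store["grad"].mean(dim=(2, 3), keepdim=True)           # channel importance
cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))  # weighted activations
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)         # heatmap in [0, 1]
print(cam.shape)  # (1, 1, 224, 224)
```

Overlaying `cam` on the input image lets the clinician perform exactly the spatial plausibility check described above: confirming that the network’s evidence coincides with the suspicious region rather than with artifacts or irrelevant anatomy.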
4.3. Computational Cost and Deployment Efficiency
4.4. Datasets, Generalization, and Annotation Challenges
4.5. Three-Dimensional Imaging, Doppler, Federated Learning, and Future Directions
- 1.
- Traditional ML (e.g., SVMs with handcrafted features)
  - When to use: Best suited for smaller, well-curated datasets, where strong, interpretable features are already known or can be easily engineered (e.g., analyzing boundary irregularity as in [99]).
  - Practical cost: Low computational requirements (no GPU needed for training), but high human effort is required for feature design and validation.
  - Clinical constraint: Ideal for initial exploratory studies, resource-limited settings, or when a “white-box” model is required for regulatory or clinical trust reasons. Highly interpretable.
- 2.
- CNNs (e.g., U-Net, ResNet)
  - When to use: The current workhorses for both segmentation and classification when a moderate-to-large annotated dataset (hundreds to thousands of images) is available. They excel at automatically learning hierarchical spatial features from images.
  - Practical cost: Moderate-to-high computational cost (a GPU is typically required for efficient training). Less interpretable than traditional ML.
  - Clinical constraint: The standard choice for developing robust, high-performance CAD systems for deployment in clinical settings, provided sufficient data and hardware are available.
- 3.
- ViTs (e.g., TransUNet, SWIN)
  - When to use: Best for very large-scale datasets (many thousands of images), where capturing complex, long-range spatial dependencies across the entire image is critical.
  - Practical cost: Very high computational and data cost. They are “data-hungry” and can overfit on smaller datasets unless sophisticated pretraining strategies are used.
  - Clinical constraint: Currently best suited for well-funded academic research centers aiming for state-of-the-art performance and working with massive, multi-institutional datasets.
- 4.
- Hybrid methods (e.g., DL-extracted features + SVM classifier)
  - When to use: A practical compromise when data is limited but you want to leverage the power of deep feature extraction without training a full end-to-end model (see the sketch after this list).
  - Practical cost: A balance between the two paradigms. Training is often faster and requires less data than a full end-to-end DL model.
  - Clinical constraint: A pragmatic approach for many research groups or smaller clinical centers that have access to limited datasets but want to move beyond purely traditional methods.
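A minimal sketch of the hybrid pattern in item 4 follows: frozen deep features feeding a classical SVM. The untrained ResNet-18 backbone and the random tensors are stand-ins; in practice, one would use a pretrained or fine-tuned encoder and real, labeled nodule crops.

```python
import torch
from torchvision.models import resnet18
from sklearn.svm import SVC

backbone = resnet18(weights=None)  # stand-in; normally ImageNet- or US-pretrained
backbone.fc = torch.nn.Identity()  # expose the 512-d global embedding
backbone.eval()

# Hypothetical data: 40 preprocessed nodule crops with benign/malignant labels.
images = torch.randn(40, 3, 224, 224)
labels = torch.randint(0, 2, (40,)).numpy()

with torch.no_grad():
    feats = backbone(images).numpy()  # (40, 512) deep feature vectors

clf = SVC(kernel="rbf", probability=True).fit(feats, labels)  # classical classifier
print(clf.predict_proba(feats[:2]))  # per-class probabilities for two nodules
```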
5. Conclusions
5.1. Key Takeaways
- Paradigm shift: The field has decisively transitioned from traditional image analysis and ML methods—such as active contours, shape-based approaches, and handcrafted feature-based classification—to end-to-end DL models. While traditional methods remain interpretable and often effective in constrained settings, they are frequently limited by their dependency on manual input and handcrafted features, as well as by their sensitivity to parameter settings and initialization. In contrast, DL-based methods, with CNNs being particularly prominent, have demonstrated superior performance in both segmentation and classification tasks, largely due to their ability to learn hierarchical representations from raw US data, though this increased performance comes at the cost of reduced interpretability, uncertain generalization, and far greater data requirements. ViTs have recently emerged as a compelling alternative to CNNs by capturing long-range dependencies through self-attention mechanisms. Although still in early stages of exploration for thyroid US imaging, ViTs show potential to model complex spatial relationships, particularly in challenging scenarios involving heterogeneous nodule appearance.
- Hybrid methods as a pragmatic solution: A significant and growing line of research focuses on hybrid methods that combine the strengths of traditional ML and DL methods for enhanced performance and interpretability. One class of hybrid methods employs DL architectures for automatic feature extraction, followed by classical ML classifiers, such as SVMs or shallow feed-forward neural networks for the final classification. These combinations often improve robustness and reduce overfitting, especially on small datasets. Another class of hybrid methods integrates handcrafted features into DL pipelines, enriching the model with image-specific descriptors that are invariant to scale and rotation. A third class of hybrid methods incorporates expert-defined heuristics or rule-based logic to guide the learning process, for example, by embedding thyroid-specific knowledge or TIRADS-based thresholds, thereby enhancing both performance and interpretability.
- Data-centric bottleneck: While novel architectures continue to emerge, the most significant barrier to progress is no longer algorithmic innovation but the data ecosystem. The field’s primary bottleneck has shifted towards solving the challenges of data scarcity, quality, annotation efficiency, and cross-institutional generalization. Future breakthroughs are more likely to come from better data strategies than from minor architectural tweaks.
- The definition of “state-of-the-art” is expanding beyond accuracy: A model’s success is no longer judged solely on its segmentation or classification accuracy on a benchmark dataset. The definition of state-of-the-art is expanding to include clinical utility and trustworthiness. Future CAD methods will be increasingly evaluated on their interpretability, their explicit alignment with clinical frameworks like TIRADS, and their ability to be safely operationalized within a human-in-the-loop workflow.
5.2. Key Challenges
- Dependency on human intervention (mostly in traditional methods): active contour and shape-based methods often require manual initialization and parameter tuning, complicating full automation and clinical deployment.
- Data diversity and availability: existing datasets are often small, lack diversity, and frequently miss clinically relevant classes [134].
- Generalization across institutions: variability in US equipment, acquisition protocols, and patient populations can lead to significant domain shifts. Solutions such as domain adaptation, transfer learning, and FL are promising.
- Small nodule detection: DL models often struggle to detect small nodules embedded in complex anatomical backgrounds, where echogenicity may resemble that of surrounding tissues.
- Explainability and trust: DL models are often perceived as “black boxes”. Post hoc explainability tools like Grad-CAM and SHAP provide partial insight but need better alignment with clinical reasoning.
- 3D and Doppler imaging: the shift to 3D US and Doppler modalities offers more nuanced visual cues but also increases model complexity and exacerbates data scarcity [14].
- Limitations in labeled data availability: annotating US images is time-consuming and costly. Semi-supervised, weakly supervised, and AL methods offer ways to reduce annotation burden.
- Privacy and collaboration: FL enables model training across institutions without sharing patient data but introduces technical challenges related to anonymization and communication overhead.
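As a minimal illustration of the FL point above, the sketch below performs one federated averaging (FedAvg) round: each site takes a local optimization step, and only weight tensors, never patient images, are sent for aggregation. The three-site setup and the tiny linear model are assumptions for demonstration.

```python
import copy
import torch
import torch.nn as nn

def local_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> dict:
    """One local SGD step at a site; returns only the updated weights."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return model.state_dict()

global_model = nn.Linear(16, 1)  # stand-in for a segmentation/classification net
site_weights = []
for _ in range(3):  # three hypothetical institutions with private data
    local = copy.deepcopy(global_model)
    site_weights.append(local_step(local, torch.randn(8, 16), torch.rand(8, 1).round()))

# Server aggregates: element-wise mean of the site weight tensors (FedAvg).
avg = {k: torch.stack([w[k] for w in site_weights]).mean(0) for k in site_weights[0]}
global_model.load_state_dict(avg)
```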
5.3. Clinical Integration
- Before adopting a CAD method, clinicians and departments should consider the following questions about the depth, robustness, and safety of its TIRADS alignment:
- 1.
- What is the depth of the TIRADS alignment?
  - Risk-stratification alignment: Does the model only provide a final, high-level risk score (e.g., “TR4” or “high suspicion”)?
  - Feature-level alignment: Does it provide granular, feature-level scores for each of the five TIRADS categories (composition, echogenicity, shape, margin, and echogenic foci)? This type of alignment is more transparent and clinically useful (see the sketch after this checklist).
- 2.
- What is the validated performance granularity?
  - Feature-specific accuracy: Has the system’s performance been validated for each individual TIRADS feature? A common failure mode is high overall accuracy that masks a critical weakness in detecting a specific high-risk feature (e.g., the model may excel at “composition” but fail on “echogenic foci”). Request a feature-by-feature performance breakdown.
  - Handling of ambiguity: How is the model’s performance on ambiguous or borderline cases quantified? Inquire about its accuracy in distinguishing between clinically similar but distinct categories, such as an “ill-defined” margin (lower risk) versus an “irregular” margin (higher risk).
- 3.
- How was the model validated against real-world variability? Was the model’s alignment validated on a diverse, multi-institutional dataset that includes different US machine vendors and patient populations? A model that performs well on clean, single-center data may fail when faced with noisy images acquired from diverse equipment. How does it perform on rare but clinically important nodule subtypes?
- 4.
- What was the quality and scale of the annotation data used for training? Was the data annotated by a single expert or by a consensus of multiple experienced radiologists? High-quality, consensus-based annotations are critical for building a robust model. How many examples of each specific feature (especially high-risk ones) were used in training?
- 5.
- Does the system explain its feature-level conclusions? If the model assigns a high-risk score for “margin,” does it provide a heatmap or other type of representation to localize the suspicious feature along the nodule boundary?
- 1.
- Establishment of a shared responsibility policy: Before deployment, a formal policy should be established to clearly delineate liability. This policy should define the responsibilities of the AI vendor, the clinician (as the ultimate authority for the final diagnosis), and the healthcare institution (for providing adequate training and quality control).
- 2.
- Implementation of a mandatory human oversight protocol: The “assist, not replace” paradigm should be reinforced by formalizing the clinician-in-the-loop workflow. The protocol should mandate that any AI-generated finding, score, or segmentation be reviewed, verified, and explicitly accepted or rejected by a qualified radiologist before being entered into the patient’s official report.
- 3.
- Development of a continuous training and feedback program: To address clinician concerns and build trust, a transparent training program should be developed. This program should extend beyond basic use to include education on the limitations and common failure modes of AI. A clear feedback channel should also be established for clinicians to report and document cases of incorrect or ambiguous AI outputs.
- 4.
- Institution of regular quality control audits: The performance of the CAD system should not be accepted solely on the basis of the vendor’s claims. Periodic audits should be instituted to evaluate the system’s performance on a curated set of local cases with known pathological outcomes. This ensures the model remains robust on the department’s specific patient population and US equipment.
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Burman, K.D.; Wartofsky, L. Thyroid Nodules. N. Engl. J. Med. 2015, 373, 2347–2356.
- Cao, C.L.; Li, Q.L.; Tong, J.; Shi, L.N.; Li, W.X.; Xu, Y.; Cheng, J.; Du, T.T.; Li, J.; Cui, X.W. Artificial intelligence in thyroid ultrasound. Front. Oncol. 2023, 13, 1060702.
- Wu, X.; Tan, G.; Luo, H.; Chen, Z.; Pu, B.; Li, S.; Li, K. A knowledge-interpretable multi-task learning framework for automated thyroid nodule diagnosis in ultrasound videos. Med. Image Anal. 2024, 91, 103039.
- Shahroudnejad, A.; Vega, R.; Forouzandeh, A.; Balachandran, S.; Jaremko, J.; Noga, M.; Hareendranathan, A.R.; Kapur, J.; Punithakumar, K. Thyroid nodule segmentation and classification using deep convolutional neural network and rule-based classifiers. In Proceedings of the International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Online, 1–5 November 2021; pp. 3118–3121.
- Kang, S.; Lee, E.; Chung, C.W.; Jang, H.N.; Moon, J.H.; Shin, Y.; Kim, K.; Li, Y.; Shin, S.M.; Kim, Y.H.; et al. A beneficial role of computer-aided diagnosis system for less experienced physicians in the diagnosis of thyroid nodule on ultrasound. Sci. Rep. 2021, 11, 20448.
- Fresilli, D.; Grani, G.; De Pascali, M.L.; Alagna, G.; Tassone, E.; Ramundo, V.; Ascoli, V.; Bosco, V.; Biffoni, M.; Bononi, M.; et al. Computer-aided diagnostic system for thyroid nodule sonographic evaluation outperforms the specificity of less experienced examiners. J. Ultrasound 2020, 23, 169–174.
- Maroulis, D.E.; Savelonas, M.A.; Iakovidis, D.K.; Karkanis, S.A.; Dimitropoulos, N. Variable background active contour model for computer-aided delineation of nodules in thyroid ultrasound images. IEEE Trans. Inf. Technol. Biomed. 2007, 11, 537–543.
- Li, L.R.; Du, B.; Liu, H.Q.; Chen, C. Artificial intelligence for personalized medicine in thyroid cancer: Current status and future perspectives. Front. Oncol. 2021, 10, 604051.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Kim, Y.J.; Choi, Y.; Hur, S.J.; Park, K.S.; Kim, H.J.; Seo, M.; Lee, M.K.; Jung, S.L.; Jung, C.K. Deep convolutional neural network for classification of thyroid nodules on ultrasound: Comparison of the diagnostic performance with that of radiologists. Eur. J. Radiol. 2022, 152, 110335.
- Koh, J.; Lee, E.; Han, K.; Kim, E.K.; Son, E.J.; Sohn, Y.M.; Seo, M.; Kwon, M.R.; Yoon, J.H.; Lee, J.H.; et al. Diagnosis of thyroid nodules on ultrasonography by a deep convolutional neural network. Sci. Rep. 2020, 10, 15245.
- Chen, J.; You, H.; Li, K. A review of thyroid gland segmentation and thyroid nodule segmentation methods for medical ultrasound images. Comput. Methods Programs Biomed. 2020, 185, 105329.
- Chambara, N.; Ying, M. The diagnostic efficiency of ultrasound computer-aided diagnosis in differentiating thyroid nodules: A systematic review and narrative synthesis. Cancers 2019, 11, 1759.
- Anand, V.; Koundal, D. Computer-Assisted Diagnosis of Thyroid Cancer Using Medical Images: A Survey. In Recent Innovations in Computing; Singh, P.K., Kar, A.K., Singh, Y., Kolekar, M.H., Tanwar, S., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 597, pp. 543–559.
- Gulame, M.B.; Dixit, V.V.; Suresh, M. Thyroid nodules segmentation methods in clinical ultrasound images: A review. Mater. Today Proc. 2021, 45, 2270–2276.
- Sorrenti, S.; Dolcetti, V.; Radzina, M.; Bellini, M.I.; Frezza, F.; Munir, K.; Grani, G.; Durante, C.; D’Andrea, V.; David, E.; et al. Artificial intelligence for thyroid nodule characterization: Where are we standing? Cancers 2022, 14, 3357.
- Ludwig, M.; Ludwig, B.; Mikuła, A.; Biernat, S.; Rudnicki, J.; Kaliszewski, K. The use of artificial intelligence in the diagnosis and classification of thyroid nodules: An update. Cancers 2023, 15, 708.
- Yadav, N.; Dass, R.; Virmani, J. A systematic review of machine learning based thyroid tumor characterisation using ultrasonographic images. J. Ultrasound 2024, 27, 209–224.
- Mylona, E.A.; Savelonas, M.A.; Maroulis, D. Automated adjustment of region-based active contour parameters using local image geometry. IEEE Trans. Cybern. 2014, 44, 2757–2770.
- Mylona, E.A.; Savelonas, M.A.; Maroulis, D. Self-parameterized active contours based on regional edge structure for medical image segmentation. SpringerPlus 2014, 3, 424.
- Nugroho, H.A.; Nugroho, A.; Choridah, L. Thyroid nodule segmentation using active contour bilateral filtering on ultrasound images. In Proceedings of the IEEE International Conference on Quality in Research (QiR), Lombok, Indonesia, 10–13 August 2015; pp. 43–46.
- Chan, T.F.; Vese, L.A. Active contours without edges. IEEE Trans. Image Process. 2001, 10, 266–277.
- Savelonas, M.A.; Iakovidis, D.K.; Legakis, I.; Maroulis, D. Active Contours guided by Echogenicity and Texture for Delineation of Thyroid Nodules in Ultrasound Images. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 519–527.
- Du, W.; Sang, N. An effective method for ultrasound thyroid nodules segmentation. In Proceedings of the International Symposium on Bioelectronics and Bioinformatics (ISBB), Beijing, China, 14–17 October 2015; pp. 207–210.
- Koundal, D.; Gupta, S.; Singh, S. Automated delineation of thyroid nodules in ultrasound images using spatial neutrosophic clustering and level set. Appl. Soft Comput. 2016, 40, 86–97.
- Koundal, D.; Gupta, S.; Singh, S. Computer aided thyroid nodule detection system using medical ultrasound images. Biomed. Signal Process. Control 2018, 40, 117–130.
- Li, C.; Xu, C.; Gui, C.; Fox, M.D. Distance regularized level set evolution and its application to image segmentation. IEEE Trans. Image Process. 2010, 19, 3243–3254.
- Koundal, D.; Sharma, B.; Guo, Y. Intuitionistic based segmentation of thyroid nodules in ultrasound images. Comput. Biol. Med. 2020, 121, 103776.
- Le, Y.; Xu, X.; Zha, L.; Zhao, W.; Zhu, Y. Tumour localisation in ultrasound-guided high-intensity focused ultrasound ablation using improved gradient and direction vector flow. IET Image Process. 2015, 9, 857–865.
- Cheng, J.; Foo, S.W. Dynamic directional gradient vector flow for snakes. IEEE Trans. Image Process. 2006, 15, 1563–1571.
- Tsantis, S.; Dimitropoulos, N.; Cavouras, D.; Nikiforidis, G. A hybrid multi-scale model for thyroid nodule boundary detection on ultrasound images. Comput. Methods Programs Biomed. 2006, 84, 86–98.
- Chiu, L.Y.; Chen, A. A variance-reduction method for thyroid nodule boundary detection on ultrasound images. In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), Taipei, Taiwan, 18–22 August 2014; pp. 681–685.
- Madabhushi, A.; Metaxas, D.N. Combining low-, high-level and empirical domain knowledge for automated segmentation of ultrasonic breast lesions. IEEE Trans. Med. Imaging 2003, 22, 155–169.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. Med. Image Comput. Comput. Assist. Interv. 2015, 9351, 234–241.
- Ma, J.; Wu, F.; Jiang, T.; Zhao, Q.; Kong, D. Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 1895–1910.
- Wang, S.; Li, Z.; Liao, L.; Zhang, C.; Zhao, J.; Sang, L.; Qian, W.; Pan, G.; Huang, L.; Ma, H. DPAM-PSPNet: Ultrasonic image segmentation of thyroid nodule based on dual-path attention mechanism. Phys. Med. Biol. 2023, 68, 165002.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Zhou, S.; Wu, H.; Gong, J.; Le, T.; Wu, H.; Chen, Q.; Xu, Z. Mark-guided segmentation of ultrasonic thyroid nodules using deep learning. In Proceedings of the ACM International Symposium on Image Computing and Digital Medicine, Chengdu, China, 13–14 October 2018; pp. 21–26.
- Nandamuri, S.; China, D.; Mitra, P.; Sheet, D. SUMNet: Fully convolutional model for fast segmentation of anatomical structures in ultrasound volumes. In Proceedings of the IEEE International Symposium on Biomedical Imaging, Venice, Italy, 8–11 April 2019; pp. 1729–1732.
- Abdolali, F.; Kapur, J.; Jaremko, J.L.; Noga, M.; Hareendranathan, A.R.; Punithakumar, K. Automated thyroid nodule detection from ultrasound imaging using deep convolutional neural networks. Comput. Biol. Med. 2020, 122, 103871.
- Kumar, V.; Webb, J.; Gregory, A.; Meixner, D.D.; Knudsen, J.M.; Callstrom, M.; Fatemi, M.; Alizad, A. Automated segmentation of thyroid nodule, gland, and cystic components from ultrasound images using deep learning. IEEE Access 2020, 8, 63482–63496.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Wu, Y.; Shen, X.; Bu, F.; Tian, J. Ultrasound image segmentation method for thyroid nodules using ASPP fusion features. IEEE Access 2020, 8, 172457–172466.
- Webb, J.M.; Meixner, D.D.; Adusei, S.A.; Polley, E.C.; Fatemi, M.; Alizad, A. Automatic deep learning semantic segmentation of ultrasound thyroid cineclips using recurrent fully convolutional networks. IEEE Access 2021, 9, 5119–5127.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Nugroho, H.A.; Frannita, E.L.; Nurfauzi, R. An automated detection and segmentation of thyroid nodules using Res-UNet. In Proceedings of the IEEE International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Semarang, Indonesia, 20–21 October 2021; pp. 181–185.
- Xiangyu, D.; Huan, Z.; Yahan, Y. Ultrasonic image segmentation algorithm of thyroid nodules based on DPCNN. In Lecture Notes in Electrical Engineering, Proceedings of the International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD), Online, 25–26 March 2021; Su, R., Zhang, Y.D., Liu, H., Eds.; Springer: Gateway East, Singapore, 2021; Volume 784, pp. 163–174.
- Nguyen, D.T.; Choi, J.; Park, K.R. Thyroid nodule segmentation in ultrasound image based on information fusion of suggestion and enhancement networks. Mathematics 2022, 10, 3484.
- Yang, Q.; Geng, C.; Chen, R.; Pang, C.; Han, R.; Lyu, L.; Zhang, Y. DMU-Net: Dual-route mirroring U-Net with mutual learning for malignant thyroid nodule segmentation. Biomed. Signal Process. Control 2022, 77, 103805.
- Song, R.; Zhu, C.; Zhang, L.; Zhang, T.; Luo, Y.; Liu, J.; Yang, J. Dual-branch network via pseudo-label training for thyroid nodule detection in ultrasound image. Appl. Intell. 2022, 52, 11738–11754.
- Chen, L.; Zheng, W.; Hu, W. MTN-Net: A multi-task network for detection and segmentation of thyroid nodules in ultrasound images. In Knowledge Science, Engineering and Management, KSEM 2022, Lecture Notes in Computer Science; Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M., Eds.; Springer: Cham, Switzerland, 2022; Volume 13370.
- Gan, J.; Zhang, R. Ultrasound image segmentation algorithm of thyroid nodules based on improved U-Net network. In Proceedings of the ACM International Conference on Control, Robotics and Intelligent System, Online, 26–28 August 2022; pp. 61–66.
- Jin, Z.; Li, X.; Zhang, Y.; Shen, L.; Lai, Z.; Kong, H. Boundary regression-based deep neural network for thyroid nodule segmentation in ultrasound images. Neural Comput. Appl. 2022, 34, 22357–22366.
- Shao, J.; Pan, T.; Fan, L.; Li, Z.; Yang, J.; Zhang, S.; Zhang, J.; Chen, D.; Zhu, X.; Chen, H.; et al. FCG-Net: An innovative full-scale connected network for thyroid nodule segmentation in ultrasound images. Biomed. Signal Process. Control 2023, 86, 105048.
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Chen, Y.; Wu, J. UNet 3+: A full-scale connected unet for medical image segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Online, 4–8 May 2020.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589.
- Sifre, L.; Mallat, S. Rigid-motion scattering for texture classification. arXiv 2014, arXiv:1403.1687.
- Dai, H.; Xie, W.; Xia, E. SK-Unet++: An improved Unet++ network with adaptive receptive fields for automatic segmentation of ultrasound thyroid nodule images. Med. Phys. 2024, 51, 1798–1811.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Proceedings of the International Workshop in Deep Learning in Medical Image Analysis (DLMIA) and Multimodal Learning for Clinical Decision (ML-CDS), held in conjunction with Medical Image Computing and Computer Assisted Intervention (MICCAI), Granada, Spain, 20 September 2018; pp. 3–11.
- Zheng, T.; Qin, H.; Cui, Y.; Wang, R.; Zhao, W.; Zhang, S.; Geng, S.; Zhao, L. Segmentation of thyroid glands and nodules in ultrasound images using the improved U-Net architecture. BMC Med. Imaging 2023, 23, 56.
- Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Zhang, Z.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; et al. ResNeSt: Split-attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 2735–2745.
- Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023.
- Wang, R.; Zhou, H.; Fu, P.; Shen, H.; Bai, Y. A multiscale attentional Unet model for automatic segmentation in medical ultrasound images. Ultrason. Imaging 2023, 45, 159–174.
- Chen, H.; Yu, M.; Chen, C.; Zhou, K.; Qi, S.; Chen, Y.; Xiao, R. FDE-net: Frequency-domain enhancement network using dynamic-scale dilated convolution for thyroid nodule segmentation. Comput. Biol. Med. 2023, 153, 106514.
- Ma, W.; Li, X.; Zou, L.; Fan, C.; Wu, M. Symmetrical awareness network for cross-site ultrasound thyroid nodule segmentation. Front. Public Health 2023, 11, 1055815.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Bouhdiba, K.; Meddeber, L.; Meddeber, M.; Zouagui, T. A hybrid deep neural network architecture RefineUnet for ultrasound thyroid nodules segmentation. In Proceedings of the International Conference on Advanced Electrical Engineering (ICAEE), Sidi-Bel-Abbes, Algeria, 5–7 November 2024; pp. 1–6.
- Ma, J.; Kong, D.; Wu, F.; Bao, L.; Yuan, J.; Liu, Y. Densely connected convolutional networks for ultrasound image based lesion segmentation. Comput. Biol. Med. 2024, 168, 107725.
- Liu, J.; Mu, J.; Sun, H.; Dai, C.; Ji, Z.; Ganchev, I. BFG&MSF-Net: Boundary feature guidance and multi-scale fusion network for thyroid nodule segmentation. IEEE Access 2024, 12, 78701–78713.
- Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Xing, G.; Wang, S.; Gao, J.; Li, X. Real-time reliable semantic segmentation of thyroid nodules in ultrasound images. Phys. Med. Biol. 2024, 69, 025016.
- Yang, Y.; Huang, H.; Shao, Y.; Chen, B. DAC-Net: A light-weight U-shaped network based efficient convolution and attention for thyroid nodule segmentation. Comput. Biol. Med. 2024, 180, 108972.
- Xiao, N.; Kong, D.; Wang, J. Ultrasound thyroid nodule segmentation algorithm based on DeepLabV3+ with EfficientNet. J. Digit. Imaging 2025.
- Ongole, D.; Saravanan, S. Advanced thyroid nodule detection using ultrasonography image analysis and bilateral mean clustering strategies. Comput. Biol. Med. 2025, 186, 109647.
- Xiang, Z.; Tian, X.; Liu, Y.; Chen, M.; Zhao, C.; Tang, L.N.; Xue, E.S.; Zhou, Q.; Shen, B.; Li, F.; et al. Federated learning via multi-attention guided UNet for thyroid nodule segmentation of ultrasound images. Neural Netw. 2025, 181, 106754.
- Yadav, N.; Dass, R.; Virmani, J. Objective assessment of segmentation models for thyroid ultrasound images. J. Ultrasound 2022, 26, 673–685.
- Ali, H.; Wang, M.; Xie, J. CIL-Net: Densely connected context information learning network for boosting thyroid nodule segmentation using ultrasound images. Cogn. Comput. 2024, 16, 1176–1197.
- Sun, S.; Fu, C.; Xu, S.; Wen, Y.; Ma, T. CRSANet: Class representations self-attention network for the segmentation of thyroid nodules. Biomed. Signal Process. Control 2024, 91, 105917.
- Wu, Y.; Huang, L.; Yang, T. Thyroid nodule ultrasound image segmentation based on improved SWIN transformer. IEEE Access 2025, 13, 19788–19795.
- Ma, X.; Sun, B.; Liu, W.; Sui, D.; Chen, J.; Tian, Z. AMSeg: A novel adversarial architecture based multi-scale fusion framework for thyroid nodule segmentation. IEEE Access 2023, 11, 72911–72924.
- Bi, H.; Cai, C.; Sun, J.; Jiang, Y.; Lu, G.; Shu, H.; Ni, X. BPAT-UNet: Boundary preserving assembled transformer UNet for ultrasound thyroid nodule segmentation. Comput. Methods Programs Biomed. 2023, 238, 107614.
- Zheng, S.; Yu, S.; Wang, Y.; Wen, J. GWUNet: A UNet with gated attention and improved wavelet transform for thyroid nodules segmentation. In MultiMedia Modeling, MMM 2025, Lecture Notes in Computer Science; Ide, I., Kompatsiaris, I., Xu, C., Yanai, K., Chu, W.T., Nitta, N., Riegler, M., Yamasaki, T., Eds.; Springer: Gateway East, Singapore, 2025; Volume 15521.
- Li, G.; Chen, R.; Zhang, J.; Liu, K.; Geng, C.; Lyu, L. Fusing enhanced transformer and large kernel CNN for malignant thyroid nodule segmentation. Biomed. Signal Process. Control 2023, 83, 104636.
- Ozcan, A.; Tosun, Ö.; Donmez, E.; Sanwal, M. Enhanced-TransUNet for ultrasound segmentation of thyroid nodules. Biomed. Signal Process. Control 2024, 95, 106472.
- Li, Z.; Zhou, S.; Chang, C.; Wang, Y.; Guo, Y. A weakly supervised deep active contour model for nodule segmentation in thyroid ultrasound images. Pattern Recognit. Lett. 2023, 165, 128–137.
- Sun, X.; Wei, B.; Jiang, Y.; Mao, L.; Zhao, Q. CLIP-TNseg: A multi-modal hybrid framework for thyroid nodule segmentation in ultrasound images. IEEE Signal Process. Lett. 2025, 32, 1625–1629.
- Lu, J.; Ouyang, X.; Shen, X.; Liu, T.; Cui, Z.; Wang, Q.; Shen, D. GAN-guided deformable attention network for identifying thyroid nodules in ultrasound images. IEEE J. Biomed. Health Inform. 2022, 26, 1582–1590.
- Kunapinun, A.; Dailey, M.N.; Songsaeng, D.; Parnichkun, M.; Keatmanee, C.; Ekpanyapong, M. Improving GAN learning dynamics for thyroid nodule segmentation. Ultrasound Med. Biol. 2023, 49, 416–430.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Wan, P.; Xue, H.; Liu, C.; Chen, F.; Kong, W.; Zhang, D. Dynamic perfusion representation and aggregation network for nodule segmentation using contrast-enhanced US. IEEE J. Biomed. Health Inform. 2023, 27, 3431–3442.
- Zhang, Y.; Chen, W.; Li, X.; Shen, L.; Lai, Z.; Kong, H. Adversarial keyword extraction and semantic-spatial feature aggregation for clinical report guided thyroid nodule segmentation. In Pattern Recognition and Computer Vision (PRCV), Lecture Notes in Computer Science; Liu, Q., Wang, H., Ma, Z., Zheng, W., Zha, H., Chen, X., Wang, L., Ji, R., Eds.; Springer: Singapore, 2024; Volume 14437.
- Chambara, N.; Liu, S.Y.W.; Lo, X.; Ying, M. Diagnostic performance evaluation of different TI-RADS using ultrasound computer-aided diagnosis of thyroid nodules: An experience with adjusted settings. PLoS ONE 2021, 16, e0245617.
- Han, M.; Ha, E.J.; Park, J.H. Computer-aided diagnostic system for thyroid nodules on ultrasonography: Diagnostic performance based on the thyroid imaging reporting and data system classification and dichotomous outcomes. AJNR Am. J. Neuroradiol. 2021, 42, 559–565.
- Xie, F.; Luo, Y.K.; Lan, Y.; Tian, X.Q.; Zhu, Y.Q.; Jin, Z.; Zhang, Y.; Zhang, M.B.; Song, Q.; Zhang, Y. Differential diagnosis and feature visualization for thyroid nodules using computer-aided ultrasonic diagnosis system: Initial clinical assessment. BMC Med. Imaging 2022, 22, 153.
- Gomes Ataide, E.J.; Jabaraj, M.S.; Schenke, S.; Petersen, M.; Haghghi, S.; Wuestemann, J.; Illanes, A.; Friebe, M.; Kreissl, M.C. Thyroid nodule detection and region estimation in ultrasound images: A comparison between physicians and an automated decision support system approach. Diagnostics 2023, 13, 2873.
- Wang, Y.; Yue, W.; Li, X.; Liu, S.; Guo, L.; Xu, H.; Zhang, H.; Yang, G. Comparison study of radiomics and deep learning-based methods for thyroid nodules classification using ultrasound images. IEEE Access 2020, 8, 52010–52017.
- Zhao, W.J.; Fu, L.R.; Huang, Z.M.; Zhu, J.Q.; Ma, B.Y. Effectiveness evaluation of computer-aided diagnosis system for the diagnosis of thyroid nodules on ultrasound: A systematic review and meta-analysis. Medicine 2019, 98, e16379. [Google Scholar] [CrossRef]
- Savelonas, M.A.; Iakovidis, D.K.; Dimitropoulos, N.; Maroulis, D. Computational characterization of thyroid tissue in the Radon domain. In Proceedings of the IEEE International Symposium on Computer-Based Medical Systems (CBMS), Maribor, Slovenia, 20–22 June 2007; pp. 189–192. [Google Scholar]
- Seabra, J.C.R.; Fred, A.L.N. A biometric identification system based on thyroid tissue echo-morphology. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing, Porto, Portugal, 14–17 January 2009; pp. 186–193. [Google Scholar]
- Savelonas, M.; Maroulis, D.; Sangriotis, M. A computer-aided system for malignancy risk assessment of nodules in thyroid US images based on boundary features. Comput. Methods Programs Biomed. 2009, 96, 25–32. [Google Scholar] [CrossRef]
- Legakis, I.; Savelonas, M.A.; Maroulis, D.; Iakovidis, D.K. Computer-based nodule malignancy risk assessment in thyroid ultrasound images. Int. J. Comput. Appl. 2011, 33, 29–35. [Google Scholar] [CrossRef]
- Iakovidis, D.K.; Keramidas, E.G.; Maroulis, D. Fusion of fuzzy statistical distributions for classification of thyroid ultrasound patterns. Artif. Intell. Med. 2010, 50, 33–41. [Google Scholar] [CrossRef] [PubMed]
- Katsigiannis, S.; Keramidas, E.; Maroulis, D. A contourlet transform feature extraction scheme for ultrasound thyroid texture classification. Eng. Intell. Syst. 2010, 18, 171. [Google Scholar]
- Keramidas, E.G.; Maroulis, D.; Iakovidis, D.K. TND: A thyroid nodule detection system for analysis of ultrasound images and videos. J. Med. Syst. 2012, 36, 1271–1281. [Google Scholar] [CrossRef] [PubMed]
- Kale, S.; Dudhe, A. Texture analysis of thyroid ultrasonography images for diagnosis of benign & malignant nodule using feed forward neural network. Int. J. Comput. Eng. Manag. 2018, 5, 205–211. [Google Scholar]
- Kale, S.; Punwatkar, K. Texture analysis of ultrasound medical images for diagnosis of thyroid nodule using support vector machine. Int. J. Comput. Sci. Mob. Comput. 2013, 2, 71–77. [Google Scholar]
- Vadhiraj, V.V.; Simpkin, A.; O’Connell, J.; Singh Ospina, N.; Maraka, S.; O’Keeffe, D.T. Ultrasound image classification of thyroid nodules using machine learning techniques. Medicina 2021, 57, 527. [Google Scholar] [CrossRef]
- Nanda, S.; Sukumar, M. Thyroid nodule classification using steerable pyramid–based features from ultrasound images. J. Clin. Eng. 2018, 43, 149–158. [Google Scholar] [CrossRef]
- Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In Proceedings of the European Conference on Machine Learning (ECML), Catania, Italy, 6–8 April 1994; pp. 171–182. [Google Scholar]
- Acharya, U.R.; Chowriappa, P.; Fujita, H.; Bhat, S.; Dua, S.; Koh, J.E.W.; Eugene, L.W.J.; Kongmebhol, P.; Ng, K.H. Thyroid lesion classification in 242 patient population using Gabor transform features from high resolution ultrasound images. Knowl.-Based Syst. 2016, 107, 235–245. [Google Scholar]
- Raghavendra, U.; Acharya, U.R.; Gudigar, A.; Tan, J.H.; Fujita, H.; Hagiwara, Y.; Molinari, F.; Kongmebhol, P.; Ng, K.H. Fusion of spatial gray level dependency and fractal texture features for the characterization of thyroid lesions. Ultrasonics 2017, 77, 110–120. [Google Scholar] [CrossRef] [PubMed]
- Xia, J.; Chen, H.; Li, Q.; Zhou, M.; Chen, L.; Cai, Z.; Fang, Y.; Zhou, H. Ultrasound-based differentiation of malignant and benign thyroid nodules: An extreme learning machine approach. Comput. Methods Programs Biomed. 2017, 147, 37–49. [Google Scholar] [CrossRef] [PubMed]
- Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
- Prochazka, A.; Gulati, S.; Holinka, S.; Smutek, D. Classification of thyroid nodules in ultrasound images using direction-independent features extracted by two-threshold binary decomposition. Technol. Cancer Res. Treat. 2019, 18, 1533033819830748. [Google Scholar] [CrossRef] [PubMed]
- Raghavendra, U.; Gudigar, A.; Maithri, M.; Gertych, A.; Meiburger, K.M.; Yeong, C.H.; Madla, C.; Kongmebhol, P.; Molinari, F.; Ng, K.H.; et al. Optimized multi-level elongated quinary patterns for the assessment of thyroid nodules in ultrasound images. Comput. Biol. Med. 2018, 95, 55–62. [Google Scholar] [CrossRef]
- Gomes Ataide, E.J.; Ponugoti, N.; Illanes, A.; Schenke, S.; Kreissl, M.; Friebe, M. Thyroid nodule classification for physician decision support using machine learning-evaluated geometric and morphological features. Sensors 2020, 20, 6110. [Google Scholar] [CrossRef]
- Barzegar-Golmoghani, E.; Mohebi, M.; Gohari, Z.; Aram, S.; Mohammadzadeh, A.; Firouznia, S.; Shakiba, M.; Naghibi, H.; Moradian, S.; Ahmadi, M.; et al. ELTIRADS framework for thyroid nodule classification integrating elastography, TIRADS, and radiomics with interpretable machine learning. Sci. Rep. 2025, 15, 8763. [Google Scholar] [CrossRef]
- Chi, J.; Walia, E.; Babyn, P.; Wang, J.; Groot, G.; Eramian, M. Thyroid nodule classification in ultrasound images by fine-tuning deep convolutional neural network. J. Digit. Imaging 2017, 30, 477–486. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Liang, X.; Yu, J.; Liao, J.; Chen, Z. Convolutional neural network for breast and thyroid nodules diagnosis in ultrasound imaging. BioMed Res. Int. 2020, 2020, 1763803. [Google Scholar] [CrossRef]
- Zhu, Y.; Fu, Z.; Fei, J. An image augmentation method using convolutional network for thyroid nodule classification by transfer learning. In Proceedings of the IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2017. [Google Scholar]
- Song, R.; Zhang, L.; Zhu, C.; Liu, J.; Yang, J.; Zhang, T. Thyroid nodule ultrasound image classification through hybrid feature cropping network. IEEE Access 2020, 8, 64064–64074. [Google Scholar] [CrossRef]
- Zhang, S.; Du, H.; Jin, Z.; Zhu, Y.; Zhang, Y.; Xie, F.; Zhang, M.; Tian, X.; Zhang, J.; Luo, Y. A novel interpretable computer-aided diagnosis system of thyroid nodules on ultrasound based on clinical experience. IEEE Access 2020, 8, 53223–53231. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Wang, L.; Zhou, X.; Nie, X.; Lin, X.; Li, J.; Zheng, H.; Xue, E.; Chen, S.; Chen, C.; Du, M.; et al. A multi-scale densely connected convolutional neural network for automated thyroid nodule classification. Front. Neurosci. 2022, 16, 878718. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Wang, Q.; Zhao, J.; Yu, H.; Wang, F.; Zhang, J. Automatic ultrasound diagnosis of thyroid nodules: A combination of deep learning and KWAK TI-RADS. Phys. Med. Biol. 2023, 68, 205021. [Google Scholar] [CrossRef] [PubMed]
- Kwon, S.W.; Choi, I.J.; Kang, J.Y.; Jang, W.I.; Lee, G.H.; Lee, M.C. Ultrasonographic thyroid nodule classification using a deep convolutional neural network with surgical pathology. J. Digit. Imaging 2020, 33, 1202–1208. [Google Scholar] [CrossRef]
- Zhao, Z.; Yang, C.; Wang, Q.; Zhang, H.; Shi, L.; Zhang, Z. A deep learning-based method for detecting and classifying the ultrasound images of suspicious thyroid nodules. Med. Phys. 2021, 48, 7959–7970. [Google Scholar] [CrossRef]
- Han, B.; Zhang, M.; Gao, X.; Wang, Z.; You, F.; Li, H. Automatic classification method of thyroid pathological images using multiple magnification factors. Neurocomputing 2021, 460, 231–242. [Google Scholar] [CrossRef]
- Chatfield, K.; Simonyan, K.; Vedaldi, A.; Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. arXiv 2014, arXiv:1405.3531. [Google Scholar] [CrossRef]
- Liu, Z.; Zhong, S.; Liu, Q.; Xie, C.; Dai, Y.; Peng, C.; Chen, X.; Zou, R. Thyroid nodule recognition using a joint convolutional neural network with information fusion of ultrasound images and radiofrequency data. Eur. Radiol. 2021, 31, 5001–5011. [Google Scholar] [CrossRef]
- Wang, M.; Yuan, C.; Wu, D.; Zeng, Y.; Zhong, S.; Qiu, W. Automatic segmentation and classification of thyroid nodules in ultrasound images with convolutional neural networks. In Segmentation, Classification, and Registration of Multi-modality Medical Imaging Data (MICCAI 2020), Lecture Notes in Computer Science; Shusharina, N., Heinrich, M.P., Huang, R., Eds.; Springer: Cham, Switzerland, 2021; Volume 12587. [Google Scholar]
- Kang, Q.; Lao, Q.; Li, Y.; Jiang, Z.; Qiu, Y.; Zhang, S.; Li, K. Thyroid nodule segmentation and classification in ultrasound images through intra- and inter-task consistent learning. Med. Image Anal. 2022, 79, 102443. [Google Scholar] [CrossRef]
- Zhou, H.; Wang, R.; Zhou, M.; Fu, P.; Bai, Y. A deep learning-based cascade automatic classification system for malignant thyroid nodule recognition in ultrasound image. J. Phys. Conf. Ser. 2022, 2363, 012029. [Google Scholar] [CrossRef]
- Wang, Y.; Gan, J. Benign and malignant classification of thyroid nodules based on ConvNeXt. In Proceedings of the ACM International Conference on Control, Robotics and Intelligent System, Online, 26–28 August 2022; pp. 56–60. [Google Scholar]
- Yang, J.; Shi, X.; Wang, B.; Qiu, W.; Tian, G.; Wang, X.; Wang, P.; Yang, J. Ultrasound image classification of thyroid nodules based on deep learning. Front. Oncol. 2022, 12, 905955. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2019, 128, 336–359. [Google Scholar] [CrossRef]
- Wang, B.; Yuan, F.; Lv, Z.; He, Y.; Chen, Z.; Hu, J.; Yu, J.; Zheng, S.; Liu, H. Hierarchical deep learning networks for classification of ultrasonic thyroid nodules. J. Imaging Sci. Technol. 2022, 66, 040409. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2017, arXiv:1703.06870. [Google Scholar]
- Xing, G.; Miao, Z.; Zheng, Y.; Zhao, M. A multi-task model for reliable classification of thyroid nodules in ultrasound images. Biomed. Eng. Lett. 2024, 14, 187–197. [Google Scholar] [CrossRef] [PubMed]
- Yadav, N.; Dass, R.; Virmani, J. Deep learning-based CAD system design for thyroid tumor characterization using ultrasound images. Multimed. Tools Appl. 2024, 83, 43071–43113. [Google Scholar] [CrossRef]
- Al-Jebrni, A.H.; Ali, S.G.; Li, H.; Lin, X.; Li, P.; Jung, Y.; Kim, J.; Feng, D.D.; Sheng, B.; Jiang, L.; et al. SThy-Net: A feature fusion-enhanced dense-branched modules network for small thyroid nodule classification from ultrasound images. Vis. Comput. 2023, 39, 3675–3689. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Cao, J.; Zhu, Y.; Tian, X.; Wang, J. Tnc-Net: Automatic classification for thyroid nodules lesions using convolutional neural network. IEEE Access 2024, 12, 84567–84578. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision (ECCV), Graz, Austria, 7–13 May 2006; pp. 404–417. [Google Scholar]
- Aboudi, N.; Khachnaoui, H.; Moussa, O.; Khlifa, N. Bilinear pooling for thyroid nodule classification in ultrasound imaging. Arab. J. Sci. Eng. 2023, 48, 10563–10573. [Google Scholar] [CrossRef]
- Göreke, V. A novel deep-learning-based CADx architecture for classification of thyroid nodules using ultrasound images. Interdiscip. Sci. Comput. Life Sci. 2023, 15, 360–373. [Google Scholar] [CrossRef] [PubMed]
- Wang, M.; Chen, C.; Xu, Z.; Xu, L.; Zhan, W.; Xiao, J.; Hou, Y.; Huang, B.; Huang, L.; Li, S. An interpretable two-branch bi-coordinate network based on multi-grained domain knowledge for classification of thyroid nodules in ultrasound images. Med. Image Anal. 2024, 97, 103255. [Google Scholar] [CrossRef]
- Xie, J.; Guo, L.; Zhao, C.; Li, X.; Luo, Y.; Jianwei, L. A hybrid deep learning and handcrafted features based approach for thyroid nodule classification in ultrasound images. J. Phys. Conf. Ser. 2020, 1693, 012160. [Google Scholar] [CrossRef]
- Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Hang, Y. Thyroid nodule classification in ultrasound images by fusion of conventional features and Res-GAN deep features. J. Healthc. Eng. 2021, 2021, 9917538. [Google Scholar] [CrossRef] [PubMed]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar]
- Shankarlal, B.; Sathya, P.D.; Sakthivel, V.P. Computer-aided detection and diagnosis of thyroid nodules using machine and deep learning classification algorithms. IETE J. Res. 2023, 69, 995–1006. [Google Scholar] [CrossRef]
- Swathi, G.; Altalbe, A.; Kumar, R.P. QuCNet: Quantum-inspired convolutional neural networks for optimized thyroid nodule classification. IEEE Access 2024, 12, 27829–27842. [Google Scholar] [CrossRef]
- Duan, X.; Duan, S.; Jiang, P.; Li, R.; Zhang, Y.; Ma, J.; Zhao, H.; Dai, H. An ensemble deep learning architecture for multilabel classification on TI-RADS. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Republic of Korea, 16–19 December 2020; pp. 576–582. [Google Scholar]
- Liu, W.; Lin, C.; Chen, D.; Niu, L.; Zhang, R.; Pi, Z. Shape-margin knowledge augmented network for thyroid nodule segmentation and diagnosis. Comput. Methods Programs Biomed. 2024, 244, 107999. [Google Scholar] [CrossRef] [PubMed]
- Zhao, S.X.; Chen, Y.; Yang, K.F.; Luo, Y.; Ma, B.Y.; Li, Y.J. A local and global feature disentangled network: Toward classification of benign-malignant thyroid nodules from ultrasound image. IEEE Trans. Med. Imaging 2022, 41, 1497–1509. [Google Scholar] [CrossRef] [PubMed]
- Zhao, J.; Zhou, X.; Shi, G.; Xiao, N.; Song, K.; Zhao, J.; Hao, R.; Li, K. Semantic consistency generative adversarial network for cross-modality domain adaptation in ultrasound thyroid nodule classification. Appl. Intell. 2022, 52, 10369–10383. [Google Scholar] [CrossRef] [PubMed]
- Avola, D.; Cinque, L.; Fagioli, A.; Filetti, S.; Grani, G.; Rodola, E. Multimodal feature fusion and knowledge-driven learning via experts consult for thyroid nodule classification. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 2527–2534. [Google Scholar] [CrossRef]
- Deng, P.; Han, X.; Wei, X.; Chang, L. Automatic classification of thyroid nodules in ultrasound images using a multi-task attention network guided by clinical knowledge. Comput. Biol. Med. 2022, 150, 106172. [Google Scholar] [CrossRef]
- Bai, Z.; Chang, L.; Yu, R.; Li, X.; Wei, X.; Yu, M.; Liu, Z.; Gao, J.; Zhu, J.; Zhang, Y.; et al. Thyroid nodules risk stratification through deep learning based on ultrasound images. Med. Phys. 2020, 47, 6355–6365. [Google Scholar] [CrossRef]
- Yang, W.; Dong, Y.; Du, Q.; Qiang, Y.; Wu, K.; Zhao, J.; Yang, X.; Zia, M.B. Integrate domain knowledge in training multi-task cascade deep learning model for benign–malignant thyroid nodule classification on ultrasound images. Eng. Appl. Artif. Intell. 2021, 98, 104064. [Google Scholar] [CrossRef]
- Shi, G.; Wang, J.; Qiang, Y.; Yang, X.; Zhao, J.; Hao, R.; Yang, W.; Du, Q.; Kazihise, N.G.F. Knowledge-guided synthetic medical image adversarial augmentation for ultrasonography thyroid nodule classification. Comput. Methods Programs Biomed. 2020, 196, 105611. [Google Scholar] [CrossRef]
- Iakovidis, D.K.; Savelonas, M.A.; Karkanis, S.A.; Maroulis, D.E. A genetically optimized level set approach to segmentation of thyroid ultrasound images. Appl. Intell. 2007, 27, 193–203. [Google Scholar] [CrossRef]
- Ma, J.; Wu, F.; Jiang, T.; Zhu, J.; Kong, D. Cascade convolutional neural networks for automatic detection of thyroid nodules in ultrasound images. Med. Phys. 2017, 44, 1678–1691. [Google Scholar] [CrossRef]
- Molnár, K.; Kálmán, E.; Hári, Z.; Giyab, O.; Gáspár, T.; Rucz, K.; Bogner, P.; Tóth, A. False-positive malignant diagnosis of nodule mimicking lesions by computer-aided thyroid nodule analysis in clinical ultrasonography practice. Diagnostics 2020, 10, 378. [Google Scholar] [CrossRef] [PubMed]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. SWIN Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Online, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Pedraza, L.; Vargas, C.; Narváez, F.; Durán, O.; Muñoz, E.A.; Romero, E. An open access thyroid ultrasound image database. In Proceedings of the International Symposium on Medical Information Processing and Analysis, Cartagena de Indias, Colombia, 14–16 October 2015; pp. 1–6. [Google Scholar]
- Gong, H.; Chen, J.; Chen, G.; Li, H.; Li, G.; Chen, F. Thyroid region prior guided attention for ultrasound segmentation of thyroid nodules. Comput. Biol. Med. 2022, 155, 106389. [Google Scholar] [CrossRef] [PubMed]
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. In Proceedings of the Machine Learning Systems (MLSys) Conference, Austin, TX, USA, 2–4 March 2020. [Google Scholar]
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning (ICML), Cambridge, MA, USA, 13–18 July 2020; pp. 5132–5143. [Google Scholar]
Reference | Method | Strengths | Weaknesses | Description |
---|---|---|---|---|
Savelonas et al. (2007) [99] | Radon transform, SVM, k-NN | Physically interpretable hypothesis: different tissue types exhibit patterns of distinct directionality, limited computational complexity/cost | Standalone Radon features provide only partial texture representation, no end-to-end optimization (see Section 4.1), empirical parameter adjustment (see Section 4.1) | Differentiates thyroid tissue based on directionality patterns, encoded using Radon transform features and classified with k-NN and SVM. |
Seabra et al. (2009) [100] | Radon transform, wavelet transform, Rayleigh-based, PCA, k-NN, MAP | Leverages complementary descriptors, limited computational complexity/cost | No end-to-end optimization (see Section 4.1), empirical parameter adjustment (see Section 4.1) | A multi-descriptor scheme that combines Radon transform features with Rayleigh parameters, mid-distance term, and wavelet coefficients, using PCA for dimensionality reduction before classification. |
Savelonas et al. (2009) [101] | Boundary features, SVM, k-NN | Aligned with TIRADS malignancy risk stratification, distinguishes medium-risk and high-risk nodules, limited computational complexity/cost | Standalone boundary features cannot capture tissue texture (see Section 4.1) | A malignancy risk assessment method based on nodule boundary features. It uses features that correlate with TIRADS descriptors, like boundary irregularity and local echogenicity variance. |
Legakis et al. (2011) [102] | Active contour, boundary and textural features, SVM | Unified for full workflow (detection, segmentation, assessment), combines boundary and textural features | No end-to-end optimization (see Section 4.1), empirical parameter adjustment (see Section 4.1) | A multi-stage pipeline that integrates processes for thyroid gland boundary detection, nodule detection and delineation (using active contours), and malignancy risk assessment. |
Iakovidis et al. (2010) [103] | Fuzzy local binary patterns, fuzzy intensity histograms, SVM | Noise robustness, limited computational complexity/cost | No end-to-end optimization (see Section 4.1), empirical parameter adjustment (see Section 4.1) | A texture-based method that represents sonographic features, using a noise-resistant combination of fuzzy local binary patterns (LBP) for texture and fuzzy intensity histograms for echogenicity. |
Katsigiannis et al. (2010) [104] | Contourlet transform, SFFS | Contourlet transform captures multidirectional and multiscale texture patterns, noise robustness | No end-to-end optimization (see Section 4.1), empirical parameter adjustment (see Section 4.1) | A texture representation scheme based on the Contourlet transform. It uses thresholding to select the most significant transform coefficients, followed by sequential forward feature selection (SFFS).
Keramidas et al. (2012) [105] | Fuzzy local binary patterns, fuzzy intensity histograms, SVM | Addresses both ROI detection and assessment, noise robustness, limited computational complexity/cost | No end-to-end optimization (see Section 4.1), empirical parameter adjustment (see Section 4.1) | Introduces the thyroid nodule detection (TND) system, which combines automatic rough region-of-interest (ROI) extraction via unsupervised gland boundary detection with a fuzzy approach that represents US patterns by means of fuzzy LBP and fuzzy gray-level histograms. |
Kale et al. (2018, 2013) [106,107], Vadhiraj et al. (2021) [108] | GLCM, two-layer feed-forward neural network | Limited computational complexity/cost, SVM variants ([107,108]) are less prone to overfitting | No end-to-end optimization (see Section 4.1), GLCM is sensitive to noise (see Section 4.1) | This series of works uses gray-level co-occurrence matrix (GLCM) features for nodule assessment, exploring classification with both feed-forward neural networks and SVMs.
Nanda et al. (2018) [109] | GLCM, ReliefF, SVM | Captures directional and multiscale information, addresses overfitting (via ReliefF and SVM) | No end-to-end optimization (see Section 4.1), GLCM is sensitive to noise (see Section 4.1) | Combines GLCM with textural features derived from a steerable pyramid decomposition, which captures information at different scales and orientations. It uses ReliefF [110] for feature selection. |
Acharya et al. (2016) [111] | Gabor transform, ReliefF, various classifiers | Captures directional and frequency-based information (via Gabor), addresses dataset imbalance, addresses overfitting (via ReliefF and SVM) | No end-to-end optimization (see Section 4.1), computational complexity/cost of Gabor transform/entropy calculations (for a non-DL-based method) | An automated method that extracts entropy features from a Gabor transform of high-resolution US images. It uses locality sensitive discriminant analysis (LSDA) for feature reduction and over-sampling strategies to handle data imbalance. |
Raghavendra et al. (2017) [112] | SGLDF/fractal features | Captures spatial dependencies, self-similar patterns, and irregularities, can be integrated with various classifiers | No end-to-end optimization (see Section 4.1), boundary features may be overlooked | Fuses spatial gray-level dependence features (SGLDF) with fractal texture features to represent the intrinsic structure of thyroid lesions, using graph-based marginal Fisher analysis (MFA) for feature reduction.
Xia et al. (2017) [113] | ELM, ReliefF | Low computational complexity/cost | Sensitivity to initialization/hyperparameter adjustment (see Section 4.1), lower capacity than DL-based methods (see Section 4.1) | Employs an extreme learning machine (ELM) [114], an efficient algorithm for single-hidden-layer feedforward networks, to discriminate between malignant and benign nodules. |
Prochazka et al. (2019) [115] | Fractal features, SVM, random forests | Captures self-similar patterns and irregularities, robust to probe orientation and inclination | Requires manual segmentation (see Section 4.1), small dataset size (see Section 4.4) | A fractal-based method that uses histogram analysis and segmentation-based fractal texture analysis to extract direction-independent features, aiming for invariance to the US probe’s orientation. |
Raghavendra et al. (2018) [116] | Elongated quinary patterns, PSO, SVM | Captures local directional textural information, addresses overfitting (with PSO/SVM) | Depends on preprocessing, difficult to interpret (see Section 4.2) | Uses optimized multi-level elongated quinary patterns to extract higher-order spectral (HOS) entropy features for classification. |
Gomes Ataide et al. (2020) [117] | Random forest, geometrical/morphological features | Clinically interpretable features aligned with TIRADS, low computational complexity/cost | Textural/echogenicity features are not considered | Extracts a comprehensive set of 27 geometric and morphological features and selects a subset of 11 features that are evaluated as significant according to the TIRADS framework. |
Barzegar-Golmoghani et al. (2025) [118] | SVM, random forest, SHAP, PDP | Clinically interpretable features aligned with TIRADS | Requires US elastography | Introduces ELTIRADS, a multimodal framework that integrates US elastography data with standard TIRADS scores and radiomic features. It employs Shapley additive explanations (SHAP) and partial dependence plots (PDP) for enhanced interpretability. |
Chi et al. (2017) [119] | GoogLeNet, random forest | Addresses dataset imbalance | No end-to-end optimization (see Section 4.1), limited interpretability (see Section 4.2) | An early hybrid, CNN-based method that fine-tunes a pre-trained GoogLeNet [120] model and uses the extracted deep features in a cost-sensitive random forest classifier. |
Liang et al. (2020) [121] | CNN | Data-driven feature extraction | No end-to-end optimization (see Section 4.1), limited interpretability (see Section 4.2) | A similar early hybrid, CNN-based method that also uses a pre-trained model to extract features, which are then used to train a separate classifier for the final assessment. |
Zhu et al. (2017) [122] | ResNet18 | Data-driven feature extraction | Limited interpretability (see Section 4.2) | Fine-tunes a ResNet18 architecture using transfer learning and a CNN-based data augmentation strategy to generate new training samples. |
Song et al. (2020) [123] | CNN | Multi-branch network, captures both local and global information | Computational complexity/cost (see Section 4.3), limited interpretability (see Section 4.2) | Introduces a hybrid multi-branch CNN. It combines a global branch for overall classification with a feature cropping branch that performs multi-cropping on feature maps to focus on local discriminative details. |
Zhang et al. (2020) [124] | ResNet50 | Clinically interpretable features aligned with TIRADS | Computational complexity/cost (see Section 4.3) | Develops an interpretable CAD system (iCADs) comprising ResNet50 [90] trained on ImageNet [125] for benign/malignant classification and an ensemble of six specialized sub-networks, each trained to assess a specific TIRADS feature. |
Wang et al. (2022) [126] | CNN, HAC | Captures multiscale information, enhanced contextual awareness | Computational complexity/cost (see Section 4.3) | Introduces n-ClsNet, a CNN model featuring a multiscale classification layer, multiple skip blocks, and a hybrid atrous convolution (HAC) block to enhance the learning of spatial information at different scales. |
Zhang et al. (2023) [127] | CNN | Multilabel output, aligned with TIRADS | Computational complexity/cost (see Section 4.3) | A multi-label, two-stage pipeline. It first uses a U-Net++ variant for segmentation, followed by a multi-task CNN that classifies benign/malignant status and four specific malignant features aligned with TIRADS.
Wang et al. (2020) [97] | VGG16 | Combines low-level textural information with high-level semantic information | VGG16 lacks residual connections, no end-to-end optimization (see Section 4.1) | A method that combines high-throughput statistical feature extraction with a fine-tuned, pre-trained VGG16 model, using linear discriminant analysis (LDA) for dimensionality reduction. |
Kwon et al. (2020) [128] | VGG16 | Addresses overfitting | VGG16 lacks residual connections, limited interpretability (see Section 4.2) | Employs a pre-trained VGG16 model with transfer learning, using data augmentation and global average pooling to mitigate overfitting. |
Zhao et al. (2021) [129] | ResNet50, CBAM | End-to-end optimization, captures multiscale information | Small-sized dataset used, sensitive to dataset imbalance, annotations were performed by radiologists of various levels of experience (see Section 4.4) | A two-stage framework. The first stage uses a single-shot detection (SSD) network for nodule detection, and the second stage employs a convolution-based attention module (CBAM) to aid a ResNet50 in focusing on relevant features for classification. |
Han et al. (2021) [130] | VGG-f | Inspired by radiologists’ cognitive process, captures multiscale information, copes with annotation effort by means of AL | Complex pipeline, VGG-f is outdated when compared to more recent backbones (e.g., ResNet, EfficientNet) | An attention-driven framework inspired by the radiologist’s cognitive process (“zoom-in”). It uses VGG-f [131] for low-magnification suspicious region detection, as well as for high-magnification patch classification. |
Liu et al. (2021) [132] | Two-branched CNN | Uses complementary, bimodal information, end-to-end optimization | Requires RF signal data, which are often not accessible | An information fusion-based joint CNN (IF-JCNN) that evaluates both US images and radiofrequency (RF) signals. It uses a two-branched CNN to extract features from each modality, which are fused at the backend. |
Wang et al. (2021) [133] | Dual-attention ResNet variant | End-to-end optimization | Computational complexity/cost (see Section 4.3) | Introduces a cascaded dual-attention ResNet-based network that integrates segmentation and classification in a single pipeline, aiming for efficient and automatic end-to-end analysis. |
Kang et al. (2022) [134] | U-Net | End-to-end optimization, consistency constraints address task conflict | Computational complexity/cost (see Section 4.3), requires annotations for multiple tasks | A multi-task learning framework that jointly addresses segmentation and classification. It introduces intra- and inter-task consistent learning to enforce consistent predictions among different but related tasks. |
Zhou et al. (2022) [135] | Res-U-Net, CVAE | Addresses small dataset size and categorical imbalance, aligned with TIRADS | No end-to-end optimization (see Section 4.1), computational complexity/cost (see Section 4.3) | Introduces a cascaded automatic classification system (CACS), which uses a Res-U-Net for ROI extraction and a conditional variational autoencoder (CVAE) to generate synthetic US images for data augmentation before final classification. |
Wang et al. (2022) [136] | ConvNeXt | End-to-end optimization | Interpretability (see Section 4.2) | Benign/malignant classification based on transfer learning using a pre-trained ConvNeXt [71]. |
Yang et al. (2022) [137] | ResNet18, Grad-CAM | Limited computational complexity/cost, interpretability | ResNet18 may underperform when compared to deeper networks | Employs ResNet18 with transfer learning and uses gradient-weighted class activation mapping (Grad-CAM) [138] to visualize the model’s attention.
Wang et al. (2022) [139] | Mask-RCNN | End-to-end optimization | Computational complexity/cost (see Section 4.3) | A mask-guided hierarchical DL (MHDL) framework. It first uses a mask R-CNN [140] to locate the nodule ROI, then a residual attention network to extract depth features, and finally an attention drop-based CNN for classification. |
Xing et al. (2024) [141] | CNN, ASPP | End-to-end optimization, captures contextual information | Biased dataset (see Section 4.4), complex architecture | A multi-task framework that combines dense connectivity, squeeze-and-excitation (SE) connectivity, and an atrous spatial pyramid pooling (ASPP) layer to enhance feature extraction and optimize feature reuse. |
Yadav et al. (2024) [142] | CNN, PCA-SVM | Direct, comparative benchmark of various CNN backbones for feature extraction | No end-to-end optimization (see Section 4.1) | A wide range of pre-trained CNNs (e.g., AlexNet, VGG, GoogLeNet, DenseNet, ResNet) are compared as feature extractors. The extracted deep features are classified by a PCA-SVM pipeline. |
Al-Jebrni et al. (2023) [143] | Inception v3/ResNet-inspired | End-to-end optimization, addresses small nodules, captures multiscale information | Computational complexity/cost (see Section 4.3), dataset is small/biased (see Section 4.4) | Introduces SThy-Net, a network explicitly designed for small thyroid nodules. It is inspired by Inception-v3 [144] and features a dense-branched module and a Gaussian-enhanced feature fusion module. |
Li et al. (2023) [84] | CNN/ViT | Captures multiscale information and nodule shape features | Small-sized dataset (see Section 4.4) | A DL-based CAD method (TCNet) that fuses a large kernel CNN branch (for shape features) with an enhanced ViT branch (for remote connections) via a multiscale fusion module (MFM). |
Cao et al. (2024) [145] | SP-LeNet, ConvNeXt | Addresses small data volume and categorical imbalance, captures both contextual and fine-grained information | Computational complexity/cost (see Section 4.3), not aligned with TIRADS (see Section 4.2) | Introduces Tnc-Net, which consists of three parts: an SP-LeNet backbone for global features on small datasets, a ConvNeXt branch for local features, and a feature aggregation module. |
Wu et al. (2024) [3] | YOLO, BiLSTM | Uses US video, exploits spatio-temporal information, inspired by radiologists’ cognitive processes | Computational complexity/cost (see Section 4.3), depends on US video availability | A framework for thyroid nodule assessment in US video. It is based on a “you-only-look-once” (YOLO) architecture and an attention-driven bidirectional LSTM (BiLSTM) to collect spatio-temporal information between video frames. |
Aboudi et al. (2023) [148] | CNN, SVM | Rich feature representation | Computational complexity/cost (see Section 4.3), small-sized dataset (see Section 4.4), interpretability (see Section 4.2) | Investigates bilinear models, which fuse the outputs of two different CNNs (e.g., VGG19 and ResNet50) via outer products; the fused features are then classified by a linear SVM.
Göreke (2023) [149] | VGG16, SVM | Integrates domain knowledge, captures both textural and morphological features | No end-to-end optimization (see Section 4.1) | A hybrid method, where the final feature vector used by an SVM is formed by combining textural and morphological features that have been extracted by a pre-trained VGG16 network. |
Wang et al. (2024) [150] | CNN, FPN | Integrates domain knowledge, aligned with TIRADS, interpretability | Computational complexity/cost (see Section 4.3), depends on a large amount of expert annotations (see Section 4.4) | Introduces multi-grained domain knowledge representation (MG-DKR) to convert image features into classification labels and masks. A two-branch bi-coordinate network (TB2C-Net) learns from this representation. |
Xie et al. (2020) [151] | Two-branched ResNet, LBP | LBP induces inductive bias, aiding in cases of small-sized datasets | Computational complexity/cost (see Section 4.3), risk of feature redundancy | A hybrid method that combines DL with handcrafted features. It uses a two-stream deep neural network that processes both the source US image and its corresponding LBP feature map. |
Hang (2021) [153] | Res-GAN, LBP, SIFT, SURF, random forest | Handcrafted features induce inductive bias, aiding in cases of small-sized datasets | Computational complexity/cost (see Section 4.3), risk of feature redundancy | A hybrid method that fuses traditional handcrafted features (histogram-of-gradients (HoG) [154], LBP, SIFT, SURF) with deep features extracted from a residual GAN (Res-GAN). |
Shankarlal et al. (2023) [155] | Kirch’s edge detector, DTCT, CANFES, CNN | DTCT captures multiscale features, CANFES aids interpretability | No end-to-end optimization (see Section 4.1), interpretability (see Section 4.2) | A hybrid CAD method that uses Kirsch’s edge detector and a dual-tree contourlet transform (DTCT) for feature extraction. The extracted features are used by a co-active adaptive neuro-fuzzy expert system (CANFES). |
Swathi et al. (2024) [156] | Quanvolutional filter, CNN | Captures multiscale information | Interpretability (see Section 4.2) | Introduces QuCNet, a quantum-inspired CNN. It combines features extracted by a “quanvolutional” filter, which uses equations from random quantum circuits, with a typical CNN for classification. |
Duan et al. (2020) [157] | EfficientNet, FPN | Granular, multilabel classification, aligned with TIRADS, interpretability | No end-to-end optimization (see Section 4.1), computational complexity/cost (see Section 4.3) | An ensemble architecture for multi-class TIRADS assessment that integrates three distinct feature extraction modules: EfficientNet, handcrafted geometric/textural features, and a multi-scale feature pyramid network (FPN). |
Liu et al. (2024) [158] | CNN, ViT | Captures multiscale information, interpretability via CAM heatmaps | Computational complexity/cost (see Section 4.3), prior knowledge is limited to shape/margin | Introduces SKA-Net, a shape-margin knowledge augmented network. It is a dual-branch architecture for joint segmentation and diagnosis that uses an exponential mixture module to combine convolutional and self-attention feature maps. |
Zhao et al. (2022) [159] | CNN | Disentangles tissue and anatomical information, aiding interpretability, aligned with radiologists’ cognitive process, end-to-end optimization | Computational complexity/cost (see Section 4.3) | Introduces LoGo-Net, a local and global feature disentangled network inspired by human vision. It uses a tissue-anatomy disentangled (TAD) block to decouple local tissue features from global anatomical features. |
Shahroudnejad et al. (2021) [4] | U-Net variant | Aligned with TIRADS | Requires manually extracted ROI, struggles with heterogeneous nodules (see Section 4.1) | Analyzes videos and incorporates two rule-based classifiers (for composition and echogenicity) that are directly aligned with TIRADS classifications, in addition to the main DL model. |
Zhao et al. (2022) [160] | SCGAN | Augments US image information with textual information, addresses domain shifts with semantic consistency loss | Computational complexity/cost (see Section 4.3), depends on the availability of paired text–US image data (see Section 4.4) | Introduces SCGAN, a deeply fused semantic consistency GAN, which incorporates domain knowledge from accompanying text reports using self-attention and metric learning.
Avola et al. (2022) [161] | LBP, DWT, CNN variants (DenseNet, AlexNet, ResNet, VGG) ensemble | Handcrafted features and the use of a network ensemble aid training | Computational complexity/cost (see Section 4.3), interpretability (see Section 4.2) | A CAD method that leverages cues from an ensemble of expert networks (pre-trained on ImageNet) to guide the learning phase of a final DenseNet classifier, reducing the number of samples required for training. |
Deng et al. (2022) [162] | ResNet50 variant | ACR TIRADS aligned, multitask learning aids generalization | Computational complexity/cost (see Section 4.3), interpretability (see Section 4.2), limited stratification of echogenic foci, requires ACR TIRADS annotations (see Section 4.2) | A multi-task branching attention network that incorporates domain knowledge by explicitly classifying each of the ACR TIRADS descriptors. It uses a variant of ResNet50 as its feature extraction backbone. |
Bai et al. (2020) [163] | CNN | ACR TIRADS aligned, incorporates radiologists’ cognitive process | Requires ACR TIRADS annotations (see Section 4.2), focuses on boundary features while ignoring features within the nodule area | Introduces the risk stratification network (RS-Net), an automatic hierarchical method that combines a CNN with domain knowledge aligned with ACR TIRADS in order to perform risk stratification.
Yang et al. (2021) [164] | U-Net, DScGAN, S3VM | Addresses limitations in the availability of annotated data, integrates domain knowledge | Computational complexity/cost (see Section 4.3), depends on the availability of pairs of B-mode and elastography images | A multi-task cascade DL model (MCDLM) that integrates radiologists’ domain knowledge (DK), using U-Net to transfer boundary information that assists a dual-path semi-supervised GAN (DScGAN) in classification.
Shi et al. (2020) [165] | CNN, LSTM, ACGAN | Integrates domain knowledge via term encoder, addresses data scarcity and dataset imbalance, aligned with TIRADS | Depends on the availability of pairs of radiologists’ reports and US images (see Section 4.4) | A knowledge-guided adversarial augmentation method. It employs term and image encoders in order to exploit radiologists’ domain knowledge in the form of conditions to constrain the auxiliary classifier GAN (ACGAN) framework for the synthesis of thyroid nodule images. |
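To ground the traditional, feature-based rows of the table above, the sketch below chains GLCM texture descriptors with an SVM classifier, in the spirit of the GLCM-based entries [106,107,108]. It is a minimal illustration under stated assumptions rather than any cited implementation: the ROI size, GLCM distances and angles, and the placeholder data are all hypothetical.

```python
# Minimal GLCM + SVM sketch; parameters and data are illustrative placeholders.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def glcm_features(roi_u8):
    """Co-occurrence statistics from an 8-bit nodule ROI."""
    glcm = graycomatrix(roi_u8, distances=[1, 2],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

rng = np.random.default_rng(0)
rois = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(40)]  # placeholder ROIs
labels = rng.integers(0, 2, 40)                                             # 0 = benign, 1 = malignant

X = np.vstack([glcm_features(r) for r in rois])
clf = SVC(kernel="rbf", class_weight="balanced")      # SVM stage, as in the SVM variants
print(cross_val_score(clf, X, labels, cv=5).mean())   # cross-validated accuracy
```

Many of the DL rows instead follow a transfer-learning recipe: start from an ImageNet-pre-trained backbone and replace the classification head. A correspondingly minimal sketch, in which the frozen backbone and two-class head are assumptions rather than a specific cited configuration:

```python
# Hedged transfer-learning sketch for benign/malignant classification.
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():                     # freeze the pre-trained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)    # new benign/malignant head (trainable)
```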
Architecture | Task | Parameters (M) | FPS ** | VRAM (GB) *** |
---|---|---|---|---|
VGG16 [42] | Classification | 138 | 60 | 3.0 |
ResNet50 [90] | Classification | 25.6 | 120 | 2.5 |
EfficientNet-B0 [67] | Classification | 5.3 | 150 | 1.5 |
U-Net [34] **** | Segmentation | 31 | 90 | 3.0 |
DeepLab-v3+ [45] | Segmentation | 41 | 80 | 3.5 |
ViT Base [169] ***** | Classification | 86 | 75 | 4.0 |
SWIN Tiny [170] ***** | Classification | 29 | 100 | 2.8 |
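Throughput and memory figures such as those above depend heavily on hardware, input resolution, and batch size, so they should be read as indicative rather than absolute. The sketch below shows one common measurement protocol in PyTorch; the warm-up count, 224 × 224 input, choice of ResNet50, and the availability of a CUDA GPU are all assumptions.

```python
# Illustrative benchmark of parameter count, FPS, and peak VRAM (PyTorch, single GPU).
import time
import torch
import torchvision.models as models

model = models.resnet50(weights=None).cuda().eval()   # e.g., the ResNet50 row above
x = torch.randn(1, 3, 224, 224, device="cuda")        # one resized B-mode frame

print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f} M")

with torch.no_grad():
    for _ in range(10):                               # warm-up before timing
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
    print(f"FPS: {100 / (time.time() - t0):.0f}")

print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```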
Characteristic | DDTI (Digital Database for Thyroid Images) | TN3K (Thyroid Nodule 3K) |
---|---|---|
Reference | Pedraza et al. [171] | Gong et al. [172] |
Size | 99 patients, 416 images | >3500 patients, >4500 images |
Classes | Benign, malignant, normal tissue | Benign, malignant; TIRADS categories (2–5) |
Imaging Modality | B-mode US | B-mode US |
Image Resolution | 576 × 768 pixels | Varies |
Annotation Type | Nodule boundary masks | Nodule boundary masks |
Licensing | Open access for research | Open access for research |
Typical Use | Foundational segmentation and classification tasks | Large-scale DL model training, TIRADS-aligned tasks |
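In practice, the first step with either dataset is pairing each B-mode image with its nodule boundary mask. The sketch below assumes a hypothetical layout with parallel images/ and masks/ folders and matching file names; the actual DDTI and TN3K releases each ship with their own structure, so the paths must be adapted.

```python
# Hedged data-loading sketch; the "images/"/"masks/" layout is an assumption.
from pathlib import Path
import numpy as np
from PIL import Image

def load_pairs(root, size=(256, 256)):
    """Yield (image, mask) arrays resized to a common shape."""
    root = Path(root)
    for img_path in sorted((root / "images").glob("*.png")):
        mask_path = root / "masks" / img_path.name   # assumes matching file names
        img = np.asarray(Image.open(img_path).convert("L").resize(size),
                         dtype=np.float32) / 255.0   # normalize intensities to [0, 1]
        mask = np.asarray(Image.open(mask_path).convert("L").resize(size)) > 127
        yield img, mask

# e.g., pairs = list(load_pairs("TN3K/trainval"))  # hypothetical local path
```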
Reference | Method | Task | Dataset | IoU | Dice | Accuracy * | Precision * | Recall * |
---|---|---|---|---|---|---|---|---|
Wang et al. (2023) [36] | CNN (DPAM-PSPNet) | Segmentation | DDTI | 0.69 | 0.80 | 0.95 | - | - |
Wang et al. (2023) [36] | CNN (DPAM-PSPNet) | Segmentation | TN3K | 0.75 | 0.85 | 0.97 | - | - |
Shao et al. (2023) [54] | U-Net3+ variant | Segmentation | DDTI | 0.60 | 0.75 | - | - | -
Bouhdiba et al. (2024) [68] | U-Net and RefineUNet | Segmentation | TN3K | 0.96 | 0.81 | 0.94 | 0.88 | 0.82 |
Liu et al. (2024) [70] | U-Net and ConvNeXt | Segmentation | DDTI | 0.69 | 0.81 | - | 0.81 | 0.84 |
Liu et al. (2024) [70] | U-Net and ConvNeXt | Segmentation | TN3K | 0.79 | 0.87 | - | 0.89 | 0.87 |
Yang et al. (2024) [73] | U-Net variant | Segmentation | DDTI | 0.74 | 0.84 | 0.96 | 0.86 | - |
Yang et al. (2024) [73] | U-Net variant | Segmentation | TN3K | 0.78 | 0.87 | 0.98 | 0.85 | - |
Xiao et al. (2025) [74] | DeepLab v3+, EfficientNet-B7 | Segmentation | DDTI | 0.75 | 0.85 | 0.96 | 0.85 | 0.88 |
Ali et al. (2024) [78] | U-Net-inspired encoder–decoder | Segmentation | DDTI | 0.76 | 0.87 | 0.96 | 0.87 | 0.88 |
Ali et al. (2024) [78] | U-Net-inspired encoder–decoder | Segmentation | TN3K | 0.69 | 0.82 | - | - | - |
Sun et al. (2024) [79] | U-Net variant | Segmentation | DDTI | 0.79 | 0.74 | 0.97 | 0.76 | 0.75 |
Wu et al. (2025) [80] | SWIN ViT | Segmentation | DDTI | 0.68 | 0.79 | - | - | - |
Wu et al. (2025) [80] | SWIN ViT | Segmentation | TN3K | 0.73 | 0.82 | - | - | - |
Ma et al. (2023) [81] | SWIN ViT | Segmentation | DDTI | 0.61 | 0.75 | - | 0.76 | - |
Ma et al. (2023) [81] | SWIN ViT | Segmentation | TN3K | 0.72 | 0.84 | - | 0.88 | - |
Ozcan et al. (2024) [85] | U-Net, ViT | Segmentation | DDTI | 0.91 | 0.95 | - | - | - |
Ozcan et al. (2024) [85] | U-Net, ViT | Segmentation | TN3K | 0.71 | 0.83 | - | - | - |
Li et al. (2023) [86] | CNN (DLA-34), level-set | Segmentation | DDTI | 0.69 | 0.82 | - | - | - |
Song et al. (2020) [123] | CNN | Assessment | DDTI | - | - | 0.96 | 0.93 | 0.97 |
Swathi et al. (2024) [156] | Quanvolutional filter, CNN | Assessment | DDTI | - | - | 0.94 | 0.95 | 0.94 |
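For reference, the overlap metrics reported above can be computed as in the sketch below. Because Dice = 2·IoU/(1 + IoU), Dice never falls below IoU for the same prediction, a monotone relation that serves as a useful sanity check when comparing published segmentation scores.

```python
# Reference implementation of IoU and Dice for binary masks.
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice similarity coefficient of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True   # toy 4x4 prediction
gt = np.zeros((8, 8), bool);   gt[3:7, 3:7] = True     # toy 4x4 ground truth
print(iou(pred, gt), dice(pred, gt))                   # 9/23 ≈ 0.391, 18/32 ≈ 0.563
```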