Article

Development and Preliminary Evaluation of an EfficientNet-Based Deep Learning System for Ultrasound Assessment of Neck Disorders: A Single-Center Study

1 Department of Mechatronics and Biomedical Engineering, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, Bandar Sungai Long, Kajang 43000, Selangor, Malaysia
2 Center for Cancer Research, M. Kandiah Faculty of Medicine and Health Sciences, Universiti Tunku Abdul Rahman, Bandar Sungai Long, Kajang 43000, Selangor, Malaysia
3 HAN Neuro Acupuncture & Herbal Specialists, Cyberjaya 63000, Selangor, Malaysia
* Author to whom correspondence should be addressed.
Diagnostics 2026, 16(5), 728; https://doi.org/10.3390/diagnostics16050728
Submission received: 6 October 2025 / Revised: 3 February 2026 / Accepted: 10 February 2026 / Published: 1 March 2026

Abstract

Background/Objectives: Neck disorders encompass a range of conditions that impair a person’s quality of life. Traditional diagnostic methods, such as physical tests and imaging techniques, rely heavily on clinician expertise, which can introduce variability between assessments. Although ultrasound imaging is commonly used, the application of machine learning models to assess neck disorders, particularly fascial abnormalities, remains limited. This study aims to fill this gap by developing a machine learning model that uses ultrasound images to provide accurate and efficient support for diagnosing neck disorders. Methods: Because labeled ultrasound data for neck disorders are scarce, developing robust and generalizable models remains a challenge. In this study, a neck disorder assessment system was developed from ultrasound images of 184 patients using various machine learning algorithms. To address data scarcity and improve generalizability, an approach based on EfficientNet with transfer learning was introduced and evaluated on a held-out test set that was not used for training or validation. The model was trained using 5-fold cross-validation with class-specific weights and AdamW as the optimizer. Results: The results showed promising performance: deep fascia fuzzy texture and deep fascia and myofascial adhesion in the lower cervical region achieved the highest weighted average F1-scores, 76% and 81%, respectively. The corresponding macro averages were similar, at 74% and 78%, indicating consistent class-wise performance for these features. Conclusions: The proposed model demonstrated robust classification performance for neck disorder assessment, particularly in the lower cervical region. This approach has the potential to support clinical decision-making by providing consistent, efficient, and accurate diagnostic assistance. Further refinement and validation across diverse clinical settings will be critical to enhance its real-world applicability.

1. Introduction

Neck disorders encompass a wide range of conditions, including muscular and musculoskeletal neck pain. These disorders can cause discomfort, pain, and restricted neck motion. Neck disorders have several definitions based on anatomical location and other considerations [1]. As technology evolves, people are increasingly dependent on electronic devices and adopt improper postures that can cause varying degrees of neck damage [2,3,4]. By 2050, the global number of neck pain cases is expected to reach 269 million, a 32.5% increase compared with 2020 [5]. This growing prevalence underscores the need for innovative solutions that support the timely and accurate diagnosis of neck disorders.
As a result, multiple techniques have been introduced to diagnose these disorders accurately, including physical tests, ultrasound imaging, computed tomography (CT), and magnetic resonance imaging (MRI). For physical tests, the cervical range of motion (CROM) device can be used to assess patients’ muscle capability relative to their age and body indices [6]. Because physical test results depend on a patient’s pain tolerance, radiological imaging has transformed diagnosis. Ultrasound imaging is increasingly popular because it is non-invasive and provides real-time visualization of soft tissues and structures in the neck region, making it useful for detecting a patient’s fascial characteristics [7]. Fascia screening provides useful information on neck disorders, such as deep fascia discontinuity and adhesion [8]. However, clinicians require extensive training to interpret the images and grade the severity of a neck disorder. While some studies have applied machine learning and deep learning approaches to related musculoskeletal ultrasound tasks, such as fascia segmentation using deep networks [9] and texture-based classification of myofascial trigger points [10], no prior work has specifically focused on models for ultrasound-based neck fascia analysis. Hence, this study aimed to develop machine learning algorithms for assessing neck disorder severity from ultrasound images, which could help clinicians perform preliminary analyses before integrating the results with other information to make informed decisions.
Casaletto et al. [11] demonstrated the efficacy of ultrasound for visualizing neck nerves in clinical diagnosis, providing a foundation for understanding the clinical utility of ultrasound in nerve imaging. In addition, techniques such as transfer learning, which adapts pretrained models to smaller, specialized datasets [12], have been explored in ultrasound research to enhance diagnostic accuracy in tumor detection. For example, Cheng and Malhi [13] evaluated transfer learning with CaffeNet and VGGNet on an abdominal ultrasound dataset, achieving a highest test accuracy of 77.9% compared with 71.7% for the radiologist. Similarly, Gu et al. [14] proposed a neural network-based classification method for medical ultrasound image processing, emphasizing the role of convolutional architectures and transfer learning in improving diagnostic accuracy for thyroid nodule detection. Saha and Sheikh [15] used data augmentation and an Auxiliary Classifier Generative Adversarial Network (ACGAN) to classify breast ultrasound images, comprising 150 malignant and 100 benign images. Despite the imbalanced dataset, the authors achieved 98.8% accuracy, compared with 96.4% for VGG19.
EfficientNet is a family of deep neural networks that uses compound scaling, which jointly scales the model’s width, depth, and input resolution to maximize performance while minimizing computational cost [16]. EfficientNet V2, an enhanced version introduced by Tan and Le [17], improves on the original model with faster training and better parameter efficiency. Marques et al. [18] developed an automated medical diagnosis system for coronavirus disease (COVID-19) using CT X-ray lung images, achieving average F1-scores of 97.11% for multiclass classification and 99.62% for binary classification.
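For reference, the compound scaling rule from the original EfficientNet paper [16] can be summarized as follows. Here φ is a user-chosen compound coefficient, d, w, and r are the multipliers for depth, width, and input resolution, and α, β, γ are constants found by a small grid search on the baseline network (Tan and Le reported α = 1.2, β = 1.1, γ = 1.15 for EfficientNet-B0). This restates the published rule; it is not a step specific to the present study.

$$d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}$$
$$\text{subject to}\quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1$$

Because the FLOPS of a convolutional network scale roughly with d · w² · r², increasing φ by one approximately doubles the computational cost.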
This study addresses the challenge of automating neck disorder assessment from ultrasound images by applying transfer learning. A curated dataset of ultrasound images from 184 patients was developed, focusing on muscular neck conditions with varied texture patterns, an area with limited representation in existing datasets. Unlike images used for tumor detection, these images present a distinct challenge because of their varied textures and the absence of tumor-specific features. This sets them apart from the more homogeneous, tumor-focused datasets and makes accurate classification more complex. EfficientNet, a high-performing convolutional neural network (CNN) architecture, was adapted to this domain to classify ultrasound features associated with neck disorders. The study demonstrates the feasibility of applying deep learning to ultrasound analysis of neck fascia and provides a foundation for future research on clinical decision support systems for neck disorder diagnosis. In particular, this work represents the first application of an EfficientNet-based deep learning model to fascia-texture classification in ultrasound images, an area of musculoskeletal assessment that has been largely unexplored in prior literature.

2. Materials and Methods

The proposed solution consists of several stages, including data preprocessing and classification. The classification models were developed on two platforms. The first was a desktop computer equipped with an NVIDIA RTX 3080Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 12 GB of GPU memory. The second was Google Colab Pro, using an A100 GPU with 40 GB of GPU memory. Figure 1 illustrates the workflow of the developed solution, detailing each stage from data preprocessing to final classification.

2.1. Dataset

The data were collected at HAN Neuro Acupuncture & Herbal Specialists Sdn. Bhd. with ethical approval from the UTAR Scientific and Ethical Review Committee (SERC), reference number U/SERC/271/2021. Data collection took place from November 2022 to October 2024. The inclusion criteria were no skin diseases or injuries on the limbs, no skin allergies, and no incompatibility with ECG electrodes. All participants were fully informed of the examination procedures and provided written informed consent before participating. Patients were excluded if they had any of the following: dermatological lesions on the limbs, excessive perspiration (hyperhidrosis), a cardiac pacemaker or other implanted electronic device such as a defibrillator, inability to sit during the examination, metal pins or prostheses in the extremities or joints, pregnancy, or the absence of one or more limbs. Patients with prior neck surgery or trauma were also excluded to avoid confounding factors.
All images were acquired using a Mindray DC-70 high-resolution ultrasound system (Shenzhen Mindray Bio-Medical Electronics Co., Ltd., Shenzhen, Guangdong, China) with Sound Touch Quantification. A linear array transducer (L12-3E) operating in the 3–12 MHz range was used, with acquisition set to F H10.0 (10 MHz high frequency) in B-mode. The exported images had a resolution of 1260 × 910 pixels at 96 dpi. All scans were acquired under a consistent protocol by two trained clinicians to ensure uniform image quality, and the clinicians then analyzed and scored each image. Data were collected from 184 patients (75 males and 109 females), all of whom gave informed consent. The males had an average age of approximately 43 years, an average weight of 74 kg, an average height of 170 cm, and an average body mass index (BMI) of 26.3. The females had an average age of approximately 40 years, an average weight of 59 kg, an average height of 160 cm, and an average BMI of 23.6. A two-tailed independent t-test showed no significant difference in age between males and females, whereas weight, height, and BMI were significantly higher in males. The analysis is summarized in Table 1.
The ultrasound images were taken from three regions: the cervical lower region (CL), cervical middle region (CM), and cervical upper region (CU), corresponding to the C5, C3, and C1 vertebral levels, respectively. The scored features were deep fascia discontinuity, deep fascia “wen li bu qing” (fuzzy texture of the deep fascia), the modified Heckmatt scale (MHS), and deep fascia and myofascial adhesion. Each feature was scored as either zero or one, except deep fascia discontinuity, which was scored from zero to two, and the MHS, which ranged from one to four. Table 2 shows the number of images for each score of each feature. Figure 2 shows an example CL ultrasound image with the scores annotated by the clinician for each feature.

2.2. Data Preprocessing Procedures

Data preprocessing organizes the data before subsequent steps and prevents failures caused by faulty entries. In this study, preprocessing was performed with Pandas, which was used to read the CSV file, check for empty cells, and verify that all scores were within their valid ranges. The first step was to remove patients with empty or abnormal scores. Two patients had empty or abnormal scores in the clinician’s scoring, so 182 images were used for model development. Next, the MHS scores were renumbered from 1–4 to 0–3 so that all class labels start at zero. After filtering, the images were cropped and resized to remove the fixed frame and retain only the ultrasound portion, reducing computational cost by eliminating unnecessary pixels. The cropped images were uniformly resized to 620 × 620 pixels for consistency. The ultrasound images were then matched to their entries in the analysis file. At this stage, the images were ready for processing and were grouped by feature and score.
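The filtering and renumbering steps can be sketched in plain Python. The authors used Pandas; this dependency-free version assumes each CSV row has already been read into a dictionary, and the column names (`patient_id`, `discontinuity`, `fuzzy_texture`, `mhs`, `adhesion`) and sample rows are hypothetical. The valid ranges follow the scoring scheme in Section 2.1, shown for one cervical region only.

```python
# Valid score ranges per feature, taken from Section 2.1; the field names are hypothetical.
VALID_RANGES = {
    "discontinuity": range(0, 3),   # deep fascia discontinuity: 0-2
    "fuzzy_texture": range(0, 2),   # deep fascia "wen li bu qing": 0-1
    "mhs": range(1, 5),             # modified Heckmatt scale: 1-4
    "adhesion": range(0, 2),        # deep fascia and myofascial adhesion: 0-1
}


def parse_score(value, valid):
    """Return the integer score, or None if it is empty, non-numeric, or out of range."""
    try:
        score = int(value)
    except (TypeError, ValueError):
        return None
    return score if score in valid else None


def clean_scores(records):
    """Drop records with any empty/abnormal score and renumber MHS from 1-4 to 0-3."""
    cleaned = []
    for rec in records:
        scores = {f: parse_score(rec.get(f), valid) for f, valid in VALID_RANGES.items()}
        if any(s is None for s in scores.values()):
            continue  # patient excluded because of an empty or abnormal score
        scores["mhs"] -= 1  # zero-based class labels 0-3
        scores["patient_id"] = rec["patient_id"]
        cleaned.append(scores)
    return cleaned


rows = [
    {"patient_id": "P001", "discontinuity": "2", "fuzzy_texture": "1", "mhs": "4", "adhesion": "0"},
    {"patient_id": "P002", "discontinuity": "", "fuzzy_texture": "1", "mhs": "2", "adhesion": "1"},
    {"patient_id": "P003", "discontinuity": "1", "fuzzy_texture": "0", "mhs": "5", "adhesion": "1"},
]
cleaned = clean_scores(rows)
print(cleaned)
```

P002 is dropped for an empty cell and P003 for an out-of-range MHS value, mirroring the two kinds of problem described in the text.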
Next, we addressed the noticeable noise in the images, which is not useful for algorithm development. Noise reduction aimed to mitigate the unavoidable speckle in ultrasound images. Conventional noise reduction methods, including the mean, median, and Gaussian filters, were applied. Additionally, Optimized Blockwise Non-Local Means (OBNLM) from a GitHub repository was tested. The Peak Signal-to-Noise Ratio (PSNR) was chosen as the evaluation metric because it is widely used in image processing to assess the quality of reconstructed or denoised images. PSNR provides a straightforward, quantitative measure of the pixel-level difference between the original and processed images, making it suitable for comparing the filtering techniques.
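PSNR is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared pixel difference between the two images and MAX is the largest possible pixel value. The sketch below assumes 8-bit grayscale images (MAX = 255) stored as nested lists; the filter implementations the authors used are not specified, so only the metric is shown, and the pixel values are illustrative.

```python
import math


def psnr(original, processed, max_value=255.0):
    """PSNR in dB between two equal-sized grayscale images given as lists of pixel rows."""
    if len(original) != len(processed) or any(
        len(a) != len(b) for a, b in zip(original, processed)
    ):
        raise ValueError("images must have the same shape")
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(original, processed)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no pixel-level error
    return 10.0 * math.log10(max_value ** 2 / mse)


noisy = [[52, 55, 61], [59, 79, 61], [76, 62, 65]]
smoothed = [[54, 56, 60], [60, 70, 62], [72, 63, 64]]
print(f"PSNR = {psnr(noisy, smoothed):.2f} dB")
```

Because the reference here is the original (speckled) image, a higher PSNR means the filtered image stays closer to it pixel by pixel.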
Because the neck ultrasound images were analyzed based on their texture, texture characterization was carried out to highlight the relevant textures and make them easier to analyze. The Gray-Level Co-Occurrence Matrix (GLCM) has proven to be one of the best texture descriptors over the years [19]. The GLCM counts how often pairs of pixels with specific gray-level values occur at a given spatial offset (distance and angle). Several statistics can be derived from the matrix, such as contrast, correlation, energy, homogeneity, and mean. The GLCM was computed using a 5 × 5 kernel, a pixel distance of 1, and an angle of 0°. A 5 × 5 neighborhood was selected to preserve fine-scale fascia echotexture while reducing sensitivity to random speckle. A distance of 1 pixel is widely recommended in radiomics studies for capturing the most discriminative local spatial dependencies, while the 0° orientation aligns with the predominantly horizontal organization of the cervical fascia layers. This configuration enables stable and anatomically meaningful texture feature extraction, consistent with current radiomics practice [20,21,22].
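A GLCM-mean image with the stated settings (5 × 5 window, distance 1, angle 0°) can be sketched as below. Assumptions not given in the text: intensities are quantized to 8 gray levels, the matrix is non-symmetric, border pixels without a full window are set to 0, and the GLCM mean is taken as the sum of i · P(i, j) over the reference-pixel level. Texture toolkits differ in these choices (for production use, scikit-image’s `graycomatrix` is a common option), so this is an illustrative sketch rather than the authors’ implementation.

```python
def quantize(image, levels, max_value=255):
    """Map 8-bit intensities to `levels` gray levels (0 .. levels-1)."""
    return [[min(levels - 1, v * levels // (max_value + 1)) for v in row] for row in image]


def glcm(window, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for one offset; dx=1, dy=0 is distance 1 at 0 degrees."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
                pairs += 1
    return [[n / pairs for n in row] for row in counts]


def glcm_mean(p):
    """GLCM mean over the reference pixel: sum of i * P(i, j)."""
    return sum(i * p[i][j] for i in range(len(p)) for j in range(len(p[i])))


def glcm_mean_image(image, levels=8, kernel=5):
    """Slide a kernel x kernel window and store the GLCM mean at each centre pixel.

    Border pixels without a full window are left at 0.0 in this sketch.
    """
    q = quantize(image, levels)
    half = kernel // 2
    height, width = len(q), len(q[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(half, height - half):
        for c in range(half, width - half):
            window = [row[c - half:c + half + 1] for row in q[r - half:r + half + 1]]
            out[r][c] = glcm_mean(glcm(window, levels))
    return out


flat = [[255] * 5 for _ in range(5)]
print(glcm_mean_image(flat)[2][2])
```

The pure-Python loops are far too slow for 620 × 620 images; the point is only to make the window, offset, and statistic explicit.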
Because patients typically seek medical attention when they feel unwell, the dataset was highly imbalanced, with the severe classes forming the majority. Snider et al. [23] reported that standard augmentations and ensemble predictions boosted ultrasound classification accuracy. Several methods were employed to manage the class imbalance. The images were augmented with random horizontal flips, random rotations within 10°, and random affine transformations. Because the degree of imbalance varied across features, augmentation was carried out separately within each feature: for each minority class, a random subset of the augmented images was selected and concatenated with the original images to match the number of images in the majority class.
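The balancing step can be sketched as follows, assuming images are grouped by class within one feature. The `augment` callable is a hypothetical stand-in for the flip, rotation, and affine pipeline, and numbers stand in for images so the example runs without an imaging library.

```python
import random


def balance_with_augmentation(samples_by_class, augment, seed=0):
    """Top up each minority class to the majority-class size with a random subset of augmented copies.

    samples_by_class: dict mapping class label -> non-empty list of images
    augment: callable returning an augmented copy of one image (hypothetical stand-in
             for the flip / rotation / affine pipeline described in the text)
    """
    rng = random.Random(seed)
    target = max(len(items) for items in samples_by_class.values())
    balanced = {}
    for label, items in samples_by_class.items():
        deficit = target - len(items)
        rounds = -(-deficit // len(items))  # ceiling division: enough augmented copies to choose from
        pool = [augment(image) for _ in range(rounds) for image in items]
        extra = rng.sample(pool, deficit)  # random subset of the augmented images
        balanced[label] = list(items) + extra  # concatenate with the originals
    return balanced


data = {"severe": [1, 2, 3, 4, 5], "normal": [10, 11]}
balanced = balance_with_augmentation(data, augment=lambda x: x + 0.5)
print({label: len(items) for label, items in balanced.items()})
```

When the minority class is very small, most of its training images end up being augmented copies of a few originals, which is the bias the Discussion later attributes to Method 9.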
In the classification algorithm, the weight of each class within a feature was set according to its frequency in the dataset. This adjustment changed how strongly each class influenced the model’s learning, so that the algorithm accounted for class imbalance during training. In practice, classes with fewer images were assigned higher weights and those with more images received lower weights, allowing the model to pay proportionally more attention to under-represented categories. This approach helps prevent the classifier from being biased toward the majority class and supports more balanced learning across all classes. Formally, for a dataset with C classes and n_i samples in class i, the class weight was computed as w_i = N/(C × n_i), where N denotes the total number of samples.
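The class-weight formula w_i = N/(C × n_i) translates directly into code. The label counts below are hypothetical and chosen only to make the arithmetic easy to check.

```python
from collections import Counter


def class_weights(labels):
    """Return {class: N / (C * n_i)} for N samples, C classes, and n_i samples in class i."""
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {label: n_total / (n_classes * n) for label, n in sorted(counts.items())}


labels = [1] * 120 + [0] * 40  # hypothetical feature: 120 severe (1) vs 40 normal (0) images
weights = class_weights(labels)
print(weights)
```

With N = 160 and C = 2, the minority class receives 160 / (2 × 40) = 2.0 and the majority class 160 / (2 × 120) ≈ 0.67. In PyTorch, such values (ordered by class index) can be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`; scikit-learn’s `compute_class_weight` with `class_weight="balanced"` uses the same formula.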

2.3. Classification

Given the complexity of ultrasound images, a deep learning model was used to learn relevant features iteratively through backpropagation on the training set. The original dataset was first split into training and testing sets in a 9:1 ratio, and the training set was further split into training and validation sets in an 8:2 ratio. Piffer et al. [24] systematically reviewed “small data” AI in medical imaging and found that transfer learning and data augmentation are widely used to mitigate data scarcity. For consistency, all inputs to the deep learning model were resized to 480 × 480 pixels. During training, light augmentations were applied, including random horizontal flipping and random posterization (2-bit), followed by normalization with the ImageNet mean and standard deviation. ImageNet normalization was retained for compatibility with the pretrained network and to maintain stable feature representations during transfer learning. Validation and test images were only resized and normalized, without augmentation. The model was trained using 5-fold cross-validation, with 20 epochs per fold (100 training epochs in total).
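The split described above (a 9:1 hold-out test set, then 5-fold cross-validation on the remainder, which gives roughly 80/20 training/validation per fold) can be sketched with index lists. The shuffle seed is a hypothetical choice, and the split is unstratified because the text does not say whether stratification was used.

```python
import random


def holdout_and_kfold(n_samples, test_fraction=0.1, k=5, seed=42):
    """Hold out a test set, then split the remaining indices into k train/validation folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test_idx, dev_idx = indices[:n_test], indices[n_test:]
    folds = [dev_idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    splits = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, val_idx))
    return test_idx, splits


test_idx, splits = holdout_and_kfold(182)
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")
```

The test indices never enter any fold, so the final evaluation uses images that were unseen during both training and validation.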
Each fold was trained with a batch size of 16, an initial learning rate of 0.0002, and a learning-rate scheduler that reduced the learning rate by a factor of 0.1 every five epochs. A dropout rate of 0.2 was used to help reduce overfitting. The model was implemented in the PyTorch framework and optimized with AdamW using the default momentum settings.
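The step schedule follows lr(epoch) = 0.0002 × 0.1^⌊epoch / 5⌋, the same rule PyTorch’s `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)` applies when `scheduler.step()` is called once per epoch. The sketch below assumes 0-based epoch numbering.

```python
def step_lr(epoch, base_lr=2e-4, step_size=5, gamma=0.1):
    """Learning rate at a 0-based epoch under step decay (same rule as PyTorch's StepLR)."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 4, 5, 10, 15, 19):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.1e}")
```

Over the 20 epochs of one fold, the learning rate therefore falls by three orders of magnitude, from 2 × 10⁻⁴ to 2 × 10⁻⁷.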
In addition to the EfficientNet weights pretrained on the 1000-class ImageNet dataset, two other datasets were obtained from Kaggle for transfer learning. The first was the breast cancer ultrasound dataset of [25], consisting of two classes of 224 × 224 pixel images with approximately 8000 training images. The second was the brain MRI tumor dataset of [26], comprising four classes of 512 × 512 pixel images.
The brain MRI dataset, with 2870 training images, was computationally expensive to train on. Therefore, it was reduced to 100 images per class (400 images in total), a practical trade-off that kept computation feasible. Even with this reduction, the network was able to learn general low-level features, such as edges and textures, that transferred to the neck ultrasound dataset. These pretrained weights served as initializations for the ultrasound task, accelerating convergence and improving final model performance compared with training from scratch, without causing underfitting. The approaches tested are listed in Table 3.
However, a key constraint of this study is the lack of segmentation masks, which limited the model’s ability to focus on specific regions of interest, such as the fascia and muscle structures, in the ultrasound images. This limitation meant that the model had to rely on raw image data without precise localization, which could have affected the model’s ability to extract more targeted features.

2.4. Performance Evaluation

After the algorithm was developed, its performance was evaluated using the F1-score (also referred to as the Dice coefficient), aggregated in two ways: as a macro average and as a weighted average. The F1-score is the harmonic mean of precision and recall and reflects the algorithm’s reliability. It is derived from the confusion matrix, which consists of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); the F1-score itself uses only TP, FP, and FN. Given the imbalanced dataset, the scores were aggregated using both the macro average (the unweighted mean of the per-class scores) and the weighted average (the mean of the per-class scores weighted by the number of samples in each class) [27]. The macro average treats every class equally, whereas the weighted average lets each class contribute in proportion to its size. Accuracy, although common, was not used as the primary evaluation metric because it can be misleading when the majority class dominates.
$$\text{Macro Average F1-Score} = \frac{1}{C}\sum_{i=1}^{C} \frac{2\,TP_i}{2\,TP_i + FP_i + FN_i}$$
$$\text{Weighted Average F1-Score} = \sum_{i=1}^{C} \frac{n_i}{N} \times \frac{2\,TP_i}{2\,TP_i + FP_i + FN_i}$$
where $C$ is the total number of classes in the respective feature, $i$ is the index of the class, $n_i$ is the number of samples in class $i$, and $N = \sum_{i=1}^{C} n_i$ is the total number of samples.
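Both averages can be computed from per-class counts as sketched below. The labels are illustrative, not results from the study. Classes that appear in either the true or the predicted labels are included, which matches scikit-learn’s `f1_score` with `average="macro"` or `average="weighted"`.

```python
def f1_scores(y_true, y_pred):
    """Return (macro F1, weighted F1); each class's F1 is 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    n_total = len(y_true)
    per_class, support = {}, {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        support[c] = sum(1 for t in y_true if t == c)  # n_i: samples of class c
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(support[c] / n_total * per_class[c] for c in classes)
    return macro, weighted


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
macro, weighted = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")  # macro F1 = 0.625, weighted F1 = 0.667
```

Here class 0 has F1 = 0.75 and class 1 has F1 = 0.5; the weighted average sits closer to the larger class’s score, which is why the two averages diverge on imbalanced features.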

3. Results

3.1. Data Preprocessing Outcomes

The images were first cropped from 1260 × 910 pixels to 620 × 620 pixels, removing regions without relevant features to improve computational efficiency. A representative cropped image is shown in Figure 3.
Next, the images were filtered to reduce noise. The PSNR values of the filtering techniques, ranked from highest to lowest, were as follows: Gaussian filter (49.16 dB), median filter (43.14 dB), bilateral filter (42.04 dB), mean filter (39.45 dB), OBNLM (36.32 dB), and Laplacian filter (21.76 dB) (see Appendix A.1). A higher PSNR indicates a smaller pixel-level deviation from the original image. On this basis, the Gaussian-filtered images were used for subsequent texture characterization. The GLCM texture images are shown in Figure 4, and the GLCM mean images were chosen for further processing (see Appendix A.2 for more examples). The progression from the original image to the Gaussian-filtered image and finally to the GLCM mean image is shown in Figure 5. The GLCM mean image exhibited better contrast and eliminated noise in the pure black region, which helped to reduce image complexity.

3.2. Classification Algorithms

The classification methods tested are listed in Table 4. Methods 1 to 6 were trained on open datasets (the breast cancer dataset and the brain MRI dataset), and their validation accuracies are reported. The breast cancer dataset was trained effectively with EfficientNet B0, reaching 99% validation accuracy, while the brain MRI dataset performed best with EfficientNet V2L plus an added dropout layer with a probability of 0.2.
Figure 6 shows an example of the learning curves for Method 10. The training loss decreased rapidly until epoch 20 and then flattened, fluctuating at a very low loss of less than 0.2. The validation loss decreased quickly in the first five epochs and then fluctuated between 1.5 and 1.7. Similarly, the training accuracy gradually increased and fluctuated between 0.96 and 0.99, while the validation accuracy fluctuated more widely. Early stopping was applied to prevent overfitting: training stopped once the validation loss had not improved for 5 consecutive epochs.
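Early stopping with a patience of five epochs can be expressed as a small helper. The `min_delta` threshold is a hypothetical extra (set to 0 here, so any decrease counts as an improvement), and the loss values are illustrative rather than taken from Figure 6.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # hypothetical: minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=5)
val_losses = [1.9, 1.6, 1.55, 1.58, 1.60, 1.57, 1.62, 1.59]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 7
```

The best loss (1.55) occurs at epoch 2, and the five following epochs without improvement trigger the stop at epoch 7. In practice, the weights from the best epoch are usually saved and restored.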
The best-performing trained weights were then used as pretrained weights for training on the cervical dataset. Various combinations were tested, and the results are shown in Table 5. To save computational cost and time during model development, the models were first trained and tested on deep fascia fuzzy texture. If a model performed well, training proceeded to deep fascia discontinuity, then deep fascia and myofascial adhesion, and finally MHS. This stepwise procedure is a structured incremental approach that starts with the easiest feature to learn and progressively moves to more challenging ones. In the absence of segmentation masks, deep fascia fuzzy texture was easier for the neural network to learn because it consists of texture-like features, whereas deep fascia discontinuity depends on the discontinuity of the muscle fibers. Adhesion is also associated with muscle fiber texture, so it was expected to behave similarly to deep fascia fuzzy texture. MHS, which has the most classes, had the fewest images per class, making it the most challenging feature.
Table 5 presents the F1-scores for all tested classification algorithms. Methods 9 and 10 employed K-fold cross-validation. Both used the PyTorch AdamW optimizer with a learning rate of 0.0002 and a weight decay of 0.00001. AdamW is similar to Adam but decouples weight decay from the gradient-based update: the decay is applied directly to the weights instead of being added to the gradient, so its strength is not rescaled by Adam’s adaptive per-parameter step sizes. Cross-entropy loss was used as the loss function, and a step scheduler decayed the learning rate by a factor of 0.1 every five epochs, allowing progressively smaller updates as training proceeded. Additionally, average pooling and a dropout of 0.2 were applied. Whereas Method 9 used data augmentation to balance the dataset, Method 10 weighted each class according to its share of the total number of images. Figure 6 shows the learning curves for Method 10, the best model, trained on the CL fuzzy texture dataset with pretrained weights from Method 6. The final testing accuracy of this model was 76%. While the training loss curve shows a smooth trend, the other curves fluctuated to varying degrees across several folds.
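The difference between AdamW and Adam with an L2 penalty can be illustrated for a single scalar parameter. The sketch follows the update rule documented for PyTorch’s `torch.optim.AdamW` (default betas (0.9, 0.999), eps 1e-8) and uses the learning rate and weight decay from the text; the zero-gradient demo is a contrived case chosen to isolate the regularization term.

```python
import math


def adamw_step(theta, grad, m, v, t, lr=2e-4, weight_decay=1e-5,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a scalar parameter; t is the 1-based step count."""
    theta = theta - lr * weight_decay * theta  # decoupled decay applied to the weight itself
    m = beta1 * m + (1 - beta1) * grad  # first-moment estimate (gradient only)
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate (gradient only)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def adam_l2_step(theta, grad, m, v, t, lr=2e-4, weight_decay=1e-5,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam with a classic L2 penalty: the decay term is folded into the gradient."""
    grad = grad + weight_decay * theta  # penalty now passes through the adaptive scaling
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta_w, _, _ = adamw_step(1.0, grad=0.0, m=0.0, v=0.0, t=1)
theta_l2, _, _ = adam_l2_step(1.0, grad=0.0, m=0.0, v=0.0, t=1)
print(theta_w, theta_l2)
```

With a zero loss gradient, AdamW shrinks the weight by exactly lr × weight decay (2 × 10⁻⁹), whereas folding the penalty into the gradient lets Adam’s normalization inflate the same tiny penalty into a step of almost the full learning rate. PyTorch’s plain `Adam` with `weight_decay` behaves like the second function.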
Model training was conducted using 5-fold cross-validation and required approximately 45 min on an NVIDIA A100 40GB GPU for the ultrasound dataset (182 images). Each fold used about 145 training images (batch size = 16), yielding around 10 batches per epoch and 20 epochs (around 200 batches per fold), or around 1,000 batches across all folds.
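The batch counts above follow from simple arithmetic, assuming the final partial batch is kept (the PyTorch `DataLoader` default, `drop_last=False`). Early stopping could end a fold sooner, so these figures are upper bounds.

```python
import math

train_images_per_fold = 145  # figure reported in the text
batch_size = 16
epochs_per_fold = 20
num_folds = 5

batches_per_epoch = math.ceil(train_images_per_fold / batch_size)  # 9 full batches + 1 batch of 1 image
batches_per_fold = batches_per_epoch * epochs_per_fold
total_batches = batches_per_fold * num_folds
print(batches_per_epoch, batches_per_fold, total_batches)  # 10 200 1000
```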
Compared with the CU and CM regions, classification accuracy for the CL region was generally higher. This disparity may be attributed to the more pronounced degeneration observed in the CL region, which results from increased biomechanical stress due to its central role in neck mobility and head support [28,29]. Furthermore, the imbalanced dataset, with a higher proportion of severe cases in the CL region, may have enhanced the classifier’s ability to identify distinctive features, thereby improving its performance.

4. Discussion

No state-of-the-art algorithms have been designed specifically for neck disorder assessment, which makes direct comparison difficult. Most previous ultrasound image assessment systems have focused on cancer, specifically tumor detection, whereas this study concentrates on texture-like features of the neck fascia. By applying machine learning techniques to ultrasound images to assess these features, this study explores an area that has not been extensively addressed in the literature.
Azmoodeh-Kalati et al. [30] combined EfficientNetV1 and EfficientNetV2 in an ensemble for breast cancer classification using ultrasound images. Similarly, Liu et al. [31] applied pretrained EfficientNetV2 to breast cancer classification; they evaluated conventional CNNs, such as ResNet_v2 and Inception_v3, alongside EfficientNetV2 and found that EfficientNetV2-b1 performed best. Although different breast cancer ultrasound images were used, the outcomes were similar, supporting the effectiveness of EfficientNet in medical image classification. Given the previous success of transfer learning on small datasets, various combinations of transfer learning methods were tested to determine the best model, as shown in Table 2. Because the dataset is small and imbalanced, it was difficult for EfficientNet to learn useful features directly. Therefore, EfficientNet was first trained on similar, larger datasets, and the learned weights were then transferred to train on the cervical dataset.
In addition, preprocessing steps such as Gaussian and median filtering were applied to normalize texture variance and reduce speckle noise before the data were input into EfficientNet. To reduce computational cost and time, the K-means clustering algorithm was used to allow faster comparison of different networks as feature extractors. In these tests, each network’s last layer was removed so that the remaining network served as the feature descriptor. The results are shown in Figure 7. K-means clustering was applied to the extracted embedding vectors to partition diagnostic categories efficiently, minimizing redundant computation during inference. This approach was chosen over PCA or direct CNN embeddings because K-means groups similar feature embeddings around representative cluster centers, enabling fast class decisions without retraining the convolutional layers. Unlike PCA, which performs linear dimensionality reduction, K-means preserves the natural clustering structure of the embeddings, which is critical for maintaining distinctions between diagnostic categories.
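The clustering step can be sketched with Lloyd’s k-means on embedding vectors. The two-dimensional embeddings below are hypothetical stand-ins for the outputs of a truncated network; how clusters were mapped to diagnostic labels is not specified in the text, so that step is omitted.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(embeddings, k, max_iterations=100, seed=0):
    """Lloyd's k-means; returns (centroids, cluster index for each embedding)."""
    rng = random.Random(seed)
    centroids = [list(point) for point in rng.sample(embeddings, k)]  # random initial centres
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each embedding joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(point, centroids[j]))
            for point in embeddings
        ]
        if new_assignment == assignment:
            break  # converged: no embedding changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for j in range(k):
            members = [p for p, a in zip(embeddings, assignment) if a == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical 2-D embeddings; real ones would come from the network without its last layer.
embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, labels = kmeans(embeddings, k=2)
print(labels)
```

Only the cluster centres need to be stored, so assigning a new image costs one distance computation per centre once its embedding is available.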
RegNetY, a popular deep learning model, was also tested but did not outperform EfficientNet (as shown in Figure 7). The breast cancer dataset was trained with EfficientNet B0 because its images are 224 × 224 pixels and EfficientNet is sensitive to input image size. Consequently, the datasets with larger images were not used to train EfficientNet B0. For the brain MRI images, which are 512 × 512 pixels, EfficientNet B6 (with images resized to 528 × 528 pixels) and EfficientNet V2L (with images resized to 480 × 480 pixels) were employed, as deeper neural networks tend to learn useful features more effectively.
When compared with EfficientNet B6, EfficientNet V2 demonstrated a 14% improvement in performance on the validation set [17]. The final model, EfficientNet V2L, achieved 96% validation accuracy and 77% testing accuracy. Due to the significant size difference, the breast cancer dataset was not used to train EfficientNet V2L.
Methods 7 and 8 in Table 5 demonstrate the influence of image size on the classification model. In Method 7, the neck ultrasound images were resized to 480 × 480 pixels, whereas in Method 8 they were resized to 224 × 224 pixels. Both methods yielded insignificant results. Development therefore focused on tuning EfficientNet V2L with different parameters. The models were first tuned with various optimizers, including Adam, RMSProp, SGD, and AdamW. Given the small dataset, overfitting was a concern. The best optimizer, AdamW, was used with a small learning rate of 0.0002. Weight decay (L2 regularization) was applied to prevent overfitting and to encourage the model to focus on more important features. The step scheduler served as a regularization and fine-tuning technique that helped the model converge and learn better features. Additionally, average pooling with a 20% dropout was applied to regularize the model and prevent overfitting. All combinations of these regularization techniques were tested, and the best combination is reported in the results. Even with the best optimizer configuration, Method 9 did not yield satisfactory results. Because the dataset was highly imbalanced, the number of augmented images far exceeded the number of original normal cases, which biased the model. To reduce this bias, the weighted random sampler from PyTorch was used in Method 10, also referred to as the final model. The sampler draws training samples with probabilities based on the weight of each class, so that every class is represented more evenly during training. This reduces the higher selection probability of the majority class.
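The idea behind weighted random sampling can be sketched with the standard library: each sample is drawn with probability proportional to the inverse size of its class, so minority and majority classes appear about equally often. The per-sample weight list built here is the kind of input PyTorch’s `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` expects; the sketch uses `random.choices` instead of PyTorch, and the label counts are hypothetical.

```python
import random
from collections import Counter


def weighted_sample_indices(labels, num_samples, seed=0):
    """Draw sample indices with replacement, each weighted by 1 / (size of its class)."""
    counts = Counter(labels)
    sample_weights = [1.0 / counts[y] for y in labels]  # per-sample weights
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=sample_weights, k=num_samples)


labels = [1] * 90 + [0] * 10  # hypothetical: 90 severe vs 10 normal images
indices = weighted_sample_indices(labels, num_samples=10_000)
share_minority = sum(labels[i] == 0 for i in indices) / len(indices)
print(f"minority share after sampling: {share_minority:.2f}")
```

Unlike offline augmentation and concatenation, the sampler rebalances each epoch on the fly without fixing a large set of augmented copies in advance.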
The inference of the proposed algorithm was visualized using Gradient-weighted Class Activation Mapping (Grad-CAM), a technique that highlights the image regions most important for predicting a particular class label by using the gradients of that class score with respect to the final convolutional feature maps [32]. A sample output is shown in Figure 8; it suggests that, unlike K-means clustering, the model did not rely on the noisy parts of the image. Recent research in medical image analysis has demonstrated the benefit of hybrid approaches that combine deep learning with handcrafted feature representations. For example, hybrid models integrating CNNs with handcrafted texture features have shown enhanced performance in histopathological diagnosis tasks such as malignant lymphoma classification [33], and in pneumonia detection from chest radiographs, where texture descriptors such as the GLCM complement CNN-based features to improve robustness [34]. These findings conceptually support our use of texture-oriented characterization alongside deep feature learning, adapted here to the challenge of cervical fascia analysis.
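The Grad-CAM computation can be illustrated with the minimal PyTorch sketch below, which follows the standard formulation [32]: the gradients of the target class score with respect to the last convolutional feature maps are global-average-pooled into channel weights, the weighted feature maps are summed, and a ReLU keeps only positive evidence. `SmallCNN` is a hypothetical toy model; applying this to EfficientNet V2L would mean taking the activations of its final convolutional stage instead, which is an assumption about where the authors attached Grad-CAM.

```python
import torch
import torch.nn.functional as F
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical toy classifier with a separable feature extractor and head."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))


def grad_cam(model: SmallCNN, image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return an (H, W) Grad-CAM heat map in [0, 1] for a single (1, C, H, W) image."""
    model.eval()
    activations = model.features(image)        # last convolutional feature maps A^k
    activations.retain_grad()                   # keep d(score)/dA^k after backward()
    score = model.head(activations)[0, target_class]
    model.zero_grad()
    score.backward()
    # Channel weights alpha_k: gradients averaged over the spatial dimensions.
    alphas = activations.grad.mean(dim=(2, 3), keepdim=True)
    # Weighted sum of feature maps; ReLU keeps regions that increase the class score.
    cam = F.relu((alphas * activations).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalise to [0, 1] for overlaying on the image.
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    cam = cam / cam.max().clamp(min=1e-8)
    return cam[0, 0].detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = SmallCNN(num_classes=2)
    heat_map = grad_cam(model, torch.randn(1, 1, 64, 64), target_class=0)
    print(heat_map.shape)  # torch.Size([64, 64])
```

The resulting heat map would typically be colour-mapped and blended with the ultrasound image, producing overlays of the kind shown in Figure 8.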
The observed performance variation across cervical regions likely reflects inherent anatomical and biomechanical differences rather than limitations of the modeling approach [35]. The lower cervical region plays a primary role in load transmission and mechanical stabilization of the neck and is characterized by relatively well-defined and continuous fascial structures, which may yield more consistent texture patterns for feature learning. In contrast, the middle and upper cervical regions comprise anatomically intricate soft-tissue arrangements with overlapping fascia, musculature, and connective tissues that support fine motor control and multidirectional movement. Such structural complexity can result in more heterogeneous texture representations, making discrimination between diagnostic categories inherently more challenging. Similar region-dependent performance variations have been reported in recent medical imaging studies, where deep learning models demonstrated differing classification and segmentation accuracy across anatomically distinct subregions. These findings suggest that region-specific anatomical characteristics are an important consideration when interpreting model performance in ultrasound-based musculoskeletal analysis.

5. Limitations and Recommendations

Automated neck disorder assessment is a new area in medical artificial intelligence, unlike cancer assessment, which has been extensively researched. The main challenge in developing this system lies in the dataset, which is small and highly imbalanced, and in the lack of suitable open datasets for transfer learning. Although many state-of-the-art algorithms can be fine-tuned on small target datasets, they typically require large source datasets for effective transfer learning. Ideally, such a public dataset would be large, with consistent image dimensions and texture-related content. Given the limited public options, the breast cancer and brain MRI datasets used in this study were the best available, despite being tumor-related and having smaller image dimensions.
Additionally, because the dataset was collected and analyzed by only two clinicians at a single center, the generalizability and robustness of the algorithm are limited. Consequently, the solution may not yet be objective enough to assist clinicians without bias. This limitation underscores the importance of multi-center data integration in medical imaging–based artificial intelligence research. For instance, Ghabri et al. [36] applied ImageNet-pretrained CNNs with transfer learning to classify fetal ultrasound images, using datasets collected from multiple hospitals in countries such as Spain, Egypt, and Algeria. Their work illustrates how diverse training data helps produce a robust classification model suitable for a broader population. Accordingly, a key future direction of this work is to expand data collection to multi-center and multi-operator datasets, enabling the model to better capture inter-institutional and inter-population variability. Furthermore, prospective multi-center validation studies will be essential to evaluate the stability of model performance over time, support clinical translation, and ensure that the proposed framework remains reliable and effective across diverse real-world clinical settings.
In addition, generative AI could be used to combine these datasets into a larger pool for synthetic image generation. Specifically, diffusion-based methods (e.g., denoising diffusion models, which have been shown to improve diversity in cardiac ultrasound) or GAN-based techniques (such as CycleGAN architectures tailored for multi-organ ultrasound enhancement) could be employed to generate diverse, high-quality synthetic images in a reproducible manner [37,38]. If successful, such synthetic images could contribute to open datasets such as MedMNIST, accelerating the development of texture-related ultrasound image assessment systems. While Grad-CAM offered an initial view of how the model interprets ultrasound features, its explanatory capability remains limited. Future work should therefore employ more robust interpretability methods such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), which can provide clearer, model-agnostic insights into the features driving the model's decisions [39]. Grad-CAM++ will also be considered to improve localization and highlight finer anatomical details. These enhancements aim to address current limitations and support greater transparency and confidence when applying the model in clinical practice.
Moreover, future studies should aim to obtain segmentation masks from clinicians as ground truth. Each cervical region is assessed for several feature categories, each of which concerns a different area of the ultrasound image. Ground-truth segmentation masks or region annotations would aid both feature extraction and classification. The absence of such region-specific annotations in the current study restricts the model's ability to explicitly attend to anatomically relevant fascia structures and may partly explain the moderate performance observed in certain cervical regions and feature categories. Segmentation foundation models such as the Medical Segment Anything Model (MedSAM) have shown promising performance and could assist in producing such masks. The masked regions would help the algorithm focus on the area relevant to each category, making the extracted features more informative. Future work could also explore alternative deep learning architectures, including lightweight CNNs and hybrid CNN-ViT models, together with a systematic comparison of computational efficiency, to further improve both diagnostic accuracy and efficiency in medical imaging.
To make the model more clinically meaningful and adaptable to real-world cases, future work should consider incorporating additional patient information such as age, BMI, palpation findings, and relevant medical history. Bringing these elements together through a multimodal fusion approach would allow the model to capture a fuller picture of each patient, ultimately improving its robustness and ability to handle natural variations between individuals. In addition, future work can include a full ablation study to quantitatively evaluate the contribution of each component of the model.
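One possible form of the multimodal fusion suggested above is late fusion, in which a pooled image embedding is concatenated with an encoded vector of clinical variables before classification. The PyTorch sketch below is purely hypothetical and is not part of the reported model: the class name, the 32-unit clinical encoder, the 1280-dimensional image embedding (the pooled feature size of torchvision's EfficientNet V2L), and the choice of late rather than early or attention-based fusion are all assumptions.

```python
import torch
from torch import nn


class LateFusionClassifier(nn.Module):
    """Hypothetical image + clinical-data fusion head (not part of the reported model).

    image_dim: size of the pooled image embedding (assumed to come from a CNN backbone).
    clinical_dim: number of tabular variables, e.g., age, BMI, palpation findings.
    """

    def __init__(self, image_dim: int, clinical_dim: int, num_classes: int, hidden: int = 32):
        super().__init__()
        # Small MLP mapping standardised clinical variables into a learned embedding.
        self.clinical_encoder = nn.Sequential(nn.Linear(clinical_dim, hidden), nn.ReLU())
        # Classifier over the concatenated image and clinical embeddings.
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.2),
            nn.Linear(image_dim + hidden, num_classes),
        )

    def forward(self, image_features: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([image_features, self.clinical_encoder(clinical)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = LateFusionClassifier(image_dim=1280, clinical_dim=3, num_classes=2)
    image_features = torch.randn(4, 1280)   # pooled CNN features for 4 images
    clinical = torch.randn(4, 3)            # e.g., standardised age, BMI, palpation score
    print(model(image_features, clinical).shape)  # torch.Size([4, 2])
```

Late fusion keeps the image backbone unchanged, so a pretrained ultrasound model could be reused while only the small fusion head is trained on the combined data.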
Finally, while developing and sharing ultrasound datasets is critical for advancing medical AI, ethical and privacy considerations must be prioritized. Even de-identified images carry potential re-identification risks if metadata is not carefully managed. All data sharing should follow robust governance, ensuring patient confidentiality, informed consent, and responsible stewardship. Transparent consent processes and secure data handling are essential to respect patient rights while maximizing the societal benefit of data reuse [40,41].

6. Conclusions

This study demonstrated the potential of EfficientNet-based deep learning models for automated neck disorder assessment using ultrasound images. The final model, developed with transfer learning and validated through 5-fold cross-validation and an independent clean test dataset, performed best in the lower cervical region. Specifically, deep fascia fuzzy texture and deep fascia and myofascial adhesion achieved weighted average F1-scores of 76% and 81%, respectively, suggesting potential value for clinical decision support. These results highlight the model's potential for clinical use in scenarios such as screening and triage, where it could enable rapid, consistent assessments, leading to early identification and prioritization of neck disorders. However, the study is limited by a small, imbalanced dataset and the lack of segmentation masks, which may affect generalizability and feature localization. Future work should focus on expanding the dataset across multiple centers to enhance generalizability and on incorporating segmentation masks for region-specific learning. Additionally, advanced augmentation or generative techniques could be explored to further improve model robustness. Overall, this model lays a promising foundation for supporting neck disorder assessment, offering the potential to improve the consistency, efficiency, and accuracy of clinical ultrasound interpretation.

Author Contributions

Conceptualization, S.-Y.M., Y.M.L., H.S.T., L.F.T., C.N.F. and C.-H.G.; Methodology, W.D.W., S.-Y.M., Y.M.L., H.S.T., L.F.T., C.N.F. and C.-H.G.; Software, W.D.W.; Validation, Y.M.L., H.S.T., L.F.T., C.N.F., C.P.Y.L. and C.-H.G.; Formal analysis, W.D.W., S.-Y.M., L.F.T. and C.-H.G.; Investigation, W.D.W., S.-Y.M., H.S.T., L.F.T., C.P.Y.L. and C.-H.G.; Resources, Y.M.L. and H.S.T.; Data curation, W.D.W., S.-Y.M., H.S.T., L.F.T., C.P.Y.L. and C.-H.G.; Writing—original draft preparation, W.D.W., S.-Y.M. and C.-H.G.; Writing—review and editing, S.-Y.M., Y.M.L., H.S.T., L.F.T., C.N.F., C.P.Y.L. and C.-H.G.; Visualization, W.D.W., S.-Y.M. and C.-H.G.; Supervision, S.-Y.M. and C.-H.G.; Project administration, C.P.Y.L.; Funding acquisition, S.-Y.M., Y.M.L., H.S.T., L.F.T., C.N.F. and C.-H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by Han Neuro Acupuncture & Herbal Specialists Sdn. Bhd. (8122-0001).

Institutional Review Board Statement

This study received ethical approval from the UTAR Scientific and Ethical Review Committee (SERC) (reference number U/SERC/271/2021; date of approval: 22 November 2021) and was conducted in accordance with the Declaration of Helsinki.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The study’s data and resources are presently being analyzed and cannot yet be accessed by the public. We are unable to offer more details or access to the data currently due to the ongoing nature of the research.

Acknowledgments

The authors gratefully acknowledge the valuable contributions of all participants who took part in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACGAN: Auxiliary Classifier Generative Adversarial Network
BMI: Body Mass Index
CL: Cervical Lower region
CM: Cervical Middle region
CROM: Cervical Range of Motion
CT: Computed Tomography
CU: Cervical Upper region
FN: False Negative
FP: False Positive
GLCM: Gray-Level Co-Occurrence Matrix
Grad-CAM: Gradient-weighted Class Activation Mapping
MHS: Modified Heckmatt Scale
MRI: Magnetic Resonance Imaging
OBNLM: Optimized Blockwise Non-local Means
PSNR: Peak Signal-to-Noise Ratio
TN: True Negative
TP: True Positive

Appendix A

Appendix A.1

Figure A1. Comparison of image filtering techniques. (a) Raw image without filtering; (b) Median filter applied, yielding a PSNR of 43.14; (c) Bilateral filter applied, yielding a PSNR of 42.04; (d) Gaussian filter applied, achieving the highest PSNR of 49.16.
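The PSNR values in Figure A1 follow the standard definition PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared difference between two images. A minimal NumPy sketch is shown below; it assumes 8-bit images (MAX = 255) and that each filtered image is compared against the raw image, which the caption implies but does not state.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    # Convert to float first so that uint8 subtraction cannot wrap around.
    ref = reference.astype(np.float64)
    out = test.astype(np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Under the stated assumption, `psnr(raw, filtered)` is higher when the filter alters the raw image less, so the Gaussian filter's 49.16 dB would indicate the smallest deviation from the raw image among the three filters.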

Appendix A.2

Figure A2. Additional representative ultrasound samples demonstrating GLCM texture characterization, i.e., (a) Cervical lower (CL) region, (b) Cervical middle (CM) region, (c) Cervical upper (CU) region. Quantitative GLCM metrics show consistent reductions in entropy and increases in variance across the displayed images, supporting the effectiveness of the filtering process in suppressing noise while maintaining muscle fiber texture integrity.

References

  1. Misailidou, V.; Malliou, P.; Beneka, A.; Karagiannidis, A.; Godolias, G. Assessment of patients with neck pain: A review of definitions, selection criteria, and measurement tools. J. Chiropr. Med. 2010, 9, 49–59.
  2. Jun, D.; Zoe, M.; Johnston, V.; O’Leary, S. Physical risk factors for developing non-specific neck pain in office workers: A systematic review and meta-analysis. Int. Arch. Occup. Environ. Health 2017, 90, 373–410.
  3. Xie, Y.; Szeto, G.; Dai, J. Prevalence and risk factors associated with musculoskeletal complaints among users of mobile handheld devices: A systematic review. Appl. Ergon. 2017, 59, 132–142.
  4. Szeto, G.P.; Tsang, S.M.; Dai, J.; Madeleine, P. A field study on spinal postures and postural variations during smartphone use among university students. Appl. Ergon. 2020, 88, 103183.
  5. Wu, A.M.; Cross, M.; Elliott, J.M.; Culbreth, G.T.; Haile, L.M.; Steinmetz, J.D.; Hagins, H.; Kopec, J.A.; Brooks, P.M.; Woolf, A.D.; et al. Global, regional, and national burden of neck pain, 1990–2020, and projections to 2050: A systematic analysis of the Global Burden of Disease Study 2021. Lancet Rheumatol. 2024, 6, e142–e155.
  6. Chen, X.; O’Leary, S.; Johnston, V. Modifiable individual and work-related factors associated with neck pain in 740 office workers: A cross-sectional study. Braz. J. Phys. Ther. 2018, 22, 318–327.
  7. Stecco, A.; Meneghini, A.; Stern, R.; Stecco, C.; Imamura, M. Ultrasonography in myofascial neck pain: Randomized clinical trial for diagnosis and follow-up. Surg. Radiol. Anat. 2014, 36, 243–253.
  8. Pawlukiewicz, M.; Kochan, M.; Niewiadomy, P.; Szuścik-Niewiadomy, K.; Taradaj, J.; Król, P.; Kuszewski, M.T. Fascial manipulation method is effective in the treatment of Myofascial Pain, but the treatment protocol matters: A Randomised Control Trial—Preliminary Report. J. Clin. Med. 2022, 11, 4546.
  9. Bonaldi, L.; Pirri, C.; Giordani, F.; Fontanella, C.G.; Stecco, C.; Uccheddu, F. Segmentation of the Thoracolumbar Fascia in Ultrasound Imaging: A Deep Learning Approach. BMC Med. Imaging 2025, 25, 164.
  10. Shomal Zadeh, F.; Koh, R.G.; Dilek, B.; Masani, K.; Kumbhare, D. Identification of Myofascial Trigger Point Using the Combination of Texture Analysis in B-Mode Ultrasound with Machine Learning Classifiers. Sensors 2023, 23, 9873.
  11. Casaletto, E.; Lin, B.; Wolfe, S.W.; Lee, S.K.; Sneag, D.B.; Feinberg, J.H.; Nwawka, O.K. Ultrasound imaging of nerves in the neck: Correlation with MRI, EMG, and clinical findings. Neurol. Clin. Pract. 2020, 10, 415–421.
  12. Safonova, A.; Ghazaryan, G.; Stiller, S.; Main-Knorn, M.; Nendel, C.; Ryo, M. Ten deep learning techniques to address small data problems with remote sensing. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103569.
  13. Cheng, P.M.; Malhi, H.S. Transfer learning with convolutional neural networks for classification of abdominal ultrasound images. J. Digit. Imaging 2017, 30, 234–243.
  14. Gu, F.; Deng, M.; Chen, X.; An, L.; Zhao, Z. Research on classification method of medical ultrasound image processing based on neural network. Comput. Intell. Neurosci. 2022, 2022, 8912566.
  15. Saha, S.; Sheikh, N. Ultrasound image classification using ACGAN with small training dataset. In Proceedings of the International Symposium on Signal and Image Processing, Singapore, 18–19 March 2020; Springer Nature: Singapore, 2020; pp. 85–93.
  16. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
  17. Tan, M.; Le, Q. EfficientNetV2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10096–10106.
  18. Marques, G.; Agarwal, D.; De la Torre Díez, I. Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Appl. Soft Comput. 2020, 96, 106691.
  19. Roberti de Siqueira, F.; Schwartz, W.R.; Pedrini, H. Multi-scale gray level co-occurrence matrices for texture description. Neurocomputing 2013, 120, 336–345.
  20. Varghese, B.A.; Fields, B.K.; Hwang, D.H.; Duddalwar, V.A.; Matcuk, G.R., Jr.; Cen, S.Y. Spatial assessments in texture analysis: What the radiologist needs to know. Front. Radiol. 2023, 3, 1240544.
  21. Zhang, W.; Guo, Y.; Jin, Q. Radiomics and its feature selection: A review. Symmetry 2023, 15, 1834.
  22. Lu, Y.; Yang, C. Influence of GLCM texture parameters on lithological mapping using Sentinel-1 imagery. Geocarto Int. 2024, 39, 2425183.
  23. Snider, E.J.; Hernandez-Torres, S.I.; Hennessey, R. Using ultrasound image augmentation and ensemble predictions to prevent machine-learning model overfitting. Diagnostics 2023, 13, 417.
  24. Piffer, S.; Ubaldi, L.; Tangaro, S.; Retico, A.; Talamonti, C. Tackling the small data problem in medical image classification with artificial intelligence: A systematic review. Prog. Biomed. Eng. 2024, 6, 032001.
  25. Ultrasound Breast Images for Breast Cancer. Available online: https://www.kaggle.com/datasets/vuppalaadithyasairam/ultrasound-breast-images-for-breast-cancer (accessed on 16 May 2024).
  26. Brain Tumor Classification (MRI) Dataset. Available online: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri (accessed on 18 May 2024).
  27. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
  28. Margetis, K.; Dowling, T.J. Cervical Degenerative Disc Disease; StatPearls Publishing: Online, 2025.
  29. Rahman, W.U.; Jiang, W.; Zhao, F.; Li, Z.; Wang, G.; Yang, G. Biomechanical effect of C5-C6 intervertebral disc degeneration on the human lower cervical spine (C3–C7): A finite element study. Comput. Methods Biomech. Biomed. Eng. 2023, 26, 820–834.
  30. Azmoodeh-Kalati, M.; Shabani, H.; Maghareh, M.S.; Barzegar, Z.; Lashgari, R. Leveraging an ensemble of EfficientNetV1 and EfficientNetV2 models for classification and interpretation of breast cancer histopathology images. Sci. Rep. 2025, 15, 21541.
  31. Liu, D.; Wang, W.; Wu, X.; Yang, J. EfficientNetV2 model for breast cancer histopathological image classification. In Proceedings of the 2022 3rd International Conference on Electronic Communication and Artificial Intelligence (IWECAI), Zhuhai, China, 14–16 January 2022; IEEE: New York City, NY, USA, 2022; pp. 384–387.
  32. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: New York City, NY, USA, 2017; pp. 618–626.
  33. Hamdi, M.; Senan, E.M.; Jadhav, M.E.; Olayah, F.; Awaji, B.; Alalayah, K.M. Hybrid models based on fusion features of a CNN and handcrafted features for accurate histopathological image analysis for diagnosing malignant lymphomas. Diagnostics 2023, 13, 2258.
  34. Hashmi, M.F.; Husain, A.A.; Kotambkar, D.M.; Kumar, K.D. Hybrid deep learning and handcrafted feature fusion for pneumonia detection in chest X-rays. J. Electr. Syst. Inf. Technol. 2025, 12, 94.
  35. Javanshir, K.; Ghafouri-Rouzbehani, P.; Zohrehvand, A.; Naeimi, A.; Fernández-de-las-Peñas, C.; Nikbakht, H.-A.; Mousavi-Khatir, S.R.; Valera-Calero, J.A. Cervical multifidus and longus colli ultrasound differences among patients with cervical disc bulging, protrusion and extrusion, and asymptomatic controls: A cross-sectional study. J. Clin. Med. 2024, 13, 624.
  36. Ghabri, H.; Alqahtani, M.S.; Ben Othman, S.; Al-Rasheed, A.; Abbas, M.; Almubarak, H.A.; Sakli, H.; Abdelkarim, M.N. Transfer learning for accurate fetal organ classification from ultrasound images: A potential tool for maternal healthcare providers. Sci. Rep. 2023, 13, 17904.
  37. Van De Vyver, G.; Lenz, A.T.; Smistad, E.; Olaisen, S.H.; Grenne, B.; Holte, E.; Dalen, H.; Løvstakken, L. Generative augmentations for improved cardiac ultrasound segmentation using diffusion models. Sci. Rep. 2025, 15, 38013.
  38. Wang, W.; Li, H. A novel CycleGAN network applicable for enhancing low-quality ultrasound images of multiple organs. J. King Saud Univ. Comput. Inf. Sci. 2025, 37, 261.
  39. Vimbi, V.; Shaffi, N.; Mahmud, M. Interpreting artificial intelligence models: A systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Inform. 2024, 11, 10.
  40. Kondylakis, H.; Catalan, R.; Alabart, S.M.; Barelle, C.; Bizopoulos, P.; Bobowicz, M.; Bona, J.; Fotiadis, D.I.; Garcia, T.; Gomez, I.; et al. Documenting the de-identification process of clinical and imaging data for AI for health imaging projects. Insights Imaging 2024, 15, 130.
  41. Rempe, M.; Heine, L.; Seibold, C.; Hörst, F.; Kleesiek, J. De-identification of medical imaging data: A comprehensive tool for ensuring patient privacy. Eur. Radiol. 2025, 35, 7809–7818.
Figure 1. The Workflow Diagram of This Study. The figure illustrates the complete pipeline, including data preprocessing, noise reduction, texture characterization, and imbalanced dataset management. The figure explicitly differentiates between the two evaluation strategies employed in this study: (i) a train–test split, used for benchmarking with external ultrasound datasets, and (ii) K-fold cross-validation, applied exclusively to the neck ultrasound dataset to ensure robust model assessment given the limited sample size. The training stage incorporates transfer learning using EfficientNet with class-weighting, dropout, and optimization. Model performance is evaluated using F1-scores.
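The evaluation strategy summarized in Figure 1 (K-fold cross-validation scored with F1) can be illustrated with the NumPy sketch below. It is an assumption-laden example rather than the study's code: the stratified round-robin fold assignment, the random seed, and the helper names are hypothetical, and in practice `sklearn.model_selection.StratifiedKFold` and `sklearn.metrics.f1_score` (with `average="macro"` or `average="weighted"`) provide equivalent functionality. The weighted average weights each class's F1 by its number of true samples, which is why it can exceed the macro average when the majority class is classified well.

```python
import numpy as np


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs with each class spread evenly across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal this class's shuffled samples round-robin into the folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


def f1_scores(y_true, y_pred):
    """Return (macro F1, weighted F1) over the classes present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s, support = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn              # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
        support.append(np.sum(y_true == c))   # number of true samples of class c
    f1s, support = np.array(f1s), np.array(support)
    return float(f1s.mean()), float(np.sum(f1s * support) / support.sum())
```

For a highly imbalanced feature such as CL myofascial adhesion (28 vs. 154 images in Table 2), stratification keeps five or six minority-class images in every validation fold instead of leaving some folds with almost none.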
Figure 2. A representative raw ultrasound image illustrating key features: deep fascia discontinuity (score = 2), deep fascia “Wen Li Bu Qing” (fuzzy texture; score = 1), Modified Heckmatt Scale (MHS) feature (score = 3), and deep fascia with myofascial adhesion (score = 1).
Figure 3. Images of a representative sample before and after cropping. The image was cropped from 1260 × 910 pixels to 620 × 620 pixels. This cropping step removes redundant background and ensures that the model focuses on the relevant anatomical region, improving consistency across samples.
Figure 4. The Outputs of GLCM Texture Characterization. The ideal outcome of filtering was to remove the noise under the hill-shaped line (red color) while maintaining the muscle fiber texture. (a) The first image shows the raw image, the second image is the binary image, which highlights the areas of interest by separating the features from the background, and the third image represents the enhancement, with noise more visible. (b) Among GLCM mean, contrast, and ASM, the GLCM mean performed the best to remove noise while preserving the muscle fiber texture. The enhancement reduces entropy (6.43 → 4.01) and increases variance (1179.13 → 1647.90), indicating improved structural clarity and contrast separation. This preprocessing enhances texture clarity, supporting more reliable feature extraction by the model.
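The GLCM statistics referred to in Figure 4 can be illustrated with the NumPy sketch below. It is a textbook GLCM computation, not the authors' pipeline: the number of grey levels (8), the single horizontal offset, symmetric normalisation, and base-2 entropy are all assumptions, and applying these statistics over a sliding window, as a GLCM mean filter would, is not shown. Whether the entropy and variance values in Figure 4 were computed from the GLCM or from the pixel histogram is not specified, so numbers from this sketch should not be expected to match them. `skimage.feature.graycomatrix` and `graycoprops` in scikit-image offer a library alternative.

```python
import numpy as np


def glcm(image: np.ndarray, levels: int = 8, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Symmetric, normalised grey-level co-occurrence matrix for one offset (dx, dy >= 0)."""
    img = np.asarray(image, dtype=np.float64)
    # Quantise 8-bit intensities (0-255) into `levels` grey levels.
    q = np.clip((img / 256.0 * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    ref = q[: h - dy, : w - dx]       # reference pixels
    nbr = q[dy:, dx:]                 # neighbours displaced by (dy, dx)
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    m = m + m.T                       # count both directions (symmetric GLCM)
    return m / m.sum()


def glcm_features(p: np.ndarray) -> dict:
    """Common Haralick-style statistics of a normalised GLCM p."""
    i, j = np.indices(p.shape)
    mean = np.sum(i * p)                        # GLCM mean
    variance = np.sum((i - mean) ** 2 * p)      # GLCM variance
    contrast = np.sum((i - j) ** 2 * p)         # local intensity variation
    asm = np.sum(p ** 2)                        # angular second moment (uniformity)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))         # randomness of co-occurrences, in bits
    return {"mean": float(mean), "variance": float(variance), "contrast": float(contrast),
            "asm": float(asm), "entropy": float(entropy)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.integers(0, 256, size=(64, 64))   # synthetic 8-bit patch
    print(glcm_features(glcm(patch)))
```

A smooth, well-organised texture concentrates the GLCM on a few entries (high ASM, low entropy), whereas speckle noise spreads it out, which is the intuition behind using GLCM-based filtering to suppress noise while retaining fiber texture.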
Figure 5. Comparison of The Image Before and After Preprocessing. The GLCM mean filter was applied after Gaussian filtering to further remove noise. The final output from the GLCM mean shows less noise and better contrast. This combined filtering pipeline enhances fine structural details and reduces noise, yielding clearer images that improve the reliability of downstream feature extraction.
Figure 6. Learning Curves of The Best Model (Method 10).
Figure 7. Results of the Conducted Tests. EfficientNet_B7 + SMOTE: Extract features using EfficientNet B7 and then apply SMOTE; EfficientNet_B0 + SMOTE: Extract features using EfficientNet B0 and then apply SMOTE; BUSI_EfficientNet_B0: Extract features using EfficientNet B0 with transfer learnt weights from an online breast ultrasound image dataset; BUSI_EfficientNet_B0 + SMOTE: Extract features using EfficientNet B0 with transfer learnt weights from an online breast ultrasound image dataset and then apply SMOTE; RegNetY: Extract features using RegNetY; RegNetY + SMOTE: Extract features using RegNetY and then apply SMOTE; BrainMRI_EfficientNet_B6: Extract features using EfficientNet B6 with transfer learnt weights from an online brain MRI image dataset; BrainMRI_EfficientNet_B6 + SMOTE: Extract features using EfficientNet B6 with transfer learnt weights from an online brain MRI image dataset and then apply SMOTE.
Figure 8. The Sample Output of Grad-CAM. It shows which parts of the input the model found most relevant when predicting class 0 or class 1. In this sample, the red-highlighted region contributed most strongly to the class 0 prediction. (Note: All four features, namely Deep Fascia Discontinuity, Deep Fascia Fuzzy Texture, Deep Fascia and Myofascial Adhesion, and MHS, are evaluated on the same ultrasound image, so only one Grad-CAM visualization is shown.)
Table 1. Summary of Patients’ Information.
| Variable | Male (n = 75) | Female (n = 109) | p-Value |
|---|---|---|---|
| Age, years | 42.81 ± 15.60 | 40.35 ± 14.15 | 0.267 |
| Weight, kg | 74.19 ± 14.14 | 59.31 ± 12.85 | <0.001 |
| Height, cm | 170.45 ± 7.12 | 159.75 ± 5.87 | <0.001 |
| BMI, kg/m² | 26.26 ± 4.67 | 23.64 ± 4.87 | <0.001 |

Note: Values are presented as mean ± standard deviation.
Table 2. Number of Images for Each Score in Each Category.
Table 2. Number of Images for Each Score in Each Category.
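| Feature | Class | Cervical Lower (CL) | Cervical Middle (CM) | Cervical Upper (CU) |
|---|---|---|---|---|
| Deep fascia discontinuity | 0 | 14 | 23 | 23 |
| | 1 | 34 | 85 | 71 |
| | 2 | 134 | 74 | 88 |
| Deep fascia fuzzy texture | 0 | 37 | 70 | 79 |
| | 1 | 145 | 112 | 103 |
| Modified Heckmatt scale (MHS) | 1 | 37 | 52 | 55 |
| | 2 | 74 | 84 | 78 |
| | 3 | 60 | 36 | 42 |
| | 4 | 11 | 10 | 7 |
| Deep fascia and myofascial adhesion | 0 | 28 | 95 | 94 |
| | 1 | 154 | 87 | 88 |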
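Table 2 shows strong class imbalance: every region column sums to 182 images for each feature, yet deep fascia discontinuity in the CL region has only 14 class 0 images against 134 class 2 images. The study reports training with a weight for each class, but the formula is not given here, so the sketch below assumes the common inverse-frequency ("balanced") scheme, w_c = N / (K · n_c), the same formula scikit-learn uses for `class_weight="balanced"`, applied to one column of Table 2. The function name is hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency class weights: w_c = N / (K * n_c).

    counts: mapping class label -> number of images of that class.
    Rare classes get weights above 1, common classes weights below 1.
    """
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}


# Deep fascia discontinuity, Cervical Lower (CL) column of Table 2
cl_discontinuity = {0: 14, 1: 34, 2: 134}
weights = balanced_class_weights(cl_discontinuity)
# weights[0] = 182 / 42 (about 4.33), weights[2] = 182 / 402 (about 0.45)
```

With these weights every class contributes the same total weight (n_c · w_c = N / K) to a weighted loss, so the 14 class 0 images count as much in aggregate as the 134 class 2 images.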
Table 3. Methods Used for Classification Models.
| Method | Model | Dataset | Weights Used |
|---|---|---|---|
| 1 | EfficientNet B0 | Breast Cancer | ImageNet (1000 classes) |
| 2 | ResNet 101 | Brain MRI (reduced size) | ImageNet (1000 classes) |
| 3 | EfficientNet B6 | Brain MRI (reduced size) | ImageNet (1000 classes) |
| 4 | EfficientNet V2L | Brain MRI (reduced size) | ImageNet (1000 classes) |
| 5 | EfficientNet V2L | Brain MRI (full size) | ImageNet (1000 classes) |
| 6 | EfficientNet V2L (with dropout) | Brain MRI (full size) | ImageNet (1000 classes) |
| 7 | EfficientNet V2L | Neck Disorder (Augmented) | ImageNet (1000 classes) |
| 8 | EfficientNet B0 | Neck Disorder (Augmented) | Breast Cancer + EfficientNet B0 |
| 9 | EfficientNet V2L | Neck Disorder (Augmented) | Brain MRI (full size) + EfficientNet V2L |
| 10 | EfficientNet V2L | Neck Disorder (weighted) | Brain MRI (full size) + EfficientNet V2L |
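Method 10 uses the neck disorder dataset "(weighted)", which the Table 5 note describes as training with a weighted random sampler. A minimal standard-library sketch of that sampling idea is below, assuming each sample's weight is the inverse frequency of its class; PyTorch's `torch.utils.data.WeightedRandomSampler` accepts such per-sample weights, but the weights the authors actually passed are not given, and the function name here is hypothetical.

```python
import random
from collections import Counter


def weighted_sample_indices(labels, num_samples, seed=0):
    """Draw dataset indices with replacement, each with probability proportional
    to 1 / (frequency of its class), so all classes are drawn about equally often.

    Illustrates the idea behind a weighted random sampler; not the authors' code.
    """
    freq = Counter(labels)
    sample_weights = [1.0 / freq[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=sample_weights, k=num_samples)


# Usage: CL deep fascia discontinuity labels from Table 2 (14 / 34 / 134 images)
labels = [0] * 14 + [1] * 34 + [2] * 134
epoch_indices = weighted_sample_indices(labels, num_samples=len(labels))
```

Unlike offline augmentation (Methods 7 to 9), this rebalances each training epoch by re-drawing existing images, so minority-class images are simply seen more often.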
Table 4. Training Results of Different Classification Models on Open Datasets.
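| Method | Number of Parameters (Millions) | Breast Cancer Dataset | Brain MRI (Reduced) | Brain MRI (Full) |
|---|---|---|---|---|
| 1 | 5.3 | 0.99 | NA * | NA * |
| 2 | 44.5 | NA * | 0.85 | NA * |
| 3 | 43.0 | NA * | 0.53 | NA * |
| 4 | 118.5 | NA * | 0.67 | NA * |
| 5 | 118.5 | NA * | NA * | 0.72 |
| 6 | 118.5 | NA * | NA * | 0.96 |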
Note: NA *: Not Applicable (configurations not evaluated because their input image resolution was incompatible with the corresponding model architecture). Method 1: EfficientNet B0 trained on the breast cancer dataset, initialized with ImageNet (1000-class) pretrained weights; Method 2: ResNet 101 trained on the reduced-size brain MRI dataset, initialized with ImageNet (1000-class) pretrained weights; Method 3: EfficientNet B6 trained on the reduced-size brain MRI dataset, initialized with ImageNet (1000-class) pretrained weights; Method 4: EfficientNet V2L trained on the reduced-size brain MRI dataset, initialized with ImageNet (1000-class) pretrained weights; Method 5: EfficientNet V2L trained on the full-size brain MRI dataset, initialized with ImageNet (1000-class) pretrained weights; Method 6: EfficientNet V2L with dropout, trained on the full-size brain MRI dataset, initialized with ImageNet (1000-class) pretrained weights.
Table 5. The Overall Performance of The Classification Algorithm.
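| Method | F1-Score | Deep Fascia Discontinuity, CL (%) | Deep Fascia Discontinuity, CM (%) | Deep Fascia Discontinuity, CU (%) | Deep Fascia Fuzzy Texture, CL (%) | Deep Fascia Fuzzy Texture, CM (%) | Deep Fascia Fuzzy Texture, CU (%) | Deep Fascia and Myofascial Adhesion, CL (%) | Deep Fascia and Myofascial Adhesion, CM (%) | Deep Fascia and Myofascial Adhesion, CU (%) | MHS (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | Macro average | 45 | 51 | 49 | 53 | 52 | 55 | NS | NS | NS | NS |
| 7 | Weighted average | 52 | 47 | 49 | 55 | 52 | 55 | NS | NS | NS | NS |
| 8 | Macro average | NS | NS | NS | 51 | 50 | 52 | NS | NS | NS | NS |
| 8 | Weighted average | NS | NS | NS | 53 | 50 | 52 | NS | NS | NS | NS |
| 9 | Macro average | NS | NS | NS | 62 | 61 | 64 | NS | NS | NS | NS |
| 9 | Weighted average | NS | NS | NS | 62 | 61 | 64 | NS | NS | NS | NS |
| 10 | Macro average | 35 | 67 | 65 | 74 | 72 | 49 | 78 | 55 | 54 | NS |
| 10 | Weighted average | 52 | 63 | 65 | 76 | 72 | 49 | 81 | 55 | 54 | NS |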
Note: NS: Not Significant (non-convergent training); CL: Cervical Lower region; CM: Cervical Middle region; CU: Cervical Upper region. Method 7: EfficientNet V2L trained directly on the augmented neck disorder dataset, initialized with the original ImageNet (1000-class) pretrained weights; Method 8: EfficientNet B0 trained on the augmented neck disorder dataset, initialized with weights obtained by training EfficientNet B0 (from ImageNet 1000-class weights) on the breast cancer dataset; Method 9: EfficientNet V2L trained on the neck disorder dataset augmented to balance the classes, initialized with weights obtained by training EfficientNet V2L (from ImageNet 1000-class weights) on the full-size brain MRI dataset; Method 10: EfficientNet V2L trained on the neck disorder dataset with a weighted random sampler, initialized with the same full-size brain MRI weights as Method 9.
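Table 5 reports both macro- and weighted-average F1. The macro average gives every class equal weight, whereas the weighted average weights each class's F1 by its support (number of true samples), so the two diverge when classes are imbalanced and the model does better on the larger class. For example, Method 10 on deep fascia discontinuity in the CL region scores 35% macro versus 52% weighted, with class 2 accounting for 134 of the 182 CL images in Table 2. The plain-Python sketch below computes both averages from label lists using the standard definitions (as in scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`); it is an illustration, not code from the study.

```python
def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2TP / (2TP + FP + FN), plus each class's support."""
    labels = sorted(set(y_true) | set(y_pred))
    scores, support = {}, {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
        support[c] = sum(1 for t in y_true if t == c)
    return scores, support


def macro_and_weighted_f1(y_true, y_pred):
    scores, support = f1_per_class(y_true, y_pred)
    macro = sum(scores.values()) / len(scores)  # every class counts equally
    total = sum(support.values())
    weighted = sum(scores[c] * support[c] for c in scores) / total  # by support
    return macro, weighted


# Imbalanced toy example: 6 samples of class 0, 2 of class 1
macro, weighted = macro_and_weighted_f1([0, 0, 0, 0, 0, 0, 1, 1],
                                        [0, 0, 0, 0, 0, 1, 1, 0])
```

In this toy example class 0 has F1 = 5/6 and class 1 has F1 = 0.5, giving a macro average of about 0.67 but a weighted average of 0.75, the same direction of gap seen for Method 10.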
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, W.D.; Mok, S.-Y.; Lim, Y.M.; Tham, H.S.; Tan, L.F.; Foo, C.N.; Lim, C.P.Y.; Goh, C.-H. Development and Preliminary Evaluation of an EfficientNet-Based Deep Learning System for Ultrasound Assessment of Neck Disorders: A Single-Center Study. Diagnostics 2026, 16, 728. https://doi.org/10.3390/diagnostics16050728

Chicago/Turabian Style

Wang, Wei Ding, Siew-Ying Mok, Yang Mooi Lim, Hui Saan Tham, Lee Fan Tan, Chai Nien Foo, Clara Pei Ying Lim, and Choon-Hian Goh. 2026. "Development and Preliminary Evaluation of an EfficientNet-Based Deep Learning System for Ultrasound Assessment of Neck Disorders: A Single-Center Study" Diagnostics 16, no. 5: 728. https://doi.org/10.3390/diagnostics16050728
