- Artificial Intelligence in Placental Pathology: New Diagnostic Imaging Tools in Evolution and in Perspective
- Evaluating Super-Resolution Models in Biomedical Imaging: Applications and Performance in Segmentation and Classification
- Optimizing Digital Image Quality for Improved Skin Cancer Detection
- Towards the Performance Characterization of a Robotic Multimodal Diagnostic Imaging System
Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 15.3 days after submission; the time from acceptance to publication is 3.5 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
Fundus Image-Based Eye Disease Detection Using EfficientNetB3 Architecture
J. Imaging 2025, 11(8), 279; https://doi.org/10.3390/jimaging11080279 - 19 Aug 2025
Abstract
Accurate and early classification of retinal diseases such as diabetic retinopathy, cataract, and glaucoma is essential for preventing vision loss and improving clinical outcomes. Manual diagnosis from fundus images is often time-consuming and error-prone, motivating the development of automated solutions. This study proposes a deep learning-based classification model using a pretrained EfficientNetB3 architecture, fine-tuned on a publicly available Kaggle retinal image dataset. The model categorizes images into four classes: cataract, diabetic retinopathy, glaucoma, and healthy. Key enhancements include transfer learning, data augmentation, and optimization via the Adam optimizer with a cosine annealing scheduler. The proposed model achieved a classification accuracy of 95.12%, with a precision of 95.21%, recall of 94.88%, F1-score of 95.00%, Dice Score of 94.91%, Jaccard Index of 91.2%, and an MCC of 0.925. These results demonstrate the model’s robustness and potential to support automated retinal disease diagnosis in clinical settings.
Full article
(This article belongs to the Section Medical Imaging)
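To make the training recipe summarized above concrete, here is a minimal, hedged sketch of transfer learning with an EfficientNetB3 backbone, the Adam optimizer, and a cosine-annealing learning-rate schedule in Keras; the input size, learning rate, classifier head, and schedule length are illustrative assumptions, not the authors' exact configuration.
```python
# Hedged sketch: fine-tuning EfficientNetB3 for four-class fundus classification.
# Image size, learning rate, dropout, and decay_steps are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4               # cataract, diabetic retinopathy, glaucoma, healthy
IMG_SIZE = (300, 300)         # assumed input resolution

base = tf.keras.applications.EfficientNetB3(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = True         # fine-tune the pretrained backbone end-to-end

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Cosine annealing of the learning rate over the course of training.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-4, decay_steps=10_000)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=30)  # train_ds/val_ds are hypothetical
```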
Open Access Article
ODDM: Integration of SMOTE Tomek with Deep Learning on Imbalanced Color Fundus Images for Classification of Several Ocular Diseases
by
Afraz Danish Ali Qureshi, Hassaan Malik, Ahmad Naeem, Syeda Nida Hassan, Daesik Jeong and Rizwan Ali Naqvi
J. Imaging 2025, 11(8), 278; https://doi.org/10.3390/jimaging11080278 - 18 Aug 2025
Abstract
Ocular disease (OD) represents a complex medical condition affecting humans. OD diagnosis is a challenging process in the current medical system, and blindness may occur if the disease is not detected at its initial phase. Recent studies showed significant outcomes in the identification of OD using deep learning (DL) models. Thus, this work aims to develop a multi-classification DL-based model for the classification of seven ODs, including normal (NOR), age-related macular degeneration (AMD), diabetic retinopathy (DR), glaucoma (GLU), maculopathy (MAC), non-proliferative diabetic retinopathy (NPDR), and proliferative diabetic retinopathy (PDR), using color fundus images (CFIs). This work proposes a custom model named the ocular disease detection model (ODDM) based on a CNN. The proposed ODDM is trained and tested on a publicly available ocular disease dataset (ODD). Additionally, the SMOTE Tomek (SM-TOM) approach is also used to handle the imbalanced distribution of the OD images in the ODD. The performance of the ODDM is compared with seven baseline models, including DenseNet-201 (R1), EfficientNet-B0 (R2), Inception-V3 (R3), MobileNet (R4), Vgg-16 (R5), Vgg-19 (R6), and ResNet-50 (R7). The proposed ODDM obtained a 98.94% AUC, along with 97.19% accuracy, a recall of 88.74%, a precision of 95.23%, and an F1-score of 88.31% in classifying the seven different types of OD. Furthermore, ANOVA and Tukey HSD (Honestly Significant Difference) post hoc tests are also applied to represent the statistical significance of the proposed ODDM. Thus, this study concludes that the results of the proposed ODDM are superior to those of baseline models and state-of-the-art models.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)
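For readers unfamiliar with the SMOTE Tomek re-balancing step mentioned above, the sketch below shows how it is typically applied with the imbalanced-learn library to feature vectors and labels; the array shapes and class counts are invented for illustration and do not reflect the ODD dataset.
```python
# Hedged sketch of SMOTE Tomek re-balancing on an imbalanced seven-class problem.
# X and y are synthetic stand-ins (e.g., image embeddings); shapes are assumptions.
import numpy as np
from imblearn.combine import SMOTETomek

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))                               # 500 samples, 256-dim features
y = np.repeat(np.arange(7), [200, 120, 80, 45, 25, 20, 10])   # skewed class counts

X_res, y_res = SMOTETomek(random_state=42).fit_resample(X, y)
print(np.bincount(y))      # counts before re-sampling
print(np.bincount(y_res))  # counts after SMOTE oversampling + Tomek-link cleaning
```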
Open Access Article
Automated Task-Transfer Function Measurement for CT Image Quality Assessment Based on AAPM TG 233
by
Choirul Anam, Riska Amilia, Ariij Naufal, Eko Hidayanto, Heri Sutanto, Lukmanda E. Lubis, Toshioh Fujibuchi and Geoff Dougherty
J. Imaging 2025, 11(8), 277; https://doi.org/10.3390/jimaging11080277 - 18 Aug 2025
Abstract
This study aims to develop and validate software for the automatic measurement of the task-transfer function (TTF) based on the American Association of Physicists in Medicine (AAPM) Task Group (TG) 233. The software consists of two main stages: automatic placement of regions of interest (ROIs) within the circular objects of the phantoms and calculation of the TTF. The software was developed on four CT phantom types: a computational phantom, the ACR 464 CT phantom, the AAPM CT phantom, and the Catphan® 604 phantom. Each phantom was tested with varying parameters, including spatial resolution level, slice thickness, and image reconstruction technique. The TTF results were compared with manual measurements performed using ImQuest version 7.3.01 and iQmetrix-CT version 1.2. The software successfully located ROIs at all circular objects within each phantom and accurately measured the TTF at the various contrast-to-noise ratios (CNRs) of all phantoms. The TTF results were comparable to those obtained with ImQuest and iQmetrix-CT. The TTF curves produced by the software are smoother than those produced by ImQuest. An algorithm for the automated measurement of TTF was successfully developed and validated. TTF measurement with our software is highly user-friendly, requiring only a single click from the user.
Full article
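The circular-insert TTF described above is essentially a task-based MTF estimate. The sketch below outlines one common way to compute it: bin pixels radially around the insert centre into an edge-spread function, differentiate to a line-spread function, and take the FFT magnitude. The ROI handling, centre detection, bin width, and windowing are assumptions and not the validated software's algorithm.
```python
# Hedged numpy sketch of a radial ESF -> LSF -> TTF calculation for a circular insert.
# The caller supplies the ROI, the insert centre (cx, cy) in pixels, and the pixel size.
import numpy as np

def ttf_from_circular_roi(roi, cx, cy, pixel_size_mm, bin_mm=0.1):
    yy, xx = np.indices(roi.shape)
    r = (np.hypot(xx - cx, yy - cy) * pixel_size_mm).ravel()   # radius of each pixel (mm)
    v = roi.astype(float).ravel()
    bins = np.arange(0.0, r.max() + bin_mm, bin_mm)
    idx = np.digitize(r, bins) - 1
    counts = np.bincount(idx, minlength=len(bins))
    sums = np.bincount(idx, weights=v, minlength=len(bins))
    centres = bins + bin_mm / 2.0
    valid = counts > 0
    # oversampled edge-spread function, interpolated onto a uniform radial grid
    esf = np.interp(centres, centres[valid], sums[valid] / counts[valid])
    lsf = np.gradient(esf, bin_mm)               # line-spread function
    lsf *= np.hanning(lsf.size)                  # taper to suppress noise
    ttf = np.abs(np.fft.rfft(lsf))
    freqs = np.fft.rfftfreq(lsf.size, d=bin_mm)  # spatial frequency in cycles/mm
    return freqs, ttf / ttf[0]                   # normalised so TTF(0) = 1
```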

Open Access Article
The Contribution of AIDA (Artificial Intelligence Dystocia Algorithm) to Cesarean Section Within Robson Classification Group
by
Antonio Malvasi, Lorenzo E. Malgieri, Michael Stark, Edoardo Di Naro, Dan Farine, Giorgio Maria Baldini, Miriam Dellino, Murat Yassa, Andrea Tinelli, Antonella Vimercati and Tommaso Difonzo
J. Imaging 2025, 11(8), 276; https://doi.org/10.3390/jimaging11080276 - 16 Aug 2025
Abstract
Global cesarean section (CS) rates continue to rise, with the Robson classification widely used for analysis. However, Robson Group 2A patients (nulliparous women with induced labor) show disproportionately high CS rates that cannot be fully explained by demographic factors alone. This study explored how the Artificial Intelligence Dystocia Algorithm (AIDA) could enhance the Robson system by providing detailed information on geometric dystocia, thereby facilitating better understanding of factors contributing to CS and developing more targeted reduction strategies. The authors conducted a comprehensive literature review analyzing both classification systems across multiple databases and developed a theoretical framework for integration. AIDA categorized labor cases into five classes (0–4) by analyzing four key geometric parameters measured through intrapartum ultrasound: angle of progression (AoP), asynclitism degree (AD), head–symphysis distance (HSD), and midline angle (MLA). Significant asynclitism (AD ≥ 7.0 mm) was strongly associated with CS regardless of other parameters, potentially explaining many “failure to progress” cases in Robson Group 2A patients. The proposed integration created a combined classification providing both population-level and individual geometric risk assessment. The integration of AIDA with the Robson classification represented a potentially valuable advancement in CS risk assessment, combining population-level stratification with individual-level geometric assessment to enable more personalized obstetric care. Future validation studies across diverse settings are needed to establish clinical utility.
Full article
(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives—2nd Edition)
Open Access Article
A Lightweight CNN for Multiclass Retinal Disease Screening with Explainable AI
by
Arjun Kumar Bose Arnob, Muhammad Hasibur Rashid Chayon, Fahmid Al Farid, Mohd Nizam Husen and Firoz Ahmed
J. Imaging 2025, 11(8), 275; https://doi.org/10.3390/jimaging11080275 - 15 Aug 2025
Abstract
Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ to ensure that every decision is accompanied by pixel-level evidence. A 5335-image ten-class color-fundus dataset from Bangladeshi clinics, which was severely skewed (17–1509 images per class), was equalized using the synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to a fixed resolution and split 70:15:15. Training used the adaptive moment estimation (Adam) optimizer with reduce-on-plateau learning-rate scheduling and early stopping, together with regularization and dual dropout. The 16.6M-parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability therefore deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware.
Full article
(This article belongs to the Section Medical Imaging)
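As a rough illustration of the squeeze-and-excitation channel attention combined with depthwise separable convolutions described above, here is a hedged Keras sketch; the layer widths, reduction ratio, and ten-class head are assumptions rather than the paper's architecture.
```python
# Hedged sketch: squeeze-and-excitation (SE) block on top of separable convolutions.
# Filter counts, reduction ratio, and input/output sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, ratio=8):
    """Squeeze (global pooling) then excite (two dense layers) to reweight channels."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)      # per-channel gates in (0, 1)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

def sep_conv_se(x, filters):
    x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    return se_block(x)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = sep_conv_se(inputs, 32)
x = layers.MaxPooling2D()(x)
x = sep_conv_se(x, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)          # ten retinal classes assumed
model = tf.keras.Model(inputs, outputs)
```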
Open Access Article
Deep Learning-Based Nuclei Segmentation and Melanoma Detection in Skin Histopathological Image Using Test Image Augmentation and Ensemble Model
by
Mohammadesmaeil Akbarpour, Hamed Fazlollahiaghamalek, Mahdi Barati, Mehrdad Hashemi Kamangar and Mrinal Mandal
J. Imaging 2025, 11(8), 274; https://doi.org/10.3390/jimaging11080274 - 15 Aug 2025
Abstract
Histopathological images play a crucial role in diagnosing skin cancer. However, due to the very large size of digital histopathological images (typically on the order of a billion pixels), manual image analysis is tedious and time-consuming. Therefore, there has been significant interest in developing Artificial Intelligence (AI)-enabled computer-aided diagnosis (CAD) techniques for skin cancer detection. Due to the diversity of uncertain cell boundaries, automated nuclei segmentation of histopathological images remains challenging. Automating the identification of abnormal cell nuclei and analyzing their distribution across multiple tissue sections can significantly expedite comprehensive diagnostic assessments. In this paper, a deep neural network (DNN)-based technique is proposed to segment nuclei and detect melanoma in histopathological images. To achieve robust performance, a test image is first augmented by various geometric operations. The augmented images are then passed through the DNN, and the individual outputs are combined to obtain the final nuclei-segmented image. A morphological technique is then applied to the nuclei-segmented image to detect the melanoma region in the image. Experimental results show that the proposed technique can achieve Dice scores of 91.61% and 87.9% for nuclei segmentation and melanoma detection, respectively.
Full article
(This article belongs to the Section Medical Imaging)
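The test-image augmentation and output fusion described above can be pictured with the short sketch below, which averages probability maps predicted for several geometric variants of one (square) image after undoing each transform; the transform set, the 0.5 threshold, and the `model.predict` interface are assumptions.
```python
# Hedged sketch of test-time augmentation for segmentation: predict on flipped/rotated
# copies, invert each transform on the probability map, and average. Square inputs assumed.
import numpy as np

def tta_segment(model, image):
    transforms = [                                   # (forward, inverse) pairs
        (lambda im: im,               lambda m: m),
        (np.fliplr,                   np.fliplr),
        (np.flipud,                   np.flipud),
        (lambda im: np.rot90(im, 1),  lambda m: np.rot90(m, -1)),
        (lambda im: np.rot90(im, 3),  lambda m: np.rot90(m, 1)),
    ]
    probs = []
    for fwd, inv in transforms:
        pred = model.predict(fwd(image)[None, ...])[0, ..., 0]  # H x W probability map
        probs.append(inv(pred))
    fused = np.mean(probs, axis=0)
    return (fused > 0.5).astype(np.uint8)            # final binary nuclei mask
```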
Open Access Article
Bangla Speech Emotion Recognition Using Deep Learning-Based Ensemble Learning and Feature Fusion
by
Md. Shahid Ahammed Shakil, Fahmid Al Farid, Nitun Kumar Podder, S. M. Hasan Sazzad Iqbal, Abu Saleh Musa Miah, Md Abdur Rahim and Hezerul Abdul Karim
J. Imaging 2025, 11(8), 273; https://doi.org/10.3390/jimaging11080273 - 14 Aug 2025
Abstract
Emotion recognition in speech is essential for enhancing human–computer interaction (HCI) systems. Despite progress in Bangla speech emotion recognition, challenges remain, including low accuracy, speaker dependency, and poor generalization across emotional expressions. Previous approaches often rely on traditional machine learning or basic deep learning models, struggling with robustness and accuracy in noisy or varied data. In this study, we propose a novel multi-stream deep learning feature fusion approach for Bangla speech emotion recognition, addressing the limitations of existing methods. Our approach begins with various data augmentation techniques applied to the training dataset, enhancing the model’s robustness and generalization. We then extract a comprehensive set of handcrafted features, including Zero-Crossing Rate (ZCR), chromagram, spectral centroid, spectral roll-off, spectral contrast, spectral flatness, Mel-Frequency Cepstral Coefficients (MFCCs), Root Mean Square (RMS) energy, and Mel-spectrogram. Although these features are used as 1D numerical vectors, some of them are computed from time–frequency representations (e.g., chromagram, Mel-spectrogram) that can themselves be depicted as images, which is conceptually close to imaging-based analysis. These features capture key characteristics of the speech signal, providing valuable insights into the emotional content. Sequentially, we utilize a multi-stream deep learning architecture to automatically learn complex, hierarchical representations of the speech signal. This architecture consists of three distinct streams: the first stream uses 1D convolutional neural networks (1D CNNs), the second integrates 1D CNN with Long Short-Term Memory (LSTM), and the third combines 1D CNNs with bidirectional LSTM (Bi-LSTM). These models capture intricate emotional nuances that handcrafted features alone may not fully represent. For each of these models, we generate predicted scores and then employ ensemble learning with a soft voting technique to produce the final prediction. This fusion of handcrafted features, deep learning-derived features, and ensemble voting enhances the accuracy and robustness of emotion identification across multiple datasets. Our method demonstrates the effectiveness of combining various learning models to improve emotion recognition in Bangla speech, providing a more comprehensive solution compared with existing methods. We utilize three primary datasets—SUBESCO, BanglaSER, and a merged version of both—as well as two external datasets, RAVDESS and EMODB, to assess the performance of our models. Our method achieves impressive results with accuracies of 92.90%, 85.20%, 90.63%, 67.71%, and 69.25% for the SUBESCO, BanglaSER, merged SUBESCO and BanglaSER, RAVDESS, and EMODB datasets, respectively. These results demonstrate the effectiveness of combining handcrafted features with deep learning-based features through ensemble learning for robust emotion recognition in Bangla speech.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
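To give a concrete picture of the handcrafted-feature stage and the soft-voting fusion described above, a brief librosa-based sketch follows; the feature settings, the per-coefficient averaging, and the equal-weight vote are assumptions rather than the authors' exact pipeline.
```python
# Hedged sketch: extract the handcrafted audio features listed in the abstract as one
# 1D vector per clip, and fuse three streams' class probabilities by soft voting.
import numpy as np
import librosa

def handcrafted_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr)
    feats = [
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.spectral_flatness(y=y),
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
        librosa.feature.rms(y=y),
        librosa.feature.melspectrogram(y=y, sr=sr),
    ]
    # collapse each time-frequency representation to its per-coefficient mean
    return np.concatenate([f.mean(axis=1) for f in feats])

def soft_vote(p_cnn, p_cnn_lstm, p_cnn_bilstm):
    """Average the class probabilities of the three streams and take the arg-max."""
    return np.argmax((p_cnn + p_cnn_lstm + p_cnn_bilstm) / 3.0, axis=-1)
```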
Open Access Article
Digital Image Processing and Convolutional Neural Network Applied to Detect Mitral Stenosis in Echocardiograms: Clinical Decision Support
by
Genilton de França Barros Filho, José Fernando de Morais Firmino, Israel Solha, Ewerton Freitas de Medeiros, Alex dos Santos Felix, José Carlos de Lima Júnior, Marcelo Dantas Tavares de Melo and Marcelo Cavalcanti Rodrigues
J. Imaging 2025, 11(8), 272; https://doi.org/10.3390/jimaging11080272 - 14 Aug 2025
Abstract
The mitral valve is the heart valve most susceptible to pathological alterations, such as mitral stenosis, which is characterized by failure of the valve to open completely. In this context, the objective of this study was to apply digital image processing (DIP) and develop a convolutional neural network (CNN) to provide decision support for specialists in the diagnosis of mitral stenosis based on transesophageal echocardiography examinations. The following procedures were implemented: acquisition of echocardiogram exams; application of DIP; use of augmentation techniques; and development of a CNN. The DIP classified 26.7% of cases as without stenosis, 26.7% with mild stenosis, 13.3% with moderate stenosis, and 33.3% with severe stenosis. A CNN was initially developed to classify videos into those four categories; however, the number of acquired exams was insufficient to effectively train the model for this purpose, so the final model was trained to differentiate between videos with and without stenosis, achieving an accuracy of 92% with a loss of 0.26. The results demonstrate that both DIP and the CNN are effective in distinguishing between cases with and without stenosis. Moreover, DIP was capable of classifying varying degrees of stenosis severity (mild, moderate, and severe), highlighting its potential as a valuable tool in clinical decision support.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Extract Nutritional Information from Bilingual Food Labels Using Large Language Models
by
Fatmah Y. Assiri, Mohammad D. Alahmadi, Mohammed A. Almuashi and Ayidh M. Almansour
J. Imaging 2025, 11(8), 271; https://doi.org/10.3390/jimaging11080271 - 13 Aug 2025
Abstract
Food product labels serve as a critical source of information, providing details about nutritional content, ingredients, and health implications. These labels enable Food and Drug Authorities (FDA) to ensure compliance and take necessary health-related and logistics actions. Additionally, product labels are essential for online grocery stores to offer reliable nutrition facts and empower customers to make informed dietary decisions. Unfortunately, product labels are typically available in image formats, requiring organizations and online stores to manually transcribe them—a process that is not only time-consuming but also highly prone to human error, especially with multilingual labels that add complexity to the task. Our study investigates the challenges and effectiveness of leveraging large language models (LLMs) to extract nutritional elements and values from multilingual food product labels, with a specific focus on Arabic and English. A comprehensive empirical analysis was conducted using a manually curated dataset of 294 food product labels, comprising 588 transcribed nutritional elements and values in both languages, which served as the ground truth for evaluation. The findings reveal that while LLMs performed better in extracting English elements and values compared to Arabic, our post-processing techniques significantly enhanced their accuracy, with GPT-4o outperforming GPT-4V and Gemini.
Full article
(This article belongs to the Special Issue Computer Vision for Food Data Analysis: Methods, Challenges, and Applications)
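A rough sketch of how a vision-capable LLM such as GPT-4o can be prompted to transcribe a bilingual label image is shown below using the OpenAI Python client; the prompt wording, output format, and any downstream parsing or post-processing are assumptions, not the study's exact protocol.
```python
# Hedged sketch: send a food-label image plus an extraction prompt to a vision LLM.
# The prompt text and the expectation of JSON output are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def extract_nutrition(image_path):
    b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    prompt = ("Extract every nutritional element and its value from this food label. "
              "List Arabic and English entries separately and answer as JSON.")
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content  # raw answer; parse and normalise downstream
```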
Open Access Article
PU-DZMS: Point Cloud Upsampling via Dense Zoom Encoder and Multi-Scale Complementary Regression
by
Shucong Li, Zhenyu Liu, Tianlei Wang and Zhiheng Zhou
J. Imaging 2025, 11(8), 270; https://doi.org/10.3390/jimaging11080270 - 12 Aug 2025
Abstract
Point cloud imaging technology usually faces the problem of point cloud sparsity, which leads to a lack of important geometric detail. Many point cloud upsampling networks have been designed to solve this problem. However, the existing methods have limitations in local–global relation understanding, leading to contour distortion and many local sparse regions. To this end, PU-DZMS is proposed with two components. (1) The Dense Zoom Encoder (DENZE) is designed to capture local–global features by using ZOOM Blocks with a dense connection. The main module in the ZOOM Block is the Zoom Encoder, which embeds a Transformer mechanism into the down–upsampling process to enhance local–global geometric features. DENZE makes the geometric edges of the point cloud clearer. (2) The Multi-Scale Complementary Regression (MSCR) module is designed to expand the features and regress a dense point cloud. MSCR captures the geometric distribution differences of features across scales to ensure geometric continuity, and it regresses new points by adopting cross-scale residual learning. The MSCR module reduces the local sparse regions of the point cloud. The experimental results on the PU-GAN dataset and the PU-Net dataset show that the proposed method performs well on point cloud upsampling tasks.
Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Edition)
Open Access Review
A Review on Deep Learning Methods for Glioma Segmentation, Limitations, and Future Perspectives
by
Cecilia Diana-Albelda, Álvaro García-Martín and Jesus Bescos
J. Imaging 2025, 11(8), 269; https://doi.org/10.3390/jimaging11080269 - 11 Aug 2025
Abstract
Accurate and automated segmentation of gliomas from Magnetic Resonance Imaging (MRI) is crucial for effective diagnosis, treatment planning, and patient monitoring. However, the aggressive nature and morphological complexity of these tumors pose significant challenges that call for advanced segmentation techniques. This review provides a comprehensive analysis of Deep Learning (DL) methods for glioma segmentation, with a specific focus on bridging the gap between research performance and practical clinical deployment. We evaluate over 80 state-of-the-art models published up to 2025, categorizing them into CNN-based, Pure Transformer, and Hybrid CNN-Transformer architectures. The primary objective of this paper is to critically assess these models not only on their segmentation accuracy but also on their computational efficiency and suitability for real-world medical environments by incorporating hardware resource considerations. We present a comparison of model performance on the BraTS datasets benchmark and introduce a suitability analysis for top-performing models based on their robustness, efficiency, and completeness of tumor region delineation. By identifying current trends, limitations, and key trade-offs, this review offers future research directions aimed at optimizing the balance between technical performance and clinical usability to improve diagnostic outcomes for glioma patients.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Research on the Accessibility of Different Colour Schemes for Web Resources for People with Colour Blindness
by
Daiva Sajek, Olena Korotenko and Tetiana Kyrychok
J. Imaging 2025, 11(8), 268; https://doi.org/10.3390/jimaging11080268 - 11 Aug 2025
Abstract
This study is devoted to the analysis of the perception of colour schemes of web resources by users with different types of colour blindness (colour vision deficiency). The purpose of this study is to develop recommendations for choosing the optimal colour scheme for web resource design that will ensure the comfortable perception of content for the broadest possible audience, including users with colour vision deficiency of various types (deuteranopia and deuteranomaly, protanopia and protanomaly, tritanopia, and tritanomaly). This article presents the results of a survey of people with different colour vision deficiencies regarding the accessibility of web resources created using different colour schemes. The colour deviation value ∆E was calculated to objectively assess changes in the perception of different colour groups by people with colour vision impairments. The conclusions of this study emphasise the importance of taking into account the needs of users with colour vision impairments when developing web resources. Specific recommendations for choosing the best colour schemes for websites are also offered, which will help increase the accessibility and effectiveness of web content for users with different types of colour blindness.
Full article
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
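For reference, the colour-deviation value ∆E used above can be computed as in the short sketch below: convert two sRGB colours to CIELAB and take the CIE76 distance. The colour-vision-deficiency simulation that would supply the second colour is assumed to happen in a separate step not shown here.
```python
# Hedged sketch of a CIE76 Delta E between two sRGB colours via CIELAB.
import numpy as np
from skimage.color import rgb2lab, deltaE_cie76

def delta_e(rgb_a, rgb_b):
    """rgb_a, rgb_b: (r, g, b) tuples with components in 0-255."""
    lab_a = rgb2lab(np.array(rgb_a, dtype=float).reshape(1, 1, 3) / 255.0)
    lab_b = rgb2lab(np.array(rgb_b, dtype=float).reshape(1, 1, 3) / 255.0)
    return float(deltaE_cie76(lab_a, lab_b)[0, 0])

# e.g., a palette colour versus a (hypothetical) simulated appearance of the same swatch
print(delta_e((220, 40, 40), (150, 110, 60)))
```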
Open Access Article
Placido Sub-Pixel Edge Detection Algorithm Based on Enhanced Mexican Hat Wavelet Transform and Improved Zernike Moments
by
Yujie Wang, Jinyu Liang, Yating Xiao, Xinfeng Liu, Jiale Li, Guangyu Cui and Quan Zhang
J. Imaging 2025, 11(8), 267; https://doi.org/10.3390/jimaging11080267 - 11 Aug 2025
Abstract
In order to meet the high-precision location requirements of the corneal Placido ring edge in corneal topographic reconstruction, this paper proposes a sub-pixel edge detection algorithm based on multi-scale and multi-position enhanced Mexican Hat Wavelet Transform and improved Zernike moment. Firstly, the image undergoes preliminary processing using a multi-scale and multi-position enhanced Mexican Hat Wavelet Transform function. Subsequently, the preliminary edge information extracted is relocated based on the Zernike moments of a 9 × 9 template. Finally, two improved adaptive edge threshold algorithms are employed to determine the actual sub-pixel edge points of the image, thereby realizing sub-pixel edge detection for corneal Placido ring images. Through comparison and analysis of edge extraction results from real human eye images obtained using the algorithm proposed in this paper and those from other existing algorithms, it is observed that the average sub-pixel edge error of other algorithms is 0.286 pixels, whereas the proposed algorithm achieves an average error of only 0.094 pixels. Furthermore, the proposed algorithm demonstrates strong robustness against noise.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Evaluation of Transfer Learning Efficacy for Surgical Suture Quality Classification on Limited Datasets
by
Roman Ishchenko, Maksim Solopov, Andrey Popandopulo, Elizaveta Chechekhina, Viktor Turchin, Fedor Popivnenko, Aleksandr Ermak, Konstantyn Ladyk, Anton Konyashin, Kirill Golubitskiy, Aleksei Burtsev and Dmitry Filimonov
J. Imaging 2025, 11(8), 266; https://doi.org/10.3390/jimaging11080266 - 8 Aug 2025
Abstract
This study evaluates the effectiveness of transfer learning with pre-trained convolutional neural networks (CNNs) for the automated binary classification of surgical suture quality (high-quality/low-quality) using photographs of three suture types: interrupted open vascular sutures (IOVS), continuous over-and-over open sutures (COOS), and interrupted laparoscopic sutures (ILS). To address the challenge of limited medical data, eight state-of-the-art CNN architectures (EfficientNetB0, ResNet50V2, MobileNetV3Large, VGG16, VGG19, InceptionV3, Xception, and DenseNet121) were trained and validated on small datasets (100–190 images per type) using 5-fold cross-validation. Performance was assessed using the F1-score, AUC-ROC, and a custom weighted stability-aware score (Score_adj). The results demonstrate that transfer learning achieves robust classification (F1 > 0.90 for IOVS/ILS, 0.79 for COOS) despite data scarcity. ResNet50V2, DenseNet121, and Xception were the most stable according to Score_adj, with ResNet50V2 achieving the highest AUC-ROC (0.959 ± 0.008) for IOVS internal view classification. GradCAM visualizations confirmed model focus on clinically relevant features (e.g., stitch uniformity, tissue apposition). These findings validate transfer learning as a powerful approach for developing objective, automated surgical skill assessment tools, reducing reliance on subjective expert evaluations while maintaining accuracy in resource-constrained settings.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)
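The evaluation protocol above (pre-trained backbones, small datasets, 5-fold cross-validation) can be sketched as follows with a ResNet50V2 backbone in Keras; the data arrays, image size, and training settings are illustrative assumptions rather than the study's setup.
```python
# Hedged sketch: 5-fold cross-validation of a fine-tuned ResNet50V2 binary classifier.
# X is an (N, 224, 224, 3) image array and y an (N,) array of 0/1 quality labels (assumed).
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score, roc_auc_score

def build_model():
    base = tf.keras.applications.ResNet50V2(include_top=False, weights="imagenet",
                                            input_shape=(224, 224, 3), pooling="avg")
    out = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def cross_validate(X, y, n_splits=5):
    scores = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, y):
        model = build_model()                       # fresh weights for every fold
        model.fit(X[train_idx], y[train_idx], epochs=10, batch_size=16, verbose=0)
        p = model.predict(X[val_idx]).ravel()
        scores.append((f1_score(y[val_idx], (p > 0.5).astype(int)),
                       roc_auc_score(y[val_idx], p)))
    return np.mean(scores, axis=0)                  # mean F1 and AUC-ROC over folds
```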
Open Access Article
A Vision Method for Detecting Citrus Separation Lines Using Line-Structured Light
by
Qingcang Yu, Song Xue and Yang Zheng
J. Imaging 2025, 11(8), 265; https://doi.org/10.3390/jimaging11080265 - 8 Aug 2025
Abstract
The detection of citrus separation lines is a crucial step in the citrus processing industry. Inspired by the achievements of line-structured light technology in surface defect detection, this paper proposes a method for detecting citrus separation lines based on line-structured light. Firstly, a gamma-corrected Otsu method is employed to extract the laser stripe region from the image. Secondly, an improved skeleton extraction algorithm is employed to mitigate the bifurcation errors inherent in original skeleton extraction algorithms while simultaneously acquiring 3D point cloud data of the citrus surface. Finally, the least squares progressive iterative approximation algorithm is applied to approximate the ideal surface curve; subsequently, principal component analysis is used to derive the normals of this ideally fitted curve. The deviation between each point (along its corresponding normal direction) and the actual geometric characteristic curve is then adopted as a quantitative index for separation lines positioning. The average similarity between the extracted separation lines and the manually defined standard separation lines reaches 92.5%. In total, 95% of the points on the separation lines obtained by this method have an error of less than 4 pixels. Experimental results demonstrate that through quantitative deviation analysis of geometric features, automatic detection and positioning of the separation lines are achieved, satisfying the requirements of high precision and non-destructiveness for automatic citrus splitting.
Full article
(This article belongs to the Topic Image Processing, Signal Processing and Their Applications)
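The gamma-corrected Otsu step that opens the pipeline above can be illustrated with a short OpenCV sketch; the gamma value and the 8-bit grayscale input are assumptions.
```python
# Hedged sketch: gamma correction (via lookup table) followed by Otsu thresholding
# to isolate the bright laser-stripe region in an 8-bit grayscale frame.
import cv2
import numpy as np

def extract_stripe(gray, gamma=0.5):
    # gamma < 1 lifts darker intensities before the global threshold is chosen
    lut = np.array([255.0 * (i / 255.0) ** gamma for i in range(256)], dtype=np.uint8)
    corrected = cv2.LUT(gray, lut)
    _, mask = cv2.threshold(corrected, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

# usage: mask = extract_stripe(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
```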
Open Access Article
Systematic and Individualized Preparation of External Ear Canal Implants: Development and Validation of an Efficient and Accurate Automated Segmentation System
by
Yanjing Luo, Mohammadtaha Kouchakinezhad, Felix Repp, Verena Scheper, Thomas Lenarz and Farnaz Matin-Mann
J. Imaging 2025, 11(8), 264; https://doi.org/10.3390/jimaging11080264 - 8 Aug 2025
Abstract
External ear canal (EEC) stenosis, often associated with cholesteatoma, carries a high risk of postoperative restenosis despite surgical intervention. While individualized implants offer promise in preventing restenosis, the high morphological variability of EECs and the lack of standardized definitions hinder systematic implant design. This study aimed to characterize individual EEC morphology and to develop a validated automated segmentation system for efficient implant preparation. Reference datasets were first generated by manual segmentation using 3D Slicer™ software version 5.2.2. Based on these, we developed a customized plugin capable of automatically identifying the maximal implantable region within the EEC and measuring its key dimensions. The accuracy of the plugin was assessed by comparing it with manual segmentation results in terms of shape, volume, length, and width. Validation was further performed using three temporal bone implantation experiments with 3D-Bioplotter©-fabricated EEC implants. The automated system demonstrated strong consistency with manual methods and significantly improved segmentation efficiency. The plugin-generated models enabled successful implant fabrication and placement in all validation tests. These results confirm the system's clinical feasibility and support its use for individualized and systematic EEC implant design. The developed tool holds potential to improve surgical planning and reduce postoperative restenosis in EEC stenosis treatment.
Full article
(This article belongs to the Special Issue Current Progress in Medical Image Segmentation)
Open Access Article
Enhancing YOLOv5 for Autonomous Driving: Efficient Attention-Based Object Detection on Edge Devices
by
Mortda A. A. Adam and Jules R. Tapamo
J. Imaging 2025, 11(8), 263; https://doi.org/10.3390/jimaging11080263 - 8 Aug 2025
Abstract
On-road vision-based systems rely on object detection to ensure vehicle safety and efficiency, making it an essential component of autonomous driving. Deep learning methods show high performance; however, they often require special hardware due to their large sizes and computational complexity, which makes real-time deployment on edge devices expensive. This study proposes lightweight object detection models based on the YOLOv5s architecture, known for its speed and accuracy. The models integrate advanced channel attention strategies, specifically the ECA module and SE attention blocks, to enhance feature selection while minimizing computational overhead. Four models were developed and trained on the KITTI dataset. The models were analyzed using key evaluation metrics to assess their effectiveness in real-time autonomous driving scenarios, including precision, recall, and mean average precision (mAP). BaseECAx2 emerged as the most efficient model for edge devices, achieving the lowest GFLOPs (13) and smallest model size (9.1 MB) without sacrificing performance. The BaseSE-ECA model demonstrated outstanding accuracy in vehicle detection, reaching a precision of 96.69% and an mAP of 98.4%, making it ideal for high-precision autonomous driving scenarios. We also assessed the models’ robustness in more challenging environments by training and testing them on the BDD-100K dataset. While the models exhibited reduced performance in complex scenarios involving low-light conditions and motion blur, this evaluation highlights potential areas for improvement in challenging real-world driving conditions. This study bridges the gap between affordability and performance, presenting lightweight, cost-effective solutions for integration into real-time autonomous vehicle systems.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
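As an illustration of the efficient channel attention (ECA) strategy integrated into YOLOv5s above, a minimal PyTorch module follows; the kernel size and where the block is inserted in the detector are assumptions.
```python
# Hedged sketch of an ECA block: global average pooling, a 1D convolution across the
# channel dimension, and a sigmoid gate that reweights each channel of the input.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x):                                  # x: (B, C, H, W)
        y = self.pool(x)                                   # (B, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)                  # (B, 1, C): channels as a sequence
        y = self.conv(y)                                   # local cross-channel interaction
        y = self.gate(y).transpose(1, 2).unsqueeze(-1)     # back to (B, C, 1, 1)
        return x * y                                       # channel-wise reweighting

# usage: out = ECA()(torch.randn(2, 256, 20, 20))
```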
Open Access Article
SABE-YOLO: Structure-Aware and Boundary-Enhanced YOLO for Weld Seam Instance Segmentation
by
Rui Wen, Wu Xie, Yong Fan and Lanlan Shen
J. Imaging 2025, 11(8), 262; https://doi.org/10.3390/jimaging11080262 - 6 Aug 2025
Abstract
Accurate weld seam recognition is essential in automated welding systems, as it directly affects path planning and welding quality. With the rapid advancement of industrial vision, weld seam instance segmentation has emerged as a prominent research focus in both academia and industry. However, existing approaches still face significant challenges in boundary perception and structural representation. Due to the inherently elongated shapes, complex geometries, and blurred edges of weld seams, current segmentation models often struggle to maintain high accuracy in practical applications. To address this issue, a novel structure-aware and boundary-enhanced YOLO (SABE-YOLO) is proposed for weld seam instance segmentation. First, a Structure-Aware Fusion Module (SAFM) is designed to enhance structural feature representation through strip pooling attention and element-wise multiplicative fusion, targeting the difficulty in extracting elongated and complex features. Second, a C2f-based Boundary-Enhanced Aggregation Module (C2f-BEAM) is constructed to improve edge feature sensitivity by integrating multi-scale boundary detail extraction, feature aggregation, and attention mechanisms. Finally, the inner minimum point distance-based intersection over union (Inner-MPDIoU) is introduced to improve localization accuracy for weld seam regions. Experimental results on the self-built weld seam image dataset show that SABE-YOLO outperforms YOLOv8n-Seg by 3 percentage points in the AP(50–95) metric, reaching 46.3%. Meanwhile, it maintains a low computational cost (18.3 GFLOPs) and a small number of parameters (6.6M), while achieving an inference speed of 127 FPS, demonstrating a favorable trade-off between segmentation accuracy and computational efficiency. The proposed method provides an effective solution for high-precision visual perception of complex weld seam structures and demonstrates strong potential for industrial application.
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
Quantitative Magnetic Resonance Imaging and Patient-Reported Outcomes in Patients Undergoing Hip Labral Repair or Reconstruction
by
Kyle S. J. Jamar, Adam Peszek, Catherine C. Alder, Trevor J. Wait, Caleb J. Wipf, Carson L. Keeter, Stephanie W. Mayer, Charles P. Ho and James W. Genuario
J. Imaging 2025, 11(8), 261; https://doi.org/10.3390/jimaging11080261 - 5 Aug 2025
Abstract
This study evaluates the relationship between preoperative cartilage quality, measured by T2 mapping, and patient-reported outcomes following labral tear treatment. We retrospectively reviewed patients aged 14–50 who underwent primary hip arthroscopy with either labral repair or reconstruction. Preoperative T2 values of femoral, acetabular, and labral tissue were assessed from MRI by blinded reviewers. International Hip Outcome Tool (iHOT-12) scores were collected preoperatively and up to two years postoperatively. Associations between T2 values and iHOT-12 scores were analyzed using univariate mixed linear models. Twenty-nine patients were included (mean age of 32.5 years, BMI 24 kg/m², 48.3% female, and 22 repairs). Across all patients, higher T2 values were associated with higher iHOT-12 scores at baseline and early postoperative timepoints (three months for cartilage and six months for labrum; p < 0.05). Lower T2 values were associated with higher 12- and 24-month iHOT-12 scores across all structures (p < 0.001). Similar trends were observed within the repair and reconstruction subgroups, with delayed negative associations correlating with worse tissue quality. T2 mapping showed time-dependent correlations with iHOT-12 scores, indicating that worse cartilage or labral quality predicts poorer long-term outcomes. These findings support the utility of T2 mapping as a preoperative tool for prognosis in hip preservation surgery.
Full article
(This article belongs to the Special Issue New Developments in Musculoskeletal Imaging)
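The univariate mixed linear models mentioned above can be set up as in the brief statsmodels sketch below, with a random intercept per patient; the synthetic data, column names, and the exact model formula are assumptions.
```python
# Hedged sketch: a mixed linear model relating a preoperative T2 value and follow-up time
# to iHOT-12 scores, with a per-patient random intercept. All data here are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
patients = np.repeat(np.arange(20), 3)                 # 20 patients, 3 timepoints each
months = np.tile([3, 12, 24], 20)
t2 = np.repeat(rng.normal(40.0, 5.0, 20), 3)           # preoperative T2 per patient (ms)
ihot12 = 90 - 0.8 * t2 + 0.5 * months + rng.normal(0, 5, 60)

df = pd.DataFrame({"patient": patients, "months": months, "t2": t2, "ihot12": ihot12})
result = smf.mixedlm("ihot12 ~ t2 + months", data=df, groups=df["patient"]).fit()
print(result.summary())
```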
Open Access Article
Evaluating the Impact of 2D MRI Slice Orientation and Location on Alzheimer’s Disease Diagnosis Using a Lightweight Convolutional Neural Network
by
Nadia A. Mohsin and Mohammed H. Abdulameer
J. Imaging 2025, 11(8), 260; https://doi.org/10.3390/jimaging11080260 - 5 Aug 2025
Abstract
Accurate detection of Alzheimer’s disease (AD) is critical yet challenging for early medical intervention. Deep learning methods, especially convolutional neural networks (CNNs), have shown promising potential for improving diagnostic accuracy using magnetic resonance imaging (MRI). This study aims to identify the most informative combination of MRI slice orientation and anatomical location for AD classification. We propose an automated framework that first selects the most relevant slices using a feature entropy-based method applied to activation maps from a pretrained CNN model. For classification, we employ a lightweight CNN architecture based on depthwise separable convolutions to efficiently analyze the selected 2D MRI slices extracted from preprocessed 3D brain scans. To further interpret model behavior, an attention mechanism is integrated to analyze which feature level contributes the most to the classification process. The model is evaluated on three binary tasks: AD vs. mild cognitive impairment (MCI), AD vs. cognitively normal (CN), and MCI vs. CN. The experimental results show the highest accuracy (97.4%) in distinguishing AD from CN when utilizing the selected slices from the ninth axial segment, followed by the tenth segment of coronal and sagittal orientations. These findings demonstrate the significance of slice location and orientation in MRI-based AD diagnosis and highlight the potential of lightweight CNNs for clinical use.
Full article
(This article belongs to the Section AI in Imaging)
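The slice-selection idea described above can be pictured with the short sketch below, which scores each 2D slice of a volume by Shannon entropy and keeps the top-ranked slices; scoring raw slice intensities here is a simplification of the paper's activation-map-based criterion, and the axis and top-k values are assumptions.
```python
# Hedged sketch: rank slices of a 3D MRI volume by Shannon entropy and keep the top k.
import numpy as np

def slice_entropy(slice_2d, bins=64):
    hist, _ = np.histogram(slice_2d, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_slices(volume, axis=0, top_k=8):
    """volume: 3D array; returns indices of the top_k most informative slices along axis."""
    scores = [slice_entropy(np.take(volume, i, axis=axis)) for i in range(volume.shape[axis])]
    return np.argsort(scores)[::-1][:top_k]

# usage: idx = select_slices(volume, axis=0)  # 'volume' loaded from a preprocessed scan
```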
News
18 August 2025
Meet Us at the 4th Digital Heritage World Congress & Expo 2025, 8–13 September 2025, Siena, Italy
12 August 2025
Meet Us at the 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025), 23–27 September 2025, Daejeon, Republic of Korea
Topics
Topic in
Applied Sciences, Bioengineering, Diagnostics, J. Imaging, Signals
Signal Analysis and Biomedical Imaging for Precision Medicine
Topic Editors: Surbhi Bhatia Khan, Mo Saraee; Deadline: 31 August 2025
Topic in
Animals, Computers, Information, J. Imaging, Veterinary Sciences
AI, Deep Learning, and Machine Learning in Veterinary Science Imaging
Topic Editors: Vitor Filipe, Lio Gonçalves, Mário Ginja; Deadline: 31 October 2025
Topic in
Applied Sciences, Electronics, MAKE, J. Imaging, Sensors
Applied Computer Vision and Pattern Recognition: 2nd Edition
Topic Editors: Antonio Fernández-Caballero, Byung-Gyu Kim; Deadline: 31 December 2025
Topic in
Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang; Deadline: 31 March 2026

Special Issues
Special Issue in
J. Imaging
Current Progress in Medical Image Segmentation
Guest Editor: Krishna Chaitanya; Deadline: 29 August 2025
Special Issue in
J. Imaging
Imaging Applications in Agriculture
Guest Editors: Pierre Gouton, Saeid Minaei, Vahid Mohammadi; Deadline: 31 August 2025
Special Issue in
J. Imaging
Techniques and Applications in Face Image Analysis
Guest Editor: Mohamed Dahmane; Deadline: 31 August 2025
Special Issue in
J. Imaging
Advances in Medical Imaging and Machine Learning
Guest Editors: Ester Bonmati Coll, Barbara Villarini; Deadline: 31 August 2025