J. Imaging, Volume 11, Issue 4 (April 2025) – 35 articles

Cover Story: Prostate cancer (PCa) is the second most common malignancy among men worldwide; however, it is highly curable if detected early. Hence, the main clinical challenge is to accurately identify those with and without cancer as early as possible. This paper introduces a novel multi-encoder cross-attention 3D architecture for assessing PCa presence in whole bi-parametric magnetic resonance imaging (MRI) volumes. With an architecture specifically designed to exploit complementary imaging features alongside clinical variables, and with the ProstateNET Imaging Archive, the largest image database worldwide for PCa mpMRI data, this study establishes new performance baselines. The proposed method paves the way towards the clinical adoption of deep learning models for accurately determining the presence of PCa in patient populations.
17 pages, 39878 KiB  
Article
Real-Time Volume-Rendering Image Denoising Based on Spatiotemporal Weighted Kernel Prediction
by Xinran Xu, Chunxiao Xu and Lingxiao Zhao
J. Imaging 2025, 11(4), 126; https://doi.org/10.3390/jimaging11040126 - 21 Apr 2025
Viewed by 218
Abstract
Volumetric Path Tracing (VPT) based on Monte Carlo (MC) sampling often requires numerous samples for high-quality images, but real-time applications limit samples to maintain interaction rates, leading to significant noise. Traditional real-time denoising methods use radiance and geometric features as neural network inputs, but lightweight networks struggle with temporal stability and complex mapping relationships, causing blurry results. To address these issues, a spatiotemporal lightweight neural network is proposed to enhance the denoising performance of VPT-rendered images with low samples per pixel. First, the reprojection technique was employed to obtain features from historical frames. Next, a dual-input convolutional neural network architecture was designed to predict filtering kernels. Radiance and geometric features were encoded independently, with the encoding of the geometric features guiding the pixel-wise fitting of the radiance feature filters. Finally, the learned filtering kernels were applied to the images for spatiotemporal filtering to produce the denoised results. The experimental results across multiple denoising datasets demonstrate that this approach outperformed the baseline models in terms of feature extraction and detail representation capabilities while effectively suppressing noise with superior performance and enhanced temporal stability. Full article
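As a rough illustration of the kernel-prediction step this abstract describes, the sketch below applies per-pixel predicted filter kernels to a single-channel image in NumPy. The array shapes, the 5x5 kernel size, and the uniform-kernel toy usage are assumptions for illustration, not the authors' implementation (which additionally reprojects features from historical frames).

```python
import numpy as np

def apply_predicted_kernels(noisy, kernels, k=5):
    """Filter each pixel with its own predicted k x k kernel.

    noisy   : (H, W) radiance image (single channel for brevity)
    kernels : (H, W, k*k) per-pixel weights, e.g. softmax outputs of a CNN
    """
    pad = k // 2
    padded = np.pad(noisy, pad, mode="reflect")
    H, W = noisy.shape
    out = np.zeros_like(noisy)
    for dy in range(k):
        for dx in range(k):
            # Accumulate each kernel tap over the whole image at once.
            out += kernels[:, :, dy * k + dx] * padded[dy:dy + H, dx:dx + W]
    return out

# Toy usage: uniform weights reduce the predicted kernels to a box filter.
noisy = np.random.rand(64, 64).astype(np.float32)
kernels = np.full((64, 64, 25), 1.0 / 25, dtype=np.float32)
denoised = apply_predicted_kernels(noisy, kernels)
```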

44 pages, 11702 KiB  
Review
Low-Light Image and Video Enhancement for More Robust Computer Vision Tasks: A Review
by Mpilo M. Tatana, Mohohlo S. Tsoeu and Rito C. Maswanganyi
J. Imaging 2025, 11(4), 125; https://doi.org/10.3390/jimaging11040125 - 21 Apr 2025
Viewed by 396
Abstract
Computer vision aims to enable machines to understand the visual world. It encompasses numerous tasks, such as action recognition, object detection, and image classification. Much research has focused on solving these tasks, but one area that remains relatively uncharted is light enhancement (LE). Low-light enhancement (LLE) is crucial because computer vision tasks fail in the absence of sufficient lighting, forcing reliance on additional peripherals such as sensors. This review sheds light on this subfield of computer vision (with a focus on video enhancement), along with the aforementioned computer vision tasks. The review analyzes both traditional and deep learning-based enhancers and provides a comparative analysis of recent models in the field. It also analyzes how popular computer vision tasks are improved and made more robust when coupled with light enhancement algorithms. Results show that deep learners outperform traditional enhancers, with supervised learners obtaining the best results followed by zero-shot learners, and that computer vision tasks improve when coupled with light enhancement. The review concludes by highlighting major findings, notably that although supervised learners obtain the best results, a shift to zero-shot learners is required owing to the scarcity of real-world data and the need for robustness to new data. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

16 pages, 2833 KiB  
Article
Evolution of Lung Disease Studied by Computed Tomography in Adults with Cystic Fibrosis Treated with Elexacaftor/Tezacaftor/Ivacaftor
by Susana Hernández-Muñiz, Paloma Caballero, Adrián Peláez, Marta Solís-García, Carmen de Benavides, Javier Collada, Ignacio Díaz-Lorenzo, Cristina Zorzo, Rosa Mar Gómez-Punter and Rosa María Girón
J. Imaging 2025, 11(4), 124; https://doi.org/10.3390/jimaging11040124 - 21 Apr 2025
Viewed by 201
Abstract
Elexacaftor–tezacaftor–ivacaftor (ETI) has shown clinical and spirometric benefits in cystic fibrosis (CF). CT remains a vital tool for diagnosing and monitoring structural lung disease. This study aimed to assess the evolution of lung disease, as evaluated through CT, in adults with CF after at least one year of ETI treatment. This ambispective observational analysis assessed lung CT scans performed before initiating ETI and after at least one year of treatment, using the modified Bhalla scoring system. For those patients with an earlier CT scan, a pre-treatment phase analysis was performed. Epidemiological, clinical, and functional parameters were evaluated. Results: Sixty-two patients were included (35 males, median age 30.4 ± 7.87 years). After at least one year of ETI, significant improvements were observed in the global CT Bhalla score (12.2 ± 2.8 vs. 14.0 ± 2.8), peribronchial thickening (1.4 ± 0.6 vs. 1.0 ± 0.4), and mucus plugging (1.6 ± 0.7 vs. 0.8 ± 0.6) (p < 0.001). Spirometry parameters increased significantly: the percentage of the predicted forced expiratory volume in the first second (ppFEV1) increased from 66.5 ± 19.8 to 77.0 ± 20.4 (p = 0.005) and forced vital capacity (ppFVC) from 80.6 ± 16.4 to 91.6 ± 14.1 (p < 0.001). Additionally, body mass index showed a significant increase. A moderate correlation was found between the Bhalla score and spirometry results. In the pre-treatment phase (n = 52), mucus plugging demonstrated a significant worsening, whereas global CT score, other subscores, and spirometry did not change significantly. Conclusions: In adults with CF, after at least one year of ETI, a significant improvement in structural lung disease was achieved, as reflected by the CT Bhalla score. Full article
(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)

17 pages, 8594 KiB  
Article
Evolutionary-Driven Convolutional Deep Belief Network for the Classification of Macular Edema in Retinal Fundus Images
by Rafael A. García-Ramírez, Ivan Cruz-Aceves, Arturo Hernández-Aguirre, Gloria P. Trujillo-Sánchez and Martha A. Hernandez-González
J. Imaging 2025, 11(4), 123; https://doi.org/10.3390/jimaging11040123 - 21 Apr 2025
Viewed by 124
Abstract
Early detection of diabetic retinopathy is critical for preserving vision in diabetic patients. The classification of lesions in retinal fundus images, particularly macular edema, is an essential diagnostic tool, yet it presents a significant learning curve for both novice and experienced ophthalmologists. To address this challenge, a novel Convolutional Deep Belief Network (CDBN) is proposed to classify image patches into three distinct categories: two types of macular edema—microhemorrhages and hard exudates—and a healthy category. The method leverages high-level feature extraction to mitigate issues arising from the high similarity of low-level features in noisy images. Additionally, a Real-Coded Genetic Algorithm optimizes the parameters of the Gabor filters and the network, ensuring optimal feature extraction and classification performance. Experimental results demonstrate that the proposed CDBN outperforms comparative models, achieving an F1 score of 0.9258. These results indicate that the architecture effectively overcomes the challenges of lesion classification in retinal images, offering a robust tool for clinical application and paving the way for advanced clinical decision support systems in diabetic retinopathy management. Full article
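For context, the Gabor-filter parameterization that a real-coded genetic algorithm would tune can be sketched with OpenCV. The genome layout and normalization below are illustrative assumptions; only the general idea (evolving sigma/theta/lambda/gamma per filter) comes from the abstract.

```python
import cv2
import numpy as np

def gabor_bank(genome, ksize=21):
    """Build a Gabor filter bank from a real-coded genome.

    genome: flat array of (sigma, theta, lambd, gamma) quadruples, the
    kind of real-valued vector a genetic algorithm would evolve.
    """
    filters = []
    for sigma, theta, lambd, gamma in np.asarray(genome).reshape(-1, 4):
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd,
                                  gamma, psi=0, ktype=cv2.CV_32F)
        filters.append(kern / (np.abs(kern).sum() + 1e-8))  # normalize response
    return filters

def filter_patch(patch, filters):
    # Feature maps that would feed the network's first layer.
    return [cv2.filter2D(patch, cv2.CV_32F, k) for k in filters]
```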

15 pages, 3751 KiB  
Article
Classification of Parotid Tumors with Robust Radiomic Features from DCE- and DW-MRI
by Francesca Angelone, Silvia Tortora, Francesca Patella, Maria Chiara Bonanno, Maria Teresa Contaldo, Mario Sansone, Gianpaolo Carrafiello, Francesco Amato and Alfonso Maria Ponsiglione
J. Imaging 2025, 11(4), 122; https://doi.org/10.3390/jimaging11040122 - 17 Apr 2025
Viewed by 228
Abstract
This study aims to evaluate the role of MRI-based radiomic analysis and machine learning using both DWI with multiple B-values and dynamic contrast-enhanced T1-weighted sequences to differentiate benign (B) and malignant (M) parotid tumors. Patients underwent DCE- and DW-MRI. An expert radiologist performed the manual selection of 3D ROIs. Classification of malignant vs. benign parotid tumors was based on radiomic features extracted from DCE-based and DW-based parametric maps. Care was taken to evaluate robustness and to avoid bias in feature selection. Several classifiers were employed. Sensitivity and specificity ranged from 0.6 to 0.8. The combination of LASSO + neural networks achieved the highest performance (0.76 sensitivity and 0.75 specificity). Our study identified a few DCE-based radiomic features that are robust with respect to ROI selection and can be effectively adopted in classifying malignant vs. benign parotid tumors. Full article
(This article belongs to the Section Medical Imaging)

31 pages, 203525 KiB  
Article
Implementation of Chaotic Synchronization and Artificial Neural Networks in Modified OTP Scheme for Image Encryption
by Hristina Stoycheva, Georgi Mihalev, Stanimir Sadinov and Krasen Angelov
J. Imaging 2025, 11(4), 121; https://doi.org/10.3390/jimaging11040121 - 17 Apr 2025
Viewed by 234
Abstract
This paper presents a modified image encryption scheme based on the OTP (One-Time Pad) algorithm, combining chaotic synchronization and artificial neural networks (ANNs) for improved security and efficiency. The scheme uses chaotic synchronization based on feedback control to create complex and unique encryption keys. Additionally, ANNs are used to approximate time functions, creating a neural encoding key, which adds a further layer of complexity to the encryption process. The proposed scheme integrates static, chaotic, and neural keys in a multilayer structure, providing high resistance against statistical and cryptographic attacks. The results show that the proposed methodology achieves entropy values close to the theoretical maximum, effectively destroys the correlation between pixels, and demonstrates high sensitivity to variations in the input data. The proposed scheme shows very good feasibility in terms of both security and efficiency, offering a reliable solution for secure image transmission and storage. This is proven by a study of resistance to various cryptographic attacks, including brute-force attacks, differential attacks, noise and data-cut attacks, key sensitivity analysis, and computational complexity analysis. Full article
(This article belongs to the Section Image and Video Processing)
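A minimal sketch of the OTP-style XOR layer follows. Note that the key source here is a plain logistic map, swapped in for the paper's feedback-controlled chaotic synchronization and ANN-approximated keys, so it only demonstrates the keystream-XOR mechanics, not the security of the proposed scheme.

```python
import numpy as np

def logistic_keystream(n, x0=0.631, r=3.99):
    """Generate n chaotic key bytes from a logistic map (a stand-in
    for the paper's synchronized chaotic system)."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)           # chaotic iteration in (0, 1)
        out[i] = int(x * 256) % 256     # quantize to one key byte
    return out

def otp_xor(image, x0=0.631):
    """XOR an image with the keystream; applying it twice decrypts."""
    flat = image.astype(np.uint8).ravel()
    key = logistic_keystream(flat.size, x0)
    return (flat ^ key).reshape(image.shape)
```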

14 pages, 733 KiB  
Article
RGB Color Space-Enhanced Training Data Generation for Cucumber Classification
by Hotaka Hoshino, Takuya Shindo, Takefumi Hiraguri and Nobuhiko Itoh
J. Imaging 2025, 11(4), 120; https://doi.org/10.3390/jimaging11040120 - 17 Apr 2025
Viewed by 173
Abstract
Cucumber farmers classify harvested cucumbers based on specific criteria before they are introduced to the market. During peak harvesting periods, farmers must process a large volume of cucumbers; however, the classification task requires specialized knowledge and experience. This expertise-dependent process poses a significant challenge, as it prevents untrained individuals, including hired workers, from effectively assisting in classification, thereby necessitating that farmers perform the task themselves. To address this issue, this study aims to develop a classification system that enables individuals, regardless of their level of expertise, to accurately classify cucumbers. The proposed system employs a convolutional neural network (CNN) to process cucumber images and generate classification results. The CNN used in this study consists of a total of 11 layers: 2 convolution layers, 2 pooling layers, 3 dense layers, and 4 dropout layers. To facilitate the widespread adoption of this system, improving classification accuracy is imperative. In this paper, we propose a method for embedding information related to cucumber length, bend, and thickness into the background space of cucumber images when creating training data. Specifically, this method encodes these attributes into the RGB color space, allowing the background color to vary based on the cucumber's length, bend, and thickness. The effectiveness of the proposed method is validated through an evaluation of multi-class classification metrics, including accuracy, recall, precision, and F-measure, using cucumbers classified based on the criteria established by an actual agricultural cooperative. The experimental results demonstrate that the proposed method improves these evaluation metrics, thereby enhancing the overall performance of the system. Specifically, the proposed method achieved 79.1% accuracy, while the method without the RGB color space encoding achieved 70.1% accuracy, indicating that the proposed method performs 1.1 times better than the conventional method. Full article
(This article belongs to the Section Image and Video Processing)
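The background-encoding idea can be sketched directly: scale each attribute to 0-255 and paint it into one RGB channel of the background. The attribute ranges and channel assignment below are hypothetical; the abstract does not specify them.

```python
import numpy as np

def encode_background(img, mask, length_mm, bend_mm, thick_mm,
                      max_len=300.0, max_bend=30.0, max_thick=40.0):
    """Paint the background with a color that encodes shape attributes.

    img  : (H, W, 3) uint8 cucumber photo
    mask : (H, W) bool, True on cucumber pixels
    The max_* ranges are illustrative, not the paper's values: each
    attribute is scaled to 0-255 and written to one RGB channel.
    """
    r = int(np.clip(length_mm / max_len, 0, 1) * 255)   # length    -> R
    g = int(np.clip(bend_mm / max_bend, 0, 1) * 255)    # bend      -> G
    b = int(np.clip(thick_mm / max_thick, 0, 1) * 255)  # thickness -> B
    out = img.copy()
    out[~mask] = (r, g, b)  # overwrite only the background pixels
    return out
```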

22 pages, 13074 KiB  
Review
Computed Tomography Imaging of Thoracic Aortic Surgery: Distinguishing Life-Saving Repairs from Life-Threatening Complications
by Marco Fogante, Paolo Esposto Pirani, Fatjon Cela, Jacopo Alfonsi, Corrado Tagliati, Liliana Balardi, Giulio Argalia, Marco Di Eusanio and Nicolò Schicchi
J. Imaging 2025, 11(4), 119; https://doi.org/10.3390/jimaging11040119 - 17 Apr 2025
Viewed by 177
Abstract
Thoracic aortic pathology encompasses a spectrum of life-threatening conditions that demand prompt diagnosis and intervention. Significant advancements in surgical management, including open repair, endovascular aortic repair, and hybrid techniques, have markedly enhanced patient outcomes. However, these procedures necessitate meticulous imaging follow-up to identify potential complications. Computed tomography angiography remains the gold standard for evaluating aortic pathology, guiding surgical planning, and monitoring postoperative changes. A thorough understanding of the characteristic imaging features associated with various aortic surgical techniques is crucial for precise assessment, enhancing postoperative surveillance, and optimizing patient management. Distinguishing between surgical complications and postoperative findings is vital to prevent misdiagnosis. This review examines the imaging characteristics of thoracic aortic diseases and their corresponding surgical interventions, emphasizing the differentiation between expected postoperative findings and true pathological conditions. This approach aims to facilitate accurate diagnosis and effective management of complications, ultimately improving patient care. Full article
(This article belongs to the Section Medical Imaging)

18 pages, 4568 KiB  
Article
Impact of Display Pixel–Aperture Ratio on Perceived Roughness, Glossiness, and Transparency
by Kosei Aketagawa, Midori Tanaka and Takahiko Horiuchi
J. Imaging 2025, 11(4), 118; https://doi.org/10.3390/jimaging11040118 - 16 Apr 2025
Viewed by 108
Abstract
Shitsukan, which encompasses the perception of roughness, glossiness, and transparency/translucency, represents the comprehensive visual appearance of objects and plays a crucial role in accurate reproduction across various fields, including manufacturing and imaging technologies. This study experimentally examines the impact of the pixel–aperture ratio on the perception of roughness, glossiness, and transparency. A visual evaluation experiment was conducted using natural images presented on stimuli with pixel–aperture ratios of 100% and 6%, employing an RGB sub-pixel array. The results demonstrated that the pixel–aperture ratio significantly affects the perception of glossiness and transparency, with the 100% pixel–aperture ratio producing a statistically significant effect compared to the 6% condition. However, roughness perception varied substantially among the observers, and no statistically significant effect was observed. Nonetheless, when comparing two observer clusters identified through clustering analysis, the cluster favoring the 100% pixel–aperture ratio exhibited “Huge” effect sizes for all perceptual attributes. Additionally, the findings indicate that the degree of influence of pixel–aperture ratio on glossiness and transparency is not constant and can vary depending on individual observer differences and image characteristics. Full article
(This article belongs to the Special Issue Color in Image Processing and Computer Vision)

14 pages, 3375 KiB  
Article
YOLO-Tryppa: A Novel YOLO-Based Approach for Rapid and Accurate Detection of Small Trypanosoma Parasites
by Davide Antonio Mura, Luca Zedda, Andrea Loddo and Cecilia Di Ruberto
J. Imaging 2025, 11(4), 117; https://doi.org/10.3390/jimaging11040117 - 15 Apr 2025
Viewed by 253
Abstract
Early detection of Trypanosoma parasites is critical for the prompt treatment of trypanosomiasis, a neglected tropical disease that poses severe health and socioeconomic challenges in affected regions. To address the limitations of traditional manual microscopy and prior automated methods, we propose YOLO-Tryppa, a novel YOLO-based framework specifically engineered for the rapid and accurate detection of small Trypanosoma parasites in microscopy images. YOLO-Tryppa incorporates ghost convolutions to reduce computational complexity while maintaining robust feature extraction and introduces a dedicated P2 prediction head to improve the localization of small objects. By eliminating the redundant P5 prediction head, the proposed approach achieves a significantly lower parameter count and reduced GFLOPs. Experimental results on the public Tryp dataset demonstrate that YOLO-Tryppa outperforms the previous state of the art by achieving an AP50 of 71.3%, thereby setting a new benchmark for both accuracy and efficiency. These improvements make YOLO-Tryppa particularly well-suited for deployment in resource-constrained settings, facilitating more rapid and reliable diagnostic practices. Full article
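The ghost convolution the abstract mentions is a published building block (a few dense filters plus cheap depthwise "ghost" features); a minimal PyTorch sketch follows. The channel ratio and activation are illustrative, and this is not the authors' YOLO-Tryppa code.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a small set of dense filters plus cheap
    depthwise 'ghost' features, roughly halving the FLOPs of a
    standard convolution with the same output width."""
    def __init__(self, c_in, c_out, k=3, ratio=2):
        super().__init__()
        c_primary = c_out // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_primary), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise op generates the ghosts
            nn.Conv2d(c_primary, c_out - c_primary, 5, padding=2,
                      groups=c_primary, bias=False),
            nn.BatchNorm2d(c_out - c_primary), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```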

29 pages, 6364 KiB  
Article
Face Anti-Spoofing Based on Adaptive Channel Enhancement and Intra-Class Constraint
by Ye Li, Wenzhe Sun, Zuhe Li and Xiang Guo
J. Imaging 2025, 11(4), 116; https://doi.org/10.3390/jimaging11040116 - 10 Apr 2025
Viewed by 255
Abstract
Face anti-spoofing detection is crucial for identity verification and security monitoring. However, existing single-modal models struggle with feature extraction under complex lighting conditions and background variations. Moreover, the feature distributions of live and spoofed samples often overlap, resulting in suboptimal classification performance. To address these issues, we propose a jointly optimized framework integrating the Enhanced Channel Attention (ECA) mechanism and the Intra-Class Differentiator (ICD). The ECA module extracts features through deep convolution, while the Bottleneck Reconstruction Module (BRM) employs a channel compression–expansion mechanism to refine spatial feature selection. Furthermore, the channel attention mechanism enhances key channel representation. Meanwhile, the ICD mechanism enforces intra-class compactness and inter-class separability, optimizing feature distribution both within and across classes, thereby improving feature learning and generalization performance. Experimental results show that our framework achieves average classification error rates (ACERs) of 2.45%, 1.16%, 1.74%, and 2.17% on the CASIA-SURF, CASIA-SURF CeFA, CASIA-FASD, and OULU-NPU datasets, outperforming existing methods. Full article
(This article belongs to the Section Biometrics, Forensics, and Security)
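A center-loss-style stand-in for the intra-class constraint makes the idea concrete: pull embeddings toward their class center and push the two centers apart. The exact ICD formulation in the paper may differ; the margin and shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def intra_inter_loss(features, labels, centers, margin=1.0):
    """Enforce intra-class compactness and inter-class separability.

    features : (N, D) embeddings
    labels   : (N,) long tensor in {0, 1} (live / spoof)
    centers  : (2, D) learnable class centers
    """
    # Pull each sample toward its own class center (compactness).
    compact = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    # Push the two class centers at least `margin` apart (separability).
    sep = F.relu(margin - torch.norm(centers[0] - centers[1]))
    return compact + sep
```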

22 pages, 3420 KiB  
Article
A Hybrid CNN Framework DLI-Net for Acne Detection with XAI
by Shaila Sharmin, Fahmid Al Farid, Md. Jihad, Shakila Rahman, Jia Uddin, Rayhan Kabir Rafi, Radia Hossan and Hezerul Abdul Karim
J. Imaging 2025, 11(4), 115; https://doi.org/10.3390/jimaging11040115 - 10 Apr 2025
Viewed by 547
Abstract
Acne is a prevalent skin condition that can significantly impact individuals’ psychological and physiological well-being. Detecting acne lesions is crucial for improving dermatological care and providing timely treatment. Numerous studies have explored the application of deep learning models to enhance the accuracy and speed of acne diagnoses. This study introduces a novel hybrid model that combines DeepLabV3 for precise image segmentation with InceptionV3 for classification, offering an enhanced solution for acne detection. The DeepLabV3 model isolates acne lesions and generates accurate segmentation masks, while InceptionV3 efficiently classifies the different types of acne, improving the overall diagnostic accuracy. The model was trained using a custom dataset and evaluated using advanced optimization techniques. The hybrid model achieved exceptional performance with a validation accuracy of 97%, a test accuracy of 97%, an F1 score of 0.97, a precision of 0.97, and a recall of 0.97, surpassing many of the existing baseline models. To enhance its interpretability further, Grad-CAM (Gradient-Weighted Class Activation Mapping) is utilized to visualize the regions of the image that the model focuses on during predictions, providing transparent insights into the decision-making process. This study underscores the transformative potential of AI in dermatology, offering a robust solution for acne detection and classification, which can significantly improve clinical decision making and patient outcomes. Full article
(This article belongs to the Section Medical Imaging)

19 pages, 28051 KiB  
Article
WEDM: Wavelet-Enhanced Diffusion with Multi-Stage Frequency Learning for Underwater Image Enhancement
by Junhao Chen, Sichao Ye, Xiong Ouyang and Jiayan Zhuang
J. Imaging 2025, 11(4), 114; https://doi.org/10.3390/jimaging11040114 - 9 Apr 2025
Viewed by 326
Abstract
Underwater image enhancement (UIE) is inherently challenging due to complex degradation effects such as light absorption and scattering, which result in color distortion and a loss of fine details. Most existing methods focus on spatial-domain processing, often neglecting the frequency-domain characteristics that are crucial for effectively restoring textures and edges. In this paper, we propose a novel UIE framework, the Wavelet-based Enhancement Diffusion Model (WEDM), which integrates frequency-domain decomposition with diffusion models. The WEDM consists of two main modules: the Wavelet Color Compensation Module (WCCM) for color correction in the LAB space using discrete wavelet transform, and the Wavelet Diffusion Module (WDM), which replaces traditional convolutions with wavelet-based operations to preserve multi-scale frequency features. By combining residual denoising diffusion with frequency-specific processing, the WEDM effectively reduces noise amplification and high-frequency blurring. Ablation studies further demonstrate the essential roles of the WCCM and WDM in improving color fidelity and texture details. Our framework offers a robust solution for underwater visual tasks, with promising applications in marine exploration and ecological monitoring. Full article
(This article belongs to the Special Issue Underwater Imaging (2nd Edition))
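A toy version of the wavelet-domain color step might look like the following: decompose a LAB chroma channel with a one-level DWT and amplify its low-frequency band. The haar wavelet and gain value are assumptions; the WCCM itself is more elaborate.

```python
import numpy as np
import pywt
from skimage import color

def wavelet_color_boost(rgb, gain=1.3):
    """Amplify low-frequency chroma in LAB via a one-level DWT.

    Illustrates the wavelet-domain color-correction idea only; the
    single-level haar transform and gain are assumptions.
    """
    lab = color.rgb2lab(rgb)
    for ch in (1, 2):  # a*, b* chroma channels
        cA, (cH, cV, cD) = pywt.dwt2(lab[..., ch], "haar")
        # Boost the approximation band, then reconstruct the channel.
        rec = pywt.idwt2((cA * gain, (cH, cV, cD)), "haar")
        lab[..., ch] = rec[: lab.shape[0], : lab.shape[1]]
    return color.lab2rgb(lab)
```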

13 pages, 2935 KiB  
Article
Recurrence Quantification Analysis for Scene Change Detection and Foreground/Background Segmentation in Videos
by Theodora Kyprianidi, Effrosyni Doutsi and Panagiotis Tsakalides
J. Imaging 2025, 11(4), 113; https://doi.org/10.3390/jimaging11040113 - 8 Apr 2025
Viewed by 278
Abstract
This paper presents the mathematical framework of Recurrence Quantification Analysis (RQA) for dynamic video processing, exploring its applications in two primary tasks: scene change detection and adaptive foreground/background segmentation. Originally developed for time series analysis, RQA examines the recurrence of states within a dynamic system. When applied to video streams, RQA detects recurrent patterns by leveraging the temporal dynamics of video frames. This approach offers a computationally efficient and robust alternative to traditional deep learning methods, which often demand extensive training data and high computational power. Our approach is evaluated on three annotated video datasets: Autoshot, RAI, and BBC Planet Earth, where it demonstrates effectiveness in detecting abrupt scene changes, achieving results comparable to state-of-the-art techniques. We also apply RQA to foreground/background segmentation using the UCF101 and DAVIS datasets, where it accurately distinguishes between foreground motion and static background regions. Through the examination of heatmaps based on the embedding dimension and Recurrence Plots (RPs), we show that RQA provides precise segmentation, with RPs offering clearer delineation of foreground objects. Our findings indicate that RQA is a promising, flexible, and computationally efficient approach to video analysis, with potential applications across various domains requiring dynamic video processing. Full article
(This article belongs to the Section Image and Video Processing)
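The core RQA construction is easy to state in code: embed each frame as a state vector, threshold pairwise distances into a recurrence matrix, and read off measures such as the recurrence rate. The normalization and threshold below are free choices, not the paper's settings.

```python
import numpy as np

def recurrence_plot(frames, eps=0.1):
    """Binary recurrence matrix over video frames.

    frames: (T, H, W) grayscale video; each frame is flattened into a
    state vector, and eps is the recurrence threshold (a free parameter).
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise
    return (D < eps).astype(np.uint8)

def recurrence_rate(rp):
    # Fraction of recurrent pairs, one of the standard RQA measures;
    # a sharp drop between consecutive frames signals a scene change.
    return rp.mean()
```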

21 pages, 15502 KiB  
Article
Multi-Scale Spatiotemporal Feature Enhancement and Recursive Motion Compensation for Satellite Video Geographic Registration
by Yu Geng, Jingguo Lv, Shuwei Huang and Boyu Wang
J. Imaging 2025, 11(4), 112; https://doi.org/10.3390/jimaging11040112 - 8 Apr 2025
Viewed by 265
Abstract
Satellite video geographic alignment can be applied to target detection and tracking, true 3D scene construction, image geometry measurement, etc., and is a necessary preprocessing step for satellite video applications. In this paper, a multi-scale spatiotemporal feature enhancement and recursive motion compensation method for satellite video geographic alignment is proposed. Building on the SuperGlue matching algorithm, the method achieves automatic matching of inter-frame image points by introducing multi-scale dilated attention (MSDA) to enhance feature extraction and adopting a joint multi-frame optimization strategy (MFMO); it designs a recursive motion compensation model (RMCM) to eliminate the cumulative effect of orbit error and improve the accuracy of inter-frame image point matching; and it uses a rational function model to establish the geometric mapping between video and ground points, thereby realizing the georeferencing of the satellite video. The experimental results show that the method achieves inter-frame matching accuracy at the 0.8-pixel level and a georeferencing error of 3 m, a significant improvement over the traditional single-frame method, and can provide a reference for subsequent related research. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

28 pages, 4886 KiB  
Article
The Aesthetic Appreciation of Multi-Stable Images
by Levin Saracbasi and Heiko Hecht
J. Imaging 2025, 11(4), 111; https://doi.org/10.3390/jimaging11040111 - 4 Apr 2025
Viewed by 375
Abstract
Does the quality that renders multi-stable images fascinating, the sudden perceptual reorganization, the switching from one interpretation into another, also make these images appear beautiful? Or is the aesthetic quality of multi-stable figures unrelated to the ease with which they switch? Across two experiments, we presented multi-stable images and manipulated their perceptual stability. We also presented their unambiguous components in isolation. In the first experiment, this manipulation targeted the inherent stimulus stability through properties like figural size and composition. The second experiment added an instruction for observers to actively control the stability, by attempting to either enhance or prevent perceptual switches as best they could. We found that higher stability was associated with higher liking, positive valence, and lower arousal. This increase in appreciation was mainly driven by inherent stimulus properties. The stability instruction only increased the liking of figures that had been comparatively stable to begin with. We conclude that the fascinating feature of multi-stable images does not contribute to their aesthetic liking. In fact, perceptual switching is detrimental to it. Processing fluency can explain this counterintuitive finding. We also discuss the role of ambiguity in the aesthetic quality of multi-stable images. Full article

18 pages, 4882 KiB  
Review
Artificial Intelligence in Placental Pathology: New Diagnostic Imaging Tools in Evolution and in Perspective
by Antonio d’Amati, Giorgio Maria Baldini, Tommaso Difonzo, Angela Santoro, Miriam Dellino, Gerardo Cazzato, Antonio Malvasi, Antonella Vimercati, Leonardo Resta, Gian Franco Zannoni and Eliano Cascardi
J. Imaging 2025, 11(4), 110; https://doi.org/10.3390/jimaging11040110 - 3 Apr 2025
Viewed by 401
Abstract
Artificial intelligence (AI) has emerged as a transformative tool in placental pathology, offering novel diagnostic methods that promise to improve accuracy, reduce inter-observer variability, and positively impact pregnancy outcomes. The primary objective of this review is to summarize recent developments in AI applications tailored specifically to placental histopathology. Current AI-driven approaches include advanced digital image analysis, three-dimensional placental reconstruction, and deep learning models such as GestAltNet for precise gestational age estimation and automated identification of histological lesions, including decidual vasculopathy and maternal vascular malperfusion. Despite these advancements, significant challenges remain, notably dataset heterogeneity, interpretative limitations of current AI algorithms, and issues regarding model transparency. We critically address these limitations by proposing targeted solutions, such as augmenting training datasets with annotated artifacts, promoting explainable AI methods, and enhancing cross-institutional collaborations. Finally, we outline future research directions, emphasizing the refinement of AI algorithms for routine clinical integration and fostering interdisciplinary cooperation among pathologists, computational researchers, and clinical specialists. Full article
(This article belongs to the Section Medical Imaging)

16 pages, 5365 KiB  
Article
Validation of Quantitative Ultrasound and Texture Derivative Analyses-Based Model for Upfront Prediction of Neoadjuvant Chemotherapy Response in Breast Cancer
by Adrian Wai Chan, Lakshmanan Sannachi, Daniel Moore-Palhares, Archya Dasgupta, Sonal Gandhi, Rossanna Pezo, Andrea Eisen, Ellen Warner, Frances C. Wright, Nicole Look Hong, Ali Sadeghi-Naini, Mia Skarpathiotakis, Belinda Curpen, Carrie Betel, Michael C. Kolios, Maureen Trudeau and Gregory J. Czarnota
J. Imaging 2025, 11(4), 109; https://doi.org/10.3390/jimaging11040109 - 3 Apr 2025
Viewed by 302
Abstract
This work was conducted in order to validate a pre-treatment quantitative ultrasound (QUS) and texture derivative analyses-based prediction model proposed in our previous study to identify responders and non-responders to neoadjuvant chemotherapy in patients with breast cancer. The validation cohort consisted of 56 breast cancer patients diagnosed between the years 2018 and 2021. Among all patients, 53 were treated with neoadjuvant chemotherapy and three had unplanned changes in their chemotherapy cycles. Radio Frequency (RF) data were collected volumetrically prior to the start of chemotherapy. In addition to the tumour region (core), a 5 mm tumour margin was also chosen for parameter estimation. The prediction model, which was developed previously based on quantitative ultrasound, texture derivatives, and tumour molecular subtypes, was used to identify responders and non-responders. The actual response, which was determined by clinical and pathological assessment after lumpectomy or mastectomy, was then compared to the predicted response. The sensitivity, specificity, positive predictive value, negative predictive value, and F1 score for determining chemotherapy response of all patients in the validation cohort were 94%, 67%, 96%, 57%, and 95%, respectively. Removing patients who had unplanned changes in their chemotherapy resulted in a sensitivity, specificity, positive predictive value, negative predictive value, and F1 score of 94%, 100%, 100%, 50%, and 97%, respectively. Explanations for the misclassified cases included unplanned modifications made to the type of chemotherapy during treatment, inherent limitations of the predictive model, presence of DCIS in the tumour structure, and an ill-defined tumour border in a minority of cases. This is the first time the model has been validated in an independent cohort of patients to predict tumour response to neoadjuvant chemotherapy using quantitative ultrasound, texture derivative, and molecular features in patients with breast cancer. Further research is needed to improve the positive predictive value and to evaluate whether the treatment outcome can be improved in predicted non-responders by switching to other treatment options. Full article
(This article belongs to the Section AI in Imaging)
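For readers who want the definitions behind these numbers, the standard formulas are below. The example counts are a reconstruction consistent with the reported 94%/67%/96%/57%/95% figures for the 56-patient cohort; the abstract does not give the actual confusion matrix.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard screening-test metrics from a 2x2 confusion matrix:
    Sensitivity = TP/(TP+FN), Specificity = TN/(TN+FP),
    PPV = TP/(TP+FP), NPV = TN/(TN+FN), F1 = 2*PPV*Sens/(PPV+Sens).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)
    return sens, spec, ppv, npv, f1

# Illustrative counts only (tp+fp+tn+fn = 56); these happen to
# reproduce the reported 94%, 67%, 96%, 57%, 95% to rounding.
print(diagnostic_metrics(tp=47, fp=2, tn=4, fn=3))
```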

18 pages, 4664 KiB  
Article
Local Binary Pattern–Cycle Generative Adversarial Network Transfer: Transforming Image Style from Day to Night
by Abeer Almohamade, Salma Kammoun and Fawaz Alsolami
J. Imaging 2025, 11(4), 108; https://doi.org/10.3390/jimaging11040108 - 31 Mar 2025
Viewed by 286
Abstract
Transforming images from day style to night style is crucial for enhancing perception in autonomous driving and smart surveillance. However, existing CycleGAN-based approaches struggle with texture loss, structural inconsistencies, and high computational costs. To overcome these challenges, we developed LBP-CycleGAN, a modification of CycleGAN that exploits the Local Binary Pattern (LBP) to extract texture details, unlike traditional CycleGAN, which relies heavily on color transformations. Our model leverages LBP-based single-channel inputs, ensuring sharper, more consistent night-time textures. We evaluated three model variations: (1) LBP-CycleGAN with a self-attention mechanism in both the generator and discriminator, (2) LBP-CycleGAN with a self-attention mechanism in the discriminator only, and (3) LBP-CycleGAN without a self-attention mechanism. Our results demonstrate that the LBP-CycleGAN model without self-attention outperformed the other models, achieving a superior texture quality while significantly reducing the training time and computational overhead. This work opens up new possibilities for efficient, high-fidelity night-time image translation in real-world applications, including autonomous driving and low-light vision systems. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
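The LBP input channel such a model relies on can be computed with scikit-image; the neighborhood size and "uniform" method below are common defaults, not necessarily the paper's choices.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern

def lbp_channel(rgb, P=8, R=1.0):
    """Single-channel LBP texture map of the kind fed to the CycleGAN.

    P neighbors on a circle of radius R; 'uniform' patterns are a
    common choice, though the paper's exact LBP variant may differ.
    """
    gray = rgb2gray(rgb)
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    return (lbp / lbp.max()).astype(np.float32)  # normalize to [0, 1]
```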

22 pages, 5756 KiB  
Article
Optimizing Digital Image Quality for Improved Skin Cancer Detection
by Bogdan Dugonik, Marjan Golob, Marko Marhl and Aleksandra Dugonik
J. Imaging 2025, 11(4), 107; https://doi.org/10.3390/jimaging11040107 - 31 Mar 2025
Viewed by 321
Abstract
The rising incidence of skin cancer, particularly melanoma, underscores the need for improved diagnostic tools in dermatology. Accurate imaging plays a crucial role in early detection, yet challenges related to color accuracy, image distortion, and resolution persist, leading to diagnostic errors. This study addresses these issues by evaluating color reproduction accuracy across various imaging devices and lighting conditions. Using a ColorChecker test chart, color deviations were measured through Euclidean distances (ΔE*, ΔC*), and nonlinear color differences (ΔE00, ΔC00), while the color rendering index (CRI) and television lighting consistency index (TLCI) were used to evaluate the influence of light sources on image accuracy. Significant color discrepancies were identified among mobile phones, DSLRs, and mirrorless cameras, with inadequate dermatoscope lighting systems contributing to further inaccuracies. We demonstrate practical applications, including manual camera adjustments, grayscale reference cards, post-processing techniques, and optimized lighting conditions, to improve color accuracy. This study provides applicable solutions for enhancing color accuracy in dermatological imaging, emphasizing the need for standardized calibration techniques and imaging protocols to improve diagnostic reliability, support AI-assisted skin cancer detection, and contribute to high-quality image databases for clinical and automated analysis. Full article
(This article belongs to the Special Issue Novel Approaches to Image Quality Assessment)
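The CIEDE2000 color-difference measurements described here are straightforward to reproduce with scikit-image; the patch layout and value range in this sketch are assumptions.

```python
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def patch_delta_e(captured_rgb, reference_rgb):
    """Mean CIEDE2000 difference between captured and reference patches.

    Inputs are (N, 3) float RGB rows in [0, 1], one row per ColorChecker
    patch; values around 2-3 are commonly cited as just-noticeable.
    """
    lab_c = rgb2lab(captured_rgb.reshape(-1, 1, 3))
    lab_r = rgb2lab(reference_rgb.reshape(-1, 1, 3))
    return float(np.mean(deltaE_ciede2000(lab_c, lab_r)))
```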

12 pages, 1100 KiB  
Article
Lightweight U-Net for Blood Vessels Segmentation in X-Ray Coronary Angiography
by Jesus Salvador Ramos-Cortez, Dora E. Alvarado-Carrillo, Emmanuel Ovalle-Magallanes and Juan Gabriel Avina-Cervantes
J. Imaging 2025, 11(4), 106; https://doi.org/10.3390/jimaging11040106 - 30 Mar 2025
Viewed by 282
Abstract
Blood vessel segmentation in X-ray coronary angiography (XCA) plays a crucial role in diagnosing cardiovascular diseases, enabling a precise assessment of arterial structures. However, segmentation is challenging due to a low signal-to-noise ratio, interfering background structures, and vessel bifurcations, which hinder the accuracy of deep learning models. Additionally, deep learning models for this task often require high computational resources, limiting their practical application in real-time clinical settings. This study proposes a lightweight variant of the U-Net architecture using a structured kernel pruning strategy inspired by the Lottery Ticket Hypothesis. The pruning method systematically removes entire convolutional filters from each layer based on a global reduction factor, generating compact subnetworks that retain key representational capacity. This results in a significantly smaller model without compromising the segmentation performance. This approach is evaluated on two benchmark datasets, demonstrating consistent improvements in segmentation accuracy compared to the vanilla U-Net. Additionally, model complexity is significantly reduced from 31 M to 1.9 M parameters, improving efficiency while maintaining high segmentation quality. Full article
(This article belongs to the Section Medical Imaging)
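A minimal sketch of structured filter pruning follows, assuming an L1-norm ranking criterion; the paper's Lottery-Ticket-inspired procedure with a global reduction factor is more involved than this single-layer illustration.

```python
import torch
import torch.nn as nn

def prune_conv(conv, keep_ratio=0.25):
    """Keep only the highest-L1-norm filters of a conv layer.

    Ranks whole filters by the L1 norm of their weights and rebuilds a
    thinner layer, removing entire output channels (structured pruning).
    """
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # per-filter L1
    keep = torch.topk(scores, n_keep).indices
    slim = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                     conv.stride, conv.padding,
                     bias=conv.bias is not None)
    slim.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        slim.bias.data = conv.bias.data[keep].clone()
    # 'keep' is needed to slice the next layer's input channels to match.
    return slim, keep
```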

19 pages, 9360 KiB  
Article
Inspection of Defective Glass Bottle Mouths Using Machine Learning
by Daiki Tomita and Yue Bao
J. Imaging 2025, 11(4), 105; https://doi.org/10.3390/jimaging11040105 - 29 Mar 2025
Viewed by 241
Abstract
In this study, we proposed a method for detecting chips in the mouth of glass bottles using machine learning. In recent years, Japanese cosmetic glass bottles have gained attention for their advancements in manufacturing technology and eco-friendliness through the use of recycled glass, leading to an increase in the volume of glass bottle exports overseas. Although cosmetic bottles are subject to strict quality inspections from the standpoint of safety, the complicated shape of the glass bottle mouths makes automated inspections difficult, and visual inspections have been the norm. Visual inspections conducted by workers have become problematic because the standard of judgment differs from worker to worker and inspection accuracy deteriorates after long hours of work. To address these issues, the development of inspection systems for glass bottles using image processing and machine learning has been actively pursued. While conventional image processing methods can detect chips in glass bottles, they target bottles without screw threads; for the threaded bottles considered in this study, light from the source is diffusely reflected by the screw threads, resulting in a loss of accuracy. Additionally, machine learning-based inspection methods are generally limited to the body and bottom of the bottle, excluding the mouth from analysis. To overcome these challenges, this study proposed a method to extract only the screw thread regions from the bottle image, using a dedicated machine learning model, and perform defect detection. To evaluate the effectiveness of the proposed approach, accuracy was assessed by training models using images of both the entire mouth and just the screw threads. Experimental results showed that the accuracy of the model trained using images of the entire mouth was 98.0%, while the accuracy of the model trained using images of the screw threads was 99.7%, indicating that the proposed method improves the accuracy by 1.7%. In a demonstration experiment using data obtained at a factory, the accuracy of the model trained using images of the entire mouth was 99.7%, whereas the accuracy of the model trained using images of screw threads was 100%, indicating that the proposed system can be used to detect chips in factories. Full article
(This article belongs to the Section Image and Video Processing)

11 pages, 1088 KiB  
Article
Evaluating Super-Resolution Models in Biomedical Imaging: Applications and Performance in Segmentation and Classification
by Mario Amoros, Manuel Curado and Jose F. Vicent
J. Imaging 2025, 11(4), 104; https://doi.org/10.3390/jimaging11040104 - 29 Mar 2025
Viewed by 365
Abstract
Super-resolution (SR) techniques have gained traction in biomedical imaging for their ability to enhance image quality. However, it remains unclear whether these improvements translate into better performance in clinical tasks. In this study, we provide a comprehensive evaluation of state-of-the-art SR models—including CNN- and Transformer-based architectures—by assessing not only visual quality metrics (PSNR and SSIM) but also their downstream impact on segmentation and classification performance for lung CT scans. Using U-Net and ResNet architectures, we quantify how SR influences diagnostic tasks across different datasets, and we evaluate model generalization in cross-domain settings. Our findings show that advanced SR models such as SwinIR preserve diagnostic features effectively and, when appropriately applied, can enhance or maintain clinical performance even in low-resolution contexts. This work bridges the gap between image quality enhancement and practical clinical utility, providing actionable insights for integrating SR into real-world biomedical imaging workflows. Full article
(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)
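The visual-quality metrics used in the study (PSNR and SSIM) can be computed with scikit-image as follows; the inputs are assumed to be normalized to [0, 1].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def sr_quality(hr, sr):
    """PSNR/SSIM between a ground-truth HR image and an SR output.

    hr, sr: (H, W) float arrays in [0, 1]. These are the visual-quality
    metrics reported alongside the downstream segmentation and
    classification scores in the study.
    """
    psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)
    ssim = structural_similarity(hr, sr, data_range=1.0)
    return psnr, ssim
```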

13 pages, 5340 KiB  
Article
Riemannian Manifolds for Biological Imaging Applications Based on Unsupervised Learning
by Ilya Larin and Alexander Karabelsky
J. Imaging 2025, 11(4), 103; https://doi.org/10.3390/jimaging11040103 - 29 Mar 2025
Viewed by 390
Abstract
The development of neural networks has made the introduction of multimodal systems inevitable. Computer vision methods are still not widely used in biological research, despite their importance. It is time to recognize the significance of advances in feature extraction and real-time analysis of information from cells. Unsupervised learning for the image clustering task is of great interest, in particular for the clustering of single cells. This study evaluates the feasibility of using latent representations and clustering of single cells in various applications in the fields of medicine and biotechnology. Of particular interest are embeddings that relate to the morphological characterization of cells. Studies of C2C12 cells will reveal more about aspects of muscle differentiation through the use of neural networks. This work focuses on analyzing the applicability of the latent space for extracting morphological features. Like many researchers in this field, we note that obtaining high-quality latent representations for phase-contrast or bright-field images opens new frontiers for creating large vision-language models. Graph structures are the main approach to non-Euclidean manifolds. Graph-based segmentation has a long history, e.g., the normalized cuts algorithm treated segmentation as a graph partitioning problem, but only recently have such ideas merged with deep learning in an unsupervised manner. Recently, a number of works have shown the advantages of hyperbolic embeddings in vision tasks, including clustering and classification based on the Poincaré ball model. One area worth highlighting is unsupervised segmentation, which we believe is undervalued, particularly in the context of non-Euclidean spaces. With this approach, we aim to mark the beginning of our future work on integrating the visual information and biological aspects of individual cells into a multimodal space for comparative studies in vitro. Full article
(This article belongs to the Section AI in Imaging)
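The Poincaré ball model mentioned above comes with a closed-form geodesic distance, d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))); a small NumPy version is below, with the clamping tolerance as the only free choice.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    """Geodesic distance between two points in the Poincaré ball.

    u, v: 1-D arrays with norm < 1. This is the metric underlying the
    hyperbolic embeddings mentioned for clustering cell images.
    """
    uu = 1.0 - np.clip(np.sum(u * u), 0, 1 - eps)  # 1 - ||u||^2, clamped
    vv = 1.0 - np.clip(np.sum(v * v), 0, 1 - eps)  # 1 - ||v||^2, clamped
    duv = np.sum((u - v) ** 2)
    return np.arccosh(1.0 + 2.0 * duv / (uu * vv))
```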

15 pages, 2497 KiB  
Article
Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
by Junning Xu and Sanxin Jiang
J. Imaging 2025, 11(4), 102; https://doi.org/10.3390/jimaging11040102 - 28 Mar 2025
Viewed by 278
Abstract
There are two problems with traditional knowledge distillation methods in industrial anomaly detection: first, traditional methods mostly use feature alignment between the same layers; second, similar or even identical structures are usually used to build the teacher-student models, limiting the ability to represent anomalies in multiple ways. To address these issues, this work proposes a Hierarchical Knowledge Transfer (HKT) framework for detecting industrial surface anomalies. First, HKT utilizes the deep knowledge of the highest feature layer in the teacher’s network to guide student learning at every level, thus enabling cross-layer interactions. Multiple projectors are built inside the model to facilitate the teacher in transferring knowledge to each layer of the student. Second, the teacher-student structural symmetry is decoupled by embedding Convolutional Block Attention Modules (CBAM) in the student network. Finally, based on HKT, a more powerful anomaly detection model, HKT+, is developed. By adding two additional convolutional layers to the teacher and student networks of HKT, HKT+ achieves enhanced detection capabilities at the cost of a relatively small increase in model parameters. Experiments on the MVTec AD and BeanTech AD (BTAD) datasets show that HKT+ achieves state-of-the-art performance with average area under the receiver operating characteristic curve (AUROC) scores of 98.69% and 94.58%, respectively, outperforming most current state-of-the-art methods. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Show Figures

Figure 1

16 pages, 5387 KiB  
Article
Dual-Stream Contrastive Latent Learning Generative Adversarial Network for Brain Image Synthesis and Tumor Classification
by Junaid Zafar, Vincent Koc and Haroon Zafar
J. Imaging 2025, 11(4), 101; https://doi.org/10.3390/jimaging11040101 - 28 Mar 2025
Viewed by 369
Abstract
Generative adversarial networks (GANs) prioritize pixel-level attributes over capturing the entire image distribution, which is critical in image synthesis. To address this challenge, we propose a dual-stream contrastive latent projection generative adversarial network (DSCLPGAN) for the robust augmentation of MRI images. The dual-stream [...] Read more.
Generative adversarial networks (GANs) prioritize pixel-level attributes over capturing the entire image distribution, which is critical in image synthesis. To address this challenge, we propose a dual-stream contrastive latent projection generative adversarial network (DSCLPGAN) for the robust augmentation of MRI images. The dual-stream generator in our architecture incorporates two specialized processing pathways: one is dedicated to local feature variation modeling, while the other captures global structural transformations, ensuring a more comprehensive synthesis of medical images. We used a transformer-based encoder–decoder framework for contextual coherence, while a contrastive learning projection (CLP) module integrates a contrastive loss into the latent space to generate diverse image samples. The generated images undergo adversarial refinement using an ensemble of specialized discriminators, where discriminator 1 (D1) ensures classification consistency with real MRI images, discriminator 2 (D2) produces a probability map of localized variations, and discriminator 3 (D3) preserves structural consistency. For validation, we utilized a publicly available MRI dataset that contains 3064 T1-weighted contrast-enhanced images with three types of brain tumors: meningioma (708 slices), glioma (1426 slices), and pituitary tumor (930 slices). The experimental results demonstrate state-of-the-art performance, achieving an SSIM of 0.99, a classification accuracy of 99.4% at an augmentation diversity level of 5, and a PSNR of 34.6 dB. Our approach has the potential to generate high-fidelity augmentations for reliable AI-driven clinical decision support systems. Full article
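One plausible instantiation of a contrastive loss over the latent space, as the CLP module is described above, is a symmetric InfoNCE objective on projected latents of paired views; the sketch below is a generic version of that idea, with the function name, temperature, and dimensions as illustrative assumptions rather than the authors’ exact formulation.

```python
import torch
import torch.nn.functional as F

def clp_loss(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss over projected latent codes.

    z_a, z_b: (N, D) latents of two views/variants of the same inputs.
    Matching rows are positives; all other rows serve as negatives,
    pushing the generator toward diverse, well-spread latents.
    """
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau          # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetrized cross-entropy: each view must identify its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: latents from a generator's projection head.
loss = clp_loss(torch.randn(16, 128), torch.randn(16, 128))
```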
(This article belongs to the Section Medical Imaging)
Show Figures

Figure 1

14 pages, 3064 KiB  
Article
A Gaze Estimation Method Based on Spatial and Channel Reconstructed ResNet Combined with Multi-Clue Fusion
by Zhaoyu Shou, Yanjun Lin, Jianwen Mo and Ziyong Wu
J. Imaging 2025, 11(4), 99; https://doi.org/10.3390/jimaging11040099 - 27 Mar 2025
Viewed by 192
Abstract
The complexity of various factors influencing online learning makes it difficult to characterize learning concentration, while accurately estimating students’ gaze points during learning video sessions represents a critical scientific challenge in assessing and enhancing the attentiveness of online learners. However, current appearance-based gaze [...] Read more.
The complexity of various factors influencing online learning makes it difficult to characterize learning concentration, while accurately estimating students’ gaze points during learning video sessions represents a critical scientific challenge in assessing and enhancing the attentiveness of online learners. However, current appearance-based gaze estimation models lack a focus on extracting essential features and fail to effectively model the spatio-temporal relationships among the head, face, and eye regions, which limits their ability to achieve lower angular errors. This paper proposes an appearance-based gaze estimation model (RSP-MCGaze). The model constructs a feature extraction backbone network for gaze estimation (ResNetSC) by integrating ResNet and SCConv; this integration enhances the model’s ability to extract important features while reducing spatial and channel redundancy. Building on the ResNetSC backbone, video gaze estimation is further optimized by jointly locating the head, eyes, and face. The experimental results demonstrate that our model significantly outperforms existing baseline models on public datasets, confirming the superiority of our method in the gaze estimation task: it achieves an angular error of 9.86 on the full Gaze360 dataset and 7.11 on its detectable-face subset. Full article
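On Gaze360-style benchmarks, errors of this kind are conventionally the mean angle, in degrees, between predicted and ground-truth 3D gaze vectors; a minimal sketch of that metric follows, with tensor shapes as assumptions.

```python
import torch

def angular_error_deg(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean angle (degrees) between predicted and ground-truth 3D gaze vectors."""
    pred = pred / pred.norm(dim=-1, keepdim=True)
    gt = gt / gt.norm(dim=-1, keepdim=True)
    cos = (pred * gt).sum(-1).clamp(-1.0, 1.0)   # clamp for acos stability
    return torch.rad2deg(torch.acos(cos)).mean()

# Toy usage with random (batch, 3) gaze vectors.
err = angular_error_deg(torch.randn(4, 3), torch.randn(4, 3))
```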
Show Figures

Figure 1

33 pages, 22075 KiB  
Systematic Review
A Systematic Review of Medical Image Quality Assessment
by H. M. S. S. Herath, H. M. K. K. M. B. Herath, Nuwan Madusanka and Byeong-Il Lee
J. Imaging 2025, 11(4), 100; https://doi.org/10.3390/jimaging11040100 - 27 Mar 2025
Viewed by 480
Abstract
Medical image quality assessment (MIQA) is vital in medical imaging and directly affects diagnosis, patient treatment, and general clinical results. Accurate and high-quality imaging is necessary to make accurate diagnoses, efficiently design treatments, and consistently monitor diseases. This review summarizes forty-two research studies [...] Read more.
Medical image quality assessment (MIQA) is vital in medical imaging and directly affects diagnosis, patient treatment, and general clinical results. Accurate and high-quality imaging is necessary to make accurate diagnoses, efficiently design treatments, and consistently monitor diseases. This review summarizes forty-two research studies on diverse MIQA approaches and their effects on diagnostic performance, patient outcomes, and workflow efficiency. It contrasts subjective (manual assessment) and objective (rule-driven) evaluation methods, underscores the growing promise of artificial intelligence (AI) and machine learning (ML) in MIQA automation, and describes the existing MIQA challenges. AI-powered tools are revolutionizing MIQA with automated quality checks, noise reduction, and artifact removal, producing consistent and reliable imaging evaluations. Across the reviewed studies, enhanced image quality is shown to improve diagnostic precision and support clinical decision making. However, challenges remain, such as variability in image quality, inconsistency in human ratings, and small datasets that hinder standardization; these must be addressed with better-quality data, low-cost labeling, and standardization efforts. Ultimately, this paper reinforces the need for high-quality medical imaging and the potential of AI-powered MIQA. Continued research in this area is crucial for advancing healthcare. Full article
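To make the subjective-versus-objective contrast concrete, the snippet below computes two classic rule-driven, full-reference quality scores (PSNR and SSIM) with scikit-image; the synthetic images are placeholders, and full-reference metrics of this kind apply only when a reference image exists, which is precisely where the no-reference approaches surveyed in reviews like this one become necessary.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Toy stand-ins for a reference scan and a degraded acquisition.
rng = np.random.default_rng(0)
reference = rng.random((128, 128))
degraded = np.clip(reference + rng.normal(0, 0.05, reference.shape), 0, 1)

# Two classic rule-driven, full-reference quality scores.
psnr = peak_signal_noise_ratio(reference, degraded, data_range=1.0)
ssim = structural_similarity(reference, degraded, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```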
(This article belongs to the Section Medical Imaging)
Show Figures

Figure 1

24 pages, 11715 KiB  
Article
Assessing Cancer Presence in Prostate MRI Using Multi-Encoder Cross-Attention Networks
by Avtantil Dimitriadis, Grigorios Kalliatakis, Richard Osuala, Dimitri Kessler, Simone Mazzetti, Daniele Regge, Oliver Diaz, Karim Lekadir, Dimitrios Fotiadis, Manolis Tsiknakis, Nikolaos Papanikolaou, ProCAncer-I Consortium and Kostas Marias
J. Imaging 2025, 11(4), 98; https://doi.org/10.3390/jimaging11040098 - 26 Mar 2025
Viewed by 438
Abstract
Prostate cancer (PCa) is currently the second most prevalent cancer among men. Accurate diagnosis of PCa enables effective treatment and reduces mortality. Previous works have largely focused on either lesion detection or lesion classification of PCa from magnetic resonance imaging [...] Read more.
Prostate cancer (PCa) is currently the second most prevalent cancer among men. Accurate diagnosis of PCa enables effective treatment and reduces mortality. Previous works have largely focused on either lesion detection or lesion classification of PCa from magnetic resonance imaging (MRI). In this work, we focus on a critical yet underexplored task of the PCa clinical workflow: distinguishing cases with cancer presence (pathologically confirmed PCa patients) from conditions with no suspicious PCa findings. To this end, we conduct the first large-scale experiments for this task by adopting and processing the multi-centric ProstateNET Imaging Archive, which contains more than 6 million image representations of PCa from more than 11,000 PCa cases, representing the largest collection of PCa MR images. Bi-parametric MR (bpMRI) images of 4504 patients, alongside their clinical variables, are used for training, while the architectures are evaluated on two hold-out test sets of 975 retrospective and 435 prospective patients. Our proposed multi-encoder cross-attention fusion architecture achieved a promising area under the receiver operating characteristic curve (AUC) of 0.91. This demonstrates our method’s capability to fuse complex bi-parametric imaging modalities and enhance model robustness, paving the way towards the clinical adoption of deep learning models for accurately determining the presence of PCa across patient populations. Full article
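As a rough sketch of what multi-encoder cross-attention fusion can look like, the code below lets token features from one bpMRI sequence attend to those of another and concatenates the pooled result with clinical variables for a cancer-presence logit; the token counts, embedding sizes, and head layout are illustrative assumptions rather than the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuses token features from two imaging encoders via cross-attention.

    Tokens from modality A attend to tokens from modality B (and vice
    versa); pooled features are concatenated with clinical variables for
    a binary cancer-presence prediction.
    """
    def __init__(self, dim=256, n_heads=4, n_clinical=4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(2 * dim + n_clinical, 1)

    def forward(self, tok_a, tok_b, clinical):
        # Queries from one modality, keys/values from the other.
        fused_a, _ = self.attn_ab(tok_a, tok_b, tok_b)
        fused_b, _ = self.attn_ba(tok_b, tok_a, tok_a)
        pooled = torch.cat([fused_a.mean(1), fused_b.mean(1), clinical], dim=1)
        return self.head(pooled)  # logit for cancer presence

# Toy usage: 8 tokens per modality from two 3D encoders, 4 clinical variables.
model = CrossAttentionFusion()
logit = model(torch.randn(2, 8, 256), torch.randn(2, 8, 256), torch.randn(2, 4))
```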
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
Show Figures

Figure 1

29 pages, 9142 KiB  
Article
Self-Supervised Multi-Task Learning for the Detection and Classification of RHD-Induced Valvular Pathology
by Lorna Mugambi, Ciira wa Maina and Liesl Zühlke
J. Imaging 2025, 11(4), 97; https://doi.org/10.3390/jimaging11040097 - 25 Mar 2025
Viewed by 358
Abstract
Rheumatic heart disease (RHD) poses a significant global health challenge, necessitating improved diagnostic tools. This study investigated the use of self-supervised multi-task learning for automated echocardiographic analysis, aiming to predict echocardiographic views, diagnose RHD conditions, and determine severity. We compared two prominent self-supervised [...] Read more.
Rheumatic heart disease (RHD) poses a significant global health challenge, necessitating improved diagnostic tools. This study investigated the use of self-supervised multi-task learning for automated echocardiographic analysis, aiming to predict echocardiographic views, diagnose RHD conditions, and determine severity. We compared two prominent self-supervised learning (SSL) methods: DINOv2, a vision-transformer-based approach known for capturing implicit features, and SimCLR (a simple framework for contrastive learning of visual representations), a ResNet-based contrastive learning method recognised for its simplicity and effectiveness. Both models were pre-trained on a large, unlabelled echocardiogram dataset and fine-tuned on a smaller, labelled subset. DINOv2 achieved accuracies of 92% for view classification, 98% for condition detection, and 99% for severity assessment. SimCLR also performed well, achieving accuracies of 99% for view classification, 92% for condition detection, and 96% for severity assessment. Embedding visualisations, using both Uniform Manifold Approximation and Projection (UMAP) and t-distributed Stochastic Neighbor Embedding (t-SNE), revealed distinct clusters for all tasks in both models, indicating the effective capture of the discriminative features of the echocardiograms. This study demonstrates the potential of self-supervised multi-task learning for automated echocardiogram analysis, offering a scalable and efficient approach to improving RHD diagnosis, especially in resource-limited settings. Full article
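A minimal sketch of the fine-tuning stage described above might attach three task-specific heads (view, condition, severity) to a frozen SSL embedding; the embedding size and class counts below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiTaskEchoHead(nn.Module):
    """Three classification heads on top of a shared SSL embedding.

    Mirrors the multi-task setup above: one shared representation,
    separate linear heads for echo view, RHD condition, and severity.
    """
    def __init__(self, embed_dim=768, n_views=5, n_conditions=3, n_severities=4):
        super().__init__()
        self.view_head = nn.Linear(embed_dim, n_views)
        self.condition_head = nn.Linear(embed_dim, n_conditions)
        self.severity_head = nn.Linear(embed_dim, n_severities)

    def forward(self, z):
        return self.view_head(z), self.condition_head(z), self.severity_head(z)

# Toy usage: z would come from a frozen DINOv2- or SimCLR-style encoder.
heads = MultiTaskEchoHead()
view_logits, cond_logits, sev_logits = heads(torch.randn(2, 768))
# The total loss is typically a (possibly weighted) sum of per-task cross-entropies.
```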
(This article belongs to the Section Medical Imaging)
Show Figures

Figure 1

Previous Issue
Next Issue