
J. Imaging, Volume 11, Issue 5 (May 2025) – 45 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • Papers are published in both HTML and PDF formats; the PDF is the official version. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
24 pages, 3955 KiB  
Article
IEWNet: Multi-Scale Robust Watermarking Network Against Infrared Image Enhancement Attacks
by Yu Bai, Li Li, Shanqing Zhang, Jianfeng Lu and Ting Luo
J. Imaging 2025, 11(5), 171; https://doi.org/10.3390/jimaging11050171 - 21 May 2025
Abstract
Infrared (IR) images record the temperature radiation distribution of the object being captured, with hue and color differences reflecting differences in heat and temperature, respectively. However, due to the thermal diffusion effect, target regions in IR images can be relatively large and object boundaries are blurred. Therefore, IR images may undergo image enhancement operations before use in relevant application scenarios. However, Infrared Enhancement (IRE) algorithms in most cases have a negative impact on the watermarking information embedded into the IR image. In this paper, we propose a novel multi-scale robust watermarking model against IRE attacks, called IEWNet. This model trains a preprocessing module for extracting image features based on the conventional Undecimated Dual Tree Complex Wavelet Transform (UDTCWT). Furthermore, we develop a noise layer covering four deep learning and eight classical attacks, all based on IRE algorithms. Moreover, we add a noise layer or an enhancement module between the encoder and decoder according to the application scenario. The results of the imperceptibility experiments on six public datasets show that the Peak Signal to Noise Ratio (PSNR) is usually higher than 40 dB, and the model's robustness surpasses that of the state-of-the-art image watermarking algorithms used in the performance comparison. Full article
(This article belongs to the Section Image and Video Processing)
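For reference, the imperceptibility figure quoted above (PSNR above 40 dB) can be computed as follows; a minimal sketch, not the paper's code:

```python
import numpy as np

def psnr(cover: np.ndarray, watermarked: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two same-sized images."""
    mse = np.mean((cover.astype(np.float64) - watermarked.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A mean error of one gray level out of 255 already gives roughly 48 dB, so the 40 dB threshold corresponds to barely perceptible distortion.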

21 pages, 17011 KiB  
Article
Three-Blind Validation Strategy of Deep Learning Models for Image Segmentation
by Andrés Larroza, Francisco Javier Pérez-Benito, Raquel Tendero, Juan Carlos Perez-Cortes, Marta Román and Rafael Llobet
J. Imaging 2025, 11(5), 170; https://doi.org/10.3390/jimaging11050170 - 21 May 2025
Abstract
Image segmentation plays a central role in computer vision applications such as medical imaging, industrial inspection, and environmental monitoring. However, evaluating segmentation performance can be particularly challenging when ground truth is not clearly defined, as is often the case in tasks involving subjective interpretation. These challenges are amplified by inter- and intra-observer variability, which complicates the use of human annotations as a reliable reference. To address this, we propose a novel validation framework—referred to as the three-blind validation strategy—that enables rigorous assessment of segmentation models in contexts where subjectivity and label variability are significant. The core idea is to have a third independent expert, blind to the labeler identities, assess a shuffled set of segmentations produced by multiple human annotators and/or automated models. This allows for the unbiased evaluation of model performance and helps uncover patterns of disagreement that may indicate systematic issues with either human or machine annotations. The primary objective of this study is to introduce and demonstrate this validation strategy as a generalizable framework for robust model evaluation in subjective segmentation tasks. We illustrate its practical implementation in a mammography use case involving dense tissue segmentation while emphasizing its potential applicability to a broad range of segmentation scenarios. Full article
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)
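The blinding step described above can be sketched as follows; the function name and the sealed-key bookkeeping are illustrative assumptions, not the authors' implementation:

```python
import random

def blind_shuffle(segmentations, seed=None):
    """Shuffle segmentations from multiple annotators/models so a third
    expert can rate them without knowing their source; the returned key
    maps shuffled position -> original source index and is kept sealed
    until after rating."""
    rng = random.Random(seed)
    indexed = list(enumerate(segmentations))
    rng.shuffle(indexed)
    key = {pos: src for pos, (src, _) in enumerate(indexed)}
    masks = [mask for _, mask in indexed]
    return masks, key
```

After the expert scores the shuffled masks, the key is unsealed to attribute scores back to each human or model source.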

20 pages, 4751 KiB  
Article
Recovery and Characterization of Tissue Properties from Magnetic Resonance Fingerprinting with Exchange
by Naren Nallapareddy and Soumya Ray
J. Imaging 2025, 11(5), 169; https://doi.org/10.3390/jimaging11050169 - 20 May 2025
Abstract
Magnetic resonance fingerprinting (MRF), a quantitative MRI technique, enables the acquisition of multiple tissue properties in a single scan. In this paper, we study a proposed extension of MRF, MRF with exchange (MRF-X), which can enable acquisition of the six tissue properties T1a, T2a, T1b, T2b, ρ and τ simultaneously. In MRF-X, ‘a’ and ‘b’ refer to distinct compartments modeled in each voxel, while ρ is the fractional volume of component ‘a’, and τ is the exchange rate of protons between the two components. To assess the feasibility of recovering these properties, we first empirically characterize a similarity metric between MRF and MRF-X reconstructed tissue property values and known reference property values for candidate signals. Our characterization indicates that such a recovery is possible, although the similarity metric surface across the candidate tissue properties is less structured for MRF-X than for MRF. We then investigate the application of different optimization techniques to recover tissue properties from noisy MRF and MRF-X data. Previous work has widely utilized template dictionary-based approaches in the context of MRF; however, such approaches are infeasible with MRF-X. Our results show that Simplicial Homology Global Optimization (SHGO), a global optimization algorithm, and the Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with Bounds (L-BFGS-B), a local optimization algorithm, performed comparably with direct matching in two-tissue-property MRF at an SNR of 5. These optimization methods also successfully recovered five tissue properties from MRF-X data. However, with the current pulse sequence and reconstruction approach, recovering all six tissue properties remains challenging for all the methods investigated. Full article
(This article belongs to the Section Medical Imaging)
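For context, the template-dictionary matching that becomes infeasible for MRF-X can be sketched as a normalized inner-product search over simulated fingerprints; a generic illustration, not the paper's code:

```python
import numpy as np

def dictionary_match(signal, dictionary, params):
    """Return the tissue-property tuple whose simulated fingerprint has
    the highest normalized inner product with the measured signal."""
    s = signal / np.linalg.norm(signal)
    d = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    scores = np.abs(d @ s)  # one similarity score per dictionary atom
    return params[int(np.argmax(scores))]
```

The grid of candidate property tuples grows exponentially with the number of properties, which is why a six-dimensional MRF-X dictionary is impractical and continuous optimizers such as SHGO or L-BFGS-B are used instead.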

20 pages, 4395 KiB  
Article
The Creation of Artificial Data for Training a Neural Network Using the Example of a Conveyor Production Line for Flooring
by Alexey Zaripov, Roman Kulshin and Anatoly Sidorov
J. Imaging 2025, 11(5), 168; https://doi.org/10.3390/jimaging11050168 - 20 May 2025
Abstract
This work is dedicated to the development of a system for generating artificial data for training neural networks used within a conveyor-based production framework. It presents an overview of the application areas of computer vision (CV) and establishes that traditional methods of data collection and annotation—such as video recording and manual image labeling—are associated with high time and financial costs, which limits their efficiency. In this context, synthetic data represent an alternative that can significantly reduce the cost of building training datasets. Modern methods for generating synthetic images with various tools—from game engines to generative neural networks—are reviewed. As a platform concept, digital twins for simulating technological processes, within which synthetic data are utilized, were considered. Based on the review findings, a generalized model for synthetic data generation was proposed and tested on the example of quality control for floor coverings on a conveyor line. The developed system generated photorealistic and diverse images suitable for training neural network models. A comparative analysis showed that the YOLOv8 model trained on synthetic data significantly outperformed the model trained on real images: the mAP50 metric reached 0.95 versus 0.36, respectively. This result demonstrates the adequacy of the model built on the synthetic dataset and highlights the potential of synthetic data to improve the quality of computer vision models when access to real data is limited. Full article
(This article belongs to the Special Issue Industrial Machine Learning with Image Technology Integration)
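The mAP50 metric quoted above counts a detection as correct when its box overlaps a ground-truth box with an intersection-over-union (IoU) of at least 0.5; a minimal IoU sketch:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

mAP50 then averages the precision over recall levels and classes at this 0.5 threshold.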

39 pages, 14011 KiB  
Article
Comparing Geodesic Filtering to State-of-the-Art Algorithms: A Comprehensive Study and CUDA Implementation
by Pierre Boulanger and Sadid Bin Hasan
J. Imaging 2025, 11(5), 167; https://doi.org/10.3390/jimaging11050167 - 20 May 2025
Abstract
This paper presents a comprehensive investigation into advanced image processing using geodesic filtering within a Riemannian manifold framework. We introduce a novel geodesic filtering formulation that uniquely integrates spatial and intensity relationships through minimal path computation, demonstrating significant improvements in edge preservation and noise reduction compared to conventional methods. Our quantitative analysis using peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) metrics across diverse image types reveals that our approach outperforms traditional techniques in preserving fine details while effectively suppressing both Gaussian and non-Gaussian noise. We developed an automatic parameter optimization methodology that eliminates manual tuning by identifying optimal filtering parameters based on image characteristics. Additionally, we present a highly optimized GPU implementation featuring innovative wave-propagation algorithms and memory access optimization techniques that achieve a 200× speedup, making geodesic filtering practical for real-time applications. Our work bridges the gap between theoretical elegance and computational practicality, establishing geodesic filtering as a superior solution for challenging image processing tasks in fields ranging from medical imaging to remote sensing. Full article
(This article belongs to the Section Image and Video Processing)
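The minimal-path idea behind geodesic filtering can be sketched with Dijkstra's algorithm on the pixel grid; the step cost mixing spatial and intensity distance below is a simplified stand-in for the paper's Riemannian metric and its CUDA wave-propagation scheme:

```python
import heapq
import numpy as np

def geodesic_distances(img, seed, beta=1.0):
    """Minimal-path (geodesic) distance from a seed pixel on a 4-connected
    grid; each step costs hypot(spatial step, beta * intensity change), so
    paths crossing edges are 'longer' than paths within uniform regions."""
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                step = np.hypot(1.0, beta * (img[ny, nx] - img[y, x]))
                if d + step < dist[ny, nx]:
                    dist[ny, nx] = d + step
                    heapq.heappush(heap, (d + step, (ny, nx)))
    return dist
```

Weighting filter contributions by such distances is what preserves edges: pixels across a strong edge are geodesically far even when spatially close.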

23 pages, 2927 KiB  
Article
Segmentation of Non-Small Cell Lung Carcinomas: Introducing DRU-Net and Multi-Lens Distortion
by Soroush Oskouei, Marit Valla, André Pedersen, Erik Smistad, Vibeke Grotnes Dale, Maren Høibø, Sissel Gyrid Freim Wahl, Mats Dehli Haugum, Thomas Langø, Maria Paula Ramnefjell, Lars Andreas Akslen, Gabriel Kiss and Hanne Sorger
J. Imaging 2025, 11(5), 166; https://doi.org/10.3390/jimaging11050166 - 20 May 2025
Abstract
The increased workload in pathology laboratories today means automated tools such as artificial intelligence models can be useful, helping pathologists with their tasks. In this paper, we propose a segmentation model (DRU-Net) that can provide a delineation of human non-small cell lung carcinomas and an augmentation method that can improve classification results. The proposed model is a fused combination of truncated pre-trained DenseNet201 and ResNet101V2 as a patch-wise classifier, followed by a lightweight U-Net as a refinement model. Two datasets (Norwegian Lung Cancer Biobank and Haukeland University Lung Cancer cohort) were used to develop the model. The DRU-Net model achieved an average Dice similarity coefficient of 0.91. The proposed spatial augmentation method (multi-lens distortion) improved the Dice similarity coefficient from 0.88 to 0.91. Our findings show that selecting image patches that specifically include regions of interest leads to better results for the patch-wise classifier compared to other sampling methods. A qualitative analysis by pathology experts showed that the DRU-Net model was generally successful in tumor detection. Results in the test set showed some areas of false-positive and false-negative segmentation in the periphery, particularly in tumors with inflammatory and reactive changes. In summary, the presented DRU-Net model demonstrated the best performance on the segmentation task, and the proposed augmentation technique proved to improve the results. Full article
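The Dice similarity coefficient reported above can be computed as follows; a minimal sketch for binary masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks
    (1.0 = perfect overlap, 0.0 = no overlap)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```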

22 pages, 9010 KiB  
Article
“ShapeNet”: A Shape Regression Convolutional Neural Network Ensemble Applied to the Segmentation of the Left Ventricle in Echocardiography
by Eduardo Galicia Gómez, Fabián Torres-Robles, Jorge Perez-Gonzalez and Fernando Arámbula Cosío
J. Imaging 2025, 11(5), 165; https://doi.org/10.3390/jimaging11050165 - 20 May 2025
Abstract
Left ventricle (LV) segmentation is crucial for cardiac diagnosis but remains challenging in echocardiography. We present ShapeNet, a fully automatic method combining a convolutional neural network (CNN) ensemble with an improved active shape model (ASM). ShapeNet predicts optimal pose (rotation, translation, and scale) and shape parameters, which are refined using the improved ASM. The ASM optimizes an objective function constructed from gray-level profiles concatenated into a single contour appearance vector. The model was trained on 4800 augmented CAMUS images and tested on both CAMUS and EchoNet databases. It achieved a Dice coefficient of 0.87 and a Hausdorff Distance (HD) of 4.08 pixels on CAMUS, and a Dice coefficient of 0.81 with an HD of 10.21 pixels on EchoNet, demonstrating robust performance across datasets. These results highlight the improved accuracy in HD compared to previous semantic and shape-based segmentation methods by generating statistically valid LV contours from ultrasound images. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
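The statistically valid contours mentioned above come from the standard active-shape-model construction; a sketch, assuming the usual ±3-standard-deviation clipping convention (not the paper's exact implementation):

```python
import numpy as np

def asm_reconstruct(mean_shape, modes, eigvals, b, limit=3.0):
    """Point-distribution-model reconstruction x = x_mean + P @ b; clipping
    each coefficient to +/- limit * sqrt(eigenvalue) keeps the generated
    contour statistically plausible."""
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)
    return mean_shape + modes @ b
```

In a pipeline like ShapeNet's, a network would predict the pose and the coefficients `b`, and the ASM refinement would adjust them against image appearance.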

23 pages, 31912 KiB  
Article
LIM: Lightweight Image Local Feature Matching
by Shanquan Ying, Jianfeng Zhao, Guannan Li and Junjie Dai
J. Imaging 2025, 11(5), 164; https://doi.org/10.3390/jimaging11050164 - 20 May 2025
Abstract
Image matching is a fundamental problem in computer vision, serving as a core component in tasks such as visual localization, structure from motion, and SLAM. While recent advances using convolutional neural networks and transformers have achieved impressive accuracy, their substantial computational demands hinder practical deployment on resource-constrained devices, such as mobile and embedded platforms. To address this challenge, we propose LIM, a lightweight image local feature matching network designed for computationally constrained embedded systems. LIM integrates efficient feature extraction and matching modules that significantly reduce model complexity while maintaining competitive performance. Our design emphasizes robustness to extreme viewpoint and rotational variations, making it suitable for real-world deployment scenarios. Extensive experiments on multiple benchmarks demonstrate that LIM achieves a favorable trade-off between speed and accuracy, running more than 3× faster than existing deep matching methods, while preserving high-quality matching results. These characteristics position LIM as an effective solution for real-time applications in power-limited environments. Full article
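As background, a generic local-feature matcher of the kind such networks accelerate can be sketched via mutual nearest neighbors on L2-normalized descriptors; a baseline illustration, not LIM's actual matching module:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Mutual nearest-neighbor matching: keep pair (i, j) only if j is the
    best match for i AND i is the best match for j (cosine similarity on
    L2-normalized descriptor rows)."""
    sim = desc_a @ desc_b.T  # cosine similarity matrix
    a_to_b = sim.argmax(axis=1)
    b_to_a = sim.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

The mutual check discards asymmetric, likely spurious matches at the cost of recall.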

16 pages, 3751 KiB  
Article
Improved Face Image Super-Resolution Model Based on Generative Adversarial Network
by Qingyu Liu, Yeguo Sun, Lei Chen and Lei Liu
J. Imaging 2025, 11(5), 163; https://doi.org/10.3390/jimaging11050163 - 19 May 2025
Abstract
Image super-resolution (SR) models based on the generative adversarial network (GAN) face challenges such as unnatural facial detail restoration and local blurring. This paper proposes an improved GAN-based model to address these issues. First, a Multi-scale Hybrid Attention Residual Block (MHARB) is designed, which dynamically enhances feature representation in critical face regions through dual-branch convolution and channel-spatial attention. Second, an Edge-guided Enhancement Block (EEB) is introduced, generating adaptive detail residuals by combining edge masks and channel attention to accurately recover high-frequency textures. Furthermore, a multi-scale discriminator with a weighted sub-discriminator loss is developed to balance global structural and local detail generation quality. Additionally, a phase-wise training strategy with dynamic adjustment of learning rate (Lr) and loss function weights is implemented to improve the realism of super-resolved face images. Experiments on the CelebA-HQ dataset demonstrate that the proposed model achieves a PSNR of 23.35 dB, an SSIM of 0.7424, and an LPIPS of 24.86, outperforming classical models and delivering superior visual quality in high-frequency regions. Notably, this model also surpasses the SwinIR model (PSNR: 23.28 dB → 23.35 dB, SSIM: 0.7340 → 0.7424, and LPIPS: 30.48 → 24.86), validating the effectiveness of the improved model and the training strategy in preserving facial details. Full article
(This article belongs to the Section AI in Imaging)

16 pages, 1572 KiB  
Article
A Lightweight Semantic Segmentation Model for Underwater Images Based on DeepLabv3+
by Chongjing Xiao, Zhiyu Zhou and Yanjun Hu
J. Imaging 2025, 11(5), 162; https://doi.org/10.3390/jimaging11050162 - 19 May 2025
Abstract
Underwater object image processing is a crucial technology for marine environmental exploration. The complexity of marine environments typically results in underwater object images exhibiting color deviation, imbalanced contrast, and blurring. Existing semantic segmentation methods for underwater objects either suffer from low segmentation accuracy or fail to meet the lightweight requirements of underwater hardware. To address these challenges, this study proposes a lightweight semantic segmentation model based on DeepLabv3+. The framework employs MobileOne-S0 as the lightweight backbone for feature extraction, integrates the Simple, Parameter-Free Attention Module (SimAM) into deep feature layers, replaces global average pooling in the Atrous Spatial Pyramid Pooling (ASPP) module with strip pooling, and adopts a content-guided attention (CGA)-based mixup fusion scheme to effectively combine high-level and low-level features while minimizing parameter redundancy. Experimental results demonstrate that the proposed model achieves a mean Intersection over Union (mIoU) of 71.18% on the DUT-USEG dataset, with parameters and computational complexity reduced to 6.628 M and 39.612 G FLOPs, respectively. These advancements significantly enhance segmentation accuracy while maintaining model efficiency, making the model highly suitable for resource-constrained underwater applications. Full article
(This article belongs to the Section Image and Video Processing)
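The SimAM module integrated above is parameter-free; a sketch of its closed-form sigmoid-of-energy gating, loosely following the published SimAM formulation (the original uses an n−1 variance denominator and operates on framework tensors):

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a (C, H, W) feature map: each
    activation is gated by a sigmoid of its closed-form inverse energy,
    so no learnable weights are added to the network."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = ((x - mu) ** 2).mean(axis=(1, 2), keepdims=True)
    e_inv = (x - mu) ** 2 / (4.0 * (var + lam)) + 0.5
    return x * (1.0 / (1.0 + np.exp(-e_inv)))  # sigmoid gating
```

Activations far from their channel mean get stronger gates, emphasizing distinctive neurons at zero parameter cost, which is why it suits lightweight backbones.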

27 pages, 7590 KiB  
Review
From Physically Based to Generative Models: A Survey on Underwater Image Synthesis Techniques
by Lucas Amparo Barbosa and Antonio Lopes Apolinario, Jr.
J. Imaging 2025, 11(5), 161; https://doi.org/10.3390/jimaging11050161 - 19 May 2025
Abstract
The underwater world has gained significant attention in research in recent years, particularly in the context of ocean exploration. Images serve as a valuable data source for underwater tasks, but they face several issues related to light behavior in this environment. Given the complexity of capturing data from the sea and the large variability of environmental components (depth, distance, suspended particles, turbidity, etc.), synthesized underwater scenes can provide relevant data to improve image processing algorithms and computer vision tasks. The main goal of this survey is to summarize techniques for underwater image synthesis, their contributions and correlations, and to highlight further directions and opportunities in this research domain. Full article
(This article belongs to the Special Issue Underwater Imaging (2nd Edition))
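As an example of the physically based end of the spectrum surveyed here, the common attenuation-plus-backscatter formation model (used by many underwater synthesizers) can be sketched as:

```python
import numpy as np

def underwater_image(clear, depth, beta, ambient):
    """Simplified physically based underwater formation:
    I = J * exp(-beta * d) + A * (1 - exp(-beta * d)),
    i.e. attenuated scene radiance plus depth-dependent backscatter.
    In practice beta and A are per-color-channel, which produces the
    characteristic blue-green cast."""
    t = np.exp(-beta * np.asarray(depth, dtype=np.float64))
    return clear * t + ambient * (1.0 - t)
```

Generative approaches, by contrast, learn this degradation implicitly from data rather than prescribing it.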

16 pages, 1447 KiB  
Article
Noise Suppressed Image Reconstruction for Quanta Image Sensors Based on Transformer Neural Networks
by Guanjie Wang and Zhiyuan Gao
J. Imaging 2025, 11(5), 160; https://doi.org/10.3390/jimaging11050160 - 17 May 2025
Abstract
The photon detection capability of quanta image sensors (QIS) makes them an optimal choice for low-light imaging. To address the Poisson noise in QIS reconstruction caused by their spatio-temporal oversampling characteristics, a deep learning-based noise-suppression reconstruction method is proposed in this paper. The proposed neural network integrates convolutional neural networks and Transformers. Its architecture combines the Anscombe transformation with serial and parallel modules to enhance denoising performance and adaptability across various scenarios. Experimental results demonstrate that the proposed method effectively suppresses noise in QIS image reconstruction. Compared with representative methods such as TD-BM3D, QIS-Net, and DPIR, our approach achieves up to a 1.2 dB improvement in PSNR, demonstrating superior reconstruction quality. Full article
(This article belongs to the Section Image and Video Processing)
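The Anscombe transformation used in the architecture stabilizes the variance of Poisson counts so that Gaussian-oriented denoisers apply; a minimal sketch of the forward transform and a simple algebraic inverse:

```python
import numpy as np

def anscombe(counts):
    """Variance-stabilizing Anscombe transform: Poisson-distributed counts
    become approximately Gaussian with unit variance."""
    return 2.0 * np.sqrt(np.asarray(counts, dtype=np.float64) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Direct algebraic inverse (the exact unbiased inverse used in
    practice is more involved)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0
```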

23 pages, 7984 KiB  
Article
A Transfer Learning-Based VGG-16 Model for COD Detection in UV–Vis Spectroscopy
by Jingwei Li, Iqbal Muhammad Tauqeer, Zhiyu Shao and Haidong Yu
J. Imaging 2025, 11(5), 159; https://doi.org/10.3390/jimaging11050159 - 17 May 2025
Abstract
Chemical oxygen demand (COD) serves as a key indicator of organic pollution in water bodies, and its rapid and accurate detection is crucial for environmental protection. Recently, ultraviolet–visible (UV–Vis) spectroscopy has gained popularity for COD detection due to its convenience and the absence of chemical reagents. Meanwhile, deep learning has emerged as an effective approach for automatically extracting spectral features and predicting COD. This paper proposes transforming one-dimensional spectra into two-dimensional spectrum images and employing convolutional neural networks (CNNs) to extract features and model automatically. However, training such deep learning models requires a vast dataset of water samples, alongside the complex task of labeling this data. To address these challenges, we introduce a transfer learning model based on VGG-16 for spectrum images. In this approach, parameters in the initial layers of the model are frozen, while those in the later layers are fine-tuned with the spectrum images. The effectiveness of this method is demonstrated through experiments conducted on our dataset, where the results indicate that it significantly enhances the accuracy of COD prediction compared with traditional methods such as partial least squares regression (PLSR), support vector machines (SVM), and artificial neural networks (ANN), as well as other CNN-based deep learning methods. Full article
(This article belongs to the Section Image and Video Processing)
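The 1-D-to-2-D conversion step could be implemented in several ways; a hypothetical resample-and-reshape sketch (the paper's exact mapping is not specified here, so treat the normalization and row-wise reshape as assumptions):

```python
import numpy as np

def spectrum_to_image(spectrum, size=224):
    """Turn a 1-D absorbance spectrum into a size x size grayscale image:
    min-max normalize, resample to size*size points, reshape row-wise.
    224 matches the VGG-16 input resolution."""
    s = np.asarray(spectrum, dtype=np.float64)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)
    xs = np.linspace(0, len(s) - 1, size * size)
    resampled = np.interp(xs, np.arange(len(s)), s)
    return resampled.reshape(size, size)
```

The resulting image can then be fed to a VGG-16 whose early convolutional blocks are frozen and whose later layers are fine-tuned.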

15 pages, 85946 KiB  
Article
Real-Time Far-Field BCSDF Filtering
by Junjie Wei and Ying Song
J. Imaging 2025, 11(5), 158; https://doi.org/10.3390/jimaging11050158 - 16 May 2025
Abstract
The real-time rendering of large-scale curve-based surfaces (e.g., hair, fabrics) requires efficient handling of bidirectional curve-scattering distribution functions (BCSDFs). While curve-based material models are essential for capturing anisotropic reflectance characteristics, conventional prefiltering techniques encounter challenges in jointly resolving micro-scale BCSDF variations with tangent distribution functions (TDFs) at pixel-level accuracy. This paper presents a real-time BCSDF filtering framework that achieves high-fidelity rendering without precomputation. Our key insight lies in formulating each pixel’s scattering response as a mixture of von Mises–Fisher (vMF) distributions, enabling analytical convolution between micro-scale BCSDFs and TDFs. Furthermore, we derive closed-form expressions for the integral of TDF-BCSDF products, avoiding the need for numerical approximation and heavy precomputation. Our method demonstrates state-of-the-art performance, achieving results comparable to 1000 spp Monte Carlo simulations under parallax-free conditions, where it improves the mean squared error (MSE) by one to two orders of magnitude over baseline methods. Qualitative comparisons and error analysis confirm both visual fidelity and computational efficiency. Full article
(This article belongs to the Section Visualization and Computer Graphics)
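The vMF lobes underlying the mixture formulation have a closed-form density on the sphere; a single-lobe sketch (the paper's BCSDF/TDF mixtures and their analytic convolutions build on such lobes, which this does not reproduce):

```python
import numpy as np

def vmf_pdf(x, mu, kappa):
    """von Mises-Fisher density on the unit sphere S^2: concentration
    kappa around mean direction mu; normalizer kappa / (4*pi*sinh(kappa)).
    (sinh overflows for very large kappa; fine for illustration.)"""
    c = kappa / (4.0 * np.pi * np.sinh(kappa))
    return c * np.exp(kappa * np.dot(mu, x))
```

The appeal of vMF lobes for filtering is that products and convolutions of lobes stay (approximately) within the family, which is what makes pixel-level prefiltering analytic.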

28 pages, 7048 KiB  
Article
AI-Driven Automated Blood Cell Anomaly Detection: Enhancing Diagnostics and Telehealth in Hematology
by Sami Naouali and Oussama El Othmani
J. Imaging 2025, 11(5), 157; https://doi.org/10.3390/jimaging11050157 - 16 May 2025
Abstract
Hematology plays a critical role in diagnosing and managing a wide range of blood-related disorders. The manual interpretation of blood smear images, however, is time-consuming and highly dependent on expert availability. Moreover, it is particularly challenging in remote and resource-limited settings. In this study, we present an AI-driven system for automated blood cell anomaly detection, combining computer vision and machine learning models to support efficient diagnostics in hematology and telehealth contexts. Our architecture integrates segmentation (YOLOv11), classification (ResNet50), transfer learning, and zero-shot learning to identify and categorize cell types and abnormalities from blood smear images. Evaluated on real annotated samples, the system achieved high performance, with a precision of 0.98, recall of 0.99, and F1 score of 0.98. These results highlight the potential of the proposed system to enhance remote diagnostic capabilities and support clinical decision making in underserved regions. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
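The precision, recall, and F1 figures above follow from the standard detection counts; a minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```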

26 pages, 3942 KiB  
Article
Unleashing the Potential of Residual and Dual-Stream Transformers for the Remote Sensing Image Analysis
by Priya Mittal, Vishesh Tanwar, Bhisham Sharma and Dhirendra Prasad Yadav
J. Imaging 2025, 11(5), 156; https://doi.org/10.3390/jimaging11050156 - 15 May 2025
Abstract
The categorization of remote sensing satellite imagery is crucial for various applications, including environmental monitoring, urban planning, and disaster management. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have exhibited exceptional performance among deep learning techniques, excelling in feature extraction and representational learning. This paper presents a hybrid dual-stream ResV2ViT model that combines the advantages of ResNet50 V2 and Vision Transformer (ViT) architectures. The dual-stream approach allows the model to extract both local spatial features and global contextual information by processing data through two complementary pathways. The ResNet50V2 component is utilized for hierarchical feature extraction and captures short-range dependencies, whereas the ViT module efficiently models long-range dependencies and global contextual information. After position embedding in the hybrid model, the tokens are bifurcated into two parts: q1 and q2. q1 is passed into the convolutional block to refine local spatial details, and q2 is given to the Transformer to provide global attention over the spatial features. Combining these two architectures allows the model to acquire low-level and high-level feature representations, improving classification performance. We assess the proposed ResV2ViT model using the RSI-CB256 dataset and another dataset with 21 classes. The proposed model attains an average accuracy of 99.91%, with a precision and F1 score of 99.90% for the first dataset and 98.75% accuracy for the second dataset, illustrating its efficacy in satellite image classification. The findings demonstrate that the dual-stream hybrid ResV2ViT model surpasses traditional CNN and Transformer-based models, establishing it as a formidable framework for remote sensing applications. Full article
17 pages, 5792 KiB  
Article
Beyond Handcrafted Features: A Deep Learning Framework for Optical Flow and SLAM
by Kamran Kazi, Arbab Nighat Kalhoro, Farida Memon, Azam Rafique Memon and Muddesar Iqbal
J. Imaging 2025, 11(5), 155; https://doi.org/10.3390/jimaging11050155 - 15 May 2025
Abstract
This paper presents a novel approach for visual Simultaneous Localization and Mapping (SLAM) using Convolutional Neural Networks (CNNs) for robust map creation. Traditional SLAM methods rely on handcrafted features, which are susceptible to viewpoint changes, occlusions, and illumination variations. This work proposes a method that leverages the power of CNNs by extracting features from an intermediate layer of a pre-trained model for optical flow estimation. We conduct an extensive search for optimal features by analyzing the offset error across thousands of combinations of layers and filters within the CNN. This analysis reveals a specific layer and filter combination that exhibits minimal offset error while still accounting for viewpoint changes, occlusions, and illumination variations. These features, learned by the CNN, are demonstrably robust to environmental challenges that often hinder traditional handcrafted features in SLAM tasks. The proposed method is evaluated on six publicly available datasets that are widely used for benchmarking map estimation and accuracy. Our method consistently achieved the lowest offset error compared to traditional handcrafted feature-based approaches on all six datasets. This demonstrates the effectiveness of CNN-derived features for building accurate and robust maps in diverse environments. Full article
(This article belongs to the Section Visualization and Computer Graphics)
16 pages, 9488 KiB  
Article
A Multitask Network for the Diagnosis of Autoimmune Gastritis
by Yuqi Cao, Yining Zhao, Xinao Jin, Jiayuan Zhang, Gangzhi Zhang, Pingjie Huang, Guangxin Zhang and Yuehua Han
J. Imaging 2025, 11(5), 154; https://doi.org/10.3390/jimaging11050154 - 15 May 2025
Abstract
Autoimmune gastritis (AIG) has a strong correlation with gastric neuroendocrine tumors (NETs) and gastric cancer, making its timely and accurate diagnosis crucial for tumor prevention. The endoscopic manifestations of AIG differ from those of gastritis caused by Helicobacter pylori (H. pylori) infection in terms of the affected gastric anatomical regions and the pathological characteristics observed in biopsy samples. Therefore, when diagnosing AIG based on endoscopic images, it is essential not only to distinguish between normal and atrophic gastric mucosa but also to accurately identify the anatomical region in which the atrophic mucosa is located. In this study, we propose a patient-based multitask gastroscopy image classification network that analyzes all images obtained during the endoscopic procedure. First, we employ the Scale-Invariant Feature Transform (SIFT) algorithm for image registration, generating an image similarity matrix. Next, we use a hierarchical clustering algorithm to group images based on this matrix. Finally, we apply the RepLKNet model, which utilizes large-kernel convolution, to each image group to perform two tasks: anatomical region classification and lesion recognition. Our method achieves an accuracy of 93.4 ± 0.5% (95% CI) and a precision of 92.6 ± 0.4% (95% CI) in the anatomical region classification task, which categorizes images into the fundus, body, and antrum. Additionally, it attains an accuracy of 90.2 ± 1.0% (95% CI) and a precision of 90.5 ± 0.8% (95% CI) in the lesion recognition task, which identifies the presence of gastric mucosal atrophic lesions in gastroscopy images. These results demonstrate that the proposed multitask patient-based gastroscopy image analysis method holds significant practical value for advancing computer-aided diagnosis systems for atrophic gastritis and enhancing the diagnostic accuracy and efficiency of AIG. Full article
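The grouping step described above — clustering images by pairwise similarity before per-group classification — can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes the SIFT-based registration has already produced a symmetric similarity matrix with values in [0, 1] (here a toy 4×4 matrix), and uses SciPy's average-linkage hierarchical clustering with a hypothetical distance cutoff of 0.5.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def group_images(similarity, threshold=0.5):
    """Group images by hierarchical clustering of a pairwise similarity matrix."""
    distance = 1.0 - similarity          # turn similarity into a dissimilarity
    np.fill_diagonal(distance, 0.0)      # an image is identical to itself
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=threshold, criterion="distance")

# Toy similarity matrix: images 0-1 resemble each other, as do images 2-3.
sim = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
labels = group_images(sim)
```

With the toy matrix, `labels` places images 0-1 in one group and 2-3 in another; in the paper's pipeline each such group is then passed to the RepLKNet classifier.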
(This article belongs to the Section Medical Imaging)
28 pages, 32576 KiB  
Article
Machine Learning Algorithms of Remote Sensing Data Processing for Mapping Changes in Land Cover Types over Central Apennines, Italy
by Polina Lemenkova
J. Imaging 2025, 11(5), 153; https://doi.org/10.3390/jimaging11050153 - 12 May 2025
Abstract
This work presents the use of remote sensing data for land cover mapping, with the Central Apennines, Italy, as a case study. The data include eight Landsat 8-9 Operational Land Imager/Thermal Infrared Sensor (OLI/TIRS) satellite images covering the period 2018–2024. The operational workflow comprised satellite image processing, in which the images were classified into raster maps with 10 automatically detected classes of land cover types over the study area. The approach was implemented using a set of modules in the Geographic Resources Analysis Support System (GRASS) Geographic Information System (GIS). To classify the remote sensing (RS) data, two types of approaches were carried out. The first is unsupervised classification based on the MaxLike approach and clustering, which extracted Digital Numbers (DN) of landscape features based on the spectral reflectance of signals; the second is supervised classification performed using several methods of Machine Learning (ML), technically realised in GRASS GIS scripting software. The latter included four ML algorithms from Python's Scikit-Learn library. These classifiers were implemented to detect subtle changes in land cover types derived from the satellite images showing different vegetation conditions in the spring and autumn periods in the Central Apennines, Italy. Full article
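As a minimal illustration of the supervised branch, the sketch below trains one of Scikit-Learn's classifiers on synthetic "pixels". Everything here is an assumption for the example: the abstract does not name the four algorithms used, so a random forest stands in, and the six spectral bands, three land cover classes, and the data itself are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_per_class, n_bands = 100, 6   # hypothetical: 6 spectral bands, 3 classes

# Synthetic reflectance samples: each class shifted to a different mean level.
X = (rng.normal(size=(3 * n_per_class, n_bands))
     + np.repeat(np.arange(3), n_per_class)[:, None] * 2.0)
y = np.repeat(np.arange(3), n_per_class)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
accuracy = clf.score(X, y)      # training accuracy on well-separated data
```

In GRASS GIS, the equivalent step is exposed through the `r.learn.ml`-style modules that wrap Scikit-Learn estimators over raster bands.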
16 pages, 3511 KiB  
Article
Frequency-Aware Diffusion Model for Multi-Modal MRI Image Synthesis
by Mingfeng Jiang, Peihang Jia, Xin Huang, Zihan Yuan, Dongsheng Ruan, Feng Liu and Ling Xia
J. Imaging 2025, 11(5), 152; https://doi.org/10.3390/jimaging11050152 - 11 May 2025
Abstract
Magnetic Resonance Imaging (MRI) is a widely used, non-invasive imaging technology that plays a critical role in clinical diagnostics. Multi-modal MRI, which combines images from different modalities, enhances diagnostic accuracy by offering comprehensive tissue characterization. Meanwhile, multi-modal MRI enhances downstream tasks, like brain tumor segmentation and image reconstruction, by providing richer features. While recent advances in diffusion models (DMs) show potential for high-quality image translation, existing methods still struggle to preserve fine structural details and ensure accurate image synthesis in medical imaging. To address these challenges, we propose a Frequency-Aware Diffusion Model (FADM) for generating high-quality target modality MRI images from source modality images. The FADM incorporates a discrete wavelet transform within the diffusion model framework to extract both low- and high-frequency information from MRI images, enhancing the capture of tissue structural and textural features. Additionally, a wavelet downsampling layer and supervision module are incorporated to improve frequency awareness and optimize high-frequency detail extraction. Experimental results on the BraTS 2021 dataset and a 1.5T–3T MRI dataset demonstrate that the FADM outperforms existing generative models, particularly in preserving intricate brain structures and tumor regions while generating high-quality MRI images. Full article
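The frequency split at the heart of the FADM can be illustrated with a single-level 2D wavelet transform. The paper does not state which wavelet family it uses, so the Haar filters below are an assumption chosen for brevity: the LL band carries the low-frequency tissue structure, while LH/HL/HH carry the high-frequency texture detail that the supervision module targets.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2D Haar wavelet transform (even-sized input assumed).

    Returns (LL, LH, HL, HH): the low-frequency approximation plus
    horizontal, vertical, and diagonal high-frequency detail bands.
    """
    # 1D Haar along columns (horizontal direction)
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # 1D Haar along rows (vertical direction)
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

flat = np.full((8, 8), 5.0)     # a featureless "image": all detail bands vanish
ll, lh, hl, hh = haar_dwt2(flat)
```

Each subband is half the input resolution, which is why the paper can use the transform as a downsampling layer inside the diffusion network.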
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
25 pages, 6904 KiB  
Article
A Weighted Facial Expression Analysis for Pain Level Estimation
by Parkpoom Chaisiriprasert and Nattapat Patchsuwan
J. Imaging 2025, 11(5), 151; https://doi.org/10.3390/jimaging11050151 - 9 May 2025
Abstract
Accurate assessment of pain intensity is critical, particularly for patients who are unable to verbally express their discomfort. This study proposes a novel weighted analytical framework that integrates facial expression analysis through action units (AUs) with a facial feature-based weighting mechanism to enhance the estimation of pain intensity. The proposed method was evaluated on a dataset comprising 4084 facial images from 25 individuals and demonstrated an average accuracy of 92.72% using the weighted pain level estimation model, in contrast to 83.37% achieved using conventional approaches. The observed improvements are primarily attributed to the strategic utilization of AU zones and expression-based weighting, which enable more precise differentiation between pain-related and non-pain-related facial movements. These findings underscore the efficacy of the proposed model in enhancing the accuracy and reliability of automated pain detection, especially in contexts where verbal communication is impaired or absent. Full article
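For context, a standard AU-based pain score is the Prkachin–Solomon Pain Intensity (PSPI), which combines brow lowering (AU4), orbital tightening (AU6/AU7), levator contraction (AU9/AU10), and eye closure (AU43). The sketch below computes PSPI from AU intensities; the paper's weighted model goes further by weighting AU zones, and those learned weights are not reproduced here.

```python
def pspi(au):
    """Prkachin-Solomon Pain Intensity from a dict of AU intensities.

    AU intensities are typically coded 0-5; AU43 (eyes closed) is 0 or 1.
    """
    return (au.get(4, 0)
            + max(au.get(6, 0), au.get(7, 0))
            + max(au.get(9, 0), au.get(10, 0))
            + au.get(43, 0))

# Example frame: moderate brow lowering, strong lid tightening, eyes closed.
score = pspi({4: 2, 6: 1, 7: 3, 9: 0, 10: 1, 43: 1})
```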
16 pages, 6530 KiB  
Article
Reduction of Aerial Image Misalignment in Face-to-Face 3D Aerial Display
by Atsutoshi Kurihara and Yue Bao
J. Imaging 2025, 11(5), 150; https://doi.org/10.3390/jimaging11050150 - 9 May 2025
Abstract
A Micromirror Array Plate (MMAP) has been proposed as a type of aerial display that allows users to directly touch the floating image. However, the aerial images generated by this optical element have a limited viewing angle, making them difficult to use in face-to-face interactions. Conventional methods enable face-to-face usability by displaying multiple aerial images corresponding to different viewpoints. However, because these images are two-dimensional, they cannot be displayed at the same position due to the inherent characteristics of MMAP. An omnidirectional 3D autostereoscopic aerial display has been developed to address this issue, but it requires multiple expensive and specially shaped MMAPs to generate aerial images. To overcome this limitation, this study proposes a method that combines a single MMAP with integral photography (IP) to produce 3D aerial images with depth while reducing image misalignment. The experimental results demonstrate that the proposed method successfully displays a 3D aerial image using a single MMAP and reduces image misalignment to 1.1 mm. Full article
26 pages, 17670 KiB  
Article
Adaptive High-Precision 3D Reconstruction of Highly Reflective Mechanical Parts Based on Optimization of Exposure Time and Projection Intensity
by Ci He, Rong Lai, Jin Sun, Kazuhiro Izui, Zili Wang, Xiaojian Liu and Shuyou Zhang
J. Imaging 2025, 11(5), 149; https://doi.org/10.3390/jimaging11050149 - 8 May 2025
Abstract
This article addresses the reconstruction of mechanical parts with highly reflective surfaces. Three-dimensional reconstruction based on Phase Measuring Profilometry (PMP) is a key technology in non-contact optical measurement and is widely applied in the intelligent inspection of mechanical components. Due to the high reflectivity of metallic parts, direct utilization of the captured high-dynamic-range images often results in significant information loss in the oversaturated areas and excessive noise in the dark regions, leading to geometric defects and reduced accuracy in the reconstructed point clouds. Many image-fusion-based solutions have been proposed to solve these problems. However, the unknown geometric structures and reflection characteristics of mechanical parts leave no effective guidance for the design of important imaging parameters. Therefore, an adaptive high-precision 3D reconstruction method for highly reflective mechanical parts based on the optimization of exposure time and projection intensity is proposed in this article. The projection intensity is optimized to adapt the captured images to the linear dynamic range of the hardware. The image sequence under the obtained optimal intensities is fused using an integration of a Genetic Algorithm and the Stochastic Adam optimizer to maximize the image information entropy. Then, histogram-based analysis is employed to segment regions with similar reflective properties and determine the optimal exposure time. Experimental validation was carried out on three sets of typical mechanical components with diverse geometric characteristics and varying complexity. Compared with both non-saturated single-exposure techniques and conventional image fusion methods employing fixed attenuation steps, the proposed method reduced the average whisker range of the reconstruction error by 51.18% and 25.09%, and decreased the median error by 42.48% and 25.42%, respectively. These experimental results verified the effectiveness and precision of the proposed method. Full article
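The fusion objective mentioned above — maximizing image information entropy — reduces to the Shannon entropy of the intensity histogram. A minimal sketch, assuming 8-bit images (the GA/Adam optimizer that searches over fusion weights is omitted):

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of an 8-bit image's intensity histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

uniform = np.zeros((16, 16))                            # one intensity: no information
checker = np.indices((16, 16)).sum(axis=0) % 2 * 255    # two equally likely intensities
e_uniform, e_checker = image_entropy(uniform), image_entropy(checker)
```

A flat image scores 0 bits and a balanced two-level image scores 1 bit; a well-fused exposure sequence spreads intensities across many bins and scores higher.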
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
17 pages, 3971 KiB  
Article
3D-NASE: A Novel 3D CT Nasal Attention-Based Segmentation Ensemble
by Alessandro Pani, Luca Zedda, Davide Antonio Mura, Andrea Loddo and Cecilia Di Ruberto
J. Imaging 2025, 11(5), 148; https://doi.org/10.3390/jimaging11050148 - 7 May 2025
Abstract
Accurate segmentation of the nasal cavity and paranasal sinuses in CT scans is crucial for disease assessment, treatment planning, and surgical navigation. It also facilitates the advanced computational modeling of airflow dynamics and enhances endoscopic surgery preparation. This work presents a novel ensemble framework for 3D nasal CT segmentation that synergistically combines CNN-based and transformer-based architectures, 3D-NASE. By integrating 3D U-Net, UNETR, Swin UNETR, SegResNet, DAF3D, and V-Net with majority and soft voting strategies, our approach leverages both local details and global context to improve segmentation accuracy and robustness. Results on the NasalSeg dataset demonstrate that the proposed ensemble method surpasses previous state-of-the-art results by achieving a 35.88% improvement in the DICE score and reducing the standard deviation by 4.53%. These promising results highlight the potential of our method to advance clinical workflows in diagnosis, treatment planning, and surgical navigation while also promoting further research into computationally efficient and highly accurate segmentation techniques. Full article
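The two fusion rules named above can be sketched compactly: majority voting counts each model's hard label per voxel, while soft voting averages the models' class probabilities before taking the argmax. The shapes here are invented for the example (3 models, 2 classes, a 4-voxel volume flattened to 1D), not taken from the paper.

```python
import numpy as np

def soft_vote(probs):
    """probs: (n_models, n_classes, n_voxels) -> per-voxel label from averaged probabilities."""
    return probs.mean(axis=0).argmax(axis=0)

def majority_vote(probs):
    """Per-voxel label that the largest number of models predicts individually."""
    hard = probs.argmax(axis=1)                      # (n_models, n_voxels)
    n_classes = probs.shape[1]
    counts = np.stack([(hard == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

probs = np.array([
    [[0.9, 0.2, 0.6, 0.4], [0.1, 0.8, 0.4, 0.6]],    # model 1
    [[0.8, 0.3, 0.4, 0.3], [0.2, 0.7, 0.6, 0.7]],    # model 2
    [[0.7, 0.6, 0.3, 0.2], [0.3, 0.4, 0.7, 0.8]],    # model 3
])
soft, hard = soft_vote(probs), majority_vote(probs)
```

Soft voting preserves each model's confidence, which is why ensembles often prefer it when the members output calibrated probability maps.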
30 pages, 25530 KiB  
Article
Towards the Performance Characterization of a Robotic Multimodal Diagnostic Imaging System
by George Papaioannou, Christos Mitrogiannis, Mark Schweitzer, Nikolaos Michailidis, Maria Pappa, Pegah Khosravi, Apostolos Karantanas, Sean Starling and Christian Ruberg
J. Imaging 2025, 11(5), 147; https://doi.org/10.3390/jimaging11050147 - 7 May 2025
Abstract
Characterizing imaging performance requires a multidisciplinary approach that evaluates various interconnected parameters, including dosage optimization and dynamic accuracy. Radiation dose and dynamic accuracy are challenged by patient motion, which results in poor image quality. These challenges are more prevalent in brain/cardiac pediatric patient imaging, as they relate to excess radiation dose that may be associated with various complications. Scanning of vulnerable pediatric patients ought to avoid anesthesia, owing to the critical risks associated in some cases with intracranial hemorrhages, brain strokes, and congenital heart disease. Some pediatric imaging, however, requires prolonged scanning under anesthesia and can often be a laborious, suboptimal process with a limited field of view and considerable dose. High dynamic accuracy is also necessary to diagnose tissue’s dynamic behavior beyond its static structural morphology. This study presents several performance characterization experiments from a new robotic multimodal imaging system using specially designed calibration methods at different system configurations. Additional musculoskeletal imaging and imaging from a pediatric brain stroke patient without anesthesia are presented for comparison. The findings suggest that the system’s large, dynamically controlled gantry enables scanning under full patient movement, with important improvements in scan times, accuracy, and radiation dose, and the ability to image brain structures without anesthesia. This could position the system as a potentially transformative tool in the pediatric interventional imaging landscape. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
17 pages, 2467 KiB  
Article
Quantitative Ultrasound Texture Analysis of Breast Tumors: A Comparison of a Cart-Based and a Wireless Ultrasound Scanner
by David Alberico, Lakshmanan Sannachi, Maria Lourdes Anzola Pena, Joyce Yip, Laurentius O. Osapoetra, Schontal Halstead, Daniel DiCenzo, Sonal Gandhi, Frances Wright, Michael Oelze and Gregory J. Czarnota
J. Imaging 2025, 11(5), 146; https://doi.org/10.3390/jimaging11050146 - 6 May 2025
Abstract
Previous work has demonstrated quantitative ultrasound (QUS) analysis techniques for extracting features and texture features from ultrasound radiofrequency data which can be used to distinguish between benign and malignant breast masses. It is desirable that there be good agreement between estimates of such features acquired using different ultrasound devices. Handheld ultrasound imaging systems are of particular interest as they are compact, relatively inexpensive, and highly portable. This study investigated the agreement between QUS parameters and texture features estimated from clinical ultrasound images of breast tumors acquired using two different ultrasound scanners: a traditional cart-based system and a wireless handheld ultrasound system. The 28 patients who participated were divided into two groups (benign and malignant). The reference phantom technique was used to produce functional estimates of the normalized power spectra and backscatter coefficient for each image. Root mean square differences of feature estimates were calculated for each cohort to quantify the level of feature variation attributable to tissue heterogeneity and differences in system imaging parameters. Cross-system statistical testing using the Mann–Whitney U test was performed on benign and malignant patient cohorts to assess the level of feature estimate agreement between systems, and the Bland–Altman method was employed to assess feature sets for systematic bias introduced by differences in imaging method. The range of p-values was 1.03 × 10⁻⁴ to 0.827 for the benign cohort and 3.03 × 10⁻¹⁰ to 0.958 for the malignant cohort. For both cohorts, all five of the primary QUS features (MBF, SS, SI, ASD, AAC) were found to be in agreement at the 5% significance level. A total of 13 of the 20 QUS texture features (65%) were determined to exhibit statistically significant differences in the sample medians of estimates between systems at the 5% significance level, with the remaining 7 texture features being in agreement. The results showed a comparable magnitude of feature variation between tissue heterogeneity and system effects, as well as a moderate level of statistical agreement between feature sets. Full article
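The Bland–Altman analysis used above has a compact form: for paired feature estimates from the two scanners, compute the mean difference (the bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. A sketch with made-up paired measurements (the values are illustrative, not from the study):

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement for paired measurements a, b."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired QUS feature estimates (cart-based vs. handheld)
cart = [10.1, 11.8, 9.5, 12.3, 10.9]
handheld = [10.0, 11.5, 9.9, 12.0, 11.1]
bias, lo, hi = bland_altman(cart, handheld)
```

A systematic bias between systems shows up as a bias far from zero relative to the limits of agreement.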
(This article belongs to the Section Medical Imaging)
8 pages, 1238 KiB  
Article
Shear Wave Elastography for Parotid Glands: Quantitative Analysis of Shear Elastic Modulus in Relation to Age, Gender, and Internal Architecture in Patients with Oral Cancer
by Yuka Tanabe, Ai Shirai and Ichiro Ogura
J. Imaging 2025, 11(5), 145; https://doi.org/10.3390/jimaging11050145 - 4 May 2025
Abstract
Background: Recently, shear wave elastography (SWE) has been recognized as an effective tool for evaluating Sjögren’s syndrome (SS) patients. The purpose of this study was to assess the parotid glands with SWE, especially quantitative analysis of the shear elastic modulus in relation to age, gender, and internal architecture in patients with oral cancer, to collect control data for SS. Methods: In total, 124 parotid glands of 62 patients with oral cancer were evaluated with SWE. The parotid glands were examined for internal architecture (homogeneous or heterogeneous) on B-mode. The SWE allowed the operator to place regions of interest (ROIs) for the parotid glands and automatically displayed shear elastic modulus data (kPa) for each ROI. Gender and internal architecture were compared with the shear elastic modulus of the parotid glands by the Mann–Whitney U-test. The relationship between age and shear elastic modulus was assessed using Spearman’s correlation coefficient. p < 0.05 was considered statistically significant. Results: The shear elastic modulus of the parotid glands was not significantly different according to gender (males, 7.70 ± 2.22 kPa and females, 7.67 ± 2.41 kPa, p = 0.973) or internal architecture (homogeneous: 7.69 ± 2.25 kPa and heterogeneous: 7.72 ± 2.74 kPa, p = 0.981). Furthermore, the shear elastic modulus was not correlated with age (n = 124, R = −0.133, p = 0.139). Conclusion: Our study provided control data on the shear elastic modulus of the parotid glands for SS. SWE is useful for the quantitative evaluation of the parotid glands. Full article
(This article belongs to the Section Medical Imaging)
28 pages, 8775 KiB  
Article
Motion-Perception Multi-Object Tracking (MPMOT): Enhancing Multi-Object Tracking Performance via Motion-Aware Data Association and Trajectory Connection
by Weijun Meng, Shuaipeng Duan, Sugang Ma and Bin Hu
J. Imaging 2025, 11(5), 144; https://doi.org/10.3390/jimaging11050144 - 3 May 2025
Abstract
Multiple Object Tracking (MOT) aims to detect and track multiple targets across consecutive video frames while preserving consistent object identities. While appearance-based approaches have achieved notable success, they often struggle in challenging conditions such as occlusions, motion blur, and the presence of visually similar objects, resulting in identity switches and fragmented trajectories. To address these limitations, we propose Motion-Perception Multi-Object Tracking (MPMOT), a motion-aware tracking framework that emphasizes robust motion modeling and adaptive association. MPMOT incorporates three core components: (1) a Gain Kalman Filter (GKF) that adaptively adjusts detection noise based on confidence scores, stabilizing motion prediction during uncertain observations; (2) an Adaptive Cost Matrix (ACM) that dynamically fuses motion and appearance cues during track–detection association, improving robustness under ambiguity; and (3) a Global Connection Model (GCM) that reconnects fragmented tracklets by modeling spatio-temporal consistency. Extensive experiments on the MOT16, MOT17, and MOT20 benchmarks demonstrate that MPMOT consistently outperforms state-of-the-art trackers, achieving IDF1 scores of 72.8% and 72.6% on MOT16 and MOT17, respectively, surpassing the widely used FairMOT baseline by 1.1% and 1.3%. Additionally, rigorous statistical validation through post hoc analysis confirms that MPMOT’s improvements in tracking accuracy and identity preservation are statistically significant across all datasets. MPMOT delivers these gains while maintaining real-time performance, making it a scalable and reliable solution for multi-object tracking in dynamic and crowded environments. Full article
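The GKF idea — inflating the measurement noise when detection confidence is low so that uncertain detections move the track less — can be sketched with a one-dimensional constant-velocity Kalman filter. The inverse-confidence scaling of R below is an assumption for illustration; the paper's exact gain schedule is not reproduced here.

```python
import numpy as np

class ConfidenceKalman1D:
    """Constant-velocity Kalman filter whose measurement noise scales with detection confidence."""

    def __init__(self, r_base=1.0, q=1e-2):
        self.x = np.zeros(2)                          # state: [position, velocity]
        self.P = np.eye(2) * 10.0                     # large initial uncertainty
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity transition
        self.H = np.array([[1.0, 0.0]])               # we observe position only
        self.Q = np.eye(2) * q
        self.r_base = r_base

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z, confidence):
        # Low confidence -> large R -> small Kalman gain -> detection trusted less.
        R = np.array([[self.r_base / max(confidence, 1e-3)]])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x

kf_hi = ConfidenceKalman1D(); kf_hi.predict()
x_hi = kf_hi.update(10.0, confidence=0.99)   # trusted detection: big correction
kf_lo = ConfidenceKalman1D(); kf_lo.predict()
x_lo = kf_lo.update(10.0, confidence=0.05)   # uncertain detection: small correction
```

Starting both filters from position 0, the high-confidence update lands much closer to the measured position 10 than the low-confidence one, which is exactly the stabilizing behavior the GKF aims for.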
(This article belongs to the Section Computer Vision and Pattern Recognition)
18 pages, 4899 KiB  
Review
Cardiac Magnetic Resonance in the Assessment of Atrial Cardiomyopathy and Pulmonary Vein Isolation Planning for Atrial Fibrillation
by Nicola Pegoraro, Serena Chiarello, Riccardo Bisi, Giuseppe Muscogiuri, Matteo Bertini, Aldo Carnevale, Melchiore Giganti and Alberto Cossu
J. Imaging 2025, 11(5), 143; https://doi.org/10.3390/jimaging11050143 - 2 May 2025
Abstract
Atrial fibrillation (AF) is the most frequently observed type of arrhythmia among adults, and its absolute prevalence is steadily rising in close association with the aging of the population, varying from 2% in the general population to 10–12% among the elderly. The relatively new concepts of ‘atrial cardiomyopathy’ and ‘AF-related atrial cardiomyopathy’, along with the growing body of knowledge regarding remodeling, function, and tissue characterization, highlight the need for novel approaches to the diagnostic process as well as to the therapeutic guidance and monitoring of atrial arrhythmias. Advanced imaging techniques, particularly cardiac magnetic resonance (CMR) imaging, have emerged as pivotal in the detailed assessment of atrial structure and function. CMR facilitates the precise measurement of left atrial volume and morphology, which are critical predictors of AF recurrence post-intervention. Furthermore, it enables the evaluation of atrial fibrosis using late gadolinium enhancement (LGE), offering a non-invasive method to assess the severity and distribution of fibrotic tissue. Accurate CMR mapping of pulmonary vein anatomy enhances the precision of pulmonary vein isolation procedures, potentially improving outcomes in AF management. This review underlines the integration of novel diagnostic tools in enhancing the understanding and management of AF, advocating for a shift towards more personalized and effective therapeutic programs. Full article
16 pages, 1139 KiB  
Article
ARAN: Age-Restricted Anonymized Dataset of Children Images and Body Measurements
by Hezha H. MohammedKhan, Cascha Van Wanrooij, Eric O. Postma, Çiçek Güven, Marleen Balvert, Heersh Raof Saeed and Chenar Omer Ali Al Jaf
J. Imaging 2025, 11(5), 142; https://doi.org/10.3390/jimaging11050142 - 30 Apr 2025
Abstract
Precisely estimating a child’s body measurements and weight from a single image is useful in pediatrics for monitoring growth and detecting early signs of malnutrition. The development of estimation models for this task is hampered by the unavailability of a labeled image dataset to support supervised learning. This paper introduces the “Age-Restricted Anonymized” (ARAN) dataset, the first labeled image dataset of children with body measurements approved by an ethics committee under the European General Data Protection Regulation guidelines. The ARAN dataset consists of images of 512 children aged 16 to 98 months, each captured from four different viewpoints, i.e., 2048 images in total. The dataset is anonymized manually on the spot through a face mask and includes each child’s height, weight, age, waist circumference, and head circumference measurements. The dataset is a solid foundation for developing prediction models for various tasks related to these measurements; it addresses the gap in computer vision tasks related to body measurements as it is significantly larger than any other comparable dataset of children, along with diverse viewpoints. To create a suitable reference, we trained state-of-the-art deep learning algorithms on the ARAN dataset to predict body measurements from the images. The best results are obtained by a DenseNet121 model achieving competitive estimates for the body measurements, outperforming state-of-the-art results on similar tasks. The ARAN dataset is developed as part of a collaboration to create a mobile app to measure children’s growth and detect early signs of malnutrition, contributing to the United Nations Sustainable Development Goals. Full article