Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 15.3 days after submission; acceptance to publication takes 3.5 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
Pursuing Better Representations: Balancing Discriminability and Transferability for Few-Shot Class-Incremental Learning
J. Imaging 2025, 11(11), 391; https://doi.org/10.3390/jimaging11110391 - 4 Nov 2025
Abstract
Few-Shot Class-Incremental Learning (FSCIL) aims to continually learn novel classes from limited data while retaining knowledge of previously learned classes. To mitigate catastrophic forgetting, most approaches pre-train a powerful backbone on the base session and keep it frozen during incremental sessions. Within this framework, existing studies primarily focus on representation learning in FSCIL, particularly Self-Supervised Contrastive Learning (SSCL), to enhance the transferability of representations and thereby boost model generalization to novel classes. However, they face a trade-off dilemma: improving transferability comes at the expense of discriminability, precluding simultaneous high performance on both base and novel classes. To address this issue, we propose BR-FSCIL, a representation learning framework for the FSCIL scenario. In the pre-training stage, we first design a Hierarchical Contrastive Learning (HierCon) algorithm. HierCon leverages label information to model hierarchical relationships among features. In contrast to SSCL, it maintains strong discriminability when promoting transferability. Second, to further improve the model’s performance on novel classes, an Alignment Modulation (AM) loss is proposed that explicitly facilitates learning of knowledge shared across classes from an inter-class perspective. Building upon the hierarchical discriminative structure established by HierCon, it additionally improves the model’s adaptability to novel classes. Through optimization at both intra-class and inter-class levels, the representations learned by BR-FSCIL achieve a balance between discriminability and transferability. Extensive experiments on mini-ImageNet, CIFAR100, and CUB200 demonstrate the effectiveness of our method, which achieves final session accuracies of 53.83%, 53.04%, and 62.60%, respectively.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Review
Next-Generation Advances in Prostate Cancer Imaging and Artificial Intelligence Applications
by
Kathleen H. Miao, Julia H. Miao, Mark Finkelstein, Aritrick Chatterjee and Aytekin Oto
J. Imaging 2025, 11(11), 390; https://doi.org/10.3390/jimaging11110390 - 3 Nov 2025
Abstract
Prostate cancer is one of the leading causes of cancer-related morbidity and mortality worldwide, and imaging plays a critical role in its detection, localization, staging, treatment, and management. The advent of artificial intelligence (AI) has introduced transformative possibilities in prostate imaging, offering enhanced accuracy, efficiency, and consistency. This review explores the integration of AI in prostate cancer diagnostics across key imaging modalities, including multiparametric MRI (mpMRI), PSMA PET/CT, and transrectal ultrasound (TRUS). Advanced AI technologies, such as machine learning, deep learning, and radiomics, are being applied for lesion detection, risk stratification, segmentation, biopsy targeting, and treatment planning. AI-augmented systems have demonstrated the ability to support PI-RADS scoring, automate prostate and tumor segmentation, guide targeted biopsies, and optimize radiation therapy. Despite promising performance, challenges persist regarding data heterogeneity, algorithm generalizability, ethical considerations, and clinical implementation. Looking ahead, multimodal AI models integrating imaging, genomics, and clinical data hold promise for advancing precision medicine in prostate cancer care and assisting clinicians, particularly in underserved regions with limited access to specialists. Continued multidisciplinary collaboration will be essential to translate these innovations into evidence-based practice. This article explores current AI applications and future directions that are transforming prostate imaging and patient care.
Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
Open Access Article
Knowledge-Guided Symbolic Regression for Interpretable Camera Calibration
by
Rui Pimentel de Figueiredo
J. Imaging 2025, 11(11), 389; https://doi.org/10.3390/jimaging11110389 - 2 Nov 2025
Abstract
Calibrating cameras accurately requires the identification of projection and distortion models that effectively account for lens-specific deviations. Conventional formulations, like the pinhole model or radial–tangential corrections, often struggle to represent the asymmetric and nonlinear distortions encountered in complex environments such as autonomous navigation, robotics, and immersive imaging. Although neural methods offer greater adaptability, they demand extensive training data, are computationally intensive, and often lack transparency. This work introduces a symbolic model discovery framework guided by physical knowledge, where symbolic regression and genetic programming (GP) are used in tandem to identify calibration models tailored to specific optical behaviors. The approach incorporates a broad class of known distortion models, including Brown–Conrady, Mei–Rives, Kannala–Brandt, and double-sphere, as modular components, while remaining extensible to any predefined or domain-specific formulation. Embedding these models directly into the symbolic search process constrains the solution space, enabling efficient parameter fitting and robust model selection without overfitting. Through empirical evaluation across a variety of lens types, including fisheye, omnidirectional, catadioptric, and traditional cameras, we show that our method produces results on par with or surpassing those of established calibration techniques. The outcome is a flexible, interpretable, and resource-efficient alternative suitable for deployment scenarios where calibration data are scarce or computational resources are constrained.
Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
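The Brown–Conrady radial–tangential model named above is one of the standard building blocks the symbolic search can draw on. As a point of reference only (not the paper's discovered models), here is a minimal NumPy sketch of that classic distortion applied to normalized image coordinates; the coefficient values are hypothetical.

```python
import numpy as np

def brown_conrady(xn, yn, k1, k2, k3, p1, p2):
    """Apply classic radial-tangential (Brown-Conrady) distortion to
    normalized camera coordinates (x/z, y/z)."""
    r2 = xn**2 + yn**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
    y_d = yn * radial + p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    return x_d, y_d

# Hypothetical coefficients, for illustration only.
x_d, y_d = brown_conrady(np.array([0.1]), np.array([-0.2]),
                         k1=-0.28, k2=0.07, k3=0.0, p1=1e-4, p2=-5e-5)
```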
Open Access Article
Evaluating Feature-Based Homography Pipelines for Dual-Camera Registration in Acupoint Annotation
by
Thathsara Nanayakkara, Hadi Sedigh Malekroodi, Jaeuk Sul, Chang-Su Na, Myunggi Yi and Byeong-il Lee
J. Imaging 2025, 11(11), 388; https://doi.org/10.3390/jimaging11110388 - 1 Nov 2025
Abstract
Reliable acupoint localization is essential for developing artificial intelligence (AI) and extended reality (XR) tools in traditional Korean medicine; however, conventional annotation of 2D images often suffers from inter- and intra-annotator variability. This study presents a low-cost dual-camera imaging system that fuses infrared (IR) and RGB views on a Raspberry Pi 5 platform, incorporating an IR ink pen in conjunction with a 780 nm emitter array to standardize point visibility. Among the tested marking materials, the IR ink showed the highest contrast and visibility under IR illumination, making it the most suitable for acupoint detection. Five feature detectors (SIFT, ORB, KAZE, AKAZE, and BRISK) were evaluated with two matchers (FLANN and BF) to construct representative homography pipelines. Comparative evaluations across multiple camera-to-surface distances revealed that KAZE + FLANN achieved the lowest mean 2D error (1.17 ± 0.70 px) and the lowest mean aspect-aware error (0.08 ± 0.05%) while remaining computationally feasible on the Raspberry Pi 5. In hand-image experiments across multiple postures, the dual-camera registration maintained a mean 2D error below ~3 px and a mean aspect-aware error below ~0.25%, confirming stable and reproducible performance. The proposed framework provides a practical foundation for generating high-quality acupoint datasets, supporting future AI-based localization, XR integration, and automated acupuncture-education systems.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
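For readers unfamiliar with the feature-based pipelines compared above, a minimal OpenCV sketch of the best-performing combination reported (KAZE features with FLANN matching feeding a RANSAC homography) follows; the file names, ratio-test threshold, and RANSAC tolerance are illustrative, not taken from the paper.

```python
import cv2
import numpy as np

# Illustrative inputs; the paper's IR/RGB frames are not included here.
ir = cv2.imread("ir_view.png", cv2.IMREAD_GRAYSCALE)
rgb = cv2.imread("rgb_view.png", cv2.IMREAD_GRAYSCALE)

kaze = cv2.KAZE_create()
kp1, des1 = kaze.detectAndCompute(ir, None)
kp2, des2 = kaze.detectAndCompute(rgb, None)

# FLANN with a KD-tree index suits KAZE's float descriptors.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe ratio test

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Mean reprojection error in pixels, comparable in spirit to the 2D error above.
proj = cv2.perspectiveTransform(src, H)
err = np.linalg.norm(proj - dst, axis=2).mean()
```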
Open Access Article
GATF-PCQA: A Graph Attention Transformer Fusion Network for Point Cloud Quality Assessment
by
Abdelouahed Laazoufi, Mohammed El Hassouni and Hocine Cherifi
J. Imaging 2025, 11(11), 387; https://doi.org/10.3390/jimaging11110387 - 1 Nov 2025
Abstract
Point cloud quality assessment remains a critical challenge due to the high dimensionality and irregular structure of 3D data, as well as the need to align objective predictions with human perception. To address this, we propose a novel graph-based learning architecture that integrates perceptual features with advanced graph neural networks. Our method consists of four main stages: First, key perceptual features, including curvature, saliency, and color, are extracted to capture relevant geometric and visual distortions. Second, a graph-based representation of the point cloud is created using these characteristics, where nodes represent perceptual clusters and weighted edges encode their feature similarities, yielding a structured adjacency matrix. Third, a novel Graph Attention Network Transformer Fusion (GATF) module dynamically refines the importance of these features and generates a unified, view-specific representation. Finally, a Graph Convolutional Network (GCN) regresses the fused features into a final quality score. We validate our approach on three benchmark datasets: ICIP2020, WPC, and SJTU-PCQA. Experimental results demonstrate that our method achieves high correlation with human subjective scores, outperforming existing state-of-the-art metrics by effectively modeling the perceptual mechanisms of quality judgment.
Full article
(This article belongs to the Special Issue Computer Vision and Deep Learning: Trends and Applications (3rd Edition))
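The second stage above builds a weighted adjacency matrix whose edges encode feature similarity between perceptual clusters. A minimal NumPy sketch of that idea follows, assuming a Gaussian kernel on pairwise feature distances; the paper's exact similarity measure and clustering are not specified in the abstract and may differ.

```python
import numpy as np

def similarity_adjacency(features, sigma=1.0):
    """Weighted adjacency from per-cluster perceptual features
    (e.g., curvature/saliency/color statistics), one row per node.
    A Gaussian kernel on pairwise feature distances is assumed here."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    adj = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

# Toy example: 5 perceptual clusters, 6-dimensional feature vectors.
A = similarity_adjacency(np.random.rand(5, 6))
```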
Open Access Article
Gated Attention-Augmented Double U-Net for White Blood Cell Segmentation
by
Ilyes Benaissa, Athmane Zitouni, Salim Sbaa, Nizamettin Aydin, Ahmed Chaouki Megherbi, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed and Cosimo Distante
J. Imaging 2025, 11(11), 386; https://doi.org/10.3390/jimaging11110386 - 1 Nov 2025
Abstract
Segmentation of white blood cells is critical for a wide range of applications. It aims to identify and isolate individual white blood cells from medical images, enabling accurate diagnosis and monitoring of diseases. In the last decade, many researchers have focused on this task using U-Net, one of the most widely used deep learning architectures. To further enhance segmentation accuracy and robustness, recent advances have explored the combination of U-Net with other techniques, such as attention mechanisms and aggregation techniques. However, a common challenge in white blood cell image segmentation is the similarity between the cells’ cytoplasm and other surrounding blood components, which often leads to inaccurate or incomplete segmentation due to difficulties in distinguishing low-contrast or subtle boundaries, leaving a significant gap for improvement. In this paper, we propose GAAD-U-Net, a novel architecture that integrates attention-augmented convolutions to better capture ambiguous boundaries and complex structures such as overlapping cells and low-contrast regions, followed by a gating mechanism to further suppress irrelevant feature information. These two key components are integrated into the Double U-Net base architecture. Our model achieves state-of-the-art performance on white blood cell benchmark datasets, with a 3.4% Dice score coefficient (DSC) improvement specifically on the SegPC-2021 dataset. The proposed model achieves superior performance as measured by the mean intersection over union (IoU) and DSC, with notably strong segmentation performance even for difficult images.
Full article
(This article belongs to the Special Issue Computer Vision for Medical Image Analysis)
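The DSC and IoU figures quoted above are the standard overlap measures for binary masks. A minimal NumPy sketch of both, independent of any particular model, is shown below.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Standard overlap metrics for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

# Toy masks for illustration.
d, j = dice_and_iou(np.random.rand(256, 256) > 0.5,
                    np.random.rand(256, 256) > 0.5)
```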
Open Access Article
Lightweight and Real-Time Driver Fatigue Detection Based on MG-YOLOv8 with Facial Multi-Feature Fusion
by
Chengming Chen, Xinyue Liu, Meng Zhou, Zhijian Li, Zhanqi Du and Yandan Lin
J. Imaging 2025, 11(11), 385; https://doi.org/10.3390/jimaging11110385 - 1 Nov 2025
Abstract
Driver fatigue is a primary factor in traffic accidents and poses a serious threat to road safety. To address this issue, this paper proposes a multi-feature fusion fatigue detection method based on an improved YOLOv8 model. First, the method uses an enhanced YOLOv8 model to achieve high-precision face detection. Then, it crops the detected face regions. Next, the lightweight PFLD (Practical Facial Landmark Detector) model performs keypoint detection on the cropped images, extracting 68 facial feature points and calculating key indicators related to fatigue status. These indicators include the eye aspect ratio (EAR), eyelid closure percentage (PERCLOS), mouth aspect ratio (MAR), and head posture ratio (HPR). To mitigate the impact of individual differences on detection accuracy, the paper introduces a novel sliding window model that combines a dynamic threshold adjustment strategy with an exponential weighted moving average (EWMA) algorithm. Based on this framework, blink frequency (BF), yawn frequency (YF), and nod frequency (NF) are calculated to extract time-series behavioral features related to fatigue. Finally, the driver’s fatigue state is determined using a comprehensive fatigue assessment algorithm. Experimental results on the WIDER FACE and YAWDD datasets demonstrate this method’s significant advantages in improving detection accuracy and computational efficiency. By striking a better balance between real-time performance and accuracy, the proposed method shows promise for real-world driving applications.
Full article
(This article belongs to the Special Issue Towards Deeper Understanding of Image and Video Processing and Analysis)
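The per-frame indicators named above (EAR, PERCLOS) and the EWMA smoothing follow widely used definitions. A minimal sketch of those three ingredients is given below; the closed-eye threshold, smoothing factor, and window length are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks (p1..p6), per the common definition:
    (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def perclos(ear_series, closed_thresh=0.2):
    """Fraction of frames in the window judged as eyes-closed."""
    return float((np.asarray(ear_series) < closed_thresh).mean())

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average to smooth noisy indicators."""
    out, s = [], values[0]
    for v in values:
        s = alpha * v + (1 - alpha) * s
        out.append(s)
    return np.array(out)

# Illustrative 30-frame sliding window of EAR values.
ears = 0.25 + 0.05 * np.random.randn(30)
print(perclos(ewma(ears)))
```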
Open Access Article
Structure-Aware Progressive Multi-Modal Fusion Network for RGB-T Crack Segmentation
by
Zhengrong Yuan, Xin Ding, Xinhong Xia, Yibin He, Hui Fang, Bo Yang and Wei Fu
J. Imaging 2025, 11(11), 384; https://doi.org/10.3390/jimaging11110384 - 1 Nov 2025
Abstract
Crack segmentation in images plays a pivotal role in the monitoring of structural surfaces, serving as a fundamental technique for assessing structural integrity. However, existing methods that rely solely on RGB images exhibit high sensitivity to light conditions, which significantly restricts their adaptability in complex environmental scenarios. To address this, we propose a structure-aware progressive multi-modal fusion network (SPMFNet) for RGB-thermal (RGB-T) crack segmentation. The main idea is to integrate complementary information from RGB and thermal images and incorporate structural priors (edge information) to achieve accurate segmentation. Here, to better fuse multi-layer features from different modalities, a progressive multi-modal fusion strategy is designed. In the shallow encoder layers, two gate control attention (GCA) modules are introduced to dynamically regulate the fusion process through a gating mechanism, allowing the network to adaptively integrate modality-specific structural details based on the input. In the deeper layers, two attention feature fusion (AFF) modules are employed to enhance semantic consistency by leveraging both local and global attention, thereby facilitating the effective interaction and complementarity of high-level multi-modal features. In addition, edge prior information is introduced to encourage the predicted crack regions to preserve structural integrity, which is constrained by a joint loss of edge-guided loss, multi-scale focal loss, and adaptive fusion loss. Experimental results on publicly available RGB-T crack detection datasets demonstrate that the proposed method outperforms both classical and advanced approaches, verifying the effectiveness of the progressive fusion strategy and the utilization of the structural prior.
Full article
(This article belongs to the Special Issue Image Segmentation Techniques: Current Status and Future Directions (2nd Edition))
Open Access Article
CDSANet: A CNN-ViT-Attention Network for Ship Instance Segmentation
by
Weidong Zhu, Piao Wang and Kuifeng Luan
J. Imaging 2025, 11(11), 383; https://doi.org/10.3390/jimaging11110383 - 31 Oct 2025
Abstract
Ship instance segmentation in remote sensing images is essential for maritime applications such as intelligent surveillance and port management. However, this task remains challenging due to dense target distributions, large variations in ship scales and shapes, and limited high-quality datasets. The existing YOLOv8 framework mainly relies on convolutional neural networks and CIoU loss, which are less effective in modeling global–local interactions and producing accurate mask boundaries. To address these issues, we propose CDSANet, a novel one-stage ship instance segmentation network. CDSANet integrates convolutional operations, Vision Transformers, and attention mechanisms within a unified architecture. The backbone adopts a Convolutional Vision Transformer Attention (CVTA) module to enhance both local feature extraction and global context perception. The neck employs dynamic-weighted DOWConv to adaptively handle multi-scale ship instances, while SIoU loss improves localization accuracy and orientation robustness. Additionally, CBAM enhances the network’s focus on salient regions, and a MixUp-based augmentation strategy is used to improve model generalization. Extensive experiments on the proposed VLRSSD dataset demonstrate that CDSANet achieves state-of-the-art performance with a mask AP (50–95) of 75.9%, surpassing the YOLOv8 baseline by 1.8%.
Full article
(This article belongs to the Special Issue Image Segmentation Techniques: Current Status and Future Directions (2nd Edition))
Open Access Article
CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture
by
Vladimir Frants, Sos Agaian, Karen Panetta and Peter Huang
J. Imaging 2025, 11(11), 382; https://doi.org/10.3390/jimaging11110382 - 30 Oct 2025
Abstract
Images used in real-world applications such as image or video retrieval, outdoor surveillance, and autonomous driving suffer from poor weather conditions. When designing robust computer vision systems, removing adverse weather such as haze, rain, and snow is a significant problem. Recently, deep-learning methods have offered solutions for single types of degradation. Current state-of-the-art universal methods struggle with combinations of degradations, such as haze and rain streaks. Few algorithms have been developed that perform well when presented with images containing multiple adverse weather conditions. This work focuses on developing an efficient solution for multiple adverse weather removal, using a unified quaternion neural architecture called CMAWRNet. It is based on a novel texture–structure decomposition block, a novel lightweight encoder–decoder quaternion transformer architecture, and an attentive fusion block with low-light correction. We also introduce a quaternion similarity loss function to better preserve color information. Quantitative and qualitative evaluation on current state-of-the-art benchmark datasets and real-world images shows the performance advantages of the proposed CMAWRNet compared to other state-of-the-art weather removal approaches that deal with multiple weather artifacts. Extensive computer simulations validate that CMAWRNet improves the performance of downstream applications, such as object detection. This is the first time the decomposition approach has been applied to the universal weather removal task.
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
A Deep Regression Model for Tongue Image Color Correction Based on CNN
by
Xiyuan Cao, Delong Zhang, Chunyang Jin, Wei Zhang, Zhidong Zhang and Chenyang Xue
J. Imaging 2025, 11(11), 381; https://doi.org/10.3390/jimaging11110381 - 29 Oct 2025
Abstract
Different viewing or shooting situations can affect color authenticity and generally lead to visual inconsistencies for the same images. At present, deep learning has gained popularity and opened up new avenues for image processing and optimization. In this paper, we propose a novel regression model named TococoNet (Tongue Color Correction Network), built on a convolutional neural network (CNN), to eliminate color bias in tongue images. The TococoNet model consists of symmetric encoder–decoder U-Blocks connected by an M-Block through concatenation layers for feature fusion at different levels. Initially, we train our model by synthetically introducing five common color casts. Multiple image quality indicators consistently demonstrate that our model achieves accurate color correction for tongue images and simultaneously surpasses conventional algorithms and shallow networks. Furthermore, we conduct correction experiments by introducing random degrees of color bias, and the model continues to achieve excellent correction results. The model achieves up to 84% correction effectiveness in terms of color distance ΔE for tongue images with varying degrees of random color cast. Finally, we obtain excellent color correction on actual captured images for tongue diagnosis applications. Among these, the maximum ΔE can be reduced from 30.38 to 6.05. Overall, the TococoNet model possesses excellent color correction capabilities, which opens promising opportunities for clinical assistance and automatic diagnosis.
Full article
(This article belongs to the Section Image and Video Processing)
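The ΔE values quoted above are color differences in CIELAB space. As a reference only, here is a minimal sketch of the CIE76 form using scikit-image for the RGB-to-Lab conversion; the abstract does not state which ΔE variant the paper uses, so CIE76 is an assumption.

```python
import numpy as np
from skimage import color

def delta_e76(rgb_a, rgb_b):
    """Mean CIE76 color difference between two RGB images in [0, 1]."""
    lab_a = color.rgb2lab(rgb_a)
    lab_b = color.rgb2lab(rgb_b)
    return float(np.linalg.norm(lab_a - lab_b, axis=-1).mean())

# Toy images standing in for a corrected tongue image and its reference.
a = np.random.rand(64, 64, 3)
b = np.clip(a + 0.05, 0, 1)
print(delta_e76(a, b))
```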
Open Access Article
Pov9D: Point Cloud-Based Open-Vocabulary 9D Object Pose Estimation
by
Tianfu Wang and Hongguang Wang
J. Imaging 2025, 11(11), 380; https://doi.org/10.3390/jimaging11110380 - 28 Oct 2025
Abstract
We propose a point cloud-based framework for open-vocabulary object pose estimation, called Pov9D. Existing approaches are predominantly RGB-based and often rely on texture or appearance cues, making them susceptible to pose ambiguities when objects are textureless or lack distinctive visual features. In contrast, Pov9D takes 3D point clouds as input, enabling direct access to geometric structures that are essential for accurate and robust pose estimation, especially in open-vocabulary settings. To bridge the gap between geometric observations and semantic understanding, Pov9D integrates category-level textual descriptions to guide the estimation process. To this end, we introduce a text-conditioned shape prior generator that predicts a normalized object shape from both the observed point cloud and the textual category description. This shape prior provides a consistent geometric reference, facilitating precise prediction of object translation, rotation, and size, even for unseen categories. Extensive experiments on the OO3D-9D benchmark demonstrate that Pov9D achieves state-of-the-art performance, improving Abs IoU@50 by 7.2% and Rel 10° 10 cm by 27.2% over OV9D.
Full article
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)
Open Access Article
Causal Intervention and Counterfactual Reasoning for Multimodal Pedestrian Trajectory Prediction
by
Xinyu Han and Huosheng Xu
J. Imaging 2025, 11(11), 379; https://doi.org/10.3390/jimaging11110379 - 28 Oct 2025
Abstract
Pedestrian trajectory prediction is crucial for autonomous systems navigating human-populated environments. However, existing methods face fundamental challenges, including spurious correlations induced by confounding social environments, passive uncertainty modeling that limits prediction diversity, and bias coupling during feature interaction that contaminates trajectory representations. To address these issues, we propose a novel Causal Intervention and Counterfactual Reasoning (CICR) framework that shifts trajectory prediction from associative learning to a causal inference paradigm. Our approach features a hierarchical architecture with three core components: a Multisource Encoder that extracts comprehensive spatio-temporal and social context features; a Causal Intervention Fusion Module that eliminates confounding bias through the front-door criterion and cross-attention mechanisms; and a Counterfactual Reasoning Decoder that proactively generates diverse future trajectories by simulating hypothetical scenarios. Extensive experiments on the ETH/UCY, SDD, and AVD datasets demonstrate superior performance, achieving an average ADE/FDE of 0.17/0.24 on ETH/UCY and 7.13/10.29 on SDD, with particular advantages in long-term prediction and cross-domain generalization.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
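The ADE/FDE numbers reported above are the usual trajectory metrics: the mean Euclidean error over all predicted timesteps and the error at the final timestep. A minimal NumPy sketch follows.

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: arrays of shape (T, 2) with predicted and ground-truth
    (x, y) positions over T future timesteps."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return err.mean(), err[-1]

# Toy 12-step prediction horizon, as commonly used on ETH/UCY.
ade, fde = ade_fde(np.random.rand(12, 2), np.random.rand(12, 2))
```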
Open Access Article
Investigation of the Robustness and Transferability of Adversarial Patches in Multi-View Infrared Target Detection
by
Qing Zhou, Zhongchen Zhou, Zhaoxiang Zhang, Wei Luo, Feng Xiao, Sijia Xia, Chunjia Zhu and Long Wang
J. Imaging 2025, 11(11), 378; https://doi.org/10.3390/jimaging11110378 - 27 Oct 2025
Abstract
This paper proposes a novel adversarial patch-generation method for infrared images, focusing on enhancing the robustness and transferability of infrared adversarial patches. To improve the flexibility and diversity of the generation process, a Bernoulli random dropout strategy is adopted. The loss function integrates multiple components, including target hiding loss, smoothing loss, structural similarity loss, and patch pixel value loss, ensuring that the generated patches maintain low texture complexity and natural visual features. During model training, the Grad-CAM algorithm is employed to identify the critical regions of interest in the target detector, where adversarial patches are applied to maximize the attack effectiveness. Furthermore, affine transformations and random erasing operations are introduced to increase the diversity and adaptability of patches, thereby enhancing their effectiveness across different scenarios. Experimental results demonstrate that the proposed GADP (Generative Adversarial Patch based on Bernoulli Random Dropout and Loss Function Optimization) algorithm achieves a high attack success rate (ASR) of 75.8% on various target detection models, significantly reducing the average precision (AP). Specifically, the AP of the YOLOv5s model drops from 81.3% to 15.1%. Compared with existing adversarial attack methods such as advYOLO Patch and QR Attack, GADP exhibits superior transferability and attack performance, reducing the AP of multiple detection models to around 40%. The proposed method is not only theoretically innovative but also shows potential practical value, particularly in tasks such as unmanned aerial vehicle (UAV) detection and ground security under low-visibility environments. This study provides new insights into adversarial attack research for infrared target recognition.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
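Attack success rate in this setting is typically the fraction of patched images in which the hidden target is no longer detected above a confidence threshold. A minimal sketch of that bookkeeping is given below; the threshold and the exact success criterion are assumptions, not the paper's protocol.

```python
import numpy as np

def attack_success_rate(scores_after_patch, conf_thresh=0.25):
    """scores_after_patch: per-image maximum detector confidence for the
    hidden target class after the adversarial patch is applied.
    The attack counts as successful when no detection clears the threshold."""
    scores = np.asarray(scores_after_patch)
    return float((scores < conf_thresh).mean())

# Illustrative confidences from a patched test set.
print(attack_success_rate([0.1, 0.4, 0.05, 0.2, 0.02]))
```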
Open Access Article
Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach
by
Mohammed Jeryo and Ahad Harati
J. Imaging 2025, 11(11), 377; https://doi.org/10.3390/jimaging11110377 - 27 Oct 2025
Abstract
Event cameras provide microsecond temporal resolution, high dynamic range, and low latency by asynchronously capturing per-pixel luminance changes, thereby introducing a novel sensing paradigm. These advantages render them well-suited for high-speed applications such as autonomous vehicles and dynamic environments. Nevertheless, the sparsity of event data and the absence of dense annotations are significant obstacles to supervised learning for motion segmentation from event streams. Domain adaptation is also challenging due to the considerable domain shift in intensity images. To address these challenges, we propose a two-phase cross-modality adaptation framework that translates motion segmentation knowledge from labeled RGB-flow data to unlabeled event streams. A dual-branch encoder extracts modality-specific motion and appearance features from RGB and optical flow in the source domain. Using reconstruction networks, event voxel grids are converted into pseudo-image and pseudo-flow modalities in the target domain. These modalities are subsequently re-encoded using frozen RGB-trained encoders. Multi-level consistency losses are implemented on features, predictions, and outputs to enforce domain alignment. Our design enables the model to acquire domain-invariant, semantically rich features through the use of shallow architectures, thereby reducing training costs and facilitating real-time inference with a lightweight prediction path. The proposed architecture, alongside the utilized hybrid loss function, effectively bridges the domain and modality gap. We evaluate our method on two challenging benchmarks: EVIMO2, which incorporates real-world dynamics, high-speed motion, illumination variation, and multiple independently moving objects; and MOD++, which features complex object dynamics, collisions, and dense 1 kHz supervision in synthetic scenes. The proposed UDA framework achieves 83.1% and 79.4% accuracy on EVIMO2 and MOD++, respectively, outperforming existing state-of-the-art approaches, such as EV-Transfer and SHOT, by up to 3.6%. Additionally, it is lighter and faster and delivers higher mIoU and F1 scores.
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
Adverse-Weather Image Restoration Method Based on VMT-Net
by
Zhongmin Liu, Xuewen Yu and Wenjin Hu
J. Imaging 2025, 11(11), 376; https://doi.org/10.3390/jimaging11110376 - 26 Oct 2025
Abstract
To address global semantic loss, local detail blurring, and spatial–semantic conflict during image restoration under adverse weather conditions, we propose an image restoration network that integrates Mamba with Transformer architectures. We first design a Vision-Mamba–Transformer (VMT) module that combines the long-range dependency modeling of Vision Mamba with the global contextual reasoning of Transformers, facilitating the joint modeling of global structures and local details, thus mitigating information loss and detail blurring during restoration. Second, we introduce an Adaptive Content Guidance (ACG) module that employs dynamic gating and spatial–channel attention to enable effective inter-layer feature fusion, thereby enhancing cross-layer semantic consistency. Finally, we embed the VMT and ACG modules into a U-Net backbone, achieving efficient integration of multi-scale feature modeling and cross-layer fusion, significantly improving reconstruction quality under complex weather conditions. The experimental results show that on Snow100K-S/L, VMT-Net improves PSNR over the baseline by approximately 0.89 dB and 0.36 dB, with SSIM gains of about 0.91% and 0.11%, respectively. On Outdoor-Rain and Raindrop, it performs similarly to the baseline and exhibits superior detail recovery in real-world scenes. Overall, the method demonstrates robustness and strong detail restoration across diverse adverse-weather conditions.
Full article
(This article belongs to the Topic Transformer and Deep Learning Applications in Image Processing)
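The PSNR gains quoted above follow the standard definition of peak signal-to-noise ratio. A minimal sketch for images scaled to [0, 1] is shown below, purely as a reference for the metric.

```python
import numpy as np

def psnr(restored, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((restored - reference) ** 2)
    return 10 * np.log10(peak**2 / mse)

# Toy restored/ground-truth pair.
print(psnr(np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)))
```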
Open Access Article
The Structural Similarity Can Identify the Presence of Noise in Video Data from Unmanned Vehicles
by
Anzor Orazaev, Pavel Lyakhov, Valery Andreev and Denis Butusov
J. Imaging 2025, 11(11), 375; https://doi.org/10.3390/jimaging11110375 - 26 Oct 2025
Abstract
This paper proposes a method for detecting distorted frames in video footage recorded by an unmanned vehicle. Detection is performed by analyzing a sequence of video frames, using the contrast component of the structural similarity index (SSIM) between the previous and current frames. This approach allows for the detection of distortions in the video caused by various types of noise. The scientific novelty lies in adapting the SSIM contrast component to reference-free inter-frame analysis of footage shot from an unmanned vehicle. Three videos were considered during the simulation; they were distorted by strong random impulse noise, Gaussian noise, and mixed noise. Every 100th frame of the experimental video was subjected to distortion with increasing density. An additional measure was introduced to provide a more accurate assessment of distortion detection quality. This measure is based on the average absolute difference in similarity between video frames. The developed approach allows for effective identification of distortions and is of significant importance for monitoring systems and video data analysis, particularly in footage obtained from unmanned vehicles, where video quality is critical for subsequent processing and analysis.
Full article
(This article belongs to the Section Image and Video Processing)
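The contrast term of SSIM used for the inter-frame comparison above has the closed form c(x, y) = (2·σx·σy + C2) / (σx² + σy² + C2). A minimal global (non-windowed) NumPy sketch for consecutive grayscale frames follows; the paper's windowing and decision thresholds are not reproduced here.

```python
import numpy as np

def ssim_contrast(frame_prev, frame_curr, data_range=255.0, k2=0.03):
    """Global contrast component of SSIM between two grayscale frames:
    c = (2*sx*sy + C2) / (sx**2 + sy**2 + C2)."""
    c2 = (k2 * data_range) ** 2
    sx = frame_prev.std()
    sy = frame_curr.std()
    return (2 * sx * sy + c2) / (sx**2 + sy**2 + c2)

# A sharp drop in the contrast term between consecutive frames can flag
# a distorted (noisy) frame in the sequence.
clean = np.random.randint(0, 256, (120, 160)).astype(float)
noisy = np.clip(clean + 60 * np.random.randn(120, 160), 0, 255)
print(ssim_contrast(clean, noisy))
```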
Open Access Article
A Comprehensive Evaluation of Thigh Mineral-Free Lean Mass Measures Using Dual Energy X-Ray Absorptiometry (DXA) in Young Children
by
Trey R. Naylor, Mariana V. Jacobs, Michael A. Samaan, Laura C. Murphy, Douglas J. Schneider, Margaret O. Murphy, Hong Huang, John A. Bauer and Jody L. Clasey
J. Imaging 2025, 11(11), 374; https://doi.org/10.3390/jimaging11110374 - 25 Oct 2025
Abstract
This study aimed to (1) demonstrate the intra- and interrater reliability of quadriceps (QUADS) and hamstring (HAMS) mineral-free lean (MFL) mass measures using DXA scanning, (2) determine the association of total thigh MFL mass measures with MFL mass measures of the hamstrings and quadriceps combined and (3) analyze the association between total thigh MFL mass and total body MFL mass measures. A total of 80 young children (aged 5 to 11 yrs) participated and unique regions of interest were created using custom analysis software with manual tracing of the QUADS, HAMS, and total thigh MFL mass measures. Repeated-measure analysis of variance was used to determine if there were significant differences among the MFL measures while intraclass correlation coefficients (ICC), coefficients of variation (CV), and regression analysis were used to determine the intra- and interrater reliability and the explained variance in the association among MFL mass measures. The right interrater QUADS MFL mass was the only significant group mean difference, and ICCs between (≥0.961) and within (≥0.919) raters were high for all MFL measures with low variation across all MFL measures (≤6.13%). The explained variance was 92.5% and 96.3% for the between-investigator analyses of the right and left total thigh MFL mass measures, respectively. Furthermore, 97.5% of the variance in total body MFL mass was explained by the total thigh MFL mass. DXA MFL mass measures of the QUADS, HAMS and total thigh can be confidently used in young children and may provide an alternative to CT or MRI scanning when assessing changes in MFL cross-sectional area or volume measures due to disease progression, training and rehabilitative strategies.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Correction
Correction: El Othmani et al. AI-driven Automated Blood Cell Anomaly Detection: Enhancing Diagnostics and Telehealth in Hematology. J. Imaging 2025, 11, 157
by
Oussama El Othmani, Amine Mosbah, Aymen Yahyaoui, Amina Bouatay and Raouf Dhaouadi
J. Imaging 2025, 11(11), 373; https://doi.org/10.3390/jimaging11110373 - 24 Oct 2025
Abstract
The authors wish to make the following corrections to the published paper [...]
Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
Open Access Article
Redefining MRI-Based Skull Segmentation Through AI-Driven Multimodal Integration
by
Michel Beyer, Alexander Aigner, Alexandru Burde, Alexander Brasse, Sead Abazi, Lukas B. Seifert, Jakob Wasserthal, Martin Segeroth, Mohamed Omar and Florian M. Thieringer
J. Imaging 2025, 11(11), 372; https://doi.org/10.3390/jimaging11110372 - 22 Oct 2025
Abstract
Skull segmentation in magnetic resonance imaging (MRI) is essential for cranio-maxillofacial (CMF) surgery planning, yet manual approaches are time-consuming and error-prone. Computed tomography (CT) provides superior bone contrast but exposes patients to ionizing radiation, which is particularly concerning in pediatric care. This study presents an AI-based workflow that enables skull segmentation directly from routine MRI. Using 186 paired CT–MRI datasets, CT-based segmentations were transferred to MRI via multimodal registration to train dedicated deep learning models. Performance was evaluated against manually segmented CT ground truth using Dice Similarity Coefficient (DSC), Mean Surface Distance (MSD), and Hausdorff Distance (HD). AI achieved higher performance on CT (DSC 0.981) than MRI (DSC 0.864), with MSD and HD also favoring CT. Despite lower absolute accuracy on MRI, the approach substantially improved segmentation quality compared with manual MRI methods, particularly in clinically relevant regions. This automated method enables accurate skull modeling from standard MRI without radiation exposure or specialized sequences. While CT remains more precise, the presented framework enhances MRI utility in surgical planning, reduces manual workload, and supports safer, patient-specific treatment, especially for pediatric and trauma cases.
Full article
(This article belongs to the Section AI in Imaging)
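The Hausdorff distance reported above can be computed from the two surface point sets (predicted vs. ground-truth skull). A minimal SciPy sketch of the symmetric Hausdorff distance follows; the surface extraction step itself is omitted and the point clouds are toy data.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(surface_a, surface_b):
    """Symmetric Hausdorff distance between two (N, 3) surface point sets."""
    d_ab = directed_hausdorff(surface_a, surface_b)[0]
    d_ba = directed_hausdorff(surface_b, surface_a)[0]
    return max(d_ab, d_ba)

# Toy point clouds standing in for predicted and ground-truth skull surfaces.
print(hausdorff(np.random.rand(500, 3), np.random.rand(500, 3)))
```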
Journal Menu
- J. Imaging Home
- Aims & Scope
- Editorial Board
- Reviewer Board
- Topical Advisory Panel
- Instructions for Authors
- Special Issues
- Topics
- Sections
- Article Processing Charge
- Indexing & Archiving
- Most Cited & Viewed
- Journal Statistics
- Journal History
- Journal Awards
- Conferences
- Editorial Office
- 10th Anniversary
Journal Browser
Highly Accessed Articles
Latest Books
E-Mail Alert
News
Topics
Topic in
Applied Sciences, Electronics, MAKE, J. Imaging, Sensors
Applied Computer Vision and Pattern Recognition: 2nd Edition
Topic Editors: Antonio Fernández-Caballero, Byung-Gyu Kim
Deadline: 31 December 2025
Topic in
Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang
Deadline: 31 March 2026
Topic in
Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Topic in
Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye
Deadline: 31 May 2026
Conferences
Special Issues
Special Issue in
J. Imaging
Advancement in Multispectral and Hyperspectral Pansharpening Image Processing
Guest Editors: Simone Zini, Mirko Paolo Barbato, Flavio Piccoli
Deadline: 10 November 2025
Special Issue in
J. Imaging
Advances in Machine Learning for Computer Vision Applications
Guest Editors: Gurmail Singh, Stéfano Frizzo Stefenon
Deadline: 30 November 2025
Special Issue in
J. Imaging
Explainable AI in Computer Vision
Guest Editor: Bas Van der Velden
Deadline: 30 November 2025
Special Issue in
J. Imaging
Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives—2nd Edition
Guest Editors: Gerardo Cazzato, Francesca Arezzo
Deadline: 30 November 2025