- Gated Attention-Augmented Double U-Net for White Blood Cell Segmentation
- Symbolic Regression for Interpretable Camera Calibration
- GATF-PCQA: A Graph Attention Transformer Fusion Network for Point Cloud Quality Assessment
- Multi-Channel Spectro-Temporal Representations for Parkinson’s Detection
- Image Matching: Foundations, State of the Art, and Future Directions
Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 15.3 days after submission; acceptance to publication takes 3.5 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
Neural Radiance Fields-Driven Exploration of Visual Communication and Spatial Interaction Design for Immersive Digital Installations
J. Imaging 2025, 11(11), 411; https://doi.org/10.3390/jimaging11110411 - 13 Nov 2025
Abstract
In immersive digital devices, high environmental complexity can lead to rendering delays and loss of interactive details, resulting in a fragmented experience. This paper proposes a lightweight NeRF (Neural Radiance Fields) modeling and multimodal perception fusion method. First, a sparse hash code is constructed based on Instant-NGP (Instant Neural Graphics Primitives) to accelerate scene radiance field generation. Second, parameter distillation and channel pruning are used to reduce the model’s size and reduce computational overheads. Next, multimodal data from a depth camera and an IMU (Inertial Measurement Unit) is fused, and Kalman filtering is used to improve pose tracking accuracy. Finally, the optimized NeRF model is integrated into the Unity engine, utilizing custom shaders and asynchronous rendering to achieve low-latency viewpoint responsiveness. Experiments show that the file size of this method in high-complexity scenes is only 79.5 MB ± 5.3 MB, and the first loading time is only 2.9 s ± 0.4 s, effectively reducing rendering latency. The SSIM is 0.951 ± 0.016 at 1.5 m/s, and the GME is 7.68 ± 0.15 at 1.5 m/s. It can stably restore texture details and edge sharpness under dynamic viewing angles. In scenarios that support 3–5 people interacting simultaneously, the average interaction response delay is only 16.3 ms, and the average jitter error is controlled at 0.12°, significantly improving spatial interaction performance. In conclusion, this study provides effective technical solutions for high-quality immersive interaction in complex public scenarios. Future work will explore the framework’s adaptability in larger-scale dynamic environments and further optimize the network synchronization mechanism for multi-user concurrency.
(This article belongs to the Section Image and Video Processing)
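The pose-tracking step above fuses depth-camera and IMU data with a Kalman filter. A minimal sketch of that idea, assuming a linear constant-velocity model over a single pose coordinate (the paper's actual state and measurement models are not given in the abstract):

```python
import numpy as np

# Illustrative 1D Kalman filter: the IMU drives the prediction, the depth
# camera supplies position measurements at a lower rate. All models and
# noise levels here are assumptions, not the paper's calibration.
dt = 0.01                                  # IMU sample period (s)
F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity transition
B = np.array([[0.5 * dt**2], [dt]])        # control input: IMU acceleration
H = np.array([[1.0, 0.0]])                 # camera observes position only
Q = 1e-4 * np.eye(2)                       # process noise (IMU drift)
R = np.array([[1e-2]])                     # measurement noise (camera)

def kf_step(x, P, accel, z=None):
    x = F @ x + B * accel                  # predict with the IMU reading
    P = F @ P @ F.T + Q
    if z is not None:                      # correct when a camera pose arrives
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros((2, 1)), np.eye(2)
for t in range(100):
    z = np.array([[0.5]]) if t % 10 == 0 else None   # camera at 1/10 IMU rate
    x, P = kf_step(x, P, accel=0.1, z=z)
```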
Open Access Article
TASA: Text-Anchored State–Space Alignment for Long-Tailed Image Classification
by Long Li, Tinglei Jia, Huaizhi Yue, Huize Cheng, Yongfeng Bu and Zhaoyang Zhang
J. Imaging 2025, 11(11), 410; https://doi.org/10.3390/jimaging11110410 - 13 Nov 2025
Abstract
Long-tailed image classification remains challenging for vision–language models. Head classes dominate training while tail classes are underrepresented and noisy, and short prompts with weak text supervision further amplify head bias. This paper presents TASA, an end-to-end framework that stabilizes textual supervision and enhances cross-modal fusion. A Semantic Distribution Modulation (SDM) module constructs class-specific text prototypes by cosine-weighted fusion of multiple LLM-generated descriptions with a canonical template, providing stable and diverse semantic anchors without training text parameters. A Dual-Space Cross-Modal Fusion (DCF) module incorporates selective-scan state–space blocks into both image and text branches, enabling bidirectional conditioning and efficient feature fusion through a lightweight multilayer perceptron. Together with a margin-aware alignment loss, TASA aligns images with class prototypes for classification without requiring paired image–text data or per-class prompt tuning. Experiments on CIFAR-10/100-LT, ImageNet-LT, and Places-LT demonstrate consistent improvements across many-, medium-, and few-shot groups. Ablation studies confirm that DCF yields the largest single-module gain, while SDM and DCF combined provide the most robust and balanced performance. These results highlight the effectiveness of integrating text-driven prototypes with state–space fusion for long-tailed classification.
(This article belongs to the Section Image and Video Processing)
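SDM's cosine-weighted fusion of LLM-generated descriptions with a canonical template can be sketched as follows; the frozen text encoder, softmax weighting, and final normalization are assumptions filled in around the abstract's high-level recipe:

```python
import numpy as np

def l2norm(v, eps=1e-8):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def class_prototype(template_emb, desc_embs):
    """template_emb: (d,) embedding of the canonical prompt template.
    desc_embs: (n, d) embeddings of n LLM-generated class descriptions.
    Returns a fused class prototype weighted by cosine similarity."""
    t = l2norm(template_emb)
    d = l2norm(desc_embs)
    w = d @ t                          # cosine similarity of each description
    w = np.exp(w) / np.exp(w).sum()    # softmax weights (assumed normalization)
    return l2norm(t + w @ d)           # template anchored by fused descriptions

rng = np.random.default_rng(0)
proto = class_prototype(rng.normal(size=512), rng.normal(size=(8, 512)))
```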
Open Access Article
Image Matching for UAV Geolocation: Classical and Deep Learning Approaches
by Fatih Baykal, Mehmet İrfan Gedik, Constantino Carlos Reyes-Aldasoro and Cefa Karabağ
J. Imaging 2025, 11(11), 409; https://doi.org/10.3390/jimaging11110409 - 12 Nov 2025
Abstract
Today, unmanned aerial vehicles (UAVs) are heavily dependent on Global Navigation Satellite Systems (GNSSs) for positioning and navigation. However, GNSS signals are vulnerable to jamming and spoofing attacks. This poses serious security risks, especially for military operations and critical civilian missions. In order to solve this problem, an image-based geolocation system has been developed that eliminates GNSS dependency. The proposed system estimates the geographical location of the UAV by matching the aerial images taken by the UAV with previously georeferenced high-resolution satellite images. For this purpose, common visual features were determined between satellite and UAV images and matching operations were carried out using methods based on the homography matrix. Thanks to image processing, a significant relationship has been established between the area where the UAV is located and the geographical coordinates, and reliable positioning is ensured even in cases where GNSS signals cannot be used. Within the scope of the study, traditional methods such as SIFT, AKAZE, and Multiple Template Matching were compared with learning-based methods including SuperPoint, SuperGlue, and LoFTR. The results showed that deep learning-based approaches can make successful matches, especially at high altitudes.
(This article belongs to the Topic Image Processing, Signal Processing and Their Applications)
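The classical end of this pipeline (SIFT correspondences filtered by Lowe's ratio test, then a RANSAC homography) is standard OpenCV; a minimal sketch that projects the UAV image center into satellite-tile pixel coordinates. File names are placeholders, and the final pixel-to-latitude/longitude geotransform is omitted:

```python
import cv2
import numpy as np

uav = cv2.imread("uav_frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
sat = cv2.imread("sat_tile.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(uav, None)
kp2, des2 = sift.detectAndCompute(sat, None)

# Lowe's ratio test on 2-NN matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project the UAV image center into satellite pixel coordinates; a separate
# geotransform (not shown) would convert those pixels to lat/lon.
center = np.float32([[[uav.shape[1] / 2, uav.shape[0] / 2]]])
sat_xy = cv2.perspectiveTransform(center, H)
```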
Open Access Article
Wafer Defect Detection Technology Based on CTM-IYOLOv10 Network
by Pengcheng Ji, Zhenzhi He, Weiwei Yang, Jiawei Du, Guo Ye and Xiangning Lu
J. Imaging 2025, 11(11), 408; https://doi.org/10.3390/jimaging11110408 - 12 Nov 2025
Abstract
The continuous scaling of semiconductor devices has increased the density and complexity of wafer dies, making precise and efficient defect detection a critical task for intelligent manufacturing. Traditional manual or semi-automated inspection approaches are often inefficient, error-prone, and susceptible to missed or false detections, particularly for small or irregular defects. This study presents a wafer defect detection framework that integrates clustering–template matching (CTM) with an improved YOLOv10 network (CTM-IYOLOv10). The CTM strategy enhances die segmentation efficiency and mitigates redundant matching in multi-die fields of view, while the introduction of a modified GhostConv module and an enhanced BiFPN structure strengthens feature representation, reduces computational redundancy, and improves small-object detection. Furthermore, data augmentation strategies are employed to improve robustness and generalization. Experimental evaluations demonstrate that CTM-IYOLOv10 achieves a detection accuracy of 98.1%, reduces inference time by 23.2%, and compresses model size by 52.3% compared with baseline YOLOv10, and consistently outperforms representative detectors such as YOLOv5 and YOLOv8. These results highlight both the methodological contributions of the proposed architecture and its practical significance for real-time wafer defect inspection in semiconductor manufacturing.
(This article belongs to the Section Computer Vision and Pattern Recognition)
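The template-matching half of CTM maps naturally onto OpenCV's normalized cross-correlation; a sketch that locates die instances in a multi-die field of view, with the response threshold and file names as assumptions:

```python
import cv2
import numpy as np

field = cv2.imread("wafer_fov.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
die = cv2.imread("die_template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation response map over the field of view.
resp = cv2.matchTemplate(field, die, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(resp >= 0.8)                              # assumed threshold

# Greedy non-maximum suppression so each die is reported once.
h, w = die.shape
boxes = []
for y, x in sorted(zip(ys, xs), key=lambda p: -resp[p]):
    if all(abs(y - by) > h // 2 or abs(x - bx) > w // 2 for by, bx in boxes):
        boxes.append((y, x))                                # top-left corners
```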
Open Access Article
Fully Automated AI-Based Digital Workflow for Mirroring of Healthy and Defective Craniofacial Models
by Michel Beyer, Julian Grossi, Alexandru Burde, Sead Abazi, Lukas Seifert, Joachim Polligkeit, Neha Umakant Chodankar and Florian M. Thieringer
J. Imaging 2025, 11(11), 407; https://doi.org/10.3390/jimaging11110407 - 12 Nov 2025
Abstract
The accurate reconstruction of craniofacial defects requires the precise segmentation and mirroring of healthy anatomy. Conventional workflows rely on manual interaction, making them time-consuming and subject to operator variability. This study developed and validated a fully automated digital pipeline that integrates deep learning–based segmentation with algorithmic mirroring for craniofacial reconstruction. A total of 388 cranial CT scans were used to train a three-dimensional nnU-Net model for skull and mandible segmentation. A Principal Component Analysis–Iterative Closest Point (PCA–ICP) algorithm was then applied to compute the sagittal symmetry plane and perform mirroring. Automated results were compared with expert-generated segmentations and manually defined symmetry planes using Dice Similarity Coefficient (DSC), Mean Surface Distance (MSD), Hausdorff Distance (HD), and angular deviation. The nnU-Net achieved high segmentation accuracy for both the mandible (mean DSC 0.956) and the skull (mean DSC 0.965). Mirroring results showed minimal angular deviation from expert reference planes (mandible: 1.32° ± 0.71° in defect cases, 1.58° ± 1.12° in intact cases; skull: 1.75° ± 0.84° in defect cases, 1.15° ± 0.81° in intact cases). The presence of defects did not significantly affect accuracy. This automated workflow demonstrated robust performance and clinical applicability, offering standardized, reproducible, and time-efficient planning for craniofacial reconstruction.
(This article belongs to the Section AI in Imaging)
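The mirroring step reflects the segmented anatomy across a PCA-derived sagittal plane; a reduced sketch of that geometry (PCA only, without the ICP refinement the paper adds, and with the choice of principal axis as an assumption):

```python
import numpy as np

def mirror_across_symmetry_plane(points):
    """points: (n, 3) surface vertices. Estimates a candidate sagittal plane
    through the centroid via PCA and reflects the points across it. Taking
    the least-variance axis as the left-right normal is an assumption; the
    paper refines the plane further with ICP."""
    c = points.mean(axis=0)
    centered = points - c
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]                             # least-variance principal axis
    d = centered @ n                       # signed distance to the plane
    return points - 2.0 * d[:, None] * n   # reflection p' = p - 2((p-c)·n)n

skull = np.random.default_rng(1).normal(size=(1000, 3))   # stand-in vertices
mirrored = mirror_across_symmetry_plane(skull)
```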
Open Access Communication
High-Resolution Peripheral Quantitative Computed Tomography (HR-pQCT) for Assessment of Avascular Necrosis of the Lunate
by Esin Rothenfluh, Georg F. Erbach, Léna G. Dietrich, Laura De Pellegrin, Daniela A. Frauchiger and Rainer J. Egli
J. Imaging 2025, 11(11), 406; https://doi.org/10.3390/jimaging11110406 - 12 Nov 2025
Abstract
This exploratory study investigates the feasibility and diagnostic value of high-resolution peripheral quantitative computed tomography (HR-pQCT) in detecting structural and microarchitectural changes in lunate avascular necrosis (AVN), or Kienböck’s disease. Five adult patients with unilateral AVN underwent either MRI or CT, alongside HR-pQCT of both wrists. Imaging features such as subchondral remodeling, joint space narrowing, and bone fragmentation were assessed across modalities. HR-pQCT detected at least one additional pathological feature not seen on MRI or CT in four of five patients and revealed early subchondral changes in two contralateral asymptomatic wrists. Quantitative measurements of bone volume fraction (BV/TV) further indicated altered trabecular structure correlating with disease stage. These findings suggest that HR-pQCT may offer enhanced sensitivity for early-stage AVN and better delineation of disease extent, which is critical for informed surgical planning. While limited by small sample size, this study provides preliminary evidence supporting HR-pQCT as a complementary imaging tool in the assessment of lunate AVN, with potential to improve early detection, staging accuracy, and individualized treatment strategies.
(This article belongs to the Section Medical Imaging)
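The reported bone volume fraction (BV/TV) is a direct voxel count over the segmented volume; a minimal sketch, assuming a binary bone mask and a region-of-interest mask:

```python
import numpy as np

def bone_volume_fraction(bone_mask, roi_mask):
    """BV/TV: bone voxels divided by total voxels inside the region of
    interest. bone_mask, roi_mask: boolean arrays of identical shape."""
    bv = np.count_nonzero(bone_mask & roi_mask)
    tv = np.count_nonzero(roi_mask)
    return bv / tv if tv else float("nan")

rng = np.random.default_rng(2)
vol = rng.random((64, 64, 64))                       # stand-in HR-pQCT volume
print(bone_volume_fraction(vol > 0.7, np.ones_like(vol, dtype=bool)))
```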
Open Access Article
Unified-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images
by Yongheng Zhang
J. Imaging 2025, 11(11), 405; https://doi.org/10.3390/jimaging11110405 - 11 Nov 2025
Abstract
This work introduces Uni-Removal, an innovative two-stage framework that effectively addresses the critical challenge of domain adaptation in unified image restoration. Contemporary approaches often face significant performance degradation when transitioning from synthetic training environments to complex real-world scenarios due to the substantial domain discrepancy. Our proposed solution establishes a comprehensive pipeline that systematically bridges this gap through dual-phase representation learning. In the first stage, we implement a structured multi-teacher knowledge distillation mechanism that enables a unified student architecture to assimilate and integrate specialized expertise from multiple pre-trained degradation-specific networks. This knowledge transfer is rigorously regularized by our novel Instance-Grained Contrastive Learning (IGCL) objective, which explicitly enforces representation consistency across both feature hierarchies and image spaces. The second stage introduces a groundbreaking output distribution calibration methodology that employs Cluster-Grained Contrastive Learning (CGCL) to adversarially align the restored outputs with authentic real-world image characteristics, effectively embedding the student model within the natural image manifold without requiring paired supervision. Comprehensive experimental validation demonstrates Uni-Removal’s superior performance across multiple real-world degradation tasks including dehazing, deraining, and deblurring, where it consistently surpasses existing state-of-the-art methods. The framework’s exceptional generalization capability is further evidenced by its competitive denoising performance on the SIDD benchmark and, more significantly, by delivering a substantial 4.36 mAP improvement in downstream object detection tasks, unequivocally establishing its practical utility as a robust pre-processing component for advanced computer vision systems.
(This article belongs to the Special Issue Computer Vision and Deep Learning: Trends and Applications (3rd Edition))
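The instance-grained contrastive objective in the first stage can be read as an InfoNCE-style loss over paired student/teacher features; a minimal PyTorch sketch under that assumption (the paper's exact positive/negative construction is not specified in the abstract):

```python
import torch
import torch.nn.functional as F

def info_nce(student_feats, teacher_feats, temperature=0.1):
    """student_feats, teacher_feats: (batch, dim). Each student feature's
    positive is the teacher feature of the same instance; every other
    teacher feature in the batch acts as a negative."""
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    logits = s @ t.T / temperature            # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)   # diagonal entries are positives

loss = info_nce(torch.randn(32, 256), torch.randn(32, 256))
```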
Open Access Article
LatAtk: A Medical Image Attack Method Focused on Lesion Areas with High Transferability
by Long Li, Yibo Huang, Chong Li, Fei Zhou, Jingjing Li and Kamarul Hawari Ghazali
J. Imaging 2025, 11(11), 404; https://doi.org/10.3390/jimaging11110404 - 11 Nov 2025
Abstract
The rise in trusted machine learning has prompted concerns about the security, reliability and controllability of deep learning, especially when it is applied to sensitive areas involving life and health safety. To thoroughly analyze potential attacks and promote innovation in security technologies for DNNs, this paper conducts research on adversarial attacks against medical images and proposes a medical image attack method that focuses on lesion areas and has good transferability, named LatAtk. First, based on the image segmentation algorithm, LatAtk divides the target image into an attackable area (lesion area) and a non-attackable area and injects perturbations into the attackable area to disrupt the attention of the DNNs. Second, a class activation loss function based on gradient-weighted class activation mapping is proposed. By obtaining the importance of features in images, the features that play a positive role in model decision-making are further disturbed, making LatAtk highly transferable. Third, a texture feature loss function based on local binary patterns is proposed as a constraint to reduce the damage to non-semantic features, effectively preserving texture features of target images and improving the concealment of adversarial samples. Experimental results show that LatAtk has superior aggressiveness, transferability and concealment compared to advanced baselines.
(This article belongs to the Section Medical Imaging)
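The texture-feature constraint compares local binary patterns of the clean and adversarial images; a sketch using scikit-image's LBP, with the histogram L2 distance as an assumed concrete form of the loss:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_texture_loss(clean, adv, P=8, R=1.0):
    """L2 distance between uniform-LBP histograms of the clean and the
    adversarial image; penalizing it preserves texture statistics."""
    bins = P + 2                                 # uniform method bin count
    hists = []
    for img in (clean, adv):
        lbp = local_binary_pattern(img, P, R, method="uniform")
        hist, _ = np.histogram(lbp, bins=bins, range=(0, bins), density=True)
        hists.append(hist)
    return float(np.sum((hists[0] - hists[1]) ** 2))

rng = np.random.default_rng(3)
x = rng.random((64, 64))
print(lbp_texture_loss(x, np.clip(x + 0.01 * rng.normal(size=x.shape), 0, 1)))
```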
Open Access Article
Integrating Deep Learning and Radiogenomics: A Novel Approach to Glioblastoma Segmentation and MGMT Methylation Prediction
by Nabil M. Abdelaziz, Emad Abdel-Aziz Dawood and Alshaimaa A. Tantawy
J. Imaging 2025, 11(11), 403; https://doi.org/10.3390/jimaging11110403 - 11 Nov 2025
Abstract
Radiogenomics, which integrates imaging phenotypes with genomic profiles, enhances diagnosis, prognosis, and treatment planning for glioblastomas. This study specifically establishes a correlation between radiomic features and MGMT promoter methylation status, advancing towards a non-invasive, integrated diagnostic paradigm. Conventional genetic analysis requires invasive biopsies, which cause delays in obtaining results and necessitate further surgeries. Our methodology is twofold: First, an enhanced U-Net model segments brain tumor regions with high precision (Dice coefficient: 0.889). Second, a hybrid classifier, leveraging the complementary features of EfficientNetB0 and ResNet50, predicts MGMT promoter methylation status from the segmented volumes. The proposed framework demonstrated superior performance in predicting MGMT promoter methylation status in glioblastoma patients compared to conventional methods, achieving a classification accuracy of 95% and an AUC of 0.96. These results underscore the model’s potential to enhance patient stratification and guide treatment selection. The accurate prediction of MGMT promoter methylation status via non-invasive imaging provides a reliable criterion for anticipating patient responsiveness to alkylating chemotherapy. This capability equips clinicians with a tool to inform personalized treatment strategies, optimizing therapeutic efficacy from the outset.
(This article belongs to the Topic Intelligent Image Processing Technology)
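The segmentation stage is scored with the Dice coefficient (0.889 here); for reference, a minimal implementation for binary masks:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice similarity for binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.count_nonzero(pred & target)
    total = np.count_nonzero(pred) + np.count_nonzero(target)
    return (2.0 * inter + eps) / (total + eps)
```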
Open Access Article
Key-Frame-Aware Hierarchical Learning for Robust Gait Recognition
by Ke Wang and Hua Huo
J. Imaging 2025, 11(11), 402; https://doi.org/10.3390/jimaging11110402 - 10 Nov 2025
Abstract
Gait recognition in unconstrained environments is severely hampered by variations in view, clothing, and carrying conditions. To address this, we introduce HierarchGait, a key-frame-aware hierarchical learning framework. Our approach uniquely integrates three complementary modules: a TemplateBlock-based Motion Extraction (TBME) for coarse-to-fine anatomical feature learning, a Sequence-Level Spatio-temporal Feature Aggregator (SSFA) to identify and prioritize discriminative key-frames, and a Frame-level Feature Re-segmentation Extractor (FFRE) to capture fine-grained motion details. This synergistic design yields a robust and comprehensive gait representation. We demonstrate the superiority of our method through extensive experiments. On the highly challenging CASIA-B dataset, HierarchGait achieves new state-of-the-art average Rank-1 accuracies of 98.1% under Normal (NM), 95.9% under Bag (BG), and 87.5% under Coat (CL) conditions. Furthermore, on the large-scale OU-MVLP dataset, our model attains a 91.5% average accuracy. These results validate the significant advantage of explicitly modeling anatomical hierarchies and temporal key-moments for robust gait recognition.
(This article belongs to the Section Biometrics, Forensics, and Security)
Open Access Article
Knee Cartilage Quantification: Performance of Low-Field MR in Detecting Low Grades of Chondropathy
by Francesco Pucciarelli, Antonio Marino, Maria Carla Faugno, Giuseppe Argento, Edoardo Monaco, Andrea Redler, Nicola Maffulli, Pierfrancesco Orlandi, Marta Zerunian, Domenico De Santis, Michela Polici, Damiano Caruso, Marco Francone and Andrea Laghi
J. Imaging 2025, 11(11), 401; https://doi.org/10.3390/jimaging11110401 - 8 Nov 2025
Abstract
This study aimed to evaluate the diagnostic accuracy of T2 mapping on low-field (0.31 T) MRI for detecting low-grade knee chondropathy, using arthroscopy as the reference standard. Fifty-two patients (mean age 48.1 ± 17.2 years) undergoing arthroscopy for anterior cruciate ligament or meniscal tears were prospectively enrolled, excluding those with previous surgery, infection, or high-grade chondropathy (Outerbridge III–IV). MRI was performed with a 0.31 T scanner using a 3D SHARC sequence, and T2 relaxometric maps were generated for 14 cartilage regions per knee according to the WORMS classification. Arthroscopy, performed within one month by two blinded surgeons, served as the gold standard. A total of 728 regions were analyzed. T2 mapping differentiated healthy cartilage (grade 0) from early chondropathy (grades I–II) with an optimal cut-off of 45 ms and moderate discriminative accuracy (AUC = 0.714 for Reader 1 and 0.709 for Reader 2). Agreement with arthroscopy was good (κ = 0.731), with excellent intra-reader (ICC = 0.998) and good inter-reader reproducibility (ICC = 0.753). Most degenerative changes were located at the femoral condyles (59%). Low-field T2 mapping showed good diagnostic performance and reproducibility in detecting early cartilage degeneration, supporting its potential as a cost-effective and accessible quantitative biomarker for the assessment of cartilage integrity in clinical practice.
(This article belongs to the Section Medical Imaging)
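Scoring a fixed T2 cut-off against arthroscopic grades reduces to thresholding plus an ROC curve; a sketch with scikit-learn, where the arrays are stand-ins for the per-region measurements:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

t2_ms = np.array([38.0, 42.5, 47.1, 51.3, 44.9, 56.2])   # stand-in T2 values
chondropathy = np.array([0, 0, 1, 1, 0, 1])              # 0 = grade 0, 1 = grades I-II

pred = (t2_ms > 45.0).astype(int)          # the paper's optimal 45 ms cut-off
auc = roc_auc_score(chondropathy, t2_ms)   # threshold-free discrimination
print(pred, auc)
```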
Open Access Article
Benchmarking Compact VLMs for Clip-Level Surveillance Anomaly Detection Under Weak Supervision
by Kirill Borodin, Kirill Kondrashov, Nikita Vasiliev, Ksenia Gladkova, Inna Larina, Mikhail Gorodnichev and Grach Mkrtchian
J. Imaging 2025, 11(11), 400; https://doi.org/10.3390/jimaging11110400 - 8 Nov 2025
Abstract
CCTV safety monitoring demands anomaly detectors combine reliable clip-level accuracy with predictable per-clip latency despite weak supervision. This work investigates compact vision–language models (VLMs) as practical detectors for this regime. A unified evaluation protocol standardizes preprocessing, prompting, dataset splits, metrics, and runtime settings to compare parameter-efficiently adapted compact VLMs against training-free VLM pipelines and weakly supervised baselines. Evaluation spans accuracy, precision, recall, F1, ROC-AUC, and average per-clip latency to jointly quantify detection quality and efficiency. With parameter-efficient adaptation, compact VLMs achieve performance on par with, and in several cases exceeding, established approaches while retaining competitive per-clip latency. Adaptation further reduces prompt sensitivity, producing more consistent behavior across prompt regimes under the shared protocol. These results show that parameter-efficient fine-tuning enables compact VLMs to serve as dependable clip-level anomaly detectors, yielding a favorable accuracy–efficiency trade-off within a transparent and consistent experimental setup.
(This article belongs to the Special Issue Object Detection in Video Surveillance Systems)
Open Access Article
HitoMi-Cam: A Shape-Agnostic Person Detection Method Using the Spectral Characteristics of Clothing
by Shuji Ono
J. Imaging 2025, 11(11), 399; https://doi.org/10.3390/jimaging11110399 - 7 Nov 2025
Abstract
While convolutional neural network (CNN)-based object detection is widely used, it exhibits a shape dependency that degrades performance for postures not included in the training data. Building upon our previous simulation study published in this journal, this study implements and evaluates the spectral-based approach on physical hardware to address this limitation. Specifically, this paper introduces HitoMi-Cam, a lightweight and shape-agnostic person detection method that uses the spectral reflectance properties of clothing. The author implemented the system on a resource-constrained edge device without a GPU to assess its practical viability. The results indicate that a processing speed of 23.2 frames per second (fps) (253 × 190 pixels) is achievable, suggesting that the method can be used for real-time applications. In a simulated search and rescue scenario where the performance of CNNs declines, HitoMi-Cam achieved an average precision (AP) of 93.5%, surpassing that of the compared CNN models (best AP of 53.8%). Throughout all evaluation scenarios, the occurrence of false positives remained minimal. This study positions the HitoMi-Cam method not as a replacement for CNN-based detectors but as a complementary tool under specific conditions. The results indicate that spectral-based person detection can be a viable option for real-time operation on edge devices in real-world environments where shapes are unpredictable, such as disaster rescue.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
Open Access Article
A Deep Learning-Based Approach for Explainable Microsatellite Instability Detection in Gastrointestinal Malignancies
by Ludovica Ciardiello, Patrizia Agnello, Marta Petyx, Fabio Martinelli, Mario Cesarelli, Antonella Santone and Francesco Mercaldo
J. Imaging 2025, 11(11), 398; https://doi.org/10.3390/jimaging11110398 - 7 Nov 2025
Abstract
Microsatellite instability represents a key biomarker in gastrointestinal cancers with significant diagnostic and therapeutic implications. Traditional molecular assays for microsatellite instability detection, while effective, are costly, time-consuming, and require specialized infrastructure. In this paper we propose an explainable deep learning-based method for microsatellite instability detection based on the analysis of histopathological images. We consider a set of convolutional neural network architectures, i.e., MobileNet, Inception, VGG16, and VGG19, as well as a Vision Transformer model, and we provide clinical explainability for the model predictions through three Class Activation Mapping techniques. To further strengthen trustworthiness in the predictions, we introduce a set of robustness metrics that quantify the consistency of the highlighted discriminative regions across the different Class Activation Mapping methods. Experimental results on a real-world dataset demonstrate that the VGG16 and VGG19 models achieve the best accuracy: the VGG16 model obtains an accuracy of 0.926, while VGG19 reaches 0.917. Furthermore, the Class Activation Mapping techniques confirmed that the developed models consistently focus on similar tissue regions, and the robustness analysis showed high agreement between the different techniques. These results indicate that the proposed method achieves both strong predictive accuracy and explainable predictions, supporting the integration of deep learning into real-world clinical practice.
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)
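One concrete instance of the abstract's CAM-consistency idea is the IoU between binarized heatmaps from two Class Activation Mapping methods; this specific formulation is an assumption, since the paper's robustness metrics are not detailed here:

```python
import numpy as np

def cam_agreement(cam_a, cam_b, q=0.8):
    """IoU of the top-(1-q) activation regions of two CAM heatmaps.
    cam_a, cam_b: (h, w) maps scaled to [0, 1]; q is an assumed quantile."""
    m_a = cam_a >= np.quantile(cam_a, q)
    m_b = cam_b >= np.quantile(cam_b, q)
    union = np.count_nonzero(m_a | m_b)
    return np.count_nonzero(m_a & m_b) / union if union else 1.0
```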
Open Access Article
Real-Time Colorimetric Imaging System for Automated Quality Classification of Natural Rubber Using Yellowness Index Analysis
by Suphatchakorn Limhengha and Supattarachai Sudsawat
J. Imaging 2025, 11(11), 397; https://doi.org/10.3390/jimaging11110397 - 7 Nov 2025
Abstract
Natural rubber quality assessment traditionally relies on subjective visual inspection, leading to inconsistent grading and processing inefficiencies. This study presents a colorimetric imaging system integrating 48-megapixel image acquisition with automated colorimetric analysis for objective rubber classification. Five rubber grades—white crepe, STR5, STR5L, RSS3, and RSS5—were analyzed using standardized 25 × 25 mm2 specimens under controlled environmental conditions (25 ± 2 °C, 50 ± 5% relative humidity, 3200 K illumination). The image processing pipeline employed color space transformations from RGB through CIE1931 to CIELAB coordinates, with yellowness index calculation following ASTM E313-20 standards. The classification algorithm achieved 100% accuracy across 100 validation specimens under controlled laboratory conditions, with a processing time of 1.01 ± 0.09 s per specimen. Statistical validation via one-way ANOVA confirmed measurement reliability (p > 0.05) with yellowness index values ranging from 8.52 ± 0.52 for white crepe to 72.15 ± 7.47 for RSS3. Image quality metrics demonstrated a signal-to-noise ratio exceeding 35 dB and a spatial uniformity coefficient of variation below 5%. The system provides 12-fold throughput improvement over manual inspection, offering objective quality assessment suitable for industrial implementation, though field validation under diverse conditions remains necessary.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
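The core computation, RGB through CIE XYZ to the ASTM E313 yellowness index YI = 100(Cx·X − Cz·Z)/Y, fits in a few lines; the sketch below assumes sRGB input and the commonly cited D65/2° coefficients (Cx = 1.2985, Cz = 1.1335), which may differ from the paper's calibrated 3200 K setup:

```python
import numpy as np

# sRGB (D65) to CIE XYZ matrix.
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])

def srgb_to_xyz(rgb):
    """rgb in [0, 1]; returns XYZ scaled to 0-100."""
    rgb = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    return 100.0 * (M @ rgb)

def yellowness_index(rgb):
    """ASTM E313 yellowness index, YI = 100*(Cx*X - Cz*Z)/Y, with assumed
    D65/2-degree coefficients."""
    x, y, z = srgb_to_xyz(np.asarray(rgb, dtype=float))
    return 100.0 * (1.2985 * x - 1.1335 * z) / y

print(yellowness_index([0.85, 0.78, 0.35]))   # a yellowish sample
```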
Open Access Article
Self-Tuned Two-Stage Point Cloud Reconstruction Framework Combining TPDn and PU-Net
by Zhiping Ying and Dayuan Lv
J. Imaging 2025, 11(11), 396; https://doi.org/10.3390/jimaging11110396 - 6 Nov 2025
Abstract
This paper presents a self-tuned two-stage framework for point cloud reconstruction. A parameter-free denoising module (TPDn) automatically selects thresholds through polynomial model fitting to remove noise and outliers without manual tuning. The denoised cloud is then upsampled by PU-Net to recover fine-grained geometry. This synergy enhances structural consistency and demonstrates qualitative robustness under various noise conditions. Experiments on synthetic datasets and real industrial scans show that the proposed method improves geometric accuracy and uniformity while maintaining low computational cost. The framework is simple, efficient, and easily scalable to large-scale point clouds.
(This article belongs to the Section Computer Vision and Pattern Recognition)
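One plausible reading of "threshold selection through polynomial model fitting" is to fit a polynomial to the sorted neighbor-distance profile and threshold at its elbow; the sketch below is that interpretation, not the paper's published algorithm:

```python
import numpy as np
from scipy.spatial import cKDTree

def auto_denoise(points, k=8, deg=4):
    """Remove outliers whose mean k-NN distance exceeds an automatically
    selected threshold. points: (n, 3)."""
    tree = cKDTree(points)
    d, _ = tree.query(points, k=k + 1)     # first neighbor is the point itself
    score = d[:, 1:].mean(axis=1)          # mean distance to the k neighbors
    s = np.sort(score)
    t = np.linspace(0.0, 1.0, len(s))
    coef = np.polyfit(t, s, deg)           # smooth the sorted profile
    curve = np.polyval(coef, t)
    elbow = np.argmax(np.gradient(np.gradient(curve)))   # max-curvature index
    return points[score <= curve[elbow]]
```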
Open Access Article
Multi-Weather DomainShifter: A Comprehensive Multi-Weather Transfer LLM Agent for Handling Domain Shift in Aerial Image Processing
by Yubo Wang, Ruijia Wen, Hiroyuki Ishii and Jun Ohya
J. Imaging 2025, 11(11), 395; https://doi.org/10.3390/jimaging11110395 - 6 Nov 2025
Abstract
Recent deep learning-based remote sensing analysis models often struggle with performance degradation due to domain shifts caused by illumination variations (clear to overcast), changing atmospheric conditions (clear to foggy, dusty), and physical scene changes (clear to snowy). Addressing domain shift in aerial image segmentation is challenging due to limited training data availability, including costly data collection and annotation. We propose Multi-Weather DomainShifter, a comprehensive multi-weather domain transfer system that augments single-domain images into various weather conditions without additional laborious annotation, coordinated by a large language model (LLM) agent. Specifically, we utilize Unreal Engine to construct a synthetic dataset featuring images captured under diverse conditions such as overcast, foggy, and dusty settings. We then propose a latent space style transfer model that generates alternate domain versions based on real aerial datasets. Additionally, we present a multi-modal snowy scene diffusion model with LLM-assisted scene descriptors to add snowy elements into scenes. Multi-Weather DomainShifter integrates these two approaches into a tool library and leverages the agent for tool selection and execution. Extensive experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate that domain shift caused by weather change in aerial images leads to significant performance drops, and verify our proposal's capacity to adapt models to perform well in shifted domains while maintaining their effectiveness in the original domain.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
Open Access Article
RepSAU-Net: Semantic Segmentation of Barcodes in Complex Backgrounds via Fused Self-Attention and Reparameterization Methods
by Yanfei Sun, Junyu Wang and Rui Yin
J. Imaging 2025, 11(11), 394; https://doi.org/10.3390/jimaging11110394 - 6 Nov 2025
Abstract
In the digital era, commodity barcodes serve as a bridge between the physical and digital worlds and are widely used in retail checkout systems. To meet the broader application demands for product identification, this paper proposes a method for locating, semantically segmenting barcodes in complex backgrounds, decoding hidden information, and recovering these barcodes in wide field-of-view images. This method integrates self-attention mechanisms and reparameterization techniques to construct a RepSAU-Net model. Specifically, this paper first introduces a barcode image dataset synthesis strategy adapted for deep learning models, constructing the SBS (Screen Stego Barcodes) dataset, which comprises 2000 wide field-of-view background images (Type A) and 400 information-hidden barcode images (Type B), totaling 30,000 images. Based on this, a network architecture (RepSAU-Net) combining a self-attention mechanism and RepVGG reparameterization technology was designed, with a parameter count of 32.88 M. Experimental results demonstrate that this network performs well in barcode segmentation tasks, achieving an inference speed of 4.88 frames/s, a Mean Intersection over Union (MIoU) of 98.36%, and an Accuracy (Acc) of 94.96%. This research effectively enhances global information capture and feature extraction capabilities without significantly increasing computational load, providing technical support for the application of data-embedded barcodes.
(This article belongs to the Section Image and Video Processing)
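Mean Intersection over Union, the headline segmentation metric here, is computed per class from a confusion matrix; a compact reference implementation:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """pred, target: integer label maps of the same shape."""
    cm = np.bincount(num_classes * target.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0                       # skip classes absent from both maps
    return (inter[valid] / union[valid]).mean()
```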
Open Access Article
PGRF: Physics-Guided Rectified Flow for Low-Light RAW Image Enhancement
by Juntai Zeng and Qingyun Yang
J. Imaging 2025, 11(11), 393; https://doi.org/10.3390/jimaging11110393 - 6 Nov 2025
Abstract
Enhancing RAW images acquired under low-light conditions remains a fundamental yet challenging problem in computational photography and image signal processing. Recent deep learning-based approaches have shifted from real paired datasets toward synthetic data generation, where sensor noise is typically simulated through physical modeling. However, most existing methods primarily account for additive noise, neglect multiplicative noise components, and rely on global calibration procedures that fail to capture pixel-level manufacturing variability. Consequently, these methods struggle to faithfully reproduce the complex statistics of real sensor noise. To overcome these limitations, this paper introduces a physically grounded composite noise model that jointly incorporates additive and multiplicative noise components. We further propose a per-pixel noise simulation and calibration strategy, which estimates and synthesizes noise individually for each pixel. This physics-based calibration not only circumvents the constraints of global noise modeling but also captures spatial noise variations arising from microscopic CMOS sensor fabrication differences. Inspired by the recent success of rectified-flow methods in image generation, we integrate our physics-based noise synthesis into a rectified-flow generative framework and present PGRF (Physics-Guided Rectified Flow): a physics-guided rectified-flow framework for low-light RAW image enhancement. PGRF leverages the expressive capacity of rectified flows to model complex data distributions, while physical guidance constrains the generation process toward the desired clean image manifold. To evaluate our method, we constructed the LLID, a dedicated indoor low-light RAW benchmark captured using the Sony A7S II camera. Extensive experiments demonstrate that the proposed framework achieves substantial improvements over state-of-the-art methods in low-light RAW image enhancement.
(This article belongs to the Section Image and Video Processing)
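The additive-plus-multiplicative noise family the paper targets can be illustrated with a standard sensor model: Poisson shot noise, a fixed per-pixel gain map (PRNU) as the multiplicative term, and Gaussian read noise. The parameters below are illustrative, not the paper's per-pixel calibration:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_low_light_raw(clean, photons=40.0, prnu_sigma=0.01, read_sigma=2.0):
    """clean: (h, w) linear RAW intensities in [0, 1].
    Shot noise is Poisson in photon counts; PRNU is a fixed multiplicative
    per-pixel gain map; read noise is additive Gaussian (in photon units)."""
    gain = 1.0 + prnu_sigma * rng.normal(size=clean.shape)   # per-pixel gain
    shot = rng.poisson(clean * photons).astype(float)        # signal-dependent
    noisy = gain * shot + read_sigma * rng.normal(size=clean.shape)
    return np.clip(noisy / photons, 0.0, 1.0)                # back to [0, 1]

noisy = simulate_low_light_raw(rng.random((128, 128)))
```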
Open Access Systematic Review
Prognostic Value of Enterography Findings in Crohn’s Disease: A Systematic Review and Meta-Analysis
by Felipe Montevechi-Luz, Adrieli Heloísa Campardo Pansani, Juliana Delgado Campos Mello, Ana Emilia Carvalho de Paula, Lívia Moreira Genaro, Marcia Carolina Mazzaro, Daniel Lahan-Martins and Raquel Franco Leal
J. Imaging 2025, 11(11), 392; https://doi.org/10.3390/jimaging11110392 - 5 Nov 2025
Abstract
Crohn’s disease is a chronic inflammatory disorder with variable progression that often leads to hospitalization, treatment escalation, or surgery. While clinical and endoscopic indices guide disease monitoring, cross-sectional enterography provides unique visualization of transmural and extramural inflammation, offering valuable prognostic information. This systematic review and meta-analysis examined the prognostic significance of magnetic resonance enterography (MRE) and computed tomography enterography (CTE) in Crohn’s disease. Following PRISMA guidelines and a registered protocol, eight databases were systematically searched through August 2024. Two reviewers independently conducted data extraction, risk-of-bias assessment (QUADAS-2), and certainty grading (GRADE). Random-effects models were applied for pooled analyses. Eleven studies, including more than 1500 patients, met eligibility criteria. Across cohorts, transmural healing on enterography was consistently associated with favorable long-term outcomes, including a markedly lower need for surgery and hospitalization. Conversely, stenosis and persistent inflammatory activity identified patients at substantially higher risk of surgery, treatment intensification, or disease-related hospitalization. The certainty of evidence was high for surgical outcomes and moderate to low for other endpoints. Conventional enterography provides meaningful prognostic insight into Crohn’s disease and should be considered a complementary tool for risk stratification and treatment planning. Transmural healing represents a protective marker of a favorable disease course, whereas structural and inflammatory findings indicate patients who may benefit from closer monitoring or earlier therapeutic intervention.
(This article belongs to the Special Issue Bridging Medical Imaging and Biosignal Analysis: Innovations in Healthcare Diagnostics)
Topics
Topic in Applied Sciences, Electronics, MAKE, J. Imaging, Sensors
Applied Computer Vision and Pattern Recognition: 2nd Edition
Topic Editors: Antonio Fernández-Caballero, Byung-Gyu Kim. Deadline: 31 December 2025
Topic in Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang. Deadline: 31 March 2026
Topic in Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi. Deadline: 30 April 2026
Topic in Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye. Deadline: 31 May 2026
Special Issues
Special Issue in J. Imaging
Advances in Machine Learning for Computer Vision Applications
Guest Editors: Gurmail Singh, Stéfano Frizzo Stefenon. Deadline: 30 November 2025
Special Issue in J. Imaging
Object Detection in Video Surveillance Systems
Guest Editors: Jesús Ruiz-Santaquiteria Alegre, Juan Antonio Álvarez García, Harbinder Singh. Deadline: 30 November 2025
Special Issue in J. Imaging
Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives—2nd Edition
Guest Editors: Gerardo Cazzato, Francesca Arezzo. Deadline: 30 November 2025
Special Issue in J. Imaging
Advances in Photoacoustic Imaging: Tomography and Applications
Guest Editor: Xianlin Song. Deadline: 30 November 2025