- A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis
- Next-Generation Advances in Prostate Cancer Imaging and Artificial Intelligence Applications
- Classifying Sex from MSCT-Derived 3D Mandibular Models Using an Adapted PointNet++ Deep Learning Approach in a Croatian Population
- AIGD Era: From Fragment to One Piece
Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques, published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; acceptance to publication is undertaken in 3.6 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
Endo-DET: A Domain-Specific Detection Framework for Multi-Class Endoscopic Disease Detection
J. Imaging 2026, 12(3), 112; https://doi.org/10.3390/jimaging12030112 - 6 Mar 2026
Abstract
Gastrointestinal cancers account for roughly a quarter of global cancer incidence, and early detection through endoscopy has proven effective in reducing mortality. Multi-class endoscopic disease detection, however, faces three persistent challenges: feature redundancy from non-pathological content, severe illumination inconsistency across imaging modalities, and extreme scale variability with blurry boundaries. This paper introduces Endo-DET, a domain-specific detection framework addressing these challenges through three synergistic components. The Adaptive Lesion-Discriminative Filtering (ALDF) module achieves lesion-focused attention via sparse simplex projection with reduced computational complexity. The Global–Local Illumination Modulation Neck (GLIM-Neck) enables illumination-aware multi-scale fusion through four cooperative mechanisms, maintaining stable performance across white-light endoscopy, narrow-band imaging, and chromoendoscopy. The Lesion-aware Unified Calibration and Illumination-robust Discrimination (LUCID) module uses dual-stream reciprocal modulation to integrate boundary-sensitive textures with global semantics while suppressing instrument artifacts. Experiments on EDD2020, Kvasir-SEG, PolypGen2021, and CVC-ClinicDB show that Endo-DET improves mAP50-95 over the DEIM baseline by 5.8, 10.8, 4.1, and 10.1 percentage points, respectively, with mAP75 gains of 6.1, 10.3, 6.8, and 9.3 points, and Recall50-95 improvements of 10.9, 12.1, 11.1, and 11.5 points. Running at 330 FPS with TensorRT FP16 optimization, Endo-DET achieves consistent cross-dataset improvements while maintaining real-time capability, providing a methodological foundation for clinical computer-aided diagnosis.
Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
Open Access | Feature Paper | Article
Evidence-Guided Diagnostic Reasoning for Pediatric Chest Radiology Based on Multimodal Large Language Models
by Yuze Zhao, Qing Wang, Yingwen Wang, Ruiwei Zhao, Rui Feng and Xiaobo Zhang
J. Imaging 2026, 12(3), 111; https://doi.org/10.3390/jimaging12030111 - 6 Mar 2026
Abstract
Pediatric respiratory diseases are a leading cause of hospital admissions and childhood mortality worldwide, highlighting the critical need for accurate and timely diagnosis to support effective treatment and long-term care. Chest radiography remains the most widely used imaging modality for pediatric pulmonary assessment. Consequently, reliable AI-assisted diagnostic methods are essential for alleviating the workload of clinical radiologists. However, most existing deep learning-based approaches are data-driven and formulate diagnosis as a black-box image classification task, resulting in limited interpretability and reduced clinical trustworthiness. To address these challenges, we propose a trustworthy two-stage diagnostic paradigm for pediatric chest X-ray diagnosis that closely aligns with the radiological workflow in clinical practice, in which the diagnosis procedure is constrained by evidence. In the first stage, a vision–language model fine-tuned on pediatric data identifies radiological findings from chest radiographs, producing structured and interpretable diagnostic evidence. In the second stage, a multimodal large language model integrates the radiograph, extracted findings, patient demographic information, and external medical domain knowledge via a retrieval-augmented generation (RAG) mechanism to generate the final diagnosis. Experiments conducted on the VinDr-PCXR dataset demonstrate that our method achieves 90.1% diagnostic accuracy, 70.9% F1-score, and 82.5% AUC, representing up to a 13.1% increase in diagnostic accuracy over state-of-the-art baselines. These results validate the effectiveness of combining multimodal reasoning with explicit medical evidence and domain knowledge, and indicate the strong potential of the proposed approach for trustworthy pediatric radiology diagnosis.
Full article
(This article belongs to the Section AI in Imaging)
Open Access | Article
Forensic Analysis for Source Camera Identification from EXIF Metadata
by Pengpeng Yang, Chen Zhou, Daniele Baracchi, Dasara Shullani, Yaobin Zou and Alessandro Piva
J. Imaging 2026, 12(3), 110; https://doi.org/10.3390/jimaging12030110 - 4 Mar 2026
Abstract
Source camera identification on smartphones constitutes a fundamental task in multimedia forensics, providing essential support for applications such as image copyright protection, illegal content tracking, and digital evidence verification. Numerous techniques have been developed for this task over the past decades. Among existing approaches, Photo-Response Non-Uniformity (PRNU) has been widely recognized as a reliable device-specific fingerprint and has demonstrated remarkable performance in real-world applications. Nevertheless, the rapid advancement of computational photography technologies has introduced significant challenges: modern devices often exhibit anomalous behaviors under PRNU-based analysis. For instance, images captured by different devices may exhibit unexpected correlations, while images captured by the same device can vary substantially in their PRNU patterns. Current approaches are incapable of automatically exploring the underlying causes of these anomalous behaviors. To address this limitation, we propose a simple yet effective forensic analysis framework leveraging Exchangeable Image File Format (EXIF) metadata. Specifically, we represent EXIF metadata as type-aware word embeddings to preserve contextual information across tags. This design enables visual interpretation of the model’s decision-making process and provides complementary insights for identifying the anomalous behaviors observed in modern devices. Extensive experiments conducted on three public benchmark datasets demonstrate that the proposed method not only achieves state-of-the-art performance for source camera identification but also provides valuable insights into anomalous device behaviors.
Full article
(This article belongs to the Section Biometrics, Forensics, and Security)
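The PRNU matching that this abstract builds on is conventionally scored with a zero-mean normalized cross-correlation between an image's noise residual and a device fingerprint. The sketch below illustrates only that scoring step on synthetic stand-in arrays (the fingerprint, noise levels, and sizes are hypothetical, not the paper's pipeline):

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation, the standard similarity
    score between a noise residual and a camera's PRNU fingerprint."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
fingerprint = rng.normal(0, 1, (128, 128))                  # hypothetical PRNU pattern
residual_same = fingerprint + rng.normal(0, 3, (128, 128))  # residual from the same device
residual_other = rng.normal(0, 3, (128, 128))               # residual from a different device

print(ncc(residual_same, fingerprint))   # clearly positive
print(ncc(residual_other, fingerprint))  # near zero
```

A residual that embeds the fingerprint correlates well above the near-zero score of an unrelated device; the anomalies the paper studies are precisely cases where computational photography breaks this separation.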
Open Access | Article
A Hierarchical Multi-View Deep Learning Framework for Autism Classification Using Structural and Functional MRI
by Nayif Mohammed Hammash and Mohammed Chachan Younis
J. Imaging 2026, 12(3), 109; https://doi.org/10.3390/jimaging12030109 - 4 Mar 2026
Abstract
Autism classification is challenging due to the subtle, heterogeneous, and overlapping neural activation profiles that occur in individuals with autism. Novel deep learning approaches, such as Convolutional Neural Networks (CNNs) and their variants, as well as Transformers, have shown moderate performance in discriminating between autism and normal cohorts; yet, they often struggle to jointly capture the spatial–structural and temporal–functional variations present in autistic brains. To overcome these shortcomings, we propose a novel hierarchical deep learning framework that extracts the inherent spatial dependencies from the dual-modal MRI scans. For sMRI, we develop a 3D Hierarchical Convolutional Neural Network to capture both fine and coarse anatomical structures via multi-view projections along the axial, sagittal, and coronal planes. For the fMRI case, we introduced a bidirectional LSTM-based temporal encoder to examine regional brain dynamics and functional connectivity. The sequential embeddings and correlations are combined into a unified spatiotemporal representation of functional imaging, which is then classified using a multilayer perceptron to ensure continuity in diagnostic predictions across the examined modalities. Finally, a cross-modality fusion scheme was employed to integrate feature representations of both modalities. Extensive evaluations on the ABIDE I dataset (NYU repository) demonstrate that our proposed framework outperforms existing baselines, including Vision/Swin Transformers and various newly developed CNN variants. For the sMRI branch, we achieved 90.19 ± 0.12% accuracy (precision: 90.85 ± 0.16%, recall: 89.27 ± 0.19%, F1-score: 90.05 ± 0.14%, and focal loss: 0.3982). For the fMRI branch, we achieved an accuracy of 88.93 ± 0.15% (precision: 89.78 ± 0.18%, recall: 88.29 ± 0.20%, F1-score: 89.03 ± 0.17%, and focal loss of 0.4437). 
These outcomes affirm the superior generalization and robustness of the proposed framework for integrating structural and functional brain representations to achieve accurate autism classification.
Full article
(This article belongs to the Section Medical Imaging)
Open Access | Article
Optimizing Radiographic Diagnosis Through Signal-Balanced Convolutional Models
by Sakina Juzar Neemuchwala, Raja Hashim Ali, Qamar Abbas, Talha Ali Khan, Ambreen Shahnaz and Iftikhar Ahmed
J. Imaging 2026, 12(3), 108; https://doi.org/10.3390/jimaging12030108 - 4 Mar 2026
Abstract
Accurate interpretation of chest radiographs is central to the early diagnosis and management of pulmonary disorders. This study introduces an explainable deep learning framework that integrates biomedical signal fidelity analysis with transfer learning to enhance diagnostic reliability and transparency. Using the publicly available COVID-19 Radiography Dataset (21,165 chest X-ray images across four classes: COVID-19, Viral Pneumonia, Lung Opacity, and Normal), three architectures, namely baseline Convolutional Neural Network (CNN), ResNet-50, and EfficientNetB3, were trained and evaluated under varied class-balancing and hyperparameter configurations. Signal preservation was quantitatively verified using the Structural Similarity Index Measure (SSIM = 0.93 ± 0.02), ensuring that preprocessing retained key diagnostic features. Among all models, ResNet-50 achieved the highest classification accuracy (93.7%) and macro-AUC = 0.97 (class-balanced), whereas EfficientNetB3 demonstrated superior generalization with reduced parameter overhead. Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations confirmed anatomically coherent activations aligned with pathological lung regions, substantiating clinical interpretability. The integration of signal fidelity metrics with explainable deep learning presents a reproducible and computationally efficient framework for medical image analysis. These findings highlight the potential of signal-aware transfer learning to support reliable, transparent, and resource-efficient diagnostic decision-making in radiology and other imaging-based medical domains.
Full article
(This article belongs to the Section AI in Imaging)
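The SSIM figure this abstract reports (0.93 ± 0.02) quantifies how much structure preprocessing preserves. As an illustration only, a single-window (global) SSIM, a simplification of the sliding-window variant typically used, can be sketched in numpy:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified single-window SSIM (Wang et al., 2004), computed over
    the whole image rather than with a sliding Gaussian window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
img = rng.random((64, 64))                                   # stand-in radiograph
noisy = np.clip(img + rng.normal(0, 0.05, img.shape), 0, 1)  # mildly perturbed copy

print(global_ssim(img, img))    # identical images score exactly 1.0
print(global_ssim(img, noisy))  # mild perturbation stays high but below 1.0
```

A preprocessing step that drags this score far below 1 would be discarding diagnostic detail, which is the fidelity check the paper applies before training.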
Open Access | Article
Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory
by Slim Ben Chaabane, Rafika Harrabi, Anas Bushnag and Hassene Seddik
J. Imaging 2026, 12(3), 107; https://doi.org/10.3390/jimaging12030107 - 28 Feb 2026
Abstract
The recent advancements in technologies such as artificial intelligence (AI), computer vision (CV), and Internet of Things (IoT) have significantly advanced various fields, particularly surveillance systems. These innovations enable real-time facial recognition processing, enhancing security and ensuring safety. Moreover, mobile robots are commonly employed in surveillance systems to handle risky tasks that are beyond human capability. In this paper, we present a prototype of a cost-effective mobile surveillance robot built on the Raspberry Pi 4, designed for integration into various industrial environments. This smart robot detects intruders using IoT and face recognition technology. The proposed system is equipped with a passive infrared (PIR) sensor and a camera for capturing live-streaming video and photos, which are sent to the control room through IoT technology. Additionally, the system uses face recognition algorithms to differentiate between company staff and potential intruders. The face recognition method combines high-order statistical features and evidence theory to improve facial recognition accuracy and robustness. High-order statistical features are used to capture complex patterns in facial images, enhancing discrimination between individuals. Evidence theory is employed to integrate multiple information sources, allowing for better decision-making under uncertainty. This approach effectively addresses challenges such as variations in lighting, facial expressions, and occlusions, resulting in a more reliable and accurate face recognition system. When the system detects an unfamiliar individual, it sends out alert notifications and emails to the control room with the captured picture using IoT. A web interface has also been set up to control the robot from a distance over a Wi-Fi connection. The proposed face recognition method is evaluated, and a comparative analysis with existing techniques is conducted.
Experimental results with 400 test images of 40 individuals demonstrate the effectiveness of combining various attribute images in improving human face recognition performance. Experimental results indicate that the algorithm can identify human faces with an accuracy of 98.63%.
Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Edition)
Open Access | Article
Vision–Language Models for Transmission Line Fault Detection: A New Approach for Grid Reliability and Optimization
by Runle Yu, Lihao Mai, Yang Weng, Qiushi Cui, Guochang Xu and Pengliang Ren
J. Imaging 2026, 12(3), 106; https://doi.org/10.3390/jimaging12030106 - 28 Feb 2026
Abstract
Reliable fault detection along transmission corridors is essential for preventing small defects from developing into long outages and costly emergency operations. This study aims to improve the field reliability of an open vocabulary vision language backbone without retraining the large model in an end-to-end manner. The work focuses on four operational fault classes in multi-region corridor imagery collected during routine inspections and uses a Florence-2 vision language model as the base recognizer. On top of this backbone, three domain-specific components are introduced. A subclass-aware fusion scheme keeps probability mass within the active parent concept so that insulator icing and conductor icing produce stable, action-oriented decisions. A Power-Line Focus Then Crop normalization uses an attention-guided corridor window together with isotropic resizing so that thin conductors and small fittings remain visible in the processed image. A corridor geo-prior reduces scores as the distance from the mapped centerline increases and in this way suppresses detections that lie outside the corridor. All methods are evaluated under a shared preprocessing and scoring pipeline in training-free and parameter-efficient tuning modes. Experiments on unseen regions show higher accuracy for thin and low-contrast faults, fewer false alarms outside the right-of-way, and improved score calibration in the confidence range used for triage, while keeping throughput and memory usage suitable for unmanned aerial vehicles and substation edge devices.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access | Article
Stereo Gaussian Splatting with Adaptive Scene Depth Estimation for Semantic Mapping
by Chenhui Fu and Jiangang Lu
J. Imaging 2026, 12(3), 105; https://doi.org/10.3390/jimaging12030105 - 28 Feb 2026
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental capability in robotics and augmented reality. However, achieving accurate geometric reconstruction and consistent semantic understanding in complex environments remains challenging. Although recent neural implicit representations have improved reconstruction quality, they often suffer from high computational cost and the forgetting phenomenon during online mapping. In this paper, we propose StereoGS-SLAM, a stereo semantic SLAM framework based on 3D Gaussian Splatting (3DGS) for explicit scene representation. Unlike existing approaches, StereoGS-SLAM operates on passive RGB stereo inputs without requiring active depth sensors. An adaptive depth estimation strategy is introduced to dynamically refine Gaussian scales based on real-time stereo depth estimates, ensuring robust and scale-consistent reconstruction. In addition, we propose a hybrid keyframe selection strategy that integrates motion-aware selection with lightweight random sampling to improve keyframe diversity and maintain stable, real-time optimization. Experimental evaluations demonstrate that StereoGS-SLAM achieves consistent and competitive localization, rendering, and semantic reconstruction performance compared with recent 3DGS-based SLAM systems.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access | Review
The Retina as a Proxy for Brain Neurodegeneration: A Narrative Review on OCT-Based Retinal Imaging in the Early Detection of Alzheimer’s and Parkinson’s Disease
by Ouafa Sijilmassi
J. Imaging 2026, 12(3), 104; https://doi.org/10.3390/jimaging12030104 - 27 Feb 2026
Abstract
Neurodegenerative diseases, including Alzheimer’s disease (AD) and Parkinson’s disease (PD), are major causes of cognitive and motor decline, yet early diagnosis remains challenging due to asymptomatic phases and limited non-invasive biomarkers. This narrative review systematically synthesized studies on retinal imaging in AD and PD. Published studies were identified through searches of PubMed, MEDLINE, Google Scholar, and reference lists, focusing on Optical Coherence Tomography (OCT), OCT Angiography (OCTA), and Spectral-Domain OCT (SD-OCT) assessing retinal structural and vascular changes. Data were extracted on retinal layer thickness, vascular parameters, and diagnostic metrics. Findings indicate that both diseases consistently exhibit thinning of inner retinal layers, particularly the retinal nerve fiber layer (RNFL) and ganglion cell–inner plexiform layer (GCIPL). In AD, studies reported progressive inner retinal thinning across disease stages, sometimes accompanied by outer retinal and retinal pigment epithelium changes. In PD, thinning was observed predominantly in RNFL and GCIPL, correlating with disease duration and motor severity. Microvascular alterations were described in both disorders, with disease-specific spatial patterns reported across studies. Overall, retinal imaging emerges as a non-invasive, high-resolution, and cost-effective tool for early detection, differential assessment, and longitudinal monitoring of neurodegenerative diseases. These findings support the translation of retinal biomarkers into clinical practice for improved disease management.
Full article
(This article belongs to the Special Issue Diagnostic Imaging: From Basic Knowledge to Latest Advancements)
Open Access | Article
Restoration of Non-Uniform Motion-Blurred Star Images Based on Dynamic Strip Attention
by Jixin Han, Zhaodong Niu and Jun He
J. Imaging 2026, 12(3), 103; https://doi.org/10.3390/jimaging12030103 - 27 Feb 2026
Abstract
When capturing star images in long-exposure mode, the relative motion between stars and space objects and the observation camera produces strip-like trails of varying direction and length, resulting in a serious decline in image quality and inaccurate centroid positioning. Traditional methods for restoring star images are prone to ringing effects and cannot restore non-uniformly blurred star images. To address this problem, this paper proposes a star image restoration network based on a dynamic strip attention mechanism. Firstly, a Multi-scale Dynamic Strip Pooling Module is designed to adaptively extract blurred features of different lengths and directions by dynamically adjusting the strip convolution. After that, a Multi-scale Feature Fusion Module is designed to fuse multi-level features to reduce the loss of image details of stars and space objects in the image. Experimental results demonstrate that the proposed method achieves a PSNR of 84.08 and an SSIM of 0.9928 on the 16-bit simulated dataset, outperforming both traditional methods and other deep learning-based approaches. Specifically, the recognition accuracy of star points is increased by 174% in comparison with unprocessed images. Furthermore, this paper validates the network using the real-world dataset spotGEO, and the results indicate that the average number of successfully recognized star points is increased by 57% compared to direct processing of the original images.
Full article
(This article belongs to the Section Image and Video Processing)
Open Access | Article
Fine-Grained Age-Class Identification of Moso Bamboo Using an Improved Lightweight YOLO11 Model
by Yingbin Zhang, Xinhuang Zhang, Zhichao Cai, Xi He, Shuwei Chen, Zhengxuan Lai, Kunyong Yu and Riwen Lai
J. Imaging 2026, 12(3), 102; https://doi.org/10.3390/jimaging12030102 - 27 Feb 2026
Abstract
Accurate identification of moso bamboo (Phyllostachys edulis) age classes is essential for effective forestry resource management, yet existing methods often struggle to achieve a satisfactory balance between accuracy and computational efficiency under complex field conditions. To address this challenge, this study proposes a lightweight object detection model, termed YOLO11-GCR, for fine-grained moso bamboo age-class classification based on close-range imagery. The proposed approach builds upon the YOLO11 framework and incorporates Ghost convolution, the Convolutional Block Attention Module (CBAM), and a Receptive Field Block (RFB) to reduce model complexity, enhance discriminative feature representation, and improve sensitivity to subtle texture variations among age classes. A dataset consisting of 9538 annotated bamboo culm images covering four age classes (I-du to IV-du) was constructed and divided into training, validation, and independent test sets with strict spatiotemporal separation. Experimental results indicate that YOLO11-GCR achieves robust detection performance with a lightweight architecture of 2.62 × 10⁶ parameters and 6.2 GFLOPs, yielding an mAP@0.5 of 0.913 and an mAP@0.5–0.95 of 0.895 on the independent test set. Notably, the model demonstrates improved classification stability for visually similar age classes, such as II-du and III-du. Overall, this study presents an efficient and practical imaging-based solution for automated moso bamboo age-class recognition in complex natural environments.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access | Communication
Lensless Quantitative Phase Imaging with Bayer-Filtered Color Sensors Under Sequential RGB-LED Illumination
by Jiajia Wu, Yining Li, Yuheng Luo, Leiting Pan, Pengming Song and Qiang Xu
J. Imaging 2026, 12(3), 101; https://doi.org/10.3390/jimaging12030101 - 26 Feb 2026
Abstract
Lensless on-chip microscopy enables high-throughput, wide-FOV imaging; however, the Bayer color filter array (CFA) in standard color sensors spatially multiplexes spectral channels, introducing sub-sampling and spectral crosstalk that degrade phase retrieval. We propose a Wirtinger Poly-Gradient Solver (WPGS) for quantitative phase reconstruction with Bayer-filtered color sensors under sequential Red–Green–Blue Light-Emitting Diode (RGB-LED) illumination. The method combines Transport of Intensity Equation (TIE)-based initialization with polychromatic Wirtinger optimization to suppress CFA-induced artifacts and enable pixel super-resolution (PSR). Experiments resolve linewidths beyond the nominal Nyquist limit imposed by the sensor's pixel pitch. We further demonstrate label-free imaging of HeLa cells and unstained tissue sections, supporting high-throughput digital pathology and offering potential for longitudinal biological observation.
Full article
(This article belongs to the Section Computational Imaging and Computational Photography)
Open Access | Review
The Augmented Cytopathologist: A Conceptual Exploratory Narrative Review on Immersive and Vision–Language Models Tools in Digital Pathology
by Enrico Giarnieri, Andrea Lastrucci, Alberto Ricci, Pierdonato Bruno and Daniele Giansanti
J. Imaging 2026, 12(3), 100; https://doi.org/10.3390/jimaging12030100 - 26 Feb 2026
Abstract
Emerging digital technologies, including immersive environments (VR/AR/XR) and Vision–Language Models (VLMs), have the potential to reshape digital pathology and medical imaging. While immersive tools can enhance spatial visualization and procedural training, VLM-based copilots offer cognitive and workflow support. Their combined impact on cytopathology remains largely conceptual and preclinical. This Conceptual Exploratory Narrative Review (CENR) examines how immersive technologies and VLM-based copilots may jointly influence cytopathologists’ professional workflow, training, and diagnostic processes, introducing the notion of the “augmented cytopathologist.” A structured exploratory approach integrated peer-reviewed literature, position papers, preprints, gray literature (technical reports, white papers, conference abstracts, blogs), and cross-disciplinary perspectives. Database searches (PubMed, Web of Science, Scopus) confirmed that only a limited number of studies address immersive or AI-assisted cytopathology imaging. Thematic analysis focused on four conceptual dimensions: (1) technological capabilities and maturity; (2) workflow and educational applications; (3) professional implications and the cytopathologist’s role; and (4) responsible use of LLMs and VLMs as supportive tools. This approach emphasizes interpretation of emerging trends over aggregation of empirical data, enabling conceptual synthesis of early-stage implementations and perspectives in the field. Immersive technologies facilitate three-dimensional visualization, procedural skill development, and collaborative engagement, whereas VLMs support report generation, literature retrieval, and decision guidance. Together, they offer a synergistic model for perceptual and cognitive augmentation. Key challenges include technical maturity, interoperability, workflow integration, regulatory compliance, and ethical oversight. Figures illustrate representative examples of (1) remote collaborative immersive evaluation and (2) integration of immersive visualization with VLM-based copilots, highlighting potential applications in training and workflow support. The CENR underscores the potential of combining immersive tools and AI copilots to support cytopathology, particularly for education, workflow efficiency, and cognitive augmentation. Adoption should be incremental and carefully governed, emphasizing augmentative rather than transformative use. Future research should focus on clinical validation, scalable integration, and regulatory and ethical frameworks to realize the concept of the augmented cytopathologist in practice.
Full article
(This article belongs to the Topic Artificial Intelligence in Public Health: Current Trends and Future Possibilities, 2nd Edition)
Open Access Article
Design and Development of an Automated Pipeline for Medical Hyperspectral Image Acquisition, Processing, and Fusion
by
Felix Wühler, Tim Markus Häußermann, Alessa Rache, Björn van Marwick, Carmen Wängler, Julian Reichwald and Matthias Rädle
J. Imaging 2026, 12(3), 99; https://doi.org/10.3390/jimaging12030099 - 25 Feb 2026
Abstract
Automated and comprehensive processing of hyperspectral image data is increasingly important in academic research and medical technology. This study presents an automated processing pipeline that integrates hyperspectral image acquisition, analysis, multimodal fusion, and centralized data management to improve the interpretability of spectral information for biological tissue analysis. The pipeline supports modular hyperspectral data processing, fusion of complementary wavelength ranges, and scalable data storage, and was implemented in Python 3.13.3. The pipeline was evaluated using hyperspectral imaging data acquired from a coronal mouse brain section. Clustering-based analysis and spectral correlation metrics were applied to assess the impact of multimodal data fusion on spectral representation. Clustering of individual modalities yielded silhouette coefficients of 0.5879 for near-infrared data, 0.6020 for mid-infrared data, and 0.6715 for RGB data. Multimodal fusion reduced the silhouette coefficient to 0.5420 and enabled the identification of anatomical structures that were not distinguishable in any single modality. High spectral correlation coefficients exceeding 0.98 confirmed that spectral fidelity was preserved during fusion. These results demonstrate that automated multimodal hyperspectral data fusion can enhance the interpretability of biological tissue despite reduced clustering compactness. The proposed pipeline provides a structured framework for preclinical hyperspectral imaging workflows and supports exploratory biological analysis in medical imaging contexts.
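The silhouette coefficients quoted in this abstract follow the standard definition s(i) = (b − a) / max(a, b), where a is the mean intra-cluster distance and b is the mean distance to the nearest other cluster. As a generic illustration of the metric itself (not the authors' Python 3.13 pipeline code), a minimal pure-Python sketch:

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette coefficient: s(i) = (b - a) / max(a, b)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)           # cohesion
        b = min(sum(dist(p, q) for q in clusters[m]) / len(clusters[m])
                for m in clusters if m != l)                   # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated 2-D clusters give a coefficient close to 1.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(pts, [0, 0, 1, 1]), 4))  # → 0.9002
```

Values near 1 indicate compact, well-separated clusters; the drop to 0.5420 after fusion thus reflects reduced compactness, not necessarily reduced interpretability.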
Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
Open Access Article
Hybrid Vision Transformer–CNN Framework for Alzheimer’s Disease Cell Type Classification: A Comparative Study with Vision–Language Models
by
Md Easin Hasan, Md Tahmid Hasan Fuad, Omar Sharif and Amy Wagler
J. Imaging 2026, 12(3), 98; https://doi.org/10.3390/jimaging12030098 - 25 Feb 2026
Abstract
Accurate identification of Alzheimer’s disease (AD)-related cellular characteristics from microscopy images is essential for understanding neurodegenerative mechanisms at the cellular level. While most computational approaches focus on macroscopic neuroimaging modalities, cell type classification from microscopy remains relatively underexplored. In this study, we propose a hybrid vision transformer–convolutional neural network (ViT–CNN) framework that integrates DeiT-Small and EfficientNet-B7 to classify three AD-related cell types—astrocytes, cortical neurons, and SH-SY5Y neuroblastoma cells—from phase-contrast microscopy images. We perform a comparative evaluation against conventional CNN architectures (DenseNet, ResNet, InceptionNet, and MobileNet) and prompt-based multimodal vision–language models (GPT-5, GPT-4o, and Gemini 2.5-Flash) using zero-shot, few-shot, and chain-of-thought prompting. Experiments conducted with stratified fivefold cross-validation show that the proposed hybrid model achieves a test accuracy of 61.03% and a macro F1 score of 61.85, outperforming standalone CNN baselines and prompt-only LLM approaches under data-limited conditions. These results suggest that combining convolutional inductive biases with transformer-based global context modeling can improve generalization for cellular microscopy classification. While constrained by dataset size and scope, this work serves as a proof of concept and highlights promising directions for future research in domain-specific pretraining, multimodal data integration, and explainable AI for AD-related cellular analysis.
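For context, the macro F1 score reported here is the unweighted mean of per-class F1 scores, so each of the three cell classes counts equally regardless of how many images it contributes. A minimal sketch of the metric itself (illustrative only, not the authors' evaluation code; the class labels are hypothetical):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

# Three hypothetical classes (0 = astrocyte, 1 = cortical neuron, 2 = SH-SY5Y).
print(round(macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]), 3))  # → 0.656
```

Because every class is weighted equally, macro F1 can differ noticeably from accuracy when the classes are imbalanced.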
Full article
(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)
Open Access Article
A Deep Learning-Based Correction for Scanning Radius Errors in Circular-Scan Photoacoustic Tomography
by
Jie Yin, Yingjie Feng, Junjun He, Min Xie and Chao Tao
J. Imaging 2026, 12(3), 97; https://doi.org/10.3390/jimaging12030097 - 25 Feb 2026
Abstract
Circular-Scan photoacoustic tomography (PAT) can provide high-resolution images of optical absorption, but its analytical reconstructions, such as delay-and-sum (DAS), are highly sensitive to scanning radius (SR) inaccuracies, which cause severe geometric distortions and artifacts. In this work, we propose a deep learning framework, termed smooth deconvolution ResNet (SD-ResNet), to correct DAS reconstruction degradation induced by SR errors. SD-ResNet uses an ImageNet-pretrained ResNet-50 encoder and a lightweight deconvolutional decoder with additional smoothing convolutions to suppress checkerboard artifacts and restore fine structural details. A paired training dataset is generated using k-Wave simulations driven by human thoracic computed tomography (CT) slices: for each phantom, radiofrequency data are simulated once, and DAS images reconstructed with the true SR serve as ground truth, whereas images reconstructed with biased SR values serve as inputs. This design provides structurally diverse training samples and enhances generalization. In silico experiments show that SD-ResNet effectively recovers image quality across a range of SR deviations. Phantom experiments with polyethylene microspheres further confirm that the proposed method can substantially reduce artifacts and recover correct source shapes under practical SR mismatches, offering a robust tool for SR-error-resilient PAT imaging.
Full article
(This article belongs to the Section AI in Imaging)
Open Access Article
Recognition, Localization and 3D Geometric Morphology Calculation of Microblind Holes in Complex Backgrounds Based on the Improved YOLOv11 Network and AVC Algorithm
by
Chengfen Zhang, Dong Xia, Ruizhao Chen, Qunfeng Niu, Tao Wang and Li Wang
J. Imaging 2026, 12(3), 96; https://doi.org/10.3390/jimaging12030096 - 24 Feb 2026
Abstract
Inspecting the processing quality of microblind holes, in particular accurately identifying their contour features and precisely measuring their three-dimensional morphological parameters, has long been challenging, especially when holes of different sizes, depths, and contour shapes must be handled simultaneously. This makes it difficult to recognize and localize microblind hole contours with machine vision and to calculate their three-dimensional parameters accurately. This study takes cigarette microblind holes (diameter of 0.1–0.2 mm, depth of approximately 35 µm) as the research object and focuses on two major challenges: recognizing and localizing microblind hole contours in complex texture backgrounds and accurately calculating their 3D geometric morphology. An improved YOLOv11s model is proposed for multiobject detection of microblind holes in images with complex texture backgrounds, extracting their features completely. An Area–Volume Computation (AVC) algorithm, based on discrete integral estimation and curve fitting, is also proposed for computing their surface area and volume. The experimental results show that the precision, recall, mAP@0.5, mAP@0.5:0.95, and prediction time of the improved YOLOv11 network are 0.915, 0.948, 0.925, 0.615, and 1.27 ms, respectively. The relative errors (REs) of the surface area and volume calculations are 5.236% and 3.964%, respectively. The proposed method achieves accurate microblind hole recognition, localization, and 3D morphology calculation, meeting on-site cigarette inspection criteria, and provides a reference for detecting similar objects in complex texture backgrounds and for related 3D calculation tasks.
Full article
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
Open Access Article
Hybrid MICO-LAC Segmentation with Panoptic Tumor Instance Analysis for Dense Breast Mammograms
by
Razia Jamil, Min Dong, Orken Mamyrbayev and Ainur Akhmediyarova
J. Imaging 2026, 12(3), 95; https://doi.org/10.3390/jimaging12030095 - 24 Feb 2026
Abstract
This study proposes a clinically driven hybrid segmentation framework for dense breast tissue analysis in mammographic images, addressing persistent challenges associated with intensity inhomogeneity, low contrast, and complex tumor morphology. The framework integrates Multiplicative Intrinsic Component Optimization (MICO_2D) for bias field correction, followed by a distance-regularized multiphase Vese–Chan level-set model for coarse global tumor segmentation. To achieve precise boundary delineation, a localized refinement stage is employed using Localized Active Contours (LAC) with Local Image Fitting (LIF) energy, supported by Gaussian regularization to ensure smooth and coherent boundaries in regions with ambiguous tissue transitions. Building upon the refined semantic tumor mask, the framework further incorporates a panoptic-style tumor instance segmentation stage, enabling the decomposition of connected tumor regions into distinct anatomical instances; this stage was evaluated on both the MIAS and INBreast mammography datasets to demonstrate generalizability. This extension facilitates detailed structural analysis of tumor multiplicity and spatial organization, enhancing interpretability beyond conventional pixel-wise segmentation. Experiments conducted on Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO) mammographic views demonstrate competitive performance relative to baseline U-Net and advanced deep learning fusion architectures, including multi-scale and multi-view networks, while offering improved interpretability and robustness. Quantitative evaluation using overlap-related metrics shows strong spatial agreement between predicted and reference segmentations, with per-image Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) distributions reported to ensure reproducibility. Descriptive per-image analysis, supported by bootstrap-based confidence intervals and paired comparisons, indicates consistent performance improvements across images. Robustness analysis under realistic perturbations, including noise, contrast degradation, blur, and rotation, demonstrates stable performance across varying imaging conditions. Furthermore, feature-space visualizations using t-SNE and UMAP reveal clear separability between cancerous and non-cancerous tissue regions, highlighting the discriminative capability of the proposed framework. Overall, the results demonstrate the effectiveness, robustness, and clinical motivation of this hybrid panoptic framework for comprehensive dense breast tumor analysis in mammography, while emphasizing reproducibility and conservative statistical assessment.
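The two overlap metrics reported in this abstract are directly related: on the same masks, Dice = 2·IoU / (1 + IoU). A minimal sketch on flattened binary masks (illustrative only, not the paper's evaluation code; the masks are hypothetical):

```python
def dice_iou(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|); IoU = |A∩B| / |A∪B| for binary masks."""
    inter = sum(p and t for p, t in zip(pred, truth))
    psum, tsum = sum(pred), sum(truth)
    union = psum + tsum - inter
    dice = 2 * inter / (psum + tsum) if psum + tsum else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

pred  = [1, 1, 1, 0, 0, 1]    # hypothetical flattened predicted mask
truth = [1, 1, 0, 0, 1, 1]    # hypothetical ground-truth mask
d, i = dice_iou(pred, truth)  # inter=3, |A|=4, |B|=4 -> dice=0.75, iou=0.6
```

Both metrics equal 1.0 only for a perfect match; IoU penalizes disagreement more heavily (here 0.6 versus a Dice of 0.75 = 2·0.6/1.6).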
Full article
(This article belongs to the Special Issue Current Progress in Medical Image Segmentation)
Open Access Article
Towards Lightweight and Multi-Scale Scene Classification: A Lie Group-Guided Deep Learning Network with Collaborative Attention
by
Xuefei Xu and Chengjun Xu
J. Imaging 2026, 12(3), 94; https://doi.org/10.3390/jimaging12030094 - 24 Feb 2026
Abstract
Remote sensing scene classification (RSSC) plays a crucial role in Earth observation. Current deep learning methods, while accurate, tend to focus on high-level semantic features and overlook complementary shallow details such as edges and textures. Moreover, conventional CNNs are limited by fixed receptive fields, whereas transformers incur high computational costs. To address these limitations, we propose the Lie Group lightweight multi-scale network (LGLMNet), which integrates Lie Group covariance features. It employs a dual-branch architecture combining Lie Group machine learning (LGML) for shallow feature extraction with a deep learning branch for high-level semantics. In the deep branch, we design a parallel depthwise separable convolution block (PDSCB) for multi-scale perception and a spatial–channel collaborative attention mechanism (SCCA) for efficient global–local modeling. A cross-layer feature fusion block (CLFFB) effectively merges the two branches. Compared with state-of-the-art methods, the proposed LGLMNet achieves accuracy improvements of 2.14%, 2.32%, and 1.12% on the UCM-21, AID, and NWPU-45 datasets, respectively, while maintaining a lightweight structure with only 2.6 M parameters.
Full article
(This article belongs to the Section AI in Imaging)
Open Access Article
LTPNet: Lesion-Aware Triple-Path Feature Fusion Network for Skin Lesion Segmentation
by
Yange Sun, Sen Chen, Huaping Guo, Li Zhang, Hongzhou Yue and Yan Feng
J. Imaging 2026, 12(3), 93; https://doi.org/10.3390/jimaging12030093 - 24 Feb 2026
Abstract
Skin lesion segmentation has achieved notable progress in recent years; however, accurate delineation remains challenging due to complex backgrounds, ambiguous boundaries, and low lesion-to-skin contrast. To address these issues, we propose the lesion-aware triple-path feature fusion network (LTPNet), an end-to-end framework that progressively processes features through extraction, refinement, and aggregation stages. In the extraction stage, we incorporate a general foreground–background attention to suppress background interference and accelerate model convergence. In the refinement stage, we introduce an attentive spatial modulator (ASM) to jointly exploit local structural cues and global semantic context for precise spatial modulation. We further develop a lesion-aware lite-gate attention (LALGA) module that performs local spatial feature modulation and global channel recalibration tailored to lesion characteristics. In the aggregation stage, we propose a triple-path feature fusion (TPFF) module that explicitly models feature relationships across scales via three complementary pathways: a common path (CP) for semantic consistency, a saliency path (SP) for highlighting co-activated regions, and a difference path (DP) for accentuating structural discrepancies. Extensive experiments on in-domain and cross-domain datasets show that LTPNet achieves superior segmentation accuracy with reasonable inference efficiency and model complexity, demonstrating its potential for efficient and reliable clinical decision support.
Full article
(This article belongs to the Special Issue Computer Vision for Medical Image Analysis)
News
6 November 2025
MDPI Launches the Michele Parrinello Award for Pioneering Contributions in Computational Physical Science
9 October 2025
Meet Us at the 3rd International Conference on AI Sensors and Transducers, 2–7 August 2026, Jeju, South Korea
Topics
Topic in
AI, Applied Sciences, Bioengineering, Healthcare, IJERPH, JCM, Clinics and Practice, J. Imaging
Artificial Intelligence in Public Health: Current Trends and Future Possibilities, 2nd Edition
Topic Editors: Daniele Giansanti, Giovanni Costantini
Deadline: 15 March 2026
Topic in
Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang
Deadline: 31 March 2026
Topic in
Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Topic in
Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye
Deadline: 31 May 2026
Special Issues
Special Issue in
J. Imaging
Translational Preclinical Imaging: Techniques, Applications and Perspectives
Guest Editors: Sara Gargiulo, Sandra Albanese
Deadline: 31 March 2026
Special Issue in
J. Imaging
AI-Driven Advances in Computational Pathology
Guest Editors: Yuming Jiang, Wencheng Li, Xiaorui Liu
Deadline: 31 March 2026
Special Issue in
J. Imaging
Artificial Intelligence for Medical Imaging and Applications
Guest Editors: Wen Tang, Jinhua Liu
Deadline: 31 March 2026
Special Issue in
J. Imaging
Infrared Image Processing with Artificial Intelligence: Progress and Challenges
Guest Editors: Ruiheng Zhang, Na Li, Yaokun Xu, Xiaolin Han, Zhe Cao
Deadline: 31 March 2026