Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques, published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; the time from acceptance to publication is 3.6 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
A Unified Complex-Fresnel Model for Physically Based Long-Wave Infrared Imaging and Simulation
J. Imaging 2026, 12(1), 33; https://doi.org/10.3390/jimaging12010033 - 7 Jan 2026
Abstract
Accurate modelling of reflection, transmission, absorption, and emission at material interfaces is essential for infrared imaging, rendering, and the simulation of optical and sensing systems. This need is particularly pronounced across the short-wave to long-wave infrared (SWIR–LWIR) spectrum, where many materials exhibit dispersion- and wavelength-dependent attenuation described by complex refractive indices. In this work, we introduce a unified formulation of the full Fresnel equations that directly incorporates wavelength-dependent complex refractive-index data and provides physically consistent interface behaviour for both dielectrics and conductors. The approach reformulates the classical Fresnel expressions to eliminate sign ambiguities and numerical instabilities, resulting in a stable evaluation across incidence angles and for strongly absorbing materials. We demonstrate the model through spectral-rendering simulations that illustrate realistic reflectance and transmittance behaviour for materials with different infrared optical properties. To assess its suitability for thermal-infrared applications, we also compare the simulated long-wave emission of a heated glass sphere with measurements from a LWIR camera. The agreement between measured and simulated radiometric trends indicates that the proposed formulation offers a practical and physically grounded tool for wavelength-parametric interface modelling in infrared imaging, supporting applications in spectral rendering, synthetic data generation, and infrared system analysis.
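As a rough, non-authoritative illustration of how a complex refractive index enters the classical Fresnel equations (not the unified formulation proposed in the paper), the following Python sketch computes unpolarized reflectance at an air/material interface; the material index and angles are made-up example values:

import numpy as np

def fresnel_reflectance(n_complex, theta_i):
    # Unpolarized power reflectance at an air/material interface.
    # n_complex: complex refractive index n + i*k of the material.
    # theta_i: angle of incidence in radians, measured from the surface normal.
    n1 = 1.0                                   # air
    n2 = complex(n_complex)
    cos_i = np.cos(theta_i)
    sin_t = n1 * np.sin(theta_i) / n2          # complex Snell's law
    cos_t = np.sqrt(1.0 - sin_t ** 2)          # principal complex square root
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    return 0.5 * (abs(r_s) ** 2 + abs(r_p) ** 2)

# Example: a strongly absorbing, metal-like index in the LWIR band
print(fresnel_reflectance(12.0 + 55.0j, np.deg2rad(10.0)))
print(fresnel_reflectance(12.0 + 55.0j, np.deg2rad(80.0)))

Taking the principal complex square root for the transmitted cosine is one common way to sidestep the sign ambiguities mentioned above; the paper's own reformulation may differ.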
Full article
(This article belongs to the Section Visualization and Computer Graphics)
Open Access Article
Hybrid Skeleton-Based Motion Templates for Cross-View and Appearance-Robust Gait Recognition
by João Ferreira Nunes, Pedro Miguel Moreira and João Manuel R. S. Tavares
J. Imaging 2026, 12(1), 32; https://doi.org/10.3390/jimaging12010032 - 7 Jan 2026
Abstract
Gait recognition methods based on silhouette templates, such as the Gait Energy Image (GEI), achieve high accuracy under controlled conditions but often degrade when appearance varies due to viewpoint, clothing, or carried objects. In contrast, skeleton-based approaches provide interpretable motion cues but remain sensitive to pose-estimation noise. This work proposes two compact 2D skeletal descriptors—Gait Skeleton Images (GSIs)—that encode 3D joint trajectories into line-based and joint-based static templates compatible with standard 2D CNN architectures. A unified processing pipeline is introduced, including skeletal topology normalization, rigid view alignment, orthographic projection, and pixel-level rendering. Core design factors are analyzed on the GRIDDS dataset, where depth-based 3D coordinates provide stable ground truth for evaluating structural choices and rendering parameters. An extensive evaluation is then conducted on the widely used CASIA-B dataset, using 3D coordinates estimated via human pose estimation, to assess robustness under viewpoint, clothing, and carrying covariates. Results show that although GEIs achieve the highest same-view accuracy, GSI variants exhibit reduced degradation under appearance changes and demonstrate greater stability under severe cross-view conditions. These findings indicate that compact skeletal templates can complement appearance-based descriptors and may benefit further from continued advances in 3D human pose estimation.
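As a hypothetical sketch of the general idea of encoding joint trajectories into a static 2D template (not the paper's exact GSI construction; the joint count, image size, and normalization below are assumptions):

import numpy as np

def joint_template(joints_3d, size=64):
    # joints_3d: array of shape (frames, joints, 3), assumed already view-aligned.
    # Returns a size x size joint-based template averaged over the gait cycle.
    xy = joints_3d[..., :2]                          # orthographic projection
    flat = xy.reshape(-1, 2)
    mins, maxs = flat.min(axis=0), flat.max(axis=0)
    xy = (xy - mins) / (maxs - mins + 1e-8)          # normalize to [0, 1]
    template = np.zeros((size, size), dtype=np.float32)
    for frame in xy:
        for x, y in frame:
            col = int(x * (size - 1))
            row = int((1.0 - y) * (size - 1))        # image rows grow downward
            template[row, col] += 1.0
    return template / len(xy)

print(joint_template(np.random.rand(30, 17, 3)).shape)   # (64, 64)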
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Article
Deep Learning-Assisted Autofocus for Aerial Cameras in Maritime Photography
by Haiying Liu, Yingchao Li, Shilong Xu, Haoyu Wang, Qiang Fu and Huilin Jiang
J. Imaging 2026, 12(1), 31; https://doi.org/10.3390/jimaging12010031 - 7 Jan 2026
Abstract
To address the unreliable autofocus problem of drone-mounted visible-light aerial cameras in low-contrast maritime environments, this paper proposes an autofocus system that combines deep-learning-based coarse focusing with traditional search-based fine adjustment. The system uses a built-in high-contrast resolution test chart as the signal source. Images captured by the imaging sensor are fed into a lightweight convolutional neural network to regress the defocus distance, enabling fast focus positioning. This avoids the weak signal and inaccurate focusing often encountered when adjusting focus directly on low-contrast sea surfaces. In the fine-focusing stage, a hybrid strategy integrating hill-climbing search and inverse correction is adopted. By evaluating the image sharpness function, the system accurately locks onto the optimal focal plane, forming intelligent closed-loop control. Experiments show that this method, which combines imaging of the built-in calibration target with deep-learning-based coarse focusing, significantly improves focusing efficiency. Compared with traditional full-range search strategies, the focusing speed is increased by approximately 60%. While ensuring high accuracy and strong adaptability, the proposed approach effectively enhances the overall imaging performance of aerial cameras in low-contrast maritime conditions.
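A minimal sketch of the fine-focusing idea, hill-climbing on an image sharpness measure, assuming hypothetical capture() and move_focus() hardware hooks; the step sizes and the Laplacian-variance sharpness metric are illustrative choices, not the authors' implementation:

import numpy as np

def sharpness(image):
    # Variance of a 4-neighbour Laplacian as a simple focus measure.
    img = image.astype(np.float32)
    lap = (-4.0 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def fine_focus(capture, move_focus, start_pos, step=8, max_iters=50):
    # capture() -> current image; move_focus(pos) -> drive the focus motor.
    # Both are hypothetical hardware hooks, not a real camera API.
    pos = start_pos
    move_focus(pos)
    best = sharpness(capture())
    direction = 1
    for _ in range(max_iters):
        candidate = pos + direction * step
        move_focus(candidate)
        score = sharpness(capture())
        if score > best:
            pos, best = candidate, score              # keep climbing
        elif step > 1:
            direction, step = -direction, step // 2   # overshoot: back up and refine
        else:
            break
    return pos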
Full article
(This article belongs to the Section Computational Imaging and Computational Photography)
Open Access Article
From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification
by Vasiliy Kudryavtsev, Kirill Borodin, German Berezin, Kirill Bubenchikov, Grach Mkrtchian and Alexander Ryzhkov
J. Imaging 2026, 12(1), 30; https://doi.org/10.3390/jimaging12010030 - 7 Jan 2026
Abstract
Automated animal identification is a practical task for reuniting lost pets with their owners, yet current systems often struggle due to limited dataset scale and reliance on unimodal visual cues. This study introduces a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. We constructed a massive training corpus of 1.9 million photographs covering 695,091 unique animals to support this investigation. Through systematic ablation studies, we identified SigLIP2-Giant and E5-Small-v2 as the optimal vision and text backbones. We further evaluated fusion strategies ranging from simple concatenation to adaptive gating to determine the best method for integrating these modalities. Our proposed approach utilizes a gated fusion mechanism and achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422 on a comprehensive test protocol. These results represent an 11% improvement over leading unimodal baselines and demonstrate that integrating synthesized semantic descriptions significantly refines decision boundaries in large-scale pet re-identification.
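A minimal PyTorch sketch of a gated fusion layer in the spirit described above; the embedding dimensions loosely echo a large vision backbone and a small text backbone, but the exact sizes and architecture are assumptions, not the paper's model:

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    # Fuse an image embedding and a text embedding with a learned, per-dimension gate.
    def __init__(self, img_dim=1536, txt_dim=384, out_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, out_dim)
        self.txt_proj = nn.Linear(txt_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(2 * out_dim, out_dim), nn.Sigmoid())

    def forward(self, img_emb, txt_emb):
        v = self.img_proj(img_emb)
        t = self.txt_proj(txt_emb)
        g = self.gate(torch.cat([v, t], dim=-1))   # gate values in (0, 1)
        return g * v + (1.0 - g) * t               # convex combination of modalities

fusion = GatedFusion()
fused = fusion(torch.randn(8, 1536), torch.randn(8, 384))
print(fused.shape)   # torch.Size([8, 512])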
Full article
(This article belongs to the Section Biometrics, Forensics, and Security)
Open Access Article
DynMultiDep: A Dynamic Multimodal Fusion and Multi-Scale Time Series Modeling Approach for Depression Detection
by Jincheng Li, Menglin Zheng, Jiongyi Yang, Yihui Zhan and Xing Xie
J. Imaging 2026, 12(1), 29; https://doi.org/10.3390/jimaging12010029 - 6 Jan 2026
Abstract
Depression is a prevalent mental disorder that imposes a significant public health burden worldwide. Although multimodal detection methods have shown potential, existing techniques still face two critical bottlenecks: (i) insufficient integration of global patterns and local fluctuations in long-sequence modeling and (ii) static fusion strategies that fail to dynamically adapt to the complementarity and redundancy among modalities. To address these challenges, this paper proposes a dynamic multimodal depression detection framework, DynMultiDep, which combines multi-scale temporal modeling with an adaptive fusion mechanism. The core innovations of DynMultiDep lie in its Multi-scale Temporal Experts Module (MTEM) and Dynamic Multimodal Fusion module (DynMM). On one hand, MTEM employs Mamba experts to extract long-term trend features and utilizes local-window Transformers to capture short-term dynamic fluctuations, achieving adaptive fusion through a long-short routing mechanism. On the other hand, DynMM introduces modality-level and fusion-level dynamic decision-making, selecting critical modality paths and optimizing cross-modal interaction strategies based on input characteristics. The experimental results demonstrate that DynMultiDep outperforms existing state-of-the-art methods in detection performance on two widely used large-scale depression datasets.
Full article
(This article belongs to the Special Issue Towards Deeper Understanding of Image and Video Processing and Analysis)
Open Access Article
Ultrashort Echo Time Quantitative Susceptibility Source Separation in Musculoskeletal System: A Feasibility Study
by Sam Sedaghat, Jin Il Park, Eddie Fu, Annette von Drygalski, Yajun Ma, Eric Y. Chang, Jiang Du, Lorenzo Nardo and Hyungseok Jang
J. Imaging 2026, 12(1), 28; https://doi.org/10.3390/jimaging12010028 - 6 Jan 2026
Abstract
This study aims to demonstrate the feasibility of ultrashort echo time (UTE)-based susceptibility source separation for musculoskeletal (MSK) imaging, enabling discrimination between diamagnetic and paramagnetic tissue components, with a particular focus on hemophilic arthropathy (HA). Three key techniques were integrated to achieve UTE-based susceptibility source separation: iterative decomposition of water and fat with echo asymmetry and least-squares estimation for B0 field estimation, projection onto dipole fields for local field mapping, and χ-separation for quantitative susceptibility mapping (QSM) with source decomposition. A phantom containing varying concentrations of diamagnetic (CaCO3) and paramagnetic (Fe3O4) materials was used to validate the method. In addition, in vivo UTE-QSM scans of the knees and ankles were performed on five HA patients using a 3T clinical MRI scanner. In the phantom, conventional QSM underestimated susceptibility values because opposing diamagnetic and paramagnetic sources cancel each other. In contrast, source-separated maps provided distinct diamagnetic and paramagnetic susceptibility values that correlated strongly with CaCO3 and Fe3O4 concentrations (r = −0.99 and 0.95, p < 0.05). In vivo, paramagnetic maps enabled improved visualization of hemosiderin deposits in joints of HA patients, which were poorly visualized or obscured in conventional QSM due to susceptibility cancellation by surrounding diamagnetic tissues such as bone. This study demonstrates, for the first time, the feasibility of UTE-based quantitative susceptibility source separation for MSK applications. The approach enhances the detection of paramagnetic substances like hemosiderin in HA and offers potential for improved assessment of bone and joint tissue composition.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Vision-Based People Counting and Tracking for Urban Environments
by Daniyar Nurseitov, Kairat Bostanbekov, Nazgul Toiganbayeva, Aidana Zhalgas, Didar Yedilkhan and Beibut Amirgaliyev
J. Imaging 2026, 12(1), 27; https://doi.org/10.3390/jimaging12010027 - 5 Jan 2026
Abstract
Population growth and the expansion of urban areas increase the need for intelligent passenger traffic monitoring systems. Accurate estimation of the number of passengers is an important condition for improving the efficiency, safety and quality of transport services. This paper proposes an approach to the automatic detection and counting of people using computer vision and deep learning methods. While YOLOv8 and DeepSORT have been widely explored individually, our contribution lies in a task-specific modification of the DeepSORT tracking pipeline, optimized for dense passenger environments, strong occlusions, and dynamic lighting, as well as in a unified architecture that integrates detection, tracking, and automatic event-log generation. On our new proprietary dataset of 4047 images and 8918 labeled objects, the system achieved 92% detection accuracy and 85% counting accuracy, which confirms the effectiveness of the solution. Compared to Mask R-CNN and DETR, the YOLOv8 model demonstrates an optimal balance between speed, accuracy, and computational efficiency. The results confirm that computer vision can become an efficient and scalable replacement for traditional sensor-based passenger counting systems. The developed architecture (YOLO + Tracking) combines recognition, tracking and counting of people into a single system that automatically generates annotated video streams and event logs. In the future, it is planned to expand the dataset, introduce support for multi-camera integration, and adapt the model for embedded devices to improve the accuracy and energy efficiency of the solution in real-world conditions.
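As an illustrative sketch (not the authors' pipeline) of how tracked detections can be turned into entry/exit counts, assuming a tracker already provides per-frame centroids keyed by track ID:

def count_line_crossings(tracks, line_y):
    # tracks: dict mapping track_id -> list of (x, y) centroids, one per frame,
    # e.g. produced by a YOLO-plus-DeepSORT style tracker (assumed, not shown here).
    # Returns (entries, exits): downward and upward crossings of the line y = line_y.
    entries = exits = 0
    for points in tracks.values():
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if y0 < line_y <= y1:
                entries += 1        # centroid moved downward across the line
            elif y1 < line_y <= y0:
                exits += 1          # centroid moved upward across the line
    return entries, exits

# One person walks down past y = 200, another walks up past it
tracks = {1: [(50, 180), (52, 205)], 2: [(300, 220), (298, 190)]}
print(count_line_crossings(tracks, 200))   # (1, 1)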
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Article
A Hierarchical Multi-Resolution Self-Supervised Framework for High-Fidelity 3D Face Reconstruction Using Learnable Gabor-Aware Texture Modeling
by Pichet Mareo and Rerkchai Fooprateepsiri
J. Imaging 2026, 12(1), 26; https://doi.org/10.3390/jimaging12010026 - 5 Jan 2026
Abstract
High-fidelity 3D face reconstruction from a single image is challenging, owing to the inherently ambiguous depth cues and the strong entanglement of multi-scale facial textures. In this regard, we propose a hierarchical multi-resolution self-supervised framework (HMR-Framework), which reconstructs coarse-, medium-, and fine-scale facial geometry progressively through a unified pipeline. A coarse geometric prior is first estimated via 3D morphable model regression, followed by medium-scale refinement using a vertex deformation map constrained by a global–local Markov random field loss to preserve structural coherence. To improve fine-scale fidelity, a learnable Gabor-aware texture enhancement module is proposed to decouple spatial–frequency information and thus improve sensitivity to high-frequency facial attributes. Additionally, we employ a wavelet-based detail perception loss to preserve edge-aware texture features while mitigating noise commonly observed in in-the-wild images. Extensive qualitative and quantitative evaluations on benchmark datasets indicate that the proposed framework provides better fine-detail reconstruction than existing state-of-the-art methods, while maintaining robustness over pose variations. Notably, the hierarchical design increases semantic consistency across multiple geometric scales, providing a functional solution for high-fidelity 3D face reconstruction from monocular images.
Full article
(This article belongs to the Topic Image Processing, Signal Processing and Their Applications)
Open Access Article
A Slicer-Independent Framework for Measuring G-Code Accuracy in Medical 3D Printing
by Michel Beyer, Alexandru Burde, Andreas E. Roser, Maximiliane Beyer, Sead Abazi and Florian M. Thieringer
J. Imaging 2026, 12(1), 25; https://doi.org/10.3390/jimaging12010025 - 4 Jan 2026
Abstract
In medical 3D printing, accuracy is critical for fabricating patient-specific implants and anatomical models. Although printer performance has been widely examined, the influence of slicing software on geometric fidelity is less frequently quantified. The slicing step, which converts STL files into printer-readable G-code, may introduce deviations that affect the final printed object. This study aimed to quantify slicer-induced G-code deviations by comparing G-code-derived geometries with their reference STL models. Twenty mandibular models were processed using five slicers (PrusaSlicer (version 2.9.1), Cura (version 5.2.2), Simplify3D (version 4.1.2), Slic3r (version 1.3.0) and Fusion 360 (version 2.0.19725)). A custom Python workflow converted the G-code into point clouds and reconstructed STL meshes through XY and Z corrections, marching cubes surface extraction, and volumetric extrusion. A calibration object enabled coordinate normalization across slicers. Accuracy was assessed using Mean Surface Distance (MSD), Root Mean Square (RMS) deviation, and Volume Difference. MSD ranged from 0.071 to 0.095 mm, and RMS deviation from 0.084 to 0.113 mm, depending on the slicer. Volumetric differences were slicer-dependent. PrusaSlicer yielded the highest surface accuracy; Simplify3D and Slic3r showed the best repeatability. Fusion 360 produced the largest deviations. The slicers introduced geometric deviations below 0.1 mm that represent a substantial proportion of the overall error in the FDM workflow.
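A simplified sketch of the measurement idea, parsing linear G-code moves into a point cloud and computing a mean surface distance against reference geometry; the paper's actual workflow (XY/Z corrections, marching cubes, volumetric extrusion) is more involved, and the regex-based parser below is an assumption:

import re
import numpy as np
from scipy.spatial import cKDTree

def gcode_to_points(gcode_text):
    # Collect the XYZ endpoints of G0/G1 linear moves from plain-text G-code.
    x = y = z = 0.0
    points = []
    for line in gcode_text.splitlines():
        if not line.startswith(("G0 ", "G1 ")):
            continue
        for axis, value in re.findall(r"([XYZ])(-?\d+\.?\d*)", line):
            if axis == "X":
                x = float(value)
            elif axis == "Y":
                y = float(value)
            else:
                z = float(value)
        points.append((x, y, z))
    return np.array(points)

def mean_surface_distance(points, reference_vertices):
    # Mean nearest-neighbour distance from toolpath points to reference geometry.
    distances, _ = cKDTree(reference_vertices).query(points)
    return float(distances.mean())

toolpath = gcode_to_points("G1 X10.0 Y5.0 Z0.2\nG1 X12.5 Y5.0")
print(mean_surface_distance(toolpath, np.array([[10.0, 5.0, 0.2], [12.5, 5.0, 0.2]])))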
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
LLM-Based Pose Normalization and Multimodal Fusion for Facial Expression Recognition in Extreme Poses
by Bohan Chen, Bowen Qu, Yu Zhou, Han Huang, Jianing Guo, Yanning Xian, Longxiang Ma, Jinxuan Yu and Jingyu Chen
J. Imaging 2026, 12(1), 24; https://doi.org/10.3390/jimaging12010024 - 4 Jan 2026
Abstract
Facial expression recognition (FER) technology has progressively matured over time. However, existing FER methods are primarily optimized for frontal face images, and their recognition accuracy significantly degrades when processing profile or large-angle rotated facial images. Consequently, this limitation hinders the practical deployment of FER systems. To mitigate the interference caused by large pose variations and improve recognition accuracy, we propose an FER method based on profile-to-frontal transformation and multimodal learning. Specifically, we first leverage the visual understanding and generation capabilities of Qwen-Image-Edit to transform profile images to frontal viewpoints, preserving key expression features while standardizing facial poses. Second, we introduce the CLIP model to enhance the semantic representation capability of expression features through vision–language joint learning. Qualitative and quantitative experiments on the RAF (89.39%), EXPW (67.17%), and AffectNet-7 (62.66%) datasets demonstrate that our method outperforms existing approaches.
Full article
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)
Open Access Article
State of the Art of Remote Sensing Data: Gradient Pattern in Pseudocolor Composite Images
by Alexey Terekhov, Ravil I. Mukhamediev and Igor Savin
J. Imaging 2026, 12(1), 23; https://doi.org/10.3390/jimaging12010023 - 4 Jan 2026
Abstract
The thematic processing of pseudocolor composite images, especially those created from remote sensing data, is of considerable interest. The set of spectral classes comprising such images is typically described by a nominal scale, meaning the absence of any predetermined relationships between the classes. However, in many cases, images of this type may contain elements of a regular spatial order, one variant of which is a gradient structure. Gradient structures are characterized by a certain regular spatial ordering of spectral classes. Recognizing gradient patterns in the structure of pseudocolor composite images opens up new possibilities for deeper thematic image processing. This article describes an algorithm for analyzing the spatial structure of a pseudocolor composite image to identify gradient patterns. In this process, the initial nominal scale of spectral classes is transformed into a rank scale of the gradient legend. The algorithm is based on the analysis of Moore neighborhoods for each image pixel. This creates an array of the prevalence of all types of local binary patterns (the pixel's nearest neighbors). All possible variants of the spectral class rank scale composition are then considered. The rank scale variant that describes the largest proportion of image pixels within its gradient order is used as the final result. The user can independently define the criteria for the significance of the gradient order in the analyzed image, focusing either on the overall statistics of the proportion of pixels consistent with the spatial structure of the selected gradient or on the statistics of a selected key image region. The proposed algorithm is illustrated through the analysis of test examples.
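A small sketch of the kind of Moore-neighbourhood statistic such an analysis starts from, tallying how often each pair of spectral classes co-occurs as centre and neighbour; the subsequent search over rank-scale orderings is not shown, and the example labels are made up:

import numpy as np
from collections import Counter

def moore_adjacency(class_map):
    # Count ordered (centre class, neighbour class) pairs over the 8-connected
    # Moore neighbourhood of every interior pixel of a 2D label array.
    counts = Counter()
    h, w = class_map.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    centre = class_map[1:h - 1, 1:w - 1]
    for dr, dc in offsets:
        neigh = class_map[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        pairs, n = np.unique(np.stack([centre.ravel(), neigh.ravel()], axis=1),
                             axis=0, return_counts=True)
        for (c, nb), k in zip(pairs, n):
            counts[(int(c), int(nb))] += int(k)
    return counts

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 2],
                   [0, 1, 2, 2]])
print(moore_adjacency(labels)[(0, 1)])   # how often a class-0 centre has a class-1 neighbour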
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images
by Maria Jahan, Al Ibne Siam, Lamim Zakir Pronay, Saif Ahmed, Nabeel Mohammed, James Dudley and Taseef Hasan Farook
J. Imaging 2026, 12(1), 22; https://doi.org/10.3390/jimaging12010022 - 3 Jan 2026
Abstract
To assess the efficiency of vision–language models in detecting and classifying carious and non-carious lesions from intraoral photographic images, a dataset of 172 annotated images was labeled for microcavitation, cavitated lesions, staining, calculus, and non-carious lesions. Florence-2, PaLI-Gemma, and YOLOv8 models were trained on this dataset. The data were divided into an 80:10:10 split, and model performance was evaluated using mean average precision (mAP), mAP50-95, and class-specific precision and recall. YOLOv8 outperformed the vision–language models, achieving a mAP of 37%, a precision of 42.3% (with 100% for cavitation detection), and a recall of 31.3%. PaLI-Gemma produced recall values of 13% and 21%. Florence-2 yielded a mean average precision of 10%, with a precision of 51% and a recall of 35%. YOLOv8 achieved the strongest overall performance. Florence-2 and PaLI-Gemma underperformed relative to YOLOv8 despite their potential for multimodal contextual understanding, highlighting the need for larger, more diverse datasets and hybrid architectures to achieve improved performance.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Multi-Temporal Shoreline Monitoring and Analysis in Bangkok Bay, Thailand, Using Remote Sensing and GIS Techniques
by Yan Wang, Adisorn Sirikham, Jessada Konpang and Chunguang Li
J. Imaging 2026, 12(1), 21; https://doi.org/10.3390/jimaging12010021 - 1 Jan 2026
Abstract
Drastic alterations have been observed in the coastline of Bangkok Bay, Thailand, over the past three decades. Understanding how coastlines change plays a key role in developing strategies for coastal protection and sustainable resource utilization. This study investigates the temporal and spatial changes in the Bangkok Bay coastline, Thailand, using remote sensing and GIS techniques from 1989 to 2024. The historical rate of coastline change for a typical segment was analyzed using the EPR method, and the underlying causes of these changes were discussed. Finally, the variation trend of the total shoreline length and the characteristics of erosion and sedimentation for a typical shoreline in Bangkok Bay, Thailand, over the past 35 years were obtained. An overall increase in coastline length was observed in Bangkok Bay, Thailand, over the 35-year period from 1989 to 2024, with a net gain from 507.23 km to 571.38 km. The rate of growth has transitioned from rapid to slow, with the most significant changes occurring during the period 1989–1994. Additionally, the average and maximum erosion rates for the typical shoreline segment were notably high during 1989–1994, with values of −21.61 m/a and −55.49 m/a, respectively. The maximum sedimentation rate along the coastline was relatively high from 2014 to 2024, reaching 10.57 m/a. Overall, the entire coastline of the Samut Sakhon–Bangkok–Samut Prakan Provinces underwent net erosion from 1989 to 2024, driven by a confluence of natural and anthropogenic factors.
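For reference, the End Point Rate (EPR) for a transect is simply the net shoreline displacement divided by the elapsed time, with negative values indicating erosion; a toy example with made-up numbers follows:

def end_point_rate(oldest_position_m, youngest_position_m, years_elapsed):
    # Net shoreline movement along a transect divided by the elapsed time.
    # Negative values indicate erosion (landward retreat).
    return (youngest_position_m - oldest_position_m) / years_elapsed

# A transect that retreated 90 m landward over a 5-year interval
print(end_point_rate(0.0, -90.0, 5.0))   # -18.0 m/a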
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
Object Detection on Road: Vehicle’s Detection Based on Re-Training Models on NVIDIA-Jetson Platform
by Sleiter Ramos-Sanchez, Jinmi Lezama, Ricardo Yauri and Joyce Zevallos
J. Imaging 2026, 12(1), 20; https://doi.org/10.3390/jimaging12010020 - 1 Jan 2026
Abstract
The increasing use of artificial intelligence (AI) and deep learning (DL) techniques has driven advances in vehicle classification and detection applications for embedded devices with deployment constraints due to computational cost and response time. In the case of urban environments with high traffic congestion, such as the city of Lima, it is important to determine the trade-off between model accuracy, type of embedded system, and the dataset used. This study was developed using a methodology adapted from the CRISP-DM approach, which included the acquisition of traffic videos in the city of Lima, their segmentation, and manual labeling. Subsequently, three SSD-based detection models (MobileNetV1-SSD, MobileNetV2-SSD-Lite, and VGG16-SSD) were trained on the NVIDIA Jetson Orin NX 16 GB platform. The results show that the VGG16-SSD model achieved the highest mean average precision (mAP) at the cost of a longer training time, while the MobileNetV1-SSD model achieved comparable mAP with a shorter training time. Additionally, data augmentation through contrast adjustment improved the detection of minority classes such as Tuk-tuk and Motorcycle. The results indicate that, among the evaluated models, MobileNetV1-SSD achieved the best balance between accuracy and computational load for implementation in ADAS embedded systems in congested urban environments.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
Open Access Article
Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction
by Zhongmin Jiang, Zhen Wang, Wenju Wang and Jifan Zhu
J. Imaging 2026, 12(1), 19; https://doi.org/10.3390/jimaging12010019 - 31 Dec 2025
Abstract
Existing methods for reconstructing hyperspectral images from single RGB images struggle to obtain a large number of labeled RGB-HSI paired images. These methods face issues such as detail loss, insufficient robustness, low reconstruction accuracy, and the difficulty of balancing the spatial–spectral trade-off. To address these challenges, a Double-Gated Mamba Multi-Scale Adaptive Feature (DMMAF) learning network model is proposed. DMMAF designs a reflection dot-product adaptive dual-noise-aware feature extraction method, which is used to supplement edge detail information in spectral images and improve robustness. DMMAF also constructs a deformable attention-based global feature extraction method and a double-gated Mamba local feature extraction approach, enhancing the interaction between local and global information during the reconstruction process, thereby improving image accuracy. Meanwhile, DMMAF introduces a structure-aware smooth loss function, which, by combining smoothing, curvature, and attention supervision losses, effectively resolves the spatial–spectral resolution balance problem. Experiments on the NTIRE 2020, Harvard, and CAVE datasets demonstrate that this model achieves state-of-the-art unsupervised reconstruction performance compared with existing advanced algorithms. On the NTIRE 2020 dataset, our method attains MRAE, RMSE, and PSNR values of 0.133, 0.040, and 31.314, respectively. On the Harvard dataset, it achieves RMSE and PSNR values of 0.025 and 34.955, respectively, while on the CAVE dataset, it achieves RMSE and PSNR values of 0.041 and 30.983, respectively.
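For context, the reported metrics can be computed as below under common definitions (the exact normalization used in the paper may differ); the test cube here is random data:

import numpy as np

def mrae(pred, gt, eps=1e-8):
    # Mean relative absolute error.
    return float(np.mean(np.abs(pred - gt) / (np.abs(gt) + eps)))

def rmse(pred, gt):
    # Root mean square error.
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def psnr(pred, gt, peak=1.0):
    # Peak signal-to-noise ratio in dB for data scaled to [0, peak].
    return float(10.0 * np.log10(peak ** 2 / np.mean((pred - gt) ** 2)))

gt = np.random.rand(31, 64, 64)                    # toy 31-band hyperspectral cube
pred = gt + 0.01 * np.random.randn(*gt.shape)
print(mrae(pred, gt), rmse(pred, gt), psnr(pred, gt))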
Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
Open Access Article
Revisiting Underwater Image Enhancement for Object Detection: A Unified Quality–Detection Evaluation Framework
by Ali Awad, Ashraf Saleem, Sidike Paheding, Evan Lucas, Serein Al-Ratrout and Timothy C. Havens
J. Imaging 2026, 12(1), 18; https://doi.org/10.3390/jimaging12010018 - 30 Dec 2025
Abstract
Underwater images often suffer from severe color distortion, low contrast, and reduced visibility, motivating the widespread use of image enhancement as a preprocessing step for downstream computer vision tasks. However, recent studies have questioned whether enhancement actually improves object detection performance. In this work, we conduct a comprehensive and rigorous evaluation of nine state-of-the-art enhancement methods and their interactions with modern object detectors. We propose a unified evaluation framework that integrates (1) a distribution-level quality assessment using a composite quality index (Q-index), (2) a fine-grained per-image detection protocol based on COCO-style mAP, and (3) a mixed-set upper-bound analysis that quantifies the theoretical performance achievable through ideal selective enhancement. Our findings reveal that traditional image quality metrics do not reliably predict detection performance, and that dataset-level conclusions often overlook substantial image-level variability. Through per-image evaluation, we identify numerous cases in which enhancement significantly improves detection accuracy—primarily for low-quality inputs—while also demonstrating conditions under which enhancement degrades performance. The mixed-set analysis shows that selective enhancement can yield substantial gains over both original and fully enhanced datasets, establishing a new direction for designing enhancement models optimized for downstream vision tasks. This study provides the most comprehensive evidence to date that underwater image enhancement can be beneficial for object detection when evaluated at the appropriate granularity and guided by informed selection strategies. The data generated and code developed are publicly available.
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Review
Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks
by Ichrak Khoulqi and Zakariae El Ouazzani
J. Imaging 2026, 12(1), 17; https://doi.org/10.3390/jimaging12010017 - 30 Dec 2025
Abstract
In this paper, we present a literature review of two deep learning architectures, namely Convolutional Neural Networks (CNNs) and Capsule Networks (CapsNets), applied to medical images, and analyze their usefulness for medical decision support. CNNs have demonstrated their capacity in the medical diagnostic field; however, their reliability decreases under slight spatial variability, which can affect diagnosis, especially since the anatomical structure of the human body can differ from one patient to another. In contrast, CapsNets encode not only feature activations but also spatial relationships, hence improving the reliability and stability of model generalization. The paper provides a structured comparison by reviewing studies published from 2018 to 2025 across major databases, including IEEE Xplore, ScienceDirect, SpringerLink, and MDPI. The applications in the reviewed papers are based on the benchmark datasets BraTS, INbreast, ISIC, and COVIDx. The review compares the core architectural principles, performance, and interpretability of both architectures. To conclude, we underline the complementary roles of these two architectures in medical decision-making and propose future directions toward hybrid, explainable, and computationally efficient deep learning systems for real clinical environments, thereby supporting earlier detection of disease and improved survival rates.
Full article
(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives—2nd Edition)
Open Access Article
FluoNeRF: Fluorescent Novel-View Synthesis Under Novel Light Source Colors and Spectra
by Lin Shi, Kengo Matsufuji, Michitaka Yoshida, Ryo Kawahara and Takahiro Okabe
J. Imaging 2026, 12(1), 16; https://doi.org/10.3390/jimaging12010016 - 29 Dec 2025
Abstract
Synthesizing photo-realistic images of a scene from arbitrary viewpoints and under arbitrary lighting environments is one of the important research topics in computer vision and graphics. In this paper, we propose a method for synthesizing photo-realistic images of a scene with fluorescent objects from novel viewpoints and under novel lighting colors and spectra. In general, fluorescent materials absorb light with certain wavelengths and then emit light with longer wavelengths than the absorbed ones, in contrast to reflective materials, which preserve wavelengths of light. Therefore, we cannot reproduce the colors of fluorescent objects under arbitrary lighting colors by combining conventional view synthesis techniques with the white balance adjustment of the RGB channels. Accordingly, we extend the novel-view synthesis based on the neural radiance fields by incorporating the superposition principle of light; our proposed method captures a sparse set of images of a scene from varying viewpoints and under varying lighting colors or spectra with active lighting systems such as a color display or a multi-spectral light stage and then synthesizes photo-realistic images of the scene without explicitly modeling its geometric and photometric models. We conducted a number of experiments using real images captured with an LCD and confirmed that our method works better than the existing methods. Moreover, we showed that the extension of our method using more than three primary colors with a light stage enables us to reproduce the colors of fluorescent objects under common light sources.
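A minimal sketch of the superposition principle the method builds on: in linear radiance space, the image under a mixed light equals the weighted sum of images captured under each primary, which also holds for fluorescence because emission scales linearly with excitation. The array shapes and weights below are illustrative assumptions, not the paper's setup:

import numpy as np

def relight(basis_images, weights):
    # basis_images: shape (primaries, H, W, 3), linear-radiance images captured
    # under each light primary; weights: intensities of the novel light source.
    weights = np.asarray(weights, dtype=np.float32)
    return np.tensordot(weights, basis_images, axes=1)

# Three display primaries mixed into a warm white
basis = np.random.rand(3, 32, 32, 3).astype(np.float32)
print(relight(basis, [1.0, 0.8, 0.5]).shape)   # (32, 32, 3)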
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Article
M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating
by Zhigao Zeng, Jiale Xiao, Shengqiu Yi, Qiang Liu and Yanhui Zhu
J. Imaging 2026, 12(1), 15; https://doi.org/10.3390/jimaging12010015 - 29 Dec 2025
Abstract
Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets—including Synapse, CVC-ClinicDB, and ISIC—demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.
Full article
(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)
Open Access Article
Adaptive Normalization Enhances the Generalization of Deep Learning Model in Chest X-Ray Classification
by Jatsada Singthongchai and Tanachapong Wangkhamhan
J. Imaging 2026, 12(1), 14; https://doi.org/10.3390/jimaging12010014 - 28 Dec 2025
Abstract
This study presents a controlled benchmarking analysis of min–max scaling, Z-score normalization, and an adaptive preprocessing pipeline that combines percentile-based ROI cropping with histogram standardization. The evaluation was conducted across four public chest X-ray (CXR) datasets and three convolutional neural network architectures under controlled experimental settings. The adaptive pipeline generally improved accuracy, F1-score, and training stability on datasets with relatively stable contrast characteristics while yielding limited gains on MIMIC-CXR due to strong acquisition heterogeneity. Ablation experiments showed that histogram standardization provided the primary performance contribution, with ROI cropping offering complementary benefits, and the full pipeline achieving the best overall performance. The computational overhead of the adaptive preprocessing was minimal (+6.3% training-time cost; 5.2 ms per batch). Friedman–Nemenyi and Wilcoxon signed-rank tests confirmed that the observed improvements were statistically significant across most dataset–model configurations. Overall, adaptive normalization is positioned not as a novel algorithmic contribution, but as a practical preprocessing design choice that can enhance cross-dataset robustness and reliability in chest X-ray classification workflows.
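For illustration, minimal NumPy versions of the three preprocessing families compared above; the adaptive variant here uses a crude percentile-based crop and histogram equalization as stand-ins for the paper's ROI cropping and histogram standardization, so the parameters and steps are assumptions:

import numpy as np

def min_max(img):
    # Min-max scaling to [0, 1].
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def z_score(img):
    # Z-score normalization to zero mean, unit variance.
    return (img - img.mean()) / (img.std() + 1e-8)

def adaptive(img, pct=60, bins=256):
    # Crude percentile-based crop around brighter structures, then histogram
    # equalization; stand-ins for the paper's ROI cropping and standardization.
    mask = img > np.percentile(img, pct)
    rows, cols = np.where(mask)
    roi = img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    hist, edges = np.histogram(roi, bins=bins)
    cdf = hist.cumsum() / hist.sum()
    return np.interp(roi, edges[:-1], cdf)          # map intensities through the CDF

cxr = np.random.rand(256, 256).astype(np.float32)   # stand-in chest X-ray
print(min_max(cxr).max(), round(float(z_score(cxr).mean()), 3), adaptive(cxr).shape)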
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)
Topics
Topic in Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang. Deadline: 31 March 2026
Topic in Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi. Deadline: 30 April 2026
Topic in Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye. Deadline: 31 May 2026
Topic in AI, Applied Sciences, Electronics, J. Imaging, Sensors, IJGI
State-of-the-Art Object Detection, Tracking, and Recognition Techniques
Topic Editors: Mang Ye, Jingwen Ye, Cuiqun Chen. Deadline: 30 June 2026
Special Issues
Special Issue in J. Imaging
Image Segmentation: Trends and Challenges
Guest Editor: Nikolaos Mitianoudis. Deadline: 31 January 2026
Special Issue in J. Imaging
Progress and Challenges in Biomedical Image Analysis—2nd Edition
Guest Editors: Lei Li, Zehor Belkhatir. Deadline: 31 January 2026
Special Issue in J. Imaging
Computer Vision for Medical Image Analysis
Guest Editors: Rahman Attar, Le Zhang. Deadline: 15 February 2026
Special Issue in J. Imaging
Emerging Technologies for Less Invasive Diagnostic Imaging
Guest Editors: Francesca Angelone, Noemi Pisani, Armando Ricciardi. Deadline: 28 February 2026