Journal Description
Journal of Imaging
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques, published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; acceptance to publication takes 3.6 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
A Slicer-Independent Framework for Measuring G-Code Accuracy in Medical 3D Printing
J. Imaging 2026, 12(1), 25; https://doi.org/10.3390/jimaging12010025 (registering DOI) - 4 Jan 2026
Abstract
In medical 3D printing, accuracy is critical for fabricating patient-specific implants and anatomical models. Although printer performance has been widely examined, the influence of slicing software on geometric fidelity is less frequently quantified. The slicing step, which converts STL files into printer-readable G-code, may introduce deviations that affect the final printed object. This study quantified slicer-induced G-code deviations by comparing G-code-derived geometries with their reference STL models. Twenty mandibular models were processed using five slicers (PrusaSlicer (version 2.9.1), Cura (version 5.2.2), Simplify3D (version 4.1.2), Slic3r (version 1.3.0) and Fusion 360 (version 2.0.19725)). A custom Python workflow converted the G-code into point clouds and reconstructed STL meshes through XY and Z corrections, marching cubes surface extraction, and volumetric extrusion. A calibration object enabled coordinate normalization across slicers. Accuracy was assessed using Mean Surface Distance (MSD), Root Mean Square (RMS) deviation, and Volume Difference. MSD ranged from 0.071 to 0.095 mm, and RMS deviation from 0.084 to 0.113 mm, depending on the slicer. Volumetric differences were slicer-dependent. PrusaSlicer yielded the highest surface accuracy; Simplify3D and Slic3r showed the best repeatability. Fusion 360 produced the largest deviations. The slicers introduced geometric deviations below 0.1 mm that represent a substantial proportion of the overall error in the FDM workflow.
Full article
(This article belongs to the Section Medical Imaging)
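The two surface-accuracy metrics above are straightforward to compute once both geometries are sampled as point clouds. A minimal sketch, assuming SciPy is available and that `recon_pts` and `ref_pts` are XYZ samples of the G-code-derived mesh and the reference STL; the paper's full XY/Z-correction and marching cubes workflow is not reproduced here:

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_deviation(recon_pts: np.ndarray, ref_pts: np.ndarray):
    """Nearest-neighbour deviation of reconstructed points from a reference surface.

    recon_pts, ref_pts: (N, 3) and (M, 3) arrays of XYZ samples, e.g. taken
    from a G-code-derived point cloud and the source STL vertices.
    """
    tree = cKDTree(ref_pts)           # index the reference surface samples
    d, _ = tree.query(recon_pts)      # per-point distance to the nearest reference point
    msd = d.mean()                    # Mean Surface Distance
    rms = np.sqrt((d ** 2).mean())    # Root Mean Square deviation
    return msd, rms

# Toy usage: a surface sampled twice, one copy shifted by 0.1 mm in Z.
ref = np.random.rand(1000, 3)
recon = ref + np.array([0.0, 0.0, 0.1])
print(surface_deviation(recon, ref))  # MSD and RMS both near 0.1
```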
Open Access Article
LLM-Based Pose Normalization and Multimodal Fusion for Facial Expression Recognition in Extreme Poses
by Bohan Chen, Bowen Qu, Yu Zhou, Han Huang, Jianing Guo, Yanning Xian, Longxiang Ma, Jinxuan Yu and Jingyu Chen
J. Imaging 2026, 12(1), 24; https://doi.org/10.3390/jimaging12010024 (registering DOI) - 4 Jan 2026
Abstract
Facial expression recognition (FER) technology has progressively matured over time. However, existing FER methods are primarily optimized for frontal face images, and their recognition accuracy degrades significantly when processing profile or large-angle rotated facial images. Consequently, this limitation hinders the practical deployment of FER systems. To mitigate the interference caused by large pose variations and improve recognition accuracy, we propose a FER method based on profile-to-frontal transformation and multimodal learning. Specifically, we first leverage the visual understanding and generation capabilities of Qwen-Image-Edit to transform profile images to frontal viewpoints, preserving key expression features while standardizing facial poses. Second, we introduce the CLIP model to enhance the semantic representation capability of expression features through vision–language joint learning. Qualitative and quantitative experiments on the RAF (89.39%), EXPW (67.17%), and AffectNet-7 (62.66%) datasets demonstrate that our method outperforms existing approaches.
Full article
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)
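For readers unfamiliar with the CLIP component, the generic zero-shot recipe pairs each image with natural-language expression prompts and scores them in a joint embedding space. The sketch below uses the public openai/clip-vit-base-patch32 checkpoint via Hugging Face Transformers as an assumed stand-in; the paper's actual prompts, checkpoint, and fusion with Qwen-Image-Edit are not specified here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical label set; the paper's exact prompt design is not given here.
EXPRESSIONS = ["happy", "sad", "angry", "surprised", "fearful", "disgusted", "neutral"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_expression(image_path: str) -> str:
    """Zero-shot expression scoring in CLIP's joint vision-language embedding."""
    image = Image.open(image_path).convert("RGB")
    prompts = [f"a photo of a {e} face" for e in EXPRESSIONS]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (1, num_prompts) similarities
    return EXPRESSIONS[int(logits.argmax())]
```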
Open Access Article
State of the Art of Remote Sensing Data: Gradient Pattern in Pseudocolor Composite Images
by Alexey Terekhov, Ravil I. Mukhamediev and Igor Savin
J. Imaging 2026, 12(1), 23; https://doi.org/10.3390/jimaging12010023 (registering DOI) - 4 Jan 2026
Abstract
The thematic processing of pseudocolor composite images, especially those created from remote sensing data, is of considerable interest. The set of spectral classes comprising such images is typically described by a nominal scale, meaning the absence of any predetermined relationships between the classes. However, in many cases, images of this type may contain elements of a regular spatial order, one variant of which is a gradient structure. Gradient structures are characterized by a certain regular spatial ordering of spectral classes. Recognizing gradient patterns in the structure of pseudocolor composite images opens up new possibilities for deeper thematic image processing. This article describes an algorithm for analyzing the spatial structure of a pseudocolor composite image to identify gradient patterns. In this process, the initial nominal scale of spectral classes is transformed into a rank scale of the gradient legend. The algorithm is based on the analysis of Moore neighborhoods for each image pixel. This creates an array of the prevalence of all types of local binary patterns (the pixel's nearest neighbors). All possible variants of the spectral class rank scale composition are then considered. The rank scale variant that describes the largest proportion of image pixels within its gradient order is used as the final result. The user can independently define the criteria for the significance of the gradient order in the analyzed image, focusing either on the overall statistics of the proportion of pixels consistent with the spatial structure of the selected gradient or on the statistics of a selected key image region. The proposed algorithm is illustrated through the analysis of test examples.
Full article
(This article belongs to the Section Image and Video Processing)
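The core of the approach, choosing the class ranking whose gradient order explains the largest share of pixel adjacencies, can be pictured with a brute-force sketch. The `gradient_consistency` helper and its one-rank adjacency criterion are illustrative choices under stated assumptions, not the authors' exact scoring or optimized implementation:

```python
import numpy as np
from itertools import permutations

def gradient_consistency(labels: np.ndarray):
    """Brute-force search for the class ranking whose gradient order explains
    the largest share of Moore-neighbourhood adjacencies. Practical only for a
    handful of classes."""
    h, w = labels.shape
    pairs = []
    # 8-connected (Moore) neighbour offsets.
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]:
        a = labels[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
        b = labels[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
        pairs.append(np.stack([a.ravel(), b.ravel()], axis=1))
    pairs = np.concatenate(pairs)

    best_order, best_score = None, -1.0
    for order in permutations(np.unique(labels)):
        rank = {c: r for r, c in enumerate(order)}
        ranks = np.vectorize(rank.get)(pairs)
        # A neighbour pair is "gradient consistent" if its ranks differ by at most 1.
        score = np.mean(np.abs(ranks[:, 0] - ranks[:, 1]) <= 1)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score
```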
Open Access Article
Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images
by Maria Jahan, Al Ibne Siam, Lamim Zakir Pronay, Saif Ahmed, Nabeel Mohammed, James Dudley and Taseef Hasan Farook
J. Imaging 2026, 12(1), 22; https://doi.org/10.3390/jimaging12010022 (registering DOI) - 3 Jan 2026
Abstract
To assess the efficiency of vision–language models in detecting and classifying carious and non-carious lesions from intraoral photo imaging, a dataset of 172 images was annotated for microcavitation, cavitated lesions, staining, calculus, and non-carious lesions. Florence-2, PaLI-Gemma, and YOLOv8 models were trained on the dataset. The dataset was divided into an 80:10:10 split, and model performance was evaluated using mean average precision (mAP), mAP50-95, and class-specific precision and recall. YOLOv8 outperformed the vision–language models, achieving a mAP of 37% with a precision of 42.3% (including 100% for cavitation detection) and 31.3% recall. PaLI-Gemma produced precision and recall of 13% and 21%, respectively. Florence-2 yielded a mAP of 10%, with precision and recall of 51% and 35%, respectively. YOLOv8 achieved the strongest overall performance. Florence-2 and PaLI-Gemma underperformed relative to YOLOv8 despite their potential for multimodal contextual understanding, highlighting the need for larger, more diverse datasets and hybrid architectures to achieve improved performance.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Multi-Temporal Shoreline Monitoring and Analysis in Bangkok Bay, Thailand, Using Remote Sensing and GIS Techniques
by Yan Wang, Adisorn Sirikham, Jessada Konpang and Chunguang Li
J. Imaging 2026, 12(1), 21; https://doi.org/10.3390/jimaging12010021 - 1 Jan 2026
Abstract
Drastic alterations have been observed in the coastline of Bangkok Bay, Thailand, over the past three decades. Understanding how coastlines change plays a key role in developing strategies for coastal protection and sustainable resource utilization. This study investigates the temporal and spatial changes in the Bangkok Bay coastline, Thailand, using remote sensing and GIS techniques from 1989 to 2024. The historical rate of coastline change for a typical segment was analyzed using the End Point Rate (EPR) method, and the underlying causes of these changes were discussed. Finally, the variation trend of the total shoreline length and the characteristics of erosion and sedimentation for a typical shoreline in Bangkok Bay, Thailand, over the past 35 years were obtained. An overall increase in coastline length was observed in Bangkok Bay, Thailand, over the 35-year period from 1989 to 2024, with a net gain from 507.23 km to 571.38 km. The rate of growth has transitioned from rapid to slow, with the most significant changes occurring during the period 1989–1994. Additionally, the average and maximum erosion rates for the typical shoreline segment were notably high during 1989–1994, with values of −21.61 m/a and −55.49 m/a, respectively. The maximum sedimentation rate along the coastline was relatively high from 2014 to 2024, reaching 10.57 m/a. Overall, the entire coastline of the Samut Sakhon–Bangkok–Samut Prakan Provinces underwent net erosion from 1989 to 2024, driven by a confluence of natural and anthropogenic factors.
Full article
(This article belongs to the Section Image and Video Processing)
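The EPR statistics quoted above follow from a one-line definition: net shoreline movement along a transect divided by the elapsed time. A toy sketch, with illustrative numbers chosen to reproduce the reported 1989–1994 average rate:

```python
def end_point_rate(d_old_m: float, d_new_m: float, year_old: int, year_new: int) -> float:
    """End Point Rate: net shoreline movement divided by elapsed time (m per year).
    Negative values indicate erosion (landward retreat)."""
    return (d_new_m - d_old_m) / (year_new - year_old)

# Toy example: a transect where the shoreline retreated 108.05 m between 1989 and 1994.
print(end_point_rate(0.0, -108.05, 1989, 1994))  # -21.61 m/a, matching the reported average
```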
Open Access Article
Object Detection on Road: Vehicle’s Detection Based on Re-Training Models on NVIDIA-Jetson Platform
by Sleiter Ramos-Sanchez, Jinmi Lezama, Ricardo Yauri and Joyce Zevallos
J. Imaging 2026, 12(1), 20; https://doi.org/10.3390/jimaging12010020 - 1 Jan 2026
Abstract
The increasing use of artificial intelligence (AI) and deep learning (DL) techniques has driven advances in vehicle classification and detection applications for embedded devices with deployment constraints due to computational cost and response time. In the case of urban environments with high traffic congestion, such as the city of Lima, it is important to determine the trade-off between model accuracy, type of embedded system, and the dataset used. This study was developed using a methodology adapted from the CRISP-DM approach, which included the acquisition of traffic videos in the city of Lima, their segmentation, and manual labeling. Subsequently, three SSD-based detection models (MobileNetV1-SSD, MobileNetV2-SSD-Lite, and VGG16-SSD) were trained on the NVIDIA Jetson Orin NX 16 GB platform. The results show that the VGG16-SSD model achieved the highest average precision (mAP ), with a longer training time, while the MobileNetV1-SSD ( ) model achieved comparable performance (mAP ) with a shorter time. Additionally, data augmentation through contrast adjustment improved the detection of minority classes such as Tuk-tuk and Motorcycle. The results indicate that, among the evaluated models, MobileNetV1-SSD ( ) achieved the best balance between accuracy and computational load for its implementation in ADAS embedded systems in congested urban environments.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
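The contrast-based augmentation credited with improving minority-class detection can be as simple as random contrast jitter applied to training frames. A minimal sketch with Pillow; the factor range is an assumed example, not the study's setting:

```python
import random
from PIL import Image, ImageEnhance

def contrast_jitter(img: Image.Image, lo: float = 0.7, hi: float = 1.3) -> Image.Image:
    """Random contrast adjustment, one simple augmentation that keeps minority
    classes (e.g. Tuk-tuk, Motorcycle) detectable under varying lighting."""
    return ImageEnhance.Contrast(img).enhance(random.uniform(lo, hi))
```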
Open Access Article
Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction
by Zhongmin Jiang, Zhen Wang, Wenju Wang and Jifan Zhu
J. Imaging 2026, 12(1), 19; https://doi.org/10.3390/jimaging12010019 - 31 Dec 2025
Abstract
Existing methods for reconstructing hyperspectral images from single RGB images struggle to obtain a large number of labeled RGB-HSI paired images. These methods face issues such as detail loss, insufficient robustness, low reconstruction accuracy, and the difficulty of balancing the spatial–spectral trade-off. To address these challenges, a Double-Gated Mamba Multi-Scale Adaptive Feature (DMMAF) learning network model is proposed. DMMAF designs a reflection dot-product adaptive dual-noise-aware feature extraction method, which is used to supplement edge detail information in spectral images and improve robustness. DMMAF also constructs a deformable attention-based global feature extraction method and a double-gated Mamba local feature extraction approach, enhancing the interaction between local and global information during the reconstruction process, thereby improving image accuracy. Meanwhile, DMMAF introduces a structure-aware smooth loss function, which, by combining smoothing, curvature, and attention supervision losses, effectively resolves the spatial–spectral resolution balance problem. Applied to the NTIRE 2020, Harvard, and CAVE datasets, the model achieves state-of-the-art unsupervised reconstruction performance compared with existing advanced algorithms. On the NTIRE 2020 dataset, our method attains MRAE, RMSE, and PSNR values of 0.133, 0.040, and 31.314, respectively. On the Harvard dataset, it achieves RMSE and PSNR values of 0.025 and 34.955, respectively, while on the CAVE dataset, it achieves RMSE and PSNR values of 0.041 and 30.983, respectively.
Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
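The three reported quantities are standard reconstruction metrics. A minimal sketch, assuming predicted and ground-truth hyperspectral cubes are float arrays scaled to [0, 1]; averaging conventions vary between papers, so values need not match the numbers above exactly:

```python
import numpy as np

def hsi_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """MRAE, RMSE, and PSNR for hyperspectral reconstruction."""
    mrae = np.mean(np.abs(pred - gt) / (gt + eps))  # mean relative absolute error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))       # root mean square error
    psnr = 20 * np.log10(1.0 / rmse)                # peak signal-to-noise ratio, peak = 1
    return mrae, rmse, psnr
```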
Open Access Article
Revisiting Underwater Image Enhancement for Object Detection: A Unified Quality–Detection Evaluation Framework
by Ali Awad, Ashraf Saleem, Sidike Paheding, Evan Lucas, Serein Al-Ratrout and Timothy C. Havens
J. Imaging 2026, 12(1), 18; https://doi.org/10.3390/jimaging12010018 - 30 Dec 2025
Abstract
Underwater images often suffer from severe color distortion, low contrast, and reduced visibility, motivating the widespread use of image enhancement as a preprocessing step for downstream computer vision tasks. However, recent studies have questioned whether enhancement actually improves object detection performance. In this work, we conduct a comprehensive and rigorous evaluation of nine state-of-the-art enhancement methods and their interactions with modern object detectors. We propose a unified evaluation framework that integrates (1) a distribution-level quality assessment using a composite quality index (Q-index), (2) a fine-grained per-image detection protocol based on COCO-style mAP, and (3) a mixed-set upper-bound analysis that quantifies the theoretical performance achievable through ideal selective enhancement. Our findings reveal that traditional image quality metrics do not reliably predict detection performance, and that dataset-level conclusions often overlook substantial image-level variability. Through per-image evaluation, we identify numerous cases in which enhancement significantly improves detection accuracy—primarily for low-quality inputs—while also demonstrating conditions under which enhancement degrades performance. The mixed-set analysis shows that selective enhancement can yield substantial gains over both original and fully enhanced datasets, establishing a new direction for designing enhancement models optimized for downstream vision tasks. This study provides the most comprehensive evidence to date that underwater image enhancement can be beneficial for object detection when evaluated at the appropriate granularity and guided by informed selection strategies. The data generated and code developed are publicly available.
Full article
(This article belongs to the Section Image and Video Processing)
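The mixed-set upper bound in point (3) has a simple form: for every image, keep whichever version detects better, then average. A sketch with hypothetical per-image AP lists (`ap_original` and `ap_enhanced` are illustrative names, not the paper's variables):

```python
def selective_upper_bound(ap_original: list[float], ap_enhanced: list[float]) -> float:
    """Ideal selective enhancement: per image, keep the version (original or
    enhanced) with the higher AP, then average over the dataset."""
    return sum(max(o, e) for o, e in zip(ap_original, ap_enhanced)) / len(ap_original)

# Enhancement helps some images and hurts others; selection beats both extremes.
print(selective_upper_bound([0.40, 0.70, 0.55], [0.52, 0.60, 0.58]))  # 0.60
```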
Open Access Review
Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks
by Ichrak Khoulqi and Zakariae El Ouazzani
J. Imaging 2026, 12(1), 17; https://doi.org/10.3390/jimaging12010017 - 30 Dec 2025
Abstract
In this paper, we present a literature review of two deep learning architectures, Convolutional Neural Networks (CNNs) and Capsule Networks (CapsNets), applied to medical images, with the aim of supporting medical decision-making. CNNs have demonstrated their capacity in the medical diagnostic field; however, their reliability decreases under slight spatial variability, which can affect diagnosis, especially since the anatomical structure of the human body can differ from one patient to another. In contrast, CapsNets encode not only feature activations but also spatial relationships, hence improving the reliability and stability of model generalization. This paper offers a structured comparison by reviewing studies published from 2018 to 2025 across major databases, including IEEE Xplore, ScienceDirect, SpringerLink, and MDPI. The applications in the reviewed papers are based on the benchmark datasets BraTS, INbreast, ISIC, and COVIDx. The review compares the core architectural principles, performance, and interpretability of both architectures. To conclude, we underline the complementary roles of these two architectures in medical decision-making and propose future directions toward hybrid, explainable, and computationally efficient deep learning systems for real clinical environments, thereby helping detect diseases at an early stage and increasing survival rates.
Full article
(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives—2nd Edition)
Open Access Article
FluoNeRF: Fluorescent Novel-View Synthesis Under Novel Light Source Colors and Spectra
by Lin Shi, Kengo Matsufuji, Michitaka Yoshida, Ryo Kawahara and Takahiro Okabe
J. Imaging 2026, 12(1), 16; https://doi.org/10.3390/jimaging12010016 - 29 Dec 2025
Abstract
Synthesizing photo-realistic images of a scene from arbitrary viewpoints and under arbitrary lighting environments is one of the important research topics in computer vision and graphics. In this paper, we propose a method for synthesizing photo-realistic images of a scene with fluorescent objects from novel viewpoints and under novel lighting colors and spectra. In general, fluorescent materials absorb light at certain wavelengths and then emit light at longer wavelengths than the absorbed ones, in contrast to reflective materials, which preserve the wavelengths of light. Therefore, we cannot reproduce the colors of fluorescent objects under arbitrary lighting colors by combining conventional view synthesis techniques with white balance adjustment of the RGB channels. Accordingly, we extend novel-view synthesis based on neural radiance fields by incorporating the superposition principle of light; our proposed method captures a sparse set of images of a scene from varying viewpoints and under varying lighting colors or spectra with active lighting systems such as a color display or a multi-spectral light stage, and then synthesizes photo-realistic images of the scene without explicitly modeling its geometric and photometric properties. We conducted a number of experiments using real images captured with an LCD and confirmed that our method outperforms existing methods. Moreover, we showed that the extension of our method using more than three primary colors with a light stage enables us to reproduce the colors of fluorescent objects under common light sources.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
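The superposition principle the method relies on states that light transport, including fluorescent emission, is linear in the incident illumination, so an image under any mixture of the capture primaries is a weighted sum of the per-primary images. A minimal sketch; array shapes and the weight values are assumptions:

```python
import numpy as np

def relight(basis_images: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Superposition of light: the image under a mixed source equals the
    weighted sum of images captured under each primary alone.
    basis_images: (P, H, W, 3) for P display primaries; weights: (P,)."""
    return np.tensordot(weights, basis_images, axes=1)

# E.g. an orange-ish source mixed from RGB primaries captured separately on an LCD:
# img = relight(np.stack([img_r, img_g, img_b]), np.array([1.0, 0.6, 0.1]))
```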
Open Access Article
M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating
by Zhigao Zeng, Jiale Xiao, Shengqiu Yi, Qiang Liu and Yanhui Zhu
J. Imaging 2026, 12(1), 15; https://doi.org/10.3390/jimaging12010015 - 29 Dec 2025
Abstract
Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets—including Synapse, CVC-ClinicDB, and ISIC—demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.
Full article
(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)
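As one way to picture the spatial prior in ME-MSA, a generic distance-penalized attention adds a bias that decays with the Manhattan distance between patch positions. The sketch below shows that generic construction; the strength, grid size, and the additive-bias form are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def manhattan_bias(h: int, w: int, strength: float = 0.1) -> torch.Tensor:
    """Additive attention bias decaying with Manhattan (L1) distance between
    patch positions on an h x w grid."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()  # (h*w, 2)
    dist = torch.cdist(pos, pos, p=1)                               # pairwise L1 distance
    return -strength * dist                                         # add to attention logits

# Usage inside a self-attention head over a 14 x 14 patch grid:
# scores = q @ k.transpose(-2, -1) / d ** 0.5 + manhattan_bias(14, 14)
```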
Open Access Article
Adaptive Normalization Enhances the Generalization of Deep Learning Model in Chest X-Ray Classification
by Jatsada Singthongchai and Tanachapong Wangkhamhan
J. Imaging 2026, 12(1), 14; https://doi.org/10.3390/jimaging12010014 - 28 Dec 2025
Abstract
This study presents a controlled benchmarking analysis of min–max scaling, Z-score normalization, and an adaptive preprocessing pipeline that combines percentile-based ROI cropping with histogram standardization. The evaluation was conducted across four public chest X-ray (CXR) datasets and three convolutional neural network architectures under controlled experimental settings. The adaptive pipeline generally improved accuracy, F1-score, and training stability on datasets with relatively stable contrast characteristics while yielding limited gains on MIMIC-CXR due to strong acquisition heterogeneity. Ablation experiments showed that histogram standardization provided the primary performance contribution, with ROI cropping offering complementary benefits, and the full pipeline achieving the best overall performance. The computational overhead of the adaptive preprocessing was minimal (+6.3% training-time cost; 5.2 ms per batch). Friedman–Nemenyi and Wilcoxon signed-rank tests confirmed that the observed improvements were statistically significant across most dataset–model configurations. Overall, adaptive normalization is positioned not as a novel algorithmic contribution, but as a practical preprocessing design choice that can enhance cross-dataset robustness and reliability in chest X-ray classification workflows.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)
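A minimal sketch of the two pipeline stages for a grayscale CXR array; the percentile threshold and the zero-mean, unit-variance mapping are generic choices, not the study's exact parameters:

```python
import numpy as np

def adaptive_preprocess(img: np.ndarray, p: float = 10.0) -> np.ndarray:
    """(1) Percentile-based ROI cropping: keep the bounding box of pixels
    brighter than the p-th percentile, dropping dark borders and collimation.
    (2) Histogram standardization: map the ROI to zero mean, unit variance."""
    mask = img > np.percentile(img, p)                 # rough foreground guess
    rows, cols = np.any(mask, axis=1), np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    roi = img[r0:r1 + 1, c0:c1 + 1].astype(np.float32)
    return (roi - roi.mean()) / (roi.std() + 1e-8)
```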
Open Access Article
Assessing Change in Stone Burden on Baseline and Follow-Up CT: Radiologist and Radiomics Evaluations
by Parisa Kaviani, Matthias F. Froelich, Bernardo Bizzo, Andrew Primak, Giridhar Dasegowda, Emiliano Garza-Frias, Lina Karout, Anushree Burade, Seyedehelaheh Hosseini, Javier Eduardo Contreras Yametti, Keith Dreyer, Sanjay Saini and Mannudeep Kalra
J. Imaging 2026, 12(1), 13; https://doi.org/10.3390/jimaging12010013 - 27 Dec 2025
Abstract
This retrospective diagnostic accuracy study compared radiologist-based qualitative assessments and radiomics-based analyses with an automated artificial intelligence (AI)–based volumetric approach for evaluating changes in kidney stone burden on follow-up CT examinations. With institutional review board approval, 157 patients (mean age, 61 ± 13 years; 99 men, 58 women) who underwent baseline and follow-up non-contrast abdomen–pelvis CT for kidney stone evaluation were included. The index test was an automated AI-based whole-kidney and stone segmentation radiomics prototype (Frontier, Siemens Healthineers), which segmented both kidneys and isolated stone volumes using a fixed threshold of 130 Hounsfield units, providing stone volume and maximum diameter per kidney. The reference standard was a threshold-defined volumetric assessment of stone burden change between baseline and follow-up CTs. The radiologist’s performance was assessed using (1) interpretations from clinical radiology reports and (2) an independent radiologist’s assessment of stone burden change (stable, increased, or decreased). Diagnostic accuracy was evaluated using multivariable logistic regression and receiver operating characteristic (ROC) analysis. Automated volumetric assessment identified stable (n = 44), increased (n = 109), and decreased (n = 108) stone burden across the evaluated kidneys. Qualitative assessments from radiology reports demonstrated weak diagnostic performance (AUC range, 0.55–0.62), similar to the independent radiologist (AUC range, 0.41–0.72) for differentiating changes in stone burden. A model incorporating higher-order radiomics features achieved an AUC of 0.71 for distinguishing increased versus decreased stone burdens compared with the baseline CT (p < 0.001), but did not outperform threshold-based volumetric assessment. The automated threshold-based volumetric quantification of kidney stone burdens provides higher diagnostic accuracy than qualitative radiologist assessments and radiomics-based analyses for identifying a stable, increased, or decreased stone burden on follow-up CT examinations.
Full article
(This article belongs to the Section Medical Imaging)
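The core quantification step of the index test, counting intra-kidney voxels above the fixed 130 HU threshold, can be sketched in a few lines; the kidney mask is assumed to come from the upstream AI segmentation, and the change-classification thresholds are not reproduced here:

```python
import numpy as np

def stone_volume_mm3(ct_hu: np.ndarray, kidney_mask: np.ndarray,
                     voxel_mm3: float, threshold_hu: float = 130.0) -> float:
    """Threshold-based stone quantification: voxels inside the kidney mask with
    attenuation above 130 HU count as stone, scaled by per-voxel volume."""
    stone = (ct_hu > threshold_hu) & kidney_mask
    return float(stone.sum()) * voxel_mm3
```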
Open Access Article
Patched-Based Swin Transformer Hyperprior for Learned Image Compression
by Sibusiso B. Buthelezi and Jules R. Tapamo
J. Imaging 2026, 12(1), 12; https://doi.org/10.3390/jimaging12010012 - 26 Dec 2025
Abstract
We present a hybrid end-to-end learned image compression framework that combines a CNN-based variational autoencoder (VAE) with an efficient hierarchical Swin Transformer to address the limitations of existing entropy models in capturing global dependencies under computational constraints. Traditional VAE-based codecs typically rely on CNN-based priors with localized receptive fields, which are insufficient for modelling the complex, high-dimensional dependencies of the latent space, thereby limiting compression efficiency. While fully global transformer-based models can capture long-range dependencies, their high computational complexity makes them impractical for high-resolution image compression. To overcome this trade-off, our approach couples a CNN-based VAE with a patch-based hierarchical Swin Transformer hyperprior that employs shifted window self-attention to effectively model both local and global contextual information while maintaining computational efficiency. The proposed framework tightly integrates this expressive entropy model with an end-to-end differentiable quantization module, enabling joint optimization of the complete rate-distortion objective. By learning a more accurate probability distribution of the latent representation, the model achieves improved bitrate estimation and a more compact latent representation, resulting in enhanced compression performance. We validate our approach on the widely used Kodak, JPEG AI, and CLIC datasets, demonstrating that the proposed hybrid architecture achieves superior rate-distortion performance, delivering higher visual quality at lower bitrates compared to methods relying on simpler CNN-based entropy priors. This work demonstrates the effectiveness of integrating efficient transformer architectures into learned image compression and highlights their potential for advancing entropy modelling beyond conventional CNN-based designs.
Full article
(This article belongs to the Section Image and Video Processing)
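The rate-distortion objective being jointly optimized is the standard L = R + λD used throughout learned compression. A minimal PyTorch sketch; the 255-squared distortion scaling and the λ value follow common practice in CompressAI-style training and are assumptions here, not the paper's settings:

```python
import torch

def rd_loss(bits_per_pixel: torch.Tensor, x: torch.Tensor, x_hat: torch.Tensor,
            lmbda: float = 0.01) -> torch.Tensor:
    """Rate-distortion objective L = R + lambda * D: the entropy model supplies
    the rate estimate (bits per pixel); the decoder output supplies distortion
    (here MSE). lambda trades bitrate for reconstruction quality."""
    distortion = torch.mean((x - x_hat) ** 2)
    return bits_per_pixel.mean() + lmbda * 255.0 ** 2 * distortion
```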
Open Access Article
A Hybrid Vision Transformer-BiRNN Architecture for Direct k-Space to Image Reconstruction in Accelerated MRI
by Changheun Oh
J. Imaging 2026, 12(1), 11; https://doi.org/10.3390/jimaging12010011 - 26 Dec 2025
Abstract
Long scan times remain a fundamental challenge in Magnetic Resonance Imaging (MRI). Accelerated MRI, which undersamples k-space, requires robust reconstruction methods to solve the ill-posed inverse problem. Recent methods have shown promise by processing image-domain features to capture global spatial context. However, these approaches are often limited, as they fail to fully leverage the unique, sequential characteristics of the k-space data themselves, which are critical for disentangling aliasing artifacts. This study introduces a novel, hybrid, dual-domain deep learning architecture that combines a ViT-based autoencoder with Bidirectional Recurrent Neural Networks (BiRNNs). The proposed architecture is designed to synergistically process information from both domains: it uses the ViT to learn features from image patches and the BiRNNs to model sequential dependencies directly from k-space data. We conducted a comprehensive comparative analysis against a standard ViT with only an MLP head (Model 1), a ViT autoencoder operating solely in the image domain (Model 2), and a competitive UNet baseline. Evaluations were performed on retrospectively undersampled neuro-MRI data using R = 4 and R = 8 acceleration factors with both regular and random sampling patterns. The proposed architecture demonstrated superior performance and robustness, significantly outperforming all other models in challenging high-acceleration and random-sampling scenarios. The results confirm that integrating sequential k-space processing via BiRNNs is critical for superior artifact suppression, offering a robust solution for accelerated MRI.
Full article
(This article belongs to the Topic New Challenges in Image Processing and Pattern Recognition)
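The retrospective undersampling used in the evaluation is easy to reproduce: mask phase-encode lines of fully sampled k-space at the target acceleration, regularly or at random, keeping a fully sampled center. A sketch; the center-block size is an assumed value:

```python
import numpy as np

def undersample_kspace(kspace: np.ndarray, R: int = 4, random_pattern: bool = False,
                       center_lines: int = 24) -> np.ndarray:
    """Retrospective undersampling of 2-D k-space along the phase-encode axis
    (rows): keep every R-th line (regular) or a random 1/R subset, always
    retaining a fully sampled centre block."""
    n = kspace.shape[0]
    mask = np.zeros(n, dtype=bool)
    if random_pattern:
        mask[np.random.choice(n, n // R, replace=False)] = True
    else:
        mask[::R] = True
    c = n // 2
    mask[c - center_lines // 2: c + center_lines // 2] = True  # calibration region
    return kspace * mask[:, None]
```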
Open Access Article
Render-Rank-Refine: Accurate 6D Indoor Localization via Circular Rendering
by Haya Monawwar and Guoliang Fan
J. Imaging 2026, 12(1), 10; https://doi.org/10.3390/jimaging12010010 - 25 Dec 2025
Abstract
Accurate six-degree-of-freedom (6-DoF) camera pose estimation is essential for augmented reality, robotics navigation, and indoor mapping. Existing pipelines often depend on detailed floorplans, strict Manhattan-world priors, and dense structural annotations, which lead to failures in ambiguous room layouts where multiple rooms appear in a query image and their boundaries may overlap or be partially occluded. We present Render-Rank-Refine, a two-stage framework operating on coarse semantic meshes without requiring textured models or per-scene fine-tuning. First, panoramas rendered from the mesh enable global retrieval of coarse pose hypotheses. Then, perspective views from the top-k candidates are compared to the query via rotation-invariant circular descriptors, which re-ranks the matches before final translation and rotation refinement. Our method increases camera localization accuracy compared to the state-of-the-art SPVLoc baseline, reducing the translation error by 40.4% and the rotation error by 29.7% in ambiguous layouts, as evaluated on the Zillow Indoor Dataset. In terms of inference throughput, our method achieves 25.8–26.4 queries per second (QPS), which is significantly faster than other recent comparable methods, while maintaining accuracy comparable to or better than the SPVLoc baseline. These results demonstrate robust, near-real-time indoor localization that overcomes structural ambiguities and heavy geometric assumptions.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
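One standard way to build a rotation-invariant circular descriptor, possibly in the spirit of the one used here, is to take the DFT magnitude of a signal sampled around the view circle: a camera yaw becomes a circular shift, which changes only the phase. A minimal sketch; the 360-bin profile and this specific construction are assumptions, not necessarily the paper's descriptor:

```python
import numpy as np

def circular_descriptor(angular_profile: np.ndarray) -> np.ndarray:
    """Rotation-invariant descriptor from a signal sampled around a circle
    (e.g. per-bearing statistics of a rendered view). The DFT magnitude is
    unchanged by circular shifts of the input."""
    return np.abs(np.fft.rfft(angular_profile, axis=0))

p = np.random.rand(360)
assert np.allclose(circular_descriptor(p), circular_descriptor(np.roll(p, 90)))
```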
Open Access Feature Paper Article
Accurate Segmentation of Vegetation in UAV Desert Imagery Using HSV-GLCM Features and SVM Classification
by Thani Jintasuttisak, Patompong Chabplan, Sasitorn Issaro, Orawan Saeung and Thamasan Suwanroj
J. Imaging 2026, 12(1), 9; https://doi.org/10.3390/jimaging12010009 - 25 Dec 2025
Abstract
Segmentation of vegetation from images is an important task in precision agriculture applications, particularly in challenging desert environments where sparse vegetation, varying soil colors, and strong shadows pose significant difficulties. In this paper, we present a machine learning approach to robust green-vegetation segmentation in drone imagery captured over desert farmlands. The proposed method combines HSV color-space representation with Gray-Level Co-occurrence Matrix (GLCM) texture features and employs Support Vector Machine (SVM) as the learning algorithm. To enhance robustness, we incorporate comprehensive preprocessing, including Gaussian filtering, illumination normalization, and bilateral filtering, followed by morphological post-processing to improve segmentation quality. The method is evaluated against both traditional spectral index methods (ExG and CIVE) and a modern deep learning baseline using comprehensive metrics including accuracy, precision, recall, F1-score, and Intersection over Union (IoU). Experimental results on 120 high-resolution drone images from UAE desert farmlands demonstrate that the proposed method achieves superior performance with an accuracy of 0.91, F1-score of 0.88, and IoU of 0.82, showing significant improvement over baseline methods in handling challenging desert conditions, including shadows, varying soil colors, and sparse vegetation patterns. The method provides practical computational performance with a processing time of 25 s per image and a training time of 28 min, making it suitable for agricultural applications where accuracy is prioritized over processing speed.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
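The feature side of the pipeline pairs per-patch HSV color statistics with GLCM texture properties before SVM classification. A minimal sketch with OpenCV and scikit-image; the GLCM distances, angles, chosen properties, and RBF kernel are illustrative, not the paper's exact configuration:

```python
import numpy as np
import cv2
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def patch_features(patch_bgr: np.ndarray) -> np.ndarray:
    """Mean HSV colour plus GLCM texture stats for one image patch; pairing
    colour with texture helps separate green vegetation from shadowed soil."""
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    tex = [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity", "energy")]
    return np.concatenate([hsv.reshape(-1, 3).mean(axis=0), np.asarray(tex)])

# clf = SVC(kernel="rbf").fit(np.stack([patch_features(p) for p in patches]), labels)
```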
Open Access Article
A Terrain-Constrained TIN Approach for High-Precision DEM Reconstruction Using UAV Point Clouds
by Ziye He, Shu Gan and Xiping Yuan
J. Imaging 2026, 12(1), 8; https://doi.org/10.3390/jimaging12010008 - 25 Dec 2025
Abstract
To address the decline in self-consistency and limited spatial adaptability of traditional interpolation methods in complex terrain, this study proposes a terrain-constrained Triangulated Irregular Network (TIN) interpolation method based on UAV point clouds. The method was tested in the southern margin of the Lufeng Dinosaur National Geopark, Yunnan Province, using ground points at different sampling densities (90%, 70%, 50%, 30%, and 10%), and compared with Spline, Kriging, ANUDEM, and IDW methods. Results show that the proposed method maintains the lowest RMSE and MAE across all densities, demonstrating higher stability and self-consistency and better preserving terrain undulations. This provides technical support for high-precision DEM reconstruction from UAV point clouds in complex terrain.
Full article
(This article belongs to the Special Issue Intelligent Processing and Analysis of Multi-Spectral UAV Remote Sensing Images)
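The plain TIN baseline the method builds on is Delaunay triangulation of the ground points followed by linear interpolation over each facet, which SciPy provides directly. A minimal sketch; the terrain constraints that distinguish the proposed method are not shown:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def tin_dem(points_xyz: np.ndarray, grid_x: np.ndarray, grid_y: np.ndarray) -> np.ndarray:
    """TIN (Delaunay + linear facet) interpolation of (N, 3) ground points onto
    a regular DEM grid; cells outside the triangulation come back as NaN."""
    interp = LinearNDInterpolator(points_xyz[:, :2], points_xyz[:, 2])
    gx, gy = np.meshgrid(grid_x, grid_y)
    return interp(gx, gy)
```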
Open Access Article
AKAZE-GMS-PROSAC: A New Progressive Framework for Matching Dynamic Characteristics of Flotation Foam
by Zhen Peng, Zhihong Jiang, Pengcheng Zhu, Gaipin Cai and Xiaoyan Luo
J. Imaging 2026, 12(1), 7; https://doi.org/10.3390/jimaging12010007 - 25 Dec 2025
Abstract
The dynamic characteristics of flotation foam, such as velocity and breakage rate, are critical factors that influence mineral separation efficiency. However, challenges inherent in foam images, including weak textures, severe deformations, and motion blur, present significant technical hurdles for dynamic monitoring. These issues lead to a fundamental conflict between the efficiency and accuracy of traditional feature matching algorithms. This paper introduces a novel progressive framework for dynamic feature matching in flotation foam images, termed “stable extraction, efficient coarse screening, and precise matching.” This framework first employs the Accelerated-KAZE (AKAZE) algorithm to extract robust, scale- and rotation-invariant feature points from a non-linear scale-space, effectively addressing the challenge of weak textures. Subsequently, it innovatively incorporates the Grid-based Motion Statistics (GMS) algorithm to perform efficient coarse screening based on motion consistency, rapidly filtering out a large number of obvious mismatches. Finally, the Progressive Sample and Consensus (PROSAC) algorithm is used for precise matching, eliminating the remaining subtle mismatches through progressive sampling and geometric constraints. This framework enables the precise analysis of dynamic foam characteristics, including displacement, velocity, and breakage rate (enhanced by a robust “foam lifetime” mechanism). Comparative experimental results demonstrate that, compared to ORB-GMS-RANSAC (with a Mean Absolute Error, MAE of 1.20 pixels and a Mean Relative Error, MRE of 9.10%) and ORB-RANSAC (MAE: 3.53 pixels, MRE: 27.36%), the proposed framework achieves significantly lower error rates (MAE: 0.23 pixels, MRE: 2.13%). It exhibits exceptional stability and accuracy, particularly in complex scenarios involving low texture and minor displacements. This research provides a high-precision, high-robustness technical solution for the dynamic monitoring and intelligent control of the flotation process.
Full article
(This article belongs to the Section Image and Video Processing)
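All three stages have OpenCV counterparts, so the progressive framework can be sketched end to end: matchGMS requires opencv-contrib, USAC_PROSAC requires OpenCV 4.5 or newer, and the parameter choices below are library defaults rather than the authors' settings:

```python
import cv2
import numpy as np

def match_frames(img1, img2):
    """AKAZE detection, GMS coarse screening, PROSAC refinement. img1 and img2
    are grayscale uint8 frames. A sketch of the pipeline's stages, not the
    authors' code."""
    akaze = cv2.AKAZE_create()  # features from a nonlinear scale-space
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)
    raw = cv2.BFMatcher(cv2.NORM_HAMMING).match(des1, des2)
    # Grid-based motion statistics: fast rejection of motion-inconsistent matches.
    coarse = cv2.xfeatures2d.matchGMS((img1.shape[1], img1.shape[0]),
                                      (img2.shape[1], img2.shape[0]), kp1, kp2, raw)
    src = np.float32([kp1[m.queryIdx].pt for m in coarse])
    dst = np.float32([kp2[m.trainIdx].pt for m in coarse])
    # PROSAC: quality-ordered sampling under a geometric (homography) constraint.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.USAC_PROSAC, 3.0)
    return H, inlier_mask
```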
Open Access Article
Long-Term Prognostic Value in Nuclear Cardiology: Expert Scoring Combined with Automated Measurements vs. Angiographic Score
by George Angelidis, Stavroula Giannakou, Varvara Valotassiou, Emmanouil Panagiotidis, Ioannis Tsougos, Chara Tzavara, Dimitrios Psimadas, Evdoxia Theodorou, Charalampos Ziangas, John Skoularigis, Filippos Triposkiadis and Panagiotis Georgoulias
J. Imaging 2026, 12(1), 6; https://doi.org/10.3390/jimaging12010006 - 25 Dec 2025
Abstract
The evaluation of myocardial perfusion imaging (MPI) studies is based on the visual interpretation of the reconstructed images, while the measurements obtained through software packages may contribute to the investigation, mainly in cases of ambiguous scintigraphic findings. We aimed to investigate the long-term prognostic value of expert reading of Summed Stress Score (SSS), Summed Rest Score (SRS), and Summed Difference Score (SDS), combined with the automated measurements of these parameters, in comparison to the prognostic ability of the angiographic score for soft and hard cardiac events. The study was conducted at the Nuclear Medicine Laboratory of the University of Thessaly, in Larissa, Greece. Overall, 378 consecutive patients with known or suspected coronary artery disease (CAD) were enrolled. Automated measurements of SSS, SRS, and SDS were obtained using the Emory Cardiac Toolbox, Myovation, and Quantitative Perfusion SPECT software packages. Coronary angiographies were scored according to a four-point scoring system (angiographic score). Follow-up data were recorded through phone contact, as well as through review of hospital records. All participants were followed up for at least 36 months. Soft and hard cardiac events were recorded in 31.7% and 11.6% of the sample, respectively, while any cardiac event was recorded in 36.5%. For hard cardiac events, the prognostic value of expert scoring, combined with the prognostic value of the automated measurements, was significantly greater than the prognostic ability of the angiographic score (p < 0.001). For any cardiac event, the prognostic value of expert scoring, combined with the prognostic value of the automated analyses, was significantly greater than the prognostic ability of the angiographic score (p < 0.001). According to our results, in patients with known or suspected CAD, the combination of expert reading and automated measurements of SSS, SRS, and SDS shows superior prognostic ability in comparison to the angiographic score.
Full article
(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)
Topics
Topic in
Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang
Deadline: 31 March 2026
Topic in
Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Topic in
Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye
Deadline: 31 May 2026
Topic in
AI, Applied Sciences, Electronics, J. Imaging, Sensors, IJGI
State-of-the-Art Object Detection, Tracking, and Recognition Techniques
Topic Editors: Mang Ye, Jingwen Ye, Cuiqun Chen
Deadline: 30 June 2026
Special Issues
Special Issue in
J. Imaging
Image Segmentation: Trends and Challenges
Guest Editor: Nikolaos Mitianoudis
Deadline: 31 January 2026
Special Issue in
J. Imaging
Progress and Challenges in Biomedical Image Analysis—2nd Edition
Guest Editors: Lei Li, Zehor Belkhatir
Deadline: 31 January 2026
Special Issue in
J. Imaging
Computer Vision for Medical Image Analysis
Guest Editors: Rahman Attar, Le Zhang
Deadline: 15 February 2026
Special Issue in
J. Imaging
Emerging Technologies for Less Invasive Diagnostic Imaging
Guest Editors: Francesca Angelone, Noemi Pisani, Armando Ricciardi
Deadline: 28 February 2026