
J. Imaging, Volume 12, Issue 2 (February 2026) – 38 articles

Cover Story: This cover illustrates a unified framework for benchmarking explainability in 3D deep learning. The visual composition symbolically represents cross-domain evaluation across volumetric medical imaging, voxelized CAD objects, and point-cloud data. By evaluating correctness, completeness, and compactness, the work highlights domain-dependent explanation behaviors and structured trade-offs between causal alignment and spatial compactness. The visual synthesis reflects the complementary strengths of intrinsic and post hoc explainability approaches, emphasizing the importance of multi-metric evaluation for reliable 3D XAI systems.
22 pages, 4410 KB  
Article
Accelerating Point Cloud Computation via Memory in Embedded Structured Light Cameras
by Yanan Zhang, Shikang Meng, Shijie Wang and Yaheng Ren
J. Imaging 2026, 12(2), 91; https://doi.org/10.3390/jimaging12020091 - 21 Feb 2026
Cited by 1 | Viewed by 1624
Abstract
Embedded structured light cameras have been widely applied in various fields. However, constraints such as limited computing resources make it difficult to achieve high-speed structured light point cloud computation. To address this issue, this study proposes a memory-driven computational framework for accelerating point cloud computation. Specifically, as much of the point cloud computation as possible is precomputed and stored in memory as parameters, which significantly reduces the computational load at run time. The framework is instantiated in two forms: a low-memory method that minimizes memory footprint at the expense of point cloud stability, and a high-memory method that preserves the nonlinear phase–distance relation via an extensive lookup table. Experimental evaluations demonstrate that the proposed methods achieve accuracy comparable to that of the conventional method while delivering substantial speedups, and that data-format optimizations further reduce the required bandwidth. This framework offers a generalizable paradigm for optimizing structured light pipelines, paving the way for enhanced real-time 3D sensing in embedded applications.
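The high-memory variant replaces a nonlinear phase-to-distance evaluation with a table lookup. The sketch below illustrates that trade-off under loud assumptions: `phase_to_distance` is a hypothetical stand-in for a calibrated relation (not the authors' model), and linear interpolation between table entries is our choice, not something the abstract specifies.

```python
import math


def phase_to_distance(phase: float) -> float:
    """Hypothetical nonlinear phase -> distance (mm) relation, used only for illustration."""
    return 500.0 + 300.0 * phase / (1.0 + 0.1 * phase)


def build_lut(n_entries: int, max_phase: float = 2 * math.pi) -> list[float]:
    """Offline step: evaluate the costly relation once at evenly spaced phases."""
    step = max_phase / (n_entries - 1)
    return [phase_to_distance(i * step) for i in range(n_entries)]


def lookup(lut: list[float], phase: float, max_phase: float = 2 * math.pi) -> float:
    """Run-time step: an index computation plus linear interpolation replaces the formula."""
    pos = phase / max_phase * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)  # clamp so i + 1 stays in range at phase == max_phase
    frac = pos - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac


lut = build_lut(4096)
print(abs(lookup(lut, 1.234) - phase_to_distance(1.234)))  # interpolation error, well below 1e-3 mm
```

A larger table lowers the interpolation error at the cost of memory, which mirrors the low-memory versus high-memory split described in the abstract.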

25 pages, 3654 KB  
Article
MDF2Former: Multi-Scale Dual-Domain Feature Fusion Transformer for Hyperspectral Image Classification of Bacteria in Murine Wounds
by Decheng Wu, Wendan Liu, Rui Li, Xudong Fu, Lin Tao, Yinli Tian, Anqiang Zhang, Zhen Wang and Hao Tang
J. Imaging 2026, 12(2), 90; https://doi.org/10.3390/jimaging12020090 - 19 Feb 2026
Viewed by 520
Abstract
Bacterial wound infection poses a major challenge in trauma care and can lead to severe complications such as sepsis and organ failure. Rapid and accurate identification of the pathogen, followed by targeted intervention, is therefore vital for improving treatment outcomes and reducing risks. However, current detection methods are still constrained by procedural complexity and long processing times. In this study, a hyperspectral imaging (HSI) acquisition system for bacterial analysis and a multi-scale dual-domain feature fusion transformer (MDF2Former) were developed for classifying wound bacteria. MDF2Former integrates three modules: a multi-scale feature enhancement and fusion module that generates tokens with multi-scale discriminative representations; a spatial–spectral dual-branch attention module that strengthens joint feature modeling; and a frequency and spatial–spectral domain encoding module that captures global and local interactions among tokens through a hierarchical stacking structure, enabling more efficient feature learning. Extensive experiments on our self-constructed HSI dataset of typical wound bacteria show that MDF2Former achieved the best performance among all compared models on five metrics: accuracy (91.94%), precision (92.26%), recall (91.94%), F1-score (92.01%), and kappa coefficient (90.73%). These results verify the effectiveness of combining HSI with deep learning for bacterial identification and highlight its potential to assist in identifying bacterial species and informing personalized treatment decisions for wound infections.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
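Of the five reported metrics, the kappa coefficient is the least self-explanatory: it discounts the agreement expected by chance. A minimal sketch of Cohen's kappa from its standard definition follows; it is not the authors' evaluation code, and the toy labels are invented for illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both label sequences pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


print(cohen_kappa([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5: 75% raw agreement, 50% expected by chance
```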

26 pages, 12208 KB  
Article
Classification of the Surrounding Rock Based on Image Processing Analysis and Transfer Learning
by Yanyun Fan, Jiaqi Zhu, Hua Luo, Yaxi Shen, Shuanglong Wang, Xiaoning Liu, Dong Li and Chuhan Deng
J. Imaging 2026, 12(2), 89; https://doi.org/10.3390/jimaging12020089 - 19 Feb 2026
Viewed by 653
Abstract
Standardized methods for classifying surrounding rock are currently lacking. Classification mainly relies on the subjective judgment of technicians, which leads to inconsistent evaluation results. This study investigates feature extraction and classification methods for surrounding rock images from a tunnel of the Central Yunnan Water Diversion Project, using image processing analysis and transfer learning. A large set of surrounding rock images and water conservancy tunnel data were collected, and the surrounding rock was classified with reasonable accuracy according to the relevant code and expert guidance. By introducing fractal theory, the complexity and irregularity of the spatial distribution of weak layers and joints on the rock surface are effectively characterized. Based on an analysis of changes in the fractal dimension, a classification method for surrounding rock based on fractal theory is proposed. By combining quantified parameters of surrounding rock images with strength data collected by rebound meters, an image-analysis-based method for correcting surrounding rock strength is proposed; it reduces the error that the uneven distribution of rock masses introduces into traditional rebound meter strength values. The corrected values give more accurate strength characteristics, which supports standardized classification of the surrounding rock. Finally, a transfer learning model for recognizing tunnel surrounding rock images is constructed to achieve rapid classification of tunnel surrounding rock. This research provides support for the standardized classification of tunnel surrounding rock.
(This article belongs to the Section Image and Video Processing)
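The abstract does not say how the fractal dimension is estimated. Box counting is a common estimator for binary crack or joint maps, so the sketch below assumes it: the input `grid` is a hypothetical binary mask of joint traces, and the dimension is the least-squares slope of log N(s) against log(1/s), where N(s) is the number of s × s boxes touching a foreground pixel.

```python
import math


def box_count(grid, size):
    """Number of size x size boxes containing at least one foreground pixel."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(grid[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count


def box_counting_dimension(grid, sizes):
    """Least-squares slope of log N(s) versus log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Sanity checks: a straight line has dimension 1, a filled area dimension 2.
line = [[1] * 64] + [[0] * 64 for _ in range(63)]
print(round(box_counting_dimension(line, [1, 2, 4, 8, 16]), 6))  # 1.0
```

Real joint traces fall between these two limits, which is what makes the dimension usable as a complexity feature.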

35 pages, 1423 KB  
Review
Analysis of Biological Images and Quantitative Monitoring Using Deep Learning and Computer Vision
by Aaron Gálvez-Salido, Francisca Robles, Rodrigo J. Gonçalves, Roberto de la Herrán, Carmelo Ruiz Rejón and Rafael Navajas-Pérez
J. Imaging 2026, 12(2), 88; https://doi.org/10.3390/jimaging12020088 - 18 Feb 2026
Cited by 1 | Viewed by 1308
Abstract
Automated biological counting is essential for scaling wildlife monitoring and biodiversity assessments, as manual processing currently limits analytical effort and scalability. This review evaluates the integration of deep learning and computer vision across diverse acquisition platforms, including camera traps, unmanned aerial vehicles (UAVs), and remote sensing. Methodological paradigms are examined, ranging from Convolutional Neural Networks (CNNs) and one-stage detectors such as You Only Look Once (YOLO) to recent transformer-based architectures and hybrid models. The literature shows that these methods consistently achieve high accuracy, often exceeding 95%, across various taxa, including insect pests, aquatic organisms, terrestrial vegetation, and forest ecosystems. However, persistent challenges such as object occlusion, cryptic species differentiation, and the scarcity of high-quality, labeled datasets continue to hinder fully automated workflows. We conclude that while automated counting has fundamentally increased data throughput, future advancements must focus on enhancing model generalization through self-supervised learning and improved data augmentation techniques. These developments are critical for transitioning from experimental models to robust, operational tools for global ecological monitoring and conservation efforts.

22 pages, 3511 KB  
Article
Automated Compactness Quantitative Metrics for Wrist Bone on Conventional Radiography in Rheumatoid Arthritis: A Clinical Evaluation Study
by Jiajing Zhou, Junmu Peng, Haolin Wang, Hiroshi Kataoka, Masaya Mukai, Tunlada Wiriyanukhroh and Tamotsu Kamishima
J. Imaging 2026, 12(2), 87; https://doi.org/10.3390/jimaging12020087 - 18 Feb 2026
Viewed by 553
Abstract
Rheumatoid arthritis (RA) frequently affects the joints of the hands, with joint space narrowing (JSN) representing an important early marker of structural damage. The semi-quantitative Sharp/van der Heijde (SvdH) scoring system is widely used in clinical practice but is inherently subjective and susceptible to observer variability. Moreover, the complex anatomy of the wrist and the substantial overlap of carpal bones pose challenges for automated quantitative assessment of wrist JSN on routine radiographs. This study aimed to introduce a novel quantitative assessment perspective and to clinically validate an automated, compactness-related quantification framework for evaluating wrist JSN in RA. Fifty-one patients with RA were initially enrolled. After one case with severe carpal fusion that precluded anatomical differentiation was excluded, 50 patients (44 females and 6 males) were included in the final analysis. The cohort had a mean age of 61 years (range: 21–82), a median symptom duration of 9 years (IQR: 1–32), and a median follow-up interval for bilateral hand radiographs of 1.06 years (IQR: 0.82–1.30). To quantify global wrist JSN, 10 compactness-related metrics were computed from the spatial distribution of bone centroids extracted from carpal segmentation masks. These metrics were validated against the wrist JSN subscore of the SvdH score (SvdH-JSN_wrist) and the total Sharp score (TSS) as gold standards. Several distance-based compactness metrics showed significant negative correlations with SvdH-JSN_wrist. Specifically, mean pairwise distance (MPD), root-mean-square radius (RMSR), and median radius (R50) showed moderate to strong correlations (r = −0.52 to −0.63, all p < 0.0001) that were consistent at baseline and follow-up. Correlations with TSS were weaker overall, with only R50 and its normalized form showing stable negative correlations (r = −0.40 to −0.43, p < 0.01).
Longitudinal analyses showed limited correlations between metric changes and clinical score changes. The proposed automated compactness quantification framework enables objective and reliable assessment of wrist JSN on standard radiographs and complements conventional scoring systems by supporting automated, standardized evaluation of RA-related wrist structural changes.
(This article belongs to the Section Medical Imaging)
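The three best-performing metrics are named after simple statistics of the carpal bone centroids. The sketch below computes them under the plain reading of their names (distances to the overall centroid for RMSR and R50, all centroid pairs for MPD); that reading is an assumption, and the paper may normalize or weight them differently, as its mention of a normalized R50 suggests.

```python
import math
from itertools import combinations
from statistics import median


def compactness_metrics(centroids):
    """centroids: list of (x, y) bone centroids, e.g. in pixels (hypothetical input format)."""
    n = len(centroids)
    cx = sum(x for x, _ in centroids) / n
    cy = sum(y for _, y in centroids) / n
    # Radii: distance of each bone centroid from the centroid of all bones.
    radii = [math.hypot(x - cx, y - cy) for x, y in centroids]
    pair_distances = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    return {
        "MPD": sum(pair_distances) / len(pair_distances),   # mean pairwise distance
        "RMSR": math.sqrt(sum(r * r for r in radii) / n),    # root-mean-square radius
        "R50": median(radii),                                # median radius
    }


print(compactness_metrics([(0, 0), (2, 0), (0, 2), (2, 2)]))
```

All three shrink as the bones crowd together, which is consistent with the negative correlations with SvdH-JSN_wrist reported above.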

17 pages, 8549 KB  
Article
Print Quality Assessment of QR Code Elements Achieved by the Digital Thermal Transfer Process
by Igor Majnarić, Marija Jelkić, Marko Morić and Krunoslav Hajdek
J. Imaging 2026, 12(2), 86; https://doi.org/10.3390/jimaging12020086 - 18 Feb 2026
Viewed by 827
Abstract
The new European Regulation (EU) 2025/40 includes provisions on modern packaging and packaging waste. It provides for the use of QR codes on packaging (items 71 and 161) and in personal documents, signaling a move away from linear barcodes. The QR code itself is precisely specified in ISO/IEC 18004:2024; however, how QR codes are to be implemented in printing systems is not specified, and this remains an important factor for their future application. Digital foil printing is a new hybrid printing process for applying information in high-precision applications such as QR codes, security printing, and packaging printing. It combines two printing techniques: drop-on-demand UV inkjet followed by thermal transfer of black foil. With a matte-coated printing substrate (Garda Matt, 300 g/m²), Konica Minolta KM1024 LHE inkjet head settings, and a transfer temperature of 100 °C, the size of the square printing elements in QR codes plays a decisive role in the quality of the decoded information. The aim of this work is to investigate how well the basic elements of the QR code image (the profile of the square elements and the accuracy with which a precisely defined surface is reproduced) can be realized when the thickness of the UV varnish coating (7, 14, and 21 µm), applied with the MGI JETvarnish 3DS digital machine, is varied. The most commonly used square elements were tested, with surface areas of 0.01 cm², 0.06 cm², 0.25 cm², 1 cm², 4 cm², and 16 cm². The results showed that print quality is uneven for the smallest elements (squares with side lengths of 0.1 cm and 0.25 cm). The effect is especially visible at the minimum UV varnish application of 7 μm (1 drop). Increasing the amount of UV varnish to a thickness of 14 μm (2 drops) or 21 μm (3 drops) yields a significantly more stable, even reproduction of the achromatic image.
The highest technical precision was achieved with a UV varnish thickness of 21 μm.
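The tested element areas and the stated side lengths are linked by simple arithmetic. The check below assumes that every element is a square and that the 0.06 cm² entry is 0.25 cm × 0.25 cm = 0.0625 cm² rounded. It also assumes that the varnish thickness scales at 7 µm per drop, as the 1/2/3-drop figures imply. The side lengths of 0.5, 2, and 4 cm are inferred rather than stated.

```python
# Nominal square side lengths (cm); only 0.1 and 0.25 cm are stated explicitly,
# the others are the square roots of the listed areas.
side_lengths_cm = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
areas_cm2 = [side ** 2 for side in side_lengths_cm]

# UV varnish thickness grows by 7 µm per deposited drop (1, 2, or 3 drops).
UM_PER_DROP = 7
thickness_um = {drops: drops * UM_PER_DROP for drops in (1, 2, 3)}

print([round(a, 2) for a in areas_cm2])  # [0.01, 0.06, 0.25, 1.0, 4.0, 16.0]
print(thickness_um)                      # {1: 7, 2: 14, 3: 21}
```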

20 pages, 8389 KB  
Article
SREF: Semantics-Refined Feature Extraction for Long-Term Visual Localization
by Danfeng Wu, Kaifeng Zhu, Heng Shi, Fenfen Zhou and Minchi Kuang
J. Imaging 2026, 12(2), 85; https://doi.org/10.3390/jimaging12020085 - 18 Feb 2026
Viewed by 749
Abstract
Accurate and robust visual localization under changing environments remains a fundamental challenge in autonomous driving and mobile robotics. Traditional handcrafted features often degrade under long-term illumination and viewpoint variations, while recent CNN-based methods, although more robust, typically rely on coarse semantic cues and remain vulnerable to dynamic objects. In this paper, we propose a fine-grained semantics-guided feature extraction framework that adaptively selects stable keypoints while suppressing dynamic disturbances. A fine-grained semantic refinement module subdivides coarse semantic categories into stability-homogeneous sub-classes, and a dual-attention mechanism enhances local repeatability and semantic consistency. By integrating physical priors with self-supervised clustering, the proposed framework learns discriminative and reliable feature representations. Extensive experiments on the Aachen and RobotCar-Seasons benchmarks demonstrate that the proposed approach achieves state-of-the-art accuracy and robustness while maintaining real-time efficiency, effectively bridging coarse semantic guidance with fine-grained stability estimation. Quantitatively, our method achieves strong localization performance on Aachen (up to 88.1% at night under the (0.2°, 0.25 m) threshold) and on RobotCar-Seasons (up to 57.2% for day and 28.4% for night under the same threshold), demonstrating improved robustness to seasonal and illumination changes.
(This article belongs to the Section Computer Vision and Pattern Recognition)
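Percentages such as "88.1% under the (0.2°, 0.25 m) threshold" follow the usual long-term localization protocol: the share of query images whose rotation and translation errors both fall within the threshold. A minimal sketch, assuming per-query errors have already been computed against ground-truth poses (that step is out of scope here):

```python
def recall_at_threshold(errors, max_rot_deg, max_trans_m):
    """Percentage of queries with rotation error <= max_rot_deg AND translation error <= max_trans_m.

    errors: list of (rotation_error_deg, translation_error_m), one pair per query image.
    """
    hits = sum(1 for rot, trans in errors if rot <= max_rot_deg and trans <= max_trans_m)
    return 100.0 * hits / len(errors)


errors = [(0.1, 0.1), (0.3, 0.1), (0.1, 0.3), (0.2, 0.25)]
print(recall_at_threshold(errors, 0.2, 0.25))  # 50.0: queries 1 and 4 pass both bounds
```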

22 pages, 20177 KB  
Article
LEGS: Visual Localization Enhanced by 3D Gaussian Splatting
by Daewoon Kim and I-gil Kim
J. Imaging 2026, 12(2), 84; https://doi.org/10.3390/jimaging12020084 - 16 Feb 2026
Viewed by 935
Abstract
Accurate six-degree-of-freedom (6-DoF) visual localization is a fundamental component of modern mapping and navigation. While recent data-centric approaches have leveraged Novel View Synthesis (NVS) to augment training datasets, these methods typically rely on uniform grid-based sampling of virtual cameras. Such naive placement often yields redundant or weakly informative views and fails to bridge the gap between sparse, unordered captures and dense scene geometry. To address these challenges, we present LEGS (Visual Localization Enhanced by 3D Gaussian Splatting), a trajectory-agnostic synthetic-view augmentation framework. LEGS constructs a joint set of 6-DoF camera pose proposals by integrating a coarse 3D lattice with the Structure-from-Motion (SfM) camera graph, followed by a visibility-aware, coverage-driven selection strategy. By utilizing 3D Gaussian Splatting (3DGS), our framework enables high-throughput, scene-specific synthesis within practical computational budgets. Experiments on standard benchmarks and an in-house dataset demonstrate that LEGS consistently improves pose accuracy and robustness, particularly in scenarios characterized by sparse sampling and co-located viewpoints.
(This article belongs to the Section Computer Vision and Pattern Recognition)
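The abstract names a "visibility-aware, coverage-driven selection strategy" without giving its criterion. As one plausible illustration only, the sketch below swaps in greedy maximum coverage: each candidate pose is a hypothetical ID mapped to the set of SfM points it sees, and poses are picked by how many not-yet-covered points they add. This is not the LEGS algorithm.

```python
def greedy_coverage(candidates, budget):
    """Greedy max-coverage selection of virtual camera poses.

    candidates: dict mapping a (hypothetical) pose id to the set of visible point ids.
    budget: maximum number of poses to keep.
    """
    covered, chosen = set(), []
    remaining = dict(candidates)
    for _ in range(budget):
        # Pick the pose that reveals the most points not yet covered.
        best = max(remaining, key=lambda k: len(remaining[k] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            break  # nothing new to gain: skip redundant views
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


poses = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
print(greedy_coverage(poses, budget=5))  # (['c', 'a'], {1, 2, 3, 4, 5, 6, 7})
```

The early stop is what keeps redundant views out, which is the weakness the abstract attributes to uniform grid sampling.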

27 pages, 7440 KB  
Article
3D Road Defect Mapping via Differentiable Neural Rendering and Multi-Frame Semantic Fusion in Bird’s-Eye-View Space
by Hongjia Xing and Feng Yang
J. Imaging 2026, 12(2), 83; https://doi.org/10.3390/jimaging12020083 - 15 Feb 2026
Viewed by 566
Abstract
Road defect detection is essential for traffic safety and infrastructure maintenance. Existing automated methods based on 2D image analysis lack spatial context and cannot provide the accurate 3D localization required for maintenance planning. We propose a novel framework for road defect mapping from monocular video sequences that integrates a differentiable Bird’s-Eye-View (BEV) mesh representation, semantic filtering, and multi-frame temporal fusion. Our differentiable mesh-based BEV representation enables efficient scene reconstruction from sparse observations through MLP-based optimization. The semantic filtering strategy leverages road surface segmentation to eliminate off-road false positives, reducing detection errors by 33.7%. Multi-frame fusion with ray-casting projection and an exponential moving average update accumulates defect observations across frames while maintaining 3D geometric consistency. Experimental results demonstrate that our framework produces geometrically consistent BEV defect maps with higher accuracy than single-frame 2D methods, effectively handling occlusions, motion blur, and varying illumination conditions.
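The exponential moving average update can be sketched in a few lines. The sketch assumes that ray casting has already mapped each frame's detections to BEV cell indices. The dict-of-cells grid and the smoothing factor `alpha = 0.3` are hypothetical simplifications of the paper's mesh representation.

```python
def ema_update(grid, observations, alpha=0.3):
    """Fuse one frame of per-cell defect scores into a running BEV map.

    grid: dict mapping a (hypothetical) BEV cell index to its fused defect score.
    observations: dict mapping cell index to this frame's score, after ray casting.
    """
    for cell, score in observations.items():
        prev = grid.get(cell)
        # First sighting initializes the cell; later ones blend with weight alpha.
        grid[cell] = score if prev is None else (1 - alpha) * prev + alpha * score
    return grid


bev = {}
ema_update(bev, {(0, 0): 1.0})
ema_update(bev, {(0, 0): 0.0, (1, 0): 0.5})
print(bev)  # (0, 0) decays to about 0.7 after a single miss; (1, 0) starts at 0.5
```

A single spurious detection or a single occluded frame therefore moves a cell only partway, which is the smoothing behavior that makes temporal fusion robust.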

18 pages, 781 KB  
Review
Research Progress on the Application of Radiomics and Deep Learning in Liver Fibrosis
by Yi Dang, Wenjing Li, Zhao Liu and Junqiang Lei
J. Imaging 2026, 12(2), 82; https://doi.org/10.3390/jimaging12020082 - 15 Feb 2026
Viewed by 1133
Abstract
Liver fibrosis (LF) represents a crucial intermediate stage in the pathological progression from chronic liver disease to cirrhosis and hepatocellular carcinoma. Early and accurate diagnosis is vital for timely intervention and improved prognosis. Traditional liver biopsy, long regarded as the diagnostic gold standard, has notable limitations such as invasiveness, sampling error, and inter-observer variability. With the rapid progress of artificial intelligence (AI), radiomics and deep learning (DL) have emerged as non-invasive diagnostic tools with significant potential for evaluating LF. This review summarizes the latest advances in radiomics and DL for LF diagnosis, staging, prognosis prediction, and etiological differentiation. It also analyzes the value of imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound in this field. Despite ongoing challenges in model generalization and standardization, continued advances in radiomics and DL, supported by improved model interpretability, technological integration, and multimodal fusion, hold promise for AI-driven imaging analysis strategies. These approaches aim to integrate multiple clinical monitoring methods, overcome obstacles in the early diagnosis and treatment of LF, and provide new perspectives for precision medicine in this disease.
(This article belongs to the Section Medical Imaging)

42 pages, 11792 KB  
Article
Automatic Childhood Pneumonia Diagnosis Based on Multi-Model Feature Fusion Using Chi-Square Feature Selection
by Amira Ouerhani, Tareq Hadidi, Hanene Sahli and Halima Mahjoubi
J. Imaging 2026, 12(2), 81; https://doi.org/10.3390/jimaging12020081 - 14 Feb 2026
Viewed by 571
Abstract
Pneumonia is one of the leading causes of child mortality, and chest radiography (CXR) is essential for its diagnosis. However, the low radiation exposure used in pediatric imaging complicates the accurate detection of pneumonia, making traditional examination ineffective. Convolutional neural networks (CNNs) have considerably improved performance in medical imaging and are widely recognized for their effectiveness. This paper proposes an accurate pneumonia detection method that combines several deep CNN architectures through optimal feature fusion. Enhanced VGG-19, ResNet-50, and MobileNet-V2 models are trained on the most widely used pneumonia dataset with appropriate transfer learning and fine-tuning strategies. To create an effective feature input, the chi-square technique removes irrelevant features from each enhanced CNN. The resulting subsets are then fused horizontally to generate a more diverse and robust feature representation for binary classification. By combining the 1000 best features from the VGG-19 and MobileNet-V2 models, the proposed approach achieves the best accuracy (97.59%), recall (98.33%), and F1-score (98.19%) on the test set with a supervised support vector machine (SVM) classifier. The results demonstrate that our approach significantly improves performance over previous studies that used various ensemble fusion techniques, while remaining computationally efficient. We expect this fused-feature system to significantly aid the timely detection of childhood pneumonia, especially in resource-constrained healthcare systems.
(This article belongs to the Section Medical Imaging)
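The selection-then-fusion step can be illustrated without any deep learning library. The sketch scores non-negative features (for example, post-ReLU CNN activations) with the chi-square statistic that scikit-learn's `chi2` computes, keeps the top `k` per model, and concatenates the kept columns per sample. The toy matrices are invented, and the SVM stage is omitted. In practice `SelectKBest(chi2, k=...)` from scikit-learn would be the usual tool; this dependency-free version is only a sketch of the idea.

```python
def chi2_scores(X, y):
    """Chi-square score per feature for non-negative X (rows = samples), labels y."""
    n_features = len(X[0])
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = [0.0] * n_features
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        p_c = len(rows) / len(X)
        for j in range(n_features):
            observed = sum(row[j] for row in rows)
            expected = p_c * totals[j]  # feature mass expected if independent of class
            if expected > 0:
                scores[j] += (observed - expected) ** 2 / expected
    return scores


def select_top_k(X, y, k):
    """Column indices of the k highest-scoring features, in original order."""
    scores = chi2_scores(X, y)
    return sorted(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k])


def fuse(feature_sets, index_sets):
    """Horizontal fusion: concatenate each model's selected columns, sample by sample."""
    fused = []
    for per_model_rows in zip(*feature_sets):
        row = []
        for feats, idx in zip(per_model_rows, index_sets):
            row.extend(feats[j] for j in idx)
        fused.append(row)
    return fused


X = [[1, 0, 5], [2, 0, 5], [0, 3, 5], [0, 4, 5]]  # model A features (toy data)
y = [0, 0, 1, 1]
print(chi2_scores(X, y))  # [3.0, 7.0, 0.0]: the constant third column carries no class information
```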

20 pages, 2405 KB  
Article
Confidence-Guided Adaptive Diffusion Network for Medical Image Classification
by Yang Yan, Zhuo Xie and Wenbo Huang
J. Imaging 2026, 12(2), 80; https://doi.org/10.3390/jimaging12020080 - 14 Feb 2026
Viewed by 536
Abstract
Medical image classification is a fundamental task in medical image analysis and underpins a wide range of clinical applications, including dermatological screening, retinal disease assessment, and malignant tissue detection. In recent years, diffusion models have demonstrated promising potential for medical image classification owing to their strong representation learning capability. However, existing diffusion-based classification methods often rely on oversimplified prior modeling strategies, which fail to adequately capture the intrinsic multi-scale semantic information and contextual dependencies inherent in medical images. As a result, the discriminative power and stability of feature representations are constrained in complex scenarios. In addition, fixed noise injection strategies neglect variations in sample-level prediction confidence, leading to uniform perturbations being imposed on samples with different levels of semantic reliability during the diffusion process, which in turn limits the model’s discriminative performance and generalization ability. To address these challenges, this paper proposes a Confidence-Guided Adaptive Diffusion Network (CGAD-Net) for medical image classification. Specifically, a hybrid prior modeling framework is introduced, consisting of a Hierarchical Pyramid Context Modeling (HPCM) module and an Intra-Scale Dilated Convolution Refinement (IDCR) module. These two components jointly enable the diffusion-based feature modeling process to effectively capture fine-grained structural details and global contextual semantic information. Furthermore, a Confidence-Guided Adaptive Noise Injection (CG-ANI) strategy is designed to dynamically regulate noise intensity during the diffusion process according to sample-level prediction confidence. 
Without altering the underlying discriminative objective, CG-ANI stabilizes model training and enhances robust representation learning for semantically ambiguous samples. Experimental results on multiple public medical image classification benchmarks, including HAM10000, APTOS2019, and Chaoyang, demonstrate that CGAD-Net achieves competitive performance in terms of classification accuracy, robustness, and training stability. These results validate the effectiveness and application potential of confidence-guided diffusion modeling for two-dimensional medical image classification tasks and provide valuable insights for further research on diffusion models in medical image analysis.
(This article belongs to the Section Medical Imaging)

22 pages, 2046 KB  
Article
Progressive Upsampling Generative Adversarial Network with Collaborative Attention for Single-Image Super-Resolution
by Haoxiang Lu, Jing Zhang, Mengyuan Jing, Ziming Wang and Wenhao Wang
J. Imaging 2026, 12(2), 79; https://doi.org/10.3390/jimaging12020079 - 11 Feb 2026
Viewed by 519
Abstract
Single-image super-resolution (SISR) is an essential low-level vision task that aims to produce high-resolution images from low-resolution inputs. However, most existing SISR methods rely heavily on ideal degradation kernels and rarely consider the actual noise distribution. To tackle these issues, this paper presents PUGAN, a progressive upsampling generative adversarial network with a collaborative attention mechanism. Specifically, residual multiscale blocks (RMBs) built on stacked mixed-pooling multiscale structures (MPMSs) are designed to make full use of multiscale global–local hierarchical features, and a frequency collaborative attention mechanism (CAM) is used to fully exploit high- and low-frequency characteristics. Meanwhile, we design a progressive upsampling strategy that better guides the model’s learning while reducing model complexity. Finally, a discriminator evaluates the reconstructed high-resolution images to balance super-resolution reconstruction and detail enhancement. On the NTIRE 2020, Urban100, and B100 datasets, PUGAN yields PSNR/SSIM/LPIPS values of 33.987/0.9673/0.1210, 32.966/0.9483/0.1431, and 33.627/0.9546/0.1354, respectively, for a scale factor of ×2, and 26.349/0.8721/0.1975, 26.110/0.8614/0.1983, and 26.306/0.8803/0.1978, respectively, for a scale factor of ×4. Extensive experiments demonstrate that PUGAN outperforms state-of-the-art SISR methods in qualitative and quantitative assessments. PUGAN also shows potential benefits for pathological image super-resolution.
(This article belongs to the Section Image and Video Processing)
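PSNR, the first of the three reported metrics, is defined as 10·log10(MAX²/MSE). The sketch below is a generic definition for 8-bit images given as flat pixel lists (MAX = 255, an assumption); it is not the authors' evaluation code, which may, for example, crop borders or evaluate only the luminance channel.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, an off-by-one error on every pixel gives MSE = 1 and therefore PSNR = 20·log10(255), roughly 48.13 dB.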

15 pages, 3953 KB  
Article
Age Prediction of Hematoma from Hyperspectral Images Using Convolutional Neural Networks
by Arash Keshavarz, Gerald Bieber, Daniel Wulff, Carsten Babian and Stefan Lüdtke
J. Imaging 2026, 12(2), 78; https://doi.org/10.3390/jimaging12020078 - 11 Feb 2026
Viewed by 798
Abstract
Accurate estimation of hematoma age remains a major challenge in forensic practice, as current assessments rely heavily on subjective visual interpretation. Hyperspectral imaging (HSI) captures rich spectral signatures that may reflect the biochemical evolution of hematomas over time. This study evaluates whether a convolutional neural network (CNN) that integrates both spectral and spatial information improves the accuracy of hematoma age estimation. Additionally, we investigate whether performance can be maintained using a reduced, physiologically motivated subset of wavelengths. Using a dataset of forearm hematomas from 25 participants, we applied radiometric normalization and SAM-based segmentation to extract 64×64×204 hyperspectral patches. In leave-one-subject-out cross-validation, the CNN outperformed a spectral-only Lasso baseline, reducing the mean absolute error (MAE) from 3.24 days to 2.29 days. Band-importance analysis combining SmoothGrad and occlusion sensitivity identified 20 highly informative wavelengths; using only these bands matched or exceeded the accuracy of the full 204-band model across early, middle, and late hematoma stages. These results demonstrate that spectral–spatial modeling and physiologically grounded band selection can improve estimation accuracy while substantially reducing data dimensionality. This approach supports the development of compact multispectral systems for objective clinical and forensic evaluation.
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
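In leave-one-subject-out cross-validation, all patches from one participant are held out together, so the model is never tested on a person it saw during training. A minimal sketch of the split and the MAE metric, assuming each sample is a hypothetical (subject_id, features, age_in_days) tuple; the CNN is replaced by a trivial mean-age predictor purely for illustration, and per-subject MAEs are averaged (pooling all test patches is an equally plausible convention).

```python
from collections import defaultdict


def leave_one_subject_out(samples):
    """Yield (subject, train, test) splits; each sample is a (subject_id, features, age_days) tuple."""
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample[0]].append(sample)
    for subject in sorted(by_subject):
        test = by_subject[subject]  # every patch of this participant is held out together
        train = [s for s in samples if s[0] != subject]
        yield subject, train, test


def mean_absolute_error(y_true, y_pred):
    """MAE in the same unit as the labels (days here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def loso_mae_of_mean_predictor(samples):
    """Hypothetical baseline: predict the mean training age for every held-out patch."""
    errors = []
    for _, train, test in leave_one_subject_out(samples):
        mean_age = sum(s[2] for s in train) / len(train)
        errors.append(mean_absolute_error([s[2] for s in test], [mean_age] * len(test)))
    return sum(errors) / len(errors)  # average of per-subject MAEs
```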

1 pages, 162 KB  
Correction
Correction: Jiang et al. Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction. J. Imaging 2026, 12, 19
by Zhongmin Jiang, Zhen Wang, Wenju Wang and Jifan Zhu
J. Imaging 2026, 12(2), 77; https://doi.org/10.3390/jimaging12020077 - 11 Feb 2026
Viewed by 336
Abstract
There were two errors in the original publication [...]
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
14 pages, 1705 KB  
Article
A Multiphase CT-Based Integrated Deep Learning Framework for Rectal Cancer Detection, Segmentation, and Staging: Performance Comparison with Radiologist Assessment
by Tzu-Hsueh Tsai, Jia-Hui Lin, Yen-Te Liu, Jhing-Fa Wang, Chien-Hung Lee and Chiao-Yun Chen
J. Imaging 2026, 12(2), 76; https://doi.org/10.3390/jimaging12020076 - 10 Feb 2026
Viewed by 669
Abstract
Accurate staging of rectal cancer is crucial for treatment planning; however, computed tomography (CT) interpretation remains challenging and highly dependent on radiologist expertise. This study aimed to develop and evaluate an AI-assisted system for rectal cancer detection and staging using CT images. The proposed framework integrates three components: a convolutional neural network (RCD-CNN) for lesion detection, a U-Net model for rectal contour delineation and tumor localization, and a 3D convolutional network (RCS-3DCNN) for staging prediction. CT scans from 223 rectal cancer patients at Kaohsiung Medical University Chung-Ho Memorial Hospital, including both non-contrast and contrast-enhanced studies, were retrospectively analyzed. RCD-CNN achieved an accuracy of 0.976, a recall of 0.975, and a precision of 0.976. U-Net yielded Dice scores of 0.897 for rectal contours and 0.856 for tumor localization. Radiologist-based clinical staging had 82.6% concordance with pathology, while AI-based staging achieved 80.4%. McNemar's test showed no significant difference between AI and radiologist staging (p = 1.0). The proposed AI-assisted system achieved staging accuracy comparable to that of radiologists and demonstrated feasibility as a decision-support tool in rectal cancer management. This study introduces a novel three-stage, dual-phase CT-based AI framework that integrates lesion detection, segmentation, and staging within a unified workflow.
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
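McNemar's test compares two readers on the same patients using only the discordant cases, i.e., those where exactly one of them matches pathology. A minimal sketch of the exact (binomial) variant; whether the study used the exact or the chi-square form is not stated, and the abstract does not report the discordant counts, so any values passed in are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases the radiologist staged correctly but the AI did not
    c: cases the AI staged correctly but the radiologist did not
    Cases where both agree (both right or both wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 each discordant case is equally likely to fall in b or c: Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Balanced discordant counts (b equal to c) give p = 1.0, which is one way a result like the one reported above can arise.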

22 pages, 4477 KB  
Article
Robust Detection and Localization of Image Copy-Move Forgery Using Multi-Feature Fusion
by Kaiqi Lu and Qiuyu Zhang
J. Imaging 2026, 12(2), 75; https://doi.org/10.3390/jimaging12020075 - 10 Feb 2026
Viewed by 796
Abstract
Copy-move forgery detection (CMFD) is a crucial image forensics technique, and the rapid development of deep learning has led to impressive advances in CMFD. However, existing models suffer from two key limitations. First, their feature fusion modules insufficiently exploit the complementary nature of features from the RGB and noise domains, resulting in suboptimal feature representations. Second, during decoding they simply classify pixels as authentic or forged, without aggregating cross-layer information or integrating local and global attention mechanisms, which leads to unsatisfactory detection precision. To overcome these limitations, we propose a robust approach to detecting and localizing image copy-move forgery using multi-feature fusion. First, a Multi-Feature Fusion Network (MFFNet) is designed whose feature fusion module fuses features from the RGB and noise domains so that their distinct characteristics complement each other, yielding richer feature information. Then, a Lightweight Multi-layer Perceptron Decoder (LMPD) is developed for image reconstruction and forgery localization map generation. Finally, by aggregating information from different layers and combining local and global attention mechanisms, more accurate prediction masks are obtained. Experimental results demonstrate that MFFNet exhibits greater robustness and superior detection and localization performance compared with existing methods under JPEG compression, noise addition, and resizing.
(This article belongs to the Section Image and Video Processing)

20 pages, 3102 KB  
Article
LDFSAM: Localization Distillation-Enhanced Feature Prompting SAM for Medical Image Segmentation
by Xuanbo Zhao, Cheng Wang, Huaxing Xu, Hong Zhou, Zekuan Yu, Tao Chen, Xiaoling Wei and Rongjun Zhang
J. Imaging 2026, 12(2), 74; https://doi.org/10.3390/jimaging12020074 - 10 Feb 2026
Viewed by 849
Abstract
Standard SAM-based approaches in medical imaging typically rely on explicit geometric prompts, such as bounding boxes or points. However, these rigid spatial constraints are often insufficient for capturing the complex, deformable boundaries of medical structures, where localization noise easily propagates into segmentation errors. To overcome this, we propose the Localization Distillation-Enhanced Feature Prompting SAM (LDFSAM), a novel framework that shifts from discrete coordinate inputs to a latent feature prompting paradigm. A lightweight prompt generator, refined via Localization Distillation (LD), injects multi-scale features into the SAM decoder as complementary Dense Feature Prompts (DFPs) and Sparse Feature Prompts (SFPs), guiding segmentation without explicit box constraints. Extensive experiments on four public benchmarks (3D CBCT Tooth, ISIC 2018, MMOTU, and Kvasir-SEG) demonstrate that LDFSAM outperforms both prior SAM-based baselines and conventional networks, achieving Dice scores exceeding 0.91. Further validation on an in-house cohort demonstrates its robust generalization. The gains are particularly strong in low-data regimes, making LDFSAM a reliable solution for automated medical image segmentation.
(This article belongs to the Section Medical Imaging)
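The Dice score used to report LDFSAM's results measures overlap between a predicted and a reference mask as 2|A∩B|/(|A|+|B|). A minimal sketch for binary masks given as flat 0/1 sequences; treating two empty masks as a perfect match (Dice = 1) is a convention assumed here, not something the abstract specifies.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total  # empty vs empty: assumed perfect
```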

18 pages, 11220 KB  
Article
Assessing Impact of Data Quality in Early Post-Operative Glioblastoma Segmentation
by Ragnhild Holden Helland, David Bouget, Asgeir Store Jakola, Sébastien Muller, Ole Solheim and Ingerid Reinertsen
J. Imaging 2026, 12(2), 73; https://doi.org/10.3390/jimaging12020073 - 10 Feb 2026
Viewed by 544
Abstract
Quantification of the residual tumor from early post-operative magnetic resonance imaging (MRI) is essential in follow-up and treatment planning for glioblastoma patients. Residual tumor segmentation from early post-operative MRI is particularly challenging compared with the closely related task of pre-operative segmentation, as the tumor lesions are small, fragmented, and easily confounded with noise in the resection cavity. Several recent studies have successfully trained deep learning models for early post-operative segmentation, yet their performance remains below that achieved on the analogous pre-operative task. In this study, we assess the impact of image and annotation quality on model training and performance in early post-operative glioblastoma segmentation. A dataset of early post-operative MRI scans from 423 patients at two hospitals in Norway and Sweden was assembled, and its image and annotation quality were evaluated by expert neurosurgeons. The Attention U-Net architecture was trained with five-fold cross-validation on different quality-based subsets of the dataset to evaluate the impact of training data quality on model performance. Including low-quality images in the training set did not degrade performance on high-quality images. However, models trained exclusively on high-quality images did not generalize to low-quality images. Models trained exclusively on high-quality annotations reached the same performance level as models trained on the entire dataset while using only two-thirds of the data. Both image and annotation quality had a significant impact on model performance. In dataset curation, images should ideally represent the quality variations of the real-world clinical scenario, and efforts should be made to ensure accurate, high-quality ground-truth annotations.
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)

30 pages, 4931 KB  
Article
GreenViT: A Vision Transformer with Single-Path Progressive Upsampling for Urban Green-Space Segmentation and Auditable Area Estimation
by Ziqiang Xu, Young Choi, Changyong Yi, Chanjeong Park, Jinyoung Park, Hyungkeun Park and Sujeen Song
J. Imaging 2026, 12(2), 72; https://doi.org/10.3390/jimaging12020072 - 10 Feb 2026
Viewed by 549
Abstract
Urban green-space monitoring in dense cityscapes remains limited by accuracy–efficiency trade-offs and the absence of integrated, auditable area estimation. We introduce GreenViT, a Vision Transformer (ViT)-based framework for precise segmentation and transparent quantification of urban green space. GreenViT couples a ViT-L/14 backbone with a lightweight, single-path progressive upsampling decoder (Green Head), preserving global context while recovering thin structures. Experiments were conducted on a manually annotated dataset of 20 high-resolution satellite images collected from Satellites.Pro, covering five land-cover classes (background, green space, building, road, and water). Using a 224 × 224 sliding-window sampling scheme, the 20 images yield 62,650 training/validation patches. Under five-fold evaluation, GreenViT attains 0.9200 ± 0.0243 mIoU, 0.9580 ± 0.0135 Dice, and 0.9570 pixel accuracy (PA), and its calibrated area estimator achieves a relative area error of 1.10%. Overall, GreenViT strikes a strong balance between accuracy and efficiency and is particularly well suited to thin or boundary-rich classes. It can support planning evaluations, green-space statistics, urban renewal assessments, and ecological red-line verification, while providing reliable green-area metrics for urban heat mitigation and pollution control, which makes it well suited to decision-oriented long-term monitoring and management assessment.
(This article belongs to the Section AI in Imaging)
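Two mechanical steps in this pipeline are tiling large satellite images into 224 × 224 windows and converting a predicted mask into an area. The sketch below illustrates both under stated assumptions: the stride, the class id for green space (1, following the listed class order), and the ground sampling distance are all hypothetical, and the paper's "calibrated" estimator is not reproduced here.

```python
def window_origins(height, width, size=224, stride=224):
    """Top-left corners of size x size windows covering an image.

    The last row/column of windows is shifted inward so every pixel is covered.
    Images smaller than one window return a single origin and must be padded by the caller.
    """
    def starts(length):
        if length <= size:
            return [0]
        s = list(range(0, length - size + 1, stride))
        if s[-1] != length - size:
            s.append(length - size)
        return s

    return [(y, x) for y in starts(height) for x in starts(width)]


def green_area_m2(mask, gsd_m, green_class=1):
    """Area of pixels labelled green space, given ground sampling distance in metres per pixel."""
    return sum(row.count(green_class) for row in mask) * gsd_m ** 2


def relative_area_error(estimated, reference):
    """Relative area error in percent."""
    return abs(estimated - reference) / reference * 100.0
```

A smaller stride produces overlapping windows and more patches per image, which is one common way to enlarge a training set drawn from few source images.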

17 pages, 3773 KB  
Article
Relationship Between Display Pixel Structure and Gloss Perception
by Kosei Aketagawa, Midori Tanaka and Takahiko Horiuchi
J. Imaging 2026, 12(2), 71; https://doi.org/10.3390/jimaging12020071 - 9 Feb 2026
Viewed by 409
Abstract
Demand for accurate reproduction of gloss perception, which contributes significantly to how objects are perceived and evaluated, is increasing owing to recent advances in display technology that enable high-definition visual reproduction. This study experimentally analyzes the influence of display pixel structure on gloss perception. In a visual evaluation experiment using natural images, gloss perception was assessed across six types of stimuli: three subpixel arrays (RGB, RGBW, and PenTile RGBG) combined with two pixel–aperture ratios (100% and 50%). The results statistically confirmed that, regardless of the pixel–aperture ratio, the RGB subpixel array was perceived as exhibiting the strongest gloss. Furthermore, cluster analysis of observers revealed individual differences in the effect of pixel structure on gloss perception. Gloss classification and image feature analysis also suggested that the magnitude of the pixel-structure effect depends on the frequency components contained in the images. Moreover, analysis using a generalized linear mixed model supported the superiority of the RGB subpixel array even when accounting for variability across observers and natural images.

19 pages, 2617 KB  
Article
Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering
by Natthaphong Suthamno and Jessada Tanthanuch
J. Imaging 2026, 12(2), 70; https://doi.org/10.3390/jimaging12020070 - 9 Feb 2026
Viewed by 542
Abstract
This study proposes a topic-modeling guided framework that enhances image classification by introducing semantic clustering prior to CNN training. Images are processed through two keypoint extraction pipelines, Scale-Invariant Feature Transform (SIFT) with Sobel edge detection and Block Gabor Filtering (BGF), to obtain local feature descriptors. These descriptors are clustered using K-means to build a visual vocabulary, and Bag-of-Words histograms then represent each image as a visual document. Latent Dirichlet Allocation (LDA) is applied to uncover latent semantic topics, generating coherent image clusters. Cluster-specific CNN models, including AlexNet, GoogLeNet, and several ResNet variants, are trained under identical conditions to identify the most suitable architecture for each cluster. Two topic-guided integration strategies, Maximum Proportion Topic (MPT) and Weight Proportion Topic (WPT), are then used to assign test images to the corresponding specialized models. Experimental results show that both the SIFT-based and BGF-based pipelines outperform non-clustered CNN models and a baseline method using Incremental PCA, K-means, Same-Cluster Prediction, and unweighted Ensemble Voting. The SIFT pipeline achieves the highest accuracy, 95.24%, with the MPT strategy, while the BGF pipeline achieves 93.76% with the WPT strategy. These findings confirm that the semantic structure introduced through topic modeling substantially improves CNN classification performance.
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
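The abstract names the MPT and WPT strategies without defining them. One plausible reading, assumed here rather than taken from the paper: MPT routes a test image to the cluster model of its dominant LDA topic, while WPT averages every cluster model's class probabilities, weighted by the image's topic proportions. Both function names are hypothetical.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def mpt_predict(topic_props, model_probs):
    """Maximum Proportion Topic (assumed reading): use only the dominant topic's model."""
    return argmax(model_probs[argmax(topic_props)])


def wpt_predict(topic_props, model_probs):
    """Weight Proportion Topic (assumed reading): topic-weighted average of all models' probabilities.

    topic_props: LDA topic proportions of the test image, one per cluster
    model_probs: class-probability vector from each cluster-specific CNN
    """
    n_classes = len(model_probs[0])
    combined = [sum(w * probs[c] for w, probs in zip(topic_props, model_probs))
                for c in range(n_classes)]
    return argmax(combined)
```

With topic proportions [0.6, 0.4] and model outputs [[0.55, 0.45], [0.0, 1.0]], MPT trusts only the first model and predicts class 0, whereas WPT lets the confident second model tip the decision to class 1.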

27 pages, 18987 KB  
Article
YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery
by Qi Mi, Jianshu Chao, Anqi Chen, Kaiyuan Zhang and Jiahua Lai
J. Imaging 2026, 12(2), 69; https://doi.org/10.3390/jimaging12020069 - 6 Feb 2026
Cited by 4 | Viewed by 2472
Abstract
Unmanned aerial vehicles (UAVs) are now widely used in applications such as agriculture, urban traffic management, and search and rescue operations. However, several challenges arise: objects are small and occupy only a few pixels, aerial footage has complex backgrounds, and onboard computational resources are limited. To address these issues, this paper proposes YOLO11s-UAV, an improved small object detection algorithm designed specifically for UAV aerial imagery. First, we introduce a novel feature pyramid network (FPN), the Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small-object feature detection while reducing redundant network structures. Second, we apply a new downsampling convolution for small-object feature extraction, Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model's backbone. This module effectively eliminates the information loss caused by strided convolution or pooling and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the model. This module not only reduces model complexity but also efficiently enhances small-object features in complex scenarios. Experimental results on the VisDrone-DET2019 dataset show that, compared with the baseline YOLO11s, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (reaching 37.3%), while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% improvement on UAVDT-DET. Deployed on the NVIDIA Jetson Orin NX SUPER platform, our model runs at 33 FPS; although this is 21.4% lower than YOLO11s, it confirms the model's feasibility for real-time onboard UAV applications.
(This article belongs to the Section Computer Vision and Pattern Recognition)
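SimAM is parameter-free because each activation's weight is computed from its channel's own statistics: activations that deviate strongly from the channel mean (low "energy") receive larger sigmoid weights. A sketch of the standard SimAM weighting for a single channel, written in plain Python for clarity with the commonly used λ = 1e-4; how FlexSimAM combines it with C3k2 is not described in the abstract and is not shown.

```python
import math


def simam_channel(values, lam=1e-4):
    """SimAM weighting for one channel given as a flat list of activations (at least two)."""
    if len(values) < 2:
        raise ValueError("need at least two activations per channel")
    n = len(values) - 1
    mu = sum(values) / len(values)
    d = [(x - mu) ** 2 for x in values]  # squared deviation of each neuron from the channel mean
    v = sum(d) / n  # channel variance estimate
    out = []
    for x, dx in zip(values, d):
        e_inv = dx / (4.0 * (v + lam)) + 0.5  # inverse minimal energy: larger for distinctive neurons
        out.append(x / (1.0 + math.exp(-e_inv)))  # x * sigmoid(e_inv)
    return out
```

In a uniform channel every activation gets the same weight, sigmoid(0.5); an outlier such as the 5.0 in [1, 1, 1, 5] receives a noticeably larger weight than its neighbours, which is the intended emphasis on small, distinctive objects.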

22 pages, 1944 KB  
Article
Automated Radiological Report Generation from Breast Ultrasound Images Using Vision and Language Transformers
by Shaheen Khatoon and Azhar Mahmood
J. Imaging 2026, 12(2), 68; https://doi.org/10.3390/jimaging12020068 - 6 Feb 2026
Viewed by 1160
Abstract
Breast ultrasound imaging is widely used for the detection and characterization of breast abnormalities; however, generating detailed and consistent radiological reports remains a labor-intensive and subjective process. Recent advances in deep learning have demonstrated the potential of automated report generation systems to support clinical workflows, yet most existing approaches focus on chest X-ray imaging and rely on convolutional–recurrent architectures with limited capacity to model long-range dependencies and complex clinical semantics. In this work, we propose a multimodal Transformer-based framework for automatic breast ultrasound report generation that integrates visual and textual information through cross-attention mechanisms. The proposed architecture employs a Vision Transformer (ViT) to extract rich spatial and morphological features from ultrasound images. For textual embedding, pretrained language models (BERT, BioBERT, and GPT-2) are used in various encoder–decoder configurations to leverage both general linguistic knowledge and domain-specific biomedical semantics. A multimodal Transformer decoder then autoregressively generates diagnostic reports by jointly attending to visual features and contextualized textual embeddings. We conducted an extensive quantitative evaluation using standard report generation metrics (BLEU, ROUGE-L, METEOR, and CIDEr) to assess lexical accuracy, semantic alignment, and clinical relevance. Experimental results demonstrate that BioBERT-based models consistently outperform their general-domain counterparts in clinical specificity, while GPT-2-based decoders improve linguistic fluency.
(This article belongs to the Section AI in Imaging)

16 pages, 5504 KB  
Article
Predicting Nutritional and Morphological Attributes of Fresh Commercial Opuntia Cladodes Using Machine Learning and Imaging
by Juan Arredondo Valdez, Josué Israel García López, Héctor Flores Breceda, Ajay Kumar, Ricardo David Valdez Cepeda and Alejandro Isabel Luna Maldonado
J. Imaging 2026, 12(2), 67; https://doi.org/10.3390/jimaging12020067 - 5 Feb 2026
Viewed by 822
Abstract
Opuntia ficus-indica L. is a prominent crop in Mexico, and the real-time monitoring and quality control of fresh commercial cladodes require advanced non-destructive technologies. The primary objective of this study was to develop and validate high-precision mathematical models that correlate hyperspectral signatures (400–1000 nm) with the nutritional, morphological, and antioxidant attributes of fresh cladodes (cultivar Villanueva) at peak commercial maturity. By combining hyperspectral imaging (HSI) with machine learning algorithms, including K-Means clustering for image preprocessing and Partial Least Squares Regression (PLSR) for predictive modeling, this study successfully predicted the concentrations of 10 minerals (N, P, K, Ca, Mg, Fe, B, Mn, Zn, and Cu), chlorophylls (a, b, and total), and antioxidant capacities (ABTS, FRAP, and DPPH). The novelty of this work lies in the simultaneous non-destructive quantification of 17 distinct variables from a single scan, achieving coefficients of determination (R²) as high as 0.988 for phosphorus and chlorophyll b. In practice, this approach offers a viable replacement for time-consuming and destructive laboratory acid digestion, enabling producers to implement automated, high-throughput sorting lines for quality assurance. Furthermore, this study establishes a framework for interdisciplinary collaboration among agricultural engineers, data scientists (for algorithm optimization), and food scientists to enhance the functional value chain of Opuntia products.
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)

35 pages, 4998 KB  
Review
A Survey of Crop Disease Recognition Methods Based on Spectral and RGB Images
by Haoze Zheng, Heran Wang, Hualong Dong and Yurong Qian
J. Imaging 2026, 12(2), 66; https://doi.org/10.3390/jimaging12020066 - 5 Feb 2026
Cited by 2 | Viewed by 1228
Abstract
Major crops worldwide are affected by various diseases every year, causing crop losses in many regions. The primary methods for limiting these losses are manual inspection and chemical control. However, manual inspection is time-consuming and labor-intensive and requires specialized knowledge, while the preemptive use of chemicals poses a risk of soil pollution that may cause irreversible damage. With advances in computer hardware, photographic technology, and artificial intelligence, crop disease recognition methods based on spectral and red–green–blue (RGB) images can recognize diseases without damaging the crops, with high accuracy and speed, largely resolving the problems associated with manual inspection and chemical control. This paper summarizes research on disease recognition methods based on spectral and RGB images published between 2020 and early 2025. Unlike previous surveys, it reviews recent advances involving emerging paradigms such as State Space Models (e.g., Mamba) and Generative AI in the context of crop disease recognition. It also introduces public datasets and commonly used evaluation metrics for crop disease identification. Finally, it discusses open issues and potential solutions, including the use of diffusion models for data augmentation. We hope this survey will help readers understand current crop disease detection methods and their effectiveness, and inspire more effective methods to help farmers identify crop diseases.
(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)

23 pages, 3301 KB  
Article
Ciphertext-Only Attack on Grayscale-Based EtC Image Encryption via Component Separation and Regularized Single-Channel Compatibility
by Ruifeng Li and Masaaki Fujiyoshi
J. Imaging 2026, 12(2), 65; https://doi.org/10.3390/jimaging12020065 - 5 Feb 2026
Viewed by 572
Abstract
Grayscale-based Encryption-then-Compression (EtC) systems transform RGB images into the YCbCr color space, concatenate the components into a single grayscale image, and apply block permutation, block rotation/flipping, and block-wise negative–positive inversion. Because this pipeline separates color components and disrupts inter-channel statistics, existing extended jigsaw puzzle solvers (JPSs) have been regarded as ineffective, and grayscale-based EtC systems have been considered resistant to ciphertext-only visual reconstruction. In this paper, we present a practical ciphertext-only attack against grayscale-based EtC. The proposed attack introduces three key components: (i) Texture-Based Component Classification (TBCC), which distinguishes luminance (Y) blocks from chrominance (Cb/Cr) blocks and focuses reconstruction on structure-rich regions; (ii) Regularized Single-Channel Edge Compatibility (R-SCEC), which applies Tikhonov regularization to a single-channel variant of the Mahalanobis Gradient Compatibility (MGC) measure to alleviate covariance rank deficiency while remaining robust to inversion and geometric transforms; and (iii) Adaptive Pruning, which exploits the TBCC-reduced search space to skip redundant boundary-matching computations and further improve reconstruction efficiency. Experiments show that, in settings where existing extended JPSs fail, our method can still recover visually recognizable semantic content, revealing a potential vulnerability in grayscale-based EtC and calling for a re-evaluation of its security.
(This article belongs to the Section Image and Video Processing)
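The block operations the attack must undo are easy to state, which helps explain why a compatibility measure has to tolerate inversion and rotation. A toy sketch of EtC-style scrambling on an image already split into square 8-bit blocks; the YCbCr conversion and component concatenation are omitted, and driving every step from one random seed is a simplification (real schemes use dedicated keys), so `scramble_blocks` is a hypothetical helper.

```python
import random


def rotate90(block):
    """Rotate a square block 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def flip_horizontal(block):
    return [row[::-1] for row in block]


def invert(block, max_value=255):
    """Negative-positive inversion of an 8-bit block: p -> 255 - p."""
    return [[max_value - p for p in row] for row in block]


def scramble_blocks(blocks, seed):
    """Toy EtC-style scrambling driven by a single seed (real schemes use dedicated keys)."""
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)  # block permutation
    out = []
    for i in order:
        b = blocks[i]
        for _ in range(rng.randrange(4)):  # 0, 90, 180 or 270 degree rotation
            b = rotate90(b)
        if rng.random() < 0.5:  # optional flip
            b = flip_horizontal(b)
        if rng.random() < 0.5:  # block-wise negative-positive inversion
            b = invert(b)
        out.append(b)
    return out
```

Each output block keeps either its original pixel values or their inversions, only rearranged; this is the structure that a jigsaw-style solver exploits when matching block boundaries.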

36 pages, 11446 KB  
Article
SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework
by Munish Rathee, Boris Bačić and Maryam Doborjeh
J. Imaging 2026, 12(2), 64; https://doi.org/10.3390/jimaging12020064 - 31 Jan 2026
Viewed by 729
Abstract
Automated anomaly detection in transportation infrastructure is essential for enhancing safety and reducing the operational costs associated with manual inspection. This study presents an improved neuromorphic vision system that extends the earlier SIFT-SNN (scale-invariant feature transform–spiking neural network) proof of concept with temporal feature aggregation for context-aware, sequence-stable detection. Analysis of classical stitching-based pipelines revealed sensitivity to motion and lighting variations, motivating the proposed temporally smoothed neuromorphic design. SIFT keypoints are encoded into latency-based spike trains and classified by a leaky integrate-and-fire (LIF) spiking neural network implemented in PyTorch. Evaluated on three hardware configurations (an NVIDIA RTX 4060 GPU, an Intel i7 CPU, and a simulated Jetson Nano), the system achieved 92.3% accuracy and a macro F1 score of 91.0% under five-fold cross-validation, with inference latencies of 9.5 ms, 26.1 ms, and ~48.3 ms per frame, respectively. Memory footprints were under 290 MB, and power consumption was estimated at 5 to 65 W. The classifier distinguishes between safe, partially dislodged, and fully dislodged barrier pins, which are critical failure modes of the Auckland Harbour Bridge's Movable Concrete Barrier (MCB) system. Temporal smoothing further improves recall for ambiguous cases. With a compact model size (2.9 MB), low-latency inference, and minimal power demands, the proposed framework offers a deployable, interpretable, and energy-efficient alternative to conventional CNN-based inspection tools. Future work will explore the generalisability and transferability of this approach, additional input sources, and human–computer interaction paradigms for various deployment infrastructures.
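Latency coding turns feature strength into spike timing (stronger inputs fire earlier), and a LIF neuron leaks its membrane potential each step, integrates incoming spikes, and fires and resets when a threshold is crossed. A plain-Python sketch of both ideas; the time window, leak factor β, threshold, and weights are illustrative assumptions, and the paper's PyTorch implementation is not reproduced here.

```python
def latency_encode(values, t_max=10):
    """Map normalised feature values in [0, 1] to one spike time each (stronger -> earlier).

    Non-positive values produce no spike (None).
    """
    return [None if x <= 0 else round((1.0 - min(x, 1.0)) * (t_max - 1)) for x in values]


def lif_neuron(spike_times, weights, t_max=10, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron; return the time steps at which it fires."""
    v = 0.0
    fired = []
    for t in range(t_max):
        current = sum(w for s, w in zip(spike_times, weights) if s == t)
        v = beta * v + current  # leak, then integrate this step's input spikes
        if v >= threshold:
            fired.append(t)
            v = 0.0  # reset after a spike
    return fired
```

Two inputs of weight 0.6 arriving together push the neuron over threshold at once, whereas the same inputs arriving three steps apart still fire it (at step 3), and inputs nine steps apart do not, because the potential leaks away in between. This timing sensitivity is what latency coding relies on.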

20 pages, 1142 KB  
Article
A Cross-Domain Benchmark of Intrinsic and Post Hoc Explainability for 3D Deep Learning Models
by Asmita Chakraborty, Gizem Karagoz and Nirvana Meratnia
J. Imaging 2026, 12(2), 63; https://doi.org/10.3390/jimaging12020063 - 30 Jan 2026
Viewed by 1066
Abstract
Deep learning models for three-dimensional (3D) data are increasingly used in domains such as medical imaging, object recognition, and robotics. As AI adoption in these domains grows, the black-box nature of these models has made explainability significantly more important. However, the lack of standardized, quantitative benchmarks for explainable artificial intelligence (XAI) on 3D data limits reliable comparison of explanation quality. In this paper, we present a unified benchmarking framework to evaluate both intrinsic and post hoc XAI methods across three representative 3D datasets: volumetric CT scans (MosMed), voxelized CAD models (ModelNet40), and real-world point clouds (ScanObjectNN). The evaluated methods include Grad-CAM, Integrated Gradients, Saliency, Occlusion, and the intrinsic ResAttNet-3D model. We quantitatively assess explanations using the Correctness (AOPC), Completeness (AUPC), and Compactness metrics, applied consistently across all datasets. Our results show that explanation quality varies significantly across methods and domains: Grad-CAM and intrinsic attention performed best on medical CT scans, while gradient-based methods excelled on voxelized and point-based data. Statistical tests (Kruskal–Wallis and Mann–Whitney U) confirmed significant performance differences between methods. No single approach achieved superior results across all domains, highlighting the importance of multi-metric evaluation. This work provides a reproducible framework for standardized assessment of 3D explainability, together with comparative insights to guide future XAI method selection.
(This article belongs to the Special Issue Explainable AI in Computer Vision)
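The abstract uses AOPC as its Correctness metric but does not state the perturbation protocol. The sketch below assumes a commonly used most-relevant-first formulation, AOPC = 1/(L+1) · Σ_{k=0}^{L} [f(x⁽⁰⁾) − f(x⁽ᵏ⁾)], where x⁽ᵏ⁾ has its k highest-attributed features replaced by a baseline value. It treats the input as a flattened list of voxels or points, removes one feature per step, and uses a zero baseline; these choices, the `aopc` name, and the toy linear model are illustrative assumptions, not the benchmark's actual code.

```python
from typing import Callable, List, Sequence


def aopc(model: Callable[[List[float]], float], x: Sequence[float],
         attributions: Sequence[float], steps: int, baseline: float = 0.0) -> float:
    """Area Over the Perturbation Curve, most-relevant-first (MoRF) variant."""
    if not 0 <= steps <= len(x):
        raise ValueError("steps must be between 0 and len(x)")
    # Feature indices sorted by attribution, most relevant first.
    order = sorted(range(len(x)), key=lambda i: attributions[i], reverse=True)
    perturbed = list(x)
    f0 = model(perturbed)
    total = 0.0  # the k = 0 term, f0 - f0, contributes nothing
    for k in range(1, steps + 1):
        perturbed[order[k - 1]] = baseline  # delete the k-th most relevant feature
        total += f0 - model(perturbed)
    return total / (steps + 1)


# Toy "model": a fixed linear scorer over three features (hypothetical).
WEIGHTS = [3.0, 2.0, 1.0]


def toy_model(v: List[float]) -> float:
    return sum(w * xi for w, xi in zip(WEIGHTS, v))


good = aopc(toy_model, [1.0, 1.0, 1.0], attributions=[3.0, 2.0, 1.0], steps=3)
bad = aopc(toy_model, [1.0, 1.0, 1.0], attributions=[1.0, 2.0, 3.0], steps=3)
print(good, bad)
```

An attribution that ranks the truly influential features first makes the model's score drop faster, so it earns a higher AOPC (here 3.5 versus 2.5 for the reversed ranking), which is why AOPC serves as a correctness measure.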

17 pages, 1082 KB  
Article
AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection
by Mohammad Ishtiaque Rahman and Amrina Rahman
J. Imaging 2026, 12(2), 62; https://doi.org/10.3390/jimaging12020062 - 30 Jan 2026
Cited by 1 | Viewed by 798
Abstract
Lung cancer remains a leading cause of cancer-related mortality. Although reliable multiclass classification of lung lesions from CT imaging is essential for early diagnosis, it remains challenging due to subtle inter-class differences, limited sample sizes, and class imbalance. We propose an Adaptive Attention-Augmented Convolutional Neural Network with Vision Transformer (AACNN-ViT), a hybrid framework that integrates local convolutional representations with global transformer embeddings through an adaptive attention-based fusion module. The CNN branch captures fine-grained spatial patterns, the ViT branch encodes long-range contextual dependencies, and the adaptive fusion mechanism learns to weight cross-representation interactions to improve discriminability. To reduce the impact of class imbalance, training uses a hybrid objective that combines focal loss with categorical cross-entropy. Experiments on the IQ-OTH/NCCD dataset (benign, malignant, and normal classes) show consistent performance gains across an ablation-style progression: CNN-only, ViT-only, CNN-ViT concatenation, and AACNN-ViT. The proposed AACNN-ViT achieved 96.97% accuracy on the validation set, with macro-averaged precision/recall/F1 of 0.9588/0.9352/0.9458 and a weighted F1 of 0.9693, substantially improving minority-class recognition (benign recall 0.8333) compared with CNN-ViT (accuracy 89.09%, macro-F1 0.7680). One-vs.-rest ROC analysis further indicates strong separability across all classes (micro-average AUC 0.992). These results suggest that adaptive attention-based fusion offers a robust and clinically relevant approach for computer-aided lung cancer screening and decision support.
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)
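The abstract states that training combines focal loss with categorical cross-entropy but not how the two terms are weighted. The sketch below assumes a convex combination L = λ·FL + (1 − λ)·CE with FL = −α_t (1 − p_t)^γ log p_t, defaulting to γ = 2, λ = 0.5, and no per-class α. The function names and defaults are hypothetical, and a real training loop would compute this with a deep learning framework's tensor operations rather than pure Python.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_focal_ce(logits: Sequence[float], target: int, gamma: float = 2.0,
                    lam: float = 0.5, alpha: Optional[Sequence[float]] = None) -> float:
    """Per-sample loss: lam * focal + (1 - lam) * cross-entropy.

    cross-entropy = -log(p_t); focal = alpha_t * (1 - p_t)**gamma * cross-entropy.
    """
    p_t = softmax(logits)[target]
    ce = -math.log(p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]  # optional per-class weight
    focal = alpha_t * (1.0 - p_t) ** gamma * ce        # down-weights easy examples
    return lam * focal + (1.0 - lam) * ce


def batch_loss(batch_logits: Sequence[Sequence[float]], targets: Sequence[int],
               **kwargs) -> float:
    """Mean hybrid loss over a batch."""
    losses = [hybrid_focal_ce(z, y, **kwargs) for z, y in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


easy = hybrid_focal_ce([4.0, 0.0, 0.0], target=0)  # confident, correct prediction
hard = hybrid_focal_ce([0.0, 0.0, 0.0], target=0)  # uncertain prediction
print(easy, hard)
```

The (1 − p_t)^γ factor shrinks the loss on confidently classified samples, so minority-class examples that the model still gets wrong contribute relatively more to the gradient, while the cross-entropy term keeps a stable baseline signal. Setting γ = 0 recovers plain cross-entropy.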
