A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis
Next-Generation Advances in Prostate Cancer Imaging and Artificial Intelligence Applications
Classifying Sex from MSCT-Derived 3D Mandibular Models Using an Adapted PointNet++ Deep Learning Approach in a Croatian Population
AIGD Era: From Fragment to One Piece
Journal Description
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques, published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; acceptance to publication takes 3.6 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery
J. Imaging 2026, 12(2), 69; https://doi.org/10.3390/jimaging12020069 - 6 Feb 2026
Abstract
Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel FPN, called Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model’s backbone. This module effectively eliminates information loss caused by pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model’s complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS, which is 21.4% lower than YOLO11s, confirming its feasibility for real-time onboard UAV applications.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
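As a point of reference for the attention component named in the abstract, below is a minimal PyTorch sketch of the parameter-free SimAM module following its published formulation; the FlexSimAM variant, its fusion with the C3k2 block, and all other YOLO11s-UAV specifics are not reproduced here.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: each activation is re-weighted by a sigmoid of
    its energy, computed from its deviation from the per-channel spatial mean."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # (x - mu)^2 per position
        v = d.sum(dim=(2, 3), keepdim=True) / n              # per-channel spatial variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5          # inverse energy per neuron
        return x * torch.sigmoid(e_inv)
```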
Open Access Article
Automated Radiological Report Generation from Breast Ultrasound Images Using Vision and Language Transformers
by Shaheen Khatoon and Azhar Mahmood
J. Imaging 2026, 12(2), 68; https://doi.org/10.3390/jimaging12020068 - 6 Feb 2026
Abstract
Breast ultrasound imaging is widely used for the detection and characterization of breast abnormalities; however, generating detailed and consistent radiological reports remains a labor-intensive and subjective process. Recent advances in deep learning have demonstrated the potential of automated report generation systems to support clinical workflows, yet most existing approaches focus on chest X-ray imaging and rely on convolutional–recurrent architectures with limited capacity to model long-range dependencies and complex clinical semantics. In this work, we propose a multimodal Transformer-based framework for automatic breast ultrasound report generation that integrates visual and textual information through cross-attention mechanisms. The proposed architecture employs a Vision Transformer (ViT) to extract rich spatial and morphological features from ultrasound images. For textual embedding, pretrained language models (BERT, BioBERT, and GPT-2) are implemented in various encoder–decoder configurations to leverage both general linguistic knowledge and domain-specific biomedical semantics. A multimodal Transformer decoder is implemented to autoregressively generate diagnostic reports by jointly attending to visual features and contextualized textual embeddings. We conducted an extensive quantitative evaluation using standard report generation metrics, including BLEU, ROUGE-L, METEOR, and CIDEr, to assess lexical accuracy, semantic alignment, and clinical relevance. Experimental results demonstrate that BioBERT-based models consistently outperform general domain counterparts in clinical specificity, while GPT-2-based decoders improve linguistic fluency.
Full article
(This article belongs to the Section AI in Imaging)
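To illustrate the kind of cross-attention fusion the abstract describes, here is a minimal, self-contained PyTorch sketch of a Transformer decoder that attends to ViT patch features while generating report tokens; the embedding dimension, vocabulary size, and layer counts are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ReportDecoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, nhead=8, num_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, image_feats):
        # token_ids: (B, T) report tokens; image_feats: (B, N_patches, d_model) from a ViT encoder
        tgt = self.tok_emb(token_ids)
        T = token_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # autoregressive mask
        out = self.decoder(tgt, memory=image_feats, tgt_mask=causal)        # cross-attends to image features
        return self.lm_head(out)                                            # next-token logits
```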
Open Access Article
Predicting Nutritional and Morphological Attributes of Fresh Commercial Opuntia Cladodes Using Machine Learning and Imaging
by Juan Arredondo Valdez, Josué Israel García López, Héctor Flores Breceda, Ajay Kumar, Ricardo David Valdez Cepeda and Alejandro Isabel Luna Maldonado
J. Imaging 2026, 12(2), 67; https://doi.org/10.3390/jimaging12020067 - 5 Feb 2026
Abstract
Opuntia ficus-indica L. is a prominent crop in Mexico, requiring advanced non-destructive technologies for the real-time monitoring and quality control of fresh commercial cladodes. The primary research objective of this study was to develop and validate high-precision mathematical models that correlate hyperspectral signatures (400–1000 nm) with the specific nutritional, morphological, and antioxidant attributes of fresh cladodes (cultivar Villanueva) at their peak commercial maturity. By combining hyperspectral imaging (HSI) with machine learning algorithms, including K-Means clustering for image preprocessing and Partial Least Squares Regression (PLSR) for predictive modeling, this study successfully predicted the concentrations of 10 minerals (N, P, K, Ca, Mg, Fe, B, Mn, Zn, and Cu), chlorophylls (a, b, and Total), and antioxidant capacities (ABTS, FRAP, and DPPH). The innovative nature of this work lies in the simultaneous non-destructive quantification of 17 distinct variables from a single scan, achieving coefficients of determination (R2) as high as 0.988 for Phosphorus and Chlorophyll b. The practical applicability of this research provides a viable replacement for time-consuming and destructive laboratory acid digestion, enabling producers to implement automated, high-throughput sorting lines for quality assurance. Furthermore, this study establishes a framework for interdisciplinary collaborations between agricultural engineers, data scientists for algorithm optimization, and food scientists to enhance the functional value chain of Opuntia products.
Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
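The modelling chain described above (unsupervised masking of the cladode region followed by PLSR calibration against the reference chemistry) maps onto standard scikit-learn components; the sketch below is illustrative only, with random placeholder data and an assumed number of latent variables.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((60, 200))   # mean reflectance spectra per cladode (400-1000 nm bands), placeholder
y = rng.random(60)          # one reference variable, e.g. a mineral concentration, placeholder

pls = PLSRegression(n_components=10)                   # latent-variable count is an assumption
r2 = cross_val_score(pls, X, y, cv=5, scoring="r2")    # cross-validated coefficient of determination
print("mean cross-validated R^2:", r2.mean())
```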
Open Access Review
A Survey of Crop Disease Recognition Methods Based on Spectral and RGB Images
by Haoze Zheng, Heran Wang, Hualong Dong and Yurong Qian
J. Imaging 2026, 12(2), 66; https://doi.org/10.3390/jimaging12020066 - 5 Feb 2026
Abstract
Major crops worldwide are affected by various diseases yearly, leading to crop losses in different regions. The primary methods for addressing crop disease losses include manual inspection and chemical control. However, traditional manual inspection methods are time-consuming, labor-intensive, and require specialized knowledge. The preemptive use of chemicals also poses a risk of soil pollution, which may cause irreversible damage. With the advancement of computer hardware, photographic technology, and artificial intelligence, crop disease recognition methods based on spectral and red–green–blue (RGB) images not only recognize diseases without damaging the crops but also offer high accuracy and speed of recognition, essentially solving the problems associated with manual inspection and chemical control. This paper summarizes the research on disease recognition methods based on spectral and RGB images, with the literature spanning from 2020 through early 2025. Unlike previous surveys, this paper reviews recent advances involving emerging paradigms such as State Space Models (e.g., Mamba) and Generative AI in the context of crop disease recognition. In addition, it introduces public datasets and commonly used evaluation metrics for crop disease identification. Finally, the paper discusses potential issues and solutions encountered during research, including the use of diffusion models for data augmentation. Hopefully, this survey will help readers understand the current methods and effectiveness of crop disease detection, inspiring the development of more effective methods to assist farmers in identifying crop diseases.
Full article
(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)
Open Access Article
Ciphertext-Only Attack on Grayscale-Based EtC Image Encryption via Component Separation and Regularized Single-Channel Compatibility
by Ruifeng Li and Masaaki Fujiyoshi
J. Imaging 2026, 12(2), 65; https://doi.org/10.3390/jimaging12020065 - 5 Feb 2026
Abstract
Grayscale-based Encryption-then-Compression (EtC) systems transform RGB images into the YCbCr color space, concatenate the components into a single grayscale image, and apply block permutation, block rotation/flipping, and block-wise negative–positive inversion. Because this pipeline separates color components and disrupts inter-channel statistics, existing extended jigsaw puzzle solvers (JPSs) have been regarded as ineffective, and grayscale-based EtC systems have been considered resistant to ciphertext-only visual reconstruction. In this paper, we present a practical ciphertext-only attack against grayscale-based EtC. The proposed attack introduces three key components: (i) Texture-Based Component Classification (TBCC) to distinguish luminance (Y) and chrominance (Cb/Cr) blocks and focus reconstruction on structure-rich regions; (ii) Regularized Single-Channel Edge Compatibility (R-SCEC), which applies Tikhonov regularization to a single-channel variant of the Mahalanobis Gradient Compatibility (MGC) measure to alleviate covariance rank-deficiency while maintaining robustness under inversion and geometric transforms; and (iii) Adaptive Pruning based on the TBCC-reduced search space that skips redundant boundary matching computations to further improve reconstruction efficiency. Experiments show that, in settings where existing extended JPS solvers fail, our method can still recover visually recognizable semantic content, revealing a potential vulnerability in grayscale-based EtC and calling for a re-evaluation of its security.
Full article
(This article belongs to the Section Image and Video Processing)
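For readers unfamiliar with the regularization step mentioned in the abstract, the snippet below sketches how Tikhonov regularization stabilizes a Mahalanobis-type edge-compatibility score when the covariance is rank-deficient; the regularization weight and the way boundary gradients are collected are assumptions, not the paper's R-SCEC definition.

```python
import numpy as np

def regularized_mahalanobis(residual, cov, alpha=1e-3):
    """Mahalanobis-type compatibility with Tikhonov (ridge) regularization:
    adding alpha*I makes the covariance invertible even when it is rank-deficient."""
    cov_reg = cov + alpha * np.eye(cov.shape[0])
    return float(residual @ np.linalg.inv(cov_reg) @ residual)

# toy example: residual between predicted and observed gradient across a block boundary
cov = np.array([[2.0, 2.0], [2.0, 2.0]])     # singular covariance (rank 1)
residual = np.array([0.5, -0.3])
print(regularized_mahalanobis(residual, cov))
```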
Open Access Article
SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework
by Munish Rathee, Boris Bačić and Maryam Doborjeh
J. Imaging 2026, 12(2), 64; https://doi.org/10.3390/jimaging12020064 - 31 Jan 2026
Abstract
Automated anomaly detection in transportation infrastructure is essential for enhancing safety and reducing the operational costs associated with manual inspection protocols. This study presents an improved neuromorphic vision system, which extends the prior SIFT-SNN (scale-invariant feature transform–spiking neural network) proof-of-concept by incorporating temporal feature aggregation for context-aware and sequence-stable detection. Analysis of classical stitching-based pipelines exposed sensitivity to motion and lighting variations, motivating the proposed temporally smoothed neuromorphic design. SIFT keypoints are encoded into latency-based spike trains and classified using a leaky integrate-and-fire (LIF) spiking neural network implemented in PyTorch. Evaluated across three hardware configurations—an NVIDIA RTX 4060 GPU, an Intel i7 CPU, and a simulated Jetson Nano—the system achieved 92.3% accuracy and a macro F1 score of 91.0% under five-fold cross-validation. Inference latencies were measured at 9.5 ms, 26.1 ms, and ~48.3 ms per frame, respectively. Memory footprints were under 290 MB, and power consumption was estimated to be between 5 and 65 W. The classifier distinguishes between safe, partially dislodged, and fully dislodged barrier pins, which are critical failure modes for the Auckland Harbour Bridge’s Movable Concrete Barrier (MCB) system. Temporal smoothing further improves recall for ambiguous cases. By achieving a compact model size (2.9 MB), low-latency inference, and minimal power demands, the proposed framework offers a deployable, interpretable, and energy-efficient alternative to conventional CNN-based inspection tools. Future work will focus on exploring the generalisability and transferability of the work presented, additional input sources, and human–computer interaction paradigms for various deployment infrastructures and advancements.
Full article
(This article belongs to the Topic Image Processing, Signal Processing and Their Applications)
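As a rough illustration of the latency coding and leaky integrate-and-fire dynamics named in the abstract, here is a small NumPy sketch; the time constants, threshold, and weight are invented for the example, and the actual SIFT-SNN pipeline is implemented in PyTorch.

```python
import numpy as np

def latency_encode(values, t_max=20):
    """Latency coding: larger (normalized) feature responses fire earlier."""
    v = np.clip(values, 1e-6, 1.0)
    return np.round(t_max * (1.0 - v)).astype(int)   # spike times in discrete timesteps

def lif_neuron(spike_times, t_max=20, tau=5.0, v_thresh=1.0, w=0.6):
    """Single leaky integrate-and-fire neuron driven by the encoded spike train."""
    v = 0.0
    for t in range(t_max + 1):
        v += -v / tau + w * np.count_nonzero(spike_times == t)   # leak + weighted input
        if v >= v_thresh:
            return t          # time of the output spike
    return None               # neuron stayed below threshold

keypoint_strengths = np.array([0.9, 0.8, 0.75, 0.2])   # e.g. normalized SIFT responses
print(lif_neuron(latency_encode(keypoint_strengths)))
```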
Open Access Article
A Cross-Domain Benchmark of Intrinsic and Post Hoc Explainability for 3D Deep Learning Models
by Asmita Chakraborty, Gizem Karagoz and Nirvana Meratnia
J. Imaging 2026, 12(2), 63; https://doi.org/10.3390/jimaging12020063 - 30 Jan 2026
Abstract
Deep learning models for three-dimensional (3D) data are increasingly used in domains such as medical imaging, object recognition, and robotics. As their adoption grows, their black-box nature makes the need for explainability increasingly pressing. However, the lack of standardized and quantitative benchmarks for explainable artificial intelligence (XAI) in 3D data limits the reliable comparison of explanation quality. In this paper, we present a unified benchmarking framework to evaluate both intrinsic and post hoc XAI methods across three representative 3D datasets: volumetric CT scans (MosMed), voxelized CAD models (ModelNet40), and real-world point clouds (ScanObjectNN). The evaluated methods include Grad-CAM, Integrated Gradients, Saliency, Occlusion, and the intrinsic ResAttNet-3D model. We quantitatively assess explanations using the Correctness (AOPC), Completeness (AUPC), and Compactness metrics, consistently applied across all datasets. Our results show that explanation quality varies significantly across methods and domains: Grad-CAM and intrinsic attention performed best on medical CT scans, while gradient-based methods excelled on voxelized and point-based data. Statistical tests (Kruskal–Wallis and Mann–Whitney U) confirmed significant performance differences between methods. No single approach achieved superior results across all domains, highlighting the importance of multi-metric evaluation. This work provides a reproducible framework for standardized assessment of 3D explainability and comparative insights to guide future XAI method selection.
Full article
(This article belongs to the Special Issue Explainable AI in Computer Vision)
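Since the Correctness metric (AOPC) is central to the benchmark, the sketch below shows one common way to compute it: occlude the most relevant inputs first and average the resulting drop in the model's score. The perturbation schedule and baseline value are assumptions; the paper's exact protocol may differ.

```python
import numpy as np

def aopc(score_fn, x, relevance, n_steps=20, baseline=0.0):
    """Area Over the Perturbation Curve for a single input.
    score_fn(x) -> scalar class score; relevance has the same shape as x."""
    order = np.argsort(relevance.ravel())[::-1]      # most relevant positions first
    step = max(1, order.size // n_steps)
    x_pert = x.astype(float).ravel().copy()
    base = score_fn(x)
    drops = []
    for i in range(n_steps):
        x_pert[order[i * step:(i + 1) * step]] = baseline   # occlude the next chunk
        drops.append(base - score_fn(x_pert.reshape(x.shape)))
    return float(np.mean(drops))
```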
Open Access Article
AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection
by Mohammad Ishtiaque Rahman and Amrina Rahman
J. Imaging 2026, 12(2), 62; https://doi.org/10.3390/jimaging12020062 - 30 Jan 2026
Abstract
Lung cancer remains a leading cause of cancer-related mortality. Although reliable multiclass classification of lung lesions from CT imaging is essential for early diagnosis, it remains challenging due to subtle inter-class differences, limited sample sizes, and class imbalance. We propose an Adaptive Attention-Augmented Convolutional Neural Network with Vision Transformer (AACNN-ViT), a hybrid framework that integrates local convolutional representations with global transformer embeddings through an adaptive attention-based fusion module. The CNN branch captures fine-grained spatial patterns, the ViT branch encodes long-range contextual dependencies, and the adaptive fusion mechanism learns to weight cross-representation interactions to improve discriminability. To reduce the impact of imbalance, a hybrid objective that combines focal loss with categorical cross-entropy is incorporated during training. Experiments on the IQ-OTH/NCCD dataset (benign, malignant, and normal) show consistent performance progression in an ablation-style evaluation: CNN-only, ViT-only, CNN-ViT concatenation, and AACNN-ViT. The proposed AACNN-ViT achieved 96.97% accuracy on the validation set with macro-averaged precision/recall/F1 of 0.9588/0.9352/0.9458 and weighted F1 of 0.9693, substantially improving minority-class recognition (Benign recall 0.8333) compared with CNN-ViT (accuracy 89.09%, macro-F1 0.7680). One-vs.-rest ROC analysis further indicates strong separability across all classes (micro-average AUC 0.992). These results suggest that adaptive attention-based fusion offers a robust and clinically relevant approach for computer-aided lung cancer screening and decision support.
Full article
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)
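The class-imbalance objective mentioned in the abstract combines focal loss with categorical cross-entropy; a minimal PyTorch sketch follows, where the mixing weight and focusing parameter are illustrative values rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, gamma=2.0, alpha=0.5):
    """alpha * focal loss + (1 - alpha) * cross-entropy (both values assumed)."""
    ce = F.cross_entropy(logits, targets, reduction="none")   # -log p_t per sample
    p_t = torch.exp(-ce)
    focal = (1.0 - p_t) ** gamma * ce                         # down-weight easy examples
    return alpha * focal.mean() + (1.0 - alpha) * ce.mean()

logits = torch.randn(8, 3)                  # benign / malignant / normal
targets = torch.randint(0, 3, (8,))
print(hybrid_loss(logits, targets))
```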
Open Access Article
Multiscale RGB-Guided Fusion for Hyperspectral Image Super-Resolution
by Matteo Kolyszko, Marco Buzzelli, Simone Bianco and Raimondo Schettini
J. Imaging 2026, 12(2), 61; https://doi.org/10.3390/jimaging12020061 - 28 Jan 2026
Abstract
Hyperspectral imaging (HSI) enables fine spectral analysis but is often limited by low spatial resolution due to sensor constraints. To address this, we propose CGNet, a color-guided hyperspectral super-resolution network that leverages complementary information from low-resolution hyperspectral inputs and high-resolution RGB images. CGNet adopts a dual-encoder design: the RGB encoder extracts hierarchical spatial features, while the HSI encoder progressively upsamples spectral features. A multi-scale fusion decoder then combines both modalities in a coarse-to-fine manner to reconstruct the high-resolution HSI. Training is driven by a hybrid loss that balances L1 and Spectral Angle Mapper (SAM), which ablation studies confirm as the most effective formulation. Experiments on two benchmarks, ARAD1K and StereoMSI, at multiple upscaling factors demonstrate that CGNet consistently outperforms state-of-the-art baselines. CGNet achieves higher PSNR and SSIM, lower SAM, and reduced reconstruction error, confirming its ability to recover sharp spatial structures while preserving spectral fidelity.
Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
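The hybrid training objective (L1 plus Spectral Angle Mapper) can be written compactly; the sketch below assumes an arbitrary balancing weight, which is not taken from the paper.

```python
import torch
import torch.nn.functional as F

def sam_loss(pred, target, eps=1e-8):
    """Mean spectral angle between predicted and reference spectra at each pixel.
    pred, target: (B, bands, H, W)."""
    dot = (pred * target).sum(dim=1)
    denom = pred.norm(dim=1) * target.norm(dim=1) + eps
    cos = (dot / denom).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()

def hybrid_loss(pred, target, lam=0.1):
    # lam balances spatial (L1) fidelity against spectral-angle fidelity (value assumed)
    return F.l1_loss(pred, target) + lam * sam_loss(pred, target)
```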
Open Access Article
Real-Time Visual Anomaly Detection in High-Speed Motorsport: An Entropy-Driven Hybrid Retrieval- and Cache-Augmented Architecture
by Rubén Juárez Cádiz and Fernando Rodríguez-Sela
J. Imaging 2026, 12(2), 60; https://doi.org/10.3390/jimaging12020060 - 28 Jan 2026
Abstract
At 300 km/h, an end-to-end vision delay of 100 ms corresponds to 8.3 m of unobserved travel; therefore, real-time anomaly monitoring must balance sensitivity with strict tail-latency constraints at the edge. We propose a hybrid cache–retrieval inference architecture for visual anomaly detection in high-speed motorsport that exploits lap-to-lap spatiotemporal redundancy while reserving local similarity retrieval for genuinely uncertain events. The system combines a hierarchical visual encoder (a lightweight backbone with selective refinement via a Nested U-Net for texture-level cues) and an uncertainty-driven router that selects between two memory pathways: (i) a static cache of precomputed scene embeddings for track/background context and (ii) local similarity retrieval over historical telemetry–vision patterns to ground ambiguous frames, improve interpretability, and stabilize decisions under high uncertainty. Routing is governed by an entropy signal computed from prediction and embedding uncertainty: low-entropy frames follow a cache-first path, whereas high-entropy frames trigger retrieval and refinement to preserve decision stability without sacrificing latency. On a high-fidelity closed-circuit benchmark with synchronized onboard video and telemetry and controlled anomaly injections (tire degradation, suspension chatter, and illumination shifts), the proposed approach reduces mean end-to-end latency to 21.7 ms versus 48.6 ms for a retrieval-only baseline (55.3% reduction) while achieving Macro-F1 = 0.89 at safety-oriented operating points. The framework is designed for passive monitoring and decision support, producing advisory outputs without actuating ECU control strategies.
Full article
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)
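The entropy-gated routing described above reduces to a simple rule: compute an uncertainty score per frame and send low-entropy frames down the cache-first path. The sketch below uses predictive entropy only and an invented threshold; the paper also folds in embedding uncertainty.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    return float(-np.sum(probs * np.log(probs + eps)))

def route_frame(class_probs, threshold=1.0):
    """Cache-first path for confident frames, retrieval + refinement otherwise
    (threshold is an illustrative value)."""
    return "cache" if predictive_entropy(class_probs) < threshold else "retrieval"

print(route_frame(np.array([0.95, 0.03, 0.02])))   # low entropy  -> cache
print(route_frame(np.array([0.40, 0.35, 0.25])))   # high entropy -> retrieval
```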
Open Access Article
Neuro-Geometric Graph Transformers with Differentiable Radiographic Geometry for Spinal X-Ray Image Analysis
by Vuth Kaveevorayan, Rapeepan Pitakaso, Thanatkij Srichok, Natthapong Nanthasamroeng, Chutchai Kaewta and Peerawat Luesak
J. Imaging 2026, 12(2), 59; https://doi.org/10.3390/jimaging12020059 - 28 Jan 2026
Abstract
Radiographic imaging remains a cornerstone of diagnostic practice. However, accurate interpretation faces challenges from subtle visual signatures, anatomical variability, and inter-observer inconsistency. Conventional deep learning approaches, such as convolutional neural networks and vision transformers, deliver strong predictive performance but often lack anatomical grounding and interpretability, limiting their trustworthiness in imaging applications. To address these challenges, we present SpineNeuroSym, a neuro-geometric imaging framework that unifies geometry-aware learning and symbolic reasoning for explainable medical image analysis. The framework integrates weakly supervised keypoint and region-of-interest discovery, a dual-stream graph–transformer backbone, and a Differentiable Radiographic Geometry Module (dRGM) that computes clinically relevant indices (e.g., slip ratio, disc asymmetry, sacroiliac spacing, and curvature measures). A Neuro-Symbolic Constraint Layer (NSCL) enforces monotonic logic in image-derived predictions, while a Counterfactual Geometry Diffusion (CGD) module generates rare imaging phenotypes and provides diagnostic auditing through counterfactual validation. Evaluated on a comprehensive dataset of 1613 spinal radiographs from Sunpasitthiprasong Hospital encompassing six diagnostic categories—spondylolisthesis (n = 496), infection (n = 322), spondyloarthropathy (n = 275), normal cervical (n = 192), normal thoracic (n = 70), and normal lumbar spine (n = 258)—SpineNeuroSym achieved 89.4% classification accuracy, a macro-F1 of 0.872, and an AUROC of 0.941, outperforming eight state-of-the-art imaging baselines. These results highlight how integrating neuro-geometric modeling, symbolic constraints, and counterfactual validation advances explainable, trustworthy, and reproducible medical imaging AI, establishing a pathway toward transparent image analysis systems.
Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)
Open Access Article
SFD-ADNet: Spatial–Frequency Dual-Domain Adaptive Deformation for Point Cloud Data Augmentation
by Jiacheng Bao, Lingjun Kong and Wenju Wang
J. Imaging 2026, 12(2), 58; https://doi.org/10.3390/jimaging12020058 - 26 Jan 2026
Abstract
Existing 3D point cloud enhancement methods typically rely on artificially designed geometric transformations or local blending strategies, which are prone to introducing illogical deformations, struggle to preserve global structure, and exhibit insufficient adaptability to diverse degradation patterns. To address these limitations, this paper proposes SFD-ADNet—an adaptive deformation framework based on a dual spatial–frequency domain. It achieves 3D point cloud augmentation by explicitly learning deformation parameters rather than applying predefined perturbations. By jointly modeling spatial structural dependencies and spectral features, SFD-ADNet generates augmented samples that are both structurally aware and task-relevant. In the spatial domain, a hierarchical sequence encoder coupled with a bidirectional Mamba-based deformation predictor captures long-range geometric dependencies and local structural variations, enabling adaptive position-aware deformation control. In the frequency domain, a multi-scale dual-channel mechanism based on adaptive Chebyshev polynomials separates low-frequency structural components from high-frequency details, allowing the model to suppress noise-sensitive distortions while preserving the global geometric skeleton. The two deformation predictions dynamically fuse to balance structural fidelity and sample diversity. Extensive experiments conducted on ModelNet40-C and ScanObjectNN-C involved synthetic CAD models and real-world scanned point clouds under diverse perturbation conditions. SFD-ADNet, as a universal augmentation module, reduces the mCE metrics of PointNet++ and different backbone networks by over 20%. Experiments demonstrate that SFD-ADNet achieves state-of-the-art robustness while preserving critical geometric structures. Furthermore, models enhanced by SFD-ADNet demonstrate consistently improved robustness against diverse point cloud attacks, validating the efficacy of adaptive space-frequency deformation in robust point cloud learning.
Full article
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)
Open Access Article
CauseHSI: Counterfactual-Augmented Domain Generalization for Hyperspectral Image Classification via Causal Disentanglement
by Xin Li, Zongchi Yang and Wenlong Li
J. Imaging 2026, 12(2), 57; https://doi.org/10.3390/jimaging12020057 - 26 Jan 2026
Abstract
Cross-scene hyperspectral image (HSI) classification under single-source domain generalization (DG) is a crucial yet challenging task in remote sensing. The core difficulty lies in generalizing from a limited source domain to unseen target scenes. We formalize this through the causal theory, where different sensing scenes are viewed as distinct interventions on a shared physical system. This perspective reveals two fundamental obstacles: interventional distribution shifts arising from varying acquisition conditions, and confounding biases induced by spurious correlations driven by domain-specific factors. Taking the above considerations into account, we propose CauseHSI, a causality-inspired framework that offers new insights into cross-scene HSI classification. CauseHSI consists of two key components: a Counterfactual Generation Module (CGM) that perturbs domain-specific factors to generate diverse counterfactual variants, simulating cross-domain interventions while preserving semantic consistency, and a Causal Disentanglement Module (CDM) that separates invariant causal semantics from spurious correlations through structured constraints under a structural causal model, ultimately guiding the model to focus on domain-invariant and generalizable representations. By aligning model learning with causal principles, CauseHSI enhances robustness against domain shifts. Extensive experiments on the Pavia, Houston, and HyRANK datasets demonstrate that CauseHSI outperforms existing DG methods.
Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
Open Access Article
Use of Patient-Specific 3D Models in Paediatric Surgery: Effect on Communication and Surgical Management
by Cécile O. Muller, Lydia Helbling, Theodoros Xydias, Jeanette Greiner, Valérie Oesch, Henrik Köhler, Tim Ohletz and Jatta Berberat
J. Imaging 2026, 12(2), 56; https://doi.org/10.3390/jimaging12020056 - 26 Jan 2026
Abstract
Children with rare tumours and malformations may benefit from innovative imaging, including patient-specific 3D models that can enhance communication and surgical planning. The primary aim was to evaluate the impact of patient-specific 3D models on communication with families. The secondary aims were to assess their influence on medical management and to establish an efficient post-processing workflow. From 2021 to 2024, we prospectively included patients aged 3 months to 18 years with rare tumours or malformations. Families completed questionnaires before and after the presentation of a 3D model generated from MRI sequences, including peripheral nerve tractography. Treating physicians completed a separate questionnaire before surgical planning. Analyses were performed in R. Among 21 patients, diagnoses included 11 tumours, 8 malformations, 1 trauma, and 1 pancreatic pseudo-cyst. Likert scale responses showed improved family understanding after viewing the 3D model (mean score 3.94 to 4.67) and a high overall evaluation (mean 4.61). Physicians also rated the models positively. An efficient image post-processing workflow was defined. Although manual 3D reconstruction remains time-consuming, these preliminary results show that colourful, patient-specific 3D models substantially improve family communication and support clinical decision-making. They also highlight the need to support the development of MRI-based automated segmentation software using deep neural networks that is clinically approved and usable in routine practice.
Full article
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)
Open Access Article
Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces
by Morgan Roy Cooper and Mike Busch
J. Imaging 2026, 12(2), 55; https://doi.org/10.3390/jimaging12020055 - 25 Jan 2026
Abstract
Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically for small neighborhood sizes. However, as the neighborhood size grows while the search effort remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances beyond a critical neighborhood size. Increasing index construction quality delays this failure, and scaling search effort proportionally with neighborhood size preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial.
Full article
(This article belongs to the Section Image and Video Processing)
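A small experiment in the spirit of this study can be set up with hnswlib and scikit-learn: compare exact and HNSW neighborhoods and measure their overlap, scaling the search parameter ef with k. The embedding dimensionality, index parameters, and ef/k ratio below are assumptions for illustration only, not the paper's settings.

```python
import numpy as np
import hnswlib
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb = rng.standard_normal((10_000, 512)).astype(np.float32)   # stand-in for image embeddings

k = 256
_, nn_exact = NearestNeighbors(n_neighbors=k).fit(emb).kneighbors(emb[:100])   # exact k-NN

index = hnswlib.Index(space="l2", dim=emb.shape[1])
index.init_index(max_elements=emb.shape[0], ef_construction=200, M=16)
index.add_items(emb, np.arange(emb.shape[0]))
index.set_ef(int(1.5 * k))            # scale search effort with neighborhood size (ratio assumed)
nn_ann, _ = index.knn_query(emb[:100], k=k)

overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(nn_exact, nn_ann)])
print(f"mean neighborhood overlap at k={k}: {overlap:.3f}")
```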
Open Access Article
A Robust Skeletonization Method for High-Density Fringe Patterns in Holographic Interferometry Based on Parametric Modeling and Strip Integration
by Sergey Lychev and Alexander Digilov
J. Imaging 2026, 12(2), 54; https://doi.org/10.3390/jimaging12020054 - 24 Jan 2026
Abstract
Accurate displacement field measurement by holographic interferometry requires robust analysis of high-density fringe patterns, which is hindered by speckle noise inherent in any interferogram, no matter how perfect. Conventional skeletonization methods, such as edge detection algorithms and active contour models, often fail under these conditions, producing fragmented and unreliable fringe contours. This paper presents a novel skeletonization procedure that simultaneously addresses three fundamental challenges: (1) topology preservation—by representing the fringe family within a physics-informed, finite-dimensional parametric subspace (e.g., Fourier-based contours), ensuring global smoothness, connectivity, and correct nesting of each fringe; (2) extreme noise robustness—through a robust strip integration functional that replaces noisy point sampling with Gaussian-weighted intensity averaging across a narrow strip, effectively suppressing speckle while yielding a smooth objective function suitable for gradient-based optimization; and (3) sub-pixel accuracy without phase extraction—leveraging continuous bicubic interpolation within a recursive quasi-optimization framework that exploits fringe similarity for precise and stable contour localization. The method’s performance is quantitatively validated on synthetic interferograms with controlled noise, demonstrating significantly lower error compared to baseline techniques. Practical utility is confirmed by successful processing of a real interferogram of a bent plate containing over 100 fringes, enabling precise displacement field reconstruction that closely matches independent theoretical modeling. The proposed procedure provides a reliable tool for processing challenging interferograms where traditional methods fail to deliver satisfactory results.
Full article
(This article belongs to the Special Issue Image Segmentation: Trends and Challenges)
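To make the strip-integration idea concrete, the sketch below averages image intensity across a narrow Gaussian-weighted strip normal to a candidate contour, using bicubic interpolation for sub-pixel sampling; the strip width, sampling density, and weighting are illustrative choices, not the paper's tuned functional.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def strip_integral(image, xs, ys, half_width=3.0, n_offsets=7, sigma=1.5):
    """Gaussian-weighted intensity average across a strip normal to the contour (xs, ys)."""
    # unit normals to the contour from finite differences
    dx, dy = np.gradient(xs), np.gradient(ys)
    norm = np.hypot(dx, dy) + 1e-12
    nx, ny = -dy / norm, dx / norm

    offsets = np.linspace(-half_width, half_width, n_offsets)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weights /= weights.sum()

    total = np.zeros_like(xs, dtype=float)
    for w, o in zip(weights, offsets):
        # bicubic (order-3) interpolation gives sub-pixel sampling along the offset curve
        total += w * map_coordinates(image, [ys + o * ny, xs + o * nx], order=3, mode="nearest")
    return float(total.mean())   # smooth objective value for this candidate contour
```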
Open Access Study Protocol
Non-Invasive Detection of Prostate Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI
by Baltasar Ramos, Cristian Garrido, Paulette Narváez, Santiago Gelerstein Claro, Haotian Li, Rafael Salvador, Constanza Vásquez-Venegas, Iván Gallegos, Víctor Castañeda, Cristian Acevedo, Gonzalo Cárdenas and Camilo G. Sotomayor
J. Imaging 2026, 12(1), 53; https://doi.org/10.3390/jimaging12010053 - 22 Jan 2026
Abstract
Prostate cancer (PCa) is the most common malignancy in men worldwide. Multiparametric MRI (mpMRI) improves the detection of clinically significant PCa (csPCa); however, it remains limited by false-positive findings and inter-observer variability. Time-dependent diffusion (TDD) MRI provides microstructural information that may enhance csPCa characterization beyond standard mpMRI. This prospective observational diagnostic accuracy study protocol describes the evaluation of PROS-TD-AI, an in-house developed AI workflow integrating TDD-derived metrics for zone-aware csPCa risk prediction. PROS-TD-AI will be compared with PI-RADS v2.1 in routine clinical imaging using MRI-targeted prostate biopsy as the reference standard.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Multi-Frequency GPR Image Fusion Based on Convolutional Sparse Representation to Enhance Road Detection
by Liang Fang, Feng Yang, Yuanjing Fang and Junli Nie
J. Imaging 2026, 12(1), 52; https://doi.org/10.3390/jimaging12010052 - 22 Jan 2026
Abstract
Single-frequency ground penetrating radar (GPR) systems are fundamentally constrained by a trade-off between penetration depth and resolution, alongside issues like narrow bandwidth and ringing interference. To break this limitation, we have developed a multi-frequency data fusion technique grounded in convolutional sparse representation (CSR). The proposed methodology involves spatially registering multi-frequency GPR signals and fusing them via a CSR framework, where the convolutional dictionaries are derived from simulated high-definition GPR data. Extensive evaluation using information entropy, average gradient, mutual information, and visual information fidelity demonstrates the superiority of our method over traditional fusion approaches (e.g., weighted average, PCA, 2D wavelets). Tests on simulated and real data confirm that our CSR-based fusion successfully synergizes the deep penetration of low frequencies with the fine resolution of high frequencies, leading to substantial gains in GPR image clarity and interpretability.
Full article
(This article belongs to the Section Image and Video Processing)
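Two of the fusion-quality measures cited above, information entropy and average gradient, are straightforward to compute; the sketch below uses their common definitions from the image-fusion literature (the paper's exact normalization may differ).

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy of the grey-level histogram (bits)."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def average_gradient(img):
    """Mean local gradient magnitude, a common sharpness proxy for fused images."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

fused = np.random.default_rng(0).integers(0, 256, size=(256, 256))  # placeholder fused B-scan
print(image_entropy(fused), average_gradient(fused))
```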
Open Access Article
Interpretable Diagnosis of Pulmonary Emphysema on Low-Dose CT Using ResNet Embeddings
by Talshyn Sarsembayeva, Madina Mansurova, Ainash Oshibayeva and Stepan Serebryakov
J. Imaging 2026, 12(1), 51; https://doi.org/10.3390/jimaging12010051 - 21 Jan 2026
Abstract
Accurate and interpretable detection of pulmonary emphysema on low-dose computed tomography (LDCT) remains a critical challenge for large-scale screening and population health studies. This work proposes a quality-controlled and interpretable deep learning pipeline for emphysema assessment using ResNet-152 embeddings. The pipeline integrates automated lung segmentation, quality-control filtering, and extraction of 2048-dimensional embeddings from mid-lung patches, followed by analysis using logistic regression, LASSO, and recursive feature elimination (RFE). The embeddings are further fused with quantitative CT (QCT) markers, including %LAA, Perc15, and total lung volume (TLV), to enhance robustness and interpretability. Bootstrapped validation demonstrates strong diagnostic performance (ROC-AUC = 0.996, PR-AUC = 0.962, balanced accuracy = 0.931) with low computational cost. The proposed approach shows that ResNet embeddings pretrained on CT data can be effectively reused without retraining for emphysema characterization, providing a reproducible and explainable framework suitable as a research and screening-support framework for population-level LDCT analysis.
Full article
(This article belongs to the Section Medical Imaging)
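The quantitative-CT markers fused with the embeddings have simple definitions; the sketch below computes %LAA (commonly thresholded at -950 HU, though the paper's exact threshold is not stated here) and Perc15 over the voxels inside a lung mask.

```python
import numpy as np

def qct_markers(lung_hu, laa_threshold=-950.0):
    """%LAA: share of lung voxels below the threshold; Perc15: 15th percentile of lung HU."""
    laa_percent = 100.0 * np.mean(lung_hu < laa_threshold)
    perc15 = float(np.percentile(lung_hu, 15))
    return laa_percent, perc15

lung_hu = np.random.default_rng(0).normal(-820.0, 90.0, size=200_000)   # synthetic lung voxels
print(qct_markers(lung_hu))
```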
Open Access Article
ADAM-Net: Anatomy-Guided Attentive Unsupervised Domain Adaptation for Joint MG Segmentation and MGD Grading
by Junbin Fang, Xuan He, You Jiang and Mini Han Wang
J. Imaging 2026, 12(1), 50; https://doi.org/10.3390/jimaging12010050 - 21 Jan 2026
Abstract
Meibomian gland dysfunction (MGD) is a leading cause of dry eye disease, assessable through gland atrophy degree. While deep learning (DL) has advanced meibomian gland (MG) segmentation and MGD classification, existing methods treat these tasks independently and suffer from domain shift across multi-center imaging devices. We propose ADAM-Net, an attention-guided unsupervised domain adaptation multi-task framework that jointly models MG segmentation and MGD classification. Our model introduces structure-aware multi-task learning and anatomy-guided attention to enhance feature sharing, suppress background noise, and improve glandular region perception. For the cross-domain tasks MGD-1K→{K5M, CR-2, LV II}, this study systematically evaluates the overall performance of ADAM-Net from multiple perspectives. The experimental results show that ADAM-Net achieves classification accuracies of 77.93%, 74.86%, and 81.77% on the target domains, significantly outperforming current mainstream unsupervised domain adaptation (UDA) methods. The F1-score and the Matthews correlation coefficient (MCC-score) indicate that the model maintains robust discriminative capability even under class-imbalanced scenarios. t-SNE visualizations further validate its cross-domain feature alignment capability. These demonstrate that ADAM-Net exhibits strong robustness and interpretability in multi-center scenarios and provide an effective solution for automated MGD assessment.
Full article
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)
News
6 November 2025
MDPI Launches the Michele Parrinello Award for Pioneering Contributions in Computational Physical Science
9 October 2025
Meet Us at the 3rd International Conference on AI Sensors and Transducers, 2–7 August 2026, Jeju, South Korea
Topics
Topic in
AI, Applied Sciences, Bioengineering, Healthcare, IJERPH, JCM, Clinics and Practice, J. Imaging
Artificial Intelligence in Public Health: Current Trends and Future Possibilities, 2nd Edition
Topic Editors: Daniele Giansanti, Giovanni Costantini
Deadline: 15 March 2026
Topic in
Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang
Deadline: 31 March 2026
Topic in
Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Topic in
Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye
Deadline: 31 May 2026
Special Issues
Special Issue in
J. Imaging
Computer Vision for Medical Image Analysis
Guest Editors: Rahman Attar, Le Zhang
Deadline: 15 February 2026
Special Issue in
J. Imaging
Emerging Technologies for Less Invasive Diagnostic Imaging
Guest Editors: Francesca Angelone, Noemi Pisani, Armando Ricciardi
Deadline: 28 February 2026
Special Issue in
J. Imaging
3D Image Processing: Progress and Challenges
Guest Editor: Chinthaka Dinesh
Deadline: 28 February 2026
Special Issue in
J. Imaging
Translational Preclinical Imaging: Techniques, Applications and Perspectives
Guest Editors: Sara Gargiulo, Sandra Albanese
Deadline: 31 March 2026


