- A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis
- Next-Generation Advances in Prostate Cancer Imaging and Artificial Intelligence Applications
- Classifying Sex from MSCT-Derived 3D Mandibular Models Using an Adapted PointNet++ Deep Learning Approach in a Croatian Population
- AIGD Era: From Fragment to One Piece
Journal Description
Journal of Imaging
Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques, published online monthly by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q2 (Imaging Science and Photographic Technology) / CiteScore - Q1 (Radiology, Nuclear Medicine and Imaging)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; acceptance to publication takes 3.6 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.3 (2024); 5-Year Impact Factor: 3.3 (2024)
Latest Articles
Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces
J. Imaging 2026, 12(2), 55; https://doi.org/10.3390/jimaging12020055 (registering DOI) - 25 Jan 2026
Abstract
Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically when the neighborhood size k is small. However, as k grows while the search effort remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances beyond a critical neighborhood size. Increasing index construction quality delays this failure, and scaling search effort proportionally with the neighborhood size preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial.
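As a rough illustration of the comparison described in this abstract, the sketch below builds exact k-NN neighborhoods with scikit-learn and approximate ones with an HNSW index (hnswlib), then measures neighborhood overlap; the libraries, parameter values (M, ef_construction, ef), and random embeddings are assumptions standing in for the paper's ResNet-50/STL-10 setup.

```python
# Sketch: compare exact k-NN and HNSW-based ANN neighborhoods on image embeddings.
# Library choice (hnswlib, scikit-learn) and parameter values are illustrative
# assumptions; the paper does not prescribe a specific implementation.
import numpy as np
import hnswlib
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 2048)).astype(np.float32)  # stand-in for ResNet-50 embeddings

k = 100
# Exact k-NN (exact by construction).
exact_idx = NearestNeighbors(n_neighbors=k).fit(emb).kneighbors(emb, return_distance=False)

# HNSW index: ef_construction controls build quality, ef controls search effort.
index = hnswlib.Index(space="l2", dim=emb.shape[1])
index.init_index(max_elements=emb.shape[0], ef_construction=200, M=16)
index.add_items(emb, np.arange(emb.shape[0]))
index.set_ef(max(k, 64))          # scaling ef with k is what preserves neighborhood geometry
ann_idx, ann_dist = index.knn_query(emb, k=k)

# Neighborhood overlap: average fraction of exact neighbors recovered by ANN.
overlap = np.mean([
    len(set(exact_idx[i]) & set(ann_idx[i])) / k for i in range(emb.shape[0])
])
print(f"mean neighborhood overlap at k={k}: {overlap:.3f}")
print(f"mean ANN neighbor distance: {ann_dist.mean():.3f}")
```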
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
A Robust Skeletonization Method for High-Density Fringe Patterns in Holographic Interferometry Based on Parametric Modeling and Strip Integration
by
Sergey Lychev and Alexander Digilov
J. Imaging 2026, 12(2), 54; https://doi.org/10.3390/jimaging12020054 (registering DOI) - 24 Jan 2026
Abstract
Accurate displacement field measurement by holographic interferometry requires robust analysis of high-density fringe patterns, which is hindered by speckle noise inherent in any interferogram, no matter how perfect. Conventional skeletonization methods, such as edge detection algorithms and active contour models, often fail under these conditions, producing fragmented and unreliable fringe contours. This paper presents a novel skeletonization procedure that simultaneously addresses three fundamental challenges: (1) topology preservation—by representing the fringe family within a physics-informed, finite-dimensional parametric subspace (e.g., Fourier-based contours), ensuring global smoothness, connectivity, and correct nesting of each fringe; (2) extreme noise robustness—through a robust strip integration functional that replaces noisy point sampling with Gaussian-weighted intensity averaging across a narrow strip, effectively suppressing speckle while yielding a smooth objective function suitable for gradient-based optimization; and (3) sub-pixel accuracy without phase extraction—leveraging continuous bicubic interpolation within a recursive quasi-optimization framework that exploits fringe similarity for precise and stable contour localization. The method’s performance is quantitatively validated on synthetic interferograms with controlled noise, demonstrating significantly lower error compared to baseline techniques. Practical utility is confirmed by successful processing of a real interferogram of a bent plate containing over 100 fringes, enabling precise displacement field reconstruction that closely matches independent theoretical modeling. The proposed procedure provides a reliable tool for processing challenging interferograms where traditional methods fail to deliver satisfactory results.
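The strip-integration idea, Gaussian-weighted intensity averaging across a narrow band around a parametric contour evaluated with bicubic interpolation, can be sketched as follows; the Fourier contour model, strip width, and weighting are illustrative assumptions rather than the authors' exact functional.

```python
# Sketch: Gaussian-weighted strip integration of interferogram intensity along a
# parametric contour, evaluated with bicubic interpolation. Contour model, strip
# width and weights are illustrative assumptions, not the authors' exact functional.
import numpy as np
from scipy.ndimage import map_coordinates

def fourier_contour(coeffs, t):
    """Closed contour (x(t), y(t)) from complex Fourier coefficients c_m, m = -M..M."""
    M = (len(coeffs) - 1) // 2
    z = sum(c * np.exp(2j * np.pi * m * t) for m, c in zip(range(-M, M + 1), coeffs))
    return z.real, z.imag

def strip_integral(image, coeffs, half_width=3.0, n_t=400, n_s=11, sigma=1.5):
    """Average intensity over a narrow strip around the contour (a smooth objective)."""
    t = np.linspace(0.0, 1.0, n_t, endpoint=False)
    x, y = fourier_contour(coeffs, t)
    # Unit normals from the tangent of the parametric curve.
    dx, dy = np.gradient(x), np.gradient(y)
    norm = np.hypot(dx, dy) + 1e-12
    nx, ny = -dy / norm, dx / norm
    s = np.linspace(-half_width, half_width, n_s)            # offsets across the strip
    w = np.exp(-0.5 * (s / sigma) ** 2); w /= w.sum()        # Gaussian weights
    xs = x[None, :] + s[:, None] * nx[None, :]
    ys = y[None, :] + s[:, None] * ny[None, :]
    # Bicubic (order=3) interpolation; coordinates are (row, col) = (y, x).
    vals = map_coordinates(image, [ys.ravel(), xs.ravel()], order=3, mode="nearest")
    vals = vals.reshape(n_s, n_t)
    return float((w[:, None] * vals).sum(axis=0).mean())
```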
Full article
(This article belongs to the Special Issue Image Segmentation: Trends and Challenges)
Open Access Study Protocol
Non-Invasive Detection of Prostate Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI
by
Baltasar Ramos, Cristian Garrido, Paulette Narváez, Santiago Gelerstein Claro, Haotian Li, Rafael Salvador, Constanza Vásquez-Venegas, Iván Gallegos, Víctor Castañeda, Cristian Acevedo, Gonzalo Cárdenas and Camilo G. Sotomayor
J. Imaging 2026, 12(1), 53; https://doi.org/10.3390/jimaging12010053 - 22 Jan 2026
Abstract
Prostate cancer (PCa) is the most common malignancy in men worldwide. Multiparametric MRI (mpMRI) improves the detection of clinically significant PCa (csPCa); however, it remains limited by false-positive findings and inter-observer variability. Time-dependent diffusion (TDD) MRI provides microstructural information that may enhance csPCa characterization beyond standard mpMRI. This prospective observational diagnostic accuracy study protocol describes the evaluation of PROS-TD-AI, an in-house developed AI workflow integrating TDD-derived metrics for zone-aware csPCa risk prediction. PROS-TD-AI will be compared with PI-RADS v2.1 in routine clinical imaging using MRI-targeted prostate biopsy as the reference standard.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Multi-Frequency GPR Image Fusion Based on Convolutional Sparse Representation to Enhance Road Detection
by
Liang Fang, Feng Yang, Yuanjing Fang and Junli Nie
J. Imaging 2026, 12(1), 52; https://doi.org/10.3390/jimaging12010052 - 22 Jan 2026
Abstract
Single-frequency ground penetrating radar (GPR) systems are fundamentally constrained by a trade-off between penetration depth and resolution, alongside issues like narrow bandwidth and ringing interference. To break this limitation, we have developed a multi-frequency data fusion technique grounded in convolutional sparse representation (CSR). The proposed methodology involves spatially registering multi-frequency GPR signals and fusing them via a CSR framework, where the convolutional dictionaries are derived from simulated high-definition GPR data. Extensive evaluation using information entropy, average gradient, mutual information, and visual information fidelity demonstrates the superiority of our method over traditional fusion approaches (e.g., weighted average, PCA, 2D wavelets). Tests on simulated and real data confirm that our CSR-based fusion successfully synergizes the deep penetration of low frequencies with the fine resolution of high frequencies, leading to substantial gains in GPR image clarity and interpretability.
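Two of the reference-free fusion metrics mentioned above, information entropy and average gradient, are easy to state concretely; the sketch below uses common textbook definitions, which may differ from the paper's exact normalization.

```python
# Sketch: two fusion-quality metrics computed on an 8-bit image. Definitions follow
# common conventions in the image-fusion literature; the paper's exact normalization
# may differ.
import numpy as np

def information_entropy(img_u8):
    """Shannon entropy (bits) of the grey-level histogram."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def average_gradient(img):
    """Mean local gradient magnitude, a proxy for sharpness/detail."""
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]   # crop both gradients to a common shape
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```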
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
Interpretable Diagnosis of Pulmonary Emphysema on Low-Dose CT Using ResNet Embeddings
by
Talshyn Sarsembayeva, Madina Mansurova, Ainash Oshibayeva and Stepan Serebryakov
J. Imaging 2026, 12(1), 51; https://doi.org/10.3390/jimaging12010051 - 21 Jan 2026
Abstract
Accurate and interpretable detection of pulmonary emphysema on low-dose computed tomography (LDCT) remains a critical challenge for large-scale screening and population health studies. This work proposes a quality-controlled and interpretable deep learning pipeline for emphysema assessment using ResNet-152 embeddings. The pipeline integrates automated lung segmentation, quality-control filtering, and extraction of 2048-dimensional embeddings from mid-lung patches, followed by analysis using logistic regression, LASSO, and recursive feature elimination (RFE). The embeddings are further fused with quantitative CT (QCT) markers, including %LAA, Perc15, and total lung volume (TLV), to enhance robustness and interpretability. Bootstrapped validation demonstrates strong diagnostic performance (ROC-AUC = 0.996, PR-AUC = 0.962, balanced accuracy = 0.931) with low computational cost. The proposed approach shows that ResNet embeddings pretrained on CT data can be effectively reused without retraining for emphysema characterization, providing a reproducible and explainable framework for research and screening support in population-level LDCT analysis.
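A minimal sketch of the embedding-plus-feature-selection stage is given below; it substitutes ImageNet-pretrained ResNet-152 weights, random patches, and placeholder labels for the CT-pretrained backbone, mid-lung patches, and clinical labels used in the study, and the RFE settings are assumptions.

```python
# Sketch: 2048-D ResNet-152 embeddings fed to an RFE-wrapped logistic regression.
# ImageNet weights and random inputs stand in purely to illustrate the pipeline shape.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

backbone = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()                      # expose the 2048-D pooled embedding
backbone.eval()

patches = torch.rand(32, 3, 224, 224)            # stand-in for mid-lung patches
with torch.no_grad():
    emb = backbone(patches).numpy()              # shape (32, 2048)

labels = np.arange(len(emb)) % 2                 # placeholder binary labels

# Recursive feature elimination keeps a compact, interpretable subset of embedding dims.
clf = RFE(LogisticRegression(max_iter=2000), n_features_to_select=64, step=256)
clf.fit(emb, labels)
print("selected dims:", int(clf.support_.sum()))
```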
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
ADAM-Net: Anatomy-Guided Attentive Unsupervised Domain Adaptation for Joint MG Segmentation and MGD Grading
by
Junbin Fang, Xuan He, You Jiang and Mini Han Wang
J. Imaging 2026, 12(1), 50; https://doi.org/10.3390/jimaging12010050 - 21 Jan 2026
Abstract
Meibomian gland dysfunction (MGD) is a leading cause of dry eye disease and is assessable through the degree of gland atrophy. While deep learning (DL) has advanced meibomian gland (MG) segmentation and MGD classification, existing methods treat these tasks independently and suffer from domain shift across multi-center imaging devices. We propose ADAM-Net, an attention-guided unsupervised domain adaptation multi-task framework that jointly models MG segmentation and MGD classification. Our model introduces structure-aware multi-task learning and anatomy-guided attention to enhance feature sharing, suppress background noise, and improve glandular region perception. For the cross-domain tasks MGD-1K→{K5M, CR-2, LV II}, this study systematically evaluates the overall performance of ADAM-Net from multiple perspectives. The experimental results show that ADAM-Net achieves classification accuracies of 77.93%, 74.86%, and 81.77% on the target domains, significantly outperforming current mainstream unsupervised domain adaptation (UDA) methods. The F1-score and the Matthews correlation coefficient (MCC) indicate that the model maintains robust discriminative capability even under class-imbalanced scenarios. t-SNE visualizations further validate its cross-domain feature alignment capability. These results demonstrate that ADAM-Net exhibits strong robustness and interpretability in multi-center scenarios and provide an effective solution for automated MGD assessment.
Full article
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)
Open Access Article
Chest Radiography Optimization: Identifying the Optimal kV for Image Quality in a Phantom Study
by
Ioannis Antonakos, Kyriakos Kokkinogoulis, Maria Giannopoulou and Efstathios P. Efstathopoulos
J. Imaging 2026, 12(1), 49; https://doi.org/10.3390/jimaging12010049 - 21 Jan 2026
Abstract
Chest radiography remains one of the most frequently performed imaging examinations, highlighting the need for optimization of acquisition parameters to balance image quality and radiation dose. This study presents a phantom-based quantitative evaluation of chest radiography acquisition settings using a digital radiography system (AGFA DR 600). Measurements were performed at three tube voltage levels across simulated patient-equivalent thicknesses generated using PMMA slabs, with a Leeds TOR 15FG image quality phantom positioned centrally in the imaging setup. Image quality was quantitatively assessed using signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), which were calculated from mean pixel values obtained from repeated acquisitions. Radiation exposure was evaluated through estimation of entrance surface dose (ESD). The analysis demonstrated that dose-normalized performance metrics favored intermediate tube voltages for slim and average patient-equivalent thicknesses, while higher voltages were required to maintain image quality in obese-equivalent conditions. Overall, image quality and dose were found to be strongly dependent on the combined selection of tube voltage and phantom thickness. These findings indicate that modest adjustments to tube voltage selection may improve the balance between image quality and radiation dose in chest radiography. Nevertheless, as the present work is based on phantom measurements, further validation using clinical images and observer-based studies is required before any modification of routine radiographic practice.
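The ROI-based metrics behind this analysis reduce to simple statistics; the sketch below computes SNR and CNR from signal and background ROIs plus a dose-normalized figure of merit (CNR²/ESD), the latter being a common convention assumed here rather than quoted from the paper.

```python
# Sketch: ROI-based image-quality metrics of the kind used above. The figure of
# merit CNR^2 / ESD is an assumed, commonly used dose-normalization convention.
import numpy as np

def snr_cnr(roi_signal, roi_background):
    """SNR and CNR from mean/std of pixel values in signal and background ROIs."""
    mu_s, mu_b = roi_signal.mean(), roi_background.mean()
    sigma_b = roi_background.std(ddof=1)
    return mu_s / sigma_b, abs(mu_s - mu_b) / sigma_b

def dose_normalized_fom(cnr, esd_uGy):
    """Figure of merit rewarding contrast obtained per unit entrance surface dose."""
    return cnr ** 2 / esd_uGy

rng = np.random.default_rng(1)
roi_s = rng.normal(520, 12, size=(40, 40))   # synthetic detail ROI
roi_b = rng.normal(480, 12, size=(40, 40))   # synthetic background ROI
snr, cnr = snr_cnr(roi_s, roi_b)
print(f"SNR={snr:.1f}  CNR={cnr:.1f}  FOM@ESD=120 uGy: {dose_normalized_fom(cnr, 120):.3f}")
```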
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
Graph-Enhanced Expectation Maximization for Emission Tomography
by
Ryosuke Kasai and Hideki Otsuka
J. Imaging 2026, 12(1), 48; https://doi.org/10.3390/jimaging12010048 - 20 Jan 2026
Abstract
Emission tomography, including single-photon emission computed tomography (SPECT), requires image reconstruction from noisy and incomplete projection data. The maximum-likelihood expectation maximization (MLEM) algorithm is widely used due to its statistical foundation and non-negativity preservation, but it is highly sensitive to noise, particularly in low-count conditions. Although total variation (TV) regularization can reduce noise, it often oversmooths structural details and requires careful parameter tuning. We propose a Graph-Enhanced Expectation Maximization (GREM) algorithm that incorporates graph-based neighborhood information into an MLEM-type multiplicative reconstruction scheme. The method is motivated by a penalized formulation combining a Kullback–Leibler divergence term with a graph Laplacian regularization term, promoting local structural consistency while preserving edges. The resulting update retains the multiplicative structure of MLEM and preserves the non-negativity of the image estimates. Numerical experiments using synthetic phantoms under multiple noise levels, as well as clinical 99mTc-GSA liver SPECT data, demonstrate that GREM consistently outperforms conventional MLEM and TV-regularized MLEM in terms of PSNR and MS-SSIM. These results indicate that GREM provides an effective and practical approach for edge-preserving noise suppression in emission tomography without relying on external training data.
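For orientation, the sketch below shows the plain multiplicative MLEM update with an optional graph-neighborhood smoothing of the estimate; it only illustrates the non-negativity-preserving multiplicative structure that GREM retains, not the authors' exact penalized update.

```python
# Sketch: multiplicative MLEM with an optional graph-neighborhood smoothing step.
# Illustrates only the multiplicative, non-negativity-preserving structure; the
# authors' GREM update (KL + graph-Laplacian penalty) is not reproduced here.
import numpy as np

def mlem(A, y, n_iter=50, W=None, beta=0.0, eps=1e-12):
    """
    A : (n_bins, n_pixels) system matrix, y : (n_bins,) measured counts,
    W : (n_pixels, n_pixels) row-normalized adjacency of a pixel graph (optional).
    """
    x = np.ones(A.shape[1])
    sens = A.sum(axis=0) + eps                 # sensitivity image A^T 1
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)              # data-fit ratio in projection space
        x = x * (A.T @ ratio) / sens           # classic multiplicative MLEM step
        if W is not None and beta > 0:
            # Pull each pixel toward its graph-neighborhood average; a crude
            # stand-in for the graph-Laplacian regularization, keeps x >= 0.
            x = (1.0 - beta) * x + beta * (W @ x)
    return x
```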
Full article
(This article belongs to the Special Issue Advances in Photoacoustic Imaging: Tomography and Applications)
Open Access Article
Automatic Retinal Nerve Fiber Segmentation and the Influence of Intersubject Variability in Ocular Parameters on the Mapping of Retinal Sites to the Pointwise Orientation Angles
by
Diego Luján Villarreal and Adriana Leticia Vera-Tizatl
J. Imaging 2026, 12(1), 47; https://doi.org/10.3390/jimaging12010047 - 19 Jan 2026
Abstract
The current study investigates the influence of intersubject variability in ocular characteristics on the mapping of visual field (VF) sites to the pointwise directional angles in retinal nerve fiber layer (RNFL) bundle traces. In addition, the performance of the mapping of VF sites to the optic nerve head (ONH) was compared against ground-truth baselines. Fundus photographs of 546 eyes of 546 healthy subjects (with no history of ocular disease or diabetic retinopathy) were enhanced digitally and RNFL bundle traces were segmented based on the Personalized Estimated Segmentation (PES) algorithm’s core technique. A 24-2 VF grid pattern was overlaid onto the photographs in order to relate VF test points to intersecting RNFL bundles. The PES algorithm effectively traced RNFL bundles in fundus images, achieving an average accuracy of 97.6% relative to the Jansonius map through the application of 10th-order Bezier curves. The PES algorithm assembled an average of 4726 RNFL bundles per fundus image based on 4975 sampling points, obtaining a total of 2,580,505 RNFL bundles based on 2,716,321 sampling points. The influence of ocular parameters could be evaluated for 34 out of 52 VF locations. The ONH-fovea angle and the ONH position in relation to the fovea were the most prominent predictors for variations in the mapping of retinal locations to the pointwise directional angle (p < 0.001). The variation explained by the model (R2 value) ranges from 27.6% for visual field location 15 to 77.8% in location 22, with a mean of 56%. Significant individual variability was found in the mapping of VF sites to the ONH, with a mean standard deviation (95% limit) of 16.55° (median 17.68°) for 50 out of 52 VF locations, ranging from less than 1° to 44.05°. The mean entry angles differed from previous baselines by a range of less than 1° to 23.9° (average difference of 10.6° ± 5.53°), with an RMSE of 11.94.
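A 10th-order Bezier trace is fully determined by 11 control points; the minimal evaluator below uses the Bernstein basis with placeholder control points and omits the fitting step that the PES algorithm performs on segmented bundle pixels.

```python
# Sketch: evaluating an n-th order Bezier curve (n=10 above) from control points
# via the Bernstein basis. Control points here are placeholders; fitting them to
# segmented RNFL bundle pixels is a separate least-squares step.
import numpy as np
from scipy.special import comb

def bezier(control_pts, n_samples=200):
    """control_pts: (n+1, 2) array -> sampled curve of shape (n_samples, 2)."""
    n = len(control_pts) - 1
    t = np.linspace(0.0, 1.0, n_samples)
    basis = np.stack(
        [comb(n, i) * t ** i * (1 - t) ** (n - i) for i in range(n + 1)],
        axis=1,
    )                                   # (n_samples, n+1) Bernstein polynomials
    return basis @ np.asarray(control_pts, dtype=float)

ctrl = np.column_stack([np.linspace(0, 100, 11), 20 * np.sin(np.linspace(0, np.pi, 11))])
curve = bezier(ctrl)                    # 10th-order curve defined by 11 control points
```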
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
A Dual Stream Deep Learning Framework for Alzheimer’s Disease Detection Using MRI Sonification
by
Nadia A. Mohsin and Mohammed H. Abdul Ameer
J. Imaging 2026, 12(1), 46; https://doi.org/10.3390/jimaging12010046 - 15 Jan 2026
Abstract
Alzheimer’s Disease (AD) is a progressive brain disorder that affects millions of individuals worldwide. It causes gradual damage to brain cells, leading to memory loss and cognitive dysfunction. Although Magnetic Resonance Imaging (MRI) is widely used in AD diagnosis, existing studies rely solely on visual representations, leaving alternative features unexplored. The objective of this study is to explore whether MRI sonification can provide complementary diagnostic information when combined with conventional image-based methods. In this study, we propose a novel dual-stream multimodal framework that integrates 2D MRI slices with their corresponding audio representations. MRI images are transformed into audio signals using multi-scale, multi-orientation Gabor filtering, followed by a Hilbert space-filling curve to preserve spatial locality. The image and sound modalities are processed using a lightweight CNN and YAMNet, respectively, and then fused via logistic regression. The multimodal framework achieved its highest accuracy in distinguishing AD from Cognitively Normal (CN) subjects at 98.2%, with 94% for AD vs. Mild Cognitive Impairment (MCI) and 93.2% for MCI vs. CN. This work provides a new perspective and highlights the potential of audio transformation of imaging data for feature extraction and classification.
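The sonification front end, Gabor filtering followed by a Hilbert space-filling traversal that keeps neighboring pixels close in the resulting 1-D sequence, can be sketched as below; the filter settings and image size are assumptions, and the audio rendering and YAMNet stage are omitted.

```python
# Sketch: turning an MRI slice into a 1-D signal by Gabor filtering followed by a
# Hilbert-curve traversal (spatially adjacent pixels stay close in the sequence).
# Filter settings are illustrative; the audio rendering step is omitted.
import numpy as np
from skimage.filters import gabor

def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve of given order to (x, y) on a 2**order grid."""
    x = y = 0
    s, t = 1, d
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def sonify(slice_2d, order=8, frequency=0.15, theta=0.0):
    """Gabor magnitude response flattened along a Hilbert curve into a 1-D signal."""
    real, imag = gabor(slice_2d, frequency=frequency, theta=theta)
    mag = np.hypot(real, imag)
    n = 1 << order                                   # assumes slice resized to n x n
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    return np.array([mag[y, x] for x, y in coords])

signal = sonify(np.random.rand(256, 256))            # 256 = 2**8 matches order=8
```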
Full article
(This article belongs to the Section AI in Imaging)
Open Access Article
A Cross-Device and Cross-OS Benchmark of Modern Web Animation Systems
by
Tajana Koren Ivančević, Trpimir Jeronim Ježić and Nikolina Stanić Loknar
J. Imaging 2026, 12(1), 45; https://doi.org/10.3390/jimaging12010045 - 15 Jan 2026
Abstract
Although modern web technologies increasingly rely on high-performance rendering methods to support rich visual content across a range of devices and operating systems, the field remains significantly under-researched. The performance of animated visual elements is affected by numerous factors, including browsers, operating systems, GPU acceleration, scripting load, and device limitations. This study systematically evaluates animation performance across multiple platforms using a unified set of circle-based animations implemented with eight web-compatible technologies, including HTML, CSS, SVG, JavaScript, Canvas, and WebGL. Animations were evaluated under controlled feature combinations involving random motion, distance, colour variation, blending, and transformations, with object counts ranging from 10 to 10,000. Measurements were conducted on desktop operating systems (Windows, macOS, Linux) and mobile platforms (iOS, Android), using CPU utilisation, GPU memory usage, and frame rate (FPS) as key metrics. Results show that DOM-based approaches maintain stable performance at 100 animated objects but exhibit notable degradation by 500 objects. Canvas-based rendering extends usability to higher object counts, while WebGL demonstrates the most stable performance at large scales (5000–10,000 objects). These findings provide concrete guidance for selecting appropriate animation technologies based on scene complexity and target platform.
Full article
(This article belongs to the Section Visualization and Computer Graphics)
Open Access Article
A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer
by
Shasha Tian, Adisorn Sirikham, Jessada Konpang and Chuyang Wang
J. Imaging 2026, 12(1), 44; https://doi.org/10.3390/jimaging12010044 - 14 Jan 2026
Abstract
Underwater optical images are the primary carriers of underwater scene information, playing a crucial role in marine resource exploration, underwater environmental monitoring, and engineering inspection. However, wavelength-dependent absorption and scattering severely deteriorate underwater images, leading to reduced contrast, chromatic distortions, and loss of structural details. To address these issues, we propose a U-shaped underwater image enhancement framework that integrates Swin-Transformer blocks with lightweight attention and residual modules. A Dual-Window Multi-Head Self-Attention (DWMSA) in the bottleneck models long-range context while preserving fine local structure. A Global-Aware Attention Map (GAMP) adaptively re-weights channels and spatial locations to focus on severely degraded regions. A Feature-Augmentation Residual Network (FARN) stabilizes deep training and emphasizes texture and color fidelity. Trained with a combination of Charbonnier, perceptual, and edge losses, our method achieves state-of-the-art results in PSNR and SSIM, the lowest LPIPS, and improvements in UIQM and UCIQE on the UFO-120 and EUVP datasets, with average metrics of PSNR 29.5 dB, SSIM 0.94, LPIPS 0.17, UIQM 3.62, and UCIQE 0.59. Qualitative results show reduced color cast, restored contrast, and sharper details. Code, weights, and evaluation scripts will be released to support reproducibility.
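Of the training objective, the Charbonnier and edge terms are simple to write down; the sketch below uses standard formulations with an assumed weighting and omits the perceptual (feature-space) term.

```python
# Sketch: Charbonnier and edge (gradient-difference) loss terms. The perceptual
# (VGG-feature) term and the actual loss weights are omitted / assumed.
import torch

def charbonnier(pred, target, eps=1e-3):
    """Smooth L1-like penalty: sqrt((x - y)^2 + eps^2), robust to outliers."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def edge_loss(pred, target):
    """Charbonnier distance between horizontal/vertical image gradients."""
    def grads(x):
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    pgx, pgy = grads(pred)
    tgx, tgy = grads(target)
    return charbonnier(pgx, tgx) + charbonnier(pgy, tgy)

pred, target = torch.rand(2, 3, 128, 128), torch.rand(2, 3, 128, 128)
loss = charbonnier(pred, target) + 0.05 * edge_loss(pred, target)   # weight is an assumption
```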
Full article
(This article belongs to the Special Issue Underwater Imaging (2nd Edition))
Open Access Article
FF-Mamba-YOLO: An SSM-Based Benchmark for Forest Fire Detection in UAV Remote Sensing Images
by
Binhua Guo, Dinghui Liu, Zhou Shen and Tiebin Wang
J. Imaging 2026, 12(1), 43; https://doi.org/10.3390/jimaging12010043 - 13 Jan 2026
Abstract
Timely and accurate detection of forest fires through unmanned aerial vehicle (UAV) remote sensing target detection technology is of paramount importance. However, multiscale targets and complex environmental interference in UAV remote sensing images pose significant challenges during detection tasks. To address these obstacles, this paper presents FF-Mamba-YOLO, a novel framework based on the principles of Mamba and YOLO (You Only Look Once) that leverages innovative modules and architectures to overcome these limitations. Specifically, we introduce MFEBlock and MFFBlock based on state space models (SSMs) in the backbone and neck parts of the network, respectively, enabling the model to effectively capture global dependencies. Second, we construct CFEBlock, a module that performs feature enhancement before SSM processing, improving local feature processing capabilities. Furthermore, we propose MGBlock, which adopts a dynamic gating mechanism, enhancing the model’s adaptive processing capabilities and robustness. Finally, we enhance the structure of Path Aggregation Feature Pyramid Network (PAFPN) to improve feature fusion quality and introduce DySample to enhance image resolution without significantly increasing computational costs. Experimental results on our self-constructed forest fire image dataset demonstrate that the model achieves 67.4% mAP@50, 36.3% mAP@50:95, and 64.8% precision, outperforming previous state-of-the-art methods. These results highlight the potential of FF-Mamba-YOLO in forest fire monitoring.
Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Open Access Article
GLCN: Graph-Aware Locality-Enhanced Cross-Modality Re-ID Network
by
Junjie Cao, Yuhang Yu, Rong Rong and Xing Xie
J. Imaging 2026, 12(1), 42; https://doi.org/10.3390/jimaging12010042 - 13 Jan 2026
Abstract
Cross-modality person re-identification faces challenges such as illumination discrepancies, local occlusions, and inconsistent modality structures, leading to misalignment and sensitivity issues. We propose GLCN, a framework that addresses these problems by enhancing representation learning through locality enhancement, cross-modality structural alignment, and intra-modality compactness. Key components include the Locality-Preserved Cross-branch Fusion (LPCF) module, which combines Local–Positional–Channel Gating (LPCG) for local region and positional sensitivity; Cross-branch Context Interpolated Attention (CCIA) for stable cross-branch consistency; and Graph-Enhanced Center Geometry Alignment (GE-CGA), which aligns class-center similarity structures across modalities to preserve category-level relationships. We also introduce Intra-Modal Prototype Discrepancy Mining Loss (IPDM-Loss) to reduce intra-class variance and improve inter-class separation, thereby creating more compact identity structures in both RGB and IR spaces. Extensive experiments on SYSU-MM01, RegDB, and other benchmarks demonstrate the effectiveness of our approach.
Full article
(This article belongs to the Special Issue Infrared Image Processing with Artificial Intelligence: Progress and Challenges)
Open Access Article
Calibrated Transformer Fusion for Dual-View Low-Energy CESM Classification
by
Ahmed A. H. Alkurdi and Amira Bibo Sallow
J. Imaging 2026, 12(1), 41; https://doi.org/10.3390/jimaging12010041 - 13 Jan 2026
Abstract
Contrast-enhanced spectral mammography (CESM) provides low-energy images acquired in standard craniocaudal (CC) and mediolateral oblique (MLO) views, and clinical interpretation relies on integrating both views. This study proposes a dual-view classification framework that combines deep CNN feature extraction with transformer-based fusion for breast-side classification using low-energy (DM) images from CESM acquisitions (Normal vs. Tumorous; benign and malignant merged). The evaluation was conducted using 5-fold stratified group cross-validation with patient-level grouping to prevent leakage across folds. The final configuration (Model E) integrates dual-backbone feature extraction, transformer fusion, MC-dropout inference for uncertainty estimation, and post hoc logistic calibration. Across the five held-out test folds, Model E achieved a mean accuracy of 96.88% ± 2.39% and a mean F1-score of 97.68% ± 1.66%. The mean ROC-AUC and PR-AUC were 0.9915 ± 0.0098 and 0.9968 ± 0.0029, respectively. Probability quality was supported by a mean Brier score of 0.0236 ± 0.0145 and a mean expected calibration error (ECE) of 0.0334 ± 0.0171. An ablation study (Models A–E) was also reported to quantify the incremental contribution of dual-view input, transformer fusion, and uncertainty calibration. Within the limits of this retrospective single-center setting, these results suggest that dual-view transformer fusion can provide strong discrimination while also producing calibrated probabilities and uncertainty outputs that are relevant for decision support.
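The reported probability-quality metrics can be computed as below; the sketch uses the standard Brier score and a 15-bin expected calibration error for binary outputs, with the binning scheme as an assumption rather than the paper's exact protocol.

```python
# Sketch: Brier score and expected calibration error (ECE) for binary predictions.
# The 15 equal-width confidence bins are an assumption, not the paper's protocol.
import numpy as np

def brier_score(p, y):
    """Mean squared error between predicted probability p and label y in {0, 1}."""
    return float(np.mean((p - y) ** 2))

def expected_calibration_error(p, y, n_bins=15):
    """Weighted |accuracy - confidence| gap over confidence bins of the predicted class."""
    conf = np.where(p >= 0.5, p, 1.0 - p)            # confidence of the predicted class
    pred = (p >= 0.5).astype(int)
    correct = (pred == y).astype(float)
    bins = np.linspace(0.5, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf >= lo) & (conf < hi) if hi < 1.0 else (conf >= lo)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

p = np.array([0.92, 0.15, 0.70, 0.40, 0.88]); y = np.array([1, 0, 1, 1, 1])
print(brier_score(p, y), expected_calibration_error(p, y))
```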
Full article
(This article belongs to the Topic Transformer and Deep Learning Applications in Image Processing)
Open Access Article
A Dual-UNet Diffusion Framework for Personalized Panoramic Generation
by
Jing Shen, Leigang Huo, Chunlei Huo and Shiming Xiang
J. Imaging 2026, 12(1), 40; https://doi.org/10.3390/jimaging12010040 - 11 Jan 2026
Abstract
While text-to-image and customized generation methods demonstrate strong capabilities in single-image generation, they fall short in supporting immersive applications that require coherent 360° panoramas. Conversely, existing panorama generation models lack customization capabilities. In panoramic scenes, reference objects often appear as minor background elements and may be multiple in number, while reference images across different views exhibit weak correlations. To address these challenges, we propose a diffusion-based framework for customized multi-view image generation. Our approach introduces a decoupled feature injection mechanism within a dual-UNet architecture to handle weakly correlated reference images, effectively integrating spatial information by concurrently feeding both reference images and noise into the denoising branch. A hybrid attention mechanism enables deep fusion of reference features and multi-view representations. Furthermore, a data augmentation strategy facilitates viewpoint-adaptive pose adjustments, and panoramic coordinates are employed to guide multi-view attention. The experimental results demonstrate our model’s effectiveness in generating coherent, high-quality customized multi-view images.
Full article
(This article belongs to the Section AI in Imaging)
Open Access Article
Self-Supervised Learning of Deep Embeddings for Classification and Identification of Dental Implants
by
Amani Almalki, Abdulrahman Almalki and Longin Jan Latecki
J. Imaging 2026, 12(1), 39; https://doi.org/10.3390/jimaging12010039 - 9 Jan 2026
Abstract
This study proposes an automated system using deep learning-based object detection to identify implant systems, leveraging recent progress in self-supervised learning, specifically masked image modeling (MIM). We advocate for self-pre-training, emphasizing its advantages when acquiring suitable pre-training data is challenging. The proposed Masked Deep Embedding (MDE) pre-training method, extending the masked autoencoder (MAE) transformer, significantly enhances dental implant detection performance compared to baselines. Specifically, the proposed method achieves a best detection performance of AP = 96.1, outperforming supervised ViT and MAE baselines by up to +2.9 AP. In addition, we address the absence of a comprehensive dataset for implant design, enhancing an existing dataset under dental expert supervision. This augmentation includes annotations for implant design, such as coronal, middle, and apical parts, resulting in a unique Implant Design Dataset (IDD). The contributions encompass employing self-supervised learning for limited dental radiograph data, replacing MAE’s patch reconstruction with patch embeddings, achieving substantial performance improvement in implant detection, and expanding possibilities through the labeling of implant design. This study paves the way for AI-driven solutions in implant dentistry, providing valuable tools for dentists and patients facing implant-related challenges.
Full article
(This article belongs to the Section Medical Imaging)
Open Access Article
SCT-Diff: Seamless Contextual Tracking via Diffusion Trajectory
by
Guohao Nie, Xingmei Wang, Debin Zhang and He Wang
J. Imaging 2026, 12(1), 38; https://doi.org/10.3390/jimaging12010038 - 9 Jan 2026
Abstract
Existing detection-based trackers exploit temporal contexts by updating appearance models or modeling target motion. However, the sequential one-shot integration of temporal priors risks amplifying error accumulation, as frame-level template matching restricts comprehensive spatiotemporal analysis. To address this, we propose SCT-Diff, a video-level framework that holistically estimates target trajectories. Specifically, SCT-Diff processes video clips globally via a diffusion model to incorporate bidirectional spatiotemporal awareness, where reverse diffusion steps progressively refine noisy trajectory proposals into optimal predictions. Crucially, SCT-Diff enables iterative correction of historical trajectory hypotheses by observing future contexts within a sliding time window. This closed-loop feedback from future frames preserves temporal consistency and breaks the error propagation chain under complex appearance variations. For joint modeling of appearance and motion dynamics, we formulate trajectories as unified discrete token sequences. The designed Mamba-based expert decoder bridges visual features with language-formulated trajectories, enabling lightweight yet coherent sequence modeling. Extensive experiments demonstrate SCT-Diff’s superior efficiency and performance, achieving 75.4% AO on GOT-10k while maintaining real-time computational efficiency.
Full article
(This article belongs to the Special Issue Object Detection in Video Surveillance Systems)
Open Access Article
Degradation-Aware Multi-Stage Fusion for Underwater Image Enhancement
by
Lian Xie, Hao Chen and Jin Shu
J. Imaging 2026, 12(1), 37; https://doi.org/10.3390/jimaging12010037 - 8 Jan 2026
Abstract
Underwater images frequently suffer from color casts, low illumination, and blur due to wavelength-dependent absorption and scattering. We present a practical two-stage, modular, and degradation-aware framework designed for real-time enhancement, prioritizing deployability on edge devices. Stage I employs a lightweight CNN to classify inputs into three dominant degradation classes (color cast, low light, blur) with 91.85% accuracy on an EUVP subset. Stage II applies three scene-specific lightweight enhancement pipelines and fuses their outputs using two alternative learnable modules: a global Linear Fusion and a LiteUNetFusion (spatially adaptive weighting with optional residual correction). Compared to the three single-scene optimizers (average PSNR = 19.0 dB; mean UCIQE ≈ 0.597; mean UIQM ≈ 2.07), the Linear Fusion improves PSNR by +2.6 dB on average and yields roughly +20.7% in UCIQE and +21.0% in UIQM, while maintaining low latency (~90 ms per 640 × 480 frame on an Intel i5-13400F (Intel Corporation, Santa Clara, CA, USA)). The LiteUNetFusion further refines results: it raises PSNR by +1.5 dB over the Linear model (23.1 vs. 21.6 dB), brings modest perceptual gains (UCIQE from 0.72 to 0.74, UIQM from 2.5 to 2.8) at a runtime of ≈125 ms per 640 × 480 frame, and better preserves local texture and color consistency in mixed-degradation scenes. We release implementation details for reproducibility and discuss limitations (e.g., occasional blur/noise amplification and domain generalization) together with future directions.
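A global Linear Fusion of the three pipeline outputs can be as small as a softmax-weighted blend; the sketch below shows that simple form under stated assumptions, while the spatially adaptive LiteUNetFusion and any conditioning on the Stage I classifier are not reproduced.

```python
# Sketch: a global Linear Fusion head that blends the three scene-specific
# enhancement outputs with learnable, softmax-normalized weights. The spatially
# adaptive LiteUNetFusion variant is not reproduced; this is the simpler global form.
import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    def __init__(self, n_branches=3):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_branches))  # one weight per pipeline

    def forward(self, branch_outputs):
        # branch_outputs: (B, n_branches, C, H, W) stacked enhanced candidates
        w = torch.softmax(self.logits, dim=0).view(1, -1, 1, 1, 1)
        return (w * branch_outputs).sum(dim=1)               # convex combination per pixel

fusion = LinearFusion()
candidates = torch.rand(4, 3, 3, 64, 64)          # e.g. color-cast / low-light / deblur outputs
fused = fusion(candidates)                        # (4, 3, 64, 64)
```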
Full article
(This article belongs to the Section Image and Video Processing)
Open Access Article
A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data
by
Ekaterina A. Lopukhova, Gulnaz M. Idrisova, Timur R. Mukhamadeev, Grigory S. Voronkov, Ruslan V. Kutluyarov and Elizaveta P. Topolskaya
J. Imaging 2026, 12(1), 36; https://doi.org/10.3390/jimaging12010036 - 8 Jan 2026
Abstract
The paper focuses on automated diagnosis of retinal diseases, particularly Age-related Macular Degeneration (AMD) and diabetic retinopathy (DR), using optical coherence tomography (OCT), while addressing three key challenges: disease comorbidity, severe class imbalance, and the lack of strictly paired OCT and fundus data. We propose a hierarchical modular deep learning system designed for multi-label OCT screening with conditional routing to specialized staging modules. To enable DR staging when fundus images are unavailable, we use cross-modal alignment between OCT and fundus representations. This approach involves training a latent bridge that projects OCT embeddings into the fundus feature space. We enhance clinical reliability through per-class threshold calibration and implement quality control checks for OCT-only DR staging. Experiments demonstrate robust multi-label performance (macro-F1 after per-class threshold calibration) and reliable probability calibration (low ECE), and OCT-only DR staging is feasible in 96.1% of cases that meet the quality control criterion.
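Per-class threshold calibration for multi-label outputs typically amounts to a per-class sweep on validation data; the sketch below selects the F1-maximizing threshold per class, with the F1 criterion assumed rather than taken from the paper.

```python
# Sketch: per-class threshold calibration for multi-label outputs, choosing the
# threshold that maximizes F1 on a validation split (the calibration objective is
# an assumption; the paper's exact criterion is not given in the abstract).
import numpy as np
from sklearn.metrics import f1_score

def calibrate_thresholds(val_probs, val_labels, grid=np.linspace(0.05, 0.95, 19)):
    """val_probs, val_labels: (n_samples, n_classes) arrays -> one threshold per class."""
    n_classes = val_probs.shape[1]
    thresholds = np.empty(n_classes)
    for c in range(n_classes):
        scores = [f1_score(val_labels[:, c], (val_probs[:, c] >= t).astype(int),
                           zero_division=0) for t in grid]
        thresholds[c] = grid[int(np.argmax(scores))]
    return thresholds

rng = np.random.default_rng(0)
probs = rng.random((200, 5)); labels = (rng.random((200, 5)) < 0.3).astype(int)
thr = calibrate_thresholds(probs, labels)
preds = (probs >= thr).astype(int)                 # apply calibrated thresholds at test time
```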
Full article
(This article belongs to the Section Medical Imaging)
News
6 November 2025
MDPI Launches the Michele Parrinello Award for Pioneering Contributions in Computational Physical Science
9 October 2025
Meet Us at the 3rd International Conference on AI Sensors and Transducers, 2–7 August 2026, Jeju, South Korea
Topics
Topic in AI, Applied Sciences, Bioengineering, Healthcare, IJERPH, JCM, Clinics and Practice, J. Imaging
Artificial Intelligence in Public Health: Current Trends and Future Possibilities, 2nd Edition
Topic Editors: Daniele Giansanti, Giovanni Costantini
Deadline: 15 March 2026
Topic in Applied Sciences, Computers, Electronics, Information, J. Imaging
Visual Computing and Understanding: New Developments and Trends
Topic Editors: Wei Zhou, Guanghui Yue, Wenhan Yang
Deadline: 31 March 2026
Topic in Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Topic in Diagnostics, Electronics, J. Imaging, Mathematics, Sensors
Transformer and Deep Learning Applications in Image Processing
Topic Editors: Fengping An, Haitao Xu, Chuyang Ye
Deadline: 31 May 2026
Special Issues
Special Issue in J. Imaging
Image Segmentation: Trends and Challenges
Guest Editor: Nikolaos Mitianoudis
Deadline: 31 January 2026
Special Issue in J. Imaging
Progress and Challenges in Biomedical Image Analysis—2nd Edition
Guest Editors: Lei Li, Zehor Belkhatir
Deadline: 31 January 2026
Special Issue in J. Imaging
Computer Vision for Medical Image Analysis
Guest Editors: Rahman Attar, Le Zhang
Deadline: 15 February 2026
Special Issue in J. Imaging
Emerging Technologies for Less Invasive Diagnostic Imaging
Guest Editors: Francesca Angelone, Noemi Pisani, Armando Ricciardi
Deadline: 28 February 2026