Journal of Imaging

25 pages, 2445 KB

Open Access Article

IPSM-UNet: An Inverted Pyramid-Shaped U-Net++ Architecture with Multi-Resolution Information Interaction for Coronary Artery Segmentation

by Yinong Liao, Wei Li, Guopeng Liu, Rong Wang and Nan Zheng

J. Imaging 2026, 12(5), 216; https://doi.org/10.3390/jimaging12050216 - 20 May 2026

Abstract

Accurate coronary artery segmentation is essential for diagnosis and interventional planning, but conventional U-shaped networks often miss thin, low-contrast vessels and break vessel continuity. We propose Inverted Pyramid-Shaped Multi-resolution U-Net (IPSM-UNet), a dual U-Net++ architecture with multi-resolution feature interaction, feature aggregation, and layer-wise deep supervision. The method is evaluated on DRIVE, CHASE_DB1, DCA1, and an internal coronary angiography dataset. IPSM-UNet achieves competitive or better performance across datasets, including F1 = 0.8310 and Acc = 0.9707 on DRIVE, Se = 0.8792 and Acc = 0.9745 on CHASE_DB1, F1 = 0.8043 and Acc = 0.9793 on DCA1, and Se = 0.8741, F1 = 0.8590, and Acc = 0.9879 on the internal dataset. IPSM-UNet improves vessel continuity and overall segmentation quality, particularly for small-caliber vessels, and supports downstream coronary analysis. Full article
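The abstract above reports sensitivity (Se), F1, and accuracy (Acc). As a minimal sketch of how such pixel-level metrics are commonly computed from a predicted and a reference binary mask, the following may help; the function name `segmentation_metrics` and the toy masks are hypothetical and are not taken from the article.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise Se, F1 and Acc for two equally sized binary masks (flat lists of 0/1)."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("masks must be non-empty and have the same number of pixels")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # vessel predicted, vessel present
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background predicted, background present
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false vessel pixel
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # missed vessel pixel
    se = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * se / (precision + se) if precision + se else 0.0
    acc = (tp + tn) / len(truth)
    return {"Se": se, "F1": f1, "Acc": acc}


# Toy 8-pixel masks, purely illustrative.
toy_pred = [1, 1, 0, 0, 1, 0, 0, 0]
toy_truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(toy_pred, toy_truth)
```

Because vessel pixels are a small fraction of an angiogram, accuracy tends to be dominated by background pixels, which is one reason Se and F1 are usually reported alongside it.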

(This article belongs to the Section Medical Imaging)

26 pages, 5686 KB

Open Access Article

Cell Structure Segmentation in TEM Images of Murine Skin Melanoma Cells by Deep Learning Model

by Mikhail A. Genaev, Izabella S. Gogaeva, Iuliia S. Taskaeva, Nataliya P. Bgatova, Mikhail V. Kozhekin, Evgeniy G. Komyshev and Dmitry A. Afonnikov

J. Imaging 2026, 12(5), 215; https://doi.org/10.3390/jimaging12050215 - 18 May 2026

Abstract

Mitochondria–endoplasmic reticulum contact sites (MERCs) are known as the specialized areas that are involved in a large number of intracellular signaling pathways that regulate Ca²⁺ homeostasis, lipid transport, mitochondrial dynamics, cell death, and autophagy. Understanding MERC dynamics has important therapeutic implications in cancer, as these contacts regulate fundamental cellular processes and MERCs represent promising targets for therapeutic interventions aimed at improving cancer treatment outcomes. Despite the accumulated data, the role of MERCs in carcinogenesis still remains unknown; thus, it seems promising to search for new tools facilitating the study of MERCs in tumor cells. The structure of MERCs can be examined in great detail using transmission electron microscopy (TEM). Currently, several hundred TEM images are required to obtain reliable data on these contacts. The speed of data processing can be significantly improved by using fast and accurate image analysis techniques based on deep learning models. In this study, five U-Net models with a ResNet34 encoder network were evaluated, including the basic U-Net-Vanilla architecture as well as models incorporating various attention blocks and blocks capturing multilevel image structure, for the segmentation of mitochondria and the endoplasmic reticulum (ER). The best performance on the test dataset was demonstrated by the U-Net-scSE network, with F1 scores of 0.872 for mitochondria and 0.744 for the ER being achieved. Two models were tested for their ability to leverage pre-training on external datasets (Lucchi++, Kasthuri++, and DeepPi-EM). Additionally, models pre-trained on the CEM500K dataset were evaluated after the parameters had been tuned on the data. It was demonstrated by the results that pre-training or the use of pre-trained networks did not lead to an improvement in the IoU and F1 metrics on the test dataset. 
Subsequent image analysis was conducted to assess two types of MERCs in the segmented images. Finally, the free and user-friendly UltraNet web server was developed for automated analysis of mitochondria, ER, and MERCs using TEM images. Full article

(This article belongs to the Special Issue Deep Learning in Biomedical Image Segmentation and Classification: Advancements, Challenges and Applications, 2nd Edition)

26 pages, 5445 KB

Open Access Article

Robust Point Cloud Registration via Rotation-Equivariant Geometric Encoding and State Space Models

by Junjie Li, Jiajun Liu, Anqi Chen, Huifang Shen and Jianya Yuan

J. Imaging 2026, 12(5), 214; https://doi.org/10.3390/jimaging12050214 - 18 May 2026

Abstract

Point cloud registration in environments lacking rich textures or containing repetitive structures remains highly susceptible to misalignments. The core challenge lies in balancing the demand for extracting highly distinctive local features with the computational cost of global context modeling. In this paper, we propose a robust registration framework that efficiently combines rotation-equivariant geometric representations with state space models of linear complexity to mitigate feature ambiguity and mismatch. First, a multivariate geometric encoding mechanism is embedded within convolutional layers, enhancing local feature distinctiveness under strict rotation equivariance by explicitly leveraging surface properties. Second, to efficiently establish long-range spatial dependencies, we replace standard dense attention with a hybrid geometry-state aggregation module. This module integrates local geometric self-attention with the Mamba architecture, strengthening focus on overlapping regions without the quadratic computational burden. Finally, we optimize the generated correspondences through a physically consistent hypothesis generator to compute reliable rigid transformation results. On standard benchmarks, our framework demonstrates exceptional robustness to ambiguous matches, achieving a 96.3% registration recall on the 3DMatch dataset and outstanding accuracy on the KITTI dataset. Full article

(This article belongs to the Section Computer Vision and Pattern Recognition)

22 pages, 1866 KB

Open Access Article

MRI-Derived Biomarkers and Radiomic Signatures for Early, Dose-Dependent Evaluation of Prostate Cancer Radiotherapy: An Exploratory Study

by Eleni Bekou, Admir Mulita, Ioannis M. Koukourakis, Nikolaos Courcoutsakis, Athanasia Kotini, Evlampia Psatha, Georgios Tsakaldimis, Ioannis Seimenis, Michael I. Koukourakis and Efstratios Karavasilis

J. Imaging 2026, 12(5), 213; https://doi.org/10.3390/jimaging12050213 - 17 May 2026

Abstract

Accurate assessment of radiotherapy-induced tissue changes in prostate cancer is difficult when relying solely on serum prostate-specific antigen kinetics. This study therefore explores the role of quantitative magnetic resonance imaging and radiomic analyses. In this exploratory prospective study, 22 patients with histologically confirmed prostate cancer underwent multiparametric magnetic resonance imaging at three time points: pre-treatment, mid-treatment, and two months post-radiotherapy. Quantitative imaging analysis included total prostate volume, T2, apparent diffusion coefficient (ADC), and T2* mapping, alongside T2-weighted and diffusion-weighted radiomic feature extraction. Longitudinal changes and dose correlations were analyzed using repeated-measures ANOVA and linear mixed-effects models. Prostate volume increased from 44.22 ± 21.26 cm³ at baseline to 51.11 ± 22.36 cm³ mid-treatment (p < 0.001) and decreased to 37.98 ± 15.5626 cm³ post-treatment (p = 0.034), indicative of temporary radiation-induced glandular edema. T2 relaxation times decreased from 106.00 ± 23.74 ms to 93.33 ± 9.50 ms after therapy (p = 0.023), with androgen deprivation therapy influencing overall values (partial η² = 0.228, p = 0.028), while ADC and T2* remained largely stable (p > 0.05). Radiomic features, particularly from DWI, exhibited subtle time- and dose-dependent variations. Radiation dose was significantly associated with volume and T2, but not with ADC or T2*. These findings suggest that quantitative MRI biomarkers combined with radiomic analysis may provide objective, non-invasive measures of early prostate cancer radiotherapy-induced changes. These imaging-derived metrics may capture early treatment-related tissue alterations and could provide exploratory signals for early treatment evaluation in prostate cancer, although their relationship with biochemical markers requires further validation. Full article

(This article belongs to the Section Medical Imaging)

21 pages, 2332 KB

Open Access Article

GCA-Trans: Global Context-Aware Transformer for Robust Transparent Object Segmentation in Robotic Environments

by Deping Li, Zujian Dong, Zilong Yang, Ka-Kui Li and Yushen Huang

J. Imaging 2026, 12(5), 212; https://doi.org/10.3390/jimaging12050212 - 16 May 2026

Abstract

Transparent object segmentation plays a critical role in indoor and outdoor scene understanding, particularly driven by the rapid advancements in autonomous driving and robotics. However, this task presents significant challenges due to the lack of distinct texture and chromatic features in transparent objects, causing their appearance to blend into the background. Existing methods face inherent architectural limitations: CNNs are restricted by limited receptive fields, while Transformer-based methods may inadvertently suppress the weak feature details of transparent surfaces due to the inherent low-pass filtering property of self-attention mechanisms, treating them as background noise. Consequently, these approaches struggle to consistently segment transparent objects across diverse scales, failing to preserve both fine details and large-scale structures. To address these limitations, we propose the Global Context-Aware Transformer (GCA-Trans). Specifically, we design a Multi-scale Context Mining (MCM) module that leverages parallel dilated convolutions with varying receptive fields to simultaneously extract features at multiple scales. This design allows the model to capture and fuse fine-grained local details (e.g., edges and textures) with coarse-grained global spatial context (e.g., overall object shapes), ensuring robust segmentation performance for transparent objects of varying scales. Extensive experiments on four benchmark datasets demonstrate that GCA-Trans sets a new state of the art, achieving significant improvements of 2.53% mIoU on Trans10K-v2, 2.1% IoU on RGB-D GSD, 2.2% IoU on GDD, and 1.9% IoU on GSD, validating the effectiveness and robustness of our approach. Full article
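The Multi-scale Context Mining idea described above rests on parallel dilated convolutions with different receptive fields. The sketch below illustrates that general mechanism in one dimension using plain Python; it is not the authors' MCM module. The dilation rates (1, 2, 4), the 3-tap kernel, and all function names are assumptions chosen for illustration, since the abstract does not state them.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form)."""
    half = (effective_kernel_size(len(kernel), dilation) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def parallel_dilated_features(signal, kernel, dilations=(1, 2, 4)):
    """Run the same kernel at several dilation rates; one output branch per rate."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# Hypothetical example: a unit impulse shows where each branch "looks".
spans = [effective_kernel_size(3, d) for d in (1, 2, 4)]
features = parallel_dilated_features([0, 0, 0, 0, 1, 0, 0, 0, 0], [1.0, 1.0, 1.0])
```

With a 3-tap kernel the three branches cover 3, 5, and 9 samples, so the small rate responds to fine local detail while the large rate reaches wider context at the same parameter count. A real module would typically fuse the branches, for example by concatenation followed by a 1×1 convolution, which is omitted here.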

(This article belongs to the Special Issue AI-Driven Robot Vision: Progress, Challenges, and Perspectives)

25 pages, 8985 KB

Open Access Article

Clinician-Centered Evaluation Framework for Explainable AI Heatmaps in OCT-Based Retinal Disease Classification

by Eirini Maliagkani, Ilias Georgalas, Ioannis Datseris, Elpiniki Papageorgiou and Ioannis D. Apostolopoulos

J. Imaging 2026, 12(5), 211; https://doi.org/10.3390/jimaging12050211 - 16 May 2026

Abstract

This study presents a two-phase framework for selecting clinically plausible explainable artificial intelligence (XAI) heatmaps for retinal optical coherence tomography (OCT) classification. A six-class Swin Transformer model was trained and validated using a combined dataset consisting of a subset of the public OCT-C8 dataset and private data from a Greek tertiary hospital and externally evaluated on an independent dataset from a private ophthalmological institute. Diagnostic performance was high, achieving 97% accuracy in cross-validation and 91.82% on external evaluation. In Phase 1, one ophthalmologist and one artificial intelligence (AI) specialist independently assessed 100 heatmaps per method based on visual quality and anatomical plausibility, reducing the candidate methods to three. In Phase 2, 21 specialists evaluated the selected methods across multiple cases using a five-point Likert scale reflecting agreement between highlighted regions and the model diagnosis. The proposed Token contRAST map (TRAST) achieved the highest ratings, followed by Gradient-weighted Class Activation Mapping (Grad-CAM++), while Cosine-Grad Fusion Map (CGFM) showed the lowest performance. These findings reflect clinical plausibility rather than direct model interpretability and indicate that effective XAI in OCT imaging requires not only technical performance but also structured expert evaluation. The proposed framework provides a practical approach for selecting explanation methods suitable for clinical use in ophthalmology. Full article

(This article belongs to the Special Issue From Code to Clinic: Trustworthy AI for Medical Imaging)

18 pages, 1066 KB

Open Access Article

Beyond GLM: Inter-Subject Variability as a Complementary Approach to Detect Longitudinal Changes in Emotion Processing in Multiple Sclerosis

by Alice Pirastru, Valeria Blasi, Diego Michael Cacciatore, Marco Rovaris, Elena Toselli, Francesco Pagnini, Cesare Cavalera, Fabrizio Esposito, Giuseppe Baselli and Francesca Baglio

J. Imaging 2026, 12(5), 210; https://doi.org/10.3390/jimaging12050210 - 15 May 2026

Abstract

Understanding how to reliably capture neural changes induced by treatments in neurological patients remains a major methodological challenge. This issue is particularly evident in the emotional domain—frequently impaired in conditions such as multiple sclerosis (MS) and a key target of rehabilitation—yet not limited to it. Longitudinal neuroimaging studies predominantly rely on group-level analyses (e.g., General Linear Model, GLM), which assume inter-subject homogeneity and treat inter-subject variability (ISV) as noise. Such assumptions may obscure treatment-related neuroplastic changes, especially in domains like emotion processing, where neural responses are intrinsically variable and highly individualized in clinical populations. This study investigates whether modeling ISV can better capture treatment-related neural changes, using emotion-focused rehabilitation as a representative case. We compared GLM with threshold-weighted overlap maps (OM_th-w), which quantify spatial consistency across individuals. Thirty healthy controls (HCs) and thirteen people with MS (pwMS) undergoing EMDR for depression performed an emotional fMRI task (pwMS pre/post-treatment). GLM revealed no longitudinal effects, whereas OM_th-w showed reduced variability in pwMS after treatment, alongside decreased depressive symptoms (p < 0.001). These findings highlight the value of variability-based approaches as a complementary framework to conventional GLM analyses for detecting treatment-related neuroplasticity in neurological populations. Full article

(This article belongs to the Special Issue Advances in Neuroimaging for Human Cognition, Behavior, Brain Modulation and Prediction)

30 pages, 21776 KB

Open Access Article

LDSNet: A Lightweight Detail-Sensitive Network for Small Object Detection in Low-Altitude UAV Scenarios

by Tong Tan, Xianrong Peng, Jianlin Zhang, Haorui Zuo, Yao Zhang, Yunhao Wu and Hui Li

J. Imaging 2026, 12(5), 209; https://doi.org/10.3390/jimaging12050209 - 14 May 2026

Abstract

Object detection in Unmanned Aerial Vehicle (UAV) imagery faces significant challenges due to the unique aerial perspective. A major bottleneck is the weak feature representation of small objects, which limits both detection accuracy and computational efficiency. To address this issue, we propose a Lightweight Detail-Sensitive Network (LDSNet). Specifically, LDSNet consists of three key components: (1) Lightweight Detail-Sensitive Downsampling (LDSDown), which combines anti-aliasing smoothing with dual-path feature extraction to preserve the spatial details of small objects during downsampling; (2) Shared Recursive Dilated Convolution (SRDC), which uses weight-shared multi-rate dilated convolutions to capture multi-scale context and enlarge the receptive field without introducing extra parameters; and (3) Deeply Decoupled Grouped Head (DGHead), which employs high-ratio grouped convolutions to significantly reduce the computational cost of processing high-resolution inputs. Extensive experiments on the VisDrone2019 and HIT-UAV datasets demonstrate that LDSNet achieves an excellent trade-off between accuracy and efficiency. Compared to the YOLOv11n baseline, LDSNet reduces parameters by 84.6% (from 2.6 M to 0.4 M) and FLOPs by 29.2% (from 6.5 G to 4.6 G), while improving mAP₅₀ by 2.2% on VisDrone2019 and achieving 94.5% on HIT-UAV. Full article
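The reduction percentages quoted above follow directly from the stated parameter and FLOP counts. A short worked check, using only the figures given in the abstract (the helper name `relative_reduction` is hypothetical):

```python
def relative_reduction(baseline, reduced):
    """Percentage reduction of `reduced` relative to `baseline`."""
    return 100.0 * (baseline - reduced) / baseline


# Figures as stated in the abstract: parameters in millions, FLOPs in G.
param_cut = relative_reduction(2.6, 0.4)
flops_cut = relative_reduction(6.5, 4.6)
```

Rounded to one decimal place, these evaluate to 84.6% and 29.2%, consistent with the reported values.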

(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)

39 pages, 525 KB

Open Access Article

Spatial–Temporal EEG Imaging for Dual-Loop Neuro-Adaptive Simulation: Cognitive-State Decoding and Communication Gating in Critical Human–Machine Teams

by Rubén Juárez, Antonio Hernández-Fernández, Claudia Barros Camargo and David Molero

J. Imaging 2026, 12(5), 208; https://doi.org/10.3390/jimaging12050208 - 12 May 2026

Abstract

Human performance in critical environments is frequently degraded by mistimed communication delivered during periods of visual–cognitive saturation. In such settings, failures arise not only from individual limitations but also from poor coordination between operators under rapidly changing workload conditions. We present a dual-loop neuro-adaptive simulation framework based on real-time spectral–topographic EEG representations, in which multichannel cortical activity is transformed into dynamic spatial maps and decoded to regulate both operator assistance and team communication. The system integrates 14-channel wireless EEG (Emotiv EPOC X, 256 Hz), gaze tracking, telemetry, and communication events through an LSL-based multimodal synchronization pipeline. A hybrid CNN–LSTM model processes sequences of spectral-topographic EEG maps to classify three operationally actionable neurocognitive states—Channelized Attention, Diverted Attention, and Surprise/Startle—while also estimating a continuous Cognitive Load Index (CLI). These representation-derived features are then used by a multi-agent proximal policy optimization (MAPPO) controller to generate two coordinated outputs: (i) adaptive haptic guidance for the pilot, designed to reduce reliance on overloaded visual and auditory channels, and (ii) a traffic-light communication gate for the telemetry engineer, regulating whether radio intervention should proceed, be delayed, or be withheld. In a high-fidelity dual-station simulation with 25 pilot–engineer pairs, the proposed framework was associated with a reduction of more than 30% in communication breakdown errors relative to open-loop telemetry, with the strongest effects observed during peak-load windows, while preserving realistic task progression. It also improved pilot reaction time to time-critical warnings and reduced engineer decision load under the tested conditions. 
These findings support the use of spectral-topographic EEG representations as a practical basis for combining multimodal neurophysiological sensing, spatiotemporal pattern decoding, and adaptive coordination in high-pressure human–machine teams. At the same time, the study should be interpreted as evidence of controlled feasibility in a simulated setting rather than as definitive proof of field-level generalization. We further discuss deployment constraints and propose privacy-by-design safeguards to ensure that neurocognitive signals are used exclusively for operational adaptation rather than employability assessment or performance scoring. Full article

(This article belongs to the Section AI in Imaging)

26 pages, 1880 KB

Open Access Article

A Dual-Branch Deep Learning Framework with Explainability for Dental Caries Classification Using Intra-Oral Photographs and Radiographs

by Lijuan Ren and Jinjing Chen

J. Imaging 2026, 12(5), 207; https://doi.org/10.3390/jimaging12050207 - 12 May 2026

Abstract

The accurate detection of dental caries is often hindered by modality-specific imaging challenges, such as illumination artifacts in intra-oral photographs and low lesion contrast in radiographs. This study proposes a comprehensive framework comprising three key components: (1) HybridAugment+, an entropy-guided adaptive augmentation strategy that applies stronger transformations to low-information images; (2) DBAttNet, a dual-branch attention network featuring illumination–reflection aware attention (IRAA) for photographs and contrast–frequency-aware attention (CFA) for radiographs; and (3) a CAM-based explainability method, selected through a systematic evaluation of five advanced techniques. This study utilized two datasets derived from public sources, comprising 639 intra-oral photographs (481 caries, 158 healthy) and 456 radiographs (268 caries, 188 healthy). These were annotated by two dentists, with established inter-rater reliability (κ = 0.82 for photographs, κ = 0.79 for radiographs). The experimental results demonstrate that HybridAugment+ improved performance over conventional augmentation by up to 8.72% on photographs and 7.67% on radiographs. Furthermore, DBAttNet achieved F1-scores of 97.90% on photographs and 95.72% on radiographs, outperforming ResNet50, InceptionV3, MSDNet, DCANet, and ARM-Net. A comparative evaluation identified XGrad-CAM as the most suitable explainability method, with optimal visualization thresholds of 30% for photographs and 20% for radiographs. Generalization experiments on ophthalmology (APTOS 2019, Messidor-2) and chest radiography datasets (Kermany CXR, NIH ChestX-ray14) demonstrated consistent performance gains over domain-specific methods (DT-Net, ConvNeXt-Tiny). These results confirm that the core design principles effectively transfer to other modalities facing analogous imaging challenges. Full article
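The inter-rater reliability values above are Cohen's kappa (κ) coefficients. As a minimal sketch of the standard two-rater formula, κ = (p_o − p_e) / (1 − p_e), the following may be useful; the function name and the ten toy labels are hypothetical and unrelated to the article's annotations.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels (1 = caries, 0 = healthy), purely illustrative.
dentist_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
dentist_2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(dentist_1, dentist_2)
```

In this toy case the raters agree on 8 of 10 items, but chance agreement is 0.52, so κ is noticeably lower than the raw agreement rate.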

(This article belongs to the Special Issue Artificial Intelligence for Medical Imaging and Applications)

14 pages, 1514 KB

Open Access Article

Quantification of Costal Cartilage Calcification Using ¹⁸F-NaF-PET/CT

by Vanessa Shehu, Om H. Gandhi, Patrick Glennan, Jaskeerat Gujral, Shashi B. Singh, Amir A. Amanullah, Shiv Patil, Khushi Gujral, William Y. Raynor, Peter Sang Uk Park, Eric M. Teichner, Robert C. Subtirelu, Talha Khan, Thomas J. Werner, Poul Flemming Høilund-Carlsen, Ali Gholamrezanezhad, Mona-Elisabeth Revheim and Abass Alavi

J. Imaging 2026, 12(5), 206; https://doi.org/10.3390/jimaging12050206 - 12 May 2026

Abstract

A quantification technique for costal cartilage calcification using ¹⁸F-sodium fluoride–positron emission tomography/computed tomography (¹⁸F-NaF-PET/CT) has yet to be established, and the effects of aging and other demographic variables on costal cartilage calcification remain understudied. This study aims to introduce a quantification methodology for assessing costal cartilage calcification using ¹⁸F-NaF-PET/CT, assess age-related changes in its ¹⁸F-NaF uptake in females and males, and examine the relationship between its ¹⁸F-NaF uptake and CT attenuation as well as ¹⁸F-NaF uptake and coronary artery calcification. In this retrospective study, we analyzed subjects from the Cardiovascular Molecular Calcification Assessed by ¹⁸F-NaF PET/CT (CAMONA) clinical trial. This study evaluated 130 subjects (mean age 48.7 ± 14.5 years; n = 67 females). We manually generated regions of interest overlying the costal cartilages from ribs 8 to 10 on the left side, carefully avoiding osseous uptake from adjacent ribs and sternum, to measure cartilaginous ¹⁸F-NaF uptake. Non-parametric statistical analyses (Spearman correlations, Mann–Whitney U tests, Kruskal–Wallis tests) and receiver operating characteristic analysis were performed to evaluate sex-specific age-related changes in uptake, correlations between imaging parameters, and associations with coronary artery calcium (CAC) score. In females, the mean ¹⁸F-NaF uptake (as assessed by average SUV_mean) was 0.69 ± 0.38 while the corresponding mean Hounsfield Unit (HU) was 108.0 ± 40.0. In males, the mean ¹⁸F-NaF uptake (as assessed by average SUV_mean) was 0.63 ± 0.22, and the mean HU was 104.0 ± 24.0. There was a significant correlation between ¹⁸F-NaF uptake and age in both females (p = 0.003, r = 0.36) and males (p < 0.0001, r = 0.63). The correlation was significantly stronger in males than females (Fisher’s z-test, p = 0.040). 
There was a significant correlation between CAC score and costal cartilage SUV_mean in both females (r = 0.26, p = 0.036) and males (r = 0.51, p < 0.0001). This study introduces a quantification technique to assess costal cartilage calcification using ¹⁸F-NaF-PET/CT and demonstrates that the calcification increases with age, more strongly in males than in females, and ¹⁸F-NaF uptake is correlated with CAC score. This technique can be applied to other cartilages of interest, in both physiological and pathological conditions, to assess the effects of aging and various demographic variables on cartilage calcification. Full article
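The comparison of the female and male age correlations above uses Fisher's z-test for two independent correlation coefficients. The sketch below shows the textbook form of that test. It assumes that all 67 females and the remaining 63 of the 130 subjects entered the correlation analysis, and it uses the rounded coefficients from the abstract, so its output is only an approximation of the reported result; the function name is hypothetical.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided Fisher z-test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z2 - z1
    z = (z2 - z1) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under N(0, 1)
    return z, p


# Rounded values from the abstract; group sizes are an assumption (67 females, 130 - 67 males).
z_stat, p_value = fisher_z_compare(0.36, 67, 0.63, 63)
```

With these rounded inputs the test statistic is close to 2 and the p-value is roughly 0.04, in line with the reported p = 0.040.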

(This article belongs to the Section Medical Imaging)

25 pages, 1662 KB

Open Access Article

Federated Learning with Differential Privacy for Ultrasound Breast Cancer Classification: An Empirical Study

by Nursultan Makhanov, Beibit Abdikenov, Tomiris Zhaksylyk and Temirlan Karibekov

J. Imaging 2026, 12(5), 205; https://doi.org/10.3390/jimaging12050205 - 11 May 2026

Abstract

Breast cancer is a critical global health challenge, and deep learning shows transformative potential for medical image classification. However, privacy regulations such as HIPAA and GDPR create barriers to centralized data aggregation across institutions. This paper presents an empirical evaluation of federated learning (FL) for breast cancer classification in ultrasound images, systematically comparing seven deep learning architectures (ResNet-50, VGG16, VGG19, DenseNet-121, MobileNetV2, Vision Transformer, CoAtNet) across three FL algorithms (FedAvg, FedProx, FedOpt) with client-side differential privacy (DP). Using a simulated federation of eight institutions, we evaluate three clinically relevant classification scenarios. Federated models achieve performance comparable to centralized baselines (98.52% accuracy for normal/abnormal screening, 89.53% for three-class classification), with ViT-small and DenseNet-121 exceeding their centralized counterparts in several configurations. Under strong DP constraints (noise multiplier η = 2.0, yielding conservative privacy budget estimates of ε < 1.0 with δ = 10⁻⁵), screening accuracy remains above 82%, though diagnostic tasks incur substantial degradation (best 68.42%). Our findings provide empirical guidance on architecture selection, FL algorithm choice, and privacy-utility trade-offs for privacy-preserving breast cancer diagnosis, while identifying key challenges for clinical deployment. Full article
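The combination of FedAvg with client-side differential privacy described above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the study's implementation: it clips each client's whole update (rather than per-sample gradients), adds Gaussian noise with standard deviation equal to the noise multiplier times the clipping norm, and averages updates weighted by client sample counts. The clipping norm of 1.0, the two toy clients, and all function names are hypothetical; only the noise multiplier of 2.0 comes from the abstract. Translating a noise multiplier into an ε estimate requires a privacy accountant, which is not shown.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Client-side step: clip, then add Gaussian noise with sigma = multiplier * clip_norm."""
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clip_update(update, clip_norm)]


def fedavg(updates, sample_counts):
    """Server-side step: average client updates weighted by local sample counts."""
    total = sum(sample_counts)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sample_counts)) / total for i in range(dim)]


# Two hypothetical clients with 2-parameter updates.
rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, 0.4]]
clipped = [clip_update(u, clip_norm=1.0) for u in client_updates]
noisy = [privatize_update(u, 1.0, 2.0, rng) for u in client_updates]
aggregate_without_noise = fedavg(clipped, sample_counts=[100, 300])
aggregate_with_noise = fedavg(noisy, sample_counts=[100, 300])
```

Comparing the two aggregates shows the trade-off the abstract describes: clipping bounds each institution's influence, and the added noise perturbs the average, which is why accuracy drops as the noise multiplier grows.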

(This article belongs to the Section Medical Imaging)

16 pages, 5436 KB

Open Access Article

Self-Supervised Text-Driven Point Cloud Upsampling via Semantic Text Guidance

by Zhiyong Zhang, Meiling Qiu, Shuo Chen, Ruyu Liu, Jianhua Zhang and Shengyong Chen

J. Imaging 2026, 12(5), 204; https://doi.org/10.3390/jimaging12050204 - 11 May 2026

Abstract

Point cloud upsampling is a fundamental task in 3D vision, yet most existing methods adopt a global and uniform strategy, which is computationally inefficient and fails to address the need for region-specific refinement. To address this challenge, we propose PartSPUNet, a novel self-supervised, text-driven point cloud upsampling framework designed to enhance robotic perception through task-oriented local refinement. Inspired by the human cognitive process where high-level language instructions guide visual attention to specific regions of interest, our method allows an operator to use intuitive natural language prompts to direct the upsampling process. Specifically, PartSPUNet leverages a pretrained vision–language model to zero-shot localize the user-specified semantic part within a sparse point cloud. It then performs geometry-aware densification exclusively on this target region, recovering rich geometric details while preserving the global structure. Experimental results demonstrate that our approach significantly outperforms existing methods in reconstructing specified areas, offering a powerful and intuitive tool for enhancing the 3D perception pipeline in intelligent robotic systems. Full article

(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)

24 pages, 10609 KB

Open Access Article

Characterization of RGB-Polarization Sensor-Based Cameras

by Andreas Karge, Maximilian Klammer, Bernhard Eberhardt and Andreas Schilling

J. Imaging 2026, 12(5), 203; https://doi.org/10.3390/jimaging12050203 - 7 May 2026

Abstract

This work presents a characterization method for cameras with trichromatic RGB color filter array and polarization layer (RGB-P) sensor-based imaging devices. Such sensors enable the reconstruction of color and polarization of registered scene elements, which is an important requirement in computer vision. We present spectral responsivity measurements, which reveal different sensitivities for various color and polarization channels. Furthermore, we discuss and model an observed chromaticity shift in registered camera signals for polarized irradiance. Both effects lead to inaccurate estimation of color and polarization features. To overcome these issues, we present a neural-network-based model for color and polarization feature reconstruction. Essentially, it considers spectral sensitivity for polarized irradiance. Furthermore, the model takes into account that, for visualization, the color signals have to be a linear combination of polarization channels. Models were trained for selected natural and synthetic reflectance sets, as well as commonly used lighting. We evaluated the resulting performance, which yielded robust results. The method can be employed for an estimation of color and polarization features for RGB-P imaging devices. Applications can be found in photography, as well as machine and computer vision, in which object surface color rendering plays a major role.

(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

25 pages, 3115 KB

Open Access Review

FFR-CT: Technical Advances and Implementation in Clinical Practice

by Kamil Stankowski, Amedeo Pellizzon, Luca Signorelli, Andrea Baggiano, Nicola Cosentino, Alberico Del Torto, Fabio Fazzari, Daniele Junod, Maria Elisabetta Mancini, Riccardo Maragna, Manuela Muratori, Luigi Tassetti, Alessandra Volpe, Saima Mushtaq and Gianluca Pontone

J. Imaging 2026, 12(5), 202; https://doi.org/10.3390/jimaging12050202 - 5 May 2026

Abstract

Fractional flow reserve derived from coronary computed tomography angiography (FFR-CT) has emerged as a non-invasive modality for the functional assessment of coronary artery disease. By using computational fluid dynamics, particularly in its most extensively validated off-site implementation, FFR-CT enables lesion-specific estimation of pressure gradients across coronary stenoses without the need for invasive catheterization. This narrative review summarizes the technical foundations of FFR-CT as well as the evidence demonstrating that FFR-CT enhances the diagnostic accuracy of coronary CT angiography alone by improving specificity for hemodynamically significant stenoses when compared with invasive fractional flow reserve. Beyond diagnosis, FFR-CT provides incremental prognostic information, supporting risk stratification and guiding revascularization decisions. Suggestions for clinical implementation of FFR-CT and guidance on interpreting results within the appropriate clinical context are provided. Despite these advantages, limitations remain, including dependence on image quality, reduced performance in heavily calcified vessels, assumptions regarding hyperemic flow conditions, and limited validation in certain populations. While computational fluid dynamics-based FFR-CT remains the most commonly adopted approach in clinical settings, machine learning-based on-site FFR-CT is rapidly evolving and is expected to become a reliable alternative. As technical refinements continue, FFR-CT is poised to play an expanding role in precision-guided management of coronary artery disease.

(This article belongs to the Special Issue Advances and Challenges in Cardiovascular Imaging)

16 pages, 6417 KB

Open Access Article

Beyond Single Descriptors: Complementary Feature Learning for Image Matching

by Xianguo Yu, Yulong Feng and Xi Li

J. Imaging 2026, 12(5), 201; https://doi.org/10.3390/jimaging12050201 - 5 May 2026

Abstract

Sparse local feature matching has served as the cornerstone of numerous visual geometry tasks and attracted extensive attention. Although significant progress has been made in this area, improving the discriminative power of descriptors remains a key challenge. As far as we know, existing sparse feature matching methods only predict a single descriptor map for keypoints, which might restrict their potential in solving complex scenarios. This issue is particularly pronounced in real-time applications, where most methods only learn descriptor maps at a reduced spatial resolution compared to the input image. Consequently, they must interpolate from the low-resolution map to obtain per-keypoint descriptors, which introduces background contamination and reduces the discriminability of the final descriptors. To address these issues, we propose an efficient, novel complementary local feature description model. Specifically, the model simultaneously learns two descriptor maps using different loss functions within a single Convolutional Neural Network (CNN). An orthogonal loss is introduced to effectively coordinate the learning of the two branches, aiming to obtain decoupled and complementary descriptors. Extensive experiments across various visual geometry tasks, such as homography estimation, indoor and outdoor pose estimation, as well as visual localization, have demonstrated the superior performance of the proposed method.

(This article belongs to the Section Computer Vision and Pattern Recognition)
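
The abstract above does not give the form of its orthogonal loss. As a rough illustration only, one common way to encourage two descriptor branches to be decorrelated is to penalize the squared cosine similarity between the two descriptors of each keypoint. The function names, the equal descriptor dimension, and the penalty form below are all assumptions made for this sketch, not the authors' formulation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonality_penalty(
    desc_a: Sequence[Sequence[float]],
    desc_b: Sequence[Sequence[float]],
) -> float:
    """Mean squared cosine similarity between paired descriptors.

    desc_a[i] and desc_b[i] are the two (hypothetical) branch descriptors
    of keypoint i. The value is 0 when every pair is orthogonal and 1 when
    every pair is parallel.
    """
    total = 0.0
    for a, b in zip(desc_a, desc_b):
        na, nb = l2_normalize(a), l2_normalize(b)
        cos = sum(x * y for x, y in zip(na, nb))
        total += cos * cos
    return total / len(desc_a)
```

In a training setup, a term like this would typically be added to the per-branch matching losses with a small weight, so that the two branches are pushed toward complementary rather than redundant descriptors.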

21 pages, 1566 KB

Open Access Article

A Scene-Adaptive Super-Resolution Framework for Video Compression

by Qiyu Zha and Jiangling Guo

J. Imaging 2026, 12(5), 200; https://doi.org/10.3390/jimaging12050200 - 5 May 2026

Abstract

Video compression is central to large-scale video delivery, where better rate–distortion efficiency directly reduces bandwidth and storage cost. A practical way to improve efficiency is to encode a low-resolution video stream with a standard codec and restore high-resolution details with a learned super-resolution model at the decoder. However, prior SR-assisted compression methods usually update the reconstruction model at fixed temporal intervals, which can waste bitrate when those update boundaries do not match actual scene changes. In this paper, we present SASVC, a scene-adaptive super-resolution video compression framework for offline codec-augmented compression. SASVC detects scene changes using frame-wise grayscale differences, updates only compact adapter modules when a content transition is observed, and compresses the resulting model updates with chained differencing, quantization, and entropy coding. In this way, the method reduces unnecessary model-stream overhead while preserving scene-specific reconstruction fidelity. Experimental results on both long-form and short-form datasets show that SASVC consistently outperforms SRVC-style baselines and conventional codec-based alternatives under the Bjontegaard delta rate based on peak signal-to-noise ratio (BD-rate/PSNR) criterion. Complementary rate–distortion (RD) comparisons in terms of structural similarity index measure (SSIM) and Video Multi-Method Assessment Fusion (VMAF) show the same overall trend, indicating that the gain is not limited to a single distortion metric. Specifically, SASVC achieves BD-rate gains of −41.33% and −53.49% on Vimeo and Xiph, respectively, and further reaches −51.53% and −39.83% on UVG and MCL-JCV. The decoder also maintains real-time 1080p reconstruction at 125 frames per second (FPS) on an NVIDIA RTX 3080 Ti GPU, indicating that scene-aligned model updates can improve compression efficiency while keeping decoder-side deployment practical.

(This article belongs to the Section Image and Video Processing)
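
The scene-change step mentioned in the abstract above (frame-wise grayscale differences) can be illustrated with a minimal sketch. The abstract does not state the difference measure or the threshold, so the mean absolute difference, the default threshold of 30, and all names below are assumptions for illustration rather than the SASVC implementation.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities in [0, 255].
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def detect_scene_changes(frames: List[Frame], threshold: float = 30.0) -> List[int]:
    """Return each index i at which frame i is taken to start a new scene.

    A cut is flagged when the difference to the previous frame exceeds
    the (assumed) threshold.
    """
    cuts = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts
```

In a scene-adaptive scheme of this kind, the returned indices would mark the points where a model update is sent, instead of sending updates at fixed temporal intervals.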

19 pages, 3887 KB

Open Access Article

A Cost-Effective and Rapidly Manufacturable Infrared–Visible High-Contrast Calibration Board Based on Structural Parametrization

by Yuandong Shao and Aleksandr S. Vasilev

J. Imaging 2026, 12(5), 199; https://doi.org/10.3390/jimaging12050199 - 2 May 2026

Abstract

The infrared (IR)–visible light (VIS) dual-camera system provides complementary cues for image fusion, but geometric mismatch caused by different imaging methods, inconsistent resolution/field-of-view, and installation offsets often leads to ghosting and artifacts. This study aims to develop a fast-deployable and repeatable calibration workflow based on a cost-effective calibration board. We designed an infrared-visible high-contrast checkerboard plate that can be generated through structural parameterization and efficiently manufactured using Python/OpenSCAD. We also established a corner-based registration pipeline that estimates a global homography to align the visible-light images onto the infrared pixel grid for fusion and quantitative evaluation. Experiments conducted in a controlled indoor environment demonstrated stable sub-pixel performance within a range of 1.5–2.5 m, with an average re-projection error of 0.47–0.50 pixels per frame and a 95th percentile lower than 0.51 pixels. The corner position re-projection error test further confirmed stability near image boundaries, with a median value of 0.53–0.63 pixels and a 95th percentile of 0.54–0.64 pixels. Overall, the proposed target design and workflow achieve practical infrared-visible calibration with repeatable accuracy under typical deployment constraints, providing geometrically consistent input for subsequent fusion and dataset construction.

(This article belongs to the Section Computer Vision and Pattern Recognition)
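
The re-projection error figures quoted in the abstract above can be made concrete with a short sketch: given a 3×3 homography and matched corner positions, map each visible-light corner onto the infrared grid and measure its Euclidean distance to the detected infrared corner. How the homography is estimated is not shown here, and the function names and the nearest-rank percentile rule are assumptions for illustration, not the authors' evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def reprojection_errors(H: Matrix3, src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Per-corner distance (in pixels) between H(src[i]) and dst[i]."""
    return [math.dist(apply_homography(H, s), d) for s, d in zip(src, dst)]


def mean_and_p95(errors: Sequence[float]) -> Tuple[float, float]:
    """Mean and 95th percentile (nearest-rank rule, an assumed convention)."""
    ordered = sorted(errors)
    mean = sum(ordered) / len(ordered)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return mean, ordered[rank - 1]
```

Here `src` would hold checkerboard corners detected in the visible-light image and `dst` the corresponding corners in the infrared image; a mean below one pixel corresponds to the sub-pixel accuracy the abstract reports.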

26 pages, 22835 KB

Open Access Article

DAER-YOLO: Defect-Aware and Edge-Reconstruction Enhanced YOLO for Surface Defect Detection of Varistors

by Wu Xie, Shushuo Yao, Tao Zhang, Gaoxue Qiu, Dong Li, Fuxian Luo and Yong Fan

J. Imaging 2026, 12(5), 198; https://doi.org/10.3390/jimaging12050198 - 2 May 2026

Abstract

Varistors are critical overvoltage protection components in modern power electronic systems. They effectively absorb and dissipate surge energy to ensure the safe and stable operation of electrical equipment. However, surface defects can lead to substandard performance or even trigger equipment failure, compromising overall system stability. Therefore, high-precision surface defect detection is essential for quality assurance. To address these challenges, we propose a lightweight model termed Defect-Aware and Edge-Reconstruction Enhanced YOLO (DAER-YOLO) for efficient varistor inspection. First, we construct a C3k2-based defect-aware enhancement module (C3k2-iEMA). This module tackles the difficulty of extracting features from small or morphologically complex defects. By integrating multi-scale feature extraction, an attention mechanism, and efficient nonlinear mapping, it strengthens the perception of defect details. Second, to enhance the reconstruction capability for edge damage and small-object defects, we introduce the Efficient Up-Convolution Block (EUCB). This block improves multi-level feature fusion and generates clearer enhanced feature maps. Based on these improvements, DAER-YOLO outperforms the YOLOv11n baseline on a custom varistor dataset, with mAP@50 and mAP@50:95 increasing by 1.6% and 2.3%, respectively. Experimental results demonstrate that the model effectively improves detection accuracy while exhibiting significant potential for real-time industrial applications.

(This article belongs to the Section Computer Vision and Pattern Recognition)
