<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:cc="http://web.resource.org/cc/"
 xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:admin="http://webns.net/mvcb/"
 xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel rdf:about="https://www.mdpi.com/rss/journal/jimaging">
		<title>Journal of Imaging</title>
		<description>Latest open access articles published in J. Imaging at https://www.mdpi.com/journal/jimaging</description>
		<link>https://www.mdpi.com/journal/jimaging</link>
		<admin:generatorAgent rdf:resource="https://www.mdpi.com/journal/jimaging"/>
		<admin:errorReportsTo rdf:resource="mailto:support@mdpi.com"/>
		<dc:publisher>MDPI</dc:publisher>
		<dc:language>en</dc:language>
		<dc:rights>Creative Commons Attribution (CC-BY)</dc:rights>
						<prism:copyright>MDPI</prism:copyright>
		<prism:rightsAgent>support@mdpi.com</prism:rightsAgent>
		<image rdf:resource="https://pub.mdpi-res.com/img/design/mdpi-pub-logo.png?13cf3b5bd783e021?1772794056"/>
				<items>
			<rdf:Seq>
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/112" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/111" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/110" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/109" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/108" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/107" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/106" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/105" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/104" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/103" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/102" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/101" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/100" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/99" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/98" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/97" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/96" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/95" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/94" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/93" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/3/92" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/91" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/90" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/89" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/88" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/87" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/85" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/86" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/84" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/83" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/82" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/81" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/80" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/79" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/78" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/77" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/76" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/75" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/74" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/73" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/72" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/71" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/70" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/69" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/68" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/67" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/66" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/65" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/64" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/63" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/62" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/61" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/60" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/59" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/58" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/57" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/56" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/55" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/2/54" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/53" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/52" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/51" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/50" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/49" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/48" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/47" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/46" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/45" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/44" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/43" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/42" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/41" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/40" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/39" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/38" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/37" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/36" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/35" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/34" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/33" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/31" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/30" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/32" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/29" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/28" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/27" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/26" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/25" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/24" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/23" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/22" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/21" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/20" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/19" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/18" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/17" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/16" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/15" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/14" />
            				<rdf:li rdf:resource="https://www.mdpi.com/2313-433X/12/1/13" />
                    	</rdf:Seq>
		</items>
				<cc:license rdf:resource="https://creativecommons.org/licenses/by/4.0/" />
	</channel>

        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/112">

	<title>J. Imaging, Vol. 12, Pages 112: Endo-DET: A Domain-Specific Detection Framework for Multi-Class Endoscopic Disease Detection</title>
	<link>https://www.mdpi.com/2313-433X/12/3/112</link>
	<description>Gastrointestinal cancers account for roughly a quarter of global cancer incidence, and early detection through endoscopy has proven effective in reducing mortality. Multi-class endoscopic disease detection, however, faces three persistent challenges: feature redundancy from non-pathological content, severe illumination inconsistency across imaging modalities, and extreme scale variability with blurry boundaries. This paper introduces Endo-DET, a domain-specific detection framework addressing these challenges through three synergistic components. The Adaptive Lesion-Discriminative Filtering (ALDF) module achieves lesion-focused attention via sparse simplex projection, reducing complexity from O(N²) to O(αN²). The Global–Local Illumination Modulation Neck (GLIM-Neck) enables illumination-aware multi-scale fusion through four cooperative mechanisms, maintaining stable performance across white-light endoscopy, narrow-band imaging, and chromoendoscopy. The Lesion-aware Unified Calibration and Illumination-robust Discrimination (LUCID) module uses dual-stream reciprocal modulation to integrate boundary-sensitive textures with global semantics while suppressing instrument artifacts. Experiments on EDD2020, Kvasir-SEG, PolypGen2021, and CVC-ClinicDB show that Endo-DET improves mAP50-95 over the DEIM baseline by 5.8, 10.8, 4.1, and 10.1 percentage points respectively, with mAP75 gains of 6.1, 10.3, 6.8, and 9.3 points, and Recall50-95 improvements of 10.9, 12.1, 11.1, and 11.5 points. Running at 330 FPS with TensorRT FP16 optimization, Endo-DET achieves consistent cross-dataset improvements while maintaining real-time capability, providing a methodological foundation for clinical computer-aided diagnosis.</description>
	<pubDate>2026-03-06</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 112: Endo-DET: A Domain-Specific Detection Framework for Multi-Class Endoscopic Disease Detection</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/112">doi: 10.3390/jimaging12030112</a></p>
	<p>Authors:
		Yijie Lu
		Yixiang Zhao
		Qiang Yu
		Wei Shao
		Renbin Shen
		</p>
	<p>Gastrointestinal cancers account for roughly a quarter of global cancer incidence, and early detection through endoscopy has proven effective in reducing mortality. Multi-class endoscopic disease detection, however, faces three persistent challenges: feature redundancy from non-pathological content, severe illumination inconsistency across imaging modalities, and extreme scale variability with blurry boundaries. This paper introduces Endo-DET, a domain-specific detection framework addressing these challenges through three synergistic components. The Adaptive Lesion-Discriminative Filtering (ALDF) module achieves lesion-focused attention via sparse simplex projection, reducing complexity from O(N²) to O(αN²). The Global–Local Illumination Modulation Neck (GLIM-Neck) enables illumination-aware multi-scale fusion through four cooperative mechanisms, maintaining stable performance across white-light endoscopy, narrow-band imaging, and chromoendoscopy. The Lesion-aware Unified Calibration and Illumination-robust Discrimination (LUCID) module uses dual-stream reciprocal modulation to integrate boundary-sensitive textures with global semantics while suppressing instrument artifacts. Experiments on EDD2020, Kvasir-SEG, PolypGen2021, and CVC-ClinicDB show that Endo-DET improves mAP50-95 over the DEIM baseline by 5.8, 10.8, 4.1, and 10.1 percentage points respectively, with mAP75 gains of 6.1, 10.3, 6.8, and 9.3 points, and Recall50-95 improvements of 10.9, 12.1, 11.1, and 11.5 points. Running at 330 FPS with TensorRT FP16 optimization, Endo-DET achieves consistent cross-dataset improvements while maintaining real-time capability, providing a methodological foundation for clinical computer-aided diagnosis.</p>
	]]></content:encoded>

	<dc:title>Endo-DET: A Domain-Specific Detection Framework for Multi-Class Endoscopic Disease Detection</dc:title>
			<dc:creator>Yijie Lu</dc:creator>
			<dc:creator>Yixiang Zhao</dc:creator>
			<dc:creator>Qiang Yu</dc:creator>
			<dc:creator>Wei Shao</dc:creator>
			<dc:creator>Renbin Shen</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030112</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-03-06</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-03-06</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>112</prism:startingPage>
		<prism:doi>10.3390/jimaging12030112</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/112</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/111">

	<title>J. Imaging, Vol. 12, Pages 111: Evidence-Guided Diagnostic Reasoning for Pediatric Chest Radiology Based on Multimodal Large Language Models</title>
	<link>https://www.mdpi.com/2313-433X/12/3/111</link>
	<description>Pediatric respiratory diseases are a leading cause of hospital admissions and childhood mortality worldwide, highlighting the critical need for accurate and timely diagnosis to support effective treatment and long-term care. Chest radiography remains the most widely used imaging modality for pediatric pulmonary assessment. Consequently, reliable AI-assisted diagnostic methods are essential for alleviating the workload of clinical radiologists. However, most existing deep learning-based approaches are data-driven and formulate diagnosis as a black-box image classification task, resulting in limited interpretability and reduced clinical trustworthiness. To address these challenges, we propose a trustworthy two-stage diagnostic paradigm for pediatric chest X-ray diagnosis that closely aligns with the radiological workflow in clinical practice, in which the diagnosis procedure is constrained by evidence. In the first stage, a vision–language model fine-tuned on pediatric data identifies radiological findings from chest radiographs, producing structured and interpretable diagnostic evidence. In the second stage, a multimodal large language model integrates the radiograph, extracted findings, patient demographic information, and external medical domain knowledge with a RAG mechanism to generate the final diagnosis. Experiments conducted on the VinDr-PCXR dataset demonstrate that our method achieves 90.1% diagnostic accuracy, 70.9% F1-score, and 82.5% AUC, representing up to a 13.1% increase in diagnosis accuracy over the state-of-the-art baselines. These results validate the effectiveness of combining multimodal reasoning with explicit medical evidence and domain knowledge, and indicate the strong potential of the proposed approach for trustworthy pediatric radiology diagnosis.</description>
	<pubDate>2026-03-06</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 111: Evidence-Guided Diagnostic Reasoning for Pediatric Chest Radiology Based on Multimodal Large Language Models</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/111">doi: 10.3390/jimaging12030111</a></p>
	<p>Authors:
		Yuze Zhao
		Qing Wang
		Yingwen Wang
		Ruiwei Zhao
		Rui Feng
		Xiaobo Zhang
		</p>
	<p>Pediatric respiratory diseases are a leading cause of hospital admissions and childhood mortality worldwide, highlighting the critical need for accurate and timely diagnosis to support effective treatment and long-term care. Chest radiography remains the most widely used imaging modality for pediatric pulmonary assessment. Consequently, reliable AI-assisted diagnostic methods are essential for alleviating the workload of clinical radiologists. However, most existing deep learning-based approaches are data-driven and formulate diagnosis as a black-box image classification task, resulting in limited interpretability and reduced clinical trustworthiness. To address these challenges, we propose a trustworthy two-stage diagnostic paradigm for pediatric chest X-ray diagnosis that closely aligns with the radiological workflow in clinical practice, in which the diagnosis procedure is constrained by evidence. In the first stage, a vision–language model fine-tuned on pediatric data identifies radiological findings from chest radiographs, producing structured and interpretable diagnostic evidence. In the second stage, a multimodal large language model integrates the radiograph, extracted findings, patient demographic information, and external medical domain knowledge with a RAG mechanism to generate the final diagnosis. Experiments conducted on the VinDr-PCXR dataset demonstrate that our method achieves 90.1% diagnostic accuracy, 70.9% F1-score, and 82.5% AUC, representing up to a 13.1% increase in diagnosis accuracy over the state-of-the-art baselines. These results validate the effectiveness of combining multimodal reasoning with explicit medical evidence and domain knowledge, and indicate the strong potential of the proposed approach for trustworthy pediatric radiology diagnosis.</p>
	]]></content:encoded>

	<dc:title>Evidence-Guided Diagnostic Reasoning for Pediatric Chest Radiology Based on Multimodal Large Language Models</dc:title>
			<dc:creator>Yuze Zhao</dc:creator>
			<dc:creator>Qing Wang</dc:creator>
			<dc:creator>Yingwen Wang</dc:creator>
			<dc:creator>Ruiwei Zhao</dc:creator>
			<dc:creator>Rui Feng</dc:creator>
			<dc:creator>Xiaobo Zhang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030111</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-03-06</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-03-06</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>111</prism:startingPage>
		<prism:doi>10.3390/jimaging12030111</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/111</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/110">

	<title>J. Imaging, Vol. 12, Pages 110: Forensic Analysis for Source Camera Identification from EXIF Metadata</title>
	<link>https://www.mdpi.com/2313-433X/12/3/110</link>
	<description>Source camera identification on smartphones constitutes a fundamental task in multimedia forensics, providing essential support for applications such as image copyright protection, illegal content tracking, and digital evidence verification. Numerous techniques have been developed for this task over the past decades. Among existing approaches, Photo-Response Non-Uniformity (PRNU) has been widely recognized as a reliable device-specific fingerprint and has demonstrated remarkable performance in real-world applications. Nevertheless, the rapid advancement of computational photography technologies has introduced significant challenges: modern devices often exhibit anomalous behaviors under PRNU-based analysis. For instance, images captured by different devices may exhibit unexpected correlations, while images captured by the same device can vary substantially in their PRNU patterns. Current approaches are incapable of automatically exploring the underlying causes of these anomalous behaviors. To address this limitation, we propose a simple yet effective forensic analysis framework leveraging Exchangeable Image File Format (EXIF) metadata. Specifically, we represent EXIF metadata as type-aware word embeddings to preserve contextual information across tags. This design enables visual interpretation of the model’s decision-making process and provides complementary insights for identifying the anomalous behaviors observed in modern devices. Extensive experiments conducted on three public benchmark datasets demonstrate that the proposed method not only achieves state-of-the-art performance for source camera identification but also provides valuable insights into anomalous device behaviors.</description>
	<pubDate>2026-03-04</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 110: Forensic Analysis for Source Camera Identification from EXIF Metadata</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/110">doi: 10.3390/jimaging12030110</a></p>
	<p>Authors:
		Pengpeng Yang
		Chen Zhou
		Daniele Baracchi
		Dasara Shullani
		Yaobin Zou
		Alessandro Piva
		</p>
	<p>Source camera identification on smartphones constitutes a fundamental task in multimedia forensics, providing essential support for applications such as image copyright protection, illegal content tracking, and digital evidence verification. Numerous techniques have been developed for this task over the past decades. Among existing approaches, Photo-Response Non-Uniformity (PRNU) has been widely recognized as a reliable device-specific fingerprint and has demonstrated remarkable performance in real-world applications. Nevertheless, the rapid advancement of computational photography technologies has introduced significant challenges: modern devices often exhibit anomalous behaviors under PRNU-based analysis. For instance, images captured by different devices may exhibit unexpected correlations, while images captured by the same device can vary substantially in their PRNU patterns. Current approaches are incapable of automatically exploring the underlying causes of these anomalous behaviors. To address this limitation, we propose a simple yet effective forensic analysis framework leveraging Exchangeable Image File Format (EXIF) metadata. Specifically, we represent EXIF metadata as type-aware word embeddings to preserve contextual information across tags. This design enables visual interpretation of the model’s decision-making process and provides complementary insights for identifying the anomalous behaviors observed in modern devices. Extensive experiments conducted on three public benchmark datasets demonstrate that the proposed method not only achieves state-of-the-art performance for source camera identification but also provides valuable insights into anomalous device behaviors.</p>
	]]></content:encoded>

	<dc:title>Forensic Analysis for Source Camera Identification from EXIF Metadata</dc:title>
			<dc:creator>Pengpeng Yang</dc:creator>
			<dc:creator>Chen Zhou</dc:creator>
			<dc:creator>Daniele Baracchi</dc:creator>
			<dc:creator>Dasara Shullani</dc:creator>
			<dc:creator>Yaobin Zou</dc:creator>
			<dc:creator>Alessandro Piva</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030110</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-03-04</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-03-04</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>110</prism:startingPage>
		<prism:doi>10.3390/jimaging12030110</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/110</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/109">

	<title>J. Imaging, Vol. 12, Pages 109: A Hierarchical Multi-View Deep Learning Framework for Autism Classification Using Structural and Functional MRI</title>
	<link>https://www.mdpi.com/2313-433X/12/3/109</link>
	<description>Autism classification is challenging due to the subtle, heterogeneous, and overlapping neural activation profiles that occur in individuals with autism. Novel deep learning approaches, such as Convolutional Neural Networks (CNNs) and their variants, as well as Transformers, have shown moderate performance in discriminating between autism and normal cohorts; yet, they often struggle to jointly capture the spatial–structural and temporal–functional variations present in autistic brains. To overcome these shortcomings, we propose a novel hierarchical deep learning framework that extracts the inherent spatial dependencies from the dual-modal MRI scans. For sMRI, we develop a 3D Hierarchical Convolutional Neural Network to capture both fine and coarse anatomical structures via multi-view projections along the axial, sagittal, and coronal planes. For the fMRI case, we introduced a bidirectional LSTM-based temporal encoder to examine regional brain dynamics and functional connectivity. The sequential embeddings and correlations are combined into a unified spatiotemporal representation of functional imaging, which is then classified using a multilayer perceptron to ensure continuity in diagnostic predictions across the examined modalities. Finally, a cross-modality fusion scheme was employed to integrate feature representations of both modalities. Extensive evaluations on the ABIDE I dataset (NYU repository) demonstrate that our proposed framework outperforms existing baselines, including Vision/Swin Transformers and various newly developed CNN variants. For the sMRI branch, we achieved 90.19 ± 0.12% accuracy (precision: 90.85 ± 0.16%, recall: 89.27 ± 0.19%, F1-score: 90.05 ± 0.14%, and focal loss: 0.3982). For the fMRI branch, we achieved an accuracy of 88.93 ± 0.15% (precision: 89.78 ± 0.18%, recall: 88.29 ± 0.20%, F1-score: 89.03 ± 0.17%, and focal loss of 0.4437). These outcomes affirm the superior generalization and robustness of the proposed framework for integrating structural and functional brain representations to achieve accurate autism classification.</description>
	<pubDate>2026-03-04</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 109: A Hierarchical Multi-View Deep Learning Framework for Autism Classification Using Structural and Functional MRI</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/109">doi: 10.3390/jimaging12030109</a></p>
	<p>Authors:
		Nayif Mohammed Hammash
		Mohammed Chachan Younis
		</p>
	<p>Autism classification is challenging due to the subtle, heterogeneous, and overlapping neural activation profiles that occur in individuals with autism. Novel deep learning approaches, such as Convolutional Neural Networks (CNNs) and their variants, as well as Transformers, have shown moderate performance in discriminating between autism and normal cohorts; yet, they often struggle to jointly capture the spatial–structural and temporal–functional variations present in autistic brains. To overcome these shortcomings, we propose a novel hierarchical deep learning framework that extracts the inherent spatial dependencies from the dual-modal MRI scans. For sMRI, we develop a 3D Hierarchical Convolutional Neural Network to capture both fine and coarse anatomical structures via multi-view projections along the axial, sagittal, and coronal planes. For the fMRI case, we introduced a bidirectional LSTM-based temporal encoder to examine regional brain dynamics and functional connectivity. The sequential embeddings and correlations are combined into a unified spatiotemporal representation of functional imaging, which is then classified using a multilayer perceptron to ensure continuity in diagnostic predictions across the examined modalities. Finally, a cross-modality fusion scheme was employed to integrate feature representations of both modalities. Extensive evaluations on the ABIDE I dataset (NYU repository) demonstrate that our proposed framework outperforms existing baselines, including Vision/Swin Transformers and various newly developed CNN variants. For the sMRI branch, we achieved 90.19 ± 0.12% accuracy (precision: 90.85 ± 0.16%, recall: 89.27 ± 0.19%, F1-score: 90.05 ± 0.14%, and focal loss: 0.3982). For the fMRI branch, we achieved an accuracy of 88.93 ± 0.15% (precision: 89.78 ± 0.18%, recall: 88.29 ± 0.20%, F1-score: 89.03 ± 0.17%, and focal loss of 0.4437). These outcomes affirm the superior generalization and robustness of the proposed framework for integrating structural and functional brain representations to achieve accurate autism classification.</p>
	]]></content:encoded>

	<dc:title>A Hierarchical Multi-View Deep Learning Framework for Autism Classification Using Structural and Functional MRI</dc:title>
			<dc:creator>Nayif Mohammed Hammash</dc:creator>
			<dc:creator>Mohammed Chachan Younis</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030109</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-03-04</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-03-04</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>109</prism:startingPage>
		<prism:doi>10.3390/jimaging12030109</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/109</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/108">

	<title>J. Imaging, Vol. 12, Pages 108: Optimizing Radiographic Diagnosis Through Signal-Balanced Convolutional Models</title>
	<link>https://www.mdpi.com/2313-433X/12/3/108</link>
	<description>Accurate interpretation of chest radiographs is central to the early diagnosis and management of pulmonary disorders. This study introduces an explainable deep learning framework that integrates biomedical signal fidelity analysis with transfer learning to enhance diagnostic reliability and transparency. Using the publicly available COVID-19 Radiography Dataset (21,165 chest X-ray images across four classes: COVID-19, Viral Pneumonia, Lung Opacity, and Normal), three architectures, namely baseline Convolutional Neural Network (CNN), ResNet-50, and EfficientNetB3, were trained and evaluated under varied class-balancing and hyperparameter configurations. Signal preservation was quantitatively verified using the Structural Similarity Index Measure (SSIM = 0.93 ± 0.02), ensuring that preprocessing retained key diagnostic features. Among all models, ResNet-50 achieved the highest classification accuracy (93.7%) and macro-AUC = 0.97 (class-balanced), whereas EfficientNetB3 demonstrated superior generalization with reduced parameter overhead. Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations confirmed anatomically coherent activations aligned with pathological lung regions, substantiating clinical interpretability. The integration of signal fidelity metrics with explainable deep learning presents a reproducible and computationally efficient framework for medical image analysis. These findings highlight the potential of signal-aware transfer learning to support reliable, transparent, and resource-efficient diagnostic decision-making in radiology and other imaging-based medical domains.</description>
	<pubDate>2026-03-04</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 108: Optimizing Radiographic Diagnosis Through Signal-Balanced Convolutional Models</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/108">doi: 10.3390/jimaging12030108</a></p>
	<p>Authors:
		Sakina Juzar Neemuchwala
		Raja Hashim Ali
		Qamar Abbas
		Talha Ali Khan
		Ambreen Shahnaz
		Iftikhar Ahmed
		</p>
	<p>Accurate interpretation of chest radiographs is central to the early diagnosis and management of pulmonary disorders. This study introduces an explainable deep learning framework that integrates biomedical signal fidelity analysis with transfer learning to enhance diagnostic reliability and transparency. Using the publicly available COVID-19 Radiography Dataset (21,165 chest X-ray images across four classes: COVID-19, Viral Pneumonia, Lung Opacity, and Normal), three architectures, namely baseline Convolutional Neural Network (CNN), ResNet-50, and EfficientNetB3, were trained and evaluated under varied class-balancing and hyperparameter configurations. Signal preservation was quantitatively verified using the Structural Similarity Index Measure (SSIM = 0.93 ± 0.02), ensuring that preprocessing retained key diagnostic features. Among all models, ResNet-50 achieved the highest classification accuracy (93.7%) and macro-AUC = 0.97 (class-balanced), whereas EfficientNetB3 demonstrated superior generalization with reduced parameter overhead. Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations confirmed anatomically coherent activations aligned with pathological lung regions, substantiating clinical interpretability. The integration of signal fidelity metrics with explainable deep learning presents a reproducible and computationally efficient framework for medical image analysis. These findings highlight the potential of signal-aware transfer learning to support reliable, transparent, and resource-efficient diagnostic decision-making in radiology and other imaging-based medical domains.</p>
	]]></content:encoded>

	<dc:title>Optimizing Radiographic Diagnosis Through Signal-Balanced Convolutional Models</dc:title>
			<dc:creator>Sakina Juzar Neemuchwala</dc:creator>
			<dc:creator>Raja Hashim Ali</dc:creator>
			<dc:creator>Qamar Abbas</dc:creator>
			<dc:creator>Talha Ali Khan</dc:creator>
			<dc:creator>Ambreen Shahnaz</dc:creator>
			<dc:creator>Iftikhar Ahmed</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030108</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-03-04</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-03-04</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>108</prism:startingPage>
		<prism:doi>10.3390/jimaging12030108</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/108</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/107">

	<title>J. Imaging, Vol. 12, Pages 107: Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory</title>
	<link>https://www.mdpi.com/2313-433X/12/3/107</link>
	<description>The recent advancements in technologies such as artificial intelligence (AI), computer vision (CV), and Internet of Things (IoT) have significantly extended various fields, particularly in surveillance systems. These innovations enable real-time facial recognition processing, enhancing security and ensuring safety. However, mobile robots are commonly employed in surveillance systems to handle risky tasks that are beyond human capability. In this paper, we present a prototype of a cost-effective mobile surveillance robot built on the Raspberry Pi 4, designed for integration into various industrial environments. This smart robot detects intruders using IoT and face recognition technology. The proposed system is equipped with a passive infrared (PIR) sensor and a camera for capturing live-streaming video and photos, which are sent to the control room through IoT technology. Additionally, the system uses face recognition algorithms to differentiate between company staff and potential intruders. The face recognition method combines high-order statistical features and evidence theory to improve facial recognition accuracy and robustness. High-order statistical features are used to capture complex patterns in facial images, enhancing discrimination between individuals. Evidence theory is employed to integrate multiple information sources, allowing for better decision-making under uncertainty. This approach effectively addresses challenges such as variations in lighting, facial expressions, and occlusions, resulting in a more reliable and accurate face recognition system. When the system detects an unfamiliar individual, it sends out alert notifications and emails to the control room with the captured picture using IoT. A web interface has also been set up to control the robot from a distance through a Wi-Fi connection. The proposed face recognition method is evaluated, and a comparative analysis with existing techniques is conducted. Experimental results with 400 test images of 40 individuals demonstrate the effectiveness of combining various attribute images in improving human face recognition performance. Experimental results indicate that the algorithm can identify human faces with an accuracy of 98.63%.</description>
	<pubDate>2026-02-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 107: Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/107">doi: 10.3390/jimaging12030107</a></p>
	<p>Authors:
		Slim Ben Chaabane
		Rafika Harrabi
		Anas Bushnag
		Hassene Seddik
		</p>
	<p>The recent advancements in technologies such as artificial intelligence (AI), computer vision (CV), and Internet of Things (IoT) have significantly extended various fields, particularly in surveillance systems. These innovations enable real-time facial recognition processing, enhancing security and ensuring safety. However, mobile robots are commonly employed in surveillance systems to handle risky tasks that are beyond human capability. In this paper, we present a prototype of a cost-effective mobile surveillance robot built on the Raspberry Pi 4, designed for integration into various industrial environments. This smart robot detects intruders using IoT and face recognition technology. The proposed system is equipped with a passive infrared (PIR) sensor and a camera for capturing live-streaming video and photos, which are sent to the control room through IoT technology. Additionally, the system uses face recognition algorithms to differentiate between company staff and potential intruders. The face recognition method combines high-order statistical features and evidence theory to improve facial recognition accuracy and robustness. High-order statistical features are used to capture complex patterns in facial images, enhancing discrimination between individuals. Evidence theory is employed to integrate multiple information sources, allowing for better decision-making under uncertainty. This approach effectively addresses challenges such as variations in lighting, facial expressions, and occlusions, resulting in a more reliable and accurate face recognition system. When the system detects an unfamiliar individual, it sends out alert notifications and emails to the control room with the captured picture using IoT. A web interface has also been set up to control the robot from a distance through a Wi-Fi connection. The proposed face recognition method is evaluated, and a comparative analysis with existing techniques is conducted. Experimental results with 400 test images of 40 individuals demonstrate the effectiveness of combining various attribute images in improving human face recognition performance. Experimental results indicate that the algorithm can identify human faces with an accuracy of 98.63%.</p>
	]]></content:encoded>

	<dc:title>Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory</dc:title>
			<dc:creator>Slim Ben Chaabane</dc:creator>
			<dc:creator>Rafika Harrabi</dc:creator>
			<dc:creator>Anas Bushnag</dc:creator>
			<dc:creator>Hassene Seddik</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030107</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>107</prism:startingPage>
		<prism:doi>10.3390/jimaging12030107</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/107</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/106">

	<title>J. Imaging, Vol. 12, Pages 106: Vision–Language Models for Transmission Line Fault Detection: A New Approach for Grid Reliability and Optimization</title>
	<link>https://www.mdpi.com/2313-433X/12/3/106</link>
	<description>Reliable fault detection along transmission corridors is essential for preventing small defects from developing into long outages and costly emergency operations. This study aims to improve the field reliability of an open vocabulary vision language backbone without retraining the large model in an end-to-end manner. The work focuses on four operational fault classes in multi-region corridor imagery collected during routine inspections and uses a Florence-2 vision language model as the base recognizer. On top of this backbone, three domain-specific components are introduced. A subclass-aware fusion scheme keeps probability mass within the active parent concept so that insulator icing and conductor icing produce stable, action-oriented decisions. A Power-Line Focus Then Crop normalization uses an attention-guided corridor window together with isotropic resizing so that thin conductors and small fittings remain visible in the processed image. A corridor geo prior reduces scores as the distance from the mapped centerline increases and in this way suppresses detections that lie outside the corridor. All methods are evaluated under a shared preprocessing and scoring pipeline in training-free and parameter-efficient tuning modes. Experiments on unseen regions show higher accuracy for skinny and low-contrast faults, fewer false alarms outside the right-of-way, and improved score calibration in the confidence range used for triage, while keeping throughput and memory usage suitable for unmanned aerial vehicles and substation edge devices.</description>
	<pubDate>2026-02-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 106: Vision–Language Models for Transmission Line Fault Detection: A New Approach for Grid Reliability and Optimization</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/106">doi: 10.3390/jimaging12030106</a></p>
	<p>Authors:
		Runle Yu
		Lihao Mai
		Yang Weng
		Qiushi Cui
		Guochang Xu
		Pengliang Ren
		</p>
	<p>Reliable fault detection along transmission corridors is essential for preventing small defects from developing into long outages and costly emergency operations. This study aims to improve the field reliability of an open vocabulary vision language backbone without retraining the large model in an end-to-end manner. The work focuses on four operational fault classes in multi-region corridor imagery collected during routine inspections and uses a Florence-2 vision language model as the base recognizer. On top of this backbone, three domain-specific components are introduced. A subclass-aware fusion scheme keeps probability mass within the active parent concept so that insulator icing and conductor icing produce stable, action-oriented decisions. A Power-Line Focus Then Crop normalization uses an attention-guided corridor window together with isotropic resizing so that thin conductors and small fittings remain visible in the processed image. A corridor geo prior reduces scores as the distance from the mapped centerline increases and in this way suppresses detections that lie outside the corridor. All methods are evaluated under a shared preprocessing and scoring pipeline in training-free and parameter-efficient tuning modes. Experiments on unseen regions show higher accuracy for skinny and low-contrast faults, fewer false alarms outside the right-of-way, and improved score calibration in the confidence range used for triage, while keeping throughput and memory usage suitable for unmanned aerial vehicles and substation edge devices.</p>
	]]></content:encoded>

	<dc:title>Vision–Language Models for Transmission Line Fault Detection: A New Approach for Grid Reliability and Optimization</dc:title>
			<dc:creator>Runle Yu</dc:creator>
			<dc:creator>Lihao Mai</dc:creator>
			<dc:creator>Yang Weng</dc:creator>
			<dc:creator>Qiushi Cui</dc:creator>
			<dc:creator>Guochang Xu</dc:creator>
			<dc:creator>Pengliang Ren</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030106</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>106</prism:startingPage>
		<prism:doi>10.3390/jimaging12030106</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/106</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/105">

	<title>J. Imaging, Vol. 12, Pages 105: Stereo Gaussian Splatting with Adaptive Scene Depth Estimation for Semantic Mapping</title>
	<link>https://www.mdpi.com/2313-433X/12/3/105</link>
	<description>Simultaneous Localization and Mapping (SLAM) is a fundamental capability in robotics and augmented reality. However, achieving accurate geometric reconstruction and consistent semantic understanding in complex environments remains challenging. Although recent neural implicit representations have improved reconstruction quality, they often suffer from high computational cost and the forgetting phenomenon during online mapping. In this paper, we propose StereoGS-SLAM, a stereo semantic SLAM framework based on 3D Gaussian Splatting (3DGS) for explicit scene representation. Unlike existing approaches, StereoGS-SLAM operates on passive RGB stereo inputs without requiring active depth sensors. An adaptive depth estimation strategy is introduced to dynamically refine Gaussian scales based on real-time stereo depth estimates, ensuring robust and scale-consistent reconstruction. In addition, we propose a hybrid keyframe selection strategy that integrates motion-aware selection with lightweight random sampling to improve keyframe diversity and maintain stable, real-time optimization. Experimental evaluations demonstrate that StereoGS-SLAM achieves consistent and competitive localization, rendering, and semantic reconstruction performance compared with recent 3DGS-based SLAM systems.</description>
	<pubDate>2026-02-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 105: Stereo Gaussian Splatting with Adaptive Scene Depth Estimation for Semantic Mapping</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/105">doi: 10.3390/jimaging12030105</a></p>
	<p>Authors:
		Chenhui Fu
		Jiangang Lu
		</p>
	<p>Simultaneous Localization and Mapping (SLAM) is a fundamental capability in robotics and augmented reality. However, achieving accurate geometric reconstruction and consistent semantic understanding in complex environments remains challenging. Although recent neural implicit representations have improved reconstruction quality, they often suffer from high computational cost and the forgetting phenomenon during online mapping. In this paper, we propose StereoGS-SLAM, a stereo semantic SLAM framework based on 3D Gaussian Splatting (3DGS) for explicit scene representation. Unlike existing approaches, StereoGS-SLAM operates on passive RGB stereo inputs without requiring active depth sensors. An adaptive depth estimation strategy is introduced to dynamically refine Gaussian scales based on real-time stereo depth estimates, ensuring robust and scale-consistent reconstruction. In addition, we propose a hybrid keyframe selection strategy that integrates motion-aware selection with lightweight random sampling to improve keyframe diversity and maintain stable, real-time optimization. Experimental evaluations demonstrate that StereoGS-SLAM achieves consistent and competitive localization, rendering, and semantic reconstruction performance compared with recent 3DGS-based SLAM systems.</p>
	]]></content:encoded>

	<dc:title>Stereo Gaussian Splatting with Adaptive Scene Depth Estimation for Semantic Mapping</dc:title>
			<dc:creator>Chenhui Fu</dc:creator>
			<dc:creator>Jiangang Lu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030105</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>105</prism:startingPage>
		<prism:doi>10.3390/jimaging12030105</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/105</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/104">

	<title>J. Imaging, Vol. 12, Pages 104: The Retina as a Proxy for Brain Neurodegeneration: A Narrative Review on OCT-Based Retinal Imaging in the Early Detection of Alzheimer’s and Parkinson’s Disease</title>
	<link>https://www.mdpi.com/2313-433X/12/3/104</link>
	<description>Neurodegenerative diseases, including Alzheimer’s disease (AD) and Parkinson’s disease (PD), are major causes of cognitive and motor decline, yet early diagnosis remains challenging due to asymptomatic phases and limited non-invasive biomarkers. This narrative review systematically synthesized studies on retinal imaging in AD and PD. Published studies were identified through searches of PubMed, MEDLINE, Google Scholar, and reference lists, focusing on Optical Coherence Tomography (OCT), OCT Angiography (OCTA), and Spectral-Domain OCT (SD-OCT) assessing retinal structural and vascular changes. Data were extracted on retinal layer thickness, vascular parameters, and diagnostic metrics. Findings indicate that both diseases consistently exhibit thinning of inner retinal layers, particularly the retinal nerve fiber layer (RNFL) and ganglion cell–inner plexiform layer (GCIPL). In AD, studies reported progressive inner retinal thinning across disease stages, sometimes accompanied by outer retinal and retinal pigment epithelium changes. In PD, thinning was observed predominantly in RNFL and GCIPL, correlating with disease duration and motor severity. Microvascular alterations were described in both disorders, with disease-specific spatial patterns reported across studies. Overall, retinal imaging emerges as a non-invasive, high-resolution, and cost-effective tool for early detection, differential assessment, and longitudinal monitoring of neurodegenerative diseases. These findings support the translation of retinal biomarkers into clinical practice for improved disease management.</description>
	<pubDate>2026-02-27</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 104: The Retina as a Proxy for Brain Neurodegeneration: A Narrative Review on OCT-Based Retinal Imaging in the Early Detection of Alzheimer’s and Parkinson’s Disease</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/104">doi: 10.3390/jimaging12030104</a></p>
	<p>Authors:
		Ouafa Sijilmassi
		</p>
	<p>Neurodegenerative diseases, including Alzheimer’s disease (AD) and Parkinson’s disease (PD), are major causes of cognitive and motor decline, yet early diagnosis remains challenging due to asymptomatic phases and limited non-invasive biomarkers. This narrative review systematically synthesized studies on retinal imaging in AD and PD. Published studies were identified through searches of PubMed, MEDLINE, Google Scholar, and reference lists, focusing on Optical Coherence Tomography (OCT), OCT Angiography (OCTA), and Spectral-Domain OCT (SD-OCT) assessing retinal structural and vascular changes. Data were extracted on retinal layer thickness, vascular parameters, and diagnostic metrics. Findings indicate that both diseases consistently exhibit thinning of inner retinal layers, particularly the retinal nerve fiber layer (RNFL) and ganglion cell–inner plexiform layer (GCIPL). In AD, studies reported progressive inner retinal thinning across disease stages, sometimes accompanied by outer retinal and retinal pigment epithelium changes. In PD, thinning was observed predominantly in RNFL and GCIPL, correlating with disease duration and motor severity. Microvascular alterations were described in both disorders, with disease-specific spatial patterns reported across studies. Overall, retinal imaging emerges as a non-invasive, high-resolution, and cost-effective tool for early detection, differential assessment, and longitudinal monitoring of neurodegenerative diseases. These findings support the translation of retinal biomarkers into clinical practice for improved disease management.</p>
	]]></content:encoded>

	<dc:title>The Retina as a Proxy for Brain Neurodegeneration: A Narrative Review on OCT-Based Retinal Imaging in the Early Detection of Alzheimer’s and Parkinson’s Disease</dc:title>
			<dc:creator>Ouafa Sijilmassi</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030104</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-27</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-27</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Review</prism:section>
	<prism:startingPage>104</prism:startingPage>
		<prism:doi>10.3390/jimaging12030104</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/104</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/103">

	<title>J. Imaging, Vol. 12, Pages 103: Restoration of Non-Uniform Motion-Blurred Star Images Based on Dynamic Strip Attention</title>
	<link>https://www.mdpi.com/2313-433X/12/3/103</link>
	<description>When capturing star images in long-exposure mode, due to the relative motion between stars and space objects and the observation camera, strip tailings with different directions and lengths will be formed, resulting in a serious decline in image quality and inaccurate centroid positioning. Traditional methods for restoring star images are prone to ringing effects and cannot restore the non-uniformly blurred star images. Aiming at this problem, this paper proposes a star image restoration network based on a dynamic strip attention mechanism. Firstly, a Multi-scale Dynamic Strip Pooling Module is designed to adaptively extract blurred features of different lengths and directions by dynamically adjusting the strip convolution. After that, a Multi-scale Feature Fusion Module is designed to fuse multi-level features to reduce the loss of image details of stars and space objects in the image. Experimental results demonstrate that the proposed method achieves a PSNR of 84.08 and an SSIM of 0.9928 on the 16-bit simulated dataset, outperforming both traditional methods and other deep learning-based approaches. Specifically, the recognition accuracy of star points is increased by 174% in comparison with unprocessed images. Furthermore, this paper validates the network using the real-world dataset spotGEO, and the results indicate that the average number of successfully recognized star points is increased by 57% compared to direct processing of the original images.</description>
	<pubDate>2026-02-27</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 103: Restoration of Non-Uniform Motion-Blurred Star Images Based on Dynamic Strip Attention</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/103">doi: 10.3390/jimaging12030103</a></p>
	<p>Authors:
		Jixin Han
		Zhaodong Niu
		Jun He
		</p>
	<p>When capturing star images in long-exposure mode, due to the relative motion between stars and space objects and the observation camera, strip tailings with different directions and lengths will be formed, resulting in a serious decline in image quality and inaccurate centroid positioning. Traditional methods for restoring star images are prone to ringing effects and cannot restore the non-uniformly blurred star images. Aiming at this problem, this paper proposes a star image restoration network based on a dynamic strip attention mechanism. Firstly, a Multi-scale Dynamic Strip Pooling Module is designed to adaptively extract blurred features of different lengths and directions by dynamically adjusting the strip convolution. After that, a Multi-scale Feature Fusion Module is designed to fuse multi-level features to reduce the loss of image details of stars and space objects in the image. Experimental results demonstrate that the proposed method achieves a PSNR of 84.08 and an SSIM of 0.9928 on the 16-bit simulated dataset, outperforming both traditional methods and other deep learning-based approaches. Specifically, the recognition accuracy of star points is increased by 174% in comparison with unprocessed images. Furthermore, this paper validates the network using the real-world dataset spotGEO, and the results indicate that the average number of successfully recognized star points is increased by 57% compared to direct processing of the original images.</p>
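	<p>The figures above (PSNR on 16-bit data and SSIM) are standard image-quality measures. The sketch below shows how they are typically computed for 16-bit frames with scikit-image; the synthetic images, array names, and noise model are assumptions for illustration and are not the authors' evaluation code.</p>
	<pre><code>
# Illustrative sketch only: computing PSNR/SSIM for 16-bit images
# with scikit-image; array names and synthetic data are assumptions,
# not taken from the paper.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_restoration(restored, reference):
    """Return (PSNR, SSIM) for two uint16 images of equal shape."""
    restored = restored.astype(np.uint16)
    reference = reference.astype(np.uint16)
    # data_range must reflect the 16-bit dynamic range, otherwise the
    # reported PSNR/SSIM values are not comparable across bit depths.
    psnr = peak_signal_noise_ratio(reference, restored, data_range=65535)
    ssim = structural_similarity(reference, restored, data_range=65535)
    return psnr, ssim

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.integers(0, 65536, size=(256, 256), dtype=np.uint16)
    noisy = np.clip(truth.astype(np.int32) + rng.normal(0, 50, truth.shape),
                    0, 65535).astype(np.uint16)
    print(evaluate_restoration(noisy, truth))
</code></pre>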
	]]></content:encoded>

	<dc:title>Restoration of Non-Uniform Motion-Blurred Star Images Based on Dynamic Strip Attention</dc:title>
			<dc:creator>Jixin Han</dc:creator>
			<dc:creator>Zhaodong Niu</dc:creator>
			<dc:creator>Jun He</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030103</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-27</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-27</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>103</prism:startingPage>
		<prism:doi>10.3390/jimaging12030103</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/103</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/102">

	<title>J. Imaging, Vol. 12, Pages 102: Fine-Grained Age-Class Identification of Moso Bamboo Using an Improved Lightweight YOLO11 Model</title>
	<link>https://www.mdpi.com/2313-433X/12/3/102</link>
	<description>Accurate identification of moso bamboo (Phyllostachys edulis) age classes is essential for effective forestry resource management, yet existing methods often struggle to achieve a satisfactory balance between accuracy and computational efficiency under complex field conditions. To address this challenge, this study proposes a lightweight object detection model, termed YOLO11-GCR, for fine-grained moso bamboo age-class classification based on close-range imagery. The proposed approach builds upon the YOLO11 framework and incorporates Ghost convolution, the Convolutional Block Attention Module (CBAM), and a Receptive Field Block (RFB) to reduce model complexity, enhance discriminative feature representation, and improve sensitivity to subtle texture variations among age classes. A dataset consisting of 9538 annotated bamboo culm images covering four age classes (I-du to IV-du) was constructed and divided into training, validation, and independent test sets with strict spatiotemporal separation. Experimental results indicate that YOLO11-GCR achieves robust detection performance with a lightweight architecture of 2.62 &amp;amp;times; 106 parameters and 6.2 GFLOPs, yielding an mAP@0.5 of 0.913 and an mAP@0.5&amp;amp;ndash;0.95 of 0.895 on the independent test set. Notably, the model demonstrates improved classification stability for visually similar age classes, such as II-du and III-du. Overall, this study presents an efficient and practical imaging-based solution for automated moso bamboo age-class recognition in complex natural environments.</description>
	<pubDate>2026-02-27</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 102: Fine-Grained Age-Class Identification of Moso Bamboo Using an Improved Lightweight YOLO11 Model</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/102">doi: 10.3390/jimaging12030102</a></p>
	<p>Authors:
		Yingbin Zhang
		Xinhuang Zhang
		Zhichao Cai
		Xi He
		Shuwei Chen
		Zhengxuan Lai
		Kunyong Yu
		Riwen Lai
		</p>
	<p>Accurate identification of moso bamboo (Phyllostachys edulis) age classes is essential for effective forestry resource management, yet existing methods often struggle to achieve a satisfactory balance between accuracy and computational efficiency under complex field conditions. To address this challenge, this study proposes a lightweight object detection model, termed YOLO11-GCR, for fine-grained moso bamboo age-class classification based on close-range imagery. The proposed approach builds upon the YOLO11 framework and incorporates Ghost convolution, the Convolutional Block Attention Module (CBAM), and a Receptive Field Block (RFB) to reduce model complexity, enhance discriminative feature representation, and improve sensitivity to subtle texture variations among age classes. A dataset consisting of 9538 annotated bamboo culm images covering four age classes (I-du to IV-du) was constructed and divided into training, validation, and independent test sets with strict spatiotemporal separation. Experimental results indicate that YOLO11-GCR achieves robust detection performance with a lightweight architecture of 2.62 &amp;amp;times; 106 parameters and 6.2 GFLOPs, yielding an mAP@0.5 of 0.913 and an mAP@0.5&amp;amp;ndash;0.95 of 0.895 on the independent test set. Notably, the model demonstrates improved classification stability for visually similar age classes, such as II-du and III-du. Overall, this study presents an efficient and practical imaging-based solution for automated moso bamboo age-class recognition in complex natural environments.</p>
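	<p>Ghost convolution, one of the components credited above with reducing model complexity, replaces part of an ordinary convolution with cheap depthwise operations. The sketch below illustrates that general idea in PyTorch; the kernel sizes, channel counts, and activation are assumptions and do not reproduce the YOLO11-GCR implementation.</p>
	<pre><code>
# Illustrative sketch of a Ghost-style convolution block: a reduced
# "primary" convolution plus cheap depthwise convolutions, concatenated
# to reach the requested channel count. Hyperparameters below are
# assumptions, not the paper's YOLO11-GCR configuration.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3, ratio=2):
        super().__init__()
        primary_ch = out_ch // ratio          # channels from the costly conv
        cheap_ch = out_ch - primary_ch        # channels from cheap depthwise ops
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, cheap_ch, kernel, padding=kernel // 2,
                      groups=primary_ch, bias=False),   # depthwise
            nn.BatchNorm2d(cheap_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)   # torch.Size([1, 128, 80, 80])
</code></pre>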
	]]></content:encoded>

	<dc:title>Fine-Grained Age-Class Identification of Moso Bamboo Using an Improved Lightweight YOLO11 Model</dc:title>
			<dc:creator>Yingbin Zhang</dc:creator>
			<dc:creator>Xinhuang Zhang</dc:creator>
			<dc:creator>Zhichao Cai</dc:creator>
			<dc:creator>Xi He</dc:creator>
			<dc:creator>Shuwei Chen</dc:creator>
			<dc:creator>Zhengxuan Lai</dc:creator>
			<dc:creator>Kunyong Yu</dc:creator>
			<dc:creator>Riwen Lai</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030102</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-27</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-27</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>102</prism:startingPage>
		<prism:doi>10.3390/jimaging12030102</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/102</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/101">

	<title>J. Imaging, Vol. 12, Pages 101: Lensless Quantitative Phase Imaging with Bayer-Filtered Color Sensors Under Sequential RGB-LED Illumination</title>
	<link>https://www.mdpi.com/2313-433X/12/3/101</link>
	<description>Lensless on-chip microscopy enables high-throughput, wide-FOV imaging; however, the Bayer color filter array (CFA) in standard color sensors spatially multiplexes spectral channels, introducing sub-sampling and spectral crosstalk that degrade phase retrieval. We propose a Wirtinger Poly-Gradient Solver (WPGS) for quantitative phase reconstruction with Bayer-filtered color sensors under sequential Red&amp;amp;ndash;Green&amp;amp;ndash;Blue Light-Emitting Diode (RGB-LED) illumination. The method combines Transport of Intensity Equation (TIE)-based initialization with polychromatic Wirtinger optimization to suppress CFA-induced artifacts and enable pixel super-resolution (PSR). Experiments resolve a 2.76&amp;amp;nbsp;&amp;amp;mu;m linewidth using a 1.85&amp;amp;nbsp;&amp;amp;mu;m pixel-pitch sensor, exceeding the nominal Nyquist limit imposed by pixel sampling. We further demonstrate label-free imaging of HeLa cells and unstained tissue sections, supporting high-throughput digital pathology and offering potential for longitudinal biological observation.</description>
	<pubDate>2026-02-26</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 101: Lensless Quantitative Phase Imaging with Bayer-Filtered Color Sensors Under Sequential RGB-LED Illumination</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/101">doi: 10.3390/jimaging12030101</a></p>
	<p>Authors:
		Jiajia Wu
		Yining Li
		Yuheng Luo
		Leiting Pan
		Pengming Song
		Qiang Xu
		</p>
	<p>Lensless on-chip microscopy enables high-throughput, wide-FOV imaging; however, the Bayer color filter array (CFA) in standard color sensors spatially multiplexes spectral channels, introducing sub-sampling and spectral crosstalk that degrade phase retrieval. We propose a Wirtinger Poly-Gradient Solver (WPGS) for quantitative phase reconstruction with Bayer-filtered color sensors under sequential Red&amp;amp;ndash;Green&amp;amp;ndash;Blue Light-Emitting Diode (RGB-LED) illumination. The method combines Transport of Intensity Equation (TIE)-based initialization with polychromatic Wirtinger optimization to suppress CFA-induced artifacts and enable pixel super-resolution (PSR). Experiments resolve a 2.76&amp;amp;nbsp;&amp;amp;mu;m linewidth using a 1.85&amp;amp;nbsp;&amp;amp;mu;m pixel-pitch sensor, exceeding the nominal Nyquist limit imposed by pixel sampling. We further demonstrate label-free imaging of HeLa cells and unstained tissue sections, supporting high-throughput digital pathology and offering potential for longitudinal biological observation.</p>
	]]></content:encoded>

	<dc:title>Lensless Quantitative Phase Imaging with Bayer-Filtered Color Sensors Under Sequential RGB-LED Illumination</dc:title>
			<dc:creator>Jiajia Wu</dc:creator>
			<dc:creator>Yining Li</dc:creator>
			<dc:creator>Yuheng Luo</dc:creator>
			<dc:creator>Leiting Pan</dc:creator>
			<dc:creator>Pengming Song</dc:creator>
			<dc:creator>Qiang Xu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030101</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-26</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-26</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Communication</prism:section>
	<prism:startingPage>101</prism:startingPage>
		<prism:doi>10.3390/jimaging12030101</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/101</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/100">

	<title>J. Imaging, Vol. 12, Pages 100: The Augmented Cytopathologist: A Conceptual Exploratory Narrative Review on Immersive and Vision&amp;ndash;Language Models Tools in Digital Pathology</title>
	<link>https://www.mdpi.com/2313-433X/12/3/100</link>
	<description>Emerging digital technologies, including immersive environments (VR/AR/XR) and Vision&amp;amp;ndash;Language Models (VLMs), have the potential to reshape digital pathology and medical imaging. While immersive tools can enhance spatial visualization and procedural training, VLM-based copilots offer cognitive and workflow support. Their combined impact on cytopathology remains largely conceptual and preclinical. This Conceptual Exploratory Narrative Review (CENR) examines how immersive technologies and VLM-based copilots may jointly influence cytopathologists&amp;amp;rsquo; professional workflow, training, and diagnostic processes, introducing the notion of the &amp;amp;ldquo;augmented cytopathologist.&amp;amp;rdquo; A structured exploratory approach integrated peer-reviewed literature, position papers, preprints, gray literature (technical reports, white papers, conference abstracts, blogs), and cross-disciplinary perspectives. Database searches (PubMed, Web of Science, Scopus) confirmed a limited number of studies addressing immersive or AI-assisted cytopathology imaging. Thematic analysis focused on four conceptual dimensions: (1) technological capabilities and maturity; (2) workflow and educational applications; (3) professional implications and cytopathologist role; and (4) responsible use of LLMs and VLMs as supportive tools. This approach emphasizes interpretation of emerging trends over aggregation of empirical data, enabling conceptual synthesis of early-stage implementations and perspectives in the field. Immersive technologies facilitate three-dimensional visualization, procedural skill development, and collaborative engagement, whereas VLMs support report generation, literature retrieval, and decision guidance. Together, they offer a synergistic model for perceptual and cognitive augmentation. Key challenges include technical maturity, interoperability, workflow integration, regulatory compliance, and ethical oversight. Figures illustrate representative examples of (1) remote collaborative immersive evaluation and (2) integration of immersive visualization with VLM-based copilots, highlighting potential applications in training and workflow support. The CENR underscores the potential of combining immersive tools and AI copilots to support cytopathology, particularly for education, workflow efficiency, and cognitive augmentation. Adoption should be incremental and carefully governed, emphasizing augmentative rather than transformative use. Future research should focus on clinical validation, scalable integration, and regulatory and ethical frameworks to realize the concept of the augmented cytopathologist in practice.</description>
	<pubDate>2026-02-26</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 100: The Augmented Cytopathologist: A Conceptual Exploratory Narrative Review on Immersive and Vision&amp;ndash;Language Models Tools in Digital Pathology</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/100">doi: 10.3390/jimaging12030100</a></p>
	<p>Authors:
		Enrico Giarnieri
		Andrea Lastrucci
		Alberto Ricci
		Pierdonato Bruno
		Daniele Giansanti
		</p>
	<p>Emerging digital technologies, including immersive environments (VR/AR/XR) and Vision&amp;amp;ndash;Language Models (VLMs), have the potential to reshape digital pathology and medical imaging. While immersive tools can enhance spatial visualization and procedural training, VLM-based copilots offer cognitive and workflow support. Their combined impact on cytopathology remains largely conceptual and preclinical. This Conceptual Exploratory Narrative Review (CENR) examines how immersive technologies and VLM-based copilots may jointly influence cytopathologists&amp;amp;rsquo; professional workflow, training, and diagnostic processes, introducing the notion of the &amp;amp;ldquo;augmented cytopathologist.&amp;amp;rdquo; A structured exploratory approach integrated peer-reviewed literature, position papers, preprints, gray literature (technical reports, white papers, conference abstracts, blogs), and cross-disciplinary perspectives. Database searches (PubMed, Web of Science, Scopus) confirmed a limited number of studies addressing immersive or AI-assisted cytopathology imaging. Thematic analysis focused on four conceptual dimensions: (1) technological capabilities and maturity; (2) workflow and educational applications; (3) professional implications and cytopathologist role; and (4) responsible use of LLMs and VLMs as supportive tools. This approach emphasizes interpretation of emerging trends over aggregation of empirical data, enabling conceptual synthesis of early-stage implementations and perspectives in the field. Immersive technologies facilitate three-dimensional visualization, procedural skill development, and collaborative engagement, whereas VLMs support report generation, literature retrieval, and decision guidance. Together, they offer a synergistic model for perceptual and cognitive augmentation. Key challenges include technical maturity, interoperability, workflow integration, regulatory compliance, and ethical oversight. Figures illustrate representative examples of (1) remote collaborative immersive evaluation and (2) integration of immersive visualization with VLM-based copilots, highlighting potential applications in training and workflow support. The CENR underscores the potential of combining immersive tools and AI copilots to support cytopathology, particularly for education, workflow efficiency, and cognitive augmentation. Adoption should be incremental and carefully governed, emphasizing augmentative rather than transformative use. Future research should focus on clinical validation, scalable integration, and regulatory and ethical frameworks to realize the concept of the augmented cytopathologist in practice.</p>
	]]></content:encoded>

	<dc:title>The Augmented Cytopathologist: A Conceptual Exploratory Narrative Review on Immersive and Vision&amp;ndash;Language Models Tools in Digital Pathology</dc:title>
			<dc:creator>Enrico Giarnieri</dc:creator>
			<dc:creator>Andrea Lastrucci</dc:creator>
			<dc:creator>Alberto Ricci</dc:creator>
			<dc:creator>Pierdonato Bruno</dc:creator>
			<dc:creator>Daniele Giansanti</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030100</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-26</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-26</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Review</prism:section>
	<prism:startingPage>100</prism:startingPage>
		<prism:doi>10.3390/jimaging12030100</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/100</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/99">

	<title>J. Imaging, Vol. 12, Pages 99: Design and Development of an Automated Pipeline for Medical Hyperspectral Image Acquisition, Processing, and Fusion</title>
	<link>https://www.mdpi.com/2313-433X/12/3/99</link>
	<description>Automated and comprehensive processing of hyperspectral image data is increasingly important in academic research and medical technology. This study presents an automated processing pipeline that integrates hyperspectral image acquisition, analysis, multimodal fusion, and centralized data management to improve the interpretability of spectral information for biological tissue analysis. The pipeline supports modular hyperspectral data processing, fusion of complementary wavelength ranges, and scalable data storage, and was implemented in Python 3.13.3. The pipeline was evaluated using hyperspectral imaging data acquired from a coronal mouse brain section. Clustering-based analysis and spectral correlation metrics were applied to assess the impact of multimodal data fusion on spectral representation. Clustering of individual modalities yielded silhouette coefficients of 0.5879 for near-infrared data, 0.6020 for mid-infrared data, and 0.6715 for RGB data. Multimodal fusion reduced the silhouette coefficient to 0.5420 and enabled the identification of anatomical structures that were not distinguishable in any single modality. High spectral correlation coefficients exceeding 0.98 confirmed that spectral fidelity was preserved during fusion. These results demonstrate that automated multimodal hyperspectral data fusion can enhance the interpretability of biological tissue despite reduced clustering compactness. The proposed pipeline provides a structured framework for preclinical hyperspectral imaging workflows and supports exploratory biological analysis in medical imaging contexts.</description>
	<pubDate>2026-02-25</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 99: Design and Development of an Automated Pipeline for Medical Hyperspectral Image Acquisition, Processing, and Fusion</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/99">doi: 10.3390/jimaging12030099</a></p>
	<p>Authors:
		Felix Wühler
		Tim Markus Häußermann
		Alessa Rache
		Björn van Marwick
		Carmen Wängler
		Julian Reichwald
		Matthias Rädle
		</p>
	<p>Automated and comprehensive processing of hyperspectral image data is increasingly important in academic research and medical technology. This study presents an automated processing pipeline that integrates hyperspectral image acquisition, analysis, multimodal fusion, and centralized data management to improve the interpretability of spectral information for biological tissue analysis. The pipeline supports modular hyperspectral data processing, fusion of complementary wavelength ranges, and scalable data storage, and was implemented in Python 3.13.3. The pipeline was evaluated using hyperspectral imaging data acquired from a coronal mouse brain section. Clustering-based analysis and spectral correlation metrics were applied to assess the impact of multimodal data fusion on spectral representation. Clustering of individual modalities yielded silhouette coefficients of 0.5879 for near-infrared data, 0.6020 for mid-infrared data, and 0.6715 for RGB data. Multimodal fusion reduced the silhouette coefficient to 0.5420 and enabled the identification of anatomical structures that were not distinguishable in any single modality. High spectral correlation coefficients exceeding 0.98 confirmed that spectral fidelity was preserved during fusion. These results demonstrate that automated multimodal hyperspectral data fusion can enhance the interpretability of biological tissue despite reduced clustering compactness. The proposed pipeline provides a structured framework for preclinical hyperspectral imaging workflows and supports exploratory biological analysis in medical imaging contexts.</p>
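	<p>The silhouette coefficients and spectral correlations cited above are standard measures. The sketch below shows one way to compute them with scikit-learn and NumPy on per-pixel spectra; the synthetic data, cluster count, and variable names are assumptions, not the pipeline's code.</p>
	<pre><code>
# Illustrative sketch: silhouette coefficient of a k-means clustering and
# a spectral correlation check between an original and a fused spectrum.
# Data shapes, k, and variable names are assumptions, not the pipeline's code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
spectra = rng.random((5000, 120))        # 5000 pixels x 120 spectral bands

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(spectra)
print("silhouette:", silhouette_score(spectra, labels))

# Spectral fidelity after fusion: Pearson correlation between the original
# band values of one pixel and the corresponding values in the fused cube.
original = spectra[0]
fused = original + rng.normal(0.0, 0.01, original.shape)
print("correlation:", np.corrcoef(original, fused)[0, 1])
</code></pre>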
	]]></content:encoded>

	<dc:title>Design and Development of an Automated Pipeline for Medical Hyperspectral Image Acquisition, Processing, and Fusion</dc:title>
			<dc:creator>Felix Wühler</dc:creator>
			<dc:creator>Tim Markus Häußermann</dc:creator>
			<dc:creator>Alessa Rache</dc:creator>
			<dc:creator>Björn van Marwick</dc:creator>
			<dc:creator>Carmen Wängler</dc:creator>
			<dc:creator>Julian Reichwald</dc:creator>
			<dc:creator>Matthias Rädle</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030099</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-25</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-25</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>99</prism:startingPage>
		<prism:doi>10.3390/jimaging12030099</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/99</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/98">

	<title>J. Imaging, Vol. 12, Pages 98: Hybrid Vision Transformer&amp;ndash;CNN Framework for Alzheimer&amp;rsquo;s Disease Cell Type Classification: A Comparative Study with Vision&amp;ndash;Language Models</title>
	<link>https://www.mdpi.com/2313-433X/12/3/98</link>
	<description>Accurate identification of Alzheimer&amp;amp;rsquo;s disease (AD)-related cellular characteristics from microscopy images is essential for understanding neurodegenerative mechanisms at the cellular level. While most computational approaches focus on macroscopic neuroimaging modalities, cell type classification from microscopy remains relatively underexplored. In this study, we propose a hybrid vision transformer&amp;amp;ndash;convolutional neural network (ViT&amp;amp;ndash;CNN) framework that integrates DeiT-Small and EfficientNet-B7 to classify three AD-related cell types&amp;amp;mdash;astrocytes, cortical neurons, and SH-SY5Y neuroblastoma cells&amp;amp;mdash;from phase-contrast microscopy images. We perform a comparative evaluation against conventional CNN architectures (DenseNet, ResNet, InceptionNet, and MobileNet) and prompt-based multimodal vision&amp;amp;ndash;language models (GPT-5, GPT-4o, and Gemini 2.5-Flash) using zero-shot, few-shot, and chain-of-thought prompting. Experiments conducted with stratified fivefold cross-validation show that the proposed hybrid model achieves a test accuracy of 61.03% and a macro F1 score of 61.85, outperforming standalone CNN baselines and prompt-only LLM approaches under data-limited conditions. These results suggest that combining convolutional inductive biases with transformer-based global context modeling can improve generalization for cellular microscopy classification. While constrained by dataset size and scope, this work serves as a proof of concept and highlights promising directions for future research in domain-specific pretraining, multimodal data integration, and explainable AI for AD-related cellular analysis.</description>
	<pubDate>2026-02-25</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 98: Hybrid Vision Transformer&amp;ndash;CNN Framework for Alzheimer&amp;rsquo;s Disease Cell Type Classification: A Comparative Study with Vision&amp;ndash;Language Models</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/98">doi: 10.3390/jimaging12030098</a></p>
	<p>Authors:
		Md Easin Hasan
		Md Tahmid Hasan Fuad
		Omar Sharif
		Amy Wagler
		</p>
	<p>Accurate identification of Alzheimer&amp;amp;rsquo;s disease (AD)-related cellular characteristics from microscopy images is essential for understanding neurodegenerative mechanisms at the cellular level. While most computational approaches focus on macroscopic neuroimaging modalities, cell type classification from microscopy remains relatively underexplored. In this study, we propose a hybrid vision transformer&amp;amp;ndash;convolutional neural network (ViT&amp;amp;ndash;CNN) framework that integrates DeiT-Small and EfficientNet-B7 to classify three AD-related cell types&amp;amp;mdash;astrocytes, cortical neurons, and SH-SY5Y neuroblastoma cells&amp;amp;mdash;from phase-contrast microscopy images. We perform a comparative evaluation against conventional CNN architectures (DenseNet, ResNet, InceptionNet, and MobileNet) and prompt-based multimodal vision&amp;amp;ndash;language models (GPT-5, GPT-4o, and Gemini 2.5-Flash) using zero-shot, few-shot, and chain-of-thought prompting. Experiments conducted with stratified fivefold cross-validation show that the proposed hybrid model achieves a test accuracy of 61.03% and a macro F1 score of 61.85, outperforming standalone CNN baselines and prompt-only LLM approaches under data-limited conditions. These results suggest that combining convolutional inductive biases with transformer-based global context modeling can improve generalization for cellular microscopy classification. While constrained by dataset size and scope, this work serves as a proof of concept and highlights promising directions for future research in domain-specific pretraining, multimodal data integration, and explainable AI for AD-related cellular analysis.</p>
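	<p>The evaluation protocol named above (stratified fivefold cross-validation with accuracy and macro F1) can be sketched as follows with scikit-learn; the logistic-regression classifier and synthetic features are stand-ins for illustration, since the paper's model is the DeiT/EfficientNet hybrid.</p>
	<pre><code>
# Illustrative sketch of stratified five-fold evaluation with accuracy and
# macro F1, as reported in the abstract. The logistic-regression classifier
# and synthetic features are placeholders, not the paper's hybrid model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=600, n_features=64, n_classes=3,
                           n_informative=10, random_state=0)

accs, f1s = [], []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                           random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    f1s.append(f1_score(y[test_idx], pred, average="macro"))

print("accuracy:", np.mean(accs), "macro F1:", np.mean(f1s))
</code></pre>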
	]]></content:encoded>

	<dc:title>Hybrid Vision Transformer&amp;ndash;CNN Framework for Alzheimer&amp;rsquo;s Disease Cell Type Classification: A Comparative Study with Vision&amp;ndash;Language Models</dc:title>
			<dc:creator>Md Easin Hasan</dc:creator>
			<dc:creator>Md Tahmid Hasan Fuad</dc:creator>
			<dc:creator>Omar Sharif</dc:creator>
			<dc:creator>Amy Wagler</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030098</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-25</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-25</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>98</prism:startingPage>
		<prism:doi>10.3390/jimaging12030098</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/98</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/97">

	<title>J. Imaging, Vol. 12, Pages 97: A Deep Learning-Based Correction for Scanning Radius Errors in Circular-Scan Photoacoustic Tomography</title>
	<link>https://www.mdpi.com/2313-433X/12/3/97</link>
	<description>Circular-Scan photoacoustic tomography (PAT) can provide high-resolution images of optical absorption, but its analytical reconstructions, such as delay-and-sum (DAS), are highly sensitive to scanning radius (SR) inaccuracies, which cause severe geometric distortions and artifacts. In this work, we propose a deep learning framework, termed smooth deconvolution ResNet (SD-ResNet), to correct DAS reconstruction degradation induced by SR errors. SD-ResNet uses an ImageNet-pretrained ResNet-50 encoder and a lightweight deconvolutional decoder with additional smoothing convolutions to suppress checkerboard artifacts and restore fine structural details. A paired training dataset is generated using k-Wave simulations driven by human thoracic computed tomography (CT) slices: for each phantom, radiofrequency data are simulated once, and DAS images reconstructed with the true SR serve as ground truth, whereas images reconstructed with biased SR values serve as inputs. This design provides structurally diverse training samples and enhances generalization. In silico experiments show that SD-ResNet effectively recovers image quality across a range of SR deviations. Phantom experiments with polyethylene microspheres further confirm that the proposed method can substantially reduce artifacts and recover correct source shapes under practical SR mismatches, offering a robust tool for SR-error-resilient PAT imaging.</description>
	<pubDate>2026-02-25</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 97: A Deep Learning-Based Correction for Scanning Radius Errors in Circular-Scan Photoacoustic Tomography</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/97">doi: 10.3390/jimaging12030097</a></p>
	<p>Authors:
		Jie Yin
		Yingjie Feng
		Junjun He
		Min Xie
		Chao Tao
		</p>
	<p>Circular-Scan photoacoustic tomography (PAT) can provide high-resolution images of optical absorption, but its analytical reconstructions, such as delay-and-sum (DAS), are highly sensitive to scanning radius (SR) inaccuracies, which cause severe geometric distortions and artifacts. In this work, we propose a deep learning framework, termed smooth deconvolution ResNet (SD-ResNet), to correct DAS reconstruction degradation induced by SR errors. SD-ResNet uses an ImageNet-pretrained ResNet-50 encoder and a lightweight deconvolutional decoder with additional smoothing convolutions to suppress checkerboard artifacts and restore fine structural details. A paired training dataset is generated using k-Wave simulations driven by human thoracic computed tomography (CT) slices: for each phantom, radiofrequency data are simulated once, and DAS images reconstructed with the true SR serve as ground truth, whereas images reconstructed with biased SR values serve as inputs. This design provides structurally diverse training samples and enhances generalization. In silico experiments show that SD-ResNet effectively recovers image quality across a range of SR deviations. Phantom experiments with polyethylene microspheres further confirm that the proposed method can substantially reduce artifacts and recover correct source shapes under practical SR mismatches, offering a robust tool for SR-error-resilient PAT imaging.</p>
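	<p>Delay-and-sum back-projects each channel along delays fixed by the scanning radius, which is why a biased radius distorts the reconstruction. The NumPy sketch below illustrates a basic circular-scan DAS; the geometry, sampling rate, and sound speed are assumptions for illustration, not the paper's acquisition setup.</p>
	<pre><code>
# Illustrative sketch of circular-scan delay-and-sum (DAS): each pixel sums
# the recorded signals at the time-of-flight from that pixel to every sensor
# position on a circle of radius R. Sensor count, sampling rate, sound speed
# and grid size are assumptions for illustration, not the paper's setup.
import numpy as np

def das_reconstruct(rf, radius, n_sensors, fs, c, grid, extent):
    """rf: (n_sensors, n_samples) signals; grid: pixels per side; extent: half-width (m)."""
    xs = np.linspace(-extent, extent, grid)
    X, Y = np.meshgrid(xs, xs)
    angles = np.linspace(0.0, 2.0 * np.pi, n_sensors, endpoint=False)
    image = np.zeros((grid, grid))
    n_samples = rf.shape[1]
    for k, a in enumerate(angles):
        sx, sy = radius * np.cos(a), radius * np.sin(a)
        dist = np.sqrt((X - sx) ** 2 + (Y - sy) ** 2)     # pixel-to-sensor distance
        idx = np.clip(np.round(dist / c * fs).astype(int), 0, n_samples - 1)
        image += rf[k, idx]                               # back-project this channel
    return image

# Passing a biased radius (e.g. radius * 1.02) to das_reconstruct shifts
# every delay, which blurs or displaces the reconstructed sources.
</code></pre>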
	]]></content:encoded>

	<dc:title>A Deep Learning-Based Correction for Scanning Radius Errors in Circular-Scan Photoacoustic Tomography</dc:title>
			<dc:creator>Jie Yin</dc:creator>
			<dc:creator>Yingjie Feng</dc:creator>
			<dc:creator>Junjun He</dc:creator>
			<dc:creator>Min Xie</dc:creator>
			<dc:creator>Chao Tao</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030097</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-25</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-25</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>97</prism:startingPage>
		<prism:doi>10.3390/jimaging12030097</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/97</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/96">

	<title>J. Imaging, Vol. 12, Pages 96: Recognition, Localization and 3D Geometric Morphology Calculation of Microblind Holes in Complex Backgrounds Based on the Improved YOLOv11 Network and AVC Algorithm</title>
	<link>https://www.mdpi.com/2313-433X/12/3/96</link>
	<description>Microblind hole processing quality inspection, especially accurately identifying microblind hole contour features and precisely detecting 3D and morphological parameters, has always been challenging, especially for accurately identifying those of different sizes, depths, and contour features simultaneously. This poses a great challenge for identifying and localizing microblind hole contours based on machine vision and accurately calculating three-dimensional parameters. This study takes cigarette microblind holes (diameter of 0.1&amp;amp;ndash;0.2 mm, depth of approximately 35 &amp;amp;micro;m) as the research object. It focuses on solving two major challenges: recognizing and localizing microblind hole contours in complex texture backgrounds and accurately calculating their 3D geometric morphology. An improved YOLOv11s model is proposed for microblind hole image multiobject detection with complex texture backgrounds to extract their features completely. An Area&amp;amp;ndash;Volume Computation (AVC) algorithm, which utilizes discrete integral estimation and curve-fitting principles, is also proposed for computing their surface area and volume. The experimental results show that the precision, recall, mAP@0.5, mAP@0.5:0.95, and prediction time of the improved YOLOv11 network are 0.915, 0.948, 0.925, 0.615, and 1.27 ms, respectively. The relative errors (REs) of the surface area and volume calculation of the microblind holes are 5.236% and 3.964%, respectively. The proposed method achieves microblind hole recognition, localization and 3D morphology calculation accuracy, meeting cigarette on-site inspection criteria. Additionally, a reference for detecting other similar objects in complex texture backgrounds and accurately calculating 3D tasks is provided.</description>
	<pubDate>2026-02-24</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 96: Recognition, Localization and 3D Geometric Morphology Calculation of Microblind Holes in Complex Backgrounds Based on the Improved YOLOv11 Network and AVC Algorithm</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/96">doi: 10.3390/jimaging12030096</a></p>
	<p>Authors:
		Chengfen Zhang
		Dong Xia
		Ruizhao Chen
		Qunfeng Niu
		Tao Wang
		Li Wang
		</p>
	<p>Microblind hole processing quality inspection, especially accurately identifying microblind hole contour features and precisely detecting 3D and morphological parameters, has always been challenging, especially for accurately identifying those of different sizes, depths, and contour features simultaneously. This poses a great challenge for identifying and localizing microblind hole contours based on machine vision and accurately calculating three-dimensional parameters. This study takes cigarette microblind holes (diameter of 0.1&amp;amp;ndash;0.2 mm, depth of approximately 35 &amp;amp;micro;m) as the research object. It focuses on solving two major challenges: recognizing and localizing microblind hole contours in complex texture backgrounds and accurately calculating their 3D geometric morphology. An improved YOLOv11s model is proposed for microblind hole image multiobject detection with complex texture backgrounds to extract their features completely. An Area&amp;amp;ndash;Volume Computation (AVC) algorithm, which utilizes discrete integral estimation and curve-fitting principles, is also proposed for computing their surface area and volume. The experimental results show that the precision, recall, mAP@0.5, mAP@0.5:0.95, and prediction time of the improved YOLOv11 network are 0.915, 0.948, 0.925, 0.615, and 1.27 ms, respectively. The relative errors (REs) of the surface area and volume calculation of the microblind holes are 5.236% and 3.964%, respectively. The proposed method achieves microblind hole recognition, localization and 3D morphology calculation accuracy, meeting cigarette on-site inspection criteria. Additionally, a reference for detecting other similar objects in complex texture backgrounds and accurately calculating 3D tasks is provided.</p>
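	<p>The surface-area and volume estimation described above rests on discrete integration over a measured depth map. The sketch below illustrates that general idea on a synthetic paraboloid hole; the pixel size, hole shape, and gradient-based area formula are assumptions and do not reproduce the paper's AVC algorithm.</p>
	<pre><code>
# Illustrative sketch of discrete integral estimation of the volume and
# surface area of a small blind hole from a depth map z(x, y). The synthetic
# depth map, pixel size, and hole shape below are assumptions used only to
# demonstrate the integration idea, not the paper's AVC algorithm.
import numpy as np

px = 2.0e-6                                   # pixel size in metres (assumed)
n = 200
x = (np.arange(n) - n / 2) * px
X, Y = np.meshgrid(x, x)
r = np.sqrt(X ** 2 + Y ** 2)
depth = np.where(r > 75e-6, 0.0, 35e-6 * (1.0 - (r / 75e-6) ** 2))  # paraboloid hole

# Volume: sum of depth values times the area of one pixel.
volume = depth.sum() * px * px

# Surface area: per-pixel patch area sqrt(1 + |grad z|^2) times pixel area.
gy, gx = np.gradient(depth, px)
area = np.sqrt(1.0 + gx ** 2 + gy ** 2).sum() * px * px

print(f"volume = {volume:.3e} m^3, surface area = {area:.3e} m^2")
</code></pre>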
	]]></content:encoded>

	<dc:title>Recognition, Localization and 3D Geometric Morphology Calculation of Microblind Holes in Complex Backgrounds Based on the Improved YOLOv11 Network and AVC Algorithm</dc:title>
			<dc:creator>Chengfen Zhang</dc:creator>
			<dc:creator>Dong Xia</dc:creator>
			<dc:creator>Ruizhao Chen</dc:creator>
			<dc:creator>Qunfeng Niu</dc:creator>
			<dc:creator>Tao Wang</dc:creator>
			<dc:creator>Li Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030096</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-24</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-24</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>96</prism:startingPage>
		<prism:doi>10.3390/jimaging12030096</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/96</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/95">

	<title>J. Imaging, Vol. 12, Pages 95: Hybrid MICO-LAC Segmentation with Panoptic Tumor Instance Analysis for Dense Breast Mammograms</title>
	<link>https://www.mdpi.com/2313-433X/12/3/95</link>
	<description>This study proposes a clinically driven hybrid segmentation framework for dense breast tissue analysis in mammographic images, addressing persistent challenges associated with intensity inhomogeneity, low-contrast, and complex tumor morphology. The framework integrates Multiplicative Intrinsic Component Optimization (MICO_2D) for bias field correction, followed by a distance-regularized multiphase Vese&amp;amp;ndash;Chan level-set model for coarse global tumor segmentation. To achieve precise boundary delineation, a localized refinement stage is employed using Localized Active Contours (LAC) with Local Image Fitting (LIF) energy, supported by Gaussian regularization to ensure smooth and coherent boundaries in regions with ambiguous tissue transitions. Building upon the refined semantic tumor mask, the framework further incorporates a panoptic-style tumor instance segmentation stage, enabling the decomposition of connected tumor regions into distinct anatomical instances, which were evaluated on both MIAS and INBreast mammography datasets to demonstrate generalizability. This extension facilitates detailed structural analysis of tumor multiplicity and spatial organization, enhancing interpretability beyond conventional pixel wise segmentation. Experiments conducted on Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO) mammographic views demonstrate competitive performance relative to baseline U-Net and advanced deep learning fusion architectures, including multi-scale and multi-view networks, while offering improved interpretability and robustness. Quantitative evaluation using overlap-related metrics shows strong spatial agreement between predicted and reference segmentations, with per-image Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) distributions reported to ensure reproducibility. Descriptive per-image analysis, supported by bootstrap-based confidence intervals and paired comparisons, indicates consistent performance improvements across images. Robustness analysis under realistic perturbations, including noise, contrast degradation, blur, and rotation, demonstrates stable performance across varying imaging conditions. Furthermore, feature space visualizations using t-SNE and UMAP reveal clear separability between cancerous and non-cancerous tissue regions, highlighting the discriminative capability of the proposed framework. Overall, the results demonstrate the effectiveness, robustness, and clinical motivation of this hybrid panoptic framework for comprehensive dense breast tumor analysis in mammography, while emphasizing reproducibility and conservative statistical assessment.</description>
	<pubDate>2026-02-24</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 95: Hybrid MICO-LAC Segmentation with Panoptic Tumor Instance Analysis for Dense Breast Mammograms</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/95">doi: 10.3390/jimaging12030095</a></p>
	<p>Authors:
		Razia Jamil
		Min Dong
		Orken Mamyrbayev
		Ainur Akhmediyarova
		</p>
	<p>This study proposes a clinically driven hybrid segmentation framework for dense breast tissue analysis in mammographic images, addressing persistent challenges associated with intensity inhomogeneity, low-contrast, and complex tumor morphology. The framework integrates Multiplicative Intrinsic Component Optimization (MICO_2D) for bias field correction, followed by a distance-regularized multiphase Vese&amp;amp;ndash;Chan level-set model for coarse global tumor segmentation. To achieve precise boundary delineation, a localized refinement stage is employed using Localized Active Contours (LAC) with Local Image Fitting (LIF) energy, supported by Gaussian regularization to ensure smooth and coherent boundaries in regions with ambiguous tissue transitions. Building upon the refined semantic tumor mask, the framework further incorporates a panoptic-style tumor instance segmentation stage, enabling the decomposition of connected tumor regions into distinct anatomical instances, which were evaluated on both MIAS and INBreast mammography datasets to demonstrate generalizability. This extension facilitates detailed structural analysis of tumor multiplicity and spatial organization, enhancing interpretability beyond conventional pixel wise segmentation. Experiments conducted on Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO) mammographic views demonstrate competitive performance relative to baseline U-Net and advanced deep learning fusion architectures, including multi-scale and multi-view networks, while offering improved interpretability and robustness. Quantitative evaluation using overlap-related metrics shows strong spatial agreement between predicted and reference segmentations, with per-image Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) distributions reported to ensure reproducibility. Descriptive per-image analysis, supported by bootstrap-based confidence intervals and paired comparisons, indicates consistent performance improvements across images. Robustness analysis under realistic perturbations, including noise, contrast degradation, blur, and rotation, demonstrates stable performance across varying imaging conditions. Furthermore, feature space visualizations using t-SNE and UMAP reveal clear separability between cancerous and non-cancerous tissue regions, highlighting the discriminative capability of the proposed framework. Overall, the results demonstrate the effectiveness, robustness, and clinical motivation of this hybrid panoptic framework for comprehensive dense breast tumor analysis in mammography, while emphasizing reproducibility and conservative statistical assessment.</p>
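	<p>The per-image Dice Similarity Coefficient and IoU reported above are overlap metrics between a predicted and a reference binary mask. A minimal sketch is given below; the random masks are placeholders for illustration only.</p>
	<pre><code>
# Illustrative sketch of the per-image overlap metrics named in the abstract:
# Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) between
# a predicted binary tumor mask and a reference mask. Masks are placeholders.
import numpy as np

def dice_and_iou(pred, ref, eps=1e-8):
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + ref.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

rng = np.random.default_rng(0)
reference = rng.random((256, 256)) > 0.7
prediction = np.logical_xor(reference, rng.random((256, 256)) > 0.95)
print(dice_and_iou(prediction, reference))
</code></pre>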
	]]></content:encoded>

	<dc:title>Hybrid MICO-LAC Segmentation with Panoptic Tumor Instance Analysis for Dense Breast Mammograms</dc:title>
			<dc:creator>Razia Jamil</dc:creator>
			<dc:creator>Min Dong</dc:creator>
			<dc:creator>Orken Mamyrbayev</dc:creator>
			<dc:creator>Ainur Akhmediyarova</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030095</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-24</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-24</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>95</prism:startingPage>
		<prism:doi>10.3390/jimaging12030095</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/95</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/94">

	<title>J. Imaging, Vol. 12, Pages 94: Towards Lightweight and Multi-Scale Scene Classification: A Lie Group-Guided Deep Learning Network with Collaborative Attention</title>
	<link>https://www.mdpi.com/2313-433X/12/3/94</link>
	<description>Remote sensing scene classification (RSSC) plays a crucial role in Earth observation. Current deep learning methods, while accurate, tend to focus on high-level semantic features and overlook complementary shallow details such as edges and textures. Moreover, conventional CNNs are limited by fixed receptive fields, whereas transformers incur high computational costs. To address these limitations, we propose the Lie Group lightweight multi-scale network (LGLMNet), a lightweight multi-scale network that integrates Lie Group covariance features. It employs a dual-branch architecture combining Lie Group machine learning (LGML) for shallow feature extraction and a deep learning branch for high-level semantics. In the deep branch, we design a parallel depthwise separable convolution block (PDSCB) for multi-scale perception and a spatial-channel collaborative attention mechanism (SCCA) for efficient global&amp;amp;ndash;local modeling. A cross-layer feature fusion block (CLFFB) effectively merges the two branches. Compared with state-of-the-art methods, the proposed LGLMNet achieves accuracy improvements of 2.14%, 2.32%, and 1.12% on UCM-21, AID, and NWPU-45 datasets, respectively, while maintaining a lightweight structure with only 2.6 M parameters.</description>
	<pubDate>2026-02-24</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 94: Towards Lightweight and Multi-Scale Scene Classification: A Lie Group-Guided Deep Learning Network with Collaborative Attention</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/94">doi: 10.3390/jimaging12030094</a></p>
	<p>Authors:
		Xuefei Xu
		Chengjun Xu
		</p>
	<p>Remote sensing scene classification (RSSC) plays a crucial role in Earth observation. Current deep learning methods, while accurate, tend to focus on high-level semantic features and overlook complementary shallow details such as edges and textures. Moreover, conventional CNNs are limited by fixed receptive fields, whereas transformers incur high computational costs. To address these limitations, we propose the Lie Group lightweight multi-scale network (LGLMNet), a lightweight multi-scale network that integrates Lie Group covariance features. It employs a dual-branch architecture combining Lie Group machine learning (LGML) for shallow feature extraction and a deep learning branch for high-level semantics. In the deep branch, we design a parallel depthwise separable convolution block (PDSCB) for multi-scale perception and a spatial-channel collaborative attention mechanism (SCCA) for efficient global&amp;amp;ndash;local modeling. A cross-layer feature fusion block (CLFFB) effectively merges the two branches. Compared with state-of-the-art methods, the proposed LGLMNet achieves accuracy improvements of 2.14%, 2.32%, and 1.12% on UCM-21, AID, and NWPU-45 datasets, respectively, while maintaining a lightweight structure with only 2.6 M parameters.</p>
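	<p>Depthwise separable convolution, the building block behind the parallel multi-scale design mentioned above, factors a convolution into a per-channel spatial filter followed by a pointwise mixing step. The PyTorch sketch below shows two such branches in parallel; kernel sizes and channel counts are assumptions, not the published PDSCB configuration.</p>
	<pre><code>
# Illustrative sketch of depthwise separable convolutions applied in
# parallel at two kernel sizes, as a rough stand-in for the multi-scale
# idea behind the paper's PDSCB module. Kernel sizes and channel counts
# are assumptions, not the published configuration.
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    def __init__(self, ch, kernel):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, kernel, padding=kernel // 2,
                                   groups=ch, bias=False)
        self.pointwise = nn.Conv2d(ch, ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class ParallelDWSeparable(nn.Module):
    """Two depthwise separable branches (3x3 and 5x5) fused by addition."""
    def __init__(self, ch):
        super().__init__()
        self.branch3 = DepthwiseSeparable(ch, 3)
        self.branch5 = DepthwiseSeparable(ch, 5)

    def forward(self, x):
        return self.branch3(x) + self.branch5(x)

x = torch.randn(1, 32, 56, 56)
print(ParallelDWSeparable(32)(x).shape)   # torch.Size([1, 32, 56, 56])
</code></pre>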
	]]></content:encoded>

	<dc:title>Towards Lightweight and Multi-Scale Scene Classification: A Lie Group-Guided Deep Learning Network with Collaborative Attention</dc:title>
			<dc:creator>Xuefei Xu</dc:creator>
			<dc:creator>Chengjun Xu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030094</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-24</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-24</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>94</prism:startingPage>
		<prism:doi>10.3390/jimaging12030094</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/94</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/93">

	<title>J. Imaging, Vol. 12, Pages 93: LTPNet: Lesion-Aware Triple-Path Feature Fusion Network for Skin Lesion Segmentation</title>
	<link>https://www.mdpi.com/2313-433X/12/3/93</link>
	<description>Skin lesion segmentation has achieved notable progress in recent years; however, accurate delineation remains challenging due to complex backgrounds, ambiguous boundaries, and low lesion-to-skin contrast. To address these issues, we propose the lesion-aware triple-path feature fusion network (LTPNet), an end-to-end framework that progressively processes features through extraction, refinement, and aggregation stages. In the extraction stage, we incorporate a general foreground&amp;amp;ndash;background attention to suppress background interference and accelerate model convergence. In the refinement stage, we introduce an attentive spatial modulator (ASM) to jointly exploit local structural cues and global semantic context for precise spatial modulation. We further develop a lesion-aware lite-gate attention (LALGA) module that performs local spatial feature modulation and global channel recalibration tailored to lesion characteristics. In the aggregation stage, we propose a triple-path feature fusion (TPFF) module that explicitly models feature relationships across scales via three complementary pathways: a common path (CP) for semantic consistency, a saliency path (SP) for highlighting co-activated regions, and a difference path (DP) for accentuating structural discrepancies. Extensive experiments on in-domain and cross-domain datasets show that LTPNet achieves superior segmentation accuracy with reasonable inference efficiency and model complexity, demonstrating its potential for efficient and reliable clinical decision support.</description>
	<pubDate>2026-02-24</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 93: LTPNet: Lesion-Aware Triple-Path Feature Fusion Network for Skin Lesion Segmentation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/93">doi: 10.3390/jimaging12030093</a></p>
	<p>Authors:
		Yange Sun
		Sen Chen
		Huaping Guo
		Li Zhang
		Hongzhou Yue
		Yan Feng
		</p>
	<p>Skin lesion segmentation has achieved notable progress in recent years; however, accurate delineation remains challenging due to complex backgrounds, ambiguous boundaries, and low lesion-to-skin contrast. To address these issues, we propose the lesion-aware triple-path feature fusion network (LTPNet), an end-to-end framework that progressively processes features through extraction, refinement, and aggregation stages. In the extraction stage, we incorporate a general foreground&amp;amp;ndash;background attention to suppress background interference and accelerate model convergence. In the refinement stage, we introduce an attentive spatial modulator (ASM) to jointly exploit local structural cues and global semantic context for precise spatial modulation. We further develop a lesion-aware lite-gate attention (LALGA) module that performs local spatial feature modulation and global channel recalibration tailored to lesion characteristics. In the aggregation stage, we propose a triple-path feature fusion (TPFF) module that explicitly models feature relationships across scales via three complementary pathways: a common path (CP) for semantic consistency, a saliency path (SP) for highlighting co-activated regions, and a difference path (DP) for accentuating structural discrepancies. Extensive experiments on in-domain and cross-domain datasets show that LTPNet achieves superior segmentation accuracy with reasonable inference efficiency and model complexity, demonstrating its potential for efficient and reliable clinical decision support.</p>
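	<p>The triple-path fusion described above combines two feature streams through common, saliency, and difference pathways. The sketch below gives one plausible reading of that idea, assuming the common path averages the streams, the saliency path takes their element-wise maximum, and the difference path takes their absolute difference; these operators and the 1x1 merge are assumptions, not LTPNet's published TPFF definition.</p>
	<pre><code>
# Illustrative sketch of a triple-path fusion of two feature maps. The
# concrete operators (mean for the common path, element-wise max for the
# saliency path, absolute difference for the difference path, 1x1 conv to
# merge) are assumptions for illustration, not LTPNet's published TPFF.
import torch
import torch.nn as nn

class TriplePathFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.merge = nn.Conv2d(3 * ch, ch, kernel_size=1)

    def forward(self, a, b):
        common = 0.5 * (a + b)              # shared, semantically consistent content
        saliency = torch.maximum(a, b)      # regions co-activated in either stream
        difference = torch.abs(a - b)       # structural discrepancies between streams
        return self.merge(torch.cat([common, saliency, difference], dim=1))

a = torch.randn(1, 64, 32, 32)
b = torch.randn(1, 64, 32, 32)
print(TriplePathFusion(64)(a, b).shape)   # torch.Size([1, 64, 32, 32])
</code></pre>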
	]]></content:encoded>

	<dc:title>LTPNet: Lesion-Aware Triple-Path Feature Fusion Network for Skin Lesion Segmentation</dc:title>
			<dc:creator>Yange Sun</dc:creator>
			<dc:creator>Sen Chen</dc:creator>
			<dc:creator>Huaping Guo</dc:creator>
			<dc:creator>Li Zhang</dc:creator>
			<dc:creator>Hongzhou Yue</dc:creator>
			<dc:creator>Yan Feng</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030093</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-24</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-24</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>93</prism:startingPage>
		<prism:doi>10.3390/jimaging12030093</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/93</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/3/92">

	<title>J. Imaging, Vol. 12, Pages 92: UVSegNet: Semantic Boundary-Aware Neural UV Parameterization for Man-Made Objects</title>
	<link>https://www.mdpi.com/2313-433X/12/3/92</link>
	<description>UV parameterization is a fundamental step in building textured 3D models, but minimizing texture distortion and ensuring seams are placed along meaningful boundaries remains a challenge. This paper proposes UVSegNet, a novel semantic boundary-aware UV parameterization framework that combines part-level segmentation with geometry-aware parameterization. To address the common seam placement issues in parameterization, we introduce a boundary-aware guided UV mapping module that jointly optimizes geometric accuracy and seam layout. Furthermore, to better handle the cylindrical structures common in man-made objects, we introduce a cylindrical supervision strategy to reduce misalignment and unfolding distortion. Experiments on representative object categories show that UVSegNet outperforms other excellent baseline models in both texture quality and seam quality. Compared to Nuvo, UVSegNet improves the angular distortion (conformality) metric by 24.1% and seam compactness by 60.5% by generating a more compact seam layout. Experimental results demonstrate that UVSegNet outperforms baseline methods in both mapping quality and seam quality, thanks to the complementary mechanism of boundary constraints and geometry-driven modeling.</description>
	<pubDate>2026-02-24</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 92: UVSegNet: Semantic Boundary-Aware Neural UV Parameterization for Man-Made Objects</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/3/92">doi: 10.3390/jimaging12030092</a></p>
	<p>Authors:
		Hairun Zhang
		Ying Song
		</p>
	<p>UV parameterization is a fundamental step in building textured 3D models, but minimizing texture distortion and ensuring seams are placed along meaningful boundaries remains a challenge. This paper proposes UVSegNet, a novel semantic boundary-aware UV parameterization framework that combines part-level segmentation with geometry-aware parameterization. To address the common seam placement issues in parameterization, we introduce a boundary-aware guided UV mapping module that jointly optimizes geometric accuracy and seam layout. Furthermore, to better handle the cylindrical structures common in man-made objects, we introduce a cylindrical supervision strategy to reduce misalignment and unfolding distortion. Experiments on representative object categories show that UVSegNet outperforms other excellent baseline models in both texture quality and seam quality. Compared to Nuvo, UVSegNet improves the angular distortion (conformality) metric by 24.1% and seam compactness by 60.5% by generating a more compact seam layout. Experimental results demonstrate that UVSegNet outperforms baseline methods in both mapping quality and seam quality, thanks to the complementary mechanism of boundary constraints and geometry-driven modeling.</p>
	]]></content:encoded>

	<dc:title>UVSegNet: Semantic Boundary-Aware Neural UV Parameterization for Man-Made Objects</dc:title>
			<dc:creator>Hairun Zhang</dc:creator>
			<dc:creator>Ying Song</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12030092</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-24</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-24</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>3</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>92</prism:startingPage>
		<prism:doi>10.3390/jimaging12030092</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/3/92</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/91">

	<title>J. Imaging, Vol. 12, Pages 91: Accelerating Point Cloud Computation via Memory in Embedded Structured Light Cameras</title>
	<link>https://www.mdpi.com/2313-433X/12/2/91</link>
	<description>Embedded structured light cameras have been widely applied in various fields. However, due to constraints such as insufficient computing resources, it remains difficult to achieve high-speed structured light point cloud computation. To address this issue, this study proposes a memory-driven computational framework for accelerating point cloud computation. Specifically, the point cloud computation process is precomputed as much as possible and stored in memory in the form of parameters, thereby significantly reducing the computational load during actual point cloud computation. The framework is instantiated in two forms: a low-memory method that minimizes memory footprint at the expense of point cloud stability, and a high-memory method that preserves the nonlinear phase–distance relation via an extensive lookup table. Experimental evaluations demonstrate that the proposed methods achieve comparable accuracy to the conventional method while delivering substantial speedups, and data-format optimizations further reduce required bandwidth. This framework offers a generalizable paradigm for optimizing structured light pipelines, paving the way for enhanced real-time 3D sensing in embedded applications.</description>
	<pubDate>2026-02-21</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 91: Accelerating Point Cloud Computation via Memory in Embedded Structured Light Cameras</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/91">doi: 10.3390/jimaging12020091</a></p>
	<p>Authors:
		Yanan Zhang
		Shikang Meng
		Shijie Wang
		Yaheng Ren
		</p>
	<p>Embedded structured light cameras have been widely applied in various fields. However, due to constraints such as insufficient computing resources, it remains difficult to achieve high-speed structured light point cloud computation. To address this issue, this study proposes a memory-driven computational framework for accelerating point cloud computation. Specifically, the point cloud computation process is precomputed as much as possible and stored in memory in the form of parameters, thereby significantly reducing the computational load during actual point cloud computation. The framework is instantiated in two forms: a low-memory method that minimizes memory footprint at the expense of point cloud stability, and a high-memory method that preserves the nonlinear phase–distance relation via an extensive lookup table. Experimental evaluations demonstrate that the proposed methods achieve comparable accuracy to the conventional method while delivering substantial speedups, and data-format optimizations further reduce required bandwidth. This framework offers a generalizable paradigm for optimizing structured light pipelines, paving the way for enhanced real-time 3D sensing in embedded applications.</p>
	]]></content:encoded>

	<dc:title>Accelerating Point Cloud Computation via Memory in Embedded Structured Light Cameras</dc:title>
			<dc:creator>Yanan Zhang</dc:creator>
			<dc:creator>Shikang Meng</dc:creator>
			<dc:creator>Shijie Wang</dc:creator>
			<dc:creator>Yaheng Ren</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020091</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-21</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-21</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>91</prism:startingPage>
		<prism:doi>10.3390/jimaging12020091</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/91</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/90">

	<title>J. Imaging, Vol. 12, Pages 90: MDF2Former: Multi-Scale Dual-Domain Feature Fusion Transformer for Hyperspectral Image Classification of Bacteria in Murine Wounds</title>
	<link>https://www.mdpi.com/2313-433X/12/2/90</link>
	<description>Bacterial wound infection poses a major challenge in trauma care and can lead to severe complications such as sepsis and organ failure. Therefore, rapid and accurate identification of the pathogen, along with targeted intervention, is of vital importance for improving treatment outcomes and reducing risks. However, current detection methods are still constrained by procedural complexity and long processing times. In this study, a hyperspectral imaging (HSI) acquisition system for bacterial analysis and a multi-scale dual-domain feature fusion transformer (MDF2Former) were developed for classifying wound bacteria. MDF2Former integrates three modules: a multi-scale feature enhancement and fusion module that generates tokens with multi-scale discriminative representations, a spatial–spectral dual-branch attention module that strengthens joint feature modeling, and a frequency and spatial–spectral domain encoding module that captures global and local interactions among tokens through a hierarchical stacking structure, thereby enabling more efficient feature learning. Extensive experiments on our self-constructed HSI dataset of typical wound bacteria demonstrate that MDF2Former achieved outstanding performance across five metrics: Accuracy (91.94%), Precision (92.26%), Recall (91.94%), F1-score (92.01%), and Kappa coefficient (90.73%), surpassing all comparative models. These results verify the effectiveness of combining HSI with deep learning for bacterial identification and highlight its potential to assist in identifying bacterial species and making personalized treatment decisions for wound infections.</description>
	<pubDate>2026-02-19</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 90: MDF2Former: Multi-Scale Dual-Domain Feature Fusion Transformer for Hyperspectral Image Classification of Bacteria in Murine Wounds</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/90">doi: 10.3390/jimaging12020090</a></p>
	<p>Authors:
		Decheng Wu
		Wendan Liu
		Rui Li
		Xudong Fu
		Lin Tao
		Yinli Tian
		Anqiang Zhang
		Zhen Wang
		Hao Tang
		</p>
	<p>Bacterial wound infection poses a major challenge in trauma care and can lead to severe complications such as sepsis and organ failure. Therefore, rapid and accurate identification of the pathogen, along with targeted intervention, is of vital importance for improving treatment outcomes and reducing risks. However, current detection methods are still constrained by procedural complexity and long processing times. In this study, a hyperspectral imaging (HSI) acquisition system for bacterial analysis and a multi-scale dual-domain feature fusion transformer (MDF2Former) were developed for classifying wound bacteria. MDF2Former integrates three modules: a multi-scale feature enhancement and fusion module that generates tokens with multi-scale discriminative representations, a spatial–spectral dual-branch attention module that strengthens joint feature modeling, and a frequency and spatial–spectral domain encoding module that captures global and local interactions among tokens through a hierarchical stacking structure, thereby enabling more efficient feature learning. Extensive experiments on our self-constructed HSI dataset of typical wound bacteria demonstrate that MDF2Former achieved outstanding performance across five metrics: Accuracy (91.94%), Precision (92.26%), Recall (91.94%), F1-score (92.01%), and Kappa coefficient (90.73%), surpassing all comparative models. These results verify the effectiveness of combining HSI with deep learning for bacterial identification and highlight its potential to assist in identifying bacterial species and making personalized treatment decisions for wound infections.</p>
	]]></content:encoded>

	<dc:title>MDF2Former: Multi-Scale Dual-Domain Feature Fusion Transformer for Hyperspectral Image Classification of Bacteria in Murine Wounds</dc:title>
			<dc:creator>Decheng Wu</dc:creator>
			<dc:creator>Wendan Liu</dc:creator>
			<dc:creator>Rui Li</dc:creator>
			<dc:creator>Xudong Fu</dc:creator>
			<dc:creator>Lin Tao</dc:creator>
			<dc:creator>Yinli Tian</dc:creator>
			<dc:creator>Anqiang Zhang</dc:creator>
			<dc:creator>Zhen Wang</dc:creator>
			<dc:creator>Hao Tang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020090</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-19</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-19</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>90</prism:startingPage>
		<prism:doi>10.3390/jimaging12020090</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/90</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/89">

	<title>J. Imaging, Vol. 12, Pages 89: Classification of the Surrounding Rock Based on Image Processing Analysis and Transfer Learning</title>
	<link>https://www.mdpi.com/2313-433X/12/2/89</link>
	<description>Standardized methods for classifying surrounding rock are currently insufficient, and classification relies mainly on the subjective judgment of technicians, leading to inconsistent evaluation results. This study develops feature extraction and classification methods for surrounding rock images from a tunnel of the Central Yunnan Water Diversion Project using image processing analysis and transfer learning. A rich set of surrounding rock images and water conservancy tunnel data is collected, and the surrounding rock is classified relatively accurately according to the code and expert guidance. By introducing fractal theory, the complexity and irregularity of the spatial distribution of weak layers and joints on the surrounding rock surface are revealed effectively, and a classification method for surrounding rock based on changes in the fractal dimension characteristic values is proposed. By combining the quantified parameters of surrounding rock images with strength data collected by rebound meters, a method for correcting surrounding rock strength based on image analysis is proposed, which effectively mitigates the error caused by the uneven distribution of rock masses in traditional rebound meter strength values. After correction, more accurate strength characteristics are obtained, supporting the standardized classification of the surrounding rock. Finally, a transfer learning model is constructed to achieve rapid classification of tunnel surrounding rock images. This research provides support for the standardized classification of tunnel surrounding rock.</description>
	<pubDate>2026-02-19</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 89: Classification of the Surrounding Rock Based on Image Processing Analysis and Transfer Learning</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/89">doi: 10.3390/jimaging12020089</a></p>
	<p>Authors:
		Yanyun Fan
		Jiaqi Zhu
		Hua Luo
		Yaxi Shen
		Shuanglong Wang
		Xiaoning Liu
		Dong Li
		Chuhan Deng
		</p>
	<p>Standardized methods for classifying surrounding rock are currently insufficient, and classification relies mainly on the subjective judgment of technicians, leading to inconsistent evaluation results. This study develops feature extraction and classification methods for surrounding rock images from a tunnel of the Central Yunnan Water Diversion Project using image processing analysis and transfer learning. A rich set of surrounding rock images and water conservancy tunnel data is collected, and the surrounding rock is classified relatively accurately according to the code and expert guidance. By introducing fractal theory, the complexity and irregularity of the spatial distribution of weak layers and joints on the surrounding rock surface are revealed effectively, and a classification method for surrounding rock based on changes in the fractal dimension characteristic values is proposed. By combining the quantified parameters of surrounding rock images with strength data collected by rebound meters, a method for correcting surrounding rock strength based on image analysis is proposed, which effectively mitigates the error caused by the uneven distribution of rock masses in traditional rebound meter strength values. After correction, more accurate strength characteristics are obtained, supporting the standardized classification of the surrounding rock. Finally, a transfer learning model is constructed to achieve rapid classification of tunnel surrounding rock images. This research provides support for the standardized classification of tunnel surrounding rock.</p>
	]]></content:encoded>

	<dc:title>Classification of the Surrounding Rock Based on Image Processing Analysis and Transfer Learning</dc:title>
			<dc:creator>Yanyun Fan</dc:creator>
			<dc:creator>Jiaqi Zhu</dc:creator>
			<dc:creator>Hua Luo</dc:creator>
			<dc:creator>Yaxi Shen</dc:creator>
			<dc:creator>Shuanglong Wang</dc:creator>
			<dc:creator>Xiaoning Liu</dc:creator>
			<dc:creator>Dong Li</dc:creator>
			<dc:creator>Chuhan Deng</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020089</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-19</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-19</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>89</prism:startingPage>
		<prism:doi>10.3390/jimaging12020089</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/89</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/88">

	<title>J. Imaging, Vol. 12, Pages 88: Analysis of Biological Images and Quantitative Monitoring Using Deep Learning and Computer Vision</title>
	<link>https://www.mdpi.com/2313-433X/12/2/88</link>
	<description>Automated biological counting is essential for scaling wildlife monitoring and biodiversity assessments, as manual processing currently limits analytical effort and scalability. This review evaluates the integration of deep learning and computer vision across diverse acquisition platforms, including camera traps, unmanned aerial vehicles (UAVs), and remote sensing. Methodological paradigms ranging from Convolutional Neural Networks (CNNs) and one-stage detectors like You Only Look Once (YOLO) to recent transformer-based architectures and hybrid models are examined. The literature shows that these methods consistently achieve high accuracy, often exceeding 95%, across various taxa, including insect pests, aquatic organisms, terrestrial vegetation, and forest ecosystems. However, persistent challenges such as object occlusion, cryptic species differentiation, and the scarcity of high-quality, labeled datasets continue to hinder fully automated workflows. We conclude that while automated counting has fundamentally increased data throughput, future advancements must focus on enhancing model generalization through self-supervised learning and improved data augmentation techniques. These developments are critical for transitioning from experimental models to robust, operational tools for global ecological monitoring and conservation efforts.</description>
	<pubDate>2026-02-18</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 88: Analysis of Biological Images and Quantitative Monitoring Using Deep Learning and Computer Vision</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/88">doi: 10.3390/jimaging12020088</a></p>
	<p>Authors:
		Aaron Gálvez-Salido
		Francisca Robles
		Rodrigo J. Gonçalves
		Roberto de la Herrán
		Carmelo Ruiz Rejón
		Rafael Navajas-Pérez
		</p>
	<p>Automated biological counting is essential for scaling wildlife monitoring and biodiversity assessments, as manual processing currently limits analytical effort and scalability. This review evaluates the integration of deep learning and computer vision across diverse acquisition platforms, including camera traps, unmanned aerial vehicles (UAVs), and remote sensing. Methodological paradigms ranging from Convolutional Neural Networks (CNNs) and one-stage detectors like You Only Look Once (YOLO) to recent transformer-based architectures and hybrid models are examined. The literature shows that these methods consistently achieve high accuracy, often exceeding 95%, across various taxa, including insect pests, aquatic organisms, terrestrial vegetation, and forest ecosystems. However, persistent challenges such as object occlusion, cryptic species differentiation, and the scarcity of high-quality, labeled datasets continue to hinder fully automated workflows. We conclude that while automated counting has fundamentally increased data throughput, future advancements must focus on enhancing model generalization through self-supervised learning and improved data augmentation techniques. These developments are critical for transitioning from experimental models to robust, operational tools for global ecological monitoring and conservation efforts.</p>
	]]></content:encoded>

	<dc:title>Analysis of Biological Images and Quantitative Monitoring Using Deep Learning and Computer Vision</dc:title>
			<dc:creator>Aaron Gálvez-Salido</dc:creator>
			<dc:creator>Francisca Robles</dc:creator>
			<dc:creator>Rodrigo J. Gonçalves</dc:creator>
			<dc:creator>Roberto de la Herrán</dc:creator>
			<dc:creator>Carmelo Ruiz Rejón</dc:creator>
			<dc:creator>Rafael Navajas-Pérez</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020088</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-18</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-18</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Review</prism:section>
	<prism:startingPage>88</prism:startingPage>
		<prism:doi>10.3390/jimaging12020088</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/88</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/87">

	<title>J. Imaging, Vol. 12, Pages 87: Automated Compactness Quantitative Metrics for Wrist Bone on Conventional Radiography in Rheumatoid Arthritis: A Clinical Evaluation Study</title>
	<link>https://www.mdpi.com/2313-433X/12/2/87</link>
	<description>Rheumatoid arthritis (RA) frequently affects the joints of the hands, with joint space narrowing (JSN) representing an important early marker of structural damage. The semi-quantitative Sharp/van der Heijde (SvdH) scoring system is widely used in clinical practice but is inherently subjective and susceptible to observer variability. Moreover, the complex anatomy of the wrist and substantial overlap of carpal bones pose challenges for automated quantitative assessment of wrist JSN on routine radiographs. This study aimed to introduce a novel quantitative assessment perspective and to clinically validate an automated, compactness-related quantification framework for evaluating wrist JSN in RA. This study initially enrolled 51 patients with RA. After excluding one case with severe carpal fusion that precluded anatomical differentiation, 50 patients (44 females and 6 males) were included in the final analysis. The cohort had a mean age of 61 years (range: 21–82), a median symptom duration of 9 years (IQR: 1–32), and a median follow-up interval for bilateral hand radiographs of 1.06 years (IQR: 0.82–1.30). To quantify global wrist JSN, 10 compactness-related metrics were computed based on the spatial distribution of bone centroids extracted from carpal segmentation masks. These metrics were validated against the wrist JSN subscore of the SvdH score (SvdH-JSN_wrist) and the total Sharp score (TSS) as gold standards. Several distance-based metrics among the compactness-related metrics showed significant negative correlations with the wrist joint space narrowing subscore of the Sharp/van der Heijde score (SvdH-JSN_wrist). Specifically, mean-pairwise-distance (MPD), root-mean-square-radius (RMSR), and median-radius (R50) showed moderate to strong correlations (r = −0.52 to −0.63, all p ≤ 0.0001) that were consistent at baseline (BL) and follow-up (FU). Correlations with TSS were weaker overall, with only R50 and its normalized form showing stable negative correlations (r = −0.40 to −0.43, p &lt; 0.01). Longitudinal analyses showed limited correlations between metric changes and clinical score changes. The proposed automated compactness quantification framework enables objective and reliable assessment of wrist JSN on standard radiographs and complements conventional scoring systems by supporting automated and standardized evaluation of RA-related wrist structural changes.</description>
	<pubDate>2026-02-18</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 87: Automated Compactness Quantitative Metrics for Wrist Bone on Conventional Radiography in Rheumatoid Arthritis: A Clinical Evaluation Study</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/87">doi: 10.3390/jimaging12020087</a></p>
	<p>Authors:
		Jiajing Zhou
		Junmu Peng
		Haolin Wang
		Hiroshi Kataoka
		Masaya Mukai
		Tunlada Wiriyanukhroh
		Tamotsu Kamishima
		</p>
	<p>Rheumatoid arthritis (RA) frequently affects the joints of the hands, with joint space narrowing (JSN) representing an important early marker of structural damage. The semi-quantitative Sharp/van der Heijde (SvdH) scoring system is widely used in clinical practice but is inherently subjective and susceptible to observer variability. Moreover, the complex anatomy of the wrist and substantial overlap of carpal bones pose challenges for automated quantitative assessment of wrist JSN on routine radiographs. This study aimed to introduce a novel quantitative assessment perspective and to clinically validate an automated, compactness-related quantification framework for evaluating wrist JSN in RA. This study initially enrolled 51 patients with RA. After excluding one case with severe carpal fusion that precluded anatomical differentiation, 50 patients (44 females and 6 males) were included in the final analysis. The cohort had a mean age of 61 years (range: 21–82), a median symptom duration of 9 years (IQR: 1–32), and a median follow-up interval for bilateral hand radiographs of 1.06 years (IQR: 0.82–1.30). To quantify global wrist JSN, 10 compactness-related metrics were computed based on the spatial distribution of bone centroids extracted from carpal segmentation masks. These metrics were validated against the wrist JSN subscore of the SvdH score (SvdH-JSN_wrist) and the total Sharp score (TSS) as gold standards. Several distance-based metrics among the compactness-related metrics showed significant negative correlations with the wrist joint space narrowing subscore of the Sharp/van der Heijde score (SvdH-JSN_wrist). Specifically, mean-pairwise-distance (MPD), root-mean-square-radius (RMSR), and median-radius (R50) showed moderate to strong correlations (r = −0.52 to −0.63, all p ≤ 0.0001) that were consistent at baseline (BL) and follow-up (FU). Correlations with TSS were weaker overall, with only R50 and its normalized form showing stable negative correlations (r = −0.40 to −0.43, p &lt; 0.01). Longitudinal analyses showed limited correlations between metric changes and clinical score changes. The proposed automated compactness quantification framework enables objective and reliable assessment of wrist JSN on standard radiographs and complements conventional scoring systems by supporting automated and standardized evaluation of RA-related wrist structural changes.</p>
	]]></content:encoded>

	<dc:title>Automated Compactness Quantitative Metrics for Wrist Bone on Conventional Radiography in Rheumatoid Arthritis: A Clinical Evaluation Study</dc:title>
			<dc:creator>Jiajing Zhou</dc:creator>
			<dc:creator>Junmu Peng</dc:creator>
			<dc:creator>Haolin Wang</dc:creator>
			<dc:creator>Hiroshi Kataoka</dc:creator>
			<dc:creator>Masaya Mukai</dc:creator>
			<dc:creator>Tunlada Wiriyanukhroh</dc:creator>
			<dc:creator>Tamotsu Kamishima</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020087</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-18</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-18</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>87</prism:startingPage>
		<prism:doi>10.3390/jimaging12020087</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/87</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/85">

	<title>J. Imaging, Vol. 12, Pages 85: SREF: Semantics-Refined Feature Extraction for Long-Term Visual Localization</title>
	<link>https://www.mdpi.com/2313-433X/12/2/85</link>
	<description>Accurate and robust visual localization under changing environments remains a fundamental challenge in autonomous driving and mobile robotics. Traditional handcrafted features often degrade under long-term illumination and viewpoint variations, while recent CNN-based methods, although more robust, typically rely on coarse semantic cues and remain vulnerable to dynamic objects. In this paper, we propose a fine-grained semantics-guided feature extraction framework that adaptively selects stable keypoints while suppressing dynamic disturbances. A fine-grained semantic refinement module subdivides coarse semantic categories into stability-homogeneous sub-classes, and a dual-attention mechanism enhances local repeatability and semantic consistency. By integrating physical priors with self-supervised clustering, the proposed framework learns discriminative and reliable feature representations. Extensive experiments on the Aachen and RobotCar-Seasons benchmarks demonstrate that the proposed approach achieves state-of-the-art accuracy and robustness while maintaining real-time efficiency, effectively bridging coarse semantic guidance with fine-grained stability estimation. Quantitatively, our method achieves strong localization performance on Aachen (up to 88.1% at night under the (0.2°, 0.25 m) threshold) and on RobotCar-Seasons (up to 57.2%/28.4% under the same threshold for day/night), demonstrating improved robustness to seasonal and illumination changes.</description>
	<pubDate>2026-02-18</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 85: SREF: Semantics-Refined Feature Extraction for Long-Term Visual Localization</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/85">doi: 10.3390/jimaging12020085</a></p>
	<p>Authors:
		Danfeng Wu
		Kaifeng Zhu
		Heng Shi
		Fenfen Zhou
		Minchi Kuang
		</p>
	<p>Accurate and robust visual localization under changing environments remains a fundamental challenge in autonomous driving and mobile robotics. Traditional handcrafted features often degrade under long-term illumination and viewpoint variations, while recent CNN-based methods, although more robust, typically rely on coarse semantic cues and remain vulnerable to dynamic objects. In this paper, we propose a fine-grained semantics-guided feature extraction framework that adaptively selects stable keypoints while suppressing dynamic disturbances. A fine-grained semantic refinement module subdivides coarse semantic categories into stability-homogeneous sub-classes, and a dual-attention mechanism enhances local repeatability and semantic consistency. By integrating physical priors with self-supervised clustering, the proposed framework learns discriminative and reliable feature representations. Extensive experiments on the Aachen and RobotCar-Seasons benchmarks demonstrate that the proposed approach achieves state-of-the-art accuracy and robustness while maintaining real-time efficiency, effectively bridging coarse semantic guidance with fine-grained stability estimation. Quantitatively, our method achieves strong localization performance on Aachen (up to 88.1% at night under the (0.2°, 0.25 m) threshold) and on RobotCar-Seasons (up to 57.2%/28.4% under the same threshold for day/night), demonstrating improved robustness to seasonal and illumination changes.</p>
	]]></content:encoded>

	<dc:title>SREF: Semantics-Refined Feature Extraction for Long-Term Visual Localization</dc:title>
			<dc:creator>Danfeng Wu</dc:creator>
			<dc:creator>Kaifeng Zhu</dc:creator>
			<dc:creator>Heng Shi</dc:creator>
			<dc:creator>Fenfen Zhou</dc:creator>
			<dc:creator>Minchi Kuang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020085</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-18</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-18</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>85</prism:startingPage>
		<prism:doi>10.3390/jimaging12020085</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/85</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/86">

	<title>J. Imaging, Vol. 12, Pages 86: Print Quality Assessment of QR Code Elements Achieved by the Digital Thermal Transfer Process</title>
	<link>https://www.mdpi.com/2313-433X/12/2/86</link>
	<description>The new European Regulation (EU) 2025/40 includes provisions on modern packaging and packaging waste. It defines the use of image QR codes on packaging (items 71 and 161) and in personal documents, making line barcodes a thing of the past. The definition of a QR code is precisely specified in ISO/IEC 18004:2024. However, their implementation in printing systems is not specified and remains an important factor for their future application. Digital foil printing is a completely new hybrid printing process for applying information to highly precise applications such as QR codes, security printing, and packaging printing. The technique is characterized by a combination of two printing techniques: drop-on-demand UV inkjet followed by thermal transfer of black foil. Using a matte-coated printing substrate (Garda Matt, 300 g/m2), Konica Minolta KM1024 LHE Inkjet head settings, and a transfer temperature of 100 °C, the size of the square printing elements in QR codes plays a decisive role in the quality of the decoded information. The aim of this work is to investigate the possibility of realizing the basic elements of the QR code image (the profile of square elements and the success of realizing a precisely defined surface) with a variation in the thickness of the UV varnish coating (7, 14 and 21 µm), realized using the MGI JETvarnish 3DS digital machine. Square elements with surface areas of 0.01 cm2 (the most commonly used size), 0.06 cm2, 0.25 cm2, 1 cm2, 4 cm2, and 16 cm2 were tested. The results showed that the imprint quality is uneven for the smallest elements (square elements with base lengths of 0.1 cm and 0.25 cm). The effect is especially visible with a minimum UV varnish application of 7 µm (1 drop). By increasing the amount of UV varnish and the application thickness to 14 µm (2 drops) and 21 µm (3 drops), respectively, a significantly more stable, even reproduction of the achromatic image is achieved. The highest technical precision was achieved with a UV varnish thickness of 21 µm.</description>
	<pubDate>2026-02-18</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 86: Print Quality Assessment of QR Code Elements Achieved by the Digital Thermal Transfer Process</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/86">doi: 10.3390/jimaging12020086</a></p>
	<p>Authors:
		Igor Majnarić
		Marija Jelkić
		Marko Morić
		Krunoslav Hajdek
		</p>
	<p>The new European Regulation (EU) 2025/40 includes provisions on modern packaging and packaging waste. It defines the use of image QR codes on packaging (items 71 and 161) and in personal documents, making line barcodes a thing of the past. The definition of a QR code is precisely specified in ISO/IEC 18004:2024. However, their implementation in printing systems is not specified and remains an important factor for their future application. Digital foil printing is a completely new hybrid printing process for applying information to highly precise applications such as QR codes, security printing, and packaging printing. The technique is characterized by a combination of two printing techniques: drop-on-demand UV inkjet followed by thermal transfer of black foil. Using a matte-coated printing substrate (Garda Matt, 300 g/m2), Konica Minolta KM1024 LHE Inkjet head settings, and a transfer temperature of 100 °C, the size of the square printing elements in QR codes plays a decisive role in the quality of the decoded information. The aim of this work is to investigate the possibility of realizing the basic elements of the QR code image (the profile of square elements and the success of realizing a precisely defined surface) with a variation in the thickness of the UV varnish coating (7, 14 and 21 µm), realized using the MGI JETvarnish 3DS digital machine. Square elements with surface areas of 0.01 cm2 (the most commonly used size), 0.06 cm2, 0.25 cm2, 1 cm2, 4 cm2, and 16 cm2 were tested. The results showed that the imprint quality is uneven for the smallest elements (square elements with base lengths of 0.1 cm and 0.25 cm). The effect is especially visible with a minimum UV varnish application of 7 µm (1 drop). By increasing the amount of UV varnish and the application thickness to 14 µm (2 drops) and 21 µm (3 drops), respectively, a significantly more stable, even reproduction of the achromatic image is achieved. The highest technical precision was achieved with a UV varnish thickness of 21 µm.</p>
	]]></content:encoded>

	<dc:title>Print Quality Assessment of QR Code Elements Achieved by the Digital Thermal Transfer Process</dc:title>
			<dc:creator>Igor Majnarić</dc:creator>
			<dc:creator>Marija Jelkić</dc:creator>
			<dc:creator>Marko Morić</dc:creator>
			<dc:creator>Krunoslav Hajdek</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020086</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-18</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-18</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>86</prism:startingPage>
		<prism:doi>10.3390/jimaging12020086</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/86</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/84">

	<title>J. Imaging, Vol. 12, Pages 84: LEGS: Visual Localization Enhanced by 3D Gaussian Splatting</title>
	<link>https://www.mdpi.com/2313-433X/12/2/84</link>
	<description>Accurate six-degree-of-freedom (6-DoF) visual localization is a fundamental component for modern mapping and navigation. While recent data-centric approaches have leveraged Novel View Synthesis (NVS) to augment training datasets, these methods typically rely on uniform grid-based sampling of virtual cameras. Such naive placement often yields redundant or weakly informative views, failing to effectively bridge the gap between sparse, unordered captures and dense scene geometry. To address these challenges, we present LEGS (Visual Localization Enhanced by 3D Gaussian Splatting), a trajectory-agnostic synthetic-view augmentation framework. LEGS constructs a joint set of 6-DoF camera pose proposals by integrating a coarse 3D lattice with the Structure-from-Motion (SfM) camera graph, followed by a visibility-aware, coverage-driven selection strategy. By utilizing 3D Gaussian Splatting (3DGS), our framework enables high-throughput, scene-specific synthesis within practical computational budgets. Experiments on standard benchmarks and an in-house dataset demonstrate that LEGS consistently improves pose accuracy and robustness, particularly in scenarios characterized by sparse sampling and co-located viewpoints.</description>
	<pubDate>2026-02-16</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 84: LEGS: Visual Localization Enhanced by 3D Gaussian Splatting</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/84">doi: 10.3390/jimaging12020084</a></p>
	<p>Authors:
		Daewoon Kim
		I-gil Kim
		</p>
	<p>Accurate six-degree-of-freedom (6-DoF) visual localization is a fundamental component for modern mapping and navigation. While recent data-centric approaches have leveraged Novel View Synthesis (NVS) to augment training datasets, these methods typically rely on uniform grid-based sampling of virtual cameras. Such naive placement often yields redundant or weakly informative views, failing to effectively bridge the gap between sparse, unordered captures and dense scene geometry. To address these challenges, we present LEGS (Visual Localization Enhanced by 3D Gaussian Splatting), a trajectory-agnostic synthetic-view augmentation framework. LEGS constructs a joint set of 6-DoF camera pose proposals by integrating a coarse 3D lattice with the Structure-from-Motion (SfM) camera graph, followed by a visibility-aware, coverage-driven selection strategy. By utilizing 3D Gaussian Splatting (3DGS), our framework enables high-throughput, scene-specific synthesis within practical computational budgets. Experiments on standard benchmarks and an in-house dataset demonstrate that LEGS consistently improves pose accuracy and robustness, particularly in scenarios characterized by sparse sampling and co-located viewpoints.</p>
	]]></content:encoded>

	<dc:title>LEGS: Visual Localization Enhanced by 3D Gaussian Splatting</dc:title>
			<dc:creator>Daewoon Kim</dc:creator>
			<dc:creator>I-gil Kim</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020084</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-16</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-16</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>84</prism:startingPage>
		<prism:doi>10.3390/jimaging12020084</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/84</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/83">

	<title>J. Imaging, Vol. 12, Pages 83: 3D Road Defect Mapping via Differentiable Neural Rendering and Multi-Frame Semantic Fusion in Bird&amp;rsquo;s-Eye-View Space</title>
	<link>https://www.mdpi.com/2313-433X/12/2/83</link>
	<description>Road defect detection is essential for traffic safety and infrastructure maintenance. Existing automated methods based on 2D image analysis lack spatial context and cannot provide the accurate 3D localization required for maintenance planning. We propose a novel framework for road defect mapping from monocular video sequences by integrating differentiable Bird’s-Eye-View (BEV) mesh representation, semantic filtering, and multi-frame temporal fusion. Our differentiable mesh-based BEV representation enables efficient scene reconstruction from sparse observations through MLP-based optimization. The semantic filtering strategy leverages road surface segmentation to eliminate off-road false positives, reducing detection errors by 33.7%. Multi-frame fusion with ray-casting projection and exponential moving average update accumulates defect observations across frames while maintaining 3D geometric consistency. Experimental results demonstrate that our framework produces geometrically consistent BEV defect maps with superior accuracy compared to single-frame 2D methods, effectively handling occlusions, motion blur, and varying illumination conditions.</description>
	<pubDate>2026-02-15</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 83: 3D Road Defect Mapping via Differentiable Neural Rendering and Multi-Frame Semantic Fusion in Bird&amp;rsquo;s-Eye-View Space</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/83">doi: 10.3390/jimaging12020083</a></p>
	<p>Authors:
		Hongjia Xing
		Feng Yang
		</p>
	<p>Road defect detection is essential for traffic safety and infrastructure maintenance. Existing automated methods based on 2D image analysis lack spatial context and cannot provide the accurate 3D localization required for maintenance planning. We propose a novel framework for road defect mapping from monocular video sequences by integrating differentiable Bird’s-Eye-View (BEV) mesh representation, semantic filtering, and multi-frame temporal fusion. Our differentiable mesh-based BEV representation enables efficient scene reconstruction from sparse observations through MLP-based optimization. The semantic filtering strategy leverages road surface segmentation to eliminate off-road false positives, reducing detection errors by 33.7%. Multi-frame fusion with ray-casting projection and exponential moving average update accumulates defect observations across frames while maintaining 3D geometric consistency. Experimental results demonstrate that our framework produces geometrically consistent BEV defect maps with superior accuracy compared to single-frame 2D methods, effectively handling occlusions, motion blur, and varying illumination conditions.</p>
	]]></content:encoded>

	<dc:title>3D Road Defect Mapping via Differentiable Neural Rendering and Multi-Frame Semantic Fusion in Bird&amp;rsquo;s-Eye-View Space</dc:title>
			<dc:creator>Hongjia Xing</dc:creator>
			<dc:creator>Feng Yang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020083</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-15</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-15</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>83</prism:startingPage>
		<prism:doi>10.3390/jimaging12020083</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/83</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/82">

	<title>J. Imaging, Vol. 12, Pages 82: Research Progress on the Application of Radiomics and Deep Learning in Liver Fibrosis</title>
	<link>https://www.mdpi.com/2313-433X/12/2/82</link>
	<description>Liver fibrosis (LF) represents a crucial intermediate stage in the pathological progression from chronic liver disease to cirrhosis and hepatocellular carcinoma. Early and accurate diagnosis is vital for timely intervention and improved prognosis. Traditional liver biopsy, long regarded as the diagnostic gold standard, has several notable limitations, including invasiveness, sampling errors and inter-observer variability. As artificial intelligence (AI) technology advances rapidly, radiomics and deep learning (DL) have risen to prominence as non-invasive diagnostic tools, showing significant potential in LF diagnostic evaluation. This review summarizes the latest advancements in radiomics and DL for LF diagnosis, staging, prognosis prediction and etiological differentiation. It also analyzes the application value of multimodal imaging modalities, including magnetic resonance imaging (MRI), computed tomography (CT) and ultrasound, in this field. Despite ongoing challenges in model generalization, standardization, interpretability, technological integration and multimodal fusion, the continuous advancement of radiomics and DL technologies holds promise for AI-driven imaging analysis strategies. These approaches aim to integrate multiple clinical monitoring methods, overcome obstacles in early LF diagnosis and treatment, and provide new perspectives for precision medicine of this disease.</description>
	<pubDate>2026-02-15</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 82: Research Progress on the Application of Radiomics and Deep Learning in Liver Fibrosis</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/82">doi: 10.3390/jimaging12020082</a></p>
	<p>Authors:
		Yi Dang
		Wenjing Li
		Zhao Liu
		Junqiang Lei
		</p>
	<p>Liver fibrosis (LF) represents a crucial intermediate stage in the pathological progression from chronic liver disease to cirrhosis and hepatocellular carcinoma. Early and accurate diagnosis is vital for timely intervention and improved prognosis. Traditional liver biopsy, long regarded as the diagnostic gold standard, has several notable limitations, including invasiveness, sampling errors and inter-observer variability. As artificial intelligence (AI) technology advances rapidly, radiomics and deep learning (DL) have risen to prominence as non-invasive diagnostic tools, showing significant potential in LF diagnostic evaluation. This review summarizes the latest advancements in radiomics and DL for LF diagnosis, staging, prognosis prediction and etiological differentiation. It also analyzes the application value of multimodal imaging modalities, including magnetic resonance imaging (MRI), computed tomography (CT) and ultrasound, in this field. Despite ongoing challenges in model generalization, standardization, interpretability, technological integration and multimodal fusion, the continuous advancement of radiomics and DL technologies holds promise for AI-driven imaging analysis strategies. These approaches aim to integrate multiple clinical monitoring methods, overcome obstacles in early LF diagnosis and treatment, and provide new perspectives for precision medicine of this disease.</p>
	]]></content:encoded>

	<dc:title>Research Progress on the Application of Radiomics and Deep Learning in Liver Fibrosis</dc:title>
			<dc:creator>Yi Dang</dc:creator>
			<dc:creator>Wenjing Li</dc:creator>
			<dc:creator>Zhao Liu</dc:creator>
			<dc:creator>Junqiang Lei</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020082</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-15</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-15</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Review</prism:section>
	<prism:startingPage>82</prism:startingPage>
		<prism:doi>10.3390/jimaging12020082</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/82</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/81">

	<title>J. Imaging, Vol. 12, Pages 81: Automatic Childhood Pneumonia Diagnosis Based on Multi-Model Feature Fusion Using Chi-Square Feature Selection</title>
	<link>https://www.mdpi.com/2313-433X/12/2/81</link>
	<description>Pneumonia is one of the main causes of child mortality, and chest radiography (CXR) is essential for its diagnosis. However, the low radiation exposure used in pediatric imaging complicates the accurate detection of pneumonia, making traditional examination ineffective. Progress in medical imaging with convolutional neural networks (CNNs) has considerably improved performance, gaining widespread recognition for its effectiveness. This paper proposes an accurate pneumonia detection method based on different deep CNN architectures combined with optimal feature fusion. Enhanced VGG-19, ResNet-50, and MobileNet-V2 models are trained on the most widely used pneumonia dataset, applying appropriate transfer learning and fine-tuning strategies. To create an effective feature input, the Chi-Square technique removes irrelevant features from every enhanced CNN. The resulting subsets are then fused horizontally to generate a more diverse and robust feature representation for binary classification. By combining the 1000 best features from the VGG-19 and MobileNet-V2 models, the proposed approach achieves the best accuracy (97.59%), Recall (98.33%), and F1-score (98.19%) on the test set with a supervised support vector machine (SVM) classifier. The results demonstrate that our approach provides a significant performance improvement over previous studies using various ensemble fusion techniques while ensuring computational efficiency. We expect this fused-feature system to significantly aid the timely detection of childhood pneumonia, especially within constrained healthcare systems.</description>
	<pubDate>2026-02-14</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 81: Automatic Childhood Pneumonia Diagnosis Based on Multi-Model Feature Fusion Using Chi-Square Feature Selection</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/81">doi: 10.3390/jimaging12020081</a></p>
	<p>Authors:
		Amira Ouerhani
		Tareq Hadidi
		Hanene Sahli
		Halima Mahjoubi
		</p>
	<p>Pneumonia is one of the main causes of child mortality, and chest radiography (CXR) is essential for its diagnosis. However, the low radiation exposure used in pediatric imaging complicates the accurate detection of pneumonia, making traditional examination ineffective. Progress in medical imaging with convolutional neural networks (CNNs) has considerably improved performance, gaining widespread recognition for its effectiveness. This paper proposes an accurate pneumonia detection method based on different deep CNN architectures combined with optimal feature fusion. Enhanced VGG-19, ResNet-50, and MobileNet-V2 models are trained on the most widely used pneumonia dataset, applying appropriate transfer learning and fine-tuning strategies. To create an effective feature input, the Chi-Square technique removes irrelevant features from every enhanced CNN. The resulting subsets are then fused horizontally to generate a more diverse and robust feature representation for binary classification. By combining the 1000 best features from the VGG-19 and MobileNet-V2 models, the proposed approach achieves the best accuracy (97.59%), Recall (98.33%), and F1-score (98.19%) on the test set with a supervised support vector machine (SVM) classifier. The results demonstrate that our approach provides a significant performance improvement over previous studies using various ensemble fusion techniques while ensuring computational efficiency. We expect this fused-feature system to significantly aid the timely detection of childhood pneumonia, especially within constrained healthcare systems.</p>
	]]></content:encoded>

	<dc:title>Automatic Childhood Pneumonia Diagnosis Based on Multi-Model Feature Fusion Using Chi-Square Feature Selection</dc:title>
			<dc:creator>Amira Ouerhani</dc:creator>
			<dc:creator>Tareq Hadidi</dc:creator>
			<dc:creator>Hanene Sahli</dc:creator>
			<dc:creator>Halima Mahjoubi</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020081</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-14</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-14</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>81</prism:startingPage>
		<prism:doi>10.3390/jimaging12020081</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/81</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/80">

	<title>J. Imaging, Vol. 12, Pages 80: Confidence-Guided Adaptive Diffusion Network for Medical Image Classification</title>
	<link>https://www.mdpi.com/2313-433X/12/2/80</link>
	<description>Medical image classification is a fundamental task in medical image analysis and underpins a wide range of clinical applications, including dermatological screening, retinal disease assessment, and malignant tissue detection. In recent years, diffusion models have demonstrated promising potential for medical image classification owing to their strong representation learning capability. However, existing diffusion-based classification methods often rely on oversimplified prior modeling strategies, which fail to adequately capture the intrinsic multi-scale semantic information and contextual dependencies inherent in medical images. As a result, the discriminative power and stability of feature representations are constrained in complex scenarios. In addition, fixed noise injection strategies neglect variations in sample-level prediction confidence, leading to uniform perturbations being imposed on samples with different levels of semantic reliability during the diffusion process, which in turn limits the model’s discriminative performance and generalization ability. To address these challenges, this paper proposes a Confidence-Guided Adaptive Diffusion Network (CGAD-Net) for medical image classification. Specifically, a hybrid prior modeling framework is introduced, consisting of a Hierarchical Pyramid Context Modeling (HPCM) module and an Intra-Scale Dilated Convolution Refinement (IDCR) module. These two components jointly enable the diffusion-based feature modeling process to effectively capture fine-grained structural details and global contextual semantic information. Furthermore, a Confidence-Guided Adaptive Noise Injection (CG-ANI) strategy is designed to dynamically regulate noise intensity during the diffusion process according to sample-level prediction confidence. Without altering the underlying discriminative objective, CG-ANI stabilizes model training and enhances robust representation learning for semantically ambiguous samples. Experimental results on multiple public medical image classification benchmarks, including HAM10000, APTOS2019, and Chaoyang, demonstrate that CGAD-Net achieves competitive performance in terms of classification accuracy, robustness, and training stability. These results validate the effectiveness and application potential of confidence-guided diffusion modeling for two-dimensional medical image classification tasks, and provide valuable insights for further research on diffusion models in the field of medical image analysis.</description>
	<pubDate>2026-02-14</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 80: Confidence-Guided Adaptive Diffusion Network for Medical Image Classification</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/80">doi: 10.3390/jimaging12020080</a></p>
	<p>Authors:
		Yang Yan
		Zhuo Xie
		Wenbo Huang
		</p>
	<p>Medical image classification is a fundamental task in medical image analysis and underpins a wide range of clinical applications, including dermatological screening, retinal disease assessment, and malignant tissue detection. In recent years, diffusion models have demonstrated promising potential for medical image classification owing to their strong representation learning capability. However, existing diffusion-based classification methods often rely on oversimplified prior modeling strategies, which fail to adequately capture the intrinsic multi-scale semantic information and contextual dependencies inherent in medical images. As a result, the discriminative power and stability of feature representations are constrained in complex scenarios. In addition, fixed noise injection strategies neglect variations in sample-level prediction confidence, leading to uniform perturbations being imposed on samples with different levels of semantic reliability during the diffusion process, which in turn limits the model’s discriminative performance and generalization ability. To address these challenges, this paper proposes a Confidence-Guided Adaptive Diffusion Network (CGAD-Net) for medical image classification. Specifically, a hybrid prior modeling framework is introduced, consisting of a Hierarchical Pyramid Context Modeling (HPCM) module and an Intra-Scale Dilated Convolution Refinement (IDCR) module. These two components jointly enable the diffusion-based feature modeling process to effectively capture fine-grained structural details and global contextual semantic information. Furthermore, a Confidence-Guided Adaptive Noise Injection (CG-ANI) strategy is designed to dynamically regulate noise intensity during the diffusion process according to sample-level prediction confidence. Without altering the underlying discriminative objective, CG-ANI stabilizes model training and enhances robust representation learning for semantically ambiguous samples. Experimental results on multiple public medical image classification benchmarks, including HAM10000, APTOS2019, and Chaoyang, demonstrate that CGAD-Net achieves competitive performance in terms of classification accuracy, robustness, and training stability. These results validate the effectiveness and application potential of confidence-guided diffusion modeling for two-dimensional medical image classification tasks, and provide valuable insights for further research on diffusion models in the field of medical image analysis.</p>
	]]></content:encoded>

	<dc:title>Confidence-Guided Adaptive Diffusion Network for Medical Image Classification</dc:title>
			<dc:creator>Yang Yan</dc:creator>
			<dc:creator>Zhuo Xie</dc:creator>
			<dc:creator>Wenbo Huang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020080</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-14</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-14</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>80</prism:startingPage>
		<prism:doi>10.3390/jimaging12020080</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/80</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/79">

	<title>J. Imaging, Vol. 12, Pages 79: Progressive Upsampling Generative Adversarial Network with Collaborative Attention for Single-Image Super-Resolution</title>
	<link>https://www.mdpi.com/2313-433X/12/2/79</link>
	<description>Single-image super-resolution (SISR) is an essential low-level visual task that aims to produce high-resolution images from low-resolution inputs. However, most existing SISR methods heavily rely on ideal degradation kernels and rarely consider the actual noise distribution. To tackle these issues, this paper presents a progressive upsampling generative adversarial network with a collaborative attention mechanism, called PUGAN. Specifically, residual multiscale blocks (RMBs) based on stacked mixed-pooling multiscale structures (MPMSs) are designed to make full use of multiscale global–local hierarchical features, and the frequency collaborative attention mechanism (CAM) is used to fully exploit high- and low-frequency characteristics. Meanwhile, we design a progressive upsampling strategy to better guide the model’s learning while reducing the model’s complexity. Finally, the discriminator is also used to evaluate the reconstructed high-resolution images for balancing super-resolution reconstruction and detail enhancement. Our PUGAN yields comparable PSNR/SSIM/LPIPS values on the NTIRE 2020, Urban 100, and B100 datasets, namely 33.987/0.9673/0.1210, 32.966/0.9483/0.1431, and 33.627/0.9546/0.1354 for the scale factor of ×2, as well as 26.349/0.8721/0.1975, 26.110/0.8614/0.1983, and 26.306/0.8803/0.1978 for the scale factor of ×4, respectively. Extensive experiments demonstrate that our PUGAN outperforms state-of-the-art SISR methods in qualitative and quantitative assessments for the SISR task. Additionally, our PUGAN shows potential benefits for pathological image super-resolution.</description>
	<pubDate>2026-02-11</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 79: Progressive Upsampling Generative Adversarial Network with Collaborative Attention for Single-Image Super-Resolution</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/79">doi: 10.3390/jimaging12020079</a></p>
	<p>Authors:
		Haoxiang Lu
		Jing Zhang
		Mengyuan Jing
		Ziming Wang
		Wenhao Wang
		</p>
	<p>Single-image super-resolution (SISR) is an essential low-level visual task that aims to produce high-resolution images from low-resolution inputs. However, most existing SISR methods heavily rely on ideal degradation kernels and rarely consider the actual noise distribution. To tackle these issues, this paper presents a progressive upsampling generative adversarial network with a collaborative attention mechanism, called PUGAN. Specifically, residual multiscale blocks (RMBs) based on stacked mixed-pooling multiscale structures (MPMSs) are designed to make full use of multiscale global–local hierarchical features, and the frequency collaborative attention mechanism (CAM) is used to fully exploit high- and low-frequency characteristics. Meanwhile, we design a progressive upsampling strategy to better guide the model’s learning while reducing the model’s complexity. Finally, the discriminator is also used to evaluate the reconstructed high-resolution images for balancing super-resolution reconstruction and detail enhancement. Our PUGAN yields comparable PSNR/SSIM/LPIPS values on the NTIRE 2020, Urban 100, and B100 datasets, namely 33.987/0.9673/0.1210, 32.966/0.9483/0.1431, and 33.627/0.9546/0.1354 for the scale factor of ×2, as well as 26.349/0.8721/0.1975, 26.110/0.8614/0.1983, and 26.306/0.8803/0.1978 for the scale factor of ×4, respectively. Extensive experiments demonstrate that our PUGAN outperforms state-of-the-art SISR methods in qualitative and quantitative assessments for the SISR task. Additionally, our PUGAN shows potential benefits for pathological image super-resolution.</p>
	]]></content:encoded>

	<dc:title>Progressive Upsampling Generative Adversarial Network with Collaborative Attention for Single-Image Super-Resolution</dc:title>
			<dc:creator>Haoxiang Lu</dc:creator>
			<dc:creator>Jing Zhang</dc:creator>
			<dc:creator>Mengyuan Jing</dc:creator>
			<dc:creator>Ziming Wang</dc:creator>
			<dc:creator>Wenhao Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020079</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-11</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-11</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>79</prism:startingPage>
		<prism:doi>10.3390/jimaging12020079</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/79</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/78">

	<title>J. Imaging, Vol. 12, Pages 78: Age Prediction of Hematoma from Hyperspectral Images Using Convolutional Neural Networks</title>
	<link>https://www.mdpi.com/2313-433X/12/2/78</link>
	<description>Accurate estimation of hematoma age remains a major challenge in forensic practice, as current assessments rely heavily on subjective visual interpretation. Hyperspectral imaging (HSI) captures rich spectral signatures that may reflect the biochemical evolution of hematomas over time. This study evaluates whether a convolutional neural network (CNN) integrating both spectral and spatial information improves hematoma age estimation accuracy. Additionally, we investigate whether performance can be maintained using a reduced, physiologically motivated subset of wavelengths. Using a dataset of forearm hematomas from 25 participants, we applied radiometric normalization and SAM-based segmentation to extract 64×64×204 hyperspectral patches. In leave-one-subject-out cross-validation, the CNN outperformed a spectral-only Lasso baseline, reducing the mean absolute error (MAE) from 3.24 days to 2.29 days. Band-importance analysis combining SmoothGrad and occlusion sensitivity identified 20 highly informative wavelengths; using only these bands matched or exceeded the accuracy of the full 204-band model across early, middle, and late hematoma stages. These results demonstrate that spectral–spatial modeling and physiologically grounded band selection can enhance estimation accuracy while significantly reducing data dimensionality. This approach supports the development of compact multispectral systems for objective clinical and forensic evaluation.</description>
	<pubDate>2026-02-11</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 78: Age Prediction of Hematoma from Hyperspectral Images Using Convolutional Neural Networks</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/78">doi: 10.3390/jimaging12020078</a></p>
	<p>Authors:
		Arash Keshavarz
		Gerald Bieber
		Daniel Wulff
		Carsten Babian
		Stefan Lüdtke
		</p>
	<p>Accurate estimation of hematoma age remains a major challenge in forensic practice, as current assessments rely heavily on subjective visual interpretation. Hyperspectral imaging (HSI) captures rich spectral signatures that may reflect the biochemical evolution of hematomas over time. This study evaluates whether a convolutional neural network (CNN) integrating both spectral and spatial information improves hematoma age estimation accuracy. Additionally, we investigate whether performance can be maintained using a reduced, physiologically motivated subset of wavelengths. Using a dataset of forearm hematomas from 25 participants, we applied radiometric normalization and SAM-based segmentation to extract 64×64×204 hyperspectral patches. In leave-one-subject-out cross-validation, the CNN outperformed a spectral-only Lasso baseline, reducing the mean absolute error (MAE) from 3.24 days to 2.29 days. Band-importance analysis combining SmoothGrad and occlusion sensitivity identified 20 highly informative wavelengths; using only these bands matched or exceeded the accuracy of the full 204-band model across early, middle, and late hematoma stages. These results demonstrate that spectral–spatial modeling and physiologically grounded band selection can enhance estimation accuracy while significantly reducing data dimensionality. This approach supports the development of compact multispectral systems for objective clinical and forensic evaluation.</p>
	]]></content:encoded>

	<dc:title>Age Prediction of Hematoma from Hyperspectral Images Using Convolutional Neural Networks</dc:title>
			<dc:creator>Arash Keshavarz</dc:creator>
			<dc:creator>Gerald Bieber</dc:creator>
			<dc:creator>Daniel Wulff</dc:creator>
			<dc:creator>Carsten Babian</dc:creator>
			<dc:creator>Stefan Lüdtke</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020078</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-11</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-11</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>78</prism:startingPage>
		<prism:doi>10.3390/jimaging12020078</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/78</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/77">

	<title>J. Imaging, Vol. 12, Pages 77: Correction: Jiang et al. Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction. J. Imaging 2026, 12, 19</title>
	<link>https://www.mdpi.com/2313-433X/12/2/77</link>
	<description>There were two errors in the original publication [...]</description>
	<pubDate>2026-02-11</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 77: Correction: Jiang et al. Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction. J. Imaging 2026, 12, 19</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/77">doi: 10.3390/jimaging12020077</a></p>
	<p>Authors:
		Zhongmin Jiang
		Zhen Wang
		Wenju Wang
		Jifan Zhu
		</p>
	<p>There were two errors in the original publication [...]</p>
	]]></content:encoded>

	<dc:title>Correction: Jiang et al. Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction. J. Imaging 2026, 12, 19</dc:title>
			<dc:creator>Zhongmin Jiang</dc:creator>
			<dc:creator>Zhen Wang</dc:creator>
			<dc:creator>Wenju Wang</dc:creator>
			<dc:creator>Jifan Zhu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020077</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-11</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-11</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Correction</prism:section>
	<prism:startingPage>77</prism:startingPage>
		<prism:doi>10.3390/jimaging12020077</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/77</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/76">

	<title>J. Imaging, Vol. 12, Pages 76: A Multiphase CT-Based Integrated Deep Learning Framework for Rectal Cancer Detection, Segmentation, and Staging: Performance Comparison with Radiologist Assessment</title>
	<link>https://www.mdpi.com/2313-433X/12/2/76</link>
	<description>Accurate staging of rectal cancer is crucial for treatment planning; however, computed tomography (CT) interpretation remains challenging and highly dependent on radiologist expertise. This study aimed to develop and evaluate an AI-assisted system for rectal cancer detection and staging using CT images. The proposed framework integrates three components: a convolutional neural network (RCD-CNN) for lesion detection, a U-Net model for rectal contour delineation and tumor localization, and a 3D convolutional network (RCS-3DCNN) for staging prediction. CT scans from 223 rectal cancer patients at Kaohsiung Medical University Chung-Ho Memorial Hospital were retrospectively analyzed, including both non-contrast and contrast-enhanced studies. RCD-CNN achieved an accuracy of 0.976, recall of 0.975, and precision of 0.976. U-Net yielded Dice scores of 0.897 (rectal contours) and 0.856 (tumor localization). Radiologist-based clinical staging had 82.6% concordance with pathology, while AI-based staging achieved 80.4%. McNemar’s test showed no significant difference between the AI and radiologist staging results (p = 1.0). The proposed AI-assisted system achieved staging accuracy comparable to that of radiologists and demonstrated feasibility as a decision-support tool in rectal cancer management. This study introduces a novel three-stage, dual-phase CT-based AI framework that integrates lesion detection, segmentation, and staging within a unified workflow.</description>
	<pubDate>2026-02-10</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 76: A Multiphase CT-Based Integrated Deep Learning Framework for Rectal Cancer Detection, Segmentation, and Staging: Performance Comparison with Radiologist Assessment</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/76">doi: 10.3390/jimaging12020076</a></p>
	<p>Authors:
		Tzu-Hsueh Tsai
		Jia-Hui Lin
		Yen-Te Liu
		Jhing-Fa Wang
		Chien-Hung Lee
		Chiao-Yun Chen
		</p>
	<p>Accurate staging of rectal cancer is crucial for treatment planning; however, computed tomography (CT) interpretation remains challenging and highly dependent on radiologist expertise. This study aimed to develop and evaluate an AI-assisted system for rectal cancer detection and staging using CT images. The proposed framework integrates three components: a convolutional neural network (RCD-CNN) for lesion detection, a U-Net model for rectal contour delineation and tumor localization, and a 3D convolutional network (RCS-3DCNN) for staging prediction. CT scans from 223 rectal cancer patients at Kaohsiung Medical University Chung-Ho Memorial Hospital were retrospectively analyzed, including both non-contrast and contrast-enhanced studies. RCD-CNN achieved an accuracy of 0.976, recall of 0.975, and precision of 0.976. U-Net yielded Dice scores of 0.897 (rectal contours) and 0.856 (tumor localization). Radiologist-based clinical staging had 82.6% concordance with pathology, while AI-based staging achieved 80.4%. McNemar’s test showed no significant difference between the AI and radiologist staging results (p = 1.0). The proposed AI-assisted system achieved staging accuracy comparable to that of radiologists and demonstrated feasibility as a decision-support tool in rectal cancer management. This study introduces a novel three-stage, dual-phase CT-based AI framework that integrates lesion detection, segmentation, and staging within a unified workflow.</p>
	]]></content:encoded>

	<dc:title>A Multiphase CT-Based Integrated Deep Learning Framework for Rectal Cancer Detection, Segmentation, and Staging: Performance Comparison with Radiologist Assessment</dc:title>
			<dc:creator>Tzu-Hsueh Tsai</dc:creator>
			<dc:creator>Jia-Hui Lin</dc:creator>
			<dc:creator>Yen-Te Liu</dc:creator>
			<dc:creator>Jhing-Fa Wang</dc:creator>
			<dc:creator>Chien-Hung Lee</dc:creator>
			<dc:creator>Chiao-Yun Chen</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020076</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-10</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-10</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>76</prism:startingPage>
		<prism:doi>10.3390/jimaging12020076</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/76</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/75">

	<title>J. Imaging, Vol. 12, Pages 75: Robust Detection and Localization of Image Copy-Move Forgery Using Multi-Feature Fusion</title>
	<link>https://www.mdpi.com/2313-433X/12/2/75</link>
	<description>Copy-move forgery detection (CMFD) is a crucial image forensics analysis technique. The rapid development of deep learning algorithms has led to impressive advancements in CMFD. However, existing models suffer from two key limitations. First, their feature fusion modules insufficiently exploit the complementary nature of features from the RGB domain and noise domain, resulting in suboptimal feature representations. Second, during decoding, they simply classify pixels as authentic or forged, without aggregating cross-layer information or integrating local and global attention mechanisms, leading to unsatisfactory detection precision. To overcome these limitations, a robust detection and localization approach to image copy-move forgery using multi-feature fusion is proposed. Firstly, a Multi-Feature Fusion Network (MFFNet) was designed. Within its feature fusion module, features from both the RGB domain and noise domain were fused to enable mutual complementarity between distinct characteristics, yielding richer feature information. Then, a Lightweight Multi-layer Perceptron Decoder (LMPD) was developed for image reconstruction and forgery localization map generation. Finally, by aggregating information from different layers and combining local and global attention mechanisms, more accurate prediction masks were obtained. The experimental results demonstrate that the proposed MFFNet model exhibits enhanced robustness and superior detection and localization performance compared to existing methods when faced with JPEG compression, noise addition, and resizing operations.</description>
	<pubDate>2026-02-10</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 75: Robust Detection and Localization of Image Copy-Move Forgery Using Multi-Feature Fusion</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/75">doi: 10.3390/jimaging12020075</a></p>
	<p>Authors:
		Kaiqi Lu
		Qiuyu Zhang
		</p>
	<p>Copy-move forgery detection (CMFD) is a crucial image forensics analysis technique. The rapid development of deep learning algorithms has led to impressive advancements in CMFD. However, existing models suffer from two key limitations. First, their feature fusion modules insufficiently exploit the complementary nature of features from the RGB domain and noise domain, resulting in suboptimal feature representations. Second, during decoding, they simply classify pixels as authentic or forged, without aggregating cross-layer information or integrating local and global attention mechanisms, leading to unsatisfactory detection precision. To overcome these limitations, a robust detection and localization approach to image copy-move forgery using multi-feature fusion is proposed. Firstly, a Multi-Feature Fusion Network (MFFNet) was designed. Within its feature fusion module, features from both the RGB domain and noise domain were fused to enable mutual complementarity between distinct characteristics, yielding richer feature information. Then, a Lightweight Multi-layer Perceptron Decoder (LMPD) was developed for image reconstruction and forgery localization map generation. Finally, by aggregating information from different layers and combining local and global attention mechanisms, more accurate prediction masks were obtained. The experimental results demonstrate that the proposed MFFNet model exhibits enhanced robustness and superior detection and localization performance compared to existing methods when faced with JPEG compression, noise addition, and resizing operations.</p>
	]]></content:encoded>

	<dc:title>Robust Detection and Localization of Image Copy-Move Forgery Using Multi-Feature Fusion</dc:title>
			<dc:creator>Kaiqi Lu</dc:creator>
			<dc:creator>Qiuyu Zhang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020075</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-10</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-10</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>75</prism:startingPage>
		<prism:doi>10.3390/jimaging12020075</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/75</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/74">

	<title>J. Imaging, Vol. 12, Pages 74: LDFSAM: Localization Distillation-Enhanced Feature Prompting SAM for Medical Image Segmentation</title>
	<link>https://www.mdpi.com/2313-433X/12/2/74</link>
	<description>Standard SAM-based approaches in medical imaging typically rely on explicit geometric prompts, such as bounding boxes or points. However, these rigid spatial constraints are often insufficient for capturing the complex, deformable boundaries of medical structures, where localization noise easily propagates into segmentation errors. To overcome this, we propose the Localization Distillation-Enhanced Feature Prompting SAM (LDFSAM), a novel framework that shifts from discrete coordinate inputs to a latent feature prompting paradigm. We employ a lightweight prompt generator, refined via Localization Distillation (LD), to inject multi-scale features into the SAM decoder as complementary Dense Feature Prompts (DFPs) and Sparse Feature Prompts (SFPs). This effectively guides segmentation without explicit box constraints. Extensive experiments on four public benchmarks (3D CBCT Tooth, ISIC 2018, MMOTU, and Kvasir-SEG) demonstrate that LDFSAM outperforms both prior SAM-based baselines and conventional networks, achieving Dice scores exceeding 0.91. Further validation on an in-house cohort demonstrates its robust generalization capabilities. Overall, our method outperforms both prior SAM-based baselines and conventional networks, with particularly strong gains in low-data regimes, providing a reliable solution for automated medical image segmentation.</description>
	<pubDate>2026-02-10</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 74: LDFSAM: Localization Distillation-Enhanced Feature Prompting SAM for Medical Image Segmentation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/74">doi: 10.3390/jimaging12020074</a></p>
	<p>Authors:
		Xuanbo Zhao
		Cheng Wang
		Huaxing Xu
		Hong Zhou
		Zekuan Yu
		Tao Chen
		Xiaoling Wei
		Rongjun Zhang
		</p>
	<p>Standard SAM-based approaches in medical imaging typically rely on explicit geometric prompts, such as bounding boxes or points. However, these rigid spatial constraints are often insufficient for capturing the complex, deformable boundaries of medical structures, where localization noise easily propagates into segmentation errors. To overcome this, we propose the Localization Distillation-Enhanced Feature Prompting SAM (LDFSAM), a novel framework that shifts from discrete coordinate inputs to a latent feature prompting paradigm. We employ a lightweight prompt generator, refined via Localization Distillation (LD), to inject multi-scale features into the SAM decoder as complementary Dense Feature Prompts (DFPs) and Sparse Feature Prompts (SFPs). This effectively guides segmentation without explicit box constraints. Extensive experiments on four public benchmarks (3D CBCT Tooth, ISIC 2018, MMOTU, and Kvasir-SEG) demonstrate that LDFSAM outperforms both prior SAM-based baselines and conventional networks, achieving Dice scores exceeding 0.91. Further validation on an in-house cohort demonstrates its robust generalization capabilities. Overall, our method outperforms both prior SAM-based baselines and conventional networks, with particularly strong gains in low-data regimes, providing a reliable solution for automated medical image segmentation.</p>
	]]></content:encoded>

	<dc:title>LDFSAM: Localization Distillation-Enhanced Feature Prompting SAM for Medical Image Segmentation</dc:title>
			<dc:creator>Xuanbo Zhao</dc:creator>
			<dc:creator>Cheng Wang</dc:creator>
			<dc:creator>Huaxing Xu</dc:creator>
			<dc:creator>Hong Zhou</dc:creator>
			<dc:creator>Zekuan Yu</dc:creator>
			<dc:creator>Tao Chen</dc:creator>
			<dc:creator>Xiaoling Wei</dc:creator>
			<dc:creator>Rongjun Zhang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020074</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-10</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-10</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>74</prism:startingPage>
		<prism:doi>10.3390/jimaging12020074</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/74</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/73">

	<title>J. Imaging, Vol. 12, Pages 73: Assessing Impact of Data Quality in Early Post-Operative Glioblastoma Segmentation</title>
	<link>https://www.mdpi.com/2313-433X/12/2/73</link>
	<description>Quantification of the residual tumor from early post-operative magnetic resonance imaging (MRI) is essential in follow-up and treatment planning for glioblastoma patients. Residual tumor segmentation from early post-operative MRI is particularly challenging compared to the closely related task of pre-operative segmentation, as the tumor lesions are small, fragmented, and easily confounded with noise in the resection cavity. Recently, several studies successfully trained deep learning models for early post-operative segmentation, yet with subpar performances compared to the analogous task pre-operatively. In this study, the impact of image and annotation quality on model training and performance in early post-operative glioblastoma segmentation was assessed. A dataset consisting of early post-operative MRI scans from 423 patients and two hospitals in Norway and Sweden was assembled, for which image and annotation qualities were evaluated by expert neurosurgeons. The Attention U-Net architecture was trained with five-fold cross-validation on different quality-based subsets of the dataset in order to evaluate the impact of training data quality on model performance. Including low-quality images in the training set did not deteriorate performance on high-quality images. However, models trained on exclusively high-quality images did not generalize to low-quality images. Models trained on exclusively high-quality annotations reached the same performance level as the models trained on the entire dataset, using only two-thirds of the dataset. Both image and annotation quality had a significant impact on model performance. In dataset curation, images should ideally be representative of the quality variations in the real-world clinical scenario, and efforts should be made to ensure exact ground truth annotations of high quality.</description>
	<pubDate>2026-02-10</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 73: Assessing Impact of Data Quality in Early Post-Operative Glioblastoma Segmentation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/73">doi: 10.3390/jimaging12020073</a></p>
	<p>Authors:
		Ragnhild Holden Helland
		David Bouget
		Asgeir Store Jakola
		Sébastien Muller
		Ole Solheim
		Ingerid Reinertsen
		</p>
	<p>Quantification of the residual tumor from early post-operative magnetic resonance imaging (MRI) is essential in follow-up and treatment planning for glioblastoma patients. Residual tumor segmentation from early post-operative MRI is particularly challenging compared to the closely related task of pre-operative segmentation, as the tumor lesions are small, fragmented, and easily confounded with noise in the resection cavity. Recently, several studies successfully trained deep learning models for early post-operative segmentation, yet with subpar performances compared to the analogous task pre-operatively. In this study, the impact of image and annotation quality on model training and performance in early post-operative glioblastoma segmentation was assessed. A dataset consisting of early post-operative MRI scans from 423 patients and two hospitals in Norway and Sweden was assembled, for which image and annotation qualities were evaluated by expert neurosurgeons. The Attention U-Net architecture was trained with five-fold cross-validation on different quality-based subsets of the dataset in order to evaluate the impact of training data quality on model performance. Including low-quality images in the training set did not deteriorate performance on high-quality images. However, models trained on exclusively high-quality images did not generalize to low-quality images. Models trained on exclusively high-quality annotations reached the same performance level as the models trained on the entire dataset, using only two-thirds of the dataset. Both image and annotation quality had a significant impact on model performance. In dataset curation, images should ideally be representative of the quality variations in the real-world clinical scenario, and efforts should be made to ensure exact ground truth annotations of high quality.</p>
	]]></content:encoded>

	<dc:title>Assessing Impact of Data Quality in Early Post-Operative Glioblastoma Segmentation</dc:title>
			<dc:creator>Ragnhild Holden Helland</dc:creator>
			<dc:creator>David Bouget</dc:creator>
			<dc:creator>Asgeir Store Jakola</dc:creator>
			<dc:creator>Sébastien Muller</dc:creator>
			<dc:creator>Ole Solheim</dc:creator>
			<dc:creator>Ingerid Reinertsen</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020073</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-10</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-10</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>73</prism:startingPage>
		<prism:doi>10.3390/jimaging12020073</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/73</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/72">

	<title>J. Imaging, Vol. 12, Pages 72: GreenViT: A Vision Transformer with Single-Path Progressive Upsampling for Urban Green-Space Segmentation and Auditable Area Estimation</title>
	<link>https://www.mdpi.com/2313-433X/12/2/72</link>
	<description>Urban green-space monitoring in dense cityscapes remains limited by accuracy–efficiency trade-offs and the absence of integrated, auditable area estimation. We introduce GreenViT, a Vision Transformer (ViT) based framework for precise segmentation and transparent quantification of urban green space. GreenViT couples a ViT-L/14 backbone with a lightweight single-path, progressive upsampling decoder (Green Head), preserving global context while recovering thin structures. Experiments were conducted on a manually annotated dataset of 20 high-resolution satellite images collected from Satellites.Pro, covering five land-cover classes (background, green space, building, road, and water). Using a 224 × 224 sliding window sampling scheme, the 20 images yield 62,650 training/validation patches. Under five-fold evaluation, it attains 0.9200 ± 0.0243 mIoU, 0.9580 ± 0.0135 Dice, and 0.9570 PA, and the calibrated estimator achieves 1.10% relative area error. Overall, GreenViT strikes a strong balance between accuracy and efficiency, making it particularly well-suited for thin or boundary-rich classes. It can be used to support planning evaluations, green-space statistics, urban renewal assessments, and ecological red-line verification, while providing reliable green-area metrics to support urban heat mitigation and pollution control efforts. This makes it highly suitable for decision-oriented long-term monitoring and management assessments.</description>
	<pubDate>2026-02-10</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 72: GreenViT: A Vision Transformer with Single-Path Progressive Upsampling for Urban Green-Space Segmentation and Auditable Area Estimation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/72">doi: 10.3390/jimaging12020072</a></p>
	<p>Authors:
		Ziqiang Xu
		Young Choi
		Changyong Yi
		Chanjeong Park
		Jinyoung Park
		Hyungkeun Park
		Sujeen Song
		</p>
	<p>Urban green-space monitoring in dense cityscapes remains limited by accuracy–efficiency trade-offs and the absence of integrated, auditable area estimation. We introduce GreenViT, a Vision Transformer (ViT) based framework for precise segmentation and transparent quantification of urban green space. GreenViT couples a ViT-L/14 backbone with a lightweight single-path, progressive upsampling decoder (Green Head), preserving global context while recovering thin structures. Experiments were conducted on a manually annotated dataset of 20 high-resolution satellite images collected from Satellites.Pro, covering five land-cover classes (background, green space, building, road, and water). Using a 224 × 224 sliding window sampling scheme, the 20 images yield 62,650 training/validation patches. Under five-fold evaluation, it attains 0.9200 ± 0.0243 mIoU, 0.9580 ± 0.0135 Dice, and 0.9570 PA, and the calibrated estimator achieves 1.10% relative area error. Overall, GreenViT strikes a strong balance between accuracy and efficiency, making it particularly well-suited for thin or boundary-rich classes. It can be used to support planning evaluations, green-space statistics, urban renewal assessments, and ecological red-line verification, while providing reliable green-area metrics to support urban heat mitigation and pollution control efforts. This makes it highly suitable for decision-oriented long-term monitoring and management assessments.</p>
	]]></content:encoded>

	<dc:title>GreenViT: A Vision Transformer with Single-Path Progressive Upsampling for Urban Green-Space Segmentation and Auditable Area Estimation</dc:title>
			<dc:creator>Ziqiang Xu</dc:creator>
			<dc:creator>Young Choi</dc:creator>
			<dc:creator>Changyong Yi</dc:creator>
			<dc:creator>Chanjeong Park</dc:creator>
			<dc:creator>Jinyoung Park</dc:creator>
			<dc:creator>Hyungkeun Park</dc:creator>
			<dc:creator>Sujeen Song</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020072</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-10</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-10</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>72</prism:startingPage>
		<prism:doi>10.3390/jimaging12020072</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/72</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/71">

	<title>J. Imaging, Vol. 12, Pages 71: Relationship Between Display Pixel Structure and Gloss Perception</title>
	<link>https://www.mdpi.com/2313-433X/12/2/71</link>
	<description>The demand for accurate representation of gloss perception, which significantly contributes to the impression and evaluation of objects, is increasing owing to recent advancements in display technology enabling high-definition visual reproduction. This study experimentally analyzes the influence of display pixel structure on gloss perception. In a visual evaluation experiment using natural images, gloss perception was assessed across six types of stimuli: three subpixel arrays (RGB, RGBW, and PenTile RGBG) combined with two pixel–aperture ratios (100% and 50%). The experimental results statistically confirmed that regardless of pixel–aperture ratio, the RGB subpixel array was perceived as exhibiting the strongest gloss. Furthermore, cluster analysis of observers revealed individual differences in the effect of pixel structure on gloss perception. Additionally, gloss classification and image feature analysis suggested that the magnitude of pixel structure influence varies depending on the frequency components contained in the images. Moreover, analysis using a generalized linear mixed model supported the superiority of the RGB subpixel array even when accounting for variability across observers and natural images.</description>
	<pubDate>2026-02-09</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 71: Relationship Between Display Pixel Structure and Gloss Perception</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/71">doi: 10.3390/jimaging12020071</a></p>
	<p>Authors:
		Kosei Aketagawa
		Midori Tanaka
		Takahiko Horiuchi
		</p>
	<p>The demand for accurate representation of gloss perception, which significantly contributes to the impression and evaluation of objects, is increasing owing to recent advancements in display technology enabling high-definition visual reproduction. This study experimentally analyzes the influence of display pixel structure on gloss perception. In a visual evaluation experiment using natural images, gloss perception was assessed across six types of stimuli: three subpixel arrays (RGB, RGBW, and PenTile RGBG) combined with two pixel–aperture ratios (100% and 50%). The experimental results statistically confirmed that regardless of pixel–aperture ratio, the RGB subpixel array was perceived as exhibiting the strongest gloss. Furthermore, cluster analysis of observers revealed individual differences in the effect of pixel structure on gloss perception. Additionally, gloss classification and image feature analysis suggested that the magnitude of pixel structure influence varies depending on the frequency components contained in the images. Moreover, analysis using a generalized linear mixed model supported the superiority of the RGB subpixel array even when accounting for variability across observers and natural images.</p>
	]]></content:encoded>

	<dc:title>Relationship Between Display Pixel Structure and Gloss Perception</dc:title>
			<dc:creator>Kosei Aketagawa</dc:creator>
			<dc:creator>Midori Tanaka</dc:creator>
			<dc:creator>Takahiko Horiuchi</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020071</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-09</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-09</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>71</prism:startingPage>
		<prism:doi>10.3390/jimaging12020071</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/71</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/70">

	<title>J. Imaging, Vol. 12, Pages 70: Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering</title>
	<link>https://www.mdpi.com/2313-433X/12/2/70</link>
	<description>This study proposes a topic-modeling guided framework that enhances image classification by introducing semantic clustering prior to CNN training. Images are processed through two key-point extraction pipelines: Scale-Invariant Feature Transform (SIFT) with Sobel edge detection and Block Gabor Filtering (BGF), to obtain local feature descriptors. These descriptors are clustered using K-means to build a visual vocabulary. Bag of Words histograms then represent each image as a visual document. Latent Dirichlet Allocation is applied to uncover latent semantic topics, generating coherent image clusters. Cluster-specific CNN models, including AlexNet, GoogLeNet, and several ResNet variants, are trained under identical conditions to identify the most suitable architecture for each cluster. Two topic guided integration strategies, the Maximum Proportion Topic (MPT) and the Weight Proportion Topic (WPT), are then used to assign test images to the corresponding specialized model. Experimental results show that both the SIFT-based and BGF-based pipelines outperform non-clustered CNN models and a baseline method using Incremental PCA, K-means, Same-Cluster Prediction, and unweighted Ensemble Voting. The SIFT pipeline achieves the highest accuracy of 95.24% with the MPT strategy, while the BGF pipeline achieves 93.76% with the WPT strategy. These findings confirm that semantic structure introduced through topic modeling substantially improves CNN classification performance.</description>
	<pubDate>2026-02-09</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 70: Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/70">doi: 10.3390/jimaging12020070</a></p>
	<p>Authors:
		Natthaphong Suthamno
		Jessada Tanthanuch
		</p>
	<p>This study proposes a topic-modeling guided framework that enhances image classification by introducing semantic clustering prior to CNN training. Images are processed through two key-point extraction pipelines: Scale-Invariant Feature Transform (SIFT) with Sobel edge detection and Block Gabor Filtering (BGF), to obtain local feature descriptors. These descriptors are clustered using K-means to build a visual vocabulary. Bag of Words histograms then represent each image as a visual document. Latent Dirichlet Allocation is applied to uncover latent semantic topics, generating coherent image clusters. Cluster-specific CNN models, including AlexNet, GoogLeNet, and several ResNet variants, are trained under identical conditions to identify the most suitable architecture for each cluster. Two topic guided integration strategies, the Maximum Proportion Topic (MPT) and the Weight Proportion Topic (WPT), are then used to assign test images to the corresponding specialized model. Experimental results show that both the SIFT-based and BGF-based pipelines outperform non-clustered CNN models and a baseline method using Incremental PCA, K-means, Same-Cluster Prediction, and unweighted Ensemble Voting. The SIFT pipeline achieves the highest accuracy of 95.24% with the MPT strategy, while the BGF pipeline achieves 93.76% with the WPT strategy. These findings confirm that semantic structure introduced through topic modeling substantially improves CNN classification performance.</p>
	]]></content:encoded>

	<dc:title>Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering</dc:title>
			<dc:creator>Natthaphong Suthamno</dc:creator>
			<dc:creator>Jessada Tanthanuch</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020070</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-09</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-09</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>70</prism:startingPage>
		<prism:doi>10.3390/jimaging12020070</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/70</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/69">

	<title>J. Imaging, Vol. 12, Pages 69: YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery</title>
	<link>https://www.mdpi.com/2313-433X/12/2/69</link>
	<description>Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel FPN, called Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model’s backbone. This module effectively eliminates information loss caused by strided convolution or pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model’s complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS, which is 21.4% lower than YOLO11s, confirming its feasibility for real-time onboard UAV applications.</description>
	<pubDate>2026-02-06</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 69: YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/69">doi: 10.3390/jimaging12020069</a></p>
	<p>Authors:
		Qi Mi
		Jianshu Chao
		Anqi Chen
		Kaiyuan Zhang
		Jiahua Lai
		</p>
	<p>Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel FPN, called Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model’s backbone. This module effectively eliminates information loss caused by strided convolution or pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model’s complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS, which is 21.4% lower than YOLO11s, confirming its feasibility for real-time onboard UAV applications.</p>
	]]></content:encoded>

	<dc:title>YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery</dc:title>
			<dc:creator>Qi Mi</dc:creator>
			<dc:creator>Jianshu Chao</dc:creator>
			<dc:creator>Anqi Chen</dc:creator>
			<dc:creator>Kaiyuan Zhang</dc:creator>
			<dc:creator>Jiahua Lai</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020069</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-06</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-06</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>69</prism:startingPage>
		<prism:doi>10.3390/jimaging12020069</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/69</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/68">

	<title>J. Imaging, Vol. 12, Pages 68: Automated Radiological Report Generation from Breast Ultrasound Images Using Vision and Language Transformers</title>
	<link>https://www.mdpi.com/2313-433X/12/2/68</link>
	<description>Breast ultrasound imaging is widely used for the detection and characterization of breast abnormalities; however, generating detailed and consistent radiological reports remains a labor-intensive and subjective process. Recent advances in deep learning have demonstrated the potential of automated report generation systems to support clinical workflows, yet most existing approaches focus on chest X-ray imaging and rely on convolutional–recurrent architectures with limited capacity to model long-range dependencies and complex clinical semantics. In this work, we propose a multimodal Transformer-based framework for automatic breast ultrasound report generation that integrates visual and textual information through cross-attention mechanisms. The proposed architecture employs a Vision Transformer (ViT) to extract rich spatial and morphological features from ultrasound images. For textual embedding, pretrained language models (BERT, BioBERT, and GPT-2) are implemented in various encoder–decoder configurations to leverage both general linguistic knowledge and domain-specific biomedical semantics. A multimodal Transformer decoder is implemented to autoregressively generate diagnostic reports by jointly attending to visual features and contextualized textual embeddings. We conducted an extensive quantitative evaluation using standard report generation metrics, including BLEU, ROUGE-L, METEOR, and CIDEr, to assess lexical accuracy, semantic alignment, and clinical relevance. Experimental results demonstrate that BioBERT-based models consistently outperform general domain counterparts in clinical specificity, while GPT-2-based decoders improve linguistic fluency.</description>
	<pubDate>2026-02-06</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 68: Automated Radiological Report Generation from Breast Ultrasound Images Using Vision and Language Transformers</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/68">doi: 10.3390/jimaging12020068</a></p>
	<p>Authors:
		Shaheen Khatoon
		Azhar Mahmood
		</p>
	<p>Breast ultrasound imaging is widely used for the detection and characterization of breast abnormalities; however, generating detailed and consistent radiological reports remains a labor-intensive and subjective process. Recent advances in deep learning have demonstrated the potential of automated report generation systems to support clinical workflows, yet most existing approaches focus on chest X-ray imaging and rely on convolutional–recurrent architectures with limited capacity to model long-range dependencies and complex clinical semantics. In this work, we propose a multimodal Transformer-based framework for automatic breast ultrasound report generation that integrates visual and textual information through cross-attention mechanisms. The proposed architecture employs a Vision Transformer (ViT) to extract rich spatial and morphological features from ultrasound images. For textual embedding, pretrained language models (BERT, BioBERT, and GPT-2) are implemented in various encoder–decoder configurations to leverage both general linguistic knowledge and domain-specific biomedical semantics. A multimodal Transformer decoder is implemented to autoregressively generate diagnostic reports by jointly attending to visual features and contextualized textual embeddings. We conducted an extensive quantitative evaluation using standard report generation metrics, including BLEU, ROUGE-L, METEOR, and CIDEr, to assess lexical accuracy, semantic alignment, and clinical relevance. Experimental results demonstrate that BioBERT-based models consistently outperform general domain counterparts in clinical specificity, while GPT-2-based decoders improve linguistic fluency.</p>
	]]></content:encoded>

	<dc:title>Automated Radiological Report Generation from Breast Ultrasound Images Using Vision and Language Transformers</dc:title>
			<dc:creator>Shaheen Khatoon</dc:creator>
			<dc:creator>Azhar Mahmood</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020068</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-06</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-06</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>68</prism:startingPage>
		<prism:doi>10.3390/jimaging12020068</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/68</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/67">

	<title>J. Imaging, Vol. 12, Pages 67: Predicting Nutritional and Morphological Attributes of Fresh Commercial Opuntia Cladodes Using Machine Learning and Imaging</title>
	<link>https://www.mdpi.com/2313-433X/12/2/67</link>
	<description>Opuntia ficus-indica L. is a prominent crop in Mexico, requiring advanced non-destructive technologies for the real-time monitoring and quality control of fresh commercial cladodes. The primary research objective of this study was to develop and validate high-precision mathematical models that correlate hyperspectral signatures (400&amp;amp;ndash;1000 nm) with the specific nutritional, morphological, and antioxidant attributes of fresh cladodes (cultivar Villanueva) at their peak commercial maturity. By combining hyperspectral imaging (HSI) with machine learning algorithms, including K-Means clustering for image preprocessing and Partial Least Squares Regression (PLSR) for predictive modeling, this study successfully predicted the concentrations of 10 minerals (N, P, K, Ca, Mg, Fe, B, Mn, Zn, and Cu), chlorophylls (a, b, and Total), and antioxidant capacities (ABTS, FRAP, and DPPH). The innovative nature of this work lies in the simultaneous non-destructive quantification of 17 distinct variables from a single scan, achieving coefficients of determination (R2) as high as 0.988 for Phosphorus and Chlorophyll b. The practical applicability of this research provides a viable replacement for time-consuming and destructive laboratory acid digestion, enabling producers to implement automated, high-throughput sorting lines for quality assurance. Furthermore, this study establishes a framework for interdisciplinary collaborations between agricultural engineers, data scientists for algorithm optimization, and food scientists to enhance the functional value chain of Opuntia products.</description>
	<pubDate>2026-02-05</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 67: Predicting Nutritional and Morphological Attributes of Fresh Commercial Opuntia Cladodes Using Machine Learning and Imaging</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/67">doi: 10.3390/jimaging12020067</a></p>
	<p>Authors:
		Juan Arredondo Valdez
		Josué Israel García López
		Héctor Flores Breceda
		Ajay Kumar
		Ricardo David Valdez Cepeda
		Alejandro Isabel Luna Maldonado
		</p>
	<p>Opuntia ficus-indica L. is a prominent crop in Mexico, requiring advanced non-destructive technologies for the real-time monitoring and quality control of fresh commercial cladodes. The primary research objective of this study was to develop and validate high-precision mathematical models that correlate hyperspectral signatures (400&amp;amp;ndash;1000 nm) with the specific nutritional, morphological, and antioxidant attributes of fresh cladodes (cultivar Villanueva) at their peak commercial maturity. By combining hyperspectral imaging (HSI) with machine learning algorithms, including K-Means clustering for image preprocessing and Partial Least Squares Regression (PLSR) for predictive modeling, this study successfully predicted the concentrations of 10 minerals (N, P, K, Ca, Mg, Fe, B, Mn, Zn, and Cu), chlorophylls (a, b, and Total), and antioxidant capacities (ABTS, FRAP, and DPPH). The innovative nature of this work lies in the simultaneous non-destructive quantification of 17 distinct variables from a single scan, achieving coefficients of determination (R2) as high as 0.988 for Phosphorus and Chlorophyll b. The practical applicability of this research provides a viable replacement for time-consuming and destructive laboratory acid digestion, enabling producers to implement automated, high-throughput sorting lines for quality assurance. Furthermore, this study establishes a framework for interdisciplinary collaborations between agricultural engineers, data scientists for algorithm optimization, and food scientists to enhance the functional value chain of Opuntia products.</p>
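	<p>The regression step described above can be sketched in a few lines with scikit-learn's PLSRegression; the snippet below uses synthetic spectra and a proxy target purely for illustration, so the band count, component number, and data are assumptions rather than the study's pipeline.</p>
	<pre><code>
# Minimal PLSR sketch: fit hyperspectral signatures to a single quality attribute and
# report held-out R2 (synthetic stand-in data; not the study's spectra or settings).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_samples, n_bands = 120, 204                         # e.g., reflectance over 400-1000 nm
X = rng.random((n_samples, n_bands))                  # one spectrum per cladode
y = X[:, 40:60].mean(axis=1) + 0.05 * rng.standard_normal(n_samples)   # proxy attribute

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=10)                  # number of latent variables is a tuning choice
pls.fit(X_tr, y_tr)
print("held-out R2:", round(r2_score(y_te, pls.predict(X_te).ravel()), 3))
</code></pre>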
	]]></content:encoded>

	<dc:title>Predicting Nutritional and Morphological Attributes of Fresh Commercial Opuntia Cladodes Using Machine Learning and Imaging</dc:title>
			<dc:creator>Juan Arredondo Valdez</dc:creator>
			<dc:creator>Josué Israel García López</dc:creator>
			<dc:creator>Héctor Flores Breceda</dc:creator>
			<dc:creator>Ajay Kumar</dc:creator>
			<dc:creator>Ricardo David Valdez Cepeda</dc:creator>
			<dc:creator>Alejandro Isabel Luna Maldonado</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020067</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-05</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-05</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>67</prism:startingPage>
		<prism:doi>10.3390/jimaging12020067</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/67</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/66">

	<title>J. Imaging, Vol. 12, Pages 66: A Survey of Crop Disease Recognition Methods Based on Spectral and RGB Images</title>
	<link>https://www.mdpi.com/2313-433X/12/2/66</link>
	<description>Major crops worldwide are affected by various diseases yearly, leading to crop losses in different regions. The primary methods for addressing crop disease losses include manual inspection and chemical control. However, traditional manual inspection methods are time-consuming, labor-intensive, and require specialized knowledge. The preemptive use of chemicals also poses a risk of soil pollution, which may cause irreversible damage. With the advancement of computer hardware, photographic technology, and artificial intelligence, crop disease recognition methods based on spectral and red&amp;amp;ndash;green&amp;amp;ndash;blue (RGB) images not only recognize diseases without damaging the crops but also offer high accuracy and speed of recognition, essentially solving the problems associated with manual inspection and chemical control. This paper summarizes the research on disease recognition methods based on spectral and RGB images, with the literature spanning from 2020 through early 2025. Unlike previous surveys, this paper reviews recent advances involving emerging paradigms such as State Space Models (e.g., Mamba) and Generative AI in the context of crop disease recognition. In addition, it introduces public datasets and commonly used evaluation metrics for crop disease identification. Finally, the paper discusses potential issues and solutions encountered during research, including the use of diffusion models for data augmentation. Hopefully, this survey will help readers understand the current methods and effectiveness of crop disease detection, inspiring the development of more effective methods to assist farmers in identifying crop diseases.</description>
	<pubDate>2026-02-05</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 66: A Survey of Crop Disease Recognition Methods Based on Spectral and RGB Images</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/66">doi: 10.3390/jimaging12020066</a></p>
	<p>Authors:
		Haoze Zheng
		Heran Wang
		Hualong Dong
		Yurong Qian
		</p>
	<p>Major crops worldwide are affected by various diseases yearly, leading to crop losses in different regions. The primary methods for addressing crop disease losses include manual inspection and chemical control. However, traditional manual inspection methods are time-consuming, labor-intensive, and require specialized knowledge. The preemptive use of chemicals also poses a risk of soil pollution, which may cause irreversible damage. With the advancement of computer hardware, photographic technology, and artificial intelligence, crop disease recognition methods based on spectral and red&amp;amp;ndash;green&amp;amp;ndash;blue (RGB) images not only recognize diseases without damaging the crops but also offer high accuracy and speed of recognition, essentially solving the problems associated with manual inspection and chemical control. This paper summarizes the research on disease recognition methods based on spectral and RGB images, with the literature spanning from 2020 through early 2025. Unlike previous surveys, this paper reviews recent advances involving emerging paradigms such as State Space Models (e.g., Mamba) and Generative AI in the context of crop disease recognition. In addition, it introduces public datasets and commonly used evaluation metrics for crop disease identification. Finally, the paper discusses potential issues and solutions encountered during research, including the use of diffusion models for data augmentation. Hopefully, this survey will help readers understand the current methods and effectiveness of crop disease detection, inspiring the development of more effective methods to assist farmers in identifying crop diseases.</p>
	]]></content:encoded>

	<dc:title>A Survey of Crop Disease Recognition Methods Based on Spectral and RGB Images</dc:title>
			<dc:creator>Haoze Zheng</dc:creator>
			<dc:creator>Heran Wang</dc:creator>
			<dc:creator>Hualong Dong</dc:creator>
			<dc:creator>Yurong Qian</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020066</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-05</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-05</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Review</prism:section>
	<prism:startingPage>66</prism:startingPage>
		<prism:doi>10.3390/jimaging12020066</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/66</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/65">

	<title>J. Imaging, Vol. 12, Pages 65: Ciphertext-Only Attack on Grayscale-Based EtC Image Encryption via Component Separation and Regularized Single-Channel Compatibility</title>
	<link>https://www.mdpi.com/2313-433X/12/2/65</link>
	<description>Grayscale-based Encryption-then-Compression (EtC) systems transform RGB images into the YCbCr color space, concatenate the components into a single grayscale image, and apply block permutation, block rotation/flipping, and block-wise negative&amp;amp;ndash;positive inversion. Because this pipeline separates color components and disrupts inter-channel statistics, existing extended jigsaw puzzle solvers (JPSs) have been regarded as ineffective, and grayscale-based EtC systems have been considered resistant to ciphertext-only visual reconstruction. In this paper, we present a practical ciphertext-only attack against grayscale-based EtC. The proposed attack introduces three key components: (i) Texture-Based Component Classification (TBCC) to distinguish luminance (Y) and chrominance (Cb/Cr) blocks and focus reconstruction on structure-rich regions; (ii) Regularized Single-Channel Edge Compatibility (R-SCEC), which applies Tikhonov regularization to a single-channel variant of the Mahalanobis Gradient Compatibility (MGC) measure to alleviate covariance rank-deficiency while maintaining robustness under inversion and geometric transforms; and (iii) Adaptive Pruning based on the TBCC-reduced search space that skips redundant boundary matching computations to further improve reconstruction efficiency. Experiments show that, in settings where existing extended JPS solvers fail, our method can still recover visually recognizable semantic content, revealing a potential vulnerability in grayscale-based EtC and calling for a re-evaluation of its security.</description>
	<pubDate>2026-02-05</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 65: Ciphertext-Only Attack on Grayscale-Based EtC Image Encryption via Component Separation and Regularized Single-Channel Compatibility</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/65">doi: 10.3390/jimaging12020065</a></p>
	<p>Authors:
		Ruifeng Li
		Masaaki Fujiyoshi
		</p>
	<p>Grayscale-based Encryption-then-Compression (EtC) systems transform RGB images into the YCbCr color space, concatenate the components into a single grayscale image, and apply block permutation, block rotation/flipping, and block-wise negative&amp;amp;ndash;positive inversion. Because this pipeline separates color components and disrupts inter-channel statistics, existing extended jigsaw puzzle solvers (JPSs) have been regarded as ineffective, and grayscale-based EtC systems have been considered resistant to ciphertext-only visual reconstruction. In this paper, we present a practical ciphertext-only attack against grayscale-based EtC. The proposed attack introduces three key components: (i) Texture-Based Component Classification (TBCC) to distinguish luminance (Y) and chrominance (Cb/Cr) blocks and focus reconstruction on structure-rich regions; (ii) Regularized Single-Channel Edge Compatibility (R-SCEC), which applies Tikhonov regularization to a single-channel variant of the Mahalanobis Gradient Compatibility (MGC) measure to alleviate covariance rank-deficiency while maintaining robustness under inversion and geometric transforms; and (iii) Adaptive Pruning based on the TBCC-reduced search space that skips redundant boundary matching computations to further improve reconstruction efficiency. Experiments show that, in settings where existing extended JPS solvers fail, our method can still recover visually recognizable semantic content, revealing a potential vulnerability in grayscale-based EtC and calling for a re-evaluation of its security.</p>
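	<p>To make the compatibility idea concrete, the sketch below computes a Tikhonov-regularized, single-channel Mahalanobis-style cost for joining two grayscale blocks along a seam. The block size, the regularization constant, and the exact boundary statistics are assumptions chosen for illustration, not the paper's R-SCEC formulation.</p>
	<pre><code>
# Minimal sketch of a regularized single-channel edge-compatibility cost between two
# grayscale blocks (illustrative assumptions throughout; lower cost = better match).
import numpy as np

def edge_cost(block_a, block_b, eps=1e-2):
    """Cost of placing block_b immediately to the right of block_a."""
    grads_a = block_a[:, -1].astype(float) - block_a[:, -2].astype(float)  # boundary gradients
    mu = grads_a.mean()
    var = grads_a.var() + eps          # Tikhonov-style term avoids a degenerate variance
    seam = block_b[:, 0].astype(float) - block_a[:, -1].astype(float)      # gradient across the seam
    # One-dimensional Mahalanobis distance of the seam gradients under the boundary model.
    return float(np.sum((seam - mu) ** 2 / var))

rng = np.random.default_rng(1)
a = rng.integers(0, 256, (32, 32))
smooth = np.clip(a[:, -1:] + rng.integers(-3, 4, (32, 32)), 0, 255)   # plausible continuation
noisy = rng.integers(0, 256, (32, 32))                                # unrelated block
print("cost (plausible neighbour):", round(edge_cost(a, smooth), 1))
print("cost (random block):      ", round(edge_cost(a, noisy), 1))
</code></pre>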
	]]></content:encoded>

	<dc:title>Ciphertext-Only Attack on Grayscale-Based EtC Image Encryption via Component Separation and Regularized Single-Channel Compatibility</dc:title>
			<dc:creator>Ruifeng Li</dc:creator>
			<dc:creator>Masaaki Fujiyoshi</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020065</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-02-05</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-02-05</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>65</prism:startingPage>
		<prism:doi>10.3390/jimaging12020065</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/65</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/64">

	<title>J. Imaging, Vol. 12, Pages 64: SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework</title>
	<link>https://www.mdpi.com/2313-433X/12/2/64</link>
	<description>Automated anomaly detection in transportation infrastructure is essential for enhancing safety and reducing the operational costs associated with manual inspection protocols. This study presents an improved neuromorphic vision system, which extends the prior SIFT-SNN (scale-invariant feature transform&amp;amp;ndash;spiking neural network) proof-of-concept by incorporating temporal feature aggregation for context-aware and sequence-stable detection. Analysis of classical stitching-based pipelines exposed sensitivity to motion and lighting variations, motivating the proposed temporally smoothed neuromorphic design. SIFT keypoints are encoded into latency-based spike trains and classified using a leaky integrate-and-fire (LIF) spiking neural network implemented in PyTorch. Evaluated across three hardware configurations&amp;amp;mdash;an NVIDIA RTX 4060 GPU, an Intel i7 CPU, and a simulated Jetson Nano&amp;amp;mdash;the system achieved 92.3% accuracy and a macro F1 score of 91.0% under five-fold cross-validation. Inference latencies were measured at 9.5 ms, 26.1 ms, and ~48.3 ms per frame, respectively. Memory footprints were under 290 MB, and power consumption was estimated to be between 5 and 65 W. The classifier distinguishes between safe, partially dislodged, and fully dislodged barrier pins, which are critical failure modes for the Auckland Harbour Bridge&amp;amp;rsquo;s Movable Concrete Barrier (MCB) system. Temporal smoothing further improves recall for ambiguous cases. By achieving a compact model size (2.9 MB), low-latency inference, and minimal power demands, the proposed framework offers a deployable, interpretable, and energy-efficient alternative to conventional CNN-based inspection tools. Future work will focus on exploring the generalisability and transferability of the work presented, additional input sources, and human&amp;amp;ndash;computer interaction paradigms for various deployment infrastructures and advancements.</description>
	<pubDate>2026-01-31</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 64: SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/64">doi: 10.3390/jimaging12020064</a></p>
	<p>Authors:
		Munish Rathee
		Boris Bačić
		Maryam Doborjeh
		</p>
	<p>Automated anomaly detection in transportation infrastructure is essential for enhancing safety and reducing the operational costs associated with manual inspection protocols. This study presents an improved neuromorphic vision system, which extends the prior SIFT-SNN (scale-invariant feature transform&amp;amp;ndash;spiking neural network) proof-of-concept by incorporating temporal feature aggregation for context-aware and sequence-stable detection. Analysis of classical stitching-based pipelines exposed sensitivity to motion and lighting variations, motivating the proposed temporally smoothed neuromorphic design. SIFT keypoints are encoded into latency-based spike trains and classified using a leaky integrate-and-fire (LIF) spiking neural network implemented in PyTorch. Evaluated across three hardware configurations&amp;amp;mdash;an NVIDIA RTX 4060 GPU, an Intel i7 CPU, and a simulated Jetson Nano&amp;amp;mdash;the system achieved 92.3% accuracy and a macro F1 score of 91.0% under five-fold cross-validation. Inference latencies were measured at 9.5 ms, 26.1 ms, and ~48.3 ms per frame, respectively. Memory footprints were under 290 MB, and power consumption was estimated to be between 5 and 65 W. The classifier distinguishes between safe, partially dislodged, and fully dislodged barrier pins, which are critical failure modes for the Auckland Harbour Bridge&amp;amp;rsquo;s Movable Concrete Barrier (MCB) system. Temporal smoothing further improves recall for ambiguous cases. By achieving a compact model size (2.9 MB), low-latency inference, and minimal power demands, the proposed framework offers a deployable, interpretable, and energy-efficient alternative to conventional CNN-based inspection tools. Future work will focus on exploring the generalisability and transferability of the work presented, additional input sources, and human&amp;amp;ndash;computer interaction paradigms for various deployment infrastructures and advancements.</p>
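	<p>The encoding and neuron model described above can be illustrated in a few lines of NumPy: keypoint response strengths are mapped to spike latencies (stronger responses fire earlier) and fed to a single leaky integrate-and-fire unit. The time constants, weights, and toy response values below are assumptions, not the trained system's parameters.</p>
	<pre><code>
# Minimal sketch: latency coding of keypoint strengths plus one LIF neuron
# (all constants are illustrative assumptions).
import numpy as np

def to_latency(responses, t_max=20.0):
    """Stronger keypoint responses fire earlier (latency coding)."""
    r = np.asarray(responses, dtype=float)
    r = (r - r.min()) / (r.max() - r.min() + 1e-9)
    return t_max * (1.0 - r)                       # strongest response -> latency near 0 ms

def lif_first_spike(spike_times, weight=0.5, tau=10.0, v_th=1.0, dt=0.5, t_end=25.0):
    """First firing time of a single LIF neuron driven by the input spike train."""
    v, t = 0.0, 0.0
    while t < t_end:
        v += (-v / tau) * dt                                        # membrane leak
        v += weight * np.sum((spike_times >= t) & (spike_times < t + dt))  # arriving spikes
        if v >= v_th:
            return t
        t += dt
    return None                                                     # never reached threshold

keypoint_strengths = [0.9, 0.7, 0.65, 0.2, 0.1]     # stand-in SIFT response magnitudes
spikes = to_latency(keypoint_strengths)
print("spike times (ms):", np.round(spikes, 1))
print("LIF first fires at (ms):", lif_first_spike(spikes))
</code></pre>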
	]]></content:encoded>

	<dc:title>SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework</dc:title>
			<dc:creator>Munish Rathee</dc:creator>
			<dc:creator>Boris Bačić</dc:creator>
			<dc:creator>Maryam Doborjeh</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020064</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-31</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-31</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>64</prism:startingPage>
		<prism:doi>10.3390/jimaging12020064</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/64</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/63">

	<title>J. Imaging, Vol. 12, Pages 63: A Cross-Domain Benchmark of Intrinsic and Post Hoc Explainability for 3D Deep Learning Models</title>
	<link>https://www.mdpi.com/2313-433X/12/2/63</link>
	<description>Deep learning models for three-dimensional (3D) data are increasingly used in domains such as medical imaging, object recognition, and robotics. Because these models are largely black boxes, the need for explainability has grown significantly. However, the lack of standardized and quantitative benchmarks for explainable artificial intelligence (XAI) in 3D data limits the reliable comparison of explanation quality. In this paper, we present a unified benchmarking framework to evaluate both intrinsic and post hoc XAI methods across three representative 3D datasets: volumetric CT scans (MosMed), voxelized CAD models (ModelNet40), and real-world point clouds (ScanObjectNN). The evaluated methods include Grad-CAM, Integrated Gradients, Saliency, Occlusion, and the intrinsic ResAttNet-3D model. We quantitatively assess explanations using the Correctness (AOPC), Completeness (AUPC), and Compactness metrics, consistently applied across all datasets. Our results show that explanation quality varies significantly across methods and domains, with Grad-CAM and intrinsic attention performing best on medical CT scans, while gradient-based methods excelled on voxelized and point-based data. Statistical tests (Kruskal&amp;amp;ndash;Wallis and Mann&amp;amp;ndash;Whitney U) confirmed significant performance differences between methods. No single approach achieved superior results across all domains, highlighting the importance of multi-metric evaluation. This work provides a reproducible framework for standardized assessment of 3D explainability and comparative insights to guide future XAI method selection.</description>
	<pubDate>2026-01-30</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 63: A Cross-Domain Benchmark of Intrinsic and Post Hoc Explainability for 3D Deep Learning Models</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/63">doi: 10.3390/jimaging12020063</a></p>
	<p>Authors:
		Asmita Chakraborty
		Gizem Karagoz
		Nirvana Meratnia
		</p>
	<p>Deep learning models for three-dimensional (3D) data are increasingly used in domains such as medical imaging, object recognition, and robotics. Because these models are largely black boxes, the need for explainability has grown significantly. However, the lack of standardized and quantitative benchmarks for explainable artificial intelligence (XAI) in 3D data limits the reliable comparison of explanation quality. In this paper, we present a unified benchmarking framework to evaluate both intrinsic and post hoc XAI methods across three representative 3D datasets: volumetric CT scans (MosMed), voxelized CAD models (ModelNet40), and real-world point clouds (ScanObjectNN). The evaluated methods include Grad-CAM, Integrated Gradients, Saliency, Occlusion, and the intrinsic ResAttNet-3D model. We quantitatively assess explanations using the Correctness (AOPC), Completeness (AUPC), and Compactness metrics, consistently applied across all datasets. Our results show that explanation quality varies significantly across methods and domains, with Grad-CAM and intrinsic attention performing best on medical CT scans, while gradient-based methods excelled on voxelized and point-based data. Statistical tests (Kruskal&amp;amp;ndash;Wallis and Mann&amp;amp;ndash;Whitney U) confirmed significant performance differences between methods. No single approach achieved superior results across all domains, highlighting the importance of multi-metric evaluation. This work provides a reproducible framework for standardized assessment of 3D explainability and comparative insights to guide future XAI method selection.</p>
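	<p>Of the three metrics, Correctness (AOPC) is the most mechanical to describe: occlude the highest-attributed inputs first and average the resulting drop in the model's score. The sketch below implements that loop on a toy volume and a toy scoring function, both of which are stand-ins rather than the benchmark's models or data.</p>
	<pre><code>
# Minimal AOPC sketch: average score drop when the most-attributed voxels are occluded first
# (toy model and attribution maps; illustrative only).
import numpy as np

def aopc(model, volume, attribution, steps=10, frac=0.02):
    base = model(volume)
    x = volume.copy()
    order = np.argsort(attribution.ravel())[::-1]          # most relevant voxels first
    per_step = max(1, int(frac * x.size))
    drops = []
    for k in range(steps):
        idx = order[k * per_step:(k + 1) * per_step]
        x.ravel()[idx] = 0.0                               # occlude this batch
        drops.append(base - model(x))
    return float(np.mean(drops))

rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16))

def model(v):
    return float(v[4:8, 4:8, 4:8].mean())                  # toy score tied to one region

faithful = np.zeros_like(vol)
faithful[4:8, 4:8, 4:8] = 1.0
random_map = rng.random(vol.shape)
print("AOPC, faithful attribution:", round(aopc(model, vol, faithful), 3))
print("AOPC, random attribution:  ", round(aopc(model, vol, random_map), 3))
</code></pre>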
	]]></content:encoded>

	<dc:title>A Cross-Domain Benchmark of Intrinsic and Post Hoc Explainability for 3D Deep Learning Models</dc:title>
			<dc:creator>Asmita Chakraborty</dc:creator>
			<dc:creator>Gizem Karagoz</dc:creator>
			<dc:creator>Nirvana Meratnia</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020063</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-30</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-30</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>63</prism:startingPage>
		<prism:doi>10.3390/jimaging12020063</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/63</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/62">

	<title>J. Imaging, Vol. 12, Pages 62: AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection</title>
	<link>https://www.mdpi.com/2313-433X/12/2/62</link>
	<description>Lung cancer remains a leading cause of cancer-related mortality. Although reliable multiclass classification of lung lesions from CT imaging is essential for early diagnosis, it remains challenging due to subtle inter-class differences, limited sample sizes, and class imbalance. We propose an Adaptive Attention-Augmented Convolutional Neural Network with Vision Transformer (AACNN-ViT), a hybrid framework that integrates local convolutional representations with global transformer embeddings through an adaptive attention-based fusion module. The CNN branch captures fine-grained spatial patterns, the ViT branch encodes long-range contextual dependencies, and the adaptive fusion mechanism learns to weight cross-representation interactions to improve discriminability. To reduce the impact of imbalance, a hybrid objective that combines focal loss with categorical cross-entropy is incorporated during training. Experiments on the IQ-OTH/NCCD dataset (benign, malignant, and normal) show consistent performance progression in an ablation-style evaluation: CNN-only, ViT-only, CNN-ViT concatenation, and AACNN-ViT. The proposed AACNN-ViT achieved 96.97% accuracy on the validation set with macro-averaged precision/recall/F1 of 0.9588/0.9352/0.9458 and weighted F1 of 0.9693, substantially improving minority-class recognition (Benign recall 0.8333) compared with CNN-ViT (accuracy 89.09%, macro-F1 0.7680). One-vs.-rest ROC analysis further indicates strong separability across all classes (micro-average AUC 0.992). These results suggest that adaptive attention-based fusion offers a robust and clinically relevant approach for computer-aided lung cancer screening and decision support.</description>
	<pubDate>2026-01-30</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 62: AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/62">doi: 10.3390/jimaging12020062</a></p>
	<p>Authors:
		Mohammad Ishtiaque Rahman
		Amrina Rahman
		</p>
	<p>Lung cancer remains a leading cause of cancer-related mortality. Although reliable multiclass classification of lung lesions from CT imaging is essential for early diagnosis, it remains challenging due to subtle inter-class differences, limited sample sizes, and class imbalance. We propose an Adaptive Attention-Augmented Convolutional Neural Network with Vision Transformer (AACNN-ViT), a hybrid framework that integrates local convolutional representations with global transformer embeddings through an adaptive attention-based fusion module. The CNN branch captures fine-grained spatial patterns, the ViT branch encodes long-range contextual dependencies, and the adaptive fusion mechanism learns to weight cross-representation interactions to improve discriminability. To reduce the impact of imbalance, a hybrid objective that combines focal loss with categorical cross-entropy is incorporated during training. Experiments on the IQ-OTH/NCCD dataset (benign, malignant, and normal) show consistent performance progression in an ablation-style evaluation: CNN-only, ViT-only, CNN-ViT concatenation, and AACNN-ViT. The proposed AACNN-ViT achieved 96.97% accuracy on the validation set with macro-averaged precision/recall/F1 of 0.9588/0.9352/0.9458 and weighted F1 of 0.9693, substantially improving minority-class recognition (Benign recall 0.8333) compared with CNN-ViT (accuracy 89.09%, macro-F1 0.7680). One-vs.-rest ROC analysis further indicates strong separability across all classes (micro-average AUC 0.992). These results suggest that adaptive attention-based fusion offers a robust and clinically relevant approach for computer-aided lung cancer screening and decision support.</p>
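	<p>The hybrid objective mentioned above is straightforward to write down: a focal term that down-weights easy examples plus a standard categorical cross-entropy term. The sketch below shows one common formulation; the gamma value, mixing weight, and batch are assumptions, not the paper's exact settings.</p>
	<pre><code>
# Minimal sketch of a focal + cross-entropy hybrid loss (illustrative constants).
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, gamma=2.0, alpha=0.5):
    ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample cross-entropy
    pt = torch.exp(-ce)                                        # probability of the true class
    focal = (1.0 - pt) ** gamma * ce                           # down-weights easy examples
    return alpha * focal.mean() + (1.0 - alpha) * ce.mean()

logits = torch.randn(8, 3, requires_grad=True)   # 3 classes: benign / malignant / normal
targets = torch.tensor([0, 1, 2, 1, 1, 2, 0, 1])
loss = hybrid_loss(logits, targets)
loss.backward()
print(float(loss))
</code></pre>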
	]]></content:encoded>

	<dc:title>AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection</dc:title>
			<dc:creator>Mohammad Ishtiaque Rahman</dc:creator>
			<dc:creator>Amrina Rahman</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020062</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-30</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-30</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>62</prism:startingPage>
		<prism:doi>10.3390/jimaging12020062</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/62</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/61">

	<title>J. Imaging, Vol. 12, Pages 61: Multiscale RGB-Guided Fusion for Hyperspectral Image Super-Resolution</title>
	<link>https://www.mdpi.com/2313-433X/12/2/61</link>
	<description>Hyperspectral imaging (HSI) enables fine spectral analysis but is often limited by low spatial resolution due to sensor constraints. To address this, we propose CGNet, a color-guided hyperspectral super-resolution network that leverages complementary information from low-resolution hyperspectral inputs and high-resolution RGB images. CGNet adopts a dual-encoder design: the RGB encoder extracts hierarchical spatial features, while the HSI encoder progressively upsamples spectral features. A multi-scale fusion decoder then combines both modalities in a coarse-to-fine manner to reconstruct the high-resolution HSI. Training is driven by a hybrid loss that balances L1 and Spectral Angle Mapper (SAM), which ablation studies confirm as the most effective formulation. Experiments on two benchmarks, ARAD1K and StereoMSI, at &amp;amp;times;4 and &amp;amp;times;6 upscaling factors demonstrate that CGNet consistently outperforms state-of-the-art baselines. CGNet achieves higher PSNR and SSIM, lower SAM, and reduced &amp;amp;Delta;E00, confirming its ability to recover sharp spatial structures while preserving spectral fidelity.</description>
	<pubDate>2026-01-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 61: Multiscale RGB-Guided Fusion for Hyperspectral Image Super-Resolution</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/61">doi: 10.3390/jimaging12020061</a></p>
	<p>Authors:
		Matteo Kolyszko
		Marco Buzzelli
		Simone Bianco
		Raimondo Schettini
		</p>
	<p>Hyperspectral imaging (HSI) enables fine spectral analysis but is often limited by low spatial resolution due to sensor constraints. To address this, we propose CGNet, a color-guided hyperspectral super-resolution network that leverages complementary information from low-resolution hyperspectral inputs and high-resolution RGB images. CGNet adopts a dual-encoder design: the RGB encoder extracts hierarchical spatial features, while the HSI encoder progressively upsamples spectral features. A multi-scale fusion decoder then combines both modalities in a coarse-to-fine manner to reconstruct the high-resolution HSI. Training is driven by a hybrid loss that balances L1 and Spectral Angle Mapper (SAM), which ablation studies confirm as the most effective formulation. Experiments on two benchmarks, ARAD1K and StereoMSI, at &amp;amp;times;4 and &amp;amp;times;6 upscaling factors demonstrate that CGNet consistently outperforms state-of-the-art baselines. CGNet achieves higher PSNR and SSIM, lower SAM, and reduced &amp;amp;Delta;E00, confirming its ability to recover sharp spatial structures while preserving spectral fidelity.</p>
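	<p>The hybrid training objective is simple to sketch: an L1 reconstruction term plus the mean spectral angle between predicted and reference spectra. The weighting factor and tensor shapes below are assumptions for illustration, not CGNet's configuration.</p>
	<pre><code>
# Minimal sketch of an L1 + Spectral Angle Mapper (SAM) loss for HSI super-resolution
# (shapes and weighting are illustrative assumptions).
import torch

def sam_loss(pred, target, eps=1e-8):
    """Mean spectral angle (radians) between predicted and reference spectra per pixel."""
    dot = (pred * target).sum(dim=1)                   # pred/target: (batch, bands, H, W)
    denom = pred.norm(dim=1) * target.norm(dim=1) + eps
    cos = torch.clamp(dot / denom, -1.0 + 1e-7, 1.0 - 1e-7)
    return torch.acos(cos).mean()

def hybrid_loss(pred, target, lam=0.1):
    return torch.nn.functional.l1_loss(pred, target) + lam * sam_loss(pred, target)

pred = torch.rand(2, 31, 64, 64, requires_grad=True)   # 31-band prediction
target = torch.rand(2, 31, 64, 64)
loss = hybrid_loss(pred, target)
loss.backward()
print(float(loss))
</code></pre>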
	]]></content:encoded>

	<dc:title>Multiscale RGB-Guided Fusion for Hyperspectral Image Super-Resolution</dc:title>
			<dc:creator>Matteo Kolyszko</dc:creator>
			<dc:creator>Marco Buzzelli</dc:creator>
			<dc:creator>Simone Bianco</dc:creator>
			<dc:creator>Raimondo Schettini</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020061</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>61</prism:startingPage>
		<prism:doi>10.3390/jimaging12020061</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/61</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/60">

	<title>J. Imaging, Vol. 12, Pages 60: Real-Time Visual Anomaly Detection in High-Speed Motorsport: An Entropy-Driven Hybrid Retrieval- and Cache-Augmented Architecture</title>
	<link>https://www.mdpi.com/2313-433X/12/2/60</link>
	<description>At 300 km/h, an end-to-end vision delay of 100 ms corresponds to 8.3 m of unobserved travel; therefore, real-time anomaly monitoring must balance sensitivity with strict tail-latency constraints at the edge. We propose a hybrid cache&amp;amp;ndash;retrieval inference architecture for visual anomaly detection in high-speed motorsport that exploits lap-to-lap spatiotemporal redundancy while reserving local similarity retrieval for genuinely uncertain events. The system combines a hierarchical visual encoder (a lightweight backbone with selective refinement via a Nested U-Net for texture-level cues) and an uncertainty-driven router that selects between two memory pathways: (i) a static cache of precomputed scene embeddings for track/background context and (ii) local similarity retrieval over historical telemetry&amp;amp;ndash;vision patterns to ground ambiguous frames, improve interpretability, and stabilize decisions under high uncertainty. Routing is governed by an entropy signal computed from prediction and embedding uncertainty: low-entropy frames follow a cache-first path, whereas high-entropy frames trigger retrieval and refinement to preserve decision stability without sacrificing latency. On a high-fidelity closed-circuit benchmark with synchronized onboard video and telemetry and controlled anomaly injections (tire degradation, suspension chatter, and illumination shifts), the proposed approach reduces mean end-to-end latency to 21.7 ms versus 48.6 ms for a retrieval-only baseline (55.3% reduction) while achieving Macro-F1 = 0.89 at safety-oriented operating points. The framework is designed for passive monitoring and decision support, producing advisory outputs without actuating ECU control strategies.</description>
	<pubDate>2026-01-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 60: Real-Time Visual Anomaly Detection in High-Speed Motorsport: An Entropy-Driven Hybrid Retrieval- and Cache-Augmented Architecture</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/60">doi: 10.3390/jimaging12020060</a></p>
	<p>Authors:
		Rubén Juárez Cádiz
		Fernando Rodríguez-Sela
		</p>
	<p>At 300 km/h, an end-to-end vision delay of 100 ms corresponds to 8.3 m of unobserved travel; therefore, real-time anomaly monitoring must balance sensitivity with strict tail-latency constraints at the edge. We propose a hybrid cache&amp;amp;ndash;retrieval inference architecture for visual anomaly detection in high-speed motorsport that exploits lap-to-lap spatiotemporal redundancy while reserving local similarity retrieval for genuinely uncertain events. The system combines a hierarchical visual encoder (a lightweight backbone with selective refinement via a Nested U-Net for texture-level cues) and an uncertainty-driven router that selects between two memory pathways: (i) a static cache of precomputed scene embeddings for track/background context and (ii) local similarity retrieval over historical telemetry&amp;amp;ndash;vision patterns to ground ambiguous frames, improve interpretability, and stabilize decisions under high uncertainty. Routing is governed by an entropy signal computed from prediction and embedding uncertainty: low-entropy frames follow a cache-first path, whereas high-entropy frames trigger retrieval and refinement to preserve decision stability without sacrificing latency. On a high-fidelity closed-circuit benchmark with synchronized onboard video and telemetry and controlled anomaly injections (tire degradation, suspension chatter, and illumination shifts), the proposed approach reduces mean end-to-end latency to 21.7 ms versus 48.6 ms for a retrieval-only baseline (55.3% reduction) while achieving Macro-F1 = 0.89 at safety-oriented operating points. The framework is designed for passive monitoring and decision support, producing advisory outputs without actuating ECU control strategies.</p>
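	<p>The routing rule itself reduces to a threshold on prediction entropy. The sketch below shows that decision in isolation; the threshold value and the two path labels are placeholders, not the deployed system's components.</p>
	<pre><code>
# Minimal sketch of entropy-driven routing between a cache path and a retrieval path
# (threshold and probabilities are illustrative assumptions).
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return float(-(p * np.log(p)).sum())

def route(frame_probs, threshold=0.5):
    h = prediction_entropy(frame_probs)
    if h <= threshold:
        return "cache", h        # confident frame: reuse precomputed scene embeddings
    return "retrieval", h        # ambiguous frame: ground it against historical patterns

for probs in ([0.97, 0.02, 0.01],    # clearly nominal frame
              [0.40, 0.35, 0.25]):   # ambiguous frame
    path, h = route(probs)
    print(f"entropy={h:.2f} -> {path} path")
</code></pre>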
	]]></content:encoded>

	<dc:title>Real-Time Visual Anomaly Detection in High-Speed Motorsport: An Entropy-Driven Hybrid Retrieval- and Cache-Augmented Architecture</dc:title>
			<dc:creator>Rubén Juárez Cádiz</dc:creator>
			<dc:creator>Fernando Rodríguez-Sela</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020060</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>60</prism:startingPage>
		<prism:doi>10.3390/jimaging12020060</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/60</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/59">

	<title>J. Imaging, Vol. 12, Pages 59: Neuro-Geometric Graph Transformers with Differentiable Radiographic Geometry for Spinal X-Ray Image Analysis</title>
	<link>https://www.mdpi.com/2313-433X/12/2/59</link>
	<description>Radiographic imaging remains a cornerstone of diagnostic practice. However, accurate interpretation faces challenges from subtle visual signatures, anatomical variability, and inter-observer inconsistency. Conventional deep learning approaches, such as convolutional neural networks and vision transformers, deliver strong predictive performance but often lack anatomical grounding and interpretability, limiting their trustworthiness in imaging applications. To address these challenges, we present SpineNeuroSym, a neuro-geometric imaging framework that unifies geometry-aware learning and symbolic reasoning for explainable medical image analysis. The framework integrates weakly supervised keypoint and region-of-interest discovery, a dual-stream graph&amp;amp;ndash;transformer backbone, and a Differentiable Radiographic Geometry Module (dRGM) that computes clinically relevant indices (e.g., slip ratio, disc asymmetry, sacroiliac spacing, and curvature measures). A Neuro-Symbolic Constraint Layer (NSCL) enforces monotonic logic in image-derived predictions, while a Counterfactual Geometry Diffusion (CGD) module generates rare imaging phenotypes and provides diagnostic auditing through counterfactual validation. Evaluated on a comprehensive dataset of 1613 spinal radiographs from Sunpasitthiprasong Hospital encompassing six diagnostic categories&amp;amp;mdash;spondylolisthesis (n = 496), infection (n = 322), spondyloarthropathy (n = 275), normal cervical (n = 192), normal thoracic (n = 70), and normal lumbar spine (n = 258)&amp;amp;mdash;SpineNeuroSym achieved 89.4% classification accuracy, a macro-F1 of 0.872, and an AUROC of 0.941, outperforming eight state-of-the-art imaging baselines. These results highlight how integrating neuro-geometric modeling, symbolic constraints, and counterfactual validation advances explainable, trustworthy, and reproducible medical imaging AI, establishing a pathway toward transparent image analysis systems.</description>
	<pubDate>2026-01-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 59: Neuro-Geometric Graph Transformers with Differentiable Radiographic Geometry for Spinal X-Ray Image Analysis</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/59">doi: 10.3390/jimaging12020059</a></p>
	<p>Authors:
		Vuth Kaveevorayan
		Rapeepan Pitakaso
		Thanatkij Srichok
		Natthapong Nanthasamroeng
		Chutchai Kaewta
		Peerawat Luesak
		</p>
	<p>Radiographic imaging remains a cornerstone of diagnostic practice. However, accurate interpretation faces challenges from subtle visual signatures, anatomical variability, and inter-observer inconsistency. Conventional deep learning approaches, such as convolutional neural networks and vision transformers, deliver strong predictive performance but often lack anatomical grounding and interpretability, limiting their trustworthiness in imaging applications. To address these challenges, we present SpineNeuroSym, a neuro-geometric imaging framework that unifies geometry-aware learning and symbolic reasoning for explainable medical image analysis. The framework integrates weakly supervised keypoint and region-of-interest discovery, a dual-stream graph&amp;amp;ndash;transformer backbone, and a Differentiable Radiographic Geometry Module (dRGM) that computes clinically relevant indices (e.g., slip ratio, disc asymmetry, sacroiliac spacing, and curvature measures). A Neuro-Symbolic Constraint Layer (NSCL) enforces monotonic logic in image-derived predictions, while a Counterfactual Geometry Diffusion (CGD) module generates rare imaging phenotypes and provides diagnostic auditing through counterfactual validation. Evaluated on a comprehensive dataset of 1613 spinal radiographs from Sunpasitthiprasong Hospital encompassing six diagnostic categories&amp;amp;mdash;spondylolisthesis (n = 496), infection (n = 322), spondyloarthropathy (n = 275), normal cervical (n = 192), normal thoracic (n = 70), and normal lumbar spine (n = 258)&amp;amp;mdash;SpineNeuroSym achieved 89.4% classification accuracy, a macro-F1 of 0.872, and an AUROC of 0.941, outperforming eight state-of-the-art imaging baselines. These results highlight how integrating neuro-geometric modeling, symbolic constraints, and counterfactual validation advances explainable, trustworthy, and reproducible medical imaging AI, establishing a pathway toward transparent image analysis systems.</p>
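	<p>One way to picture the differentiable geometry module is a clinically motivated index computed directly from predicted keypoints so that gradients reach the keypoint head. The sketch below computes a Taillard-style slip ratio from three landmark coordinates; the landmark layout and the formula are assumptions chosen for illustration, not the paper's dRGM.</p>
	<pre><code>
# Minimal sketch: a differentiable slip-ratio index from keypoint coordinates
# (landmark layout and formula are illustrative assumptions).
import torch

def slip_ratio(upper_post_inf, lower_post_sup, lower_ant_sup):
    """Posterior offset of the upper vertebra, normalized by the lower endplate width."""
    endplate = lower_ant_sup - lower_post_sup           # vector along the lower endplate
    offset = upper_post_inf - lower_post_sup            # displacement of the upper posterior corner
    width = endplate.norm() + 1e-8
    return torch.dot(offset, endplate / width) / width  # projection, normalized by width

# (x, y) landmarks in image coordinates, e.g., from the keypoint head.
upper_post_inf = torch.tensor([112.0, 200.0], requires_grad=True)
lower_post_sup = torch.tensor([100.0, 210.0])
lower_ant_sup = torch.tensor([160.0, 205.0])

ratio = slip_ratio(upper_post_inf, lower_post_sup, lower_ant_sup)
ratio.backward()                                        # gradients reach the predicted keypoints
print(round(float(ratio), 3), upper_post_inf.grad)
</code></pre>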
	]]></content:encoded>

	<dc:title>Neuro-Geometric Graph Transformers with Differentiable Radiographic Geometry for Spinal X-Ray Image Analysis</dc:title>
			<dc:creator>Vuth Kaveevorayan</dc:creator>
			<dc:creator>Rapeepan Pitakaso</dc:creator>
			<dc:creator>Thanatkij Srichok</dc:creator>
			<dc:creator>Natthapong Nanthasamroeng</dc:creator>
			<dc:creator>Chutchai Kaewta</dc:creator>
			<dc:creator>Peerawat Luesak</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020059</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>59</prism:startingPage>
		<prism:doi>10.3390/jimaging12020059</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/59</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/58">

	<title>J. Imaging, Vol. 12, Pages 58: SFD-ADNet: Spatial&amp;ndash;Frequency Dual-Domain Adaptive Deformation for Point Cloud Data Augmentation</title>
	<link>https://www.mdpi.com/2313-433X/12/2/58</link>
	<description>Existing 3D point cloud enhancement methods typically rely on artificially designed geometric transformations or local blending strategies, which are prone to introducing illogical deformations, struggle to preserve global structure, and exhibit insufficient adaptability to diverse degradation patterns. To address these limitations, this paper proposes SFD-ADNet&amp;amp;mdash;an adaptive deformation framework based on a dual spatial&amp;amp;ndash;frequency domain. It achieves 3D point cloud augmentation by explicitly learning deformation parameters rather than applying predefined perturbations. By jointly modeling spatial structural dependencies and spectral features, SFD-ADNet generates augmented samples that are both structurally aware and task-relevant. In the spatial domain, a hierarchical sequence encoder coupled with a bidirectional Mamba-based deformation predictor captures long-range geometric dependencies and local structural variations, enabling adaptive position-aware deformation control. In the frequency domain, a multi-scale dual-channel mechanism based on adaptive Chebyshev polynomials separates low-frequency structural components from high-frequency details, allowing the model to suppress noise-sensitive distortions while preserving the global geometric skeleton. The two deformation predictions dynamically fuse to balance structural fidelity and sample diversity. Extensive experiments conducted on ModelNet40-C and ScanObjectNN-C involved synthetic CAD models and real-world scanned point clouds under diverse perturbation conditions. SFD-ADNet, as a universal augmentation module, reduces the mCE metrics of PointNet++ and different backbone networks by over 20%. Experiments demonstrate that SFD-ADNet achieves state-of-the-art robustness while preserving critical geometric structures. Furthermore, models enhanced by SFD-ADNet demonstrate consistently improved robustness against diverse point cloud attacks, validating the efficacy of adaptive space-frequency deformation in robust point cloud learning.</description>
	<pubDate>2026-01-26</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 58: SFD-ADNet: Spatial&amp;ndash;Frequency Dual-Domain Adaptive Deformation for Point Cloud Data Augmentation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/58">doi: 10.3390/jimaging12020058</a></p>
	<p>Authors:
		Jiacheng Bao
		Lingjun Kong
		Wenju Wang
		</p>
	<p>Existing 3D point cloud enhancement methods typically rely on artificially designed geometric transformations or local blending strategies, which are prone to introducing illogical deformations, struggle to preserve global structure, and exhibit insufficient adaptability to diverse degradation patterns. To address these limitations, this paper proposes SFD-ADNet&amp;amp;mdash;an adaptive deformation framework based on a dual spatial&amp;amp;ndash;frequency domain. It achieves 3D point cloud augmentation by explicitly learning deformation parameters rather than applying predefined perturbations. By jointly modeling spatial structural dependencies and spectral features, SFD-ADNet generates augmented samples that are both structurally aware and task-relevant. In the spatial domain, a hierarchical sequence encoder coupled with a bidirectional Mamba-based deformation predictor captures long-range geometric dependencies and local structural variations, enabling adaptive position-aware deformation control. In the frequency domain, a multi-scale dual-channel mechanism based on adaptive Chebyshev polynomials separates low-frequency structural components from high-frequency details, allowing the model to suppress noise-sensitive distortions while preserving the global geometric skeleton. The two deformation predictions dynamically fuse to balance structural fidelity and sample diversity. Extensive experiments conducted on ModelNet40-C and ScanObjectNN-C involved synthetic CAD models and real-world scanned point clouds under diverse perturbation conditions. SFD-ADNet, as a universal augmentation module, reduces the mCE metrics of PointNet++ and different backbone networks by over 20%. Experiments demonstrate that SFD-ADNet achieves state-of-the-art robustness while preserving critical geometric structures. Furthermore, models enhanced by SFD-ADNet demonstrate consistently improved robustness against diverse point cloud attacks, validating the efficacy of adaptive space-frequency deformation in robust point cloud learning.</p>
	]]></content:encoded>

	<dc:title>SFD-ADNet: Spatial&amp;ndash;Frequency Dual-Domain Adaptive Deformation for Point Cloud Data Augmentation</dc:title>
			<dc:creator>Jiacheng Bao</dc:creator>
			<dc:creator>Lingjun Kong</dc:creator>
			<dc:creator>Wenju Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020058</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-26</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-26</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>58</prism:startingPage>
		<prism:doi>10.3390/jimaging12020058</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/58</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/57">

	<title>J. Imaging, Vol. 12, Pages 57: CauseHSI: Counterfactual-Augmented Domain Generalization for Hyperspectral Image Classification via Causal Disentanglement</title>
	<link>https://www.mdpi.com/2313-433X/12/2/57</link>
	<description>Cross-scene hyperspectral image (HSI) classification under single-source domain generalization (DG) is a crucial yet challenging task in remote sensing. The core difficulty lies in generalizing from a limited source domain to unseen target scenes. We formalize this through the causal theory, where different sensing scenes are viewed as distinct interventions on a shared physical system. This perspective reveals two fundamental obstacles: interventional distribution shifts arising from varying acquisition conditions, and confounding biases induced by spurious correlations driven by domain-specific factors. Taking the above considerations into account, we propose CauseHSI, a causality-inspired framework that offers new insights into cross-scene HSI classification. CauseHSI consists of two key components: a Counterfactual Generation Module (CGM) that perturbs domain-specific factors to generate diverse counterfactual variants, simulating cross-domain interventions while preserving semantic consistency, and a Causal Disentanglement Module (CDM) that separates invariant causal semantics from spurious correlations through structured constraints under a structural causal model, ultimately guiding the model to focus on domain-invariant and generalizable representations. By aligning model learning with causal principles, CauseHSI enhances robustness against domain shifts. Extensive experiments on the Pavia, Houston, and HyRANK datasets demonstrate that CauseHSI outperforms existing DG methods.</description>
	<pubDate>2026-01-26</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 57: CauseHSI: Counterfactual-Augmented Domain Generalization for Hyperspectral Image Classification via Causal Disentanglement</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/57">doi: 10.3390/jimaging12020057</a></p>
	<p>Authors:
		Xin Li
		Zongchi Yang
		Wenlong Li
		</p>
	<p>Cross-scene hyperspectral image (HSI) classification under single-source domain generalization (DG) is a crucial yet challenging task in remote sensing. The core difficulty lies in generalizing from a limited source domain to unseen target scenes. We formalize this through the causal theory, where different sensing scenes are viewed as distinct interventions on a shared physical system. This perspective reveals two fundamental obstacles: interventional distribution shifts arising from varying acquisition conditions, and confounding biases induced by spurious correlations driven by domain-specific factors. Taking the above considerations into account, we propose CauseHSI, a causality-inspired framework that offers new insights into cross-scene HSI classification. CauseHSI consists of two key components: a Counterfactual Generation Module (CGM) that perturbs domain-specific factors to generate diverse counterfactual variants, simulating cross-domain interventions while preserving semantic consistency, and a Causal Disentanglement Module (CDM) that separates invariant causal semantics from spurious correlations through structured constraints under a structural causal model, ultimately guiding the model to focus on domain-invariant and generalizable representations. By aligning model learning with causal principles, CauseHSI enhances robustness against domain shifts. Extensive experiments on the Pavia, Houston, and HyRANK datasets demonstrate that CauseHSI outperforms existing DG methods.</p>
	]]></content:encoded>

	<dc:title>CauseHSI: Counterfactual-Augmented Domain Generalization for Hyperspectral Image Classification via Causal Disentanglement</dc:title>
			<dc:creator>Xin Li</dc:creator>
			<dc:creator>Zongchi Yang</dc:creator>
			<dc:creator>Wenlong Li</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020057</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-26</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-26</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>57</prism:startingPage>
		<prism:doi>10.3390/jimaging12020057</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/57</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/56">

	<title>J. Imaging, Vol. 12, Pages 56: Use of Patient-Specific 3D Models in Paediatric Surgery: Effect on Communication and Surgical Management</title>
	<link>https://www.mdpi.com/2313-433X/12/2/56</link>
	<description>Children with rare tumours and malformations may benefit from innovative imaging, including patient-specific 3D models that can enhance communication and surgical planning. The primary aim was to evaluate the impact of patient-specific 3D models on communication with families. The secondary aims were to assess their influence on medical management and to establish an efficient post-processing workflow. From 2021 to 2024, we prospectively included patients aged 3 months to 18 years with rare tumours or malformations. Families completed questionnaires before and after the presentation of a 3D model generated from MRI sequences, including peripheral nerve tractography. Treating physicians completed a separate questionnaire before surgical planning. Analyses were performed in R. Among 21 patients, diagnoses included 11 tumours, 8 malformations, 1 trauma, and 1 pancreatic pseudo-cyst. Likert scale responses showed improved family understanding after viewing the 3D model (mean score 3.94 to 4.67) and a high overall evaluation (mean 4.61). Physicians also rated the models positively. An efficient image post-processing workflow was defined. Although manual 3D reconstruction remains time-consuming, these preliminary results show that colourful, patient-specific 3D models substantially improve family communication and support clinical decision-making. They also highlight the need to support the development of MRI-based automated segmentation software using deep neural networks that is clinically approved and usable in routine practice.</description>
	<pubDate>2026-01-26</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 56: Use of Patient-Specific 3D Models in Paediatric Surgery: Effect on Communication and Surgical Management</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/56">doi: 10.3390/jimaging12020056</a></p>
	<p>Authors:
		Cécile O. Muller
		Lydia Helbling
		Theodoros Xydias
		Jeanette Greiner
		Valérie Oesch
		Henrik Köhler
		Tim Ohletz
		Jatta Berberat
		</p>
	<p>Children with rare tumours and malformations may benefit from innovative imaging, including patient-specific 3D models that can enhance communication and surgical planning. The primary aim was to evaluate the impact of patient-specific 3D models on communication with families. The secondary aims were to assess their influence on medical management and to establish an efficient post-processing workflow. From 2021 to 2024, we prospectively included patients aged 3 months to 18 years with rare tumours or malformations. Families completed questionnaires before and after the presentation of a 3D model generated from MRI sequences, including peripheral nerve tractography. Treating physicians completed a separate questionnaire before surgical planning. Analyses were performed in R. Among 21 patients, diagnoses included 11 tumours, 8 malformations, 1 trauma, and 1 pancreatic pseudo-cyst. Likert scale responses showed improved family understanding after viewing the 3D model (mean score 3.94 to 4.67) and a high overall evaluation (mean 4.61). Physicians also rated the models positively. An efficient image post-processing workflow was defined. Although manual 3D reconstruction remains time-consuming, these preliminary results show that colourful, patient-specific 3D models substantially improve family communication and support clinical decision-making. They also highlight the need to support the development of MRI-based automated segmentation software using deep neural networks that is clinically approved and usable in routine practice.</p>
	]]></content:encoded>

	<dc:title>Use of Patient-Specific 3D Models in Paediatric Surgery: Effect on Communication and Surgical Management</dc:title>
			<dc:creator>Cécile O. Muller</dc:creator>
			<dc:creator>Lydia Helbling</dc:creator>
			<dc:creator>Theodoros Xydias</dc:creator>
			<dc:creator>Jeanette Greiner</dc:creator>
			<dc:creator>Valérie Oesch</dc:creator>
			<dc:creator>Henrik Köhler</dc:creator>
			<dc:creator>Tim Ohletz</dc:creator>
			<dc:creator>Jatta Berberat</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020056</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-26</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-26</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>56</prism:startingPage>
		<prism:doi>10.3390/jimaging12020056</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/56</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/55">

	<title>J. Imaging, Vol. 12, Pages 55: Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces</title>
	<link>https://www.mdpi.com/2313-433X/12/2/55</link>
	<description>Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically when efSearch &gt; k. However, as the neighborhood size grows and efSearch remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances at approximately k ≈ 2–3.5 × efSearch. Increasing index construction quality delays this failure, and scaling search effort proportionally with neighborhood size (efSearch = α × k with α ≥ 1) preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial.</description>
	<pubDate>2026-01-25</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 55: Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/55">doi: 10.3390/jimaging12020055</a></p>
	<p>Authors:
		Morgan Roy Cooper
		Mike Busch
		</p>
	<p>Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically when efSearch &gt; k. However, as the neighborhood size grows and efSearch remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances at approximately k ≈ 2–3.5 × efSearch. Increasing index construction quality delays this failure, and scaling search effort proportionally with neighborhood size (efSearch = α × k with α ≥ 1) preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial.</p>
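	<p>As a concrete illustration of the efSearch scaling discussed above, the minimal Python sketch below builds an HNSW index with hnswlib, runs an exact k-NN baseline with scikit-learn, and reports the neighborhood-overlap metric. The random vectors stand in for the ResNet-50 embeddings, and the parameter values (M, ef_construction, alpha) are illustrative assumptions rather than the authors' settings.</p>
	<pre><code>
import numpy as np
import hnswlib
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 2048)).astype(np.float32)   # stand-in for ResNet-50 embeddings
queries = data[:100]

k, alpha = 200, 1.0                       # neighborhood size and efSearch scaling factor
ef_search = max(int(alpha * k), k)        # keep efSearch at least k, as recommended above

# Exact k-NN baseline
exact = NearestNeighbors(n_neighbors=k).fit(data)
_, exact_ids = exact.kneighbors(queries)

# HNSW approximate search
index = hnswlib.Index(space="l2", dim=data.shape[1])
index.init_index(max_elements=len(data), ef_construction=200, M=16)
index.add_items(data, np.arange(len(data)))
index.set_ef(ef_search)
ann_ids, _ = index.knn_query(queries, k=k)

# Neighborhood overlap: fraction of exact neighbors recovered by the ANN search
overlap = np.mean([len(set(a).intersection(b)) / k for a, b in zip(ann_ids, exact_ids)])
print(f"mean neighborhood overlap at k={k}, efSearch={ef_search}: {overlap:.3f}")
</code></pre>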
	]]></content:encoded>

	<dc:title>Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces</dc:title>
			<dc:creator>Morgan Roy Cooper</dc:creator>
			<dc:creator>Mike Busch</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020055</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-25</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-25</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>55</prism:startingPage>
		<prism:doi>10.3390/jimaging12020055</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/55</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/2/54">

	<title>J. Imaging, Vol. 12, Pages 54: A Robust Skeletonization Method for High-Density Fringe Patterns in Holographic Interferometry Based on Parametric Modeling and Strip Integration</title>
	<link>https://www.mdpi.com/2313-433X/12/2/54</link>
	<description>Accurate displacement field measurement by holographic interferometry requires robust analysis of high-density fringe patterns, which is hindered by speckle noise inherent in any interferogram, no matter how perfect. Conventional skeletonization methods, such as edge detection algorithms and active contour models, often fail under these conditions, producing fragmented and unreliable fringe contours. This paper presents a novel skeletonization procedure that simultaneously addresses three fundamental challenges: (1) topology preservation – by representing the fringe family within a physics-informed, finite-dimensional parametric subspace (e.g., Fourier-based contours), ensuring global smoothness, connectivity, and correct nesting of each fringe; (2) extreme noise robustness – through a robust strip integration functional that replaces noisy point sampling with Gaussian-weighted intensity averaging across a narrow strip, effectively suppressing speckle while yielding a smooth objective function suitable for gradient-based optimization; and (3) sub-pixel accuracy without phase extraction – leveraging continuous bicubic interpolation within a recursive quasi-optimization framework that exploits fringe similarity for precise and stable contour localization. The method’s performance is quantitatively validated on synthetic interferograms with controlled noise, demonstrating significantly lower error compared to baseline techniques. Practical utility is confirmed by successful processing of a real interferogram of a bent plate containing over 100 fringes, enabling precise displacement field reconstruction that closely matches independent theoretical modeling. The proposed procedure provides a reliable tool for processing challenging interferograms where traditional methods fail to deliver satisfactory results.</description>
	<pubDate>2026-01-24</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 54: A Robust Skeletonization Method for High-Density Fringe Patterns in Holographic Interferometry Based on Parametric Modeling and Strip Integration</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/2/54">doi: 10.3390/jimaging12020054</a></p>
	<p>Authors:
		Sergey Lychev
		Alexander Digilov
		</p>
	<p>Accurate displacement field measurement by holographic interferometry requires robust analysis of high-density fringe patterns, which is hindered by speckle noise inherent in any interferogram, no matter how perfect. Conventional skeletonization methods, such as edge detection algorithms and active contour models, often fail under these conditions, producing fragmented and unreliable fringe contours. This paper presents a novel skeletonization procedure that simultaneously addresses three fundamental challenges: (1) topology preservation – by representing the fringe family within a physics-informed, finite-dimensional parametric subspace (e.g., Fourier-based contours), ensuring global smoothness, connectivity, and correct nesting of each fringe; (2) extreme noise robustness – through a robust strip integration functional that replaces noisy point sampling with Gaussian-weighted intensity averaging across a narrow strip, effectively suppressing speckle while yielding a smooth objective function suitable for gradient-based optimization; and (3) sub-pixel accuracy without phase extraction – leveraging continuous bicubic interpolation within a recursive quasi-optimization framework that exploits fringe similarity for precise and stable contour localization. The method’s performance is quantitatively validated on synthetic interferograms with controlled noise, demonstrating significantly lower error compared to baseline techniques. Practical utility is confirmed by successful processing of a real interferogram of a bent plate containing over 100 fringes, enabling precise displacement field reconstruction that closely matches independent theoretical modeling. The proposed procedure provides a reliable tool for processing challenging interferograms where traditional methods fail to deliver satisfactory results.</p>
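	<p>The following Python sketch illustrates the idea of a strip integration functional under stated assumptions: intensities are sampled along offsets taken on the local normal of a parametric contour, interpolated with cubic splines (scipy.ndimage.map_coordinates, as a stand-in for the bicubic interpolation used in the paper), and combined with Gaussian weights across the strip. The contour representation, strip width, and function name are placeholders, not the authors' implementation.</p>
	<pre><code>
import numpy as np
from scipy.ndimage import map_coordinates

def strip_integral(image, contour_xy, half_width=3.0, n_offsets=7, sigma=1.5):
    """Average image intensity over a narrow strip around a parametric contour.

    contour_xy : (N, 2) array of (x, y) samples along one candidate fringe contour.
    """
    contour_xy = np.asarray(contour_xy, dtype=float)
    # unit tangents and normals along the contour
    d = np.gradient(contour_xy, axis=0)
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
    normals = np.stack([-d[:, 1], d[:, 0]], axis=1)

    offsets = np.linspace(-half_width, half_width, n_offsets)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weights /= weights.sum()

    total = 0.0
    for w, off in zip(weights, offsets):
        pts = contour_xy + off * normals              # shift the contour across the strip
        rows, cols = pts[:, 1], pts[:, 0]             # map_coordinates expects (row, col)
        vals = map_coordinates(image, [rows, cols], order=3, mode="nearest")
        total += w * vals.mean()
    return total  # smooth objective: maximise on bright fringes, minimise on dark ones
</code></pre>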
	]]></content:encoded>

	<dc:title>A Robust Skeletonization Method for High-Density Fringe Patterns in Holographic Interferometry Based on Parametric Modeling and Strip Integration</dc:title>
			<dc:creator>Sergey Lychev</dc:creator>
			<dc:creator>Alexander Digilov</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12020054</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-24</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-24</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>2</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>54</prism:startingPage>
		<prism:doi>10.3390/jimaging12020054</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/2/54</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/53">

	<title>J. Imaging, Vol. 12, Pages 53: Non-Invasive Detection of Prostate Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI</title>
	<link>https://www.mdpi.com/2313-433X/12/1/53</link>
	<description>Prostate cancer (PCa) is the most common malignancy in men worldwide. Multiparametric MRI (mpMRI) improves the detection of clinically significant PCa (csPCa); however, it remains limited by false-positive findings and inter-observer variability. Time-dependent diffusion (TDD) MRI provides microstructural information that may enhance csPCa characterization beyond standard mpMRI. This prospective observational diagnostic accuracy study protocol describes the evaluation of PROS-TD-AI, an in-house developed AI workflow integrating TDD-derived metrics for zone-aware csPCa risk prediction. PROS-TD-AI will be compared with PI-RADS v2.1 in routine clinical imaging using MRI-targeted prostate biopsy as the reference standard.</description>
	<pubDate>2026-01-22</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 53: Non-Invasive Detection of Prostate Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/53">doi: 10.3390/jimaging12010053</a></p>
	<p>Authors:
		Baltasar Ramos
		Cristian Garrido
		Paulette Narváez
		Santiago Gelerstein Claro
		Haotian Li
		Rafael Salvador
		Constanza Vásquez-Venegas
		Iván Gallegos
		Víctor Castañeda
		Cristian Acevedo
		Gonzalo Cárdenas
		Camilo G. Sotomayor
		</p>
	<p>Prostate cancer (PCa) is the most common malignancy in men worldwide. Multiparametric MRI (mpMRI) improves the detection of clinically significant PCa (csPCa); however, it remains limited by false-positive findings and inter-observer variability. Time-dependent diffusion (TDD) MRI provides microstructural information that may enhance csPCa characterization beyond standard mpMRI. This prospective observational diagnostic accuracy study protocol describes the evaluation of PROS-TD-AI, an in-house developed AI workflow integrating TDD-derived metrics for zone-aware csPCa risk prediction. PROS-TD-AI will be compared with PI-RADS v2.1 in routine clinical imaging using MRI-targeted prostate biopsy as the reference standard.</p>
	]]></content:encoded>

	<dc:title>Non-Invasive Detection of Prostate Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI</dc:title>
			<dc:creator>Baltasar Ramos</dc:creator>
			<dc:creator>Cristian Garrido</dc:creator>
			<dc:creator>Paulette Narváez</dc:creator>
			<dc:creator>Santiago Gelerstein Claro</dc:creator>
			<dc:creator>Haotian Li</dc:creator>
			<dc:creator>Rafael Salvador</dc:creator>
			<dc:creator>Constanza Vásquez-Venegas</dc:creator>
			<dc:creator>Iván Gallegos</dc:creator>
			<dc:creator>Víctor Castañeda</dc:creator>
			<dc:creator>Cristian Acevedo</dc:creator>
			<dc:creator>Gonzalo Cárdenas</dc:creator>
			<dc:creator>Camilo G. Sotomayor</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010053</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-22</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-22</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Study Protocol</prism:section>
	<prism:startingPage>53</prism:startingPage>
		<prism:doi>10.3390/jimaging12010053</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/53</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/52">

	<title>J. Imaging, Vol. 12, Pages 52: Multi-Frequency GPR Image Fusion Based on Convolutional Sparse Representation to Enhance Road Detection</title>
	<link>https://www.mdpi.com/2313-433X/12/1/52</link>
	<description>Single-frequency ground penetrating radar (GPR) systems are fundamentally constrained by a trade-off between penetration depth and resolution, alongside issues like narrow bandwidth and ringing interference. To break this limitation, we have developed a multi-frequency data fusion technique grounded in convolutional sparse representation (CSR). The proposed methodology involves spatially registering multi-frequency GPR signals and fusing them via a CSR framework, where the convolutional dictionaries are derived from simulated high-definition GPR data. Extensive evaluation using information entropy, average gradient, mutual information, and visual information fidelity demonstrates the superiority of our method over traditional fusion approaches (e.g., weighted average, PCA, 2D wavelets). Tests on simulated and real data confirm that our CSR-based fusion successfully synergizes the deep penetration of low frequencies with the fine resolution of high frequencies, leading to substantial gains in GPR image clarity and interpretability.</description>
	<pubDate>2026-01-22</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 52: Multi-Frequency GPR Image Fusion Based on Convolutional Sparse Representation to Enhance Road Detection</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/52">doi: 10.3390/jimaging12010052</a></p>
	<p>Authors:
		Liang Fang
		Feng Yang
		Yuanjing Fang
		Junli Nie
		</p>
	<p>Single-frequency ground penetrating radar (GPR) systems are fundamentally constrained by a trade-off between penetration depth and resolution, alongside issues like narrow bandwidth and ringing interference. To break this limitation, we have developed a multi-frequency data fusion technique grounded in convolutional sparse representation (CSR). The proposed methodology involves spatially registering multi-frequency GPR signals and fusing them via a CSR framework, where the convolutional dictionaries are derived from simulated high-definition GPR data. Extensive evaluation using information entropy, average gradient, mutual information, and visual information fidelity demonstrates the superiority of our method over traditional fusion approaches (e.g., weighted average, PCA, 2D wavelets). Tests on simulated and real data confirm that our CSR-based fusion successfully synergizes the deep penetration of low frequencies with the fine resolution of high frequencies, leading to substantial gains in GPR image clarity and interpretability.</p>
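	<p>Two of the evaluation metrics named above, information entropy and average gradient, have common definitions that the short sketch below implements in Python; the exact variants used in the paper may differ, and the arrays here are synthetic stand-ins for fused and single-frequency B-scans.</p>
	<pre><code>
import numpy as np

def information_entropy(img, bins=256):
    """Shannon entropy of the grey-level histogram (bits), a common fusion-quality metric."""
    hist, _ = np.histogram(img, bins=bins, range=(img.min(), img.max()))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def average_gradient(img):
    """Mean gradient magnitude, often read as a proxy for spatial detail and sharpness."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

# Toy comparison between a "fused" B-scan and one of its single-frequency inputs
rng = np.random.default_rng(0)
low_freq = rng.normal(size=(128, 256))
fused = low_freq + 0.5 * rng.normal(size=(128, 256))
print(information_entropy(fused), average_gradient(fused))
print(information_entropy(low_freq), average_gradient(low_freq))
</code></pre>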
	]]></content:encoded>

	<dc:title>Multi-Frequency GPR Image Fusion Based on Convolutional Sparse Representation to Enhance Road Detection</dc:title>
			<dc:creator>Liang Fang</dc:creator>
			<dc:creator>Feng Yang</dc:creator>
			<dc:creator>Yuanjing Fang</dc:creator>
			<dc:creator>Junli Nie</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010052</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-22</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-22</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>52</prism:startingPage>
		<prism:doi>10.3390/jimaging12010052</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/52</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/51">

	<title>J. Imaging, Vol. 12, Pages 51: Interpretable Diagnosis of Pulmonary Emphysema on Low-Dose CT Using ResNet Embeddings</title>
	<link>https://www.mdpi.com/2313-433X/12/1/51</link>
	<description>Accurate and interpretable detection of pulmonary emphysema on low-dose computed tomography (LDCT) remains a critical challenge for large-scale screening and population health studies. This work proposes a quality-controlled and interpretable deep learning pipeline for emphysema assessment using ResNet-152 embeddings. The pipeline integrates automated lung segmentation, quality-control filtering, and extraction of 2048-dimensional embeddings from mid-lung patches, followed by analysis using logistic regression, LASSO, and recursive feature elimination (RFE). The embeddings are further fused with quantitative CT (QCT) markers, including %LAA, Perc15, and total lung volume (TLV), to enhance robustness and interpretability. Bootstrapped validation demonstrates strong diagnostic performance (ROC-AUC = 0.996, PR-AUC = 0.962, balanced accuracy = 0.931) with low computational cost. The proposed approach shows that ResNet embeddings pretrained on CT data can be effectively reused without retraining for emphysema characterization, providing a reproducible and explainable framework suitable for research and screening support in population-level LDCT analysis.</description>
	<pubDate>2026-01-21</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 51: Interpretable Diagnosis of Pulmonary Emphysema on Low-Dose CT Using ResNet Embeddings</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/51">doi: 10.3390/jimaging12010051</a></p>
	<p>Authors:
		Talshyn Sarsembayeva
		Madina Mansurova
		Ainash Oshibayeva
		Stepan Serebryakov
		</p>
	<p>Accurate and interpretable detection of pulmonary emphysema on low-dose computed tomography (LDCT) remains a critical challenge for large-scale screening and population health studies. This work proposes a quality-controlled and interpretable deep learning pipeline for emphysema assessment using ResNet-152 embeddings. The pipeline integrates automated lung segmentation, quality-control filtering, and extraction of 2048-dimensional embeddings from mid-lung patches, followed by analysis using logistic regression, LASSO, and recursive feature elimination (RFE). The embeddings are further fused with quantitative CT (QCT) markers, including %LAA, Perc15, and total lung volume (TLV), to enhance robustness and interpretability. Bootstrapped validation demonstrates strong diagnostic performance (ROC-AUC = 0.996, PR-AUC = 0.962, balanced accuracy = 0.931) with low computational cost. The proposed approach shows that ResNet embeddings pretrained on CT data can be effectively reused without retraining for emphysema characterization, providing a reproducible and explainable framework suitable for research and screening support in population-level LDCT analysis.</p>
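	<p>A minimal sketch of the embedding-plus-classifier stage is shown below, assuming an ImageNet-pretrained torchvision ResNet-152 as a stand-in for the CT-pretrained backbone, random tensors in place of quality-controlled mid-lung patches, and plain logistic regression without the LASSO/RFE selection or QCT fusion steps.</p>
	<pre><code>
import torch
import numpy as np
from torchvision.models import resnet152, ResNet152_Weights
from sklearn.linear_model import LogisticRegression

# Backbone producing 2048-D pooled features (ImageNet weights here; the study uses CT-pretrained weights)
backbone = resnet152(weights=ResNet152_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # drop the classification head, keep the embedding
backbone.eval()

def embed(patches):
    """patches: float tensor (N, 3, 224, 224) of normalised mid-lung patches."""
    with torch.no_grad():
        return backbone(patches).numpy()   # (N, 2048)

# Placeholder data standing in for quality-controlled LDCT patches and emphysema labels
patches = torch.randn(16, 3, 224, 224)
labels = np.random.randint(0, 2, size=16)

X = embed(patches)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy:", clf.score(X, labels))
</code></pre>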
	]]></content:encoded>

	<dc:title>Interpretable Diagnosis of Pulmonary Emphysema on Low-Dose CT Using ResNet Embeddings</dc:title>
			<dc:creator>Talshyn Sarsembayeva</dc:creator>
			<dc:creator>Madina Mansurova</dc:creator>
			<dc:creator>Ainash Oshibayeva</dc:creator>
			<dc:creator>Stepan Serebryakov</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010051</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-21</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-21</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>51</prism:startingPage>
		<prism:doi>10.3390/jimaging12010051</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/51</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/50">

	<title>J. Imaging, Vol. 12, Pages 50: ADAM-Net: Anatomy-Guided Attentive Unsupervised Domain Adaptation for Joint MG Segmentation and MGD Grading</title>
	<link>https://www.mdpi.com/2313-433X/12/1/50</link>
	<description>Meibomian gland dysfunction (MGD) is a leading cause of dry eye disease, assessable through the degree of gland atrophy. While deep learning (DL) has advanced meibomian gland (MG) segmentation and MGD classification, existing methods treat these tasks independently and suffer from domain shift across multi-center imaging devices. We propose ADAM-Net, an attention-guided unsupervised domain adaptation multi-task framework that jointly models MG segmentation and MGD classification. Our model introduces structure-aware multi-task learning and anatomy-guided attention to enhance feature sharing, suppress background noise, and improve glandular region perception. For the cross-domain tasks MGD-1K→{K5M, CR-2, LV II}, this study systematically evaluates the overall performance of ADAM-Net from multiple perspectives. The experimental results show that ADAM-Net achieves classification accuracies of 77.93%, 74.86%, and 81.77% on the target domains, significantly outperforming current mainstream unsupervised domain adaptation (UDA) methods. The F1-score and the Matthews correlation coefficient (MCC-score) indicate that the model maintains robust discriminative capability even under class-imbalanced scenarios. t-SNE visualizations further validate its cross-domain feature alignment capability. These results demonstrate that ADAM-Net exhibits strong robustness and interpretability in multi-center scenarios, providing an effective solution for automated MGD assessment.</description>
	<pubDate>2026-01-21</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 50: ADAM-Net: Anatomy-Guided Attentive Unsupervised Domain Adaptation for Joint MG Segmentation and MGD Grading</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/50">doi: 10.3390/jimaging12010050</a></p>
	<p>Authors:
		Junbin Fang
		Xuan He
		You Jiang
		Mini Han Wang
		</p>
	<p>Meibomian gland dysfunction (MGD) is a leading cause of dry eye disease, assessable through the degree of gland atrophy. While deep learning (DL) has advanced meibomian gland (MG) segmentation and MGD classification, existing methods treat these tasks independently and suffer from domain shift across multi-center imaging devices. We propose ADAM-Net, an attention-guided unsupervised domain adaptation multi-task framework that jointly models MG segmentation and MGD classification. Our model introduces structure-aware multi-task learning and anatomy-guided attention to enhance feature sharing, suppress background noise, and improve glandular region perception. For the cross-domain tasks MGD-1K→{K5M, CR-2, LV II}, this study systematically evaluates the overall performance of ADAM-Net from multiple perspectives. The experimental results show that ADAM-Net achieves classification accuracies of 77.93%, 74.86%, and 81.77% on the target domains, significantly outperforming current mainstream unsupervised domain adaptation (UDA) methods. The F1-score and the Matthews correlation coefficient (MCC-score) indicate that the model maintains robust discriminative capability even under class-imbalanced scenarios. t-SNE visualizations further validate its cross-domain feature alignment capability. These results demonstrate that ADAM-Net exhibits strong robustness and interpretability in multi-center scenarios, providing an effective solution for automated MGD assessment.</p>
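	<p>For reference, the class-imbalance-aware metrics mentioned above (macro F1 and the Matthews correlation coefficient) can be computed directly with scikit-learn; the sketch below uses synthetic labels on an assumed four-grade MGD scale, not the study's data.</p>
	<pre><code>
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef

# Toy predictions for an assumed four-grade MGD classification task with class imbalance
rng = np.random.default_rng(0)
y_true = rng.choice([0, 1, 2, 3], size=500, p=[0.55, 0.25, 0.15, 0.05])
y_pred = np.where(rng.random(500) < 0.8, y_true, rng.integers(0, 4, size=500))

print("macro F1:", f1_score(y_true, y_pred, average="macro"))
print("MCC     :", matthews_corrcoef(y_true, y_pred))
</code></pre>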
	]]></content:encoded>

	<dc:title>ADAM-Net: Anatomy-Guided Attentive Unsupervised Domain Adaptation for Joint MG Segmentation and MGD Grading</dc:title>
			<dc:creator>Junbin Fang</dc:creator>
			<dc:creator>Xuan He</dc:creator>
			<dc:creator>You Jiang</dc:creator>
			<dc:creator>Mini Han Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010050</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-21</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-21</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>50</prism:startingPage>
		<prism:doi>10.3390/jimaging12010050</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/50</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/49">

	<title>J. Imaging, Vol. 12, Pages 49: Chest Radiography Optimization: Identifying the Optimal kV for Image Quality in a Phantom Study</title>
	<link>https://www.mdpi.com/2313-433X/12/1/49</link>
	<description>Chest radiography remains one of the most frequently performed imaging examinations, highlighting the need for optimization of acquisition parameters to balance image quality and radiation dose. This study presents a phantom-based quantitative evaluation of chest radiography acquisition settings using a digital radiography system (AGFA DR 600). Measurements were performed at three tube voltage levels across simulated patient-equivalent thicknesses generated using PMMA slabs, with a Leeds TOR 15FG image quality phantom positioned centrally in the imaging setup. Image quality was quantitatively assessed using signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), which were calculated from mean pixel values obtained from repeated acquisitions. Radiation exposure was evaluated through estimation of entrance surface dose (ESD). The analysis demonstrated that dose-normalized performance metrics favored intermediate tube voltages for slim and average patient-equivalent thicknesses, while higher voltages were required to maintain image quality in obese-equivalent conditions. Overall, image quality and dose were found to be strongly dependent on the combined selection of tube voltage and phantom thickness. These findings indicate that modest adjustments to tube voltage selection may improve the balance between image quality and radiation dose in chest radiography. Nevertheless, as the present work is based on phantom measurements, further validation using clinical images and observer-based studies is required before any modification of routine radiographic practice.</description>
	<pubDate>2026-01-21</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 49: Chest Radiography Optimization: Identifying the Optimal kV for Image Quality in a Phantom Study</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/49">doi: 10.3390/jimaging12010049</a></p>
	<p>Authors:
		Ioannis Antonakos
		Kyriakos Kokkinogoulis
		Maria Giannopoulou
		Efstathios P. Efstathopoulos
		</p>
	<p>Chest radiography remains one of the most frequently performed imaging examinations, highlighting the need for optimization of acquisition parameters to balance image quality and radiation dose. This study presents a phantom-based quantitative evaluation of chest radiography acquisition settings using a digital radiography system (AGFA DR 600). Measurements were performed at three tube voltage levels across simulated patient-equivalent thicknesses generated using PMMA slabs, with a Leeds TOR 15FG image quality phantom positioned centrally in the imaging setup. Image quality was quantitatively assessed using signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), which were calculated from mean pixel values obtained from repeated acquisitions. Radiation exposure was evaluated through estimation of entrance surface dose (ESD). The analysis demonstrated that dose-normalized performance metrics favored intermediate tube voltages for slim and average patient-equivalent thicknesses, while higher voltages were required to maintain image quality in obese-equivalent conditions. Overall, image quality and dose were found to be strongly dependent on the combined selection of tube voltage and phantom thickness. These findings indicate that modest adjustments to tube voltage selection may improve the balance between image quality and radiation dose in chest radiography. Nevertheless, as the present work is based on phantom measurements, further validation using clinical images and observer-based studies is required before any modification of routine radiographic practice.</p>
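	<p>The sketch below shows common definitions of the quantities used in such phantom optimisation studies: SNR and CNR from ROI pixel statistics, and a dose-normalised figure of merit taken here as CNR squared per unit entrance surface dose. The ROI protocol and the exact figure of merit used in this paper are not reproduced; the values are toy numbers.</p>
	<pre><code>
import numpy as np

def snr(roi):
    """Signal-to-noise ratio of a uniform region of interest (mean / standard deviation)."""
    roi = np.asarray(roi, dtype=float)
    return roi.mean() / roi.std()

def cnr(detail_roi, background_roi):
    """Contrast-to-noise ratio between a detail and the adjacent background."""
    detail_roi = np.asarray(detail_roi, dtype=float)
    background_roi = np.asarray(background_roi, dtype=float)
    return abs(detail_roi.mean() - background_roi.mean()) / background_roi.std()

def dose_normalised_fom(cnr_value, esd_mgy):
    """One common figure of merit for optimisation studies: CNR squared per unit entrance surface dose."""
    return cnr_value ** 2 / esd_mgy

# Toy example with synthetic pixel values
rng = np.random.default_rng(0)
bg = rng.normal(1000, 25, size=(50, 50))
detail = rng.normal(1100, 25, size=(20, 20))
c = cnr(detail, bg)
print(f"CNR = {c:.2f}, FOM at 0.15 mGy = {dose_normalised_fom(c, 0.15):.1f}")
</code></pre>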
	]]></content:encoded>

	<dc:title>Chest Radiography Optimization: Identifying the Optimal kV for Image Quality in a Phantom Study</dc:title>
			<dc:creator>Ioannis Antonakos</dc:creator>
			<dc:creator>Kyriakos Kokkinogoulis</dc:creator>
			<dc:creator>Maria Giannopoulou</dc:creator>
			<dc:creator>Efstathios P. Efstathopoulos</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010049</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-21</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-21</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>49</prism:startingPage>
		<prism:doi>10.3390/jimaging12010049</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/49</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/48">

	<title>J. Imaging, Vol. 12, Pages 48: Graph-Enhanced Expectation Maximization for Emission Tomography</title>
	<link>https://www.mdpi.com/2313-433X/12/1/48</link>
	<description>Emission tomography, including single-photon emission computed tomography (SPECT), requires image reconstruction from noisy and incomplete projection data. The maximum-likelihood expectation maximization (MLEM) algorithm is widely used due to its statistical foundation and non-negativity preservation, but it is highly sensitive to noise, particularly in low-count conditions. Although total variation (TV) regularization can reduce noise, it often oversmooths structural details and requires careful parameter tuning. We propose a Graph-Enhanced Expectation Maximization (GREM) algorithm that incorporates graph-based neighborhood information into an MLEM-type multiplicative reconstruction scheme. The method is motivated by a penalized formulation combining a Kullback–Leibler divergence term with a graph Laplacian regularization term, promoting local structural consistency while preserving edges. The resulting update retains the multiplicative structure of MLEM and preserves the non-negativity of the image estimates. Numerical experiments using synthetic phantoms under multiple noise levels, as well as clinical 99mTc-GSA liver SPECT data, demonstrate that GREM consistently outperforms conventional MLEM and TV-regularized MLEM in terms of PSNR and MS-SSIM. These results indicate that GREM provides an effective and practical approach for edge-preserving noise suppression in emission tomography without relying on external training data.</description>
	<pubDate>2026-01-20</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 48: Graph-Enhanced Expectation Maximization for Emission Tomography</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/48">doi: 10.3390/jimaging12010048</a></p>
	<p>Authors:
		Ryosuke Kasai
		Hideki Otsuka
		</p>
	<p>Emission tomography, including single-photon emission computed tomography (SPECT), requires image reconstruction from noisy and incomplete projection data. The maximum-likelihood expectation maximization (MLEM) algorithm is widely used due to its statistical foundation and non-negativity preservation, but it is highly sensitive to noise, particularly in low-count conditions. Although total variation (TV) regularization can reduce noise, it often oversmooths structural details and requires careful parameter tuning. We propose a Graph-Enhanced Expectation Maximization (GREM) algorithm that incorporates graph-based neighborhood information into an MLEM-type multiplicative reconstruction scheme. The method is motivated by a penalized formulation combining a Kullback–Leibler divergence term with a graph Laplacian regularization term, promoting local structural consistency while preserving edges. The resulting update retains the multiplicative structure of MLEM and preserves the non-negativity of the image estimates. Numerical experiments using synthetic phantoms under multiple noise levels, as well as clinical 99mTc-GSA liver SPECT data, demonstrate that GREM consistently outperforms conventional MLEM and TV-regularized MLEM in terms of PSNR and MS-SSIM. These results indicate that GREM provides an effective and practical approach for edge-preserving noise suppression in emission tomography without relying on external training data.</p>
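	<p>For orientation, the classical MLEM multiplicative update that GREM builds on is sketched below on a toy problem; the graph-Laplacian-penalised variant proposed in the paper modifies this update and is not reproduced here. The system matrix and counts are random placeholders.</p>
	<pre><code>
import numpy as np

def mlem(A, y, n_iter=50, eps=1e-12):
    """Classical MLEM update: x is multiplied by A^T(y / (A x)) and divided by the sensitivity A^T 1."""
    x = np.ones(A.shape[1])
    sens = A.T @ np.ones(A.shape[0]) + eps        # sensitivity image A^T 1
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)                 # measured / predicted counts
        x = x / sens * (A.T @ ratio)              # multiplicative update keeps x non-negative
    return x

# Toy emission problem: random non-negative system matrix, Poisson counts
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, size=(128, 64))
x_true = rng.uniform(0.0, 4.0, size=64)
y = rng.poisson(A @ x_true).astype(float)
x_hat = mlem(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
</code></pre>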
	]]></content:encoded>

	<dc:title>Graph-Enhanced Expectation Maximization for Emission Tomography</dc:title>
			<dc:creator>Ryosuke Kasai</dc:creator>
			<dc:creator>Hideki Otsuka</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010048</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-20</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-20</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>48</prism:startingPage>
		<prism:doi>10.3390/jimaging12010048</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/48</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/47">

	<title>J. Imaging, Vol. 12, Pages 47: Automatic Retinal Nerve Fiber Segmentation and the Influence of Intersubject Variability in Ocular Parameters on the Mapping of Retinal Sites to the Pointwise Orientation Angles</title>
	<link>https://www.mdpi.com/2313-433X/12/1/47</link>
	<description>The current study investigates the influence of intersubject variability in ocular characteristics on the mapping of visual field (VF) sites to the pointwise directional angles in retinal nerve fiber layer (RNFL) bundle traces. In addition, the accuracy of the mapping of VF sites to the optic nerve head (ONH) was compared to ground truth baselines. Fundus photographs of 546 eyes of 546 healthy subjects (with no history of ocular disease or diabetic retinopathy) were enhanced digitally and RNFL bundle traces were segmented based on the Personalized Estimated Segmentation (PES) algorithm’s core technique. A 24-2 VF grid pattern was overlaid onto the photographs in order to relate VF test points to intersecting RNFL bundles. The PES algorithm effectively traced RNFL bundles in fundus images, achieving an average accuracy of 97.6% relative to the Jansonius map through the application of 10th-order Bezier curves. The PES algorithm assembled an average of 4726 RNFL bundles per fundus image based on 4975 sampling points, obtaining a total of 2,580,505 RNFL bundles based on 2,716,321 sampling points. The influence of ocular parameters could be evaluated for 34 out of 52 VF locations. The ONH-fovea angle and the ONH position in relation to the fovea were the most prominent predictors for variations in the mapping of retinal locations to the pointwise directional angle (p &lt; 0.001). The variation explained by the model (R² value) ranges from 27.6% for visual field location 15 to 77.8% in location 22, with a mean of 56%. Significant individual variability was found in the mapping of VF sites to the ONH, with a mean standard deviation (95% limit) of 16.55° (median 17.68°) for 50 out of 52 VF locations, ranging from less than 1° to 44.05°. The mean entry angles differed from previous baselines by a range of less than 1° to 23.9° (average difference of 10.6° ± 5.53°), with an RMSE of 11.94.</description>
	<pubDate>2026-01-19</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 47: Automatic Retinal Nerve Fiber Segmentation and the Influence of Intersubject Variability in Ocular Parameters on the Mapping of Retinal Sites to the Pointwise Orientation Angles</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/47">doi: 10.3390/jimaging12010047</a></p>
	<p>Authors:
		Diego Luján Villarreal
		Adriana Leticia Vera-Tizatl
		</p>
	<p>The current study investigates the influence of intersubject variability in ocular characteristics on the mapping of visual field (VF) sites to the pointwise directional angles in retinal nerve fiber layer (RNFL) bundle traces. In addition, the accuracy of the mapping of VF sites to the optic nerve head (ONH) was compared to ground truth baselines. Fundus photographs of 546 eyes of 546 healthy subjects (with no history of ocular disease or diabetic retinopathy) were enhanced digitally and RNFL bundle traces were segmented based on the Personalized Estimated Segmentation (PES) algorithm’s core technique. A 24-2 VF grid pattern was overlaid onto the photographs in order to relate VF test points to intersecting RNFL bundles. The PES algorithm effectively traced RNFL bundles in fundus images, achieving an average accuracy of 97.6% relative to the Jansonius map through the application of 10th-order Bezier curves. The PES algorithm assembled an average of 4726 RNFL bundles per fundus image based on 4975 sampling points, obtaining a total of 2,580,505 RNFL bundles based on 2,716,321 sampling points. The influence of ocular parameters could be evaluated for 34 out of 52 VF locations. The ONH-fovea angle and the ONH position in relation to the fovea were the most prominent predictors for variations in the mapping of retinal locations to the pointwise directional angle (p &lt; 0.001). The variation explained by the model (R² value) ranges from 27.6% for visual field location 15 to 77.8% in location 22, with a mean of 56%. Significant individual variability was found in the mapping of VF sites to the ONH, with a mean standard deviation (95% limit) of 16.55° (median 17.68°) for 50 out of 52 VF locations, ranging from less than 1° to 44.05°. The mean entry angles differed from previous baselines by a range of less than 1° to 23.9° (average difference of 10.6° ± 5.53°), with an RMSE of 11.94.</p>
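	<p>The sketch below evaluates a degree-10 Bezier curve from its control points via the Bernstein basis, the kind of curve the PES algorithm fits to RNFL bundle traces; the control points here are arbitrary placeholders rather than fitted values.</p>
	<pre><code>
import numpy as np
from scipy.special import comb

def bezier_curve(control_points, n_samples=200):
    """Evaluate a Bezier curve of arbitrary degree from its (d + 1, 2) control points."""
    P = np.asarray(control_points, dtype=float)
    d = len(P) - 1                                           # d = 10 for the 10th-order curves above
    t = np.linspace(0.0, 1.0, n_samples)[:, None]            # (n_samples, 1)
    i = np.arange(d + 1)[None, :]                            # (1, d + 1)
    basis = comb(d, i) * t ** i * (1.0 - t) ** (d - i)       # Bernstein polynomials
    return basis @ P                                         # (n_samples, 2) curve points

# Placeholder control points for one degree-10 fibre-bundle trace
rng = np.random.default_rng(0)
ctrl = np.cumsum(rng.uniform(-5, 15, size=(11, 2)), axis=0)
trace = bezier_curve(ctrl)
print(trace.shape)  # (200, 2)
</code></pre>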
	]]></content:encoded>

	<dc:title>Automatic Retinal Nerve Fiber Segmentation and the Influence of Intersubject Variability in Ocular Parameters on the Mapping of Retinal Sites to the Pointwise Orientation Angles</dc:title>
			<dc:creator>Diego Luján Villarreal</dc:creator>
			<dc:creator>Adriana Leticia Vera-Tizatl</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010047</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-19</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-19</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>47</prism:startingPage>
		<prism:doi>10.3390/jimaging12010047</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/47</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/46">

	<title>J. Imaging, Vol. 12, Pages 46: A Dual Stream Deep Learning Framework for Alzheimer&amp;rsquo;s Disease Detection Using MRI Sonification</title>
	<link>https://www.mdpi.com/2313-433X/12/1/46</link>
	<description>Alzheimer’s Disease (AD) is a progressive brain illness that affects millions of individuals across the world. It causes gradual damage to brain cells, leading to memory loss and cognitive dysfunction. Although Magnetic Resonance Imaging (MRI) is widely used in AD diagnosis, existing studies rely solely on visual representations, leaving alternative features unexplored. The objective of this study is to explore whether MRI sonification can provide complementary diagnostic information when combined with conventional image-based methods. In this study, we propose a novel dual-stream multimodal framework that integrates 2D MRI slices with their corresponding audio representations. MRI images are transformed into audio signals using multi-scale, multi-orientation Gabor filtering, followed by a Hilbert space-filling curve to preserve spatial locality. The image and sound modalities are processed using a lightweight CNN and YAMNet, respectively, then fused via logistic regression. The multimodal framework achieved its highest accuracy in distinguishing AD from Cognitively Normal (CN) subjects at 98.2%, with 94% for AD vs. Mild Cognitive Impairment (MCI) and 93.2% for MCI vs. CN. This work provides a new perspective and highlights the potential of audio transformation of imaging data for feature extraction and classification.</description>
	<pubDate>2026-01-15</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 46: A Dual Stream Deep Learning Framework for Alzheimer&amp;rsquo;s Disease Detection Using MRI Sonification</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/46">doi: 10.3390/jimaging12010046</a></p>
	<p>Authors:
		Nadia A. Mohsin
		Mohammed H. Abdul Ameer
		</p>
	<p>Alzheimer’s Disease (AD) is a progressive brain illness that affects millions of individuals across the world. It causes gradual damage to brain cells, leading to memory loss and cognitive dysfunction. Although Magnetic Resonance Imaging (MRI) is widely used in AD diagnosis, existing studies rely solely on visual representations, leaving alternative features unexplored. The objective of this study is to explore whether MRI sonification can provide complementary diagnostic information when combined with conventional image-based methods. In this study, we propose a novel dual-stream multimodal framework that integrates 2D MRI slices with their corresponding audio representations. MRI images are transformed into audio signals using multi-scale, multi-orientation Gabor filtering, followed by a Hilbert space-filling curve to preserve spatial locality. The image and sound modalities are processed using a lightweight CNN and YAMNet, respectively, then fused via logistic regression. The multimodal framework achieved its highest accuracy in distinguishing AD from Cognitively Normal (CN) subjects at 98.2%, with 94% for AD vs. Mild Cognitive Impairment (MCI) and 93.2% for MCI vs. CN. This work provides a new perspective and highlights the potential of audio transformation of imaging data for feature extraction and classification.</p>
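	<p>The locality-preserving flattening step can be illustrated with the standard Hilbert-curve index-to-coordinate mapping, as in the Python sketch below; the Gabor filtering, audio rendering, and YAMNet stages are omitted, and the input slice is a random placeholder.</p>
	<pre><code>
import numpy as np

def hilbert_d2xy(grid_n, d):
    """Map a 1-D Hilbert index d to (x, y) on a grid_n x grid_n grid (grid_n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < grid_n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate or flip the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(image):
    """Flatten a square 2-D slice into a 1-D signal along a Hilbert curve (locality preserving)."""
    n = image.shape[0]
    coords = [hilbert_d2xy(n, d) for d in range(n * n)]
    return np.array([image[y, x] for x, y in coords], dtype=float)

slice_2d = np.random.default_rng(0).random((64, 64))   # placeholder for a filtered MRI slice
signal = hilbert_scan(slice_2d)
print(signal.shape)  # (4096,)
</code></pre>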
	]]></content:encoded>

	<dc:title>A Dual Stream Deep Learning Framework for Alzheimer&amp;rsquo;s Disease Detection Using MRI Sonification</dc:title>
			<dc:creator>Nadia A. Mohsin</dc:creator>
			<dc:creator>Mohammed H. Abdul Ameer</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010046</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-15</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-15</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>46</prism:startingPage>
		<prism:doi>10.3390/jimaging12010046</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/46</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/45">

	<title>J. Imaging, Vol. 12, Pages 45: A Cross-Device and Cross-OS Benchmark of Modern Web Animation Systems</title>
	<link>https://www.mdpi.com/2313-433X/12/1/45</link>
	<description>Although modern web technologies increasingly rely on high-performance rendering methods to support rich visual content across a range of devices and operating systems, the field remains significantly under-researched. The performance of animated visual elements is affected by numerous factors, including browsers, operating systems, GPU acceleration, scripting load, and device limitations. This study systematically evaluates animation performance across multiple platforms using a unified set of circle-based animations implemented with eight web-compatible technologies, including HTML, CSS, SVG, JavaScript, Canvas, and WebGL. Animations were evaluated under controlled feature combinations involving random motion, distance, colour variation, blending, and transformations, with object counts ranging from 10 to 10,000. Measurements were conducted on desktop operating systems (Windows, macOS, Linux) and mobile platforms (iOS, Android), using CPU utilisation, GPU memory usage, and frame rate (FPS) as key metrics. Results show that DOM-based approaches maintain stable performance at 100 animated objects but exhibit notable degradation by 500 objects. Canvas-based rendering extends usability to higher object counts, while WebGL demonstrates the most stable performance at large scales (5000–10,000 objects). These findings provide concrete guidance for selecting appropriate animation technologies based on scene complexity and target platform.</description>
	<pubDate>2026-01-15</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 45: A Cross-Device and Cross-OS Benchmark of Modern Web Animation Systems</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/45">doi: 10.3390/jimaging12010045</a></p>
	<p>Authors:
		Tajana Koren Ivančević
		Trpimir Jeronim Ježić
		Nikolina Stanić Loknar
		</p>
	<p>Although modern web technologies increasingly rely on high-performance rendering methods to support rich visual content across a range of devices and operating systems, the field remains significantly under-researched. The performance of animated visual elements is affected by numerous factors, including browsers, operating systems, GPU acceleration, scripting load, and device limitations. This study systematically evaluates animation performance across multiple platforms using a unified set of circle-based animations implemented with eight web-compatible technologies, including HTML, CSS, SVG, JavaScript, Canvas, and WebGL. Animations were evaluated under controlled feature combinations involving random motion, distance, colour variation, blending, and transformations, with object counts ranging from 10 to 10,000. Measurements were conducted on desktop operating systems (Windows, macOS, Linux) and mobile platforms (iOS, Android), using CPU utilisation, GPU memory usage, and frame rate (FPS) as key metrics. Results show that DOM-based approaches maintain stable performance at 100 animated objects but exhibit notable degradation by 500 objects. Canvas-based rendering extends usability to higher object counts, while WebGL demonstrates the most stable performance at large scales (5000–10,000 objects). These findings provide concrete guidance for selecting appropriate animation technologies based on scene complexity and target platform.</p>
	]]></content:encoded>

	<dc:title>A Cross-Device and Cross-OS Benchmark of Modern Web Animation Systems</dc:title>
			<dc:creator>Tajana Koren Ivančević</dc:creator>
			<dc:creator>Trpimir Jeronim Ježić</dc:creator>
			<dc:creator>Nikolina Stanić Loknar</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010045</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-15</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-15</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>45</prism:startingPage>
		<prism:doi>10.3390/jimaging12010045</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/45</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/44">

	<title>J. Imaging, Vol. 12, Pages 44: A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer</title>
	<link>https://www.mdpi.com/2313-433X/12/1/44</link>
	<description>Underwater optical images are the primary carriers of underwater scene information, playing a crucial role in marine resource exploration, underwater environmental monitoring, and engineering inspection. However, wavelength-dependent absorption and scattering severely deteriorate underwater images, leading to reduced contrast, chromatic distortions, and loss of structural details. To address these issues, we propose a U-shaped underwater image enhancement framework that integrates Swin-Transformer blocks with lightweight attention and residual modules. A Dual-Window Multi-Head Self-Attention (DWMSA) in the bottleneck models long-range context while preserving fine local structure. A Global-Aware Attention Map (GAMP) adaptively re-weights channels and spatial locations to focus on severely degraded regions. A Feature-Augmentation Residual Network (FARN) stabilizes deep training and emphasizes texture and color fidelity. Trained with a combination of Charbonnier, perceptual, and edge losses, our method achieves state-of-the-art results in PSNR and SSIM, the lowest LPIPS, and improvements in UIQM and UCIQE on the UFO-120 and EUVP datasets, with average metrics of PSNR 29.5 dB, SSIM 0.94, LPIPS 0.17, UIQM 3.62, and UCIQE 0.59. Qualitative results show reduced color cast, restored contrast, and sharper details. Code, weights, and evaluation scripts will be released to support reproducibility.</description>
	<pubDate>2026-01-14</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 44: A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/44">doi: 10.3390/jimaging12010044</a></p>
	<p>Authors:
		Shasha Tian
		Adisorn Sirikham
		Jessada Konpang
		Chuyang Wang
		</p>
	<p>Underwater optical images are the primary carriers of underwater scene information, playing a crucial role in marine resource exploration, underwater environmental monitoring, and engineering inspection. However, wavelength-dependent absorption and scattering severely deteriorate underwater images, leading to reduced contrast, chromatic distortions, and loss of structural details. To address these issues, we propose a U-shaped underwater image enhancement framework that integrates Swin-Transformer blocks with lightweight attention and residual modules. A Dual-Window Multi-Head Self-Attention (DWMSA) in the bottleneck models long-range context while preserving fine local structure. A Global-Aware Attention Map (GAMP) adaptively re-weights channels and spatial locations to focus on severely degraded regions. A Feature-Augmentation Residual Network (FARN) stabilizes deep training and emphasizes texture and color fidelity. Trained with a combination of Charbonnier, perceptual, and edge losses, our method achieves state-of-the-art results in PSNR and SSIM, the lowest LPIPS, and improvements in UIQM and UCIQE on the UFO-120 and EUVP datasets, with average metrics of PSNR 29.5 dB, SSIM 0.94, LPIPS 0.17, UIQM 3.62, and UCIQE 0.59. Qualitative results show reduced color cast, restored contrast, and sharper details. Code, weights, and evaluation scripts will be released to support reproducibility.</p>
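	<p>Of the three loss terms listed above, the Charbonnier term has a standard closed form, sketched below in PyTorch; the epsilon value is a typical choice rather than the paper's, and the perceptual and edge terms are omitted.</p>
	<pre><code>
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier (smooth L1) loss: sqrt((pred - target)^2 + eps^2), averaged over all elements."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

# Toy usage on a fake batch of enhanced / reference underwater images (N, C, H, W)
pred = torch.rand(2, 3, 64, 64, requires_grad=True)
target = torch.rand(2, 3, 64, 64)
loss = charbonnier_loss(pred, target)
loss.backward()
print(float(loss))
</code></pre>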
	]]></content:encoded>

	<dc:title>A Deep Feature Fusion Underwater Image Enhancement Model Based on Perceptual Vision Swin Transformer</dc:title>
			<dc:creator>Shasha Tian</dc:creator>
			<dc:creator>Adisorn Sirikham</dc:creator>
			<dc:creator>Jessada Konpang</dc:creator>
			<dc:creator>Chuyang Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010044</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-14</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-14</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>44</prism:startingPage>
		<prism:doi>10.3390/jimaging12010044</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/44</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/43">

	<title>J. Imaging, Vol. 12, Pages 43: FF-Mamba-YOLO: An SSM-Based Benchmark for Forest Fire Detection in UAV Remote Sensing Images</title>
	<link>https://www.mdpi.com/2313-433X/12/1/43</link>
	<description>Timely and accurate detection of forest fires through unmanned aerial vehicle (UAV) remote sensing target detection technology is of paramount importance. However, multiscale targets and complex environmental interference in UAV remote sensing images pose significant challenges during detection tasks. To address these obstacles, this paper presents FF-Mamba-YOLO, a novel framework based on the principles of Mamba and YOLO (You Only Look Once) that leverages innovative modules and architectures to overcome these limitations. First, we introduce MFEBlock and MFFBlock based on state space models (SSMs) in the backbone and neck parts of the network, respectively, enabling the model to effectively capture global dependencies. Second, we construct CFEBlock, a module that performs feature enhancement before SSM processing, improving local feature processing capabilities. Furthermore, we propose MGBlock, which adopts a dynamic gating mechanism, enhancing the model’s adaptive processing capabilities and robustness. Finally, we enhance the structure of the Path Aggregation Feature Pyramid Network (PAFPN) to improve feature fusion quality and introduce DySample to enhance image resolution without significantly increasing computational costs. Experimental results on our self-constructed forest fire image dataset demonstrate that the model achieves 67.4% mAP@50, 36.3% mAP@50:95, and 64.8% precision, outperforming previous state-of-the-art methods. These results highlight the potential of FF-Mamba-YOLO in forest fire monitoring.</description>
	<pubDate>2026-01-13</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 43: FF-Mamba-YOLO: An SSM-Based Benchmark for Forest Fire Detection in UAV Remote Sensing Images</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/43">doi: 10.3390/jimaging12010043</a></p>
	<p>Authors:
		Binhua Guo
		Dinghui Liu
		Zhou Shen
		Tiebin Wang
		</p>
	<p>Timely and accurate detection of forest fires through unmanned aerial vehicle (UAV) remote sensing target detection technology is of paramount importance. However, multiscale targets and complex environmental interference in UAV remote sensing images pose significant challenges during detection tasks. To address these obstacles, this paper presents FF-Mamba-YOLO, a novel framework based on the principles of Mamba and YOLO (You Only Look Once) that leverages innovative modules and architectures to overcome these limitations. First, we introduce MFEBlock and MFFBlock based on state space models (SSMs) in the backbone and neck parts of the network, respectively, enabling the model to effectively capture global dependencies. Second, we construct CFEBlock, a module that performs feature enhancement before SSM processing, improving local feature processing capabilities. Furthermore, we propose MGBlock, which adopts a dynamic gating mechanism, enhancing the model’s adaptive processing capabilities and robustness. Finally, we enhance the structure of the Path Aggregation Feature Pyramid Network (PAFPN) to improve feature fusion quality and introduce DySample to enhance image resolution without significantly increasing computational costs. Experimental results on our self-constructed forest fire image dataset demonstrate that the model achieves 67.4% mAP@50, 36.3% mAP@50:95, and 64.8% precision, outperforming previous state-of-the-art methods. These results highlight the potential of FF-Mamba-YOLO in forest fire monitoring.</p>
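	<p>The abstract describes MGBlock only as a dynamic gating mechanism; the block below is a generic sigmoid-gated residual unit in PyTorch, offered purely to illustrate that idea (channel count and layer choices are assumptions, not the paper's design).</p>
	<pre><code>
# Generic dynamic gating over feature channels (illustrative, not the paper's MGBlock).
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.transform = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global context per channel
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),                      # input-dependent gate in [0, 1]
        )

    def forward(self, x):
        g = self.gate(x)                       # dynamic per-channel weights
        return x + g * self.transform(x)       # gated residual update

x = torch.randn(2, 64, 32, 32)
print(GatedBlock(64)(x).shape)                 # torch.Size([2, 64, 32, 32])
</code></pre>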
	]]></content:encoded>

	<dc:title>FF-Mamba-YOLO: An SSM-Based Benchmark for Forest Fire Detection in UAV Remote Sensing Images</dc:title>
			<dc:creator>Binhua Guo</dc:creator>
			<dc:creator>Dinghui Liu</dc:creator>
			<dc:creator>Zhou Shen</dc:creator>
			<dc:creator>Tiebin Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010043</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-13</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-13</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>43</prism:startingPage>
		<prism:doi>10.3390/jimaging12010043</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/43</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/42">

	<title>J. Imaging, Vol. 12, Pages 42: GLCN: Graph-Aware Locality-Enhanced Cross-Modality Re-ID Network</title>
	<link>https://www.mdpi.com/2313-433X/12/1/42</link>
	<description>Cross-modality person re-identification faces challenges such as illumination discrepancies, local occlusions, and inconsistent modality structures, leading to misalignment and sensitivity issues. We propose GLCN, a framework that addresses these problems by enhancing representation learning through locality enhancement, cross-modality structural alignment, and intra-modality compactness. Key components include the Locality-Preserved Cross-branch Fusion (LPCF) module, which combines Local–Positional–Channel Gating (LPCG) for local region and positional sensitivity; Cross-branch Context Interpolated Attention (CCIA) for stable cross-branch consistency; and Graph-Enhanced Center Geometry Alignment (GE-CGA), which aligns class-center similarity structures across modalities to preserve category-level relationships. We also introduce Intra-Modal Prototype Discrepancy Mining Loss (IPDM-Loss) to reduce intra-class variance and improve inter-class separation, thereby creating more compact identity structures in both RGB and IR spaces. Extensive experiments on SYSU-MM01, RegDB, and other benchmarks demonstrate the effectiveness of our approach.</description>
	<pubDate>2026-01-13</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 42: GLCN: Graph-Aware Locality-Enhanced Cross-Modality Re-ID Network</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/42">doi: 10.3390/jimaging12010042</a></p>
	<p>Authors:
		Junjie Cao
		Yuhang Yu
		Rong Rong
		Xing Xie
		</p>
	<p>Cross-modality person re-identification faces challenges such as illumination discrepancies, local occlusions, and inconsistent modality structures, leading to misalignment and sensitivity issues. We propose GLCN, a framework that addresses these problems by enhancing representation learning through locality enhancement, cross-modality structural alignment, and intra-modality compactness. Key components include the Locality-Preserved Cross-branch Fusion (LPCF) module, which combines Local–Positional–Channel Gating (LPCG) for local region and positional sensitivity; Cross-branch Context Interpolated Attention (CCIA) for stable cross-branch consistency; and Graph-Enhanced Center Geometry Alignment (GE-CGA), which aligns class-center similarity structures across modalities to preserve category-level relationships. We also introduce Intra-Modal Prototype Discrepancy Mining Loss (IPDM-Loss) to reduce intra-class variance and improve inter-class separation, thereby creating more compact identity structures in both RGB and IR spaces. Extensive experiments on SYSU-MM01, RegDB, and other benchmarks demonstrate the effectiveness of our approach.</p>
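	<p>A generic sketch of what aligning class-center similarity structures across RGB and IR modalities can look like; the center computation, normalised similarity matrices, and MSE objective below are illustrative assumptions, not the paper's GE-CGA definition.</p>
	<pre><code>
# Illustrative class-center similarity alignment between two modalities.
import torch
import torch.nn.functional as F

def class_centers(features, labels, num_classes):
    centers = torch.zeros(num_classes, features.size(1), device=features.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers[c] = features[mask].mean(dim=0)   # per-class prototype
    return F.normalize(centers, dim=1)

def center_geometry_alignment(rgb_feats, ir_feats, labels, num_classes):
    c_rgb = class_centers(rgb_feats, labels, num_classes)
    c_ir = class_centers(ir_feats, labels, num_classes)
    sim_rgb = c_rgb @ c_rgb.t()        # class-center similarity structure, RGB
    sim_ir = c_ir @ c_ir.t()           # class-center similarity structure, IR
    return F.mse_loss(sim_rgb, sim_ir) # pull the two geometries together

rgb, ir = torch.randn(32, 256), torch.randn(32, 256)
labels = torch.randint(0, 8, (32,))
print(center_geometry_alignment(rgb, ir, labels, num_classes=8).item())
</code></pre>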
	]]></content:encoded>

	<dc:title>GLCN: Graph-Aware Locality-Enhanced Cross-Modality Re-ID Network</dc:title>
			<dc:creator>Junjie Cao</dc:creator>
			<dc:creator>Yuhang Yu</dc:creator>
			<dc:creator>Rong Rong</dc:creator>
			<dc:creator>Xing Xie</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010042</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-13</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-13</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>42</prism:startingPage>
		<prism:doi>10.3390/jimaging12010042</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/42</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/41">

	<title>J. Imaging, Vol. 12, Pages 41: Calibrated Transformer Fusion for Dual-View Low-Energy CESM Classification</title>
	<link>https://www.mdpi.com/2313-433X/12/1/41</link>
	<description>Contrast-enhanced spectral mammography (CESM) provides low-energy images acquired in standard craniocaudal (CC) and mediolateral oblique (MLO) views, and clinical interpretation relies on integrating both views. This study proposes a dual-view classification framework that combines deep CNN feature extraction with transformer-based fusion for breast-side classification using low-energy (DM) images from CESM acquisitions (Normal vs. Tumorous; benign and malignant merged). The evaluation was conducted using 5-fold stratified group cross-validation with patient-level grouping to prevent leakage across folds. The final configuration (Model E) integrates dual-backbone feature extraction, transformer fusion, MC-dropout inference for uncertainty estimation, and post hoc logistic calibration. Across the five held-out test folds, Model E achieved a mean accuracy of 96.88% ± 2.39% and a mean F1-score of 97.68% ± 1.66%. The mean ROC-AUC and PR-AUC were 0.9915 ± 0.0098 and 0.9968 ± 0.0029, respectively. Probability quality was supported by a mean Brier score of 0.0236 ± 0.0145 and a mean expected calibration error (ECE) of 0.0334 ± 0.0171. An ablation study (Models A–E) was also reported to quantify the incremental contribution of dual-view input, transformer fusion, and uncertainty calibration. Within the limits of this retrospective single-center setting, these results suggest that dual-view transformer fusion can provide strong discrimination while also producing calibrated probabilities and uncertainty outputs that are relevant for decision support.</description>
	<pubDate>2026-01-13</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 41: Calibrated Transformer Fusion for Dual-View Low-Energy CESM Classification</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/41">doi: 10.3390/jimaging12010041</a></p>
	<p>Authors:
		Ahmed Alkurdi
		Amira Sallow
		</p>
	<p>Contrast-enhanced spectral mammography (CESM) provides low-energy images acquired in standard craniocaudal (CC) and mediolateral oblique (MLO) views, and clinical interpretation relies on integrating both views. This study proposes a dual-view classification framework that combines deep CNN feature extraction with transformer-based fusion for breast-side classification using low-energy (DM) images from CESM acquisitions (Normal vs. Tumorous; benign and malignant merged). The evaluation was conducted using 5-fold stratified group cross-validation with patient-level grouping to prevent leakage across folds. The final configuration (Model E) integrates dual-backbone feature extraction, transformer fusion, MC-dropout inference for uncertainty estimation, and post hoc logistic calibration. Across the five held-out test folds, Model E achieved a mean accuracy of 96.88% ± 2.39% and a mean F1-score of 97.68% ± 1.66%. The mean ROC-AUC and PR-AUC were 0.9915 ± 0.0098 and 0.9968 ± 0.0029, respectively. Probability quality was supported by a mean Brier score of 0.0236 ± 0.0145 and a mean expected calibration error (ECE) of 0.0334 ± 0.0171. An ablation study (Models A–E) was also reported to quantify the incremental contribution of dual-view input, transformer fusion, and uncertainty calibration. Within the limits of this retrospective single-center setting, these results suggest that dual-view transformer fusion can provide strong discrimination while also producing calibrated probabilities and uncertainty outputs that are relevant for decision support.</p>
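	<p>The patient-grouped, stratified 5-fold protocol described above can be sketched with scikit-learn's StratifiedGroupKFold; the synthetic labels and patient identifiers below are placeholders, not the study's data.</p>
	<pre><code>
# Sketch of patient-grouped, stratified 5-fold splitting to prevent leakage across folds.
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)          # 0 = Normal, 1 = Tumorous (placeholder)
patient_ids = rng.integers(0, 60, size=200)    # several acquisitions per patient
X = np.arange(200).reshape(-1, 1)              # stand-in for image indices

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, labels, groups=patient_ids)):
    shared = set(patient_ids[train_idx]).intersection(patient_ids[test_idx])
    print(f"fold {fold}: {len(shared)} patients shared across train/test")  # expected 0
</code></pre>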
	]]></content:encoded>

	<dc:title>Calibrated Transformer Fusion for Dual-View Low-Energy CESM Classification</dc:title>
			<dc:creator>Ahmed Alkurdi</dc:creator>
			<dc:creator>Amira Sallow</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010041</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-13</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-13</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>41</prism:startingPage>
		<prism:doi>10.3390/jimaging12010041</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/41</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/40">

	<title>J. Imaging, Vol. 12, Pages 40: A Dual-UNet Diffusion Framework for Personalized Panoramic Generation</title>
	<link>https://www.mdpi.com/2313-433X/12/1/40</link>
	<description>While text-to-image and customized generation methods demonstrate strong capabilities in single-image generation, they fall short in supporting immersive applications that require coherent 360° panoramas. Conversely, existing panorama generation models lack customization capabilities. In panoramic scenes, reference objects often appear as minor background elements and may be multiple in number, while reference images across different views exhibit weak correlations. To address these challenges, we propose a diffusion-based framework for customized multi-view image generation. Our approach introduces a decoupled feature injection mechanism within a dual-UNet architecture to handle weakly correlated reference images, effectively integrating spatial information by concurrently feeding both reference images and noise into the denoising branch. A hybrid attention mechanism enables deep fusion of reference features and multi-view representations. Furthermore, a data augmentation strategy facilitates viewpoint-adaptive pose adjustments, and panoramic coordinates are employed to guide multi-view attention. The experimental results demonstrate our model’s effectiveness in generating coherent, high-quality customized multi-view images.</description>
	<pubDate>2026-01-11</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 40: A Dual-UNet Diffusion Framework for Personalized Panoramic Generation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/40">doi: 10.3390/jimaging12010040</a></p>
	<p>Authors:
		Jing Shen
		Leigang Huo
		Chunlei Huo
		Shiming Xiang
		</p>
	<p>While text-to-image and customized generation methods demonstrate strong capabilities in single-image generation, they fall short in supporting immersive applications that require coherent 360° panoramas. Conversely, existing panorama generation models lack customization capabilities. In panoramic scenes, reference objects often appear as minor background elements and may be multiple in number, while reference images across different views exhibit weak correlations. To address these challenges, we propose a diffusion-based framework for customized multi-view image generation. Our approach introduces a decoupled feature injection mechanism within a dual-UNet architecture to handle weakly correlated reference images, effectively integrating spatial information by concurrently feeding both reference images and noise into the denoising branch. A hybrid attention mechanism enables deep fusion of reference features and multi-view representations. Furthermore, a data augmentation strategy facilitates viewpoint-adaptive pose adjustments, and panoramic coordinates are employed to guide multi-view attention. The experimental results demonstrate our model’s effectiveness in generating coherent, high-quality customized multi-view images.</p>
	]]></content:encoded>

	<dc:title>A Dual-UNet Diffusion Framework for Personalized Panoramic Generation</dc:title>
			<dc:creator>Jing Shen</dc:creator>
			<dc:creator>Leigang Huo</dc:creator>
			<dc:creator>Chunlei Huo</dc:creator>
			<dc:creator>Shiming Xiang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010040</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-11</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-11</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>40</prism:startingPage>
		<prism:doi>10.3390/jimaging12010040</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/40</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/39">

	<title>J. Imaging, Vol. 12, Pages 39: Self-Supervised Learning of Deep Embeddings for Classification and Identification of Dental Implants</title>
	<link>https://www.mdpi.com/2313-433X/12/1/39</link>
	<description>This study proposes an automated system using deep learning-based object detection to identify implant systems, leveraging recent progress in self-supervised learning, specifically masked image modeling (MIM). We advocate for self-pre-training, emphasizing its advantages when acquiring suitable pre-training data is challenging. The proposed Masked Deep Embedding (MDE) pre-training method, extending the masked autoencoder (MAE) transformer, significantly enhances dental implant detection performance compared to baselines. Specifically, the proposed method achieves a best detection performance of AP = 96.1, outperforming supervised ViT and MAE baselines by up to +2.9 AP. In addition, we address the absence of a comprehensive dataset for implant design, enhancing an existing dataset under dental expert supervision. This augmentation includes annotations for implant design, such as coronal, middle, and apical parts, resulting in a unique Implant Design Dataset (IDD). The contributions encompass employing self-supervised learning for limited dental radiograph data, replacing MAE’s patch reconstruction with patch embeddings, achieving substantial performance improvement in implant detection, and expanding possibilities through the labeling of implant design. This study paves the way for AI-driven solutions in implant dentistry, providing valuable tools for dentists and patients facing implant-related challenges.</description>
	<pubDate>2026-01-09</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 39: Self-Supervised Learning of Deep Embeddings for Classification and Identification of Dental Implants</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/39">doi: 10.3390/jimaging12010039</a></p>
	<p>Authors:
		Amani Almalki
		Abdulrahman Almalki
		Longin Jan Latecki
		</p>
	<p>This study proposes an automated system using deep learning-based object detection to identify implant systems, leveraging recent progress in self-supervised learning, specifically masked image modeling (MIM). We advocate for self-pre-training, emphasizing its advantages when acquiring suitable pre-training data is challenging. The proposed Masked Deep Embedding (MDE) pre-training method, extending the masked autoencoder (MAE) transformer, significantly enhances dental implant detection performance compared to baselines. Specifically, the proposed method achieves a best detection performance of AP = 96.1, outperforming supervised ViT and MAE baselines by up to +2.9 AP. In addition, we address the absence of a comprehensive dataset for implant design, enhancing an existing dataset under dental expert supervision. This augmentation includes annotations for implant design, such as coronal, middle, and apical parts, resulting in a unique Implant Design Dataset (IDD). The contributions encompass employing self-supervised learning for limited dental radiograph data, replacing MAE’s patch reconstruction with patch embeddings, achieving substantial performance improvement in implant detection, and expanding possibilities through the labeling of implant design. This study paves the way for AI-driven solutions in implant dentistry, providing valuable tools for dentists and patients facing implant-related challenges.</p>
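	<p>A loose sketch of the stated idea of regressing patch embeddings rather than raw pixels in masked pretraining; the shapes, masking ratio, frozen linear embedder, and random stand-in predictions below are all assumptions, not the MDE design.</p>
	<pre><code>
# Illustrative loss that targets patch embeddings (instead of pixels) on masked patches.
import torch
import torch.nn as nn

batch, num_patches = 8, 196
patch_dim, embed_dim = 16 * 16 * 3, 384                      # flattened 16x16 RGB patches
patches = torch.randn(batch, num_patches, patch_dim)

embedder = nn.Linear(patch_dim, embed_dim).requires_grad_(False)  # frozen embedding targets
decoder_out = torch.randn(batch, num_patches, embed_dim)      # stand-in for model predictions

masked = torch.rand(batch, num_patches) > 0.25                # roughly 75% of patches masked
targets = embedder(patches)
loss = ((decoder_out - targets) ** 2)[masked].mean()          # regression only on masked patches
print(loss.item())
</code></pre>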
	]]></content:encoded>

	<dc:title>Self-Supervised Learning of Deep Embeddings for Classification and Identification of Dental Implants</dc:title>
			<dc:creator>Amani Almalki</dc:creator>
			<dc:creator>Abdulrahman Almalki</dc:creator>
			<dc:creator>Longin Jan Latecki</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010039</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-09</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-09</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>39</prism:startingPage>
		<prism:doi>10.3390/jimaging12010039</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/39</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/38">

	<title>J. Imaging, Vol. 12, Pages 38: SCT-Diff: Seamless Contextual Tracking via Diffusion Trajectory</title>
	<link>https://www.mdpi.com/2313-433X/12/1/38</link>
	<description>Existing detection-based trackers exploit temporal contexts by updating appearance models or modeling target motion. However, the sequential one-shot integration of temporal priors risks amplifying error accumulation, as frame-level template matching restricts comprehensive spatiotemporal analysis. To address this, we propose SCT-Diff, a video-level framework that holistically estimates target trajectories. Specifically, SCT-Diff processes video clips globally via a diffusion model to incorporate bidirectional spatiotemporal awareness, where reverse diffusion steps progressively refine noisy trajectory proposals into optimal predictions. Crucially, SCT-Diff enables iterative correction of historical trajectory hypotheses by observing future contexts within a sliding time window. This closed-loop feedback from future frames preserves temporal consistency and breaks the error propagation chain under complex appearance variations. For joint modeling of appearance and motion dynamics, we formulate trajectories as unified discrete token sequences. The designed Mamba-based expert decoder bridges visual features with language-formulated trajectories, enabling lightweight yet coherent sequence modeling. Extensive experiments demonstrate SCT-Diff’s superior efficiency and performance, achieving 75.4% AO on GOT-10k while maintaining real-time computational efficiency.</description>
	<pubDate>2026-01-09</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 38: SCT-Diff: Seamless Contextual Tracking via Diffusion Trajectory</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/38">doi: 10.3390/jimaging12010038</a></p>
	<p>Authors:
		Guohao Nie
		Xingmei Wang
		Debin Zhang
		He Wang
		</p>
	<p>Existing detection-based trackers exploit temporal contexts by updating appearance models or modeling target motion. However, the sequential one-shot integration of temporal priors risks amplifying error accumulation, as frame-level template matching restricts comprehensive spatiotemporal analysis. To address this, we propose SCT-Diff, a video-level framework that holistically estimates target trajectories. Specifically, SCT-Diff processes video clips globally via a diffusion model to incorporate bidirectional spatiotemporal awareness, where reverse diffusion steps progressively refine noisy trajectory proposals into optimal predictions. Crucially, SCT-Diff enables iterative correction of historical trajectory hypotheses by observing future contexts within a sliding time window. This closed-loop feedback from future frames preserves temporal consistency and breaks the error propagation chain under complex appearance variations. For joint modeling of appearance and motion dynamics, we formulate trajectories as unified discrete token sequences. The designed Mamba-based expert decoder bridges visual features with language-formulated trajectories, enabling lightweight yet coherent sequence modeling. Extensive experiments demonstrate SCT-Diff’s superior efficiency and performance, achieving 75.4% AO on GOT-10k while maintaining real-time computational efficiency.</p>
	]]></content:encoded>

	<dc:title>SCT-Diff: Seamless Contextual Tracking via Diffusion Trajectory</dc:title>
			<dc:creator>Guohao Nie</dc:creator>
			<dc:creator>Xingmei Wang</dc:creator>
			<dc:creator>Debin Zhang</dc:creator>
			<dc:creator>He Wang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010038</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-09</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-09</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>38</prism:startingPage>
		<prism:doi>10.3390/jimaging12010038</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/38</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/37">

	<title>J. Imaging, Vol. 12, Pages 37: Degradation-Aware Multi-Stage Fusion for Underwater Image Enhancement</title>
	<link>https://www.mdpi.com/2313-433X/12/1/37</link>
	<description>Underwater images frequently suffer from color casts, low illumination, and blur due to wavelength-dependent absorption and scattering. We present a practical two-stage, modular, and degradation-aware framework designed for real-time enhancement, prioritizing deployability on edge devices. Stage I employs a lightweight CNN to classify inputs into three dominant degradation classes (color cast, low light, blur) with 91.85% accuracy on an EUVP subset. Stage II applies three scene-specific lightweight enhancement pipelines and fuses their outputs using two alternative learnable modules: a global Linear Fusion and a LiteUNetFusion (spatially adaptive weighting with optional residual correction). Compared to the three single-scene optimizers (average PSNR = 19.0 dB; mean UCIQE ≈ 0.597; mean UIQM ≈ 2.07), the Linear Fusion improves PSNR by +2.6 dB on average and yields roughly +20.7% in UCIQE and +21.0% in UIQM, while maintaining low latency (~90 ms per 640 × 480 frame on an Intel i5-13400F (Intel Corporation, Santa Clara, CA, USA)). The LiteUNetFusion further refines results: it raises PSNR by +1.5 dB over the Linear model (23.1 vs. 21.6 dB), brings modest perceptual gains (UCIQE from 0.72 to 0.74, UIQM 2.5 to 2.8) at a runtime of ≈125 ms per 640 × 480 frame, and better preserves local texture and color consistency in mixed-degradation scenes. We release implementation details for reproducibility and discuss limitations (e.g., occasional blur/noise amplification and domain generalization) together with future directions.</description>
	<pubDate>2026-01-08</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 37: Degradation-Aware Multi-Stage Fusion for Underwater Image Enhancement</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/37">doi: 10.3390/jimaging12010037</a></p>
	<p>Authors:
		Lian Xie
		Hao Chen
		Jin Shu
		</p>
	<p>Underwater images frequently suffer from color casts, low illumination, and blur due to wavelength-dependent absorption and scattering. We present a practical two-stage, modular, and degradation-aware framework designed for real-time enhancement, prioritizing deployability on edge devices. Stage I employs a lightweight CNN to classify inputs into three dominant degradation classes (color cast, low light, blur) with 91.85% accuracy on an EUVP subset. Stage II applies three scene-specific lightweight enhancement pipelines and fuses their outputs using two alternative learnable modules: a global Linear Fusion and a LiteUNetFusion (spatially adaptive weighting with optional residual correction). Compared to the three single-scene optimizers (average PSNR = 19.0 dB; mean UCIQE ≈ 0.597; mean UIQM ≈ 2.07), the Linear Fusion improves PSNR by +2.6 dB on average and yields roughly +20.7% in UCIQE and +21.0% in UIQM, while maintaining low latency (~90 ms per 640 × 480 frame on an Intel i5-13400F (Intel Corporation, Santa Clara, CA, USA)). The LiteUNetFusion further refines results: it raises PSNR by +1.5 dB over the Linear model (23.1 vs. 21.6 dB), brings modest perceptual gains (UCIQE from 0.72 to 0.74, UIQM 2.5 to 2.8) at a runtime of ≈125 ms per 640 × 480 frame, and better preserves local texture and color consistency in mixed-degradation scenes. We release implementation details for reproducibility and discuss limitations (e.g., occasional blur/noise amplification and domain generalization) together with future directions.</p>
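	<p>A minimal sketch of a global linear fusion over the three branch outputs; the softmax-normalised learnable weights below are an assumption about the fusion form, not the paper's exact module.</p>
	<pre><code>
# Illustrative global linear fusion of three scene-specific enhancement outputs.
import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    def __init__(self, num_branches=3):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_branches))  # learnable global weights

    def forward(self, outputs):                  # list of (B, 3, H, W) enhanced images
        w = torch.softmax(self.logits, dim=0)    # convex combination keeps a valid range
        stacked = torch.stack(outputs, dim=0)    # (num_branches, B, 3, H, W)
        return (w.view(-1, 1, 1, 1, 1) * stacked).sum(dim=0)

fusion = LinearFusion()
imgs = [torch.rand(1, 3, 480, 640) for _ in range(3)]
print(fusion(imgs).shape)                        # torch.Size([1, 3, 480, 640])
</code></pre>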
	]]></content:encoded>

	<dc:title>Degradation-Aware Multi-Stage Fusion for Underwater Image Enhancement</dc:title>
			<dc:creator>Lian Xie</dc:creator>
			<dc:creator>Hao Chen</dc:creator>
			<dc:creator>Jin Shu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010037</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-08</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-08</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>37</prism:startingPage>
		<prism:doi>10.3390/jimaging12010037</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/37</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/36">

	<title>J. Imaging, Vol. 12, Pages 36: A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data</title>
	<link>https://www.mdpi.com/2313-433X/12/1/36</link>
	<description>The paper focuses on automated diagnosis of retinal diseases, particularly Age-related Macular Degeneration (AMD) and diabetic retinopathy (DR), using optical coherence tomography (OCT), while addressing three key challenges: disease comorbidity, severe class imbalance, and the lack of strictly paired OCT and fundus data. We propose a hierarchical modular deep learning system designed for multi-label OCT screening with conditional routing to specialized staging modules. To enable DR staging when fundus images are unavailable, we use cross-modal alignment between OCT and fundus representations. This approach involves training a latent bridge that projects OCT embeddings into the fundus feature space. We enhance clinical reliability through per-class threshold calibration and implement quality control checks for OCT-only DR staging. Experiments demonstrate robust multi-label performance (macro-F1 = 0.989 ± 0.006 after per-class threshold calibration) and reliable calibration (ECE = 2.1 ± 0.4%), and OCT-only DR staging is feasible in 96.1% of cases that meet the quality control criterion.</description>
	<pubDate>2026-01-08</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 36: A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/36">doi: 10.3390/jimaging12010036</a></p>
	<p>Authors:
		Ekaterina A. Lopukhova
		Gulnaz M. Idrisova
		Timur R. Mukhamadeev
		Grigory S. Voronkov
		Ruslan V. Kutluyarov
		Elizaveta P. Topolskaya
		</p>
	<p>The paper focuses on automated diagnosis of retinal diseases, particularly Age-related Macular Degeneration (AMD) and diabetic retinopathy (DR), using optical coherence tomography (OCT), while addressing three key challenges: disease comorbidity, severe class imbalance, and the lack of strictly paired OCT and fundus data. We propose a hierarchical modular deep learning system designed for multi-label OCT screening with conditional routing to specialized staging modules. To enable DR staging when fundus images are unavailable, we use cross-modal alignment between OCT and fundus representations. This approach involves training a latent bridge that projects OCT embeddings into the fundus feature space. We enhance clinical reliability through per-class threshold calibration and implement quality control checks for OCT-only DR staging. Experiments demonstrate robust multi-label performance (macro-F1 = 0.989 ± 0.006 after per-class threshold calibration) and reliable calibration (ECE = 2.1 ± 0.4%), and OCT-only DR staging is feasible in 96.1% of cases that meet the quality control criterion.</p>
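	<p>An illustrative sketch of a latent bridge that projects OCT embeddings into a fundus feature space; the MLP shape, dimensions, and cosine alignment objective below are assumptions for illustration, not the paper's implementation.</p>
	<pre><code>
# Illustrative latent bridge mapping OCT embeddings toward a fundus feature space.
import torch
import torch.nn as nn
import torch.nn.functional as F

oct_dim, fundus_dim = 512, 768

bridge = nn.Sequential(
    nn.Linear(oct_dim, 1024), nn.ReLU(),
    nn.Linear(1024, fundus_dim),
)

oct_emb = torch.randn(16, oct_dim)        # stand-in embeddings from an OCT encoder
fundus_emb = torch.randn(16, fundus_dim)  # stand-in target fundus embeddings

projected = bridge(oct_emb)
loss = 1.0 - F.cosine_similarity(projected, fundus_emb, dim=1).mean()  # alignment loss
loss.backward()
print(loss.item())
</code></pre>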
	]]></content:encoded>

	<dc:title>A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data</dc:title>
			<dc:creator>Ekaterina A. Lopukhova</dc:creator>
			<dc:creator>Gulnaz M. Idrisova</dc:creator>
			<dc:creator>Timur R. Mukhamadeev</dc:creator>
			<dc:creator>Grigory S. Voronkov</dc:creator>
			<dc:creator>Ruslan V. Kutluyarov</dc:creator>
			<dc:creator>Elizaveta P. Topolskaya</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010036</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-08</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-08</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>36</prism:startingPage>
		<prism:doi>10.3390/jimaging12010036</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/36</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/35">

	<title>J. Imaging, Vol. 12, Pages 35: Comparison of the Radiomics Features of Normal-Appearing White Matter in Persons with High or Low Perivascular Space Scores</title>
	<link>https://www.mdpi.com/2313-433X/12/1/35</link>
	<description>The clinical significance of perivascular spaces (PVS) remains controversial. Radiomics refers to the extraction of quantitative features from medical images using pixel-based computational approaches. This study aimed to compare the radiomics features of normal-appearing white matter (NAWM) in patients with low and high PVS scores to reveal microstructural differences that are not visible macroscopically. Adult patients who underwent cranial MRI over a one-month period were retrospectively screened and divided into two groups according to their global PVS score. Radiomics feature extraction from NAWM was performed at the level of the centrum semiovale on FLAIR and ADC images. Radiomics features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression during the initial model development phase, and predefined radiomics scores were evaluated for both sequences. A total of 160 patients were included in the study. Radiomics scores derived from normal-appearing white matter demonstrated good discriminative performance for differentiating high vs. low perivascular space (PVS) burden (AUC = 0.853 for FLAIR and AUC = 0.753 for ADC). In age- and scanner-adjusted multivariable models, radiomics scores remained independently associated with high PVS burden. These findings suggest that radiomics analysis of NAWM can capture subtle white matter alterations associated with PVS burden and may serve as a non-invasive biomarker for early detection of microvascular and inflammatory changes.</description>
	<pubDate>2026-01-08</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 35: Comparison of the Radiomics Features of Normal-Appearing White Matter in Persons with High or Low Perivascular Space Scores</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/35">doi: 10.3390/jimaging12010035</a></p>
	<p>Authors:
		Onural Ozturk
		Sibel Balci
		Seda Ozturk
		</p>
	<p>The clinical significance of perivascular spaces (PVS) remains controversial. Radiomics refers to the extraction of quantitative features from medical images using pixel-based computational approaches. This study aimed to compare the radiomics features of normal-appearing white matter (NAWM) in patients with low and high PVS scores to reveal microstructural differences that are not visible macroscopically. Adult patients who underwent cranial MRI over a one-month period were retrospectively screened and divided into two groups according to their global PVS score. Radiomics feature extraction from NAWM was performed at the level of the centrum semiovale on FLAIR and ADC images. Radiomics features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression during the initial model development phase, and predefined radiomics scores were evaluated for both sequences. A total of 160 patients were included in the study. Radiomics scores derived from normal-appearing white matter demonstrated good discriminative performance for differentiating high vs. low perivascular space (PVS) burden (AUC = 0.853 for FLAIR and AUC = 0.753 for ADC). In age- and scanner-adjusted multivariable models, radiomics scores remained independently associated with high PVS burden. These findings suggest that radiomics analysis of NAWM can capture subtle white matter alterations associated with PVS burden and may serve as a non-invasive biomarker for early detection of microvascular and inflammatory changes.</p>
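	<p>A small scikit-learn sketch of LASSO-based feature selection followed by a linear radiomics score, as described above; the synthetic feature matrix, labels, and cross-validation settings are placeholders, not the study's data or tuning.</p>
	<pre><code>
# Sketch of LASSO feature selection and a linear radiomics score on synthetic data.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(160, 100))                  # 160 patients, 100 radiomics features
y = rng.integers(0, 2, size=160)                 # 1 = high PVS burden, 0 = low (placeholder)

Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)

selected = np.flatnonzero(lasso.coef_)           # features surviving the L1 penalty
rad_score = Xs @ lasso.coef_ + lasso.intercept_  # linear radiomics score per patient
print(len(selected), roc_auc_score(y, rad_score))  # AUC is near 0.5 on this random data
</code></pre>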
	]]></content:encoded>

	<dc:title>Comparison of the Radiomics Features of Normal-Appearing White Matter in Persons with High or Low Perivascular Space Scores</dc:title>
			<dc:creator>Onural Ozturk</dc:creator>
			<dc:creator>Sibel Balci</dc:creator>
			<dc:creator>Seda Ozturk</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010035</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-08</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-08</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Communication</prism:section>
	<prism:startingPage>35</prism:startingPage>
		<prism:doi>10.3390/jimaging12010035</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/35</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/34">

	<title>J. Imaging, Vol. 12, Pages 34: Empirical Evaluation of UNet for Segmentation of Applicable Surfaces for Seismic Sensor Installation</title>
	<link>https://www.mdpi.com/2313-433X/12/1/34</link>
	<description>The deployment of wireless seismic nodal systems necessitates the efficient identification of optimal locations for sensor installation, considering factors such as ground stability and the absence of interference. Semantic segmentation of satellite imagery has advanced significantly, yet its application to this specific task remains unexplored. This work presents a baseline empirical evaluation of the U-Net architecture for the semantic segmentation of surfaces applicable for seismic sensor installation. We utilize a novel dataset of Sentinel-2 multispectral images, specifically labeled for this purpose. The study investigates the impact of pretrained encoders (EfficientNetB2, Cross-Stage Partial Darknet53 (CSPDarknet53), and Multi-Axis Vision Transformer (MAxViT)), different combinations of Sentinel-2 spectral bands (Red, Green, Blue (RGB), RGB+Near Infrared (NIR), 10-bands with 10 and 20 m/pix spatial resolution, full 13-band), and a technique for improving small object segmentation by modifying the input convolutional layer stride. Experimental results demonstrate that the CSPDarknet53 encoder generally outperforms the others (IoU = 0.534, Precision = 0.716, Recall = 0.635). The combination of RGB and Near-Infrared bands (10 m/pixel resolution) yielded the most robust performance across most configurations. Reducing the input stride from 2 to 1 proved beneficial for segmenting small linear objects like roads. The findings establish a baseline for this novel task and provide practical insights for optimizing deep learning models in the context of automated seismic nodal network installation planning.</description>
	<pubDate>2026-01-08</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 34: Empirical Evaluation of UNet for Segmentation of Applicable Surfaces for Seismic Sensor Installation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/34">doi: 10.3390/jimaging12010034</a></p>
	<p>Authors:
		Mikhail Uzdiaev
		Marina Astapova
		Andrey Ronzhin
		Aleksandra Figurek
		</p>
	<p>The deployment of wireless seismic nodal systems necessitates the efficient identification of optimal locations for sensor installation, considering factors such as ground stability and the absence of interference. Semantic segmentation of satellite imagery has advanced significantly, yet its application to this specific task remains unexplored. This work presents a baseline empirical evaluation of the U-Net architecture for the semantic segmentation of surfaces applicable for seismic sensor installation. We utilize a novel dataset of Sentinel-2 multispectral images, specifically labeled for this purpose. The study investigates the impact of pretrained encoders (EfficientNetB2, Cross-Stage Partial Darknet53 (CSPDarknet53), and Multi-Axis Vision Transformer (MAxViT)), different combinations of Sentinel-2 spectral bands (Red, Green, Blue (RGB), RGB+Near Infrared (NIR), 10-bands with 10 and 20 m/pix spatial resolution, full 13-band), and a technique for improving small object segmentation by modifying the input convolutional layer stride. Experimental results demonstrate that the CSPDarknet53 encoder generally outperforms the others (IoU = 0.534, Precision = 0.716, Recall = 0.635). The combination of RGB and Near-Infrared bands (10 m/pixel resolution) yielded the most robust performance across most configurations. Reducing the input stride from 2 to 1 proved beneficial for segmenting small linear objects like roads. The findings establish a baseline for this novel task and provide practical insights for optimizing deep learning models in the context of automated seismic nodal network installation planning.</p>
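	<p>The stride modification can be illustrated with a toy PyTorch stem; the 4-channel input and layer sizes below are stand-ins for the pretrained encoders evaluated in the paper, not their actual architectures.</p>
	<pre><code>
# Reducing the first convolution's stride from 2 to 1 keeps full spatial resolution,
# which helps small, thin objects (e.g., roads) survive the early downsampling.
import torch
import torch.nn as nn

stem = nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=1)  # e.g. RGB+NIR input (toy stem)
x = torch.randn(1, 4, 256, 256)
print(stem(x).shape)            # torch.Size([1, 32, 128, 128])

stem.stride = (1, 1)            # input stride changed from 2 to 1
print(stem(x).shape)            # torch.Size([1, 32, 256, 256])
</code></pre>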
	]]></content:encoded>

	<dc:title>Empirical Evaluation of UNet for Segmentation of Applicable Surfaces for Seismic Sensor Installation</dc:title>
			<dc:creator>Mikhail Uzdiaev</dc:creator>
			<dc:creator>Marina Astapova</dc:creator>
			<dc:creator>Andrey Ronzhin</dc:creator>
			<dc:creator>Aleksandra Figurek</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010034</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-08</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-08</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>34</prism:startingPage>
		<prism:doi>10.3390/jimaging12010034</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/34</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/33">

	<title>J. Imaging, Vol. 12, Pages 33: A Unified Complex-Fresnel Model for Physically Based Long-Wave Infrared Imaging and Simulation</title>
	<link>https://www.mdpi.com/2313-433X/12/1/33</link>
	<description>Accurate modelling of reflection, transmission, absorption, and emission at material interfaces is essential for infrared imaging, rendering, and the simulation of optical and sensing systems. This need is particularly pronounced across the short-wave to long-wave infrared (SWIR–LWIR) spectrum, where many materials exhibit dispersion- and wavelength-dependent attenuation described by complex refractive indices. In this work, we introduce a unified formulation of the full Fresnel equations that directly incorporates wavelength-dependent complex refractive-index data and provides physically consistent interface behaviour for both dielectrics and conductors. The approach reformulates the classical Fresnel expressions to eliminate sign ambiguities and numerical instabilities, resulting in a stable evaluation across incidence angles and for strongly absorbing materials. We demonstrate the model through spectral-rendering simulations that illustrate realistic reflectance and transmittance behaviour for materials with different infrared optical properties. To assess its suitability for thermal-infrared applications, we also compare the simulated long-wave emission of a heated glass sphere with measurements from a LWIR camera. The agreement between measured and simulated radiometric trends indicates that the proposed formulation offers a practical and physically grounded tool for wavelength-parametric interface modelling in infrared imaging, supporting applications in spectral rendering, synthetic data generation, and infrared system analysis.</description>
	<pubDate>2026-01-07</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 33: A Unified Complex-Fresnel Model for Physically Based Long-Wave Infrared Imaging and Simulation</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/33">doi: 10.3390/jimaging12010033</a></p>
	<p>Authors:
		Peter ter Heerdt
		William Keustermans
		Ivan De Boi
		Steve Vanlanduit
		</p>
	<p>Accurate modelling of reflection, transmission, absorption, and emission at material interfaces is essential for infrared imaging, rendering, and the simulation of optical and sensing systems. This need is particularly pronounced across the short-wave to long-wave infrared (SWIR–LWIR) spectrum, where many materials exhibit dispersion- and wavelength-dependent attenuation described by complex refractive indices. In this work, we introduce a unified formulation of the full Fresnel equations that directly incorporates wavelength-dependent complex refractive-index data and provides physically consistent interface behaviour for both dielectrics and conductors. The approach reformulates the classical Fresnel expressions to eliminate sign ambiguities and numerical instabilities, resulting in a stable evaluation across incidence angles and for strongly absorbing materials. We demonstrate the model through spectral-rendering simulations that illustrate realistic reflectance and transmittance behaviour for materials with different infrared optical properties. To assess its suitability for thermal-infrared applications, we also compare the simulated long-wave emission of a heated glass sphere with measurements from a LWIR camera. The agreement between measured and simulated radiometric trends indicates that the proposed formulation offers a practical and physically grounded tool for wavelength-parametric interface modelling in infrared imaging, supporting applications in spectral rendering, synthetic data generation, and infrared system analysis.</p>
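	<p>A minimal NumPy sketch of Fresnel reflectance with a complex refractive index, using the complex square root to stay well behaved for absorbing media; the example index value and the unpolarised average are illustrative choices, not the paper's full formulation.</p>
	<pre><code>
# Fresnel power reflectance at an air-material interface with complex n2 = n + ik.
import numpy as np

def fresnel_reflectance(n1, n2, theta_i):
    cos_i = np.cos(theta_i)
    sin_t = n1 / n2 * np.sin(theta_i)             # Snell's law (complex-valued for absorbers)
    cos_t = np.sqrt(1.0 - sin_t ** 2 + 0j)        # complex sqrt avoids instabilities
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    return 0.5 * (abs(r_s) ** 2 + abs(r_p) ** 2)  # unpolarised average

# Illustrative complex index (not measured data) at 30 degrees incidence.
R = fresnel_reflectance(1.0, 2.2 + 0.4j, np.deg2rad(30.0))
print(R, 1.0 - R)   # for an opaque sample, 1 - R approximates the directional emissivity
</code></pre>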
	]]></content:encoded>

	<dc:title>A Unified Complex-Fresnel Model for Physically Based Long-Wave Infrared Imaging and Simulation</dc:title>
			<dc:creator>Peter ter Heerdt</dc:creator>
			<dc:creator>William Keustermans</dc:creator>
			<dc:creator>Ivan De Boi</dc:creator>
			<dc:creator>Steve Vanlanduit</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010033</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-07</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-07</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>33</prism:startingPage>
		<prism:doi>10.3390/jimaging12010033</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/33</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/31">

	<title>J. Imaging, Vol. 12, Pages 31: Deep Learning-Assisted Autofocus for Aerial Cameras in Maritime Photography</title>
	<link>https://www.mdpi.com/2313-433X/12/1/31</link>
	<description>To address the unreliable autofocus problem of drone-mounted visible-light aerial cameras in low-contrast maritime environments, this paper proposes an autofocus system that combines deep-learning-based coarse focusing with traditional search-based fine adjustment. The system uses a built-in high-contrast resolution test chart as the signal source. Images captured by the imaging sensor are fed into a lightweight convolutional neural network to regress the defocus distance, enabling fast focus positioning. This avoids the weak signal and inaccurate focusing often encountered when adjusting focus directly on low-contrast sea surfaces. In the fine-focusing stage, a hybrid strategy integrating hill-climbing search and inverse correction is adopted. By evaluating the image sharpness function, the system accurately locks onto the optimal focal plane, forming intelligent closed-loop control. Experiments show that this method, which combines imaging of the built-in calibration target with deep-learning-based coarse focusing, significantly improves focusing efficiency. Compared with traditional full-range search strategies, the focusing speed is increased by approximately 60%. While ensuring high accuracy and strong adaptability, the proposed approach effectively enhances the overall imaging performance of aerial cameras in low-contrast maritime conditions.</description>
	<pubDate>2026-01-07</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 31: Deep Learning-Assisted Autofocus for Aerial Cameras in Maritime Photography</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/31">doi: 10.3390/jimaging12010031</a></p>
	<p>Authors:
		Haiying Liu
		Yingchao Li
		Shilong Xu
		Haoyu Wang
		Qiang Fu
		Huilin Jiang
		</p>
	<p>To address the unreliable autofocus problem of drone-mounted visible-light aerial cameras in low-contrast maritime environments, this paper proposes an autofocus system that combines deep-learning-based coarse focusing with traditional search-based fine adjustment. The system uses a built-in high-contrast resolution test chart as the signal source. Images captured by the imaging sensor are fed into a lightweight convolutional neural network to regress the defocus distance, enabling fast focus positioning. This avoids the weak signal and inaccurate focusing often encountered when adjusting focus directly on low-contrast sea surfaces. In the fine-focusing stage, a hybrid strategy integrating hill-climbing search and inverse correction is adopted. By evaluating the image sharpness function, the system accurately locks onto the optimal focal plane, forming intelligent closed-loop control. Experiments show that this method, which combines imaging of the built-in calibration target with deep-learning-based coarse focusing, significantly improves focusing efficiency. Compared with traditional full-range search strategies, the focusing speed is increased by approximately 60%. While ensuring high accuracy and strong adaptability, the proposed approach effectively enhances the overall imaging performance of aerial cameras in low-contrast maritime conditions.</p>
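	<p>A toy sketch of the fine-focusing idea: hill-climb an image-sharpness score over focus positions and halve the step when the score stops improving; the capture() camera model and the gradient-based sharpness metric are placeholders, not the system described in the paper.</p>
	<pre><code>
# Toy hill-climbing autofocus over a simulated focus axis.
import numpy as np

def sharpness(img):
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(gx ** 2 + gy ** 2))       # higher value = sharper image

def capture(pos):                                   # placeholder camera: best focus at 37
    rng = np.random.default_rng(pos)
    blur = abs(pos - 37)
    return rng.normal(scale=1.0 / (1.0 + blur), size=(64, 64))

def hill_climb(start, step=4, min_step=1):
    pos, best = start, sharpness(capture(start))
    while step >= min_step:
        for candidate in (pos + step, pos - step):  # try both directions
            s = sharpness(capture(candidate))
            if s > best:
                pos, best = candidate, s
                break
        else:
            step //= 2                              # overshoot: refine with a smaller step
    return pos

print(hill_climb(start=10))                         # converges near the toy optimum (37)
</code></pre>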
	]]></content:encoded>

	<dc:title>Deep Learning-Assisted Autofocus for Aerial Cameras in Maritime Photography</dc:title>
			<dc:creator>Haiying Liu</dc:creator>
			<dc:creator>Yingchao Li</dc:creator>
			<dc:creator>Shilong Xu</dc:creator>
			<dc:creator>Haoyu Wang</dc:creator>
			<dc:creator>Qiang Fu</dc:creator>
			<dc:creator>Huilin Jiang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010031</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-07</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-07</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>31</prism:startingPage>
		<prism:doi>10.3390/jimaging12010031</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/31</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/30">

	<title>J. Imaging, Vol. 12, Pages 30: From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification</title>
	<link>https://www.mdpi.com/2313-433X/12/1/30</link>
	<description>Automated animal identification is a practical task for reuniting lost pets with their owners, yet current systems often struggle due to limited dataset scale and reliance on unimodal visual cues. This study introduces a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. We constructed a massive training corpus of 1.9 million photographs covering 695,091 unique animals to support this investigation. Through systematic ablation studies, we identified SigLIP2-Giant and E5-Small-v2 as the optimal vision and text backbones. We further evaluated fusion strategies ranging from simple concatenation to adaptive gating to determine the best method for integrating these modalities. Our proposed approach utilizes a gated fusion mechanism and achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422 on a comprehensive test protocol. These results represent an 11% improvement over leading unimodal baselines and demonstrate that integrating synthesized semantic descriptions significantly refines decision boundaries in large-scale pet re-identification.</description>
	<pubDate>2026-01-07</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 30: From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/30">doi: 10.3390/jimaging12010030</a></p>
	<p>Authors:
		Vasiliy Kudryavtsev
		Kirill Borodin
		German Berezin
		Kirill Bubenchikov
		Grach Mkrtchian
		Alexander Ryzhkov
		</p>
	<p>Automated animal identification is a practical task for reuniting lost pets with their owners, yet current systems often struggle due to limited dataset scale and reliance on unimodal visual cues. This study introduces a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. We constructed a massive training corpus of 1.9 million photographs covering 695,091 unique animals to support this investigation. Through systematic ablation studies, we identified SigLIP2-Giant and E5-Small-v2 as the optimal vision and text backbones. We further evaluated fusion strategies ranging from simple concatenation to adaptive gating to determine the best method for integrating these modalities. Our proposed approach utilizes a gated fusion mechanism and achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422 on a comprehensive test protocol. These results represent an 11% improvement over leading unimodal baselines and demonstrate that integrating synthesized semantic descriptions significantly refines decision boundaries in large-scale pet re-identification.</p>
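	<p>An illustrative gated fusion of a vision embedding and a text embedding into a joint identity descriptor; the projection dimensions and single sigmoid gate below are assumptions, not the exact module evaluated in the paper.</p>
	<pre><code>
# Illustrative gated fusion of visual and semantic (text) embeddings.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, img_dim=1536, txt_dim=384, out_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, out_dim)
        self.txt_proj = nn.Linear(txt_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(2 * out_dim, out_dim), nn.Sigmoid())

    def forward(self, img_emb, txt_emb):
        v, t = self.img_proj(img_emb), self.txt_proj(txt_emb)
        g = self.gate(torch.cat([v, t], dim=-1))   # per-dimension mixing weights
        return g * v + (1.0 - g) * t               # adaptive visual/semantic balance

fusion = GatedFusion()
print(fusion(torch.randn(4, 1536), torch.randn(4, 384)).shape)  # torch.Size([4, 512])
</code></pre>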
	]]></content:encoded>

	<dc:title>From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification</dc:title>
			<dc:creator>Vasiliy Kudryavtsev</dc:creator>
			<dc:creator>Kirill Borodin</dc:creator>
			<dc:creator>German Berezin</dc:creator>
			<dc:creator>Kirill Bubenchikov</dc:creator>
			<dc:creator>Grach Mkrtchian</dc:creator>
			<dc:creator>Alexander Ryzhkov</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010030</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-07</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-07</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>30</prism:startingPage>
		<prism:doi>10.3390/jimaging12010030</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/30</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/32">

	<title>J. Imaging, Vol. 12, Pages 32: Hybrid Skeleton-Based Motion Templates for Cross-View and Appearance-Robust Gait Recognition</title>
	<link>https://www.mdpi.com/2313-433X/12/1/32</link>
	<description>Gait recognition methods based on silhouette templates, such as the Gait Energy Image (GEI), achieve high accuracy under controlled conditions but often degrade when appearance varies due to viewpoint, clothing, or carried objects. In contrast, skeleton-based approaches provide interpretable motion cues but remain sensitive to pose-estimation noise. This work proposes two compact 2D skeletal descriptors—Gait Skeleton Images (GSIs)—that encode 3D joint trajectories into line-based and joint-based static templates compatible with standard 2D CNN architectures. A unified processing pipeline is introduced, including skeletal topology normalization, rigid view alignment, orthographic projection, and pixel-level rendering. Core design factors are analyzed on the GRIDDS dataset, where depth-based 3D coordinates provide stable ground truth for evaluating structural choices and rendering parameters. An extensive evaluation is then conducted on the widely used CASIA-B dataset, using 3D coordinates estimated via human pose estimation, to assess robustness under viewpoint, clothing, and carrying covariates. Results show that although GEIs achieve the highest same-view accuracy, GSI variants exhibit reduced degradation under appearance changes and demonstrate greater stability under severe cross-view conditions. These findings indicate that compact skeletal templates can complement appearance-based descriptors and may benefit further from continued advances in 3D human pose estimation.</description>
	<pubDate>2026-01-07</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 32: Hybrid Skeleton-Based Motion Templates for Cross-View and Appearance-Robust Gait Recognition</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/32">doi: 10.3390/jimaging12010032</a></p>
	<p>Authors:
		João Ferreira Nunes
		Pedro Miguel Moreira
		João Manuel R. S. Tavares
		</p>
	<p>Gait recognition methods based on silhouette templates, such as the Gait Energy Image (GEI), achieve high accuracy under controlled conditions but often degrade when appearance varies due to viewpoint, clothing, or carried objects. In contrast, skeleton-based approaches provide interpretable motion cues but remain sensitive to pose-estimation noise. This work proposes two compact 2D skeletal descriptors—Gait Skeleton Images (GSIs)—that encode 3D joint trajectories into line-based and joint-based static templates compatible with standard 2D CNN architectures. A unified processing pipeline is introduced, including skeletal topology normalization, rigid view alignment, orthographic projection, and pixel-level rendering. Core design factors are analyzed on the GRIDDS dataset, where depth-based 3D coordinates provide stable ground truth for evaluating structural choices and rendering parameters. An extensive evaluation is then conducted on the widely used CASIA-B dataset, using 3D coordinates estimated via human pose estimation, to assess robustness under viewpoint, clothing, and carrying covariates. Results show that although GEIs achieve the highest same-view accuracy, GSI variants exhibit reduced degradation under appearance changes and demonstrate greater stability under severe cross-view conditions. These findings indicate that compact skeletal templates can complement appearance-based descriptors and may benefit further from continued advances in 3D human pose estimation.</p>
	]]></content:encoded>

	<dc:title>Hybrid Skeleton-Based Motion Templates for Cross-View and Appearance-Robust Gait Recognition</dc:title>
			<dc:creator>João Ferreira Nunes</dc:creator>
			<dc:creator>Pedro Miguel Moreira</dc:creator>
			<dc:creator>João Manuel R. S. Tavares</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010032</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-07</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-07</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>32</prism:startingPage>
		<prism:doi>10.3390/jimaging12010032</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/32</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/29">

	<title>J. Imaging, Vol. 12, Pages 29: DynMultiDep: A Dynamic Multimodal Fusion and Multi-Scale Time Series Modeling Approach for Depression Detection</title>
	<link>https://www.mdpi.com/2313-433X/12/1/29</link>
	<description>Depression is a prevalent mental disorder that imposes a significant public health burden worldwide. Although multimodal detection methods have shown potential, existing techniques still face two critical bottlenecks: (i) insufficient integration of global patterns and local fluctuations in long-sequence modeling and (ii) static fusion strategies that fail to dynamically adapt to the complementarity and redundancy among modalities. To address these challenges, this paper proposes a dynamic multimodal depression detection framework, DynMultiDep, which combines multi-scale temporal modeling with an adaptive fusion mechanism. The core innovations of DynMultiDep lie in its Multi-scale Temporal Experts Module (MTEM) and Dynamic Multimodal Fusion module (DynMM). On one hand, MTEM employs Mamba experts to extract long-term trend features and utilizes local-window Transformers to capture short-term dynamic fluctuations, achieving adaptive fusion through a long-short routing mechanism. On the other hand, DynMM introduces modality-level and fusion-level dynamic decision-making, selecting critical modality paths and optimizing cross-modal interaction strategies based on input characteristics. The experimental results demonstrate that DynMultiDep outperforms existing state-of-the-art methods in detection performance on two widely used large-scale depression datasets.</description>
	<pubDate>2026-01-06</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 29: DynMultiDep: A Dynamic Multimodal Fusion and Multi-Scale Time Series Modeling Approach for Depression Detection</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/29">doi: 10.3390/jimaging12010029</a></p>
	<p>Authors:
		Jincheng Li
		Menglin Zheng
		Jiongyi Yang
		Yihui Zhan
		Xing Xie
		</p>
	<p>Depression is a prevalent mental disorder that imposes a significant public health burden worldwide. Although multimodal detection methods have shown potential, existing techniques still face two critical bottlenecks: (i) insufficient integration of global patterns and local fluctuations in long-sequence modeling and (ii) static fusion strategies that fail to dynamically adapt to the complementarity and redundancy among modalities. To address these challenges, this paper proposes a dynamic multimodal depression detection framework, DynMultiDep, which combines multi-scale temporal modeling with an adaptive fusion mechanism. The core innovations of DynMultiDep lie in its Multi-scale Temporal Experts Module (MTEM) and Dynamic Multimodal Fusion module (DynMM). On one hand, MTEM employs Mamba experts to extract long-term trend features and utilizes local-window Transformers to capture short-term dynamic fluctuations, achieving adaptive fusion through a long-short routing mechanism. On the other hand, DynMM introduces modality-level and fusion-level dynamic decision-making, selecting critical modality paths and optimizing cross-modal interaction strategies based on input characteristics. The experimental results demonstrate that DynMultiDep outperforms existing state-of-the-art methods in detection performance on two widely used large-scale depression datasets.</p>
	]]></content:encoded>

	<dc:title>DynMultiDep: A Dynamic Multimodal Fusion and Multi-Scale Time Series Modeling Approach for Depression Detection</dc:title>
			<dc:creator>Jincheng Li</dc:creator>
			<dc:creator>Menglin Zheng</dc:creator>
			<dc:creator>Jiongyi Yang</dc:creator>
			<dc:creator>Yihui Zhan</dc:creator>
			<dc:creator>Xing Xie</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010029</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-06</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-06</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>29</prism:startingPage>
		<prism:doi>10.3390/jimaging12010029</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/29</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/28">

	<title>J. Imaging, Vol. 12, Pages 28: Ultrashort Echo Time Quantitative Susceptibility Source Separation in Musculoskeletal System: A Feasibility Study</title>
	<link>https://www.mdpi.com/2313-433X/12/1/28</link>
	<description>This study aims to demonstrate the feasibility of ultrashort echo time (UTE)-based susceptibility source separation for musculoskeletal (MSK) imaging, enabling discrimination between diamagnetic and paramagnetic tissue components, with a particular focus on hemophilic arthropathy (HA). Three key techniques were integrated to achieve UTE-based susceptibility source separation: Iterative decomposition of water and fat with echo asymmetry and least-squares estimation for B0 field estimation, projection onto dipole fields for local field mapping, and χ-separation for quantitative susceptibility mapping (QSM) with source decomposition. A phantom containing varying concentrations of diamagnetic (CaCO3) and paramagnetic (Fe3O4) materials was used to validate the method. In addition, in vivo UTE-QSM scans of the knees and ankles were performed on five HA patients using a 3T clinical MRI scanner. In the phantom, conventional QSM underestimated susceptibility values due to the cancellation effect of the mixed sources. In contrast, source-separated maps provided distinct diamagnetic and paramagnetic susceptibility values that correlated strongly with CaCO3 and Fe3O4 concentrations (r = −0.99 and 0.95, p &lt; 0.05). In vivo, paramagnetic maps enabled improved visualization of hemosiderin deposits in joints of HA patients, which were poorly visualized or obscured in conventional QSM due to susceptibility cancellation by surrounding diamagnetic tissues such as bone. This study demonstrates, for the first time, the feasibility of UTE-based quantitative susceptibility source separation for MSK applications. The approach enhances the detection of paramagnetic substances like hemosiderin in HA and offers potential for improved assessment of bone and joint tissue composition.</description>
	<pubDate>2026-01-06</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 28: Ultrashort Echo Time Quantitative Susceptibility Source Separation in Musculoskeletal System: A Feasibility Study</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/28">doi: 10.3390/jimaging12010028</a></p>
	<p>Authors:
		Sam Sedaghat
		Jin Il Park
		Eddie Fu
		Annette von Drygalski
		Yajun Ma
		Eric Y. Chang
		Jiang Du
		Lorenzo Nardo
		Hyungseok Jang
		</p>
	<p>This study aims to demonstrate the feasibility of ultrashort echo time (UTE)-based susceptibility source separation for musculoskeletal (MSK) imaging, enabling discrimination between diamagnetic and paramagnetic tissue components, with a particular focus on hemophilic arthropathy (HA). Three key techniques were integrated to achieve UTE-based susceptibility source separation: Iterative decomposition of water and fat with echo asymmetry and least-squares estimation for B0 field estimation, projection onto dipole fields for local field mapping, and χ-separation for quantitative susceptibility mapping (QSM) with source decomposition. A phantom containing varying concentrations of diamagnetic (CaCO3) and paramagnetic (Fe3O4) materials was used to validate the method. In addition, in vivo UTE-QSM scans of the knees and ankles were performed on five HA patients using a 3T clinical MRI scanner. In the phantom, conventional QSM underestimated susceptibility values due to the cancellation effect of the mixed sources. In contrast, source-separated maps provided distinct diamagnetic and paramagnetic susceptibility values that correlated strongly with CaCO3 and Fe3O4 concentrations (r = −0.99 and 0.95, p &lt; 0.05). In vivo, paramagnetic maps enabled improved visualization of hemosiderin deposits in joints of HA patients, which were poorly visualized or obscured in conventional QSM due to susceptibility cancellation by surrounding diamagnetic tissues such as bone. This study demonstrates, for the first time, the feasibility of UTE-based quantitative susceptibility source separation for MSK applications. The approach enhances the detection of paramagnetic substances like hemosiderin in HA and offers potential for improved assessment of bone and joint tissue composition.</p>
	]]></content:encoded>

	<dc:title>Ultrashort Echo Time Quantitative Susceptibility Source Separation in Musculoskeletal System: A Feasibility Study</dc:title>
			<dc:creator>Sam Sedaghat</dc:creator>
			<dc:creator>Jin Il Park</dc:creator>
			<dc:creator>Eddie Fu</dc:creator>
			<dc:creator>Annette von Drygalski</dc:creator>
			<dc:creator>Yajun Ma</dc:creator>
			<dc:creator>Eric Y. Chang</dc:creator>
			<dc:creator>Jiang Du</dc:creator>
			<dc:creator>Lorenzo Nardo</dc:creator>
			<dc:creator>Hyungseok Jang</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010028</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-06</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-06</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>28</prism:startingPage>
		<prism:doi>10.3390/jimaging12010028</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/28</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/27">

	<title>J. Imaging, Vol. 12, Pages 27: Vision-Based People Counting and Tracking for Urban Environments</title>
	<link>https://www.mdpi.com/2313-433X/12/1/27</link>
	<description>Population growth and the expansion of urban areas increase the need for intelligent passenger traffic monitoring systems. Accurate estimation of the number of passengers is an important condition for improving the efficiency, safety and quality of transport services. This paper proposes an approach to the automatic detection and counting of people using computer vision and deep learning methods. While YOLOv8 and DeepSORT have been widely explored individually, our contribution lies in a task-specific modification of the DeepSORT tracking pipeline, optimized for dense passenger environments, strong occlusions, and dynamic lighting, as well as in a unified architecture that integrates detection, tracking, and automatic event-log generation. On our new proprietary dataset of 4047 images and 8918 labeled objects, the system achieved 92% detection accuracy and 85% counting accuracy, which confirms the effectiveness of the solution. Compared to Mask R-CNN and DETR, the YOLOv8 model demonstrates an optimal balance between speed, accuracy, and computational efficiency. The results confirm that computer vision can become an efficient and scalable replacement for traditional sensor-based passenger counting systems. The developed architecture (YOLO + Tracking) combines recognition, tracking and counting of people into a single system that automatically generates annotated video streams and event logs. In the future, it is planned to expand the dataset, introduce support for multicamera integration, and adapt the model for embedded devices to improve the accuracy and energy efficiency of the solution in real-world conditions.</description>
	<pubDate>2026-01-05</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 27: Vision-Based People Counting and Tracking for Urban Environments</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/27">doi: 10.3390/jimaging12010027</a></p>
	<p>Authors:
		Daniyar Nurseitov
		Kairat Bostanbekov
		Nazgul Toiganbayeva
		Aidana Zhalgas
		Didar Yedilkhan
		Beibut Amirgaliyev
		</p>
	<p>Population growth and the expansion of urban areas increase the need for intelligent passenger traffic monitoring systems. Accurate estimation of the number of passengers is an important condition for improving the efficiency, safety and quality of transport services. This paper proposes an approach to the automatic detection and counting of people using computer vision and deep learning methods. While YOLOv8 and DeepSORT have been widely explored individually, our contribution lies in a task-specific modification of the DeepSORT tracking pipeline, optimized for dense passenger environments, strong occlusions, and dynamic lighting, as well as in a unified architecture that integrates detection, tracking, and automatic event-log generation. On our new proprietary dataset of 4047 images and 8918 labeled objects, the system achieved 92% detection accuracy and 85% counting accuracy, which confirms the effectiveness of the solution. Compared to Mask R-CNN and DETR, the YOLOv8 model demonstrates an optimal balance between speed, accuracy, and computational efficiency. The results confirm that computer vision can become an efficient and scalable replacement for traditional sensor-based passenger counting systems. The developed architecture (YOLO + Tracking) combines recognition, tracking and counting of people into a single system that automatically generates annotated video streams and event logs. In the future, it is planned to expand the dataset, introduce support for multicamera integration, and adapt the model for embedded devices to improve the accuracy and energy efficiency of the solution in real-world conditions.</p>
	]]></content:encoded>

	<dc:title>Vision-Based People Counting and Tracking for Urban Environments</dc:title>
			<dc:creator>Daniyar Nurseitov</dc:creator>
			<dc:creator>Kairat Bostanbekov</dc:creator>
			<dc:creator>Nazgul Toiganbayeva</dc:creator>
			<dc:creator>Aidana Zhalgas</dc:creator>
			<dc:creator>Didar Yedilkhan</dc:creator>
			<dc:creator>Beibut Amirgaliyev</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010027</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-05</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-05</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>27</prism:startingPage>
		<prism:doi>10.3390/jimaging12010027</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/27</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/26">

	<title>J. Imaging, Vol. 12, Pages 26: A Hierarchical Multi-Resolution Self-Supervised Framework for High-Fidelity 3D Face Reconstruction Using Learnable Gabor-Aware Texture Modeling</title>
	<link>https://www.mdpi.com/2313-433X/12/1/26</link>
	<description>High-fidelity 3D face reconstruction from a single image is challenging, owing to inherently ambiguous depth cues and the strong entanglement of multi-scale facial textures. To address this, we propose a hierarchical multi-resolution self-supervised framework (HMR-Framework), which progressively reconstructs coarse-, medium-, and fine-scale facial geometry through a unified pipeline. A coarse geometric prior is first estimated via 3D morphable model regression, followed by medium-scale refinement using a vertex deformation map constrained by a global–local Markov random field loss to preserve structural coherence. To improve fine-scale fidelity, a learnable Gabor-aware texture enhancement module is proposed to decouple spatial–frequency information and thus improve sensitivity to high-frequency facial attributes. Additionally, we employ a wavelet-based detail perception loss to preserve edge-aware texture features while mitigating noise commonly observed in in-the-wild images. Extensive qualitative and quantitative evaluations on benchmark datasets indicate that the proposed framework provides better fine-detail reconstruction than existing state-of-the-art methods, while maintaining robustness to pose variations. Notably, the hierarchical design increases semantic consistency across multiple geometric scales, providing a functional solution for high-fidelity 3D face reconstruction from monocular images.</description>
	<pubDate>2026-01-05</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 26: A Hierarchical Multi-Resolution Self-Supervised Framework for High-Fidelity 3D Face Reconstruction Using Learnable Gabor-Aware Texture Modeling</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/26">doi: 10.3390/jimaging12010026</a></p>
	<p>Authors:
		Pichet Mareo
		Rerkchai Fooprateepsiri
		</p>
	<p>High-fidelity 3D face reconstruction from a single image is challenging, owing to inherently ambiguous depth cues and the strong entanglement of multi-scale facial textures. To address this, we propose a hierarchical multi-resolution self-supervised framework (HMR-Framework), which progressively reconstructs coarse-, medium-, and fine-scale facial geometry through a unified pipeline. A coarse geometric prior is first estimated via 3D morphable model regression, followed by medium-scale refinement using a vertex deformation map constrained by a global–local Markov random field loss to preserve structural coherence. To improve fine-scale fidelity, a learnable Gabor-aware texture enhancement module is proposed to decouple spatial–frequency information and thus improve sensitivity to high-frequency facial attributes. Additionally, we employ a wavelet-based detail perception loss to preserve edge-aware texture features while mitigating noise commonly observed in in-the-wild images. Extensive qualitative and quantitative evaluations on benchmark datasets indicate that the proposed framework provides better fine-detail reconstruction than existing state-of-the-art methods, while maintaining robustness to pose variations. Notably, the hierarchical design increases semantic consistency across multiple geometric scales, providing a functional solution for high-fidelity 3D face reconstruction from monocular images.</p>
	]]></content:encoded>

	<dc:title>A Hierarchical Multi-Resolution Self-Supervised Framework for High-Fidelity 3D Face Reconstruction Using Learnable Gabor-Aware Texture Modeling</dc:title>
			<dc:creator>Pichet Mareo</dc:creator>
			<dc:creator>Rerkchai Fooprateepsiri</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010026</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-05</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-05</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>26</prism:startingPage>
		<prism:doi>10.3390/jimaging12010026</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/26</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/25">

	<title>J. Imaging, Vol. 12, Pages 25: A Slicer-Independent Framework for Measuring G-Code Accuracy in Medical 3D Printing</title>
	<link>https://www.mdpi.com/2313-433X/12/1/25</link>
	<description>In medical 3D printing, accuracy is critical for fabricating patient-specific implants and anatomical models. Although printer performance has been widely examined, the influence of slicing software on geometric fidelity is less frequently quantified. The slicing step, which converts STL files into printer-readable G-code, may introduce deviations that affect the final printed object. This study aims to quantify slicer-induced G-code deviations by comparing G-code-derived geometries with their reference STL models. Twenty mandibular models were processed using five slicers (PrusaSlicer (version 2.9.1), Cura (version 5.2.2), Simplify3D (version 4.1.2), Slic3r (version 1.3.0), and Fusion 360 (version 2.0.19725)). A custom Python workflow converted the G-code into point clouds and reconstructed STL meshes through XY and Z corrections, marching cubes surface extraction, and volumetric extrusion. A calibration object enabled coordinate normalization across slicers. Accuracy was assessed using Mean Surface Distance (MSD), Root Mean Square (RMS) deviation, and Volume Difference. MSD ranged from 0.071 to 0.095 mm, and RMS deviation from 0.084 to 0.113 mm, depending on the slicer. Volumetric differences were slicer-dependent. PrusaSlicer yielded the highest surface accuracy; Simplify3D and Slic3r showed the best repeatability. Fusion 360 produced the largest deviations. The slicers introduced geometric deviations below 0.1 mm that represent a substantial proportion of the overall error in the FDM workflow.</description>
	<pubDate>2026-01-04</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 25: A Slicer-Independent Framework for Measuring G-Code Accuracy in Medical 3D Printing</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/25">doi: 10.3390/jimaging12010025</a></p>
	<p>Authors:
		Michel Beyer
		Alexandru Burde
		Andreas E. Roser
		Maximiliane Beyer
		Sead Abazi
		Florian M. Thieringer
		</p>
	<p>In medical 3D printing, accuracy is critical for fabricating patient-specific implants and anatomical models. Although printer performance has been widely examined, the influence of slicing software on geometric fidelity is less frequently quantified. The slicing step, which converts STL files into printer-readable G-code, may introduce deviations that affect the final printed object. This study aims to quantify slicer-induced G-code deviations by comparing G-code-derived geometries with their reference STL models. Twenty mandibular models were processed using five slicers (PrusaSlicer (version 2.9.1), Cura (version 5.2.2), Simplify3D (version 4.1.2), Slic3r (version 1.3.0), and Fusion 360 (version 2.0.19725)). A custom Python workflow converted the G-code into point clouds and reconstructed STL meshes through XY and Z corrections, marching cubes surface extraction, and volumetric extrusion. A calibration object enabled coordinate normalization across slicers. Accuracy was assessed using Mean Surface Distance (MSD), Root Mean Square (RMS) deviation, and Volume Difference. MSD ranged from 0.071 to 0.095 mm, and RMS deviation from 0.084 to 0.113 mm, depending on the slicer. Volumetric differences were slicer-dependent. PrusaSlicer yielded the highest surface accuracy; Simplify3D and Slic3r showed the best repeatability. Fusion 360 produced the largest deviations. The slicers introduced geometric deviations below 0.1 mm that represent a substantial proportion of the overall error in the FDM workflow.</p>
	]]></content:encoded>

	<dc:title>A Slicer-Independent Framework for Measuring G-Code Accuracy in Medical 3D Printing</dc:title>
			<dc:creator>Michel Beyer</dc:creator>
			<dc:creator>Alexandru Burde</dc:creator>
			<dc:creator>Andreas E. Roser</dc:creator>
			<dc:creator>Maximiliane Beyer</dc:creator>
			<dc:creator>Sead Abazi</dc:creator>
			<dc:creator>Florian M. Thieringer</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010025</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-04</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-04</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>25</prism:startingPage>
		<prism:doi>10.3390/jimaging12010025</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/25</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/24">

	<title>J. Imaging, Vol. 12, Pages 24: LLM-Based Pose Normalization and Multimodal Fusion for Facial Expression Recognition in Extreme Poses</title>
	<link>https://www.mdpi.com/2313-433X/12/1/24</link>
	<description>Facial expression recognition (FER) technology has progressively matured over time. However, existing FER methods are primarily optimized for frontal face images, and their recognition accuracy significantly degrades when processing profile or large-angle rotated facial images. Consequently, this limitation hinders the practical deployment of FER systems. To mitigate the interference caused by large pose variations and improve recognition accuracy, we propose an FER method based on profile-to-frontal transformation and multimodal learning. Specifically, we first leverage the visual understanding and generation capabilities of Qwen-Image-Edit to transform profile images into frontal viewpoints, preserving key expression features while standardizing facial poses. Second, we introduce the CLIP model to enhance the semantic representation capability of expression features through vision–language joint learning. Qualitative and quantitative experiments on the RAF (89.39%), EXPW (67.17%), and AffectNet-7 (62.66%) datasets demonstrate that our method outperforms existing approaches.</description>
	<pubDate>2026-01-04</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 24: LLM-Based Pose Normalization and Multimodal Fusion for Facial Expression Recognition in Extreme Poses</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/24">doi: 10.3390/jimaging12010024</a></p>
	<p>Authors:
		Bohan Chen
		Bowen Qu
		Yu Zhou
		Han Huang
		Jianing Guo
		Yanning Xian
		Longxiang Ma
		Jinxuan Yu
		Jingyu Chen
		</p>
	<p>Facial expression recognition (FER) technology has progressively matured over time. However, existing FER methods are primarily optimized for frontal face images, and their recognition accuracy significantly degrades when processing profile or large-angle rotated facial images. Consequently, this limitation hinders the practical deployment of FER systems. To mitigate the interference caused by large pose variations and improve recognition accuracy, we propose an FER method based on profile-to-frontal transformation and multimodal learning. Specifically, we first leverage the visual understanding and generation capabilities of Qwen-Image-Edit to transform profile images into frontal viewpoints, preserving key expression features while standardizing facial poses. Second, we introduce the CLIP model to enhance the semantic representation capability of expression features through vision–language joint learning. Qualitative and quantitative experiments on the RAF (89.39%), EXPW (67.17%), and AffectNet-7 (62.66%) datasets demonstrate that our method outperforms existing approaches.</p>
	]]></content:encoded>

	<dc:title>LLM-Based Pose Normalization and Multimodal Fusion for Facial Expression Recognition in Extreme Poses</dc:title>
			<dc:creator>Bohan Chen</dc:creator>
			<dc:creator>Bowen Qu</dc:creator>
			<dc:creator>Yu Zhou</dc:creator>
			<dc:creator>Han Huang</dc:creator>
			<dc:creator>Jianing Guo</dc:creator>
			<dc:creator>Yanning Xian</dc:creator>
			<dc:creator>Longxiang Ma</dc:creator>
			<dc:creator>Jinxuan Yu</dc:creator>
			<dc:creator>Jingyu Chen</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010024</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-04</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-04</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>24</prism:startingPage>
		<prism:doi>10.3390/jimaging12010024</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/24</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/23">

	<title>J. Imaging, Vol. 12, Pages 23: State of the Art of Remote Sensing Data: Gradient Pattern in Pseudocolor Composite Images</title>
	<link>https://www.mdpi.com/2313-433X/12/1/23</link>
	<description>The thematic processing of pseudocolor composite images, especially those created from remote sensing data, is of considerable interest. The set of spectral classes comprising such images is typically described by a nominal scale, meaning the absence of any predetermined relationships between the classes. However, in many cases, images of this type may contain elements of a regular spatial order, one variant of which is a gradient structure. Gradient structures are characterized by a certain regular spatial ordering of spectral classes. Recognizing gradient patterns in the structure of pseudocolor composite images opens up new possibilities for deeper thematic image processing. This article describes an algorithm for analyzing the spatial structure of a pseudocolor composite image to identify gradient patterns. In this process, the initial nominal scale of spectral classes is transformed into a rank scale of the gradient legend. The algorithm is based on the analysis of Moore neighborhoods for each image pixel. This produces an array of the prevalence of all types of local binary patterns (the pixel’s nearest neighbors). All possible variants of the spectral class rank scale composition are then considered. The rank scale variant that describes the largest proportion of image pixels within its gradient order is used as the final result. The user can independently define the criteria for the significance of the gradient order in the analyzed image, focusing either on the overall statistics of the proportion of pixels consistent with the spatial structure of the selected gradient or on the statistics of a selected key image region. The proposed algorithm is illustrated through the analysis of test examples.</description>
	<pubDate>2026-01-04</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 23: State of the Art of Remote Sensing Data: Gradient Pattern in Pseudocolor Composite Images</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/23">doi: 10.3390/jimaging12010023</a></p>
	<p>Authors:
		Alexey Terekhov
		Ravil I. Mukhamediev
		Igor Savin
		</p>
	<p>The thematic processing of pseudocolor composite images, especially those created from remote sensing data, is of considerable interest. The set of spectral classes comprising such images is typically described by a nominal scale, meaning the absence of any predetermined relationships between the classes. However, in many cases, images of this type may contain elements of a regular spatial order, one variant of which is a gradient structure. Gradient structures are characterized by a certain regular spatial ordering of spectral classes. Recognizing gradient patterns in the structure of pseudocolor composite images opens up new possibilities for deeper thematic image processing. This article describes an algorithm for analyzing the spatial structure of a pseudocolor composite image to identify gradient patterns. In this process, the initial nominal scale of spectral classes is transformed into a rank scale of the gradient legend. The algorithm is based on the analysis of Moore neighborhoods for each image pixel. This produces an array of the prevalence of all types of local binary patterns (the pixel’s nearest neighbors). All possible variants of the spectral class rank scale composition are then considered. The rank scale variant that describes the largest proportion of image pixels within its gradient order is used as the final result. The user can independently define the criteria for the significance of the gradient order in the analyzed image, focusing either on the overall statistics of the proportion of pixels consistent with the spatial structure of the selected gradient or on the statistics of a selected key image region. The proposed algorithm is illustrated through the analysis of test examples.</p>
	]]></content:encoded>

	<dc:title>State of the Art of Remote Sensing Data: Gradient Pattern in Pseudocolor Composite Images</dc:title>
			<dc:creator>Alexey Terekhov</dc:creator>
			<dc:creator>Ravil I. Mukhamediev</dc:creator>
			<dc:creator>Igor Savin</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010023</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-04</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-04</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>23</prism:startingPage>
		<prism:doi>10.3390/jimaging12010023</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/23</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/22">

	<title>J. Imaging, Vol. 12, Pages 22: Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images</title>
	<link>https://www.mdpi.com/2313-433X/12/1/22</link>
	<description>This study assesses the efficiency of vision–language models in detecting and classifying carious and non-carious lesions from intraoral photo imaging. A dataset of 172 annotated images was classified for microcavitation, cavitated lesions, staining, calculus, and non-carious lesions. Florence-2, PaLI-Gemma, and YOLOv8 models were trained on the dataset. The dataset was divided into an 80:10:10 split, and model performance was evaluated using mean average precision (mAP), mAP50-95, and class-specific precision and recall. YOLOv8 outperformed the vision–language models, achieving a mean average precision (mAP) of 37% with a precision of 42.3% (with 100% for cavitation detection) and 31.3% recall. PaLI-Gemma produced recall values of 13% and 21%. Florence-2 yielded a mean average precision of 10%, with precision and recall of 51% and 35%, respectively. YOLOv8 achieved the strongest overall performance. Florence-2 and PaLI-Gemma underperformed relative to YOLOv8 despite their potential for multimodal contextual understanding, highlighting the need for larger, more diverse datasets and hybrid architectures to achieve improved performance.</description>
	<pubDate>2026-01-03</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 22: Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/22">doi: 10.3390/jimaging12010022</a></p>
	<p>Authors:
		Maria Jahan
		Al Ibne Siam
		Lamim Zakir Pronay
		Saif Ahmed
		Nabeel Mohammed
		James Dudley
		Taseef Hasan Farook
		</p>
	<p>This study assesses the efficiency of vision–language models in detecting and classifying carious and non-carious lesions from intraoral photo imaging. A dataset of 172 annotated images was classified for microcavitation, cavitated lesions, staining, calculus, and non-carious lesions. Florence-2, PaLI-Gemma, and YOLOv8 models were trained on the dataset. The dataset was divided into an 80:10:10 split, and model performance was evaluated using mean average precision (mAP), mAP50-95, and class-specific precision and recall. YOLOv8 outperformed the vision–language models, achieving a mean average precision (mAP) of 37% with a precision of 42.3% (with 100% for cavitation detection) and 31.3% recall. PaLI-Gemma produced recall values of 13% and 21%. Florence-2 yielded a mean average precision of 10%, with precision and recall of 51% and 35%, respectively. YOLOv8 achieved the strongest overall performance. Florence-2 and PaLI-Gemma underperformed relative to YOLOv8 despite their potential for multimodal contextual understanding, highlighting the need for larger, more diverse datasets and hybrid architectures to achieve improved performance.</p>
	]]></content:encoded>

	<dc:title>Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images</dc:title>
			<dc:creator>Maria Jahan</dc:creator>
			<dc:creator>Al Ibne Siam</dc:creator>
			<dc:creator>Lamim Zakir Pronay</dc:creator>
			<dc:creator>Saif Ahmed</dc:creator>
			<dc:creator>Nabeel Mohammed</dc:creator>
			<dc:creator>James Dudley</dc:creator>
			<dc:creator>Taseef Hasan Farook</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010022</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-03</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-03</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>22</prism:startingPage>
		<prism:doi>10.3390/jimaging12010022</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/22</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/21">

	<title>J. Imaging, Vol. 12, Pages 21: Multi-Temporal Shoreline Monitoring and Analysis in Bangkok Bay, Thailand, Using Remote Sensing and GIS Techniques</title>
	<link>https://www.mdpi.com/2313-433X/12/1/21</link>
	<description>Drastic alterations have been observed in the coastline of Bangkok Bay, Thailand, over the past three decades. Understanding how coastlines change plays a key role in developing strategies for coastal protection and sustainable resource utilization. This study investigates the temporal and spatial changes in the Bangkok Bay coastline, Thailand, using remote sensing and GIS techniques from 1989 to 2024. The historical rate of coastline change for a typical segment was analyzed using the EPR method, and the underlying causes of these changes were discussed. Finally, the variation trend of the total shoreline length and the characteristics of erosion and sedimentation for a typical shoreline in Bangkok Bay, Thailand, over the past 35 years were obtained. An overall increase in coastline length was observed in Bangkok Bay, Thailand, over the 35-year period from 1989 to 2024, with a net gain from 507.23 km to 571.38 km. The rate of growth has transitioned from rapid to slow, with the most significant changes occurring during the period 1989–1994. Additionally, the average and maximum erosion rates for the typical shoreline segment were notably high during 1989–1994, with values of −21.61 m/a and −55.49 m/a, respectively. The maximum sedimentation rate along the coastline was relatively high from 2014 to 2024, reaching 10.57 m/a. Overall, the entire coastline of the Samut Sakhon–Bangkok–Samut Prakan Provinces underwent net erosion from 1989 to 2024, driven by a confluence of natural and anthropogenic factors.</description>
	<pubDate>2026-01-01</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 21: Multi-Temporal Shoreline Monitoring and Analysis in Bangkok Bay, Thailand, Using Remote Sensing and GIS Techniques</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/21">doi: 10.3390/jimaging12010021</a></p>
	<p>Authors:
		Yan Wang
		Adisorn Sirikham
		Jessada Konpang
		Chunguang Li
		</p>
	<p>Drastic alterations have been observed in the coastline of Bangkok Bay, Thailand, over the past three decades. Understanding how coastlines change plays a key role in developing strategies for coastal protection and sustainable resource utilization. This study investigates the temporal and spatial changes in the Bangkok Bay coastline, Thailand, using remote sensing and GIS techniques from 1989 to 2024. The historical rate of coastline change for a typical segment was analyzed using the EPR method, and the underlying causes of these changes were discussed. Finally, the variation trend of the total shoreline length and the characteristics of erosion and sedimentation for a typical shoreline in Bangkok Bay, Thailand, over the past 35 years were obtained. An overall increase in coastline length was observed in Bangkok Bay, Thailand, over the 35-year period from 1989 to 2024, with a net gain from 507.23 km to 571.38 km. The rate of growth has transitioned from rapid to slow, with the most significant changes occurring during the period 1989–1994. Additionally, the average and maximum erosion rates for the typical shoreline segment were notably high during 1989–1994, with values of −21.61 m/a and −55.49 m/a, respectively. The maximum sedimentation rate along the coastline was relatively high from 2014 to 2024, reaching 10.57 m/a. Overall, the entire coastline of the Samut Sakhon–Bangkok–Samut Prakan Provinces underwent net erosion from 1989 to 2024, driven by a confluence of natural and anthropogenic factors.</p>
	]]></content:encoded>

	<dc:title>Multi-Temporal Shoreline Monitoring and Analysis in Bangkok Bay, Thailand, Using Remote Sensing and GIS Techniques</dc:title>
			<dc:creator>Yan Wang</dc:creator>
			<dc:creator>Adisorn Sirikham</dc:creator>
			<dc:creator>Jessada Konpang</dc:creator>
			<dc:creator>Chunguang Li</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010021</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-01</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-01</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>21</prism:startingPage>
		<prism:doi>10.3390/jimaging12010021</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/21</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/20">

	<title>J. Imaging, Vol. 12, Pages 20: Object Detection on Road: Vehicle’s Detection Based on Re-Training Models on NVIDIA-Jetson Platform</title>
	<link>https://www.mdpi.com/2313-433X/12/1/20</link>
	<description>The increasing use of artificial intelligence (AI) and deep learning (DL) techniques has driven advances in vehicle classification and detection applications for embedded devices with deployment constraints due to computational cost and response time. In the case of urban environments with high traffic congestion, such as the city of Lima, it is important to determine the trade-off between model accuracy, type of embedded system, and the dataset used. This study was developed using a methodology adapted from the CRISP-DM approach, which included the acquisition of traffic videos in the city of Lima, their segmentation, and manual labeling. Subsequently, three SSD-based detection models (MobileNetV1-SSD, MobileNetV2-SSD-Lite, and VGG16-SSD) were trained on the NVIDIA Jetson Orin NX 16 GB platform. The results show that the VGG16-SSD model achieved the highest average precision (mAP ≈90.7%), with a longer training time, while the MobileNetV1-SSD (512×512) model achieved comparable performance (mAP ≈90.4%) with a shorter time. Additionally, data augmentation through contrast adjustment improved the detection of minority classes such as Tuk-tuk and Motorcycle. The results indicate that, among the evaluated models, MobileNetV1-SSD (512×512) achieved the best balance between accuracy and computational load for its implementation in ADAS embedded systems in congested urban environments.</description>
	<pubDate>2026-01-01</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 20: Object Detection on Road: Vehicle’s Detection Based on Re-Training Models on NVIDIA-Jetson Platform</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/20">doi: 10.3390/jimaging12010020</a></p>
	<p>Authors:
		Sleiter Ramos-Sanchez
		Jinmi Lezama
		Ricardo Yauri
		Joyce Zevallos
		</p>
	<p>The increasing use of artificial intelligence (AI) and deep learning (DL) techniques has driven advances in vehicle classification and detection applications for embedded devices with deployment constraints due to computational cost and response time. In the case of urban environments with high traffic congestion, such as the city of Lima, it is important to determine the trade-off between model accuracy, type of embedded system, and the dataset used. This study was developed using a methodology adapted from the CRISP-DM approach, which included the acquisition of traffic videos in the city of Lima, their segmentation, and manual labeling. Subsequently, three SSD-based detection models (MobileNetV1-SSD, MobileNetV2-SSD-Lite, and VGG16-SSD) were trained on the NVIDIA Jetson Orin NX 16 GB platform. The results show that the VGG16-SSD model achieved the highest average precision (mAP ≈90.7%), with a longer training time, while the MobileNetV1-SSD (512×512) model achieved comparable performance (mAP ≈90.4%) with a shorter time. Additionally, data augmentation through contrast adjustment improved the detection of minority classes such as Tuk-tuk and Motorcycle. The results indicate that, among the evaluated models, MobileNetV1-SSD (512×512) achieved the best balance between accuracy and computational load for its implementation in ADAS embedded systems in congested urban environments.</p>
	]]></content:encoded>

	<dc:title>Object Detection on Road: Vehicle’s Detection Based on Re-Training Models on NVIDIA-Jetson Platform</dc:title>
			<dc:creator>Sleiter Ramos-Sanchez</dc:creator>
			<dc:creator>Jinmi Lezama</dc:creator>
			<dc:creator>Ricardo Yauri</dc:creator>
			<dc:creator>Joyce Zevallos</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010020</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2026-01-01</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2026-01-01</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>20</prism:startingPage>
		<prism:doi>10.3390/jimaging12010020</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/20</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/19">

	<title>J. Imaging, Vol. 12, Pages 19: Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction</title>
	<link>https://www.mdpi.com/2313-433X/12/1/19</link>
	<description>Existing methods for reconstructing hyperspectral images from single RGB images are hampered by the difficulty of obtaining large numbers of labeled RGB-HSI image pairs. These methods also face issues such as detail loss, insufficient robustness, low reconstruction accuracy, and the difficulty of balancing the spatial–spectral trade-off. To address these challenges, a Double-Gated Mamba Multi-Scale Adaptive Feature (DMMAF) learning network model is proposed. DMMAF designs a reflection dot-product adaptive dual-noise-aware feature extraction method, which is used to supplement edge detail information in spectral images and improve robustness. DMMAF also constructs a deformable attention-based global feature extraction method and a double-gated Mamba local feature extraction approach, enhancing the interaction between local and global information during the reconstruction process and thereby improving image accuracy. Meanwhile, DMMAF introduces a structure-aware smooth loss function, which, by combining smoothing, curvature, and attention supervision losses, effectively resolves the spatial–spectral resolution balance problem. Experiments on three datasets—NTIRE 2020, Harvard, and CAVE—demonstrate that this model achieves state-of-the-art unsupervised reconstruction performance compared with existing advanced algorithms. On the NTIRE 2020 dataset, our method attains MRAE, RMSE, and PSNR values of 0.133, 0.040, and 31.314, respectively. On the Harvard dataset, it achieves RMSE and PSNR values of 0.025 and 34.955, respectively, while on the CAVE dataset, it achieves RMSE and PSNR values of 0.041 and 30.983, respectively.</description>
	<pubDate>2025-12-31</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 19: Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/19">doi: 10.3390/jimaging12010019</a></p>
	<p>Authors:
		Zhongmin Jiang
		Zhen Wang
		Wenju Wang
		Jifan Zhu
		</p>
	<p>Existing methods for reconstructing hyperspectral images from single RGB images are hampered by the difficulty of obtaining large numbers of labeled RGB-HSI image pairs. These methods also face issues such as detail loss, insufficient robustness, low reconstruction accuracy, and the difficulty of balancing the spatial–spectral trade-off. To address these challenges, a Double-Gated Mamba Multi-Scale Adaptive Feature (DMMAF) learning network model is proposed. DMMAF designs a reflection dot-product adaptive dual-noise-aware feature extraction method, which is used to supplement edge detail information in spectral images and improve robustness. DMMAF also constructs a deformable attention-based global feature extraction method and a double-gated Mamba local feature extraction approach, enhancing the interaction between local and global information during the reconstruction process and thereby improving image accuracy. Meanwhile, DMMAF introduces a structure-aware smooth loss function, which, by combining smoothing, curvature, and attention supervision losses, effectively resolves the spatial–spectral resolution balance problem. Experiments on three datasets—NTIRE 2020, Harvard, and CAVE—demonstrate that this model achieves state-of-the-art unsupervised reconstruction performance compared with existing advanced algorithms. On the NTIRE 2020 dataset, our method attains MRAE, RMSE, and PSNR values of 0.133, 0.040, and 31.314, respectively. On the Harvard dataset, it achieves RMSE and PSNR values of 0.025 and 34.955, respectively, while on the CAVE dataset, it achieves RMSE and PSNR values of 0.041 and 30.983, respectively.</p>
	]]></content:encoded>

	<dc:title>Double-Gated Mamba Multi-Scale Adaptive Feature Learning Network for Unsupervised Single RGB Image Hyperspectral Image Reconstruction</dc:title>
			<dc:creator>Zhongmin Jiang</dc:creator>
			<dc:creator>Zhen Wang</dc:creator>
			<dc:creator>Wenju Wang</dc:creator>
			<dc:creator>Jifan Zhu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010019</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-31</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-31</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>19</prism:startingPage>
		<prism:doi>10.3390/jimaging12010019</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/19</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/18">

	<title>J. Imaging, Vol. 12, Pages 18: Revisiting Underwater Image Enhancement for Object Detection: A Unified Quality&amp;ndash;Detection Evaluation Framework</title>
	<link>https://www.mdpi.com/2313-433X/12/1/18</link>
	<description>Underwater images often suffer from severe color distortion, low contrast, and reduced visibility, motivating the widespread use of image enhancement as a preprocessing step for downstream computer vision tasks. However, recent studies have questioned whether enhancement actually improves object detection performance. In this work, we conduct a comprehensive and rigorous evaluation of nine state-of-the-art enhancement methods and their interactions with modern object detectors. We propose a unified evaluation framework that integrates (1) a distribution-level quality assessment using a composite quality index (Q-index), (2) a fine-grained per-image detection protocol based on COCO-style mAP, and (3) a mixed-set upper-bound analysis that quantifies the theoretical performance achievable through ideal selective enhancement. Our findings reveal that traditional image quality metrics do not reliably predict detection performance, and that dataset-level conclusions often overlook substantial image-level variability. Through per-image evaluation, we identify numerous cases in which enhancement significantly improves detection accuracy&amp;mdash;primarily for low-quality inputs&amp;mdash;while also demonstrating conditions under which enhancement degrades performance. The mixed-set analysis shows that selective enhancement can yield substantial gains over both original and fully enhanced datasets, establishing a new direction for designing enhancement models optimized for downstream vision tasks. This study provides the most comprehensive evidence to date that underwater image enhancement can be beneficial for object detection when evaluated at the appropriate granularity and guided by informed selection strategies. The data generated and code developed are publicly available.</description>
	<pubDate>2025-12-30</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 18: Revisiting Underwater Image Enhancement for Object Detection: A Unified Quality&amp;ndash;Detection Evaluation Framework</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/18">doi: 10.3390/jimaging12010018</a></p>
	<p>Authors:
		Ali Awad
		Ashraf Saleem
		Sidike Paheding
		Evan Lucas
		Serein Al-Ratrout
		Timothy C. Havens
		</p>
	<p>Underwater images often suffer from severe color distortion, low contrast, and reduced visibility, motivating the widespread use of image enhancement as a preprocessing step for downstream computer vision tasks. However, recent studies have questioned whether enhancement actually improves object detection performance. In this work, we conduct a comprehensive and rigorous evaluation of nine state-of-the-art enhancement methods and their interactions with modern object detectors. We propose a unified evaluation framework that integrates (1) a distribution-level quality assessment using a composite quality index (Q-index), (2) a fine-grained per-image detection protocol based on COCO-style mAP, and (3) a mixed-set upper-bound analysis that quantifies the theoretical performance achievable through ideal selective enhancement. Our findings reveal that traditional image quality metrics do not reliably predict detection performance, and that dataset-level conclusions often overlook substantial image-level variability. Through per-image evaluation, we identify numerous cases in which enhancement significantly improves detection accuracy&amp;mdash;primarily for low-quality inputs&amp;mdash;while also demonstrating conditions under which enhancement degrades performance. The mixed-set analysis shows that selective enhancement can yield substantial gains over both original and fully enhanced datasets, establishing a new direction for designing enhancement models optimized for downstream vision tasks. This study provides the most comprehensive evidence to date that underwater image enhancement can be beneficial for object detection when evaluated at the appropriate granularity and guided by informed selection strategies. The data generated and code developed are publicly available.</p>
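	<p>The mixed-set upper-bound idea described above can be illustrated with a small Python sketch: an oracle that, for every image, keeps whichever of the original or enhanced version gives the higher per-image AP. This is an editor-added example, not the paper's code, and the per-image AP values are placeholders.</p>
	<pre><code>
# A minimal sketch, under assumed data structures, of a mixed-set
# upper-bound analysis: for every image, keep whichever version
# (original or enhanced) yields the higher per-image detection score.
# The per-image AP values here are placeholders, not results from the paper.

def mixed_set_upper_bound(ap_original, ap_enhanced):
    """Given per-image AP lists for original and enhanced images, return the
    mean AP of an oracle that picks the better version of each image."""
    assert len(ap_original) == len(ap_enhanced)
    best = [max(o, e) for o, e in zip(ap_original, ap_enhanced)]
    return sum(best) / len(best)

# Hypothetical per-image AP values for illustration only.
ap_original = [0.62, 0.40, 0.75, 0.10, 0.55]
ap_enhanced = [0.58, 0.52, 0.70, 0.35, 0.60]

print("original mean AP:", sum(ap_original) / len(ap_original))
print("enhanced mean AP:", sum(ap_enhanced) / len(ap_enhanced))
print("oracle mixed-set mean AP:", mixed_set_upper_bound(ap_original, ap_enhanced))
	</code></pre>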
	]]></content:encoded>

	<dc:title>Revisiting Underwater Image Enhancement for Object Detection: A Unified Quality&amp;ndash;Detection Evaluation Framework</dc:title>
			<dc:creator>Ali Awad</dc:creator>
			<dc:creator>Ashraf Saleem</dc:creator>
			<dc:creator>Sidike Paheding</dc:creator>
			<dc:creator>Evan Lucas</dc:creator>
			<dc:creator>Serein Al-Ratrout</dc:creator>
			<dc:creator>Timothy C. Havens</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010018</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-30</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-30</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>18</prism:startingPage>
		<prism:doi>10.3390/jimaging12010018</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/18</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/17">

	<title>J. Imaging, Vol. 12, Pages 17: Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks</title>
	<link>https://www.mdpi.com/2313-433X/12/1/17</link>
	<description>In this paper, we present a literature review of two deep learning architectures, Convolutional Neural Networks (CNNs) and Capsule Networks (CapsNets), applied to medical images in support of medical decision-making. CNNs have demonstrated their capacity in the medical diagnostic field; however, their reliability decreases under slight spatial variability, which can affect diagnosis, especially since the anatomical structure of the human body can differ from one patient to another. In contrast, CapsNets encode not only feature activations but also spatial relationships, thereby improving the reliability and stability of model generalization. The paper provides a structured comparison by reviewing studies published from 2018 to 2025 across major databases, including IEEE Xplore, ScienceDirect, SpringerLink, and MDPI. The applications in the reviewed papers are based on the benchmark datasets BraTS, INbreast, ISIC, and COVIDx. The review compares the core architectural principles, performance, and interpretability of both architectures. We conclude by underlining the complementary roles of these two architectures in medical decision-making and by proposing future directions toward hybrid, explainable, and computationally efficient deep learning systems for real clinical environments, supporting earlier disease detection and, ultimately, improved survival rates.</description>
	<pubDate>2025-12-30</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 17: Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/17">doi: 10.3390/jimaging12010017</a></p>
	<p>Authors:
		Ichrak Khoulqi
		Zakariae El Ouazzani
		</p>
	<p>In this paper, we present a literature review of two deep learning architectures, Convolutional Neural Networks (CNNs) and Capsule Networks (CapsNets), applied to medical images in support of medical decision-making. CNNs have demonstrated their capacity in the medical diagnostic field; however, their reliability decreases under slight spatial variability, which can affect diagnosis, especially since the anatomical structure of the human body can differ from one patient to another. In contrast, CapsNets encode not only feature activations but also spatial relationships, thereby improving the reliability and stability of model generalization. The paper provides a structured comparison by reviewing studies published from 2018 to 2025 across major databases, including IEEE Xplore, ScienceDirect, SpringerLink, and MDPI. The applications in the reviewed papers are based on the benchmark datasets BraTS, INbreast, ISIC, and COVIDx. The review compares the core architectural principles, performance, and interpretability of both architectures. We conclude by underlining the complementary roles of these two architectures in medical decision-making and by proposing future directions toward hybrid, explainable, and computationally efficient deep learning systems for real clinical environments, supporting earlier disease detection and, ultimately, improved survival rates.</p>
	]]></content:encoded>

	<dc:title>Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks</dc:title>
			<dc:creator>Ichrak Khoulqi</dc:creator>
			<dc:creator>Zakariae El Ouazzani</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010017</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-30</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-30</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Review</prism:section>
	<prism:startingPage>17</prism:startingPage>
		<prism:doi>10.3390/jimaging12010017</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/17</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/16">

	<title>J. Imaging, Vol. 12, Pages 16: FluoNeRF: Fluorescent Novel-View Synthesis Under Novel Light Source Colors and Spectra</title>
	<link>https://www.mdpi.com/2313-433X/12/1/16</link>
	<description>Synthesizing photo-realistic images of a scene from arbitrary viewpoints and under arbitrary lighting environments is an important research topic in computer vision and graphics. In this paper, we propose a method for synthesizing photo-realistic images of a scene with fluorescent objects from novel viewpoints and under novel lighting colors and spectra. In general, fluorescent materials absorb light at certain wavelengths and then emit light at longer wavelengths, in contrast to reflective materials, which preserve the wavelengths of incident light. Therefore, the colors of fluorescent objects under arbitrary lighting colors cannot be reproduced by combining conventional view synthesis techniques with white balance adjustment of the RGB channels. Accordingly, we extend novel-view synthesis based on neural radiance fields by incorporating the superposition principle of light; our proposed method captures a sparse set of images of a scene from varying viewpoints and under varying lighting colors or spectra with active lighting systems, such as a color display or a multi-spectral light stage, and then synthesizes photo-realistic images of the scene without explicitly modeling its geometric and photometric properties. We conducted a number of experiments using real images captured with an LCD and confirmed that our method outperforms existing methods. Moreover, we showed that extending our method to more than three primary colors with a light stage enables us to reproduce the colors of fluorescent objects under common light sources.</description>
	<pubDate>2025-12-29</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 16: FluoNeRF: Fluorescent Novel-View Synthesis Under Novel Light Source Colors and Spectra</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/16">doi: 10.3390/jimaging12010016</a></p>
	<p>Authors:
		Lin Shi
		Kengo Matsufuji
		Michitaka Yoshida
		Ryo Kawahara
		Takahiro Okabe
		</p>
	<p>Synthesizing photo-realistic images of a scene from arbitrary viewpoints and under arbitrary lighting environments is an important research topic in computer vision and graphics. In this paper, we propose a method for synthesizing photo-realistic images of a scene with fluorescent objects from novel viewpoints and under novel lighting colors and spectra. In general, fluorescent materials absorb light at certain wavelengths and then emit light at longer wavelengths, in contrast to reflective materials, which preserve the wavelengths of incident light. Therefore, the colors of fluorescent objects under arbitrary lighting colors cannot be reproduced by combining conventional view synthesis techniques with white balance adjustment of the RGB channels. Accordingly, we extend novel-view synthesis based on neural radiance fields by incorporating the superposition principle of light; our proposed method captures a sparse set of images of a scene from varying viewpoints and under varying lighting colors or spectra with active lighting systems, such as a color display or a multi-spectral light stage, and then synthesizes photo-realistic images of the scene without explicitly modeling its geometric and photometric properties. We conducted a number of experiments using real images captured with an LCD and confirmed that our method outperforms existing methods. Moreover, we showed that extending our method to more than three primary colors with a light stage enables us to reproduce the colors of fluorescent objects under common light sources.</p>
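	<p>The superposition principle mentioned above can be sketched in a few lines of Python: because light transport is linear, an image under a novel light can be approximated as a weighted sum of images captured under basis lights. This editor-added sketch is not the authors' implementation; the shapes and weights are assumptions.</p>
	<pre><code>
# A minimal sketch of the superposition principle the method builds on:
# because light transport is linear, an image under a novel light source
# can be written as a weighted sum of images captured under basis lights
# (e.g., the display primaries). Shapes and weights are assumptions.
import numpy as np

def relight(basis_images, weights):
    """basis_images: array of shape (n_lights, H, W, 3), one image per basis light.
    weights: length-n_lights coefficients describing the novel light as a
    combination of the basis lights. Returns the synthesized image."""
    basis_images = np.asarray(basis_images, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return np.tensordot(weights, basis_images, axes=1)

# Hypothetical captures under three basis lights (e.g., R, G, B primaries).
rng = np.random.default_rng(0)
basis = rng.random((3, 32, 32, 3))
novel_light_weights = [0.2, 0.5, 0.3]
synthesized = relight(basis, novel_light_weights)
print(synthesized.shape)
	</code></pre>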
	]]></content:encoded>

	<dc:title>FluoNeRF: Fluorescent Novel-View Synthesis Under Novel Light Source Colors and Spectra</dc:title>
			<dc:creator>Lin Shi</dc:creator>
			<dc:creator>Kengo Matsufuji</dc:creator>
			<dc:creator>Michitaka Yoshida</dc:creator>
			<dc:creator>Ryo Kawahara</dc:creator>
			<dc:creator>Takahiro Okabe</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010016</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-29</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-29</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>16</prism:startingPage>
		<prism:doi>10.3390/jimaging12010016</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/16</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/15">

	<title>J. Imaging, Vol. 12, Pages 15: M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating</title>
	<link>https://www.mdpi.com/2313-433X/12/1/15</link>
	<description>Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets&amp;mdash;including Synapse, CVC-ClinicDB, and ISIC&amp;mdash;demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.</description>
	<pubDate>2025-12-29</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 15: M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/15">doi: 10.3390/jimaging12010015</a></p>
	<p>Authors:
		Zhigao Zeng
		Jiale Xiao
		Shengqiu Yi
		Qiang Liu
		Yanhui Zhu
		</p>
	<p>Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets&amp;mdash;including Synapse, CVC-ClinicDB, and ISIC&amp;mdash;demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.</p>
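	<p>For reference, the Python sketch below shows the standard Dice similarity coefficient (DSC) used in the results above; HD95, the 95th percentile of symmetric boundary distances, typically requires distance transforms and is not implemented here. This is an editor-added illustration, not the authors' evaluation code, and the masks are placeholders.</p>
	<pre><code>
# A minimal sketch, with assumed array conventions, of the Dice similarity
# coefficient (DSC) reported above. The epsilon term avoids division by zero
# when both masks are empty.
import numpy as np

def dice(pred_mask, gt_mask, eps=1e-7):
    """Binary masks as boolean or 0/1 arrays of the same shape."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps))

# Hypothetical masks for illustration only.
pred = np.zeros((64, 64), dtype=bool)
pred[10:40, 10:40] = True
gt = np.zeros((64, 64), dtype=bool)
gt[15:45, 15:45] = True
print("DSC:", dice(pred, gt))
	</code></pre>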
	]]></content:encoded>

	<dc:title>M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating</dc:title>
			<dc:creator>Zhigao Zeng</dc:creator>
			<dc:creator>Jiale Xiao</dc:creator>
			<dc:creator>Shengqiu Yi</dc:creator>
			<dc:creator>Qiang Liu</dc:creator>
			<dc:creator>Yanhui Zhu</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010015</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-29</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-29</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>15</prism:startingPage>
		<prism:doi>10.3390/jimaging12010015</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/15</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/14">

	<title>J. Imaging, Vol. 12, Pages 14: Adaptive Normalization Enhances the Generalization of Deep Learning Model in Chest X-Ray Classification</title>
	<link>https://www.mdpi.com/2313-433X/12/1/14</link>
	<description>This study presents a controlled benchmarking analysis of min&amp;ndash;max scaling, Z-score normalization, and an adaptive preprocessing pipeline that combines percentile-based ROI cropping with histogram standardization. The evaluation was conducted across four public chest X-ray (CXR) datasets and three convolutional neural network architectures under controlled experimental settings. The adaptive pipeline generally improved accuracy, F1-score, and training stability on datasets with relatively stable contrast characteristics while yielding limited gains on MIMIC-CXR due to strong acquisition heterogeneity. Ablation experiments showed that histogram standardization provided the primary performance contribution, with ROI cropping offering complementary benefits, and the full pipeline achieving the best overall performance. The computational overhead of the adaptive preprocessing was minimal (+6.3% training-time cost; 5.2 ms per batch). Friedman&amp;ndash;Nemenyi and Wilcoxon signed-rank tests confirmed that the observed improvements were statistically significant across most dataset&amp;ndash;model configurations. Overall, adaptive normalization is positioned not as a novel algorithmic contribution, but as a practical preprocessing design choice that can enhance cross-dataset robustness and reliability in chest X-ray classification workflows.</description>
	<pubDate>2025-12-28</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 14: Adaptive Normalization Enhances the Generalization of Deep Learning Model in Chest X-Ray Classification</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/14">doi: 10.3390/jimaging12010014</a></p>
	<p>Authors:
		Jatsada Singthongchai
		Tanachapong Wangkhamhan
		</p>
	<p>This study presents a controlled benchmarking analysis of min&amp;ndash;max scaling, Z-score normalization, and an adaptive preprocessing pipeline that combines percentile-based ROI cropping with histogram standardization. The evaluation was conducted across four public chest X-ray (CXR) datasets and three convolutional neural network architectures under controlled experimental settings. The adaptive pipeline generally improved accuracy, F1-score, and training stability on datasets with relatively stable contrast characteristics while yielding limited gains on MIMIC-CXR due to strong acquisition heterogeneity. Ablation experiments showed that histogram standardization provided the primary performance contribution, with ROI cropping offering complementary benefits, and the full pipeline achieving the best overall performance. The computational overhead of the adaptive preprocessing was minimal (+6.3% training-time cost; 5.2 ms per batch). Friedman&amp;ndash;Nemenyi and Wilcoxon signed-rank tests confirmed that the observed improvements were statistically significant across most dataset&amp;ndash;model configurations. Overall, adaptive normalization is positioned not as a novel algorithmic contribution, but as a practical preprocessing design choice that can enhance cross-dataset robustness and reliability in chest X-ray classification workflows.</p>
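	<p>The three preprocessing variants compared above can be sketched in Python: min-max scaling, Z-score normalization, and a simple percentile-based ROI crop followed by histogram standardization. This editor-added sketch only approximates the described pipeline; the percentile values, clipping, and input image are assumptions.</p>
	<pre><code>
# A minimal sketch, under assumed parameter choices, of the three
# preprocessing variants compared above. Percentiles and clipping are
# assumptions, not the study's exact settings.
import numpy as np

def min_max(img, eps=1e-8):
    # Rescale intensities to [0, 1].
    return (img - img.min()) / (img.max() - img.min() + eps)

def z_score(img, eps=1e-8):
    # Zero-mean, unit-variance normalization.
    return (img - img.mean()) / (img.std() + eps)

def adaptive_pipeline(img, lo=2.0, hi=98.0):
    # Percentile-based ROI crop: bounding box of pixels above the low percentile.
    thresh = np.percentile(img, lo)
    mask = img > thresh
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.argmax(rows), len(rows) - np.argmax(rows[::-1])
    c0, c1 = np.argmax(cols), len(cols) - np.argmax(cols[::-1])
    roi = img[r0:r1, c0:c1]
    # Histogram standardization: clip to [lo, hi] percentiles, then rescale.
    p_lo, p_hi = np.percentile(roi, [lo, hi])
    roi = np.clip(roi, p_lo, p_hi)
    return min_max(roi)

# Hypothetical 12-bit chest X-ray for illustration only.
rng = np.random.default_rng(0)
cxr = rng.random((256, 256)) * 4095.0
for name, out in [("min-max", min_max(cxr)), ("z-score", z_score(cxr)),
                  ("adaptive", adaptive_pipeline(cxr))]:
    print(name, out.shape, round(float(out.mean()), 3))
	</code></pre>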
	]]></content:encoded>

	<dc:title>Adaptive Normalization Enhances the Generalization of Deep Learning Model in Chest X-Ray Classification</dc:title>
			<dc:creator>Jatsada Singthongchai</dc:creator>
			<dc:creator>Tanachapong Wangkhamhan</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010014</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-28</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-28</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>14</prism:startingPage>
		<prism:doi>10.3390/jimaging12010014</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/14</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
        <item rdf:about="https://www.mdpi.com/2313-433X/12/1/13">

	<title>J. Imaging, Vol. 12, Pages 13: Assessing Change in Stone Burden on Baseline and Follow-Up CT: Radiologist and Radiomics Evaluations</title>
	<link>https://www.mdpi.com/2313-433X/12/1/13</link>
	<description>This retrospective diagnostic accuracy study compared radiologist-based qualitative assessments and radiomics-based analyses with an automated artificial intelligence (AI)&amp;ndash;based volumetric approach for evaluating changes in kidney stone burden on follow-up CT examinations. With institutional review board approval, 157 patients (mean age, 61 &amp;plusmn; 13 years; 99 men, 58 women) who underwent baseline and follow-up non-contrast abdomen&amp;ndash;pelvis CT for kidney stone evaluation were included. The index test was an automated AI-based whole-kidney and stone segmentation radiomics prototype (Frontier, Siemens Healthineers), which segmented both kidneys and isolated stone volumes using a fixed threshold of 130 Hounsfield units, providing stone volume and maximum diameter per kidney. The reference standard was a threshold-defined volumetric assessment of stone burden change between baseline and follow-up CTs. The radiologist&amp;rsquo;s performance was assessed using (1) interpretations from clinical radiology reports and (2) an independent radiologist&amp;rsquo;s assessment of stone burden change (stable, increased, or decreased). Diagnostic accuracy was evaluated using multivariable logistic regression and receiver operating characteristic (ROC) analysis. Automated volumetric assessment identified stable (n = 44), increased (n = 109), and decreased (n = 108) stone burden across the evaluated kidneys. Qualitative assessments from radiology reports demonstrated weak diagnostic performance (AUC range, 0.55&amp;ndash;0.62), similar to the independent radiologist (AUC range, 0.41&amp;ndash;0.72) for differentiating changes in stone burden. A model incorporating higher-order radiomics features achieved an AUC of 0.71 for distinguishing increased versus decreased stone burdens compared with the baseline CT (p &amp;lt; 0.001), but did not outperform threshold-based volumetric assessment. The automated threshold-based volumetric quantification of kidney stone burdens provides higher diagnostic accuracy than qualitative radiologist assessments and radiomics-based analyses for identifying a stable, increased, or decreased stone burden on follow-up CT examinations.</description>
	<pubDate>2025-12-27</pubDate>

	<content:encoded><![CDATA[
	<p><b>J. Imaging, Vol. 12, Pages 13: Assessing Change in Stone Burden on Baseline and Follow-Up CT: Radiologist and Radiomics Evaluations</b></p>
	<p>Journal of Imaging <a href="https://www.mdpi.com/2313-433X/12/1/13">doi: 10.3390/jimaging12010013</a></p>
	<p>Authors:
		Parisa Kaviani
		Matthias F. Froelich
		Bernardo Bizzo
		Andrew Primak
		Giridhar Dasegowda
		Emiliano Garza-Frias
		Lina Karout
		Anushree Burade
		Seyedehelaheh Hosseini
		Javier Eduardo Contreras Yametti
		Keith Dreyer
		Sanjay Saini
		Mannudeep Kalra
		</p>
	<p>This retrospective diagnostic accuracy study compared radiologist-based qualitative assessments and radiomics-based analyses with an automated artificial intelligence (AI)&amp;ndash;based volumetric approach for evaluating changes in kidney stone burden on follow-up CT examinations. With institutional review board approval, 157 patients (mean age, 61 &amp;plusmn; 13 years; 99 men, 58 women) who underwent baseline and follow-up non-contrast abdomen&amp;ndash;pelvis CT for kidney stone evaluation were included. The index test was an automated AI-based whole-kidney and stone segmentation radiomics prototype (Frontier, Siemens Healthineers), which segmented both kidneys and isolated stone volumes using a fixed threshold of 130 Hounsfield units, providing stone volume and maximum diameter per kidney. The reference standard was a threshold-defined volumetric assessment of stone burden change between baseline and follow-up CTs. The radiologist&amp;rsquo;s performance was assessed using (1) interpretations from clinical radiology reports and (2) an independent radiologist&amp;rsquo;s assessment of stone burden change (stable, increased, or decreased). Diagnostic accuracy was evaluated using multivariable logistic regression and receiver operating characteristic (ROC) analysis. Automated volumetric assessment identified stable (n = 44), increased (n = 109), and decreased (n = 108) stone burden across the evaluated kidneys. Qualitative assessments from radiology reports demonstrated weak diagnostic performance (AUC range, 0.55&amp;ndash;0.62), similar to the independent radiologist (AUC range, 0.41&amp;ndash;0.72) for differentiating changes in stone burden. A model incorporating higher-order radiomics features achieved an AUC of 0.71 for distinguishing increased versus decreased stone burdens compared with the baseline CT (p &amp;lt; 0.001), but did not outperform threshold-based volumetric assessment. The automated threshold-based volumetric quantification of kidney stone burdens provides higher diagnostic accuracy than qualitative radiologist assessments and radiomics-based analyses for identifying a stable, increased, or decreased stone burden on follow-up CT examinations.</p>
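	<p>The fixed 130-HU threshold described above lends itself to a short Python illustration: voxels inside the kidney mask with attenuation of at least 130 HU are counted as stone, and the count times the voxel volume gives the stone burden. This editor-added sketch is not the study's prototype; the arrays and voxel spacing are placeholders.</p>
	<pre><code>
# A minimal sketch, under assumed inputs, of fixed-threshold stone
# quantification: voxels inside the kidney mask with attenuation of at
# least 130 HU are treated as stone, and their count times the voxel
# volume gives the stone burden. Spacing and arrays are placeholders.
import numpy as np

def stone_volume_mm3(ct_hu, kidney_mask, spacing_mm, threshold_hu=130.0):
    """ct_hu: 3D array of Hounsfield units; kidney_mask: boolean 3D array;
    spacing_mm: (z, y, x) voxel spacing in millimetres."""
    stone_mask = np.logical_and(kidney_mask, ct_hu >= threshold_hu)
    voxel_volume = float(np.prod(spacing_mm))
    return stone_mask.sum() * voxel_volume

# Hypothetical volume with a small dense inclusion for illustration only.
ct = np.full((40, 64, 64), 30.0)
ct[18:22, 30:34, 30:34] = 400.0
kidney = np.zeros(ct.shape, dtype=bool)
kidney[10:30, 20:44, 20:44] = True
print("stone volume (mm^3):", stone_volume_mm3(ct, kidney, (2.0, 0.8, 0.8)))
	</code></pre>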
	]]></content:encoded>

	<dc:title>Assessing Change in Stone Burden on Baseline and Follow-Up CT: Radiologist and Radiomics Evaluations</dc:title>
			<dc:creator>Parisa Kaviani</dc:creator>
			<dc:creator>Matthias F. Froelich</dc:creator>
			<dc:creator>Bernardo Bizzo</dc:creator>
			<dc:creator>Andrew Primak</dc:creator>
			<dc:creator>Giridhar Dasegowda</dc:creator>
			<dc:creator>Emiliano Garza-Frias</dc:creator>
			<dc:creator>Lina Karout</dc:creator>
			<dc:creator>Anushree Burade</dc:creator>
			<dc:creator>Seyedehelaheh Hosseini</dc:creator>
			<dc:creator>Javier Eduardo Contreras Yametti</dc:creator>
			<dc:creator>Keith Dreyer</dc:creator>
			<dc:creator>Sanjay Saini</dc:creator>
			<dc:creator>Mannudeep Kalra</dc:creator>
		<dc:identifier>doi: 10.3390/jimaging12010013</dc:identifier>
	<dc:source>Journal of Imaging</dc:source>
	<dc:date>2025-12-27</dc:date>

	<prism:publicationName>Journal of Imaging</prism:publicationName>
	<prism:publicationDate>2025-12-27</prism:publicationDate>
	<prism:volume>12</prism:volume>
	<prism:number>1</prism:number>
	<prism:section>Article</prism:section>
	<prism:startingPage>13</prism:startingPage>
		<prism:doi>10.3390/jimaging12010013</prism:doi>
	<prism:url>https://www.mdpi.com/2313-433X/12/1/13</prism:url>
	
	<cc:license rdf:resource="CC BY 4.0"/>
</item>
    
<cc:License rdf:about="https://creativecommons.org/licenses/by/4.0/">
	<cc:permits rdf:resource="https://creativecommons.org/ns#Reproduction" />
	<cc:permits rdf:resource="https://creativecommons.org/ns#Distribution" />
	<cc:permits rdf:resource="https://creativecommons.org/ns#DerivativeWorks" />
</cc:License>

</rdf:RDF>
