
J. Imaging, Volume 12, Issue 4 (April 2026) – 38 articles

Cover Story: On crowded waterways, an autonomous surface vehicle (ASV) should not only perceive surrounding vessels but also understand which ones the user wants to follow. The cover image illustrates a shipborne view in which a natural-language instruction guides the tracker to continuously follow selected boats. Colored boxes and trails highlight the referenced objects, while small distant targets and visually similar vessels with comparable behaviors reveal the inherent ambiguity of real-world ASV navigation. Refer-ASV introduces language-guided multi-object tracking to complex water-surface scenes, and RAMOT provides a robust baseline for visual-language tracking.
14 pages, 7605 KB  
Article
Automated Morphological Profiling via Deep Learning-Based Segmentation for High-Throughput Phenotypic Screening
by Bendegúz H. Zováthi and Philipp Kainz
J. Imaging 2026, 12(4), 179; https://doi.org/10.3390/jimaging12040179 - 21 Apr 2026
Viewed by 532
Abstract
Reproducible morphological profiling has become an important tool for compound evaluation, particularly in drug discovery. Established workflows such as CellProfiler provide a widely adopted foundation for Cell Painting analysis. However, conventional pipelines often require substantial manual configuration and technical expertise, which can limit scalability and accessibility. In this study, a fully automated deep learning-based workflow is presented for segmentation-driven morphological profiling from raw microscopy data. Using a curated subset of the JUMP Cell Painting pilot dataset, ground-truth masks were generated and used to train a U-Net-based segmentation model in the IKOSA platform. Post-processing strategies were introduced to improve instance separation and reduce segmentation artifacts. The final model achieved strong segmentation performance (precision/recall/AP of up to 0.98/0.94/0.92 for nuclei), with an average runtime of 2.2 s per 1080 × 1080 image. The segmentation outputs enabled large-scale feature extraction, yielding 3664 morphological descriptors that closely matched CellProfiler-derived measurements (normalized MAE: 0.0298). Feature prioritization further reduced redundancy, leaving 1145 informative descriptors. These results demonstrate that automated deep learning pipelines can complement established Cell Painting workflows by reducing configuration overhead while maintaining compatibility with validated morphological profiling standards. The proposed workflow may help improve resource efficiency in drug discovery and personalized medicine.
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)
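The abstract reports agreement with CellProfiler as a normalized MAE but does not state the normalization. The minimal sketch below assumes each descriptor's absolute error is divided by the reference (CellProfiler) range of that descriptor before averaging, and adds a greedy correlation-threshold filter as one hypothetical way to prune redundant descriptors. Both function names and the 0.95 threshold are illustrative, not the authors' settings.

```python
import numpy as np

def normalized_mae(pred, ref):
    """MAE between two (cells x features) tables, each feature scaled by its
    reference range (an assumed normalization)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    span = ref.max(axis=0) - ref.min(axis=0)
    span = np.where(span > 0, span, 1.0)  # constant features: avoid division by zero
    return float(np.mean(np.abs(pred - ref) / span))

def drop_correlated(features, threshold=0.95):
    """Keep a column only if its |Pearson r| with every already-kept column
    is below `threshold` (a hypothetical redundancy filter)."""
    corr = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(features.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept
```

The greedy filter keeps the first member of each highly correlated group, so column order acts as a priority; a feature-ranking step would normally decide that order first.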

30 pages, 98630 KB  
Article
A Method for Paired Comparisons of Glo Germ Quantity in Images of Hands Before and After Washing
by Jordan Ali Rashid and Stuart Criley
J. Imaging 2026, 12(4), 178; https://doi.org/10.3390/jimaging12040178 - 21 Apr 2026
Viewed by 662
Abstract
We present a reproducible pipeline that converts color images into quantitative fluorescence maps by combining spectral measurement with a linear mixture model. The method is designed specifically for quantitative comparisons of Glo Germ™ on images of hands taken under different experimental conditions with controlled illumination. The emission spectrum of Glo Germ is measured using a spectral photometer and normalized to obtain its spectral power density function. This spectrum is projected into CIE XYZ coordinates and incorporated into a linear mixture model in which each pixel contains contributions from white light, UV-illuminated skin reflectance, and fluorophore emission. Component magnitudes are estimated with non-negative least squares, yielding a grayscale image whose intensity is a monotonic proxy for local fluorophore density. Spatial integration provides an image-level summary proportional to the total detected material. Compared with single-channel proxies, the proposed observer suppresses background structure, improves contrast, and remains radiometrically interpretable. Because the method depends only on measurable spectra and linear transforms, it can be reproduced across cameras and extended to other fluorophores.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
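The per-pixel unmixing step can be sketched as follows, assuming each pixel's CIE XYZ value is a non-negative combination of three known XYZ endmembers (white light, UV-illuminated skin, Glo Germ emission) supplied by the caller as the columns of a 3 × 3 matrix. Because there are only three components, this sketch solves non-negative least squares exactly by trying every support set instead of calling a library solver such as `scipy.optimize.nnls`; the function names and the fluorophore column index are assumptions.

```python
import itertools
import numpy as np

def nnls_small(A, b):
    """Exact NNLS for a few linearly independent columns: the optimum equals the
    unconstrained least-squares fit on its support, so enumerate all supports."""
    n = A.shape[1]
    best_x, best_res = np.zeros(n), float(np.sum(b ** 2))  # empty support: x = 0
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.any(coef < 0):
                continue  # infeasible on this support
            x = np.zeros(n)
            x[cols] = coef
            res = float(np.sum((A @ x - b) ** 2))
            if res < best_res:
                best_x, best_res = x, res
    return best_x

def fluorophore_map(xyz_image, endmembers, fluor_index=2):
    """Unmix an (H, W, 3) XYZ image; return the per-pixel fluorophore magnitude
    and its spatial sum as an image-level summary."""
    h, w, _ = xyz_image.shape
    pixels = xyz_image.reshape(-1, 3)
    mags = np.array([nnls_small(endmembers, p)[fluor_index] for p in pixels])
    gray = mags.reshape(h, w)
    return gray, float(gray.sum())
```

Enumerating supports costs 2^n small solves per pixel, which is trivial for three components but would not scale to many endmembers; a dedicated NNLS solver is the usual choice there.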

27 pages, 3995 KB  
Article
Video-Based Arabic Sign Language Recognition with Mediapipe and Deep Learning Techniques
by Dana El-Rushaidat, Nour Almohammad, Raine Yeh and Kinda Fayyad
J. Imaging 2026, 12(4), 177; https://doi.org/10.3390/jimaging12040177 - 20 Apr 2026
Viewed by 1094
Abstract
This paper addresses the communication barrier experienced by deaf and hearing-impaired individuals in the Arab world by developing an affordable, video-based Arabic Sign Language (ArSL) recognition system. Designed for broad accessibility, the system requires no specialized hardware and works with standard mobile or laptop cameras. Our methodology employs Mediapipe for real-time extraction of hand, face, and pose landmarks from video streams. These anatomical features are then processed by a hybrid deep learning model integrating Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), specifically Bidirectional Long Short-Term Memory (BiLSTM) layers. The CNN component captures spatial features, such as intricate hand shapes and body poses, within individual frames, while the BiLSTM layers model long-term temporal dependencies and motion trajectories across consecutive frames. Together, the CNN-BiLSTM architecture produces a spatiotemporal representation that can distinguish complex signs whose meaning depends on both static gestures and dynamic transitions, reducing the misclassifications to which CNN-only or RNN-only models are prone. Evaluated on the authors' JUST-SL dataset and the publicly available KArSL dataset, the system achieved an overall accuracy of 96% on JUST-SL and 99% on KArSL. These results indicate higher accuracy than previous research, particularly for recognizing full Arabic words, and suggest that the system can improve communication accessibility for the deaf and hearing-impaired community.
(This article belongs to the Section Computer Vision and Pattern Recognition)
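As a hedged illustration of the input side of such a pipeline, the sketch below flattens per-frame landmarks into one feature vector and pads or subsamples each clip to a fixed number of frames, a common preprocessing choice before a CNN-BiLSTM. The landmark counts follow the Mediapipe Holistic convention (33 pose, 468 face, 21 per hand); storing only (x, y, z), zero-filling undetected parts, and the 30-frame length are assumptions, not details from the paper, and the dictionary keys are hypothetical.

```python
import numpy as np

# Landmark counts per part (Mediapipe Holistic convention); 3 coords (x, y, z) each.
PARTS = {"pose": 33, "face": 468, "left_hand": 21, "right_hand": 21}
FRAME_DIM = 3 * sum(PARTS.values())  # 1629 values per frame

def frame_vector(landmarks):
    """landmarks: dict part -> (n, 3) array, or None if that part was not detected."""
    chunks = []
    for part, n in PARTS.items():
        pts = landmarks.get(part)
        chunks.append(np.zeros(n * 3) if pts is None else np.asarray(pts, float).reshape(-1))
    return np.concatenate(chunks)

def clip_tensor(frames, seq_len=30):
    """Stack frame vectors into (seq_len, FRAME_DIM): long clips are uniformly
    subsampled, short clips are zero-padded at the end."""
    x = np.stack([frame_vector(f) for f in frames])
    if len(x) >= seq_len:
        idx = np.linspace(0, len(x) - 1, seq_len).round().astype(int)
        return x[idx]
    pad = np.zeros((seq_len - len(x), FRAME_DIM))
    return np.vstack([x, pad])
```

A fixed (frames × features) tensor per clip is what lets a frame-wise CNN and a BiLSTM be stacked without variable-length handling.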

26 pages, 4019 KB  
Article
MSWA-ResNet: Multi-Scale Wavelet Attention for Patient-Level and Interpretable Breast Cancer Histopathology Classification
by Ghadeer Al Sukkar, Ali Rodan and Azzam Sleit
J. Imaging 2026, 12(4), 176; https://doi.org/10.3390/jimaging12040176 - 19 Apr 2026
Viewed by 671
Abstract
Breast cancer histopathological classification is critical for diagnosis and treatment planning, yet manual assessment remains time-consuming and subject to inter-observer variability. Although deep learning approaches have advanced automated analysis, image-level data splitting may introduce data leakage, and spatial-domain architectures lack explicit multi-scale frequency modeling. This study proposes MSWA-ResNet, a Multi-Scale Wavelet Attention Residual Network that embeds recursive discrete wavelet decomposition within residual blocks to enable frequency-aware and scale-aware feature learning. The model is evaluated on the BreakHis dataset using a strict patient-level protocol with 70/30 patient-wise splitting, five-fold stratified cross-validation, ensemble prediction, and hierarchical aggregation from patch to patient level. MSWA-ResNet achieves 96% patient-level accuracy at 100×, 200×, and 400× magnifications and 92% at 40×, with F1-scores of 0.97 and 0.94, respectively. At 200× and 400×, it improves accuracy from 0.92 to 0.96 and F1-score from 0.94 to 0.97 over baseline CNNs while requiring only 11.8–12.1M parameters and 2.5–4.8 ms inference time. Grad-CAM visualizations show improved localization of diagnostically relevant regions, indicating that explicit multi-scale frequency modeling supports accurate and interpretable patient-level classification.
(This article belongs to the Section Computer Vision and Pattern Recognition)
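The idea of recursive discrete wavelet decomposition can be illustrated with an orthonormal 2D Haar transform that is reapplied to the low-frequency (LL) band at each level. This is a generic NumPy sketch on a single-channel array whose sides are divisible by 2^levels; the wavelet family, number of levels, and how MSWA-ResNet feeds the subbands into attention inside residual blocks are not given in the abstract and are not reproduced here.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2D Haar DWT on an array with even sides."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-pass approximation
    lh = (a + b - c - d) / 2  # row-to-row differences (naming conventions vary)
    hl = (a - b + c - d) / 2  # column-to-column differences
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, (lh, hl, hh)

def multiscale_haar(x, levels=3):
    """Recursively decompose the LL band; return the final LL and the detail bands per level."""
    details = []
    ll = np.asarray(x, dtype=float)
    for _ in range(levels):
        ll, bands = haar_dwt2(ll)
        details.append(bands)
    return ll, details
```

Because the transform is orthonormal, the total energy of all subbands equals that of the input, so each level splits information by frequency and scale without discarding any of it.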

31 pages, 2783 KB  
Article
SurveyNet: A Unified Deep Learning Framework for OCR and OMR-Based Survey Digitization
by Rubi Quiñones, Sreeja Cheekireddy and Eren Gultepe
J. Imaging 2026, 12(4), 175; https://doi.org/10.3390/jimaging12040175 - 17 Apr 2026
Viewed by 980
Abstract
Manual survey data entry remains a bottleneck in large-scale research, marketing, and public policy, where paper survey sheets are still widely used because of their accessibility and high response rates. Despite progress in Optical Character Recognition (OCR) and Optical Mark Recognition (OMR), existing systems treat these tasks separately and are typically tailored to clean, standardized forms, making them unreliable for real-world survey sheets with diverse markings and handwritten inputs. These limitations hinder automation and introduce substantial errors in data transcription. To address this, we propose SurveyNet, a unified deep learning framework that combines OCR and OMR capabilities to automatically digitize complex survey responses within a single model. SurveyNet processes both handwritten digits and a wide variety of mark types, including ticks, circles, and crosses, across multiple question formats. We also introduce SurveySet, a novel dataset of 135 real-world survey forms annotated across four key response types. Experimental results show that SurveyNet achieves between 50% and 97% classification accuracy, depending on the task, with strong performance even on small and imbalanced datasets. This framework offers a scalable solution for streamlining survey digitization workflows, reducing manual errors, and enabling timely analysis in domains ranging from consumer research to public health and education.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)

23 pages, 4380 KB  
Article
Vision-Based Measurement of Breathing Deformation in Wind Turbine Blade Fatigue Test
by Xianlong Wei, Cailin Li, Zhiyong Wang, Zhao Hai, Jinghua Wang and Leian Zhang
J. Imaging 2026, 12(4), 174; https://doi.org/10.3390/jimaging12040174 - 17 Apr 2026
Viewed by 569
Abstract
Wind turbine blades are subjected to complex environmental conditions during long-term operation, which may lead to structural degradation and performance loss. To ensure structural integrity, fatigue testing prior to deployment is essential. This paper proposes a vision-based method for measuring the full-cycle breathing deformation of wind turbine blades during fatigue testing. The method captures dynamic image sequences of the blade’s hotspot cross-section using industrial cameras and employs a feature-based template matching approach to reconstruct the three-dimensional coordinates of target points. Through coordinate transformation, the deformation trajectories are obtained, enabling quantitative analysis of the blade’s dynamic responses in both flapwise and edgewise directions. A dedicated hardware–software system was developed and validated through full-scale fatigue experiments. Quantitative comparison with strain gage measurements shows that the proposed method achieves mean absolute deviations of 0.84 mm and 0.93 mm in two independent experiments, respectively, with closely matched deformation trends under typical loading conditions. These results demonstrate that the proposed method can reliably capture the global deformation behavior of the blade with millimeter-level accuracy, while significantly reducing instrumentation complexity compared to conventional contact-based approaches. The proposed method provides an effective and practical solution for full-field dynamic deformation measurement in blade fatigue testing, offering strong potential for structural health monitoring and early damage detection in wind turbine systems.
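As a generic illustration of reconstructing 3D target-point coordinates and expressing them in a blade-aligned frame, the sketch below uses standard linear (DLT) triangulation from two calibrated camera projection matrices, followed by a rigid change of coordinates. It assumes a stereo pair with known 3 × 4 projection matrices and already-matched pixel positions; the paper's template matching and calibration are not reproduced, and the axis assignment (column 0 flapwise, column 1 edgewise) is a hypothetical convention.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point observed at pixel uv1 in camera 1
    and uv2 in camera 2; returns Euclidean world coordinates."""
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]  # null-space direction = homogeneous point
    return X[:3] / X[3]

def to_blade_frame(points, origin, R_blade):
    """Express world points in a frame whose axes are the columns of R_blade
    (assumed: column 0 flapwise, column 1 edgewise)."""
    return (np.asarray(points) - origin) @ R_blade

def deformation_trajectory(points, origin, R_blade):
    """Displacement of each time sample relative to the first (reference) sample."""
    local = to_blade_frame(points, origin, R_blade)
    return local - local[0]
```

Subtracting the first sample turns absolute positions into displacements, so a constant offset in the chosen blade origin does not affect the trajectory.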

18 pages, 1496 KB  
Review
Cracking the Code: Computational Image Analysis Tools for Histopathological and Morphometric Insights
by Ana Luisa Teixeira de Almeida, Ana Beatriz Gram dos Santos and Debora Ferreira Barreto-Vieira
J. Imaging 2026, 12(4), 173; https://doi.org/10.3390/jimaging12040173 - 17 Apr 2026
Viewed by 588
Abstract
The assessment of histopathological features has evolved considerably, transitioning from traditional manual measurements to more sophisticated, technology-assisted approaches. Classical histological evaluation, while foundational and highly reliable, is inherently labor-intensive and subject to inter-observer variability. With the advent of digital pathology, these practices have been progressively enhanced by image processing software, which offers capabilities such as segmentation, feature extraction, and data visualization. However, despite its promise, the integration of machine learning into this branch of pathology still faces notable hurdles, such as the need for large, high-quality annotated datasets and integration into existing workflows. Looking forward, specialists remain crucial to histological evaluation in this evolving landscape: while automation streamlines routine tasks, the expertise of pathologists is indispensable for validating results and interpreting findings in scientific contexts. This comprehensive review traces the trajectory of histological evaluation methods, from manual and classical strategies to cutting-edge digital tools, highlighting the benefits, limitations, and implications of each approach in contemporary practice.
(This article belongs to the Special Issue AI-Driven Advances in Computational Pathology)

27 pages, 4829 KB  
Article
Dual RANSAC with Rescue Midpoint Multi-Trend Vanishing Point Detection
by Nada Said, Bilal Nakhal, Ali El-Zaart and Lama Affara
J. Imaging 2026, 12(4), 172; https://doi.org/10.3390/jimaging12040172 - 16 Apr 2026
Viewed by 594
Abstract
Vanishing point detection is a fundamental step in computer vision that enables 3D scene understanding and autonomous navigation. Classical techniques struggle with heavily cluttered scenes and with images containing multiple perspective cues, which leads to inaccurate or unreliable vanishing point estimates. We present a Dual RANSAC with Rescue Midpoint-based Multi-Trend Vanishing Point Detection framework that simultaneously detects and refines multiple, globally consistent vanishing points. The framework introduces a novel Midpoint-based Multi-Trend Random Sample Consensus formulation that operates on line segment midpoints to infer dominant directional groups, thereby eliminating noisy or unstable midpoints and stabilizing subsequent vanishing point inference. The main novelty lies in using line segment midpoints to model orientation variation as a linear regression in the midpoint–orientation space, which helps reduce sensitivity to endpoint instability. Candidate vanishing points are prioritized through inlier-based confidence ranking and subsequently optimized by an MSAC-based arbiter that resolves hypothesis conflicts and minimizes geometric error. We evaluate our work against state-of-the-art techniques such as J-Linkage and Conditional Sample Consensus on two challenging public datasets: the York Urban Dataset and the Toulouse Vanishing Point Dataset. The proposed framework achieves a recall of up to 95% and an image success rate of almost 84%, outperforming both J-Linkage and Conditional Sample Consensus, especially under tighter angular thresholds. This demonstrates the framework's improved stability and localization accuracy.
(This article belongs to the Section Computer Vision and Pattern Recognition)
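The hypothesize-and-verify loop that such frameworks build on can be sketched as a single-vanishing-point RANSAC that scores each hypothesis with an MSAC (truncated quadratic) cost. This is a generic baseline in homogeneous coordinates, not the paper's dual, midpoint-based multi-trend formulation; the angular error definition, the 2° threshold, and the function names are assumptions.

```python
import numpy as np

def seg_line(seg):
    """Homogeneous line through a segment's endpoints (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg
    return np.cross([x1, y1, 1.0], [x2, y2, 1.0])

def angular_error(seg, vp):
    """Angle between a segment and the ray from its midpoint to a homogeneous VP
    (valid for finite VPs and for VPs at infinity, where vp[2] == 0)."""
    x1, y1, x2, y2 = seg
    d = np.array([x2 - x1, y2 - y1])
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    r = np.array([vp[0] - vp[2] * mx, vp[1] - vp[2] * my])
    cos = abs(d @ r) / (np.linalg.norm(d) * np.linalg.norm(r) + 1e-12)
    return np.arccos(np.clip(cos, 0.0, 1.0))

def ransac_vp(segs, iters=200, thresh=np.deg2rad(2.0), seed=0):
    """Return the VP hypothesis with the lowest MSAC cost sum(min(e^2, t^2))."""
    rng = np.random.default_rng(seed)
    best_vp, best_cost = None, np.inf
    for _ in range(iters):
        i, j = rng.choice(len(segs), size=2, replace=False)
        vp = np.cross(seg_line(segs[i]), seg_line(segs[j]))  # intersection of two lines
        if np.allclose(vp, 0):
            continue  # identical lines give no intersection
        errs = np.array([angular_error(s, vp) for s in segs])
        cost = np.sum(np.minimum(errs ** 2, thresh ** 2))
        if cost < best_cost:
            best_vp, best_cost = vp, cost
    return best_vp
```

Unlike plain inlier counting, the MSAC cost also rewards hypotheses whose inliers fit tightly, which helps separate nearby candidate vanishing points.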

24 pages, 13348 KB  
Article
Morphological Convolutional Neural Network for Efficient Facial Expression Recognition
by Robert, Sarifuddin Madenda, Suryadi Harmanto, Michel Paindavoine and Dina Indarti
J. Imaging 2026, 12(4), 171; https://doi.org/10.3390/jimaging12040171 - 15 Apr 2026
Viewed by 668
Abstract
This study proposes a morphological convolutional neural network (MCNN) architecture that integrates morphological operations with CNN layers for facial expression recognition (FER). Conventional CNN-based FER models primarily rely on appearance features and may be sensitive to illumination and demographic variations. This work investigates whether morphological structural representations provide complementary information to convolutional features. A multi-source and multi-ethnic FER dataset was constructed by combining CK+, JAFFE, KDEF, TFEID, and a newly collected Indonesian Facial Expression dataset, resulting in 3684 images from 326 subjects across seven expression classes. Subject-independent data splitting with 10-fold cross-validation was applied to ensure reliable evaluation. Experimental results show that the proposed MCNN1 model achieves an average accuracy of 88.16%, while the best MCNN2 variant achieves 88.7%, demonstrating competitive performance compared to MobileNetV2 (88.27%), VGG19 (87.58%), and the morphological baseline MNN (50.73%). The proposed model is also more computationally efficient, with 21% lower inference latency and 64% lower GPU memory usage than the baseline models. These results indicate that integrating morphological representations into convolutional architectures provides a modest but consistent improvement in FER performance while enhancing generalization and efficiency under heterogeneous data conditions.
(This article belongs to the Section AI in Imaging)
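Morphological layers of the kind combined with convolutions here are typically built from grayscale dilation and erosion with a (possibly learnable) structuring element. The NumPy sketch below shows both operations and their difference, the morphological gradient, which emphasizes structural edges. It is a generic illustration that assumes an odd-sized square structuring element `w`; the actual MCNN layer design, element sizes, and training are not described in the abstract.

```python
import numpy as np

def _windows(x, k, fill):
    """All k x k neighborhoods of a 2D array padded with `fill`; shape (H, W, k, k)."""
    p = k // 2
    xp = np.pad(np.asarray(x, dtype=float), p, mode="constant", constant_values=fill)
    return np.lib.stride_tricks.sliding_window_view(xp, (k, k))

def dilate(x, w):
    """Grayscale dilation: max over each window of (x + w).
    Uses the un-reflected element, as is common in morphological networks."""
    return (_windows(x, w.shape[0], -np.inf) + w).max(axis=(-2, -1))

def erode(x, w):
    """Grayscale erosion: min over each window of (x - w)."""
    return (_windows(x, w.shape[0], np.inf) - w).min(axis=(-2, -1))

def morph_gradient(x, w):
    """Dilation minus erosion: large where intensity changes sharply."""
    return dilate(x, w) - erode(x, w)
```

With a flat (all-zero) element these reduce to local max and min filters; making `w` a trainable array is what turns them into learnable morphological layers.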

21 pages, 27736 KB  
Article
ARS-GS: Anisotropic Reflective Spherical 3D Gaussian Splatting
by Chenrui Wu, Xinyu Shi, Zhenzhong Chu and Yao Huang
J. Imaging 2026, 12(4), 170; https://doi.org/10.3390/jimaging12040170 - 15 Apr 2026
Viewed by 1096
Abstract
3D scene reconstruction is a fundamental technology with widespread applications in virtual reality, structural inspection, and robotic systems. While recent advances in 3D Gaussian Splatting have significantly enhanced scene reconstruction, such methods still perform suboptimally in highly reflective environments. To overcome this limitation, we introduce ARS-GS, a novel framework that integrates Anisotropic Spherical Gaussian reflection modeling and a spherical harmonics diffuse approximation into a physically based rendering pipeline. The architecture incorporates a skip connection between the Anisotropic Spherical Gaussian module and the Gaussian primitives, preserving surface details while maintaining computational efficiency. Comprehensive experiments validate the efficacy of ARS-GS across multiple datasets. Specifically, our method sets new state-of-the-art results, achieving a peak signal-to-noise ratio (PSNR) of 38.30 dB and a structural similarity index measure (SSIM) of 0.997 on the neural radiance fields (NeRF) synthetic dataset, and a PSNR of 46.31 dB on the Gloss Blender dataset. On the challenging reflective NeRF real-world dataset, our approach achieves the highest PSNR scores, including 26.26 dB on the Sedan scene. The method also substantially reduces perceptual error, with a learned perceptual image patch similarity (LPIPS) as low as 0.204, consistently outperforming existing techniques in reconstructing highly specular surfaces with superior geometric fidelity.
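For reference, the standard anisotropic spherical Gaussian (ASG) lobe (Xu et al., 2013), which ASG-based reflection models build on, can be evaluated as below: a smoothing term max(ν·z, 0) multiplied by an exponential with separate bandwidths λ and μ along the two tangent axes x and y. How ARS-GS parameterizes and combines these lobes with its spherical harmonics diffuse term and skip connection is not given in the abstract, so this sketch covers only the basic lobe; the function name and argument order are illustrative.

```python
import numpy as np

def asg(nu, x_axis, y_axis, z_axis, lam, mu, c=1.0):
    """Anisotropic spherical Gaussian at unit direction(s) nu (shape (3,) or (N, 3)).

    G(nu) = c * max(nu . z, 0) * exp(-lam * (nu . x)^2 - mu * (nu . y)^2),
    where (x, y, z) is an orthonormal frame and lam, mu > 0 are the two bandwidths.
    """
    nu = np.asarray(nu, dtype=float)
    smooth = np.maximum(nu @ z_axis, 0.0)  # zero on the hemisphere opposite the lobe
    return c * smooth * np.exp(-lam * (nu @ x_axis) ** 2 - mu * (nu @ y_axis) ** 2)
```

Setting λ ≠ μ stretches the lobe along one tangent direction, which is what allows anisotropic (for example, brushed-metal) highlights that an isotropic spherical Gaussian cannot represent.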

17 pages, 6145 KB  
Article
Novel, Contrast Echocardiography-Based Trabeculation Quantification Method in the Diagnosis of Left Ventricular Excessive Trabeculation
by Kristóf Attila Farkas-Sütő, Balázs Mester, Flóra Klára Gyulánczi, Krisztina Filipkó, Hajnalka Vágó, Béla Merkely and Andrea Szűcs
J. Imaging 2026, 12(4), 169; https://doi.org/10.3390/jimaging12040169 - 14 Apr 2026
Viewed by 602
Abstract
Cardiac MRI (CMR) is the gold standard for diagnosing left ventricular excessive trabeculation (LVET), whereas echocardiography (Echo) often does not yield a definitive diagnosis. Ultrasound contrast material offers the potential for more accurate imaging of the trabecular system; however, no diagnostic criteria have yet been developed specifically for contrast Echo (CE-Echo). We aimed to determine the role of CE-Echo in the diagnosis of LVET and to propose a novel method for quantifying trabeculation. We included 55 LVET subjects and 54 age- and sex-matched healthy Control subjects. All subjects underwent non-contrast Echo, CE-Echo, and CMR examinations. In addition to volumetric parameters and ejection fraction (EF), we measured the area of the trabeculated layer and its ratio to the LV area (Trab/LV_area) on apical CE-Echo views. Using the CMR-derived diagnosis as the reference, the Trab/LV_area ratio identified individuals with LVET with high specificity (98%) and sensitivity (95%) when its average across the apical views reached 17% (AUC = 0.98), or when it exceeded 20% in at least one view (AUC = 0.96). CE-Echo may assist in the quantitative diagnosis of LVET in addition to its morphological assessment, and the Trab/LV_area ratio may serve as a useful additional criterion in the diagnosis of LVET.
(This article belongs to the Section Medical Imaging)
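The two cut-offs reported above can be expressed as a small screening helper. The sketch assumes the trabeculated-layer and LV areas have already been traced on each apical CE-Echo view; the function name is hypothetical, and the thresholds are simply those quoted in the abstract, so this is an illustration of the arithmetic rather than a validated clinical tool.

```python
def lvet_criteria(views, avg_cutoff=0.17, single_cutoff=0.20):
    """views: list of (trabeculated_area, lv_area) pairs, one per apical view.

    Returns the per-view Trab/LV_area ratios and whether each reported criterion
    is met: mean ratio across views >= 17%, or any single view > 20%.
    """
    ratios = [trab / lv for trab, lv in views]
    mean_ratio = sum(ratios) / len(ratios)
    return {
        "ratios": ratios,
        "average_criterion": mean_ratio >= avg_cutoff,
        "single_view_criterion": any(r > single_cutoff for r in ratios),
    }
```

For example, three views with ratios of 15%, 20%, and 18% meet the average criterion (mean of about 17.7%) but not the single-view criterion, since no view exceeds 20%.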

20 pages, 11776 KB  
Article
Assessing CNNs and LoRA-Fine-Tuned Vision–Language Models for Breast Cancer Histopathology Image Classification
by Tomiris M. Zhaksylyk, Beibit B. Abdikenov, Nurbek M. Saidnassim, Birzhan T. Ayanbayev, Aruzhan S. Imasheva and Temirlan S. Karibekov
J. Imaging 2026, 12(4), 168; https://doi.org/10.3390/jimaging12040168 - 14 Apr 2026
Viewed by 1061
Abstract
Breast cancer histopathology classification remains a fundamental challenge in computational pathology due to variations in tissue morphology across magnification levels. Convolutional neural networks (CNNs) have long been the standard for image-based diagnosis, yet recent advances in vision-language models (VLMs) suggest they may provide strong and transferable representations for complex medical images. In this study, we present a systematic comparison between CNN baselines and two large VLMs, Qwen2 and SmolVLM, fine-tuned with Low-Rank Adaptation (LoRA; r = 16, α = 32, dropout = 0.05) on the BreakHis dataset. Models were evaluated at 40×, 100×, 200×, and 400× magnifications using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). Qwen2 achieved moderate performance across magnifications (e.g., 0.8736 accuracy and 0.9552 AUC at 200×), whereas SmolVLM consistently outperformed Qwen2 and substantially narrowed the gap with the CNN baselines, reaching up to 0.9453 accuracy and 0.9572 F1-score at 200×, close to AlexNet (0.9543 accuracy) at the same magnification. CNN baselines, particularly ResNet34, remained the strongest models overall, achieving the highest performance across all magnifications (e.g., 0.9879 accuracy and 0.9984 AUC at 40×). These findings show that LoRA-fine-tuned VLMs can achieve competitive performance relative to traditional CNNs while training a significantly smaller number of parameters, although they required gradient accumulation and memory-efficient optimizers. However, CNN-based architectures still provide the highest accuracy and robustness for histopathology classification. Our results highlight the potential of VLMs as parameter-efficient alternatives for digital pathology tasks, particularly in resource-constrained settings.
(This article belongs to the Section Medical Imaging)
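To make the quoted LoRA settings (r = 16, α = 32) concrete, the sketch below shows the standard LoRA reparameterization of one frozen weight matrix: the trainable update is B·A scaled by α/r (here 2.0), with B initialized to zero so that training starts exactly from the pretrained model. Adapter dropout is omitted, the layer sizes are arbitrary, and the class is a hypothetical NumPy illustration, not the authors' training code.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=16, alpha=32, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))               # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        """x: (batch, d_in) -> (batch, d_out)."""
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size            # r * (d_in + d_out)

    def merged_weight(self):
        """Fold the adapter into W so inference needs no extra matrix multiplies."""
        return self.W + self.scale * self.B @ self.A
```

For a 64 × 128 layer, the adapter trains 16 × (128 + 64) = 3072 values instead of the 8192 in W, which illustrates why LoRA fine-tuning uses far fewer trainable parameters than full fine-tuning.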

38 pages, 5277 KB  
Review
Artificial Intelligence in Pulmonary Endoscopy: Current Evidence, Limitations, and Future Directions
by Sara Lopes, Miguel Mascarenhas, João Fonseca and Adelino F. Leite-Moreira
J. Imaging 2026, 12(4), 167; https://doi.org/10.3390/jimaging12040167 - 12 Apr 2026
Viewed by 749
Abstract
Background: Artificial intelligence (AI) is increasingly applied in pulmonary endoscopy, including diagnostic bronchoscopy, interventional pulmonology, and endobronchial imaging. Advances in computer vision, machine learning, and robotic systems have expanded the potential for automated lesion detection, navigation to peripheral pulmonary lesions, and real-time procedural support. However, the current evidence base remains heterogeneous, and translational challenges persist. Methods: This review summarizes current applications and developments of AI across white-light bronchoscopy (WLB), image-enhanced bronchoscopy (e.g., narrow-band imaging and autofluorescence imaging), endobronchial ultrasound (EBUS), virtual and robotic bronchoscopy, and workflow optimization and training. It also examines the methodological limitations, regulatory considerations, and implementation barriers that affect translation into routine practice. Results: Reported developments include deep learning-based models for mucosal abnormality detection, lymph-node characterization during EBUS-guided transbronchial needle aspiration (EBUS-TBNA), improved lesion localization, and reduced operator-dependent variability. In addition, AI-assisted simulation platforms and decision-support tools are reshaping training paradigms. Nevertheless, most studies remain retrospective or single-center, with limited external validation, dataset heterogeneity, unclear model explainability, and incomplete integration into clinical workflows. Conclusions: AI has the potential to support lesion detection, navigation, and training in pulmonary endoscopy. However, rigorous prospective validation, standardized datasets, transparent model reporting, robust data governance, multidisciplinary collaboration, and careful integration into clinical practice are required before widespread adoption.
(This article belongs to the Section AI in Imaging)

19 pages, 2043 KB  
Article
A TV–BM3D Iterative Algorithm for VMAT-CT Reconstruction
by Chia-Lung Chien, Beibei Guo and Rui Zhang
J. Imaging 2026, 12(4), 166; https://doi.org/10.3390/jimaging12040166 - 10 Apr 2026
Viewed by 720
Abstract
Volumetric modulated arc therapy-computed tomography (VMAT-CT), which is the CT reconstructed using the portal images collected during VMAT, can potentially be an effective onsite imaging tool. The goal of this study was to propose an iterative reconstruction algorithm that can further improve the image quality of VMAT-CT and reduce the number of failed reconstructions. An iterative algorithm combining total variation (TV) with block-matching and 3D filtering (BM3D) was proposed, addressing the L1-L2 regularization problem using the split Bregman method. We collected portal images from 67 VMAT cases including 50 phantom and 17 real-patient cases. Both Feldkamp–Davis–Kress (FDK) and TV-BM3D iterative algorithms were used to reconstruct VMAT-CT using the collected images. The preprocessing methods developed by our group previously were also used in this study. A total of 48 out of 50 phantom cases and 15 out of 17 real-patient cases were successfully reconstructed using the iterative algorithm together with image preprocessing. In contrast, 39 phantom cases and 8 patient cases could be reconstructed using the original FDK algorithm, and 44 phantom cases and 11 patient cases could be reconstructed using the FDK algorithm together with preprocessing. Compared with the FDK algorithm, the TV-BM3D iterative algorithm significantly improved the image quality of VMAT-CT at all treatment sites. To the best of our knowledge, this study is the first to develop an iterative VMAT-CT reconstruction algorithm. It can be used to reconstruct CT images locally, and is superior to FDK-based algorithms in terms of the success rate and reconstructed image quality. This strongly supports the use of VMAT-CT as a promising imaging tool for treatment monitoring and adaptive radiotherapy.
(This article belongs to the Section Medical Imaging)
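The TV–BM3D algorithm is described as solving an L1-L2 regularization problem with the split Bregman method. As a heavily simplified illustration, the Python sketch below applies split Bregman to 1D total-variation denoising, min_u (mu/2)||u - f||^2 + ||Du||_1, by introducing an auxiliary variable d = Du. It is not the authors' reconstruction: the CT projection operator, the BM3D step, and the preprocessing are all omitted, and the function name and the values of mu, lam, and the iteration count are assumptions.

```python
import numpy as np


def tv_denoise_split_bregman(f, mu=2.0, lam=1.0, n_iter=100):
    """Illustrative 1D TV denoising, min_u (mu/2)||u - f||^2 + ||D u||_1,
    solved by split Bregman with the splitting d = D u (hypothetical sketch,
    not the paper's TV-BM3D CT reconstruction)."""
    n = f.size
    # Forward-difference operator D of shape (n-1, n): (D u)[i] = u[i+1] - u[i]
    D = np.diff(np.eye(n), axis=0)
    # The u-subproblem is quadratic; its system matrix is fixed across iterations
    A = mu * np.eye(n) + lam * D.T @ D
    d = np.zeros(n - 1)
    b = np.zeros(n - 1)
    u = f.copy()
    for _ in range(n_iter):
        # u-update: (mu I + lam D^T D) u = mu f + lam D^T (d - b)
        u = np.linalg.solve(A, mu * f + lam * D.T @ (d - b))
        v = D @ u + b
        # d-update: soft-thresholding (shrinkage) handles the L1 term
        d = np.sign(v) * np.maximum(np.abs(v) - 1.0 / lam, 0.0)
        # Bregman update: b <- b + D u - d
        b = v - d
    return u
```

On a noisy step signal, the result should be a piecewise-flat estimate closer to the clean step than the noisy input; in a CT setting the identity in the data term would be replaced by a projection operator.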

26 pages, 1385 KB  
Article
Probabilistic Short-Term Sky Image Forecasting Using VQ-VAE and Transformer Models on Sky Camera Data
by Chingiz Seyidbayli, Soheil Nezakat and Andreas Reinhardt
J. Imaging 2026, 12(4), 165; https://doi.org/10.3390/jimaging12040165 - 10 Apr 2026
Viewed by 985
Abstract
Cloud cover significantly reduces the electrical power output of photovoltaic systems, making accurate short-term cloud movement predictions essential for reliable solar energy production planning. This article presents a deep learning framework that directly estimates cloud movement from ground-based all-sky camera images, rather than predicting future production from past power data. The system is based on a three-step process: First, a lightweight Convolutional Neural Network segments cloud regions and produces probabilistic masks that represent the spatial distribution of clouds in a compact and computationally efficient manner. This allows subsequent models to focus on the geometry of clouds rather than irrelevant visual features such as illumination changes. Second, a Vector Quantized Variational Autoencoder compresses these masks into discrete latent token sequences, reducing dimensionality while preserving fundamental cloud structure patterns. Third, a GPT-style autoregressive transformer learns temporal dependencies in this token space and predicts future sequences based on past observations, enabling iterative multi-step predictions, where each prediction serves as the input for subsequent time steps. Our evaluations show an average intersection-over-union ratio of 0.92 and a pixel accuracy of 0.96 for single-step (5 s ahead) predictions, while performance smoothly decreases to an intersection-over-union ratio of 0.65 and an accuracy of 0.80 at a 10 min horizon of autoregressive propagation. The framework also provides prediction uncertainty estimates through token-level entropy measurement, which shows a positive correlation with prediction error and serves as a confidence indicator for downstream decision-making in solar energy forecasting applications.
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)
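The reported intersection-over-union and pixel accuracy compare predicted and reference cloud masks, and uncertainty comes from token-level entropy. A minimal Python sketch of both quantities follows, assuming binarized masks and assuming that token-level entropy means the Shannon entropy of the transformer's softmax distribution over codebook entries for each predicted token; how the paper thresholds masks or aggregates entropy per frame is not stated in the abstract.

```python
import numpy as np


def iou_and_pixel_accuracy(pred, target):
    """IoU and pixel accuracy between two binary cloud masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Two empty masks agree perfectly; define IoU = 1 in that case
    iou = intersection / union if union > 0 else 1.0
    accuracy = (pred == target).mean()
    return float(iou), float(accuracy)


def token_entropy(logits):
    """Shannon entropy (nats) of each token's predicted codebook distribution.

    logits: array of shape (num_tokens, codebook_size). Assumed definition;
    higher values indicate a less confident prediction for that token."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=-1)
```

A uniform distribution over a codebook of size K gives the maximum entropy log K, while a sharply peaked prediction gives a value near zero.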

14 pages, 2724 KB  
Article
High-Resolution Measurement of Surface Normal Maps Using Specular Reflection Imaging
by Shinichi Inoue, Yoshinori Igarashi and Seiji Suzuki
J. Imaging 2026, 12(4), 164; https://doi.org/10.3390/jimaging12040164 - 10 Apr 2026
Viewed by 470
Abstract
This paper presents a method for measuring the spatial distribution of surface normal vectors with high angular accuracy. The measured data are visualized using a color-mapping technique and represented as normal maps, which are commonly used in computer graphics. Reliable methods for evaluating material surface properties have long been sought in industrial applications where visual assessments of reflective properties are still widely employed, particularly in appearance-critical fields. Motivated by this need, we introduce an imaging-based technique for measuring the high-resolution spatial distribution of surface normal vectors from specular reflection. A dedicated measurement apparatus was developed to capture surface normal vectors at 1024 × 1024 sampling points with a spatial resolution of 0.02 × 0.02 mm and an angular resolution of approximately 0.1°. Using this apparatus, normal maps were obtained for various materials, including plastic, ceramic tile, inkjet paper, and aluminum sheets. The spatial distribution of surface normal vectors reflects surface roughness, which strongly influences perceived texture. The resulting normal maps enable not only quantitative surface analysis for industrial inspection but also the physical reproduction of gloss in computer graphics.
(This article belongs to the Section Visualization and Computer Graphics)
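Specular-reflection measurement rests on the mirror-reflection law: when a camera sees the specular highlight of a light source at a surface point, the surface normal there bisects the directions toward the light and toward the camera. The sketch below illustrates that relation together with the common computer-graphics normal-map encoding, which maps each normal component from [-1, 1] to [0, 255]. The apparatus geometry is not given in the abstract, so known light and view directions per sample point, and this particular color mapping, are assumptions.

```python
import numpy as np


def normal_from_specular(light_dir, view_dir):
    """Surface normal for an ideal mirror reflection: the normalized 'half
    vector' between the direction to the light and the direction to the
    camera (assumes both directions are known at the sample point)."""
    l = np.array(light_dir, dtype=float)
    v = np.array(view_dir, dtype=float)
    l = l / np.linalg.norm(l)
    v = v / np.linalg.norm(v)
    h = l + v
    return h / np.linalg.norm(h)


def normal_to_rgb(n):
    """Common normal-map encoding (assumed here): component x in [-1, 1]
    becomes channel value round((x + 1) / 2 * 255)."""
    n = np.asarray(n, dtype=float)
    return np.round((n + 1.0) / 2.0 * 255.0).astype(np.uint8)
```

For example, a light tilted 45° from a camera looking straight down implies a normal tilted 22.5°, half the angle between the two directions, which is why the angular resolution of the normal is tied to the angular resolution of the light/camera geometry.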

16 pages, 2524 KB  
Article
A Robust Rule-Based Framework for Stone Detection and Posterior Acoustic Shadow Localization in Abdominal Ultrasound
by Kyuseok Kim and Ji-Youn Kim
J. Imaging 2026, 12(4), 163; https://doi.org/10.3390/jimaging12040163 - 9 Apr 2026
Viewed by 924
Abstract
Posterior acoustic shadowing is a fundamental physical phenomenon associated with calcified stones in ultrasound images, yet it has not been fully exploited in automated ultrasound analysis. This study aimed to develop an explainable, semi-automatic rule-based framework that explicitly incorporates posterior acoustic shadow characteristics for stone detection and localization in a clinically guided manner. A rule-based framework was designed to generate stone candidates using morphological enhancement and to evaluate them through local contrast analysis, posterior shadow region assessment, and shape-based penalties. A composite score integrating these features was used to rank candidates. The method was evaluated on 52 kidney stone and 66 gallbladder stone ultrasound images, stratified into three diagnostic confidence categories. Performance was assessed using an ablation study and centroid distance error measured in pixels relative to expert-defined references. In the 50–60% confidence group, the accuracy increased from 0.29 to 0.64 for kidney stones and from 0.30 to 0.60 for gallbladder stones when posterior shadow information was included. Centroid distance errors in the ≥80% confidence group were 1.26 ± 0.28 mm for kidney stones and 1.44 ± 0.91 mm for gallbladder stones. The proposed framework enhances diagnostic confidence by leveraging physically grounded posterior acoustic shadow analysis and provides a reproducible augmentation to conventional ultrasound-based stone assessment.
(This article belongs to the Section Medical Imaging)
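The abstract combines local contrast, posterior shadow evidence, and shape penalties into one composite score for ranking candidates, but it does not give the regions, weights, or penalty form. The following Python sketch is a hypothetical instance of that idea, assuming image rows increase with depth, a rectangular candidate box, lateral reference regions as wide as the box, and an aspect-ratio penalty; none of these choices come from the paper.

```python
import numpy as np


def candidate_score(img, box, shadow_depth=20, w_contrast=1.0, w_shadow=1.0, w_shape=0.5):
    """Hypothetical composite score for one bright stone candidate.

    img: 2D grayscale array in [0, 1] whose rows increase with depth.
    box: (r0, r1, c0, c1) bounding box of the candidate.
    Regions, weights and the shape penalty are illustrative assumptions."""
    r0, r1, c0, c1 = box
    h, w = r1 - r0, c1 - c0
    inside = img[r0:r1, c0:c1].mean()
    # Local contrast: candidate vs. laterally adjacent tissue at the same depth
    lateral = np.concatenate([
        img[r0:r1, max(c0 - w, 0):c0].ravel(),
        img[r0:r1, c1:c1 + w].ravel(),
    ]).mean()
    contrast = inside - lateral
    # Posterior shadow: region directly behind (deeper than) the candidate
    # compared with neighbouring columns at the same depth
    below = img[r1:r1 + shadow_depth, c0:c1].mean()
    side = np.concatenate([
        img[r1:r1 + shadow_depth, max(c0 - w, 0):c0].ravel(),
        img[r1:r1 + shadow_depth, c1:c1 + w].ravel(),
    ]).mean()
    shadow = side - below  # positive when the region behind the candidate is darker
    # Shape penalty: zero for a square box, growing as the box becomes elongated
    shape_penalty = abs(np.log(h / w))
    return w_contrast * contrast + w_shadow * shadow - w_shape * shape_penalty
```

Candidates would then be ranked by sorting their scores in descending order; in this toy formulation a bright spot with a dark region behind it outranks an equally bright spot without one.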

19 pages, 1466 KB  
Article
D2MNet: Difference-Aware Decoupling and Multi-Prompt Learning for Medical Difference Visual Question Answering
by Lingge Lai, Weihua Ou, Jianping Gou and Zhonghua Liu
J. Imaging 2026, 12(4), 162; https://doi.org/10.3390/jimaging12040162 - 9 Apr 2026
Viewed by 586
Abstract
Difference visual question answering (Diff-VQA) aims to answer questions by identifying and reasoning about differences between medical images. Existing methods often rely on simple feature subtraction or fusion to model image differences, while overlooking the asymmetric descriptive requirements of changed and unchanged cases and providing limited task-specific guidance to pretrained language decoders. To address these limitations, we propose D2MNet (Difference-aware Decoupling and Multi-prompt Network), a framework for medical Diff-VQA that combines change-aware reasoning with prompt-guided answer generation. Specifically, a Change Analysis Module (CAM) predicts whether a change is present and produces a binary change-aware prompt; a Difference-Aware Module (DAM) uses dual attention to capture fine-grained difference features; and a multi-prompt learning mechanism (MLM) injects question-aware, change-aware, and learnable prompts into the language decoder to improve contextual alignment and response generation. Experiments on the MIMIC-DiffVQA benchmark show that D2MNet achieves a CIDEr score of 2.907 ± 0.040, outperforming the strongest baseline, ReAl (2.409), under the same evaluation setting. These results demonstrate the effectiveness of the proposed design on benchmark medical Diff-VQA and suggest its potential for assisting difference-aware medical answer generation.
(This article belongs to the Section Medical Imaging)

23 pages, 9554 KB  
Article
RegionGraph: Region-Aware Graph-Based Building Reconstruction from Satellite Imagery
by Lei Li, Chenrong Fang, Wei Li, Kan Chen, Baolong Li and Qian Sun
J. Imaging 2026, 12(4), 161; https://doi.org/10.3390/jimaging12040161 - 8 Apr 2026
Viewed by 670
Abstract
Structural reconstruction helps infer the spatial relationships and object layouts in a scene, which is an essential computer vision task for understanding visual content. However, it remains challenging due to the high complexity of scene structural topologies in real-world environments. To address this challenge, this paper proposes RegionGraph, a novel method for structural reconstruction of buildings from a satellite image. It utilizes a layout region graph construction and graph contraction approach, introducing a primitive (layout region) estimation network named ConPNet for detecting and estimating different structural primitives. By combining structural extraction and rendering synthesis processes, RegionGraph constructs a graph structure with layout regions as nodes and adjacency relationships as edges, and transforms the graph optimization process into a node-merging-based graph contraction problem to obtain the final structural representation. Experiments demonstrated that RegionGraph achieves a 4% improvement in average F1 scores across three types of primitives and exhibits higher regional completeness and structural coherence in the reconstructed structure.

14 pages, 2627 KB  
Article
Comparative Assessment of Hyperspectral Image Segmentation Algorithms for Fruit Defect Detection Under Different Illumination Conditions
by Anastasia Zolotukhina, Anton Sudarev, Georgiy Nesterov and Demid Khokhlov
J. Imaging 2026, 12(4), 160; https://doi.org/10.3390/jimaging12040160 - 8 Apr 2026
Viewed by 562
Abstract
This study presents a comparative analysis of hyperspectral image segmentation algorithms for fruit defect detection under different illumination conditions. The research evaluates the performance of four segmentation methods (Spectral Angle Mapper, Random Forest, Support Vector Machine, and Neural Network) using three distinct illumination modes (local, simultaneous and sequential). The experimental setup employed hyperspectral imaging to assess tomato fruit samples, with data acquisition performed across the 450–850 nm spectral range. Quantitative metrics, including accuracy, error rate, precision, recall, F1-score, and Intersection over Union (IoU), were used to evaluate algorithm performance. Key findings indicate that Random Forest demonstrated superior performance across most metrics, particularly under simultaneous illumination conditions. The highest accuracy was achieved by Random Forest under sequential illumination (0.9971), while the best combination of segmentation metrics was obtained under simultaneous illumination, with an F1-score of 0.8996 and an IoU of 0.8176. The Neural Network showed competitive results. The Spectral Angle Mapper proved sensitive to illumination variations but excelled in specific scenarios requiring minimal memory usage. By demonstrating that acquisition protocol optimization can substantially improve segmentation performance, our results support the development of accurate, non-contact, high-throughput inspection systems and contribute to reducing postharvest losses and improving supply chain quality control.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
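Of the four methods compared, the Spectral Angle Mapper has a simple closed form: each pixel spectrum is assigned to the reference spectrum with which it forms the smallest angle. The sketch below is a generic SAM classifier in Python; the reference spectra (for example, mean spectra of sound and defective tissue) are assumed inputs, since the abstract does not say how they were obtained.

```python
import numpy as np


def spectral_angle(pixel, reference):
    """Spectral angle (radians) between a pixel spectrum and a reference spectrum."""
    x = np.asarray(pixel, dtype=float)
    r = np.asarray(reference, dtype=float)
    cos = x @ r / (np.linalg.norm(x) * np.linalg.norm(r))
    # Clip guards against values slightly outside [-1, 1] from rounding
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def sam_classify(cube, references):
    """Label every pixel of a (rows, cols, bands) cube with the index of the
    reference spectrum that forms the smallest spectral angle with it."""
    flat = cube.reshape(-1, cube.shape[-1]).astype(float)  # (pixels, bands)
    refs = np.asarray(references, dtype=float)             # (classes, bands)
    cos = (flat @ refs.T) / (
        np.linalg.norm(flat, axis=1, keepdims=True) * np.linalg.norm(refs, axis=1)
    )
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return angles.argmin(axis=1).reshape(cube.shape[:-1])
```

Because the angle ignores vector length, SAM is unaffected by a uniform scaling of intensity, but any illumination change that alters the shape of the measured spectrum still changes the angle. SAM also stores only one spectrum per class, consistent with its small memory footprint.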

18 pages, 6837 KB  
Article
Experimental Analysis of the Effects of Image Lightness and Chroma Modulation on the Reproduction of Glossiness, Transparency and Roughness
by Hideyuki Ajiki and Midori Tanaka
J. Imaging 2026, 12(4), 159; https://doi.org/10.3390/jimaging12040159 - 8 Apr 2026
Viewed by 585
Abstract
Even when an object’s color is accurately reproduced in a colorimetrically reproduced image (CRI), the perceived material appearance does not necessarily match that of the original object. This mismatch remains a challenge for faithfully reproducing real-world appearance in digital media. In this study, we investigated how lightness and chroma modulation affect the perception of glossiness, transparency, and roughness. These three attributes were quantitatively correlated with physical surface properties and image features through a direct comparison between objects and images. Observers selected the images that best matched the material appearance of the physical samples for each attribute. Image features derived from the gray-level co-occurrence matrix (GLCM) and surface roughness parameters were analyzed to compare the selected images with the CRI. In the lightness experiment, observers consistently selected images with higher lightness than the CRI, which was accompanied by increased complexity in the luminance distribution. In the chroma experiment, images with higher chroma were preferred; however, changes in GLCM features were negligible. Notably, stimuli with small local luminance differences at the CRI required larger shifts in image features to achieve perceptual matching. These findings indicate that modulating the luminance distribution is crucial for aligning the perceived appearance between physical objects and their digital representations.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
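The selected images were compared with the CRI through GLCM features. The Python sketch below computes a symmetric, normalized GLCM for one pixel offset and four common Haralick-style features; which offsets, gray-level quantization, and features the study actually used are not stated in the abstract, so these choices are assumptions.

```python
import numpy as np


def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one
    non-negative offset (dx, dy). img: 2D integer array with values in [0, levels)."""
    img = np.asarray(img)
    rows, cols = img.shape
    a = img[0:rows - dy, 0:cols - dx]  # reference pixels
    b = img[dy:rows, dx:cols]          # neighbours at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)  # count each (i, j) pair
    m = m + m.T  # count pairs in both directions
    return m / m.sum()


def glcm_features(p):
    """Contrast, homogeneity, energy and entropy (bits) of a normalized GLCM."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
        "energy": float((p ** 2).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }
```

A flat patch gives zero contrast and entropy, while a patch with large local luminance differences raises contrast and entropy; this is the kind of "complexity in the luminance distribution" that GLCM features quantify.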

21 pages, 11316 KB  
Article
Multimodal Fusion Prediction of Radiation Pneumonitis via Key Pre-Radiotherapy Imaging Feature Selection Based on Dual-Layer Attention Multiple-Instance Learning
by Hao Wang, Dinghui Wu, Shuguang Han, Jingli Tang and Wenlong Zhang
J. Imaging 2026, 12(4), 158; https://doi.org/10.3390/jimaging12040158 - 8 Apr 2026
Cited by 1 | Viewed by 605
Abstract
Radiation pneumonitis (RP), one of the most common and severe complications in locally advanced non-small cell lung cancer (LA-NSCLC) patients following thoracic radiotherapy, presents significant challenges in prediction due to the complexity of clinical risk factors, incomplete multimodal data, and unavailable slice-level annotations in pre-radiotherapy CT images. To address these challenges, we propose a multimodal fusion framework based on Dual-Layer Attention-Based Adaptive Bag Embedding Multiple-Instance Learning (DAAE-MIL) for accurate RP prediction. This study retrospectively collected data from 995 LA-NSCLC patients who received thoracic radiotherapy between November 2018 and April 2025. After screening, 670 patients were included: 535 were allocated for training, and the remaining 135 were reserved for an independent test set. The proposed framework first extracts pre-radiotherapy CT image features using a fine-tuned C3D network, followed by the DAAE-MIL module to screen critical instances and generate bag-level representations, thereby enhancing the accuracy of deep feature extraction. Subsequently, clinical data, radiomics features, and CT-derived deep features are integrated to construct a multimodal prediction model. The proposed model demonstrates promising RP prediction performance across multiple evaluation metrics, outperforming both state-of-the-art and unimodal RP prediction approaches. On the test set, it achieves an accuracy (ACC) of 0.93 and an area under the curve (AUC) of 0.97. This study validates that the proposed method effectively addresses the limitations of single-modal prediction and the unknown key features in pre-radiotherapy CT images while providing significant clinical value for RP risk assessment.
(This article belongs to the Section Medical Imaging)
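DAAE-MIL uses attention to weight CT slices (instances) within a patient-level bag. Its dual-layer design is not detailed in the abstract, so the sketch below shows only the widely used single-layer attention MIL pooling of Ilse et al. (2018), with randomly drawn parameters standing in for learned ones, plus a hypothetical helper that selects the most attended slices. It illustrates the general mechanism, not the authors' module.

```python
import numpy as np


def attention_mil_pool(instances, V, w):
    """Single-layer attention MIL pooling (generic, not DAAE-MIL itself).

    instances: (K, d) feature vectors of the K slices in one bag (patient).
    V: (h, d) projection matrix; w: (h,) attention vector.
    Returns the bag embedding (d,) and the attention weights (K,)."""
    scores = np.tanh(instances @ V.T) @ w  # one scalar score per instance
    scores = scores - scores.max()         # stabilize the softmax
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ instances, a                # attention-weighted average


def top_k_instances(a, k):
    """Hypothetical helper: indices of the k most attended instances."""
    return np.argsort(a)[::-1][:k]
```

Because the weights form a softmax over slices, the bag embedding is a convex combination of slice features, and slices with the largest weights are natural candidates for the "critical instances" the abstract refers to.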

31 pages, 6317 KB  
Article
A Method for Human Pose Estimation and Joint Angle Computation Through Deep Learning
by Ludovica Ciardiello, Patrizia Agnello, Marta Petyx, Fabio Martinelli, Mario Cesarelli, Antonella Santone and Francesco Mercaldo
J. Imaging 2026, 12(4), 157; https://doi.org/10.3390/jimaging12040157 - 6 Apr 2026
Viewed by 1406
Abstract
Human pose estimation is a crucial task in computer vision with widespread applications in healthcare, rehabilitation, sports, and remote monitoring. In this paper, we propose a deep learning-based method for automatic human pose estimation and joint angle computation, tailored specifically for physiotherapy and telemedicine scenarios. Beyond pose estimation, the proposed method is able to compute angles between joints, enabling analysis of body alignment and posture. The proposed approach is built upon a customized skeleton with 25 anatomical keypoints and a dataset composed of over 150,000 annotated and augmented images derived from multiple open-source datasets. Experimental results demonstrate the effectiveness of the proposed method, achieving a mAP@50 of 0.58 for keypoint localization and 0.98 for object detection. Moreover, we demonstrate several real-world use cases of the proposed method in evaluating exercise correctness and identifying postural deviations, suggesting that it is a promising approach for automated motion analysis, with potential impact on digital health, rehabilitation support, and remote patient care.
(This article belongs to the Section AI in Imaging)
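Joint angles from estimated keypoints are typically computed as the angle at a middle keypoint between the two limb segments that meet there. The following sketch shows that computation, assuming 2D or 3D keypoint coordinates are already available; the 25-keypoint skeleton layout and which keypoint triplets the method uses are not specified in the abstract.

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b between segments b->a and b->c,
    e.g. hip-knee-ankle for a knee angle (keypoint choice is illustrative)."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = ba @ bc / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # Clip guards against rounding just outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

A fully extended limb gives 180° and a right-angle bend gives 90°; in 2D image coordinates the result is the projected angle, which can differ from the true anatomical angle when the limb is not parallel to the image plane.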

17 pages, 5235 KB  
Article
An Effective Non-Rigid Registration Approach for Ultrasound Images Based on the Improved Variational Model of Intensity, Local Phase Information and Descriptor Matching
by Kun Zhang, Jinming Xing and Qingtai Xiao
J. Imaging 2026, 12(4), 156; https://doi.org/10.3390/jimaging12040156 - 3 Apr 2026
Viewed by 677
Abstract
Ultrasound image registration is an important task for estimating tissue motion and analyzing tissue mechanical properties, but it is hampered by the limitations of ultrasound images, such as low signal-to-noise ratio (SNR), speckle noise, low dynamic range, blurred boundaries, and shadowing. In this paper, an effective non-rigid ultrasound image registration method is proposed. By integrating intensity, local phase information, and descriptor matching under a variational framework, we can find and track the non-rigid transformation of each pixel under diffeomorphism between the source and target images based on the warping technique. Experiments using simulated and in vivo ultrasound images of the human carotid artery are conducted to demonstrate the advantages of the proposed algorithm, which can serve as an important supplement to current ultrasound image registration methods.
(This article belongs to the Section Image and Video Processing)

18 pages, 1365 KB  
Article
DA-CycleGAN: Degradation-Adaptive Unpaired Super-Resolution for Historical Image Restoration
by Lujun Zhai, Yonghui Wang, Yu Zhou and Suxia Cui
J. Imaging 2026, 12(4), 155; https://doi.org/10.3390/jimaging12040155 - 3 Apr 2026
Viewed by 820
Abstract
Historical images, as the dominant means of documenting the world and its inhabitants, can help us better understand real history. Owing to limited camera technology, historical images captured in the early to mid-20th century tend to be very blurry, unclear, noisy, and obscure. The goal of this paper is to super-resolve images for historical image restoration. Compared to the degradations in modern digital imagery, those in historical images have unique features that are typically much more complex and less well understood. The discrepancy between historical images and modern high-definition digital images leads to a significant performance drop for existing super-resolution (SR) models trained on modern digital imagery. To tackle this problem, we propose a new method, namely DA-CycleGAN. Specifically, DA-CycleGAN is built on top of CycleGAN to achieve unsupervised learning. We introduce a degradation-adaptive (DA) module with strong, flexible adaptation to learn various unknown degradations from samples. Moreover, we collect a large dataset containing 10,000 low-resolution images from real historical films. The dataset features various natural degradations. Our experimental results demonstrate the superior performance of DA-CycleGAN and the effectiveness of our image dataset for achieving accurate super-resolution enhancement of historical images.
(This article belongs to the Section Computer Vision and Pattern Recognition)

26 pages, 6199 KB  
Article
WeatherMAR: Complementary Masking of Paired Tokens for Adverse-Weather Image Restoration
by Junyuan Ma, Qunbo Lv and Zheng Tan
J. Imaging 2026, 12(4), 154; https://doi.org/10.3390/jimaging12040154 - 2 Apr 2026
Viewed by 731
Abstract
Image restoration under adverse weather conditions has attracted increasing attention because of its importance for both human perception and downstream vision applications. Existing methods, however, are often designed for a single degradation type. We present WeatherMAR, a multi-weather restoration framework that formulates adverse-weather restoration as a paired-domain completion problem in a shared continuous token space. Specifically, WeatherMAR concatenates degraded and clean token sequences into a joint paired-domain sequence and performs restoration through masked autoregressive modeling, in which self-attention enables direct cross-domain interaction. To strengthen conditional learning while avoiding trivial paired correspondences, we introduce complementary bidirectional masking together with an optional reverse objective used only during training to encourage degradation-aware representations. WeatherMAR further employs a conditional diffusion objective for continuous token prediction and adopts a progress-to-step schedule to improve inference efficiency. Extensive experiments on standard multi-weather benchmarks, including Snow100K, Outdoor-Rain, and RainDrop, show that WeatherMAR achieves the best PSNR/SSIM on Snow100K-S (38.14/0.9684), the best SSIM on Outdoor-Rain (0.9396), and the best PSNR on Snow100K-L (32.58) and RainDrop (33.12). These results demonstrate that paired-domain token completion provides an effective solution for adverse-weather restoration.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
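The abstract motivates complementary masking as a way to avoid trivial paired correspondences in the concatenated degraded/clean token sequence. One plausible reading, assumed here rather than taken from the paper, is that a position hidden in one domain is always visible in the other, so the model cannot simply copy a token from its counterpart at the same position. The Python sketch below builds such masks; the mask ratio, token shapes, and function names are hypothetical.

```python
import numpy as np


def complementary_masks(num_tokens, mask_ratio, rng):
    """Hypothetical complementary masks for a paired (degraded, clean) sequence.

    A random subset of positions is hidden in the clean sequence, and the
    degraded sequence is hidden exactly at the remaining positions, so every
    position is visible in exactly one domain (assumed interpretation)."""
    num_masked = int(round(mask_ratio * num_tokens))
    perm = rng.permutation(num_tokens)
    clean_mask = np.zeros(num_tokens, dtype=bool)
    clean_mask[perm[:num_masked]] = True  # True = hidden
    degraded_mask = ~clean_mask
    return degraded_mask, clean_mask


def joint_sequence(degraded_tokens, clean_tokens):
    """Concatenate two (N, D) token sequences into one (2N, D) paired-domain sequence."""
    return np.concatenate([degraded_tokens, clean_tokens], axis=0)
```

Under this reading, the masks for the joint sequence would be the concatenation of the two masks in the same order as the tokens, and self-attention must transfer information across positions rather than copy it within a position.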

42 pages, 10149 KB  
Article
Radon-Guided Wavelet-Domain Attention U-Net for Periodic Artifact Suppression in Brain MRI
by Jesus David Rios-Perez, German Sanchez-Torres, John W. Branch-Bedoya and Camilo Andres Laiton-Bonadiez
J. Imaging 2026, 12(4), 153; https://doi.org/10.3390/jimaging12040153 - 2 Apr 2026
Viewed by 1336
Abstract
Periodic artifacts such as ringing (Gibbs), herringbone (spike/corduroy), and zipper patterns degrade the quality of brain MRI. We present a reproducible framework that (i) synthetically generates periodic artifacts with controllable severity directly in k-space, (ii) normalizes pattern orientation through a Radon-guided alignment step, and (iii) corrects them in the wavelet domain using a 2D DWT (AA/AD/DA/DD) with a band-weighted loss. The evaluation was conducted using DLBS T1-weighted 3T MRI volumes with synthetically generated periodic artifacts. It combined global image-quality metrics (SSIM, PSNR) with per-band metrics to quantify how correction concentrates on high-frequency components, and included ablation studies, mixed-artifact stress tests, and structural preservation analyses. Compared with several baseline architectures, the proposed approach shows improvements in structural fidelity and a reduction in periodic patterns (SSIM: 0.985 ± 0.022; PSNR: 43.337 ± 5.364 dB; reduced error concentrated in high-frequency bands), while preserving unaffected structures. These findings indicate that, within a controlled synthetic benchmark, aligning the pattern orientation prior to learning and optimizing correction in the wavelet domain enables suppression of synthetically generated periodic artifacts while limiting over-smoothing.
(This article belongs to the Section Medical Imaging)
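Correction is performed in the wavelet domain with a 2D DWT (AA/AD/DA/DD subbands) and a band-weighted loss. The sketch below implements a one-level orthonormal Haar DWT and a band-weighted L1 loss in Python. The wavelet family, the mapping of the AA/AD/DA/DD labels to row/column filtering order, the use of L1, and the weights are all assumptions; the Radon-guided alignment step is not shown.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT of an image with even height and width.

    Labels (assumed convention): first letter = row filter, second = column
    filter, A = approximation (sum), D = detail (difference)."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(2.0)
    # Filter pairs of rows
    lo = (x[0::2, :] + x[1::2, :]) / s
    hi = (x[0::2, :] - x[1::2, :]) / s
    # Filter pairs of columns of each result
    return {
        "AA": (lo[:, 0::2] + lo[:, 1::2]) / s,
        "AD": (lo[:, 0::2] - lo[:, 1::2]) / s,
        "DA": (hi[:, 0::2] + hi[:, 1::2]) / s,
        "DD": (hi[:, 0::2] - hi[:, 1::2]) / s,
    }


def band_weighted_l1(pred, target, weights):
    """Sum over bands of weight * mean absolute error in that band
    (weights is a dict such as {"AA": 0.5, "DD": 2.0}; values are hypothetical)."""
    bp, bt = haar_dwt2(pred), haar_dwt2(target)
    return sum(weights[k] * np.abs(bp[k] - bt[k]).mean() for k in weights)
```

A column-periodic stripe pattern lands entirely in one detail band, which is why weighting the high-frequency bands more heavily can focus correction on periodic artifacts while the approximation band, carrying most anatomical content, is penalized less.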

25 pages, 15197 KB  
Article
Semi-Automated Computational Identification of Fibrosis for Enhanced Histopathological Decision Support
by Alexandru-George Berciu, Diana Rus-Gonciar, Teodora Mocan, Lucia Agoston-Coldea, Carmen Cionca and Eva-Henrietta Dulf
J. Imaging 2026, 12(4), 152; https://doi.org/10.3390/jimaging12040152 - 31 Mar 2026
Viewed by 476
Abstract
Myocardial fibrosis is a critical prognostic marker involving a progressive cascade of pathological conditions. Accurate assessment of fibrosis in myocardial samples is a routine but difficult procedure for pathologists. This article presents a semi-automated system designed to ease this task while providing pixel-level accuracy that exceeds manual estimation capabilities. The proposed innovative approach combines Gabor filters with CIELAB color space analysis to ensure the efficiency and interpretability of calculations. Testing on histopathological samples, differentiating between fibrous, healthy, and variant tissues, yielded a promising accuracy of 87.5% for images with fibrosis and 80% for all 45 images tested. This system successfully establishes a solid foundation for automated diagnosis, providing pathologists with a reliable and highly accurate tool for quantitative analysis of cardiac tissue.
(This article belongs to the Section AI in Imaging)
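The system combines Gabor filters with CIELAB color analysis. As a generic illustration, the sketch below builds the real part of a 2D Gabor kernel, a Gaussian envelope multiplied by an oriented cosine, which responds most strongly to texture whose intensity oscillates along the chosen orientation theta. Parameter values are assumptions; the abstract gives neither the filter bank nor how Gabor responses and CIELAB channels are combined. For the color side, scikit-image's rgb2lab is one existing conversion routine.

```python
import numpy as np


def gabor_kernel(sigma, theta, wavelength, gamma=0.5, psi=0.0, radius=None):
    """Real part of a 2D Gabor filter (illustrative parameters).

    sigma: Gaussian envelope width; theta: orientation in radians;
    wavelength: period of the cosine carrier in pixels; gamma: aspect ratio;
    psi: phase offset. Kernel size is (2 * radius + 1) squared."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + psi)
    return envelope * carrier
```

Correlating an image patch with kernels at several orientations and keeping the strongest response is a common way to highlight oriented structures such as collagen fibers, whose color can then be checked in CIELAB space.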

15 pages, 1260 KB  
Article
Radiomic Characterization of Adrenal Incidentalomas on NECT: Retrospective Exploratory Study and Systematic Review
by Pasquale Frisina, Paolo Ricci, Filippo Valentini and Daniela Messineo
J. Imaging 2026, 12(4), 151; https://doi.org/10.3390/jimaging12040151 - 30 Mar 2026
Viewed by 642
Abstract
Radiomics may aid the noninvasive characterization of adrenal incidentalomas; however, reproducibility is limited by methodological heterogeneity. In this retrospective, single-center, exploratory study, we tested whether radiomic features from baseline non-enhanced computed tomography (NECT) discriminate benign from malignant/metastatic adrenal lesions, and we contextualized the results with a PRISMA 2020 systematic review (PubMed/Scopus 2017–2025; PROSPERO CRD420251276627). Thirty-three patients (36 lesions: 12 lipid-rich adenomas, 9 lipid-poor adenomas, 6 pheochromocytomas, 7 malignant/metastatic lesions, 2 myelolipomas) were included; myelolipomas were excluded from primary comparisons. Two abdominal radiologists performed consensus 3D segmentation on NECT. Using LIFEx (v7.8.0) and IBSI definitions, 42 features were extracted and z-score standardized. LASSO selected four heterogeneity descriptors: first-order entropy, gray-level co-occurrence matrix (GLCM) entropy, gray-level size zone matrix (GLSZM) non-uniformity, and neighboring gray tone difference matrix (NGTDM) busyness. Heterogeneity increased from lipid-rich adenomas to pheochromocytomas and malignant/metastatic lesions (Kruskal–Wallis, all p < 0.001). Pairwise separability, measured with the Vargha–Delaney A index (VDA), a rank-based effect size, was highest for lipid-rich adenomas versus malignant/metastatic lesions (0.93), intermediate for lipid-poor adenomas versus pheochromocytomas (0.73), and lowest for lipid-rich versus lipid-poor adenomas (0.64). The review identified 18 eligible CT radiomics studies that consistently reported higher entropy/non-uniformity in pheochromocytomas and malignant lesions than in lipid-rich adenomas. Global heterogeneity metrics on NECT may complement conventional CT criteria for indeterminate lesions; external validation in larger, harmonized multicenter cohorts with robust reference standards is needed.
(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)
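Two quantities in this abstract can be stated concretely. First-order entropy is the Shannon entropy of a lesion's discretized intensity histogram, and the Vargha–Delaney A index for groups X and Y is A = P(X > Y) + 0.5·P(X = Y), so 0.5 means no separation and values near 0 or 1 mean near-complete separation. The following minimal sketch computes both in plain Python; it is not LIFEx's implementation, and the fixed bin count of 32 is an assumption, since the abstract does not state the discretization settings. The toy values in the usage lines are invented and are not study data.

```python
import math

# Minimal sketch, not LIFEx's implementation. The fixed bin count is an assumption.


def first_order_entropy(values, n_bins=32):
    """Shannon entropy (bits) of intensities after fixed-bin-number discretization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a single intensity level carries no heterogeneity
    counts = [0] * n_bins
    for v in values:
        # Map [lo, hi] onto bins 0..n_bins-1; the maximum falls into the last bin.
        idx = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def vargha_delaney_a(x, y):
    """A = P(X > Y) + 0.5 * P(X == Y) over all cross-group pairs.
    0.5 means no separation; values near 0 or 1 mean almost complete separation."""
    greater = sum(1 for xi in x for yj in y if xi > yj)
    ties = sum(1 for xi in x for yj in y if xi == yj)
    return (greater + 0.5 * ties) / (len(x) * len(y))


print(first_order_entropy([0, 1, 2, 3], n_bins=4))  # 2.0 (four equally filled bins)

# Toy per-lesion entropy values (invented for illustration, not study data).
homogeneous = [1.1, 1.3, 1.2]
heterogeneous = [1.9, 1.25, 2.1]
print(vargha_delaney_a(heterogeneous, homogeneous))  # 8/9, about 0.889
```

Because A depends only on the ordering of values across groups, it is unaffected by the z-score standardization applied before LASSO, which suits small, skewed radiomic samples.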

23 pages, 3504 KB  
Article
Spatially Time-Based Robust Tracking and Re-Identification of Kindergarten Students: A Hybrid Deep Learning Framework Combining YOLOv8n and Vision Transformer (ViT)
by Md. Rahatul Islam, Yui Kataoka, Keisuke Teramoto and Keiichi Horio
J. Imaging 2026, 12(4), 150; https://doi.org/10.3390/jimaging12040150 - 30 Mar 2026
Viewed by 974
Abstract
Detection, tracking, and re-identification (ReID) of children wearing similar uniforms in a kindergarten environment is a complex challenge for computer vision. Traditional surveillance systems and simple convolutional neural network (CNN) models often fail to distinguish children in crowds and under occlusion. To address this challenge, this study proposes a novel hybrid framework combining YOLOv8 and a Vision Transformer (ViT), using YOLOv8 for detection and ViT for global feature extraction. We trained the model on a custom dataset of 31,521 images, where it achieved an overall accuracy of 93.75%, and on the public MOT20 benchmark dataset of 28,630 images, where it achieved 96.02%. In tracking, the system achieved 86.7% MOTA and 99.7% IDF1. The high IDF1 score indicates that the model is highly effective at preventing identity switches. The main novelty of this study is extending the system beyond surveillance to behavioral analysis of the children, measuring walking distance, trajectory, and screen time. Finally, through a cross-dataset comparison with the MOT20 public benchmark, we demonstrated that our customized model is much more effective than current state-of-the-art methods at overcoming the domain gap in specific environments such as kindergartens.
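MOTA and IDF1, the two tracking scores quoted above, are computed from counts accumulated over a whole sequence: MOTA = 1 - (FN + FP + IDSW)/GT, and IDF1 = 2·IDTP/(2·IDTP + IDFP + IDFN), where the ID-level counts come from the best one-to-one matching between predicted tracks and ground-truth identities. The minimal sketch below only evaluates these formulas from precomputed counts; producing the counts (frame-level matching and the identity assignment) is the job of a benchmark toolkit, and the function names and example counts are illustrative assumptions, not values from the article.

```python
# Minimal sketch: evaluates the standard MOTA and IDF1 formulas from counts that a
# benchmark toolkit would accumulate over a whole sequence. Names are illustrative.


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT. Can be negative."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN), from a one-to-one
    matching of predicted tracks to ground-truth identities."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, not taken from the article.
print(mota(fn=120, fp=10, idsw=3, num_gt=1000))  # 1 - 133/1000, about 0.867
print(idf1(idtp=880, idfp=10, idfn=120))          # 1760/1890, about 0.931
```

MOTA charges each identity switch as a single error no matter how long the wrong identity persists, whereas IDF1 is the F1 score of detections matched under one consistent track-to-identity assignment, so the two metrics weight identity errors differently.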
