J. Imaging, Volume 11, Issue 12 (December 2025) – 35 articles

Cover Story: Text-to-image models can imitate historical artistic styles, but their effectiveness remains unclear. We introduce an evaluation framework that merges expert art-historical and semiotic judgment with quantitative analysis. Three experts rated historical works and Midjourney generations across five movements, ten painters, and nine stylistic criteria. Using confidence intervals and our Relative Ratings Map, we visualized rating shifts, dispersion, and distributional overlap, and summarized outputs into four quality levels. Results show strong expert variability, moderate effects across movements and criteria, partial alignment with historical trends, occasional stereotypes, and clear failures. Assessing stylistic fidelity thus remains challenging for both experts and quantitative tools.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
29 pages, 9041 KB  
Review
A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development
by Minh Sao Khue Luu, Margaret V. Benedichuk, Ekaterina I. Roppert, Roman M. Kenzhin and Bair N. Tuchinov
J. Imaging 2025, 11(12), 454; https://doi.org/10.3390/jimaging11120454 - 18 Dec 2025
Viewed by 451
Abstract
The development of foundation models for brain MRI depends critically on the scale, diversity, and consistency of available data, yet systematic assessments of these factors remain scarce. In this study, we analyze 54 publicly accessible brain MRI datasets encompassing over 538,031 scans to provide a structured, multi-level overview tailored to foundation model development. At the dataset level, we characterize modality composition, disease coverage, and dataset scale, revealing strong imbalances between large healthy cohorts and smaller clinical populations. At the image level, we quantify voxel spacing, orientation, and intensity distributions across 14 representative datasets, demonstrating substantial heterogeneity that can influence representation learning. We then perform a quantitative evaluation of preprocessing variability, examining how intensity normalization, bias field correction, skull stripping, spatial registration, and interpolation alter voxel statistics and geometry. While these steps improve within-dataset consistency, residual differences persist between datasets. Finally, a feature-space case study using a 3D DenseNet121 shows measurable residual covariate shift after standardized preprocessing, confirming that harmonization alone cannot eliminate inter-dataset bias. Together, these analyses provide a unified characterization of variability in public brain MRI resources and emphasize the need for preprocessing-aware and domain-adaptive strategies in the design of generalizable brain MRI foundation models. Full article
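As a concrete illustration of two of the preprocessing factors surveyed above, the sketch below inspects voxel spacing and applies z-score intensity normalization with nibabel. The file name is a placeholder and the foreground-masking heuristic is ours, not the paper's pipeline.

```python
import nibabel as nib
import numpy as np

img = nib.load("scan.nii.gz")                     # placeholder NIfTI path
print("voxel spacing (mm):", img.header.get_zooms()[:3])

data = img.get_fdata()
brain = data[data > 0]                            # crude foreground mask
normalized = (data - brain.mean()) / brain.std()  # z-score normalization
```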

18 pages, 1564 KB  
Article
Salient Object Detection in Optical Remote Sensing Images Based on Hierarchical Semantic Interaction
by Jingfan Xu, Qi Zhang, Jinwen Xing, Mingquan Zhou and Guohua Geng
J. Imaging 2025, 11(12), 453; https://doi.org/10.3390/jimaging11120453 - 17 Dec 2025
Viewed by 295
Abstract
Existing salient object detection methods for optical remote sensing images still face certain limitations due to complex background variations, significant scale discrepancies among targets, severe background interference, and diverse topological structures. On the one hand, the feature transmission process often neglects the constraints and complementary effects of high-level features on low-level features, leading to insufficient feature interaction and weakened model representation. On the other hand, decoder architectures generally rely on simple cascaded structures, which fail to adequately exploit and utilize contextual information. To address these challenges, this study proposes a Hierarchical Semantic Interaction Module to enhance salient object detection performance in optical remote sensing scenarios. The module introduces foreground content modeling and a hierarchical semantic interaction mechanism within a multi-scale feature space, reinforcing the synergy and complementarity among features at different levels. This effectively highlights multi-scale and multi-type salient regions in complex backgrounds. Extensive experiments on multiple optical remote sensing datasets demonstrate the effectiveness of the proposed method. Specifically, on the EORSSD dataset, our full model integrating both CA and PA modules improves the max F-measure from 0.8826 to 0.9100 (↑2.74%), increases maxE from 0.9603 to 0.9727 (↑1.24%), and enhances the S-measure from 0.9026 to 0.9295 (↑2.69%) compared with the baseline. These results clearly demonstrate the effectiveness of the proposed modules and verify the robustness and strong generalization capability of our method in complex remote sensing scenarios. Full article
(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)
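For context, the max F-measure reported above is conventionally computed by sweeping a binarization threshold over the saliency map and keeping the best Fβ score (β² = 0.3 is the usual convention in salient object detection); a minimal sketch, not the authors' evaluation code:

```python
import numpy as np

def max_f_measure(pred, gt, beta2=0.3, steps=255):
    # pred: saliency map in [0, 1]; gt: binary ground-truth mask
    best = 0.0
    for t in np.linspace(0, 1, steps):
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        precision = tp / max(binary.sum(), 1)
        recall = tp / max(gt.sum(), 1)
        if precision + recall > 0:
            f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
            best = max(best, f)
    return best
```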

20 pages, 4545 KB  
Article
SRE-FMaps: A Sinkhorn-Regularized Elastic Functional Map Framework for Non-Isometric 3D Shape Matching
by Dan Zhang, Yue Zhang, Ning Wang and Dong Zhao
J. Imaging 2025, 11(12), 452; https://doi.org/10.3390/jimaging11120452 - 16 Dec 2025
Viewed by 288
Abstract
Precise 3D shape correspondence is a fundamental prerequisite for critical applications ranging from medical anatomical modeling to visual recognition. However, non-isometric 3D shape matching remains a challenging task due to the limited sensitivity of traditional Laplace–Beltrami (LB) bases to local geometric deformations such as stretching and bending. To address these limitations, this paper proposes a Sinkhorn-Regularized Elastic Functional Map framework (SRE-FMaps) that integrates entropy-regularized optimal transport with an elastic thin-shell energy basis. First, a sparse Sinkhorn transport plan is adopted to initialize a bijective correspondence with linear computational complexity. Then, a non-orthogonal elastic basis, derived from the Hessian of thin-shell deformation energy, is introduced to enhance high-frequency feature perception. Finally, correspondence stability is quantified through a cosine-based elastic distance metric, enabling retrieval and classification. Experiments on the SHREC2015, McGill, and Face datasets demonstrate that SRE-FMaps reduces the correspondence error by a maximum of 32% and achieves an average of 92.3% classification accuracy (with a peak of 94.74% on the Face dataset). Moreover, the framework exhibits superior robustness, yielding a recall of up to 91.67% and an F1-score of 0.94, effectively handling bending, stretching, and folding deformations compared with conventional LB-based functional map pipelines. The proposed framework provides a scalable solution for non-isometric shape correspondence in medical modeling, 3D reconstruction, and visual recognition. Full article
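The entropy-regularized optimal transport used to initialize the correspondence can be computed with the classic Sinkhorn iterations; the sketch below is the generic dense algorithm with uniform marginals, not the paper's sparse linear-complexity variant.

```python
import numpy as np

def sinkhorn(C, eps=0.05, n_iters=200):
    # Entropy-regularized OT between uniform marginals a and b
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                  # Gibbs kernel of the cost matrix
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                   # scale rows to match a
        v = b / (K.T @ u)                 # scale columns to match b
    return u[:, None] * K * v[None, :]    # transport plan
```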

27 pages, 20790 KB  
Article
Application of Generative Adversarial Networks to Improve COVID-19 Classification on Ultrasound Images
by Pedro Sérgio Tôrres Figueiredo Silva, Antonio Mauricio Ferreira Leite Miranda de Sá, Wagner Coelho de Albuquerque Pereira, Leonardo Bonato Felix and José Manoel de Seixas
J. Imaging 2025, 11(12), 451; https://doi.org/10.3390/jimaging11120451 - 15 Dec 2025
Viewed by 227
Abstract
COVID-19 screening is crucial for the early diagnosis and treatment of the disease, with lung ultrasound posing as a cost-effective alternative to other imaging techniques. Given the dependency on medical expertise and experience to accurately identify patterns in ultrasound exams, deep learning techniques have been explored for automatically classifying patients’ conditions. However, the limited availability of public medical databases remains a significant obstacle to the development of more advanced models. To address the data scarcity problem, this study proposes a method that leverages generative adversarial networks (GANs) to generate synthetic lung ultrasound images, which are subsequently used to train frame-based classification models. Two types of GANs are considered: Wasserstein GANs (WGAN) and Pix2Pix. Specific tools are used to show that the synthetic data produced present a distribution close to the original data. The classification models trained with synthetic data achieved a peak accuracy of 96.32% ± 4.17%, significantly outperforming the maximum accuracy of 82.69% ± 10.42% obtained when training only with the original data. Furthermore, the best results are comparable to, and in some cases surpass, those reported in recent related studies. Full article
(This article belongs to the Section Medical Imaging)
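For reference, the Wasserstein objective that distinguishes a WGAN critic from a standard GAN discriminator is only a few lines; this is the textbook formulation, not the authors' exact training code.

```python
import torch

def critic_loss(critic, real, fake):
    # Critic maximizes E[D(real)] - E[D(fake)]; written as a loss to minimize.
    return critic(fake).mean() - critic(real).mean()

def generator_loss(critic, fake):
    # Generator maximizes the critic's score on generated samples.
    return -critic(fake).mean()

# The original WGAN enforces the Lipschitz constraint by clipping critic
# weights after each update (WGAN-GP replaces this with a gradient penalty):
# for p in critic.parameters():
#     p.data.clamp_(-0.01, 0.01)
```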

14 pages, 2526 KB  
Article
Applying Radiomics to Predict Outcomes in Patients with High-Grade Retroperitoneal Sarcoma Treated with Preoperative Radiotherapy
by Adel Shahnam, Nicholas Hardcastle, David E. Gyorki, Katrina M. Ingley, Krystel Tran, Catherine Mitchell, Sarat Chander, Julie Chu, Michael Henderson, Alan Herschtal, Mathias Bressel and Jeremy Lewin
J. Imaging 2025, 11(12), 450; https://doi.org/10.3390/jimaging11120450 - 15 Dec 2025
Viewed by 281
Abstract
Retroperitoneal sarcomas (RPS) are rare tumours, primarily treated with surgical resection. However, recurrences are frequent. Combining clinical factors with CT-derived radiomic features could enhance treatment stratification and personalization. This study aims to assess whether radiomic features provide additional prognostic value beyond clinicopathological features in patients with high-risk RPS treated with preoperative radiotherapy. This retrospective study included patients aged 18 or older with non-recurrent and non-metastatic RPS treated with preoperative radiotherapy between 2008 and 2016. Hazard ratios (HR) were calculated using Cox proportional hazards regression to assess the impact of clinical and radiomic features on time to event outcomes. Predictive accuracy was assessed with c-statistics. Radiomic analysis was performed on the high-risk group (undifferentiated pleomorphic sarcoma, well-differentiated/de-differentiated liposarcoma or grade 2/3 leiomyosarcoma). Seventy-two patients were included, with a median follow-up of 3.7 years; the 5-year overall survival (OS) was 67%. Multivariable analysis showed older age (HR: 1.3 per 5-year increase, p = 0.04), grade 3 (HR: 180.3, p = 0.02), and larger tumours (HR: 4.0 per 10 cm increase, p = 0.02) predicted worse OS. In the higher-risk group, the c-statistic for the clinical model was 0.59 (time to distant metastasis (TDM)) and 0.56 (OS). Among 27 radiomic features, kurtosis improved OS prediction (c-statistic 0.69, p = 0.013), and Neighbourhood Gray-Tone Difference Matrix (NGTDM) busyness improved it to 0.73 (p = 0.036). Kurtosis also improved TDM prediction (c-statistic 0.72, p = 0.023). Radiomic features may complement clinicopathological factors in predicting overall survival and time to distant metastasis in high-risk retroperitoneal sarcoma. These exploratory findings warrant validation in larger, multi-institutional studies. Full article
(This article belongs to the Section Medical Imaging)
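A Cox proportional hazards fit of the kind described above can be reproduced with the lifelines library; the file and column names below ("cohort.csv", "time", "event", "age", "kurtosis") are placeholders, not the study's actual variables.

```python
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("cohort.csv")                     # hypothetical cohort table
cph = CoxPHFitter()
cph.fit(df[["time", "event", "age", "kurtosis"]],
        duration_col="time", event_col="event")    # fit Cox PH regression
cph.print_summary()                                # hazard ratios and p-values
print("c-statistic:", cph.concordance_index_)      # predictive accuracy
```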

19 pages, 3468 KB  
Article
Sensory Representation of Neural Networks Using Sound and Color for Medical Imaging Segmentation
by Irenel Lopo Da Silva, Nicolas Francisco Lori and José Manuel Ferreira Machado
J. Imaging 2025, 11(12), 449; https://doi.org/10.3390/jimaging11120449 - 15 Dec 2025
Viewed by 278
Abstract
This paper introduces a novel framework for sensory representation of brain imaging data, combining deep learning-based segmentation with multimodal visual and auditory outputs. Structural magnetic resonance imaging (MRI) predictions are converted into color-coded maps and stereophonic/MIDI sonifications, enabling intuitive interpretation of cortical activation patterns. High-precision U-Net models efficiently generate these outputs, supporting clinical decision-making, cognitive research, and creative applications. Spatial, intensity, and anomalous features are encoded into perceivable visual and auditory cues, facilitating early detection and introducing the concept of “auditory biomarkers” for potential pathological identification. Despite current limitations, including dataset size, absence of clinical validation, and heuristic-based sonification, the pipeline demonstrates technical feasibility and robustness. Future work will focus on clinical user studies, the application of functional MRI (fMRI) time-series for dynamic sonification, and the integration of real-time emotional feedback in cinematic contexts. This multisensory approach offers a promising avenue for enhancing the interpretability of complex neuroimaging data across medical, research, and artistic domains. Full article
(This article belongs to the Section Medical Imaging)

17 pages, 3706 KB  
Article
Dual-Path Convolutional Neural Network with Squeeze-and-Excitation Attention for Lung and Colon Histopathology Classification
by Helala AlShehri
J. Imaging 2025, 11(12), 448; https://doi.org/10.3390/jimaging11120448 - 14 Dec 2025
Viewed by 325
Abstract
Lung and colon cancers remain among the leading causes of cancer-related mortality worldwide, underscoring the need for rapid and accurate histopathological diagnosis. Manual examination of biopsy slides is often time-consuming and prone to inter-observer variability, which highlights the importance of developing reliable and explainable automated diagnostic systems. This study presents DPCSE-Net, a lightweight dual-path convolutional neural network enhanced with a squeeze-and-excitation (SE) attention mechanism for lung and colon cancer classification. The dual-path structure captures both fine-grained cellular textures and global contextual information through multiscale feature extraction, while the SE attention module adaptively recalibrates channel responses to emphasize discriminative features. To enhance transparency and interpretability, Gradient-weighted Class Activation Mapping (Grad-CAM), attention heatmaps, and Integrated Gradients are employed to visualize class-specific activation patterns and verify that the model’s focus aligns with diagnostically relevant tissue regions. Evaluated on the publicly available LC25000 dataset, DPCSE-Net achieved state-of-the-art performance with 99.88% accuracy and F1-score, while maintaining low computational complexity. Ablation experiments confirmed the contribution of the dual-path design and SE module, and qualitative analyses demonstrated the model’s strong interpretability. These results establish DPCSE-Net as an accurate, efficient, and explainable framework for computer-aided histopathological diagnosis, supporting the broader goals of explainable AI in computer vision. Full article
(This article belongs to the Special Issue Explainable AI in Computer Vision)
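The squeeze-and-excitation mechanism referenced above has a standard form: global average pooling squeezes each channel to a scalar, and a small bottleneck MLP produces per-channel reweighting factors. A canonical PyTorch sketch (the paper's exact block may differ):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # excitation
            nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))  # global average pool -> channel weights
        return x * w.view(b, c, 1, 1)    # recalibrate channels
```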

28 pages, 4422 KB  
Article
Enhanced Object Detection Algorithms in Complex Environments via Improved CycleGAN Data Augmentation and AS-YOLO Framework
by Zhen Li, Yuxuan Wang, Lingzhong Meng, Wenjuan Chu and Guang Yang
J. Imaging 2025, 11(12), 447; https://doi.org/10.3390/jimaging11120447 - 12 Dec 2025
Viewed by 495
Abstract
Object detection in complex environments, such as challenging lighting conditions, adverse weather, and target occlusions, poses significant difficulties for existing algorithms. To address these challenges, this study introduces a collaborative solution integrating improved CycleGAN-based data augmentation and an enhanced object detection framework, AS-YOLO. The improved CycleGAN incorporates a dual self-attention mechanism and spectral normalization to enhance feature capture and training stability. The AS-YOLO framework integrates a channel–spatial parallel attention mechanism, an AFPN structure for improved feature fusion, and the Inner_IoU loss function for better generalization. The experimental results show that, compared with YOLOv8n, AS-YOLO improves mAP@0.5 and mAP@0.95 by 1.5% and 0.6%, respectively. After data augmentation and style transfer, mAP@0.5 and mAP@0.95 increase by 14.6% and 17.8%, respectively, demonstrating the effectiveness of the proposed method in improving model performance in complex scenarios. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
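For orientation, IoU-family box losses start from the plain intersection-over-union below; Inner_IoU additionally scores scaled auxiliary "inner" boxes, which this generic sketch omits.

```python
import torch

def box_iou(a, b):
    # a, b: (N, 4) boxes as (x1, y1, x2, y2); returns element-wise IoU
    x1 = torch.max(a[:, 0], b[:, 0])
    y1 = torch.max(a[:, 1], b[:, 1])
    x2 = torch.min(a[:, 2], b[:, 2])
    y2 = torch.min(a[:, 3], b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)

# A basic IoU regression loss is then simply: loss = 1 - box_iou(pred, target)
```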

29 pages, 11999 KB  
Article
Pixel-Wise Sky-Obstacle Segmentation in Fisheye Imagery Using Deep Learning and Gradient Boosting
by Némo Bouillon and Vincent Boitier
J. Imaging 2025, 11(12), 446; https://doi.org/10.3390/jimaging11120446 - 12 Dec 2025
Viewed by 399
Abstract
Accurate sky–obstacle segmentation in hemispherical fisheye imagery is essential for solar irradiance forecasting, photovoltaic system design, and environmental monitoring. However, existing methods often rely on expensive all-sky imagers and region-specific training data, produce coarse sky–obstacle boundaries, and ignore the optical properties of fisheye lenses. We propose a low-cost segmentation framework designed for fisheye imagery that combines synthetic data generation, lens-aware augmentation, and a hybrid deep-learning pipeline. Synthetic fisheye training images are created from publicly available street-view panoramas to cover diverse environments without dedicated hardware, and lens-aware augmentations model fisheye projection and photometric effects to improve robustness across devices. On this dataset, we train a convolutional neural network (CNN) and refine its output with gradient-boosted decision trees (GBDT) to sharpen sky–obstacle boundaries. The method is evaluated on real fisheye images captured with smartphones and low-cost clip-on lenses across multiple sites, achieving an Intersection over Union (IoU) of 96.63% and an F1 score of 98.29%, along with high boundary accuracy. An additional evaluation on an external panoramic baseline dataset confirms strong cross-dataset generalization. Together, these results show that the proposed framework enables accurate, low-cost, and widely deployable hemispherical sky segmentation for practical solar and environmental imaging applications. Full article
(This article belongs to the Section AI in Imaging)
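The CNN-then-GBDT refinement idea can be sketched with scikit-learn: per-pixel features (here just the CNN's sky probability plus the raw RGB values, an illustrative choice rather than the paper's design) feed a gradient-boosted classifier that re-decides each pixel's label.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

def fit_refiner(cnn_prob, rgb, labels):
    # cnn_prob: (H, W) sky probability; rgb: (H, W, 3); labels: (H, W) in {0, 1}
    X = np.column_stack([cnn_prob.ravel(), rgb.reshape(-1, 3)])
    return HistGradientBoostingClassifier().fit(X, labels.ravel())

def refine(model, cnn_prob, rgb):
    # Re-predict every pixel with the boosted trees to sharpen boundaries
    X = np.column_stack([cnn_prob.ravel(), rgb.reshape(-1, 3)])
    return model.predict(X).reshape(cnn_prob.shape)
```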

19 pages, 10689 KB  
Article
Research on Augmentation of Wood Microscopic Image Dataset Based on Generative Adversarial Networks
by Shuo Xu, Hang Su and Lei Zhao
J. Imaging 2025, 11(12), 445; https://doi.org/10.3390/jimaging11120445 - 12 Dec 2025
Viewed by 254
Abstract
Microscopic wood images are vital in wood analysis and classification research. However, the high cost of acquiring microscopic images and the limitations of experimental conditions have led to a severe problem of insufficient sample data, which significantly restricts the training performance and generalization ability of deep learning models. This study first used basic image processing techniques to perform preliminary augmentation of the original dataset. The augmented data were then input into five GAN models, BGAN, DCGAN, WGAN-GP, LSGAN, and StyleGAN2, for training. The quality of the generated images and the performance of each model were assessed by analyzing the fidelity of cellular structures (e.g., earlywood, latewood, and wood rays), image clarity, and image diversity, as well as with the KID, IS, and SSIM metrics. The results showed that images generated by BGAN and WGAN-GP exhibited high quality, with lower KID values and higher IS values, and the generated images were visually close to real images. In contrast, the DCGAN, LSGAN, and StyleGAN2 models experienced mode collapse during training, resulting in lower image clarity and diversity compared to the other models. Through a comparative analysis of different GAN models, this study demonstrates the feasibility and effectiveness of Generative Adversarial Networks in the domain of small-sample image data augmentation, providing an important reference for further research in the field of wood identification. Full article
(This article belongs to the Section Image and Video Processing)
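Of the reported metrics, SSIM is directly available in scikit-image (KID and IS additionally require an Inception feature extractor and are omitted here); a minimal call with placeholder images:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

real = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # placeholder
fake = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # placeholder
score = ssim(real, fake, data_range=255)  # 1.0 means identical structure
```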

19 pages, 1071 KB  
Article
AI-Driven Clinical Decision Support System for Automated Ventriculomegaly Classification from Fetal Brain MRI
by Mannam Subbarao, Simi Surendran, Seena Thomas, Hemanth Lakshman, Vinjanampati Goutham, Keshagani Goud and Suhas Udayakumaran
J. Imaging 2025, 11(12), 444; https://doi.org/10.3390/jimaging11120444 - 12 Dec 2025
Viewed by 421
Abstract
Fetal ventriculomegaly (VM) is a condition characterized by abnormal enlargement of the cerebral ventricles of the fetal brain that often causes developmental disorders in children. Manual segmentation and classification of ventricular structures from brain MRI scans are time-consuming and require clinical expertise. To address this challenge, we develop an automated pipeline for ventricle segmentation, ventricular width estimation, and VM severity classification using a publicly available dataset. An adaptive slice selection strategy converts 3D MRI volumes into the most informative 2D slices, which are then segmented to isolate the lateral ventricles and deep gray matter. Ventricular width is automatically estimated to assign severity levels based on clinical thresholds, generating labeled data for training a deep learning classifier. Finally, an explainability module using a large language model integrates the MRI slices, segmentation masks, and predicted severity to provide interpretable clinical reasoning. Experimental results demonstrate that the proposed decision support system delivers robust performance, achieving Dice scores of 89% and 87.5% for the 2D and 3D segmentation models, respectively. Also, the classification network attains an accuracy of 86% and an F1-score of 0.84 in VM analysis. Full article
(This article belongs to the Section AI in Imaging)
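Two building blocks of the evaluation are easy to make concrete: the Dice coefficient and a threshold-based severity mapping. The cut-offs below are the commonly cited clinical thresholds for fetal ventriculomegaly (atrial width in mm); the paper's exact values may differ.

```python
import numpy as np

def dice(pred, gt):
    # Dice similarity coefficient for binary masks
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum() + 1e-8)

def vm_severity(atrial_width_mm):
    # Commonly cited clinical thresholds; assumed, not the paper's exact values
    if atrial_width_mm < 10:
        return "normal"
    if atrial_width_mm <= 12:
        return "mild"
    if atrial_width_mm <= 15:
        return "moderate"
    return "severe"
```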

22 pages, 1479 KB  
Article
VMPANet: Vision Mamba Skin Lesion Image Segmentation Model Based on Prompt and Attention Mechanism Fusion
by Zinuo Peng, Shuxian Liu and Chenhao Li
J. Imaging 2025, 11(12), 443; https://doi.org/10.3390/jimaging11120443 - 11 Dec 2025
Viewed by 364
Abstract
In the realm of medical image processing, the segmentation of dermatological lesions is a pivotal technique for the early detection of skin cancer. However, existing methods for segmenting images of skin lesions often encounter limitations when dealing with intricate boundaries and diverse lesion shapes. To address these challenges, we propose VMPANet, designed to accurately localize critical targets and capture edge structures. VMPANet employs an inverted pyramid convolution to extract multi-scale features while utilizing the visual Mamba module to capture long-range dependencies among image features. Additionally, we leverage previously extracted masks as cues to facilitate efficient feature propagation. Furthermore, VMPANet integrates parallel depthwise separable convolutions to enhance feature extraction and introduces innovative mechanisms for edge enhancement, spatial attention, and channel attention to adaptively extract edge information and complex spatial relationships. Notably, VMPANet refines a novel cross-attention mechanism, which effectively facilitates the interaction between deep semantic cues and shallow texture details, thereby generating comprehensive feature representations while reducing computational load and redundancy. We conducted comparative and ablation experiments on two public skin lesion datasets (ISIC2017 and ISIC2018). The results demonstrate that VMPANet outperforms existing mainstream methods. On the ISIC2017 dataset, its mIoU and DSC metrics are 1.38% and 0.83% higher than those of VM-Unet, respectively; on the ISIC2018 dataset, these metrics are 1.10% and 0.67% higher than those of EMCAD, respectively. Moreover, VMPANet boasts a parameter count of only 0.383 M and a computational load of 1.159 GFLOPs. Full article
(This article belongs to the Section Medical Imaging)
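The depthwise separable convolutions mentioned above factor a standard convolution into a per-channel spatial filter followed by a 1×1 pointwise channel mix, which is what keeps parameter counts this low; a minimal PyTorch sketch:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch -> each channel gets its own spatial filter
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        # 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```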

15 pages, 3346 KB  
Article
HDR Merging of RAW Exposure Series for All-Sky Cameras: A Comparative Study for Circumsolar Radiometry
by Paul Matteschk, Max Aragón, Jose Gomez, Jacob K. Thorning, Stefanie Meilinger and Sebastian Houben
J. Imaging 2025, 11(12), 442; https://doi.org/10.3390/jimaging11120442 - 11 Dec 2025
Viewed by 310
Abstract
All-sky imagers (ASIs) used in solar energy meteorology face an extreme intra-image dynamic range, with the circumsolar neighborhood orders of magnitude brighter than the diffuse dome. Many operational ASI pipelines address this gap with high-dynamic-range (HDR) bracketing inside the camera’s image signal processor (ISP), i.e., after demosaicing and color processing in a nonlinear 8-bit RGB domain. Near the Sun, such ISP-domain HDR can down-weight the shortest exposure, retain clipped or near-clipped samples from longer frames, and compress highlight contrast, thereby increasing circumsolar saturation and flattening aureole gradients. A radiance-linear HDR fusion in the sensor/RAW domain (RAW–HDR) is therefore contrasted with the vendor ISP-based HDR mode (ISP–HDR). Solar-based geometric calibration enables Sun-centered analysis. Paired, interleaved acquisitions under clear-sky and broken-cloud conditions are evaluated using two circumsolar performance criteria per RGB channel: (i) saturated-area fraction in concentric rings and (ii) a median-based radial gradient in defined arcs. All quantitative analyses operate on the radiance-linear HDR result; post-merge tone mapping is only used for visualization. Across conditions, ISP–HDR exhibits roughly double the near-saturation within 0–4° of the Sun and about a three- to fourfold weaker circumsolar radial gradient within 0–6° relative to RAW–HDR. These findings indicate that radiance-linear fusion in the RAW domain better preserves circumsolar structure than the examined ISP-domain HDR mode and thus provides more suitable input for downstream tasks such as cloud–edge detection, aerosol retrieval, and irradiance estimation. Full article
(This article belongs to the Special Issue Techniques and Applications of Sky Imagers)
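The RAW-domain merge contrasted with ISP-HDR amounts to exposure-normalized, saturation-aware averaging in linear units; a simplified sketch (the paper's weighting, dark-frame handling, and calibration are more involved):

```python
import numpy as np

def raw_hdr_merge(frames, exposure_times, sat_level=0.95):
    # frames: list of linear RAW exposures scaled to [0, 1]
    num = np.zeros_like(frames[0], dtype=np.float64)
    den = np.zeros_like(frames[0], dtype=np.float64)
    for raw, t in zip(frames, exposure_times):
        w = (raw < sat_level).astype(np.float64)  # exclude clipped pixels
        num += w * raw / t                        # radiance-linear estimate
        den += w
    return num / np.maximum(den, 1e-12)           # weighted radiance map
```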

21 pages, 1505 KB  
Article
WaveletHSI: Direct HSI Classification from Compressed Wavelet Coefficients via Sub-Band Feature Extraction and Fusion
by Xin Li and Baile Sun
J. Imaging 2025, 11(12), 441; https://doi.org/10.3390/jimaging11120441 - 10 Dec 2025
Viewed by 313
Abstract
A major computational bottleneck in classifying large-scale hyperspectral images (HSI) is the mandatory data decompression prior to processing. Compressed-domain computing offers a solution by enabling deep learning on partially compressed data. However, existing compressed-domain methods are predominantly tailored for the Discrete Cosine Transform (DCT) used in natural images, while HSIs are typically compressed using the Discrete Wavelet Transform (DWT). The fundamental structural mismatch between the block-based DCT and the hierarchical DWT sub-bands presents two core challenges: how to extract features from multiple wavelet sub-bands, and how to fuse these features effectively. To address these issues, we propose a novel framework that extracts and fuses features from different DWT sub-bands directly. We design a multi-branch feature extractor with sub-band feature alignment loss that processes functionally different sub-bands in parallel, preserving the independence of each frequency feature. We then employ a sub-band cross-attention mechanism that inverts the typical attention paradigm by using the sparse, high-frequency detail sub-bands as queries to adaptively select and enhance salient features from the dense, information-rich low-frequency sub-bands. This enables a targeted fusion of global context and fine-grained structural information without data reconstruction. Experiments on three benchmark datasets demonstrate that our method achieves classification accuracy comparable to state-of-the-art spatial-domain approaches while eliminating at least 56% of the decompression overhead. Full article
(This article belongs to the Special Issue Multispectral and Hyperspectral Imaging: Progress and Challenges)
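For readers unfamiliar with DWT sub-bands, one decomposition level with PyWavelets yields exactly the approximation and detail bands such a framework consumes: the low-frequency approximation (cA) and the horizontal, vertical, and diagonal details (cH, cV, cD).

```python
import numpy as np
import pywt

band = np.random.rand(128, 128)              # placeholder spectral band
cA, (cH, cV, cD) = pywt.dwt2(band, "haar")   # single-level 2D DWT
print(cA.shape, cH.shape, cV.shape, cD.shape)  # each half-resolution
```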

23 pages, 3326 KB  
Article
Hybrid Multi-Scale Neural Network with Attention-Based Fusion for Fruit Crop Disease Identification
by Shakhmaran Seilov, Akniyet Nurzhaubayev, Marat Baideldinov, Bibinur Zhursinbek, Medet Ashimgaliyev and Ainur Zhumadillayeva
J. Imaging 2025, 11(12), 440; https://doi.org/10.3390/jimaging11120440 - 10 Dec 2025
Viewed by 401
Abstract
Undetected fruit crop diseases are a major threat to agricultural productivity worldwide and frequently cause farmers to suffer large financial losses. Disease detection techniques based on manual field inspection are time-consuming, unreliable, and unsuitable for extensive monitoring. Deep learning approaches, in particular convolutional neural networks, have shown promise for automated plant disease identification, although they still face significant obstacles. These include poor generalization across complex visual backgrounds, limited resilience to varying lesion scales, and high processing demands that make deployment on resource-constrained edge devices difficult. To overcome these drawbacks, we propose a Hybrid Multi-Scale Neural Network (HMCT-AF with GSAF) architecture for precise and efficient fruit crop disease identification. To capture long-range dependencies, HMCT-AF with GSAF combines a Vision Transformer-based structural branch with multi-scale convolutional branches, capturing both high-level contextual patterns and fine-grained local information. These disparate features are adaptively combined using a novel GSAF module, which enhances model interpretability and classification performance. We conduct evaluations on both PlantVillage (controlled environment) and CLD (real-world in-field conditions), observing consistent performance gains that indicate strong resilience to natural lighting variations and background complexity. With an accuracy of up to 93.79%, HMCT-AF with GSAF outperforms vanilla Transformer models, EfficientNet, and traditional CNNs. These findings demonstrate how well the model captures scale-variant disease symptoms and its suitability for real-time agricultural applications on edge-compatible hardware. Our research indicates that HMCT-AF with GSAF provides a viable basis for intelligent, scalable plant disease monitoring systems in contemporary precision farming. Full article

16 pages, 14648 KB  
Article
Application of Artificial Intelligence and Computer Vision for Measuring and Counting Oysters
by Julio Antonio Laria Pino, Jesús David Terán Villanueva, Julio Laria Menchaca, Leobardo Garcia Solorio, Salvador Ibarra Martínez, Mirna Patricia Ponce Flores and Aurelio Alejandro Santiago Pineda
J. Imaging 2025, 11(12), 439; https://doi.org/10.3390/jimaging11120439 - 10 Dec 2025
Viewed by 235
Abstract
One of the most important activities in any oyster farm is the measurement of oyster size; this activity is time-consuming and conducted manually, generally using a caliper, which leads to high measurement variability. This paper proposes a methodology to count and obtain the length and width averages of a sample of oysters from an image, relying on artificial intelligence (AI), which refers to systems capable of learning and decision-making, and computer vision (CV), which enables the extraction of information from digital images. The proposed approach employs the DBScan clustering algorithm, an artificial neural network (ANN), and a random forest classifier to enable automatic oyster classification, counting, and size estimation from images. As a result of the proposed methodology, the speed in measuring the length and width of the oysters was 86.7 times faster than manual measurement. Regarding the counting, the process missed the total count of oysters in two of the ten images. These results demonstrate the feasibility of using the proposed methodology to measure oyster size and count in oyster farms. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
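The DBSCAN step can be illustrated with scikit-learn by clustering foreground pixel coordinates to separate and count individual objects; the mask, eps, and min_samples below are illustrative placeholders, not the paper's tuned values.

```python
import numpy as np
from sklearn.cluster import DBSCAN

mask = np.random.rand(256, 256) > 0.99        # placeholder foreground mask
coords = np.argwhere(mask)                    # (row, col) of oyster pixels
labels = DBSCAN(eps=5, min_samples=10).fit_predict(coords)
n_objects = len(set(labels)) - (1 if -1 in labels else 0)  # exclude noise (-1)
```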

21 pages, 11414 KB  
Article
Texture-Based Preprocessing Framework with nnU-Net Model for Accurate Intracranial Artery Segmentation
by Kyuseok Kim and Ji-Youn Kim
J. Imaging 2025, 11(12), 438; https://doi.org/10.3390/jimaging11120438 - 9 Dec 2025
Viewed by 403
Abstract
Accurate intracranial artery segmentation from digital subtraction angiography (DSA) is critical for neurovascular diagnosis and intervention planning. Vessel extraction pipelines that combine preprocessing methods with deep learning models achieve strong results, but limited preprocessing constrains further improvement. We propose a texture-based contrast enhancement preprocessing framework integrated with the nnU-Net model to improve vessel segmentation in time-sequential DSA images. The method generates a combined feature mask by fusing local contrast, local entropy, and brightness threshold maps, which is then used as input for deep learning–based segmentation. Segmentation performance was evaluated on the DIAS dataset with various standard quantitative metrics. The proposed preprocessing significantly improved segmentation across all metrics compared to both the baseline and contrast-limited adaptive histogram equalization (CLAHE). Using nnU-Net, the method achieved a Dice Similarity Coefficient (DICE) of 0.83 ± 0.20 and an Intersection over Union (IoU) of 0.72 ± 0.14, outperforming CLAHE (DICE 0.79 ± 0.41, IoU 0.70 ± 0.23) and the baseline (DICE 0.65 ± 0.15, IoU 0.47 ± 0.20). Most notably, the vessel connectivity (VC) metric dropped by over 65% relative to unprocessed images, indicating marked improvements in connectivity and topological accuracy. This study demonstrates that combining texture-based preprocessing with nnU-Net delivers robust, noise-tolerant, and clinically interpretable segmentation of intracranial arteries from DSA. Full article
(This article belongs to the Section Medical Imaging)
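A rough sketch of the combined feature mask described above, fusing local entropy, local contrast, and brightness maps; the window sizes and thresholds here are illustrative guesses, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter
from skimage.filters.rank import entropy
from skimage.morphology import disk

def feature_mask(img_u8, ent_t=3.0, con_t=20, bri_t=100):
    # img_u8: grayscale frame as uint8
    ent = entropy(img_u8, disk(5))  # local entropy in a 5-px radius window
    contrast = maximum_filter(img_u8, 5).astype(int) - minimum_filter(img_u8, 5)
    # Fuse the three maps into one binary mask
    return (ent > ent_t) & (contrast > con_t) & (img_u8 > bri_t)
```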

19 pages, 6617 KB  
Article
Domain-Adaptive Segment Anything Model for Cross-Domain Water Body Segmentation in Satellite Imagery
by Lihong Yang, Pengfei Liu, Guilong Zhang, Huaici Zhao and Chunyang Zhao
J. Imaging 2025, 11(12), 437; https://doi.org/10.3390/jimaging11120437 - 9 Dec 2025
Viewed by 239
Abstract
Monitoring surface water bodies is crucial for environmental protection and resource management. Existing segmentation methods often struggle with limited generalization across different satellite domains. We propose DASAM, a domain-adaptive Segment Anything Model for cross-domain water body segmentation in satellite imagery. The core innovation of DASAM is a contrastive learning module that aligns features between source and style-augmented images, enabling robust domain generalization without requiring annotations from the target domain. Additionally, DASAM integrates a prompt-enhanced module and an encoder adapter to capture fine-grained spatial details and global context, further improving segmentation accuracy. Experiments on the China GF-2 dataset demonstrate superior performance over existing methods, while cross-domain evaluations on GLH-water and Sentinel-2 water body image datasets verify its strong generalization and robustness. These results highlight DASAM’s potential for large-scale, diverse satellite water body monitoring and accurate environmental analysis. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
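The contrastive alignment between source and style-augmented features can be sketched as an InfoNCE-style loss in which the two views of the same image form the positive pair and other batch items are negatives; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def alignment_loss(z_src, z_aug, tau=0.1):
    # z_src, z_aug: (B, D) embeddings of source and style-augmented images
    z_src = F.normalize(z_src, dim=1)
    z_aug = F.normalize(z_aug, dim=1)
    logits = z_src @ z_aug.t() / tau     # pairwise cosine similarities
    targets = torch.arange(z_src.size(0), device=z_src.device)
    return F.cross_entropy(logits, targets)  # match each image to its own view
```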

26 pages, 1656 KB  
Article
Human Detection in UAV Thermal Imagery: Dataset Extension and Comparative Evaluation on Embedded Platforms
by Andrei-Alexandru Ulmămei, Taddeo D’Adamo, Costin-Emanuel Vasile and Radu Hobincu
J. Imaging 2025, 11(12), 436; https://doi.org/10.3390/jimaging11120436 - 9 Dec 2025
Viewed by 814
Abstract
Unmanned aerial vehicles (UAVs) equipped with thermal cameras are increasingly used in search and rescue (SAR) operations, where low visibility and small human footprints make detection a critical challenge. Existing datasets are mostly limited to urban or open-field scenarios, and our experiments show that models trained on such heterogeneous data achieve poor results. To address this gap, we collected and annotated thermal images in mountainous environments using a DJI M3T drone under clear daytime conditions. This mountain-specific set was integrated with ten existing sources to form an extensive benchmark of over 75,000 images. We then performed a comparative evaluation of object detection models (YOLOv8/9/10, RT-DETR) and semantic segmentation networks (U-Net variants), analyzing accuracy, inference speed, and energy consumption on an NVIDIA Jetson AGX Orin. Results demonstrate that human detection tasks can be accurately solved through both semantic segmentation and object detection, achieving 90% detection accuracy using segmentation models and 85% accuracy using the YOLOv8 X detection model in mountain scenarios. On the Jetson platform, segmentation achieves real-time performance with up to 27 FPS in FP16 mode. Our contributions are as follows: (i) the introduction of a new mountainous thermal image collection extending current benchmarks and (ii) a comprehensive evaluation of detection methods on embedded hardware for SAR applications. Full article
(This article belongs to the Section Image and Video Processing)

13 pages, 5163 KB  
Article
Contrast-Enhanced Mammography in Breast Cancer Follow-Up: Diagnostic Value in Suspected Recurrence
by Claudio Ventura, Marco Fogante, Nicola Carboni, Silvia Gradassi Borgoforte, Barbara Franca Simonetti, Elisabetta Marconi and Giulio Argalia
J. Imaging 2025, 11(12), 435; https://doi.org/10.3390/jimaging11120435 - 6 Dec 2025
Viewed by 379
Abstract
Women with a personal history of breast cancer (PHBC) are at increased risk of local recurrence or new primary tumors, which are often difficult to assess on conventional imaging because of postoperative changes. This prospective study aimed to evaluate the diagnostic performance of contrast-enhanced mammography (CEM) in women with PHBC presenting with suspicious findings on follow-up mammography or ultrasound. Sixty-two patients underwent CEM between December 2023 and June 2025. Lesions showing enhancement were biopsied, while non-enhancing ones were followed for stability. Histopathology served as the reference standard. Diagnostic performance was assessed using standard statistical methods, including sensitivity, specificity, Fisher’s exact test, and ROC analysis. Among 62 lesions, 34 showed enhancement on CEM; 30 of these (88.2%) were malignant, whereas 25 of 28 non-enhancing lesions (89.3%) were benign (p < 0.001). CEM demonstrated a sensitivity of 90.9%, specificity of 86.2%, and diagnostic accuracy of 88.7%. Interobserver agreement was substantial (κ = 0.76, p < 0.001). Enhancement on recombined CEM images was strongly associated with malignancy. These findings confirm that CEM provides excellent diagnostic performance in the surveillance of women with PHBC, effectively distinguishing benign from malignant postoperative changes. CEM may serve as a practical and accessible alternative to magnetic resonance imaging, particularly for patients with contraindications or in settings where MRI is unavailable. Full article
(This article belongs to the Section Medical Imaging)
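The headline statistics follow directly from the reported counts (30 of 34 enhancing lesions malignant; 25 of 28 non-enhancing lesions benign), as this arithmetic check shows:

```python
tp, fp = 30, 4               # enhancing lesions: malignant / benign
tn, fn = 25, 3               # non-enhancing lesions: benign / malignant
sensitivity = tp / (tp + fn)       # 30/33 ≈ 0.909 -> 90.9%
specificity = tn / (tn + fp)       # 25/29 ≈ 0.862 -> 86.2%
accuracy = (tp + tn) / 62          # 55/62 ≈ 0.887 -> 88.7%
```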

15 pages, 4297 KB  
Article
Camera-in-the-Loop Realization of Direct Search with Random Trajectory Method for Binary-Phase Computer-Generated Hologram Optimization
by Evgenii Yu. Zlokazov, Rostislav S. Starikov, Pavel A. Cheremkhin and Timur Z. Minikhanov
J. Imaging 2025, 11(12), 434; https://doi.org/10.3390/jimaging11120434 - 5 Dec 2025
Viewed by 340
Abstract
High-speed realization of computer-generated holograms (CGHs) is a crucial problem in the field of modern 3D visualization and optical image processing system development. Binary CGHs can be realized using high-resolution, high-speed spatial light modulators such as ferroelectric liquid crystals on silicon devices or digital micro-mirror devices providing the high throughput of optoelectronic systems. However, the quality of holographic images restored by binary CGHs often suffers from distortions, background noise, and speckle noise caused by the limitations and imperfections of optical system components. The present manuscript introduces a method based on the optimization of CGH models directly in the optical system with a camera-in-the-loop configuration using effective direct search with a random trajectory algorithm. The method was experimentally verified. The results demonstrate a significant enhancement in the quality of the holographic images optically restored by binary-phase CGH models optimized through this method compared to purely digitally generated models. Full article
(This article belongs to the Section Mixed, Augmented and Virtual Reality)
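Conceptually, camera-in-the-loop direct search with a random trajectory reduces to the loop below: flip one binary hologram pixel at a time along a random pixel ordering and keep the flip only if the camera-measured reconstruction error improves. Here display, capture, and error are hypothetical stand-ins for the real SLM/camera interface and quality metric.

```python
import numpy as np

def optimize_cgh(cgh, target, n_iters, display, capture, error):
    # cgh: binary (0/1) integer hologram array
    order = np.random.permutation(cgh.size)   # random pixel trajectory
    display(cgh)
    best = error(capture(), target)           # camera-measured baseline
    for idx in order[:n_iters]:
        i, j = np.unravel_index(idx, cgh.shape)
        cgh[i, j] ^= 1                        # trial flip of one phase pixel
        display(cgh)
        e = error(capture(), target)
        if e < best:
            best = e                          # keep improving flip
        else:
            cgh[i, j] ^= 1                    # revert non-improving flip
    return cgh
```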

26 pages, 3269 KB  
Article
DiagNeXt: A Two-Stage Attention-Guided ConvNeXt Framework for Kidney Pathology Segmentation and Classification
by Hilal Tekin, Şafak Kılıç and Yahya Doğan
J. Imaging 2025, 11(12), 433; https://doi.org/10.3390/jimaging11120433 - 4 Dec 2025
Viewed by 402
Abstract
Accurate segmentation and classification of kidney pathologies from medical images remain a major challenge in computer-aided diagnosis due to complex morphological variations, small lesion sizes, and severe class imbalance. This study introduces DiagNeXt, a novel two-stage deep learning framework designed to overcome these challenges through an integrated use of attention-enhanced ConvNeXt architectures for both segmentation and classification. In the first stage, DiagNeXt-Seg employs a U-Net-based design incorporating Enhanced Convolutional Blocks (ECBs) with spatial attention gates and Atrous Spatial Pyramid Pooling (ASPP) to achieve precise multi-class kidney segmentation. In the second stage, DiagNeXt-Cls utilizes the segmented regions of interest (ROIs) for pathology classification through a hierarchical multi-resolution strategy enhanced by Context-Aware Feature Fusion (CAFF) and Evidential Deep Learning (EDL) for uncertainty estimation. The main contributions of this work include: (1) enhanced ConvNeXt blocks with large-kernel depthwise convolutions optimized for 3D medical imaging, (2) a boundary-aware compound loss combining Dice, cross-entropy, focal, and distance transform terms to improve segmentation precision, (3) attention-guided skip connections preserving fine-grained spatial details, (4) hierarchical multi-scale feature modeling for robust pathology recognition, and (5) a confidence-modulated classification approach integrating segmentation quality metrics for reliable decision-making. Extensive experiments on a large kidney CT dataset comprising 3847 patients demonstrate that DiagNeXt achieves 98.9% classification accuracy, outperforming state-of-the-art approaches by 6.8%. The framework attains near-perfect AUC scores across all pathology classes (Normal: 1.000, Tumor: 1.000, Cyst: 0.999, Stone: 0.994) while offering clinically interpretable uncertainty maps and attention visualizations. The superior diagnostic accuracy, computational efficiency (6.2× faster inference), and interpretability of DiagNeXt make it a strong candidate for real-world integration into clinical kidney disease diagnosis and treatment planning systems. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)

16 pages, 87659 KB  
Article
UAV-TIRVis: A Benchmark Dataset for Thermal–Visible Image Registration from Aerial Platforms
by Costin-Emanuel Vasile, Călin Bîră and Radu Hobincu
J. Imaging 2025, 11(12), 432; https://doi.org/10.3390/jimaging11120432 - 4 Dec 2025
Viewed by 511
Abstract
Registering UAV-based thermal and visible images is a challenging task due to differences in appearance across spectra and the lack of public benchmarks. To address this issue, we introduce UAV-TIRVis, a dataset consisting of 80 accurately and manually registered UAV-based thermal (640 × 512) and visible (4K) image pairs, captured across diverse environments. We benchmark our dataset using well-known registration methods, including feature-based (ORB, SURF, SIFT, KAZE), correlation-based, and intensity-based methods, as well as a custom, heuristic intensity-based method. We evaluate the performance of these methods using four metrics: RMSE, PSNR, SSIM, and NCC, averaged per scenario and across the entire dataset. The results show that conventional methods often fail to generalize across scenes, yielding <0.6 NCC on average, whereas the heuristic method shows that it is possible to achieve 0.77 SSIM and 0.82 NCC, highlighting the difficulty of cross-spectral UAV alignment and the need for further research to improve optimization in existing registration methods. Full article
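Of the four evaluation metrics, NCC is the simplest to state precisely; a zero-mean formulation in NumPy:

```python
import numpy as np

def ncc(a, b):
    # Zero-mean normalized cross-correlation between two same-size images;
    # 1.0 indicates perfect linear agreement.
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```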

24 pages, 1075 KB  
Article
Hybrid AI Pipeline for Laboratory Detection of Internal Potato Defects Using 2D RGB Imaging
by Slim Hamdi, Kais Loukil, Adem Haj Boubaker, Hichem Snoussi and Mohamed Abid
J. Imaging 2025, 11(12), 431; https://doi.org/10.3390/jimaging11120431 - 3 Dec 2025
Viewed by 366
Abstract
The internal quality assessment of potato tubers is a crucial task in agro-laboratory processing. Traditional methods struggle to detect internal defects such as hollow heart, internal bruises, and insect galleries using only surface features. We present a novel, fully modular hybrid AI architecture designed for defect detection using RGB images of potato slices, suitable for integration into laboratory workflows. Our pipeline combines high-recall multi-threshold YOLO detection, contextual patch validation using ResNet, precise segmentation via the Segment Anything Model (SAM), and skin-contact analysis using VGG16 with a Random Forest classifier. Experimental results on a labeled dataset of over 6000 annotated instances show a recall above 95% and precision near 97.2% for most defect classes. The approach offers both robustness and interpretability, outperforming previous methods that rely on costly hyperspectral or MRI techniques. This system is scalable, explainable, and compatible with existing 2D imaging hardware. Full article
(This article belongs to the Special Issue Imaging Applications in Agriculture)

7 pages, 194 KB  
Editorial
Editorial on the Special Issue “Image and Video Processing for Blind and Visually Impaired”
by Zhigang Zhu, John-Ross Rizzo and Hao Tang
J. Imaging 2025, 11(12), 430; https://doi.org/10.3390/jimaging11120430 - 3 Dec 2025
Viewed by 278
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
21 pages, 19742 KB  
Article
How Good Is the Machine at the Imitation Game? On Stylistic Characteristics of AI-Generated Images
by Adrien Deliège, Jeanne Marlot, Marc Van Droogenbroeck and Maria Giulia Dondero
J. Imaging 2025, 11(12), 429; https://doi.org/10.3390/jimaging11120429 - 2 Dec 2025
Viewed by 388
Abstract
Text-to-image generative models can be used to imitate historical artistic styles, but their effectiveness in doing so remains unclear. In this work, we propose an evaluation framework that leverages expert knowledge from art history and visual semiotics and combines it with quantitative analysis to assess stylistic fidelity. Three experts rated both historical artwork production and images generated with Midjourney v6 for five major movements (Abstract Art, Cubism, Expressionism, Impressionism, Surrealism) and ten associated painters (male and female pairs), using nine visual criteria grounded in Greimas’s plastic categories and Wölfflin’s stylistic oppositions. Ratings were expressed as 95% intervals on continuous 0–100 scales and compared using our Relative Ratings Map (RRMap), which summarizes relative shifts, relative dispersion, and distributional overlap (via the Bhattacharyya coefficient). They were also discretized into four quality ratings (bad, stereotype, fair, excellent). The results show strong inter-expert variability and more moderate intra-expert effects tied to movements, criteria, criterion groups and modalities. Experts tend to agree that the model sometimes aligns with historical trends but also sometimes produces stereotyped versions of a movement or painter, or even completely misses its target, although no unanimous consensus emerges. We conclude that evaluating generative models requires both expert-driven interpretation and quantitative tools, and that stylistic fidelity is hard to quantify even with a rigorous framework. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
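For readers unfamiliar with the overlap measure used by the RRMap, the following minimal sketch computes the Bhattacharyya coefficient between two binned rating distributions; the rating values are invented for illustration and do not come from the study.

```python
import numpy as np

def bhattacharyya(p, q):
    """BC = sum_i sqrt(p_i * q_i); 0 = disjoint, 1 = identical distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p /= p.sum()  # normalize counts to probability masses
    q /= q.sum()
    return float(np.sqrt(p * q).sum())

# Hypothetical expert ratings on the 0-100 scale, binned into 10 buckets:
historical = np.histogram([62, 70, 68, 75, 71], bins=10, range=(0, 100))[0]
generated  = np.histogram([55, 60, 66, 58, 64], bins=10, range=(0, 100))[0]
print(f"distributional overlap: {bhattacharyya(historical, generated):.2f}")
```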
30 pages, 2266 KB  
Article
How Safe Are Oxygen–Ozone Therapy Procedures for Spine Disc Herniation? The SIOOT Protocols for Treating Spine Disorders
by Marianno Franzini, Salvatore Chirumbolo, Francesco Vaiano, Luigi Valdenassi, Francesca Giannetti, Marianna Chierchia, Umberto Tirelli, Paolo Bonacina, Gianluca Poggi, Aniello Langella, Edoardo Maria Pieracci, Christian Giannetti and Roberto Antonio Giannetti
J. Imaging 2025, 11(12), 428; https://doi.org/10.3390/jimaging11120428 - 1 Dec 2025
Viewed by 1402
Abstract
Oxygen–ozone (O₂–O₃) therapy is widely used for treating lumbar disc herniation. However, controversy remains regarding the safest and most effective route of administration. While intradiscal injection is purported to show clinical efficacy, it has also been associated with serious complications. In contrast, the intramuscular route can exhibit a more favourable safety profile and comparable pain outcomes, suggesting its potential as a safer alternative in selected patient populations. This mixed-method study combined computed tomography (CT) imaging, biophysical diffusion modelling, and a meta-analysis of clinical trials to evaluate whether intramuscular O₂–O₃ therapy can achieve disc penetration and therapeutic efficacy comparable to intradiscal nucleolysis, while minimizing procedural risk. Literature searches across PubMed, Scopus, and Cochrane databases identified seven eligible studies (four randomized controlled trials and three cohort studies), encompassing a total of 120 patients. Statistical analyses included Hedges’ g, odds ratios, and number needed to harm (NNH). CT imaging demonstrated gas migration into the intervertebral disc within minutes after intramuscular injection, confirming the plausibility of diffusion through annular micro-fissures. The meta-analysis revealed substantial pain reduction with intramuscular therapy (Hedges’ g = −1.55) and very high efficacy with intradiscal treatment (g = 2.87), though the latter was associated with significantly greater heterogeneity and higher complication rates. The relative risk of severe adverse events was 6.57 times higher for intradiscal procedures (NNH ≈ 1180). O₂–O₃ therapy offers a biologically plausible, safer, and effective alternative to intradiscal injection, supporting its adoption as a first-line, minimally invasive strategy for managing lumbar disc herniation. Full article
(This article belongs to the Section Medical Imaging)
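The two effect-size quantities cited above are standard; the sketch below shows their textbook definitions (Hedges' g is Cohen's d with a small-sample bias correction, and NNH is the reciprocal of the absolute risk increase). All numbers are made up for illustration and are not taken from the meta-analysis.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Cohen's d on pooled SD, times the small-sample correction factor J."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # J, small-sample bias correction
    return d * correction

def nnh(risk_treated, risk_control):
    """Number needed to harm: 1 / absolute risk increase."""
    return 1 / (risk_treated - risk_control)

# Toy pain scores for two groups of 30 patients each:
print(round(hedges_g(6.8, 1.2, 30, 4.9, 1.4, 30), 2))
# Made-up rare-event rates; note how small absolute differences
# produce NNH values on the ~1000 scale, as reported above:
print(round(nnh(0.0010, 0.00015)))
```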
19 pages, 2524 KB  
Article
Brain Tumour Classification Model Based on Spatial Block–Residual Block Collaborative Architecture with Strip Pooling Feature Fusion
by Meilan Tang, Xinlian Zhou and Zhiyong Li
J. Imaging 2025, 11(12), 427; https://doi.org/10.3390/jimaging11120427 - 29 Nov 2025
Viewed by 302
Abstract
Precise classification of brain tumors is crucial for early diagnosis and treatment, but obtaining tumor masks is extremely challenging, limiting the application of traditional methods. This paper proposes a brain tumor classification model based on whole-brain images, combining a spatial block–residual block cooperative architecture with strip pooling feature fusion to achieve multi-scale feature representation without requiring tumor masks. The model extracts fine-grained morphological features through three shallow VGG spatial blocks while capturing global contextual information between tumors and surrounding tissues via four deep ResNet residual blocks. Residual connections mitigate the vanishing-gradient problem. To effectively fuse multi-level features, strip pooling modules are introduced after the third spatial block and fourth residual block, enabling cross-layer feature integration and particularly improving the representation of irregular tumor regions. The fused features undergo cross-scale concatenation, integrating both spatial perception and semantic information, and are ultimately classified via an end-to-end Softmax classifier. Experimental results demonstrate that the model achieves an accuracy of 97.29% in brain tumor image classification tasks, significantly outperforming traditional convolutional neural networks. This validates its effectiveness in achieving high-precision, multi-scale feature learning and classification without brain tumor masks and suggests potential clinical application value. Full article
(This article belongs to the Section Medical Imaging)
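Strip pooling itself is a published module (Hou et al., CVPR 2020) that captures long-range context along rows and columns, which suits elongated or irregular structures. A generic PyTorch sketch of the idea follows; it may differ from the authors' exact block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Gate a feature map with context gathered from horizontal/vertical strips."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Average each row into an (h, 1) strip and each column into (1, w),
        # refine each strip with a 1D-style conv, then broadcast back.
        xh = self.conv_h(F.adaptive_avg_pool2d(x, (h, 1))).expand(-1, -1, -1, w)
        xw = self.conv_w(F.adaptive_avg_pool2d(x, (1, w))).expand(-1, -1, h, -1)
        # Attention-style gating of the input with the combined strip context.
        return x * torch.sigmoid(self.fuse(xh + xw))

# Smoke test on a random feature map:
print(StripPooling(16)(torch.randn(2, 16, 32, 32)).shape)
```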
22 pages, 18974 KB  
Article
Lightweight 3D CNN for MRI Analysis in Alzheimer’s Disease: Balancing Accuracy and Efficiency
by Kerang Cao, Zhongqing Lu, Chengkui Zhao, Jiaming Du, Lele Li, Hoekyung Jung and Minghui Geng
J. Imaging 2025, 11(12), 426; https://doi.org/10.3390/jimaging11120426 - 28 Nov 2025
Viewed by 687
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder characterized by subtle structural changes in the brain, which can be observed through MRI scans. Although traditional diagnostic approaches rely on clinical and neuropsychological assessments, deep learning-based methods such as 3D convolutional neural networks (CNNs) have recently been introduced to improve diagnostic accuracy. However, their high computational complexity remains a challenge. To address this, we propose a lightweight magnetic resonance imaging (MRI) classification framework that integrates adaptive multi-scale feature extraction with structural pruning and parameter optimization. The pruned model achieves a compact architecture with approximately 490k parameters (0.49 million), 4.39 billion floating-point operations, and a model size of 1.9 MB, while maintaining high classification performance across three binary tasks. The proposed framework was evaluated on the Alzheimer’s Disease Neuroimaging Initiative dataset, a widely used benchmark for AD research. Notably, the model achieves a performance density (PD) of 189.87, where PD is a custom efficiency metric defined as classification accuracy per million parameters; this is approximately 70× higher than that of the base model, reflecting the balance between accuracy and computational efficiency. Experimental results demonstrate that the proposed framework significantly reduces resource consumption without compromising diagnostic performance, providing a practical foundation for real-time and resource-constrained clinical applications in Alzheimer’s disease detection. Full article
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)
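Reading the abstract's definition literally (PD = accuracy divided by parameter count in millions), the reported figures imply an overall accuracy of roughly 93%. A quick sanity check, using only the numbers stated above:

```python
# Sanity check of the reported performance density, under the definition
# given in the abstract: PD = accuracy / (parameters in millions).
# Both inputs are the figures stated above, not new measurements.
params_millions = 0.49
performance_density = 189.87
implied_accuracy = performance_density * params_millions
print(f"implied accuracy ≈ {implied_accuracy:.1f}%")  # ≈ 93.0%
```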
29 pages, 1924 KB  
Article
VT-MFLV: Vision–Text Multimodal Feature Learning V Network for Medical Image Segmentation
by Wenju Wang, Jiaqi Li, Zinuo Ye, Yuyang Cai, Zhen Wang and Renwei Zhang
J. Imaging 2025, 11(12), 425; https://doi.org/10.3390/jimaging11120425 - 28 Nov 2025
Viewed by 286
Abstract
Currently, existing multimodal segmentation methods face limitations in effectively leveraging medical text to guide visual feature learning. They often suffer from insufficient multimodal fusion and inadequate accuracy in fine-grained lesion segmentation. To address these challenges, the Vision–Text Multimodal Feature Learning V Network (VT-MFLV) is proposed. This model exploits the complementarity between medical images and text to enhance multimodal fusion, which consequently improves critical lesion recognition accuracy. VT-MFLV introduces three key modules: a Diagnostic Image–Text Residual Multi-Head Semantic Encoding (DIT-RMHSE) module that preserves critical semantic cues while reducing preprocessing complexity; a Fine-Grained Multimodal Fusion Local Attention Encoding (FG-MFLA) module that strengthens local cross-modal interaction; and an Adaptive Global Feature Compression and Focusing (AGCF) module that emphasizes clinically relevant lesion regions. Experiments are conducted on two publicly available pulmonary infection datasets. On the MosMedData dataset, VT-MFLV achieved Dice and mIoU scores of 75.61 ± 0.32% and 63.98 ± 0.29%; on the QaTa-COV19 dataset, it achieved Dice and mIoU scores of 83.34 ± 0.36% and 72.09 ± 0.30%, both representing state-of-the-art performance. Full article
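For reference, the two metrics reported above can be computed on binary masks as follows; this is a generic illustration, not the paper's evaluation code.

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over union: |A ∩ B| / |A ∪ B| for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

# Tiny synthetic masks for demonstration:
pred = np.zeros((4, 4), bool); pred[1:3, 1:3] = True
gt = np.zeros((4, 4), bool); gt[1:4, 1:4] = True
print(round(dice(pred, gt), 3), round(iou(pred, gt), 3))
```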