Search Results (1,981)

Search Parameters:
Journal = J. Imaging

25 pages, 4145 KiB  
Article
Advancing Early Blight Detection in Potato Leaves Through ZeroShot Learning
by Muhammad Shoaib Farooq, Ayesha Kamran, Syed Atir Raza, Muhammad Farooq Wasiq, Bilal Hassan and Nitsa J. Herzog
J. Imaging 2025, 11(8), 256; https://doi.org/10.3390/jimaging11080256 - 31 Jul 2025
Abstract
Potatoes are one of the world’s most widely cultivated crops, but their yield is coming under mounting pressure from early blight, a fungal disease caused by Alternaria solani. Early detection and accurate identification are key to effective disease management and yield protection. This paper introduces a novel deep learning framework called ZeroShot CNN, which integrates convolutional neural networks (CNNs) and ZeroShot Learning (ZSL) for the efficient classification of seen and unseen disease classes. The model utilizes convolutional layers for feature extraction and employs semantic embedding techniques to identify previously untrained classes. Implemented on the Kaggle potato disease dataset, ZeroShot CNN achieved 98.50% accuracy for seen categories and 99.91% accuracy for unseen categories, outperforming conventional methods. The hybrid approach demonstrated superior generalization, providing a scalable, real-time solution for detecting agricultural diseases. The success of this solution validates the potential of harnessing deep learning and ZeroShot inference to transform plant pathology and crop protection practices.
(This article belongs to the Section Image and Video Processing)
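As a sketch of the seen/unseen mechanism described above, the minimal PyTorch model below matches CNN image embeddings against per-class semantic vectors by cosine similarity, so an unseen class needs only a semantic vector rather than training images. The architecture, embedding dimension, and class count are illustrative assumptions, not the authors' published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroShotClassifier(nn.Module):
    """Minimal zero-shot image classifier: a CNN encoder maps leaf images
    into the same space as per-class semantic vectors; prediction is the
    nearest class by cosine similarity, so unseen classes only need a
    semantic vector, not training images."""

    def __init__(self, embed_dim: int = 300):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, images: torch.Tensor, class_embeddings: torch.Tensor):
        z = F.normalize(self.encoder(images), dim=-1)   # (B, D) image embeddings
        c = F.normalize(class_embeddings, dim=-1)       # (C, D) class semantics
        return z @ c.T                                  # (B, C) cosine scores

# Hypothetical example: 4 classes (healthy, early blight, late blight, unseen
# variant) with made-up semantic vectors; real attribute embeddings would go here.
model = ZeroShotClassifier()
scores = model(torch.randn(8, 3, 224, 224), torch.randn(4, 300))
print(scores.argmax(dim=1))
```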

29 pages, 3731 KiB  
Article
An Automated Method for Identifying Voids and Severe Loosening in GPR Images
by Ze Chai, Zicheng Wang, Zeshan Xu, Ziyu Feng and Yafeng Zhao
J. Imaging 2025, 11(8), 255; https://doi.org/10.3390/jimaging11080255 - 30 Jul 2025
Abstract
This paper proposes a novel automatic recognition method for distinguishing voids and severe loosening in road structures based on features of ground-penetrating radar (GPR) B-scan images. By analyzing differences in image texture, the intensity and clarity of top reflection interfaces, and the regularity of internal waveforms, a set of discriminative features is constructed. Based on these features, we develop the FKS-GPR dataset, a high-quality, manually annotated GPR dataset collected from real road environments, covering diverse and complex background conditions. Compared to datasets based on simulations, FKS-GPR offers higher practical relevance. An improved ACF-YOLO network is then designed for automatic detection, and the experimental results show that the proposed method achieves superior accuracy and robustness, validating its effectiveness and engineering applicability.
(This article belongs to the Section Image and Video Processing)
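The texture differences the method relies on can be prototyped with gray-level co-occurrence statistics. A minimal sketch using scikit-image follows; the patch size, distances, and chosen properties are assumptions, not the paper's exact feature set.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def bscan_texture_features(bscan: np.ndarray) -> dict:
    """Gray-level co-occurrence texture statistics for a GPR B-scan patch.
    Voids tend to show a strong, clean top reflection and regular internal
    waveforms; severe loosening looks more chaotic, which measures such as
    contrast, homogeneity, and energy help separate."""
    # Normalize the patch to 8-bit gray levels for the co-occurrence matrix.
    img = np.uint8(255 * (bscan - bscan.min()) / (np.ptp(bscan) + 1e-9))
    glcm = graycomatrix(img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}

# Example on a synthetic patch; real input would be a cropped anomaly region.
print(bscan_texture_features(np.random.rand(64, 64)))
```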

36 pages, 4309 KiB  
Review
Deep Learning Techniques for Prostate Cancer Analysis and Detection: Survey of the State of the Art
by Olushola Olawuyi and Serestina Viriri
J. Imaging 2025, 11(8), 254; https://doi.org/10.3390/jimaging11080254 - 28 Jul 2025
Viewed by 243
Abstract
The human interpretation of medical images, especially for the detection of prostate cancer, has traditionally been a time-consuming and challenging process that is prone to errors, carrying the risk of unnecessary biopsies due to the inherent limitations of human visual interpretation. With the technical advancements and rapid growth of computer resources, machine learning (ML) and deep learning (DL) models have been experimentally used for medical image analysis, particularly in lesion detection. Although several state-of-the-art models have shown promising results, challenges remain when analysing prostate lesion images due to the distinctive and complex nature of medical images. This study offers an elaborate review of the techniques that are used to diagnose prostate cancer using medical images. The goal is to provide a comprehensive and valuable resource that helps researchers develop accurate and autonomous models for effectively detecting prostate cancer. This paper is structured as follows: First, we outline the issues with prostate lesion detection. We then review the methods for analysing prostate lesion images and classification approaches. Next, we examine convolutional neural network (CNN) architectures and explore their applications in deep learning (DL) for image-based prostate cancer diagnosis. Finally, we provide an overview of prostate cancer datasets and evaluation metrics in deep learning. In conclusion, this review analyses key findings, highlights the challenges in prostate lesion detection, and evaluates the effectiveness and limitations of current deep learning techniques.
(This article belongs to the Section Medical Imaging)

18 pages, 7213 KiB  
Article
DFCNet: Dual-Stage Frequency-Domain Calibration Network for Low-Light Image Enhancement
by Hui Zhou, Jun Li, Yaming Mao, Lu Liu and Yiyang Lu
J. Imaging 2025, 11(8), 253; https://doi.org/10.3390/jimaging11080253 - 28 Jul 2025
Viewed by 117
Abstract
Imaging technologies are widely used in surveillance, medical diagnostics, and other critical applications. However, under low-light conditions, captured images often suffer from insufficient brightness, blurred details, and excessive noise, degrading quality and hindering downstream tasks. Conventional low-light image enhancement (LLIE) methods not only require annotated data but also often involve heavy models with high computational costs, making them unsuitable for real-time processing. To tackle these challenges, a lightweight and unsupervised LLIE method utilizing a dual-stage frequency-domain calibration network (DFCNet) is proposed. In the first stage, the input image undergoes the preliminary feature modulation (PFM) module to guide the illumination estimation (IE) module in generating a more accurate illumination map. The final enhanced image is obtained by dividing the input by the estimated illumination map. The second stage is used only during training. It applies a frequency-domain residual calibration (FRC) module to the first-stage output, generating a calibration term that is added to the original input to darken dark regions and brighten bright areas. This updated input is then fed back to the PFM and IE modules for parameter optimization. Extensive experiments on benchmark datasets demonstrate that DFCNet achieves superior performance across multiple image quality metrics while delivering visually clearer and more natural results.
(This article belongs to the Section Image and Video Processing)
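The core first-stage operation, dividing the input by a predicted illumination map, is compact enough to illustrate. The tiny network below is a stand-in for the paper's IE module (layer sizes are assumptions) and shows only the Retinex-style division step.

```python
import torch
import torch.nn as nn

class IlluminationEstimator(nn.Module):
    """Tiny stand-in for the IE module: predicts a per-pixel illumination
    map L in (0, 1]; the enhanced image is I / L, the Retinex-style step
    the abstract describes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).clamp(min=1e-3)  # floor avoids division blow-up

low_light = torch.rand(1, 3, 256, 256) * 0.2     # synthetic dark image
illum = IlluminationEstimator()(low_light)
enhanced = (low_light / illum).clamp(0, 1)       # brightened output
```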

47 pages, 18189 KiB  
Article
Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures
by Zineb Sordo, Eric Chagnon, Zixi Hu, Jeffrey J. Donatelli, Peter Andeer, Peter S. Nico, Trent Northen and Daniela Ushizima
J. Imaging 2025, 11(8), 252; https://doi.org/10.3390/jimaging11080252 - 26 Jul 2025
Viewed by 244
Abstract
Generative AI (genAI) has emerged as a powerful tool for synthesizing diverse and complex image data, offering new possibilities for scientific imaging applications. This review presents a comprehensive comparative analysis of leading generative architectures, ranging from Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) to Diffusion Models, in the context of scientific image synthesis. We examine each model’s foundational principles, recent architectural advancements, and practical trade-offs. Our evaluation, conducted on domain-specific datasets including microCT scans of rocks and composite fibers, as well as high-resolution images of plant roots, integrates both quantitative metrics (SSIM, LPIPS, FID, CLIPScore) and expert-driven qualitative assessments. Results show that GANs, particularly StyleGAN, produce images with high perceptual quality and structural coherence. Diffusion-based models for inpainting and image variation, such as DALL-E 2, delivered high realism and semantic alignment but generally struggled to balance visual fidelity with scientific accuracy. Importantly, our findings reveal limitations of standard quantitative metrics in capturing scientific relevance, underscoring the need for domain-expert validation. We conclude by discussing key challenges such as model interpretability, computational cost, and verification protocols, and by outlining future directions where generative AI can drive innovation in data augmentation, simulation, and hypothesis generation in scientific research.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
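Of the four quantitative metrics listed, SSIM and LPIPS operate on single image pairs and are straightforward to reproduce. A minimal sketch using scikit-image and the lpips package follows; FID and CLIPScore compare whole image sets and need separate tooling (e.g., torch-fidelity), so they are omitted here.

```python
import numpy as np
import torch
import lpips                                    # pip install lpips
from skimage.metrics import structural_similarity

def compare(real: np.ndarray, fake: np.ndarray) -> dict:
    """SSIM (structural fidelity) and LPIPS (learned perceptual distance)
    for one real/generated pair of float RGB images in [0, 1]."""
    ssim = structural_similarity(real, fake, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_t = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips.LPIPS(net="alex")(to_t(real), to_t(fake)).item()
    return {"SSIM": ssim, "LPIPS": lp}

print(compare(np.random.rand(64, 64, 3).astype(np.float32),
              np.random.rand(64, 64, 3).astype(np.float32)))
```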

30 pages, 3451 KiB  
Article
Integrating Google Maps and Smooth Street View Videos for Route Planning
by Federica Massimi, Antonio Tedeschi, Kalapraveen Bagadi and Francesco Benedetto
J. Imaging 2025, 11(8), 251; https://doi.org/10.3390/jimaging11080251 - 25 Jul 2025
Viewed by 265
Abstract
This research addresses the long-standing dependence on printed maps for navigation and highlights the limitations of existing digital services like Google Street View and Google Street View Player in providing comprehensive solutions for route analysis and understanding. The absence of a systematic approach to route analysis, issues related to insufficient street view images, and the lack of proper image mapping for desired roads remain unaddressed by current applications, which are predominantly client-based. In response, we propose an innovative automatic system designed to generate videos depicting road routes between two geographic locations. The system calculates the route and presents it both conventionally, as a highlighted path on a two-dimensional map, and in a multimedia format. A prototype is developed based on a cloud-based client–server architecture, featuring three core modules: frame acquisition, frame analysis and processing, and the persistence of metadata and computed videos. The tests, encompassing both real-world and synthetic scenarios, have produced promising results, showcasing the efficiency of our system. By providing users with a real and immersive understanding of requested routes, our approach fills a crucial gap in existing navigation solutions. This research contributes to the advancement of route planning technologies, offering a comprehensive and user-friendly system that leverages cloud computing and multimedia visualization for an enhanced navigation experience.
(This article belongs to the Section Computer Vision and Pattern Recognition)
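The frame-acquisition module can be approximated with Google's public Street View Static API. The sketch below assumes a valid API key and a precomputed list of route points (latitude, longitude, heading); the real system adds analysis and persistence stages on the server side.

```python
import cv2
import numpy as np
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"  # assumption: a Street View Static API key

def route_video(points, out_path="route.mp4", size=640):
    """Fetch one Street View image per route point and write the frames
    as a video. Route points would come from a directions/routing service."""
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             2.0, (size, size))
    for lat, lng, heading in points:
        r = requests.get(
            "https://maps.googleapis.com/maps/api/streetview",
            params={"size": f"{size}x{size}", "location": f"{lat},{lng}",
                    "heading": heading, "key": API_KEY}, timeout=10)
        img = cv2.imdecode(np.frombuffer(r.content, np.uint8), cv2.IMREAD_COLOR)
        if img is not None:                      # skip points with no imagery
            writer.write(cv2.resize(img, (size, size)))
    writer.release()

route_video([(41.9028, 12.4964, 0), (41.9030, 12.4970, 45)])
```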

19 pages, 1282 KiB  
Article
The Role of Radiomic Analysis and Different Machine Learning Models in Prostate Cancer Diagnosis
by Eleni Bekou, Ioannis Seimenis, Athanasios Tsochatzis, Karafyllia Tziagkana, Nikolaos Kelekis, Savas Deftereos, Nikolaos Courcoutsakis, Michael I. Koukourakis and Efstratios Karavasilis
J. Imaging 2025, 11(8), 250; https://doi.org/10.3390/jimaging11080250 - 23 Jul 2025
Viewed by 259
Abstract
Prostate cancer (PCa) is the most common malignancy in men. Precise grading is crucial for effective treatment of PCa. Machine learning (ML) applied to biparametric Magnetic Resonance Imaging (bpMRI) radiomics holds promise for improving PCa diagnosis and prognosis. This study investigated the efficiency of seven ML models in diagnosing the different PCa grades as the input variables were varied. The study sample comprised 214 men who underwent bpMRI in different imaging centers. Seven ML algorithms were compared using radiomic features extracted from T2-weighted (T2W) and diffusion-weighted (DWI) MRI, with and without the inclusion of Prostate-Specific Antigen (PSA) values. The performance of the models was evaluated using receiver operating characteristic (ROC) curve analysis. The models’ performance was strongly dependent on the input parameters. Radiomic features derived from T2WI and DWI, whether used independently or in combination, demonstrated limited clinical utility, with AUC values ranging from 0.703 to 0.807. However, incorporating the PSA index significantly improved the models’ efficiency, regardless of lesion location or degree of malignancy, resulting in AUC values ranging from 0.784 to 1.00. These findings suggest that ML methods combined with radiomic analysis can help solve differential diagnostic problems in prostate cancer, and our results also indicate that optimization of the analysis method is critical.
(This article belongs to the Section Medical Imaging)
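The study's central comparison, the same classifiers evaluated with and without PSA as an input, can be mimicked on synthetic data. The scikit-learn sketch below uses a random stand-in matrix where real T2W/DWI radiomic features (e.g., extracted with pyradiomics) would go; the AUC values it prints are meaningless beyond illustrating the workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in design matrix: rows = patients, columns = radiomic features.
rng = np.random.default_rng(0)
X_radiomic = rng.normal(size=(214, 30))
psa = rng.lognormal(mean=1.5, size=(214, 1))   # placeholder PSA values
y = rng.integers(0, 2, size=214)               # placeholder grade labels

for name, X in [("radiomics only", X_radiomic),
                ("radiomics + PSA", np.hstack([X_radiomic, psa]))]:
    for clf in (LogisticRegression(max_iter=1000), SVC(probability=True),
                RandomForestClassifier()):
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{name:16s} {type(clf).__name__:22s} AUC={auc:.3f}")
```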

26 pages, 11237 KiB  
Article
Reclassification Scheme for Image Analysis in GRASS GIS Using Gradient Boosting Algorithm: A Case of Djibouti, East Africa
by Polina Lemenkova
J. Imaging 2025, 11(8), 249; https://doi.org/10.3390/jimaging11080249 - 23 Jul 2025
Viewed by 370
Abstract
Image analysis is a valuable approach in a wide array of environmental applications. Mapping land cover categories depicted from satellite images enables the monitoring of landscape dynamics. Such a technique plays a key role in land management and predictive ecosystem modelling. Satellite-based mapping of environmental dynamics enables us to define the factors that trigger these processes and is crucial for our understanding of Earth system processes. In this study, a reclassification scheme of image analysis was developed for mapping the adjusted categorisation of land cover types using multispectral remote sensing datasets and Geographic Resources Analysis Support System (GRASS) Geographic Information System (GIS) software. The data included four Landsat 8–9 satellite images from 2015, 2019, 2021 and 2023. The time series was used to determine land cover dynamics. A classification scheme consisting of 17 initial land cover classes was processed through a logical workflow to extract 10 key land cover types of the coastal areas of the Bab-el-Mandeb Strait, southern Red Sea. Special attention is paid to identifying changes in the land categories around the thermal saline lake, Lake Assal, with its fluctuating salinity and water levels. The methodology used the machine learning (ML) image analysis GRASS GIS module ‘r.reclass’ for the reclassification of a raster map based on category values. Other modules included ‘r.random’, ‘r.learn.train’ and ‘r.learn.predict’ for the gradient boosting ML classifier, and ‘i.cluster’ and ‘i.maxlik’ for clustering and maximum-likelihood discriminant analysis. Auxiliary modules included ‘i.group’, ‘r.import’ and other GRASS GIS scripting techniques applied to Landsat image processing and the identification of land cover variables. The results of image processing demonstrated annual fluctuations in the landscapes around the saline lake and changes in semi-arid and desert land cover types over Djibouti. The increase in the extent of semi-desert areas and the decrease in natural vegetation confirmed ongoing desertification of the arid environment in Djibouti driven by climate effects. The developed land cover maps provide information for assessing spatial–temporal changes in Djibouti. The proposed ML-based methodology using GRASS GIS can be employed to integrate image analysis techniques for land management in other arid regions of Africa.
(This article belongs to the Special Issue Self-Supervised Learning for Image Processing and Analysis)
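The module chain named in the abstract can be driven from Python inside a GRASS session. The sketch below strings together ‘i.group’, ‘r.learn.train’, ‘r.learn.predict’ and ‘r.reclass’; map and group names are placeholders, and parameter names should be checked against the module manuals for your GRASS version.

```python
# Assumes it is run inside an initialized GRASS GIS session with the
# r.learn.* addons installed; all map/group names are placeholders.
import grass.script as gs

# Group the Landsat bands for one acquisition year.
gs.run_command("i.group", group="landsat2023", subgroup="landsat2023",
               input="B2,B3,B4,B5,B6,B7")

# Train and apply a gradient boosting classifier on labelled pixels.
gs.run_command("r.learn.train", group="landsat2023",
               training_map="training_pixels",
               model_name="GradientBoostingClassifier",
               save_model="gb_model.gz")
gs.run_command("r.learn.predict", group="landsat2023",
               load_model="gb_model.gz", output="landcover_2023")

# Collapse the 17 initial classes into the 10 key categories (rules
# shown here are illustrative).
gs.write_command("r.reclass", input="landcover_2023",
                 output="landcover_2023_10cl", rules="-",
                 stdin="1 2 = 1\n3 = 2\n* = *")
```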

18 pages, 33092 KiB  
Article
Yarn Color Measurement Method Based on Digital Photography
by Jinxing Liang, Guanghao Wu, Ke Yang, Jiangxiaotian Ma, Jihao Wang, Hang Luo, Xinrong Hu and Yong Liu
J. Imaging 2025, 11(8), 248; https://doi.org/10.3390/jimaging11080248 - 22 Jul 2025
Viewed by 218
Abstract
To overcome the complexity of yarn color measurement using spectrophotometry with yarn winding techniques and to enhance consistency with human visual perception, a yarn color measurement method based on digital photography is proposed. This study employs a photographic colorimetry system to capture digital images of single yarns. The yarn and background are segmented using the K-means clustering algorithm, and the centerline of the yarn is extracted using a skeletonization algorithm. Spectral reconstruction and colorimetric principles are then applied to calculate the color values of pixels along the centerline. Considering the nonlinear characteristics of human brightness perception, the final yarn color is obtained through a nonlinear texture-adaptive weighted computation. The method is validated through psychophysical experiments using six yarns of different colors and compared with spectrophotometry and five other photographic measurement methods. Results indicate that among the seven yarn color measurement methods, including spectrophotometry, the proposed method, based on centerline extraction and nonlinear texture-adaptive weighting, yields results that most closely align with actual visual perception. Furthermore, among the six photographic measurement methods, the proposed method produces results most similar to those obtained using spectrophotometry. This study demonstrates the inconsistency between spectrophotometric measurements and human visual perception of yarn color and provides methodological support for developing visually consistent color measurement methods for textured textiles.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
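The two geometric steps of the pipeline, K-means yarn/background segmentation and centerline extraction by skeletonization, translate directly into scikit-learn and scikit-image calls. A sketch under the assumption that the background is the larger cluster:

```python
import numpy as np
from skimage.morphology import skeletonize
from sklearn.cluster import KMeans

def yarn_centerline_colors(rgb: np.ndarray) -> np.ndarray:
    """Segment yarn from background with K-means (k=2), reduce the yarn
    mask to its centerline by skeletonization, and return the centerline
    pixel colors, i.e., the input to the colorimetric computation."""
    h, w, _ = rgb.shape
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(rgb.reshape(-1, 3))
    labels = labels.reshape(h, w)
    # Assume the yarn is the smaller cluster (background dominates the frame).
    yarn = labels == np.argmin(np.bincount(labels.ravel()))
    centerline = skeletonize(yarn)
    return rgb[centerline]

colors = yarn_centerline_colors(np.random.rand(128, 128, 3))
print(colors.shape)  # (N_centerline_pixels, 3)
```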

24 pages, 8015 KiB  
Article
Innovative Multi-View Strategies for AI-Assisted Breast Cancer Detection in Mammography
by Beibit Abdikenov, Tomiris Zhaksylyk, Aruzhan Imasheva, Yerzhan Orazayev and Temirlan Karibekov
J. Imaging 2025, 11(8), 247; https://doi.org/10.3390/jimaging11080247 - 22 Jul 2025
Viewed by 403
Abstract
Mammography is the main method for early detection of breast cancer, which remains a major global health concern. However, inter-reader variability and the inherent difficulty of interpreting subtle radiographic features frequently limit the accuracy of diagnosis. A thorough assessment of deep convolutional neural networks (CNNs) for automated mammogram classification is presented in this work, along with the introduction of two innovative multi-view integration techniques: Dual-Branch Ensemble (DBE) and Merged Dual-View (MDV). By setting aside two datasets for out-of-sample testing, we evaluate the generalizability of the model using six different mammography datasets that represent various populations and imaging systems. We compare a number of cutting-edge architectures on both individual and combined datasets, including ResNet, DenseNet, EfficientNet, MobileNet, Vision Transformers, and VGG19. Experimental results show that both the MDV and DBE strategies improve classification performance. Under the MDV approach, VGG19 and DenseNet obtained ROC AUC scores of 0.9051 and 0.7960, respectively. In the DBE setting, DenseNet demonstrated strong performance with a ROC AUC of 0.8033, while ResNet50 recorded a ROC AUC of 0.8042. These enhancements demonstrate how beneficial multi-view fusion is for boosting model robustness. The impact of domain shift is further highlighted by generalization tests, which emphasize the need for diverse datasets in training. These results offer practical advice for improving CNN architectures and integration tactics, which will aid in the creation of trustworthy, broadly applicable AI-assisted breast cancer screening tools.
(This article belongs to the Section Medical Imaging)
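The abstract does not spell out the DBE architecture, but one plausible reading, a separate backbone per mammographic view with concatenated features ahead of a joint head, can be sketched in PyTorch; the backbones, feature sizes, and two-view setup are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DualBranchEnsemble(nn.Module):
    """One plausible reading of the DBE strategy: each mammographic view
    (e.g., CC and MLO) gets its own CNN backbone; pooled features are
    concatenated before a joint classifier head."""
    def __init__(self):
        super().__init__()
        self.cc, self.mlo = resnet50(weights=None), resnet50(weights=None)
        self.cc.fc = self.mlo.fc = nn.Identity()   # expose 2048-d features
        self.head = nn.Linear(2048 * 2, 1)         # benign/malignant logit

    def forward(self, cc_view, mlo_view):
        feats = torch.cat([self.cc(cc_view), self.mlo(mlo_view)], dim=1)
        return self.head(feats)

logit = DualBranchEnsemble()(torch.randn(2, 3, 224, 224),
                             torch.randn(2, 3, 224, 224))
print(logit.shape)  # torch.Size([2, 1])
```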

14 pages, 2370 KiB  
Article
DP-AMF: Depth-Prior–Guided Adaptive Multi-Modal and Global–Local Fusion for Single-View 3D Reconstruction
by Luoxi Zhang, Chun Xie and Itaru Kitahara
J. Imaging 2025, 11(7), 246; https://doi.org/10.3390/jimaging11070246 - 21 Jul 2025
Viewed by 273
Abstract
Single-view 3D reconstruction remains fundamentally ill-posed, as a single RGB image lacks scale and depth cues, often yielding ambiguous results under occlusion or in texture-poor regions. We propose DP-AMF, a novel Depth-Prior–Guided Adaptive Multi-Modal and Global–Local Fusion framework that integrates high-fidelity depth priors—generated offline by the MARIGOLD diffusion-based estimator and cached to avoid extra training cost—with hierarchical local features from ResNet-32/ResNet-18 and semantic global features from DINO-ViT. A learnable fusion module dynamically adjusts per-channel weights to balance these modalities according to local texture and occlusion, and an implicit signed-distance field decoder reconstructs the final mesh. Extensive experiments on 3D-FRONT and Pix3D demonstrate that DP-AMF reduces Chamfer Distance by 7.64%, increases F-Score by 2.81%, and boosts Normal Consistency by 5.88% compared to strong baselines, while qualitative results show sharper edges and more complete geometry in challenging scenes. DP-AMF achieves these gains without substantially increasing model size or inference time, offering a robust and effective solution for complex single-view reconstruction tasks.
(This article belongs to the Section AI in Imaging)
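The learnable per-channel fusion can be illustrated with a small gating module: sigmoid gates weight the depth-prior, local, and global feature maps channel by channel before a 1 x 1 mix. The channel counts and gate design are assumptions, not the published module.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Sketch of the learnable fusion idea: per-channel sigmoid gates decide
    how much of the depth-prior, local (ResNet), and global (DINO-ViT)
    feature maps to keep, then a 1x1 conv mixes them down."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 3, channels * 3, 1), nn.Sigmoid())
        self.mix = nn.Conv2d(channels * 3, channels, 1)

    def forward(self, depth_f, local_f, global_f):
        stacked = torch.cat([depth_f, local_f, global_f], dim=1)
        return self.mix(self.gate(stacked) * stacked)

fused = AdaptiveFusion()(torch.randn(1, 256, 32, 32),
                         torch.randn(1, 256, 32, 32),
                         torch.randn(1, 256, 32, 32))
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```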

11 pages, 1106 KiB  
Review
Three-Dimensional Ultraviolet Fluorescence Imaging in Cultural Heritage: A Review of Applications in Multi-Material Artworks
by Luca Lanteri, Claudia Pelosi and Paola Pogliani
J. Imaging 2025, 11(7), 245; https://doi.org/10.3390/jimaging11070245 - 21 Jul 2025
Viewed by 331
Abstract
Ultraviolet-induced fluorescence (UVF) imaging represents a simple but powerful technique in cultural heritage studies. It is a nondestructive and non-invasive imaging technique that can supply useful and relevant information to define the state of conservation of an artifact. UVF imaging also helps to establish the value of an artwork by indicating inpainting, repaired areas, grouting, etc. In general, ultraviolet fluorescence imaging output takes the form of 2D photographs for both paintings and sculptures. For this reason, a few years ago the idea of applying the photogrammetric method to create 3D digital twins under ultraviolet fluorescence was developed, addressing the requirements of restorers who need daily documentation tools that are simple to use and can display the entire 3D object in a single file. This review explores recent applications of this innovative method of ultraviolet fluorescence imaging with reference to the wider literature on the UVF technique, making evident the practical importance of its application in cultural heritage.
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

27 pages, 3888 KiB  
Article
Deep Learning-Based Algorithm for the Classification of Left Ventricle Segments by Hypertrophy Severity
by Wafa Baccouch, Bilel Hasnaoui, Narjes Benameur, Abderrazak Jemai, Dhaker Lahidheb and Salam Labidi
J. Imaging 2025, 11(7), 244; https://doi.org/10.3390/jimaging11070244 - 20 Jul 2025
Viewed by 322
Abstract
In clinical practice, left ventricle hypertrophy (LVH) continues to pose a considerable challenge, highlighting the need for more reliable diagnostic approaches. This study aims to propose an automated framework for the quantification of LVH extent and the classification of myocardial segments according to hypertrophy severity using a deep learning-based algorithm. The proposed method was validated on 133 subjects, including both healthy individuals and patients with LVH. The process starts with automatic LV segmentation using U-Net, followed by division of the left ventricle cavity according to American Heart Association (AHA) standards and the splitting of each segment into three equal sub-segments. Then, automated quantification of regional wall thickness (RWT) was performed. Finally, a convolutional neural network (CNN) was developed to classify each myocardial sub-segment according to hypertrophy severity. The proposed approach demonstrates strong performance in contour segmentation, achieving a Dice Similarity Coefficient (DSC) of 98.47% and a Hausdorff Distance (HD) of 6.345 ± 3.5 mm. For thickness quantification, it reaches a low mean absolute error (MAE) of 1.01 ± 1.16. Regarding segment classification, it achieves performance metrics competitive with state-of-the-art methods, with an accuracy of 98.19%, a precision of 98.27%, a recall of 99.13%, and an F1-score of 98.7%. The obtained results confirm the high performance of the proposed method and highlight its clinical utility in accurately assessing and classifying cardiac hypertrophy. This approach provides valuable insights that can guide clinical decision-making and improve patient management strategies.
(This article belongs to the Section Medical Imaging)
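The regional wall thickness step can be approximated by splitting a myocardium mask into angular sectors about the cavity centroid, AHA-style, and measuring the radial extent of each sector. The sketch below is a crude geometric stand-in; a clinical implementation would measure true endocardium-to-epicardium distances.

```python
import numpy as np

def regional_wall_thickness(myo_mask: np.ndarray, n_segments: int = 6):
    """Split a binary myocardium mask into n angular sectors about its
    centroid (AHA-style) and report a per-sector thickness proxy: the
    radial extent of the mask within each sector."""
    ys, xs = np.nonzero(myo_mask)
    cy, cx = ys.mean(), xs.mean()
    angles = np.arctan2(ys - cy, xs - cx)               # (-pi, pi]
    radii = np.hypot(ys - cy, xs - cx)
    sector = ((angles + np.pi) / (2 * np.pi) * n_segments).astype(int) % n_segments
    return [float(np.ptp(radii[sector == s])) if np.any(sector == s) else 0.0
            for s in range(n_segments)]

# Synthetic myocardium: an annulus of inner radius 30 and outer radius 42.
yy, xx = np.ogrid[:128, :128]
r = np.hypot(yy - 64, xx - 64)
ring = (r > 30) & (r < 42)
print(regional_wall_thickness(ring))   # ~12 px per sector
```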

15 pages, 4874 KiB  
Article
A Novel 3D Convolutional Neural Network-Based Deep Learning Model for Spatiotemporal Feature Mapping for Video Analysis: Feasibility Study for Gastrointestinal Endoscopic Video Classification
by Mrinal Kanti Dhar, Mou Deb, Poonguzhali Elangovan, Keerthy Gopalakrishnan, Divyanshi Sood, Avneet Kaur, Charmy Parikh, Swetha Rapolu, Gianeshwaree Alias Rachna Panjwani, Rabiah Aslam Ansari, Naghmeh Asadimanesh, Shiva Sankari Karuppiah, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
J. Imaging 2025, 11(7), 243; https://doi.org/10.3390/jimaging11070243 - 18 Jul 2025
Viewed by 396
Abstract
Accurate analysis of medical videos remains a major challenge in deep learning (DL) due to the need for effective spatiotemporal feature mapping that captures both spatial detail and temporal dynamics. Despite advances in DL, most existing models in medical AI focus on static images, overlooking critical temporal cues present in video data. To bridge this gap, a novel DL-based framework is proposed for spatiotemporal feature extraction from medical video sequences. As a feasibility use case, this study focuses on gastrointestinal (GI) endoscopic video classification. A 3D convolutional neural network (CNN) is developed to classify upper and lower GI endoscopic videos using the hyperKvasir dataset, which contains 314 lower and 60 upper GI videos. To address data imbalance, 60 matched pairs of videos are randomly selected across 20 experimental runs. Videos are resized to 224 × 224, and the 3D CNN captures spatiotemporal information. A 3D version of the parallel spatial and channel squeeze-and-excitation (P-scSE) module is implemented, and a new block called the residual with parallel attention (RPA) block is proposed by combining P-scSE3D with a residual block. To reduce computational complexity, a (2 + 1)D convolution is used in place of full 3D convolution. The model achieves an average accuracy of 0.933, precision of 0.932, recall of 0.944, F1-score of 0.935, and AUC of 0.933. It is also observed that integrating P-scSE3D increases the F1-score by 7%. This preliminary work opens avenues for various prospective studies based on GI endoscopic video.
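The (2 + 1)D substitution mentioned above is a standard factorization: a spatial k x k convolution applied per frame followed by a temporal k-convolution across frames, which cuts parameters and compute relative to a full 3D kernel. A minimal PyTorch block (channel sizes are illustrative):

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """(2+1)D factorization of a 3D convolution: a spatial (1, k, k) conv
    over each frame, then a temporal (k, 1, 1) conv across frames, with a
    nonlinearity in between."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, mid: int | None = None):
        super().__init__()
        mid = mid or c_out
        self.spatial = nn.Conv3d(c_in, mid, (1, k, k), padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(mid, c_out, (k, 1, 1), padding=(k // 2, 0, 0))
        self.act = nn.ReLU()

    def forward(self, x):                    # x: (B, C, T, H, W)
        return self.act(self.temporal(self.act(self.spatial(x))))

clip = torch.randn(1, 3, 16, 224, 224)      # 16 frames of a 224x224 clip
print(Conv2Plus1D(3, 32)(clip).shape)       # torch.Size([1, 32, 16, 224, 224])
```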

14 pages, 2426 KiB  
Article
FakeMusicCaps: A Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
by Luca Comanducci, Paolo Bestagini and Stefano Tubaro
J. Imaging 2025, 11(7), 242; https://doi.org/10.3390/jimaging11070242 - 18 Jul 2025
Viewed by 350
Abstract
Text-to-music (TTM) models have recently revolutionized the automatic music generation research field, both by generating music that sounds more plausible than all previous state-of-the-art models and by lowering the technical proficiency needed to use them. For these reasons, they have quickly been adopted in commercial settings and music production practices. This widespread diffusion of TTMs raises serious concerns about copyright violation and rightful attribution that the audio forensics community needs to address. In this paper, we tackle the problem of detection and attribution of TTM-generated data. We propose a dataset, FakeMusicCaps, that contains several versions of the music-caption pairs dataset MusicCaps regenerated via several state-of-the-art TTM techniques. We evaluate the proposed dataset through initial experiments on the detection and attribution of TTM-generated audio, considering both closed-set and open-set classification.
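A common baseline for the open-set side of this task is confidence thresholding on a closed-set classifier: clips whose maximum softmax score falls below a threshold are rejected as coming from an unknown generator. A sketch, with the classifier logits and threshold as placeholders:

```python
import torch
import torch.nn.functional as F

def attribute_generator(logits: torch.Tensor, threshold: float = 0.7):
    """Open-set attribution baseline: a closed-set classifier scores the
    known TTM generators; clips whose max softmax confidence falls below
    the threshold are labelled -1, i.e., 'unknown generator'."""
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    return torch.where(conf >= threshold, pred, torch.full_like(pred, -1))

logits = torch.randn(5, 4)        # 5 clips, 4 known text-to-music models
print(attribute_generator(logits))
```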