Search Results (567)

Search Parameters:
Keywords = deep image prior

17 pages, 2501 KiB  
Article
Weather-Resilient Localizing Ground-Penetrating Radar via Adaptive Spatio-Temporal Mask Alignment
by Yuwei Chen, Beizhen Bi, Pengyu Zhang, Liang Shen, Chaojian Chen, Xiaotao Huang and Tian Jin
Remote Sens. 2025, 17(16), 2854; https://doi.org/10.3390/rs17162854 - 16 Aug 2025
Abstract
Localizing ground-penetrating radar (LGPR) benefits from deep subsurface coupling, ensuring robustness against surface variations and adverse weather. While LGPR is widely recognized as a complement to existing vehicle localization methods, its reliance on prior maps introduces significant challenges. Channel misalignment during traversal positioning and time-dimension distortion caused by non-uniform platform motion degrade matching accuracy. Furthermore, rain and snow conditions induce subsurface water-content variations that distort ground-penetrating radar (GPR) echoes, further complicating the localization process. To address these issues, we propose a weather-resilient adaptive spatio-temporal mask alignment algorithm for LGPR. The method employs adaptive alignment and dynamic time warping (DTW) strategies to sequentially resolve channel and time-dimension misalignments in GPR sequences, followed by calibration of GPR query sequences. Moreover, a multi-level discrete wavelet transform (MDWT) module enhances low-frequency GPR features, while adaptive alignment along the channel dimension refines the signals and significantly improves localization accuracy under rain or snow. Additionally, a local matching DTW algorithm is introduced to perform robust temporal image-sequence alignment. Extensive experiments were conducted on both the public LGPR dataset GROUNDED and self-collected data covering five challenging scenarios. The results demonstrate superior localization accuracy and robustness compared to existing methods.
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)
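
The time-dimension alignment described above is built on dynamic time warping. As a rough point of reference, a minimal textbook DTW in NumPy looks like the sketch below; the Euclidean frame distance and sequence sizes are illustrative assumptions, not the paper's local-matching variant.

```python
# Minimal dynamic time warping (DTW) sketch in NumPy. Generic textbook DTW,
# not the paper's local-matching variant; the frame distance is assumed.
import numpy as np

def dtw_distance(query, reference):
    """Align two sequences of feature vectors of shape (T1, D) and (T2, D)."""
    t1, t2 = len(query), len(reference)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(query[i - 1] - reference[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[t1, t2]

# Example: align a 50-frame GPR query against a 60-frame prior-map segment.
rng = np.random.default_rng(0)
print(dtw_distance(rng.normal(size=(50, 8)), rng.normal(size=(60, 8))))
```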

24 pages, 5458 KiB  
Article
Global Prior-Guided Distortion Representation Learning Network for Remote Sensing Image Blind Super-Resolution
by Guanwen Li, Ting Sun, Shijie Yu and Siyao Wu
Remote Sens. 2025, 17(16), 2830; https://doi.org/10.3390/rs17162830 - 14 Aug 2025
Abstract
Most existing deep learning-based super-resolution (SR) methods for remote sensing images rely on predefined degradation assumptions (e.g., bicubic downsampling). However, when real-world degradations deviate from these assumptions, their performance deteriorates significantly. Moreover, explicit degradation estimation approaches based on iterative schemes inevitably lead to accumulated estimation errors and time-consuming processes. In this paper, instead of explicitly estimating degradation types, we first introduce an MSCN_G coefficient to capture global prior information corresponding to different distortions. Subsequently, distortion-enhanced representations are implicitly estimated through contrastive learning and embedded into a super-resolution network equipped with multiple distortion decoders (D-Decoder). Furthermore, we propose a distortion-related channel segmentation (DCS) strategy that reduces the network’s parameters and computation (FLOPs). We refer to this Global Prior-guided Distortion-enhanced Representation Learning Network as GDRNet. Experiments on both synthetic and real-world remote sensing images demonstrate that GDRNet outperforms state-of-the-art blind SR methods for remote sensing images in overall performance. Under anisotropic Gaussian blurring without added noise, with a kernel width of 1.2 and an upscaling factor of 4, super-resolution reconstruction on the NWPU-RESISC45 dataset achieves a PSNR of 28.98 dB and an SSIM of 0.7656.
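
For context, the MSCN_G global prior builds on classical mean-subtracted contrast-normalized (MSCN) coefficients. A hedged NumPy sketch of the standard computation is shown below; the Gaussian window and normalization constant are conventional choices, not the paper's exact settings.

```python
# Sketch of mean-subtracted contrast-normalized (MSCN) coefficients, the
# classical statistic such global distortion priors build on. Window size and
# the constant c are conventional assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6, c=1.0):
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                     # local mean
    var = gaussian_filter(image * image, sigma) - mu * mu  # local variance
    std = np.sqrt(np.maximum(var, 0.0))                    # local std
    return (image - mu) / (std + c)                        # normalized coefficients

# A distortion-sensitive global descriptor could be the histogram or low-order
# moments of this MSCN map.
img = np.random.rand(128, 128) * 255
coeffs = mscn(img)
print(coeffs.mean(), coeffs.std())
```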

16 pages, 9189 KiB  
Article
SEND: Semantic-Aware Deep Unfolded Network with Diffusion Prior for Multi-Modal Image Fusion and Object Detection
by Rong Zhang, Mao-Yi Xiong and Jun-Jie Huang
Mathematics 2025, 13(16), 2584; https://doi.org/10.3390/math13162584 - 12 Aug 2025
Abstract
Multi-modality image fusion (MIF) aims to integrate complementary information from diverse imaging modalities into a single comprehensive representation and serves as an essential processing step for downstream high-level computer vision tasks. Existing deep unfolding-based methods demonstrate promising results; however, they often rely on deterministic priors with limited generalization ability and are usually decoupled from the training of object detection. In this paper, we propose the Semantic-Aware Deep Unfolded Network with Diffusion Prior (SEND), a novel framework designed for transparent and effective multi-modality fusion and object detection. SEND consists of a Denoising Prior Guided Fusion Module and a Fusion Object Detection Module. The Denoising Prior Guided Fusion Module does not rely on a traditional deterministic prior but combines a diffusion prior with deep unfolding, leading to improved multi-modal fusion performance and generalization ability. It is designed around a model-based optimization formulation for multi-modal image fusion, which is unfolded into two cascaded blocks: a Diffusion Denoising Fusion Block that generates informative diffusion priors and a Data Consistency Enhancement Block that explicitly aggregates complementary features from both the diffusion priors and the input modalities. Additionally, SEND couples the Fusion Object Detection Module with the Denoising Prior Guided Fusion Module for object detection optimization using a carefully designed two-stage training strategy. Experiments demonstrate that the proposed SEND method outperforms state-of-the-art methods, achieving superior fusion quality with improved efficiency and interpretability.
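
To illustrate the general deep-unfolding idea of alternating a prior/denoising block with a data-consistency block, the toy sketch below uses a Gaussian filter as a stand-in for the learned diffusion prior; all weights and the stand-in denoiser are assumptions for illustration only, not SEND's actual blocks.

```python
# Toy half-quadratic-splitting-style unfolding: alternate a data-consistency
# step toward the two input modalities with a "prior" step, here a Gaussian
# denoiser standing in for a learned diffusion prior. Entirely illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def unfolded_fusion(ir, vis, stages=5, lam=0.5, rho=1.0):
    x = 0.5 * (ir + vis)                   # initial fused estimate
    for _ in range(stages):
        z = gaussian_filter(x, sigma=1.0)  # prior / denoising block (stand-in)
        # data-consistency block: weighted proximal update toward both modalities
        x = (lam * ir + lam * vis + rho * z) / (2 * lam + rho)
    return x

ir = np.random.rand(64, 64)
vis = np.random.rand(64, 64)
print(unfolded_fusion(ir, vis).shape)
```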

14 pages, 31941 KiB  
Article
PriKMet: Prior-Guided Pointer Meter Reading for Automated Substation Inspections
by Haidong Chu, Jun Feng, Yidan Wang, Weizhen He, Yunfeng Yan and Donglian Qi
Electronics 2025, 14(16), 3194; https://doi.org/10.3390/electronics14163194 - 11 Aug 2025
Abstract
Despite the rapid advancement of smart-grid technologies, automated pointer meter reading in power substations remains a persistent challenge due to complex electromagnetic interference and dynamic field conditions. Traditional computer vision methods, typically designed for ideal imaging environments, exhibit limited robustness against real-world perturbations such as illumination fluctuations, partial occlusions, and motion artifacts. To address this gap, we propose PriKMet (Prior-Guided Pointer Meter Reader), a novel meter reading algorithm that integrates deep learning with domain-specific priors through three key contributions: (1) a unified hierarchical framework for joint meter detection and keypoint localization, (2) an intelligent meter reading method that fuses predefined inspection-route information with perception results, and (3) an adaptive offset correction mechanism for UAV-based inspections. Extensive experiments on a comprehensive dataset of 3237 substation meter images demonstrate the superior performance of PriKMet, which achieves state-of-the-art results of 99.4% AP50 for meter detection and 85.5% meter reading accuracy. The real-time processing capability of the method offers a practical solution for modernizing power infrastructure monitoring. This approach effectively reduces reliance on manual inspections in complex operational environments while enhancing the intelligence of power maintenance operations.
(This article belongs to the Special Issue Advances in Condition Monitoring and Fault Diagnosis)
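
As an illustration of the final step of pointer meter reading, the sketch below converts hypothetical dial-center, scale-endpoint, and pointer-tip keypoints into a value by angular interpolation; the keypoint names, linear-scale assumption, and value range are illustrative, not PriKMet's exact formulation.

```python
# Angular interpolation from detected keypoints to a meter reading.
# Keypoint layout and the linear scale are assumptions for illustration.
import math

def meter_reading(center, zero_pt, full_pt, tip, v_min=0.0, v_max=1.6):
    def angle(p):
        return math.atan2(p[1] - center[1], p[0] - center[0])
    # Angles measured from the zero-scale mark, wrapped to [0, 2*pi).
    span = (angle(full_pt) - angle(zero_pt)) % (2 * math.pi)
    theta = (angle(tip) - angle(zero_pt)) % (2 * math.pi)
    return v_min + (v_max - v_min) * theta / span

# Example: pointer roughly halfway along the scale arc.
print(meter_reading(center=(100, 100), zero_pt=(40, 160),
                    full_pt=(160, 160), tip=(100, 30)))
```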

23 pages, 8286 KiB  
Article
Context-Guided SAR Ship Detection with Prototype-Based Model Pretraining and Check–Balance-Based Decision Fusion
by Haowen Zhou, Zhe Geng, Minjie Sun, Linyi Wu and He Yan
Sensors 2025, 25(16), 4938; https://doi.org/10.3390/s25164938 - 10 Aug 2025
Abstract
To address the challenging problem of multi-scale inshore–offshore ship detection in synthetic aperture radar (SAR) remote sensing images, we propose a novel deep learning-based automatic ship detection method within the framework of compositional learning. The proposed method is supported by three pillars: context-guided region proposal, prototype-based model pretraining, and multi-model ensemble learning. To reduce the false alarms induced by discrete ground clutter, prior knowledge of the harbour’s layout is exploited to generate land masks for terrain delimitation. To prepare the model for the diverse ship targets of different sizes and orientations it might encounter in the test environment, a novel cross-dataset model pretraining strategy is devised, where SAR images of several key ship target prototypes from an auxiliary dataset are used to support class-incremental learning. To combine the advantages of diverse model architectures, an adaptive decision-level fusion framework is proposed, which consists of three components: a dynamic confidence threshold assignment strategy based on target sizes, a weighted fusion mechanism based on president–senate checks and balances, and Soft-NMS-based Dense Group Target Bounding Box Fusion (Soft-NMS-DGT-BBF). The performance enhancements brought by contextual knowledge-aided terrain delimitation, cross-dataset prototype-based model pretraining, and check–balance-based adaptive decision-level fusion are validated through a series of carefully devised experiments on the FAIR-CSAR-Ship dataset.
(This article belongs to the Special Issue SAR Imaging Technologies and Applications)
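
For reference, the Gaussian Soft-NMS re-scoring that fusion schemes such as Soft-NMS-DGT-BBF build on can be sketched as follows; the box format and decay parameters are conventional assumptions, and the paper's dense-group box fusion itself is more involved.

```python
# Classical Gaussian Soft-NMS re-scoring. Box format [x1, y1, x2, y2] assumed.
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    boxes, scores = boxes.copy(), scores.copy()
    keep, idx = [], np.arange(len(scores))
    while len(idx) > 0:
        best = idx[np.argmax(scores[idx])]
        keep.append(best)
        idx = idx[idx != best]
        if len(idx) == 0:
            break
        overlaps = iou(boxes[best], boxes[idx])
        scores[idx] *= np.exp(-(overlaps ** 2) / sigma)   # Gaussian score decay
        idx = idx[scores[idx] > score_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
print(soft_nms(boxes, np.array([0.9, 0.8, 0.7])))
```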

27 pages, 9566 KiB  
Article
CSBBNet: A Specialized Detection Method for Corner Reflector Targets via a Cross-Shaped Bounding Box Network
by Wangshuo Tang, Yuexin Gao, Mengdao Xing, Min Xue, Huitao Liu and Guangcai Sun
Remote Sens. 2025, 17(16), 2760; https://doi.org/10.3390/rs17162760 - 8 Aug 2025
Abstract
In synthetic aperture radar (SAR) maritime target detection tasks, corner reflector targets (CRTs) and their arrays can easily interfere with the accurate detection of ship targets, significantly increasing the misdetection rate and false alarm rate of detectors. Current deep learning-based research on SAR maritime target detection primarily focuses on ship targets, while dedicated detection methods addressing corner reflector interference have not yet established a comprehensive research framework, and there remains a lack of theoretical innovation in detection principles for such targets. To address these issues, utilizing the prior knowledge that marine CRTs exhibit cross-shaped structures in SAR images, we propose an innovative cross-shaped bounding box (CSBB) annotation strategy and design a novel dedicated detection network, CSBBNet. The proposed method is built from three innovative component modules, namely the cross-shaped spatial feature perception (CSSFP) module, the wavelet cross-shaped attention downsampling (WCSAD) module, and the cross-shaped attention detection head (CSAD-Head). Additionally, to ensure effective training, we propose a cross-shaped intersection over union (CS-IoU) loss function. Comparative experiments with state-of-the-art methods demonstrate that our approach detects CRTs efficiently, and ablation results validate the effectiveness of the proposed component architectures.

17 pages, 3807 KiB  
Article
2AM: Weakly Supervised Tumor Segmentation in Pathology via CAM and SAM Synergy
by Chenyu Ren, Liwen Zou and Luying Gui
Electronics 2025, 14(15), 3109; https://doi.org/10.3390/electronics14153109 - 5 Aug 2025
Abstract
Tumor microenvironment (TME) analysis plays an extremely important role in computational pathology. Deep learning shows tremendous potential for tumor tissue segmentation in pathological images, an essential part of TME analysis. However, fully supervised segmentation methods based on deep learning usually require a large number of manual annotations, which are time-consuming and labor-intensive to produce. Recently, weakly supervised semantic segmentation (WSSS) methods based on the Class Activation Map (CAM) have shown promising results in learning segmentation from image-level class labels, but they usually produce imprecise boundaries due to the lack of pixel-wise supervision. On the other hand, the Segment Anything Model (SAM), a foundation model for segmentation, has shown an impressive ability for general semantic segmentation on natural images, but it is sensitive to noise in the initial prompts. To address these problems, we propose a simple but effective weakly supervised framework, termed 2AM, combining CAM and SAM for tumor tissue segmentation in pathological images. Our 2AM model is composed of three modules: (1) a CAM module for generating salient regions of tumor tissue in pathological images; (2) an adaptive point selection (APS) module that provides more reliable initial prompts for the subsequent SAM by exploiting three priors of basic appearance, spatial distribution, and feature difference; and (3) a SAM module for predicting the final segmentation. Experimental results on two independent datasets show that our proposed method boosts tumor segmentation accuracy by nearly 25% compared with the baseline method and achieves more than a 15% improvement over previous state-of-the-art segmentation methods under WSSS settings.
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
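
A rough sketch of the adaptive point selection idea, picking high-activation, mutually distant CAM peaks as positive point prompts for SAM, is given below; the threshold and minimum-distance rule are illustrative assumptions rather than 2AM's exact three priors.

```python
# Select high-activation, well-separated CAM peaks as SAM point prompts.
# Threshold and distance rule are assumptions for illustration.
import numpy as np

def select_prompt_points(cam, num_points=3, activation_thresh=0.6, min_dist=32):
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-9)  # normalize to [0, 1]
    ys, xs = np.where(cam >= activation_thresh)
    order = np.argsort(cam[ys, xs])[::-1]                     # strongest first
    chosen = []
    for i in order:
        p = np.array([xs[i], ys[i]])
        if all(np.linalg.norm(p - q) >= min_dist for q in chosen):
            chosen.append(p)
        if len(chosen) == num_points:
            break
    return np.stack(chosen) if chosen else np.empty((0, 2), int)

cam = np.zeros((256, 256)); cam[40:80, 40:80] = 1.0; cam[150:200, 150:200] = 0.8
points = select_prompt_points(cam)
print(points)  # (x, y) prompts that could be fed to a SAM predictor as positives
```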

29 pages, 15488 KiB  
Article
GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
by Tao He, Jianyu Chen and Delu Pan
Remote Sens. 2025, 17(15), 2652; https://doi.org/10.3390/rs17152652 - 31 Jul 2025
Abstract
Geographic object-based image analysis (GEOBIA) has demonstrated substantial utility in remote sensing tasks. However, its integration with deep learning remains largely confined to image-level classification, primarily because the irregular shapes and fragmented boundaries of segmented objects limit its applicability to semantic segmentation. While convolutional neural networks (CNNs) excel at local feature extraction, they inherently struggle to capture long-range dependencies. In contrast, Transformer-based models are well suited for global context modeling but often lack fine-grained local detail. To overcome these limitations, we propose GOFENet (Geo-Object Feature Enhanced Network), a hybrid semantic segmentation architecture that effectively fuses object-level priors into deep feature representations. GOFENet employs a dual-encoder design combining CNN and Swin Transformer architectures, enabling multi-scale feature fusion through skip connections to preserve both local and global semantics. An auxiliary branch incorporating cascaded atrous convolutions is introduced to inject information about segmented objects into the learning process. Furthermore, we develop a cross-channel selection module (CSM) for refined channel-wise attention, a feature enhancement module (FEM) to merge global and local representations, and a shallow–deep feature fusion module (SDFM) to integrate pixel- and object-level cues across scales. Experimental results on the GID and LoveDA datasets demonstrate that GOFENet achieves superior segmentation performance, with 66.02% mIoU and 51.92% mIoU, respectively. The model exhibits strong capability in delineating large-scale land cover features, producing sharper object boundaries and reducing classification noise, while preserving the integrity and discriminability of land cover categories.

14 pages, 1617 KiB  
Article
Multi-Label Conditioned Diffusion for Cardiac MR Image Augmentation and Segmentation
by Jianyang Li, Xin Ma and Yonghong Shi
Bioengineering 2025, 12(8), 812; https://doi.org/10.3390/bioengineering12080812 - 28 Jul 2025
Abstract
Accurate segmentation of cardiac MR images using deep neural networks is crucial for cardiac disease diagnosis and treatment planning, as it provides quantitative insights into heart anatomy and function. However, achieving high segmentation accuracy relies heavily on extensive, precisely annotated datasets, which are costly and time-consuming to obtain. This study addresses this challenge by proposing a novel data augmentation framework based on a condition-guided diffusion generative model controlled by multiple cardiac labels. The framework aims to expand annotated cardiac MR datasets and significantly improve the performance of downstream cardiac segmentation tasks. The proposed generative data augmentation framework operates in two stages. First, a Label Diffusion Module is trained to unconditionally generate, from noise, realistic multi-category spatial masks (encompassing regions such as the left ventricle, interventricular septum, and right ventricle) that conform to anatomical prior probabilities. Second, cardiac MR images are generated conditioned on these semantic masks, ensuring a precise one-to-one mapping between synthetic labels and images through the integration of a spatially-adaptive normalization (SPADE) module for structural constraint during conditional model training. The effectiveness of this augmentation strategy is demonstrated using a U-Net segmentation model on the augmented 2D cardiac image dataset derived from the M&M Challenge. Results indicate that the proposed method effectively increases the number of dataset samples and significantly improves cardiac segmentation accuracy, achieving a 5% to 10% higher Dice Similarity Coefficient (DSC) than traditional data augmentation methods. Experiments further reveal a strong correlation between image generation quality and augmentation effectiveness. This framework offers a robust solution for data scarcity in cardiac image analysis, directly benefiting clinical applications.
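
The SPADE module follows the published spatially-adaptive normalization recipe; a minimal PyTorch sketch is shown below, with channel sizes chosen arbitrarily for illustration rather than taken from this paper.

```python
# Minimal SPADE-style block: normalize features without learned affine
# parameters, then modulate with gamma/beta maps predicted from the resized
# label mask. Channel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, feat_channels, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(label_channels, hidden, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat, segmap):
        seg = F.interpolate(segmap, size=feat.shape[-2:], mode="nearest")
        h = self.shared(seg)
        return self.norm(feat) * (1 + self.gamma(h)) + self.beta(h)

# Example: modulate a 64-channel feature map with a 4-class cardiac label map.
feat = torch.randn(2, 64, 32, 32)
mask = torch.randn(2, 4, 128, 128)
print(SPADE(64, 4)(feat, mask).shape)  # torch.Size([2, 64, 32, 32])
```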

18 pages, 2644 KiB  
Article
Multispectral and Chlorophyll Fluorescence Imaging Fusion Using 2D-CNN and Transfer Learning for Cross-Cultivar Early Detection of Verticillium Wilt in Eggplants
by Dongfang Zhang, Shuangxia Luo, Jun Zhang, Mingxuan Li, Xiaofei Fan, Xueping Chen and Shuxing Shen
Agronomy 2025, 15(8), 1799; https://doi.org/10.3390/agronomy15081799 - 25 Jul 2025
Abstract
Verticillium wilt, characterized by leaf chlorosis, is a devastating disease of eggplant. Early diagnosis, prior to the manifestation of symptoms, enables targeted management of the disease. In this study, we aim to detect early leaf wilt caused by Verticillium dahliae in eggplant leaves by integrating multispectral imaging with machine learning and deep learning techniques. Multispectral and chlorophyll fluorescence images were collected from leaves of the inbred eggplant line 11-435, including data on image texture, spectral reflectance, and chlorophyll fluorescence. Subsequently, we established a multispectral data model, a fusion information model, and a multispectral image–information fusion model. The multispectral image–information fusion model, integrated with a two-dimensional convolutional neural network (2D-CNN), demonstrated the best performance in classifying early-stage Verticillium wilt infection, achieving a test accuracy of 99.37%. Additionally, transfer learning enabled us to diagnose early leaf wilt in another eggplant variety, the inbred line 14-345, with an accuracy of 84.54 ± 1.82%. Compared to traditional methods that rely on visible symptom observation and typically require about 10 days to confirm infection, this study achieved early detection of Verticillium wilt as soon as the third day post-inoculation. These findings underscore the potential of the fusion model as a valuable tool for the early detection of pre-symptomatic states in infected plants, thereby offering theoretical support for in-field detection of eggplant health.

19 pages, 28897 KiB  
Article
MetaRes-DMT-AS: A Meta-Learning Approach for Few-Shot Fault Diagnosis in Elevator Systems
by Hongming Hu, Shengying Yang, Yulai Zhang, Jianfeng Wu, Liang He and Jingsheng Lei
Sensors 2025, 25(15), 4611; https://doi.org/10.3390/s25154611 - 25 Jul 2025
Abstract
Recent advancements in deep learning have spurred significant research interest in fault diagnosis for elevator systems. However, conventional approaches typically require substantial labeled datasets that are often impractical to obtain in real-world industrial environments. This limitation poses a fundamental challenge for developing robust diagnostic models capable of performing reliably under data-scarce conditions. To address this critical gap, we propose MetaRes-DMT-AS (Meta-ResNet with Dynamic Meta-Training and Adaptive Scheduling), a novel meta-learning framework for few-shot fault diagnosis. Our methodology employs Gramian Angular Fields to transform 1D raw sensor data into 2D image representations, followed by episodic task construction through stochastic sampling. During meta-training, the system acquires transferable prior knowledge through optimized parameter initialization, while an adaptive scheduling module dynamically configures support and query sets. Subsequent regularization via prototype networks ensures stable feature extraction. Comprehensive validation using the Case Western Reserve University bearing dataset and proprietary elevator acceleration data demonstrates the framework's superiority: MetaRes-DMT-AS achieves state-of-the-art few-shot classification performance, surpassing benchmark models by 0.94–1.78% in overall accuracy. For critical few-shot fault categories, particularly emergency stops and severe vibrations, the method delivers accuracy improvements of 3–16% and 17–29%, respectively.
(This article belongs to the Special Issue Signal Processing and Sensing Technologies for Fault Diagnosis)
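
A short sketch of the Gramian Angular Field transform used to turn a 1D sensor segment into a 2D image follows; the rescaling to [-1, 1] and the summation (GASF) variant are common choices and not necessarily the paper's exact settings.

```python
# Gramian Angular Summation Field (GASF): rescale a 1-D signal to [-1, 1],
# map samples to angles, and form the pairwise cosine-sum image.
import numpy as np

def gramian_angular_field(x):
    x = np.asarray(x, dtype=np.float64)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1  # rescale so arccos is defined
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])               # GASF image, shape (T, T)

signal = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.1 * np.random.randn(128)
image = gramian_angular_field(signal)
print(image.shape)  # (128, 128), ready for a 2-D CNN backbone
```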

14 pages, 492 KiB  
Article
Learnable Priors Support Reconstruction in Diffuse Optical Tomography
by Alessandra Serianni, Alessandro Benfenati and Paola Causin
Photonics 2025, 12(8), 746; https://doi.org/10.3390/photonics12080746 - 24 Jul 2025
Abstract
Diffuse Optical Tomography (DOT) is a non-invasive medical imaging technique that uses Near-Infrared (NIR) light to recover the spatial distribution of optical coefficients in biological tissues for diagnostic purposes. Due to the intense scattering of light within tissues, the reconstruction process inherent to DOT is severely ill-posed. In this paper, we propose to tackle this ill-conditioning by learning a prior over the solution space using an autoencoder-type neural network. Specifically, the decoder part of the autoencoder is used as a generative model: it maps a latent code to the estimated physical parameters given as input to the forward model. The latent code is itself the result of an optimization loop that minimizes the discrepancy between the solution computed by the forward model and the available observations. The structure and interpretability of the latent space are enhanced by minimizing the rank of its covariance matrix, thereby promoting more effective utilization of its information-carrying capacity. The deep learning-based prior significantly enhances reconstruction capabilities in this challenging domain, demonstrating the potential of integrating advanced neural network techniques into DOT.
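
The decoder-as-prior reconstruction can be sketched as a latent-code optimization loop, as below; `decoder` and `forward_model` are placeholders for the trained generative prior and the DOT light-propagation model, both assumed differentiable, and the toy stand-ins are purely illustrative.

```python
# Optimize a latent code through a pretrained decoder so the forward model's
# prediction matches the observations. Models here are toy stand-ins.
import torch

def reconstruct(decoder, forward_model, observations, latent_dim=64, steps=500, lr=1e-2):
    z = torch.zeros(1, latent_dim, requires_grad=True)   # latent code to optimize
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        mu_a = decoder(z)                                 # estimated optical coefficients
        pred = forward_model(mu_a)                        # simulated boundary measurements
        loss = torch.nn.functional.mse_loss(pred, observations)
        loss.backward()
        opt.step()
    return decoder(z).detach()

# Usage with crude stand-ins for the two models:
decoder = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.Tanh())
forward_model = torch.nn.Linear(256, 32)                  # linear surrogate forward model
obs = torch.randn(1, 32)
print(reconstruct(decoder, forward_model, obs).shape)
```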

35 pages, 4256 KiB  
Article
Automated Segmentation and Morphometric Analysis of Thioflavin-S-Stained Amyloid Deposits in Alzheimer’s Disease Brains and Age-Matched Controls Using Weakly Supervised Deep Learning
by Gábor Barczánfalvi, Tibor Nyári, József Tolnai, László Tiszlavicz, Balázs Gulyás and Karoly Gulya
Int. J. Mol. Sci. 2025, 26(15), 7134; https://doi.org/10.3390/ijms26157134 - 24 Jul 2025
Abstract
Alzheimer’s disease (AD) involves the accumulation of amyloid-β (Aβ) plaques, whose quantification plays a central role in understanding disease progression. Automated segmentation of Aβ deposits in histopathological micrographs enables large-scale analyses but is hindered by the high cost of detailed pixel-level annotations. Weakly supervised learning offers a promising alternative by leveraging coarse or indirect labels to reduce the annotation burden. We evaluated a weakly supervised approach to segment and analyze thioflavin-S-positive parenchymal amyloid pathology in AD and age-matched brains. Our pipeline integrates three key components, each designed to operate under weak supervision. First, robust preprocessing (including retrospective multi-image illumination correction and gradient-based background estimation) was applied to enhance image fidelity and support training, since the models rely primarily on image features. Second, class activation maps (CAMs), generated by a compact deep classifier, SqueezeNet, were used to identify and coarsely localize amyloid-rich parenchymal regions from patch-wise image labels, serving as spatial priors for subsequent refinement without requiring dense pixel-level annotations. Third, a patch-based convolutional neural network, U-Net, was trained on synthetic data generated from micrographs based on CAM-derived pseudo-labels via an extensive object-level augmentation strategy, enabling refined whole-image semantic segmentation and generalization across diverse spatial configurations. To ensure robustness and unbiased evaluation, we assessed the segmentation performance of the entire framework using patient-wise group k-fold cross-validation, explicitly modeling generalization across unseen individuals, which is critical in clinical scenarios. Despite relying on weak labels, the integrated pipeline achieved strong segmentation performance, with an average Dice similarity coefficient of ≈0.763 and a Jaccard index of ≈0.639, widely accepted metrics for assessing segmentation quality in medical image analysis. The resulting segmentations were also visually coherent, demonstrating that weakly supervised segmentation is a viable alternative in histopathology, where acquiring dense annotations is prohibitively labor-intensive and time-consuming. Subsequent morphometric analyses of the automatically segmented Aβ deposits revealed differences in size, structural complexity, and global geometry across brain regions and cognitive status. These findings confirm that deposit architecture exhibits region-specific patterns and reflects underlying neurodegenerative processes, thereby highlighting the biological relevance and practical applicability of the proposed image-processing pipeline for morphometric analysis.
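
For reference, the two reported segmentation metrics can be computed on binary masks as in the short sketch below; the smoothing constant is a common convention.

```python
# Dice similarity coefficient and Jaccard index on binary masks.
import numpy as np

def dice_and_jaccard(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    jaccard = (inter + eps) / (union + eps)
    return dice, jaccard

pred = np.zeros((64, 64), int); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64), int);   gt[15:45, 15:45] = 1
print(dice_and_jaccard(pred, gt))
```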

25 pages, 6911 KiB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions, particularly detail oversmoothing, structural ambiguity, and textural incoherence, this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions, and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; and (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) a spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling stable adversarial training while enforcing patch-level structural consistency; and (3) a dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Experiments on two benchmark datasets (Places2 and CelebA) demonstrate that the framework achieves more unified textures and structures, bringing the restored images closer to their original semantic content.
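
The gradient-penalty regularization mentioned for the Markovian discriminator is sketched below in the standard WGAN-GP form on interpolated samples; the exact penalty used in the paper may differ, and the tiny stand-in patch discriminator is illustrative.

```python
# Standard WGAN-GP-style gradient penalty on real/fake interpolations.
import torch

def gradient_penalty(discriminator, real, fake):
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = discriminator(mixed)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=mixed,
                                create_graph=True)[0]
    grads = grads.view(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()  # penalize deviation from unit norm

# Usage with a tiny stand-in patch discriminator:
disc = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 4, 2, 1), torch.nn.LeakyReLU(0.2),
                           torch.nn.Conv2d(8, 1, 4, 2, 1))
real, fake = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
print(gradient_penalty(disc, real, fake))
```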

18 pages, 5806 KiB  
Article
Optical Flow Magnification and Cosine Similarity Feature Fusion Network for Micro-Expression Recognition
by Heyou Chang, Jiazheng Yang, Kai Huang, Wei Xu, Jian Zhang and Hao Zheng
Mathematics 2025, 13(15), 2330; https://doi.org/10.3390/math13152330 - 22 Jul 2025
Abstract
Recent advances in deep learning have significantly improved micro-expression recognition, yet most existing methods process the entire facial region holistically and struggle to capture subtle variations in facial action units, which limits recognition performance. To address this challenge, we propose the Optical Flow Magnification and Cosine Similarity Feature Fusion Network (MCNet). MCNet introduces a multi-facial-action optical flow estimation module that integrates globally motion-amplified optical flow with localized optical flow from the eye and mouth–nose regions, enabling precise capture of facial expression nuances. Additionally, an enhanced MobileNetV3-based feature extraction module, incorporating Kolmogorov–Arnold networks and convolutional attention mechanisms, effectively captures both global and local features from optical flow images. A novel multi-channel feature fusion module leverages cosine similarity between Query and Key token sequences to optimize feature integration. Extensive evaluations on four public datasets (CASME II, SAMM, SMIC-HS, and MMEW) demonstrate MCNet's superior performance, achieving state-of-the-art results with 92.88% UF1 and 86.30% UAR on the composite dataset and surpassing the best prior method by 1.77% in UF1 and 6.0% in UAR.
(This article belongs to the Special Issue Representation Learning for Computer Vision and Pattern Recognition)
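
A minimal sketch of cosine-similarity attention between Query and Key token sequences, the core operation the fusion module is described as using, follows; the temperature and the simple softmax aggregation are assumptions rather than MCNet's exact module.

```python
# Cosine-similarity attention between Query and Key token sequences.
import torch
import torch.nn.functional as F

def cosine_similarity_fusion(query, key, value, temperature=10.0):
    """query: (B, Nq, D); key, value: (B, Nk, D)."""
    q = F.normalize(query, dim=-1)
    k = F.normalize(key, dim=-1)
    attn = torch.softmax(temperature * q @ k.transpose(-2, -1), dim=-1)  # cosine scores
    return attn @ value                                                   # fused tokens (B, Nq, D)

global_tokens = torch.randn(2, 49, 256)   # e.g. amplified global optical-flow features
local_tokens = torch.randn(2, 49, 256)    # e.g. eye / mouth-nose region features
print(cosine_similarity_fusion(global_tokens, local_tokens, local_tokens).shape)
```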
