Search Results (176)

Search Parameters:
Keywords = channel local attention block

20 pages, 6322 KB  
Article
MAEM-ResUNet: Accurate Glioma Segmentation in Brain MRI via Symmetric Multi-Directional Mamba and Dual-Attention Modules
by Deguo Yang, Boming Yang and Jie Yan
Symmetry 2026, 18(1), 1; https://doi.org/10.3390/sym18010001 - 19 Dec 2025
Abstract
Gliomas are among the most common and aggressive malignant brain tumors. Their irregular morphology and fuzzy boundaries pose substantial challenges for automatic segmentation in MRI. Accurate delineation of tumor subregions is crucial for treatment planning and outcome assessment. This study proposes MAEM-ResUNet, an extension of the ResUNet architecture that integrates three key modules: a multi-scale adaptive attention module for joint channel–spatial feature selection, a symmetric multi-directional Mamba block for long-range context modeling, and an adaptive edge attention module for boundary refinement. Experimental results on the BraTS2020 and BraTS2021 datasets demonstrate that MAEM-ResUNet outperforms mainstream methods. On BraTS2020, it achieves an average Dice Similarity Coefficient of 91.19% and an average Hausdorff Distance (HD) of 5.27 mm; on BraTS2021, the average Dice coefficient is 89.67% and the average HD is 5.87 mm, both showing improvements compared to other mainstream models. Meanwhile, ablation experiments confirm the synergistic effect of the three modules, which significantly enhances the accuracy of glioma segmentation and the precision of boundary localization. Full article
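The Dice Similarity Coefficient and Hausdorff Distance reported above are the two standard segmentation metrics. As a point of reference, a minimal pure-Python sketch of both, on toy binary masks and point sets rather than BraTS volumes (not the authors' code):

```python
import math

def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * intersection / total if total else 1.0

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (tuples of coordinates)."""
    def directed(u, v):
        return max(min(math.dist(p, q) for q in v) for p in u)
    return max(directed(a, b), directed(b, a))

pred  = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
score = dice_coefficient(pred, truth)   # 2*3 / (4+4) = 0.75
```

In 3-D MRI practice both metrics run over voxel masks per tumor subregion; the arithmetic is unchanged.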

28 pages, 33315 KB  
Article
Hyperspectral Image Classification with Multi-Path 3D-CNN and Coordinated Hierarchical Attention
by Wenyi Hu, Wei Shi, Chunjie Lan, Yuxia Li and Lei He
Remote Sens. 2025, 17(24), 4035; https://doi.org/10.3390/rs17244035 - 15 Dec 2025
Abstract
Convolutional Neural Networks (CNNs) have been extensively applied for the extraction of deep features in hyperspectral imagery tasks. However, traditional 3D-CNNs are limited by their fixed-size receptive fields and inherent locality. This restricts their ability to capture multi-scale objects and model long-range dependencies, ultimately hindering the representation of large-area land-cover structures. To overcome these drawbacks, we present a new framework designed to integrate multi-scale feature fusion and a hierarchical attention mechanism for hyperspectral image classification. Channel-wise Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM) spatial attention are combined to enhance feature representation from both spectral bands and spatial locations, allowing the network to emphasize critical wavelengths and salient spatial structures. Finally, by integrating the self-attention inherent in the Transformer architecture with a Cross-Attention Fusion (CAF) mechanism, a local-global feature fusion module is developed. This module effectively captures extended-span interdependencies present in hyperspectral remote sensing images, and this process facilitates the effective integration of both localized and holistic attributes. On the Salinas Valley dataset, the proposed method delivers an Overall Accuracy (OA) of 0.9929 and an Average Accuracy (AA) of 0.9949, attaining perfect recognition accuracy for certain classes. The proposed model demonstrates commendable class balance and classification stability. Across multiple publicly available hyperspectral remote sensing image datasets, it systematically produces classification outcomes that significantly outperform those of established benchmark methods, exhibiting distinct advantages in feature representation, structural modeling, and the discrimination of complex ground objects. Full article
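The SE + CBAM combination named above gates features first per channel, then per spatial position. A heavily simplified pure-Python sketch of that two-stage CBAM ordering on a tiny C×H×W tensor — the shared MLP and the 7×7 spatial convolution of the real module are omitted, so this only illustrates the pooling-and-gating flow, not the published architecture:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(x):
    """Per-channel gate from average- and max-pooled descriptors (shared MLP omitted)."""
    gates = []
    for ch in x:
        flat = [v for row in ch for v in row]
        gates.append(sigmoid(sum(flat) / len(flat) + max(flat)))
    return gates

def spatial_attention(x):
    """Per-pixel gate from channel-wise mean and max maps (7x7 conv omitted)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sigmoid(sum(x[k][i][j] for k in range(c)) / c +
                     max(x[k][i][j] for k in range(c)))
             for j in range(w)] for i in range(h)]

def cbam(x):
    """Channel attention first, then spatial attention (CBAM ordering)."""
    g = channel_attention(x)
    x = [[[g[k] * v for v in row] for row in ch] for k, ch in enumerate(x)]
    s = spatial_attention(x)
    return [[[s[i][j] * x[k][i][j] for j in range(len(row))]
             for i, row in enumerate(ch)] for k, ch in enumerate(x)]

feat = [[[1.0, 0.0], [0.0, 2.0]],   # channel 0
        [[0.5, 0.5], [0.5, 0.5]]]   # channel 1
out = cbam(feat)
```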

27 pages, 7305 KB  
Article
High-Fidelity CT Image Denoising with De-TransGAN: A Transformer-Augmented GAN Framework with Attention Mechanisms
by Usama Jameel and Nicola Belcari
Bioengineering 2025, 12(12), 1350; https://doi.org/10.3390/bioengineering12121350 - 11 Dec 2025
Abstract
Low-dose computed tomography (LDCT) has become a widely adopted protocol to reduce radiation exposure during clinical imaging. However, dose reduction inevitably amplifies noise and artifacts, compromising image quality and diagnostic confidence. To address this challenge, this study introduces De-TransGAN, a transformer-augmented Generative Adversarial Network specifically designed for high-fidelity LDCT image denoising. Unlike conventional CNN-based denoising models, De-TransGAN combines convolutional layers with transformer blocks to jointly capture local texture details and long-range anatomical dependencies. To further guide the network toward diagnostically critical structures, we embed channel–spatial attention modules based on the Convolutional Block Attention Module (CBAM). On the discriminator side, a hybrid design integrating PatchGAN and vision transformer (ViT) components enhances both fine-grained texture discrimination and global structural consistency. Training stability is achieved using the Wasserstein GAN with Gradient Penalty (WGAN-GP), while a composite objective function—L1 loss, SSIM loss, and VGG perceptual loss—ensures pixel-level fidelity, structural similarity, and perceptual realism. De-TransGAN was trained on the TCIA LDCT and Projection Data dataset and validated on two additional benchmarks: the AAPM Mayo Clinic Low Dose CT Grand Challenge dataset and a private clinical chest LDCT dataset comprising 524 scans (used for qualitative assessment only, as no NDCT ground truth is available). Across these datasets, the proposed method consistently outperformed state-of-the-art CNN- and transformer-based denoising models. On the LDCT and Projection dataset head images, it achieved a PSNR of 44.9217 dB, SSIM of 0.9801, and RMSE of 1.001, while qualitative evaluation on the private dataset confirmed strong generalization with clear noise suppression and preservation of fine anatomical details. 
These findings establish De-TransGAN as a clinically viable approach for LDCT denoising, enabling radiation reduction without compromising diagnostic quality. Full article
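The PSNR and RMSE figures quoted above are standard fidelity measures derived from mean squared error. For reference, a minimal pure-Python version on flat toy pixel lists (the paper evaluates full CT slices, but the formulas are the same):

```python
import math

def rmse(ref, test):
    """Root-mean-square error between two images given as flat lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref))

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio (dB): 10*log10(max_val^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

ref  = [50, 60, 70, 80]
test = [52, 58, 71, 79]
value = psnr(ref, test)   # MSE = 2.5, so roughly 44.2 dB
```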
(This article belongs to the Section Biosignal Processing)

15 pages, 1765 KB  
Article
Clinically Focused Computer-Aided Diagnosis for Breast Cancer Using SE and CBAM with Multi-Head Attention
by Zeki Ogut, Mucahit Karaduman and Muhammed Yildirim
Tomography 2025, 11(12), 138; https://doi.org/10.3390/tomography11120138 - 10 Dec 2025
Abstract
Background/Objectives: Breast cancer is one of the most common malignancies in women worldwide. Early diagnosis and accurate classification in breast cancer detection are among the most critical factors determining treatment success and patient survival. In this study, a deep learning-based model was developed that can classify benign, malignant, and normal breast tissues from ultrasound images with high accuracy and achieve better results than the methods commonly used in the literature. Methods: The proposed model was trained on a dataset of breast ultrasound images, and its classification performance was evaluated. The model is designed to effectively learn both local textural features and global contextual relationships by combining Squeeze-and-Excitation (SE) blocks, which emphasize channel-level feature importance, and Convolutional Block Attention Module (CBAM) attention mechanisms, which focus on spatial information, with the multi-head attention (MHA) structure. The model’s performance is compared with three commonly used convolutional neural networks (CNNs) and three Vision Transformer (ViT) architectures. Results: The developed model achieved an accuracy rate of 96.03% in experimental analyses, outperforming both the six compared models and similar studies in the literature. Additionally, the proposed model was tested on a second dataset consisting of histopathological images and achieved an average accuracy of 99.55%. The results demonstrate that the model can effectively learn meaningful spatial and contextual information from ultrasound data and distinguish different tissue types with high accuracy. Conclusions: This study demonstrates the potential of deep learning-based approaches in breast ultrasound-based computer-aided diagnostic systems, providing a reliable, fast, and accurate decision support tool for early diagnosis. 
The results obtained with the proposed model suggest that it can significantly contribute to patient management by improving diagnostic accuracy in clinical applications. Full article
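The multi-head attention structure this model builds on reduces, per head, to scaled dot-product attention. A single-head pure-Python sketch with toy Q/K/V matrices — learned projections and the multi-head split/concat are omitted, so this is the core formula only, not the paper's model:

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                     # one query
K = [[1.0, 0.0], [0.0, 1.0]]         # two keys
V = [[10.0, 0.0], [0.0, 10.0]]       # two values
ctx = attention(Q, K, V)             # weighted toward the first value row
```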
(This article belongs to the Special Issue Imaging in Cancer Diagnosis)

19 pages, 4054 KB  
Article
DSGF-YOLO: A Lightweight Deep Neural Network for Traffic Accident Detection and Severity Classifications
by Weijun Li, Huawei Xie and Peiteng Lin
Vehicles 2025, 7(4), 153; https://doi.org/10.3390/vehicles7040153 - 5 Dec 2025
Abstract
Traffic accidents pose unpredictable and severe social and economic challenges. Rapid and accurate accident detection, along with reliable severity classification, is essential for timely emergency response and improved road safety. This study proposes DSGF-YOLO, an enhanced deep learning framework based on the YOLOv13 architecture, developed for automated road accident detection and severity classification. The proposed methodology integrates two novel components: the DS-C3K2-FasterNet-Block module, which enhances local feature extraction and computational efficiency, and the Grouped Channel-Wise Self-Attention (G-CSA) module, which strengthens global context modeling and small-object perception. Comprehensive experiments on a diverse traffic accident dataset validate the effectiveness of the proposed framework. The results show that DSGF-YOLO achieves higher precision, recall, and mean average precision than state-of-the-art models such as Faster R-CNN, DETR, and other YOLO variants, while maintaining real-time performance. These findings highlight its potential for intelligent transportation systems and real-world accident monitoring applications. Full article
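The precision, recall, and mean average precision figures cited here all rest on intersection-over-union matching between predicted and ground-truth boxes. A minimal IoU sketch in pure Python (illustrative only; the paper's evaluation pipeline is not shown):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero when boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))   # 25 / 175
```

A detection typically counts as a true positive when IoU with a ground-truth box exceeds a threshold such as 0.5; mAP averages precision over recall levels and classes.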

33 pages, 10355 KB  
Article
S2GL-MambaResNet: A Spatial–Spectral Global–Local Mamba Residual Network for Hyperspectral Image Classification
by Tao Chen, Hongming Ye, Guojie Li, Yaohan Peng, Jianming Ding, Huayue Chen, Xiangbing Zhou and Wu Deng
Remote Sens. 2025, 17(23), 3917; https://doi.org/10.3390/rs17233917 - 3 Dec 2025
Abstract
In hyperspectral image classification (HSIC), each pixel contains information across hundreds of contiguous spectral bands; therefore, the ability to perform long-distance modeling that stably captures and propagates these long-distance dependencies is critical. A selective structured state space model (SSM) named Mamba has shown strong capabilities for capturing cross-band long-distance dependencies and exhibits advantages in long-distance modeling. However, the inherently high spectral dimensionality, information redundancy, and spatial heterogeneity of hyperspectral images (HSI) pose challenges for Mamba in fully extracting spatial–spectral features and in maintaining computational efficiency. To address these issues, we propose S2GL-MambaResNet, a lightweight HSI classification network that tightly couples Mamba with progressive residuals to enable richer global, local, and multi-scale spatial–spectral feature extraction, thereby mitigating the negative effects of high dimensionality, redundancy, and spatial heterogeneity on long-distance modeling. To avoid fragmentation of spatial–spectral information caused by serialization and to enhance local discriminability, we design a preprocessing method applied to the features before they are input to Mamba, termed the Spatial–Spectral Gated Attention Aggregator (SS-GAA). SS-GAA uses spatial–spectral adaptive gated fusion to preserve and strengthen the continuity of the central pixel’s neighborhood and its local spatial–spectral representation. To compensate for a single global sequence network’s tendency to overlook local structures, we introduce a novel Mamba variant called the Global–Local Spatial–Spectral Mamba Encoder (GLS2ME). GLS2ME comprises a pixel-level global branch and a non-overlapping sliding-window local branch for modeling long-distance dependencies and patch-level spatial–spectral relations, respectively, jointly improving generalization stability under limited sample regimes. 
To ensure that spatial details and boundary integrity are maintained while capturing spectral patterns at multiple scales, we propose a multi-scale Mamba encoding scheme, the Hierarchical Spectral Mamba Encoder (HSME). HSME first extracts spectral responses via multi-scale 1D spectral convolutions, then groups spectral bands and feeds these groups into Mamba encoders to capture spectral pattern information at different scales. Finally, we design a Progressive Residual Fusion Block (PRFB) that integrates 3D residual recalibration units with Efficient Channel Attention (ECA) to fuse multi-kernel outputs within a global context. This enables ordered fusion of local multi-scale features under a global semantic context, improving information utilization efficiency while keeping computational overhead under control. Comparative experiments on four publicly available HSI datasets demonstrate that S2GL-MambaResNet achieves superior classification accuracy compared with several state-of-the-art methods, with particularly pronounced advantages under few-shot and class-imbalanced conditions. Full article
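Underneath every Mamba encoder named above is a discretized linear state-space recurrence. A scalar pure-Python sketch of that core — Mamba's distinguishing feature, making the (a, b, c) parameters input-dependent ("selective"), is deliberately omitted here, so this shows only the generic SSM scan, not GLS2ME or HSME:

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    With |a| < 1 the state decays geometrically, giving a long but fading memory
    of past inputs; Mamba learns these parameters per input token."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

ys = ssm_scan([1.0, 0.0, 0.0, 0.0])   # impulse response decays as a**t
```

The same recurrence, vectorized over a state dimension and scanned along band or pixel sequences, is what gives Mamba its linear-complexity long-range modeling.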

18 pages, 2703 KB  
Article
High-Frequency Guided Dual-Branch Attention Multi-Scale Hierarchical Dehazing Network for Transmission Line Inspection Images
by Jian Sun, Lanqi Guo and Rui Hu
Electronics 2025, 14(23), 4632; https://doi.org/10.3390/electronics14234632 - 25 Nov 2025
Abstract
To address the edge blurring issue of drone inspection images of mountainous transmission lines caused by non-uniform haze interference, as well as the low operational efficiency of traditional dehazing algorithms due to increased network complexity, this paper proposes a high-frequency guided dual-branch attention multi-scale hierarchical dehazing network for transmission line scenarios. The network adopts a core architecture of multi-block hierarchical processing combined with a multi-scale integration scheme, with each layer based on an asymmetric encoder–decoder with residual channels as the basic framework. A Mix structure module is embedded in the encoder to construct a dual-branch attention mechanism: the low-frequency global perception branch cascades channel attention and pixel attention to model global features; the high-frequency local enhancement branch adopts a multi-directional edge feature extraction method to capture edge information, which is well-adapted to the structural characteristics of transmission line conductors and towers. Additionally, a fog density estimation branch based on the dark channel mean is added to dynamically adjust the weights of the dual branches according to haze concentration, solving the problem of attention failure caused by attenuation of high-frequency signals in dense haze regions. At the decoder end, depthwise separable convolution is used to construct lightweight residual modules, which reduce running time while maintaining feature expression capability. At the output stage, an inter-block feature fusion module is introduced to eliminate cross-block artifacts caused by multi-block processing through multi-strategy collaborative optimization. 
Experimental results on the public datasets NH-HAZE20, NH-HAZE21, O-HAZE, and the self-built foggy transmission line dataset show that, compared with classic and cutting-edge algorithms, the proposed algorithm significantly outperforms others in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM); its running time is 19% shorter than that of DMPHN. Subjectively, the restored images have continuous and complete edges and high color fidelity, which can meet the practical needs of subsequent fault detection in transmission line inspection. Full article
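The fog-density branch described above keys off the dark channel mean, which follows the dark channel prior: haze-free outdoor patches tend to contain some pixel that is dark in at least one color channel, so a high minimum brightness signals haze. A pure-Python sketch of that statistic on a toy image (the paper's exact windowing and weighting scheme is not specified here):

```python
def dark_channel_mean(img, patch=1):
    """Mean of the dark channel: per-pixel min over (r, g, b), then min over a
    local window of radius `patch`, averaged over the image. Higher values
    suggest denser haze. img: H x W list of (r, g, b) tuples in [0, 1]."""
    h, w = len(img), len(img[0])
    dark = [[min(img[i][j]) for j in range(w)] for i in range(h)]
    pooled = []
    for i in range(h):
        for j in range(w):
            window = [dark[y][x]
                      for y in range(max(0, i - patch), min(h, i + patch + 1))
                      for x in range(max(0, j - patch), min(w, j + patch + 1))]
            pooled.append(min(window))
    return sum(pooled) / len(pooled)

hazy  = [[(0.8, 0.8, 0.9), (0.7, 0.8, 0.8)],
         [(0.9, 0.8, 0.8), (0.8, 0.9, 0.8)]]
clear = [[(0.1, 0.5, 0.2), (0.0, 0.3, 0.1)],
         [(0.2, 0.1, 0.0), (0.1, 0.2, 0.3)]]
density_hazy, density_clear = dark_channel_mean(hazy), dark_channel_mean(clear)
```

A scalar like this can then gate the blend between the low-frequency global branch and the high-frequency edge branch.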
(This article belongs to the Section Computer Science & Engineering)

27 pages, 11404 KB  
Article
Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Classification
by Zahid Ullah, Minki Hong, Tahir Mahmood and Jihie Kim
Mathematics 2025, 13(22), 3728; https://doi.org/10.3390/math13223728 - 20 Nov 2025
Abstract
Deep learning has demonstrated significant promise in medical image analysis; however, standard CNNs frequently encounter challenges in detecting subtle and intricate features vital for accurate diagnosis. To address this limitation, we systematically integrated attention mechanisms into five commonly used CNN backbones: VGG16, ResNet18, InceptionV3, DenseNet121, and EfficientNetB5. Each network was modified using either a Squeeze-and-Excitation block or a hybrid Convolutional Block Attention Module, allowing for more effective recalibration of channel and spatial features. We evaluated these attention-augmented models on two distinct datasets: (1) a Products of Conception histopathological dataset containing four tissue categories, and (2) a brain tumor MRI dataset that includes multiple tumor subtypes. Across both datasets, networks enhanced with attention mechanisms consistently outperformed their baseline counterparts on all measured evaluation criteria. Importantly, EfficientNetB5 with hybrid attention achieved superior overall results, with notable enhancements in both accuracy and generalizability. In addition to improved classification outcomes, the inclusion of attention mechanisms also advanced feature localization, thereby increasing robustness across a range of imaging modalities. Our study established a comprehensive framework for incorporating attention modules into diverse CNN architectures and delineated their impact on medical image classification. These results provide important insights for the development of interpretable and clinically robust deep learning-driven diagnostic systems. Full article
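The Squeeze-and-Excitation block this study plugs into each backbone recalibrates channels in three steps: global-average-pool each channel ("squeeze"), pass the descriptors through a two-layer bottleneck ("excitation"), and rescale. A pure-Python sketch with tiny hand-set weights in place of learned ones:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on flattened feature maps.
    x: C x N channel values; w1: bottleneck weights (R x C), w2: expansion
    weights (C x R). Weights here are illustrative; a real SE block learns them."""
    squeeze = [sum(ch) / len(ch) for ch in x]                       # C descriptors
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze)))    # ReLU bottleneck
              for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)))       # per-channel gate
             for row in w2]
    return [[g * v for v in ch] for g, ch in zip(gates, x)]

x  = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]   # C=3 channels, N=2 values each
w1 = [[0.5, 0.5, 0.5]]                      # bottleneck of size R=1
w2 = [[1.0], [-1.0], [0.0]]
out = se_block(x, w1, w2)                   # channel 0 kept, channel 1 suppressed
```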

22 pages, 2067 KB  
Article
MixMambaNet: Hybrid Perception Encoder and Non-Local Mamba Aggregation for IRSTD
by Zikang Zhang and Songfeng Yin
Electronics 2025, 14(22), 4527; https://doi.org/10.3390/electronics14224527 - 19 Nov 2025
Abstract
Infrared small target detection (IRSTD) is hindered by low signal-to-noise ratios, minute object scales, and strong target–background similarity. Although long-range skip fusion is exploited in SCTransNet, the global context is insufficiently captured by its convolutional encoder, and the fusion block remains vulnerable to structured clutter. To address these issues, a Mamba-enhanced framework, MixMambaNet, is proposed with three mutually reinforcing components. First, ResBlocks are replaced by a perception-aware hybrid encoder, in which local perceptual attention is coupled with mixed pixel–channel attention along multi-branch paths to emphasize weak target cues while modeling image-wide context. Second, at the bottleneck, dense pre-enhancement is integrated with a selective-scan 2D (SS2D) state-space (Mamba) core and a lightweight hybrid-attention tail, enabling linear-complexity long-range reasoning that is better suited to faint signals than quadratic self-attention. Third, the baseline fusion is substituted with a non-local Mamba aggregation module, where DASI-inspired multi-scale integration, SS2D-driven scanning, and adaptive non-local enhancement are employed to align cross-scale semantics and suppress structured noise. The resulting U-shaped network with deep supervision achieves higher accuracy and fewer false alarms at a competitive cost. Extensive evaluations on NUDT-SIRST, NUAA-SIRST, and IRSTD-1k demonstrate consistent improvements over prevailing IRSTD approaches, including SCTransNet. Full article

25 pages, 12749 KB  
Article
ADFE-DET: An Adaptive Dynamic Feature Enhancement Algorithm for Weld Defect Detection
by Xiaocui Wu, Changjun Liu, Hao Zhang and Pengyu Xu
Appl. Sci. 2025, 15(21), 11595; https://doi.org/10.3390/app152111595 - 30 Oct 2025
Abstract
Welding is a critical joining process in modern manufacturing, with defects contributing to 50–80% of structural failures. Traditional inspection methods are often inefficient, subjective, and inconsistent. To address challenges in weld defect detection—including scale variation, morphological complexity, low contrast, and sample imbalance—this paper proposes ADFE-DET, an adaptive dynamic feature enhancement algorithm. The approach introduces three core innovations: the Dynamic Selection Cross-stage Cascade Feature Block (DSCFBlock) captures fine texture features via edge-preserving dynamic selection attention; the Adaptive Hierarchical Spatial Feature Pyramid Network (AHSFPN) achieves adaptive multi-scale feature integration through directional channel attention and hierarchical fusion; and the Multi-Directional Differential Lightweight Head (MDDLH) enables precise defect localization via multi-directional differential convolution while maintaining a lightweight architecture. Experiments on three public datasets (Weld-DET, NEU-DET, PKU-Market-PCB) show that ADFE-DET improves mAP50 by 2.16%, 2.73%, and 1.81%, respectively, over baseline YOLOv11n, while reducing parameters by 34.1%, computational complexity by 4.6%, and achieving 105 FPS inference speed. The results demonstrate that ADFE-DET provides an effective and practical solution for intelligent industrial weld quality inspection. Full article

23 pages, 23535 KB  
Article
FANT-Det: Flow-Aligned Nested Transformer for SAR Small Ship Detection
by Hanfu Li, Dawei Wang, Jianming Hu, Xiyang Zhi and Dong Yang
Remote Sens. 2025, 17(20), 3416; https://doi.org/10.3390/rs17203416 - 12 Oct 2025
Abstract
Ship detection in synthetic aperture radar (SAR) remote sensing imagery is of great significance in military and civilian applications. However, two factors limit detection performance: (1) a high prevalence of small-scale ship targets with limited information content and (2) interference affecting ship detection from speckle noise and land–sea clutter. To address these challenges, we propose a novel end-to-end (E2E) transformer-based SAR ship detection framework, called Flow-Aligned Nested Transformer for SAR Small Ship Detection (FANT-Det). Specifically, in the feature extraction stage, we introduce a Nested Swin Transformer Block (NSTB). The NSTB employs a two-level local self-attention mechanism to enhance fine-grained target representation, thereby enriching features of small ships. For multi-scale feature fusion, we design a Flow-Aligned Depthwise Efficient Channel Attention Network (FADEN). FADEN achieves precise alignment of features across different resolutions via semantic flow and filters background clutter through lightweight channel attention, further enhancing small-target feature quality. Moreover, we propose an Adaptive Multi-scale Contrastive Denoising (AM-CDN) training paradigm. AM-CDN constructs adaptive perturbation thresholds jointly determined by a target scale factor and a clutter factor, generating contrastive denoising samples that better match the physical characteristics of SAR ships. Finally, extensive experiments on three widely used open SAR ship datasets demonstrate that the proposed method achieves superior detection performance, outperforming current state-of-the-art (SOTA) benchmarks. Full article

27 pages, 7948 KB  
Article
Attention-Driven Time-Domain Convolutional Network for Source Separation of Vocal and Accompaniment
by Zhili Zhao, Min Luo, Xiaoman Qiao, Changheng Shao and Rencheng Sun
Electronics 2025, 14(20), 3982; https://doi.org/10.3390/electronics14203982 - 11 Oct 2025
Abstract
Time-domain signal models have been widely applied to single-channel music source separation tasks due to their ability to overcome the limitations of fixed spectral representations and phase information loss. However, the high acoustic similarity and synchronous temporal evolution between vocals and accompaniment make accurate separation challenging for existing time-domain models. These challenges are mainly reflected in two aspects: (1) the lack of a dynamic mechanism to evaluate the contribution of each source during feature fusion, and (2) difficulty in capturing fine-grained temporal details, often resulting in local artifacts in the output. To address these issues, we propose an attention-driven time-domain convolutional network for vocal and accompaniment source separation. Specifically, we design an embedding attention module to perform adaptive source weighting, enabling the network to emphasize components more relevant to the target mask during training. In addition, an efficient convolutional block attention module is developed to enhance local feature extraction. This module integrates an efficient channel attention mechanism based on one-dimensional convolution while preserving spatial attention, thereby improving the ability to learn discriminative features from the target audio. Comprehensive evaluations on public music datasets demonstrate the effectiveness of the proposed model and its significant improvements over existing approaches. Full article
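The efficient channel attention idea this module adopts replaces SE's fully connected bottleneck with a cheap 1-D convolution across the channel descriptors, so channels interact only with their neighbors. A pure-Python sketch with an illustrative three-tap kernel (learned in practice; this is not the paper's module):

```python
import math

def eca(x, kernel=(0.25, 0.5, 0.25)):
    """Efficient-channel-attention-style gating: global-average-pool each channel,
    run a 1-D convolution across the resulting descriptors (no dimensionality
    reduction), then gate each channel with a sigmoid.
    x: C x N feature values; kernel taps are illustrative placeholders."""
    c = len(x)
    desc = [sum(ch) / len(ch) for ch in x]              # per-channel descriptor
    pad = len(kernel) // 2
    padded = [desc[0]] * pad + desc + [desc[-1]] * pad  # replicate-pad the ends
    gates = [1.0 / (1.0 + math.exp(-sum(k * padded[i + j]
                                        for j, k in enumerate(kernel))))
             for i in range(c)]
    return [[g * v for v in ch] for g, ch in zip(gates, x)]

x = [[4.0, 4.0], [0.0, 0.0], [2.0, 2.0]]   # C=3 channels
out = eca(x)                               # strong channels stay near full scale
```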
(This article belongs to the Section Artificial Intelligence)

18 pages, 3177 KB  
Article
Ground Type Classification for Hexapod Robots Using Foot-Mounted Force Sensors
by Yong Liu, Rui Sun, Xianguo Tuo, Tiantao Sun and Tao Huang
Machines 2025, 13(10), 900; https://doi.org/10.3390/machines13100900 - 1 Oct 2025
Abstract
In field exploration, disaster rescue, and complex terrain operations, the accuracy of ground type recognition directly affects the walking stability and task execution efficiency of legged robots. To address the problem of terrain recognition in complex ground environments, this paper proposes a high-precision classification method based on single-leg triaxial force signals. The method first employs a one-dimensional convolutional neural network (1D-CNN) module to extract local temporal features, then introduces a long short-term memory (LSTM) network to model long-term and short-term dependencies during ground contact, and incorporates a convolutional block attention module (CBAM) to adaptively enhance the feature responses of critical channels and time steps, thereby improving discriminative capability. In addition, an improved whale optimization algorithm (iBWOA) is adopted to automatically perform global search and optimization of key hyperparameters, including the number of convolution kernels, the number of LSTM units, and the dropout rate, to achieve the optimal training configuration. Experimental results demonstrate that the proposed method achieves excellent classification performance on five typical ground types—grass, cement, gravel, soil, and sand—under varying slope and force conditions, with an overall classification accuracy of 96.94%. Notably, it maintains high recognition accuracy even between ground types with similar contact mechanical properties, such as soil vs. grass and gravel vs. sand. This study provides a reliable perception foundation and technical support for terrain-adaptive control and motion strategy optimization of legged robots in real-world environments. Full article
(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)
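The CBAM stage described in the abstract can be illustrated for 1-D force-signal features. A minimal NumPy sketch, assuming a feature map of shape (channels, time steps) produced by the 1D-CNN stage; the reduction ratio `r`, the fixed random bottleneck weights, and the plain-average stand-in for CBAM's small convolution are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def channel_attention(x, r=4):
    """Channel attention: squeeze over time with avg- and max-pooling,
    pass both through a shared two-layer bottleneck, sum, sigmoid."""
    C, T = x.shape
    rng = np.random.default_rng(0)                 # fixed illustrative weights
    w1 = rng.standard_normal((C // r, C)) / np.sqrt(C)
    w2 = rng.standard_normal((C, C // r)) / np.sqrt(C // r)
    avg, mx = x.mean(axis=1), x.max(axis=1)
    s = w2 @ np.maximum(w1 @ avg, 0.0) + w2 @ np.maximum(w1 @ mx, 0.0)
    return 1.0 / (1.0 + np.exp(-s))                # (C,) weights in (0, 1)

def temporal_attention(x):
    """Time-step attention: pool over channels, then gate each step.
    A plain average stands in for CBAM's small conv layer here."""
    s = 0.5 * (x.mean(axis=0) + x.max(axis=0))
    return 1.0 / (1.0 + np.exp(-s))                # (T,) weights in (0, 1)

def cbam_1d(x):
    """Apply channel attention, then temporal attention, as in CBAM."""
    x = x * channel_attention(x)[:, None]
    return x * temporal_attention(x)[None, :]
```

The two-stage gating mirrors CBAM's design: channel weights decide *which* feature maps matter, temporal weights decide *when* during ground contact they matter, which is the behavior the abstract attributes to the module.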
25 pages, 13151 KB  
Article
Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12 for Improving Nighttime Pedestrian Detection in Security
by Lijuan Wang, Zuchao Bao and Dongming Lu
Appl. Sci. 2025, 15(19), 10607; https://doi.org/10.3390/app151910607 - 30 Sep 2025
Abstract
In security applications, visible-light pedestrian detectors are highly sensitive to changes in illumination and fail under low-light or nighttime conditions, while infrared sensors, though resilient to lighting, often produce blurred object boundaries that hinder precise localization. To address these complementary limitations, we propose a practical multimodal pipeline—Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12—that first fuses infrared and low-light visible images using per-pixel weights derived from local energy, gradient magnitude and contrast measures, then detects pedestrians with an improved YOLOv12 backbone. The detector integrates an AIFI attention module at high semantic levels, replaces selected modules with A2C2f blocks to enhance cross-channel feature aggregation, and preserves P3–P5 outputs to improve small-object localization. We evaluate the complete pipeline on the LLVIP dataset and report Precision, Recall, mAP@50, mAP@50–95, GFLOPs, FPS and detection time, comparing against YOLOv8, YOLOv10–YOLOv12 baselines (n and s scales). Quantitative and qualitative results show that the proposed fusion restores complementary thermal and visible details and that the AIFI-enhanced detector yields more robust nighttime pedestrian detection while maintaining a competitive computational profile suitable for real-world security deployments.
(This article belongs to the Special Issue Advanced Image Analysis and Processing Technologies and Applications)
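The per-pixel fusion step can be sketched in NumPy. The window size, the specific local energy, gradient-magnitude and contrast measures, and the simple additive combination below are assumptions for illustration; the paper's exact weighting scheme is not reproduced here:

```python
import numpy as np

def box_mean(img, k=3):
    """Local mean over a k x k window with edge padding."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += p[i:i + img.shape[0], j:j + img.shape[1]]
    return out / (k * k)

def egc_weights(img, k=3, eps=1e-6):
    """Per-pixel score from local energy, gradient magnitude and contrast."""
    f = img.astype(float)
    energy = box_mean(f ** 2, k)                    # local energy
    gy, gx = np.gradient(f)
    grad = box_mean(np.hypot(gx, gy), k)            # local gradient magnitude
    contrast = np.sqrt(np.maximum(box_mean(f ** 2, k) - box_mean(f, k) ** 2, 0.0))
    return energy + grad + contrast + eps           # illustrative additive score

def egc_fuse(ir, vis, k=3):
    """Weighted per-pixel average of the infrared and visible images."""
    w_ir, w_vis = egc_weights(ir, k), egc_weights(vis, k)
    return (w_ir * ir + w_vis * vis) / (w_ir + w_vis)
```

Because the weights are nonnegative and normalized per pixel, the fused image is a convex combination of the two inputs at every pixel, so thermal detail dominates where the infrared channel is locally informative and vice versa.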

25 pages, 20019 KB  
Article
GLFNet: Attention Mechanism-Based Global–Local Feature Fusion Network for Micro-Expression Recognition
by Meng Zhang, Long Yao, Wenzhong Yang and Yabo Yin
Entropy 2025, 27(10), 1023; https://doi.org/10.3390/e27101023 - 28 Sep 2025
Abstract
Micro-expressions are extremely subtle and short-lived facial muscle movements that often reveal an individual’s genuine emotions. However, micro-expression recognition (MER) remains highly challenging due to its short duration, low motion intensity, and the imbalanced distribution of training samples. To address these issues, this paper proposes a Global–Local Feature Fusion Network (GLFNet) to effectively extract discriminative features for MER. Specifically, GLFNet consists of three core modules: the Global Attention (GA) module, which captures subtle variations across the entire facial region; the Local Block (LB) module, which partitions the feature map into four non-overlapping regions to emphasize salient local movements while suppressing irrelevant information; and the Adaptive Feature Fusion (AFF) module, which employs an attention mechanism to dynamically adjust channel-wise weights for efficient global–local feature integration. In addition, a class-balanced loss function is introduced to replace the conventional cross-entropy loss, mitigating the common issue of class imbalance in micro-expression datasets. Extensive experiments are conducted on three benchmark databases, SMIC, CASME II, and SAMM, under two evaluation protocols. The experimental results demonstrate that under the Composite Database Evaluation protocol, GLFNet consistently outperforms existing state-of-the-art methods in overall performance. Specifically, the unweighted F1-scores on the Combined, SAMM, CASME II, and SMIC datasets are improved by 2.49%, 2.02%, 0.49%, and 4.67%, respectively, compared to the current best methods. These results strongly validate the effectiveness and superiority of the proposed global–local feature fusion strategy in micro-expression recognition tasks.
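Class-balanced losses of the kind mentioned above are commonly implemented with effective-number reweighting (Cui et al., CVPR 2019). The NumPy sketch below assumes that scheme; the value of `beta` and the mean-one normalization are illustrative choices, not necessarily the authors' exact formulation:

```python
import numpy as np

def cb_weights(counts, beta=0.999):
    """Effective-number class weights: w_c is proportional to
    (1 - beta) / (1 - beta**n_c), normalized so the weights average to 1."""
    counts = np.asarray(counts, dtype=float)
    eff = (1.0 - np.power(beta, counts)) / (1.0 - beta)  # effective sample count
    w = 1.0 / eff
    return w * len(counts) / w.sum()

def cb_cross_entropy(logits, labels, counts, beta=0.999):
    """Softmax cross-entropy with per-class effective-number weights."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    nll = -np.log(p[np.arange(len(labels)), labels] + 1e-12)
    w = cb_weights(counts, beta)
    return float((w[np.asarray(labels)] * nll).mean())
```

Rare classes receive larger weights, so errors on under-represented micro-expression categories contribute more to the loss than the plain cross-entropy would allow.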
