Search Results (343)

Search Parameters:
Keywords = hybrid multi-scale features

21 pages, 5917 KiB  
Article
VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation
by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan
Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 - 17 Jul 2025
Abstract
Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, such as CNN, Transformer, and hybrid architectures, face challenges including insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient, lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose VML-UNet, a lightweight segmentation network whose core innovations are the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block with a channel prior attention mechanism, enabling efficient modeling of spatial relationships at linear computational complexity through dynamic channel-space weight allocation while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets (ISIC2017, ISIC2018, and PH2), with ablation experiments performed on ISIC2017. VML-UNet requires only 0.53 M parameters, 2.18 MB of memory, and 1.24 GFLOPs, and it outperforms the comparison networks on all three datasets, validating its effectiveness. This study provides a valuable reference for developing lightweight, high-performance skin lesion segmentation networks.
(This article belongs to the Section Bioelectronics)
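The CPMamba module itself is not reproduced here; as a loose, minimal sketch of the "dynamic channel-space weight allocation" idea the abstract describes (the class name, reduction ratio, and kernel size are illustrative assumptions, and the VSS/Mamba component is omitted entirely):

```python
import torch
import torch.nn as nn

class ChannelSpatialGate(nn.Module):
    """Illustrative channel-then-spatial weighting; NOT the paper's CPMamba."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # B x C x 1 x 1
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)      # channel prior: reweight channels first
        return x * self.spatial_gate(x)  # then modulate spatial locations

x = torch.randn(2, 32, 64, 64)
print(ChannelSpatialGate(32)(x).shape)  # torch.Size([2, 32, 64, 64])
```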

23 pages, 5885 KiB  
Article
Binary and Multi-Class Classification of Colorectal Polyps Using CRP-ViT: A Comparative Study Between CNNs and QNNs
by Jothiraj Selvaraj, Fadhiyah Almutairi, Shabnam M. Aslam and Snekhalatha Umapathy
Life 2025, 15(7), 1124; https://doi.org/10.3390/life15071124 - 17 Jul 2025
Abstract
Background: Colorectal cancer (CRC) is a major contributor to cancer mortality worldwide, with polyps being critical precursors. The accurate classification of colorectal polyps (CRPs) from colonoscopy images is essential for the timely diagnosis and treatment of CRC. Method: This research proposes a novel hybrid model, CRP-ViT, integrating ResNet50 with Vision Transformers (ViTs) to enhance feature extraction and improve classification performance. The study comprehensively compares CRP-ViT against traditional convolutional neural networks (CNNs) and emerging quantum neural networks (QNNs). Experiments covered binary classification (predicting the presence of polyps) and multi-class classification (predicting the polyp type: hyperplastic, adenomatous, or serrated). Results: CRPQNN-ViT achieved superior classification performance while maintaining computational efficiency, reaching 98.18% training and 97.73% validation accuracy on binary classification, and 98.13% training and 97.92% validation accuracy on multi-class classification. Beyond these key metrics, computational requirements were also compared, and CRPQNN-ViT excelled in computation time. Conclusions: This comparative analysis reveals the potential of integrating quantum computing into medical image analysis and underscores the effectiveness of transformer-based architectures for CRP classification.
(This article belongs to the Special Issue Current Progress in Medical Image Segmentation)
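As a rough sketch of the CNN-to-Transformer handoff the abstract describes, assuming ResNet50 features are re-encoded as transformer tokens (depths, widths, and head counts below are placeholders, not the paper's configuration):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class HybridCNNViT(nn.Module):
    """Toy CNN->Transformer hybrid: ResNet50 features re-encoded as tokens."""
    def __init__(self, num_classes: int = 2, dim: int = 256):
        super().__init__()
        trunk = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(trunk.children())[:-2])  # B x 2048 x 7 x 7
        self.proj = nn.Conv2d(2048, dim, 1)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        f = self.proj(self.backbone(x))        # B x dim x 7 x 7
        tokens = f.flatten(2).transpose(1, 2)  # B x 49 x dim
        return self.head(self.encoder(tokens).mean(dim=1))

print(HybridCNNViT()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2])
```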

19 pages, 5415 KiB  
Article
Intelligent Optimized Diagnosis for Hydropower Units Based on CEEMDAN Combined with RCMFDE and ISMA-CNN-GRU-Attention
by Wenting Zhang, Huajun Meng, Ruoxi Wang and Ping Wang
Water 2025, 17(14), 2125; https://doi.org/10.3390/w17142125 - 17 Jul 2025
Abstract
This study proposes a hybrid approach that combines improved feature selection with intelligent diagnosis to increase the operational safety and diagnostic capabilities of hydropower units. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is first applied to the vibration data. A novel comprehensive index combining the Pearson correlation coefficient, mutual information (MI), and Kullback–Leibler divergence (KLD) is constructed to select intrinsic mode functions (IMFs). Feature extraction is then performed on the selected IMFs using Refined Composite Multiscale Fluctuation Dispersion Entropy (RCMFDE). Time- and frequency-domain features are screened by calculating dispersion and combined with the IMF features to build a hybrid feature vector, which is fed into a CNN-GRU-Attention model for intelligent diagnosis. The improved slime mold algorithm (ISMA) is employed for the first time to optimize the hyperparameters of the CNN-GRU-Attention model. Experimental results show that classification accuracy reaches 96.79% on raw signals and 93.33% on noisy signals, significantly outperforming traditional methods. The approach incorporates entropy-based feature extraction, couples hyperparameter optimization with the classification model, and addresses the limitations of single feature selection methods for non-stationary, nonlinear signals, providing an effective solution for intelligent optimized diagnosis of hydropower units.
(This article belongs to the Special Issue Optimization-Simulation Modeling of Sustainable Water Resource)
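A minimal sketch of how such a comprehensive IMF-selection index could be computed, assuming equal weighting of the three terms and a histogram-based KLD estimate (both are assumptions; the paper's exact formula is not given in the abstract):

```python
import numpy as np
from scipy.stats import entropy
from sklearn.feature_selection import mutual_info_regression

def imf_score(imf: np.ndarray, signal: np.ndarray, bins: int = 64) -> float:
    """Toy composite index: Pearson + MI + inverted KL divergence (equal weights assumed)."""
    pearson = abs(np.corrcoef(imf, signal)[0, 1])
    mi = mutual_info_regression(imf.reshape(-1, 1), signal, random_state=0)[0]
    # Histogram-based KL divergence between the IMF and signal distributions
    lo, hi = min(imf.min(), signal.min()), max(imf.max(), signal.max())
    p, _ = np.histogram(imf, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(signal, bins=bins, range=(lo, hi), density=True)
    kld = entropy(p + 1e-12, q + 1e-12)
    return pearson + mi + 1.0 / (1.0 + kld)  # higher = more informative IMF

t = np.linspace(0, 1, 2048)
signal = np.sin(2 * np.pi * 50 * t) + 0.3 * np.random.randn(t.size)
imf = np.sin(2 * np.pi * 50 * t)  # stand-in for a CEEMDAN mode
print(round(imf_score(imf, signal), 3))
```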

20 pages, 2926 KiB  
Article
SonarNet: Global Feature-Based Hybrid Attention Network for Side-Scan Sonar Image Segmentation
by Juan Lei, Huigang Wang, Liming Fan, Qingyue Gu, Shaowei Rong and Huaxia Zhang
Remote Sens. 2025, 17(14), 2450; https://doi.org/10.3390/rs17142450 - 15 Jul 2025
Abstract
With the rapid advancement of deep learning techniques, side-scan sonar image segmentation has become a crucial task in underwater scene understanding. However, the complex and variable underwater environment poses significant challenges for salient object detection, with traditional deep learning approaches often suffering from inadequate feature representation and the loss of global context during downsampling, thus compromising the segmentation accuracy of fine structures. To address these issues, we propose SonarNet, a Global Feature-Based Hybrid Attention Network specifically designed for side-scan sonar image segmentation. SonarNet features a dual-encoder architecture that leverages residual blocks and a self-attention mechanism to simultaneously capture both global structural and local contextual information. In addition, an adaptive hybrid attention module is introduced to effectively integrate channel and spatial features, while a global enhancement block fuses multi-scale global and spatial representations from the dual encoders, mitigating information loss throughout the network. Comprehensive experiments on a dedicated underwater sonar dataset demonstrate that SonarNet outperforms ten state-of-the-art saliency detection methods, achieving a mean absolute error as low as 2.35%. These results highlight the superior performance of SonarNet in challenging sonar image segmentation tasks.
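The adaptive hybrid attention module is not specified beyond the abstract; here is a toy sketch of one way to blend channel and spatial attention with learned weights (the softmax-normalized blending and kernel sizes are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class AdaptiveHybridAttention(nn.Module):
    """Toy adaptive mix of channel and spatial attention with learned branch weights."""
    def __init__(self, c: int):
        super().__init__()
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.sa = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())
        self.alpha = nn.Parameter(torch.zeros(2))  # adaptive branch weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)       # normalized mixing coefficients
        return w[0] * x * self.ca(x) + w[1] * x * self.sa(x)

print(AdaptiveHybridAttention(16)(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```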

19 pages, 3619 KiB  
Article
An Adaptive Underwater Image Enhancement Framework Combining Structural Detail Enhancement and Unsupervised Deep Fusion
by Semih Kahveci and Erdinç Avaroğlu
Appl. Sci. 2025, 15(14), 7883; https://doi.org/10.3390/app15147883 - 15 Jul 2025
Abstract
The underwater environment severely degrades image quality by absorbing and scattering light. This causes significant challenges, including non-uniform illumination, low contrast, color distortion, and blurring. These degradations compromise the performance of critical underwater applications, including water quality monitoring, object detection, and identification. To address these issues, this study proposes a detail-oriented hybrid framework for underwater image enhancement that synergizes the strengths of traditional image processing with the powerful feature extraction capabilities of unsupervised deep learning. Our framework introduces a novel multi-scale detail enhancement unit to accentuate structural information, followed by a Latent Low-Rank Representation (LatLRR)-based simplification step. This unique combination effectively suppresses common artifacts like oversharpening, spurious edges, and noise by decomposing the image into meaningful subspaces. The principal structural features are then optimally combined with a gamma-corrected luminance channel using an unsupervised MU-Fusion network, achieving a balanced optimization of both global contrast and local details. The experimental results on the challenging Test-C60 and OceanDark datasets demonstrate that our method consistently outperforms state-of-the-art fusion-based approaches, achieving average improvements of 7.5% in UIQM, 6% in IL-NIQE, and 3% in AG. Wilcoxon signed-rank tests confirm that these performance gains are statistically significant (p < 0.01). Consequently, the proposed method significantly mitigates prevalent issues such as color aberration, detail loss, and artificial haze, which are frequently encountered in existing techniques.
(This article belongs to the Section Computing and Artificial Intelligence)
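A minimal sketch of the gamma-corrected luminance step mentioned above, assuming BT.601 luminance weights and an illustrative gamma of 0.7 (both assumptions; the paper's values are not stated in the abstract):

```python
import numpy as np

def gamma_corrected_luminance(rgb: np.ndarray, gamma: float = 0.7) -> np.ndarray:
    """Extract a luminance channel (ITU-R BT.601 weights) and apply gamma correction.

    `rgb` is a float image in [0, 1]; gamma < 1 brightens dark underwater regions.
    """
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return np.clip(y, 0.0, 1.0) ** gamma

img = np.random.rand(4, 4, 3)                 # stand-in for an underwater frame
print(gamma_corrected_luminance(img).shape)   # (4, 4)
```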

35 pages, 6888 KiB  
Article
AirTrace-SA: Air Pollution Tracing for Source Attribution
by Wenchuan Zhao, Qi Zhang, Ting Shu and Xia Du
Information 2025, 16(7), 603; https://doi.org/10.3390/info16070603 - 13 Jul 2025
Abstract
Air pollution source tracing is vital for effective pollution prevention and control, yet traditional methods often require large amounts of manually collected data, generalize poorly across regions, and struggle to capture complex pollutant interactions. This study introduces AirTrace-SA (Air Pollution Tracing for Source Attribution), a novel hybrid deep learning model for the accurate identification and quantification of air pollution sources. AirTrace-SA comprises three main components: a hierarchical feature extractor (HFE) that extracts multi-scale features from chemical components, a source association bridge (SAB) that links chemical features to pollution sources through a multi-step decision mechanism, and a source contribution quantifier (SCQ) based on the TabNet regressor for precise prediction of source contributions. Evaluated on real air quality datasets from five cities (Lanzhou, Luoyang, Haikou, Urumqi, and Hangzhou), AirTrace-SA achieves an average R² of 0.88 (0.84 to 0.94 across 10-fold cross-validation), an average mean absolute error (MAE) of 0.60 (0.46 to 0.78 across the five cities), and an average root mean square error (RMSE) of 1.06 (0.51 to 1.62 across ten pollution sources). The model outperforms baselines such as a 1D CNN and LightGBM in stability, accuracy, and cross-city generalization, and feature importance analysis identifies the main contributions of source categories, further improving interpretability. By reducing reliance on labor-intensive data collection and providing scalable, high-precision source tracing, AirTrace-SA offers a powerful tool for environmental management that supports targeted emission reduction strategies and sustainable development.
(This article belongs to the Special Issue Machine Learning and Data Mining: Innovations in Big Data Analytics)

28 pages, 19790 KiB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments.
(This article belongs to the Section Sensing and Imaging)
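A toy sketch of the depth-wise separable convolution plus channel shift mixing that SCFUB is said to use, with the shift amount and upsampling mode as assumptions:

```python
import torch
import torch.nn as nn

class SeparableShiftUp(nn.Module):
    """Toy upsampling block: depth-wise separable conv followed by channel shift mixing."""
    def __init__(self, c: int, shift: int = 4):
        super().__init__()
        self.dw = nn.Conv2d(c, c, 3, padding=1, groups=c)  # depth-wise
        self.pw = nn.Conv2d(c, c, 1)                       # point-wise
        self.shift = shift

    def forward(self, x):
        x = self.pw(self.dw(x))
        x = torch.roll(x, shifts=self.shift, dims=1)       # mix channels by cyclic shift
        return nn.functional.interpolate(x, scale_factor=2, mode="nearest")

print(SeparableShiftUp(16)(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 16, 16, 16])
```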

22 pages, 3279 KiB  
Article
HA-CP-Net: A Cross-Domain Few-Shot SAR Oil Spill Detection Network Based on Hybrid Attention and Category Perception
by Dongmei Song, Shuzhen Wang, Bin Wang, Weimin Chen and Lei Chen
J. Mar. Sci. Eng. 2025, 13(7), 1340; https://doi.org/10.3390/jmse13071340 - 13 Jul 2025
Abstract
Deep learning models have clear advantages in detecting oil spills, but their training heavily depends on a large number of high-quality samples. Owing to the accidental, unpredictable, and urgent nature of oil spill incidents, however, it is difficult to obtain many labeled samples in real oil spill monitoring scenarios. Fortunately, few-shot learning can achieve excellent classification performance with only a small number of labeled samples. In this context, a new cross-domain few-shot SAR oil spill detection network is proposed in this paper. The network is embedded with a hybrid attention feature extraction block, which consists of a coordinate attention module that perceives channel information and spatial location information, a global self-attention transformer module that captures global dependencies, and a multi-scale self-attention module that depicts local detailed features, thereby achieving deep mining and accurate characterization of image features. In addition, because suspected oil films in seawater are difficult to distinguish from real oil films under few-shot conditions given their small feature differences, this paper proposes a double-loss category determination block consisting of two parts: a well-designed category-perception loss function and a traditional cross-entropy loss function. The category-perception loss optimizes the spatial distribution of sample features by shortening the distance between similar samples while expanding the distance between different samples; combined with the cross-entropy loss, it maximizes the network's ability to discriminate between real and suspected oil films. The experimental results demonstrate that this study provides an effective solution for high-precision oil spill detection under few-shot conditions, facilitating the rapid identification of oil spill accidents.
(This article belongs to the Section Marine Environmental Science)
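A minimal sketch of a category-perception-style loss, assuming a contrastive pull/push formulation with a margin (the abstract states the pull/push objective; the squared-distance form and margin value are assumptions):

```python
import torch

def category_perception_loss(feats: torch.Tensor, labels: torch.Tensor,
                             margin: float = 1.0) -> torch.Tensor:
    """Toy pull-together / push-apart loss on a batch of embeddings.

    Pairs with equal labels are pulled together; unequal pairs are pushed
    beyond `margin` (the margin value is an assumption).
    """
    d = torch.cdist(feats, feats)                          # pairwise distances
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=feats.device)
    pull = (same - eye) * d.pow(2)                         # exclude self-pairs
    push = (1 - same) * (margin - d).clamp(min=0).pow(2)
    return (pull + push).mean()

feats = torch.randn(8, 16)
labels = torch.randint(0, 2, (8,))
print(category_perception_loss(feats, labels).item())
```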

25 pages, 4882 KiB  
Article
HSF-YOLO: A Multi-Scale and Gradient-Aware Network for Small Object Detection in Remote Sensing Images
by Fujun Wang and Xing Wang
Sensors 2025, 25(14), 4369; https://doi.org/10.3390/s25144369 - 12 Jul 2025
Abstract
Small object detection (SOD) in remote sensing images (RSIs) is a challenging task due to scale variation, severe occlusion, and complex backgrounds, often leading to high miss and false detection rates. To address these issues, this paper proposes a novel detection framework named HSF-YOLO, which is designed to jointly enhance feature encoding, attention interaction, and localization precision within the YOLOv8 backbone. Specifically, we introduce three tailored modules: Hybrid Atrous Enhanced Convolution (HAEC), a Spatial–Interactive–Shuffle attention module (C2f_SIS), and a Focal Gradient Refinement Loss (FGR-Loss). The HAEC module captures multi-scale semantic and fine-grained local information through parallel atrous and standard convolutions, thereby enhancing small object representation across scales. The C2f_SIS module fuses spatial and improved channel attention with a channel shuffle strategy to enhance feature interaction and suppress background noise. The FGR-Loss incorporates gradient-aware localization, focal weighting, and separation-aware constraints to improve regression accuracy and training robustness. Extensive experiments were conducted on three public remote sensing datasets. Compared with the baseline YOLOv8, HSF-YOLO improved mAP@0.5 and mAP@0.5:0.95 by 5.7% and 4.0% on the VisDrone2019 dataset, by 2.3% and 2.5% on the DIOR dataset, and by 2.3% and 2.1% on the NWPU VHR-10 dataset, respectively. These results confirm that HSF-YOLO is a unified and effective solution for small object detection in complex RSI scenarios, offering a good balance between accuracy and efficiency.
(This article belongs to the Special Issue Application of Satellite Remote Sensing in Geospatial Monitoring)
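A toy sketch of the parallel standard/atrous convolution idea behind HAEC, with the dilation rates and 1x1 fusion as assumptions (rate 1 is the standard branch):

```python
import torch
import torch.nn as nn

class HybridAtrousConv(nn.Module):
    """Toy parallel standard + atrous branches fused by a 1x1 conv (rates assumed)."""
    def __init__(self, c: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(c * len(rates), c, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(HybridAtrousConv(8)(torch.randn(1, 8, 32, 32)).shape)  # torch.Size([1, 8, 32, 32])
```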

19 pages, 2468 KiB  
Article
A Dual-Branch Spatial-Frequency Domain Fusion Method with Cross Attention for SAR Image Target Recognition
by Chao Li, Jiacheng Ni, Ying Luo, Dan Wang and Qun Zhang
Remote Sens. 2025, 17(14), 2378; https://doi.org/10.3390/rs17142378 - 10 Jul 2025
Abstract
Synthetic aperture radar (SAR) image target recognition has important application value in security reconnaissance and disaster monitoring. However, due to speckle noise and target orientation sensitivity in SAR images, traditional spatial domain recognition methods face challenges in accuracy and robustness. To address these challenges, we propose a dual-branch spatial-frequency domain fusion recognition method with cross-attention, achieving deep fusion of spatial and frequency domain features. In the spatial domain, we propose an enhanced multi-scale feature extraction module (EMFE), which adopts a multi-branch parallel structure to effectively enhance the network's multi-scale feature representation capability. Combined with frequency-domain-guided attention, the model focuses on key regional features in the spatial domain. In the frequency domain, we design a hybrid frequency domain transformation module (HFDT) that extracts real and imaginary features through the Fourier transform to capture the global structure of the image. Meanwhile, we introduce spatially guided frequency domain attention to enhance the discriminative capability of frequency domain features. Finally, we propose a cross-domain feature fusion (CDFF) module, which achieves bidirectional interaction and optimal fusion of spatial and frequency domain features through cross attention and adaptive feature fusion. Experimental results demonstrate that our method achieves significantly superior recognition accuracy compared to existing methods on the MSTAR dataset.
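A minimal sketch of the HFDT idea of extracting real and imaginary frequency features, assuming a 2D FFT followed by a convolution over the stacked real/imaginary parts (the convolution stage is an assumption):

```python
import torch
import torch.nn as nn

class FrequencyFeatures(nn.Module):
    """Toy frequency branch: 2D FFT, then convolve stacked real/imaginary parts."""
    def __init__(self, c: int):
        super().__init__()
        self.conv = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, x):
        spec = torch.fft.fft2(x, norm="ortho")           # complex spectrum
        freq = torch.cat([spec.real, spec.imag], dim=1)  # B x 2C x H x W
        return self.conv(freq)

print(FrequencyFeatures(4)(torch.randn(1, 4, 32, 32)).shape)  # torch.Size([1, 4, 32, 32])
```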

24 pages, 3937 KiB  
Article
HyperTransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Hyperspectral Image Classification
by Xin Dai, Zexi Li, Lin Li, Shuihua Xue, Xiaohui Huang and Xiaofei Yang
Remote Sens. 2025, 17(14), 2361; https://doi.org/10.3390/rs17142361 - 9 Jul 2025
Abstract
Recent advances in hyperspectral image (HSI) classification have demonstrated the effectiveness of hybrid architectures that integrate convolutional neural networks (CNNs) and Transformers, leveraging CNNs for local feature extraction and Transformers for global dependency modeling. However, existing fusion approaches face three critical challenges: (1) insufficient synergy between spectral and spatial feature learning due to rigid coupling mechanisms; (2) high computational complexity resulting from redundant attention calculations; and (3) limited adaptability to spectral redundancy and noise in small-sample scenarios. To address these limitations, we propose HyperTransXNet, a novel CNN-Transformer hybrid architecture that incorporates adaptive spectral-spatial fusion. Specifically, the proposed HyperTransXNet comprises three key modules: (1) a Hybrid Spatial-Spectral Module (HSSM) that captures refined local spectral-spatial features and models global spectral correlations by combining depth-wise dynamic convolution with frequency-domain attention; (2) a Mixture-of-Experts Routing (MoE-R) module that adaptively fuses multi-scale features by dynamically selecting optimal experts via Top-K sparse weights; and (3) a Spatial-Spectral Tokens Enhancer (SSTE) module that ensures causality-preserving interactions between spectral bands and spatial contexts. Extensive experiments on the Indian Pines, Houston 2013, and WHU-Hi-LongKou datasets demonstrate the superiority of HyperTransXNet.
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
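A toy sketch of Top-K sparse Mixture-of-Experts routing as described for MoE-R, assuming linear experts and a dense loop over experts for clarity (the paper's expert design is not given in the abstract):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy MoE: route each token to its top-k experts with sparse softmax weights."""
    def __init__(self, dim: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                          # x: B x N x dim
        scores = self.gate(x)                      # B x N x E
        topv, topi = scores.topk(self.k, dim=-1)
        w = torch.softmax(topv, dim=-1)            # weights over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = (topi[..., slot] == e).unsqueeze(-1)  # tokens routed to expert e
                out = out + mask * w[..., slot:slot + 1] * expert(x)
        return out

print(TopKMoE(16)(torch.randn(2, 10, 16)).shape)  # torch.Size([2, 10, 16])
```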

18 pages, 70320 KiB  
Article
RIS-UNet: A Multi-Level Hierarchical Framework for Liver Tumor Segmentation in CT Images
by Yuchai Wan, Lili Zhang and Murong Wang
Entropy 2025, 27(7), 735; https://doi.org/10.3390/e27070735 - 9 Jul 2025
Abstract
The deep learning-based analysis of liver CT images is expected to assist clinicians in the diagnostic decision-making process, but the accuracy of existing methods still falls short of clinical requirements. In this work, we therefore propose a novel multi-level hierarchical framework for liver tumor segmentation. The first level integrates inter-slice spatial information through a 2.5D network, resolving the accuracy-efficiency trade-off inherent in conventional 2D/3D segmentation strategies. The second level extracts intra-slice global and local features to enhance feature representation: we propose the Res-Inception-SE Block, which combines residual connections, multi-scale Inception modules, and squeeze-and-excitation attention to capture comprehensive global and local features. Furthermore, we design a hybrid loss function combining Binary Cross Entropy (BCE) and Dice loss to address class imbalance and accelerate convergence. Extensive experiments on the LiTS17 dataset demonstrate the effectiveness of our method in terms of accuracy, efficiency, and visual quality for liver tumor segmentation.
(This article belongs to the Special Issue Cutting-Edge AI in Computational Bioinformatics)
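A minimal sketch of the BCE plus Dice hybrid loss described above, assuming an equal 0.5/0.5 weighting (the paper's weighting is not stated in the abstract):

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                  bce_weight: float = 0.5, eps: float = 1.0) -> torch.Tensor:
    """Toy hybrid loss: BCE handles per-pixel accuracy, Dice counters class imbalance.

    The 0.5/0.5 weighting is an assumption, not the paper's setting.
    """
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce_weight * bce + (1 - bce_weight) * dice

logits = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.7).float()
print(bce_dice_loss(logits, mask).item())
```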

15 pages, 1770 KiB  
Article
PSHNet: Hybrid Supervision and Feature Enhancement for Accurate Infrared Small-Target Detection
by Weicong Chen, Chenghong Zhang and Yuan Liu
Appl. Sci. 2025, 15(14), 7629; https://doi.org/10.3390/app15147629 - 8 Jul 2025
Abstract
Detecting small targets in infrared imagery remains highly challenging due to sub-pixel target sizes, low signal-to-noise ratios, and complex background clutter. This paper proposes PSHNet, a hybrid deep-learning framework that combines dense spatial heatmap supervision with geometry-aware regression for accurate infrared small-target detection. The network generates position-scale heatmaps to guide coarse localization, which are further refined through sub-pixel offset and size regression. A Complete IoU (CIoU) loss is introduced as a geometric regularization term to improve alignment between predicted and ground-truth bounding boxes. To better preserve the fine spatial details essential for identifying small thermal signatures, an Enhanced Low-level Feature Module (ELFM) is incorporated, using multi-scale dilated convolutions and channel attention. Experiments on the NUDT-SIRST and IRSTD-1k datasets demonstrate that PSHNet outperforms existing methods in IoU, detection probability, and false alarm rate, achieving consistent IoU gains and robust performance under low-SNR conditions.
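CIoU is a standard loss available in torchvision; a minimal sketch of using it as a geometric regularizer alongside a heatmap term, with the box values and weighting factor as assumptions:

```python
import torch
from torchvision.ops import complete_box_iou_loss

# Predicted and ground-truth boxes in (x1, y1, x2, y2) format; values are made up.
pred = torch.tensor([[10.0, 10.0, 50.0, 60.0], [5.0, 5.0, 20.0, 25.0]])
gt = torch.tensor([[12.0, 8.0, 48.0, 62.0], [4.0, 6.0, 22.0, 24.0]])

ciou = complete_box_iou_loss(pred, gt, reduction="mean")
heatmap_loss = torch.tensor(0.3)   # stand-in for the dense heatmap supervision term
lam = 1.0                          # assumed weighting of the CIoU regularizer
total = heatmap_loss + lam * ciou
print(float(ciou), float(total))
```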

25 pages, 4082 KiB  
Article
Multi-Scale Attention Fusion Gesture-Recognition Algorithm Based on Strain Sensors
by Zhiqiang Zhang, Jun Cai, Xueyu Dai and Hui Xiao
Sensors 2025, 25(13), 4200; https://doi.org/10.3390/s25134200 - 5 Jul 2025
Abstract
Surface electromyography (sEMG) signals are commonly employed for dynamic-gesture recognition. However, their robustness is often compromised by individual variability and sensor placement inconsistencies, limiting their reliability in complex and unconstrained scenarios. In contrast, strain-gauge signals offer enhanced environmental adaptability by stably capturing joint deformation processes. To address the challenges posed by the multi-channel, temporal, and amplitude-varying nature of strain signals, this paper proposes a lightweight hybrid attention network, termed MACLiteNet. The network integrates a local temporal modeling branch, a multi-scale fusion module, and a channel reconstruction mechanism to jointly capture local dynamic transitions and inter-channel structural correlations. Experimental evaluations conducted on both a self-collected strain-gauge dataset and the public sEMG benchmark NinaPro DB1 demonstrate that MACLiteNet achieves recognition accuracies of 99.71% and 98.45%, respectively, with only 0.22 M parameters and a computational cost as low as 0.10 GFLOPs. Extensive experimental results demonstrate that the proposed method achieves superior performance in terms of accuracy, efficiency, and cross-modal generalization, offering a promising solution for building efficient and reliable strain-driven interactive systems.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
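A toy sketch of a multi-scale temporal branch with channel mixing for multi-channel strain signals, with the kernel sizes and depth-wise design as assumptions (not MACLiteNet's actual modules):

```python
import torch
import torch.nn as nn

class MultiScaleTemporalBlock(nn.Module):
    """Toy multi-scale branch for multi-channel 1D signals (kernel sizes assumed)."""
    def __init__(self, c: int, kernels=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(c, c, k, padding=k // 2, groups=c) for k in kernels  # depth-wise, cheap
        )
        self.mix = nn.Conv1d(c * len(kernels), c, 1)  # channel reconstruction / mixing

    def forward(self, x):                             # x: B x C x T
        return self.mix(torch.cat([b(x) for b in self.branches], dim=1))

print(MultiScaleTemporalBlock(8)(torch.randn(2, 8, 128)).shape)  # torch.Size([2, 8, 128])
```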

16 pages, 2358 KiB  
Article
A Hybrid Content-Aware Network for Single Image Deraining
by Guoqiang Chai, Rui Yang, Jin Ge and Yulei Chen
Computers 2025, 14(7), 262; https://doi.org/10.3390/computers14070262 - 4 Jul 2025
Abstract
Rain streaks degrade the quality of optical images and seriously affect the effectiveness of subsequent vision-based algorithms. Although convolutional neural networks (CNNs) and the self-attention mechanism (SA) have been applied to single image deraining with great success, deraining performance and computational load remain open issues. This paper coordinates and exploits the complementary advantages of CNNs and SA, proposing a hybrid content-aware deraining network (CAD) that reduces complexity and generates high-quality results. Specifically, we construct the CADBlock, comprising a content-aware convolution and attention mixer module (CAMM) and a multi-scale double-gated feed-forward module (MDFM). In CAMM, the attention mechanism is applied to intricate windows to generate abundant features, while simple convolution is applied to plain windows to reduce computational cost. In MDFM, multi-scale spatial features are double-gate fused to preserve local detail and enhance image restoration capability. Furthermore, a four-token contextual attention module (FTCA) is introduced to explore content information among neighboring keys and improve representation ability. Both qualitative and quantitative validation on synthetic and real-world rain images demonstrates that the proposed CAD achieves competitive deraining performance.
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)
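A toy sketch of a double-gated feed-forward block in the spirit of MDFM, assuming two sigmoid gates over small- and larger-scale depth-wise branches (all specifics are assumptions):

```python
import torch
import torch.nn as nn

class DoubleGatedFFN(nn.Module):
    """Toy double-gated feed-forward: two parallel gates modulate multi-scale features."""
    def __init__(self, c: int):
        super().__init__()
        self.local = nn.Conv2d(c, c, 3, padding=1, groups=c)  # small-scale detail
        self.wide = nn.Conv2d(c, c, 5, padding=2, groups=c)   # larger-scale context
        self.gate1 = nn.Conv2d(c, c, 1)
        self.gate2 = nn.Conv2d(c, c, 1)
        self.out = nn.Conv2d(c, c, 1)

    def forward(self, x):
        a = self.local(x) * torch.sigmoid(self.gate1(x))  # first gate
        b = self.wide(x) * torch.sigmoid(self.gate2(x))   # second gate
        return self.out(a + b)

print(DoubleGatedFFN(8)(torch.randn(1, 8, 16, 16)).shape)  # torch.Size([1, 8, 16, 16])
```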
