Search Results (196)

Search Parameters:
Keywords = multiple-channel feature fusion

27 pages, 2162 KB  
Article
A Dual-Attention Temporal Convolutional Network-Based Track Initiation Method for Maneuvering Targets
by Hanbao Wu, Yiming Hao, Wei Chen and Mingli Liao
Electronics 2025, 14(21), 4215; https://doi.org/10.3390/electronics14214215 - 28 Oct 2025
Viewed by 117
Abstract
In strong clutter and maneuvering scenarios, radar track initiation faces the dual challenges of a low initiation rate and a high false alarm rate. Although existing deep learning methods show promise, the commonly adopted "feature flattening" input strategy destroys the intrinsic temporal structure and feature relationships of track data, limiting discriminative performance. To address this issue, this paper proposes a novel radar track initiation method based on a Dual-Attention Temporal Convolutional Network (DA-TCN), reformulating track initiation as a binary classification task over very short multi-channel time series that preserve the complete temporal structure. The DA-TCN model employs a TCN as its backbone network to extract local dynamic features and constructs a dual-attention architecture: a channel attention branch dynamically calibrates the importance of each kinematic feature, while a temporal attention branch integrates a Bi-GRU and self-attention mechanisms to capture dependencies at critical time steps. Finally, a learnable gated fusion mechanism adaptively weights the two branches for an optimal characterization of the track. Experimental results on maneuvering target datasets demonstrate that the proposed method significantly outperforms multiple baseline models across varying clutter densities: under the highest clutter density, DA-TCN achieves a 95.12% true track initiation rate (+1.6% over the best baseline) with a 9.65% false alarm rate (a 3.63% reduction), validating its effectiveness for high-precision, highly robust track initiation in complex environments. Full article
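
For readers who want the gist of the gated dual-branch design, here is a minimal PyTorch sketch — not the authors' code; layer sizes, module names, and the gating form are assumptions — showing a channel-attention branch, a Bi-GRU + self-attention temporal branch, and a learnable gate that fuses them:

```python
import torch
import torch.nn as nn

class GatedDualAttention(nn.Module):
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        # Channel branch: squeeze-and-excitation-style gating per kinematic feature.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 2), nn.ReLU(),
            nn.Linear(channels // 2, channels), nn.Sigmoid())
        # Temporal branch: Bi-GRU followed by self-attention over time steps.
        self.bigru = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=2, batch_first=True)
        self.proj = nn.Linear(2 * hidden, channels)
        # Learnable gate deciding how much each branch contributes.
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, time, channels)
        w = self.channel_fc(x.mean(dim=1))     # (batch, channels) channel weights
        chan_out = x * w.unsqueeze(1)          # recalibrated feature channels
        h, _ = self.bigru(x)                   # (batch, time, 2*hidden)
        h, _ = self.attn(h, h, h)              # dependencies at critical time steps
        temp_out = self.proj(h)                # back to (batch, time, channels)
        g = self.gate(torch.cat([chan_out, temp_out], dim=-1))
        return g * chan_out + (1 - g) * temp_out

tracks = torch.randn(8, 6, 4)                  # 8 candidate tracks, 6 steps, 4 features
print(GatedDualAttention(channels=4)(tracks).shape)  # torch.Size([8, 6, 4])
```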

23 pages, 5261 KB  
Article
FocusNet: A Lightweight Insulator Defect Detection Network via First-Order Taylor Importance Assessment and Knowledge Distillation
by Yurong Jing, Zhiyong Tao and Sen Lin
Algorithms 2025, 18(10), 649; https://doi.org/10.3390/a18100649 - 16 Oct 2025
Viewed by 280
Abstract
In the detection of small targets such as insulator defects and flashovers, the existing YOLOv11 suffers from insufficient feature extraction and difficulty in balancing a lightweight design with detection accuracy. We propose a lightweight architecture called FocusNet based on YOLOv11n. To improve the feature expression of small targets, an Aggregation Diffusion Neck is designed to deeply integrate and optimize features at different levels through multiple rounds of multi-scale feature fusion and scale adaptation, and a Focus module is introduced to emphasize and strengthen the key features of small targets. On this basis, to enable efficient deployment, a Group-Level First-Order Taylor Expansion Importance Assessment Method is proposed to streamline the model structure by eliminating channels that have little impact on detection accuracy. Channel Distribution Distillation then compensates for the slight accuracy loss caused by pruning, achieving both high accuracy and high efficiency. Furthermore, we analyze the interpretability of FocusNet via heatmaps generated by KPCA-CAM. Experiments show that FocusNet achieves 98.50% precision and 99.20% mAP@0.5 on a proprietary insulator defect detection database created for this project, using only 3.80 GFLOPs. This research provides reliable technical support for insulator monitoring in power systems. Full article
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))
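
The Taylor-based pruning criterion here builds on a standard idea (Molchanov et al.): after a backward pass, a channel's importance is approximated by |w · ∂L/∂w| accumulated over its weights. A minimal sketch of that base criterion follows — the paper's group-level variant and the distillation stage are not reproduced, and the loss is a stand-in:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 8, 3, padding=1))
x = torch.randn(4, 3, 64, 64)
loss = model(x).abs().mean()       # stand-in for the real detection loss
loss.backward()

conv = model[0]
# First-order Taylor score per output channel of the first conv layer
# (bias term omitted for brevity).
scores = (conv.weight * conv.weight.grad).abs().sum(dim=(1, 2, 3))
keep = scores.argsort(descending=True)[:8]   # keep the 8 most important of 16
print(keep.tolist())
```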

22 pages, 3964 KB  
Article
MultiScaleSleepNet: A Hybrid CNN–BiLSTM–Transformer Architecture with Multi-Scale Feature Representation for Single-Channel EEG Sleep Stage Classification
by Cenyu Liu, Qinglin Guan, Wei Zhang, Liyang Sun, Mengyi Wang, Xue Dong and Shuogui Xu
Sensors 2025, 25(20), 6328; https://doi.org/10.3390/s25206328 - 13 Oct 2025
Viewed by 836
Abstract
Accurate automatic sleep stage classification from single-channel EEG remains challenging due to the need for effective extraction of multiscale neurophysiological features and modeling of long-range temporal dependencies. This study aims to address these limitations by developing an efficient and compact deep learning architecture tailored for wearable and edge device applications. We propose MultiScaleSleepNet, a hybrid convolutional neural network–bidirectional long short-term memory–transformer architecture that extracts multiscale temporal and spectral features through parallel convolutional branches, followed by sequential modeling using a BiLSTM network and transformer-based attention mechanisms. The model achieved an accuracy, macro-averaged F1 score, and kappa coefficient of 88.6%, 0.833, and 0.84 on the Sleep-EDF dataset; 85.6%, 0.811, and 0.80 on the Sleep-EDF Expanded dataset; and 84.6%, 0.745, and 0.79 on the SHHS dataset. Ablation studies indicate that attention mechanisms and spectral fusion consistently improve performance, with the most notable gains observed for stages N1, N3, and rapid eye movement. MultiScaleSleepNet demonstrates competitive performance across multiple benchmark datasets while maintaining a compact size of 1.9 million parameters, suggesting robustness to variations in dataset size and class distribution. The study supports the feasibility of real-time, accurate sleep staging from single-channel EEG using parameter-efficient deep models suitable for portable systems. Full article
(This article belongs to the Special Issue AI on Biomedical Signal Sensing and Processing for Health Monitoring)
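
As a hedged illustration of the hybrid pipeline — parallel 1-D convolutions at different kernel sizes for multiscale features, then a BiLSTM, then transformer attention — a compact PyTorch sketch, with all sizes assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class SleepStager(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        # Parallel temporal branches: short kernels for fast rhythms,
        # long kernels for slow waves.
        self.branch_s = nn.Sequential(nn.Conv1d(1, 16, 7, stride=4, padding=3), nn.ReLU())
        self.branch_l = nn.Sequential(nn.Conv1d(1, 16, 51, stride=4, padding=25), nn.ReLU())
        self.lstm = nn.LSTM(32, 32, batch_first=True, bidirectional=True)
        layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                        # x: (batch, 1, samples)
        f = torch.cat([self.branch_s(x), self.branch_l(x)], dim=1)  # (batch, 32, T)
        h, _ = self.lstm(f.transpose(1, 2))      # (batch, T, 64)
        h = self.encoder(h)                      # attention over time
        return self.head(h.mean(dim=1))          # logits per sleep stage

epoch = torch.randn(2, 1, 3000)                  # one 30 s epoch at 100 Hz
print(SleepStager()(epoch).shape)                # torch.Size([2, 5])
```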

39 pages, 13725 KB  
Article
SRTSOD-YOLO: Stronger Real-Time Small Object Detection Algorithm Based on Improved YOLO11 for UAV Imageries
by Zechao Xu, Huaici Zhao, Pengfei Liu, Liyong Wang, Guilong Zhang and Yuan Chai
Remote Sens. 2025, 17(20), 3414; https://doi.org/10.3390/rs17203414 - 12 Oct 2025
Viewed by 1111
Abstract
To address the challenges of small target detection in UAV aerial images—such as difficulty in feature extraction, complex background interference, high miss rates, and stringent real-time requirements—this paper proposes an innovative model series named SRTSOD-YOLO, based on YOLO11. The backbone network incorporates a Multi-scale Feature Complementary Aggregation Module (MFCAM), designed to mitigate the loss of small target information as network depth increases. By integrating channel and spatial attention mechanisms with multi-scale convolutional feature extraction, MFCAM effectively locates small objects in the image. Furthermore, we introduce a novel neck architecture termed Gated Activation Convolutional Fusion Pyramid Network (GAC-FPN). This module enhances multi-scale feature fusion by emphasizing salient features while suppressing irrelevant background information. GAC-FPN employs three key strategies: adding a detection head with a small receptive field while removing the original largest one, leveraging large-scale features more effectively, and incorporating gated activation convolutional modules. To tackle the issue of positive-negative sample imbalance, we replace the conventional binary cross-entropy loss with an adaptive threshold focal loss in the detection head, accelerating network convergence. Additionally, to accommodate diverse application scenarios, we develop multiple versions of SRTSOD-YOLO by adjusting the width and depth of the network modules: a nano version (SRTSOD-YOLO-n), small (SRTSOD-YOLO-s), medium (SRTSOD-YOLO-m), and large (SRTSOD-YOLO-l). Experimental results on the VisDrone2019 and UAVDT datasets demonstrate that SRTSOD-YOLO-n improves the mAP@0.5 by 3.1% and 1.2% compared to YOLO11n, while SRTSOD-YOLO-l achieves gains of 7.9% and 3.3% over YOLO11l, respectively. Compared to other state-of-the-art methods, SRTSOD-YOLO-l attains the highest detection accuracy while maintaining real-time performance, underscoring the superiority of the proposed approach. Full article
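
The abstract does not spell out the adaptive threshold focal loss, so the sketch below shows only the standard binary focal loss (Lin et al.) that this family of losses builds on: well-classified examples are down-weighted by (1 − p_t)^γ, easing the positive–negative imbalance the paper targets:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss over raw logits; targets are 0/1 floats."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)           # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 100)                     # e.g. objectness scores for 100 anchors
targets = (torch.rand(4, 100) < 0.05).float()    # sparse positives, as with small targets
print(focal_loss(logits, targets))
```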

30 pages, 13570 KB  
Article
DVIF-Net: A Small-Target Detection Network for UAV Aerial Images Based on Visible and Infrared Fusion
by Xiaofeng Zhao, Hui Zhang, Chenxiao Li, Kehao Wang and Zhili Zhang
Remote Sens. 2025, 17(20), 3411; https://doi.org/10.3390/rs17203411 - 11 Oct 2025
Viewed by 897
Abstract
During UAV aerial photography tasks, influenced by flight altitude and imaging mechanisms, targets in images often exhibit small size, complex backgrounds, and small inter-class differences. Under a single optical modality, the weak, less discriminative feature representation of targets in drone-captured images leaves them easily overwhelmed by complex background noise, leading to low detection accuracy and high missed- and false-detection rates in current object detection networks. Moreover, such methods struggle to meet all-weather, all-scenario application requirements. To address these issues, this paper proposes DVIF-Net, a visible–infrared fusion network for small-target detection in UAV aerial images, which leverages the complementary characteristics of visible and infrared images to enhance detection capability in complex environments. Firstly, a dual-branch feature extraction structure is designed based on the YOLO architecture to separately extract features from visible and infrared images. Secondly, a P4-level cross-modal fusion strategy is proposed to effectively integrate features from both modalities while reducing computational complexity. Meanwhile, we design a novel dual context-guided fusion module that captures complementary features through channel attention of the visible and infrared images during fusion and enhances interaction between modalities via element-wise multiplication. Finally, an edge information enhancement module based on a cross stage partial structure is developed to improve sensitivity to small-target edges. Experimental results on two cross-modal datasets, DroneVehicle and VEDAI, demonstrate that DVIF-Net achieves detection accuracies of 85.8% and 62%, respectively. Compared with YOLOv10n, it improves by 21.7% and 10.5% in the visible modality and by 7.4% and 30.5% in the infrared modality, while maintaining a parameter count of only 2.49 M. Furthermore, compared with 15 other algorithms, the proposed DVIF-Net attains state-of-the-art performance. These results indicate that the method significantly enhances the detection of small targets in UAV aerial images, offering a high-precision, lightweight solution for real-time applications in complex aerial scenarios. Full article
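
A hedged sketch of the fusion idea described above — per-modality channel attention plus element-wise multiplicative interaction between the visible and infrared streams. Layer names, sizes, and the exact merge are assumptions, not the paper's module:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.att_vis = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())
        self.att_ir = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * c, c, kernel_size=1)

    def forward(self, vis, ir):                   # both (batch, c, h, w)
        wv = self.att_vis(vis.mean(dim=(2, 3)))[:, :, None, None]  # channel weights
        wi = self.att_ir(ir.mean(dim=(2, 3)))[:, :, None, None]
        inter = (vis * wi) * (ir * wv)            # multiplicative cross-modal interaction
        return self.merge(torch.cat([vis + inter, ir + inter], dim=1))

vis, ir = torch.randn(2, 64, 40, 40), torch.randn(2, 64, 40, 40)
print(CrossModalFusion(64)(vis, ir).shape)        # torch.Size([2, 64, 40, 40])
```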

31 pages, 3160 KB  
Article
Multimodal Image Segmentation with Dynamic Adaptive Window and Cross-Scale Fusion for Heterogeneous Data Environments
by Qianping He, Meng Wu, Pengchang Zhang, Lu Wang and Quanbin Shi
Appl. Sci. 2025, 15(19), 10813; https://doi.org/10.3390/app151910813 - 8 Oct 2025
Viewed by 570
Abstract
Multi-modal image segmentation is a key task in fields such as urban planning, infrastructure monitoring, and environmental analysis. However, it remains challenging due to complex scenes, varying object scales, and the integration of heterogeneous data sources (such as RGB, depth maps, and infrared). To address these challenges, we propose a novel multi-modal segmentation framework, DyFuseNet, which features dynamic adaptive windows and cross-scale feature fusion capabilities. This framework consists of three key components: (1) the Dynamic Window Module (DWM), which uses dynamic partitioning and continuous position bias to adaptively adjust window sizes, thereby improving the representation of irregular and fine-grained objects; (2) Scale Context Attention (SCA), a hierarchical mechanism that associates local details with global semantics in a coarse-to-fine manner, enhancing segmentation accuracy in low-texture or occluded regions; and (3) the Hierarchical Adaptive Fusion Architecture (HAFA), which aligns and fuses features from multiple modalities through shallow synchronization and deep channel attention, effectively balancing complementarity and redundancy. Evaluated on benchmark datasets (ISPRS Vaihingen and Potsdam), DyFuseNet achieved state-of-the-art performance, with mean Intersection over Union (mIoU) scores of 80.40% and 80.85%, surpassing MFTransNet by 1.91% and 1.77%, respectively. The model also demonstrated strong robustness in challenging scenes (such as building edges and shadowed objects), achieving an average F1 score of 85% while maintaining high efficiency (26.19 GFLOPs, 30.09 FPS), making it suitable for real-time deployment. This work presents a practical, versatile, and computationally efficient solution for multi-modal image analysis, with potential applications beyond remote sensing, including smart monitoring, industrial inspection, and multi-source data fusion tasks. Full article
(This article belongs to the Special Issue Signal and Image Processing: From Theory to Applications: 2nd Edition)

29 pages, 4573 KB  
Article
LCW-YOLO: A Lightweight Multi-Scale Object Detection Method Based on YOLOv11 and Its Performance Evaluation in Complex Natural Scenes
by Gang Li and Juelong Fang
Sensors 2025, 25(19), 6209; https://doi.org/10.3390/s25196209 - 7 Oct 2025
Viewed by 736
Abstract
Accurate object detection is fundamental to computer vision, yet detecting small targets in complex backgrounds remains challenging due to feature loss and limited model efficiency. To address this, we propose LCW-YOLO, a lightweight detection framework that integrates three innovations: Wavelet Pooling, a CGBlock-enhanced C3K2 structure, and an improved LDHead detection head. The Wavelet Pooling strategy employs Haar-based multi-frequency reconstruction to preserve fine-grained details while mitigating noise sensitivity. CGBlock introduces dynamic channel interactions within C3K2, facilitating the fusion of shallow visual cues with deep semantic features without excessive computational overhead. LDHead incorporates classification and localization functions, thereby improving target recognition accuracy and spatial precision. Extensive experiments across multiple public datasets demonstrate that LCW-YOLO outperforms mainstream detectors in both accuracy and inference speed, with notable advantages in small-object, sparse, and cluttered scenarios. Here we show that the combination of multi-frequency feature preservation and efficient feature fusion enables stronger representations under complex conditions, advancing the design of resource-efficient detection models for safety-critical and real-time applications. Full article
(This article belongs to the Section Remote Sensors)
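
The Wavelet Pooling here plausibly follows the common 2 × 2 orthonormal Haar decomposition; a minimal sketch (the paper's exact variant may differ): each non-overlapping 2 × 2 block yields an approximation subband plus three detail subbands, halving resolution without discarding the high-frequency detail that max or average pooling would:

```python
import torch

def haar_pool(x):                 # x: (batch, c, h, w), h and w even
    a = x[..., 0::2, 0::2]        # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]        # top-right
    c = x[..., 1::2, 0::2]        # bottom-left
    d = x[..., 1::2, 1::2]        # bottom-right
    ll = (a + b + c + d) / 2      # low-pass approximation
    lh = (a - b + c - d) / 2      # horizontal detail
    hl = (a + b - c - d) / 2      # vertical detail
    hh = (a - b - c + d) / 2      # diagonal detail
    # Stack subbands along channels so later convs can reweight them.
    return torch.cat([ll, lh, hl, hh], dim=1)

x = torch.randn(1, 16, 64, 64)
print(haar_pool(x).shape)         # torch.Size([1, 64, 32, 32])
```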

24 pages, 73520 KB  
Article
2C-Net: A Novel Spatiotemporal Dual-Channel Network for Soil Organic Matter Prediction Using Multi-Temporal Remote Sensing and Environmental Covariates
by Jiale Geng, Chong Luo, Jun Lu, Depiao Kong, Xue Li and Huanjun Liu
Remote Sens. 2025, 17(19), 3358; https://doi.org/10.3390/rs17193358 - 3 Oct 2025
Viewed by 415
Abstract
Soil organic matter (SOM) is essential for ecosystem health and agricultural productivity. Accurate prediction of SOM content is critical for modern agricultural management and sustainable soil use. Existing digital soil mapping (DSM) models, when processing temporal data, primarily focus on modeling the changes in input data across successive time steps. However, they do not adequately model the relationships among different input variables, which hinders the capture of complex data patterns and limits the accuracy of predictions. To address this problem, this paper proposes a novel deep learning model, the 2-Channel Network (2C-Net), which leverages sequential multi-temporal remote sensing images to improve SOM prediction. The network separates input data into temporal and spatial data, processing them through independent temporal and spatial channels. Temporal data includes multi-temporal Sentinel-2 spectral reflectance, while spatial data consists of environmental covariates including climate and topography. The Multi-sequence Feature Fusion Module (MFFM) is proposed to globally model spectral data across multiple bands and time steps, and the Diverse Convolutional Architecture (DCA) extracts spatial features from environmental data. Experimental results show that 2C-Net outperforms the baseline model (CNN-LSTM) and mainstream machine learning models for DSM, with R² = 0.524, RMSE = 0.884 (%), MAE = 0.581 (%), and MSE = 0.781 (%)². Furthermore, this study demonstrates the significant importance of sequential spectral data for the inversion of SOM content and concludes the following: for the SOM inversion task, the bare soil period after tilling is a more important time window than other bare soil periods. The 2C-Net model effectively captures spatiotemporal features, offering high-accuracy SOM predictions and supporting future DSM and soil management. Full article
(This article belongs to the Special Issue Remote Sensing in Soil Organic Carbon Dynamics)
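
A rough sketch of the two-channel idea under assumed sizes (band count, date count, and covariate count are illustrative): a temporal channel attends over multi-date spectral vectors, a spatial channel convolves an environmental-covariate patch, and the concatenated embeddings regress SOM content:

```python
import torch
import torch.nn as nn

class TwoChannelSOM(nn.Module):
    def __init__(self, n_bands=10, n_dates=6, n_cov=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
        self.band_proj = nn.Linear(n_bands, 32)
        self.temporal = nn.TransformerEncoder(layer, num_layers=1)
        self.spatial = nn.Sequential(
            nn.Conv2d(n_cov, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32 + 16, 1)         # predicted SOM content (%)

    def forward(self, spectra, covariates):
        # spectra: (batch, n_dates, n_bands); covariates: (batch, n_cov, h, w)
        t = self.temporal(self.band_proj(spectra)).mean(dim=1)   # (batch, 32)
        s = self.spatial(covariates)                             # (batch, 16)
        return self.head(torch.cat([t, s], dim=1)).squeeze(-1)

som = TwoChannelSOM()(torch.randn(4, 6, 10), torch.randn(4, 8, 9, 9))
print(som.shape)                                   # torch.Size([4])
```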

16 pages, 10633 KB  
Article
HVI-Based Spatial–Frequency-Domain Multi-Scale Fusion for Low-Light Image Enhancement
by Yuhang Zhang, Huiying Zheng, Xinya Xu and Hancheng Zhu
Appl. Sci. 2025, 15(19), 10376; https://doi.org/10.3390/app151910376 - 24 Sep 2025
Viewed by 497
Abstract
Low-light image enhancement aims to restore images captured under extreme low-light conditions. Existing methods demonstrate that fusing Fourier transform magnitude and phase information within the RGB color space effectively improves enhancement results. Meanwhile, recent advances have demonstrated that certain color spaces based on human visual perception, such as Hue–Value–Intensity (HVI), are superior to RGB for enhancing low-light images. However, these methods neglect the key impact of the coupling relationship between spatial and frequency-domain features on image enhancement. This paper proposes a spatial–frequency-domain multi-scale fusion method for low-light image enhancement that explores the intrinsic relationships among the three channels of the HVI space and adopts a dual-path parallel processing architecture. In the spatial domain, a specifically designed multi-scale feature extraction module systematically captures comprehensive structural information. In the frequency domain, our model establishes deep coupling between spatial features and Fourier transform features in the I-channel. The fused features from both domains jointly drive an encoder–decoder network to achieve superior image enhancement performance. Extensive experiments on multiple public benchmark datasets show that the proposed method significantly outperforms state-of-the-art approaches in both quantitative metrics and visual quality. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
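
A minimal sketch of the frequency-domain path, assuming the usual torch.fft recipe: the intensity channel is transformed, amplitude and phase are filtered separately (amplitude carries illumination, phase carries structure), and the spectrum is rebuilt. The 1 × 1 convolutions are placeholders, not the paper's layers:

```python
import torch
import torch.nn as nn

class FourierBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.amp_conv = nn.Sequential(nn.Conv2d(1, 8, 1), nn.ReLU(),
                                      nn.Conv2d(8, 1, 1), nn.ReLU())
        self.pha_conv = nn.Sequential(nn.Conv2d(1, 8, 1), nn.ReLU(),
                                      nn.Conv2d(8, 1, 1))

    def forward(self, i_chan):                    # (batch, 1, h, w) intensity channel
        freq = torch.fft.rfft2(i_chan, norm="ortho")
        amp = self.amp_conv(torch.abs(freq))      # enhance illumination (amplitude)
        pha = self.pha_conv(torch.angle(freq))    # keep structure (phase)
        out = torch.polar(amp, pha)               # rebuild the complex spectrum
        return torch.fft.irfft2(out, s=i_chan.shape[-2:], norm="ortho")

x = torch.rand(2, 1, 64, 64)
print(FourierBranch()(x).shape)                   # torch.Size([2, 1, 64, 64])
```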

24 pages, 4503 KB  
Article
Single-Phase Ground Fault Detection Method in Three-Phase Four-Wire Distribution Systems Using Optuna-Optimized TabNet
by Xiaohua Wan, Hui Fan, Min Li and Xiaoyuan Wei
Electronics 2025, 14(18), 3659; https://doi.org/10.3390/electronics14183659 - 16 Sep 2025
Viewed by 558
Abstract
Single-phase ground (SPG) faults pose significant challenges in three-phase four-wire distribution systems due to their complex transient characteristics and the presence of multiple influencing factors. To solve the aforementioned issues, a comprehensive fault identification framework is proposed, which uses the TabNet deep learning architecture with hyperparameters optimized by Optuna. Firstly, a 10 kV simulation model is developed in Simulink to generate a diverse fault dataset. For each simulated fault, voltage and current signals from eight channels (L1–L4 voltage and current) are collected. Secondly, multi-domain features are extracted from each channel across time, frequency, waveform, and wavelet perspectives. Then, an attention-based fusion mechanism is employed to capture cross-channel dependencies, followed by L2-norm-based feature selection to enhance generalization. Finally, the optimized TabNet model effectively classifies 24 fault categories, achieving an accuracy of 97.33%, and outperforms baseline methods including Temporal Convolutional Network (TCN), Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM), Capsule Network with Sparse Filtering (CNSF), and Dual-Branch CNN in terms of accuracy, macro-F1 score, and kappa coefficient. It also exhibits strong stability and fast convergence during training. These results demonstrate the robustness and interpretability of the proposed method for SPG fault detection. Full article
(This article belongs to the Section Power Electronics)
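
A skeleton of the Optuna search loop: the search space is assumed, the data are synthetic stand-ins, and the classifier call follows the community pytorch-tabnet package rather than the authors' code:

```python
import optuna
import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier

X_train, y_train = np.random.rand(512, 40).astype("float32"), np.random.randint(0, 24, 512)
X_val, y_val = np.random.rand(128, 40).astype("float32"), np.random.randint(0, 24, 128)

def objective(trial: optuna.Trial) -> float:
    clf = TabNetClassifier(
        n_d=trial.suggest_int("n_d", 8, 64),          # decision embedding width
        n_steps=trial.suggest_int("n_steps", 3, 10),  # sequential attention steps
        gamma=trial.suggest_float("gamma", 1.0, 2.0), # feature-reuse relaxation
        verbose=0)
    clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], max_epochs=50, patience=10)
    return (clf.predict(X_val) == y_val).mean()       # validation accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```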

19 pages, 3899 KB  
Article
An Automatic Brain Cortex Segmentation Technique Based on Dynamic Recalibration and Region Awareness
by Jiaofen Nan, Gaodeng Fan, Kaifan Zhang, Shuyao Zhai, Xueqi Jin, Duan Li and Chunlai Yu
Electronics 2025, 14(18), 3631; https://doi.org/10.3390/electronics14183631 - 13 Sep 2025
Viewed by 367
Abstract
To address the limitations in the accuracy of current cerebral cortex structure segmentation methods, this study proposes an automatic segmentation network based on dynamic recalibration and region awareness. The network is an improved version of the classic U-shaped architecture, incorporating a Dynamic Recalibration Block (DRB) and a Region-Aware Block (RAB). The DRB enhances important feature channels by extracting global feature information across channels, computing significance weights via a two-layer fully connected network, and applying these weights to the original feature maps for dynamic feature reweighting. Meanwhile, the RAB integrates spatial positional information and captures both global and local context across multiple dimensions. It recalibrates features using dimension-specific weights, enabling region-aware feature association and complementing the DRB's function. Together, these components enable efficient and accurate segmentation of brain structures. The proposed DRA-Net model effectively overcomes the accuracy–efficiency trade-off in cortical segmentation through multi-scale feature fusion, dual attention mechanisms, and deep feature extraction strategies. Experimental results demonstrate that DRA-Net achieves an average Dice score of 91.35% across multiple datasets, outperforming segmentation methods such as U-Net, QuickNAT, and FastSurfer. Full article
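
The DRB as described is close to squeeze-and-excitation, so a faithful-looking sketch is short (reduction ratio and sizes assumed): global pooling gathers per-channel context, a two-layer fully connected network computes significance weights, and the weights rescale the feature maps:

```python
import torch
import torch.nn as nn

class DynamicRecalibration(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                          # x: (batch, c, h, w)
        w = self.fc(x.mean(dim=(2, 3)))            # global context -> channel weights
        return x * w[:, :, None, None]             # dynamic feature reweighting

fmap = torch.randn(2, 32, 48, 48)
print(DynamicRecalibration(32)(fmap).shape)        # torch.Size([2, 32, 48, 48])
```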

15 pages, 5090 KB  
Article
EFIMD-Net: Enhanced Feature Interaction and Multi-Domain Fusion Deep Forgery Detection Network
by Hao Cheng, Weiye Pang, Kun Li, Yongzhuang Wei, Yuhang Song and Ji Chen
J. Imaging 2025, 11(9), 312; https://doi.org/10.3390/jimaging11090312 - 12 Sep 2025
Viewed by 594
Abstract
Currently, deepfake detection has garnered widespread attention as a key defense mechanism against the misuse of deepfake technology. However, existing deepfake detection networks still face challenges such as insufficient robustness, limited generalization capabilities, and a single feature extraction domain (e.g., using only spatial domain features) when confronted with evolving algorithms or diverse datasets, which severely limits their application capabilities. To address these issues, this study proposes a deepfake detection network named EFIMD-Net, which enhances performance by strengthening feature interaction and integrating spatial and frequency domain features. The proposed network integrates a Cross-feature Interaction Enhancement module (CFIE) based on cosine similarity, which achieves adaptive interaction between spatial domain features (RGB stream) and frequency domain features (SRM, Spatial Rich Model stream) through a channel attention mechanism, effectively fusing macro-semantic information with high-frequency artifact information. Additionally, an Enhanced Multi-scale Feature Fusion (EMFF) module is proposed, which effectively integrates multi-scale feature information from various layers of the network through adaptive feature enhancement and reorganization techniques. Experimental results show that compared to the baseline network Xception, EFIMD-Net achieves comparable or even better Area Under the Curve (AUC) on multiple datasets. Ablation experiments also validate the effectiveness of the proposed modules. Furthermore, compared to the baseline traditional two-stream network Locate and Verify, EFIMD-Net significantly improves forgery detection performance, with a 9-percentage-point increase in Area Under the Curve on the CelebDF-v1 dataset and a 7-percentage-point increase on the CelebDF-v2 dataset. These results fully demonstrate the effectiveness and generalization of EFIMD-Net in forgery detection. Potential limitations regarding real-time processing efficiency are acknowledged. Full article
(This article belongs to the Section Biometrics, Forensics, and Security)
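
A hedged sketch of cosine-similarity-guided interaction between an RGB stream and a noise (SRM-style) stream — channels on which the two streams agree exchange information more strongly. The gating form and sizes are assumptions, not the paper's exact CFIE module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineInteraction(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())
        self.gate_srm = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())

    def forward(self, rgb, srm):                   # both (batch, c, h, w)
        b, c = rgb.shape[:2]
        # Per-channel cosine similarity between the two streams.
        sim = F.cosine_similarity(rgb.reshape(b, c, -1), srm.reshape(b, c, -1), dim=2)
        w_rgb = self.gate_rgb(sim)[:, :, None, None]   # how much SRM flows into RGB
        w_srm = self.gate_srm(sim)[:, :, None, None]
        return rgb + w_rgb * srm, srm + w_srm * rgb

rgb, srm = torch.randn(2, 32, 56, 56), torch.randn(2, 32, 56, 56)
out_rgb, out_srm = CosineInteraction(32)(rgb, srm)
print(out_rgb.shape, out_srm.shape)
```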

17 pages, 3935 KB  
Article
Markerless Force Estimation via SuperPoint-SIFT Fusion and Finite Element Analysis: A Sensorless Solution for Deformable Object Manipulation
by Qingqing Xu, Ruoyang Lai and Junqing Yin
Biomimetics 2025, 10(9), 600; https://doi.org/10.3390/biomimetics10090600 - 8 Sep 2025
Viewed by 532
Abstract
Contact-force perception is a critical component of safe robotic grasping. With the rapid advances in embodied intelligence technology, humanoid robots have enhanced their multimodal perception capabilities. Conventional force sensors face limitations, such as complex spatial arrangements, installation challenges at multiple nodes, and potential interference with robotic flexibility. Consequently, these conventional sensors are unsuitable for biomimetic robot requirements in object perception, natural interaction, and agile movement. Therefore, this study proposes a sensorless external force detection method that integrates SuperPoint-Scale Invariant Feature Transform (SIFT) feature extraction with finite element analysis to address force perception challenges. A visual analysis method based on the SuperPoint-SIFT feature fusion algorithm was implemented to reconstruct a three-dimensional displacement field of the target object. Subsequently, the displacement field was mapped to the contact force distribution using finite element modeling. Experimental results demonstrate a mean force estimation error of 7.60% (isotropic) and 8.15% (anisotropic), with RMSE < 8%, validated by flexible pressure sensors. To enhance the model’s reliability, a dual-channel video comparison framework was developed. By analyzing the consistency of the deformation patterns and mechanical responses between the actual compression and finite element simulation video keyframes, the proposed approach provides a novel solution for real-time force perception in robotic interactions. The proposed solution is suitable for applications such as precision assembly and medical robotics, where sensorless force feedback is crucial. Full article
(This article belongs to the Special Issue Bio-Inspired Intelligent Robot)
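
SuperPoint needs a trained network, so this sketch exercises only the SIFT half with OpenCV: match keypoints between an undeformed and a deformed frame and read off sparse displacement vectors, which would then drive the finite element model. File names are placeholders:

```python
import cv2
import numpy as np

ref = cv2.imread("undeformed.png", cv2.IMREAD_GRAYSCALE)
cur = cv2.imread("deformed.png", cv2.IMREAD_GRAYSCALE)
assert ref is not None and cur is not None, "replace placeholder image paths"

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(cur, None)

# Lowe ratio-test matching: keep matches clearly better than the runner-up.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]

# Sparse displacement vectors at the matched keypoints.
disp = np.array([np.array(kp2[m.trainIdx].pt) - np.array(kp1[m.queryIdx].pt)
                 for m in good])
print(f"{len(good)} matches, mean displacement {disp.mean(axis=0)} px")
```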

24 pages, 3398 KB  
Article
DEMNet: Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding
by Feng He, Qiran Zhang, Yichuan Li and Tianci Wang
Remote Sens. 2025, 17(17), 2963; https://doi.org/10.3390/rs17172963 - 26 Aug 2025
Viewed by 976
Abstract
Infrared dim and small target detection aims to accurately localize targets within complex backgrounds or clutter. However, under extremely low signal-to-noise ratio (SNR) conditions, single-frame detection methods often fail to effectively detect such targets. In contrast, multi-frame detection can exploit temporal cues to significantly improve the probability of detection (Pd) and reduce false alarms (Fa). Existing multi-frame approaches often employ 3D convolutions/RNNs to implicitly extract temporal features. However, they typically lack explicit modeling of target motion. To address this, we propose a Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding (DEMNet) that explicitly incorporates motion information into the detection process. The first multi-level encoder–decoder module leverages spatial and channel attention mechanisms to fuse hierarchical features across multiple scales, enabling robust spatial feature extraction from each frame of the temporally aligned input sequence. The second encoder–decoder module encodes both inter-frame target motion and intra-frame target positional information, followed by 3D convolution to achieve effective motion information fusion. Extensive experiments demonstrate that DEMNet achieves state-of-the-art performance, outperforming recent advanced methods such as DTUM and SSTNet. For the DAUB dataset, compared to the second-best model, DEMNet improves Pd by 2.42 percentage points and reduces Fa by 4.13 × 10⁻⁶ (a 68.72% reduction). For the NUDT dataset, it improves Pd by 1.68 percentage points and reduces Fa by 0.67 × 10⁻⁶ (a 7.26% reduction) compared to the next-best model. Notably, DEMNet demonstrates even greater advantages on test sequences with SNR ≤ 3. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)
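
A hedged sketch of the motion-encoding stage: per-frame feature maps from the aligned sequence are stacked along a time axis and mixed by 3-D convolution, so inter-frame displacement of a dim target becomes a learnable cue. Channel sizes are illustrative, not DEMNet's:

```python
import torch
import torch.nn as nn

class MotionFusion(nn.Module):
    def __init__(self, c: int = 16):
        super().__init__()
        self.frame_enc = nn.Sequential(nn.Conv2d(1, c, 3, padding=1), nn.ReLU())
        # 3-D conv mixes neighbouring frames as well as spatial neighbours.
        self.fuse = nn.Conv3d(c, c, kernel_size=(3, 3, 3), padding=1)
        self.head = nn.Conv2d(c, 1, 1)             # per-pixel target evidence

    def forward(self, frames):                     # (batch, t, 1, h, w)
        b, t = frames.shape[:2]
        f = self.frame_enc(frames.flatten(0, 1))   # encode each frame alone
        f = f.view(b, t, *f.shape[1:]).transpose(1, 2)  # (batch, c, t, h, w)
        f = self.fuse(f).mean(dim=2)               # fuse motion, collapse time
        return self.head(f)                        # (batch, 1, h, w) target map

seq = torch.randn(2, 5, 1, 128, 128)               # 5 aligned infrared frames
print(MotionFusion()(seq).shape)                   # torch.Size([2, 1, 128, 128])
```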

16 pages, 3575 KB  
Article
A Fusion Model for Intelligent Diagnosis of Gear Faults with Small Sample Sizes
by Jianing Huang, Zikang Liu, Jianggui Han, Chenghao Cao and Xiaofeng Li
Sensors 2025, 25(17), 5230; https://doi.org/10.3390/s25175230 - 22 Aug 2025
Viewed by 758
Abstract
Gear faults are a frequent cause of rotating machinery breakdowns. Two issues remain open in current intelligent gear fault diagnosis models: (1) shallow models demand less data but require handcrafted feature extraction from raw signals, relying on prior knowledge; (2) deep networks can adaptively extract fault features but require large datasets to train their parameters. In this paper, a novel fusion model, called CBAM-TCN-SVM, is proposed for intelligent gear fault diagnosis. It consists of a temporal convolutional network (TCN) module, a convolutional block attention module (CBAM), and a support vector machine (SVM) module. More specifically, frequency-domain sequence data are fed into the CBAM-TCN model, which extracts deep fault features via multiple convolutional layers and channel and spatial attention mechanisms. The SVM classifier is then employed for classification. The fusion model combines the advantages of deep networks and shallow classifiers, addressing the issues that arise when diagnostic accuracy is constrained by data scale and feature extraction relies on prior knowledge. In experiments, the proposed method achieves a classification accuracy of 98.3%, demonstrating that it is a feasible approach for gear fault prediction. Full article
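
A hedged sketch of the deep-features-plus-shallow-classifier pattern (a small attention-equipped 1-D conv net stands in for CBAM-TCN, and the network is left untrained here for brevity): frequency-domain sequences become fixed-length feature vectors, and an SVM does the final classification:

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class FeatureNet(nn.Module):
    def __init__(self, c: int = 16):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(1, c, 9, padding=4), nn.ReLU(),
                                  nn.Conv1d(c, c, 9, padding=4), nn.ReLU())
        self.att = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())  # channel attention

    def forward(self, x):                          # (batch, 1, bins)
        f = self.conv(x)
        f = f * self.att(f.mean(dim=2))[:, :, None]   # reweight feature channels
        return f.mean(dim=2)                       # (batch, c) feature vector

net = FeatureNet()
spectra = torch.randn(60, 1, 256)                  # 60 small-sample spectra
labels = torch.randint(0, 4, (60,))                # 4 gear health states
with torch.no_grad():
    feats = net(spectra).numpy()
svm = SVC(kernel="rbf").fit(feats, labels.numpy())
print(svm.score(feats, labels.numpy()))            # training accuracy of the hybrid
```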
