Search Results (421)

Search Parameters:
Keywords = Depthwise Separable Convolutions

14 pages, 866 KiB  
Article
GhostBlock-Augmented Lightweight Gaze Tracking via Depthwise Separable Convolution
by Jing-Ming Guo, Yu-Sung Cheng, Yi-Chong Zeng and Zong-Yan Yang
Electronics 2025, 14(15), 2978; https://doi.org/10.3390/electronics14152978 (registering DOI) - 25 Jul 2025
Abstract
This paper proposes a lightweight gaze-tracking architecture named GhostBlock-Augmented Look to Coordinate Space (L2CS), which integrates GhostNet-based modules and depthwise separable convolution to achieve a better trade-off between model accuracy and computational efficiency. Conventional lightweight gaze-tracking models often suffer from degraded accuracy due to aggressive parameter reduction. To address this issue, we introduce GhostBlocks, a custom-designed convolutional unit that combines intrinsic feature generation with ghost feature recomposition through depthwise operations. Our method enhances the original L2CS architecture by replacing each ResNet block with GhostBlocks, thereby significantly reducing the number of parameters and floating-point operations. The experimental results on the Gaze360 dataset demonstrate that the proposed model reduces FLOPs from 16.527 × 10^8 to 8.610 × 10^8 and parameter count from 2.387 × 10^5 to 1.224 × 10^5 while maintaining comparable gaze estimation accuracy, with MAE increasing only slightly from 10.70° to 10.87°. This work highlights the potential of GhostNet-augmented designs for real-time gaze tracking on edge devices, providing a practical solution for deployment in resource-constrained environments.
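As a rough illustration of why the depthwise separable convolutions named in the search keywords cut cost, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart, then computes the relative reductions implied by the figures quoted in the abstract above. It is not the authors' GhostBlock. The function names, the bias-free layers, and the example layer sizes (64 input channels, 128 output channels, 3 × 3 kernel) are assumptions chosen for illustration only.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored, an assumption)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrinks going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    std = standard_conv_params(64, 128, 3)
    dws = depthwise_separable_params(64, 128, 3)
    print(f"standard: {std}, depthwise separable: {dws}, ratio: {dws / std:.3f}")

    # Mantissas of the FLOPs and parameter counts quoted in the abstract;
    # the shared power of ten cancels in the ratio.
    print(f"FLOPs reduction: {relative_reduction(16.527, 8.610):.1%}")
    print(f"parameter reduction: {relative_reduction(2.387, 1.224):.1%}")
```

For a k × k kernel, the ratio of depthwise separable to standard weights is 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.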
25 pages, 19515 KiB  
Article
Towards Efficient SAR Ship Detection: Multi-Level Feature Fusion and Lightweight Network Design
by Wei Xu, Zengyuan Guo, Pingping Huang, Weixian Tan and Zhiqi Gao
Remote Sens. 2025, 17(15), 2588; https://doi.org/10.3390/rs17152588 - 24 Jul 2025
Abstract
Synthetic Aperture Radar (SAR) provides all-weather, all-time imaging capabilities, enabling reliable maritime ship detection under challenging weather and lighting conditions. However, most high-precision detection models rely on complex architectures and large-scale parameters, limiting their applicability to resource-constrained platforms such as satellite-based systems, where model size, computational load, and power consumption are tightly restricted. Thus, guided by the principles of lightweight design, robustness, and energy efficiency optimization, this study proposes a three-stage collaborative multi-level feature fusion framework to reduce model complexity without compromising detection performance. Firstly, the backbone network integrates depthwise separable convolutions and a Convolutional Block Attention Module (CBAM) to suppress background clutter and extract effective features. Building upon this, a cross-layer feature interaction mechanism is introduced via the Multi-Scale Coordinated Fusion (MSCF) and Bi-EMA Enhanced Fusion (Bi-EF) modules to strengthen joint spatial-channel perception. To further enhance the detection capability, Efficient Feature Learning (EFL) modules are embedded in the neck to improve feature representation. Experiments on the SAR Ship Detection Dataset (SSDD) show that this method, with only 1.6 M parameters, achieves a mean average precision (mAP) of 98.35% in complex scenarios, including inshore and offshore environments. It addresses the difficulty traditional methods have in balancing accuracy against hardware resource requirements, providing a new technical path for real-time SAR ship detection on satellite platforms.
22 pages, 16984 KiB  
Article
Small Ship Detection Based on Improved Neural Network Algorithm and SAR Images
by Jiaqi Li, Hongyuan Huo, Li Guo, De Zhang, Wei Feng, Yi Lian and Long He
Remote Sens. 2025, 17(15), 2586; https://doi.org/10.3390/rs17152586 - 24 Jul 2025
Abstract
Synthetic aperture radar (SAR) images can be used for ship target detection. However, ship outlines in SAR images are often unclear, and noise and land background make detection harder and less accurate, especially for small ships. Therefore, based on the YOLOv5s model, this paper improves the backbone network and the feature fusion network to raise ship detection accuracy. First, the LSKModule is used to improve the backbone network of YOLOv5s: it adaptively aggregates the features extracted by large convolution kernels to capture context information while enhancing key features and suppressing noise interference. Second, multiple depthwise separable convolution layers are added to the SPPF (Spatial Pyramid Pooling-Fast) structure; although this introduces a small number of additional parameters and computations, features with different receptive fields can be extracted. Third, the feature fusion network of YOLOv5s is improved based on BiFPN, and the shallow feature map is used to optimize small-target detection performance. Finally, the CoordConv module is added before the detection head of YOLOv5, and two coordinate channels are added during the convolution operation to further improve detection accuracy. The mAP50 of this method on the SSDD and HRSID datasets reached 97.6% and 91.7%, respectively. In comparisons with a variety of advanced target detection models, the detection accuracy of this method is higher than that of other similar target detection algorithms.
34 pages, 32238 KiB  
Article
ACLC-Detection: A Network for Remote Sensing Image Detection Based on Attention Mechanism and Lightweight Convolution
by Shaodong Liu, Faming Shao, Chenshan Yang, Juying Dai, Jinhong Xue, Qing Liu and Tao Zhang
Remote Sens. 2025, 17(15), 2572; https://doi.org/10.3390/rs17152572 - 24 Jul 2025
Abstract
Detecting small objects using remote sensing technology has consistently posed challenges. To address this issue, a novel detection framework named ACLC-Detection has been introduced. Building upon the YOLOv11 architecture, this detector integrates an attention mechanism with lightweight convolution to enhance performance. Specifically, the deep and shallow convolutional layers of the backbone network both incorporate depthwise separable convolution. Moreover, the designed lightweight convolutional excitation module (CEM) is used to obtain the contextual information of targets and reduce the loss of information for small targets. In addition, the C3k2 module in the neck fusion network, where C3k = True, is replaced by the Convolutional Attention Module with Ghost Module (CAF-GM). This not only reduces the model complexity but also acquires more effective information. The Simple Attention module (SimAM) used in it suppresses redundant information without adding any model parameters. Finally, the Inner-Complete Intersection over Union (Inner-CIOU) loss function is employed, which enables better localization and detection of small targets. Extensive experiments conducted on the DOTA and VisDrone2019 datasets have demonstrated the advantages of the proposed enhanced model in dealing with small objects in aerial imagery.
21 pages, 1936 KiB  
Article
FFT-RDNet: A Time–Frequency-Domain-Based Intrusion Detection Model for IoT Security
by Bingjie Xiang, Renguang Zheng, Kunsan Zhang, Chaopeng Li and Jiachun Zheng
Sensors 2025, 25(15), 4584; https://doi.org/10.3390/s25154584 - 24 Jul 2025
Abstract
Resource-constrained Internet of Things (IoT) devices demand efficient and robust intrusion detection systems (IDSs) to counter evolving cyber threats. The traditional IDS models, however, struggle with high computational complexity and inadequate feature extraction, limiting their accuracy and generalizability in IoT environments. To address this, we propose FFT-RDNet, a lightweight IDS framework leveraging depthwise separable convolution and frequency-domain feature fusion. An ADASYN-Tomek Links hybrid strategy first addresses class imbalances. The core innovation of FFT-RDNet lies in its novel two-dimensional spatial feature modeling approach, realized through a dedicated dual-path feature embedding module. One branch extracts discriminative statistical features in the time domain, while the other branch transforms the data into the frequency domain via Fast Fourier Transform (FFT) to capture the essential energy distribution characteristics. These time–frequency domain features are fused to construct a two-dimensional feature space, which is then processed by a streamlined residual network using depthwise separable convolution. This network effectively captures complex periodic attack patterns with minimal computational overhead. Comprehensive evaluation on the NSL-KDD and CIC-IDS2018 datasets shows that FFT-RDNet outperforms state-of-the-art neural network IDSs across accuracy, precision, recall, and F1 score (improvements: 0.22–1%). Crucially, it achieves superior accuracy with a significantly reduced computational complexity, demonstrating high efficiency for resource-constrained IoT security deployments.
(This article belongs to the Section Internet of Things)
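The dual-path idea described in the abstract above (one time-domain branch, one frequency-domain branch, fused into a two-dimensional feature space) can be sketched in a few lines. This is only a toy illustration and not FFT-RDNet itself. The function names, the choice of statistics, and the row-stacking fusion are assumptions, and a naive discrete Fourier transform stands in for an FFT so that the example needs only the standard library.

```python
import cmath
import math


def time_domain_features(x):
    """Simple summary statistics of a record (hypothetical choice of features)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [mean, std, min(x), max(x)]


def dft_magnitudes(x):
    """Magnitude spectrum via a naive O(n^2) DFT (an FFT gives the same values faster)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def fuse_time_frequency(x):
    """Stack the raw record and its spectrum as two rows of a 2 x n grid (assumed fusion)."""
    return [list(x), dft_magnitudes(x)]


if __name__ == "__main__":
    record = [1.0, 0.0, -1.0, 0.0]  # made-up sample, not real traffic data
    print(time_domain_features(record))
    for row in fuse_time_frequency(record):
        print([round(v, 6) for v in row])
```

A real model would presumably learn embeddings for each branch rather than stack raw values. The grid here only shows how a one-dimensional record can become a two-dimensional input that a convolutional network can process.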
22 pages, 2514 KiB  
Article
High-Accuracy Recognition Method for Diseased Chicken Feces Based on Image and Text Information Fusion
by Duanli Yang, Zishang Tian, Jianzhong Xi, Hui Chen, Erdong Sun and Lianzeng Wang
Animals 2025, 15(15), 2158; https://doi.org/10.3390/ani15152158 - 22 Jul 2025
Viewed by 212
Abstract
Poultry feces, a critical biomarker for health assessment, requires timely and accurate pathological identification for food safety. Conventional visual-only methods face limitations due to environmental sensitivity and high visual similarity among feces from different diseases. To address this, we propose MMCD (Multimodal Chicken-feces Diagnosis), a ResNet50-based multimodal fusion model leveraging semantic complementarity between images and descriptive text to enhance diagnostic precision. Key innovations include the following: (1) Integrating MASA (Manhattan self-attention) and DSconv (Depthwise Separable convolution) into the backbone network to mitigate feature confusion. (2) Utilizing a pre-trained BERT to extract textual semantic features, reducing annotation dependency and cost. (3) Designing a lightweight Gated Cross-Attention (GCA) module for dynamic multimodal fusion, achieving a 41% parameter reduction versus cross-modal transformers. Experiments demonstrate that MMCD significantly outperforms single-modal baselines in Accuracy (+8.69%), Recall (+8.72%), Precision (+8.67%), and F1 score (+8.72%). It surpasses simple feature concatenation by 2.51–2.82% and reduces parameters by 7.5M and computations by 1.62 GFLOPs versus the base ResNet50. This work validates multimodal fusion’s efficacy in pathological fecal detection, providing a theoretical and technical foundation for agricultural health monitoring systems.
(This article belongs to the Section Animal Welfare)
19 pages, 7168 KiB  
Article
MTD-YOLO: An Improved YOLOv8-Based Rice Pest Detection Model
by Feng Zhang, Chuanzhao Tian, Xuewen Li, Na Yang, Yanting Zhang and Qikai Gao
Electronics 2025, 14(14), 2912; https://doi.org/10.3390/electronics14142912 - 21 Jul 2025
Viewed by 193
Abstract
Insect pests have a major impact on rice yield and quality, and accurate pest detection is crucial to safeguarding rice production. However, traditional manual inspection methods are inefficient and subjective, while existing machine learning-based approaches still suffer from limited generalization and suboptimal accuracy. To address these challenges, this study proposes an improved rice pest detection model, MTD-YOLO, based on the YOLOv8 framework. First, the original backbone is replaced with MobileNetV3, which leverages optimized depthwise separable convolutions and the Hard-Swish activation function through neural architecture search, effectively reducing parameters while maintaining multiscale feature extraction capabilities. Second, a Cross Stage Partial module with Triplet Attention (C2f-T) is introduced to enhance the model’s focus on infested regions via a channel-spatial dual-attention mechanism. In addition, a Dynamic Head (DyHead) is introduced to adaptively focus on pest morphological features using the scale–space–task triple-attention mechanism. The experiments were conducted using two datasets, Rice Pest1 and Rice Pest2. On Rice Pest1, the model achieved a precision of 92.5%, recall of 90.1%, mAP@0.5 of 90.0%, and mAP@[0.5:0.95] of 67.8%. On Rice Pest2, these metrics improved to 95.6%, 92.8%, 96.6%, and 82.5%, respectively. The experimental results demonstrate the high accuracy and efficiency of the model in the rice pest detection task, providing strong support for practical applications.
(This article belongs to the Section Artificial Intelligence)
23 pages, 1755 KiB  
Article
An Efficient Continuous-Variable Quantum Key Distribution with Parameter Optimization Using Elitist Elk Herd Random Immigrants Optimizer and Adaptive Depthwise Separable Convolutional Neural Network
by Vidhya Prakash Rajendran, Deepalakshmi Perumalsamy, Chinnasamy Ponnusamy and Ezhil Kalaimannan
Future Internet 2025, 17(7), 307; https://doi.org/10.3390/fi17070307 - 17 Jul 2025
Viewed by 232
Abstract
Quantum memory is essential for the prolonged storage and retrieval of quantum information. Nevertheless, no current studies have focused on the creation of effective quantum memory for continuous variables while accounting for the decoherence rate. This work presents an effective continuous-variable quantum key distribution method with parameter optimization utilizing the Elitist Elk Herd Random Immigrants Optimizer (2E-HRIO) technique. At the outset of transmission, the quantum device undergoes initialization and authentication via Compressed Hash-based Message Authentication Code with Encoded Post-Quantum Hash (CHMAC-EPQH). The settings are subsequently optimized from the authenticated device via 2E-HRIO, which mitigates the effects of decoherence by adaptively tuning system parameters. Subsequently, quantum bits are produced from the verified device, and pilot insertion is executed within the quantum bits. The pilot-inserted signal is thereafter subjected to pulse shaping using a Gaussian filter. The pulse-shaped signal undergoes modulation. Authenticated post-modulation, the prediction of link failure is conducted through an authenticated channel using Radial Density-Based Spatial Clustering of Applications with Noise. Subsequently, transmission occurs via a non-failure connection. The receiver performs channel equalization on the received signal with Recursive Regularized Least Mean Squares. Subsequently, a dataset for side-channel attack authentication is gathered and preprocessed, followed by feature extraction and classification using Adaptive Depthwise Separable Convolutional Neural Networks (ADS-CNNs), which enhances security against side-channel attacks. The quantum state is evaluated based on the signal received, and raw data are collected. Thereafter, a connection is established between the transmitter and receiver. Both the transmitter and receiver perform the scanning process. Thereafter, the calculation and correction of the error rate are performed based on the sifting results. Ultimately, privacy amplification and key authentication are performed using the repaired key via B-CHMAC-EPQH. The proposed system demonstrated improved resistance to decoherence and side-channel attacks, while achieving a reconciliation efficiency above 90% and increased key generation rate.
17 pages, 2115 KiB  
Article
Surface Defect Detection of Magnetic Tiles Based on YOLOv8-AHF
by Cheng Ma, Yurong Pan and Junfu Chen
Electronics 2025, 14(14), 2857; https://doi.org/10.3390/electronics14142857 - 17 Jul 2025
Viewed by 156
Abstract
Magnetic tiles are an important component of permanent magnet motors, and the quality of magnetic tiles directly affects the performance and service life of a motor. It is necessary to perform defect detection on magnetic tiles in industrial production and remove those with defects. The YOLOv8-AHF algorithm is proposed to improve the network's feature extraction ability, to address missed or poor detections caused by the small size of permanent magnet motor tiles, and to reduce the deviation between the predicted box and the true box. Firstly, a hybrid module combining atrous convolution and depthwise separable convolution (ADConv) is introduced in the backbone of the model to capture global and local features in magnetic tile detection images. In the neck section, a hybrid attention module (HAM) is introduced to focus on the regions of interest in the magnetic tile surface defect images, which improves the ability of information transmission and fusion. The Focal-Enhanced Intersection over Union (Focal-EIoU) loss function is optimized to achieve effective localization. We conducted comparative experiments, ablation experiments, and corresponding generalization experiments on the magnetic tile surface defect dataset. The experimental results show that the evaluation metrics of YOLOv8-AHF surpass mainstream single-stage object detection algorithms. Compared to the You Only Look Once version 8 (YOLOv8) algorithm, the performance of the YOLOv8-AHF algorithm was improved by 5.9%, 4.1%, 5%, 5%, and 5.8% in terms of mAP@0.5, mAP@0.5:0.95, F1-Score, precision, and recall, respectively. This algorithm achieved significant performance improvement in the task of detecting surface defects on magnetic tiles.
17 pages, 1416 KiB  
Article
A Transformer-Based Pavement Crack Segmentation Model with Local Perception and Auxiliary Convolution Layers
by Yi Zhu, Ting Cao and Yiqing Yang
Electronics 2025, 14(14), 2834; https://doi.org/10.3390/electronics14142834 - 15 Jul 2025
Viewed by 213
Abstract
Crack detection in complex pavement scenarios remains challenging due to the sparse small-target features and computational inefficiency of existing methods. To address these limitations, this study proposes an enhanced architecture based on Mask2Former. The framework integrates two key innovations. A Local Perception Module (LPM) reconstructs geometric topological relationships through a Sequence-Space Dynamic Transformation Mechanism (DS2M), enhancing neighborhood feature extraction via depthwise separable convolutions. Simultaneously, an Auxiliary Convolutional Layer (ACL) combines lightweight residual convolutions with shallow high-resolution features, preserving critical edge details through channel attention weighting. Experimental evaluations demonstrate the model’s superior performance, achieving improvements of 3.2% in mIoU and 2.7% in mAcc compared to baseline methods, while maintaining computational efficiency with only 12.8 GFLOPs. These results validate the effectiveness of geometric relationship modeling and hierarchical feature fusion for pavement crack detection, suggesting practical potential for infrastructure maintenance systems. The proposed approach balances precision and efficiency, offering a viable solution for real-world applications with complex crack patterns and hardware constraints.
28 pages, 19790 KiB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Viewed by 358
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments.
(This article belongs to the Section Sensing and Imaging)
25 pages, 8372 KiB  
Article
CSDNet: Context-Aware Segmentation of Disaster Aerial Imagery Using Detection-Guided Features and Lightweight Transformers
by Ahcene Zetout and Mohand Saïd Allili
Remote Sens. 2025, 17(14), 2337; https://doi.org/10.3390/rs17142337 - 8 Jul 2025
Viewed by 286
Abstract
Accurate multi-class semantic segmentation of disaster-affected areas is essential for rapid response and effective recovery planning. We present CSDNet, a context-aware segmentation model tailored to disaster scenes, designed to improve segmentation of both large-scale disaster zones and small, underrepresented classes. The architecture combines a lightweight transformer module for global context modeling with depthwise separable convolutions (DWSCs) to enhance efficiency without compromising representational capacity. Additionally, we introduce a detection-guided feature fusion mechanism that integrates outputs from auxiliary detection tasks to mitigate class imbalance and improve discrimination of visually similar categories. Extensive experiments on several public datasets demonstrate that our model significantly improves segmentation of both man-made infrastructure and natural damage-related features, offering a robust and efficient solution for post-disaster analysis.
25 pages, 11253 KiB  
Article
YOLO-UIR: A Lightweight and Accurate Infrared Object Detection Network Using UAV Platforms
by Chao Wang, Rongdi Wang, Ziwei Wu, Zetao Bian and Tao Huang
Drones 2025, 9(7), 479; https://doi.org/10.3390/drones9070479 - 7 Jul 2025
Viewed by 442
Abstract
Within the field of remote sensing, Unmanned Aerial Vehicle (UAV) infrared object detection plays a pivotal role, especially in complex environments. However, existing methods face challenges such as insufficient accuracy or low computational efficiency, particularly in the detection of small objects. This paper proposes a lightweight and accurate UAV infrared object detection model, YOLO-UIR, for small object detection from a UAV perspective. The model is based on the YOLO architecture and mainly includes the Efficient C2f module, lightweight spatial perception (LSP) module, and bidirectional feature interaction fusion (BFIF) module. The Efficient C2f module significantly enhances feature extraction capabilities by combining local and global features through an Adaptive Dual-Stream Attention Mechanism. Compared with the existing C2f module, the introduction of Partial Convolution reduces the model’s parameter count while maintaining high detection accuracy. The BFIF module further enhances feature fusion effects through cross-level semantic interaction, thereby improving the model’s ability to fuse contextual features. Moreover, the LSP module efficiently combines features from different distances using Large Receptive Field Convolution Layers, significantly enhancing the model’s long-range information capture capability. Additionally, the use of Reparameterized Convolution and Depthwise Separable Convolution ensures the model’s lightweight nature, making it highly suitable for real-time applications. On the DroneVehicle and HIT-UAV datasets, YOLO-UIR achieves superior detection performance compared to existing methods, with an mAP of 71.1% and 90.7%, respectively. The model also demonstrates significant advantages in terms of computational efficiency and parameter count. Ablation experiments verify the effectiveness of each optimization module.
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)
18 pages, 17685 KiB  
Article
Real-Time Object Detection Model for Electric Power Operation Violation Identification
by Xiaoliang Qian, Longxiang Luo, Yang Li, Li Zeng, Zhiwu Chen, Wei Wang and Wei Deng
Information 2025, 16(7), 569; https://doi.org/10.3390/info16070569 - 3 Jul 2025
Viewed by 220
Abstract
The You Only Look Once (YOLO) object detection model has been widely applied to electric power operation violation identification, owing to its balanced performance in detection accuracy and inference speed. However, it still faces the following challenges: (1) insufficient detection capability for irregularly shaped objects; (2) objects with low object-background contrast are easily omitted; (3) improving detection accuracy while maintaining computational efficiency is difficult. To address the above challenges, a novel real-time object detection model is proposed in this paper, which introduces three key innovations. To handle the first challenge, an edge perception cross-stage partial fusion with two convolutions (EPC2f) module that combines edge convolutions with depthwise separable convolutions is proposed, which can enhance the feature representation of irregularly shaped objects with only a slight increase in parameters. To handle the second challenge, a module for the adaptive combination of local and global features is proposed to enhance the discriminative ability of features while maintaining computational efficiency; the local and global features are each extracted via 1D convolutions and adaptively combined using learnable weights. To handle the third challenge, a parameter-sharing scheme for multi-scale detection heads is proposed to reduce the number of parameters and improve the interaction between multi-scale detection heads. The ablation study on the Ali Tianchi competition dataset validates the effectiveness of the three innovations and their combination. EAP-YOLO achieves an mAP@0.5 of 93.4% and an mAP@0.5–0.95 of 70.3% on the Ali Tianchi competition dataset, outperforming 12 other object detection models while satisfying the real-time requirement.
(This article belongs to the Special Issue Computer Vision for Security Applications, 2nd Edition)
17 pages, 7199 KiB  
Article
YED-Net: Yoga Exercise Dynamics Monitoring with YOLOv11-ECA-Enhanced Detection and DeepSORT Tracking
by Youyu Zhou, Shu Dong, Hao Sheng and Wei Ke
Appl. Sci. 2025, 15(13), 7354; https://doi.org/10.3390/app15137354 - 30 Jun 2025
Viewed by 290
Abstract
Against the backdrop of the deep integration of national fitness and sports science, this study addresses the lack of standardized movement assessment in yoga training by proposing an intelligent analysis system that integrates an improved YOLOv11-ECA detector with the DeepSORT tracking algorithm. A dynamic adaptive anchor mechanism and an Efficient Channel Attention (ECA) module are introduced, while the depthwise separable convolution in the C3k2 module is optimized with a kernel size of 2. Furthermore, a Parallel Spatial Attention (PSA) mechanism is incorporated to enhance multi-target feature discrimination. These enhancements enable the model to achieve a high detection accuracy of 98.6% mAP@0.5 while maintaining low computational complexity (2.35 M parameters, 3.11 GFLOPs). Evaluated on the SND Sun Salutation Yoga Dataset released in 2024, the improved model achieves a real-time processing speed of 85.79 frames per second (FPS) on an RTX 3060 platform, with an 18% reduction in computational cost compared to the baseline. Notably, it achieves a 0.9% improvement in AP@0.5 for small targets (<20 px). By integrating the Mars-smallCNN feature extraction network with a Kalman filtering-based trajectory prediction module, the system attains 58.3% Multiple Object Tracking Accuracy (MOTA) and 62.1% Identity F1 Score (IDF1) in dense multi-object scenarios, representing an improvement of approximately 9.8 percentage points over the conventional YOLO+DeepSORT method. Ablation studies confirm that the ECA module, implemented via lightweight 1D convolution, enhances channel attention modeling efficiency by 23% compared to the original SE module and reduces the false detection rate by 1.2 times under complex backgrounds. This study presents a complete “detection–tracking–assessment” pipeline for intelligent sports training. Future work aims to integrate 3D pose estimation to develop a closed-loop biomechanical analysis system, thereby advancing sports science toward intelligent decision-making paradigms.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)