Search Results (88)

Search Parameters:
Keywords = feature scale selection pyramid

20 pages, 33417 KiB  
Article
Enhancing UAV Object Detection in Low-Light Conditions with ELS-YOLO: A Lightweight Model Based on Improved YOLOv11
by Tianhang Weng and Xiaopeng Niu
Sensors 2025, 25(14), 4463; https://doi.org/10.3390/s25144463 - 17 Jul 2025
Viewed by 490
Abstract
Drone-view object detection models operating under low-light conditions face several challenges, such as object scale variation, high image noise, and limited computational resources, and existing models often struggle to balance accuracy against a lightweight architecture. This paper introduces ELS-YOLO, a lightweight object detection model tailored for low-light environments and built upon the YOLOv11s framework. ELS-YOLO features a re-parameterized backbone (ER-HGNetV2) with integrated Re-parameterized Convolution and Efficient Channel Attention mechanisms, a Lightweight Feature Selection Pyramid Network (LFSPN) for multi-scale object detection, and a Shared Convolution Separate Batch Normalization Head (SCSHead) to reduce computational complexity. Layer-Adaptive Magnitude-Based Pruning (LAMP) compresses the model. Experiments on the ExDark and DroneVehicle datasets show that ELS-YOLO attains an mAP@0.5 of 74.3% and 68.7%, respectively, while keeping the model compact and maintaining real-time inference.
(This article belongs to the Special Issue Vision Sensors for Object Detection and Tracking)
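Efficient Channel Attention is a published module rather than something introduced by this paper; as a reference point, here is a minimal PyTorch sketch of the standard design (the 1-D kernel size of 3 is an assumed hyperparameter; the original ECA work derives it adaptively from the channel count).

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D conv over the pooled channel
    descriptor yields per-channel gates with no dimensionality reduction."""
    def __init__(self, kernel_size: int = 3):  # kernel size assumed, not from the paper
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                              # x: (B, C, H, W)
        y = self.pool(x).squeeze(-1).transpose(1, 2)   # (B, 1, C)
        y = torch.sigmoid(self.conv(y))                # local cross-channel interaction
        return x * y.transpose(1, 2).unsqueeze(-1)     # rescale each channel
```

Because the gate comes from a single tiny 1-D convolution, the module adds almost no parameters, which is consistent with the lightweight aim of ELS-YOLO.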

19 pages, 5701 KiB  
Article
Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays
by Junchao Zhu and Nan Gao
Electronics 2025, 14(13), 2612; https://doi.org/10.3390/electronics14132612 - 27 Jun 2025
Viewed by 347
Abstract
Small-scale object detection remains a significant challenge in semi-supervised object detection (SSOD), particularly in panoramic dental X-rays. Due to the small lesion size, low contrast, and complex anatomical background, conventional teacher models often fail to extract accurate lesion features, leading to noisy pseudo labels and suboptimal detection performance. Additionally, most existing SSOD methods rely on high-confidence thresholds to select pseudo labels, which may mistakenly discard valuable predictions with low scores but accurate localization, especially for small-scale targets. To address these challenges, we propose Entropy Teacher, a novel SSOD framework specifically designed for small-scale dental disease detection. Our method introduces an Entropy-Guided Feature Pyramid that integrates entropy-guided representations to enhance fine-grained structural learning. Moreover, we develop a low-confidence pseudo-label mining (LCPLM) strategy with a class-adaptive thresholding mechanism to effectively recover high-quality pseudo labels below conventional confidence thresholds. Extensive experiments on the Dental Disease Dataset and ChestX-Det demonstrate that Entropy Teacher achieves state-of-the-art performance, surpassing the baseline Unbiased Teacher by +3.8 AP50 and +4.5 APS. These results confirm the effectiveness of entropy-guided representations and low-confidence mining in improving small-scale lesion detection under limited supervision.
(This article belongs to the Section Artificial Intelligence)
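The abstract does not spell out the class-adaptive thresholding inside LCPLM; the sketch below is one plausible reading, in which each class's cutoff scales with the teacher's mean confidence for that class, so systematically low-scoring (often small) classes are not starved of pseudo labels. The base_thresh and floor values are assumptions.

```python
import torch

def mine_pseudo_labels(scores, labels, base_thresh=0.7, floor=0.3):
    """Class-adaptive pseudo-label selection (hypothetical reading of LCPLM).
    scores: (N,) teacher confidences; labels: (N,) predicted class ids.
    Returns a boolean mask of predictions to keep as pseudo labels."""
    keep = torch.zeros_like(scores, dtype=torch.bool)
    for c in labels.unique():
        m = labels == c
        # lower the cutoff for classes the teacher is systematically unsure about
        thr = max(floor, base_thresh * scores[m].mean().item())
        keep |= m & (scores >= thr)
    return keep
```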

24 pages, 25315 KiB  
Article
PAMFPN: Position-Aware Multi-Kernel Feature Pyramid Network with Adaptive Sparse Attention for Robust Object Detection in Remote Sensing Imagery
by Xiaofei Yang, Suihua Xue, Lin Li, Sihuan Li, Yudong Fang, Xiaofeng Zhang and Xiaohui Huang
Remote Sens. 2025, 17(13), 2213; https://doi.org/10.3390/rs17132213 - 27 Jun 2025
Viewed by 388
Abstract
Deep learning methods have achieved remarkable success in remote sensing object detection. Existing object detection methods focus on integrating convolutional neural networks (CNNs) and Transformer networks to explore local and global representations and improve performance. However, methods that rely on fixed convolutional kernels and dense global attention suffer from computational redundancy and insufficiently discriminative feature extraction, particularly for small and rotation-sensitive targets. To address these limitations, we propose the Position-Aware Multi-Kernel Feature Pyramid Network (PAMFPN), which integrates adaptive sparse position modeling and multi-kernel dynamic fusion to achieve robust feature representation. Firstly, we design a position-interactive context module (PICM) that incorporates distance-aware sparse attention and dynamic positional encoding; it focuses computation on sparse targets through a decay function that suppresses background noise while enhancing the spatial correlations of critical regions. Secondly, we design a dual-kernel adaptive fusion (DKAF) architecture combining region-sensitive attention (RSA) and reconfigurable context aggregation (RCA). RSA employs orthogonal large-kernel convolutions to capture anisotropic spatial features of arbitrarily oriented targets, while RCA dynamically adjusts kernel scales to content complexity, effectively addressing scale variation and intraclass diversity. Extensive experiments on three benchmark datasets (DOTA-v1.0, SSDD, and NWPU VHR-10) demonstrate the effectiveness and versatility of the proposed PAMFPN. This work bridges the gap between efficient computation and robust feature fusion in remote sensing detection, offering a general solution for real-world applications.
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
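No formula is given for the distance-aware decay, so the following is only an illustrative sketch: attention logits are penalized in proportion to pixel distance, which is equivalent to multiplying the attention weights by an exponential decay. The alpha coefficient is an assumed hyperparameter.

```python
import torch

def distance_decay_attention(q, k, v, coords, alpha=0.1):
    """Single-head attention damped by spatial distance: far-away
    (typically background) tokens receive exponentially smaller weight.
    q, k, v: (N, D) tokens; coords: (N, 2) token positions in pixels."""
    logits = (q @ k.t()) / q.shape[-1] ** 0.5      # (N, N) similarity
    dist = torch.cdist(coords, coords)             # (N, N) pairwise distances
    weights = torch.softmax(logits - alpha * dist, dim=-1)
    return weights @ v
```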

18 pages, 2488 KiB  
Article
An Improved Segformer for Semantic Segmentation of UAV-Based Mine Restoration Scenes
by Feng Wang, Lizhuo Zhang, Tao Jiang, Zhuqi Li, Wangyu Wu and Yingchun Kuang
Sensors 2025, 25(12), 3827; https://doi.org/10.3390/s25123827 - 19 Jun 2025
Cited by 1 | Viewed by 569
Abstract
Mine ecological restoration is a critical process for promoting the sustainable development of resource-dependent regions, yet existing monitoring methods remain limited in accuracy and adaptability. To address challenges such as small-object recognition, insufficient multi-scale feature fusion, and blurred boundaries in UAV-based remote sensing imagery, this paper proposes an enhanced semantic segmentation model based on Segformer. Specifically, a multi-scale feature-enhanced feature pyramid network (MSFE-FPN) is introduced between the encoder and decoder to strengthen cross-level feature interaction. Additionally, a selective feature aggregation pyramid pooling module (SFA-PPM) is integrated into the deepest feature layer to improve global semantic perception, while an efficient local attention (ELA) module is embedded in the lateral connections to sharpen sensitivity to edge structures and small-scale targets. A high-resolution UAV image dataset, the HUNAN Mine UAV Dataset (HNMUD), is constructed to evaluate model performance, with further validation on the public Aeroscapes dataset. Experimental results demonstrate that the proposed method achieves strong segmentation accuracy and generalization ability, effectively supporting the image-analysis needs of mine restoration scenes.
(This article belongs to the Section Remote Sensors)
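Efficient local attention (ELA) is cited from prior work; for orientation, here is a sketch of the general strip-pooling pattern such modules follow (pool along one spatial axis, apply a 1-D conv, and gate rows and columns separately). The kernel size and normalization choices are assumptions.

```python
import torch
import torch.nn as nn

class ELA(nn.Module):
    """Strip-pooled local attention: rows and columns are reweighted
    independently, which is cheap and sensitive to thin edges."""
    def __init__(self, channels: int, k: int = 7):  # k assumed
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, k, padding=k // 2,
                              groups=channels, bias=False)
        self.norm = nn.GroupNorm(1, channels)

    def forward(self, x):                                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        ah = torch.sigmoid(self.norm(self.conv(x.mean(dim=3))))   # (B, C, H) row gates
        aw = torch.sigmoid(self.norm(self.conv(x.mean(dim=2))))   # (B, C, W) column gates
        return x * ah.view(b, c, h, 1) * aw.view(b, c, 1, w)
```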

18 pages, 3051 KiB  
Article
Segmentation and Fractional Coverage Estimation of Soil, Illuminated Vegetation, and Shaded Vegetation in Corn Canopy Images Using CCSNet and UAV Remote Sensing
by Shanxin Zhang, Jibo Yue, Xiaoyan Wang, Haikuan Feng, Yang Liu and Meiyan Shu
Agriculture 2025, 15(12), 1309; https://doi.org/10.3390/agriculture15121309 - 18 Jun 2025
Viewed by 554
Abstract
The accurate estimation of corn canopy structure and light conditions is essential for effective crop management and informed variety selection. This study introduces CCSNet, a deep learning-based semantic segmentation model developed to extract the fractional coverages of soil, illuminated vegetation, and shaded vegetation from high-resolution corn canopy images acquired by UAVs. CCSNet improves segmentation accuracy by employing multi-level feature fusion and pyramid pooling to capture multi-scale contextual information. The model was evaluated using Pixel Accuracy (PA), mean Intersection over Union (mIoU), and Recall, and was benchmarked against U-Net, PSPNet, and UNetFormer. On the test set, CCSNet with a ResNet50 backbone achieved the highest accuracy, with an mIoU of 86.42% and a PA of 93.58%. Its estimates of fractional coverage for key canopy components yielded root mean squared errors (RMSE) ranging from 3.16% to 5.02%. Compared with lightweight backbones (e.g., MobileNetV2), CCSNet generalized better when paired with deeper backbones. These results highlight CCSNet's capability to deliver high-precision segmentation and reliable phenotypic measurements, giving breeders insight into light-use efficiency and supporting intelligent decision-making in precision agriculture.
(This article belongs to the Special Issue Research Advances in Perception for Agricultural Robots)
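Once the segmentation mask is predicted, fractional coverage reduces to per-class pixel counting. A small sketch, with class ids assumed (the paper's label convention is not stated in the abstract):

```python
import numpy as np

# Assumed label convention: 0 = soil, 1 = illuminated vegetation, 2 = shaded vegetation.
CLASSES = {0: "soil", 1: "illuminated_vegetation", 2: "shaded_vegetation"}

def fractional_coverage(mask: np.ndarray) -> dict:
    """Fraction of canopy-image pixels assigned to each component."""
    return {name: float((mask == cid).sum()) / mask.size
            for cid, name in CLASSES.items()}
```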

16 pages, 2702 KiB  
Article
Real-Time Image Semantic Segmentation Based on Improved DeepLabv3+ Network
by Peibo Li, Jiangwu Zhou and Xiaohua Xu
Big Data Cogn. Comput. 2025, 9(6), 152; https://doi.org/10.3390/bdcc9060152 - 6 Jun 2025
Viewed by 1078
Abstract
To improve image semantic segmentation performance and strike a better balance between accuracy and real-time speed, this paper proposes a real-time semantic segmentation model based on an improved DeepLabv3+ network. First, MobileNetV2, which has low computational overhead and few parameters, is selected as the backbone network to improve segmentation speed. Then, a Feature Enhancement Module (FEM) is applied to several shallow MobileNetV2 features of different scales, and these features are fused to make better use of edge information in the encoder, retain more detail, and strengthen the network's feature representation in complex scenes. Finally, because the merged output of the Atrous Spatial Pyramid Pooling (ASPP) module pays insufficient attention to detail, the FEM attention mechanism is also applied to the ASPP output. The proposed algorithm achieves 76.45% mean intersection over union (mIoU) at 29.18 FPS on the PASCAL VOC 2012 Augmented dataset and 37.31% mIoU at 23.31 FPS on ADE20K. The experimental results show that the algorithm achieves a good balance between accuracy and real-time performance, and its segmentation quality improves significantly over DeepLabv3+ and other existing algorithms.
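ASPP is a standard DeepLab component; for readers new to it, a compact sketch follows (the 6/12/18 dilation rates are the common default and an assumption here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: a 1x1 conv, parallel dilated 3x3
    convs, and a global-pooling branch, concatenated and projected back."""
    def __init__(self, cin, cout, rates=(6, 12, 18)):  # rates assumed
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(cin, cout, 1, bias=False)] +
            [nn.Conv2d(cin, cout, 3, padding=r, dilation=r, bias=False) for r in rates])
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(cin, cout, 1, bias=False))
        self.project = nn.Conv2d(cout * (len(rates) + 2), cout, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        # upsample the image-level branch back to the feature resolution
        feats.append(F.interpolate(self.pool(x), (h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```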

26 pages, 5390 KiB  
Article
DLF-YOLO: A Dynamic Synergy Attention-Guided Lightweight Framework for Few-Shot Clothing Trademark Defect Detection
by Kefeng Chen, Xinpiao Zhou and Jia Ren
Electronics 2025, 14(11), 2113; https://doi.org/10.3390/electronics14112113 - 22 May 2025
Viewed by 635
Abstract
To address key challenges in clothing trademark quality inspection, namely insufficient defect samples, unstable performance in complex industrial environments, and low detection efficiency, this paper proposes DLF-YOLO, an enhanced YOLOv11-based model optimized for industrial deployment. To mitigate the problem of limited annotated data, an unsupervised generative network, CycleGAN, is employed to synthesize rare defect patterns and simulate diverse environmental conditions (e.g., rotation, noise, and contrast variations), thereby improving data diversity and model generalization. To reduce the impact of industrial noise, a novel multi-scale dynamic synergy attention (MDSA) mechanism is introduced, which applies dual attention in the channel and spatial dimensions to focus more accurately on key regions of the trademark, effectively suppressing false detections caused by lighting variations and fabric textures. Furthermore, the high-level selective feature pyramid network (HS-FPN) module is adopted to lighten the neck structure: its feature selection sub-module sharpens the perception of fine edge defects, while its feature fusion sub-module balances a lightweight design against detection accuracy by aggregating hierarchical multi-scale context. In the backbone, DWConv replaces standard convolutions before the C3k2 module to reduce computational complexity, and HetConv is integrated into the C3k2 module to cut computational cost while enhancing feature extraction, preserving model accuracy. Experimental results on a custom-built dataset demonstrate that DLF-YOLO achieves an mAP@0.5:0.95 of 80.2%, with a 49.6% reduction in parameters and a 25.6% reduction in computational load compared to the original YOLOv11. These results highlight the potential of DLF-YOLO as a scalable and efficient solution for lightweight, industrial-grade defect detection in clothing trademarks.
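Swapping in DWConv is a routine FLOP reduction; a sketch of a depthwise-separable block as it might replace a standard conv (the SiLU activation mirrors the YOLO family's convention but is an assumption for this paper):

```python
import torch.nn as nn

class DWConvBlock(nn.Module):
    """Depthwise-separable convolution: a per-channel spatial conv followed
    by a 1x1 pointwise conv, cutting FLOPs versus a standard conv."""
    def __init__(self, cin, cout, k=3, s=1):
        super().__init__()
        self.dw = nn.Conv2d(cin, cin, k, s, k // 2, groups=cin, bias=False)
        self.pw = nn.Conv2d(cin, cout, 1, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.SiLU()  # activation assumed

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))
```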

22 pages, 8016 KiB  
Article
Detection of Seed Potato Sprouts Based on Improved YOLOv8 Algorithm
by Yufei Li, Qinghe Zhao, Zifang Zhang, Jinlong Liu and Junlong Fang
Agriculture 2025, 15(9), 1015; https://doi.org/10.3390/agriculture15091015 - 7 May 2025
Viewed by 733
Abstract
Seed potatoes without sprouts usually have to be removed by hand in mechanized production, which has become an efficiency bottleneck. A fast and accurate object recognition algorithm is therefore required to identify unqualified seed potatoes during the removal step. This paper proposes a lightweight deep learning algorithm, YOLOv8_EBG, that both improves detection performance and reduces model parameters. The ECA attention mechanism is introduced in the backbone and neck of the model to extract and fuse sprouting features more accurately. To further cut parameters, Ghost convolution and C3Ghost replace the standard convolution and C2f blocks of vanilla YOLOv8n. In addition, a bi-directional feature pyramid network is integrated in the neck for multi-scale feature fusion, improving detection accuracy. Results on an isolated test set show that the proposed algorithm detects sprouts well under natural light, achieving an mAP0.5 of 95.7% and 91.9% AP for bud recognition. Compared with the YOLOv8n model, the improved model shows a 6.5% increase in mAP0.5, a 12.9% increase in AP0.5 for bud recognition, and a 5.6% decrease in parameter count. When deployed on mechanized sorting equipment, the improved algorithm reached a seed potato detection accuracy of 92.5%, sufficient to identify and select sprouted potatoes, an indispensable step since only sprouted potatoes can be used as seed potatoes. The results can provide technical support for subsequent intelligent potato planting.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
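Ghost convolution is a published design: half the output channels come from an ordinary convolution and the rest from a cheap depthwise operation on those primary maps. A minimal sketch (the output channel count is assumed even):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: primary features from a regular conv, 'ghost'
    features from a cheap depthwise conv, concatenated along channels."""
    def __init__(self, cin, cout, k=1):  # cout assumed even
        super().__init__()
        mid = cout // 2
        self.primary = nn.Conv2d(cin, mid, k, padding=k // 2, bias=False)
        self.cheap = nn.Conv2d(mid, mid, 5, padding=2, groups=mid, bias=False)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```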

29 pages, 9314 KiB  
Article
SFRADNet: Object Detection Network with Angle Fine-Tuning Under Feature Matching
by Keliang Liu, Yantao Xi, Donglin Jing, Xue Zhang and Mingfei Xu
Remote Sens. 2025, 17(9), 1622; https://doi.org/10.3390/rs17091622 - 2 May 2025
Viewed by 501
Abstract
Due to the distant acquisition and bird's-eye perspective of remote sensing images, ground objects appear at arbitrary scales and in multiple orientations. Existing detectors often use feature pyramid networks (FPN) and deformable (or rotated) convolutions to adapt to variations in object scale and orientation. However, these methods treat scale and orientation separately and ignore their deeper coupling: when the scale features extracted by the network are badly mismatched with the object, the detection head struggles to capture the object's orientation, producing misalignment between the object and its bounding box. We therefore propose a one-stage detector, the Scale First Refinement-Angle Detection Network (SFRADNet), which fine-tunes the rotation angle only after precise scale features have been matched. We introduce the Group Learning Large Kernel Network (GL2KNet) as the backbone of SFRADNet and employ a Shape-Aware Spatial Feature Extraction Module (SA-SFEM) as the primary component of the detection head. Specifically, within GL2KNet, we construct diverse receptive fields with varying dilation rates to capture features across different spatial coverage ranges. Building on this, we aggregate the multi-scale features within each layer using weights from a Scale Selection Matrix (SSMatrix), which dynamically adjusts receptive-field coverage to the target size and so enables finer selection of scale features. On top of these scale features, we design a Directed Guiding Box (DGBox) within the SA-SFEM, whose shape and position supervise the sampling points of the convolution kernels so that they fit the object's deformation. This draws orientation features from near the object region and allows accurate refinement of both scale and orientation. Experiments show that our network achieves an mAP of 80.10% on the DOTA-v1.0 dataset while reducing computational complexity compared with the baseline model.
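The abstract describes the SSMatrix only at a high level; one minimal reading is a content-dependent softmax gate over parallel dilated-convolution branches, sketched below (the 1/2/4 dilation rates and the global-pooling gate are assumptions):

```python
import torch
import torch.nn as nn

class ScaleSelect(nn.Module):
    """Multi-dilation branches mixed by content-dependent weights, so the
    effective receptive field follows the target size."""
    def __init__(self, c, rates=(1, 2, 4)):  # rates assumed
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=r, dilation=r, bias=False) for r in rates)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(c, len(rates), 1))

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=1)                      # (B, R, 1, 1)
        feats = torch.stack([b(x) for b in self.branches], dim=1)   # (B, R, C, H, W)
        return (w.unsqueeze(2) * feats).sum(dim=1)                  # weighted mix
```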

18 pages, 18045 KiB  
Article
Text-Guided Refinement for Referring Image Segmentation
by Shuang Qiu, Shiyin Zhang and Tao Ruan
Appl. Sci. 2025, 15(9), 5047; https://doi.org/10.3390/app15095047 - 1 May 2025
Viewed by 658
Abstract
Referring image segmentation aims to segment the object described by a natural language expression from an image. Existing methods perform multi-modal fusion during encoding, typically integrating image and text features before predicting masks via upsampling networks. However, this approach often lacks sufficient multi-modal interaction during decoding, making precise edge prediction difficult for objects of varying scales. Additionally, the isolated interaction between linguistic and visual features at each scale fails to exploit language as a continuous guide for multi-scale visual features. To address these issues, we propose the Text-Guided Refinement Network (TGRN). It employs a cascaded pyramid structure with a text-guided gating mechanism to integrate multi-modal features selectively and efficiently across scales at the decoding stage. TGRN offers the following advantages: (a) it enhances information flow across feature scales, improving the network's capacity to represent multi-scale semantics and segment accurately; (b) it uses text to guide feature fusion, strengthening multi-modal interaction and refining edge perception during decoding; and (c) it facilitates effective multi-modal integration through a language-embedded visual encoder. Extensive experiments on three benchmark datasets validate the effectiveness of the proposed approach, demonstrating superior referring-segmentation performance.
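The text-guided gating mechanism is not specified in detail; a common minimal form, sketched here under that assumption, projects the sentence embedding to per-channel sigmoid gates over the visual features at each decoder scale:

```python
import torch
import torch.nn as nn

class TextGate(nn.Module):
    """Text-guided gate (hypothetical form): the sentence embedding decides,
    per channel, which visual features pass to the next decoder scale."""
    def __init__(self, text_dim, vis_channels):
        super().__init__()
        self.fc = nn.Linear(text_dim, vis_channels)

    def forward(self, vis, text):            # vis: (B, C, H, W), text: (B, D)
        gate = torch.sigmoid(self.fc(text))  # (B, C) channel gates
        return vis * gate[:, :, None, None]
```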

20 pages, 5975 KiB  
Article
Fast Tongue Detection Based on Lightweight Model and Deep Feature Propagation
by Keju Chen, Yun Zhang, Li Zhong and Yongguo Liu
Electronics 2025, 14(7), 1457; https://doi.org/10.3390/electronics14071457 - 3 Apr 2025
Viewed by 516
Abstract
While existing tongue detection methods achieve good accuracy, low detection speed and excessive background noise remain problems. To address them, a fast tongue detection model based on a lightweight network and deep feature propagation (TD-DFP) is proposed. Firstly, a color channel is added to the RGB tongue image to make tongue features more prominent. To reduce computational cost, keyframes are selected through inter-frame differencing, and optical flow maps align the features of non-keyframes with those of keyframes. Secondly, a convolutional neural network with a feature pyramid structure extracts multi-scale features, and detection heads based on depth-wise convolutions provide real-time tongue region detection. In addition, a knowledge distillation module improves training performance during the training phase. TD-DFP achieved an 82.8% mean average precision (mAP) at 61.88 frames per second (FPS) on the tongue dataset. The experimental results indicate that TD-DFP delivers efficient, accurate, real-time tongue detection.
(This article belongs to the Special Issue Mechanism and Modeling of Graph Convolutional Networks)
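Inter-frame differencing for keyframe selection can be stated in a few lines; the sketch below uses a mean absolute difference on grayscale frames, with the threshold an assumed tuning parameter. Per the abstract, non-keyframes would then reuse keyframe features warped by optical flow instead of running the full backbone.

```python
import numpy as np

def is_keyframe(prev_gray: np.ndarray, cur_gray: np.ndarray,
                thresh: float = 12.0) -> bool:  # threshold assumed
    """Mark a frame as a keyframe when the mean absolute inter-frame
    difference exceeds a threshold; cheap frames skip the backbone."""
    diff = np.abs(cur_gray.astype(np.float32) - prev_gray.astype(np.float32))
    return diff.mean() > thresh
```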

30 pages, 11153 KiB  
Article
GCA2Net: Global-Consolidation and Angle-Adaptive Network for Oriented Object Detection in Aerial Imagery
by Shenbo Zhou, Zhenfei Liu, Hui Luo, Guanglin Qi, Yunfeng Liu, Haorui Zuo, Jianlin Zhang and Yuxing Wei
Remote Sens. 2025, 17(6), 1077; https://doi.org/10.3390/rs17061077 - 19 Mar 2025
Cited by 1 | Viewed by 610
Abstract
Enhancing the detection of rotated objects in aerial imagery is a vital aspect of the burgeoning field of remote sensing technology; the objective is to identify and localize objects oriented in arbitrary directions within an image. Although rotated object detection has improved steadily in recent years, existing methods largely employ traditional backbone networks whose static convolutions excel at extracting features from objects oriented at a specific angle, whereas most objects in aerial imagery are oriented in many directions. This makes it hard for backbone networks to extract high-quality features from objects of different orientations. In response, we propose the Dynamic Rotational Convolution (DRC) module and integrate it into the ResNet backbone to form the backbone used in this paper, DRC-ResNet. Within the DRC module, rotation parameters are predicted by an Adaptive Routing Unit (ARU), which adaptively rotates convolutional kernels in a data-driven way to extract features from objects oriented in various directions in different images. On this foundation, we introduce a conditional computation mechanism that lets convolutional kernels adapt more flexibly and efficiently to large angular changes of objects. To better integrate key image information once these angle-rich features are obtained, we propose the Multi-Order Spatial-Channel Aggregation Block (MOSCAB), which strengthens the aggregation of key information through selective focusing and global aggregation. Meanwhile, considering the significant semantic gap between features at different levels during feature pyramid fusion, we propose a new multi-scale fusion network, AugFPN+, which narrows that gap before fusion, integrates features more effectively, and minimizes the spatial information loss of small objects. Experiments on the popular benchmark datasets DOTA-V1.0 and HRSC2016 show that our model achieves mAP scores of 77.56% and 90.4%, respectively, significantly outperforming current rotated detection models.
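The DRC module's internals are not given here, so the following is only an illustrative sketch of data-conditioned kernel rotation: a routing head predicts one angle per image, and the 3x3 kernel is resampled on a rotated grid before convolution. The kernel size, per-image (rather than per-location) angle, and bilinear resampling are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicRotConv(nn.Module):
    """Sketch of dynamic rotational convolution: a tiny routing head predicts
    an angle per sample; the shared kernel is resampled on a rotated grid."""
    def __init__(self, cin, cout):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(cout, cin, 3, 3) * 0.1)
        self.route = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(cin, 1))  # predicts one angle

    def forward(self, x):                              # x: (B, C, H, W)
        outs = []
        for i in range(x.shape[0]):                    # per-sample kernel rotation
            a = self.route(x[i:i + 1]).squeeze()
            cos, sin = torch.cos(a), torch.sin(a)
            theta = torch.stack([torch.stack([cos, -sin, torch.zeros_like(a)]),
                                 torch.stack([sin, cos, torch.zeros_like(a)])])
            grid = F.affine_grid(theta[None], (1, 1, 3, 3), align_corners=True)
            k = self.weight.view(-1, 1, 3, 3)          # (cout*cin, 1, 3, 3)
            k = F.grid_sample(k, grid.expand(k.shape[0], -1, -1, -1),
                              align_corners=True)      # bilinear kernel rotation
            outs.append(F.conv2d(x[i:i + 1], k.view_as(self.weight), padding=1))
        return torch.cat(outs)
```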

25 pages, 5589 KiB  
Article
A Multi-Scale Feature-Fusion Multi-Object Tracking Algorithm for Scale-Variant Vehicle Tracking in UAV Videos
by Shanshan Liu, Xinglin Shen, Shanzhu Xiao, Hanwen Li and Huamin Tao
Remote Sens. 2025, 17(6), 1014; https://doi.org/10.3390/rs17061014 - 14 Mar 2025
Cited by 1 | Viewed by 1228
Abstract
Unmanned Aerial Vehicle (UAV) vehicle-tracking technology has extensive application potential in many fields. During tracking, the relative movement of the UAV and the vehicles produces large target-scale variations (i.e., changes in size and aspect ratio), which lead to missed detections and ID switches. Traditional trackers usually rely on multi-scale estimation to adaptively update the target scale, but this requires choosing multiple scaling factors and generating many candidate bounding boxes, which raises computational cost and hurts real-time performance. To tackle this, we propose a novel multi-object tracking method based on the BoT-SORT framework. Firstly, we propose an FB-YOLOv8 framework to address missed detections; it incorporates a Feature Alignment Aggregation Module (FAAM) and a Bidirectional Path Aggregation Network (BPAN) to enhance multi-scale feature fusion. Secondly, we propose a multi-scale feature-fusion network (MSFF-OSNet) for appearance features, which addresses ID switching by integrating a Feature Pyramid Network (FPN) and the Convolutional Block Attention Module (CBAM) into OSNet to capture multi-level pixel dependencies and combine low-level and high-level features. Integrating FB-YOLOv8 and MSFF-OSNet into the tracking pipeline improves tracking accuracy and stability. Experiments on the UAVDT dataset achieve 46.1% MOTA and 65.3% IDF1, outperforming current state-of-the-art trackers, and experiments on sequences with scale variations confirm the method's improved stability under scale changes.
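CBAM is a published module; as a reference, a standard sketch follows (the channel-reduction ratio r=16 is the usual default and assumes the channel count is divisible by it):

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention from pooled
    descriptors, then spatial attention from a 7x7 conv over pooled maps."""
    def __init__(self, c, r=16):  # c assumed divisible by r
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(c, c // r, 1), nn.ReLU(),
                                 nn.Conv2d(c // r, c, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        # channel attention: shared MLP over avg- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: conv over channel-pooled avg/max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```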

19 pages, 8648 KiB  
Article
Automatic Extraction of Water Body from SAR Images Considering Enhanced Feature Fusion and Noise Suppression
by Meijun Gao, Wenjie Dong, Lifu Chen and Zhongwu Wu
Appl. Sci. 2025, 15(5), 2366; https://doi.org/10.3390/app15052366 - 22 Feb 2025
Viewed by 761
Abstract
Water extraction from Synthetic Aperture Radar (SAR) images is crucial for water resource management and for maintaining the sustainability of ecosystems. Though great progress has been achieved, challenges remain, such as an insufficient ability to extract water edge details, an inability to detect small water bodies, and weak suppression of background noise. To address these problems, we propose the Global Context Attention Feature Fusion Network (GCAFF-Net). It comprises an encoder for hierarchical feature extraction and a decoder for merging multi-scale features. The encoder uses ResNet-101 as the backbone to generate features at four resolutions. In the middle-level fusion stage, an Attention Feature Fusion Module (AFFM) performs multi-scale feature learning to improve fine water segmentation. In the advanced encoding stage, Global Context Atrous Spatial Pyramid Pooling (GCASPP) adaptively integrates water information in SAR images from a global perspective, enhancing the network's ability to delineate water boundaries. In the decoder, an attention modulation module (AMM) redistributes feature importance across the channel and spatial dimensions to better extract detailed water-body features. In the experiments, SAR images from the Sentinel-1 system are used, and three water areas with different characteristics and scales are selected for independent testing. The Pixel Accuracy (PA) and Intersection over Union (IoU) for water extraction are 95.24% and 91.63%, respectively. The results indicate that the network extracts more complete water edges and finer details, improving the accuracy and generalization of water body extraction. Compared with several classical semantic segmentation models, GCAFF-Net delivers superior performance and can also be applied to typical target segmentation from SAR images.
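The AFFM's internals are not described in the abstract; one common pattern for attentional fusion of two same-shape feature maps is a learned per-channel gate, sketched here purely as an illustration:

```python
import torch
import torch.nn as nn

class FuseGate(nn.Module):
    """Attentional fusion of a low-level and a high-level feature map:
    a learned gate decides, per channel, how much of each source to keep."""
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, low, high):        # both (B, C, H, W), same shape
        g = self.gate(low + high)        # per-channel mixing weights
        return g * low + (1 - g) * high
```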

15 pages, 3069 KiB  
Article
MRB-YOLOv8: An Algorithm for Insulator Defect Detection
by Junhong Xu, Shengjie Zhao, Yuan Li, Wenxin Song and Kecheng Zhang
Electronics 2025, 14(5), 830; https://doi.org/10.3390/electronics14050830 - 20 Feb 2025
Viewed by 839
Abstract
As China’s electricity consumption surges, the reliability and safety of long-distance transmission lines become increasingly crucial. Insulators are vital for grid stability and demand accurate defect identification, yet existing methods fall short on small targets and complex backgrounds. An insulator defect detection method, MRB-YOLOv8, is therefore proposed. By integrating an attention mechanism and multi-scale features, the model's focus on key features is significantly improved. Multi-Spectral Channel Attention captures essential information across different frequency domains through a well-designed frequency selection strategy. In addition, Receptive Field Attention Convolution (RFAConv) replaces the C2f module in the backbone network, enhancing the perception of features in complex backgrounds by weighting receptive-field responses. Meanwhile, a weighted bi-directional feature pyramid network (BiFPN) and a fourth detection layer prevent feature loss during fusion, enhancing the detection accuracy of small targets. Experimental results show gains of 3.2% at mAP50 and 3.6% at mAP50:95, significantly improving the model's ability to detect defects such as insulator self-explosion, breakage, and flashover in UAV-captured imagery.
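BiFPN's weighted fusion is published and compact enough to quote in sketch form: learnable non-negative weights, normalized by their sum, mix the incoming feature maps so no single input scale dominates.

```python
import torch
import torch.nn as nn

class WeightedFuse(nn.Module):
    """BiFPN-style fast normalized fusion: relu-clamped learnable weights,
    normalized by their sum plus a small epsilon for stability."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):             # list of same-shape tensors
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)      # weights sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))
```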
