Search Results (31)

Search Parameters:
Keywords = Deformable-DETR

24 pages, 10190 KiB  
Article
MSMT-RTDETR: A Multi-Scale Model for Detecting Maize Tassels in UAV Images with Complex Field Backgrounds
by Zhenbin Zhu, Zhankai Gao, Jiajun Zhuang, Dongchen Huang, Guogang Huang, Hansheng Wang, Jiawei Pei, Jingjing Zheng and Changyu Liu
Agriculture 2025, 15(15), 1653; https://doi.org/10.3390/agriculture15151653 - 31 Jul 2025
Abstract
Accurate detection of maize tassels plays a crucial role in yield estimation of maize in precision agriculture. Recently, UAV and deep learning technologies have been widely introduced in various applications of field monitoring. However, complex field backgrounds pose multiple challenges to precise maize tassel detection, including multi-scale variations caused by varietal differences and growth stages, intra-class occlusion, and background interference. To achieve accurate maize tassel detection in UAV images under complex field backgrounds, this study proposes the MSMT-RTDETR detection model. The Faster-RPE Block is first designed to enhance multi-scale feature extraction while reducing model parameters and FLOPs. To improve detection performance for multi-scale targets in complex field backgrounds, a Dynamic Cross-Scale Feature Fusion Module (Dy-CCFM) is constructed by upgrading the CCFM through dynamic sampling strategies and a multi-branch architecture. Furthermore, the MPCC3 module is built via re-parameterization, further strengthening cross-channel information extraction and model stability to deal with intra-class occlusion. Experimental results on the MTDC-UAV dataset demonstrate that MSMT-RTDETR significantly outperforms the baseline in detecting maize tassels under complex field backgrounds, achieving a precision of 84.2%. Compared with Deformable DETR and YOLOv10m, it improves mAP50 on UAV images by 2.8% and 2.0%, respectively. This study offers an innovative solution for accurate maize tassel detection, establishing a reliable technical foundation for maize yield estimation.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
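Most results in this listing report mAP50, i.e. mean average precision where a detection counts as correct when its box overlaps a ground-truth box with IoU ≥ 0.5. As a point of reference, a minimal sketch of the underlying box-IoU computation (a generic helper, not code from any of the listed papers):

```python
def box_iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two unit boxes overlapping by half their width: inter = 0.5, union = 1.5, IoU ≈ 0.333
print(box_iou((0, 0, 1, 1), (0.5, 0, 1.5, 1)))
```

At an mAP50 threshold, this pair would therefore not match; at mAP@0.25 it would.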

17 pages, 3726 KiB  
Article
LEAD-Net: Semantic-Enhanced Anomaly Feature Learning for Substation Equipment Defect Detection
by Linghao Zhang, Junwei Kuang, Yufei Teng, Siyu Xiang, Lin Li and Yingjie Zhou
Processes 2025, 13(8), 2341; https://doi.org/10.3390/pr13082341 - 23 Jul 2025
Viewed by 251
Abstract
Substation equipment defect detection is a critical aspect of ensuring the reliability and stability of modern power grids. However, existing deep-learning-based detection methods often face significant challenges in real-world deployment, primarily due to low detection accuracy and inconsistent anomaly definitions across different substation environments. To address these limitations, this paper proposes the Language-Guided Enhanced Anomaly Power Equipment Detection Network (LEAD-Net), a novel framework that leverages text-guided learning during training to significantly improve defect detection performance. Unlike traditional methods, LEAD-Net integrates textual descriptions of defects, such as historical maintenance records or inspection reports, as auxiliary guidance during training. A key innovation is the Language-Guided Anomaly Feature Enhancement Module (LAFEM), which refines channel attention using these text features. Crucially, LEAD-Net operates solely on image data during inference, ensuring practical applicability. Experiments on a real-world substation dataset, comprising 8307 image–text pairs and encompassing a diverse range of defect categories encountered in operational substation environments, demonstrate that LEAD-Net significantly outperforms state-of-the-art object detection methods (Faster R-CNN, YOLOv9, DETR, and Deformable DETR), achieving a mean Average Precision (mAP) of 79.51%. Ablation studies confirm the contributions of both LAFEM and the training-time text guidance. The results highlight the effectiveness and novelty of using training-time defect descriptions to enhance visual anomaly detection without requiring text input at inference.
(This article belongs to the Special Issue Smart Optimization Techniques for Microgrid Management)

19 pages, 7851 KiB  
Article
Ship Plate Detection Algorithm Based on Improved RT-DETR
by Lei Zhang and Liuyi Huang
J. Mar. Sci. Eng. 2025, 13(7), 1277; https://doi.org/10.3390/jmse13071277 - 30 Jun 2025
Cited by 1 | Viewed by 381
Abstract
To address the challenges in ship plate detection under complex maritime scenarios—such as small target size, extreme aspect ratios, dense arrangements, and multi-angle rotations—this paper proposes a multi-module collaborative detection algorithm, RT-DETR-HPA, based on an enhanced RT-DETR framework. The proposed model integrates three core components: an improved High-Frequency Enhanced Residual Block (HFERB) embedded in the backbone to strengthen multi-scale high-frequency feature fusion, with deformable convolution added to handle occlusion and deformation; a Pinwheel-shaped Convolution (PConv) module employing multi-directional convolution kernels to achieve rotation-adaptive local detail extraction and accurately capture plate edges and character features; and an Adaptive Sparse Self-Attention (ASSA) mechanism incorporated into the encoder to automatically focus on key regions while suppressing complex background interference, thereby enhancing feature discriminability. Comparative experiments conducted on a self-constructed dataset of 20,000 ship plate images show that, compared to the original RT-DETR, RT-DETR-HPA achieves a 3.36% improvement in mAP@50 (up to 97.12%), a 3.23% increase in recall (reaching 94.88%), and maintains real-time detection speed at 40.1 FPS. Compared with mainstream object detection models such as the YOLO series and Faster R-CNN, RT-DETR-HPA demonstrates significant advantages in high-precision localization, adaptability to complex scenarios, and real-time performance. It effectively reduces missed and false detections caused by low resolution, poor lighting, and dense occlusion, providing a robust and high-accuracy solution for intelligent ship supervision. Future work will focus on lightweight model design and dynamic resolution adaptation to enhance its applicability on mobile maritime surveillance platforms.
(This article belongs to the Section Ocean Engineering)

17 pages, 5610 KiB  
Article
The Detection of Maize Leaf Disease Based on an Improved Real-Time Detection Transformer Model
by Jianbin Yao, Zhenghao Zhu, Mengqi Yuan, Linyuan Li and Meijia Wang
Symmetry 2025, 17(6), 808; https://doi.org/10.3390/sym17060808 - 22 May 2025
Viewed by 558
Abstract
Maize is one of the most important global crops. It is highly susceptible to diseases during its growth process, meaning that the timely detection and prevention of maize diseases is critically important. However, simple deep learning classification tasks do not allow for the accurate identification of multiple diseases present in a single leaf, and existing RT-DETR (Real-Time Detection Transformer) detection methods suffer from issues such as excessive model parameters and inaccurate recognition of multi-scale features on maize leaves. This paper addresses these challenges by proposing an improved RT-DETR model. The model enhances feature extraction by introducing a DAttention (Deformable Attention) module and optimizes feature fusion through the symmetric spatial–channel structure of the SCConv (Spatial and Channel Reconstruction Convolution) module. In addition, the backbone network is reconfigured, which effectively reduces the parameter size of the model and achieves a balance between model precision and parameter count. Experimental results demonstrate that the proposed model achieves an mAP@0.5 of 92.0% and a detection precision of 89.2%, representing improvements of 7.3% and 8.4%, respectively, over the original RT-DETR model. Additionally, the model’s parameter size has been reduced by 18.9 M, leading to a substantial decrease in resource consumption during deployment and underscoring its extensive application potential.
(This article belongs to the Section Life Sciences)
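Several of these models build on the deformable attention of Deformable-DETR, in which each query attends to only a few learned sampling points around a reference location instead of the whole feature map. A toy single-query, single-head sketch on a 2D scalar feature grid (in the actual models the offsets and attention logits are linear projections of the query, applied per head and per feature level; here they are fixed inputs):

```python
import math

def bilinear(feat, x, y):
    """Bilinearly sample a 2D grid (list of rows) at a fractional point (x, y)."""
    h, w = len(feat), len(feat[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    val = 0.0
    # Accumulate the four surrounding cells, skipping out-of-range ones.
    for xi, yi in ((x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)):
        if 0 <= xi < w and 0 <= yi < h:
            val += (1 - abs(x - xi)) * (1 - abs(y - yi)) * feat[yi][xi]
    return val

def deformable_attn(feat, ref, offsets, logits):
    """Weighted sum of K values sampled at ref + offset_k; weights = softmax(logits)."""
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    return sum((e / z) * bilinear(feat, ref[0] + dx, ref[1] + dy)
               for e, (dx, dy) in zip(exps, offsets))
```

With zero offsets and equal logits the output reduces to the feature value at the reference point, which is a handy sanity check when wiring up such a module.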

20 pages, 2853 KiB  
Article
MHFS-FORMER: Multiple-Scale Hybrid Features Transformer for Lane Detection
by Dongqi Yan and Tao Zhang
Sensors 2025, 25(9), 2876; https://doi.org/10.3390/s25092876 - 2 May 2025
Viewed by 573
Abstract
Although deep learning has exhibited remarkable performance in lane detection, the task remains challenging in complex scenarios, including those with damaged lane markings, obstructions, and insufficient lighting. Furthermore, a significant drawback of most existing lane-detection algorithms lies in their reliance on complex post-processing and strong prior knowledge. Inspired by the DETR architecture, we propose an end-to-end Transformer-based model, MHFS-FORMER, to resolve these issues. To tackle interference in complex scenarios, we designed MHFNet, which fuses multi-scale features with the Transformer Encoder to obtain enhanced multi-scale features that are then fed into the Transformer Decoder. A novel multi-reference deformable attention module is introduced to disperse attention around the objects, enhancing the model’s representation ability during training and better capturing the elongated structure of lanes and the global environment. We also designed ShuffleLaneNet, which exploits the channel and spatial information of multi-scale lane features, significantly improving the accuracy of target recognition. Our method achieves an accuracy of 96.88% and a real-time processing speed of 87 fps on the TuSimple dataset, and an F1 score of 77.38% on the CULane dataset, demonstrating excellent performance compared with both CNN-based and Transformer-based methods.
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)

16 pages, 1683 KiB  
Article
Refined Deformable-DETR for SAR Target Detection and Radio Signal Detection
by Zhenghao Li and Xin Zhou
Remote Sens. 2025, 17(8), 1406; https://doi.org/10.3390/rs17081406 - 15 Apr 2025
Cited by 1 | Viewed by 822
Abstract
SAR target detection and signal detection are critical tasks in electromagnetic signal processing, with wide-ranging applications in remote sensing and communication monitoring. However, these tasks are challenged by complex backgrounds, multi-scale target variations, and the limited integration of domain-specific priors into existing deep learning models. To address these challenges, we propose Refined Deformable-DETR, a novel Transformer-based method designed to enhance detection performance in SAR and signal processing scenarios. Our approach integrates three key components: the half-window filter (HWF) to leverage SAR and signal priors, the multi-scale adapter to ensure robust multi-level feature representation, and auxiliary feature extractors to enhance feature learning. Together, these innovations significantly enhance detection precision and robustness. Refined Deformable-DETR achieves an mAP of 0.682 on the HRSID dataset and 0.540 on the spectrogram dataset, outperforming the comparison methods on both benchmarks.
(This article belongs to the Special Issue Deep Learning Techniques and Applications of MIMO Radar Theory)

17 pages, 32249 KiB  
Article
HPRT-DETR: A High-Precision Real-Time Object Detection Algorithm for Intelligent Driving Vehicles
by Xiaona Song, Bin Fan, Haichao Liu, Lijun Wang and Jinxing Niu
Sensors 2025, 25(6), 1778; https://doi.org/10.3390/s25061778 - 13 Mar 2025
Cited by 1 | Viewed by 1243
Abstract
Object detection is essential for the perception systems of intelligent driving vehicles. RT-DETR has emerged as a prominent model, but its direct application in intelligent driving vehicles still faces issues with the misdetection of occluded or small targets. To address these challenges, we propose a High-Precision Real-Time object detection algorithm (HPRT-DETR). We designed a Basic-iRMB-CGA (BIC) Block for the backbone network that efficiently extracts features and reduces the model’s parameters. We further propose a Deformable Attention-based Intra-scale Feature Interaction (DAIFI) module by combining the Deformable Attention mechanism with the Intra-Scale Feature Interaction module, enabling the model to capture rich semantic features and improve detection accuracy under occlusion. The Local Feature Extraction Fusion (LFEF) block was created by integrating the local feature extraction module with the CNN-based Cross-scale Feature Fusion (CCFF) module. This integration expands the model’s receptive field and enhances feature extraction without adding learnable parameters or complex computations, effectively minimizing missed detections of small targets. Experiments on the KITTI dataset show that, compared to RT-DETR, HPRT-DETR improves mAP50 and FPS by 1.98% and 15.25%, respectively. Its generalization ability is further assessed on the SODA 10M dataset, where HPRT-DETR outperforms RT-DETR on most evaluation metrics, confirming the model’s effectiveness.
(This article belongs to the Section Sensing and Imaging)

16 pages, 7905 KiB  
Article
Transformer-Driven Algal Target Detection in Real Water Samples: From Dataset Construction and Augmentation to Model Optimization
by Liping Li, Ziyi Liang, Tianquan Liu, Cunyue Lu, Qiuyu Yu and Yang Qiao
Water 2025, 17(3), 430; https://doi.org/10.3390/w17030430 - 4 Feb 2025
Viewed by 779
Abstract
Algae are vital to aquatic ecosystems, with their structure and abundance influencing ecological health. However, automated detection in real water samples is hindered by complex backgrounds, species diversity, and size variations. Traditional methods are costly and species-specific, motivating the adoption of deep learning, yet current studies rely on CNN-based models and limited datasets. To improve the detection accuracy of multiple algal species against real, complex backgrounds, this study collected multi-species algae samples from actual water environments and implemented an integrated Transformer-based framework for automated localization and recognition of small, medium, and large algal species. Specifically, algae samples from five different regions were collected to construct a comprehensive dataset containing 25 algal species with diverse backgrounds and rich category diversity. To address dataset imbalance in minority species, a segmentation-fusion data augmentation method was proposed, which enhanced performance across YOLO, Faster R-CNN, and Deformable DETR models, with YOLO achieving a 7.1% precision increase and a 1.5% mAP improvement. Model optimization focused on an improved Deformable DETR incorporating multi-scale feature extraction, deformable attention mechanisms, and the normalized Wasserstein distance loss function. This improvement enhanced small-target and overlapping-object detection, achieving a 10.4% mAP increase at an intersection-over-union (IoU) threshold of 0.5 and outperforming the unmodified Deformable DETR.

16 pages, 3776 KiB  
Article
MDA-DETR: Enhancing Offending Animal Detection with Multi-Channel Attention and Multi-Scale Feature Aggregation
by Haiyan Zhang, Huiqi Li, Guodong Sun and Feng Yang
Animals 2025, 15(2), 259; https://doi.org/10.3390/ani15020259 - 17 Jan 2025
Cited by 1 | Viewed by 1145
Abstract
Conflicts between humans and animals in agricultural and settlement areas have recently increased, resulting in significant resource loss and risks to human and animal lives. This growing issue presents a global challenge. This paper addresses the detection and identification of offending animals, particularly in obscured or blurry nighttime images. It introduces Multi-Channel Coordinated Attention and Multi-Dimension Feature Aggregation (MDA-DETR), which integrates multi-scale features for enhanced detection accuracy, employing a Multi-Channel Coordinated Attention (MCCA) mechanism to incorporate location, semantic, and long-range dependency information and a Multi-Dimension Feature Aggregation Module (DFAM) for cross-scale feature aggregation. Additionally, the VariFocal Loss function is utilized to assign pixel weights, enhancing detail focus and maintaining accuracy. The dataset comes from the Northeast China Tiger and Leopard National Park and includes images of six common offending animal species. In comprehensive experiments on this dataset, the mAP50 of MDA-DETR was 1.3%, 0.6%, 0.3%, 3%, 1.1%, and 0.5% higher than that of RT-DETR-r18, YOLOv8n, YOLOv9-C, DETR, Deformable-DETR, and DCA-YOLOv8, respectively, indicating that MDA-DETR is superior to these advanced methods.
(This article belongs to the Special Issue Animal–Computer Interaction: Advances and Opportunities)
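The VariFocal Loss named above weights positive examples by an IoU-aware target score q and down-weights easy negatives with a focal factor. A minimal per-prediction sketch, assuming the standard α = 0.75, γ = 2 defaults from the VariFocalNet paper (the abstract itself does not state these values):

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """VariFocal loss for one predicted score p in (0, 1) against target score q.

    Positives (q > 0) are weighted by q itself, so high-quality boxes dominate;
    negatives are scaled by alpha * p**gamma, so confident background
    predictions near 0 contribute almost nothing.
    """
    if q > 0:
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    return -alpha * p ** gamma * math.log(1 - p)
```

In practice this is summed over all class scores of all predictions; the asymmetry (focal scaling only on negatives) is the distinguishing design choice versus the original focal loss.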

18 pages, 2985 KiB  
Article
Green Apple Detector Based on Optimized Deformable Detection Transformer
by Qiaolian Liu, Hu Meng, Ruina Zhao, Xiaohui Ma, Ting Zhang and Weikuan Jia
Agriculture 2025, 15(1), 75; https://doi.org/10.3390/agriculture15010075 - 31 Dec 2024
Cited by 1 | Viewed by 895
Abstract
In the process of smart orchard construction, accurate detection of target fruit is an important guarantee for realizing intelligent orchard management. Green apple detection technology greatly diminishes the need for manual labor, cutting costs and time, while enhancing the automation and efficiency of sorting processes. However, the complex orchard environment, the ever-changing posture of the target fruit, and the difficulty of detecting green fruit similar to the background bring new challenges to green fruit detection. Aiming at these problems, this study takes green apples as the research object and proposes a green apple detection model based on an optimized deformable DETR. The new method first introduces the ResNeXt network to extract image features, reducing information loss in the feature extraction process; secondly, it improves accuracy and optimizes the detection results through the deformable attention mechanism; and finally, it uses a feed-forward network to predict the detection results. The experimental results show that the accuracy of the improved detection model is significantly improved, with an overall AP of 54.1, AP50 of 80.4, AP75 of 58.0, APs of 35.4 for small objects, APm of 60.2 for medium objects, and APl of 85.0 for large objects. The model can provide a theoretical reference for green-target detection of other fruits and vegetables.
(This article belongs to the Special Issue Application of Machine Learning and Data Analysis in Agriculture)

16 pages, 5968 KiB  
Article
Pear Fruit Detection Model in Natural Environment Based on Lightweight Transformer Architecture
by Zheng Huang, Xiuhua Zhang, Hongsen Wang, Huajie Wei, Yi Zhang and Guihong Zhou
Agriculture 2025, 15(1), 24; https://doi.org/10.3390/agriculture15010024 - 25 Dec 2024
Cited by 5 | Viewed by 1285
Abstract
Aiming at the problems of low precision, slow speed, and difficult detection of small target pear fruit in real environments, this paper designs a pear fruit detection model for natural environments based on a lightweight Transformer architecture derived from the RT-DETR model. A Xinli No. 7 fruit dataset covering different environmental conditions is also established. First, based on the original model, the backbone is replaced with a lightweight FasterNet network. Secondly, HiLo, an improved and efficient attention mechanism that extracts high- and low-frequency information, is used to make the model lightweight and improve feature extraction for Xinli No. 7 in complex environments. The CCFM module is reconstructed based on the Slim-Neck method, and the loss function of the original model is replaced with the Shape-NWD small-target detection loss function to enhance the feature extraction capability of the network. Comparison tests against YOLOv5m, YOLOv7, YOLOv8m, YOLOv10m, and Deformable-DETR show that the improved RT-DETR achieves a good balance between model lightweighting and recognition accuracy, exceeding the detection accuracy of the current advanced YOLOv10 algorithm and enabling rapid detection of Xinli No. 7 fruit. The accuracy rate, recall rate, and average accuracy of the improved model reach 93.7%, 91.9%, and 98%, respectively, and compared with the original model, the parameter count, computation, and weight memory are reduced by 48.47%, 56.2%, and 48.31%, respectively. This model provides technical support for Xinli No. 7 fruit detection and model deployment in complex environments.
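The Shape-NWD loss above builds on the normalized Wasserstein distance from the tiny-object-detection literature, which scores box similarity by modeling each box as a 2D Gaussian, so that small boxes are compared less harshly than under IoU. A sketch of the base NWD follows; the shape-aware variant used in the paper presumably adjusts the term weighting, and the constant C (12.8 here) is a dataset-dependent assumption:

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance similarity between (cx, cy, w, h) boxes.

    Each box is modeled as a 2D Gaussian with mean (cx, cy) and covariance
    diag(w^2/4, h^2/4); the squared 2-Wasserstein distance between two such
    Gaussians has the closed form below, and C normalizes it into (0, 1].
    """
    (cxa, cya, wa, ha), (cxb, cyb, wb, hb) = box_a, box_b
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2) / c)
```

Unlike IoU, this similarity stays smooth and nonzero even when small boxes do not overlap at all, which is why it helps small-target regression.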

16 pages, 6568 KiB  
Communication
An Improved DETR Based on Angle Denoising and Oriented Boxes Refinement for Remote Sensing Object Detection
by Hongmei Wang, Chenkai Li, Qiaorong Wu and Jingyu Wang
Remote Sens. 2024, 16(23), 4420; https://doi.org/10.3390/rs16234420 - 26 Nov 2024
Cited by 1 | Viewed by 1799
Abstract
Remote sensing image object detection presents significant challenges due to the difficulty of accurately predicting the rotational angles of ground-oriented objects, coupled with false and missed detections caused by insufficient object information. Moreover, traditional convolutional neural networks are inherently limited in their capacity to capture global contextual information. To address these challenges, a DETR-based remote sensing object detection model is designed for oriented objects. In addition to the backbone and Transformer encoders and decoders, the network includes scenario query guiding modules, oriented boxes refinement modules, auxiliary multiple detectors, and oriented boxes denoising modules. The scenario query guiding module implicitly guides the decoder to focus on object classification information specific to the scene during inference. The multiple deformable attention mechanism is extended to the oriented case and used in the oriented boxes refinement module, which repeatedly corrects the oriented boxes, enhancing the network’s ability to predict them precisely. The improved auxiliary multiple detectors and oriented boxes denoising module are applied only during training to enhance the encoder’s and decoder’s learning of oriented objects. Ablation experiments prove the effectiveness of the designed modules. The detection accuracy of our network on DOTAv1.0 (76.77%) and HRSC2016 (97.01%) is improved over state-of-the-art methods and is especially significantly higher than that of DETR detection algorithms.

15 pages, 596 KiB  
Article
DV-DETR: Improved UAV Aerial Small Target Detection Algorithm Based on RT-DETR
by Xiaolong Wei, Ling Yin, Liangliang Zhang and Fei Wu
Sensors 2024, 24(22), 7376; https://doi.org/10.3390/s24227376 - 19 Nov 2024
Cited by 9 | Viewed by 2723
Abstract
For drone-based detection tasks, accurately identifying small-scale targets like people, bicycles, and pedestrians remains a key challenge. In this paper, we propose DV-DETR, an improved detection model based on the Real-Time Detection Transformer (RT-DETR), specifically optimized for small target detection in high-density scenes. To achieve this, we introduce three main enhancements: (1) ResNet18 as the backbone network to improve feature extraction and reduce model complexity; (2) the integration of recalibration attention units and deformable attention mechanisms in the neck network to enhance multi-scale feature fusion and improve localization accuracy; and (3) the use of the Focaler-IoU loss function to better handle the imbalanced distribution of target scales and focus on challenging samples. Experimental results on the VisDrone2019 dataset show that DV-DETR achieves an mAP@0.5 of 50.1%, a 1.7% improvement over the baseline model, while increasing detection speed from 75 FPS to 90 FPS, meeting real-time processing requirements. These improvements enhance the model’s accuracy and efficiency and offer practical value in complex, high-density urban environments, supporting real-world applications in UAV-based surveillance and monitoring tasks.
(This article belongs to the Section Remote Sensors)
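The Focaler-IoU loss mentioned above reweights box regression by linearly remapping IoU over an interval [d, u], so training focuses on a chosen difficulty band of samples. A minimal sketch (the interval endpoints d = 0.0 and u = 0.95 are illustrative choices, not values taken from the paper):

```python
def focaler_iou(iou, d=0.0, u=0.95):
    """Linearly remap IoU over [d, u]: below d maps to 0, above u to 1.

    Shrinking the interval toward low IoU emphasizes hard samples;
    shifting it toward high IoU emphasizes easy, nearly-correct boxes.
    """
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)

def focaler_iou_loss(iou, d=0.0, u=0.95):
    # Used as a drop-in replacement for the plain 1 - IoU regression loss.
    return 1.0 - focaler_iou(iou, d, u)
```

The same remapping can be composed with GIoU/CIoU-style penalty terms, which is how such losses are typically combined in detector training.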

20 pages, 27367 KiB  
Article
MCG-RTDETR: Multi-Convolution and Context-Guided Network with Cascaded Group Attention for Object Detection in Unmanned Aerial Vehicle Imagery
by Chushi Yu and Yoan Shin
Remote Sens. 2024, 16(17), 3169; https://doi.org/10.3390/rs16173169 - 27 Aug 2024
Cited by 8 | Viewed by 3003
Abstract
In recent years, object detection in unmanned aerial vehicle (UAV) imagery has been a prominent and crucial task, with advancements in drone and remote sensing technologies. However, detecting targets in UAV images poses challenges such as complex backgrounds, severe occlusion, dense small targets, and variable lighting conditions. Despite the notable progress of deep-learning-based object detection algorithms, they still struggle with missed detections and false alarms. In this work, we introduce MCG-RTDETR, an approach based on the real-time detection transformer (RT-DETR) with dual and deformable convolution modules, a cascaded group attention module, a context-guided feature fusion structure with context-guided downsampling, and a more flexible prediction head for precise object detection in UAV imagery. Experimental results on the VisDrone2019 dataset illustrate that our approach achieves the highest AP of 29.7% and AP50 of 58.2%, surpassing several cutting-edge algorithms. Visual results further validate the model’s robustness and capability in complex environments.

21 pages, 6786 KiB  
Article
Bearing-DETR: A Lightweight Deep Learning Model for Bearing Defect Detection Based on RT-DETR
by Minggao Liu, Haifeng Wang, Luyao Du, Fangsong Ji and Ming Zhang
Sensors 2024, 24(13), 4262; https://doi.org/10.3390/s24134262 - 30 Jun 2024
Cited by 14 | Viewed by 3977
Abstract
Detecting bearing defects accurately and efficiently is critical for industrial safety and efficiency. This paper introduces Bearing-DETR, a deep learning model optimised using the Real-Time Detection Transformer (RT-DETR) architecture. Enhanced with Dysample Dynamic Upsampling, Efficient Model Optimization (EMO) with Meta-Mobile Blocks (MMB), and Deformable Large Kernel Attention (D-LKA), Bearing-DETR offers significant improvements in defect detection while maintaining a lightweight framework suitable for low-resource devices. Validated on a dataset from a chemical plant, Bearing-DETR outperformed the standard RT-DETR, achieving a mean average precision (mAP) of 94.3% at IoU = 0.5 and 57.5% at IoU = 0.5–0.95. It also reduced floating-point operations (FLOPs) to 8.2 G and parameters to 3.2 M, underscoring its enhanced efficiency and reduced computational demands. These results demonstrate the potential of Bearing-DETR to transform maintenance strategies and quality control across manufacturing environments, emphasising adaptability and impact on sustainability and operational costs.
(This article belongs to the Special Issue Sensors and Machine-Learning Based Signal Processing)
