Search Results (506)

Search Parameters:
Keywords = YOLOv7-Pose

27 pages, 49730 KB  
Article
AMSRDet: An Adaptive Multi-Scale UAV Infrared-Visible Remote Sensing Vehicle Detection Network
by Zekai Yan and Yuheng Li
Sensors 2026, 26(3), 817; https://doi.org/10.3390/s26030817 - 26 Jan 2026
Abstract
Unmanned Aerial Vehicle (UAV) platforms enable flexible and cost-effective vehicle detection for intelligent transportation systems, yet small-scale vehicles in complex aerial scenes pose substantial challenges from extreme scale variations, environmental interference, and single-sensor limitations. We present AMSRDet (Adaptive Multi-Scale Remote Sensing Detector), an adaptive multi-scale detection network fusing infrared (IR) and visible (RGB) modalities for robust UAV-based vehicle detection. Our framework comprises four novel components: (1) a MobileMamba-based dual-stream encoder extracting complementary features via Selective State-Space 2D (SS2D) blocks with linear complexity O(HWC), achieving 2.1× efficiency improvement over standard Transformers; (2) a Cross-Modal Global Fusion (CMGF) module capturing global dependencies through spatial-channel attention while suppressing modality-specific noise via adaptive gating; (3) a Scale-Coordinate Attention Fusion (SCAF) module integrating multi-scale features via coordinate attention and learned scale-aware weighting, improving small object detection by 2.5 percentage points; and (4) a Separable Dynamic Decoder generating scale-adaptive predictions through content-aware dynamic convolution, reducing computational cost by 48.9% compared to standard DETR decoders. On the DroneVehicle dataset, AMSRDet achieves 45.8% mAP@0.5:0.95 (81.2% mAP@0.5) at 68.3 Frames Per Second (FPS) with 28.6 million (M) parameters and 47.2 Giga Floating Point Operations (GFLOPs), outperforming twenty state-of-the-art detectors including YOLOv12 (+0.7% mAP), DEIM (+0.8% mAP), and Mamba-YOLO (+1.5% mAP). Cross-dataset evaluation on Camera-vehicle yields 52.3% mAP without fine-tuning, demonstrating strong generalization across viewpoints and scenarios. Full article
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
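
Note: the CMGF module above fuses RGB and IR feature maps with spatial-channel attention and adaptive gating. The abstract does not give its exact formulation, so the sketch below only illustrates the generic "learn a gate, blend the two modalities" pattern it alludes to; the module name `GatedFusion`, channel count, and tensor shapes are illustrative assumptions, not the paper's design.

```python
# Minimal sketch of adaptive gated fusion between RGB and IR feature maps.
# This is NOT the paper's CMGF module; it only illustrates the general
# gated cross-modal blending idea the abstract describes.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Gate predictor: looks at both modalities and outputs a per-pixel,
        # per-channel weight in [0, 1].
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb_feat, ir_feat], dim=1))
        # Convex combination: g favors RGB, (1 - g) favors IR.
        return g * rgb_feat + (1.0 - g) * ir_feat

if __name__ == "__main__":
    fuse = GatedFusion(channels=64)
    rgb = torch.randn(1, 64, 32, 32)   # stand-in backbone features
    ir = torch.randn(1, 64, 32, 32)
    print(fuse(rgb, ir).shape)          # torch.Size([1, 64, 32, 32])
```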

20 pages, 49658 KB  
Article
Dead Chicken Identification Method Based on a Spatial-Temporal Graph Convolution Network
by Jikang Yang, Chuang Ma, Haikun Zheng, Zhenlong Wu, Xiaohuan Chao, Cheng Fang and Boyi Xiao
Animals 2026, 16(3), 368; https://doi.org/10.3390/ani16030368 - 23 Jan 2026
Abstract
In intensive cage rearing systems, accurate dead hen detection remains difficult due to complex environments, severe occlusion, and the high visual similarity between dead hens and live hens in a prone posture. To address these issues, this study proposes a dead hen identification method based on a Spatial-Temporal Graph Convolutional Network (STGCN). Unlike conventional static image-based approaches, the proposed method introduces temporal information to enable dynamic spatial-temporal modeling of hen health states. First, a multimodal fusion algorithm is applied to visible light and thermal infrared images to strengthen multimodal feature representation. Then, an improved YOLOv7-Pose algorithm is used to extract the skeletal keypoints of individual hens, and the ByteTrack algorithm is employed for multi-object tracking. Based on these results, spatial-temporal graph-structured data of hens are constructed by integrating spatial and temporal dimensions. Finally, a spatial-temporal graph convolution model is used to identify dead hens by learning spatial-temporal dependency features from skeleton sequences. Experimental results show that the improved YOLOv7-Pose model achieves an average precision (AP) of 92.8% in keypoint detection. Based on the constructed spatial-temporal graph data, the dead hen identification model reaches an overall classification accuracy of 99.0%, with an accuracy of 98.9% for the dead hen category. These results demonstrate that the proposed method effectively reduces interference caused by feeder occlusion and ambiguous visual features. By using dynamic spatial-temporal information, the method substantially improves robustness and accuracy of dead hen detection in complex cage rearing environments, providing a new technical route for intelligent monitoring of poultry health status. Full article
(This article belongs to the Special Issue Welfare and Behavior of Laying Hens)
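
Note: the STGCN pipeline above turns tracked keypoints into skeleton graphs. One spatial graph-convolution step over a single frame can be written as X' = D^{-1/2}(A + I)D^{-1/2} X W; the numpy sketch below illustrates only that generic step, with a toy 5-joint chain skeleton and random weights that are not taken from the paper.

```python
# Minimal numpy sketch of one spatial graph-convolution step on a skeleton
# graph: X' = D^{-1/2} (A + I) D^{-1/2} X W.
# The 5-joint chain adjacency is a toy example, not the paper's hen skeleton.
import numpy as np

num_joints, in_dim, out_dim = 5, 3, 8

# Toy chain skeleton: joint i connected to joint i+1.
A = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

A_hat = A + np.eye(num_joints)                  # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt        # symmetric normalization

X = np.random.randn(num_joints, in_dim)         # per-joint features (x, y, confidence)
W = np.random.randn(in_dim, out_dim)            # learnable weights (random here)

X_next = A_norm @ X @ W                         # aggregated, transformed features
print(X_next.shape)                             # (5, 8)
```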

25 pages, 12600 KB  
Article
Underwater Object Recovery Using a Hybrid-Controlled ROV with Deep Learning-Based Perception
by Inés Pérez-Edo, Salvador López-Barajas, Raúl Marín-Prades and Pedro J. Sanz
J. Mar. Sci. Eng. 2026, 14(2), 198; https://doi.org/10.3390/jmse14020198 - 18 Jan 2026
Abstract
The deployment of large remotely operated vehicles (ROVs) or autonomous underwater vehicles (AUVs) typically requires support vessels, crane systems, and specialized personnel, resulting in increased logistical complexity and operational costs. In this context, lightweight and modular underwater robots have emerged as a cost-effective alternative, capable of reaching significant depths and performing tasks traditionally associated with larger platforms. This article presents a system architecture for recovering a known object using a hybrid-controlled ROV, integrating autonomous perception, high-level interaction, and low-level control. The proposed architecture includes a perception module that estimates the object pose using a Perspective-n-Point (PnP) algorithm, combining object segmentation from a YOLOv11-seg network with 2D keypoints obtained from a YOLOv11-pose model. In addition, a Natural Language ROS Agent is incorporated to enable high-level command interaction between the operator and the robot. These modules interact with low-level controllers that regulate the vehicle degrees of freedom and with autonomous behaviors such as target approach and grasping. The proposed system is evaluated through simulation and experimental tank trials, including object recovery experiments conducted in a 12 × 8 × 5 m test tank at CIRTESU, as well as perception validation in simulated, tank, and harbor scenarios. The results demonstrate successful recovery of a black box using a BlueROV2 platform, showing that architectures of this type can effectively support operators in underwater intervention tasks, reducing operational risk, deployment complexity, and mission costs. Full article
(This article belongs to the Section Ocean Engineering)
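
Note: the perception module above estimates the object pose by feeding detected 2D keypoints and the known object geometry to a PnP solver. A minimal OpenCV call follows the pattern below; the 3D model points, camera intrinsics, and 2D points are placeholders, not values from the paper, where the keypoints come from the YOLOv11-pose model.

```python
# Minimal sketch of Perspective-n-Point pose estimation with OpenCV.
# Object model points, camera intrinsics and detected 2D keypoints are
# placeholder values; in the paper they come from the known black box and
# the YOLOv11-pose detections.
import numpy as np
import cv2

# Four coplanar 3D corners of the known object's face, in its own frame (metres).
object_points = np.array([
    [0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0],
    [0.3, 0.2, 0.0],
    [0.0, 0.2, 0.0],
], dtype=np.float64)

# Corresponding 2D keypoints detected in the image (pixels).
image_points = np.array([
    [320.0, 240.0],
    [420.0, 238.0],
    [422.0, 310.0],
    [318.0, 312.0],
], dtype=np.float64)

camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # assume an undistorted image

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
print(ok, rvec.ravel(), tvec.ravel())  # rotation (Rodrigues vector) and translation
```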

28 pages, 26208 KB  
Article
Real-Time Target-Oriented Grasping Framework for Resource-Constrained Robots
by Dongxiao Han, Haorong Li, Yuwen Li and Shuai Chen
Sensors 2026, 26(2), 645; https://doi.org/10.3390/s26020645 - 18 Jan 2026
Abstract
Target-oriented grasping has become increasingly important in household and industrial environments, and deploying such systems on mobile robots is particularly challenging due to limited computational resources. To address these limitations, we present an efficient framework for real-time target-oriented grasping on resource-constrained platforms, supporting both click-based grasping for unknown objects and category-based grasping for known objects. To reduce model complexity while maintaining detection accuracy, YOLOv8 is compressed using a structured pruning method. For grasp pose generation, a pretrained GR-ConvNetv2 predicts candidate grasps, which are restricted to the target object using masks generated by MobileSAMv2. A geometry-based correction module then adjusts the position, angle, and width of the initial grasp poses to improve grasp accuracy. Finally, extensive experiments were carried out on the Cornell and Jacquard datasets, as well as in real-world single-object, cluttered, and stacked scenarios. The proposed framework achieves grasp success rates of 98.8% on the Cornell dataset and 95.8% on the Jacquard dataset, with over 90% success in real-world single-object and cluttered settings, while maintaining real-time performance of 67 ms and 75 ms per frame in the click-based and category-specified modes, respectively. These experiments demonstrate that the proposed framework achieves high grasping accuracy and robust performance, with an efficient design that enables deployment on mobile and resource-constrained robots. Full article
(This article belongs to the Section Sensors and Robotics)
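
Note: the framework restricts GR-ConvNetv2 grasp candidates to the target using masks from MobileSAMv2. The sketch below shows only that masking step on a grasp-quality map; the array shapes and the simple "best masked pixel" selection rule are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch: keep only grasp candidates whose centre lies inside the
# target's segmentation mask, then pick the highest-quality one. The quality
# map and mask are random stand-ins for GR-ConvNetv2 and MobileSAMv2 outputs.
import numpy as np

h, w = 224, 224
quality = np.random.rand(h, w)            # per-pixel grasp quality in [0, 1]
target_mask = np.zeros((h, w), dtype=bool)
target_mask[80:150, 90:160] = True        # pretend this is the target object

masked_quality = np.where(target_mask, quality, 0.0)
best_idx = np.unravel_index(np.argmax(masked_quality), masked_quality.shape)
print("best grasp centre (row, col):", best_idx,
      "quality:", masked_quality[best_idx])
```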

24 pages, 4886 KB  
Article
YOLOv8-ECCα: Enhancing Object Detection for Power Line Asset Inspection Under Real-World Visual Constraints
by Rita Ait el haj, Badr-Eddine Benelmostafa and Hicham Medromi
Algorithms 2026, 19(1), 66; https://doi.org/10.3390/a19010066 - 12 Jan 2026
Abstract
Unmanned Aerial Vehicles (UAVs) have revolutionized power-line inspection by enhancing efficiency, safety, and enabling predictive maintenance through frequent remote monitoring. Central to automated UAV-based inspection workflows is the object detection stage, which transforms raw imagery into actionable data by identifying key components such as insulators, dampers, and shackles. However, the real-world complexity of inspection scenes poses significant challenges to detection accuracy. For example, the InsPLAD-det dataset—characterized by over 30,000 annotations across diverse tower structures and viewpoints, with more than 40% of components partially occluded—illustrates the visual and structural variability typical of UAV inspection imagery. In this study, we introduce YOLOv8-ECCα, a novel object detector tailored for these demanding inspection conditions. Our contributions include: (1) integrating CoordConv, selected over deformable convolution for its efficiency in preserving fine spatial cues without heavy computation; (2) adding Efficient Channel Attention (ECA), preferred to SE or CBAM for its ability to enhance feature relevance using only a single 1D convolution and no dimensionality reduction; and (3) adopting Alpha-IoU, chosen instead of CIoU or GIoU to produce smoother gradients and more stable convergence, particularly under partial overlap or occlusion. Evaluated on the InsPLAD-det dataset, YOLOv8-ECCα achieves an mAP@50 of 82.75%, outperforming YOLOv8s (81.89%) and YOLOv9-E (82.61%) by +0.86% and +0.14%, respectively, while maintaining real-time inference at 86.7 FPS—exceeding the baseline by +2.3 FPS. Despite these improvements, the model retains a compact footprint (28.5 GFLOPs, 11.1 M parameters), confirming its suitability for embedded UAV deployment in real inspection environments. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
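
Note: ECA, chosen over SE and CBAM above, weights channels with a single 1D convolution over the globally pooled channel descriptor, with no dimensionality reduction. The sketch below follows that published recipe; the kernel size of 3 is a common default, not a value reported in this article.

```python
# Minimal sketch of Efficient Channel Attention (ECA): global average pool,
# one 1D convolution across channels, sigmoid gate. Kernel size 3 is a
# common default, not a value taken from this article.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel descriptor (B, C)
        y = x.mean(dim=(2, 3))
        # Treat channels as a 1D sequence: (B, 1, C) -> (B, 1, C)
        y = self.conv(y.unsqueeze(1))
        y = self.sigmoid(y).squeeze(1)          # (B, C)
        return x * y[:, :, None, None]          # re-weight channels

if __name__ == "__main__":
    eca = ECA()
    print(eca(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```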

21 pages, 5664 KB  
Article
M2S-YOLOv8: Multi-Scale and Asymmetry-Aware Ship Detection for Marine Environments
by Peizheng Li, Dayong Qiao, Jianyi Mu and Linlin Qi
Sensors 2026, 26(2), 502; https://doi.org/10.3390/s26020502 - 12 Jan 2026
Abstract
Ship detection serves as a core foundational task for marine environmental perception. However, in real marine scenarios, dense vessel traffic often causes severe target occlusion while multi-scale targets, asymmetric vessel geometries, and harsh conditions (e.g., haze, low illumination) further degrade image quality. These factors pose significant challenges to vision-based ship detection methods. To address these issues, we propose M2S-YOLOv8, an improved framework based on YOLOv8, which integrates three key enhancements: First, a Multi-Scale Asymmetry-aware Parallelized Patch-wise Attention (MSA-PPA) module is designed in the backbone to strengthen the perception of multi-scale and geometrically asymmetric vessel targets. Second, a Deformable Convolutional Upsampling (DCNUpsample) operator is introduced in the Neck network to enable adaptive feature fusion with high computational efficiency. Third, a Wasserstein-Distance-Based Weighted Normalized CIoU (WA-CIoU) loss function is developed to alleviate gradient imbalance in small-target regression, thereby improving localization stability. Experimental results on the Unmanned Vessel Zhoushan Perception Dataset (UZPD) and the open-source Singapore Maritime Dataset (SMD) demonstrate that M2S-YOLOv8 achieves a balanced performance between lightweight design and real-time inference, showcasing strong potential for reliable deployment on edge devices of unmanned marine platforms. Full article
(This article belongs to the Section Environmental Sensing)
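
Note: the WA-CIoU loss above builds on the standard CIoU term, which subtracts a centre-distance penalty and an aspect-ratio penalty from the IoU. A plain-Python version of that base term is sketched below for reference; the Wasserstein-based weighting itself is not reproduced because the abstract does not give its formula.

```python
# Minimal sketch of the standard CIoU term that WA-CIoU builds on:
# CIoU = IoU - rho^2 / c^2 - alpha * v.
import math

def ciou(box1, box2):
    """Boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (a1 + a2 - inter + 1e-9)

    # Squared distance between box centres.
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2

    # Squared diagonal of the smallest enclosing box.
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    # Aspect-ratio consistency term.
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

print(ciou((0, 0, 10, 10), (2, 2, 12, 12)))
```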

22 pages, 4804 KB  
Article
SER-YOLOv8: An Early Forest Fire Detection Model Integrating Multi-Path Attention and NWD
by Juan Liu, Jiaxin Feng, Shujie Wang, Yian Ding, Jianghua Guo, Yuhang Li, Wenxuan Xue and Jie Hu
Forests 2026, 17(1), 93; https://doi.org/10.3390/f17010093 - 10 Jan 2026
Abstract
Forest ecosystems, as vital natural resources, are increasingly endangered by wildfires. Effective forest fire management relies on the accurate and early detection of small-scale flames and smoke. However, the complex and dynamic forest environment, along with the small size and irregular shape of early fire indicators, poses significant challenges to reliable early warning systems. To address these issues, this paper introduces SER-YOLOv8, an enhanced detection model based on the YOLOv8 architecture. The model incorporates the RepNCSPELAN4 module and an SPPELAN structure to strengthen multi-scale feature representation. Furthermore, to improve small target localization, the Normalized Wasserstein Distance (NWD) loss is adopted, providing a more robust similarity measure than traditional IoU-based losses. The newly designed SERDet module deeply integrates a multi-scale feature extraction mechanism with a multi-path fused attention mechanism, significantly enhancing the recognition capability for flame targets under complex backgrounds. Depthwise separable convolution (DWConv) is utilized to reduce parameters and boost inference efficiency. Experiments on the M4SFWD dataset show that the proposed method improves mAP50 by 1.2% for flames and 2.4% for smoke, with a 1.5% overall gain in mAP50–95 over the baseline YOLOv8, outperforming existing mainstream models and offering a reliable solution for forest fire prevention. Full article
(This article belongs to the Section Natural Hazards and Risk Management)
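
Note: the NWD loss mentioned above models each box as a 2D Gaussian and compares boxes through a normalized Wasserstein distance, NWD = exp(−√(W₂²)/C). The sketch follows that published definition; the constant C is dataset-dependent, and the value 12.8 below is only an illustrative choice, not one reported in this article.

```python
# Minimal sketch of the Normalized Wasserstein Distance between two boxes
# modelled as 2D Gaussians N([cx, cy], diag(w/2, h/2)^2):
#   W2^2 = ||[cx1, cy1, w1/2, h1/2] - [cx2, cy2, w2/2, h2/2]||^2
#   NWD  = exp(-sqrt(W2^2) / C)
# C is dataset-dependent; 12.8 below is only an illustrative value.
import math

def nwd(box1, box2, c: float = 12.8):
    """Boxes as (cx, cy, w, h)."""
    d2 = ((box1[0] - box2[0]) ** 2 +
          (box1[1] - box2[1]) ** 2 +
          ((box1[2] - box2[2]) / 2) ** 2 +
          ((box1[3] - box2[3]) / 2) ** 2)
    return math.exp(-math.sqrt(d2) / c)

# Two small, slightly offset boxes: NWD stays informative even when IoU is 0.
print(nwd((10, 10, 4, 4), (13, 10, 4, 4)))
```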

25 pages, 92335 KB  
Article
A Lightweight Dynamic Counting Algorithm for the Maize Seedling Population in Agricultural Fields for Embedded Applications
by Dongbin Liu, Jiandong Fang and Yudong Zhao
Agronomy 2026, 16(2), 176; https://doi.org/10.3390/agronomy16020176 - 10 Jan 2026
Abstract
In the field management of maize, phenomena such as missed sowing and empty seedlings directly affect the final yield. By implementing seedling replenishment activities and promptly evaluating seedling growth, maize output can be increased by improving seedling survival rates. To address the challenges posed by complex field environments (including varying light conditions, weeds, and foreign objects), as well as the performance limitations of model deployment on resource-constrained devices, this study proposes a Lightweight Real-Time You Only Look Once (LRT-YOLO) model. This model builds upon the You Only Look Once version 11n (YOLOv11n) framework by designing a lightweight, optimized feature architecture (OF) that enables the model to focus on the characteristics of small to medium-sized maize seedlings. The feature fusion network incorporates two key modules: the Feature Complementary Mapping Module (FCM) and the Multi-Kernel Perception Module (MKP). The FCM captures global features of maize seedlings through multi-scale interactive learning, while the MKP enhances the network’s ability to learn multi-scale features by combining different convolution kernels with pointwise convolution. In the detection head component, the introduction of an NMS-free design philosophy has significantly enhanced the model’s detection performance while simultaneously reducing its inference time. The experiments show that the mAP50 and mAP50:95 of the LRT-YOLO model reached 95.9% and 63.6%, respectively. The model has only 0.86M parameters and a size of just 2.35 M, representing reductions of 66.67% and 54.89% in the number of parameters and model size compared to YOLOv11n. To enable mobile deployment in field environments, this study integrates the LRT-YOLO model with the ByteTrack multi-object tracking algorithm and deploys it on the NVIDIA Jetson AGX Orin platform, utilizing OpenCV tools to achieve real-time visualization of maize seedling tracking and counting. Experiments demonstrate that the frame rate (FPS) achieved with TensorRT acceleration reached 23.49, while the inference time decreased by 38.93%. Regarding counting performance, when tested using static image data, the coefficient of determination (R2) and root mean square error (RMSE) were 0.988 and 5.874, respectively. The cross-line counting method was applied to test the video data, resulting in an R2 of 0.971 and an RMSE of 16.912, respectively. Experimental results show that the proposed method demonstrates efficient performance on edge devices, providing robust technical support for the rapid, non-destructive counting of maize seedlings in field environments. Full article
(This article belongs to the Section Precision and Digital Agriculture)
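
Note: the cross-line counting used in the video evaluation above increments the count when a tracked seedling crosses a virtual line. The sketch below shows only that per-track logic; the horizontal line position and the track update interface are illustrative assumptions, and in the paper the track IDs come from ByteTrack.

```python
# Minimal sketch of cross-line counting on tracked centroids: a track is
# counted once when its centroid moves from one side of a virtual line to
# the other. The line position and toy trajectories are illustrative only.
LINE_Y = 200  # virtual counting line (pixel row); an assumed value

counted_ids = set()
count = 0

def update(track_id: int, prev_cy: float, curr_cy: float) -> None:
    """Call once per frame for each track with previous/current centroid y."""
    global count
    crossed = (prev_cy < LINE_Y <= curr_cy) or (curr_cy < LINE_Y <= prev_cy)
    if crossed and track_id not in counted_ids:
        counted_ids.add(track_id)
        count += 1

# Toy trajectories: track 1 crosses the line, track 2 does not.
update(1, 190.0, 205.0)
update(2, 150.0, 160.0)
update(1, 205.0, 215.0)   # already counted, ignored
print("seedlings counted:", count)   # 1
```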

25 pages, 7611 KB  
Article
BFRI-YOLO: Harmonizing Multi-Scale Features for Precise Small Object Detection in Aerial Imagery
by Xue Zeng, Shenghong Fang and Qi Sun
Electronics 2026, 15(2), 297; https://doi.org/10.3390/electronics15020297 - 9 Jan 2026
Abstract
Identifying minute targets within UAV-acquired imagery continues to pose substantial technical hurdles, primarily due to blurred boundaries, scarce textural details, and drastic scale variations amidst complex backgrounds. In response to these limitations, this paper proposes BFRI-YOLO, an enhanced architecture based on the YOLOv11n baseline. The framework is built upon four synergistic components designed to achieve high-precision localization and robust feature representation. First, we construct a Balanced Adaptive Feature Pyramid Network (BAFPN) that utilizes a resolution-aware attention mechanism to promote bidirectional interaction between deep and shallow features. This is complemented by incorporating the Receptive Field Convolutional Block Attention Module (RFCBAM) to refine the backbone network. By constructing the C3K2_RFCBAM block, we effectively enhance the feature representation of small objects across diverse receptive fields. To further refine the prediction phase, we develop a Four-Shared Detail Enhancement Detection Head (FSDED) to improve both efficiency and stability. Finally, regarding the loss function, we formulate the Inner-WIoU strategy by integrating auxiliary bounding boxes with dynamic focusing mechanisms to ensure precise target localization. The experimental results on the VisDrone2019 benchmark demonstrate that our method secures mAP@0.5 and mAP@0.5:0.95 scores of 42.1% and 25.6%, respectively, outperforming the baseline by 8.8% and 6.2%. Extensive tests on the TinyPerson and DOTA1.0 datasets further validate the robust generalization capability of our model, confirming that BFRI-YOLO strikes a superior balance between detection accuracy and computational overhead in aerial scenes. Full article
(This article belongs to the Section Artificial Intelligence)
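
Note: the Inner-WIoU strategy above combines auxiliary "inner" bounding boxes with WIoU's dynamic focusing. The auxiliary-box part, rescaling both boxes about their centres by a ratio before computing IoU, is sketched below; the ratio of 0.75 is an illustrative value, and the WIoU focusing term is omitted because the abstract does not specify it.

```python
# Minimal sketch of the auxiliary-box idea behind Inner-IoU: rescale both
# boxes about their centres by a ratio, then compute IoU on the rescaled
# boxes. ratio=0.75 is illustrative; the dynamic-focusing part of the
# paper's Inner-WIoU loss is not reproduced here.
def rescale(box, ratio):
    """box = (x1, y1, x2, y2); returns the box rescaled about its centre."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * ratio, (box[3] - box[1]) * ratio
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(b1, b2):
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-9)

def inner_iou(pred, gt, ratio=0.75):
    return iou(rescale(pred, ratio), rescale(gt, ratio))

print(iou((0, 0, 10, 10), (3, 3, 13, 13)))        # plain IoU
print(inner_iou((0, 0, 10, 10), (3, 3, 13, 13)))  # IoU of the inner boxes
```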

21 pages, 4437 KB  
Article
BAE-UNet: A Background-Aware and Edge-Enhanced Segmentation Network for Two-Stage Pest Recognition in Complex Field Environments
by Jing Chang, Xuefang Li, Xingye Ze, Xue Ding and He Gong
Agronomy 2026, 16(2), 166; https://doi.org/10.3390/agronomy16020166 - 8 Jan 2026
Abstract
To address issues such as significant scale differences, complex pose variations, strong background interference, and similar category characteristics of pests in the images obtained from field traps, this study proposes a pest recognition method based on a two-stage “segmentation–detection” approach to improve the accuracy of field pest situation monitoring. In the first stage, an improved segmentation model, BAE-UNet (Background-Aware and Edge-Enhanced U-Net), is adopted. Based on the classic U-Net framework, a Background-Aware Contextual Module (BACM), a Spatial-Channel Refinement and Attention Module (SCRA), and a Multi-Scale Edge-Aware Spatial Attention Module (MESA) are introduced. These modules respectively optimize multi-scale feature extraction, background suppression, and boundary refinement, effectively removing complex background information and accurately extracting pest body regions. In the second stage, the segmented pest body images are input into the YOLOv8 model to achieve precise pest detection and classification. Experimental results show that BAE-UNet performs excellently in the segmentation task, achieving an mIoU of 0.930, a Dice coefficient of 0.951, and a Boundary F1 of 0.943, significantly outperforming both the baseline U-Net and mainstream models such as DeepLabV3+. After segmentation preprocessing, the detection performance of YOLOv8 is also significantly improved. The precision, recall, mAP50, and mAP50–95 increase from 0.748, 0.796, 0.818, and 0.525 to 0.958, 0.971, 0.977, and 0.882, respectively. The results verify that the proposed two-stage recognition method can effectively suppress background interference, enhance the stability and generalization ability of the model in complex natural scenes, and provide an efficient and feasible technical approach for intelligent pest trap image recognition and pest situation monitoring. Full article
(This article belongs to the Section Pest and Disease Management)
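
Note: the segmentation stage above is scored with mIoU, Dice, and Boundary F1. For reference, the per-mask IoU and Dice behind the first two scores are the simple overlap ratios sketched below; the random masks are placeholders, not data from the study.

```python
# Minimal sketch of per-mask IoU and Dice on binary masks, the quantities
# behind the mIoU and Dice scores reported above. The masks are random
# placeholders.
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / (union + 1e-9)

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-9)

rng = np.random.default_rng(0)
pred = rng.random((64, 64)) > 0.5
gt = rng.random((64, 64)) > 0.5
print(f"IoU={iou(pred, gt):.3f}  Dice={dice(pred, gt):.3f}")
```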

16 pages, 5236 KB  
Article
Intelligent Disassembly System for PCB Components Integrating Multimodal Large Language Model and Multi-Agent Framework
by Li Wang, Liu Ouyang, Huiying Weng, Xiang Chen, Anna Wang and Kexin Zhang
Processes 2026, 14(2), 227; https://doi.org/10.3390/pr14020227 - 8 Jan 2026
Abstract
The escalating volume of waste electrical and electronic equipment (WEEE) poses a significant global environmental challenge. The disassembly of printed circuit boards (PCBs), a critical step for resource recovery, remains inefficient due to limitations in the adaptability and dexterity of existing automated systems. This paper proposes an intelligent disassembly system for PCB components that integrates a multimodal large language model (MLLM) with a multi-agent framework. The MLLM serves as the system’s cognitive core, enabling high-level visual-language understanding and task planning by converting images into semantic descriptions and generating disassembly strategies. A state-of-the-art object detection algorithm (YOLOv13) is incorporated to provide fine-grained component localization. This high-level intelligence is seamlessly connected to low-level execution through a multi-agent framework that orchestrates collaborative dual robotic arms. One arm controls a heater for precise solder melting, while the other performs fine “probing-grasping” actions guided by real-time force feedback. Experiments were conducted on 30 decommissioned smart electricity meter PCBs, evaluating the system on recognition rate, capture rate, melting rate, and time consumption for seven component types. Results demonstrate that the system achieved a 100% melting rate across all components and high recognition rates (90–100%), validating its strengths in perception and thermal control. However, the capture rate varied significantly, highlighting the grasping of small, low-profile components as the primary bottleneck. This research presents a significant step towards autonomous, non-destructive e-waste recycling by effectively combining high-level cognitive intelligence with low-level robotic control, while also clearly identifying key areas for future improvement. Full article
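
Note: the evaluation above reports recognition, capture, and melting rates per component type over 30 boards; each rate is a per-category success ratio. The small aggregation sketch below shows that bookkeeping only; the trial records and component names are toy placeholders, not data from the paper's experiment.

```python
# Minimal sketch of aggregating per-component recognition/capture/melting
# rates from trial logs. The records are toy placeholders, not data from
# the paper's 30-board experiment.
from collections import defaultdict

# Each record: (component_type, stage, success)
records = [
    ("component_A", "recognition", True),
    ("component_A", "capture", False),
    ("component_A", "melting", True),
    ("component_B", "recognition", True),
    ("component_B", "capture", True),
    ("component_B", "melting", True),
]

totals = defaultdict(int)
successes = defaultdict(int)
for comp, stage, ok in records:
    totals[(comp, stage)] += 1
    successes[(comp, stage)] += int(ok)

for key in sorted(totals):
    rate = successes[key] / totals[key]
    print(f"{key[0]:12s} {key[1]:12s} {rate:.0%}")
```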

19 pages, 1933 KB  
Article
ESS-DETR: A Lightweight and High-Accuracy UAV-Deployable Model for Surface Defect Detection
by Yunze Wang, Yong Yao, Heng Zheng and Yeqing Han
Drones 2026, 10(1), 43; https://doi.org/10.3390/drones10010043 - 8 Jan 2026
Abstract
Defects on large-scale structural surfaces can compromise integrity and pose safety hazards, highlighting the need for efficient automated inspection. UAVs provide a flexible and effective platform for such inspections, yet traditional vision-based methods often require high computational resources and show limited sensitivity to small defects, restricting practical UAV deployment. To address these challenges, we propose ESS-DETR, a lightweight and high-precision detection model designed for UAV-based surface inspection, built upon three core modules: an EMO-inspired lightweight backbone that integrates convolution and efficient attention mechanisms to reduce parameters; a Scale-Decoupled Loss that adaptively balances targets of various sizes to enhance accuracy and robustness for small and irregular defect patterns frequently encountered in UAV imagery; and an SPPELAN multi-scale fusion module that improves feature discrimination under complex reflections, shadows, and lighting variations typical of aerial inspection environments. Experimental results demonstrate that ESS-DETR reduces computational complexity from 103.4 to 60.5 GFLOPs and achieves a Precision of 0.837, a Recall of 0.738, and an mAP of 79, outperforming Faster R-CNN, RT-DETR, and YOLOv11, particularly for small-scale defects. These results confirm that ESS-DETR effectively balances accuracy, efficiency, and onboard deployability, providing a practical solution for intelligent UAV-based surface inspection. Full article
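
Note: the SPPELAN fusion module above combines ELAN aggregation with SPP-style pooling. The SPP-style part, concatenating features max-pooled at growing receptive fields, is the widely used pattern sketched below; the kernel size and channel counts are illustrative, not the paper's configuration.

```python
# Minimal sketch of the SPP-style pooling that SPPELAN builds on: repeated
# max-pooling enlarges the receptive field, and the pooled maps are
# concatenated. Kernel size and channel counts are illustrative only.
import torch
import torch.nn as nn

class SPPFLike(nn.Module):
    def __init__(self, channels: int, k: int = 5):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.proj = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p1 = self.pool(x)            # receptive field k
        p2 = self.pool(p1)           # roughly 2k - 1
        p3 = self.pool(p2)           # roughly 3k - 2
        return self.proj(torch.cat([x, p1, p2, p3], dim=1))

if __name__ == "__main__":
    spp = SPPFLike(64)
    print(spp(torch.randn(1, 64, 20, 20)).shape)  # torch.Size([1, 64, 20, 20])
```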

11 pages, 4106 KB  
Article
UAV Detection in Low-Altitude Scenarios Based on the Fusion of Unaligned Dual-Spectrum Images
by Zishuo Huang, Guhao Zhao, Yarong Wu and Chuanjin Dai
Drones 2026, 10(1), 40; https://doi.org/10.3390/drones10010040 - 7 Jan 2026
Abstract
The threat posed by unauthorized drones to public airspace has become increasingly critical. To address the challenge of UAV detection in unaligned visible–infrared dual-spectral images, we present a novel framework that comprises two sequential stages: image alignment and object detection. The Speeded-Up Robust Features (SURF) algorithm is applied for feature matching, combined with the gray centroid method to remove mismatched feature points. A plane-adaptive pixel remapping algorithm is further developed to achieve image fusion. In addition, an enhanced YOLOv11 model with a modified loss function is employed to achieve robust object detection in the fused images. Experimental results demonstrate that the proposed method enables precise pixel-level dual-spectrum fusion and reliable UAV detection under diverse and complex conditions. Full article
(This article belongs to the Special Issue Detection, Identification and Tracking of UAVs and Drones)
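
Note: the alignment stage above matches features between the visible and infrared frames and remaps one onto the other. A minimal OpenCV version of that pattern is below; ORB stands in for SURF (SURF requires the opencv-contrib build), a single RANSAC homography stands in for the paper's plane-adaptive pixel remapping and gray-centroid mismatch removal, and the file names are assumptions.

```python
# Minimal sketch of aligning an infrared frame to a visible frame via
# feature matching + homography. ORB stands in for SURF, and a single
# homography stands in for the paper's plane-adaptive pixel remapping.
import numpy as np
import cv2

visible = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE)    # assumed file names
infrared = cv2.imread("infrared.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(infrared, None)
kp2, des2 = orb.detectAndCompute(visible, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC rejects remaining mismatches (the paper uses a gray-centroid check).
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
aligned_ir = cv2.warpPerspective(infrared, H, visible.shape[::-1])
cv2.imwrite("aligned_infrared.png", aligned_ir)
```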

29 pages, 7801 KB  
Article
YOLOP-Tomato: An End-to-End Model for Tomato Detection and Main Stem–Lateral Branch Segmentation
by Didun Kou, Jiandong Fang and Yudong Zhao
Agronomy 2026, 16(2), 150; https://doi.org/10.3390/agronomy16020150 - 7 Jan 2026
Abstract
Tomatoes are a rich source of nutrients that are essential for human health. However, in greenhouse environments, the complex growth patterns of tomatoes and stems often result in mutual obstruction and overlapping, posing significant challenges for accurate ripeness detection and stem segmentation. Furthermore, the current detection and segmentation tasks are typically executed in isolation, resulting in suboptimal inference efficiency and substantial computational expenses. To address these issues, this study proposes YOLOP-Tomato (YOLO-Based Panoptic Perception for Tomato) based on YOLOv8n, enabling simultaneous tomato detection and stem and branch segmentation. Two RSU (ReSidual U-blocks) modules establish feature connection mechanisms between the backbone and head. SPPCTX (SPP Context) was developed at the neck of the model to perform multi-scale contextual feature fusion and enhancement. SCDown (Spatial-Channel Decoupled downsampling) is employed to make the backbone's terminal structure more lightweight. The experimental results demonstrate that YOLOP-Tomato achieves precision, recall, mAP50, and mAP50–95 of 94.9%, 85.0%, 93.6%, and 60.9% for detection, and an mIoU of 77.6% for segmentation. These results represent improvements of 2.5%, 0.1%, 0.5%, 1.1%, and 1.4% over YOLOv8n. The trained model was deployed on the NVIDIA Jetson AGX Orin platform, where it achieved an efficient inference speed of 5.67 milliseconds. The proposed YOLOP-Tomato provides reliable and efficient technical support for tomato detection, ripeness identification, and stem and branch segmentation in greenhouses, and holds great significance for improving the level of intelligent agricultural production. Full article
(This article belongs to the Section Precision and Digital Agriculture)
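
Note: SCDown-style downsampling decouples channel and spatial reduction, typically a pointwise convolution to adjust channels followed by a stride-2 depthwise convolution to halve the resolution. The sketch below follows that general recipe; the channel numbers are illustrative, and the exact block used in YOLOP-Tomato may differ.

```python
# Minimal sketch of spatial-channel decoupled downsampling (SCDown-style):
# a 1x1 conv handles the channel change, a stride-2 depthwise 3x3 conv
# handles the spatial reduction. Channel counts are illustrative; the exact
# block in YOLOP-Tomato may differ.
import torch
import torch.nn as nn

class SCDownLike(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.pw = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)   # channel
        self.dw = nn.Conv2d(c_out, c_out, kernel_size=3, stride=2,
                            padding=1, groups=c_out, bias=False)      # spatial

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dw(self.pw(x))

if __name__ == "__main__":
    down = SCDownLike(64, 128)
    print(down(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 128, 20, 20])
```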

24 pages, 3232 KB  
Article
YOLOv11n-DSU: A Study on Grading and Detection of Multiple Cucumber Diseases in Complex Field Backgrounds
by Xiuying Tang, Pei Wang, Zhongqing Sun, Zhenglin Liu, Yumei Tang, Jie Shi, Liying Ma and Yonghua Zhang
Agriculture 2026, 16(2), 140; https://doi.org/10.3390/agriculture16020140 - 6 Jan 2026
Abstract
Cucumber downy mildew, angular leaf spot, and powdery mildew represent three predominant fungal diseases that substantially compromise cucumber yield and quality. To address the challenges posed by the irregular morphology, prominent multi-scale characteristics, and ambiguous lesion boundaries of cucumber foliar diseases in complex field environments—which often lead to insufficient detection accuracy—along with the existing models’ difficulty in balancing high precision with lightweight deployment, this study presents YOLOv11n-DSU (a lightweight hierarchical detection model engineered using the YOLOv11n architecture). The proposed model integrates three key enhancements: deformable convolution (DEConv) for optimized feature extraction from irregular lesions, a spatial and channel-wise attention (SCSA) mechanism for adaptive feature refinement, and a Unified Intersection over Union (Unified-IoU) loss function to improve localization accuracy. Experimental evaluations demonstrate substantial performance gains, with mean Average Precision at 50% IoU threshold (mAP50) and mAP50–95 increasing by 7.9 and 10.9 percentage points, respectively, and precision and recall improving by 6.1 and 10.0 percentage points. Moreover, the computational complexity is markedly reduced to 5.8 Giga Floating Point Operations (GFLOPs). Successful deployment on an embedded platform confirms the model’s practical viability, exhibiting robust real-time inference capabilities and portability. This work provides an accurate and efficient solution for automated disease grading in field conditions, enabling real-time and precise severity classification, and offers significant potential for advancing precision plant protection and smart agricultural systems. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
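
Note: the DEConv enhancement above relies on deformable convolution, which samples the input at learned offsets so the kernel can follow irregular lesion shapes. torchvision ships a ready-made operator, and the sketch below shows the usual pattern of predicting offsets with a plain convolution; whether the paper's DEConv block matches this exact layout is an assumption.

```python
# Minimal sketch of deformable convolution with torchvision: a plain conv
# predicts per-location sampling offsets (2 values per kernel tap), which
# are fed to DeformConv2d. This may differ from the paper's DEConv block.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # 2 * k * k offset channels: (dx, dy) for every kernel position.
        self.offset = nn.Conv2d(c_in, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(c_in, c_out, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))

if __name__ == "__main__":
    block = DeformBlock(16, 32)
    print(block(torch.randn(1, 16, 24, 24)).shape)  # torch.Size([1, 32, 24, 24])
```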
