Search Results (77)

Search Parameters:
Keywords = keypoint fusion

23 pages, 3030 KB  
Article
DualStream-AttnXGS: An Attention-Enhanced Dual-Stream Model Based on Human Keypoint Recognition for Driver Distraction Detection
by Zhuo He, Chengming Chen and Xiaoyi Zhou
Appl. Sci. 2025, 15(24), 12974; https://doi.org/10.3390/app152412974 - 9 Dec 2025
Viewed by 192
Abstract
Driver distraction remains one of the leading causes of traffic accidents. Although deep learning approaches such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers have been extensively applied for distracted driving detection, their performance is often hindered by limited real-time efficiency and high false detection rates. To address these challenges, this paper proposes an efficient dual-stream neural architecture, termed DualStream-AttnXGS, which jointly leverages visual and pose information to improve distraction recognition accuracy. In the RGB stream, an enhanced EfficientNetB0 backbone is employed, where Ghost Convolution and Coordinate Attention modules are integrated to strengthen feature representation while maintaining lightweight computation. A compound loss function combining Center Loss and Focal Loss is further introduced to promote inter-class separability and stabilize training. In parallel, the keypoint stream extracts human skeletal features using YOLOv8-Pose, which are subsequently classified through a compact ensemble model based on XGBoost v2.1.4 and Gradient Boosting. Finally, a Softmax-based probabilistic fusion strategy integrates the outputs of both streams for the final prediction. The proposed model achieved 99.59% accuracy on the SFD3 dataset while attaining 99.12% accuracy on the AUCD2 dataset, demonstrating that the proposed dual-stream architecture provides a more effective solution than single-stream models by leveraging complementary visual and pose information. Full article
(This article belongs to the Section Transportation and Future Mobility)
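
As a quick illustration of the fusion step described above, the sketch below blends the softmax probabilities of a hypothetical RGB stream with the class probabilities of a keypoint stream. The fusion weight `alpha` and the four-class setup are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_streams(rgb_logits, kp_probs, alpha=0.6):
    """Softmax-based probabilistic fusion: blend the RGB stream's softmax
    with the boosted-tree stream's class probabilities (alpha is an assumed
    weight, not the paper's)."""
    return alpha * softmax(rgb_logits) + (1.0 - alpha) * kp_probs

rgb_logits = np.array([[2.1, 0.3, -1.0, 0.5]])  # one sample, four classes
kp_probs = np.array([[0.7, 0.1, 0.1, 0.1]])     # already probabilities
print(fuse_streams(rgb_logits, kp_probs).argmax(axis=-1))  # fused prediction
```

In practice the weight would be tuned on a validation split; with equal weights this reduces to simple probability averaging.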

22 pages, 5092 KB  
Article
Fault Diagnosis Method for Excitation Dry-Type Transformer Based on Multi-Channel Vibration Signal and Visual Feature Fusion
by Yang Liu, Mingtao Yu, Jingang Wang, Peng Bao, Weiguo Zu, Yinglong Deng, Shiyi Chen, Lijiang Ma, Pengcheng Zhao and Jinyao Dou
Sensors 2025, 25(24), 7460; https://doi.org/10.3390/s25247460 - 8 Dec 2025
Viewed by 283
Abstract
To address the limitations of existing fault diagnosis methods for excitation dry-type transformers, such as inadequate utilization of multi-axis vibration data, low recognition accuracy under complex operational conditions, and limited computational efficiency, this paper presents a lightweight fault diagnosis approach based on the fusion of multi-channel vibration signals and visual features. Initially, a multi-physics field coupling simulation model of the excitation dry-type transformer is developed. Vibration data collected from field-installed three-axis sensors are combined to generate typical fault samples, including normal operation, winding looseness, core looseness, and winding eccentricity. Due to the high dimensionality of vibration signals, the Symmetrized Dot Pattern (SDP) method is extended into an improved variant (ISDP) to aggregate and map time- and frequency-domain information from the x-, y-, and z-axes into a two-dimensional feature map. To optimize the inter-class separability and intra-class consistency of the map, Particle Swarm Optimization (PSO) is employed to adaptively adjust the angle gain factor (η) and time delay coefficient (t). Keypoint descriptors are then extracted from the map using the Oriented FAST and Rotated BRIEF (ORB) feature extraction operator, which improves computational efficiency while maintaining sensitivity to local details. Finally, an efficient fault classification model is constructed using an Adaptive Boosting Support Vector Machine (Adaboost-SVM) to achieve robust fault mode recognition across multiple operating conditions. Experimental results demonstrate that the proposed method achieves a fault diagnosis accuracy of 94.00%, outperforming signal-to-image techniques such as Gramian Angular Field (GAF), Recurrence Plot (RP), and Markov Transition Field (MTF), as well as deep learning models based on Convolutional Neural Networks (CNNs), in both training and testing time. Additionally, the method exhibits superior stability and robustness in repeated trials. This approach is well-suited for online monitoring and rapid diagnosis in resource-constrained environments, offering significant engineering value in enhancing the operational safety and reliability of excitation dry-type transformers. Full article
(This article belongs to the Collection Sensors and Sensing Technology for Industry 4.0)
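
The paper's ISDP extends the standard Symmetrized Dot Pattern, which is where the tuned η (angle gain) and t (time delay) parameters come from. The sketch below implements only the standard SDP transform on a synthetic signal; the parameter values are illustrative stand-ins for what the paper tunes with PSO.

```python
import numpy as np

def sdp_points(x, eta=30.0, t=3, mirrors=6):
    """Map a 1-D signal to polar dot-pattern coordinates (theta, r):
    radius from the current sample, angle from the delayed sample,
    repeated over six mirrored symmetry arms."""
    x = np.asarray(x, dtype=float)
    r = (x - x.min()) / (x.max() - x.min() + 1e-12)  # normalized radius in [0, 1]
    delayed = np.roll(r, -t)[:-t]                    # sample t steps ahead
    r = r[:-t]
    arms = []
    for m in range(mirrors):                         # 360/mirrors-degree symmetry
        phi = 360.0 * m / mirrors
        arms.append((np.deg2rad(phi + eta * delayed), r))  # upper arm
        arms.append((np.deg2rad(phi - eta * delayed), r))  # mirrored arm
    return arms

signal = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * np.random.randn(1000)
arms = sdp_points(signal)
print(len(arms), "arms,", arms[0][1].shape[0], "points per arm")
```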

23 pages, 7270 KB  
Article
DHN-YOLO: A Joint Detection Algorithm for Strawberries at Different Maturity Stages and Key Harvesting Points
by Hongrui Hao, Juan Xi, Jingyuan Dai, Guozheng Wang, Dayang Liu and Liangkuan Zhu
Plants 2025, 14(22), 3439; https://doi.org/10.3390/plants14223439 - 10 Nov 2025
Viewed by 731
Abstract
Strawberries are important cash crops. Traditional manual picking is costly and inefficient, while automated harvesting robots are hindered by field challenges like stem-leaf occlusion, fruit overlap, and appearance/maturity variations from lighting and viewing angles. To address the need for accurate cross-maturity fruit identification and keypoint detection, this study constructed a strawberry image dataset, MSRBerry, covering multiple varieties, ripening stages, and complex ridge-cultivation field conditions. Based on the YOLO11-pose framework, we proposed DHN-YOLO with three key improvements: replacing the original C2PSA with the CDC module to enhance subtle feature capture and irregular shape adaptability; substituting C3K2 with C3H to strengthen multi-scale feature extraction and robustness to lighting-induced maturity/color variations; and upgrading the neck into a New-Neck via CA and dual-path fusion to reduce feature loss and improve critical region perception. These modifications enhanced feature quality while cutting parameters and accelerating inference. Experimental results showed DHN-YOLO achieved 87.3% precision, 88% recall, and 78.6% mAP@50:95 for strawberry detection (0.9, 1.6, and 5 percentage points higher than YOLO11-pose), and 83%, 87.5%, and 83.6% for keypoint detection (improvements of 1.9, 2.1, and 4.6 percentage points). It also reached 71.6 FPS with 15 ms single-image inference. The overall performance of DHN-YOLO also surpasses other mainstream models such as YOLO13, YOLO10, and DETR. This demonstrates that DHN-YOLO meets practical needs for robust strawberry and picking point detection in complex agricultural environments. Full article
(This article belongs to the Special Issue AI-Driven Machine Vision Technologies in Plant Science)
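
DHN-YOLO's modified modules (CDC, C3H, New-Neck) are not distributed with this listing, but the baseline it builds on, YOLO11-pose, can be exercised directly with the ultralytics package. The sketch below shows the kind of joint box-plus-keypoint inference the paper reports; the image file name is a placeholder.

```python
from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")           # stock pose baseline, not DHN-YOLO
results = model("strawberry_ridge.jpg")   # hypothetical field image

for r in results:
    boxes = r.boxes.xyxy        # fruit bounding boxes
    classes = r.boxes.cls       # predicted class per detection
    keypoints = r.keypoints.xy  # per-instance keypoints (picking points here)
    print(boxes.shape, classes.shape, keypoints.shape)
```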

23 pages, 8088 KB  
Article
Research on Wheat Spike Phenotype Extraction Based on YOLOv11 and Image Processing
by Xuanxuan Li, Zhenghui Zhang, Jiayu Wang, Lining Liu and Pingzeng Liu
Agriculture 2025, 15(21), 2295; https://doi.org/10.3390/agriculture15212295 - 4 Nov 2025
Cited by 1 | Viewed by 447
Abstract
To reduce the parameter-tuning complexity of traditional image processing, an automated method for extracting spike phenotypes based on the fusion of YOLOv11 and image processing was proposed, taking winter wheat in Lingcheng District, Dezhou City, Shandong Province as the research object. For keypoint detection of spikes, the integration of FocalModulation and TADDH modules improved feature extraction and addressed light interference and spike-awn occlusion in complex field environments; the detection accuracy of the improved model reached 96.00% and its mAP50 reached 98.70%, which were 6.6 and 2.8 percentage points higher than those of the original model, respectively. On this basis, this paper integrated morphological processing with a watershed algorithm and constructed an integrated method for extracting spike length, spike width, and the number of grains per spike, realizing automated extraction of spike phenotypic parameters. The experimental results show that the extraction accuracy for spike length, spike width, and grain number reached 98.08%, 96.21%, and 93.66%, respectively, providing accurate data support for wheat yield prediction and genetic breeding research and promoting the development of intelligent agricultural phenomics. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
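
The grain-counting step combines morphological processing with a watershed transform. Below is a minimal OpenCV sketch of that idea on a synthetic image of two touching blobs; the thresholds and kernel sizes are illustrative, not the paper's.

```python
import cv2
import numpy as np

# Synthetic stand-in for a cropped spike image: two touching "grains".
img = np.zeros((120, 120, 3), np.uint8)
cv2.circle(img, (40, 60), 18, (255, 255, 255), -1)
cv2.circle(img, (70, 60), 18, (255, 255, 255), -1)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)   # peaks mark grain centers
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)

n_markers, markers = cv2.connectedComponents(sure_fg)  # one marker per grain core
labels = cv2.watershed(img, markers + 1)               # split the touching blobs
print("estimated grain count:", n_markers - 1)         # component 0 is background
```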

25 pages, 3059 KB  
Article
A Lightweight Framework for Pilot Pose Estimation and Behavior Recognition with Integrated Safety Assessment
by Honglan Wu, Xin Lu, Youchao Sun and Hao Liu
Aerospace 2025, 12(11), 986; https://doi.org/10.3390/aerospace12110986 - 3 Nov 2025
Viewed by 700
Abstract
With the rapid advancement of aviation technology, modern aircraft cockpits are evolving toward high automation and intelligence, making pilot-cockpit interaction a critical factor influencing flight safety and efficiency. Pilot pose estimation and behavior recognition are critical for monitoring pilot state, preventing operational errors, and enabling adaptive human–machine interaction, thus playing an essential role in aviation safety assurance and intelligent cockpit development. However, existing methods face challenges in real-time performance, reliability, and computational complexity in practical applications. Traditional approaches, such as wearable sensors and image-processing-based algorithms, demonstrate certain effectiveness but still exhibit limitations in aviation environments. To address these issues, this paper proposes a lightweight pilot pose estimation and behavior recognition framework, integrating Vision Transformer with depth-wise separable convolution to optimize the accuracy and efficiency of keypoint detection. Additionally, a novel multimodal data fusion technique is introduced, along with a scientifically designed evaluation system, to enhance the robustness and security of the system in complex environments. Experimental results on a pilot keypoint detection dataset captured in a simulated cockpit environment show that the proposed method achieves 81.9 AP, while substantially reducing model parameters and notably improving inference efficiency compared with HRNet. This study provides new insights and methodologies for the design and evaluation of aviation human-machine interaction systems. Full article
(This article belongs to the Section Air Traffic and Transportation)
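
The lightweight design pairs a Vision Transformer with depth-wise separable convolutions. Below is a minimal PyTorch sketch of such a block (a per-channel spatial filter followed by a 1×1 pointwise mix); channel sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        # groups=in_ch gives one spatial filter per channel (depth-wise step)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel, padding=kernel // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # channel mixing
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 64, 56, 56)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56])
```

The factorization cuts the multiply count roughly by a factor of the kernel area, which is where much of the parameter and latency saving over a standard convolution comes from.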

25 pages, 2630 KB  
Article
Lightweight and Real-Time Driver Fatigue Detection Based on MG-YOLOv8 with Facial Multi-Feature Fusion
by Chengming Chen, Xinyue Liu, Meng Zhou, Zhijian Li, Zhanqi Du and Yandan Lin
J. Imaging 2025, 11(11), 385; https://doi.org/10.3390/jimaging11110385 - 1 Nov 2025
Cited by 1 | Viewed by 748
Abstract
Driver fatigue is a primary factor in traffic accidents and poses a serious threat to road safety. To address this issue, this paper proposes a multi-feature fusion fatigue detection method based on an improved YOLOv8 model. First, the method uses an enhanced YOLOv8 model to achieve high-precision face detection. Then, it crops the detected face regions. Next, the lightweight PFLD (Practical Facial Landmark Detector) model performs keypoint detection on the cropped images, extracting 68 facial feature points and calculating key indicators related to fatigue status. These indicators include the eye aspect ratio (EAR), eyelid closure percentage (PERCLOS), mouth aspect ratio (MAR), and head posture ratio (HPR). To mitigate the impact of individual differences on detection accuracy, the paper introduces a novel sliding window model that combines a dynamic threshold adjustment strategy with an exponential weighted moving average (EWMA) algorithm. Based on this framework, blink frequency (BF), yawn frequency (YF), and nod frequency (NF) are calculated to extract time-series behavioral features related to fatigue. Finally, the driver’s fatigue state is determined using a comprehensive fatigue assessment algorithm. Experimental results on the WIDER FACE and YAWDD datasets demonstrate this method’s significant advantages in improving detection accuracy and computational efficiency. By striking a better balance between real-time performance and accuracy, the proposed method shows promise for real-world driving applications. Full article
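
Two of the indicators named above have simple closed forms. The sketch below computes the eye aspect ratio (EAR) from the usual six eye landmarks and smooths a sequence of EAR values with an exponential weighted moving average (EWMA); the smoothing factor and sample values are illustrative, not the paper's.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks p1..p6; EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|)."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average over a sequence of EAR values."""
    out, s = [], values[0]
    for v in values:
        s = alpha * v + (1 - alpha) * s
        out.append(s)
    return np.array(out)

ears = np.array([0.31, 0.30, 0.12, 0.10, 0.11, 0.29, 0.30])  # a blink dips the EAR
print(ewma(ears).round(3))
```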

15 pages, 2201 KB  
Article
CGFusionFormer: Exploring Compact Spatial Representation for Robust 3D Human Pose Estimation with Low Computation Complexity
by Tao Lu, Hongtao Wang and Degui Xiao
Sensors 2025, 25(19), 6052; https://doi.org/10.3390/s25196052 - 1 Oct 2025
Viewed by 734
Abstract
Transformer-based 2D-to-3D lifting methods have demonstrated outstanding performance in 3D human pose estimation from 2D pose sequences. However, they still encounter challenges with the relatively poor quality of 2D joints and substantial computational costs. In this paper, we propose a CGFusionFormer to address these problems. We propose a compact spatial representation (CSR) to robustly generate local spatial multihypothesis features from part of the 2D pose sequence. Specifically, CSR models spatial constraints based on body parts and incorporates 2D Gaussian filters and nonparametric reduction to improve spatial features against low-quality 2D poses and reduce the computational cost of subsequent temporal encoding. We design a residual-based Hybrid Adaptive Fusion module that combines multihypothesis features with global frequency domain features to accurately estimate the 3D human pose with minimal computational cost. We realize CGFusionFormer with a PoseFormer-like transformer backbone. Extensive experiments on the challenging Human3.6M and MPI-INF-3DHP benchmarks show that our method outperforms prior transformer-based variants in short receptive fields and achieves a superior accuracy–efficiency trade-off. On Human3.6M (sequence length 27, 3 input frames), it achieves 47.6 mm Mean Per Joint Position Error (MPJPE) at only 71.3 MFLOPs, representing about a 40 percent reduction in computation compared with PoseFormerV2 while attaining better accuracy. On MPI-INF-3DHP (81-frame sequences), it reaches 97.9 Percentage of Correct Keypoints (PCK), 78.5 Area Under the Curve (AUC), and 27.2 mm MPJPE, matching the best PCK and achieving the lowest MPJPE among the compared methods under the same setting. Full article
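
For reference, the MPJPE figure quoted above is the mean Euclidean distance between predicted and ground-truth joints. A minimal computation with illustrative shapes (17 joints, millimeter units):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: pred, gt of shape (N, J, 3), in mm."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.random.randn(8, 17, 3) * 10  # hypothetical predicted 3D poses (mm)
gt = pred + np.random.randn(8, 17, 3)  # ground truth perturbed nearby
print(f"MPJPE: {mpjpe(pred, gt):.2f} mm")
```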

28 pages, 4443 KB  
Article
UCINet: A Multi-Task Network for Umbilical Coiling Index Measurement in Obstetric Ultrasound
by Zhuofu Liu, Lichen Niu, Zhixin Di and Meimei Liu
Algorithms 2025, 18(9), 592; https://doi.org/10.3390/a18090592 - 22 Sep 2025
Viewed by 601
Abstract
The umbilical coiling index (UCI), which quantifies the degree of vascular coiling in the umbilical cord, is a crucial indicator for assessing fetal intrauterine development and predicting perinatal outcomes. However, the existing methods for measuring the UCI primarily rely on manual assessment, which suffers from low efficiency and susceptibility to inter-observer variability. In response to the challenges in measuring the umbilical coiling index during obstetric ultrasound, we propose UCINet, a multi-task neural network engineered explicitly for this purpose. UCINet demonstrates enhanced operational efficiency and significantly improved detection accuracy, catering to the nuanced requirements of obstetric imaging. Firstly, this paper proposes a Frequency–Spatial Domain Downsampling Module (FSDM) to extract features in both the frequency and spatial domains, thereby reducing the loss of umbilical cord features and enhancing their representational capacity. The proposed Multi-Receptive Field Feature Perception Module (MRPM) employs receptive fields of varying sizes across different stages of the feature maps, enhancing the richness of feature representation and allowing the model to capture more diverse spatial information. A Multi-Scale Feature Aggregation Module (MSAM) comprehensively leverages multi-scale features via a dynamic fusion mechanism, optimizing the integration of disparate feature scales. In addition, a UCI dataset consisting of 2018 annotated ultrasound images was constructed, each labeled with the number of vascular coils and keypoints at both ends of the umbilical cord. Compared with state-of-the-art methods, UCINet achieves consistent improvements across two tasks. In object detection, UCINet outperforms Deformable DETR-R50 by 1.2 percentage points in mAP@50. In keypoint localization, it further exceeds YOLOv11 by 3.0 percentage points in mAP@50, highlighting its effectiveness in both detection accuracy and fine-grained keypoint prediction. Full article
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (3rd Edition))
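
The abstract names the FSDM without giving its internals, so the following is only a generic sketch of the frequency-plus-spatial downsampling idea under that assumption: a strided convolution in the spatial domain, fused with a half-resolution reconstruction from the low-frequency band of the spectrum.

```python
import torch
import torch.nn as nn

class FreqSpatialDownsample(nn.Module):
    """Generic 2x downsampling with a spatial and a frequency branch;
    not the paper's FSDM, just an illustration of the concept."""
    def __init__(self, ch):
        super().__init__()
        self.spatial = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # spatial branch
        self.fuse = nn.Conv2d(2 * ch, ch, 1)                      # 1x1 fusion

    def forward(self, x):
        # Frequency branch: keep the centered low-frequency quarter of the
        # spectrum (an ideal low-pass), then invert at half resolution.
        f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        h, w = x.shape[-2:]
        f = f[..., h // 4: 3 * h // 4, w // 4: 3 * w // 4]
        low = torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real
        return self.fuse(torch.cat([self.spatial(x), low], dim=1))

y = FreqSpatialDownsample(32)(torch.randn(1, 32, 64, 64))
print(y.shape)  # torch.Size([1, 32, 32, 32])
```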

22 pages, 4598 KB  
Article
A ST-ConvLSTM Network for 3D Human Keypoint Localization Using MmWave Radar
by Siyuan Wei, Huadong Wang, Yi Mo and Dongping Du
Sensors 2025, 25(18), 5857; https://doi.org/10.3390/s25185857 - 19 Sep 2025
Cited by 1 | Viewed by 723
Abstract
Accurate human keypoint localization in complex environments demands robust sensing and advanced modeling. In this article, we construct an ST-ConvLSTM network for 3D human keypoint estimation via millimeter-wave radar point clouds. The ST-ConvLSTM network processes multi-channel radar image inputs, generated from multi-frame fused point clouds, through parallel pathways engineered to extract rich spatiotemporal features from the sequential radar data. The extracted features are then fused and fed into fully connected layers for direct regression of 3D human keypoint coordinates. To achieve better network performance, a mmWave radar 3D human keypoint dataset (MRHKD) is built with a hybrid human motion annotation system (HMAS), in which a binocular camera is used to measure the human keypoint coordinates and a 60 GHz 4T4R radar is used to generate radar point clouds. Experimental results demonstrate that the proposed ST-ConvLSTM, leveraging its unique ability to model temporal dependencies and spatial patterns in radar imagery, achieves MAEs of 0.1075 m, 0.0633 m, and 0.1180 m in the horizontal, vertical, and depth directions, underscoring the model’s enhanced posture recognition accuracy and keypoint localization capability in challenging conditions. Full article
(This article belongs to the Special Issue Advances in Multichannel Radar Systems)
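
At the core of such a network is the ConvLSTM cell, which replaces the LSTM's matrix multiplications with convolutions so the gates retain spatial structure. A minimal PyTorch sketch with illustrative channel sizes follows; the paper's exact architecture is not reproduced.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # i, f, o, g gates computed by one convolution over [input, hidden]
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel,
                               padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)  # cell update
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = ConvLSTMCell(in_ch=8, hid_ch=16)
x = torch.randn(2, 8, 32, 32)               # one radar "image" (B, C, H, W)
h = c = torch.zeros(2, 16, 32, 32)
for _ in range(5):                          # unroll over a window of fused frames
    h, c = cell(x, (h, c))
print(h.shape)                              # torch.Size([2, 16, 32, 32])
```

The final hidden state would then be flattened into fully connected layers for coordinate regression, as the abstract describes.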

26 pages, 2929 KB  
Article
A Unified Framework for Enhanced 3D Spatial Localization of Weeds via Keypoint Detection and Depth Estimation
by Shuxin Xie, Tianrui Quan, Junjie Luo, Xuesong Ren and Yubin Miao
Agriculture 2025, 15(17), 1854; https://doi.org/10.3390/agriculture15171854 - 30 Aug 2025
Viewed by 847
Abstract
In this study, WeedLoc3D, a lightweight deep neural network framework based on multi-task learning, is proposed to meet the demand for accurate three-dimensional positioning of weed targets in automatic laser weeding. From a single RGB image, it both locates the 2D keypoints (growth points) of weeds and estimates their depth with high accuracy, a departure from conventional approaches. To improve model performance, we introduce several innovative structural modules, including Gated Feature Fusion (GFF) for adaptive feature integration, a Hybrid Domain Block (HDB) for handling high-frequency details, and Cross-Branch Attention (CBA) for promoting synergy among tasks. Experimental validation on field datasets confirms the effectiveness of our method: it significantly reduces the positioning error of 3D keypoints and achieves stable performance in diverse detection and estimation tasks. The demonstrated accuracy and robustness highlight its potential for practical application. Full article
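
The abstract names Gated Feature Fusion without detailing it, so the following is a generic gated-fusion sketch under that assumption: a learned sigmoid gate decides, per position, how much of each branch to pass through.

```python
import torch
import torch.nn as nn

class GatedFeatureFusion(nn.Module):
    """Generic gated fusion of two same-shaped feature maps; an
    illustration of the idea, not the paper's GFF module."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, a, b):
        g = self.gate(torch.cat([a, b], dim=1))  # per-pixel mixing weight
        return g * a + (1 - g) * b

a = torch.randn(1, 64, 48, 48)  # e.g. keypoint-branch features
b = torch.randn(1, 64, 48, 48)  # e.g. depth-branch features
print(GatedFeatureFusion(64)(a, b).shape)  # torch.Size([1, 64, 48, 48])
```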

23 pages, 28831 KB  
Article
Micro-Expression-Based Facial Analysis for Automated Pain Recognition in Dairy Cattle: An Early-Stage Evaluation
by Shuqiang Zhang, Kashfia Sailunaz and Suresh Neethirajan
AI 2025, 6(9), 199; https://doi.org/10.3390/ai6090199 - 22 Aug 2025
Viewed by 1875
Abstract
Timely, objective pain recognition in dairy cattle is essential for welfare assurance, productivity, and ethical husbandry yet remains elusive because evolutionary pressure renders bovine distress signals brief and inconspicuous. Without verbal self-reporting, cows suppress overt cues, so automated vision is indispensable for on-farm triage. Although earlier systems tracked whole-body posture or static grimace scales, frame-level detection of facial micro-expressions has not been explored fully in livestock. We translate micro-expression analytics from automotive driver monitoring to the barn, linking modern computer vision with veterinary ethology. Our two-stage pipeline first detects faces and 30 landmarks using a custom You Only Look Once (YOLO) version 8-Pose network, achieving a 96.9% mean average precision (mAP) at an Intersection over the Union (IoU) threshold of 0.50 for detection and 83.8% Object Keypoint Similarity (OKS) for keypoint placement. Cropped eye, ear, and muzzle patches are encoded using a pretrained MobileNetV2, generating 3840-dimensional descriptors that capture millisecond muscle twitches. Sequences of five consecutive frames are fed into a 128-unit Long Short-Term Memory (LSTM) classifier that outputs pain probabilities. On a held-out validation set of 1700 frames, the system records 99.65% accuracy and an F1-score of 0.997, with only three false positives and three false negatives. Tested on 14 unseen barn videos, it attains 64.3% clip-level accuracy (i.e., overall accuracy for the whole video clip) and 83% precision for the pain class, using a hybrid aggregation rule that combines a 30% mean probability threshold with micro-burst counting to temper false alarms. As an early exploration from our proof-of-concept study on a subset of our custom dairy farm datasets, these results show that micro-expression mining can deliver scalable, non-invasive pain surveillance across variations in illumination, camera angle, background, and individual morphology. Future work will explore attention-based temporal pooling, curriculum learning for variable window lengths, domain-adaptive fine-tuning, and multimodal fusion with accelerometry on the complete datasets to elevate the performance toward clinical deployment. Full article
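
A minimal sketch of the second stage described above: pretrained MobileNetV2 descriptors (1280-d per patch, hence 3840-d for the eye, ear, and muzzle crops) fed to a 128-unit LSTM over five-frame windows. Patch extraction and training are omitted, and all tensors are synthetic stand-ins.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights="DEFAULT").features.eval()  # frozen extractor
pool = nn.AdaptiveAvgPool2d(1)

def patch_descriptor(patch):                     # patch: (B, 3, 224, 224)
    with torch.no_grad():
        return pool(backbone(patch)).flatten(1)  # (B, 1280)

class PainClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(3 * 1280, 128, batch_first=True)  # 3840-d input
        self.head = nn.Linear(128, 1)

    def forward(self, seq):                      # seq: (B, 5, 3840)
        out, _ = self.lstm(seq)
        return torch.sigmoid(self.head(out[:, -1]))  # pain probability

eye = ear = muzzle = torch.randn(2, 3, 224, 224)     # hypothetical facial crops
frame = torch.cat([patch_descriptor(p) for p in (eye, ear, muzzle)], dim=1)
seq = frame.unsqueeze(1).repeat(1, 5, 1)             # fake 5-frame window
print(PainClassifier()(seq).shape)                   # torch.Size([2, 1])
```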

16 pages, 707 KB  
Article
High-Resolution Human Keypoint Detection: A Unified Framework for Single and Multi-Person Settings
by Yuhuai Lin, Kelei Li and Haihua Wang
Algorithms 2025, 18(8), 533; https://doi.org/10.3390/a18080533 - 21 Aug 2025
Viewed by 1967
Abstract
Human keypoint detection has become a fundamental task in computer vision, underpinning a wide range of downstream applications such as action recognition, intelligent surveillance, and human–computer interaction. Accurate localization of keypoints is crucial for understanding human posture, behavior, and interactions in various environments. In this paper, we propose a deep-learning-based human skeletal keypoint detection framework that leverages a High-Resolution Network (HRNet) to achieve robust and precise keypoint localization. Our method maintains high-resolution representations throughout the entire network, enabling effective multi-scale feature fusion, without sacrificing spatial details. This approach preserves the fine-grained spatial information that is often lost in conventional downsampling-based methods. To evaluate its performance, we conducted extensive experiments on the COCO dataset, where our approach achieved competitive performance in terms of Average Precision (AP) and Average Recall (AR), outperforming several state-of-the-art methods. Furthermore, we extended our pipeline to support multi-person keypoint detection in real-time scenarios, ensuring scalability for complex environments. Experimental results demonstrated the effectiveness of our method in both single-person and multi-person settings, providing a comprehensive and flexible solution for various pose estimation tasks in dynamic real-world applications. Full article
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
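
HRNet-style detectors regress one heatmap per keypoint and read the location off as the argmax, scaled back to image coordinates. A minimal decoding sketch with illustrative shapes (17 COCO keypoints, a 256×192 crop):

```python
import torch

def decode_heatmaps(heatmaps, img_size):
    """heatmaps: (B, K, H, W) -> pixel coordinates (B, K, 2) and confidences."""
    b, k, h, w = heatmaps.shape
    flat = heatmaps.view(b, k, -1)
    conf, idx = flat.max(dim=-1)                # peak value and flat index
    xs = (idx % w).float() * img_size[1] / w    # scale to image width
    ys = (idx // w).float() * img_size[0] / h   # scale to image height
    return torch.stack([xs, ys], dim=-1), conf

hm = torch.rand(1, 17, 64, 48)                  # HRNet-like output resolution
coords, conf = decode_heatmaps(hm, (256, 192))  # typical COCO crop size
print(coords.shape, conf.shape)                 # (1, 17, 2) and (1, 17)
```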

28 pages, 9582 KB  
Article
End-to-End Model Enabled GPR Hyperbolic Keypoint Detection for Automatic Localization of Underground Targets
by Feifei Hou, Yu Zhang, Jian Dong and Jinglin Fan
Remote Sens. 2025, 17(16), 2791; https://doi.org/10.3390/rs17162791 - 12 Aug 2025
Cited by 2 | Viewed by 1723
Abstract
Ground-Penetrating Radar (GPR) is a non-destructive detection technique widely employed for identifying underground targets. Despite its utility, conventional approaches suffer from limitations, including poor adaptability to multi-scale targets and suboptimal localization accuracy. To overcome these challenges, we propose a lightweight deep learning framework, the Dual Attentive YOLOv11 (You Only Look Once, version 11) Keypoint Detector (DAYKD), designed for robust underground target detection and precise localization. Building upon the YOLOv11 architecture, our method introduces two key innovations to enhance performance: (1) a dual-task learning framework that synergizes bounding box detection with keypoint regression to refine localization precision, and (2) a novel Convolution and Attention Fusion Module (CAFM) coupled with a Feature Refinement Network (FRFN) to enhance multi-scale feature representation. Extensive ablation studies demonstrate that DAYKD achieves a precision of 93.7% and an mAP50 of 94.7% in object detection tasks, surpassing the baseline model by about 13% in F1-score, a balanced metric that combines precision and recall. These findings confirm that DAYKD delivers exceptional recognition accuracy and robustness, offering a promising solution for high-precision underground target localization. Full article
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)
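
For reference, the F1-score cited above is the harmonic mean of precision and recall; a minimal computation from made-up confusion counts (not the paper's):

```python
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of detections that are correct
    recall = tp / (tp + fn)      # fraction of true targets that are found
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(tp=937, fp=63, fn=60), 3))  # ~0.938 with illustrative counts
```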

21 pages, 8522 KB  
Article
MythPose: Enhanced Detection of Complex Poses in Thangka Figures
by Yukai Xian, Te Shen, Yurui Lee, Ping Lan, Qijun Zhao and Liang Yan
Sensors 2025, 25(16), 4983; https://doi.org/10.3390/s25164983 - 12 Aug 2025
Viewed by 642
Abstract
Thangka is a unique form of painting in Tibet, which holds rich cultural significance and artistic value. In Thangkas, in addition to the standard human form, there are also figures with multiple limbs. Existing human pose estimation methods are not well suited for keypoint detection of figures in Thangka paintings. This paper builds upon YOLOv11-Pose and introduces the Mamba structure to enhance the model’s ability to capture global features. A feature fusion module is employed to integrate both shallow and deep features, and a KAL loss function is proposed to alleviate the interference between keypoints of different body parts. In this study, a dataset of 6208 Thangka images is collected and annotated for Thangka keypoint detection, and data augmentation techniques are used to enhance the generalization of the dataset. Experimental results show that MythPose achieves 89.13% mAP@0.5, 92.51% PCK, and 87.22% OKS in human pose estimation tasks on Thangka images, outperforming the baseline model. This research not only provides a reference for the digital preservation of Thangka art but also offers insights for pose estimation tasks in other similar artworks. Full article
(This article belongs to the Section Sensing and Imaging)
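
The OKS figure above follows the standard COCO definition: a per-keypoint Gaussian score scaled by object area and a per-keypoint constant, averaged over labeled keypoints. The sketch below uses illustrative sigmas and random points, not the Thangka dataset's values.

```python
import numpy as np

def oks(pred, gt, visible, area, sigmas):
    """COCO-style Object Keypoint Similarity.
    pred, gt: (K, 2) pixel coords; visible: (K,) bool; area: object area (px^2)."""
    d2 = ((pred - gt) ** 2).sum(axis=-1)        # squared keypoint distances
    k2 = (2 * sigmas) ** 2                      # per-keypoint tolerance constants
    e = d2 / (2 * area * k2 + np.spacing(1))
    return np.exp(-e)[visible].mean()

K = 17
gt = np.random.rand(K, 2) * 100
pred = gt + np.random.randn(K, 2)               # predictions a pixel or two off
print(round(oks(pred, gt, np.ones(K, bool), area=5000.0,
                sigmas=np.full(K, 0.05)), 3))
```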

17 pages, 5705 KB  
Article
Cherry Tomato Bunch and Picking Point Detection for Robotic Harvesting Using an RGB-D Sensor and a StarBL-YOLO Network
by Pengyu Li, Ming Wen, Zhi Zeng and Yibin Tian
Horticulturae 2025, 11(8), 949; https://doi.org/10.3390/horticulturae11080949 - 11 Aug 2025
Cited by 4 | Viewed by 1741
Abstract
For fruit harvesting robots, rapid and accurate detection of fruits and picking points is one of the main challenges for their practical deployment. Several fruits typically grow in clusters or bunches, such as grapes, cherry tomatoes, and blueberries, and such clustered fruits are best picked by the bunch rather than individually. This study proposes utilizing a low-cost off-the-shelf RGB-D sensor mounted on the end effector and a lightweight improved YOLOv8-Pose neural network to detect cherry tomato bunches and picking points for robotic harvesting. The problem of occlusion and overlap is alleviated by merging RGB and depth images from the RGB-D sensor. To enhance detection robustness in complex backgrounds and reduce the complexity of the model, the Starblock module from StarNet and the coordinate attention mechanism are incorporated into the YOLOv8-Pose network, termed StarBL-YOLO, to improve the efficiency of feature extraction and reinforce spatial information. Additionally, we replaced the original OKS loss function with the L1 loss function for keypoint loss calculation, which improves the accuracy of picking-point localization. The proposed method has been evaluated on a dataset with 843 cherry tomato RGB-D image pairs acquired by a harvesting robot at a commercial greenhouse farm. Experimental results demonstrate that the proposed StarBL-YOLO model achieves a 12% reduction in model parameters compared to the original YOLOv8-Pose while improving detection accuracy for cherry tomato bunches and picking points. Specifically, the model shows significant improvements across all metrics: for computational efficiency, model size (−11.60%) and GFLOPs (−7.23%); for pickable bunch detection, mAP50 (+4.4%) and mAP50-95 (+4.7%); for non-pickable bunch detection, mAP50 (+8.0%) and mAP50-95 (+6.2%); and for picking point detection, mAP50 (+4.3%), mAP50-95 (+4.6%), and RMSE (−23.98%). These results validate that StarBL-YOLO substantially enhances detection accuracy for cherry tomato bunches and picking points while improving computational efficiency, which is valuable for resource-constrained edge-computing deployment on harvesting robots. Full article
(This article belongs to the Special Issue Advanced Automation for Tree Fruit Orchards and Vineyards)
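
Two ingredients named above have simple forms: merging the aligned RGB and depth images into a four-channel network input, and the plain L1 keypoint loss that replaces the OKS-based loss. A minimal sketch with synthetic tensors; alignment and normalization are assumed handled upstream.

```python
import torch
import torch.nn.functional as F

rgb = torch.rand(1, 3, 480, 640)        # color frame from the RGB-D sensor
depth = torch.rand(1, 1, 480, 640)      # aligned, normalized depth map
fused = torch.cat([rgb, depth], dim=1)  # (1, 4, 480, 640) network input

pred_kpts = torch.rand(1, 8, 2)         # predicted picking points (normalized xy)
gt_kpts = torch.rand(1, 8, 2)           # ground-truth picking points
loss = F.l1_loss(pred_kpts, gt_kpts)    # L1 keypoint loss, per the abstract
print(fused.shape, float(loss))
```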
