Search Results (424)

Search Parameters:
Keywords = keypoint detection

25 pages, 68864 KB  
Article
A Morpho-Phase Feature-Based Method for Geometric Error Mitigation in InSAR Image Matching
by Yanming Chen, Fan Zhang, Yanfang Liu, Fei Ma and Bingnan Wang
Remote Sens. 2026, 18(13), 2060; https://doi.org/10.3390/rs18132060 (registering DOI) - 23 Jun 2026
Abstract
Interferometric Synthetic Aperture Radar (InSAR) is a promising payload for Unmanned Aerial Vehicle (UAV) scene matching navigation due to the rich textures in interferogram images compared to SAR intensity images. However, geometric parameter estimation errors during reference interferogram image generation cause significant textural discrepancies with real-time data. Compounded by inherent non-local similarity of InSAR images, these issues render conventional matching algorithms ineffective, degrading navigation accuracy. To address these challenges, this paper proposes a Morpho-Phase feature-based InSAR image matching method to mitigate the impact of parameter errors. Firstly, a Phase-Robust Keypoint (PRK) detection method is proposed, which overcomes the impact of parameter errors on keypoint detection by introducing a compensated phase and extracting phase extrema. Secondly, a Hierarchical Morphological-Phase Descriptor (HMPD) is constructed to resolve the feature ambiguity caused by the non-local similarity of interferograms by combining morphological features with phase statistics. Experimental results based on real-world InSAR data demonstrate that the proposed matching method effectively mitigates the impact of parameter errors on InSAR image matching, enhances navigation positioning accuracy, and provides stable, high-precision positioning capabilities in practical scene matching navigation tasks. Full article

40 pages, 27259 KB  
Article
Monocular 3D Position Estimation of a Moving Vehicle Based on a Kalman-Goldschmidt Adaptive Filter
by Diana Kalita, Pavel Lyakhov, Valery Andreev and Denis Butusov
J. Sens. Actuator Netw. 2026, 15(3), 48; https://doi.org/10.3390/jsan15030048 (registering DOI) - 18 Jun 2026
Viewed by 86
Abstract
Determining the 3D position of a vehicle from a 2D image plays a key role in video surveillance, autonomous driving, and spatial localization. However, localization accuracy can significantly degrade in conditions of incomplete or synthetic measurement noise and keypoint jitter. In this paper, we propose a new iterative 3D position estimation algorithm (KGA). This algorithm includes geometric correction and calibration steps for converting from 2D to 3D coordinates; trajectory prediction and correction using a Kalman filter; and adaptive tuning of the filter parameters using the Goldschmidt algorithm. Experiments confirm that KGA outperforms the standard (FK) and modified (MFK) Kalman filters in accuracy and convergence speed, demonstrating robustness to various camera angles and noise levels. The novelty of this approach lies in the integration of the Goldschmidt algorithm into the Kalman filter to create an adaptation mechanism that dynamically adjusts the measurement noise covariance based on instantaneous innovation magnitude. Unlike end-to-end deep learning trackers or nonlinear filters (EKF/UKF), KGA is designed as a lightweight post-processing stage that can be seamlessly integrated into existing detection pipelines while maintaining the low computational footprint required for UAV-based edge deployment. The algorithm is of practical value for computer vision systems requiring accurate and robust tracking under varying observational conditions, with the current implementation suitable for offline or buffered processing, and clear pathways to real-time deployment through code optimization. Full article
(This article belongs to the Section Big Data, Computing and Artificial Intelligence)
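The adaptation idea in the abstract above (measurement noise covariance adjusted from the instantaneous innovation magnitude) can be illustrated in one dimension. This is a minimal sketch and not the authors' KGA: the scaling rule for `r`, the use of Goldschmidt division to compute the Kalman gain, the function names, and all parameter values are assumptions made for illustration.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d with Goldschmidt's multiplicative iteration."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if d < 0 else 1.0
    # Scale the divisor into [0.5, 1) so the iteration converges quadratically.
    mantissa, exponent = math.frexp(abs(d))
    n = math.ldexp(n, -exponent)
    d = mantissa
    for _ in range(iterations):
        f = 2.0 - d  # correction factor that drives d towards 1
        n *= f
        d *= f
    return sign * n


def adaptive_kalman_step(x, p, z, q, r_base, alpha=1.0):
    """One predict/update step of a scalar Kalman filter (static state model).

    Hypothetical adaptation rule: the measurement noise r grows with the
    squared innovation, so outlying measurements receive a smaller gain.
    """
    p_pred = p + q                      # predict: state unchanged, variance grows
    innovation = z - x
    r = r_base * (1.0 + alpha * innovation ** 2)
    s = p_pred + r                      # innovation variance
    k = goldschmidt_divide(p_pred, s)   # Kalman gain without a hardware divide
    x_new = x + k * innovation
    p_new = (1.0 - k) * p_pred
    return x_new, p_new, k


# A small innovation is trusted much more than a large (jittery) one.
_, _, gain_small = adaptive_kalman_step(0.0, 1.0, 0.1, q=0.01, r_base=1.0)
_, _, gain_large = adaptive_kalman_step(0.0, 1.0, 10.0, q=0.01, r_base=1.0)
print(gain_small, gain_large)
```

With these assumed settings the gain drops sharply for the large innovation, which is the qualitative behaviour the abstract attributes to the adaptation mechanism.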

19 pages, 1688 KB  
Article
Deep Learning-Based Evaluation of Maxillary Dental Midline Deviation on Orthodontic Frontal Photographs
by Sercan Taskin, Serra Aksoy, Mine Gecgelen Cesur, Pinar Demircioglu and Ismail Bogrekci
Bioengineering 2026, 13(6), 687; https://doi.org/10.3390/bioengineering13060687 - 15 Jun 2026
Viewed by 304
Abstract
Aim: This study aimed to detect the maxillary dental midline region on orthodontic frontal photographs using a YOLOv8-based deep learning approach and to evaluate how the detection outputs affect the classification performance of various machine learning algorithms in distinguishing symmetric from asymmetric midline conditions. Materials and Methods: A total of 146 standardized frontal photographs (72 with midline deviation ≥ 2 mm from the facial midline, defined by the soft-tissue nasion–subnasal line; 74 symmetric) were analyzed. YOLOv8 was used to obtain bounding-box and keypoint predictions, which were converted into a numerical feature vector and used to train 11 classifiers (including Naive Bayes, Logistic Regression with L1 and ElasticNet penalties, Support Vector Machine, AdaBoost, and others). Performance was assessed using accuracy (with 95% Wilson confidence intervals), precision, recall, F1-score, and ROC-AUC. Optimization of hyperparameters for the downstream classifiers employed five-fold cross-validation along with grid search inside the training data set (n = 126) while final classifier assessment was done using a reserved test data set (n = 20). As the YOLOv8 object detector was trained using the full image dataset before extracting features, the classification metrics presented here should be considered as exploratory results only. Results: YOLOv8 achieved mAP@0.5 = 0.995 for midline detection. Naive Bayes attained the highest classification accuracy of 75% (95% CI: 53–89%) with ROC-AUC = 0.75. AdaBoost achieved 65% (95% CI: 43–82%). Several models defaulted to majority-class prediction (accuracy = 40%), indicating insufficient feature discriminability. Conclusions: YOLOv8 detected the maxillary dental midline under the present internal experimental conditions. However, because leakage-free outer k-fold validation of the complete detection-plus-classification pipeline was not performed, the classification results should be considered preliminary. 
Future work should address information leakage, incorporate facial reference frame normalization, include inter-observer reliability assessment, and validate the approach on larger datasets. Full article
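The 95% Wilson confidence intervals quoted in the abstract above can be reproduced from the test-set size. The sketch below assumes that 75% accuracy on the reserved test set (n = 20) means 15 correct predictions and that 65% means 13; the function name and the choice z = 1.96 are illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed counts: 15/20 for Naive Bayes, 13/20 for AdaBoost.
for name, correct in (("Naive Bayes", 15), ("AdaBoost", 13)):
    low, high = wilson_interval(correct, 20)
    print(f"{name}: {100 * low:.0f}% to {100 * high:.0f}%")
```

Under those assumptions the intervals round to 53–89% and 43–82%, matching the reported values; the width of both intervals shows how little a 20-image test set constrains the true accuracy.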

38 pages, 26167 KB  
Article
Uncertainty-Aware Keypoint Guidance and Fractional Fourier Feature Enhancement for Multi-Class SAR Aircraft Detection
by Yu Qiu, Bin Zou, Fangzhou Han, Lamei Zhang and Jordi J. Mallorqui
Remote Sens. 2026, 18(12), 1969; https://doi.org/10.3390/rs18121969 - 13 Jun 2026
Viewed by 118
Abstract
Aircraft targets in SAR imagery often exhibit discrete scattering characteristics, significant variations in pose and scale, strong speckle noise in background clutter, and complex background interference, which jointly hinder stable structural feature extraction and accurate target localization. Existing detectors for SAR aircraft recognition primarily rely on bounding-box regression and classification; they do not completely exploit target structural cues, spatial attention, and frequency-domain information. To address these limitations, we propose a collaborative detection framework that integrates an uncertainty-aware keypoint-driven module (UAKM) with a fractional Fourier convolution backbone (S-FRConv). UAKM introduces a center-keypoint regression branch that jointly predicts keypoint coordinates and Laplacian scale parameters and employs a 2D Laplace negative log-likelihood loss to estimate uncertainty. The derived dense uncertainty heatmap is then used as spatial attention weights to guide distribution-based regression and multi-scale feature re-weighting, without requiring any additional annotations. S-FRConv embeds the Fractional Fourier Transform into shallow backbone layers and C2f modules, enabling joint spatial–spectral feature modeling that suppresses speckle noise and enhances edge and orientation representations. Experiments on the public SAR-AIRcraft-1.0 dataset demonstrate that the proposed method systematically improves the detection performance. For the Nano model, the overall mAP50 increases from 0.810 to 0.867, and the mAP 50:95 improves from 0.637 to 0.655 compared with the baseline, corresponding to gains of 5.7 and 1.8 percentage points, respectively. These results validate the effectiveness and generalization potential of combining uncertainty-driven spatial attention with fractional spectral feature enhancement for SAR aircraft target detection. Full article
(This article belongs to the Special Issue Object Detection in Remote Sensing Imagery)
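The 2D Laplace negative log-likelihood mentioned in the abstract above can be written out for a single keypoint. This is a sketch under the assumption that the two coordinates are modelled as independent Laplace variables with per-axis scale parameters; the paper's exact parameterization is not given in the abstract, and the function name is hypothetical.

```python
import math


def laplace_nll_2d(pred, target, scale):
    """Negative log-likelihood of `target` under independent per-axis Laplace
    distributions centred at `pred` with scale parameters `scale`.

    For one axis the density is exp(-|x - mu| / b) / (2 b), so the NLL is
    log(2 b) + |x - mu| / b; the 2D loss sums both axes.
    """
    total = 0.0
    for mu, x, b in zip(pred, target, scale):
        if b <= 0:
            raise ValueError("scale parameters must be positive")
        total += math.log(2.0 * b) + abs(x - mu) / b
    return total


# A confident (small-scale) prediction is penalised heavily when it is wrong,
# while a larger predicted scale softens the penalty for the same error.
confident = laplace_nll_2d((10.0, 10.0), (14.0, 10.0), (0.5, 0.5))
uncertain = laplace_nll_2d((10.0, 10.0), (14.0, 10.0), (4.0, 4.0))
print(confident, uncertain)
```

The predicted scale acts as a per-keypoint uncertainty estimate, which is the quantity the abstract describes turning into a dense heatmap for spatial attention.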

23 pages, 2117 KB  
Article
A Traffic Police Gesture Recognition Method Based on BiLSTM-Transformer Architecture
by Xiaoyu Zhang, Baohua Guo, Sen Wang, Anthony Sigama and David Bassir
Electronics 2026, 15(12), 2578; https://doi.org/10.3390/electronics15122578 - 11 Jun 2026
Viewed by 217
Abstract
To address the issues of insufficient real-time performance and inadequate modeling of temporal features in traffic police gesture recognition, this paper proposes a method based on skeleton keypoints and hybrid temporal modeling. First, YOLOv11m-Pose is employed to detect human skeleton keypoints in video sequences, extracting reliable two-dimensional skeleton features. Second, this study designs a temporal modeling network that integrates a bidirectional long short-term memory (BiLSTM) with a Transformer. The BiLSTM models local temporal continuity and action transition features between adjacent frames, capturing short-term dynamic changes. The Transformer, through its self-attention mechanism, models global temporal dependencies and weights critical time steps to extract long-range discriminative information. Experimental results demonstrate that the proposed method achieved 98.91% for both Accuracy and F1-Score. In terms of Accuracy, it outperformed the BiLSTM and Transformer models by 2.43% and 7.67%, respectively. It outperforms most methods based on recurrent neural networks and feature fusion. Meanwhile, the model achieves an average inference time of just 1.3299 s per gesture sequence. Consequently, this approach strikes a favorable balance between recognition accuracy and real-time performance, demonstrating significant practical value. Full article
(This article belongs to the Special Issue AI Innovations in Smart Transportation)

28 pages, 22349 KB  
Article
Real-Time Elevation and Orientation-Aware Visual Localization for GNSS-Denied Drone Navigation
by Hadi Fares, Ammar Mohanna and Bilal Kaddouh
Drones 2026, 10(6), 445; https://doi.org/10.3390/drones10060445 - 6 Jun 2026
Viewed by 364
Abstract
Global Navigation Satellite Systems (GNSS)-denied environments pose significant challenges for autonomous drone navigation, requiring robust visual localization systems capable of real-time performance. Existing approaches either sacrifice accuracy for speed or fail to adapt to varying flight altitudes and orientations, limiting their practical deployment. We present Real-Time Elevation and Orientation-Aware Localization Architecture (REOLA), a visual localization system that combines similarity-driven autonomous window sizing, element-wise correlation-based orientation detection, and reinforcement learning with human feedback (RLHF) enhancement for publicly available satellite imagery. On desktop hardware (i7-10700K + RTX 3070), the REOLA achieved approximately 59 FPS performance with sub-5-m accuracy across diverse flight conditions through intelligent similarity-based matching, combined with efficient MobileNet-V3 embeddings and FAISS similarity search. For embedded deployment on NVIDIA Jetson Orin Nano, the system achieved 22.5 FPS, meeting real-time requirements for autonomous drone localization. The system autonomously selects optimal window sizes corresponding to the current elevation and determines drone orientation through element-wise correlation scoring across discrete rotation angles. Enhanced through RLHF, the REOLA achieved a 97.1% success rate (sub-5-m localization) while processing frames in 17 milliseconds on desktop hardware (44.4 ms on embedded hardware), providing a substantial margin over real-time requirements. The approach demonstrates particular superiority over traditional keypoint-based methods in challenging environments with repetitive patterns such as agricultural fields, rocky mountains, dense forests, and grasslands, where conventional keypoint detection struggles. 
We explicitly identify featureless sand dune deserts and open-sea or coastal water flights as out of scope, since the reference satellite imagery in those regimes does not contain stable landmarks. Full article

18 pages, 7896 KB  
Article
DINOv2-Driven Monocular Body Measurement Keypoint Detection for Low-Texture Endangered Binglangjiang Buffalo
by Yuhan Xun, Xingchen Ye, Yinuo He, Bo Hu and Fei Xiong
AgriEngineering 2026, 8(6), 219; https://doi.org/10.3390/agriengineering8060219 - 1 Jun 2026
Viewed by 255
Abstract
The Binglangjiang buffalo, the only indigenous river-type buffalo in China, poses significant challenges for automated keypoint detection due to its uniformly black, low-texture coat, poor foreground–background contrast, and scarcity of annotated training samples. To address these challenges, this study constructs a benchmark dataset of 10,834 lateral-view images covering 424 individuals, annotated with 10 body measurement keypoints following standardized buffalo measurement protocols. A keypoint detection pipeline is developed by adapting DINOv2 with a top-down heatmap regression head under a single-view imaging setup, reducing hardware complexity for practical farm deployment. Benchmarking against YOLOv8 series and a standard ViT baseline shows that DINOv2-Base achieves 96.51% mAP, surpassing YOLOv8m by 5.6 percentage points. Compared to standard ViT, DINOv2 demonstrates more stable localization across keypoints under model scaling. Specifically, on the scapular tip (P8), a particularly low-texture region, DINOv2 exhibits only 0.28% mAP fluctuation versus 0.82% for standard ViT, indicating greater robustness to limited training data and low-contrast imaging. Body measurement validation on 20 individuals yields MAPE values of 1.76–5.69% across five measurements, confirming reliable non-contact measurement performance. The dataset and pipeline provide practical support for precision livestock management of endangered breeds. Full article

15 pages, 8646 KB  
Article
Comparative Evaluation of Histogram Equalization-Based Preprocessing for UAV Thermal–RGB Orthophoto Registration
by Kirim Lee and Wonhee Lee
Geomatics 2026, 6(3), 57; https://doi.org/10.3390/geomatics6030057 - 31 May 2026
Viewed by 198
Abstract
Accurate registration of UAV-derived thermal infrared orthophotos and RGB orthophotos is essential for multi-sensor geospatial analysis, but it remains challenging because thermal imagery generally has lower spatial resolution, weaker texture, and less distinct structural information than RGB imagery. This study comparatively evaluated five histogram equalization methods—histogram equalization (HE), contrast-limited adaptive histogram equalization (CLAHE), brightness-preserving bi-histogram equalization (BBHE), dualistic sub-image histogram equalization (DSIHE), and minimum mean brightness error bi-histogram equalization (MMBEBHE)—for improving AKAZE-based registration of land surface temperature (LST) orthophotos to reference RGB orthophotos. High-accuracy RGB orthophotos generated using GNSS-surveyed ground control points were used as the geometric reference. Thermal data were acquired twice at each of two study sites with contrasting surface characteristics and processed into LST orthophotos. Each histogram equalization method was applied to the LST orthophotos, after which keypoints and descriptors were extracted using AKAZE, tentative correspondences were established, outliers were removed using RANSAC, and an affine transformation was estimated from the inlier correspondences. Here, an inlier denotes a tentative match that remained geometrically consistent after RANSAC-based outlier rejection. The estimated transformation was then applied to the source LST raster to preserve radiometric values in the final corrected product. Performance was assessed using the number of detected keypoints, tentative matches, RANSAC-verified inliers, matching efficiency, reproducibility, and exploratory statistical analysis. Among the five methods, BBHE consistently produced the highest number of inliers and the best matching efficiency at both study sites, while also showing the lowest variability between repeated acquisitions. 
These results indicate that brightness-preserving histogram equalization is particularly effective for thermal–RGB orthophoto registration and can improve the reliability of UAV-derived thermal mapping products for geomatics applications. Full article
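Brightness-preserving bi-histogram equalization (BBHE), the best-performing preprocessing step in the abstract above, splits the histogram at the mean intensity and equalizes each half within its own range. The sketch below follows the common textbook formulation on a flat list of 8-bit values; it is not the authors' implementation, and the integer mean threshold and function name are assumptions.

```python
def bbhe(pixels, levels=256):
    """Brightness-preserving bi-histogram equalization on a flat list of
    integer gray values in [0, levels - 1]."""
    if not pixels:
        return []
    mean = sum(pixels) // len(pixels)  # integer split point (assumed rounding)

    def cdf(values):
        # Cumulative distribution of `values` over all gray levels.
        hist = [0] * levels
        for v in values:
            hist[v] += 1
        out, running = [0.0] * levels, 0
        for level, count in enumerate(hist):
            running += count
            out[level] = running / len(values) if values else 0.0
        return out

    cdf_low = cdf([p for p in pixels if p <= mean])
    cdf_high = cdf([p for p in pixels if p > mean])

    result = []
    for p in pixels:
        if p <= mean:
            # Lower sub-image is stretched over [0, mean].
            result.append(round(mean * cdf_low[p]))
        else:
            # Upper sub-image is stretched over [mean + 1, levels - 1].
            span = (levels - 1) - (mean + 1)
            result.append(round(mean + 1 + span * cdf_high[p]))
    return result


print(bbhe([10, 20, 30, 200, 210, 220]))
```

Because each half stays on its own side of the mean, contrast increases without the large brightness shift that plain histogram equalization can introduce.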

32 pages, 61848 KB  
Article
A Multi-Level Cross-Modal Edge Filtering Method for High-Resolution Optical-SAR Image Registration
by Jinghong Lan, Ziqi Ye, Rui Li, Kunpeng Qiu, Peixuan Li, Xiaorong Guo and Fengming Hu
Remote Sens. 2026, 18(11), 1741; https://doi.org/10.3390/rs18111741 - 28 May 2026
Viewed by 396
Abstract
Optical and Synthetic Aperture Radar (SAR) image registration is a fundamental task in remote sensing information fusion, yet it remains challenging due to significant differences in imaging mechanisms, radiation characteristics, and noise properties between the two modalities. Existing public datasets suffer from limited resolution, small scale, and insufficient scene diversity, and these limitations have hindered algorithm development. This paper constructs a large-scale, high-resolution optical–SAR registration dataset based on the HongTu-1 satellite 3-m SAR imagery and Google Earth optical imagery at zoom level 17, covering diverse scenes across China with a standardized pipeline including terrain correction, geometric alignment, standardized slicing, and quality filtering. Building upon this dataset, a hand-crafted keypoint-based cross-modal registration method is proposed, incorporating multi-level edge filtering and hybrid feature detection. Unlike conventional hand-crafted methods such as RIFT, SRIF, and LNIFT, which mainly refine keypoint detection, description, or matching within a SIFT-style pipeline, the core novelty of this work lies in SAR-specific preprocessing and multi-level hybrid filtering. These components are designed to suppress speckle while extracting more stable and discriminative shared edge responses for cross-modal registration. An improved Log-domain Total Variation (Log-TV) denoising model is introduced for SAR preprocessing. A hybrid edge filtering framework combining phase congruency analysis and Structured Random Forest (SRF) edge detection is constructed within a Gaussian scale space. A dual-branch feature detection scheme integrating blob and corner features is designed with a robust orientation assignment strategy. Feature description uses the Gradient Location–Orientation Histogram (GLOH) descriptor with Principal Component Analysis (PCA) reduction, while geometric estimation employs the Fast Sample Consensus (FSC) algorithm. 
Experiments on the self-constructed HT dataset and on the public OSdataset and SAR2Opt benchmarks show that the proposed method consistently achieves low RMSE and high success rates. It also maintains competitive efficiency among hand-crafted methods while retaining strong robustness to scale and rotation variations. Full article

23 pages, 27208 KB  
Article
StrawPose-Lite: A Lightweight Pose Network for Strawberry Picking Point Prediction on Edge Devices
by Haojiang Liu, Yunsen Liang, Qile He, Bingbing Li, Wanshu Wang, Hongyu He, Yaoxue Xu, Yujie Yao, Xiangyu Cao, Yongqi Yin, Xuliang Duan and Tao Pang
Agriculture 2026, 16(11), 1185; https://doi.org/10.3390/agriculture16111185 - 28 May 2026
Viewed by 468
Abstract
Strawberry harvesting perception in greenhouse environments requires visual models that remain reliable under occlusion while staying compact enough for edge-side inference. To address this requirement, this study develops StrawPose-Lite, a lightweight pose network for strawberry picking point prediction based on YOLOv11n-pose. The network combines ADown and C3Ghost to reduce redundant computation while preserving informative structure, and it adopts a six-keypoint pose definition derived from strawberry phenotypic characteristics. In this representation, the pedicel–fruit junction is used as the final visual picking point, whereas the remaining peak, curvature, and bottom keypoints provide geometric support when the visible contour is incomplete. The keypoint branch is further enhanced by P2-guided multi-scale fusion and SimAM-based refinement to improve sensitivity to fine pedicel-related cues under strict lightweight constraints. On the public validation split, StrawPose-Lite contains 0.73 M parameters and requires 3.0 GFLOPs while achieving a pose mAP@0.5:0.95 of 79.2%. In the independent field deployment set, the TensorRT INT8 version achieved a pure network inference throughput of 277 FPS on a Jetson Orin NX 16G Super platform, with a measured total software latency of 5.01 ms under the embedded pipeline. These results indicate that StrawPose-Lite provides an effective balance between pose accuracy, model compactness, and edge-side inference speed for strawberry picking point perception on edge devices. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

20 pages, 4796 KB  
Article
UHPose-VAD: Unsupervised Video Anomaly Detection via Pose-Graph Learning and Normalizing Flow
by Di Jiang, Huicheng Lai, Guxue Gao, Dan Ma and Liejun Wang
J. Imaging 2026, 12(6), 227; https://doi.org/10.3390/jimaging12060227 - 27 May 2026
Viewed by 442
Abstract
Unsupervised video anomaly detection (VAD) aims to identify unusual events by learning from unlabeled videos. However, many current methods overlook the fine-grained spatiotemporal dynamics of human poses, which are crucial for detecting localized anomalies like falls or assaults. Prevailing methods that rely on raw RGB frames are often susceptible to variations in lighting and background and struggle to capture the precise structural relationships of human bodies over time. To bridge this gap, we propose UHPose-VAD, a novel unsupervised framework that integrates human pose dynamics with normalizing flow within a graph-based probabilistic model to capture anomalies through spatiotemporal Gaussian distributions. Our framework first extracts human pose keypoints and normalizing flow features. These are then modeled by a graph convolutional network that adaptively learns the graph connectivity, effectively mapping the data to a latent space. This approach allows the model to explicitly reason about the spatiotemporal relationships between body joints, making it inherently more robust and interpretable for human-centric anomaly detection. Finally, a Gaussian Mixture Model fits the latent features of normal training data, learning the intrinsic manifold of regular motion patterns. Extensive experiments on ShanghaiTech and UBnormal datasets show that UHPose-VAD achieves state-of-the-art performance among unsupervised methods, with AUC scores of 86.1% and 69.4%, respectively. Full article
(This article belongs to the Special Issue From Visual Perception to Spatiotemporal Understanding)

25 pages, 17748 KB  
Article
Keypoint-Based Forest Musk Deer Behavioral Recognition Method
by Dequan Guo, Chuankang Chen, Chengli Zheng, Zhenyu Wang, Dapeng Zhang and Dening Luo
Animals 2026, 16(11), 1594; https://doi.org/10.3390/ani16111594 - 23 May 2026
Viewed by 653
Abstract
The traditional monitoring of forest musk deer behavior primarily relies on direct human observation or the post hoc playback analysis of ordinary surveillance videos. This approach is not only time-consuming and labor-intensive but also highly subjective, easily leading to missing or misjudged critical behavioral information. Moreover, it is difficult to achieve real-time monitoring and anomaly warning. These limitations severely constrain the efficiency of the large-scale artificial breeding of forest musk deer and the effective advancement of wild population conservation. Thus, this study proposes a forest musk deer behavioral recognition method based on an improved YOLOv8-Pose. A forest musk deer behavior image dataset covering four typical behaviors was constructed, and 18 keypoints were systematically annotated. This study designs a Dilated Spatial Pyramid Pooling-Fast (DILATED-SPPF) module and a Multi-scale Depthwise Separable Context Mixer (MDSC-Mixer) module, and integrates them into YOLOv8-Pose. Experimental results show that the improved model outperforms the original YOLOv8-Pose and comparison models such as YOLOv11/v12-Pose on key metrics of object detection (Box-mAP50 0.929, Box-mAP50-95 0.814) and pose estimation (Pose-mAP50 0.879, Pose-mAP50-95 0.565). This study further develops a visual interactive interface that intuitively presents detection results and skeleton structures. This work provides a high-precision, low-cost automated behavior analysis tool for the artificial breeding and wild conservation of forest musk deer with significant application value for enhancing the intelligence level of endangered species protection. Full article

24 pages, 8701 KB  
Article
SY-SLAM: Real-Time Dynamic Indoor RGB-D SLAM with SuperPoint Detection and Asynchronous YOLOv8s-Based Keypoint Suppression
by Shaoshuai Zhi, Shuangfeng Wei, Shan Zhou, Yulan Lao, Mingyang Zhai, Tianyu Yang, Keming Qu and Boyan Jiang
Sensors 2026, 26(11), 3315; https://doi.org/10.3390/s26113315 - 23 May 2026
Viewed by 389
Abstract
Traditional visual SLAM pipelines are typically designed under the static-world assumption and often degrade severely in indoor environments with frequent human motion. To improve trajectory accuracy and front-end stability in such scenarios while maintaining real-time throughput, we present SY-SLAM, an RGB-D SLAM system for dynamic indoor environments with frequent human motion. (S stands for SuperPoint, which is used as a detector-only learned keypoint front-end, and Y stands for YOLO, which provides asynchronous person-aware keypoint suppression based on detected human bounding boxes.) We integrate a TensorRT-deployed detector-only SuperPoint module to improve keypoint repeatability and robustness while retaining ORB binary descriptors for efficient matching and place recognition within the ORB-SLAM3 framework. To avoid feature starvation while preserving keypoint quality, we further introduce an adaptive SuperPoint keypoint selection strategy that applies stricter filtering when keypoints are abundant and relaxes the selection constraints when they are scarce. In parallel, an asynchronous YOLOv8s TensorRT thread performs person detection with temporal bounding-box memory, and keypoints inside detected person regions are removed before ORB descriptor computation and matching to reduce dynamic-feature contamination in the front end. We evaluate SY-SLAM on five dynamic TUM RGB-D fr3 sequences using ATE and RPE metrics. Compared with ORB-SLAM3, SY-SLAM reduces ATE RMSE by 93.45% across four dynamic walking sequences. On the widely reported fr3/w/x sequence, SY-SLAM achieves competitive accuracy with recent dynamic SLAM methods while maintaining real-time performance. The system runs in real time at 46.8 Hz (21.36 ms per frame) on an Intel i9-13900H CPU with an NVIDIA RTX 4070 Laptop GPU. Full article
(This article belongs to the Section Sensors and Robotics)

20 pages, 34863 KB  
Article
An Enhanced Image Feature Extraction and Matching Method for Three-Dimensional Reconstruction of Forest Scenes
by Hangui Wang and Hongyu Huang
Remote Sens. 2026, 18(11), 1681; https://doi.org/10.3390/rs18111681 - 22 May 2026
Viewed by 278
Abstract
Accurate and efficient 3D reconstruction of trees is of paramount importance for studying forest spatial structures and dynamic resource patterns, optimizing forest management, protecting environments, and analyzing carbon cycles. Currently, Light Detection and Ranging (LiDAR) remains the dominant method for generating 3D models of forest scenes. However, with advancements in computer vision, photogrammetry has emerged as a crucial tool for forest inventory and 3D reconstruction due to its cost-effectiveness. Nevertheless, in practical forestry applications, traditional photogrammetry often suffers from low reconstruction efficiency and poor quality during feature extraction and matching. These issues stem from the complex structure of forest scenes, severe occlusion, and repetitive texture patterns. To address these challenges, this paper proposes an improved image-based 3D tree reconstruction approach that integrates deep learning-based methods. In the sparse reconstruction stage, we utilize the ALIKED (A LIghter Keypoint and descriptor Extraction network with Deformable transformation) algorithm and construct an image pyramid to extract multi-scale robust features. Furthermore, by combining the LightGlue matching algorithm with a neighborhood search constraint strategy, we enhance the stability of camera pose recovery while reducing redundant computations. Experimental results demonstrate that our method outperforms traditional algorithms in both image-matching accuracy and robustness. Compared to baseline models, the proposed approach increases the number of feature points by approximately 50% with a more widespread distribution, improves matching accuracy by 4% to 8%, and achieves a 100% image registration rate. Consequently, at equivalent re-projection errors, the subsequent sparse point clouds exhibit an average track length increase of 0.6 to 1.4 and a density increase of up to 1.2 times. Notably, this method effectively mitigates artifacts and spurious reconstructions caused by pose drift in forest photogrammetry.
(This article belongs to the Special Issue Digital Modeling for Sustainable Forest Management)

18 pages, 4212 KB  
Article
AHSC-Net: A Fish Pose Estimation Method for Intelligent Monitoring in Precision Aquaculture
by Xiaohong Peng, Ronghan Lu, Zhuohan Xiao and Xiaohan Chen
Fishes 2026, 11(5), 308; https://doi.org/10.3390/fishes11050308 - 21 May 2026
Viewed by 325
Abstract
In aquaculture, fish physiological information serves as the foundation for behavior recognition, precise feeding, and health monitoring. The acquisition of such information relies on accurate keypoint detection and pose estimation of the fish body. To address the challenges caused by inter-occlusion among fish schools and blurred keypoint boundaries in underwater environments, a novel fish pose estimation method based on the Adaptive-kernel Hybrid-center Structural Constraint Network (AHSC-Net) is proposed. Optimized specifically for the characteristics of fish poses, the proposed method effectively enhances detection accuracy and robustness in complex underwater scenarios. First, a Stochastic Local Centroid Sampling (SLCS) strategy is introduced: by simulating centroid positions in occluded samples, it enhances the model’s ability to detect partially occluded fish. Next, a Spatial-Awareness Enhanced Pose Structural Constraint (SAPSC) is established through coordinate embedding and morphological constraints to keep the predicted poses plausible. Furthermore, an Adaptive Kernel Modulation Module (AKMM) is designed to dynamically adjust the Gaussian kernel distribution, effectively addressing challenges posed by underwater blurring and variations in fish scales. Experimental results demonstrate that AHSC-Net achieves 92.0% AP and 94.6% AR on a self-constructed largemouth bass dataset, outperforming state-of-the-art methods such as HRNet, HigherHRNet, DEKR, and YOLO-Pose. This study presents a fish pose estimation method that provides effective technical support for automated and precise monitoring in aquaculture.
(This article belongs to the Special Issue Computer Vision Applications for Fisheries and Aquaculture)