MDPI - Publisher of Open Access Journals

28 pages, 10424 KB

Open Access Article

Distance-Aware DBSCAN–STM Pipeline with Centralized Point Augmentation for LiDAR-Based Pedestrian Candidate Generation

by Jihwan Yeom, Jinman Kim and Joongjin Kook

Appl. Sci. 2026, 16(13), 6286; https://doi.org/10.3390/app16136286 (registering DOI) - 23 Jun 2026

Viewed by 60

This paper presents a non-learning-based, seed-dependent, semi-automatic pedestrian candidate generation pipeline for LiDAR point clouds. The proposed method is designed to support 3D annotation workflows by reducing irrelevant candidate clusters while improving the reliability of pedestrian candidate selection under distance-dependent point sparsity. The pipeline integrates distance-aware DBSCAN clustering, Single Template Matching (STM), and Centralized Point Augmentation (CPA). First, LiDAR points within the camera field of view are preprocessed, and pedestrian candidate clusters are generated using DBSCAN parameters configured according to distance intervals. Ground-snapping-based bounding-box refinement and height-based filtering are then applied to improve geometric consistency and reduce non-pedestrian candidates. In the second stage, STM compares PCA-aligned projected silhouettes of candidate clusters with a seed pedestrian template to suppress false positives. To address silhouette instability caused by sparse mid-range pedestrian points, CPA adds centroid-contracted points in the projection-relevant plane before template matching. Experiments on pedestrian-containing frames from the KITTI dataset show that STM improves precision from 27.6% to 60.5% and increases the F1-score from 36.8% to 51.4% compared with the initial DBSCAN-based candidate generation stage. The final CPA configuration improves recall from 44.7% to 46.7% and the overall F1-score from 51.4% to 52.1%, while revealing a precision–recall trade-off. Supplementary IoU analysis shows that the final DBSCAN–STM–CPA configuration maintains meaningful spatial overlap with pedestrian ground-truth boxes, achieving 88.9% at 3D IoU ≥ 0.10 and 81.6% at BEV IoU ≥ 0.25. Runtime analysis further shows that height-based filtering reduces the average per-frame processing time from 151.5 ms to 125.1 ms, while the final CPA configuration introduces only a small overhead, resulting in 126.2 ms per frame. 
These results demonstrate that the proposed DBSCAN–STM–CPA pipeline can provide reliable pedestrian candidates for semi-automatic 3D labeling without requiring class-specific detector training.
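
The precision, recall, and F1 figures quoted in this abstract can be cross-checked with the standard F1 definition. The sketch below is only an arithmetic check; the function names are illustrative, and the CPA-stage precision it prints is not reported in the abstract but back-solved from rounded values, so it should be read as approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Invert the F1 definition to recover precision from F1 and recall."""
    return f1 * recall / (2 * recall - f1)


# STM stage: precision 60.5 % and recall 44.7 %, as quoted in the abstract.
stm_f1 = f1_score(0.605, 0.447)

# CPA stage: the abstract gives recall 46.7 % and F1 52.1 % but no precision,
# so this value is an inference from rounded numbers, not a reported result.
cpa_precision = precision_from_f1(0.521, 0.467)

print(f"STM F1 = {stm_f1:.3f}")
print(f"implied CPA precision = {cpa_precision:.3f}")
```

The STM-stage F1 reproduces the quoted 51.4%, and the back-solved CPA precision falls slightly below the STM-stage 60.5%, which is consistent with the precision–recall trade-off the abstract mentions.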

(This article belongs to the Section Computing and Artificial Intelligence)

29 pages, 2096 KB

Open Access Article

Bearing-Only Three-UAV Cooperative Target Localization with Adaptive Weighting and Configuration Optimization

by Kangkang Li, Haodong Sun, Chao Cheng, Zhongjing Ren, Jianping Yuan and Mengbi Wang

Aerospace 2026, 13(6), 564; https://doi.org/10.3390/aerospace13060564 (registering DOI) - 22 Jun 2026

Viewed by 84

Abstract

This paper addresses bearing-only three-dimensional target localization using three cooperative UAVs under observation inconsistency and degraded geometry. A weighted point-to-line least-squares localization model is established to fuse multiple line-of-sight (LOS) observations derived from image measurements, camera calibration, and UAV poses. To handle unreliable measurements without ground truth, a reliability assessment mechanism is developed by combining geometric stability indicators with observation consistency metrics, enabling weak geometry and abnormal observations to be identified online. Based on this assessment, an adaptive optimization framework is introduced to perform residual-driven adaptive weighting and configuration optimization, thereby suppressing unreliable LOS measurements and improving the conditioning of cooperative geometry. Simulation results under four representative scenarios show that the proposed method consistently improves localization accuracy and robustness. The mean localization error is reduced from 0.545 m to 0.260 m under abnormal observations, from 0.355 m to 0.081 m under degraded geometry, and from 0.711 m to 0.280 m when both effects occur simultaneously. Statistical evaluations including RMSE, standard deviation, maximum error, confidence intervals, and box-plot analysis further demonstrate that the proposed framework effectively reduces error dispersion and improves robustness.
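
A weighted point-to-line least-squares problem of the kind named in this abstract has a well-known closed form: the point minimizing the weighted squared distances to a set of 3D lines solves a 3×3 linear system built from the projectors I − d dᵀ. The sketch below illustrates that textbook formulation only. The function names and the fixed weights are assumptions; the paper's residual-driven weighting and configuration optimization are not reproduced here.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate geometry: LOS directions are (near) parallel")
    solution = []
    for k in range(3):
        # Replace column k of a with b, then take the determinant ratio.
        ak = [[b[r] if c == k else a[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(ak) / d)
    return solution


def weighted_los_intersection(origins, directions, weights):
    """Point minimizing sum_i w_i * dist(point, line_i)^2.

    Line i passes through origins[i] along directions[i] (any non-zero length).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d, w in zip(origins, directions, weights):
        norm = sum(c * c for c in d) ** 0.5
        u = [c / norm for c in d]
        for r in range(3):
            for c in range(3):
                # Entry of w * (I - u u^T), the projector orthogonal to the line.
                m = w * ((1.0 if r == c else 0.0) - u[r] * u[c])
                a[r][c] += m
                b[r] += m * o[c]
    return solve3(a, b)


# Hypothetical example: three UAV positions observing a target at (1, 2, 3).
target = (1.0, 2.0, 3.0)
uavs = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
los = [tuple(t - p for t, p in zip(target, uav)) for uav in uavs]
estimate = weighted_los_intersection(uavs, los, [1.0, 1.0, 1.0])
print(estimate)
```

Lowering the weight of one observation reduces its pull on the estimate, which is the mechanism an adaptive weighting scheme would exploit. The determinant check also makes the degraded-geometry case visible: near-parallel lines of sight leave the system poorly conditioned.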

(This article belongs to the Section Aeronautics)

27 pages, 2238 KB

Open Access Article

Camera-Trap Assessment of Terrestrial Mammals and Ground-Dwelling Birds in the Zhangjiajie Chinese Giant Salamander National Nature Reserve, China

by Chenbo Huang, Ying Wei, Zhiyong Deng, Cheng Wang, Pengchen Zhou, Xinyu Cui, Bin Wang and Xiaoyang Mo

Animals 2026, 16(12), 1935; https://doi.org/10.3390/ani16121935 (registering DOI) - 22 Jun 2026

Viewed by 98

Abstract

Baseline information on terrestrial wildlife communities and their activity patterns is essential for protected-area management, but such information remains limited for Hunan Zhangjiajie Giant Salamander National Nature Reserve, where conservation attention has historically focused on the Chinese giant salamander and associated aquatic ecosystems. From March 2024 to August 2025, we conducted a camera-trap survey in broad-leaved and coniferous forest habitats of the reserve to document terrestrial mammals and ground-dwelling birds, evaluate taxonomic completeness, and describe diel and seasonal activity patterns. Across 43 camera-trap stations and 16,314 effective camera-trap days, we recorded 59 wildlife species, including 18 mammals and 41 ground-dwelling birds. The assemblage included nationally protected, threatened, and Chinese endemic species, indicating that the reserve’s forest habitats support important terrestrial biodiversity in addition to its aquatic conservation target. Taxonomic completeness curves suggested that the current survey captured most camera-detectable mammal and ground-dwelling bird taxa under the present sampling design, although the results should not be interpreted as a complete inventory of the reserve’s total vertebrate diversity. Annual diel activity analysis of 11 focal species showed clear temporal differentiation among ecological groups: small and medium-sized carnivores were mainly nocturnal, ground-dwelling birds and the red-hipped squirrel were primarily diurnal, and ungulates showed mixed or crepuscular-to-nocturnal tendencies. Seasonal analyses based on bioclimatic periods showed interspecific differences in activity-density distributions between the cool-dry and warm-wet seasons. However, peak-shift reliability analysis indicated that most focal species retained broadly similar main activity peaks across seasons; masked palm civet was the only species showing reliable seasonal displacement of its main activity peak. 
Pairwise temporal overlap analyses described temporal co-occurrence patterns among selected sympatric species but should not be interpreted as evidence of direct interaction or niche differentiation. Overall, this study provides baseline data on camera-detected terrestrial vertebrates in the reserve and supports long-term monitoring, forest habitat management, and disturbance control for terrestrial mammals and ground-dwelling birds.

(This article belongs to the Topic Wildlife Intelligent Monitoring: Advancing Conservation Through Visual and Acoustic Monitoring Technologies)

30 pages, 43797 KB

Open Access Article

Modular Framework for Responsive and Explainable Robotic Assistance with Intention Prediction Using Human-Centric Digital Twins

by Usman Asad, Azfar Khalid, Waqas Akbar Lughmani, Shummaila Rasheed and Muhammad Mahabat Khan

Sensors 2026, 26(12), 3810; https://doi.org/10.3390/s26123810 - 15 Jun 2026

Viewed by 294

Abstract

Proactive robotic assistance in human–robot collaboration (HRC) requires systems that can perceive evolving task contexts, anticipate user needs, and intervene appropriately without disrupting human workflow. We present the Agentic Unified Robotic Assistance (AURA) Framework, which couples Large Language Model (LLM) reasoning grounded by Standard Operating Procedures (SOPs) with a modular layer of specialized Intent, Motion, Perception, Sound, Affordance, and Performance Monitors that supply structured context to a central decision-making module, making the framework reconfigurable and auditable without retraining or re-prompting. We introduce a human-in-the-loop teleoperation data collection methodology and an offline evaluation scheme with an Appropriateness Score (A-Score) tailored to proactive intervention timing, and release a benchmark dataset of annotated multimodal HRC episodes containing workspace and robot wrist camera videos, robot joint states, and labeled intervention events. Across three tasks of varying complexity, we observe progressive gains in intent prediction and decision-making as the modules are supplied with richer grounded context (prior-state memory and tracked object locations), with Combined F1 rising by over 20 points between context-poor and context-rich conditions. The structured grounding allows lightweight multimodal backbones such as Gemini 3.1 Flash Lite to perform on par with heavier reasoning-tier models at roughly one-fifth the inference latency. Together, these contributions establish a scalable framework, benchmark, and evaluation methodology for advancing proactive robotic assistance in collaborative environments.

(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)

20 pages, 4434 KB

Open Access Article

Feasibility Assessment of a High-Altitude Tethered-Balloon Optical Imaging System for LEO Space Debris Monitoring

by Kunpeng Wang, Fengbiao Ji, Yongfei Gao, Gongmin Yu and Dongyu Li

Appl. Sci. 2026, 16(12), 6053; https://doi.org/10.3390/app16126053 - 15 Jun 2026

Viewed by 117

Abstract

To support low Earth orbit (LEO) debris monitoring, this paper investigates a tethered-balloon-based optical observation concept intended to complement ground- and space-based sensors. The system comprises a high-altitude tethered aerostat, an optical payload, a three-axis stabilization subsystem, and a ground control station. Key payload parameters, including field of view, spatial resolution, and atmospheric transmittance, are analyzed, and the configuration is examined in terms of spectral-band selection, aperture, and multi-camera mosaic imaging. A multi-station angular-measurement model and a weighted least-squares estimator are developed for debris localization. Monte Carlo and scenario-based simulations indicate that a wide field of view can increase observation duration and availability, with mean continuous observation arcs exceeding 400 s, thereby improving estimator conditioning and localization performance. A 5 km flight experiment further validates the operability of the SWIR imaging chain through star-field imaging and a representative image-sequence example with a highlighted moving point-source target. The results suggest that tethered balloons can provide a cost-effective and rapidly deployable supplementary observation layer for multi-layer space situational awareness.

(This article belongs to the Section Aerospace Science and Engineering)

35 pages, 8823 KB

Open Access Article

A Semantic-Enhanced Multi-Source Fusion Localization Method for GNSS-Degraded Environments

by Haobo Zhao and Xinhua Tang

Sensors 2026, 26(12), 3761; https://doi.org/10.3390/s26123761 - 12 Jun 2026

Viewed by 230

Abstract

In complex urban environments, Global Navigation Satellite System (GNSS) signals are easily affected by building blockage and multipath effects, which may degrade positioning quality or even cause GNSS denial. As a result, conventional integrated navigation systems suffer from accumulated errors due to insufficient global constraints. To address this problem, a multi-source integrated positioning method incorporating semantic information is proposed. Fixed traffic lights are selected as semantic landmarks, and an object detection network is used to extract the center pixel coordinates and detection confidence of the landmarks. Then, by combining depth information, camera pose, and the prior global coordinates of fixed semantic landmarks, a semantic target inversion model is established to transform two-dimensional image information into three-dimensional position estimates in the world coordinate system. Semantic factors are further constructed and incorporated into backend factor graph optimization. To determine the weighting of semantic factors, the influences of pixel localization error, depth estimation error, camera pose error, and prior coordinate error of fixed semantic landmarks on semantic observations are analyzed, and a noise covariance model for semantic factors is established. Finally, an unmanned ground vehicle experimental platform is built to validate and analyze the proposed factor graph algorithm. The experimental results show that, under GNSS-degraded conditions, the algorithm with semantic factors can provide supplementary global constraints for the system and effectively suppress accumulated positioning errors. In Experiment 1, compared with the algorithm without semantic factors, the maximum absolute trajectory error is reduced by 46.26%. To further verify the applicability of the proposed method in more complex scenarios, Experiment 2 is conducted on a longer route with multiple semantic landmarks and a more severe GNSS-degraded interval. 
The results show that the proposed method reduces the maximum APE from 6.5432 m to 3.4778 m, corresponding to a reduction of approximately 46.85%. These results demonstrate that the proposed semantic factor can improve the robustness of multi-source fusion localization in GNSS-degraded environments.
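
The step that turns a detected landmark's pixel coordinates, depth, and camera pose into a world-frame position can be illustrated with a standard pinhole back-projection. This is a minimal sketch under assumed conventions, not the paper's inversion model: the intrinsics (`fx`, `fy`, `cx`, `cy`), the camera-to-world rotation `R` and translation `t`, and the function name are all hypothetical.

```python
def pixel_to_world(u, v, depth, fx, fy, cx, cy, R, t):
    """Back-project a pixel with known depth into the world frame.

    Assumes a pinhole camera with the z axis pointing forward, depth measured
    along z, and (R, t) mapping camera coordinates to world coordinates.
    """
    # Camera-frame point from the pinhole model.
    cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform into the world frame: X_w = R @ X_c + t.
    return tuple(sum(R[r][c] * cam[c] for c in range(3)) + t[r] for r in range(3))


# Hypothetical values: identity rotation, camera located at (1, 2, 3).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
landmark = pixel_to_world(420, 240, 10.0, 500.0, 500.0, 320.0, 240.0,
                          identity, (1.0, 2.0, 3.0))
print(landmark)
```

In a factor-graph setting, the difference between such an estimate and the landmark's prior global coordinates would form the residual of a semantic factor, with its covariance reflecting the pixel, depth, pose, and prior-coordinate errors listed in the abstract.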

(This article belongs to the Special Issue Multi-Sensor Technology for Tracking, Positioning and Navigation)

29 pages, 15618 KB

Open Access Article

Automated Mapping of Periglacial Landforms on Mars’ Utopia Planitia Using a Multi-Scale Texture-Enhanced U-Net

by Xiaoyi Chang, Shuanggen Jin and Yanchao Zheng

Sensors 2026, 26(12), 3653; https://doi.org/10.3390/s26123653 - 8 Jun 2026

Viewed by 348

Abstract

Martian periglacial landforms are among the clearest surface clues for investigating ground-ice occurrence, climate evolution, and potential habitability on Mars. Utopia Planitia contains abundant ice-related landforms and is therefore well suited to regional-scale mapping of periglacial features. However, most existing identifications still rely heavily on manual interpretation, which is time-consuming and difficult to keep consistent across large image mosaics. In this paper, using Context Camera (CTX) imagery, a dataset of four representative landform types in Utopia Planitia, namely flat-floored depressions, thermal contraction cracks, scalloped depressions, and brain terrain, was built. A Multi-scale Texture-enhanced U-Net (MTU-Net) was then developed as an automated and standardized mapping solution for semantic segmentation of these landforms. The model incorporates hierarchical attention and multi-scale texture enhancement modules, enabling recognition under complex backgrounds where fine-scale landforms such as thermal contraction cracks and brain terrain exhibit only weak textural details, alongside large scale variations. On the held-out test set, MTU-Net reaches a mean intersection over union (mIoU) of 89.55%, a mean F1-score of 94.71%, and a Kappa coefficient of 91.21%, outperforming the baseline U-Net under the same evaluation protocol. The resulting regional maps show marked spatial heterogeneity in the occurrence of the four landform types across Utopia Planitia. This study provides a methodological basis for automated periglacial landform mapping on Mars.
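
The three reported segmentation metrics (mIoU, mean F1-score, and the Kappa coefficient) are all functions of a pixel-level confusion matrix. The sketch below shows the usual definitions. The function name and the 2×2 example matrix are illustrative only, not data from the paper, and the code assumes every class appears at least once.

```python
def segmentation_metrics(confusion):
    """Return (mIoU, mean F1, Cohen's kappa) from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Assumes every class occurs at least once (no zero denominators).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    ious, f1s = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = row_sums[k] - tp  # true class k, predicted as something else
        fp = col_sums[k] - tp  # predicted class k, actually something else
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))

    observed = sum(confusion[k][k] for k in range(n)) / total
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return sum(ious) / n, sum(f1s) / n, kappa


# Illustrative two-class matrix (not data from the paper).
miou, mean_f1, kappa = segmentation_metrics([[40, 10], [5, 45]])
print(f"mIoU={miou:.4f}  mean F1={mean_f1:.4f}  kappa={kappa:.4f}")
```

Because per-class F1 equals 2·IoU / (1 + IoU), mean F1 always sits at or above mIoU, which matches the ordering of the values quoted in the abstract.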

(This article belongs to the Section Environmental Sensing)

28 pages, 36695 KB

Open Access Article

Leaf Angle Distribution Effects on Modelling Accuracy of Sensible and Latent Heat Fluxes in Sunflower and Wheat Crops

by Krisztina Pintér and Zoltán Nagy

Remote Sens. 2026, 18(11), 1732; https://doi.org/10.3390/rs18111732 - 27 May 2026

Viewed by 214

Abstract

The two-source energy balance model pyTSEB-PT was used to model latent heat fluxes from sunflower and wheat crops before senescence, grown on the same field in consecutive years. Input maps for the pyTSEB model were prepared using UAV-acquired multispectral/thermal imagery and ground control canopy leaf angle distribution (χ) and leaf area index (LAI) estimations based on canopy light transmission measurements by linear ceptometers. The modelled sensible and latent heat fluxes (H_pyTSEB, LE_pyTSEB) were validated against eddy covariance-measured respective fluxes (H_eddy, LE_eddy). Actual χ (χ_a) was estimated from 2 h courses of canopy light transmission values and ranged between 0.5 and 1.2 for wheat and between 2.8 and 5.8 for sunflower crops, respectively, affecting canopy light extinction coefficients (k) and LAI in both crops compared to the case of the generally assumed spherical leaf angle distribution (χ = 1). Vegetation cover fraction (f_c) was 3.4% smaller in wheat when using χ_a instead of χ₁, but this led to only minor, though significant, changes in modelled T_can, T_soil and canopy and surface resistances. The effect of leaf angle distribution on the combined validation of sensible and latent heat flux data was shown primarily in sunflower due to the decrease in sensible heat flux error, while validation improvement was not detectable in the case of wheat. Using field-calibrated thermal images instead of uncalibrated ones strongly improved validation results (fit of modelled vs. measured sensible and latent heat fluxes), showing the necessity of field calibration of the thermal camera when the data are used for vegetation energy balance modelling.
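
The dependence of the extinction coefficient k on χ can be made concrete with Campbell's ellipsoidal leaf-angle model, in which χ = 1 corresponds to a spherical distribution. The abstract does not state which formulation the authors used, so this is an assumed illustration. The Beer–Lambert inversion for LAI is likewise a simplification of what ceptometer processing normally does, and both function names are hypothetical.

```python
import math


def extinction_coefficient(chi: float, zenith_deg: float) -> float:
    """Beam extinction coefficient k for an ellipsoidal leaf angle distribution.

    Uses Campbell's (1986) approximation; chi = 1 gives the spherical case,
    chi > 1 more horizontal leaves, chi < 1 more erect leaves.
    """
    theta = math.radians(zenith_deg)
    numerator = math.sqrt(chi ** 2 + math.tan(theta) ** 2)
    denominator = chi + 1.774 * (chi + 1.182) ** -0.733
    return numerator / denominator


def lai_from_transmission(tau: float, k: float) -> float:
    """Invert Beer-Lambert attenuation, tau = exp(-k * LAI).

    Simplified: ignores diffuse radiation and leaf absorptivity corrections.
    """
    return -math.log(tau) / k


# Compare the spherical assumption with mid-range chi values quoted for each crop.
for label, chi in [("spherical", 1.0), ("wheat-like", 0.8), ("sunflower-like", 4.0)]:
    k = extinction_coefficient(chi, zenith_deg=30.0)
    print(f"{label:15s} chi={chi:.1f}  k={k:.3f}  LAI at tau=0.2: "
          f"{lai_from_transmission(0.2, k):.2f}")
```

For the same measured transmission, a larger k implies a smaller LAI. Assuming χ = 1 for a canopy with mostly horizontal leaves, such as the sunflower range quoted above, would therefore overestimate LAI under this simplified model.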

(This article belongs to the Special Issue High-Throughput Phenotyping in Plants Using Remote Sensing)

37 pages, 6289 KB

Open Access Article

An Indoor Occupancy Detection Method and Application by Fusing Field-of-View Information and Events with a Single Camera

by Pengchen Chen, Chuang Wang and Jingjing An

Buildings 2026, 16(11), 2133; https://doi.org/10.3390/buildings16112133 - 26 May 2026

Viewed by 265

Abstract

Accurate and stable indoor occupancy information is essential for occupant-based intelligent ventilation control. Under a single-camera setting, existing indoor occupancy detection methods commonly suffer from missed detections caused by occlusion and blind zones, false detections caused by people outside the room, and cumulative entry–exit errors that are difficult to correct. These problems lead to false fluctuations in detected occupancy, affect control performance, and may further reduce indoor comfort or cause unnecessary energy use. To address the practical situation in which indoor spaces are commonly equipped with a single security camera, this study proposes an indoor occupancy detection method by fusing field-of-view information and entry–exit events with a single camera. The study covers method development, multi-scenario validation, parameter analysis, and a ventilation control application. The proposed method uses YOLOv8x and DeepSORT as front-end models and performs post-processing on their outputs to extract field-of-view occupancy information, entry–exit events, and blind-zone events. An occupancy confirmation and correction module is then constructed. The blind-zone event mechanism reduces the influence of missed entry–exit events and camera blind zones on occupancy judgment. The correction module integrates frame-by-frame ID counts, historical outputs, and multiple event signals to verify and suppress false occupancy changes caused by false detections, missed detections, and blind zones, thereby producing more stable indoor occupancy results. Experimental results show that the proposed method outperforms the baseline methods based on front-end object detection and tracking in terms of score, RMSE, and F1 score in three typical scenarios: an office, a home, and a classroom. In the office scenario, the proposed method achieved a score of 99.36%, an RMSE of 0.081, and an F1 score of 0.781. 
The detection stability was also improved in the home and classroom scenarios. In the high-density and strongly occluded classroom scenario, the absolute detection performance of the fusion-based detection method was limited by the front-end models, indicating that the method still has certain applicability boundaries in complex high-density scenes. Parameter sensitivity analysis shows that key parameters, including the entry–exit area depth, confidence threshold, and time threshold, affect the detection results of the fusion-based detection method. Under the test conditions of this study, the method performs well when the entry–exit area depth is approximately 1.5d, the YOLOv8x confidence threshold is 40%, and the time threshold is 5 × FPS. These results can provide a reference for initial parameter setting and on-site calibration in similar scenarios. Using the office scenario as a case study, the method was further applied to occupant-based ventilation control. The average CO₂ concentration during occupied periods under the proposed method was 622.43 ppm, which was closest to the result under ground-truth occupancy control, with a deviation of only 0.9 ppm. This indicates that the method can help improve indoor air quality. Compared with conventional schedule-based control, occupant-based ventilation control driven by the proposed fusion method reduced cumulative fan energy consumption by approximately 65.2%, showing good energy-saving potential at the ventilation-control level. In summary, the proposed method can effectively improve the accuracy and stability of indoor occupancy detection under a single-camera setting and provide more reliable input for occupant-based ventilation control. The framework is modular, and the front-end object detection and tracking models can be replaced according to actual deployment needs. 
However, the validation in this study is still mainly based on scenarios where existing security cameras can cover the main activity areas and all entry–exit passages. The applicability of the method under more complex camera arrangements, lighting variations, and automatic region configuration requires further investigation.

(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

59 pages, 1676 KB

Open Access Review

Vision–Language–Action (VLA) Models for Unmanned Aerial Robotics and Bimanual Manipulation: A Review

by Inkyu Sa, Chanoh Park, Hea-Min Lee, Donghee Noh and Ho Seok Ahn

Drones 2026, 10(6), 412; https://doi.org/10.3390/drones10060412 - 26 May 2026

Viewed by 462

Abstract

Vision–Language–Action (VLA) models unify visual perception, natural-language understanding, and action generation within a single foundation model, allowing a robot to follow instructions such as “fold the towel” or “fly to the red building” directly from camera images. Because VLAs inherit world knowledge from internet-scale pre-training, they have become the dominant framework for learning-based manipulation, with bimanual coordination serving as the most demanding testbed: two arms with 7+ degrees of freedom each must move in concert to fold, assemble, and reorient objects. Unmanned aerial robotics faces a structurally similar challenge: a drone must coordinate thrust, attitude, and increasingly gripper commands from visual observations under strict latency and payload constraints. This review covers 183 contributions spanning 2017–2026 and organizes them along seven dimensions: VLA architectures, training recipes, action representations, bimanual coordination (2022–2026), unmanned aerial vehicle (UAV) navigation and control (2017–2026), language grounding, and cross-cutting concerns including memory and world models. We show that the coordination strategies, training recipes, and action representations developed for bimanual VLAs transfer to unmanned aerial systems and identify fourteen research directions across both domains.

(This article belongs to the Special Issue Advances in Deep Learning for Drones and Its Applications: 3rd Edition)

17 pages, 2436 KB

Open Access Article

A Visual Recognition Method for Stacked Plates Based on Deep Learning

by Xikuan Wu, Qian Zhang, Hongying Ma, Zhanwei Li, Chenghai Pan and Wenchang Zhang

Optics 2026, 7(3), 35; https://doi.org/10.3390/opt7030035 - 25 May 2026

Viewed by 206

Abstract

This paper addresses the problem of counting stacked components in industrial scenarios and proposes a method that combines close-range scanning for complete contour acquisition with deep learning for quantity recognition. The contour acquisition system consists of a line array camera and a linear laser. Both are arranged horizontally at a certain angle, and the laser line is perpendicular and in the same direction as the stacking of the components. The system scans and connects single-row pixels along the stacking direction to obtain the contour. This method effectively avoids the occlusion problem caused by uneven stacking of components. The quantity recognition algorithm adopts an encoder–decoder-style network structure, with each label formed from the component-gap class (cls: 0 indicates no gap, 1 indicates a gap) and the endpoint coordinates of the separation line segment, [cls, x1, y1, x2, y2]. Multi-scale anchors are introduced to predict the translation distance of the line segment (positive or negative, indicating direction). The prediction head is fully convolutional, and the loss for regression is computed using the predicted endpoints of the ground-truth line segments. A line segment redundancy removal method is proposed to output the predicted confidence (conf) and coordinates [conf, px1, py1, px2, py2] for each component gap. A self-built dataset is used for training and validation. Experiments show that the per-image recognition accuracy reaches 95.79% and the gap recognition accuracy reaches 99.62%, which can meet the requirements of automation.

23 pages, 3576 KB

Open Access Article

3D Pose Estimation Using Virtual Projection Based on 3D Reconstructed Model

by Jung-Woo Kim, Sol Lee, Byung-Seo Park, Hak-Bum Lee, Dong-Ho Kang and Young-Ho Seo

Sensors 2026, 26(11), 3302; https://doi.org/10.3390/s26113302 - 22 May 2026

Viewed by 340

Abstract

In this paper, we estimate and refine 3D human pose using the 3D point cloud or mesh model reconstructed from RGB-D cameras or volumetric capture systems. We first reconstruct the 3D model using the multi-view cameras to estimate a highly accurate skeleton. To obtain a 2D skeleton with low error, the reconstructed 3D model is projected onto four virtual planes after deciding the direction of the 3D model. Four 2D skeletons are estimated from the four images projected onto the virtual planes. Afterward, the refinement process selects candidate joints based on the distribution of local vertices and the DBSCAN algorithm. It applies a sphere fitting to ensure that the final joints are located within the body volume. The joints are combined at the intersection through the back-projection of the joints, including those in the 2D skeleton on the virtual plane. The joints in the intersection are refined using the spatial distribution of the 3D information. Through the proposed method, we estimated a stable and geometrically consistent 3D human pose from reconstructed volumetric data. Using models with ground truth, we calculated the MPJPE between the skeletons of the proposed method and the ground truth. The 3D pose estimation was evaluated through a visual assessment of the captured image, and the results were quantitatively compared with the 3D joint positions acquired by the motion capture device.
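
MPJPE (mean per-joint position error), the metric named in this abstract, is the average Euclidean distance between corresponding predicted and ground-truth joints. The sketch below shows the plain definition without any alignment step. The function name and the two-joint example are illustrative, and the abstract does not say whether the authors align skeletons before computing the error.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean per-joint position error between two skeletons.

    Each skeleton is a sequence of (x, y, z) joints in the same order and units.
    No root or Procrustes alignment is applied.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("skeletons must be non-empty and have the same joint count")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Illustrative two-joint skeletons with per-joint errors of 3 and 4 units.
error = mpjpe([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)])
print(error)
```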

(This article belongs to the Section Sensing and Imaging)

28 pages, 9854 KB

Open AccessArticle

A Single-Transformation Model for Fisheye Image Orthorectification

by Qingyang Wang, Guoqing Zhou, Tao Yue, Bo Song, Jianwu Jiang, Zhen Cao and Xing Zhang

Remote Sens. 2026, 18(10), 1651; https://doi.org/10.3390/rs18101651 - 20 May 2026

Viewed by 216

Abstract

Fisheye lenses can capture the surrounding spatial information in a single exposure, which has led to their wide use in various fields. However, the imaging principle of fisheye lenses does not satisfy the collinearity equation, so traditional differential orthorectification is no longer applicable to fisheye images in practice. Therefore, this paper develops a single-spherical-geometry-transformation model for fisheye image orthorectification. This model directly establishes the relationship between spatial ground points and image plane coordinates through spherical geometry, and then combines the digital surface model (DSM) to correct points in the fisheye image to their correct positions on a pixel-by-pixel basis, thereby achieving fisheye image orthorectification. To validate the feasibility of the proposed orthorectification model, an indoor calibration field was established. Experimental validation was then conducted using two fisheye image datasets: an indoor dataset acquired in the calibration field with a digital single-lens reflex (DSLR) camera and an outdoor dataset acquired with an unmanned aerial vehicle (UAV). The results of the two groups of experiments demonstrate that the proposed model can effectively orthorectify fisheye images with ground accuracies of 0.055 m and 0.097 m in the x and y directions, respectively.

25 pages, 18341 KB

Open AccessArticle

A Real-Time DBH Ground-Truth Quadruped-Based Methodology for Precise Forest Management

by Theocharis Tsenis, Vasileios Barmpagiannos, Evangelos D. Spyrou and Vassilios Kappatos

Computers 2026, 15(5), 321; https://doi.org/10.3390/computers15050321 - 19 May 2026

Viewed by 217

Abstract

The integration of quadruped robotics with advanced sensing technologies offers a transformative approach to forest management, particularly for real-time measurement of tree Diameter at Breast Height (DBH). This paper introduces a novel methodology in which a quadruped robot equipped with GPS, LiDAR, and an aligned high-definition camera patrols forest paths on a purpose-developed dynamic autonomous mission. Utilizing a YOLO-based model for trunk detection, the methodology retrieves precise DBH measurements and corresponding geotags, constructing a spatial database of DBH ground-truth data. This database serves as a real-time ground-truth lookup table to calibrate allometric equations used in drone-based crown detection missions, enhancing the accuracy of forest biophysical attribute estimations such as tree height, volume, and biomass. Experimental validation demonstrates high precision in DBH estimation (error < 5% in controlled tests), supporting automated, around-the-clock data collection for sustainable forest management in Mediterranean ecosystems.

(This article belongs to the Section AI-Driven Innovations)

25 pages, 1146 KB

Open AccessArticle

LV-3DGS: A High-Quality Reconstruction Method Based on 3D Gaussian Splatting for Precise Phenotypic Measurement of Leafy Vegetables

by Xuejun Yang, Jinbiao Zhong, Kaiyan Lin, Junhui Wu, Jie Chen and Huajun Zhu

Agriculture 2026, 16(10), 1111; https://doi.org/10.3390/agriculture16101111 - 19 May 2026

Viewed by 519

Abstract

High-precision plant phenotyping requires efficient 3D reconstruction methods with high geometric quality. 3D Gaussian Splatting (3DGS) has recently emerged as a promising approach for real-time 3D reconstruction, achieving impressive visual quality. However, in crop environments dominated by monochromatic and low-texture regions, existing 3DGS methods often produce ambiguous geometries and fail to recover geometry-consistent 3D surfaces. To address these limitations, we propose LV-3DGS (Leafy Vegetables-3DGS), an optimized 3DGS-based framework tailored for the reconstruction of leafy vegetable scenes. First, a blurred reconstruction module is introduced to mitigate reconstruction artifacts caused by camera motion blur during multi-view image acquisition. Second, we propose a planar optimization strategy and design both local and global geometric consistency regularizations to optimize the model, thereby improving the surface reconstruction quality and geometric accuracy. Third, based on an analysis of individual Gaussian contributions, a contribution-based pruning strategy is developed to selectively remove inaccurate geometric components, achieving accurate scene geometry while reducing memory consumption and improving rendering efficiency. In addition, a quantitative geometric evaluation method is proposed for assessing reconstruction quality. Experimental results demonstrate that the proposed method achieves the highest accuracy among the tested baselines, with SSIM, PSNR, and LPIPS reaching 0.94, 34.53 dB, and 0.11, respectively. Moreover, the geometric consistency (GC) metric attains 0.317 cm. Finally, phenotypic parameters are measured from the reconstructed leafy vegetable point clouds. Compared with ground truth measurements, the proposed approach yields coefficients of determination (R^{2}) of 0.9959, 0.9651, and 0.9895 for plant height, leaf number, and leaf area, respectively. These results significantly outperform some existing phenotyping methods, providing a new methodology and technical solution for high-precision, low-cost, and high-throughput crop phenotyping.

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

Search Results (1,429)
