Search Results (557)

Search Parameters:
Keywords = stereo vision

19 pages, 2129 KB  
Article
Do It Once: Concatenating the Image Pair for a Single Pass Feature Extraction in Stereo Depth Sensing
by Žan Regoršek and Andrej Žemva
Sensors 2026, 26(12), 3919; https://doi.org/10.3390/s26123919 (registering DOI) - 20 Jun 2026
Abstract
In the field of stereo depth sensing, modern research predominantly prioritizes accuracy, yet inference speed remains a critical bottleneck for practical, real-time applications on resource-constrained platforms. Existing acceleration approaches often rely on lighter network architectures or runtime-specific optimizations, which may require architectural redesign, platform-specific tuning, or accuracy trade-offs. However, a common inefficiency remains in many stereo pipelines: feature extraction is typically performed using two separate forward passes, one for the left image and one for the right, even though both passes use the same network weights. We address this redundancy by concatenating the left and right images into a single combined tensor, enabling feature extraction in one batched pass while preserving the original network architecture. The method reduces feature extraction time by up to 48.4% and accelerates the overall inference rate by 10% to 39% on average on an Nvidia V100 and by up to 28.4% on an edge device, depending on the model architecture. This speedup is achieved at the expense of only a moderate increase in runtime memory consumption, while retaining the original accuracy. Because the method does not alter the core stereo network, it can be applied as a plug-and-play enhancement to both existing and newly developed stereo matching models. Full article
(This article belongs to the Section Sensing and Imaging)
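The single-pass idea in this abstract can be illustrated with a toy sketch. It assumes the concatenation is along the batch dimension and that the extractor treats each batch item independently; `extract_features`, its toy weights, and the list-based "images" are hypothetical stand-ins, not the authors' network or code.

```python
from typing import List, Tuple

Image = List[float]  # toy stand-in for an image tensor


def extract_features(batch: List[Image]) -> List[Image]:
    """Hypothetical shared-weight extractor, applied independently to each batch item."""
    weight, bias = 0.5, 1.0  # toy "network weights", shared by every input
    return [[weight * px + bias for px in img] for img in batch]


def two_pass(left: Image, right: Image) -> Tuple[Image, Image]:
    """Conventional pipeline: one extractor call per image."""
    return extract_features([left])[0], extract_features([right])[0]


def single_pass(left: Image, right: Image) -> Tuple[Image, Image]:
    """Stack both images into one batch, call the extractor once, then split."""
    feats = extract_features([left, right])
    return feats[0], feats[1]
```

With a real network, the two variants only agree when no layer mixes information across batch items (for example, batch normalization running in training mode would break the equivalence).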
15 pages, 4826 KB  
Article
Integrating Visual Perception and Control Strategies in Custom Omnidirectional Mobile Robots
by Radu-Laurențiu Roșca, Andrei-Iulian Iancu, Adrian Burlacu and Cătălin Dosoftei
Sensors 2026, 26(12), 3918; https://doi.org/10.3390/s26123918 (registering DOI) - 20 Jun 2026
Abstract
Autonomous mobile robots are used in optimizing warehouse logistics, yet achieving precise positioning during docking maneuvers and autonomous planning remains a technical challenge. This study presents a custom vision-based control system designed for an autonomous omnidirectional wheeled robot. The proposed methodology acquires visual feedback using a stereo camera integrated within the Robot Operating System framework. Two visual feedback control laws are formulated and rigorously evaluated: a Classic Position-Based Visual Servoing algorithm, which minimizes pose error using a quaternion-based approach, and a second solution that utilizes Dual Lie Algebra to compute the 3D visual sensor’s velocities, ensuring convergence towards the desired point-feature configuration. Experimental validation reveals that while both methods achieve docking, the dual pose-free approach enables more robust, effortless movement of the robot platform than Classic Position-Based Visual Servoing. Consequently, these findings indicate that integrating depth-based feature recovery with advanced algebraic strategies offers a stable control strategy for automated industrial scenarios. Full article
(This article belongs to the Special Issue Intelligent Sensing for Robotic Control and Visual Perception)
36 pages, 4092 KB  
Article
Functional Profiling in Paralympic Water Polo Using Deep Learning, Stereo Vision, and Phase-Based Kinematic Analysis: A Pilot Study
by Andrea Zanela
Bioengineering 2026, 13(6), 707; https://doi.org/10.3390/bioengineering13060707 (registering DOI) - 19 Jun 2026
Abstract
Paralympic water polo requires classification systems that reflect sport-specific functional performance under ecologically valid conditions. This pilot study proposes a task-specific kinematic profiling framework for deriving objective, biomechanically interpretable descriptors of residual motor function. Five male national-level water polo athletes—three with eligible motor impairments and two able-bodied reference participants—performed standardized sport-specific tasks comprising upright floating, vertical propulsion, unilateral passing, non-contested shooting, and contested shooting under physical opposition. Stereoscopic video, OpenPose-based three-dimensional reconstruction, and phase-based analysis were used to extract features and composite indices of postural control, propulsion capacity, upper-limb residual function, and resistance to perturbation. Automatic ball-release detection matched manual frame-level verification in all 128 analyzed ball-related trials. Within the task-specific indices, where higher scores indicate greater functional burden, core values ranged from 0.05–0.15 for upright floating, 0.29–0.68 for combined arm-and-leg vertical propulsion, and 0.040–0.148 for contested shooting across the available subject–side combinations. The profiles showed task- and side-specific differences in stabilization, propulsion, and post-contact motor reorganization. The framework uses pose estimation as a quantitative measurement tool and treats visibility interruptions as functionally meaningful events rather than noise. It is not intended to replace official classification procedures, but to provide transparent and interpretable candidate descriptors for future evidence-based classification research in Paralympic water polo. Full article
30 pages, 86354 KB  
Article
Geometric Principles of Stereo Vision: A Quantitative Evaluation and Physical Validation of the Classical Pipeline
by Angel Fernando Ceballos-Espinoza, David Balderas-Silva, Alfredo Diaz-Lara and Rita Q. Fuentes-Aguilar
Appl. Sci. 2026, 16(12), 6212; https://doi.org/10.3390/app16126212 (registering DOI) - 19 Jun 2026
Abstract
Stereo vision is essential for passive three-dimensional perception in resource-constrained applications that require low power consumption, predictable latency, and explainable geometry. Although deep learning architectures dominate recent benchmarks, the classical block-matching pipeline remains a foundational approach. Optimizing this pipeline involves navigating complex trade-offs among matching robustness, map density, and computational efficiency. This study systematically surveys and physically validates the classical stereo framework. After revisiting geometric first principles, three matching costs (SAD, NCC, ZNCC) are benchmarked alongside Sobel preprocessing and structural refinements, with subsequent validation using a calibrated consumer webcam rig. Middlebury benchmarks (2001–2021) indicate that while SAD fails under complex radiometric distortion, NCC consistently achieves superior quantitative metrics, incurring only a 1.2-fold computational overhead. Extending the disparity search range improves foreground localization, while block size imposes a trade-off between resolving the aperture problem and preserving fine geometric detail. To bridge theoretical analysis and practical deployment, the pipeline is validated using a custom-calibrated consumer stereo rig. The optimized Sobel-NCC architecture is then evaluated for real-time edge deployment on constrained hardware (NVIDIA Jetson Nano) and narrow-baseline sensors (OAK-D SR) in the context of agricultural robotic manipulation. By prioritizing metric precision over dense prediction, the classical pipeline reconstructs target surfaces with approximately 1 cm depth accuracy at 21 frames per second. These results demonstrate that optimized local algorithms offer deterministic and reliable geometric foundations for real-time edge-computed robotics. Although neural networks are essential for dense reconstructions in ill-posed regions, the foundational principles established here remain indispensable for advanced stereo vision system deployment. Full article
(This article belongs to the Section Robotics and Automation)
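For readers unfamiliar with the three matching costs named in this abstract, the sketch below gives one common textbook formulation of each, applied to two flattened image blocks. Naming conventions vary across the literature (some sources use "NCC" for the zero-mean variant), and nothing here is taken from the article's implementation.

```python
import math
from typing import Sequence


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences: lower means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation: invariant to a multiplicative gain."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean NCC: subtracting block means adds invariance to an additive offset."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return ncc([x - mean_a for x in a], [y - mean_b for y in b])
```

A block and a copy of it with changed gain and offset (say `b = 2 * a + 5`) still score a ZNCC of about 1, while their SAD is large, which is the usual intuition for why correlation-based costs tolerate radiometric differences better than SAD.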
26 pages, 477 KB  
Article
A Low-Cost RGB-D Sensing Front-End for Stable 3D Hand Landmark Reconstruction Using MediaPipe and ZED2 Stereo Depth
by Laixin Peng, Tiansheng Liu and Bingwei He
Sensors 2026, 26(12), 3730; https://doi.org/10.3390/s26123730 - 11 Jun 2026
Viewed by 206
Abstract
Stable three-dimensional hand landmark reconstruction using low-cost RGB-D sensors is important for human–computer interaction, robot teleoperation, and vision-based motion analysis. RGB-based hand landmark detectors provide stable semantic 2D landmarks, but their depth output is not a metric measurement in the physical camera coordinate system. Stereo cameras can provide metric depth, but direct landmark-level back-projection is sensitive to invalid pixels, local depth holes, boundary noise, and partial occlusion. To address these problems, this paper presents a lightweight RGB-D sensing front-end that combines MediaPipe semantic hand landmarks with ZED2 stereo depth. The proposed pipeline detects 21 semantic hand landmarks in the RGB image, obtains landmark-level metric depth from the aligned ZED2 depth map using local median sampling, reconstructs 3D landmarks by camera back-projection, and further applies exponential moving average filtering and a bone-length consistency constraint. Experiments were conducted on a self-collected SVO dataset containing 13 hand actions and 26 recorded sequences, and an additional checkerboard-based reference-distance validation was performed to evaluate the metric depth sampling and 3D back-projection component. Compared with single-pixel sampling, the 5×5 local median strategy slightly increased the valid-depth ratio from 0.9731 to 0.9738 and reduced the temporal smoothness metric from 1.7163 mm to 1.6902 mm. To further justify the temporal filtering choice, an additional comparison with the 1 Euro Filter was conducted using the reconstructed win5 trajectories. The 1 Euro Filter produced stronger smoothing, reducing the temporal smoothness metric to 0.196 mm, but also reduced the path-length ratio to 0.484, indicating substantial motion attenuation. EMA0.7 was therefore retained as a more balanced setting, reducing the temporal smoothness metric to 0.826 mm while maintaining a path-length ratio of 0.803. The BL0.5 bone-length constraint reduced the bone-length standard deviation from 2.0727 mm to 1.1995 mm with limited trajectory modification. The final configuration provides a practical low-cost RGB-D front-end for stable 3D hand landmark reconstruction under controlled indoor conditions. Full article
(This article belongs to the Section Physical Sensors)
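The local median sampling and back-projection steps described in this abstract can be sketched roughly as follows. This assumes a standard pinhole model with intrinsics `fx`, `fy`, `cx`, `cy` and a depth map stored as nested lists. The function names and the validity rule (finite and positive) are illustrative assumptions rather than the authors' code, and the filtering and bone-length stages are omitted.

```python
import math
import statistics
from typing import List, Optional, Tuple


def median_depth(depth: List[List[float]], u: int, v: int, win: int = 5) -> Optional[float]:
    """Median of the valid depths in a win x win window centred on pixel (u, v)."""
    r = win // 2
    vals = []
    for y in range(max(0, v - r), min(len(depth), v + r + 1)):
        for x in range(max(0, u - r), min(len(depth[0]), u + r + 1)):
            z = depth[y][x]
            if math.isfinite(z) and z > 0:  # assumed validity rule: skip NaN, inf, holes
                vals.append(z)
    return statistics.median(vals) if vals else None


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) at metric depth z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

Sampling a window rather than a single pixel means that one invalid or noisy depth value at the landmark location does not invalidate the whole landmark.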
31 pages, 69219 KB  
Article
AquaFishNet: A Binocular Vision-Based Method for Fish Body Mass Estimation
by Longquan Xu, Haixiong Ye, Shuai Wang, Xiangde Cao and Jingxiang Xu
Fishes 2026, 11(6), 341; https://doi.org/10.3390/fishes11060341 - 6 Jun 2026
Viewed by 226
Abstract
Accurate monitoring of fish body length and mass is essential for evaluating growth status, optimizing feeding strategies, and supporting intelligent aquaculture management. However, conventional manual measurements are labor-intensive and may induce stress or injury due to repeated fish handling. To address these limitations, this study developed AquaFishNet, a binocular vision-based framework for non-contact underwater body length and mass estimation of Leiocassis longirostris. Underwater images were collected in a real recirculating aquaculture environment using a calibrated binocular camera system. AquaFishNet integrates lightweight fish body segmentation, stereo vision-based length estimation, and deep regression-based mass prediction. Experimental results showed that body length estimation errors were mostly within approximately ±2 cm, with relative errors generally below 8%. For body mass prediction, most relative errors were within approximately ±7%, and the model achieved an R² of 0.9851, RMSE of 18.38 g, and MAE of 12.92 g. These findings demonstrate that AquaFishNet provides an effective non-contact solution for fish growth monitoring and biomass estimation in precision aquaculture. Full article
(This article belongs to the Section Fishery Facilities, Equipment, and Information Technology)
20 pages, 11396 KB  
Article
Development of a Robotic Weed Puller for Precision Management of Palmer Amaranth in Cotton
by Taranjeet Singh Sodhi, Shekhar Thapa, Canicius Mwitta and Glen C. Rains
AgriEngineering 2026, 8(6), 226; https://doi.org/10.3390/agriengineering8060226 - 5 Jun 2026
Viewed by 427
Abstract
The objective of this study was to design, fabricate, and test an automated inter-row robotic system for the precision management of Palmer amaranth (Amaranthus palmeri) in cotton. A Farm-ng robotic platform with custom-designed weed pulling and cutting attachments was used to achieve weed control. The pulling system consisted of two counter-rotating rollers with a frictional cover to uproot weeds, followed by a cutting operation to shred the weeds into smaller pieces, preventing regrowth. A deep learning model, YOLOv11s, was used for weed identification, while point cloud data from a stereo camera was used to estimate weed height in real-time for dynamic adjustment of the puller height. The system was evaluated at three forward speeds (0.06, 0.15, and 0.25 m/s), two roller speeds (107 and 161 RPM), and three attachment configurations (puller-only, cutter-only, and combined). The combined configuration consistently outperformed individual operations, achieving 80% control at 0.15 m/s and a roller speed of 161 RPM. Optimal performance was observed when the angular puller velocity was 15–25 times the forward speed of the rover. This approach demonstrates the potential of integrating mechanical weed removal with real-time computer vision to improve weed management and reduce labor requirements. Full article
26 pages, 31069 KB  
Article
Eight-Wheel Mecanum Omnidirectional Autonomous Mobile Robot: Kinematics, Architecture, and Validation
by Leonardo D. Ortega-Lomeli, Luis C. Básaca-Preciado, Ulises Orozco-Rosas, J. D. Castro-Toscano and M. A. Ponce-Camacho
Electronics 2026, 15(11), 2441; https://doi.org/10.3390/electronics15112441 - 3 Jun 2026
Viewed by 360
Abstract
Autonomous omnidirectional vehicles that combine redundant holonomic kinematics, ROS 2/micro-ROS implementation, and simulation-to-real validation remain limited in the literature. This paper presents an eight-wheel Mecanum autonomous mobile robot for campus navigation in environments shared with pedestrians. The work formulates forward and inverse kinematics for the redundant eight-wheel topology and implements a distributed architecture in which ROS 2 handles high-level navigation and micro-ROS connects ESP32-based wheel interfaces. The platform integrates LiDAR, stereo vision, inertial, encoder, and ultrasonic sensing within a closed-loop navigation stack. Validation was conducted through Gazebo simulation and physical experiments using an out-and-back navigation protocol. In the physical platform, 91 of 100 missions were completed without safety interruptions, with pose-accuracy success rates of 96% for outbound legs and 81% for return legs under e_p < 1.5 m and |e_θ| < 15°. Median errors at the intermediate waypoint were 0.64 m, 0.191 m, and 17°, while final-pose medians after return were 1.016 m, 0.573 m, and 28.5°. These results provide a quantitative baseline for campus-scale redundant Mecanum navigation and identify heading recovery as the main limitation. Full article
(This article belongs to the Special Issue Robotics: From Technologies to Applications)
23 pages, 32417 KB  
Article
Vision-Based Person-Following Algorithm for Assistive Elderly-Care Quadruped Robots
by Vishnudev Kurumbaparambil, Subashkumar Rajanayagam and Stefan Twieg
Sensors 2026, 26(10), 3263; https://doi.org/10.3390/s26103263 - 21 May 2026
Viewed by 482
Abstract
The demographic shift towards an aging population necessitates innovative solutions for care and mobility support. While commercial quadruped robots like the Unitree Go1 offer dynamic stability, their native following modes often lack the safety margins and predictability required, and they do not consistently follow the user, at times deviating and navigating independently. This paper presents a robust, vision-based, person-following algorithm designed to address these limitations. Utilizing a ZED 2 stereo camera and Robot Operating System (ROS), the system employs a finite state machine to ensure deterministic target tracking. A velocity control strategy partitions the robot’s motion into distinct stability, proportional, and braking zones based on depth data to ensure fluid interaction. The framework was validated on a Unitree Go1 quadruped platform in an outdoor environment involving 90-degree turns to evaluate tracking robustness. By operating in a headless mode, the system achieved a mean processing latency of 66.5±4.3 ms. Experimental results demonstrated consistent operational stability, 0.0% intrusion into the intimate safety zone, and effective velocity synchronization between 0.47 and 0.54 m/s. While this study establishes a robust technical baseline using healthy subjects, it serves as a preliminary development platform; further iterative testing with elderly users in clinical settings is required to move toward deployment. Beyond the evaluated trials, the framework maintained reliable functional performance across various care facility workshops, successfully following the target in all deployment scenarios. These findings establish a stable technical foundation for the future development of robotic walking partners. Full article
(This article belongs to the Special Issue Intelligent Sensing for Robotic Control and Visual Perception)
46 pages, 3292 KB  
Article
Autonomous Fault-Tolerant Cooperative Tracking and Obstacle Avoidance for UAV Swarm in Complex Maritime Environments
by Zhiyang Zhang, Xiaolong Liang, Aoyu Zheng and Ning Wang
Drones 2026, 10(5), 388; https://doi.org/10.3390/drones10050388 - 19 May 2026
Viewed by 254
Abstract
To address the challenge of stable tracking of moving maritime targets by unmanned aerial vehicle (UAV) swarms in environments with threat zones and platform failure risks, this paper proposes a cooperative tracking and guidance strategy integrating Distributed Model Predictive Control (DMPC) with Sequential Quadratic Programming (SQP). A cooperative tracking model is developed incorporating UAV kinematics, environmental threats, stereo-vision positioning, and field-of-view constraints. Two original strategies are introduced within the DMPC framework: an altitude-cooperative target recapture strategy reduces the total target-loss duration by approximately 7 s compared to fixed-altitude baselines, while a distributed formation reconfiguration strategy restores stable tracking within 10 s after member failure and ensures safe inter-UAV separation. A multi-constraint trajectory tracking controller based on DMPC-SQP achieves real-time co-optimization of threat avoidance, formation maintenance, and tracking accuracy. Simulation results in dense threat environments demonstrate a 93.4% Quadratic Programming feasibility rate, with mean tracking error reduced by 25.4% over fixed-altitude DMPC and 48.7% over methods based on the Linear Quadratic Regulator (LQR), while maintaining robust performance under 300 ms communication delay, sensor noise, and moderate wind disturbance. Full article
(This article belongs to the Special Issue Flight Control and Collision Avoidance of UAVs: 2nd Edition)
27 pages, 116833 KB  
Article
Sparse Self-Prompt-Guided Stereo Matching for Real-World Generalization
by Hangbiao Li, Haojun Mo, Xing Li, Tao Fang, Sikun Liu, Shuzhen Yu and Zhibo Rao
Sensors 2026, 26(10), 3173; https://doi.org/10.3390/s26103173 - 17 May 2026
Viewed by 330
Abstract
Stereo matching has witnessed rapid advances on curated benchmarks, yet deploying models in unconstrained real-world environments remains a fundamental challenge. This paper presents a sparse self-prompt-guided network (SSPGNet) for stereo matching with strong generalization across diverse environments. Our core innovation lies in a sparse self-prompt guidance mechanism: (1) a sparse disparity map, used as a prompt, is self-estimated from visual foundation model features via cost aggregation; (2) the sparse disparity is progressively refined into dense disparity maps through cross-attention-based stereo feature interaction, enabling sparse-to-dense disparity prediction. Additionally, we collected a diverse set of indoor and outdoor stereo pairs by using a ZED 2 camera to assess the real-world performance of our model. Extensive experiments demonstrate that the proposed sparse-to-dense prompt mechanism not only preserves the semantic awareness of visual foundation models but also enhances stereo correspondence reasoning, achieving strong performance on public benchmarks and our in-the-wild dataset. Specifically, under the cross-domain (zero-shot) protocol, the proposed SSPGNet achieves bad-pixel error rates of 3.6% on KITTI 2012 (>3 px), 4.4% on KITTI 2015 (>3 px), 7.6% on Middlebury (>2 px), and 2.1% on ETH3D (>1 px), ranking first on three of the four public benchmarks. These results highlight the potential of SSPGNet for direct deployment in real-world stereo perception systems. The code is publicly available at GitHub. Full article
(This article belongs to the Section Sensing and Imaging)
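The bad-pixel error rates quoted in this abstract (for example ">3 px") are threshold-based metrics. A minimal sketch of the basic idea follows, assuming flattened disparity lists and `None` for pixels without ground truth. Benchmark-specific details such as extra relative-error criteria or occlusion masks are deliberately left out, and the function name is illustrative.

```python
from typing import Optional, Sequence


def bad_pixel_rate(pred: Sequence[float],
                   gt: Sequence[Optional[float]],
                   threshold_px: float) -> float:
    """Percentage of valid pixels whose absolute disparity error exceeds threshold_px."""
    valid = [(p, g) for p, g in zip(pred, gt) if g is not None]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    bad = sum(1 for p, g in valid if abs(p - g) > threshold_px)
    return 100.0 * bad / len(valid)
```

Pixels without ground truth are excluded from both the numerator and the denominator, so sparse ground truth does not deflate the reported rate.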
37 pages, 10460 KB  
Article
Research on Visual Recognition and Harvesting Point Localization System for Grape-Picking Robots in Smart Agriculture
by Tao Lin, Qiurong Lv, Fuchun Sun, Wei Ma and Xiaoxiao Li
Agriculture 2026, 16(10), 1073; https://doi.org/10.3390/agriculture16101073 - 14 May 2026
Viewed by 319
Abstract
To improve grape target perception and picking-point positioning for intelligent harvesting robots, this study develops a vision-based method for orchard grape detection and harvesting-point localization. The method is intended to address missed detections, insufficient recognition accuracy, and unsatisfactory peduncle segmentation caused by illumination variation, occlusion, and interference from branches and leaves in complex orchard scenes. For grape cluster and peduncle detection, a lightweight YOLOv7-derived model, termed YOLO-FES, was established. In this model, FasterNet and SCConv were introduced to refine the backbone and neck structures, and the EMA mechanism was incorporated to lower parameter complexity and computational cost while improving detection performance. For suspended grape structure association and peduncle extraction, the GJK algorithm was combined with nearest-neighbor rectangular discrimination, and an improved YOLACT-based peduncle segmentation network, named M-YOLACT, was constructed. With the integration of the MLCA mechanism and the Mish activation function, accurate peduncle segmentation was achieved. In addition, a stereo depth camera was employed to obtain two-dimensional picking-point information and further recover the corresponding three-dimensional spatial coordinates. Experimental results showed that the mAP@0.5 of YOLO-FES for grape clusters and peduncles reached 95.37%. For grape peduncle segmentation, the mAP@0.5 values of the bounding boxes and masks produced by M-YOLACT reached 95.73% and 94.36%, respectively. The proposed method achieved an overall harvesting success rate of 89.2%, with an average time consumption of 11 s for a single harvesting operation. By integrating deep-learning-based detection and segmentation with binocular-vision localization, this study provides a practical technical solution and useful reference for the visual system design of grape-harvesting robots. Full article
33 pages, 58211 KB  
Review
Binocular Stereo Vision in Remote Sensing: A Review
by Xing Li, Hongwei Zhou, Mingyu Sun, Bangshu Xiong, Yuchao Dai, Renjie He, Zhihua Chen and Zhibo Rao
Remote Sens. 2026, 18(10), 1480; https://doi.org/10.3390/rs18101480 - 9 May 2026
Viewed by 323
Abstract
Stereo vision leverages binocular imagery to emulate the human visual system in perceiving three-dimensional (3D) structures by estimating disparity from rectified image pairs and converting it to depth via geometric triangulation. In recent years, deep learning-based stereo matching has significantly advanced in accuracy, efficiency, and generalization, surpassing traditional methods and demonstrating great potential in remote sensing applications. However, stereo matching in remote sensing faces unique challenges not commonly seen in terrestrial datasets. These include limited access to satellite imagery, seasonal differences between image pairs, difficulty in identifying small objects, and widespread regions with repetitive textures, such as lakes and forests. Unlike prior surveys that primarily address ground-level scenes, this paper presents a comprehensive review of stereo matching techniques tailored for remote sensing. It synthesizes the progress and limitations of representative models, analyzes the characteristics and domain-specific constraints of remote sensing stereo datasets, and outlines future research directions and application prospects in this field. Full article
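The disparity-to-depth conversion mentioned at the start of this abstract reduces, for an idealized rectified pinhole pair, to Z = f · B / d. The sketch below assumes a focal length in pixels and a baseline in metres. The parameter names are illustrative, and satellite sensor geometry generally calls for a more elaborate model than this.

```python
from typing import Optional


def disparity_to_depth(disparity_px: float,
                       focal_px: float,
                       baseline_m: float) -> Optional[float]:
    """Depth in metres from disparity in pixels for a rectified pinhole stereo pair."""
    if disparity_px <= 0:
        return None  # zero or negative disparity has no finite positive depth
    return focal_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a fixed disparity error of one pixel causes a depth error that grows roughly with the square of the distance.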
24 pages, 17618 KB  
Article
ORAMA: A Unified Computer Vision Framework for Real-Time Exercise Supervision, Functional Assessment and Remote Monitoring
by Orestis N. Zestas, Dimitrios N. Soumis, Konstantinos I. Roumeliotis, Kyriakos-Ioannis D. Kyriakou, Stefania Tzanera, Konstantinos Laloudakis, Vasileios Sakellariou Kyrou, Theoni Moraitou, Sofia H. Kapellaki, Kyriaki Seklou and Nikolaos D. Tselikas
Appl. Sci. 2026, 16(9), 4539; https://doi.org/10.3390/app16094539 - 5 May 2026
Viewed by 610
Abstract
Remote exercise supervision and functional movement assessment require sensing pipelines that can capture body motion, interpret protocol progression, and provide meaningful feedback within the same runtime environment. This paper presents ORAMA, an integrated computer vision platform for the execution and remote monitoring of digital exercises and clinically oriented assessment protocols related to physical fitness, mobility, balance, and health. The system combines ZED 2i stereo capture and depth-aware body tracking with a protocol-driven software architecture that includes a computer-vision pipeline, an exercise and assessment engine, a real-time feedback layer, persistent session handling, structured output generation, and a chatbot-assisted interaction path. Unlike solutions that focus only on movement recognition, ORAMA organizes each task as an explicit executable protocol with calibration stages, state transitions, task-specific metrics, and live visual guidance. The paper analyzes the system architecture, reviews the surrounding literature on virtual coaching and rehabilitation-oriented computer vision, and demonstrates representative user-interface and runtime views for both assessment and exercise scenarios. The present work reports a prototype architecture and representative operational demonstrations, rather than a completed clinical validation or participant-based efficacy study. The resulting platform shows how markerless 3D body tracking can be embedded within a unified and interpretable environment for guided exercise, functional testing, and remote follow-up without requiring wearable sensors. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
37 pages, 19367 KB  
Article
MarsBird-VII: An Autonomous Stereo–Inertial Navigation System with Real-Time Optimization for a Mars Rotorcraft Space Drone
by Ju Xiao, Hanchen Qiu, Yukun Zhou, Rui Wang and Peng Liu
Drones 2026, 10(5), 346; https://doi.org/10.3390/drones10050346 - 4 May 2026
Viewed by 476
Abstract
Reliable autonomous navigation for Tianwen-3-class Mars rotorcraft must satisfy both sampling-level accuracy and hard real-time execution under severe onboard computational constraints. To address this challenge, we develop MarsBird-VII, a mission-constrained stereo visual–inertial navigation system that combines a computation-aware vision front-end with a Parity-Window sliding-window optimization back-end. The front-end decouples high-rate tracking from feature replenishment to bound perception latency, while the back-end alternates updates over interleaved state subsets and preserves full-window coupling through unified marginalization. Unlike simply reducing the sliding-window size, the proposed strategy reduces the per-update optimization cost without shrinking the geometric observation horizon, thereby improving the accuracy–runtime trade-off for embedded avionics. Earth-analog flight experiments demonstrate strong navigation performance under mission-relevant conditions. In full-sequence evaluation, the proposed system achieves an SE(3)-aligned translation APE of 0.31 m RMSE/0.47 m Max and further reaches 0.06 m RMSE/0.15 m Max on a nominal stable segment. Runtime profiling over 5000+ update cycles shows that the Parity-Window back-end keeps the maximum optimization latency below 58.32 ms, satisfying the 66.7 ms hard real-time deadline while maintaining accuracy close to full-window optimization. These results show that the proposed system provides a practical balance of accuracy, robustness, and deterministic real-time performance for Tianwen-3-class Mars rotorcraft navigation. Full article