Search Results (538)

Search Parameters:
Keywords = stereo vision

13 pages, 4062 KB  
Article
Robotic Harvesting of Apples Using ROS2
by Connor Ruybalid, Christian Salisbury and Duke M. Bulanon
Machines 2026, 14(4), 433; https://doi.org/10.3390/machines14040433 - 14 Apr 2026
Viewed by 308
Abstract
Rising global food demand, increasing labor costs, and farm labor shortages have created significant challenges for specialty crop production, particularly in labor-intensive tasks such as fruit harvesting. Robotic harvesting offers a promising long-term solution, yet its adoption in orchard environments remains limited due to unstructured conditions, variable lighting, and difficulties in fruit recognition and manipulation. This study presents an improved robotic fruit harvesting system, Orchard roBot (OrBot), developed by the Robotics Vision Lab at Northwest Nazarene University, with the goal of advancing autonomous apple harvesting applications. The updated OrBot platform integrates a dual-camera vision system consisting of an eye-to-hand stereo camera with a wide field of view for fruit detection and an eye-in-hand RGB-D camera for precise manipulation. The control architecture was redesigned using Robot Operating System 2 (ROS2) and Python, enabling modular subsystem development and coordination. Fruit detection was performed using a YOLOv5 deep learning model, and visual servoing was employed to guide the robotic manipulator toward the target fruit. System performance was evaluated through laboratory experiments using artificial trees and field tests conducted in a commercial apple orchard in Idaho. OrBot achieved a 100% harvesting success rate in indoor tests and a 75–80% success rate in outdoor orchard conditions. Experimental results demonstrate that the dual-camera approach significantly enhances fruit search efficiency and harvesting efficiency. Identified limitations include sensitivity to lighting conditions, end effector performance with varying fruit sizes, and depth estimation errors. Overall, the results indicate a positive potential toward effective robotic fruit harvesting and highlight key areas for future improvement in vision, manipulation, and system robustness. Full article
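
The abstract names ROS2, Python, YOLOv5 detection, and visual servoing without implementation detail. As a purely illustrative sketch of how such a pipeline is often wired, the hypothetical rclpy node below converts incoming detections into proportional image-space servo commands; the topic names, message fields (recent vision_msgs layout), gain, and image size are all assumptions, not OrBot's code.

```python
# Hypothetical ROS2 node: detections in, visual-servoing commands out.
# Topics, gain, and image size are illustrative assumptions.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist
from vision_msgs.msg import Detection2DArray  # field layout per recent vision_msgs


class FruitServo(Node):
    def __init__(self):
        super().__init__('fruit_servo')
        self.create_subscription(Detection2DArray, '/fruit_detections',
                                 self.on_detections, 10)
        self.pub = self.create_publisher(Twist, '/arm_cmd_vel', 10)
        self.gain = 0.002                 # proportional gain: pixels -> m/s
        self.cx, self.cy = 320.0, 240.0   # image center of a 640x480 camera

    def on_detections(self, msg):
        if not msg.detections:
            return
        # Drive the pixel error between the first fruit's bounding-box
        # center and the image center toward zero.
        c = msg.detections[0].bbox.center.position
        cmd = Twist()
        cmd.linear.y = -self.gain * (c.x - self.cx)
        cmd.linear.z = -self.gain * (c.y - self.cy)
        self.pub.publish(cmd)


if __name__ == '__main__':
    rclpy.init()
    rclpy.spin(FruitServo())
```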

28 pages, 3527 KB  
Article
Autonomous Tomato Harvesting System Integrating AI-Controlled Robotics in Greenhouses
by Mihai Gabriel Matache, Florin Bogdan Marin, Catalin Ioan Persu, Robert Dorin Cristea, Florin Nenciu and Atanas Z. Atanasov
Agriculture 2026, 16(8), 847; https://doi.org/10.3390/agriculture16080847 - 11 Apr 2026
Viewed by 855
Abstract
Labor shortages and the need for increased productivity have accelerated the development of robotic harvesting systems for greenhouse crops; however, reliable operation under fruit occlusion and clustered arrangements remains a major challenge, particularly due to the limited integration between perception and motion planning modules. The paper presents the design and experimental validation of an autonomous robotic system for greenhouse tomato harvesting. The proposed platform integrates a rail-guided mobile base, a six-degrees-of-freedom robotic manipulator, and an adaptive end effector with a hybrid vision framework that combines convolutional neural networks and watershed-based segmentation to enable robust fruit detection and localization under occluded conditions. The proposed approach enables improved separation of overlapping fruits and provides accurate spatial localization through stereo vision combined with IMU-assisted camera-to-robot coordinate transformation. An occlusion-aware trajectory planning strategy was developed to generate collision-free manipulation paths in the presence of leaves and stems, enhancing harvesting safety and reliability. The system was trained and evaluated using a dataset of real greenhouse images supplemented with synthetic data augmentation. Experimental trials conducted under practical greenhouse conditions demonstrated a fruit detection precision of 96.9%, recall of 93.5%, and mean Intersection-over-Union of 79.2%. The robotic platform achieved an overall harvesting success rate of 78.5%, reaching 85% for unobstructed fruits, with an average cycle time of 15 s per fruit in direct harvesting scenarios. The rail-guided mobility significantly improved positioning stability and repeatability during manipulation compared with fully mobile platforms. The results confirm that integrating hybrid perception with occlusion-aware motion planning can substantially improve the functionality of robotic harvesting systems in protected cultivation environments. The proposed solution contributes to the advancement of automation technologies for greenhouse vegetable production and supports the transition toward more sustainable and labor-efficient agricultural practices. Full article
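
The hybrid perception framework pairs a CNN with watershed-based segmentation to separate clustered fruit. The generic OpenCV recipe below illustrates only the watershed half, under the assumption that an upstream network has already produced a binary fruit mask; it is not the authors' pipeline.

```python
# Generic OpenCV watershed recipe for splitting touching fruit blobs in a
# binary mask (assumed to come from an upstream CNN segmenter).
import cv2
import numpy as np

def split_touching_fruits(bgr_image, fruit_mask):
    """fruit_mask: uint8, 255 = fruit. Returns integer labels per fruit."""
    # Seeds: peaks of the distance transform mark individual fruit centers.
    dist = cv2.distanceTransform(fruit_mask, cv2.DIST_L2, 5)
    _, seeds = cv2.threshold(dist, 0.6 * dist.max(), 255, cv2.THRESH_BINARY)
    seeds = seeds.astype(np.uint8)
    _, markers = cv2.connectedComponents(seeds)
    markers += 1                                # reserve label 1 for background
    unknown = cv2.subtract(fruit_mask, seeds)   # ambiguous overlap zone
    markers[unknown == 255] = 0                 # let watershed decide here
    return cv2.watershed(bgr_image, markers)    # boundaries labeled -1
```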

22 pages, 4848 KB  
Article
A Lightweight Improved RT-DETR for Stereo-Vision-Based Excavator Posture Recognition
by Yunlong Hou, Ke Wu, Yuhan Zhang, Mengying Zhou, Jiasheng Lu and Zhao Zhang
Mathematics 2026, 14(7), 1226; https://doi.org/10.3390/math14071226 - 7 Apr 2026
Viewed by 332
Abstract
In intelligent excavator applications, traditional excavator posture recognition methods face two major challenges: limited recognition accuracy and insufficient computing resources on edge devices. To address these issues, this study proposes an excavator posture recognition method based on an improved Real-Time Detection Transformer (RT-DETR). First, a new backbone network is designed based on the Reparameterized Vision Transformer to improve feature utilization efficiency while reducing computational demands. Next, the overall architecture is optimized by introducing lightweight Dynamic Upsamplers, which reduce information loss during upsampling and enhance multi-scale feature fusion. In addition, a Cross-Attention Fusion Module is adopted to strengthen local feature extraction while retaining the global modeling capability of the Transformer, thereby improving the discrimination between foreground and background. Finally, a Multi-Scale Fusion Network is introduced to further enhance the multi-scale feature representation ability of RT-DETR. Experimental results show that the proposed method achieves a mean average precision (mAP) of 94.29% for small object detection, which is 7.96% higher than that of the baseline RT-DETR, while reducing the number of model parameters by 34.95%. Compared with YOLO-series models, the proposed method improves mAP by 8.62% to 12.75%. These results indicate that the proposed method outperforms existing methods in both detection accuracy and computational efficiency and provides an efficient and feasible solution for real-time excavator posture recognition. Full article
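
The Cross-Attention Fusion Module is described only at a high level; the PyTorch sketch below shows the generic pattern such a module usually follows, local features querying global transformer features through cross-attention with a residual connection. The design details are assumptions, not the paper's architecture.

```python
# Generic cross-attention fusion pattern (an assumed illustration, not the
# paper's exact module): local CNN tokens attend to global tokens.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_tokens, global_tokens):
        # Shapes: (batch, tokens, dim). Local features are the queries.
        fused, _ = self.attn(local_tokens, global_tokens, global_tokens)
        return self.norm(local_tokens + fused)  # residual keeps local detail
```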

23 pages, 2145 KB  
Article
Seeing Through Touch: A Stereo-Vision Vibrotactile Aid for Visually Impaired People
by Claudia Presicci, Giulia Ballardini, Giorgia Marchesi, Paolo Robutti, Matteo Moro, Camilla Pierella, Andrea Canessa and Maura Casadio
Electronics 2026, 15(7), 1511; https://doi.org/10.3390/electronics15071511 - 3 Apr 2026
Viewed by 308
Abstract
Blind and visually impaired individuals face persistent challenges when navigating unfamiliar environments, where unseen obstacles compromise their safety and independence. Although many electronic travel aids have been proposed, most remain impractical for daily use—they often rely on bulky or costly hardware, require external processing, or provide unintuitive feedback. This work presents a wearable stereo-vision-based vibrotactile system for real-time obstacle detection and navigation assistance. The device combines an off-the-shelf stereo camera integrated with a simultaneous localization and mapping framework to perceive spatial geometry and detect obstacles in the user’s path. Two stereo-matching methods were implemented to estimate depth: a block-based algorithm optimized for low-latency performance and a semi-global approach providing denser depth maps. Detected obstacles are translated into distinct vibration patterns delivered through four skin-contact body-mounted actuators encoding both direction and distance. The system was evaluated with blindfolded sighted, visually impaired, and blind participants. Both stereo approaches supported reliable real-time guidance and high obstacle-avoidance rates, demonstrating robust performance on affordable, wearable hardware. These findings confirm the feasibility of real-time tactile guidance using commercially available components, marking a concrete step toward accessible navigation support that enhances safety and autonomy for blind and visually impaired individuals. Full article
(This article belongs to the Special Issue Feature Papers in Bioelectronics: 2025–2026 Edition)
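
The two depth strategies described, a low-latency block-based matcher and a denser semi-global one, map naturally onto OpenCV's stock StereoBM and StereoSGBM. The sketch below shows those off-the-shelf counterparts on a rectified pair; parameter values are illustrative, and the authors' implementations may differ.

```python
# Off-the-shelf OpenCV counterparts of the two stereo-matching strategies
# described above; parameters are illustrative only.
import cv2

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # rectified pair
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

# Block matching: cheap and fast, but sparser and noisier disparities.
bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disp_bm = bm.compute(left, right).astype('float32') / 16.0   # fixed-point

# Semi-global matching: denser maps at a higher computational cost.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5,
                             P1=8 * 5 * 5, P2=32 * 5 * 5, uniquenessRatio=10)
disp_sgbm = sgbm.compute(left, right).astype('float32') / 16.0
```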

12 pages, 2073 KB  
Proceeding Paper
Binocular Stereo Vision Disparity Estimation Based on Distilled Internally Normalized Optimized Version 2 with Multi-Scale Attention Fusion
by Chang-Fu Hung, Tzu-Jung Tseng and Jian-Jiun Ding
Eng. Proc. 2026, 134(1), 20; https://doi.org/10.3390/engproc2026134020 - 31 Mar 2026
Viewed by 247
Abstract
A stereo vision framework is designed to improve disparity estimation in occluded and boundary regions, targeting autonomous driving scenarios. The proposed architecture combines frozen Distilled Internally Normalized Optimized Version 2 features with a modular three-stage attention fusion strategy, which consists of bottom-up semantic propagation, top-down detail enhancement, and cross-view attention mechanisms. These stages jointly enforce semantic consistency, structural integrity, and accurate correspondence modeling. The fused features are then processed by an Iterative Geometry Encoding and Volumetric regression-based disparity estimation module for multi-stage regression and iterative refinement. A three-phase training pipeline is employed, including pretraining on SceneFlow, fine-tuning on virtual Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) benchmarks, and adaptation to the KITTI and ETH Zurich 3D benchmark dataset. The model achieves an out-of-center, non-occluded pixel error of 7.45% on KITTI2012 and a D1-all error of 4.10% on KITTI2015. Beyond quantitative performance, the proposed method produces visually superior disparity maps. The enhancements of boundary sharpness, occlusion completion, and structural coherence demonstrate the strong potential of the proposed algorithm for real-world deployment in dynamic and complex environments. Full article
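
For context, the disparities being estimated convert to metric depth through the standard rectified-stereo relation (textbook geometry, not a result of this paper), with focal length f in pixels and baseline B in meters:

```latex
% Standard depth-from-disparity relation for a rectified binocular rig.
Z = \frac{f\,B}{d}
% Worked example with KITTI-like parameters (f \approx 721 px, B \approx 0.54 m):
% a disparity of d = 36 px gives Z \approx (721 \times 0.54)/36 \approx 10.8 m,
% so small disparity errors on distant objects translate into large depth errors.
```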

24 pages, 4289 KB  
Article
Floor Plan Generation of Existing Buildings Based on Deep Learning and Stereo Vision
by Dejiang Wang and Taoyu Peng
Buildings 2026, 16(7), 1310; https://doi.org/10.3390/buildings16071310 - 26 Mar 2026
Viewed by 440
Abstract
The reinforcement and renovation of existing buildings constitute an important component of the future development of the civil engineering industry. Such projects typically require the original construction drawings of the building. However, for older structures, the original paper-based drawings may be damaged or lost. Moreover, traditional manual surveying and mapping methods are time-consuming, labor-intensive, and limited in accuracy. To address these issues, this paper proposes a floor plan generation method for existing buildings that integrates deep learning and stereo vision based on a fusion of synthetic and real data. First, collaborative modeling and automated rendering between a large language model and Blender are implemented based on the Model Context Protocol (MCP), enabling indoor scene modeling and image acquisition to construct a synthetic dataset containing structural components such as doors, windows, and walls. Meanwhile, manually annotated real indoor images are incorporated. Synthetic and real data are mixed in different proportions to form multiple dataset configurations for model training and validation. Subsequently, the SegFormer model is employed to perform semantic segmentation of indoor components. Combined with stereo camera calibration results, disparity computation is conducted to extract the three-dimensional spatial coordinates of component corner points. On this basis, the architectural floor plan is generated according to the spatial geometric relationships among structural components. Experimental results demonstrate that the proposed method effectively reduces the need for manual annotation and on-site measurement, providing an efficient technical solution for indoor floor plan generation of existing buildings. Full article
(This article belongs to the Topic Application of Smart Technologies in Buildings)
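
The geometric step described, turning a segmented corner pixel plus its disparity into 3D coordinates, reduces to pinhole back-projection. A hedged sketch with placeholder intrinsics (not the paper's calibration results):

```python
# Pinhole back-projection of a corner pixel given its stereo disparity.
# Intrinsics and baseline below are placeholders, not calibration results.
import numpy as np

def corner_to_3d(u, v, disparity, fx, fy, cx, cy, baseline):
    Z = fx * baseline / disparity   # depth along the optical axis
    X = (u - cx) * Z / fx           # lateral offset
    Y = (v - cy) * Z / fy           # vertical offset
    return np.array([X, Y, Z])

# Example: a wall corner seen at pixel (850, 400) with 22 px disparity.
corner = corner_to_3d(850, 400, 22.0, fx=1050.0, fy=1050.0,
                      cx=960.0, cy=540.0, baseline=0.12)
```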

29 pages, 8910 KB  
Article
Field Evaluation of a Robotic Apple Harvester with Negative-Pressure Driven End-Effectors on a Simplified 4-DoF Manipulator
by Guangrui Hu, Jianguo Zhou, Shiwei Wen, Ning Chen, Chen Chen, Fangmin Cheng, Yu Chen and Jun Chen
Agriculture 2026, 16(7), 717; https://doi.org/10.3390/agriculture16070717 - 24 Mar 2026
Viewed by 410
Abstract
Apple picking is an inherently labor-intensive, time-consuming, and costly task, and robotic harvesting represents a potential alternative to address this challenge. This study presents the development and field evaluation of an integrated robotic system for apple harvesting, which combines machine vision, a dual four-degree-of-freedom (DoF) manipulator, and a mobile platform. The harvesting mechanism employed a streamlined 4-DoF manipulator driven by closed-loop stepper motors, incorporating a differential gear mechanism to execute yaw and pitch motions. Trajectory planning utilized linear interpolation with a harmonic acceleration/deceleration profile to ensure smooth end-effector movement. Fruit detection and localization within the canopy were performed by a stereo vision system running a lightweight deep neural network, achieving a mean hand-eye calibration accuracy of 4.7 ± 2.7 mm. Three negative-pressure driven soft end-effector designs—a suction soft end-effector (SSE), a grasping soft end-effector (GSE), and a suction-grasping soft end-effector (SGSE)—were assessed for their harvesting performance. Field trials conducted in a commercial spindle orchard demonstrated that the GSE achieved the highest performance, with a harvesting success rate of 80.80% among reachable fruits, a full-process success rate (from detection to collection) of 61.59%, an overall fruit damage rate of 10.89%, and an average single-fruit cycle time of 5.27 s. In contrast, the SSE and SGSE showed lower success rates (49.21% and 64.71%, respectively). This work provides a practical robotic harvesting solution. It validates the feasibility of a zoned, multi-manipulator harvesting strategy and delivers comparative data to guide the development of more efficient and robust harvesting robots. Full article
(This article belongs to the Section Agricultural Technology)
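
The trajectory planner pairs linear interpolation with a harmonic acceleration/deceleration profile. Assuming the usual cosine timing law (zero velocity at both endpoints, hence the smooth start and stop), a minimal sketch:

```python
# Linear path with a harmonic (cosine) timing law: an assumed form of the
# profile described above, giving zero end-point velocities.
import numpy as np

def harmonic_linear_path(p_start, p_goal, duration, steps=100):
    p_start, p_goal = np.asarray(p_start, float), np.asarray(p_goal, float)
    t = np.linspace(0.0, duration, steps)
    s = 0.5 * (1.0 - np.cos(np.pi * t / duration))  # s goes smoothly 0 -> 1
    return p_start + s[:, None] * (p_goal - p_start)

# Two-second approach from a home pose toward a localized fruit (meters).
waypoints = harmonic_linear_path([0.0, 0.0, 0.3], [0.45, 0.10, 0.55], 2.0)
```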

61 pages, 11232 KB  
Article
A Contactless Deep Learning Framework for Quantitative Motor Assessment Aligned with the Movement Disorder Society Unified Parkinson’s Disease Rating Scale Part III: A Healthy Baseline Definition Study
by Andrea Zanela
Appl. Sci. 2026, 16(6), 3091; https://doi.org/10.3390/app16063091 - 23 Mar 2026
Viewed by 256
Abstract
The clinical evaluation of motor impairment in Parkinson’s disease is commonly based on the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Part III, which relies on visual assessment and is therefore subject to inter-rater variability. Existing technology-based solutions often require wearable sensors or lack structural alignment with the item-based architecture of the clinical examination. This study presents a fully automated and contactless framework designed to quantitatively describe motor performance in tasks explicitly aligned with MDS-UPDRS Part III. The system integrates stereo vision, deep learning-based pose estimation, and acoustic analysis to derive continuous, standardized quantitative descriptors. Objective Motor Item Indices were defined for 17 of the 18 motor items, excluding rigidity, which cannot be inferred from vision-based measurements. The framework was evaluated in a cohort of healthy subjects to establish an internal reference baseline for feature normalization and index construction. Within this cohort, descriptors exhibited coherent multivariate organization and internally consistent distributions, supporting methodological feasibility at this baseline definition stage. This work represents a methodological and baseline definition phase. Clinical validation in Parkinsonian populations, correlation with neurologist-rated scores, and longitudinal assessment remain necessary to determine diagnostic, severity-related, or early-stage applicability. Full article
(This article belongs to the Special Issue Emerging Technologies for Assistive Robotics)
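
The healthy-cohort baseline serves for feature normalization and index construction; one plausible reading (an assumption, since the paper's exact construction is not given here) is a z-score style index against the baseline distribution:

```python
# Assumed illustration of baseline normalization: express one subject's
# motor-item descriptors in healthy-cohort standard deviations.
import numpy as np

def motor_item_index(features, baseline):
    """features: (n_features,) for one subject and one motor item.
    baseline: (n_healthy_subjects, n_features) reference cohort."""
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0) + 1e-9       # guard against zero spread
    z = (np.asarray(features) - mu) / sigma
    return float(np.sqrt(np.mean(z ** 2)))    # RMS deviation from baseline
```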

14 pages, 3023 KB  
Article
Lightweight Stereo Vision for Obstacle Detection and Range Estimation in Micro-Mobility Vehicles
by Jiansheng Ruan, Hui Weng, Zhaojun Yuan, Guangyuan Jin and Liang Zhou
Sensors 2026, 26(6), 1988; https://doi.org/10.3390/s26061988 - 23 Mar 2026
Viewed by 305
Abstract
Micro-mobility vehicles operating in closed, low-speed environments (e.g., parks) require reliable obstacle detection and accurate range estimation under strict constraints on cost, power, and onboard computation. This paper proposes HAGVNet, a lightweight stereo matching network for embedded ranging and validates its practical deployability in a target-level ranging pipeline with YOLO11n as the front-end detector. HAGVNet builds a hierarchical attention-guided cost volume (HAGV) that uses coarse-scale geometric priors to modulate fine-scale cost modeling and adopts ConvNeXtV2-style 2D cost aggregation blocks to improve stability and boundary consistency with controlled complexity. For ranging, depth statistics within detected regions are used to estimate target distance and 3D position. The model is pre-trained on SceneFlow and evaluated on KITTI. On SceneFlow, HAGVNet reaches 0.73 px EPE with 20.08 G FLOPs, indicating a favorable accuracy–complexity trade-off under low computation budgets. On an embedded Jetson Orin Nano Super platform, HAGVNet achieves 46.3 FPS under TensorRT FP16, and field tests indicate relative ranging errors of 0.5–8.6% within 2–10 m, demonstrating its practical feasibility for low-speed target-level ranging. Full article
(This article belongs to the Section Sensing and Imaging)
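
The ranging step, "depth statistics within detected regions", is sketched below as a robust median over valid depth pixels inside each detector box, back-projected through placeholder intrinsics; this mirrors the described pipeline only in outline, not HAGVNet's implementation.

```python
# Outline of target-level ranging: median depth inside a detector box plus
# pinhole back-projection. Intrinsics are placeholder values.
import numpy as np

def range_target(depth_map, box, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """depth_map in meters; box = (x1, y1, x2, y2) from the detector."""
    x1, y1, x2, y2 = (int(v) for v in box)
    patch = depth_map[y1:y2, x1:x2]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None
    Z = float(np.median(valid))     # median rejects boundary/outlier pixels
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return Z, np.array([(u - cx) * Z / fx, (v - cy) * Z / fy, Z])
```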

30 pages, 3812 KB  
Review
Video-Based 3D Reconstruction: A Review of Photogrammetry and Visual SLAM Approaches
by Ali Javadi Moghadam, Abbas Kiani, Reza Naeimaei, Shirin Malihi and Ioannis Brilakis
J. Imaging 2026, 12(3), 128; https://doi.org/10.3390/jimaging12030128 - 13 Mar 2026
Viewed by 1184
Abstract
Three-dimensional (3D) reconstruction using images is one of the most significant topics in computer vision and photogrammetry, with wide-ranging applications in robotics, augmented reality, and mapping. This study investigates methods of 3D reconstruction using video (especially monocular video) data and focuses on techniques such as Structure from Motion (SfM), Multi-View Stereo (MVS), Visual Simultaneous Localization and Mapping (V-SLAM), and videogrammetry. Based on a statistical analysis of SCOPUS records, these methods collectively account for approximately 6863 journal publications up to the end of 2024. Among these, about 80 studies are analyzed in greater detail to identify trends and advancements in the field. The study also shows that the use of video data for real-time 3D reconstruction is commonly addressed through two main approaches: photogrammetry-based methods, which rely on precise geometric principles and offer high accuracy at the cost of greater computational demand; and V-SLAM methods, which emphasize real-time processing and provide higher speed. Furthermore, the application of IMU data and other indicators, such as color quality and keypoint detection, for selecting suitable frames for 3D reconstruction is investigated. Overall, this study compiles and categorizes video-based reconstruction methods, emphasizing the critical step of keyframe extraction. By summarizing and illustrating the general approaches, the study aims to clarify and facilitate the entry path for researchers interested in this area. Finally, the paper offers targeted recommendations for improving keyframe extraction methods to enhance the accuracy and efficiency of real-time video-based 3D reconstruction, while also outlining future research directions in addressing challenges like dynamic scenes, reducing computational costs, and integrating advanced learning-based techniques. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
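
The keyframe-selection indicators the review discusses (image quality and keypoint availability) can be combined into a simple acceptance gate; the OpenCV sketch below is one assumed formulation, with invented thresholds.

```python
# One assumed formulation of a keyframe gate: accept a frame only if it is
# sharp enough and yields enough keypoints. Thresholds are invented.
import cv2

orb = cv2.ORB_create(nfeatures=1000)

def is_keyframe(gray, blur_thresh=100.0, min_keypoints=300):
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # blur indicator
    keypoints = orb.detect(gray, None)
    return sharpness > blur_thresh and len(keypoints) >= min_keypoints
```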

25 pages, 3654 KB  
Project Report
Computer Vision-Based Monitoring and Data Integration in a Multi-Trophic Controlled-Environment Agriculture Demonstrator
by Frederik Werner, Till Glockow, Kai Meissner, Martin Krüger, Markus Reischl and Christof M. Niemeyer
Sustainability 2026, 18(6), 2700; https://doi.org/10.3390/su18062700 - 10 Mar 2026
Viewed by 446
Abstract
Controlled-environment agriculture (CEA) and circular production systems require coordinated monitoring of biological and physicochemical processes across trophic levels. This project report presents the implementation of a multi-trophic controlled-environment agriculture demonstrator that integrates computer-vision-based monitoring with established sensor infrastructure for aquaculture, poultry, plants, microalgae, duckweed, and insect modules. Stereo imaging and RGB-D systems are deployed for non-invasive quantification of fish biomass and plant growth, while continuous water-quality and environmental measurements (e.g., pH, dissolved oxygen, nitrate, ammonium, temperature, CO2) provide complementary process data. These data streams are synchronized within a shared database architecture to enable cross-module evaluation of nutrient dynamics, growth progression, and operational stability under real facility conditions. The implemented framework demonstrates how computer vision can extend conventional sensor-based monitoring by directly capturing biological performance indicators across aquatic, terrestrial, and microbial domains. While advanced predictive modeling and full digital twin simulation remain future development steps, the realized data-integration architecture establishes a structural foundation for the systematic evaluation of circular indoor food-production systems. The demonstrator illustrates how multimodal monitoring can support nutrient recirculation, transparency of biological variability, and data-driven assessment within controlled multi-trophic environments. Full article
(This article belongs to the Special Issue Food Science and Engineering for Sustainability—2nd Edition)

27 pages, 3381 KB  
Article
Fusion of Stereo Matching and Spatiotemporal Interaction Analysis: A Detection Method for Excavator-Related Struck-By Hazards in Construction Sites
by Yifan Zhu, Hainan Chen, Rui Pan, Mengqi Yuan, Pan Zhang and Wen Wang
Buildings 2026, 16(5), 1002; https://doi.org/10.3390/buildings16051002 - 4 Mar 2026
Viewed by 342
Abstract
In the construction industry, struck-by accidents involving heavy equipment such as crawler excavators are a leading cause of worker fatalities and injuries. Existing vision-based hazard detection methods are limited by approximate evaluations, reliance on specific references, and neglect of spatial relationships between equipment and workers, making them inadequate for complex dynamic construction environments. This study aims to address these limitations by proposing a precise and adaptable struck-by hazard detection method. The method integrates four core modules: object tracking via the YOLOv5-DeepSORT model to detect workers, excavators, and their key components; activity recognition to identify the operational states of excavators (working or static) and workers (driver or field worker); proximity estimation based on stereo vision, using the BGNet model and camera calibration to calculate 3D spatial distances; and safety identification to assess worker safety status in real time. Validated through three virtual construction scenarios (flat ground, rugged terrain, and slope), the method achieved high safety status identification accuracies of 92.71%, 90.04%, and 94.25%, respectively. The results demonstrate its robustness in adapting to diverse construction environments and accurately capturing equipment–worker spatial interactions. This research expands the application scope of hazard monitoring in complex settings, enhances safety identification efficiency, and provides a reliable technical solution for improving construction site safety management. Full article
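
The closing safety-identification module can be caricatured as a state-plus-proximity rule; the sketch below is a deliberate simplification with an invented threshold, not the authors' logic.

```python
# Simplified caricature of safety identification: excavator state plus 3D
# proximity. The radius is invented for illustration.
import numpy as np

DANGER_RADIUS_M = 5.0   # illustrative threshold, not from the paper

def worker_status(worker_xyz, excavator_xyz, excavator_working, is_driver):
    if is_driver:
        return 'safe'   # the driver is inside the cab, not exposed
    dist = float(np.linalg.norm(np.asarray(worker_xyz, float) -
                                np.asarray(excavator_xyz, float)))
    return 'hazard' if excavator_working and dist < DANGER_RADIUS_M else 'safe'
```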

22 pages, 17599 KB  
Article
Self-Supervised 3D Cloud Motion Inversion from Ground-Based Binocular All-Sky Images
by Shan Jiang, Chen Zhang, Xu Fu, Lei Lin, Zhikuan Wang, Xingtong Li, Tianying Liu and Jifeng Song
Atmosphere 2026, 17(3), 236; https://doi.org/10.3390/atmos17030236 - 25 Feb 2026
Viewed by 461
Abstract
Addressing the challenge of stable cloud velocity field estimation under complex sky conditions in ground-based cloud imaging, this paper proposes a comprehensive 3D cloud velocity calculation framework. The methodology integrates binocular stereo vision geometry, self-supervised deep feature learning, and graph attention-based matching. First, a self-supervised feature detection and description model tailored to the radiometric characteristics of cloud images is developed. By incorporating a homography adaptation strategy constrained by physical priors, the model acquires robust feature representations for weakly textured and highly deformable cloud masses without requiring labeled datasets. Subsequently, a Transformer-based graph neural network matcher is employed to establish global feature correspondences across both cross-view and cross-temporal dimensions, thereby substantially augmenting matching robustness. On this basis, the framework establishes a rigorous calibration model for fisheye cameras to derive cloud base height (CBH) via binocular geometry. These geometric constraints are then coupled with sequential feature tracking results to construct 3D velocity inversion equations, enabling an end-to-end mapping from 2D pixel coordinates to 3D physical space and providing direct estimation of physical cloud motion velocity in meters per second (m/s). The experimental results show that the proposed method extracts 4.5 times more feature points than the traditional SIFT method. Furthermore, the Pearson correlation coefficient for cloud motion trends in continuous sequences reaches 0.662 relative to baseline models, indicating good relative consistency in motion estimation. The framework achieves high-precision and stable velocity estimation across diverse cloud types, including cirrus, cumulus, stratus, and mixed clouds. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
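
The cloud-base-height retrieval rests on two-station parallax; a deliberately simplified, coplanar version of that geometry (the paper itself uses full fisheye calibration and binocular stereo) is:

```python
# Coplanar two-station parallax, a simplified stand-in for the paper's full
# fisheye stereo geometry: both cameras and the cloud feature share one
# vertical plane, elevations measured from the horizontal.
import math

def cloud_base_height(elev1_deg, elev2_deg, baseline_m):
    """Station 2 sits baseline_m closer to the cloud, so elev2 > elev1."""
    cot = lambda a: 1.0 / math.tan(math.radians(a))
    return baseline_m / (cot(elev1_deg) - cot(elev2_deg))

# Stations 500 m apart see the same feature at 40 and 55 degrees elevation:
h = cloud_base_height(40.0, 55.0, 500.0)   # roughly a 1 km cloud base
```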

18 pages, 2990 KB  
Article
Research on Ship-Borne Wave Observation Experiment Based on Stereoscopic Vision
by Aolong Zhu, Kefeng Mao, Li Ding and Yan Li
Sensors 2026, 26(3), 993; https://doi.org/10.3390/s26030993 - 3 Feb 2026
Viewed by 376
Abstract
Currently, most wave observation equipment is used for fixed-point measurements, and there is a relative scarcity of ship-borne real-time wave measurement devices, which limits comprehensive and three-dimensional monitoring of wave characteristics. This paper introduces the Wave Acquisition Stereo System (WASS) and describes the design and construction of a ship-borne stereoscopic vision experimental apparatus. Sea trials were conducted to evaluate the system’s ship-borne wave-measurement performance and to quantify the effect of deployment parameters on accuracy. The results indicate that the device reliably retrieves wave parameters; compared with concurrent buoy observations, the error in significant wave height did not exceed 0.14 m. Research confirms that deployment parameters have a significant impact on measurement outcomes: sampling frequency directly affects the accuracy of wave-parameter estimation; a higher sampling rate (10 Hz) improves the reliability of the calculated results. The baseline-to-height ratio has an optimal range (0.1–0.3), and values outside this interval reduce measurement accuracy. Under a fixed geometric configuration, the observation field exhibits a band-shaped low-error zone aligned with the baseline direction. Full article
(This article belongs to the Section Remote Sensors)
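
The buoy comparison is in terms of significant wave height, for which the standard estimate Hs ≈ 4σ of the surface-elevation series applies (general oceanographic practice, not WASS internals):

```python
# Standard estimate Hs = 4 * std(eta) applied to the surface-elevation
# series the stereo reconstruction provides (general practice, not WASS code).
import numpy as np

def significant_wave_height(eta):
    """eta: sea-surface elevation samples in meters."""
    eta = np.asarray(eta, dtype=float)
    return 4.0 * np.std(eta - eta.mean())

# At the 10 Hz sampling rate discussed above, a 20-minute record
# contributes 12,000 elevation samples to the estimate.
```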

23 pages, 6708 KB  
Article
Feasibility Domain Construction and Characterization Method for Intelligent Underground Mining Equipment Integrating ORB-SLAM3 and Depth Vision
by Siya Sun, Xiaotong Han, Hongwei Ma, Haining Yuan, Sirui Mao, Chuanwei Wang, Kexiang Ma, Yifeng Guo and Hao Su
Sensors 2026, 26(3), 966; https://doi.org/10.3390/s26030966 - 2 Feb 2026
Viewed by 448
Abstract
To address the limited environmental perception capability and the difficulty of achieving consistent and efficient representation of the workspace feasible domain caused by high dust concentration, uneven illumination, and enclosed spaces in underground coal mines, this paper proposes a digital spatial construction and representation method for underground environments by integrating RGB-D depth vision with ORB-SLAM3. First, a ChArUco calibration board with embedded ArUco markers is adopted to perform high-precision calibration of the RGB-D camera, improving the reliability of geometric parameters under weak-texture and non-uniform lighting conditions. On this basis, a “dense–sparse cooperative” OAK-DenseMapper Pro module is further developed; the module improves point-cloud generation using a mathematical projection model, and combines enhanced stereo matching with multi-stage depth filtering to achieve high-quality dense point-cloud reconstruction from RGB-D observations. The dense point cloud is then converted into a probabilistic octree occupancy map, where voxel-wise incremental updates are performed for observed space while unknown regions are retained, enabling a memory-efficient and scalable 3D feasible-space representation. Experiments are conducted in multiple representative coal-mine tunnel scenarios; compared with the original ORB-SLAM3, the number of points in dense mapping increases by approximately 38% on average; in trajectory evaluation on the TUM dataset, the root mean square error, mean error, and median error of the absolute pose error are reduced by 7.7%, 7.1%, and 10%, respectively; after converting the dense point cloud to an octree, the map memory footprint is only about 0.5% of the original point cloud, with a single conversion time of approximately 0.75 s. The experimental results demonstrate that, while ensuring accuracy, the proposed method achieves real-time, efficient, and consistent representation of the 3D feasible domain in complex underground environments, providing a reliable digital spatial foundation for path planning, safe obstacle avoidance, and autonomous operation. Full article
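
The dense-cloud-to-octree conversion behind the roughly 0.5% memory footprint can be approximated with Open3D's built-in octree, as sketched below; the file name and parameters are placeholders, and the paper's OAK-DenseMapper Pro module is not reproduced.

```python
# Approximate illustration with Open3D (placeholders throughout; not the
# paper's OAK-DenseMapper Pro pipeline): compress a dense cloud into an
# octree so observed space is voxelized and unknown space stays unknown.
import open3d as o3d

pcd = o3d.io.read_point_cloud('tunnel_dense.ply')   # hypothetical file
pcd = pcd.voxel_down_sample(voxel_size=0.05)        # pre-filter to 5 cm

octree = o3d.geometry.Octree(max_depth=8)
octree.convert_from_point_cloud(pcd, size_expand=0.01)
# Leaf nodes act as occupied voxels; the tree is far smaller than the
# raw point cloud, matching the memory savings reported above.
```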
