Search Results (134)

Search Parameters:
Keywords = stereo tracking

21 pages, 4909 KiB  
Article
Rapid 3D Camera Calibration for Large-Scale Structural Monitoring
by Fabio Bottalico, Nicholas A. Valente, Christopher Niezrecki, Kshitij Jerath, Yan Luo and Alessandro Sabato
Remote Sens. 2025, 17(15), 2720; https://doi.org/10.3390/rs17152720 - 6 Aug 2025
Abstract
Computer vision techniques such as three-dimensional digital image correlation (3D-DIC) and three-dimensional point tracking (3D-PT) have demonstrated broad applicability for monitoring the conditions of large-scale engineering systems by reconstructing and tracking dynamic point clouds corresponding to the surface of a structure. Accurate stereophotogrammetry measurements require the stereo cameras to be calibrated to determine their intrinsic and extrinsic parameters by capturing multiple images of a calibration object. This image-based approach becomes cumbersome and time-consuming as the size of the tested object increases. To streamline the calibration and make it scale-insensitive, a multi-sensor system embedding inertial measurement units and a laser sensor is developed to compute the extrinsic parameters of the stereo cameras. In this research, the accuracy of the proposed sensor-based calibration method in performing stereophotogrammetry is validated experimentally and compared with traditional approaches. Tests conducted at various scales reveal that the proposed sensor-based calibration enables reconstructing both static and dynamic point clouds, measuring displacements with an accuracy higher than 95% compared to image-based traditional calibration, while being up to an order of magnitude faster and easier to deploy. The novel approach has broad applications for making static, dynamic, and deformation measurements to transform how large-scale structural health monitoring can be performed. Full article
(This article belongs to the Special Issue New Perspectives on 3D Point Cloud (Third Edition))
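The core idea of the sensor-based calibration above — recovering the stereo pair's extrinsic parameters from orientation sensors and a measured baseline instead of calibration images — can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes each IMU reports its camera's orientation in a shared world frame (reduced here to yaw only) and that the laser measures the baseline along camera 1's x-axis.

```python
import numpy as np

def rot_z(yaw):
    """Rotation about the z-axis (a stand-in for a full IMU orientation)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Hypothetical sensor readings, not values from the paper.
R1 = rot_z(np.deg2rad(5.0))    # orientation of camera 1 in the world frame
R2 = rot_z(np.deg2rad(-3.0))   # orientation of camera 2 in the world frame
baseline = 2.5                 # metres, from the laser sensor

# Relative rotation and translation of camera 2 expressed in camera 1's frame.
R_rel = R1.T @ R2
t_rel = R1.T @ (baseline * np.array([1.0, 0.0, 0.0]))

extrinsics = np.hstack([R_rel, t_rel.reshape(3, 1)])  # 3x4 [R|t] matrix
```

With the extrinsics fixed this way, only the intrinsic parameters (which do not depend on the cameras' placement) still need an image-based calibration, which is what makes the approach scale-insensitive.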

21 pages, 608 KiB  
Article
A Machine Learning-Assisted Automation System for Optimizing Session Preparation Time in Digital Audio Workstations
by Bogdan Moroșanu, Marian Negru, Georgian Nicolae, Horia Sebastian Ioniță and Constantin Paleologu
Information 2025, 16(6), 494; https://doi.org/10.3390/info16060494 - 13 Jun 2025
Viewed by 624
Abstract
Modern audio production workflows often require significant manual effort during the initial session preparation phase, including track labeling, format standardization, and gain staging. This paper presents a rule-based and Machine Learning-assisted automation system designed to minimize the time required for these tasks in Digital Audio Workstations (DAWs). The system automatically detects and labels audio tracks, identifies and eliminates redundant fake stereo channels, merges double-tracked instruments into stereo pairs, standardizes sample rate and bit rate across all tracks, and applies initial gain staging using target loudness values derived from a Genetic Algorithm (GA)-based system, which optimizes gain levels for individual track types based on engineer preferences and instrument characteristics. By replacing manual setup processes with automated decision-making methods informed by Machine Learning (ML) and rule-based heuristics, the system reduces session preparation time by up to 70% in typical multitrack audio projects. The proposed approach highlights how practical automation, combined with lightweight Neural Network (NN) models, can optimize workflow efficiency in real-world music production environments. Full article
(This article belongs to the Special Issue Optimization Algorithms and Their Applications)
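The gain-staging step described above reduces, at its simplest, to a level calculation: measure a track's loudness and apply the linear gain that moves it to the target. A minimal sketch assuming an RMS-based level measure and a hypothetical target of -18 dBFS (the paper's actual loudness model and GA-derived per-instrument targets are not given in the abstract):

```python
import numpy as np

def gain_to_target_db(track, target_rms_db):
    """Linear gain factor that moves a track's RMS level to a target (dBFS)."""
    rms = np.sqrt(np.mean(track ** 2))
    current_db = 20 * np.log10(max(rms, 1e-12))
    return 10 ** ((target_rms_db - current_db) / 20)

# A 0.5-amplitude 440 Hz sine, staged to an assumed -18 dBFS RMS target.
t = np.linspace(0, 1, 48000, endpoint=False)
track = 0.5 * np.sin(2 * np.pi * 440 * t)
g = gain_to_target_db(track, -18.0)
staged = g * track
```

A production system would more likely use a perceptual loudness measure (e.g. LUFS) than plain RMS, but the gain arithmetic is the same.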

17 pages, 1922 KiB  
Article
Enhancing Visual–Inertial Odometry Robustness and Accuracy in Challenging Environments
by Alessandro Minervini, Adrian Carrio and Giorgio Guglieri
Robotics 2025, 14(6), 71; https://doi.org/10.3390/robotics14060071 - 27 May 2025
Viewed by 1660
Abstract
Visual–Inertial Odometry (VIO) algorithms are widely adopted for autonomous drone navigation in GNSS-denied environments. However, conventional monocular and stereo VIO setups often lack robustness under challenging environmental conditions or during aggressive maneuvers, due to the sensitivity of visual information to lighting, texture, and motion blur. In this work, we enhance an existing open-source VIO algorithm to improve both the robustness and accuracy of the pose estimation. First, we integrate an IMU-based motion prediction module to improve feature tracking across frames, particularly during high-speed movements. Second, we extend the algorithm to support a multi-camera setup, which significantly improves tracking performance in low-texture environments. Finally, to reduce the computational complexity, we introduce an adaptive feature selection strategy that dynamically adjusts the detection thresholds according to the number of detected features. Experimental results validate the proposed approaches, demonstrating notable improvements in both accuracy and robustness across a range of challenging scenarios. Full article
(This article belongs to the Section Sensors and Control in Robotics)
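The IMU-based motion prediction step above can be approximated, for rotation-dominant motion, by warping feature locations through the infinite homography H = K·RΔ·K⁻¹, where RΔ comes from gyro integration between frames. A sketch with assumed intrinsics K (the algorithm's actual predictor is not detailed in the abstract and may also account for translation and depth):

```python
import numpy as np

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics

def predict_features(pts, R_delta, K):
    """Predict pixel positions after a pure rotation R_delta; the infinite
    homography H = K R K^-1 ignores translation, which is a reasonable
    approximation for distant features during fast rotations."""
    H = K @ R_delta @ np.linalg.inv(K)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = (H @ pts_h.T).T
    return proj[:, :2] / proj[:, 2:3]

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# Predict where two features move under a 1-degree yaw between frames.
pts = np.array([[320.0, 240.0], [100.0, 50.0]])
pred = predict_features(pts, rot_y(np.deg2rad(1.0)), K)
```

Seeding the feature tracker's search window at the predicted positions is what keeps tracking stable during aggressive maneuvers, when features move many pixels between frames.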

17 pages, 5356 KiB  
Article
A Study on the Features for Multi-Target Dual-Camera Tracking and Re-Identification in a Comparatively Small Environment
by Jong-Chen Chen, Po-Sheng Chang and Yu-Ming Huang
Electronics 2025, 14(10), 1984; https://doi.org/10.3390/electronics14101984 - 13 May 2025
Viewed by 544
Abstract
Tracking across multiple cameras is a complex problem in computer vision. Its main challenges include camera calibration, occlusion handling, camera overlap and field of view, person re-identification, and data association. In this study, we designed a laboratory as a research environment that facilitates our exploration of some of the above challenging issues. This study uses stereo camera calibration and key point detection to reconstruct the three-dimensional key points of the person being tracked, thereby performing person-tracking tasks. The results show that the dual cameras’ 3D spatial tracking method can have a relatively better continuous monitoring effect than a single camera alone. This study adopts four ways to evaluate person similarity, which can effectively reduce the unnecessary identity generation of persons. However, using all four methods simultaneously may not produce better results than a specific assessment method alone due to differences in people’s activity situations. Full article
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)
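Combining several person-similarity cues, as the study does with its four evaluation methods, can be sketched as a weighted fusion with a threshold that gates new-identity creation. The four cues and the weights here are illustrative stand-ins, not the paper's exact measures:

```python
import numpy as np

def fuse_similarities(scores, weights=None):
    """Weighted mean of several person-similarity cues (e.g. appearance,
    3D key-point position); equal weights by default."""
    scores = np.asarray(scores, float)
    if weights is None:
        weights = np.ones_like(scores)
    return float(np.average(scores, weights=weights))

NEW_ID_THRESHOLD = 0.5  # below this, spawn a new identity (assumed value)

fused = fuse_similarities([0.8, 0.6, 0.9, 0.7])
is_same_person = fused >= NEW_ID_THRESHOLD
```

The paper's observation that using all four cues at once is not always better than a single well-chosen one corresponds, in this sketch, to the weights: a uniform average can dilute the one cue that is reliable in a given activity situation.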

21 pages, 13198 KiB  
Article
Infrared Bionic Compound-Eye Camera: Long-Distance Measurement Simulation and Verification
by Xiaoyu Wang, Linhan Li, Jie Liu, Zhen Huang, Yuhan Li, Huicong Wang, Yimin Zhang, Yang Yu, Xiupeng Yuan, Liya Qiu and Sili Gao
Electronics 2025, 14(7), 1473; https://doi.org/10.3390/electronics14071473 - 6 Apr 2025
Cited by 1 | Viewed by 560
Abstract
To achieve rapid distance estimation and tracking of moving targets in a large field of view, this paper proposes an innovative simulation method. Using a low-cost approach, the imaging and distance measurement performance of the designed cooling-type mid-wave infrared compound-eye camera (CM-CECam) is experimentally evaluated. The compound-eye camera consists of a small-lens array with a spherical shell, a relay optical system, and a cooling-type mid-wave infrared detector. Based on the spatial arrangement of the small-lens array, a precise simulation imaging model for the compound-eye camera is developed, constructing a virtual imaging space. Distance estimation and error analysis for virtual targets are performed using the principle of stereo disparity. This universal simulation method provides a foundation for spatial design and image-plane adjustments for compound-eye cameras with specialized structures. Using the raw images captured by the compound-eye camera, a scene-specific piecewise linear mapping method is applied. This method significantly reduces the brightness contrast differences between sub-images during wide-field observations, enhancing image details. For the fast detection of moving targets, ommatidia clusters are defined as the minimal spatial constraint units. Local information at the centers of these constraint units is prioritized for processing. This approach replaces traditional global detection methods, improving the efficiency of subsequent processing. Finally, the simulated distance measurement results are validated using real-world scene data. Full article
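The stereo-disparity principle underlying the distance estimation above is the classic relation Z = f·B/d: depth is focal length times baseline over disparity. A rough illustration with made-up values, not the compound-eye camera's parameters:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth from the standard stereo relation Z = f * B / d, with the focal
    length in pixels, baseline in metres, and disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

z = depth_from_disparity(focal_px=800.0, baseline_m=0.12, disparity_px=4.0)
```

The relation also explains the error analysis in simulation: depth error grows quadratically with distance, because at long range a full pixel of disparity corresponds to a large depth interval.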

25 pages, 6410 KiB  
Article
Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume
by Zongcheng Zuo, Yuanxiang Li, Yu Zhou and Fan Mo
Sensors 2025, 25(7), 2233; https://doi.org/10.3390/s25072233 - 2 Apr 2025
Viewed by 990
Abstract
Feature matching is pivotal when using multi-view stereo (MVS) to reconstruct dense 3D models from calibrated images. This paper proposes PAC-MVSNet, which integrates perspective-aware convolution (PAC) and metadata-enhanced cost volumes to address the challenges in reflective and texture-less regions. PAC dynamically aligns convolutional kernels with scene perspective lines, while the use of metadata (e.g., camera pose distance) enables geometric reasoning during cost aggregation. In PAC-MVSNet, we introduce feature matching with long-range tracking that utilizes both internal and external focuses to integrate extensive contextual data within individual images as well as across multiple images. To enhance the performance of the feature matching with long-range tracking, we also propose a perspective-aware convolution module that directs the convolutional kernel to capture features along the perspective lines. This enables the module to extract perspective-aware features from images, improving the feature matching. Finally, we crafted a specific 2D CNN that fuses image priors, thereby integrating keyframes and geometric metadata within the cost volume to evaluate depth planes. Our method represents the first attempt to embed the existing physical model knowledge into a network for completing MVS tasks, which achieved optimal performance using multiple benchmark datasets. Full article

42 pages, 3747 KiB  
Review
A Critical Review of Methods and Techniques Used for Monitoring Deformations in Wooden Panel Paintings
by Claudia Gagliardi, Lorenzo Riparbelli, Paola Mazzanti and Marco Fioravanti
Forests 2025, 16(3), 546; https://doi.org/10.3390/f16030546 - 19 Mar 2025
Cited by 1 | Viewed by 490
Abstract
Wooden panel paintings (WPPs) are among the most significant historical artworks that must be preserved for future generations. Ensuring their long-term conservation requires a comprehensive characterization of their condition, making monitoring an essential process. Thus, the primary objective of this study is to provide a comprehensive overview of the current techniques employed to study support deformations in WPPs, categorizing them into localized and full-field methods. Specifically, we provide information about linear potentiometric transducers, the Deformometric Kit, and Fiber Bragg Grating sensors as techniques that provide information about specific and isolated points on the artwork’s surface. On the other hand, digital image correlation, stereo-correlation, mark-tracking, 3D modeling techniques, and the moiré method, are discussed as techniques that analyze the entire surface or a significant part of the artwork. Each method has advantages and limitations, depending on the type of monitoring needed and the desired information. Nevertheless, these techniques contribute to understanding the behavior of the artworks’ materials under environmental fluctuations or restoration interventions, aiding the development of targeted and effective conservation strategies. Furthermore, this study seeks to evaluate the effectiveness of these methods in various conservation contexts and offers practical guidelines to assist conservators and researchers in selecting the most appropriate approach to support the long-term conservation of these invaluable historical artworks. Full article
(This article belongs to the Section Wood Science and Forest Products)

19 pages, 26378 KiB  
Article
2D to 3D Human Skeleton Estimation Based on the Brown Camera Distortion Model and Constrained Optimization
by Lan Ma and Hua Huo
Electronics 2025, 14(5), 960; https://doi.org/10.3390/electronics14050960 - 27 Feb 2025
Viewed by 1396
Abstract
In the rapidly evolving field of computer vision and machine learning, 3D skeleton estimation is critical for applications such as motion analysis and human–computer interaction. While stereo cameras are commonly used to acquire 3D skeletal data, monocular RGB systems attract attention due to benefits including cost-effectiveness and simple deployment. However, persistent challenges remain in accurately inferring depth from 2D images and reconstructing 3D structures using monocular approaches. The current 2D to 3D skeleton estimation methods overly rely on deep training of datasets, while neglecting the importance of human intrinsic structure and the principles of camera imaging. To address this, this paper introduces an innovative 2D to 3D gait skeleton estimation method that leverages the Brown camera distortion model and constrained optimization. Utilizing the Azure Kinect depth camera for capturing gait video, the Azure Kinect Body Tracking SDK was employed to effectively extract 2D and 3D joint positions. The camera’s distortion properties were analyzed, using the Brown camera distortion model which is suitable for this scenario, and iterative methods to compensate the distortion of 2D skeleton joints. By integrating the geometric constraints of the human skeleton, an optimization algorithm was analyzed to achieve precise 3D joint estimations. Finally, the framework was validated through comparisons between the estimated 3D joint coordinates and corresponding measurements captured by depth sensors. Experimental evaluations confirmed that this training-free approach achieved superior precision and stability compared to conventional methods. Full article
(This article belongs to the Special Issue 3D Computer Vision and 3D Reconstruction)
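The iterative compensation of joint distortion under the Brown model typically inverts the forward (undistorted-to-distorted) model by fixed-point iteration, since the inverse has no closed form. A sketch over normalized image coordinates with made-up distortion coefficients (the paper's calibrated Azure Kinect values are not given in the abstract):

```python
import numpy as np

def undistort_brown(x_d, y_d, k1, k2, p1, p2, iters=10):
    """Iteratively invert the Brown radial/tangential distortion model.
    Starts from the distorted point and repeatedly removes the distortion
    predicted at the current estimate; converges for mild distortion."""
    x, y = x_d, y_d
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 * r2
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)  # tangential terms
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (x_d - dx) / radial
        y = (y_d - dy) / radial
    return x, y

# Undistort one normalized 2D joint position (coefficients are illustrative).
x_u, y_u = undistort_brown(0.21, -0.11, k1=-0.05, k2=0.01, p1=1e-4, p2=-5e-5)
```

Feeding the corrected 2D joints into the skeleton-constrained optimization is what lets the method recover accurate 3D joints without any learned depth prior.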

35 pages, 37221 KiB  
Article
Target Ship Recognition and Tracking with Data Fusion Based on Bi-YOLO and OC-SORT Algorithms for Enhancing Ship Navigation Assistance
by Shuai Chen, Miao Gao, Peiru Shi, Xi Zeng and Anmin Zhang
J. Mar. Sci. Eng. 2025, 13(2), 366; https://doi.org/10.3390/jmse13020366 - 16 Feb 2025
Cited by 1 | Viewed by 1558
Abstract
With the ever-increasing volume of maritime traffic, the risks of ship navigation are becoming more significant, making the use of advanced multi-source perception strategies and AI technologies indispensable for obtaining information about ship navigation status. In this paper, first, the ship tracking system was optimized using the Bi-YOLO network based on the C2f_BiFormer module and the OC-SORT algorithms. Second, to extract the visual trajectory of the target ship without a reference object, an absolute position estimation method based on binocular stereo vision attitude information was proposed. Then, a perception data fusion framework based on ship spatio-temporal trajectory features (ST-TF) was proposed to match GPS-based ship information with corresponding visual target information. Finally, AR technology was integrated to fuse multi-source perceptual information into the real-world navigation view. Experimental results demonstrate that the proposed method achieves a mAP0.5:0.95 of 79.6% under challenging scenarios such as low resolution, noise interference, and low-light conditions. Moreover, in the presence of the nonlinear motion of the own ship, the average relative position error of target ship visual measurements is maintained below 8%, achieving accurate absolute position estimation without reference objects. Compared to existing navigation assistance, the AR-based navigation assistance system, which utilizes ship ST-TF-based perception data fusion mechanism, enhances ship traffic situational awareness and provides reliable decision-making support to further ensure the safety of ship navigation. Full article

17 pages, 4402 KiB  
Article
Quality Evaluation for Colored Point Clouds Produced by Autonomous Vehicle Sensor Fusion Systems
by Colin Schaefer, Zeid Kootbally and Vinh Nguyen
Sensors 2025, 25(4), 1111; https://doi.org/10.3390/s25041111 - 12 Feb 2025
Cited by 1 | Viewed by 762
Abstract
Perception systems for autonomous vehicles (AVs) require various types of sensors, including light detection and ranging (LiDAR) and cameras, to ensure their robustness in driving scenarios and weather conditions. The data from these sensors are fused together to generate maps of the surrounding environment and provide information for the detection and tracking of objects. Hence, evaluation methods are necessary to compare existing and future sensor systems through quantifiable measurements given the wide range of sensor models and design choices. This paper presents an evaluation method to compare colored point clouds, a common fused data type, among two LiDAR–camera fusion systems and a stereo camera setup. The evaluation approach uses a test artifact measured by the fusion system’s colored point cloud through the spread, area coverage, and color difference of the colored points within the computed space. The test results showed the evaluation approach was able to rank the sensor fusion systems based on its metrics and complement the experimental observations. The proposed evaluation methodology is, therefore, suitable towards the comparison of generated colored point clouds by sensor fusion systems. Full article
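Of the three metrics named above, the color difference is the simplest to make concrete: compare the fused cloud's point colors against the test artifact's known color. A minimal stand-in for the paper's metric, using plain Euclidean RGB distance and made-up values:

```python
import numpy as np

def color_difference(colors, reference_rgb):
    """Mean Euclidean RGB distance between fused-cloud point colors
    (N x 3 array) and the artifact's known reference color."""
    diffs = np.linalg.norm(colors - np.asarray(reference_rgb, float), axis=1)
    return float(diffs.mean())

# Two points sampled from a nominally pure-red artifact patch.
colors = np.array([[250.0, 10.0, 10.0], [240.0, 20.0, 0.0]])
d = color_difference(colors, (255.0, 0.0, 0.0))
```

A perceptually uniform space such as CIELAB (ΔE) would be a more principled choice than raw RGB; either way, a lower score means the fusion system colored the artifact's points more faithfully.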

17 pages, 7941 KiB  
Article
Visual Localization Domain for Accurate V-SLAM from Stereo Cameras
by Eleonora Di Salvo, Sara Bellucci, Valeria Celidonio, Ilaria Rossini, Stefania Colonnese and Tiziana Cattai
Sensors 2025, 25(3), 739; https://doi.org/10.3390/s25030739 - 26 Jan 2025
Cited by 2 | Viewed by 1152
Abstract
Trajectory estimation from stereo image sequences remains a fundamental challenge in Visual Simultaneous Localization and Mapping (V-SLAM). To address this, we propose a novel approach that focuses on the identification and matching of keypoints within a transformed domain that emphasizes visually significant features. Specifically, we propose to perform V-SLAM in a VIsual Localization Domain (VILD), i.e., a domain where visually relevant features are suitably represented for analysis and tracking. This transformed domain adheres to information-theoretic principles, enabling a maximum likelihood estimation of rotation, translation, and scaling parameters by minimizing the distance between the coefficients of the observed image and those of a reference template. The transformed coefficients are obtained from the output of specialized Circular Harmonic Function (CHF) filters of varying orders. Leveraging this property, we employ a first-order approximation of the image-series representation, directly computing the first-order coefficients through the application of first-order CHF filters. The proposed VILD provides a theoretically grounded and visually relevant representation of the image. We utilize VILD for point matching and tracking across the stereo video sequence. The experimental results on real-world video datasets demonstrate that integrating visually-driven filtering significantly improves trajectory estimation accuracy compared to traditional tracking performed in the spatial domain. Full article
(This article belongs to the Special Issue Emerging Advances in Wireless Positioning and Location-Based Services)

18 pages, 12334 KiB  
Article
Canopy Height Integration for Precise Forest Aboveground Biomass Estimation in Natural Secondary Forests of Northeast China Using Gaofen-7 Stereo Satellite Data
by Caixia Liu, Huabing Huang, Zhiyu Zhang, Wenyi Fan and Di Wu
Remote Sens. 2025, 17(1), 47; https://doi.org/10.3390/rs17010047 - 27 Dec 2024
Cited by 1 | Viewed by 1126
Abstract
Accurate estimates of forest aboveground biomass (AGB) are necessary for the accurate tracking of forest carbon stock. Gaofen-7 (GF-7) is the first civilian sub-meter three-dimensional (3D) mapping satellite from China. It is equipped with a laser altimeter system and a dual-line array stereoscopic mapping camera, which enables it to synchronously generate full-waveform LiDAR data and stereoscopic images. The bulk of existing research has examined how accurate GF-7 is for topographic measurements of bare land or canopy height. The measurement of forest aboveground biomass has not received as much attention as it deserves. This study aimed to assess the GF-7 stereo imaging capability, displayed as topographic features for aboveground biomass estimation in forests. The aboveground biomass model was constructed using the random forest machine learning technique, which was accomplished by combining the use of in situ field measurements, pairs of GF-7 stereo images, and the corresponding generated canopy height model (CHM). Findings showed that the biomass estimation model had an accuracy of R2 = 0.76, RMSE = 7.94 t/ha, which was better than the inclusion of forest canopy height (R2 = 0.30, RMSE = 21.02 t/ha). These results show that GF-7 has considerable application potential in gathering large-scale high-precision forest aboveground biomass using a restricted amount of field data. Full article
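The accuracy figures quoted above (R² and RMSE in t/ha) are the standard regression metrics for AGB models. A small self-contained sketch of how they are computed, with toy numbers rather than the study's field data:

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error for an
    aboveground-biomass regression (values in t/ha)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 1 - ss_res / ss_tot, rmse

# Toy plot-level AGB values (t/ha): field measurements vs. model predictions.
r2, rmse = r2_rmse([50.0, 80.0, 110.0], [55.0, 75.0, 115.0])
```

In the study these metrics compare random-forest predictions (driven by GF-7 stereo features and the derived CHM) against the in situ plot measurements.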

19 pages, 13755 KiB  
Article
A Dynamic Measurement System Based on Adaptive Clustering and Multi-Classifier
by Bowen Shi, Hongjian You and Huixian Wang
Appl. Sci. 2025, 15(1), 81; https://doi.org/10.3390/app15010081 - 26 Dec 2024
Viewed by 636
Abstract
This technology is highly suitable for detecting and tracking multi-skeletal targets in the field of aerial refueling. The paper presents a target feature responder based on K-means clustering, used for categorizing samples and training models. It employs a decision function to optimize the localization region across various classifier detection and tracking algorithms for targets. Additionally, a state judgment module scores the classifiers based on target state and depth information, which allows for adaptive selection of the classifier. To address the issue of missing target information, the method incorporates a stereo-vision-based mechanism to complete the localization region. This approach effectively handles challenges related to target appearance deformation, significant scale variations, motion blur, and occlusions. Full article

29 pages, 34806 KiB  
Article
An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
by Ming Him Lui, Haixu Liu, Zhuochen Tang, Hang Yuan, David Williams, Dongjin Lee, K. C. Wong and Zihao Wang
Eng 2024, 5(4), 3488-3516; https://doi.org/10.3390/eng5040182 - 20 Dec 2024
Cited by 2 | Viewed by 2721
Abstract
This article presents a cost-effective camera network system that employs neural network-based object detection and stereo vision to assist a pan–tilt–zoom camera in imaging fast, erratically moving small aerial targets. Compared to traditional radar systems, this approach offers advantages in supporting real-time target differentiation and ease of deployment. Based on the principle of knowledge distillation, a novel data augmentation method is proposed to coordinate the latest open-source pre-trained large models in semantic segmentation, text generation, and image generation tasks to train a BicycleGAN for image enhancement. The resulting dataset is tested on various model structures and backbone sizes of two mainstream object detection frameworks, Ultralytics’ YOLO and MMDetection. Additionally, the algorithm implements and compares two popular object trackers, Bot-SORT and ByteTrack. The experimental proof-of-concept deploys the YOLOv8n model, which achieves an average precision of 82.2% and an inference time of 0.6 ms. Alternatively, the YOLO11x model maximises average precision at 86.7% while maintaining an inference time of 9.3 ms without bottlenecking subsequent processes. Stereo vision achieves accuracy within a median error of 90 mm following a drone flying over 1 m/s in an 8 m × 4 m area of interest. Stable single-object tracking with the PTZ camera is successful at 15 fps with an accuracy of 92.58%. Full article
(This article belongs to the Special Issue Feature Papers in Eng 2024)

17 pages, 4264 KiB  
Article
Interactive Modelling in Augmented Reality with Subdivision Surfaces and Advanced User Gesture Recognition
by Alessio Cellupica, Marco Cirelli, Oliviero Giannini and Pier Paolo Valentini
Appl. Sci. 2024, 14(24), 11873; https://doi.org/10.3390/app142411873 - 19 Dec 2024
Viewed by 1254
Abstract
The paper discusses an integrated methodology to implement an interactive augmented reality 3D modelling environment with natural interaction, empowered by real-time gesture recognition. The methodology is developed from a geometry-sculpting algorithm based on the use of the subdivision surfaces approach to combine the ease and versatility of interactive modelling even of complex shapes, while maintaining high geometric continuity and smoothness. The interaction with the deformable elements of the geometry’s control cage to be divided uses an optimised version of the Grasp Active Feature/Object Active Feature algorithm developed from hand tracking and gesture recognition based on zero-invasive stereo-infrared techniques. Modelling, combined with an augmented reality environment, allows the modification of geometries having real objects as a reference and, in any case, a general spatial awareness during activities. The methodology was implemented and tested using an advanced mixed-reality headset, the Varjo XR-4, with hi-resolution pass-through and a second-generation Ultraleap for accurate and precise hand tracking. Full article
