Topic Editors

Dr. Yugang Liu
Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston, ON K7K 7B4, Canada
Prof. Dr. Sidney Givigi
School of Computing, Queen’s University, 557 Goodwin Hall, Kingston, ON K7L 2N8, Canada

Advances in Robot Vision Perception and Control Technology

Abstract submission deadline
31 January 2027
Manuscript submission deadline
30 April 2027
Viewed by
8800

Topic Information

Dear Colleagues,

Over the past few decades, the robotics industry has grown rapidly, and the robotics market, the autonomous robot market in particular, is expected to keep expanding at a remarkable pace. Unlike traditional robotic manipulators, which perform labor-intensive tasks in structured factory settings, modern robots are required to work alongside human beings. To succeed in these uncontrolled settings, they must be able to understand the surrounding environment and control their actions without continuous human intervention. In other words, robotic perception and control play a vital role for autonomous robots in unstructured human environments. Much as eyes do for humans, cameras provide a robot with abundant information, allowing it to understand its location, detect obstacles, and find objects of interest. While promising, robot vision perception and control remain underexplored, and numerous technical challenges have yet to be addressed. This Topic provides researchers with a platform to share their insights on the theoretical analysis of robot vision and its application in practical experiments. Topics of interest include, but are not limited to:

  • Visual SLAM
  • Visual odometry
  • Visual servoing
  • Visual tracking
  • Vision-based object detection
  • Machine learning techniques with application to robot vision
  • Vision-based obstacle avoidance
  • Vision-based robotic manipulation
  • Computer vision with application to robotics

Dr. Yugang Liu
Prof. Dr. Sidney Givigi
Topic Editors

Keywords

  • visual SLAM
  • visual odometry
  • visual servoing
  • visual tracking
  • vision-based object detection

Participating Journals

Journal Name      Impact Factor  CiteScore  Launched Year  First Decision (median)  APC
AI                5.0            6.9        2020           19.2 Days                CHF 1800
Applied Sciences  2.5            5.5        2011           16 Days                  CHF 2400
Electronics       2.6            6.1        2012           16.4 Days                CHF 2400
Machines          2.5            4.7        2013           17.6 Days                CHF 2400
Robotics          3.3            7.7        2012           23.7 Days                CHF 1800

Preprints.org is a multidisciplinary platform offering a preprint service designed to facilitate the early sharing of your research. It supports and empowers your research journey from the very beginning.

MDPI Topics is collaborating with Preprints.org and has established a direct connection between MDPI journals and the platform. Authors are encouraged to take advantage of this opportunity by posting their preprints at Preprints.org prior to publication:

  1. Share your research immediately: Disseminate your ideas prior to publication and establish priority for your work.
  2. Safeguard your intellectual contribution: Protect your ideas with a time-stamped preprint that serves as proof of your research timeline.
  3. Boost visibility and impact: Increase the reach and influence of your research by making it accessible to a global audience.
  4. Gain early feedback: Receive valuable input and insights from peers before submitting to a journal.
  5. Ensure broad indexing: Preprints are indexed by Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (10 papers)

45 pages, 46439 KB  
Review
Review of Humanoid Robotic Astronauts for Space Missions
by Liping Fang, Jun Zhang, Liang Tang and Quan Hu
Appl. Sci. 2026, 16(10), 5032; https://doi.org/10.3390/app16105032 - 18 May 2026
Viewed by 265
Abstract
As human space missions become longer and more autonomous, robots are expected to assume broader responsibilities in inspection, maintenance, logistics, scientific support, and crew assistance. Among available robot forms, humanoid robotic astronauts are especially relevant because their anthropomorphic embodiment is compatible with human-centered habitats, tools, interfaces, and procedures. Their deployment in orbital and planetary environments, however, introduces challenges that differ from those of terrestrial humanoids, including floating-base dynamics, intermittent contact, whole-body coordination, constrained perception, and delayed supervision. This review contributes a mission-oriented and astronaut-centered synthesis of humanoid robotic astronauts, distinguishing itself from platform-by-platform or morphology-only surveys. It treats these systems as mission-compatible embodied agents whose feasibility depends on the coupling among mission context, morphology, contact behavior, perception, autonomy, and validation evidence. The primary goals are threefold: to classify representative platforms according to mission context, to synthesize the core technical foundations required for mission-compatible operation, and to identify cross-cutting deployment bottlenecks and benchmarking priorities for future development. Representative systems are organized into intravehicular assistance, extravehicular operations and on-orbit servicing, and surface exploration or transitional scenarios, showing how mission demands shape embodiment, mobility, manipulation, autonomy, and validation strategies. This review further summarizes recent progress in microgravity dynamics and contact mechanics, multimodal perception and scene understanding, whole-body motion planning and control, teleoperation and supervised autonomy, and evaluation and benchmarking methods. 
The analysis indicates that humanoid robotic astronauts are not simple extensions of terrestrial humanoids but astronaut-oriented embodied systems for mission-constrained environments. Three priorities are identified for future development: contact-rich whole-body intelligence under support transitions, delay-tolerant supervised autonomy with explicit authority handoff, and systematic benchmarking pipelines that connect simulation, ground analogs, short-duration microgravity tests, human-in-the-loop trials, and mission-context demonstrations.

14 pages, 8630 KB  
Article
Targetless Multi-LiDAR Extrinsic Calibration via Structural Planar Features and Globally Consistent Pose Graph Optimization
by Xuan Ren, Liang Gong and Chengliang Liu
Electronics 2026, 15(10), 2122; https://doi.org/10.3390/electronics15102122 - 15 May 2026
Viewed by 137
Abstract
Accurate extrinsic calibration among multiple heterogeneous Light Detection and Ranging (LiDAR) sensors is essential for autonomous vehicle perception systems, yet remains challenging in distributed topologies where overlap exists only between adjacent sensor pairs. Existing methods often assume a central LiDAR with direct field-of-view overlap to all others and suffer from error accumulation in sequential pairwise registration. This paper presents a targetless, motionless multi-LiDAR extrinsic calibration framework that is topology-agnostic and resolves error accumulation through global optimization. The method integrates (1) Random Sample Consensus (RANSAC)-based planar patch extraction with a dual-criterion normal-guided matching strategy, (2) robust coarse alignment via TEASER++, and (3) pose graph optimization with analytically derived edge weights from Generalized Iterative Closest Point (GICP) covariance matrices. The use of structural planar primitives rather than local point descriptors overcomes density-dependent matching failures inherent to heterogeneous sensor pairs, while global pose graph optimization eliminates the cumulative error propagation of sequential pairwise approaches. Validation is performed on three distinct real-world configurations: a six-LiDAR autonomous port truck (ring topology), the four-LiDAR EDGAR research vehicle (distributed topology), and a three-LiDAR benchmark from the OpenCalib toolbox. The proposed method consistently outperforms state-of-the-art baselines, achieving 0.021 m translation Root Mean Square Error (RMSE) and 0.36° rotation RMSE on the port dataset, with full calibration completed in under 2 s on CPU—enabling rapid in-situ recalibration without requiring dedicated facilities or vehicle motion. Full article

35 pages, 5718 KB  
Article
A Vision-Guided Active Crack Alignment Framework for Small-Diameter Pipe Inspection Robots
by Yujie Shi, Masato Mizukami, Naohiko Hanajima and Yoshinori Fujihira
Machines 2026, 14(5), 516; https://doi.org/10.3390/machines14050516 - 7 May 2026
Viewed by 236
Abstract
Inspection inside small-diameter pipelines is difficult because the narrow interior space limits the field of view of onboard cameras. Even when a crack is successfully detected, it may still appear near the image boundary rather than in a suitable position for observation. To address this issue, this study proposes a vision-guided active crack alignment framework for small-diameter pipe inspection robots. The proposed framework uses a YOLOv5s detector to identify the crack region and extract the center of the detected bounding box. The positional difference between the crack center and the image center is defined as the image-plane alignment error. After low-pass filtering, this error is converted into actuator-side reference input through a pixel-to-motor mapping, and a PID-based closed-loop controller is used to regulate a local camera adjustment mechanism so that the detected crack region moves toward the image center. The framework is evaluated mainly through simulation, including controller comparison, different initial offset conditions, parameter sensitivity analysis, robustness tests under visual fluctuation and mapping uncertainty, and an ablation study. The controller comparison shows that all tested PID-based controllers achieve stable convergence, while the fuzzy PID controller provides the best overall performance among the tested cases in terms of settling time, steady-state error, and RMS error. The framework also remains stable under different crack positions and moderate uncertainty conditions. In addition, a preliminary laboratory-scale physical consistency test is conducted to examine whether the convergence tendency observed in simulation can also be reproduced under real visual feedback and actuator response. The preliminary physical results show a convergence tendency consistent with the simulation trend, thereby providing initial support for the practical implementability of the proposed detection-driven alignment concept. 
Complete integration with an in-pipe robot platform and validation under realistic pipe environments remain future work.

62 pages, 10380 KB  
Review
Semantic SLAM with Multi-Modal Perception: Survey on Robust Long-Term Localization for Autonomous Vehicles
by Álvaro Navarro-Pérez, Bladimir Bacca-Cortés and Eduardo Caicedo-Bravo
Robotics 2026, 15(5), 88; https://doi.org/10.3390/robotics15050088 - 28 Apr 2026
Viewed by 1007
Abstract
Long-term localization in dynamic and changing environments remains a key challenge for autonomous vehicles. Semantic Simultaneous Localization and Mapping (SLAM) enhances traditional SLAM by integrating high-level semantic understanding, enabling robust mapping and localization even under complex scenarios. In this context, multi-modal sensor fusion—particularly the combination of LiDAR and camera data—has proven essential in leveraging complementary strengths: the geometric accuracy of LiDAR and the rich semantic cues from images. A significant advancement in this domain is the adoption of graph-based semantic localization frameworks, where semantic entities and spatial relationships are encoded in graph structures to improve map consistency, loop closure detection, and data association over time. This review presents a comprehensive survey of recent developments in Semantic SLAM, with a focus on long-term localization for autonomous vehicles using multi-modal fusion strategies. We categorize existing methods into traditional SLAM, vision-based, point-cloud-based, and graph-based techniques, emphasizing the role of semantic data association and loop closure in maintaining long-term consistency. Additionally, we discuss the integration of deep learning techniques for semantic segmentation and feature extraction. Finally, we analyze widely used datasets and evaluation metrics, identifying current limitations and proposing directions for future research on robust, scalable, and semantically enriched localization. Full article

29 pages, 2959 KB  
Article
A Diffusion-Augmented GWO-TCN-PSA Method for Real-Time Inverse Kinematics in Robotic Manipulator Applications
by Baiyang Wang, Xiangxiao Zeng, Ming Fang, Fang Li and Hongjun Wang
Electronics 2026, 15(8), 1688; https://doi.org/10.3390/electronics15081688 - 16 Apr 2026
Viewed by 330
Abstract
This paper presents an efficient inverse kinematics (IK) solution for robotic manipulators, addressing the challenges of high computational complexity, low efficiency, and sensitivity to singularities associated with traditional methods. A data augmentation strategy is introduced, utilizing an enhanced Diffusion-TS model to generate diverse joint-angle samples and corresponding end-effector poses through forward kinematics, thereby creating a high-quality dataset. To improve real-time performance, a Temporal Convolutional Network (TCN) model is developed, optimized using the Grey Wolf Optimizer (GWO), and augmented with a probabilistic sparse attention mechanism to effectively capture key pose features. Experimental evaluations on the Jaka MiniCobo robotic arm demonstrate that the proposed method significantly reduces inference time while maintaining high accuracy, making it suitable for real-world applications that demand both speed and precision. Full article

32 pages, 1329 KB  
Review
Deep Learning-Based Gaze Estimation: A Review
by Ahmed A. Abdelrahman, Basheer Al-Tawil and Ayoub Al-Hamadi
Robotics 2026, 15(4), 69; https://doi.org/10.3390/robotics15040069 - 25 Mar 2026
Viewed by 1570
Abstract
Gaze estimation, a critical facet of understanding user intent and enhancing human–computer interaction, has seen substantial advancements with the integration of deep learning technologies. Despite the progress, the application of deep learning in gaze estimation presents unique challenges, notably in the adaptation and optimization of these models for precise gaze tracking. This paper conducts a thorough review of recent developments in deep learning-based gaze estimation, with a particular focus on the evolution from traditional methods to sophisticated appearance-based techniques. We examine the key components of successful gaze estimation systems, including input feature processing, neural network architectures, and the importance of data preprocessing in achieving high accuracy. Our analysis extends to a comprehensive comparison of existing methods, shedding light on their effectiveness and limitations within various implementation contexts. Through this systematic review, we aim to consolidate existing knowledge in the field, identify gaps in current research, and suggest directions for future investigation. By providing a clear overview of the state-of-the-art in gaze estimation and discussing ongoing challenges and potential solutions, our work seeks to inspire further innovation and progress in developing more accurate and efficient gaze estimation systems. Full article

42 pages, 16954 KB  
Article
Energy-Efficient Motion Planning for Repetitive Industrial Tasks: An Adaptive Obstacle Modeling Approach
by Zhitao Yang and Likun Hu
Appl. Sci. 2026, 16(6), 2842; https://doi.org/10.3390/app16062842 - 16 Mar 2026
Viewed by 494
Abstract
Efficient operation of robotic manipulators in repetitive industrial tasks, such as welding and logistics sorting, requires careful coordination of obstacle representation and motion planning. Traditional methods, such as axis-aligned bounding boxes, generate overly conservative trajectories, while highly detailed models impose excessive computational burden, both increasing cumulative energy consumption in long-duration operations. This paper presents an adaptive sphere-based obstacle modeling framework integrated with energy-aware motion planning for repetitive manipulation tasks. The proposed method employs an improved Whale Optimization Algorithm with nonlinear parameter adjustment and elite guidance mechanisms to generate compact sphere representations through adaptive voxelization. Experimental validation using a 6-DOF UR5 manipulator demonstrates substantial performance improvements over conventional AABB models, achieving 31–66% energy reduction and 12.5–37% shorter configuration-space paths, with competitive modeling efficiency (2.63–3.34 s) compared to 11 metaheuristic algorithms. The framework provides a systematic methodology for integrating obstacle modeling with motion planning, particularly suitable for applications where cumulative energy savings are critical in repetitive operations. Full article

25 pages, 3703 KB  
Article
An RBF-L1-WBC Approach for Bipedal Wheeled Robots
by Renyi Zhou, Yisheng Guan, Xiaoqun Chen, Haobin Zhu, Qianwen Cao, Guangcai Ma, Tie Zhang and Shouyan Chen
Machines 2026, 14(2), 229; https://doi.org/10.3390/machines14020229 - 15 Feb 2026
Viewed by 743
Abstract
Bipedal wheeled robots combine the advantages of wheeled mobility and legged agility, enabling high-speed locomotion and obstacle negotiation in complex environments. However, their dynamic behavior is inherently unstable and highly coupled, making robust control particularly challenging in the presence of task conflicts, external disturbances, and modeling uncertainties. This paper proposes an RBF–L1–WBC framework that integrates L1 adaptive control to compensate for model inaccuracies and disturbances, radial basis function (RBF) neural networks to approximate nonlinear variations in linear quadratic regulator (LQR) gains, and whole-body control (WBC) to coordinate multiple tasks while mitigating control conflicts. Experimental findings confirm that the proposed methodology yields statistically significant improvements in both attitude regulation precision and velocity tracking accuracy, surpassing the performance of benchmark controllers including classical LQR, adaptive LQR, and classical Virtual Model Control (VMC). Full article

22 pages, 18926 KB  
Article
Fixed-Time and Prescribed-Time Image-Based Visual Servoing with Asymmetric Time-Varying Output Constraint
by Jianfei Lin, Lei Ma, Deqing Huang, Na Qin, Yilin Chen, Yutao Wang and Dongrui Wang
Robotics 2025, 14(12), 190; https://doi.org/10.3390/robotics14120190 - 16 Dec 2025
Cited by 1 | Viewed by 879
Abstract
This paper addresses image-based visual servoing with the field-of-view limitation of the camera. A novel control method is proposed with dual constraints based on fixed-time and prescribed-time convergence. With the introduction of a prescribed-time performance function and an asymmetric barrier Lyapunov function, asymmetric time-varying output constraints are achieved. This ensures that the image features remain within the predefined range, thereby addressing the field-of-view constraint problem in visual servoing applications. The combination of the prescribed-time performance function and the fixed-time stability theory ensures that the tracking error converges to a predetermined range within a prescribed time. Furthermore, it can converge to zero in fixed time, thus significantly improving the error convergence rates. The effectiveness and superiority of the method are demonstrated through physical experiments. Moreover, a case study of a contact network component bolt alignment task, aiming at automatically aligning a sleeve to a bolt, is carried out to demonstrate the applicability of the proposed method in practice. Full article

43 pages, 1528 KB  
Article
Adaptive Sign Language Recognition for Deaf Users: Integrating Markov Chains with Niching Genetic Algorithm
by Muslem Al-Saidi, Áron Ballagi, Oday Ali Hassen and Saad M. Darwish
AI 2025, 6(8), 189; https://doi.org/10.3390/ai6080189 - 15 Aug 2025
Viewed by 1965
Abstract
Sign language recognition (SLR) plays a crucial role in bridging the communication gap between deaf individuals and the hearing population. However, achieving subject-independent SLR remains a significant challenge due to variations in signing styles, hand shapes, and movement patterns among users. Traditional Markov Chain-based models struggle with generalizing across different signers, often leading to reduced recognition accuracy and increased uncertainty. These limitations arise from the inability of conventional models to effectively capture diverse gesture dynamics while maintaining robustness to inter-user variability. To address these challenges, this study proposes an adaptive SLR framework that integrates Markov Chains with a Niching Genetic Algorithm (NGA). The NGA optimizes the transition probabilities and structural parameters of the Markov Chain model, enabling it to learn diverse signing patterns while avoiding premature convergence to suboptimal solutions. In the proposed SLR framework, GA is employed to determine the optimal transition probabilities for the Markov Chain components operating across multiple signing contexts. To enhance the diversity of the initial population and improve the model’s adaptability to signer variations, a niche model is integrated using a Context-Based Clearing (CBC) technique. This approach mitigates premature convergence by promoting genetic diversity, ensuring that the population maintains a wide range of potential solutions. By minimizing gene association within chromosomes, the CBC technique enhances the model’s ability to learn diverse gesture transitions and movement dynamics across different users. This optimization process enables the Markov Chain to better generalize subject-independent sign language recognition, leading to improved classification accuracy, robustness against signer variability, and reduced misclassification rates. 
Experimental evaluations demonstrate a significant improvement in recognition performance, reduced error rates, and enhanced generalization across unseen signers, validating the effectiveness of the proposed approach.