Search Results (139)

Search Parameters:
Keywords = robot semantic navigation

17 pages, 1881 KB  
Communication
HSG-ON: Hierarchical Scene Graph-Based Object Navigation
by Seokjoon Kwon, Hee-Deok Jang and Dong Eui Chang
Sensors 2026, 26(6), 1755; https://doi.org/10.3390/s26061755 - 10 Mar 2026
Viewed by 302
Abstract
For a robot to operate effectively in human-centric environments, finding objects based on natural language is essential. Zero-shot object goal navigation is a significant challenge where robots must find unseen objects in new environments without prior knowledge. Existing methods often struggle with strategic exploration, leading to inefficient searches. In this study, we propose a hierarchical scene graph-based navigation system to address this challenge. Our core innovations are twofold: dynamically constructing a three-layer “room–workspace–object” hierarchical scene graph without manually pre-tuned parameters, and introducing a novel workspace-based searching strategy. By evaluating semantic relevance at the workspace level rather than the object level, the robot infers probable containers for a target, enabling focused, human-like exploration. Simulation results demonstrate that our system significantly outperforms existing state-of-the-art methods. Quantitatively, our approach improves the Success Rate (SR) by 26.8% (SR 0.4859) under distance-constrained settings and by 20.2% (SR 0.7360) under unconstrained settings, compared to the best baselines. These results validate that our framework offers a robust solution for zero-shot object goal navigation. Full article
(This article belongs to the Section Sensors and Robotics)
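
The listing gives only the abstract, so as a rough illustration of the workspace-level search strategy it describes, here is a minimal Python sketch that ranks candidate workspaces by semantic relevance to the target object. The embedding model and all workspace labels are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of workspace-level target scoring (not the paper's code).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding backbone

def score_workspaces(target, workspaces):
    """Rank candidate workspaces by semantic relevance to the target object."""
    vecs = model.encode([target] + workspaces)
    t, ws = vecs[0], vecs[1:]
    sims = ws @ t / (np.linalg.norm(ws, axis=1) * np.linalg.norm(t))
    return sorted(zip(workspaces, sims), key=lambda p: -p[1])

# A robot searching for a "mug" should visit the kitchen counter before the sink.
print(score_workspaces("mug", ["kitchen counter", "bathroom sink", "office desk"]))
```

Scoring containers (workspaces) rather than individual objects is what lets the robot commit to a few promising regions instead of reacting to every detection.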

25 pages, 6915 KB  
Article
EXAONE-VLA: A Unified Vision–Language Framework for Mobile Manipulation via Semantic Topology and Hierarchical LLM Reasoning
by Jeong-Seop Park, Yong-Jun Lee, Jong-Chan Park, Sung-Gil Park, Jong-Jin Woo and Myo-Taeg Lim
Appl. Sci. 2026, 16(5), 2600; https://doi.org/10.3390/app16052600 - 9 Mar 2026
Viewed by 495
Abstract
This paper proposes a unified vision–language framework that translates user instructions into navigation for the mobile base and actions for the manipulator in indoor environments. In general, occupancy grid maps constructed via SLAM capture solely the geometric layout of the environment. This renders the robot incapable of leveraging the semantic information required for object distinction. The proposed method encodes semantic information from vision–language models and the robot’s pose in a textual format, referred to as a semantic topological graph. Specifically, the models including GroundingDINO, LG EXAONE, and SAM2 extract object-level semantic information, which is subsequently used to identify room characteristics. A large language model then interprets user instructions to identify the final destination for navigation within the semantic topological graph, followed by reasoning to determine the suitable action network. Notably, the proposed text-based representation facilitates a substantial reduction in inference time, and its effectiveness is validated through real-world experiments. Full article
(This article belongs to the Special Issue Deep Reinforcement Learning for Multiagent Systems)
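
As a sketch of what a text-encoded semantic topological graph might look like (node fields, formatting, and labels are invented here; the paper's encoding may differ), the following serializes graph nodes into a compact prompt an LLM can reason over:

```python
# Illustrative sketch of serializing a semantic topological graph to text.
from dataclasses import dataclass, field

@dataclass
class GraphNode:
    name: str                         # e.g., "node_3"
    room: str                         # room label inferred from object context
    objects: list[str]                # object-level semantics from the detectors
    pose: tuple[float, float, float]  # robot pose (x, y, yaw) at this node
    neighbors: list[str] = field(default_factory=list)

def to_prompt(nodes: list[GraphNode]) -> str:
    """Flatten the graph into a compact textual form an LLM can reason over."""
    lines = []
    for n in nodes:
        lines.append(
            f"{n.name} | room={n.room} | objects={','.join(n.objects)} "
            f"| pose=({n.pose[0]:.1f},{n.pose[1]:.1f},{n.pose[2]:.2f}) "
            f"| edges={','.join(n.neighbors)}"
        )
    return "\n".join(lines)

kitchen = GraphNode("node_0", "kitchen", ["fridge", "sink"], (1.0, 2.0, 0.0), ["node_1"])
print(to_prompt([kitchen]))
```

Keeping the representation textual is what allows a stock LLM to pick the destination node directly, which is also where the reported inference-time savings come from.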

25 pages, 15267 KB  
Article
3D Semantic Map Reconstruction for Orchard Environments Using Multi-Sensor Fusion
by Quanchao Wang, Yiheng Chen, Jiaxiang Li, Yongxing Chen and Hongjun Wang
Agriculture 2026, 16(4), 455; https://doi.org/10.3390/agriculture16040455 - 15 Feb 2026
Viewed by 615
Abstract
Semantic point cloud maps play a pivotal role in smart agriculture. They provide not only core three-dimensional data for orchard management but also empower robots with environmental perception, enabling safer and more efficient navigation and planning. However, traditional point cloud maps primarily model surrounding obstacles from a geometric perspective, failing to capture distinctions and characteristics between individual obstacles. In contrast, semantic maps encompass semantic information and even topological relationships among objects in the environment. Furthermore, existing semantic map construction methods are predominantly vision-based, making them ill-suited to handle rapid lighting changes in agricultural settings that can cause positioning failures. Therefore, this paper proposes a positioning and semantic map reconstruction method tailored for orchards. It integrates visual, LiDAR, and inertial sensors to obtain high-precision pose and point cloud maps. By combining open-vocabulary detection and semantic segmentation models, it projects two-dimensional detected semantic information onto the three-dimensional point cloud, ultimately generating a point cloud map enriched with semantic information. The resulting 2D occupancy grid map is utilized for robotic motion planning. Experimental results demonstrate that on a custom dataset, the proposed method achieves 74.33% mIoU for semantic segmentation accuracy, 12.4% relative error for fruit recall rate, and 0.038803 m mean translation error for localization. The deployed semantic segmentation network, Fast-SAM, processes each frame in 13.36 ms. These results demonstrate that the proposed method combines high accuracy with real-time performance in semantic map reconstruction. This exploratory work provides theoretical and technical references for future research on more precise localization and more complete semantic mapping, offering broad application prospects and providing key technological support for intelligent agriculture. Full article
(This article belongs to the Special Issue Advances in Robotic Systems for Precision Orchard Operations)
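
The 2D-to-3D projection step the abstract describes can be sketched with a standard pinhole model: project each point into the image and copy the segmentation label at that pixel. The interfaces below are assumed for illustration.

```python
# Minimal sketch: paint per-pixel class labels onto a point cloud by projecting
# camera-frame points through the pinhole intrinsics (assumed interfaces).
import numpy as np

def label_points(points_cam, seg_mask, K):
    """points_cam: (N,3) points in camera frame; seg_mask: (H,W) class ids;
    K: (3,3) intrinsics. Returns a class id per point (-1 if unobserved)."""
    labels = np.full(len(points_cam), -1, dtype=np.int32)
    valid = points_cam[:, 2] > 0           # keep points in front of the camera
    uvw = (K @ points_cam[valid].T).T      # project to homogeneous pixel coords
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = seg_mask.shape
    inside = (0 <= u) & (u < w) & (0 <= v) & (v < h)
    idx = np.flatnonzero(valid)[inside]
    labels[idx] = seg_mask[v[inside], u[inside]]
    return labels

pts = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, -1.0]])
K = np.array([[100.0, 0, 64], [0, 100.0, 64], [0, 0, 1]])
mask = np.full((128, 128), 7, dtype=np.int32)
print(label_points(pts, mask, K))  # [ 7 -1]: the behind-camera point is unobserved
```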

31 pages, 11832 KB  
Article
A Visual Navigation Path Extraction Method for Complex and Variable Agricultural Scenarios Based on AFU-Net and Key Contour Point Constraints
by Jin Lu, Zhao Wang, Jin Wang, Zhongji Cao, Jia Zhao and Minjie Zhang
Agriculture 2026, 16(3), 324; https://doi.org/10.3390/agriculture16030324 - 28 Jan 2026
Viewed by 339
Abstract
In intelligent unmanned agricultural machinery research, navigation line extraction in natural field/orchard environments is critical for autonomous operation. Existing methods still face two prominent challenges: (1) Dynamic shooting perspective shifts caused by natural environmental interference lead to geometric distortion of image features, making it difficult to acquire high-precision navigation features; (2) Symmetric distribution of crop row boundaries hinders traditional algorithms from accurately extracting effective navigation trajectories, resulting in insufficient accuracy and reliability. To address these issues, this paper proposes an environment-adaptive navigation path extraction method for multi-type agricultural scenarios, consisting of two core components: an Attention-Feature-Enhanced U-Net (AFU-Net) for semantic segmentation of navigation feature regions, and a key-point constraint-based adaptive navigation line extraction algorithm. AFU-Net improves the U-Net framework by embedding Efficient Channel Attention (ECA) modules at the ends of Encoders 1–3 to enhance feature expression, and replacing Encoder 4 with a cascaded Semantic Aware Multi-scale Enhancement (SAME) module. Trained and tested on both our KVW dataset and Yu’s field dataset, our method achieves outstanding performance: On the KVW dataset, AFU-Net attains a Mean Intersection over Union (MIoU) of 97.55% and a real-time inference speed of 32.60 FPS with only 3.95 M Params, outperforming state-of-the-art models. On Yu’s field dataset, it maintains an MIoU of 95.20% and 16.30 FPS. Additionally, compared with traditional navigation line extraction algorithms, the proposed adaptive algorithm reduces the mean absolute yaw angle error (mAYAE) to 2.06° in complex scenarios. This research exhibits strong adaptability and robustness, providing reliable technical support for the precise navigation of intelligent agricultural machinery across multiple agricultural scenarios. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
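
For reference, the Efficient Channel Attention (ECA) block that the abstract embeds after Encoders 1–3 is, in its standard ECA-Net form, a global average pool followed by a small 1D convolution across channels. A minimal PyTorch sketch follows; the paper's exact variant may differ.

```python
# Standard ECA-Net style channel attention (illustrative, not the authors' code).
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.conv = nn.Conv1d(1, 1, k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        y = self.pool(x)                                   # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # 1D conv across channels
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # (B, C, 1, 1) gates
        return x * y                                       # reweight feature maps

print(ECA()(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

The appeal of ECA in a 3.95 M-parameter model is that the block adds only a handful of weights while still letting the network emphasize informative channels.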

23 pages, 3037 KB  
Article
Depth Matters: Geometry-Aware RGB-D-Based Transformer-Enabled Deep Reinforcement Learning for Mapless Navigation
by Alpaslan Burak İnner and Mohammed E. Chachoua
Appl. Sci. 2026, 16(3), 1242; https://doi.org/10.3390/app16031242 - 26 Jan 2026
Cited by 1 | Viewed by 549
Abstract
Autonomous navigation in unknown environments demands policies that can jointly perceive semantic context and geometric safety. Existing Transformer-enabled deep reinforcement learning (DRL) frameworks, such as the Goal-guided Transformer Soft Actor–Critic (GoT-SAC), rely on temporal stacking of multiple RGB frames, which encodes short-term motion cues but lacks explicit spatial understanding. This study introduces a geometry-aware RGB-D early fusion modality that replaces temporal redundancy with cross-modal alignment between appearance and depth. Within the GoT-SAC framework, we integrate a pixel-aligned RGB-D input into the Transformer encoder, enabling the attention mechanism to simultaneously capture semantic textures and obstacle geometry. A comprehensive systematic ablation study was conducted across five modality variants (4RGB, RGB-D, G-D, 4G-D, and 4RGB-D) and three fusion strategies (early, parallel, and late) under identical hyperparameter settings in a controlled simulation environment. The proposed RGB-D early fusion achieved a 40.0% success rate and +94.1 average reward, surpassing the canonical 4RGB baseline (28.0% success, +35.2 reward), while a tuned configuration further improved performance to 54.0% success and +146.8 reward. These results establish early pixel-level multimodal fusion (RGB-D) as a principled and efficient successor to temporal stacking, yielding higher stability, sample efficiency, and geometry-aware decision-making. This work provides the first controlled evidence that spatially aligned multimodal fusion within Transformer-based DRL significantly enhances mapless navigation performance and offers a reproducible foundation for sim-to-real transfer in autonomous mobile robots. Full article
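
The early-fusion idea is simple to state in code: concatenate the depth map with RGB at the pixel level and let a single patch embedding feed the Transformer, so attention sees appearance and geometry jointly. A minimal sketch, with all dimensions illustrative rather than taken from GoT-SAC:

```python
# Sketch of RGB-D early fusion ahead of a Transformer encoder (assumed sizes).
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, d_model: int = 128, patch: int = 8):
        super().__init__()
        # One patch-embedding conv sees appearance and geometry together.
        self.embed = nn.Conv2d(4, d_model, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, depth], dim=1)                 # (B, 4, H, W): pixel-aligned
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, d_model)
        return self.encoder(tokens)

out = EarlyFusionEncoder()(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 64, 128])
```

Contrast this with the 4RGB baseline, which stacks four RGB frames along the channel axis: the channel budget there encodes time, whereas here it encodes geometry.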

60 pages, 3790 KB  
Review
Autonomous Mobile Robot Path Planning Techniques—A Review: Metaheuristic and Cognitive Techniques
by Mubarak Badamasi Aremu, Gamil Ahmed, Sami Elferik and Abdul-Wahid A. Saif
Robotics 2026, 15(1), 23; https://doi.org/10.3390/robotics15010023 - 14 Jan 2026
Cited by 2 | Viewed by 1462
Abstract
Autonomous mobile robots (AMRs) require robust, efficient path planning to operate safely in complex, often dynamic environments (e.g., logistics, transportation, and healthcare). This systematic review focuses on advanced metaheuristic and learning- and reasoning-based (cognitive) techniques for AMR path planning. Drawing on approximately 230 articles published between 2018 and 2025, we organize the literature into two prominent families, metaheuristic optimization and AI-based navigation, and introduce and apply a unified taxonomy (planning scope, output type, and constraint awareness) to guide the comparative analysis and practitioner-oriented synthesis. We synthesize representative approaches, including swarm- and evolutionary-based planners (e.g., PSO, GA, ACO, GWO), fuzzy and neuro-fuzzy systems, neural methods, and RL/DRL-based navigation, highlighting their operating principles, recent enhancements, strengths, and limitations, and typical deployment roles within hierarchical navigation stacks. Comparative tables and a compact trade-off synthesis summarize capabilities across static/dynamic settings, real-world validation, and hybridization trends. Persistent gaps remain in parameter tuning, safety, and interpretability of learning-enabled navigation; sim-to-real transfer; scalability under real-time compute limits; and limited physical experimentation. Finally, we outline research opportunities and open research questions, covering benchmarking and reproducibility, resource-aware planning, multi-robot coordination, 3D navigation, and emerging foundation models (LLMs/VLMs) for high-level semantic navigation. Collectively, this review provides a consolidated reference and practical guidance for future AMR path-planning research. Full article
(This article belongs to the Section Sensors and Control in Robotics)
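
To make the metaheuristic family concrete, here is a toy particle swarm optimization (PSO) loop for waypoint planning, with fitness defined as path length plus an obstacle-clearance penalty. All weights, the penalty, and the scenario are invented for illustration, not drawn from any surveyed paper.

```python
# Toy PSO over 2D waypoints (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
start, goal = np.array([0.0, 0.0]), np.array([10.0, 10.0])
obstacle, radius = np.array([5.0, 5.0]), 2.0
n_particles, n_way, iters = 30, 3, 200

def fitness(flat):
    pts = np.vstack([start, flat.reshape(n_way, 2), goal])
    length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    clearance = np.min(np.linalg.norm(pts - obstacle, axis=1))
    return length + 50.0 * max(0.0, radius - clearance)  # penalize intrusion

x = rng.uniform(0, 10, (n_particles, n_way * 2))
v = np.zeros_like(x)
pbest, pcost = x.copy(), np.array([fitness(p) for p in x])
gbest = pbest[np.argmin(pcost)].copy()

for _ in range(iters):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)  # inertia + pulls
    x = x + v
    cost = np.array([fitness(p) for p in x])
    better = cost < pcost
    pbest[better], pcost[better] = x[better], cost[better]
    gbest = pbest[np.argmin(pcost)].copy()

print("best path cost:", fitness(gbest))
```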

17 pages, 4323 KB  
Article
Render-Rank-Refine: Accurate 6D Indoor Localization via Circular Rendering
by Haya Monawwar and Guoliang Fan
J. Imaging 2026, 12(1), 10; https://doi.org/10.3390/jimaging12010010 - 25 Dec 2025
Viewed by 495
Abstract
Accurate six-degree-of-freedom (6-DoF) camera pose estimation is essential for augmented reality, robotics navigation, and indoor mapping. Existing pipelines often depend on detailed floorplans, strict Manhattan-world priors, and dense structural annotations, which lead to failures in ambiguous room layouts where multiple rooms appear in a query image and their boundaries may overlap or be partially occluded. We present Render-Rank-Refine, a two-stage framework operating on coarse semantic meshes without requiring textured models or per-scene fine-tuning. First, panoramas rendered from the mesh enable global retrieval of coarse pose hypotheses. Then, perspective views from the top-k candidates are compared to the query via rotation-invariant circular descriptors, which re-ranks the matches before final translation and rotation refinement. Our method increases camera localization accuracy compared to the state-of-the-art SPVLoc baseline by reducing the translation error by 40.4% and the rotation error by 29.7% in ambiguous layouts, as evaluated on the Zillow Indoor Dataset. In terms of inference throughput, our method achieves 25.8–26.4 queries per second (QPS), which is significantly faster than other recent comparable methods, while maintaining accuracy comparable to or better than the SPVLoc baseline. These results demonstrate robust, near-real-time indoor localization that overcomes structural ambiguities and heavy geometric assumptions. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
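
The key property behind the re-ranking step is that the magnitude of a discrete Fourier transform is invariant to circular shifts, so a descriptor built from features sampled on a ring around the image center does not change under in-plane rotation of the sampling. A sketch of one plausible construction (the paper's actual descriptor may differ):

```python
# Sketch of a rotation-invariant circular descriptor (assumed construction).
import numpy as np

def circular_descriptor(feat, n_angles=64, r_frac=0.4):
    """feat: (H, W) feature map. Sample a ring and return its FFT magnitude."""
    h, w = feat.shape
    cy, cx = h / 2.0, w / 2.0
    r = r_frac * min(h, w)
    ang = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    ys = np.clip((cy + r * np.sin(ang)).astype(int), 0, h - 1)
    xs = np.clip((cx + r * np.cos(ang)).astype(int), 0, w - 1)
    return np.abs(np.fft.fft(feat[ys, xs]))

# The invariance itself: rotating the view cyclically shifts the ring samples,
# and the FFT magnitude is unchanged under cyclic shifts.
ring = np.random.rand(64)
print(np.allclose(np.abs(np.fft.fft(ring)),
                  np.abs(np.fft.fft(np.roll(ring, 9)))))  # True
```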

5 pages, 180 KB  
Editorial
Advanced Autonomous Systems and the Artificial Intelligence Stage
by Liviu Marian Ungureanu and Iulian-Sorin Munteanu
Technologies 2026, 14(1), 9; https://doi.org/10.3390/technologies14010009 - 23 Dec 2025
Viewed by 626
Abstract
This Editorial presents an integrative overview of the Special Issue “Advanced Autonomous Systems and Artificial Intelligence Stage”, which assembles fifteen peer-reviewed articles dedicated to the recent evolution of AI-enabled and autonomous systems. The contributions span a broad spectrum of domains, including renewable energy and power systems, intelligent transportation, agricultural robotics, clinical and assistive technologies, mobile robotic platforms, and space robotics. Across these diverse applications, the collection highlights core research themes such as robust perception and navigation, semantic and multimodal sensing, resource-efficient embedded inference, human–machine interaction, sustainable infrastructures, and validation frameworks for safety-critical systems. Several articles demonstrate how physical modeling, hybrid control architectures, deep learning, and data-driven methods can be combined to enhance operational robustness, reliability, and autonomy in real-world environments. Other works address challenges related to fall detection, predictive maintenance, teleoperation safety, and the deployment of intelligent systems in large-scale or mission-critical contexts. Overall, this Special Issue offers a consolidated and rigorous academic synthesis of current advances in Autonomous Systems and Artificial Intelligence, providing researchers and practitioners with a valuable reference for understanding emerging trends, practical implementations, and future research directions. Full article
(This article belongs to the Special Issue Advanced Autonomous Systems and Artificial Intelligence Stage)
38 pages, 3484 KB  
Article
From Prompts to Paths: Large Language Models for Zero-Shot Planning in Unmanned Ground Vehicle Simulation
by Kelvin Olaiya, Giovanni Delnevo, Chan-Tong Lam, Giovanni Pau and Paola Salomoni
Drones 2025, 9(12), 875; https://doi.org/10.3390/drones9120875 - 18 Dec 2025
Viewed by 1469
Abstract
This paper explores the capability of Large Language Models (LLMs) to perform zero-shot planning through multimodal reasoning, with a particular emphasis on applications to Unmanned Ground Vehicles (UGVs) and unmanned platforms in general. We present a modular system architecture that integrates a general-purpose LLM with visual and spatial inputs for adaptive planning to iteratively guide UGV behavior. Although the framework is demonstrated in a ground-based setting, it directly extends to other unmanned systems, where semantic reasoning and adaptive planning are increasingly critical for autonomous mission execution. To assess performance, we employ a continuous evaluation metric that jointly considers distance and orientation, offering a more informative and fine-grained alternative to binary success measures. We evaluate a foundational LLM (i.e., Gemini 2.0 Flash, Google DeepMind) on a suite of zero-shot navigation and exploration tasks in simulated environments. Unlike prior LLM-robot systems that rely on fine-tuning or learned waypoint policies, we evaluate a purely zero-shot, stepwise LLM planner that receives no task demonstrations and reasons only from the sensed data. Our findings show that LLMs exhibit encouraging signs of goal-directed spatial planning and partial task completion, even in a zero-shot setting. However, inconsistencies in plan generation across models highlight the need for task-specific adaptation or fine-tuning. These findings highlight the potential of LLM-based multimodal reasoning to enhance autonomy in UGV and drone navigation, bridging high-level semantic understanding with robust spatial planning. Full article
(This article belongs to the Special Issue Advances in Guidance, Navigation, and Control)
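
The continuous metric mentioned in the abstract can be pictured as a blend of a distance term and an orientation term; the exact weighting and functional form below are assumptions, not the paper's definition.

```python
# Hedged sketch of a continuous navigation score over distance and heading.
import math

def nav_score(pos, goal, yaw, goal_yaw, d_max=5.0, w_d=0.7, w_o=0.3):
    """Returns a score in [0, 1]: 1 at the goal with the correct heading."""
    dist = math.hypot(goal[0] - pos[0], goal[1] - pos[1])
    d_term = max(0.0, 1.0 - dist / d_max)                        # linear distance falloff
    dyaw = (goal_yaw - yaw + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    o_term = 0.5 * (1.0 + math.cos(dyaw))                        # 1 aligned, 0 reversed
    return w_d * d_term + w_o * o_term

print(nav_score((4.0, 3.0), (4.5, 3.0), 0.1, 0.0))  # near goal, nearly aligned: ~0.93
```

A score like this rewards partial progress, which is exactly what a binary success flag hides when evaluating a stepwise zero-shot planner.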

25 pages, 3616 KB  
Article
A Deep Learning-Driven Semantic Mapping Strategy for Robotic Inspection of Desalination Facilities
by Albandari Alotaibi, Reem Alrashidi, Hanan Alatawi, Lamaa Duwayriat, Aseel Binnouh, Tareq Alhmiedat and Ahmad Al-Qerem
Machines 2025, 13(12), 1129; https://doi.org/10.3390/machines13121129 - 8 Dec 2025
Viewed by 646
Abstract
Autonomous robot navigation has become essential for reducing labor-intensive tasks. Current robot navigation systems rely on the sensed geometric structure of the environment, employing an array of sensor units such as laser scanners, range finders, and light detection and ranging (LiDAR) to obtain the environment layout. Scene understanding is an important task in the development of robots that need to act autonomously. Hence, this paper presents an efficient semantic mapping system that integrates LiDAR, RGB-D, and odometry data to generate precise and information-rich maps. The proposed system enables the automatic detection and labeling of critical infrastructure components, while preserving high spatial accuracy. As a case study, the system was applied to a desalination plant, where it interactively labeled key entities by integrating Simultaneous Localization and Mapping (SLAM) with vision-based techniques in order to determine the location of installed pipes. The developed system was validated using the Robot Operating System (ROS) and a two-wheel-drive robot platform. Several simulations and real-world experiments were conducted to validate the efficiency of the developed semantic mapping system. The obtained results are promising, as the developed semantic map generation system achieves an average object detection accuracy of 84.97% and an average localization error of 1.79 m. Full article
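
The core bookkeeping in such a pipeline, placing a detected and labeled object into the map frame using the SLAM pose, reduces to a 2D rigid transform. A minimal sketch with invented frames and names:

```python
# Illustrative sketch (assumed frames): attach a detection label to the map.
import math

def camera_to_map(obj_cam_xy, robot_pose):
    """obj_cam_xy: (forward, left) offset of the object in the robot frame.
    robot_pose: (x, y, yaw) in the map frame. Returns the object's map position."""
    ox, oy = obj_cam_xy
    x, y, yaw = robot_pose
    mx = x + ox * math.cos(yaw) - oy * math.sin(yaw)
    my = y + ox * math.sin(yaw) + oy * math.cos(yaw)
    return mx, my

semantic_map = []  # list of (label, x, y) entries layered over the metric map
semantic_map.append(("pipe", *camera_to_map((2.0, -0.5), (10.0, 4.0, math.pi / 2))))
print(semantic_map)  # [('pipe', 10.5, 6.0)]
```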

26 pages, 2310 KB  
Systematic Review
A Systematic Review of Intelligent Navigation in Smart Warehouses Using PRISMA: Integrating AI, SLAM, and Sensor Fusion for Mobile Robots
by Domagoj Zimmer, Mladen Jurišić, Ivan Plaščak, Željko Barač, Hrvoje Glavaš, Dorijan Radočaj and Robert Benković
Eng 2025, 6(12), 339; https://doi.org/10.3390/eng6120339 - 1 Dec 2025
Viewed by 1775
Abstract
This systematic review focuses on intelligent navigation as a core enabler of autonomy in smart warehouses, where mobile robots must dynamically perceive, reason, and act in complex, human-shared environments. By synthesizing advancements in AI-driven decision-making, SLAM, and multi-sensor fusion, the study highlights how intelligent navigation architectures reduce operational uncertainty and enhance task efficiency in logistics automation. Smart warehouses, powered by mobile robots and AGVs and integrated with AI and algorithms, are enabling more efficient storage with less human labour. This systematic review followed PRISMA 2020 guidelines to systematically identify, screen, and synthesize evidence from 106 peer-reviewed scientific articles (including primary studies, technical papers, and reviews) published between 2020 and 2025, sourced from Web of Science. Thematic synthesis was conducted across eight domains: AI, SLAM, sensor fusion, safety, network, path planning, implementation, and design. The transition to smart warehouses requires modern technologies to automate tasks and optimize resources. This article examines how intelligent systems can be integrated with mathematical models to improve navigation accuracy, reduce costs, and prioritize human safety. Real-time data management with precise information for AMRs and AGVs is crucial for low-risk operation. This article studies AI, the IoT, LiDAR, machine learning (ML), SLAM and other new technologies for the successful implementation of mobile robots in smart warehouses. Modern technologies such as reinforcement learning optimize the routes and tasks of mobile robots. Data and sensor fusion methods integrate information from various sources to provide a more precise understanding of the indoor environment and inventory. Semantic mapping enables mobile robots to navigate and interact with complex warehouse environments with high accuracy in real time. The article also analyses how virtual reality (VR) can improve the spatial orientation of mobile robots by developing sophisticated navigation solutions that reduce time and financial costs. Full article
(This article belongs to the Special Issue Interdisciplinary Insights in Engineering Research)

16 pages, 8229 KB  
Article
MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization
by Zhendong Xiao, Shan Yang, Shujie Ji, Jun Yin, Ziling Wen and Wu Wei
Appl. Sci. 2025, 15(23), 12642; https://doi.org/10.3390/app152312642 - 28 Nov 2025
Viewed by 686
Abstract
Camera relocalization, a cornerstone capability of modern computer vision, accurately determines a camera’s position and orientation from images and is essential for applications in augmented reality, mixed reality, autonomous driving, delivery drones, and robotic navigation. Unlike traditional deep learning-based methods, which regress camera pose from images of a single scene and lack generalization and robustness in diverse environments, we propose MVL-Loc, a novel end-to-end multi-scene six-degrees-of-freedom camera relocalization framework. MVL-Loc leverages pretrained world knowledge from vision-language models and incorporates multimodal data to generalize across both indoor and outdoor settings. Furthermore, natural language is employed as a directive tool to guide the multi-scene learning process, facilitating semantic understanding of complex scenes and capturing spatial relationships among objects. Extensive experiments on the 7Scenes and Cambridge Landmarks datasets demonstrate MVL-Loc’s robustness and state-of-the-art performance in real-world multi-scene camera relocalization, with improved accuracy in both positional and orientational estimates. Full article
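
The abstract does not specify MVL-Loc's training loss; a common choice in learned multi-scene relocalization is a joint position/orientation loss with learned homoscedastic weights (Kendall-style), sketched below purely to illustrate how both estimates are trained together.

```python
# Common pose-regression loss with learned balancing weights (illustrative;
# the abstract does not state that MVL-Loc uses exactly this formulation).
import torch
import torch.nn as nn

class PoseLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable log-variances balance translation vs. rotation terms.
        self.s_x = nn.Parameter(torch.tensor(0.0))
        self.s_q = nn.Parameter(torch.tensor(-3.0))

    def forward(self, t_pred, t_gt, q_pred, q_gt):
        l_t = (t_pred - t_gt).norm(dim=-1).mean()
        q_pred = q_pred / q_pred.norm(dim=-1, keepdim=True)  # unit quaternion
        l_q = (q_pred - q_gt).norm(dim=-1).mean()
        return (l_t * torch.exp(-self.s_x) + self.s_x
                + l_q * torch.exp(-self.s_q) + self.s_q)

loss = PoseLoss()(torch.randn(4, 3), torch.randn(4, 3),
                  torch.randn(4, 4), torch.randn(4, 4))
print(loss.item())
```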

29 pages, 5128 KB  
Article
ROVON: An Ontology for Supporting Interoperability for Underwater Robots
by Mansour Taheri Andani and Farhad Ameri
J. Mar. Sci. Eng. 2025, 13(12), 2227; https://doi.org/10.3390/jmse13122227 - 21 Nov 2025
Viewed by 614
Abstract
Underwater robotics produces diverse and complex streams of sensor, image, video, and navigational data under challenging environmental conditions, creating obstacles for seamless integration and interpretation. This paper introduces ROVON (Remotely Operated Vehicle Ontology), a semantic framework designed to enhance interoperability and reasoning in underwater operations. While ROVON is conceptually scalable to large, heterogeneous datasets, its validation in this study focuses on controlled underwater inspection data collected for pipeline applications. ROVON enables the representation and analysis of multimodal underwater data by semantically annotating raw sensor feeds, enforcing data integrity, and leveraging knowledge graphs to convert disparate inputs into actionable insights. The ontology demonstrates how a structured semantic approach facilitates advanced analysis that improves decision-making, supports proactive maintenance strategies, and enhances operational safety. The proposed framework was validated through a controlled pipeline inspection scenario. Full article
(This article belongs to the Special Issue Innovations in Underwater Robotic Software Systems)
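
To show the flavor of ontology-based annotation, here is an rdflib sketch that asserts a sonar reading about a pipeline segment as RDF triples. The namespace IRI and all class and property names are invented; the real ROVON vocabulary is defined in the paper.

```python
# Minimal rdflib sketch of semantic annotation (invented vocabulary).
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

ROV = Namespace("http://example.org/rovon#")  # placeholder IRI, not the real one
g = Graph()
g.bind("rovon", ROV)

reading = ROV["reading_001"]
g.add((reading, RDF.type, ROV.SonarReading))
g.add((reading, ROV.observedStructure, ROV["pipeline_segment_12"]))
g.add((reading, ROV.rangeMeters, Literal(3.7, datatype=XSD.double)))
g.add((reading, ROV.timestamp, Literal("2025-06-01T10:15:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```

Once raw sensor feeds are lifted into triples like these, integrity constraints and SPARQL queries over the knowledge graph can do the cross-sensor reasoning the abstract describes.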

21 pages, 8490 KB  
Article
BDGS-SLAM: A Probabilistic 3D Gaussian Splatting Framework for Robust SLAM in Dynamic Environments
by Tianyu Yang, Shuangfeng Wei, Jingxuan Nan, Mingyang Li and Mingrui Li
Sensors 2025, 25(21), 6641; https://doi.org/10.3390/s25216641 - 30 Oct 2025
Viewed by 2930
Abstract
Simultaneous Localization and Mapping (SLAM) utilizes sensor data to concurrently construct environmental maps and estimate its own position, finding wide application in scenarios like robotic navigation and augmented reality. SLAM systems based on 3D Gaussian Splatting (3DGS) have garnered significant attention due to their real-time, high-fidelity rendering capabilities. However, in real-world environments containing dynamic objects, existing 3DGS-SLAM methods often suffer from mapping errors and tracking drift due to dynamic interference. To address this challenge, this paper proposes BDGS-SLAM—a Bayesian Dynamic Gaussian Splatting SLAM framework specifically designed for dynamic environments. During the tracking phase, the system integrates semantic detection results from YOLOv5 to build a dynamic prior probability model based on Bayesian filtering, enabling accurate identification of dynamic Gaussians. In the mapping phase, a multi-view probabilistic update mechanism is employed, which aggregates historical observation information from co-visible keyframes. By introducing an exponential decay factor to dynamically adjust weights, this mechanism effectively restores static Gaussians that were mistakenly culled. Furthermore, an adaptive dynamic Gaussian optimization strategy is proposed. This strategy applies penalizing constraints to suppress the negative impact of dynamic Gaussians on rendering while avoiding the erroneous removal of static Gaussians and ensuring the integrity of critical scene information. Experimental results demonstrate that, compared to baseline methods, BDGS-SLAM achieves comparable tracking accuracy while generating fewer artifacts in rendered results and realizing higher-fidelity scene reconstruction. Full article
(This article belongs to the Special Issue Indoor Localization Technologies and Applications)
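
The Bayesian filtering idea in the tracking phase can be sketched as a per-Gaussian probability of being dynamic, raised by a Bayes step when the YOLOv5 detector marks its projection as dynamic and decayed toward static otherwise. The likelihoods and decay factor below are assumptions, not the paper's values.

```python
# Hedged sketch of a per-Gaussian dynamic-probability update (assumed numbers).
def update_dynamic_prob(p_dyn: float, detected_dynamic: bool,
                        p_hit: float = 0.9, p_miss: float = 0.2,
                        decay: float = 0.95) -> float:
    """One Bayes step on P(dynamic) given a semantic observation."""
    if detected_dynamic:
        num = p_hit * p_dyn
        den = num + p_miss * (1.0 - p_dyn)    # evidence under both hypotheses
        p_dyn = num / den
    else:
        p_dyn *= decay                        # exponential decay restores statics
    return min(max(p_dyn, 1e-3), 1.0 - 1e-3)  # keep probabilities away from 0/1

p = 0.5
for obs in [True, True, False, False, False]:
    p = update_dynamic_prob(p, obs)
    print(round(p, 3))  # rises with detections, relaxes when they stop
```

The decay branch is what implements the recovery behavior the abstract highlights: static Gaussians that were mistakenly flagged drift back below the culling threshold.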

20 pages, 5472 KB  
Article
Research on Indoor 3D Semantic Mapping Based on ORB-SLAM2 and Multi-Object Tracking
by Wei Wang, Ruoxi Wu, Yan Dong and Huilin Jiang
Appl. Sci. 2025, 15(20), 10881; https://doi.org/10.3390/app152010881 - 10 Oct 2025
Cited by 2 | Viewed by 1564
Abstract
The integration of semantic simultaneous localization and mapping (SLAM) with 3D object detection in indoor scenes is a significant challenge in the field of robot perception. Existing methods typically rely on expensive sensors and lack robustness and accuracy in complex environments. To address these issues, this paper proposes a novel 3D semantic SLAM framework that integrates Oriented FAST and Rotated BRIEF-SLAM2 (ORB-SLAM2), 3D object detection, and multi-object tracking (MOT) techniques to achieve efficient and robust semantic environment modeling. Specifically, we employ an improved 3D object detection network to extract semantic information and enhance detection accuracy through category balancing strategies and optimized loss functions. Additionally, we introduce MOT algorithms to filter and track 3D bounding boxes, enhancing stability in dynamic scenes. Finally, we deeply integrate 3D semantic information into the SLAM system, achieving high-precision 3D semantic map construction. Experiments were conducted on the public SUN RGB-D dataset and two self-collected datasets (robot navigation and XR glasses scenes). The results show that, compared with the current state-of-the-art methods, our method demonstrates significant advantages in detection accuracy, localization accuracy, and system robustness, providing an effective solution for low-cost, high-precision indoor semantic SLAM. Full article
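
The MOT filtering step can be illustrated with a greedy 3D IoU association between existing tracks and new detections (axis-aligned boxes for brevity; the paper's tracker and thresholds may differ):

```python
# Greedy 3D IoU association sketch (illustrative, not the authors' tracker).
import numpy as np

def iou_3d(a, b):
    """a, b: (x1, y1, z1, x2, y2, z2) axis-aligned boxes."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol = lambda box: np.prod(box[3:] - box[:3])
    return inter / (vol(a) + vol(b) - inter + 1e-9)

def associate(tracks, detections, thresh=0.3):
    """Greedily match detections to existing tracks by 3D IoU."""
    pairs, used = [], set()
    for ti, t in enumerate(tracks):
        scores = [(iou_3d(t, d), di) for di, d in enumerate(detections) if di not in used]
        if scores:
            best, di = max(scores)
            if best >= thresh:
                pairs.append((ti, di))
                used.add(di)
    return pairs

tracks = [np.array([0, 0, 0, 1, 1, 1.0])]
dets = [np.array([0.1, 0, 0, 1.1, 1, 1.0]), np.array([5, 5, 5, 6, 6, 6.0])]
print(associate(tracks, dets))  # [(0, 0)]: the far box starts a new track instead
```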