Search Results (895)

Search Parameters:
Keywords = indoor mapping

22 pages, 4827 KiB  
Article
Development of a Multifunctional Mobile Manipulation Robot Based on Hierarchical Motion Planning Strategy and Hybrid Grasping
by Yuning Cao, Xianli Wang, Zehao Wu and Qingsong Xu
Robotics 2025, 14(7), 96; https://doi.org/10.3390/robotics14070096 - 15 Jul 2025
Abstract
A mobile manipulation robot combines the navigation capability of unmanned ground vehicles and manipulation advantage of robotic arms. However, the development of a mobile manipulation robot is challenging due to the integration requirement of numerous heterogeneous subsystems. In this paper, we propose a multifunctional mobile manipulation robot by integrating perception, mapping, navigation, object detection, and grasping functions into a seamless workflow to conduct search-and-fetch tasks. To realize navigation and collision avoidance in complex environments, a new hierarchical motion planning strategy is proposed by fusing global and local planners. Control Lyapunov Function (CLF) and Control Barrier Function (CBF) are employed to realize path tracking and to guarantee safety during navigation. The convolutional neural network and the gripper’s kinematic constraints are adopted to construct a learning-optimization hybrid grasping algorithm to generate precise grasping poses. The efficiency of the developed mobile manipulation robot is demonstrated by performing indoor fetching experiments, showcasing its promising capabilities in real-world applications. Full article
(This article belongs to the Section Sensors and Control in Robotics)
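The CLF/CBF safety layer above is not spelled out in the abstract; as an illustration only, a one-dimensional control barrier function reduces to clamping the approach speed toward an obstacle (the function name and the scalar setting are assumptions, not the authors' formulation):

```python
def cbf_safe_speed(v_nominal, dist, r_safe, alpha=1.0):
    """Clamp the speed toward an obstacle so the barrier h = dist - r_safe
    satisfies the CBF condition dh/dt + alpha * h >= 0.
    With dh/dt = -v, the approach speed is bounded by alpha * (dist - r_safe).
    """
    v_max = alpha * (dist - r_safe)
    return min(v_nominal, max(v_max, 0.0))
```

Far from the safety radius the nominal command passes through unchanged; as the robot nears the obstacle the admissible speed decays to zero, which is the qualitative behavior a CBF-based filter guarantees.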

27 pages, 1533 KiB  
Article
Sound Source Localization Using Hybrid Convolutional Recurrent Neural Networks in Undesirable Conditions
by Bastian Estay Zamorano, Ali Dehghan Firoozabadi, Alessio Brutti, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar A. Azurdia-Meza
Electronics 2025, 14(14), 2778; https://doi.org/10.3390/electronics14142778 - 10 Jul 2025
Abstract
Sound event localization and detection (SELD) is a fundamental task in spatial audio processing that involves identifying both the type and location of sound events in acoustic scenes. Current SELD models often struggle with low signal-to-noise ratios (SNRs) and high reverberation. This article addresses SELD by reformulating direction of arrival (DOA) estimation as a multi-class classification task, leveraging deep convolutional recurrent neural networks (CRNNs). We propose and evaluate two modified architectures: M-DOAnet, an optimized version of DOAnet for localization and tracking, and M-SELDnet, a modified version of SELDnet, which has been designed for joint SELD. Both modified models were rigorously evaluated on the STARSS23 dataset, which comprises 13-class, real-world indoor scenes totaling over 7 h of audio, using spectrograms and acoustic intensity maps from first-order Ambisonics (FOA) signals. M-DOAnet achieved exceptional localization (6.00° DOA error, 72.8% F1-score) and perfect tracking (100% MOTA with zero identity switches). It also demonstrated high computational efficiency, training in 4.5 h (164 s/epoch). In contrast, M-SELDnet delivered strong overall SELD performance (0.32 rad DOA error, 0.75 F1-score, 0.38 error rate, 0.20 SELD score), but with significantly higher resource demands, training in 45 h (1620 s/epoch). Our findings underscore a clear trade-off between model specialization and multifunctionality, providing practical insights for designing SELD systems in real-time and computationally constrained environments. Full article
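Recasting DOA estimation as multi-class classification, as described above, amounts to binning the azimuth circle; a minimal sketch of the idea (the 36-class resolution is an assumed value, not taken from the paper):

```python
def doa_to_class(azimuth_deg, n_classes=36):
    """Map a continuous azimuth (degrees) to one of n_classes angular bins."""
    width = 360.0 / n_classes
    return int((azimuth_deg % 360.0) // width)

def class_to_doa(idx, n_classes=36):
    """Recover the bin-center azimuth for a predicted class index."""
    width = 360.0 / n_classes
    return idx * width + width / 2.0
```

The network then outputs a distribution over the bins, and the bin center serves as the DOA estimate, trading angular resolution for a better-conditioned learning problem.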

29 pages, 8640 KiB  
Article
A Multi-Objective Optimization and Decision Support Framework for Natural Daylight and Building Areas in Community Elderly Care Facilities in Land-Scarce Cities
by Fang Wen, Lu Zhang, Ling Jiang, Wenqi Sun, Tong Jin and Bo Zhang
ISPRS Int. J. Geo-Inf. 2025, 14(7), 272; https://doi.org/10.3390/ijgi14070272 - 10 Jul 2025
Abstract
With the rapid advancement of urbanization in China, the demand for community-based elderly care facilities (CECFs) has been increasing. One pressing challenge is the question of how to provide CECFs that not only meet the health needs of the elderly but also make efficient use of limited urban land resources. This study addresses this issue by adopting an integrated multi-method research framework that combines multi-objective optimization (MOO) algorithms, Spearman rank correlation analysis, ensemble learning methods (Random Forest combined with SHapley Additive exPlanations (SHAP), where SHAP enhances the interpretability of ensemble models), and Self-Organizing Map (SOM) neural networks. This framework is employed to identify optimal building configurations and to examine how different architectural parameters influence key daylight performance indicators—Useful Daylight Illuminance (UDI) and Daylight Factor (DF). Results indicate that when UDI and DF meet the comfort thresholds for elderly users, the minimum building area can be controlled to as little as 351 m2 and can achieve a balance between natural lighting and spatial efficiency. This ensures sufficient indoor daylight while mitigating excessive glare that could impair elderly vision. Significant correlations are observed between spatial form and daylight performance, with factors such as window-to-wall ratio (WWR) and wall thickness (WT) playing crucial roles. Specifically, wall thickness affects indoor daylight distribution by altering window depth and shading. Moreover, the ensemble learning models combined with SHAP analysis uncover nonlinear relationships between various architectural parameters and daylight performance. In addition, a decision support method based on SOM is proposed to replace the subjective decision-making process commonly found in traditional optimization frameworks. 
This method enables the visualization of a large Pareto solution set in a two-dimensional space, facilitating more informed and rational design decisions. Finally, the findings are translated into a set of practical design strategies for application in real-world projects. Full article
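The multi-objective optimization step above keeps only non-dominated trade-offs between daylight performance and floor area; a generic Pareto filter over objective vectors (all objectives expressed as minimization; this is a sketch of the concept, not the authors' algorithm):

```python
def pareto_front(points):
    """Return the non-dominated tuples from a list of objective vectors,
    where every objective is to be minimized. A point p is dominated if
    some other point q is no worse in every objective and differs from p."""
    def dominated(p):
        return any(q != p and all(q[i] <= p[i] for i in range(len(p)))
                   for q in points)
    return [p for p in points if not dominated(p)]
```

The surviving set is the Pareto solution set that the SOM step then projects to two dimensions for decision support.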

50 pages, 28354 KiB  
Article
Mobile Mapping Approach to Apply Innovative Approaches for Real Estate Asset Management: A Case Study
by Giorgio P. M. Vassena
Appl. Sci. 2025, 15(14), 7638; https://doi.org/10.3390/app15147638 - 8 Jul 2025
Abstract
Technological development has strongly impacted all processes related to the design, construction, and management of real estate assets. In fact, the introduction of the BIM approach has required the application of three-dimensional survey technologies, and in particular the use of LiDAR instruments, both in their static (TLS—terrestrial laser scanner) and dynamic (iMMS—indoor mobile mapping system) implementations. Operators and developers of LiDAR technologies, for the implementation of scan-to-BIM procedures, initially placed particular care on the 3D surveying accuracy obtainable from such tools. The incorporation of RGB sensors into these instruments has progressively expanded LiDAR-based applications from essential topographic surveying to geospatial applications, where the emphasis is no longer on the accurate three-dimensional reconstruction of buildings but on the capability to create three-dimensional image-based visualizations, such as virtual tours, which allow the recognition of assets located in every area of the buildings. Although much has been written about obtaining the best possible accuracy for extensive asset surveying of large-scale building complexes using iMMS systems, it is now essential to develop and define suitable procedures for controlling such kinds of surveying, targeted at specific geospatial applications. We especially address the design, field acquisition, quality control, and mass data management techniques that might be used in such complex environments. This work aims to contribute by defining the technical specifications for the implementation of geospatial mapping of vast asset survey activities involving significant building sites utilizing iMMS instrumentation. Three-dimensional models can also facilitate virtual tours, enable local measurements inside rooms, and particularly support the subsequent integration of self-locating image-based technologies that can efficiently perform field updates of surveyed databases. 
Full article
(This article belongs to the Section Civil Engineering)

22 pages, 9762 KiB  
Article
A Map Information Collection Tool for a Pedestrian Navigation System Using Smartphone
by Kadek Suarjuna Batubulan, Nobuo Funabiki, Komang Candra Brata, I Nyoman Darma Kotama, Htoo Htoo Sandi Kyaw and Shintami Chusnul Hidayati
Information 2025, 16(7), 588; https://doi.org/10.3390/info16070588 - 8 Jul 2025
Abstract
Nowadays, smartphone-based pedestrian navigation systems have become popular as useful tools for reaching unknown destinations. When the destination is a person's office, detailed map information on the target area, such as the room number and its location inside the building, is necessary. The information can be collected from various sources including Google Maps, websites for the building, and images of signs. In this paper, we propose a map information collection tool for a pedestrian navigation system. To improve the accuracy and completeness of information, it works in four steps: (1) a user captures building and room images manually, (2) OCR software using Google ML Kit v2 processes them to extract the sign information from images, (3) web scraping using Scrapy (v2.11.0) and crawling with Apache Nutch (v1.19) collects additional details such as room numbers, facilities, and occupants from relevant websites, and (4) the collected data is stored in a database to be integrated with a pedestrian navigation system. For evaluation of the proposed tool, map information was collected for 10 buildings at Okayama University, Japan, a representative environment combining complex indoor layouts (e.g., interconnected corridors, multi-floor facilities) and high pedestrian traffic, which are critical for testing real-world navigation challenges. The collected data was assessed for completeness and effectiveness. A university campus was selected as it presents a complex indoor and outdoor environment ideal for testing pedestrian navigation in real-world scenarios. With the obtained map information, 10 users used the navigation system to successfully reach their destinations. The System Usability Scale (SUS) results from a questionnaire confirm the high usability. Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

24 pages, 3241 KiB  
Article
An Advanced Indoor Localization Method Based on xLSTM and Residual Multimodal Fusion of UWB/IMU Data
by Haoyang Wang, Jiaxing He and Lizhen Cui
Electronics 2025, 14(13), 2730; https://doi.org/10.3390/electronics14132730 - 7 Jul 2025
Abstract
To address the limitations of single-modality UWB/IMU systems in complex indoor environments, this study proposes a multimodal fusion localization method based on xLSTM. After extracting features from UWB and IMU data, the xLSTM network enables deep temporal feature learning. A three-stage residual fusion module is introduced to enhance cross-modal complementarity, while a multi-head attention mechanism dynamically adjusts the sensor weights. The end-to-end trained network effectively constructs nonlinear multimodal mappings for two-dimensional position estimation under both static and dynamic non-line-of-sight (NLOS) conditions with human-induced interference. Experimental results demonstrate that the localization errors reach 0.181 m under static NLOS and 0.187 m under dynamic NLOS, substantially outperforming traditional filtering-based approaches. The proposed deep fusion framework significantly improves localization reliability under occlusion and offers an innovative solution for high-precision indoor positioning. Full article
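The attention mechanism above that dynamically re-weights the UWB and IMU branches is not detailed in the abstract; its core idea, softmax weights over per-sensor confidence scores applied to their position estimates, can be sketched as follows (the scalar scores stand in for learned attention logits and are an assumption of this illustration):

```python
import math

def fuse_estimates(estimates, scores):
    """Blend per-sensor 2-D position estimates with softmax weights
    derived from per-sensor confidence scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = tuple(sum(w * est[d] for w, est in zip(weights, estimates))
                  for d in range(len(estimates[0])))
    return fused, weights
```

Equal scores reduce to plain averaging, while a strongly favored sensor (e.g. UWB in line-of-sight) dominates the fused estimate.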

24 pages, 11256 KiB  
Article
Indoor Measurement of Contact Stress Distributions for a Slick Tyre at Low Speed
by Gabriel Anghelache and Raluca Moisescu
Sensors 2025, 25(13), 4193; https://doi.org/10.3390/s25134193 - 5 Jul 2025
Abstract
The paper presents results of experimental research on tyre–road contact stress distributions, measured indoors for a motorsport slick tyre. The triaxial contact stress distributions have been measured using the complex transducer containing a transversal array of 30 sensing pins covering the entire contact patch width. Wheel displacement in the longitudinal direction was measured using a rotary encoder. The parameters allocated for the experimental programme have included different values of tyre inflation pressure, vertical load, camber angle and toe angle. All measurements were performed at low longitudinal speed in free-rolling conditions. The influence of tyre functional parameters on the contact patch shape and size has been discussed. The stress distributions on each orthogonal direction are presented in multiple formats, such as 2D graphs in which the curves show the stresses measured by each sensing element versus contact length; surfaces with stress values plotted as vertical coordinates versus contact patch length and width; and colour maps for stress distributions and orientations of shear stress vectors. The effects of different parameter types and values on stress distributions have been emphasised and analysed. Furthermore, the magnitude and position of local extreme values for each stress distribution have been investigated with respect to the above-mentioned tyre functional parameters. Full article
(This article belongs to the Section Vehicular Sensing)

31 pages, 9881 KiB  
Article
Guide Robot Based on Image Processing and Path Planning
by Chen-Hsien Yang and Jih-Gau Juang
Machines 2025, 13(7), 560; https://doi.org/10.3390/machines13070560 - 27 Jun 2025
Abstract
While guide dogs remain the primary aid for visually impaired individuals, robotic guides continue to be an important area of research. This study introduces an indoor guide robot designed to physically assist a blind person by holding their hand with a robotic arm and guiding them to a specified destination. To enable hand-holding, we employed a camera combined with object detection to identify the human hand and a closed-loop control system to manage the robotic arm’s movements. For path planning, we implemented a Dueling Double Deep Q Network (D3QN) enhanced with a genetic algorithm. To address dynamic obstacles, the robot utilizes a depth camera alongside fuzzy logic to control its wheels and navigate around them. A 3D point cloud map is generated to determine the start and end points accurately. The D3QN algorithm, supplemented by variables defined using the genetic algorithm, is then used to plan the robot’s path. As a result, the robot can safely guide blind individuals to their destinations without collisions. Full article
(This article belongs to the Special Issue Autonomous Navigation of Mobile Robots and UAVs, 2nd Edition)
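The Dueling Double Deep Q Network named above splits the Q-function into a state value and per-action advantages; the standard dueling aggregation (a textbook construction, not specific to this paper) recombines them with the advantage mean subtracted for identifiability:

```python
def dueling_q(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), the usual dueling head."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

Subtracting the mean keeps V and A from trading off arbitrarily, so the value stream learns how good a state is independently of which action is picked.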

25 pages, 9860 KiB  
Article
Indoor Dynamic Environment Mapping Based on Semantic Fusion and Hierarchical Filtering
by Yiming Li, Luying Na, Xianpu Liang and Qi An
ISPRS Int. J. Geo-Inf. 2025, 14(7), 236; https://doi.org/10.3390/ijgi14070236 - 21 Jun 2025
Abstract
To address the challenges of dynamic object interference and redundant information representation in map construction for indoor dynamic environments, this paper proposes an indoor dynamic environment mapping method based on semantic fusion and hierarchical filtering. First, prior dynamic object masks are obtained using the YOLOv8 model, and geometric constraints between prior static objects and dynamic regions are introduced to identify non-prior dynamic objects, thereby eliminating all dynamic features (both prior and non-prior). Second, an initial semantic point cloud map is constructed by integrating prior static features from a semantic segmentation network with pose estimates from an RGB-D camera. Dynamic noise is then removed using statistical outlier removal (SOR) filtering, while voxel filtering optimizes point cloud density, generating a compact yet texture-rich semantic dense point cloud map with minimal dynamic artifacts. Subsequently, a multi-resolution semantic octree map is built using a recursive spatial partitioning algorithm. Finally, point cloud poses are corrected via Transform Frame (TF) transformation, and a 2D traversability grid map is generated using passthrough filtering and grid projection. Experimental results demonstrate that the proposed method constructs multi-level semantic maps with rich information, clear structure, and high reliability in indoor dynamic scenarios. Additionally, the map file size is compressed by 50–80%, significantly enhancing the reliability of mobile robot navigation and the efficiency of path planning. Full article
(This article belongs to the Special Issue Indoor Mobile Mapping and Location-Based Knowledge Services)
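Voxel filtering, used above to optimize point cloud density, replaces all points falling into one voxel with their centroid; a minimal pure-Python version (a real pipeline would use a point-cloud library such as PCL or Open3D, and the 5 cm voxel size is an assumed value):

```python
from collections import defaultdict

def voxel_downsample(points, voxel=0.05):
    """Replace every occupied voxel's points with their centroid."""
    bins = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel) for c in p)  # integer voxel coordinates
        bins[key].append(p)
    return [tuple(sum(cs) / len(pts) for cs in zip(*pts))
            for pts in bins.values()]
```

This is what compresses the map: dense clusters collapse to single representative points while isolated structure is preserved.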

35 pages, 1553 KiB  
Article
Efficient Learning-Based Robotic Navigation Using Feature-Based RGB-D Pose Estimation and Topological Maps
by Eder A. Rodríguez-Martínez, Jesús Elías Miranda-Vega, Farouk Achakir, Oleg Sergiyenko, Julio C. Rodríguez-Quiñonez, Daniel Hernández Balbuena and Wendy Flores-Fuentes
Entropy 2025, 27(6), 641; https://doi.org/10.3390/e27060641 - 15 Jun 2025
Abstract
Robust indoor robot navigation typically demands either costly sensors or extensive training data. We propose a cost-effective RGB-D navigation pipeline that couples feature-based relative pose estimation with a lightweight multi-layer-perceptron (MLP) policy. RGB-D keyframes extracted from human-driven traversals form nodes of a topological map; edges are added when visual similarity and geometric–kinematic constraints are jointly satisfied. During autonomy, LightGlue features and SVD give six-DoF relative pose to the active keyframe, and the MLP predicts one of four discrete actions. Low visual similarity or detected obstacles trigger graph editing and Dijkstra replanning in real time. Across eight tasks in four Habitat-Sim environments, the agent covered 190.44 m, replanning when required, and consistently stopped within 0.1 m of the goal while running on commodity hardware. An information-theoretic analysis over the Multi-Illumination dataset shows that LightGlue maximizes per-second information gain under lighting changes, motivating its selection. The modular design attains reliable navigation without metric SLAM or large-scale learning, and seamlessly accommodates future perception or policy upgrades. Full article
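The Dijkstra replanning above runs over the keyframe graph; a compact shortest-path search over an adjacency-dict topological map (node names and weights are illustrative):

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path over a weighted adjacency dict {node: {nbr: cost}}.
    Returns the node list from start to goal, or None if unreachable."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

Graph editing on low visual similarity then amounts to deleting or reweighting edges and calling this search again.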

35 pages, 21267 KiB  
Article
Unmanned Aerial Vehicle–Unmanned Ground Vehicle Centric Visual Semantic Simultaneous Localization and Mapping Framework with Remote Interaction for Dynamic Scenarios
by Chang Liu, Yang Zhang, Liqun Ma, Yong Huang, Keyan Liu and Guangwei Wang
Drones 2025, 9(6), 424; https://doi.org/10.3390/drones9060424 - 10 Jun 2025
Abstract
In this study, we introduce an Unmanned Aerial Vehicle (UAV) centric visual semantic simultaneous localization and mapping (SLAM) framework that integrates RGB–D cameras, inertial measurement units (IMUs), and a 5G–enabled remote interaction module. Our system addresses three critical limitations in existing approaches: (1) Distance constraints in remote operations; (2) Static map assumptions in dynamic environments; and (3) High–dimensional perception requirements for UAV–based applications. By combining YOLO–based object detection with epipolar–constraint-based dynamic feature removal, our method achieves real-time semantic mapping while rejecting motion artifacts. The framework further incorporates a dual–channel communication architecture to enable seamless human–in–the–loop control over UAV–Unmanned Ground Vehicle (UGV) teams in large–scale scenarios. Experimental validation across indoor and outdoor environments indicates that the system can achieve a detection rate of up to 75 frames per second (FPS) on an NVIDIA Jetson AGX Xavier using YOLO–FASTEST, ensuring the rapid identification of dynamic objects. In dynamic scenarios, the localization accuracy attains an average absolute pose error (APE) of 0.1275 m. This outperforms state–of–the–art methods like Dynamic–VINS (0.211 m) and ORB–SLAM3 (0.148 m) on the EuRoC MAV Dataset. The dual-channel communication architecture (Web Real–Time Communication (WebRTC) for video and Message Queuing Telemetry Transport (MQTT) for telemetry) reduces bandwidth consumption by 65% compared to traditional TCP–based protocols. Moreover, our hybrid dynamic feature filtering can reject 89% of dynamic features in occluded scenarios, guaranteeing accurate mapping in complex environments. Our framework represents a significant advancement in enabling intelligent UAVs/UGVs to navigate and interact in complex, dynamic environments, offering real-time semantic understanding and accurate localization. Full article
(This article belongs to the Special Issue Advances in Perception, Communications, and Control for Drones)
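The epipolar-constraint-based dynamic feature removal above tests whether a matched feature lies on the epipolar line implied by camera motion; points far from the line are flagged as dynamic. A sketch of the residual computation (the fundamental matrix F is assumed given, row-major 3x3, e.g. from a robust estimator):

```python
def epipolar_distance(F, p1, p2):
    """Distance of p2 = (x2, y2) from the epipolar line l = F @ [x1, y1, 1].
    Large distances indicate features inconsistent with the camera's
    epipolar geometry, i.e. likely dynamic points."""
    x1 = (p1[0], p1[1], 1.0)
    line = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    num = abs(line[0] * p2[0] + line[1] * p2[1] + line[2])
    den = (line[0] ** 2 + line[1] ** 2) ** 0.5
    return num / den if den else float('inf')
```

Thresholding this distance, combined with the YOLO masks, is what lets the SLAM front end drop moving objects before mapping.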

15 pages, 3156 KiB  
Article
Adaptive AR Navigation: Real-Time Mapping for Indoor Environment Using Node Placement and Marker Localization
by Bagas Samuel Christiananta Putra, I. Kadek Dendy Senapartha, Jyun-Cheng Wang, Matahari Bhakti Nendya, Dan Daniel Pandapotan, Felix Nathanael Tjahjono and Halim Budi Santoso
Information 2025, 16(6), 478; https://doi.org/10.3390/info16060478 - 7 Jun 2025
Abstract
Indoor navigation remains a challenge due to the limitations of GPS-based systems in enclosed environments. Current approaches, such as marker-based ones, have been developed for indoor navigation. However, they require extensive manual mapping, which makes indoor navigation time-consuming and difficult to scale. To enhance current approaches to indoor navigation, this study proposes node-based mapping for indoor navigation, allowing users to dynamically construct navigation paths using a mobile device. The system leverages NavMesh and the A* algorithm for pathfinding and is integrated with ARCore for real-time AR guidance. Nodes are placed within the environment to define walkable paths, which can be stored and reused without requiring a full system rebuild. Once the prototype has been developed, usability testing is conducted using the Handheld Augmented Reality Usability Scale (HARUS) to evaluate manipulability, comprehensibility, and overall usability. This study finds that node-based mapping for indoor navigation can enhance flexibility in mapping new indoor spaces and offers an effective AR-guided navigation experience. However, there are some areas for improvement, including interface clarity and system scalability, that can be considered in future research. This study contributes practically to improving current practices in adaptive indoor navigation systems using AR-based dynamic mapping techniques. Full article

23 pages, 4909 KiB  
Article
Autonomous Navigation and Obstacle Avoidance for Orchard Spraying Robots: A Sensor-Fusion Approach with ArduPilot, ROS, and EKF
by Xinjie Zhu, Xiaoshun Zhao, Jingyan Liu, Weijun Feng and Xiaofei Fan
Agronomy 2025, 15(6), 1373; https://doi.org/10.3390/agronomy15061373 - 3 Jun 2025
Abstract
To address the challenges of low pesticide utilization, insufficient automation, and health risks in orchard plant protection, we developed an autonomous spraying vehicle using ArduPilot firmware and a robot operating system (ROS). The system tackles orchard navigation hurdles, including global navigation satellite system (GNSS) signal obstruction, light detection and ranging (LIDAR) simultaneous localization and mapping (SLAM) error accumulation, and lighting-limited visual positioning. A key innovation is the integration of an extended Kalman filter (EKF) to dynamically fuse T265 visual odometry, inertial measurement unit (IMU), and GPS data, overcoming single-sensor limitations and enhancing positioning robustness in complex environments. Additionally, the study optimizes PID controller derivative parameters for tracked chassis, improving acceleration/deceleration control smoothness. The system, composed of Pixhawk 4, Raspberry Pi 4B, Silan S2L LIDAR, T265 visual odometry, and a Quectel EC200A 4G module, enables autonomous path planning, real-time obstacle avoidance, and multi-mission navigation. Indoor/outdoor tests and field experiments in Sun Village Orchard validated its autonomous cruising and obstacle avoidance capabilities under real-world orchard conditions, demonstrating feasibility for intelligent plant protection. Full article
(This article belongs to the Special Issue Smart Pest Control for Building Farm Resilience)
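The EKF fusion named above reduces, in one dimension, to the classic precision-weighted update; a scalar sketch (the actual system fuses T265 visual odometry, IMU, and GPS state vectors, which this toy version does not attempt):

```python
def kf_update(x_pred, p_pred, z, r):
    """Fuse a predicted state (mean x_pred, variance p_pred) with a
    measurement z of variance r using the scalar Kalman gain."""
    k = p_pred / (p_pred + r)        # gain: how much to trust the measurement
    x = x_pred + k * (z - x_pred)    # corrected mean
    p = (1.0 - k) * p_pred           # reduced variance after fusion
    return x, p
```

A confident prediction (small p_pred) barely moves toward the measurement; a noisy one snaps to it, which is exactly how the filter degrades gracefully when GNSS is obstructed.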

16 pages, 2315 KiB  
Article
ResT-IMU: A Two-Stage ResNet-Transformer Framework for Inertial Measurement Unit Localization
by Yanping Zhu, Jianqiang Zhang, Wenlong Chen, Chenyang Zhu, Sen Yan and Qi Chen
Sensors 2025, 25(11), 3441; https://doi.org/10.3390/s25113441 - 30 May 2025
Abstract
To address the challenges of accurate indoor positioning in complex environments, this paper proposes a two-stage indoor positioning method, ResT-IMU, which integrates the ResNet and Transformer architectures. The method initially processes the IMU data using Kalman filtering, followed by the application of windowing to the data. Residual networks are then employed to extract motion features by learning the residual mapping of the input data, which enhances the model’s ability to capture motion changes and predict instantaneous velocity. Subsequently, the self-attention mechanism of the Transformer is utilized to capture the temporal features of the IMU data, thereby refining the estimation of movement direction in conjunction with the velocity predictions. Finally, a fully connected layer outputs the predicted velocity and direction, which are used to calculate the trajectory. During training, the RMSE loss is used to optimize velocity prediction, while the cosine similarity loss is employed for direction prediction. The experimental results demonstrate that ResT-IMU achieves velocity prediction errors of 0.0182 m/s on the iIMU-TD dataset and 0.014 m/s on the RoNIN dataset. Compared with the ResNet model, ResT-IMU achieves reductions of 0.19 m in ATE and 0.05 m in RTE on the RoNIN dataset. Compared with the IMUNet model, ResT-IMU achieves reductions of 0.61 m in ATE and 0.02 m in RTE on the iIMU-TD dataset and reductions of 0.32 m in ATE and 0.33 m in RTE on the RoNIN dataset. Compared with the ResMixer model, ResT-IMU achieves reductions of 0.13 m in ATE and 0.02 m in RTE on the RoNIN dataset. These improvements indicate that ResT-IMU offers superior accuracy and robustness in trajectory prediction. Full article
(This article belongs to the Section Intelligent Sensors)
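The final step above, turning predicted velocity and direction into a trajectory, is plain dead reckoning; a sketch of that integration (the time step and start pose are assumed, not taken from the paper):

```python
import math

def integrate_trajectory(start, steps, dt=0.1):
    """Accumulate (speed m/s, heading rad) predictions into a 2-D path."""
    x, y = start
    path = [(x, y)]
    for v, theta in steps:
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        path.append((x, y))
    return path
```

Because errors accumulate step by step, small per-window velocity and heading errors translate directly into the ATE/RTE figures reported above.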

19 pages, 3903 KiB  
Article
CFANet: The Cross-Modal Fusion Attention Network for Indoor RGB-D Semantic Segmentation
by Long-Fei Wu, Dan Wei and Chang-An Xu
J. Imaging 2025, 11(6), 177; https://doi.org/10.3390/jimaging11060177 - 27 May 2025
Abstract
Indoor image semantic segmentation technology is applied to fields such as smart homes and indoor security. The challenges faced by semantic segmentation techniques using RGB images and depth maps as data sources include the semantic gap between RGB images and depth maps and the loss of detailed information. To address these issues, a multi-head self-attention mechanism is adopted to adaptively align features of the two modalities and perform feature fusion in both spatial and channel dimensions. Appropriate feature extraction methods are designed according to the different characteristics of RGB images and depth maps. For RGB images, asymmetric convolution is introduced to capture features in the horizontal and vertical directions, enhance short-range information dependence, mitigate the gridding effect of dilated convolution, and introduce criss-cross attention to obtain contextual information from global dependency relationships. On the depth map, a strategy of extracting significant unimodal features from the channel and spatial dimensions is used. A lightweight skip connection module is designed to fuse low-level and high-level features. In addition, since the first layer contains the richest detailed information and the last layer contains rich semantic information, a feature refinement head is designed to fuse the two. The method achieves an mIoU of 53.86% and 51.85% on the NYUDv2 and SUN-RGBD datasets, which is superior to mainstream methods. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
