Search Results (509)

Search Parameters:
Keywords = critical scenes

20 pages, 2103 KiB  
Article
Federated Multi-Stage Attention Neural Network for Multi-Label Electricity Scene Classification
by Lei Zhong, Xuejiao Jiang, Jialong Xu, Kaihong Zheng, Min Wu, Lei Gao, Chao Ma, Dewen Zhu and Yuan Ai
J. Low Power Electron. Appl. 2025, 15(3), 46; https://doi.org/10.3390/jlpea15030046 - 5 Aug 2025
Abstract
Privacy-sensitive electricity scene classification requires robust models under data localization constraints, making federated learning (FL) a suitable framework. Existing FL frameworks face two critical challenges in multi-label electricity scene classification: (1) Label correlations and their strengths significantly impact classification performance. (2) Electricity scene data and labels show distributional inconsistencies across regions. However, current FL frameworks lack explicit modeling of label correlation strengths, and locally trained regional models naturally capture these differences, leading to regional differences in their model parameters. In this scenario, the server’s standard single-stage aggregation often over-averages the global model’s parameters, reducing its discriminative ability. To address these issues, we propose FMMAN, a federated multi-stage attention neural network for multi-label electricity scene classification. The main contributions of FMMAN lie in label correlation learning and stepwise model aggregation. It splits the client–server interaction into multiple stages: (1) Clients train models locally to encode features and label correlation strengths after receiving the server’s initial model. (2) The server clusters these locally trained models into K groups to ensure that models within a group have more consistent parameters and generates K prototype models via intra-group aggregation to reduce over-averaging. The K models are then distributed back to the clients. (3) Clients refine their models using the K prototypes with contrastive group-specific consistency regularization to further mitigate over-averaging, and send the refined models back to the server. (4) Finally, the server aggregates the models into a global model. Experiments on multi-label benchmarks verify that FMMAN outperforms baseline methods.
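The grouped aggregation at the core of FMMAN is easy to make concrete: flatten each client's locally trained parameters, cluster them into K groups, and average within each group to obtain K prototypes instead of one over-averaged model. Below is a minimal sketch assuming plain k-means on flattened parameter vectors; the helper names and the clustering choice are illustrative, not the paper's implementation.

```python
import numpy as np

def flatten_params(model_params):
    """Flatten a client's parameter arrays into one vector (illustrative helper)."""
    return np.concatenate([p.ravel() for p in model_params])

def grouped_aggregate(client_params, k, n_iters=20, seed=0):
    """Cluster client models into k groups (plain k-means on flattened
    parameters, an assumption) and average within each group, yielding
    k prototype models instead of one over-averaged global model."""
    X = np.stack([flatten_params(p) for p in client_params])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each client model to its nearest prototype.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Intra-group aggregation: one prototype per non-empty cluster.
    prototypes = [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                  for j in range(k)]
    return prototypes, labels

# Toy usage: 8 clients in two regional blobs, 2 groups, 10-parameter models.
rng = np.random.default_rng(1)
clients = [[rng.normal(loc=i % 2, size=(10,))] for i in range(8)]
prototypes, labels = grouped_aggregate(clients, k=2)
print(labels)
```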

28 pages, 21813 KiB  
Article
Adaptive RGB-D Semantic Segmentation with Skip-Connection Fusion for Indoor Staircase and Elevator Localization
by Zihan Zhu, Henghong Lin, Anastasia Ioannou and Tao Wang
J. Imaging 2025, 11(8), 258; https://doi.org/10.3390/jimaging11080258 - 4 Aug 2025
Abstract
Accurate semantic segmentation of indoor architectural elements, such as staircases and elevators, is critical for safe and efficient robotic navigation, particularly in complex multi-floor environments. Traditional fusion methods struggle with occlusions, reflections, and low-contrast regions. In this paper, we propose a novel feature fusion module, Skip-Connection Fusion (SCF), that dynamically integrates RGB (Red, Green, Blue) and depth features through an adaptive weighting mechanism and skip-connection integration. This approach enables the model to selectively emphasize informative regions while suppressing noise, effectively addressing challenging conditions such as partially blocked staircases, glossy elevator doors, and dimly lit stair edges, which improves obstacle detection and supports reliable human–robot interaction in complex environments. Extensive experiments on a newly collected dataset demonstrate that SCF consistently outperforms state-of-the-art methods, including PSPNet and DeepLabv3, in both overall mIoU (mean Intersection over Union) and challenging-case performance. Specifically, our SCF module improves segmentation accuracy by 5.23% in the top 10% of challenging samples, highlighting its robustness in real-world conditions. Furthermore, we conduct a sensitivity analysis on the learnable weights, demonstrating their impact on segmentation quality across varying scene complexities. Our work provides a strong foundation for real-world applications in autonomous navigation, assistive robotics, and smart surveillance.
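As a rough illustration of the adaptive weighting plus skip-connection idea, the sketch below gates RGB and depth features with a learned per-pixel weight and adds the RGB stream back as a skip connection. The gating design and all layer sizes are assumptions; the paper's actual SCF module may differ.

```python
import torch
import torch.nn as nn

class SkipConnectionFusion(nn.Module):
    """Sketch of an SCF-style block: predict a per-pixel weight that
    balances RGB and depth features, then add a skip connection so the
    original RGB stream is preserved (design details are assumptions)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # w in (0, 1): how much to trust RGB vs. depth
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        mixed = w * rgb_feat + (1.0 - w) * depth_feat
        # Skip connection keeps informative RGB content when depth is noisy.
        return self.fuse(mixed) + rgb_feat

# Toy usage on random feature maps.
block = SkipConnectionFusion(channels=64)
out = block(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```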

20 pages, 5647 KiB  
Article
Research on the Improved ICP Algorithm for LiDAR Point Cloud Registration
by Honglei Yuan, Guangyun Li, Li Wang and Xiangfei Li
Sensors 2025, 25(15), 4748; https://doi.org/10.3390/s25154748 - 1 Aug 2025
Viewed by 202
Abstract
Over three decades of research has been undertaken on point cloud registration algorithms, resulting in mature theoretical frameworks and methodologies. However, among the numerous registration techniques used, the impact of point cloud scanning quality on registration outcomes has rarely been addressed. In most engineering and industrial measurement applications, the accuracy and density of LiDAR point clouds are highly dependent on laser scanners, leading to significant variability that critically affects registration quality. Key factors influencing point cloud accuracy include scanning distance, incidence angle, and the surface characteristics of the target. Notably, in short-range scanning scenarios, incidence angle emerges as the dominant error source. Building on this insight, this study systematically investigates the relationship between scanning incidence angles and point cloud quality. We propose an incident-angle-dependent weighting function for point cloud observations, and further develop an improved weighted Iterative Closest Point (ICP) registration algorithm. Experimental results demonstrate that the proposed method achieves approximately 30% higher registration accuracy compared to traditional ICP algorithms and a 10% improvement over Faro SCENE’s proprietary solution.
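The weighting idea can be sketched as follows: assign each point a weight that decays with incidence angle, then solve the per-iteration ICP alignment as a weighted least-squares (weighted Kabsch) problem. The cosine-power weight is a common illustrative choice, not the paper's fitted function.

```python
import numpy as np

def incidence_weights(normals, view_dirs, power=2.0):
    """Weight each point by cos(incidence angle)^power: points hit at
    grazing angles get small weights. normals and view_dirs are unit
    vectors; the cosine-power form is an illustrative assumption."""
    cos_theta = np.abs(np.sum(normals * view_dirs, axis=1))
    return np.clip(cos_theta, 0.0, 1.0) ** power

def weighted_rigid_transform(src, dst, w):
    """One weighted least-squares step of ICP (weighted Kabsch algorithm)."""
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)
    mu_d = (w[:, None] * dst).sum(axis=0)
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Toy check: recover a known rotation with uniform weights.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
angle = 0.1
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.5, 0.0, -0.2])
R, t = weighted_rigid_transform(src, dst, np.ones(len(src)))
print(np.allclose(R, R_true, atol=1e-6))  # True
```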

28 pages, 2174 KiB  
Article
Validating Lava Tube Stability Through Finite Element Analysis of Real-Scene 3D Models
by Jiawang Wang, Zhizhong Kang, Chenming Ye, Haiting Yang and Xiaoman Qi
Electronics 2025, 14(15), 3062; https://doi.org/10.3390/electronics14153062 - 31 Jul 2025
Viewed by 197
Abstract
The structural stability of lava tubes is a critical factor for their potential use in lunar base construction. Previous studies could not capture the details of lava tube boundaries or perform accurate mechanical analysis. To this end, this study proposes a robust method to construct a high-precision, real-scene 3D model based on ground lava tube point cloud data. By employing finite element analysis, this study investigated the impact of real-world cross-sectional geometry, particularly the aspect ratio, on structural stability under surface pressure simulating meteorite impacts. A high-precision 3D reconstruction was achieved using UAV-mounted LiDAR and SLAM-based positioning systems, enabling accurate geometric capture of lava tube profiles. The original point cloud data were processed to extract cross-sections, which were then classified by their aspect ratios for analysis. Experimental results confirmed that the aspect ratio is a significant factor in determining stability. Crucially, unlike the monotonic trends often suggested by idealized models, analysis of real-world geometries revealed that the greatest deformation and structural vulnerability occur in sections with an aspect ratio between 0.5 and 0.6. For small lava tubes buried 3 m deep, the ground pressure they can withstand does not exceed 6 GPa. This process helps identify areas with weaker load-bearing capacity. The analysis demonstrated that a realistic 3D modeling approach provides a more accurate and reliable assessment of lava tube stability. This framework is vital for future evaluations of lunar lava tubes as safe habitats and highlights that complex, real-world geometry can lead to non-intuitive structural weaknesses not predicted by simplified models.
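One concrete step is classifying extracted cross-sections by aspect ratio. Here is a small sketch, assuming a cross-section arrives as 2D points and that aspect ratio means height over width of the section's extent; the paper's exact definition may differ.

```python
import numpy as np

def aspect_ratio(section_xy):
    """Height/width of a cross-section's axis-aligned extent
    (assumed definition; a PCA-aligned extent would also be reasonable)."""
    extent = section_xy.max(axis=0) - section_xy.min(axis=0)
    width, height = extent[0], extent[1]
    return height / width

def flag_vulnerable(sections, lo=0.5, hi=0.6):
    """Flag sections in the aspect-ratio band the study found most vulnerable."""
    return [lo <= aspect_ratio(s) <= hi for s in sections]

# Toy usage: a wide ellipse-like section with ratio ~0.55.
t = np.linspace(0, 2 * np.pi, 200)
section = np.stack([2.0 * np.cos(t), 1.1 * np.sin(t)], axis=1)
print(round(aspect_ratio(section), 2))      # ~0.55
print(flag_vulnerable([section]))           # [True]
```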

28 pages, 3441 KiB  
Article
Which AI Sees Like Us? Investigating the Cognitive Plausibility of Language and Vision Models via Eye-Tracking in Human-Robot Interaction
by Khashayar Ghamati, Maryam Banitalebi Dehkordi and Abolfazl Zaraki
Sensors 2025, 25(15), 4687; https://doi.org/10.3390/s25154687 - 29 Jul 2025
Viewed by 343
Abstract
As large language models (LLMs) and vision–language models (VLMs) become increasingly used in robotics, a crucial question arises: to what extent do these models replicate human-like cognitive processes, particularly within socially interactive contexts? Whilst these models demonstrate impressive multimodal reasoning and perception capabilities, their cognitive plausibility remains underexplored. In this study, we address this gap by using human visual attention as a behavioural proxy for cognition in a naturalistic human-robot interaction (HRI) scenario. Eye-tracking data were previously collected from participants engaging in social human-human interactions, providing frame-level gaze fixations as a human attentional ground truth. We then prompted a state-of-the-art VLM (LLaVA) to generate scene descriptions, which were processed by four LLMs (DeepSeek-R1-Distill-Qwen-7B, Qwen1.5-7B-Chat, LLaMA-3.1-8b-instruct, and Gemma-7b-it) to infer saliency points. Critically, we evaluated each model in both stateless and memory-augmented (short-term memory, STM) modes to assess the influence of temporal context on saliency prediction. Our results showed that whilst stateless LLaVA most closely replicates human gaze patterns, STM confers measurable benefits only for DeepSeek, whose lexical anchoring mirrors human rehearsal mechanisms. Other models exhibited degraded performance with memory due to prompt interference or limited contextual integration. This work introduces a novel, empirically grounded framework for assessing cognitive plausibility in generative models and underscores the role of short-term memory in shaping human-like visual attention in robotic systems.
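Scoring how closely model-inferred saliency points track human fixations requires a frame-level agreement measure. The hit-rate sketch below, including its tolerance radius and data layout, is an assumed stand-in rather than the paper's actual evaluation protocol.

```python
import numpy as np

def saliency_hit_rate(pred_points, gaze_fixations, radius_px=50.0):
    """Fraction of predicted saliency points that fall within radius_px
    of at least one human fixation in the same frame (assumed metric;
    points are (x, y) pixel coordinates)."""
    hits = 0
    for p in pred_points:
        d = np.linalg.norm(gaze_fixations - p, axis=1)
        if d.min() <= radius_px:
            hits += 1
    return hits / max(len(pred_points), 1)

# Toy usage: two of three predictions land near human fixations.
fixations = np.array([[100.0, 120.0], [400.0, 300.0]])
preds = np.array([[110.0, 125.0], [395.0, 310.0], [50.0, 600.0]])
print(saliency_hit_rate(preds, fixations))  # 0.666...
```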

17 pages, 1603 KiB  
Perspective
A Perspective on Quality Evaluation for AI-Generated Videos
by Zhichao Zhang, Wei Sun and Guangtao Zhai
Sensors 2025, 25(15), 4668; https://doi.org/10.3390/s25154668 - 28 Jul 2025
Viewed by 285
Abstract
Recent breakthroughs in AI-generated content (AIGC) have transformed video creation, empowering systems to translate text, images, or audio into visually compelling stories. Yet reliable evaluation of these machine-crafted videos remains elusive because quality is governed not only by spatial fidelity within individual frames but also by temporal coherence across frames and precise semantic alignment with the intended message. The foundational role of sensor technologies is critical, as they determine the physical plausibility of AIGC outputs. In this perspective, we argue that multimodal large language models (MLLMs) are poised to become the cornerstone of next-generation video quality assessment (VQA). By jointly encoding cues from multiple modalities such as vision, language, sound, and even depth, the MLLM can leverage its powerful language understanding capabilities to assess the quality of scene composition, motion dynamics, and narrative consistency, overcoming the fragmentation of hand-engineered metrics and the poor generalization ability of CNN-based methods. Furthermore, we provide a comprehensive analysis of current methodologies for assessing AIGC video quality, including the evolution of generation models, dataset design, quality dimensions, and evaluation frameworks. We argue that advances in sensor fusion enable MLLMs to combine low-level physical constraints with high-level semantic interpretations, further enhancing the accuracy of visual quality assessment.
(This article belongs to the Special Issue Perspectives in Intelligent Sensors and Sensing Systems)

27 pages, 5938 KiB  
Article
Noise-Adaptive GNSS/INS Fusion Positioning for Autonomous Driving in Complex Environments
by Xingyang Feng, Mianhao Qiu, Tao Wang, Xinmin Yao, Hua Cong and Yu Zhang
Vehicles 2025, 7(3), 77; https://doi.org/10.3390/vehicles7030077 - 22 Jul 2025
Cited by 1 | Viewed by 391
Abstract
Accurate and reliable multi-scene positioning remains a critical challenge in autonomous driving systems, as conventional fixed-noise fusion strategies struggle to handle the dynamic error characteristics of heterogeneous sensors in complex operational environments. This paper proposes a novel noise-adaptive fusion framework integrating Global Navigation Satellite System (GNSS) and Inertial Navigation System (INS) measurements. Our key innovation lies in developing a dual noise estimation model that synergizes a priori weighting with posterior variance compensation. Specifically, we establish an a priori weighting model for satellite pseudorange errors based on elevation angles and signal-to-noise ratios (SNRs), complemented by a Helmert variance component estimation for posterior refinement. For INS error modeling, we derive a bias instability noise accumulation model through Allan variance analysis. These adaptive noise estimates dynamically update both process and observation noise covariance matrices in our Error-State Kalman Filter (ESKF) implementation, enabling real-time calibration of GNSS and INS contributions. Comprehensive field experiments demonstrate two key advantages: (1) The proposed noise estimation model achieves 37.7% higher accuracy in quantifying GNSS single-point positioning uncertainties compared to conventional elevation-based weighting; (2) in unstructured environments with intermittent signal outages, the fusion system maintains an average absolute trajectory error (ATE) of less than 0.6 m, outperforming state-of-the-art fixed-weight fusion methods by 36.71% in positioning consistency. These results validate the framework’s capability to autonomously balance sensor reliability under dynamic environmental conditions, significantly enhancing positioning robustness for autonomous vehicles.
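The a priori half of the dual noise model can be sketched with a standard elevation/SNR variance form; the specific function and all constants below are common textbook choices used for illustration, not the paper's fitted model.

```python
import numpy as np

def pseudorange_variance(elev_rad, snr_dbhz, a=0.3, b=0.5, c=20.0, snr0=30.0):
    """A priori pseudorange variance (m^2) from elevation and SNR:
    sigma^2 = a^2 + b^2 / sin^2(E) + c * 10^(-(SNR - snr0) / 10).
    The functional form and constants a, b, c, snr0 are illustrative."""
    sin_e = np.sin(elev_rad)
    return a**2 + b**2 / sin_e**2 + c * 10.0 ** (-(snr_dbhz - snr0) / 10.0)

def observation_weights(elev_rad, snr_dbhz):
    """Diagonal observation-weight entries: inverse variance per satellite,
    ready to populate the ESKF's measurement noise covariance."""
    return 1.0 / pseudorange_variance(elev_rad, snr_dbhz)

# Toy usage: a high, strong satellite vs. a low, weak one.
elev = np.radians([75.0, 10.0])
snr = np.array([45.0, 28.0])
print(observation_weights(elev, snr))  # the low/weak satellite gets far less weight
```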

22 pages, 2485 KiB  
Article
Infrared and Visible Image Fusion Using a State-Space Adversarial Model with Cross-Modal Dependency Learning
by Qingqing Hu, Yiran Peng, KinTak U and Siyuan Zhao
Mathematics 2025, 13(15), 2333; https://doi.org/10.3390/math13152333 - 22 Jul 2025
Viewed by 228
Abstract
Infrared and visible image fusion plays a critical role in multimodal perception systems, particularly under challenging conditions such as low illumination, occlusion, or complex backgrounds. However, existing approaches often struggle with global feature modelling, cross-modal dependency learning, and preserving structural details in the fused images. In this paper, we propose a novel adversarial fusion framework driven by a state-space modelling paradigm to address these limitations. In the feature extraction phase, a computationally efficient state-space model is utilized to capture global semantic context from both infrared and visible inputs. A cross-modality state-space architecture is then introduced in the fusion phase to model long-range dependencies between heterogeneous features effectively. Finally, a multi-class discriminator, trained under an adversarial learning scheme, enhances the structural fidelity and detail consistency of the fused output. Extensive experiments conducted on publicly available infrared–visible fusion datasets demonstrate that the proposed method achieves superior performance in terms of information retention, contrast enhancement, and visual realism. The results confirm the robustness and generalizability of our framework for complex scene understanding and downstream tasks such as object detection under adverse conditions.

22 pages, 6556 KiB  
Article
Multi-Task Trajectory Prediction Using a Vehicle-Lane Disentangled Conditional Variational Autoencoder
by Haoyang Chen, Na Li, Hangguan Shan, Eryun Liu and Zhiyu Xiang
Sensors 2025, 25(14), 4505; https://doi.org/10.3390/s25144505 - 20 Jul 2025
Viewed by 400
Abstract
Trajectory prediction under multimodal information is critical for autonomous driving, necessitating the integration of dynamic vehicle states and static high-definition (HD) maps to model complex agent–scene interactions effectively. However, existing methods often employ static scene encodings and unstructured latent spaces, limiting their ability to capture evolving spatial contexts and produce diverse yet contextually coherent predictions. To tackle these challenges, we propose MS-SLV, a novel generative framework that introduces (1) a time-aware scene encoder that aligns HD map features with vehicle motion to capture evolving scene semantics and (2) a structured latent model that explicitly disentangles agent-specific intent and scene-level constraints. Additionally, we introduce an auxiliary lane prediction task to provide targeted supervision for scene understanding and improve latent variable learning. Our approach jointly predicts future trajectories and lane sequences, enabling more interpretable and scene-consistent forecasts. Extensive evaluations on the nuScenes dataset demonstrate the effectiveness of MS-SLV, achieving a 12.37% reduction in average displacement error and a 7.67% reduction in final displacement error over state-of-the-art methods. Moreover, MS-SLV significantly improves multi-modal prediction, reducing the top-5 Miss Rate (MR5) and top-10 Miss Rate (MR10) by 26% and 33%, respectively, and lowering the Off-Road Rate (ORR) by 3%, as compared with the strongest baseline in our evaluation.
(This article belongs to the Special Issue AI-Driven Sensor Technologies for Next-Generation Electric Vehicles)
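The structured latent idea can be illustrated as a CVAE encoder that emits two independent Gaussian latents, one for agent intent and one for scene constraints, so a decoder can vary one while holding the other fixed. Layer sizes and names below are assumptions; only the disentangling idea comes from the abstract.

```python
import torch
import torch.nn as nn

class DisentangledLatentEncoder(nn.Module):
    """Sketch: encode agent history and scene features into two separate
    Gaussian latents, z_agent (intent) and z_scene (constraints).
    Dimensions and head designs are illustrative assumptions."""
    def __init__(self, agent_dim=64, scene_dim=64, z_dim=16):
        super().__init__()
        self.agent_head = nn.Linear(agent_dim, 2 * z_dim)  # mu, logvar
        self.scene_head = nn.Linear(scene_dim, 2 * z_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard CVAE reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, agent_feat, scene_feat):
        mu_a, logvar_a = self.agent_head(agent_feat).chunk(2, dim=-1)
        mu_s, logvar_s = self.scene_head(scene_feat).chunk(2, dim=-1)
        z_agent = self.reparameterize(mu_a, logvar_a)
        z_scene = self.reparameterize(mu_s, logvar_s)
        return z_agent, z_scene

# Toy usage on a batch of 4 encoded agents and scenes.
enc = DisentangledLatentEncoder()
z_a, z_s = enc(torch.randn(4, 64), torch.randn(4, 64))
print(z_a.shape, z_s.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```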

34 pages, 24111 KiB  
Article
Natural and Anthropic Constraints on Historical Morphological Dynamics in the Middle Stretch of the Po River (Northern Italy)
by Laura Turconi, Barbara Bono, Carlo Mambriani, Lucia Masotti, Fabio Stocchi and Fabio Luino
Sustainability 2025, 17(14), 6608; https://doi.org/10.3390/su17146608 - 19 Jul 2025
Viewed by 403
Abstract
Geo-historical information deduced from geo-iconographical resources, derived from extensive research and the selection of cartographies and historical documents, enabled the investigation of the natural and anthropic transformations of the perifluvial area of the Po River in the Emilia-Romagna region (Italy). This territory, significant for its historical, cultural, and environmental contexts, has for centuries been the scene of flood events, which have characterised the morphological and dynamic variability of the riverbed and its floodplain. The close relationship between man and river is well documented: the interference induced by anthropic activity has alternated with the sometimes-damaging effects of river dynamics. The attention given to the fluvial region of the Po River and its main tributaries, in a peculiar lowland sector near Parma, is critical for understanding the spatial–temporal changes contributing to current geo-hydrological risks. A GIS project outlined the geomorphological aspects that define the considerable variations in the course of the Po River (involving width reductions of up to 66% and length changes of up to 14%) and its confluences from the 16th to the 21st century. Knowledge of anthropic modifications is essential as a tool within land-use planning and for enhancing community awareness in risk-mitigation activities and strategic management. This study highlights the importance of complementary, interdisciplinary geo-historical studies for decoding river dynamics in damaging flood events and latent hazards in an altered river environment.

22 pages, 32971 KiB  
Article
Spatial-Channel Multiscale Transformer Network for Hyperspectral Unmixing
by Haixin Sun, Qiuguang Cao, Fanlei Meng, Jingwen Xu and Mengdi Cheng
Sensors 2025, 25(14), 4493; https://doi.org/10.3390/s25144493 - 19 Jul 2025
Viewed by 347
Abstract
In recent years, deep learning (DL) has demonstrated remarkable capabilities in hyperspectral unmixing (HU) due to its powerful feature representation ability. Convolutional neural networks (CNNs) are effective in capturing local spatial information, but limited in modeling long-range dependencies. In contrast, transformer architectures extract global contextual features via multi-head self-attention (MHSA) mechanisms. However, most existing transformer-based HU methods focus only on spatial or spectral modeling at a single scale, lacking a unified mechanism to jointly explore spatial and channel-wise dependencies. This limitation is particularly critical for multiscale contextual representation in complex scenes. To address these issues, this article proposes a novel Spatial-Channel Multiscale Transformer Network (SCMT-Net) for HU. Specifically, a compact feature projection (CFP) module is first used to extract shallow discriminative features. Then, a spatial multiscale transformer (SMT) and a channel multiscale transformer (CMT) are sequentially applied to model contextual relations across spatial dimensions and long-range dependencies among spectral channels. In addition, a multiscale multi-head self-attention (MMSA) module is designed to extract rich multiscale global contextual and channel information, enabling a balance between accuracy and efficiency. An efficient feed-forward network (E-FFN) is further introduced to enhance inter-channel information flow and fusion. Experiments conducted on three real hyperspectral datasets (Samson, Jasper, and Apex) and one synthetic dataset showed that SCMT-Net consistently outperformed existing approaches in both abundance estimation and endmember extraction, demonstrating superior accuracy and robustness.
(This article belongs to the Section Sensor Networks)
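One plausible reading of multiscale self-attention is attention whose keys and values are pooled at several rates, so context is mixed at different granularities before the outputs are combined. The sketch below follows that reading; it is an illustration, not the paper's MMSA module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleSelfAttention(nn.Module):
    """Sketch: run one attention pass per pooling rate over the token
    sequence, then average the outputs (an assumed design)."""
    def __init__(self, dim, num_heads=4, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (batch, tokens, dim)
        outs = []
        for s in self.scales:
            # Pool keys/values to a coarser token set at rate s.
            kv = F.avg_pool1d(x.transpose(1, 2), kernel_size=s, stride=s)
            kv = kv.transpose(1, 2)
            out, _ = self.attn(x, kv, kv)
            outs.append(out)
        return torch.stack(outs).mean(dim=0)

# Toy usage: 16 tokens of width 32.
msa = MultiscaleSelfAttention(dim=32)
print(msa(torch.randn(2, 16, 32)).shape)  # torch.Size([2, 16, 32])
```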

19 pages, 1196 KiB  
Article
The Effects of Landmark Salience on Drivers’ Spatial Cognition and Takeover Performance in Autonomous Driving Scenarios
by Xianyun Liu, Yongdong Zhou and Yunhong Zhang
Behav. Sci. 2025, 15(7), 966; https://doi.org/10.3390/bs15070966 - 16 Jul 2025
Viewed by 234
Abstract
With the increasing prevalence of autonomous vehicles (AVs), drivers’ spatial cognition and takeover performance have become critical to traffic safety. This study investigates the effects of landmark salience—specifically visual and structural salience—on drivers’ spatial cognition and takeover behavior in autonomous driving scenarios. Two simulator-based experiments were conducted. Experiment 1 examined the impact of landmark salience on spatial cognition tasks, including route re-cruise, scene recognition, and sequence recognition. Experiment 2 assessed the effects of landmark salience on takeover performance. Results indicated that salient landmarks generally enhance spatial cognition, while the effects of visual and structural salience differ in scope and function in autonomous driving scenarios. Landmarks with high visual salience not only improved drivers’ accuracy in making intersection decisions but also significantly reduced takeover reaction time. In contrast, structurally salient landmarks had a more pronounced effect on memory-based tasks, such as scene recognition and sequence recognition, but showed a limited influence on dynamic decision-making tasks like takeover response. These findings underscore the differentiated roles of visual and structural landmark features, highlighting the critical importance of visually salient landmarks in supporting both navigation and timely takeover during autonomous driving. The results provide practical insights for urban road design, advocating for the strategic placement of visually prominent landmarks at key decision points. This approach has the potential to enhance both navigational efficiency and traffic safety.
(This article belongs to the Section Cognition)

18 pages, 1150 KiB  
Article
Navigating by Design: Effects of Individual Differences and Navigation Modality on Spatial Memory Acquisition
by Xianyun Liu, Yanan Zhang and Baihu Sun
Behav. Sci. 2025, 15(7), 959; https://doi.org/10.3390/bs15070959 - 15 Jul 2025
Viewed by 292
Abstract
Spatial memory is a critical component of spatial cognition, particularly in unfamiliar environments. As navigation systems become integral to daily life, understanding how individuals with varying spatial abilities respond to different navigation modes is increasingly important. This study employed a virtual driving environment to examine how participants with varying spatial abilities (good or poor) performed under three navigation modes: visual, audio, and combined audio–visual. A total of 78 participants were divided into two groups, good sense of direction (G-SOD) and poor sense of direction (P-SOD), according to their Santa Barbara Sense of Direction (SBSOD) scores, and were randomly assigned to one of the three navigation modes. Participants followed navigation cues and simulated driving to the end point twice during the learning phase, then completed route retracing, scene recognition, and order recognition tasks. Significant main effects were found for both SOD group and navigation mode, with no interaction. G-SOD participants outperformed P-SOD participants in the route retracing task. The audio navigation mode led to better performance in tasks involving complex spatial decisions, such as turning at intersections and recognizing route order. Scene recognition accuracy did not significantly differ across SOD groups or navigation modes. These findings suggest that audio navigation may reduce visual distraction and support more effective spatial encoding, and that individual spatial abilities influence navigation performance independently of guidance type. These findings highlight the importance of aligning navigation modalities with users’ cognitive profiles and support the development of adaptive navigation systems that accommodate individual differences in spatial ability.
(This article belongs to the Section Cognition)

20 pages, 3710 KiB  
Article
An Accurate LiDAR-Inertial SLAM Based on Multi-Category Feature Extraction and Matching
by Nuo Li, Yiqing Yao, Xiaosu Xu, Shuai Zhou and Taihong Yang
Remote Sens. 2025, 17(14), 2425; https://doi.org/10.3390/rs17142425 - 12 Jul 2025
Viewed by 437
Abstract
Light Detection and Ranging (LiDAR)-inertial simultaneous localization and mapping (SLAM) is a critical component in multi-sensor autonomous navigation systems, providing both accurate pose estimation and detailed environmental understanding. Despite its importance, existing optimization-based LiDAR-inertial SLAM methods often face key limitations: unreliable feature extraction, sensitivity to noise and sparsity, and the inclusion of redundant or low-quality feature correspondences. These weaknesses hinder their performance in complex or dynamic environments and fail to meet the reliability requirements of autonomous systems. To overcome these challenges, we propose a novel and accurate LiDAR-inertial SLAM framework with three major contributions. First, we employ a robust multi-category feature extraction method based on principal component analysis (PCA), which effectively filters out noisy and weakly structured points, ensuring stable feature representation. Second, to suppress outlier correspondences and enhance pose estimation reliability, we introduce a coarse-to-fine two-stage feature correspondence selection strategy that evaluates geometric consistency and structural contribution. Third, we develop an adaptive weighted pose estimation scheme that considers both distance and directional consistency, improving the robustness of feature matching under varying scene conditions. These components are jointly optimized within a sliding-window-based factor graph, integrating LiDAR feature factors, IMU pre-integration, and loop closure constraints. Extensive experiments on public datasets (KITTI, M2DGR) and a custom-collected dataset validate the proposed method’s effectiveness. Results show that our system consistently outperforms state-of-the-art approaches in accuracy and robustness, particularly in scenes with sparse structure, motion distortion, and dynamic interference, demonstrating its suitability for reliable real-world deployment.
(This article belongs to the Special Issue LiDAR Technology for Autonomous Navigation and Mapping)
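The PCA step behind multi-category feature extraction has a standard core: eigen-decompose each point's neighborhood covariance and derive linearity/planarity scores to separate edge-like, surface-like, and weakly structured points. The score definitions are the usual eigenvalue-based ones; the thresholds below are illustrative, not the paper's values.

```python
import numpy as np

def pca_features(neighborhood):
    """Linearity and planarity of a point's k-nearest-neighborhood from the
    eigenvalues of its covariance (standard definitions)."""
    cov = np.cov(neighborhood.T)
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]  # l1 >= l2 >= l3
    linearity = (l1 - l2) / l1
    planarity = (l2 - l3) / l1
    return linearity, planarity

def classify_point(neighborhood, lin_thresh=0.7, plan_thresh=0.6):
    """Label a point as an edge (line-like) or surface (plane-like) feature;
    everything else is treated as weakly structured and discarded."""
    linearity, planarity = pca_features(neighborhood)
    if linearity > lin_thresh:
        return "edge"
    if planarity > plan_thresh:
        return "surface"
    return "discard"

# Toy usage: points sampled near a plane classify as "surface".
rng = np.random.default_rng(0)
plane = np.c_[rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50),
              0.01 * rng.normal(size=50)]
print(classify_point(plane))  # surface
```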

23 pages, 88853 KiB  
Article
RSW-YOLO: A Vehicle Detection Model for Urban UAV Remote Sensing Images
by Hao Wang, Jiapeng Shang, Xinbo Wang, Qingqi Zhang, Xiaoli Wang, Jie Li and Yan Wang
Sensors 2025, 25(14), 4335; https://doi.org/10.3390/s25144335 - 11 Jul 2025
Viewed by 570
Abstract
Vehicle detection in remote sensing images faces significant challenges due to small object sizes, scale variation, and cluttered backgrounds. To address these issues, we propose RSW-YOLO, an enhanced detection model built upon the YOLOv8n framework, designed to improve feature extraction and robustness against environmental noise. A Restormer module is incorporated into the backbone to model long-range dependencies via self-attention, enabling better handling of multi-scale features and complex scenes. A dedicated detection head is introduced for small objects, focusing on critical channels while suppressing irrelevant information. Additionally, the original CIoU loss is replaced with WIoU, which dynamically reweights predicted boxes based on their quality, enhancing localization accuracy and stability. Experimental results on the DJCAR dataset show mAP@0.5 and mAP@0.5:0.95 improvements of 5.4% and 6.2%, respectively, and corresponding gains of 4.3% and 2.6% on the VisDrone dataset. These results demonstrate that RSW-YOLO offers a robust and accurate solution for UAV-based vehicle detection, particularly in urban scenes with dense or small targets.
(This article belongs to the Section Sensors and Robotics)
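The WIoU swap can be sketched in its v1 form, where a distance-based focusing term scales the IoU loss so that box quality reweights the gradient. The sketch follows the published Wise-IoU formulation in outline; treat the details, including the detached denominator, as a best-effort reading rather than RSW-YOLO's exact loss.

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """Wise-IoU v1 sketch: IoU loss scaled by a focusing term
    R = exp(center_dist^2 / enclosing_diag^2), with the denominator
    detached as in the WIoU paper. Boxes are (x1, y1, x2, y2)."""
    # Intersection and union.
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box diagonal and center distance.
    enc_wh = torch.max(pred[..., 2:], target[..., 2:]) - torch.min(pred[..., :2], target[..., :2])
    diag2 = (enc_wh ** 2).sum(dim=-1).detach()  # detached, per the WIoU paper
    center_p = (pred[..., :2] + pred[..., 2:]) / 2
    center_t = (target[..., :2] + target[..., 2:]) / 2
    dist2 = ((center_p - center_t) ** 2).sum(dim=-1)
    r_wiou = torch.exp(dist2 / (diag2 + eps))
    return r_wiou * (1.0 - iou)

# Toy usage: an offset prediction gets an up-weighted IoU loss.
pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[0.5, 0.5, 2.5, 2.5]])
print(wiou_v1_loss(pred, target))  # ~0.63
```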
