Search Results (836)

Search Parameters:
Keywords = camera and LiDAR

27 pages, 5749 KB  
Article
Automatic Multi-Sensor Calibration for Autonomous Vehicles: A Rapid Approach to LiDAR and Camera Data Fusion
by Stefano Arrigoni, Francesca D’Amato and Hafeez Husain Cholakkal
Appl. Sci. 2026, 16(3), 1498; https://doi.org/10.3390/app16031498 - 2 Feb 2026
Abstract
Precise sensor integration is crucial for autonomous vehicle (AV) navigation, yet traditional extrinsic calibration remains costly and labor-intensive. This study proposes an automated calibration approach that uses metaheuristic algorithms (Simulated Annealing (SA), Genetic Algorithms (GA), and Particle Swarm Optimization (PSO)) to independently optimize rotational and translational parameters, reducing cross-compensation errors. Bayesian optimization is used offline to define the search bounds (and tune hyperparameters), accelerating convergence, while computer vision techniques enhance automation by detecting geometric features using a checkerboard reference and a Huber estimator for noise handling. Experimental results demonstrate high accuracy with a single-pose acquisition, supporting multi-sensor configurations and reducing manual intervention, making the method practical for real-world AV applications.
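
The staged optimization described above can be illustrated with a short sketch: a simulated-annealing-style search (SciPy's dual_annealing as a stand-in for the paper's SA/GA/PSO variants) first refines rotation, then translation, scoring candidates by checkerboard-corner misalignment. The data, bounds, and error metric below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): optimize rotation and translation
# in separate stages with a simulated-annealing-style metaheuristic, scoring
# each candidate by the misalignment of checkerboard corner correspondences
# between the LiDAR and camera frames. All data here are synthetic.
import numpy as np
from scipy.optimize import dual_annealing
from scipy.spatial.transform import Rotation as R

rng = np.random.default_rng(0)
corners_cam = rng.uniform(-1.0, 1.0, size=(24, 3)) + np.array([0.0, 0.0, 5.0])

# Hypothetical "true" extrinsics used only to generate LiDAR-frame corners.
R_true = R.from_euler("xyz", [2.0, -1.5, 0.8], degrees=True)
t_true = np.array([0.10, -0.05, 0.20])
corners_lidar = R_true.inv().apply(corners_cam - t_true)

def alignment_error(rot_deg, trans):
    """Mean distance between camera corners and transformed LiDAR corners."""
    rot = R.from_euler("xyz", rot_deg, degrees=True)
    pred = rot.apply(corners_lidar) + trans
    return np.mean(np.linalg.norm(pred - corners_cam, axis=1))

# Stage 1: rotation only, with translation fixed at a coarse initial guess.
t_guess = np.zeros(3)
res_rot = dual_annealing(lambda r: alignment_error(r, t_guess),
                         bounds=[(-5.0, 5.0)] * 3, seed=1)

# Stage 2: translation only, with the stage-1 rotation frozen; the two
# parameter groups are optimized independently, as in the staged scheme above.
res_t = dual_annealing(lambda t: alignment_error(res_rot.x, t),
                       bounds=[(-0.5, 0.5)] * 3, seed=1)

print("rotation (deg):", np.round(res_rot.x, 3))
print("translation (m):", np.round(res_t.x, 3))
```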

19 pages, 5725 KB  
Article
Real-Time 3D Scene Understanding for Road Safety: Depth Estimation and Object Detection for Autonomous Vehicle Awareness
by Marcel Simeonov, Andrei Kurdiumov and Milan Dado
Vehicles 2026, 8(2), 28; https://doi.org/10.3390/vehicles8020028 - 2 Feb 2026
Abstract
Accurate depth perception is vital for autonomous driving and roadside monitoring. Traditional stereo vision methods are cost-effective but often fail under challenging conditions such as low texture, reflections, or complex lighting. This work presents a perception pipeline built around FoundationStereo, a Transformer-based stereo depth estimation model. At low resolutions, FoundationStereo achieves real-time performance (up to 26 FPS) on embedded platforms like NVIDIA Jetson AGX Orin with TensorRT acceleration and power-of-two input sizes, enabling deployment in roadside cameras and in-vehicle systems. For Full HD stereo pairs, the same model delivers dense and precise environmental scans, complementing LiDAR while maintaining a high level of accuracy. A YOLO11 object detection and segmentation model is deployed in parallel for object extraction. Detected objects are removed from depth maps generated by FoundationStereo prior to point cloud generation, producing cleaner 3D reconstructions of the environment. This approach demonstrates that advanced stereo networks can operate efficiently on embedded hardware. Rather than replacing LiDAR or radar, it complements existing sensors by providing dense depth maps in situations where other sensors may be limited. By improving depth completeness and robustness and by enabling filtered point clouds, the proposed system supports safer navigation, collision avoidance, and scalable roadside infrastructure scanning for autonomous mobility.
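
The object-removal step before point cloud generation reduces to masking the depth map and back-projecting the remaining pixels. The sketch below assumes a pinhole camera model and placeholder depth/mask inputs; it is not the paper's code.

```python
# Minimal sketch (assumed interface): zero out depth pixels covered by
# detected-object masks, then back-project the remaining depth into a 3D
# point cloud using pinhole intrinsics.
import numpy as np

def masked_depth_to_points(depth, instance_masks, fx, fy, cx, cy):
    """depth: HxW metres; instance_masks: list of HxW boolean arrays."""
    depth = depth.copy()
    for mask in instance_masks:          # remove detected objects
        depth[mask] = 0.0
    v, u = np.nonzero(depth > 0)         # pixel coordinates of valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])    # Nx3 static-scene point cloud

# Toy usage with placeholder data standing in for stereo depth + YOLO masks.
H, W = 48, 64
depth = np.full((H, W), 8.0)
car_mask = np.zeros((H, W), dtype=bool)
car_mask[20:30, 10:25] = True
points = masked_depth_to_points(depth, [car_mask], fx=60.0, fy=60.0, cx=W / 2, cy=H / 2)
print(points.shape)
```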

21 pages, 12301 KB  
Article
Visual Localization Algorithm with Dynamic Point Removal Based on Multi-Modal Information Association
by Jing Ni, Boyang Gao, Hongyuan Zhu, Minkun Zhao and Xiaoxiong Liu
ISPRS Int. J. Geo-Inf. 2026, 15(2), 60; https://doi.org/10.3390/ijgi15020060 - 30 Jan 2026
Viewed by 129
Abstract
To enhance the autonomous navigation capability of intelligent agents in complex environments, this paper presents a visual localization algorithm for dynamic scenes that leverages multi-source information fusion. The proposed approach is built upon an odometry framework integrating LiDAR, camera, and IMU data, and incorporates the YOLOv8 model to extract semantic information from images, which is then fused with laser point cloud data. We design a dynamic point removal method based on multi-modal association, which links 2D image masks to 3D point cloud regions, applies Euclidean clustering to differentiate static and dynamic points, and subsequently employs PnP-RANSAC to eliminate any remaining undetected dynamic points. This process yields a robust localization algorithm for dynamic environments. Experimental results on datasets featuring dynamic objects and a custom-built hardware platform demonstrate that the proposed dynamic point removal method significantly improves both the robustness and accuracy of the visual localization system. These findings confirm the feasibility and effectiveness of our system, showcasing its capabilities in precise positioning and autonomous navigation in complex environments.
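
The mask-to-point-cloud association can be sketched as a projection check followed by clustering; DBSCAN stands in here for Euclidean clustering, and the camera matrix and LiDAR-to-camera extrinsics are assumed known. This is an illustrative sketch, not the authors' implementation.

```python
# Sketch: label LiDAR points that project inside a 2D instance mask, then
# keep the largest Euclidean cluster as the dynamic-object candidate.
import numpy as np
from sklearn.cluster import DBSCAN

def points_in_mask(points_lidar, mask, K, R_lc, t_lc):
    """Return the subset of LiDAR points whose image projection falls in mask."""
    p_cam = points_lidar @ R_lc.T + t_lc              # LiDAR -> camera frame
    in_front = p_cam[:, 2] > 0.1
    uv = p_cam[in_front] @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                       # perspective divide
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = mask.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    hits = np.zeros(len(points_lidar), dtype=bool)
    idx = np.flatnonzero(in_front)[valid]
    hits[idx] = mask[v[valid], u[valid]]
    return points_lidar[hits]

def cluster_candidate_object(points, eps=0.5, min_samples=10):
    """Euclidean-style clustering: keep the largest cluster as the candidate."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    if (labels >= 0).sum() == 0:
        return points[:0]                             # no dense cluster found
    largest = np.bincount(labels[labels >= 0]).argmax()
    return points[labels == largest]
```

PnP-RANSAC on the remaining static correspondences would then reject any residual dynamic points, as the abstract describes.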

24 pages, 29852 KB  
Article
Dual-Axis Transformer-GNN Framework for Touchless Finger Location Sensing by Using Wi-Fi Channel State Information
by Minseok Koo and Jaesung Park
Electronics 2026, 15(3), 565; https://doi.org/10.3390/electronics15030565 - 28 Jan 2026
Viewed by 154
Abstract
Camera-, lidar-, and wearable-based gesture recognition technologies face practical limitations such as lighting sensitivity, occlusion, hardware cost, and user inconvenience. Wi-Fi channel state information (CSI) can be used as a contactless alternative to capture subtle signal variations caused by human motion. However, existing CSI-based methods are highly sensitive to domain shifts and often suffer notable performance degradation when applied to environments different from the training conditions. To address this issue, we propose a domain-robust touchless finger location sensing framework that operates reliably even in a single-link environment composed of commercial Wi-Fi devices. The proposed system applies preprocessing procedures to reduce noise and variability introduced by environmental factors and introduces a multi-domain segment combination strategy to increase the domain diversity during training. In addition, the dual-axis transformer learns temporal and spatial features independently, and the GNN-based integration module incorporates relationships among segments originating from different domains to produce more generalized representations. The proposed model is evaluated using CSI data collected from various users and days; experimental results show that the proposed method achieves an in-domain accuracy of 99.31% and outperforms the best baseline by approximately 4% and 3% in cross-user and cross-day evaluation settings, respectively, even in a single-link setting. Our work demonstrates a viable path for robust, calibration-free finger-level interaction using ubiquitous single-link Wi-Fi in real-world and constrained environments, providing a foundation for more reliable contactless interaction systems.
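
A minimal sketch of the dual-axis idea, assuming a CSI tensor of shape (batch, time, subcarrier): self-attention is applied independently along the time axis and the subcarrier axis, and the two pooled embeddings are concatenated. Layer sizes are arbitrary; this is not the paper's architecture.

```python
# Illustrative dual-axis attention sketch over a CSI tensor.
import torch
import torch.nn as nn

class DualAxisBlock(nn.Module):
    def __init__(self, n_sub=64, n_time=100, d_model=64, n_heads=4):
        super().__init__()
        self.time_proj = nn.Linear(n_sub, d_model)    # tokens = time steps
        self.sub_proj = nn.Linear(n_time, d_model)    # tokens = subcarriers
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sub_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, csi):                           # csi: (B, n_time, n_sub)
        t_tok = self.time_proj(csi)                   # (B, n_time, d_model)
        s_tok = self.sub_proj(csi.transpose(1, 2))    # (B, n_sub, d_model)
        t_out, _ = self.time_attn(t_tok, t_tok, t_tok)
        s_out, _ = self.sub_attn(s_tok, s_tok, s_tok)
        # Pool each axis and concatenate into one segment embedding.
        return torch.cat([t_out.mean(dim=1), s_out.mean(dim=1)], dim=-1)

emb = DualAxisBlock()(torch.randn(2, 100, 64))
print(emb.shape)                                      # (2, 128)
```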

19 pages, 7381 KB  
Article
Vision-Aided Velocity Estimation in GNSS Degraded or Denied Environments
by Pierpaolo Serio, Andrea Dan Ryals, Francesca Piana, Lorenzo Gentilini and Lorenzo Pollini
Sensors 2026, 26(3), 786; https://doi.org/10.3390/s26030786 - 24 Jan 2026
Viewed by 233
Abstract
This paper introduces a novel architecture for a navigation system that is designed to estimate the position and velocity of a moving vehicle specifically for remote piloting scenarios where GPS availability is intermittent and can be lost for extended periods of time. The purpose of the navigation system is to keep velocity estimation as reliable as possible to allow the vehicle guidance and control systems to maintain close-to-nominal performance. The cornerstone of this system is a landmark-extraction algorithm, which identifies pertinent features within the environment. These features serve as landmarks, enabling continuous and precise adjustments to the vehicle’s estimated velocity. State estimations are performed by a Sequential Kalman filter, which processes camera data regarding the vehicle’s relative position to the identified landmarks. Tracking the landmarks supports a state-of-the-art LiDAR odometry segment and keeps the velocity error low. During an extensive testing phase, the system’s performance was evaluated across various real-world trajectories. These tests were designed to assess the system’s capability in maintaining stable velocity estimation under different conditions. The results from these evaluations indicate that the system effectively estimates velocity, demonstrating the feasibility of its application in scenarios where GPS signals are compromised or entirely absent.
(This article belongs to the Section Remote Sensors)
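
The landmark-aided velocity correction can be illustrated with a generic constant-velocity Kalman filter fed by landmark-derived position fixes; this is a stand-in for the paper's Sequential Kalman filter, with all noise values and the simulated motion chosen arbitrarily.

```python
# Minimal constant-velocity Kalman filter sketch: position measurements derived
# from tracked landmarks correct the velocity estimate when GPS is unavailable.
import numpy as np

dt = 0.1
F = np.block([[np.eye(3), dt * np.eye(3)],
              [np.zeros((3, 3)), np.eye(3)]])        # state: [position, velocity]
H = np.hstack([np.eye(3), np.zeros((3, 3))])         # measure position only
Q = 1e-3 * np.eye(6)                                 # process noise (tuning value)
R_meas = 0.05 * np.eye(3)                            # landmark-fix noise (assumed)

x = np.zeros(6)
P = np.eye(6)

def kf_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with a landmark-relative position fix z
    S = H @ P @ H.T + R_meas
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

# Simulate a vehicle moving at 2 m/s along x with noisy landmark fixes.
rng = np.random.default_rng(0)
for k in range(50):
    z = np.array([2.0 * dt * (k + 1), 0.0, 0.0]) + 0.05 * rng.standard_normal(3)
    x, P = kf_step(x, P, z)
print("estimated velocity:", np.round(x[3:], 2))     # should approach [2, 0, 0]
```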

15 pages, 3879 KB  
Article
Bluetooth Low Energy-Based Docking Solution for Mobile Robots
by Kyuman Lee
Electronics 2026, 15(2), 483; https://doi.org/10.3390/electronics15020483 - 22 Jan 2026
Viewed by 66
Abstract
Existing docking methods for mobile robots rely on a LiDAR sensor or image processing using a camera. Although both demonstrate excellent performance in terms of sensing distance and spatial resolution, they are sensitive to environmental effects, such as illumination and occlusion, and are expensive. Some environments and conditions therefore call for novel low-power, low-cost docking solutions that are less sensitive to the environment. In this study, we propose a guidance and navigation solution for a mobile robot to dock into a docking station using the angle of arrival and received signal strength indicator values measured between the mobile robot and the docking station via wireless communication based on Bluetooth low energy (BLE). The proposed algorithm is a LiDAR- and camera-free docking solution. It was deployed on an actual mobile robot with BLE transceiver hardware, and the obtained docking results closely match the ground truth.
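
A minimal sketch of BLE-only docking guidance, assuming a log-distance path-loss model for RSSI-to-range conversion and a simple proportional steering law; the constants and control gains below are illustrative, not the paper's algorithm.

```python
# Sketch: convert RSSI to range, use AoA as bearing, and steer toward the dock.
import math

TX_POWER_DBM = -59.0        # assumed RSSI at 1 m
PATH_LOSS_EXPONENT = 2.0    # free-space-like environment (assumption)

def rssi_to_range(rssi_dbm):
    """Log-distance path-loss model: d = 10 ** ((P0 - RSSI) / (10 * n))."""
    return 10 ** ((TX_POWER_DBM - rssi_dbm) / (10.0 * PATH_LOSS_EXPONENT))

def docking_command(rssi_dbm, aoa_deg, k_ang=1.5, v_max=0.3):
    """Return (linear m/s, angular rad/s) steering toward the docking station."""
    rng = rssi_to_range(rssi_dbm)
    bearing = math.radians(aoa_deg)
    angular = k_ang * bearing                     # turn toward the station
    linear = min(v_max, 0.2 * rng)                # slow down as range shrinks
    if abs(bearing) > math.radians(45):           # realign before advancing
        linear = 0.0
    return linear, angular

print(docking_command(rssi_dbm=-68.0, aoa_deg=12.0))
```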

35 pages, 5497 KB  
Article
Robust Localization of Flange Interface for LNG Tanker Loading and Unloading Under Variable Illumination: A Fusion Approach of Monocular Vision and LiDAR
by Mingqin Liu, Han Zhang, Jingquan Zhu, Yuming Zhang and Kun Zhu
Appl. Sci. 2026, 16(2), 1128; https://doi.org/10.3390/app16021128 - 22 Jan 2026
Viewed by 61
Abstract
The automated localization of the flange interface in LNG tanker loading and unloading imposes stringent requirements for accuracy and illumination robustness. Traditional monocular vision methods are prone to localization failure under extreme illumination conditions, such as intense glare or low light, while LiDAR, despite being unaffected by illumination, suffers from limitations like a lack of texture information. This paper proposes an illumination-robust localization method for LNG tanker flange interfaces by fusing monocular vision and LiDAR, with three scenario-specific innovations beyond generic multi-sensor fusion frameworks. First, an illumination-adaptive fusion framework is designed to dynamically adjust detection parameters via grayscale mean evaluation, addressing extreme illumination (e.g., glare, low light with water film). Second, a multi-constraint flange detection strategy is developed by integrating physical dimension constraints, K-means clustering, and weighted fitting to eliminate background interference and distinguish dual flanges. Third, a customized fusion pipeline (ROI extraction, plane fitting, and 3D circle center solving) is established to compensate for monocular depth errors and sparse LiDAR point cloud limitations using a flange radius prior. High-precision localization is achieved via four key steps: multi-modal data preprocessing, LiDAR-camera spatial projection, fusion-based flange circle detection, and 3D circle center fitting. While basic techniques such as LiDAR-camera spatiotemporal synchronization and K-means clustering are adapted from prior works, their integration with flange-specific constraints and illumination-adaptive design forms the core novelty of this study. Comparative experiments between the proposed fusion method and the monocular vision-only localization method are conducted under four typical illumination scenarios: uniform illumination, local strong illumination, uniform low illumination, and low illumination with water film. The experimental results, based on 20 samples per illumination scenario (80 valid data sets in total), show that, compared with the monocular vision method, the proposed fusion method reduces the Mean Absolute Error (MAE) of localization accuracy by 33.08%, 30.57%, and 75.91% in the X, Y, and Z dimensions, respectively, with the overall 3D MAE reduced by 61.69%. Meanwhile, the Root Mean Square Error (RMSE) in the X, Y, and Z dimensions is decreased by 33.65%, 32.71%, and 79.88%, respectively, and the overall 3D RMSE is reduced by 64.79%. The expanded sample size verifies the statistical reliability of the proposed method, which exhibits significantly superior robustness to extreme illumination conditions.
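
Two of the steps above can be sketched simply: choosing detection thresholds from the grayscale mean, and recovering a 3D circle centre from ROI LiDAR points with the flange radius as a consistency check. The sketch assumes roughly uniform rim coverage and placeholder threshold values; it is not the authors' pipeline.

```python
# Illustrative sketches of illumination-adaptive thresholding and 3D circle
# centre recovery from ROI points with a known flange radius.
import numpy as np

def adaptive_canny_thresholds(gray):
    """Scale edge-detection thresholds with the grayscale mean (glare vs. low light)."""
    m = float(gray.mean())
    low = np.clip(0.5 * m, 20, 100)
    return low, 2.5 * low

def circle_center_from_roi(points_roi, radius):
    """Fit a plane with SVD, take the centroid of the rim points as the circle
    centre, and check consistency with the known flange radius."""
    centroid = points_roi.mean(axis=0)
    _, _, vt = np.linalg.svd(points_roi - centroid)
    normal = vt[-1]                                    # plane normal
    rim_dist = np.linalg.norm(points_roi - centroid, axis=1)
    residual = abs(float(rim_dist.mean()) - radius)    # agreement with the prior
    return centroid, normal, residual

# Synthetic rim: a 0.15 m radius circle at z = 2 m.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
rim = np.column_stack([0.15 * np.cos(theta), 0.15 * np.sin(theta), np.full(60, 2.0)])
print(circle_center_from_roi(rim, radius=0.15))
```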

21 pages, 15860 KB  
Article
Robot Object Detection and Tracking Based on Image–Point Cloud Instance Matching
by Hongxing Wang, Rui Zhu, Zelin Ye and Yaxin Li
Sensors 2026, 26(2), 718; https://doi.org/10.3390/s26020718 - 21 Jan 2026
Viewed by 212
Abstract
Effectively fusing the rich semantic information from camera images with the high-precision geometric measurements provided by LiDAR point clouds is a key challenge in mobile robot environmental perception. To address this problem, this paper proposes a highly extensible instance-aware fusion framework designed to achieve efficient alignment and unified modeling of heterogeneous sensory data. The proposed approach adopts a modular processing pipeline. First, semantic instance masks are extracted from RGB images using an instance segmentation network, and a projection mechanism is employed to establish spatial correspondences between image pixels and LiDAR point cloud measurements. Subsequently, three-dimensional bounding boxes are reconstructed through point cloud clustering and geometric fitting, and a reprojection-based validation mechanism is introduced to ensure consistency across modalities. Building upon this representation, the system integrates a data association module with a Kalman filter-based state estimator to form a closed-loop multi-object tracking framework. Experimental results on the KITTI dataset demonstrate that the proposed system achieves strong 2D and 3D detection performance across different difficulty levels. In multi-object tracking evaluation, the method attains a MOTA score of 47.8 and an IDF1 score of 71.93, validating the stability of the association strategy and the continuity of object trajectories in complex scenes. Furthermore, real-world experiments on a mobile computing platform show an average end-to-end latency of only 173.9 ms, while ablation studies further confirm the effectiveness of individual system components. Overall, the proposed framework exhibits strong performance in terms of geometric reconstruction accuracy and tracking robustness, and its lightweight design and low latency satisfy the stringent requirements of practical robotic deployment.
(This article belongs to the Section Sensors and Robotics)
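
The data association stage can be illustrated with generic IoU matching plus Hungarian assignment between reprojected 3D boxes and tracked 2D boxes; the box format and IoU threshold are assumptions, not the paper's exact module.

```python
# Sketch: IoU-based association between detections and tracks.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(detections, tracks, iou_min=0.3):
    """Match detections to existing tracks by maximizing total IoU."""
    if not detections or not tracks:
        return []
    cost = np.array([[1.0 - iou(d, t) for t in tracks] for d in detections])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]

dets = [[10, 10, 50, 60], [100, 40, 140, 90]]
trks = [[98, 42, 141, 88], [12, 9, 52, 58]]
print(associate(dets, trks))   # -> [(0, 1), (1, 0)]
```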

19 pages, 2984 KB  
Article
Development and Field Testing of an Acoustic Sensor Unit for Smart Crossroads as Part of V2X Infrastructure
by Yury Furletov, Dinara Aptinova, Mekan Mededov, Andrey Keller, Sergey S. Shadrin and Daria A. Makarova
Smart Cities 2026, 9(1), 17; https://doi.org/10.3390/smartcities9010017 - 21 Jan 2026
Viewed by 136
Abstract
Improving city crossroads safety is a critical problem for modern smart transportation systems (STS). This article presents the results of the development, upgrading, and comprehensive experimental testing of an acoustic monitoring system prototype designed for rapid accident detection. Unlike conventional camera- or lidar-based approaches, the proposed solution uses passive sound source localization to operate effectively with no direct visibility and in adverse weather conditions, addressing a key limitation of camera- or lidar-based systems. Generalized Cross-Correlation with Phase Transform (GCC-PHAT) algorithms were used to develop a hardware–software complex featuring four microphones, a multichannel audio interface, and a computation module. This study focuses on the gradual upgrading of the algorithm to reduce the mean localization error in real-life urban conditions. Laboratory and complex field tests were conducted on an open-air testing ground of a university campus. During these tests, the system demonstrated that it can accurately determine the coordinates of a sound source imitating accident sounds (sirens, collisions). The analysis confirmed that the system satisfies the V2X infrastructure integration response time requirement (<200 ms). The results suggest that the system can be used as part of smart transportation systems.
(This article belongs to the Section Physical Infrastructures and Networks in Smart Cities)
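
The core GCC-PHAT step, estimating the time difference of arrival between two microphone channels, can be sketched as follows; the full system extends this to four microphones and maps TDOAs to source coordinates. The signal below is synthetic.

```python
# Minimal GCC-PHAT sketch for a single microphone pair.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=16):
    """Return the estimated delay (s) of sig relative to ref using GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                       # phase transform weighting
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

# Broadband test signal delayed by 8 samples (0.5 ms at 16 kHz).
rng = np.random.default_rng(0)
src = rng.standard_normal(1600)
delayed = np.roll(src, 8)
print(round(gcc_phat(delayed, src, fs=16000) * 1000, 3), "ms")
```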

28 pages, 9411 KB  
Article
A Real-Time Mobile Robotic System for Crack Detection in Construction Using Two-Stage Deep Learning
by Emmanuella Ogun, Yong Ann Voeurn and Doyun Lee
Sensors 2026, 26(2), 530; https://doi.org/10.3390/s26020530 - 13 Jan 2026
Viewed by 306
Abstract
The deterioration of civil infrastructure poses a significant threat to public safety, yet conventional manual inspections remain subjective, labor-intensive, and constrained by accessibility. To address these challenges, this paper presents a real-time robotic inspection system that integrates deep learning perception and autonomous navigation. The proposed framework employs a two-stage neural network: a U-Net for initial segmentation followed by a Pix2Pix conditional generative adversarial network (GAN) that utilizes adversarial residual learning to refine boundary accuracy and suppress false positives. When deployed on an Unmanned Ground Vehicle (UGV) equipped with an RGB-D camera and LiDAR, this framework enables simultaneous automated crack detection and collision-free autonomous navigation. Evaluated on the CrackSeg9k dataset, the two-stage model achieved a mean Intersection over Union (mIoU) of 73.9 ± 0.6% and an F1-score of 76.4 ± 0.3%. Beyond benchmark testing, the robotic system was further validated through simulation, laboratory experiments, and real-world campus hallway tests, successfully detecting micro-cracks as narrow as 0.3 mm. Collectively, these results demonstrate the system’s potential for robust, autonomous, and field-deployable infrastructure inspection.
(This article belongs to the Special Issue Sensing and Control Technology of Intelligent Robots)
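
The two-stage inference can be sketched as chaining a coarse segmentation network with a refinement generator conditioned on the image plus the coarse mask; the tiny stand-in networks below only illustrate the data flow, not the paper's U-Net/Pix2Pix models or weights.

```python
# Illustrative two-stage inference sketch for crack segmentation.
import torch
import torch.nn as nn

coarse_net = nn.Sequential(            # stand-in for the U-Net
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1), nn.Sigmoid())
refine_net = nn.Sequential(            # stand-in for the Pix2Pix generator
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1), nn.Sigmoid())

@torch.no_grad()
def detect_cracks(image):              # image: (B, 3, H, W), values in [0, 1]
    coarse = coarse_net(image)                               # stage 1: coarse mask
    refined = refine_net(torch.cat([image, coarse], dim=1))  # stage 2: refinement
    return (refined > 0.5).float()                           # binary crack mask

mask = detect_cracks(torch.rand(1, 3, 128, 128))
print(mask.shape)                                            # (1, 1, 128, 128)
```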

20 pages, 6170 KB  
Article
Adaptive Cross-Modal Denoising: Enhancing LiDAR–Camera Fusion Perception in Adverse Circumstances
by Muhammad Arslan Ghaffar, Kangshuai Zhang, Nuo Pan and Lei Peng
Sensors 2026, 26(2), 408; https://doi.org/10.3390/s26020408 - 8 Jan 2026
Viewed by 372
Abstract
Autonomous vehicles (AVs) rely on LiDAR and camera sensors to perceive their environment. However, adverse weather conditions, such as rain, snow, and fog, negatively affect these sensors, reducing their reliability by introducing unwanted noise. Effective denoising of multimodal sensor data is crucial for safe and reliable AV operation in such circumstances. Existing denoising methods primarily focus on unimodal approaches, addressing noise in individual modalities without fully leveraging the complementary nature of LiDAR and camera data. To enhance multimodal perception in adverse weather, we propose a novel Adaptive Cross-Modal Denoising (ACMD) framework, which leverages modality-specific self-denoising encoders, followed by an Adaptive Bridge Controller (ABC) to evaluate residual noise and guide the direction of cross-modal denoising. Following this, the Cross-Modal Denoising (CMD) module is introduced, which selectively refines the noisier modality using semantic guidance from the cleaner modality. Synthetic noise was added to both sensors’ data during training to simulate real-world noisy conditions. Experiments on the WeatherKITTI dataset show that ACMD surpasses traditional unimodal denoising methods (Restormer, PathNet, BM3D, PointCleanNet) by 28.2% in PSNR and 33.3% in CD, and outperforms state-of-the-art fusion models by 16.2% in JDE. The ACMD framework enhances AV reliability in adverse weather conditions, supporting safe autonomous driving.
(This article belongs to the Section Vehicular Sensing)

29 pages, 4806 KB  
Article
KuRALS: Ku-Band Radar Datasets for Multi-Scene Long-Range Surveillance with Baselines and Loss Design
by Teng Li, Qingmin Liao, Youcheng Zhang, Xinyan Zhang, Zongqing Lu and Liwen Zhang
Remote Sens. 2026, 18(1), 173; https://doi.org/10.3390/rs18010173 - 5 Jan 2026
Viewed by 323
Abstract
Compared to cameras and LiDAR, radar provides superior robustness under adverse conditions, as well as extended sensing range and inherent velocity measurement, making it critical for surveillance applications. To advance research in deep learning-based radar perception technology, several radar datasets have been publicly released. However, most of these datasets are designed for autonomous driving applications, and existing radar surveillance datasets suffer from limited scene and target diversity. To address this gap, we introduce KuRALS, a range–Doppler (RD)-level radar surveillance dataset designed for learning-based long-range detection of moving targets. The dataset covers aerial (unmanned aerial vehicles), land (pedestrians and cars), and maritime (boats) scenarios. KuRALS was collected with two Kurz-under (Ku) band radars and contains two subsets (KuRALS-CW and KuRALS-PD). It consists of RD spectrograms with pixel-wise annotations of categories, velocity, and range coordinates; the azimuth and elevation angles are also provided. To benchmark performance, we develop a lightweight radar semantic segmentation (RSS) baseline model and further investigate various perception modules within this framework. In addition, we propose a novel interference-suppression loss function to enhance robustness against background interference. Extensive experimental results demonstrate that our proposed solution significantly outperforms existing approaches, with improvements of 10.0% in mIoU on the KuRALS-CW dataset and 9.4% on the KuRALS-PD dataset.

14 pages, 2571 KB  
Article
RMP: Robust Multi-Modal Perception Under Missing Condition
by Xin Ma, Xuqi Cai, Yuansheng Song, Yu Liang, Gang Liu and Yijun Yang
Electronics 2026, 15(1), 119; https://doi.org/10.3390/electronics15010119 - 26 Dec 2025
Viewed by 271
Abstract
Multi-modal perception is a core technology for edge devices to achieve safe and reliable environmental understanding in autonomous driving scenarios. In recent years, most approaches have focused on integrating complementary signals from diverse sensors, including cameras and LiDAR, to improve scene understanding in complex traffic environments, thereby attracting significant attention. However, in real-world applications, sensor failures frequently occur; for instance, cameras may malfunction in scenarios with poor illumination, which severely reduces the accuracy of perception models. To overcome this issue, we propose a robust multi-modal perception pipeline designed to improve model performance under missing modality conditions. Specifically, we design a missing feature reconstruction mechanism to reconstruct absent features by leveraging intra-modal common clues. Furthermore, we introduce a multi-modal adaptive fusion strategy to facilitate adaptive multi-modal integration through inter-modal feature interactions. Extensive experiments on the nuScenes benchmark demonstrate that our method achieves SOTA-level performance under missing-modality conditions.
(This article belongs to the Special Issue Hardware and Software Co-Design in Intelligent Systems)

24 pages, 4196 KB  
Article
Real-Time Cooperative Path Planning and Collision Avoidance for Autonomous Logistics Vehicles Using Reinforcement Learning and Distributed Model Predictive Control
by Mingxin Li, Hui Li, Yunan Yao, Yulei Zhu, Hailong Weng, Huabiao Jin and Taiwei Yang
Machines 2026, 14(1), 27; https://doi.org/10.3390/machines14010027 - 24 Dec 2025
Viewed by 371
Abstract
In industrial environments such as ports and warehouses, autonomous logistics vehicles face significant challenges in coordinating multiple vehicles while ensuring safe and efficient path planning. This study proposes a novel real-time cooperative control framework for autonomous vehicles, combining reinforcement learning (RL) and distributed model predictive control (DMPC). The RL agent dynamically adjusts the optimization weights of the DMPC to adapt to the vehicle’s real-time environment, while the DMPC enables decentralized path planning and collision avoidance. The system leverages multi-source sensor fusion, including GNSS, UWB, IMU, LiDAR, and stereo cameras, to provide accurate state estimations of vehicles. Simulation results demonstrate that the proposed RL-DMPC approach outperforms traditional centralized control strategies in terms of tracking accuracy, collision avoidance, and safety margins. Furthermore, the proposed method significantly improves control smoothness compared to rule-based strategies. This framework is particularly effective in dynamic and constrained industrial settings, offering a robust solution for multi-vehicle coordination with minimal communication delays. The study highlights the potential of combining RL with DMPC to achieve real-time, scalable, and adaptive solutions for autonomous logistics.
(This article belongs to the Special Issue Control and Path Planning for Autonomous Vehicles)

19 pages, 2488 KB  
Article
Bidirectional Complementary Cross-Attention and Temporal Adaptive Fusion for 3D Object Detection in Intelligent Transportation Scenes
by Di Tian, Jiawei Wang, Jiabo Li, Mingming Gong, Jiahang Shi, Zhongyi Huang and Zhongliang Fu
Electronics 2026, 15(1), 83; https://doi.org/10.3390/electronics15010083 - 24 Dec 2025
Viewed by 234
Abstract
Multi-sensor fusion represents a primary approach for enhancing environmental perception in intelligent transportation scenes. Among diverse fusion strategies, Bird’s-Eye View (BEV) perspective-based fusion methods have emerged as a prominent research focus owing to advantages such as unified spatial representation. However, current BEV fusion methods still face challenges with insufficient robustness in cross-modal alignment and weak perception of dynamic objects. To address these challenges, this paper proposes a Bidirectional Complementary Cross-Attention Module (BCCA), which achieves deep fusion of image and point cloud features by adaptively learning cross-modal attention weights, thereby significantly improving cross-modal information interaction. Secondly, we propose a Temporal Adaptive Fusion Module (TAFusion). This module effectively incorporates temporal information within the BEV space and enables efficient fusion of multi-modal features across different frames through a two-stage alignment strategy, substantially enhancing the model’s ability to perceive dynamic objects. Based on the above, we integrate these two modules to propose the Dual Temporal and Transversal Attention Network (DTTANet), a novel camera and LiDAR fusion framework. Comprehensive experiments demonstrate that our proposed method achieves improvements of 1.42% in mAP and 1.26% in NDS on the nuScenes dataset compared to baseline networks, effectively advancing the development of 3D object detection technology for intelligent transportation scenes.
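
The bidirectional cross-attention idea can be sketched with two standard attention layers, each modality attending to the other over BEV tokens; dimensions are arbitrary and this is a generic sketch, not DTTANet.

```python
# Illustrative bidirectional cross-attention between camera-BEV and LiDAR-BEV tokens.
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.cam_from_lidar = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lidar_from_cam = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, cam_bev, lidar_bev):          # (B, N_tokens, d_model) each
        cam_enh, _ = self.cam_from_lidar(cam_bev, lidar_bev, lidar_bev)
        lidar_enh, _ = self.lidar_from_cam(lidar_bev, cam_bev, cam_bev)
        # Residual connections keep each stream; the linear layer fuses both.
        return self.fuse(torch.cat([cam_bev + cam_enh, lidar_bev + lidar_enh], dim=-1))

fused = BidirectionalCrossAttention()(torch.randn(2, 200, 128), torch.randn(2, 200, 128))
print(fused.shape)                                  # (2, 200, 128)
```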
