MDPI - Publisher of Open Access Journals

34 pages, 2338 KB

Open Access Review

A Taxonomy of Machine Learning for UAV-Enabled Precision Agriculture: A Structured Survey

by Wan D. Bae, Shayma Alkobaisi, Muhammad Farhan Safdar and Prachitee Chouhan

AgriEngineering 2026, 8(6), 249; https://doi.org/10.3390/agriengineering8060249 - 18 Jun 2026

Viewed by 232

Precision agriculture increasingly relies on machine learning applied to high-resolution data acquired by unmanned aerial vehicles (UAVs) to support crop monitoring, stress detection, and yield forecasting. This survey presents a structured review of machine learning methods for UAV-enabled precision agriculture and organizes over 100 peer-reviewed studies within a unified four-dimensional taxonomy defined by sensing modality, data type, model family, and analytical task. The taxonomy enables systematic comparison across RGB, multispectral, hyperspectral, LiDAR, and IoT data sources and across classical machine learning, deep learning, hybrid sequential models, and emerging transformer-based architectures. We analyze how modeling choices interact with data characteristics to influence robustness, cross-environment generalization, computational efficiency, and deployment feasibility on UAV and edge platforms. Recurring challenges include limited labeled data, domain shift across seasons and fields, multimodal heterogeneity, occlusion, and real-time processing constraints. We identify emerging research directions, including data-efficient learning, representation-level multimodal fusion, domain adaptation, lightweight architectures for embedded deployment, and uncertainty-aware decision support. By formalizing the landscape through a unified taxonomy, this survey provides a foundation for designing scalable, robust, and deployable machine learning systems for next-generation precision agriculture.

(This article belongs to the Section Computer Applications and Artificial Intelligence in Agriculture)

20 pages, 8392 KB

Open Access Article

Rail-BEV: A LiDAR-Centric and Sensor-Aware BEV Perception Framework for Long-Range Railway Obstacle Detection

by Jinghan Huang, Wentao Hu, Zifeng He, Chixiang Ma, Wenbo Song, Xinci Liu and Mingxin Yang

Sensors 2026, 26(12), 3637; https://doi.org/10.3390/s26123637 - 7 Jun 2026

Viewed by 322

Abstract

Reliable long-range onboard perception is a prerequisite for future railway safety systems, where potential obstacles must be recognized under long braking distances, sparse far-field returns, and strongly constrained rail-corridor geometry. This paper presents Rail-BEV as an initial reproducible baseline study for LiDAR-centric, sensor-aware bird’s-eye-view (BEV) railway obstacle perception. LiDAR is used as the primary geometric sensing modality, while a front-center RGB camera provides lightweight auxiliary visual evidence through calibrated LiDAR-to-image projection. The aligned geometric and visual cues are organized within a unified railway-oriented BEV backend that integrates geometry-aware fusion, rail-geometry prediction, and lightweight inference-time structural refinement. Evaluation was conducted on a scene-isolated railway benchmark with range-stratified center-distance matching, and all model variants were assessed on independent test sequences rather than on validation-selected checkpoints. Compared with CenterPoint and BEVFusion baselines evaluated under the same settings, Rail-BEV achieved the highest overall mAP of 0.6669, with particularly improved long-range pedestrian perception. The controlled ablation further shows that front-view RGB evidence improves the LiDAR-only baseline from 0.5612 to 0.5750 mAP, while ROI-based rail-corridor refinement further increases mAP to 0.5916 and Rail-BEV mIoU to 0.1193. These results indicate that LiDAR-centered sensing, lightweight visual assistance, and coarse rail-aware structural reasoning can be jointly organized to support reproducible long-range railway obstacle perception. This study also clarifies the remaining limitations in rail-geometry quality, calibration robustness, sensor degradation, and strict railway-oriented localization. Full article
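
The range-stratified center-distance matching mentioned in the abstract can be illustrated with a small sketch. This is not the Rail-BEV evaluation code: the bin edges, the 2 m threshold, the greedy score-ordered strategy, and both function names are assumptions chosen for illustration only.

```python
import math

def range_bin(center, edges=(0.0, 50.0, 100.0, 200.0)):
    """Return the index of the range bin containing a BEV center, or None.

    The bin edges (in metres) are hypothetical.
    """
    d = math.hypot(center[0], center[1])
    for i in range(len(edges) - 1):
        if edges[i] <= d < edges[i + 1]:
            return i
    return None

def match_by_center_distance(preds, gts, threshold=2.0):
    """Greedily match predictions to ground truth by BEV center distance.

    preds: list of (score, (x, y)); gts: list of (x, y).
    Returns (pred_index, gt_index) pairs, highest-scoring predictions first.
    """
    order = sorted(range(len(preds)), key=lambda i: -preds[i][0])
    used = set()
    matches = []
    for i in order:
        best, best_d = None, threshold
        for j, g in enumerate(gts):
            if j in used:
                continue
            d = math.dist(preds[i][1], g)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches

preds = [(0.9, (10.0, 0.0)), (0.8, (10.5, 0.0)), (0.7, (150.0, 1.0))]
gts = [(10.2, 0.0), (151.0, 1.0)]
matches = match_by_center_distance(preds, gts)
bins = [range_bin(g) for g in gts]
```

Grouping the matched and unmatched boxes by `range_bin` before computing precision and recall gives per-range metrics, which is one plausible reading of "range-stratified".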

(This article belongs to the Section Communications)

21 pages, 3457 KB

Open Access Article

Hardware-Accelerated 3D LiDAR-Based Object Detection with BEV Spatial Mapping on Embedded FPGA Platforms

by Güner Tatar and Mahmud Esad Arar

Electronics 2026, 15(11), 2296; https://doi.org/10.3390/electronics15112296 - 25 May 2026

Viewed by 314

Abstract

This paper introduces a hardware/software co-designed 3D object detection pipeline based on the PointPillars architecture for low-power embedded MPSoC deployment. The proposed system accelerates the computationally intensive stages in programmable logic (PL), including ROI filtering, coordinate transformation, pillarization, centroid extraction, and INT8 neural inference, using Vitis high-level synthesis (HLS) and an integrated Deep Learning Processing Unit (DPU). Control-oriented and irregular operations, such as data acquisition, Direct Memory Access (DMA) control, lightweight Non-Maximum Suppression (NMS), visualization, and logging, remain on the processing system (PS). The design targets the AMD Kria KV260 platform and achieves an accelerated core pipeline latency of 11.4 ms per frame at 300 MHz, corresponding to 87.4 Hz throughput, with 6.842 W board-level power consumption. Including PS-side NMS, the practical end-to-end latency is approximately 12.2 ms for typical KITTI scenes. Compared with existing Field-Programmable Gate Array (FPGA)-based implementations, the proposed design reduces latency by up to $33\times$. It achieves a $202\times$ improvement in on-chip BRAM efficiency across HLS optimization versions through FIFO streaming, dataflow execution, and array partitioning. Experimental validation on physical hardware confirms that the proposed PL-accelerated hardware/software co-design provides a practical and cost-effective solution for real-time 3D LiDAR perception on embedded FPGA platforms.
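
The ROI filtering, pillarization, and centroid extraction stages named in the abstract can be sketched in software as follows. This is a plain-Python illustration of the general PointPillars-style grouping, not the authors' HLS kernels; the ROI bounds, the 0.5 m pillar size, and the function name `pillarize` are assumptions.

```python
from collections import defaultdict

def pillarize(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), pillar=0.5):
    """Group (x, y, z) points into BEV pillars and return per-pillar centroids.

    ROI bounds and pillar size are illustrative assumptions.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        # ROI filtering: drop points outside the BEV region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Pillarization: quantize x and y onto the grid; z is not quantized.
        ix = int((x - x_range[0]) / pillar)
        iy = int((y - y_range[0]) / pillar)
        pillars[(ix, iy)].append((x, y, z))
    # Centroid extraction: mean of the points that fell into each pillar.
    centroids = {}
    for key, pts in pillars.items():
        n = len(pts)
        centroids[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return centroids

pts = [(1.00, 0.05, 0.5), (1.05, 0.06, 1.5), (100.0, 0.0, 0.0)]
cells = pillarize(pts)
```

In the example, the first two points share one pillar and the third is rejected by the ROI filter. A hardware version would typically stream points through fixed-size buffers instead of a dictionary.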

(This article belongs to the Special Issue Advances in 2D/3D Object Detection Techniques and Systems)

35 pages, 13120 KB

Open Access Article

A Three-Dimensional LiDAR Observability Framework for Pedestrian Representation: Sensor Placement and Multi-View Fusion on a Compact Autonomous Vehicle

by Juan Diego Valladolid, Juan P. Ortiz, Franklin Castillo, José Vuelvas and Chuan Yu

Sensors 2026, 26(9), 2670; https://doi.org/10.3390/s26092670 - 25 Apr 2026

Viewed by 1273

Abstract

Reliable pedestrian perception in autonomous driving depends not only on detecting the target, but also on how completely and consistently its three-dimensional geometry is captured from different sensor viewpoints. This study presents a LiDAR-based observability framework for evaluating pedestrian representation on the ANTA compact autonomous vehicle platform using a roof-mounted Top LiDAR (TL), a Front-Right LiDAR (FRL), and their fused configuration. The pedestrian was analyzed in a canonical local frame using geometric extent ratios, projected surface occupancy, voxel-based volumetric occupancy, and statistical descriptors of the local point distribution, integrated into a global observability score, $S_{3D}$. A Distance-Robustness Index (DRI), an overlap-based complementarity analysis, and a lightweight temporal centroid-sensitivity check over 20 consecutive frames were used to characterize performance across distance. Using ROS 2 bag data processed offline in MATLAB R2025b, the fused configuration achieved the highest mean global score (0.563), compared with 0.504 for FRL and 0.432 for TL, and the highest robustness ($\mathrm{DRI} = 0.5628$, $\mathrm{CV} = 10.7\%$). The results show that 1 m maximizes local density, 2–3 m maximize projected and volumetric completeness, and 7 m provides the best balanced observability. Within the evaluated platform and under the controlled benchmark conditions, complementary multi-LiDAR fusion provided the strongest overall geometry-aware pedestrian representation.
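
Two of the quantities named in the abstract, voxel-based volumetric occupancy and the coefficient of variation (CV) across distance, can be sketched as below. The paper's exact definitions of $S_{3D}$ and DRI are not given in the abstract and are not reproduced here; the voxel size, the bounding box, the use of the sample standard deviation, and the example scores are all assumptions.

```python
import statistics

def voxel_occupancy(points, bbox_min, bbox_max, voxel=0.1):
    """Fraction of voxels in an axis-aligned box that contain at least one point.

    Points on the upper faces of the box, or outside it, are ignored.
    """
    dims = [max(1, round((hi - lo) / voxel)) for lo, hi in zip(bbox_min, bbox_max)]
    occupied = set()
    for p in points:
        idx = tuple(int((c - lo) // voxel) for c, lo in zip(p, bbox_min))
        if all(0 <= i < n for i, n in zip(idx, dims)):
            occupied.add(idx)
    return len(occupied) / (dims[0] * dims[1] * dims[2])

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

occ = voxel_occupancy(
    [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)],
    (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), voxel=0.5,
)
cv = coefficient_of_variation([9.0, 10.0, 11.0])  # hypothetical scores
```

A lower CV over the per-distance scores indicates that a sensor configuration degrades less as the pedestrian moves away, which is the sense in which the abstract reports robustness.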

(This article belongs to the Special Issue Sensor Fusion for the Safety of Automated Driving Systems)

32 pages, 9060 KB

Open Access Article

Snow-Covered Filter-Enhanced Canopy Surface Points: A Lightweight and Efficient Framework for Individual Tree Segmentation from LiDAR Data

by Bin Wang, Guangqing Xie, Ning Li, Ertao Gao, Guoqing Zhou, Cheng Wang and Haoyu Wang

Remote Sens. 2026, 18(9), 1305; https://doi.org/10.3390/rs18091305 - 24 Apr 2026

Viewed by 291

Abstract

As fundamental units of forest ecosystems, individual trees provide essential structural characteristics for forest resource assessment. However, existing LiDAR-based individual tree segmentation methods are often limited by a trade-off between information preservation and computational efficiency. This study proposes a novel framework for individual tree segmentation from LiDAR data based on canopy surface points (CSP), aiming to balance this trade-off. The framework introduces a Snow-Covered Filter (SCF) that simulates snow deposition to extract surface points from the point cloud. After removing ground points from these surface points, the resulting CSP retains the core 3D structure of the canopy while significantly reducing data volume. We validate the proposed framework on four multi-platform datasets using four algorithms that represent the evolution of individual tree segmentation methods: Dalponte2016, K-means, Li2012, and SegmentAnyTree. The results demonstrate that: (a) the SCF effectively extracts surface points, with an average F1-score of 0.703; (b) segmentation using CSP achieves accuracy comparable to that obtained using all points or raster data (mean $\Delta F = 0.027$), with the primary gap observed for SegmentAnyTree (maximum F-score reduction of 0.259); (c) the framework offers substantial efficiency gains: >40% point reduction, ~38.4% average runtime reduction (maximum saving ~4660 s), and lower memory consumption. By providing a lightweight yet structurally rich data representation, this work presents an innovative and efficient approach to individual tree segmentation, with promising potential for large-scale forest resource management.

34 pages, 6632 KB

Open Access Article

SPICD-Net: A Siamese PointNet Framework for Autonomous Indoor Change Detection in 3D LiDAR Point Clouds

by Dalibor Šeljmeši, Vladimir Brtka, Velibor Ilić, Dalibor Dobrilović, Eleonora Brtka and Višnja Ognjenović

AI 2026, 7(4), 141; https://doi.org/10.3390/ai7040141 - 15 Apr 2026

Viewed by 978

Abstract

Reliable change detection in indoor environments remains a challenge for autonomous robotic systems using 3D LiDAR. Existing methods often require manual annotation, computationally intensive architectures, or focus on outdoor scenes. This paper presents SPICD-Net, a lightweight Siamese PointNet framework for indoor 3D change detection trained exclusively on synthetically generated anomalies, eliminating manual labeling. The framework offers three deployment-oriented contributions: a three-class Siamese formulation separating no-change, changed, and geometrically inconsistent tile pairs; a pre-FPS anomaly injection strategy that aligns synthetic training with inference-time preprocessing; and a stochastic-gated Chamfer-statistics branch that complements learned embeddings with explicit geometric cues under consumer-grade hardware constraints. Evaluated on 14 controlled simulation experiments in an indoor corridor dataset, SPICD-Net achieved aggregated Precision = 0.86, Recall = 0.82, F1-score = 0.84, and Accuracy = 0.96, with zero false positives in the no-change baseline and mean inference time of 22.4 s for a 172-tile map on a single consumer GPU. Additional robustness experiments identified registration accuracy as the main operational prerequisite. A limited real-world validation in one unseen room (four scans, 67 tiles) achieved Precision = 0.583, Recall = 1.000, and F1 = 0.737. Full article
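
The explicit geometric cue behind a Chamfer-statistics branch, and the consistency of the reported metrics, can be checked with a short sketch. The brute-force symmetric Chamfer distance below is one common definition (mean nearest-neighbour distance in both directions); the abstract does not say which variant or which statistics SPICD-Net uses, so treat the function as an assumption.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Brute force, O(len(a) * len(b)); adequate for small tiles only.
    """
    def mean_nn(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return mean_nn(a, b) + mean_nn(b, a)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

tile_before = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tile_after = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # same tile shifted 1 m in z
unchanged = chamfer_distance(tile_before, tile_before)
changed = chamfer_distance(tile_before, tile_after)

# The F1 values in the abstract follow from the reported precision and recall.
f1_sim = round(f1_score(0.86, 0.82), 2)
f1_real = round(f1_score(0.583, 1.000), 3)
```

An identical tile pair gives a distance of zero, while the shifted pair gives a clearly positive value, which is the kind of signal that can complement a learned embedding.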

(This article belongs to the Special Issue Artificial Intelligence for Robotic Perception and Planning)

22 pages, 4917 KB

Open Access Technical Note

Reducing Latency in Digital Twins: A Framework for Near-Real-Time Progress and Quality Reporting

by Zvonko Sigmund, Ivica Završki, Ivan Marović and Kristijan Vilibić

Buildings 2026, 16(7), 1448; https://doi.org/10.3390/buildings16071448 - 6 Apr 2026

Viewed by 840

Abstract

While Digital Twins offer transformative potential, their efficacy for real-time control is constrained by slow data acquisition and the high computational intensity required to process raw datasets like point clouds. This paper identifies these critical bottlenecks, specifically the latency between data capture and actionable insight, and proposes a refined theoretical framework for near-real-time automated progress monitoring and quality reporting. Building on the findings of the NORMENG project and informing the subsequent AutoGreenTraC project, this research synthesizes state-of-the-art advancements in reality capture, including LiDAR, SfM-MVS, and 360-degree vision. The study highlights a fundamental divergence in stakeholder requirements: the need for millimeter-level precision in quality control versus the demand for high-velocity documentation for progress monitoring. A key innovation presented is the shift toward neural rendering techniques to bypass the computational delays of traditional photogrammetry and enable immediate on-site visualization. By structuring a tiered processing hierarchy that combines lightweight edge analysis for immediate safety and progress monitoring with asynchronous high-fidelity Digital Twin updates, the framework aims to establish a single source of truth.

(This article belongs to the Special Issue Application of Building Information Modelling in Construction Management)

38 pages, 3132 KB

Open Access Article

Lightweight Semantic-Aware Route Planning on Edge Hardware for Indoor Mobile Robots: Monocular Camera–2D LiDAR Fusion with Penalty-Weighted Nav2 Route Server Replanning

by Bogdan Felician Abaza, Andrei-Alexandru Staicu and Cristian Vasile Doicin

Sensors 2026, 26(7), 2232; https://doi.org/10.3390/s26072232 - 4 Apr 2026

Viewed by 1834

Abstract

The paper introduces a computationally efficient semantic-aware route planning framework for indoor mobile robots, designed for real-time execution on resource-constrained edge hardware (Raspberry Pi 5, CPU-only). The proposed architecture fuses monocular object detection with 2D LiDAR-based range estimation and integrates the resulting semantic annotations into the Nav2 Route Server for penalty-weighted route selection. Object localization in the map frame is achieved through the Angular Sector Fusion (ASF) pipeline, a deterministic geometric method requiring no parameter tuning. The ASF projects YOLO bounding boxes onto LiDAR angular sectors and estimates the object range using a 25th-percentile distance statistic, providing robustness to sparse returns and partial occlusions. All intrinsic and extrinsic sensor parameters are resolved at runtime via ROS 2 topic introspection and the URDF transform tree, enabling platform-agnostic deployment. Detected entities are classified according to mobility semantics (dynamic, static, and minor) and persistently encoded in a GeoJSON-based semantic map, with these annotations subsequently propagated to navigation graph edges as additive penalties and velocity constraints. Route computation is performed by the Nav2 Route Server through the minimization of a composite cost functional combining geometric path length with semantic penalties. A reactive replanning module monitors semantic cost updates during execution and triggers route invalidation and re-computation when threshold violations occur. Experimental evaluation over 115 navigation segments (legs) on three heterogeneous robotic platforms (two single-board RPi5 configurations and one dual-board setup with inference offloading) yielded an overall success rate of 97% (baseline: 100%, adaptive: 94%), with 42 replanning events observed in 57% of adaptive trials. 
Navigation time distributions exhibited statistically significant departures from normality (Shapiro–Wilk, p < 0.005). While central tendency differences between the baseline and adaptive modes were not significant (Mann–Whitney U, p = 0.157), the adaptive planner reduced temporal variance substantially (σ = 11.0 s vs. 31.1 s; Levene’s test W = 3.14, p = 0.082), primarily by mitigating AMCL recovery-induced outliers. On-device YOLO26n inference, executed via the NCNN backend, achieved 5.5 ± 0.7 FPS (167 ± 21 ms latency), and distributed inference reduced the average system CPU load from 85% to 48%. The study further reports deployment-level observations relevant to the Nav2 ecosystem, including GeoJSON metadata persistence constraints, graph discontinuity (“path-gap”) artifacts, and practical Route Server configuration patterns for semantic cost integration. Full article
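
The Angular Sector Fusion idea described above, mapping a bounding box to a LiDAR angular sector and taking a 25th-percentile range, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a pinhole camera whose optical axis coincides with the LiDAR's zero-bearing direction and origin (the paper resolves the real extrinsics from the URDF transform tree), and the function names and the linear-interpolation percentile are assumptions. The scan parameters mirror the `angle_min` and `angle_increment` fields of a ROS 2 `sensor_msgs/LaserScan` message.

```python
import math

def bbox_to_sector(x_min, x_max, image_width, fx, cx=None):
    """Map bbox pixel columns to bearings (rad), positive to the left."""
    if cx is None:
        cx = image_width / 2.0
    left = math.atan2(cx - x_min, fx)
    right = math.atan2(cx - x_max, fx)
    return right, left  # (min_angle, max_angle)

def sector_range(scan_ranges, angle_min, angle_increment, sector, q=0.25):
    """q-th quantile of the valid LiDAR ranges whose bearing lies in the sector."""
    lo, hi = sector
    vals = []
    for i, r in enumerate(scan_ranges):
        ang = angle_min + i * angle_increment
        if lo <= ang <= hi and math.isfinite(r) and r > 0.0:
            vals.append(r)
    if not vals:
        return None  # no usable return inside the sector
    vals.sort()
    pos = q * (len(vals) - 1)
    k = int(pos)
    if k + 1 < len(vals):
        return vals[k] + (pos - k) * (vals[k + 1] - vals[k])
    return vals[k]

# Hypothetical detection spanning pixel columns 272..368 of a 640 px image.
sector = bbox_to_sector(272, 368, 640, fx=320.0)
scan = [5.0, 2.0, 2.2, float("inf"), 6.0]  # beams at -0.2 ... +0.2 rad
estimate = sector_range(scan, angle_min=-0.2, angle_increment=0.1, sector=sector)
```

A low quantile favours the nearest surface inside the sector while ignoring a few spurious short returns, which matches the robustness to sparse returns and partial occlusions claimed in the abstract.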

(This article belongs to the Special Issue Advances in Sensing, Control and Path Planning for Robotic Systems)

18 pages, 4452 KB

Open Access Article

Fast 3D Gaussian Reconstruction for Open-Pit Mine Teleoperated Excavation via Monocular-LiDAR Fusion

by Lin Bi, Muqian Tan, Ziyu Zhao, Jinbo Li and Xintong Wang

Mathematics 2026, 14(7), 1191; https://doi.org/10.3390/math14071191 - 2 Apr 2026

Viewed by 473

Abstract

Teleoperated open-pit excavation requires fast and reliable 3D scene modeling under lightweight sensor configurations. To this end, this paper proposes a monocular camera–LiDAR fusion-based fast 3D Gaussian reconstruction method tailored for teleoperated open-pit excavation. The proposed approach uses only two sensors, a monocular camera and LiDAR, and integrates SPNet, a depth completion network, to improve the geometric completeness of the reconstructed scene. It further introduces a stride-aware initialization strategy that leverages the depth–stride correlation to jointly construct the initial Gaussian set and estimate the initial scales. During optimization, scale and color regularization are applied to prevent uncontrolled growth of Gaussians. Experiments in a Carla-simulated open-pit excavation scenario show that, under high-resolution input of 1920 × 1080, the proposed method achieves a stable 3D model update rate of approximately 2.5 Hz. The reconstruction quality under training viewpoints reaches PSNR 30.5388, SSIM 0.9161, and LPIPS 0.1333. Compared with 4DTAM and MonoGS, the proposed method achieves better overall reconstruction quality. It also maintains a much higher update rate than 4DTAM and a comparable update rate to MonoGS. Ablation studies further verify the critical contribution of the depth completion module and the stride-aware initialization strategy to the overall reconstruction performance. In addition, preliminary validation on field data further demonstrates the applicability of the proposed method under real-world open-pit excavation-loading conditions. The proposed method generates stable and usable 3D models of rock-pile working face under a lightweight sensor configuration, providing a reliable geometric basis for remote situational awareness and excavation assistance. Full article
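
For readers less familiar with the reported reconstruction metrics, PSNR can be computed as in the sketch below. It uses the standard definition over flat pixel sequences and assumes intensities scaled to [0, 1]; the function name and the toy data are illustrative and unrelated to the paper's evaluation code.

```python
import math

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [0.0, 0.0, 0.0, 0.0]
rendered = [0.1, 0.1, 0.1, 0.1]  # uniform error of 0.1 on a [0, 1] scale
value = psnr(reference, rendered)
```

Because PSNR is logarithmic in the mean squared error, each additional 10 dB corresponds to a tenfold reduction in that error.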

(This article belongs to the Special Issue Mathematical Modeling and Analysis in Mining Engineering)

15 pages, 3088 KB

Open Access Article

Lightweight Semantic Segmentation Algorithm Based on Gated Visual State Space Models

by Kui Di, Jinming Cheng, Lili Zhang and Yubin Bao

Electronics 2026, 15(6), 1175; https://doi.org/10.3390/electronics15061175 - 12 Mar 2026

Viewed by 643

Abstract

LiDAR serves as the primary sensor for acquiring environmental information in intelligent driving systems. However, under adverse weather conditions, point cloud signals obtained by LiDAR suffer from intensity attenuation and noise interference, leading to a decline in segmentation accuracy. To address these issues, this paper designs a lightweight semantic segmentation system based on the Gated Visual State Space Model (VMamba), named RainMamba. Specifically, the system utilizes spherical projection to transform point clouds into 2D sequences and constructs a physical perception feature embedding module guided by the Beer–Lambert law to explicitly model and suppress spatial noise at the source. Subsequently, an uncertainty-weighted cross-modal correction module is employed to incorporate RGB images for dynamically calibrating the degraded point cloud data. Finally, a VMamba backbone is adopted to establish global dependencies with linear complexity. Experimental results on the SemanticKITTI dataset demonstrate that the system achieves an inference speed of 83 FPS, with a relative mIoU improvement of approximately 7.2% compared to the real-time baseline PolarNet. Furthermore, zero-shot evaluations on the real-world SemanticSTF dataset validate the system’s robust Sim-to-Real generalization capability. Notably, RainMamba delivers highly competitive accuracy comparable to the state-of-the-art heavy-weight model PTv3 while requiring a significantly lower parameter footprint, thereby demonstrating its immense potential for practical edge-computing deployment. Full article
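
The spherical projection step that turns a point cloud into a 2D range image can be sketched as below. This follows the widely used range-image formulation, not RainMamba's own code; the 64 × 2048 resolution and the +3°/−25° vertical field of view are assumptions typical of the 64-beam sensor behind SemanticKITTI.

```python
import math

def spherical_projection(points, width=2048, height=64,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points to a range image; returns {(row, col): range}.

    Image size and vertical field of view are illustrative assumptions.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = {}
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)      # azimuth
        pitch = math.asin(z / r)    # elevation
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        col = min(width - 1, max(0, col))
        row = min(height - 1, max(0, row))
        # Keep the nearest return when several points share a pixel.
        if (row, col) not in image or r < image[(row, col)]:
            image[(row, col)] = r
    return image

image = spherical_projection([(10.0, 0.0, 0.0), (20.0, 0.0, 0.0)])
```

Both example points lie straight ahead at zero elevation, so they land in the same pixel and only the nearer 10 m return is kept.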

(This article belongs to the Special Issue Advanced Technologies and Future Trends in Visual Recognition and Signal Processing)

15 pages, 1892 KB

Open Access Article

Lightweight LiDAR-Based 3D Human Pose Estimation via 2D Depth Images for Autonomous Driving

by Gyu-Yeon Kim, Somi Park, Sunkyung Lee, Bobin Seo, Seon-Han Choi and Sung-Min Park

Sensors 2026, 26(5), 1631; https://doi.org/10.3390/s26051631 - 5 Mar 2026

Viewed by 638

Abstract

Real-world traffic is highly dynamic, with pedestrians exhibiting unpredictable movements. Pedestrians’ poses are essential cues for predicting their actions, enabling vehicles to respond proactively and reduce accident risks. In autonomous driving, the distance between vehicles and pedestrians is critical, making 3D human pose estimation crucial. In this context, pedestrian pose estimation has been actively studied, and recently, light detection and ranging (LiDAR) sensors have attracted attention due to their accurate 3D depth information and privacy benefits. However, existing LiDAR-based 3D pose estimation methods mainly process 3D data directly, requiring high computational cost and memory. In this paper, we propose a lightweight LiDAR-based 3D human pose estimation method specifically designed for deployment in autonomous driving systems. Unlike conventional 3D direct processing methods, our approach strategically reduces computational complexity by projecting point clouds into 2D depth images and leveraging a lightweight MoveNet, followed by efficient 3D lifting. Furthermore, we introduce a self-occlusion correction algorithm to improve robustness under side-view and bending poses, where depth-based projections often suffer from distortion. Experimental results on benchmark datasets demonstrate that the proposed method achieves competitive pose estimation accuracy while substantially improving efficiency, highlighting its practicality and scalability for real-time autonomous vehicle applications. Full article

(This article belongs to the Special Issue Recent Advances in LiDAR Sensing Technology for Autonomous Vehicles)

20 pages, 3202 KB

Open Access Article

Robust LiDAR-Based Train Detection via Point Cloud Segmentation for Railway Safety

by Yuxing Yang, Siyue Yu and Jimin Xiao

Sensors 2026, 26(5), 1514; https://doi.org/10.3390/s26051514 - 27 Feb 2026

Cited by 1 | Viewed by 621

Abstract

Ensuring railway safety requires reliable monitoring of trains in critical safety areas, such as station throat zones and railway crossings. Compared with cameras, roadside LiDAR can more reliably capture the geometry of trains under low-light, high-speed, and adverse weather conditions. However, industrial LiDAR solutions still primarily use the background comparison technique, which compares each sample against a pre-recorded clean map and then applies a size-based filter. Such approaches are highly sensitive to point cloud background changes arising from varying LiDAR installation distances, train speeds, and surface materials, often resulting in fragmented clustering and missed detections. In this paper, train detection is reformulated as a point-level semantic segmentation problem. A lightweight 3D segmentation network that directly predicts train points from raw data is designed, and clustering-based post-processing is applied to generate train-level events in real time. Experiments on real railway data under various operating conditions show that the proposed method achieves higher detection accuracy and greater robustness than traditional compare-based methods and representative deep learning benchmark methods, and is therefore suitable for practical railway safety monitoring. Full article
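
The background comparison baseline that the abstract argues against can be sketched in a few lines, which also shows why it is sensitive to background changes. This illustrates the traditional approach only, not the proposed segmentation network; the voxel size, the 5 m length threshold, and all function names are assumptions.

```python
import math

def voxel_key(point, voxel):
    """Integer voxel index of a 3D point."""
    return tuple(math.floor(c / voxel) for c in point)

def build_background(clean_points, voxel=0.5):
    """Record which voxels are occupied in a pre-recorded clean map."""
    return {voxel_key(p, voxel) for p in clean_points}

def foreground_points(frame, background, voxel=0.5):
    """Keep points that fall in voxels not seen in the clean map."""
    return [p for p in frame if voxel_key(p, voxel) not in background]

def passes_size_filter(points, min_length=5.0):
    """Crude size-based filter: extent along x must reach min_length metres."""
    if not points:
        return False
    xs = [p[0] for p in points]
    return max(xs) - min(xs) >= min_length

background = build_background([(0.05, 0.05, 0.05)])
train = [(float(x), 1.0, 1.0) for x in range(10, 21)]  # synthetic 10 m object
frame = [(0.1, 0.1, 0.1)] + train
foreground = foreground_points(frame, background)
detected = passes_size_filter(foreground)
```

Any shift in the static scene changes which voxels count as background, so spurious foreground appears; a point-level segmentation model avoids depending on a fixed reference map.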

(This article belongs to the Section Intelligent Sensors)

19 pages, 20762 KB

Open Access Editor’s Choice Article

Asymmetric Explicit Synergy for Multi-Modal 3D Gaussian Pre-Training in Autonomous Driving

by Dingwei Zhang, Jie Ji, Chengjun Huang, Bichun Li, Chennian Yu, Chenhui Qu, Zhengyuan Yang, Chen Hua and Biao Yu

World Electr. Veh. J. 2026, 17(2), 102; https://doi.org/10.3390/wevj17020102 - 19 Feb 2026

Viewed by 975

Abstract

Generative pre-training via neural rendering has become a cornerstone for scaling 3D perception in autonomous driving. However, prevalent approaches relying on implicit Neural Radiance Fields (NeRFs) face two fundamental limitations: the shape-radiance ambiguity inherent in vision-centric optimization and the prohibitive computational overhead of volumetric ray marching. To address these challenges, we propose AES-Gaussian, a novel multi-modal pre-training framework grounded in the efficient 3D Gaussian Splatting (3DGS) representation. Diverging from symmetric fusion paradigms, our core innovation is an Asymmetric Encoder architecture that couples a deep semantic vision backbone with a lightweight, physics-aware LiDAR branch. In this framework, LiDAR data serve not merely for semantic extraction, but as sparse physical anchors. By employing a novel Explicit Feature Synergy mechanism, we directly inject raw LiDAR intensity and depth priors into the Gaussian decoding process, thereby rigidly constraining scene geometry in open-world environments. Extensive empirical validation on the nuScenes dataset demonstrates the superiority of our approach. AES-Gaussian achieves state-of-the-art transfer performance, yielding a substantial 7.0% improvement in NDS for 3D Object Detection and a 4.8% mIoU gain in 3D semantic occupancy prediction compared to baselines. Notably, our method reduces geometric reconstruction error by over 50% while significantly improving training and inference efficiency, attributed to the streamlined asymmetric design and rapid Gaussian rasterization. Ultimately, by enhancing both perception accuracy and system efficiency, this work contributes to the development of safer and more reliable autonomous driving systems. Full article

(This article belongs to the Section Automated and Connected Vehicles)

25 pages, 12559 KB

Open Access Article

Design and Implementation of a Low-Cost Perception System for Aerial Robots in Confined Spaces

by Susan Basnet, Jens Christian Andersen and Evangelos Boukas

Sensors 2026, 26(4), 1140; https://doi.org/10.3390/s26041140 - 10 Feb 2026

Viewed by 1071

Abstract

Operating an aerial vehicle in a confined space, such as a vessel ballast tank, is a major challenge in terms of localization, perception, and control due to limited visibility, constrained maneuvering space, and the absence of reliable (if any) GNSS signals. This paper addresses the design considerations for a quadcopter in confined spaces, focusing on a novel perception system using 12 VL53L8CX time-of-flight (ToF) sensors from STMicroelectronics. These sensors provide enhanced perception and collision avoidance while flying in confined spaces, making them a suitable alternative to bulky LiDAR systems while reducing weight, cost, and required computational power. The sensors are placed strategically around the quadcopter to cover a 360° radial view within a 4 m range. Experiments are conducted to test the reliability and repeatability of the integrated system, along with its synchronization feature. Furthermore, the applicability is verified by flying in confined and cluttered spaces, both in simulation and in the real world. This design and study aim to establish a baseline for lightweight, compact, and safe navigation for small drones in confined and featureless environments.

(This article belongs to the Section Sensors and Robotics)

27 pages, 6570 KB

Open Access Article

LiDAR–Inertial–Visual Odometry Based on Elastic Registration and Dynamic Feature Removal

by Qiang Ma, Fuhong Qin, Peng Xiao, Meng Wei, Sihong Chen, Wenbo Xu, Xingrui Yue, Ruicheng Xu and Zheng He

Electronics 2026, 15(4), 741; https://doi.org/10.3390/electronics15040741 - 9 Feb 2026

Cited by 1 | Viewed by 914

Abstract

Simultaneous Localization and Mapping (SLAM) is a fundamental capability for autonomous robots. However, in highly dynamic scenes, conventional SLAM systems often suffer from degraded accuracy due to LiDAR motion distortion and interference from moving objects. To address these challenges, this paper proposes a LiDAR–Inertial–Visual odometry framework based on elastic registration and dynamic feature removal, with the aim of enhancing system robustness through detailed algorithmic supplements. In the LiDAR odometry module, an elastic registration-based de-skewing method is introduced by modeling second-order motion, enabling accurate point cloud correction under non-uniform motion. In the visual odometry module, a multi-strategy dynamic feature suppression mechanism is developed, combining IMU-assisted motion consistency verification with a lightweight YOLOv5-based detection network to effectively filter out dynamic interference with low computational overhead. Furthermore, depth information for visual key points is recovered using LiDAR assistance to enable tightly coupled pose estimation. Extensive experiments on the TUM and M2DGR datasets demonstrate that the proposed method achieves a 96.3% reduction in absolute trajectory error (ATE) compared with ORB-SLAM2 in highly dynamic scenarios. Real-world deployment on an embedded computing device further confirms the framework’s real-time performance and practical applicability in complex environments.

Search Results (143)
