Search Results (479)

Search Parameters:
Keywords = KITTI dataset

13 pages, 4728 KiB  
Article
Stereo Direct Sparse Visual–Inertial Odometry with Efficient Second-Order Minimization
by Chenhui Fu and Jiangang Lu
Sensors 2025, 25(15), 4852; https://doi.org/10.3390/s25154852 - 7 Aug 2025
Abstract
Visual–inertial odometry (VIO) is a core enabling technology for autonomous systems, but it faces three major challenges: initialization sensitivity, dynamic illumination, and multi-sensor fusion. To overcome these challenges, this paper proposes a stereo direct sparse visual–inertial odometry system with efficient second-order minimization. It is implemented entirely with the direct method and comprises a depth initialization module based on visual–inertial alignment, a stereo image tracking module, and a marginalization module. Inertial measurement unit (IMU) data are first aligned with a stereo image to initialize the system. Then, based on the efficient second-order minimization (ESM) algorithm, the photometric and inertial errors are minimized to jointly optimize camera poses and sparse scene geometry. IMU measurements are accumulated across frames via preintegration and inserted into the optimization as additional constraints between keyframes. The marginalization module reduces the computational complexity of the optimization while retaining information about previous states. The proposed system is evaluated on the KITTI visual odometry benchmark and the EuRoC dataset; the results demonstrate state-of-the-art accuracy and robustness.
(This article belongs to the Section Vehicular Sensing)
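
To make the ESM idea concrete, here is a minimal sketch of ESM-style direct image alignment, reduced to a 2-DoF translation warp. The paper applies the same principle to full camera poses with joint photometric and inertial terms; everything below (function names, the translation-only warp) is an illustrative assumption, not the authors' code.

```python
# Minimal sketch: ESM-style direct alignment for a pure 2D translation.
import numpy as np
from scipy.ndimage import map_coordinates, sobel

def esm_translation(ref, cur, iters=20):
    """Estimate p such that cur(x + p) ~ ref(x) for grayscale images."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    gxr, gyr = sobel(ref, axis=1) / 8.0, sobel(ref, axis=0) / 8.0
    gxc_full, gyc_full = sobel(cur, axis=1) / 8.0, sobel(cur, axis=0) / 8.0
    p = np.zeros(2)
    for _ in range(iters):
        coords = [ys + p[1], xs + p[0]]
        warped = map_coordinates(cur, coords, order=1)
        gxc = map_coordinates(gxc_full, coords, order=1)
        gyc = map_coordinates(gyc_full, coords, order=1)
        # ESM trick: average the reference and warped-current gradients to get
        # near-second-order convergence at first-order cost.
        J = 0.5 * np.stack([(gxr + gxc).ravel(), (gyr + gyc).ravel()], axis=1)
        r = (warped - ref).ravel()                  # photometric residual
        dp, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p += dp
        if np.linalg.norm(dp) < 1e-4:
            break
    return p
```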

19 pages, 1408 KiB  
Article
Self-Supervised Learning of End-to-End 3D LiDAR Odometry for Urban Scene Modeling
by Shuting Chen, Zhiyong Wang, Chengxi Hong, Yanwen Sun, Hong Jia and Weiquan Liu
Remote Sens. 2025, 17(15), 2661; https://doi.org/10.3390/rs17152661 - 1 Aug 2025
Viewed by 292
Abstract
Accurate and robust spatial perception is fundamental for dynamic 3D city modeling and urban environmental sensing. High-resolution remote sensing data, particularly LiDAR point clouds, are pivotal for these tasks due to their lighting invariance and precise geometric information. However, processing and aligning sequential LiDAR point clouds in complex urban environments presents significant challenges: traditional point-based or feature-matching methods are often sensitive to urban dynamics (e.g., moving vehicles and pedestrians) and struggle to establish reliable correspondences. While deep learning offers solutions, current approaches to point cloud alignment exhibit key limitations: self-supervised losses often neglect inherent alignment uncertainties, and supervised methods require costly pixel-level correspondence annotations. To address these challenges, we propose UnMinkLO-Net, an end-to-end self-supervised LiDAR odometry framework. Our method (1) efficiently encodes 3D point cloud structure using voxel-based sparse convolution and (2) models inherent alignment uncertainty via covariance matrices, enabling a novel self-supervised loss based on uncertainty modeling. Extensive evaluations on the KITTI urban dataset demonstrate UnMinkLO-Net's effectiveness in achieving highly accurate point cloud registration. By eliminating the need for manual annotations, our self-supervised approach provides a powerful foundation for processing and analyzing LiDAR data within multi-sensor urban sensing frameworks.
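
The abstract does not give the exact form of the uncertainty-based loss, but a standard way to realize "alignment uncertainty via covariance matrices" is a Gaussian negative log-likelihood over per-point residuals, with the network predicting a Cholesky factor of each point's precision matrix. A hedged PyTorch sketch (all names are placeholders):

```python
import torch

def nll_alignment_loss(residuals, log_diag, off_diag):
    """Gaussian negative log-likelihood of per-point alignment residuals.

    residuals: (N, 3) errors after applying the predicted pose.
    log_diag:  (N, 3) log-diagonal of a Cholesky factor L of each point's
               precision matrix (positive by construction via exp).
    off_diag:  (N, 3) the three strictly-lower entries of L.
    """
    N = residuals.shape[0]
    L = torch.zeros(N, 3, 3, device=residuals.device)
    idx = torch.tril_indices(3, 3, offset=-1)
    L[:, idx[0], idx[1]] = off_diag
    L[:, [0, 1, 2], [0, 1, 2]] = log_diag.exp()
    # NLL = 0.5 * r^T (L L^T) r - log|L|: confident points are weighted up
    # but penalized if their residuals remain large.
    w = torch.einsum('nij,ni->nj', L, residuals)   # L^T r
    mahalanobis = 0.5 * (w ** 2).sum(dim=1)
    log_det = log_diag.sum(dim=1)                  # log|L| = sum(log diag)
    return (mahalanobis - log_det).mean()
```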

28 pages, 7472 KiB  
Article
Small but Mighty: A Lightweight Feature Enhancement Strategy for LiDAR Odometry in Challenging Environments
by Jiaping Chen, Kebin Jia and Zhihao Wei
Remote Sens. 2025, 17(15), 2656; https://doi.org/10.3390/rs17152656 - 31 Jul 2025
Viewed by 193
Abstract
LiDAR-based Simultaneous Localization and Mapping (SLAM) serves as a fundamental technology for autonomous navigation. However, in complex environments, LiDAR odometry often experiences degraded localization accuracy and robustness. This paper proposes a computationally efficient enhancement strategy for LiDAR odometry that improves system performance by reinforcing high-quality features throughout the optimization process. For non-ground features, the method employs statistical geometric analysis to identify stable points and incorporates a contribution-weighted optimization scheme to strengthen their impact in point-to-plane and point-to-line constraints. In parallel, for ground features, locally stable planar surfaces are fitted to replace discrete point correspondences, enabling more consistent point-to-plane constraint formulation during ground registration. Experimental results on the KITTI and M2DGR datasets demonstrate that the proposed method significantly improves localization accuracy and system robustness while preserving real-time performance with minimal computational overhead. The gains are particularly notable in scenarios dominated by unstructured environments.
(This article belongs to the Special Issue Laser Scanning in Environmental and Engineering Applications)
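
As a sketch of what "contribution-weighted optimization" can look like in point-to-plane registration, the snippet below applies per-feature weights inside one Gauss-Newton step for a small pose update. The weighting policy and names are assumptions, and the paper's point-to-line term is omitted for brevity.

```python
import numpy as np

def weighted_point_to_plane_step(src, tgt, normals, w):
    """One weighted Gauss-Newton step for a small pose update [omega, t].

    src, tgt: (N, 3) matched source/target points; normals: (N, 3) unit
    normals at tgt; w: (N,) per-feature contribution weights. Linearizes
    R ~ I + [omega]_x, so it is only valid for small increments.
    """
    r = np.einsum('ni,ni->n', src - tgt, normals)     # point-to-plane residuals
    J = np.hstack([np.cross(src, normals), normals])  # (N, 6) Jacobian
    A = J.T @ (w[:, None] * J)                        # weighted normal equations
    b = -J.T @ (w * r)
    return np.linalg.solve(A, b)                      # high-quality features dominate
```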

23 pages, 4256 KiB  
Article
A GAN-Based Framework with Dynamic Adaptive Attention for Multi-Class Image Segmentation in Autonomous Driving
by Bashir Sheikh Abdullahi Jama and Mehmet Hacibeyoglu
Appl. Sci. 2025, 15(15), 8162; https://doi.org/10.3390/app15158162 - 22 Jul 2025
Viewed by 242
Abstract
Image segmentation is a foundation of autonomous driving frameworks, enabling vehicles to perceive and navigate their surroundings. It provides essential context for decision-making by partitioning an image into meaningful regions such as roads, vehicles, pedestrians, and traffic signs. Accurate segmentation is critical for safe navigation, collision avoidance, and compliance with traffic rules in self-driving cars. Recent deep learning-based segmentation models have demonstrated impressive performance in structured environments, yet they often fall short under the complex and unpredictable conditions encountered in autonomous driving. This study proposes an Adaptive Ensemble Attention (AEA) mechanism within a Generative Adversarial Network (GAN) architecture to handle dynamic and complex driving conditions. The AEA adaptively integrates self, spatial, and channel attention, dynamically adjusting each branch's contribution according to the input and its context. The discriminator network of the GAN evaluates the segmentation mask created by the generator, distinguishing real from fake masks by examining a concatenated pair of the original image and its mask. Adversarial training prompts the generator to produce masks that align with the expected ground truth while remaining realistic, and the exchange of information between generator and discriminator improves segmentation quality. To assess the accuracy of the proposed method, average IoU was computed on three widely used datasets, BDD100K, Cityscapes, and KITTI, yielding 89.46%, 89.02%, and 88.13%, respectively. These outcomes underscore the model's effectiveness and consistency. Overall, it achieved an accuracy of 98.94% and an AUC of 98.4%, indicating strong improvements over state-of-the-art (SOTA) models.
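
One plausible reading of the Adaptive Ensemble Attention mechanism is a gating network that predicts per-input mixing weights over self-, spatial-, and channel-attention branches. The PyTorch sketch below is an assumption-laden illustration of that pattern, not the paper's architecture; the branch implementations are lightweight stand-ins.

```python
import torch
import torch.nn as nn

class AdaptiveEnsembleAttention(nn.Module):
    """Illustrative adaptive fusion of three attention branches.

    A gating head predicts one weight per branch from the pooled input, so
    the mix changes with scene content. `c` must be divisible by `heads`.
    """
    def __init__(self, c, heads=4):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())
        self.self_attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.gate = nn.Linear(c, 3)            # one logit per branch

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        f_ch = x * self.channel(x)             # channel attention
        f_sp = x * self.spatial(x)             # spatial attention
        seq = x.flatten(2).transpose(1, 2)     # (B, HW, C) tokens
        f_sa, _ = self.self_attn(seq, seq, seq)
        f_sa = f_sa.transpose(1, 2).reshape(b, c, h, w)
        # Context-dependent mixing weights, normalized over the three branches.
        alpha = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=-1)  # (B, 3)
        a = alpha.view(b, 3, 1, 1, 1)
        return a[:, 0] * f_ch + a[:, 1] * f_sp + a[:, 2] * f_sa
```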

25 pages, 8560 KiB  
Article
Visual Point Cloud Map Construction and Matching Localization for Autonomous Vehicle
by Shuchen Xu, Kedong Zhao, Yongrong Sun, Xiyu Fu and Kang Luo
Drones 2025, 9(7), 511; https://doi.org/10.3390/drones9070511 - 21 Jul 2025
Viewed by 353
Abstract
Collaboration between autonomous vehicles and drones can enhance the efficiency and connectivity of three-dimensional transportation systems. When satellite signals are unavailable, vehicles can achieve accurate localization by matching rich ground environmental data to digital maps, simultaneously providing auxiliary localization information for drones. However, conventional digital maps suffer from high construction costs, frequent misalignment, and low localization accuracy. This paper therefore proposes visual point cloud map (VPCM) construction and matching localization for autonomous vehicles. We fuse multi-source information from vehicle-mounted sensors and the regional road network to establish a geographically referenced, high-precision VPCM. In the absence of satellite signals, we segment the prior VPCM along the road network based on real-time localization results, which accelerates matching and reduces the probability of mismatches. Simultaneously, by continuously introducing matching constraints between the real-time point cloud and the prior VPCM through an improved iterative closest point (ICP) method, the proposed solution effectively suppresses odometry drift and outputs accurate fused localization results based on pose graph optimization. Experiments on the KITTI datasets demonstrate the effectiveness of the proposed method, which autonomously constructs a high-precision prior VPCM. The localization strategy achieves sub-meter accuracy and reduces the average per-frame error by 25.84% compared to similar methods. The method's reusability and robustness under lighting and environmental changes were further verified on a campus dataset: compared to a similar camera-based method, the matching success rate increased by 21.15% and the average localization error decreased by 62.39%.
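
The improved ICP variant is not specified in the abstract, but its classical core, which the paper builds on with road-network map segmentation and pose-graph fusion, looks like this minimal sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, tgt, iters=30, tol=1e-6):
    """Minimal point-to-point ICP aligning src (N,3) onto tgt (M,3).

    Returns (R, t) with R @ src_i + t ~ nearest tgt point.
    """
    tree = cKDTree(tgt)
    cur = src.copy()
    R_tot, t_tot = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(cur)                 # nearest-neighbor matches
        if abs(prev_err - dist.mean()) < tol:
            break
        prev_err = dist.mean()
        q = tgt[idx]
        mu_p, mu_q = cur.mean(0), q.mean(0)
        H = (cur - mu_p).T @ (q - mu_q)
        U, _, Vt = np.linalg.svd(H)                 # Kabsch/Umeyama rotation
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_q - R @ mu_p
        cur = cur @ R.T + t                         # apply the increment
        R_tot, t_tot = R @ R_tot, R @ t_tot + t     # accumulate the transform
    return R_tot, t_tot
```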

17 pages, 4914 KiB  
Article
Large-Scale Point Cloud Semantic Segmentation with Density-Based Grid Decimation
by Liangcun Jiang, Jiacheng Ma, Han Zhou, Boyi Shangguan, Hongyu Xiao and Zeqiang Chen
ISPRS Int. J. Geo-Inf. 2025, 14(7), 279; https://doi.org/10.3390/ijgi14070279 - 17 Jul 2025
Viewed by 487
Abstract
Accurate segmentation of point clouds into categories such as roads, buildings, and trees is critical for applications in 3D reconstruction and autonomous driving. However, large-scale point cloud segmentation faces challenges such as uneven density distribution, inefficient sampling, and limited feature extraction capability. To address these issues, this paper proposes RT-Net, a novel framework that incorporates a density-based grid decimation algorithm for efficient preprocessing of outdoor point clouds, alleviating uneven density distribution and improving computational efficiency. RT-Net also introduces two modules: Local Attention Aggregation, which extracts local detailed features of points using an attention mechanism, enhancing the model's ability to recognize small objects; and Attention Residual, which integrates local details of point clouds with global features via an attention mechanism to improve generalization. Experimental results on the Toronto3D, Semantic3D, and SemanticKITTI datasets demonstrate the superiority of RT-Net for small-object segmentation, achieving state-of-the-art mean Intersection over Union (mIoU) scores of 86.79% on Toronto3D and 79.88% on Semantic3D.
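
As an illustration of what density-based grid decimation can look like, the sketch below caps the number of points kept per voxel, so dense near-range regions are thinned while sparse far-range regions survive intact. The cell size, cap, and sampling policy are assumptions; the paper's algorithm may differ.

```python
import numpy as np

def density_grid_decimate(points, cell=0.3, max_per_cell=8, rng=None):
    """Sketch of density-based grid decimation for outdoor point clouds.

    points: (N, 3+) array; the first three columns are x, y, z.
    Dense voxels are subsampled to `max_per_cell` points; sparse voxels
    are kept whole, evening out LiDAR's near/far density imbalance.
    """
    rng = rng or np.random.default_rng(0)
    keys = np.floor(points[:, :3] / cell).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    order = np.argsort(inverse)                       # group indices by voxel
    keep = []
    for cell_idx in np.split(order, np.cumsum(np.bincount(inverse))[:-1]):
        if len(cell_idx) > max_per_cell:
            cell_idx = rng.choice(cell_idx, max_per_cell, replace=False)
        keep.append(cell_idx)
    return points[np.concatenate(keep)]
```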

39 pages, 7470 KiB  
Article
Estimation of Fractal Dimension and Semantic Segmentation of Motion-Blurred Images by Knowledge Distillation in Autonomous Vehicle
by Seong In Jeong, Min Su Jeong and Kang Ryoung Park
Fractal Fract. 2025, 9(7), 460; https://doi.org/10.3390/fractalfract9070460 - 15 Jul 2025
Viewed by 406
Abstract
Research on semantic segmentation for remote sensing road scenes has advanced significantly, driven by autonomous driving technology. However, motion blur from camera or subject movement hampers segmentation performance. To address this issue, we propose a knowledge distillation-based semantic segmentation network (KDS-Net) that is robust to motion blur and eliminates the need for image restoration networks. KDS-Net leverages novel knowledge distillation techniques and an edge-enhanced segmentation loss to refine edge regions and improve segmentation precision across various receptive fields. To enhance the interpretability of segmentation quality under motion blur, we incorporate fractal dimension estimation to quantify the geometric complexity of class-specific regions, allowing a structural assessment of the predictions generated by the proposed knowledge distillation framework for autonomous driving. Experiments on well-known motion-blurred remote sensing road scene datasets (CamVid and KITTI) demonstrate mean IoU scores of 72.42% and 59.29%, respectively, surpassing state-of-the-art methods. Additionally, the lightweight KDS-Net (21.44 M parameters) enables real-time edge computing, mitigating the data privacy concerns and communication overheads of Internet of Vehicles scenarios.
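
The fractal dimension used here to quantify geometric complexity is typically estimated by box counting: count the occupied boxes N(s) at several box sizes s and fit the slope of log N(s) against log s. A minimal sketch for a binary class mask (assuming the mask is non-empty at every scale):

```python
import numpy as np

def box_counting_dimension(mask, scales=(1, 2, 4, 8, 16, 32)):
    """Estimate the box-counting (fractal) dimension of a 2D binary mask.

    Fits log N(s) = -D * log s + c; the slope D quantifies boundary
    complexity of a predicted class region.
    """
    h, w = mask.shape
    counts = []
    for s in scales:
        hh, ww = h // s * s, w // s * s          # crop to a multiple of s
        boxes = mask[:hh, :ww].reshape(hh // s, s, ww // s, s)
        counts.append(np.count_nonzero(boxes.any(axis=(1, 3))))
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return -slope
```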

21 pages, 4044 KiB  
Article
DK-SLAM: Monocular Visual SLAM with Deep Keypoint Learning, Tracking, and Loop Closing
by Hao Qu, Lilian Zhang, Jun Mao, Junbo Tie, Xiaofeng He, Xiaoping Hu, Yifei Shi and Changhao Chen
Appl. Sci. 2025, 15(14), 7838; https://doi.org/10.3390/app15147838 - 13 Jul 2025
Viewed by 426
Abstract
The performance of visual SLAM in complex, real-world scenarios is often compromised by unreliable feature extraction and matching when using handcrafted features. Although deep learning-based local features excel at capturing high-level information and perform well on matching benchmarks, they struggle to generalize in continuous motion scenes, adversely affecting loop detection accuracy. To address this, we propose DK-SLAM, a monocular visual SLAM system with deep keypoint learning, tracking, and loop closing. Our system employs a Model-Agnostic Meta-Learning (MAML) strategy to optimize the training of the keypoint extraction network, enhancing its adaptability to diverse environments. Additionally, we introduce a coarse-to-fine feature tracking mechanism for the learned keypoints: a direct method first approximates the relative pose between consecutive frames, and a feature matching method then refines the pose estimate. To mitigate cumulative positioning errors, DK-SLAM incorporates a novel online learning module that uses binary features for loop closure detection, dynamically identifying loop nodes within a sequence to ensure accurate and efficient localization. Evaluations on publicly available datasets demonstrate that DK-SLAM outperforms leading traditional and learning-based SLAM systems, such as ORB-SLAM3 and LIFT-SLAM: it achieves 17.7% better translation accuracy and 24.2% better rotation accuracy than ORB-SLAM3 on KITTI, and 34.2% better translation accuracy on EuRoC. These results underscore the efficacy and robustness of DK-SLAM in varied and challenging real-world environments.
(This article belongs to the Section Robotics and Automation)
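
For readers unfamiliar with MAML, the sketch below shows the generic inner/outer-loop structure a keypoint network could be meta-trained with. The task construction, `loss_fn`, and hyperparameters are placeholders, since the abstract does not specify them; this is a pattern illustration, not DK-SLAM's training code.

```python
import torch

def maml_step(model, tasks, loss_fn, meta_opt, inner_lr=0.01):
    """One generic MAML meta-training step.

    tasks: list of ((x_s, y_s), (x_q, y_q)) support/query batches, one pair
    per environment (e.g., per driving sequence). loss_fn compares network
    outputs to targets.
    """
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner loop: adapt a differentiable copy of the weights to the task.
        loss_s = loss_fn(torch.func.functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(loss_s, list(params.values()),
                                    create_graph=True)
        adapted = {k: p - inner_lr * g
                   for (k, p), g in zip(params.items(), grads)}
        # Outer objective: the adapted weights must work on held-out query data.
        meta_loss = meta_loss + loss_fn(
            torch.func.functional_call(model, adapted, (x_q,)), y_q)
    meta_opt.zero_grad()
    meta_loss.backward()   # second-order terms flow through `adapted`
    meta_opt.step()
    return float(meta_loss)
```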

20 pages, 3710 KiB  
Article
An Accurate LiDAR-Inertial SLAM Based on Multi-Category Feature Extraction and Matching
by Nuo Li, Yiqing Yao, Xiaosu Xu, Shuai Zhou and Taihong Yang
Remote Sens. 2025, 17(14), 2425; https://doi.org/10.3390/rs17142425 - 12 Jul 2025
Viewed by 450
Abstract
Light Detection and Ranging (LiDAR)-inertial simultaneous localization and mapping (SLAM) is a critical component of multi-sensor autonomous navigation systems, providing both accurate pose estimation and detailed environmental understanding. Despite its importance, existing optimization-based LiDAR-inertial SLAM methods often face key limitations: unreliable feature extraction, sensitivity to noise and sparsity, and the inclusion of redundant or low-quality feature correspondences. These weaknesses hinder performance in complex or dynamic environments and fail to meet the reliability requirements of autonomous systems. To overcome these challenges, we propose a novel, accurate LiDAR-inertial SLAM framework with three major contributions. First, we employ a robust multi-category feature extraction method based on principal component analysis (PCA), which filters out noisy and weakly structured points, ensuring stable feature representation. Second, to suppress outlier correspondences and enhance pose estimation reliability, we introduce a coarse-to-fine, two-stage feature correspondence selection strategy that evaluates geometric consistency and structural contribution. Third, we develop an adaptive weighted pose estimation scheme that considers both distance and directional consistency, improving the robustness of feature matching under varying scene conditions. These components are jointly optimized within a sliding-window factor graph integrating LiDAR feature factors, IMU preintegration, and loop closure constraints. Extensive experiments on public datasets (KITTI, M2DGR) and a custom-collected dataset validate the method's effectiveness: our system consistently outperforms state-of-the-art approaches in accuracy and robustness, particularly in scenes with sparse structure, motion distortion, and dynamic interference, demonstrating its suitability for reliable real-world deployment.
(This article belongs to the Special Issue LiDAR Technology for Autonomous Navigation and Mapping)
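
PCA-based multi-category feature extraction conventionally classifies each point by the sorted eigenvalues of its neighborhood covariance: a dominant λ1 indicates a linear (edge) structure, comparable λ1 and λ2 a planar (surface) one. A sketch under assumed thresholds (the paper's criteria and values may differ):

```python
import numpy as np

def pca_point_categories(cloud, neighbors_fn, lin_th=0.6, plan_th=0.6):
    """Label points as 'edge', 'surface', or 'unstable' via PCA descriptors.

    cloud: (N, 3) points. neighbors_fn(i) returns the indices of point i's
    k nearest neighbors (e.g., from a KD-tree); it is a placeholder here.
    """
    labels = []
    for i in range(len(cloud)):
        nbr = cloud[neighbors_fn(i)]                  # (k, 3) neighborhood
        cov = np.cov(nbr.T)
        l = np.sort(np.linalg.eigvalsh(cov))[::-1]    # l1 >= l2 >= l3
        if l[0] <= 1e-9:
            labels.append('unstable'); continue
        linearity = (l[0] - l[1]) / l[0]
        planarity = (l[1] - l[2]) / l[0]
        if linearity > lin_th:
            labels.append('edge')
        elif planarity > plan_th:
            labels.append('surface')
        else:
            labels.append('unstable')                 # noisy; filtered out
    return labels
```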

19 pages, 18048 KiB  
Article
Natural Occlusion-Based Backdoor Attacks: A Novel Approach to Compromising Pedestrian Detectors
by Qiong Li, Yalun Wu, Qihuan Li, Xiaoshu Cui, Yuanwan Chen, Xiaolin Chang, Jiqiang Liu and Wenjia Niu
Sensors 2025, 25(13), 4203; https://doi.org/10.3390/s25134203 - 5 Jul 2025
Viewed by 355
Abstract
Pedestrian detection systems are widely used in safety-critical domains such as autonomous driving, where deep neural networks accurately perceive individuals and distinguish them from other objects. However, their vulnerability to backdoor attacks remains understudied. Existing backdoor attacks, which rely on unnatural digital perturbations or explicit patches, are difficult to deploy stealthily in the physical world. In this paper, we propose a novel backdoor attack method that, for the first time, leverages real-world occlusions (e.g., backpacks) as natural triggers. We design a dynamically optimized heuristic strategy that adaptively adjusts the trigger's position and size for diverse occlusion scenarios, and we develop three model-independent trigger embedding mechanisms for attack implementation. Extensive experiments on two pedestrian detection models using publicly available datasets show that, while maintaining baseline performance, the backdoored models achieve average attack success rates of 75.1% on KITTI and 97.1% on CityPersons. Physical tests verify that pedestrians wearing backpack triggers can evade detection at varying shooting distances from iPhone cameras, although the attack fails when pedestrians rotate by 90°, confirming the practical feasibility of our method. Through ablation studies, we further investigate the impact of key parameters such as trigger patterns and poisoning rates on attack effectiveness, and we evaluate the method's resistance to defenses. This study reveals that common occlusion phenomena can serve as backdoor carriers, providing critical insights for designing physically robust pedestrian detection systems.
(This article belongs to the Special Issue Intelligent Traffic Safety and Security)

20 pages, 1993 KiB  
Article
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features
by Ruochen Zhang, Hyeung-Sik Choi, Dongwook Jung, Phan Huy Nam Anh, Sang-Ki Jeong and Zihao Zhu
Appl. Sci. 2025, 15(13), 7538; https://doi.org/10.3390/app15137538 - 4 Jul 2025
Viewed by 307
Abstract
Monocular 3D object detection is a challenging task in autonomous systems due to the lack of explicit depth information in single-view images. Existing methods often depend on external depth estimators or expensive sensors, which increase computational complexity and complicate integration into existing systems. To overcome these limitations, we propose AuxDepthNet, an efficient framework for real-time monocular 3D object detection that eliminates reliance on external depth maps or pre-trained depth models. AuxDepthNet introduces two key components: the Auxiliary Depth Feature (ADF) module, which implicitly learns depth-sensitive features to improve spatial reasoning and computational efficiency, and the Depth Position Mapping (DPM) module, which embeds depth positional information directly into the detection process to enable accurate object localization and 3D bounding box regression. Leveraging the DepthFusion Transformer (DFT) architecture, AuxDepthNet globally integrates visual and depth-sensitive features through depth-guided interactions, ensuring robust and efficient detection. Extensive experiments on the KITTI dataset show that AuxDepthNet achieves state-of-the-art performance, with AP3D scores of 24.72% (Easy), 18.63% (Moderate), and 15.31% (Hard), and APBEV scores of 34.11% (Easy), 25.18% (Moderate), and 21.90% (Hard) at an IoU threshold of 0.7.

21 pages, 6136 KiB  
Article
A ROS-Based Online System for 3D Gaussian Splatting Optimization: Flexible Frontend Integration and Real-Time Refinement
by Li’an Wang, Jian Xu, Xuan An, Yujie Ji, Yuxuan Wu and Zhaoyuan Ma
Sensors 2025, 25(13), 4151; https://doi.org/10.3390/s25134151 - 3 Jul 2025
Viewed by 582
Abstract
The 3D Gaussian splatting (3DGS) technique offers significant efficiency advantages in real-time scene reconstruction. However, when its initialization relies on traditional SfM methods (such as COLMAP), there are clear bottlenecks: high computational resource consumption and the decoupling of camera pose optimization from map construction. This paper proposes an online 3DGS optimization system based on ROS. Through a loose-coupling architecture, it realizes real-time data interaction between a frontend SfM/SLAM module and backend 3DGS optimization. Using ROS as middleware, the system can ingest keyframe poses and point-cloud data generated by any frontend algorithm (e.g., ORB-SLAM or COLMAP). With a dynamic sliding-window strategy and a rendering-quality loss that combines L1 and SSIM, it optimizes the 3DGS map online. Experiments show that, compared with the traditional COLMAP-3DGS pipeline, the system reduces initialization time by 90% and improves average PSNR by 1.9 dB on the TUM-RGBD, Tanks and Temples, and KITTI datasets.
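
A rendering-quality loss combining L1 and SSIM is conventionally the 3DGS photometric objective L = (1 − λ)·L1 + λ·(1 − SSIM), with λ = 0.2 in the original 3DGS paper; this system's exact weighting is not stated in the abstract. A simplified PyTorch sketch, using a uniform rather than Gaussian SSIM window:

```python
import torch
import torch.nn.functional as F

def ssim(a, b, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM for (B, C, H, W) images in [0, 1]; box window."""
    pad = window // 2
    mu_a = F.avg_pool2d(a, window, 1, pad)
    mu_b = F.avg_pool2d(b, window, 1, pad)
    var_a = F.avg_pool2d(a * a, window, 1, pad) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, window, 1, pad) - mu_b ** 2
    cov = F.avg_pool2d(a * b, window, 1, pad) - mu_a * mu_b
    s = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    return s.mean()

def render_loss(rendered, target, lam=0.2):
    """Weighted mix of L1 and (1 - SSIM); lam=0.2 follows the 3DGS paper."""
    l1 = (rendered - target).abs().mean()
    return (1 - lam) * l1 + lam * (1 - ssim(rendered, target))
```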

17 pages, 7477 KiB  
Article
The Development of a Lane Identification and Assessment Framework for Maintenance Using AI Technology
by Hohyuk Na, Do Gyeong Kim, Ji Min Kang and Chungwon Lee
Appl. Sci. 2025, 15(13), 7410; https://doi.org/10.3390/app15137410 - 1 Jul 2025
Viewed by 413
Abstract
This study proposes a vision-based framework to support autonomous vehicles (AVs) in maintaining stable lane-keeping by assessing the condition of lane markings. Unlike existing infrastructure standards focused on human visibility, it addresses the need for criteria suited to sensor-based AV environments. Using real driving data from urban expressways in Seoul, a YOLOv5-based lane detection algorithm was developed and enhanced through multi-label annotation and data augmentation. The model achieved a mean average precision (mAP) of 97.4% and demonstrated strong generalization on external datasets such as KITTI and TuSimple. For lane condition assessment, a pixel-occupancy-based method was applied, combined with Canny edge detection and morphological operations; an 80-pixel occupancy threshold was used to classify lane markings as intact or worn. The proposed framework reliably detected lane degradation under various road and lighting conditions. These results suggest that quantitative, image-based indicators can complement traditional standards and guide AV-oriented infrastructure policy. Limitations include a lack of adverse-weather data and dataset-specific threshold sensitivity.
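
The pixel-occupancy check maps naturally onto a few OpenCV calls; the sketch below is one hedged reading of the described pipeline, with the Canny thresholds and kernel size as assumptions (only the 80-pixel threshold comes from the abstract):

```python
import cv2
import numpy as np

def assess_lane_marking(roi_gray, occupancy_threshold=80):
    """Classify one lane-marking crop as 'intact' or 'worn'.

    roi_gray: grayscale crop around a detected lane marking (e.g., from the
    YOLOv5 detector). Edges are extracted with Canny, gaps are closed with a
    morphological operation, and the marking-pixel count is compared to the
    occupancy threshold.
    """
    edges = cv2.Canny(roi_gray, 50, 150)                      # assumed thresholds
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    occupancy = int(np.count_nonzero(closed))
    label = 'intact' if occupancy >= occupancy_threshold else 'worn'
    return label, occupancy
```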

17 pages, 1609 KiB  
Article
Parallel Multi-Scale Semantic-Depth Interactive Fusion Network for Depth Estimation
by Chenchen Fu, Sujunjie Sun, Ning Wei, Vincent Chau, Xueyong Xu and Weiwei Wu
J. Imaging 2025, 11(7), 218; https://doi.org/10.3390/jimaging11070218 - 1 Jul 2025
Viewed by 359
Abstract
Self-supervised depth estimation from monocular image sequences provides depth information without costly sensors such as LiDAR, offering significant value for autonomous driving. Although self-supervised algorithms reduce the dependence on labeled data, performance is still affected by scene occlusion, lighting differences, and sparse textures, and existing methods do not consider the enhancement and interactive fusion of features. In this paper, we propose a novel parallel multi-scale semantic-depth interactive fusion network. We adopt a multi-stage feature attention network for feature extraction and introduce a parallel semantic-depth interactive fusion module to refine edges. We also employ a metric loss based on semantic edges to take full advantage of semantic geometric information. The network is trained and evaluated on the KITTI dataset, and the experimental results show that it achieves competitive performance compared to existing methods.
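
The abstract does not detail the interactive fusion module; one plausible realization is bidirectional cross-attention in which each branch queries the other before merging. The PyTorch sketch below illustrates that pattern only and should not be read as the paper's block:

```python
import torch
import torch.nn as nn

class SemanticDepthFusion(nn.Module):
    """Illustrative cross-attention fusion of semantic and depth features.

    Each branch attends to the other, then the refined maps are merged.
    `c` must be divisible by `heads`.
    """
    def __init__(self, c, heads=4):
        super().__init__()
        self.sem_from_depth = nn.MultiheadAttention(c, heads, batch_first=True)
        self.depth_from_sem = nn.MultiheadAttention(c, heads, batch_first=True)
        self.merge = nn.Conv2d(2 * c, c, 1)

    def forward(self, f_sem, f_dep):            # both (B, C, H, W)
        b, c, h, w = f_sem.shape
        s = f_sem.flatten(2).transpose(1, 2)    # (B, HW, C) tokens
        d = f_dep.flatten(2).transpose(1, 2)
        s2, _ = self.sem_from_depth(s, d, d)    # semantics attend to depth
        d2, _ = self.depth_from_sem(d, s, s)    # depth attends to semantics
        s2 = s2.transpose(1, 2).reshape(b, c, h, w)
        d2 = d2.transpose(1, 2).reshape(b, c, h, w)
        # Residual connections keep each branch's original signal.
        return self.merge(torch.cat([s2 + f_sem, d2 + f_dep], dim=1))
```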

59 pages, 3738 KiB  
Article
A Survey of Visual SLAM Based on RGB-D Images Using Deep Learning and Comparative Study for VOE
by Van-Hung Le and Thi-Ha-Phuong Nguyen
Algorithms 2025, 18(7), 394; https://doi.org/10.3390/a18070394 - 27 Jun 2025
Viewed by 652
Abstract
Visual simultaneous localization and mapping (Visual SLAM) based on RGB-D image data comprises two main tasks: building an environment map and simultaneously tracking position and movement via visual odometry estimation (VOE). Visual SLAM and VOE are used in many applications, such as robot systems, autonomous mobile robots, assistance systems for the blind, human–machine interaction, and industry. Deep learning (DL) is an approach that gives very convincing results for the computer vision problems underlying Visual SLAM and VOE from RGB-D images. This manuscript examines the results, advantages, difficulties, and challenges of DL-based Visual SLAM and VOE. We propose a taxonomy for a complete survey based on three ways of constructing Visual SLAM and VOE from RGB-D images: (1) using DL for the modules of Visual SLAM and VOE systems; (2) using DL to supplement those modules; and (3) using end-to-end DL to build Visual SLAM and VOE systems. A total of 220 scientific publications on Visual SLAM, VOE, and related issues were surveyed, organized by method, dataset, evaluation measure, and detailed results. In particular, studies using DL to build Visual SLAM and VOE systems are analyzed for their challenges, advantages, and disadvantages. We also propose and publish the TQU-SLAM benchmark dataset, and we performed a comparative study on fine-tuning the VOE model using a Multi-Layer Fusion network (MLF-VO) framework. The VOE errors on the TQU-SLAM benchmark dataset range from 16.97 m to 57.61 m, a very large error compared to VOE methods on the KITTI, TUM RGB-D SLAM, and ICL-NUIM datasets. The dataset we publish is therefore very challenging, especially in the opposite direction (OP-D) of data collection and annotation. The results of the comparative study are presented in detail and made available.
(This article belongs to the Special Issue Advances in Deep Learning and Next-Generation Internet Technologies)
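
VOE errors like the 16.97 m to 57.61 m figures quoted above are conventionally absolute trajectory errors (ATE) computed after rigidly aligning the estimated trajectory to ground truth. A minimal sketch of that metric, assuming timestamp-matched positions (whether the survey uses exactly this variant is not stated):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE, meters) after rigid alignment.

    est, gt: (N, 3) estimated and ground-truth positions with matched
    timestamps. Uses the closed-form Kabsch/Umeyama solution (no scale).
    """
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                      # optimal rotation
    t = mu_g - R @ mu_e                     # optimal translation
    err = est @ R.T + t - gt                # residuals after alignment
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```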