Search Results (1,913)

Search Parameters:
Keywords = 3D scene

20 pages, 1652 KB  
Article
Classification of Point Cloud Data in Road Scenes Based on PointNet++
by Jingfeng Xue, Bin Zhao, Chunhong Zhao, Yueru Li and Yihao Cao
Sensors 2026, 26(1), 153; https://doi.org/10.3390/s26010153 - 25 Dec 2025
Abstract
Point cloud data, with its rich information and high-precision geometric details, holds significant value for urban road infrastructure surveying and management. To overcome the limitations of manual classification, this study employs deep learning techniques for automated point cloud feature extraction and classification, achieving high-precision object recognition in road scenes. By integrating the Princeton ModelNet40, ShapeNet, and Sydney Urban Objects datasets, we extracted 3D spatial coordinates from the Sydney Urban Objects Dataset and organized labeled point cloud files to build a comprehensive dataset reflecting real-world road scenarios. To address noise and occlusion-induced data gaps, three augmentation strategies were implemented: (1) Farthest Point Sampling (FPS): Preserves critical features while mitigating overfitting. (2) Random Z-axis rotation, translation, and scaling: Enhances model generalization. (3) Gaussian noise injection: Improves training sample realism. The PointNet++ framework was enhanced by integrating a point-filling method into the preprocessing module. Model training and prediction were conducted using its Multi-Scale Grouping (MSG) and Single-Scale Grouping (SSG) schemes. The model achieved an average training accuracy of 86.26% (peak single-instance accuracy: 98.54%; best category accuracy: 93.15%) and a test set accuracy of 97.41% (category accuracy: 84.50%). This study demonstrates successful road scene point cloud classification, providing valuable insights for point cloud data processing and related research. Full article
(This article belongs to the Section Sensing and Imaging)
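
The augmentation steps named in this abstract (farthest point sampling, random z-axis rotation/translation/scaling, Gaussian noise injection) are standard point cloud preprocessing operations. Below is a minimal NumPy sketch of how they are commonly implemented; it is an illustration under generic assumptions, not code from the paper, and the parameter values are arbitrary.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: keep k points that maximize mutual coverage. points: (N, 3)."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)
    for i in range(1, k):
        # Update each point's distance to the nearest already-selected point.
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        selected[i] = np.argmax(dist)
    return points[selected]

def augment(points, sigma=0.01, scale_range=(0.8, 1.25), shift=0.1):
    """Random z-axis rotation, scaling, translation, and Gaussian jitter."""
    theta = np.random.uniform(0, 2 * np.pi)
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0],
                      [np.sin(theta),  np.cos(theta), 0],
                      [0, 0, 1]])
    points = points @ rot_z.T                                   # rotate about z
    points = points * np.random.uniform(*scale_range)           # random scaling
    points = points + np.random.uniform(-shift, shift, (1, 3))  # random translation
    points = points + np.random.normal(0.0, sigma, points.shape)  # Gaussian noise injection
    return points
```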

9 pages, 2357 KB  
Proceeding Paper
AI-Enhanced Mono-View Geometry for Digital Twin 3D Visualization in Autonomous Driving
by Ing-Chau Chang, Yu-Chiao Chang, Chunghui Kuo and Chin-En Yen
Eng. Proc. 2025, 120(1), 6; https://doi.org/10.3390/engproc2025120006 - 25 Dec 2025
Abstract
To address the critical problem of 3D object detection in autonomous driving scenarios, we developed a novel digital twin architecture. This architecture combines AI models with geometric optics algorithms of camera systems for autonomous vehicles, characterized by low computational cost and high generalization capability. The architecture leverages monocular images to estimate the real-world heights and 3D positions of objects using vanishing lines and the pinhole camera model. The You Only Look Once (YOLOv11) object detection model is employed for accurate object category identification. These components are seamlessly integrated to construct a digital twin system capable of real-time reconstruction of the surrounding 3D environment. This enables the autonomous driving system to perform real-time monitoring and optimized decision-making. Compared with conventional deep-learning-based 3D object detection models, the architecture offers several notable advantages. Firstly, it mitigates the significant reliance on large-scale labeled datasets typically required by deep learning approaches. Secondly, its decision-making process inherently provides interpretability. Thirdly, it demonstrates robust generalization capabilities across diverse scenes and object types. Finally, its low computational complexity makes it particularly well-suited for resource-constrained in-vehicle edge devices. Preliminary experimental results validate the reliability of the proposed approach, showing a depth prediction error of less than 5% in driving scenarios. Furthermore, the proposed method achieves significantly faster runtime, corresponding to only 42, 27, and 22% of MonoAMNet, MonoSAID, and MonoDFNet, respectively. Full article
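
The core geometric idea in this abstract, recovering an object's distance and real-world height from a single image with the pinhole camera model and the vanishing line, can be sketched as below. The helper names, the flat-road assumption, and the pitch-free camera are illustrative assumptions, not details taken from the paper.

```python
def depth_from_ground_contact(v_bottom, v_horizon, focal_px, cam_height_m):
    """Distance to an object's ground-contact point, assuming a flat road and a
    level camera: Z = f * H / (v_bottom - v_horizon)."""
    dv = v_bottom - v_horizon            # pixels below the vanishing (horizon) line
    if dv <= 0:
        raise ValueError("Ground contact must lie below the horizon line.")
    return focal_px * cam_height_m / dv

def object_height(bbox_height_px, depth_m, focal_px):
    """Back-project a 2D bounding-box height into metres at the estimated depth."""
    return bbox_height_px * depth_m / focal_px

# Example: camera 1.5 m above the road, f = 1200 px, horizon at row 540, and a
# detected box whose bottom edge sits at row 690 and which is 120 px tall.
z = depth_from_ground_contact(690, 540, 1200.0, 1.5)   # ~12 m away
h = object_height(120, z, 1200.0)                      # ~1.2 m tall
```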

19 pages, 2025 KB  
Article
Bidirectional Complementary Cross-Attention and Temporal Adaptive Fusion for 3D Object Detection in Intelligent Transportation Scenes
by Di Tian, Jiawei Wang, Jiabo Li, Mingming Gong, Jiahang Shi, Zhongyi Huang and Zhongliang Fu
Electronics 2026, 15(1), 83; https://doi.org/10.3390/electronics15010083 - 24 Dec 2025
Abstract
Multi-sensor fusion represents a primary approach for enhancing environmental perception in intelligent transportation scenes. Among diverse fusion strategies, Bird’s-Eye View (BEV) perspective-based fusion methods have emerged as a prominent research focus owing to advantages such as unified spatial representation. However, current BEV fusion methods still face challenges with insufficient robustness in cross-modal alignment and weak perception of dynamic objects. To address these challenges, this paper proposes a Bidirectional Complementary Cross-Attention Module (BCCA), which achieves deep fusion of image and point cloud features by adaptively learning cross-modal attention weights, thereby significantly improving cross-modal information interaction. Secondly, we propose a Temporal Adaptive Fusion Module (TAFusion). This module effectively incorporates temporal information within the BEV space and enables efficient fusion of multi-modal features across different frames through a two-stage alignment strategy, substantially enhancing the model’s ability to perceive dynamic objects. Based on the above, we integrate these two modules to propose the Dual Temporal and Transversal Attention Network (DTTANet), a novel camera and LiDAR fusion framework. Comprehensive experiments demonstrate that our proposed method achieves improvements of 1.42% in mAP and 1.26% in NDS on the nuScenes dataset compared to baseline networks, effectively advancing the development of 3D object detection technology for intelligent transportation scenes. Full article
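
Bidirectional cross-attention between camera and LiDAR BEV features, in the spirit of the BCCA module described above, can be sketched with standard PyTorch multi-head attention. The layer sizes and the concatenate-and-project fusion are assumptions for illustration only, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Image features attend to LiDAR features and vice versa; the two enhanced
    streams are concatenated and projected back to the working dimension."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.img_queries_pts = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pts_queries_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, img_bev, pts_bev):
        # img_bev, pts_bev: (B, H*W, C) flattened camera / LiDAR BEV feature maps.
        img_enh, _ = self.img_queries_pts(img_bev, pts_bev, pts_bev)
        pts_enh, _ = self.pts_queries_img(pts_bev, img_bev, img_bev)
        return self.proj(torch.cat([img_bev + img_enh, pts_bev + pts_enh], dim=-1))

fused = BidirectionalCrossAttention()(torch.randn(2, 32 * 32, 256),
                                      torch.randn(2, 32 * 32, 256))
```
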
27 pages, 7980 KB  
Article
A Novel Data-Focusing Method for Highly Squinted MEO SAR Based on Spatially Variable Spectrum and NUFFT 2D Resampling
by Huguang Yao, Tao He, Pengbo Wang, Zhirong Men and Jie Chen
Remote Sens. 2026, 18(1), 49; https://doi.org/10.3390/rs18010049 - 24 Dec 2025
Abstract
Although the elevated orbit and highly squinted observation geometry bring advantages for medium-earth-orbit (MEO) synthetic aperture radar (SAR) in applications, they also complicate signal processing. The severe spatial variability of Doppler parameters and large extended range distribution of echo make it challenging for the traditional imaging algorithms to get the expected results. To quantify the variation, a spatially variable two-dimensional (SV2D) spectrum is established in this paper. The sufficient order and spatially variable terms allow it to preserve the features of targets both in the scene center and at the edge. In addition, the huge data volume and incomplete azimuth signals of edge targets, caused by the large range walk when MEO SAR operates in squinted mode, are alleviated by the variable pulse repetition interval (VPRI) technique. Based on this, a novel data-focusing method for highly squinted MEO SAR is proposed. The azimuth resampling, achieved through the non-uniform fast Fourier transform (NUFFT), eliminates the impact of most Doppler parameter space variation. Then, a novel imaging kernel is applied to accomplish target focusing. The spatially variable range cell migration (RCM) is corrected by a similar idea, with Doppler parameter equalization, and an accurate high-order phase filter derived from the SV2D spectrum guarantees that the targets located in the center range gate and the center Doppler time are well focused. For other targets, inspired by the non-linear chirp scaling algorithm (NCSA), the residual spatially variable mismatch is eliminated by a cubic phase filter during the scaling process to achieve sufficient focusing depth. The simulation results are given at the end of this paper and these validate the effectiveness of the method. Full article
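
The NUFFT-based azimuth resampling mentioned above can be illustrated at toy scale by a direct non-uniform DFT that evaluates the spectrum of samples taken at non-uniform (VPRI) times and then returns to a uniform grid. This O(N²) sketch assumes near-uniform sampling (so no density-compensation weights are applied); a real implementation would use a fast NUFFT library.

```python
import numpy as np

def nudft_spectrum(x, t, fs):
    """Direct type-1 non-uniform DFT: spectrum of samples x(t_j) taken at
    non-uniform times t_j, evaluated on a uniform frequency grid."""
    n = len(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)                  # uniform frequency grid
    return np.exp(-2j * np.pi * np.outer(freqs, t)) @ x, freqs

def resample_uniform(x, t, fs):
    """Map non-uniformly sampled azimuth data onto a uniform time grid t_k = k / fs."""
    spectrum, _ = nudft_spectrum(x, t, fs)
    return np.fft.ifft(spectrum)
```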

23 pages, 22740 KB  
Article
LVCA-Net: Lightweight LiDAR Semantic Segmentation for Advanced Sensor-Based Perception in Autonomous Transportation Systems
by Yuxuan Gong, Yuanhao Huang, Li Bao and Jinlei Wang
Sensors 2026, 26(1), 94; https://doi.org/10.3390/s26010094 - 23 Dec 2025
Abstract
Reliable 3D scene understanding is a fundamental requirement for intelligent machines in autonomous transportation systems, as on-board perception must remain accurate and stable across diverse environments and sensing conditions. However, LiDAR point clouds acquired in real traffic scenes are often sparse and irregular, and they exhibit heterogeneous sampling patterns that hinder consistent and fine-grained semantic interpretation. To address these challenges, this paper proposes LVCA-Net, a lightweight voxel–coordinate attention framework designed for efficient LiDAR-based 3D semantic segmentation in autonomous driving scenarios. The architecture integrates (i) an anisotropic depthwise residual module for direction-aware geometric feature extraction, (ii) a hierarchical LiteDown–LiteUp pathway for multi-scale feature fusion, and (iii) a Coordinate-Guided Sparse Semantic Module that enhances spatial consistency in a cylindrical voxel space while maintaining computational sparsity. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that LVCA-Net achieves 67.17% mean Intersection over Union (mIoU) and 91.79% overall accuracy on SemanticKITTI, as well as 77.1% mIoU on nuScenes, while maintaining real-time inference efficiency. These results indicate that LVCA-Net delivers scalable and robust 3D scene understanding with high semantic precision for LiDAR-only perception, making it well suited for deployment in autonomous vehicles and other safety-critical intelligent systems. Full article
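
The mIoU and overall accuracy figures reported above are standard semantic-segmentation metrics; a short sketch of how they are computed from a per-class confusion matrix is given below (illustrative only, not the authors' evaluation code).

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Mean IoU and overall accuracy from integer per-point predictions and labels."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt, pred), 1)                   # conf[i, j]: label i predicted as j
    tp = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - tp           # TP + FP + FN per class
    iou = np.where(union > 0, tp / np.maximum(union, 1), np.nan)
    return np.nanmean(iou), tp.sum() / conf.sum()    # (mIoU, overall accuracy)
```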

20 pages, 3382 KB  
Article
CFFCNet: Center-Guided Feature Fusion Completion for Accurate Vehicle Localization and Dimension Estimation from Lidar Point Clouds
by Xiaoyi Chen, Xiao Feng, Shichen Zhang, Wen Xiao, Miao Tang and Kun Sun
Remote Sens. 2026, 18(1), 39; https://doi.org/10.3390/rs18010039 - 23 Dec 2025
Abstract
Accurate scene understanding from 3D point cloud data is fundamental to intelligent transportation systems and geospatial digital twins. However, point clouds acquired from lidar sensors in urban environments suffer from incompleteness due to occlusions and limited sensor resolution, presenting significant challenges for precise object localization and geometric reconstruction—critical requirements for traffic safety monitoring and autonomous navigation. To address these point cloud processing challenges, we propose a Center-guided Feature Fusion Completion Network (CFFCNet) that enhances vehicle representation through geometry-aware point cloud completion. The network incorporates a Branch-assisted Center Perception (BCP) module that learns to predict geometric centers while extracting multi-scale spatial features, generating initial coarse completions that account for the misalignment between detection centers and true geometric centers in real-world data. Subsequently, a Multi-scale Feature Blending Upsampling (MFBU) module progressively refines these completions by fusing hierarchical features across multiple stages, producing accurate and complete vehicle point clouds. Comprehensive evaluations on the KITTI dataset demonstrate substantial improvements in geometric accuracy, with localization mean absolute error (MAE) reduced to 0.0928 m and length MAE to 0.085 m. The method’s generalization capability is further validated on a real-world roadside lidar dataset (CUG-Roadside) without fine-tuning, achieving localization MAE of 0.051 m and length MAE of 0.051 m. These results demonstrate the effectiveness of geometry-guided completion for point cloud scene understanding in infrastructure-based traffic monitoring applications, contributing to the development of robust 3D perception systems for urban geospatial environments. Full article
(This article belongs to the Special Issue Point Cloud Data Analysis and Applications)
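
The localization and length MAE values quoted above are simple per-vehicle errors; an illustrative computation, assuming predicted and ground-truth centres and lengths are available as arrays, could look like this (not the authors' evaluation code).

```python
import numpy as np

def box_mae(pred_centers, gt_centers, pred_lengths, gt_lengths):
    """Errors commonly used to judge completion quality:
    localization MAE = mean Euclidean centre error in metres,
    length MAE       = mean absolute error of the recovered vehicle length in metres."""
    loc_mae = np.mean(np.linalg.norm(pred_centers - gt_centers, axis=1))
    len_mae = np.mean(np.abs(pred_lengths - gt_lengths))
    return loc_mae, len_mae
```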

18 pages, 3092 KB  
Article
On the Selection of Transmitted Views for Decoder-Side Depth Estimation
by Dominika Klóska, Adrian Dziembowski, Adam Grzelka and Dawid Mieloch
Appl. Sci. 2026, 16(1), 72; https://doi.org/10.3390/app16010072 - 20 Dec 2025
Abstract
The selection of optimal views for transmission is critical for the coding efficiency of the MPEG Immersive Video (MIV) profile of Decoder-Side Depth Estimation (DSDE). Standard approaches, which favor a uniform camera distribution, often fail in scenes with complex geometry, leading to decreased quality of depth estimation, and thus, reduced quality of virtual views presented to a viewer. This paper proposes an adaptive view selection method that analyzes the scene’s percentage of occluded regions. Based on this analysis, the encoder dynamically selects a transmission strategy: for scenes with a low occlusion ratio (smaller than 10%), a uniform layout is maintained to maximize spatial coverage; for scenes with a high occlusion ratio, the system switches to grouping cameras into stereo pairs, which are more robust for decreasing numbers of occlusions. Experiments conducted using the TMIV reference software demonstrated that this approach yields measurable quality gains (up to 2 dB BD-IVPSNR) for complex test sequences, such as MartialArts and Frog, without requiring any modifications to the decoder. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
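
The decision rule described in the abstract, a uniform camera layout below a 10% occlusion ratio and stereo-pair grouping above it, reduces to a small selection routine. The sketch below illustrates the logic only; the pair-placement heuristic and parameters are assumptions, and the occlusion ratio is assumed to be measured by the encoder beforehand.

```python
def select_transmitted_views(cameras, occlusion_ratio, n_views, threshold=0.10):
    """Pick source views to transmit for decoder-side depth estimation (DSDE).
    cameras: camera indices ordered along the rig."""
    if occlusion_ratio < threshold:
        # Low occlusion: keep a uniform layout to maximize spatial coverage.
        step = max(1, (len(cameras) - 1) // max(1, n_views - 1))
        return cameras[::step][:n_views]
    # High occlusion: transmit adjacent cameras as stereo pairs, which give the
    # decoder-side depth estimator more reliable matches around occlusions.
    n_pairs = max(1, n_views // 2)
    starts = [round(i * (len(cameras) - 2) / max(1, n_pairs - 1)) for i in range(n_pairs)]
    selected = []
    for s in starts:
        selected.extend(cameras[s:s + 2])
    return selected[:n_views]
```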

24 pages, 27907 KB  
Article
Efficient Object-Related Scene Text Grouping Pipeline for Visual Scene Analysis in Large-Scale Investigative Data
by Enrique Shinohara, Jorge García, Luis Unzueta and Peter Leškovský
Electronics 2026, 15(1), 12; https://doi.org/10.3390/electronics15010012 - 19 Dec 2025
Abstract
Law Enforcement Agencies (LEAs) typically analyse vast collections of media files, extracting visual information that helps them to advance investigations. While recent advancements in deep learning-based computer vision algorithms have revolutionised the ability to detect multi-class objects and text instances (characters, words, numbers) from in-the-wild scenes, their association remains relatively unexplored. Previous studies focus on clustering text given its semantic relationship or layout, rather than its relationship with objects. In this paper, we present an efficient, modular pipeline for contextual scene text grouping with three complementary strategies: 2D planar segmentation, multi-class instance segmentation and promptable segmentation. The strategies address common scenes where related text instances frequently share the same 2D planar surface and object (vehicle, banner, etc.). Evaluated on a custom dataset of 1100 images, the overall grouping performance remained consistently high across all three strategies (B-Cubed F1 92–95%; Pairwise F1 80–82%), with adjusted Rand indices between 0.08 and 0.23. Our results demonstrate clear trade-offs between computational efficiency and contextual generalisation, where geometric methods offer reliability, semantic approaches provide scalability and class-agnostic strategies offer the most robust generalisation. The dataset used for testing will be made available upon request. Full article
(This article belongs to the Special Issue Deep Learning-Based Scene Text Detection)
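
The Pairwise F1 reported above is a standard clustering metric: every pair of text instances is scored on whether the two are grouped together in both the prediction and the ground truth. A small sketch of the metric (not the authors' evaluation code, and assuming both inputs map instance ids to group ids):

```python
from itertools import combinations

def pairwise_f1(pred_groups, gt_groups):
    """pred_groups / gt_groups: dicts mapping each text-instance id to a group id."""
    items = sorted(gt_groups)
    tp = fp = fn = 0
    for a, b in combinations(items, 2):
        same_pred = pred_groups[a] == pred_groups[b]
        same_gt = gt_groups[a] == gt_groups[b]
        tp += same_pred and same_gt
        fp += same_pred and not same_gt
        fn += same_gt and not same_pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```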

17 pages, 1903 KB  
Article
GMAFNet: Gated Mechanism Adaptive Fusion Network for 3D Semantic Segmentation of LiDAR Point Clouds
by Xiangbin Kong, Weijun Wu, Minghu Wu, Zhihang Gui, Zhe Luo and Chuyu Miao
Electronics 2025, 14(24), 4917; https://doi.org/10.3390/electronics14244917 - 15 Dec 2025
Abstract
Three-dimensional semantic segmentation plays a crucial role in advancing scene understanding in fields such as autonomous driving, drones, and robotic applications. Existing studies usually improve prediction accuracy by fusing data from vehicle-mounted cameras and vehicle-mounted LiDAR. However, current semantic segmentation methods face two main challenges: first, they often directly fuse 2D and 3D features, leading to the problem of information redundancy in the fusion process; second, there are often issues of image feature loss and missing point cloud geometric information in the feature extraction stage. From the perspective of multimodal fusion, this paper proposes a point cloud semantic segmentation method based on a multimodal gated attention mechanism. The method comprises a feature extraction network and a gated attention fusion and segmentation network. The feature extraction network utilizes a 2D image feature extraction structure and a 3D point cloud feature extraction structure to extract RGB image features and point cloud features, respectively. Through feature extraction and global feature supplementation, it effectively mitigates the issues of fine-grained image feature loss and point cloud geometric structure deficiency. The gated attention fusion and segmentation network increases the network’s attention to important categories such as vehicles and pedestrians through an attention mechanism and then uses a dynamic gated attention mechanism to control the respective weights of 2D and 3D features in the fusion process, enabling it to solve the problem of information redundancy in feature fusion. Finally, a 3D decoder is used for point cloud semantic segmentation. In this paper, tests will be conducted on the SemanticKITTI and nuScenes large-scene point cloud datasets. Full article
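
The dynamic gating of 2D and 3D features described above is commonly realized as a learned sigmoid gate that weights the two modalities before fusion. The PyTorch sketch below shows this general pattern; the layer sizes are chosen for illustration and are not taken from GMAFNet.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Weight per-point 2D (image) and 3D (point cloud) features with a learned gate."""
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, feat_2d, feat_3d):
        # feat_2d: image features projected onto the points, feat_3d: point features,
        # both of shape (N_points, dim).
        g = self.gate(torch.cat([feat_2d, feat_3d], dim=-1))   # per-channel gate in [0, 1]
        return g * feat_2d + (1.0 - g) * feat_3d               # gated convex combination

fused = GatedFusion()(torch.randn(4096, 64), torch.randn(4096, 64))
```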

20 pages, 4253 KB  
Article
From Building Deliverables to Open Scene Description: A Pipeline for Lifecycle 3D Interoperability
by Guoqian Ren, Chengzheng Huang and Tengxiang Su
Buildings 2025, 15(24), 4503; https://doi.org/10.3390/buildings15244503 - 12 Dec 2025
Abstract
Industrial deliverables in the AEC/FM sector are increasingly specified, validated, and governed by open standards. However, the machine-readable delivery specifications rarely propagate intact into the real-time collaborative 3D scene descriptions required by digital twins, XR, large-scale simulation, and visualization. This paper proposes a pipeline that transforms industrial deliverables into semantically faithful, queryable, and render-ready open scene descriptions. Unlike existing workflows that focus on geometric translation via connectors or intermediate formats, the proposed pipeline aligns defined delivery specifications with schema-aware USD composition so that contractual semantics remain executable in the scene. The pipeline comprises delivery specification, which records required objects, attributes, and provenance as versioned rule sets; semantically bound scene realization, which builds an open scene graph that preserves spatial hierarchy and identifiers, while linking rich properties through lightweight references; and interactive sustainment, which lets multiple engines render, analyze, and update the scene while allowing rules to be re-applied at any time. It presents a prototype and roadmap that make open scene description a streaming-ready execution layer for building deliverables, enabling consistent semantics, and reuse across diverse 3D engines. Full article
(This article belongs to the Section Construction Management, and Computers & Digitization)
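
To make the idea of a schema-aware, queryable open scene description concrete, the sketch below writes a minimal USD stage with the pxr Python API, preserving a spatial hierarchy and binding delivery-specification semantics as custom attributes. The ifc:* and delivery:* attribute names and values are hypothetical placeholders, not the paper's schema.

```python
from pxr import Usd, UsdGeom, Sdf

# Create a stage and preserve the building's spatial hierarchy as prims.
stage = Usd.Stage.CreateNew("deliverable.usda")
UsdGeom.Xform.Define(stage, "/Site")
UsdGeom.Xform.Define(stage, "/Site/Building_A")
wall = UsdGeom.Mesh.Define(stage, "/Site/Building_A/Wall_001")

# Attach delivery-specification semantics as custom attributes so rules can be
# re-applied and queried later (attribute names are illustrative only).
prim = wall.GetPrim()
prim.CreateAttribute("ifc:globalId", Sdf.ValueTypeNames.String).Set("2O2Fr$t4X7Zf8NOew3FNr2")
prim.CreateAttribute("delivery:ruleSetVersion", Sdf.ValueTypeNames.String).Set("IDS-1.3")
prim.CreateAttribute("delivery:validated", Sdf.ValueTypeNames.Bool).Set(True)

stage.GetRootLayer().Save()
```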

30 pages, 22912 KB  
Article
HV-LIOM: Adaptive Hash-Voxel LiDAR–Inertial SLAM with Multi-Resolution Relocalization and Reinforcement Learning for Autonomous Exploration
by Shicheng Fan, Xiaopeng Chen, Weimin Zhang, Peng Xu, Zhengqing Zuo, Xinyan Tan, Xiaohai He, Chandan Sheikder, Meijun Guo and Chengxiang Li
Sensors 2025, 25(24), 7558; https://doi.org/10.3390/s25247558 - 12 Dec 2025
Abstract
This paper presents HV-LIOM (Adaptive Hash-Voxel LiDAR–Inertial Odometry and Mapping), a unified LiDAR–inertial SLAM and autonomous exploration framework for real-time 3D mapping in dynamic, GNSS-denied environments. We propose an adaptive hash-voxel mapping scheme that improves memory efficiency and real-time state estimation by subdividing voxels according to local geometric complexity and point density. To enhance robustness to poor initialization, we introduce a multi-resolution relocalization strategy that enables reliable localization against a prior map under large initial pose errors. A learning-based loop-closure module further detects revisited places and injects global constraints, while global pose-graph optimization maintains long-term map consistency. For autonomous exploration, we integrate a Soft Actor–Critic (SAC) policy that selects informative navigation targets online, improving exploration efficiency in unknown scenes. We evaluate HV-LIOM on public datasets (Hilti and NCLT) and a custom mobile robot platform. Results show that HV-LIOM improves absolute pose accuracy by up to 15.2% over FAST-LIO2 in indoor settings and by 7.6% in large-scale outdoor scenarios. The learned exploration policy achieves comparable or superior area coverage with reduced travel distance and exploration time relative to sampling-based and learning-based baselines. Full article
(This article belongs to the Section Radar Sensors)
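
A hash-voxel map of the kind described above can be sketched as a dictionary keyed by quantized coordinates, with a voxel re-inserted at a finer resolution once its point count (used here as a stand-in for geometric complexity) exceeds a threshold. This is a conceptual illustration, not the HV-LIOM data structure.

```python
from collections import defaultdict

class HashVoxelMap:
    """Sparse voxel map keyed by integer grid coordinates with adaptive subdivision."""
    def __init__(self, voxel_size=0.5, split_threshold=200):
        self.voxel_size = voxel_size
        self.split_threshold = split_threshold
        self.voxels = defaultdict(list)          # (level, ix, iy, iz) -> list of points

    def _key(self, p, level):
        size = self.voxel_size / (2 ** level)
        return (level, int(p[0] // size), int(p[1] // size), int(p[2] // size))

    def insert(self, point, level=0, max_level=4):
        key = self._key(point, level)
        bucket = self.voxels[key]
        bucket.append(point)
        if len(bucket) > self.split_threshold and level < max_level:
            # Dense / complex local geometry: push these points one level deeper.
            del self.voxels[key]
            for p in bucket:
                self.insert(p, level + 1, max_level)
```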

20 pages, 5083 KB  
Article
MDR–SLAM: Robust 3D Mapping in Low-Texture Scenes with a Decoupled Approach and Temporal Filtering
by Kailin Zhang and Letao Zhou
Electronics 2025, 14(24), 4864; https://doi.org/10.3390/electronics14244864 - 10 Dec 2025
Abstract
Realizing real-time dense 3D reconstruction on resource-limited mobile platforms remains a significant challenge, particularly in low-texture environments that demand robust multi-frame fusion to resolve matching ambiguities. However, the inherent tight coupling of pose estimation and mapping in traditional monolithic SLAM architectures imposes a severe restriction on integrating high-complexity fusion algorithms without compromising tracking stability. To overcome these limitations, this paper proposes MDR–SLAM, a modular and fully decoupled stereo framework. The system features a novel keyframe-driven temporal filter that synergizes efficient ELAS stereo matching with Kalman filtering to effectively accumulate geometric constraints, thereby enhancing reconstruction density in textureless areas. Furthermore, a confidence-based fusion backend is employed to incrementally maintain global map consistency and filter outliers. Quantitative evaluation on the NUFR-M3F indoor dataset demonstrates the effectiveness of the proposed method: compared to the standard single-frame baseline, MDR–SLAM reduces map RMSE by 83.3% (to 0.012 m) and global trajectory drift by 55.6%, while significantly improving map completeness. The system operates entirely on CPU resources with a stable 4.7 Hz mapping frequency, verifying its suitability for embedded mobile robotics. Full article
(This article belongs to the Special Issue Recent Advance of Auto Navigation in Indoor Scenarios)
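
The keyframe-driven temporal filter described above accumulates per-pixel stereo depth estimates over time; a one-dimensional Kalman update per pixel, as sketched below, is the usual way such measurements are fused (illustrative only, not the MDR–SLAM implementation).

```python
import numpy as np

def fuse_depth_kalman(mean, var, z, r):
    """Per-pixel scalar Kalman update: fuse a new stereo depth measurement z with
    observation variance r into the running estimate (mean, var).
    All arguments are HxW arrays; NaN in z marks pixels with no valid match."""
    gain = var / (var + r)                       # Kalman gain per pixel
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var
    valid = ~np.isnan(z)                         # keep the prior where matching failed
    return np.where(valid, new_mean, mean), np.where(valid, new_var, var)
```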

25 pages, 3616 KB  
Article
A Deep Learning-Driven Semantic Mapping Strategy for Robotic Inspection of Desalination Facilities
by Albandari Alotaibi, Reem Alrashidi, Hanan Alatawi, Lamaa Duwayriat, Aseel Binnouh, Tareq Alhmiedat and Ahmad Al-Qerem
Machines 2025, 13(12), 1129; https://doi.org/10.3390/machines13121129 - 8 Dec 2025
Abstract
The area of robot autonomous navigation has become essential for reducing labor-intensive tasks. These robots’ current navigation systems are based on sensed geometrical structures of the environment, with the engagement of an array of sensor units such as laser scanners, range-finders, and light detection and ranging (LiDAR) in order to obtain the environment layout. Scene understanding is an important task in the development of robots that need to act autonomously. Hence, this paper presents an efficient semantic mapping system that integrates LiDAR, RGB-D, and odometry data to generate precise and information-rich maps. The proposed system enables the automatic detection and labeling of critical infrastructure components, while preserving high spatial accuracy. As a case study, the system was applied to a desalination plant, where it interactively labeled key entities by integrating Simultaneous Localization and Mapping (SLAM) with vision-based techniques in order to determine the location of installed pipes. The developed system was validated using an efficient development environment known as Robot Operating System (ROS) and a two-wheel-drive robot platform. Several simulations and real-world experiments were conducted to validate the efficiency of the developed semantic mapping system. The obtained results are promising, as the developed semantic map generation system achieves an average object detection accuracy of 84.97% and an average localization error of 1.79 m. Full article

34 pages, 20812 KB  
Article
Surreal AI: The Generation, Reconstruction, and Assessment of Surreal Images and 3D Models
by Naai-Jung Shih
Technologies 2025, 13(12), 577; https://doi.org/10.3390/technologies13120577 - 8 Dec 2025
Abstract
Surrealism applies metaphors to create a vocabulary of contexts and scenes. Can AI interpret surrealism? What occurs if a negative prompt is input for 3D reconstruction? This study aims to generate surreal images in AI and to assess the subsequent 3D reconstructed models as an exemplification of context. This AI interpretation study uses 87 sets of conflicting prompts to generate images with novel 3D structural and visual details. Eight characteristic 3D models were selected with geometric features modified by functions, such as the reduction in noise, to identify the changes made to the original shape, with upper and lower bounds of between 92.11% and 47.89% for area and between 20.51% and 1.46% for volume, which indicates structural details. This study creates a unique numeric identity of surreal images upon 3D reconstruction in terms of the relative % of the changes made to the original shape. AI can create a connection between 2D surreal imagination and the 3D physical world, in which the images and models are also appropriate for video morphing, situated elaboration in AR scenes, and verified 3D RP prints. Full article

23 pages, 21889 KB  
Article
Multi-Stage Domain-Adapted 6D Pose Estimation of Warehouse Load Carriers: A Deep Convolutional Neural Network Approach
by Hisham ElMoaqet, Mohammad Rashed and Mohamed Bakr
Machines 2025, 13(12), 1126; https://doi.org/10.3390/machines13121126 - 8 Dec 2025
Abstract
Intelligent autonomous guided vehicles (AGVs) are of huge importance in facilitating the automation of load handling in the era of Industry 4.0. AGVs heavily rely on environmental perception, such as the 6D poses of objects, in order to execute complex tasks efficiently. Therefore, estimating the 6D poses of objects in warehouses is crucial for proper load handling in modern intra-logistics warehouse environments. This study presents a deep convolutional neural network approach for estimating the pose of warehouse load carriers. Recognizing the paucity of labeled real 6D pose estimation data, the proposed approach uses only synthetic RGB warehouse data to train the network. Domain adaption was applied using a Contrastive Unpaired Image-to-Image Translation (CUT) Network to generate domain-adapted training data that can bridge the domain gap between synthetic and real environments and help the model generalize better over realistic scenes. In order to increase the detection range, a multi-stage refinement detection pipeline is developed using consistent multi-view multi-object 6D pose estimation (CosyPose) networks. The proposed framework was tested with different training scenarios, and its performance was comprehensively analyzed and compared with a state-of-the-art non-adapted single-stage pose estimation approach, showing an improvement of up to 80% on the ADD-S AUC metric. Using a mix of adapted and non-adapted synthetic data along with splitting the state space into multiple refiners, the proposed approach achieved an ADD-S AUC performance greater than 0.81 over a wide detection range, from one and up to five meters, while still being trained on a relatively small synthetic dataset for a limited number of epochs. Full article
(This article belongs to the Special Issue Industry 4.0: Intelligent Robots in Smart Manufacturing)
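
The ADD-S metric quoted above scores a 6D pose by the average distance from each model point under the predicted pose to its closest model point under the ground-truth pose (the symmetry-aware variant of ADD); the reported AUC is the area under the accuracy curve over distance thresholds. A small sketch of the core metric, not the authors' evaluation pipeline:

```python
import numpy as np

def add_s(model_points, R_pred, t_pred, R_gt, t_gt):
    """ADD-S: mean closest-point distance between the model transformed by the
    predicted pose and by the ground-truth pose. model_points: (N, 3)."""
    pred = model_points @ R_pred.T + t_pred
    gt = model_points @ R_gt.T + t_gt
    # For each predicted point, take the distance to the nearest ground-truth point.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```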
