Search Results (6)

Search Parameters:
Keywords = cross-source point cloud fusion

25 pages, 9564 KB  
Article
Semantic-Aware Cross-Modal Transfer for UAV-LiDAR Individual Tree Segmentation
by Fuyang Zhou, Haiqing He, Ting Chen, Tao Zhang, Minglu Yang, Ye Yuan and Jiahao Liu
Remote Sens. 2025, 17(16), 2805; https://doi.org/10.3390/rs17162805 - 13 Aug 2025
Viewed by 660
Abstract
Cross-modal semantic segmentation of individual tree LiDAR point clouds is critical for accurately characterizing tree attributes, quantifying ecological interactions, and estimating carbon storage. However, in forest environments, this task faces key challenges such as high annotation costs and poor cross-domain generalization. To address these issues, this study proposes a cross-modal semantic transfer framework tailored for individual tree point cloud segmentation in forested scenes. Leveraging co-registered UAV-acquired RGB imagery and LiDAR data, we construct a technical pipeline of “2D semantic inference—3D spatial mapping—cross-modal fusion” to enable annotation-free semantic parsing of 3D individual trees. Specifically, we first introduce a novel Multi-Source Feature Fusion Network (MSFFNet) to achieve accurate instance-level segmentation of individual trees in the 2D image domain. Subsequently, we develop a hierarchical two-stage registration strategy to effectively align dense matched point clouds (MPC) generated from UAV imagery with LiDAR point clouds. On this basis, we propose a probabilistic cross-modal semantic transfer model that builds a semantic probability field through multi-view projection and the expectation–maximization algorithm. By integrating geometric features and semantic confidence, the model establishes semantic correspondences between 2D pixels and 3D points, thereby achieving spatially consistent semantic label mapping. This facilitates the transfer of semantic annotations from the 2D image domain to the 3D point cloud domain. The proposed method is evaluated on two forest datasets. The results demonstrate that the proposed individual tree instance segmentation approach achieves the highest performance, with an IoU of 87.60%, compared to state-of-the-art methods such as Mask R-CNN, SOLOV2, and Mask2Former. Furthermore, the cross-modal semantic label transfer framework significantly outperforms existing mainstream methods in individual tree point cloud semantic segmentation across complex forest scenarios. Full article
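The label-transfer step described above (project each LiDAR point into the registered images, read the 2D semantic prediction, and aggregate the evidence per point) can be sketched roughly as below. This is a simplified illustration, not the authors' pipeline: it replaces the expectation-maximization probability field with plain per-view voting, and the pinhole projection model, array layout, and function name are assumptions.

```python
# Hypothetical sketch of multi-view 2D-to-3D semantic label transfer by projection
# and per-view voting (a stand-in for the paper's EM-based probability field).
import numpy as np

def transfer_labels(points, views, num_classes):
    """points: (N, 3) LiDAR points in world coordinates.
    views: list of dicts with intrinsics K (3x3), world-to-camera extrinsics
    T (4x4), and a 2D semantic label map "labels" of shape (H, W)."""
    votes = np.zeros((len(points), num_classes))
    homog = np.hstack([points, np.ones((len(points), 1))])
    for view in views:
        cam = (view["T"] @ homog.T).T[:, :3]        # points in the camera frame
        front = np.flatnonzero(cam[:, 2] > 0.1)     # keep points in front of the camera
        pix = (view["K"] @ cam[front].T).T
        u = (pix[:, 0] / pix[:, 2]).astype(int)
        v = (pix[:, 1] / pix[:, 2]).astype(int)
        H, W = view["labels"].shape
        ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
        votes[front[ok], view["labels"][v[ok], u[ok]]] += 1   # accumulate 2D evidence
    return votes.argmax(axis=1)                     # most supported class per 3D point
```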

23 pages, 32729 KB  
Article
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
by Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali and Hafiz Husnain Raza Sherazi
Information 2024, 15(11), 739; https://doi.org/10.3390/info15110739 - 19 Nov 2024
Cited by 4 | Viewed by 3019
Abstract
Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision. Full article
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
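The central idea of fusing the two transformer branches (LiDAR tokens attending to image tokens) can be illustrated with a minimal cross-attention module in PyTorch. The dimensions, the single MultiheadAttention layer, and the residual design are illustrative assumptions, not the PLC-Fusion implementation.

```python
# Minimal cross-attention fusion sketch, loosely mirroring the idea of fusing
# LidViT (3D) and CamViT (2D) token streams.
import torch
import torch.nn as nn

class CrossFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_tokens, image_tokens):
        # LiDAR tokens query the image tokens, so each 3D feature attends
        # to the 2D evidence that supports it.
        fused, _ = self.attn(lidar_tokens, image_tokens, image_tokens)
        return self.norm(lidar_tokens + fused)      # residual connection

lidar = torch.randn(2, 1024, 256)   # (batch, LiDAR tokens, channels)
image = torch.randn(2, 4096, 256)   # (batch, image patch tokens, channels)
print(CrossFusion()(lidar, image).shape)            # torch.Size([2, 1024, 256])
```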

27 pages, 13491 KB  
Article
Safety Evaluation of Reinforced Concrete Structures Using Multi-Source Fusion Uncertainty Cloud Inference and Experimental Study
by Zhao Liu, Huiyong Guo and Bo Zhang
Sensors 2023, 23(20), 8638; https://doi.org/10.3390/s23208638 - 22 Oct 2023
Cited by 2 | Viewed by 1912
Abstract
Structural damage detection and safety evaluations have emerged as a core driving force in structural health monitoring (SHM). Focusing on the multi-source monitoring data in sensing systems and the uncertainty caused by initial defects and monitoring errors, in this study, we develop a comprehensive method for evaluating structural safety, named multi-source fusion uncertainty cloud inference (MFUCI), that focuses on characterizing the relationship between condition indexes and structural performance in order to quantify the structural health status. Firstly, based on cloud theory, the cloud numerical characteristics of the condition index cloud drops are used to establish the qualitative rule base. Next, the proposed multi-source fusion generator yields a multi-source joint certainty degree, which is then transformed into cloud drops with certainty degree information. Lastly, a quantitative structural health evaluation is performed through precision processing. This study focuses on the numerical simulation of an RC frame at the structural level and an RC T-beam damage test at the component level, based on the stiffness degradation process. The results show that the proposed method is effective at evaluating the health of components and structures in a quantitative manner. It demonstrates reliability and robustness by incorporating uncertainty information through noise immunity and cross-domain inference, outperforming baseline models such as Bayesian neural network (BNN) in uncertainty estimations and LSTM in point estimations. Full article
(This article belongs to the Topic AI Enhanced Civil Infrastructure Safety)
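The cloud-model machinery the abstract refers to can be sketched with the standard forward normal cloud generator, which turns the numerical characteristics (Ex, En, He) of a condition index into drops with certainty degrees. The fusion rule below is only a placeholder for the paper's multi-source fusion generator, and all numbers are made up for illustration.

```python
# Forward normal cloud generator (Ex, En, He) plus a naive fusion of per-source
# certainty degrees; the geometric-mean fusion is an assumption, not MFUCI.
import numpy as np

rng = np.random.default_rng(0)

def cloud_drops(Ex, En, He, n=1000):
    En_prime = rng.normal(En, He, n)                    # perturbed entropy per drop
    x = rng.normal(Ex, np.abs(En_prime))                # condition-index samples
    mu = np.exp(-(x - Ex) ** 2 / (2 * En_prime ** 2))   # certainty degree of each drop
    return x, mu

def fuse_certainty(mus):
    # Joint certainty across sources (placeholder: geometric mean of degrees).
    return np.prod(mus, axis=0) ** (1.0 / len(mus))

x1, mu1 = cloud_drops(Ex=0.8, En=0.05, He=0.01)
x2, mu2 = cloud_drops(Ex=0.8, En=0.08, He=0.02)
print(fuse_certainty([mu1, mu2]).mean())
```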

15 pages, 5107 KB  
Article
PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection
by Zheng Zhang, Ruyu Xu and Qing Tian
Mathematics 2023, 11(20), 4277; https://doi.org/10.3390/math11204277 - 13 Oct 2023
Cited by 2 | Viewed by 2374
Abstract
In driverless systems (scenarios such as subways, buses, trucks, etc.), multi-modal data fusion, such as light detection and ranging (LiDAR) points and camera images, is essential for accurate 3D object detection. In the fusion process, the information interaction between the modes is challenging due to the different coordinate systems of various sensors and the significant difference in the density of the collected data. It is necessary to fully consider the consistency and complementarity of multi-modal information, make up for the gap between multi-source data density, and achieve the joint interactive processing of multi-source information. Therefore, this paper is based on Transformer to improve a new multi-modal fusion model called PIDFusion for 3D object detection. Firstly, the method uses the results of 2D instance segmentation to generate dense 3D virtual points to enhance the original sparse 3D point clouds. This optimizes the issue that the nearest Euclidean distance in the 2D image space cannot ensure the nearest in the 3D space. Secondly, a new cross-modal fusion architecture is designed to maintain individual per-modality features to take advantage of their unique characteristics during 3D object detection. Finally, an instance-level fusion module is proposed to enhance semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion is far ahead of existing 3D object detection methods, especially for small and long-range objects, with 70.8 mAP and 73.5 NDS on the nuScenes test set. Full article
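The first step the abstract describes, densifying sparse LiDAR with virtual points generated inside 2D instance masks, can be sketched as follows. Borrowing the depth of the nearest projected LiDAR return is a simplification of my own; the sampling count, function name, and data layout are assumptions rather than the paper's virtual-point generator.

```python
# Illustrative sketch of generating dense 3D "virtual points" from 2D instance
# masks by borrowing depth from the nearest projected LiDAR return.
import numpy as np
from scipy.spatial import cKDTree

def virtual_points(lidar_uvz, mask, K_inv, samples=50):
    """lidar_uvz: (M, 3) projected LiDAR returns as (u, v, depth).
    mask: (H, W) boolean instance mask.  K_inv: inverse camera intrinsics."""
    tree = cKDTree(lidar_uvz[:, :2])
    vs, us = np.nonzero(mask)
    pick = np.random.choice(len(us), min(samples, len(us)), replace=False)
    uv = np.stack([us[pick], vs[pick]], axis=1).astype(float)
    _, nn = tree.query(uv)                    # nearest real LiDAR return per pixel
    depth = lidar_uvz[nn, 2]                  # borrow its depth
    rays = (K_inv @ np.hstack([uv, np.ones((len(uv), 1))]).T).T
    return rays * depth[:, None]              # back-project to the 3D camera frame
```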

25 pages, 7806 KB  
Article
A Cross-Source Point Cloud Registration Algorithm Based on Trigonometric Mutation Chaotic Harris Hawk Optimisation for Rockfill Dam Construction
by Bingyu Ren, Hao Zhao and Shuyang Han
Sensors 2023, 23(10), 4942; https://doi.org/10.3390/s23104942 - 21 May 2023
Cited by 4 | Viewed by 2258
Abstract
A high-precision three-dimensional (3D) model is the premise and vehicle of digitalising hydraulic engineering. Unmanned aerial vehicle (UAV) tilt photography and 3D laser scanning are widely used for 3D model reconstruction. Affected by the complex production environment, in a traditional 3D reconstruction based on a single surveying and mapping technology, it is difficult to simultaneously balance the rapid acquisition of high-precision 3D information and the accurate acquisition of multi-angle feature texture characteristics. To ensure the comprehensive utilisation of multi-source data, a cross-source point cloud registration method integrating the trigonometric mutation chaotic Harris hawk optimisation (TMCHHO) coarse registration algorithm and the iterative closest point (ICP) fine registration algorithm is proposed. The TMCHHO algorithm generates a piecewise linear chaotic map sequence in the population initialisation stage to improve population diversity. Furthermore, it employs trigonometric mutation to perturb the population in the development stage and thus avoid the problem of falling into local optima. Finally, the proposed method was applied to the Lianghekou project. The accuracy and integrity of the fusion model compared with those of the realistic modelling solutions of a single mapping system improved. Full article
(This article belongs to the Topic 3D Computer Vision and Smart Building and City)
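The coarse-then-fine registration idea in this paper maps onto a familiar pattern: a global coarse alignment followed by ICP refinement. The sketch below uses Open3D's FPFH + RANSAC as a stand-in for the TMCHHO optimiser, which it does not implement; voxel sizes, thresholds, and the API shape assume a recent Open3D release.

```python
# Coarse-to-fine cross-source registration sketch with Open3D; FPFH + RANSAC
# substitutes for the paper's TMCHHO coarse step, followed by point-to-point ICP.
import open3d as o3d

def register(source, target, voxel=0.5):
    def prep(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals()
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src_d, src_f = prep(source)
    tgt_d, tgt_f = prep(target)
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src_d, tgt_d, src_f, tgt_f, True, voxel * 1.5,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [], o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    fine = o3d.pipelines.registration.registration_icp(
        source, target, voxel * 0.4, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return fine.transformation
```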

26 pages, 9109 KB  
Article
Pose Estimation of Non-Cooperative Space Targets Based on Cross-Source Point Cloud Fusion
by Jie Li, Yiqi Zhuang, Qi Peng and Liang Zhao
Remote Sens. 2021, 13(21), 4239; https://doi.org/10.3390/rs13214239 - 22 Oct 2021
Cited by 19 | Viewed by 3539
Abstract
On-orbit space technology is used for tasks such as the relative navigation of non-cooperative targets, rendezvous and docking, on-orbit assembly, and space debris removal. In particular, the pose estimation of space non-cooperative targets is a prerequisite for studying these applications. The capabilities of a single sensor are limited, making it difficult to achieve high accuracy in the measurement range. Against this backdrop, a non-cooperative target pose measurement system fused with multi-source sensors was designed in this study. First, a cross-source point cloud fusion algorithm was developed. This algorithm uses the unified and simplified expression of geometric elements in conformal geometry algebra, breaks the traditional point-to-point correspondence, and constructs matching relationships between points and spheres. Next, for the fused point cloud, we proposed a plane clustering-method-based CGA to eliminate point cloud diffusion and then reconstruct the 3D contour model. Finally, we used a twistor along with the Clohessy–Wiltshire equation to obtain the posture and other motion parameters of the non-cooperative target through the unscented Kalman filter. In both the numerical simulations and the semi-physical experiments, the proposed measurement system met the requirements for non-cooperative target measurement accuracy, and the estimation error of the angle of the rotating spindle was 30% lower than that of other, previously studied methods. The proposed cross-source point cloud fusion algorithm can achieve high registration accuracy for point clouds with different densities and small overlap rates. Full article
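The Clohessy-Wiltshire dynamics that drive the filter's prediction step have a well-known closed-form state transition, sketched below for a relative state (x, y, z, vx, vy, vz) in the Hill frame with mean motion n. The UKF tuning, the twistor-based attitude estimation, and the example numbers are not from the paper.

```python
# Closed-form Clohessy-Wiltshire state-transition matrix, the kind of propagation
# that would feed the prediction step of an unscented Kalman filter.
import numpy as np

def cw_transition(n, t):
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3 * c,        0, 0,      s / n,            2 * (1 - c) / n,         0],
        [6 * (s - n * t),  1, 0,     -2 * (1 - c) / n,  (4 * s - 3 * n * t) / n, 0],
        [0,                0, c,      0,                0,                       s / n],
        [3 * n * s,        0, 0,      c,                2 * s,                   0],
        [-6 * n * (1 - c), 0, 0,     -2 * s,            4 * c - 3,               0],
        [0,                0, -n * s, 0,                0,                       c],
    ])

n = 0.0011                                           # rad/s, roughly low Earth orbit
state = np.array([10.0, 0.0, 0.0, 0.0, -2 * n * 10.0, 0.0])
print(cw_transition(n, 60.0) @ state)                # relative state after 60 s
```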
