Search Results (447)

Search Parameters:
Keywords = representation of point cloud

18 pages, 4377 KB  
Article
GeoAssemble: A Geometry-Aware Hierarchical Method for Point Cloud-Based Multi-Fragment Assembly
by Caiqin Jia, Yali Ren, Zhi Wang and Yuan Zhang
Sensors 2025, 25(21), 6533; https://doi.org/10.3390/s25216533 - 23 Oct 2025
Viewed by 185
Abstract
Three-dimensional fragment assembly technology has significant application value in fields such as cultural relic restoration, medical image analysis, and industrial quality inspection. To address the common challenges of limited feature representation ability and insufficient assembly accuracy in existing methods, this paper proposes a geometry-aware hierarchical fragment assembly framework (GeoAssemble). The core contributions of our work are threefold: first, the framework utilizes DGCNN to extract local geometric features while integrating centroid relative positions to construct a multi-dimensional feature representation, thereby enhancing the identification quality of fracture points; second, it designs a two-stage matching strategy that combines global shape similarity coarse matching with local geometric affinity fine matching to effectively reduce matching ambiguity; finally, we propose an auxiliary transformation estimation mechanism based on the geometric center of fracture point clouds to robustly initialize pose parameters, thereby improving both alignment accuracy and convergence stability. Experiments conducted on both synthetic and real-world fragment datasets demonstrate that this method significantly outperforms baseline methods in matching accuracy and exhibits higher robustness in multi-fragment scenarios.
(This article belongs to the Section Sensing and Imaging)
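
A rough illustration of the feature construction this abstract describes: DGCNN-style edge features built over a k-NN graph, with each point's offset from the fragment centroid appended. This is a minimal sketch under our own assumptions, not the authors' code; the function names are hypothetical.

```python
# Hypothetical sketch: DGCNN-style edge features [x_i, x_j - x_i]
# augmented with centroid-relative positions, per the abstract's outline.
import torch

def knn_indices(x: torch.Tensor, k: int) -> torch.Tensor:
    """x: (N, 3) points. Returns (N, k) indices of each point's k nearest neighbors."""
    d = torch.cdist(x, x)                                # (N, N) pairwise distances
    return d.topk(k + 1, largest=False).indices[:, 1:]   # drop the self-match

def edge_features_with_centroid(x: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Returns (N, k, 9) features: point, edge vector, centroid offset."""
    idx = knn_indices(x, k)                              # (N, k)
    neighbors = x[idx]                                   # (N, k, 3)
    center = x.unsqueeze(1).expand_as(neighbors)
    rel = neighbors - center                             # local geometry (edge vectors)
    centroid_off = (x - x.mean(dim=0)).unsqueeze(1).expand_as(neighbors)
    return torch.cat([center, rel, centroid_off], dim=-1)

points = torch.randn(1024, 3)                            # one synthetic fragment
print(edge_features_with_centroid(points).shape)         # torch.Size([1024, 16, 9])
```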

29 pages, 6329 KB  
Article
Non-Contact Measurement of Sunflower Flowerhead Morphology Using Mobile-Boosted Lightweight Asymmetric (MBLA)-YOLO and Point Cloud Technology
by Qiang Wang, Xinyuan Wei, Kaixuan Li, Boxin Cao and Wuping Zhang
Agriculture 2025, 15(21), 2180; https://doi.org/10.3390/agriculture15212180 - 22 Oct 2025
Viewed by 234
Abstract
The diameter of the sunflower flower head and the thickness of its margins are important crop phenotypic parameters. Traditional two-dimensional imaging methods often struggle to balance precision with computational efficiency. This paper addresses the limitations of the YOLOv11n-seg model in the instance segmentation of floral disk fine structures by proposing the MBLA-YOLO instance segmentation model, achieving both lightweight efficiency and high accuracy. Building upon this foundation, a non-contact measurement method is proposed that combines the improved model with three-dimensional point cloud analysis to precisely extract key structural parameters of the flower head. First, image annotation is employed to eliminate interference from petals and sepals, whilst instance segmentation models are used to delineate the target region. The segmentation results for the disc surface (front) and edges (sides) are then mapped onto the three-dimensional point cloud space; target regions are extracted and, following processing, separate models are constructed for the disc surface and edges. Finally, given the differences between the surface and edge structures, targeted methods are employed for their respective calculations. Whilst maintaining lightweight characteristics, the proposed MBLA-YOLO model achieves simultaneous improvements in accuracy and efficiency compared to the baseline YOLOv11n-seg. The introduced CKMB backbone module enhances feature modelling capabilities for complex structural details, whilst the LADH detection head improves small object recognition and boundary segmentation accuracy. Specifically, the CKMB module integrates MBConv and channel attention to strengthen multi-scale feature extraction and representation, while the LADH module adopts a tri-branch design for classification, regression, and IoU prediction, structurally improving detection precision and boundary recognition. This research not only demonstrates superior accuracy and robustness but also significantly reduces computational overhead, thereby achieving an excellent balance between model efficiency and measurement precision. The method avoids three-dimensional reconstruction of the entire plant and multi-view point cloud registration, thereby reducing data redundancy and computational resource expenditure.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
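
Once disc-face points are segmented and mapped into 3D, the head diameter can be estimated by fitting a plane and measuring in-plane extent. The sketch below is one plausible reading of such a calculation, not the paper's method; names and the synthetic data are illustrative.

```python
# Hypothetical sketch: estimate flower-head diameter from an already
# segmented disc-face point cloud via PCA plane fitting and projection.
import numpy as np

def disc_diameter(points: np.ndarray) -> float:
    """points: (N, 3) disc-face points. Returns an estimated diameter."""
    centered = points - points.mean(axis=0)
    # PCA: the two leading principal axes span the best-fit disc plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_xy = centered @ vt[:2].T                       # (N, 2) in-plane coordinates
    # Diameter as the maximum pairwise extent (fine for a few thousand points).
    d = np.linalg.norm(plane_xy[:, None, :] - plane_xy[None, :, :], axis=-1)
    return float(d.max())

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 2000)
r = 0.09 * np.sqrt(rng.uniform(0, 1, 2000))              # ~18 cm synthetic disc
disc = np.c_[r * np.cos(theta), r * np.sin(theta), 0.002 * rng.standard_normal(2000)]
print(f"estimated diameter: {disc_diameter(disc):.3f} m")
```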

23 pages, 6492 KB  
Article
MAC-I2P: I2P Registration with Modality Approximation and Cone–Block–Point Matching
by Yunda Sun, Lin Zhang and Shengjie Zhao
Appl. Sci. 2025, 15(20), 11212; https://doi.org/10.3390/app152011212 - 20 Oct 2025
Viewed by 233
Abstract
The misaligned geometric representation between images and point clouds and the different data densities limit the performance of I2P registration. The former hinders the learning of cross-modal features, and the latter leads to low-quality 2D–3D matching. To address these challenges, we propose a novel I2P registration framework called MAC-I2P, which is composed of a modality approximation module and a cone–block–point matching strategy. By generating pseudo-RGBD images, the module mitigates geometrical misalignment and converts 2D images into 3D space. In addition, it voxelizes the point cloud so that the features of the image and the point cloud can be processed in a similar way, thereby enhancing the repeatability of cross-modal features. Taking into account the different data densities and perception ranges between images and point clouds, the cone–block–point matching relaxes the strict one-to-one matching criterion by gradually refining the matching candidates. As a result, it effectively improves the 2D–3D matching quality. Notably, MAC-I2P is supervised by multiple matching objectives and optimized in an end-to-end manner, which further strengthens the cross-modal representation capability of the model. Extensive experiments conducted on KITTI Odometry and Oxford RobotCar demonstrate the superior performance of our MAC-I2P. Our approach surpasses the current state-of-the-art (SOTA) by 8–63.2% in relative translation error (RTE) and 19.3–38.5% in relative rotation error (RRE). The ablation experiments also confirm the effectiveness of each proposed component.
(This article belongs to the Special Issue Computer Vision, Robotics and Intelligent Systems)
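
The pseudo-RGBD idea rests on the generic step of projecting LiDAR points through a pinhole camera so that pixels acquire depth. Below is a minimal sketch of that projection with made-up KITTI-like intrinsics; it illustrates the general mechanism, not MAC-I2P's actual pipeline.

```python
# Hypothetical sketch: rasterize a LiDAR point cloud into a sparse depth map
# through a pinhole camera, the generic first step behind pseudo-RGBD images.
import numpy as np

def project_to_depth(points_cam: np.ndarray, K: np.ndarray, hw=(375, 1242)) -> np.ndarray:
    """points_cam: (N, 3) points in the camera frame (z forward).
    Returns an (H, W) depth map with 0 where no point projects."""
    h, w = hw
    z = points_cam[:, 2]
    keep = z > 0.1                                       # only points in front of the camera
    uvw = (K @ points_cam[keep].T).T                     # (M, 3) homogeneous pixels
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = np.zeros((h, w))
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Keep the nearest point per pixel: write far-to-near so near values win.
    order = np.argsort(-z[keep][inside])
    depth[v[inside][order], u[inside][order]] = z[keep][inside][order]
    return depth

K = np.array([[721.5, 0, 609.6], [0, 721.5, 172.9], [0, 0, 1.0]])  # KITTI-like intrinsics
pts = np.random.rand(5000, 3) * [20, 10, 40] - [10, 5, 0]
print((project_to_depth(pts, K) > 0).sum(), "pixels received depth")
```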

20 pages, 6483 KB  
Article
Loop-MapNet: A Multi-Modal HDMap Perception Framework with SDMap Dynamic Evolution and Priors
by Yuxuan Tang, Jie Hu, Daode Zhang, Wencai Xu, Feiyu Zhao and Xinghao Cheng
Appl. Sci. 2025, 15(20), 11160; https://doi.org/10.3390/app152011160 - 17 Oct 2025
Viewed by 290
Abstract
High-definition maps (HDMaps) are critical for safe autonomy on structured roads. Yet traditional production—relying on dedicated mapping fleets and manual quality control—is costly and slow, impeding large-scale, frequent updates. Recently, standard-definition maps (SDMaps) derived from remote sensing have been adopted as priors to support HDMap perception, lowering cost but struggling with subtle urban changes and localization drift. We propose Loop-MapNet, a self-evolving, multimodal, closed-loop mapping framework. Loop-MapNet effectively leverages surround-view images, LiDAR point clouds, and SDMaps; it fuses multi-scale vision via a weighted BiFPN, and couples PointPillars BEV and SDMap topology encoders for cross-modal sensing. A Transformer-based bidirectional adaptive cross-attention aligns SDMap with online perception, enabling robust fusion under heterogeneity. We further introduce a confidence-guided masked autoencoder (CG-MAE) that leverages confidence and probabilistic distillation to both capture implicit SDMap priors and enhance the detailed representation of low-confidence HDMap regions. With spatiotemporal consistency checks, Loop-MapNet incrementally updates SDMaps to form a perception–mapping–update loop, compensating for remote-sensing latency and enabling online map optimization. On nuScenes, within 120 m, Loop-MapNet attains 61.05% mIoU, surpassing the best baseline by 0.77%. Under extreme localization errors, it maintains 60.46% mIoU, improving robustness by 2.77%; CG-MAE pre-training raises accuracy in low-confidence regions by 1.72%. These results demonstrate advantages in fusion and robustness, moving beyond one-way prior injection and enabling HDMap–SDMap co-evolution for closed-loop autonomy and rapid SDMap refresh from remote sensing.
(This article belongs to the Section Computing and Artificial Intelligence)
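
One plausible reading of "confidence-guided" masking is to bias the MAE mask toward low-confidence patches, so reconstruction effort lands where the map is least certain. The sketch below shows such a sampler under that assumption; it is not the paper's implementation.

```python
# Hypothetical sketch: sample an MAE patch mask biased toward low-confidence
# regions, one plausible reading of "confidence-guided" masking.
import torch

def confidence_guided_mask(conf: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """conf: (num_patches,) confidences in [0, 1]. Returns a boolean mask
    where True marks patches to hide; low confidence => more likely masked."""
    weights = (1.0 - conf).clamp_min(1e-6)               # low confidence -> high weight
    n_mask = int(mask_ratio * conf.numel())
    idx = torch.multinomial(weights, n_mask, replacement=False)
    mask = torch.zeros_like(conf, dtype=torch.bool)
    mask[idx] = True
    return mask

conf = torch.rand(196)                                   # e.g., a 14x14 patch grid
mask = confidence_guided_mask(conf, 0.5)
print(mask.sum().item(), "patches masked; mean conf of masked patches:",
      conf[mask].mean().item())
```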

18 pages, 1126 KB  
Article
Generative Implicit Steganography via Message Mapping
by Yangjie Zhong, Jia Liu, Peng Luo, Yan Ke and Mingshu Zhang
Appl. Sci. 2025, 15(20), 11041; https://doi.org/10.3390/app152011041 - 15 Oct 2025
Viewed by 251
Abstract
Generative steganography (GS) generates stego-media from secret messages, but existing GS targets only a single type of multimedia data and therefore generalizes poorly, and generator and extractor sizes are tightly coupled to resolution. Message mapping converts secret messages into noise, yet current GS schemes based on it operate on gridded data and cannot generate diverse multimedia universally. Inspired by implicit neural representation (INR), we propose generative implicit steganography via message mapping (GIS). We design single-bit and multi-bit message mapping schemes in the function domain. The scheme's function generator eliminates the coupling between model size and gridded data size, enabling diverse multimedia generation and breaking resolution limits. A dedicated point cloud extractor is trained for adaptability. To the best of our knowledge, this is the first scheme to perform message mapping in the function domain. In experiments, taking images as an example, PSNR, StegExpose, and neural pruning were used to show that the quality of generated images is almost indistinguishable from that of real images, and that the generated images are robust: message extraction accuracy reaches 96.88% at an embedding capacity of 1 bpp, 89.84% at 2 bpp, and 82.21% at a pruning rate of 0.3.
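
Message mapping schemes commonly partition the noise range into bins and sample inside the bin indexed by the message bits. The sketch below shows a generic multi-bit bin mapping of that kind; it is our illustration of the idea, not necessarily the authors' exact construction.

```python
# Hypothetical sketch of multi-bit message mapping: each group of k bits
# selects a bin of the noise range, and the sample is drawn inside that bin.
import numpy as np

def embed_bits(bits: np.ndarray, k: int, rng) -> np.ndarray:
    """Map bit groups to uniform samples in [-1, 1), one value per k bits."""
    groups = bits.reshape(-1, k)
    idx = groups @ (1 << np.arange(k - 1, -1, -1))       # bits -> bin index
    width = 2.0 / (1 << k)
    low = -1.0 + idx * width
    return rng.uniform(low, low + width)                 # 'noise' carrying the bits

def extract_bits(z: np.ndarray, k: int) -> np.ndarray:
    """Invert the mapping: recover bin indices, then bits."""
    idx = np.clip(((z + 1.0) / (2.0 / (1 << k))).astype(int), 0, (1 << k) - 1)
    return ((idx[:, None] >> np.arange(k - 1, -1, -1)) & 1).reshape(-1)

rng = np.random.default_rng(7)
msg = rng.integers(0, 2, 128)
z = embed_bits(msg, k=2, rng=rng)                        # a 2-bits-per-value mapping
assert np.array_equal(extract_bits(z, k=2), msg)
print("128 bits embedded into", z.size, "noise values and recovered exactly")
```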

20 pages, 18957 KB  
Article
Multi-Modal Data Fusion for 3D Object Detection Using Dual-Attention Mechanism
by Mengying Han, Benlan Shen and Jiuhong Ruan
Sensors 2025, 25(20), 6360; https://doi.org/10.3390/s25206360 - 14 Oct 2025
Viewed by 508
Abstract
To address the issue of missing feature information for small objects caused by the sparsity and irregularity of point clouds, as well as the poor detection performance on small objects due to their weak feature representation, this paper proposes a multi-modal 3D object detection method based on an improved PointPillars framework. First, LiDAR point clouds are fused with camera images at the data level, incorporating 2D semantic information to enhance small-object feature representation. Second, a Pillar-wise Channel Attention (PCA) module is introduced to emphasize critical features before converting pillar features into pseudo-image representations. Additionally, a Spatial Attention Module (SAM) is embedded into the backbone network to enhance spatial feature representation. Experiments on the KITTI dataset show that, compared with the baseline PointPillars, the proposed method significantly improves small-object detection performance. Specifically, under the bird's-eye view (BEV) evaluation metrics, the Average Precision (AP) for pedestrians and cyclists increases by 7.06% and 3.08%, respectively; under the 3D evaluation metrics, these improvements are 4.36% and 2.58%. Compared with existing methods, the improved model also achieves relatively higher accuracy in detecting small objects. Visualization results further demonstrate the enhanced detection capability of the proposed method for small objects with different difficulty levels. Overall, the proposed approach effectively improves 3D object detection performance, particularly for small objects, in complex autonomous driving scenarios.
(This article belongs to the Section Sensing and Imaging)
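
Spatial attention modules in detection backbones typically follow the CBAM pattern: channel-wise mean and max maps pass through a small convolution that gates locations. A generic sketch under that assumption (the kernel size and gating are ours, not necessarily this paper's exact design):

```python
# Hypothetical CBAM-style spatial attention block: pool across channels,
# convolve the 2-channel map, and gate the feature map spatially.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)                # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)               # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate                                  # reweight locations

feat = torch.randn(2, 64, 100, 100)                      # e.g., a pseudo-image
print(SpatialAttention()(feat).shape)                    # torch.Size([2, 64, 100, 100])
```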

24 pages, 1782 KB  
Article
Point Cloud Completion Network Based on Multi-Dimensional Adaptive Feature Fusion and Informative Channel Attention Mechanism
by Di Tian, Jiahang Shi, Jiabo Li and Mingming Gong
Sensors 2025, 25(19), 6173; https://doi.org/10.3390/s25196173 - 5 Oct 2025
Viewed by 602
Abstract
With the continuous advancement of 3D perception technology, point cloud data has found increasingly widespread application. However, the presence of holes in point cloud data caused by device limitations and environmental interference severely restricts algorithmic performance, making point cloud completion a research topic of high interest. This study observes that most existing mainstream point cloud completion methods primarily focus on capturing global features, while often underrepresenting local structural details. Moreover, the generation process of complete point clouds lacks effective control over fine-grained features, leading to insufficient detail in the completed outputs and reduced data integrity. To address these issues, we propose a Set Combination Multi-Layer Perceptron (SCMP) module that enables the simultaneous extraction of both local and global features, thereby reducing the loss of local detail information. In addition, we introduce the Squeeze Excitation Pooling Network (SEP-Net) module, an informative channel attention mechanism capable of adaptively identifying and enhancing critical channel features, thus improving the overall feature representation capability. Based on these modules, we further design a novel Feature Fusion Point Fractal Network (FFPF-Net), which fuses multi-dimensional point cloud features to enhance representation capacity and progressively refines the missing regions to generate a more complete point cloud. In extensive experiments on the ShapeNet-Part and MVP datasets, our network improves average prediction error over L-GAN and PCN by 1.3 and 1.4, respectively. The average completion errors on ShapeNet-Part and MVP are 0.783 and 0.824, highlighting the improved fine-detail reconstruction capability of our network. These results indicate that the proposed method effectively enhances point cloud completion performance and can further promote the practical application of point cloud data in various real-world scenarios.
(This article belongs to the Section Intelligent Sensors)
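
Completion quality in this line of work is usually scored with the Chamfer distance between predicted and ground-truth clouds; whether the "average completion error" here is exactly that metric is our assumption. A small reference sketch of the metric:

```python
# Sketch of the symmetric Chamfer distance commonly used to score
# point cloud completion against ground truth.
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (N, 3), b: (M, 3). Mean of nearest-neighbor squared distances
    in both directions."""
    d = torch.cdist(a, b) ** 2                           # (N, M) squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

pred = torch.randn(2048, 3)
gt = pred + 0.01 * torch.randn_like(pred)                # near-perfect completion
print(f"chamfer: {chamfer_distance(pred, gt):.6f}")
```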

19 pages, 5861 KB  
Article
Topological Signal Processing from Stereo Visual SLAM
by Eleonora Di Salvo, Tommaso Latino, Maria Sanzone, Alessia Trozzo and Stefania Colonnese
Sensors 2025, 25(19), 6103; https://doi.org/10.3390/s25196103 - 3 Oct 2025
Viewed by 359
Abstract
Topological signal processing is emerging alongside Graph Signal Processing (GSP) in various applications, incorporating higher-order connectivity structures—such as faces—in addition to nodes and edges, for enriched connectivity modeling. Rich point clouds acquired by multi-camera systems in Visual Simultaneous Localization and Mapping (V-SLAM) are typically processed using graph-based methods. In this work, we introduce a topological signal processing (TSP) framework that integrates texture information extracted from V-SLAM; we refer to this framework as TSP-SLAM. We show how TSP-SLAM enables the extension of graph-based point cloud processing to more advanced topological signal processing techniques. We demonstrate, on real stereo data, that TSP-SLAM enables a richer point cloud representation by associating signals not only with vertices but also with edges and faces of the mesh computed from the point cloud. Numerical results show that TSP-SLAM supports the design of topological filtering algorithms by exploiting the mapping between the 3D mesh faces, edges and vertices and their 2D image projections. These findings confirm the potential of TSP-SLAM for topological signal processing of point cloud data acquired in challenging V-SLAM environments.
(This article belongs to the Special Issue Stereo Vision Sensing and Image Processing)
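
Attaching signals to edges and faces means working with simplicial incidence matrices: B1 maps vertices to edges, B2 maps edges to faces, and the edge Hodge Laplacian L1 = B1^T B1 + B2 B2^T underlies topological filters for edge signals. The toy construction below is generic TSP machinery, not the authors' code; it assumes every face edge appears in the edge list in one orientation or the other.

```python
# Sketch: incidence matrices B1 (vertex-edge) and B2 (edge-face) for a toy
# mesh, and the edge Hodge Laplacian that topological edge filters build on.
import numpy as np

def incidence_matrices(edges, faces):
    n_v = max(max(e) for e in edges) + 1
    e_idx = {e: i for i, e in enumerate(edges)}
    B1 = np.zeros((n_v, len(edges)))
    for j, (u, v) in enumerate(edges):                   # edge oriented u -> v
        B1[u, j], B1[v, j] = -1.0, 1.0
    B2 = np.zeros((len(edges), len(faces)))
    for j, (a, b, c) in enumerate(faces):                # triangle oriented a,b,c
        for u, v in ((a, b), (b, c), (c, a)):
            if (u, v) in e_idx:
                B2[e_idx[(u, v)], j] = 1.0
            else:                                        # stored with opposite orientation
                B2[e_idx[(v, u)], j] = -1.0
    return B1, B2

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]                 # one triangle plus a dangling edge
faces = [(0, 1, 2)]
B1, B2 = incidence_matrices(edges, faces)
L1 = B1.T @ B1 + B2 @ B2.T                               # edge Hodge Laplacian
print(L1.shape)                                          # (4, 4): one row per edge
```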

38 pages, 10032 KB  
Article
Closed and Structural Optimization for 3D Line Segment Extraction in Building Point Clouds
by Ruoming Zhai, Xianquan Han, Peng Wan, Jianzhou Li, Yifeng He and Bangning Ding
Remote Sens. 2025, 17(18), 3234; https://doi.org/10.3390/rs17183234 - 18 Sep 2025
Viewed by 438
Abstract
The extraction of architectural structural line features can simplify the 3D spatial representation of built environments, reduce the storage and processing burden of large-scale point clouds, and provide essential geometric primitives for downstream modeling tasks. However, existing 3D line extraction methods suffer from incomplete and fragmented contours, with missing or misaligned intersections. To overcome these limitations, this study proposes a patch-level framework for 3D line extraction and structural optimization from building point clouds. The proposed method first partitions point clouds into planar patches and establishes local image planes for each patch, enabling a structured 2D representation of unstructured 3D data. Then, graph-cut segmentation is applied to extract compact boundary contours, which are vectorized into closed lines and back-projected into 3D space to form the initial line segments. To improve geometric consistency, regularized geometric constraints, including adjacency, collinearity, and orthogonality constraints, are further designed to merge homogeneous segments, refine topology, and strengthen structural outlines. Finally, we evaluate the approach on three indoor building environments and four outdoor scenes; experimental results show that it reduces noise and redundancy while significantly improving the completeness, closure, and alignment of 3D line features across various complex architectural structures.
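
Of the regularized constraints, collinearity merging is the easiest to picture: two segments merge when their directions agree and their endpoints sit near a common supporting line. A toy check with assumed angle and distance thresholds (our illustration, not the paper's criteria):

```python
# Hypothetical sketch of a collinearity test used when merging 3D line
# segments: directions must align and endpoints must sit near a common line.
import numpy as np

def are_collinear(seg_a, seg_b, max_angle_deg=3.0, max_offset=0.02) -> bool:
    """Each segment is a (2, 3) array of endpoints."""
    da = seg_a[1] - seg_a[0]
    db = seg_b[1] - seg_b[0]
    da, db = da / np.linalg.norm(da), db / np.linalg.norm(db)
    angle = np.degrees(np.arccos(np.clip(abs(da @ db), -1, 1)))
    if angle > max_angle_deg:
        return False
    # Perpendicular distance of seg_b's endpoints from seg_a's supporting line.
    rel = seg_b - seg_a[0]
    perp = rel - np.outer(rel @ da, da)
    return bool(np.linalg.norm(perp, axis=1).max() < max_offset)

a = np.array([[0, 0, 0], [1, 0, 0]], float)
b = np.array([[1.2, 0.005, 0], [2.0, 0.004, 0]], float)
print(are_collinear(a, b))                               # True: nearly the same line
```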

17 pages, 1773 KB  
Article
CrossInteraction: Multi-Modal Interaction and Alignment Strategy for 3D Perception
by Weiyi Zhao, Xinxin Liu and Yu Ding
Sensors 2025, 25(18), 5775; https://doi.org/10.3390/s25185775 - 16 Sep 2025
Cited by 1 | Viewed by 620
Abstract
Cameras and LiDAR are the primary sensors utilized in contemporary 3D object perception, leading to the development of various multi-modal detection algorithms for images, point clouds, and their fusion. Given the demanding accuracy requirements in autonomous driving environments, traditional multi-modal fusion techniques often overlook critical information from individual modalities and struggle to effectively align transformed features. In this paper, we introduce an improved modal interaction strategy, called CrossInteraction. This method enhances the interaction between modalities by using the output of the first modal representation as the input for the second interaction enhancement, resulting in better overall interaction effects. To further address the challenge of feature alignment errors, we employ a graph convolutional network. Finally, the prediction process is completed through a cross-attention mechanism, ensuring more accurate detection outcomes.
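
Cross-attention fusion, the final prediction step named here, is a standard block in which queries from one modality attend over keys and values from the other. A generic sketch using PyTorch's built-in multi-head attention (illustrative shapes, not the authors' model):

```python
# Generic cross-attention fusion: camera tokens query LiDAR tokens.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
img_tokens = torch.randn(2, 900, 256)                    # queries from the image branch
pc_tokens = torch.randn(2, 1200, 256)                    # keys/values from the point cloud branch
fused, weights = attn(img_tokens, pc_tokens, pc_tokens)
print(fused.shape)                                       # torch.Size([2, 900, 256])
```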

38 pages, 24535 KB  
Article
Time-Series 3D Modeling of Tunnel Damage Through Fusion of Image and Point Cloud Data
by Chulhee Lee, Donggyou Kim, Dongku Kim and Joonoh Kang
Remote Sens. 2025, 17(18), 3173; https://doi.org/10.3390/rs17183173 - 12 Sep 2025
Viewed by 694
Abstract
Precise maintenance is vital for ensuring the safety of tunnel structures; however, traditional visual inspections are subjective and hazardous. Digital technologies such as LiDAR and imaging offer promising alternatives, but each has complementary limitations in geometric precision and visual representation. This study addresses these limitations by developing a three-dimensional modeling framework that integrates image and point cloud data and evaluates its effectiveness. Terrestrial LiDAR and UAV images were acquired three times over a freeze–thaw cycle at an aging, abandoned tunnel. Based on the data obtained, three types of 3D models were constructed: TLS-based, image-based, and fusion-based. Comparative evaluation results showed that the TLS-based model had excellent geometric accuracy but low resolution due to low point density. The image-based model had high density and excellent resolution but low geometric accuracy. In contrast, the fusion-based model achieved the lowest root mean squared error (RMSE), the highest geometric accuracy, and the highest resolution. Time-series analysis further demonstrated that only the fusion-based model could identify the complex damage progression mechanism in which leakage and icicle formation (visual changes) increased the damaged area by 55.8% (as measured by geometric changes). This also enabled quantitative distinction between active damage (leakage, structural damage) and stable-state damage (spalling, efflorescence, cracks). In conclusion, this study empirically demonstrates the necessity of data fusion for comprehensive tunnel condition diagnosis. It provides a benchmark for evaluating 3D modeling techniques in real-world environments and lays the foundation for digital twin development in data-driven preventive maintenance.
(This article belongs to the Section Remote Sensing Image Processing)
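
The RMSE used to rank the three models is typically a cloud-to-cloud nearest-neighbor error against a reference survey; that reading is our assumption, not a detail the abstract states. A small sketch of such a metric using SciPy's KD-tree:

```python
# Sketch: cloud-to-cloud RMSE via nearest-neighbor distances to a
# reference point cloud, a common way 3D model accuracy is scored.
import numpy as np
from scipy.spatial import cKDTree

def cloud_rmse(model_pts: np.ndarray, reference_pts: np.ndarray) -> float:
    """Both arguments are (N, 3) arrays; returns RMSE in the clouds' units."""
    dist, _ = cKDTree(reference_pts).query(model_pts)
    return float(np.sqrt(np.mean(dist ** 2)))

ref = np.random.rand(20000, 3)
model = ref + 0.005 * np.random.randn(*ref.shape)        # ~5 mm noise on a 1 m cube
print(f"RMSE: {cloud_rmse(model, ref):.4f}")
```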

21 pages, 8671 KB  
Article
IFE-CMT: Instance-Aware Fine-Grained Feature Enhancement Cross Modal Transformer for 3D Object Detection
by Xiaona Song, Haozhe Zhang, Haichao Liu, Xinxin Wang and Lijun Wang
Sensors 2025, 25(18), 5685; https://doi.org/10.3390/s25185685 - 12 Sep 2025
Viewed by 532
Abstract
In recent years, multi-modal 3D object detection algorithms have experienced significant development. However, current algorithms primarily focus on designing overall fusion strategies for multi-modal features, neglecting finer-grained representations, which leads to a decline in the detection accuracy of small objects. To address this issue, this paper proposes the Instance-aware Fine-grained feature Enhancement Cross Modal Transformer (IFE-CMT) model. We designed an Instance feature Enhancement Module (IE-Module), which can accurately extract object features from multi-modal data and use them to enhance overall features while avoiding view transformations and maintaining low computational overhead. Additionally, we design a new point cloud branch network that effectively expands the network's receptive field, enhancing the model's semantic expression capabilities while preserving texture details of the objects. Experimental results on the nuScenes dataset demonstrate that compared to the CMT model, our proposed IFE-CMT model improves mAP and NDS by 2.1% and 0.8% on the validation set, respectively. On the test set, it improves mAP and NDS by 1.9% and 0.7%, respectively. Notably, for small object categories such as bicycles and motorcycles, the mAP improved by 6.6% and 3.7%, respectively, significantly enhancing the detection accuracy of small objects.
(This article belongs to the Section Vehicular Sensing)

20 pages, 3823 KB  
Article
SA-Encoder: A Learnt Spatial Autocorrelation Representation to Inform 3D Geospatial Object Detection
by Tianyang Chen, Wenwu Tang, Shen-En Chen and Craig Allan
Remote Sens. 2025, 17(17), 3124; https://doi.org/10.3390/rs17173124 - 8 Sep 2025
Viewed by 497
Abstract
Contextual features play a critical role in geospatial object detection by characterizing the surrounding environment of objects. In existing deep learning-based studies of 3D point cloud classification and segmentation, these features have been represented through geometric descriptors, semantic context (i.e., modeled by an attention-based mechanism), global-level context (i.e., through global aggregation), and textural representation (e.g., RGB, intensity, and other attributes). Even though contextual features have been widely explored, spatial contextual features that explicitly capture spatial autocorrelation and neighborhood dependency have received limited attention in object detection tasks. This gap is particularly relevant in the context of GeoAI, which calls for mutual benefits between artificial intelligence and geographic information science. To bridge this gap, this study presents a spatial autocorrelation encoder, namely SA-Encoder, designed to inform 3D geospatial object detection by capturing a spatial autocorrelation representation as a form of spatial contextual feature. The study investigated the effectiveness of such spatial contextual features by estimating the performance of a model trained on them alone. The results suggested that the derived spatial autocorrelation information can help adequately identify some large objects in an urban-rural scene, such as buildings, terrain, and large trees. We further investigated how the spatial autocorrelation encoder can inform model performance in a geospatial object detection task. The results demonstrated significant improvements in detection accuracy across varied urban and rural environments when, as an ablation experiment, we compared the results to models that do not consider spatial autocorrelation. Moreover, the approach also outperformed models trained by explicitly feeding traditional spatial autocorrelation measures (i.e., Matheron's semivariance). This study showcases the advantage of the adaptiveness of the neural network-based encoder in deriving a spatial autocorrelation representation. This advancement bridges the gap between theoretical geospatial concepts and practical AI applications. Consequently, this study demonstrates the potential of integrating geographic theories with deep learning technologies to address challenges in 3D object detection, paving the way for further innovations in this field.
(This article belongs to the Section AI Remote Sensing)
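
The traditional measure the encoder is compared against, Matheron's semivariance, has a simple empirical estimator: gamma(h) = (1 / (2 |N(h)|)) * sum over point pairs at lag h of (z_i - z_j)^2. A direct sketch of that estimator (our illustration, with a synthetic attribute):

```python
# Sketch of Matheron's empirical semivariance over a 3D point attribute,
# the classical spatial-autocorrelation measure the encoder is compared with.
import numpy as np

def semivariogram(xyz: np.ndarray, z: np.ndarray, lags: np.ndarray, tol: float):
    """gamma(h) = 1/(2|N(h)|) * sum (z_i - z_j)^2 over pairs with |d - h| < tol."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    sq = (z[:, None] - z[None, :]) ** 2
    gamma = []
    for h in lags:
        pairs = (np.abs(d - h) < tol) & (d > 0)
        gamma.append(sq[pairs].mean() / 2 if pairs.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(1)
pts = rng.uniform(0, 10, (500, 3))
attr = np.sin(pts[:, 0]) + 0.1 * rng.standard_normal(500)   # spatially correlated attribute
print(semivariogram(pts, attr, lags=np.array([0.5, 1, 2, 4]), tol=0.25))
```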

26 pages, 5655 KB  
Article
A Hierarchical Multi-Feature Point Cloud Lithology Identification Method Based on Feature-Preserved Compressive Sampling (FPCS)
by Xiaolei Duan, Ran Jing, Yanlin Shao, Yuangang Liu, Binqing Gan, Peijin Li and Longfan Li
Sensors 2025, 25(17), 5549; https://doi.org/10.3390/s25175549 - 5 Sep 2025
Viewed by 1099
Abstract
Lithology identification is a critical technology for geological resource exploration and engineering safety assessment. However, traditional methods suffer from insufficient feature representation and low classification accuracy due to challenges such as weathering, vegetation cover, and spectral overlap in complex sedimentary rock regions. This study proposes a hierarchical multi-feature random forest algorithm based on Feature-Preserved Compressive Sampling (FPCS). Using 3D laser point cloud data from the Manas River outcrop in the southern margin of the Junggar Basin as the test area, we integrate graph signal processing and multi-scale feature fusion to construct a high-precision lithology identification model. The FPCS method establishes a geologically adaptive graph model constrained by geodesic distance and gradient-sensitive weighting, employing a three-tier graph filter bank (low-pass, band-pass, and high-pass) to extract macroscopic morphology, interface gradients, and microscopic fracture features of rock layers. A dynamic gated fusion mechanism optimizes multi-level feature weights, significantly improving identification accuracy in lithological transition zones. Experimental results on five million test samples demonstrate an overall accuracy (OA) of 95.6% and a mean accuracy (mAcc) of 94.3%, representing improvements of 36.1% and 20.5%, respectively, over the PointNet model. These findings confirm the robust engineering applicability of the FPCS-based hierarchical multi-feature approach for point cloud lithology identification.
(This article belongs to the Section Remote Sensors)
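
A three-tier low-, band-, and high-pass graph filter bank of the kind described can be realized from the graph Laplacian's spectrum. The eigendecomposition-based split below is a generic illustration chosen for clarity (cut frequencies and graph construction are ours), not the paper's implementation:

```python
# Sketch: split a signal on a point cloud graph into low-, band-, and
# high-pass components using the graph Laplacian's spectrum.
import numpy as np

def laplacian_filter_bank(W: np.ndarray, signal: np.ndarray, cuts=(0.5, 1.2)):
    """W: symmetric adjacency. Returns (low, band, high) components of signal."""
    L = np.diag(W.sum(1)) - W                            # combinatorial Laplacian
    lam, U = np.linalg.eigh(L)
    s_hat = U.T @ signal                                 # graph Fourier transform
    bands = [lam < cuts[0], (lam >= cuts[0]) & (lam < cuts[1]), lam >= cuts[1]]
    return tuple(U @ (s_hat * m) for m in bands)

rng = np.random.default_rng(3)
pts = rng.uniform(0, 1, (200, 3))
d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
W = np.exp(-d**2 / 0.02) * (d < 0.2) * (d > 0)           # Gaussian-weighted local graph
sig = pts[:, 2] + 0.05 * rng.standard_normal(200)        # elevation-like signal
low, band, high = laplacian_filter_bank(W, sig)
assert np.allclose(low + band + high, sig)               # bands partition the spectrum
print([round(float(np.linalg.norm(c)), 3) for c in (low, band, high)])
```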

15 pages, 2951 KB  
Article
Fusing Residual and Cascade Attention Mechanisms in Voxel–RCNN for 3D Object Detection
by You Lu, Yuwei Zhang, Xiangsuo Fan, Dengsheng Cai and Rui Gong
Sensors 2025, 25(17), 5497; https://doi.org/10.3390/s25175497 - 4 Sep 2025
Viewed by 1059
Abstract
In this paper, a high-precision 3D object detector—Voxel–RCNN—is adopted as the baseline detector, and an improved detector named RCAVoxel-RCNN is proposed. To address various issues present in current mainstream 3D point cloud voxelisation methods, such as the suboptimal performance of Region Proposal Networks (RPNs) in generating candidate regions and the inadequate detection of small-scale objects caused by overly deep convolutional layers in both 3D and 2D backbone networks, this paper proposes a Cascade Attention Network (CAN). The CAN is designed to progressively refine and enhance the proposed regions, thereby producing more accurate detection results. Furthermore, a 3D Residual Network is introduced, which improves the representation of small objects by reducing the number of convolutional layers while incorporating residual connections. In the Bird's-Eye View (BEV) feature extraction network, a Residual Attention Network (RAN) is developed. This follows a similar approach to the aforementioned 3D backbone network, leveraging the spatial awareness capabilities of the BEV. Additionally, the Squeeze-and-Excitation (SE) attention mechanism is incorporated to assign dynamic weights to features, allowing the network to focus more effectively on informative features. Experimental results on the KITTI validation dataset demonstrate the effectiveness of the proposed method, with detection accuracy for cars, pedestrians, and bicycles improving by 3.34%, 10.75%, and 4.61%, respectively, under the KITTI hard level. The primary evaluation metric adopted is the 3D Average Precision (AP), computed over 40 recall positions (R40). The Intersection over Union (IoU) thresholds used are 0.7 for cars and 0.5 for both pedestrians and bicycles.
(This article belongs to the Section Communications)
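
The Squeeze-and-Excitation attention used here for dynamic feature weighting follows a well-known pattern: a global-average "squeeze", a bottleneck MLP "excitation", and sigmoid channel gates. A standard sketch (sizes are illustrative, not this detector's configuration):

```python
# Standard Squeeze-and-Excitation block: global average pool, bottleneck
# MLP, and sigmoid gates that reweight channels.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gate = self.fc(x.mean(dim=(2, 3)))               # squeeze -> excitation
        return x * gate.view(b, c, 1, 1)                 # per-channel reweighting

bev = torch.randn(2, 256, 176, 200)                      # BEV feature map
print(SEBlock(256)(bev).shape)                           # torch.Size([2, 256, 176, 200])
```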