Search Results (315)

Search Parameters:
Keywords = boundary semantic information

22 pages, 2919 KiB  
Article
A Feasible Domain Segmentation Algorithm for Unmanned Vessels Based on Coordinate-Aware Multi-Scale Features
by Zhengxun Zhou, Weixian Li, Yuhan Wang, Haozheng Liu and Ning Wu
J. Mar. Sci. Eng. 2025, 13(8), 1387; https://doi.org/10.3390/jmse13081387 - 22 Jul 2025
Abstract
The accurate extraction of navigational regions from images of navigational waters plays a key role in ensuring on-water safety and the automation of unmanned vessels. Nonetheless, current methods encounter significant challenges in handling fluctuations in water surface illumination, reflective disturbances, and surface undulations, among other disruptions, which make rapid and precise boundary segmentation difficult to achieve. To cope with these challenges, in this paper we propose a coordinate-aware multi-scale feature network (GASF-ResNet) for water segmentation. The method integrates the Global Grouping Coordinate Attention (GGCA) module into the four downsampling branches of ResNet-50, enhancing the model's ability to capture target features and improving feature representation. To expand the model's receptive field and boost its capability to extract features of multi-scale targets, Atrous Spatial Pyramid Pooling (ASPP) is used. Combined with multi-scale feature fusion, this effectively enhances the expression of semantic information at different scales and improves the segmentation accuracy of the model in complex water environments. The experimental results show that the mean pixel accuracy (mPA) and mean intersection over union (mIoU) of the proposed method are 99.31% and 98.61% on the self-made dataset and 98.55% and 99.27% on the USVInland unmanned vessel dataset, respectively, significantly better than the results obtained by existing mainstream models. These results help overcome the background interference caused by water surface reflection and uneven lighting and enable accurate segmentation of the water area for the safe navigation of unmanned vessels, which is of great value for their stable operation in complex environments. Full article
(This article belongs to the Section Ocean Engineering)
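The multi-scale receptive-field idea behind ASPP (atrous/dilated convolutions applied in parallel at several rates, then stacked) can be sketched in NumPy. This is an illustrative toy under assumed 1-D inputs, an invented kernel, and invented rates, not the paper's implementation:

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """'Atrous' 1-D convolution: the taps of w are spaced `rate` apart
    (zero-padded so the output keeps the input length)."""
    k = len(w)
    pad = (k - 1) * rate // 2
    xp = np.pad(x, pad)
    return np.array([sum(w[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

def aspp_1d(x, w, rates=(1, 2, 4)):
    """Parallel dilated branches over the same input, stacked channel-wise:
    each branch sees a wider context without extra parameters."""
    return np.stack([dilated_conv1d(x, w, r) for r in rates])
```

Widening the rate enlarges the receptive field while the kernel size stays fixed, which is the property the abstract leans on for multi-scale targets.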

18 pages, 1332 KiB  
Article
SC-LKM: A Semantic Chunking and Large Language Model-Based Cybersecurity Knowledge Graph Construction Method
by Pu Wang, Yangsen Zhang, Zicheng Zhou and Yuqi Wang
Electronics 2025, 14(14), 2878; https://doi.org/10.3390/electronics14142878 - 18 Jul 2025
Abstract
In cybersecurity, constructing an accurate knowledge graph is vital for discovering key entities and relationships in security incidents buried in vast unstructured threat reports. Traditional knowledge-graph construction pipelines based on handcrafted rules or conventional machine learning models falter when the data scale and linguistic variety grow. GraphRAG, a retrieval-augmented generation (RAG) framework that splits documents into fixed-length chunks and then retrieves the most relevant ones for generation, offers a scalable alternative yet still suffers from fragmentation and semantic gaps that erode graph integrity. To resolve these issues, this paper proposes SC-LKM, a cybersecurity knowledge-graph construction method that couples the GraphRAG backbone with hierarchical semantic chunking. SC-LKM applies semantic chunking to build a cybersecurity knowledge graph that avoids the fragmentation and inconsistency seen in prior work. The semantic chunking method first respects the native document hierarchy and then refines boundaries with topic similarity and named-entity continuity, maintaining logical coherence while limiting information loss during the fine-grained processing of unstructured text. SC-LKM further integrates the semantic comprehension capacity of Qwen2.5-14B-Instruct, markedly boosting extraction accuracy and reasoning quality. Experimental results show that SC-LKM surpasses baseline systems in entity-recognition coverage, topology density, and semantic consistency. Full article
(This article belongs to the Section Artificial Intelligence)
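The hierarchical semantic-chunking step can be approximated as a greedy boundary refinement over sentence embeddings: start a new chunk whenever the next sentence drifts too far from the current chunk's topic centroid. A minimal sketch, assuming embeddings are precomputed elsewhere; the fixed threshold and plain cosine test are simplifications of the paper's topic-similarity and named-entity-continuity criteria:

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Greedy chunking: open a new chunk when the next sentence's embedding is
    insufficiently similar to the running centroid of the current chunk."""
    chunks, current, vecs = [], [sentences[0]], [embeddings[0]]
    for s, e in zip(sentences[1:], embeddings[1:]):
        centroid = np.mean(vecs, axis=0)
        if cos(centroid, e) < threshold:   # topic break: close the chunk
            chunks.append(current)
            current, vecs = [s], [e]
        else:                              # same topic: extend the chunk
            current.append(s)
            vecs.append(e)
    chunks.append(current)
    return chunks
```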

21 pages, 5616 KiB  
Article
Symmetry-Guided Dual-Branch Network with Adaptive Feature Fusion and Edge-Aware Attention for Image Tampering Localization
by Zhenxiang He, Le Li and Hanbin Wang
Symmetry 2025, 17(7), 1150; https://doi.org/10.3390/sym17071150 - 18 Jul 2025
Abstract
When faced with diverse types of image tampering and image quality degradation in real-world scenarios, traditional image tampering localization methods often struggle to balance boundary accuracy and robustness. To address these issues, this paper proposes a symmetry-guided dual-branch image tampering localization network, FENet (Fusion-Enhanced Network), that integrates adaptive feature fusion and edge attention mechanisms. The method is based on a structurally symmetric dual-branch architecture, which extracts RGB semantic features and SRM noise residual information to comprehensively capture the fine-grained differences in tampered regions at the visual and statistical levels. To effectively fuse these features, this paper designs a self-calibrating fusion module (SCF), which introduces a content-aware dynamic weighting mechanism to adaptively adjust the importance of the different feature branches, thereby enhancing the discriminative power and expressiveness of the fused features. Furthermore, considering that image tampering often involves abnormal changes in edge structures, we propose an edge-aware coordinate attention mechanism (ECAM). By jointly modeling spatial position information and edge-guided information, the model is guided to focus more precisely on potential tampering boundaries, enhancing its boundary detection and localization capabilities. Experiments on the public Columbia, CASIA, and NIST16 datasets demonstrate that FENet achieves significantly better results than existing methods. We also analyze the model's performance under various image quality conditions, such as JPEG compression and Gaussian blur, demonstrating its robustness in real-world scenarios. Experiments in Facebook, Weibo, and WeChat scenarios show that our method achieves average F1 scores that are 2.8%, 3.0%, and 5.6% higher than those of existing state-of-the-art methods, respectively. Full article
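The content-aware dynamic weighting in a module like SCF can be illustrated with a two-branch softmax gate: each branch is scored from its global statistics, and the fused map is a convex combination of the branches. The scalar scoring weights here stand in for the learned gating network and are assumptions, not FENet's actual parameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def self_calibrating_fuse(rgb_feat, noise_feat, w_rgb, w_noise):
    """Score each branch from its global average, softmax the scores, and
    blend the branches with the resulting (sum-to-one) weights."""
    scores = np.array([w_rgb * rgb_feat.mean(), w_noise * noise_feat.mean()])
    a = softmax(scores)
    return a[0] * rgb_feat + a[1] * noise_feat, a
```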

19 pages, 14033 KiB  
Article
SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection
by Jiahao Tang, Boyuan Gu, Tianyou Li and Ying-Bo Lu
Remote Sens. 2025, 17(14), 2380; https://doi.org/10.3390/rs17142380 - 10 Jul 2025
Abstract
Lunar crater detection plays a crucial role in geological analysis and the advancement of lunar exploration. Accurate identification of craters is also essential for constructing high-resolution topographic maps and supporting mission planning in future lunar exploration efforts. However, lunar craters often suffer from insufficient feature representation due to their small size and blurred boundaries. In addition, the visual similarity between craters and the surrounding terrain further exacerbates background confusion. These challenges significantly hinder detection performance in remote sensing imagery and underscore the necessity of enhancing both local feature representation and global semantic reasoning. In this paper, we propose a novel Spatial Channel Fusion and Context-Aware YOLO (SCCA-YOLO) model built upon the YOLO11 framework. Specifically, the Context-Aware Module (CAM) employs a multi-branch dilated convolutional structure to enhance feature richness and expand the local receptive field, thereby strengthening the feature extraction capability. The Joint Spatial and Channel Fusion Module (SCFM) fuses spatial and channel information to model the global relationships between craters and the background, effectively suppressing background noise and reinforcing feature discrimination. In addition, the improved Channel Attention Concatenation (CAC) strategy adaptively learns channel-wise importance weights during feature concatenation, further optimizing multi-scale semantic feature fusion and enhancing the model's sensitivity to critical crater features. The proposed method is validated on a self-constructed Chang'e 6 dataset covering the landing site and its surrounding areas. Experimental results demonstrate that our model achieves an mAP@0.5 of 96.5% and an mAP@0.5:0.95 of 81.5%, outperforming other mainstream detection models, including the YOLO family of algorithms. These findings highlight the potential of SCCA-YOLO for high-precision lunar crater detection and provide valuable insights into future lunar surface analysis. Full article
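Channel-attention-weighted concatenation in the spirit of CAC can be sketched as: concatenate two feature maps along the channel axis, derive one weight per channel from its global average, and rescale. The sigmoid-of-mean gate is a stand-in for the learned attention; shapes and values below are invented:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_concat(feats_a, feats_b):
    """Concatenate two (C, H, W) feature maps channel-wise, then reweight each
    channel by a sigmoid of its global average (proxy for a learned gate)."""
    cat = np.concatenate([feats_a, feats_b], axis=0)   # (C1 + C2, H, W)
    gates = sigmoid(cat.mean(axis=(1, 2)))             # one weight per channel
    return cat * gates[:, None, None]
```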

15 pages, 1359 KiB  
Article
Phoneme-Aware Hierarchical Augmentation and Semantic-Aware SpecAugment for Low-Resource Cantonese Speech Recognition
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Sensors 2025, 25(14), 4288; https://doi.org/10.3390/s25144288 - 9 Jul 2025
Abstract
Cantonese Automatic Speech Recognition (ASR) is hindered by tonal complexity, acoustic diversity, and a lack of labelled data. This study proposes a phoneme-aware hierarchical augmentation framework that enhances performance without additional annotation. A Phoneme Substitution Matrix (PSM), built from Montreal Forced Aligner alignments and Tacotron-2 synthesis, injects adversarial phoneme variants into both transcripts and their aligned audio segments, enlarging pronunciation diversity. Concurrently, a semantic-aware SpecAugment scheme exploits wav2vec 2.0 attention heat maps and keyword boundaries to adaptively mask informative time–frequency regions; a reinforcement-learning controller tunes the masking schedule online, forcing the model to rely on a wider context. On the Common Voice Cantonese 50 h subset, the combined strategy reduces the character error rate (CER) from 26.17% to 16.88% with wav2vec 2.0 and from 38.83% to 23.55% with Zipformer. At 100 h, the CER further drops to 4.27% and 2.32%, yielding relative gains of 32–44%. Ablation studies confirm that phoneme-level and masking components provide complementary benefits. The framework offers a practical, model-independent path toward accurate ASR for Cantonese and other low-resource tonal languages. This paper presents an intelligent sensing-oriented modeling framework for speech signals, which is suitable for deployment on edge or embedded systems to process input from audio sensors (e.g., microphones) and shows promising potential for voice-interactive terminal applications. Full article
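A semantic-aware time mask of the kind described above (zeroing the frames an attention profile scores highest, so the model must recover them from context) might look like the following sketch; the attention vector is assumed to come from a model such as wav2vec 2.0, and the reinforcement-learning masking controller is omitted:

```python
import numpy as np

def attention_time_mask(spec, attn, n_mask=2, width=1):
    """Zero out the `n_mask` most-attended time frames of a (freq, time)
    spectrogram. attn: one score per time frame."""
    masked = spec.copy()
    order = np.argsort(attn)[::-1]               # most-attended frames first
    for t in order[:n_mask]:
        lo = max(0, t - width // 2)
        hi = min(spec.shape[1], t + width // 2 + 1)
        masked[:, lo:hi] = 0.0
    return masked
```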

28 pages, 10581 KiB  
Article
A Textual Semantic Analysis Framework Integrating Geographic Metaphors and GIS-Based Spatial Analysis Methods
by Yu Liu, Zhen Ren, Kaifeng Wang, Qin Tian, Xi Kuai and Sheng Li
Symmetry 2025, 17(7), 1064; https://doi.org/10.3390/sym17071064 - 4 Jul 2025
Abstract
Geographic information systems (GISs) have shown considerable promise in enhancing textual semantic analysis. Current textual semantic analysis methods face significant limitations in accurately delineating semantic boundaries, identifying semantic clustering patterns, and representing knowledge evolution. To address these issues, this study proposes a framework that innovatively introduces GIS methods into textual semantic analysis and aligns them with the conceptual foundation of geographical metaphor theory. Specifically, word embedding models are employed to endow semantic primitives with comprehensive, high-dimensional semantic representations. GIS methods and geographical metaphors are subsequently utilized to project both semantic primitives and their relationships into a low-dimensional geospatial analog, thereby constructing a semantic space model that facilitates accurate delineation of semantic boundaries. On the basis of this model, spatial correlation measurements are adopted to reveal underlying semantic patterns, while knowledge evolution is represented using ArcGIS 10.7-based visualization techniques. Experiments on social media data validate the effectiveness of the framework in semantic boundary delineation and clustering pattern identification. Moreover, the framework supports dynamic three-dimensional visualization of topic evolution. Importantly, by employing specialized visualization methods, the proposed framework enables the intuitive representation of semantic symmetry and asymmetry within semantic spaces. Full article
(This article belongs to the Special Issue Applications Based on Symmetry/Asymmetry in Data Mining)

22 pages, 8689 KiB  
Article
Transfer Learning-Based Accurate Detection of Shrub Crown Boundaries Using UAS Imagery
by Jiawei Li, Huihui Zhang and David Barnard
Remote Sens. 2025, 17(13), 2275; https://doi.org/10.3390/rs17132275 - 3 Jul 2025
Abstract
The accurate delineation of shrub crown boundaries is critical for ecological monitoring, land management, and understanding vegetation dynamics in fragile ecosystems such as semi-arid shrublands. While traditional image processing techniques often struggle with overlapping canopies, deep learning methods, such as convolutional neural networks (CNNs), offer promising solutions for precise segmentation. This study employed high-resolution imagery captured by unmanned aircraft systems (UASs) throughout the shrub growing season and explored the effectiveness of transfer learning for both semantic segmentation (Attention U-Net) and instance segmentation (Mask R-CNN). It utilized pre-trained model weights from two previous studies that originally focused on tree crown delineation to improve shrub crown segmentation in non-forested areas. Results showed that transfer learning alone did not achieve satisfactory performance due to differences in object characteristics and environmental conditions. However, fine-tuning the pre-trained models by unfreezing additional layers improved segmentation accuracy by around 30%. Fine-tuned pre-trained models show limited sensitivity to shrubs in the early growing season (April to June) and improved performance when shrub crowns become more spectrally unique in late summer (July to September). These findings highlight the value of combining pre-trained models with targeted fine-tuning to enhance model adaptability in complex remote sensing environments. The proposed framework demonstrates a scalable solution for ecological monitoring in data-scarce regions, supporting informed land management decisions and advancing the use of deep learning for long-term environmental monitoring. Full article
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

32 pages, 5287 KiB  
Article
UniHSFormer X for Hyperspectral Crop Classification with Prototype-Routed Semantic Structuring
by Zhen Du, Senhao Liu, Yao Liao, Yuanyuan Tang, Yanwen Liu, Huimin Xing, Zhijie Zhang and Donghui Zhang
Agriculture 2025, 15(13), 1427; https://doi.org/10.3390/agriculture15131427 - 2 Jul 2025
Abstract
Hyperspectral imaging (HSI) plays a pivotal role in modern agriculture by capturing fine-grained spectral signatures that support crop classification, health assessment, and land-use monitoring. However, the transition from raw spectral data to reliable semantic understanding remains challenging—particularly under fragmented planting patterns, spectral ambiguity, and spatial heterogeneity. To address these limitations, we propose UniHSFormer-X, a unified transformer-based framework that reconstructs agricultural semantics through prototype-guided token routing and hierarchical context modeling. Unlike conventional models that treat spectral–spatial features uniformly, UniHSFormer-X dynamically modulates information flow based on class-aware affinities, enabling precise delineation of field boundaries and robust recognition of spectrally entangled crop types. Evaluated on three UAV-based benchmarks—WHU-Hi-LongKou, HanChuan, and HongHu—the model achieves up to 99.80% overall accuracy and 99.28% average accuracy, outperforming state-of-the-art CNN, ViT, and hybrid architectures across both structured and heterogeneous agricultural scenarios. Ablation studies further reveal the critical role of semantic routing and prototype projection in stabilizing model behavior, while parameter surface analysis demonstrates consistent generalization across diverse configurations. Beyond high performance, UniHSFormer-X offers a semantically interpretable architecture that adapts to the spatial logic and compositional nuance of agricultural imagery, representing a forward step toward robust and scalable crop classification. Full article
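Prototype-guided token routing reduces, at its core, to assigning each token to the class prototype with the highest affinity. A minimal cosine-affinity sketch (the prototypes here are invented, and the paper's routing is richer than a hard argmax):

```python
import numpy as np

def route_tokens(tokens, prototypes):
    """Assign each token to the class prototype with the highest cosine
    affinity. tokens: (N, D); prototypes: (K, D). Returns (N,) class indices."""
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    affinity = t @ p.T                 # (N, K) cosine similarities
    return affinity.argmax(axis=1)
```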

32 pages, 6860 KiB  
Article
Participatory Drawing Methodology for Light in Architecture: Drawing Experienced Space
by Ulrika Wänström Lindh
Buildings 2025, 15(13), 2278; https://doi.org/10.3390/buildings15132278 - 28 Jun 2025
Abstract
Visual techniques can capture information about visual experiences in ways that differ from speaking and writing. This article examines drawing as a data collection method in architectural lighting research. Lighting design is a rapidly growing profession, and there is a need to build research knowledge of people's spatial experience of lit environments and to develop methods that capture it. In a study in which 16 participants' experiences of different light scenarios were collected through sketches, semantic rating scales, and in-depth interviews, the participants drew the boundaries of what they experienced as "the room" and the spatial directions inside it. The resulting 64 sketches were compared in different combinations to detect patterns. The results showed that the drawing method worked well for everybody, both those with and those without professional drawing experience. The method, named Drawing Experienced Space, helped participants, especially those without training, find words and expressions for their experiences. Full article
(This article belongs to the Special Issue Lighting Design for the Built Environment)

25 pages, 4835 KiB  
Article
Object Tracking Algorithm Based on Multi-Layer Feature Fusion and Semantic Enhancement
by Jing Wang, Yanru Wang, Dan Yuan, Yuxiang Que, Weichao Huang and Yuan Wei
Appl. Sci. 2025, 15(13), 7228; https://doi.org/10.3390/app15137228 - 26 Jun 2025
Abstract
The TransT object tracking algorithm, built on a Transformer architecture, effectively integrates deep feature extraction with attention mechanisms, thereby enhancing the stability and accuracy of the tracker. However, the algorithm exhibits insufficient tracking accuracy and bounding-box drift when dealing with similar background clutter, which directly affects the subsequent tracking process. To overcome this problem, this paper constructs a semantic enhancement model that utilizes multi-layer feature representations extracted from deep networks and correlates and fuses shallow features with deep features using cross-attention. At the same time, to adapt to changes in the object's surroundings and establish good discrimination against similar objects, this paper proposes a dynamic mask strategy to optimize the attention allocation mechanism. Finally, an object template update mechanism compares the spatio-temporal information of successive frames to update the object template in time, improving the adaptability of the model and further enhancing its tracking performance in complex scenes. Experimental comparisons demonstrate that the proposed algorithm can effectively handle similar background clutter, leading to a significant improvement in the overall performance of the tracking model. Full article
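Correlating shallow and deep features with cross-attention, as described above, follows the standard pattern: deep features form the queries, shallow features the keys and values. A single-head NumPy sketch with invented dimensions and no learned projections:

```python
import numpy as np

def cross_attention(deep_q, shallow_kv):
    """Single-head cross-attention: deep features (N, D) query shallow features
    (M, D), pulling fine detail into the semantic stream. Returns (output, weights)."""
    d = deep_q.shape[1]
    scores = deep_q @ shallow_kv.T / np.sqrt(d)          # (N, M) similarity logits
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)        # each row sums to 1
    return attn @ shallow_kv, attn
```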

25 pages, 18500 KiB  
Article
DBFormer: A Dual-Branch Adaptive Remote Sensing Image Resolution Fine-Grained Weed Segmentation Network
by Xiangfei She, Zhankui Tang, Xin Pan, Jian Zhao and Wenyu Liu
Remote Sens. 2025, 17(13), 2203; https://doi.org/10.3390/rs17132203 - 26 Jun 2025
Abstract
Remote sensing image segmentation holds significant application value in precision agriculture, environmental monitoring, and other fields. However, in the task of fine-grained segmentation of weeds and crops, traditional deep learning methods often fail to balance global semantic information with local detail features, resulting in over-segmentation or under-segmentation issues. To address this challenge, this paper proposes a segmentation model based on a dual-branch Transformer architecture—DBFormer—to enhance the accuracy of weed detection in remote sensing images. This approach integrates the following techniques: (1) a dynamic context aggregation branch (DCA-Branch) with adaptive downsampling attention to model long-range dependencies and suppress background noise, and (2) a local detail enhancement branch (LDE-Branch) leveraging depthwise-separable convolutions with residual refinement to preserve and sharpen small weed edges. An Edge-Aware Loss module further reinforces boundary clarity. On the Tobacco Dataset, DBFormer achieves an mIoU of 86.48%, outperforming the best baseline by 3.83%; on the Sunflower Dataset, it reaches 85.49% mIoU, a 4.43% absolute gain. These results demonstrate that our dual-branch synergy effectively resolves the global–local conflict, delivering superior accuracy and stability in the context of practical agricultural applications. Full article

24 pages, 595 KiB  
Article
An Empirical Comparison of Machine Learning and Deep Learning Models for Automated Fake News Detection
by Yexin Tian, Shuo Xu, Yuchen Cao, Zhongyan Wang and Zijing Wei
Mathematics 2025, 13(13), 2086; https://doi.org/10.3390/math13132086 - 25 Jun 2025
Abstract
Detecting fake news is a critical challenge in natural language processing (NLP), demanding solutions that balance accuracy, interpretability, and computational efficiency. Despite advances in NLP, systematic empirical benchmarks that directly compare both classical and deep models—across varying input richness and with careful attention to interpretability and computational tradeoffs—remain underexplored. In this study, we systematically evaluate the mathematical foundations and empirical performance of five representative models for automated fake news classification: three classical machine learning algorithms (Logistic Regression, Random Forest, and Light Gradient Boosting Machine) and two state-of-the-art deep learning architectures (A Lite Bidirectional Encoder Representations from Transformers—ALBERT and Gated Recurrent Units—GRUs). Leveraging the large-scale WELFake dataset, we conduct rigorous experiments under both headline-only and headline-plus-content input scenarios, providing a comprehensive assessment of each model’s capability to capture linguistic, contextual, and semantic cues. We analyze each model’s optimization framework, decision boundaries, and feature importance mechanisms, highlighting the empirical tradeoffs between representational capacity, generalization, and interpretability. Our results show that transformer-based models, especially ALBERT, achieve state-of-the-art performance (macro F1 up to 0.99) with rich context, while classical ensembles remain viable for constrained settings. These findings directly inform practical fake news detection. Full article
(This article belongs to the Special Issue Mathematical Foundations in NLP: Applications and Challenges)

17 pages, 1728 KiB  
Article
Spatiotemporal Contextual 3D Semantic Segmentation for Intelligent Outdoor Mining
by Wenhao Yang, Liqun Kuang, Song Wang, Xie Han, Rong Guo, Yongpeng Wang, Haifeng Yue and Tao Wei
Algorithms 2025, 18(7), 383; https://doi.org/10.3390/a18070383 - 24 Jun 2025
Abstract
Three-dimensional semantic segmentation plays a crucial role in accurately identifying terrain features and objects by effectively extracting 3D spatial information from the environment. However, the inherent sparsity of point clouds and unclear terrain boundaries in outdoor mining environments significantly complicate the recognition process. To address these challenges, we propose a novel 3D semantic segmentation network that incorporates spatiotemporal feature aggregation. Specifically, we introduce the Gated Spatiotemporal Clue Encoder, which extracts spatiotemporal context from historical multi-frame point cloud data and combines it with the current scan frame to enhance feature representation. Additionally, the Spatiotemporal Feature State Space Module is proposed to efficiently model long-term spatiotemporal features while minimizing computational and memory overhead. Experimental results show that the proposed method outperforms the baseline model, achieving a 2.1% mIoU improvement on the self-constructed TZMD_NUC outdoor mining dataset and a 1.9% average improvement on the public SemanticKITTI dataset, while also improving computational efficiency. These results validate the effectiveness of the proposed method, offering a promising solution for 3D semantic segmentation in complex, real-world mining environments where both computational efficiency and accuracy are critical. Full article
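The gated blending of historical and current-frame features can be illustrated with an element-wise sigmoid gate over the disagreement between the current frame and the history mean. This is a toy stand-in for the learned Gated Spatiotemporal Clue Encoder, with all shapes invented:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_temporal_fuse(current, history, gate_bias=0.0):
    """Blend current-frame features with the mean of historical frames through
    a per-element sigmoid gate; where the frames agree, the gate sits at 0.5."""
    hist = np.mean(history, axis=0)
    gate = sigmoid(current - hist + gate_bias)
    return gate * current + (1.0 - gate) * hist
```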

20 pages, 67212 KiB  
Article
KPV-UNet: KAN PP-VSSA UNet for Remote Image Segmentation
by Shuiping Zhang, Qiang Rao, Lei Wang, Tang Tang and Chen Chen
Electronics 2025, 14(13), 2534; https://doi.org/10.3390/electronics14132534 - 23 Jun 2025
Abstract
Semantic segmentation of remote sensing images is a key technology for land cover interpretation and target identification. Although convolutional neural networks (CNNs) have achieved remarkable success in this field, their inherent limitation of local receptive fields restricts their ability to model long-range dependencies and global contextual information. As a result, CNN-based methods often struggle to capture the comprehensive spatial context necessary for accurate segmentation in complex remote sensing scenes, leading to issues such as the misclassification of small objects and blurred or imprecise object boundaries. To address these problems, this paper proposes a new hybrid architecture called KPV-UNet, which integrates the Kolmogorov–Arnold Network (KAN) and the Pyramid Pooling Visual State Space Attention (PP-VSSA) block. KPV-UNet introduces a deep feature refinement module based on KAN and incorporates PP-VSSA to enable scalable long-range modeling. This design effectively captures global dependencies and abundant localized semantic content extracted from complex feature spaces, overcoming CNNs' limitations in modeling long-range dependencies and global context in large-scale complex scenes. In addition, we designed an Auxiliary Local Monitoring (ALM) block that significantly enhances KPV-UNet's perception of local content. Experimental results demonstrate that KPV-UNet outperforms state-of-the-art methods on the Vaihingen, LoveDA Urban, and WHDLD datasets, achieving mIoU scores of 84.03%, 51.27%, and 62.87%, respectively. The proposed method not only improves segmentation accuracy but also produces clearer and better-connected object boundaries in visual results. Full article
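Pyramid pooling, the "PP" in PP-VSSA, pools a feature map onto coarse grids and broadcasts each cell back to full resolution so every position sees regional context. A minimal single-channel sketch (bin sizes invented, and H and W assumed divisible by each bin):

```python
import numpy as np

def pyramid_pool(feat, bins=(1, 2)):
    """Pool an (H, W) map onto coarse grids, broadcast each cell back to full
    resolution, and stack with the input: (1 + len(bins), H, W)."""
    h, w = feat.shape
    maps = [feat]
    for b in bins:
        pooled = feat.reshape(b, h // b, b, w // b).mean(axis=(1, 3))  # (b, b)
        maps.append(np.repeat(np.repeat(pooled, h // b, axis=0), w // b, axis=1))
    return np.stack(maps)
```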

16 pages, 1058 KiB  
Article
Multi-Scale Context Enhancement Network with Local–Global Synergy Modeling Strategy for Semantic Segmentation on Remote Sensing Images
by Qibing Ma, Hongning Liu, Yifan Jin and Xinyue Liu
Electronics 2025, 14(13), 2526; https://doi.org/10.3390/electronics14132526 - 21 Jun 2025
Abstract
Semantic segmentation of remote sensing images is a fundamental task in geospatial analysis and Earth observation research and has a wide range of applications in urban planning, land cover classification, and ecological monitoring. In complex geographic scenes, low target-background discriminability in overhead views (e.g., indistinct boundaries, ambiguous textures, and low contrast) significantly complicates local–global information modeling and results in blurred boundaries and classification errors in model predictions. To address this issue, this paper proposes a novel Multi-Scale Local–Global Mamba Feature Pyramid Network (MLMFPN) built around a local–global information synergy modeling strategy, which guides and enhances cross-scale contextual information interaction during feature fusion to obtain high-quality semantic features that serve as cues for precise semantic reasoning. MLMFPN comprises two core components: the Local–Global Align Mamba Fusion (LGAMF) module and the Context-Aware Cross-attention Interaction Module (CCIM). Specifically, LGAMF designs a local-enhanced global information modeling scheme using asymmetric convolutions that jointly model receptive fields in the vertical and horizontal directions, and further introduces the Vision Mamba structure to facilitate local–global information fusion. CCIM introduces positional encoding and cross-attention mechanisms to enrich the global-spatial semantic representation during multi-scale context information interaction, thereby achieving refined segmentation. The proposed method is evaluated on the ISPRS Potsdam and Vaihingen datasets, and the results verify its effectiveness. Full article
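The asymmetric-convolution idea in LGAMF (separate horizontal 1×k and vertical k×1 filters covering the two directions) can be sketched with plain mean filters; the kernels are illustrative, not the learned ones:

```python
import numpy as np

def conv2d_same(x, k):
    """2-D correlation with zero 'same' padding (odd kernel dimensions)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def asymmetric_branch(x, k=3):
    """Sum of a 1×k (horizontal) and a k×1 (vertical) mean filter: directional
    receptive fields along the two axes, as in the local–global branch."""
    horiz = conv2d_same(x, np.ones((1, k)) / k)
    vert = conv2d_same(x, np.ones((k, 1)) / k)
    return horiz + vert
```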
