Search Results (384)

Search Parameters:
Keywords = dual-branch models

27 pages, 7645 KiB  
Article
VMMT-Net: A Dual-Branch Parallel Network Combining Visual State Space Model and Mix Transformer for Land–Sea Segmentation of Remote Sensing Images
by Jiawei Wu, Zijian Liu, Zhipeng Zhu, Chunhui Song, Xinghui Wu and Haihua Xing
Remote Sens. 2025, 17(14), 2473; https://doi.org/10.3390/rs17142473 - 16 Jul 2025
Abstract
Land–sea segmentation is a fundamental task in remote sensing image analysis, and plays a vital role in dynamic coastline monitoring. The complex morphology and blurred boundaries of coastlines in remote sensing imagery make fast and accurate segmentation challenging. Recent deep learning approaches lack the ability to model spatial continuity effectively, thereby limiting a comprehensive understanding of coastline features in remote sensing imagery. To address this issue, we have developed VMMT-Net, a novel dual-branch semantic segmentation framework. By constructing a parallel heterogeneous dual-branch encoder, VMMT-Net integrates the complementary strengths of the Mix Transformer and the Visual State Space Model, enabling comprehensive modeling of local details, global semantics, and spatial continuity. We design a Cross-Branch Fusion Module to facilitate deep feature interaction and collaborative representation across branches, and implement a customized decoder module that enhances the integration of multiscale features and improves boundary refinement of coastlines. Extensive experiments conducted on two benchmark remote sensing datasets, GF-HNCD and BSD, demonstrate that the proposed VMMT-Net outperforms existing state-of-the-art methods in both quantitative metrics and visual quality. Specifically, the model achieves mean F1-scores of 98.48% (GF-HNCD) and 98.53% (BSD) and mean intersection-over-union values of 97.02% (GF-HNCD) and 97.11% (BSD). The model maintains reasonable computational complexity, with only 28.24 M parameters and 25.21 GFLOPs, striking a favorable balance between accuracy and efficiency. These results indicate the strong generalization ability and practical applicability of VMMT-Net in real-world remote sensing segmentation tasks. Full article
(This article belongs to the Special Issue Application of Remote Sensing in Coastline Monitoring)
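
To make the cross-branch idea concrete (two parallel encoder streams that exchange information before fusion), here is a minimal PyTorch sketch. The gating scheme, layer names, and dimensions are assumptions for illustration, not the authors' VMMT-Net code.

```python
# Minimal sketch of a generic cross-branch fusion between two parallel
# encoder streams (e.g., a Transformer branch and a state-space branch).
# All names and dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Each branch gates the other using its global channel statistics.
        self.gate_a = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                    nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                    nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_a, feat_b):
        a = feat_a * self.gate_b(feat_b)   # branch A modulated by branch B
        b = feat_b * self.gate_a(feat_a)   # branch B modulated by branch A
        return self.merge(torch.cat([a, b], dim=1))

fusion = CrossBranchFusion(64)
x_a, x_b = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
print(fusion(x_a, x_b).shape)   # torch.Size([2, 64, 32, 32])
```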

23 pages, 3626 KiB  
Article
A Framework for Predicting Winter Wheat Yield in Northern China with Triple Cross-Attention and Multi-Source Data Fusion
by Shuyan Pan and Liqun Liu
Plants 2025, 14(14), 2206; https://doi.org/10.3390/plants14142206 - 16 Jul 2025
Abstract
To address the issue that existing yield prediction methods do not fully capture the interaction between multiple factors, we propose a winter wheat yield prediction framework with triple cross-attention for multi-source data fusion. This framework consists of three modules: a multi-source data processing module, a multi-source feature fusion module, and a yield prediction module. The multi-source data processing module collects satellite, climate, and soil data based on the winter wheat planting range, and constructs a multi-source feature sequence set by combining statistical data. The multi-source feature fusion module first extracts deeper-level feature information based on the characteristics of the different data sources, and then performs multi-source feature fusion through a triple cross-attention fusion mechanism. The encoder of the yield prediction module adds a graph attention mechanism, forming a dual branch with the original multi-head self-attention mechanism to capture global dependencies while preserving local feature information. The decoder generates the final predicted output. The results show that: (1) using 2021 and 2022 as test sets, the mean absolute error of our method is 385.99 kg/hm² and the root mean squared error is 501.94 kg/hm², both lower than those of the other methods; (2) the jointing-heading stage (March to April) is the most crucial period affecting winter wheat yield; and (3) our model can predict the final winter wheat yield nearly a month in advance. Full article
(This article belongs to the Section Plant Modeling)
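
As a rough illustration of cross-attention fusion across three data sources, the sketch below lets each modality's feature sequence attend to the other two. Dimensions, head count, and the concatenation scheme are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class TripleCrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(3)])

    def forward(self, sat, clim, soil):
        feats = [sat, clim, soil]
        fused = []
        for i, query in enumerate(feats):
            # Each modality queries the concatenation of the other two.
            kv = torch.cat([feats[j] for j in range(3) if j != i], dim=1)
            out, _ = self.attn[i](query, kv, kv)
            fused.append(out)
        return torch.cat(fused, dim=1)          # fused sequence for the predictor

fusion = TripleCrossAttention()
sat, clim, soil = (torch.randn(2, 10, 64) for _ in range(3))
print(fusion(sat, clim, soil).shape)            # torch.Size([2, 30, 64])
```
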
27 pages, 6371 KiB  
Article
Growth Stages Discrimination of Multi-Cultivar Navel Oranges Using the Fusion of Near-Infrared Hyperspectral Imaging and Machine Vision with Deep Learning
by Chunyan Zhao, Zhong Ren, Yue Li, Jia Zhang and Weinan Shi
Agriculture 2025, 15(14), 1530; https://doi.org/10.3390/agriculture15141530 - 15 Jul 2025
Abstract
To noninvasively and precisely discriminate among the growth stages of multiple cultivars of navel oranges simultaneously, near-infrared (NIR) hyperspectral imaging (HSI) is fused with machine vision (MV) and deep learning. NIR reflectance spectra and hyperspectral and RGB images for 740 Gannan navel oranges of five cultivars are collected. Based on preprocessed spectra, optimally selected hyperspectral images, and registered RGB images, a dual-branch multi-modal feature fusion convolutional neural network (CNN) model is established. In this model, a spectral branch is designed to extract spectral features reflecting internal compositional variations, while the image branch is utilized to extract external color and texture features from the integration of hyperspectral and RGB images. Finally, growth stages are determined via the fusion of features. To validate the effectiveness of the proposed method, various machine-learning and deep-learning models are compared on single-modal and multi-modal data. The results demonstrate that multi-modal feature fusion of HSI and MV combined with the constructed dual-branch CNN deep-learning model yields excellent growth stage discrimination in navel oranges, achieving an accuracy, recall, precision, F1 score, and kappa coefficient on the testing set of 95.95%, 96.66%, 96.76%, 96.69%, and 0.9481, respectively, providing a promising way to precisely monitor the growth stages of fruits. Full article
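
A minimal sketch of the dual-branch idea (a 1D branch for spectra and a 2D branch for images feeding a shared classifier) is shown below; layer sizes and the five-class head are illustrative assumptions, not the published model.

```python
import torch
import torch.nn as nn

class DualBranchClassifier(nn.Module):
    def __init__(self, n_bands=200, n_classes=5):
        super().__init__()
        self.spectral = nn.Sequential(              # NIR reflectance spectrum
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.image = nn.Sequential(                 # RGB / hyperspectral image
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, spectrum, image):
        return self.head(torch.cat([self.spectral(spectrum),
                                    self.image(image)], dim=1))

model = DualBranchClassifier()
logits = model(torch.randn(4, 1, 200), torch.randn(4, 3, 64, 64))
print(logits.shape)   # torch.Size([4, 5])
```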

25 pages, 1429 KiB  
Article
A Contrastive Semantic Watermarking Framework for Large Language Models
by Jianxin Wang, Xiangze Chang, Chaoen Xiao and Lei Zhang
Symmetry 2025, 17(7), 1124; https://doi.org/10.3390/sym17071124 - 14 Jul 2025
Viewed by 95
Abstract
The widespread deployment of large language models (LLMs) has raised urgent demands for verifiable content attribution and misuse mitigation. Existing text watermarking techniques often struggle in black-box or sampling-based scenarios due to limitations in robustness, imperceptibility, and detection generality. These challenges are particularly critical in open-access settings, where model internals and generation logits are unavailable for attribution. To address these limitations, we propose CWS (Contrastive Watermarking with Semantic Modeling)—a novel keyless watermarking framework that integrates contrastive semantic token selection and shared embedding space alignment. CWS enables context-aware, fluent watermark embedding while supporting robust detection via a dual-branch mechanism: a lightweight z-score statistical test for public verification and a GRU-based semantic decoder for black-box adversarial robustness. Experiments on GPT-2, OPT-1.3B, and LLaMA-7B over C4 and DBpedia datasets demonstrate that CWS achieves F1 scores up to 99.9% and maintains F1 ≥ 93% under semantic rewriting, token substitution, and lossy compression (ε ≤ 0.25, δ ≤ 0.2). The GRU-based detector offers a superior speed–accuracy trade-off (0.42 s/sample) over LSTM and Transformer baselines. These results highlight CWS as a lightweight, black-box-compatible, and semantically robust watermarking method suitable for practical content attribution across LLM architectures and decoding strategies. Furthermore, CWS maintains a symmetrical architecture between embedding and detection stages via shared semantic representations, ensuring structural consistency and robustness. This semantic symmetry helps preserve detection reliability across diverse decoding strategies and adversarial conditions. Full article
(This article belongs to the Section Computer)
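
The public-verification path rests on a standard one-sided z-test. The snippet below is a generic sketch of such a test over tokens flagged by a watermark rule; the null proportion p0 and the decision threshold are assumptions, and this is not the CWS scoring rule itself.

```python
# Illustrative z-score test in the spirit of the public-verification path:
# if a fraction p0 of tokens would carry the watermark signal by chance,
# a one-sided z-test on the observed count flags watermarked text.
from math import sqrt

def watermark_z_score(flags, p0=0.5):
    """flags: list of 0/1 indicating tokens that match the watermark rule."""
    n = len(flags)
    observed = sum(flags)
    return (observed - p0 * n) / sqrt(n * p0 * (1.0 - p0))

flags = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]  # toy example
z = watermark_z_score(flags)
print(f"z = {z:.2f}, watermarked = {z > 4.0}")  # threshold is a design choice
```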

21 pages, 21215 KiB  
Article
ES-Net Empowers Forest Disturbance Monitoring: Edge–Semantic Collaborative Network for Canopy Gap Mapping
by Yutong Wang, Zhang Zhang, Jisheng Xia, Fei Zhao and Pinliang Dong
Remote Sens. 2025, 17(14), 2427; https://doi.org/10.3390/rs17142427 - 12 Jul 2025
Viewed by 192
Abstract
Canopy gaps are vital microhabitats for forest carbon cycling and species regeneration, whose accurate extraction is crucial for ecological modeling and smart forestry. However, traditional monitoring methods have notable limitations: ground-based measurements are inefficient; remote-sensing interpretation is susceptible to terrain and spectral interference; and traditional algorithms exhibit an insufficient feature representation capability. Aiming at overcoming the bottleneck issues of canopy gap identification in mountainous forest regions, we constructed a multi-task deep learning model (ES-Net) integrating an edge–semantic collaborative perception mechanism. First, a refined sample library containing multi-scale interference features was constructed, which included 2808 annotated UAV images. Based on this, a dual-branch feature interaction architecture was designed. A cross-layer attention mechanism was embedded in the semantic segmentation module (SSM) to enhance the discriminative ability for heterogeneous features. Meanwhile, an edge detection module (EDM) was built to strengthen geometric constraints. Results from selected areas in Yunnan Province (China) demonstrate that ES-Net outperforms U-Net, boosting the Intersection over Union (IoU) by 0.86% (95.41% vs. 94.55%), improving the edge coverage rate by 3.14% (85.32% vs. 82.18%), and reducing the Hausdorff Distance by 38.6% (28.26 pixels vs. 46.02 pixels). Ablation studies further verify that the synergy between SSM and EDM yields a 13.0% IoU gain over the baseline, highlighting the effectiveness of joint semantic–edge optimization. This study provides a terrain-adaptive intelligent interpretation method for forest disturbance monitoring and holds significant practical value for advancing smart forestry construction and ecosystem sustainable management. Full article
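
Joint semantic-edge supervision of the kind described can be sketched as a shared encoder with two heads and a weighted sum of losses; the toy architecture and the 0.5 weighting below are illustrative assumptions rather than the ES-Net design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeSemanticNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(32, n_classes, 1)   # semantic segmentation
        self.edge_head = nn.Conv2d(32, 1, 1)          # binary edge map

    def forward(self, x):
        f = self.encoder(x)
        return self.seg_head(f), self.edge_head(f)

net = EdgeSemanticNet()
img = torch.randn(2, 3, 64, 64)
seg_gt = torch.randint(0, 2, (2, 64, 64))
edge_gt = torch.rand(2, 1, 64, 64).round()
seg_logits, edge_logits = net(img)
# Joint optimization: segmentation loss plus a weighted edge loss.
loss = F.cross_entropy(seg_logits, seg_gt) + \
       0.5 * F.binary_cross_entropy_with_logits(edge_logits, edge_gt)
print(float(loss))
```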

24 pages, 2440 KiB  
Article
A Novel Dynamic Context Branch Attention Network for Detecting Small Objects in Remote Sensing Images
by Huazhong Jin, Yizhuo Song, Ting Bai, Kaimin Sun and Yepei Chen
Remote Sens. 2025, 17(14), 2415; https://doi.org/10.3390/rs17142415 - 12 Jul 2025
Viewed by 124
Abstract
Detecting small objects in remote sensing images is challenging due to their size, which results in limited distinctive features. This limitation necessitates the effective use of contextual information for accurate identification. Many existing methods often struggle because they do not dynamically adjust the contextual scope based on the specific characteristics of each target. To address this issue and improve the detection performance of small objects (typically defined as objects with a bounding box area of less than 1024 pixels), we propose a novel backbone network called the Dynamic Context Branch Attention Network (DCBANet). We present the Dynamic Context Scale-Aware (DCSA) Block, which utilizes a multi-branch architecture to generate features with diverse receptive fields. Within each branch, a Context Adaptive Selection Module (CASM) dynamically weights information, allowing the model to focus on the most relevant context. To further enhance performance, we introduce an Efficient Branch Attention (EBA) module that adaptively reweights the parallel branches, prioritizing the most discriminative ones. Finally, to ensure computational efficiency, we design a Dual-Gated Feedforward Network (DGFFN), a lightweight yet powerful replacement for standard FFNs. Extensive experiments conducted on four public remote sensing datasets demonstrate that the DCBANet achieves impressive mAP@0.5 scores of 80.79% on DOTA, 89.17% on NWPU VHR-10, 80.27% on SIMD, and a remarkable 42.4% mAP@0.5:0.95 on the specialized small object benchmark AI-TOD. These results surpass RetinaNet, YOLOF, FCOS, Faster R-CNN, Dynamic R-CNN, SKNet, and Cascade R-CNN, highlighting its effectiveness in detecting small objects in remote sensing images. However, there remains potential for further improvement in multi-scale and weak target detection. Future work will integrate local and global context to enhance multi-scale object detection performance. Full article
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
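
Reweighting parallel branches by their global statistics, as the EBA module does, can be sketched as follows; the pooling-plus-softmax scheme and sizes are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class BranchAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(channels, 1)     # one score per branch descriptor

    def forward(self, branches):                # list of (B, C, H, W) tensors
        stacked = torch.stack(branches, dim=1)            # (B, K, C, H, W)
        desc = stacked.mean(dim=(-2, -1))                 # (B, K, C) pooled
        weights = torch.softmax(self.score(desc), dim=1)  # (B, K, 1)
        return (stacked * weights[..., None, None]).sum(dim=1)

attn = BranchAttention(channels=32)
branches = [torch.randn(2, 32, 16, 16) for _ in range(3)]
print(attn(branches).shape)   # torch.Size([2, 32, 16, 16])
```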

21 pages, 12122 KiB  
Article
RA3T: An Innovative Region-Aligned 3D Transformer for Self-Supervised Sim-to-Real Adaptation in Low-Altitude UAV Vision
by Xingrao Ma, Jie Xie, Di Shao, Aiting Yao and Chengzu Dong
Electronics 2025, 14(14), 2797; https://doi.org/10.3390/electronics14142797 - 11 Jul 2025
Viewed by 131
Abstract
Low-altitude unmanned aerial vehicle (UAV) vision is critically hindered by the Sim-to-Real Gap, where models trained exclusively on simulation data degrade under real-world variations in lighting, texture, and weather. To address this problem, we propose RA3T (Region-Aligned 3D Transformer), a novel self-supervised framework that enables robust Sim-to-Real adaptation. Specifically, we first develop a dual-branch strategy for self-supervised feature learning, integrating Masked Autoencoders and contrastive learning. This approach extracts domain-invariant representations from unlabeled simulated imagery to enhance robustness against occlusion while reducing annotation dependency. Leveraging these learned features, we then introduce a 3D Transformer fusion module that unifies multi-view RGB and LiDAR point clouds through cross-modal attention. By explicitly modeling spatial layouts and height differentials, this component significantly improves recognition of small and occluded targets in complex low-altitude environments. To address persistent fine-grained domain shifts, we finally design region-level adversarial calibration that deploys local discriminators on partitioned feature maps. This mechanism directly aligns texture, shadow, and illumination discrepancies which challenge conventional global alignment methods. Extensive experiments on UAV benchmarks VisDrone and DOTA demonstrate the effectiveness of RA3T. The framework achieves +5.1% mAP on VisDrone and +7.4% mAP on DOTA over the 2D adversarial baseline, particularly on small objects and sparse occlusions, while maintaining real-time performance of 17 FPS at 1024 × 1024 resolution on an RTX 4080 GPU. Visual analysis confirms that the synergistic integration of 3D geometric encoding and local adversarial alignment effectively mitigates domain gaps caused by uneven illumination and perspective variations, establishing an efficient pathway for simulation-to-reality UAV perception. Full article
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
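
The contrastive half of such a dual self-supervised strategy typically reduces to an InfoNCE loss over two augmented views of the same image; the toy encoder, projection size, and temperature below are assumed values, not RA3T's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                        nn.Linear(32, 64))

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))          # positives on the diagonal
    return F.cross_entropy(logits, targets)

view1, view2 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
loss = info_nce(encoder(view1), encoder(view2))
print(float(loss))
```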

17 pages, 5189 KiB  
Article
YOLO-Extreme: Obstacle Detection for Visually Impaired Navigation Under Foggy Weather
by Wei Wang, Bin Jing, Xiaoru Yu, Wei Zhang, Shengyu Wang, Ziqi Tang and Liping Yang
Sensors 2025, 25(14), 4338; https://doi.org/10.3390/s25144338 - 11 Jul 2025
Viewed by 243
Abstract
Visually impaired individuals face significant challenges in navigating safely and independently, particularly under adverse weather conditions such as fog. To address this issue, we propose YOLO-Extreme, an enhanced object detection framework based on YOLOv12, specifically designed for robust navigation assistance in foggy environments. The proposed architecture incorporates three novel modules: the Dual-Branch Bottleneck Block (DBB) for capturing both local spatial and global semantic features, the Multi-Dimensional Collaborative Attention Module (MCAM) for joint spatial-channel attention modeling to enhance salient obstacle features and reduce background interference in foggy conditions, and the Channel-Selective Fusion Block (CSFB) for robust multi-scale feature integration. Comprehensive experiments conducted on the Real-world Task-driven Traffic Scene (RTTS) foggy dataset demonstrate that YOLO-Extreme achieves state-of-the-art detection accuracy and maintains high inference speed, outperforming existing dehazing-and-detect and mainstream object detection methods. To further verify the generalization capability of the proposed framework, we also performed cross-dataset experiments on the Foggy Cityscapes dataset, where YOLO-Extreme consistently demonstrated superior detection performance across diverse foggy urban scenes. The proposed framework significantly improves the reliability and safety of assistive navigation for visually impaired individuals under challenging weather conditions, offering practical value for real-world deployment. Full article
(This article belongs to the Section Navigation and Positioning)
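
Channel-selective fusion of features from two scales can be sketched as predicting per-channel gates from the concatenated maps; the block below illustrates that general idea, not the paper's CSFB.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSelectiveFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, 1), nn.Sigmoid())

    def forward(self, fine, coarse):
        # Upsample the coarser map, then gate each input channel-wise.
        coarse = F.interpolate(coarse, size=fine.shape[-2:], mode="nearest")
        g = self.gate(torch.cat([fine, coarse], dim=1))
        g_fine, g_coarse = g.chunk(2, dim=1)
        return fine * g_fine + coarse * g_coarse

fuse = ChannelSelectiveFusion(64)
fine, coarse = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 20, 20)
print(fuse(fine, coarse).shape)   # torch.Size([1, 64, 40, 40])
```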

12 pages, 4368 KiB  
Article
A Dual-Branch Fusion Model for Deepfake Detection Using Video Frames and Microexpression Features
by Georgios Petmezas, Vazgken Vanian, Manuel Pastor Rufete, Eleana E. I. Almaloglou and Dimitris Zarpalas
J. Imaging 2025, 11(7), 231; https://doi.org/10.3390/jimaging11070231 - 11 Jul 2025
Viewed by 251
Abstract
Deepfake detection has become a critical issue due to the rise of synthetic media and its potential for misuse. In this paper, we propose a novel approach to deepfake detection by combining video frame analysis with facial microexpression features. The dual-branch fusion model utilizes a 3D ResNet18 for spatiotemporal feature extraction and a transformer model to capture microexpression patterns, which are difficult to replicate in manipulated content. We evaluate the model on the widely used FaceForensics++ (FF++) dataset and demonstrate that our approach outperforms existing state-of-the-art methods, achieving 99.81% accuracy and a perfect ROC-AUC score of 100%. The proposed method highlights the importance of integrating diverse data sources for deepfake detection, addressing some of the current limitations of existing systems. Full article
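
The fusion step itself can be as simple as concatenating the two branches' feature vectors before a small classifier; the sketch below shows that late-fusion pattern with made-up feature dimensions, not the published model.

```python
import torch
import torch.nn as nn

class FusionDeepfakeDetector(nn.Module):
    def __init__(self, video_dim=512, micro_dim=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(video_dim + micro_dim, 128), nn.ReLU(),
            nn.Linear(128, 2))                      # real vs. fake

    def forward(self, video_feat, micro_feat):
        # video_feat: spatiotemporal branch output; micro_feat: microexpression branch output
        return self.classifier(torch.cat([video_feat, micro_feat], dim=1))

detector = FusionDeepfakeDetector()
logits = detector(torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)   # torch.Size([4, 2])
```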

30 pages, 5053 KiB  
Article
Dual-Branch Spatial–Spectral Transformer with Similarity Propagation for Hyperspectral Image Classification
by Teng Wen, Heng Wang and Liguo Wang
Remote Sens. 2025, 17(14), 2386; https://doi.org/10.3390/rs17142386 - 10 Jul 2025
Viewed by 252
Abstract
In recent years, Vision Transformers (ViTs) have gained significant traction in the field of hyperspectral image classification due to their advantages in modeling long-range dependency relationships between spectral bands and spatial pixels. However, after stacking multiple Transformer encoders, challenges pertaining to information degradation may emerge during the forward propagation. That is to say, existing Transformer-based methods exhibit certain limitations in retaining and effectively utilizing information throughout their forward transmission. To tackle these challenges, this paper proposes a novel dual-branch spatial–spectral Transformer model that incorporates similarity propagation (DBSSFormer-SP). Specifically, this model first employs a Hybrid Pooling Spatial Channel Attention (HPSCA) module to integrate global information by pooling across different dimensional directions, thereby enhancing its ability to extract salient features. Secondly, we introduce a mechanism for transferring similarity attention that aims to retain and strengthen key semantic features, thus mitigating issues associated with information degradation. Additionally, the Spectral Transformer (SpecFormer) module is employed to capture long-range dependencies among spectral bands. Finally, the extracted spatial and spectral features are fed into a multilayer perceptron (MLP) module for classification. The proposed method is evaluated against several mainstream approaches on four public datasets. Experimental results demonstrate that DBSSFormer-SP exhibits excellent classification performance. Full article
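
Attention that pools along different dimensional directions can be sketched CBAM-style, with channel weights derived from spatial pooling and spatial weights from channel pooling; the module below is a generic stand-in for HPSCA, not the published code.

```python
import torch
import torch.nn as nn

class PooledSpatialChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                        # x: (B, C, H, W)
        avg = x.mean(dim=(-2, -1))               # spatial average pooling
        mx = x.amax(dim=(-2, -1))                # spatial max pooling
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch[..., None, None]              # channel reweighting
        sp = torch.sigmoid(self.spatial_conv(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sp                            # spatial reweighting

attn = PooledSpatialChannelAttention(32)
print(attn(torch.randn(2, 32, 16, 16)).shape)   # torch.Size([2, 32, 16, 16])
```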

29 pages, 16466 KiB  
Article
DMF-YOLO: Dynamic Multi-Scale Feature Fusion Network-Driven Small Target Detection in UAV Aerial Images
by Xiaojia Yan, Shiyan Sun, Huimin Zhu, Qingping Hu, Wenjian Ying and Yinglei Li
Remote Sens. 2025, 17(14), 2385; https://doi.org/10.3390/rs17142385 - 10 Jul 2025
Viewed by 369
Abstract
Target detection in UAV aerial images has found increasingly widespread applications in emergency rescue, maritime monitoring, and environmental surveillance. However, traditional detection models suffer significant performance degradation due to challenges including substantial scale variations, high proportions of small targets, and dense occlusions in UAV-captured images. To address these issues, this paper proposes DMF-YOLO, a high-precision detection network based on YOLOv10 improvements. First, we design Dynamic Dilated Snake Convolution (DDSConv) to adaptively adjust the receptive field and dilation rate of convolution kernels, enhancing local feature extraction for small targets with weak textures. Second, we construct a Multi-scale Feature Aggregation Module (MFAM) that integrates dual-branch spatial attention mechanisms to achieve efficient cross-layer feature fusion, mitigating information conflicts between shallow details and deep semantics. Finally, we propose an Expanded Window-based Bounding Box Regression Loss Function (EW-BBRLF), which optimizes localization accuracy through dynamic auxiliary bounding boxes, effectively reducing missed detections of small targets. Experiments on the VisDrone2019 and HIT-UAV datasets demonstrate that DMF-YOLOv10 achieves 50.1% and 81.4% mAP50, respectively, significantly outperforming the baseline YOLOv10s by 27.1% and 2.6%, with parameter increases limited to 24.4% and 11.9%. The method exhibits superior robustness in dense scenarios, complex backgrounds, and long-range target detection. This approach provides an efficient solution for UAV real-time perception tasks and offers novel insights for multi-scale object detection algorithm design. Full article
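
Auxiliary-box losses for small targets usually add an IoU term evaluated on boxes enlarged about their centers, alongside the original boxes; the sketch below illustrates that general idea, with a scale factor and 50/50 weighting that are assumptions rather than the EW-BBRLF definition.

```python
import torch

def iou(pred, target, eps=1e-7):                 # boxes as (x1, y1, x2, y2)
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    return inter / (area_p + area_t - inter + eps)

def expand(boxes, scale=1.5):                    # enlarge about the center
    center = (boxes[:, :2] + boxes[:, 2:]) / 2
    half = (boxes[:, 2:] - boxes[:, :2]) / 2 * scale
    return torch.cat([center - half, center + half], dim=1)

def aux_box_loss(pred, target):
    # IoU loss on the original boxes plus on the enlarged auxiliary boxes.
    return 0.5 * (1 - iou(pred, target)).mean() + \
           0.5 * (1 - iou(expand(pred), expand(target))).mean()

pred = torch.tensor([[10., 10., 20., 20.]])
target = torch.tensor([[12., 12., 22., 22.]])
print(float(aux_box_loss(pred, target)))
```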

19 pages, 2468 KiB  
Article
A Dual-Branch Spatial-Frequency Domain Fusion Method with Cross Attention for SAR Image Target Recognition
by Chao Li, Jiacheng Ni, Ying Luo, Dan Wang and Qun Zhang
Remote Sens. 2025, 17(14), 2378; https://doi.org/10.3390/rs17142378 - 10 Jul 2025
Viewed by 217
Abstract
Synthetic aperture radar (SAR) image target recognition has important application values in security reconnaissance and disaster monitoring. However, due to speckle noise and target orientation sensitivity in SAR images, traditional spatial domain recognition methods face challenges in accuracy and robustness. To effectively address these challenges, we propose a dual-branch spatial-frequency domain fusion recognition method with cross-attention, achieving deep fusion of spatial and frequency domain features. In the spatial domain, we propose an enhanced multi-scale feature extraction module (EMFE), which adopts a multi-branch parallel structure to effectively enhance the network’s multi-scale feature representation capability. Combining frequency domain guided attention, the model focuses on key regional features in the spatial domain. In the frequency domain, we design a hybrid frequency domain transformation module (HFDT) that extracts real and imaginary features through Fourier transform to capture the global structure of the image. Meanwhile, we introduce a spatially guided frequency domain attention to enhance the discriminative capability of frequency domain features. Finally, we propose a cross-domain feature fusion (CDFF) module, which achieves bidirectional interaction and optimal fusion of spatial-frequency domain features through cross attention and adaptive feature fusion. Experimental results demonstrate that our method achieves significantly superior recognition accuracy compared to existing methods on the MSTAR dataset. Full article
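
The frequency branch can be sketched with a 2D FFT whose real and imaginary parts are treated as extra channels; the module below is a simplified stand-in for HFDT, with assumed sizes.

```python
import torch
import torch.nn as nn

class FrequencyBranch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                          # x: (B, C, H, W)
        freq = torch.fft.fft2(x, norm="ortho")     # complex spectrum
        feat = torch.cat([freq.real, freq.imag], dim=1)  # real/imaginary features
        return self.conv(feat)

branch = FrequencyBranch(16)
print(branch(torch.randn(2, 16, 32, 32)).shape)   # torch.Size([2, 16, 32, 32])
```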

28 pages, 14588 KiB  
Article
CAU2DNet: A Dual-Branch Deep Learning Network and a Dataset for Slum Recognition with Multi-Source Remote Sensing Data
by Xi Lyu, Chenyu Zhang, Lizhi Miao, Xiying Sun, Xinxin Zhou, Xinyi Yue, Zhongchang Sun and Yueyong Pang
Remote Sens. 2025, 17(14), 2359; https://doi.org/10.3390/rs17142359 - 9 Jul 2025
Viewed by 147
Abstract
The efficient and precise identification of urban slums is a significant challenge for urban planning and sustainable development, as their morphological diversity and complex spatial distribution make it difficult to use traditional remote sensing inversion methods. Current deep learning (DL) methods mainly face challenges such as limited receptive fields and insufficient sensitivity to spatial locations when integrating multi-source remote sensing data, and high-quality datasets that integrate multi-spectral and geoscientific indicators to support them are scarce. In response to these issues, this study proposes a DL model (coordinate-attentive U2-DeepLab network [CAU2DNet]) that integrates multi-source remote sensing data. The model integrates the multi-scale feature extraction capability of U2-Net with the global receptive field advantage of DeepLabV3+ through a dual-branch architecture. Thereafter, the spatial semantic perception capability is enhanced by introducing the CoordAttention mechanism, and ConvNextV2 is adopted to optimize the backbone network of the DeepLabV3+ branch, thereby improving the modeling capability of low-resolution geoscientific features. The two branches adopt a decision-level fusion mechanism for feature fusion, which means that the results of each are weighted and summed using learnable weights to obtain the final output feature map. Furthermore, this study constructs the São Paulo slums dataset for model training due to the lack of a multi-spectral slum dataset. This dataset covers 7978 samples of 512 × 512 pixels, integrating high-resolution RGB images, Normalized Difference Vegetation Index (NDVI)/Modified Normalized Difference Water Index (MNDWI) geoscientific indicators, and POI infrastructure data, which can significantly enrich multi-source slum remote sensing data. Experiments have shown that CAU2DNet achieves an intersection over union (IoU) of 0.6372 and an F1 score of 77.97% on the São Paulo slums dataset, indicating a significant improvement in accuracy over the baseline model. The ablation experiments verify that the improvements made in this study have resulted in a 16.12% increase in precision. Moreover, CAU2DNet also achieved the best results in all metrics during the cross-domain testing on the WHU building dataset, further confirming the model’s generalizability. Full article
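
Decision-level fusion with learnable weights has a compact form: a weighted sum of the two branch output maps, with the weights trained end to end. The sketch below adds a softmax to keep the weights normalized, which is an assumption beyond what the abstract states.

```python
import torch
import torch.nn as nn

class DecisionFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(2))   # one learnable weight per branch

    def forward(self, out_a, out_b):
        w = torch.softmax(self.logits, dim=0)        # normalization is an assumption
        return w[0] * out_a + w[1] * out_b

fuse = DecisionFusion()
out_a, out_b = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
print(fuse(out_a, out_b).shape)   # torch.Size([1, 1, 64, 64])
```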

23 pages, 16714 KiB  
Article
A Dual-Stream Dental Panoramic X-Ray Image Segmentation Method Based on Transformer Heterogeneous Feature Complementation
by Tian Ma, Jiahui Li, Zhenrui Dang, Yawen Li and Yuancheng Li
Technologies 2025, 13(7), 293; https://doi.org/10.3390/technologies13070293 - 8 Jul 2025
Viewed by 254
Abstract
To address the widespread challenges of significant multi-category dental morphological variations and interference from overlapping anatomical structures in panoramic dental X-ray images, this paper proposes a dual-stream dental segmentation model based on Transformer heterogeneous feature complementarity. Firstly, we construct a parallel architecture comprising a Transformer semantic parsing branch and a Convolutional Neural Network (CNN) detail capturing pathway, achieving collaborative optimization of global context modeling and local feature extraction. Furthermore, a Pooling-Cooperative Convolutional Module was designed, which enhances the model’s capability in detail extraction and boundary localization through weighted centroid features of dental structures and a latent edge extraction module. Finally, a Semantic Transformation Module and Interactive Fusion Module are constructed. The Semantic Transformation Module converts geometric detail features extracted from the CNN branch into high-order semantic representations compatible with Transformer sequential processing paradigms, while the Interactive Fusion Module applies attention mechanisms to progressively fuse dual-stream features, thereby enhancing the model’s capability in holistic dental feature extraction. Experimental results demonstrate that the proposed method achieves an IoU of 91.49% and a Dice coefficient of 94.54%, outperforming current segmentation methods across multiple evaluation metrics. Full article
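
Feeding CNN detail features to a Transformer requires turning a feature map into a token sequence; the sketch below shows one common way to do this (flatten, project, add positions), offered as a generic illustration rather than the paper's Semantic Transformation Module.

```python
import torch
import torch.nn as nn

class FeatureToTokens(nn.Module):
    def __init__(self, channels=64, token_dim=128, grid=16 * 16):
        super().__init__()
        self.proj = nn.Linear(channels, token_dim)
        self.pos = nn.Parameter(torch.zeros(1, grid, token_dim))

    def forward(self, x):                          # x: (B, C, H, W)
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        return self.proj(tokens) + self.pos

to_tokens = FeatureToTokens()
encoder = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
tokens = to_tokens(torch.randn(2, 64, 16, 16))
print(encoder(tokens).shape)   # torch.Size([2, 256, 128])
```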

46 pages, 5911 KiB  
Article
Leveraging Prior Knowledge in Semi-Supervised Learning for Precise Target Recognition
by Guohao Xie, Zhe Chen, Yaan Li, Mingsong Chen, Feng Chen, Yuxin Zhang, Hongyan Jiang and Hongbing Qiu
Remote Sens. 2025, 17(14), 2338; https://doi.org/10.3390/rs17142338 - 8 Jul 2025
Viewed by 256
Abstract
Underwater acoustic target recognition (UATR) is challenged by complex marine noise, scarce labeled data, and inadequate multi-scale feature extraction in conventional methods. This study proposes DART-MT, a semi-supervised framework that integrates a Dual Attention Parallel Residual Network Transformer with a mean teacher paradigm, enhanced by domain-specific prior knowledge. The architecture employs a Convolutional Block Attention Module (CBAM) for localized feature refinement, a lightweight New Transformer Encoder for global context modeling, and a novel TriFusion Block to synergize spectral–temporal–spatial features through parallel multi-branch fusion, addressing the limitations of single-modality extraction. Leveraging the mean teacher framework, DART-MT optimizes consistency regularization to exploit unlabeled data, effectively mitigating class imbalance and annotation scarcity. Evaluations on the DeepShip and ShipsEar datasets demonstrate state-of-the-art accuracy: with 10% labeled data, DART-MT achieves 96.20% (DeepShip) and 94.86% (ShipsEar), surpassing baseline models by 7.2–9.8% in low-data regimes, while reaching 98.80% (DeepShip) and 98.85% (ShipsEar) with 90% labeled data. Under varying noise conditions (−20 dB to 20 dB), the model maintained a robust performance (F1-score: 92.4–97.1%) with 40% lower variance than its competitors, and ablation studies validated each module’s contribution (TriFusion Block alone improved accuracy by 6.9%). This research advances UATR by (1) resolving multi-scale feature fusion bottlenecks, (2) demonstrating the efficacy of semi-supervised learning in marine acoustics, and (3) providing an open-source implementation for reproducibility. In future work, we will extend cross-domain adaptation to diverse oceanic environments. Full article
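
The mean teacher paradigm at the core of DART-MT is standard: the teacher is an exponential-moving-average copy of the student, and a consistency loss ties their predictions on unlabeled data. Below is a minimal sketch with a toy model; the decay value and the MSE consistency term are typical choices, not necessarily the paper's.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)          # teacher is never updated by gradients

@torch.no_grad()
def ema_update(student, teacher, decay=0.99):
    # Teacher weights track an exponential moving average of the student.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(decay).add_(ps, alpha=1 - decay)

unlabeled = torch.randn(16, 32)
consistency = F.mse_loss(student(unlabeled).softmax(dim=1),
                         teacher(unlabeled).softmax(dim=1))
consistency.backward()               # gradients flow only into the student
ema_update(student, teacher)
print(float(consistency))
```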
