Search Results (302)

Search Parameters:
Keywords = cross-level feature fusion

26 pages, 6798 KiB  
Article
Robust Optical and SAR Image Matching via Attention-Guided Structural Encoding and Confidence-Aware Filtering
by Qi Kang, Jixian Zhang, Guoman Huang and Fei Liu
Remote Sens. 2025, 17(14), 2501; https://doi.org/10.3390/rs17142501 - 18 Jul 2025
Abstract
Accurate feature matching between optical and synthetic aperture radar (SAR) images remains a significant challenge in remote sensing due to substantial modality discrepancies in texture, intensity, and geometric structure. In this study, we propose an attention-context-aware deep learning framework (ACAMatch) for robust and efficient optical–SAR image registration. The proposed method integrates a structure-enhanced feature extractor, RS2FNet, which combines dual-stage Res2Net modules with a bi-level routing attention mechanism to capture multi-scale local textures and global structural semantics. A context-aware matching module refines correspondences through self- and cross-attention, coupled with a confidence-driven early-exit pruning strategy that reduces computational cost while maintaining accuracy. Additionally, a match-aware multi-task loss function jointly enforces spatial consistency, affine invariance, and structural coherence for end-to-end optimization. Experiments on public datasets (SEN1-2 and WHU-OPT-SAR) and a self-collected Gaofen (GF) dataset demonstrate that ACAMatch significantly outperforms existing state-of-the-art methods in the number of correct matches, matching accuracy, and inference speed, especially under challenging conditions such as resolution differences and severe structural distortions. These results indicate the effectiveness and generalizability of the proposed approach for multimodal image registration, making ACAMatch a promising solution for remote sensing applications such as change detection and multi-sensor data fusion.
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
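The confidence-driven early-exit idea above can be sketched as a simple control loop: run refinement stages in order and stop as soon as the mean match confidence clears a threshold. This is a toy illustration under assumed interfaces (`early_exit_refine`, the stage callables, the fixed confidence step), not the authors' implementation.

```python
def early_exit_refine(stages, matches, threshold=0.9):
    """Run refinement stages in order; exit early once the mean
    confidence of the current matches reaches `threshold`."""
    depth = 0
    for depth, stage in enumerate(stages, start=1):
        matches, confidences = stage(matches)
        if sum(confidences) / len(confidences) >= threshold:
            break  # confident enough: prune the remaining stages
    return matches, depth

def make_stage(step):
    """Toy stage: each pass raises every confidence by a fixed step."""
    def stage(matches):
        confs = [min(1.0, m + step) for m in matches]
        return confs, confs
    return stage

stages = [make_stage(0.25)] * 5
final, used = early_exit_refine(stages, [0.25, 0.5], threshold=0.9)
# Only 3 of the 5 stages run before the loop exits.
```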

21 pages, 2308 KiB  
Article
Forgery-Aware Guided Spatial–Frequency Feature Fusion for Face Image Forgery Detection
by Zhenxiang He, Zhihao Liu and Ziqi Zhao
Symmetry 2025, 17(7), 1148; https://doi.org/10.3390/sym17071148 - 18 Jul 2025
Abstract
The rapid development of deepfake technologies has led to the widespread proliferation of facial image forgeries, raising significant concerns over identity theft and the spread of misinformation. Although recent dual-domain detection approaches that integrate spatial and frequency features have achieved noticeable progress, they still suffer from limited sensitivity to local forgery regions and inadequate interaction between spatial and frequency information in practical applications. To address these challenges, we propose a novel forgery-aware guided spatial–frequency feature fusion network. A lightweight U-Net is employed to generate pixel-level saliency maps by leveraging structural symmetry and semantic consistency, without relying on ground-truth masks. These maps dynamically guide the fusion of spatial features (from an improved Swin Transformer) and frequency features (via Haar wavelet transforms). Cross-domain attention, channel recalibration, and spatial gating are introduced to enhance feature complementarity and regional discrimination. Extensive experiments on two benchmark face forgery datasets, FaceForensics++ and Celeb-DFv2, show that the proposed method consistently outperforms existing state-of-the-art techniques in detection accuracy and generalization. Future work includes improving robustness under compression, incorporating temporal cues, extending to multimodal scenarios, and evaluating model efficiency for real-world deployment.
(This article belongs to the Section Computer)
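The Haar wavelet transform used for the frequency branch is standard; a minimal single-level 2D version (pure Python, no learned parts) shows how an image splits into one low-pass band and three detail bands, where forgery residue tends to concentrate:

```python
def haar_1d(signal):
    """Single-level Haar transform: pairwise averages (low-pass)
    and pairwise differences (high-pass)."""
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_2d(image):
    """Separable 2D Haar: rows first, then columns of each half,
    yielding LL (approximation) and LH/HL/HH (detail) bands."""
    rows = [haar_1d(r) for r in image]
    low = [r[0] for r in rows]
    high = [r[1] for r in rows]
    def cols(block):
        out_lo, out_hi = [], []
        for top, bot in zip(block[0::2], block[1::2]):
            out_lo.append([(a + b) / 2 for a, b in zip(top, bot)])
            out_hi.append([(a - b) / 2 for a, b in zip(top, bot)])
        return out_lo, out_hi
    ll, lh = cols(low)
    hl, hh = cols(high)
    return ll, lh, hl, hh

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
ll, lh, hl, hh = haar_2d(img)
# A blockwise-constant image has all its energy in LL; the detail bands are zero.
```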

23 pages, 6348 KiB  
Article
A Framework for Predicting Winter Wheat Yield in Northern China with Triple Cross-Attention and Multi-Source Data Fusion
by Shuyan Pan and Liqun Liu
Plants 2025, 14(14), 2206; https://doi.org/10.3390/plants14142206 - 16 Jul 2025
Abstract
To address the issue that existing yield prediction methods do not fully capture the interactions among multiple factors, we propose a winter wheat yield prediction framework with triple cross-attention for multi-source data fusion. This framework consists of three modules: a multi-source data processing module, a multi-source feature fusion module, and a yield prediction module. The multi-source data processing module collects satellite, climate, and soil data over the winter wheat planting range and constructs a multi-source feature sequence set by combining statistical data. The multi-source feature fusion module first extracts deeper feature information tailored to the characteristics of each data source and then fuses the features through a triple cross-attention mechanism. The encoder of the yield prediction module adds a graph attention mechanism, forming a dual branch with the original multi-head self-attention mechanism to capture global dependencies while preserving local feature information. The decoder generates the final prediction. The results show that: (1) using 2021 and 2022 as test sets, the mean absolute error of our method is 385.99 kg/hm² and the root mean squared error is 501.94 kg/hm², both lower than those of competing methods; (2) the jointing–heading stage (March to April) is the most crucial period affecting winter wheat yield; and (3) our model can predict the final winter wheat yield nearly a month in advance.
(This article belongs to the Section Plant Modeling)
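The two reported error metrics are the standard MAE and RMSE; for reference, they compute as follows (the yield values below are illustrative, not the paper's data):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy winter wheat yields in kg/hm^2 (illustrative numbers only).
obs  = [5200, 4800, 5100]
pred = [5000, 5000, 5100]
err_mae, err_rmse = mae(obs, pred), rmse(obs, pred)
```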

21 pages, 31171 KiB  
Article
Local Information-Driven Hierarchical Fusion of SAR and Visible Images via Refined Modal Salient Features
by Yunzhong Yan, La Jiang, Jun Li, Shuowei Liu and Zhen Liu
Remote Sens. 2025, 17(14), 2466; https://doi.org/10.3390/rs17142466 - 16 Jul 2025
Abstract
Compared to other multi-source image fusion tasks, visible and SAR image fusion lacks training data for deep learning-based methods. Introducing structural priors to design fusion networks is a viable solution. We incorporated the feature hierarchy concept from computer vision, dividing deep features into low-, mid-, and high-level tiers. Based on the complementary modal characteristics of SAR and visible images, we designed a fusion architecture that fully analyzes and exploits the differences among hierarchical features. Specifically, our framework has two stages. In the cross-modal enhancement stage, a CycleGAN generator-based method for cross-modal interaction and input data enhancement generates pseudo-modal images. In the fusion stage, we make three innovations: (1) We designed level-specific feature extraction branches and fusion strategies that exploit the complementary modal features of SAR and visible images at each tier. (2) We proposed the Layered Strictly Nested Framework (LSNF), which emphasizes hierarchical differences and exploits hierarchical characteristics to reduce feature redundancy. (3) Based on visual saliency theory, we proposed a Gradient-Weighted Pixel Loss (GWPL), which dynamically assigns higher weights to regions with significant gradient magnitudes, emphasizing high-frequency detail preservation during fusion. Experiments on the YYX-OPT-SAR and WHU-OPT-SAR datasets show that our method outperforms 11 state-of-the-art methods. Ablation studies confirm each component's contribution. This framework effectively meets remote sensing applications' high-precision image fusion needs.
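The gradient-weighted pixel loss idea — weight each pixel's error by the local gradient magnitude so that high-frequency detail dominates the objective — can be sketched as below. The exact weighting in the paper may differ; this is a minimal stand-in.

```python
def grad_mag(img):
    """Forward-difference gradient magnitude (zero at the far edges)."""
    h, w = len(img), len(img[0])
    g = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            g[i][j] = (dx * dx + dy * dy) ** 0.5
    return g

def gwpl(fused, target):
    """Pixel L1 error weighted by (1 + gradient magnitude of the target),
    normalized by the total weight, so edge regions count more."""
    wts = grad_mag(target)
    cells = [(i, j) for i in range(len(target)) for j in range(len(target[0]))]
    num = sum((1 + wts[i][j]) * abs(fused[i][j] - target[i][j]) for i, j in cells)
    den = sum(1 + wts[i][j] for i, j in cells)
    return num / den

target = [[0.0, 1.0], [0.0, 1.0]]
fused  = [[0.0, 0.0], [0.0, 0.0]]
loss = gwpl(fused, target)
```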

18 pages, 4631 KiB  
Article
Semantic Segmentation of Rice Fields in Sub-Meter Satellite Imagery Using an HRNet-CA-Enhanced DeepLabV3+ Framework
by Yifan Shao, Pan Pan, Hongxin Zhao, Jiale Li, Guoping Yu, Guomin Zhou and Jianhua Zhang
Remote Sens. 2025, 17(14), 2404; https://doi.org/10.3390/rs17142404 - 11 Jul 2025
Abstract
Accurate monitoring of rice-planting areas underpins food security and evidence-based farm management. Recent work has advanced along three complementary lines: multi-source data fusion (to mitigate cloud and spectral confusion), temporal feature extraction (to exploit phenology), and deep-network architecture optimization. However, even the best fusion- and time-series-based approaches still struggle to preserve fine spatial details in sub-meter scenes. Targeting this gap, we propose an HRNet-CA-enhanced DeepLabV3+ that retains the original model's strengths while resolving its two key weaknesses: (i) detail loss caused by repeated down-sampling and feature-pyramid compression and (ii) boundary blurring due to insufficient multi-scale information fusion. The Xception backbone is replaced with a High-Resolution Network (HRNet) to maintain full-resolution feature streams through multi-resolution parallel convolutions and cross-scale interactions. A coordinate attention (CA) block is embedded in the decoder to strengthen spatially explicit context and sharpen class boundaries. We constructed a rice dataset of 23,295 images (11,295 rice + 12,000 non-rice) through preprocessing and manual labeling and benchmarked the proposed model against classical segmentation networks. Our approach boosts boundary segmentation accuracy to 92.28% MIOU and raises texture-level discrimination to 95.93% F1, without extra inference latency. Although this study focuses on architecture optimization, the HRNet-CA backbone is readily compatible with future multi-source fusion and time-series modules, offering a unified path toward operational paddy mapping in fragmented sub-meter landscapes.
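MIoU, the headline metric here, is worth pinning down; a minimal computation over flat label lists (standard definition, not tied to this paper's evaluation code):

```python
def miou(pred, truth, num_classes):
    """Mean intersection-over-union across classes present in the union."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy binary rice / non-rice masks flattened to lists.
pred_mask  = [0, 0, 1, 1, 1, 0]
truth_mask = [0, 0, 1, 1, 0, 0]
score = miou(pred_mask, truth_mask, 2)
```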

21 pages, 12122 KiB  
Article
RA3T: An Innovative Region-Aligned 3D Transformer for Self-Supervised Sim-to-Real Adaptation in Low-Altitude UAV Vision
by Xingrao Ma, Jie Xie, Di Shao, Aiting Yao and Chengzu Dong
Electronics 2025, 14(14), 2797; https://doi.org/10.3390/electronics14142797 - 11 Jul 2025
Abstract
Low-altitude unmanned aerial vehicle (UAV) vision is critically hindered by the Sim-to-Real Gap, where models trained exclusively on simulation data degrade under real-world variations in lighting, texture, and weather. To address this problem, we propose RA3T (Region-Aligned 3D Transformer), a novel self-supervised framework that enables robust Sim-to-Real adaptation. Specifically, we first develop a dual-branch strategy for self-supervised feature learning, integrating Masked Autoencoders and contrastive learning. This approach extracts domain-invariant representations from unlabeled simulated imagery to enhance robustness against occlusion while reducing annotation dependency. Leveraging these learned features, we then introduce a 3D Transformer fusion module that unifies multi-view RGB and LiDAR point clouds through cross-modal attention. By explicitly modeling spatial layouts and height differentials, this component significantly improves recognition of small and occluded targets in complex low-altitude environments. To address persistent fine-grained domain shifts, we finally design region-level adversarial calibration that deploys local discriminators on partitioned feature maps. This mechanism directly aligns texture, shadow, and illumination discrepancies which challenge conventional global alignment methods. Extensive experiments on UAV benchmarks VisDrone and DOTA demonstrate the effectiveness of RA3T. The framework achieves +5.1% mAP on VisDrone and +7.4% mAP on DOTA over the 2D adversarial baseline, particularly on small objects and sparse occlusions, while maintaining real-time performance of 17 FPS at 1024 × 1024 resolution on an RTX 4080 GPU. Visual analysis confirms that the synergistic integration of 3D geometric encoding and local adversarial alignment effectively mitigates domain gaps caused by uneven illumination and perspective variations, establishing an efficient pathway for simulation-to-reality UAV perception.
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
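The contrastive branch described above is typically trained with an InfoNCE-style objective; a generic pure-Python version is sketched below. The temperature `tau` and the embeddings are illustrative, not the paper's settings.

```python
import math

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss on embeddings: negative log-softmax
    of the anchor-positive similarity against the negative similarities."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_sum)

# Easy case: positive aligned, negative orthogonal -> near-zero loss.
loss_easy = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Tie case: negative identical to positive -> loss is exactly log(2).
loss_tie = info_nce([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]])
```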

28 pages, 14588 KiB  
Article
CAU2DNet: A Dual-Branch Deep Learning Network and a Dataset for Slum Recognition with Multi-Source Remote Sensing Data
by Xi Lyu, Chenyu Zhang, Lizhi Miao, Xiying Sun, Xinxin Zhou, Xinyi Yue, Zhongchang Sun and Yueyong Pang
Remote Sens. 2025, 17(14), 2359; https://doi.org/10.3390/rs17142359 - 9 Jul 2025
Abstract
The efficient and precise identification of urban slums is a significant challenge for urban planning and sustainable development, as their morphological diversity and complex spatial distribution make it difficult to use traditional remote sensing inversion methods. Current deep learning (DL) methods mainly face challenges such as limited receptive fields and insufficient sensitivity to spatial locations when integrating multi-source remote sensing data, and high-quality datasets that integrate multi-spectral and geoscientific indicators to support them are scarce. In response to these issues, this study proposes a DL model (coordinate-attentive U2-DeepLab network [CAU2DNet]) that integrates multi-source remote sensing data. The model integrates the multi-scale feature extraction capability of U2-Net with the global receptive field advantage of DeepLabV3+ through a dual-branch architecture. Thereafter, the spatial semantic perception capability is enhanced by introducing the CoordAttention mechanism, and ConvNextV2 is adopted to optimize the backbone network of the DeepLabV3+ branch, thereby improving the modeling capability of low-resolution geoscientific features. The two branches adopt a decision-level fusion mechanism for feature fusion, which means that the results of each are weighted and summed using learnable weights to obtain the final output feature map. Furthermore, this study constructs the São Paulo slums dataset for model training due to the lack of a multi-spectral slum dataset. This dataset covers 7978 samples of 512 × 512 pixels, integrating high-resolution RGB images, Normalized Difference Vegetation Index (NDVI)/Modified Normalized Difference Water Index (MNDWI) geoscientific indicators, and POI infrastructure data, which can significantly enrich multi-source slum remote sensing data. 
Experiments have shown that CAU2DNet achieves an intersection over union (IoU) of 0.6372 and an F1 score of 77.97% on the São Paulo slums dataset, indicating a significant improvement in accuracy over the baseline model. The ablation experiments verify that the improvements made in this study have resulted in a 16.12% increase in precision. Moreover, CAU2DNet also achieved the best results in all metrics during the cross-domain testing on the WHU building dataset, further confirming the model's generalizability.
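The decision-level fusion step described above — a learnable weighted sum of the two branch outputs — reduces to a few lines once the weights are treated as scalars passed through a softmax. This is a sketch; in CAU2DNet the weights would be trained end to end.

```python
import math

def fuse(map_a, map_b, w_a, w_b):
    """Decision-level fusion: softmax-normalize two scalar weights and
    take the weighted sum of the two branch output maps."""
    ea, eb = math.exp(w_a), math.exp(w_b)
    alpha = ea / (ea + eb)
    return [[alpha * a + (1 - alpha) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(map_a, map_b)]

# Equal weights -> plain average of the two branch outputs.
fused = fuse([[1.0, 0.0]], [[0.0, 1.0]], 0.0, 0.0)
```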

19 pages, 528 KiB  
Article
Quantum-Inspired Attention-Based Semantic Dependency Fusion Model for Aspect-Based Sentiment Analysis
by Chenyang Xu, Xihan Wang, Jiacheng Tang, Yihang Wang, Lianhe Shao and Quanli Gao
Axioms 2025, 14(7), 525; https://doi.org/10.3390/axioms14070525 - 9 Jul 2025
Abstract
Aspect-Based Sentiment Analysis (ABSA), which addresses aspect-level sentiment representation in sentences, has gained significant popularity in recent years. Current methods for ABSA often use pre-trained models and graph convolution to represent word dependencies. However, they struggle with long-range dependencies in lengthy texts, resulting in averaging and loss of contextual semantic information. In this paper, we explore how richer semantic relationships can be encoded more efficiently. Inspired by quantum theory, we construct superposition states from text sequences and use them with quantum measurements to explicitly capture complex semantic relationships within word sequences. Specifically, we propose an attention-based semantic dependency fusion method for ABSA, which employs a quantum embedding module to create a superposition state of real-valued word sequence features in a complex-valued Hilbert space. This approach yields a word-sequence density matrix representation that improves the handling of long-range dependencies. Furthermore, we introduce a quantum cross-attention mechanism to integrate sequence features with dependency relationships between specific word pairs, aiming to capture the associations between particular aspects and comments more comprehensively. Our experiments on the SemEval-2014 and Twitter datasets demonstrate the effectiveness of the quantum-inspired attention-based semantic dependency fusion model for the ABSA task.
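The density-matrix construction mentioned above is ordinary quantum formalism: normalize the (complex) superposition amplitudes and take the outer product ρ = |ψ⟩⟨ψ|. A minimal version, independent of the paper's learned embeddings:

```python
import math

def density_matrix(amplitudes):
    """Build rho = |psi><psi| from a complex state vector after
    normalization. A pure-state density matrix always has trace 1."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    psi = [a / norm for a in amplitudes]
    return [[u * v.conjugate() for v in psi] for u in psi]

# Two-dimensional superposition with a complex amplitude.
rho = density_matrix([1 + 0j, 1j])
trace = sum(rho[i][i] for i in range(len(rho)))
```

The off-diagonal entries carry the interference (relative-phase) information that a plain probability vector would discard.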

25 pages, 11253 KiB  
Article
YOLO-UIR: A Lightweight and Accurate Infrared Object Detection Network Using UAV Platforms
by Chao Wang, Rongdi Wang, Ziwei Wu, Zetao Bian and Tao Huang
Drones 2025, 9(7), 479; https://doi.org/10.3390/drones9070479 - 7 Jul 2025
Abstract
Within the field of remote sensing, Unmanned Aerial Vehicle (UAV) infrared object detection plays a pivotal role, especially in complex environments. However, existing methods face challenges such as insufficient accuracy or low computational efficiency, particularly in the detection of small objects. This paper proposes a lightweight and accurate UAV infrared object detection model, YOLO-UIR, for small object detection from a UAV perspective. The model is based on the YOLO architecture and mainly includes the Efficient C2f module, lightweight spatial perception (LSP) module, and bidirectional feature interaction fusion (BFIF) module. The Efficient C2f module significantly enhances feature extraction capabilities by combining local and global features through an Adaptive Dual-Stream Attention Mechanism. Compared with the existing C2f module, the introduction of Partial Convolution reduces the model's parameter count while maintaining high detection accuracy. The BFIF module further enhances feature fusion effects through cross-level semantic interaction, thereby improving the model's ability to fuse contextual features. Moreover, the LSP module efficiently combines features from different distances using Large Receptive Field Convolution Layers, significantly enhancing the model's long-range information capture capability. Additionally, the use of Reparameterized Convolution and Depthwise Separable Convolution ensures the model's lightweight nature, making it highly suitable for real-time applications. On the DroneVehicle and HIT-UAV datasets, YOLO-UIR achieves superior detection performance compared to existing methods, with an mAP of 71.1% and 90.7%, respectively. The model also demonstrates significant advantages in terms of computational efficiency and parameter count. Ablation experiments verify the effectiveness of each optimization module.
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)
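The parameter savings from depthwise separable convolution, one of the lightweight choices above, are easy to quantify: for a 3×3 layer with 64 input and 128 output channels the factored form needs roughly 8× fewer weights (bias terms omitted; the channel sizes are illustrative, not YOLO-UIR's).

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution layer (no bias)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) followed by a
    1 x 1 pointwise convolution mixing channels."""
    return c_in * k * k + c_in * c_out

std = conv_params(64, 128, 3)          # 64 * 128 * 9 = 73728
sep = dw_separable_params(64, 128, 3)  # 64 * 9 + 64 * 128 = 8768
```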

23 pages, 2463 KiB  
Article
MCDet: Target-Aware Fusion for RGB-T Fire Detection
by Yuezhu Xu, He Wang, Yuan Bi, Guohao Nie and Xingmei Wang
Forests 2025, 16(7), 1088; https://doi.org/10.3390/f16071088 - 30 Jun 2025
Abstract
Forest fire detection is vital for ecological conservation and disaster management. Existing visual detection methods exhibit instability in smoke-obscured or illumination-variable environments. Although multimodal fusion has demonstrated potential, effectively resolving inconsistencies in smoke features across diverse modalities remains a significant challenge. This issue stems from the inherent ambiguity between regions characterized by high temperatures in infrared imagery and those with elevated brightness in visible-light imagery. In this paper, we propose MCDet, an RGB-T forest fire detection framework incorporating target-aware fusion. To alleviate cross-modal feature ambiguity, we design a Multidimensional Representation Collaborative Fusion module (MRCF), which constructs global feature interactions via a state-space model and enhances local detail perception through deformable convolution. A content-guided attention network (CGAN) is then introduced to aggregate multidimensional features through a dynamic gating mechanism. Building on this foundation, the integration of WIoU further suppresses vegetation occlusion and illumination interference at a holistic level, thereby reducing the false detection rate. Evaluated on three forest fire datasets and one pedestrian dataset, MCDet achieves a mean detection accuracy of 77.5%, surpassing advanced methods. This performance makes MCDet a practical solution for enhancing early-warning system reliability.
(This article belongs to the Special Issue Advanced Technologies for Forest Fire Detection and Monitoring)
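The dynamic gating at the heart of the fusion can be sketched as a sigmoid gate deciding, per element, how much to trust the RGB branch versus the infrared branch. In MCDet the gate logits would come from the learned content-guided network; here they are supplied directly, so this is an illustration of the mechanism only.

```python
import math

def gated_fuse(rgb_feat, ir_feat, gate_logits):
    """Element-wise gated fusion: sigmoid(gate) weights the RGB branch,
    its complement weights the infrared branch."""
    sig = lambda x: 1 / (1 + math.exp(-x))
    return [sig(g) * r + (1 - sig(g)) * t
            for r, t, g in zip(rgb_feat, ir_feat, gate_logits)]

# Gate 0 -> even blend; a strongly positive gate -> trust RGB almost fully.
fused = gated_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 100.0])
```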

29 pages, 18908 KiB  
Article
Toward Efficient UAV-Based Small Object Detection: A Lightweight Network with Enhanced Feature Fusion
by Xingyu Di, Kangning Cui and Rui-Feng Wang
Remote Sens. 2025, 17(13), 2235; https://doi.org/10.3390/rs17132235 - 29 Jun 2025
Abstract
UAV-based small target detection is crucial in environmental monitoring, circuit inspection, and related applications. However, UAV images often present challenges such as significant scale variation, dense small targets, high inter-class similarity, and intra-class diversity, which can lead to missed detections and reduced performance. To address these problems, this study proposes UAV-YOLO, a lightweight and high-precision model based on YOLOv8s. First, a double separation convolution (DSC) module is designed to replace the Bottleneck structure in the C2f module with a fusion of depthwise separable convolution and pointwise convolution, reducing model parameters and computational complexity while enhancing feature expression. Second, a new SPPL module is proposed, combining spatial pyramid pooling fast (SPPF) with long-distance dependency modeling (LSKA) to improve robustness to multi-scale targets through cross-level feature association. Then, DyHead replaces the original detection head, enhancing the discrimination of small targets against complex backgrounds through adaptive weight allocation and cross-scale feature fusion. Finally, the WIPIoU loss function is proposed, which integrates the advantages of Wise-IoU, MPDIoU, and Inner-IoU, incorporating the bounding box geometric center, aspect ratio, and overlap into a unified measure to improve small-target localization accuracy and accelerate convergence. Experimental results on the VisDrone2019 dataset show that, compared to YOLOv8s, UAV-YOLO improves mAP@0.5 by 8.9% and recall by 6.8%, while parameters and computations are reduced by 23.4% and 40.7%, respectively. Additional evaluations on the DIOR, RSOD, and NWPU VHR-10 datasets demonstrate the model's generalization capability.
(This article belongs to the Special Issue Geospatial Intelligence in Remote Sensing)
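All of the IoU variants folded into WIPIoU build on the plain overlap ratio between two boxes; for reference (standard definition, corner-format boxes):

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes: intersection area over union
    area. Wise-IoU / MPDIoU / Inner-IoU all add penalties on top of this."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two unit-offset 2x2 boxes overlap in a single unit cell: IoU = 1/7.
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```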

14 pages, 1438 KiB  
Article
CDBA-GAN: A Conditional Dual-Branch Attention Generative Adversarial Network for Robust Sonar Image Generation
by Wanzeng Kong, Han Yang, Mingyang Jia and Zhe Chen
Appl. Sci. 2025, 15(13), 7212; https://doi.org/10.3390/app15137212 - 26 Jun 2025
Abstract
The acquisition of real-world sonar data necessitates substantial investments of manpower, material resources, and financial capital, rendering it challenging to obtain sufficient authentic samples for sonar-related research tasks. Consequently, sonar image simulation technology has become increasingly vital in the field of sonar data analysis. Traditional sonar simulation methods predominantly focus on low-level physical modeling, which often suffers from limited image controllability and diminished fidelity in multi-category and multi-background scenarios. To address these limitations, this paper proposes a Conditional Dual-Branch Attention Generative Adversarial Network (CDBA-GAN). The framework comprises three key innovations: a conditional information fusion module, a dual-branch attention feature fusion mechanism, and cross-layer feature reuse. By integrating encoded conditional information with the original input data of the generative adversarial network, the fusion module enables precise control over the generation of sonar images under specific conditions. A hierarchical attention mechanism is implemented, sequentially performing channel-level and pixel-level attention operations. This establishes distinct weight matrices at both granularities, thereby enhancing the correlation between corresponding elements. The dual-branch attention features are fused via a skip-connection architecture, facilitating efficient feature reuse across network layers. The experimental results demonstrate that the proposed CDBA-GAN generates condition-specific sonar images with a significantly lower Fréchet inception distance (FID) compared to existing methods. Notably, the framework exhibits robust imaging performance under noisy interference and outperforms state-of-the-art models (e.g., DCGAN, WGAN, SAGAN) in fidelity across four categorical conditions, as quantified by FID metrics.
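The channel-level half of the hierarchical attention can be sketched as squeeze-and-sigmoid rescaling: pool each channel to a scalar, squash it through a sigmoid, and rescale the channel. Unlike the paper's module, this sketch has no learned weights between squeeze and gate.

```python
import math

def channel_attention(feat):
    """Channel-level attention sketch: each channel (a 2D list) is
    scaled by sigmoid(its global average)."""
    sig = lambda x: 1 / (1 + math.exp(-x))
    out = []
    for ch in feat:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        w = sig(mean)
        out.append([[w * v for v in row] for row in ch])
    return out

# A zero-mean channel is halved (sigmoid(0) = 0.5); a high-mean channel
# is passed through nearly unchanged.
scaled = channel_attention([[[0.0, 0.0]], [[2.0, 2.0]]])
```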

32 pages, 7048 KiB  
Article
DCMC-UNet: A Novel Segmentation Model for Carbon Traces in Oil-Immersed Transformers Improved with Dynamic Feature Fusion and Adaptive Illumination Enhancement
by Hongxin Ji, Jiaqi Li, Zhennan Shi, Zijian Tang, Xinghua Liu and Peilin Han
Sensors 2025, 25(13), 3904; https://doi.org/10.3390/s25133904 - 23 Jun 2025
Abstract
For large oil-immersed transformers, their metal-enclosed structure poses significant challenges for direct visual inspection of internal defects. To ensure the effective detection of internal insulation defects, this study employs a self-developed micro-robot for internal visual inspection. Given the substantial morphological and dimensional variations of target defects (e.g., carbon traces produced by surface discharge inside the transformer), the intelligent and efficient extraction of carbon trace features from complex backgrounds becomes critical for robotic inspection. To address these challenges, we propose the DCMC-UNet, a semantic segmentation model for carbon traces containing adaptive illumination enhancement and dynamic feature fusion. For blurred carbon trace images caused by unstable light reflection and illumination in transformer oil, an improved CLAHE algorithm is developed, incorporating learnable parameters to balance luminance and contrast while enhancing edge features of carbon traces. To handle the morphological diversity and edge complexity of carbon traces, a dynamic deformable encoder (DDE) was integrated into the encoder, leveraging deformable convolutional kernels to improve carbon trace feature extraction. An edge-aware decoder (EAD) was integrated into the decoder, which extracts edge details from predicted segmentation maps and fuses them with encoded features to enrich edge features. To mitigate the semantic gap between the encoder and the decoder, we replace the standard skip connection with a cross-level attention connection fusion layer (CLFC), enhancing the multi-scale fusion of morphological and edge features. Furthermore, a multi-scale atrous feature aggregation module (MAFA) is designed in the neck to enhance the integration of deep semantic and shallow visual features, improving multi-dimensional feature fusion. 
Experimental results demonstrate that DCMC-UNet outperforms U-Net, U-Net++, and other benchmarks in carbon trace segmentation. On the transformer carbon trace dataset, it improves on the baseline U-Net by 14.04% in mIoU, 10.87% in Dice, 10.97% in pixel accuracy (P), and 5.77% in overall accuracy (Acc). The proposed model provides reliable technical support for surface discharge intensity assessment and insulation condition evaluation in oil-immersed transformers. Full article
(This article belongs to the Section Industrial Sensors)
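The improved CLAHE in DCMC-UNet builds on ordinary contrast-limited histogram equalization. As a minimal sketch of that underlying step (not the paper's method: the fixed `clip_limit` scalar below merely stands in for the learnable luminance/contrast parameters the abstract describes):

```python
import numpy as np

def clipped_equalization(tile, clip_limit=0.01):
    """Histogram equalization with CLAHE-style contrast limiting.

    Simplified stand-in: the paper's improved CLAHE learns its
    luminance/contrast balance; a fixed clip_limit plays that role here.
    """
    hist, _ = np.histogram(tile, bins=256, range=(0, 256))
    hist = hist.astype(np.float64) / hist.sum()
    # Clip each bin and redistribute the excess mass uniformly,
    # which caps how much local contrast can be amplified.
    excess = np.clip(hist - clip_limit, 0.0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / 256.0
    # Map gray levels through the cumulative distribution.
    lut = np.round(255.0 * np.cumsum(hist)).astype(np.uint8)
    return lut[tile.astype(np.uint8)]
```

In full CLAHE the image is split into tiles and the per-tile lookup tables are bilinearly interpolated; the per-tile equalization above is where learnable parameters would enter.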
28 pages, 114336 KiB  
Article
Mamba-STFM: A Mamba-Based Spatiotemporal Fusion Method for Remote Sensing Images
by Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo and Yuanchen Huang
Remote Sens. 2025, 17(13), 2135; https://doi.org/10.3390/rs17132135 - 21 Jun 2025
Abstract
Spatiotemporal fusion techniques can generate remote sensing imagery with high spatial and temporal resolutions, thereby facilitating Earth observation. However, traditional methods are constrained by linear assumptions; generative adversarial networks suffer from mode collapse; convolutional neural networks struggle to capture global context; and Transformers are hard to scale due to quadratic computational complexity and high memory consumption. To address these challenges, this study introduces an end-to-end remote sensing image spatiotemporal fusion approach based on the Mamba architecture (Mamba-spatiotemporal fusion model, Mamba-STFM), marking the first application of Mamba in this domain and presenting a novel paradigm for spatiotemporal fusion model design. Mamba-STFM consists of a feature extraction encoder and a feature fusion decoder. At the core of the encoder is the visual state space-FuseCore-AttNet block (VSS-FCAN block), which deeply integrates linear-complexity cross-scan global perception with a channel attention mechanism, significantly reducing the quadratic-level computation and memory overhead while improving inference throughput through parallel scanning and kernel fusion techniques. The decoder's core is the spatiotemporal mixture-of-experts fusion module (STF-MoE block), composed of our novel spatial expert and temporal expert modules. The spatial expert adaptively adjusts channel weights to optimize spatial feature representation, enabling precise alignment and fusion of multi-resolution images, while the temporal expert incorporates a temporal squeeze-and-excitation mechanism and selective state space model (SSM) techniques to efficiently capture short-range temporal dependencies, maintain linear sequence modeling complexity, and further enhance overall spatiotemporal fusion throughput.
Extensive experiments on public datasets demonstrate that Mamba-STFM outperforms existing methods in fusion quality; ablation studies validate the effectiveness of each core module; and efficiency analyses and application comparisons further confirm the model’s superior performance. Full article
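Mamba-style models owe their linear sequence complexity to a sequential state-space scan rather than pairwise attention. A toy fixed-parameter SSM scan illustrates the O(T) recurrence; note that Mamba's selective SSM makes `A`, `B`, `C` input-dependent, which this sketch deliberately omits:

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """O(T) scan of a linear state space model:
    h_t = A h_{t-1} + B u_t,  y_t = C h_t.

    Toy version with fixed A, B, C; Mamba's selective SSM makes them
    functions of the input at each step, which is omitted here.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for u_t in u:            # one state update per time step: linear in T
        h = A @ h + B * u_t
        ys.append(C @ h)
    return np.array(ys)
```

With a decaying state matrix (e.g., A = 0.5), the scan computes an exponential moving sum of the inputs; stacking such scans over flattened image-patch sequences is, loosely, what the VSS blocks in the encoder do.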
12 pages, 2801 KiB  
Article
Multi-Algorithm Feature Extraction from Dual Sections for the Recognition of Three African Redwoods
by Jiawen Sun, Jiashun Niu, Liren Xu, Jianping Sun and Linhong Zhao
Forests 2025, 16(7), 1043; https://doi.org/10.3390/f16071043 - 21 Jun 2025
Abstract
To address the persistent challenge of low recognition accuracy in precious wood species classification, this study proposes a novel methodology for identifying Pterocarpus santalinus, Pterocarpus tinctorius (PTD), and Pterocarpus tinctorius (Zambia). This approach synergistically integrates artificial neural networks (ANNs) with advanced image feature extraction techniques, specifically Fast Fourier Transform, Gabor Transform, Wavelet Transform, and Gray-Level Co-occurrence Matrix. Features were extracted from both transverse and longitudinal wood sections. Fifteen distinct ANN models were subsequently developed: hybrid-section models combined features from different sections using a single algorithm, while multi-algorithm models aggregated features from the same section across all four algorithms. The dual-section hybrid wavelet model (LC4) demonstrated superior performance, achieving a perfect 100% recognition accuracy. High accuracies were also observed in the four-parameter combination models for longitudinal (L5) and transverse (C5) sections, yielding 97.62% and 91.67%, respectively. Notably, 92.31% of the LC4 model’s test samples exhibited an absolute error of ≤1%, highlighting its high reliability and precision. These findings confirm the efficacy of integrating image processing with neural networks for fine-grained wood identification and underscore the exceptional discriminative power of wavelet-based features in cross-sectional data fusion. Full article
(This article belongs to the Section Wood Science and Forest Products)
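Of the four feature extractors combined in this study, the Gray-Level Co-occurrence Matrix is the simplest to sketch. A minimal NumPy version for one pixel offset, with three common GLCM statistics (function names and the `levels` parameter are illustrative, not taken from the paper):

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy).

    img must hold integer gray levels in [0, levels).
    """
    g = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            g[img[i, j], img[i + dy, j + dx]] += 1
    return g / g.sum()

def glcm_features(g):
    """Three classic Haralick-style statistics of a GLCM."""
    i, j = np.indices(g.shape)
    return {
        "contrast": float(((i - j) ** 2 * g).sum()),
        "energy": float((g ** 2).sum()),
        "homogeneity": float((g / (1.0 + np.abs(i - j))).sum()),
    }
```

Computing such statistics for several offsets on transverse and longitudinal sections, alongside FFT, Gabor, and wavelet features, yields the kind of feature vector fed to the ANN classifiers described above.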