MDPI - Publisher of Open Access Journals

35 pages, 9814 KB

Open Access Article

EO2SAR-Diff: Structure-Aware Latent Diffusion for Unpaired EO-to-SAR Translation

by Yeon-Wook Kim and Kiyoung Kim

Remote Sens. 2026, 18(12), 2037; https://doi.org/10.3390/rs18122037 - 18 Jun 2026

Viewed by 228

Synthetic aperture radar (SAR) imagery provides all-weather, day-and-night observation capabilities that complement electro-optical (EO) imaging; however, the limited number of operational SAR satellites and the difficulty of acquiring expert-annotated SAR datasets constrain deep-learning-based SAR image analysis. In this paper, we propose EO2SAR-Diff, a conditional latent diffusion framework that translates EO aerial images into realistic synthetic SAR images. The framework comprises three core components: (1) domain-adaptive LoRA pre-training that anchors the Stable Diffusion backbone in the remote sensing domain, (2) a style extraction and injection network that captures SAR-specific visual characteristics via multi-scale feature encoding and parallel cross-attention, and (3) a multi-branch ControlNet whose three parallel branches provide complementary structural guidance. These components are coordinated by a dual-axis feature injection strategy that modulates conditioning strength along both the spatial (per-block) and temporal (per-timestep) dimensions. Experiments on the DOTA 1.0 and SARDet-100K datasets demonstrate that, in terms of FID and KID computed with two SAR-domain-adapted feature extractors, EO2SAR-Diff ranks in the top tier of all compared methods in distributional alignment with real SAR imagery. Augmenting the SAR training set with our synthetic images yields consistent improvements in downstream object detection performance, confirming the practical utility of the proposed framework.
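The FID and KID scores mentioned above compare feature statistics of real and synthetic SAR images. As a minimal sketch of KID (the unbiased squared MMD with the cubic polynomial kernel used in the original KID definition), the Python below operates on precomputed feature arrays. The random arrays stand in for embeddings from the SAR-domain-adapted extractors, which the abstract does not specify, and the usual averaging over random subsets is omitted.

```python
import numpy as np

def polynomial_kernel(x, y):
    # KID kernel: k(x, y) = (x . y / d + 1) ** 3, where d is the feature dimension
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def kid(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between two feature sets (rows = samples)."""
    m, n = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Drop the diagonal (self-similarity) terms for the unbiased estimator
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    term_rf = 2.0 * k_rf.mean()
    return term_rr + term_ff - term_rf
```

Lower values indicate closer distributions; unlike FID, this estimator is unbiased, so it can be slightly negative when the two sets come from the same distribution.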

(This article belongs to the Special Issue AI-Driven Remote Sensing Image Restoration and Generation)

31 pages, 4605 KB

Open Access Article

A Dual-Branch Lightweight Network for Multimodal Image Fusion with Mamba and INN

by Nan Li, Hongxin Li and Lin Tian

Sensors 2026, 26(12), 3814; https://doi.org/10.3390/s26123814 - 15 Jun 2026

Viewed by 303

Abstract

Multimodal image fusion aims to integrate complementary information from heterogeneous imaging modalities into a single informative image. However, many deep learning-based fusion methods rely on complex feature extractors, leading to high computational cost and limited suitability for real-time deployment on resource-constrained devices. To address this issue, this paper proposes a lightweight Mamba-INN dual-branch network for efficient multimodal image fusion. The proposed model decouples global structure modeling from local detail preservation. A simplified Mamba-inspired branch captures long-range contextual dependencies, while a lightweight invertible neural network (INN) branch preserves high-frequency texture and edge information during the forward feature transformation through reversible feature partitioning, coupled transformations, and exponential scale modulation, thereby reducing the loss of detail caused by feature compression. Compact shallow feature refinement, module reuse, low-dimensional channel design, and a streamlined decoder are further introduced to reduce redundant computation. Experiments on infrared-visible and medical image fusion benchmarks, including the MSRS, TNO, RoadScene, MRI-CT, MRI-PET, and MRI-SPECT datasets, demonstrate that the proposed method achieves competitive fusion quality with low model complexity. On metrics including MI, VIF, Qabf, and SSIM, it performs comparably to or better than methods such as CDDFuse, U2Fusion, CNN, and SDNet on both infrared-visible and medical image fusion tasks, while containing only 0.24 million parameters and requiring 24.04 GFLOPs at an input resolution of 256 × 256. Compared with CDDFuse, our method substantially reduces model complexity, improving its potential for lightweight deployment while maintaining fusion quality.
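The INN branch's combination of reversible feature partitioning, coupled transformations, and exponential scale modulation matches the structure of a standard affine coupling layer. The sketch below is an illustrative assumption, not the authors' code: `scale_net` and `shift_net` are hypothetical single-matrix stand-ins for the small sub-networks a real model would use, and features are treated as per-pixel channel vectors. Because the first half passes through unchanged, the inverse is exact, which is how such a branch avoids discarding detail.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8  # hypothetical channel count; must be even so the features split in half
W_s = rng.normal(scale=0.1, size=(C // 2, C // 2))  # hypothetical scale-net weights
W_t = rng.normal(scale=0.1, size=(C // 2, C // 2))  # hypothetical shift-net weights

def scale_net(x1):
    # tanh bounds the log-scale so that exp() stays numerically stable
    return np.tanh(x1 @ W_s)

def shift_net(x1):
    return x1 @ W_t

def coupling_forward(x):
    x1, x2 = np.split(x, 2, axis=-1)                  # reversible feature partition
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)   # exponential scale modulation
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y):
    y1, y2 = np.split(y, 2, axis=-1)
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))  # undo shift, then scale
    return np.concatenate([y1, x2], axis=-1)
```

In practice, successive coupling layers alternate which half is transformed so that every channel is eventually modulated.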

(This article belongs to the Special Issue Multi-Source Image Fusion, Restoration, and Understanding and Its Application in Sensing)

28 pages, 25031 KB

Open Access Article

HMT-Net: A Hybrid Mamba–Transformer Network for Motor Imagery EEG Decoding

by Tingting Zhang, Haorong Liao, Yiming Mu, Junfeng Han, Nan Li, Guoyu Hu and Xiangzeng Kong

Mathematics 2026, 14(12), 2149; https://doi.org/10.3390/math14122149 - 15 Jun 2026

Viewed by 164

Abstract

Electroencephalography (EEG) is widely used in brain-computer interfaces (BCIs) for decoding motor imagery (MI) signals. However, existing methods remain limited in extracting multi-scale local spatiotemporal features and in effectively integrating them with global feature information, leaving room for further improvement in classification accuracy. To address this issue, we propose HMT-Net, a hybrid architecture that integrates multi-scale convolution, the Mamba state-space model, and a self-attention mechanism. The model consists of a shallow feature embedding (SFE) module for spatiotemporal feature extraction, a multi-scale local feature extractor (MSLFE), and a Mamba–transformer global feature encoder (MTGFE). Specifically, the MSLFE employs dual-branch convolutions and channel attention to achieve adaptive multi-scale perception, while the MTGFE combines Mamba's linear-time sequence modeling with multi-head attention to efficiently capture global dependencies. Unlike conventional Mamba-only or transformer-only EEG models, HMT-Net couples linear state-space modeling with global pairwise attention, avoiding the representational limits inherent in each individual architecture. Experiments on the BCI-IV-2a, BCI-IV-2b, and HGD datasets show that HMT-Net achieves subject-dependent accuracies of 84.07%, 89.60%, and 96.02%, respectively; on BCI-IV-2a, it outperforms EEGNet, FBCNet, EEGConformer, and ATCNet by 11.65%, 5.02%, 5.13%, and 6.60%, respectively. Furthermore, HMT-Net achieves the best accuracy in subject-independent experiments, demonstrating strong generalization capability. Ablation studies and visualizations further validate the effectiveness and interpretability of the proposed model.
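For readers unfamiliar with state-space models, the linear-time behaviour attributed to Mamba comes from a recurrence over a hidden state. The sketch below is a simplified, time-invariant diagonal SSM, not Mamba itself: Mamba additionally makes the step size and the B and C parameters input-dependent ("selective"), which is omitted here. It shows that the step-by-step scan equals an equivalent causal convolution.

```python
import numpy as np

def diagonal_ssm_scan(x, a, b, c):
    """Linear-time recurrence h_t = a * h_{t-1} + b * x_t, y_t = c . h_t.

    x: (T,) scalar input sequence; a, b, c: (N,) parameters of a diagonal state matrix.
    """
    h = np.zeros_like(a)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t   # elementwise update because the state matrix is diagonal
        y[t] = c @ h
    return y

def ssm_convolution(x, a, b, c):
    # Equivalent causal-convolution view: y_t = sum_k K_k x_{t-k}, with K_k = c . (a**k * b)
    T = len(x)
    kernel = np.array([c @ (a ** k * b) for k in range(T)])
    return np.array([kernel[: t + 1][::-1] @ x[: t + 1] for t in range(T)])
```

The scan costs O(T·N) for a length-T sequence, whereas pairwise self-attention costs O(T²), which is the trade-off HMT-Net's hybrid design exploits.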

(This article belongs to the Section E1: Mathematics and Computer Science)

22 pages, 35239 KB

Open Access Article

TBDDQN: Imbalanced Fault Diagnosis for Blast Furnace Ironmaking Process via Transformer–BiLSTM Double Deep Q-Networks

by Jinlong Zheng, Ping Wu, Ruirui Zuo, Xin Su, Yinzhu Liu and Nabin Kandel

Machines 2026, 14(3), 276; https://doi.org/10.3390/machines14030276 - 2 Mar 2026

Viewed by 502

Abstract

The blast furnace ironmaking process (BFIP) is a highly complex and dynamic industrial system in which strong spatiotemporal coupling and severe data imbalance pose substantial challenges for fault diagnosis. To address these issues, this study proposes a Transformer–BiLSTM Double Deep Q-Network (TBDDQN) framework for intelligent fault diagnosis. The framework employs a dual-branch architecture that integrates a Transformer-based spatial encoder with a BiLSTM-attention temporal extractor to capture global dependencies and dynamic patterns in multivariate time-series data. To mitigate class imbalance and asymmetric fault costs, it incorporates a cost-sensitive reinforcement learning scheme based on Double DQN, featuring prioritized experience replay and adaptive misclassification penalties. Experiments on real blast furnace datasets show that TBDDQN achieves a macro-averaged precision of 0.970 and a macro-averaged F1-score of 0.929, outperforming conventional CNN, LSTM, and DQN-based baselines. These results demonstrate that TBDDQN offers a robust and interpretable solution for imbalanced industrial fault diagnosis in the BFIP.
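The Double DQN component decouples action selection from action evaluation when computing bootstrap targets. Below is a minimal sketch of that target, plus a hypothetical cost-sensitive reward in which the agent's action is the predicted fault class. The per-class `penalty` table is an illustrative assumption, since the abstract does not give TBDDQN's adaptive penalty rule, and prioritized replay is not shown.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: the online net selects a', the target net evaluates it.

    rewards, dones: (B,); next_q_online, next_q_target: (B, num_actions).
    """
    best_actions = np.argmax(next_q_online, axis=1)                      # selection
    next_values = next_q_target[np.arange(len(rewards)), best_actions]   # evaluation
    return rewards + gamma * (1.0 - dones) * next_values

def cost_sensitive_reward(pred, label, penalty):
    # Hypothetical scheme: +1 for a correct class, and a class-specific negative
    # reward otherwise, so misclassifying a rare fault class costs more
    return np.where(pred == label, 1.0, -penalty[label])
```

Using the target network only to evaluate the action chosen by the online network reduces the overestimation bias of standard DQN's max operator.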

(This article belongs to the Special Issue Data-Driven and AI-Based Fault Diagnosis for Industrial Dynamic Systems)

22 pages, 4477 KB

Open Access Article

Robust Detection and Localization of Image Copy-Move Forgery Using Multi-Feature Fusion

by Kaiqi Lu and Qiuyu Zhang

J. Imaging 2026, 12(2), 75; https://doi.org/10.3390/jimaging12020075 - 10 Feb 2026

Viewed by 1237

Abstract

Copy-move forgery detection (CMFD) is a crucial image forensics technique, and the rapid development of deep learning algorithms has led to impressive advances in CMFD. However, existing models suffer from two key limitations. First, their feature fusion modules insufficiently exploit the complementary nature of features from the RGB domain and the noise domain, resulting in suboptimal feature representations. Second, during decoding, they simply classify pixels as authentic or forged without aggregating cross-layer information or integrating local and global attention mechanisms, which leads to unsatisfactory detection precision. To overcome these limitations, we propose a robust approach to detecting and localizing image copy-move forgery using multi-feature fusion. First, we design a Multi-Feature Fusion Network (MFFNet), whose feature fusion module fuses features from the RGB and noise domains so that their distinct characteristics complement each other, yielding richer feature information. Then, we develop a Lightweight Multi-layer Perceptron Decoder (LMPD) for image reconstruction and forgery localization map generation. Finally, more accurate prediction masks are obtained by aggregating information from different layers and combining local and global attention mechanisms. The experimental results demonstrate that the proposed MFFNet model exhibits greater robustness and better detection and localization performance than existing methods under JPEG compression, noise addition, and resizing operations.
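The name of the LMPD suggests a decoder built from per-pixel linear layers over multi-level encoder features. The sketch below assumes a SegFormer-style all-MLP design (project each level, upsample to the finest resolution, concatenate, fuse, predict). The weights are random stand-ins, and the local/global attention described in the abstract is omitted; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample_nearest(feat, out_hw):
    # (H, W, C) -> (out_h, out_w, C) by nearest-neighbour repetition (integer factors only)
    h, w, _ = feat.shape
    return feat.repeat(out_hw[0] // h, axis=0).repeat(out_hw[1] // w, axis=1)

def mlp_decoder(features, embed_dim=16):
    """All-MLP decoder sketch; features: list of (H_i, W_i, C_i) maps, finest first."""
    out_hw = features[0].shape[:2]
    projected = []
    for f in features:
        w = rng.normal(scale=0.1, size=(f.shape[-1], embed_dim))  # per-level linear layer
        projected.append(upsample_nearest(f @ w, out_hw))
    fused = np.concatenate(projected, axis=-1)                     # cross-layer aggregation
    w_fuse = rng.normal(scale=0.1, size=(fused.shape[-1], embed_dim))
    w_out = rng.normal(scale=0.1, size=(embed_dim, 1))
    logits = np.maximum(fused @ w_fuse, 0.0) @ w_out               # ReLU, then 1-channel head
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))                   # per-pixel forgery probability
```

Because every layer acts per pixel, the decoder's cost grows only linearly with resolution and channel width, which is what makes such designs lightweight.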

(This article belongs to the Section Image and Video Processing)

26 pages, 4105 KB

Open Access Article

Robust Dual-Stream Diagnosis Network for Ultrasound Breast Tumor Classification with Cross-Domain Segmentation Priors

by Xiaokai Jiang, Xuewen Ding, Jinying Ma, Chunyu Liu and Xinyi Li

Sensors 2026, 26(3), 974; https://doi.org/10.3390/s26030974 - 2 Feb 2026

Viewed by 729

Abstract

Ultrasound imaging is widely used for early breast cancer screening to improve patient survival. However, interpreting these images is inherently challenging due to speckle noise, low lesion-to-tissue contrast, and highly variable tumor morphology within complex anatomical structures. Additionally, variations in image characteristics across institutions and devices further impede the development of robust and generalizable computer-aided diagnosis systems. To alleviate these issues, this paper presents a classification strategy guided by cross-domain segmentation priors for robust breast tumor diagnosis in ultrasound imaging, implemented through a novel Dual-Stream Diagnosis Network (DSDNet). DSDNet adopts a decoupled dual-stream architecture in which a frozen segmentation branch supplies spatial priors to guide the classification backbone. This design enables stable and accurate performance across diverse imaging conditions and clinical settings. DSDNet is built from three novel modules. The Dual-Stream Mask Attention (DSMA) module enhances lesion priors by jointly modeling foreground and background cues. The Segmentation Prior Guidance Fusion (SPGF) module integrates multi-scale priors into the classification backbone using cross-domain spatial cues, improving the representation of tumor morphology. The Mamba-Inspired Linear Transformer (MILT) block, built upon the Mamba-Inspired Linear Attention (MILA) mechanism, serves as an efficient attention-based feature extractor. On the BUSI, BUS, and GDPH_SYSUCC datasets, DSDNet achieves accuracy (ACC) values of 0.878, 0.836, and 0.882 and recall values of 0.866, 0.789, and 0.878, respectively. These results highlight the effectiveness and strong classification performance of our method in ultrasound breast cancer diagnosis.
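One simple way for a frozen segmentation branch to supply foreground and background cues is masked average pooling: features are averaged inside and outside the predicted lesion mask, and the two descriptors are concatenated. The sketch below assumes that mechanism for illustration only; the abstract does not specify how DSMA combines the mask with classification features.

```python
import numpy as np

def foreground_background_pool(features, mask, eps=1e-6):
    """Pool features separately inside and outside a soft lesion mask.

    features: (H, W, C) classification-branch features.
    mask: (H, W) foreground probabilities from a frozen segmentation branch.
    """
    m = mask[..., None]
    fg = (features * m).sum(axis=(0, 1)) / (m.sum() + eps)                   # lesion cue
    bg = (features * (1.0 - m)).sum(axis=(0, 1)) / ((1.0 - m).sum() + eps)   # context cue
    return np.concatenate([fg, bg])
```

Keeping the segmentation branch frozen means these priors do not drift while the classifier trains, which matches the decoupled design described above.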

(This article belongs to the Section Biomedical Sensors)

21 pages, 1300 KB

Open Access Article

CAIC-Net: Robust Radio Modulation Classification via Unified Dynamic Cross-Attention and Cross-Signal-to-Noise Ratio Contrastive Learning

by Teng Wu, Quan Zhu, Runze Mao, Changzhen Hu and Shengjun Wei

Sensors 2026, 26(3), 756; https://doi.org/10.3390/s26030756 - 23 Jan 2026

Cited by 1 | Viewed by 658

Abstract

In complex wireless communication environments, automatic modulation classification (AMC) faces two critical challenges: a lack of robustness under low signal-to-noise ratio (SNR) conditions and the inefficient integration of multi-scale feature representations. To address these issues, this paper proposes CAIC-Net, a robust modulation classification network that integrates a dynamic cross-attention mechanism with a cross-SNR contrastive learning strategy. CAIC-Net employs a dual-stream feature extractor composed of ConvLSTM2D and Transformer blocks to capture local temporal dependencies and global contextual relationships, respectively. To enhance fusion effectiveness, we design a Dynamic Cross-Attention Unit (CAU) that enables deep bidirectional interaction between the two branches while incorporating an SNR-aware mechanism that adaptively adjusts the fusion strategy under varying channel conditions. In addition, a Cross-SNR Contrastive Learning (CSCL) module is introduced as an auxiliary task, in which positive and negative sample pairs are constructed across different SNR levels and optimized using the InfoNCE loss. This design strengthens the noise invariance of the learned representations. Extensive experiments on two standard datasets demonstrate that CAIC-Net achieves competitive classification performance at moderate-to-high SNRs and exhibits clear advantages in extremely low-SNR scenarios, validating the effectiveness and strong generalization capability of the proposed approach.
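The cross-SNR idea can be sketched in two steps: create two views of the same signal at different SNRs by adding complex white Gaussian noise, then score paired embeddings with InfoNCE, using the rest of the batch as negatives. The SNR levels, temperature, and function names below are illustrative assumptions rather than CAIC-Net's actual settings, and the encoder that would map IQ samples to embeddings is not shown.

```python
import numpy as np

def add_awgn(iq, snr_db, rng):
    """Add complex white Gaussian noise so the result has the target SNR (dB)."""
    signal_power = np.mean(np.abs(iq) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(iq.shape) + 1j * rng.standard_normal(iq.shape)
    )
    return iq + noise

def cross_snr_pair(iq, rng, snr_levels=(-10, 0, 10, 20)):
    # Positive pair: the same clean signal observed at two different SNR levels
    lo, hi = rng.choice(snr_levels, size=2, replace=False)
    return add_awgn(iq, lo, rng), add_awgn(iq, hi, rng)

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of z_a is positive with row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls the embeddings of the two noisy views together regardless of their SNR gap, which is the noise invariance the CSCL module targets.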

(This article belongs to the Section Communications)

30 pages, 8453 KB

Open Access Article

PBZGNet: A Novel Defect Detection Network for Substation Equipment Based on Gradual Parallel Branch Architecture

by Mintao Hu, Yang Zhuang, Jiahao Wang, Yaoyi Hu, Desheng Sun, Dawei Xu and Yongjie Zhai

Sensors 2026, 26(1), 300; https://doi.org/10.3390/s26010300 - 2 Jan 2026

Viewed by 958

Abstract

As power systems expand and become smarter, the safe and stable operation of substation equipment has become a prerequisite for grid reliability. In cluttered substation scenes, however, existing deep learning detectors still struggle with small targets, multi-scale feature fusion, and precise localization. To overcome these limitations, we introduce PBZGNet, a defect detection network that couples a gradual parallel-branch backbone, a zoom-fusion neck, and a global channel recalibration module. First, BiCoreNet is embedded in the feature extractor, where dual-core parallel paths, reversible residual links, and channel recalibration cooperate to mine fault-sensitive cues. Second, cross-scale ZFusion and Concat-CBFuse are dynamically merged so that no scale loses information, and a hierarchical composite feature pyramid is then formed, strengthening the representation of both complex objects and tiny flaws. Third, an attention-guided decoupled detection head (ADHead) refines responses to occluded and minute defect patterns. Finally, within the Generalized Focal Loss framework, a quality rating scheme suppresses background interference while distribution regression sharpens the localization of small targets. Across all scales, PBZGNet clearly outperforms YOLOv11. Its lightweight variant, PBZGNet-n, attains 83.9% mAP@50 with only 2.91 M parameters and 7.7 GFLOPs, 9.3% higher than YOLOv11-n. The full PBZGNet surpasses YOLO-SD, the current best substation model, by 7.3% mAP@50, setting a new state of the art (SOTA).
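Within Generalized Focal Loss, "distribution regression" refers to predicting a discrete distribution over box-edge offsets and training it with Distribution Focal Loss (DFL), which puts probability mass on the two integer bins bracketing the continuous target. Below is a minimal sketch for a single edge, following the DFL definition from the GFL paper; the bin count and logits are illustrative, and PBZGNet's own head configuration is not given in the abstract.

```python
import numpy as np

def softmax(logits):
    p = np.exp(logits - logits.max())
    return p / p.sum()

def distribution_focal_loss(logits, target):
    """DFL for one box edge.

    logits: (n_bins,) unnormalized scores over discrete offsets 0..n_bins-1.
    target: continuous offset in [0, n_bins - 1).
    """
    probs = softmax(logits)
    left = int(np.floor(target))
    right = left + 1
    w_left, w_right = right - target, target - left   # linear interpolation weights
    return -(w_left * np.log(probs[left]) + w_right * np.log(probs[right]))

def expected_offset(logits):
    # The regressed offset is the expectation of the learned distribution
    return float(softmax(logits) @ np.arange(len(logits)))
```

Representing each edge as a distribution lets the model express uncertainty at blurry or occluded boundaries, which GFL argues helps localize small targets.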

(This article belongs to the Special Issue Deep Learning Based Intelligent Fault Diagnosis)

19 pages, 4278 KB

Open Access Article

Research on Transfer Learning-Based Fault Diagnosis for Planetary Gearboxes Under Cross-Operating Conditions via IDANN

by Xiaolu Wang, Aiguo Wang, Haoyu Sun and Xin Xia

Information 2025, 16(12), 1112; https://doi.org/10.3390/info16121112 - 18 Dec 2025

Viewed by 800

Abstract

To address the limited performance of transfer fault diagnosis for planetary gearboxes under cross-operating conditions, which is caused by the heterogeneous feature distributions of vibration data and insufficient feature extraction, an improved domain-adversarial neural network (IDANN) model based on a joint adaptive domain alignment component and a dual-branch feature extractor is proposed. First, a joint domain adaptation alignment approach integrating maximum mean discrepancy (MMD) and CORrelation ALignment (CORAL) is proposed to match the correlation structure of features between the source and target domains of IDANN. Second, a dual-branch feature extractor composed of ResNet18 and a Swin Transformer, combined through an attention-weighted fusion mechanism, is proposed to enhance feature extraction. Finally, validation experiments on public planetary gearbox fault datasets show that the proposed method attains high accuracy and stable performance in cross-operating-condition transfer fault diagnosis.
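The joint alignment term combines two standard discrepancy measures: MMD compares kernel mean embeddings of the source and target features, and CORAL compares their second-order statistics (feature covariances). Below is a minimal sketch on batches of source and target features. The Gaussian bandwidth and the equal weights in `joint_alignment_loss` are hypothetical, since the abstract does not state how IDANN balances the two terms or combines them with the adversarial loss.

```python
import numpy as np

def gaussian_mmd2(xs, xt, sigma=1.0):
    """Biased squared MMD with a Gaussian (RBF) kernel; rows are samples."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

def coral_loss(xs, xt):
    """CORAL: squared Frobenius distance between feature covariances, divided by 4 d^2."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False)
    ct = np.cov(xt, rowvar=False)
    return np.sum((cs - ct) ** 2) / (4 * d * d)

def joint_alignment_loss(xs, xt, lam_mmd=1.0, lam_coral=1.0):
    # Hypothetical weighting; the abstract does not give the actual combination rule
    return lam_mmd * gaussian_mmd2(xs, xt) + lam_coral * coral_loss(xs, xt)
```

Both terms are differentiable, so in a real model they would simply be added to the classification and domain-adversarial losses during training.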

24 pages, 15414 KB

Open Access Article

TAF-YOLO: A Small-Object Detection Network for UAV Aerial Imagery via Visible and Infrared Adaptive Fusion

by Zhanhong Zhuo, Ruitao Lu, Yongxiang Yao, Siyu Wang, Zhi Zheng, Jing Zhang and Xiaogang Yang

Remote Sens. 2025, 17(24), 3936; https://doi.org/10.3390/rs17243936 - 5 Dec 2025

Cited by 14 | Viewed by 2854

Abstract

Detecting small objects in UAV-captured aerial imagery is a critical yet challenging task, hindered by factors such as small object size, complex backgrounds, and subtle inter-class differences. Single-modal methods lack the robustness needed for all-weather operation, while existing multimodal solutions are often too computationally expensive for deployment on resource-constrained UAVs. To this end, we propose TAF-YOLO, a lightweight and efficient multimodal detection framework designed to balance accuracy and efficiency. First, we propose an early fusion module, the Two-branch Adaptive Fusion Network (TAFNet), which adaptively integrates visible and infrared information at both the pixel and channel levels before the feature extractor, maximizing complementary information while minimizing redundancy. Second, we propose a Large Adaptive Selective Kernel (LASK) module that dynamically expands the receptive field using multi-scale convolutions and spatial attention, preserving crucial details of small objects during downsampling. Finally, we present an optimized feature neck architecture that replaces PANet's bidirectional path with a more efficient top-down pathway, enhanced by a Dual-Stream Attention Bridge (DSAB) that injects high-level semantics into low-level features, improving localization without significant computational overhead. On the VEDAI benchmark, TAF-YOLO achieves 67.2% mAP₅₀, outperforming the CFT model by 2.7% and surpassing seven other YOLO variants. Our work offers a practical solution that enables real-time, all-weather object detection on resource-constrained UAVs.
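LASK is described as dynamically selecting among multi-scale convolutions. Its exact design is not given in the abstract, so the sketch below shows the SKNet-style selective kernel fusion its name suggests (an assumption): outputs from branches with different receptive fields are summed, globally pooled, compressed, and then reweighted per channel with a softmax across branches. Random weights stand in for learned parameters, and LASK's spatial attention is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_kernel_fuse(branches, reduction=4):
    """SKNet-style fusion of K branch outputs, each (C, H, W), e.g. from different kernel sizes."""
    u = np.stack(branches)                                    # (K, C, H, W)
    k, c = u.shape[:2]
    s = u.sum(axis=0).mean(axis=(1, 2))                       # global average pool -> (C,)
    w_z = rng.normal(scale=0.1, size=(c // reduction, c))     # random stand-in weights
    z = np.maximum(w_z @ s, 0.0)                              # compact descriptor (ReLU)
    w_sel = rng.normal(scale=0.1, size=(k, c, c // reduction))
    attn = softmax(w_sel @ z, axis=0)                         # (K, C): weights over branches
    return (attn[:, :, None, None] * u).sum(axis=0), attn
```

Because the branch weights depend on the input's global statistics, the effective receptive field can grow or shrink per image, which is the "adaptive" behaviour the module's name refers to.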

(This article belongs to the Special Issue Target Detection, Recognition, Tracking, and Positioning Using Remote Sensing and AI Techniques)

19 pages, 656 KB

Open Access Article

Bias-Alleviated Zero-Shot Sports Action Recognition Enabled by Multi-Scale Semantic Alignment

by Qiang Zheng, Wen Qin, Fanyi Meng and Hongyang Liu

Symmetry 2025, 17(11), 1959; https://doi.org/10.3390/sym17111959 - 14 Nov 2025

Viewed by 874

Abstract

Zero-shot action recognition remains challenging due to the visual–semantic gap and the persistent bias toward seen classes, particularly in the generalized setting, where both seen and unseen categories appear during inference. To address these issues, we propose the Multi-Scale Semantic Alignment framework for Zero-Shot Sports Action Recognition (MSA-ZSAR), which integrates a multi-scale spatiotemporal feature extractor to capture both coarse- and fine-grained motion dynamics, a dual-branch semantic alignment strategy that adapts to different levels of semantic availability, and a bias-suppression mechanism that improves the balance between seen and unseen recognition. This design enables the model to align visual features with semantic representations while alleviating overfitting to source classes. Extensive experiments demonstrate the effectiveness of the proposed framework: MSA-ZSAR achieves 52.8% unseen accuracy, 69.7% seen accuracy, and a 61.3% harmonic mean, consistently surpassing prior approaches. These results confirm that the proposed framework delivers balanced and superior performance in realistic generalized zero-shot scenarios.
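The abstract does not detail MSA-ZSAR's bias-suppression mechanism. For context, a widely used baseline for reducing seen-class bias in generalized zero-shot learning is calibrated stacking, which subtracts a constant from seen-class scores before the argmax. The sketch below illustrates that technique only, not the authors' method.

```python
import numpy as np

def calibrated_predict(scores, seen_mask, gamma):
    """Calibrated stacking: penalize seen-class scores by gamma before the argmax.

    scores: (N, num_classes) compatibility scores; seen_mask: (num_classes,) bool.
    """
    adjusted = scores - gamma * seen_mask.astype(scores.dtype)
    return adjusted.argmax(axis=1)
```

Sweeping `gamma` trades seen accuracy for unseen accuracy; in practice it is tuned on validation data to maximize the harmonic mean of the two.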

(This article belongs to the Special Issue Application of Symmetry/Asymmetry and Machine Learning)

24 pages, 59144 KB

Open Access Article

EWAM: Scene-Adaptive Infrared-Visible Image Matching with Radiation-Prior Encoding and Learnable Wavelet Edge Enhancement

by Mingwei Li, Hai Tan, Haoran Zhai and Jinlong Ci

Remote Sens. 2025, 17(22), 3666; https://doi.org/10.3390/rs17223666 - 7 Nov 2025

Viewed by 1267

Abstract

Infrared–visible image matching is a prerequisite for environmental monitoring, military reconnaissance, and multisource geospatial analysis. However, pronounced texture disparities, intensity drift, and complex non-linear radiometric distortions in such cross-modal pairs prevent existing frameworks such as SuperPoint + SuperGlue (SP + SG) and LoFTR from reliably establishing correspondences. To address this issue, we propose a dual-path architecture, the Environment-Adaptive Wavelet Enhancement and Radiation Priors Aided Matcher (EWAM). EWAM incorporates two synergistic branches: (1) an Environment-Adaptive Radiation Feature Extractor, which first classifies the scene according to radiation-intensity variations and then incorporates a physical radiation model into a learnable gating mechanism for selective feature propagation; and (2) a Wavelet-Transform High-Frequency Enhancement Module, which recovers blurred edge structures by boosting wavelet coefficients under directional perceptual losses. Together, the two branches increase the number of tie points (reliable correspondences) and refine their spatial localization. A coarse-to-fine matcher then refines the cross-modal correspondences. We benchmarked EWAM against SIFT, AKAZE, D2-Net, SP + SG, and LoFTR on a newly compiled dataset that combines GF-7, Landsat-8, and Five-Billion-Pixels imagery. Across desert, mountain, gobi, urban, and farmland scenes, EWAM reduced the average RMSE to 1.85 pixels and outperformed the best competing method in accuracy by 2.7%, 2.6%, 2.0%, 2.3%, and 1.8%, respectively. These findings demonstrate that EWAM provides a robust and scalable framework for large-scale multi-sensor remote sensing data fusion.
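Boosting wavelet coefficients to recover edges can be illustrated with a single-level 2D Haar transform: the three detail sub-bands are scaled and the image is reconstructed. In EWAM this enhancement is learnable and trained with directional perceptual losses; the fixed `gain` below is a simplifying assumption, and the functions are illustrative rather than the module's implementation.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of an (H, W) image with even H and W."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2   # approximation (low-frequency) band
    lh = (a - b + c - d) / 2   # differences between columns (vertical edges)
    hl = (a + b - c - d) / 2   # differences between rows (horizontal edges)
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    img = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2
    img[0::2, 1::2] = (ll - lh + hl - hh) / 2
    img[1::2, 0::2] = (ll + lh - hl - hh) / 2
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return img

def enhance_edges(img, gain=1.5):
    # Fixed gain on the detail sub-bands; EWAM learns this enhancement instead
    ll, lh, hl, hh = haar_dwt2(img)
    return haar_idwt2(ll, gain * lh, gain * hl, gain * hh)
```

With `gain=1.0` the image is reconstructed exactly, and flat regions are left untouched at any gain because their detail coefficients are zero.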

28 pages, 14783 KB

Open Access Article

HSSTN: A Hybrid Spectral–Structural Transformer Network for High-Fidelity Pansharpening

by Weijie Kang, Yuan Feng, Yao Ding, Hongbo Xiang, Xiaobo Liu and Yaoming Cai

Remote Sens. 2025, 17(19), 3271; https://doi.org/10.3390/rs17193271 - 23 Sep 2025

Viewed by 1577

Abstract

Pansharpening fuses multispectral (MS) and panchromatic (PAN) remote sensing images to generate outputs with high spatial resolution and spectral fidelity. Nevertheless, conventional methods relying primarily on convolutional neural networks or unimodal fusion strategies frequently fail to bridge the sensor modality gap between MS and PAN data. Consequently, spectral distortion and spatial degradation often occur, limiting high-precision downstream applications. To address these issues, this work proposes a Hybrid Spectral–Structural Transformer Network (HSSTN) that enhances multi-level collaboration through comprehensive modelling of spectral–structural feature complementarity. Specifically, the HSSTN implements a three-tier fusion framework. First, an asymmetric dual-stream feature extractor employs a residual block with channel attention (RBCA) in the MS branch to strengthen spectral representation, while a Transformer architecture in the PAN branch extracts high-frequency spatial details, thereby reducing modality discrepancy at the input stage. Subsequently, a target-driven hierarchical fusion network utilises progressive cross-modal attention across scales, ranging from local textures to multi-scale structures, to enable efficient spectral–structural aggregation. Finally, a novel collaborative optimisation loss function preserves spectral integrity while enhancing structural details. Comprehensive experiments on the QuickBird, GaoFen-2, and WorldView-3 datasets demonstrate that HSSTN outperforms existing methods in both quantitative metrics and visual quality. The resulting images exhibit sharper details and fewer spectral artefacts, showcasing significant advantages in high-fidelity remote sensing image fusion.
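As a rough sketch of a residual block with channel attention (RBCA), assuming the common squeeze-and-excitation form: global average pooling produces a per-channel descriptor, a small bottleneck maps it to sigmoid weights, and the reweighted features are added back to the input. A per-pixel channel-mixing matrix stands in for the convolutions, and all weights are random placeholders, not HSSTN's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, reduction=4):
    """Squeeze-and-excitation style channel attention on a (C, H, W) map."""
    c = feat.shape[0]
    squeeze = feat.mean(axis=(1, 2))                       # (C,) global descriptor
    w1 = rng.normal(scale=0.1, size=(c // reduction, c))   # random stand-in weights
    w2 = rng.normal(scale=0.1, size=(c, c // reduction))
    weights = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (C,) values in (0, 1)
    return feat * weights[:, None, None]

def rbca(feat):
    # Residual block: a per-pixel channel mix (standing in for convolutions), ReLU,
    # channel attention, then the skip connection back to the input
    c = feat.shape[0]
    w_mix = rng.normal(scale=0.1, size=(c, c))
    body = np.maximum(np.einsum("dc,chw->dhw", w_mix, feat), 0.0)
    return feat + channel_attention(body)
```

Reweighting channels suits the MS branch because each channel corresponds to a spectral band, so the attention can emphasize the bands most informative for spectral fidelity.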

(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)

21 pages, 3261 KB

Open Access Article

A Driving-Preference-Aware Framework for Vehicle Lane Change Prediction

by Ying Lyu, Yulin Wang, Huan Liu, Xiaoyu Dong, Yifan He and Yilong Ren

Sensors 2025, 25(17), 5342; https://doi.org/10.3390/s25175342 - 28 Aug 2025

Cited by 3 | Viewed by 1723

Abstract

With the development of intelligent connected vehicle and artificial intelligence technologies, mixed traffic scenarios in which autonomous and human-driven vehicles coexist are becoming increasingly common. To ensure safety, autonomous vehicles need to accurately predict the lane change behavior of preceding vehicles. However, the lane change behavior of human-driven vehicles is influenced by both environmental factors and driver preferences, which increases its uncertainty and makes prediction more difficult. To address this challenge, this paper focuses on mining driving preferences and predicting lane change behavior. We clarify the definition of driving preference and its relationship with driving style, and we construct a representation of driving operations based on vehicle dynamics parameters and statistical features. A preference feature extractor based on the SimCLR contrastive learning framework is designed to capture high-dimensional driving preference features through unsupervised learning, effectively distinguishing between aggressive, normal, and conservative driving styles. Furthermore, a dual-branch lane change prediction model is proposed that fuses explicit temporal features of vehicle states with implicit driving preference features, enabling efficient integration of multi-source information. Experimental results on the HighD dataset show that the proposed model significantly outperforms traditional models such as Transformer and LSTM in lane change prediction accuracy, providing technical support for improving the safety and human-likeness of autonomous driving decision-making.
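A representation of driving operations based on vehicle dynamics parameters and statistical features could look like the sketch below, which summarizes a trajectory window into the mean, standard deviation, and peak magnitude of speed, acceleration, jerk, and lateral velocity. The chosen signals and statistics are illustrative assumptions, and the default `dt` assumes 25 Hz sampling, the frame rate of the highD recordings.

```python
import numpy as np

def driving_operation_features(speed, accel, lateral_pos, dt=0.04):
    """Summarize a trajectory window into 12 statistical features.

    speed (m/s), accel (m/s^2), lateral_pos (m): 1D arrays sampled every dt seconds.
    Output order: [speed, accel, jerk, lateral velocity] x [mean, std, max |value|].
    """
    jerk = np.diff(accel) / dt                # rate of change of acceleration
    lateral_vel = np.diff(lateral_pos) / dt   # finite-difference lateral velocity
    feats = []
    for series in (speed, accel, jerk, lateral_vel):
        feats.extend([series.mean(), series.std(), np.abs(series).max()])
    return np.array(feats)
```

Vectors like this could serve as the inputs that a SimCLR-style encoder maps to preference embeddings, with augmented views of the same window forming positive pairs.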

(This article belongs to the Special Issue Applications of Advanced Sensors and Interoperability Technologies in Autonomous Transportation Systems)

18 pages, 4494 KB

Open Access Article

MDFN: Enhancing Power Grid Image Quality Assessment via Multi-Dimension Distortion Feature

by Zhenyu Chen, Jianguang Du, Jiwei Li and Hongwei Lv

Sensors 2025, 25(11), 3414; https://doi.org/10.3390/s25113414 - 29 May 2025

Cited by 2 | Viewed by 1227

Abstract

Low-quality power grid images can severely degrade the performance of deep learning models in the power industry. Accurate image quality assessment is therefore essential for screening high-quality power grid images. Although current blind image quality assessment (BIQA) methods have made some progress, they usually use only one type of feature and ignore other factors that affect image quality, such as noise and brightness, which are highly relevant to low-quality power grid images affected by noise, underexposure, and overexposure. We therefore propose a multi-dimension distortion feature network (MDFN) based on CNN and Transformer, which considers the high-frequency (edges and details) and low-frequency (semantic and structural) features of images, along with noise and brightness features, to achieve more accurate quality assessment. Specifically, the network employs a dual-branch feature extractor in which the CNN branch captures local distortion features and the Transformer branch integrates both local and global features. We argue that separating low-frequency and high-frequency components yields richer distortion features. Thus, we propose a frequency selection module (FSM) that extracts high-frequency and low-frequency features and updates them to achieve global spatial information fusion. Additionally, previous methods use only the CLS token to predict the quality score of an image. Given the severe noise and exposure problems in power grid images, we design an effective way to extract noise and brightness features and combine them with the CLS token for prediction. Experiments show that our method surpasses existing approaches on three public datasets and a power grid image dataset, demonstrating the superiority of the proposed method.
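The low/high-frequency split and the brightness and noise cues can be illustrated with hand-crafted counterparts of MDFN's learned components. The sketch below uses an ideal FFT low-pass filter, defines the high-frequency part as the residual, and computes simple image statistics. The cutoff, thresholds, and feature names are hypothetical and do not describe the FSM itself.

```python
import numpy as np

def frequency_split(img, cutoff=0.1):
    """Split a grayscale image into low- and high-frequency parts with an ideal FFT low-pass."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    lowpass = np.sqrt(fx ** 2 + fy ** 2) <= cutoff   # keep frequencies inside the radius
    low = np.real(np.fft.ifft2(np.fft.fft2(img) * lowpass))
    return low, img - low                            # high = residual, so low + high == img

def brightness_noise_features(img):
    # Hypothetical hand-crafted cues for an image with values in [0, 1]
    _, high = frequency_split(img)
    return {
        "brightness": float(img.mean()),
        "underexposed": float((img < 0.05).mean()),   # fraction of near-black pixels
        "overexposed": float((img > 0.95).mean()),    # fraction of near-white pixels
        "noise_level": float(high.std()),             # energy in the high-frequency residual
    }
```

A learned module replaces the fixed cutoff and statistics with trainable filters, but the decomposition idea, separating structure from edges and noise, is the same.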

(This article belongs to the Special Issue Computer Vision and Sensing Technologies for Industrial Quality Inspection: 2nd Edition)

Search Results (26)