Search Results (414)

Search Parameters:
Keywords = spatial frequency domain imaging

42 pages, 16998 KB  
Article
FSD-Net: A Siamese Dual Detail Recovery Network for High Resolution Remote Sensing Change Detection Based on Frequency Domain Sensing
by Jiajian Li, Ran Peng, Yuhao Nie, Shengyuan Zhi, Zhuolun He and Xiaoyan Chen
Appl. Sci. 2026, 16(9), 4240; https://doi.org/10.3390/app16094240 (registering DOI) - 26 Apr 2026
Abstract
High-resolution remote sensing image change detection holds significant application value in urban planning, disaster assessment, and other fields, but it faces the dual challenges of pseudo-change interference and loss of detailed information. To address these issues, a frequency-domain-aware Siamese detail recovery network (FSD-Net) is designed in this paper. First, from the perspective of frequency-domain analysis, a theory of the dual roles of frequency-domain components is introduced, revealing the robustness of low-frequency components to pseudo-changes and the dual semantic/noise nature of high-frequency components. Based on this theory, a frequency-aware context-guided difference (FCGD) module is designed: by explicitly decoupling difference features into low-frequency global components and high-frequency residual components, it uses the low-frequency scene prior as a semantic gate to adaptively modulate high-frequency differences, which effectively suppresses pseudo-change interference. Subsequently, a detail recovery block (DRB) based on sub-pixel convolution is constructed. It achieves unbiased spatial rearrangement through the semantic redundancy of the channel dimension, avoiding the checkerboard artifacts of traditional upsampling, and employs a progressive multi-stage upsampling strategy to integrate shallow detail features from the encoder. Experimental results on three public datasets (LEVIR-CD, WHU-CD, and CDD-CD) demonstrate that FSD-Net outperforms current mainstream methods (e.g., ChangeFormer and BAN) on core metrics such as F1 score and IoU, with a particularly significant improvement in recall. Ablation experiments validate the effectiveness and complementarity of the FCGD and DRB, and parameter sensitivity analysis indicates that the auxiliary loss weight λ is dataset dependent, with λ = 0.1 serving as a robust default choice. This study provides an efficient and reliable solution for change detection in high-resolution remote sensing imagery. Full article
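The DRB described in this abstract builds on sub-pixel convolution (pixel shuffle). The following is a minimal pure-Python sketch of the channel-to-space rearrangement step only, not the authors' implementation:

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor (nested lists) into (C, H*r, W*r).

    This is the spatial rearrangement used by sub-pixel convolution: each
    group of r*r channels fills one r x r output neighborhood, so upsampling
    involves no interpolation and no overlapping-kernel checkerboard pattern.
    """
    cr2, h, w = len(x), len(x[0]), len(x[0][0])
    c = cr2 // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(cr2):
        oc, rem = divmod(ch, r * r)          # output channel, offset index
        dy, dx = divmod(rem, r)              # position inside the r x r cell
        for i in range(h):
            for j in range(w):
                out[oc][i * r + dy][j * r + dx] = x[ch][i][j]
    return out
```

With r = 2 and four 1x1 input channels holding 1..4, the single output channel tiles them into one 2x2 block, matching the channel-to-space ordering used by common deep learning frameworks.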
26 pages, 4019 KB  
Article
MSWA-ResNet: Multi-Scale Wavelet Attention for Patient-Level and Interpretable Breast Cancer Histopathology Classification
by Ghadeer Al Sukkar, Ali Rodan and Azzam Sleit
J. Imaging 2026, 12(4), 176; https://doi.org/10.3390/jimaging12040176 - 19 Apr 2026
Abstract
Breast cancer histopathological classification is critical for diagnosis and treatment planning, yet manual assessment remains time-consuming and subject to inter-observer variability. Although deep learning approaches have advanced automated analysis, image-level data splitting may introduce data leakage, and spatial-domain architectures lack explicit multi-scale frequency modeling. This study proposes MSWA-ResNet, a Multi-Scale Wavelet Attention Residual Network that embeds recursive discrete wavelet decomposition within residual blocks to enable frequency-aware and scale-aware feature learning. The model is evaluated on the BreakHis dataset using a strict patient-level protocol with 70/30 patient-wise splitting, five-fold stratified cross-validation, ensemble prediction, and hierarchical aggregation from patch to patient level. MSWA-ResNet achieves 96% patient-level accuracy at 100×, 200×, and 400× magnifications, and 92% at 40×, with F1-scores of 0.97 and 0.94, respectively. At 200× and 400×, accuracy improves from 0.92 to 0.96 and F1-score from 0.94 to 0.97 over baseline CNNs while maintaining 11.8–12.1 M parameters and 2.5–4.8 ms inference time. Grad-CAM demonstrates improved localization of diagnostically relevant regions, indicating that explicit multi-scale frequency modeling enhances accurate and interpretable patient-level classification. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
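The recursive discrete wavelet decomposition that MSWA-ResNet embeds in its residual blocks starts from a one-level 2D transform. A sketch of the orthonormal Haar case on a 2x2 block (the paper's wavelet and normalization may differ):

```python
def haar2d(block):
    """One-level orthonormal 2D Haar transform of a 2x2 block [[a, b], [c, d]].

    Returns (LL, LH, HL, HH): the approximation plus horizontal, vertical,
    and diagonal detail coefficients. Subband naming conventions vary
    between libraries.
    """
    a, b = block[0]
    c, d = block[1]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

Because the transform is orthonormal, coefficient energy equals pixel energy, which is what makes decomposition stable to apply recursively inside residual blocks.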

27 pages, 3706 KB  
Article
Simulation-Driven Spatial Frequency Domain Imaging and Deep Learning for Subsurface Fruit Bruise Discrimination
by Jinchen Han, Yanlin Song and Xiaping Fu
Foods 2026, 15(8), 1397; https://doi.org/10.3390/foods15081397 - 17 Apr 2026
Abstract
Conventional spatial frequency domain imaging (SFDI)-based optical property inversion is inefficient, while deep learning methods rely heavily on large-scale real datasets. To address this contradiction, a simulation-driven approach for subsurface fruit bruise discrimination was proposed. An SFDI simulation environment was built in Blender to generate 800 paired samples of diffuse reflectance images and optical transport coefficients, overcoming the high cost and long cycle of real dataset acquisition. We designed the CBAM-GAN-U-Net model and adopted surface profile correction in the prediction method to eliminate curved-surface-induced non-planar distortion, with the whole method validated on liquid phantoms, green apples, and crown pears. This prediction method achieved high accuracy in predicting the reduced scattering coefficient μs′, with NMAE of 0.021 ± 0.007 (phantoms), 0.039 ± 0.012 (severely bruised green apples), and 0.044 ± 0.015 (severely bruised crown pears), outperforming U-Net and GANPOP. Based on the predicted μs′, a discrimination strategy combining the coefficient of variation, mean ratio, and receiver operating characteristic (ROC) curve analysis was adopted, attaining 100% accuracy for non-bruised/bruised fruit discrimination, with misclassification rates of 6% (green apples) and 8% (crown pears) for mild/severe bruise differentiation. This method enables accurate subsurface fruit bruise detection, providing a reliable technical solution for the fruit and vegetable industry and helping reduce postharvest supply chain losses. Full article
(This article belongs to the Section Food Analytical Methods)
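Conventional SFDI, as referenced in this abstract, recovers DC and AC reflectance amplitudes from three sinusoidal projections phase-shifted by 120 degrees. The standard per-pixel demodulation (a general SFDI formula, not specific to this paper) is:

```python
import math

def sfdi_demodulate(i1, i2, i3):
    """Per-pixel SFDI demodulation from three phase-shifted images.

    Assumes i_k = M_DC + M_AC * cos(phi + k * 2*pi/3) for k = 0, 1, 2.
    Returns (M_DC, M_AC), the planar (DC) and modulated (AC) amplitudes.
    """
    m_dc = (i1 + i2 + i3) / 3.0
    m_ac = (math.sqrt(2.0) / 3.0) * math.sqrt(
        (i1 - i2) ** 2 + (i2 - i3) ** 2 + (i3 - i1) ** 2
    )
    return m_dc, m_ac
```

Optical-property inversion then maps the demodulated amplitudes at each spatial frequency to absorption and reduced scattering; that inversion step is what simulation-driven learned models aim to accelerate.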

29 pages, 7372 KB  
Article
Multi-Scale Frequency-Aware Representation Learning for Infrared and Visible Image Fusion
by Chuanwen Hu, Zheyi Hu, Chuan Xu, Zhina Song and Liye Mei
Remote Sens. 2026, 18(8), 1178; https://doi.org/10.3390/rs18081178 - 15 Apr 2026
Abstract
Infrared and visible image fusion aims to integrate complementary information from heterogeneous sensors for remote sensing and Earth-observation applications. To achieve a better balance between global contextual modeling and local structural preservation, we propose MSF-Net, a multi-scale frequency-aware fusion network with a hierarchical design. The proposed framework consists of two main stages: multi-scale feature extraction with frequency-domain interaction and hierarchical cross-modal fusion. Specifically, a hybrid spatial-frequency encoding block (HSFEB) is designed as the basic building unit, which combines a spatial-frequency interaction module (SFIM) for global context aggregation in the frequency domain and a structure-guided feature refinement module (SGFRM) for preserving local structural details. In addition, a hierarchical feature fusion module (HFFM) is introduced to progressively integrate cross-modal and cross-scale features in a coarse-to-fine manner. A joint loss function, composed of intensity and structural constraints, is adopted to supervise the fusion process. Extensive experiments on three public benchmarks, MSRS, M3FD, and TNO, demonstrate that MSF-Net achieves superior performance over nine SOTA methods in both qualitative and quantitative evaluations. The results show that the proposed method effectively enhances thermal targets, preserves structural details, and maintains good visual naturalness under diverse remote-sensing scenarios. Full article

31 pages, 6244 KB  
Article
Physics-Driven Multi-Modal Fusion for SAR Ship Detection Under Motion Defocusing
by Xinmei Qiang, Ze Yu, Xianxun Yao, Dongxu Li, Ruijuan Deng, Na Pu and Shengjie Zhong
Remote Sens. 2026, 18(8), 1166; https://doi.org/10.3390/rs18081166 - 14 Apr 2026
Abstract
Synthetic aperture radar (SAR) ship detection is severely limited by motion-induced artifacts. Due to the complex six-degree-of-freedom (6-DOF) motion of ships, ship imagery exhibits aberration phenomena including spatial blurring, discrete ghosting, and Lorentz linear blurring. Traditional detectors rely on static spatial features and tend to fail when phase coherence is disrupted. To overcome this problem, we propose a physics-driven multimodal fusion framework. This framework establishes a theoretical connection between the ship's hydrodynamic response and imaging degradation through short, long, and ultra-long coherent processing intervals (CPI). The framework adopts a cascaded architecture: first, a lightweight YOLOv8 performs rapid global screening, followed by a signal backtracking mechanism that extracts high-fidelity time-frequency domain (TFD) and range instantaneous Doppler (RID) features from the raw range-compressed data. In the second-level detection, these physical features are adaptively fused with spatial intensity through a YOLOv8 network integrated with the convolutional block attention module (CBAM) to reduce the false detection rate. Validation on high-fidelity simulations and real GF-3 datasets shows that this method consistently achieves a mean average precision (mAP) of over 95%, outperforming several widely used detectors, and demonstrates strong generalization in extreme imaging conditions, making it suitable for maritime detection scenarios. Full article
(This article belongs to the Special Issue Ship Imaging, Detection and Recognition for High-Resolution SAR)

25 pages, 4032 KB  
Article
CoFiWaveMamba: A Coarse-to-Fine Wavelet-Guided Mamba Network for Single Image Dehazing
by Qiang Fu, Boyu Lu and Chongyao Yan
Electronics 2026, 15(8), 1599; https://doi.org/10.3390/electronics15081599 - 11 Apr 2026
Abstract
Single image dehazing remains challenging because haze simultaneously distorts global illumination, scene structure, and fine textures, making rigid low–high frequency decoupling prone to error propagation and detail inconsistency. To address this issue, we propose CoFiWaveMamba, a coarse-to-fine wavelet-guided Mamba network for single image dehazing. The proposed method first employs wavelet decomposition to separate low- and high-frequency components. For low-frequency restoration, a 2D selective-scan Mamba-based module is introduced to capture long-range dependencies, combined with lightweight high-frequency-guided spatial modulation and Shuffle-guided Sequence Attention. For high-frequency restoration, we design a progressive coarse-to-fine refinement strategy that combines Fourier-domain global spectral consistency with wavelet-domain directional detail representation, enabling more targeted recovery of edges and textures. Experiments on synthetic and real dehazing benchmarks, including Haze4K, RESIDE-6K, HSTS-SYNTHETIC, I-Haze, NH-Haze, Dense-Haze, and O-HAZE, as well as ablation studies, verify the effectiveness of the proposed design. Overall, CoFiWaveMamba provides a more coordinated solution for global haze removal and local detail reconstruction, helping suppress residual haze, ringing artifacts, oversharpening, and texture inconsistency while restoring clearer and more natural images. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
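The low/high frequency separation this abstract relies on is invertible: a one-level wavelet split can be undone exactly. A minimal 1-D orthonormal Haar analysis/synthesis pair illustrating perfect reconstruction (a generic sketch, not the paper's wavelet choice):

```python
def haar_split(x):
    """Split an even-length 1-D signal into low/high subbands (orthonormal Haar)."""
    s = 2.0 ** 0.5
    low = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return low, high

def haar_merge(low, high):
    """Inverse transform: perfect reconstruction from the two subbands."""
    s = 2.0 ** 0.5
    x = []
    for l, h in zip(low, high):
        x += [(l + h) / s, (l - h) / s]
    return x
```

Perfect reconstruction is what lets a network restore the low- and high-frequency branches with separate modules and still merge them without introducing decomposition error.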

17 pages, 22263 KB  
Article
Coarse-to-Fine GAN for Image Inpainting via Transformer and Channel-Frequency Encoder
by Shibin Wang, Yubo Xu, Dehuang Qin, Dong Liu and Xueshan Li
Electronics 2026, 15(8), 1580; https://doi.org/10.3390/electronics15081580 - 10 Apr 2026
Abstract
Image inpainting aims to recover missing regions in damaged images while preserving structural coherence and textural authenticity. Although deep learning methods based on generative adversarial networks (GAN) have made significant progress, they still face challenges in modeling long-range dependencies and maintaining semantic consistency, especially when large areas are missing. To address these issues, we propose an innovative multi-stage restoration framework. The coarse restoration stage incorporates attention via a transformer architecture, while the refinement stage introduces a plug-and-play channel-frequency encoder (CF-Encoder). This encoder effectively models both global structure and local details by hierarchically extracting and enhancing features through frequency-domain decomposition combined with an adaptive spatial-channel attention mechanism. Furthermore, we employ a bi-discriminator fusion mechanism to stabilize training and enhance perceptual quality. Experiments across multiple benchmark datasets demonstrate our method’s superior performance in both quantitative metrics and visual fidelity, with particularly notable advantages in high-missing-value scenarios. Full article

22 pages, 3941 KB  
Article
CSFCNet: Cascaded Spatial-Frequency Convolutional Network for Hyperspectral Image Classification
by Feng Jiang, Xin Liu, Mingxuan Li, Ting Nie and Liang Huang
Sensors 2026, 26(8), 2325; https://doi.org/10.3390/s26082325 - 9 Apr 2026
Abstract
CNNs can effectively extract features at low computational cost and have achieved significant progress in hyperspectral image classification. However, due to their limited receptive field, CNNs have difficulty capturing multi-scale structural and global contextual information. Moreover, class imbalance in hyperspectral images often causes the model to focus disproportionately on certain spectral bands, reducing average accuracy. To address these challenges, a Cascaded Spatial-Frequency Convolutional Network (CSFCNet) is proposed for hyperspectral image classification. It integrates rich spatial-domain and frequency-domain information by jointly modeling both domains. Specifically, a Dual Spatial Fourier Convolution (DSF-Conv) module projects feature maps into parallel spatial and frequency representations. In the spatial pathway, input features are grouped and processed with multi-scale convolutions to extract hierarchical structures; in the Fourier pathway, frequency-domain convolutions aggregate global context. Subsequently, a group-cascaded structure connects the DSF-Conv modules with residual connections, alleviating the class imbalance problem by promoting more balanced contributions from different spectral components. Additionally, we introduce a Lightweight Local Attention module to enhance feature discrimination. Experiments on three datasets achieved competitive accuracies, demonstrating the effectiveness of CSFCNet, and ablation studies further verify the effectiveness of its core components. Full article
(This article belongs to the Section Remote Sensors)
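The appeal of a Fourier pathway rests on the convolution theorem: pointwise multiplication of spectra equals circular convolution, so a single frequency-domain product has an input-wide receptive field. A minimal 1-D illustration with a naive O(n²) DFT (generic signal processing, not the paper's code):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real 1-D sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Inverse DFT, returning the real part (inputs here are real signals)."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)).real / n
            for k in range(n)]

def freq_domain_conv(x, h):
    """Circular convolution of x with kernel h via the frequency domain:
    every output sample depends on every input sample (global context)."""
    return idft([a * b for a, b in zip(dft(x), dft(h))])
```

Convolving with a unit impulse shifted by one sample circularly rotates the signal, which confirms the pointwise spectral product behaves as a convolution.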

23 pages, 9328 KB  
Article
High-Resolution Multiband 3D Imaging of Egyptian Papyri: Integrating Ultra-Close-Range Photogrammetry and Reflectance Transformation Imaging for Enhanced Documentation
by Marco Gargano, Gianmarco Borghi, Eleonora Verni, Francesca Gaia Maiocchi, Sonia Antoniazzi, Viviana Goggi and Emanuela Grifoni
Sensors 2026, 26(7), 2242; https://doi.org/10.3390/s26072242 - 4 Apr 2026
Abstract
Egyptian papyri are commonly documented using high-resolution two-dimensional imaging, which enhances legibility but does not adequately capture the micrometric surface morphology required for material and conservation studies. To address this limitation, we developed and validated an integrated, fully non-contact imaging workflow combining Ultra-Close-Range Multiband Photogrammetry with Reflectance Transformation Imaging (RTI) and normal map integration. The protocol was tested on six papyrus fragments from the Museo Egizio di Torino (XXI Dynasty–Byzantine period) exhibiting different conservation conditions. Multiband photogrammetry in the visible and visible-induced infrared luminescence bands achieved a Ground Sample Distance of 17 µm/px and a point cloud density of approximately 170 points/mm2, enabling detailed analysis of fiber morphology, surface deformation, and the spatial distribution of Egyptian blue. RTI-based normal map integration provided complementary high-frequency surface information with reduced acquisition and processing times. To overcome RTI low-frequency distortions, a revised normal integration strategy was implemented using surface planarization and frequency-domain fusion with photogrammetric data based on Power Spectral Density analysis. The resulting hybrid models combine metric reliability with enhanced surface detail, providing a scalable and non-invasive approach for papyrological documentation and conservation research. Full article

31 pages, 6459 KB  
Article
Cooperative Hybrid Domain Network for Salient Object Detection in Optical Remote Sensing Images
by Yi Gu, Jianhang Zhou and Lelei Yan
Remote Sens. 2026, 18(7), 1087; https://doi.org/10.3390/rs18071087 - 4 Apr 2026
Abstract
Salient Object Detection (SOD) in Optical Remote Sensing Images (ORSIs) aims to localize and segment visually prominent objects amidst complex backgrounds and extreme scale variations. However, we observe that current frequency-aware methods typically rely on a naive feature aggregation paradigm, merging frequency and spatial features via simple concatenation, addition, or direct combination. This shallow interaction overlooks the inherent semantic misalignment between the two domains, resulting in feature redundancy and poor boundary delineation. To address this limitation, we propose the Cooperative Hybrid Domain Network (CHDNet), a framework designed to facilitate synergistic cooperation between heterogeneous domains. Specifically, we propose the Cross-Domain Multi-Head Self-Attention (CD-MHSA) mechanism as a semantic bridge following the encoder. It employs a dimension expansion strategy to construct a Unified Interaction Manifold and utilizes a Frequency Anchor Interaction mechanism to achieve precise modulation of spatial textures using global spectral cues. Furthermore, to address the dual challenges of lacking explicit interpretation mechanisms for semantic co-occurrence and the susceptibility of topological structures to fracture in complex scenes during the decoding phase, we design a Multi-Branch Cooperative Decoder (MBCD) comprising three parallel paths: edge semantics, global relations, and reverse correction. This module dynamically integrates these heterogeneous clues through a Cooperative Fusion Strategy, combining explicit global dependency modeling with dual-domain reverse mining. Extensive experiments on multiple benchmark datasets demonstrate that the proposed CHDNet achieves performance superior to state-of-the-art (SOTA) methods. Full article

20 pages, 41296 KB  
Article
Frequency-Domain Feature Learning Network for Joint Image Demosaicing and Denoising
by Donghui Zhang, Feiyu Li, Jun Yang and Le Yang
Mathematics 2026, 14(7), 1175; https://doi.org/10.3390/math14071175 - 1 Apr 2026
Abstract
The methods employed for image demosaicing and denoising play a pivotal role in image acquisition and restoration, and have been extensively studied over the past few decades. Traditionally, these tasks are performed sequentially, with demosaicing followed by denoising, or vice versa, treating each process independently. While this approach can enhance image quality, it often leads to issues such as color inaccuracies and information loss, as the outcome of the first task influences the second. Consequently, the integration of joint demosaicing and denoising (JDD) has become a focal point in recent research. Deep convolutional neural networks have shown promising results in addressing JDD challenges. This study introduces an end-to-end network, termed the Frequency-domain Features learning Network (FFNet), designed to tackle the JDD problem. Unlike conventional methods that focus on spatial domain features, FFNet utilizes frequency-domain (FD) characteristics to capture both global and local image details. Based on the vision Transformer architecture, FFNet consists of two key components: a global Fourier block (GFB), which uses global attention to determine the weights of FD parameters, and an MLP-based local Fourier block (LFB), which improves local feature extraction. These blocks are integrated with a channel attention mechanism to form the frequency-domain attention block (FAB), the core element of FFNet. Extensive experimental results on benchmark datasets demonstrate that FFNet achieves superior performance in terms of both quantitative metrics (PSNR/SSIM) and visual quality compared to existing state-of-the-art JDD methods. Furthermore, we provide a comprehensive analysis of its computational efficiency, including parameter count, FLOPs, and inference time, showing a competitive trade-off between performance and complexity. Full article
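The JDD problem above starts from a color-filter-array measurement. A minimal sketch of sampling an RGB image onto an RGGB Bayer mosaic, i.e., the single-channel raw input a joint demosaicing/denoising network must invert (the RGGB layout is the common convention, assumed here rather than taken from the paper):

```python
def bayer_rggb(rgb):
    """Sample an H x W x 3 image (nested lists) onto an RGGB Bayer mosaic.

    Even rows alternate R, G; odd rows alternate G, B, giving two green
    samples per 2x2 cell. Demosaicing must recover the two missing color
    channels at every pixel from this single-channel frame.
    """
    h, w = len(rgb), len(rgb[0])
    raw = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if i % 2 == 0 and j % 2 == 0:
                ch = 0  # red site
            elif i % 2 == 1 and j % 2 == 1:
                ch = 2  # blue site
            else:
                ch = 1  # green site
            raw[i][j] = rgb[i][j][ch]
    return raw
```

Sequential pipelines denoise this raw frame before or after demosaicing; joint methods learn both mappings at once, which is the design choice the abstract motivates.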

17 pages, 19835 KB  
Article
Evaluating Curvature-Induced Variation in Deep Learning-Based Beamforming for Flexible Transducers in Ultrasound-Guided Radiation Therapy
by Ziwei Feng, Xinyue Huang, Hamed Hooshangnejad, Debarghya China, Junghoon Lee, Todd McNutt, Muyinatu A. Lediju Bell and Kai Ding
Bioengineering 2026, 13(4), 398; https://doi.org/10.3390/bioengineering13040398 - 29 Mar 2026
Abstract
Ultrasound imaging is a crucial tool for guiding radiation therapy, particularly for cancers such as pancreatic cancer, where tumors exhibit respiration-induced motion. While flexible ultrasound transducers offer improved anatomical conformity and reduced compression-induced distortion compared to rigid probes, their variable geometry presents significant challenges for conventional beamforming. In this study, we investigate a deep learning-based beamforming framework that directly predicts delayed RF data from raw RF input, bypassing explicit transducer shape estimation and traditional delay-and-sum computations. Building upon an artificial curvature simulation strategy, we systematically analyze the impact of curvature-induced variation and inherent RF noise on model performance and generalizability. We further introduce frequency-domain analysis to quantify RF-level signal variation that may not be apparent in spatial-domain image comparisons. Our results demonstrate that although noise-augmented training improves prediction consistency, reconstruction performance remains limited under the current prototype noise conditions. These findings highlight the importance of RF data diversity and noise characterization in developing clinically robust deep learning beamformers for flexible transducer-based ultrasound-guided radiation therapy. Full article
(This article belongs to the Special Issue Novel Imaging Techniques in Radiotherapy)
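The delay-and-sum computation that the learned beamformer bypasses is simple to state: shift each channel's RF trace by its geometric delay, then sum across channels. A toy integer-delay sketch (real DAS uses fractional delays derived from the speed of sound and the element geometry, which is exactly what an unknown flexible-transducer shape makes hard):

```python
def delay_and_sum(rf, delays):
    """Sum RF channels after shifting each by its per-channel delay.

    rf: list of channels, each a list of samples.
    delays: integer sample offsets to advance each channel by.
    Out-of-range samples are treated as zero.
    """
    n = len(rf[0])
    out = [0.0] * n
    for channel, d in zip(rf, delays):
        for t in range(n):
            src = t + d
            if 0 <= src < n:
                out[t] += channel[src]
    return out
```

When the delays match the true arrival-time differences, echoes from the focal point align and add coherently, so the summed peak grows with the channel count while misaligned clutter does not.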

30 pages, 11698 KB  
Article
RShDet: An Adaptive Spectral-Aware Network for Remote Sensing Object Detection Under Haze Corruption
by Wei Zhang, Yuantao Wang, Haowei Yang and Xuerui Mao
Remote Sens. 2026, 18(7), 1020; https://doi.org/10.3390/rs18071020 - 29 Mar 2026
Abstract
Remote sensing (RS) object detection faces intrinsic challenges arising from the overhead imaging paradigm and the diversity of climatic conditions. In particular, atmospheric phenomena such as clouds and haze cause severe visual degradation, making reliable object detection difficult. However, most existing detectors are developed under clear-weather conditions, which limits their generalization capability in realistic haze-degraded RS scenarios. To alleviate this issue, an adaptive spectral-aware network for RS object detection under haze interference is proposed, termed RShDet, which is designed to handle both high-altitude RS imagery and low-altitude Unmanned Aerial Vehicle (UAV) scenarios. Firstly, the Object-Centered Dynamic Enhancement (OCDE) module dynamically adjusts the spatial positions of key-value pairs through query-agnostic offsets, enabling the network to emphasize object-relevant regions while suppressing haze-induced background interference. Secondly, the Dynamic Multi-Spectral Perception and Filtering (DSPF) module introduces a multi-spectral attention mechanism that adaptively selects informative frequency components, thereby enhancing discriminative feature representations in hazy environments. Thirdly, the Frequency-Domain Multi-Feature Fusion (FDMF) module employs learnable weights to complementarily integrate amplitude and phase information in the frequency domain, enabling effective cross-task feature interaction between the enhancement and detection branches. Extensive experiments demonstrate that RShDet consistently achieves superior detection performance under hazy conditions across both synthetic and real-world benchmarks. Specifically, it achieves improvements of 2.4% mAP50 on Hazy-DOTA, 1.9% mAP on HazyDet, and 2.33% mAP on the real-world foggy dataset RTTS, surpassing existing state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Remote Sensing Image Target Detection and Recognition)
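The FDMF module's idea of treating amplitude and phase as separately weightable quantities can be illustrated with a naive 1-D DFT: blend the amplitudes of two spectra and keep the phase of one. The fixed blend weight and phase choice below are illustrative assumptions, not the module's learned behavior:

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real 1-D sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Inverse DFT, returning the real part (inputs here are real signals)."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)).real / n
            for k in range(n)]

def fuse_amp_phase(x, y, w=0.5):
    """Blend the spectral amplitudes of x and y with weight w; keep x's phase.

    Amplitude carries energy distribution across frequencies, while phase
    carries structure/location, which is why fusing them separately is useful.
    """
    X, Y = dft(x), dft(y)
    fused = [(w * abs(a) + (1.0 - w) * abs(b)) * cmath.exp(1j * cmath.phase(a))
             for a, b in zip(X, Y)]
    return idft(fused)
```

A sanity check: fusing a signal with itself must return the signal unchanged for any weight, since both the blended amplitude and the retained phase are its own.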

24 pages, 1020 KB  
Article
Research on the Diagnosis of Abnormal Sound Defects in Automobile Engines Based on Fusion of Multi-Modal Images and Audio
by Yi Xu, Wenbo Chen and Xuedong Jing
Electronics 2026, 15(7), 1406; https://doi.org/10.3390/electronics15071406 - 27 Mar 2026
Abstract
Against the global carbon neutrality target, predictive maintenance (PdM) of automotive engines represents a core technical strategy to advance the sustainable development of the automotive industry. Conventional single-modal diagnostic approaches for engine abnormal sound defects suffer from low accuracy and weak anti-interference capability. Existing multi-modal fusion methods fail to deeply mine the physical coupling between cross-modal features and often entail excessive model complexity, hindering deployment on resource-constrained on-board edge devices. To resolve these limitations, this study proposes a Physical Prior-Embedded Cross-Modal Attention (PPE-CMA) mechanism for lightweight multi-modal fusion diagnosis of engine abnormal sound defects. First, wavelet packet decomposition (WPD) and mel-frequency cepstral coefficients (MFCC) are integrated to extract time-frequency features from engine audio signals, while a channel-pruned ResNet18 is employed to extract spatial features from engine thermal imaging and vibration visualization images. Second, the PPE-CMA module is designed to adaptively assign attention weights to audio and image features by exploiting the physical coupling between engine fault acoustic and visual characteristics, enabling efficient cross-modal feature fusion with redundant information suppression. A rigorous theoretical derivation is provided to link cosine similarity with the physical correlation of engine fault acoustic-visual features, justifying the attention weight constraint (β = 1 − α) from the perspective of fault feature physical coupling. Third, an improved lightweight XGBoost classifier is constructed for fault classification, and a hybrid data augmentation strategy customized for engine multi-modal data is proposed to address the small-sample challenge in industrial applications. 
Ablation experiments on ResNet18 pruning ratios verify the optimal trade-off between diagnostic performance and computational efficiency, while feature distribution analysis validates the authenticity and effectiveness of the hybrid augmentation strategy. Experimental results on a self-constructed multi-modal dataset show that the proposed method achieves 98.7% diagnostic accuracy and a 98.2% F1-score, retaining 96.5% accuracy under 90 dB high-level environmental noise, with an end-to-end inference speed of 0.8 ms per sample (including preprocessing, feature extraction, and classification). Cross-engine and cross-domain validation on a 2.0T diesel engine small-sample dataset and the open-source SEMFault-2024 dataset yield average accuracies of 94.8% and 95.2%, respectively, demonstrating strong generalization. This method effectively enhances the accuracy and robustness of engine abnormal sound defect diagnosis, offering a lightweight technical solution for on-board real-time fault diagnosis and in-plant online quality inspection. By reducing engine fault-induced energy loss and spare parts waste, it further promotes energy conservation and emission reduction in the automotive industry. Quantified experimental data on fuel efficiency improvement and carbon emission reduction are provided to substantiate the ecological benefits of the proposed framework. Full article
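The abstract above derives the attention-weight constraint β = 1 − α from the cosine similarity of the audio and image features. A minimal sketch of that complementary-weight fusion is given below; the mapping from similarity to α, and the function names, are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ppe_cma_fuse(audio_feat, image_feat):
    """Sketch of complementary-weight cross-modal fusion: alpha weights the
    audio modality and beta = 1 - alpha weights the image modality, with
    alpha driven by the cosine similarity of the two feature vectors
    (mapped from [-1, 1] into [0, 1]). This mapping is an assumption for
    illustration; the paper derives its weights differently in detail."""
    sim = cosine_similarity(audio_feat, image_feat)
    alpha = 0.5 * (sim + 1.0)   # map similarity into a [0, 1] weight
    beta = 1.0 - alpha          # complementary constraint: beta = 1 - alpha
    fused = alpha * audio_feat + beta * image_feat
    return fused, alpha, beta
```

With orthogonal (uncorrelated) features the sketch assigns equal weight to both modalities, while strongly correlated features shift weight toward the audio branch, mirroring the idea that the constraint ties attention to the physical coupling of the two modalities.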
23 pages, 131728 KB  
Article
Hyperspectral Image Reconstruction Based on State Space Models
by Xuguang Wang, Haozhe Zhou, Tongxin Wei and Yanchao Zhang
Remote Sens. 2026, 18(7), 990; https://doi.org/10.3390/rs18070990 - 25 Mar 2026
Viewed by 464
Abstract
To address the high hardware costs associated with hyperspectral imaging in precision agriculture, spectral reconstruction (SR) is emerging as a feasible solution for obtaining hyperspectral images. However, existing methods, mainly based on CNNs and Transformers, face a notable dilemma: convolutional neural networks (CNNs) are limited by their local receptive fields, while Transformers encounter the problem of quadratic computational complexity. Effectively balancing computational efficiency with the capture of long-range spatial dependencies remains a significant challenge. To this end, this study proposes FGA-Mamba (Frequency-Gradient Attention Mamba), a novel reconstruction network based on the Mamba architecture. This network introduces a Frequency-Visual State Space (F-VSS) module, which combines the linear long-range modeling capability of state space models (SSMs) with a frequency-domain self-calibration mechanism to enhance global structural consistency by explicitly modulating frequency features. In addition, we designed an Enhanced Gradient Attention Module (EGAM). This module optimizes local feature representation through a gradient-aware mechanism, effectively compensating for the loss of spatial details. Experimental results on three datasets show that FGA-Mamba achieves significant improvements in both quantitative and qualitative metrics. Moreover, the high consistency observed in vegetation index (VI) calculations confirms its potential for practical agricultural application. Full article
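The F-VSS module described above modulates features in the frequency domain to enhance global structural consistency. A minimal sketch of one such frequency-domain modulation is shown below, assuming a simple radial split into low- and high-frequency bands with separate gains; the band cutoff, the gain values, and the function name are all illustrative assumptions, not the paper's actual calibration mechanism.

```python
import numpy as np

def frequency_self_calibration(feat, gain_low=1.0, gain_high=0.5, cutoff=0.25):
    """Sketch of frequency-domain feature modulation: transform a 2-D
    feature map to the frequency domain, scale the low- and high-frequency
    bands with separate gains, and transform back. The radial cutoff and
    gains are illustrative; a learned calibration would predict them."""
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.ogrid[:h, :w]
    # normalized radial distance of each frequency bin from the spectrum centre
    r = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
    mask = np.where(r <= cutoff, gain_low, gain_high)
    out = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    return out
```

Setting both gains to 1.0 reduces the operation to an identity round-trip through the FFT, which is a convenient sanity check; attenuating the high-frequency band instead smooths the map while preserving its global structure.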
(This article belongs to the Special Issue AI-Driven Remote Sensing Image Restoration and Generation)