Search Results (20)

Search Parameters:
Keywords = temporal multi-scale fine-grained fusion

27 pages, 6223 KB  
Article
MSMCD: A Multi-Stage Mamba Network for Geohazard Change Detection
by Liwei Qin, Quan Zou, Guoqing Li, Wenyang Yu, Lei Wang, Lichuan Chen and Heng Zhang
Remote Sens. 2026, 18(1), 108; https://doi.org/10.3390/rs18010108 - 28 Dec 2025
Viewed by 281
Abstract
Change detection plays a crucial role in geological disaster tasks such as landslide identification, post-earthquake building reconstruction assessment, and unstable rock mass monitoring. However, real-world scenarios often pose significant challenges, including complex surface backgrounds, illumination and seasonal variations between temporal phases, and diverse change patterns. To address these issues, this paper proposes a multi-stage model for geological disaster change detection, termed MSMCD, which integrates strategies of global dependency modeling, local difference enhancement, edge constraint, and frequency-domain fusion to achieve precise perception and delineation of change regions. Specifically, the model first employs a DualTimeMamba (DTM) module for two-dimensional selective scanning state-space modeling, explicitly capturing cross-temporal long-range dependencies to learn robust shared representations. Subsequently, a Multi-Scale Perception (MSP) module highlights fine-grained differences to enhance local discrimination. The Edge–Change Interaction (ECI) module then constructs bidirectional coupling between the change and edge branches with edge supervision, improving boundary accuracy and geometric consistency. Finally, the Frequency-domain Change Fusion (FCF) module performs weighted modulation on multi-layer, channel-joint spectra, balancing low-frequency structural consistency with high-frequency detail fidelity. Experiments conducted on the landslide change detection dataset (GVLM-CD), post-earthquake building change detection dataset (WHU-CD), and a self-constructed unstable rock mass change detection dataset (TGRM-CD) demonstrate that MSMCD achieves state-of-the-art performance across all benchmarks. These results confirm its strong cross-scenario generalization ability and effectiveness in multiple geological disaster tasks. Full article
(This article belongs to the Special Issue Efficient Object Detection Based on Remote Sensing Images)
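The frequency-domain fusion idea in this abstract can be illustrated with a minimal sketch: modulate the joint spectrum of two temporal feature maps with learnable per-channel weights, then invert back to the spatial domain. This is an assumption-level reading of the FCF module, not the authors' implementation; the class and parameter names are hypothetical.

```python
# Minimal sketch of frequency-domain change fusion in the spirit of MSMCD's FCF
# module (assumption: the paper's exact design differs; names are hypothetical).
import torch
import torch.nn as nn

class FreqChangeFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Learnable per-channel weights applied to the joint spectrum.
        self.weight = nn.Parameter(torch.ones(channels, 1, 1))

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Joint spectrum of the two temporal features: complex (B, C, H, W//2+1).
        spec = torch.fft.rfft2(feat_a + feat_b, norm="ortho")
        # Weighted modulation balances low-frequency structure and high-frequency detail.
        spec = spec * self.weight
        return torch.fft.irfft2(spec, s=feat_a.shape[-2:], norm="ortho")

fused = FreqChangeFusion(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```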

25 pages, 25629 KB  
Article
DSEPGAN: A Dual-Stream Enhanced Pyramid Based on Generative Adversarial Network for Spatiotemporal Image Fusion
by Dandan Zhou, Lina Xu, Ke Wu, Huize Liu and Mengting Jiang
Remote Sens. 2025, 17(24), 4050; https://doi.org/10.3390/rs17244050 - 17 Dec 2025
Viewed by 238
Abstract
Many deep learning-based spatiotemporal fusion (STF) methods have been proven to achieve high accuracy and robustness. Due to the variable shapes and sizes of objects in remote sensing images, pyramid networks are generally introduced to extract multi-scale features. However, the down-sampling operation in the pyramid structure may lead to the loss of image detail information, affecting the model’s ability to reconstruct fine-grained targets. To address this issue, we propose a novel Dual-Stream Enhanced Pyramid based on Generative Adversarial Network (DSEPGAN) for the spatiotemporal fusion of remote sensing images. The network adopts a dual-stream architecture to separately process coarse and fine images, tailoring feature extraction to their respective characteristics: coarse images provide temporal dynamics, while fine images contain rich spatial details. A reversible feature transformation is embedded in the pyramid feature extraction stage to preserve high-frequency information, and a fusion module employing large-kernel and depthwise separable convolutions captures long-range dependencies across inputs. To further enhance realism and detail fidelity, adversarial training encourages the network to generate sharper and more visually convincing fusion results. The proposed DSEPGAN is compared with widely used and state-of-the-art STF models in three publicly available datasets. The results illustrate that DSEPGAN achieves superior performance across various evaluation metrics, highlighting its notable advantages for predicting seasonal variations in highly heterogeneous regions and abrupt changes in land use. Full article
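The fusion module described here (large-kernel plus depthwise separable convolutions for long-range dependencies) admits a short generic sketch. Layer sizes and names are assumptions, not the authors' implementation.

```python
# Sketch of a large-kernel, depthwise-separable fusion block of the kind the
# DSEPGAN abstract describes (sizes and names are illustrative assumptions).
import torch
import torch.nn as nn

class LargeKernelFusion(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 13):
        super().__init__()
        # Depthwise large-kernel conv: long-range context at low parameter cost.
        self.depthwise = nn.Conv2d(2 * channels, 2 * channels, kernel_size,
                                   padding=kernel_size // 2, groups=2 * channels)
        # Pointwise conv mixes the coarse- and fine-stream channels.
        self.pointwise = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        x = torch.cat([coarse, fine], dim=1)   # concatenate the two streams
        return self.pointwise(self.depthwise(x))

out = LargeKernelFusion(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```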

36 pages, 22245 KB  
Article
CMSNet: A SAM-Enhanced CNN–Mamba Framework for Damaged Building Change Detection in Remote Sensing Imagery
by Jianli Zhang, Liwei Tao, Wenbo Wei, Pengfei Ma and Mengdi Shi
Remote Sens. 2025, 17(23), 3913; https://doi.org/10.3390/rs17233913 - 3 Dec 2025
Viewed by 735
Abstract
In war and explosion scenarios, buildings often suffer varying degrees of damage characterized by complex, irregular, and fragmented spatial patterns, posing significant challenges for remote sensing–based change detection. Additionally, the scarcity of high-quality datasets limits the development and generalization of deep learning approaches. To overcome these issues, we propose CMSNet, an end-to-end framework that integrates the structural priors of the Segment Anything Model (SAM) with the efficient temporal modeling and fine-grained representation capabilities of CNN–Mamba. Specifically, CMSNet adopts CNN–Mamba as the backbone to extract multi-scale semantic features from bi-temporal images, while SAM-derived visual priors guide the network to focus on building boundaries and structural variations. A Pre-trained Visual Prior-Guided Feature Fusion Module (PVPF-FM) is introduced to align and fuse these priors with change features, enhancing robustness against local damage, non-rigid deformations, and complex background interference. Furthermore, we construct a new RWSBD (Real-world War Scene Building Damage) dataset based on Gaza war scenes, comprising 42,732 annotated building damage instances across diverse scales, offering a strong benchmark for real-world scenarios. Extensive experiments on RWSBD and three public datasets (CWBD, WHU-CD, and LEVIR-CD+) demonstrate that CMSNet consistently outperforms eight state-of-the-art methods in both quantitative metrics (F1, IoU, Precision, Recall) and qualitative evaluations, especially in fine-grained boundary preservation, small-scale change detection, and complex scene adaptability. Overall, this work introduces a novel detection framework that combines foundation model priors with efficient change modeling, along with a new large-scale war damage dataset, contributing valuable advances to both research and practical applications in remote sensing change detection. Additionally, the strong generalization ability and efficient architecture of CMSNet highlight its potential for scalable deployment and practical use in large-area post-disaster assessment. Full article
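One plausible way to realize the prior-guided fusion the abstract attributes to PVPF-FM is cross-attention in which change features query SAM-derived priors. The formulation below is a hedged sketch under that assumption; the paper's module may differ, and all names are hypothetical.

```python
# Hedged sketch of prior-guided feature fusion: change features attend over
# SAM-derived structural priors (an assumed reading of PVPF-FM, not the paper's code).
import torch
import torch.nn as nn

class PriorGuidedFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, change_feat: torch.Tensor, sam_prior: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, C, H, W) -> token sequences (B, H*W, C).
        b, c, h, w = change_feat.shape
        q = change_feat.flatten(2).transpose(1, 2)
        kv = sam_prior.flatten(2).transpose(1, 2)
        fused, _ = self.attn(q, kv, kv)        # change features query the priors
        fused = self.norm(fused + q)           # residual keeps the change signal
        return fused.transpose(1, 2).reshape(b, c, h, w)

out = PriorGuidedFusion(64)(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
```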

25 pages, 5023 KB  
Article
Multi-State Recognition of Electro-Hydraulic Servo Fatigue Testers via Spatiotemporal Fusion and Bidirectional Cross-Attention
by Guotai Huang, Shuang Bai, Xiuguang Yang, Xiyu Gao and Peng Liu
Sensors 2025, 25(23), 7229; https://doi.org/10.3390/s25237229 - 26 Nov 2025
Viewed by 572
Abstract
Electro-hydraulic servo fatigue testing machines are susceptible to concurrent degradation and failure of multiple components during high-frequency, high-load, and long-duration cyclic operations, posing significant challenges for online health monitoring. To address this, this paper proposes a multi-state recognition method based on spatiotemporal feature fusion and bidirectional cross-attention. The method employs a Bidirectional Temporal Convolutional Network (BiTCN) to extract multi-scale local features, a Bidirectional Gated Recurrent Unit (BiGRU) to capture forward and backward temporal dependencies, and Bidirectional Cross-Attention (BiCrossAttention) to achieve fine-grained bidirectional interaction and fusion of spatial and temporal features. During training, GradNorm is introduced to dynamically balance task weights and mitigate gradient conflicts. Experimental validation was conducted using a real-world multi-sensor dataset collected from an SDZ0100 electro-hydraulic servo fatigue testing machine. The results show that on the validation set, the cooler and servo valve achieved both accuracy and F1-scores of 100%, the motor-pump unit achieved an accuracy of 98.32% and an F1-score of 97.72%, and the servo actuator achieved an accuracy of 96.39% and an F1-score of 95.83%. Compared to single-task models with the same backbone, multi-task learning improved performance by approximately 3% to 4% for the hydraulic pump and servo actuator tasks while reducing deployment parameters by 75%. Ablation studies further confirmed the critical contributions of the bidirectional structure and individual components, as well as the effectiveness of GradNorm in multi-task learning for testing machines, achieving an average F1-score of 98.38%. The method also demonstrated strong robustness under varying learning rates and resampling conditions. Compared to various deep learning and fusion baseline methods, the proposed approach achieved optimal performance in most tasks. This study provides an effective technical solution for high-precision, lightweight, and robust online health monitoring of electro-hydraulic servo fatigue testing machines under complex operating conditions. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
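The bidirectional cross-attention the abstract names can be sketched with two standard attention layers, each stream querying the other. This is a minimal illustration of the mechanism; the paper's exact fusion and dimensions are assumptions here.

```python
# Minimal sketch of bidirectional cross-attention between a BiTCN (spatial) and
# a BiGRU (temporal) feature stream, as the abstract describes; names and shapes
# are illustrative assumptions.
import torch
import torch.nn as nn

class BiCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.s2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2s = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        # spatial, temporal: (B, T, dim) sequences from the two branches.
        s_enh, _ = self.s2t(spatial, temporal, temporal)  # spatial queries temporal
        t_enh, _ = self.t2s(temporal, spatial, spatial)   # temporal queries spatial
        return torch.cat([s_enh, t_enh], dim=-1)          # fused (B, T, 2*dim)

fused = BiCrossAttention(64)(torch.randn(8, 100, 64), torch.randn(8, 100, 64))
```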

19 pages, 5265 KB  
Article
A Real-Time Photovoltaic Power Estimation Framework Based on Multi-Scale Spatio-Temporal Graph Fusion
by Gaofei Yang, Jiale Xiao, Chaoyang Zhang, Debang Yang and Changyun Li
Electronics 2025, 14(22), 4492; https://doi.org/10.3390/electronics14224492 - 18 Nov 2025
Viewed by 498
Abstract
Accurate forecasting of photovoltaic (PV) power is crucial for real-time grid balancing and storage optimization. However, the intermittent, noisy, and nonstationary nature of PV generation, together with cross-site interactions, makes multi-site intra-hour forecasting challenging. In this paper, we propose a unified approach for multi-site PV power forecasting named WGL (Wavelet–Graph Learning). Unlike prior studies that treat denoising and spatio-temporal modeling separately or predict each station independently, WGL forecasts all PV stations jointly while explicitly capturing their inherent spatio-temporal correlations. Within WGL, Learnable Wavelet Shrinkage (LWS) performs end-to-end noise suppression; a Temporal Multi-Scale Fine-grained Fusion (T-MSFF) module extracts complementary temporal patterns; and an attention fusion gate adaptively balances TCN and LSTM branches. For spatial coupling, graph self-attention (GSA) learns a sparse undirected graph among stations, and a Factorized Spatio-Temporal Attention (FSTA) efficiently models long-range interactions. Experiments on real-world multi-site PV datasets show that WGL consistently outperforms representative deep and graph-based baselines across intra-hour horizons, highlighting its effectiveness and deployment potential. Furthermore, a comprehensive analysis of influencing factors for scheme implementation—encompassing safety, reliability, economic rationality, management scientificity, and humanistic care—is conducted, providing a holistic assessment of the framework’s feasibility and potential impact in real-world power systems. Full article
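Learnable wavelet shrinkage, as named in this abstract, can be sketched as soft-thresholding the detail coefficients of a Haar transform with a threshold learned end to end. The single-level, fixed-wavelet version below is a simplification under stated assumptions, not the paper's LWS implementation.

```python
# Sketch of learnable wavelet shrinkage: one-level Haar analysis, learnable
# soft threshold on the detail band, Haar synthesis (a simplification of LWS).
import torch
import torch.nn as nn

class LearnableWaveletShrinkage(nn.Module):
    def __init__(self):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(0.1))  # learned end to end

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T) with even T. One-level Haar analysis.
        approx = (x[:, ::2] + x[:, 1::2]) / 2
        detail = (x[:, ::2] - x[:, 1::2]) / 2
        # Soft shrinkage: small (noisy) detail coefficients are zeroed.
        t = torch.abs(self.threshold)
        detail = torch.sign(detail) * torch.clamp(detail.abs() - t, min=0)
        # Haar synthesis back to the original length.
        out = torch.stack([approx + detail, approx - detail], dim=-1)
        return out.reshape(x.shape)

clean = LearnableWaveletShrinkage()(torch.randn(4, 64))
```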

24 pages, 3910 KB  
Article
SynerCD: Synergistic Tri-Branch and Vision-Language Coupling for Remote Sensing Change Detection
by Yumei Tong, Panpan Zheng, Wenbin Tang, Shuli Cheng and Liejun Wang
Remote Sens. 2025, 17(22), 3694; https://doi.org/10.3390/rs17223694 - 12 Nov 2025
Viewed by 641
Abstract
Remote sensing change detection (RSCD) faces persistent challenges in high-resolution imagery due to complex spatial structures, temporal heterogeneity, and semantic ambiguity. While deep learning methods have significantly advanced the field, most existing models still rely on static and homogeneous processing, treating all channels and modalities equally, which limits their capacity to capture fine-grained semantic shifts or adapt to region-dependent variations. To address these issues, we propose SynerCD, a unified Siamese encoder–decoder framework that introduces dynamic, content-adaptive perception through channel decoupling, frequency-domain enhancement, and vision-language collaboration. The encoder employs a Tri-branch Synergistic Coupling (TSC) module that dynamically rebalances channel responses and captures multi-scale spatial-frequency dependencies via Mamba-based long-sequence modeling and wavelet decomposition. The decoder integrates a vision-aware language-guided attention (VAL-Att) module, which adaptively modulates visual-textual fusion using CLIP-based semantic prompts to guide attention toward meaningful change regions. Extensive experiments on four benchmark datasets verify that SynerCD achieves superior localization accuracy and semantic robustness, establishing a dynamic and adaptive paradigm for multimodal change detection. Full article
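One simplified reading of language-guided modulation like VAL-Att is a gating layer in which a CLIP text embedding of a change prompt re-weights visual channels. The FiLM-style sketch below is an assumption; the paper's attention design is richer, and all names are hypothetical.

```python
# Hedged sketch: a CLIP text embedding gates visual channels toward the prompt
# (a FiLM-style simplification of language-guided attention, not VAL-Att itself).
import torch
import torch.nn as nn

class LanguageGuidedGate(nn.Module):
    def __init__(self, text_dim: int, channels: int):
        super().__init__()
        self.to_gate = nn.Sequential(nn.Linear(text_dim, channels), nn.Sigmoid())

    def forward(self, visual: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # visual: (B, C, H, W); text_emb: (B, text_dim) from a CLIP text encoder.
        gate = self.to_gate(text_emb)[:, :, None, None]  # (B, C, 1, 1)
        return visual * gate  # emphasize channels relevant to the prompt

out = LanguageGuidedGate(512, 64)(torch.randn(2, 64, 32, 32), torch.randn(2, 512))
```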

26 pages, 3558 KB  
Article
Avocado: An Interpretable Fine-Grained Intrusion Detection Model for Advanced Industrial Control Network Attacks
by Xin Liu, Tao Liu and Ning Hu
Electronics 2025, 14(21), 4233; https://doi.org/10.3390/electronics14214233 - 29 Oct 2025
Viewed by 503
Abstract
Industrial control systems (ICS), as critical infrastructure supporting national operations, are increasingly threatened by sophisticated stealthy network attacks. These attacks often break malicious behaviors into multiple highly camouflaged packets, which are embedded into large-scale background traffic with low frequency, making them semantically and temporally indistinguishable from normal traffic and thus evading traditional detection. Existing methods largely rely on flow-level statistics or long-sequence modeling, resulting in coarse detection granularity, high latency, and poor byte-level interpretability, falling short of industrial demands for real-time and actionable detection. To address these challenges, we propose Avocado, a fine-grained, multi-level intrusion detection model. Avocado’s core innovation lies in contextual flow-feature fusion: it models each packet jointly with its surrounding packet sequence, enabling independent abnormality detection and precise localization. Moreover, a shared-query multi-head self-attention mechanism is designed to quantify byte-level importance within packets. Experimental results show that Avocado significantly outperforms state-of-the-art flow-level methods on NGAS and CLIA-M221 datasets, improving packet-level detection ACC by 1.55% on average, and reducing FPR and FNR to 3.2%, 3.6% (NGAS), and 3.7%, 4.3% (CLIA-M221), respectively, demonstrating its superior performance in both detection and interpretability. Full article
(This article belongs to the Special Issue Novel Approaches for Deep Learning in Cybersecurity)
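The shared-query attention idea in this abstract, in which a single learnable query scores byte-level importance within a packet, can be sketched directly: the attention weights over byte embeddings double as an interpretability signal. Dimensions and names below are assumptions.

```python
# Minimal sketch of a shared learnable query scoring per-byte importance inside
# a packet, as the Avocado abstract suggests (dimensions are assumptions).
import torch
import torch.nn as nn

class SharedQueryImportance(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # shared across packets
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, byte_emb: torch.Tensor):
        # byte_emb: (B, L, dim) embeddings of the L bytes of a packet.
        q = self.query.expand(byte_emb.size(0), -1, -1)
        summary, weights = self.attn(q, byte_emb, byte_emb)
        # weights: (B, 1, L) -- per-byte importance usable for interpretation.
        return summary.squeeze(1), weights.squeeze(1)

vec, importance = SharedQueryImportance(64)(torch.randn(2, 128, 64))
```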

30 pages, 11870 KB  
Article
Early Mapping of Farmland and Crop Planting Structures Using Multi-Temporal UAV Remote Sensing
by Lu Wang, Yuan Qi, Juan Zhang, Rui Yang, Hongwei Wang, Jinlong Zhang and Chao Ma
Agriculture 2025, 15(21), 2186; https://doi.org/10.3390/agriculture15212186 - 22 Oct 2025
Viewed by 965
Abstract
Fine-grained identification of crop planting structures provides key data for precision agriculture, thereby supporting scientific production and evidence-based policy making. This study selected a representative experimental farmland in Qingyang, Gansu Province, and acquired Unmanned Aerial Vehicle (UAV) multi-temporal data (six epochs) from multiple sensors (multispectral [visible–NIR], thermal infrared, and LiDAR). By fusing 59 feature indices, we achieved high-accuracy extraction of cropland and planting structures and identified the key feature combinations that discriminate among crops. The results show that (1) multi-source UAV data from April + June can effectively delineate cropland and enable accurate plot segmentation; (2) July is the optimal time window for fine-scale extraction of all planting-structure types in the area (legumes, millet, maize, buckwheat, wheat, sorghum, maize–legume intercropping, and vegetables), with a cumulative importance of 72.26% for the top ten features, while the April + June combination retains most of the separability (67.36%), enabling earlier but slightly less precise mapping; and (3) under July imagery, the SAM (Segment Anything Model) segmentation + RF (Random Forest) classification approach—using the RF-selected top 10 of the 59 features—achieved an overall accuracy of 92.66% with a Kappa of 0.9163, representing a 7.57% improvement over the contemporaneous SAM + CNN (Convolutional Neural Network) method. This work establishes a basis for UAV-based recognition of typical crops in the Qingyang sector of the Loess Plateau and, by deriving optimal recognition timelines and feature combinations from multi-epoch data, offers useful guidance for satellite-based mapping of planting structures across the Loess Plateau following multi-scale data fusion. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
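The RF-based feature selection step the abstract describes (ranking the 59 fused indices and keeping the top 10) follows the standard random-forest importance recipe. The sketch below uses placeholder data, not the study's actual features.

```python
# Sketch of the RF feature-ranking step: train a random forest on the 59 fused
# feature indices and keep the top 10 by importance (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(1000, 59)          # placeholder: 59 feature indices per plot
y = np.random.randint(0, 8, 1000)     # placeholder: 8 planting-structure classes

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
top10 = np.argsort(rf.feature_importances_)[::-1][:10]
print("top-10 feature indices:", top10)
print("cumulative importance:", rf.feature_importances_[top10].sum())
```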

18 pages, 2459 KB  
Article
FFMamba: Feature Fusion State Space Model Based on Sound Event Localization and Detection
by Yibo Li, Dongyuan Ge, Jieke Xu and Xifan Yao
Electronics 2025, 14(19), 3874; https://doi.org/10.3390/electronics14193874 - 29 Sep 2025
Viewed by 677
Abstract
Previous studies on Sound Event Localization and Detection (SELD) have primarily focused on CNN- and Transformer-based designs. While CNNs possess local receptive fields, making it difficult to capture global dependencies over long sequences, Transformers excel at modeling long-range dependencies but have limited sensitivity to local time–frequency features. Recently, the VMamba architecture, built upon the Visual State Space (VSS) model, has shown great promise in handling long sequences, yet it remains limited in modeling local spatial details. To address this issue, we propose a novel state space model with an attention-enhanced feature fusion mechanism, termed FFMamba, which balances both local spatial modeling and long-range dependency capture. At a fine-grained level, we design two key modules: the Multi-Scale Fusion Visual State Space (MSFVSS) module and the Wavelet Transform-Enhanced Downsampling (WTED) module. Specifically, the MSFVSS module integrates a Multi-Scale Fusion (MSF) component into the VSS framework, enhancing its ability to capture both long-range temporal dependencies and detailed local spatial information. Meanwhile, the WTED module employs a dual-branch design to fuse spatial and frequency domain features, improving the richness of feature representations. Comparative experiments were conducted on the DCASE2021 Task 3 and DCASE2022 Task 3 datasets. The results demonstrate that the proposed FFMamba model outperforms recent approaches in capturing long-range temporal dependencies and effectively integrating multi-scale audio features. In addition, ablation studies confirmed the effectiveness of the MSFVSS and WTED modules. Full article
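Wavelet-enhanced downsampling of the kind the WTED module is said to perform can be sketched as a dual branch: a strided convolution plus a 2x2 Haar decomposition whose four subbands carry the high-frequency detail that plain striding discards. This is a simplification under stated assumptions, not the authors' module.

```python
# Hedged sketch of dual-branch wavelet downsampling in the spirit of WTED:
# strided conv in one branch, 2x2 Haar subbands in the other (names assumed).
import torch
import torch.nn as nn

class WaveletDownsample(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.mix = nn.Conv2d(4 * channels, channels, 1)  # fuse the 4 Haar subbands

    @staticmethod
    def haar(x: torch.Tensor) -> torch.Tensor:
        # 2x2 Haar decomposition into LL/LH/HL/HH subbands at half resolution.
        a, b = x[..., ::2, ::2], x[..., ::2, 1::2]
        c, d = x[..., 1::2, ::2], x[..., 1::2, 1::2]
        ll, lh = (a + b + c + d) / 4, (a - b + c - d) / 4
        hl, hh = (a + b - c - d) / 4, (a - b - c + d) / 4
        return torch.cat([ll, lh, hl, hh], dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(x) + self.mix(self.haar(x))  # fuse both branches

out = WaveletDownsample(32)(torch.randn(1, 32, 64, 64))
```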

24 pages, 2338 KB  
Article
DynaNet: A Dynamic Feature Extraction and Multi-Path Attention Fusion Network for Change Detection
by Xue Li, Dong Li, Jiandong Fang and Xueying Feng
Sensors 2025, 25(18), 5832; https://doi.org/10.3390/s25185832 - 18 Sep 2025
Viewed by 821
Abstract
Existing change detection methods often struggle with both inadequate feature fusion and interference from background noise when processing bi-temporal remote sensing imagery. These challenges are particularly pronounced in building change detection, where capturing subtle spatial and semantic dependencies is critical. To address these issues, we propose DynaNet, a dynamic feature extraction and multi-path attention fusion network for change detection. Specifically, we design a Dynamic Feature Extractor (DFE) that leverages a cross-temporal gating mechanism to amplify relevant change signals while suppressing irrelevant variations, enabling high-quality feature alignment. A Contextual Attention Module (CAM) is then employed to incorporate global contextual information, further enhancing the discriminative capability of change regions. Additionally, a Multi-Branch Attention Fusion Module (MBAFM) is introduced to model inter-scale semantic relationships through self- and cross-attention mechanisms, thereby improving the detection of fine-grained structural changes. To facilitate robust evaluation, we present a new benchmark dataset, Inner-CD, comprising 800 pairs of 256 × 256 bi-temporal satellite images with 0.5–2 m spatial resolution. Unlike existing datasets, Inner-CD features abundant buildings in both temporal images, with changes manifested as subtle morphological variations. Extensive experiments demonstrate that DynaNet achieves state-of-the-art performance, obtaining F1-scores of 90.92% on Inner-CD, 92.38% on LEVIR-CD, and 94.35% on WHU-CD. Full article
(This article belongs to the Section Sensing and Imaging)
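The cross-temporal gating the abstract attributes to the DFE can be sketched as a sigmoid gate computed from both epochs that modulates their difference, amplifying change-relevant responses and damping irrelevant variation. An assumption-level reading, not the paper's code.

```python
# Minimal sketch of cross-temporal gating for change features (assumed reading
# of DynaNet's DFE; names and sizes are illustrative).
import torch
import torch.nn as nn

class CrossTemporalGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([t1, t2], dim=1))  # (B, C, H, W) in [0, 1]
        return g * (t2 - t1)  # gated temporal difference as the change feature

change = CrossTemporalGate(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```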

24 pages, 3729 KB  
Article
Multi-Source Heterogeneous Data Fusion Algorithm for Vessel Trajectories in Canal Scenarios
by Jiayu Zhang, Mei Wang, Ruixiang Kan and Zihang Xiong
Electronics 2025, 14(16), 3223; https://doi.org/10.3390/electronics14163223 - 14 Aug 2025
Viewed by 1167
Abstract
With the globalization of trade, maritime transport is playing an increasingly strategic role in sustaining international commerce. As a result, research into the tracking and fusion of multi-source vessel data in canal environments has become critical for enhancing maritime situational awareness. In the existing research and development, the heterogeneity of and variability in vessel flow data often lead to multiple issues in tracking algorithms, as well as in subsequent trajectory-matching processes. The existing tracking and matching frameworks typically suffer from three major limitations: insufficient capacity to extract fine-grained features from multi-source data; difficulty in balancing global context with local dynamics during multi-scale feature tracking; and an inadequate ability to model long-range temporal dependencies in trajectory matching. To address these challenges, this study proposes the Shape Similarity and Generalized Distance Adjustment (SSGDA) framework, a novel vessel trajectory-matching approach designed to track and associate multi-source heterogeneous vessel data in complex canal environments. The primary contributions of this work are summarized as follows: (1) an enhanced optimization strategy for trajectory fusion based on Enhanced Particle Swarm Optimization (E-PSO) designed for the proposed trajectory-matching framework; (2) the proposal of a trajectory similarity measurement method utilizing a distance-based reward–penalty mechanism, followed by empirical validation using the publicly available FVessel dataset. Comprehensive aggregation and analysis of the experimental results demonstrate that the proposed SSGDA method achieved a matching precision of 96.30%, outperforming all comparative approaches. Additionally, the proposed method reduced the mean-squared error between trajectory points by 97.82 pixel units. These findings further highlight the strong research potential and practical applicability of the proposed framework in real-world canal scenarios. Full article
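A distance-based reward-penalty similarity of the kind the SSGDA abstract mentions can be sketched as follows: point pairs closer than a threshold contribute a reward that decays with distance, farther pairs a penalty. Thresholds and weights here are illustrative assumptions, not the paper's measure.

```python
# Hedged sketch of a distance-based reward-penalty trajectory similarity
# (illustrative of the idea in SSGDA; constants are assumptions).
import numpy as np

def reward_penalty_similarity(traj_a: np.ndarray, traj_b: np.ndarray,
                              thresh: float = 10.0, penalty: float = 0.5) -> float:
    """traj_a, traj_b: (N, 2) arrays of time-aligned pixel coordinates."""
    dists = np.linalg.norm(traj_a - traj_b, axis=1)
    # Reward decays with distance inside the threshold; outside, a flat penalty.
    scores = np.where(dists <= thresh, 1.0 - dists / thresh, -penalty)
    return float(scores.mean())

a = np.cumsum(np.random.randn(100, 2), axis=0)
print(reward_penalty_similarity(a, a + np.random.randn(100, 2)))
```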

28 pages, 48169 KB  
Article
Advancing Self-Supervised Learning for Building Change Detection and Damage Assessment: Unified Denoising Autoencoder and Contrastive Learning Framework
by Songxi Yang, Bo Peng, Tang Sui, Meiliu Wu and Qunying Huang
Remote Sens. 2025, 17(15), 2717; https://doi.org/10.3390/rs17152717 - 6 Aug 2025
Cited by 1 | Viewed by 2141
Abstract
Building change detection and building damage assessment are two essential tasks in post-disaster analysis. Building change detection focuses on identifying changed building areas between bi-temporal images, while building damage assessment involves segmenting all buildings and classifying their damage severity. These tasks play a critical role in disaster response and urban development monitoring. Although supervised learning has significantly advanced building change detection and damage assessment, its reliance on large labeled datasets remains a major limitation. In contrast, self-supervised learning enables the extraction of meaningful data representations without explicit training labels. To address this challenge, we propose a self-supervised learning approach that unifies denoising autoencoders and contrastive learning, enabling effective data representation for building change detection and damage assessment. The proposed architecture integrates a dual denoising autoencoder with a Vision Transformer backbone and contrastive learning strategy, complemented by a Feature Pyramid Network-ResNet dual decoder and an Edge Guidance Module. This design enhances multi-scale feature extraction and enables edge-aware segmentation for accurate predictions. Extensive experiments were conducted on five public datasets, including xBD, LEVIR, LEVIR+, SYSU, and WHU, to evaluate the performance and generalization capabilities of the model. The results demonstrate that the proposed Denoising AutoEncoder-enhanced Dual-Fusion Network (DAEDFN) approach achieves competitive performance compared with fully supervised methods. On the xBD dataset, the largest dataset for building damage assessment, our proposed method achieves an F1 score of 0.892 for building segmentation, outperforming state-of-the-art methods. For building damage severity classification, the model achieves an F1 score of 0.632. On the building change detection datasets, the proposed method achieves F1 scores of 0.837 (LEVIR), 0.817 (LEVIR+), 0.768 (SYSU), and 0.876 (WHU), demonstrating model generalization across diverse scenarios. Despite these promising results, challenges remain in complex urban environments, small-scale changes, and fine-grained boundary detection. These findings highlight the potential of self-supervised learning in building change detection and damage assessment tasks. Full article
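The contrastive component of frameworks like the one described here is typically an NT-Xent objective over two augmented views; the abstract does not give the exact loss, so the standard formulation below is a generic sketch, with temperature as an assumption.

```python
# Generic NT-Xent contrastive loss of the kind such self-supervised frameworks
# use (the paper's exact loss and temperature are not given in the abstract).
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # z1, z2: (B, D) embeddings of two augmented views of the same images.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, D)
    sim = z @ z.t() / tau                                     # cosine similarities
    sim.fill_diagonal_(float("-inf"))                         # drop self-pairs
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)                      # positive: the twin view

loss = nt_xent(torch.randn(16, 128), torch.randn(16, 128))
```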

22 pages, 4882 KB  
Article
Dual-Branch Spatio-Temporal-Frequency Fusion Convolutional Network with Transformer for EEG-Based Motor Imagery Classification
by Hao Hu, Zhiyong Zhou, Zihan Zhang and Wenyu Yuan
Electronics 2025, 14(14), 2853; https://doi.org/10.3390/electronics14142853 - 17 Jul 2025
Cited by 1 | Viewed by 1541
Abstract
The decoding of motor imagery (MI) electroencephalogram (EEG) signals is crucial for motor control and rehabilitation. However, as feature extraction is the core component of the decoding process, traditional methods, often limited to single-feature domains or shallow time-frequency fusion, struggle to comprehensively capture the spatio-temporal-frequency characteristics of the signals, thereby limiting decoding accuracy. To address these limitations, this paper proposes a dual-branch neural network architecture with multi-domain feature fusion, the dual-branch spatio-temporal-frequency fusion convolutional network with Transformer (DB-STFFCNet). The DB-STFFCNet model consists of three modules: the spatiotemporal feature extraction module (STFE), the frequency feature extraction module (FFE), and the feature fusion and classification module. The STFE module employs a lightweight multi-dimensional attention network combined with a temporal Transformer encoder, capable of simultaneously modeling local fine-grained features and global spatiotemporal dependencies, effectively integrating spatiotemporal information and enhancing feature representation. The FFE module constructs a hierarchical feature refinement structure by leveraging the fast Fourier transform (FFT) and multi-scale frequency convolutions, while a frequency-domain Transformer encoder captures the global dependencies among frequency domain features, thus improving the model’s ability to represent key frequency information. Finally, the fusion module effectively consolidates the spatiotemporal and frequency features to achieve accurate classification. To evaluate the feasibility of the proposed method, experiments were conducted on the BCI Competition IV-2a and IV-2b public datasets, achieving accuracies of 83.13% and 89.54%, respectively, outperforming existing methods. This study provides a novel solution for joint time-frequency representation learning in EEG analysis. Full article
(This article belongs to the Special Issue Artificial Intelligence Methods for Biomedical Data Processing)
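The FFT step the FFE module builds on reduces to taking per-channel amplitude spectra of an EEG window as frequency-domain features. The sketch below is a simplification of the paper's hierarchical refinement; shapes are assumptions.

```python
# Minimal sketch of FFT-based frequency features for an EEG window (shapes are
# assumptions; the paper refines these features further).
import torch

def fft_amplitude_features(eeg: torch.Tensor) -> torch.Tensor:
    # eeg: (B, channels, T) time-domain window, e.g. sampled at 250 Hz.
    spec = torch.fft.rfft(eeg, dim=-1)       # complex (B, channels, T//2 + 1)
    return spec.abs()                        # amplitude spectrum per channel

feats = fft_amplitude_features(torch.randn(8, 22, 1000))  # 22-channel, 4 s window
print(feats.shape)  # torch.Size([8, 22, 501])
```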

20 pages, 3811 KB  
Article
A Multi-Scale Time–Frequency Complementary Load Forecasting Method for Integrated Energy Systems
by Enci Jiang, Ziyi Wang and Shanshan Jiang
Energies 2025, 18(12), 3103; https://doi.org/10.3390/en18123103 - 12 Jun 2025
Viewed by 1108
Abstract
With the growing demand for global energy transition, integrated energy systems (IESs) have emerged as a key pathway for sustainable development due to their deep coupling of multi-energy flows. Accurate load forecasting is crucial for IES optimization and scheduling, yet conventional methods struggle with complex spatio-temporal correlations and long-term dependencies. This study proposes ST-ScaleFusion, a multi-scale time–frequency complementary hybrid model to enhance comprehensive energy load forecasting accuracy. The model features three core modules: a multi-scale decomposition hybrid module for fine-grained extraction of multi-time-scale features via hierarchical down-sampling and seasonal-trend decoupling; a frequency domain interpolation forecasting (FI) module using complex linear projection for amplitude-phase joint modeling to capture long-term patterns and suppress noise; and an FI sub-module extending series length via frequency domain interpolation to adapt to non-stationary loads. Experiments on 2021–2023 multi-energy load and meteorological data from the Arizona State University Tempe campus show that ST-ScaleFusion achieves 24 h forecasting MAE values of 667.67 kW for electric load, 1073.93 kW/h for cooling load, and 85.73 kW for heating load, outperforming models like TimesNet and TSMixer. Robust in long-step (96 h) forecasting, it reduces MAE by 30% compared to conventional methods, offering an efficient tool for real-time IES scheduling and risk decision-making. Full article
(This article belongs to the Special Issue Computational Intelligence in Electrical Systems: 2nd Edition)
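Frequency-domain interpolation, as the FI module is described, amounts to inverting a zero-padded spectrum at a longer length to extend a series. The generic sketch below omits the paper's complex linear projections; it is an illustration, not the authors' method.

```python
# Sketch of frequency-domain interpolation: invert a zero-padded spectrum at a
# longer length to extend the series (generic; the paper adds complex projections).
import torch

def freq_interpolate(series: torch.Tensor, out_len: int) -> torch.Tensor:
    # series: (B, T). norm="forward" keeps amplitudes when the length changes.
    spec = torch.fft.rfft(series, dim=-1, norm="forward")
    # irfft zero-pads the spectrum to out_len // 2 + 1 bins before inverting.
    return torch.fft.irfft(spec, n=out_len, dim=-1, norm="forward")

x = torch.sin(torch.linspace(0, 6.28, 96)).unsqueeze(0)   # 96-step load history
x_long = freq_interpolate(x, 192)                          # smooth 192-step version
print(x_long.shape)  # torch.Size([1, 192])
```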

22 pages, 52487 KB  
Article
LCFANet: A Novel Lightweight Cross-Level Feature Aggregation Network for Small Agricultural Pest Detection
by Shijian Huang, Yunong Tian, Yong Tan and Zize Liang
Agronomy 2025, 15(5), 1168; https://doi.org/10.3390/agronomy15051168 - 11 May 2025
Cited by 1 | Viewed by 1073
Abstract
In agricultural pest detection, the small size of pests poses a critical hurdle to detection accuracy. To mitigate this concern, we propose a Lightweight Cross-Level Feature Aggregation Network (LCFANet), which comprises three key components: a deep feature extraction network, a deep feature fusion network, and a multi-scale object detection head. Within the feature extraction and fusion networks, we introduce the Dual Temporal Feature Aggregation C3k2 (DTFA-C3k2) module, leveraging a spatiotemporal fusion mechanism to integrate multi-receptive field features while preserving fine-grained texture and structural details across scales. This significantly improves detection performance for objects with large scale variations. Additionally, we propose the Aggregated Downsampling Convolution (ADown-Conv) module, a dual-path compression unit that enhances feature representation while efficiently reducing spatial dimensions. For feature fusion, we design a Cross-Level Hierarchical Feature Pyramid (CLHFP), which employs bidirectional integration—backward pyramid construction for deep-to-shallow fusion and forward pyramid construction for feature refinement. The detection head incorporates a Multi-Scale Adaptive Spatial Fusion (MSASF) module, adaptively fusing features at specific scales to improve accuracy for varying-sized objects. Furthermore, we introduce the MPDINIoU loss function, combining InnerIoU and MPDIoU to optimize bounding box regression. The LCFANet-n model has 2.78M parameters and a computational cost of 6.7 GFLOPs, enabling lightweight deployment. Extensive experiments on the public dataset demonstrate that the LCFANet-n model achieves a precision of 71.7%, recall of 68.5%, mAP50 of 70.4%, and mAP50-95 of 45.1%, reaching state-of-the-art (SOTA) performance in small-sized pest detection while maintaining a lightweight architecture. Full article
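A dual-path downsampling unit of the kind ADown-Conv is described as can be sketched by splitting channels in two, pooling each half differently, and convolving. The version below is modeled on similar public designs and is an assumption, not the authors' code.

```python
# Hedged sketch of a dual-path downsampling unit in the spirit of ADown-Conv
# (modeled on similar public designs; an assumption, not the paper's module).
import torch
import torch.nn as nn

class DualPathDown(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch_a = nn.Sequential(
            nn.AvgPool2d(2), nn.Conv2d(half, half, 3, padding=1))
        self.branch_b = nn.Sequential(
            nn.MaxPool2d(2), nn.Conv2d(half, half, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=1)                     # split channels in half
        return torch.cat([self.branch_a(a), self.branch_b(b)], dim=1)

out = DualPathDown(64)(torch.randn(1, 64, 80, 80))  # -> (1, 64, 40, 40)
```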