Search Results (223)

Search Parameters:
Keywords = multi-level feature aggregation network

21 pages, 2891 KB  
Article
A Community Detection Model Based on Dynamic Propagation-Aware Multi-Hop Feature Aggregation
by Chao Lei, Yuzhi Xiao, Sheng Jin, Tao Huang, Chuang Zhang and Meng Cheng
Entropy 2025, 27(10), 1053; https://doi.org/10.3390/e27101053 - 10 Oct 2025
Viewed by 277
Abstract
Community detection is a crucial technique for uncovering latent network structures, analyzing group behaviors, and understanding information dissemination pathways. Existing methods predominantly rely on static graph structural features, while neglecting the intrinsic dynamic patterns of information diffusion and nonlinear attenuation within static networks. To address these limitations, we propose DAMA, a community detection model that integrates dynamic propagation-aware feature modeling with adaptive multi-hop structural aggregation. First, an Information Flow Matrix (IFM) is constructed to quantify the nonlinear attenuation of information propagation between nodes, thereby enriching static structural representations with nonlinear propagation dynamics. Second, we propose an Adaptive Sparse Sampling Module that adaptively retains influential neighbors by applying multi-level propagation thresholds, improving structural denoising and preserving essential diffusion pathways. Finally, we design a Hierarchical Multi-Hop Aggregation Framework, which employs a dual-gating mechanism to adaptively integrate neighborhood representations across multiple hops. This approach enables more expressive structural embeddings by progressively combining local and extended topological information. Experimental results demonstrate that DAMA achieves better performance in community detection tasks across multiple real-world networks and LFR-generated synthetic networks. Full article
(This article belongs to the Section Complexity)
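
As a rough illustration of the dual-gating idea described in the abstract, the sketch below (plain PyTorch, with hypothetical names and shapes not taken from the paper) gates per-hop neighborhood embeddings twice: once at the hop level and once at the feature level.

import torch
import torch.nn as nn

class DualGatedMultiHopAggregator(nn.Module):
    """Combines per-hop neighborhood embeddings with two learned gates: a
    hop-level gate (how much each hop contributes) and a feature-level gate
    (which dimensions of the aggregate to keep)."""
    def __init__(self, dim):
        super().__init__()
        self.hop_gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.feat_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, hop_embs):
        # hop_embs: list of [num_nodes, dim] tensors, one per hop
        gated = [self.hop_gate(h) * h for h in hop_embs]   # hop-level gating
        mixed = torch.stack(gated, dim=0).sum(dim=0)        # combine hops
        return self.feat_gate(mixed) * mixed                # feature-level gating

# Toy usage: adjacency powers as crude 1-, 2-, and 3-hop neighborhoods.
n, d = 100, 32
adj = (torch.rand(n, n) < 0.05).float()
x = torch.randn(n, d)
hops = [adj @ x, adj @ (adj @ x), adj @ (adj @ (adj @ x))]
print(DualGatedMultiHopAggregator(d)(hops).shape)  # torch.Size([100, 32])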

22 pages, 2395 KB  
Article
Multimodal Alignment and Hierarchical Fusion Network for Multimodal Sentiment Analysis
by Jiasheng Huang, Huan Li and Xinyue Mo
Electronics 2025, 14(19), 3828; https://doi.org/10.3390/electronics14193828 - 26 Sep 2025
Viewed by 831
Abstract
The widespread emergence of multimodal data on social platforms has presented new opportunities for sentiment analysis. However, previous studies have often overlooked the issue of detail loss during modal interaction fusion. They also exhibit limitations in addressing semantic alignment challenges and the sensitivity of modalities to noise. To enhance analytical accuracy, a novel model named MAHFNet is proposed. The proposed architecture is composed of three main components. Firstly, an attention-guided gated interaction alignment module is developed for modeling the semantic interaction between text and image using a gated network and a cross-modal attention mechanism. Next, a contrastive learning mechanism is introduced to encourage the aggregation of semantically aligned image-text pairs. Subsequently, an intra-modality emotion extraction module is designed to extract local emotional features within each modality. This module serves to compensate for detail loss during interaction fusion. The intra-modal local emotion features and cross-modal interaction features are then fed into a hierarchical gated fusion module, where the local features are fused through a cross-gated mechanism to dynamically adjust the contribution of each modality while suppressing modality-specific noise. Then, the fusion results and cross-modal interaction features are further fused using a multi-scale attention gating module to capture hierarchical dependencies between local and global emotional information, thereby enhancing the model’s ability to perceive and integrate emotional cues across multiple semantic levels. Finally, extensive experiments have been conducted on three public multimodal sentiment datasets, with results demonstrating that the proposed model outperforms existing methods across multiple evaluation metrics. Specifically, on the TumEmo dataset, our model achieves improvements of 2.55% in ACC and 2.63% in F1 score compared to the second-best method. On the HFM dataset, these gains reach 0.56% in ACC and 0.9% in F1 score, respectively. On the MVSA-S dataset, these gains reach 0.03% in ACC and 1.26% in F1 score. These findings collectively validate the overall effectiveness of the proposed model. Full article
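
The cross-gated fusion step can be pictured roughly as follows; this is a minimal sketch assuming pre-extracted text and image embeddings of equal width, with illustrative module names rather than the paper's implementation.

import torch
import torch.nn as nn

class CrossGatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate_from_text = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.gate_from_image = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, image_feat):
        # Each modality is re-weighted by a gate computed from the other,
        # which lets the model damp a noisy modality before fusion.
        text_gated = self.gate_from_image(image_feat) * text_feat
        image_gated = self.gate_from_text(text_feat) * image_feat
        return self.proj(torch.cat([text_gated, image_gated], dim=-1))

fusion = CrossGatedFusion(256)
t, v = torch.randn(8, 256), torch.randn(8, 256)
print(fusion(t, v).shape)  # torch.Size([8, 256])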

28 pages, 14783 KB  
Article
HSSTN: A Hybrid Spectral–Structural Transformer Network for High-Fidelity Pansharpening
by Weijie Kang, Yuan Feng, Yao Ding, Hongbo Xiang, Xiaobo Liu and Yaoming Cai
Remote Sens. 2025, 17(19), 3271; https://doi.org/10.3390/rs17193271 - 23 Sep 2025
Viewed by 536
Abstract
Pansharpening fuses multispectral (MS) and panchromatic (PAN) remote sensing images to generate outputs with high spatial resolution and spectral fidelity. Nevertheless, conventional methods relying primarily on convolutional neural networks or unimodal fusion strategies frequently fail to bridge the sensor modality gap between MS and PAN data. Consequently, spectral distortion and spatial degradation often occur, limiting high-precision downstream applications. To address these issues, this work proposes a Hybrid Spectral–Structural Transformer Network (HSSTN) that enhances multi-level collaboration through comprehensive modelling of spectral–structural feature complementarity. Specifically, the HSSTN implements a three-tier fusion framework. First, an asymmetric dual-stream feature extractor employs a residual block with channel attention (RBCA) in the MS branch to strengthen spectral representation, while a Transformer architecture in the PAN branch extracts high-frequency spatial details, thereby reducing modality discrepancy at the input stage. Subsequently, a target-driven hierarchical fusion network utilises progressive crossmodal attention across scales, ranging from local textures to multi-scale structures, to enable efficient spectral–structural aggregation. Finally, a novel collaborative optimisation loss function preserves spectral integrity while enhancing structural details. Comprehensive experiments conducted on QuickBird, GaoFen-2, and WorldView-3 datasets demonstrate that HSSTN outperforms existing methods in both quantitative metrics and visual quality. Consequently, the resulting images exhibit sharper details and fewer spectral artefacts, showcasing significant advantages in high-fidelity remote sensing image fusion. Full article
(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)
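
A residual block with channel attention of the kind the MS branch could use might look like the sketch below; the layer sizes and the internals of the block are assumptions made purely for illustration.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class RBCA(nn.Module):
    """Residual block with channel attention: conv -> conv -> channel
    re-weighting, added back to the input (spectral branch of the fusion)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            ChannelAttention(channels),
        )

    def forward(self, x):
        return x + self.body(x)

ms_features = torch.randn(2, 64, 64, 64)   # e.g., upsampled MS feature maps
print(RBCA(64)(ms_features).shape)          # torch.Size([2, 64, 64, 64])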

20 pages, 4568 KB  
Article
Dual-Branch Transformer–CNN Fusion for Enhanced Cloud Segmentation in Remote Sensing Imagery
by Shengyi Cheng, Hangfei Guo, Hailei Wu and Xianjun Du
Appl. Sci. 2025, 15(18), 9870; https://doi.org/10.3390/app15189870 - 9 Sep 2025
Viewed by 535
Abstract
Cloud coverage and obstruction significantly affect the usability of remote sensing images, making cloud detection a key prerequisite for optical remote sensing applications. In existing cloud detection methods, using U-shaped convolutional networks alone has limitations in modeling long-range contexts, while Vision Transformers fall short in capturing local spatial features. To address these issues, this study proposes TransCNet, a dual-branch framework that combines Transformer and CNN architectures to enhance the accuracy and effectiveness of cloud detection. TransCNet employs two encoder branches: a Transformer branch capturing global dependencies and a CNN branch extracting local details. A novel feature aggregation module enables the complementary fusion of multi-level features from both branches at each encoder stage, enhanced by channel attention mechanisms. To mitigate feature dilution during decoding, aggregated features compensate for information loss from sampling operations. Evaluations on 38-Cloud, SPARCS, and a high-resolution Landsat-8 dataset demonstrate TransCNet’s competitive performance across metrics, effectively balancing global semantic understanding and local edge preservation for clearer cloud boundary detection. The approach resolves key limitations in existing cloud detection frameworks through synergistic multi-branch feature integration. Full article
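
One plausible reading of the per-stage aggregation, fusing a Transformer-branch map with a CNN-branch map under channel attention, is sketched below; it is an assumption-level illustration, not the published module.

import torch
import torch.nn as nn

class BranchFusion(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, transformer_feat, cnn_feat):
        fused = self.merge(torch.cat([transformer_feat, cnn_feat], dim=1))
        return fused * self.channel_attn(fused)   # emphasize informative channels

f_trans = torch.randn(1, 96, 32, 32)   # global-context branch
f_cnn = torch.randn(1, 96, 32, 32)     # local-detail branch
print(BranchFusion(96)(f_trans, f_cnn).shape)  # torch.Size([1, 96, 32, 32])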

31 pages, 8445 KB  
Article
HIRD-Net: An Explainable CNN-Based Framework with Attention Mechanism for Diabetic Retinopathy Diagnosis Using CLAHE-D-DoG Enhanced Fundus Images
by Muhammad Hassaan Ashraf, Muhammad Nabeel Mehmood, Musharif Ahmed, Dildar Hussain, Jawad Khan, Younhyun Jung, Mohammed Zakariah and Deema Mohammed AlSekait
Life 2025, 15(9), 1411; https://doi.org/10.3390/life15091411 - 8 Sep 2025
Viewed by 922
Abstract
Diabetic Retinopathy (DR) is a leading cause of vision impairment globally, underscoring the need for accurate and early diagnosis to prevent disease progression. Although fundus imaging serves as a cornerstone of Computer-Aided Diagnosis (CAD) systems, several challenges persist, including lesion scale variability, blurry morphological patterns, inter-class imbalance, limited labeled datasets, and computational inefficiencies. To address these issues, this study proposes an end-to-end diagnostic framework that integrates an enhanced preprocessing pipeline with a novel deep learning architecture, Hierarchical-Inception-Residual-Dense Network (HIRD-Net). The preprocessing stage combines Contrast Limited Adaptive Histogram Equalization (CLAHE) with Dilated Difference of Gaussian (D-DoG) filtering to improve image contrast and highlight fine-grained retinal structures. HIRD-Net features a hierarchical feature fusion stem alongside multiscale, multilevel inception-residual-dense blocks for robust representation learning. The Squeeze-and-Excitation Channel Attention (SECA) is introduced before each Global Average Pooling (GAP) layer to refine the Feature Maps (FMs). It further incorporates four GAP layers for multi-scale semantic aggregation, employs the Hard-Swish activation to enhance gradient flow, and utilizes the Focal Loss function to mitigate class imbalance issues. Experimental results on the IDRiD-APTOS2019, DDR, and EyePACS datasets demonstrate that the proposed framework achieves 93.46%, 82.45% and 79.94% overall classification accuracy using only 4.8 million parameters, highlighting its strong generalization capability and computational efficiency. Furthermore, to ensure transparent predictions, an Explainable AI (XAI) approach known as Gradient-weighted Class Activation Mapping (Grad-CAM) is employed to visualize HIRD-Net’s decision-making process. Full article
(This article belongs to the Special Issue Advanced Machine Learning for Disease Prediction and Prevention)
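
The focal loss named in the abstract is a standard construction for class imbalance; a minimal multi-class sketch is shown below, where the scalar alpha and gamma values are assumptions (a per-class alpha would be a common refinement).

import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma, self.alpha = gamma, alpha

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)                          # probability of the true class
        # Down-weight easy examples so rare/hard classes dominate the gradient.
        return (self.alpha * (1 - pt) ** self.gamma * ce).mean()

logits = torch.randn(16, 5)                          # e.g., 5 DR severity grades
targets = torch.randint(0, 5, (16,))
print(FocalLoss()(logits, targets))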

30 pages, 25011 KB  
Article
Multi-Level Contextual and Semantic Information Aggregation Network for Small Object Detection in UAV Aerial Images
by Zhe Liu, Guiqing He and Yang Hu
Drones 2025, 9(9), 610; https://doi.org/10.3390/drones9090610 - 29 Aug 2025
Viewed by 669
Abstract
In recent years, detection methods for generic object detection have achieved significant progress. However, due to the large number of small objects in aerial images, mainstream detectors struggle to achieve a satisfactory detection performance. The challenges of small object detection in aerial images are primarily twofold: (1) Insufficient feature representation: The limited visual information for small objects makes it difficult for models to learn discriminative feature representations. (2) Background confusion: Abundant background information introduces more noise and interference, causing the features of small objects to easily be confused with the background. To address these issues, we propose a Multi-Level Contextual and Semantic Information Aggregation Network (MCSA-Net). MCSA-Net includes three key components: a Spatial-Aware Feature Selection Module (SAFM), a Multi-Level Joint Feature Pyramid Network (MJFPN), and an Attention-Enhanced Head (AEHead). The SAFM employs a sequence of dilated convolutions to extract multi-scale local context features and combines a spatial selection mechanism to adaptively merge these features, thereby obtaining the critical local context required for the objects, which enriches the feature representation of small objects. The MJFPN introduces multi-level connections and weighted fusion to fully leverage the spatial detail features of small objects in feature fusion and enhances the fused features further through a feature aggregation network. Finally, the AEHead is constructed by incorporating a sparse attention mechanism into the detection head. The sparse attention mechanism efficiently models long-range dependencies by computing the attention between the most relevant regions in the image while suppressing background interference, thereby enhancing the model’s ability to perceive targets and effectively improving the detection performance. Extensive experiments on four datasets, VisDrone, UAVDT, MS COCO, and DOTA, demonstrate that the proposed MCSA-Net achieves an excellent detection performance, particularly in small object detection, surpassing several state-of-the-art methods. Full article
(This article belongs to the Special Issue Intelligent Image Processing and Sensing for Drones, 2nd Edition)
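
A rough sketch of the dilated-convolution-plus-spatial-selection idea behind a SAFM-style module follows; the dilation rates, the per-pixel softmax selection, and all names are assumptions rather than the published design.

import torch
import torch.nn as nn

class DilatedContextSelection(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        # One spatial score map per branch, chosen per pixel.
        self.selector = nn.Conv2d(channels * len(dilations), len(dilations), 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]                    # multi-scale context
        weights = torch.softmax(self.selector(torch.cat(feats, dim=1)), dim=1)
        out = sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))
        return out + x                                           # residual connection

x = torch.randn(2, 64, 40, 40)
print(DilatedContextSelection(64)(x).shape)  # torch.Size([2, 64, 40, 40])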

16 pages, 74973 KB  
Article
TVI-MFAN: A Text–Visual Interaction Multilevel Feature Alignment Network for Visual Grounding in Remote Sensing
by Hao Chi, Weiwei Qin, Xingyu Chen, Wenxin Guo and Baiwei An
Remote Sens. 2025, 17(17), 2993; https://doi.org/10.3390/rs17172993 - 28 Aug 2025
Viewed by 640
Abstract
Visual grounding for remote sensing (RSVG) focuses on localizing specific objects in remote sensing (RS) imagery based on linguistic expressions. Existing methods typically employ pre-trained models to locate the referenced objects. However, due to the insufficient capability of cross-modal interaction and alignment, the extracted visual features may suffer from semantic drift, limiting the performance of RSVG. To address this, the article introduces a novel RSVG framework named the text–visual interaction multilevel feature alignment network (TVI-MFAN), which leverages a text–visual interaction attention (TVIA) module to dynamically generate adaptive weights and biases at both spatial and channel dimensions, enabling the visual features to focus on relevant linguistic expressions. Additionally, a multilevel feature alignment network (MFAN) aggregates contextual information by using cross-modal alignment to enhance features and suppress irrelevant regions. Experiments demonstrate that the proposed method achieves 75.65% and 80.24% (2.42% and 3.1% absolute improvement) accuracy on the OPT-RSVG and DIOR-RSVG datasets, validating its effectiveness. Full article
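
Generating adaptive weights and biases from text at the channel dimension, plus a spatial gate, can be sketched as below; the dimensions and module names are illustrative assumptions, not the TVIA module itself.

import torch
import torch.nn as nn

class TextVisualModulation(nn.Module):
    def __init__(self, text_dim, channels):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, channels)
        self.to_shift = nn.Linear(text_dim, channels)
        self.spatial_gate = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, visual, text):
        # visual: [B, C, H, W], text: [B, text_dim] sentence embedding
        scale = self.to_scale(text).unsqueeze(-1).unsqueeze(-1)   # channel-wise weight
        shift = self.to_shift(text).unsqueeze(-1).unsqueeze(-1)   # channel-wise bias
        modulated = visual * (1 + scale) + shift
        gate = torch.sigmoid(self.spatial_gate(modulated))        # spatial emphasis
        return modulated * gate

mod = TextVisualModulation(text_dim=768, channels=256)
v, t = torch.randn(2, 256, 20, 20), torch.randn(2, 768)
print(mod(v, t).shape)  # torch.Size([2, 256, 20, 20])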

18 pages, 526 KB  
Article
DPBD: Disentangling Preferences via Borrowing Duration for Book Recommendation
by Zhifang Liao, Liping Chen, Yuelan Qi and Fei Li
Big Data Cogn. Comput. 2025, 9(9), 222; https://doi.org/10.3390/bdcc9090222 - 28 Aug 2025
Viewed by 645
Abstract
Traditional book recommendation methods predominantly rely on collaborative filtering and context-based approaches. However, existing methods fail to account for the order of users’ book borrowings and the duration they hold them, both of which are crucial indicators reflecting users’ book preferences. To address this challenge, we propose a book recommendation framework called DPBD, which disentangles preferences based on borrowing duration, thereby explicitly modeling temporal patterns in library borrowing behaviors. The DPBD model adopts a dual-path neural architecture comprising the following: (1) The item-level path utilizes self-attention networks to encode historical borrowing sequences while incorporating borrowing duration as an adaptive weighting mechanism for attention score refinement. (2) The feature-level path employs gated fusion modules to effectively aggregate multi-source item attributes (e.g., category and title), followed by self-attention networks to model feature transition patterns. The framework subsequently combines both path representations through fully connected layers to generate user preference embeddings for next-book recommendation. Extensive experiments conducted on two real-world university library datasets demonstrate the superior performance of the proposed DPBD model compared with baseline methods. Specifically, the model achieved 13.67% and 15.75% on HR@1 and 15.75% and 12.90% on NDCG@1 across the two datasets. Full article
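
A minimal sketch of using borrowing duration to bias self-attention over the borrowing sequence is given below; the log1p weighting and all shapes are assumptions, not the released DPBD code.

import torch
import torch.nn as nn

class DurationWeightedAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5

    def forward(self, seq, duration):
        # seq: [B, L, dim] borrowed-item embeddings; duration: [B, L] in days
        q, k, v = self.qkv(seq).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)) * self.scale        # [B, L, L]
        dur_bias = torch.log1p(duration).unsqueeze(1)          # bias per key item
        attn = torch.softmax(logits + dur_bias, dim=-1)        # long-held books weigh more
        return attn @ v

attn = DurationWeightedAttention(64)
seq, dur = torch.randn(4, 10, 64), torch.randint(1, 60, (4, 10)).float()
print(attn(seq, dur).shape)  # torch.Size([4, 10, 64])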

24 pages, 3961 KB  
Article
Hierarchical Multi-Scale Mamba with Tubular Structure-Aware Convolution for Retinal Vessel Segmentation
by Tao Wang, Dongyuan Tian, Haonan Zhao, Jiamin Liu, Weijie Wang, Chunpei Li and Guixia Liu
Entropy 2025, 27(8), 862; https://doi.org/10.3390/e27080862 - 14 Aug 2025
Viewed by 992
Abstract
Retinal vessel segmentation plays a crucial role in diagnosing various retinal and cardiovascular diseases and serves as a foundation for computer-aided diagnostic systems. Blood vessels in color retinal fundus images, captured using fundus cameras, are often affected by illumination variations and noise, making it difficult to preserve vascular integrity and posing a significant challenge for vessel segmentation. In this paper, we propose HM-Mamba, a novel hierarchical multi-scale Mamba-based architecture that incorporates tubular structure-aware convolution to extract both local and global vascular features for retinal vessel segmentation. First, we introduce a tubular structure-aware convolution to reinforce vessel continuity and integrity. Building on this, we design a multi-scale fusion module that aggregates features across varying receptive fields, enhancing the model’s robustness in representing both primary trunks and fine branches. Second, we integrate multi-branch Fourier transform with the dynamic state modeling capability of Mamba to capture both long-range dependencies and multi-frequency information. This design enables robust feature representation and adaptive fusion, thereby enhancing the network’s ability to model complex spatial patterns. Furthermore, we propose a hierarchical multi-scale interactive Mamba block that integrates multi-level encoder features through gated Mamba-based global context modeling and residual connections, enabling effective multi-scale semantic fusion and reducing detail loss during downsampling. Extensive evaluations on five widely used benchmark datasets—DRIVE, CHASE_DB1, STARE, IOSTAR, and LES-AV—demonstrate the superior performance of HM-Mamba, yielding Dice coefficients of 0.8327, 0.8197, 0.8239, 0.8307, and 0.8426, respectively. Full article
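
The multi-branch Fourier idea can be illustrated, very loosely, by pairing a spatial convolution with a learnable frequency-domain filter as below; this is an assumption-level sketch and omits the Mamba state-space components entirely.

import torch
import torch.nn as nn

class SpectralSpatialBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # Learnable per-channel scaling of the complex spectrum
        # (real and imaginary parts share the same weight).
        self.freq_weight = nn.Parameter(torch.ones(channels, 1, 1))
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * self.freq_weight                  # frequency-domain filtering
        freq_feat = torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([self.spatial(x), freq_feat], dim=1))

x = torch.randn(1, 32, 48, 48)
print(SpectralSpatialBlock(32)(x).shape)  # torch.Size([1, 32, 48, 48])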

16 pages, 9189 KB  
Article
SEND: Semantic-Aware Deep Unfolded Network with Diffusion Prior for Multi-Modal Image Fusion and Object Detection
by Rong Zhang, Mao-Yi Xiong and Jun-Jie Huang
Mathematics 2025, 13(16), 2584; https://doi.org/10.3390/math13162584 - 12 Aug 2025
Viewed by 604
Abstract
Multi-modality image fusion (MIF) aims to integrate complementary information from diverse imaging modalities into a single comprehensive representation and serves as an essential processing step for downstream high-level computer vision tasks. The existing deep unfolding-based processes demonstrate promising results; however, they often rely on deterministic priors with limited generalization ability and usually decouple from the training process of object detection. In this paper, we propose Semantic-Aware Deep Unfolded Network with Diffusion Prior (SEND), a novel framework designed for transparent and effective multi-modality fusion and object detection. SEND consists of a Denoising Prior Guided Fusion Module and a Fusion Object Detection Module. The Denoising Prior Guided Fusion Module does not utilize the traditional deterministic prior but combines the diffusion prior with deep unfolding, leading to improved multi-modal fusion performance and generalization ability. It is designed with a model-based optimization formulation for multi-modal image fusion, which is unfolded into two cascaded blocks: a Diffusion Denoising Fusion Block to generate informative diffusion priors and a Data Consistency Enhancement Block that explicitly aggregates complementary features from both the diffusion priors and input modalities. Additionally, SEND incorporates the Fusion Object Detection Module with the Denoising Prior Guided Fusion Module for object detection task optimization using a carefully designed two-stage training strategy. Experiments demonstrate that the proposed SEND method outperforms state-of-the-art methods, achieving superior fusion quality with improved efficiency and interpretability. Full article
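
One unfolded iteration of the gradient-step-then-denoise pattern can be sketched as follows; the convolutional stand-in for the diffusion prior and the quadratic data-consistency term are assumptions made purely for illustration.

import torch
import torch.nn as nn

class UnfoldedFusionStep(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Stand-in for a learned denoising/diffusion prior.
        self.denoiser = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.step_size = nn.Parameter(torch.tensor(0.5))

    def forward(self, fused, modality_a, modality_b):
        # Gradient of 0.5*(||fused - a||^2 + ||fused - b||^2) w.r.t. fused,
        # followed by a denoising (prior) step.
        data_grad = (fused - modality_a) + (fused - modality_b)
        return self.denoiser(fused - self.step_size * data_grad)

step = UnfoldedFusionStep(1)
ir, vis = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
fused = 0.5 * (ir + vis)
for _ in range(3):                     # a few unfolded iterations
    fused = step(fused, ir, vis)
print(fused.shape)  # torch.Size([1, 1, 64, 64])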

27 pages, 3511 KB  
Article
A Distributed Wearable Computing Framework for Human Activity Classification
by Jhonathan L. Rivas-Caicedo, Kevin Niño-Tejada, Laura Saldaña-Aristizabal and Juan F. Patarroyo-Montenegro
Electronics 2025, 14(16), 3203; https://doi.org/10.3390/electronics14163203 - 12 Aug 2025
Viewed by 641
Abstract
Human Activity Recognition (HAR) using wearable sensors plays a critical role in applications such as healthcare, sports monitoring, and rehabilitation. Traditional approaches typically rely on centralized models that aggregate and process data from multiple sensors simultaneously. However, such architectures often suffer from high latency, increased communication overhead, limited scalability, and reduced robustness, particularly in dynamic environments where wearable systems operate under resource constraints. This paper proposes a distributed neural network framework for HAR, where each wearable sensor independently processes its data using a lightweight neural model and transmits high-level features or predictions to a central neural network for final classification. This strategy alleviates the computational load on the central node, reduces data transmission across the network, and enhances user privacy. We evaluated the proposed distributed framework using our publicly available multi-sensor HAR dataset and compared its performance against a centralized neural network trained on the same data. The results demonstrate that the distributed approach achieves comparable or superior classification accuracy while significantly lowering inference latency and energy consumption. These findings underscore the promise of distributed intelligence in wearable systems for real-time and energy-efficient human activity monitoring. Full article
(This article belongs to the Special Issue Wearable Sensors for Human Position, Attitude and Motion Tracking)
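
The split between per-sensor models and a central classifier can be pictured with the toy sketch below; the sensor count, feature width, and class count are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class NodeEncoder(nn.Module):
    """Lightweight per-sensor model (runs on the wearable node)."""
    def __init__(self, in_ch=6, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 16, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )

    def forward(self, window):           # window: [B, 6, T] accel + gyro
        return self.net(window)           # [B, feat_dim] sent over the link

class CentralClassifier(nn.Module):
    def __init__(self, num_nodes=3, feat_dim=16, num_classes=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(num_nodes * feat_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, node_feats):        # list of [B, feat_dim]
        return self.head(torch.cat(node_feats, dim=-1))

nodes = [NodeEncoder() for _ in range(3)]             # e.g., wrist, chest, ankle
windows = [torch.randn(8, 6, 128) for _ in range(3)]
central = CentralClassifier()
print(central([n(w) for n, w in zip(nodes, windows)]).shape)  # torch.Size([8, 6])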

21 pages, 5260 KB  
Article
LapECNet: Laplacian Pyramid Networks for Image Exposure Correction
by Yongchang Li and Jing Jiang
Appl. Sci. 2025, 15(16), 8840; https://doi.org/10.3390/app15168840 - 11 Aug 2025
Viewed by 583
Abstract
Images captured under complex lighting conditions often suffer from local under-/overexposure and detail loss. Existing methods typically process illumination and texture information in a mixed manner, making it difficult to simultaneously achieve precise exposure adjustment and preservation of detail. To address this challenge, we propose LapECNet, an enhanced Laplacian pyramid network architecture for image exposure correction and detail reconstruction. Specifically, it decomposes the input image into different frequency bands of a Laplacian pyramid, enabling separate handling of illumination adjustment and detail enhancement. The framework first decomposes the image into three feature levels. At each level, we introduce a feature enhancement module that adaptively processes image features across different frequency bands using spatial and channel attention mechanisms. After enhancing the features at each level, we further propose a dynamic aggregation module that learns adaptive weights to hierarchically fuse multi-scale features, achieving context-aware recombination of the enhanced features. Extensive experiments on the public MSEC benchmark demonstrated that our method gave improvements of 15.4% in PSNR and 7.2% in SSIM over previous methods. On the LCDP dataset, our method demonstrated improvements of 7.2% in PSNR and 13.9% in SSIM over previous methods. Full article
(This article belongs to the Special Issue Recent Advances in Parallel Computing and Big Data)
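
The three-level Laplacian decomposition that the pipeline builds on can be written in a few lines of plain PyTorch, as in the sketch below; it shows only the decomposition, not the enhancement or aggregation modules.

import torch
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):
    pyramid, current = [], img
    for _ in range(levels - 1):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyramid.append(current - up)     # high-frequency band (detail)
        current = down
    pyramid.append(current)              # low-frequency residual (illumination)
    return pyramid

img = torch.rand(1, 3, 256, 256)
bands = laplacian_pyramid(img)
print([b.shape[-1] for b in bands])      # [256, 128, 64]
# Exposure is adjusted mainly on the coarse band while the detail bands are
# enhanced separately; the image is rebuilt by upsampling and adding back.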

24 pages, 10165 KB  
Article
MDNet: A Differential-Perception-Enhanced Multi-Scale Attention Network for Remote Sensing Image Change Detection
by Jingwen Li, Mengke Zhao, Xiaoru Wei, Yusen Shao, Qingyang Wang and Zhenxin Yang
Appl. Sci. 2025, 15(16), 8794; https://doi.org/10.3390/app15168794 - 8 Aug 2025
Viewed by 506
Abstract
As a core task in remote sensing image processing, change detection plays a vital role in dynamic surface monitoring for environmental management, urban planning, and agricultural supervision. However, existing methods often suffer from missed detection of small targets and pseudo-change interference, stemming from insufficient modeling of multi-scale feature coupling and spatio-temporal differences due to factors such as background complexity and appearance variations. To this end, we propose a Differential-Perception-Enhanced Multi-Scale Attention Network for Remote Sensing Image Change Detection (MDNet), an optimized framework integrating multi-scale feature extraction, cross-scale aggregation, difference enhancement, and context modeling. Through the parallel collaborative mechanism of the designed Multi-Scale Feature Extraction Module (EMF) and Cross-Scale Adjacent Semantic Information Aggregation Module (CASAM), multi-scale semantic learning is strengthened, enabling fine-grained modeling of change targets of different sizes and improving small-target-detection capability. Meanwhile, the Differential-Perception-Enhanced Module (DPEM) and Transformer structure are introduced for global–local coupled modeling of spatio-temporal differences. They enhance spectral–structural differences to form discriminative features, use self-attention to capture long-range dependencies, and construct multi-level features from local differences to global associations, significantly suppressing pseudo-change interference. Experimental results show that, on three public datasets (LEVIR-CD, WHU-CD, and CLCD), the proposed model exhibits superior detection performance and robustness in terms of quantitative metrics and qualitative analysis compared with existing advanced methods. Full article
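
A difference-enhancement step for bi-temporal features, in the general spirit of the DPEM described above, might look like the following sketch; all names and sizes are illustrative assumptions.

import torch
import torch.nn as nn

class DifferenceEnhancement(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid()
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_t1, feat_t2):
        diff = torch.abs(feat_t1 - feat_t2)           # spatio-temporal difference
        gate = self.attn(diff)                        # where change is likely
        return self.fuse(torch.cat([feat_t1 * gate, feat_t2 * gate], dim=1))

t1, t2 = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64)
print(DifferenceEnhancement(64)(t1, t2).shape)   # torch.Size([1, 64, 64, 64])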

26 pages, 4899 KB  
Article
SDDGRNets: Level–Level Semantically Decomposed Dynamic Graph Reasoning Network for Remote Sensing Semantic Change Detection
by Zhuli Xie, Gang Wan, Yunxia Yin, Guangde Sun and Dongdong Bu
Remote Sens. 2025, 17(15), 2641; https://doi.org/10.3390/rs17152641 - 30 Jul 2025
Cited by 1 | Viewed by 881
Abstract
Semantic change detection technology based on remote sensing data holds significant importance for urban and rural planning decisions and the monitoring of ground objects. However, simple convolutional networks are limited by the receptive field, cannot fully capture detailed semantic information, and cannot effectively perceive subtle changes and constrain edge information. Therefore, a dynamic graph reasoning network with layer-by-layer semantic decomposition for semantic change detection in remote sensing data is developed in response to these limitations. This network aims to understand and perceive subtle changes in the semantic content of remote sensing data from the image pixel level. On the one hand, low-level semantic information and cross-scale spatial local feature details are obtained by dividing subspaces and decomposing convolutional layers with significant kernel expansion. Semantic selection aggregation is used to enhance the characterization of global and contextual semantics. Meanwhile, the initial multi-scale local spatial semantics are screened and re-aggregated to improve the characterization of significant features. On the other hand, at the encoding stage, the weight-sharing approach is employed to align the positions of ground objects in the change area and generate more comprehensive encoding information. Meanwhile, the dynamic graph reasoning module is used to decode the encoded semantics layer by layer to investigate the hidden associations between pixels in the neighborhood. In addition, the edge constraint module is used to constrain boundary pixels and reduce semantic ambiguity. The weighted loss function supervises and optimizes each module separately to enable the network to acquire the optimal feature representation. Finally, experimental results on three open-source datasets, such as SECOND, HIUSD, and Landsat-SCD, show that the proposed method achieves good performance, with an SCD score reaching 35.65%, 98.33%, and 67.29%, respectively. Full article

19 pages, 3397 KB  
Article
FEMNet: A Feature-Enriched Mamba Network for Cloud Detection in Remote Sensing Imagery
by Weixing Liu, Bin Luo, Jun Liu, Han Nie and Xin Su
Remote Sens. 2025, 17(15), 2639; https://doi.org/10.3390/rs17152639 - 30 Jul 2025
Viewed by 614
Abstract
Accurate and efficient cloud detection is critical for maintaining the usability of optical remote sensing imagery, particularly in large-scale Earth observation systems. In this study, we propose FEMNet, a lightweight dual-branch network that combines state space modeling with convolutional encoding for multi-class cloud segmentation. The Mamba-based encoder captures long-range semantic dependencies with linear complexity, while a parallel CNN path preserves spatial detail. To address the semantic inconsistency across feature hierarchies and limited context perception in decoding, we introduce the following two targeted modules: a cross-stage semantic enhancement (CSSE) block that adaptively aligns low- and high-level features, and a multi-scale context aggregation (MSCA) block that integrates contextual cues at multiple resolutions. Extensive experiments on five benchmark datasets demonstrate that FEMNet achieves state-of-the-art performance across both binary and multi-class settings, while requiring only 4.4M parameters and 1.3G multiply–accumulate operations. These results highlight FEMNet’s suitability for resource-efficient deployment in real-world remote sensing applications. Full article
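
Multi-scale context aggregation in the style of pyramid pooling gives a rough picture of what an MSCA-like block does; the sketch below is an assumption-level illustration, not the released FEMNet code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleContext(nn.Module):
    def __init__(self, channels, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(p), nn.Conv2d(channels, channels // 4, 1))
            for p in pool_sizes
        )
        self.project = nn.Conv2d(channels + len(pool_sizes) * (channels // 4),
                                 channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        ctx = [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                             align_corners=False) for stage in self.stages]
        return self.project(torch.cat([x, *ctx], dim=1))   # fuse cues at several resolutions

x = torch.randn(1, 64, 32, 32)
print(MultiScaleContext(64)(x).shape)   # torch.Size([1, 64, 32, 32])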
