Search Results (49)

Search Parameters:
Keywords = state space model (SSM)

22 pages, 24173 KiB  
Article
ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
by Hao Zhou, Yalun Wang, Wanting Peng, Xin Guan and Tao Tao
Remote Sens. 2025, 17(15), 2664; https://doi.org/10.3390/rs17152664 - 1 Aug 2025
Viewed by 154
Abstract
Remote sensing images are often degraded by atmospheric haze, which not only reduces image quality but also complicates information extraction, particularly in high-level visual analysis tasks such as object detection and scene classification. State-space models (SSMs) have recently emerged as a powerful paradigm for vision tasks, showing great promise due to their computational efficiency and robust capacity to model global dependencies. However, most existing learning-based dehazing methods lack physical interpretability, leading to weak generalization. Furthermore, they typically rely on spatial features while neglecting crucial frequency domain information, resulting in incomplete feature representation. To address these challenges, we propose ScaleViM-PDD, a novel network that enhances an SSM backbone with two key innovations: a Multi-scale EfficientViM with Physical Decoupling (ScaleViM-P) module and a Dual-Domain Fusion (DD Fusion) module. The ScaleViM-P module synergistically integrates a Physical Decoupling block within a Multi-scale EfficientViM architecture. This design enables the network to mitigate haze interference in a physically grounded manner at each representational scale while simultaneously capturing global contextual information to adaptively handle complex haze distributions. To further address detail loss, the DD Fusion module replaces conventional skip connections by incorporating a novel Frequency Domain Module (FDM) alongside channel and position attention. This allows for a more effective fusion of spatial and frequency features, significantly improving the recovery of fine-grained details, including color and texture information. Extensive experiments on nine publicly available remote sensing datasets demonstrate that ScaleViM-PDD consistently surpasses state-of-the-art baselines in both qualitative and quantitative evaluations, highlighting its strong generalization ability.
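The dual-domain fusion described above follows a simple pattern: transform features into the frequency domain, mix them there, and fuse the result back with the spatial branch. The sketch below illustrates that pattern in PyTorch; the module name, layer choices, and channel sizes are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a frequency-domain fusion module in the spirit of an FDM;
# all names and layer choices are hypothetical.
import torch
import torch.nn as nn

class FrequencyDomainModule(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs mix the real/imaginary parts of the spectrum per channel.
        self.freq_mix = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) spatial features.
        spec = torch.fft.rfft2(x, norm="ortho")          # complex spectrum
        freq = torch.cat([spec.real, spec.imag], dim=1)  # 2C real-valued maps
        freq = self.freq_mix(freq)
        real, imag = freq.chunk(2, dim=1)
        restored = torch.fft.irfft2(torch.complex(real, imag),
                                    s=x.shape[-2:], norm="ortho")
        # Fuse the spatial and frequency branches.
        return self.fuse(torch.cat([x, restored], dim=1))

feats = torch.randn(1, 32, 64, 64)
print(FrequencyDomainModule(32)(feats).shape)  # torch.Size([1, 32, 64, 64])
```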

25 pages, 5445 KiB  
Article
HyperspectralMamba: A Novel State Space Model Architecture for Hyperspectral Image Classification
by Jianshang Liao and Liguo Wang
Remote Sens. 2025, 17(15), 2577; https://doi.org/10.3390/rs17152577 - 24 Jul 2025
Viewed by 287
Abstract
Hyperspectral image classification faces challenges with high-dimensional spectral data and complex dependencies between bands. This paper proposes HyperspectralMamba, a novel architecture for hyperspectral image classification that integrates state space modeling with adaptive recalibration mechanisms. The method addresses limitations in existing techniques through three key innovations: (1) a novel dual-stream architecture that combines SSM global modeling with parallel convolutional local feature extraction, distinguishing our approach from existing single-stream SSM methods; (2) a band-adaptive feature recalibration mechanism specifically designed for hyperspectral data that adaptively adjusts the importance of different spectral band features; and (3) an effective feature fusion strategy that integrates global and local features through residual connections. Experimental results on three benchmark datasets—Indian Pines, Pavia University, and Salinas Valley—demonstrate that the proposed method achieves overall accuracies of 95.31%, 98.60%, and 96.40%, respectively, significantly outperforming existing convolutional neural networks, attention-enhanced networks, and Transformer methods. HyperspectralMamba demonstrates exceptional performance in small-sample class recognition and in distinguishing spectrally similar terrain, while maintaining lower computational complexity, providing a new technical approach for high-precision hyperspectral image classification.
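Innovation (2), band-adaptive recalibration, is closely related to squeeze-and-excitation: pool each spectral band to a scalar, learn per-band importance weights, and rescale the bands. Below is a minimal sketch under that assumption; the class name, reduction ratio, and tensor layout are hypothetical.

```python
# SE-style recalibration over spectral bands; an illustrative stand-in, not
# the paper's module.
import torch
import torch.nn as nn

class BandRecalibration(nn.Module):
    def __init__(self, n_bands: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(n_bands, n_bands // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(n_bands // reduction, n_bands),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, n_bands, H, W) hyperspectral patch features.
        squeeze = x.mean(dim=(2, 3))          # global average per band
        weights = self.gate(squeeze)          # learned per-band importance
        return x * weights[:, :, None, None]  # reweight each spectral band

x = torch.randn(2, 200, 9, 9)  # e.g., 200 bands, 9x9 spatial patch
print(BandRecalibration(200)(x).shape)
```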

35 pages, 7685 KiB  
Article
Spatial and Spectral Structure-Aware Mamba Network for Hyperspectral Image Classification
by Jie Zhang, Ming Sun and Sheng Chang
Remote Sens. 2025, 17(14), 2489; https://doi.org/10.3390/rs17142489 - 17 Jul 2025
Viewed by 280
Abstract
Recently, a network based on selective state space models (SSMs), Mamba, has emerged as a research focus in hyperspectral image (HSI) classification due to its linear computational complexity and strong long-range dependency modeling capability. However, because Mamba was originally designed for 1D causal sequence modeling, applying it to HSI tasks, which require simultaneous awareness of spatial and spectral structures, is challenging. Current Mamba-based HSI classification methods typically convert spatial structures into 1D sequences and employ various scanning patterns to capture spatial dependencies. However, these approaches inevitably disrupt spatial structures, leading to ineffective modeling of complex spatial relationships and increased computational costs due to elongated scanning paths. Moreover, they fail to exploit neighborhood spectral information, and so cannot mitigate the impact of spatial variability on classification performance. To address these limitations, we propose a novel model, Dual-Aware Discriminative Fusion Mamba (DADFMamba), which is simultaneously aware of spatial-spectral structures and adaptively integrates discriminative features. Specifically, we design a Spatial-Structure-Aware Fusion Module (SSAFM) to directly establish spatial neighborhood connectivity in the state space, preserving structural integrity. We then introduce a Spectral-Neighbor-Group Fusion Module (SNGFM), which enhances target spectral features by leveraging neighborhood spectral information before partitioning them into multiple spectral groups and exploring relations across those groups. Finally, we introduce a Feature Fusion Discriminator (FFD) to discriminate the importance of spatial and spectral features, enabling adaptive feature fusion. Extensive experiments on four benchmark HSI datasets demonstrate that DADFMamba outperforms state-of-the-art deep learning models in classification accuracy while maintaining low computational costs and parameter efficiency. Notably, it achieves superior performance with only 30 training samples per class, highlighting its data efficiency. Our study reveals the great potential of Mamba in HSI classification and provides valuable insights for future research.
(This article belongs to the Section Remote Sensing Image Processing)
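For context on the recurrence underlying the Mamba-style models in these results: a plain (non-selective) SSM processes a sequence with the linear recurrence h_t = A h_{t-1} + B u_t, y_t = C h_t, which costs time linear in the sequence length. The sketch below is purely illustrative and does not reproduce the paper's SSAFM or SNGFM modules.

```python
# A minimal (non-selective) SSM scan; illustrative only.
import torch

def ssm_scan(u, A, B, C):
    # u: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)
    h = torch.zeros(A.shape[0])
    ys = []
    for u_t in u:                      # linear in sequence length T
        h = A @ h + B @ u_t            # state update
        ys.append(C @ h)               # readout
    return torch.stack(ys)

T, d_in, d_state, d_out = 16, 4, 8, 4
y = ssm_scan(torch.randn(T, d_in), 0.9 * torch.eye(d_state),
             torch.randn(d_state, d_in), torch.randn(d_out, d_state))
print(y.shape)  # torch.Size([16, 4])
```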

36 pages, 25361 KiB  
Article
Remote Sensing Image Compression via Wavelet-Guided Local Structure Decoupling and Channel–Spatial State Modeling
by Jiahui Liu, Lili Zhang and Xianjun Wang
Remote Sens. 2025, 17(14), 2419; https://doi.org/10.3390/rs17142419 - 12 Jul 2025
Viewed by 464
Abstract
As the resolution and data volume of remote sensing imagery continue to grow, achieving efficient compression without sacrificing reconstruction quality remains a major challenge: traditional handcrafted codecs often fail to balance rate-distortion performance against computational complexity, while deep learning-based approaches, despite their superior representational capacity, still struggle to balance fine-detail adaptation and computational efficiency. Mamba, a state–space model (SSM)-based architecture, offers linear-time complexity and excels at capturing long-range dependencies in sequences, and it has been adopted in remote sensing compression tasks to model long-distance dependencies between pixels. However, despite its effectiveness in global context aggregation, Mamba's uniform bidirectional scanning is insufficient for capturing high-frequency structures such as edges and textures. Moreover, existing visual state–space (VSS) models built upon Mamba typically treat all channels equally and lack mechanisms to dynamically focus on semantically salient spatial regions. To address these issues, we present an innovative architecture for remote sensing image compression, called the Multi-scale Channel Global Mamba Network (MGMNet). MGMNet integrates a spatial–channel dynamic weighting mechanism into the Mamba architecture, enhancing global semantic modeling while selectively emphasizing informative features. It comprises two key modules. The Wavelet Transform-guided Local Structure Decoupling (WTLS) module applies multi-scale wavelet decomposition to disentangle and separately encode low- and high-frequency components, enabling efficient parallel modeling of global contours and local textures. The Channel–Global Information Modeling (CGIM) module enhances conventional VSS by introducing a dual-path attention strategy that reweights spatial and channel information, improving the modeling of long-range dependencies and edge structures. We conducted extensive evaluations of MGMNet on three distinct remote sensing datasets. The results show that MGMNet outperforms current state-of-the-art models across various performance metrics.
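The low/high-frequency decoupling performed by WTLS can be pictured with a single-level 2D Haar transform, which splits an image into one low-frequency (LL) and three high-frequency (LH, HL, HH) subbands. Below is a generic sketch using strided convolutions; the kernel normalization and single decomposition level are simplifying assumptions, not the paper's code.

```python
# Single-level 2D Haar decomposition via strided convolutions; illustrative.
import torch
import torch.nn.functional as F

def haar_dwt2(x):
    # x: (B, C, H, W) with even H, W. Returns (LL, LH, HL, HH) subbands.
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([ll, lh, hl, hh]).unsqueeze(1)   # (4, 1, 2, 2)
    B, C, H, W = x.shape
    out = F.conv2d(x.reshape(B * C, 1, H, W), kernels, stride=2)
    out = out.reshape(B, C, 4, H // 2, W // 2)
    return out[:, :, 0], out[:, :, 1], out[:, :, 2], out[:, :, 3]

x = torch.randn(1, 3, 64, 64)
ll, lh, hl, hh = haar_dwt2(x)
print(ll.shape)  # torch.Size([1, 3, 32, 32]); LL holds contours, HH textures
```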

20 pages, 90560 KiB  
Article
A Hybrid MIL Approach Leveraging Convolution and State-Space Model for Whole-Slide Image Cancer Subtyping
by Dehui Bi and Yuqi Zhang
Mathematics 2025, 13(13), 2178; https://doi.org/10.3390/math13132178 - 3 Jul 2025
Viewed by 266
Abstract
Precise identification of cancer subtypes from whole slide images (WSIs) is pivotal in tailoring patient-specific therapies. Under the weakly supervised multiple instance learning (MIL) paradigm, existing techniques frequently fall short in simultaneously capturing local tissue textures and long-range contextual relationships. To address these challenges, we introduce ConvMixerSSM, a hybrid model that integrates a ConvMixer block for local spatial representation, a state space model (SSM) block for capturing long-range dependencies, and a feature-gated block to enhance informative feature selection. The model was evaluated on the TCGA-NSCLC dataset and the CAMELYON16 dataset for cancer subtyping tasks. Extensive experiments, including comparisons with state-of-the-art MIL methods and ablation studies, were conducted to assess the contribution of each component. ConvMixerSSM achieved an AUC of 97.83%, an ACC of 91.82%, and an F1 score of 91.18%, outperforming existing MIL baselines on the TCGA-NSCLC dataset. The ablation study revealed that each block contributed positively to performance, with the full model showing the most balanced and superior results. Moreover, our visualization results further confirm that ConvMixerSSM can effectively identify tumor regions within WSIs, providing model interpretability and clinical relevance. These findings suggest that ConvMixerSSM has strong potential for advancing computational pathology applications in clinical decision-making.
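ConvMixer blocks, from the ConvMixer literature, pair a depthwise convolution with a residual connection for spatial mixing, followed by a pointwise convolution for channel mixing. The sketch below shows a standard block of that shape; the kernel size and normalization choices are assumptions rather than this paper's exact configuration.

```python
# A standard ConvMixer block: depthwise conv + residual, then pointwise conv.
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 9):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        x = x + self.depthwise(x)   # residual over spatial (depthwise) mixing
        return self.pointwise(x)    # channel mixing

print(ConvMixerBlock(64)(torch.randn(1, 64, 14, 14)).shape)
```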

23 pages, 3677 KiB  
Article
HG-Mamba: A Hybrid Geometry-Aware Bidirectional Mamba Network for Hyperspectral Image Classification
by Xiaofei Yang, Jiafeng Yang, Lin Li, Suihua Xue, Haotian Shi, Haojin Tang and Xiaohui Huang
Remote Sens. 2025, 17(13), 2234; https://doi.org/10.3390/rs17132234 - 29 Jun 2025
Viewed by 469
Abstract
Deep learning has demonstrated significant success in hyperspectral image (HSI) classification by effectively leveraging spatial–spectral feature learning. However, current approaches encounter three challenges: (1) high spectral redundancy and the presence of noisy bands, which impair the extraction of discriminative features; (2) limited spatial receptive fields inherent in convolutional operations; and (3) unidirectional context modeling that inadequately captures bidirectional dependencies in non-causal HSI data. To address these challenges, this paper proposes HG-Mamba, a novel hybrid geometry-aware bidirectional Mamba network for HSI classification. HG-Mamba synergistically integrates convolutional operations, geometry-aware filtering, and bidirectional state-space models (SSMs) to achieve robust spectral–spatial representation learning. The framework comprises two stages. The first stage, spectral compression and discrimination enhancement, employs multi-scale spectral convolutions alongside a spectral bidirectional Mamba (SeBM) module to suppress redundant bands while modeling long-range spectral dependencies. The second stage, spatial structure perception and context modeling, incorporates a Gaussian Distance Decay (GDD) mechanism to adaptively reweight spatial neighbors based on geometric distances, coupled with a spatial bidirectional Mamba (SaBM) module for comprehensive global context modeling. The GDD mechanism facilitates boundary-aware feature extraction by prioritizing spatially proximate pixels, while the bidirectional SSMs mitigate unidirectional bias through parallel forward–backward state transitions. Extensive experiments on the Indian Pines, Houston2013, and WHU-Hi-LongKou datasets demonstrate the superior performance of HG-Mamba, achieving overall accuracies of 94.91%, 98.41%, and 98.67%, respectively.
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
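The Gaussian Distance Decay idea, downweighting spatial neighbors by their geometric distance from the center pixel, can be illustrated with a small weight map, as in the sketch below; the window size and sigma are assumed values, not the paper's settings.

```python
# Gaussian distance-decay weights over a spatial window; illustrative.
import torch

def gdd_weights(window: int = 7, sigma: float = 2.0) -> torch.Tensor:
    c = window // 2
    ys, xs = torch.meshgrid(torch.arange(window), torch.arange(window),
                            indexing="ij")
    d2 = (ys - c) ** 2 + (xs - c) ** 2        # squared distance to center
    w = torch.exp(-d2 / (2 * sigma ** 2))     # Gaussian decay
    return w / w.sum()                        # normalize to sum to 1

w = gdd_weights()
print(w[3, 3], w[0, 0])  # center weight is much larger than corner weight
```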

24 pages, 4434 KiB  
Article
MRFP-Mamba: Multi-Receptive Field Parallel Mamba for Hyperspectral Image Classification
by Xiaofei Yang, Lin Li, Suihua Xue, Sihuan Li, Wanjun Yang, Haojin Tang and Xiaohui Huang
Remote Sens. 2025, 17(13), 2208; https://doi.org/10.3390/rs17132208 - 26 Jun 2025
Viewed by 512
Abstract
Deep learning has achieved remarkable success in hyperspectral image (HSI) classification, attributed to its powerful feature extraction capabilities. However, existing methods face several challenges: Convolutional Neural Networks (CNNs) struggle to model long-range spectral dependencies because of their restricted receptive fields; Transformers are constrained by their quadratic computational complexity; and Mamba-based methods fail to fully exploit spatial–spectral interactions when handling high-dimensional HSI data. To address these limitations, we propose MRFP-Mamba, a novel Multi-Receptive-Field Parallel Mamba architecture that integrates hierarchical spatial feature extraction with efficient modeling of spatial–spectral dependencies. MRFP-Mamba introduces two key modules: (1) a multi-receptive-field convolutional module employing parallel 1×1, 3×3, 5×5, and 7×7 kernels to capture fine-to-coarse spatial features, thereby improving discriminability for multi-scale objects; and (2) a parameter-optimized Vision Mamba branch that models global spatial–spectral relationships through structured state space mechanisms. Experimental results demonstrate that MRFP-Mamba consistently surpasses existing CNN-, Transformer-, and state space model (SSM)-based approaches across four widely used HSI benchmark datasets: PaviaU, Indian Pines, Houston 2013, and WHU-Hi-LongKou. Compared with MambaHSI, MRFP-Mamba improves Overall Accuracy (OA) by 0.69%, 0.30%, 0.40%, and 0.97%, respectively, validating its superior classification capability and robustness.
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
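A multi-receptive-field convolutional module of the kind described can be sketched as four parallel branches with 1×1, 3×3, 5×5, and 7×7 kernels whose outputs are concatenated and fused. The channel split and fusion layer below are assumptions for illustration, not the paper's implementation.

```python
# Parallel multi-kernel branches, concatenated then fused; illustrative.
import torch
import torch.nn as nn

class MultiReceptiveField(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])
        self.fuse = nn.Conv2d(out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees a different receptive field; concat then fuse.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(MultiReceptiveField(32, 64)(torch.randn(1, 32, 16, 16)).shape)
```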

28 pages, 114336 KiB  
Article
Mamba-STFM: A Mamba-Based Spatiotemporal Fusion Method for Remote Sensing Images
by Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo and Yuanchen Huang
Remote Sens. 2025, 17(13), 2135; https://doi.org/10.3390/rs17132135 - 21 Jun 2025
Viewed by 598
Abstract
Spatiotemporal fusion techniques can generate remote sensing imagery with high spatial and temporal resolutions, thereby facilitating Earth observation. However, traditional methods are constrained by linear assumptions; generative adversarial networks suffer from mode collapse; convolutional neural networks struggle to capture global context; and Transformers are hard to scale due to quadratic computational complexity and high memory consumption. To address these challenges, this study introduces an end-to-end remote sensing image spatiotemporal fusion approach based on the Mamba architecture (Mamba-spatiotemporal fusion model, Mamba-STFM), marking the first application of Mamba in this domain and presenting a novel paradigm for spatiotemporal fusion model design. Mamba-STFM consists of a feature extraction encoder and a feature fusion decoder. At the core of the encoder is the visual state space-FuseCore-AttNet block (VSS-FCAN block), which deeply integrates linear-complexity cross-scan global perception with a channel attention mechanism, significantly reducing quadratic-level computation and memory overhead while improving inference throughput through parallel scanning and kernel fusion techniques. The decoder's core is the spatiotemporal mixture-of-experts fusion module (STF-MoE block), composed of our novel spatial expert and temporal expert modules. The spatial expert adaptively adjusts channel weights to optimize spatial feature representation, enabling precise alignment and fusion of multi-resolution images, while the temporal expert incorporates a temporal squeeze-and-excitation mechanism and selective state space model (SSM) techniques to efficiently capture short-range temporal dependencies, maintain linear sequence modeling complexity, and further enhance overall spatiotemporal fusion throughput. Extensive experiments on public datasets demonstrate that Mamba-STFM outperforms existing methods in fusion quality; ablation studies validate the effectiveness of each core module; and efficiency analyses and application comparisons further confirm the model's superior performance.
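The mixture-of-experts fusion in STF-MoE can be caricatured as a two-expert soft gate that blends a spatial expert and a temporal expert per token. The sketch below uses stand-in linear layers; every module name and dimension is hypothetical, and the paper's actual expert designs are not reproduced.

```python
# Toy two-expert soft-gated fusion; illustrative only.
import torch
import torch.nn as nn

class TwoExpertFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.spatial_expert = nn.Linear(dim, dim)
        self.temporal_expert = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, 2)

    def forward(self, x):
        # x: (B, T, dim) token features; the gate softly routes between experts.
        g = torch.softmax(self.gate(x), dim=-1)            # (B, T, 2)
        return (g[..., :1] * self.spatial_expert(x)
                + g[..., 1:] * self.temporal_expert(x))

print(TwoExpertFusion(32)(torch.randn(2, 10, 32)).shape)
```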

19 pages, 6772 KiB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Viewed by 978
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba's strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV's geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model's ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN.
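The Center Masking augmentation can be sketched as zeroing out a central square of the UAV image so the model must rely on the surrounding context. The masking ratio below is an assumed value, not the paper's setting.

```python
# Center-masking augmentation sketch; ratio is an assumption.
import torch

def center_mask(img: torch.Tensor, ratio: float = 0.25) -> torch.Tensor:
    # img: (C, H, W); zero a (ratio*H x ratio*W) square at the image center.
    _, H, W = img.shape
    mh, mw = int(H * ratio), int(W * ratio)
    top, left = (H - mh) // 2, (W - mw) // 2
    out = img.clone()
    out[:, top:top + mh, left:left + mw] = 0.0
    return out

print(center_mask(torch.randn(3, 256, 256)).shape)
```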

23 pages, 10182 KiB  
Article
HyperSMamba: A Lightweight Mamba for Efficient Hyperspectral Image Classification
by Mengyuan Sun, Liejun Wang, Shaochen Jiang, Shuli Cheng and Lihan Tang
Remote Sens. 2025, 17(12), 2008; https://doi.org/10.3390/rs17122008 - 11 Jun 2025
Viewed by 650
Abstract
Deep learning has recently achieved remarkable progress in hyperspectral image (HSI) classification. Among these advancements, Transformer-based models have gained considerable attention due to their ability to establish long-range dependencies. However, the quadratic computational complexity of the self-attention mechanism limits their application in hyperspectral image classification (HSIC). Recently, the Mamba architecture has shown outstanding performance in 1D sequence modeling tasks owing to its lightweight linear sequence operations and efficient parallel scanning capabilities. Nevertheless, its application to HSI classification still faces challenges. Most existing Mamba-based approaches adopt various selective scanning strategies for HSI serialization, ensuring the adjacency of scanning sequences to enhance spatial continuity, but these methods substantially increase computational overhead. To overcome these challenges, this study proposes the Hyperspectral Spatial Mamba (HyperSMamba) model for HSIC, aiming to reduce computational complexity while improving classification performance. The framework consists of the following key components: (1) a Multi-Scale Spatial Mamba (MS-Mamba) encoder, which refines the state-space model (SSM) computation by incorporating a Multi-Scale State Fusion Module (MSFM) after the state transition equations of the original SSM; this module aggregates adjacent state representations to reinforce spatial dependencies among local features; and (2) an Adaptive Fusion Attention Module (AFAttention) that dynamically fuses bidirectional Mamba outputs to optimize feature representation. Experiments were performed on three HSI datasets, and the findings demonstrate that HyperSMamba attains overall accuracies of 94.86%, 97.72%, and 97.38% on the Indian Pines, Pavia University, and Salinas datasets, respectively, while maintaining low computational complexity. These results confirm the model's effectiveness and potential for practical application in HSIC tasks.
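The MSFM idea of aggregating adjacent state representations can be sketched as depthwise 1D convolutions over the per-step state sequence at a couple of scales, added back residually. Everything below (scales, kernel sizes, residual form) is an assumption for illustration, not the paper's module.

```python
# Multi-scale smoothing of per-step SSM states; illustrative.
import torch
import torch.nn as nn

class MultiScaleStateFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, k, padding=k // 2, groups=dim)
            for k in (3, 5)
        ])

    def forward(self, h):
        # h: (B, T, dim) per-step states; fuse neighbors at two scales.
        hc = h.transpose(1, 2)                      # (B, dim, T)
        fused = sum(conv(hc) for conv in self.convs) / len(self.convs)
        return (hc + fused).transpose(1, 2)         # residual, back to (B, T, dim)

print(MultiScaleStateFusion(16)(torch.randn(2, 81, 16)).shape)
```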

21 pages, 2277 KiB  
Article
Visual Information Decoding Based on State-Space Model with Neural Pathways Incorporation
by Haidong Wang, Jianhua Zhang, Qia Shan, Pengfei Xiao and Ao Liu
Electronics 2025, 14(11), 2245; https://doi.org/10.3390/electronics14112245 - 30 May 2025
Viewed by 633
Abstract
In contemporary visual decoding models, traditional neural network-based methods have made some advancements; however, their performance on complex visual tasks remains constrained. This limitation is primarily due to local receptive fields and the resulting inability to effectively capture visual information, which leads to the loss of essential contextual details. Visual processing in the brain begins in the retina, where information is transmitted via the optic nerve to the lateral geniculate nucleus (LGN) and subsequently progresses along the ventral pathway for layered processing. This natural process is not fully represented in current decoding models. In this paper, we propose a state-space-based visual information decoding model, SSM-VIDM, which enhances performance on complex visual tasks by aligning with the brain's visual processing mechanisms. This approach overcomes the limitations of traditional convolutional neural networks (CNNs) regarding local receptive fields, thereby preserving contextual information in visual tasks. Experimental results demonstrate that the proposed model outperforms traditional decoding models and exhibits higher accuracy in image recognition tasks. Our findings suggest that a visual decoding model based on the lateral geniculate nucleus and the ventral pathway can enhance decoding performance.
(This article belongs to the Special Issue Digital Intelligence Technology and Applications)

14 pages, 3525 KiB  
Article
MRD: A Linear-Complexity Encoder for Real-Time Vehicle Detection
by Kaijie Li and Xiaoci Huang
World Electr. Veh. J. 2025, 16(6), 307; https://doi.org/10.3390/wevj16060307 - 30 May 2025
Viewed by 606
Abstract
Vehicle detection algorithms constitute a fundamental pillar in intelligent driving systems and smart transportation infrastructure. Nevertheless, the inherent complexity and dynamic variability of traffic scenarios present substantial technical barriers to robust vehicle detection. While visual transformer-based detection architectures have achieved performance breakthroughs through enhanced perceptual capabilities and established themselves as the dominant paradigm in this domain, their practical deployment is hindered by the quadratic computational complexity of the self-attention mechanism. To address these limitations, this study introduces Mamba RT-DETR (MRD), an optimized architecture featuring three principal innovations: (1) We devise an efficient vehicle detection Mamba (EVDMamba) network that integrates a linear-complexity state space model (SSM) to substantially mitigate computational overhead while preserving feature extraction efficacy. (2) To counteract the constrained receptive fields and suboptimal spatial localization of conventional SSM sequence modeling, we implement a multi-branch collaborative learning framework that optimizes channel dimension processing, thereby augmenting the model's capacity to capture critical spatial dependencies. (3) Comprehensive evaluations on the BDD100K benchmark demonstrate that the MRD architecture achieves a 3.1% enhancement in mean average precision (mAP) relative to state-of-the-art RT-DETR variants while reducing the parameter count by 55.7%, a dual optimization of accuracy and efficiency.
(This article belongs to the Special Issue Recent Advances in Intelligent Vehicle)

22 pages, 2133 KiB  
Article
Classification of Whole-Slide Pathology Images Based on State Space Models and Graph Neural Networks
by Feng Ding, Chengfei Cai, Jun Li, Mingxin Liu, Yiping Jiao, Zhengcan Wu and Jun Xu
Electronics 2025, 14(10), 2056; https://doi.org/10.3390/electronics14102056 - 19 May 2025
Viewed by 731
Abstract
Whole-slide images (WSIs) pose significant analytical challenges due to their large data scale and complexity. Multiple instance learning (MIL) has emerged as an effective solution for WSI classification, but existing frameworks often lack flexibility in feature integration and underutilize sequential information. To address these limitations, this work proposes a novel MIL framework, Dynamic Graph and State Space Model-Based MIL (DG-SSM-MIL), which combines graph neural networks and selective state space models, leveraging the former's ability to extract local and spatial features and the latter's advantage in comprehensively understanding long-sequence instances. This enhances the model's performance in diverse instance classification, improves its capability to handle long-sequence data, and increases the precision and scalability of feature fusion. We propose the Dynamic Graph and State Space Model (DynGraph-SSM) module, which aggregates local and spatial information of image patches through directed graphs and learns global feature representations using the Mamba model. The directed graph structure also alleviates the unidirectional scanning limitation of Mamba and enhances its ability to process pathological images with dispersed lesion distributions. We validate the effectiveness of the proposed method on features extracted from two pretrained models across four public medical image datasets: BRACS, TCGA-NSCLC, TCGA-RCC, and CAMELYON16. Experimental results demonstrate that DG-SSM-MIL consistently outperforms existing MIL methods. For example, when using ResNet-50 features, our model achieves the highest AUCs of 0.936, 0.785, 0.879, and 0.957 on TCGA-NSCLC, BRACS, CAMELYON16, and TCGA-RCC, respectively. Similarly, with UNI features, DG-SSM-MIL reaches AUCs of 0.968, 0.846, 0.993, and 0.990, surpassing all baselines. These results confirm the effectiveness and generalizability of our approach in diverse WSI classification tasks.
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
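The directed-graph aggregation step can be pictured as one round of message passing: each patch embedding is updated with the mean of its in-neighbors before the sequence is handed to the Mamba model. The adjacency and update rule below are illustrative assumptions, not the paper's DynGraph-SSM construction.

```python
# One round of mean-aggregation message passing over a directed patch graph.
import torch

def directed_aggregate(feats: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # feats: (N, d) patch embeddings; edges: (E, 2) directed (src, dst) pairs.
    N = feats.shape[0]
    deg = torch.zeros(N)
    deg.index_add_(0, edges[:, 1], torch.ones(edges.shape[0]))
    msgs = torch.zeros_like(feats)
    msgs.index_add_(0, edges[:, 1], feats[edges[:, 0]])  # sum over in-neighbors
    return feats + msgs / deg.clamp(min=1).unsqueeze(1)  # mean-aggregated residual

feats = torch.randn(5, 8)
edges = torch.tensor([[0, 1], [1, 2], [3, 2], [2, 4]])
print(directed_aggregate(feats, edges).shape)  # torch.Size([5, 8])
```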

25 pages, 2225 KiB  
Article
MambaLLM: Integrating Macro-Index and Micro-Stock Data for Enhanced Stock Price Prediction
by Jin Yan and Yuling Huang
Mathematics 2025, 13(10), 1599; https://doi.org/10.3390/math13101599 - 13 May 2025
Viewed by 1504
Abstract
Accurate stock price prediction requires the integration of heterogeneous data streams, yet conventional techniques struggle to simultaneously leverage fine-grained micro-stock features and broader macroeconomic indicators. To address this gap, we propose MambaLLM, a novel framework that fuses macro-index and micro-stock inputs through the synergistic use of state-space models (SSMs) and large language models (LLMs). Our two-branch architecture comprises (i) a Micro-Stock Encoder, a Mamba-based temporal encoder for processing granular stock-level data (prices, volumes, and technical indicators), and (ii) a Macro-Index Analyzer, an LLM module (employing DeepSeek R1 7B distillation) that interprets market-level index trends (e.g., S&P 500) to produce textual summaries, which are then distilled into compact embeddings via FinBERT. By merging these multi-scale representations through a concatenation mechanism and subsequently refining them with multi-layer perceptrons (MLPs), MambaLLM dynamically captures both asset-specific price behavior and systemic market fluctuations. Extensive experiments on six major U.S. stocks (AAPL, AMZN, MSFT, TSLA, GOOGL, and META) reveal that MambaLLM delivers up to a 28.50% reduction in RMSE compared with the next-best models, surpassing traditional recurrent neural networks and Mamba-based baselines under volatile market conditions. This marked performance gain highlights the framework's unique ability to merge structured financial time series with semantically rich macroeconomic narratives. Altogether, our findings underscore the scalability and adaptability of MambaLLM, offering a powerful next-generation tool for financial forecasting and risk management.
(This article belongs to the Special Issue Applied Mathematics in Data Science and High-Performance Computing)
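The fusion-by-concatenation step is straightforward to sketch: a price-series encoding and a text embedding (FinBERT's hidden size is 768) are concatenated and refined by an MLP. The stand-in dimensions and head below are assumptions; the paper's Mamba encoder and LLM module are not reproduced.

```python
# Concatenation fusion head over series and text embeddings; illustrative.
import torch
import torch.nn as nn

class ConcatFusionHead(nn.Module):
    def __init__(self, series_dim: int, text_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(series_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # next-step price prediction
        )

    def forward(self, series_enc, text_emb):
        return self.mlp(torch.cat([series_enc, text_emb], dim=-1))

head = ConcatFusionHead(series_dim=128, text_dim=768)  # 768 = FinBERT hidden size
print(head(torch.randn(4, 128), torch.randn(4, 768)).shape)  # (4, 1)
```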

13 pages, 5336 KiB  
Article
SnowMamba: Achieving More Precise Snow Removal with Mamba
by Guoqiang Wang, Yanyun Zhou, Fei Shi and Zhenhong Jia
Appl. Sci. 2025, 15(10), 5404; https://doi.org/10.3390/app15105404 - 12 May 2025
Viewed by 423
Abstract
Due to the diversity and semi-transparency of snowflakes, accurately locating and reconstructing background information during image restoration poses a significant challenge. Snowflakes obscure image details, thereby affecting downstream tasks such as object recognition and image segmentation. Although Convolutional Neural Networks (CNNs) and Transformers have achieved promising results in snow removal through local or global feature processing, residual snowflakes or shadows persist in restored images. Inspired by the recent popularity of State Space Models (SSMs), this paper proposes a Mamba-based multi-scale desnowing network (SnowMamba), which effectively models the long-range dependencies of snowflakes. This enables the precise localization and removal of snow particles, addressing the issue of residual snowflakes and shadows in images. Specifically, we design a four-stage encoder–decoder network that incorporates Snow Caption Mamba (SCM) and SE modules to extract comprehensive snowflake and background information. The extracted multi-scale snow and background features are then fed into the proposed Multi-Scale Residual Interaction Network (MRNet) to learn and reconstruct clear, snow-free background images. Extensive experiments demonstrate that the proposed method outperforms other mainstream desnowing approaches in both qualitative and quantitative evaluations on three standard image desnowing datasets.
