Search Results (389)

Search Parameters:
Keywords = Mamba

34 pages, 10017 KB  
Article
U-H-Mamba: An Uncertainty-Aware Hierarchical State-Space Model for Lithium-Ion Battery Remaining Useful Life Prediction Using Hybrid Laboratory and Real-World Datasets
by Zhihong Wen, Xiangpeng Liu, Wenshu Niu, Hui Zhang and Yuhua Cheng
Energies 2026, 19(2), 414; https://doi.org/10.3390/en19020414 - 14 Jan 2026
Viewed by 152
Abstract
Accurate prognosis of the remaining useful life (RUL) for lithium-ion batteries is critical for mitigating range anxiety and ensuring the operational safety of electric vehicles. However, existing data-driven methods often struggle to maintain robustness when transferring from controlled laboratory conditions to complex, sensor-limited, real-world environments. To bridge this gap, this study presents U-H-Mamba, a novel uncertainty-aware hierarchical framework trained on a massive hybrid repository comprising over 146,000 charge–discharge cycles from both laboratory benchmarks and operational electric vehicle datasets. The proposed architecture employs a two-level design to decouple degradation dynamics, where a Multi-scale Temporal Convolutional Network functions as the base encoder to extract fine-grained electrochemical fingerprints, including derived virtual impedance proxies, from high-frequency intra-cycle measurements. Subsequently, an enhanced Pressure-Aware Multi-Head Mamba decoder models the long-range inter-cycle degradation trajectories with linear computational complexity. To guarantee reliability in safety-critical applications, a hybrid uncertainty quantification mechanism integrating Monte Carlo Dropout with Inductive Conformal Prediction is implemented to generate calibrated confidence intervals. Extensive empirical evaluations demonstrate the framework’s superior performance, achieving an RMSE of 3.2 cycles on the NASA dataset and 5.4 cycles on the highly variable NDANEV dataset, thereby outperforming state-of-the-art baselines by 20–40%. Furthermore, SHAP-based interpretability analysis confirms that the model correctly identifies physics-informed pressure dynamics as critical degradation drivers, validating its zero-shot generalization capabilities. With high accuracy and linear scalability, the U-H-Mamba model offers a viable and physically interpretable solution for cloud-based prognostics in large-scale electric vehicle fleets.
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)
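As a rough illustration of how the two uncertainty ingredients named in this abstract are usually combined, the sketch below pairs Monte Carlo Dropout sampling with an inductive conformal calibration step. The model, sample count, and coverage level are illustrative assumptions, not details from the paper.

```python
import numpy as np
import torch

def mc_dropout_predict(model, x, n_samples=50):
    # Keep dropout layers active at inference so repeated forward
    # passes sample from the approximate predictive distribution.
    model.train()
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)

def conformal_radius(cal_preds, cal_targets, alpha=0.1):
    # Inductive conformal prediction: the finite-sample-corrected
    # (1 - alpha) quantile of absolute residuals on a held-out
    # calibration split yields a radius q such that
    # [prediction - q, prediction + q] covers with probability >= 1 - alpha.
    scores = np.abs(cal_preds - cal_targets)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")
```

At test time, the MC mean would serve as the point RUL estimate and mean ± q as the calibrated interval; how the paper couples the dropout spread with the conformal radius is not specified in the abstract.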

24 pages, 6383 KB  
Article
FF-Mamba-YOLO: An SSM-Based Benchmark for Forest Fire Detection in UAV Remote Sensing Images
by Binhua Guo, Dinghui Liu, Zhou Shen and Tiebin Wang
J. Imaging 2026, 12(1), 43; https://doi.org/10.3390/jimaging12010043 - 13 Jan 2026
Viewed by 171
Abstract
Timely and accurate detection of forest fires with unmanned aerial vehicle (UAV) remote sensing is of paramount importance. However, multiscale targets and complex environmental interference in UAV remote sensing images pose significant challenges during detection tasks. To address these obstacles, this paper presents FF-Mamba-YOLO, a novel framework based on the principles of Mamba and YOLO (You Only Look Once) that leverages innovative modules and architectures to overcome these limitations. First, we introduce MFEBlock and MFFBlock, based on state space models (SSMs), in the backbone and neck parts of the network, respectively, enabling the model to effectively capture global dependencies. Second, we construct CFEBlock, a module that performs feature enhancement before SSM processing, improving local feature processing capabilities. Furthermore, we propose MGBlock, which adopts a dynamic gating mechanism, enhancing the model’s adaptive processing capabilities and robustness. Finally, we enhance the structure of the Path Aggregation Feature Pyramid Network (PAFPN) to improve feature fusion quality and introduce DySample to enhance image resolution without significantly increasing computational costs. Experimental results on our self-constructed forest fire image dataset demonstrate that the model achieves 67.4% mAP@50, 36.3% mAP@50:95, and 64.8% precision, outperforming previous state-of-the-art methods. These results highlight the potential of FF-Mamba-YOLO in forest fire monitoring.
(This article belongs to the Section Computer Vision and Pattern Recognition)
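The abstract does not detail MGBlock's dynamic gating, but the general pattern it names, a learned gate that adaptively scales a transformed feature map, can be sketched as follows; the module and layer choices here are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    # Illustrative dynamic gate: a 1x1 conv predicts a per-channel,
    # per-position weight in (0, 1) that decides how much of the
    # transformed features to pass through the residual path.
    def __init__(self, channels):
        super().__init__()
        self.transform = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        g = self.gate(x)
        return x + g * self.transform(x)
```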

26 pages, 5686 KB  
Article
MAFMamba: A Multi-Scale Adaptive Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Boxu Li, Xiaobing Yang and Yingjie Fan
Sensors 2026, 26(2), 531; https://doi.org/10.3390/s26020531 - 13 Jan 2026
Viewed by 81
Abstract
With rapid advancements in sub-meter satellite and aerial imaging technologies, high-resolution remote sensing imagery has become a pivotal source for geospatial information acquisition. However, current semantic segmentation models encounter two primary challenges: (1) the inherent trade-off between capturing long-range global context and preserving precise local structural details—where excessive reliance on downsampled deep semantics often results in blurred boundaries and the loss of small objects; and (2) the difficulty in modeling complex scenes with extreme scale variations, where objects of the same category exhibit drastically different morphological features. To address these issues, this paper introduces MAFMamba, a multi-scale adaptive fusion visual Mamba network tailored for high-resolution remote sensing images. To mitigate scale variation, we design a lightweight hybrid encoder incorporating an Adaptive Multi-scale Mamba Block (AMMB) in each stage. Driven by a Multi-scale Adaptive Fusion (MSAF) mechanism, the AMMB dynamically generates pixel-level weights to recalibrate cross-level features, establishing a robust multi-scale representation. Simultaneously, to balance local details and global semantics, we introduce a Global–Local Feature Enhancement Mamba (GLMamba) in the decoder. This module synergistically integrates local fine-grained features extracted by convolutions with global long-range dependencies modeled by the Visual State Space (VSS) layer. Furthermore, we propose a Multi-Scale Cross-Attention Fusion (MSCAF) module to bridge the semantic gap between the encoder’s shallow details and the decoder’s high-level semantics via an efficient cross-attention mechanism. Extensive experiments on the ISPRS Potsdam and Vaihingen datasets demonstrate that MAFMamba surpasses state-of-the-art Convolutional Neural Network (CNN), Transformer, and Mamba-based methods in terms of mIoU and mF1 scores. Notably, it achieves superior accuracy while maintaining linear computational complexity and low memory usage, underscoring its efficiency in complex remote sensing scenarios.
(This article belongs to the Special Issue Intelligent Sensors and Artificial Intelligence in Building)
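One plausible reading of the MSAF mechanism's pixel-level weights is a softmax competition across feature levels at every spatial location. The sketch below assumes the cross-level maps have already been resized to a common shape; all names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PixelwiseFusion(nn.Module):
    # Given k same-shape feature maps, predict a k-way weight per pixel
    # and return their convex combination.
    def __init__(self, channels, k):
        super().__init__()
        self.score = nn.Conv2d(k * channels, k, 1)

    def forward(self, feats):                  # feats: list of k (B, C, H, W)
        w = self.score(torch.cat(feats, dim=1)).softmax(dim=1)  # (B, k, H, W)
        stacked = torch.stack(feats, dim=1)                     # (B, k, C, H, W)
        return (w.unsqueeze(2) * stacked).sum(dim=1)            # (B, C, H, W)
```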

27 pages, 80350 KB  
Article
Pose-Based Static Sign Language Recognition with Deep Learning for Turkish, Arabic, and American Sign Languages
by Rıdvan Yayla, Hakan Üçgün and Mahmud Abbas
Sensors 2026, 26(2), 524; https://doi.org/10.3390/s26020524 - 13 Jan 2026
Viewed by 134
Abstract
Advancements in artificial intelligence have significantly enhanced communication for individuals with hearing impairments. This study presents a robust cross-lingual Sign Language Recognition (SLR) framework for Turkish, American English, and Arabic sign languages. The system utilizes the lightweight MediaPipe library for efficient hand landmark extraction, ensuring stable and consistent feature representation across diverse linguistic contexts. Datasets were meticulously constructed from nine public-domain sources (four Arabic, three American, and two Turkish). The final training data comprises curated image datasets, with frames for each language carefully selected from varying angles and distances to ensure high diversity. A comprehensive comparative evaluation was conducted across three state-of-the-art deep learning architectures—ConvNeXt (CNN-based), Swin Transformer (ViT-based), and Vision Mamba (SSM-based)—all applied to identical feature sets. The evaluation demonstrates the superior performance of contemporary vision Transformers and state space models in capturing subtle spatial cues across diverse sign languages. Our approach provides a comparative analysis of model generalization capabilities across three distinct sign languages, offering valuable insights for model selection in pose-based SLR systems.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
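The MediaPipe front-end the study relies on exposes 21 three-dimensional landmarks per detected hand. A minimal extraction sketch follows; the flattening into a 63-value feature vector is our illustrative choice, not necessarily the paper's exact representation.

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

def extract_landmarks(path):
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    result = hands.process(image)
    if not result.multi_hand_landmarks:
        return None                                   # no hand detected
    lm = result.multi_hand_landmarks[0].landmark      # 21 landmarks
    return [c for p in lm for c in (p.x, p.y, p.z)]   # 63 values
```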

53 pages, 3354 KB  
Review
Mamba for Remote Sensing: Architectures, Hybrid Paradigms, and Future Directions
by Zefeng Li, Long Zhao, Yihang Lu, Yue Ma and Guoqing Li
Remote Sens. 2026, 18(2), 243; https://doi.org/10.3390/rs18020243 - 12 Jan 2026
Viewed by 118
Abstract
Modern Earth observation (EO) combines high spatial resolution, wide swath, and dense temporal sampling, producing image grids and sequences far beyond the regime of standard vision benchmarks. Convolutional networks remain strong baselines but struggle to aggregate kilometre-scale context and long temporal dependencies without heavy tiling and downsampling, while Transformers incur quadratic costs in token count and often rely on aggressive patching or windowing. Recently proposed visual state-space models, typified by Mamba, offer linear-time sequence processing with selective recurrence and have therefore attracted rapid interest in remote sensing. This survey analyses how far that promise is realised in practice. We first review the theoretical substrates of state-space models and the role of scanning and serialization when mapping two- and three-dimensional EO data onto one-dimensional sequences. A taxonomy of scan paths and architectural hybrids is then developed, covering centre-focused and geometry-aware trajectories, CNN– and Transformer–Mamba backbones, and multimodal designs for hyperspectral, multisource fusion, segmentation, detection, restoration, and domain-specific scientific applications. Building on this evidence, we delineate the task regimes in which Mamba is empirically warranted—very long sequences, large tiles, or complex degradations—and those in which simpler operators or conventional attention remain competitive. Finally, we discuss green computing, numerical stability, and reproducibility, and outline directions for physics-informed state-space models and remote-sensing-specific foundation architectures. Overall, the survey argues that Mamba should be used as a targeted, scan-aware component in EO pipelines rather than a drop-in replacement for existing backbones, and aims to provide concrete design principles for future remote sensing research and operational practice.
(This article belongs to the Section AI Remote Sensing)
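The linear-time property the survey repeatedly weighs against attention comes from the discretized state-space recurrence h_t = A_bar * h_{t-1} + B_bar * x_t, y_t = C * h_t, which a single left-to-right scan evaluates in O(N). A scalar toy version, without Mamba's input-dependent (selective) parameters, makes the cost obvious:

```python
import numpy as np

def ssm_scan(x, a_bar, b_bar, c):
    # One hidden state per step: h_t = a_bar*h_{t-1} + b_bar*x_t,
    # y_t = c*h_t. Cost is linear in sequence length, unlike the
    # quadratic pairwise interactions of self-attention.
    h, ys = 0.0, []
    for x_t in x:
        h = a_bar * h + b_bar * x_t
        ys.append(c * h)
    return np.array(ys)
```

Mamba's selectivity amounts to making the A, B, C terms functions of the current input x_t, which keeps the scan linear while letting the state filter content-dependently.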

31 pages, 17740 KB  
Article
HR-UMamba++: A High-Resolution Multi-Directional Mamba Framework for Coronary Artery Segmentation in X-Ray Coronary Angiography
by Xiuhan Zhang, Peng Lu, Zongsheng Zheng and Wenhui Li
Fractal Fract. 2026, 10(1), 43; https://doi.org/10.3390/fractalfract10010043 - 9 Jan 2026
Viewed by 240
Abstract
Coronary artery disease (CAD) remains a leading cause of mortality worldwide, and accurate coronary artery segmentation in X-ray coronary angiography (XCA) is challenged by low contrast, structural ambiguity, and anisotropic vessel trajectories, which hinder quantitative coronary angiography. We propose HR-UMamba++, a U-Mamba-based framework centered on a rotation-aligned multi-directional state-space scan for modeling long-range vessel continuity across multiple orientations. To preserve thin distal branches, the framework is equipped with (i) a persistent high-resolution bypass that injects undownsampled structural details and (ii) a UNet++-style dense decoder topology for cross-scale topological fusion. On an in-house dataset of 739 XCA images from 374 patients, HR-UMamba++ is evaluated using eight segmentation metrics, fractal-geometry descriptors, and multi-view expert scoring. Compared with U-Net, Attention U-Net, HRNet, U-Mamba, DeepLabv3+, and YOLO11-seg, HR-UMamba++ achieves the best performance (Dice 0.8706, IoU 0.7794, HD95 16.99), yielding a relative Dice improvement of 6.0% over U-Mamba and reducing the deviation in fractal dimension by up to 57% relative to U-Net. Expert evaluation across eight angiographic views yields a mean score of 4.24 ± 0.49/5 with high inter-rater agreement. These results indicate that HR-UMamba++ produces anatomically faithful coronary trees and clinically useful segmentations that can serve as robust structural priors for downstream quantitative coronary analysis.
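For readers comparing the reported numbers, the two headline overlap metrics are computed from binary masks as below; HD95 additionally requires boundary distance transforms and is omitted here.

```python
import numpy as np

def dice_iou(pred, gt):
    # pred, gt: boolean masks of equal shape, assumed non-empty.
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum())
    iou = inter / np.logical_or(pred, gt).sum()
    return dice, iou
```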

24 pages, 2570 KB  
Article
SCT-Diff: Seamless Contextual Tracking via Diffusion Trajectory
by Guohao Nie, Xingmei Wang, Debin Zhang and He Wang
J. Imaging 2026, 12(1), 38; https://doi.org/10.3390/jimaging12010038 - 9 Jan 2026
Viewed by 131
Abstract
Existing detection-based trackers exploit temporal contexts by updating appearance models or modeling target motion. However, the sequential one-shot integration of temporal priors risks amplifying error accumulation, as frame-level template matching restricts comprehensive spatiotemporal analysis. To address this, we propose SCT-Diff, a video-level framework that holistically estimates target trajectories. Specifically, SCT-Diff processes video clips globally via a diffusion model to incorporate bidirectional spatiotemporal awareness, where reverse diffusion steps progressively refine noisy trajectory proposals into optimal predictions. Crucially, SCT-Diff enables iterative correction of historical trajectory hypotheses by observing future contexts within a sliding time window. This closed-loop feedback from future frames preserves temporal consistency and breaks the error propagation chain under complex appearance variations. For joint modeling of appearance and motion dynamics, we formulate trajectories as unified discrete token sequences. The designed Mamba-based expert decoder bridges visual features with language-formulated trajectories, enabling lightweight yet coherent sequence modeling. Extensive experiments demonstrate SCT-Diff’s superior efficiency and performance, achieving 75.4% AO on GOT-10k while maintaining real-time computational efficiency.
(This article belongs to the Special Issue Object Detection in Video Surveillance Systems)

24 pages, 3204 KB  
Article
AMUSE++: A Mamba-Enhanced Speech Enhancement Framework with Bi-Directional and Advanced Front-End Modeling
by Tsung-Jung Li, Berlin Chen and Jeih-Weih Hung
Electronics 2026, 15(2), 282; https://doi.org/10.3390/electronics15020282 - 8 Jan 2026
Viewed by 250
Abstract
This study presents AMUSE++, an advanced speech enhancement framework that extends the MUSE++ model by redesigning its core Mamba module with two major improvements. First, the originally unidirectional one-dimensional (1D) Mamba is transformed into a bi-directional architecture to capture temporal dependencies more effectively. Second, this module is extended to a two-dimensional (2D) structure that jointly models both time and frequency dimensions, capturing richer speech features essential for enhancement tasks. In addition to these structural changes, we propose a Preliminary Denoising Module (PDM) as an advanced front-end, which is composed of multiple cascaded 2D bi-directional Mamba Blocks designed to preprocess and denoise input speech features before the main enhancement stage. Extensive experiments on the VoiceBank+DEMAND dataset demonstrate that AMUSE++ significantly outperforms the backbone MUSE++ across a variety of objective speech enhancement metrics, including improvements in perceptual quality and intelligibility. These results confirm that the combination of bi-directionality, two-dimensional modeling, and an enhanced denoising frontend provides a powerful approach for tackling challenging noisy speech scenarios. AMUSE++ thus represents a notable advancement in neural speech enhancement architectures, paving the way for more effective and robust speech enhancement systems in real-world applications.
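The bi-directional redesign described here is commonly realized by running one causal block over the sequence and a second over its reversal, then merging the two views. The wrapper below is a generic sketch in which any 1D sequence module stands in for the Mamba block; it is not the paper's code.

```python
import torch
import torch.nn as nn

class BiDirectionalWrapper(nn.Module):
    # One pass over t = 1..T plus a pass over the flipped sequence,
    # re-flipped and summed, so every frame sees both past and future.
    def __init__(self, fwd_block, bwd_block):
        super().__init__()
        self.fwd, self.bwd = fwd_block, bwd_block

    def forward(self, x):                      # x: (B, T, C)
        y_f = self.fwd(x)
        y_b = self.bwd(torch.flip(x, dims=[1]))
        return y_f + torch.flip(y_b, dims=[1])
```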

23 pages, 10516 KB  
Article
SSGTN: Spectral–Spatial Graph Transformer Network for Hyperspectral Image Classification
by Haotian Shi, Zihang Luo, Yiyang Ma, Guanquan Zhu and Xin Dai
Remote Sens. 2026, 18(2), 199; https://doi.org/10.3390/rs18020199 - 7 Jan 2026
Viewed by 269
Abstract
Hyperspectral image (HSI) classification is fundamental to a wide range of remote sensing applications, such as precision agriculture, environmental monitoring, and urban planning, because HSIs provide rich spectral signatures that enable the discrimination of subtle material differences. Deep learning approaches, including Convolutional Neural Networks (CNNs), Graph Convolutional Networks (GCNs), and Transformers, have achieved strong performance in learning spatial–spectral representations. However, these models often face difficulties in jointly modeling long-range dependencies, fine-grained local structures, and non-Euclidean spatial relationships, particularly when labeled training data are scarce. This paper proposes a Spectral–Spatial Graph Transformer Network (SSGTN), a dual-branch architecture that integrates superpixel-based graph modeling with Transformer-based global reasoning. SSGTN consists of four key components, namely (1) an LDA-SLIC superpixel graph construction module that preserves discriminative spectral–spatial structures while reducing computational complexity, (2) a lightweight spectral denoising module based on 1×1 convolutions and batch normalization to suppress redundant and noisy bands, (3) a Spectral–Spatial Shift Module (SSSM) that enables efficient multi-scale feature fusion through channel-wise and spatial-wise shift operations, and (4) a dual-branch GCN-Transformer block that jointly models local graph topology and global spectral–spatial dependencies. Extensive experiments on three public HSI datasets (Indian Pines, WHU-Hi-LongKou, and Houston2018) under limited supervision (1% training samples) demonstrate that SSGTN consistently outperforms state-of-the-art CNN-, Transformer-, Mamba-, and GCN-based methods in overall accuracy, average accuracy, and the κ coefficient. The proposed framework provides an effective baseline for HSI classification under limited supervision and highlights the benefits of integrating graph-based structural priors with global contextual modeling.

20 pages, 15504 KB  
Article
O-Transformer-Mamba: An O-Shaped Transformer-Mamba Framework for Remote Sensing Image Haze Removal
by Xin Guan, Runxu He, Le Wang, Hao Zhou, Yun Liu and Hailing Xiong
Remote Sens. 2026, 18(2), 191; https://doi.org/10.3390/rs18020191 - 6 Jan 2026
Viewed by 164
Abstract
Although Transformer-based and state-space models (e.g., Mamba) have demonstrated impressive performance in image restoration, they remain deficient in remote sensing image dehazing. Transformer-based models tend to distribute attention evenly, making it difficult for them to handle the uneven distribution of haze. While Mamba excels at modeling long-range dependencies, it lacks fine-grained spatial awareness of complex atmospheric scattering. To overcome these limitations, we present a new O-shaped dehazing architecture that combines a Sparse-Enhanced Self-Attention (SE-SA) module with a Mixed Visual State Space Model (Mix-VSSM), balancing haze-sensitive details in remote sensing images with long-range context modeling. The SE-SA module introduces a dynamic soft masking mechanism that adaptively adjusts attention weights based on the local haze distribution, enabling the network to more effectively focus on severely degraded regions while suppressing redundant responses. Furthermore, the Mix-VSSM enhances global context modeling by combining sequential processing of 2D perception with local residual information. This design mitigates the loss of spatial detail in the standard VSSM and improves the feature representation of haze-degraded remote sensing images. Thorough experiments verify that our O-shaped framework outperforms existing methods on several benchmark datasets.
(This article belongs to the Special Issue Deep Learning for Remote Sensing Image Enhancement)
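A dynamic soft mask on attention, as SE-SA is described, can be pictured as an additive bias on the attention logits driven by a per-pixel haze estimate. This is our reading of the mechanism, not the released code; the haze score and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_masked_attention(q, k, v, haze_score, tau=1.0):
    # q, k, v: (B, N, D); haze_score: (B, N) in [0, 1], higher = hazier.
    # Keys in heavily degraded regions receive a positive bias so the
    # output concentrates on them; clean regions are softly suppressed.
    logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
    logits = logits + tau * haze_score.unsqueeze(1)         # bias per key
    return F.softmax(logits, dim=-1) @ v
```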

30 pages, 13588 KB  
Article
MSTFT: Mamba-Based Spatio-Temporal Fusion for Small Object Tracking in UAV Videos
by Kang Sun, Haoyang Zhang and Hui Chen
Electronics 2026, 15(2), 256; https://doi.org/10.3390/electronics15020256 - 6 Jan 2026
Viewed by 145
Abstract
Unmanned Aerial Vehicle (UAV) visual tracking is widely used but continues to face challenges such as unpredictable target motion, error accumulation, and the sparse appearance of small targets. To address these issues, we propose a Mamba-based Spatio-Temporal Fusion Tracker. To counter tracking drift from large displacements and abrupt pose changes, we first introduce a Bidirectional Spatio-Temporal Mamba module. It employs bidirectional spatial scanning to capture discriminative local features and temporal scanning to model dynamic motion patterns. Second, to suppress error accumulation in complex scenes, we develop a Dynamic Template Fusion module with Adaptive Attention. This module integrates a threefold safety verification mechanism—based on response peak, temporal consistency, and motion stability—with a scale-aware strategy to enable robust template updates. Moreover, we design a Small-Target-Aware Context Prediction Head that utilizes a Gaussian-weighted prior to guide feature fusion and refines the loss function, significantly improving localization accuracy under sparse target features and strong background interference. On three major UAV tracking benchmarks (UAV123, UAV123@10fps, and UAV20L), our MSTFT establishes a new state of the art, with success AUCs of 79.4%, 76.5%, and 75.8%, respectively. More importantly, it maintains a tracking speed of 45 FPS, demonstrating a superior balance between precision and efficiency.

31 pages, 2157 KB  
Article
DynMultiDep: A Dynamic Multimodal Fusion and Multi-Scale Time Series Modeling Approach for Depression Detection
by Jincheng Li, Menglin Zheng, Jiongyi Yang, Yihui Zhan and Xing Xie
J. Imaging 2026, 12(1), 29; https://doi.org/10.3390/jimaging12010029 - 6 Jan 2026
Viewed by 159
Abstract
Depression is a prevalent mental disorder that imposes a significant public health burden worldwide. Although multimodal detection methods have shown potential, existing techniques still face two critical bottlenecks: (i) insufficient integration of global patterns and local fluctuations in long-sequence modeling and (ii) static fusion strategies that fail to dynamically adapt to the complementarity and redundancy among modalities. To address these challenges, this paper proposes a dynamic multimodal depression detection framework, DynMultiDep, which combines multi-scale temporal modeling with an adaptive fusion mechanism. The core innovations of DynMultiDep lie in its Multi-scale Temporal Experts Module (MTEM) and Dynamic Multimodal Fusion module (DynMM). On one hand, MTEM employs Mamba experts to extract long-term trend features and utilizes local-window Transformers to capture short-term dynamic fluctuations, achieving adaptive fusion through a long-short routing mechanism. On the other hand, DynMM introduces modality-level and fusion-level dynamic decision-making, selecting critical modality paths and optimizing cross-modal interaction strategies based on input characteristics. The experimental results demonstrate that DynMultiDep outperforms existing state-of-the-art methods in detection performance on two widely used large-scale depression datasets.
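The long-short routing mechanism named here can be read as an input-dependent convex gate over the two expert outputs. A hedged sketch, with all module and variable names illustrative:

```python
import torch
import torch.nn as nn

class LongShortRouter(nn.Module):
    # Mixes a long-range expert (e.g., an SSM over the full sequence)
    # and a short-range expert (e.g., a local-window Transformer)
    # with a weight predicted from the input itself.
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, x, long_out, short_out):   # all: (B, T, C)
        g = torch.sigmoid(self.gate(x))          # (B, T, 1)
        return g * long_out + (1 - g) * short_out
```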

26 pages, 5848 KB  
Article
HR-Mamba: Building Footprint Segmentation with Geometry-Driven Boundary Regularization
by Buyu Su, Defei Yin, Piyuan Yi, Wenhuan Wu, Junjian Liu, Fan Yang, Haowei Mu and Jingyi Xiong
Sensors 2026, 26(2), 352; https://doi.org/10.3390/s26020352 - 6 Jan 2026
Viewed by 244
Abstract
Building extraction underpins land-use assessment, urban planning, and disaster mitigation, yet dense urban scenes still cause missed small objects, target adhesion, and ragged contours. We present High-Resolution-Mamba (HR-Mamba), a high-resolution semantic segmentation network that augments a High-Resolution Network (HRNet) parallel backbone with edge-aware and sequence-state modeling. A Canny-enhanced, median-filtered stem stabilizes boundaries under noise; Involution-based residual blocks capture position-specific local geometry; and a Mamba-based state-space model (Mamba-SSM) global branch captures cross-scale long-range dependencies with linear complexity. Training uses a composite loss of binary cross entropy (BCE), Dice loss, and Boundary loss, with weights selected by joint grid search. We further design a feature-driven adaptive post-processing pipeline that includes geometric feature analysis, multi-strategy simplification, multi-directional regularization, and topological consistency verification to produce regular, smooth, engineering-ready building outlines. On dense urban imagery, HR-Mamba improves the F1-score from 80.95% to 83.93%, an absolute gain of 2.98 percentage points over HRNet. We conclude that HR-Mamba jointly enhances detail fidelity and global consistency and offers a generalizable route for high-resolution building extraction in remote sensing.
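The composite objective combines three standard terms. The sketch below uses one common formulation of the boundary term (prediction mass weighted by a precomputed distance-to-boundary map) and leaves the grid-searched weights as symbolic defaults; it is an illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, target, dist_map, w_bce=1.0, w_dice=1.0, w_bnd=1.0):
    # logits, target: (B, 1, H, W); dist_map: distance to the true
    # boundary, precomputed from target. Weights come from grid search.
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    inter = (prob * target).sum()
    dice = 1 - 2 * inter / (prob.sum() + target.sum() + 1e-6)
    boundary = (prob * dist_map).mean()   # penalize mass far from boundary
    return w_bce * bce + w_dice * dice + w_bnd * boundary
```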

23 pages, 902 KB  
Article
Data-Driven Cross-Lingual Anomaly Detection via Self-Supervised Representation Learning
by Mingfei Wang, Nuo Wang, Lingdong Mei, Yunfei Li, Xinyang Liu, Surui Hua and Manzhou Li
Electronics 2026, 15(1), 212; https://doi.org/10.3390/electronics15010212 - 2 Jan 2026
Viewed by 367
Abstract
Deep anomaly detection in multilingual environments remains challenging due to limited labeled data, semantic inconsistency across languages, and the unstable distribution of rare abnormal patterns. These challenges are particularly severe in low-resource scenarios—characterized by scarce labeled anomaly data and non-standardized terminology—where conventional supervised or transfer-based models suffer from semantic drift and feature mismatch. To address these limitations, a data-driven cross-lingual anomaly detection framework, LR-SSAD, is proposed. Targeting paired text and behavioral data without requiring parallel translation corpora, the framework is built upon the joint optimization of complementary self-supervised objectives. A cross-lingual masked prediction module is designed to capture language-invariant semantic structures to align semantic spaces, while a Mamba-based sequence reconstruction module leverages its linear computational complexity (O(N)) to efficiently model long-range dependencies in transaction histories, overcoming the computational bottlenecks of quadratic attention mechanisms. To further enhance robustness under noisy supervision, a noise-aware pseudo-label refinement mechanism is introduced. Evaluated on a newly constructed real-world financial dataset (spanning January–June 2023) comprising 1.2 million multilingual texts and 420,000 transaction sequences, experimental results demonstrate that LR-SSAD achieves substantial improvements over state-of-the-art baselines. The model achieves an accuracy of 0.932, a precision of 0.914, a recall of 0.891, and an F1-score of 0.902, with the Area Under the Curve (AUC) reaching 0.948. The proposed framework provides a scalable and data-efficient solution for anomaly detection in real-world multilingual environments.
(This article belongs to the Special Issue Advances in Data-Driven Artificial Intelligence)

24 pages, 3582 KB  
Article
A Dual-Decomposition Graph-Mamba-Transformer Framework for Ultra-Short-Term Wind Power Forecasting
by Jinming Gao, Yixin Sun, Kwangheon Song, Kwanyoung Jung and Hoekyung Jung
Appl. Sci. 2026, 16(1), 466; https://doi.org/10.3390/app16010466 - 1 Jan 2026
Viewed by 315
Abstract
Accurate ultra-short-term wind power forecasting is vital for the secure and economic operation of power systems with high renewable penetration. Conventional models, however, struggle with multi-scale frequency feature extraction, dynamic cross-variable dependencies, and simultaneously capturing local fluctuations and global trends. This study proposes a novel hybrid framework termed VMD–ALIF–GraphBlock–MLLA–Transformer. A dual-decomposition strategy combining variational mode decomposition and adaptive local iterative filtering first extracts dominant periodic components while suppressing high-frequency noise. An adaptive GraphBlock with MixHop convolution then models structured and time-varying inter-variable dependencies. Finally, a multi-scale linear attention-enhanced Mamba-like module and Transformer encoder jointly capture short- and long-range temporal dynamics. Experiments on a real wind farm dataset with 10-min resolution demonstrate substantial superiority over state-of-the-art baselines across 1-, 4-, and 8-step forecasting horizons. SHAP analysis further confirms excellent consistency with underlying physical mechanisms. The proposed framework provides a robust, accurate, and highly interpretable solution for intelligent wind power forecasting.
