Search Results (96)

Search Parameters:
Keywords = multi-scale linear spatial attention

29 pages, 4386 KB  
Article
DOL-DETR: An Efficient Small Object Detection Algorithm for Unmanned Aerial Vehicle Remote Sensing
by Shanle Chen and Zhipeng Li
Appl. Sci. 2026, 16(9), 4510; https://doi.org/10.3390/app16094510 - 3 May 2026
Viewed by 297
Abstract
Object detection in Unmanned Aerial Vehicle (UAV) imagery faces severe challenges, including small target scales, dense spatial distributions, and complex backgrounds. To address the feature attenuation and noise interference inherent in existing deep learning models, this paper proposes DOL-DETR, an efficient small object detection algorithm based on the Real-Time DEtection TRansformer (RT-DETR) architecture. Our model introduces three key innovations. First, the DAttention-based Intra-scale Feature Interaction (DAIFI) module reconstructs intra-scale feature interactions using deformable attention to focus on salient regions with linear complexity. Second, the Omni-Modulated Feature Fusion (OMFF) mechanism adaptively captures multi-scale features and dynamically suppresses background noise. Finally, Linear De-redundancy Convolution (LDConv) replaces standard downsampling to adapt dynamically to object deformations; although it introduces a dynamic resampling mechanism, it optimizes parameter allocation so that localization precision improves significantly without excessive computational overhead. Extensive experiments on the VisDrone2019 benchmark demonstrate that DOL-DETR achieves an mAP@0.5 of 52.4% (a 4.2% improvement over the baseline) while maintaining a real-time inference speed of 120.1 FPS with only 20.1M parameters. Furthermore, generalization experiments on the large-scale DOTA dataset yield a 76.1% mAP@0.5, outperforming the baseline by 3.8%. These results indicate that DOL-DETR offers a favorable trade-off among detection accuracy, inference efficiency, and cross-domain generalization in UAV remote sensing scenarios.
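To make the deformable-attention idea behind DAIFI concrete, here is a minimal single-head PyTorch sketch: each query predicts K sampling offsets and mixing weights, so attention cost grows linearly with the number of positions rather than quadratically. The class name, the 0.1 offset range, and num_points are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    """Each query attends to K sampled locations instead of all H*W positions."""
    def __init__(self, dim, num_points=4):
        super().__init__()
        self.num_points = num_points
        self.offset_proj = nn.Linear(dim, 2 * num_points)  # (dx, dy) per point
        self.weight_proj = nn.Linear(dim, num_points)      # mixing weights per point
        self.value_proj = nn.Conv2d(dim, dim, 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        v = self.value_proj(x)
        q = x.flatten(2).transpose(1, 2)                   # (B, N, C), N = H*W
        offsets = torch.tanh(self.offset_proj(q)).view(B, -1, self.num_points, 2)
        weights = self.weight_proj(q).softmax(dim=-1)      # (B, N, K)

        # Reference grid of query positions in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device), indexing="ij")
        ref = torch.stack((xs, ys), dim=-1).view(1, H * W, 1, 2)

        loc = (ref + 0.1 * offsets).view(B, -1, 1, 2)      # K deformed points per query
        sampled = F.grid_sample(v, loc, align_corners=True)  # (B, C, N*K, 1)
        sampled = sampled.view(B, C, H * W, self.num_points)
        out = (sampled * weights.unsqueeze(1)).sum(dim=-1)   # weighted sum over K
        return self.out_proj(out.transpose(1, 2)).transpose(1, 2).view(B, C, H, W)
```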

22 pages, 48062 KB  
Article
MMIC: A Remote Sensing Image Compression Algorithm
by Longwei Li and Likun Hu
Appl. Sci. 2026, 16(9), 4499; https://doi.org/10.3390/app16094499 - 3 May 2026
Viewed by 184
Abstract
Unlike natural images, remote sensing images have unique characteristics such as high spatial resolution, complex textures, and strong directional features. Their content often contains many man-made targets with clear directional structures, such as buildings, roads, and bridges, as well as complex ground object boundaries. However, most existing image compression methods are designed for natural images. They typically use square convolution kernels and local receptive fields. As a result, they struggle to effectively capture the rich directional structures in remote sensing images and to model global context information, which limits compression efficiency and the fidelity of key information. To address this challenge, this paper proposes a novel remote sensing image compression algorithm. The algorithm uses a multi-scale asymmetric convolution block that combines sampling convolution, parallel one-dimensional horizontal and vertical convolutions, and two-dimensional square convolution. This helps the model better capture directional objects and multi-scale features. In addition, we propose a multi-scale non-local attention module that models global dependencies with linear computational complexity, improving the retention of key information. The experimental results demonstrate that, compared with the baseline model, the proposed algorithm achieves a 0.40 dB improvement in BD-PSNR and a 10.27% reduction in BD-Rate, while also delivering superior subjective visual quality. These results confirm the effectiveness of our approach for remote sensing image compression.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
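The multi-scale asymmetric convolution idea described above can be sketched in a few lines of PyTorch: parallel 1×k, k×1, and k×k branches whose responses are summed, so oriented structures (roads, bridges) and isotropic textures are captured together. The class name and kernel size are hypothetical, and the paper's sampling-convolution branch is omitted.

```python
import torch
import torch.nn as nn

class AsymmetricConvBlock(nn.Module):
    """Sum of horizontal (1xk), vertical (kx1) and square (kxk) convolutions."""
    def __init__(self, channels, k=5):
        super().__init__()
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2))
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0))
        self.square = nn.Conv2d(channels, channels, k, padding=k // 2)
        self.act = nn.GELU()

    def forward(self, x):
        # Each branch responds to a different orientation; summing fuses them.
        return self.act(self.horizontal(x) + self.vertical(x) + self.square(x))

x = torch.randn(1, 64, 32, 32)
print(AsymmetricConvBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```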

19 pages, 6641 KB  
Article
Automated Detection and Classification of Lunar Linear Tectonic Features Using a Deep Learning Method
by Xiaoyang Liu, Yang Luo, Jianhui Wang, Denggao Qiu, Jianguo Yan, Wensong Zhang and Yaowen Luo
Remote Sens. 2026, 18(9), 1330; https://doi.org/10.3390/rs18091330 - 26 Apr 2026
Viewed by 267
Abstract
On the lunar surface, wrinkle ridges, grabens, and lobate scarps represent key tectonic landforms that reflect the evolution of the Moon’s stress field and its tectonic processes. However, these linear structures often exhibit weak textures, low contrast, and large variations in scale, making manual interpretation inefficient and subjective. To address this issue, this study introduces an improved YOLOv8 model, termed HL-YOLOv8, for the automated detection of lunar linear features. The model incorporates a multiscale lightweight channel attention (C2f_MLCA) module into the backbone network to enhance the extraction of fine-grained and weak-texture features and integrates a multihead self-attention (C2f_MHSA) module in the feature fusion stage to improve the modeling of long-range spatial dependencies. In addition, the combination of a dual focal loss and a diversified data augmentation strategy effectively mitigates the detection difficulties caused by class imbalance and weak-feature samples. The experimental results obtained using the global LROC-WAC image dataset demonstrate that HL-YOLOv8 significantly outperforms the baseline YOLOv8 and other comparative models in terms of precision, recall, and mAP@0.5. Specifically, the proposed model achieved an average precision of 73.5%, an average recall of 73.1%, and an average mAP@0.5 of 74.6% on the evaluation dataset, showing particularly strong performance in detecting elongated grabens and boundary-blurred lobate scarps. The global distribution maps derived from the model predictions indicate that HL-YOLOv8 can comprehensively reconstruct the spatial patterns of the three types of linear structures and identify potential new features in high-latitude and geologically complex regions, demonstrating excellent generalizability and robustness. This study provides an efficient and reliable framework for the automated identification and global mapping of lunar linear features and offers a transferable methodological reference for the tectonic interpretation of terrestrial planets.
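As a rough illustration of the long-range spatial modeling that a C2f_MHSA-style module contributes in the fusion stage, the sketch below applies standard multi-head self-attention over flattened feature-map positions. The class name and residual arrangement are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention2d(nn.Module):
    """Multi-head self-attention over H*W positions of a feature map.

    dim must be divisible by heads, as nn.MultiheadAttention requires.
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                    # x: (B, C, H, W)
        B, C, H, W = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)    # every position attends to all others
        return x + out.transpose(1, 2).view(B, C, H, W)  # residual connection
```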

36 pages, 4902 KB  
Article
PFEB: A Post-Fusion Enhanced Decoder Module for Remote Sensing Semantic Segmentation
by Dongjie Lian, Gang Chen, Biao Wu and Feifan Yang
Remote Sens. 2026, 18(8), 1246; https://doi.org/10.3390/rs18081246 - 20 Apr 2026
Viewed by 417
Abstract
Remote sensing semantic segmentation is fundamental to applications such as land-cover mapping, urban analysis, and environmental monitoring. However, remote sensing scenes often exhibit pronounced scale variation, fragmented regions, dense small objects, and complex boundary transitions, making fine-grained prediction particularly challenging. Transformer-based architectures such as SegFormer have demonstrated a strong capability in modeling long-range context through hierarchical encoding, yet their lightweight decoders mainly rely on linear projection and feature fusion, providing limited capacity for local refinement after multi-scale aggregation. This limitation may reduce spatial precision in boundary-sensitive and small-object-rich regions. To address this issue, we propose the Post-fusion Enhanced Block (PFEB), a lightweight decoder-side refinement module inserted after multi-scale feature fusion and before pixel-wise classification. PFEB combines channel expansion, depthwise and pointwise convolutions, efficient channel attention (ECA), and residual learning to enhance local semantic refinement while largely preserving computational efficiency. Built upon SegFormer, the proposed method was evaluated on two widely used remote sensing benchmarks, i.e., LoveDA and ISPRS Vaihingen, under both Mix Transformer-B0 (MiT-B0) and Mix Transformer-B2 (MiT-B2) backbones. Experimental results show that PFEB consistently improves the SegFormer baseline across datasets and model scales. Under the MiT-B2 backbone, our method achieves 53.82 ± 0.31 mean intersection over union (mIoU) on LoveDA and 74.84 ± 0.41 mIoU on ISPRS Vaihingen. Boundary- and size-aware evaluations further indicate that the gains are mainly reflected in improved semantic correctness near boundaries and in the recoverability of small objects. With only modest additional cost (approximately +0.53 M parameters and +8.7 G floating point operations (FLOPs)), PFEB provides a favorable accuracy–efficiency trade-off. These results suggest that PFEB is an effective and lightweight post-fusion refinement module for improving fine-grained remote sensing semantic segmentation.
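A plausible minimal reading of the PFEB recipe (channel expansion, depthwise and pointwise convolution, ECA, residual learning) in PyTorch follows; the class name, expansion ratio, and ECA kernel size are assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

class PFEBSketch(nn.Module):
    """Expand channels, refine locally (depthwise + pointwise), gate with
    ECA-style channel attention, and add back residually."""
    def __init__(self, dim, expand=2, eca_k=3):
        super().__init__()
        hidden = dim * expand
        self.expand = nn.Conv2d(dim, hidden, 1)                            # channel expansion
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)   # depthwise
        self.pw = nn.Conv2d(hidden, dim, 1)                                # pointwise
        self.eca = nn.Conv1d(1, 1, eca_k, padding=eca_k // 2, bias=False)  # ECA 1-D conv
        self.act = nn.GELU()

    def forward(self, x):
        y = self.pw(self.act(self.dw(self.act(self.expand(x)))))
        # ECA: pooled channel descriptor -> 1-D conv across channels -> sigmoid gate.
        w = y.mean(dim=(2, 3))                                  # (B, C)
        w = torch.sigmoid(self.eca(w.unsqueeze(1))).squeeze(1)  # (B, C)
        return x + y * w[:, :, None, None]                      # gated residual refinement
```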

26 pages, 10629 KB  
Article
LRD-DETR: A Lightweight RT-DETR-Based Model for Road Distress Detection
by Chen Dong and Yunwei Zhang
Sensors 2026, 26(8), 2375; https://doi.org/10.3390/s26082375 - 12 Apr 2026
Viewed by 404
Abstract
Intelligent road distress detection technology has emerged as an important research topic in the field of highway maintenance. However, the accuracy and practicality of pavement distress detection are constrained by multiple factors, primarily including the irregular shapes of distress, the tendency for fine cracks to be overlooked, and the high parameter count of detection models that makes deployment difficult. Therefore, this study proposes LRD-DETR, a lightweight road distress detection model based on an improved RT-DETR architecture. First, this work integrates the C2f-LFEM module with the ADown adaptive down-sampling strategy into the backbone network, significantly reducing the number of model parameters and the computational load while effectively enhancing the representation capacity of multi-scale pavement distress features. Second, a frequency-domain spatial attention module is embedded in the S4 feature layer, where the synergistic integration of frequency-domain filtering and spatial attention enhances the details of distress edges and contours, automatically focuses on distress regions, and suppresses background interference. Polarity-aware linear attention is incorporated into the S5 feature layer; by explicitly modeling polarity interactions, it effectively captures textural discrepancies between damaged regions and the intact road surface, while a learnable power function dynamically rescales attention weights to strengthen distress-specific feature responses. Finally, a cross-scale spatial feature fusion module (CSF2M) is developed to reconstruct and fuse multi-level spatial features, thereby improving detection robustness for pavement distresses with diverse morphologies under complex background conditions. Quantitative experiments indicate that, compared with the baseline RT-DETR, the presented framework improves the F1-score by 7.1% and mAP@50 by 9.0%, while reducing computational complexity and parameter count by 43.8% and 38.0%, respectively. These advantages make LRD-DETR suitable for deployment on resource-limited embedded platforms for real-time road distress detection.
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
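The linear-attention family that the S5 layer builds on can be sketched as follows: replacing the softmax with a positive feature map lets the Kᵀ V product be computed first, reducing cost from O(N²) to O(N). The learnable exponent p loosely mirrors the abstract's learnable power rescaling; the polarity decomposition itself is not reproduced, and all names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttentionSketch(nn.Module):
    """Kernelized attention: out_i = phi(q_i) (sum_j phi(k_j) v_j^T) / norm."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.p = nn.Parameter(torch.ones(1))       # learnable power rescaling

    def forward(self, x):                          # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.elu(q) + 1.0                         # positive kernel feature maps
        k = F.elu(k) + 1.0
        p = self.p.clamp(0.5, 2.0)                 # keep the exponent well-behaved
        q, k = q ** p, k ** p
        kv = torch.einsum("bnc,bnd->bcd", k, v)    # O(N * C^2): linear in N
        z = (q * k.sum(dim=1, keepdim=True)).sum(-1, keepdim=True).clamp(min=1e-6)
        return torch.einsum("bnc,bcd->bnd", q, kv) / z
```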

25 pages, 6534 KB  
Article
Spectral–Spatial State Space Model with Hybrid Attention for Hyperspectral Image Classification
by Mengdi Cheng, Haixin Sun, Fanlei Meng, Qiuguang Cao and Jingwen Xu
Algorithms 2026, 19(4), 300; https://doi.org/10.3390/a19040300 - 11 Apr 2026
Viewed by 508
Abstract
Hyperspectral image (HSI) classification requires the extraction of discriminative features from high-dimensional spatial–spectral data. While the Mamba architecture has shown promise in long-sequence modeling with linear complexity, its application to HSI remains constrained by two major hurdles: unidirectional causal scanning, which fails to capture non-causal global dependencies, and serialization-induced loss of two-dimensional spatial topology and local textures. To overcome these limitations, we propose HAMamba, a novel Hybrid Attention State Space Model. HAMamba facilitates deep representation learning through two core components: a Multi-Scale Dynamic Fusion (MSDF) module and a Hybrid Attention Mamba Encoder (HAME). Specifically, the MSDF module augments spatial perception through parallelized feature extraction and dynamically weighted integration. The HAME synergizes a Bidirectional Sequence Scan Mamba (BSSM) to establish global semantic context and a Spatial–Spectral Gated Attention (SSGA) module to refine local structural details. Comprehensive experiments on four public benchmark datasets demonstrate that the proposed HAMamba significantly outperforms state-of-the-art approaches, achieving a superior balance between classification accuracy and computational efficiency.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
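The bidirectional-scan idea behind BSSM can be illustrated generically: run a causal sequence model forward and over the flipped sequence, then fuse the two passes so every position sees context from both directions. In the sketch below a GRU stands in for the Mamba state space layer purely for illustration; all names are hypothetical.

```python
import torch
import torch.nn as nn

class BidirectionalScanSketch(nn.Module):
    """A causal scanner only sees the past; scanning the reversed sequence
    too and fusing both passes yields non-causal global context."""
    def __init__(self, dim):
        super().__init__()
        self.fwd = nn.GRU(dim, dim, batch_first=True)   # stand-in causal scanner
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):                       # x: (B, N, C) serialized pixels
        f, _ = self.fwd(x)                      # left-to-right scan
        b, _ = self.bwd(torch.flip(x, dims=[1]))
        b = torch.flip(b, dims=[1])             # re-align the reverse scan
        return self.fuse(torch.cat([f, b], dim=-1))
```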

25 pages, 6659 KB  
Article
MDS3-Net: A Multiscale Spectral–Spatial Sequence Hybrid CNN–Transformer Model for Hyperspectral Image Classification
by Taonian Bian, Bin Yang, Yuanjiang Chen, Xuan Zhou, Li Yue and Shunshi Hu
Remote Sens. 2026, 18(7), 977; https://doi.org/10.3390/rs18070977 - 25 Mar 2026
Viewed by 525
Abstract
Hyperspectral image (HSI) classification faces significant challenges due to the spatial–spectral heterogeneity of land covers and the geometric rigidity of standard convolutions. Although Transformers offer powerful global modeling capabilities, their quadratic computational complexity limits practical efficiency. To address these limitations, this paper proposes a novel hierarchical framework named MDS3-Net (Multiscale Deformable Spectral–Spatial Sequence Network). Specifically, we design a Multiscale Spectral-Deformable Convolution (MSDC) module that adopts a cascaded strategy to sequentially extract discriminative spectral features and adaptively align spatial receptive fields with irregular object boundaries. To capture long-range dependencies efficiently, a Spectral–Spatial Sequence (S3) Encoder based on a gated large-kernel convolution mechanism is introduced, achieving global context modeling with linear complexity. Furthermore, a Dual-Path Feature Extraction (DPFE) module is proposed to perform semantics-preserving dimension reduction via spectral reorganization and spatial attention. Experimental results on four public datasets demonstrate that the proposed MDS3-Net achieves state-of-the-art classification performance and exhibits superior robustness under limited training samples compared to existing methods.
(This article belongs to the Section Remote Sensing Image Processing)
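A minimal sketch of gated large-kernel convolution, the mechanism the S3 Encoder is described as using for linear-complexity context: a large depthwise kernel supplies the wide receptive field at per-pixel cost, and a pointwise sigmoid gate modulates which context is kept. The kernel size of 13 and the residual form are assumptions.

```python
import torch
import torch.nn as nn

class GatedLargeKernelConv(nn.Module):
    """Depthwise large-kernel context modulated by a pointwise gate."""
    def __init__(self, dim, k=13):
        super().__init__()
        self.gate = nn.Conv2d(dim, dim, 1)                               # pointwise gate
        self.large = nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim)  # depthwise context
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        # Cost is linear in the number of pixels, unlike pairwise attention.
        return x + self.proj(torch.sigmoid(self.gate(x)) * self.large(x))
```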

22 pages, 6052 KB  
Article
HSMD-YOLO: An Anti-Aliasing Feature-Enhanced Network for High-Speed Microbubble Detection
by Wenda Luo, Yongjie Li and Siguang Zong
Algorithms 2026, 19(3), 234; https://doi.org/10.3390/a19030234 - 20 Mar 2026
Viewed by 323
Abstract
Underwater micro-bubble detection entails multiple challenges, including diminutive target sizes, sparse pixel information, pronounced specular highlights and water scattering, indistinct bubble boundaries, and adhesion or overlap between instances. To address these issues, we propose HSMD-YOLO, an improved detector tailored for high-resolution micro-bubble detection and built upon YOLOv11. The model incorporates three novel components: the Scale Switch Block (SSB), a scale-transformation module that suppresses artifacts and background noise, thereby stabilizing edges in thin-walled bubble regions and enhancing sensitivity to geometric contours; the Global Local Refine Block (GLRB), which achieves efficient global relationship modeling with asymptotically linear complexity (O(N)) in the spatial dimensions while further refining local features, thereby strengthening boundary perception and improving bubble–background separability; and the Bidirectional Exponential Moving Attention Fusion (BEMAF), which accommodates the multi-scale nature of bubbles by employing a parallel multi-kernel architecture to extract spatial features across scales, coupled with a multi-stage EMA-based attention mechanism to enhance detection robustness under weak boundaries and complex backgrounds. Experiments conducted on the Side-Illuminated Light Field Bubble Database (SILB-DB) and a public gas–liquid two-phase flow dataset (GTFD) demonstrate that HSMD-YOLO achieves mAP@50 scores of 0.911 and 0.854, respectively, surpassing mainstream detection methods. Ablation studies indicate that SSB, GLRB, and BEMAF contribute performance gains of 1.3%, 2.0%, and 0.4%, respectively, corroborating the effectiveness of each module for micro-scale object detection.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
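For the anti-aliasing role attributed to the SSB, a classic blur-pool layer (Zhang, 2019) illustrates the principle: low-pass filter with a fixed binomial kernel before striding, so thin bubble walls do not alias into jagged edges during scale transformation. This is a stand-in sketch of the principle, not the paper's Scale Switch Block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: fixed 3x3 binomial low-pass, then stride."""
    def __init__(self, channels, stride=2):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)
        k = (k / k.sum()).view(1, 1, 3, 3).repeat(channels, 1, 1, 1)
        self.register_buffer("kernel", k)   # fixed, not learned
        self.stride = stride
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")   # keep borders stable
        return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)
```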

18 pages, 2314 KB  
Article
Efficient Two-Stage Autofocus for Micro-Assembly Based on Joint Spatial-Frequency Image Quality Assessment
by Jianpeng Zhang, Tianbo Kang, Xin Zhao, Mingzhu Sun and Yi Yang
J. Imaging 2026, 12(3), 137; https://doi.org/10.3390/jimaging12030137 - 19 Mar 2026
Viewed by 1154
Abstract
Reliable autofocus is a fundamental prerequisite for precise positioning in micro-assembly systems, where complex reflections, scale variations, and a narrow depth of field often degrade the robustness of traditional sharpness metrics. To address these challenges, we propose an efficient two-stage autofocus method for a dual-camera micro-vision system based on a spatial-frequency image quality assessment (IQA) model. First, we design WaveMamba-IQA for image sharpness estimation, synergistically combining the Discrete Wavelet Transform with Vision Transformers to capture high-frequency details and semantic features, further enhanced by Multi-Linear Transposed Attention and Vision Mamba for global context modeling. Moreover, we implement a coarse-to-fine autofocus workflow, employing the Covariance Matrix Adaptation Evolution Strategy for global optimization on the horizontal camera, followed by geometric prior-based precise adjustment for the oblique camera. Experimental results on a custom microsphere dataset demonstrate that WaveMamba-IQA achieves a Spearman correlation coefficient of 0.9786. Furthermore, the integrated system achieves a 98.33% autofocus success rate across varying lighting conditions. This method significantly improves the robustness and automation level of micro-assembly systems, effectively overcoming the limitations of manual and traditional focusing techniques.
(This article belongs to the Section Image and Video Processing)
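One common way to get attention at cost linear in image size, in the spirit of the "Multi-Linear Transposed Attention" named above, is transposed (channel-wise) attention: the attention map is C×C across channels instead of N×N across pixels. The sketch below is a generic version of that idea; the paper's variant is presumably more elaborate, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttentionSketch(nn.Module):
    """Self-attention across channels: the (C, C) map costs O(H*W * C^2)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Conv2d(dim, 3 * dim, 1)
        self.out = nn.Conv2d(dim, dim, 1)
        self.scale = nn.Parameter(torch.ones(1))   # learnable temperature

    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)    # each (B, C, H*W)
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(1, 2)) * self.scale         # (B, C, C) channel map
        return self.out((attn.softmax(-1) @ v).view(B, C, H, W)) + x
```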

17 pages, 4972 KB  
Article
Seismic Attribute Fusion and Reservoir Prediction Using Multiscale Convolutional Neural Networks and Self-Attention: A Case Study of the B Gas Field, South Sumatra Basin
by Ziyun Cheng, Wensong Huang, Xiaoling Zhang, Zhanxiang Lei, Guoliang Hong, Wenwen Wang, Mengyang Zhang, Linze Li and Jian Li
Processes 2026, 14(6), 981; https://doi.org/10.3390/pr14060981 - 19 Mar 2026
Viewed by 431
Abstract
Strong heterogeneity and ambiguous seismic responses hinder reliable sandstone thickness prediction when using a single seismic attribute in the lower sandstone interval of the Talang Akar Formation (hereafter abbreviated as the LTAF interval) in the B gas field, South Sumatra Basin. To address this challenge, we propose a seismic attribute fusion and reservoir sweet-spot prediction framework based on a multiscale convolutional neural network (CNN) integrated with a self-attention module. Multiple seismic attribute volumes are organized as multi-channel 2D attribute slices, and parallel convolutions with kernel sizes of 3 × 3, 5 × 5, and 7 × 7 are employed to capture spatial features ranging from thin-bed boundaries and channel morphology to sand-body assemblage distribution. The self-attention module explicitly models inter-attribute dependencies and performs adaptive weighted fusion to suppress noise and emphasize informative attributes. The network adopts a dual-output design, producing (i) a sandstone thickness prediction map at the same spatial resolution as the input and (ii) attribute importance scores for quantitative attribute selection and geological interpretation. Using 3D seismic data and well-constrained thickness labels, the proposed model achieves an R² of 0.8954, outperforming linear regression (R² = 0.8281) and random forest regression (R² ≈ 0.8453). The learned importance scores indicate that amplitude-related attributes (e.g., RMS amplitude and maximum amplitude) contribute most to thickness prediction, whereas frequency- and energy-related attributes show relatively lower contributions, which is consistent with bandwidth-limited resolution effects. Overall, the proposed framework unifies attribute fusion, thickness prediction, and interpretability within a single model, providing practical support for fine reservoir characterization and development optimization in heterogeneous sandstone reservoirs.
(This article belongs to the Special Issue Applications of Intelligent Models in the Petroleum Industry)
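The multiscale front end described (parallel 3 × 3, 5 × 5, and 7 × 7 convolutions over multi-channel attribute slices) reduces to a few lines of PyTorch; the branch width and the concatenation fusion are assumptions.

```python
import torch
import torch.nn as nn

class MultiKernelBranch(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 convolutions, concatenated so thin-bed edges
    and broader sand-body patterns are captured at once."""
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7))
        self.act = nn.ReLU()

    def forward(self, x):                 # x: (B, attributes, H, W)
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(2, 8, 64, 64)            # 8 attribute channels
print(MultiKernelBranch(8)(x).shape)     # torch.Size([2, 48, 64, 64])
```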

28 pages, 48517 KB  
Article
DDF-DETR: A Multi-Scale Spatial Context Method for Field Cotton Seedling Detection
by Feng Xu, Huade Zhou, Yinyi Pan, Yi Lu and Luan Dong
Agriculture 2026, 16(5), 615; https://doi.org/10.3390/agriculture16050615 - 7 Mar 2026
Viewed by 662
Abstract
Accurate assessment of cotton emergence rates is essential for precision agriculture management, and unmanned aerial vehicle (UAV) imagery provides a scalable means for field-level monitoring. However, cotton seedling detection from UAV images faces persistent challenges: individual seedlings appear as small targets with diverse morphologies across varying flight altitudes; strong plastic film reflections, weeds, and soil cracks introduce substantial background interference; and “missing seedling” targets, which manifest as negative-space features, exhibit high similarity to background noise. Existing CNN–Transformer hybrid detection architectures are limited by fixed convolutional receptive fields that cannot adapt to multi-scale target variations, attention mechanisms that lack explicit directional geometric modeling, and interpolation-based upsampling that attenuates the high-frequency edge details of small targets. To address these issues, this paper proposes DDF-DETR (Dynamic-Direction-Frequency Detection Transformer), a multi-scale spatial context detection method based on RT-DETR. The method incorporates three components: a Dynamic Gated Mixer Block (DGMB) for adaptive multi-scale feature extraction with background noise suppression, a Direction-Aware Adaptive Transformer Encoder (DAATE) for directional geometric feature modeling at linear computational complexity, and a Frequency-Aware Sub-pixel Upsampling Network (FASN) for high-frequency detail recovery in the feature pyramid. On the self-constructed Xinjiang cotton field dataset, DDF-DETR achieves 83.72% mAP@0.5 and 63.46% mAP@0.5:0.95, representing improvements of 2.38% and 5.28% over the baseline RT-DETR-R18, while reducing the parameter count by 30.6% and the computational cost to 42.8 GFLOPs. Generalization experiments on the VisDrone2019 and TinyPerson datasets further validate the robustness of the proposed method for small target detection across different scenarios.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
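Sub-pixel upsampling, the building block behind modules like FASN, can be sketched with PixelShuffle: a convolution predicts r²·C channels that are rearranged into an r-times larger map, so the upsampling is learned rather than interpolated (and high-frequency detail is not blurred away). The frequency-aware weighting of the paper is not reproduced here.

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Learned 2x upsampling via channel-to-space rearrangement."""
    def __init__(self, dim, r=2):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim * r * r, 3, padding=1)  # predict r^2*C channels
        self.shuffle = nn.PixelShuffle(r)                      # rearrange to r*H x r*W

    def forward(self, x):                 # (B, C, H, W) -> (B, C, r*H, r*W)
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 20, 20)
print(SubPixelUpsample(64)(x).shape)      # torch.Size([1, 64, 40, 40])
```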

26 pages, 9001 KB  
Article
PSiam-HDSFNet: A Pseudo-Siamese Hybrid Dilation Spiral Feature Network for Flood Inundation Change Detection Based on Heterogeneous Remote Sensing Imagery
by Yichuang Luo, Xunqiang Gong, Yuanxin Ye, Pengyuan Lv, Shuting Yang, Ailong Ma and Yanfei Zhong
Remote Sens. 2026, 18(5), 788; https://doi.org/10.3390/rs18050788 - 4 Mar 2026
Viewed by 411
Abstract
Flood change detection from remote sensing data can be used to identify post-disaster flooded areas, providing decision support for emergency rescue and post-disaster reconstruction. Although the combination of SAR and optical images effectively addresses obscuration by clouds and rain, the inherent difference in their imaging mechanisms poses a challenge to improving the accuracy of flood area change detection. Furthermore, existing flood inundation change detection methods based on heterogeneous remote sensing imagery struggle to distinguish small ground objects within the background from the actual inundated regions. Therefore, a pseudo-Siamese hybrid dilation spiral feature network (PSiam-HDSFNet) is proposed in this paper. Firstly, the feature extraction pipeline progressively processes optical and SAR images through five-layer Enhanced Deep Residual Blocks and five-layer Residual Dense Blocks, respectively. A Hybrid Dilated Pyramid (HDP) module based on sawtooth-wave-like dilation coefficients is designed to enhance the multi-scale semantics of deep features, selectively reinforcing semantic features in flood areas while weakening the noise semantics of small ground objects. Then, a Spiral Feature Pyramid (SFP) module is designed to make the deep features of SAR and optical images more consistent in spatial structure and numerical distribution patterns, so that the features of flood areas become more prominent while the noise semantics of small ground objects are further suppressed. After that, Galerkin-type attention with linear complexity is introduced into the decoder, rapidly reconstructing the abstract semantic information of floods into interpretable flood features. Finally, the Align OPT-SAR (AlignOS) method is designed to align SAR and optical image features, enabling subsequent flood area detection. Seven metrics are adopted in the comparison between PSiam-HDSFNet and 14 other methods. The results indicate that PSiam-HDSFNet improves change detection accuracy by extracting and processing the deep features of the two image types without image domain translation, and its F1-scores in the four flood-coverage category detection tasks exceed those of the second-best method by 7.704%, 7.664%, 4.353%, and 1.111%, respectively.
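The sawtooth-wave-like dilation idea of the HDP module can be illustrated by stacking 3×3 convolutions whose dilation rates cycle through (1, 2, 5): the receptive field grows across the stack while the rate resets avoid gridding artifacts. The exact rates, depth, and residual form below are assumptions.

```python
import torch
import torch.nn as nn

class SawtoothDilatedPyramid(nn.Module):
    """Stacked dilated 3x3 convolutions with a repeating (1, 2, 5) rate pattern."""
    def __init__(self, dim, rates=(1, 2, 5, 1, 2, 5)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(dim, dim, 3, padding=r, dilation=r) for r in rates)
        self.act = nn.ReLU()

    def forward(self, x):
        for conv in self.layers:
            x = self.act(conv(x)) + x      # residual keeps fine detail
        return x
```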

36 pages, 35239 KB  
Article
SoccerDETR: Real-Time Soccer Object Detection via Visual State Space Models with Semantic-Aware Feature Fusion
by Dongyang Zhou and Yuheng Li
Technologies 2026, 14(3), 142; https://doi.org/10.3390/technologies14030142 - 27 Feb 2026
Viewed by 1082
Abstract
Real-time object detection in soccer videos presents significant challenges due to the dynamic nature of matches, varying object scales, and the stringent requirement for efficient processing. In this work, we define real-time detection as achieving inference speeds of at least 30 frames per second (FPS), the minimum requirement for smooth video processing and live broadcast applications. While transformer-based detectors have achieved remarkable accuracy, their quadratic computational complexity limits their real-time applications. In this paper, we propose SoccerDETR, a novel real-time detection framework that integrates MobileMamba-based visual state space models with an efficient transformer encoder for soccer object detection. Our approach introduces four key innovations: (1) a MobileMamba backbone leveraging selective state space modeling to achieve linear computational complexity while maintaining global receptive fields; (2) a Semantic-aware Dynamic Feature Fusion Module (SDFM) that adaptively aggregates multi-scale features through progressive semantic injection; (3) a Spatial-Channel Synergistic Attention (SCSA) mechanism that exploits the synergy between spatial and channel attention for enhanced feature representation; and (4) a Separable Dynamic Decoder that employs dynamic convolution attention in place of traditional cross-attention, significantly reducing computational overhead. Additionally, we design a Scale-Aware Focal Loss (SAFL) that addresses the class imbalance and scale variation problems inherent in soccer scenarios. Extensive experiments on the Soccana and SoccerNet datasets demonstrate that SoccerDETR achieves state-of-the-art performance with 94.2% mAP@50 on Soccana and 91.8% mAP@50 on SoccerNet, while maintaining a real-time inference speed of 78 FPS on a single NVIDIA RTX 4090 GPU with a batch size of 1 and an input resolution of 640 × 640. Our method outperforms existing approaches by 2.3–5.7% in mAP while being 1.5–3.2× faster, demonstrating the effectiveness of state space models for efficient sports video object detection. Comprehensive ablation studies validate the effectiveness of each proposed component, and cross-dataset experiments demonstrate strong generalization capability.
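A minimal CBAM-style reading of "spatial-channel synergistic attention": a squeeze-and-excitation channel gate followed by a spatial gate built from pooled descriptors, applied multiplicatively so the two attentions reinforce each other. This is a generic sketch, not the SCSA mechanism itself; the reduction ratio and 7×7 spatial kernel are assumptions.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Channel gate first, then a spatial gate over pooled channel statistics."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                          # reweight channels
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)  # spatial descriptors
        return x * self.spatial(s)                       # reweight positions
```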

23 pages, 1294 KB  
Article
Event-Driven Spatiotemporal Computing for Robust Flight Arrival Time Prediction: A Probabilistic Spiking Transformer Approach
by Quanquan Chen and Meilong Le
Aerospace 2026, 13(2), 203; https://doi.org/10.3390/aerospace13020203 - 22 Feb 2026
Viewed by 385
Abstract
Precise Estimated Time of Arrival (ETA) prediction in Terminal Maneuvering Areas (TMA) constitutes a prerequisite for efficient arrival sequencing and airspace capacity management. While data-driven approaches outperform kinematic models, conventional Recurrent Neural Networks (RNNs) exhibit limitations in modeling complex multi-aircraft spatial interactions and lack the capability to quantify predictive uncertainty. Conversely, Spiking Neural Networks (SNNs) enable energy-efficient event-driven computation, yet their applicability to continuous trajectory regression is hindered by “input starvation,” where normalized state vectors fail to induce sufficient neural firing rates. This study proposes a Probabilistic Spiking Transformer (PST) architecture to integrate neuromorphic sparsity with global attention mechanisms. An Adaptive Spiking Temporal Encoding mechanism incorporating learnable linear projections is introduced to resolve the regression-spiking incompatibility, facilitating the autonomous mapping of continuous trajectory dynamics into sparse spike trains without heuristic scaling. Concurrently, a Distance-Biased Multi-Aircraft Cross-Attention (MACA) module models air traffic conflicts by weighting spatial interactions according to physical proximity, thereby embedding separation constraints into the feature extraction process. Evaluation on large-scale real-world ADS-B datasets demonstrates that the PST yields a Mean Absolute Error (MAE) of 49.27 s, representing a 60% error reduction relative to standard LSTM baselines. Furthermore, the model generates well-calibrated probabilistic distributions (Prediction Interval Coverage Probability > 94%), offering quantifiable uncertainty metrics for risk-based decision support while ensuring real-time inference suitable for operational deployment.
(This article belongs to the Section Air Traffic and Transportation)
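The distance-biasing idea of MACA can be shown in one function: subtract a distance-proportional penalty from the cross-attention logits so that closer aircraft receive more weight. The decay coefficient alpha, the units of dist, and the function name are hypothetical.

```python
import torch
import torch.nn.functional as F

def distance_biased_attention(q, k, v, dist, alpha=1.0):
    """Cross-attention penalized by pairwise distance.

    q: (B, Nq, d) ego-aircraft queries; k, v: (B, Nk, d) neighbor features;
    dist: (B, Nq, Nk) pairwise distances (e.g., km); alpha: decay coefficient.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, Nq, Nk)
    scores = scores - alpha * dist                          # closer => larger logit
    return F.softmax(scores, dim=-1) @ v

B, Nq, Nk, d = 2, 4, 6, 32
q, k, v = (torch.randn(B, n, d) for n in (Nq, Nk, Nk))
dist = torch.rand(B, Nq, Nk) * 10.0                         # toy pairwise distances
print(distance_biased_attention(q, k, v, dist).shape)       # torch.Size([2, 4, 32])
```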

25 pages, 3276 KB  
Article
SIDWA: Synthetic Image Detection Based on Discrete Wavelet Transform Stem and Deformable Sliding Window Cross-Attention
by Luo Li, Tianyi Lu, Jiaxin Song and Ke Cheng
Electronics 2026, 15(4), 891; https://doi.org/10.3390/electronics15040891 - 21 Feb 2026
Viewed by 508
Abstract
With the rapid evolution of Generative Adversarial Networks (GANs) and diffusion models (DMs), the detection of synthetic images faces significant challenges due to non-rigid artifacts and complex frequency biases. In this paper, we propose SIDWA (Synthetic Image Detection based on Discrete Wavelet Transform Stem and Deformable Sliding Window Cross-Attention), a novel dual-branch detection framework that leverages the synergy between the frequency and spatial domains. Within the spatial branch, we design a Deformable Sliding Window Cross-Attention (DSWA) module, which utilizes a learnable offset mechanism to dynamically warp the receptive field, effectively capturing distorted edges and non-linear texture features. Simultaneously, the Discrete Wavelet Transform (DWT) Stem decomposes input images into multi-scale sub-bands to preserve crucial high-frequency residues. Through a Frequency-Semantic Resonance Projector (FSRP) strategy, the semantic priors from the spatial branch act as queries that guide the model toward localized frequency anomalies, achieving a unified “where to look” and “how to analyze” approach. Experimental results on the SIDataset (SIDset) benchmark demonstrate that SIDWA achieves superior performance, with an average accuracy exceeding 95% and a competitive inference time of 18.2 ms on an NVIDIA A100 GPU. Ablation studies further validate the critical role of learnable offsets and frequency integration in enhancing robustness and generalization. SIDWA offers an efficient and reliable forensic solution for combating the growing threats of sophisticated generative forgeries.
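A DWT stem's first step can be sketched with fixed Haar filters: four stride-2 filters per channel yield the LL/LH/HL/HH sub-bands in which generative artifacts often survive. This illustrates the one-level decomposition only, not the paper's stem; the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One-level Haar DWT as a grouped convolution.

    x: (B, C, H, W) with even H and W; returns (B, 4*C, H/2, W/2), where each
    input channel yields four consecutive sub-band maps (LL, LH, HL, HH).
    """
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    bank = torch.stack([ll, lh, hl, hh]).unsqueeze(1)   # (4, 1, 2, 2)
    C = x.shape[1]
    bank = bank.repeat(C, 1, 1, 1).to(x)                # one filter set per channel
    return F.conv2d(x, bank, stride=2, groups=C)

x = torch.randn(1, 3, 64, 64)
print(haar_dwt(x).shape)   # torch.Size([1, 12, 32, 32])
```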
