This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Memory-Based Temporal Transformer U-Net for Multi-Frame Infrared Small Target Detection

1
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
2
School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2025, 17(23), 3801; https://doi.org/10.3390/rs17233801 (registering DOI)
Submission received: 15 October 2025 / Revised: 11 November 2025 / Accepted: 19 November 2025 / Published: 23 November 2025
(This article belongs to the Section AI Remote Sensing)

Abstract

In infrared small target detection (ISTD), single-frame ISTD (SISTD) relies only on spatial features and therefore struggles with dim targets in cluttered backgrounds. In contrast, multi-frame ISTD (MISTD) exploits spatio-temporal information from video, which can significantly enhance moving-target features and effectively suppress background interference. However, current MISTD algorithms rely on fixed-size time windows and therefore cannot adapt the amount of spatio-temporal input to different detection scenarios. Moreover, making effective use of spatio-temporal features remains a significant challenge in MISTD, particularly for slow-moving targets and fast-moving backgrounds. To address these problems, we propose a memory-based temporal Transformer U-Net (MTTU-Net), which integrates a memory-based temporal Transformer module (MTTM) into U-Net. Specifically, MTTM uses the proposed D-ConvLSTM to pass temporal information sequentially in the form of memory, removing the limitation of the time-window paradigm. We also propose a Transformer-based interactive fusion approach that is dominated by the spatial features of the frame to be detected and supplemented by the temporal features in the memory, so that it can handle targets and backgrounds in various motion states. In addition, MTTM is divided into a temporal channel-cross Transformer module (TCTM) and a temporal space-cross Transformer module (TSTM), which achieve target feature enhancement and global background perception through interactive feature fusion along the channel and spatial dimensions, respectively. Extensive experiments on the IRDST and IDSMT datasets demonstrate that MTTU-Net outperforms existing MISTD algorithms and verify the effectiveness of the proposed modules.
Keywords: infrared small target; detection; moving targets; memory; transformer
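
The two ideas in the abstract, carrying temporal information forward as a memory instead of a fixed window, and fusing it with the current frame so that the current frame dominates, can be illustrated with a toy sketch. This is not the authors' D-ConvLSTM, TCTM, or TSTM: the gated running average below merely stands in for a recurrent cell, the attention is plain single-head cross-attention without learned projections, and every function name and the `gate` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_memory(memory, frame_feats, gate=0.5):
    """Blend the new frame into the running memory.

    Assumption: a fixed gate replaces the learned gates of a ConvLSTM-style cell.
    """
    return [
        [(1 - gate) * m + gate * f for m, f in zip(mem_tok, frame_tok)]
        for mem_tok, frame_tok in zip(memory, frame_feats)
    ]


def cross_attend(current, memory):
    """Queries come from the current frame, keys and values from the memory.

    The residual connection keeps the current-frame features dominant;
    the attended memory only supplements them.
    """
    dim = len(current[0])
    fused = []
    for query in current:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in memory
        ]
        weights = softmax(scores)
        context = [
            sum(w * value[j] for w, value in zip(weights, memory))
            for j in range(dim)
        ]
        fused.append([q + c for q, c in zip(query, context)])
    return fused


def run_sequence(frames):
    """Process frames one by one; the memory size never grows with sequence length."""
    memory = [token[:] for token in frames[0]]  # initialise from the first frame
    outputs = []
    for feats in frames:
        outputs.append(cross_attend(feats, memory))
        memory = update_memory(memory, feats)
    return outputs, memory


if __name__ == "__main__":
    # Three frames, each with two spatial tokens of dimension two.
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.8, 0.2], [0.2, 0.8]],
    ]
    outputs, memory = run_sequence(frames)
    print(len(outputs), len(memory), len(memory[0]))
```

Here attention runs across spatial tokens; a channel-wise variant would treat each channel as a token instead. Because the memory is updated once per frame, the sequence can be arbitrarily long without changing the size of the input to the fusion step.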

Share and Cite

MDPI and ACS Style

Feng, Z.; Zhang, W.; Liu, D.; Tao, X.; Su, A.; Yang, Y. Memory-Based Temporal Transformer U-Net for Multi-Frame Infrared Small Target Detection. Remote Sens. 2025, 17, 3801. https://doi.org/10.3390/rs17233801

AMA Style

Feng Z, Zhang W, Liu D, Tao X, Su A, Yang Y. Memory-Based Temporal Transformer U-Net for Multi-Frame Infrared Small Target Detection. Remote Sensing. 2025; 17(23):3801. https://doi.org/10.3390/rs17233801

Chicago/Turabian Style

Feng, Zicheng, Wenlong Zhang, Donghui Liu, Xingfu Tao, Ang Su, and Yixin Yang. 2025. "Memory-Based Temporal Transformer U-Net for Multi-Frame Infrared Small Target Detection" Remote Sensing 17, no. 23: 3801. https://doi.org/10.3390/rs17233801

APA Style

Feng, Z., Zhang, W., Liu, D., Tao, X., Su, A., & Yang, Y. (2025). Memory-Based Temporal Transformer U-Net for Multi-Frame Infrared Small Target Detection. Remote Sensing, 17(23), 3801. https://doi.org/10.3390/rs17233801
