Article

Memory-Based Temporal Transformer U-Net for Multi-Frame Infrared Small Target Detection

1 College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
2 School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2025, 17(23), 3801; https://doi.org/10.3390/rs17233801
Submission received: 15 October 2025 / Revised: 11 November 2025 / Accepted: 19 November 2025 / Published: 23 November 2025
(This article belongs to the Section AI Remote Sensing)

Highlights

What are the main findings?
  • We propose a memory-based temporal Transformer U-Net (MTTU-Net) for multi-frame infrared small target detection, which employs a memory mechanism to adaptively extract spatio-temporal features from long sequences, thereby achieving better detection performance.
  • MTTU-Net adopts a Transformer-based spatio-temporal feature interactive fusion approach, which can deal with targets and backgrounds with various motion states effectively.
What are the implications of the main findings?
  • Our proposed MTTU-Net overcomes the limitations imposed by the time window paradigm, which restricts spatio-temporal feature extraction in existing algorithms.
  • It also relieves the dependency on frame alignment that is typically required for target enhancement and background suppression, thereby adapting to more complex motion scenarios.

Abstract

In the field of infrared small target detection (ISTD), single-frame ISTD (SISTD), using only spatial features, cannot deal well with dim targets in cluttered backgrounds. In contrast, multi-frame ISTD (MISTD), utilizing spatio-temporal information from videos, can significantly enhance moving target features and effectively suppress background interference. However, current MISTD algorithms are limited by fixed-size time windows, resulting in an inability to adaptively adjust the input amount of spatio-temporal information for different detection scenarios. Moreover, utilizing spatio-temporal features remains a significant challenge in MISTD, particularly in scenarios involving slow-moving targets and fast-moving backgrounds. To address the above problems, we propose a memory-based temporal Transformer U-Net (MTTU-Net), which integrates a memory-based temporal Transformer module (MTTM) into U-Net. Specifically, MTTM utilizes the proposed D-ConvLSTM to sequentially transmit the temporal information in the form of memory, breaking through the limitation of the time window paradigm. And we propose a Transformer-based interactive fusion approach, which is dominated by spatial features of the to-be-detected frame and supplemented by temporal features in the memory, thereby effectively dealing with targets and backgrounds with various motion states. In addition, MTTM is divided into a temporal channel-cross Transformer module (TCTM) and a temporal space-cross Transformer module (TSTM), which achieve target feature enhancement and global background perception through feature interactive fusion in the channel and space dimensions, respectively. Extensive experiments on IRDST and IDSMT datasets demonstrate that our MTTU-Net outperforms existing MISTD algorithms, and they verify the effectiveness of the proposed modules.

1. Introduction

Infrared small target detection (ISTD) aims to accurately locate small targets in infrared images and videos [1]. Benefiting from the remarkable performance of infrared imaging under all-weather and low-light conditions, ISTD is widely used in intrusion warning, target guidance, and maritime rescue [2]. ISTD has been developing rapidly in recent years, and research focuses on two main challenges. Firstly, infrared targets are tiny (less than 9 × 9 pixels) and lack obvious appearance features, such as textures and shapes. Secondly, under interference from complex background clutter, the targets typically exhibit low contrast and a low signal-to-noise ratio (SNR).
For traditional schemes, model-driven detection algorithms have achieved impressive results, such as local-contrast-measure-based algorithms [3,4,5] and low-rank-based algorithms [6,7]. They analyze contrast and texture differences in image features to distinguish small targets from the background. However, they heavily depend on hand-crafted features and prior knowledge, lacking adaptability for diverse scenarios. In contrast, data-driven detection algorithms [8,9,10], mainly based on deep learning, have become the mainstream schemes due to their powerful learning ability for feature representations.
According to the form of input data, ISTD can be classified into single-frame ISTD (SISTD) and multi-frame ISTD (MISTD). SISTD performs static target detection in a single image using only spatial features [4,8,11,12], often with the advantage of low computational complexity. Nevertheless, it suffers from miss detection due to dim target features and false alarms caused by severe background interference. By contrast, MISTD performs moving target detection in the video sequence [13,14,15,16], which can utilize the inter-frame spatio-temporal information to enhance target features and suppress background clutter. Therefore, the research focus of MISTD centers on fully extracting spatio-temporal features from video sequences and effectively utilizing them to improve detection performance.
In terms of extracting spatio-temporal features, almost all existing MISTD algorithms adopt the time window paradigm [5,13,17]. Specifically, these algorithms set a sliding time window with size n (from t − n + 1 to t), where spatio-temporal features are extracted by simultaneously feeding n frames into backbones or U-net’s encoders, as shown in Figure 1a,b, to detect targets in the current frame (to-be-detected frame, Frame t). This paradigm causes computational redundancy in the overlapping part of time windows, reducing the running efficiency. Moreover, as a hyper-parameter, n can only be adjusted manually. If n is too small, algorithms will lack sufficient spatio-temporal information, consequently failing to detect dim targets in challenging scenarios. If n is too large, it will seriously increase the computational complexity and lead to lower efficiency. To balance performance and efficiency, MISTD algorithms with time windows typically extract spatio-temporal features from short sequences, such as five frames [13], though this represents a suboptimal compromise. Fundamentally, the time window paradigm restricts algorithms from adaptively determining the appropriate spatio-temporal information amount for different detection scenarios.
In terms of utilizing spatio-temporal features, current MISTD algorithms typically rely on frame alignment as a prerequisite to achieve target feature enhancement and background interference suppression. For example, Du et al. propose the inter-frame energy accumulation enhancement (IFEA) to enhance the target features [18]; STDMANet utilizes inter-frame differential features from aligned sequences to enhance motion target perception [19]; DTUM encodes inter-frame motion directions to obtain target motion features [17]. However, on the one hand, these algorithms suffer from performance degradation when detecting stationary or slow-moving targets. On the other hand, they are susceptible to false alarms caused by frame alignment errors, especially when the background moves dramatically.
In the video processing research field, the memory mechanism can perform efficient spatio-temporal modeling, which is widely used for various tasks, such as spatio-temporal prediction [20] and video object segmentation [21]. It has the significant advantage of adaptively extracting temporal information from long sequences. Meanwhile, some Transformer-based algorithms (e.g., TimeSformer [22] and VSR Transformer [23]) leverage global attention to adapt to minor inter-frame displacements and efficiently query spatio-temporal information for video-related tasks. Inspired by this, we propose to integrate a memory mechanism with a Transformer to overcome the limitations of the time window paradigm and address the drawbacks of the frame alignment scheme.
According to the above analysis, we propose MTTU-Net, a memory-based temporal Transformer U-Net. As shown in Figure 1c, our MTTU-Net adds a memory-based temporal Transformer module (MTTM) between the encoder and decoder of U-Net for achieving segmentation-based MISTD. Firstly, to break through the limitations of the time window paradigm, MTTM adopts the memory mechanism to store and transmit the spatio-temporal information. Secondly, to utilize spatio-temporal information without frame alignment as the premise, it uses a Transformer to implement the interactive fusion between the spatial features of the current frame and the temporal features in memory. Thirdly, to enhance the perception of targets and backgrounds, MTTM comprehensively implements channel attention and space attention.
In detail, MTTM consists of a temporal channel-cross Transformer module (TCTM) and a temporal space-cross Transformer module (TSTM): TCTM implements feature interactive fusion in the channel dimension to enhance target features for reducing misdetection, while TSTM implements feature interactive fusion in the space dimension to achieve global background perception for reducing false alarms. In TCTM and TSTM, the spatial features of the current frame are set as a query (Q) to be dominant, while the memory is set as a key (K) and value (V) to play an auxiliary role. This scheme makes our MTTU-Net query the temporal information in the memory to obtain performance gain based on detecting targets using the spatial information of the current frame. It not only maintains the sensitivity to stationary or slow-moving targets but also obtains high robustness to the rapid background movement. Additionally, we propose a dual-output convolutional long short-term memory network (D-ConvLSTM) to update the memory about K and V through the gating mechanism. Our codes are available at https://github.com/ZCFengF/MTTU-Net (accessed on 1 October 2025).
The main contributions of our work are summarized as follows:
(1)
We propose MTTU-Net, a memory-based temporal Transformer U-Net for MISTD, which utilizes the proposed D-ConvLSTM to save and update temporal information in memory. It overcomes the limitations of the time window, making it possible to adaptively extract adequate spatio-temporal features from long sequences (more than 10 frames) to improve detection performance.
(2)
We propose MTTM, which adopts a Transformer-based spatio-temporal feature interactive fusion approach. It is dominated by the spatial features of the current frame and supplemented by the temporal features in memory, which can deal with targets and backgrounds with various motion states effectively.
(3)
In MTTM, we present TCTM and TSTM to achieve target feature enhancement and global background perception through feature cross fusion in the channel and space dimensions, which reduce misdetection and false alarms, respectively.

2. Related Work

2.1. Single-Frame Infrared Small Target Detection

Single-frame infrared small target detection takes a single image as input and extracts its spatial features to detect targets; it is usually categorized into model-driven and data-driven algorithms.
Model-driven algorithms include spatial-filter-based, local-contrast-measure-based and low-rank-based algorithms. Spatial-filter-based algorithms utilize the characteristics of infrared images to enhance target features and suppress background clutter, such as Top-Hat [24] and FKRW [25]. Local-contrast-measure-based algorithms extract salient target regions by measuring the maximum contrast between central pixels and their neighboring regions, such as RLCM [4] and MPCM [3]. Low-rank-based algorithms separate small targets based on the low-rank characteristic of backgrounds, such as IPI [6] and RIPT [26]. Model-driven algorithms are suitable for small targets with high SNR, and they may produce serious false alarms in complex scenarios.
Data-driven algorithms primarily exploit deep neural networks to learn target feature representations by training on numerous labeled samples. These algorithms demonstrate outstanding performance in complex scenarios, making them a current research hotspot. For example, ISTDU-Net [27] adds merge connections into U-Net to enhance the difference between small targets and backgrounds. DNANet [11] designs a dense nested structure to enforce the connection between the encoder and decoder in U-Net. MTU-Net [9] combines Vision Transformer (ViT) [28] and a CNN to fuse the multi-level spatial features in U-Net. UIUNet [29] embeds small U-Nets into a large U-Net to learn multi-level feature representations. MSHNet [30] proposes the scale and location sensitive loss to optimize target localization and segmentation. SCTransNet [31] proposes a spatial-channel cross Transformer to realize the interactive fusion of multi-level semantic features in U-Net. In summary, most data-driven SISTD algorithms employ a U-shape network to segment targets from backgrounds, which has the advantage of locating targets and estimating shapes at the pixel level. Moreover, they focus on improving the connection between the encoder and decoder of U-Net to achieve higher performance. In particular, SCTransNet and MTU-Net verify the effectiveness of Transformer-based schemes in the channel and space dimensions, respectively.

2.2. Multi-Frame Infrared Small Target Detection

Multi-frame infrared small target detection takes multiple consecutive frames as input. Compared with SISTD, its advantage lies in extracting inter-frame spatio-temporal features to enhance moving target features and suppress background clutter. MISTD algorithms can also be classified as model-driven and data-driven.
Model-driven algorithms typically process spatio-temporal tensors constructed from multiple frames, including spatio-temporal local contrast measure and spatio-temporal low-rank tensor analysis. Spatio-temporal local contrast measure algorithms, such as STRL-LBCM [32] and STLCTD [5], detect targets by utilizing the contrast difference between targets and backgrounds among multiple frames. Spatio-temporal low-rank tensor analysis algorithms model the infrared image as the sum of target, background, and noise, and they attempt to separate targets from the background by applying low-rank and sparse decomposition (LRSD) to spatio-temporal tensors, such as NFTDGSTV [7] and FST-FLNN [33]. LRSD requires the alternating direction method of multipliers (ADMM), which is computationally inefficient.
Data-driven algorithms can more comprehensively mine the spatio-temporal information from multiple frames by deep neural networks. Researchers have proposed various algorithms to exploit the temporal information. For example, Liu et al. [34] utilize ConvLSTM [20] and 3D convolution to extract the spatio-temporal features from multiple frames, and they use fully connected layers to obtain the location and scale of targets. Du et al. [18] propose IFEA to enhance the target features in the current frame, then they use Faster-RCNN [35] to detect targets. TAD [36] detects small moving targets by capturing the inconsistency of pixel displacements between adjacent frames. ST-Trans [10] takes YOLO [37] as the detection framework and uses the video swin-Transformer [38] to fuse multi-frame spatio-temporal features. SSTNet [13] proposes a cross-slice ConvLSTM to guide CSPDarknet [37] to extract the spatio-temporal features from multiple frames, and it uses a motion-coupling neck to further fuse these features. Tridos [39] detects small targets by combining spatial, temporal, and frequency features of multiple frames. The above algorithms employ target-based frameworks, which perform target localization on downsampled feature maps (typical factor is 8×) and output bounding boxes as detection results, as shown in Figure 1a. These low-resolution feature maps lead to the dilution of small target features and difficulty in precise target localization. In contrast, some algorithms adopt segmentation-based frameworks to obtain target positions and shapes more accurately, as shown in Figure 1b. For example, STDMANet [19] acquires frame difference maps and then feeds them into the improved DNANet to detect moving targets. DTUM [17] replaces 2D convolution in U-net with 3D convolution to extract spatio-temporal features and encodes motion directions to obtain target motion information. LMAFormer [40] uses a local motion-aware Transformer for moving target segmentation. TSINF [16] introduces a spatial–temporal feature fusion Transformer at the deepest level of U-net to segment targets.
Furthermore, all the above algorithms adopt the time window paradigm, which cannot adaptively adjust the input amount of temporal information for various scenarios. Although some algorithms employ memory modules like ConvLSTM, they merely utilize these modules for extracting temporal information within the time window rather than constructing memory-based MISTD frameworks. Additionally, many of them rely on frame alignment to capture moving targets and mitigate dynamic background interference [17,18,19,40], but they often face two critical risks: difficulty in detecting stationary or slow-moving targets and susceptibility to frame alignment errors.

2.3. Memory Mechanism

The memory mechanism was first proposed for temporal modeling tasks, such as speech processing and temporal prediction, aiming to address the long-term dependency challenge. Its evolution diverged into two distinct pathways: (1) the implicit memory mechanism compresses the temporal information into the hidden tensor by gating units, which is efficient but capacity-limited, e.g., LSTM and GRU; (2) the explicit memory mechanism retrieves temporal information through external read–write storage, which is storage-scalable but structurally complex, e.g., MemNN [41] and NTM [42]. In recent years, similar memory mechanisms have been used in advanced vision tasks, e.g., visual tracking [43], spatio-temporal prediction [44], and video object segmentation [45]. Among these algorithms, some methods [44,45] group memory information into key and value features, similar to the Transformer structure, where the key is used for addressing and the essential value features are adaptively retrieved. To leverage temporal information efficiently, our proposed MTTU-Net combines a Transformer with the implicit memory mechanism.

3. Materials and Methods

We propose MTTU-Net, a memory-based temporal Transformer U-Net, whose main motivations include:
(1)
To break through the limitations of the time window paradigm, D-ConvLSTM is proposed to adaptively adjust the input amount of temporal information for different scenarios, as described in Section 3.2.3.
(2)
To handle challenging scenarios like slow-moving targets and fast-moving backgrounds, our proposed MTTM adopts a Transformer-based feature interactive fusion method, which is dominated by the spatial features of the current frame and supplemented by the temporal features in the memory, as described in Section 3.2.
(3)
To reduce misdetection and false alarms, MTTM integrates two components: TCTM fuses features in the channel dimension for enhancing target features, and TSTM fuses features in the space dimension for global background perception, as described in Section 3.2.1 and Section 3.2.2, respectively.
The overall pipeline of MTTU-Net is described in Section 3.1, which employs a segmentation-based framework to accurately localize targets and estimate their shapes.

3.1. Overall Pipeline

As shown in Figure 2, at the current time t, we only input the current frame (to-be-detected frame, Frame t) into MTTU-Net, and we use the memory $M^{t-1} = \{M_C^{t-1}, M_S^{t-1}\}$ to provide the spatio-temporal information to the model. Our MTTU-Net does not require frame alignment processing, and it differs from other MISTD algorithms that need to simultaneously input n frames within the time window (from Frame $t-n+1$ to Frame t) to extract spatio-temporal features. Concretely, in the encoder of the U-shape framework, MTTU-Net employs four DownBlocks, each consisting of a residual block (ResBlock) and Maxpooling, to extract multi-level spatial features $x_i^t \in \mathbb{R}^{C_i \times H/2^{i-1} \times W/2^{i-1}}$ ($i = 1, 2, 3, 4$). H and W are the height and width of the frame, and $C_i$ are the channel dimensions, which are set to 32, 64, 128, and 256, respectively. Next, we perform normalized patch embedding (NPE) on $x_i^t$ using convolutions with kernel size and stride of 16, 8, 4, and 2 to obtain embedded features $e_i^t \in \mathbb{R}^{c \times h \times w}$ with the same size, in which $c = 128$, $h = H/16$, $w = W/16$. Then, $e_i^t$ and $M^{t-1}$ are fed into the proposed MTTM to achieve multi-level spatio-temporal feature fusion, obtaining outputs $o_i^t \in \mathbb{R}^{c \times h \times w}$ and the updated memory $M^t = \{M_C^t, M_S^t\}$. Details of MTTM are provided in the next section. Further, the outputs are recovered to the sizes used in the encoder by a reconstruction operation (RO), which consists of bilinear interpolation and convolution, obtaining $r_i^t \in \mathbb{R}^{C_i \times H/2^{i-1} \times W/2^{i-1}}$. Meanwhile, we employ a residual connection to merge the features between the encoder and decoder. The process described above can be expressed mathematically as follows:
$r_i^t,\; M^t = x_i^t + \mathrm{RO}\left(\mathrm{MTTM}\left(e_i^t, M^{t-1}\right)\right) \quad (1)$
In the decoder of the U-shape framework, we use four UpBlocks, each consisting of upsampling, channel cross-attention (CCA) [46], and CBR (Conv + BN + ReLU), to fuse and decode the low-level and high-level features of $r_i^t$. Finally, the saliency map $\mathrm{Output}^t$ is obtained through 1 × 1 convolution and a sigmoid function for reducing dimension and mapping, respectively.
During model training, to enhance the gradient propagation efficiency and feature representation, we perform a temporal average on the multi-level deeply supervised fusion strategy of SCTransNet to optimize our MTTU-Net. Specifically, to make the model learn how to store and update the memory, we provide m consecutive frames in a batch of training samples. MTTU-Net processes these frames sequentially and calculates the result loss for each frame. The final $Loss$ is the average of the m losses, as follows:
$Loss = \frac{1}{m}\sum_{t=1}^{m}\left( l_{\Sigma}^{t}\left(O_{\Sigma}^{t}, Y^{t}\right) + \sum_{i=1}^{5} l_{i}^{t}\left(O_{i}^{t}, Y^{t}\right) \right) \quad (2)$
where $Y^t \in \mathbb{R}^{1 \times H \times W}$ is the ground truth of frame t, $l_i^t$ is the loss of the upsampled output $O_i^t \in \mathbb{R}^{1 \times H \times W}$ at level i of frame t, and $l_{\Sigma}^t$ is the loss of the all-level fusion output $O_{\Sigma}^t \in \mathbb{R}^{1 \times H \times W}$ of frame t. These losses are calculated using the binary cross entropy (BCE); please refer to [31] for more details.
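As a concrete illustration of the memory-carrying training procedure and the averaged loss in Eq. (2), the following is a minimal PyTorch sketch. The model interface (one frame plus the previous memory in, per-level outputs plus the updated memory out) is an assumption for illustration, not the released implementation.

```python
import torch
import torch.nn.functional as F

def train_step(model, frames, masks, optimizer):
    """One optimization step over m consecutive frames of a sequence (Eq. (2)).

    frames: (m, 1, H, W) consecutive infrared frames of one training sample
    masks:  (m, 1, H, W) ground-truth masks Y^t
    model:  assumed interface -> outputs, memory = model(frame, memory)
    """
    memory = None                 # the model is assumed to zero-initialize its memory
    total_loss = 0.0
    m = frames.shape[0]
    for t in range(m):            # process frames sequentially, carrying the memory forward
        outputs, memory = model(frames[t:t + 1], memory)
        # outputs: the five upsampled level outputs O_i^t plus the fused output O_Sigma^t
        frame_loss = sum(F.binary_cross_entropy(o, masks[t:t + 1]) for o in outputs)
        total_loss = total_loss + frame_loss
    loss = total_loss / m         # average of the m per-frame losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```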

3.2. Memory-Based Temporal Transformer Module

MTTM takes multi-level embedded features $e_i^t$ as input and outputs spatio-temporal fused features $o_i^t$ with the same sizes. Specifically, MTTM is composed of TCTM and TSTM, which are connected in a cascaded manner. In both TCTM and TSTM, D-ConvLSTM is used to store and update the memory, serving as the core component of extracting and transmitting temporal information. In addition, the core of utilizing temporal information lies in the Transformer-based spatio-temporal interactive fusion method, where the multi-level spatial features of the current frame are set as the dominant query (Q), and the temporal features in the memory are set as the supplementary key (K) and value (V).

3.2.1. Temporal Channel-Cross Transformer Module

As shown in the left of Figure 3, TCTM establishes channel-wise dependencies between the current frame and the memory, aiming to make the model focus on more discriminative target features to avoid miss detection, as shown in Figure 4a. Given the four levels of encoded features $e_i^t$, TCTM concatenates them in the channel dimension and performs layer normalization (LN) to obtain the input tokens $I_C^t \in \mathbb{R}^{1 \times 4c \times h \times w}$ (the subscript C denotes TCTM). Next, $I_C^t$ is processed along two paths (the Q Path and the K & V Path) to obtain $Q_C^t$, $K_C^t$, and $V_C^t$.
Firstly, in the Q Path, as shown by the red line, $I_C^t$ is processed to obtain $Q_C^t \in \mathbb{R}^{1 \times 4c \times h \times w}$ by utilizing a 1 × 1 convolution to consolidate pixel-wise cross-channel context and then applying a 3 × 3 depth-wise convolution to capture local spatial context. Mathematically,
$Q_C^t = W_C^d\left(W_C^p\left(I_C^t\right)\right) \quad (3)$
where $W_C^p$ is the 1 × 1 point-wise convolution and $W_C^d$ is the 3 × 3 depth-wise convolution. $Q_C^t$ contains the multi-level spatial features of the current frame.
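For reference, the Q Path of Eq. (3) maps onto two standard convolution layers; a minimal sketch, assuming the channel count stays at 4c throughout:

```python
import torch.nn as nn

class QPath(nn.Module):
    """Q Path of TCTM (Eq. (3)): 1x1 point-wise conv followed by 3x3 depth-wise conv."""
    def __init__(self, channels):
        super().__init__()
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)            # W_C^p
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)                    # W_C^d

    def forward(self, tokens):
        # tokens: layer-normalized input I_C^t of shape (1, 4c, h, w)
        return self.depthwise(self.pointwise(tokens))                             # Q_C^t
```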
Secondly, in the K & V Path, as shown by the blue line, $I_C^t$ is processed to obtain $K_C^t \in \mathbb{R}^{1 \times 4c \times h \times w}$ and $V_C^t \in \mathbb{R}^{1 \times 4c \times h \times w}$ by using the proposed D-ConvLSTM with the additional inputs $K_C^{t-1}$ and $V_C^{t-1}$. Meanwhile, the previous memory $M_C^{t-1}$ is updated as $M_C^t \in \mathbb{R}^{1 \times 4c \times h \times w}$. The above process is as follows:
$M_C^t, K_C^t, V_C^t = \text{D-ConvLSTM}\left(I_C^t, M_C^{t-1}, K_C^{t-1}, V_C^{t-1}\right) \quad (4)$
where $M_C^t$ is only used as an intermediate variable in D-ConvLSTM and does not participate in any other processing. $K_C^t$ and $V_C^t$ contain the spatio-temporal features of the current frame and the previous frames.
Finally, the cross-attention is applied in the channel dimension as
$\text{CrossAtt}\left(Q_C^t, K_C^t, V_C^t\right) = A_C^t V_C^t = \text{Softmax}\left(\frac{Q_C^t \left(K_C^t\right)^T}{d}\right) V_C^t \quad (5)$
$CA_C^t = \text{CFN}\left(\text{RO}\left(\text{CrossAtt}\left(Q_C^t, K_C^t, V_C^t\right)\right)\right) \quad (6)$
where $d = 4c$ is an optional scaling factor, $A_C^t \in \mathbb{R}^{4c \times 4c}$ is the channel covariance-based attention map, and $CA_C^t \in \mathbb{R}^{1 \times 4c \times h \times w}$ is the result of the cross attention, which is split along the channel dimension to obtain the final TCTM output $c_i^t \in \mathbb{R}^{c \times h \times w}$. The complementary feed-forward network (CFN) [31] is designed for detecting infrared small targets, and it can efficiently combine global and local information.
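To make the channel-dimension attention of Eq. (5) concrete, here is a minimal single-head sketch in which each channel acts as a token and spatial positions are flattened; the CFN and RO steps of Eq. (6) are omitted.

```python
import torch

def channel_cross_attention(Q, K, V):
    """Channel-wise cross-attention (Eq. (5)).

    Q, K, V: (1, 4c, h, w); Q from the current frame, K and V from the memory path.
    The attention map A_C^t has shape (4c, 4c), i.e., channels attend to channels.
    """
    _, c4, h, w = Q.shape
    q, k, v = Q.flatten(2), K.flatten(2), V.flatten(2)         # each: (1, 4c, h*w)
    attn = torch.softmax(q @ k.transpose(1, 2) / c4, dim=-1)   # scaled by d = 4c, (1, 4c, 4c)
    out = attn @ v                                             # (1, 4c, h*w)
    return out.reshape(1, c4, h, w)
```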
TCTM achieves two kinds of feature interactive fusion simultaneously. One is the interactive fusion of multi-level spatial features in the current frame, which is conducive to the semantic interaction between different levels. The other is the interactive fusion between the spatial features of the current frame and the temporal features of the memory. Note that the multi-level spatial features of the current frame, which are contained in $Q_C^t$, participate in both kinds of fusion simultaneously and thus play a dominant role in the detection process. In contrast, the spatio-temporal features in $K_C^t$ and $V_C^t$ only play an auxiliary role because they are involved only in the second kind of fusion. Based on using the spatial information of the current frame for detecting targets, this strategy enables the model to query the effective temporal information to obtain a performance gain. It offers the distinct advantages of maintaining sensitivity to slow-moving targets and avoiding the interference caused by rapid background motion.

3.2.2. Temporal Space-Cross Transformer Module

In MISTD, the spatio-temporal features of backgrounds are crucial to suppressing clutter and eliminating false alarms. TSTM establishes space-wise dependencies between the current frame and the memory, aiming to enhance the global perception of backgrounds. As shown in the right of Figure 3, the structure of TSTM is similar to that of TCTM, but there are three differences: (1) The inputs $c_i^t$ are concatenated along the batch dimension instead of the channel dimension to maintain the channel independence of the multi-level features in subsequent processing. (2) To reduce the computational complexity, pyramid pooling [48] is used to compress the space dimensions of $K_S^t$ and $V_S^t$ (the subscript S denotes TSTM). (3) The cross attention is implemented in the space dimension.
In detail, the concatenated $c_i^t$ is normalized using LN to obtain the input tokens $I_S^t \in \mathbb{R}^{4 \times c \times h \times w}$. Then, according to (7) and (8), we obtain $Q_S^t \in \mathbb{R}^{4 \times c \times h \times w}$, $K_S^t \in \mathbb{R}^{4 \times c \times h \times w}$, and $V_S^t \in \mathbb{R}^{4 \times c \times h \times w}$ through the Q Path and the K & V Path, respectively.
$Q_S^t = W_S^d\left(W_S^p\left(I_S^t\right)\right) \quad (7)$
$M_S^t, K_S^t, V_S^t = \text{D-ConvLSTM}\left(I_S^t, M_S^{t-1}, K_S^{t-1}, V_S^{t-1}\right) \quad (8)$
where $W_S^p$ is the 1 × 1 point-wise convolution and $W_S^d$ is the 3 × 3 depth-wise convolution. The previous memory $M_S^{t-1}$ is updated as the current memory $M_S^t \in \mathbb{R}^{4 \times c \times h \times w}$.
Next, $K_S^t$ and $V_S^t$ are compressed using pyramid pooling (PP) to obtain $\tilde{K}_S^t \in \mathbb{R}^{4 \times c \times l}$ and $\tilde{V}_S^t \in \mathbb{R}^{4 \times c \times l}$, respectively, where $l = 110$, as shown in Figure 5. The essence of PP is multi-scale sparse sampling, which reduces the computational complexity of the subsequent cross attention from $O\left(2c(4hw)^2\right)$ to $O\left(2c(4l)(4hw)\right)$.
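A sketch of this compression, assuming adaptive average pooling at the bin sizes [1, 3, 6, 8] reported in Section 4.3; flattening and concatenating the pooled grids yields l = 1² + 3² + 6² + 8² = 110 tokens per channel.

```python
import torch
import torch.nn.functional as F

def pyramid_pool(x, bins=(1, 3, 6, 8)):
    """Compress the spatial dimensions of K_S^t / V_S^t by multi-scale pooling.

    x: (4, c, h, w)  ->  (4, c, l) with l = sum(b * b for b in bins) = 110
    """
    pooled = [F.adaptive_avg_pool2d(x, b).flatten(2) for b in bins]   # each: (4, c, b*b)
    return torch.cat(pooled, dim=2)                                    # (4, c, 110)
```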
Finally, the cross-attention is applied in the space dimension as
$\text{CrossAtt}\left(Q_S^t, \tilde{K}_S^t, \tilde{V}_S^t\right) = A_S^t \tilde{V}_S^t = \text{Softmax}\left(\frac{Q_S^t \left(\tilde{K}_S^t\right)^T}{d}\right) \tilde{V}_S^t \quad (9)$
$CA_S^t = \text{CFN}\left(\text{RO}\left(\text{CrossAtt}\left(Q_S^t, \tilde{K}_S^t, \tilde{V}_S^t\right)\right)\right) \quad (10)$
where $d = c$ is an optional scaling factor, $A_S^t \in \mathbb{R}^{4hw \times 4l}$ is the space covariance-based attention map, and $CA_S^t \in \mathbb{R}^{4 \times c \times h \times w}$ is the result of the cross attention, which is split along the batch dimension to obtain the final TSTM output $o_i^t \in \mathbb{R}^{c \times h \times w}$.
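For comparison with TCTM, a minimal sketch of the space-dimension attention in Eq. (9): every pixel of the current frame is a query token, while the keys and values are the 4l pyramid-pooled memory tokens, which is where the complexity reduction comes from. The token layout is an illustrative assumption.

```python
import torch

def spatial_cross_attention(Q, K_tilde, V_tilde):
    """Space-wise cross-attention (Eq. (9)) with pyramid-pooled keys/values.

    Q:        (4, c, h, w)  queries from the current frame
    K_tilde:  (4, c, l)     compressed memory keys   (l = 110)
    V_tilde:  (4, c, l)     compressed memory values (l = 110)
    """
    b, c, h, w = Q.shape
    q = Q.flatten(2).transpose(1, 2)                          # (4, h*w, c): pixels as tokens
    k = K_tilde.transpose(1, 2)                               # (4, l, c)
    v = V_tilde.transpose(1, 2)                               # (4, l, c)
    attn = torch.softmax(q @ k.transpose(1, 2) / c, dim=-1)   # scaled by d = c, (4, h*w, l)
    out = attn @ v                                            # (4, h*w, c)
    return out.transpose(1, 2).reshape(b, c, h, w)
```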
Similar to TCTM, the above processing is dominated by the features of the current frame, and it performs two kinds of feature interactive fusion simultaneously. One is the interactive fusion among the spatial features at different levels in the current frame, achieving multi-scale global background perception. The other is the feature interactive fusion between the current frame and the memory to utilize the temporal information of dynamic backgrounds. Overall, as shown in Figure 4b, TSTM enables our model to perceive more complete background regions, which is beneficial to reduce false alarms caused by background interference.

3.2.3. Dual-Output Convolutional LSTM

As a two-dimensional variant of the original LSTM, ConvLSTM can process two-dimensional image data by replacing the fully connected layers with convolutional layers. It is efficient and assigns more attention to recent temporal information, making it well-suited for the MISTD task, where recent temporal context is intuitively more crucial for detecting targets in the current frame. However, the limited receptive field of ConvLSTM makes it difficult to extract the temporal features of fast-moving scenarios. The common solution is to stack multiple memory units to expand the receptive field, such as PredRNN++ [49], yet this not only significantly increases the computational complexity but also causes the vanishing gradient problem. In our detection framework, the dual-output ConvLSTM (D-ConvLSTM) can naturally extend its receptive field by incorporating the external U-Net encoders, which have multi-scale receptive fields. Thus, D-ConvLSTM only needs to take the concatenated multi-level features $I^t \in \{I_C^t, I_S^t\}$ (the subscripts C and S denote TCTM and TSTM, respectively) as input to alleviate this problem. Additionally, D-ConvLSTM sets two output branches to meet the requirement of obtaining $K^t \in \{K_C^t, K_S^t\}$ and $V^t \in \{V_C^t, V_S^t\}$ in TCTM and TSTM.
As shown in Figure 6, D-ConvLSTM takes $I^t$, $K^{t-1}$, and $V^{t-1}$ as input and outputs $K^t$ and $V^t$. $M^{t-1} \in \{M_C^{t-1}, M_S^{t-1}\}$ represents the previous memory that stores the temporal information before the current frame t, and it is updated every frame. Concretely, D-ConvLSTM sets a forget gate $f^t$ to selectively discard irrelevant information in the memory, an input gate $i^t$ to regulate the memory update, and two output gates $o_K^t$, $o_V^t$ to generate $K^t$ and $V^t$, which can integrate spatio-temporal features. Among them, $o_K^t$ and $o_V^t$ are mainly controlled by $K^{t-1}$ and $V^{t-1}$, respectively, which facilitates maintaining the functional consistency of $K^t$ and $V^t$ as key and value in the sequence processing. The above gating mechanism enables the model to adaptively extract a longer span of temporal information. The detailed process is as follows:
$\begin{aligned} f^t, i^t, \tilde{M}^t &= \phi\left(\mathrm{Ch}\left(W_d^M\left(W_p^M\left(\left[I^t, K^{t-1}, V^{t-1}\right]\right)\right)\right)\right) \\ o_K^t &= \mathrm{Sigmoid}\left(W_d^K\left(W_p^K\left(\left[I^t, K^{t-1}\right]\right)\right)\right) \\ o_V^t &= \mathrm{Sigmoid}\left(W_d^V\left(W_p^V\left(\left[I^t, V^{t-1}\right]\right)\right)\right) \\ M^t &= f^t \odot M^{t-1} + i^t \odot \tilde{M}^t \\ K^t &= o_K^t \odot \mathrm{Tanh}\left(M^t\right) \\ V^t &= o_V^t \odot \mathrm{Tanh}\left(M^t\right) \end{aligned} \quad (11)$
where $\phi$ denotes the activation function, $\odot$ represents the element-wise product, $[\cdot]$ indicates channel concatenation, Ch stands for chunking channels, $W_p^{(\cdot)}$ is the 1 × 1 point-wise convolution, $W_d^{(\cdot)}$ is the 3 × 3 depth-wise convolution, and $M^t$ is the updated memory. The size of all variables is $b \times \tilde{c} \times h \times w$, where b is the batch size and $\tilde{c}$ is the channel number. In TCTM, $b = 1$ and $\tilde{c} = 4c$; in TSTM, $b = 4$ and $\tilde{c} = c$.
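The gating equations in Eq. (11) translate almost directly into a recurrent cell; below is a minimal PyTorch sketch. The point-wise/depth-wise convolution pairs and the channel chunking follow the text, while the choice of activations (sigmoid for the forget/input gates, tanh for the candidate memory, standing in for the unspecified φ) is an assumption.

```python
import torch
import torch.nn as nn

class DConvLSTM(nn.Module):
    """Dual-output ConvLSTM cell sketch (Eq. (11))."""
    def __init__(self, channels):
        super().__init__()
        def pw_dw(in_ch, out_ch):
            # 1x1 point-wise conv (W_p) followed by 3x3 depth-wise conv (W_d)
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=1),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, groups=out_ch))
        self.gates_m = pw_dw(3 * channels, 3 * channels)  # -> f^t, i^t, M~^t (chunked)
        self.gate_k = pw_dw(2 * channels, channels)       # -> o_K^t
        self.gate_v = pw_dw(2 * channels, channels)       # -> o_V^t

    def forward(self, I, M_prev, K_prev, V_prev):
        f, i, m_cand = torch.chunk(
            self.gates_m(torch.cat([I, K_prev, V_prev], dim=1)), 3, dim=1)
        f, i, m_cand = torch.sigmoid(f), torch.sigmoid(i), torch.tanh(m_cand)
        o_k = torch.sigmoid(self.gate_k(torch.cat([I, K_prev], dim=1)))   # key output gate
        o_v = torch.sigmoid(self.gate_v(torch.cat([I, V_prev], dim=1)))   # value output gate
        M = f * M_prev + i * m_cand          # memory update
        K = o_k * torch.tanh(M)              # dual outputs K^t and V^t
        V = o_v * torch.tanh(M)
        return M, K, V
```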
In terms of information transmission, $K^t$ and $V^t$ fuse the spatial features of the current frame with the temporal features of the memory. In terms of model design, D-ConvLSTM replaces the convolutional layers originally used to obtain K and V in the Transformer, which is the core of realizing the memory-based temporal Transformer in our work.

4. Results

4.1. Datasets

We conducted experiments using two MISTD datasets with mask annotations, IRDST [47] and IDSMT [50]. IRDST is a real dataset, which contains real infrared targets and backgrounds captured by handheld infrared cameras (long-wave, 7.5–13.5 μm and 7–13 μm). According to [13], its training set consists of 42 videos containing 20,398 frames, and the test set consists of 43 videos containing 20,258 frames. The resolution of each frame is uniformly adjusted to 480 × 720. IDSMT is a semi-synthetic dataset, which contains real infrared backgrounds captured by UAV-carried infrared cameras (long-wave, 8–14 μm) and simulated infrared targets generated using an adversarial generation network. Its training set and test set each contain 100 videos; each video contains 300 frames, and the resolution of each frame is 512 × 640. These two datasets cover diverse scenarios, e.g., clouds, buildings, and vegetation, where targets and backgrounds are in motion. All targets are smaller than 9 × 9 pixels, and the majority of them exhibit an SNR of less than 5 dB.

4.2. Evaluation Metrics

We compare the proposed MTTU-Net with several SOTA algorithms using two kinds of evaluation metrics: target-level and pixel-level.
Target-level metrics are used to measure the localization ability, including precision (P) (%), recall (R) (%), and F1 score (%). According to [11], a target is considered correctly predicted if its centroid deviation is less than 3 pixels.
$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times P \times R}{P + R} \quad (12)$
where $TP$, $FP$, and $FN$ denote the numbers of correctly detected targets (true positives), false alarms (false positives), and missed targets (false negatives), respectively. According to [11,31], recall is also referred to as the probability of detection ($P_d$). As a comprehensive metric, the F1 score combines both precision and recall.
Pixel-level metrics are used to measure the shape description ability, including Intersection over Union ($IoU$) (%) and false alarm rate ($F_a$).
$IoU = \frac{A_{inter}}{A_{union}}, \quad F_a = \frac{P_{false}}{P_{all}} \quad (13)$
where $A_{inter}$ and $A_{union}$ represent the intersection and union areas, respectively, and $F_a$ is the ratio of falsely predicted pixels $P_{false}$ over all pixels $P_{all}$.
We use F1 and $IoU$ as the main evaluation metrics. In addition to the fixed-threshold evaluation, we also utilize Precision–Recall (P-R) curves and Receiver Operating Characteristic (ROC) curves to comprehensively evaluate algorithms. The ROC curve describes the changing trend of $P_d$ under varying $F_a$.
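A minimal sketch of how these metrics can be computed from binary masks and target-level counts; the target matching itself (centroid extraction and the 3-pixel deviation test) is left abstract here, and the helper names are illustrative.

```python
import numpy as np

def pixel_metrics(pred, gt):
    """IoU and false-alarm rate F_a (Eq. (13)) for binary masks of shape (H, W)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union > 0 else 0.0
    fa = np.logical_and(pred, ~gt).sum() / pred.size   # falsely predicted pixels / all pixels
    return iou, fa

def target_metrics(tp, fp, fn):
    """Precision, recall, and F1 (Eq. (12)) from target-level counts."""
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```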

4.3. Implementation Details

We employ ResU-Net [51] as our detection backbone; the number of downsampling layers is 4, and the basic width is set to 32. The kernel sizes and stride sizes for NPE are 16, 8, 4, and 2, and the number of channels c is 128. The number of MTTMs is 2, so the total number of TCTMs and TSTMs is 4, and the memory is initialized with zero tensors. According to [48], the pyramid pooling sizes are set to [1, 3, 6, 8]. Our MTTU-Net does not use any pre-trained weights for training; every image undergoes normalization and random cropping into 256 × 256 patches. To avoid over-fitting, we augment the training data through random flipping and rotation. We initialize the weights and biases of our model using the Kaiming initialization method. The model is trained using the BCE loss function and optimized by the Adam optimizer with an initial learning rate of 0.001, and the learning rate is gradually decreased to $1 \times 10^{-5}$ using the Cosine Annealing strategy. The number of epochs is set to 100, the batch size is set to 8, and the number of consecutive frames m in each batch is set to 5. Following [11,31], the fixed threshold to segment the saliency map is set to 0.5. The proposed MTTU-Net is implemented on a single Nvidia GeForce 3090 GPU.
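The optimization setup described above maps onto standard PyTorch components; a minimal sketch with a stand-in module in place of the full network:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # stand-in; substitute the MTTU-Net model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)            # initial lr 0.001
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-5)                              # cosine decay to 1e-5

for epoch in range(100):                             # 100 epochs, batch size 8
    # ... train on batches of m = 5 consecutive 256 x 256 crops ...
    scheduler.step()
```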

4.4. Comparisons with Other Algorithms

To evaluate the performance of our algorithm, we compare MTTU-Net to 15 SOTA algorithms, including 9 SISTD algorithms and 6 MISTD algorithms. Specifically, SISTD algorithms include model-driven algorithms (Top-Hat [24], FKRW [25], MPCM [3], RIPT [26]) and data-driven algorithms (ISTDU-Net [27], DNANet [11], RDIAN [47], MSHNet [30], SCTransNet [31]). MISTD algorithms include model-driven algorithms (NFTDGSTV [7], STRL-LBCM [32]) and data-driven algorithms (TAD [36], SSTNet [13], Tridos [39], DTUM [17]). To guarantee an equitable comparison, we retrained all the learning-based algorithms using the same training datasets as our MTTU-Net.

4.4.1. Quantitative Comparison

The quantitative results are shown in Table 1. Overall, the model-driven algorithms, both SISTD algorithms and MISTD algorithms, perform worse on all metrics and have a large gap with the data-driven algorithms. A major reason is that model-driven algorithms heavily depend on hand-crafted features and prior knowledge, lacking adaptability for diverse scenarios.
For data-driven algorithms, existing MISTD algorithms do not demonstrate absolute performance advantages over SISTD algorithms, and some MISTD algorithms even perform worse. One possible reason is that segmentation-based detection frameworks have a remarkable advantage in the small target detection task because they preserve high-resolution feature representations for pixel-level target localization. Thus, they obtain excellent detection results using only the spatial features of a single frame. For example, MSHNet achieves a high F1 of 93.02% and a high $IoU$ of 60.64% on IRDST, and SCTransNet achieves a high F1 of 92.77% and a high $IoU$ of 73.59% on IDSMT. In contrast, target-based detection frameworks downsample feature maps through backbones, causing small target features to be diluted. This restricts their performance even though spatio-temporal information is utilized. For example, SSTNet and Tridos utilize CSPDarknet to extract features and use the anchor-based method to detect targets on low-resolution feature maps. Additionally, segmentation-based MISTD is still in its primary stage, and existing algorithms fail to effectively combine the U-shape segmentation framework and spatio-temporal feature utilization. For instance, DTUM simply replaces the 2D convolution in U-Net with 3D convolution to extract spatio-temporal features from aligned multiple frames. Its results have a high false alarm rate when dealing with complex motion, with an $F_a$ of $4.28 \times 10^{-6}$ on IRDST and an $F_a$ of $4.68 \times 10^{-6}$ on IDSMT.
Our MTTU-Net achieves the best performance on most evaluation metrics over IRDST and IDSMT, especially F1 and $IoU$. This proves that our algorithm performs well in both target localization and shape prediction. Moreover, MTTU-Net has the lowest $F_a$, indicating that our segmentation results have fewer false alarm pixels. Specifically, on IRDST, although MTTU-Net's 93.78% is slightly lower than SSTNet's 94.01% in terms of P, and its 94.82% is lower than RDIAN's 95.62% in terms of R, MTTU-Net still obtains the highest F1 of 94.30%, achieving a superior balance between P and R. In terms of pixel-level metrics, MTTU-Net obtains the highest $IoU$ of 60.71% and the lowest $F_a$ of $1.02 \times 10^{-6}$. On IDSMT, MTTU-Net demonstrates the best performance across all metrics, outperforming the second-place SCTransNet by a large margin, leading by 2.71% and 2.93% in terms of F1 and $IoU$, respectively. In addition, in terms of F1 and $IoU$, most algorithms show drastic performance fluctuations between the two datasets, e.g., MSHNet, SSTNet, and Tridos, while MTTU-Net and SCTransNet remain stable and show strong robustness to different scenarios. This may be attributed to the powerful attention mechanism of the Transformer, which improves the adaptability of models. Figure 7 shows the P-R curves and ROC curves of the data-driven algorithms. The larger the area under the curve, the better the algorithm's performance. By comparison, on IRDST, our P-R curve shows a slight advantage, while our $P_d$-$F_a$ curve shows an obvious improvement. On IDSMT, both of our curves distinctly exceed the curves of other algorithms. This verifies that our MTTU-Net has the best overall performance with the best balance between P and R, as well as between $P_d$ and $F_a$.

4.4.2. Qualitative Comparison

The qualitative results of nine representative algorithms on IRDST and IDSMT are given in Figure 8, where the target-based algorithms take bounding boxes as output and the segmentation-based algorithms take binary images as output. Model-driven algorithms, such as Top-Hat and STRL-LBCM, produce a large number of false alarms and missed detections in complex scenarios. Conversely, data-driven algorithms exhibit remarkable adaptability to complex scenarios by training on large-scale datasets, and they can accurately detect small targets even in challenging urban scenarios, as shown in Figure 8(4). Furthermore, target-based algorithms, e.g., TAD, SSTNet, and Tridos, usually have small positional biases in their output bounding boxes. This is because they employ generic detection frameworks that localize targets on low-resolution feature maps (typically obtained by 8× downsampling) and hence struggle to precisely locate small targets of merely several pixels. Segmentation-based algorithms avoid this problem by preserving high-resolution feature representations.
Overall, our MTTU-Net successfully detects targets in various scenarios. Figure 8(2) illustrates a difficult case of a target close to a tree branch, where all SISTD algorithms fail to detect it, while MISTD algorithms, e.g., SSTNet, Tridos, and our MTTU-Net, utilize the spatio-temporal information to avoid the interference and detect it successfully. Figure 8(3) and Figure 8(5) show cases of a slow-moving target and a fast-moving background, respectively, which do not affect SISTD algorithms, such as DNANet and SCTransNet, but increase the difficulty of effectively utilizing spatio-temporal information for MISTD algorithms. For example, Tridos and DTUM miss slow-moving targets; when backgrounds move fast, TAD, SSTNet, and Tridos produce missed detections, and DTUM, which requires frame alignment, generates severe false alarms. In contrast, our MTTU-Net is dominated by the spatial features of the current frame and queries valid temporal information from the memory to enhance performance. This approach not only maintains the sensitivity to slow-moving targets but also effectively avoids the interference caused by fast-moving backgrounds.

5. Discussion

5.1. Effects of Different Components

MTTU-Net adopts a segmentation-based detection framework for MISTD, including ResU-Net for building the main structure, CCA for enhancing decoding, and deep supervision (DS) for model training. In Table 2, by incrementally incorporating these components, we can observe that the algorithm performance improves consistently, which validates their effectiveness for ISTD. Then, by adding our MTTM into the above framework, the algorithm performance is greatly improved, which verifies the effectiveness of MTTM. Specifically, MTTM enhances F1 by 2.58% and $IoU$ by 4.17% on IRDST, and it enhances F1 by 3.63% and $IoU$ by 7.16% on IDSMT. In MTTM, TCTM improves the performance more significantly than TSTM, achieving a higher F1 by 0.43% and a higher $IoU$ by 0.52% on IRDST, and a higher F1 by 1.97% and a higher $IoU$ by 1.94% on IDSMT, while combining the two modules achieves the best performance. This indicates that although the target semantic feature enhancement achieved by channel attention is more important for detecting infrared small targets, the global background perception achieved by space attention is still indispensable.
When two MTTMs are used, TCTM and TSTM can be arranged in four ways, and Table 3 shows the effects of different arrangements on algorithm performance. Among them, the alternate arrangement works better, such as CSCS and SCSC, and there is no obvious difference between them. One possible reason is that the alternate mode is more conducive to the complementation of TCTM and TSTM.

5.2. Ablation Study in TCTM and TSTM

Keeping the optimal setting of TSTM, we analyze the impact of CFN and the proposed D-ConvLSTM in TCTM. Specifically, we compare CFN with FFN and D-ConvLSTM with the convolutional layers; both FFN and the convolutional layers are originally used in the Transformer [28]. Namely, we generate $K_C^t$ and $V_C^t$ in the same way as $Q_C^t$. As shown in Table 4, CFN improves algorithm performance at the cost of a slight increase in computation and parameters. D-ConvLSTM allows TCTM to utilize the temporal features to enhance the target features and achieves a significant performance improvement. However, D-ConvLSTM needs to store and update the memory, resulting in more parameters and computation.
Keeping the optimal setting of TCTM, we analyze the impact of batch concatenation (BC), CFN, D-ConvLSTM, and pyramid pooling (PP) in TSTM. Correspondingly, we use channel concatenation, FFN, and convolutional layers as comparison settings, respectively. By analyzing Table 5, three conclusions can be obtained as follows:
(1)
Similar to TCTM, both CFN and D-ConvLSTM in TSTM improve algorithm performance effectively. Note that, comparing Table 2 and Table 5, using TSTM without D-ConvLSTM (the second row of Table 5) results in lower evaluation metrics than not using TSTM at all (the fourth row of Table 2). This indicates that TSTM is beneficial to our MTTU-Net only when it uses spatio-temporal features; when it uses only spatial features, the model becomes less robust against background clutter.
(2)
Pyramid pooling does not increase model parameters. Moreover, it reduces the computation of cross-attention in the space dimension by about 26 G, which greatly improves the running speed from 7.57 fps to 15.07 fps without performance degradation.
(3)
Compared with channel concatenation, batch concatenation reduces the channel number of input tokens so that the parameters and computation amount of CFN and D-ConvLSTM in TSTM are significantly less than those in TCTM. In addition, batch concatenation can maintain the channel independence of multi-level features to avoid confusion in the cross-attention, which makes our MTTU-Net achieve higher performance.

5.3. Effects of Different Input Forms in D-ConvLSTM

As the outputs of D-ConvLSTM, $K^t$ and $V^t$ serve as critical interfaces for TCTM and TSTM to leverage temporal information. They are separately controlled by the output gates $o_K^t$ and $o_V^t$, which are influenced by different input forms, as shown in Table 6. The simplest input form is $K^{t-1}$ or $V^{t-1}$, but it yields the worst results due to the lack of interaction with the current-frame information $I^t$. The most comprehensive input form is the concatenated $[I^t, K^{t-1}, V^{t-1}]$, but it obscures the functional consistency of $K^t$ and $V^t$ as key and value during sequence processing, ultimately leading to performance degradation. In contrast, the optimal input form is the concatenated $[I^t, K^{t-1}]$ or $[I^t, V^{t-1}]$, which effectively avoids the above problems and obtains the best result.

5.4. How MTTM Works

To demonstrate the effectiveness of MTTM, we give two typical examples to explain how MTTM works, as shown in Figure 9. According to (1), in MTTU-Net, the decoder's input $r_i^t$ is the sum of the encoder's output $x_i^t$ and the output $o_i^t$ obtained by processing $x_i^t$ using MTTM. Therefore, the effect of MTTM on MTTU-Net can be observed by comparing $x_i^t$ and $r_i^t$. Analyzing Figure 9, we can obtain three conclusions as follows:
(1)
Comparing $x_i^t$ and $r_i^t$, MTTM is able to highlight the target regions for enhancing target features through the attention mechanism, regardless of whether D-ConvLSTM is used or not. This verifies that the Transformer-based structure in MTTU-Net is effective.
(2)
Comparing $r_i^t$ at different levels, the higher the level, the better the target feature enhancement. This proves that a larger receptive field is beneficial for extracting the spatial context information to recognize small targets, and it also verifies that deep semantic information is crucial for ISTD.
(3)
Comparing $r_i^t$ with and without D-ConvLSTM, we find that utilizing the spatio-temporal features can effectively suppress background interference to reduce false alarms, as shown in Figure 9(1). Additionally, it greatly enhances the target features to avoid misdetection; for instance, the lower left target shown in Figure 9(2) can be accurately located even when the target is almost immersed in the background due to thermal crossover.
To further verify the effectiveness of MTTM, for the two examples in Figure 9, we observe the changes in their 4th-level feature $r_4^t$ by constraining MTTU-Net to run in different time ranges, as shown in Figure 10. In a small time range, MTTU-Net cannot obtain enough spatio-temporal information to detect targets, resulting in false alarms and missed detections. As the time range increases, MTTU-Net can obtain more spatio-temporal information, which not only effectively suppresses background noise and reduces false alarms but also further enhances the features of real targets. Benefiting from the memory-based scheme, MTTU-Net can adaptively adjust the required amount of temporal information. As shown in Figure 10, the first scenario utilizes the temporal information of 11 frames to completely eliminate the false alarm, while the second scenario requires only 6 frames.
In contrast, existing MISTD algorithms usually adopt the time window paradigm, where a fixed number of consecutive frames, such as five frames, are simultaneously fed into the model. Thus, they can only extract spatio-temporal features from short sequences and cannot perform adaptive adjustment, which is not the optimal way to utilize spatio-temporal information. Figure 11 illustrates the performance changes of MTTU-Net in different time ranges. When the length of the time range exceeds five frames, the performance curves still maintain an increasing trend, which verifies the necessity of extracting spatio-temporal features from long sequences. When the length of the time range reaches 11 frames (from $t-10$ to t) on IRDST and 8 frames (from $t-7$ to t) on IDSMT, the growth of the performance curves nearly stagnates. This indicates that our MTTU-Net can utilize more than ten frames of temporal information through the implicit memory mechanism, while adaptively adjusting the amount of temporal information used for different scenarios.

5.5. Core Hyper-Parameter Analysis

As shown in Table 7, we evaluate the impact of hyper-parameters on the performance of MTTU-Net, including the number N of MTTMs and the number c of channels in MTTM. When N is set to 2, namely, the total number of TCTMs and TSTMs is 4, MTTU-Net has the best overall performance on both datasets. When $N = 3$, F1 on IDSMT is optimal, but it requires more parameters and computation. In addition, as c increases, MTTU-Net can handle more complex scenarios, so its performance gradually improves. When $c = 128$, the performance of MTTU-Net reaches its peak, but when $c = 192$, its performance degrades due to overfitting. Overall, the optimal values of N and c are 2 and 128, respectively.
During training, we provide m consecutive frames in a batch of samples. MTTU-Net processes these frames sequentially and calculates the output loss for every frame. The value of m affects the ability of MTTU-Net to exploit spatio-temporal information, and Table 8 shows the effect of the training hyper-parameter m on algorithm performance. As m increases, the performance of MTTU-Net gradually improves, which indicates that the model is better able to learn how to extract and utilize spatio-temporal features from long sequences. When m is set to 5, the algorithm performance reaches a relatively satisfactory level. Considering that larger m values require more memory, we set $m = 5$. Note that training the model using five consecutive frames does not mean that it can only exploit the spatio-temporal information from five frames during inference.

6. Conclusions

This paper proposed a memory-based temporal Transformer U-Net (MTTU-Net) for MISTD. It addresses two problems: breaking through the limitation of the time window paradigm and dealing with targets and backgrounds in various motion states. In detail, we propose a memory-based temporal Transformer module (MTTM), which implements interactive fusion between the multi-level spatial features of the to-be-detected frame and the spatio-temporal features in the memory in the channel and space dimensions, respectively. Comparison experiments on IRDST and IDSMT demonstrate the superiority of our MTTU-Net over existing MISTD algorithms. Moreover, ablation studies further verify the effectiveness and merits of all elaborately designed components, including TCTM, TSTM, and D-ConvLSTM. In future work, we will combine implicit and explicit memory mechanisms to explore the impact of temporal information from longer sequences on the MISTD task.

Author Contributions

Methodology, Z.F. and W.Z.; software, Z.F. and D.L.; validation, A.S. and D.L.; investigation, W.Z., X.T., and Y.Y.; writing, Z.F. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the National Natural Science Foundation of China (12202485).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Correction Statement

This article has been republished with a minor correction to the Funding statement. This change does not affect the scientific content of the article.

References

  1. Zhao, M.; Li, W.; Li, L.; Hu, J.; Ma, P.; Tao, R. Single-frame infrared small-target detection: A survey. IEEE Geosci. Remote Sens. Mag. 2022, 10, 87–119. [Google Scholar] [CrossRef]
  2. Kou, R.; Wang, C.; Peng, Z.; Zhao, Z.; Chen, Y.; Han, J.; Huang, F.; Yu, Y.; Fu, Q. Infrared small target segmentation networks: A survey. Pattern Recognit. 2023, 143, 109788. [Google Scholar] [CrossRef]
  3. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226. [Google Scholar] [CrossRef]
  4. Han, J.; Liang, K.; Zhou, B.; Zhu, X.; Zhao, J.; Zhao, L. Infrared Small Target Detection Utilizing the Multiscale Relative Local Contrast Measure. IEEE Geosci. Remote Sens. Lett. 2018, 15, 612–616. [Google Scholar] [CrossRef]
  5. Zhao, B.; Xiao, S.; Lu, H.; Wu, D. Spatial-temporal local contrast for moving point target detection in space-based infrared imaging system. Infrared Phys. Technol. 2018, 95, 53–60. [Google Scholar] [CrossRef]
  6. Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared patch-image model for small target detection in a single image. IEEE Trans. Image Process. 2013, 22, 4996–5009. [Google Scholar] [CrossRef]
  7. Liu, T.; Yang, J.; Li, B.; Wang, Y.; An, W. Infrared small target detection via nonconvex tensor Tucker decomposition with factor prior. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5617317. [Google Scholar] [CrossRef]
  8. Yan, S.; Chen, R.; Sang, H.; Zhou, Y.; Long, J.; Cai, N.; Xu, S.; Chen, J. Multihop anchor-free network with tolerance-adjustable measure for infrared tiny target detection. IEEE Trans. Instrum. Meas. 2025, 74, 5502011. [Google Scholar] [CrossRef]
  9. Wu, T.; Li, B.; Luo, Y.; Wang, Y.; Xiao, C.; Liu, T.; Yang, J.; An, W.; Guo, Y. MTU-Net: Multilevel TransUNet for space-based infrared tiny ship detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5601015. [Google Scholar] [CrossRef]
  10. Tong, X.; Zuo, Z.; Su, S.; Wei, J.; Sun, X.; Wu, P.; Zhao, Z. ST-Trans: Spatial-temporal Transformer for infrared small target detection in sequential images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5001819. [Google Scholar] [CrossRef]
  11. Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target detection. IEEE Trans. Image Process. 2023, 32, 1745–1758. [Google Scholar] [CrossRef]
  12. Xiao, X.; Lian, S.; Luo, Z.; Li, S. Infrared Small Target Detection Using Directional Derivative Correlation Filtering and a Relative Intensity Contrast Measure. Remote Sens. 2023, 61, 1921. [Google Scholar] [CrossRef]
  13. Chen, S.; Ji, L.; Zhu, J.; Ye, M.; Yao, X. SSTNet: Sliced spatio-temporal network with cross-slice ConvLSTM for moving infrared dim-small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5000912. [Google Scholar] [CrossRef]
  14. Li, N.; Yang, X.; Zhao, H. DBMSTN: A dual branch multiscale spatio-temporal network for dim-small target detection in infrared image. Pattern Recognit. 2025, 162, 111372. [Google Scholar] [CrossRef]
  15. Zhou, F.; Fu, M.; Qian, Y.; Yang, J.; Da, Y. Sparse prior is not all you need: When differential directionality meets saliency coherence for infrared small target detection. IEEE Trans. Instrum. Meas. 2024, 73, 5039818. [Google Scholar] [CrossRef]
  16. Ma, T.; Wang, H.; Liang, J.; Wang, Y.; Peng, J.; Kai, Z.; Liu, X. Temporal-spatial information fusion network for multiframe infrared small target detection. IEEE Trans. Instrum. Meas. 2025, 74, 4505219. [Google Scholar] [CrossRef]
  17. Li, R.; An, W.; Xiao, C.; Li, B.; Wang, Y.; Li, M.; Guo, Y. Direction-coded temporal U-shape module for multiframe infrared small target detection. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 555–568. [Google Scholar] [CrossRef]
  18. Du, J.; Lu, H.; Zhang, L.; Hu, M.; Chen, S.; Deng, Y. A spatial-temporal feature-based detection framework for infrared dim small target. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3000412. [Google Scholar] [CrossRef]
  19. Yan, P.; Hou, R.; Duan, X.; Yue, C.; Wang, X.; Cao, X. STDMANet: Spatio-temporal differential multiscale attention network for small moving infrared target detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5602516. [Google Scholar] [CrossRef]
  20. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.; Woo, W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
  21. Oh, S.; Lee, J.; Xu, N.; Kim, S. Video object segmentation using space-time memory networks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9225–9234. [Google Scholar]
  22. Bertasius, G.; Wang, H.; Torresani, L. Is space-time attention all you need for video understanding? In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual, 18–24 July 2021. [Google Scholar]
  23. Shi, S.; Gu, J.; Xie, L.; Wang, X.; Yang, Y.; Dong, C. Rethinking alignment in video super-resolution transformers. In Proceedings of the 36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 36081–36093. [Google Scholar]
  24. Bai, X.; Zhou, F. Analysis of new top-hat transformation and the application for infrared dim small target detection. Pattern Recognit. 2010, 43, 2145–2156. [Google Scholar] [CrossRef]
  25. Qin, Y.; Bruzzone, L.; Gao, C.; Li, B. Infrared small target detection based on facet kernel and random walker. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7104–7118. [Google Scholar] [CrossRef]
  26. Dai, Y.; Wu, Y. Reweighted infrared patch-tensor model with both nonlocal and local priors for single-frame small target detection. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2017, 10, 3752–3767. [Google Scholar] [CrossRef]
  27. Hou, Q.; Zhang, L.; Tan, F.; Xi, Y.; Zheng, H.; Li, N. ISTDU-Net: Infrared small-target detection U-Net. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  28. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  29. Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-net in u-net for infrared small object detection. IEEE Trans. Image Process. 2022, 32, 364–376. [Google Scholar] [CrossRef]
  30. Liu, Q.; Liu, R.; Zheng, B.; Wang, H.; Fu, Y. Infrared small target detection with scale and location sensitivity. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–18 June 2024; pp. 17490–17499. [Google Scholar]
  31. Yuan, S.; Qin, H.; Yan, X.; Akhtar, N.; Mian, A. SCTransNet: Spatial-channel cross Transformer network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5002615. [Google Scholar] [CrossRef]
  32. Yuan, S.; Qin, H.; Yan, X.; Akhtar, N.; Mian, A. Spatial-temporal tensor representation learning with priors for infrared small target detection. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 9598–9620. [Google Scholar] [CrossRef]
  33. Luo, Y.; Li, X.; Chen, S. Feedback spatial-temporal infrared small target detection based on orthogonal subspace projection. IEEE Trans. Geosci. Remote Sens. 2023, 62, 5001919. [Google Scholar] [CrossRef]
  34. Liu, X.; Li, X.; Li, L.; Su, X.; Chen, F. Dim and small target detection in multi-frame sequence using bi-Conv-LSTM and 3D-conv structure. IEEE Access 2021, 9, 135845–135855. [Google Scholar] [CrossRef]
  35. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  36. Cui, Y.; Song, T.; Wu, G.; Wang, L. A real-time and lightweight method for tiny airborne object detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  37. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
  38. Liu, Z.; Ning, J.; Cao, Y.; Wei, Y.; Zhang, Z.; Lin, S.; Hu, H. Video Swin Transformer. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 3192–3201. [Google Scholar]
  39. Duan, W.; Ji, L.; Chen, S.; Zhu, S.; Ye, M. Triple-domain feature learning with frequency-aware memory enhancement for moving infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5006014. [Google Scholar] [CrossRef]
  40. Huang, Y.; Zhi, X.; Hu, J.; Yu, L.; Han, Q.; Chen, W. LMAFormer: Local motion aware Transformer for small moving infrared target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5008117. [Google Scholar] [CrossRef]
  41. Sukhbaatar, S.; Szlam, A.; Weston, J.; Fergus, R. End-To-End Memory Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  42. Graves, A.; Wayne, G.; Danihelka, I. Neural turing machines. arXiv 2014, arXiv:1410.5401. [Google Scholar] [CrossRef]
  43. Fu, Z.; Liu, Q.; Fu, Z.; Wang, Y. STMTrack: Template-free visual tracking with space-time memory networks. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13774–13783. [Google Scholar]
  44. Tang, S.; Li, C.; Zhang, P.; Tang, R. SwinLSTM: Improving spatiotemporal prediction accuracy using Swin Transformer and LSTM. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 13470–13479. [Google Scholar]
  45. Xie, H.; Yao, H.; Zhou, S.; Zhou, S.; Sun, W. Efficient regional memory network for video object segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 1286–1295. [Google Scholar]
  46. Wang, H.; Cao, P.; Wang, J.; Zaiane, O.R. UCTransnet: Rethinking the skip connections in u-net from a channel-wise perspective with transformer. In Proceedings of the AAAI-22 Technical Tracks 3, Virtual, 22 February–1 March 2022; Volume 36, pp. 2441–2449. [Google Scholar]
  47. Sun, H.; Bai, J.; Yang, F.; Bai, X. Receptive-field and direction induced attention network for infrared dim small target detection with a large-scale dataset IRDST. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  48. Zhu, Z.; Xu, M.; Bai, S.; Huang, T.; Bai, X. Asymmetric non-local neural networks for semantic segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  49. Wang, Y.; Gao, Z.; Long, M. PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018; pp. 4904–4913. [Google Scholar]
  50. Feng, Z.; Zhang, W.; Sun, X.; Guo, L.; Liu, D. A Semi-Synthetic Dataset of Infrared Dim Small Moving Targets for Detection and Segmentation. 2025. Available online: https://www.scidb.cn/en/detail?dataSetId=36901b64578d4384a9144f57194c866e (accessed on 1 October 2025).
  51. Xiao, X.; Lian, S.; Luo, Z.; Li, S. Weighted res-unet for high-quality retina vessel segmentation. In Proceedings of the 2018 9th International Conference on Information Technology in Medicine and Education (ITME), Hangzhou, China, 19–21 October 2018; pp. 327–331. [Google Scholar]
Figure 1. Comparison of existing MISTD frameworks and our proposed memory-based MISTD framework. (a) Target-based MISTD framework with time window. (b) Segmentation-based MISTD framework with time window. (c) Our segmentation-based MISTD framework with the memory (M).
Figure 2. Overview of the proposed MTTU-Net for multi-frame infrared small target detection. Our MTTU-Net adopts a U-shape structure and adds a memory-based temporal Transformer module (MTTM), which consists of a temporal channel-cross Transformer module (TCTM) and a temporal space-cross Transformer module (TSTM).
Figure 3. The structures of the temporal channel-cross Transformer module (TCTM) and temporal space-cross Transformer module (TSTM). The two modules are arranged in an alternate mode when multiple MTTMs are used.
Figure 4. Visualization of feature maps r_t with or without TCTM and with or without TSTM. (a,b) from IRDST [47].
Figure 5. Pyramid pooling for compressing the space dimension in TSTM.
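As an illustration of the pyramid-pooling idea named in Figure 5, the short PyTorch sketch below compresses a feature map into a small set of multi-scale tokens before attention. It is an indicative example only; the pooling output sizes (1, 3, 6, 8) and the function name are our assumptions, not the settings used in TSTM.

```python
# Sketch: shrink the key/value token count via multi-scale adaptive pooling.
import torch
import torch.nn.functional as F

def pyramid_pool_tokens(feat: torch.Tensor, sizes=(1, 3, 6, 8)) -> torch.Tensor:
    # feat: (B, C, H, W) feature map; returns (B, N, C) with N = sum(s*s) tokens
    pooled = [F.adaptive_avg_pool2d(feat, s).flatten(2) for s in sizes]  # each (B, C, s*s)
    return torch.cat(pooled, dim=2).transpose(1, 2)  # (B, N, C) compressed tokens
```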
Figure 6. Dual-output convolutional LSTM (D-ConvLSTM) for storing and updating the memory as well as outputting updated K and V.
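A rough PyTorch sketch of a dual-output ConvLSTM cell in the spirit of Figure 6 follows, with the two output gates driven by [I_t, K_{t-1}] and [I_t, V_{t-1}] as listed in Table 6. How the shared cell state is gated internally is an assumption on our part, and the layer names are hypothetical rather than the authors' exact formulation.

```python
# Sketch of a dual-output ConvLSTM cell: one shared cell state, two output gates
# producing the key memory K_t and the value memory V_t.
import torch
import torch.nn as nn

class DConvLSTMCell(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        p = kernel_size // 2
        # input, forget, and candidate gates from the current input and both memories (assumed)
        self.gates = nn.Conv2d(3 * channels, 3 * channels, kernel_size, padding=p)
        # two separate output gates, following the input forms in Table 6
        self.out_k = nn.Conv2d(2 * channels, channels, kernel_size, padding=p)
        self.out_v = nn.Conv2d(2 * channels, channels, kernel_size, padding=p)

    def forward(self, i_t, state):
        k_prev, v_prev, c_prev = state            # previous K, V, and cell state
        z = self.gates(torch.cat([i_t, k_prev, v_prev], dim=1))
        i_g, f_g, g = torch.chunk(z, 3, dim=1)
        c_t = torch.sigmoid(f_g) * c_prev + torch.sigmoid(i_g) * torch.tanh(g)
        o_k = torch.sigmoid(self.out_k(torch.cat([i_t, k_prev], dim=1)))
        o_v = torch.sigmoid(self.out_v(torch.cat([i_t, v_prev], dim=1)))
        k_t = o_k * torch.tanh(c_t)               # updated key memory
        v_t = o_v * torch.tanh(c_t)               # updated value memory
        return k_t, v_t, (k_t, v_t, c_t)
```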
Figure 7. P-R curves and ROC curves of different algorithms on IRDST and IDSMT.
Figure 8. Visual results obtained by 9 representative algorithms on IRDST and IDSMT. Circles in blue, yellow, and red represent correctly detected targets, misdetections, and false alarms, respectively.
Figure 9. Visualization maps of multi-level features r_t with or without D-ConvLSTM. x_t is the encoder's output feature. Circles in yellow and red represent misdetections and false alarms, respectively. (1) from IRDST and (2) from IDSMT.
Figure 10. Changes in the 4th-level feature r_4^t of the two test images when different time ranges are set. t denotes the current frame, and t−i→t denotes the time range from frame t−i to the current frame t. Circles in yellow and red represent misdetections and false alarms, respectively, while the circle in green indicates that the false alarm is completely eliminated.
Figure 11. The performance of MTTU-Net in different time ranges. t−i indicates that the detection result of frame t is obtained using the spatio-temporal information from frame t−i to frame t.
Table 1. Quantitative comparison results of different SOTA algorithms on IRDST and IDSMT. The best and second best results are highlighted in red and blue, respectively. # denotes the target-based MISTD algorithm.
Schemes | Tasks | Algorithms | IRDST: P (%) / R (%) / F1 (%) / IoU (%) / Fa (10^-6) | IDSMT: P (%) / R (%) / F1 (%) / IoU (%) / Fa (10^-6)
Model-driven | SISTD | Top-Hat [24] | 16.59 / 84.46 / 27.73 / 18.19 / 47.92 | 3.94 / 37.14 / 7.13 / 6.51 / 109.49
Model-driven | SISTD | FKRW [25] | 6.97 / 67.99 / 12.64 / 6.61 / 81.75 | 11.62 / 23.85 / 15.63 / 6.52 / 27.81
Model-driven | SISTD | MPCM [3] | 1.10 / 48.33 / 2.15 / 1.70 / 564.09 | 0.23 / 50.44 / 0.47 / 0.33 / 4830.45
Model-driven | SISTD | RIPT [26] | 1.88 / 74.99 / 3.68 / 1.22 / 1596.37 | 0.64 / 44.46 / 1.27 / 0.46 / 4023.38
Model-driven | MISTD | NFTDGSTV [7] | 0.22 / 5.98 / 0.43 / 0.08 / 1063.39 | 0.38 / 34.10 / 0.75 / 0.56 / 1409.45
Model-driven | MISTD | STRL-LBCM [32] | 33.56 / 7.32 / 12.02 / 1.92 / 2.24 | 11.32 / 19.67 / 14.37 / 4.83 / 28.15
Data-driven | SISTD | ISTDU-Net [27] | 86.24 / 95.21 / 90.50 / 57.00 / 3.54 | 96.59 / 86.36 / 91.19 / 67.74 / 0.60
Data-driven | SISTD | DNANet [11] | 90.94 / 94.47 / 92.67 / 60.49 / 1.91 | 93.91 / 83.75 / 88.54 / 69.03 / 1.35
Data-driven | SISTD | RDIAN [47] | 68.01 / 95.62 / 79.48 / 46.39 / 11.22 | 81.82 / 85.88 / 83.80 / 63.08 / 3.89
Data-driven | SISTD | MSHNet [30] | 92.00 / 94.06 / 93.02 / 60.64 / 1.28 | 94.22 / 81.01 / 87.11 / 64.18 / 1.39
Data-driven | SISTD | SCTransNet [31] | 87.67 / 94.95 / 91.16 / 58.65 / 3.09 | 95.16 / 90.50 / 92.77 / 73.59 / 0.86
Data-driven | MISTD | TAD # [36] | 83.20 / 85.48 / 84.32 / - / - | 89.85 / 64.28 / 74.94 / - / -
Data-driven | MISTD | SSTNet # [13] | 94.01 / 90.40 / 92.17 / - / - | 93.26 / 58.76 / 72.10 / - / -
Data-driven | MISTD | Tridos # [39] | 90.87 / 95.44 / 93.10 / - / - | 92.13 / 73.23 / 81.60 / - / -
Data-driven | MISTD | DTUM [17] | 84.86 / 88.17 / 86.48 / 51.12 / 4.28 | 71.63 / 90.33 / 79.90 / 69.92 / 4.68
Data-driven | MISTD | MTTU-Net (Ours) | 93.78 / 94.82 / 94.30 / 60.71 / 1.02 | 97.57 / 93.49 / 95.48 / 76.52 / 0.41
Table 2. Ablation study of different components in MTTU-Net on IRDST and IDSMT.
ResU-Net | CCA | DS | MTTM (TCTM / TSTM) | IRDST F1 (%)/IoU (%) | IDSMT F1 (%)/IoU (%)
90.68/55.27 | 89.26/66.48
91.43/56.08 | 90.05/67.67
91.72/56.54 | 91.85/69.36
92.46/58.13 | 95.10/76.03
92.03/57.61 | 93.13/74.09
94.30/60.71 | 95.48/76.52
Table 3. F1(%)/IoU(%) values achieved by arranging TCTM (C) and TSTM (S) in different ways.
Permutations | CCSS | SSCC | SCSC | CSCS
IRDST | 92.93/59.25 | 92.80/59.71 | 94.28/60.53 | 94.30/60.71
IDSMT | 94.06/75.01 | 94.50/75.46 | 95.91/76.31 | 95.48/76.52
Table 4. Ablation study of CFN and D-ConvLSTM in TCTM.
CFN | D-ConvLSTM | IRDST F1 (%)/IoU (%) | IDSMT F1 (%)/IoU (%) | Params (M) ↓ | Flops (G) ↓ | Speed (FPS) ↑
93.16/58.08 | 93.33/74.52 | 12.80 | 122.73 | 15.52
93.41/58.54 | 93.83/74.69 | 14.27 | 126.30 | 15.35
94.15/60.29 | 95.25/76.11 | 18.59 | 132.84 | 15.23
94.30/60.71 | 95.48/76.52 | 20.06 | 141.85 | 15.07
Table 5. Ablation study of BC, CFN, D-ConvLSTM, and PP in TSTM.
BC | CFN | D-ConvLSTM | PP | IRDST F1 (%)/IoU (%) | IDSMT F1 (%)/IoU (%) | Params (M) ↓ | Flops (G) ↓ | Speed (FPS) ↑
91.90/57.96 | 94.20/74.18 | 19.59 | 162.81 | 7.68
92.44/58.36 | 94.64/74.60 | 19.70 | 163.96 | 7.61
94.21/60.50 | 95.85/76.03 | 20.06 | 168.33 | 7.57
94.30/60.71 | 95.48/76.52 | 20.06 | 141.85 | 15.07
93.88/59.94 | 95.26/75.94 | 32.77 | 168.36 | 14.67
Table 6. F1/IoU values achieved by using different input forms for output gates in D-ConvLSTM.
Input form for o_K^t | Input form for o_V^t | IRDST F1 (%)/IoU (%) | IDSMT F1 (%)/IoU (%)
K_{t-1} | [V_{t-1}, V_{t-1}] | 92.55/58.41 | 94.13/74.09
[I_t, K_{t-1}] | [I_t, V_{t-1}] | 94.30/60.71 | 95.48/76.52
[I_t, K_{t-1}, V_{t-1}] | [I_t, K_{t-1}, V_{t-1}] | 93.84/59.86 | 94.96/75.66
Table 7. Hyper-parameter study of the number of MTTMs and the number of channels in MTTM.
Hyper-param | IRDST F1 (%) / IoU (%) | IDSMT F1 (%) / IoU (%) | Params (M) ↓ | Flops (G) ↓ | Speed (FPS) ↑
The number of MTTMs
N = 1 | 93.84 / 59.82 | 94.94 / 75.55 | 12.83 | 116.77 | 16.86
N = 2 | 94.30 / 60.71 | 95.48 / 76.52 | 20.06 | 141.85 | 15.07
N = 3 | 94.22 / 60.26 | 95.94 / 75.95 | 27.30 | 166.94 | 13.80
N = 4 | 93.91 / 59.94 | 95.32 / 76.01 | 34.53 | 192.03 | 12.63
The number of channels in MTTM
c = 32 | 93.06 / 59.36 | 94.66 / 75.07 | 5.02 | 88.22 | 16.60
c = 64 | 93.65 / 59.90 | 94.99 / 75.89 | 8.25 | 100.57 | 16.24
c = 128 | 94.30 / 60.71 | 95.48 / 76.52 | 20.06 | 141.85 | 15.07
c = 192 | 94.15 / 60.31 | 95.01 / 76.16 | 39.02 | 205.23 | 14.17
Table 8. F1(%)/IoU(%) values achieved using different training hyper-parameter m values.
m | 3 | 4 | 5 | 6
IRDST | 93.78/60.12 | 94.02/60.34 | 94.30/60.71 | 94.22/60.79
IDSMT | 93.95/74.40 | 94.72/75.33 | 95.48/76.52 | 96.03/76.42
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
