FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection
Abstract
1. Introduction
- Sparse intrinsic features. Infrared targets typically occupy only a few pixels and therefore lack distinctive shape and texture features.
- Low contrast. Infrared imaging relies on the thermal radiation of a scene and therefore lacks color information. The boundary between targets and background is highly indistinct, so targets exhibit low saliency.
- Dense clutter in complex scenes. Complex scenes contain heavy clutter and noise whose energy and structure resemble those of targets, making high-precision separation difficult.
- This paper proposes FM-Net, which effectively integrates frequency information to improve infrared small target detection. Wavelet residual blocks (WRBs) in the encoder extract multi-level frequency details from feature maps and enlarge the receptive field to perceive local context.
- In place of plain skip connections, a frequency-modulation masked-attention module (FMM) is proposed to fuse semantics across full-scale features and to learn long-range contextual associations in infrared images.
- Experimental results on three benchmark datasets (SIRST, IRSTD-1k, and NUDT-SIRST) demonstrate that our FM-Net achieves superior detection performance.
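The frequency decomposition that the WRBs build on can be illustrated with a one-level 2-D Haar wavelet transform. The sketch below is purely illustrative (it is not the paper's actual wavelet layer, which operates on multi-channel feature maps inside a residual block); it shows how a single map splits into one low-frequency and three high-frequency subbands:

```python
import numpy as np

def haar_dwt2(x):
    # One level of the 2-D Haar wavelet transform: split a (H, W) map
    # into a low-frequency approximation (LL) and three high-frequency
    # detail subbands (LH, HL, HH), each at half resolution.
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0   # row-wise average
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0   # row-wise difference
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2): each subband halves the spatial resolution
```

Because a perfectly smooth input produces all-zero detail subbands, the LH/HL/HH channels isolate exactly the high-frequency content (edges, small bright spots) that tiny infrared targets live in.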
2. Related Works
2.1. Infrared Small Target Detection
2.2. Mask Attention
2.3. Frequency Learning on Vision Tasks
3. Methodology
3.1. Overall Architecture
3.2. Wavelet Residual Block
3.3. Frequency-Modulation Masked-Attention Module
3.3.1. Mask Attention Mechanism
3.3.2. Channel-Wise Frequency Modulation Module
4. Experiments
4.1. Evaluation Metrics
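As a concrete reference for the tables below, the pixel-level metrics can be computed as in the following sketch. It uses the definitions common in the IRSTD literature (IoU over target pixels; Fa as false-alarm pixels over all pixels) and is not necessarily the authors' exact evaluation protocol; Pd is typically computed per target via centroid matching and is omitted here for brevity:

```python
import numpy as np

def pixel_metrics(pred, gt):
    # pred, gt: binary masks of the same shape (prediction / ground truth).
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # correctly detected target pixels
    fp = np.logical_and(pred, ~gt).sum()   # false-alarm pixels
    fn = np.logical_and(~pred, gt).sum()   # missed target pixels
    iou = tp / (tp + fp + fn)              # intersection over union
    fa = fp / pred.size                    # false alarms per total pixels
    return iou, fa

gt = np.zeros((8, 8), bool); gt[2:4, 2:4] = True      # 4-pixel target
pred = np.zeros((8, 8), bool); pred[2:4, 2:5] = True  # 6-pixel prediction
iou, fa = pixel_metrics(pred, gt)
print(round(iou, 3), round(fa, 5))  # 0.667 0.03125
```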
4.2. Experiment Setting
4.3. Quantitative Results
4.4. Qualitative Results
4.5. Ablation Study
4.5.1. Network Framework
4.5.2. Impact of WRB
- FM-Net w/o WL. We remove the wavelet layer and use only the residual block to extract features in the encoder.
- FM-Net w/o CBAM. We remove all CBAMs in the WRB and use only convolutions to process the frequency maps.
- FM-Net w j = 1. We set the deepest level of the wavelet layer to one, performing feature extraction and enhancement only once.
- FM-Net w j = 3. We set the deepest level of the wavelet layer to three, so the wavelet pyramid has three levels.
- FM-Net w db2. We change the type of wavelet transform, using the Daubechies wavelet of order 2.
- FM-Net w coif1. We change the type of wavelet transform, using the Coiflet wavelet of order 1.
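The depth parameter j varied above can be pictured as a recursive wavelet pyramid: each level keeps three detail subbands and recurses on the low-frequency approximation. The sketch below uses the Haar wavelet for simplicity (the db2 and coif1 variants only swap the filter pair) and is not the paper's exact layer:

```python
import numpy as np

def haar_pyramid(x, j):
    # Build a j-level Haar wavelet pyramid: per level, store the three
    # detail subbands (LH, HL, HH) and recurse on the LL approximation,
    # so a deeper pyramid captures coarser, wider context.
    details = []
    for _ in range(j):
        lo = (x[:, 0::2] + x[:, 1::2]) / 2.0
        hi = (x[:, 0::2] - x[:, 1::2]) / 2.0
        ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
        lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
        hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
        hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
        details.append((lh, hl, hh))
        x = ll  # recurse on the low-frequency band
    return x, details

x = np.random.default_rng(0).normal(size=(16, 16))
ll, details = haar_pyramid(x, 3)
print(ll.shape, [d[0].shape for d in details])  # (2, 2) [(8, 8), (4, 4), (2, 2)]
```

Each extra level halves the resolution again, which is why j = 3 trades finer detail for a wider effective receptive field relative to the default.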
4.5.3. Impact of FMM
- FM-Net w/o FMM. We remove the FMM and use plain skip connections to pass encoded features to the decoder.
- FM-Net w cat. We remove the FMM and instead aggregate the encoded features by concatenation followed by convolutions.
- FM-Net w/o MA. We remove the MA module from the FMM to investigate the benefit of MA.
- FM-Net w MA2. We change the number of MA modules in the FMM; this variant cascades two MA modules for feature extraction.
- FM-Net w MA3. This variant cascades three MA modules in the FMM.
- FM-Net w/o CFM. We remove the CFM from the FMM to investigate the benefit of CFM.
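The masked-attention (MA) mechanism these variants probe restricts each query's attention to a predicted foreground region, in the style of Mask2Former. The single-head sketch below illustrates only the mechanism, not the FMM's actual wiring or dimensions:

```python
import numpy as np

def masked_attention(q, k, v, mask):
    # mask: boolean (n_q, n_k); False positions are suppressed with -inf
    # before the softmax, so each query attends only inside its mask.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # scaled dot-product scores
    scores = np.where(mask, scores, -np.inf)      # suppress masked positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # softmax over keys
    return w @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
mask = np.ones((2, 5), bool)
mask[0, 3:] = False  # query 0 may attend only to keys 0-2
out = masked_attention(q, k, v, mask)
print(out.shape)  # (2, 4)
```

Because masked-out keys receive exactly zero weight, perturbing their values leaves the corresponding query's output untouched, which is what keeps background clutter from leaking into target queries.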
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Fu, J.; Li, F.; Zhao, J.; Wang, Y.; Zhang, H. Maritime Infrared Ship Detection in UAV Imagery Based on Two-Stage Region-Segmentation-Guided Learning Network. IEEE Trans. Instrum. Meas. 2025, 74, 5028516. [Google Scholar] [CrossRef]
- Huang, B.; Li, J.; Chen, J.; Wang, G.; Zhao, J.; Xu, T. Anti-UAV410: A thermal infrared benchmark and customized scheme for tracking drones in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 2852–2865. [Google Scholar] [CrossRef] [PubMed]
- Aibibu, T.; Lan, J.; Zeng, Y.; Lu, W.; Gu, N. Feature-enhanced attention and dual-gelan net (feadg-net) for uav infrared small object detection in traffic surveillance. Drones 2024, 8, 304. [Google Scholar] [CrossRef]
- Sun, Y.; Yang, J.; An, W. Infrared dim and small target detection via multiple subspace learning and spatial-temporal patch-tensor model. IEEE Trans. Geosci. Remote Sens. 2020, 59, 3737–3752. [Google Scholar] [CrossRef]
- Rivest, J.F.; Fortin, R. Detection of dim targets in digital infrared imagery by morphological image processing. Opt. Eng. 1996, 35, 1886–1893. [Google Scholar] [CrossRef]
- Deshpande, S.D.; Er, M.H.; Venkateswarlu, R.; Chan, P. Max-mean and max-median filters for detection of small targets. In Signal and Data Processing of Small Targets 1999; SPIE: Bellingham, WA, USA, 1999; Volume 3809, pp. 74–83. [Google Scholar]
- Han, J.; Moradi, S.; Faramarzi, I.; Zhang, H.; Zhao, Q.; Zhang, X.; Li, N. Infrared small target detection based on the weighted strengthened local contrast measure. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1670–1674. [Google Scholar] [CrossRef]
- Han, J.; Moradi, S.; Faramarzi, I.; Liu, C.; Zhang, H.; Zhao, Q. A local contrast method for infrared small-target detection utilizing a tri-layer window. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1822–1826. [Google Scholar] [CrossRef]
- Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared patch-image model for small target detection in a single image. IEEE Trans. Image Process. 2013, 22, 4996–5009. [Google Scholar] [CrossRef]
- Liu, T.; Liu, Y.; Yang, J.; Li, B.; Wang, Y.; An, W. Graph Laplacian regularization for fast infrared small target detection. Pattern Recognit. 2025, 158, 111077. [Google Scholar] [CrossRef]
- Liu, T.; Yang, J.; Li, B.; Wang, Y.; An, W. Representative coefficient total variation for efficient infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–18. [Google Scholar] [CrossRef]
- Kou, R.; Wang, C.; Peng, Z.; Zhao, Z.; Chen, Y.; Han, J.; Huang, F.; Yu, Y.; Fu, Q. Infrared small target segmentation networks: A survey. Pattern Recognit. 2023, 143, 109788. [Google Scholar] [CrossRef]
- Liu, Q.; Liu, R.; Zheng, B.; Wang, H.; Fu, Y. Infrared small target detection with scale and location sensitivity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17490–17499. [Google Scholar]
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Asymmetric contextual modulation for infrared small target detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 950–959. [Google Scholar]
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Attentional local contrast networks for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9813–9824. [Google Scholar] [CrossRef]
- Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape matters for infrared small target detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 877–886. [Google Scholar]
- Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target detection. IEEE Trans. Image Process. 2022, 32, 1745–1758. [Google Scholar] [CrossRef]
- Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for infrared small object detection. IEEE Trans. Image Process. 2022, 32, 364–376. [Google Scholar] [CrossRef] [PubMed]
- Wu, T.; Li, B.; Luo, Y.; Wang, Y.; Xiao, C.; Liu, T.; Yang, J.; An, W.; Guo, Y. MTU-Net: Multilevel TransUNet for space-based infrared tiny ship detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
- Yuan, S.; Qin, H.; Yan, X.; Akhtar, N.; Mian, A. Sctransnet: Spatial-channel cross transformer network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5002615. [Google Scholar] [CrossRef]
- Li, B.; Ying, X.; Li, R.; Liu, Y.; Shi, Y.; Li, M.; Zhang, X.; Hu, M.; Wu, C.; Zhang, Y.; et al. ICPR 2024 Competition on Resource-Limited Infrared Small Target Detection Challenge: Methods and Results. In Proceedings of the International Conference on Pattern Recognition, Kolkata, India, 1–5 December 2024; pp. 62–77. [Google Scholar]
- Zhu, Y.; Ma, Y.; Fan, F.; Huang, J.; Yao, Y.; Zhou, X.; Huang, R. Towards Robust Infrared Small Target Detection via Frequency and Spatial Feature Fusion. IEEE Trans. Geosci. Remote Sens. 2025, 63, 2001115. [Google Scholar] [CrossRef]
- He, H.; Wan, M.; Xu, Y.; Kong, X.; Liu, Z.; Chen, Q.; Gu, G. WTAPNet: Wavelet Transform-based Augmented Perception Network for Infrared Small Target Detection. IEEE Trans. Instrum. Meas. 2024, 73, 5037217. [Google Scholar] [CrossRef]
- Huang, Y.; Zhi, X.; Hu, J.; Yu, L.; Han, Q.; Chen, W.; Zhang, W. FDDBA-NET: Frequency domain decoupling bidirectional interactive attention network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5004416. [Google Scholar] [CrossRef]
- Chen, T.; Ye, Z. FreqODEs: Frequency neural ODE networks for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5005912. [Google Scholar] [CrossRef]
- Cai, S.; Yang, J.; Xiang, T.; Bai, J. Frequency-Aware Contextual Feature Pyramid Network for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2025, 22, 6501205. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Zhang, M.; Wang, Y.; Guo, J.; Li, Y.; Gao, X.; Zhang, J. IRSAM: Advancing segment anything model for infrared small target detection. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 233–249. [Google Scholar]
- Zhang, M.; Yue, K.; Guo, J.; Zhang, Q.; Zhang, J.; Gao, X. Computational Fluid Dynamic Network for Infrared Small Target Detection. IEEE Trans. Neural Netw. Learn. Syst. 2025, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.; Gu, L.; Li, L.; Yan, C.; Fu, Y. Frequency Dynamic Convolution for Dense Image Prediction. arXiv 2025, arXiv:2503.18783. [Google Scholar]
- Cui, Y.; Ren, W.; Cao, X.; Knoll, A. Image restoration via frequency selection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 1093–1108. [Google Scholar] [CrossRef]
- Chen, L.; Fu, Y.; Gu, L.; Yan, C.; Harada, T.; Huang, G. Frequency-aware feature fusion for dense image prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10763–10780. [Google Scholar] [CrossRef] [PubMed]
- Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet convolutions for large receptive fields. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 363–380. [Google Scholar]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–22 June 2022; pp. 1290–1299. [Google Scholar]
- Cheng, A.; Yin, C.; Chang, Y.; Ping, H.; Li, S.; Nazarian, S.; Bogdan, P. MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation. arXiv 2025, arXiv:2503.10686. [Google Scholar]
- Pitas, I. Digital Image Processing Algorithms and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2000. [Google Scholar]
- Wang, H.; Wu, X.; Huang, Z.; Xing, E.P. High-frequency component helps explain the generalization of convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8684–8694. [Google Scholar]
- Chen, L.; Gu, L.; Zheng, D.; Fu, Y. Frequency-adaptive dilated convolution for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3414–3425. [Google Scholar]
- Huang, J.; Huang, R.; Xu, J.; Peng, S.; Duan, Y.; Deng, L.J. Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 3662–3670. [Google Scholar]
- Li, K.; Wang, D.; Hu, Z.; Zhu, W.; Li, S.; Wang, Q. Unleashing channel potential: Space-frequency selection convolution for SAR object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 17323–17332. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Wang, H.; Cao, P.; Wang, J.; Zaiane, O.R. Uctransnet: Rethinking the skip connections in u-net from a channel-wise perspective with transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 2441–2449. [Google Scholar]
- Ma, J.; Chen, J.; Ng, M.; Huang, R.; Li, Y.; Li, C.; Yang, X.; Martel, A.L. Loss odyssey in medical image segmentation. Med. Image Anal. 2021, 71, 102035. [Google Scholar] [CrossRef]
- Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, Proceedings 3; Springer: Berlin/Heidelberg, Germany, 2017; pp. 240–248. [Google Scholar]
- Fan, Y.; Lyu, S.; Ying, Y.; Hu, B. Learning with average top-k loss. Adv. Neural Inf. Process. Syst. 2017, 30, 497–505. [Google Scholar] [CrossRef]
| Category | Method | IRSTD-1k [16] IoU↑ | Pd↑ | Fa↓ | SIRST [14] IoU↑ | Pd↑ | Fa↓ | NUDT-SIRST [17] IoU↑ | Pd↑ | Fa↓ | #Params↓ | FLOPs↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Traditional | Top-Hat [5] | 10.06 | 75.11 | 1432 | 7.14 | 79.84 | 1012 | 20.72 | 78.41 | 166.7 | - | - |
| Traditional | Max-Median [6] | 7.00 | 65.21 | 59.73 | 4.17 | 69.20 | 55.33 | 4.197 | 58.41 | 36.89 | - | - |
| Traditional | WSLCM [7] | 3.452 | 72.44 | 6619 | 1.16 | 77.95 | 5446 | 2.283 | 56.82 | 1309 | - | - |
| Traditional | TLLCM [8] | 3.311 | 77.39 | 6738 | 1.03 | 79.09 | 5899 | 2.176 | 62.01 | 1608 | - | - |
| Traditional | IPI [9] | 27.92 | 81.37 | 16.18 | 25.67 | 85.55 | 11.47 | 17.76 | 74.49 | 41.23 | - | - |
| Traditional | MSLSTIPT [4] | 11.43 | 79.03 | 1524 | 10.30 | 82.13 | 1131 | 8.342 | 47.40 | 888.1 | - | - |
| Deep learning (spatial) | ACM [14] | 60.33 | 93.94 | 69.31 | 68.34 | 93.92 | 30.12 | 64.86 | 96.72 | 28.59 | 0.398 | 1.33 |
| Deep learning (spatial) | DNANet [17] | 65.74 | 90.24 | 13.17 | 73.60 | 93.54 | 38.49 | 94.19 | 99.26 | 2.436 | 4.697 | 56.11 |
| Deep learning (spatial) | UIUNet [18] | 65.69 | 91.58 | 14.18 | 76.21 | 92.40 | 10.70 | 93.48 | 98.31 | 7.79 | 50.54 | 186.28 |
| Deep learning (spatial) | MTU-Net [19] | 66.11 | 93.27 | 36.80 | 74.78 | 93.54 | 22.36 | 74.85 | 93.97 | 46.95 | 8.221 | 24.777 |
| Deep learning (spatial) | SCTransNet [20] | 68.03 | 93.27 | 10.74 | 76.32 | 96.96 | 14.41 | 94.10 | 98.73 | 7.101 | 11.20 | 40.43 |
| Deep learning (frequency) | WTAPNet [23] | 67.21 | 92.75 | 27.82 | 76.26 | 94.29 | 24.83 | 87.71 | 99.07 | 9.537 | 14.83 | 103.24 |
| Deep learning (frequency) | FM-Net (Ours) | 68.13 | 95.62 | 8.085 | 76.75 | 96.96 | 15.710 | 94.76 | 99.15 | 1.838 | 3.606 | 14.171 |
| Model Channel | IoU↑ | Pd↑ | Fa↓ | #Params↓ |
|---|---|---|---|---|
| C = 8, 16, 32, 64, 64 | 66.50 | 88.22 | 9.015 | 0.921 |
| C = 16, 32, 64, 128, 128 | 68.13 | 95.62 | 8.085 | 3.606 |
| C = 32, 64, 128, 256, 256 | 67.65 | 94.28 | 30.745 | 14.209 |
| Model Patch Size | IoU↑ | Pd↑ | Fa↓ | #Params↓ |
|---|---|---|---|---|
| P = 16 × 16 | 67.96 | 89.23 | 30.593 | 4.392 |
| P = 32 × 32 | 68.13 | 95.62 | 8.085 | 3.606 |
| P = 64 × 64 | 67.44 | 92.93 | 13.873 | 3.409 |
| Train Strategy | IoU↑ | Pd↑ | Fa↓ |
|---|---|---|---|
| FM-Net w/o DS | 67.32 | 92.93 | 12.545 |
| FM-Net | 68.13 | 95.62 | 8.085 |
| Model | IoU↑ | Pd↑ | Fa↓ | #Params↓ |
|---|---|---|---|---|
| FM-Net w/o WL | 67.42 | 86.87 | 25.754 | 3.239 |
| FM-Net w/o CBAM | 67.99 | 92.93 | 46.346 | 3.300 |
| FM-Net w j = 1 | 68.03 | 91.58 | 11.482 | 3.427 |
| FM-Net w j = 3 | 67.58 | 91.58 | 14.670 | 3.785 |
| FM-Net w db2 | 67.71 | 93.94 | 20.876 | 3.606 |
| FM-Net w coif1 | 66.71 | 91.25 | 9.394 | 3.606 |
| FM-Net | 68.13 | 95.62 | 8.085 | 3.606 |
| Model | IoU↑ | Pd↑ | Fa↓ | #Params↓ |
|---|---|---|---|---|
| FM-Net w/o FMM | 67.85 | 92.93 | 25.45 | 2.231 |
| FM-Net w cat | 67.60 | 94.61 | 33.934 | 2.288 |
| FM-Net w/o MA | 66.95 | 93.60 | 17.062 | 3.432 |
| FM-Net w MA2 | 67.64 | 91.25 | 34.978 | 3.780 |
| FM-Net w MA3 | 67.01 | 91.25 | 46.687 | 3.954 |
| FM-Net w/o CFM | 67.87 | 94.28 | 9.375 | 2.462 |
| FM-Net | 68.13 | 95.62 | 8.085 | 3.606 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, Y.; Lin, Z.; Li, B.; Liu, T.; An, W. FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection. Remote Sens. 2025, 17, 2264. https://doi.org/10.3390/rs17132264