WT-HMFF: Wavelet Transform Convolution and Hierarchical Multi-Scale Feature Fusion Network for Detecting Infrared Small Targets
Abstract
1. Introduction
- (1) We apply the WTConv module to infrared small target detection (ISTD) networks, which enlarges the receptive field, preserves detailed information, and emphasizes the shape information of targets.
- (2) We introduce the HMFF module, which achieves multi-scale fusion and interaction of local and global features, combining features from different network levels to retain crucial target information.
- (3) We conduct extensive experiments on the IRSTD-1k and SIRST datasets, which demonstrate the accuracy of the WT-HMFF network.
2. Related Work
2.1. Infrared Small Target Detection
2.2. Wavelet Transform Convolutions
2.3. Multi-Scale Feature Fusion Module
3. Methods
3.1. Overall Architecture
3.2. WTConv
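As a rough illustration of the idea behind wavelet transform convolution, the sketch below implements a single-level 2D Haar decomposition and its inverse in plain Python. It is not the authors' implementation: the function names (`haar_dwt2`, `haar_idwt2`, `effective_extent`) are hypothetical, the subband naming is one of several conventions in use, and the learned convolutions that WTConv applies to each subband are omitted. The point it shows is that each decomposition level halves the resolution, so a fixed k × k kernel applied at level ℓ spans roughly k · 2^ℓ input pixels per side, while the detail subbands keep the information needed for exact reconstruction.

```python
def haar_dwt2(x):
    """Single-level 2D Haar transform of an even-sized image (nested lists).

    Returns four half-resolution subbands (LL, LH, HL, HH).
    """
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, len(x), 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll.append((a + b + c + d) / 2)  # local average (low frequency)
            lh.append((a - b + c - d) / 2)  # left-minus-right difference
            hl.append((a + b - c - d) / 2)  # top-minus-bottom difference
            hh.append((a - b - c + d) / 2)  # diagonal difference
        LL.append(ll); LH.append(lh); HL.append(hl); HH.append(hh)
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Invert haar_dwt2, recovering the full-resolution image."""
    out = []
    for i in range(len(LL)):
        top, bottom = [], []
        for j in range(len(LL[0])):
            s, h, v, g = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            top += [(s + h + v + g) / 2, (s - h + v - g) / 2]
            bottom += [(s + h - v - g) / 2, (s - h - v + g) / 2]
        out += [top, bottom]
    return out


def effective_extent(kernel_size, level):
    """Input pixels per side covered by a kernel applied at a given level."""
    return kernel_size * 2 ** level


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
subbands = haar_dwt2(image)
restored = haar_idwt2(*subbands)
```

Applying `haar_dwt2` again to the `LL` subband gives the next decomposition level; under the assumptions above, a 5 × 5 kernel at level 2 would span about 20 input pixels per side.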
3.3. HMFF Module
4. Experiment
4.1. Datasets and Evaluation Metrics
4.1.1. Datasets
4.1.2. Evaluation Metrics
- 1. Intersection over Union (IoU)
- 2. Normalized Intersection over Union (nIoU)
- 3. Probability of Detection (Pd)
- 4. False Alarm Rate (Fa)
- 5. ROC Curve
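The pixel-level metrics in this list can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the evaluation code used in the paper: IoU is accumulated over the whole test set, nIoU is taken as the mean of per-image IoU, and the false alarm rate is simplified to false-positive pixels divided by all pixels. The probability of detection is left out because it is a target-level metric that needs connected-component matching. The function names `mask_counts` and `evaluate` are hypothetical.

```python
def mask_counts(pred, target):
    """Count intersection, union, false-positive and total pixels for one image.

    pred and target are binary masks given as nested lists of 0/1 values.
    """
    inter = union = false_pos = total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p & t            # predicted and labeled as target
            union += p | t            # predicted or labeled as target
            false_pos += p & (1 - t)  # predicted target on a background pixel
            total += 1
    return inter, union, false_pos, total


def evaluate(preds, targets):
    """Return dataset-level IoU, nIoU and a simplified pixel-level Fa."""
    sum_inter = sum_union = sum_fp = sum_total = 0
    per_image_iou = []
    for pred, target in zip(preds, targets):
        inter, union, false_pos, total = mask_counts(pred, target)
        sum_inter += inter
        sum_union += union
        sum_fp += false_pos
        sum_total += total
        # An image with no target and no prediction counts as a perfect match.
        per_image_iou.append(inter / union if union else 1.0)
    return {
        "IoU": sum_inter / sum_union if sum_union else 1.0,
        "nIoU": sum(per_image_iou) / len(per_image_iou),
        "Fa": sum_fp / sum_total,
    }


preds = [[[1, 1], [0, 0]], [[0, 0], [0, 1]]]
targets = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
metrics = evaluate(preds, targets)
```

For the two toy 2 × 2 masks above, the per-image IoU values are 0.5 and 1.0, so nIoU is 0.75, while the accumulated IoU is 2/3 and the simplified Fa is 1/8. The gap between IoU and nIoU illustrates why both are reported: nIoU weights every image equally, whereas IoU is dominated by images with larger targets.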
4.2. Implementation Details
4.3. Quantitative Results
4.4. Visual Results
4.5. Ablation Study
4.5.1. Impact of WTConv Decomposition Levels
4.5.2. Impact of Using Different Wavelet Bases in WTConv
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Deng, H.; Sun, X.; Liu, M.; Ye, C.; Zhou, X. Small Infrared Target Detection Based on Weighted Local Difference Measure. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4204–4214.
- Thanh, N.T.; Sahli, H.; Hao, D.N. Infrared Thermography for Buried Landmine Detection: Inverse Problem Setting. IEEE Trans. Geosci. Remote Sens. 2008, 46, 3987–4004.
- Deng, L.; Chen, Q.; He, Y.; Sui, X.; Liu, Q.; Hu, L. Fire Detection with Infrared Images Using Cascaded Neural Network. J. Algorithms Comput. Technol. 2019, 13, 1748302619895433.
- Teutsch, M.; Krüger, W. Classification of Small Boats in Infrared Images for Maritime Surveillance. In Proceedings of the 2010 International WaterSide Security Conference, Carrara, Italy, 3–5 November 2010; pp. 1–7.
- Pan, P.; Wang, H.; Wang, C.; Nie, C. ABC: Attention with Bilinear Correlation for Infrared Small Target Detection. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 2381–2386.
- Deshpande, S.D.; Er, M.H.; Ronda, V.; Chan, P. Max-Mean and Max-Median Filters for Detection of Small-Targets. In Proceedings of the SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, Denver, CO, USA, 18–23 July 1999.
- Bai, X.; Zhou, F. Analysis of New Top-Hat Transformation and the Application for Infrared Dim Small Target Detection. Pattern Recognit. 2010, 43, 2145–2156.
- Zhang, S.; Huang, X.; Wang, M. Background Suppression Algorithm for Infrared Images Based on Robinson Guard Filter. In Proceedings of the 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), Wuhan, China, 17–19 March 2017; pp. 250–254.
- Zhang, L.; Peng, L.; Zhang, T.; Cao, S.; Peng, Z. Infrared Small Target Detection via Non-Convex Rank Approximation Minimization Joint L2,1 Norm. Remote Sens. 2018, 10, 1821.
- Kim, S.; Yang, Y.; Lee, J.; Park, Y. Small Target Detection Utilizing Robust Methods of the Human Visual System for IRST. J. Infrared Milli Terahz Waves 2009, 30, 994–1011.
- Bo, Y.; Wu, Y.; Wang, X. A Novel Attention-Enhanced Network for Image Super-Resolution. Eng. Appl. Artif. Intell. 2024, 130, 107709.
- Liu, S.; Chen, P.; Woźniak, M. Image Enhancement-Based Detection with Small Infrared Targets. Remote Sens. 2022, 14, 3232.
- Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared Patch-Image Model for Small Target Detection in a Single Image. IEEE Trans. Image Process. 2013, 22, 4996–5009.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- Wang, H.; Zhou, L.; Wang, L. Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8508–8517.
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Asymmetric Contextual Modulation for Infrared Small Target Detection. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Virtual Conference, Waikoloa, HI, USA, 5–9 January 2021; pp. 949–958.
- Zhao, B.; Wang, C.; Fu, Q.; Han, Z. A Novel Pattern for Infrared Small Target Detection with Generative Adversarial Network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4481–4492.
- Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape Matters for Infrared Small Target Detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 867–876.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929.
- Zhang, M.; Bai, H.; Zhang, J.; Zhang, R.; Wang, C.; Guo, J.; Gao, X. RKformer: Runge-Kutta Transformer with Random-Connection Attention for Infrared Small Target Detection. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1730–1738.
- Yang, H.; Mu, T.; Dong, Z.; Zhang, Z.; Wang, B.; Ke, W.; Yang, Q.; He, Z. PBT: Progressive Background-Aware Transformer for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5004513.
- Wang, K.; Du, S.; Liu, C.; Cao, Z. Interior Attention-Aware Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5002013.
- Chen, T.; Tan, Z.; Chu, Q.; Wu, Y.; Liu, B.; Yu, N. TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection. Proc. AAAI Conf. Artif. Intell. 2024, 38, 1201–1209.
- Wang, W.; Xie, E.; Li, X.; Fan, D.-P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 548–558.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
- Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. arXiv 2024, arXiv:2407.05848.
- Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense Nested Attention Network for Infrared Small Target Detection. IEEE Trans. Image Process. 2023, 32, 1745–1758.
- Zhang, T.; Li, L.; Cao, S.; Pu, T.; Peng, Z. Attention-Guided Pyramid Context Networks for Detecting Infrared Small Target Under Complex Background. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 4250–4261.
- Hou, Q.; Wang, Z.; Tan, F.; Zhao, Y.; Zheng, H.; Zhang, W. RISTDnet: Robust Infrared Small Target Detection Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7000805.
- Wu, S.; Xiao, C.; Wang, L.; Wang, Y.; Yang, J.; An, W. RepISD-Net: Learning Efficient Infrared Small-Target Detection Network via Structural Re-Parameterization. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5622712.
- Yang, J.; Liu, S.; Wu, J.; Su, X.; Hai, N.; Huang, X. Pinwheel-Shaped Convolution and Scale-Based Dynamic Loss for Infrared Small Target Detection. arXiv 2024, arXiv:2412.16986.
- Liu, S.; Qiao, B.; Li, S.; Wang, Y.; Dang, L. Patch Spatial Attention Networks for Semantic Token Transformer in Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5003014.
- Chen, T.; Ye, Z.; Tan, Z.; Gong, T.; Wu, Y.; Chu, Q.; Liu, B.; Yu, N.; Ye, J. MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small-Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–13.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1 October 2023; pp. 3992–4003.
- Zhang, M.; Wang, Y.; Guo, J.; Li, Y.; Gao, X.; Zhang, J. IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection. In Proceedings of the Computer Vision—ECCV 2024: 18th European Conference, Milan, Italy, 29 September–4 October 2024; Proceedings, Part LXVII. Springer: Berlin/Heidelberg, Germany, 2024; pp. 233–249.
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976.
- Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11953–11965.
- Liu, S.; Chen, T.; Chen, X.; Chen, X.; Xiao, Q.; Wu, B.; Kärkkäinen, T.; Pechenizkiy, M.; Mocanu, D.; Wang, Z. More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 Using Sparsity. arXiv 2023, arXiv:2207.03620.
- Liu, P.; Zhang, H.; Zhang, K.; Lin, L.; Zuo, W. Multi-Level Wavelet-CNN for Image Restoration. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 886–88609.
- Fujieda, S.; Takayama, K.; Hachisuka, T. Wavelet Convolutional Neural Networks. arXiv 2018, arXiv:1805.08620.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 510–519.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Wu, Y.-H.; Liu, Y.; Zhan, X.; Cheng, M.-M. P2T: Pyramid Pooling Transformer for Scene Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12760–12771.
- Kolahi, S.G.; Chaharsooghi, S.K.; Khatibi, T.; Bozorgpour, A.; Azad, R.; Heidari, M.; Hacihaliloglu, I.; Merhof, D. MSA^2Net: Multi-Scale Adaptive Attention-Guided Network for Medical Image Segmentation. arXiv 2024, arXiv:2407.21640.
- Xie, L.; Li, C.; Wang, Z.; Zhang, X.; Chen, B.; Shen, Q.; Wu, Z. SHISRCNet: Super-Resolution and Classification Network for Low-Resolution Breast Cancer Histopathology Image. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2023; Greenspan, H., Madabhushi, A., Mousavi, P., Salcudean, S., Duncan, J., Syeda-Mahmood, T., Taylor, R., Eds.; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2023; Volume 14224, pp. 23–32. ISBN 978-3-031-43903-2.
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Attentional Local Contrast Networks for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9813–9824.
- Zhang, M.; Yang, H.; Guo, J.; Li, Y.; Gao, X.; Zhang, J. IRPruneDet: Efficient Infrared Small Target Detection via Wavelet Structure-Regularized Soft Channel Pruning. Proc. AAAI Conf. Artif. Intell. 2024, 38, 7224–7232.
- Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for Infrared Small Object Detection. IEEE Trans. Image Process. 2023, 32, 364–376.
- Yuan, S.; Qin, H.; Yan, X.; Akhtar, N.; Mian, A. SCTransNet: Spatial-Channel Cross Transformer Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5002615.
- Liu, Q.; Liu, R.; Zheng, B.; Wang, H.; Fu, Y. Infrared Small Target Detection with Scale and Location Sensitivity. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 17490–17499.
Dataset | Image Type | Image Size | Image Num | Target Size | Target Num | Background Type |
---|---|---|---|---|---|---|
IRSTD-1k | real | 512 × 512 | 1001 | 2~1065 pixels | 1~6 | cloud/city/sea/field/river/mountain |
SIRST | real | 96 × 135~400 × 592 | 427 | 4~330 pixels | 1~7 | cloud/city/sea |
Method | IRSTD-1k IoU | IRSTD-1k nIoU | IRSTD-1k Pd | IRSTD-1k Fa | SIRST IoU | SIRST nIoU | SIRST Pd | SIRST Fa |
---|---|---|---|---|---|---|---|---|
Top-Hat [7] | 10.06 | 7.44 | 75.11 | 1432 | 7.03 | 5.08 | 77.04 | 73.12 |
WSLCM [50] | 15.13 | 14.95 | 65.31 | 1319 | 6.39 | 28.31 | 88.74 | 4462 |
IPI [13] | 36.92 | 34.60 | 72.90 | 16.18 | 51.24 | 50.73 | 58.91 | 282.3 |
NRAM [9] | 15.25 | 9.90 | 70.68 | 16.93 | 13.54 | 18.95 | 60.04 | 25.23 |
MDvsFA [15] | 49.50 | 47.41 | 82.11 | 80.33 | 60.30 | 58.26 | 89.35 | 56.35 |
ACM [16] | 60.25 | 58.06 | 90.58 | 21.78 | 72.33 | 71.43 | 96.33 | 9.33 |
ALCNet [47] | 62.05 | 59.58 | 92.19 | 31.56 | 74.31 | 73.12 | 97.34 | 20.21 |
UIUNet [49] | 66.64 | 60.20 | 89.23 | 16.02 | 75.39 | 73.67 | 97.25 | 42.41 |
DNANet [27] | 65.24 | 64.14 | 92.52 | 17.99 | 75.76 | 71.70 | 98.87 | 13.34 |
Pconv(4, 3) [31] | 63.29 | 62.37 | 93.88 | 12.09 | 70.76 | 68.20 | 97.25 | 11.74 |
IRPruneDet [48] | 64.54 | 62.71 | 91.74 | 16.04 | 75.12 | 73.50 | 98.61 | 2.96 |
SCTransNet [50] | 65.75 | 63.34 | 92.74 | 6.26 | 73.66 | 71.38 | 99.08 | 4.79 |
WT-HMFF | 67.10 | 64.28 | 94.56 | 14.20 | 76.22 | 73.82 | 99.18 | 12.77 |
Method | IoU | Params (M) | FLOPs (G) | FPS |
---|---|---|---|---|
MDvsFA [15] | 60.30 | 3.77 | 247.11 | 44.03 |
ACM [16] | 72.33 | 0.52 | 4.03 | 205.87 |
UIUNet [49] | 75.39 | 50.5 | 65.85 | 53.57 |
DNANet [27] | 75.76 | 4.70 | 114.26 | 40.08 |
Pconv(4, 3) [31] | 70.76 | 4.03 | 43.81 | 97.36 |
IRPruneDet [48] | 75.12 | 0.18 | 0.94 | - |
SCTransNet [50] | 73.66 | 11.19 | 81.28 | 52.75 |
WT-HMFF | 76.22 | 4.04 | 48.63 | 219.32 |
Method | IoU | nIoU | Pd | Fa |
---|---|---|---|---|
Baseline | 72.48 | 72.63 | 99.08 | 31.23 |
Baseline + WTConv | 74.11 | 72.84 | 99.08 | 16.32 |
Baseline + HMFF | 75.29 | 73.29 | 99.08 | 13.36 |
WT-HMFF | 76.22 | 73.82 | 99.18 | 12.77 |
WT Levels | Haar IoU | Haar nIoU | Haar Pd | Haar Fa | db4 IoU | db4 nIoU | db4 Pd | db4 Fa |
---|---|---|---|---|---|---|---|---|
1 level | 71.24 | 67.62 | 96.33 | 31.08 | 73.93 | 72.09 | 96.33 | 30.86 |
2 levels | 73.69 | 71.78 | 96.41 | 34.95 | 74.11 | 72.84 | 99.08 | 16.32 |
3 levels | 69.42 | 72.28 | 97.25 | 33.53 | 73.73 | 71.85 | 96.33 | 33.70 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, S.; Huang, J.; Duan, Q.; Li, Z. WT-HMFF: Wavelet Transform Convolution and Hierarchical Multi-Scale Feature Fusion Network for Detecting Infrared Small Targets. Remote Sens. 2025, 17, 2268. https://doi.org/10.3390/rs17132268