A Hybrid Attention Framework Integrating Channel–Spatial Refinement and Frequency Spectral Analysis for Remote Sensing Smoke Recognition
Abstract
1. Introduction
- We design a novel triple-pooling strategy (average, max, and standard deviation) for both channel and spatial attention modules. This approach captures complementary statistical properties, including central tendency, peak responses, and distribution variability, to generate robust descriptors that comprehensively characterize smoke features under diverse conditions.
- We propose a novel dual-branch architecture that integrates channel–spatial attention with frequency-domain analysis to enhance smoke feature representation. Complementing the channel–spatial branch, the frequency attention branch applies the discrete cosine transform (DCT) to refine multi-frequency smoke texture patterns, yielding a more robust smoke representation.
- We systematically evaluate the framework on the widely used USTC_SmokeRS dataset, where it achieves state-of-the-art recognition accuracy. To assess generalizability, we further test the method on the Yuan smoke dataset. The consistent performance across both datasets supports the effectiveness of the CSFAttention module in building comprehensive smoke representations by integrating hybrid attention mechanisms.
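The first two contributions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`triple_pool_channel`, `triple_pool_spatial`, `dct2_basis`, `dct_channel_descriptor`), the `(C, H, W)` feature layout, and the choice of frequency indices are assumptions made for illustration, and the learned layers that would turn these descriptors into attention weights are omitted.

```python
import numpy as np


def triple_pool_channel(x):
    """Average, max, and standard-deviation pooling over the spatial axes.

    x: feature map of shape (C, H, W). Returns an array of shape (3, C).
    """
    flat = x.reshape(x.shape[0], -1)
    return np.stack([flat.mean(axis=1), flat.max(axis=1), flat.std(axis=1)])


def triple_pool_spatial(x):
    """Average, max, and standard-deviation pooling over the channel axis.

    x: feature map of shape (C, H, W). Returns an array of shape (3, H, W).
    """
    return np.stack([x.mean(axis=0), x.max(axis=0), x.std(axis=0)])


def dct2_basis(h, w, u, v):
    """Unnormalized 2D DCT-II basis for frequency index (u, v) on an h x w grid."""
    i = np.arange(h)[:, None]
    j = np.arange(w)[None, :]
    return np.cos(np.pi * u * (i + 0.5) / h) * np.cos(np.pi * v * (j + 0.5) / w)


def dct_channel_descriptor(x, freqs):
    """Project every channel onto the selected DCT bases.

    freqs: list of (u, v) pairs (a hypothetical selection, not from the paper).
    Returns an array of shape (len(freqs), C).
    """
    _, h, w = x.shape
    return np.stack(
        [(x * dct2_basis(h, w, u, v)).sum(axis=(1, 2)) for u, v in freqs]
    )


# Example on a random feature map (shapes are illustrative only).
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 7, 7))
channel_stats = triple_pool_channel(features)   # shape (3, 16)
spatial_stats = triple_pool_spatial(features)   # shape (3, 7, 7)
freq_stats = dct_channel_descriptor(features, [(0, 0), (0, 1), (1, 0)])  # (3, 16)
```

With frequency index (0, 0) the DCT basis is constant, so that descriptor reduces to a scaled global average pool; higher indices respond to finer texture variation, which is the kind of information the frequency branch is described as adding.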
2. Related Work
2.1. Traditional Hand-Crafted Features for Smoke Detection
2.2. Deep Feature Refinement by Attention Mechanisms
2.3. Attention Mechanisms for Smoke Recognition
3. The Proposed Method
3.1. CSFAttention Mechanism Module
3.1.1. Revisiting the Squeeze-and-Excitation Block
3.1.2. Channel–Spatial Refinement Branch
3.1.3. Frequency Refinement Branch
3.2. The Network with CSFAttention for Smoke Recognition
4. Experimental Results
4.1. Dataset
4.2. Evaluation Index
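As a rough illustration of the four indices reported in the result tables (accuracy, precision, recall, and F1-score, each given as mean ± standard deviation), the sketch below computes them for a binary smoke / non-smoke labelling. The binary setting, the function names `binary_metrics` and `summarize_runs`, and the use of the population standard deviation over repeated runs are assumptions; the exact protocol is not stated here.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 (in percent) for binary labels.

    Assumes label 1 means "smoke" and 0 means "non-smoke" (an illustrative choice).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    accuracy = (tp + tn) / y_true.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
    }


def summarize_runs(runs):
    """Mean and population standard deviation of each metric over several runs.

    runs: list of dicts as returned by binary_metrics.
    Whether the reported deviations use ddof=0 or ddof=1 is an assumption.
    """
    return {
        key: (float(np.mean([r[key] for r in runs])),
              float(np.std([r[key] for r in runs])))
        for key in runs[0]
    }


# Toy example with made-up labels, not data from the paper.
run_a = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
run_b = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0, 1, 0])
summary = summarize_runs([run_a, run_b])
```

Each entry of `summary` is a `(mean, std)` pair, which corresponds to the "value ± deviation" layout used in the tables.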
4.3. Network Training
4.4. Ablation Studies
4.5. Comparison with the Other Methods
4.5.1. Comparison with the Existing Attention Mechanisms
4.5.2. Comparison with the Existing Deep Networks
4.6. Adaptability Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Partition | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | 
|---|---|---|---|---|
| 5:2.5:2.5 | 94.36 ± 0.53 | 95.31 ± 0.66 | 93.17 ± 1.25 | 94.21 ± 0.62 | 
| 6:2:2 | 96.25 ± 0.61 | 96.55 ± 0.84 | 95.63 ± 0.69 | 96.09 ± 0.78 | 
| 7:1.5:1.5 | 95.23 ± 0.66 | 95.26 ± 0.44 | 95.07 ± 1.37 | 95.16 ± 0.72 | 
| 8:1:1 | 95.86 ± 0.47 | 95.37 ± 0.57 | 96.30 ± 1.18 | 95.83 ± 0.54 |

| Network | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| ResNet50 | 82.39 ± 0.25 | 85.95 ± 0.64 | 76.03 ± 1.51 | 80.67 ± 0.55 | 
| CSFResNet-cs | 92.67 ± 0.89 | 94.22 ± 1.45 | 90.74 ± 1.38 | 92.44 ± 0.95 | 
| CSFResNet-f | 92.46 ± 0.29 | 94.73 ± 0.71 | 89.72 ± 0.84 | 92.16 ± 0.37 | 
| CSFResNet | 96.25 ± 0.61 | 96.55 ± 0.84 | 95.63 ± 0.69 | 96.09 ± 0.78 |

| Network | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| ResNet50 | 82.39 ± 0.25 | 85.95 ± 0.64 | 76.03 ± 1.51 | 80.67 ± 0.55 | 
| SEResNet | 92.64 ± 0.32 | 94.39 ± 1.28 | 90.51 ± 2.07 | 92.39 ± 0.49 | 
| CBAMResNet | 92.14 ± 0.47 | 94.57 ± 0.29 | 89.20 ± 0.81 | 91.80 ± 0.45 | 
| SRMResNet | 86.17 ± 0.05 | 88.71 ± 0.51 | 82.49 ± 0.45 | 85.49 ± 0.05 | 
| FCAResNet | 93.09 ± 0.33 | 94.11 ± 0.35 | 91.76 ± 0.46 | 92.92 ± 0.36 | 
| SimAMResNet | 92.27 ± 0.35 | 94.55 ± 0.25 | 89.49 ± 0.89 | 91.95 ± 0.40 | 
| CSFResNet | 96.25 ± 0.61 | 96.55 ± 0.84 | 95.63 ± 0.69 | 96.09 ± 0.78 |

| Network | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| AlexNet | 88.51 ± 0.83 | 92.76 ± 0.58 | 83.21 ± 2.04 | 87.71 ± 1.00 | 
| ResNet50 | 82.39 ± 0.25 | 85.95 ± 0.64 | 76.03 ± 1.51 | 80.67 ± 0.55 | 
| DenseNet-121 | 89.32 ± 1.06 | 92.62 ± 0.91 | 85.16 ± 1.98 | 88.72 ± 0.76 |
| EfficientNet-b1 | 93.01 ± 1.56 | 93.90 ± 0.83 | 92.22 ± 0.25 | 93.21 ± 0.54 | 
| RepVGG-A1 | 89.94 ± 0.59 | 91.56 ± 0.56 | 87.72 ± 0.76 | 89.59 ± 0.13 | 
| ViT-B | 82.69 ± 1.88 | 83.00 ± 1.67 | 81.64 ± 2.27 | 82.31 ± 1.88 | 
| DeiT-Base | 81.11 ± 1.05 | 82.56 ± 1.41 | 78.29 ± 0.15 | 80.37 ± 1.01 |
| Swin-Small | 85.78 ± 1.53 | 86.82 ± 2.06 | 83.67 ± 1.19 | 85.04 ± 1.34 |
| MViT-Small | 78.16 ± 1.45 | 78.46 ± 1.38 | 77.37 ± 2.47 | 77.78 ± 1.83 |
| CSFResNet | 96.25 ± 0.61 | 96.55 ± 0.84 | 95.63 ± 0.69 | 96.09 ± 0.78 |

| Network | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| SEResNet | 98.27 ± 0.04 | 97.82 ± 0.13 | 98.67 ± 0.20 | 98.24 ± 0.05 | 
| CBAMResNet | 98.36 ± 0.26 | 97.83 ± 0.33 | 98.85 ± 0.34 | 98.34 ± 0.26 | 
| SRMResNet | 94.23 ± 0.14 | 92.12 ± 0.61 | 96.51 ± 0.65 | 94.26 ± 0.14 | 
| FCAResNet | 98.58 ± 0.03 | 98.10 ± 0.42 | 99.04 ± 0.14 | 98.56 ± 0.03 | 
| SimAMResNet | 98.41 ± 0.04 | 97.74 ± 0.27 | 99.06 ± 0.20 | 98.39 ± 0.04 | 
| AlexNet | 95.62 ± 0.72 | 95.71 ± 1.92 | 95.42 ± 2.45 | 95.52 ± 0.73 | 
| ResNet50 | 95.80 ± 0.42 | 94.61 ± 0.61 | 96.97 ± 0.39 | 95.78 ± 0.42 | 
| DenseNet-121 | 96.49 ± 0.33 | 94.93 ± 0.87 | 98.10 ± 0.64 | 96.49 ± 0.33 | 
| EfficientNet-b1 | 99.15 ± 0.17 | 99.00 ± 0.14 | 99.26 ± 0.19 | 99.13 ± 0.17 | 
| RepVGG-A1 | 99.09 ± 0.11 | 98.97 ± 0.41 | 99.19 ± 0.25 | 99.08 ± 0.11 | 
| ViT-B | 95.29 ± 0.20 | 94.67 ± 0.41 | 95.81 ± 0.27 | 95.23 ± 0.20 | 
| DeiT-Small | 93.77 ± 0.37 | 92.03 ± 0.44 | 95.58 ± 0.28 | 93.77 ± 0.37 |
| Swin-Small | 94.63 ± 0.72 | 93.39 ± 0.88 | 95.77 ± 0.20 | 94.62 ± 0.72 |
| MViT-Small | 96.80 ± 0.20 | 96.25 ± 0.41 | 97.53 ± 0.14 | 96.81 ± 0.20 |
| CSFResNet | 98.94 ± 0.23 | 98.80 ± 0.15 | 99.15 ± 0.44 | 98.97 ± 0.23 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, G.; Yang, L.; Yu, Z.; Li, X.; Fu, G. A Hybrid Attention Framework Integrating Channel–Spatial Refinement and Frequency Spectral Analysis for Remote Sensing Smoke Recognition. Fire 2025, 8, 197. https://doi.org/10.3390/fire8050197