Real-Time Segmentation of Artificial Targets Using a Dual-Modal Efficient Attention Fusion Network
Abstract
1. Introduction
- A real-time spectral–polarimetric segmentation algorithm for artificial targets based on an efficient attention fusion network, called the ESPFNet (efficient spectral–polarimetric fusion network), is proposed. The ESPFNet takes dual input streams from a snapshot spectral camera and a snapshot polarization camera to improve the spatial resolution in detecting multiple targets against complex backgrounds while reducing the impact of illumination variations;
- A coordination attention bimodal fusion (CABF) module is designed, which leverages position attention mechanisms to optimize information alignment across encoding layers. Additionally, a complex atrous spatial pyramid pooling (CASPP) module is proposed to enhance the feature extraction capability. The network also incorporates a residual decoding block (RDB) to extract fused features and improve segmentation performance;
- A spectral–polarimetric image dataset of artificial targets, named SPIAO (spectral–polarimetric image of artificial objects), is constructed. Experimental results on the SPIAO dataset demonstrate that the ESPFNet accurately detects the artificial targets and meets the real-time requirement.
2. Methods
2.1. Algorithm Overview
2.2. Preprocessing of Spectral–Polarimetric Images
- Spectral calibration: To mitigate the effects of CMOS variations and imaging noise caused by changes in lighting conditions, a calibration is performed using white and dark references taken under the same lighting conditions; the corrected spectral image is obtained by normalizing the raw image with these references [15];
- Band selection: To obtain a subset of bands with lower correlation and higher discriminative information, thereby reducing the redundancy of the spectral bands while preserving high information content, an optimal clustering framework (OCF)-based band selection algorithm is applied [15];
- Band image fusion: The selected subset of bands is divided into R, G, and B band sets based on wavelength. Principal component analysis (PCA) is then used to merge the bands in each set into a single image [24]. This fusion retains the original spectral information while reducing the dimensionality of the data, making it compatible with subsequent detection using deep learning models;
- Fusion of multiple polarization direction images: The 0°, 45°, and 90° polarization direction images are encoded so that the encoded image fully retains the polarization information of the target while meeting the input requirements of the neural network [21].
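The preprocessing steps above can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation: the calibration follows the standard flat-field normalization, the PCA fusion projects a selected band set onto its first principal component, and the polarization quantities are the conventional Stokes-based degree and angle of linear polarization from 0°, 45°, and 90° images. The exact encoding of [21] and the OCF band selection of [15] are not reproduced here, and all function names are illustrative.

```python
import numpy as np

def calibrate_spectral(raw, white, dark, eps=1e-8):
    """Flat-field correction: (raw - dark) / (white - dark)."""
    return (raw - dark) / (white - dark + eps)

def pca_fuse_bands(bands):
    """Fuse an (H, W, K) stack of selected bands into one image
    by projecting each pixel onto the first principal component."""
    h, w, k = bands.shape
    x = bands.reshape(-1, k).astype(np.float64)
    x -= x.mean(axis=0)
    # eigen-decomposition of the K x K covariance matrix
    cov = x.T @ x / (x.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    pc1 = vecs[:, -1]  # eigenvector of the largest eigenvalue
    return (x @ pc1).reshape(h, w)

def polarization_features(i0, i45, i90, eps=1e-8):
    """Stokes parameters, DoLP, and AoP from 0/45/90-degree images."""
    s0 = i0 + i90
    s1 = i0 - i90
    s2 = 2.0 * i45 - i0 - i90
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    aop = 0.5 * np.arctan2(s2, s1)
    return s0, dolp, aop
```

For example, a raw frame equal to the white reference calibrates to roughly 1.0 everywhere, and three identical polarization images yield a DoLP of zero (no linear polarization).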
2.3. ESPFNet Artificial Target Segmentation Algorithm
2.3.1. Coordination Attention Bimodal Fusion Module
2.3.2. Complex Atrous Spatial Pyramid Pooling Module
2.3.3. Residual Decoding Block Module
3. Experiments and Results
3.1. Dataset
3.2. Experiment Settings
3.2.1. Parameter Settings
3.2.2. Evaluation Metrics
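The MIoU and MPA values reported in the result tables are the standard semantic-segmentation metrics. A minimal sketch of how they are commonly computed from a confusion matrix is shown below; the function names are illustrative and this is the conventional definition, which may differ in detail from the paper's exact implementation.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Confusion matrix with ground-truth classes as rows
    and predicted classes as columns."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    idx = gt.astype(int) * num_classes + pred.astype(int)
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(
        num_classes, num_classes)

def miou_mpa(pred, gt, num_classes, eps=1e-8):
    """Mean intersection-over-union and mean pixel accuracy."""
    cm = confusion_matrix(pred, gt, num_classes)
    tp = np.diag(cm).astype(float)
    # IoU per class: TP / (TP + FP + FN)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp + eps)
    # per-class pixel accuracy: TP / (pixels of that class)
    acc = tp / (cm.sum(axis=1) + eps)
    return iou.mean(), acc.mean()
```

For instance, if a two-class predictor labels everything as background while half the pixels are targets, the target IoU and accuracy are both zero, so MIoU is 0.25 and MPA is 0.5.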
3.2.3. Statistical Significance Tests
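This outline does not state which significance test the authors apply; a common choice when comparing matched per-image scores of two segmentation models is a paired t-test. The sketch below computes only the t-statistic in pure Python (in practice the p-value would come from `scipy.stats.ttest_rel`); treat it as an assumed, illustrative procedure rather than the paper's method.

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """t-statistic of a paired t-test on matched per-image scores."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    mean = sum(d) / n
    # unbiased sample variance of the paired differences (ddof = 1)
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)
```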
3.3. Results
3.3.1. Spectral Feature Images and Polarization-Encoded Images
3.3.2. Ablation Study
3.3.3. CASPP Compared with Modules of the Same Type
3.3.4. Comparison with Other Methods
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huang, Y.; Ding, W.; Li, H. Haze removal for UAV reconnaissance images using layered scattering model. Chin. J. Aeronaut. 2016, 29, 502–511.
- Gao, S.; Wu, J.; Ai, J. Multi-UAV reconnaissance task allocation for heterogeneous targets using grouping ant colony optimization algorithm. Soft Comput. 2021, 25, 7155–7167.
- Yang, X.; Xu, W.; Jia, Q.; Liu, J. MF-CFI: A fused evaluation index for camouflage patterns based on human visual perception. Def. Technol. 2021, 17, 1602–1608.
- Bi, H.; Zhang, C.; Wang, K.; Tong, J.; Zheng, F. Rethinking Camouflaged Object Detection: Models and Datasets. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 5708–5724.
- Mondal, A. Camouflaged Object Detection and Tracking: A Survey. Int. J. Image Graph. 2020, 20, 2050028.
- Feng, X.; Guoying, C.; Richang, H.; Jing, G. Camouflage texture evaluation using a saliency map. Multimed. Syst. 2015, 21, 165–175.
- Zhang, X.; Zhu, C.; Wang, S.; Liu, Y.; Ye, M. A Bayesian Approach to Camouflaged Moving Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2001–2013.
- Hall, J.R.; Cuthill, I.C.; Baddeley, R.; Shohet, A.J.; Scott-Samuel, N.E. Camouflage, detection and identification of moving targets. Proc. Biol. Sci. 2013, 280, 20130064.
- Fan, D.; Ji, G.; Sun, G.; Cheng, M.; Shen, J.; Shao, L. Camouflaged Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2777–2787.
- Wang, K.; Bi, H.; Zhang, Y.; Zhang, C.; Liu, Z.; Zheng, S. D2C-Net: A dual-branch, dual-guidance and cross-refine network for camouflaged object detection. IEEE Trans. Ind. Electron. 2021, 69, 5364–5374.
- Zhou, T.; Zhou, Y.; Gong, C. Feature aggregation and propagation network for camouflaged object detection. IEEE Trans. Image Process. 2022, 31, 7036–7047.
- Mei, H.; Ji, G.; Wei, Z.; Yang, X.; Wei, X.; Fan, D. Camouflaged object segmentation with distraction mining. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8772–8781.
- Tan, J.; Zhang, J.; Zou, B. Camouflage target detection based on polarized spectral features. In Proceedings of the SPIE 9853, Polarization: Measurement, Analysis, and Remote Sensing XII, Baltimore, MD, USA, 17–21 May 2016.
- Shen, Y.; Lin, W.; Wang, Z.; Li, J.; Sun, X.; Wu, X.; Wang, S.; Huang, F. Rapid detection of camouflaged artificial target based on polarization imaging and deep learning. IEEE Photonics J. 2021, 13, 1–9.
- Shen, Y.; Li, J.; Lin, W.; Chen, L.; Huang, F.; Wang, S. Camouflaged target detection based on snapshot multispectral imaging. Remote Sens. 2021, 13, 3949.
- Zhou, P.C.; Liu, C.C. Camouflaged target separation by spectral-polarimetric imagery fusion with shearlet transform and clustering segmentation. In Proceedings of the International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Sensors and Applications, Beijing, China, 21 August 2013.
- Islam, M.N.; Tahtali, M.; Pickering, M. Hybrid fusion-based background segmentation in multispectral polarimetric imagery. Remote Sens. 2020, 12, 1776.
- Tan, J.; Zhang, J.; Zhang, Y. Target detection for polarized hyperspectral images based on tensor decomposition. IEEE Geosci. Remote Sens. Lett. 2017, 14, 674–678.
- Zhang, J.; Tan, J.; Zhang, Y. Joint sparse tensor representation for the target detection of polarized hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2235–2239.
- Xiang, K.; Yang, K.; Wang, K. Polarization-driven semantic segmentation via efficient attention-bridged fusion. Opt. Express 2021, 29, 4802–4820.
- Blin, R.; Ainouz, S.; Canu, S.; Meriaudeau, F. The PolarLITIS dataset: Road scenes under fog. IEEE Trans. Intell. Transp. Syst. 2022, 23, 10753–10762.
- Sattar, S.; Lapray, P.; Foulonneau, A.; Bigué, L. Review of spectral and polarization imaging systems. In Proceedings of the Unconventional Optical Imaging II, Online, 6–10 April 2020; SPIE: Bellingham, WA, USA, 2020; Volume 11351, pp. 191–203.
- Ning, J.; Xu, Z.; Wu, D.; Zhang, R.; Wang, Y.; Xie, Y.; Zhao, W.; Ren, W. Compressive circular polarization snapshot spectral imaging. Opt. Commun. 2021, 491, 126946.
- Son, D.; Kwon, H.; Lee, S. Visible and near-infrared image synthesis using PCA fusion of multiscale layers. Appl. Sci. 2020, 10, 8702.
- Li, N.; Zhao, Y.; Pan, Q.; Kong, S.G. Demosaicking DoFP images using Newton's polynomial interpolation and polarization difference model. Opt. Express 2019, 27, 1376–1391.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Roy, A.G.; Navab, N.; Wachinger, C. Concurrent spatial and channel squeeze & excitation in fully convolutional networks. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Granada, Spain, 16–20 September 2018; pp. 421–429.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Wang, Q.; Zhang, F.; Li, X. Optimal clustering framework for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5910–5922.
- Shi, B.; Liu, C.; Sun, W.; Chen, N. Sparse nonnegative matrix factorization for hyperspectral optimal band selection. Acta Geod. Cartogr. Sin. 2013, 42, 351–357.
- Matteoli, S.; Diani, M.; Theiler, J. An overview of background modeling for detection of targets and anomalies in hyperspectral remotely sensed imagery. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2014, 7, 2317–2336.
- Lu, B.; Dao, P.D.; Liu, J.; He, Y.; Shang, J. Recent advances of hyperspectral imaging technology and applications in agriculture. Remote Sens. 2020, 12, 2659.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Wang, C.; Bochkovskiy, A.; Liao, H.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5693–5703.
- Guo, M.; Lu, C.; Hou, Q.; Liu, Z.; Cheng, M.; Hu, S. SegNeXt: Rethinking convolutional attention design for semantic segmentation. arXiv 2022, arXiv:2209.08575.
- Kim, J.; Koh, J.; Kim, Y.; Choi, J.; Hwang, Y.; Choi, J.W. Robust deep multi-modal learning based on gated information fusion network. In Proceedings of the 2018 Asian Conference on Computer Vision (ACCV), Perth, Australia, 2–6 December 2018; pp. 90–106.
- Hu, X.; Yang, K.; Fei, L.; Wang, K. ACNET: Attention based network to exploit complementary features for RGBD semantic segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1440–1444.
- Seichter, D.; Köhler, M.; Lewandowski, B.; Wengefeld, T.; Gross, H.M. Efficient RGB-D semantic segmentation for indoor scene analysis. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 13525–13531.
- Cao, Z. C3Net: Cross-modal feature recalibrated, cross-scale semantic aggregated and compact network for semantic segmentation of multi-modal high-resolution aerial images. Remote Sens. 2021, 13, 528.
- Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G.W. Understanding convolution for semantic segmentation. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1451–1460.
- Zhou, W.; Lv, Y.; Lei, J.; Yu, L. Global and local-contrast guides content-aware fusion for RGB-D saliency prediction. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 3641–3649.
| Base | CABF | CASPP | RDB | MIoU/% | MPA/% |
|---|---|---|---|---|---|
| ✓ | | | | 71.5 | 81.2 |
| ✓ | ✓ | | | 76.8 | 85.4 |
| ✓ | | ✓ | | 77.1 | 85.6 |
| ✓ | | | ✓ | 75.6 | 84.4 |
| ✓ | ✓ | ✓ | | 78.4 | 86.9 |
| ✓ | ✓ | | ✓ | 78.1 | 86.8 |
| ✓ | | ✓ | ✓ | 79.6 | 87.1 |
| ✓ | ✓ | ✓ | ✓ | 80.4 | 88.1 |
Module | PPM | ASPP | RFB | SPPFCSPC | CASPP |
---|---|---|---|---|---|
Size/MB | 0.788 | 8.92 | 1.32 | 7.09 | 0.853 |
Module | MIoU/% | MPA/% |
---|---|---|
ESPFNet (PPM) | 79.4 | 87.2 |
ESPFNet (ASPP) | 79.5 | 87.3 |
ESPFNet (RFB) | 78.9 | 86.8 |
ESPFNet (SPPFCSPC) | 79.2 | 86.9 |
ESPFNet (CASPP) | 80.4 | 88.1 |
| Input | Method | Backbone | Camouflaged Plates (IoU/%) | Camouflaged Nets (IoU/%) | MIoU/% | MPA/% | FPS |
|---|---|---|---|---|---|---|---|
| Polarization | FCN | ResNet18 | 51.2 | 64.8 | 58.0 | 66.5 | 27.0 |
| | U-Net | ResNet18 | 53.4 | 66.2 | 59.8 | 68.4 | 30.6 |
| | PSPNet | ResNet18 | 52.5 | 65.9 | 59.2 | 67.5 | 30.0 |
| | DeepLabv3+ | Xception | 55.2 | 65.3 | 60.3 | 69.3 | 25.5 |
| | HRNet | HRNet | 59.4 | 75.6 | 67.5 | 75.5 | 17.8 |
| | SegNeXt | MSCAN | 58.9 | 76.3 | 67.6 | 75.2 | 40.0 |
| Spectral | FCN | ResNet18 | 50.4 | 66.9 | 58.7 | 66.9 | 27.0 |
| | U-Net | ResNet18 | 52.5 | 69.6 | 61.1 | 68.7 | 30.6 |
| | PSPNet | ResNet18 | 51.1 | 68.2 | 59.7 | 67.9 | 30.0 |
| | DeepLabv3+ | Xception | 54.9 | 69.2 | 62.1 | 69.9 | 25.5 |
| | HRNet | HRNet | 57.6 | 78.1 | 67.9 | 75.9 | 17.8 |
| | SegNeXt | MSCAN | 57.2 | 77.8 | 67.5 | 75.7 | 40.0 |
| Polarization + Spectral | GIFNet | ResNet18 | 61.5 | 78.6 | 70.1 | 78.2 | 17.5 |
| | ACNet | ResNet18 | 63.2 | 79.4 | 71.3 | 79.1 | 25.0 |
| | ESANet | ResNet18 | 70.2 | 84.3 | 77.3 | 85.5 | 24.5 |
| | EAFNet | ResNet18 | 69.7 | 82.6 | 76.2 | 83.8 | 28.0 |
| | ESPFNet (ours) | ResNet18 | 75.2 | 85.6 | 80.4 | 88.1 | 27.5 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shen, Y.; Liu, X.; Zhang, S.; Xu, Y.; Zeng, D.; Wang, S.; Huang, F. Real-Time Segmentation of Artificial Targets Using a Dual-Modal Efficient Attention Fusion Network. Remote Sens. 2023, 15, 4398. https://doi.org/10.3390/rs15184398