Sparse Mix-Attention Transformer for Multispectral Image and Hyperspectral Image Fusion
Abstract
1. Introduction
- We introduce SMAformer, a novel transformer-based network that exploits the flexibility of the transformer's attention mechanism to extract intricate long-range dependencies in both the spatial and spectral dimensions.
- We replace the standard self-attention mechanism with a sparse-attention mechanism, which yields a sparse representation of spatial information while suppressing interfering spatial noise.
- Extensive experiments on three synthetic benchmark datasets, CAVE, Harvard, and Pavia Center, show that SMAformer outperforms state-of-the-art methods.
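The sparse-attention idea in the contributions above, replacing dense self-attention with attention restricted to the strongest matches, can be illustrated with a minimal NumPy sketch. This is not the paper's exact SMAformer formulation; the top-k selection scheme and all names here are illustrative assumptions.

```python
import numpy as np

def sparse_attention(Q, K, V, k):
    """Top-k sparse attention sketch: for each query, keep only the k
    largest similarity scores and mask the rest before the softmax,
    so noisy low-similarity positions contribute nothing.
    Q, K, V: (n, d) arrays; k: number of keys retained per query."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) scaled similarities
    # Threshold each row at its k-th largest score; mask everything below it.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving entries only (masked entries get weight 0).
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
out = sparse_attention(Q, K, V, k=2)              # each output row mixes only 2 values
```

With `k` equal to the sequence length, this reduces to ordinary dense attention; smaller `k` enforces the sparse spatial representation the bullet describes.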
2. Related Work
2.1. Traditional Work
2.2. Deep-Learning-Based Work
3. Method
3.1. Network Architecture
3.2. Spectral Mix-Attention Block
3.3. Sparse Spectral Mix-Attention Block
3.4. Loss Function
4. Experiments
4.1. Experimental Setup and Model Details
4.2. Datasets
4.3. Evaluation Metrics
4.3.1. PSNR
4.3.2. SSIM
4.3.3. SAM
4.3.4. ERGAS
4.3.5. UIQI
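The evaluation metrics listed above follow standard definitions. As a reference, here is a minimal NumPy sketch of three of them (PSNR, SAM, ERGAS); the function and variable names are illustrative, not the paper's implementation, and the ERGAS scale convention (`ratio` = spatial downsampling factor) is an assumption since conventions vary.

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def sam(ref, est, eps=1e-12):
    """Mean spectral angle in degrees between per-pixel spectra
    (lower is better). ref, est: (H, W, B) hyperspectral cubes."""
    dot = np.sum(ref * est, axis=-1)
    norm = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    angle = np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0))
    return np.degrees(angle.mean())

def ergas(ref, est, ratio):
    """Relative dimensionless global error (lower is better).
    ratio: assumed here to be the spatial downsampling factor."""
    rmse2 = np.mean((ref - est) ** 2, axis=(0, 1))   # per-band squared RMSE
    mean2 = np.mean(ref, axis=(0, 1)) ** 2           # per-band squared mean
    return 100.0 / ratio * np.sqrt(np.mean(rmse2 / mean2))

ref = np.ones((4, 4, 3))
est = 0.9 * ref          # uniform 10% error: PSNR = 20 dB, SAM = 0 degrees
```

SSIM and UIQI involve local windowed statistics and are more easily taken from an existing image-quality library than re-derived here.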
4.4. Quantitative Analysis
4.5. Qualitative Analysis
4.6. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
- Uzkent, B.; Hoffman, M.J.; Vodacek, A. Real-time vehicle tracking in aerial video using hyperspectral features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 36–44. [Google Scholar]
- Hong, D.; Yao, J.; Meng, D.; Xu, Z.; Chanussot, J. Multimodal GANs: Toward crossmodal hyperspectral–multispectral image segmentation. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5103–5113. [Google Scholar] [CrossRef]
- Aiazzi, B.; Baronti, S.; Selva, M. Improving component substitution pansharpening through multivariate regression of MS + Pan data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3230–3239. [Google Scholar] [CrossRef]
- Chavez, P.; Sides, S.C.; Anderson, J.A. Comparison of three different methods to merge multiresolution and multispectral data- Landsat TM and SPOT panchromatic. Photogramm. Eng. Remote Sens. 1991, 57, 295–303. [Google Scholar]
- Burt, P.J.; Adelson, E.H. The Laplacian pyramid as a compact image code. In Readings in Computer Vision; Elsevier: Amsterdam, The Netherlands, 1987; pp. 671–679. [Google Scholar]
- Loncan, L.; De Almeida, L.B.; Bioucas-Dias, J.M.; Briottet, X.; Chanussot, J.; Dobigeon, N.; Fabre, S.; Liao, W.; Licciardi, G.A.; Simoes, M.; et al. Hyperspectral pansharpening: A review. IEEE Geosci. Remote Sens. Mag. 2015, 3, 27–46. [Google Scholar] [CrossRef]
- Starck, J.L.; Fadili, J.; Murtagh, F. The undecimated wavelet decomposition and its reconstruction. IEEE Trans. Image Process. 2007, 16, 297–309. [Google Scholar] [CrossRef]
- Bungert, L.; Coomes, D.A.; Ehrhardt, M.J.; Rasch, J.; Reisenhofer, R.; Schönlieb, C.B. Blind image fusion for hyperspectral imaging with the directional total variation. Inverse Probl. 2018, 34, 044003. [Google Scholar] [CrossRef]
- Akhtar, N.; Shafait, F.; Mian, A. Bayesian sparse representation for hyperspectral image super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3631–3640. [Google Scholar]
- Dian, R.; Fang, L.; Li, S. Hyperspectral image super-resolution via non-local sparse tensor factorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5344–5353. [Google Scholar]
- Li, S.; Dian, R.; Fang, L.; Bioucas-Dias, J.M. Fusing hyperspectral and multispectral images via coupled sparse tensor factorization. IEEE Trans. Image Process. 2018, 27, 4118–4130. [Google Scholar] [CrossRef]
- Kawakami, R.; Matsushita, Y.; Wright, J.; Ben-Ezra, M.; Tai, Y.W.; Ikeuchi, K. High-resolution hyperspectral imaging via matrix factorization. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2329–2336. [Google Scholar]
- Akhtar, N.; Shafait, F.; Mian, A. Sparse spatio-spectral representation for hyperspectral image super-resolution. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part VII 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 63–78. [Google Scholar]
- Wei, Q.; Bioucas-Dias, J.; Dobigeon, N.; Tourneret, J.Y. Hyperspectral and multispectral image fusion based on a sparse representation. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3658–3668. [Google Scholar] [CrossRef]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407. [Google Scholar]
- Palsson, F.; Sveinsson, J.R.; Ulfarsson, M.O. Multispectral and hyperspectral image fusion using a 3-D-convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 639–643. [Google Scholar] [CrossRef]
- Xie, Q.; Zhou, M.; Zhao, Q.; Xu, Z.; Meng, D. MHF-Net: An interpretable deep network for multispectral and hyperspectral image fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1457–1473. [Google Scholar] [CrossRef]
- Shen, D.; Liu, J.; Wu, Z.; Yang, J.; Xiao, L. ADMM-HFNet: A matrix decomposition-based deep approach for hyperspectral image fusion. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–17. [Google Scholar] [CrossRef]
- Liu, J.; Shen, D.; Wu, Z.; Xiao, L.; Sun, J.; Yan, H. Patch-aware deep hyperspectral and multispectral image fusion by unfolding subspace-based optimization model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1024–1038. [Google Scholar] [CrossRef]
- Yao, J.; Hong, D.; Chanussot, J.; Meng, D.; Zhu, X.; Xu, Z. Cross-attention in coupled unmixing nets for unsupervised hyperspectral super-resolution. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIX 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 208–224. [Google Scholar]
- Yang, J.; Zhao, Y.Q.; Chan, J.C.W. Hyperspectral and multispectral image fusion via deep two-branches convolutional neural network. Remote Sens. 2018, 10, 800. [Google Scholar] [CrossRef]
- Hu, J.F.; Huang, T.Z.; Deng, L.J.; Dou, H.X.; Hong, D.; Vivone, G. Fusformer: A transformer-based fusion network for hyperspectral image super-resolution. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
- Jia, S.; Min, Z.; Fu, X. Multiscale spatial–spectral transformer network for hyperspectral and multispectral image fusion. Inf. Fusion 2023, 96, 117–129. [Google Scholar] [CrossRef]
- Cai, Y.; Lin, J.; Hu, X.; Wang, H.; Yuan, X.; Zhang, Y.; Timofte, R.; Van Gool, L. Coarse-to-fine sparse transformer for hyperspectral image reconstruction. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 686–704. [Google Scholar]
- Peng, J.; Sun, W.; Li, H.C.; Li, W.; Meng, X.; Ge, C.; Du, Q. Low-rank and sparse representation for hyperspectral image processing: A review. IEEE Geosci. Remote Sens. Mag. 2021, 10, 10–43. [Google Scholar] [CrossRef]
- Nunez, J.; Otazu, X.; Fors, O.; Prades, A.; Pala, V.; Arbiol, R. Multiresolution-based image fusion with additive wavelet decomposition. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1204–1211. [Google Scholar] [CrossRef]
- Yokoya, N.; Yairi, T.; Iwasaki, A. Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Trans. Geosci. Remote Sens. 2011, 50, 528–537. [Google Scholar] [CrossRef]
- Zhang, K.; Wang, M.; Yang, S.; Jiao, L. Spatial–spectral-graph-regularized low-rank tensor decomposition for multispectral and hyperspectral image fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1030–1040. [Google Scholar] [CrossRef]
- Xu, Y.; Wu, Z.; Chanussot, J.; Wei, Z. Hyperspectral images super-resolution via learning high-order coupled tensor ring representation. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4747–4760. [Google Scholar] [CrossRef]
- Dian, R.; Li, S.; Guo, A.; Fang, L. Deep hyperspectral image sharpening. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5345–5355. [Google Scholar] [CrossRef]
- Keys, R. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1153–1160. [Google Scholar] [CrossRef]
- Cai, Y.; Lin, J.; Hu, X.; Wang, H.; Yuan, X.; Zhang, Y.; Timofte, R.; Van Gool, L. Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17502–17511. [Google Scholar]
- Chen, X.; Li, H.; Li, M.; Pan, J. Learning A Sparse Transformer Network for Effective Image Deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5896–5905. [Google Scholar]
- Yasuma, F.; Mitsunaga, T.; Iso, D.; Nayar, S.K. Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process. 2010, 19, 2241–2253. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. Volume 30. [Google Scholar] [CrossRef]
- Cui, Y.; Jiang, C.; Wang, L.; Wu, G. Mixformer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13608–13618. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Chakrabarti, A.; Zickler, T. Statistics of real-world hyperspectral images. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 193–200. [Google Scholar]
- Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proceedings of the JPL, Summaries of the Third Annual JPL Airborne Geoscience Workshop, Pasadena, CA, USA, 1–5 June 1992; AVIRIS Workshop. Volume 1. [Google Scholar]
- Wald, L. Quality of high resolution synthesised images: Is there a simple criterion? In Proceedings of the Third Conference Fusion of Earth Data: Merging Point Measurements, Raster Maps and Remotely Sensed Images, Sophia Antipolis, France, 26–28 January 2000; SEE/URISCA. pp. 99–103. [Google Scholar]
- Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
- Dian, R.; Li, S.; Fang, L. Learning a low tensor-train rank representation for hyperspectral image super-resolution. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2672–2683. [Google Scholar] [CrossRef]
- Wang, W.; Zeng, W.; Huang, Y.; Ding, X.; Paisley, J. Deep blind hyperspectral image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 4150–4159. [Google Scholar]
- Ran, R.; Deng, L.J.; Jiang, T.X.; Hu, J.F.; Chanussot, J.; Vivone, G. GuidedNet: A general CNN fusion framework via high-resolution guidance for hyperspectral image super-resolution. IEEE Trans. Cybern. 2023, 53, 4148–4161. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Advances in Neural Information Processing Systems; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27. [Google Scholar]
Method | PSNR↑ | SSIM↑ | SAM↓ | ERGAS↓ | UIQI↑ |
---|---|---|---|---|---|
LTTR [44] | 41.20 | 0.980 | 4.25 | 1.90 | 0.984 |
MHFnet [19] | 45.23 | 0.988 | 4.88 | 0.71 | 0.981 |
DBIN [45] | 45.02 | 0.981 | 3.38 | 0.71 | 0.992 |
ADMM-HFNet [20] | 45.48 | 0.992 | 3.39 | 0.71 | 0.992 |
SpfNet [21] | 46.29 | 0.990 | 4.24 | 1.46 | 0.980 |
Fusformer [24] | 42.18 | 0.993 | 3.07 | 1.25 | 0.992 |
GuidedNet [46] | 45.41 | 0.991 | 4.03 | 0.97 | - |
Ours | 46.56 | 0.994 | 2.92 | 0.64 | 0.995 |
Method | PSNR↑ | SSIM↑ | SAM↓ | ERGAS↓ | UIQI↑ |
---|---|---|---|---|---|
LTTR [44] | 40.06 | 0.999 | 4.69 | 1.29 | 0.993 |
MHFnet [19] | 44.50 | 0.981 | 3.68 | 1.21 | 0.991 |
DBIN [45] | 45.33 | 0.983 | 3.04 | 1.09 | 0.995 |
ADMM-HFNet [20] | 45.53 | 0.983 | 3.04 | 1.08 | 0.995 |
SpfNet [21] | 45.09 | 0.984 | 2.31 | 0.65 | 0.997 |
Fusformer [24] | 41.96 | 0.995 | 3.33 | 2.86 | 0.995 |
GuidedNet [46] | 41.64 | 0.981 | 2.85 | 1.20 | - |
Ours | 47.86 | 0.995 | 2.25 | 0.75 | 0.997 |
N | PSNR↑ | SSIM↑ | SAM↓ | ERGAS↓ | UIQI↑ | Time↓ |
---|---|---|---|---|---|---|
1 | 43.06 | 0.986 | 3.69 | 1.19 | 0.993 | 0.45 |
2 | 45.38 | 0.990 | 3.20 | 0.97 | 0.991 | 0.51 |
3 | 46.56 | 0.994 | 2.92 | 0.64 | 0.995 | 0.55 |
4 | 46.01 | 0.995 | 2.91 | 0.78 | 0.993 | 0.68 |
| PSNR↑ | SSIM↑ | SAM↓ | ERGAS↓ | UIQI↑ |
---|---|---|---|---|---|
1 | 42.13 | 0.920 | 4.12 | 1.67 | 0.986 |
| 44.25 | 0.986 | 3.49 | 1.03 | 0.990 |
| 46.56 | 0.994 | 2.92 | 0.64 | 0.995 |
| 45.21 | 0.992 | 3.03 | 0.66 | 0.994 |
Method | PSNR↑ | SSIM↑ | SAM↓ | ERGAS↓ | UIQI↑ |
---|---|---|---|---|---|
w/o SMAB | 44.38 | 0.991 | 3.24 | 0.96 | 0.990 |
w/o SSMAB | 45.02 | 0.990 | 2.96 | 0.77 | 0.992 |
Ours | 46.56 | 0.994 | 2.92 | 0.64 | 0.995 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yu, S.; Zhang, X.; Song, H. Sparse Mix-Attention Transformer for Multispectral Image and Hyperspectral Image Fusion. Remote Sens. 2024, 16, 144. https://doi.org/10.3390/rs16010144