Enhanced Remote Sensing Image Compression Method Using Large Network with Sparse Extracting Strategy
Abstract
1. Introduction
- (1) To address the imbalance in extraction ability between the encoder and the hyper-encoder, an enhanced image compression network is explored that incorporates a residual block and a parameter estimation module on the hyper-encoding side to reinforce the feature extraction performance of the hyper-encoder.
- (2) It is observed that encoders generate latent representations in a fixed pattern. To exploit this, a nonlocal cross-channel efficient graph (NCEG) is proposed for secondary processing of the latent representation. The NCEG consists of two stages, graph generation and graph aggregation, and associates similar feature maps to further improve the side-information extraction performance of the hyper-encoder.
- (3) A long-dependent residual network is selected as the backbone, and a sparse attention module is inserted on the encoder/decoder side to enlarge the receptive field of the network. Notably, without a matching hyper-encoder, simply replacing the original attention mechanism with a sparse one does not improve rate-distortion performance.
- (4) The overall performance of the proposed model is evaluated on two public remote sensing datasets; the experimental results demonstrate that the proposed approach achieves the best results on both.
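The NCEG idea in contribution (2), linking each channel's feature map to its most similar peers and then aggregating over those links, can be sketched as follows. This is an illustrative reconstruction rather than the authors' exact module: the function name, the use of cosine similarity, and the k-nearest-neighbour edge selection are all assumptions.

```python
import numpy as np

def nceg_sketch(latent: np.ndarray, k: int = 2) -> np.ndarray:
    """Illustrative two-stage graph processing of a (C, H, W) latent tensor:
    graph generation (k-NN edges over channels) + graph aggregation (mean)."""
    c = latent.shape[0]
    flat = latent.reshape(c, -1)
    # Graph generation: cosine similarity between channel feature maps.
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self-loops
    neighbours = np.argsort(-sim, axis=1)[:, :k]   # k most similar channels
    # Graph aggregation: average each channel with its k neighbours.
    out = np.empty_like(latent)
    for i in range(c):
        out[i] = (latent[i] + latent[neighbours[i]].sum(axis=0)) / (k + 1)
    return out
```

The aggregated latent has the same shape as the input, so a module like this can be dropped between the encoder and the hyper-encoder without changing the rest of the pipeline.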
2. Related Works
2.1. Graph Neural Network (GNN)
2.2. Nonlocal Processing
2.3. Attention Mechanism
2.4. Hand-Crafted Compression
2.5. Learned Compression
3. Proposed Methods
3.1. Formulation of Learned Image Compression
3.2. Sparse Attention Module
3.3. Overall Network Architecture
3.4. Nonlocal Cross-Channel Efficient Graph
4. Experiments and Analysis
4.1. Experimental Setup
4.2. Quantitative Results
4.3. Qualitative Results
4.4. Ablation Study
4.5. Entropy Module Visualization
4.6. Further Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wallace, G. The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 1992, 38, 18–34. [Google Scholar] [CrossRef]
- Christopoulos, C.; Skodras, A.; Ebrahimi, T. The JPEG2000 still image coding system: An overview. IEEE Trans. Consum. Electron. 2000, 46, 1103–1127. [Google Scholar] [CrossRef]
- Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
- Minnen, D.; Ballé, J.; Toderici, G.D. Joint autoregressive and hierarchical priors for learned image compression. In Proceedings of the Annual Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational image compression with a scale hyperprior. arXiv 2018, arXiv:1802.01436. [Google Scholar]
- Zhang, X.; Wu, X. Attention-guided image compression by deep reconstruction of compressive sensed saliency skeleton. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13354–13364. [Google Scholar]
- Song, M.; Choi, J.; Han, B. Variable-rate deep image compression through spatially-adaptive feature transform. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 2380–2389. [Google Scholar]
- Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Learned image compression with discretized Gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7939–7948. [Google Scholar]
- Zou, R.; Song, C.; Zhang, Z. The devil is in the details: Window-based attention for image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17492–17501. [Google Scholar]
- He, D.; Zheng, Y.; Sun, B.; Wang, Y.; Qin, H. Checkerboard context model for efficient learned image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 14771–14780. [Google Scholar]
- Islam, K.; Dang, L.M.; Lee, S.; Moon, H. Image compression with recurrent neural network and generalized divisive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 1875–1879. [Google Scholar]
- Qian, Y.; Tan, Z.; Sun, X.; Lin, M.; Li, D.; Sun, Z.; Li, H.; Jin, R. Learning accurate entropy model with global reference for image compression. arXiv 2020, arXiv:2010.08321. [Google Scholar]
- Iwai, S.; Miyazaki, T.; Sugaya, Y.; Omachi, S. Fidelity-controllable extreme image compression with generative adversarial networks. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 8235–8242. [Google Scholar]
- Agustsson, E.; Tschannen, M.; Mentzer, F.; Timofte, R.; Gool, L.V. Generative adversarial networks for extreme learned image compression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 221–231. [Google Scholar]
- Li, M.; Lin, J.; Ding, Y.; Liu, Z.; Zhu, J.Y.; Han, S. GAN compression: Efficient architectures for interactive conditional GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5284–5294. [Google Scholar]
- Su, R.; Cheng, Z.; Sun, H.; Katto, J. Scalable learned image compression with a recurrent neural networks-based hyperprior. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 3369–3373. [Google Scholar]
- Liu, Y.; Ng, M.K. Deep neural network compression by tucker decomposition with nonlinear response. Knowl.-Based Syst. 2022, 241, 108171. [Google Scholar] [CrossRef]
- Li, L.J.; Zhou, S.L.; Chao, F.; Chang, X.; Yang, L.; Yu, X.; Shang, C.; Shen, Q. Model compression optimized neural network controller for nonlinear systems. Knowl.-Based Syst. 2023, 265, 110311. [Google Scholar] [CrossRef]
- Yang, F.; Herranz, L.; Cheng, Y.; Mozerov, M.G. Slimmable compressive autoencoders for practical neural image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4998–5007. [Google Scholar]
- Yu, C.; Hong, L.; Pan, T.; Li, Y.; Li, T. ESTUGAN: Enhanced Swin Transformer with U-Net Discriminator for Remote Sensing Image Super-Resolution. Electronics 2023, 12, 4235. [Google Scholar] [CrossRef]
- Pan, T.; Zhang, L.; Song, Y.; Liu, Y. Hybrid attention compression network with light graph attention module for remote sensing images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6005605. [Google Scholar]
- Liu, D.; Wen, B.; Fan, Y.; Loy, C.C.; Huang, T.S. Nonlocal recurrent network for image restoration. In Proceedings of the Annual Conference on Neural Information Processing System, Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Zhang, Y.; Li, K.; Li, K.; Zhong, B.; Fu, Y. Residual non-local attention networks for image restoration. arXiv 2019, arXiv:1903.10082. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803. [Google Scholar]
- Chen, T.; Liu, H.; Ma, Z.; Shen, Q.; Cao, X.; Wang, Y. End-to-end learnt image compression via non-local attention optimization and improved context modeling. IEEE Trans. Image Process. 2021, 30, 3179–3191. [Google Scholar] [CrossRef] [PubMed]
- Xuyang, G.; Junyang, Y.; Shuwei, X. Text classification study based on graph convolutional neural networks. In Proceedings of the 2021 International Conference on Internet, Education and Information Technology (IEIT), Suzhou, China, 16–18 April 2021; pp. 102–105. [Google Scholar]
- Beck, D.; Haffari, G.; Cohn, T. Graph-to-sequence learning using gated graph neural networks. arXiv 2018, arXiv:1806.09835. [Google Scholar]
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Cui, P.; Zhu, W. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng. 2020, 34, 249–270. [Google Scholar] [CrossRef]
- Lu, Y.; Zhu, Y.; Lu, G. 3D SceneFlowNet: Self-supervised 3D scene flow estimation based on graph CNN. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 3647–3651. [Google Scholar]
- Dinesh, C.; Cheung, G.; Bajić, I.V. 3D point cloud super-resolution via graph total variation on surface normal. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 4390–4394. [Google Scholar]
- Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 974–983. [Google Scholar]
- Chen, Y.H.; Huang, L.; Wang, C.D.; Lai, J.H. Hybrid-order gated graph neural network for session-based recommendation. IEEE Trans. Ind. Inform. 2021, 18, 1458–1467. [Google Scholar] [CrossRef]
- Valsesia, D.; Fracastoro, G.; Magli, E. Deep graph-convolutional image denoising. IEEE Trans. Image Process. 2020, 29, 8226–8237. [Google Scholar] [CrossRef] [PubMed]
- Zhou, S.; Zhang, J.; Zuo, W.; Loy, C.C. Cross-scale internal graph neural network for image super-resolution. In Proceedings of the Annual Conference on Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 3499–3509. [Google Scholar]
- Tang, Z.; Wang, H.; Yi, X.; Zhang, Y.; Kwong, S.; Kuo, C.C.J. Joint graph attention and asymmetric convolutional neural network for deep image compression. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 421–433. [Google Scholar] [CrossRef]
- Yan, Q.; Zhang, L.; Liu, Y.; Zhu, Y.; Sun, J.; Shi, Q.; Zhang, Y. Deep HDR imaging via a non-local network. IEEE Trans. Image Process. 2020, 29, 4308–4322. [Google Scholar] [CrossRef] [PubMed]
- Buades, A.; Coll, B.; Morel, J.M. A non-local algorithm for image denoising. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; pp. 60–65. [Google Scholar]
- Xu, X.; Liu, S.; Chuang, T.D.; Huang, Y.W.; Lei, S.M.; Rapaka, K.; Pang, C.; Seregin, V.; Wang, Y.K.; Karczewicz, M. Intra block copy in HEVC screen content coding extensions. IEEE J. Emerg. Sel. Top. Circuits Syst. 2016, 6, 409–419. [Google Scholar] [CrossRef]
- Child, R.; Gray, S.; Radford, A.; Sutskever, I. Generating long sequences with sparse transformers. arXiv 2019, arXiv:1904.10509. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 459–479. [Google Scholar]
- Wang, W.; Yao, L.; Chen, L.; Lin, B.; Cai, D.; He, X.; Liu, W. Crossformer: A versatile vision transformer hinging on cross-scale attention. arXiv 2021, arXiv:2108.00154. [Google Scholar] [CrossRef] [PubMed]
- Choi, J.; Han, B. Task-Aware Quantization Network for JPEG Image Compression. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 309–324. [Google Scholar]
- Zhong, G.; Wang, J.; Hu, J.; Liang, F. A GAN-Based Video Intra Coding. Electronics 2021, 10, 132. [Google Scholar] [CrossRef]
- Mei, Y.; Fan, Y.; Zhou, Y. Image super-resolution with nonlocal sparse attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3517–3526. [Google Scholar]
- Bai, Y.; Liu, X.; Zuo, W.; Wang, Y.; Ji, X. Learning scalable ℓ∞-constrained near-lossless image compression via joint lossy image and residual compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11946–11955. [Google Scholar]
- Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Deep residual learning for image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Bresson, X.; Laurent, T. Residual gated graph convnets. arXiv 2017, arXiv:1711.07553. [Google Scholar]
- Pan, T.; Zhang, L.; Qu, L.; Liu, Y. A Coupled Compression Generation Network for Remote-Sensing Images at Extremely Low Bitrates. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5608514. [Google Scholar] [CrossRef]
- Lee, J.; Cho, S.; Beack, S.K. Context-adaptive entropy model for end-to-end optimized image compression. arXiv 2018, arXiv:1809.10452. [Google Scholar]
| BPP (bit/pixel) | 0.168 | 0.312 | 0.536 |
|---|---|---|---|
| Bits saved (bits) | 5832 | 5377 | 5867 |
| PSNR change (dB) | +0.16 | +0.14 | +0.19 |
| MS-SSIM change (dB) | +0.12 | +0.12 | +0.15 |
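Reporting MS-SSIM gains in decibels presumably follows the convention common in the compression literature, MS-SSIM (dB) = −10·log10(1 − MS-SSIM), which expands the scale near 1.0 where MS-SSIM saturates. A minimal sketch (the example values are hypothetical, not taken from the paper):

```python
import math

def ms_ssim_to_db(ms_ssim: float) -> float:
    """Logarithmic MS-SSIM scale often used in compression papers:
    MS-SSIM (dB) = -10 * log10(1 - MS-SSIM)."""
    return -10.0 * math.log10(1.0 - ms_ssim)

# Even a small linear improvement near saturation is visible in dB,
# e.g. raising MS-SSIM from 0.970 to 0.971 gives roughly a 0.15 dB gain.
gain = ms_ssim_to_db(0.971) - ms_ssim_to_db(0.970)
```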
| Methods | FLOPs | Parameters |
|---|---|---|
| Cheng et al. [8] (192 channels) | 61.1 G | 18.4 M |
| Cheng et al. [8] (256 channels) | 108.2 G | 31.7 M |
| Chen et al. [25] (192 channels) | 207.2 G | 47.7 M |
| Qian et al. [12] | 35.4 G | 12 M |
| Minnen et al. [4] | 25.7 G | 12 M |
| Proposed (192 channels) | 69.3 G | 69.6 M |
| Proposed (256 channels) | 122.8 G | 122.7 M |
| Sparse attention module (192 channels) | 30.7 M | 0.1 M |
| Sparse attention module (256 channels) | 54.5 M | 0.2 M |
| NCEG (192 channels) | 0.8 M | 36.9 K |
| NCEG (256 channels) | 1.0 M | 65.5 K |
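As a rough sanity check on FLOPs and parameter figures like those above, the cost of a single convolutional layer can be estimated as follows. This is a back-of-the-envelope sketch: counting one multiply-add per output FLOP is an assumption, and profiling tools differ in their conventions.

```python
def conv2d_cost(c_in: int, c_out: int, k: int, h: int, w: int):
    """Estimate parameter count and multiply-add count for a single
    k x k Conv2d layer producing an h x w output feature map."""
    params = k * k * c_in * c_out + c_out          # weights + biases
    flops = k * k * c_in * c_out * h * w           # multiply-adds per output map
    return params, flops

# e.g. a hypothetical 3x3, 192 -> 192 conv on a 64x64 feature map:
params, flops = conv2d_cost(192, 192, 3, 64, 64)
# roughly 0.33 M parameters and 1.36 G multiply-adds
```

Stacking such estimates over a network's layers is how totals in the tens of gigaFLOPs, as in the table above, typically arise.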
Citation: Li, H.; Pan, T.; Zhang, L. Enhanced Remote Sensing Image Compression Method Using Large Network with Sparse Extracting Strategy. Electronics 2024, 13, 2677. https://doi.org/10.3390/electronics13132677