Video Super-Resolution Using Multi-Scale and Non-Local Feature Fusion
Abstract
1. Introduction
- The non-local module overcomes the locality of the convolution operation during feature extraction: it exploits global information across the video frames, enlarges the receptive field, and makes better use of the useful information.
- The multi-scale feature fusion blocks combine convolution kernels of different sizes, so that the features extracted at each scale are fused efficiently and the reconstruction results contain more detail.
2. Related Work
2.1. Single-Image Super-Resolution
2.2. Video Super-Resolution
2.2.1. Methods with Video Frames Alignment
2.2.2. Methods without Video Frames Alignment
3. Proposed Method
3.1. Network Architecture
3.2. Non-Local Module
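As a rough illustration of how a non-local operation lets every spatial position draw on all the others, the NumPy sketch below implements one embedded-Gaussian non-local block in the style of Wang et al. (Non-local neural networks, listed in the References). The function name `non_local_block`, the weight names, and the single-frame `(C, H, W)` layout are assumptions made for illustration; the exact configuration used in the proposed network is not given in this text.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block (illustrative sketch).

    x:        (C, H, W) feature map
    w_theta,
    w_phi,
    w_g:      (C_inner, C) weights acting as 1x1 convolutions
    w_out:    (C, C_inner) weights projecting back to C channels
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)            # (C, N) with N = H * W
    theta = w_theta @ flat                # (C_inner, N) queries
    phi = w_phi @ flat                    # (C_inner, N) keys
    g = w_g @ flat                        # (C_inner, N) values
    # attn[i, j] is the weight position i gives to position j;
    # every row sums to 1 and spans the whole feature map.
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N)
    y = g @ attn.T                        # y[:, i] = sum_j attn[i, j] * g[:, j]
    # Residual connection keeps the local features intact.
    return x + (w_out @ y).reshape(c, h, w)


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 6, 6))
weights = [rng.standard_normal((2, 4)) for _ in range(3)]
projection = rng.standard_normal((4, 2))
fused = non_local_block(features, *weights, projection)
print(fused.shape)
```

Because the attention matrix has N × N entries with N = H·W, memory grows quadratically with the number of positions, which is why blocks of this kind are usually applied to low-resolution feature maps.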
3.3. Multi-Scale Feature Fusion Block
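The idea of fusing features from convolution kernels of different sizes can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' block: the two branches (3×3 and 5×5), the 1×1 fusion convolution, the residual connection, and all names (`conv2d_same`, `multi_scale_fusion_block`, `w3`, `w5`, `w_fuse`) are hypothetical choices, since the actual layout of the block is not described in this text.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 2D convolution (cross-correlation) that preserves H and W.

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) kernel with odd k.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))   # zero padding on H and W only
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]              # shifted view of the input
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def multi_scale_fusion_block(x, w3, w5, w_fuse):
    """Two parallel branches with different kernel sizes, fused by a 1x1 conv."""
    branch3 = relu(conv2d_same(x, w3))         # small receptive field
    branch5 = relu(conv2d_same(x, w5))         # larger receptive field
    merged = np.concatenate([branch3, branch5], axis=0)   # (2C, H, W)
    return x + conv2d_same(merged, w_fuse)     # fuse scales, keep a residual path


rng = np.random.default_rng(0)
c = 4
x = rng.standard_normal((c, 8, 8))
w3 = rng.standard_normal((c, c, 3, 3)) * 0.1
w5 = rng.standard_normal((c, c, 5, 5)) * 0.1
w_fuse = rng.standard_normal((c, 2 * c, 1, 1)) * 0.1
out = multi_scale_fusion_block(x, w3, w5, w_fuse)
print(out.shape)
```

The 5×5 branch responds to a 5×5 neighbourhood of each pixel while the 3×3 branch responds to a 3×3 one, so the concatenated tensor carries features at two scales before the 1×1 convolution mixes them.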
4. Experiments
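The result tables in this article report PSNR and SSIM. For reference, PSNR can be computed as in the sketch below, which uses the standard definition 10·log10(MAX² / MSE). The function name `psnr` and the 8-bit peak value of 255 are assumptions for illustration; which colour channel and border cropping were used for the reported numbers is not stated in this text.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


ground_truth = np.full((4, 4), 100.0)
reconstruction = ground_truth + 1.0    # every pixel off by one grey level
print(psnr(ground_truth, reconstruction))
```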
4.1. Ablation Experiments
4.2. Comparative Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Barzigar, N.; Roozgard, A.; Verma, P.; Cheng, S. A video super-resolution framework using SCoBeP. IEEE Trans. Circuits Syst. Video Technol. 2013, 26, 264–277.
2. Jin, Z.; Tillo, T.; Yao, C.; Xiao, J.; Zhao, Y. Virtual-view-assisted video super-resolution and enhancement. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 467–478.
3. Kappeler, A.; Yoo, S.; Dai, Q.; Katsaggelos, A.K. Video super-resolution with convolutional neural networks. IEEE Trans. Comput. Imaging 2016, 2, 109–122.
4. Lucas, A.; Lopez-Tapia, S.; Molina, R.; Katsaggelos, A.K. Generative adversarial networks and perceptual losses for video super-resolution. IEEE Trans. Image Process. 2019, 28, 3312–3327.
5. Jo, Y.; Oh, S.W.; Kang, J.; Kim, S.J. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3224–3232.
6. Li, S.; He, F.; Du, B.; Zhang, L.; Xu, Y.; Tao, D. Fast spatio-temporal residual network for video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10522–10531.
7. Kim, S.Y.; Lim, J.; Na, T.; Kim, M. Video super-resolution based on 3D-CNNs with consideration of scene change. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2831–2835.
8. Guo, J.; Chao, H. Building an end-to-end spatial-temporal convolutional network for video super-resolution. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
9. Zhu, X.; Li, Z.; Zhang, X.Y.; Li, C.; Liu, Y.; Xue, Z. Residual invertible spatio-temporal network for video super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5981–5988.
10. Caballero, J.; Ledig, C.; Aitken, A.; Acosta, A.; Totz, J.; Wang, Z.; Shi, W. Real-time video super-resolution with spatio-temporal networks and motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4778–4787.
11. Liao, R.; Tao, X.; Li, R.; Ma, Z.; Jia, J. Video super-resolution via deep draft-ensemble learning. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 531–539.
12. Tao, X.; Gao, H.; Liao, R.; Wang, J.; Jia, J. Detail-revealing deep video super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4472–4480.
13. Liu, D.; Wang, Z.; Fan, Y.; Liu, X.; Wang, Z.; Chang, S.; Huang, T. Robust video super-resolution with learned temporal dynamics. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2507–2515.
14. Sajjadi, M.S.; Vemulapalli, R.; Brown, M. Frame-recurrent video super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6626–6634.
15. Wang, Z.; Yi, P.; Jiang, K.; Jiang, J.; Han, Z.; Lu, T.; Ma, J. Multi-memory convolutional neural network for video super-resolution. IEEE Trans. Image Process. 2018, 28, 2530–2544.
16. Yi, P.; Wang, Z.; Jiang, K.; Shao, Z.; Ma, J. Multi-temporal ultra dense memory network for video super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2503–2516.
17. Tian, Y.; Zhang, Y.; Fu, Y.; Xu, C. TDAN: Temporally-deformable alignment network for video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3360–3369.
18. Chu, M.; Xie, Y.; Mayer, J.; Leal-Taixé, L.; Thuerey, N. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Trans. Graph. (TOG) 2020, 39, 75.
19. Kim, T.H.; Sajjadi, M.S.; Hirsch, M.; Schölkopf, B. Spatio-temporal transformer network for video restoration. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 106–122.
20. Li, D.; Liu, Y.; Wang, Z. Video super-resolution using non-simultaneous fully recurrent convolutional network. IEEE Trans. Image Process. 2018, 28, 1342–1355.
21. Liu, D.; Wang, Z.; Fan, Y.; Liu, X.; Wang, Z.; Chang, S.; Wang, X.; Huang, T.S. Learning temporal dynamics for video super-resolution: A deep learning approach. IEEE Trans. Image Process. 2018, 27, 3432–3445.
22. Huang, Y.; Wang, W.; Wang, L. Video super-resolution via bidirectional recurrent convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1015–1028.
23. Dosovitskiy, A.; Fischer, P.; Ilg, E.; Häusser, P.; Hazirbas, C.; Golkov, V.; Van Der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2758–2766.
24. Wang, L.; Guo, Y.; Liu, L.; Lin, Z.; Deng, X.; An, W. Deep video super-resolution using HR optical flow estimation. IEEE Trans. Image Process. 2020, 29, 4323–4336.
25. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307.
26. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
27. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
28. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
30. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
31. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
32. Hu, Y.; Li, J.; Huang, Y.; Gao, X. Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 3911–3927.
33. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645.
34. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 252–268.
35. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481.
36. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
37. Haris, M.; Shakhnarovich, G.; Ukita, N. Recurrent back-projection network for video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3897–3906.
38. Bao, W.; Lai, W.S.; Zhang, X.; Gao, Z.; Yang, M.H. MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 933–948.
39. Kalarot, R.; Porikli, F. MultiBoot VSR: Multi-stage multi-reference bootstrapping for video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
40. Chen, L.; Pan, J.; Hu, R.; Han, Z.; Liang, C.; Wu, Y. Modeling and optimizing of the multi-layer nearest neighbor network for face image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 4513–4525.
41. Haris, M.; Shakhnarovich, G.; Ukita, N. Space-time-aware multi-resolution video enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2859–2868.
42. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
43. Wang, X.; Chan, K.C.; Yu, K.; Dong, C.; Change Loy, C. EDVR: Video restoration with enhanced deformable convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
44. Ying, X.; Wang, L.; Wang, Y.; Sheng, W.; An, W.; Guo, Y. Deformable 3D convolution for video super-resolution. IEEE Signal Process. Lett. 2020, 27, 1500–1504.
45. Isobe, T.; Zhu, F.; Jia, X.; Wang, S. Revisiting temporal modeling for video super-resolution. In Proceedings of the British Machine Vision Conference, Manchester, UK, 7–11 September 2020.
46. Yan, B.; Lin, C.; Tan, W. Frame and feature-context video super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5597–5604.
47. Huang, Y.; Wang, W.; Wang, L. Bidirectional recurrent convolutional networks for multi-frame super-resolution. Adv. Neural Inf. Process. Syst. 2015, 28, 235–243.
48. Yi, P.; Wang, Z.; Jiang, K.; Jiang, J.; Ma, J. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 3106–3115.
49. Li, W.; Tao, X.; Guo, T.; Qi, L.; Lu, J.; Jia, J. MuCAN: Multi-correspondence aggregation network for video super-resolution. In Proceedings of the European Conference on Computer Vision. Springer, Glasgow, UK, 23–28 August 2020; pp. 335–351.
50. Song, Q.; Liu, H. Deep Gradient Prior Regularized Robust Video Super-Resolution. Electronics 2021, 10, 1641.
51. Wang, J.; Teng, G.; An, P. Video Super-Resolution Based on Generative Adversarial Network and Edge Enhancement. Electronics 2021, 10, 459.
52. Liu, S.; Zheng, C.; Lu, K.; Gao, S.; Wang, N.; Wang, B.; Zhang, D.; Zhang, X.; Xu, T. EVSRNet: Efficient video super-resolution with neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2480–2485.
53. Li, D.; Wang, Z. Video superresolution via motion compensation and deep residual learning. IEEE Trans. Comput. Imaging 2017, 3, 749–762.
54. Xue, T.; Chen, B.; Wu, J.; Wei, D.; Freeman, W.T. Video enhancement with task-oriented flow. Int. J. Comput. Vis. 2019, 127, 1106–1125.
55. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
56. Liu, C.; Sun, D. On Bayesian adaptive video super resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 346–360.
57. Wang, L.; Guo, Y.; Lin, Z.; Deng, X.; An, W. Learning for video super-resolution through HR optical flow estimation. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 514–529.








| Method | PSNR | SSIM | 
|---|---|---|
| SOF-VSR* | 34.19 | 0.923 | 
| Proposed without MSFFB | 34.26 | 0.924 | 
| Proposed without Non-local | 34.29 | 0.927 | 
| Proposed | 34.52 | 0.930 | 

| Threshold Value | HR | S/HR | P/HR | (P+S)/HR | (P-S)/HR |
|---|---|---|---|---|---|
| <150 | 2.74 | 36.95 | 42.84 | 79.79 | 5.89 |
| <100 | 6.84 | 38.73 | 44.42 | 83.15 | 5.69 |
| <50 | 16.70 | 38.80 | 45.56 | 84.36 | 6.76 |
| =0 | 31.23 | 36.63 | 43.36 | 79.99 | 6.73 |
| >20 | 4.91 | 30.33 | 36.69 | 67.02 | 6.36 |
| >50 | 14.82 | 34.10 | 40.93 | 75.03 | 6.83 |
| >100 | 24.50 | 36.06 | 43.06 | 79.12 | 7.00 |
| >150 | 28.54 | 36.59 | 43.42 | 80.01 | 6.83 |

| Model | Method | PSNR | SSIM |
|---|---|---|---|
| Average | SOF-VSR | 24.90 | 0.752 |
| Average | Proposed | 24.82 | 0.756 |
| Bilinear | SOF-VSR | 25.49 | 0.742 |
| Bilinear | Proposed | 25.68 | 0.752 |

| Model | Scale | Method | PSNR | SSIM |
|---|---|---|---|---|
| BI | ×2 | Bicubic | 28.42 | 0.866 |
| BI | ×2 | DRCN [33] | 31.57 | 0.924 |
| BI | ×2 | LapSRN [28] | 31.41 | 0.923 |
| BI | ×2 | CARN [34] | 31.96 | 0.931 |
| BI | ×2 | VSRnet [3] | 31.29 | 0.927 |
| BI | ×2 | SOF-VSR [24] | 33.17 | 0.947 |
| BI | ×2 | Proposed* | 33.29 | 0.948 |
| BI | ×2 | Proposed | 33.63 | 0.951 |
| BI | ×3 | Bicubic | 25.26 | 0.730 |
| BI | ×3 | DRCN [33] | 26.82 | 0.805 |
| BI | ×3 | CARN [34] | 27.16 | 0.818 |
| BI | ×3 | VSRnet [3] | 26.75 | 0.807 |
| BI | ×3 | VESPCN [10] | 27.25 | 0.845 |
| BI | ×3 | SOF-VSR [24] | 28.09 | 0.861 |
| BI | ×3 | Proposed* | 28.26 | 0.864 |
| BI | ×3 | Proposed | 28.46 | 0.871 |
| BI | ×4 | Bicubic | 23.75 | 0.630 |
| BI | ×4 | DRCN [33] | 24.94 | 0.707 |
| BI | ×4 | LapSRN [28] | 24.98 | 0.711 |
| BI | ×4 | CARN [34] | 25.27 | 0.725 |
| BI | ×4 | VSRnet [3] | 24.81 | 0.702 |
| BI | ×4 | VESPCN [10] | 25.35 | 0.756 |
| BI | ×4 | SOF-VSR [24] | 26.01 | 0.771 |
| BI | ×4 | Proposed* | 26.05 | 0.773 |
| BI | ×4 | Proposed | 26.21 | 0.782 |
| BD | ×4 | SPMC [12] | 25.99 | 0.773 |
| BD | ×4 | SOF-VSR [24] | 26.19 | 0.785 |
| BD | ×4 | Proposed* | 26.29 | 0.791 |
| BD | ×4 | Proposed | 26.43 | 0.797 |

| Model | Scale | Method | PSNR | SSIM |
|---|---|---|---|---|
| BI | ×2 | Bicubic | 36.43 | 0.958 |
| BI | ×2 | DRCN [33] | 40.62 | 0.979 |
| BI | ×2 | LapSRN [28] | 40.30 | 0.978 |
| BI | ×2 | CARN [34] | 40.99 | 0.981 |
| BI | ×2 | VSRnet [3] | 39.00 | 0.972 |
| BI | ×2 | SOF-VSR [24] | 41.38 | 0.983 |
| BI | ×2 | Proposed* | 41.00 | 0.982 |
| BI | ×2 | Proposed | 41.35 | 0.984 |
| BI | ×3 | Bicubic | 32.94 | 0.912 |
| BI | ×3 | DRCN [33] | 36.08 | 0.947 |
| BI | ×3 | CARN [34] | 36.70 | 0.952 |
| BI | ×3 | VSRnet [3] | 34.94 | 0.936 |
| BI | ×3 | SOF-VSR [24] | 36.80 | 0.955 |
| BI | ×3 | Proposed* | 36.63 | 0.953 |
| BI | ×3 | Proposed | 37.02 | 0.958 |
| BI | ×4 | Bicubic | 30.97 | 0.870 |
| BI | ×4 | DRCN [33] | 33.49 | 0.911 |
| BI | ×4 | LapSRN [28] | 33.54 | 0.911 |
| BI | ×4 | CARN [34] | 34.12 | 0.921 |
| BI | ×4 | VSRnet [3] | 32.63 | 0.897 |
| BI | ×4 | SOF-VSR [24] | 34.32 | 0.925 |
| BI | ×4 | Proposed* | 34.26 | 0.924 |
| BI | ×4 | Proposed | 34.52 | 0.930 |
| BD | ×4 | SPMC [12] | 33.02 | 0.911 |
| BD | ×4 | SOF-VSR [24] | 34.28 | 0.927 |
| BD | ×4 | Proposed* | 34.43 | 0.930 |
| BD | ×4 | Proposed | 34.69 | 0.933 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Y.; Zhu, H.; Hou, Q.; Wang, J.; Wu, W. Video Super-Resolution Using Multi-Scale and Non-Local Feature Fusion. Electronics 2022, 11, 1499. https://doi.org/10.3390/electronics11091499