Video Colorization Based on Variational Autoencoder
Abstract
1. Introduction
- We leverage ResNet-50 for comprehensive image feature extraction, merging deep-level features in a layered fashion.
- An integrated attention mechanism enhances the model's ability to compute image similarity across different reference images, thereby improving generalization.
- Our video colorization method based on a Variational Autoencoder (VAE) exploits spatiotemporal information to ensure coherence and authenticity in the generated videos.
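The contributions above can be illustrated with a minimal NumPy sketch (the feature dimensions, the scaled dot-product form of the attention, and the reparameterization details are assumptions for illustration, not the paper's actual architecture): target-frame features attend to reference-frame features to gather a weighted mix of reference colors, and a latent code is sampled with the VAE reparameterization trick.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(target_feats, ref_feats, ref_colors):
    """Scaled dot-product attention: each target-frame feature position
    attends to reference-frame features and gathers a weighted mix of
    the corresponding reference colors."""
    d = target_feats.shape[-1]
    scores = target_feats @ ref_feats.T / np.sqrt(d)  # (Nt, Nr) similarities
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ ref_colors                       # (Nt, color_dim)

def reparameterize(mu, log_var, rng):
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
t = rng.standard_normal((16, 64))   # 16 target positions, 64-dim features
r = rng.standard_normal((32, 64))   # 32 reference positions
c = rng.standard_normal((32, 2))    # reference ab chrominance channels
ab = cross_attention(t, r, c)       # predicted chrominance per position
z = reparameterize(np.zeros(8), np.zeros(8), rng)  # sampled latent code
```

In the paper the features would come from the ResNet-50 feature processing network rather than random arrays; the sketch only shows the data flow.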
2. Related Work
2.1. Image Colorization
2.2. Video Colorization
3. Methodology
3.1. Overall Framework
3.2. Network Structure
3.2.1. Feature Processing Network F
3.2.2. Color Network C
3.3. Loss Function
3.3.1. Perceptual Loss
3.3.2. KL Divergence Loss
3.3.3. Context Loss
3.3.4. Smoothness Loss
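The KL divergence and smoothness terms listed above can be sketched in a few lines (the closed-form KL for a diagonal Gaussian against a standard-normal prior and the total-variation form of the smoothness penalty are standard choices assumed here; the paper's exact definitions and weights may differ, and the perceptual and context losses are omitted because they require a pretrained feature extractor):

```python
import numpy as np

def kl_divergence(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian,
    summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def smoothness_loss(ab):
    """Total-variation style penalty on a predicted chrominance map of
    shape (H, W, 2): sum of absolute differences between neighbors."""
    dh = np.abs(np.diff(ab, axis=0)).sum()  # vertical neighbors
    dw = np.abs(np.diff(ab, axis=1)).sum()  # horizontal neighbors
    return dh + dw
```

A posterior that exactly matches the prior (mu = 0, log_var = 0) gives zero KL, and a constant chrominance map gives zero smoothness penalty, so both terms behave as regularizers that vanish at their targets.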
4. Experiment
4.1. Efficiency and Datasets
4.1.1. Efficiency
4.1.2. Datasets
4.2. Ablation Experiment
4.2.1. Loss Function Analysis
4.2.2. Subnetwork Module Analysis
4.3. Comparative Experiment
4.3.1. Qualitative Analysis
4.3.2. Quantitative Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Let There Be Color! ACM Trans. Graph. 2016, 35, 1–11.
- Zhang, R.; Isola, P.; Efros, A.A. Colorful Image Colorization. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 649–666.
- Larsson, G.; Maire, M.; Shakhnarovich, G. Learning Representations for Automatic Colorization. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 577–593.
- Lai, W.S.; Huang, J.B.; Wang, O.; Shechtman, E.; Yumer, E.; Yang, M.H. Learning Blind Video Temporal Consistency. arXiv 2018, arXiv:1808.00449v1.
- Zhao, J.; Liu, L.; Snoek, C.G.M.; Han, J.; Shao, L. Pixel-Level Semantics Guided Image Colorization. arXiv 2018, arXiv:1808.00672.
- Deshpande, A.; Rock, J.; Forsyth, D. Learning Large-Scale Automatic Image Colorization. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Zhang, B.; He, M.; Liao, J.; Sander, P.V.; Yuan, L.; Bermak, A.; Chen, D. Deep Exemplar-Based Video Colorization. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Irony, R.; Cohen-Or, D.; Lischinski, D. Colorization by Example. In Proceedings of the Eurographics Symposium on Rendering Techniques, Konstanz, Germany, 29 June–1 July 2005.
- Gupta, R.K.; Chia, A.Y.S.; Rajan, D.; Ng, E.S.; Huang, Z. Image Colorization Using Similar Images. In Proceedings of the 20th ACM International Conference on Multimedia, Nara, Japan, 29 October–2 November 2012.
- Zhao, H.; Wu, W.; Liu, Y.; He, D. Color2Embed: Fast Exemplar-Based Image Colorization Using Color Embeddings. arXiv 2021, arXiv:2106.08017.
- Levin, A.; Lischinski, D.; Weiss, Y. Colorization Using Optimization. ACM Trans. Graph. 2004, 23, 689–694.
- Cheng, Z.; Yang, Q.; Sheng, B. Deep Colorization. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Baldassarre, F.; Morín, D.G.; Rodés-Guirao, L. Deep Koalarization: Image Colorization using CNNs and Inception-ResNet. arXiv 2017, arXiv:1712.03400.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. GAN (Generative Adversarial Nets). J. Jpn. Soc. Fuzzy Theory Intell. Inform. 2017, 29, 177.
- Yi, Z.; Zhang, H.; Tan, P.; Gong, M. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Vitoria, P.; Raad, L.; Ballester, C. ChromaGAN: Adversarial Picture Colorization with Semantic Class Distribution. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020.
- Treneska, S.; Zdravevski, E.; Pires, I.M.; Lameski, P.; Gievska, S. GAN-Based Image Colorization for Self-Supervised Visual Feature Learning. Sensors 2022, 22, 1599.
- Zhang, L.; Liu, Y.; Wang, Z.; Yang, X. Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning. arXiv 2023, arXiv:2304.08947.
- Chen, H.; Yu, Q.; Wu, J.; Zhang, L. BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-based Video Colorization. arXiv 2022, arXiv:2212.02268.
- Li, X.; Sun, L.; Jiang, J.; Gao, X. DeepExemplar: Deep Exemplar-based Video Colorization. arXiv 2022, arXiv:2203.15797.
- Bonneel, N.; Tompkin, J.; Sunkavalli, K.; Sun, D.; Paris, S.; Pfister, H. Blind Video Temporal Consistency. ACM Trans. Graph. 2015, 34, 6.
- Lei, C.; Xing, Y.; Chen, Q. Blind Video Temporal Consistency via Deep Video Prior. In Proceedings of the Neural Information Processing Systems, Virtual, 6–12 December 2020.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Perazzi, F.; Pont-Tuset, J.; McWilliams, B.; Gool, L.V.; Gross, M.; Sorkine-Hornung, A. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 724–732.
- Lei, C.; Chen, Q. Fully Automatic Video Colorization With Self-Regularization and Diversity. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
- Kouzouglidis, P.; Sfikas, G.; Nikou, C. Automatic Video Colorization Using 3D Conditional Generative Adversarial Networks. arXiv 2019, arXiv:1905.03023v1.
- Zhao, Y.; Po, L.M.; Yu, W.Y.; Ur Rehman, Y.A.; Liu, M.; Zhang, Y.; Ou, W. VCGAN: Video Colorization with Hybrid Generative Adversarial Network. IEEE Trans. Multimed. 2023, 25, 3017–3032.
- Wan, Z.; Zhang, B.; Chen, D.; Liao, J. Bringing old films back to life. arXiv 2022, arXiv:2203.17276.
- Chen, S.; Li, X.; Zhang, X.; Wang, M.; Zhang, Y.; Han, J.; Zhang, Y. Exemplar-Based Video Colorization with Long-Term Spatiotemporal Dependency. arXiv 2023, arXiv:2303.15081.
- Iizuka, S.; Simo-Serra, E. DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement. ACM Trans. Graph. 2019, 38, 1–13.
- Zhao, Y.; Po, L.M.; Liu, K.; Wang, X.; Yu, W.Y.; Xian, P.; Zhang, Y.; Liu, M. SVCNet: Scribble-Based Video Colorization Network with Temporal Aggregation. arXiv 2023, arXiv:2303.11591.
- Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
Method | FPS (Frames per Second) | Average Process Time (ms)
---|---|---
Deep | 24.7638 | 399.29
SVC | 10.7024 | 937.60
Ours | 25.3438 | 389.03
Configuration | SSIM | PSNR | FID | LPIPS
---|---|---|---|---
Without perceptual loss | 0.957 | 29.33 | 71.82 | 0.18
Without KL divergence loss | 0.942 | 26.83 | 100.79 | 0.37
Without context loss | 0.955 | 28.14 | 69.66 | 0.15
Without smoothness loss | 0.948 | 28.06 | 80.75 | 0.23
Complete model | 0.977 | 31.80 | 46.04 | 0.10
Configuration | SSIM | PSNR | FID | LPIPS
---|---|---|---|---
VGG-19 | 0.976 | 24.02 | 33.76 | 0.72
No | 0.967 | 23.56 | 70.33 | 0.70
GAN | 0.959 | 25.94 | 68.60 | 0.71
Complete model | 0.977 | 31.80 | 46.04 | 0.10
Backbone | Top-1 | Top-5
---|---|---
ResNet-50 | 80.1% | 93.0%
VGG-19 | 75.5% | 92.4%
Method | SSIM | PSNR (dB) | FID | LPIPS
---|---|---|---|---
Deep | 0.971 | 29.3 | 60.55 | 0.15
SVC | 0.953 | 19.9 | 130.25 | 0.23
Ours (DAVIS) | 0.976 | 30.0 | 46.53 | 0.14
Ours (DAVIS + our videos) | 0.977 | 31.8 | 46.04 | 0.10
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, G.; Hong, X.; Liu, Y.; Qian, Y.; Cai, X. Video Colorization Based on Variational Autoencoder. Electronics 2024, 13, 2412. https://doi.org/10.3390/electronics13122412