Transform a Simple Sketch to a Chinese Painting by a Multiscale Deep Neural Network
Abstract
1. Introduction
- We propose a deep generative adversarial network that transforms simple input sketches into Chinese paintings.
- Because both the generative model and the discriminative model are fully convolutional networks, the two models can be trained on images at multiple scales.
- With an added edge detector, the generative model can also serve as a neural style transfer method.
- The proposed method is also effective for other image-to-image translation problems, such as image colorization and image super-resolution.
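Why fully convolutional networks can be trained on multiscale images: every layer is a sliding-window operation, so one set of weights applies to inputs of any size. The pure-Python sketch below illustrates this with a single channel and a single 3 × 3 kernel (stride 1, zero padding 1). It is an illustration only, not the authors' implementation; the helper `conv3x3_same` and the averaging kernel are hypothetical choices.

```python
from typing import List

Image = List[List[float]]

def conv3x3_same(image: Image, kernel: Image, bias: float = 0.0) -> Image:
    """3x3 cross-correlation (what deep learning libraries call convolution)
    with stride 1 and zero padding 1, so the output keeps the input's size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = bias
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # outside pixels act as zero padding
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out

# One shared set of weights (a hypothetical averaging kernel).
kernel = [[1 / 9] * 3 for _ in range(3)]

for size in (4, 8):  # the same weights applied at two input scales
    img = [[float((r + c) % 2) for c in range(size)] for r in range(size)]
    out = conv3x3_same(img, kernel)
    print(size, "->", len(out), "x", len(out[0]))
# prints "4 -> 4 x 4", then "8 -> 8 x 8"
```

Since no layer assumes a fixed input size, stacking such layers (as in the generator table) yields a network that accepts any image size.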
2. Related Work
2.1. Convolutional Networks
2.2. Generative Adversarial Nets
2.3. Sketch to Image and Style Transform
3. Method
3.1. Generative Adversarial Nets
3.2. Conditional Generative Adversarial Nets
3.3. Network Architecture
3.4. Loss Function
4. Experiments and Results
4.1. Training Details
4.2. Network Architecture Analysis
4.3. Compare to Pix2pix
4.4. Validity of Multiscale
4.5. Style Transfer
4.6. Time and Memory Usage
5. Other Applications
5.1. Image Colorization
Comparisons with Other Approaches
5.2. Image Super-Resolution
Comparisons with Other Approaches
6. Conclusions and Future Work
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Jing, Y.; Yang, Y.; Feng, Z.; Ye, J.; Song, M. Neural Style Transfer: A Review. arXiv, 2017; arXiv:1705.04058.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv, 2015; arXiv:1508.06576.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Shih, Y.; Paris, S.; Barnes, C.; Freeman, W.T.; Durand, F. Style Transfer for Headshot Portraits; Association for Computing Machinery (ACM): New York, NY, USA, 2014.
- Efros, A.A.; Freeman, W.T. Image quilting for texture synthesis and transfer. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 12–17 August 2001; ACM: New York, NY, USA, 2001; pp. 341–346.
- Wei, L.Y.; Levoy, M. Fast texture synthesis using tree-structured vector quantization. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; ACM Press/Addison-Wesley Publishing Co.: Boston, MA, USA, 2000; pp. 479–488.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates, Inc.: Red Hook, NY, USA, 2012; pp. 1097–1105.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv, 2015; arXiv:1511.06434.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin, Germany, 2016; pp. 694–711.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv, 2014; arXiv:1406.2661.
- Sangkloy, P.; Lu, J.; Fang, C.; Yu, F.; Hays, J. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. arXiv, 2016; arXiv:1612.00835.
- Güçlütürk, Y.; Güçlü, U.; van Lier, R.; van Gerven, M.A. Convolutional sketch inversion. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin, Germany, 2016; pp. 810–824.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Eitz, M.; Richter, R.; Hildebrand, K.; Boubekeur, T.; Alexa, M. Photosketcher: Interactive sketch-based image synthesis. IEEE Comput. Graph. Appl. 2011, 31, 56–66.
- Chen, T.; Cheng, M.M.; Tan, P.; Shamir, A.; Hu, S.M. Sketch2photo: Internet image montage. ACM Trans. Graph. 2009, 28.
- Shih, Y.; Paris, S.; Durand, F.; Freeman, W.T. Data-driven hallucination of different times of day from a single outdoor photo. ACM Trans. Graph. 2013, 32.
- Kwatra, V.; Schödl, A.; Essa, I.; Turk, G.; Bobick, A. Graphcut textures: Image and video synthesis using graph cuts. ACM Trans. Graph. 2003, 22, 277–286.
- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv, 2013; arXiv:1312.6114.
- Rezende, D.J.; Mohamed, S.; Wierstra, D. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1278–1286.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Curran Associates, Inc.: Red Hook, NY, USA, 2014; pp. 2672–2680.
- Van den Oord, A.; Kalchbrenner, N.; Espeholt, L.; Vinyals, O.; Graves, A. Conditional image generation with pixelcnn decoders. In Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 4790–4798.
- Hertzmann, A.; Jacobs, C.E.; Oliver, N.; Curless, B.; Salesin, D.H. Image analogies. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 12–17 August 2001; ACM: New York, NY, USA, 2001; pp. 327–340.
- Yan, X.; Yang, J.; Sohn, K.; Lee, H. Attribute2image: Conditional image generation from visual attributes. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin, Germany, 2016; pp. 776–791.
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification. ACM Trans. Graph. 2016, 35, 110:1–110:11.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 1–10.
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. arXiv, 2016; arXiv:1606.03657.
- Dixon, D.; Prasad, M.; Hammond, T. iCanDraw: Using sketch recognition and corrective feedback to assist a user in drawing human faces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Denver, CO, USA, 7–11 May 2010; ACM: New York, NY, USA, 2010; pp. 897–906.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. arXiv, 2016; arXiv:1611.07004.
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv, 2014; arXiv:1411.1784.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin, Germany, 2015; pp. 234–241.
- Zeiler, M.D.; Krishnan, D.; Taylor, G.W.; Fergus, R. Deconvolutional networks. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 2528–2535.
- Odena, A.; Dumoulin, V.; Olah, C. Deconvolution and Checkerboard Artifacts. 2016. Available online: http://distill.pub/2016/deconv-checkerboard/ (accessed on 15 July 2017).
- Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1395–1403.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv, 2016; arXiv:1603.04467.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Li, C.; Wand, M. Combining markov random fields and convolutional neural networks for image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2479–2486.
- Ulyanov, D.; Lebedev, V.; Vedaldi, A.; Lempitsky, V. Texture networks: Feed-forward synthesis of textures and stylized images. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016.
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; ACM: New York, NY, USA, 2010; pp. 270–279.
- Zhang, R.; Zhu, J.Y.; Isola, P.; Geng, X.; Lin, A.S.; Yu, T.; Efros, A.A. Real-Time User-Guided Image Colorization with Learned Deep Priors. ACM Trans. Graph. 2017, 36.
- Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873.
- Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; pp. 416–423.
- Chang, H.; Yeung, D.Y.; Xiong, Y. Super-resolution through neighbor embedding. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004.
- Glasner, D.; Bagon, S.; Irani, M. Super-resolution from a single image. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 349–356.
- Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730.
- Bevilacqua, M.; Roumy, A.; Guillemot, C.; Morel, A. Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Guildford, UK, 3–7 September 2012.
- Timofte, R.; De Smet, V.; Van Gool, L. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Proceedings of the Asian Conference on Computer Vision, Singapore, 1–5 November 2014; Springer: Berlin, Germany, 2014; pp. 111–126.
- Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin, Germany, 2014; pp. 184–199.
Generative Model

Name | Kernel Size | Stride | Pad | Norm | Activation |
---|---|---|---|---|---|
Conv1 | 3 × 3 × 3 × 64 | 1 | 1 | - | ReLU |
Conv2 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv3 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv4 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv5 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv6 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv7 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv8 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv9 | 3 × 3 × 64 × 64 | 1 | 1 | BN | ReLU |
Conv10 | 3 × 3 × 64 × 3 | 1 | 1 | - | ReLU |
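The generator table can be checked with a little arithmetic. The sketch below transcribes the ten layers, reading each kernel entry as height × width × input channels × output channels (the usual convention, assumed here). It counts convolution weights only (biases and BN scale/shift parameters are excluded because the table does not list them) and computes the receptive field and output size. The `ConvLayer` class and helper names are hypothetical; this is not the authors' code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ConvLayer:
    name: str
    k: int       # square kernel side
    c_in: int    # input channels
    c_out: int   # output channels
    stride: int
    pad: int

# Ten layers transcribed from the generative-model table.
GENERATOR: List[ConvLayer] = (
    [ConvLayer("Conv1", 3, 3, 64, 1, 1)]
    + [ConvLayer(f"Conv{i}", 3, 64, 64, 1, 1) for i in range(2, 10)]
    + [ConvLayer("Conv10", 3, 64, 3, 1, 1)]
)

def weight_count(layers: List[ConvLayer]) -> int:
    """Convolution weights only (k * k * C_in * C_out); no biases or BN parameters."""
    return sum(layer.k * layer.k * layer.c_in * layer.c_out for layer in layers)

def receptive_field(layers: List[ConvLayer]) -> int:
    """Side length, in input pixels, of the window that one output pixel depends on."""
    rf, jump = 1, 1
    for layer in layers:
        rf += (layer.k - 1) * jump
        jump *= layer.stride
    return rf

def output_size(n: int, layers: List[ConvLayer]) -> int:
    """Apply floor((n + 2p - k) / s) + 1 layer by layer."""
    for layer in layers:
        n = (n + 2 * layer.pad - layer.k) // layer.stride + 1
    return n

print(weight_count(GENERATOR))      # 298368
print(receptive_field(GENERATOR))   # 21
print(output_size(256, GENERATOR))  # 256
```

With stride 1 and padding 1 everywhere, each 3 × 3 layer widens the receptive field by 2 pixels and leaves the spatial size unchanged, so the output image has the same size as the input sketch.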
Discriminative Model

Name | Kernel Size | Stride | Pad | Norm | Activation |
---|---|---|---|---|---|
Conv1 | 4 × 4 × 3 × 64 | 2 | 1 | - | Leaky ReLU |
Conv2 | 4 × 4 × 64 × 128 | 2 | 1 | BN | Leaky ReLU |
Conv3 | 4 × 4 × 128 × 256 | 2 | 1 | BN | Leaky ReLU |
Conv4 | 4 × 4 × 256 × 512 | 2 | 1 | BN | Leaky ReLU |
Conv5 | 4 × 4 × 512 × 512 | 2 | 1 | BN | Leaky ReLU |
Conv6 | 4 × 4 × 512 × 512 | 2 | 1 | BN | Leaky ReLU |
Conv7 | 4 × 4 × 512 × 1 | 2 | 1 | BN | - |
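Because the discriminator is also fully convolutional, its output is a spatial map of scores whose size depends on the input. The sketch below applies the standard output-size formula, floor((n + 2p − k)/s) + 1, to the seven layers of the table, reading each row as kernel 4, stride 2, pad 1. The square input sizes are illustrative choices, not values taken from the paper, and the helper names are hypothetical.

```python
from typing import List, Tuple

# (kernel, stride, pad) for Conv1..Conv7, transcribed from the discriminator table.
DISCRIMINATOR: List[Tuple[int, int, int]] = [(4, 2, 1)] * 7

def output_size(n: int, layers: List[Tuple[int, int, int]]) -> int:
    """Spatial side length after each layer: floor((n + 2p - k) / s) + 1."""
    for k, s, p in layers:
        n = (n + 2 * p - k) // s + 1
    return n

def receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Side length, in input pixels, of the window that one score depends on."""
    rf, jump = 1, 1
    for k, s, _ in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

for side in (128, 256, 512):  # illustrative input sizes
    m = output_size(side, DISCRIMINATOR)
    print(f"{side}x{side} input -> {m}x{m} score map")
print("receptive field:", receptive_field(DISCRIMINATOR))
```

Each stride-2 layer halves the side length, so seven layers shrink an n × n input to n/128 × n/128 scores (for n a multiple of 128). The score map simply grows with the input, which is consistent with training the same discriminator on multiscale images.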

Length of Generator (layers) | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|
PSNR (dB) | 30.829 | 30.832 | 30.837 | 30.846 | 30.842 | 30.852 | 30.856 |
Time | 0.023 s | 0.027 s | 0.029 s | 0.032 s | 0.035 s | 0.039 s | 0.044 s |
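The PSNR values in these tables follow the usual definition, PSNR = 10 log10(MAX² / MSE), measured in decibels. The paper's exact evaluation settings (color space, data range, border cropping) are not stated here, so the sketch below assumes 8-bit images (MAX = 255) flattened into plain lists, and averages the squared error over all values. The function name `psnr` is our own.

```python
import math
from typing import Sequence

def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images of equal size."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

ref = [52.0, 55.0, 61.0, 59.0]
out = [53.0, 54.0, 62.0, 58.0]  # every value off by 1, so MSE = 1
print(round(psnr(ref, out), 2))  # 48.13
```

Because PSNR is logarithmic in the MSE, a difference of 0.01 dB corresponds to roughly a 0.23% change in mean squared error.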

Method (PSNR in dB/SSIM) | Images in Columns 1–3 | Images in Columns 4–6 | Images in Columns 7–9 |
---|---|---|---|
Pix2Pix | 28.38/0.4534 | 28.08/0.3421 | 28.85/0.5312 |
Ours | 30.95/0.7282 | 28.76/0.5854 | 31.84/0.6863 |
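SSIM compares the means, variances, and covariance of two images. Standard SSIM computes this over local windows (typically 11 × 11 Gaussian windows) and averages the results; for brevity, the sketch below uses a single global window over the whole image. That is a simplification and will not reproduce the table's numbers. The constants follow the common choice K1 = 0.01 and K2 = 0.03, assumed here with an 8-bit data range; `global_ssim` is a hypothetical helper.

```python
from typing import Sequence

def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM between two flattened grayscale images of equal size."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure

img = [10.0, 50.0, 200.0, 90.0]
print(global_ssim(img, img))                   # close to 1.0 for identical images
print(global_ssim(img, [v + 5 for v in img]))  # slightly below 1: brightness shift only
```

Unlike PSNR, SSIM can stay high under a uniform brightness shift and drop sharply when structure changes, which is why the two metrics are reported side by side.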

| Image Size | Gatys et al. [2] | Ulyanov et al. [39] | Ours |
|---|---|---|---|
| | 1.542 s | 0.004 s | 0.002 s |
| | 6.483 s | 0.015 s | 0.008 s |
| | 25.23 s | 0.051 s | 0.032 s |
| | 106.2 s | 0.212 s | 0.122 s |

Dataset | Scale | Bicubic (PSNR/SSIM) | A+ (PSNR/SSIM) | SelfEx (PSNR/SSIM) | SRCNN (PSNR/SSIM) | Ours (PSNR/SSIM) |
---|---|---|---|---|---|---|
Set5 | ×2 | 33.66/0.9299 | 36.54/0.9544 | 36.54/0.9537 | 36.49/0.9537 | 36.66/0.9542 |
Set5 | ×3 | 30.39/0.8682 | 32.58/0.9088 | 32.43/0.9057 | 32.58/0.9093 | 32.75/0.9090 |
Set5 | ×4 | 28.42/0.8104 | 30.28/0.8603 | 30.14/0.8548 | 30.31/0.8619 | 30.48/0.8628 |
Set14 | ×2 | 30.24/0.8688 | 32.28/0.9056 | 32.26/0.9040 | 32.22/0.9034 | 32.42/0.9063 |
Set14 | ×3 | 27.55/0.7742 | 29.13/0.8188 | 29.05/0.8164 | 29.16/0.8196 | 29.28/0.8209 |
Set14 | ×4 | 26.00/0.7027 | 27.32/0.7491 | 27.24/0.7451 | 27.40/0.7518 | 27.49/0.7503 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lin, D.; Wang, Y.; Xu, G.; Li, J.; Fu, K. Transform a Simple Sketch to a Chinese Painting by a Multiscale Deep Neural Network. Algorithms 2018, 11, 4. https://doi.org/10.3390/a11010004
Lin D, Wang Y, Xu G, Li J, Fu K. Transform a Simple Sketch to a Chinese Painting by a Multiscale Deep Neural Network. Algorithms. 2018; 11(1):4. https://doi.org/10.3390/a11010004
Chicago/Turabian Style: Lin, Daoyu, Yang Wang, Guangluan Xu, Jun Li, and Kun Fu. 2018. "Transform a Simple Sketch to a Chinese Painting by a Multiscale Deep Neural Network" Algorithms 11, no. 1: 4. https://doi.org/10.3390/a11010004
APA Style: Lin, D., Wang, Y., Xu, G., Li, J., & Fu, K. (2018). Transform a Simple Sketch to a Chinese Painting by a Multiscale Deep Neural Network. Algorithms, 11(1), 4. https://doi.org/10.3390/a11010004