Building Facade-Completion Network Based on Dynamic Convolutional GAN
Abstract
1. Introduction
1. We propose a novel facade-completion network based on dynamic convolution that can be trained with fewer model parameters, reducing the risk of overfitting. The network uses a global receptive field to process image features, ensuring the relevance between texture and semantic structural features in both the completion area and the global area of the building facade (a sketch of such a layer appears under Section 3.1).
2. We introduce a spatial attention branch that enhances the feature representation of the missing parts by strengthening the features of the edge region of the mask and weakening the background features, effectively improving the local structural correlation between the completion area and the surrounding area in building facades (see the sketch under Section 3.2).
2. Related Work
2.1. Image Completion for Facade
2.2. Image-Completion Algorithm
3. Method
3.1. Dynamic Convolution with Dynamic Channel Fusion
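As a concrete illustration of the dynamic convolution described in the contributions, below is a minimal PyTorch sketch of a 1×1 dynamic convolution with dynamic channel fusion in the spirit of Li et al. [32]: the per-sample weight is W(x) = W0 + PΦ(x)Qᵀ, where Φ(x) is a small L×L fusion matrix predicted from globally pooled features. The latent size L, the initialisation, and the squeeze-style predictor are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DynamicChannelFusionConv(nn.Module):
    """Sketch of a 1x1 dynamic convolution with dynamic channel fusion:
    W(x) = W0 + P @ Phi(x) @ Q^T, with Phi(x) an input-dependent L x L matrix
    (after Li et al. [32]; sizes and predictor here are illustrative)."""

    def __init__(self, channels: int, latent: int = 8):
        super().__init__()
        self.latent = latent
        self.static = nn.Conv2d(channels, channels, kernel_size=1)   # static weight W0
        self.P = nn.Parameter(torch.randn(channels, latent) * 0.01)  # column basis
        self.Q = nn.Parameter(torch.randn(channels, latent) * 0.01)  # row basis
        # squeeze-style predictor: global average pool -> the L*L entries of Phi(x)
        self.phi = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, latent * latent),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.size(0)
        phi = self.phi(x).view(b, self.latent, self.latent)        # Phi(x) per sample
        # dynamic residual weight P Phi(x) Q^T: one C x C channel-mixing matrix per sample
        dyn = torch.einsum('cl,blm,dm->bcd', self.P, phi, self.Q)
        # static path W0 x plus dynamic path (P Phi(x) Q^T) x
        return self.static(x) + torch.einsum('bcd,bdhw->bchw', dyn, x)

# e.g. DynamicChannelFusionConv(64)(torch.randn(2, 64, 32, 32)) -> (2, 64, 32, 32)
```

Only P, Q, and the small predictor are learned on top of the static kernel, so the layer adapts its weights per input while adding few parameters, which matches the stated goal of training with fewer model parameters.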
3.2. Spatial Attention Branch
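A hedged sketch of a spatial attention branch of the kind the contributions describe: the mask is dilated and the original mask subtracted to isolate a thin band along the mask edge, a per-pixel gate is predicted from the features and that band, and gated locations are amplified. The paper's actual branch may be structured differently; all names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionBranch(nn.Module):
    """Illustrative spatial attention branch: strengthen features near the
    mask edge and leave the background largely untouched (hypothetical design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels + 1, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel gate in (0, 1)
        )

    def forward(self, feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # mask: (B, 1, H, W), 1 inside the missing region, resized to feat's size.
        # Dilating the mask (max-pool) and subtracting it leaves a band along its edge.
        edge = F.max_pool2d(mask, kernel_size=5, stride=1, padding=2) - mask
        attn = self.gate(torch.cat([feat, edge], dim=1))
        return feat * (1.0 + attn)  # boost attended (edge-region) features
```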
3.3. Network Structure
3.4. Loss Functions
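The extract does not spell out the individual loss terms, so the sketch below shows only a combination that is common in GAN-based inpainting, as an assumption rather than this paper's actual objective: masked L1 reconstruction on the hole and valid regions plus a hinge adversarial loss (perceptual or feature-matching terms are often added).

```python
import torch
import torch.nn.functional as F

def generator_loss(pred: torch.Tensor, gt: torch.Tensor, mask: torch.Tensor,
                   d_fake: torch.Tensor, l1_weight: float = 1.0,
                   adv_weight: float = 0.1) -> torch.Tensor:
    """Illustrative inpainting objective (not necessarily this paper's):
    L1 on the hole and the known region plus a hinge adversarial term."""
    l1_hole = F.l1_loss(pred * mask, gt * mask)                # missing region
    l1_valid = F.l1_loss(pred * (1 - mask), gt * (1 - mask))   # known region
    adv = -d_fake.mean()                                       # hinge generator loss
    return l1_weight * (l1_hole + l1_valid) + adv_weight * adv

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Hinge loss for the discriminator."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()
```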
4. Experiment
4.1. Experimental Settings
- MAE: Mean Absolute Error (MAE) between predicted values and ground truth roughly reflects how well the model reconstructs the content of the original image; the smaller the MAE, the better the generated result.
- PSNR: Peak Signal-to-Noise Ratio (PSNR) is a widely used objective image-quality metric that measures the error between corresponding pixels of two images; the higher the PSNR, the less the distortion and the better the performance (both MAE and PSNR are computed as in the sketch after this list).
- FID: Fréchet Inception Distance [41] measures the similarity between two sets of images by extracting visual features with the Inception v3 [42] classification model and computing the distance between the feature distributions of real and generated images; lower values indicate better results.
- LPIPS: LPIPS [43] measures the perceptual similarity between images using a pre-trained VGG or AlexNet; a smaller value indicates higher similarity.
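MAE and PSNR follow directly from their definitions; the minimal NumPy sketch below assumes float images scaled to [0, 1]. FID and LPIPS, by contrast, depend on pretrained networks (Inception v3 [42]; VGG or AlexNet [43]) and are typically computed with library implementations.

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Error over all pixels; lower is better."""
    return float(np.mean(np.abs(pred - gt)))

def psnr(pred: np.ndarray, gt: np.ndarray, peak: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    mse = float(np.mean((pred - gt) ** 2))
    if mse == 0.0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```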
4.2. Comparison with Existing Work
4.3. Ablation Study
4.4. Experiments with Other Types of Data
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yu, B.; Hu, J.; Dong, X.; Dai, K.; Xiao, D.; Zhang, B.; Wu, T.; Hu, Y.; Wang, B. A Robust Automatic Method to Extract Building Facade Maps from 3D Point Cloud Data. Remote Sens. 2022, 14, 3848. [Google Scholar] [CrossRef]
- Wang, B.; Zhang, J.; Zhang, R.; Li, Y.; Li, L.; Nakashima, Y. Improving facade parsing with vision transformers and line integration. Adv. Eng. Inform. 2024, 60, 102463. [Google Scholar] [CrossRef]
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative Image Inpainting With Contextual Attention. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; Computer Vision Foundation/IEEE Computer Society: Washington, DC, USA, 2018; pp. 5505–5514. [Google Scholar] [CrossRef]
- Bertalmío, M.; Bertozzi, A.L.; Sapiro, G. Navier-Stokes, Fluid Dynamics, and Image and Video Inpainting. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), with CD-ROM, Kauai, HI, USA, 8–14 December 2001; IEEE Computer Society: Washington, DC, USA, 2001; pp. 355–362. [Google Scholar] [CrossRef]
- Dai, D.; Riemenschneider, H.; Schmitt, G.; Gool, L.V. Example-Based Facade Texture Synthesis. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, 1–8 December 2013; IEEE Computer Society: Washington, DC, USA, 2013; pp. 1065–1072. [Google Scholar] [CrossRef]
- Huang, J.; Kang, S.B.; Ahuja, N.; Kopf, J. Image completion using planar structure guidance. ACM Trans. Graph. 2014, 33, 1–10. [Google Scholar] [CrossRef]
- Kottler, B.; Bulatov, D.; Zhang, X. Context-aware Patch-based Method for Façade Inpainting. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2020, Volume 1: GRAPP, Valletta, Malta, 27–29 February 2020; Bouatouch, K., de Sousa, A.A., Braz, J., Eds.; SCITEPRESS: Setúbal, Portugal, 2020; pp. 210–218. [Google Scholar] [CrossRef]
- Hensel, S.; Goebbels, S.; Kada, M. LSTM Architectures for Facade Structure Completion. In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021, Volume 1: GRAPP, Online Streaming, 8–10 February 2021; de Sousa, A.A., Havran, V., Braz, J., Bouatouch, K., Eds.; SCITEPRESS: Setúbal, Portugal, 2021; pp. 15–24. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Shao, X.; Qiang, Z.; Dai, F.; He, L.; Lin, H. Face Image Completion Based on GAN Prior. Electronics 2022, 11, 1997. [Google Scholar] [CrossRef]
- Jin, X.; Chen, Z.; Lin, J.; Zhou, W.; Chen, J.; Shan, C. AI-GAN: Signal de-interference via asynchronous interactive generative adversarial network. In Proceedings of the 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shanghai, China, 8–12 July 2019; pp. 228–233. [Google Scholar]
- Jin, X.; Chen, Z.; Lin, J.; Chen, Z.; Zhou, W. Unsupervised single image deraining with self-supervised constraints. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2761–2765. [Google Scholar]
- Jin, X.; Chen, Z.; Li, W. AI-GAN: Asynchronous interactive generative adversarial network for single image rain removal. Pattern Recognit. 2020, 100, 107143. [Google Scholar] [CrossRef]
- Georgiou, Y.; Loizou, M.; Kelly, T.; Averkiou, M. FacadeNet: Conditional Facade Synthesis via Selective Editing. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 5384–5393. [Google Scholar]
- Zhang, J.; Fukuda, T.; Yabuki, N. Automatic Object Removal with Obstructed Façades Completion Using Semantic Segmentation and Generative Adversarial Inpainting. IEEE Access 2021, 9, 117486–117495. [Google Scholar] [CrossRef]
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Free-Form Image Inpainting With Gated Convolution. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4470–4479. [Google Scholar] [CrossRef]
- Kottler, B.; List, L.; Bulatov, D.; Weinmann, M. 3GAN: A Three-GAN-based Approach for Image Inpainting Applied to the Reconstruction of Occluded Parts of Building Walls. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2022, Volume 4: VISAPP, Online Streaming, 6–8 February 2022; Farinella, G.M., Radeva, P., Bouatouch, K., Eds.; SCITEPRESS: Setúbal, Portugal, 2022; pp. 427–435. [Google Scholar] [CrossRef]
- Bertalmio, M.; Vese, L.; Sapiro, G.; Osher, S. Simultaneous structure and texture image inpainting. IEEE Trans. Image Process. 2003, 12, 882–889. [Google Scholar] [CrossRef]
- Levin, A.; Zomet, A.; Weiss, Y. Learning How to Inpaint from Global Image Statistics. In Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), Nice, France, 14–17 October 2003; IEEE Computer Society: Washington, DC, USA, 2003; pp. 305–312. [Google Scholar] [CrossRef]
- Weickert, J. Coherence-Enhancing Diffusion Filtering. Int. J. Comput. Vis. 1999, 31, 111–127. [Google Scholar] [CrossRef]
- Barnes, C.; Shechtman, E.; Finkelstein, A.; Goldman, D.B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 2009, 28, 24. [Google Scholar] [CrossRef]
- Sun, J.; Yuan, L.; Jia, J.; Shum, H. Image completion with structure propagation. ACM Trans. Graph. 2005, 24, 861–868. [Google Scholar] [CrossRef]
- Criminisi, A.; Pérez, P.; Toyama, K. Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 2004, 13, 1200–1212. [Google Scholar] [CrossRef]
- Pathak, D.; Krähenbühl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature Learning by Inpainting. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Washington, DC, USA, 2016; pp. 2536–2544. [Google Scholar] [CrossRef]
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. 2017, 36, 1–14. [Google Scholar] [CrossRef]
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.; Tao, A.; Catanzaro, B. Image Inpainting for Irregular Holes Using Partial Convolutions. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018; Proceedings, Part XI. Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2018; Volume 11215, pp. 89–105. [Google Scholar] [CrossRef]
- Zeng, Y.; Lin, Z.; Lu, H.; Patel, V.M. CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 14144–14153. [Google Scholar] [CrossRef]
- Zeng, Y.; Fu, J.; Chao, H.; Guo, B. Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 1486–1494. [Google Scholar] [CrossRef]
- Hui, Z.; Li, J.; Wang, X.; Gao, X. Image Fine-grained Inpainting. arXiv 2020, arXiv:2002.02609. [Google Scholar]
- Yang, B.; Bender, G.; Le, Q.V.; Ngiam, J. CondConv: Conditionally Parameterized Convolutions for Efficient Inference. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R., Eds.; pp. 1305–1316. [Google Scholar]
- Chen, J.; Wang, X.; Guo, Z.; Zhang, X.; Sun, J. Dynamic Region-Aware Convolution. arXiv 2020, arXiv:2003.12243. [Google Scholar]
- Li, Y.; Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yu, Y.; Yuan, L.; Liu, Z.; Chen, M.; Vasconcelos, N. Revisiting Dynamic Convolution via Matrix Decomposition. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, 3–7 May 2021. [Google Scholar]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
- Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.; Lempitsky, V. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, 3–8 January 2022; pp. 3172–3182. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015—18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III. Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241. [Google Scholar] [CrossRef]
- Tylecek, R.; Sára, R. Spatial Pattern Templates for Recognition of Objects with Regular Structure. In Proceedings of the Pattern Recognition—35th German Conference, GCPR 2013, Saarbrücken, Germany, 3–6 September 2013; Proceedings. Weickert, J., Hein, M., Schiele, B., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2013; Volume 8142, pp. 364–374. [Google Scholar] [CrossRef]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1452–1464. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Zhu, D.; Li, J.; Wang, F.; Gong, X.; Cong, W.; Wang, P.; Liu, Y. A Method for Extracting Contours of Building Facade Hollowing Defects Using Polarization Thermal Images Based on Improved Canny Algorithm. Buildings 2023, 13, 2563. [Google Scholar] [CrossRef]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; Computer Vision Foundation/IEEE Computer Society: Washington, DC, USA, 2018; pp. 586–595. [Google Scholar] [CrossRef]
- Kaup, A.; Meisinger, K.; Aach, T. Frequency selective signal extrapolation with applications to error concealment in image communication. AEU-Int. J. Electron. Commun. 2005, 59, 147–156. [Google Scholar] [CrossRef]
- Telea, A.C. An Image Inpainting Technique Based on the Fast Marching Method. J. Graph. GPU Game Tools 2004, 9, 23–34. [Google Scholar] [CrossRef]
- Ren, Y.; Yu, X.; Zhang, R.; Li, T.H.; Liu, S.; Li, G. StructureFlow: Image Inpainting via Structure-Aware Appearance Flow. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 181–190. [Google Scholar] [CrossRef]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- Xu, S.; Zhang, J.; Li, Y. Knowledge-Driven and Diffusion Model-Based Methods for Generating Historical Building Facades: A Case Study of Traditional Minnan Residences in China. Information 2024, 15, 344. [Google Scholar] [CrossRef]
Comparison with existing methods at mask ratios of 10–20%, 20–40%, and 40–60%:

| Method | PSNR (↑) 10–20% | PSNR (↑) 20–40% | PSNR (↑) 40–60% | SSIM (↑) 10–20% | SSIM (↑) 20–40% | SSIM (↑) 40–60% | LPIPS (↓) 10–20% | LPIPS (↓) 20–40% | LPIPS (↓) 40–60% |
|---|---|---|---|---|---|---|---|---|---|
| FRS [44] | 32.7418 | 29.2348 | 10.0350 | 0.9708 | 0.9487 | 0.1583 | 0.5759 | 0.5806 | 0.6654 |
| NS [4] | 31.6565 | 27.7897 | 20.8385 | 0.9668 | 0.9393 | 0.7641 | 0.0357 | 0.0623 | 0.2565 |
| Telea [45] | 32.9517 | 27.7469 | 20.8646 | 0.9789 | 0.9378 | 0.7623 | 0.0205 | 0.0641 | 0.2632 |
| PENNet [28] | 26.6155 | 28.6404 | 21.3096 | 0.9395 | 0.8835 | 0.7598 | 0.0453 | 0.0890 | 0.1880 |
| DeepFillV2 [16] | 32.7418 | 29.2348 | 21.4314 | 0.9708 | 0.9486 | 0.7916 | 0.0226 | 0.0367 | 0.1463 |
| Cr-fill [27] | 32.0074 | 28.8477 | 21.5734 | 0.9674 | 0.9458 | 0.7929 | 0.0253 | 0.0398 | 0.1464 |
| Lama [34] | − | − | 23.0938 | − | − | 0.8036 | − | − | 0.1326 |
| Ours | 33.5893 | 29.7769 | 22.5839 | 0.9733 | 0.9515 | 0.8098 | 0.0184 | 0.0306 | 0.1188 |
Results on the Paris StreetView and CMP Facade (semantic-label) datasets:

| Algorithm | Paris MAE (↓) | Paris PSNR (↑) | Paris SSIM (↑) | CMP MAE (↓) | CMP PSNR (↑) | CMP SSIM (↑) |
|---|---|---|---|---|---|---|
| PENNet [28] | 0.0402 | 24.3106 | 0.7858 | − | − | − |
| Lama [34] | 0.0417 | 23.7953 | 0.8387 | 0.1303 | 13.6811 | 0.5199 |
| DeepFillV2 [16] | 0.0327 | 24.8073 | 0.8237 | 0.1387 | 10.4645 | 0.7494 |
| Cr-fill [27] | 0.0331 | 24.9039 | 0.8183 | 0.0936 | 14.0850 | 0.6613 |
| Ours | 0.0292 | 26.2195 | 0.8387 | 0.0387 | 18.2277 | 0.8848 |
Ablation on the dynamic convolution block:

| Method | MAE (↓) | PSNR (↑) | LPIPS (↓) | FID (↓) |
|---|---|---|---|---|
| Baseline [46] | 0.0362 | 22.3843 | 0.1203 | 22.2626 |
| Baseline + CondConv [30] | 0.0356 | 22.3831 | 0.1139 | 16.7377 |
| Baseline + DRConv [31] | 0.0357 | 22.3311 | 0.1167 | 14.4352 |
| Baseline + Dconv-block (Ours) | 0.0355 | 22.3891 | 0.1132 | 13.4204 |
Ablation on the spatial attention (SA) branch:

| Method | MAE (↓) | PSNR (↑) | LPIPS (↓) | FID (↓) |
|---|---|---|---|---|
| Baseline [46] | 0.0362 | 22.3843 | 0.1203 | 22.2626 |
| Ours (without SA) | 0.0363 | 22.3385 | 0.1138 | 14.9955 |
| Ours | 0.0355 | 22.3891 | 0.1132 | 13.4204 |
Results on other types of data (CelebA, Places2):

| Dataset | Method | MAE (↓) | PSNR (↑) | SSIM (↑) | LPIPS (↓) | FID (↓) |
|---|---|---|---|---|---|---|
| CelebA | DeepFillV2 [16] | 0.0364 | 27.5051 | 0.8702 | 0.4048 | 16.6591 |
| CelebA | Cr-fill [27] | 0.0417 | 26.0711 | 0.8697 | 0.3179 | 22.3520 |
| CelebA | Ours | 0.0359 | 28.5568 | 0.8764 | 0.2184 | 11.3311 |
| Places2 | DeepFillV2 [16] | 0.0412 | 24.7671 | 0.8114 | 0.9747 | 66.0904 |
| Places2 | Cr-fill [27] | 0.0458 | 24.1457 | 0.8008 | 0.9707 | 70.0322 |
| Places2 | Ours | 0.0376 | 24.3606 | 0.7941 | 0.5730 | 49.1062 |