Bridging Formal Shape Models and Deep Learning: A Novel Fusion for Understanding 3D Objects
Abstract
1. Introduction
- Shape estimates are guaranteed to satisfy complex geometric shape and physical constraints, including self-symmetry, self-similarity, and free-standing stability properties.
- Shape estimates are guaranteed to satisfy important geometric model properties by providing water-tight, i.e., manifold, polygon models that require a small number of triangle primitives to describe the basic object geometry.
- Shape estimates provide a highly compact parametric representation of objects, allowing objects to be efficiently shared over communication links.
- User-provided shape programs allow human-in-the-loop control over DL estimates. This control includes specifying the list of candidate objects, the shape variations that each object can exhibit, and the level of detail or, equivalently, the dimension of the latent representation of the shape. These aspects make DL estimate outputs easier for humans to control and to interpret, which we collectively refer to as “human-in-the-loop” benefits.
- Users can control the complexity and diversity of DL-estimated shapes for each object and for each object component directly through the construction of the DL network.
- Object models can be used to synthesize training data for DL systems, improving over current 3D model databases, which use static 3D models and therefore lack geometric diversity. Object models can be combined to generate extremely large synthetic 2D/3D datasets that have rich geometric diversity and include important annotations to support a wide variety of 3D and 2D DL applications.
- An example of the proposed DL fusion is provided that detects objects and estimates their parametric representation given a PSML shape grammar. Key metrics for the model estimates demonstrate the benefits of this approach.
2. Related Work
- An overview of shape grammar and its applications.
- An examination of deep learning generative models of 3D shapes.
2.1. Shape Grammar
2.2. Generative Models of 3D Shapes
3. Methodology
- An introduction to the Procedural Shape Modeling Language (PSML) that incorporates shape grammar programs as elements within the sequential programming code (Section 3.1).
- A discussion of the benefits offered by the PSML data generation approach (Section 3.2).
- A fused system of PSML and DL that takes a point cloud as input and generates 3D shape estimates of objects (Section 3.3).
- An application of the PSML-driven method to synthetic dataset generation (Section 3.4).
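
The data-synthesis step listed above can be illustrated with a minimal sketch. It assumes a box-like table described by three parameters and uses Gaussian jitter as a stand-in for a real sensor model; the parameter ranges, the `sample_scene_object` and `make_dataset` names, and the noise level are all hypothetical and are not taken from the PSML implementation.

```python
import random

# Hypothetical parameter ranges (metres) for a box-like table; not from the paper.
RANGES = {"length": (0.8, 2.0), "width": (0.5, 1.2), "height": (0.6, 1.1)}


def sample_scene_object(rng, n_points=256, noise_std=0.005):
    """Draw one random table and return its points together with its annotations."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    cx, cy = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
    # Points on the table top, jittered by Gaussian noise (assumed sensor stand-in).
    points = [
        (
            cx + rng.uniform(-0.5, 0.5) * params["length"] + rng.gauss(0.0, noise_std),
            cy + rng.uniform(-0.5, 0.5) * params["width"] + rng.gauss(0.0, noise_std),
            params["height"] + rng.gauss(0.0, noise_std),
        )
        for _ in range(n_points)
    ]
    # Axis-aligned 3D bounding box annotation derived directly from the parameters.
    bbox = {
        "center": (cx, cy, params["height"] / 2.0),
        "size": (params["length"], params["width"], params["height"]),
    }
    return {"label": "table", "params": params, "bbox": bbox, "points": points}


def make_dataset(n_samples, seed=0):
    """Generate a reproducible list of annotated samples."""
    rng = random.Random(seed)
    return [sample_scene_object(rng) for _ in range(n_samples)]
```

Because every sample is produced from its parameters, the shape parameters and bounding box are available as exact annotations without manual labeling, and a fixed seed makes the dataset reproducible.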
3.1. Procedural Shape Modeling Language (PSML)
Algorithm 1: The contents of the PSML program Table.psm
3.2. Benefits of PSML-Driven Data Generation
- Enforced geometric constraints and physical constraints;
- Manifold polygon models;
- Compact parametric representation;
- Unlimited semantic variability;
- Human-in-the-loop control and interpretability of DL estimates.
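
To make the first three benefits concrete, here is a minimal sketch written in plain Python rather than PSML. It assumes a table built from five boxes driven by five hypothetical parameters; `box_mesh`, `table_components`, and `is_watertight` are illustrative names. Leg symmetry holds by construction, and each component is checked for the watertightness condition that every edge is shared by exactly two triangles.

```python
from collections import Counter
from itertools import product

# Quad faces of a box; vertex index = 4*i + 2*j + k for corner (i, j, k) in {0, 1}^3.
BOX_QUADS = [(0, 1, 3, 2), (4, 6, 7, 5), (0, 4, 5, 1),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 5, 7, 3)]


def box_mesh(center, size):
    """Return (vertices, triangles) for an axis-aligned box made of 12 triangles."""
    verts = [
        tuple(c + (bit - 0.5) * s for c, bit, s in zip(center, corner, size))
        for corner in product((0, 1), repeat=3)
    ]
    tris = []
    for a, b, c, d in BOX_QUADS:
        tris += [(a, b, c), (a, c, d)]  # split each quad along one diagonal
    return verts, tris


def table_components(length, width, height, top_t, leg_w):
    """Build a table top and four legs from five parameters (hypothetical model)."""
    leg_h = height - top_t
    dx, dy = (length - leg_w) / 2.0, (width - leg_w) / 2.0
    top = box_mesh((0.0, 0.0, height - top_t / 2.0), (length, width, top_t))
    # Legs are placed at (+/-dx, +/-dy), so mirror symmetry cannot be violated.
    legs = [
        box_mesh((sx * dx, sy * dy, leg_h / 2.0), (leg_w, leg_w, leg_h))
        for sx in (-1, 1) for sy in (-1, 1)
    ]
    return [top] + legs


def is_watertight(tris):
    """True if every undirected edge is shared by exactly two triangles."""
    edges = Counter()
    for a, b, c in tris:
        for u, v in ((a, b), (b, c), (c, a)):
            edges[frozenset((u, v))] += 1
    return all(count == 2 for count in edges.values())
```

In this sketch five numbers expand into 60 triangles, which illustrates why a parametric description can be much smaller than the geometry it produces. The edge-count test is a necessary condition for a closed surface, not a complete manifold check.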
3.2.1. Geometric and Physical Constraints
Algorithm 2: The contents of the PSML program Door.psm
3.2.2. Manifold Polygon Models
3.2.3. Compact Parametric Representation
3.2.4. Unlimited Semantic Variability
3.2.5. Human-In-The-Loop Control and Interpretability
3.3. Fusion of PSML and Deep Learning
- A new Multi-Layer Perceptron (MLP) is added to the existing architecture for PSML parameter estimation.
- The PSML parameters are encoded as an additional attribute of the 3D bounding boxes for prediction.
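
The two modifications above can be sketched as follows, assuming a detector that produces one embedding vector per candidate box. The layer sizes, the five-parameter output, the sigmoid normalization, and the names `init_mlp`, `mlp_forward`, and `predict_boxes` are assumptions made for illustration; they do not describe the authors' network.

```python
import math
import random


def init_mlp(sizes, rng):
    """Create randomly initialised (weights, bias) pairs for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Apply the layers in order, with ReLU between layers and a linear output."""
    for index, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def predict_boxes(query_embeddings, box_head, param_head):
    """Attach PSML-style parameters to each predicted box as an extra attribute."""
    predictions = []
    for query in query_embeddings:
        box = mlp_forward(box_head, query)  # assumed layout: centre (3) + size (3)
        # Sigmoid keeps each parameter in (0, 1); this normalisation is an assumption.
        params = [1.0 / (1.0 + math.exp(-v)) for v in mlp_forward(param_head, query)]
        predictions.append({"box": box, "psml_params": params})
    return predictions
```

Both heads read the same per-box embedding, so the shape parameters are predicted alongside the box rather than by a separate network.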
3.4. Data Synthesis for DL Systems
3.4.1. Scene Data Generation
3.4.2. Sensor Data Generation
3.4.3. Synthetic Dataset
4. Results
4.1. Comparison with Other Generative Methods
4.2. Comparison with Other Data Representations
4.3. PSML-DL Fusion for Shape Detection Task
4.3.1. Synthetic Dataset Generation
4.3.2. Three-Dimensional Object Detection and Shape Estimation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| 3D | Three-dimensional | 
| DL | Deep Learning | 
| PSML | Procedural Shape Modeling Language | 
| AI | Artificial Intelligence | 
| VAE | Variational Autoencoder | 
| GANs | Generative Adversarial Networks | 
| MAE | Mean Absolute Error | 
| 3DSVAE | 3D Shape Variational Autoencoder | 
| 3DGANs | 3D Generative Adversarial Networks | 
| SJC | Score Jacobian Chaining | 

| Representation | Table (Box) | Chair | Couch | Bookshelf | Window | Door | Room |
|---|---|---|---|---|---|---|---|
| PSML | 1046 | 6410 | 3606 | 2350 | 2244 | 798 | 21,328 |
| Polygon Mesh | 92,496 | 3840 | 3360 | 15,600 | 18,960 | 1920 | 529,344 |
| Point Cloud | 63,360 | 42,000 | 7,063,680 | 647,760 | 786,960 | 192,240 | 14,986,080 |
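
As a worked reading of the table above, the following sketch computes how large each representation is relative to PSML for every object. It assumes the three rows report sizes in a common unit, which the table itself does not state; `ratios_to_psml` is a hypothetical helper name.

```python
CLASSES = ["Table (Box)", "Chair", "Couch", "Bookshelf", "Window", "Door", "Room"]

# Values copied from the table; the unit is not given there.
SIZES = {
    "PSML": [1046, 6410, 3606, 2350, 2244, 798, 21328],
    "Polygon Mesh": [92496, 3840, 3360, 15600, 18960, 1920, 529344],
    "Point Cloud": [63360, 42000, 7063680, 647760, 786960, 192240, 14986080],
}


def ratios_to_psml(sizes):
    """Return size / PSML size per class for every non-PSML representation."""
    base = sizes["PSML"]
    return {
        name: [value / ref for value, ref in zip(values, base)]
        for name, values in sizes.items()
        if name != "PSML"
    }


for name, ratios in ratios_to_psml(SIZES).items():
    row = ", ".join(f"{cls}: {r:.2f}x" for cls, r in zip(CLASSES, ratios))
    print(f"{name} -> {row}")
```

With these values, the room-level polygon mesh is roughly 25 times the PSML size and the room-level point cloud roughly 700 times. The chair and couch meshes are the exceptions, coming out smaller than their PSML entries.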

| Model | Metric | Table | Chair | Couch | Bookshelf | Window | Door | Overall |
|---|---|---|---|---|---|---|---|---|
| 3DETR-P | | 98.90 | 87.60 | 99.30 | 98.95 | 93.89 | 56.84 | 89.25 |
| | | 89.64 | 61.76 | 95.16 | 93.96 | 62.51 | 13.16 | 69.36 |
| | | 0.14 | 0.11 | 0.19 | 0.16 | 0.23 | 0.20 | 0.17 |
| 3DETR-SUN | | 52.6 | 72.4 | 65.3 | 28.5 | - | - | 54.7 |
| 3DETR-SN | | 67.6 | 90.9 | 89.8 | 56.4 | 39.6 | 52.4 | 66.1 |
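
The Overall column in the table above is consistent with an unweighted mean of the per-class values, skipping classes marked "-". The sketch below checks this; the row labels are placeholders because the metric names are missing from the table, and `class_mean` is a hypothetical helper.

```python
# (per-class values, reported Overall); None marks a "-" entry in the table.
ROWS = {
    "3DETR-P row 1": ([98.90, 87.60, 99.30, 98.95, 93.89, 56.84], 89.25),
    "3DETR-P row 2": ([89.64, 61.76, 95.16, 93.96, 62.51, 13.16], 69.36),
    "3DETR-P row 3": ([0.14, 0.11, 0.19, 0.16, 0.23, 0.20], 0.17),
    "3DETR-SUN": ([52.6, 72.4, 65.3, 28.5, None, None], 54.7),
    "3DETR-SN": ([67.6, 90.9, 89.8, 56.4, 39.6, 52.4], 66.1),
}


def class_mean(values):
    """Unweighted mean over the classes that have a reported value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


for label, (values, overall) in ROWS.items():
    mean = class_mean(values)
    print(f"{label}: mean = {mean:.3f}, reported Overall = {overall}")
```

Every row agrees with its reported Overall value to within rounding, so the 3DETR-SUN figure averages only the four classes that have entries.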

| Experiment | Object | | | | | |
|---|---|---|---|---|---|---|
| Exp 1 (Figure 14c) | table | 0.13 | 0.06 | 0.12 | 0.12 | 0 |
| | chair | 0.10 | 0.10 | 0.15 | 0 | 0.01 |
| | bookshelf | 0.08 | 0.13 | 0.07 | 0.11 | 0.14 |
| Exp 2 (Figure 14f) | table | 0.09 | 0.11 | 0.08 | 0.12 | 0.01 |
| | chair | 0.07 | 0.09 | 0.13 | 0 | 0 |
| | window | 0.17 | 0.13 | 0.17 | 0.15 | 0 |
| | door | 0.14 | 0.16 | 0.13 | 0.01 | 0.01 |
| Exp 3 (Figure 14i) | table | 0.12 | 0.10 | 0.09 | 0.11 | 0 |
| | chair | 0.12 | 0.09 | 0.13 | 0.01 | 0 |
| | door | 0.13 | 0.15 | 0.12 | 0 | 0.02 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Willis, A.R. Bridging Formal Shape Models and Deep Learning: A Novel Fusion for Understanding 3D Objects. Sensors 2024, 24, 3874. https://doi.org/10.3390/s24123874