FP-MAE: A Self-Supervised Model for Floorplan Generation with Incomplete Inputs
Abstract
1. Introduction
1.1. Research Background
1.2. Contribution
2. Related Works
2.1. Application of Self-Supervised Learning
2.2. Development of Autoencoders
2.3. Architectural Floor Plan Generation
3. Methodology
3.1. FP-MAE Encoder
3.2. FP-MAE Decoder
3.3. Loss Function
3.4. Dataset
3.5. Sampling Strategies
- (1) High random sampling and masking reduce redundancy, turning patch reconstruction into extrapolation from the sparse visible neighbors.
- (2) A uniform sampling distribution avoids central bias; however, because information in floor plans is unevenly distributed, we also employ diverse sampling strategies.
- (3) We apply large-scale masking to give the encoder ample room to operate efficiently, using sparse input image patches to predict an entire architectural floor plan from limited information (a minimal sketch of this random masking step follows the list).
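To make strategy (1) concrete, the following is a minimal PyTorch sketch of per-sample random masking in the style of the original MAE, on which FP-MAE builds; the function name, tensor shapes, and the 80% default ratio are illustrative assumptions, not the authors' released code.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.8):
    """Per-sample random masking in the style of MAE (He et al., 2022).

    patches: (B, N, D) embedded patch tokens. Returns the visible tokens,
    a binary mask in the original patch order (1 = masked), and the index
    tensor needed to unshuffle decoder outputs.
    """
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=patches.device)  # one score per patch
    ids_shuffle = torch.argsort(noise, dim=1)        # random permutation
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    ids_keep = ids_shuffle[:, :n_keep]               # first n_keep stay visible
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(B, N, device=patches.device)   # 1 = masked patch
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)        # back to original order
    return visible, mask, ids_restore
```

With `mask_ratio = 0.8`, only about 20% of the patch tokens reach the encoder, which is what keeps pretraining efficient under heavy masking.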
4. Experiments
4.1. Self-Supervised Pretraining
4.2. Masking and Sampling Strategies
4.3. Qualitative Experimental Results
- (1) Random Masking (80% masked): Figure 6 and Figure 7 show examples of FP-MAE reconstructions where the input patches were randomly masked (Figure 6 for a line-drawing plan, Figure 7 for a colored plan); a NumPy sketch that re-creates this and the other mask patterns appears after this list. Despite the extremely large amount of missing information, the model is still able to restore a reasonable architectural floor plan structure and to make local adjustments that align with the visible content. In the line-drawing case, the input may contain only sparse wall segments and a few visible corners; FP-MAE completes the structure by drawing walls that enclose spaces, effectively inferring the presence of rooms, and the primary functional zoning remains clear in the output. Some small details are understandably imperfect: for example, the exact positions of doors and windows in the reconstructed portion may not match the original, and room dimensions may be slightly off, since multiple configurations could satisfy the limited cues. In the colored-plan case, FP-MAE not only reconstructs the walls but also fills in the correct color in many areas, indicating that it inferred the room types. These differences do not detract from the overall feasibility of the layout, and an architect could treat the output as a valid design. It should be noted that FP-MAE operates at the level of two-dimensional plan configuration and does not model experiential or inhabitation-related aspects of the spaces the walls delimit. These results highlight FP-MAE’s ability to perform extrapolation: given disjoint glimpses of a layout, it extrapolates the likely connecting structure.
- (2) Center Masking (30% masked in center): Figure 8 provides examples where the entire center of the floor plan was missing and the input contained only the outer frame, like an empty house outline with some peripheral rooms drawn. FP-MAE’s output in this case successfully reconstructs the main internal layout. We notice that while the overall structure is usually accurate, with reconstructed walls aligning with existing ones at the boundary of the masked area, the content of the central reconstructed area can show more variability. For instance, the model might place two small rooms where the original had one large room. This reflects the model’s training to minimize the MSE loss: in ambiguous situations it may produce an “average” solution or select one plausible option among many. Despite this, crucial architectural properties, such as wall continuity and accessible circulation paths, are preserved. These cases show that FP-MAE has a good understanding of how rooms typically connect in a residential layout even when the middle portion is unseen.
- (3) Perimeter (Edge) Masking (70% of edges masked): Figure 9 shows an example input where the border of the floor plan is missing and only some interior rooms are visible, along with FP-MAE’s reconstruction of the perimeter. The model effectively restores the outline of the house, adding exterior walls that close off the layout. When the visible interior suggested a rectangular overall shape, the output reflected that shape at the edges. In the illustrated output, edge rooms that were entirely masked are reconstructed; for instance, an edge may show a balcony connecting to the interior. These reconstructed edge rooms display slight variations: sometimes the model does not perfectly recover their exact shape, or it confuses a balcony with a room due to limited evidence. Nonetheless, FP-MAE consistently generates plausible completions in masked edge regions rather than leaving gaps. In practical applications, this ability enables the model to suggest extensions for a partially drawn interior plan.
- (4) Spotted-perimeter Masking (75% masked): Figure 10 presents an example input where the outer boundary and some central patches of the floor plan are masked, leaving only discontinuous central fragments visible. Unlike edge masking, which removes only the outline, spotted-perimeter masking adds intermittent gaps in the center, forcing FP-MAE to reconstruct the overall outline and simultaneously fill in missing central details. In this scenario, the model closes the layout by extrapolating exterior walls and reconstructing rooms attached to the edge. Because as much as 75% of the plan is missing, the reconstructed results often appear simplified or slightly blurred. Certain details, such as window placement or exact room proportions, may be inaccurately inferred, and in some cases the model may misalign walls or confuse a balcony with a small room. Nevertheless, FP-MAE consistently produces a coherent and plausible boundary rather than leaving large gaps. This shows that even under extremely sparse input conditions, the model retains sufficient prior knowledge to propose reasonable edge structures. In practical applications, this means that spotted-perimeter masking could support early-stage design tasks in which only a small portion of the layout is provided, offering plausible completions of the entire plan.
- (5) Biased-keyhole Masking (90% masked): Figure 11 shows an extreme case in which only a very small “keyhole-shaped” region of the floor plan is left visible, while the remaining 90% of the layout is masked. This setting emulates a scenario where only a limited fragment of the design is available, for instance a small corner or partial core, while the remainder of the plan is left unspecified. FP-MAE must therefore extrapolate nearly the entire structure from this narrow clue. In the reconstructed outputs, the model attempts to extend the visible fragment into a complete house layout. For example, if the revealed patch suggests part of a corridor or a wall junction, FP-MAE generates surrounding rooms and boundaries consistent with that structure. However, given the extreme sparsity of information, the results tend to be simplified and blurred. Fine elements such as windows, doors, or exact proportions are often missing or roughly approximated. Some reconstructed walls also exhibit lower contrast and slight misalignment, reflecting the uncertainty of inferring large-scale structure from minimal evidence. Despite these challenges, FP-MAE still produces layouts that are structurally coherent rather than chaotic, filling masked regions with plausible room divisions and exterior walls. This demonstrates the model’s ability to rely on strong prior knowledge of floor plan regularities, even when trained under very sparse conditions.
- (6) One-Sided Masking (30% masked on one side): This strategy masks only a single boundary of the floor plan image. It emulates realistic design scenarios in which architects extend or modify a specific edge of a layout, such as adding new rooms to one wing of a house or expanding a facade. In our implementation, we systematically masked either the top, bottom, or right side of the plan in separate experiments, thereby creating diverse but structured missing regions. A masking ratio of approximately 30% was applied to these boundary regions, indicated as gray blocks in Figure 12; this level was selected to preserve most of the original spatial organization while still posing a meaningful reconstruction challenge. Because the essential layout remains visible, FP-MAE can effectively leverage the intact structural cues to infer the missing parts: the model not only restores the broader spatial configuration but also accurately reconstructs fine-grained elements such as walls, doors, and room boundaries. The results demonstrate that under one-sided masking FP-MAE produces reconstructions that are both structurally coherent and visually convincing, highlighting its ability to generalize even when large continuous areas are absent.
- (7) Corner Masking (75% masked, only a corner visible): Figure 13 provides an example with only a small top-left corner of the floor plan given and everything else masked. The output in this case shows the model’s attempt to imagine the rest of the house. In the example, the visible corner might contain a bedroom; FP-MAE then conjectures a layout that extends from that bedroom, perhaps adding an adjacent living room, further bedrooms at the other end, and so on. The reconstructed areas are somewhat more generic: walls are correctly placed to form rooms, but they are visibly simplified or blurred in drawing quality. For instance, windows may be missing or roughly placed, since the model is less certain where they should go, and some walls appear with lower contrast or slight misalignment as the model balances many possibilities. Still, the output is a valid floor plan. Notably, even with such a tiny fraction of the input visible, the model does not produce chaotic lines; instead, it outputs a structured set of spaces, demonstrating the strong prior knowledge FP-MAE has learned about what constitutes a plausible floor plan.
- (8) Spotted-corner Masking (80% masked): Figure 14 illustrates an input where most of the floor plan is obscured, leaving only a fragmented corner region partially visible. Unlike standard corner masking, where a single continuous corner is revealed, spotted-corner masking introduces discontinuous patches in the top-left corner area. This makes the reconstruction task even more challenging, as FP-MAE must infer the overall layout from scattered local clues rather than a single coherent boundary. In the outputs, the model extends the incomplete fragments into a full residential layout. For instance, if a visible fragment suggests part of a balcony or corridor, FP-MAE extrapolates plausible adjacent spaces, adding living areas or bedrooms in reasonable positions. However, due to the high masking ratio (80% of the plan missing) and the fragmented nature of the input, the reconstructions often appear blurred or simplified. Windows and doors may be roughly placed or omitted, and some walls show reduced contrast or slight misalignment. Nevertheless, FP-MAE consistently produces structured and coherent layouts rather than chaotic lines. Even with such limited and discontinuous input, the model leverages strong architectural priors to fill in plausible spaces, demonstrating its robustness in handling highly sparse and irregular masking scenarios.
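The structured masking strategies above differ only in which patch positions are hidden, so they can be re-created as simple geometric rules on the patch grid. The sketch below is our own NumPy approximation: the `make_mask` helper, the 14x14 grid, and the band/window heuristics are illustrative assumptions, and the exact patterns used in the FP-MAE experiments may differ.

```python
import numpy as np
from typing import Optional

def make_mask(strategy: str, grid: int = 14, ratio: float = 0.8,
              rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Boolean (grid, grid) patch mask; True marks a masked patch."""
    rng = rng or np.random.default_rng(0)
    mask = np.zeros((grid, grid), dtype=bool)

    if strategy == "random":                       # strategy (1), ratio ~0.8
        idx = rng.choice(grid * grid, int(ratio * grid * grid), replace=False)
        mask.ravel()[idx] = True
    elif strategy == "center":                     # (2), block in the middle
        w = max(1, int(grid * np.sqrt(ratio)))     # side of the central block
        lo = (grid - w) // 2
        mask[lo:lo + w, lo:lo + w] = True
    elif strategy == "perimeter":                  # (3), band along all edges
        band = 1
        while 1 - ((grid - 2 * band) / grid) ** 2 < ratio and band < grid // 2:
            band += 1
        mask[:band, :] = mask[-band:, :] = True
        mask[:, :band] = mask[:, -band:] = True
    elif strategy == "spotted_perimeter":          # (4), band plus central holes
        mask |= make_mask("perimeter", grid, 0.5, rng)
        interior = np.flatnonzero(~mask.ravel())
        holes = rng.choice(interior, interior.size // 3, replace=False)
        mask.ravel()[holes] = True
    elif strategy == "keyhole":                    # (5), small off-center window
        keep = max(2, int(grid * np.sqrt(1 - ratio)))
        mask[:, :] = True
        r0 = c0 = grid // 4                        # bias the window off-center
        mask[r0:r0 + keep, c0:c0 + keep] = False
    elif strategy == "one_sided":                  # (6), e.g. right side, ratio ~0.3
        mask[:, grid - int(grid * ratio):] = True
    elif strategy == "corner":                     # (7), top-left corner visible
        keep = max(1, int(grid * np.sqrt(1 - ratio)))
        mask[:, :] = True
        mask[:keep, :keep] = False
    elif strategy == "spotted_corner":             # (8), corner with extra gaps
        mask |= make_mask("corner", grid, ratio, rng)
        visible = np.flatnonzero(~mask.ravel())
        holes = rng.choice(visible, visible.size // 4, replace=False)
        mask.ravel()[holes] = True
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return mask
```

For example, `make_mask("corner", ratio=0.75)` hides everything except a top-left window covering roughly a quarter of the grid; the resulting boolean grid can be upsampled to pixel resolution for visualization or used directly to drop patch tokens before encoding.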
4.4. Quantitative Experimental Results

| Method | Data | FID ↓ | PSNR (dB) ↑ | SSIM ↑ |
|---|---|---|---|---|
| Random Masking | Line Drawing | 12.6053 | 79.7688 | 0.9757 |
| Random Masking | Colored Drawing | 17.2192 | 79.7638 | 0.9724 |
| Center Masking | Line Drawing | 22.0109 | 77.7863 | 0.9738 |
| Perimeter Masking | Line Drawing | 12.4044 | 78.6158 | 0.9609 |
| Spotted-perimeter Masking | Line Drawing | 16.8733 | 79.3059 | 0.9637 |
| Biased-keyhole Masking | Line Drawing | 17.4593 | 79.3542 | 0.9632 |
| One-sided Masking | Line Drawing | 8.8411 | 80.1424 | 0.9794 |
| Corner Masking | Line Drawing | 27.3668 | 76.0285 | 0.9247 |
| Spotted-corner Masking | Line Drawing | 19.3524 | 81.3495 | 0.9423 |
| Pix2Pix | Line Drawing | 76.3275 | 68.3546 | 0.9334 |
| CycleGAN | Line Drawing | 90.3235 | 64.3487 | 0.9468 |
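In the table above, lower FID indicates a generated distribution closer to real plans, while higher PSNR and SSIM indicate more faithful pairwise reconstruction. Consistent with the qualitative results, one-sided masking yields the best FID (8.8411) and SSIM (0.9794), and every FP-MAE variant outperforms the Pix2Pix and CycleGAN baselines on FID and PSNR. As a hedged sketch of how the per-image metrics could be computed with scikit-image (the `evaluate_pair` helper and its input conventions are our own illustrative choices, not the authors' evaluation script):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(original: np.ndarray, reconstruction: np.ndarray) -> dict:
    """PSNR (dB) and SSIM for one ground-truth/reconstruction image pair.

    Both inputs are H x W x 3 uint8 arrays. FID is not a pairwise metric:
    it compares Inception feature distributions over the whole test set
    and is usually computed with a separate tool (e.g. pytorch-fid).
    """
    psnr = peak_signal_noise_ratio(original, reconstruction, data_range=255)
    ssim = structural_similarity(original, reconstruction,
                                 channel_axis=-1, data_range=255)
    return {"psnr": psnr, "ssim": ssim}
```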
5. Discussion and Conclusions
5.1. Contribution to Architectural Image Completion
5.2. Implications for Architectural Practice
5.3. Limitations and Future Research Directions
5.4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References