Article

Bridging the Semantic Gap in BIM Interior Design: A Neuro-Symbolic Framework for Explainable Scene Completion

1 School of Intelligent Science and Technology, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100080, China
3 Tongzhou Campus, Renmin University of China, Beijing 100872, China
4 Capital Engineering & Research Incorporation Limited, Beijing 100176, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2026, 16(3), 1530; https://doi.org/10.3390/app16031530
Submission received: 7 January 2026 / Revised: 26 January 2026 / Accepted: 27 January 2026 / Published: 3 February 2026

Abstract

Building information modeling (BIM)-based interior design automation remains constrained by a semantic mismatch: engineering constraints are explicit and categorical, whereas aesthetic style is implicit, contextual, and difficult to formalize. As a result, existing systems often overfit local visual similarity or rely on rigid rules, producing recommendations that drift stylistically at the scene level or conflict with professional design logic. This paper proposes KsDesign, a neuro-symbolic framework for interpretable, retrieval-based BIM scene completion that unifies visual style perception with explicit design knowledge. Offline, KsDesign mines category-level co-occurrence and compatibility patterns from curated designer-quality interiors and encodes them as a weighted Furniture-Matching Knowledge Graph (FMKG). Online, it learns style representations exclusively from BIM-derived 2D renderings/projections of 3D family models and BIM scenes, and applies a knowledge-guided attention mechanism to weight contextual furniture cues, synthesizing a global scene-style representation for candidate ranking and retrieval. In a Top-3 (K = 3) evaluation on 10 BIM test scenes with a 20-expert consensus ground truth, KsDesign consistently outperforms single-modal baselines, achieving 86.7% precision in complex scenes and improving average precision by 23.5% (up to 40%), with a 15.5% average recall increase. These results suggest that global semantic constraints can serve as a logical regularizer, mitigating the local biases of purely visual matching and yielding configurations that are both aesthetically coherent and logically valid. We further implement in-authoring explainability within Revit, exposing KG-derived influence weights and evidence paths to support rationale inspection and immediate family insertion. 
Finally, the knowledge priors and traceable intermediate representations provide a robust substrate for integration with LLM-driven conversational design agents, enabling constraint-aware, verifiable generation and interactive iteration.

1. Introduction

Building Information Modeling (BIM) is widely adopted as a shared digital representation of built assets to support information management and decision-making across design, construction, and operations [1,2]. Open standards such as IFC further enable interoperable, machine-interpretable information exchange, providing a foundation for downstream automation in the AECO domain [3]. Recent systematic reviews indicate a rapid growth of machine learning (ML) and AI applications over BIM-related data [4,5]. Nevertheless, BIM-supported interior design assistance remains comparatively underdeveloped in everyday authoring workflows: designers still devote substantial effort to searching, selecting, and placing furniture families from large libraries, and this “library retrieval + scene composition” workload becomes increasingly inefficient as project scale and library volume grow [6,7,8]. More importantly, professional interior design decisions rely on tacit knowledge—style harmony, co-occurrence conventions, and functional logic—that is difficult to encode in explicit BIM metadata or keyword search alone [4,6].
From a methodological perspective, existing attempts toward intelligent interior furniture recommendation can be broadly viewed through two complementary paradigms. The first paradigm emphasizes data-driven visual compatibility or style learning, where images or renderings are embedded into latent spaces for retrieval or recommendation [9,10,11,12,13]. Such approaches are effective at capturing implicit aesthetic cues but typically do not guarantee satisfaction of functional conventions or professional design logic when used as decision engines in BIM workflows [9,10,13]. The second paradigm relies on explicit rules, constraints, or structured knowledge to preserve logical validity and transparency, yet these approaches often lack the representational flexibility needed to model context-dependent aesthetics at scale, particularly when style signals are only weakly expressed in symbolic attributes [6,14]. This tension exposes a persistent style-logic gap: purely visual models may fail to enforce professional constraints, while purely symbolic rules struggle to represent design “taste” in complex, multi-object scenes.
In practice, explainability is essential for adoption. Professional designers need not only ranked suggestions, but also defensible rationales aligned with how they reason about global room coherence. Explainable AI (XAI) has therefore been increasingly advocated for construction and built-environment decision support to improve trust and accountability [15,16,17]. At the same time, the interior design and graphics communities have advanced from pairwise compatibility toward scene-level modeling and controllable generation, including layout enhancement and physically grounded scene synthesis [18,19,20]. These advances highlight an important point for BIM interior assistance: the problem should be formulated as constrained scene completion—inferring the intended scene style and contextual expectations from the existing layout and objects—rather than as isolated item-to-item matching [9,18,19,20].
To address these challenges, we propose KsDesign, a neuro-symbolic framework for explainable BIM interior furniture recommendation. KsDesign integrates (i) neural visual representation learning from BIM-derived 2D renderings/projections of objects and scenes, and (ii) symbolic reasoning over a Furniture-Matching Knowledge Graph (FMKG) that encodes design logic mined from successful design cases. The curated design-case imagery is used only for extracting co-occurrence/compatibility rules (i.e., knowledge acquisition); style representations used at deployment are learned from BIM-derived renderings/projections rather than from internet photographs. Technically, KsDesign detects furniture instances in design cases using Faster R-CNN [21], mines frequent co-occurrence patterns via FP-Growth association mining [22], and organizes them as a structured knowledge graph. A knowledge-guided fusion mechanism then combines scene-level style cues with symbolic constraints and produces designer-facing rationales grounded in explicit knowledge paths. This approach is consistent with neuro-symbolic AI principles that integrate perception and reasoning to improve interpretability and robustness [11,23], and it aligns with explainable graph-based recommendation research emphasizing graph structure as a natural carrier of explanations [16,24,25,26].
The main contributions are as follows.
(1) Neuro-symbolic style–logic alignment for BIM interior recommendation: a unified framework bridging visual aesthetics and explicit design logic to reduce the style–logic semantic gap.
(2) Automated and scalable design-logic extraction: a data-driven pipeline (Faster R-CNN + FP-Growth) to mine tacit matching patterns from successful design cases and formalize them as a knowledge graph.
(3) Context-aware scene completion via knowledge-guided fusion: a scene-level recommendation strategy that dynamically weights object influences under global logical rules.
(4) Explainability and professional validation: interpretable rationales supported by knowledge paths and expert evaluation, further demonstrated through an in-authoring Revit visualization for rationale inspection and direct family insertion [15,16,17,24].
The remainder of this paper is organized as follows. Section 2 reviews related work in intelligent BIM interior design, visual compatibility learning, and knowledge graph-based reasoning. Section 3 formulates the problem and presents the overall system architecture. Section 4 details the proposed methods, including knowledge graph construction and the scene-style-based recommendation algorithm. Section 5 reports experimental settings and results, followed by the Discussion in Section 6 and conclusions and future work in Section 7.

2. Related Work

This section reviews prior studies most relevant to KsDesign, focusing on (i) BIM-oriented retrieval and automation for object libraries, (ii) visual compatibility and scene-level modeling for interior design, and (iii) knowledge graphs and neuro-symbolic recommendation for explainability.

2.1. BIM-Oriented Object Library Retrieval and Design Automation

A growing body of BIM research investigates AI-enabled BIM intelligence, including retrieval, quality assurance, and workflow assistance, as summarized by recent systematic reviews [4,5,6]. In particular, BIM information retrieval has evolved beyond keyword search toward context-aware retrieval, incorporating information standards, object relationships, and natural-language interfaces [7,8,27]. BIM search engines demonstrate that encoding object relationships and standard-aware structure can improve retrieval relevance, especially when libraries become large and heterogeneous [8]. Natural-language BIM retrieval and BIM assistants further aim to reduce the engineering burden of querying and navigating BIM databases and authoring tools [7,12,27,28]. However, scene-level composition in BIM authoring requires reasoning over multiple contextual objects and their global coherence, which is not fully supported by retrieval pipelines centered on isolated queries or metadata-level matching.
These systems primarily improve “finding” components (retrieval of existing families) rather than solving “composing” under multi-object coherence. They typically do not encode reusable aesthetic compatibility logic or provide designer-grade rationales grounded in explicit design knowledge. KsDesign therefore reframes BIM interior assistance as constrained scene completion and introduces a design-logic knowledge graph mined from successful cases to support explainable composition beyond metadata-level retrieval.

2.2. Visual Compatibility Learning and Scene-Level Interior Modeling

Visual compatibility and style learning have been widely studied for recommendation settings dominated by aesthetic signals. In interior design and graphics, work has progressed from style compatibility modeling for furniture assets to scene-level modeling, with surveys summarizing key datasets, methods, and evaluation protocols [9,10]. In the BIM context, recent studies have begun to learn style-aware representations from BIM-derived renderings for BIM product style retrieval and recommendation, indicating rising interest in bridging aesthetics with BIM objects [11,12,13]. In parallel, controllable layout enhancement and physically grounded scene synthesis have improved realism and constraint handling through optimization or generative guidance, reinforcing the importance of scene-level coherence rather than pairwise similarity alone [18,19,20]. However, these methods often provide limited operational explainability in professional authoring settings, where designers need inspectable intermediate factors rather than only visual similarity scores.
Despite strong representational capacity, purely visual approaches typically do not guarantee functional or professional validity in BIM workflows, and explanation pathways are often limited. KsDesign adopts the scene-level viewpoint (holistic coherence) but anchors style representation in BIM-derived renderings/projections and couples it with explicit symbolic design logic to ensure validity and interpretability.

2.3. Knowledge Graphs, Neuro-Symbolic AI, and Explainable Recommendation

Knowledge graphs (KGs) provide structured representations of entities and relations and are increasingly used in AECO for transparent reasoning and validation. BIM–KG integration has been applied to tasks such as compliance checking and model auditing, leveraging explicit rules to support explainable decision support [14,29,30,31]. Standards-oriented developments (e.g., IFC/IDS) further support machine-checkable information exchange and validation workflows, strengthening the semantic substrate for downstream reasoning [3,32,33]. In parallel, XAI has been explicitly advocated for construction decision support, motivating explanation as a first-class objective rather than an optional add-on [15,17]. Neuro-symbolic AI surveys consolidate methods for integrating neural learning with symbolic reasoning to improve interpretability and robustness [11,23]. In AECO, a representative example is rule-grounded BIM-KG reasoning for compliance checking and model auditing, where explicit constraints yield auditable reasoning traces [14,29,30,31]. Analogously, in explainable recommendation, rationales are often grounded in structured evidence such as relation-aware importance weights and subgraph-based paths [24,25,26,34,35].
For recommender systems, multiple surveys and reviews establish knowledge graph-based and graph-neural recommenders as a major family of approaches, and recent work emphasizes evaluation and mechanisms of explainable recommendation (i.e., explaining “why”) [24,25,26]. Explainable graph-based recommender reviews also highlight attention-based fusion and graph attention as common building blocks for integrating relational signals, especially when explanations are expected to correspond to subgraph evidence [16,26,34,35].
Most AECO KGs focus on regulations, compliance, or lifecycle management, while design knowledge for aesthetic matching is scarce and often handcrafted. Conversely, general KG-based recommenders rarely address BIM-specific object representations and professional interior conventions. Importantly, some prior studies may partially combine BIM data, visual learning, and knowledge-based reasoning, but the coupling is often task-specific or loosely connected. Typical patterns include visual retrieval based on embeddings without explicit, reusable compatibility priors, or KG-driven reasoning for compliance without BIM-derived style representation learning that supports scene-level ranking. In contrast, KsDesign differs by unifying offline compatibility-prior mining, online BIM-derived style learning, and evidence-linked explanations into a single auditable decision chain for retrieval-based scene completion. KsDesign contributes an automated pathway to extract tacit interior matching logic from successful cases (association rules via FP-Growth [22] over detected object categories [21]) and fuses it with BIM-derived scene style cues through knowledge-guided fusion to produce both context-aware recommendations and designer-facing rationales grounded in explicit knowledge paths.

3. Problem Formulation and System Architecture

Before detailing the algorithmic implementation, we first formalize the BIM interior design recommendation task as a mathematical optimization problem and present the high-level architecture of the proposed KsDesign framework.

3.1. Mathematical Problem Formulation

In the context of intelligent BIM design, the core challenge lies in quantifying the abstract concept of “style” and modeling the implicit compatibility between furniture entities. We define the furniture universe as F = {f1, f2, …, fN}; each furniture item fi is defined by two modalities derived from the BIM environment: (1) Visual Modality, representing the geometric aesthetics via 2D projections of the 3D family models, and (2) Semantic Modality, representing explicit property data (e.g., OmniClass codes, family types). In our implementation, the semantic modality is used to (i) map BIM families to canonical furniture types T(.) for category-level reasoning, and (ii) encode immutable taxonomy/constraint edges in the KG (e.g., type membership), which are assigned maximal weights during attention computation.
a. Furniture Style Representation:
To bridge the gap between visual perception and computational processing, we map each furniture item fi to a high-dimensional feature vector v(fi) ∈ Rd. As illustrated in Figure 1, this vector space adheres to the principle of geometric proximity: the distance/similarity between two vectors correlates with their stylistic similarity. In this work, we adopt cosine similarity for ranking and retrieval to improve scale invariance in high-dimensional embeddings. Thus, a “Minimalist Chair” will be spatially closer to a “Modern Table” than to a “Baroque Cabinet” in this latent space.
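The geometric-proximity principle can be sketched in a few lines. The vectors below are hypothetical 3-D toys (real style vectors v(fi) are high-dimensional Gram-matrix features), but they illustrate why cosine similarity is a natural ranking measure:

```python
import math

def cosine(u, v):
    """Cosine similarity: scale-invariant, suited to high-dimensional style embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-D style vectors; real v(f_i) are far higher-dimensional.
minimalist_chair = [0.9, 0.1, 0.0]
modern_table = [0.8, 0.2, 0.1]
baroque_cabinet = [0.1, 0.9, 0.4]

# Geometric proximity: the minimalist chair sits closer to the modern table
# than to the baroque cabinet in the latent style space.
assert cosine(minimalist_chair, modern_table) > cosine(minimalist_chair, baroque_cabinet)
```

Because cosine similarity ignores vector magnitude, two embeddings with different activation scales but the same stylistic direction still rank as highly similar.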
b. Logical Constraints as Probabilistic Weights:
While visual vectors capture aesthetic features, they lack explicit functional logic. To integrate engineering constraints, we model the dependency between furniture categories as a directed probabilistic graph. Let cu and cv denote the categories of furniture items fu and fv. We define the Style Influence Weight wuv as the conditional probability of category cv appearing given the presence of cu:
w_uv = P(c_v | c_u)  (1)
In this formulation, wuv acts as a cognitive attention score. It quantifies the strength of the logical expectation: if a “Dining Table” (cu) determines the style of a “Dining Chair” (cv) with high probability, wuv will be high, forcing the recommendation algorithm to prioritize their visual compatibility.
In practice, we instantiate this conditional dependency using confidence derived from association-rule mining, because it directly estimates P(cv|cu) and remains bounded in [0, 1], which is convenient for subsequent weighting and interpretation. The KG weights are used to modulate the relative influence of context categories, rather than acting as absolute scores. Other normalized association measures (e.g., lift or PMI) could also be used to parameterize wuv and mitigate popularity bias; we adopt confidence here for its direct interpretability as P(cv|cu), and leave systematic comparisons as future work.
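A minimal sketch of this estimation, using hypothetical category-level transactions (one itemset per curated scene), shows how the influence weight reduces to a bounded conditional frequency:

```python
from collections import Counter
from itertools import permutations

# Hypothetical category-level transactions, one per curated scene image.
scenes = [
    {"table", "chair", "cabinet"},
    {"table", "chair"},
    {"table", "chair", "sofa"},
    {"sofa", "table"},
    {"bed", "cabinet"},
]

single = Counter()   # occurrences of each category
pair = Counter()     # ordered co-occurrences (c_u, c_v)
for s in scenes:
    for c in s:
        single[c] += 1
    for cu, cv in permutations(s, 2):
        pair[(cu, cv)] += 1

def influence_weight(cu, cv):
    """w_uv = P(c_v | c_u), estimated as rule confidence; bounded in [0, 1]."""
    return pair[(cu, cv)] / single[cu]

# "table" strongly predicts "chair": 3 of the 4 table scenes also contain a chair.
assert influence_weight("table", "chair") == 0.75
```

Note the asymmetry of the measure: w_uv and w_vu generally differ, which is why the FMKG is a directed graph.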
c. The Recommendation Objective:
Given a partially furnished BIM scene P = {p1, p2, …, pm}, the goal is to recommend a target furniture item g* from the candidate library G that maximizes the stylistic harmony with the current scene context.
Instead of matching g* against each pi individually, we formulate this as a Context-Aware Ranking Problem. We define the Scene Style Vector v(P) as the weighted centroid of the existing items, governed by the influence weights w. The objective is to find the item g* from the candidate library G that maximizes the stylistic harmony with the scene vector:
g* = arg max_{g ∈ G} cos(v(P), v(g))  (2)
This formulation transforms the subjective design task into a rigorous vector retrieval optimization problem.
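The full objective — weighted-centroid scene vector followed by cosine-ranked retrieval — can be condensed into a small sketch. All vectors and weights below are hypothetical 2-D toys:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def scene_vector(context, weights):
    """v(P): weighted centroid of context style vectors under KG influence weights w."""
    total = sum(weights[c] for c in context)
    dim = len(next(iter(context.values())))
    return [sum(weights[c] * context[c][k] for c in context) / total for k in range(dim)]

def recommend(context, weights, candidates):
    """g* = arg max over candidates g of cos(v(P), v(g))."""
    vp = scene_vector(context, weights)
    return max(candidates, key=lambda g: cosine(vp, candidates[g]))

# Hypothetical 2-D style vectors and KG-derived weights.
context = {"table": [0.9, 0.1], "chair": [0.8, 0.2]}
weights = {"table": 0.7, "chair": 0.5}
candidates = {"sofa_modern": [0.85, 0.15], "sofa_baroque": [0.2, 0.9]}

assert recommend(context, weights, candidates) == "sofa_modern"
```

Because v(P) is a convex combination of context vectors, a high-weight anchor item (e.g., the dining table) pulls the scene vector toward its own style region, which is exactly the intended effect of the KG attention weights.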

3.2. The Neuro-Symbolic System Architecture

To solve the optimization problem defined above, we present KsDesign, a holistic framework that integrates visual perception with knowledge reasoning. As shown in Figure 2, the architecture operates through two interconnected pipelines: Knowledge Graph Generation (Offline Learning) and Scene Style-Based Recommendation (Online Inference).
Phase 1: Knowledge Graph Generation (The Cognitive Base).
The left module of Figure 2 illustrates the acquisition of design knowledge. Unlike traditional methods that rely on manual rule-coding, KsDesign employs a data-driven approach. It ingests a gallery of curated, designer-quality interior scene images and utilizes a target detection algorithm to digitize the visual scenes into structured furniture lists. Subsequently, an association rule mining engine extracts latent co-occurrence and compatibility patterns (e.g., certain furniture categories frequently appear together in professionally designed scenes). These mined rules, along with furniture classification taxonomies, are structured into the Furniture-Matching Knowledge Graph (FMKG). This FMKG serves as the system’s “long-term memory,” storing explicit pairing rules derived from professional design data.
This decoupling isolates knowledge acquisition from deployment-time style embedding, reducing cross-domain bias while keeping the learned style space aligned with BIM family previews.
Phase 2: Scene Style-Based Recommendation (Online Inference and Retrieval).
The right module depicts the inference workflow. When a user inputs a target BIM scene, the system initiates a dual-path processing mechanism:
(1) Visual Path: It retrieves (or computes) the style vectors v(pi) for all existing furniture from BIM-derived 2D renderings/projections of the corresponding 3D BIM families and scene objects, and accesses them from the vector database.
(2) Logical Path: It queries the Knowledge Graph to determine the dynamic attention weights w based on the relationships between existing furniture categories and the target furniture type.
These two paths converge at the Scene Style Computing module, where BIM-based visual features are weighted by KG-derived logical rules to synthesize the global scene vector v(P). Finally, the similarity computing module ranks candidate BIM family items, outputting a recommendation list that is both visually cohesive and logically sound. Here, “generative” refers to synthesizing a context-aware scene representation v(P) under knowledge-guided attention for ranking and retrieval, rather than generating new geometry.

4. Methods

This paper presents KsDesign, a novel neuro-symbolic framework designed to bridge the semantic gap between aesthetic visual perception and explicit engineering constraints in BIM interior design. Unlike traditional methods that rely solely on geometric rules or visual similarity, KsDesign orchestrates a dual-process mechanism: (1) Automated Design Knowledge Extraction, which mines category-level co-occurrence and compatibility rules from curated designer scene images and organizes them into a structured Knowledge Graph; and (2) Explainable Scene Completion, which utilizes a knowledge-guided attention mechanism to generate stylistically harmonious and logically valid furniture compositions.

4.1. Visual-Semantic Knowledge Extraction and Graph Construction

The construction of a robust BIM design system is often hindered by the scarcity of large-scale, annotated, structured datasets. To overcome this, we propose a Data-Driven Knowledge Extraction Scheme that transforms curated interior scene imagery into explicit design rules. As shown in Figure 3, this process involves two stages: Visual Perception (detecting objects) and Cognitive Pattern Mining (extracting association rules).
Given the scarcity of structured BIM design knowledge, we decouple knowledge acquisition from style representation learning. In the offline stage, we mine category-level co-occurrence/compatibility rules from curated designer-quality interior reference images via association-rule mining and encode them into a weighted FMKG. In the online stage, style representations are extracted exclusively from BIM-derived 2D renderings/projections of 3D BIM family models and BIM scenes (building upon our automatic BIM preview generation pipeline [12]). The FMKG is then used as a logical prior to derive attention weights over context items, enabling the synthesis of a global scene style vector v(P) for ranking and retrieving compatible BIM candidates.

4.1.1. Visual Perception via Deep Learning

In real-world design scenarios, design logic is implicitly embedded within visual representations. For instance, the co-occurrence of a “minimalist sofa” and a “geometric coffee table” represents a latent stylistic constraint. To decode these visual semantics, we employ a deep learning-based perception module. Specifically, we utilize Faster R-CNN [21] to parse curated designer-quality interior reference images (design cases) into structured furniture instance lists, which are then abstracted into category-level itemsets for downstream rule mining.
Since generic pre-trained models lack the domain specificity for interior detailing, we fine-tuned the detector on a curated and manually annotated design-case corpus of 1000 indoor scene images, focusing on five core furniture categories: beds, tables, chairs, cabinets, and sofas. These categories were selected because they constitute the most frequent functional anchors in common residential furnishing workflows (particularly bedroom and living-room scenes), allowing us to validate the proposed neuro-symbolic fusion under a controlled yet practically representative setting. Importantly, this choice does not limit the generality of the framework: the detection-to-itemset abstraction and subsequent rule mining are category-agnostic, and additional categories can be incorporated by expanding the detector label set, the BIM candidate library, and the FMKG schema while keeping the same reasoning and weighting formulation. The images were annotated with bounding boxes using LabelImg 1.8.6, producing PASCAL VOC-style XML labels. The Faster R-CNN was initialized with an ImageNet-pretrained VGG16 backbone and fine-tuned via transfer learning (learning rate = 0.001; max iterations = 10,000; 256 sampled RoIs per image for optimizing the RoI head). This step digitizes each reference image into quantifiable furniture entities, which are then abstracted into category-level itemsets for transaction construction and subsequent rule mining; accordingly, we report the annotation protocol and fine-tuning settings for transparency, while the primary evaluation of KsDesign is conducted on downstream BIM retrieval-and-explanation tasks (Section 5).
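The detection-to-itemset abstraction described above can be sketched as follows. The detector output format and the score cut-off value here are illustrative assumptions, not the paper's exact post-processing:

```python
# Hypothetical per-image detector output: (category, confidence score, bounding box).
detections = [
    ("chair", 0.97, (12, 40, 88, 160)),
    ("chair", 0.91, (140, 38, 210, 158)),
    ("table", 0.95, (60, 90, 180, 200)),
    ("sofa", 0.48, (300, 50, 420, 190)),  # below the assumed score cut-off, discarded
]

SCORE_MIN = 0.5  # illustrative detection-score threshold

def to_itemset(dets, score_min=SCORE_MIN):
    """Collapse detections into a category-level itemset:
    duplicate instances merge into one category, low-score boxes are dropped."""
    return frozenset(cat for cat, score, _ in dets if score >= score_min)

assert to_itemset(detections) == frozenset({"chair", "table"})
```

Each image thus contributes one category-level transaction, which is the unit consumed by the association-rule mining stage.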
Upon identifying these entities, the next step is to quantify their aesthetic attributes. The furniture style vector v(fi) is extracted using the Deep Style Learning Model established in our previous work [12]. The same embedding pipeline has been evaluated on the five furniture categories in [12], achieving an average precision of 68.8% using style information alone, which supports its suitability as the visual style representation adopted in this work. In this paper, the model is applied to BIM-derived 2D renderings/projections of 3D family models and BIM scenes, and the curated designer images are not used for style representation learning. To emphasize aesthetic cues (e.g., texture and color correlations) while reducing sensitivity to object layout and geometric semantics, we utilize a VGG-19 network pre-trained on the ImageNet dataset as the backbone.
Following the principle of Neural Style Transfer, the style representation is mathematically defined by Gram matrices, which compute the inner-product correlations between feature maps. Formally, let F^l ∈ R^{C_l × M_l} denote the feature map of the furniture image at layer l, where C_l is the number of channels and M_l is the flattened spatial dimension (width × height). The entry F^l_{ik} denotes the activation of the i-th filter at position k. The Gram matrix G^l ∈ R^{C_l × C_l} for layer l is defined as the inner product of the vectorized feature maps:
G^l_{ij} = Σ_{k=1..M_l} F^l_{ik} F^l_{jk}  (3)
This matrix Gl captures the texture information (e.g., correlations between wood grain and color) while discarding spatial structural information. The final style vector v(fi) is obtained by flattening and concatenating the Gram matrices from the five selected layers (Conv1_1 to Conv5_1) after normalization (Equation (4)), thereby creating a robust multi-scale representation of the furniture’s aesthetic style.
v(f_i) = Concat(Flat(G^{l_1}), …, Flat(G^{l_5}))  (4)
While recent foundation encoders (e.g., CLIP/ViT/DINO) could serve as alternative visual representations, a systematic benchmarking under a fair training and evaluation protocol is beyond the scope of this manuscript and is left as future work.
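As a concrete check of the Gram-matrix definition above, consider a toy feature map with C_l = 2 channels and M_l = 3 spatial positions; the entry G^l_{ij} is simply the dot product of channels i and j, so spatial arrangement is discarded while channel correlations survive:

```python
# Toy feature map F^l: 2 channels (rows) x 3 flattened spatial positions (columns).
F = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 1.0]]

def gram(feat):
    """G^l_ij = sum_k F^l_ik * F^l_jk: channel-correlation matrix, spatial layout discarded."""
    C = len(feat)
    M = len(feat[0])
    return [[sum(feat[i][k] * feat[j][k] for k in range(M)) for j in range(C)] for i in range(C)]

G = gram(F)
# Diagonal entries are channel energies; off-diagonals are cross-channel correlations.
assert G == [[14.0, 5.0], [5.0, 2.0]]
```

Permuting the spatial positions (columns) of F leaves G unchanged, which is why Gram features capture texture-like style cues rather than object layout.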

4.1.2. Mining Tacit Design Logic

Visual detection yields raw semantic data, but it does not automatically reveal the latent design rules. To elevate this data into explainable design logic, we apply the FP-Growth (Frequent Pattern-growth) algorithm [22]. This phase effectively simulates the experience accumulation of a human designer, mining high-confidence association rules to determine which furniture categories imply the presence of others.
During the preliminary data analysis, we observed that a single scene image often contains multiple instances of the same furniture type (e.g., multiple chairs), which introduces statistical noise. Therefore, we performed semantic cleaning on the furniture itemsets to remove redundant instances, ensuring logical clarity. In our corpus, this process yields 1000 initial category-level transactions (one per image); after removing single-category scenes and collapsing duplicated categories within a scene, 837 transactions were retained for FP-Growth mining. Subsequently, we deployed the FP-Growth algorithm to extract the core design patterns. The detailed mining process is illustrated in Figure 4 and described in the following steps:
Step 1: Construction of the Design Pattern Tree (FP-Tree).
The algorithm begins by constructing a compressed data structure to represent the global design distribution:
  • Frequency Quantification: We perform the first traversal of the original dataset to count the occurrence of each furniture category (1-furniture itemset). To filter out noise and retain statistically significant design elements, we set a minimum support threshold of Smin = 0.18.
  • Header Table Creation: The frequent furniture items are sorted in descending order of support to generate the item header table.
  • Tree Construction: In a second traversal, we reorder the items in each transaction record according to the header table. A root node is created (default count 1), and items from each record are inserted into the tree as branches. If an item shares the same path as an existing record, its node count is incremented; otherwise, a new branch is formed, linked by node pointers to maintain structural connectivity.
Step 2: Mining Frequent Design Contexts.
Using a bottom-up strategy based on the header table, we extract frequent furniture itemsets:
  • Path Extraction: For each specific furniture item, we trace all prefix paths in the FP-tree that terminate at that item; these paths, each carrying the minimum node count along the path, constitute the Conditional Pattern Base, which represents the specific contextual environment for that furniture.
  • Contextual Pruning: For instance, as shown in Figure 4, for the item “sofa,” the FP-tree reveals three distinct design paths: {chair:1, bed:1, table:1, sofa:1}, {chair:1, sofa:1}, and {table:1, sofa:1}. Consequently, the conditional pattern base for “sofa” is derived as {{chair, bed, table}:1, {chair}:1, {table}:1}.
  • Conditional FP-Tree Generation: We merge nodes within this base and prune items that fall below the support threshold. For example, if “bed” has a cumulative count of 1 (below threshold), it is removed. This results in a pruned Conditional FP-Tree for the sofa (e.g., branches <chair:2, table:1> and <table:1>). By applying this methodology recursively, we obtain the frequent itemsets and their respective support counts for all furniture categories.
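For intuition, Steps 1 and 2 can be emulated on a toy corpus by brute-force enumeration, which yields the same frequent itemsets as FP-Growth (FP-Growth simply avoids enumerating every subset). The transactions and threshold below are illustrative, not the paper's corpus:

```python
from itertools import combinations
from collections import Counter

# Toy category-level transactions (the paper uses 837 cleaned transactions).
transactions = [
    {"chair", "bed", "table", "sofa"},
    {"chair", "sofa"},
    {"table", "sofa"},
    {"chair", "table"},
]

S_MIN = 0.5  # toy support threshold; the paper uses S_min = 0.18

def frequent_itemsets(txns, s_min):
    """Enumerate all itemsets meeting minimum support.
    Output matches FP-Growth; FP-Growth reaches it via the compressed tree instead."""
    n = len(txns)
    counts = Counter()
    for t in txns:
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= s_min}

freq = frequent_itemsets(transactions, S_MIN)
assert freq[frozenset({"sofa"})] == 0.75            # sofa appears in 3 of 4 scenes
assert freq[frozenset({"chair", "sofa"})] == 0.5    # chair+sofa co-occur in 2 of 4
```

The exponential cost of this enumeration on large itemsets is precisely what motivates the FP-tree's compressed, two-pass construction.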
Step 3: Generation of Neuro-Symbolic Association Rules.
Based on the frequent itemsets derived from Steps 1 and 2, the algorithm generates candidate association rules to encode design logic. For an antecedent itemset X (e.g., {Sofa}) and a consequent itemset Y (e.g., {Coffee Table}), the strength of the rule X → Y is quantified by two statistical metrics: Support and Confidence.
First, Support measures the frequency of the co-occurrence in the global dataset D, filtering out statistically insignificant combinations:
Support(X → Y) = Count(X ∪ Y) / |D|
Second, Confidence measures the reliability of the inference. Crucially, this metric serves as the data-driven estimation for the Style Influence Weight (wuv) conceptually defined in Equation (1). It represents the conditional probability that a specific furniture type Y appears given the context X:
Confidence(X → Y) = P(Y | X) = Support(X ∪ Y) / Support(X)
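The two metrics can be computed directly from transaction counts. The sketch below uses toy room records, not the curated designer corpus, purely to make the definitions concrete.

```python
# Support and Confidence for association rules, computed over toy transactions.
def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Conf(X -> Y) = Support(X ∪ Y) / Support(X)
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rooms = [
    {"sofa", "table", "chair"},
    {"sofa", "table"},
    {"sofa", "cabinet"},
    {"bed", "cabinet"},
]
conf_sofa_table = confidence({"sofa"}, {"table"}, rooms)
# Support({sofa, table}) = 2/4 and Support({sofa}) = 3/4, so confidence = 2/3,
# which would survive the Conf_min = 0.4 pruning threshold.
```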
A minimum confidence threshold Confmin = 0.4 is applied to prune weak associations. However, raw association rules alone lack the structural hierarchy inherent in BIM semantics. To rigorously unify mined probabilistic patterns with explicit BIM ontological constraints, we formalize the final Furniture-Matching Knowledge Graph as a weighted directed graph, FMKG = (V, E, W). Here, V denotes the set of furniture entities and categories derived from the BIM ontology, and E comprises both ontological links and mined association links.
In our implementation, V includes 659 BIM family entities and 5 category nodes (|V| = 664). The edge set E contains 659 deterministic membership edges linking each family to its category, and 17 mined category-to-category association edges after applying the minimum confidence threshold (Confmin = 0.4), resulting in |E| = 676 directed edges in total. The retained directed rules and their confidence values are listed in Appendix A, Table A1. The retained association edges cover all five categories, with 3–4 outgoing rules per category, and their confidence values fall within [0.408, 0.711], providing a compact yet interpretable logical prior for the subsequent attention-based scene completion. Overall, the directed FMKG has an average out-degree of |E|/|V| ≈ 1.02; focusing on the compact category-level association subgraph, the average out-degree is 17/5 = 3.4.
Unlike traditional binary graphs, the edges in the FMKG are assigned continuous strengths through a unified edge-weighting function H:V × V → [0, 1], where H(u, v) = 0, if (u, v) ∉ E. This function maps heterogeneous relations onto a unified numeric scale for downstream computation. Specifically, for any directed edge connecting entity u to v, the weight wuv is edge-specifically assigned to unify deterministic ontological links and probabilistic aesthetic associations:
w_uv = H(u, v) =
    1,             if (u, v) ∈ R_onto
    Conf(u → v),   if (u, v) ∈ R_logic
    0,             otherwise
This mathematical formulation (Equation (7)) serves as a structured prior (i.e., a logical regularizer) for the system. It treats immutable BIM constraints (e.g., category membership and hierarchical relations) as hard semantic links with a unit weight, while scaling the influence of mined aesthetic associations by their statistical confidence. The resulting FMKG therefore provides a principled basis for the subsequent attention-based scene completion, enabling both controllable influence modulation and traceable rationale inspection.
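A minimal sketch of the edge-weighting function H is shown below. The family name is hypothetical; the two rule confidences are taken from Appendix A, Table A1.

```python
# Sketch of the unified edge-weighting function H (Equation (7)).
R_ONTO = {("family_bed_01", "bed")}           # deterministic membership edge (hypothetical family)
R_LOGIC = {("chair", "table"): 0.711,         # {chair} -> {table}, Table A1
           ("table", "chair"): 0.703}         # {table} -> {chair}, Table A1

def H(u, v):
    if (u, v) in R_ONTO:
        return 1.0                            # hard ontological link, unit weight
    if (u, v) in R_LOGIC:
        return R_LOGIC[(u, v)]                # mined confidence in [0, 1]
    return 0.0                                # (u, v) not in E
```

Because the graph is directed, H("chair", "table") and H("table", "chair") may differ, and any pair without an edge maps to 0, matching the piecewise definition above.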

4.2. Furniture Recommendation Using Scene Style Learning

Traditional style-based recommendation systems often treat items in isolation, failing to account for the holistic harmony of a multi-object BIM environment. To address this, KsDesign introduces a Context-Aware Scene Style Learning module. This approach shifts the paradigm from simple item matching to semantic scene completion, where the system treats the BIM environment not as a static collection of objects, but as a dynamic, weighted feature field governed by design logic.
The overall workflow, as illustrated in Figure 5, demonstrates how visual perception and logical reasoning converge. When a target scene is input—such as the example scene in Figure 5 containing four distinct furniture entities—the system does not merely average their styles. Instead, it initiates a neuro-symbolic fusion process. The system first parses the existing furniture instances (e.g., bed, cabinet, etc.) and simultaneously queries two distinct knowledge sources: the Furniture Style Vector Database for extracting implicit visual features, and the Furniture-Matching Knowledge Graph (constructed in Section 4.1) for retrieving explicit association rules. These two streams of information are then aggregated to form a global representation of the scene’s stylistic intent.
Formally, we quantify this process as follows. Let the current BIM scene be denoted as P = {pj | j = 1, 2, …, m}, where each pj represents an existing furniture instance with a corresponding visual style vector v(pj). To recommend a target furniture type t, the system must first determine the “attention weight” (wj) of every existing item pj relative to t. This weight is dynamically retrieved from the Knowledge Graph: if a strong association rule exists (e.g., a specific table style strongly implies a specific cabinet style), wj is set to the rule’s confidence score; if the item belongs to the same category as the target (T(pj) = t), wj is maximized to 1 to ensure self-consistency; conversely, if no logical association exists, wj is minimized to 0. Here, the type mapping T(pj) is obtained from BIM semantic metadata (e.g., family type/OmniClass), ensuring that category constraints are grounded in explicit engineering semantics rather than visual cues. While the current prototype implicitly targets common residential room types (bedroom/living room), room function can be explicitly incorporated by conditioning the KG-derived weight lookup on a room-type label extracted from BIM room/space metadata (e.g., using w = H(u, v | r)), enabling function-aware priors without changing the overall formulation.
Based on these cognitive weights, the system computes the Context-Aware Scene Style Vector v(P) as the weighted centroid of the scene:
v(P) = ( Σ_{j=1..m} w_j · v(p_j) ) / ( Σ_{j=1..m} w_j )
Because v(P) is formulated as a weighted centroid in Equation (8), the contribution of each context item is normalized by Σjwj within the scene; therefore, the confidence-based weights serve as relative priors that adjust the balance among context items rather than introducing an unnormalized scale.
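The weighted centroid and its normalization property can be verified with a few lines of NumPy; the 2-D style vectors below are toy values, not learned embeddings.

```python
import numpy as np

def scene_style_vector(weights, vectors):
    """Weighted centroid of context style vectors (Equation (8))."""
    w = np.asarray(weights, dtype=float)
    V = np.asarray(vectors, dtype=float)
    return (w[:, None] * V).sum(axis=0) / w.sum()

# Two context items with weights 1.0 and 0.5:
v_P = scene_style_vector([1.0, 0.5], [[1.0, 0.0], [0.0, 1.0]])
# v_P = [2/3, 1/3]; doubling every weight leaves v_P unchanged, showing that
# the confidence-based weights act as relative priors, not absolute scales.
```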
This weighted vector v(P) effectively represents the “ideal style” of the missing furniture, inferred from the context of the room. Finally, to generate the recommendation, the system calculates the cosine similarity between v(P) and every candidate furniture vector v(gi) in the library G:
cos(v(P), v(g_i)) = v(P) · v^T(g_i) / ( ‖v(P)‖ ‖v(g_i)‖ )
The candidate items are then ranked based on these similarity scores, and the top-K most harmonious items are output as the recommendation set B(P, K), as detailed in Algorithm 1. By integrating explicit design knowledge into the implicit vector space, KsDesign ensures that the generated recommendations are not only visually similar but also logically explained by the underlying design rules.
Algorithm 1. Recommendation algorithm for BIM scene completion
Input: BIM furniture style vector library G = {v(gi)|i = 1, 2, …, n}
BIM scene context: P = {pj|j = 1, 2, …, m}
Output: Top-K harmonious furniture list B(P, K)
1: S = [] // list to store similarity scores
2: t = target furniture type required by the user
3: // Step 1: Compute scene style vector
4: for each item pj in P:
5:         wj = SearchRule(T(pj), t) // query weight from FMKG
6: v(P) = WeightedCentroid({wj}, {v(pj)}) // Equation (8)
7://Step 2: Rank Candidates
8: for each candidate gi in G:
9:         si = CosineSimilarity(v(P), v(gi))//Equation (9)
10:       Add pair (gi, si) to S
11: Sort S in descending order based on si
12: B = {item from S[0], …, item from S[K − 1]}
13: return B
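Algorithm 1 can be sketched end-to-end in a few lines of Python. Category names and toy vectors below are illustrative, and the single rule weight is taken from Appendix A, Table A1; this is not the paper's learned embedding pipeline.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity (Equation (9))."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(context, target_type, rule_weights, library, k=3):
    """Sketch of Algorithm 1. `context` is a list of (category, style_vector)
    pairs; `rule_weights` maps (context_category, target_category) to FMKG
    confidence; same-category items get weight 1, unrelated items get 0."""
    weights, vectors = [], []
    for category, vec in context:
        if category == target_type:
            weights.append(1.0)                        # self-consistency
        else:
            weights.append(rule_weights.get((category, target_type), 0.0))
        vectors.append(vec)
    w = np.asarray(weights, dtype=float)
    V = np.asarray(vectors, dtype=float)
    v_P = (w[:, None] * V).sum(axis=0) / w.sum()       # Equation (8)
    scored = sorted(((name, cosine(v_P, np.asarray(vec, dtype=float)))
                     for name, vec in library.items()),
                    key=lambda p: p[1], reverse=True)
    return [name for name, _ in scored[:k]]            # B(P, K)

# With only the chair -> table rule active, the chair dominates v_P:
context = [("bed", [1.0, 0.0]), ("chair", [0.8, 0.2])]
rules = {("chair", "table"): 0.711}                    # from Table A1
library = {"t1": [0.8, 0.2], "t2": [0.0, 1.0], "t3": [1.0, 0.0]}
top = recommend(context, "table", rules, library, k=1)
```

Here the bed carries zero weight toward the target type, so v(P) coincides with the chair's style vector and the most chair-like candidate ranks first.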

5. Experiments

Beyond merely assessing recommendation accuracy, the experiments aim to validate two core hypotheses: (1) whether the neuro-symbolic fusion of visual features and explicit knowledge yields higher stylistic consistency than purely visual approaches; and (2) whether the knowledge-guided attention mechanism effectively simulates the cognitive decision-making process of human designers in complex BIM environments. In addition, we present an in-authoring visualization result in Revit to verify that the proposed explanations are operational and can directly support family insertion in practical workflows.

5.1. Experimental Setup

Experimental Environment: The validation was conducted on a workstation equipped with an Intel(R) Core(TM) i5-13500H CPU, 16 GB RAM, and an NVIDIA GeForce 3050 GPU, running Windows 11. The algorithms were implemented in Python 3.6, utilizing PyTorch 1.10.2 for deep learning computations.
Dataset Construction and Scenario Generation: To validate the scene-completion capability of our method, we used real-world residential BIM room models from public libraries. To bridge 3D BIM entities with the visual learning module, we processed all BIM models using the automated optimal-viewpoint projection pipeline in [12], which produces standardized 2D renderings for style embedding. Based on this pipeline, we constructed a candidate library comprising five core furniture categories—bed (107), table (121), chair (56), sofa (256), and cabinet (119). To reflect practical furnishing workflows while avoiding trivial random combinations, we systematically composed 10 BIM scenes (Table 1), primarily covering bedroom and living-room settings, with varied context compositions and stylistic constraints. This design allows us to evaluate whether the proposed knowledge-guided attention mechanism remains effective across different scene contexts rather than in a single hand-picked example.
Ground Truth Establishment via Expert Consensus: A critical challenge in evaluating aesthetic design is the subjectivity of “correctness”. To establish a reliable Ground Truth that reflects professional tacit knowledge, we recruited 20 practitioners from interior design-related fields. For each to-be-designed BIM scene, these experts selected the most stylistically compatible furniture from the dataset. To mitigate individual bias, a piece of furniture was labeled as “Stylistically Consistent” only if it was selected by a consensus of more than 10 experts. This rigorous process ensures that our evaluation benchmarks reflect human-level aesthetic standards. Together with the evaluation across all 10 scenes, this setting helps reduce the risk that the reported gains are driven by a single scene or a single subjective preference.
Evaluation Metrics: We employ two quantitative metrics to measure the performance of KsDesign.
Precision (Aesthetic Hit Rate): Indicates the proportion of recommended results (I(P)) that align with the expert consensus (A(P)). High precision signifies the system’s ability to filter out stylistically dissonant items.
Precision = | I(P) ∩ A(P) | / | I(P) |
Recall (Design Coverage): Measures the proportion of valid design options retrieved by the system relative to the total available valid options.
Recall = | I(P) ∩ A(P) | / | A(P) |
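Both metrics reduce to simple set operations. The dummy item IDs below mirror the 13-of-15 case reported later in Section 5.2.1.

```python
# Precision and Recall against the expert-consensus ground truth.
def precision_recall(recommended, ground_truth):
    I, A = set(recommended), set(ground_truth)
    hits = len(I & A)
    return hits / len(I), hits / len(A)

recommended = set(range(15))              # 15 recommended items (dummy IDs)
consensus = set(range(13)) | {100, 101}   # 13 of them validated by experts
p, r = precision_recall(recommended, consensus)
# p = 13/15 ≈ 0.867, matching the 86.7% precision figure
```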
Parameter setting: In all experiments, for a given BIM scene input, the system generates the Top-3 (K = 3) recommendations for each missing furniture category sequentially.
Revit-side explainability visualization: In addition to offline precision/recall evaluation, we integrate the recommendation output into Revit to visualize the KG-derived influence weights and evidence paths for a given target type, allowing designers to insert the selected family into the current scene. This qualitative result is reported in Section 5.2.4.

5.2. Results and Mechanism Analysis

To provide a comprehensive assessment of the proposed framework, our evaluation is structured into four phases. First, we conduct a qualitative case study to verify the effectiveness of the visual-semantic feature extraction. Second, we perform a comparative ablation study to demonstrate the necessity of the knowledge-guided attention mechanism. Third, we analyze the statistical robustness of the method across distinct design scenarios to validate its generalization capability. Finally, we report an in-authoring explainability visualization in Revit to illustrate how the KG-derived influence weights and evidence paths support designer-facing rationale inspection and direct family insertion.

5.2.1. Efficacy of Visual-Semantic Feature Extraction

We first evaluate the effectiveness of KsDesign using a representative scene (containing a chair and a bed) as a case study. The system computed the Context-Aware Scene Style Vector and retrieved the most compatible items. The quantitative results for this case study are summarized in Table 2. Notably, the computed cosine similarity scores remain high across all recommendations, with even the lowest score reaching 0.8694. This high baseline indicates that our feature extraction module successfully maps diverse furniture entities into a cohesive vector space, ensuring that all generated candidates maintain a high degree of visual harmony.
When validated against professional ground truth, as shown in Table 3, the system demonstrates exceptional alignment with human intent. Specifically, 13 out of 15 recommended items were validated by experts, yielding a Precision of 86.7%. This result confirms that by weighting visual features with knowledge-based attention, KsDesign effectively filters out stylistically dissonant items, achieving a design precision that purely geometric or random methods cannot attain.

5.2.2. Comparative Analysis: Global Context vs. Local Features

To demonstrate the superiority of our neuro-symbolic approach, we performed an ablation study comparing KsDesign against traditional single-modal baselines (i.e., recommending based solely on the style of one existing piece of furniture, without the global knowledge graph).
The limitations of local-feature approaches become evident when analyzing the results in Table 4 and Table 5. When recommendations are based strictly on the style of a single chair, the precision drops significantly to 60% (Table 4). Similarly, relying solely on the bed’s aesthetics results in a further performance decline to 53.3% (Table 5).
Comparing the global approach (Table 3) with these local baselines (Table 4 and Table 5) reveals the “reasoning” capability of our system. While the visual correlation between “Chair” and “Table” is locally strong enough to yield identical results across methods, the system’s behavior changes for the chair category itself. As observed, the ranking order differs between the single-modal approach and KsDesign. This indicates that the Knowledge Graph intervened: by considering the global context (including the Bed), the system adjusted the attention weights, realizing that what matches the “Chair” best locally might not act as the optimal component for the “Bedroom Context” globally. The Knowledge Graph effectively acts as a logical regularizer, correcting local biases introduced by single-object matching.

5.2.3. Robustness Across Diverse Scenarios

To ensure these findings are not anecdotal, we expanded the evaluation to 10 distinct BIM scenes. The comprehensive performance metrics comparing KsDesign against traditional baselines are tabulated in Table 6.
As evidenced by the statistical data, KsDesign consistently outperforms local-feature methods across all scenarios. The proposed method achieves an average increase of 23.5% in precision, with a maximum performance boost of 40% in complex scenes, while average recall improved by 15.5%. These results demonstrate that as the complexity of the scene increases (i.e., more furniture constraints), the advantage of KsDesign becomes more pronounced. While simple visual matching struggles with conflicting stylistic cues, our Context-Aware Scene Style Learning module successfully aggregates these cues into a unified, harmonious design solution.

5.2.4. In-Authoring Explainability Visualization in Revit

To further examine whether the proposed explanations are operational in real BIM authoring workflows, we integrate the recommendation output with a Revit-side visualization. In this setting, the target furniture type t is specified by the user (e.g., cabinet), and the system ranks candidate BIM families within this type by maximizing stylistic harmony with the current scene context. The explanation interface exposes the intermediate reasoning variables used by KsDesign—namely, the FMKG-derived influence weights wj and the corresponding evidence paths that support each weight—so that designers can inspect why a candidate is ranked higher and then insert the selected family into the active scene.
Figure 6 illustrates the visualization layout. The graph summarizes how each existing context object pj contributes to inferring the ‘ideal’ style for the user-specified target type t: edges are annotated with the influence weights wj retrieved from the FMKG (confidence-based logical priors), which are subsequently used to compute the context-aware scene style vector v(P) (Equation (8)) for ranking and retrieval. Rather than selecting the furniture category, the visualization explains how the current scene context modulates the style inference for the chosen target type, making the recommendation rationale transparent and directly actionable in Revit.
The visualization is also useful for inspecting how the balance between FMKG-derived priors and visual evidence varies across scenes. When a few FMKG edges carry relatively large wj, the inferred scene representation v(P) tends to be driven by these logical priors; when most wj values are small due to sparse FMKG links, the ranking is predominantly governed by the visual similarity in the learned style space. By making wj and its supporting evidence path explicit, the interface allows designers to identify such dominance patterns and interpret recommendations in cases where the logical prior and the perceived visual context are not fully aligned.
Taken together, the experimental results confirm the feasibility and superiority of the proposed framework. By integrating visual perception with logical reasoning, KsDesign not only significantly improves recommendation metrics but, more importantly, ensures that the generated BIM scenes satisfy the holistic stylistic consistency required by professional standards. This verifies that our neuro-symbolic approach successfully bridges the semantic gap between engineering constraints and aesthetic design.

6. Discussion

This section interprets the reported results from three angles: problem positioning, mechanism-level explanation, and robustness and scope. The discussion is organized accordingly across Section 6.1, Section 6.2, Section 6.3, Section 6.4 and Section 6.5.

6.1. Addressing the “Semantic Gap” in BIM Interior Assistance

BIM-oriented automation is strong at encoding explicit, checkable constraints through formal rules and compliance-style checking, yet it is comparatively less suited to capturing the implicit aesthetic regularities that designers apply during scene completion [29,30,31]. KsDesign targets this practical gap by injecting an interpretable, data-driven co-occurrence prior (“what tends to appear with what”) while preserving appearance-driven intent derived from BIM renderings. Rather than replacing symbolic reasoning, the framework complements constraint-centric BIM workflows by adding an evidence-backed aesthetic prior that is both controllable and auditable. Importantly, “scene completion” is treated as candidate classification and retrieval: the target type is specified, candidates are ranked from a BIM family library [7,8], and the intermediate variables that drive ranking remain inspectable in the authoring environment. This framing aligns expectations with the intended contribution and avoids conflating the method with generative synthesis. In contrast to purely rule-based checking pipelines and purely appearance-driven similarity retrieval [7,8,10], KsDesign makes the aesthetic prior explicit, bounded, and traceable within the authoring loop, enabling scene-level completion decisions to be justified through inspectable intermediate variables rather than opaque scoring alone.

6.2. Interpreting the Empirical Gains: Why Neuro-Symbolic Fusion Helps

The main empirical implication is that fusing scene-level visual style representations with FMKG-derived priors improves recommendation stability at the scene level while retaining authoring-time interpretability [23,25,26]. The FMKG does not function as a hard-rule system; instead, it provides bounded conditional priors through attention weights wj ∈ [0, 1] that modulate how strongly each contextual object contributes to the target style estimate. This design makes the fusion behavior auditable: when a small set of high-confidence priors dominates, the inference becomes knowledge-dominant; when priors are sparse, it becomes visual-dominant and relies primarily on similarity in the embedding space. Because the explanation interface exposes the same weights and evidence paths that participate in ranking, explanations remain coupled to the decision pathway rather than being post hoc narratives [16,24]. Therefore, the reported gains should be interpreted as the effect of explicit, traceable modulation of scene representation, not as an opaque improvement from increased backbone capacity.

6.3. Representation Choices: Gram-Matrix Style Features and Credible Published Alternatives

Gram-matrix-based style descriptors are adopted because they capture correlations of texture and material cues and are comparatively less sensitive to object layout, matching the operational notion of “style consistency” required for interior scene completion [36]. In practice, this representation emphasizes appearance regularities that are salient to design harmony while remaining compatible with the BIM-to-2D rendering pipeline used for authoring-time retrieval [12,13]. At the same time, modern foundation encoders provide credible alternatives for benchmarking (e.g., CLIP, ViT, and DINO) [37,38,39], particularly when stronger language alignment or broader visual robustness is desired. In this manuscript, the visual embedding module is intentionally kept consistent so that the contribution of KG-guided fusion and in-authoring explainability can be isolated and validated without confounding factors. A systematic encoder comparison is therefore best positioned as a controlled benchmark extension after the fusion mechanism and its interpretability properties are established. This design choice keeps the discussion centered on the fusion mechanism and the explainability interface, while leaving representation benchmarking as a separable axis of evaluation.

6.4. Robustness of FMKG Construction: Thresholds, Sparsity, and Normalized Association Measures

FMKG construction follows association-rule mining, where support and confidence provide transparent criteria for extracting co-occurrence regularities [22]. Confidence is used deliberately as the edge weight because it corresponds directly to the conditional dependency in the formulation and remains bounded in [0, 1], which is suitable for exposing category-level influence as an interpretable prior. The weights are further normalized within each scene during the weighted centroid computation, so their role is to modulate influence rather than to act as unbounded scores. Under this design, the mining thresholds primarily control the sparsity and compactness of the category-level association layer and therefore affect the balance between knowledge-dominant regularization and visual-dominant retrieval, while keeping the fusion behavior auditable through explicit weights. As a result, adjusting Smin and Confmin typically leads to gradual shifts in sparsity/coverage and the knowledge-visual balance, rather than brittle changes driven by any single association.
A related consideration is frequency bias: since confidence can be affected by globally frequent categories, normalized interestingness measures (e.g., lift or PMI/NPMI) are plausible alternatives for weighting association strength [40]. However, these measures change the semantics of edge weights and can interact with thresholding by altering which associations are retained as “salient.” Their impact is therefore best evaluated under a matched-threshold, controlled ablation protocol that separates changes in sparsity from changes in weighting semantics. Within the current scope, confidence offers the most direct designer-interpretable conditional prior, while normalized measures remain meaningful candidates for future systematic comparison when scaling to richer category sets and larger case corpora.
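The frequency-bias point can be made concrete with the standard definitions of confidence, lift, and NPMI. The counts below are invented solely to show how a globally frequent consequent inflates confidence while the normalized measures flag independence.

```python
import math

def confidence(n_xy, n_x):
    """Conf(X -> Y) from co-occurrence counts."""
    return n_xy / n_x

def lift(n_xy, n_x, n_y, n):
    # Lift = P(X, Y) / (P(X) P(Y)); 1.0 indicates independence.
    return (n_xy / n) / ((n_x / n) * (n_y / n))

def npmi(n_xy, n_x, n_y, n):
    # Normalized PMI in [-1, 1]; 0 indicates independence.
    p_xy = n_xy / n
    pmi = math.log(p_xy / ((n_x / n) * (n_y / n)))
    return pmi / -math.log(p_xy)

c = confidence(60, 80)        # 0.75: looks like a strong rule...
l = lift(60, 80, 75, 100)     # 1.0: ...but X and Y are statistically independent
```

Because lift and NPMI change the semantics of the edge weights (they are no longer conditional probabilities), swapping them in would also change which associations clear the retention threshold, which is why the text recommends a matched-threshold ablation.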

6.5. Scope, Failure Modes, and Extensions That Preserve the Paper’s Core Framing

The evaluation is designed as a controlled but practically representative setting to validate the end-to-end neuro-symbolic pathway and the in-authoring explainability mechanism under typical residential completion tasks. Within this scope, two qualitative failure tendencies are most relevant. First, knowledge-dominant behavior can occur when strong priors over-regularize eclectic or intentionally contrasting intents; in such cases, the weighted inference may lean toward conventional co-occurrence logic. Second, visual-dominant behavior can arise when priors are sparse; the ranking then relies primarily on similarity in the embedding space, and ambiguities in BIM renderings (e.g., limited material expressiveness or visually confusable cues) can mislead retrieval.
The Revit-side explainability visualization is therefore integral: by exposing FMKG-derived influence weights and evidence paths, it makes it possible to inspect whether a recommendation is driven mainly by a small number of high-weight priors or by distributed visual similarity across context objects, and to diagnose conflicts when logical evidence and visual expectations diverge [16,24]. Extensions that preserve the same auditable formulation include expanding the label set and FMKG schema to cover more furniture categories, conditioning priors on room function via BIM room/space metadata, and benchmarking stronger published visual encoders under a consistent protocol while keeping the explanation interface as a first-class requirement [37,38,39].

7. Conclusions

This study addresses the persistent semantic dissonance between rigorous engineering constraints and implicit aesthetic perception in intelligent BIM design. By proposing KsDesign, a neuro-symbolic framework, we bridge this gap, orchestrating visual semantic perception with explicit knowledge reasoning. Unlike traditional methods that treat furniture recommendation as an isolated retrieval task, our approach reframes it as a context-aware inference and scene-completion process: a knowledge-guided attention mechanism simulates the cognitive intuition of professional designers by leveraging category-level compatibility rules encoded in a Furniture-Matching Knowledge Graph, and synthesizes a global scene style vector v(P) from BIM-derived style embeddings to rank and retrieve stylistically harmonious candidates.
Empirical validation, grounded in professional expert consensus, confirms the robustness of this dual-process mechanism. In our setting, curated designer images are used solely for association-rule mining to construct the FMKG, while style representations are learned and extracted exclusively from BIM-derived 2D renderings of 3D family models and BIM scenes. The framework achieved a recommendation precision of 86.7% in complex scenarios, outperforming traditional single-modal style learning approaches by an average of 23.5%. Crucially, these results demonstrate that the integration of global semantic constraints effectively acts as a logical regularizer, correcting the local biases inherent in purely visual models. This ensures that computational outputs do not merely satisfy geometric proximity but align with high-level design logic, creating spaces that are both logically valid and aesthetically coherent.
Beyond performance metrics, this work advances the explainability of AI in the AEC domain, fostering greater trust by providing transparent design rationales grounded in knowledge graph paths and instantiated as an in-authoring Revit visualization that supports rationale inspection and direct family insertion. While the current iteration focuses on stylistic compatibility, the underlying graph-based architecture offers a versatile foundation for future research. Specifically, subsequent work will aim to extend this logic to geometric layout optimization and integrate Large Language Models (LLMs) to automate the expansion of design knowledge, paving the way for natural language-driven interactive BIM generation.

Author Contributions

J.F.: Conceptualization, Methodology, Formal analysis, Writing—Original draft preparation; R.L.: Conceptualization, Methodology, Formal analysis, Writing—Original draft preparation; X.L.: Conceptualization, Methodology, Writing—Review and Editing, Supervision; X.Z.: Methodology, Writing—Review and Editing; M.W.: Investigation, Validation, Writing—review & editing; J.Y.: Validation, Writing—review & editing; H.Y.: Conceptualization, Methodology, Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Specialized Research Fund for the State Key Laboratory of Solar Activity and Space Weather.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Author Mengmeng Wang was employed by the company Capital Engineering & Research Incorporation Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Furniture collocation rules.
Number | Association Rule | Confidence (%)
1 | {sofa} → {chair} | 40.8
2 | {cabinet} → {sofa} | 44.2
3 | {sofa} → {cabinet} | 47.0
4 | {sofa} → {bed} | 54.9
5 | {sofa} → {table} | 67.3
6 | {table} → {sofa} | 53.5
7 | {cabinet} → {table} | 57.0
8 | {bed} → {sofa} | 41.2
9 | {bed} → {chair} | 43.0
10 | {chair} → {bed} | 46.1
11 | {table} → {cabinet} | 48.2
12 | {cabinet} → {chair} | 62.8
13 | {chair} → {cabinet} | 53.7
14 | {bed} → {cabinet} | 51.5
15 | {cabinet} → {bed} | 64.6
16 | {table} → {chair} | 70.3
17 | {chair} → {table} | 71.1

Figure 1. Furniture style vector.
Figure 2. Overview of KsDesign with offline knowledge graph generation and online scene style-based recommendation.
Figure 3. Furniture-matching knowledge graph construction process based on intelligent mining.
Figure 4. Framework of visual-semantic knowledge extraction and graph construction.
Figure 5. Furniture recommendation process using scene style learning.
Figure 6. Revit-integrated explainability visualization for KsDesign.
Table 1. BIM scene dataset.

(Renderings of the ten BIM test scenes, Nos. 1–10; images not reproduced here.)
Table 2. Recommendation results and style similarity of a BIM scene (Top-3 per furniture category; thumbnail images of the recommended families not reproduced here).

Rank | Chair | Table | Sofa | Bed | Cabinet
1 | 0.9069 | 0.8851 | 0.9343 | 0.9462 | 0.8934
2 | 0.9009 | 0.8694 | 0.9327 | 0.9387 | 0.8734
3 | 0.8923 | 0.8664 | 0.9265 | 0.9304 | 0.8730
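The similarity scores in Table 2 rank candidate families against the synthesized scene-style representation. Assuming cosine similarity over the learned style embeddings (the metric, dimensionality, and vectors below are illustrative, not the paper's exact configuration):

```python
import math

def style_similarity(u, v):
    """Cosine similarity between two style embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-D embeddings for a scene and a candidate family;
# real style vectors would be high-dimensional CNN/ViT features.
scene = [0.20, 0.70, 0.10, 0.50]
candidate = [0.25, 0.60, 0.20, 0.45]
print(round(style_similarity(scene, candidate), 4))
```

Candidates are then sorted by this score within each category, yielding the per-column Top-3 lists shown in Table 2.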
Table 3. Recommended results and precision of a BIM scene (1 = relevant, 0 = not relevant; thumbnail images not reproduced here).

Rank | Chair | Table | Sofa | Bed | Cabinet
1 | 1 | 1 | 1 | 1 | 1
2 | 1 | 1 | 1 | 1 | 1
3 | 0 | 0 | 1 | 1 | 1
Table 4. Recommendation results and precision for a chair BIM furniture family (1 = relevant, 0 = not relevant; thumbnail images not reproduced here).

Rank | Chair | Table | Sofa | Bed | Cabinet
1 | 1 | 1 | 1 | 0 | 1
2 | 1 | 1 | 0 | 0 | 0
3 | 0 | 0 | 1 | 1 | 1
Table 5. Recommendation results and precision for a bed BIM furniture family (1 = relevant, 0 = not relevant; thumbnail images not reproduced here).

Rank | Chair | Table | Sofa | Bed | Cabinet
1 | 0 | 1 | 1 | 1 | 0
2 | 0 | 0 | 1 | 1 | 1
3 | 0 | 1 | 1 | 0 | 0
Table 6. Precision and recall of recommendation results in different BIM scenes. Columns labeled "n.m" are furniture-level reference styles; columns labeled "n" are the corresponding scene-level reference.

Reference style | 1.1 | 1.2 | 1 | 2.1 | 2.2 | 2
Precision (%) | 53.3 | 60.0 | 86.7 | 53.3 | 46.7 | 80.0
Recall (%) | 30.8 | 34.6 | 50.0 | 34.8 | 30.4 | 52.2

Reference style | 3.1 | 3.2 | 3 | 4.1 | 4.2 | 4
Precision (%) | 40.0 | 60.0 | 66.7 | 33.3 | 53.3 | 60.0
Recall (%) | 25.0 | 37.5 | 41.7 | 23.8 | 38.1 | 42.9

Reference style | 5.1 | 5.2 | 5 | 6.1 | 6.2 | 6
Precision (%) | 46.7 | 60.0 | 80.0 | 33.3 | 46.7 | 53.3
Recall (%) | 33.3 | 42.9 | 57.1 | 22.7 | 31.8 | 36.4

Reference style | 7.1 | 7.2 | 7 | 8.1 | 8.2 | 8
Precision (%) | 53.3 | 53.3 | 73.3 | 46.7 | 46.7 | 66.7
Recall (%) | 32.0 | 32.0 | 44.0 | 31.8 | 31.8 | 45.5

Reference style | 9.1 | 9.2 | 9.3 | 9 | 10.1 | 10.2 | 10.3 | 10
Precision (%) | 60.0 | 46.7 | 40.0 | 86.7 | 53.3 | 46.7 | 33.3 | 66.7
Recall (%) | 32.1 | 25.0 | 21.4 | 46.4 | 26.1 | 30.4 | 21.7 | 43.5
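The precision and recall figures in Table 6 follow the standard Top-K (K = 3) retrieval definitions, evaluated against the expert-consensus ground truth. A minimal sketch with hypothetical item identifiers:

```python
def precision_recall_at_k(recommended, relevant, k=3):
    """Top-K retrieval metrics.

    precision@K = hits in the top K / K
    recall@K    = hits in the top K / total relevant items
    """
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 2 of the 3 recommended families appear in a
# 4-item expert ground-truth set.
p, r = precision_recall_at_k(["f1", "f2", "f3"], {"f1", "f3", "f7", "f9"})
print(round(p, 3), round(r, 3))  # → 0.667 0.5
```

Because recall@K is bounded above by K divided by the ground-truth set size, the recall values in Table 6 are naturally lower than the corresponding precision values when the expert sets contain more than three relevant items.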
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Feng, J.; Luo, R.; Li, X.; Zhou, X.; Wang, M.; Yin, J.; Yuan, H. Bridging the Semantic Gap in BIM Interior Design: A Neuro-Symbolic Framework for Explainable Scene Completion. Appl. Sci. 2026, 16, 1530. https://doi.org/10.3390/app16031530

