OM-GPT: A Knowledge-Augmented and Fine-Tuned Large Language Model for Prefabricated Building Operation and Maintenance Management

Sun, Lingzhi; Zou, Linyan; Zhang, Yuanxin; Flood, Ian

doi:10.3390/buildings16071429



¹ Department of Construction Management, School of Civil Engineering and Architecture, Shandong University of Science and Technology, No. 579 Qianwan Port Road, Huangdao District, Qingdao 266510, China

² Department of Construction Management, School of Management, Guangzhou University, 230 Outer Ring, Rd., University Town, Panyu District, Guangzhou 510006, China

³ M.E. Rinker, Sr. School of Construction Management, University of Florida, Gainesville, FL 32611, USA

* Author to whom correspondence should be addressed.

Buildings 2026, 16(7), 1429; https://doi.org/10.3390/buildings16071429

Submission received: 3 February 2026 / Revised: 31 March 2026 / Accepted: 1 April 2026 / Published: 3 April 2026

(This article belongs to the Special Issue AI in Construction: Automation, Optimization, and Safety)


Abstract

The operation and maintenance (O&M) management of prefabricated buildings often struggles with fragmented knowledge and low reusability, relying predominantly on expert experience. While large language models (LLMs) offer a potential solution, their inherent hallucination issues significantly hinder practical application. To address these issues, this study proposes a knowledge base-augmented OM-GPT for prefabricated building O&M, built on a hybrid architecture that combines domain-specific fine-tuning with graph-based retrieval-augmented generation (GraphRAG). Specifically, the study first fine-tuned the LLM Qwen2.5 using specialized O&M data to enhance its understanding of O&M tasks. It then constructed a multi-relational knowledge graph within a GraphRAG framework to mitigate model hallucinations. Experimental results demonstrate that the fine-tuned model achieved excellent Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores, validating the success of domain adaptation. In a five-dimensional evaluation, the knowledge base-augmented OM-GPT significantly outperformed both GPT-4 and DeepSeek. Furthermore, two-way ANOVA tests confirmed that the model’s advantages generalize across all five evaluation dimensions.

Keywords:

large language model; fine-tuning; GraphRAG; prefabricated buildings; O&M

1. Introduction

The modular production inherent in prefabricated buildings, while enhancing construction efficiency, simultaneously introduces significant operation and maintenance (O&M) management challenges. These include complex component interface coordination and fragmented lifecycle data. Existing research predominantly focuses on areas such as sustainability and environmental impact assessment [1], design methods and production control [2,3], and government-driven management optimization [4,5]. Conversely, there is a notable lack of systematic exploration into knowledge acquisition and decision support during the O&M phase. This construction-centric research paradigm, which neglects O&M, means that O&M decisions still heavily rely on manual expertise. This issue is further exacerbated by an imbalance in the industry’s talent structure, leading to a bimodal talent gap in the prefabricated building O&M sector. Cultivating senior engineers with comprehensive knowledge integration capabilities requires several years. Simultaneously, the annual turnover rate among newer-generation technicians remains persistently high due to a mismatch between knowledge update pressures and career growth expectations. This contradiction intensifies against the backdrop of digital transformation, urgently necessitating technological solutions to achieve the explicit representation and standardized transfer of tacit knowledge.

Recent breakthroughs in large language models (LLMs) for knowledge-intensive areas offer a potential solution to these problems, demonstrating significant advantages in instruction parsing and text generation [6]. However, existing studies indicate that LLMs may generate ambiguous or factually incorrect outputs when handling complex relationships [7]. While some researchers have attempted to address these issues using knowledge graphs (KGs), challenges such as high manual annotation costs and limited multi-hop reasoning capabilities persist [8,9,10]. Against this backdrop, LLM architectures integrated with KGs have emerged as a key research focus [11,12]. Their core value lies in the KG’s ability to provide structured knowledge injection during both LLM pre-training and inference, simultaneously enhancing the interpretability of model decisions. Building upon this, this study constructs a domain-specific knowledge base grounded in KGs to address issues of professional data fragmentation and digital silos.

Previous research has consistently shown that fine-tuning LLMs with limited amounts of high-quality, domain-specific data can achieve performance comparable to models like MiniGPT-4 or GPT-4 on specialized tasks [13,14]. This indicates that domain-specific fine-tuning can significantly boost the response quality of LLMs in specialized scenarios. Therefore, this study proposes a hybrid architecture integrating fine-tuning techniques with knowledge base augmentation. This study introduces innovations across three key dimensions to construct an LLM specifically tailored for this domain. Specifically, (1) Data collection: this study combines authoritative data retrieval and AutoGen’s automated generation strategies to build a comprehensive, domain-specific dataset for prefabricated building O&M management, achieving semantic alignment between the structured knowledge base and unstructured text. (2) Model training: this study employs Low-Rank Adaptation (LoRA) to efficiently update only rank decomposition matrices and optimize loss functions to adjust model parameters. This process yields an O&M GPT (OM-GPT) model whose linguistic patterns are better aligned with the prefabricated building O&M domain and which exhibits enhanced professionalism. (3) Knowledge reasoning: this study introduces a Graph-based Retrieval-Augmented Generation (GraphRAG) framework to construct a domain-specific knowledge base for prefabricated building O&M. This enables crucial semantic association and interpretable reasoning.

The remainder of the paper is structured as follows: Section 2 explores the state of the art in prefabricated buildings and LLM technologies. Section 3 details the key aspects of data collection, model fine-tuning, and knowledge base construction. Section 4 discusses the results and their validation. Section 5 concludes the paper by summarizing our findings and proposing potential avenues for future research.

2. Literature Review

2.1. Research Related to Prefabricated Buildings

Prefabricated buildings have experienced rapid development and sustained strong growth momentum, largely supported by national policies [15]. This growth is driven by their inherent advantages, including energy conservation, environmental protection, shortened construction periods, and enhanced construction quality [16]. Consequently, research on prefabricated buildings has garnered significant scholarly attention, primarily focusing on several key areas such as sustainability and environmental impact, design methods, production control, and component installation and automation.

As for sustainability and environmental impact, Batikha et al. [17] compared 3D concrete printing with other construction methods in terms of construction time, cost and CO₂ emissions. Liu et al. [18] proposed a building information modeling (BIM)-integrated two-stage metaheuristic search framework for bridge structure optimization to reduce carbon emissions. In design methods and production control, Liu et al. [19] automatically generated clash-free rebar designs in prefabricated concrete wall panels using Generative Adversarial Networks and Deep Reinforcement Learning. Alabbasi et al. [20] utilized parametric modeling, topology optimization, finite element analysis, and robotic 3D printing to manufacture optimized precast reinforced concrete columns. Regarding component installation and automation, Nguyen et al. [21] developed a new method that integrates design for manufacturing and assembly (DfMA) principles with parametric BIM to create a preassembly analysis system aimed at reducing assembly errors through automated assembly-aware manufacturing. Liu et al. [18] proposed a vision-based, robot-assisted prefabricated component installation system to improve assembly accuracy. In enterprise management and business models, Pan et al. [22] focused on how enterprises adopted off-site technologies to advance strategic management. Regarding government-driven policies and management optimization, Pan et al. [23] examined the decision-making behaviors of governments, developers, and consumers in promoting prefabricated building development in China, offering policy recommendations for all three parties. Some studies address schedule and cost management across various stages of the prefabricated building lifecycle [24]. Innovation in prefabricated buildings is another significant research area; for instance, Zhang et al. [25] integrated the Quadruple Helix Model with system dynamics to explore factors influencing the formation and dynamic evolution of prefabricated construction innovation networks. Additionally, quality management [26], risk management [27], and supply chain management [28] are crucial research components in this area.

While existing literature on O&M management is substantial, it primarily focuses on traditional buildings and covers areas like information management, maintenance management, energy management, and emergency management [29]. For instance, Rashid et al. [30] integrated Internet of Things (IoT) infrastructure with BIM-based virtual environments to extend BIM’s utility into building operational phases. Vitiello et al. [31] adopted a simplified semi-probabilistic approach within BIM models to assess economic performance variations and losses caused by seismic hazards. Bonci et al. [32] proposed a BIM-based cyber–physical system for automated facility monitoring and performance evaluation. Ma and Wu [33] developed a fire emergency management system, built upon a BIM platform, that incorporates occupant behavior decision-making. However, the literature search conducted for this study reveals a scarcity of research specifically addressing prefabricated building O&M management.

2.2. Research Related to LLMs

While the rapid advancement of LLMs has propelled the practical implementation of artificial intelligence (AI), it has also brought to light a significant challenge: the hallucination bottleneck. This issue involves LLMs generating factually and logically inconsistent outputs, often appearing as factual errors, reasoning contradictions, and overuse of specialized jargon [34]. To address this bottleneck, academic research has largely pursued two strategies: fine-tuning existing models and augmenting them with external knowledge.

When it comes to fine-tuning LLMs, traditional research emphasized full fine-tuning (FFT). However, as LLMs have grown in size, FFT paradigms have become highly inefficient and unsustainable, often requiring thousands of GPUs working in parallel. This has spurred the emergence of Parameter-Efficient Fine-Tuning (PEFT) [35], which fixes most pre-trained parameters while fine-tuning a minimal subset, enabling LLMs to quickly adapt to various downstream tasks. Remarkably, PEFT can sometimes even surpass FFT performance. For example, Liu et al. [36] proposed a deep prompt tuning approach based on P-Tuning, which applies continuous prompts to each layer of the pre-trained model, not just the input layer. Experiments showed consistent improvements across model scales, achieving near parity with FFT on billion-parameter models. Hu et al. [37] developed Low-Rank Adaptation (LoRA), which freezes pre-trained model weights and injects trainable rank decomposition matrices into every layer of the transformer architecture. This drastically reduces trainable parameters for downstream tasks, with results demonstrating that LoRA matches or exceeds FFT quality on GPT-3 and DeBERTa. Mao et al. [38] introduced the UniPELT framework, which incorporates different parameter-efficient language model tuning (PELT) methods as submodules and learns to activate the best-suited methods for the current data or task via a gating mechanism. Experimental data show that hybrid PELT methods outperform single approaches. Nevertheless, exclusive reliance on fine-tuning faces limitations such as delayed dynamic knowledge updates and risks of domain overfitting. This is particularly problematic in scenarios requiring real-time responses to industry regulations, such as prefabricated building O&M, where model performance enhancements can easily encounter bottlenecks.

In knowledge augmentation, existing research has primarily focused on the development and application of ontological knowledge bases. For example, Bonci et al. [32] developed an ontological knowledge management tool to provide essential simulation knowledge for inexperienced users from a numerical simulation perspective. Similarly, Fitkau and Hartmann [39] proposed a Fire Safety Ontology that integrates expert opinions with architectural design knowledge, using inference engines to automatically deduce structural fire safety requirements from regulations. However, with the continuous evolution of retrieval technologies, knowledge bases have gradually transitioned from ontologies to RAG. RAG can effectively address information latency issues when external knowledge sources are regularly updated [40], enabling timely and accurate responses [41]. Examples include Heredia Álvaro and Barreda [42], who applied RAG in tile manufacturing scenarios to enhance quality control through integrated knowledge retrieval and text generation. Additionally, Uhm et al. [43] developed an RAG-GPT model, whose effectiveness in construction safety was demonstrated through comparisons with four existing GPT models. While RAG can partially mitigate hallucination in LLMs, it still has limitations, particularly when handling statistical and summarization-focused queries such as Query-Focused Summarization (QFS) [44]. With Microsoft’s open-sourcing of GraphRAG, knowledge base development has entered a new phase. GraphRAG improves knowledge relevance and scalability through graph-structured representations of knowledge relationships, demonstrating superior flexibility and accuracy in handling complex problems [45].

The integration of fine-tuning techniques with GraphRAG offers strong complementary advantages. Fine-tuning enhances the model’s intrinsic understanding of domain terminology and reasoning paradigms through parameter optimization, while GraphRAG provides contextualized external knowledge inputs via graph-structured knowledge bases. This synergistic architecture not only improves linguistic adaptation to industry specifications but also leverages GraphRAG’s dynamic knowledge networks to address inherent knowledge gaps. For complex O&M decisions, it translates directly into end-to-end reliability assurance, ranging from macro-strategy formulation to micro-specification verification.

2.3. Points of Departure

Recent research on prefabricated buildings and O&M management has demonstrated trends toward diversification and deepening [29]. While significant progress has been made across multiple domains, this evolution has simultaneously revealed key deficiencies and challenges in O&M management. The swift advancement of LLMs offers new paradigms for addressing these challenges, yet their inherent hallucination issues become particularly pronounced in vertical applications. Current academic efforts to overcome this primarily follow two paths: fine-tuning and knowledge augmentation. This study introduces an integrated approach, combining fine-tuning with GraphRAG to construct a knowledge retrieval and fault-diagnosis-enhanced knowledge base for O&M management of prefabricated buildings. The methodology leverages fine-tuning to bolster the model’s intrinsic comprehension of domain-specific terminology and construction logic, while employing GraphRAG to improve knowledge relevance and scalability by dynamically injecting new domain knowledge into the LLM.

3. Methodology

This study proposes an efficient knowledge retrieval and fault diagnosis graph enhancement technical framework for prefabricated building O&M management. The framework employs a multi-source data fusion strategy to build a high-quality, domain-specific training dataset by integrating official authoritative data with AutoGen-generated data. In the model construction stage, first, this study fine-tuned the Qwen2.5-7B large language model to enhance its domain adaptability, resulting in a specialized OM-GPT model. At the same time, GraphRAG was used to construct a prefabricated building O&M knowledge graph, enabling the storage and retrieval of structured knowledge. Finally, this study built a visual interactive interface, forming a complete technical closed loop from data collection and knowledge construction to intelligent application.

3.1. Data Collection and Processing

Existing research indicates that LLMs can significantly improve their performance in specialized fields by learning from domain-specific corpora [46]. Therefore, this study systematically collects and processes datasets for prefabricated building O&M management. The data processing workflow involves three key steps: data collection, preprocessing, and standardization, aiming to provide high-quality data for subsequent model fine-tuning and knowledge base construction.

In terms of data collection, this study adopts a multi-source heterogeneous data fusion strategy to construct a dual-layer data architecture comprising a “Fine-tuning Instruction Set” and a “Retrieval Knowledge Base.” This architecture is designed to support domain-style alignment and retrieval-augmented generation for the OM-GPT model, respectively.

First, regarding the Fine-tuning Instruction Set, we utilized the AutoGen framework to build a multi-agent collaborative system. Organized by a GroupChatManager, two agents—AssistantAgent and UserProxyAgent—collaborated to automatically generate 102 high-quality fine-tuning instructions in JSON format. It is important to note that a multi-level quality control mechanism was implemented during the AutoGen generation phase, including the following: (1) Constraint-based Prompt Engineering: We designed structured prompts with strict output schemas (JSON) and few-shot examples to guide the AssistantAgent in generating content that adheres to domain specifications. (2) Multi-agent Cross-Validation: The UserProxyAgent served not only as an interlocutor but also as a validator. A “critic” mechanism was embedded to cross-check generated samples against retrieved knowledge snippets before final acceptance. Through these mechanisms, the system automatically produced a high-confidence JSON-formatted fine-tuning dataset. Subsequently, a manual verification step was conducted to perform a secondary review of the generated training samples, providing a high-quality corpus foundation that ensures both relevance and accuracy for model training.
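The schema-constrained generation and cross-validation steps described above can be illustrated with a minimal sketch. The field names (`instruction`, `input`, `output`), the word-overlap threshold, and the overlap check standing in for the "critic" are assumptions made for illustration; the source does not publish its schema or validator, and the AutoGen agents themselves are not reproduced here.

```python
import json

# Hypothetical schema: the paper only states that samples are JSON; these field names are assumed.
REQUIRED_FIELDS = ("instruction", "input", "output")


def parse_sample(raw):
    """Return the sample as a dict if it is valid JSON with the assumed fields, else None."""
    try:
        sample = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(sample, dict):
        return None
    for field in REQUIRED_FIELDS:
        if not isinstance(sample.get(field), str):
            return None
    if not sample["instruction"].strip() or not sample["output"].strip():
        return None
    return sample


def critic_accepts(sample, snippets, min_overlap=2):
    """Toy stand-in for the critic: accept if the answer shares enough words with any snippet."""
    answer_words = set(sample["output"].lower().split())
    return any(len(answer_words & set(s.lower().split())) >= min_overlap for s in snippets)


def filter_samples(raw_samples, snippets):
    """Keep only samples that pass both the schema check and the critic check."""
    accepted = []
    for raw in raw_samples:
        sample = parse_sample(raw)
        if sample is not None and critic_accepts(sample, snippets):
            accepted.append(sample)
    return accepted
```

A real critic would compare meaning rather than word overlap; the sketch only shows where a schema gate and a grounding gate sit relative to each other before manual review.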

Second, regarding the construction of the Retrieval Knowledge Base, this study systematically collected and organized standardized documents covering the entire lifecycle of prefabricated buildings to establish a benchmark dataset of standardized norms. The knowledge base draws from extensive and authoritative sources, specifically including the following: national and industry standards, technical guidelines issued by government agencies, white papers from authoritative certification bodies, technical specifications compiled by industry expert teams, as well as practical experience corpora derived from professional forums, industry Q&A platforms, academic symposium minutes, corporate technical reports, and textbook case libraries. To ensure comprehensive and balanced knowledge coverage, we mapped these multi-source data into five core business domains: Equipment Management and Maintenance, Fault Diagnosis and Repair Recommendations, Regulatory and Standard Compliance, Safety Management and Emergency Response, and Quality Control of Prefabricated Decoration. This multi-source fusion strategy ensures that the model can access comprehensive information support ranging from top-level standards to grassroots practical experiences during retrieval.

For data preprocessing, this study adheres to data best practices [47] to standardize information related to the O&M management of prefabricated buildings. This involves three key stages: (1) text normalization: this involves word reduction, professional terminology standardization, and text structure reorganization [48]; (2) noise processing: this covers detecting and deleting duplicate data, desensitizing sensitive information, and rejecting low-quality samples [49]; (3) knowledge enhancement: this encompasses entity recognition and linking, relationship extraction, and knowledge graph alignment. This ensures the purity and accuracy of the training data, thereby enhancing the robustness and generalization capabilities of the Qwen2.5-7B foundation model, providing a solid base for developing the knowledge base-augmented OM-GPT model [2].
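As a rough illustration of the normalization and noise-processing stages, the sketch below normalizes text, masks one kind of sensitive field, drops short samples, and removes exact duplicates. The terminology map, the phone-number pattern, and the minimum-length threshold are hypothetical choices, not values reported by the study; the knowledge-enhancement stage (entity linking, relation extraction) is not covered.

```python
import re
import unicodedata

# Assumed pattern: 11-digit mainland-China mobile numbers; other sensitive fields need their own rules.
PHONE_PATTERN = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
# Hypothetical terminology map; a real one would be derived from domain standards.
TERM_MAP = {"pc component": "precast concrete component"}


def normalize(text):
    """Apply Unicode normalization, collapse whitespace, lowercase, and unify terminology."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text).strip().lower()
    for variant, canonical in TERM_MAP.items():
        text = text.replace(variant, canonical)
    return text


def desensitize(text):
    """Mask phone numbers with a placeholder token."""
    return PHONE_PATTERN.sub("[PHONE]", text)


def preprocess(records, min_chars=20):
    """Normalize, desensitize, reject short samples, and drop exact duplicates (order preserved)."""
    seen = set()
    cleaned = []
    for record in records:
        text = desensitize(normalize(record))
        if len(text) < min_chars or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

Deduplicating after normalization means that records differing only in case or spacing collapse into one entry, which is usually the intended behavior for a training corpus.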

3.2. OM-GPT Model Development

In this study, a loss-based gradient descent algorithm was used to adjust the parameters of the pre-trained model [13]. Comprehensive fine-tuning of LLMs like Qwen2.5-7B requires significant computational resources, which poses a great challenge for typical construction enterprises in practical applications [2]. Therefore, this study optimizes LoRA technology with the Unsloth framework. This optimization scheme fine-tunes only 0.1% of the original model’s parameters, improving training speed by about 30%, and reducing memory usage by 40%.

Currently, AI research generally uses pre-trained language models as the infrastructure for downstream tasks [50]. However, models like BERT, RoBERTa, and DeBERTa were originally pre-trained on English datasets by their developers, leading to poor performance in Chinese. Therefore, given the strong performance of the open-source Qwen2.5 model in Chinese, this study selected Qwen2.5-7B as the base model for fine-tuning. The specific fine-tuning process is shown in Figure 1.

This study’s fine-tuning process adopts a hierarchical adaptation strategy to inject domain knowledge into Qwen2.5’s attention mechanism to enhance the model’s professional reasoning ability. This is achieved by updating only the rank decomposition matrix to obtain new parameters.

Let the fine-tuning dataset of this study be represented as Q_k, the pre-trained Qwen2.5 weight parameters as Θ_{PLM}, and the adapter parameters of the LoRA part as Φ, with |Φ| ≪ |Θ_{PLM}|. The optimization objective function is (see Equation (1)):

\arg \min_{Φ} L (Q_{k}; Θ_{PLM}, Φ) \to Φ

(1)
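To make the roles of Θ_{PLM} and Φ concrete, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer: the frozen weight plays the part of Θ_{PLM}, and the two small matrices A and B together form Φ. The layer size, rank, scaling factor, and initialization scale are illustrative assumptions, not the settings used in the study, and the class name is hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained parameters
        # A gets small random values and B starts at zero, so the update B @ A is zero initially.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        """Compute W x + scale * B (A x)."""
        base = matvec(self.weight, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

    def trainable_fraction(self):
        """Share of all parameters that belong to the adapter."""
        frozen = len(self.weight) * len(self.weight[0])
        trainable = len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
        return trainable / (frozen + trainable)
```

With a zero-initialized B the adapted layer reproduces the pre-trained layer exactly at the start of training, and only A and B would receive gradient updates. For realistic layer widths and small ranks, the trainable fraction becomes very small, which is the property |Φ| ≪ |Θ_{PLM}| expresses.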

Specifically, when a question such as “What are the characteristics of intelligent equipment maintenance in prefabricated building O&M management?” is input, the system first converts it into a word-vector representation. This vector is then fed into the pre-trained Qwen2.5-7B model. The model’s output, a target embedding vector Y, is then optimized against the embedding of the standard answer: “Intelligent equipment maintenance in prefabricated building O&M management has the characteristics of automation, remote monitoring, predictive maintenance, etc.” During the training process, this study used the cross-entropy loss L_{CE}, defined as Equation (2),

L_{C E} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{t = 1}^{T_{i}} \log (p_{i, t} [y_{i, t}])

(2)

where N is the total number of samples in the training dataset, T_i is the target sequence length of the i-th sample, p_{i,t} is the predicted probability distribution of the model for the i-th sample at time step t, and y_{i,t} is the index of the ground-truth token at time step t.
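Equation (2) can be checked numerically with a short sketch. The function below follows the formula literally, summing token log-probabilities and dividing by the number of samples N; the nested-list input layout is an assumption chosen for readability. Many training frameworks average over tokens instead of samples, which changes the scale of the loss but not the location of its minimum for fixed sequence lengths.

```python
import math


def sequence_cross_entropy(prob_rows, targets):
    """Cross-entropy as in Equation (2).

    prob_rows[i][t] is the predicted distribution (list of probabilities) for
    sample i at step t; targets[i][t] is the index of the ground-truth token.
    """
    n_samples = len(prob_rows)
    total_log_prob = 0.0
    for sample_probs, sample_targets in zip(prob_rows, targets):
        for distribution, gold_index in zip(sample_probs, sample_targets):
            total_log_prob += math.log(distribution[gold_index])
    return -total_log_prob / n_samples
```

For example, with a uniform distribution over a four-token vocabulary and two samples of lengths 2 and 3, each of the five tokens contributes log 4, so the loss is 5·log(4)/2.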

In the backpropagation stage, this study utilizes Autograd’s automatic differentiation mechanism to construct a dynamic computational graph for gradient calculation. To improve computational efficiency, this study applied Matrix Chain Multiplication Optimization to adjust the order of matrix operations. This significantly reduces the computational complexity, leading to an approximate 35% reduction in floating-point operations (FLOPs) during the fine-tuning process while maintaining the convergence performance of the model.
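The effect of operation order can be illustrated with the textbook matrix-chain dynamic program, which finds the parenthesization with the fewest scalar multiplications. The source does not describe how the optimization was applied inside the training stack, so this is only a sketch of the general technique; the example dimensions in the usage note are assumed, not taken from the study.

```python
def matrix_chain_cost(dims):
    """Minimum scalar multiplications to multiply a chain of matrices.

    dims has length n + 1 for n matrices; matrix i has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            # Try every split point k between sub-chain i..k and sub-chain k+1..j.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]


def left_to_right_cost(dims):
    """Scalar multiplications when the chain is evaluated strictly left to right."""
    total = 0
    rows = dims[0]
    for idx in range(1, len(dims) - 1):
        total += rows * dims[idx] * dims[idx + 1]
    return total
```

For a low-rank product B·A·X with assumed shapes 4096×16, 16×4096, and 4096×32, `left_to_right_cost([4096, 16, 4096, 32])` gives 805,306,368 multiplications, whereas `matrix_chain_cost` finds 4,194,304 by computing A·X first. This is why low-rank updates are usually applied as B(Ax) rather than by materializing BA.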

After several iterative experiments, the optimal configuration of key training parameters was determined (see Table 1). These parameter settings underwent rigorous cross-validation to ensure faster convergence, more stable training process, and superior domain adaptation performance during fine-tuning within the field of prefabricated building O&M management.

To verify the fine-tuning effect of the model, this study evaluated it using the ROUGE metric. ROUGE is widely used in Natural Language Processing (NLP) tasks such as summarization and translation to compare machine-generated output against human-created references. For this study’s fine-tuning assessment, ROUGE quantified the consistency of model responses before and after the fine-tuning process. More precisely, ROUGE-N measures the recall rate for n-grams, with ROUGE-1 and ROUGE-2 scores serving as the main evaluation metrics, shown as Equation (3):

\mathrm{ROUGE\text{-}N} = \frac{\sum_{S \in \{\mathrm{ReferenceSummaries}\}} \sum_{\mathrm{gram}_{n} \in S} \mathrm{Count}_{\mathrm{match}} (\mathrm{gram}_{n})}{\sum_{S \in \{\mathrm{ReferenceSummaries}\}} \sum_{\mathrm{gram}_{n} \in S} \mathrm{Count} (\mathrm{gram}_{n})}

(3)

where n stands for the length of the n-gram (gram_n), and Count_match(gram_n) denotes the maximum number of n-grams co-occurring in a generated text and a set of reference texts.
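Equation (3) can be sketched in a few lines of Python. The version below tokenizes on whitespace, which is an assumption that suits English examples; Chinese text would first need a word or character segmenter. Matches are clipped so that an n-gram in the generated text is not counted more often than it occurs in that text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n):
    """ROUGE-N recall of a candidate text against one or more reference texts."""
    candidate_counts = ngrams(candidate.split(), n)
    matched = 0
    total = 0
    for reference in references:
        reference_counts = ngrams(reference.split(), n)
        total += sum(reference_counts.values())
        # Clip each match by how often the n-gram appears in the candidate.
        matched += sum(
            min(count, candidate_counts[gram]) for gram, count in reference_counts.items()
        )
    return matched / total if total else 0.0
```

For the reference "the cat sat on the mat" and the candidate "the cat on the mat", five of the six reference unigrams are matched (ROUGE-1 = 5/6) and three of the five reference bigrams are matched (ROUGE-2 = 0.6).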

3.3. Developing a GraphRAG-Powered Knowledge Base for Automated Prefabricated Building O&M

This study established the optimized OM-GPT model by directionally fine-tuning the Qwen2.5-7B foundation model for prefabricated building O&M management. However, this model’s application in professional fields still faces challenges such as incomplete knowledge coverage and insufficient industry-specific case support. To improve the practicability and professionalism of the model in actual O&M scenarios, this study constructs a dedicated knowledge base for prefabricated building O&M management based on the GraphRAG framework. It aims to provide users with more accurate and industry-specific decision support in O&M management. The overall framework flow of GraphRAG is shown in Figure 2.

3.3.1. Knowledge Graph Development

This study proposes an LLM-based knowledge graph construction method for prefabricated building O&M management. First, the preprocessed domain dataset undergoes an intelligent dynamic chunking algorithm to ensure the semantic coherence of each text block. For knowledge extraction, this study designed a GPT-4-based prompt. A multi-stage prompt template guides the model to perform entity recognition and relationship extraction tasks. In the entity recognition stage, this study adopts the standard output format of [entity{tuple_delimiter} entity name {tuple_delimiter} entity type {tuple_delimiter} entity description]. This study sets the parameter of logit_bias = 100 for confidence evaluation and ensures the completeness of the recognition results through multiple rounds of iterations. In the relationship extraction stage, a quadruple structure of [source = head entity, target = tail entity, weight = weight value, description = relationship description] is generated to accurately describe the semantic association between entities.
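The delimiter-based output formats above lend themselves to simple parsing. The sketch below assumes a concrete delimiter string (`<|>`), a leading record-type tag for relationships, and the field order source, target, weight, description; the paper names only the `{tuple_delimiter}` placeholder and the four relationship fields, so these details and the sample strings in the usage note are illustrative.

```python
TUPLE_DELIMITER = "<|>"  # assumed value; the paper only names the {tuple_delimiter} placeholder


def parse_record(record):
    """Parse one extraction record into a dict describing an entity or a relationship."""
    body = record.strip().strip("[]()")
    fields = [part.strip().strip('"') for part in body.split(TUPLE_DELIMITER)]
    kind = fields[0].lower()
    if kind == "entity" and len(fields) == 4:
        return {
            "kind": "entity",
            "name": fields[1],
            "type": fields[2],
            "description": fields[3],
        }
    if kind == "relationship" and len(fields) == 5:
        return {
            "kind": "relationship",
            "source": fields[1],
            "target": fields[2],
            "weight": float(fields[3]),
            "description": fields[4],
        }
    raise ValueError(f"Unrecognized record: {record!r}")
```

A hypothetical entity record such as `[entity<|>release agent<|>MATERIAL<|>Coating applied to formwork before casting]` would parse into a dict with `name`, `type`, and `description` keys. Raising on malformed records makes it easy to count and re-prompt failed extractions across the multiple rounds the text describes.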

Based on the indices constructed from the prior steps, the final knowledge graph, denoted as G(V, E), is represented as a set of triples [52], where the node set V contains entity properties, while the edge set E stores relationship information with weights and descriptions (see Equation (4)). Each node in V = {v_{1}, v_{2}, v_{3}, \dots, v_{n}} and each edge in E = {e_{1}, e_{2}, e_{3}, \dots, e_{n}} carries specific semantic information.

G = \{τ_{i} = (s_{i}, r_{i}, t_{i}) \mid s_{i}, t_{i} \in V, r_{i} \in E\}

(4)

In the equation, a triplet (s_i, r_i, t_i) represents the source entity (s_i) connected to the target entity (t_i) through a specific relationship (r_i). This structure encapsulates structured knowledge unique to prefabricated building O&M management; for example, the triplet {“release agent”, “cause”, “PC laminate cracking problem”}. To gain deeper insight into the topological characteristics of the graph, the Leiden community detection algorithm is employed to partition the constructed KG into multi-level communities. By optimizing the modularity index, the algorithm recursively divides the graph into internally well-connected communities until the size of each community reaches a preset threshold. Based on this community structure, the system automatically generates structured knowledge summaries that not only maintain the integrity of domain knowledge but also optimize retrieval efficiency in the subsequent query stage for O&M [54].
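The sketch below shows how a set of triples can be turned into an undirected graph and how the modularity of a candidate community partition is scored. It does not implement the Leiden algorithm itself; it only evaluates the quantity that modularity-based methods such as Leiden seek to increase. Edge weights are ignored and all function names are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Build undirected adjacency sets from (source, relation, target) triples."""
    adjacency = defaultdict(set)
    for source, _relation, target in triples:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def modularity(adjacency, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    two_m = sum(len(neighbors) for neighbors in adjacency.values())  # twice the edge count
    if two_m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, so this equals 2 * (internal edges).
        internal = sum(len(adjacency[node] & members) for node in members)
        degree = sum(len(adjacency[node]) for node in members)
        q += internal / two_m - (degree / two_m) ** 2
    return q
```

For two triangles joined by a single bridge edge, splitting the graph into the two triangles gives Q = 5/14, while placing every node in one community gives Q = 0, so the split is preferred.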

3.3.2. Knowledge Base-Augmented OM-GPT for O&M

In queries involving domain expertise, especially within prefabricated building O&M management, existing models often suffer significant limitations, making it difficult to effectively capture complex domain-specific relationships and handle semantically ambiguous queries. Therefore, this study integrates the fine-tuned OM-GPT model and the knowledge graph built on GraphRAG to achieve more comprehensive and accurate semantic retrieval for user queries.

Knowledge graph-driven approaches can effectively compensate for the shortcomings of existing models in terms of professionalism and semantic understanding by integrating explicit relationships with structured knowledge representations [55]. For example, when the user submits a query q = {“There are cracks and leakage in the cast-in-place slab of PC prefabricated laminated floor slabs in prefabricated concrete buildings, please put forward relevant measures and suggestions”}, the OM-GPT model enhanced by the knowledge base will decompose the user’s query q to obtain entity nodes v = {v₁, v₂, v₃, …, v_n}, such as v = {“prefabricated concrete building”, “PC prefabricated laminated floor slab”, “crack”, “leakage”}. For each entity v_i, the model retrieves the strongly correlated community summaries and their connected text chunks (c = {c₁, c₂, c₃, …, c_n}). These chunks are then passed to the text generator G to generate intermediate answers. Low-scoring answers are filtered out by a scoring mechanism to obtain a refined answer A.

In addition, this study incorporates a new external retrieval mechanism. For a query q, the knowledge base-augmented OM-GPT model can leverage external retrieval tools (e.g., Tavily search) to obtain a broader source of information, represented as \bar{A}. The final response generated by the knowledge base-augmented OM-GPT model can be expressed as \tilde{A} (see Equation (5)).

\tilde{A} = \{G (q, \{c_{1}, c_{2}, c_{3}, \dots, c_{n}\}), \bar{A}\}

(5)
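The query flow that leads to Equation (5) can be sketched end to end with stand-in components. Everything below is hypothetical scaffolding: entity extraction is reduced to substring matching, and the generator, scorer, and external search are passed in as plain callables rather than calls to the fine-tuned model or to any real search API.

```python
def extract_entities(query, known_entities):
    """Return the known graph entities that are mentioned in the query (substring match)."""
    lowered = query.lower()
    return [entity for entity in known_entities if entity.lower() in lowered]


def retrieve_chunks(entities, entity_to_chunks):
    """Collect the text chunks linked to each entity, without duplicates, in order."""
    chunks = []
    for entity in entities:
        for chunk in entity_to_chunks.get(entity, []):
            if chunk not in chunks:
                chunks.append(chunk)
    return chunks


def answer(query, known_entities, entity_to_chunks, generate, score, external_search,
           threshold=0.5):
    """Combine filtered graph-grounded answers with external results, as in Equation (5)."""
    entities = extract_entities(query, known_entities)
    chunks = retrieve_chunks(entities, entity_to_chunks)
    intermediate = [generate(query, chunk) for chunk in chunks]
    refined = [candidate for candidate in intermediate if score(query, candidate) >= threshold]
    return {"graph_answers": refined, "external": external_search(query)}
```

Keeping the graph-grounded answers and the external results in separate fields mirrors the two components of \tilde{A} and makes it possible to show users which part of a response came from the curated knowledge base.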

3.4. Model Evaluation

To ensure a rigorous evaluation of the OM-GPT model’s performance, this study employed a hybrid assessment approach combining automated and manual methods. The evaluation content is based on typical representative samples meticulously constructed according to the principles of “full lifecycle coverage” and “core business scenario mapping”. It focuses on five critical aspects of broad concern within the prefabricated building O&M industry: Equipment Management and Maintenance (covering routine O&M fundamentals); Fault Diagnosis and Repair Recommendations (covering complex technical reasoning); Regulatory and Standard Compliance (covering compliance-based factual verification); Safety Management and Emergency Response (covering high-risk decision-making scenarios); Quality Control of Prefabricated Decoration (covering specific process details). These five domains strictly align with industry standards, comprehensively covering the entire O&M chain from routine prevention to emergency response and from top-level regulations to grassroots processes. In vertical domain research, deep evaluation of such highly representative core scenarios is more effective than generalized breadth testing for validating a model’s professional reasoning capabilities and factual accuracy. Therefore, this study selects one representative question from each aspect as a testing benchmark. This aims to construct an objective comparative analysis framework to comprehensively evaluate the practical application capabilities of different models.

(1): Automatic evaluation

Following recent reviews on LLM evaluation methods [56,57], this study used the quality of responses to specific questions as the benchmark for measuring model performance. This benchmark enables a systematic comparison among the selected models, including GPT-4, Qwen2.5, and the knowledge base-augmented OM-GPT.

Through a comprehensive literature review, this study analyzed the actual needs and focus of prefabricated building O&M management. This process yielded five core questions that are representative of, and key to, the industry for use in the automatic evaluation.

For the automated assessment, this study used GPT-4 as the evaluator [58]. An analytical framework based on the dimensions of helpfulness, relevance, accuracy, depth, and creativity was constructed to decouple and quantify the evaluation. The implementation involves designing prompts (as shown in Figure 3) and employing an adversarial generation strategy, which guides the evaluation model to perform multiple rounds of multi-dimensional comparative judgments. A quantitative comparison matrix is then formed by calculating the pairwise winning rate of each model combination across the five test questions. This method is intended to keep GPT-4’s evaluation of the different models’ outputs objective and neutral, supporting consistency and comparability in evaluation.
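The comparison matrix described above can be sketched as follows for a single evaluation dimension. The verdict format, the function name `win_rate_matrix`, and all verdicts below are made-up illustrations, not the study's data or the authors' code.

```python
from collections import defaultdict


def win_rate_matrix(verdicts, models):
    """verdicts: (model_a, model_b, winner) tuples, one per question and pair.

    Returns rates[row][col] = percentage of row-vs-col comparisons won by row.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for a, b, winner in verdicts:
        totals[(a, b)] += 1
        totals[(b, a)] += 1
        if winner in (a, b):  # any other value is treated as a tie
            loser = b if winner == a else a
            wins[(winner, loser)] += 1
    return {
        row: {
            col: 100.0 * wins[(row, col)] / totals[(row, col)]
            for col in models
            if col != row and totals[(row, col)]
        }
        for row in models
    }


models = ["OM-GPT", "GPT-4", "DeepSeek"]
# Hypothetical judge verdicts for five questions on one dimension.
verdicts = (
    [("OM-GPT", "GPT-4", "OM-GPT")] * 5
    + [("OM-GPT", "DeepSeek", "OM-GPT")] * 4
    + [("OM-GPT", "DeepSeek", "DeepSeek")]
    + [("GPT-4", "DeepSeek", "GPT-4")] * 2
    + [("GPT-4", "DeepSeek", "DeepSeek")] * 3
)
rates = win_rate_matrix(verdicts, models)
print(rates["OM-GPT"])
```

With only five questions per pair, each winning rate can take just six values (0%, 20%, ..., 100%), which is worth keeping in mind when reading the matrix.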

(2): Manual evaluation

Although well-designed prompts effectively reduce the impact of positional and verbosity bias during automated evaluation, this method still faces scalability limitations when dealing with numerous candidate responses. In addition, manual evaluation offers more comprehensive and accurate feedback and is closer to actual application scenarios than automated methods. Therefore, this study complements its automated evaluation with manual evaluation to correct possible inherent bias and to address the restricted scope of the automated evaluation dimensions.

Human evaluations typically involve inviting evaluators—including domain experts, researchers, or casual users—to score the quality of the model-generated responses across multiple dimensions. Existing studies offer reliable reference criteria; for instance, Singhal et al. [59] consider agreement with scientific and clinical consensus, the presence of incorrect content, potential for and likelihood of possible harm, and possible bias. Ziems et al. [60], conversely, focused on faithfulness, coherence, relevance, and fluency.

To better evaluate the model, this study combines these insights with the specific characteristics of prefabricated building O&M management. This study selected five tailored evaluation dimensions: logical consistency, sentence fluency, response completeness, practical application, and relevance. Using these criteria, this study thoroughly evaluated the generation quality of GPT-4, Qwen2.5, and knowledge base-augmented OM-GPT.

To deepen the understanding of prefabricated building O&M and validate the effectiveness of the questionnaire, this study employed an expert survey to explore how well the responses fit practical applications. To improve the rigor and reliability of the feedback analysis, a structured design process was adopted. First, to ensure the internal consistency of the evaluation framework and compensate for the unidimensional bias of the automated assessment, the manual evaluation retained the same five questions. Then, 42 experts from the construction industry were invited to evaluate the overall answer quality of the three models (i.e., DeepSeek, GPT-4, and the knowledge base-augmented OM-GPT) across the five dimensions, using the scoring criteria shown in Table 2. Finally, the scores from all 42 experts were aggregated for each model and each question for further analysis. The expert panel comprised 19 senior experts and professors (45%), 18 senior engineers (43%), and 5 project managers (12%), with the detailed composition shown in Figure 4.

A two-way ANOVA test is used to compare the differences in the responses of the three models across the five questions in each evaluation dimension. Two-way ANOVA is an analysis of variance for data organized by two factors, here model type and question. It tests three null hypotheses: that the mean score is the same for all models, that the mean score is the same for all questions, and that there is no interaction between model and question. This makes it well suited to the data in this study.
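To make the three F-statistics concrete, the following is a minimal pure-Python sketch of a balanced two-way ANOVA with replication. The function name `two_way_anova` and the toy scores are illustrative assumptions; the study's actual scores and software are not given in the text.

```python
def two_way_anova(data):
    """data[i][j] holds n replicate scores for model i and question j.

    Assumes a balanced design (same n in every cell) with n >= 2.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = sum(x for row in data for cell in row for x in cell) / (a * b * n)
    mean_a = [sum(x for cell in row for x in cell) / (b * n) for row in data]
    mean_b = [sum(x for i in range(a) for x in data[i][j]) / (a * n) for j in range(b)]
    mean_ab = [[sum(cell) / n for cell in row] for row in data]

    # Sums of squares for each factor, the interaction, and the residual.
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum(
        (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - mean_ab[i][j]) ** 2
        for i in range(a) for j in range(b) for x in data[i][j]
    )

    df_a, df_b, df_ab, df_err = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_model": (ss_a / df_a) / ms_err,
        "F_question": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Hypothetical scores: 2 models x 2 questions x 2 raters.
toy = [[[8, 9], [8, 9]], [[5, 6], [5, 6]]]
result = two_way_anova(toy)
print(result)
```

For a design with 3 models, 5 questions, and 42 raters per cell, the degrees of freedom would be (2, 4, 8, 615). Each F-statistic is then compared with the critical value of the F-distribution for its own degrees of freedom.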

4. Results

4.1. Validation

4.1.1. Performance Benchmarking of Parameter Optimized-OM-GPT

To advance domain-specific response quality in prefabricated building O&M management, this study implemented PEFT on the OM-GPT architecture. The optimization targets three critical dimensions: technical accuracy, theoretical depth, and domain linguistic conformity. Using 102 curated samples automatically generated by AutoGen, this study fine-tuned the model and assessed the outcomes via ROUGE metrics (see Figure 5).

As evidenced in Figure 5: (1) The ROUGE-1 and ROUGE-2 scores consistently fall between 0.7 and 1, indicating a high overlap with the reference dataset at both the single-word (unigram) and consecutive-word (bigram) levels, thereby supporting the effectiveness of model fine-tuning. (2) The data points are dense with minimal fluctuation. This shows that the performance of the model remains stable across different samples, suggesting that it is neither significantly overfitted to the training data nor underfitted to the data features. Such stability is important for the model’s generalization ability in practical applications.
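For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a model output and a reference answer. The sketch below is a simplified version that assumes whitespace tokenization and lowercasing; the example sentences are made up, and the study's exact ROUGE implementation and tokenizer are not specified in the text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision, and F1 using clipped n-gram counts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical reference answer and model output.
reference = "inspect the sealant at each panel joint"
candidate = "inspect the sealant at every panel joint"
r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
print(r1["recall"], r2["recall"])
```

A single substituted word lowers ROUGE-2 more than ROUGE-1 because it breaks two bigrams but only one unigram, which is why the two scores are usually reported together.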

4.1.2. Validation of Knowledge Graph of Prefabricated Building O&M Management

Based on the standardized specification benchmark dataset for prefabricated building O&M management, this study establishes a domain knowledge graph to visualize the entities and relationships constructed by GraphRAG. Representative topological structures are shown in Figure 6, illustrating a topological map of multiple entities (e.g., “release agent”, “emergency response plan”, “material supplier”, etc.) and relationships (e.g., “belongs to”, “connects”, “complies with”, etc.), while further detailed information on the knowledge graph is provided in Table 3.

To verify the enhancement provided by the knowledge graph, this study conducted a test using the question “How to achieve dynamic health monitoring and leakage risk early warning for construction joints at the junction of composite floor slabs and ALC wall panels?”. As shown in Figure 6, GraphRAG identifies “composite floor slabs” and “ALC wall panels” as the key nodes within the knowledge graph (highlighted by the red circle in Figure 6). It then derives the relationship subgraph of strongly correlated nodes for further analysis.
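The idea of deriving a relationship subgraph around key nodes can be illustrated with a simple neighborhood expansion over (head, relation, tail) triples. This is only a toy sketch: the triples below are hypothetical, and GraphRAG itself relies on LLM-based extraction and community-level summarization rather than this plain traversal.

```python
def relation_subgraph(triples, key_nodes, hops=1):
    """Return the triples reachable within `hops` steps of any key node."""
    frontier = set(key_nodes)
    seen = set(key_nodes)
    selected = []
    for _ in range(hops):
        next_frontier = set()
        for head, rel, tail in triples:
            touches = head in frontier or tail in frontier
            if touches and (head, rel, tail) not in selected:
                selected.append((head, rel, tail))
                next_frontier.update({head, tail} - seen)
        seen |= next_frontier
        frontier = next_frontier
    return selected


# Hypothetical triples using entity and relation names of the kind shown in Figure 6.
triples = [
    ("composite floor slab", "connects", "ALC wall panel"),
    ("construction joint", "belongs to", "composite floor slab"),
    ("construction joint", "complies with", "waterproofing specification"),
    ("release agent", "belongs to", "material supplier"),
]
keys = {"composite floor slab", "ALC wall panel"}
one_hop = relation_subgraph(triples, keys, hops=1)
two_hop = relation_subgraph(triples, keys, hops=2)
print(two_hop)
```

Increasing `hops` widens the retrieved context at the cost of pulling in less relevant relations; unrelated triples, such as the release agent entry here, are never reached.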

4.1.3. Analysis of Knowledge Base-Augmented OM-GPT Model Responses

To evaluate the content generation quality of the knowledge base-augmented OM-GPT in prefabricated building O&M management, this study comparatively analyzes its responses using example-based scenarios. All responses were obtained from actual testing of three models: DeepSeek, GPT-4, and the knowledge base-augmented OM-GPT. Table 4 presents a comparison of response examples, showing that all models demonstrate a certain degree of professional knowledge in addressing prefabricated building O&M challenges. Specifically, GPT-4’s responses, while touching upon aspects such as design, materials, construction, and maintenance, are less practical. Its expressions are relatively broad and its focus is vague, offering limited practical guidance for frontline workers. DeepSeek provided more comprehensive responses, offering specific suggestions for improving construction quality and selecting high-quality materials. The knowledge base-augmented OM-GPT performed best in terms of technical depth and practicality. Its responses draw rich technical details from both local and global searches, making them more suitable for practice.

4.2. Performance Evaluation

4.2.1. Automatic Evaluation

This section evaluates the response quality of the three models (DeepSeek, GPT-4, and the knowledge base-augmented OM-GPT) across five test questions. Figure 7 summarizes the evaluation results, comparing the three models across the five dimensions: helpfulness, relevance, accuracy, depth, and creativity. In this figure, each value represents the winning rate (in percent) of the row model against the column model, with 50% displayed as the reference for parity.

As Figure 7 illustrates, the knowledge base-augmented OM-GPT consistently achieved a winning rate of over 60% against both GPT-4 and DeepSeek across all five dimensions, indicating that the knowledge base-augmented OM-GPT shows high proficiency in generating responses related to prefabricated building O&M. This outcome effectively demonstrates the utility of both our model fine-tuning and the GraphRAG framework’s ability to construct a high-quality prefabricated building O&M knowledge base, making it suitable as an operational guide for frontline construction personnel.

It is noted that (1) the knowledge base-augmented OM-GPT achieved a winning rate of 100% in the accuracy and depth dimensions (winning on all five questions), indicating a clear advantage over the other models on these questions. (2) GPT-4 exhibited a winning rate of less than 50% in the helpfulness, depth, and creativity dimensions, indicating weak performance in these areas. This can be attributed to the limitations of GPT-4’s general training paradigm. While it possesses extensive knowledge coverage, it lacks structured professional knowledge injection specific to prefabricated building O&M management. This results in issues such as insufficient domain knowledge density, limited contextual reasoning depth, and a lack of innovation constraints in its outputs.

4.2.2. Manual Evaluation

A total of 42 experts were invited to evaluate and score the model responses. The scores from these experts for each question were summarized using the median, mean, and variance, and the results are shown in Figure 8. The variance of all dimension–question combinations is small, indicating strong expert consensus and reflecting the model’s stable performance across the five questions. The means of the knowledge base-augmented OM-GPT are consistently high, with \mu \geq 7, indicating a high and consistent overall expert evaluation of the model. For response completeness on question 5, the knowledge base-augmented OM-GPT reached a median of 9.00. This highlights that the responses provided by the model comprehensively address typical problems in prefabricated building O&M management, earning high recognition from experts.
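As a small illustration of the descriptive statistics reported in Figure 8, the sketch below summarizes one dimension–question combination with Python's standard `statistics` module. The scores are hypothetical, and the use of population variance (rather than sample variance) is an assumption, since the text does not state which was used.

```python
from statistics import mean, median, pvariance


def summarize(scores):
    """Median, mean, and population variance of one set of expert scores."""
    return {
        "median": median(scores),
        "mean": mean(scores),
        "variance": pvariance(scores),
    }


# Hypothetical scores from five raters for one model, question, and dimension.
summary = summarize([8, 9, 9, 10, 9])
print(summary)
```

A small variance relative to the scoring scale is what the text reads as expert consensus; replacing `pvariance` with `variance` would give the sample variance instead.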

To further substantiate the professionalism and validity of the model’s responses, this study conducted a two-way ANOVA on the results presented above, as detailed in Table 5. Across all five evaluation dimensions, the model factor yielded p-values below 0.05, indicating statistically significant performance differences among the three models, with F-values exceeding the F-critical value of 3.01. As evidenced in Figure 8, the knowledge base-augmented OM-GPT model consistently outperformed the other models across all dimensions. Further analysis showed that the F-values for the question factor were below the F-critical value, suggesting no significant variation across question types. This supports the stable performance advantage of the knowledge base-augmented OM-GPT and indicates that it is not driven by variations in question type.

Moreover, for the interaction between model and question, all dimensions showed F-values below the F-critical value with p-values above 0.05, indicating no significant interaction between model type and question type and further supporting the knowledge base-augmented OM-GPT’s generalization capability across the tested questions. These results collectively demonstrate the effectiveness of integrating fine-tuning with GraphRAG-augmented knowledge retrieval for prefabricated building O&M management.

4.3. Ablation Study

To investigate the specific contributions of fine-tuning and the knowledge base, this study conducted an ablation experiment comparing the performance of the base model, the Fine-Tuned Base Model, and the Integrated Model (OM-GPT). For each of the five domains outlined in Section 3.4 (Equipment Management and Maintenance, Fault Diagnosis and Repair Recommendations, Regulatory and Standard Compliance, Safety Management and Emergency Response, and Quality Control of Prefabricated Decoration), 10 representative questions were selected as the test set. In addition to the five metrics listed in Table 2, we introduced a new metric, Domain Linguistic Adaptability, to specifically evaluate whether fine-tuning successfully enabled the base model to adapt to the terminology and semantic patterns of the vertical domain. The experimental results are presented in Figure 9. The Fine-Tuned Model demonstrated its most significant improvement over the base model in Domain Linguistic Adaptability, confirming the effectiveness of the fine-tuning process in aligning linguistic styles. The Integrated Model showed varying degrees of improvement over the Fine-Tuned Model across four dimensions: logical consistency, response completeness, practical application, and relevance. The most substantial gains were observed in practical application and relevance, attributed to the retrieval-augmented generation capabilities. The improvement in sentence fluency was relatively marginal, primarily because the base model already possessed strong inherent fluency, leaving limited room for further enhancement.

5. Conclusions and Future Work

This study proposes a fine-tuned and knowledge-augmented LLM for prefabricated building O&M management. Specifically, it leverages the AutoGen framework to construct a multi-agent system that automatically generates standardized datasets. The Qwen2.5-7B model was selected as the foundation, and its parameters were adjusted using loss-based gradient descent, yielding the OM-GPT model. This fine-tuning process resulted in linguistic patterns better aligned with prefabricated building O&M management.

To further enhance the professionalism and reliability of problem analysis within the domain, this study employed the GraphRAG framework to construct a knowledge graph through entity recognition and relationship extraction. Consequently, an O&M management system for prefabricated buildings was developed by integrating the OM-GPT model with the knowledge graph.

This study validated the effectiveness of model fine-tuning using ROUGE, with the results demonstrating its efficacy. Additionally, the effect of GraphRAG-enhanced reasoning was illustrated through the visualization of the knowledge graph. For the overall response evaluation, both automatic and human assessment methods were employed. The results consistently indicated that, for the tested questions, the knowledge base-augmented OM-GPT model outperformed both DeepSeek and GPT-4 in over 50% of cases (reaching 100% in some instances) across five dimensions (helpfulness, relevance, accuracy, depth, and creativity) when generating responses involving prefabricated building O&M knowledge. This outcome supports the effectiveness of both the model fine-tuning and the construction of a high-quality prefabricated building O&M management knowledge base using the GraphRAG framework. Furthermore, the two-way ANOVA revealed a p-value of less than 0.05 for the model-type factor in all dimensions. This indicates statistically significant performance differences among the three models in each dimension, with the knowledge base-augmented OM-GPT scoring higher than the other models across all dimensions.

Although the proposed method demonstrates superior performance on our constructed dataset for prefabricated building operation and maintenance (O&M), several limitations remain that warrant further exploration in future research: (1) Data Breadth and Knowledge Depth: The quality of the model’s responses is highly dependent on the granularity of domain-specific expertise. There is a critical need to expand the dataset with more high-quality, fine-grained data for model fine-tuning. Currently, the knowledge base is primarily constructed from structured data sources such as public white papers and monitoring logs, lacking a systematic integration of tacit knowledge, such as the non-standardized maintenance experiences of senior engineers. Furthermore, when encountering entirely new building types not covered in the training set, the model may face generalization challenges due to domain shift. (2) Practical Deployment and Evaluation: The inference latency and computational costs of large language models on resource-constrained edge devices require further optimization. Additionally, despite the introduction of statistical validation, potential subjective biases inherent in expert evaluations remain difficult to eliminate completely.

In light of these limitations and the dynamic evolution of industry standards, future research will focus on the following three directions: (1) Continuous evolution of the knowledge system: We will establish a periodic knowledge base maintenance strategy. Once new industry standards or regulatory provisions are released, they will be incorporated into the data source, leveraging GraphRAG technology to extract newly emerging key entities and relationships for the incremental expansion and version iteration of the existing knowledge graph. (2) Multi-source heterogeneous data fusion and generalization enhancement: We aim to introduce digital twin real-time data streams and federated learning technologies. This will facilitate the fusion of tacit knowledge from multiple projects while preserving data privacy. Furthermore, few-shot learning techniques will be employed to enhance the model’s robustness and adaptability to unknown building types. (3) Lightweight Deployment and Objective Evaluation: We will explore model quantization and knowledge distillation algorithms to lower computational barriers. Concurrently, by integrating novel automated evaluation mechanisms, we aim to further mitigate human bias, thereby improving the system’s practicality and fairness in complex engineering scenarios.

Author Contributions

Conceptualization, L.S. and Y.Z.; methodology, L.S.; writing—original draft, L.Z.; writing—review & editing, Y.Z. and I.F.; supervision, Y.Z. and I.F.; project administration, L.S.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 71901077.

Data Availability Statement

The primary data used in this study are companies’ internal materials and cannot be published because of privacy and copyright restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Daly, M.; Kempton, L.; McCarthy, T. Sustainability of Prefabricated Construction in Australia: Industry Perspectives on Challenges and Opportunities. J. Build. Eng. 2025, 102, 111805.
2. Lin, Z.; Hu, X.; Zhang, Y.-X.; Chen, Z.; Fang, Z.; Chen, X.; Li, A.; Vepakomma, P.; Gao, Y. SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models. arXiv 2024, arXiv:2407.00952.
3. Yuan, Z.; Sun, C.; Wang, Y. Design for Manufacture and Assembly-Oriented Parametric Design of Prefabricated Buildings. Autom. Constr. 2018, 88, 13–22.
4. Chang, L.; Zhao, S. Risk Evaluation of Prefabricated Building Construction Based on PTF-VIKOR of Prospect Theory. Alex. Eng. J. 2025, 115, 147–159.
5. Luo, L.; Wu, X.; Hong, J.; Wu, G. Fuzzy Cognitive Map-Enabled Approach for Investigating the Relationship between Influencing Factors and Prefabricated Building Cost Considering Dynamic Interactions. J. Constr. Eng. Manag. 2022, 148, 04022081.
6. Sukhwal, P.C.; Rajan, V.; Kankanhalli, A. A Joint LLM-KG System for Disease Q&A. IEEE J. Biomed. Health Inform. 2025, 29, 2257–2270.
7. Chen, L.; Darko, A.; Zhang, F.; Chan, A.P.C.; Yang, Q. Can Large Language Models Replace Human Experts? Effectiveness and Limitations in Building Energy Retrofit Challenges Assessment. Build. Environ. 2025, 276, 112891.
8. Lee, J.; Ahn, S.; Kim, D.; Kim, D. Performance Comparison of Retrieval-Augmented Generation and Fine-Tuned Large Language Models for Construction Safety Management Knowledge Retrieval. Autom. Constr. 2024, 168, 105846.
9. Xia, L.; Liang, Y.; Leng, J.; Zheng, P. Maintenance Planning Recommendation of Complex Industrial Equipment Based on Knowledge Graph and Graph Neural Network. Reliab. Eng. Syst. Saf. 2023, 232, 109068.
10. Zhou, B.; Hua, B.; Gu, X.; Lu, Y.; Peng, T.; Zheng, Y.; Shen, X.; Bao, J. An End-to-End Tabular Information-Oriented Causality Event Evolutionary Knowledge Graph for Manufacturing Documents. Adv. Eng. Inform. 2021, 50, 101441.
11. Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2023, 36, 3580–3599.
12. Kim, J.; Kwon, Y.; Jo, Y.; Choi, E. KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models. arXiv 2023, arXiv:2310.11220.
13. Zhou, C.; Liu, P.; Xu, P.; Iyer, S.; Sun, J.; Mao, Y.; Ma, X.; Efrat, A.; Yu, P.; Yu, L.; et al. LIMA: Less Is More for Alignment. arXiv 2023, arXiv:2305.11206.
14. Wei, L.; Jiang, Z.; Huang, W.; Sun, L. InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4. arXiv 2023, arXiv:2308.12067.
15. He, Z.; Chen, H. Critical Factors for Practicing Sustainable Construction Projects in Environmentally Fragile Regions Based on Interpretive Structural Modeling and Cross-Impact Matrix Multiplication Applied to Classification: A Case Study in China. Sustain. Cities Soc. 2021, 74, 103238.
16. Yan, H.; He, Z.; Gao, C.; Xie, M.; Sheng, H.; Chen, H. Investment Estimation of Prefabricated Concrete Buildings Based on XGBoost Machine Learning Algorithm. Adv. Eng. Inform. 2022, 54, 101789.
17. Batikha, M.; Jotangia, R.; Baaj, M.Y.; Mousleh, I. 3D Concrete Printing for Sustainable and Economical Construction: A Comparative Study. Autom. Constr. 2022, 134, 104087.
18. Liu, C.; Wu, J.; Jiang, X.; Gu, Y.; Xie, L.; Huang, Z. Automatic Assembly of Prefabricated Components Based on Vision-Guided Robot. Autom. Constr. 2024, 162, 105385.
19. Liu, P.; Qi, H.; Liu, J.; Feng, L.; Li, D.; Guo, J. Automated Clash Resolution for Reinforcement Steel Design in Precast Concrete Wall Panels via Generative Adversarial Network and Reinforcement Learning. Adv. Eng. Inform. 2023, 58, 102131.
20. Alabbasi, M.; Agkathidis, A.; Chen, H. Robotic 3D Printing of Concrete Building Components for Residential Buildings in Saudi Arabia. Autom. Constr. 2023, 148, 104751.
21. Nguyen, D.-C.; Jeon, C.-H.; Roh, G.; Shim, C.-S. BIM-Based Preassembly Analysis for Design for Manufacturing and Assembly of Prefabricated Bridges. Autom. Constr. 2024, 160, 105338.
22. Pan, W.; Gibb, A.G.F.; Dainty, A.R.J. Strategies for Integrating the Use of Off-Site Production Technologies in House Building. J. Constr. Eng. Manag. 2012, 138, 1331–1340.
23. Pan, H.; Yang, B.; Pan, Y.; Luo, Z. Evolutionary Game of Incentive Strategy for Chinese Prefabricated Buildings Based on System Dynamics from the Perspective of Prospect Theory. Eng. Constr. Archit. Manag. 2024, ahead-of-print.
24. Chen, G.; Huang, J.; Wang, J.; Wei, J.; Shou, W.; Cao, Z.; Pan, W.; Zhou, J. Optimal Procurement Strategy for Off-Site Prefabricated Components Considering Construction Schedule and Cost. Autom. Constr. 2023, 147, 104726.
25. Zhang, Y.; Xu, L.; Xue, X.; Wang, Z.; Skibniewski, M. Formation and Dynamics of Prefabricated Building Innovation Network. Eng. Constr. Archit. Manag. 2024, ahead-of-print.
26. Xu, Z.; Kang, R.; Lu, R. 3D Reconstruction and Measurement of Surface Defects in Prefabricated Elements Using Point Clouds. Comput.-Aided Civ. Infrastruct. Eng. 2020, 34, 04020033.
27. Luo, L.; Shen, G.Q.P.; Xu, G.; Liu, Y.; Wang, Y. Stakeholder-Associated Supply Chain Risks and Their Interactions in a Prefabricated Building Project in Hong Kong. J. Manag. Eng. 2019, 35, 05019008.
28. Luo, L.; Jin, X.; Shen, G.; Wang, Y.; Liang, X.; Li, X.; Li, C.Z. Supply Chain Management for Prefabricated Building Projects in Hong Kong. J. Manag. Eng. 2020, 36, 04020023.
29. Marocco, M.; Garofolo, I. Integrating Disruptive Technologies with Facilities Management: A Literature Review and Future Research Directions. Autom. Constr. 2021, 131, 103917.
30. Rashid, K.M.; Louis, J.; Fiawoyife, K.K. Wireless Electric Appliance Control for Smart Buildings Using Indoor Location Tracking and BIM-Based Virtual Environments. Autom. Constr. 2019, 101, 48–58.
31. Vitiello, U.; Ciotta, V.; Salzano, A.; Asprone, D.; Manfredi, G.; Cosenza, E. BIM-Based Approach for the Cost-Optimization of Seismic Retrofit Strategies on Existing Buildings. Autom. Constr. 2019, 98, 90–101.
32. Bonci, A.; Carbonari, A.; Cucchiarelli, A.; Messi, L.; Pirani, M.; Vaccarini, M. A Cyber-Physical System Approach for Building Efficiency Monitoring. Autom. Constr. 2019, 102, 68–85.
33. Ma, G.; Wu, Z. BIM-Based Building Fire Emergency Management: Combining Building Users’ Behavior Decisions. Autom. Constr. 2020, 109, 102975.
34. Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Trans. Inf. Syst. 2025, 43, 1–55.
35. Han, Z.; Gao, C.; Liu, J.; Zhang, J.; Zhang, S.Q. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey. arXiv 2024, arXiv:2402.12104.
36. Liu, X.; Ji, K.; Fu, Y.; Du, Z.; Yang, Z.; Tang, J. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-Tuning Universally Across Scales and Tasks. arXiv 2021, arXiv:2110.07602.
37. Hu, J.E.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685.
38. Mao, Y.; Mathias, L.; Hou, R.; Almahairi, A.; Ma, H.; Han, J.; Yih, W.-t.; Khabsa, M. UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, Online, 1–6 August 2021.
39. Fitkau, I.; Hartmann, T. An Ontology-Based Approach of Automatic Compliance Checking for Structural Fire Safety Requirements. Adv. Eng. Inform. 2024, 59, 102314.
40. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2020, arXiv:2005.11401.
41. Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Guo, Q.; Wang, M.; et al. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv 2023, arXiv:2312.10997.
42. Heredia Álvaro, J.A.; Barreda, J.G. An Advanced Retrieval-Augmented Generation System for Manufacturing Quality Control. Adv. Eng. Inform. 2025, 64, 103007.
43. Uhm, M.; Kim, J.; Ahn, S.; Jeong, H.; Kim, H. Effectiveness of Retrieval Augmented Generation-Based Large Language Models for Generating Construction Safety Information. Autom. Constr. 2025, 170, 105926.
44. Edge, D.; Trinh, H.; Cheng, N.; Bradley, J.; Chao, A.; Mody, A.; Truitt, S.; Larson, J. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv 2024, arXiv:2404.16130.
45. He, X.; Tian, Y.; Sun, Y.; Chawla, N.; Laurent, T.; LeCun, Y.; Bresson, X.; Hooi, B. G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. arXiv 2024, arXiv:2402.07630.
46. Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining. Bioinformatics 2020, 36, 1234–1240.
47. Fu, Y.; Xu, C.; Zhang, L.; Chen, Y. Control, Coordination, and Adaptation Functions in Construction Contracts: A Machine-Coding Model. Autom. Constr. 2023, 152, 104890.
48. Zhong, Y.; Goodfellow, S.D. Domain-Specific Language Models Pre-Trained on Construction Management Systems Corpora. Autom. Constr. 2024, 160, 105316.
49. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
50. Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Zhang, L.; Han, W.; Huang, M.; et al. Pre-Trained Models: Past, Present and Future. arXiv 2021, arXiv:2106.07139.
51. Jeong, C. Fine-Tuning and Utilization Methods of Domain-Specific LLMs. arXiv 2024, arXiv:2401.02981.
52. Parthasarathy, V.B.; Zafar, A.; Khan, A.I.; Shahid, A. The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities. arXiv 2024, arXiv:2408.13296.
53. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
54. Traag, V.A.; Waltman, L.; van Eck, N.J. From Louvain to Leiden: Guaranteeing Well-Connected Communities. Sci. Rep. 2018, 9, 5233.
55. Rezaei, M.R.; Fard, R.S.; Parker, J.; Krishnan, R.G.; Lankarany, M. Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge. arXiv 2025, arXiv:2502.13010.
56. Bommasani, R.; Liang, P.; Lee, T. Holistic Evaluation of Language Models. Ann. N. Y. Acad. Sci. 2023, 1525, 140–146.
57. Chang, Y.-C.; Wang, X.; Wang, J.; Wu, Y.; Zhu, K.; Chen, H.; Yang, L.; Yi, X.; Wang, C.; Wang, Y.; et al. A Survey on Evaluation of Large Language Models. ACM Trans. Intell. Syst. Technol. 2023, 15, 39.
58. Zheng, L.; Chiang, W.-L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.P.; et al. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; Volume 36, pp. 46595–46623.
59. Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large Language Models Encode Clinical Knowledge. Nature 2023, 620, 172–180.
60. Ziems, C.; Held, W.B.; Shaikh, O.; Chen, J.; Zhang, Z.; Yang, D. Can Large Language Models Transform Computational Social Science? Comput. Linguist. 2023, 50, 237–291.

Figure 1. LLM fine-tuning process diagram.

Figure 2. GraphRAG framework implementation process diagram.

Figure 3. GPT-4 automated evaluation prompts.

Figure 4. Expert composition.

Figure 5. ROUGE assessment results. The bar chart displays the ROUGE-1 (blue) and ROUGE-2 (green) scores between the model outputs and the reference answers across 102 test samples. The red horizontal line represents the perfect score (1.0). The results show that the model scores remain stable between 0.7 and 1.0 with minimal fluctuation, indicating that the model possesses good performance and stability after fine-tuning.
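
For readers unfamiliar with the metric plotted in Figure 5, the following is a minimal sketch of how ROUGE-N can be computed between a model output and a reference answer. It is an illustration only, not the evaluation script used in this study: the function names `ngram_counts` and `rouge_n` are hypothetical, tokenization by whitespace is an assumption (Chinese text would need a word segmenter or character-level tokens), and the example sentences are invented.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for two whitespace-tokenized strings.

    Assumption: lowercased whitespace tokens; overlap uses clipped n-gram counts.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips repeated n-grams
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented example pair, not taken from the test set.
    reference = "strengthen curing and delay formwork removal"
    candidate = "strengthen curing and delay removal"
    print("ROUGE-1:", rouge_n(candidate, reference, n=1))
    print("ROUGE-2:", rouge_n(candidate, reference, n=2))
```

ROUGE-1 rewards shared vocabulary, while ROUGE-2 also requires matching word order, which is why ROUGE-2 scores are typically the lower of the two.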

Figure 6. Illustration of the knowledge graph of prefabricated building O&M management.

Figure 7. Automated evaluation results.

Figure 8. Descriptive statistics.

Figure 9. Ablation study results.

Table 1. LoRA training parameter setting.

Parameter	Value	Description
r	16	LoRA rank (r > 0; suggested values are 8, 16, 32, and 64). It determines the size of the LoRA matrices: a higher rank can store more information but increases the computational and memory cost of LoRA. In this study, r = 16 is chosen to avoid overfitting the dataset [51,52].
Learning_rate	2 × 10⁻⁴	Hyperparameter that controls the step size of model parameter updates. The initial learning rate is 2 × 10⁻⁴ [52], decayed with a cosine annealing schedule for stable convergence. This value was verified experimentally to balance training speed and model performance.
Lora_alpha	16	LoRA scaling factor. A larger value lets fine-tuning learn more from the dataset but may lead to overfitting.
Lora_dropout	0	Dropout probability for the LoRA layers. Setting it to 0 enables faster training; a nonzero value can be used to reduce overfitting [53].
Random_state	3407	Random seed, fixed to ensure reproducibility.
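
To make the roles of the Table 1 parameters concrete, the sketch below computes three quantities they control. It is a hedged illustration, not the training code of this study. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / r and each adapted weight matrix of shape d_out × d_in adds r · (d_in + d_out) trainable parameters. It also assumes a plain cosine annealing schedule without warmup. The hidden size of 4096 and the step count of 1000 are hypothetical placeholders, not values reported here.

```python
import math

# Values from Table 1.
LORA_R = 16
LORA_ALPHA = 16
LEARNING_RATE = 2e-4


def lora_scaling(alpha, r):
    """Scaling factor applied to the low-rank update in standard LoRA (alpha / r)."""
    return alpha / r


def lora_trainable_params(d_out, d_in, r):
    """Trainable parameters added to one weight matrix: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def cosine_annealed_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine annealing from peak_lr at step 0 down to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    hidden = 4096       # hypothetical projection size, for illustration only
    total_steps = 1000  # hypothetical training length, for illustration only
    print("scaling:", lora_scaling(LORA_ALPHA, LORA_R))
    print("LoRA params per matrix:", lora_trainable_params(hidden, hidden, LORA_R))
    print("full params per matrix:", hidden * hidden)
    for step in (0, total_steps // 2, total_steps):
        print(step, cosine_annealed_lr(step, total_steps, LEARNING_RATE))
```

With r = 16 and Lora_alpha = 16 the scaling factor is 1, so the low-rank update is applied without amplification or damping.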

Table 2. Expert scoring criteria.

Dimension	Evaluation Description	Scoring Range
Logical consistency	Whether the logic of the answer content is coherent and consistent	1–10
Sentence fluency	Whether the generated responses are natural and easy to understand in terms of language expression	1–10
Response completeness	Whether the answers provided are comprehensive and cover all the key points of the question	1–10
Practical application	Whether the content of the answer is practical, has strong operability, and can directly and effectively solve users’ problems or needs	1–10
Relevance	Whether the answers are closely related to the questions asked by users	1–10

Table 3. Statistics about the graph.

Data	Count	Note
Entities	289	Event (113), Organization (109), Uncategorized (41), Person (17), GEO (9)
Relationships	346	Relationships between entities
Communities	37	Communities identified through graph clustering
Community Reports	37	Automatically generated summary of each community
Text Units	85	Original segmented text blocks

Table 4. Comparison of model responses.

Question: In prefabricated concrete buildings, precast concrete (PC) composite floor slabs may develop cracks and leakage at the cast-in-place slab joints. Please recommend measures to address this issue.
DeepSeek	(1) Optimize node design: Strengthen the connection structure between cast-in-place strips and precast slabs, add crack-resistant steel mesh or fiber-reinforced materials to reduce shrinkage stress; (2) Control materials and construction: Use micro-expansion concrete or add anti-crack fibers to reduce shrinkage; ensure the base layer is moist before casting the in-place strips, compact in layers using vibration; strengthen curing, cover for moisture retention for ≥7 days; (3) Waterproofing: Add elastic sealant or waterproof membrane at joints and conduct water-tightness tests for verification; (4) Process control: Strictly check the stiffness of formwork supports, avoid premature formwork removal, and ensure coordinated deformation between cast-in-place strips and precast slabs. By controlling multiple aspects including design, materials, and construction, the risks of cracks and leakage can be effectively reduced.
GPT-4	(1) Improve construction quality: Whether it is concrete pouring or the installation of prefabricated panels, construction workers need high-quality construction skills. Training and education can be used to improve workers’ technical level and construction quality; (2) Strict quality supervision: During the construction process, quality supervision should be strengthened to ensure that all construction steps meet the prescribed standards. Problems should be identified and resolved in a timely manner; (3) Use high-quality materials: Choosing high-quality, accurately sized prefabricated panels and concrete can effectively prevent cracking and leakage problems; (4) Timely maintenance and repair: For existing problems, such as cracks and leaks, maintenance and repair should be carried out promptly to prevent further development of the issues; (5) Improve design: At the design stage, potential problems that may arise during construction should be considered, and structures with better crack and leak resistance should be designed.
Knowledge base-enhanced OM-GPT	(1) Local search ① Strengthen concrete vibration, especially at the composite slab strips, strictly follow the code requirements for vibrating, and no construction joints should be left in the slab strips during pouring; ② Strictly control the formwork flatness during formwork construction, set grout stop strips, and after the slab surface is cleaned and accepted as qualified, proceed with lifting work, while enhancing the rigidity of the lower support frame; ③ Strengthen curing, delay the application of construction loads on the upper layer, and postpone the removal of the lower frame; ④ Treatment measures: First, chisel open the crack, and chisel about five centimeters on both sides of the crack; then, ensure that the joint surface maintains a sloped shape, and chisel the surface and sides of the prefabricated slab. After chiseling, clean the area thoroughly, and then brush a layer of pure cement mortar inside the crack. (2) Global search Design Stage: ① Optimize structural design: Fully consider the connection methods between PC components and cast-in-place sections. Use appropriate node designs to ensure firm connections and reduce cracks and leakage caused by stress concentration at joints; ② Add waterproof layers: Install reliable waterproof layers at the composite slab seams, such as SBS membranes or polyurethane coatings, clearly specifying the technical parameters and usage requirements of the waterproof materials in the construction drawings; ③ Refine structural detailing: Add water-stop strips at the composite slab seams and optimize the concrete vibration process parameters to ensure controllable construction quality; ④ Establish a ‘structural self-waterproofing + material waterproofing’ dual-system: apply polymer self-adhesive waterproof membranes in the composite layers and incorporate expansive anti-crack waterproofing agents into the cast-in-place layer, creating a dense millimeter-level waterproof barrier. 
Construction Stage: ① Strictly control the installation accuracy of PC components, ensure accurate alignment of joints, and reduce leakage and cracks caused by misalignment; ② Strengthen the management of template engineering, use high-quality template materials, seal the joints of the templates properly, and pay special attention to the reinforcement of the templates at the laminated board strips; ③ Strengthen concrete vibration (especially at the composite panel strip), strictly follow the specifications to vibrate until the surface is bleeding and no bubbles emerge; ④ During pouring, no construction joints shall be left at the strip, and the position of construction joints shall be arranged reasonably to avoid easily stressed areas; ⑤ Improve the quality of concrete, select high-quality aggregates and additives, ensure accurate mix proportions, and fully wet the surface of PC components before pouring; ⑥ Clean the surface of the formwork before installation and pass the acceptance inspection, strictly control the flatness of the formwork. ⑦ Deploy millimeter level laser displacement detection system to achieve real-time monitoring of PC component installation accuracy. Post maintenance Stage: ① Regular inspection and maintenance: Establish a comprehensive inspection system to promptly detect and address issues such as cracks and leaks. For cracks that have already appeared, epoxy resin or other specialized materials can be used for sealing and reinforcement; ② Strengthen maintenance work: Ensure that the cast-in-place concrete reaches a certain strength before demoulding, and immediately carry out sufficient maintenance after demoulding to avoid cracks caused by drying shrinkage. 
③ Process integration innovation: adopting the combined process of “high-frequency vibration + vacuum dehydration” to reduce the porosity of concrete; Develop an intelligent vibration monitoring system to provide real-time feedback on vibration quality; ④ Develop phase change energy storage maintenance film, establish maintenance parameter database, and automatically adjust maintenance plan according to environmental temperature. (3) Tavily search ① A review of research on prefabricated assembled concrete composite floor slabs. This article systematically elaborates on the bending, shear, and connection performance of composite floor slabs, and points out the shortcomings in the development research of composite slabs, providing strong support for the derivation of China’s construction industrialization development. ② [Prefabricated Building Training] Prefabricated PC Components--Prefabricated Composite Plate-Zhihu. Composite Floor is a structural form that combines prefabricated floor slabs and cast-in-place floor slabs. The bottom uses prefabricated concrete thin plates as permanent templates, and the upper part is poured with cast-in-place concrete composite layers. The two parts of the concrete are subjected to overall force, forming an assembled integrated composite floor slab. ③ Research on Construction Technology of PC Prefabricated Components in Prefabricated Buildings-China Journal Network Research on Construction Technology of PC Prefabricated Components in Prefabricated Buildings. Shen Wei. Shanghai Chunli Construction Engineering Consulting Co., Ltd. (Shanghai, China).

Table 5. Summary of ANOVA test results.

Dimensions	Source of Variation	SS	df	MS	F	p-Value	F Crit
Logical consistency	Type of LLMs	103.4381	2	51.7190	38.9703	0.0000	3.0104
	Questions	3.0381	4	0.7595	0.5723	0.6828	2.3864
	Interaction	4.1333	8	0.5167	0.3893	0.9265	1.9534
	Within	816.1905	615	1.3271
	Total	926.8000	629
Sentence fluency	Type of LLMs	103.8698	2	51.9349	39.5635	0.0000	3.0104
	Questions	9.5651	4	2.3913	1.8216	0.1230	2.3864
	Interaction	5.4159	8	0.6770	0.5157	0.8450	1.9534
	Within	807.3095	615	1.3127
	Total	926.1603	629
Response completeness	Type of LLMs	285.2790	2	142.6400	95.4949	0.0000	3.0104
	Questions	4.6095	4	1.1524	0.7715	0.5440	2.3864
	Interaction	19.6571	8	2.4571	1.6450	0.1090	1.9534
	Within	918.6190	615	1.4937
	Total	1228.1700	629
Practical application	Type of LLMs	236.6890	2	118.3440	77.6616	0.0000	3.0104
	Questions	4.2222	4	1.0556	0.6927	0.5972	2.3864
	Interaction	13.6444	8	1.7056	1.1192	0.3480	1.9534
	Within	937.1670	615	1.5239
	Total	1191.72	629
Relevance	Type of LLMs	158.2600	2	79.1302	58.8826	0.0000	3.0104
	Questions	2.7365	4	0.6841	0.5091	0.7291	2.3864
	Interaction	5.1683	8	0.6460	0.4807	0.8702	1.9534
	Within	826.4760	615	1.3439
	Total	992.6410	629

Note: SS stands for sum of squares, df denotes degrees of freedom, MS represents mean square, F is the F-statistic, and F crit is the critical F-value.
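
The columns of Table 5 are linked by simple arithmetic: each mean square is MS = SS / df, and each F-statistic is the MS of that source divided by the within-group MS. The sketch below recomputes these values for the logical consistency block from the SS and df entries in the table. It is a consistency check under those definitions, not the analysis script used in this study. The helper name `anova_rows` is hypothetical, and p-values are not recomputed because that would require an F-distribution routine from a statistics library.

```python
def anova_rows(sources, within):
    """Return {source: (MS, F)} given {source: (SS, df)} and the within-group (SS, df)."""
    ss_within, df_within = within
    ms_within = ss_within / df_within
    rows = {}
    for name, (ss, df) in sources.items():
        ms = ss / df
        rows[name] = (ms, ms / ms_within)
    return rows


# SS and df values for "Logical consistency", copied from Table 5.
SOURCES = {
    "Type of LLMs": (103.4381, 2),
    "Questions": (3.0381, 4),
    "Interaction": (4.1333, 8),
}
WITHIN = (816.1905, 615)

if __name__ == "__main__":
    for name, (ms, f_stat) in anova_rows(SOURCES, WITHIN).items():
        print(f"{name}: MS = {ms:.4f}, F = {f_stat:.4f}")
    total_ss = sum(ss for ss, _ in SOURCES.values()) + WITHIN[0]
    total_df = sum(df for _, df in SOURCES.values()) + WITHIN[1]
    print(f"Total: SS = {total_ss:.4f}, df = {total_df}")
```

An effect is significant at the chosen level when its F exceeds the F crit value in the same row, which in Table 5 holds only for the type of LLMs.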


Share and Cite

MDPI and ACS Style

Sun, L.; Zou, L.; Zhang, Y.; Flood, I. OM-GPT: A Knowledge-Augmented and Fine-Tuned Large Language Model for Prefabricated Building Operation and Maintenance Management. Buildings 2026, 16, 1429. https://doi.org/10.3390/buildings16071429

