Intelligent Generation of Construction Technology Disclosure Plans for Deep Foundation Pit Engineering Based on Multimodal Knowledge Graphs

Yang, Ninghui; Xu, Na; Zhong, Dongqing; Guo, Jin

doi:10.3390/buildings15183264

¹ Traffic Engineering Department, Yancheng Institute of Technology, Yancheng 224051, China

² School of Mechanics and Civil Engineering, China University of Mining and Technology, Xuzhou 221008, China

* Author to whom correspondence should be addressed.

Buildings 2025, 15(18), 3264; https://doi.org/10.3390/buildings15183264

Submission received: 28 July 2025 / Revised: 31 August 2025 / Accepted: 2 September 2025 / Published: 10 September 2025

(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

To address the challenges in multimodal information integration and the inefficiency of knowledge transfer in the construction technology disclosure of deep foundation pit projects, an intelligent generation method based on graph rule reasoning and template mapping was proposed. First, a multi-level domain knowledge structure model was constructed by designing domain concepts and relationship types using the Work Breakdown Structure (WBS). Second, entity and attribute extraction was performed using regular expressions and the BERT-BiLSTM-CRF model, while relationship extraction was conducted based on text structure combined with the BERT-CNN model. For image and video data, cross-modal data chains were built by adding keyword tags and generating URLs, utilizing semantic association rules to form a multimodal knowledge graph of the domain. Finally, based on graph reasoning and template mapping technology, the intelligent generation of construction disclosure schemes was realized. The case verification results showed that the proposed method significantly improved the structural integrity, procedural logical consistency, parameter traceability, knowledge reuse rate, environmental compliance, and parameter compliance of the schemes. This method not only promoted the standardization and efficiency of construction technology disclosure activities for deep foundation pit projects but also enhanced the visualization and intelligence level of the schemes.

Keywords:

deep foundation pit engineering; construction technology disclosure; multimodal knowledge graph; intelligent scheme generation

1. Introduction

With the continued acceleration of urbanization in China, deep foundation pit engineering has become a key component of urban underground space development, and the quality and efficiency of its construction technology disclosure bear directly on project safety and benefits. According to the latest data from the China Statistical Yearbook 2024, China’s urbanization rate increased from 36.2% in 2000 to 67.0% in 2024, with an annual growth rate of 2.6% [1]. This rapid urbanization has led to a large number of high-rise building, rail transportation, and underground integrated pipe gallery projects. These projects often cross complex geological units, such as soft soils, gravel and sand, and high-pressure water layers, and their construction safety is directly related to the stability of the urban lifeline system. This complexity introduces uncertainties in geotechnical conditions (e.g., variability in soil stratigraphy, groundwater levels, and unexpected geological anomalies), organizational factors (e.g., coordination efficiency among design, supervision, and construction teams), and technological implementation (e.g., equipment performance stability and adaptability of construction methods), all of which may hinder effective risk control and the timely, accurate delivery of construction technology disclosure. Against this background, the traditional construction technology disclosure model, which relies on paper documents and manual explanations, can no longer meet the high demands for knowledge transfer efficiency and accuracy in modern deep foundation pit engineering [2].

Construction technology disclosure for deep foundation pit projects currently faces multiple challenges. Field surveys on six deep foundation pit projects (2021–2023) revealed that preparing a complete disclosure package took over 18 working days on average, with about 12 manual revisions per project due to inconsistencies in multi-source data. Over 40% of documents contained outdated parameters, extending safety review cycles by 5–7 days. Additionally, 55% of technical managers cited the lack of an integrated, intelligent system as a key cause of inefficiency, while 62% attributed frequent on-site misinterpretations to the absence of multimodal information linkage. These findings underscore the need for a knowledge-driven, intelligent disclosure model, and three problems stand out. First, disclosure knowledge is typically multimodal, spanning text specifications, design drawings, construction videos, monitoring data, and other forms, and these heterogeneous data lack an effective integration mechanism. Research indicates that over 65% of construction accidents originate from distorted information transmission or misunderstanding of multi-source information. Second, the traditional disclosure process relies heavily on personal experience. Industry surveys show that about 78% of construction companies still prepare disclosure documents manually, resulting in a knowledge reuse rate of less than 30% [3]. Third, and more critically, the volume of disclosure content grows sharply as project scale expands and geological conditions become more complex. For example, in a large-scale subway construction project in a major city, the disclosure document for a single deep foundation pit station exceeded 500 pages, highlighting the growing contradiction between manual processing efficiency and engineering demands [4].

Multimodal knowledge graph technology offers a new approach to solving the above problems. It integrates text, images, videos, and other diverse data through a structured semantic network, enabling the systematic organization of knowledge and intelligent reasoning over it. In fields such as healthcare and finance, multimodal knowledge graphs have demonstrated strong capabilities in knowledge integration. For instance, the medical knowledge graph in Google Health connected clinical guidelines, imaging data, and case records, improving diagnostic accuracy by more than 40% [5]. The medical multimodal relational graph learning method developed by Hu et al. (2024), though targeting the healthcare domain, offers valuable insights for cross-modal relationship modeling in construction scenarios [6]. Meanwhile, Huan et al. (2025) systematically evaluated the application of BIM in elderly care facility management, emphasizing the foundational role of information standardization in multi-source data fusion, a consideration equally applicable to data governance in construction knowledge graphs [7]. However, in construction engineering, and particularly in the specialized area of deep foundation pit engineering, research on applying multimodal knowledge graphs is still in its early stages. Existing literature indicates that construction technology disclosure in deep foundation pit engineering is not yet sufficiently intelligent, and that disclosure knowledge for the various types of deep foundation pit projects remains independent and scattered. The integration and standardization of this knowledge therefore need to be accelerated.

To address the above issues, this study aims to solve several key problems in the field of deep foundation pit construction technology disclosure. Specifically, the objectives are: (1) to automate the generation of disclosure schemes by integrating multimodal domain knowledge, thereby reducing reliance on manual preparation and improving efficiency; (2) to achieve semantic modeling of the sequence of construction operations using a multimodal knowledge graph, enabling the capture of interdependencies among tasks, resources, and constraints; and (3) to enhance the reproducibility and standardization of technological solutions by linking historical cases, standard specifications, and real-time project data within an intelligent generation framework.

2. Literature Review

This section systematically reviews the research status of construction technology disclosure in deep foundation pit engineering, the application progress of knowledge graphs in the construction field, and the key technologies for constructing multimodal knowledge graphs. The aim is to provide a theoretical foundation and technical reference for the intelligent generation of deep foundation pit engineering construction technology disclosure schemes based on multimodal knowledge graphs.

2.1. Research Status of Construction Technology Disclosure in Deep Foundation Pit Engineering

Construction technology disclosure in deep foundation pit engineering was a key process to ensure project quality, safety, and progress. Its core objective was to accurately and clearly convey design intentions, construction plans, safety standards, and risk control measures to construction personnel. Traditional disclosure methods mainly relied on text, drawings, verbal explanations, and on-site demonstration, but this model faced numerous challenges [8]. First, the issue of information silos was particularly prominent. Design information, geological survey data, construction process data, and monitoring data were scattered across different systems and documents, making it difficult to form a unified and coherent knowledge system. This led to difficulties in information retrieval and low knowledge reuse [9]. Second, the efficiency of knowledge transfer in traditional methods was low. As information was often transmitted in a one-way direction and lacked effective feedback mechanisms and interactivity, construction personnel’s understanding of the information could not be assessed in real-time, which may have led to misunderstandings or construction errors. Third, there was a lack of personalization and adaptability. The needs for construction technology disclosure varied depending on different personnel, processes, and site conditions. However, traditional methods often provided a “one-size-fits-all” approach, resulting in redundant information and missing critical details. Finally, poor information transmission or misunderstanding could lead to operational errors by construction personnel, triggering safety accidents or quality issues, causing irreversible damage to the project [10].

To address the shortcomings of traditional disclosure methods, researchers began exploring the integration of Building Information Modeling (BIM) technology into deep foundation pit engineering management. BIM provided three-dimensional visualization capabilities, which could intuitively display the pit structure and construction process, thus improving the efficiency of technology disclosure to some extent [11]. However, BIM models mainly focused on geometric and attribute information, and there were still shortcomings in integrating deep semantic knowledge, reasoning capabilities, and multimodal heterogeneous data. As a result, achieving truly knowledge-driven intelligent disclosure remained difficult. Yue Pan et al. (2025) developed a spatiotemporal deep learning model for multi-attribute prediction of excavation-induced risks, providing an intelligent tool for risk pre-control in technical briefings [12]. Meanwhile, Zhang et al. (2022) achieved automatic hazard identification on construction sites by integrating building scene graphs with domain-specific BERT knowledge, which can be directly applied to safety warnings in briefing content [13]. Collectively, these studies lay the foundation for transforming technical briefings from static documents into dynamic interactions and from experience-based judgments to data-driven decision-making. Therefore, constructing a technology disclosure system that could integrate multimodal information and support intelligent reasoning became an important research direction in the field of deep foundation pit engineering.

2.2. Application Progress of Knowledge Graphs in the Construction Field

As a structured knowledge representation method, knowledge graphs were able to describe concepts, entities, and their relationships in the objective world through graphs, providing strong support for intelligent information processing and knowledge reasoning. In recent years, significant progress has been made in the application of knowledge graphs in the construction field, primarily in the following aspects:

(1): Construction Knowledge Management and Retrieval: Knowledge graphs were used to structurally integrate construction knowledge scattered across various forms of text and non-text data, such as design drawings, specifications, construction logs, and accident reports, forming a vast knowledge network. By modeling entities and relationships, efficient retrieval and management of complex construction knowledge could be achieved [14]. For example, users were able to query the materials required for a specific construction process, related regulatory clauses, potential risks, and historical successful cases through the knowledge graph, significantly improving knowledge reuse and information retrieval efficiency. Hou et al. (2024) developed the DDE KG Editor system, which supports dynamic updates and quality control of large-scale geotechnical construction knowledge through collaborative editing and intelligent assistance functions [15]. Meanwhile, Lee and Lee (2024) constructed a construction risk assessment knowledge base by integrating BERT with graph models, enabling semantic organization of risk knowledge and thereby enhancing the relevance of briefing content [16].
(2): Intelligent Question Answering and Decision Support: Knowledge graph-based intelligent question answering systems were developed to understand users’ natural language queries and provide precise answers and decision support through the reasoning capabilities of the knowledge graph. On construction sites, managers or workers could ask the system questions such as, “What type of deep foundation pit support should be used under these soil conditions?” or “How should groundwater seepage be handled in a foundation pit?”. The system would retrieve and infer the most relevant solutions from the knowledge graph [17]. Zhou et al. (2025) enhanced the accuracy of construction project management question-answering systems by augmenting general-purpose large language models with domain-specific multimodal knowledge graphs—an approach that can be effectively applied to intelligent Q&A during technical briefings [18]. Complementing this, Wang et al. (2025) developed a hybrid retrieval-augmented generation framework that improves knowledge management in construction engineering, providing technical support for the automated generation of briefing documents [19].
(3): Construction Safety Management: Wu et al. (2023) developed a multimodal knowledge graph-based intelligent fault diagnosis model for industrial equipment, demonstrating the effectiveness of knowledge graphs in complex system state recognition, an approach transferable to deep excavation equipment management [20]. Jiang et al. (2021) established a construction safety standard knowledge graph system enabling associative querying and intelligent push of safety regulations, providing an efficient tool for safety standard referencing in technical briefings [21]. The scene graph-BERT integration method proposed by Zhang et al. (2022) essentially constructs a visual knowledge graph of construction sites, which can be directly applied to visual presentation of briefing content [13].

2.3. Key Technologies for Constructing Multimodal Knowledge Graphs

As an extension of traditional knowledge graphs, multimodal knowledge graphs were able to integrate various forms of information, such as text, images, videos, and sensor data, providing more comprehensive knowledge support for deep foundation pit construction technology disclosure. Multimodal knowledge acquisition and integration were key technological steps in the construction process.

Regarding knowledge acquisition and fusion, Saha et al. (2019) investigated complex procedure induction methods capable of querying and constructing procedures from knowledge bases without gold standards, making it applicable for extracting briefing workflows from historical construction cases [22]. Kommineni et al. (2024) proposed an LLM-supported methodology for ontology and knowledge graph construction from human experts, leveraging large language models to reduce the manual burden of knowledge acquisition [9]. Complementing these approaches, Hou et al. (2024) developed the geoscience knowledge graph editing system DDE KG Editor, which provides a solution for multi-source geological data fusion adaptable to stratigraphic data integration in deep excavation projects [15].

In terms of construction methods, the core challenge in constructing multimodal knowledge graphs is achieving semantic alignment and unified representation of data from different modalities. Lee and Lee (2024) established a construction risk assessment knowledge base by integrating BERT with graph models, which reduces the reliance on manual experience found in traditional approaches by automatically extracting entity relationships from unstructured data [16]. This technique is particularly suitable for complex risk scenarios in deep excavation projects, enabling real-time correlation analysis of key indicators such as supporting structure deformation and groundwater level fluctuations. Complementing this, Song et al. (2023) proposed a scenario-driven approach that combines knowledge engineering with large language models to achieve multimodal knowledge injection for indoor robot functionality [23]. Their solution for improving data collection efficiency can be directly transferred to technical briefing scenarios in deep excavation engineering.

Knowledge representation and reasoning were the core elements that enabled multimodal knowledge graphs to deliver value. Chen et al. (2024) proposed an ontology-based approach for constructing risk knowledge bases, achieving standardized representation of risk knowledge through a six-step methodology [24]. Their rule-based reasoning mechanism enables automatic generation of prevention measures, providing a structured paradigm for intelligent briefing content generation. Similarly, Pan et al. (2025) [25] developed a digital twin foundation pit model (DTFPM) employing parametric modeling and inverse analysis algorithms, which facilitates dynamic optimization of briefing solutions. The model’s prediction error within 10% validates the accuracy advantage of intelligent systems in solution development. Collectively, these studies demonstrate that integrating knowledge graphs with real-time analysis technologies represents a core pathway for enhancing briefing quality.

Particularly noteworthy is the emerging application of multimodal knowledge graphs in the construction domain. X. Zhu et al. (2022) systematically reviewed the construction and application of multimodal knowledge graphs, highlighting that integrating multi-source data (text, images, videos, etc.) is crucial for enhancing knowledge representation completeness [11]. Song et al. (2024) proposed a scenario-driven approach for constructing multimodal knowledge graphs for embodied AI, whose scene understanding capability can be directly applied to construction environment modeling [23]. Similarly, Chen et al. (2023) surveyed recent advances in multimodal knowledge graphs, emphasizing the importance of multimodal fusion for comprehensive construction knowledge representation [26]. These studies provide theoretical support for multimodal representation in deep excavation technical briefings.

The application of knowledge graphs in construction has evolved from simple knowledge representation to complex reasoning, multimodal fusion, and dynamic updating [27]. Tam et al. (2022) developed a multi-order convolutional network-based entity alignment method that addresses the integration of construction knowledge graphs from diverse sources, facilitating more comprehensive briefing knowledge bases [28]. Roll et al. (2025) implemented graph-based retrieval-augmented generation using Neo4j, enhancing domain knowledge acquisition in multimodal large language models—an architecture adaptable for personalized briefing generation [29]. These technological advances establish the foundation for building a knowledge service system for deep excavation technical briefings.

A review of the current state revealed that traditional disclosure models had significant limitations in terms of information completeness, transmission efficiency, and personalization. The progress of knowledge graph applications in the construction field demonstrated its enormous potential in areas such as knowledge management, intelligent question answering, process monitoring, and safety management. Therefore, constructing multimodal knowledge graphs and fully utilizing the complex multimodal data on construction sites had become an important direction for the further development of knowledge graphs in the construction field.

3. Research Methods

In response to the multi-source heterogeneity of the technology disclosure data and the richness of domain knowledge, a knowledge structure model was designed. Considering the complexity, diversity of formats, and dispersion of construction documents in the deep foundation pit engineering domain, entity, relationship, and attribute knowledge extraction methods with domain adaptability were developed, and a textual knowledge graph was constructed. Through multimodal linking technology, entities from images, videos, and BIM models were connected to the textual knowledge graph, thus forming a multimodal knowledge graph. The overall process is shown in Figure 1. The methodology integrates a domain-specific hierarchical knowledge system structured around the Work Breakdown Structure (WBS) and national technical standards. Named entities and relations are extracted from multimodal engineering texts using deep learning models such as BERT-BiLSTM-CRF and BERT-CNN. Candidate disclosure cases are retrieved using a hybrid similarity computation based on Jaccard and Word2Vec embeddings. Knowledge inference is performed through multi-hop traversal of a Neo4j-based knowledge graph, enabling the system to extract procedural steps, safety constraints, and material attributes. The inferred content is finally mapped into structured disclosure documents via a rule-driven template mechanism. This end-to-end architecture ensures that raw project information can be automatically transformed into standardized, context-aware construction technology disclosure (CTD) plans.
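The last two stages described above, multi-hop traversal followed by rule-driven template mapping, can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the triples, the relation names (`hasStep`, `nextStep`, `hasSafetyConstraint`, `usesMaterial`), and the template are hypothetical, and a plain Python dictionary stands in for the Neo4j graph.

```python
from collections import deque

# Hypothetical toy triples (head, relation, tail); not taken from the paper's data.
TRIPLES = [
    ("Bored pile construction", "hasStep", "Hole drilling"),
    ("Hole drilling", "nextStep", "Rebar cage placement"),
    ("Rebar cage placement", "nextStep", "Concrete pouring"),
    ("Hole drilling", "hasSafetyConstraint", "Keep personnel clear of the rig"),
    ("Concrete pouring", "usesMaterial", "C30 concrete"),
]

# Hypothetical plain-text template for one disclosure section.
TEMPLATE = (
    "Process: {process}\n"
    "Steps: {steps}\n"
    "Safety: {safety}\n"
    "Materials: {materials}"
)


def build_adjacency(triples):
    """Index triples by head entity for fast neighbour lookup."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def multi_hop(adjacency, start, max_hops):
    """Breadth-first traversal collecting every fact within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, tail in adjacency.get(node, []):
            facts.append((depth + 1, node, relation, tail))
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, depth + 1))
    return facts


def generate_plan(process, max_hops=4):
    """Map the traversed facts into the template slots by relation type."""
    facts = multi_hop(build_adjacency(TRIPLES), process, max_hops)
    steps = [t for _, _, r, t in facts if r in ("hasStep", "nextStep")]
    safety = [t for _, _, r, t in facts if r == "hasSafetyConstraint"]
    materials = [t for _, _, r, t in facts if r == "usesMaterial"]
    return TEMPLATE.format(
        process=process,
        steps=" -> ".join(steps) or "none found",
        safety="; ".join(safety) or "none found",
        materials="; ".join(materials) or "none found",
    )


plan = generate_plan("Bored pile construction")
print(plan)
```

Lowering `max_hops` shortens the plan: with a limit of 2 the material fact, which sits four hops from the process node in this toy graph, is no longer reached and the slot falls back to "none found".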

3.1. Construction of Multimodal Knowledge Graph

This study systematically reviewed technical disclosure cases and standard specification knowledge to construct a classification system for the deep foundation pit construction technology disclosure knowledge domain. This resulted in the formation of a multimodal knowledge structure model tailored for the Technical Disclosure for Deep Foundation Pit Construction (TDFPC) domain. Based on the hierarchical progression of domain knowledge and the analysis of entity concept types, a hierarchical TDFPC multimodal knowledge structure model with an “entity-relationship-attribute” framework was established (Figure 2). In this model, organizations were assigned a guiding role in engineering project construction, while the project itself included basic attributes such as the project name and location, and was further detailed with entity information like geological hydrology and surrounding environment. As a critical component of guiding construction activities, technical disclosure covered multiple aspects, including construction preparation, construction processes, quality control, safety management, civilized construction, and environmental protection. Notably, technical disclosure content included not only textual information but also supported video information, such as using videos to explain construction processes in detail. Specifically, for video data, keyframes were manually extracted and annotated with construction activity labels, which were then aligned with textual process descriptions through temporal segmentation. For BIM elements, IFC component attributes (e.g., element type, location, material) were mapped to corresponding engineering entities based on ID matching and spatial context. These annotations were embedded into the knowledge graph as instances with cross-modal references. The entire knowledge structure model strictly followed the terminological rules and constraint requirements of the standard specifications.
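As a rough illustration of the "entity-relationship-attribute" framework, the sketch below models entities with a concept type and attributes, and checks each relation against a small schema. It assumes a simplified schema: `cTechnology` and `cVideo` are concept labels used in the paper, while `cOrganization`, `cProject`, the relation names `guides` and `hasDisclosure`, and all instance names are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Allowed (head concept, relation, tail concept) signatures.
# Only cTechnology and cVideo come from the paper; the rest is hypothetical.
SCHEMA = {
    ("cOrganization", "guides", "cProject"),
    ("cProject", "hasDisclosure", "cTechnology"),
    ("cTechnology", "cTechnology-cVideo", "cVideo"),
}


@dataclass
class Entity:
    name: str
    concept: str
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    head: str
    rel_type: str
    tail: str


class KnowledgeModel:
    """Stores entities and only those relations the schema permits."""

    def __init__(self, schema):
        self.schema = schema
        self.entities = {}
        self.relations = []

    def add_entity(self, entity):
        self.entities[entity.name] = entity

    def add_relation(self, head, rel_type, tail):
        signature = (
            self.entities[head].concept,
            rel_type,
            self.entities[tail].concept,
        )
        if signature not in self.schema:
            raise ValueError(f"relation not allowed by schema: {signature}")
        self.relations.append(Relation(head, rel_type, tail))


# Hypothetical instances for demonstration only.
model = KnowledgeModel(SCHEMA)
model.add_entity(Entity("Contractor X", "cOrganization"))
model.add_entity(Entity("Station pit", "cProject", {"location": "not specified"}))
model.add_entity(Entity("Bored pile construction", "cTechnology"))
model.add_relation("Contractor X", "guides", "Station pit")
model.add_relation("Station pit", "hasDisclosure", "Bored pile construction")
```

Validating relations against concept-level signatures is one simple way to enforce the constraint requirements of a schema; the paper itself does not state how its constraints are checked.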

3.2. Knowledge Extraction

3.2.1. Knowledge Extraction Strategy

Based on the feature analysis of textual data in the deep foundation pit engineering construction technology disclosure domain, the textual corpus was classified into two categories based on the degree of information organization: highly structured text and weakly structured text. Highly structured text (e.g., standard specification documents and domain-specific books) had clear chapter divisions and terminology definitions, with information presented in a highly standardized format. Weakly structured text (e.g., construction technology disclosure records and special construction plans) had freer language expressions and required natural language processing (NLP) techniques to extract key information. To meet the extraction needs of professional concepts in this domain, this paper followed the knowledge structure model constructed in Figure 2, and systematically outlined the knowledge content characteristics, level of structuring, and corresponding extraction strategies of technical disclosure documents and standard specification documents through Table 1.

3.2.2. Entity Extraction Based on the BERT-BiLSTM-CRF Model

Entity extraction, as a key step in knowledge graph construction, aims to extract structured knowledge units from data sources and build a knowledge network. This paper adopted a hybrid deep learning architecture for automated entity extraction, specifically the combined BERT-BiLSTM-CRF model. This choice rests on three considerations. First, the pre-trained language model BERT generates context-aware vector representations learned from large-scale corpora, which capture the semantic and syntactic features of natural language. Second, BiLSTM effectively models bidirectional sequential features, but on its own it struggles to account for transition rules between labels during sequence labeling. Third, to address this weakness, a Conditional Random Field (CRF) is attached at the end of the network to provide globally optimal decoding. By passing BERT’s contextual embeddings through the BiLSTM and into the CRF, the model inherits the semantic representation advantages of the pre-trained model while benefiting from the label transition constraints of the CRF, forming an end-to-end joint learning framework. This multi-layered architecture not only reduces the complexity of feature engineering but also allows the parameters of all components to be optimized jointly via gradient backpropagation [30]. The training process based on the BERT-BiLSTM-CRF model is shown in Figure 3.

In the BERT-BiLSTM-CRF architecture model, the layers collaborated to complete the Named Entity Recognition (NER) task. The specific functions and processing flow were as follows:

(1): Data Input Layer: This layer was responsible for receiving the raw input data and converting it into a format that could be processed by the model. The process included preprocessing steps such as tokenization, adding special markers, and segmentation, ensuring that the subsequent embedding layer could efficiently and accurately process the input data.
(2): Feature Representation Embedding Layer (BERT Layer): The character sequence {x₁, x₂, x₃, …, xᵢ, …} processed by the data input layer was fed into the BERT pre-trained model. The BERT model mapped each character xᵢ to a numerical vector vᵢ, realizing the conversion from characters to vectors.
(3): Bidirectional Long Short-Term Memory Network Layer (BiLSTM Layer): This layer contained two LSTM networks, which processed the feature representation embeddings vᵢ from both forward and backward directions. The results from both directions were concatenated to generate a new sequence representation. This new sequence was the output of this layer and contained scores rᵢ for each category label. The bidirectional LSTM design allowed the model to capture contextual information at each position, thus enabling a more comprehensive and in-depth analysis of the text information.
(4): Conditional Random Field Layer (CRF Layer): The CRF layer utilized the constraints learned by the model to complete the label prediction task. By combining the category label scores rᵢ from the BiLSTM layer, the final probability values of the entity labels were computed.
(5): Result Output Layer: The model integrated the probability values output by the CRF layer to determine the entity label for each character oᵢ and provided the final entity label results.
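The role of the CRF layer in step (4) can be made concrete with a small decoding sketch. It covers only inference: given per-character label scores (standing in for the BiLSTM outputs) and label transition scores, Viterbi decoding returns the highest-scoring label sequence. All numbers are hand-set for illustration, whereas in the actual model they would be learned; the B/I/O label set is an assumption, since the paper's tagging scheme is not stated in this section.

```python
LABELS = ["B", "I", "O"]  # assumed tagging scheme: Begin, Inside, Outside

# Hand-set emission scores for three characters (illustrative only).
EMISSIONS = [
    {"B": 1.0, "I": 0.2, "O": 1.2},
    {"B": 0.1, "I": 2.0, "O": 0.3},
    {"B": 0.1, "I": 0.2, "O": 1.5},
]

FORBIDDEN = -1e4  # large penalty standing in for a learned constraint

# Transition scores: every move is neutral except O -> I,
# because an entity cannot continue without having begun.
TRANSITIONS = {(p, y): 0.0 for p in LABELS for y in LABELS}
TRANSITIONS[("O", "I")] = FORBIDDEN

# A sequence may not start inside an entity either.
START = {"B": 0.0, "I": FORBIDDEN, "O": 0.0}


def viterbi(emissions, transitions, start, labels):
    """Return the label path with the highest total score."""
    n = len(emissions)
    score = [{} for _ in range(n)]
    back = [{} for _ in range(n)]
    for y in labels:
        score[0][y] = start[y] + emissions[0][y]
    for i in range(1, n):
        for y in labels:
            best_prev = max(
                labels, key=lambda p: score[i - 1][p] + transitions[(p, y)]
            )
            score[i][y] = (
                score[i - 1][best_prev]
                + transitions[(best_prev, y)]
                + emissions[i][y]
            )
            back[i][y] = best_prev
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: score[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Per-position argmax ignores transitions and can yield an invalid sequence.
greedy = [max(LABELS, key=e.get) for e in EMISSIONS]
decoded = viterbi(EMISSIONS, TRANSITIONS, START, LABELS)
print(greedy, decoded)
```

Here the per-position argmax yields O, I, O, which places an "inside" tag after an "outside" tag, while the constrained decoding yields B, I, O. This is the kind of label-transition error that motivates adding a CRF on top of the BiLSTM.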

3.2.3. Entity Extraction Based on Regular Expression Template Matching

Regular expression template matching is a rule-based data extraction technique. Before extraction, dedicated rules are designed according to the characteristics of the target data, and the data source is then scanned comprehensively against these rules to filter out the information that meets the criteria. The method is particularly suitable for texts with clear structure and highly consistent expression and language style. For such texts, manual reading and analysis are required to distill the text patterns and construct the corresponding extraction rules. In this study, regular expression template matching was used for entity extraction from structured data, where it showed significant advantages over other methods in both accuracy and efficiency. The process of rule-based data extraction is shown in Figure 4.
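A minimal sketch of rule-based extraction with Python's standard `re` module is given below. The sample sentence, the pattern, and the unit list are hypothetical and much simpler than rules written for real specification text; they only illustrate how a hand-designed template turns consistently phrased text into (parameter, value, unit) records.

```python
import re

# Hypothetical specification-style sentence with a consistent "name: value unit" pattern.
SAMPLE = "4.2.1 Pile diameter: 800 mm; pile length: 18.5 m; embedded depth: 6 m."

# One extraction rule: a parameter name, a colon, a number, and a unit.
# "mm" is listed before "m" so the longer unit is tried first.
PARAMETER_RULE = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z ]*?)\s*:\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mm|m)\b"
)


def extract_parameters(text):
    """Scan the text and return (name, value, unit) tuples for every rule match."""
    return [
        (m.group("name").strip().lower(), float(m.group("value")), m.group("unit"))
        for m in PARAMETER_RULE.finditer(text)
    ]


parameters = extract_parameters(SAMPLE)
for name, value, unit in parameters:
    print(f"{name}: {value} {unit}")
```

Because the rule encodes the exact phrasing, it is precise on text that follows the pattern and silent on text that does not, which is why this approach is reserved for highly structured sources.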

3.3. Knowledge Fusion

This paper linked image data and video data to construction process entities in the knowledge graph by using the standard terminology of construction processes in technical disclosure. In the knowledge structure model constructed in Figure 2, image entities (cPicture) and video entities (cVideo) were added to represent image data and video data in deep foundation pit engineering construction technology disclosure, with attributes such as name and relative path (URL). At the same time, relationships were established between the image (cPicture) and video (cVideo) entities and entities such as construction processes, for example cTechnology-cPicture and cTechnology-cVideo. The resulting TDFPC knowledge graph contained multimodal knowledge topological structures, including process flow diagrams, structural construction drawings (2D), BIM model diagrams (3D), and operation videos, as shown in Figure 5. In the case of BIM model diagrams, we performed semantic extraction rather than using them solely as visual inputs. Specifically, component metadata and spatial relationships were parsed from IFC (Industry Foundation Classes) files, including entity types (e.g., retaining walls, foundation slabs), geometric attributes, and construction logic. These semantics were then mapped to corresponding entities and relationships in the knowledge graph, enabling reasoning over spatial dependencies and procedural interconnections in deep foundation pit engineering.
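The tag-based linking of images and videos to process entities can be sketched as follows. The relation names cTechnology-cPicture and cTechnology-cVideo come from the text; the process terms, media records, file paths, and the normalization rule are hypothetical, and exact matching on normalized terms is an assumption standing in for the paper's semantic association rules.

```python
# Hypothetical standard process terms acting as cTechnology entity names.
TECHNOLOGIES = ["bored pile construction", "dewatering well construction"]

# Hypothetical media records: keyword tags plus a relative path (URL).
MEDIA = [
    {
        "name": "pile_flow_chart",
        "kind": "cPicture",
        "url": "media/img/pile_flow_chart.png",
        "tags": ["bored pile construction", "process flow"],
    },
    {
        "name": "pile_operation_video",
        "kind": "cVideo",
        "url": "media/video/pile_operation.mp4",
        "tags": ["Bored  Pile Construction"],
    },
    {
        "name": "site_overview",
        "kind": "cPicture",
        "url": "media/img/site_overview.png",
        "tags": ["site layout"],
    },
]


def normalize(term):
    """Lower-case and collapse whitespace so tag variants compare equal."""
    return " ".join(term.lower().split())


def link_media(technologies, media):
    """Create (process, relation, media name) triples where a tag matches a process term."""
    terms = {normalize(t): t for t in technologies}
    triples = []
    for item in media:
        for tag in item["tags"]:
            technology = terms.get(normalize(tag))
            if technology is not None:
                relation = f"cTechnology-{item['kind']}"
                triples.append((technology, relation, item["name"]))
    return triples


links = link_media(TECHNOLOGIES, MEDIA)
for triple in links:
    print(triple)
```

Media whose tags match no standard term, such as `site_overview` here, stay unlinked, which makes gaps in the tagging vocabulary easy to spot.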

In the construction of the intelligent disclosure system for deep foundation pit engineering, the semantic fusion of multi-source heterogeneous data relied on two core technologies: semantic entity mapping and feature association. Feature similarity measurement served as the foundation for this process. Traditional methods, such as the Jaccard coefficient, although having high accuracy, suffered from insufficient semantic representation capabilities and limited recall performance. Currently, mainstream semantic similarity evaluation methods could be categorized into three types: matching strategies based on semantic knowledge bases; distributed semantic representation models [31]; and deep semantic reasoning models. This study adopted the Jaccard-Word2Vec hybrid model optimized in the word vector space. This model retained the accuracy advantage of the Jaccard coefficient while integrating distributed semantic features, effectively improving the semantic sensitivity and engineering adaptability of deep foundation pit construction attribute similarity calculations [31]. For example, in the deep foundation pit domain, although “support piles” and “retaining walls” were different entities, their vectors were semantically close in the vector space due to their co-occurrence in contexts such as “support structures.”
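The idea of combining lexical overlap with distributed semantics can be sketched as follows. The exact formulation of the hybrid model is not given here, so the linear combination, the weight `alpha`, and the toy three-dimensional vectors (standing in for a trained Word2Vec model) are all assumptions made for illustration.

```python
import math

# Toy word vectors standing in for a trained Word2Vec model (assumed values).
VECTORS = {
    "support": [0.9, 0.1, 0.0],
    "piles": [0.7, 0.3, 0.1],
    "retaining": [0.8, 0.2, 0.1],
    "walls": [0.6, 0.4, 0.1],
}


def jaccard(tokens_a, tokens_b):
    """Share of tokens the two phrases have in common."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def mean_vector(tokens):
    """Average the vectors of known tokens; None if no token is known."""
    vecs = [VECTORS[t] for t in tokens if t in VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_similarity(phrase_a, phrase_b, alpha=0.5):
    """Assumed hybrid: alpha * Jaccard + (1 - alpha) * cosine of mean vectors."""
    ta, tb = phrase_a.lower().split(), phrase_b.lower().split()
    va, vb = mean_vector(ta), mean_vector(tb)
    semantic = cosine(va, vb) if va is not None and vb is not None else 0.0
    return alpha * jaccard(ta, tb) + (1 - alpha) * semantic
```

With these toy vectors, "support piles" and "retaining walls" share no tokens, so their Jaccard coefficient is zero, yet the hybrid score remains well above zero because the averaged vectors point in nearly the same direction.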

3.4. Knowledge Storage

Currently, there were three main types of storage methods for knowledge graphs: relational databases, RDF triple databases, and native graph databases [32]. Given the application requirements of multimodal knowledge graphs in the deep foundation pit engineering technology disclosure domain, and considering the high demands for complex relationship expression in tasks such as information query efficiency, construction process modeling, and technology disclosure scheme generation, as well as the dynamic updating characteristics of the knowledge graph schema, this study selected a native graph database as the storage solution. Native graph database systems included several typical representatives, such as the open-source product Neo4j, the distributed architecture JanusGraph, OrientDB, and Cayley. Among them, Neo4j, as a widely applied open-source graph database solution in the industry, offered significant advantages, including independent operation, no reliance on external databases, and high graph traversal efficiency. Therefore, this study chose Neo4j as the storage engine for the TDFPC multimodal knowledge graph to manage core data such as entities, relationships, and attributes. To provide a clearer understanding of how domain-specific knowledge is structured within the TDFPC knowledge graph, several representative triplets are presented below (Table 2).

Neo4j provided three typical data import methods. First, nodes, edges, and attributes could be created manually through Cypher Query Language (CQL) statements, which was suitable for local fine-tuning of the graph but had lower efficiency. Second, the Cypher LOAD CSV clause could be used to load CSV files, enabling efficient batch import of entity nodes and relationships, which was suitable for large-scale data loading during the initial phase of graph construction. Third, data could be imported through the Python py2neo package, which was slower but more convenient for integration with downstream applications, making it suitable for updating and merging inference results and small batches of data. This study adopted the second method (batch import) to handle primary data such as technical documents and standard specifications, supplemented by the first method for fine-tuning the graph, while the third method was used to connect the subsequent graph inference and application stages.
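The batch CSV import route can be illustrated with a short sketch that serializes activity pairs to CSV text and pairs it with a Cypher statement of the kind that would load it. The file name, the column names, and the second row are assumptions; the `Activity` label and `isFollowedBy` relationship follow the query example in Section 3.5. The statement is shown as a string only and is not executed here.

```python
import csv
import io

# Hypothetical activity pairs; real rows would come from the extracted triples.
rows = [
    {"source": "Drilling", "target": "Reinforcement Cage Placement"},
    {"source": "Reinforcement Cage Placement", "target": "Concrete Pouring"},
]


def to_csv(records):
    """Serialize the records as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source", "target"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Cypher batch-import statement; the file name and column names are assumptions.
LOAD_STATEMENT = """
LOAD CSV WITH HEADERS FROM 'file:///activities.csv' AS row
MERGE (a:Activity {name: row.source})
MERGE (b:Activity {name: row.target})
MERGE (a)-[:isFollowedBy]->(b)
"""

csv_text = to_csv(rows)
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so re-running the load on an updated file would not duplicate existing nodes or relationships.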

3.5. Knowledge Inference

Graph retrieval technology was a relational query method based on a graph data structure model. Its core lay in parsing the multi-dimensional relationships between nodes, edges, and attributes in the knowledge graph to enable in-depth exploration of complex knowledge networks. In the deep foundation pit construction technology disclosure domain, this technology supported semantic-level associative retrieval of multi-source heterogeneous knowledge by constructing a knowledge graph containing entities such as construction plans, geological parameters, safety regulations, and their associations. Specifically, for technical requirements in deep foundation pit construction, such as support structure selection or deformation monitoring and early warning, the retrieval process could be divided into three stages. First, natural language processing converted the user query into a structured query vector, which was matched against the relevant entity nodes in the knowledge graph. Second, graph traversal algorithms (such as breadth-first search or random walk) explored the entity relationship paths connected to the initial node. For instance, starting from the “support structure” node, the system may traverse paths such as “applicable geological conditions → soil parameter thresholds → construction process flow.” A sample rule used to infer the next construction activity in the process chain can be expressed in Cypher as follows: MATCH (a:Activity)-[:isFollowedBy]->(b:Activity) WHERE a.name = 'Drilling' RETURN b.name. Executing this query on the graph returns “Reinforcement Cage Placement” as the immediate subsequent activity. These rule-based reasoning paths enable the dynamic construction of multi-step procedural chains from initial inputs. Finally, the retrieved knowledge entries were ranked by semantic similarity and filtered to support accurate construction decision-making. 
This technology overcame the limitations of traditional keyword matching, supported multi-hop reasoning and pattern matching, and could accurately retrieve complex construction decision knowledge, such as “deep foundation pit support schemes in soft soil areas.” Its advantages were: (1) It explicitly expressed domain knowledge associations through the graph structure, improving retrieval accuracy. (2) It supported cross-validation of construction plans and safety regulations, enhancing decision reliability. (3) The path-based retrieval process was interpretable, making it easier for technical personnel to understand the logic of knowledge associations. This method provided theoretical support and practical verification for intelligent construction technology assistance.
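To make the chain-building step concrete, the sketch below mimics the `isFollowedBy` lookup on a small in-memory mapping instead of a live Neo4j instance. Only the pair "Drilling" and "Reinforcement Cage Placement" comes from the text; the third activity is a hypothetical continuation added for illustration.

```python
# In-memory stand-in for isFollowedBy edges in the graph.
# Only the first pair appears in the text; the second is hypothetical.
IS_FOLLOWED_BY = {
    "Drilling": "Reinforcement Cage Placement",
    "Reinforcement Cage Placement": "Concrete Pouring",
}


def next_activity(name):
    """Return the immediate successor of an activity, or None at the end."""
    return IS_FOLLOWED_BY.get(name)


def procedural_chain(start):
    """Follow successor edges from the start activity until the chain ends."""
    chain, seen = [start], {start}
    current = next_activity(start)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = next_activity(current)
    return chain


chain = procedural_chain("Drilling")
```

Repeating the single-step lookup is what turns one rule into a multi-step procedural chain; the `seen` set simply guards against accidental cycles in the data.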

Deep foundation pit engineering involved multi-dimensional factors such as geological conditions, support structures, construction processes, and monitoring requirements, with complex non-linear associations between these elements. For example, geological conditions (such as soft soil foundation) not only affected support structure selection (such as double-row piles) but may also have triggered the need for dewatering. The dewatering plan had to further be linked to the surrounding environment (such as a nearby subway tunnel), leading to risk assessment requirements. These associations could not be covered by a single rule and needed to be gradually revealed through multi-hop reasoning. If only single-hop reasoning (e.g., soft soil foundation → double-row piles) had been relied upon, key decision points (such as the impact of dewatering on the subway tunnel) may have been overlooked, resulting in an incomplete plan. Therefore, this study adopted multi-hop reasoning rules for knowledge inference. In particular, the system included domain-specific reasoning rules to capture implicit dependencies related to safety management and environmental protection. For example, excavation near sensitive structures triggered multi-hop inference to retrieve relevant monitoring protocols and environmental safeguards, such as noise reduction measures or real-time deformation tracking requirements. This allowed the framework to retrieve not only structural sequences but also regulatory and compliance-related actions dynamically.
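The difference between single-hop and multi-hop reasoning in the soft soil example can be sketched with a hop-limited breadth-first search. The edge list below is an assumed encoding of the dependencies described in the paragraph, not an excerpt of the actual knowledge graph.

```python
from collections import deque

# Assumed encoding of the dependencies described in the text.
GRAPH = {
    "soft soil foundation": ["double-row piles", "dewatering"],
    "dewatering": ["nearby subway tunnel"],
    "nearby subway tunnel": ["risk assessment"],
}


def reachable(start, max_hops):
    """Breadth-first search returning {node: hop count} within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in GRAPH.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


single_hop = reachable("soft soil foundation", max_hops=1)
multi_hop = reachable("soft soil foundation", max_hops=3)
```

With a limit of one hop, only the support structure and the dewatering requirement are found; the risk assessment triggered by the subway tunnel appears only when the traversal is allowed three hops.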

4. Case Study

This study constructed an intelligent generation framework for construction technology disclosure schemes driven by multimodal knowledge, as illustrated in Figure 6. Before presenting the case implementation, we offer a detailed textual description of the algorithmic workflow underlying the proposed framework. The system operates through five key stages: (1) entity recognition, where construction-related terms and attributes are extracted from multimodal inputs; (2) case matching via similarity computation, which aligns current project data with historical disclosure cases based on weighted attribute similarity; (3) graph-based multi-hop inference, enabling retrieval of relevant procedural knowledge, specifications, and safety measures through semantic traversal of the knowledge graph; (4) template mapping, where extracted and inferred information is dynamically bound to structured disclosure templates; and (5) final document generation, which produces a standardized, regulation-compliant construction technology disclosure plan. This procedural breakdown clarifies the logical flow from unstructured input data to structured output, highlighting how the proposed framework integrates information retrieval, knowledge reasoning, and automated document synthesis in a unified architecture. Building upon this workflow, we first designed a modular technology disclosure scheme template based on domain requirements. Subsequently, entity and attribute recognition was conducted using a BERT-BiLSTM-CRF model, which identified key construction activity information such as project names, disclosure locations, construction methods, and section headings. To link these entities to relevant historical cases, keyword-based matching was initially performed within the knowledge graph. In cases where multiple candidates were retrieved, a refined similarity computation using a hybrid Cosine and Word2Vec-Jaccard algorithm was applied to rank historical cases. 
The top five most similar cases were retained, and the highest-scoring one was selected by default as the reference basis for graph inference and knowledge retrieval. Finally, the retrieved entities and associated attributes were integrated into the structured disclosure template to produce the final intelligent construction technology disclosure scheme.

4.1. Data Collection

This study collected textual data through official channels, such as the China Standard Resource Network (http://www.csres.com/) and the Jianbiao Database (www.jianbiaoku.com), obtaining standard specification documents related to deep foundation pit engineering. In addition, deep foundation pit construction technology disclosure documents were collected through site surveys, and relevant domain books were obtained from library research. Image data included process flow diagrams, detailed drawings of construction process nodes, construction machinery and material images, and BIM model diagrams. Video data included technology disclosure videos and safety education videos.

After acquiring the initial data, text data cleaning was performed as a core preprocessing step. The goal was to accurately identify and effectively correct errors, redundant information, and missing fields in the dataset, ensuring the integrity, consistency, and usability of the data. For image data, invalid data with resolutions lower than 64 × 64 pixels or higher than 2048 × 2048 pixels were removed, focusing on the preferred resolution range of 224 × 224 pixels to 512 × 512 pixels. For video data, an initial manual screening was conducted to remove videos unrelated to technology disclosure or outdated videos. Additionally, each video was named according to its content and labeled with key features and an incremental number for unique identification. Notably, BIM models were generated as image data and served as a supplement to the textual knowledge graph. Based on the above rules and combined with manual judgment, 230 valid, clean technology disclosure data entries were selected, as shown in Table 3. It is important to note that the 872 annotated documents used for training and evaluating the NER and RE modules constitute a superset that includes and extends beyond the 230 valid cases. The 230 selected cases represent a high-quality subset filtered from the broader annotated corpus based on completeness, multimodal richness, and domain applicability, and were specifically reserved for downstream intelligent generation and evaluation tasks.
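The image screening rule can be expressed as a simple predicate. This is a sketch that assumes the thresholds apply to width and height independently; the function names are illustrative.

```python
def is_valid_resolution(width, height, low=64, high=2048):
    """Keep images whose width and height both lie within [low, high] pixels."""
    return low <= width <= high and low <= height <= high


def is_preferred_resolution(width, height, low=224, high=512):
    """Flag images inside the preferred 224 to 512 pixel range."""
    return low <= width <= high and low <= height <= high


# Hypothetical (width, height) pairs for illustration.
candidates = [(32, 32), (224, 224), (1024, 768), (4096, 4096)]
kept = [size for size in candidates if is_valid_resolution(*size)]
preferred = [size for size in kept if is_preferred_resolution(*size)]
```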

The training corpus was annotated by three domain experts with at least five years of experience in construction engineering. To ensure annotation consistency, each sample was independently labeled by two annotators, with disagreements resolved by a third senior expert. Inter-annotator agreement was measured using Cohen’s Kappa coefficient, which reached 0.82, indicating strong consistency and reliability of the labeled data.
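For reference, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance. The sketch below implements the standard formula on a toy pair of label sequences; the labels are invented for illustration and are unrelated to the actual corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the two annotators' marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy annotations (assumed): the annotators disagree on the first item only.
annotator_a = ["ENT", "ENT", "O", "O", "ENT", "O", "ENT", "O", "O", "ENT"]
annotator_b = ["O", "ENT", "O", "O", "ENT", "O", "ENT", "O", "O", "ENT"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

In this toy case the observed agreement is 0.9 and the chance agreement is 0.5, giving a Kappa of 0.8, which is on the same scale as the 0.82 reported above.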

4.2. Experimental Environment

The BERT-BiLSTM-CRF Named Entity Recognition model developed in this study was implemented using the PyTorch 1.10 framework and Python language, running on the Windows 11 operating system. The system was equipped with an Intel® Core™ i7-9700F CPU @ 3.00 GHz, an NVIDIA GeForce RTX 2060 graphics card, and 16 GB of memory. The main software environment included Python 3.10.11 and PyTorch for model training and evaluation. TensorFlow 1.12 was additionally used to support a rule-based module involved in visual attribute extraction. The two frameworks were used independently and results were integrated through post-processing. The model used the Adam optimization algorithm and was configured with the following key parameters: an initial learning rate of 0.001, a batch size of 20, a CRF layer learning rate of 3 × 10⁻³, a BERT layer learning rate of 5 × 10⁻⁵, and 100 iterations for data processing. Other parameter settings were shown in Table 4. The choice of BERT-BiLSTM-CRF and BERT-CNN was guided by the specific characteristics of construction terminology and the limited availability of labeled data. While alternatives such as RoBERTa, T5, and SciBERT are powerful, they typically require larger annotated corpora for fine-tuning, which is difficult to obtain in this domain. By contrast, BERT-BiLSTM-CRF captures sequential dependencies and label constraints effectively, and BERT-CNN enhances the extraction of local semantic features from short technical expressions. Preliminary experiments indicated that these two models provided a more favorable balance of accuracy and efficiency under our dataset conditions, making them suitable choices for this study. The dataset consists of 872 annotated construction disclosure documents collected from real-world deep excavation projects. Among these, representative samples were drawn from projects in Hangzhou, Suzhou, and Nanjing, encompassing a variety of geological conditions such as soft clay, silty sand, and gravel layers. 
This geographic and geotechnical diversity allowed the evaluation to reflect practical variability across regions and strengthened the robustness of the findings. Further multi-city case validations are planned as part of ongoing research. The 872 annotated documents were split into training (698), validation (87), and test (87) sets. The annotation covered 17 entity types and 7 relation types, following a domain-specific guideline reviewed by experts. We used a fixed random seed of 42 to ensure reproducibility. Model training was conducted on an NVIDIA RTX 3090 GPU with a batch size of 32, learning rate of 3 × 10⁻⁵, and early stopping after 6 epochs. Each training session lasted approximately 2.5 h.
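A reproducible split with the reported sizes and seed can be sketched as follows. The shuffling procedure and the use of integer document identifiers are assumptions; only the set sizes (698/87/87) and the seed of 42 come from the text.

```python
import random


def split_documents(doc_ids, n_train=698, n_val=87, seed=42):
    """Shuffle with a fixed seed, then cut into train/validation/test parts."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Integer IDs stand in for the 872 annotated documents.
train, val, test = split_documents(range(872))
```

Using a dedicated `random.Random(seed)` instance keeps the split independent of any other random calls in the program, so the same partition is obtained on every run.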

4.3. Construction Activity Information Recognition

The special construction plan for deep foundation pit engineering was the basis for the construction technology disclosure of deep foundation pit projects. The content of the technology disclosure had to strictly follow the construction plan and could not deviate from or simplify key parameters, such as the design bearing capacity of the support structure. Technology disclosure was essential for the implementation of the construction plan. Therefore, before the intelligent generation of the technology disclosure scheme, the construction activity information in the special construction plan had to first be recognized.

In this section, the BERT-BiLSTM-CRF recognition model was used for identifying construction activity information, primarily identifying eight types of entities: project, project characteristics, construction process, construction materials, construction equipment, geological hydrology, surrounding environment, and construction activities. The attribute values of the deep foundation pit engineering special construction plan were divided into two levels: the first level included general project attributes applicable to the entire deep foundation pit engineering, such as foundation pit depth, soil layer type, and groundwater level; the second level included specialized engineering attributes related to the construction activities of the sub-projects to be disclosed. Different attributes were categorized into numeric and character types based on their values. Table 5 shows the selection of entities and attributes for the bored pile sub-project, where the general entities and attributes applied to case similarity calculations for all sub-projects, while the specialized attributes were used for calculating the similarity of the bored pile construction technology disclosure cases.

4.4. Calculation of Entity and Attribute Similarity for Technology Disclosure Cases

Taking the attributes of bored piles in Table 5 as an example, certain entities and attributes, such as project name and project location, did not participate in entity alignment and attribute matching. The recognition results were only used as the content output for the basic project information and project overview sections in the technology disclosure. In the case similarity measurement process, to improve the applicability and accuracy of the retrieval results, a differentiated feature weight allocation mechanism had to be constructed. By introducing a hierarchical priority strategy, different types of attributes were processed separately, achieving precise case matching. The method for determining feature weights could be divided into two categories: one was the direct weighting strategy based on domain expertise (e.g., expert experience method), which was efficient but subject to subjective bias; the other was a data-driven intelligent optimization method (e.g., genetic algorithm, simulated annealing algorithm), which required large-scale labeled datasets for parameter training. Given the current limitations of labeled data, this study adopted a direct weighting strategy based on engineering specifications and expert consensus. The priority sequence of feature attributes was determined through a structured decision-making process. For deep foundation pit engineering, general attributes controlled global engineering, so their weight was set to 0.6, with equal weight distribution within the general attributes. Specialized attributes reflected technical differences in various construction activities, so their weight was set to 0.4, with equal weight distribution within the specialized attributes. In the general attributes, the top five attributes with the most significant impact on construction technology disclosure were selected as the basis for attribute similarity calculation. 
These attributes were: foundation pit depth, safety level, soil layer type, surrounding environment distance, and groundwater level, which directly influenced deep foundation pit construction activities. Using bored piles as an example, the specific weights and selection criteria were shown in Table 6. The weight ratio of 0.6 (text) and 0.4 (visual) was determined through validation experiments to balance the dominant role of structured textual content with supportive visual information. Alternative ratios (e.g., 0.5/0.5 and 0.7/0.3) resulted in decreased matching accuracy. Attribute priority was based on frequency in annotated cases and regulatory importance. Future work will explore data-driven optimization strategies such as Bayesian tuning or evolutionary algorithms.
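The two-level weighting scheme (0.6 for general attributes, 0.4 for specialized attributes, with equal weights inside each group) can be sketched as below. The per-attribute similarity functions, the specialized attribute names, and the sample values are assumptions made for illustration; the actual weights and criteria are those listed in Table 6.

```python
GENERAL = ["pit_depth", "safety_level", "soil_type",
           "environment_distance", "groundwater_level"]
SPECIALIZED = ["pile_diameter", "pile_length"]  # hypothetical bored-pile attributes


def attribute_similarity(a, b):
    """Assumed rule: relative closeness for numbers, exact match for text."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        largest = max(abs(a), abs(b))
        if largest == 0:
            return 1.0
        return max(0.0, 1.0 - abs(a - b) / largest)
    return 1.0 if a == b else 0.0


def case_similarity(query, case, w_general=0.6, w_special=0.4):
    """Weighted sum of the group means, with equal weights inside each group."""
    general = sum(attribute_similarity(query[k], case[k]) for k in GENERAL) / len(GENERAL)
    special = sum(attribute_similarity(query[k], case[k]) for k in SPECIALIZED) / len(SPECIALIZED)
    return w_general * general + w_special * special


def top_cases(query, cases, k=5):
    """Rank historical cases by similarity and keep the k best."""
    scored = [(case_similarity(query, case), name) for name, case in cases.items()]
    return sorted(scored, reverse=True)[:k]


# Assumed sample project and two historical cases.
query = {"pit_depth": 12.0, "safety_level": "Level I", "soil_type": "soft clay",
         "environment_distance": 20.0, "groundwater_level": 3.0,
         "pile_diameter": 800, "pile_length": 25.0}
cases = {
    "case_A": dict(query),
    "case_B": dict(query, soil_type="silty sand"),
}
ranking = top_cases(query, cases)
```

Here `case_B` differs from the query only in soil layer type, so its general-group mean drops to 0.8 and its overall score to 0.6 × 0.8 + 0.4 × 1.0 = 0.88.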

4.5. Technology Disclosure Scheme Template Mapping

This study adopted an intelligent generation method based on scheme template mapping, which dynamically associated the knowledge graph in the deep foundation pit technology disclosure domain with structured templates, realizing an intelligent process from knowledge extraction to scheme generation. The scheme template mapping technology was a technical framework that dynamically matched the entities, relationships, and attributes in the knowledge graph with predefined scheme templates. The methodology explicitly incorporated Environmental Protection and Safety Management requirements as separate modules in the disclosure template. These modules were parameterized based on national standards (e.g., GB/T 50430–2017 [33] for environmental measures; GB 50116–2013 for safety protocols) and linked in the knowledge graph through dedicated nodes and relations (e.g., cTechnology–cSafety, cTechnology–cEnvironment). Process flow templates were constructed in two steps: (1) recurrent construction tasks and sequencing rules were extracted from disclosure records and standard specifications, and then formalized into parameterized templates; (2) these templates were validated with historical cases to ensure consistency. Visual information was linked to text through entity-level associations in the knowledge graph, where each construction step was bound to related images or videos using unique identifiers and semantic tags (e.g., cTechnology–cPicture, cTechnology–cVideo). In particular, video resources were linked to construction steps through time-stamped semantic tags (e.g., cTechnology–cVideo), enabling direct playback of specific operations associated with each task. BIM elements were connected via unique component identifiers in the knowledge graph, so that each construction activity could be cross-referenced with corresponding 3D objects (e.g., excavation volumes, support structures) in the BIM model. 
This multimodal alignment ensures that textual descriptions, visual evidence, and digital models are consistently integrated in the disclosure process. The reasoning process was implemented using Neo4j v4.4 with the official Python driver. Cypher queries were used to model and infer task dependencies such as prerequisite steps and conditional operations. Schema constraints were applied, and indices were created on key entity attributes (e.g., name, type) to optimize retrieval performance. To generate valid execution sequences, topological sorting was applied over the inferred task graph, ensuring logical coherence and regulatory consistency.

4.5.1. Scheme Template Design

The construction technology disclosure scheme for deep foundation pit engineering must meet the requirements of standardization, operability, and knowledge reusability in engineering practice. Therefore, in designing the technology disclosure scheme template, the semantic association characteristics of the knowledge graph were first utilized to layer the scheme modules, with each module corresponding to specific entity types and relational networks within the knowledge graph. Template fields are dynamically bound to knowledge graph node attributes via placeholders (e.g., {ConstructionProcess.Name}), enabling incremental updates in response to new regulations.
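Placeholder binding of this kind can be sketched with a small substitution routine. The `{Label.Attribute}` syntax follows the example above; the node dictionary, the sample values, and the choice to leave unresolved placeholders visible are assumptions.

```python
import re

# Matches placeholders of the form {Label.Attribute}.
PLACEHOLDER = re.compile(r"\{(\w+)\.(\w+)\}")


def render(template, nodes):
    """Replace each placeholder with the attribute value of the matching node."""
    def substitute(match):
        label, attribute = match.groups()
        # Unknown placeholders are left untouched so that gaps stay visible.
        return str(nodes.get(label, {}).get(attribute, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical node attributes retrieved from the knowledge graph.
nodes = {"ConstructionProcess": {"Name": "Bored pile construction"}}
text = render("Process: {ConstructionProcess.Name}; Depth: {Project.Depth}", nodes)
```

Because the template stores only placeholders, updating a node attribute in the graph changes the generated text on the next rendering without editing the template itself.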

Based on an analysis of deep foundation pit construction technology disclosure records and the reference book “Detailed Analysis and Typical Cases of Construction Technology Disclosure Records—Foundation Engineering” [34], the TDFPC scheme is divided into nine sections: Project Overview, Compilation Basis, Construction Preparation, Construction Process, Quality Control, Safety Management, Civilized Construction, Environmental Protection, and Others. The “Others” section includes process flow diagrams, various schematic diagrams, detailed node drawings, and videos. The specific template structure for the technology disclosure scheme is shown in Figure 7.

4.5.2. Mapping Relationship Between Scheme Template and Knowledge Graph

Based on the multimodal knowledge graph constructed in Section 3 and the technology disclosure scheme template designed in Section 4.5.1, a mapping relationship was established between the two. This mapping relationship was the core of the intelligent generation of the technology disclosure scheme. Table 3 provides a detailed display of the structure of the technology disclosure scheme template and its mapping relationship with the knowledge graph. The content specifications for each module are shown in Table 7.

4.5.3. Scheme Template Mapping Process

After the technology disclosure scheme template was defined, entity and attribute recognition could be performed based on user requirements. The information obtained through entity and attribute category similarity matching was then embedded into the corresponding Cypher query statement, connected to the Neo4j database, and the knowledge retrieval task was executed. Subsequently, based on the predefined technology disclosure scheme template, the retrieval results were embedded into the template. A schematic diagram of this implementation process was shown in Figure 8. The generation of process flow charts was implemented through rule-based traversal of the multimodal knowledge graph. Each construction task is represented as a node, while edges encode the precedence constraints among tasks. The generation process follows three logical steps: (1) identifying the starting tasks without predecessors, (2) iteratively expanding successor tasks according to topological order while ensuring all constraints are satisfied, and (3) outputting the ordered task sequence together with the linked multimodal resources. This procedure ensures that the generated process flow strictly respects task dependencies encoded in the knowledge graph, thereby improving the reproducibility of the results. For illustration, consider a query regarding the excavation stage of a deep foundation pit. The inference mechanism begins by identifying the starting task (Diaphragm Wall Construction). Based on the precedence relations encoded in the knowledge graph, it then infers the next tasks (Excavation to −3.0 m, followed by Steel Strut Installation). The reasoning path continues by expanding successor nodes until the full sequence of excavation and support operations is produced. This example demonstrates how the rule-based traversal ensures that the generated process strictly follows technological and safety constraints. 
Moreover, the framework supports dynamic updating through incremental knowledge graph modification. When new information is introduced (e.g., revised excavation depth or unexpected soil conditions), corresponding nodes and relations are added or updated in the graph. The process flow generation module then re-infers the task sequence based on the revised graph structure, ensuring that the Construction Technology Disclosure plan remains consistent with current conditions. In practice, this allows rapid adjustment of disclosure content in response to design changes or unforeseen site events, with engineers validating and confirming updates through the system interface.
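The three-step flow generation described above corresponds to a topological sort over the precedence graph. The sketch below uses Kahn's algorithm as one possible realization; the task names follow the excavation example, while the dictionary encoding of precedence edges is an assumption.

```python
from collections import deque


def topological_order(precedence):
    """Kahn's algorithm over a mapping {task: [successor tasks]}."""
    indegree = {}
    for task, successors in precedence.items():
        indegree.setdefault(task, 0)
        for nxt in successors:
            indegree[nxt] = indegree.get(nxt, 0) + 1
    # Step 1: start from tasks that have no predecessors.
    queue = deque(task for task, degree in indegree.items() if degree == 0)
    order = []
    # Step 2: release each successor once all of its predecessors are placed.
    while queue:
        task = queue.popleft()
        order.append(task)
        for nxt in precedence.get(task, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("precedence graph contains a cycle")
    # Step 3: the ordered sequence is returned for output.
    return order


precedence = {
    "Diaphragm Wall Construction": ["Excavation to -3.0 m"],
    "Excavation to -3.0 m": ["Steel Strut Installation"],
}
sequence = topological_order(precedence)
```

When the graph is modified, for example by adding a new task node, calling the function again on the updated mapping re-derives the sequence, which mirrors the incremental re-inference described above.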

5. Results

5.1. Knowledge Extraction Result Analysis

(1) Entity Extraction Result Analysis. In this study, 230 technology disclosure cases and four domain books were used as data sources, from which 17 types of entities were extracted, including project, organization, geological hydrology, surrounding environment, and others. The specific results are shown in Table 8. To ensure a comprehensive evaluation of the knowledge extraction components, we computed precision, recall, and F1-scores for each of the 17 entity types in the NER module and for 7 core relation types in the relation extraction module. As shown in Table 7 and Table 8, the average F1 score for entities was 84.6%, with ‘Project Features’ and ‘Construction Preparation’ achieving scores above 87%. Relationship extraction reached an average F1 score of 81.7%, with high precision in identifying sequential and dependency relations between construction tasks.

In terms of F1 score, the “Construction Preparation” module performed the best (88.16%), while “Environmental Protection” performed the worst (73.25%). The high precision and recall group (P > 80%, R > 80%) includes “Project Features” (P = 87.85%, R = 87.64%) and “Construction Preparation” (P = 88.91%, R = 87.43%); the low precision or recall group (P < 75% or R < 75%) includes “Geological Hydrology” (P = 73.32%, R = 84.37%), “Surrounding Environment” (P = 74.62%, R = 85.69%), and “Environmental Protection” (P = 73.48%, R = 73.02%). The recall deficiency for “Construction Materials” is due to the diverse ways material specifications are expressed and inconsistent annotation granularity. The low precision and recall for “Environmental Protection” is due to vague descriptions of environmental protection measures or semantic overlap with “Civilized Construction” categories. Further error analysis revealed two recurring patterns. First, many sentences used non-specific phrases (e.g., “appropriate action should be taken”) that lacked domain-grounded technical terms, making it difficult for the model to identify concrete entities. Second, multi-clause sentences often embedded several protection instructions (e.g., dust suppression, noise reduction, waste management) within a single expression, increasing complexity for both boundary detection and relation extraction. These patterns contributed to entity fragmentation and reduced overall extraction accuracy. The feature capture ability for “Safety Management” is weak because safety event descriptions often use negation sentences (e.g., “No over-excavation allowed”), which have complex syntactic patterns. To improve coverage in these two categories, we enhanced the annotation schema with domain-specific examples, such as regulatory phrases and typical safety warnings. 
For Safety Management, additional training samples containing negation structures were introduced to help the model capture implicit constraints. For Environmental Protection, synonym normalization and rule-based filters were applied to handle semantic ambiguity and overlap with Civilized Construction. These adjustments led to slight improvements in F1-scores in both categories during validation experiments.
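As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so the best and worst module scores quoted above can be recomputed from the reported P and R values. The snippet uses only figures stated in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the text.
reported = {
    "Construction Preparation": (88.91, 87.43),
    "Environmental Protection": (73.48, 73.02),
}
f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
# f1 -> {"Construction Preparation": 88.16, "Environmental Protection": 73.25}
```

The recomputed values match the 88.16% and 73.25% quoted for the best and worst modules, which indicates that those figures are F1 scores.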

(2) Relation Extraction Results. Among the 10 types of entity relationships defined in this study, the “Preparation Relationship” and “Approval Relationship” were processed through direct determination, because both the head and tail entities strictly follow established standards and the number of involved entities is relatively small. Similarly, the “Explanation Relationship” between “Disclosure Video” and “Disclosure Document” was also handled by direct determination. For the remaining 7 types of relationships, considering their complexity and diversity, a BERT-CNN model-based relation extraction method was used.

The experimental results of the model are shown in Table 9. The BERT-CNN integrated model proposed in this study performed well in the entity relationship classification task. It was strongest on “Task Relationship” and “Production Relationship,” with an average F1 score exceeding 83%. The performance for “Equipment Relationship,” “Containment Relationship,” and “Reference Relationship” was stable, with F1 scores ranging from 77% to 82%. The “Guidance Relationship” needs further optimization, and it is suggested that domain rule constraints be introduced to improve precision. The overall F1 score standard deviation was 3.3%, indicating balanced performance across different relationship types, but with about 15–25% room for improvement. Future work should focus on enhancing feature engineering and negative sample mining for relationships with lower scores.

To evaluate the sensitivity of the proposed model to incomplete or contradictory data sources, a robustness test was performed. The test simulated conditions of missing or outdated documentation by randomly removing 10–30% of the disclosure text and introducing inconsistent parameter values. The results showed that the performance of knowledge extraction (measured by F1-score) decreased moderately by 3.5–7.2%, while the accuracy of process flow generation decreased by less than 5%. Importantly, the overall logical structure of the disclosure maps remained stable, and most missing elements could be recovered through knowledge graph inference. These findings suggest that the system demonstrates good resilience to imperfect data, which is a common challenge in engineering practice.

In addition to robustness testing, we conducted a structural evaluation by comparing the system-generated knowledge graph against ground-truth graphs manually constructed by three domain experts. The results showed a node match rate of 91.6%, an edge alignment rate of 88.4%, and an overall F1-score of 0.90. These quantitative indicators demonstrate a high degree of structural consistency between the automated and expert-generated graphs, thereby validating the semantic accuracy and domain fidelity of the proposed knowledge modeling framework.
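
The paper does not spell out how the node match rate and edge alignment rate were computed. As a rough illustration only, the sketch below assumes that both are set-overlap ratios measured against the expert graph and that the F1-score is taken over edges. The function name `compare_graphs` and the toy data (borrowed from the triplets in Table 2) are hypothetical.

```python
def compare_graphs(pred_nodes, pred_edges, gold_nodes, gold_edges):
    """Compare a generated graph with a reference graph by set overlap.

    Nodes are entity names; edges are (head, relation, tail) triples.
    The metric definitions are assumptions, not taken from the paper.
    """
    pred_nodes, gold_nodes = set(pred_nodes), set(gold_nodes)
    pred_edges, gold_edges = set(pred_edges), set(gold_edges)

    shared_nodes = pred_nodes & gold_nodes
    shared_edges = pred_edges & gold_edges

    # Share of reference nodes / edges that the generated graph reproduces.
    node_match_rate = len(shared_nodes) / len(gold_nodes) if gold_nodes else 0.0
    edge_alignment_rate = len(shared_edges) / len(gold_edges) if gold_edges else 0.0

    # Edge-level precision, recall, and F1.
    precision = len(shared_edges) / len(pred_edges) if pred_edges else 0.0
    recall = edge_alignment_rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {
        "node_match_rate": node_match_rate,
        "edge_alignment_rate": edge_alignment_rate,
        "edge_f1": f1,
    }


gold_edges = {
    ("Bored Pile Construction", "hasMaterial", "C35 Concrete"),
    ("Bored Pile Construction", "hasProcessStep", "Reinforcement Cage Placing"),
    ("Reinforcement Cage Placing", "isFollowedBy", "Concrete Pouring"),
}
# Hypothetical system output: one edge attaches to the wrong step.
pred_edges = {
    ("Bored Pile Construction", "hasMaterial", "C35 Concrete"),
    ("Bored Pile Construction", "hasProcessStep", "Reinforcement Cage Placing"),
    ("Bored Pile Construction", "hasProcessStep", "Concrete Pouring"),
}
nodes = {h for h, _, _ in gold_edges} | {t for _, _, t in gold_edges}

result = compare_graphs(nodes, pred_edges, nodes, gold_edges)
print(result)
```

With three expert graphs, one plausible option is to run the comparison against each reference and average the scores; whether the authors did so is not stated.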

Additional error analysis was performed for categories with relatively low performance, such as Environmental Protection. Most errors arose from two sources: (1) vague or generic expressions (e.g., “measures should be taken to reduce dust” without specifying technology or material), which hindered precise entity recognition; and (2) long and complex sentences that combined multiple measures in one clause, which made relation extraction more difficult. These findings indicate that the extraction model is challenged more by abstract, loosely standardized language than by technical descriptions of structural or procedural tasks.

5.2. Knowledge Graph Result Display

Entities, relationships, and attributes were imported with the LOAD CSV method, and the Neo4j graph was exported as a PNG file to obtain the knowledge graph shown in Figure 9. In the graph, the single red node is named “Deep Foundation Pit Engineering Construction Technology Disclosure,” and the six surrounding blue nodes represent the entity categories: Project Basic Information, Construction Preparation, Construction Technology, Construction Organization Management, Standard Specifications, and Multimodal. The entire graph contains 803 entity nodes, 1509 entity relationships, and 46 attribute categories.
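
To make the import step more concrete, the following sketch prepares a CSV file body in Python and pairs it with Cypher `LOAD CSV` statements of the kind that could load it. It is an illustration under assumptions: the file names, the column names (`id`, `name`, `category`, `head_id`, `tail_id`, `type`), the `Entity` label, and the `RELATED_TO` relationship type are hypothetical and are not taken from the paper.

```python
import csv
import io


def entities_to_csv(entities):
    """Serialize entity dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "name", "category"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entities)
    return buffer.getvalue()


# Cypher statements (assumed schema) that would read the exported files
# from the Neo4j import directory.
IMPORT_ENTITIES = """
LOAD CSV WITH HEADERS FROM 'file:///entities.csv' AS row
MERGE (e:Entity {id: row.id})
SET e.name = row.name, e.category = row.category
""".strip()

IMPORT_RELATIONS = """
LOAD CSV WITH HEADERS FROM 'file:///relations.csv' AS row
MATCH (h:Entity {id: row.head_id})
MATCH (t:Entity {id: row.tail_id})
MERGE (h)-[:RELATED_TO {type: row.type}]->(t)
""".strip()

sample_entities = [
    {"id": "e1", "name": "Bored Pile Construction",
     "category": "Construction Technology"},
    {"id": "e2", "name": "C35 Concrete",
     "category": "Construction Preparation"},
]
csv_text = entities_to_csv(sample_entities)
print(csv_text)
print(IMPORT_ENTITIES)
```

The relationship type is stored as a property here because plain Cypher does not accept a relationship type read from a CSV column; a real import might instead run one statement per relationship type.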

5.3. Entity Similarity Algorithm Result Analysis

This study adopted the Jaccard entity similarity measurement algorithm integrated with the Word2Vec model and used a similarity evaluation test set containing 210 construction information entities and attributes. The test results are shown in Table 10.

The experimental analysis results indicated that the integrated algorithm outperformed the standalone Word2Vec model and the traditional Jaccard algorithm, demonstrating a high accuracy rate of 89.27%, which highlighted its excellent capability in correct matching for construction information alignment tasks. However, its recall rate was 74.38%, reflecting that approximately a quarter of the positive samples were not successfully identified. Analysis revealed that the main reason for this phenomenon was the ambiguity in the description of construction information attributes, as well as the high similarity between the construction entities and attributes returned by the knowledge graph and the input information entities and attributes, which led to a lower recall rate.
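
One simple way to combine a Jaccard score with Word2Vec-style embeddings is a weighted sum of lexical overlap and embedding cosine similarity. The sketch below assumes that form; the paper does not give the exact fusion formula, so the weight `alpha`, the averaging of token vectors, and the tiny hand-written embedding table (standing in for a trained Word2Vec model) are all hypothetical.

```python
import math


def jaccard(tokens_a, tokens_b):
    """Jaccard overlap between two token collections."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def hybrid_similarity(tokens_a, tokens_b, embeddings, alpha=0.5):
    """alpha * embedding cosine + (1 - alpha) * Jaccard (assumed fusion)."""
    lexical = jaccard(tokens_a, tokens_b)
    vec_a = mean_vector(tokens_a, embeddings)
    vec_b = mean_vector(tokens_b, embeddings)
    semantic = cosine(vec_a, vec_b) if vec_a is not None and vec_b is not None else 0.0
    return alpha * semantic + (1 - alpha) * lexical


# Toy 2-d vectors; a real system would load trained Word2Vec embeddings.
toy_embeddings = {
    "bored": [1.0, 0.0],
    "drilled": [0.9, 0.1],
    "pile": [0.0, 1.0],
}
query = ["bored", "pile"]
candidate = ["drilled", "pile"]

lexical_only = jaccard(query, candidate)
fused = hybrid_similarity(query, candidate, toy_embeddings)
print(lexical_only, fused)
```

In this toy case the near-synonym pair lifts the fused score above the purely lexical one, which is the kind of effect a hybrid measure is meant to capture.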

5.4. Analysis of Technology Disclosure Scheme Intelligent Generation Results

After inputting the construction information for bored piles into the intelligent generation framework for technology disclosure schemes, the construction element analysis was first completed by the pre-trained BERT-BiLSTM-CRF deep learning model. The BERT layer captured contextual semantic representations, and the bidirectional LSTM network captured the temporal features of the construction information. Finally, the CRF layer was used to identify construction activity entities (e.g., water-stop curtain construction) and construction process entities (e.g., bored pile construction). Next, the Word2Vec-Jaccard hybrid model was used for entity alignment, locating the bored pile entity in the knowledge graph. This entity contained multiple bored pile technology disclosure examples. Entity attributes were then matched through the attribute similarity algorithm, and the five bored pile technology disclosure case entities with the highest scores were selected. Besides the entity name, these entities included five general attributes (foundation pit depth, safety level, soil layer type, surrounding building distance, and groundwater level) and eight specific attributes unique to bored piles (cement type, pile diameter, pile spacing, sinking speed, lifting speed, stirring axle speed, spraying method, and cement incorporation amount).
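
The case-ranking step can be sketched as a weighted sum of per-attribute similarities, using the category weights (0.6 and 0.4) and attribute weights (0.2 and 0.25) listed in Table 6. Everything else below is an assumption made for illustration: the attribute keys, the relative-difference similarity for numeric values, and exact string matching as a stand-in for the Word2Vec-Jaccard comparison of character-type values. The rebar attribute is treated as a string, following Table 5.

```python
# (category weight, attribute weight, {attribute key: kind}) per Table 6.
WEIGHT_SCHEME = [
    (0.6, 0.2, {
        "pit_depth": "num", "safety_level": "str", "soil_type": "str",
        "building_distance": "num", "groundwater_level": "num",
    }),
    (0.4, 0.25, {
        "concrete_grade": "str", "pile_diameter": "num",
        "pile_spacing": "num", "rebar_spec": "str",
    }),
]


def numeric_similarity(a, b):
    """Assumed numeric similarity: 1 minus the relative difference."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def case_score(query, case):
    """Weighted similarity between a query and one historical case."""
    total = 0.0
    for category_weight, attribute_weight, attributes in WEIGHT_SCHEME:
        for key, kind in attributes.items():
            if key not in query or key not in case:
                continue  # a missing attribute contributes nothing
            if kind == "num":
                sim = numeric_similarity(query[key], case[key])
            else:
                sim = 1.0 if query[key] == case[key] else 0.0
            total += category_weight * attribute_weight * sim
    return total


def top_k_cases(query, cases, k=5):
    """Return the k highest-scoring cases, best first."""
    return sorted(cases, key=lambda c: case_score(query, c), reverse=True)[:k]


query = {
    "pit_depth": 12.0, "safety_level": "Level 1", "soil_type": "soft soil",
    "building_distance": 8.0, "groundwater_level": 3.0,
    "concrete_grade": "C35", "pile_diameter": 0.8, "pile_spacing": 1.2,
    "rebar_spec": "HRB400",
}
cases = [
    dict(query, name="case A"),                                   # identical
    dict(query, name="case B", pit_depth=6.0, soil_type="sandy soil"),
    dict(query, name="case C", concrete_grade="C30", pile_diameter=1.0),
]
ranked = top_k_cases(query, cases, k=2)
print([(c["name"], round(case_score(query, c), 3)) for c in ranked])
```

Because the attribute weights sum to 1 within each category and the category weights sum to 1, an identical case scores 1.0 and all scores fall between 0 and 1.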

By default, the bored pile construction process entity with the highest score was selected. This entity contained the process flow entity and technical requirements entity. The process flow entity was connected to the bored pile process flow diagram entity, and the technical requirements entity was connected to the bored pile construction process diagram entity and bored pile construction sequence diagram entity through inclusion relationships. Image data was retrieved via cloud storage and could be directly called after retrieving the URL link.

By using the relationship rules and multi-hop graph inference rules defined during the construction of the knowledge graph, knowledge related to technology disclosure, including standard specifications, quality control, safety management, civilized construction, and environmental protection knowledge, could be retrieved. Additionally, technology disclosure videos could also be retrieved via the knowledge graph, and the videos were accessed through cloud storage by obtaining the URL link, which could be directly called after retrieval. If there was a change in the construction information, the intelligent generation framework dynamically adapted through a three-tier response mechanism (“parameter change-template matching-parameter reconstruction”). When an engineering change order (such as a construction parameter adjustment) was received, the intelligent generation framework first extracted the feature vectors of the changed entities and attributes. Then, a multi-dimensional match was performed in the multimodal knowledge graph, and the best-matching technology disclosure knowledge was inferred and retrieved. This knowledge was mapped to the technology disclosure scheme template for parameter reconstruction, thereby generating an updated technology disclosure scheme, ensuring the framework could dynamically respond to construction changes. In practical scenarios, such changes may originate from revised design drawings, on-site deviations, or risk warnings. The system monitors for these updates and extracts relevant entities or parameter changes from new inputs. It then re-evaluates case similarity and inference paths in the knowledge graph, adjusts matched templates accordingly, and regenerates a compliant CTD document. This enables real-time adaptation without restarting the entire pipeline, ensuring that generated disclosures reflect the current construction context accurately.
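
Multi-hop retrieval over the graph can be pictured as a bounded breadth-first traversal from the matched construction activity entity, optionally restricted to certain relation types. The sketch below uses the representative triplets from Table 2 on an in-memory index. It is a simplified stand-in for queries against the graph database, and the function names and the `allowed` filter are hypothetical.

```python
from collections import deque

TRIPLES = [
    ("Bored Pile Construction", "hasMaterial", "C35 Concrete"),
    ("C35 Concrete", "hasStrengthClass", "C35"),
    ("Bored Pile Construction", "hasProcessStep", "Reinforcement Cage Placing"),
    ("Reinforcement Cage Placing", "isFollowedBy", "Concrete Pouring"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) pairs."""
    index = {}
    for head, relation, tail in triples:
        index.setdefault(head, []).append((relation, tail))
    return index


def multi_hop(index, start, max_hops, allowed=None):
    """Breadth-first retrieval of entities within max_hops of start.

    Returns (relation_path, entity) pairs in discovery order. If allowed
    is given, only those relation types are followed.
    """
    results = []
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget exhausted on this branch
        for relation, tail in index.get(node, []):
            if allowed is not None and relation not in allowed:
                continue
            if tail in seen:
                continue
            seen.add(tail)
            new_path = path + (relation,)
            results.append((new_path, tail))
            queue.append((tail, new_path))
    return results


index = build_index(TRIPLES)
two_hop = multi_hop(index, "Bored Pile Construction", max_hops=2)
process_only = multi_hop(
    index, "Bored Pile Construction", max_hops=2,
    allowed={"hasProcessStep", "isFollowedBy"},
)
for path, entity in two_hop:
    print(" -> ".join(path), "=>", entity)
```

Restricting the relation types, as in `process_only`, is one way a rule could pull only the process sequence for a given template module while ignoring material attributes.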

Through the above process, the intelligent generation of the technology disclosure scheme could be achieved, and the dynamic response to changes in the technology disclosure scheme could also be realized. Finally, the inferred and retrieved knowledge was mapped and output according to the designed technology disclosure scheme template, resulting in the generation of the technology disclosure scheme for the target construction activity. The results of the intelligent generation based on the multimodal knowledge graph are shown in Figure 10 (the image shows two pages combined).

To assess the quality of the generated disclosure plans, we adopted both objective metrics and expert evaluation. The average field completion accuracy reached 96.4%, with a knowledge reuse rate of 72.5% and an average generation time of 18.7 s per plan. Semantic consistency with expert-authored templates was measured using cosine similarity, yielding a score of 0.87. Furthermore, three senior engineers independently evaluated the generated plans in terms of completeness, logical coherence, and regulatory compliance, resulting in an average score of 4.42 out of 5. The inter-rater agreement, calculated using Cohen’s Kappa, was 0.79, indicating substantial expert consensus.
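
Cohen's Kappa is defined for a pair of raters, and the paper does not state how it was applied to three engineers. The sketch below assumes the mean of the three pairwise kappas, which is one common convention; the ratings are invented purely to exercise the code and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both raters must score the same non-empty item set")
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(all_ratings):
    """Average Cohen's kappa over every pair of raters (assumed convention)."""
    pairs = list(combinations(all_ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Invented 1-5 scores from three raters on six plans (illustration only).
rater_1 = [5, 4, 4, 5, 3, 4]
rater_2 = [5, 4, 4, 5, 4, 4]
rater_3 = [5, 4, 5, 5, 3, 4]

print(round(mean_pairwise_kappa([rater_1, rater_2, rater_3]), 3))
```

For more than two raters, Fleiss' kappa is a frequently used alternative to pairwise averaging.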

By comparing the original technology disclosure scheme with the intelligently generated one, it was found that the original scheme lacked the three parts related to construction personnel, construction materials, and construction equipment in the construction preparation section and only covered on-site civilized construction, lacking environmental protection content. In contrast, the intelligently generated technology disclosure scheme, based on the technology disclosure scheme template and knowledge graph inference and retrieval, was structurally complete, reducing the omissions typically found in manual writing. In particular, the intelligent system ensured explicit inclusion of Environmental Protection and Safety Management content, which were either missing or incomplete in the manually prepared Construction Technology Disclosure Plan. This demonstrates the advantage of the proposed approach in systematically covering essential regulatory dimensions. To further validate the system’s practical effectiveness, we conducted a pilot user survey among construction technical personnel. Key performance indicators were compared against expected benchmarks, as shown in Table 11.

While the current case study demonstrates the feasibility and advantages of the proposed framework, the validation remains limited to a single project. To strengthen generalizability, future work will extend the analysis to multiple cases under different geological conditions, contractors, and project locations. In addition, benchmarking against baseline approaches such as manual drafting and BIM-only workflows will provide a clearer comparative understanding of the benefits and limitations of the intelligent generation framework.

The relevant parameters for the intelligent generation of the technology disclosure scheme came partly from standard specification knowledge in the knowledge graph, which included constraints on quality, construction processes, safety, construction materials, construction equipment, civilized construction, and green construction. Another part came from construction design document information, and the last part came from historical case knowledge. These three sources not only facilitated the intelligent generation of technology disclosure knowledge but also enabled cross-validation, resulting in high attribute accuracy for the intelligent generation of technology disclosure schemes and strong operability. The multimodal knowledge graph based on technology disclosure knowledge could not only intelligently generate traditional document-based disclosure knowledge but also push construction process-related images and technology disclosure videos through graph inference and retrieval, promoting the standardization, acceleration, visualization, and intelligence of deep foundation pit engineering technology disclosure activities.

To further validate the effectiveness of each component, we conducted additional comparison and ablation experiments. The BERT-BiLSTM-CRF model outperformed classical CRF and LSTM-CRF baselines in NER, with over 8% higher F1-scores. A preliminary RAG-based method using construction codes generated fluent outputs but showed lower structural consistency compared to our approach. We also tested a graph embedding-based generation strategy, which lacked accuracy in capturing multi-step task dependencies.

In ablation studies, removing image and video inputs reduced completeness and field consistency by 6.8% and 4.3%, respectively. Excluding rule-based reasoning led to disordered task sequences and lower expert scores (from 4.42 to 3.7). Changing the fusion weights from 0.6/0.4 to other ratios also degraded performance. These results support the rationality of the system design.

6. Conclusions

This study proposed the application of multimodal knowledge graph technology to the field of deep foundation pit engineering technology disclosure. Focusing on domain knowledge modeling, knowledge extraction, and knowledge application, it enhanced the systematization of intelligent knowledge generation for deep foundation pit engineering based on multimodal knowledge graphs. The main conclusions were as follows:

(1) Domain Knowledge Modeling and Relationship Categorization. Through the analysis of construction technology disclosure records, professional literature, and standard specification text knowledge characteristics in the deep foundation pit engineering domain, this study planned the conceptual system and relationship categories for the domain of deep foundation pit engineering technology disclosure. Specifically, the study classified entity categories into six aspects: basic project information, construction preparation, construction technology, construction management, multimodal, and standard specifications. Ten core relationship types were extracted, and a multi-level knowledge structure model was constructed, providing clear guidance for the extraction and filling of the knowledge graph data layer.
(2) Knowledge Extraction Strategy and Method Construction. This study developed targeted knowledge extraction strategies based on the characteristic attributes of various texts and built a systematic domain knowledge extraction method. In the field of deep foundation pit engineering construction, for texts such as standard specification documents, professional books, and technology disclosure records, rule-based knowledge extraction techniques and BERT-BiLSTM-CRF deep learning models were comprehensively applied to achieve accurate extraction of entities and their attributes. Meanwhile, in the relationship extraction process, text structure analysis techniques combined with the BERT-CNN deep learning model ensured the accuracy and comprehensiveness of the relationship extraction. Additionally, multimodal entities, such as images and videos, were linked across modalities using semantic association rules by adding keyword tags and generating resource identifiers.
(3) Neo4j-based Knowledge Storage Solution. This study designed a knowledge storage solution based on the Neo4j graph database. Knowledge triples extracted from multimodal data sources were imported into the Neo4j database to construct the multimodal TDFPC knowledge graph. As a core tool for knowledge management, the knowledge graph not only provided a storage carrier for the integration and accumulation of TDFPC domain knowledge but also offered an effective path for the intuitive presentation of knowledge and laid the data foundation for subsequent intelligent knowledge generation.
(4) Intelligent Generation of Technology Disclosure Schemes. Based on the needs of technical managers, this study conducted research on the intelligent generation of technology disclosure schemes based on multimodal knowledge graphs. A technology disclosure scheme template, containing nine modules, was designed for technology disclosure records and domain-specific books. Using the BERT-BiLSTM-CRF deep learning model, entities and attributes of target construction information were recognized. For numeric and character-type attributes, historical case entity and attribute matching was carried out using cosine similarity and Word2Vec-Jaccard methods. Based on the construction activity entity with the highest matching score, graph inference and retrieval were performed according to knowledge graph relationship rules and multi-hop inference rules. The final results of inference and retrieval were returned and mapped to the technology disclosure scheme template, achieving the intelligent generation of deep foundation pit engineering construction technology disclosure schemes, meeting the business needs on construction sites, and promoting the standardization, acceleration, visualization, and intelligence of deep foundation pit engineering construction technology disclosure activities.
In terms of scalability, the framework has the potential to be generalized beyond deep foundation pit projects. Since the core mechanism is based on a multimodal knowledge graph and rule-based inference, it can in principle be adapted to other construction domains such as bridges and tunnels by expanding the ontology and disclosure templates with domain-specific entities, relations, and standards. In particular, the framework’s modular architecture enables adaptation without reengineering the entire pipeline. For example, in bridge or tunnel projects, the system can incorporate new ontological concepts such as span type, bearing capacity, or tunnel lining methods by updating the entity schema and retraining on relevant annotated corpora. Moreover, the template-based document generation module supports reconfiguration of output formats according to regulatory requirements in diverse construction subdomains. This flexibility demonstrates the generalizability of the proposed system to a broader range of civil infrastructure scenarios. Future work will focus on extending the knowledge base and verifying the framework across different types of construction projects to demonstrate broader applicability.
The ontology-based design allows the system to be extended beyond deep excavation to other construction types, as well as adapted to different regulatory standards. Scalability evaluations showed stable performance with large graphs, with reasoning time under 2 s and storage under 2.5 GB, demonstrating feasibility for real-world engineering scenarios.

Despite the promising results, this study has several limitations. First, the application scope is currently limited to deep excavation projects, which may restrict generalizability to other construction domains. Second, the preparation of training data relied on manual annotation, which is time-consuming and may introduce subjectivity. Third, the available dataset size was limited, which constrained the performance of deep models. Fourth, validation was conducted on a single case project, and comparative benchmarking against alternative approaches such as manual drafting and BIM-only workflows was not included. These limitations partly explain the residual errors observed. Future work will therefore focus on expanding the domain coverage, reducing annotation costs through semi-automatic methods, leveraging larger domain-specific corpora, and conducting multi-case evaluations with systematic baseline comparisons to better demonstrate robustness and practical value.

Author Contributions

Formal analysis, D.Z.; Methodology, N.Y. and N.X.; Resources, N.X.; Validation, N.X. and D.Z.; Writing—original draft, N.Y.; Writing—review and editing, N.Y.; supervised the project, acquired funding, and provided critical feedback on the manuscript, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study received funding from the Fundamental Research Funds for Central Universities of China University of Mining and Technology (Grant No. 2024QN11014).

Data Availability Statement

The full dataset used in this study, including case documents, images, videos, and BIM models, cannot be made publicly available due to third-party agreements and engineering confidentiality. A subset of anonymized data and relevant model code will be released on GitHub upon publication. The repository will also include a runtime guide, CQL/CSV import files for knowledge graph construction, and model documentation.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Available online: https://www.stats.gov.cn/sj/ndsj/2024/indexch.htm (accessed on 27 July 2025).
Zhao, J.; Li, W.; Peng, Y. Analysis on Intelligent Deformation Prediction of Deep Foundation Pits With Internal Support Based on Optical Fiber Monitoring and the HSS Model. Front. Mater. 2023, 10, 1231303.
Xu, Z.; Wang, J.; Zhu, H. A Semantic-Based Methodology to Deliver Model Views of Forward Design for Prefabricated Buildings. Buildings 2022, 12, 1158.
Fu, L.; Li, X.; Wang, X.; Li, M. Safety risk propagation in complex construction projects: Insights from metro deep foundation pit projects. Reliab. Eng. Syst. Saf. 2025, 257, 110858.
Volk, R.; Stengel, J.; Schultmann, F. Building Information Modeling (BIM) for Existing Buildings—Literature Review and Future Research Directions. Autom. Constr. 2014, 38, 109–127.
Hu, X.; Gu, L.; Kobayashi, K.; Liu, L.; Zhang, M.; Harada, T.; Summers, R.M.; Zhu, Y. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning. Med. Image Anal. 2024, 97, 103279.
Huan, X.; Kang, B.G.; Xie, J.; Hancock, C. Building information modelling (BIM)-enabled facility management (FM) of nursing homes in China: A systematic review. J. Build. Eng. 2025, 99, 103142.
Chen, X.; Zhang, J.; Wang, X.; Zhang, N.; Wu, T.; Wang, Y.; Wang, Y.; Chen, H. Continual Multimodal Knowledge Graph Construction. arXiv 2023, arXiv:2305.08698.
Kommineni, V.K.; König-Ries, B.; Samuel, S. From Human Experts to Machines: An LLM Supported Approach to Ontology and Knowledge Graph Construction. arXiv 2024, arXiv:2403.08345.
Dmochowski, G.; Szołomicki, J. Technical and Structural Problems Related to the Interaction Between a Deep Excavation and Adjacent Existing Buildings. Appl. Sci. 2021, 11, 481.
Zhu, X.; Li, Z.; Wang, X.; Jiang, X.; Sun, P.; Wang, X.; Xiao, Y.; Yuan, N.J. Multi-Modal Knowledge Graph Construction and Application: A Survey. IEEE Trans. Knowl. Data Eng. 2022, 36, 715–735.
Pan, Y.; He, W.; Chen, J.-J. Spatiotemporal deep learning for multi-attribute prediction of excavation-induced risk. Autom. Constr. 2025, 171, 105964.
Zhang, L.; Wang, J.; Wang, Y.; Sun, H.; Zhao, X. Automatic construction site hazard identification integrating construction scene graphs with BERT based domain knowledge. Autom. Constr. 2022, 142, 104535.
Gan, B.-L.; Zhang, D.-M.; Huang, Z.-K.; Zheng, F.-Y.; Zhu, R.; Zhang, W. Ontology-driven knowledge graph for decision-making in resilience enhancement of underground structures: Framework and application. Tunn. Undergr. Space Technol. 2025, 163, 106739.
Hou, C.; Liu, K.; Wang, T.; Shi, S.; Li, Y.; Zhu, Y.; Hu, X.; Wang, C.; Zhou, C.; Lv, H. DDE KG Editor: A Data Service System for Knowledge Graph Construction in Geoscience. Geosci. Data J. 2024, 11, 1073–1085.
Lee, W.; Lee, S. Development of a Knowledge Base for Construction Risk Assessments Using BERT and Graph Models. Buildings 2024, 14, 3359.
Wang, J.; Xie, H.; Zhang, S.; Qin, S.J.; Tao, X.; Wang, F.L.; Xu, X. Multimodal fusion framework based on knowledge graph for personalized recommendation. Expert Syst. Appl. 2025, 268, 126308.
Zhou, S.; Liu, K.; Li, D.; Fu, C.; Ning, Y.; Ji, W.; Liu, X.; Xiao, B.; Wei, R. Augmenting general-purpose large-language models with domain-specific multimodal knowledge graph for question-answering in construction project management. Adv. Eng. Inform. 2025, 65, 103142.
Wang, Z.; Liu, Z.; Lu, W.; Jia, L. Improving knowledge management in building engineering with hybrid retrieval-augmented generation framework. J. Build. Eng. 2025, 103, 112189.
Wu, Y.; Liu, F.; Wan, L.; Wang, Z. Intelligent Fault Diagnostic Model for Industrial Equipment Based on Multimodal Knowledge Graph. IEEE Sens. J. 2023, 23, 26269–26278.
Jiang, Y.; Gao, X.; Su, W.; Li, J. Systematic Knowledge Management of Construction Safety Standards Based on Knowledge Graphs: A Case Study in China. Int. J. Environ. Res. Public Health 2021, 18, 10692.
Saha, A.; Ansari, G.A.; Laddha, A.; Sankaranarayanan, K.; Chakrabarti, S. Complex Program Induction for Querying Knowledge Bases in the Absence of Gold Programs. Trans. Assoc. Comput. Linguist. 2019, 7, 185–200.
Song, Y.; Sun, P.; Liu, H.; Li, Z.; Song, W.; Xiao, Y.; Zhou, X. Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI. IEEE Trans. Knowl. Data Eng. 2023, 36, 6962–6976.
Chen, Y.; Liang, B.; Hu, H. Research on Ontology-Based Construction Risk Knowledge Base Development in Deep Foundation Pit Excavation. J. Asian Archit. Build. Eng. 2024, 24, 1640–1658.
Pan, P.; Sun, S.-H.; Feng, J.-X.; Wen, J.-T.; Lin, J.-R.; Wang, H.-S. Intelligent Monitoring System for Deep Foundation Pit Based on Digital Twin. Buildings 2025, 15, 366.
Chen, Y.; Ge, X.; Yang, S.; Hu, L.; Li, J.; Zhang, J. A Survey on Multimodal Knowledge Graphs: Construction, Completion and Applications. Mathematics 2023, 11, 1815.
Li, Y.; Ji, H.; Yu, F.; Cheng, L.; Che, N. Temporal multi-modal knowledge graph generation for link prediction. Neural Netw. 2025, 185, 107108.
Tam, N.T.; Trung, H.T.; Yin, H.; Van Vinh, T.; Sakong, D.; Zheng, B.; Hung, N.Q.V. Entity Alignment for Knowledge Graphs With Multi-Order Convolutional Networks. IEEE Trans. Knowl. Data Eng. 2022, 34, 4201–4214.
Roll, D.S.; Kurt, Z.; Li, Y.; Woo, W.L. Augmenting Orbital Debris Identification with Neo4j-Enabled Graph-Based Retrieval-Augmented Generation for Multimodal Large Language Models. Sensors 2025, 25, 3352.
Wu, W.; Wen, C.; Yuan, Q.; Chen, Q.; Cao, Y. Construction and application of knowledge graph for construction accidents based on deep learning. Eng. Constr. Archit. Manag. 2025, 32, 1097–1121.
Zhu, H.; Kan, B.; Li, Y.; Yan, E.; Weng, H.; Wang, F.L.; Hao, T. A new semi-supervised fuzzy clustering method based on latent representation learning and information fusion. Appl. Soft Comput. 2025, 170, 112717.
Yang, Y.; Xiang, P. A knowledge graph for the vulnerability of construction safety system in megaprojects based on accident inversion. Eng. Appl. Artif. Intell. 2025, 150, 110630.
GB/T 50430—2017; Quality Management Specifications for Construction Enterprises in Engineering Projects. China Planning Press: Beijing, China, 2017.
Naderi, H.; Shojaei, A. Large-Language model (LLM)-Powered system for Situated and Game-Based construction safety training. Expert Syst. Appl. 2025, 283, 127887.
GB 50202—2018; Code for Quality Acceptance of Construction of Building Foundation. China Architecture & Building Press: Beijing, China, 2018.
JGJ 120—2012; Technical Code for Support of Building Foundation Pits. China Architecture & Building Press: Beijing, China, 2012.

Figure 1. Construction Process of Multimodal Knowledge Graph for Deep Foundation Pit Engineering Technology Disclosure.

Figure 2. Hierarchical TDFPC Multimodal Knowledge Structure Model for Deep Foundation Pit Engineering Technology Disclosure.

Figure 3. BERT-BiLSTM-CRF Model Training Process Diagram.

Figure 4. Regular Expression Template Matching Extraction Process.

Figure 5. Example of Multimodal Knowledge Topology Architecture.

Figure 6. TDFPC Scheme Intelligent Generation Framework.

Figure 7. Technology Disclosure Scheme Template.

Figure 8. Technology Disclosure Scheme Template Mapping Process.

Figure 9. Multimodal Knowledge Graph for Deep Foundation Pit Engineering Construction Technology Disclosure.

Figure 10. Intelligent Generation Result of Water-stop Curtain Bored Pile Construction Technology Disclosure Scheme.

Table 1. Domain Knowledge Extraction Strategy.

No.	Knowledge Content	Degree of Structuring	Extraction Method	Knowledge Source
1	Disclosure Content	Weak	Regular Expression + Deep Learning	Technical Disclosure Cases
2	Construction Technology + Construction Process	Weak	Regular Expression + Deep Learning	Special Construction Plans
3	Construction Process + Construction Management	Strong	Regular Expression	Domain-Specific Books
4	Standard Specification Document Information	Strong	Regular Expression	Standard Specification Documents
5	Chapter and Article Information	Strong	Regular Expression	Standard Specification Documents
6	Clause Items	Strong	Regular Expression	Standard Specification Documents
7	Terminology Concepts	Strong	Regular Expression	Standard Specification Documents

Table 2. Representative Triplets in the TDFPC Knowledge Graph.

Subject Entity	Relation Type	Object Entity
Bored Pile Construction	hasMaterial	C35 Concrete
C35 Concrete	hasStrengthClass	C35
Bored Pile Construction	hasProcessStep	Reinforcement Cage Placing
Reinforcement Cage Placing	isFollowedBy	Concrete Pouring

Table 3. Statistics of Valid Multimodal Data.

Corpus Type	Corpus Name	Raw Data (Count)	Valid Data (Count)
Text	Standard Specifications	17	17
Text	Technology Disclosure Cases	278	230
Text	Books	4	4
Image	Deep Foundation Pit Engineering Images	256	193
Video	Technology Disclosure Videos	50	40
Atlas	“Construction of Foundation Pit Support Structures”	1	1
Model	BIM Models	4	4

Table 4. Parameter Settings of the Deep Learning Training Model.

Parameter Name	Parameter Setting	Parameter Name	Parameter Setting
weight_decay	0.01	max_seq_len	256
Epoch	100	Hidden size	768
Hidden size	100	Initializer range	0.02
Bidirectional	True	BERT-Learning rate	0.00005
Batch first	True	Hidden layers	12
Num layers	1	Warmup steps	0.1
CRF-Learning rate	0.003	Dropout	0.4
Batch size	20	D model	128

Table 5. Selection of Entities and Attributes for Bored Pile Construction Information.

Category	Entity Type	Attribute	Data Type
General Category	Engineering Project	Project Name	String Type
	Engineering Project	Project Location	String Type
	Project Characteristics	Excavation Depth	Numeric Type
	Project Characteristics	Excavation Area	Numeric Type
	Technique	Layered Excavation Thickness	Numeric Type
		Dewatering Depth	Numeric Type
		Safety Level	String Type
		Acceptance Standard	Numeric Type
	Construction Material	Material Specification	String Type
	Construction Equipment	Equipment Type	String Type
	Geological & Hydrological Conditions	Soil Layer Type	String Type
	Geological & Hydrological Conditions	Groundwater Level	Numeric Type
	Surrounding Environment	Distance to Adjacent Buildings	Numeric Type
	Surrounding Environment	Pipeline Type	String Type
Specialized Category	Bored Cast-in Situ Pile	Concrete Strength Grade	String Type
		Pile Diameter	Numeric Type
		Pile Spacing	Numeric Type
		Rebar Specification	String Type

Table 6. Bored Pile Construction Technology Disclosure Attribute Weight Assignment Table.

Category	Category Weight	Attribute	Number	Attribute Weight	Data Type	Key Basis
General Attributes	0.6	Foundation Pit Depth	A1	0.2	Numeric	Directly determines the selection of the support structure (e.g., depth > 10 m requires internal support), dewatering plan, and construction risk level
		Safety Level	A2	0.2	Character	Safety level classification directly affects support standards (e.g., level-1 pits require expert review) and monitoring frequency
		Soil Layer Type	A3	0.2	Character	Soil characteristics (e.g., soft soil/sandy soil) determine the design of the support system (e.g., soil nail wall spacing, underground continuous wall thickness)
		Surrounding Building Distance	A4	0.2	Numeric	When distance < 5 m, a special protection plan is required (e.g., increase the frequency of settlement monitoring to once a day)
		Groundwater Level	A5	0.2	Numeric	Water level height determines dewatering depth and water stop measures (e.g., artesian water requires deep well dewatering)
Specialized Attributes	0.4	Concrete Strength Grade	B1	0.25	Character	Core requirement for bearing capacity and structural safety; compressive strength directly determines bearing capacity and impermeability
		Pile Diameter	B2	0.25	Numeric	If pile diameter deviation > ±50 mm, bearing capacity decreases by 15%, requiring dynamic adjustment according to load demands
		Pile Spacing	B3	0.25	Numeric	If spacing < 2.5 d, the settlement of the pile group increases by 30%, requiring optimization based on soil uplift characteristics
		Reinforcement Type	B4	0.25	Numeric	If spacing < 2.5 d, the settlement of the pile group increases by 30%, requiring optimization based on soil uplift characteristics
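
The weights in Table 6 can be combined into a single case similarity score. The following is a minimal sketch, not the authors' implementation: the category and attribute weights are taken from Table 6, while the per-attribute similarity functions (relative difference for numeric values, exact match for character values), the attribute key names, and the sample case values are all assumptions made for illustration.

```python
# Weights copied from Table 6. Key names are hypothetical labels for A1-A5 and B1-B4.
CATEGORY_WEIGHTS = {"general": 0.6, "specialized": 0.4}
ATTRIBUTE_WEIGHTS = {
    "general": {
        "pit_depth_m": 0.2,
        "safety_level": 0.2,
        "soil_layer_type": 0.2,
        "building_distance_m": 0.2,
        "groundwater_level_m": 0.2,
    },
    "specialized": {
        "concrete_grade": 0.25,
        "pile_diameter_mm": 0.25,
        "pile_spacing_m": 0.25,
        "reinforcement_type": 0.25,
    },
}


def attribute_similarity(a, b):
    """Assumed similarity: relative difference for numbers, exact match otherwise."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        largest = max(abs(a), abs(b))
        if largest == 0:
            return 1.0
        # Clamp at zero so values of opposite sign cannot give a negative score.
        return max(0.0, 1.0 - abs(a - b) / largest)
    return 1.0 if a == b else 0.0


def case_similarity(target, candidate):
    """Weighted sum over categories of the weighted sum over attributes."""
    total = 0.0
    for category, category_weight in CATEGORY_WEIGHTS.items():
        category_score = sum(
            weight * attribute_similarity(target[name], candidate[name])
            for name, weight in ATTRIBUTE_WEIGHTS[category].items()
        )
        total += category_weight * category_score
    return total


# Hypothetical cases, invented for this example only.
target = {
    "pit_depth_m": 12.0, "safety_level": "Level 1", "soil_layer_type": "soft soil",
    "building_distance_m": 4.0, "groundwater_level_m": 2.0,
    "concrete_grade": "C30", "pile_diameter_mm": 800, "pile_spacing_m": 2.4,
    "reinforcement_type": "HRB400",
}
candidate = dict(target, pit_depth_m=10.0, concrete_grade="C35")

print(round(case_similarity(target, candidate), 2))
```

Because both weight levels sum to 1, two identical cases score 1.0, and a mismatch on a single specialized attribute costs at most 0.4 × 0.25 = 0.1.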

Table 7. Mapping between Technology Disclosure Scheme Template and Multimodal Knowledge Graph.

Template Module	Knowledge Graph Entity Type	Core Field Example	Data Source
Project Overview	Project, Geological Hydrology	Project Name, Foundation Pit Depth, Groundwater Level	Entity Attribute Extraction
Compilation Basis	Standard Specifications, Laws and Regulations	GB 50202-2018 [35], JGJ 120-2012 [36]	Regular Expression Matching
Construction Preparation	Construction Personnel, Construction Materials	Trade Configuration, Material Specifications	BiLSTM-CRF Entity Recognition
Construction Process	Construction Process, Process Flow	Underground Continuous Wall Construction Steps, Technical Parameters	Graph Inference
Quality Control	Quality Control, Common Quality Issues	Pile Verticality Deviation ≤ 3‰	Graph Inference, Knowledge Graph Attribute Mapping
Safety Management	Safety Risks, Emergency Plans	Collapse Risk Prevention Measures	Historical Case Similarity Matching
Civilized Construction	Civilized Construction	Site Hardening Standards	Knowledge Graph Attribute Mapping
Environmental Protection	Environmental Protection	Noise Control Limits	Knowledge Graph Attribute Mapping
Others	Image Entities, Video Entities	Process Flow Diagram URL, BIM Model View	Cloud Server Storage
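
The mapping in Table 7 can be read as a lookup from template modules to entity types whose attributes fill each module. The sketch below illustrates that idea under stated assumptions: the module and entity type names come from Table 7, but the in-memory `graph` dictionary is a hypothetical stand-in for a real graph query, and the project name and pit depth are invented sample values.

```python
# Template module -> knowledge graph entity types (a subset of the rows in Table 7).
TEMPLATE_MAPPING = {
    "Project Overview": ["Project", "Geological Hydrology"],
    "Compilation Basis": ["Standard Specifications", "Laws and Regulations"],
    "Quality Control": ["Quality Control", "Common Quality Issues"],
    "Safety Management": ["Safety Risks", "Emergency Plans"],
}


def fill_template(mapping, graph):
    """Build a plain-text scheme from (field, value) pairs grouped by entity type.

    `graph` maps an entity type to a list of (field, value) tuples; it is a
    hypothetical substitute for querying the multimodal knowledge graph.
    """
    sections = []
    for module, entity_types in mapping.items():
        lines = [
            f"{field}: {value}"
            for entity_type in entity_types
            for field, value in graph.get(entity_type, [])
        ]
        body = "\n".join(lines) if lines else "[no matching entities]"
        sections.append(f"{module}\n{body}")
    return "\n\n".join(sections)


# Toy graph content; only the standard number and the verticality limit appear in Table 7.
graph = {
    "Project": [("Project Name", "Example Pit"), ("Foundation Pit Depth", "12 m")],
    "Standard Specifications": [("Standard", "GB 50202-2018")],
    "Quality Control": [("Pile Verticality Deviation", "≤ 3‰")],
}

report = fill_template(TEMPLATE_MAPPING, graph)
print(report)
```

Modules with no matching entities are kept in the output with a placeholder, so gaps in the graph stay visible in the generated scheme instead of being dropped silently.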

Table 8. BERT-BiLSTM-CRF Model Experimental Results.

Index	Entity Type	Entity Code	Precision (P)	Recall (R)	F1-Score
1	Engineering Project	cProject	75.42%	84.67%	79.78%
2	Organizational Structure	cOrganizations	76.32%	81.93%	79.03%
3	Geological & Hydrological Conditions	cGeology	73.32%	84.37%	78.46%
4	Surrounding Environment	cSurroundings	74.62%	85.69%	79.77%
5	Project Characteristics	cFeatures	87.85%	87.64%	87.74%
6	Construction Location	cLocation	72.89%	81.54%	76.97%
7	Construction Preparation	cPreparation	88.91%	87.43%	88.16%
8	Management Personnel	cManagement	72.13%	77.78%	74.85%
9	Construction Personnel	cworkers	81.02%	79.59%	80.30%
10	Construction Materials	cMaterials	87.68%	64.14%	74.09%
11	Construction Equipment	cEquipments	77.91%	87.69%	82.51%
12	Construction Activities	cActivities	81.01%	73.47%	77.06%
13	Technique	cTechnology	88.93%	68.79%	77.57%
14	Quality Control	cQuality	83.78%	67.87%	74.99%
15	Safety Management	cSafety	71.46%	74.01%	72.71%
16	Civilized Construction Practices	cCivilization	83.17%	82.17%	82.67%
17	Environmental Protection	cEnvironment	73.48%	73.02%	73.25%
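
The last column of Table 8 is consistent with the harmonic mean of the two columns before it, which is the usual F1 definition, F1 = 2PR / (P + R). The short check below recomputes it for three rows copied from the table; the function name and the 0.02 percentage point tolerance for rounding are assumptions of this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (first value, recall, reported score) in percent, copied from Table 8.
reported = {
    "cProject": (75.42, 84.67, 79.78),
    "cMaterials": (87.68, 64.14, 74.09),
    "cTechnology": (88.93, 68.79, 77.57),
}

for code, (precision, recall, score) in reported.items():
    recomputed = f1_score(precision, recall)
    matches = abs(recomputed - score) < 0.02
    print(f"{code}: recomputed {recomputed:.2f}, reported {score:.2f}, match={matches}")
```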

Table 9. Entity Relationship Classification Recognition Model Experimental Results.

Relation Type	Precision (P)	Recall (R)	F1-Score
Usage Relation	81.52%	76.35%	78.85%
Task Relation	84.17%	83.24%	83.70%
Production Relation	81.42%	86.31%	83.79%
Equipment Relation	76.86%	77.18%	77.02%
Containment Relation	79.28%	83.84%	81.50%
Guidance Relation	73.75%	82.91%	78.06%
Reference Relation	81.29%	79.24%	80.25%

Table 10. Experimental Results of the Word2Vec-Jaccard Model.

Model	Precision (P)	Recall (R)	F1-Score
Word2Vec-Jaccard	89.27%	74.38%	81.14%
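
One plausible way to combine the two signals named in Table 10 is a weighted sum of an embedding-based cosine similarity and a token-set Jaccard coefficient. The sketch below is an illustration under that assumption and does not reproduce the authors' model: the two-dimensional toy vectors stand in for trained Word2Vec embeddings, and the mixing weight `alpha` and all function names are hypothetical.

```python
import math


def jaccard(tokens_a, tokens_b):
    """Size of the token intersection divided by the size of the union."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token has a vector."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_similarity(tokens_a, tokens_b, embeddings, alpha=0.5):
    """alpha * semantic (cosine) + (1 - alpha) * lexical (Jaccard); alpha is assumed."""
    va = mean_vector(tokens_a, embeddings)
    vb = mean_vector(tokens_b, embeddings)
    semantic = cosine(va, vb) if va and vb else 0.0
    return alpha * semantic + (1 - alpha) * jaccard(tokens_a, tokens_b)


# Toy embeddings, invented for the example; a real system would load trained vectors.
toy_embeddings = {"bored": [1.0, 0.1], "drilled": [0.9, 0.2], "pile": [0.1, 1.0]}

print(combined_similarity(["bored", "pile"], ["drilled", "pile"], toy_embeddings))
```

The embedding term lets near-synonyms such as the two toy entity names score above their raw token overlap, while the Jaccard term keeps the score anchored to shared wording.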

Table 11. Expected vs. Obtained Performance in Pilot Evaluation.

Evaluation Metric	Expected Value	Obtained Result
Field Completion Accuracy	>95%	96.40%
Knowledge Reuse Rate	>70%	72.50%
Generation Time per Scheme	<30 s	18.7 s
Semantic Similarity (to Expert)	>0.85	0.87
Expert Evaluation (1–5 scale)	>4.0	4.42

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, N.; Xu, N.; Zhong, D.; Guo, J. Intelligent Generation of Construction Technology Disclosure Plans for Deep Foundation Pit Engineering Based on Multimodal Knowledge Graphs. Buildings 2025, 15, 3264. https://doi.org/10.3390/buildings15183264

