5.1. Knowledge Extraction Result Analysis
(1) Entity Extraction Result Analysis. In this study, 230 technology disclosure cases and four domain books were used as data sources, from which 17 types of entities were extracted, including project, organization, geological hydrology, surrounding environment, and others. The specific results are shown in Table 8. To ensure a comprehensive evaluation of the knowledge extraction components, we computed precision, recall, and F1-scores for each of the 17 entity types in the NER module and for the 7 core relation types in the relation extraction module. As shown in Table 7 and Table 8, the average F1-score for entities was 84.6%, with “Project Features” and “Construction Preparation” achieving scores above 87%. Relation extraction reached an average F1-score of 81.7%, with high precision in identifying sequential and dependency relations between construction tasks.
In terms of F1-score, the “Construction Preparation” category performed best (88.16%), while “Environmental Protection” performed worst (73.25%). The high precision and recall group (P > 80%, R > 80%) includes “Project Features” (P = 87.85%, R = 87.64%) and “Construction Preparation” (P = 88.91%, R = 87.43%); the low precision or recall group (P < 75% or R < 75%) includes “Geological Hydrology” (P = 73.32%, R = 84.37%), “Surrounding Environment” (P = 74.62%, R = 85.69%), and “Environmental Protection” (P = 73.48%, R = 73.02%). The recall deficiency for “Construction Materials” is due to the diverse ways in which material specifications are expressed and to inconsistent annotation granularity. The low precision and recall for “Environmental Protection” are due to vague descriptions of environmental protection measures and semantic overlap with the “Civilized Construction” category. Further error analysis of this category revealed two recurring patterns. First, many sentences used non-specific phrases (e.g., “appropriate action should be taken”) that lacked domain-grounded technical terms, making it difficult for the model to identify concrete entities. Second, multi-clause sentences often embedded several protection instructions (e.g., dust suppression, noise reduction, waste management) within a single expression, increasing complexity for both boundary detection and relation extraction. These patterns contributed to entity fragmentation and reduced overall extraction accuracy. The model also captures “Safety Management” features poorly, because safety event descriptions often use negated sentences (e.g., “No over-excavation allowed”) with complex syntactic patterns.
To improve coverage of “Safety Management” and “Environmental Protection”, we enhanced the annotation schema with domain-specific examples, such as regulatory phrases and typical safety warnings. For Safety Management, additional training samples containing negation structures were introduced to help the model capture implicit constraints. For Environmental Protection, synonym normalization and rule-based filters were applied to handle semantic ambiguity and overlap with Civilized Construction. These adjustments led to slight improvements in F1-scores in both categories during validation experiments.
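As a quick consistency check, the F1-score can be recomputed from the precision and recall pairs quoted above. The sketch below uses the standard harmonic-mean definition; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the original pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit for both inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs (in percent) as quoted in the text above.
reported = {
    "Project Features": (87.85, 87.64),
    "Construction Preparation": (88.91, 87.43),
    "Geological Hydrology": (73.32, 84.37),
    "Surrounding Environment": (74.62, 85.69),
    "Environmental Protection": (73.48, 73.02),
}

f1_by_type = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}

for name, f1 in f1_by_type.items():
    print(f"{name}: F1 = {f1:.2f}%")
```

Under this definition, “Construction Preparation” evaluates to 88.16% and “Environmental Protection” to 73.25%, which matches the best and worst figures reported above.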
(2) Relation Extraction Results. Of the 10 types of entity relationships defined in this study, the “Preparation Relationship” and “Approval Relationship” were assigned by direct determination, because their head and tail entities strictly follow established standards and the number of entities involved is relatively small. The “Explanation Relationship” between “Disclosure Video” and “Disclosure Document” was handled in the same way. For the remaining 7 relationship types, which are more complex and diverse, a relation extraction method based on a BERT-CNN model was used.
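The split between directly determined relations and model-classified relations can be pictured as a lookup followed by a fallback. The following is a minimal sketch under stated assumptions: only the “Disclosure Video”/“Disclosure Document” pair is named in the text, so the other two entity-type pairs are hypothetical placeholders, and `stub_classifier` merely stands in for the trained BERT-CNN model.

```python
from typing import Callable, Dict, Tuple

# (head entity type, tail entity type) -> relation assigned without a model.
# Only the first pair is named in the text; the other two type pairs are
# hypothetical placeholders chosen for illustration.
DIRECT_RULES: Dict[Tuple[str, str], str] = {
    ("Disclosure Video", "Disclosure Document"): "Explanation Relationship",
    ("Preparer", "Disclosure Document"): "Preparation Relationship",  # hypothetical
    ("Approver", "Disclosure Document"): "Approval Relationship",     # hypothetical
}


def extract_relation(
    head_type: str,
    tail_type: str,
    sentence: str,
    classifier: Callable[[str], str],
) -> str:
    """Use a direct rule when one applies; otherwise defer to the classifier."""
    rule = DIRECT_RULES.get((head_type, tail_type))
    if rule is not None:
        return rule
    return classifier(sentence)


def stub_classifier(sentence: str) -> str:
    """Stand-in for the trained BERT-CNN classifier (not implemented here)."""
    return "Task Relationship"


direct = extract_relation("Disclosure Video", "Disclosure Document", "", stub_classifier)
learned = extract_relation("Construction Task", "Construction Task",
                           "Drilling precedes rebar cage placement.", stub_classifier)
```

Keeping the rules in a plain table makes the directly determined relations auditable and keeps them out of the training data for the learned classifier.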
The experimental results of the model are shown in Table 9. The BERT-CNN integrated model proposed in this study performed well in the entity relationship classification task. It performed particularly well on “Task Relationship” and “Production Relationship,” with an average F1-score exceeding 83%. The performance for “Equipment Relationship,” “Containment Relationship,” and “Reference Relationship” was stable, with F1-scores ranging from 77% to 82%. The “Guidance Relationship” needs further optimization, and it is suggested that domain rule constraints be introduced to improve precision. The standard deviation of the F1-scores was 3.3%, indicating balanced performance across relationship types, but with about 15–25% room for improvement. Future work should focus on enhancing feature engineering and negative sample mining for relationships with lower scores.
To evaluate the sensitivity of the proposed model to incomplete or contradictory data sources, a robustness test was performed. The test simulated missing or outdated documentation by randomly removing 10–30% of the disclosure text and introducing inconsistent parameter values. Knowledge extraction performance (measured by F1-score) decreased moderately, by 3.5–7.2%, while the accuracy of process flow generation decreased by less than 5%. Importantly, the overall logical structure of the disclosure maps remained stable, and most missing elements could be recovered through knowledge graph inference. These findings suggest that the system is resilient to imperfect data, which is a common challenge in engineering practice.
In addition to robustness testing, we conducted a structural evaluation by comparing the system-generated knowledge graph against ground-truth graphs manually constructed by three domain experts. The results showed a node match rate of 91.6%, an edge alignment rate of 88.4%, and an overall F1-score of 0.90. These quantitative indicators demonstrate a high degree of structural consistency between the automated and expert-generated graphs, supporting the semantic accuracy and domain fidelity of the proposed knowledge modeling framework.
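The text does not state how the node match rate, edge alignment rate, and F1-score were computed. One plausible reading, sketched below, treats both graphs as sets of nodes and of (head, relation, tail) triples and measures exact-match overlap. The helper names and the toy graphs are assumptions made for illustration only.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (head, relation, tail)


def overlap_rate(generated: set, reference: set) -> float:
    """Share of reference items that also appear in the generated set."""
    return len(generated & reference) / len(reference) if reference else 0.0


def set_f1(generated: set, reference: set) -> float:
    """F1 over exact-match items, from set-based precision and recall."""
    true_positives = len(generated & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(generated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy graphs (hypothetical content, not taken from the study's data).
expert_nodes: Set[str] = {"bored pile", "drilling", "rebar cage", "concrete pouring"}
system_nodes: Set[str] = {"bored pile", "drilling", "rebar cage", "slurry"}

expert_edges: Set[Edge] = {
    ("bored pile", "contains", "drilling"),
    ("drilling", "precedes", "rebar cage"),
    ("rebar cage", "precedes", "concrete pouring"),
}
system_edges: Set[Edge] = {
    ("bored pile", "contains", "drilling"),
    ("drilling", "precedes", "rebar cage"),
    ("bored pile", "uses", "slurry"),
}

node_match_rate = overlap_rate(system_nodes, expert_nodes)      # 3 of 4 nodes
edge_alignment_rate = overlap_rate(system_edges, expert_edges)  # 2 of 3 edges
edge_f1 = set_f1(system_edges, expert_edges)
```

Exact string matching is the strictest option; a real comparison would likely need to normalize entity names or allow approximate matches before counting overlaps.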
Additional error analysis was performed for categories with relatively low performance, such as Environmental Protection. Most errors arose from two sources: (1) vague or generic expressions (e.g., “measures should be taken to reduce dust” without specifying technology or material), which hindered precise entity recognition; and (2) long and complex sentences that combined multiple measures in one clause, which made relation extraction more difficult. These findings indicate that the extraction model is challenged more by abstract, loosely standardized language than by technical descriptions of structural or procedural tasks.
5.4. Analysis of Technology Disclosure Scheme Intelligent Generation Results
After the construction information for bored piles was input into the intelligent generation framework for technology disclosure schemes, construction element analysis was first completed by the pre-trained BERT-BiLSTM-CRF deep learning model. The BERT layer captured contextual semantic representations, the bidirectional LSTM network captured the sequential features of the construction information, and the CRF layer identified construction activity entities (e.g., water-stop curtain construction) and construction process entities (e.g., bored pile construction). Next, the Word2Vec-Jaccard hybrid model was used for entity alignment, locating the bored pile entity in the knowledge graph. This entity contained multiple bored pile technology disclosure examples. The attribute similarity algorithm then matched entity attributes and returned the five bored pile technology disclosure case entities with the highest scores. Each of these entities carried an entity name and five general attributes (foundation pit depth, safety level, soil layer type, surrounding building distance, and groundwater level), as well as eight attributes specific to bored piles: cement type, pile diameter, pile spacing, sinking speed, lifting speed, stirring axle speed, spraying method, and cement incorporation amount.
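The entity alignment step can be illustrated with a weighted combination of an embedding-based cosine similarity and a token-level Jaccard similarity. This is a minimal sketch, not the authors' implementation: the toy two-dimensional vectors replace trained Word2Vec embeddings, and assigning 0.6 to the embedding term and 0.4 to the Jaccard term is an assumption based on the 0.6/0.4 fusion weights mentioned in the ablation study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def hybrid_similarity(name_a: str, name_b: str,
                      vec_a: Sequence[float], vec_b: Sequence[float],
                      w_embed: float = 0.6, w_jaccard: float = 0.4) -> float:
    """Weighted fusion of embedding and surface similarity (weights assumed)."""
    return w_embed * cosine(vec_a, vec_b) + w_jaccard * jaccard(name_a, name_b)


def top_k(query_name: str, query_vec: Sequence[float],
          candidates: Dict[str, Sequence[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k candidates with the highest hybrid similarity."""
    scored = [(name, hybrid_similarity(query_name, name, query_vec, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy vectors standing in for trained Word2Vec embeddings (hypothetical values).
graph_entities = {
    "bored pile": [1.0, 0.0],
    "water-stop curtain construction": [0.0, 1.0],
}
ranking = top_k("bored pile construction", [1.0, 0.0], graph_entities, k=5)
best_match = ranking[0][0]
```

The same ranking function could be reused for the attribute-matching step by replacing the name-based inputs with attribute values, although the text does not specify how the attribute similarity algorithm is defined.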
By default, the bored pile construction process entity with the highest score was selected. This entity contained a process flow entity and a technical requirements entity. The process flow entity was connected to the bored pile process flow diagram entity, and the technical requirements entity was connected to the bored pile construction process diagram entity and the bored pile construction sequence diagram entity through inclusion relationships. The image data were held in cloud storage and could be called directly once the corresponding URL link had been retrieved from the graph.
By using the relationship rules and multi-hop graph inference rules defined during the construction of the knowledge graph, knowledge related to technology disclosure could be retrieved, including standard specifications, quality control, safety management, civilized construction, and environmental protection knowledge. Technology disclosure videos could also be retrieved via the knowledge graph; they were held in cloud storage and could be called directly once the URL link had been obtained.
If the construction information changed, the intelligent generation framework adapted dynamically through a three-tier response mechanism (“parameter change-template matching-parameter reconstruction”). When an engineering change order (such as a construction parameter adjustment) was received, the framework first extracted the feature vectors of the changed entities and attributes. A multi-dimensional match was then performed in the multimodal knowledge graph, and the best-matching technology disclosure knowledge was inferred and retrieved. This knowledge was mapped to the technology disclosure scheme template for parameter reconstruction, generating an updated technology disclosure scheme. In practical scenarios, such changes may originate from revised design drawings, on-site deviations, or risk warnings. The system monitors for these updates and extracts the relevant entities or parameter changes from new inputs. It then re-evaluates case similarity and inference paths in the knowledge graph, adjusts the matched templates accordingly, and regenerates a compliant construction technology disclosure (CTD) document. This enables real-time adaptation without restarting the entire pipeline, so that generated disclosures accurately reflect the current construction context.
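Multi-hop retrieval over the knowledge graph can be sketched as a bounded breadth-first traversal over (head, relation, tail) triples. The example below is an illustration under stated assumptions: the triples, relation labels, and the `multi_hop` helper are hypothetical, and the actual inference rules used by the framework are not given in the text.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def multi_hop(triples: Iterable[Triple], start: str, max_hops: int = 2) -> Dict[str, int]:
    """Return every entity reachable from `start` within `max_hops`, with its hop count."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))

    reached: Dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached[neighbour] = depth + 1
                queue.append((neighbour, depth + 1))
    return reached


# Hypothetical fragment of the graph around the bored pile process entity.
triples: List[Triple] = [
    ("bored pile construction", "contains", "process flow"),
    ("process flow", "contains", "process flow diagram"),
    ("bored pile construction", "references", "standard specification"),
    ("process flow diagram", "stored at", "image URL"),
]

within_two_hops = multi_hop(triples, "bored pile construction", max_hops=2)
```

With a limit of two hops, the traversal reaches the process flow, the standard specification, and the process flow diagram, but not the image URL, which lies three hops away. Raising `max_hops` widens the retrieved context at the cost of pulling in less relevant knowledge.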
Through the above process, the technology disclosure scheme could be generated intelligently and updated dynamically in response to changes. Finally, the inferred and retrieved knowledge was mapped and output according to the designed technology disclosure scheme template, producing the technology disclosure scheme for the target construction activity. To assess the quality of the generated disclosure plans, we adopted both objective metrics and expert evaluation. The average field completion accuracy reached 96.4%, with a knowledge reuse rate of 72.5% and an average generation time of 18.7 s per plan. Semantic consistency with expert-authored templates was measured using cosine similarity, yielding a score of 0.87. Furthermore, three senior engineers independently evaluated the generated plans in terms of completeness, logical coherence, and regulatory compliance, resulting in an average score of 4.42 out of 5. The inter-rater agreement, calculated using Cohen’s Kappa, was 0.79, indicating substantial expert consensus. The results of the intelligent generation based on the multimodal knowledge graph are shown in Figure 10 (image display effect shows two pages combined).
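Cohen’s Kappa is defined for a pair of raters, and the text does not say how the three engineers’ ratings were aggregated. One common option, assumed here purely for illustration, is to average the pairwise kappa values. The ratings below are invented toy data, not the study’s scores.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings: List[Sequence[int]]) -> float:
    """Average Cohen's kappa over all rater pairs (an assumed aggregation)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Invented 1-5 ratings from three raters over four generated plans.
engineer_1 = [4, 5, 4, 5]
engineer_2 = [4, 5, 5, 5]
engineer_3 = [4, 5, 4, 5]

kappa_1_2 = cohen_kappa(engineer_1, engineer_2)
kappa_mean = mean_pairwise_kappa([engineer_1, engineer_2, engineer_3])
```

For ordinal scores such as these, a weighted kappa or Fleiss’ kappa would be alternative choices; the unweighted pairwise version is shown only because it is the simplest to verify by hand.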
A comparison of the original technology disclosure scheme with the intelligently generated one showed that the original scheme lacked the three parts on construction personnel, construction materials, and construction equipment in the construction preparation section, and that it covered only on-site civilized construction, with no environmental protection content. In contrast, the intelligently generated scheme, built from the technology disclosure scheme template and knowledge graph inference and retrieval, was structurally complete, reducing the omissions typically found in manual writing. In particular, the intelligent system ensured explicit inclusion of Environmental Protection and Safety Management content, which was either missing or incomplete in the manually prepared Construction Technology Disclosure Plan. This demonstrates the advantage of the proposed approach in systematically covering essential regulatory dimensions. While the current case study demonstrates the feasibility and advantages of the proposed framework, the validation remains limited to a single project. To strengthen generalizability, future work will extend the analysis to multiple cases under different geological conditions, contractors, and project locations. In addition, benchmarking against baseline approaches such as manual drafting and BIM-only workflows will provide a clearer comparative understanding of the benefits and limitations of the intelligent generation framework. To further validate the system’s practical effectiveness, we conducted a pilot user survey among construction technical personnel. Key performance indicators were compared against expected benchmarks, as shown in Table 11.
The relevant parameters for the intelligent generation of the technology disclosure scheme came partly from standard specification knowledge in the knowledge graph, which included constraints on quality, construction processes, safety, construction materials, construction equipment, civilized construction, and green construction. Another part came from construction design document information, and the last part came from historical case knowledge. These three sources not only facilitated the intelligent generation of technology disclosure knowledge but also enabled cross-validation, resulting in high attribute accuracy for the intelligent generation of technology disclosure schemes and strong operability. The multimodal knowledge graph based on technology disclosure knowledge could not only intelligently generate traditional document-based disclosure knowledge but also push construction process-related images and technology disclosure videos through graph inference and retrieval, promoting the standardization, acceleration, visualization, and intelligence of deep foundation pit engineering technology disclosure activities.
To further validate the effectiveness of each component, we conducted additional comparison and ablation experiments. The BERT-BiLSTM-CRF model outperformed classical CRF and LSTM-CRF baselines in NER, with over 8% higher F1-scores. A preliminary RAG-based method using construction codes generated fluent outputs but showed lower structural consistency compared to our approach. We also tested a graph embedding-based generation strategy, which lacked accuracy in capturing multi-step task dependencies.
In ablation studies, removing image and video inputs reduced completeness and field consistency by 6.8% and 4.3%, respectively. Excluding rule-based reasoning led to disordered task sequences and lower expert scores (from 4.42 to 3.7). Changing the fusion weights from 0.6/0.4 to other ratios also degraded performance. These results support the rationality of the system design.