Automated Checking of Highway Bridge BIM Models Based on Large Language Models
Abstract
1. Introduction
2. Semantic Structuring of Engineering Specifications Using LLMs
2.1. Overview of Language Modeling Development
2.2. LLM-Based Structuring of Design Specifications
2.2.1. Text Preprocessing
- Format Conversion: Convert engineering specification texts from various sources (e.g., PDF, Word, HTML) into a standardized plain-text format to ensure data consistency and ease of processing.
- Removal of Irrelevant Content: Eliminate unrelated parts, such as covers, tables of contents, and copyright statements, retaining only the clauses and main body of the text.
- Clause Segmentation: Automatically divide the text into independent clauses based on the clause numbers in the specification, giving each clause clear boundaries and preventing content from different clauses from being mixed.
- Separation of Comments and Main Text: Separate clause comments from the main body of the specification. Comments are often crucial for understanding the clauses, and separating them facilitates further processing.
- Text Standardization: Standardize various expressions, units, and symbols to ensure consistency in the text, making subsequent information extraction easier.
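The clause segmentation and comment separation steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes, hypothetically, that each clause begins on a new line with a dotted number such as `5.2.3` and that a clause comment starts with the marker `Commentary:`. Real specifications will need patterns matched to their own layout.

```python
import re

# Hypothetical conventions (adjust to the actual specification layout):
# a clause starts with a dotted number such as "5.2.3", and its comment
# starts with the marker "Commentary:".
CLAUSE_START = re.compile(r"^(\d+(?:\.\d+)+)\s+(.*)$")
COMMENT_MARKER = "Commentary:"


def segment_clauses(text: str) -> list[dict]:
    """Split specification text into clauses, keeping comments separate."""
    clauses: list[dict] = []
    current = None
    in_comment = False
    for raw_line in text.splitlines():
        line = " ".join(raw_line.split())  # normalise whitespace
        if not line:
            continue
        match = CLAUSE_START.match(line)
        if match:
            current = {"number": match.group(1), "body": match.group(2), "comment": ""}
            clauses.append(current)
            in_comment = False
        elif current is None:
            continue  # front matter (cover, contents) before the first clause
        elif line.startswith(COMMENT_MARKER):
            in_comment = True
            current["comment"] = line[len(COMMENT_MARKER):].strip()
        elif in_comment:
            current["comment"] = (current["comment"] + " " + line).strip()
        else:
            current["body"] += " " + line
    return clauses
```

Because the clause pattern is purely positional, a wrapped body line that happens to start with a dotted number (e.g. `1.5 m`) would be misread as a new clause; anchoring on the specification's actual numbering depth, or on a known list of clause numbers, reduces this risk.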
2.2.2. Prompt Engineering
- Role-playing: The prompt assigns the LLM the role of a “highway bridge engineering expert” proficient in using and analyzing design specifications. Its task is to extract structured data from the bridge design specifications, particularly information on design parameters and constraints.
- Chain of Thought: Chain-of-thought prompting guides the LLM to reason step by step, which improves the quality of the generated results. When processing each clause, the model should follow this reasoning chain:
  - Understand the content and structure of the clause and identify its main design requirements.
  - Extract key attributes (such as dimensions and strength), constraints (such as size limits and ratio requirements), and the scope of application from the clause.
  - Organize the extracted information into a processable structured format (e.g., JSON or tables).
- Requirements: Explicit requirements improve the accuracy of the generated content.
  - Accuracy: The extracted information must strictly conform to the specification, without misinterpretations or omissions.
  - Consistency: Information from different clauses should follow the same structure so that all clauses can be processed uniformly.
  - Clarity: Each extracted attribute and constraint should have a clear definition and boundary.
- Example: Input–output examples help the LLM generate content in the desired form. Providing templates for the model to imitate (few-shot prompting) is a common and effective way to steer a large model without retraining it.
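The four prompt components (role, reasoning chain, requirements, example) can be assembled programmatically so that every clause receives an identical prompt. The sketch below paraphrases the design described above; the exact wording and the JSON field names in the example output are assumptions, not the authors' prompt.

```python
import json

# Prompt components paraphrasing the design above; the exact wording and
# the JSON field names are assumptions, not the authors' prompt.
ROLE = ("You are a highway bridge engineering expert proficient in using "
        "and analyzing design specifications.")
TASK = ("Extract design parameters and constraints from the clause below "
        "as structured data.")
STEPS = [
    "Understand the clause and identify its main design requirements.",
    "Extract key attributes, constraints, and scope of application.",
    "Organize the extracted information as JSON.",
]
REQUIREMENTS = [
    "Accuracy: conform strictly to the clause; do not infer missing values.",
    "Consistency: use the same JSON structure for every clause.",
    "Clarity: give each attribute and constraint a clear definition.",
]


def build_prompt(clause: str, example_clause: str, example_output: dict) -> str:
    """Assemble role, chain-of-thought steps, requirements and one example."""
    parts = [ROLE, TASK, "", "Reason step by step:"]
    parts += [f"{i}. {step}" for i, step in enumerate(STEPS, start=1)]
    parts += ["", "Requirements:"]
    parts += [f"- {req}" for req in REQUIREMENTS]
    parts += [
        "",
        "Example input:",
        example_clause,
        "Example output:",
        json.dumps(example_output, ensure_ascii=False),  # keep symbols such as ≥
        "",
        "Clause:",
        clause,
        "Output (JSON only):",
    ]
    return "\n".join(parts)
```

Keeping the components as separate constants makes it easy to iterate on one part of the prompt (for example the requirements) while holding the rest fixed.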
2.2.3. Knowledge Extraction and Structured Storage Based on Large Language Models
3. Knowledge Graph Construction Methods
3.1. Knowledge Graph Technology
3.2. Knowledge Graph Construction Workflow
4. Automated Checking of Highway Bridge BIM Models Based on Large Language Models
4.1. Research Methods and Technical Framework
- Information Extraction: Extract structured rules from bridge design specifications and construct the knowledge graph, while extracting target components and their attribute data from the BIM model.
- Information Matching: Match the actual model data with the design requirements in the graph database.
- Inspection Result Generation: Output the inspection results and provide feedback and design adjustment recommendations as necessary.
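The information-matching step can be illustrated by pairing each model component with the rules stored for its component type. The sketch below assumes, hypothetically, that model instances are named `<type>-<index>` (as in `Hollow Pier Column-1`) and that rules have already been read from the graph into a dictionary keyed by component type. A missing attribute is kept as `None` so that the result-generation step can report it rather than silently skip it.

```python
import re

INSTANCE_SUFFIX = re.compile(r"-\d+$")  # hypothetical "<type>-<index>" naming


def component_type(instance_name: str) -> str:
    """Map an instance name such as 'Cap-10' to its rule key 'Cap'."""
    return INSTANCE_SUFFIX.sub("", instance_name.strip())


def match_rules(instances: dict[str, dict[str, str]],
                rules: dict[str, list[dict[str, str]]]) -> list[tuple]:
    """Pair every model instance with the rules for its component type.

    Returns (instance, property, actual value or None, comparator, limit).
    """
    matched = []
    for name, attributes in instances.items():
        for rule in rules.get(component_type(name), []):
            matched.append((name, rule["property"], attributes.get(rule["property"]),
                            rule["comparator"], rule["value"]))
    return matched
```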
4.2. Scope of Model Checking
4.2.1. Structural Completeness Checking
4.2.2. Regulatory Compliance Checking
- Single-Component Attribute Compliance: Determines whether the attribute values of each component (e.g., diameter, height, or concrete strength grade) comply with the specifications defined in the knowledge graph.
- Inter-Component Relational Compliance: Evaluates whether the logical relationships between multiple components (e.g., the spacing between adjacent piers or the elevation difference between the abutment and road surface) satisfy the design requirements.
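Both kinds of compliance check reduce to comparing a normalized actual value against a limit. The sketch below is illustrative only: it assumes values arrive as strings such as `0.98 m`, `150 mm`, or `C50`, converts lengths to millimetres and concrete grades to their numeric class, and, for the inter-component case, assumes each pier or pile is described by a single position (in metres) along one axis, so that "adjacent" means consecutive after sorting.

```python
import re

TOL = 1e-6  # tolerance (mm or grade units) to absorb floating-point noise
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}
QUANTITY = re.compile(r"^\s*(C)?\s*(\d+(?:\.\d+)?)\s*(mm|cm|m)?\s*$")
CHECKS = {
    "≥": lambda a, b: a >= b - TOL,
    "≤": lambda a, b: a <= b + TOL,
    "=": lambda a, b: abs(a - b) <= TOL,
}


def to_number(text: str) -> float:
    """'150 mm' -> 150.0 (mm); '0.98 m' -> 980.0 (mm); 'C50' -> 50.0 (grade)."""
    match = QUANTITY.match(text)
    if match is None:
        raise ValueError(f"unrecognised quantity: {text!r}")
    grade, number, unit = match.groups()
    if grade:
        return float(number)
    return float(number) * UNIT_TO_MM[unit or "m"]  # bare numbers assumed metres


def complies(actual: str, comparator: str, limit: str) -> bool:
    """Single-component check: does one attribute value satisfy one rule?"""
    return CHECKS[comparator](to_number(actual), to_number(limit))


def adjacent_spacing_violations(positions: dict[str, float], comparator: str,
                                limit: str) -> list[tuple[str, str, float]]:
    """Inter-component check on consecutive members sorted by position (m)."""
    ordered = sorted(positions.items(), key=lambda item: item[1])
    violations = []
    for (name_a, pos_a), (name_b, pos_b) in zip(ordered, ordered[1:]):
        spacing_mm = abs(pos_b - pos_a) * 1000.0
        if not CHECKS[comparator](spacing_mm, to_number(limit)):
            violations.append((name_a, name_b, round(spacing_mm / 1000.0, 3)))
    return violations
```

For example, `complies("C25", "≥", "C50")` returns `False`, the kind of concrete strength grade violation that a single-component check is meant to flag.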
5. Case Study and Model Validation
5.1. LLM-Based Structuring of Highway Bridge Specifications
5.2. Knowledge Graph Database Construction
- Import the necessary libraries, such as py2neo and json.
- Connect to the local Neo4j database and configure the connection parameters.
- Load and parse the JSON file, which includes components and their attribute constraints.
- Iterate through the data to create component nodes, value nodes, and HAS_CONSTRAINT relationships.
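The four import steps above can be sketched with py2neo as follows. This is a sketch under assumptions, not the authors' script: the JSON keys mirror the field names of the extraction schema (`Component`, `Property Constraint`, `Property`, `Comparator`, `Value`), the connection URI and credentials are placeholders, and each constraint is written with a single Cypher `MERGE` so that re-running the import does not duplicate nodes. py2neo is imported inside `import_rules` so that the parsing step can be tested without a running Neo4j instance.

```python
import json

# One MERGE per constraint: Component and Value nodes are unique by their
# key property, and the relationship carries the attribute and comparator.
MERGE_CONSTRAINT = (
    "MERGE (c:Component {name: $component}) "
    "MERGE (v:Value {value: $value}) "
    "MERGE (c)-[:HAS_CONSTRAINT {attribute: $attribute, comparator: $comparator}]->(v)"
)


def constraint_parameters(records: list[dict]) -> list[dict]:
    """Flatten extracted records into one parameter set per constraint."""
    rows = []
    for record in records:
        for constraint in record["Property Constraint"]:
            rows.append({
                "component": record["Component"],
                "attribute": constraint["Property"],
                "comparator": constraint["Comparator"],
                "value": constraint["Value"],
            })
    return rows


def import_rules(json_path: str, uri: str = "bolt://localhost:7687",
                 auth: tuple[str, str] = ("neo4j", "password")) -> int:
    """Load the JSON file and write its constraints to a local Neo4j database."""
    from py2neo import Graph  # lazy import: only needed when writing to Neo4j

    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)
    graph = Graph(uri, auth=auth)
    rows = constraint_parameters(records)
    for row in rows:
        graph.run(MERGE_CONSTRAINT, row)  # parameters passed as a dict
    return len(rows)
```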
5.3. Model Checking Based on the Graph Database
5.3.1. Component Localization and Attribute Extraction Based on the IFC Model
- Component Localization: The process begins by identifying pier-related components in the IFC model using the IfcColumn type. In the IFC standard, IfcColumn represents vertical load-bearing components and is suitable for modeling solid or hollow pier columns. Each IfcColumn entity is further filtered based on key properties defined in its IfcPropertySet, which ensures that hollow pier columns are identified accurately even when naming conventions vary. The updated component localization logic is illustrated in Figure 13.
- Attribute Extraction: For each identified component, key structural and material properties are extracted. In IFC, component attributes are typically stored in IfcPropertySet entities, which are linked to components via IfcRelDefinesByProperties. This study focuses on attributes such as diameter, concrete strength grade, and wall thickness. The extracted results for each component are organized into a structured dictionary, as illustrated in Table 3.
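The localization and extraction steps can be sketched with IfcOpenShell, using `ifcopenshell.open`, `by_type`, and `ifcopenshell.util.element.get_psets`, which collects the property sets attached to an element. The property names used here (`SectionType`, `Diameter`, `Concrete Strength Grade`, `Wall Thickness`) are hypothetical; a real model will carry whatever names its authoring tool writes. The filtering and extraction logic is kept separate from file access so it can be checked on plain dictionaries.

```python
# Hypothetical property names; replace with those used in the actual model.
HOLLOW_FLAG = ("SectionType", "Hollow")
WANTED = ("Diameter", "Concrete Strength Grade", "Wall Thickness")


def is_hollow_pier(psets: dict[str, dict]) -> bool:
    """Filter on a property value rather than on the element name."""
    key, expected = HOLLOW_FLAG
    return any(props.get(key) == expected for props in psets.values())


def extract_attributes(name: str, psets: dict[str, dict]) -> dict:
    """Collect the wanted properties into a component-name/attributes dict."""
    attributes = {}
    for props in psets.values():
        for key in WANTED:
            if key in props and key not in attributes:
                attributes[key] = props[key]
    return {"Component Name": name, "Attributes": attributes}


def extract_hollow_piers(ifc_path: str) -> list[dict]:
    """Locate IfcColumn elements that are hollow piers and extract attributes."""
    import ifcopenshell
    import ifcopenshell.util.element as element_util

    model = ifcopenshell.open(ifc_path)
    results = []
    for column in model.by_type("IfcColumn"):
        psets = element_util.get_psets(column)  # {pset name: {property: value}}
        if is_hollow_pier(psets):
            results.append(extract_attributes(column.Name or column.GlobalId, psets))
    return results
```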
5.3.2. Validation of Model Attributes Against Knowledge Graph Rules
6. Discussion
6.1. Innovation and Effectiveness of the Method
6.2. Practical Significance
6.3. System Optimization, Limitations, and Future Work
- Accuracy of Semantic Parsing by LLMs: The accuracy of LLMs in parsing design specifications depends heavily on prompt formulation. Performance can vary across engineering domains and specification formats. Future work should focus on refining prompt design and expanding training data to improve the model’s understanding of complex design rules. In practice, LLMs may produce outputs that do not match the specifications or generate inconsistent numerical values. To address this, we decomposed the specifications into individual clauses, clarified the reasoning steps and requirements, and provided structured examples. Prompts and outputs were iteratively refined to improve accuracy and consistency. The choice of LLM and parameter settings also affects the results. For example, in specification information extraction, ChatGPT-4.1 showed slightly higher speed and accuracy than DeepSeek-V3, but it produced more hallucinations.
- Domain-Specific Fine-Tuning: To further improve LLM performance in parsing bridge engineering specifications, future work could explore domain-specific fine-tuning, that is, further training the model on bridge design specifications and historical BIM datasets. Such fine-tuning helps the model better understand engineering terminology, numerical constraints, and structural rules, reducing semantic parsing errors and improving reliability. For example, Forth and Borrmann [27] proposed a method that uses semantic textual similarity (STS) and fine-tuned multilingual LLMs to automatically enrich missing information in BIM models. Their study demonstrates that domain-specific fine-tuning can significantly enhance model performance in BIM-related tasks, providing a useful reference for bridge specification parsing. In addition, recent AI applications in infrastructure engineering include deep learning for structural health monitoring, reinforcement learning for construction process optimization, and generative AI for design proposal generation [28]. These approaches illustrate the growing integration of AI technologies into digital construction workflows. Although this study focuses on LLM-based automated checking, these examples indicate directions for future extensions.
- Knowledge Graph Storage and Query Efficiency: The knowledge graph effectively stores design specification rules but may face challenges in handling large-scale, cross-disciplinary specifications. Future research should address improving the scalability and query performance of knowledge graphs in multi-disciplinary applications.
- BIM Model Data Quality: BIM model data in real-world applications often suffer from incompleteness or inconsistency, requiring additional preprocessing. Future work should explore methods to enhance the robustness of the system by optimizing BIM model data preprocessing.
- Integration with BIM Workflows and Computational Considerations: The framework extracts IFC data using Python and the IfcOpenShell library. Key components, including piers, caps, beams, and slabs, are identified, and attributes such as diameter, concrete strength, wall thickness, and pile spacing are retrieved. These attributes are compared with the design rules stored in the knowledge graph, and the checking results are presented as tables and visual alerts. Users can import IFC data for batch checking and obtain structured outputs. The current interaction features are basic and not yet fully integrated into BIM workflows. Regarding computational cost, the current implementation is lightweight and can process medium-sized bridge models on standard workstations. Future work will focus on optimizing performance and implementing parallel processing to handle larger models efficiently. In addition, BIM plugins or standalone applications will be developed so that engineers and inspectors can run automated checking directly within common BIM environments, improving usability, streamlining workflows, and promoting practical adoption.
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
BIM | Building Information Modeling |
LLMs | Large Language Models |
KGs | Knowledge Graphs |
IFC | Industry Foundation Classes |
NLP | Natural Language Processing |
RDF | Resource Description Framework |
LPG | Labeled Property Graph |
SMT | Statistical Machine Translation |
LLaMA-2 | Large Language Model Meta AI |
SVM | Support Vector Machines |
HMM | Hidden Markov Models |
KNN | K-Nearest Neighbors |
STS | Semantic Textual Similarity |
References
- Zhou, X.; Zhao, J.C.; Wang, J.; Huang, X.; Li, X.; Guo, M.; Xie, P. Parallel Computing-Based Online Geometry Triangulation for Building Information Modeling Utilizing Big Data. Autom. Constr. 2019, 107, 102942.
- Eastman, C.; Lee, J.M.; Jeong, Y.S.; Lee, J.-K. Automatic Rule-Based Checking of Building Designs. Autom. Constr. 2009, 18, 1011–1033.
- Zhou, P.; El Gohary, N. Ontology-Based Automated Information Extraction from Building Energy Conservation Codes. Autom. Constr. 2017, 74, 103–117.
- Wang, X.; El-Gohary, N. Deep Learning-Based Relation Extraction from Construction Safety Regulations for Automated Field Compliance Checking. In Proceedings of the Construction Research Congress 2022, Arlington, TX, USA, 9–12 March 2022; pp. 290–297.
- Zhang, J.S.; El Gohary, N.M. Integrating Semantic NLP and Logic Reasoning into a Unified System for Fully-Automated Code Checking. Autom. Constr. 2017, 73, 45–57.
- Xue, X.R.; Zhang, J.S. Part-of-Speech Tagging of Building Codes Empowered by Deep Learning and Transformational Rules. Adv. Eng. Inform. 2021, 47, 101235.
- Lee, J.; Yi, J.-S. Predicting Project’s Uncertainty Risk in the Bidding Process by Integrating Unstructured Text Data and Structured Numerical Data Using Text Mining. Appl. Sci. 2017, 7, 1141.
- Hassan, F.U.; Le, T.; Lv, X. Addressing Legal and Contractual Matters in Construction Using Natural Language Processing: A Critical Review. J. Constr. Eng. Manag. 2021, 147, 03121004.
- Hassan, F.U.; Le, T. Automated Requirements Identification from Construction Contract Documents Using Natural Language Processing. J. Leg. Aff. Dispute Resolut. Eng. Constr. 2020, 12, 04520009.
- Peng, J.; Liu, X. Automated Code Compliance Checking Research Based on BIM and Knowledge Graph. Sci. Rep. 2023, 13, 7065.
- Hearne, M.; Way, A. Statistical Machine Translation: A Guide for Linguists and Translators. Lang. Linguist. Compass 2011, 5, 205–226.
- Koehn, P.; Hoang, H.; Birch, A.; Callison-Burch, C.; Federico, M.; Bertoldi, N.; Cowan, B.; Shen, W.; Moran, C.; Zens, R.; et al. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions; Ananiadou, S., Ed.; Association for Computational Linguistics: Prague, Czech Republic, 2007; pp. 177–180.
- Koseki, S.; Kutsuzawa, K.; Owaki, D.; Hayashibe, M. Multimodal Bipedal Locomotion Generation with Passive Dynamics via Deep Reinforcement Learning. Front. Neurorobot. 2022, 16, 1054239.
- Qi, W.; Fan, H.Y.; Karimi, H.R.; Su, H. An Adaptive Reinforcement Learning-Based Multimodal Data Fusion Framework for Human–Robot Confrontation Gaming. Neural Netw. 2023, 164, 489–496.
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2025.
- GPT-4. Available online: https://openai.com/zh-Hans-CN/index/gpt-4/ (accessed on 14 September 2025).
- Li, H.; Dong, Z.; Wang, S.; Zhang, H.; Shen, L.; Peng, X.; She, D. Extracting Formal Specifications from Documents Using LLMs for Automated Testing. In Proceedings of the 2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC), Ottawa, ON, Canada, 27–28 April 2025.
- Dagdelen, J.; Dunn, A.; Lee, S.; Walker, N.; Rosen, A.S.; Ceder, G.; Persson, K.A.; Jain, A. Structured Information Extraction from Scientific Text with Large Language Models. Nat. Commun. 2024, 15, 1418.
- Shanahan, M.; McDonell, K.; Reynolds, L. Role Play with Large Language Models. Nature 2023, 623, 493–498.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2023.
- Ehrlinger, L.; Wöß, W. Towards a Definition of Knowledge Graphs. In Proceedings of the Posters and Demos Track of 12th International Conference on Semantic Systems (SEMANTiCS2016) and 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS16), Leipzig, Germany, 12–15 September 2016.
- Färber, M.; Bartscherer, F.; Menne, C.; Rettinger, A. Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web 2017, 9, 77–129.
- Peng, C.; Xia, F.; Naseriparsa, M.; Osborne, F. Knowledge Graphs: Opportunities and Challenges. Artif. Intell. Rev. 2023, 56, 13071–13102.
- JTG/T 3365-05-2022; Technical Specifications for Design of Prefabricated Concrete Highway Bridges. Ministry of Transport of the People’s Republic of China: Beijing, China, 2022.
- JTG 3363-2019; Highway Bridge Foundation and Substructure Design Code. Ministry of Transport of the People’s Republic of China: Beijing, China, 2019.
- JTG D60-2015; General Specifications for Highway Bridges. Ministry of Transport of the People’s Republic of China: Beijing, China, 2015.
- Forth, K.; Borrmann, A. Semantic Enrichment for BIM-Based Building Energy Performance Simulations Using Semantic Textual Similarity and Fine-Tuning Multilingual LLM. J. Build. Eng. 2024, 95, 110312.
- Nyokum, T.; Tamut, Y. Artificial Intelligence in Civil Engineering: Emerging Applications and Opportunities. Front. Built Environ. 2025, 11, 1622873.
Field Name | Meaning | Purpose |
---|---|---|
Component | Name of the structural component or entity | Used to generate a component node |
Property Constraint | | |
– Property | The design attribute being constrained | Attached to the relationship as the subject of the constraint |
– Comparator | Comparison operator (e.g., ≥, =, ≤) | Defines the logical condition in the relationship |
– Value | The actual limit or required value | Used to generate a Value node |
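Because the LLM may return malformed or inconsistent records (a risk discussed in Section 6.3), a light structural check against the fields above can run before anything is written to the graph. The sketch below assumes that the field names are used verbatim as JSON keys and that only the comparators ≥, ≤, =, >, and < are accepted; both choices are assumptions rather than part of the described method.

```python
# Assumed comparator vocabulary; extend if the specifications need more.
ALLOWED_COMPARATORS = {"≥", "≤", "=", ">", "<"}


def validate_record(record: dict) -> list[str]:
    """Return a list of schema problems in one extracted record (empty if valid)."""
    errors = []
    component = record.get("Component")
    if not isinstance(component, str) or not component.strip():
        errors.append("missing Component")
    constraints = record.get("Property Constraint")
    if not isinstance(constraints, list) or not constraints:
        errors.append("missing Property Constraint list")
        return errors
    for index, constraint in enumerate(constraints):
        if not isinstance(constraint, dict):
            errors.append(f"constraint {index}: not an object")
            continue
        for key in ("Property", "Comparator", "Value"):
            value = constraint.get(key)
            if value is None or not str(value).strip():
                errors.append(f"constraint {index}: missing {key}")
        comparator = constraint.get("Comparator")
        if comparator and comparator not in ALLOWED_COMPARATORS:
            errors.append(f"constraint {index}: unknown comparator {comparator!r}")
    return errors
```

Records that fail validation can be sent back to the LLM with the error list appended to the prompt, which fits the iterative prompt refinement described in Section 6.3.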
Type | Description | Unique Attribute |
---|---|---|
Component | Node representing a structural component entity | Name |
Value | Node representing a constraint value | Value |
Relation: HAS_CONSTRAINT | Edge from component to value, indicating that the value is a constraint on a certain attribute | Carries two properties: attribute, comparator |
Field | Value |
---|---|
Component Name | Hollow Pier Column-1 |
Attributes | |
– Diameter | 0.9 m |
– Concrete Strength Grade | C50 |
– Wall Thickness | 150 mm |
No. | Component Name | Attribute | Actual Value | Constraint |
---|---|---|---|---|
1. | Hollow Pier Column-1 | Diameter | 0.98 m | ≥1 m |
2. | Hollow Pier Column-8 | Diameter | 0.90 m | ≥1 m |
3. | Hollow Pier Column-4 | Concrete Strength Grade | C25 | ≥C50 |
4. | Hollow Pier Column-7 | Concrete Strength Grade | C30 | ≥C50 |
5. | Cap-5 | Cap Thickness | 1.0 m | ≥1.5 m |
6. | Cap-10 | Cap Thickness | 1.2 m | ≥1.5 m |
7. | Railing-1 | Height | 1.0 m | ≥1.1 m |
8. | Hollow Pier | Pile Spacing | 1.8 m | ≥2.4 m |
Metric | Formula | Value |
---|---|---|
Accuracy | (TP + TN)/(TP + TN + FP + FN) | 65.20% |
Precision | TP/(TP + FP) | 84.40% |
Recall | TP/(TP + FN) | 70.70% |
F1 Score | 2 * (Precision * Recall)/(Precision + Recall) | 76.90% |
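The four metrics follow directly from the confusion-matrix counts in the formulas above. The helper below is a plain restatement of those formulas; any counts passed to it are the caller's own, since the underlying TP, TN, FP, and FN values are not listed here. As a consistency check, applying the F1 formula to the reported precision (84.40%) and recall (70.70%) gives about 76.9%, matching the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


print(round(f1_score(0.844, 0.707), 3))  # 0.769
```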
Component | IFC Type | Check Result |
---|---|---|
Deck | IfcSlab | Correct |
Bridge Beams | IfcBeam | Correct |
Bridge Slab | IfcSlab | Correct |
Pier | IfcColumn | Correct |
Bearing | IfcBearing | Correct |
Abutments | IfcBridgeElement | Correct |
Foundation | IfcFoundation | Correct |
Cap Slab | IfcSlab | Correct |
Piles | IfcPile | Correct |
Transition Section | IfcBridgeElement | Absent |
Expansion Joints | IfcBridgeElement | Absent |
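The structural completeness table pairs each required component with an IFC class. Since several components share a class (Deck, Bridge Slab, and Cap Slab are all IfcSlab), the class alone cannot tell them apart, so the sketch below additionally matches a keyword in the element name. The keyword mapping is a hypothetical stand-in for whatever distinguishing property the model actually carries, and only a subset of the table is shown; the loader uses IfcOpenShell's `by_type("IfcProduct")` and `is_a()`.

```python
# Hypothetical subset of the required-component list: name -> (IFC class, keyword)
REQUIRED = {
    "Pier": ("IfcColumn", "pier"),
    "Piles": ("IfcPile", "pile"),
    "Expansion Joints": ("IfcBridgeElement", "expansion joint"),
}


def check_completeness(elements: list[tuple[str, str]],
                       required: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Mark each required component 'Correct' if present, else 'Absent'."""
    results = {}
    for component, (ifc_class, keyword) in required.items():
        present = any(cls == ifc_class and keyword.lower() in (name or "").lower()
                      for cls, name in elements)
        results[component] = "Correct" if present else "Absent"
    return results


def load_elements(ifc_path: str) -> list[tuple[str, str]]:
    """Read (IFC class, name) pairs for all products in an IFC file."""
    import ifcopenshell

    model = ifcopenshell.open(ifc_path)
    return [(product.is_a(), product.Name) for product in model.by_type("IfcProduct")]
```

Name keywords are fragile (a keyword such as "pier" would also match a "Pier Cap"), so a dedicated classification property or the IFC predefined type is a more reliable discriminator where the model provides one.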
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Y.; Jing, X.; Liu, Y.-M. Automated Checking of Highway Bridge BIM Models Based on Large Language Models. Buildings 2025, 15, 3465. https://doi.org/10.3390/buildings15193465
Yang, Yongyi, Xiaoping Jing, and Yan-Ming Liu. 2025. "Automated Checking of Highway Bridge BIM Models Based on Large Language Models" Buildings 15, no. 19: 3465. https://doi.org/10.3390/buildings15193465