Enhancing Embodied Carbon Calculation in Buildings: A Retrieval-Augmented Generation Approach with Large Language Models
Abstract
1. Introduction
2. Literature Review
2.1. Large Language Models
2.2. Embodied Carbon Emissions Estimation
- (1) Life cycle assessment (LCA)
- (2) The construction energy consumption list statistics method
- (3) Integrating Building Information Modeling (BIM)
- (4) Machine learning methods
2.3. The Application Prospects of LLMs in Calculating the Embodied Carbon Emissions of Buildings
3. Methods
3.1. Benchmark Case
3.2. Application of the Basic Large Model
3.2.1. Selection of the Basic Large Model
3.2.2. Information Input Combination
3.2.3. Verification Process
3.3. Enhanced Retrieval-Based LLM
3.3.1. Systematic Architecture Design
3.3.2. Realization Process
3.3.3. Analysis of Method Advantages
3.3.4. Evaluation and Verification
4. Results and Discussion
4.1. Analysis of Carbon Emission Calculation Results in Buildings Based on the Basic LLMs
4.2. Construction of Enhanced Retrieval-Based LLM
4.2.1. Selection of the Foundational Large Model
4.2.2. Knowledge Base Construction
4.2.3. Information Input Method
4.2.4. Workflow Setup
- (1) Information Extraction Phase
- (2) Carbon Emission Factor Retrieval Phase
- (3) Carbon Emission Calculation
- (4) Result Presentation
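The four workflow phases above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: `KNOWLEDGE_BASE`, `MaterialLine`, `extract_materials`, `retrieve_factors`, `calculate` and `present` are hypothetical names; phase 1 assumes the material list is already structured (extraction from free text is the step the workflow may delegate to the LLM); and `difflib` string similarity stands in for the embedding-based retrieval a RAG system would normally use. What it illustrates is the division of labour: retrieval supplies the factor, and the arithmetic is done in code rather than generated by the model.

```python
import difflib
from dataclasses import dataclass
from typing import Optional

# Hypothetical knowledge base: material name -> carbon emission factor (kgCO2e/kg).
# Entries reuse rounded coefficients from the case-study table purely for illustration.
KNOWLEDGE_BASE = {
    "metal-steel-s275": 2.45,
    "gypsum wall board": 0.39,
    "rock wool": 1.44,
    "glass": 1.67,
}


@dataclass
class MaterialLine:
    name: str
    mass_kg: float
    matched_entry: Optional[str] = None
    factor: Optional[float] = None
    emissions: Optional[float] = None


def extract_materials(rows):
    """Phase 1: turn structured (name, mass) rows into MaterialLine records."""
    return [MaterialLine(name=str(name).strip(), mass_kg=float(mass)) for name, mass in rows]


def retrieve_factors(lines, kb=KNOWLEDGE_BASE, cutoff=0.6):
    """Phase 2: attach the closest knowledge-base factor to each material.

    difflib string similarity stands in for embedding-based vector retrieval.
    Materials with no entry above `cutoff` keep factor=None instead of a guessed value.
    """
    keys = list(kb)
    for line in lines:
        hits = difflib.get_close_matches(line.name.lower(), keys, n=1, cutoff=cutoff)
        if hits:
            line.matched_entry = hits[0]
            line.factor = kb[hits[0]]
    return lines


def calculate(lines):
    """Phase 3: deterministic arithmetic (mass x factor) done in code, not by the LLM."""
    total = 0.0
    for line in lines:
        if line.factor is not None:
            line.emissions = round(line.mass_kg * line.factor, 2)
            total += line.emissions
    return round(total, 2)


def present(lines, total):
    """Phase 4: plain-text result table; unmatched materials are flagged, not dropped."""
    out = [f"{'Material':<20}{'Mass (kg)':>12}{'ECF':>7}{'kgCO2e':>13}"]
    for line in lines:
        if line.emissions is None:
            out.append(f"{line.name:<20}{line.mass_kg:>12.2f}  (no factor found)")
        else:
            out.append(
                f"{line.name:<20}{line.mass_kg:>12.2f}{line.factor:>7.2f}{line.emissions:>13.2f}"
            )
    out.append(f"{'Total':<20}{'':>19}{total:>13.2f}")
    return "\n".join(out)


if __name__ == "__main__":
    rows = [("Glass", 8286.05), ("Rock Wool", 33027.24), ("Unknown Panel", 10.0)]
    lines = retrieve_factors(extract_materials(rows))
    print(present(lines, calculate(lines)))
```

Leaving unmatched materials at `None`, rather than forcing a nearest-but-wrong factor, keeps factor-matching failures visible in the presented result.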
4.3. Analysis of Carbon Emission Calculation Results in Buildings Based on Retrieval-Enhanced LLM
5. Further Discussion
6. Conclusions
- (1) The basic large model has application potential in calculating the embodied carbon emissions of buildings, but the reliability of its results is limited: a basic large model is trained on massive text data and generates the "most likely output" by learning language patterns and contextual correlations, rather than by performing precise mathematical operations, so its numerical results cannot be fully relied upon. This finding sets the direction for subsequent technical optimization.
- (2) The influence of data elements on calculation accuracy is revealed: data integrity and regional identification significantly affect calculation accuracy. Providing complete information, including structured material data, regional identification, and carbon emission factors, minimizes parsing errors and offers a data-level route to improving calculation accuracy.
- (3) A new application path integrating RAG with large models is created: this study is the first to propose combining RAG with large models to calculate the embodied carbon emissions of buildings. Dynamically invoking an external knowledge base achieves precise matching of carbon emission factors, and linking the model to a calculation module performs the numerical operations without arithmetic error; together, these resolve the core defect of general large models in this field at the level of the technical architecture. The approach provides a new perspective and new methods for theoretical research in this field.
- (4) The significant performance improvement from RAG is verified: empirical results show that the basic large model combined with RAG outperforms the basic large model alone by 25%, demonstrating the value of RAG for applying large models to carbon emission calculation and providing a reusable performance benchmark for technical implementation in this field.
- (5) A supporting theory and application system is established: building on this integration practice, a dynamic mapping between data integrity and calculation accuracy is established, and a dedicated RAG architecture with a hierarchical application strategy for embodied carbon calculation is proposed and verified.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
- (1) InfoSet-1: Minimal Text (No Building Material Data)
- (2) InfoSet-2: Plain Text (Contains Building Material Data)
- (3) InfoSet-3: Plain Text (Building Material Data + Area Marking)
- (4) InfoSet-4: Structured Excel + Area Marking
- (5) InfoSet-5: Complete Package Information
Material Name | Material Weight (kg) | Carbon Emission Coefficient (kgCO2e/kg) | Embodied Carbon Emissions (kgCO2e) |
---|---|---|---|
Concrete, Cast-in-Place-C15 | 14,674.40 | 0.10 | 1511.46 |
Default Mass Floor | 1190.91 | 0.10 | 122.66 |
Concrete-Cast-in-Place Concrete | 97,024.06 | 0.10 | 9993.48 |
Concrete, Cast in Situ | 4.81 | 0.10 | 0.50 |
Concrete | 9764.19 | 0.10 | 1005.71 |
Concrete-Precast Concrete-35 MPa | 12,790.71 | 0.28 | 3581.40 |
Steel | 4528.74 | 3.02 | 13,676.80 |
QuadCore Trapezoidal Roof Panel_KS1000RW_External Weather Sheet | 19,720.41 | 3.29 | 64,899.88 |
QuadCore Trapezoidal Roof Panel_KS1000RW_Internal LinerSheet | 29,468.40 | 3.06 | 90,173.30 |
Default Roof-metal Single Skin | 43.79 | 3.06 | 134.00 |
Metal-Steel 50-355 | 20,982.77 | 2.45 | 51,407.78 |
Metal-Steel-S275 | 64,730.55 | 2.45 | 158,589.85 |
PrepaintSteel_ArcelorMittal_Construction_HARULTRA-35-CORAL | 27,644.39 | 3.06 | 84,591.83 |
Cladding, Vertical Ribbed | 18,124.46 | 3.29 | 59,647.60 |
Metal Stud Layer | 169,630.17 | 2.97 | 502,953.45 |
Aluminium | 2678.14 | 1.71 | 4568.90 |
Default Roof-Generic Insulation 125 mm | 178.04 | 1.44 | 256.38 |
ArcelorMittal-Mineral Wool | 244,027.36 | 0.74 | 179,360.11 |
Insulation/Support Frame | 28.46 | 0.74 | 21.06 |
Rock Wool | 33,027.24 | 1.44 | 47,559.22 |
Default Wall | 768.25 | 0.39 | 299.62 |
Gypsum Wall Board | 36,163.64 | 0.39 | 14,103.82 |
Plaster | 4467.50 | 0.39 | 1742.32 |
Brick, Common | 25,933.09 | 0.21 | 5523.75 |
PrepaintSteel_ArcelorMittal_Construction_INTERIEUR-12-WHITE | 3213.46 | 15.40 | 49,487.24 |
Paint-White Lining | 0.56 | 2.33 | 1.30 |
Glass | 8286.05 | 1.67 | 13,812.84 |
Total | 849,094.53 | - | 1,359,026.26 |
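Each row of the table applies the quantity-times-factor rule $E_i = m_i \times ECF_i$, and the building total is $\sum_i E_i$. The Python sketch below (row data copied from the table; the function names are illustrative) re-adds the columns as a consistency check. The listed emissions sum exactly to the printed 1,359,026.26 kgCO2e, the masses sum to 849,094.55 kg (0.02 kg above the printed total, consistent with per-row rounding), and some rows imply coefficients with more decimals than shown, e.g. 1511.46 / 14,674.40 ≈ 0.103 for the C15 concrete row rather than the printed 0.10.

```python
# Case-study rows: (material, mass in kg, ECF as printed in kgCO2e/kg, listed emissions in kgCO2e).
ROWS = [
    ("Concrete, Cast-in-Place-C15", 14674.40, 0.10, 1511.46),
    ("Default Mass Floor", 1190.91, 0.10, 122.66),
    ("Concrete-Cast-in-Place Concrete", 97024.06, 0.10, 9993.48),
    ("Concrete, Cast in Situ", 4.81, 0.10, 0.50),
    ("Concrete", 9764.19, 0.10, 1005.71),
    ("Concrete-Precast Concrete-35 MPa", 12790.71, 0.28, 3581.40),
    ("Steel", 4528.74, 3.02, 13676.80),
    ("QuadCore Trapezoidal Roof Panel_KS1000RW_External Weather Sheet", 19720.41, 3.29, 64899.88),
    ("QuadCore Trapezoidal Roof Panel_KS1000RW_Internal LinerSheet", 29468.40, 3.06, 90173.30),
    ("Default Roof-metal Single Skin", 43.79, 3.06, 134.00),
    ("Metal-Steel 50-355", 20982.77, 2.45, 51407.78),
    ("Metal-Steel-S275", 64730.55, 2.45, 158589.85),
    ("PrepaintSteel_ArcelorMittal_Construction_HARULTRA-35-CORAL", 27644.39, 3.06, 84591.83),
    ("Cladding, Vertical Ribbed", 18124.46, 3.29, 59647.60),
    ("Metal Stud Layer", 169630.17, 2.97, 502953.45),
    ("Aluminium", 2678.14, 1.71, 4568.90),
    ("Default Roof-Generic Insulation 125 mm", 178.04, 1.44, 256.38),
    ("ArcelorMittal-Mineral Wool", 244027.36, 0.74, 179360.11),
    ("Insulation/Support Frame", 28.46, 0.74, 21.06),
    ("Rock Wool", 33027.24, 1.44, 47559.22),
    ("Default Wall", 768.25, 0.39, 299.62),
    ("Gypsum Wall Board", 36163.64, 0.39, 14103.82),
    ("Plaster", 4467.50, 0.39, 1742.32),
    ("Brick, Common", 25933.09, 0.21, 5523.75),
    ("PrepaintSteel_ArcelorMittal_Construction_INTERIEUR-12-WHITE", 3213.46, 15.40, 49487.24),
    ("Paint-White Lining", 0.56, 2.33, 1.30),
    ("Glass", 8286.05, 1.67, 13812.84),
]


def totals(rows):
    """Column sums for mass (kg) and listed embodied carbon (kgCO2e)."""
    mass = sum(r[1] for r in rows)
    carbon = sum(r[3] for r in rows)
    return round(mass, 2), round(carbon, 2)


def implied_factor(mass_kg, emissions_kgco2e, digits=3):
    """Back-calculate the coefficient a row actually used: ECF = E / m."""
    return round(emissions_kgco2e / mass_kg, digits)


def recompute_with_printed_factors(rows):
    """Sum of m_i * ECF_i using the two-decimal coefficients exactly as printed."""
    return round(sum(m * ecf for _, m, ecf, _ in rows), 2)


if __name__ == "__main__":
    mass_total, carbon_total = totals(ROWS)
    print(f"Mass: {mass_total:,.2f} kg (printed total 849,094.53)")
    print(f"Embodied carbon: {carbon_total:,.2f} kgCO2e (printed total 1,359,026.26)")
    print(f"Recomputed with printed ECFs: {recompute_with_printed_factors(ROWS):,.2f} kgCO2e")
    print("Implied ECF, C15 row:", implied_factor(14674.40, 1511.46))
```

Recomputing with the printed two-decimal coefficients therefore does not reproduce the listed total exactly; the listed per-row emissions are the values to trust.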
InfoSet | Material Quantity | ECF | Regional Data Source Indication |
---|---|---|---|
InfoSet 1 | No | No | No |
InfoSet 2 | Yes, in plain text | No | No |
InfoSet 3 | Yes, in plain text | No | Yes, states the region where the building is located |
InfoSet 4 | Yes, in Excel | No | Yes, states the region where the building is located |
InfoSet 5 | Yes, in Excel | Yes | Yes, states the region where the building is located |
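The five input combinations in the table are cumulative: each InfoSet changes one element (material data, regional identification, structured format, emission factors) relative to the previous one. The sketch below encodes them as data and assembles a model input accordingly. `InfoSet`, `INFOSETS` and `build_input` are hypothetical names, and the Excel attachment is approximated by CSV-style text, so this shows the structure of the comparison rather than the prompts actually used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InfoSet:
    label: str
    material_format: Optional[str]  # None, "text" or "excel", following the table
    includes_ecf: bool
    includes_region: bool


INFOSETS = [
    InfoSet("InfoSet 1", None, includes_ecf=False, includes_region=False),
    InfoSet("InfoSet 2", "text", includes_ecf=False, includes_region=False),
    InfoSet("InfoSet 3", "text", includes_ecf=False, includes_region=True),
    InfoSet("InfoSet 4", "excel", includes_ecf=False, includes_region=True),
    InfoSet("InfoSet 5", "excel", includes_ecf=True, includes_region=True),
]


def build_input(info: InfoSet, building: str, materials: List[Tuple[str, float]],
                region: str, factors: Dict[str, float]) -> str:
    """Assemble the model input for one InfoSet; all argument values are placeholders."""
    parts = [f"Building: {building}",
             "Task: estimate the embodied carbon emissions in kgCO2e."]
    if info.material_format == "text":
        parts.append("Materials: " + "; ".join(f"{name} {mass} kg" for name, mass in materials))
    elif info.material_format == "excel":
        # Stand-in for an attached spreadsheet: CSV-style rows with a header.
        table = "\n".join(f"{name},{mass}" for name, mass in materials)
        parts.append("Attached table:\nmaterial,mass_kg\n" + table)
    if info.includes_region:
        parts.append(f"Region: {region}")
    if info.includes_ecf:
        parts.append("Carbon emission factors (kgCO2e/kg): "
                     + "; ".join(f"{name} {ecf}" for name, ecf in factors.items()))
    return "\n".join(parts)


if __name__ == "__main__":
    materials = [("Glass", 8286.05), ("Rock Wool", 33027.24)]
    factors = {"Glass": 1.67, "Rock Wool": 1.44}
    for info in INFOSETS:
        print(f"--- {info.label} ---")
        print(build_input(info, "case-study building", materials, "<region>", factors))
```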
Indicator Type | Indicator Name | Symbol | Formula | Eq. |
---|---|---|---|---|
Factor Matching Indicator | FM Score | $FM_i$ | | (1) |
Error Correlation Indicator | Total ECE Relative Error | $\delta_i$ | | (2) |
Discrepancy Index | Prediction Accuracy | $PA_i$ | $PA_i = 1 - \delta_i$ | (3) |
Discrepancy Index | Standard Deviation | $\sigma$ | | (4) |
Discrepancy Index | Coefficient of Variation | $CV$ | $CV = \sigma / \mu \times 100\%$ | (5) |
Discrepancy Index | Stability | $S$ | $S = 1 - CV$ | (6) |
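The error and stability indicators can be computed directly from repeated model outputs. The sketch below assumes $\delta_i = |E_i - E_{ref}| / E_{ref}$, $PA_i = 1 - \delta_i$, $CV = \sigma / \mu$ with a population standard deviation taken over repeated runs, and $S = 1 - CV$; these are common definitions, not necessarily the paper's exact equations. The FM score is omitted because its definition is not given in this section. Function names and the run values in the demo are hypothetical, and using the case-study total as the reference value is an assumption.

```python
import statistics
from typing import Sequence


def relative_error(estimate: float, reference: float) -> float:
    """delta_i, assumed here to be |E_i - E_ref| / E_ref."""
    return abs(estimate - reference) / reference


def prediction_accuracy(estimate: float, reference: float) -> float:
    """PA_i, assumed here to be 1 - delta_i."""
    return 1 - relative_error(estimate, reference)


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = sigma / mean, returned as a fraction (multiply by 100 for %).

    Population standard deviation is an assumption; statistics.stdev gives the sample version.
    """
    return statistics.pstdev(values) / statistics.mean(values)


def stability(values: Sequence[float]) -> float:
    """S = 1 - CV."""
    return 1 - coefficient_of_variation(values)


if __name__ == "__main__":
    reference = 1_359_026.26  # total from the case-study table (kgCO2e), assumed reference
    runs = [1_250_000.0, 1_400_000.0, 1_320_000.0]  # hypothetical repeated model outputs
    for e in runs:
        print(f"estimate {e:,.0f}: delta = {relative_error(e, reference):.3f}, "
              f"PA = {prediction_accuracy(e, reference):.3f}")
    print(f"CV = {coefficient_of_variation(runs):.2%}, S = {stability(runs):.3f}")
```

Whether σ should use the population or sample form depends on how the repeated runs are treated; with few runs the two differ noticeably.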
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zou, Y.; Zheng, R.; Xia, J. Enhancing Embodied Carbon Calculation in Buildings: A Retrieval-Augmented Generation Approach with Large Language Models. Buildings 2025, 15, 3449. https://doi.org/10.3390/buildings15193449