Modular Chain-of-Thought (CoT) for LLM-Based Conceptual Construction Cost Estimation
Abstract
1. Introduction
2. Literature Review
2.1. Existing Cost Estimation Approaches in Commercial Construction
2.2. Application of Generative Pre-Trained LLMs in Construction Cost Estimation
2.3. Chain of Thoughts (CoT) Prompting
- RQ1: To what extent can state-of-the-art general-purpose LLMs perform construction cost estimation workflow tasks under zero-shot prompting, without additional instructions or data?
- RQ2: How effectively does CoT prompting improve the performance of LLMs in executing construction cost estimation tasks?
3. Methodology
3.1. Proposed LLM Framework
3.2. Cost Estimation Scenario for Existing LLMs
3.3. Evaluation of Existing LLMs
4. Case Study: Modular Chain of Thoughts Prompting for Conceptual Estimation
4.1. Data
4.2. Modular Framework for the Scenario
4.3. CoT Instructions & Architecture
4.4. Module Calling Function
5. Results & Discussion
5.1. Qualitative Evaluation
5.2. Quantitative Evaluation
6. Limitations and Future Work
7. Conclusions
- Among four pre-trained LLMs tested—GPT-4o, LLaMA 3.2, Gemini 2.0, and Claude 3.5 Sonnet—GPT-4o demonstrated the highest performance across BLEU, ROUGE-L, METEOR, Content Overlap, and Semantic Similarity metrics.
- The CoT approach achieved significant quantitative gains, including a 1536% BLEU increase, 236% ROUGE-L improvement, and 210% METEOR enhancement, with notable rises in Content Overlap (355%) and Semantic Similarity (143%) when compared with zero-shot prompting.
- The modular CoT prompting approach significantly enhanced model accuracy and completeness, raising the human-evaluated confidence score from 1.91 (64%) to 2.52 (84%) on the 3-point scale, a gain of 20 percentage points over zero-shot prompting.
- Results confirm that pre-trained LLMs alone are insufficient for detailed cost estimation, but structured reasoning through CoT substantially improves performance.
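To make the reported metrics and percentage gains concrete, the following minimal sketch computes two of the simpler measures, a Jaccard-style content overlap and an LCS-based ROUGE-L F1, for a reference answer and a model response, along with the relative-gain formula behind figures such as the 236% ROUGE-L improvement. The whitespace tokenizer, the F1 weighting, and all function names are assumptions made for illustration; the exact implementation used in the study is not shown in this section.

```python
from __future__ import annotations


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def jaccard_overlap(reference: str, candidate: str) -> float:
    """Jaccard coefficient between the token sets of two texts."""
    ref, cand = set(tokenize(reference)), set(tokenize(candidate))
    if not ref and not cand:
        return 1.0
    return len(ref & cand) / len(ref | cand)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (beta = 1 assumed)."""
    ref, cand = tokenize(reference), tokenize(candidate)
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def percent_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    expected = "Great. Provide me quantities."        # expected answer, Appendix B, ID 5
    response = "Sure, please provide the quantities."  # hypothetical model response
    print(jaccard_overlap(expected, response))
    print(rouge_l_f1(expected, response))
    print(percent_gain(1.91, 2.52))  # relative gain in the confidence score
```

Under this formula a 236% gain means the CoT score is 3.36 times the zero-shot score. Applied to the confidence scores, the relative gain from 1.91 to 2.52 is about 32%, whereas the difference between 64% and 84% is 20 percentage points.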
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Burdens to Tasks to Sub-Tasks Mapping [58]
| Estimation Burdens | Tasks | Sub-Tasks |
|---|---|---|
| Aggregating Quantities | Specify Estimation Type and WBS | |
| | Collect Quantities | |
| | Format Quantities | |
| Referencing Enterprise Historic Cost | Referencing Multiple Cost Data Sources | |
| | Prioritize Cost References | |
| | Handle Missing Items | |
| Referencing External Cost Database | Referencing External Cost Database | |
| | Handle No Match & Multiple Matches | |
| | Follow Output Format | |
| Cross-Verification with Project Specification | Cross-Verification with Specification | |
| | Handle Changes | |
| | Work Resuming | |
| Planning Evaluation | Initialize Subcontractor Evaluation | |
| | Handle Inconsistencies & Mapping | |
| Evaluating Completeness | Check Completeness | |
| | Evaluating Discrepancies | |
| | Identify Options | |
| Metrics Based Evaluation | Metrics Based Evaluation | |
| | Create Negotiation List | |
| Visualizing and Compiling | Calculate and Visualize | |
| | Compile Estimates | |
| | Generate Final Estimate | |
| Managing Changes | Navigate Recent Estimation | |
| | Update Changes | |
| Version Control | Naming Standard | |
| | Changes between Versions | |
| Data Re-Cycling | Recycle to Historic Database | |
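As a hedged illustration of how the burden-to-task mapping in this appendix could be held in code, the sketch below stores it as a dictionary and adds a reverse lookup from a task to its burden. The names `BURDEN_TO_TASKS` and `burden_for_task` are hypothetical and are not taken from the paper's module calling function.

```python
from __future__ import annotations

from typing import Optional

# Burden -> tasks, transcribed from the Appendix A table (sub-tasks omitted).
BURDEN_TO_TASKS: dict[str, list[str]] = {
    "Aggregating Quantities": [
        "Specify Estimation Type and WBS", "Collect Quantities", "Format Quantities"],
    "Referencing Enterprise Historic Cost": [
        "Referencing Multiple Cost Data Sources", "Prioritize Cost References",
        "Handle Missing Items"],
    "Referencing External Cost Database": [
        "Referencing External Cost Database", "Handle No Match & Multiple Matches",
        "Follow Output Format"],
    "Cross-Verification with Project Specification": [
        "Cross-Verification with Specification", "Handle Changes", "Work Resuming"],
    "Planning Evaluation": [
        "Initialize Subcontractor Evaluation", "Handle Inconsistencies & Mapping"],
    "Evaluating Completeness": [
        "Check Completeness", "Evaluating Discrepancies", "Identify Options"],
    "Metrics Based Evaluation": [
        "Metrics Based Evaluation", "Create Negotiation List"],
    "Visualizing and Compiling": [
        "Calculate and Visualize", "Compile Estimates", "Generate Final Estimate"],
    "Managing Changes": ["Navigate Recent Estimation", "Update Changes"],
    "Version Control": ["Naming Standard", "Changes between Versions"],
    "Data Re-Cycling": ["Recycle to Historic Database"],
}


def burden_for_task(task: str) -> Optional[str]:
    """Return the estimation burden a task belongs to, or None if unknown."""
    for burden, tasks in BURDEN_TO_TASKS.items():
        if task in tasks:
            return burden
    return None


if __name__ == "__main__":
    print(burden_for_task("Collect Quantities"))
    print(sum(len(tasks) for tasks in BURDEN_TO_TASKS.values()), "tasks in total")
```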
Appendix B. Scenario 1: Conceptual Cost Estimation Scenario
| ID | Main Task | Estimation Sub-Task | Query from the “User” | Expected Answer from “LLM” |
|---|---|---|---|---|
| 1 | Specify Estimation Type and WBS | Task Initiation | I have a new project for which I need to create an estimate. Can you help me? | Yes, I can. Could you provide some details about the project, such as the type of construction, location of the project, and any specific type of estimation—Assembly or Item based—that you want to follow? |
| 2 | Specify Estimation Type and WBS | Specify Estimation Type | Yes. This is a new commercial building construction in Omaha, Nebraska. I want to do an Assembly cost estimation. | Ok. For assembly estimation, you need to know the assembly breakdown in Uniformat II before starting the estimation. Do you know this? |
| 3 | Specify Estimation Type and WBS | Confirm Work Breakdown Structure (WBS) | No, can you explain the Uniformat II assembly breakdown structure for me? | Of course! Uniformat II divides the building into major functional areas or “assemblies” based on the work they perform. Here’s an overview of the Uniformat II structure, broken down into its primary categories: A. Substructure: all the elements related to the building’s foundation and below-grade construction. B. Shell: the building’s exterior envelope or shell. C. Interiors: everything inside the building. D. Services: systems that provide utilities and services to the building. E. Equipment & Furnishings: the built-in equipment and furnishings in the building. F. Special Construction & Demolition: non-standard or specialized construction. G. Building Sitework: anything outside the building footprint. |
| 4 | Collect Quantities | Collect Quantities | Ok, understand. Now how can I start? | Now, you need to provide quantities, cost data, project specifications, and a particular format or template for the final output of the estimate. |
| 5 | Collect Quantities | Collect Quantities | OK. let’s go one by one with the data | Great. Provide me quantities. |
| 6 | Collect Quantities | Determine Scope | OK. I only have quantities for the shell category. And want to estimate only that portion. | I got it! I will focus on the Shell (B) category for now. Please provide the quantities. |
| 7 | Collect Quantities | Summarize and Display Collected Quantities | Here’s the quantity excel file I have uploaded. Please show me a list of items and quantities in tabular format. | Ok. Here is the list of your items and respective quantities. Do you want to know more about, or do you want to move ahead to the next steps? |
| 8 | Collect Quantities | Check Duplications | Can you check if there are any duplicate items based on the description and assembly code? | These are items that have exactly the same description and assembly code. Do you want to add them to a single item? |
| 9 | Collect Quantities | Check Duplications | Yes, please add. Also, can you find potential duplicates that don’t match exactly? | Here are the groups of items that look potentially similar based on size and descriptions. |
| 10 | Collect Quantities | Finalize Quantities | No, they are not the same items. Keep separate. | Ok. We now have 25 items in total. What do you want to do next with quantities? Should we move to cost data? |
| 11 | Format Quantities | Format Quantities | Now, I want you to help formatting before starting estimate. Can you do that? | Yes, I can. Do you have any template of format or you want me to suggest some format? |
| 12 | Format Quantities | Suggest Template | Please suggest a template format suitable for this estimation | Here is the suggested format suitable for the assembly estimate… Do you want to use this? Or do you want me to create other templates? |
| 13 | Format Quantities | Upload Template and Map Quantities | I have one standard template format for quantities. Here I have uploaded. Please put quantities into this new format and follow these instructions to do mapping: (i) Read Instructions inside Template file, each description under every title is the description for you understand what it means (ii) Look at example output and understand (iii) map all quantities to suitable section of this template’s yellow highlighted area, and you are free to add remove rows as required for items. Show the table and save. | I have arranged the original quantities into the new template that you provided as per instructions i, ii, and iii. See the table below. Let me know if you want to modify or change anything. |
| 14 | Format Quantities | Provide download link | Also, can you export new table in excel format. | You can click on the link below to download the new quantities table. |
| 15 | Referencing Multiple Cost Data Sources | Referencing Multiple Cost Data Sources | Now, we are good for quantities. What’s our next task? | We will work on cost data in the next step. Please provide me with cost data. |
| 16 | Referencing Multiple Cost Data Sources | Collect Cost Data | Ok. I have uploaded two datasets for cost data that I want to use. (1) Historic cost—enterprise database (2) External cost database | Great! I have stored both datasets. How do you want to set up the estimation to use both your historic cost and external cost? |
| 17 | Prioritize Cost References | Define Priority and calculations | I want to use the historical database as much as possible. | Ok. I can structure the estimation to prioritize historic cost and only use the database cost if the historic cost for an item is not available. |
| 18 | Prioritize Cost References | Define Priority and calculations | In the historic cost data add a column and get the average of all available commercial projects, for all cost items | Ok. The new column is added to the historic dataset, and I have saved the updated data. I have calculated the average for items in past commercial projects. Here is the table: |
| 19 | Prioritize Cost References | Set Conditions | I want to use historic cost for all items if it is available. If not, then I want to use database cost. | Ok. I can structure the estimation to prioritize historic cost and only use the database cost if the historic cost for an item is not available. |
| 20 | Prioritize Cost References | Set Common Identifier | How would you prioritize? | Are there any common identifiers in the two cost datasets? On what basis do you want to match items? |
| 21 | Prioritize Cost References | Set Common Identifier | We have the same Assembly Code but slight variations in descriptions. So, consider Assembly Code as the common Identifier. | I will first load both the historic and database cost data from separate files. Let’s assume both files contain Assembly Code as a common identifier that can be used to match items. I first check the historic cost data. If a historic cost is available, I use that. If no historic cost is found, I search for the cost in the database cost file. |
| 22 | Handle Missing Items | Handle Missing Items | If an item is missing from both datasets, what would you do? | I will ask you to choose from the closest options based on description. |
| 23 | Handle Missing Items | Summarize Work Done | Briefly summarize in bullet points: what you have done so far, what data you have, what data you need now | Summary of What Has Been Done: |
| 24 | Handle Missing Items | Explain Next Steps | What are the next steps? | Next Steps: |
| 25 | Handle Missing Items | Output Template Recall | Here, I have uploaded the output template file and specification file. Please read and store both files. | Thank you! I now have the output template and project specification files. Can I go ahead and start the cost calculations? |
| 26 | Referencing External Cost Database | Define Priority for External Cost & Define Condition to use External Cost Data | How would you utilize the template and start calculation? Explain to me before doing it. | First, I’ll carefully review the output template to understand its structure, column names and section layout. There may be sections based on Uniformat II categories (e.g., B10 Superstructure, B20 Exterior Enclosure). Second, I will map the quantities (description, unit, and quantity) to the output template. Third, based on the description of quantities, I will search for each item to match in cost data, first to ‘historical cost’, where available, and then to the ‘database cost’ if needed. Fourth, I will multiply the quantity by the unit cost to calculate the total cost. Fifth, I will keep the cost data assembly code as the unique identifier in the ‘Assembly Code’ column. |
| 27 | Handle No Match & Multiple Matches | Handling No Match Items | What if you don’t find an exact match? | If I don’t find an exact match in the cost data, I will ask you to select from the closest options (maximum 5 options). |
| 28 | Handle No Match & Multiple Matches | Handling Multiple Matched Items | What if you find multiple matches? | If I find multiple matching items in the cost data, I will provide you with options and ask you to select one out of those options. |
| 29 | Follow Output Format | Follow Output Format | Can you export the estimate in Excel format with the file name ‘conceptual estimation_[today’s date]’? | Yes, I can export the estimate with the file name “conceptual estimation_[today’s date].” |
| 30 | Cross-Verification with Specification | Cross-Verification with Specification | Ok. If I want to verify anything with specifications, how would you help? | I can help you in multiple ways: I can provide any specific information that you want to look at. I can also provide a summary of any discrepancies in item or installation description, between specification and items in estimation, that need clarification. And if adjustments are required, I can incorporate them into the estimate. |
| 31 | Handle Changes | Handle Changes | If something needs to be modified, added, removed, or corrected, would you do it for me? | Absolutely! If any modifications, additions, removals, or corrections are needed, just let me know the specific details, and I’ll make the necessary changes for you. Whether it’s adjusting costs, updating quantities, or correcting formatting, I’m happy to assist with all aspects of the estimate. |
| 32 | Work Resuming | Work Resuming | If I stop here, and come back to continue work, could you remind me with summary of what we have done, and what’s next step, and providing me the last file? | Yes. When you come back, I’ll provide a quick summary of the work we’ve completed so far, including details like the data we’ve loaded, the calculations performed, and any adjustments made. I’ll outline the next steps based on where we left off, so you can pick up exactly where you need to. I’ll also share the last file we were working on. |
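The cost-referencing rules described in this scenario (IDs 17 to 27) can be sketched as follows: match items on Assembly Code, prefer the historic cost, fall back to the external database, and when neither has the code, offer up to five closest descriptions for the user to choose from. This is a minimal sketch under assumptions: the class and function names, the similarity cutoff, and every code, description, and unit cost in the example are hypothetical, and the multiple-match case (ID 28) is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import get_close_matches


@dataclass
class LineItem:
    assembly_code: str
    description: str
    quantity: float


@dataclass
class CostRecord:
    description: str
    unit_cost: float


def resolve_cost(item: LineItem,
                 historic: dict[str, CostRecord],
                 external: dict[str, CostRecord],
                 max_options: int = 5) -> dict:
    """Price one item: historic cost first, external database second."""
    # Tables are checked in priority order, keyed by Assembly Code.
    for source, table in (("historic", historic), ("external", external)):
        record = table.get(item.assembly_code)
        if record is not None:
            return {"source": source,
                    "unit_cost": record.unit_cost,
                    "total": item.quantity * record.unit_cost,
                    "options": []}
    # No match in either dataset: suggest the closest descriptions instead
    # of guessing a price. The 0.3 cutoff is an arbitrary assumption.
    candidates = [rec.description
                  for table in (historic, external)
                  for rec in table.values()]
    options = get_close_matches(item.description, candidates,
                                n=max_options, cutoff=0.3)
    return {"source": None, "unit_cost": None, "total": None, "options": options}


if __name__ == "__main__":
    # Illustrative data only; not taken from the study's datasets.
    historic = {"B2010": CostRecord("Brick veneer wall, 4 in.", 32.5)}
    external = {"B2010": CostRecord("Brick veneer exterior wall", 40.0),
                "B3010": CostRecord("Built-up roof covering", 12.0)}
    for item in (LineItem("B2010", "Brick veneer wall", 100),
                 LineItem("B3010", "Roof covering", 200),
                 LineItem("B9999", "Brick veneer wall", 10)):
        print(item.assembly_code, resolve_cost(item, historic, external))
```

Returning the candidate options rather than silently picking the nearest description mirrors the scenario's expectation that the user, not the model, resolves unmatched items.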
Appendix C. Human Evaluation Criteria
| Task ID | Estimation Sub-Task | Completeness Criteria for LLM Response | User Confidence Level on Response (1 = Low, 2 = Medium, 3 = High) |
|---|---|---|---|
| 1 | Task Initiation | | 3 |
| 2 | Specify Estimation Type | | 2 |
| 3 | Confirm Work Breakdown Structure (WBS) | | 3 |
| 4 | Collect Quantities | | 1 |
| 5 | Collect Quantities | | 1 |
| 6 | Determine Scope | | 3 |
| 7 | Summarize and Display Collected Quantities | | 3 |
| 8 | Check Duplications | | 1 |
| 9 | Check Duplications | | 1 |
| 10 | Finalize Quantities | | 1 |
| 11 | Format Quantities | | 3 |
| 12 | Suggest Template | | 3 |
| 13 | Upload Template and Map Quantities | | 2 |
| 14 | Provide download link | | 2 |
| 15 | Referencing Multiple Cost Data Sources | | 3 |
| 16 | Collect Cost Data | | 2 |
| 17 | Define Priority and calculations | | 3 |
| 18 | Define Priority and calculations | | 2 |
| 19 | Set Conditions | | 2 |
| 20 | Set Common Identifier | | 2 |
| 21 | Set Common Identifier | | 1 |
| 22 | Handle Missing Items | | 1 |
| 23 | Summarize Work Done | | 1 |
| 24 | Explain Next Steps | | 1 |
| 25 | Output Template Recall | | 2 |
| 26 | Define Priority for External Cost & Define Condition to use External Cost Data | | 2 |
| 27 | Handling No Match Items | | 1 |
| 28 | Handling Multiple Matched Items | | 1 |
| 29 | Follow Output Format | | 1 |
| 30 | Cross-Verification with Specification | | 2 |
| 31 | Handle Changes | | 3 |
| 32 | Work Resuming | | 2 |
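As a worked check of the arithmetic, the sketch below averages the 32 confidence levels listed in this appendix and expresses the mean as a share of the 3-point maximum. The result, 1.91 or about 64%, agrees with the zero-shot confidence score quoted in the Conclusions; reading this column as the source of that figure is an inference, not something the appendix states. Variable names are illustrative.

```python
# Confidence levels for Task IDs 1-32, transcribed from the table above.
CONFIDENCE_LEVELS = [
    3, 2, 3, 1, 1, 3, 3, 1,
    1, 1, 3, 3, 2, 2, 3, 2,
    3, 2, 2, 2, 1, 1, 1, 1,
    2, 2, 1, 1, 1, 2, 3, 2,
]
MAX_LEVEL = 3  # 1 = Low, 2 = Medium, 3 = High

mean_level = sum(CONFIDENCE_LEVELS) / len(CONFIDENCE_LEVELS)
share_of_max = mean_level / MAX_LEVEL * 100

print(f"mean = {mean_level:.2f}, share of maximum = {share_of_max:.0f}%")
# prints: mean = 1.91, share of maximum = 64%
```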
References
- Ali, Z.H.; Burhan, A.M. Hybrid machine learning approach for construction cost estimation: An evaluation of extreme gradient boosting model. Asian J. Civ. Eng. 2023, 24, 2427–2442. [Google Scholar] [CrossRef]
- Swei, O.; Gregory, J.; Kirchain, R. Construction cost estimation: A parametric approach for better estimates of expected cost and variation. Transp. Res. Part. B Methodol. 2017, 101, 295–305. [Google Scholar] [CrossRef]
- Hashemi, S.T.; Ebadati, O.M.; Kaur, H. Cost estimation and prediction in construction projects: A systematic review on machine learning techniques. SN Appl. Sci. 2020, 2, 1703. [Google Scholar] [CrossRef]
- Holm, L.; Schaufelberger, J.E. Construction Cost Estimating; Routledge: London, UK, 2021. [Google Scholar] [CrossRef]
- Ahiaga-Dagbui, D.D.; Smith, S.D. Rethinking construction cost overruns: Cognition, learning and estimation. J. Financ. Manag. Prop. Constr. 2014, 19, 38–54. [Google Scholar] [CrossRef]
- Ghimire, P.; Pokharel, S.; Kim, K.; Barutha, P. Machine learning-based prediction models for budget forecast in capital construction. In Proceedings of the 2nd International Conference on Construction, Energy, Environment & Sustainability; Itecons: Funchal, Portugal, 2023; pp. 27–30. [Google Scholar]
- Messner, J. Introduction to Construction Cost Estimating, August 2022. Available online: https://psu.pb.unizin.org/buildingconstructionmanagement/chapter/introduction-to-construction-cost-estimating/ (accessed on 8 March 2025).
- Abanda, F.H.; Kamsu-Foguem, B.; Tah, J.H.M. BIM—New rules of measurement ontology for construction cost estimation. Eng. Sci. Technol. Int. J. 2017, 20, 443–459. [Google Scholar] [CrossRef]
- Ghimire, P.; Kim, K.; Acharya, M. Opportunities and Challenges of Generative AI in Construction Industry: Focusing on Adoption of Text-Based Models. Buildings 2024, 14, 220. [Google Scholar] [CrossRef]
- Rane, N. Role of ChatGPT and Similar Generative Artificial Intelligence (AI) in Construction Industry. Soc. Sci. Res. Netw. Rochester 2023. [Google Scholar] [CrossRef]
- Charette, R.P.; Marshall, H.E. UNIFORMAT II Elemental Classification for Building Specifications, Cost Estimating, and Cost Analysis; MD, NIST IR 6389; National Institute of Standards and Technology: Gaithersburg, Sweden, 1999. [Google Scholar] [CrossRef]
- Sayed, M.; Abdel-Hamid, M.; El-Dash, K. Improving cost estimation in construction projects. Int. J. Constr. Manag. 2023, 23, 135–143. [Google Scholar] [CrossRef]
- Juszczyk, M. The Challenges of Nonparametric Cost Estimation of Construction Works with the use of Artificial Intelligence Tools. Procedia Eng. 2017, 196, 415–422. [Google Scholar] [CrossRef]
- Lim, C.; Hong, W.-K.; Lee, D.; Kim, S. Automatic Rebar Estimation Algorithms for Integrated Project Delivery. J. Asian Archit. Build. Eng. 2016, 15, 411–418. [Google Scholar] [CrossRef]
- Elfaki, A.O.; Alatawi, S.; Abushandi, E. Using Intelligent Techniques in Construction Project Cost Estimation: 10-Year Survey. Adv. Civ. Eng. 2014, 2014, 107926. [Google Scholar] [CrossRef]
- Babatunde, S.O.; Perera, S.; Ekundayo, D.; Adeleye, T.E. An investigation into BIM-based detailed cost estimating and drivers to the adoption of BIM in quantity surveying practices. J. Financ. Manag. Prop. Constr. 2019, 25, 61–81. [Google Scholar] [CrossRef]
- Mubarak, S.A. How to Estimate with RSMeans Data: Basic Skills for Building Construction; John Wiley & Sons: Hoboken, NJ, USA, 2020. [Google Scholar]
- Mahamid, I. Factors affecting cost estimate accuracy: Evidence from Palestinian construction projects. Int. J. Manag. Sci. Eng. Manag. 2015, 10, 117–125. [Google Scholar] [CrossRef]
- Akanbi, T.; Zhang, J. Design information extraction from construction specifications to support cost estimation. Autom. Constr. 2021, 131, 103835. [Google Scholar] [CrossRef]
- Feuerriegel, S.; Hartmann, J.; Janiesch, C.; Zschech, P. Generative AI. Bus. Inf. Syst. Eng. 2024, 66, 111–126. [Google Scholar] [CrossRef]
- Baidoo-anu, D.; Ansah, L.O. Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning. J. AI 2023, 7, 52–62. [Google Scholar] [CrossRef]
- Zheng, J.; Fischer, M. Dynamic prompt-based virtual assistant framework for BIM information search. Autom. Constr. 2023, 155, 105067. [Google Scholar] [CrossRef]
- Li, C.; Su, Y.; Liu, W. Text-To-Text Generative Adversarial Networks. In 2018 International Joint Conference on Neural Networks (IJCNN); Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar] [CrossRef]
- Zhang, C.; Zhang, C.; Zhang, M.; Kweon, I.S. Text-to-image Diffusion Models in Generative AI: A Survey. arXiv 2023, arXiv:2303.07909. [Google Scholar] [CrossRef]
- Liu, V.; Long, T.; Raw, N.; Chilton, L. Generative Disco: Text-to-Video Generation for Music Visualization. arXiv 2023, arXiv:2304.08551. Available online: http://arxiv.org/abs/2304.08551 (accessed on 27 August 2023).
- Lei, T.; Barzilay, R.; Jaakkola, T. Rationalizing Neural Predictions. arXiv 2016, arXiv:1606.04155. Available online: http://arxiv.org/abs/1606.04155 (accessed on 27 August 2023).
- Wu, A.N.; Stouffs, R.; Biljecki, F. Generative Adversarial Networks in the built environment: A comprehensive review of the application of GANs across data types and scales. Build. Environ. 2022, 223, 109477. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. MAL 2019, 12, 307–392. [Google Scholar] [CrossRef]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 6840–6851. Available online: https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html (accessed on 16 September 2023).
- Kumar, M.; Babaeizadeh, M.; Erhan, D.; Finn, C.; Levine, S.; Dinh, L.; Kingma, D. VideoFlow: A Flow-Based Generative Model for Video. arXiv 2019, arXiv:1903.01434. [Google Scholar]
- Lee, J.; Kim, H.; Shim, J.; Hwang, E. Cartoon-Flow: A Flow-Based Generative Adversarial Network for Arbitrary-Style Photo Cartoonization. In Proceedings of the 30th ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1241–1251. [Google Scholar] [CrossRef]
- Wan, H.; Zhang, J.; Chen, Y.; Xu, W.; Feng, F. Generative AI Application for Building Industry. arXiv 2024, arXiv:2410.01098. [Google Scholar] [CrossRef]
- Saka, A.; Taiwo, R.; Saka, N.; Salami, B.; Ajayi, S.; Akande, K.; Kazemi, H. GPT Models in Construction Industry: Opportunities, Limitations, and a Use Case Validation. arXiv 2023, arXiv:2305.18997. [Google Scholar] [CrossRef]
- Prieto, S.A.; Mengiste, E.T.; de Soto, B.G. Investigating the Use of ChatGPT for the Scheduling of Construction Projects. Buildings 2023, 13, 857. [Google Scholar] [CrossRef]
- Hassan, H.A.M.; Marengo, E.; Nutt, W. A BERT-Based Model for Question Answering on Construction Incident Reports. In Natural Language Processing and Information Systems; Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2022; pp. 215–223. [Google Scholar] [CrossRef]
- Kim, K.; Ivashchenko, M.; Ghimire, P.; Huang, P.-C. Context-Aware and Adaptive Task Planning for Autonomous Construction Robots Through Llm-Robot Communication. Soc. Sci. Res. Netw. Rochester 2024. [Google Scholar] [CrossRef]
- Kim, K.; Ghimire, P.; Huang, P.-C. Framework for LLM-Enabled Construction Robot Task Planning: Knowledge Base Preparation and Robot–LLM Dialogue for Interior Wall Painting. Robotics 2025, 14, 117. [Google Scholar] [CrossRef]
- Parsafard, P.; Elezaj, O.; Ekundayo, D.; Vakaj, E.; Parmar, M.; Wani, M.A. Automation in Construction Cost Budgeting using Generative Artificial Intelligence. In Proceedings of the International Conference on Industrial Engineering and Operations Management; IEOM Society International: Dubai, United Arab Emirates, 2024. [Google Scholar] [CrossRef]
- Gatto, C.; Cassandro, J.; Mirarchi, C.; Pavan, A. LLM Based Automatic Relation Between Cost Domain Descriptions and IFC Objects. Available online: https://re.public.polimi.it/handle/11311/1280791 (accessed on 23 February 2025).
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large Language Models are Zero-Shot Reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213. [Google Scholar]
- Lazaridou, A.; Gribovskaya, E.; Stokowiec, W.; Grigorev, N. Internet-augmented language models through few-shot prompting for open-domain question answering. arXiv 2022, arXiv:2203.05115. [Google Scholar] [CrossRef]
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Adv. Neural Inf. Process. Syst. 2023, 36, 11809–11822. [Google Scholar]
- Nong, Y.; Aldeen, M.; Cheng, L.; Hu, H.; Chen, F.; Cai, H. Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities. arXiv 2024, arXiv:2402.17230. [Google Scholar] [CrossRef]
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 1877–1901. Available online: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html (accessed on 23 February 2025).
- Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. arXiv 2023, arXiv:2205.10625. [Google Scholar] [CrossRef]
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y. Self-Refine: Iterative Refinement with Self-Feedback. Adv. Neural Inf. Process. Syst. 2023, 36, 46534–46594. [Google Scholar]
- Conklin, H.; Wang, B.; Smith, K.; Titov, I. Meta-Learning to Compositionally Generalize. arXiv 2021, arXiv:2106.04252. [Google Scholar] [CrossRef]
- Creswell, A.; Shanahan, M.; Higgins, I. Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning. arXiv 2022, arXiv:2205.09712. [Google Scholar] [CrossRef]
- Zelikman, E.; Wu, Y.; Mu, J.; Goodman, N. STaR: Bootstrapping Reasoning with Reasoning. Adv. Neural Inf. Process. Syst. 2022, 35, 15476–15488. [Google Scholar]
- Havrilla, A.; Du, Y.; Raparthy, S.C.; Nalmpantis, C.; Dwivedi-Yu, J.; Zhuravinskyi, M.; Hambro, E.; Sukhbaatar, S.; Raileanu, R. Teaching Large Language Models to Reason with Reinforcement Learning. arXiv 2024, arXiv:2403.04642. [Google Scholar] [CrossRef]
- Cheng, M.-Y.; Tsai, H.-C.; Hsieh, W.-S. Web-based conceptual cost estimates for construction projects using Evolutionary Fuzzy Neural Inference Model. Autom. Constr. 2009, 18, 164–172. [Google Scholar] [CrossRef]
- Elmousalami, H.H. Artificial Intelligence and Parametric Construction Cost Estimate Modeling: State-of-the-Art Review. J. Constr. Eng. Manag. 2020, 146, 03119008. [Google Scholar] [CrossRef]
- Walton, J.R.; Stevens, J.D. Improving Conceptual Estimating Methods Using Historical Cost Data. Transp. Res. Rec. 1997, 1575, 127–131. [Google Scholar] [CrossRef]
- Ji, S.-H.; Park, M.; Lee, H.-S. Cost estimation model for building projects using case-based reasoning. Can. J. Civ. Eng. 2011, 38, 570–581. [Google Scholar] [CrossRef]
- Ghimire, P. Framework for Integrating Industry Knowledge into a Large Language Model to Assist Construction Cost Estimation. Ph.D. Thesis, The University of Nebraska—Lincoln, Lincoln, NE, USA, 2025. Available online: https://www.proquest.com/docview/3198872319/abstract/D556793967F749FCPQ/1 (accessed on 12 October 2025).
- Liu, R.; Li, M.; Zhao, S.; Chen, L.; Chang, X.; Yao, L. In-Context Learning for Zero-shot Medical Report Generation. In Proceedings of the 32nd ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2024; pp. 8721–8730. [Google Scholar] [CrossRef]
- Merkus, B. An Assessment of Zero-Shot Open Book Question Answering Using Large Language Models. Master’s Thesis, Utrecht University, Utrecht, The Netherlands, 2023. Available online: https://studenttheses.uu.nl/handle/20.500.12932/44625 (accessed on 2 March 2025).
- Salvador, J.; Bansal, N.; Akter, M.; Sarkar, S.; Das, A.; Karmaker, S.K. Benchmarking LLMs on the Semantic Overlap Summarization Task. arXiv 2024, arXiv:2402.17008. [Google Scholar] [CrossRef]
- Xu, S.; Wu, Z.; Zhao, H.; Shu, P.; Liu, Z.; Liao, W.; Li, S.; Sikora, A.; Liu, T.; Li, X. Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis. arXiv 2024, arXiv:2402.11398. [Google Scholar] [CrossRef]
- Yang, G.; Zhou, Y.; Chen, X.; Zhang, X.; Zhuo, T.Y.; Chen, T. Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models. IEEE Trans. Softw. Eng. 2024, 50, 2437–2457. [Google Scholar] [CrossRef]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics—ACL ’02; Association for Computational Linguistics: Philadelphia, PN, USA, 2001; p. 311. [Google Scholar] [CrossRef]
- Lin, C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81. Available online: https://aclanthology.org/W04-1013/ (accessed on 2 March 2025).
- Banerjee, S.; Lavie, A. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization; Goldstein, J., Lavie, A., Lin, C.-Y., Voss, C., Eds.; Association for Computational Linguistics: Ann Arbor, MI, USA, 2005; pp. 65–72. Available online: https://aclanthology.org/W05-0909/ (accessed on 2 March 2025).
- Niwattanakul, S.; Singthongchai, J.; Naenudorn, E.; Wanapu, S. Using of Jaccard Coefficient for Keywords Similarity; IAENG: Hong Kong, China, 2013. [Google Scholar]
- Sitikhu, P.; Pahi, K.; Thapa, P.; Shakya, S. A Comparison of Semantic Similarity Methods for Maximum Human Interpretability. In 2019 Artificial Intelligence for Transforming Business and Society (AITB); Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 1–4. [Google Scholar] [CrossRef]
- Islam, R.; Moushi, O.M. GPT-4o: The Cutting-Edge Advancement in Multimodal LLM. TechRxiv 2024. [Google Scholar] [CrossRef]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. Available online: http://arxiv.org/abs/2302.13971 (accessed on 26 August 2023).
- Islam, R.; Ahmed, I. Gemini-the most powerful LLM: Myth or Truth. In 2024 5th Information Communication Technologies Conference (ICTC); IEEE: Piscataway, NJ, USA, 2024; pp. 303–308.
- Kurokawa, R.; Ohizumi, Y.; Kanzawa, J.; Kurokawa, M.; Sonoda, Y.; Nakamura, Y.; Kiguchi, T.; Gonoi, W.; Abe, O. Diagnostic performances of Claude 3 Opus and Claude 3.5 Sonnet from patient history and key images in Radiology’s ‘Diagnosis Please’ cases. Jpn. J. Radiol. 2024, 42, 1399–1402.
- Diab, N. Out of the BLEU: An Error Analysis of Statistical and Neural Machine Translation of WikiHow Articles from English into Arabic. CDELT Occas. Pap. Dev. Engl. Educ. 2021, 75, 181–211.
- Lee, S.; Lee, J.; Moon, H.; Park, C.; Seo, J.; Eo, S.; Koo, S.; Lim, H. A Survey on Evaluation Metrics for Machine Translation. Mathematics 2023, 11, 1006.
- Ganesan, K. ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks. arXiv 2018, arXiv:1803.01937.
- Holzinger, A.; Zatloukal, K.; Müller, H. Is human oversight to AI systems still possible? New Biotechnol. 2025, 85, 59–62.
- Gallegos, I.O.; Rossi, R.A.; Barrow, J.; Tanjim, M.M.; Kim, S.; Dernoncourt, F.; Yu, T.; Zhang, R.; Ahmed, N.K. Bias and Fairness in Large Language Models: A Survey. Comput. Linguist. 2024, 50, 1097–1179.
- Jung, J.; Brahman, F.; Choi, Y. Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement. arXiv 2024, arXiv:2407.18370.
- prolego-team/pdd; Python Prolego-Team; GitHub, Inc.: San Francisco, CA, USA. Available online: https://github.com/prolego-team/pdd (accessed on 3 March 2025).
- Virk, Y.; Devanbu, P.; Ahmed, T. Enhancing Trust in LLM-Generated Code Summaries with Calibrated Confidence Scores. arXiv 2024, arXiv:2404.19318.
















| Estimation Method | Description | Typical Accuracy | Common Application |
|---|---|---|---|
| Rough Order of Magnitude (RoM) | Relies on past project data and analogous estimating techniques to provide preliminary estimates when design details are limited [4,7]. | ±25% | Early feasibility and concept development stages |
| Square Footage Estimating | Uses cost-per-square-foot benchmarks derived from similar completed projects to provide a broad cost range for labor, materials, and services [4,7]. | ±20% | Schematic design and early planning |
| Assemblies Estimating | Breaks down costs into specific building systems and assemblies (e.g., plumbing or mechanical installations) using a functional classification [11]. | ±15% | Design development & conceptual estimates |
| Unit Cost Estimating | The most detailed approach, itemizing materials, labor, and equipment at the lowest quantifiable level to produce precise cost reports [12]. | −5% to +10% | Bidding phase |
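
The typical accuracy bands in the table above can be read as a low/high range around a single point estimate. The following is a minimal sketch of that arithmetic, not part of the study's framework: the function name `cost_range`, the dictionary `ACCURACY_BANDS`, and the USD 1,000,000 point estimate are hypothetical, and treating each band as a simple multiplicative range is an assumption made for illustration.

```python
from typing import Dict, Tuple

# Typical accuracy bands copied from the table above, stored as
# (low, high) fractions of the point estimate.
ACCURACY_BANDS: Dict[str, Tuple[float, float]] = {
    "Rough Order of Magnitude (RoM)": (-0.25, 0.25),
    "Square Footage Estimating": (-0.20, 0.20),
    "Assemblies Estimating": (-0.15, 0.15),
    "Unit Cost Estimating": (-0.05, 0.10),
}


def cost_range(point_estimate: float, method: str) -> Tuple[float, float]:
    """Return the (low, high) cost implied by the method's accuracy band.

    Assumption: the band is applied multiplicatively to the point estimate.
    """
    low, high = ACCURACY_BANDS[method]
    return point_estimate * (1.0 + low), point_estimate * (1.0 + high)


if __name__ == "__main__":
    estimate = 1_000_000.0  # hypothetical point estimate in USD
    for method in ACCURACY_BANDS:
        low_cost, high_cost = cost_range(estimate, method)
        print(f"{method}: {low_cost:,.0f} to {high_cost:,.0f}")
```

Under these assumptions, the width of the range narrows as the estimating method becomes more detailed, which mirrors the progression from feasibility studies to the bidding phase shown in the table.
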

| Model | BLEU | ROUGE-L | METEOR |
|---|---|---|---|
| GPT-4o | 0.023 * | 0.185 * | 0.196 * |
| Llama 3.2 | 0.0126 | 0.112 | 0.157 |
| Gemini 2.0 | 0.010 | 0.095 | 0.122 |
| Claude 3.5 Sonnet | 0.0135 | 0.170 | 0.168 |

| Evaluation Category | Zero-Shot with GPT-4o | CoT with GPT-4o |
|---|---|---|
| BLEU | 0.023365 | 0.382353 |
| ROUGE-L | 0.185215 | 0.622845 |
| METEOR | 0.196798 | 0.610922 |
| Content Overlap | 0.109057 | 0.497031 |
| Semantic Similarity | 0.245202 | 0.597011 |
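
The percentage improvements reported for CoT over zero-shot prompting follow from the scores in the table above as relative gains, (CoT − zero-shot) / zero-shot × 100. The sketch below reproduces that arithmetic; the names `SCORES` and `relative_gain_pct` are hypothetical and are not taken from the study's code.

```python
from typing import Dict, Tuple

# (zero-shot, CoT) scores for GPT-4o, copied from the table above.
SCORES: Dict[str, Tuple[float, float]] = {
    "BLEU": (0.023365, 0.382353),
    "ROUGE-L": (0.185215, 0.622845),
    "METEOR": (0.196798, 0.610922),
    "Content Overlap": (0.109057, 0.497031),
    "Semantic Similarity": (0.245202, 0.597011),
}


def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    for metric, (zero_shot, cot) in SCORES.items():
        gain = relative_gain_pct(zero_shot, cot)
        print(f"{metric}: {gain:.1f}% relative gain")
```

The computed values should agree with the reported percentages up to rounding. Because the zero-shot BLEU baseline is very small, its relative gain is much larger than that of the other metrics even though the absolute increases are of similar magnitude.
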

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ghimire, P.; Kim, K.; Stentz, T.; Roy, T. Modular Chain-of-Thought (CoT) for LLM-Based Conceptual Construction Cost Estimation. Buildings 2026, 16, 396. https://doi.org/10.3390/buildings16020396

