Article
Peer-Review Record

Optimizing the Utilization of Generative Artificial Intelligence (AI) in the AEC Industry: ChatGPT Prompt Engineering and Design

CivilEng 2024, 5(4), 971-1010; https://doi.org/10.3390/civileng5040049
by Reihaneh Samsami
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 12 June 2024 / Revised: 23 September 2024 / Accepted: 17 October 2024 / Published: 28 October 2024
(This article belongs to the Collection Recent Advances and Development in Civil Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper succinctly captures the potential of generative AI, specifically ChatGPT, in transforming the Architecture, Engineering, and Construction (AEC) industry by automating complex tasks like construction scheduling and hazard recognition. The guidelines for prompt design and engineering presented in this paper are aimed at optimizing AI responses to improve efficiency and innovation in the AEC sector. The illustrative examples and methodology discussed provide practical insights into the application of AI in enhancing project scheduling and hazard recognition.  The paper provides valuable insights into optimizing Generative AI utilization in the AEC industry through strategic prompt engineering. The detailed methodology and practical applications discussed offer a comprehensive guide for stakeholders to harness AI's potential effectively. The clear articulation of the study's objectives, combined with the thorough analysis and practical examples, makes this paper a significant contribution to the field of AI in AEC. However, there are several areas where the research could be expanded and refined to provide a more comprehensive understanding and practical applicability.  

  1. Limited Scope of Applications:

    • While the paper focuses on construction scheduling and hazard recognition, it would benefit from a broader examination of AI applications across other phases of the AEC project lifecycle, such as design optimization, project cost estimation, and post-construction maintenance. Expanding the scope could provide a more holistic view of AI's potential impact.
  2. Depth of Case Studies:

    • The case studies presented are a good start, but they lack depth. A more detailed analysis of each case, including the specific challenges encountered and how ChatGPT's outputs compared to traditional methods, would strengthen the findings. Additionally, providing more diverse case studies across different types of construction projects could offer a richer dataset for analysis.
  3. Evaluation Metrics:

    • The paper primarily uses "Relative Error" as the accuracy metric. While this is useful, incorporating additional metrics such as precision, recall, and F1 score, especially in hazard recognition tasks, would provide a more nuanced evaluation of ChatGPT's performance. Furthermore, a discussion on the trade-offs between different metrics would be valuable.
  4. Practical Implementation Challenges:

    • The paper should address potential challenges in the practical implementation of ChatGPT in the AEC industry. Issues such as integration with existing BIM software, data privacy concerns, and the need for ongoing training and fine-tuning of AI models in dynamic construction environments are critical for real-world application.
  5. Human-AI Collaboration:

    • The role of human oversight and collaboration in AI-driven processes is underexplored. Future research should investigate how AI can best augment human expertise in the AEC industry, ensuring that AI tools are used to enhance rather than replace human judgment.
  6. Regulatory and Ethical Considerations:

    • The paper would benefit from a discussion on the regulatory and ethical implications of using AI in construction. This includes issues related to liability in case of AI errors, ethical considerations in data usage, and compliance with industry standards and regulations.

Suggestions for Future Research are mentioned below:

  1. Broader Application Spectrum:

    • Future research should explore a wider range of applications for generative AI in the AEC industry. This includes areas like sustainability assessments, energy efficiency optimization, and automated compliance checks with building codes and regulations.
  2. Longitudinal Studies:

    • Conducting longitudinal studies to assess the long-term impact of AI integration on project outcomes would provide valuable insights. This could help in understanding how AI tools evolve and improve with continuous use and feedback in real-world scenarios.
  3. Interdisciplinary Collaboration:

    • Encourage interdisciplinary collaboration between AI researchers, construction professionals, and regulatory bodies to develop comprehensive guidelines and best practices for AI implementation in the AEC industry.
  4. User-Centric Design:

    • Investigate the development of user-friendly interfaces and training programs to ensure that AEC professionals can effectively utilize AI tools. This includes studying the usability of AI-driven software and the effectiveness of training programs in enhancing user competence.
  5. Scalability and Adaptability:

    • Research should focus on the scalability of AI solutions across different project sizes and types. Additionally, exploring the adaptability of AI models to different cultural and regulatory contexts in the global construction industry would be beneficial.

By addressing these areas, future research can provide a more robust and practical framework for integrating generative AI in the AEC industry, ultimately driving innovation, efficiency, and safety in construction projects.

Author Response

Reviewer 1:

The paper succinctly captures the potential of generative AI, specifically ChatGPT, in transforming the Architecture, Engineering, and Construction (AEC) industry by automating complex tasks like construction scheduling and hazard recognition. The guidelines for prompt design and engineering presented in this paper are aimed at optimizing AI responses to improve efficiency and innovation in the AEC sector. The illustrative examples and methodology discussed provide practical insights into the application of AI in enhancing project scheduling and hazard recognition.  The paper provides valuable insights into optimizing Generative AI utilization in the AEC industry through strategic prompt engineering. The detailed methodology and practical applications discussed offer a comprehensive guide for stakeholders to harness AI's potential effectively. The clear articulation of the study's objectives, combined with the thorough analysis and practical examples, makes this paper a significant contribution to the field of AI in AEC. However, there are several areas where the research could be expanded and refined to provide a more comprehensive understanding and practical applicability.

Thank you for your time and valuable comments. Your review offers constructive suggestions that can significantly enhance the quality and scope of our paper. I have addressed each comment in the following:

  1. Limited Scope of Applications:
    • While the paper focuses on construction scheduling and hazard recognition, it would benefit from a broader examination of AI applications across other phases of the AEC project lifecycle, such as design optimization, project cost estimation, and post-construction maintenance. Expanding the scope could provide a more holistic view of AI's potential impact.

I appreciate your suggestion to broaden the scope of AI applications across additional phases of the AEC project lifecycle, such as design optimization, project cost estimation, and post-construction maintenance. I acknowledge that the current paper focuses primarily on construction scheduling and hazard recognition. This was a deliberate choice to provide a focused and detailed exploration of these specific areas where generative AI, particularly ChatGPT, shows immediate promise. However, I recognize the importance of exploring a broader range of applications to provide a more holistic view of AI's potential in the AEC industry.

It is important to note that research in the domain of AI applications in the AEC industry is still in its early stages, and there is currently limited literature available. This paper aims to pioneer the study of AI's applications within this field, and I view it as a foundational step toward a more comprehensive exploration. I consider this work as part of an ongoing research effort, and future studies will indeed expand into other critical phases such as design optimization, project cost estimation, and post-construction maintenance.

I agree that expanding the scope in future research will be essential for providing a more robust understanding of AI's impact across the entire lifecycle of AEC projects. I will make sure to emphasize this in our discussion and future research directions sections of the paper.

  2. Depth of Case Studies:
    • The case studies presented are a good start, but they lack depth. A more detailed analysis of each case, including the specific challenges encountered and how ChatGPT's outputs compared to traditional methods, would strengthen the findings. Additionally, providing more diverse case studies across different types of construction projects could offer a richer dataset for analysis.

I appreciate your recognition of the potential of these case studies and agree that a more detailed analysis would further enhance the findings.

The current case studies were designed to provide initial insights into the practical application of ChatGPT in the AEC industry. Given the pioneering nature of this research, I aimed to establish a baseline understanding of how generative AI can be applied to construction scheduling and hazard recognition. However, I recognize that a deeper exploration of the specific challenges encountered, and a comparative analysis between ChatGPT's outputs and traditional methods, would indeed provide a more robust evaluation.

In future iterations of this research, I will incorporate case studies from different types of construction projects, such as residential, commercial, and infrastructure projects, to provide a richer and more comprehensive dataset for analysis. I will include a more thorough examination of each case study, highlighting the specific challenges faced during implementation, as well as a detailed comparison of ChatGPT's outputs with those generated by traditional methods.

  3. Evaluation Metrics:
    • The paper primarily uses "Relative Error" as the accuracy metric. While this is useful, incorporating additional metrics such as precision, recall, and F1 score, especially in hazard recognition tasks, would provide a more nuanced evaluation of ChatGPT's performance. Furthermore, a discussion on the trade-offs between different metrics would be valuable.

I appreciate your suggestion to incorporate additional metrics such as precision, recall, and F1 score, particularly in the context of hazard recognition tasks. "Relative Error" was chosen as an initial metric to provide a straightforward measure of accuracy, especially in construction scheduling tasks. However, I agree that a more comprehensive evaluation framework incorporating multiple metrics would offer a deeper understanding of ChatGPT's performance, particularly in tasks like hazard recognition, where false positives and false negatives can have significant implications.
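
As a brief illustration of what such a multi-metric evaluation could look like, the sketch below computes relative error for a schedule-duration prediction and set-based precision, recall, and F1 for hazard recognition outputs. The function names and input values are hypothetical and are not taken from the paper; they only show how the metrics relate to one another.

# Minimal sketch of a multi-metric evaluation; all names and values are illustrative.
def relative_error(predicted_duration, actual_duration):
    """Relative error used for scheduling tasks: |predicted - actual| / actual."""
    return abs(predicted_duration - actual_duration) / actual_duration

def precision_recall_f1(predicted_hazards, reference_hazards):
    """Set-based precision, recall, and F1 for hazard recognition outputs."""
    predicted, reference = set(predicted_hazards), set(reference_hazards)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative values only (not from the study):
print(relative_error(predicted_duration=310, actual_duration=300))   # ~0.033
print(precision_recall_f1(["falls", "electrocution", "noise"],
                          ["falls", "electrocution", "struck-by"]))  # (0.67, 0.67, 0.67)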

  4. Practical Implementation Challenges:
    • The paper should address potential challenges in the practical implementation of ChatGPT in the AEC industry. Issues such as integration with existing BIM software, data privacy concerns, and the need for ongoing training and fine-tuning of AI models in dynamic construction environments are critical for real-world application.

I appreciate this feedback on the practical implementation of Gen AI. A paragraph is added at Line 1165 to address this comment:

“While ChatGPT holds considerable promise for transforming the AEC industry, several practical challenges must be tackled to ensure its successful implementation in real-world scenarios. First, integrating ChatGPT seamlessly with existing BIM software is crucial, as BIM systems play a central role in managing construction projects across various stages, from planning to execution. Ensuring AI compatibility with these systems is essential for its broader application.

Another key concern is data privacy. As AI becomes more embedded in the AEC industry, addressing the regulatory and ethical implications surrounding its use is crucial. One critical issue is liability—if an AI tool like ChatGPT makes an error, it’s important to determine who is accountable and establish protocols for managing such situations. This highlights the need for clear guidelines and legal frameworks to mitigate AI-related risks in construction projects.

Additionally, ethical considerations around data usage are paramount. Since construction projects often involve sensitive information, it’s crucial that AI systems safeguard this data responsibly, maintaining privacy and adhering to legal standards. Furthermore, as the industry evolves, AI tools must be continually updated to align with the latest standards and regulations. A thorough examination of these regulatory and ethical aspects will better equip the AEC industry to integrate AI responsibly and effectively.

It is also essential to recognize that AI should be used to enhance human understanding rather than foster dependency or oversimplification. While tools like ChatGPT can significantly improve efficiency in the AEC industry, human oversight and collaboration remain indispensable. AI should complement, not replace, human expertise. Given the complexity of the construction industry, which often demands nuanced decision-making, integrating AI-driven insights with human experience is critical. Future research should explore how AI can work alongside AEC professionals, supporting and amplifying their judgment and skills. This collaborative approach ensures that AI tools are used thoughtfully, adding value to the decision-making process while preserving the crucial human element in construction projects.

Furthermore, there is a risk of inaccuracies in ChatGPT responses that may not always be readily apparent, highlighting the need for ensuring the reliability of this tool. One of the critical challenges in applying AI models like ChatGPT within the AEC industry is the potential for biases inherent in the model's training data, which can result in skewed or incomplete hazard identification. Given that the training data for LLMs may not comprehensively encompass the full spectrum of industry-specific hazard standards, there is a significant risk that the outputs generated by these models may not fully align with the latest safety regulations or may include inaccuracies, commonly referred to as 'hallucinations'.

Last, but not least, the dynamic nature of construction safety regulations poses an additional challenge, as it is improbable that any single LLM will remain fully aligned with these evolving standards without regular updates and fine-tuning. Consequently, while this study highlights the potential of ChatGPT in enhancing hazard recognition, it is imperative to caution that AI-generated outputs should be consistently cross-referenced with the most recent industry guidelines and expert insights to ensure their accuracy and relevance.”

  5. Human-AI Collaboration:
    • The role of human oversight and collaboration in AI-driven processes is underexplored. Future research should investigate how AI can best augment human expertise in the AEC industry, ensuring that AI tools are used to enhance rather than replace human judgment.

I appreciate this feedback on the role of human supervision. It was briefly mentioned in the conclusion section of the text as “However, leveraging ChatGPT presents notable challenges. It is essential to use the technology to augment human understanding rather than fostering dependency and superficial thinking.”

A paragraph is added at Line 1183 to address this comment:

“It is also essential to recognize that AI should be used to enhance human understanding rather than foster dependency or oversimplification. While tools like ChatGPT can significantly improve efficiency in the AEC industry, human oversight and collaboration remain indispensable. AI should complement, not replace, human expertise. Given the complexity of the construction industry, which often demands nuanced decision-making, integrating AI-driven insights with human experience is critical. Future research should explore how AI can work alongside AEC professionals, supporting and amplifying their judgment and skills. This collaborative approach ensures that AI tools are used thoughtfully, adding value to the decision-making process while preserving the crucial human element in construction projects.”

  6. Regulatory and Ethical Considerations:
    • The paper would benefit from a discussion on the regulatory and ethical implications of using AI in construction. This includes issues related to liability in case of AI errors, ethical considerations in data usage, and compliance with industry standards and regulations.

I appreciate this feedback on the regulatory and ethical concerns of Gen AI. A paragraph is added at Line 1176 to address this comment:

“Additionally, ethical considerations around data usage are paramount. Since construction projects often involve sensitive information, it’s crucial that AI systems safeguard this data responsibly, maintaining privacy and adhering to legal standards. Furthermore, as the industry evolves, AI tools must be continually updated to align with the latest standards and regulations. A thorough examination of these regulatory and ethical aspects will better equip the AEC industry to integrate AI responsibly and effectively.”

Suggestions for Future Research are mentioned below:

  1. Broader Application Spectrum:
  • Future research should explore a wider range of applications for generative AI in the AEC industry. This includes areas like sustainability assessments, energy efficiency optimization, and automated compliance checks with building codes and regulations.
  2. Longitudinal Studies:
  • Conducting longitudinal studies to assess the long-term impact of AI integration on project outcomes would provide valuable insights. This could help in understanding how AI tools evolve and improve with continuous use and feedback in real-world scenarios.
  3. Interdisciplinary Collaboration:
  • Encourage interdisciplinary collaboration between AI researchers, construction professionals, and regulatory bodies to develop comprehensive guidelines and best practices for AI implementation in the AEC industry.
  4. User-Centric Design:
  • Investigate the development of user-friendly interfaces and training programs to ensure that AEC professionals can effectively utilize AI tools. This includes studying the usability of AI-driven software and the effectiveness of training programs in enhancing user competence.
  5. Scalability and Adaptability:
  • Research should focus on the scalability of AI solutions across different project sizes and types. Additionally, exploring the adaptability of AI models to different cultural and regulatory contexts in the global construction industry would be beneficial.

By addressing these areas, future research can provide a more robust and practical framework for integrating generative AI in the AEC industry, ultimately driving innovation, efficiency, and safety in construction projects.

Thank you for your thoughtful suggestions regarding future research directions. I agree that exploring a broader spectrum of applications for generative AI in the AEC industry, such as sustainability assessments, energy efficiency optimization, and automated compliance checks, would significantly enhance our understanding of AI's potential impact across various facets of construction projects. I also recognize the value of conducting longitudinal studies to assess the long-term effects of AI integration, as this would provide deeper insights into how these tools evolve and improve with continuous feedback from real-world applications.

The idea of fostering interdisciplinary collaboration between AI researchers, construction professionals, and regulatory bodies is particularly compelling. Developing comprehensive guidelines and best practices for AI implementation will be essential for ensuring that these technologies are integrated effectively and responsibly. Additionally, focusing on user-centric design is crucial. By investigating how to create user-friendly interfaces and effective training programs, we can help AEC professionals more readily adopt and utilize AI tools in their work.

Finally, I agree that future research should delve into the scalability and adaptability of AI solutions, especially across different project sizes, types, and cultural contexts. Addressing these areas will provide a more robust framework for integrating AI into the AEC industry, ultimately driving innovation, efficiency, and safety in construction projects. I am committed to considering these important aspects in our future research endeavors and appreciate your insightful recommendations.

Reviewer 2 Report

Comments and Suggestions for Authors

The research provides practical insights for AEC professionals on how to harness the power of generative AI tools like LLMs. By following the proposed guidelines, stakeholders can improve construction scheduling processes, enhance hazard recognition capabilities, and ultimately drive innovation and growth in the industry.  The paper attempts to demonstrate how prompt engineering can enhance the practical application of LLMs in the AEC industry, leading to more accurate and reliable outputs in construction scheduling and hazard recognition.

(1) There are a number of validation issues related to the methodology described in the paper:

* The paper has not incorporated expert verification of ChatGPT's outputs. Given that the application of LLMs in the study is very domain specific and the domain is one where safety-related information is critical, relying solely on AI-generated content without expert review could lead to potentially harmful advice or overlooked hazards. Even in simpler domain-specific tasks, like generating social media messages about life events, researchers have found that LLMs can make errors at non-trivial rates and in a variety of ways. Several critical studies related to this issue are described below:

Xue, Zhiwen, Chong Xu, and Xiwei Xu. "Application of ChatGPT in natural disaster prevention and reduction." Natural Hazards Research 3.3 (2023): 556-562.

Oviedo-Trespalacios, Oscar, et al. "The risks of using ChatGPT to obtain common safety-related information and advice." Safety science 167 (2023): 106244.

Lynch, Christopher J., et al. "A structured narrative prompt for prompting narratives from large language models: Sentiment assessment of chatgpt-generated narratives and real tweets." Future Internet 15.12 (2023): 375.

* The authors have not adequately addressed the potential biases present in ChatGPT's training data, which could lead to skewed or incomplete hazard identification in certain construction scenarios. Specifically, is it clear that the LLMs used by the authors are applicable to industry-specific hazard standards? Or are there cases where, because of a lack of data about these standards, the LLM is creating hallucinations? In addition, it is not clear that the study is taking into account the possibility of evolving safety standards. Given the dynamic nature of construction safety regulations and best practices, it is unlikely that any one LLM will be able to remain up to date on them. Addressing how specifically the conclusions need to be taken into account given these issues would improve the paper.

(2) There is a lack of testing for statistical significance with respect to the results in the paper. 

For construction scheduling, the paper demonstrates that prompt engineering significantly reduces the relative error in schedule predictions compared to control prompts. For example: (a) for multipurpose buildings the relative error for ChatGPT with prompt engineering was 0.03, compared to 1.00 for the control group and (b) for business park development the relative error for ChatGPT with prompt engineering was 0.06, compared to 1.05 for the control group. While both these applications indicate superior performance it is unclear if that performance is statistically significantly better than the control group.

(3) In the replication crisis era the Python scripts, the prompts, the data, and scripts used to create the tabular data and graphics should be provided to the reader and reviewers.

(4) There are a number of presentation issues with the paper:

* The paper contains numerous unbound references to bibliographic entries, tables, and/or figures (Error! Reference source not found.).

* Numerous written sections (e.g., 2.3 is one example) contain bulleted point lists; this is not technical writing. Furthermore, this is a hallmark of output written by an LLM. At least this section needs to be cleaned up; it should also be checked, as it appears to have possibly been written by ChatGPT instead of the authors.

* There are a nontrivial number of readers who suffer from red/green color blindness. Using a color-blindness safe color palette in the graphics in the web application, and figures 1, 3, would improve the paper. A series of color blind safe palettes are available here (https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40) and larger text fonts would improve the readability of figures.

* The Figures in the paper are not numbered correctly. Figure 1 appears twice as two different figures. After Figure 15 the numbering restarts.

* The numeric data in the paper is left justified; it should be right justified.

Comments on the Quality of English Language

Numerous written sections (e.g., 2.3 is one example) contain bulleted point lists; this is not technical writing. Furthermore, this is a hallmark of output written by an LLM. At least this section needs to be cleaned up; it should also be checked, as it appears to have possibly been written by ChatGPT instead of the authors.

Author Response

Reviewer 2:

The research provides practical insights for AEC professionals on how to harness the power of generative AI tools like LLMs. By following the proposed guidelines, stakeholders can improve construction scheduling processes, enhance hazard recognition capabilities, and ultimately drive innovation and growth in the industry.  The paper attempts to demonstrate how prompt engineering can enhance the practical application of LLMs in the AEC industry, leading to more accurate and reliable outputs in construction scheduling and hazard recognition.

I appreciate your thoughtful and constructive comments. Each comment is addressed in the following:

(1) There are a number of validation issues related to the methodology described in the paper:

* The paper has not incorporated expert verification of ChatGPT's outputs. Given that the application of LLMs in the study is very domain specific and the domain is one where safety-related information is critical, relying solely on AI-generated content without expert review could lead to potentially harmful advice or overlooked hazards. Even in simpler domain-specific tasks, like generating social media messages about life events, researchers have found that LLMs can make errors at non-trivial rates and in a variety of ways. Several critical studies related to this issue are described below:

Xue, Zhiwen, Chong Xu, and Xiwei Xu. "Application of ChatGPT in natural disaster prevention and reduction." Natural Hazards Research 3.3 (2023): 556-562.

Oviedo-Trespalacios, Oscar, et al. "The risks of using ChatGPT to obtain common safety-related information and advice." Safety science 167 (2023): 106244.

Lynch, Christopher J., et al. "A structured narrative prompt for prompting narratives from large language models: Sentiment assessment of chatgpt-generated narratives and real tweets." Future Internet 15.12 (2023): 375.

Thank you for your insightful feedback regarding the validation issues within the methodology. I fully recognize the critical importance of expert verification, especially in a domain where safety-related information is paramount. While the current study focuses on demonstrating the potential of ChatGPT in the AEC industry, I acknowledge that relying solely on AI-generated content without expert review could introduce risks, such as potentially harmful advice or overlooked hazards.

As such, I addressed these concerns as important limitations in the current study. I will emphasize in the paper that the outputs generated by ChatGPT should be viewed as preliminary recommendations that require validation by industry experts before being applied in practice. This approach highlights the necessity of human oversight in ensuring the safety and accuracy of AI-generated content, particularly in safety-critical environments.

Moreover, I agree that this issue warrants further investigation in future research. Future work will incorporate a structured process for expert review and verification of AI outputs, assessing how such oversight can mitigate potential risks. I also appreciate the references you provided, which I have included in the discussion to underline the potential risks of using ChatGPT for domain-specific tasks and to reinforce the importance of continuous expert involvement.

These are all addressed at Line 1183 as follows:

“It is also essential to recognize that AI should be used to enhance human understanding rather than foster dependency or oversimplification. While tools like ChatGPT can significantly improve efficiency in the AEC industry, human oversight and collaboration remain indispensable. AI should complement, not replace, human expertise. Given the complexity of the construction industry, which often demands complex decision-making, integrating AI-driven insights with human experience is critical. Future research should explore how AI can work alongside AEC professionals, supporting and amplifying their judgment and skills. This collaborative approach ensures that AI tools are used thoughtfully, adding value to the decision-making process while preserving the crucial human element in construction projects.

Furthermore, there is a risk of inaccuracies in ChatGPT responses that may not always be readily apparent, highlighting the need for ensuring the reliability of this tool. One of the critical challenges in applying AI models like ChatGPT within the AEC industry is the potential for biases inherent in the model's training data, which can result in skewed or incomplete hazard identification. Given that the training data for LLMs may not comprehensively encompass the full spectrum of industry-specific hazard standards, there is a significant risk that the outputs generated by these models may not fully align with the latest safety regulations or may include inaccuracies, commonly referred to as 'hallucinations'.

Last, but not least, the dynamic nature of construction safety regulations poses an additional challenge, as it is improbable that any single LLM will remain fully aligned with these evolving standards without regular updates and fine-tuning. Consequently, while this study highlights the potential of ChatGPT in enhancing hazard recognition, it is imperative to caution that AI-generated outputs should be consistently cross-referenced with the most recent industry guidelines and expert insights to ensure their accuracy and relevance.”

* The authors have not adequately addressed the potential biases present in ChatGPT's training data, which could lead to skewed or incomplete hazard identification in certain construction scenarios. Specifically, is it clear that the LLMs used by the authors are applicable to industry-specific hazard standards? Or are there cases where, because of a lack of data about these standards, the LLM is creating hallucinations? In addition, it is not clear that the study is taking into account the possibility of evolving safety standards. Given the dynamic nature of construction safety regulations and best practices, it is unlikely that any one LLM will be able to remain up to date on them. Addressing how specifically the conclusions need to be taken into account given these issues would improve the paper.

Thank you for highlighting the potential biases in ChatGPT's training data and the challenges related to the applicability of LLMs to industry-specific hazard standards. I recognize that biases in AI models, including ChatGPT, could indeed lead to skewed or incomplete hazard identification, especially in a domain as critical as construction safety.

In the current study, while I aimed to explore the potential of ChatGPT in enhancing hazard recognition, I acknowledge that the LLMs used may not always align perfectly with industry-specific standards due to the limitations in their training data. This can result in the generation of outputs that might not fully comply with current safety regulations or that could contain inaccuracies—often referred to as "hallucinations."

Furthermore, I understand that construction safety regulations and best practices are continuously evolving. It is unlikely that any single LLM could stay fully up-to-date with these changes without ongoing updates and fine-tuning. I will address this in our paper by discussing the limitations of the current model, specifically noting that the outputs should be used with caution and should always be cross-referenced with the most recent industry standards and expert guidance.

In future research, I plan to explore ways to mitigate these issues, including the integration of up-to-date domain-specific datasets and the incorporation of mechanisms for ongoing model updates. I will also consider developing methodologies that allow for the dynamic adaptation of LLMs to reflect the latest regulatory changes in the AEC industry.

I have added the following paragraph to Line 1193 to address this comment:

“While the practical challenges outlined above must be addressed for successful AI integration, there are also inherent limitations within the current AI models that need to be acknowledged. For instance, there is a risk of inaccuracies in ChatGPT responses that may not always be readily apparent, highlighting the need for ensuring the reliability of this tool. One of the critical challenges in applying AI models like ChatGPT within the AEC industry is the potential for biases inherent in the model's training data, which can result in skewed or incomplete hazard identification. Given that the training data for LLMs may not comprehensively encompass the full spectrum of industry-specific hazard standards, there is a significant risk that the outputs generated by these models may not fully align with the latest safety regulations or may include inaccuracies, commonly referred to as 'hallucinations'. “

(2) There is a lack of testing for statistical significance with respect to the results in the paper. 

For construction scheduling, the paper demonstrates that prompt engineering significantly reduces the relative error in schedule predictions compared to control prompts. For example: (a) for multipurpose buildings the relative error for ChatGPT with prompt engineering was 0.03, compared to 1.00 for the control group and (b) for business park development the relative error for ChatGPT with prompt engineering was 0.06, compared to 1.05 for the control group. While both these applications indicate superior performance it is unclear if that performance is statistically significantly better than the control group.

Thank you for your observation regarding the need for statistical significance testing in the results presented. We appreciate your recognition that the application of prompt engineering significantly reduced the relative error in schedule predictions for construction projects. However, I understand the importance of establishing whether these improvements are statistically significant.

In this study, our primary focus was on demonstrating the practical impact of prompt engineering on the accuracy of AI-generated schedule predictions. While the results indicate a notable reduction in relative error, I acknowledge that a statistical analysis would provide a more rigorous validation of these findings. Specifically, applying statistical tests could help determine whether the observed improvements are not only substantial but also statistically significant, thereby strengthening the conclusions drawn from the data.

In future research, I plan to incorporate statistical significance testing to evaluate the effectiveness of prompt engineering more robustly. By doing so, I can provide a clearer understanding of the reliability of the improvements observed and better quantify the advantages of applying prompt engineering techniques in the AEC industry.
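
As an indication of how such a test could be set up, the sketch below applies a paired Wilcoxon signed-rank test to per-project relative errors from the control and engineered-prompt conditions. The error values are placeholders rather than the study's data, and the small sample size is only for illustration.

# Sketch of a paired significance test on relative errors; the values are placeholders.
from scipy.stats import wilcoxon

control_errors = [1.00, 1.05, 0.90, 1.10, 0.95]      # basic (control) prompts, one value per project
engineered_errors = [0.03, 0.06, 0.05, 0.08, 0.04]   # optimized prompts, paired by project

statistic, p_value = wilcoxon(control_errors, engineered_errors)
print(f"Wilcoxon statistic = {statistic}, p-value = {p_value:.4f}")
# A small p-value would suggest the reduction in relative error is unlikely to be
# due to chance; with only a handful of paired cases the test has little power,
# so additional case studies would strengthen any conclusion.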

(3) In the replication crisis era the Python scripts, the prompts, the data, and scripts used to create the tabular data and graphics should be provided to the reader and reviewers.

Thank you for highlighting the importance of transparency and reproducibility, especially in the context of the replication crisis. I fully agree that providing access to the Python scripts, prompts, data, and other materials used in the study is crucial for enabling other researchers to replicate and validate our findings.

To address this, I plan to make these resources available to the readers and reviewers. I will include the Python scripts, the specific prompts used, and the datasets in a supplementary materials section or a publicly accessible repository. This will allow others to fully understand the methodologies applied, replicate the results, and explore further improvements.

(4) There are a number of presentation issues with the paper:

* The paper contains numerous unbound references to bibliographic entries, tables, and/or figures (Error! Reference source not found.).

Thank you for mentioning this. The captions are updated throughout the whole document.

* Numerous written sections (e.g., 2.3 is one example) contain bulleted point lists; this is not technical writing. Furthermore, this is a hallmark of output written by an LLM. At least this section needs to be cleaned up; it should also be checked, as it appears to have possibly been written by ChatGPT instead of the authors.

Thank you for your feedback regarding the use of bulleted lists in Section 2.3. The author acknowledges that while technical writing typically favors narrative format, the use of bulleted points in this section is intentional to improve clarity and readability for the reader, particularly when outlining key concepts or procedures. These lists were manually written by the author and properly cited. For instance, Section 2.3 is fully cited to references [5], [14], [15], [16], [18], [19], and [20].

* There are a nontrivial number of readers who suffer from red/green color blindness. Using a color-blindness safe color palette in the graphics in the web application, and figures 1, 3, would improve the paper. A series of color blind safe palettes are available here (https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40) and larger text fonts would improve the readability of figures.

Thank you for the recommended color palette. The colors are adjusted as much as possible, to avoid problematic pairs of colors.
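
For reference, the sketch below shows one way the suggested palette could be applied in matplotlib; the four hex codes are taken from the palette linked by the reviewer, while the chart contents and file name are purely illustrative.

# Applying the reviewer's suggested color-blind-safe palette in matplotlib.
# The bar values, labels, and output file name are illustrative only.
import matplotlib.pyplot as plt

palette = ["#D81B60", "#1E88E5", "#FFC107", "#004D40"]
labels = ["Case A", "Case B", "Case C", "Case D"]
relative_errors = [0.03, 0.06, 1.00, 1.05]            # placeholder values

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(labels, relative_errors, color=palette)
ax.set_ylabel("Relative error", fontsize=12)           # larger fonts improve readability
ax.tick_params(labelsize=12)
fig.tight_layout()
fig.savefig("relative_error_colorblind_safe.png", dpi=300)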

* The Figures in the paper are not numbered correctly. Figure 1 appears twice as two different figures. After Figure 15 the numbering restarts.

Thank you for mentioning this. The figure captions are updated throughout the whole document.

* The numeric data in the paper is left justified; it should be right justified.

Thank you for mentioning the presentation issues. The text is reviewed, and these issues are fully addressed.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript presents guidelines for prompt design and engineering to elicit desired responses from ChatGPT in AEC applications. Examples of construction scheduling and hazard recognition are provided to demonstrate the effectiveness of the proposed methodology of prompt engineering. Project scheduling and hazard recognition capabilities in the AEC industry are significantly improved when using the proposed methodology. The manuscript is well written and organized. However, in order to increase its quality, the following comments need to be addressed:
-At line 70, please remove "Error! Reference source not found". The same for lines 98, 115 etc.
-The author is kindly requested to check reference [26]. It seems that the link provided leads to a webpage stating "404 The page you were looking for doesn’t exist. You may have mistyped the address or the page may have moved."
-In the study both control (basic prompts without optimization) and experimental (optimized prompts based on case studies) groups are included to isolate the effect of prompt engineering on ChatGPT's performance. Do the prompts presented in Table 1 fall into the second category? If yes, which are the prompts of the first category that were used to get the results presented in Table 2 (line 732)? Same question for the hazard recognition prompts shown in Table 3.
-The author is kindly requested to provide in the manuscript a discussion related to the applicability of the findings of the study beyond the specific cases studied (construction scheduling and hazard recognition). How might the results generalize to other domains or applications of AEC? How would the optimized prompts shown in Tables 1 & 3 be modified for cases other than construction scheduling and hazard recognition?
-Is there a quantitative way to prove that the prompts presented in Tables 1 & 3 are optimal compared to others? The author is requested to provide a related discussion in the manuscript.

Author Response

Reviewer 3:

The manuscript presents guidelines for prompt design and engineering to elicit desired responses from ChatGPT in AEC applications. Examples of construction scheduling and hazard recognition are provided to demonstrate the effectiveness of the proposed methodology of prompt engineering. Project scheduling and hazard recognition capabilities in the AEC industry are significantly improved when using the proposed methodology. The manuscript is well written and organized. However, in order to increase its quality, the following comments need to be addressed:

I appreciate your time and thoughtful feedback. All comments are addressed in the following.

  1. At line 70, please remove "Error! Reference source not found". The same for lines 98, 115 etc.

The error message is removed, and the link is updated to cross reference Figure 1 in the text.

  2. The author is kindly requested to check reference [26]. It seems that the link provided leads to a webpage stating "404 The page you were looking for doesn’t exist. You may have mistyped the address or the page may have moved."

Thank you for mentioning this. OpenAI has moved the content to a new address. This address is updated at Line 1293 as follows:

  OpenAI. Prompt Engineering. 2023. Retrieved from “https://platform.openai.com/docs/guides/prompt-engineering”.

  3. In the study both control (basic prompts without optimization) and experimental (optimized prompts based on case studies) groups are included to isolate the effect of prompt engineering on ChatGPT's performance. Do the prompts presented in Table 1 fall into the second category? If yes, which are the prompts of the first category that were used to get the results presented in Table 2 (line 732)? Same question for the hazard recognition prompts shown in Table 3.

Thank you for your question. Yes, the prompts presented in Table 1 fall into the second category, representing optimized prompts used in the experimental group. The control group utilized basic, non-optimized prompts that were designed to be general and lacked the specific instructions or refinements found in the experimental group. Unfortunately, due to space limitations, the control prompts were not included in the manuscript. However, the author is happy to provide examples of the control prompts upon request. The same applies to the hazard recognition prompts in Table 3, where the optimized versions were shown, and the basic control prompts can be shared if needed.
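
To make the distinction between the two groups concrete, the sketch below shows how a basic control prompt and an engineered prompt might be submitted through the OpenAI Python client. The prompt wording, project details, and model name are illustrative assumptions and are not the exact prompts from Tables 1 or 3.

# Illustrative control vs. engineered prompt; wording, project details, and model
# name are assumptions, not the exact prompts used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

control_prompt = "Create a construction schedule for a multipurpose building."

engineered_prompt = (
    "You are an experienced construction scheduler. Create a construction "
    "schedule for a five-story multipurpose building with a total floor area "
    "of 10,000 square meters. List the major phases, their durations in weeks, "
    "and their dependencies, and report the total project duration as a table."
)

for label, prompt in [("control", control_prompt), ("engineered", engineered_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)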

  4. The author is kindly requested to provide in the manuscript a discussion related to the applicability of the findings of the study beyond the specific cases studied (construction scheduling and hazard recognition). How might the results generalize to other domains or applications of AEC? How would the optimized prompts shown in Tables 1 & 3 be modified for cases other than construction scheduling and hazard recognition?

Thank you for your thoughtful request regarding the discussion on the applicability of the study's findings beyond the specific cases of construction scheduling and hazard recognition. I appreciate the opportunity to explore how the results might generalize to other domains or applications within the AEC industry.

In the revised manuscript, I have included a discussion on the potential for generalizing the findings to other areas of the AEC sector. For example, the principles of prompt engineering demonstrated in this study could be adapted for applications such as project cost estimation, resource allocation, and quality control. The optimized prompts shown in Tables 1 and 3 serve as templates that can be modified based on the specific needs of these different applications. For instance, prompts used for project cost estimation might require additional constraints related to budget limits or cost categories, while those for quality control could include specific standards or regulatory requirements.

By discussing these possibilities, I aim to provide readers with a broader understanding of how the methodologies developed in this study could be applied across a range of AEC tasks, thereby enhancing the versatility and impact of generative AI tools in the industry.

This feedback is addressed in the manuscript as follows:

“While this study primarily addresses the specific cases of construction scheduling and hazard recognition, the principles of prompt engineering possess broader applicability across various domains within the AEC industry. The optimized prompts developed for these tasks can function as adaptable templates, suitable for a wide range of applications such as project cost estimation, resource allocation, and quality control. For instance, in project cost estimation, prompts could be customized to incorporate constraints related to budget categories or cost limits, while in quality control, they might be tailored to include specific standards or regulatory requirements.”

  5. Is there a quantitative way to prove that the prompts presented in Tables 1 & 3 are optimal compared to others? The author is requested to provide a related discussion in the manuscript.

Thank you for your insightful question regarding the optimality of the prompts presented in Tables 1 and 3. I recognize the importance of establishing whether the prompts used in this study are indeed the most effective compared to potential alternatives.

In the current study, the prompts were designed based on established prompt engineering techniques and tailored specifically for the tasks of construction scheduling and hazard recognition. While these prompts have shown significant improvements in the accuracy of AI-generated outputs, we acknowledge that proving their optimality quantitatively would require a more rigorous comparative analysis.

In future research, I plan to explore potential methods for quantitatively evaluating the optimality of prompts. This could involve systematic testing of various prompt formulations and comparing their performance using metrics such as accuracy, precision, recall, or other task-specific criteria. Additionally, methods like A/B testing or optimization algorithms could be employed to identify the most effective prompt configurations. While the current study does not include this level of analysis, I recognize it as an important area for future research to further validate and refine the prompt engineering process.
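
One simple way such a comparison could be organized is sketched below: each candidate prompt formulation is run several times, its outputs are scored (for example, by relative error against a known reference schedule), and the variants are ranked by mean error. The per-run error values shown are placeholders, not measured results.

# Hypothetical comparison of candidate prompt formulations. In practice each list
# of per-run relative errors would come from scoring the model's outputs against
# a known reference; the numbers below are placeholders.
import statistics

errors_by_prompt = {
    "baseline": [1.00, 0.95, 1.10, 1.05, 0.98],
    "role + context": [0.20, 0.25, 0.18, 0.22, 0.19],
    "role + context + output format": [0.03, 0.06, 0.04, 0.05, 0.04],
}

summary = {name: (statistics.mean(errs), statistics.stdev(errs))
           for name, errs in errors_by_prompt.items()}

best = min(summary, key=lambda name: summary[name][0])
for name, (mean_err, sd_err) in summary.items():
    print(f"{name}: mean relative error = {mean_err:.3f} (sd = {sd_err:.3f})")
print("Lowest-error prompt variant:", best)
# A paired test (e.g., Wilcoxon) between the best variant and the runner-up would
# then indicate whether the observed difference is statistically significant.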

 

Reviewer 4 Report

Comments and Suggestions for Authors

The topic of the work is very interesting, and the depth of analysis is very significant. The exploration and contextualization of the topic is a little weak and needs to be improved. 

Improvement suggestions:

1. The potential of generative artificial intelligence should be grounded in the literature.

2. Research gap must be better explored and based on recent studies.

3. Correct and check: (Error! Reference source not found.).

4. Clarify if Figure 1 is original.

5. The specific cases of employing generative AI should provide better insights.

6. Challenges and limitations of using generative AI are two different concepts. It would be important to distinguish them.

7. Section 2.5 is more important to present in the Introduction section.

8. It is not clear if the strategies proposed in section 3.1.4 can be used together.

9. It is important to better describe the similarities and differences between the characteristics of the case studies. You can use a table for that.

10. Number of references should be increased.

Comments on the Quality of English Language

see my comments

Author Response

Reviewer 4:

The topic of the work is very interesting, and the depth of analysis is very significant. The exploration and contextualization of the topic is a little weak and needs to be improved.

We appreciate your time and thoughtful feedback. All comments are addressed in the following.

Improvement suggestions:

  1. The potential of generative artificial intelligence should be grounded in the literature.

Thank you for your feedback. The author believes that the potential of generative AI is already well-grounded in the literature, as several key references have been cited throughout the paper to support the claims. However, the author will review the section again to ensure that the existing references sufficiently cover the topic and are presented clearly.

  2. Research gap must be better explored and based on recent studies.

Thank you for the feedback. This section is expanded at Line 296 as follows:

“While it is commonly recognized that prompt engineering with ChatGPT has exhibited exceptional performance in contrast to other leading methods on benchmark datasets [23], there exists a notable absence of empirical research and validation studies concerning construction-related tasks such as scheduling and hazard recognition utilizing ChatGPT. Despite the broad spectrum of potential applications in the AEC domain, limited studies have emphasized the importance of assessing the applicability and validity of ChatGPT and prompt engineering within this context.

 Furthermore, the few existing studies tend to focus on isolated tasks without exploring the integration of ChatGPT into larger, more complex workflows commonly found in the AEC industry. As construction projects often involve a dynamic interplay of multiple factors, such as scheduling, resource management, and safety regulations, it becomes crucial to validate AI outputs in the context of such multifaceted environments. The lack of research on ChatGPT’s performance in addressing these interconnected tasks introduces uncertainty regarding its practical effectiveness.

Additionally, current studies rarely address the need for continuous fine-tuning and adaptation of generative AI models like ChatGPT to accommodate evolving construction standards and project-specific requirements. This gap highlights the need for future work that not only evaluates AI's immediate performance but also examines how these systems can be adapted over time to remain relevant in a constantly changing industry.

Given the critical role of safety and compliance in construction, a significant gap also exists in evaluating ChatGPT's ability to reliably recognize hazards and offer safety recommendations that align with the latest industry standards. As regulations and safety protocols continue to evolve, the ability of generative AI to stay up to date without human oversight remains a key concern that requires further investigation.”

  3. Correct and check: (Error! Reference source not found.).

The error message is removed, and the link is updated to properly cross reference figures/tables.

  4. Clarify if Figure 1 is original.

Yes, it is created by the author, using the information provided in reference [3].

  5. The specific cases of employing generative AI should provide better insights.

Thank you for your feedback. The author believes the current case studies offer valuable insights into the application of generative AI. However, the author will review the content to ensure that the key findings and implications are clearly articulated and consider enhancing the discussion if necessary.

  6. Challenges and limitations of using generative AI are two different concepts. It would be important to distinguish them.

Thank you for your feedback highlighting the distinction between challenges and limitations in the context of generative AI. The manuscript is revised to clearly differentiate between these two concepts. The section now separates practical challenges, such as integration with existing BIM systems and data privacy concerns, from inherent limitations, like biases in training data and the difficulty of keeping AI models aligned with evolving industry standards.

  7. Section 2.5 is more important to present in the Introduction section.

Thank you for the suggestion regarding Section 2.5. In response, the author has added a sentence to the Introduction to emphasize the key points from Section 2.5. However, the full section has been retained, as it offers a more detailed discussion that is important for readers to fully grasp the background and context of the study. The added sentence at Line 38 is as follows:

“While ChatGPT has demonstrated strong performance on general benchmark datasets, there is a distinct lack of empirical validation for its use in construction-specific tasks such as scheduling and hazard recognition.”

  8. It is not clear if the strategies proposed in section 3.1.4 can be used together.

Yes, these tactics can be combined, as shown in the case studies.

  9. It is important to better describe the similarities and differences between the characteristics of the case studies. You can use a table for that.

Thank you for the suggestion to describe the similarities and differences between the case studies. The author believes that the narrative description currently provided in the manuscript sufficiently captures these characteristics. However, if needed, a table can be considered in future revisions to further clarify the comparison.

  10. Number of references should be increased.

Thank you for your suggestion regarding the number of references in the manuscript. We appreciate the importance of thoroughly supporting the study with relevant literature. However, we believe that the focus should be on the quality and relevance of the references rather than solely on their quantity. The current selection of references has been carefully chosen to ensure they are directly related to the topics discussed, providing a strong foundation for the arguments and findings presented. We are committed to maintaining a high standard of academic rigor by including only those references that significantly contribute to the study's context and validity.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have no further comments and accept the paper as it is.

Author Response

I appreciate your time and consideration. Thank you for the insights provided.

Reviewer 2 Report

Comments and Suggestions for Authors

My concerns have been sufficiently addressed. The paper is now suitable for publication.

Author Response

I appreciate your time and consideration. Thank you for the insights provided.

Reviewer 4 Report

Comments and Suggestions for Authors

I recommend the authors to better structure the Conclusions section, which is currently too long. This can be done if the authors use sub-sections to present contributions offered by this study, policy implications, limitations, and future research directions.   

Author Response

I appreciate your time and consideration. Thank you for the insights provided. Subsections are added to the Conclusion section for easier readability.
