1. Introduction
The Architecture, Engineering, and Construction (AEC) industry has been undergoing rapid and transformative digitalization [1]. This shift is largely driven by the need to adopt advanced technologies that enhance efficiency, strengthen collaboration, and improve overall project performance. A key aspect of this transformation is the widespread adoption of Building Information Modeling (BIM), which has become a foundational component in the design, construction, operation, and maintenance of buildings [2]. Government incentives and growing industry-wide demand further reinforce this transition to more efficient digital workflows. For example, a survey conducted in the UK and Ireland reported that BIM adoption increased from 13% to 73% by 2020, with an additional 26% of respondents indicating plans for future adoption. Similarly, in North America, 53.6% of professionals used BIM in at least 60% of their projects, and some firms reported that more than 80% of their revenue was linked to BIM-related work [3]. As digital transformation accelerates across the AEC industry, educational institutions are under increasing pressure to adapt accordingly [4].
Many architecture and construction programs have integrated BIM into their curricula, often using tools such as Autodesk Revit and Navisworks to teach key concepts including design coordination, clash detection, process visualization, and Virtual Design and Construction (VDC) [5]. While BIM education has significantly enhanced students’ digital skills, recent advances in generative artificial intelligence (GenAI), particularly large language models (LLMs), present new opportunities to further advance AEC education and practice. These tools show considerable potential in tasks such as interpreting complex building codes and automating compliance checks [6]. In addition, LLMs can generate scripts and code that automate various engineering and construction workflows [7]. Despite growing interest in and demand for integrating AI into education, there remains a significant gap in research on how students perceive and use LLMs in real classroom environments [8].
This study evaluated the educational impact of integrating LLMs into BIM instruction. Specifically, we addressed the following research questions:
RQ1: Compared to traditional manual rule-checking methods, how do students perceive the workload and effectiveness of LLM-assisted instruction?
RQ2: What usability challenges do students encounter when interacting with LLM-generated Python scripts for BIM tasks?
RQ3: How open are students to adopting AI-assisted methods in future construction management workflows?
RQ4: What instructional strategies and prompting structures can improve student outcomes in LLM-based compliance checking?
This paper is structured as follows. Section 2 reviews the existing literature on LLMs in the construction industry and higher education. Section 3 describes the proposed instructional framework and the evaluation of learning experiences through surveys and interviews. Section 4 presents the evaluation and data analysis results from two student cohorts. Section 5 describes the findings and lessons learned from the results. Finally, Section 6 concludes with key findings, limitations, and future research directions.
4. Results
4.1. Between-Group Comparisons: Manual vs. LLM Cohorts
To evaluate the differential impact of instructional methods, we conducted independent-samples t-tests comparing the Fall 2024 (manual-only) and Spring 2025 (LLM-assisted) cohorts across all NASA-TLX workload dimensions and perception-based variables. Table 7 summarizes the results. Cohen’s d values (ranging from 0.76 to 1.53) indicate medium to large practical effects, suggesting that the observed differences between the manual and LLM-assisted groups are not only statistically significant but also educationally meaningful.
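The between-group statistics described above can be reproduced with a short script. The sketch below assumes equal group variances (Student’s pooled t-test) and computes Cohen’s d from the pooled standard deviation; the ratings are invented for illustration and are not the study’s data. Obtaining a p-value requires the t distribution, for example via `scipy.stats.ttest_ind`, which is omitted here to keep the example dependency-free.

```python
import math
from statistics import mean, variance

def independent_t_and_cohens_d(a, b):
    """Student's independent-samples t (pooled variance), degrees of freedom, and Cohen's d."""
    n1, n2 = len(a), len(b)
    # Pooled variance combines the unbiased sample variances of both groups.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    diff = mean(a) - mean(b)
    t = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    d = diff / pooled_sd
    return t, n1 + n2 - 2, d

# Hypothetical NASA-TLX frustration ratings (0-100 scale); NOT study data.
manual = [70, 65, 80, 75, 60, 72]
llm = [55, 50, 62, 48, 58, 53]
t, df, d = independent_t_and_cohens_d(manual, llm)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Under this formulation d equals t multiplied by the square root of (1/n₁ + 1/n₂), so the two statistics carry the same information rescaled by sample size; d is the one that remains comparable across cohorts of different sizes.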
4.2. Perceptions of Technology Adoption and Task Usability
In addition to workload and performance measures, this study investigated students’ perceptions of the AI-driven code compliance checking method, focusing on three key dimensions: openness to adopting new technology, perceived time-saving benefits of the AI-driven method, and the ease of identifying non-compliant components using AI.
As shown in Table 8, most students were willing to adopt new technological tools: 41.82% were open “To a moderate extent,” and 38.18% “To a great extent.” These findings indicate an overall positive attitude toward technological innovation in the learning process. However, a smaller group of students expressed hesitation or reluctance, with 10.91% selecting “Very little” and 7.27% “To a small extent,” which indicates that not all participants were equally comfortable engaging with emerging AI-driven tools. This hesitation may be attributable to limited familiarity with coding environments or limited prior exposure to LLM systems.
Students’ perceptions of the AI method’s effect on time efficiency were also predominantly positive. A majority of students indicated that the method saved “A meaningful amount of time,” and an additional 23.64% reported “A somewhat meaningful amount of time.” A smaller group (5.45%) perceived the time savings as “An incomparable amount of time.” In contrast, 9.09% of students indicated that the method saved “Very little” time, and the same proportion (9.09%) selected “Not at all.” These results indicate that, although the AI-powered method offered notable time savings for most students, some experienced limited benefits. This variation likely reflects individual differences in prompt formulation, interpretation of AI-generated outputs, and navigation of the Revit-based workflows.
Regarding the ease of identifying non-compliant elements with the AI method, 36.36% of students selected “To a moderate extent,” indicating moderate confidence in the tool’s usability. A comparable proportion (34.55%) selected “To a small extent,” while 16.36% selected “Very little” and 9.09% “Not at all.” Only 3.64% of students found it very easy to identify non-compliant elements, selecting “To a great extent.” Thus, although many students considered the tool somewhat usable, a considerable proportion encountered difficulties. This underscores the need for clearer instructional guidance, well-defined prompt engineering approaches, and instructional scaffolding to support effective use of AI in compliance checking tasks.
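Each reported percentage should correspond to a whole number of respondents. As an arithmetic check, the sketch below uses counts back-calculated from the ease-of-identification percentages with an assumed denominator of N = 55; these counts are inferred for illustration and are not reported in the study.

```python
# Counts back-calculated from the reported percentages, ASSUMING N = 55
# respondents; inferred for illustration, not raw study data.
ease_counts = {
    "Not at all": 5,
    "Very little": 9,
    "To a small extent": 19,
    "To a moderate extent": 20,
    "To a great extent": 2,
}

def percent_distribution(counts):
    """Convert category counts to percentages rounded to two decimals."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}

dist = percent_distribution(ease_counts)
for level, pct in dist.items():
    print(f"{level:<22}{pct:6.2f}%")
```

A check of this kind also makes it easy to spot a response category that was omitted from a reported distribution, since the rounded shares would then fall short of 100%.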
4.3. Learning-Outcome Performance
Table 9 summarizes mean scores for each learning outcome (LO1–LO4) by cohort. Independent-samples t-tests show that the LLM-assisted group outperformed the manual group on LO2 (violation detection) and LO3 (script accuracy) with large effect sizes, whereas no significant difference was observed for LO1 (code interpretation) or LO4 (critical reflection).
The pronounced gains on LO2 and LO3 indicate that AI assistance substantially improved students’ practical compliance-checking skills and the technical correctness of their solutions. By contrast, the absence of a significant difference in LO1 suggests that LLM support did not automatically strengthen students’ independent interpretation of code provisions. One possible explanation is that the LLM-assisted workflow primarily supported task execution, prompt refinement, and code generation, which are more directly related to procedural performance than to deeper conceptual understanding of regulatory logic. In other words, students may have benefited from AI assistance in completing the compliance-checking process without necessarily improving their ability to interpret code requirements on their own. Similar LO4 scores suggest that reflective capacity was not detectably affected by instructional modality.
To test whether improved performance was associated with reduced cognitive load, Pearson correlations were computed between LO2 and LO3 scores and the NASA-TLX dimensions. Higher LO2 scores were negatively correlated with mental demand (, ; ) and frustration (, ; ). These results indicate moderate negative relationships, suggesting that stronger performance in violation detection was associated with lower perceived cognitive burden and frustration.
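The correlation analysis can be sketched as follows. The function computes Pearson’s r directly from its definition and converts it to a t statistic with n − 2 degrees of freedom for testing whether the population correlation is zero. The LO2 and mental-demand values are hypothetical; in Python 3.10 and later, `statistics.correlation` returns the same r.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length (n >= 3)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic (df = n - 2) for H0: rho = 0; undefined when |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical LO2 scores and NASA-TLX mental-demand ratings; NOT study data.
lo2 = [62, 70, 75, 80, 85, 90, 68, 77]
mental = [85, 70, 72, 60, 55, 50, 78, 66]
r = pearson_r(lo2, mental)
print(f"r = {r:.2f}, t({len(lo2) - 2}) = {t_for_r(r, len(lo2)):.2f}")
```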
4.4. Student Reflections on the AI Experience
In addition to quantitative survey responses, qualitative insights were obtained through structured interviews and open-ended survey comments, revealing several important themes regarding students’ experiences with AI-assisted code generation. The iterative nature of effective prompt engineering emerged as a critical factor in successful outcomes. Many students observed that generating functional code typically required multiple rounds of prompt refinement and that results improved markedly when prompts contained comprehensive, specific details. As one participant reflected, the quality of AI-generated output depended closely on the specificity of the initial request: “the more detailed your prompt is, the better results I was getting.” This finding underscores the importance of prompt literacy as a foundational skill for effective use of AI tools.
Nevertheless, concerns regarding the reliability and consistency of AI outputs tempered students’ enthusiasm. Many participants reported experiencing inconsistent results when submitting similar prompts, along with occasional misinterpretation of complex rule logic by the AI systems. Students addressed these issues using the two validation practices introduced during instruction: manually locating flagged elements in the BIM model to confirm that reported values matched the model parameters, and submitting the same problem to a second LLM to check whether outputs agreed across models. While these strategies helped students identify suspect results, they also reinforced the broader message that AI-generated compliance outputs require human oversight and cannot be accepted at face value.
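The second validation practice, comparing outputs across two LLMs, can be made more systematic by comparing the sets of element IDs that each generated script flags. In the sketch below the IDs are hypothetical; elements flagged by only one run are natural candidates for manual inspection in the model, and the Jaccard index summarizes overall agreement between the two runs.

```python
def compare_flagged(ids_a, ids_b):
    """Compare element IDs flagged as non-compliant by two independent runs."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return {
        "agreed": sorted(a & b),
        "only_a": sorted(a - b),  # flagged only by the first LLM's script
        "only_b": sorted(b - a),  # flagged only by the second LLM's script
        # Jaccard index: share of all flagged elements that both runs flagged.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }

# Hypothetical Revit element IDs flagged by scripts from two different LLMs.
model_1 = [101, 205, 317, 422]
model_2 = [101, 205, 422, 530]
report = compare_flagged(model_1, model_2)
print(report)
```

Agreement between two models does not establish correctness, since both may share the same misreading of a rule; it only narrows down which elements most urgently need checking against the model parameters.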
Despite these challenges, students consistently recognized the efficiency gains that AI tools provided in completing rule-checking assignments. Once scripts were functional, the AI proved highly effective at identifying non-compliant model elements, offering substantial time savings compared with traditional manual verification. Students widely regarded this efficiency as a significant pedagogical advantage, enabling them to focus on higher-order learning objectives rather than routine procedural tasks.
In response to these challenges, students articulated clear recommendations for enhanced pedagogical scaffolding. They identified three primary areas for improvement: (1) foundational tutorials on Python programming concepts relevant to the assignment, such as how to organize a script, how to debug and troubleshoot it, and how to visualize its outcomes; (2) comprehensive examples of common AI-generated errors accompanied by debugging strategies; and (3) structured walkthroughs designed specifically for novice users, in which case studies demonstrate step-by-step use of the AI tools from prompt engineering through script debugging. Students viewed these scaffolding elements as essential for maximizing learning outcomes while reducing over-dependence on AI tools for problem resolution.
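As an illustration of the script organization students asked for, the sketch below separates data loading, rule logic, and reporting into distinct functions and logs missing parameters instead of silently passing them. It is a minimal, hypothetical example: the 32-inch minimum door width is an illustrative threshold rather than a citation of a specific code provision, the `clear_width_in` field is invented, and `load_elements` stands in for reading parameters from a Revit model (for example through the Revit API or a schedule export), which is not shown.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("rule_check")

# Hypothetical rule threshold, used only for illustration.
MIN_CLEAR_WIDTH_IN = 32.0

def load_elements():
    """Stand-in for extracting door parameters from a BIM model (hypothetical data)."""
    return [
        {"id": 101, "type": "Door", "clear_width_in": 36.0},
        {"id": 102, "type": "Door", "clear_width_in": 30.0},
        {"id": 103, "type": "Door", "clear_width_in": None},  # missing parameter
    ]

def check_door(element):
    """Return 'pass', 'fail', or 'missing' for a single door."""
    width = element.get("clear_width_in")
    if width is None:
        # Debugging aid: surface missing data rather than counting it as compliant.
        log.warning("Element %s has no clear width value", element["id"])
        return "missing"
    return "pass" if width >= MIN_CLEAR_WIDTH_IN else "fail"

def run_checks(elements):
    """Apply the rule to every door and return a status per element ID."""
    results = {e["id"]: check_door(e) for e in elements if e["type"] == "Door"}
    log.info("Checked %d doors", len(results))
    return results

def print_report(results):
    """Plain-text summary as a lightweight way to visualize outcomes."""
    for element_id, status in sorted(results.items()):
        print(f"{element_id:>6}  {status.upper()}")

if __name__ == "__main__":
    print_report(run_checks(load_elements()))
```

Keeping the rule logic in its own small function gives students a single place to compare against the code provision when debugging, which is harder when an LLM returns one monolithic script.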
5. Interpretation of Findings
The integrated results from the survey and student reflections reveal several key insights into how students experienced and evaluated the AI-powered rule-checking assignment. These findings highlight the cognitive demands of the workflow, the role of perceived usefulness in shaping satisfaction, and the instructional challenges of integrating large language models (LLMs) into BIM education.
5.1. Cognitive Load and Mental Demands
Students consistently reported high mental and temporal demands throughout the assignment, along with substantial effort. These findings align with cognitive load challenges previously identified in studies of ChatGPT in education [24], suggesting that AI integration in educational contexts can introduce additional cognitive complexity. The AI-assisted workflow, which required students to craft effective prompts, interpret generated outputs, and debug the resulting scripts, proved cognitively demanding and was particularly difficult for students without programming backgrounds.
The iterative prompting process contributed substantially to cognitive strain. Students frequently described extensive prompt revision cycles, with some reporting “four or five prompt revisions” before achieving satisfactory results. This trial-and-error approach to debugging, particularly when dealing with ambiguous rules or limited data availability, markedly increased frustration. When unsupported by clear instructional guidance, the opaque behavior of AI further heightened frustration and mental strain [19], creating barriers to effective learning engagement.
5.2. Impact of Perceived Usefulness on User Experience
The survey analysis showed that effort, temporal demand, and mental demand were the strongest predictors of student frustration, a pattern supported by both the quantitative data and the qualitative interview responses. However, students who perceived clear benefits from the AI tool, specifically those who believed it saved time and effectively identified non-compliant model elements, reported significantly lower frustration. This pattern suggests that perceived usefulness may act as a buffering factor that mitigates the stress associated with cognitive demands.
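The specific model used to rank predictors of frustration is not stated in this section; assuming a multiple linear regression, the relationship can be sketched with ordinary least squares. The dependency-free example below solves the normal equations by Gauss-Jordan elimination; the ratings are hypothetical, and a full analysis would also report standard errors and p-values, for example with the `statsmodels` OLS model.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows; an intercept column is prepended.
    Adequate for a handful of predictors; not a replacement for a stats library.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for row in range(k):
            if row != col:
                factor = A[row][col]
                A[row] = [a - factor * c for a, c in zip(A[row], A[col])]
    return [A[i][k] for i in range(k)]  # [intercept, b_effort, b_temporal, b_mental]

# Hypothetical (effort, temporal demand, mental demand) ratings and frustration
# scores; illustrative only, NOT study data.
X = [(60, 50, 70), (70, 65, 60), (55, 80, 75), (80, 40, 85),
     (65, 70, 50), (75, 55, 65), (50, 60, 55)]
frustration = [55, 62, 66, 70, 54, 65, 48]
coefs = ols(X, frustration)
print([round(c, 2) for c in coefs])
```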
The tool’s perceived usefulness was significantly associated with overall satisfaction ratings. Students who recognized time-saving benefits () and appreciated improved rule-checking clarity () consistently rated the AI-assisted method more favorably [8,25,26]. Conversely, students experiencing higher temporal demands and greater frustration reported lower perceptions of the method’s effectiveness [19,27], creating a feedback loop that could potentially undermine learning outcomes.
5.3. Factors Influencing Future Adoption
Student openness to future AI adoption was strongly associated with positive perceptions of time efficiency and task-related benefits [8,26]. This relationship suggests that early positive experiences with AI tools may shape students’ willingness to incorporate these technologies into their future professional practice. In addition, openness to future adoption correlated positively with higher mental engagement and greater construction-related experience [28], indicating that students with stronger foundational knowledge and more active cognitive involvement were more receptive to AI integration.
However, higher frustration was associated with lower openness to future adoption [19,24], highlighting the importance of designing supportive learning environments that minimize unnecessary cognitive barriers. Students who encountered persistent difficulties or felt overwhelmed by the technical complexity of the AI-assisted workflow were significantly less likely to express interest in continued use of such tools.
5.4. Implications for Practical Skill Development and AI Dependency
While the LLM-assisted approach improved efficiency and technical performance, the findings also raise an important pedagogical concern regarding the development of students’ practical skills. In particular, the significant gains observed in violation detection (LO2) and method accuracy (LO3), combined with the absence of a significant difference in code interpretation (LO1), suggest that LLM support may enhance procedural execution more readily than conceptual understanding. This pattern indicates that students may complete rule-checking tasks more effectively with AI assistance without necessarily strengthening their independent ability to interpret regulatory logic or understand the reasoning behind the generated solution.
Student reflections further reinforce this concern. Several participants reported difficulty debugging AI-generated scripts and often relied on the same LLM tools to correct AI-generated errors. Although this iterative interaction helped them complete the task, it also points to a possible dependency pattern in which students rely on AI support throughout the entire problem-solving process rather than developing independent troubleshooting strategies. From an educational perspective, this raises an important question: whether repeated reliance on LLMs might reduce opportunities to practice foundational skills. Such skills remain necessary in professional settings where AI tools may be unavailable or restricted, or where they produce unreliable outputs that students must detect through their own judgment and manual verification.
Therefore, the educational value of LLM integration should not be understood only in terms of speed or task completion. In BIM and code-compliance education, students must still develop the ability to interpret code provisions, verify model conditions manually, and critically evaluate whether AI-generated outputs are valid. LLMs may serve as useful instructional aids, but they should supplement rather than replace the development of core professional judgment and independent problem-solving skills.
5.5. Fostering Critical Oversight in LLM-Assisted Workflows
Building on these student reflections, an additional pedagogical issue concerns how LLM-assisted workflows may shape students’ habits of verification and independent reasoning. The qualitative findings point to a risk of dependency when students rely on LLMs not only to generate code but also to diagnose and correct errors in that same code. While this iterative use of AI may help students complete the task more efficiently, it can also reduce opportunities for independent reasoning, debugging, and verification, leading students to focus on obtaining a working output without fully understanding the logic behind the generated solution. This concern is consistent with the learning-outcome results: the LLM-assisted cohort performed better in violation detection and method accuracy, but not in code interpretation, suggesting that procedural gains may occur without comparable gains in underlying conceptual understanding.
To address this issue, LLM-assisted instruction should include explicit support for critical oversight rather than treating AI output as inherently reliable. Students should be guided to question whether the generated script reflects the correct rule logic, whether the relevant BIM parameters are being interpreted appropriately, and whether the reported violations are consistent with manual inspection. Pedagogically, this means that AI use should be paired with structured verification activities, such as requiring students to explain the logic of the generated code, manually validate a subset of AI-identified violations, and reflect on possible sources of error in the output. In practice, this may involve requiring students to identify likely failure points before re-prompting the LLM, to compare AI-generated results with manual checks on selected elements, and to justify why a revised script should be considered valid. These practices can help ensure that LLMs function as learning aids rather than substitutes for professional judgment and independent problem-solving.
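The recommended practice of manually validating a subset of AI-identified violations can be organized around a reproducible random sample. The sketch below, using hypothetical element IDs and inspection outcomes, draws a fixed-seed sample so that instructor and student review the same elements, then reports the share of sampled flags that manual inspection confirmed.

```python
import random

def sample_for_manual_check(flagged_ids, k, seed=0):
    """Draw a reproducible random subset of AI-flagged elements to verify by hand."""
    rng = random.Random(seed)  # fixed seed: the same sample on every run
    ids = list(flagged_ids)
    return sorted(rng.sample(ids, min(k, len(ids))))

def confirmed_rate(manual_results):
    """Share of sampled flags that manual inspection confirmed as true violations."""
    if not manual_results:
        raise ValueError("no manual checks recorded")
    return sum(manual_results.values()) / len(manual_results)

# Hypothetical IDs flagged by an LLM-generated script.
flagged = [101, 205, 317, 422, 530, 614, 708, 811]
sample = sample_for_manual_check(flagged, k=4)
# Hypothetical inspection outcomes: suppose the first sampled flag proved to be a
# false positive and the remaining three were confirmed in the model.
manual = {eid: i != 0 for i, eid in enumerate(sample)}
print(sample, f"confirmed: {confirmed_rate(manual):.0%}")
```

A low confirmation rate on the sample is a signal to revisit the script’s rule logic before re-prompting, which matches the emphasis above on identifying likely failure points first.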
5.6. Supporting Conceptual Understanding in LLM-Assisted Learning
An additional implication of the learning-outcome results is that LLMs may support procedural performance more readily than conceptual understanding. While the LLM-assisted cohort performed better in violation detection and method accuracy, no significant gain was observed in code interpretation. This pattern suggests that students may have used AI effectively to complete the task without substantially improving their independent understanding of the underlying code logic. To better support conceptual learning, LLM-assisted instruction should require students not only to use AI-generated outputs but also to explain the meaning of the relevant code provisions, justify how those provisions are translated into checking logic, and manually verify whether the generated script aligns with the intended rule. For example, students could be asked to interpret a code provision in plain language before prompting the LLM, annotate the logic of the generated script step by step, and compare AI-generated results with manual checks on selected BIM elements. In this way, LLMs can be positioned not only as tools for procedural efficiency but also as supports for deeper interpretation and reflective understanding.
5.7. Pedagogical Implications and Recommendations
In light of the benefits and risks discussed above, students articulated specific recommendations for instructional improvements designed to reduce frustration while preserving meaningful learning. Their suggestions focused on three primary areas: providing structured examples of successful AI interactions, developing pre-written prompt templates to scaffold initial attempts, and offering comprehensive Python syntax walkthroughs to build foundational programming literacy. These recommendations reflect students’ recognition that effective AI integration requires substantial pedagogical support rather than simply providing access to the technology itself.
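A pre-written prompt template of the kind students requested might expose slots for the details they found most influential: the rule in plain language, the element category, the available parameters, and units. The template wording and slot names below are assumptions for illustration, not the course’s actual materials, and the door-width rule is an illustrative threshold rather than a quoted code provision.

```python
from string import Template

# Hypothetical prompt template; slot names and wording are illustrative only.
RULE_CHECK_PROMPT = Template(
    "You are helping check a BIM model for code compliance.\n"
    "Rule (plain language): $rule\n"
    "Elements to check: $category\n"
    "Parameters available: $parameters\n"
    "Units: $units\n"
    "Task: write a Python function that returns the IDs of non-compliant elements.\n"
    "Output: code only, with comments explaining each step of the rule logic."
)

def build_prompt(rule, category, parameters, units):
    """Fill every slot; Template.substitute raises KeyError if one is missing."""
    return RULE_CHECK_PROMPT.substitute(
        rule=rule,
        category=category,
        parameters=", ".join(parameters),
        units=units,
    )

prompt = build_prompt(
    rule="Each door must provide a clear width of at least 32 inches.",
    category="Doors",
    parameters=["Id", "Width"],
    units="inches",
)
print(prompt)
```

Requiring students to write the plain-language rule before filling the other slots also supports the code-interpretation practice discussed in Section 5.6.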
An additional implication of these findings is that effective pedagogical scaffolding depends not only on student preparation but also on instructor readiness. In LLM-assisted BIM instruction, teaching staff must be able to guide students in prompt formulation, explain common sources of AI-generated errors, and support the validation of generated outputs against model conditions and code requirements. Without this instructional capacity, students may receive access to AI tools without receiving the support needed to use them critically and effectively. Therefore, successful integration of LLMs into technical education may require not only revised student-facing materials, such as prompt templates and debugging guides, but also greater faculty preparedness to supervise AI-assisted workflows and respond to emerging usability challenges. This does not imply that instructors must become AI specialists; however, they should be sufficiently prepared to model responsible AI use, identify common failure patterns, and help students maintain critical oversight during task completion.
To reduce dependency risks, future implementations should incorporate code-explanation tasks, manual cross-checking of selected outputs, and assessment criteria that reward validation and error identification rather than successful execution alone.
6. Conclusions
This study evaluated the integration of LLMs into rule-based code compliance checking within a graduate-level BIM course. A quasi-experimental design compared manual-only instruction in Fall 2024 with LLM-assisted instruction in Spring 2025. The study combined quantitative data (NASA-TLX workload ratings and students’ perceptions of the AI-driven method and its usability) with qualitative data (open-ended feedback and interviews). The results suggest that AI-assisted methods significantly reduce frustration and perceived workload while improving efficiency and openness to technology adoption. However, effective use still requires prompt engineering skills and the ability to interpret AI-generated code.
The findings reveal important insights about the cognitive demands and pedagogical considerations associated with AI integration in technical education. While students experienced substantial benefits in terms of time efficiency and task completion, they also encountered significant challenges related to prompt crafting, code interpretation, and debugging processes. Students who perceived clear utility in the AI tools demonstrated significantly higher satisfaction levels and greater openness to future technology adoption. These results highlight the critical importance of pedagogical scaffolding, as students consistently recommended enhanced instructional support, including foundational programming tutorials, structured prompt templates, and comprehensive debugging guides.
6.1. Limitations
The study has several limitations that should be considered when interpreting the findings. First, it was conducted at a single institution with a relatively small sample (N = 55), which may limit the generalizability of the results to other educational settings. Second, although the study employed a quasi-experimental design comparing two distinct cohorts, unmeasured confounding variables, such as differences in student backgrounds or prior experience, may have influenced the observed outcomes. Third, students encountered various usability issues, including inconsistent AI outputs, opaque error messages, and unpredictable behavior when dealing with complex rule logic; these technical limitations were particularly challenging for students without coding experience. Fourth, this study did not directly measure whether repeated use of LLM assistance affects students’ long-term retention of manual rule-checking, code interpretation, or independent problem-solving skills. Fifth, the AI-assisted approach examined in this study remains under active development and requires further refinement to achieve greater reliability and clarity. Sixth, all submissions were scored by a single blinded rater using a standardized rubric, and formal intra-rater reliability was not assessed. Finally, the scope of the background variables was limited: the demographic survey captured students’ prior experience in construction, BIM, code compliance, and LLM exposure, but did not assess their broader academic background in language, writing, or the humanities. Because prompt engineering relies heavily on clarity of expression, specification of intent, and structured natural-language reasoning, differences in humanities-oriented preparation may influence students’ ability to formulate effective prompts. This factor was not examined in the present study and could not be disentangled from other background variables.
6.2. Future Research Directions
Future studies will address these limitations by expanding to multiple institutions and larger student populations to enhance generalizability. Implementing longitudinal tracking of skill retention and tool adoption would provide valuable insights into long-term educational impact. Where feasible, employing stronger experimental controls would strengthen causal inferences about AI-assisted instruction effectiveness.
Additionally, future research will focus on several key areas. Early integration of prompt engineering and Python programming concepts into the curriculum represents a critical need, as foundational technical literacy appears essential for successful AI tool utilization. Developing comprehensive support resources, including detailed error-handling guides, structured prompt libraries, and step-by-step debugging protocols, could significantly reduce cognitive load and improve learning outcomes.
Improving the AI tool’s consistency and transparency through iterative refinement based on systematic user feedback represents another important research direction. Future work will also examine optimal pedagogical strategies for scaffolding AI interaction skills and explore the broader implications of AI integration for professional preparation in the AEC industry.
Future research will also examine faculty readiness and instructional capacity for supporting LLM-assisted learning in technical disciplines, including what forms of training and teaching resources are most effective for instructors. Future studies should also examine whether LLM-assisted instruction influences students’ ability to perform similar tasks independently when AI support is unavailable.
Furthermore, future research will investigate how students’ broader academic background, particularly training in language, writing, and the humanities, relates to their ability to formulate, refine, and critically evaluate prompts. Because effective prompt engineering depends on precise natural-language expression and structured reasoning, engineering students with stronger humanities preparation may approach LLM interaction differently than those with a predominantly technical background. Collecting self-reported data on humanities coursework, writing confidence, and language proficiency, alongside direct student reflections on whether they perceive such preparation as helpful, would allow this relationship to be tested empirically.