Scaffolding Probabilistic Reasoning in Civil Engineering Education: Integrating AI Tutoring with Simulation-Based Learning
Abstract
1. Introduction
1.1. The Education Challenge
1.2. Design-Based Research Positioning, Contributions, and Paper Organization
2. Theoretical Framework
2.1. Cognitive Load Theory
2.2. Multimedia Learning Principles
2.3. Zone of Proximal Development and Scaffolding
2.4. Self-Regulated Learning
2.5. Threshold Concepts
2.6. Integration of Different Theories
2.7. Educational Quality as Design Target
3. Background and Literature Review
3.1. Structural Reliability
3.2. Simulation-Based Learning in Engineering Education
3.3. Artificial Intelligence in Engineering Education
3.4. Existing Approaches to Structural Reliability Education
3.5. Evidence from Comparable Domains
4. Pedagogical Framework Design
4.1. Learning Objectives and Competency Outcomes
4.2. Conceptual Progression and Scaffolding
4.3. Role of the AI Chatbot Tutor
4.4. AI Chatbot Tutor: Architecture, Implementation, and Safeguards
4.4.1. System Architecture and Model Selection
4.4.2. Prompt Engineering for Domain Accuracy and Pedagogical Effectiveness
4.4.3. Misconception Inventory: Systematic Identification and Encoding
4.4.4. Accuracy Safeguards and Error Prevention
4.4.5. Human Escalation Protocols
4.4.6. Pedagogical Design Alignment
4.5. Assessment Framework and Learning Measures
4.5.1. Conceptual Understanding
4.5.2. Process Measures from Interaction Logs
4.5.3. Practical Competencies
4.5.4. Research Questions and Variables for Empirical Validation
- RQ1. Does the integrated AI tutoring and simulation framework improve conceptual understanding of structural reliability compared to traditional lecture-based instruction?
- RQ2. Does the framework reduce the documented misconceptions about probabilistic structural behavior?
- RQ3. How do students’ interaction patterns with simulations and the AI chatbot relate to learning outcomes?
- RQ4. What is the relative contribution of simulation-based exploration versus AI tutoring to learning gains?
- RQ5. How do individual differences in prior statistics knowledge and learning preferences moderate framework effectiveness?
4.6. Integration with Existing Curriculum
- Structural mechanics: (i) describe a plausible failure/limit state for a simple member in words; (ii) identify governing response quantities for a given loading scenario.
- Mathematics: (i) manipulate algebraic expressions and solve for an unknown; (ii) interpret functions/graphs in context; (iii) compute and interpret a basic derivative as a rate of change (sufficient for sensitivity/linearization concepts).
- Probability/statistics: (i) distinguish deterministic vs variable quantities; (ii) interpret mean and variance/standard deviation; (iii) interpret a probability statement using a distribution/CDF at a conceptual level.
5. Teaching Modules
5.1. Module Overview
5.2. Instructional Structure
- Activation and motivation (15–20 min): The module opens with a concrete engineering scenario that motivates the concepts to be learned. The AI chatbot poses initial questions to activate prior knowledge and surface existing conceptions.
- Concept introduction (20–30 min): Key concepts are introduced through brief exposition, immediately followed by interactive simulation activities. The chatbot provides just-in-time explanations responsive to student queries.
- Guided exploration (30–45 min): Students engage in structured simulation-based activities with increasing autonomy. The chatbot employs Socratic questioning to deepen understanding and diagnostic prompts to identify misconceptions.
- Consolidation and reflection (15–20 min): Students complete application problems and engage in chatbot-facilitated reflection on key insights and connections to engineering practice.
5.3. Illustration: Solving the Structural Reliability Problem
5.3.1. Conceptual Foundation
5.3.2. Simulation Component
5.3.3. AI Chatbot Integration
5.3.4. Assessment
- Prediction tasks: Students predict reliability changes before manipulating parameters, then verify predictions through simulation.
- Explanation prompts: The chatbot asks students to explain observed phenomena in their own words, assessing conceptual understanding.
- Application problems: Students calculate reliability indices for given scenarios and interpret results in engineering terms.
5.4. Module Progression
6. Illustrative Learning Scenarios
6.1. Scenario 1: Successful Conceptual Transformation
6.2. Scenario 2: Failure Modes
6.3. Learner Variability Considerations
7. Discussion
7.1. Design-Based Research Limitations
7.2. Synthesis of Design Principles
7.3. Anticipated Benefits and Contributions
7.4. Limitations and Challenges
7.5. Future Research Directions
8. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Technical Implementation Specifications
Appendix A.1. Complete System Prompt Template
Appendix A.2. Misconception Detection Patterns
| ID | Detection Pattern | Socratic Opener |
|---|---|---|
| M1.1 | Keywords: “definitely safe,” “guaranteed,” “won’t ever fail”; Patterns matching safety factor claims with certainty language | “That’s an interesting way to think about it. Let me ask: if we built 10,000 structures all with SF = 2.0, what do you think would happen to them over their lifetimes?” |
| M1.3 | Patterns matching small probabilities with “never,” “impossible,” “basically zero” | “That probability does seem incredibly small. Here’s something to consider: how many structures do you think exist in a country like China?” |
| M2.1 | Patterns matching sample measurement claims with “know,” “found,” “exact” | “If a different engineer ran the same test with a different set of 30 specimens, what mean value do you think they would get?” |
| M3.2 | Patterns matching $\beta$ with “percent,” “probability,” or direct equality to $P_f$ | “I see you’re connecting $\beta$ to probability, that’s on the right track! But let me ask: what are the units of $\beta$, and how does that relate to probability, which must be between 0 and 1?” |
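
The detection patterns in the table above could be prototyped with plain regular expressions. The sketch below is a minimal illustration under that assumption, not the paper's implementation: the `MisconceptionRule` class, the `detect` function, and the regexes are hypothetical, built only from the keywords quoted for M1.1 and M1.3, and the openers are shortened versions of the table entries.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MisconceptionRule:
    """One row of a detection table: ID, compiled pattern, Socratic opener."""
    rule_id: str
    pattern: re.Pattern
    opener: str


# Hypothetical regexes assembled from the keywords listed for M1.1 and M1.3.
RULES = [
    MisconceptionRule(
        "M1.1",
        re.compile(r"\b(definitely safe|guaranteed|won['’]t ever fail)\b", re.IGNORECASE),
        "If we built 10,000 structures all with SF = 2.0, what would happen "
        "to them over their lifetimes?",
    ),
    MisconceptionRule(
        "M1.3",
        re.compile(r"\b(never|impossible|basically zero)\b", re.IGNORECASE),
        "How many structures do you think exist in a large country?",
    ),
]


def detect(message: str) -> list[str]:
    """Return the IDs of all rules whose pattern occurs in the message."""
    return [rule.rule_id for rule in RULES if rule.pattern.search(message)]


def openers_for(message: str) -> list[str]:
    """Return the Socratic openers for every rule triggered by the message."""
    return [rule.opener for rule in RULES if rule.pattern.search(message)]


if __name__ == "__main__":
    print(detect("With SF = 2 the beam is definitely safe"))
    print(openers_for("A failure probability that small is basically zero"))
```

Keyword matching alone will over-trigger (any use of "never" fires M1.3), so a practical detector would also need to confirm that the surrounding text refers to a safety factor or a small probability, as the table's pattern descriptions require.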
Appendix A.3. Calculation Verification Protocols
Appendix A.4. Response Validation Protocol
Appendix A.5. Escalation Decision Tree
- If student explicitly requests human help → immediate escalation with context summary
- If safety or well-being concern detected → immediate escalation plus provide appropriate resources
- If response validation failed → withhold response, inform student, escalate to instructor
- If same misconception appears >3 times in session → continue interaction but flag for instructor review and suggest office hours
- If high frustration indicators detected → empathetic response, suggest break, escalate
- If question outside domain boundaries → polite redirect with alternative resources
- If confidence score below threshold → include uncertainty language and suggest verification
- Otherwise → deliver response normally
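
The branches above can be read as a first-match-wins rule list. The sketch below encodes them that way, assuming the listed order is the priority order; the `TurnState` fields, the `Action` names, and the default confidence threshold of 0.7 are hypothetical placeholders, since the source does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ESCALATE_WITH_CONTEXT = "immediate escalation with context summary"
    ESCALATE_WITH_RESOURCES = "immediate escalation plus appropriate resources"
    WITHHOLD_AND_ESCALATE = "withhold response, inform student, escalate to instructor"
    FLAG_FOR_REVIEW = "continue, flag for instructor review, suggest office hours"
    EMPATHIZE_AND_ESCALATE = "empathetic response, suggest break, escalate"
    REDIRECT = "polite redirect with alternative resources"
    HEDGE = "include uncertainty language and suggest verification"
    DELIVER = "deliver response normally"


@dataclass
class TurnState:
    """Signals available for one chatbot turn (all field names are assumptions)."""
    requests_human: bool = False
    wellbeing_concern: bool = False
    validation_failed: bool = False
    misconception_repeats: int = 0  # occurrences of the same misconception this session
    high_frustration: bool = False
    out_of_domain: bool = False
    confidence: float = 1.0


def decide(state: TurnState, confidence_threshold: float = 0.7) -> Action:
    """Walk the branches in listed order and return the first that applies."""
    if state.requests_human:
        return Action.ESCALATE_WITH_CONTEXT
    if state.wellbeing_concern:
        return Action.ESCALATE_WITH_RESOURCES
    if state.validation_failed:
        return Action.WITHHOLD_AND_ESCALATE
    if state.misconception_repeats > 3:
        return Action.FLAG_FOR_REVIEW
    if state.high_frustration:
        return Action.EMPATHIZE_AND_ESCALATE
    if state.out_of_domain:
        return Action.REDIRECT
    if state.confidence < confidence_threshold:
        return Action.HEDGE
    return Action.DELIVER


if __name__ == "__main__":
    print(decide(TurnState(misconception_repeats=4)).value)
```

Keeping the rules in a single ordered function makes the priority explicit: an explicit request for human help, for example, wins over a low confidence score on the same turn.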
Appendix A.6. Instructor Dashboard Specifications
Note 1. Throughout this paper, “failure” in the reliability context refers to the mathematical event where load effect exceeds resistance, which may or may not correspond to physical collapse depending on the limit state under consideration. We distinguish this from “failure” of the pedagogical framework to achieve learning objectives, and from “failure” of AI systems to provide appropriate responses.



| Stage | Focus | Description |
|---|---|---|
| 1 | Connection | Linking probabilistic concepts to student experiences with measurement and manufacturing variability |
| 2 | Characterization | Introducing random variables and probability distributions as mathematical tools |
| 3 | Formulation | Presenting the reliability problem as an extension of structural mechanics |
| 4 | Computation | Introducing Monte Carlo for reliability estimation |
| 5 | Application | Connecting classroom concepts to professional practice and design codes |
| ID | Category | Misconception | Response Strategy |
|---|---|---|---|
| M1.1 | Determinism | Safety factors guarantee safety (“If SF > 1, it won’t fail”) | Simulation showing failures despite SF > 1; introduce probability language |
| M1.2 | Determinism | Nominal values equal true values (“The yield strength is 250 MPa”) | Show material test data variability; discuss what specifications represent |
| M1.3 | Probability | Low probability means impossible (“ basically means never”) | Connect to portfolio of structures; expected failures in building stock |
| M2.1 | Statistics | Sample statistics equal population parameters | Sampling simulation; repeated samples give different means |
| M2.2 | Statistics | More data eliminates uncertainty | Show parameter uncertainty decreasing but inherent variability remaining |
| M2.3 | Distributions | All distributions are normal | Show clearly non-normal engineering data; introduce lognormal |
| M2.4 | Distributions | PDF height equals probability | Interactive PDF exploration; probability as area not height |
| M3.1 | Limit state | Limit state is a physical boundary | Multiple failure modes; same beam, different limit states |
| M3.2 | Reliability | $\beta$ is a probability | Explicit conversion; interpretation of $\beta$ as “number of standard deviations” |
| M4.1 | Simulation | More simulations always better without limit | Convergence demonstration; diminishing returns visualization |
| M4.2 | Simulation | Simulation results are exact | Repeated simulations give different estimates; confidence intervals |
| M5.1 | Codes | Code values are scientifically optimal | Calibration history; different codes give different values |
| M5.2 | Risk | Lower $P_f$ always better | Economic optimization; diminishing returns on safety investment |
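
A short worked example may help make the M1.1 and M3.2 response strategies concrete. It assumes the standard textbook case of a linear limit state with independent, normally distributed resistance $R$ and load effect $S$; the symbols and the numerical values are illustrative assumptions, not taken from the modules.

```latex
\begin{align}
g &= R - S, \qquad R \sim \mathcal{N}(\mu_R, \sigma_R^2), \quad S \sim \mathcal{N}(\mu_S, \sigma_S^2) \\
\beta &= \frac{\mu_R - \mu_S}{\sqrt{\sigma_R^2 + \sigma_S^2}}, \qquad P_f = P(g < 0) = \Phi(-\beta) \\
\beta &= \frac{200 - 100}{\sqrt{40^2 + 30^2}} = \frac{100}{50} = 2.0, \qquad P_f = \Phi(-2.0) \approx 0.0228
\end{align}
```

With these numbers the central safety factor is $\mu_R / \mu_S = 2$, yet roughly 2.3% of realizations still fail, which speaks to M1.1. The value $\beta = 2.0$ is dimensionless and counts how many standard deviations of $g$ separate its mean from the failure boundary; it becomes a probability only after the conversion $\Phi(-\beta)$, which speaks to M3.2.
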
| Learning Theory | Design Principle | Implementation Feature | Operational/Technical Constraint |
|---|---|---|---|
| Cognitive Load Theory | Scaffold progression from deterministic foundations | Five-module sequence with prerequisite structure | Sequential access enforced (modules unlocked only after completion) |
| Multimedia Learning | Externalize abstractions through visualization | Interactive and real-time distribution displays | Visual–verbal co-location and synchronization |
| Zone of Proximal Development | Calibrate challenge to current understanding | Adaptive hint & difficulty levels | Robust learner-state inference from interaction signals (errors, time-on-task, attempt patterns); conservative defaults when confidence is low |
| Scaffolding | Provide fading support | Socratic prompts before explanations | Explicit prompting policy and consistent selection rules across sessions |
| Self-Regulated Learning | Integrate metacognitive prompting | Prediction tasks; reflection prompts; self-explanation requests | Capture/score explanations with quality checks; flag low-quality inputs |
| Threshold Concepts | Support liminal state navigation | Misconception detection; cognitive conflict activities; integration prompts | Misconception detection reliability sufficient for targeted remediation; escalation to instructor after repeated flags or persistent confusion |
| Variable Type | Variable | Operationalization |
|---|---|---|
| Independent | Instructional condition | Framework intervention vs. traditional implementations (simulation-only, tutoring-only) |
| | Implementation fidelity | Degree to which implementation follows design specifications |
| | Dosage | Time spent engaging with framework components |
| Dependent | Conceptual understanding | Scores on reliability concept inventory (pre/post) |
| | Procedural competence | Accuracy on reliability analysis problems |
| | Engagement | Time-on-task; simulation exploration breadth; chatbot interaction depth |
| Moderating | Prior statistics knowledge | Diagnostic assessment score (prerequisite test) |
| | Prior programming experience | Self-report verified by diagnostic coding task |
| Mediating | Simulation exploration patterns | Systematic vs. random parameter manipulation |
| | Misconception trajectories | Time and interactions required to resolve detected misconceptions |
| Module | Title | Learning Objectives | Primary Activities |
|---|---|---|---|
| 1 | Recognizing Uncertainty | Identify uncertainty sources; distinguish aleatory and epistemic uncertainty | Virtual material testing; variability observation |
| 2 | Probability Distributions | Interpret PDFs and CDFs; calculate probabilities; select appropriate distributions | Interactive distribution explorer; parameter manipulation |
| 3 | Structural Reliability Problem | Formulate limit state functions; calculate reliability index; interpret failure probability | Load-resistance distribution visualization; sensitivity exploration |
| 4 | Monte Carlo Simulation | Implement Monte Carlo algorithms; assess convergence; interpret confidence intervals | Step-by-step simulation; convergence observation |
| 5 | Sensitivity Analysis and Design | Conduct sensitivity analysis; connect to LRFD; optimize designs for target reliability | Design optimization; factor calibration |
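
The Module 4 objectives (implement Monte Carlo, assess convergence, interpret confidence intervals) can be illustrated with a crude Monte Carlo estimator. The sketch below assumes independent normal resistance and load effect with a linear limit state, so that a closed-form answer is available for comparison; the function names and parameter values are hypothetical and are not the module's actual simulation code.

```python
import math
import random


def monte_carlo_pf(mu_r, sigma_r, mu_s, sigma_s, n_samples, seed=0):
    """Estimate P_f = P(R - S < 0) by crude Monte Carlo.

    Returns (estimate, half-width of an approximate 95% confidence interval).
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    failures = 0
    for _ in range(n_samples):
        resistance = rng.gauss(mu_r, sigma_r)
        load_effect = rng.gauss(mu_s, sigma_s)
        if resistance - load_effect < 0:
            failures += 1
    p_hat = failures / n_samples
    # Binomial standard error of the estimated proportion.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_samples)
    return p_hat, 1.96 * std_err


def exact_pf(mu_r, sigma_r, mu_s, sigma_s):
    """Closed-form P_f = Phi(-beta) for independent normal R and S."""
    beta = (mu_r - mu_s) / math.hypot(sigma_r, sigma_s)
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


if __name__ == "__main__":
    params = (200.0, 40.0, 100.0, 30.0)  # illustrative values only
    print(f"exact P_f = {exact_pf(*params):.5f}")
    for n in (1_000, 10_000, 100_000):
        p_hat, half_width = monte_carlo_pf(*params, n_samples=n)
        print(f"N = {n:>7}: P_f ~ {p_hat:.5f} +/- {half_width:.5f}")
```

Changing `seed` gives a different estimate for the same `n_samples`, which is the behavior the M4.2 strategy relies on, and the interval half-width shrinks roughly as $1/\sqrt{N}$, which is the diminishing return targeted by M4.1.
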
| Trigger | Chatbot Response | Function |
|---|---|---|
| Student asks: “If mean resistance is double the mean load, is failure impossible?” | “Let’s explore this. Even with the mean resistance at twice the mean load, what happens in the distribution tails? Try setting high variability for both and observe the overlap region.” | Socratic |
| Student increases $\sigma_R$; $\beta$ decreases | “You’ve discovered something important: increasing resistance variability decreases reliability. Why might quality control matter for structural safety?” | Diagnostic |
| Student asks: “What $\beta$ value is good enough?” | “This involves engineering judgment. Bridge design codes typically specify a target $\beta$. What factors should influence acceptable risk levels?” | Explanatory |
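
The second trigger in the table (resistance variability up, reliability down) can be checked numerically before students try it in the simulation. The sketch below assumes independent normal resistance and load effect with a linear limit state; `reliability_index` and `sweep_sigma_r` are hypothetical helper names and the numbers are illustrative.

```python
import math


def reliability_index(mu_r, sigma_r, mu_s, sigma_s):
    """beta = (mu_R - mu_S) / sqrt(sigma_R^2 + sigma_S^2) for normal R and S."""
    return (mu_r - mu_s) / math.hypot(sigma_r, sigma_s)


def sweep_sigma_r(mu_r, mu_s, sigma_s, sigma_r_values):
    """Return (sigma_R, beta) pairs with every other parameter held fixed."""
    return [
        (sigma_r, reliability_index(mu_r, sigma_r, mu_s, sigma_s))
        for sigma_r in sigma_r_values
    ]


if __name__ == "__main__":
    # Mean resistance is twice the mean load in every row; only sigma_R changes.
    for sigma_r, beta in sweep_sigma_r(200.0, 100.0, 30.0, [10.0, 20.0, 40.0, 60.0]):
        print(f"sigma_R = {sigma_r:5.1f}  ->  beta = {beta:.2f}")
```

Because the means never change in the sweep, any drop in the reliability index is attributable to scatter in resistance alone, which is the link to quality control that the chatbot prompt asks students to articulate.
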
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, J. Scaffolding Probabilistic Reasoning in Civil Engineering Education: Integrating AI Tutoring with Simulation-Based Learning. Educ. Sci. 2026, 16, 103. https://doi.org/10.3390/educsci16010103
