Using LLM to Identify Pillars of the Mind Within Physics Learning Materials
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I find that this is a strong and timely paper that presents a novel application of Large Language Models (LLMs) to map physics educational content onto the innovative theoretical framework of the Five Pillars of the Mind. The research is well-structured, the case study is appropriate, and the quantitative evaluation using precision, recall, and F1-Score provides clear, empirical evidence of the performance of different AI models. One key strength is the proposal of a specialized tool (MAXQDA AI Assist) that significantly outperforms generalized LLMs (GPT-4o, o1-mini), which suffer from hallucination; this is a valuable contribution to the field of educational technology. The fusion of a neuroscientific theory (Five Pillars), physics education, and state-of-the-art AI analysis is highly original and represents a new direction for learning analytics.
That said, there is room for improvement that would lead to a clearer educational impact. In particular, the most significant area for improvement lies in fully developing the educational rationale and application. The introduction of the Five Pillars framework is excellent, but the paper currently stops short of explaining why this automated categorization is pedagogically valuable. The most compelling answer to this "why" is its profound connection to Self-Regulated Learning (SRL). The authors should add some discussion of this to the paper's introduction and discussion sections and link it to a larger context, such as SRL in physics learning by high school students. See Davy TK Ng et al., Empowering student self‐regulated learning and science education through ChatGPT: A pioneering pilot study, British Journal of Educational Technology, Volume 55, Issue 4, Pages 1328-1353, 2024, which analyzes an LLM-based chatbot for physics learning (e.g., Newton's laws), similar to the paper reviewed. Some suggestions:
- Explicitly State the SRL Hypothesis: The introduction can posit that by making the cognitive architecture of a topic (i.e., its constituent Pillars) explicit, learners can better plan, monitor, and evaluate their own understanding. For example, a student who struggles with the "Somatic" pillar in mechanics could self-identify this gap and seek out tactile experiments or simulations.
- Propose an SRL Application: The discussion can move beyond the analytical benchmark and hypothesize how this technology could be deployed to foster SRL. The key suggestion is to highlight the need for an LLM-powered chatbot or tutor that uses this Pillar-mapping capability. Such a tool could provide learners with a metacognitive "Pillar map" of a lesson before they begin, helping them plan their learning strategies, or the LLM could act as an SRL prompt, e.g., "This problem solution primarily uses the Numerical pillar. How would you explain it using the Spatial-Manipulative pillar instead?" This helps students identify their own knowledge gaps by asking them to categorize concepts and then compare their assessment with the AI's model.
- In the concluding remarks, the authors could discuss the MAXQDA AI Assist results using SRL as a reference; for example, the performance metrics of a reliable SRL tool could be mapped to MAXQDA AI Assist. The high precision of MAXQDA AI Assist is crucial for building student trust, while the hallucinations of general-purpose LLMs demonstrate a significant risk for such an application.
Minor Points:
- The phrase "o4-mini" is likely a typo and should be standardized to "o1-mini" throughout.
- The conclusion could be strengthened by outlining a future research agenda that includes a pilot study with students using a Pillar-informed chatbot to measure its actual impact on their LLM behaviors and physics problem-solving skills.
Author Response
Authors’ Response to Review 1:
Changes in the manuscript can be seen in the attachment to this response.
General comment 1: I find that this is a strong and timely paper that presents a novel application of Large Language Models (LLMs) to map physics educational content onto the innovative theoretical framework of the Five Pillars of the Mind. The research is well-structured, the case study is appropriate, and the quantitative evaluation using precision, recall, and F1-Score provides clear, empirical evidence of the performance of different AI models. One key strength is the proposal of a specialized tool (MAXQDA AI Assist) that significantly outperforms generalized LLMs (GPT-4o, o1-mini), which suffer from hallucination; this is a valuable contribution to the field of educational technology. The fusion of a neuroscientific theory (Five Pillars), physics education, and state-of-the-art AI analysis is highly original and represents a new direction for learning analytics.
Response: Thank you for the very positive and constructive review.
General comment 2: That said, there is room for improvement that leads to a clearer educational impact. In particular, the most significant area for improvement lies in fully developing the educational rationale and application. The introduction of the Five Pillars framework is excellent, but the paper currently stops short of explaining why this automated categorization is pedagogically valuable. The most compelling answer to this "why" is its profound connection to Self-Regulated Learning (SRL). The authors should add some discussion of this to the paper's introduction and discussion sections and link it to a larger context, such as SRL in physics learning by high school students. See Davy TK Ng et al., Empowering student self‐regulated learning and science education through ChatGPT: A pioneering pilot study, British Journal of Educational Technology, Volume 55, Issue 4, Pages 1328-1353, 2024, which analyzes an LLM-based chatbot for physics learning (e.g., Newton's laws), similar to the paper reviewed.
Response: Thank you for pointing this out. We agree with this comment. Therefore, we have added some details to the Conclusion, lines 265-295.
Some details:
That said, there is room for improvement that leads to a clearer educational impact. In particular, the most significant area for improvement lies in fully developing the educational rationale and application. The introduction of the Five Pillars framework is excellent, but the paper currently stops short of explaining why this automated categorization is pedagogically valuable.
Response: Yes, thank you for this; we agree. In the first version of the contribution, we focused on a short description of why labelling neural networks by the theory of the five pillars of the mind is useful; we have now added lines 65-66 and reference 7, a book published a few months ago. Thank you also for prompting us to focus on why the automated categorisation is valuable. We have added lines 25-28 to the abstract, and we have significantly expanded the Conclusion.
The most compelling answer to this "why" is its profound connection to Self-Regulated Learning (SRL). The authors should add some discussion of this to the paper's introduction and discussion sections and link it to a larger context, such as SRL in physics learning by high school students. See Davy TK Ng et al., Empowering student self‐regulated learning and science education through ChatGPT: A pioneering pilot study, British Journal of Educational Technology, Volume 55, Issue 4, Pages 1328-1353, 2024, which analyzes an LLM-based chatbot for physics learning (e.g., Newton's laws), similar to the paper reviewed.
Response: Yes, we agree. We clearly see that automated categorisation is connected to the topic of Self-Regulated Learning. We have added this idea to the Conclusion, lines 285-295, and we highlighted the high value of this new, emerging attempt to apply AI to the theory of Self-Regulated Learning and to put that theory into practical use.
Some suggestions:
1. Explicitly State the SRL Hypothesis: The introduction can posit that by making the cognitive architecture of a topic (i.e., its constituent Pillars) explicit, learners can better plan, monitor, and evaluate their own understanding. For example, a student who struggles with the "Somatic" pillar in mechanics could self-identify this gap and seek out tactile experiments or simulations.
Response to suggestion 1: Yes, thank you. We have added lines 8-9 and 307-308 (where we used the wording from this review; thank you).
2. Propose an SRL Application: The discussion can move beyond the analytical benchmark and hypothesize how this technology could be deployed to foster SRL. The key suggestion is to highlight the need for an LLM-powered chatbot or tutor that uses this Pillar-mapping capability. Such a tool could provide learners with a metacognitive "Pillar map" of a lesson before they begin, helping them plan their learning strategies, or the LLM could act as an SRL prompt, e.g., "This problem solution primarily uses the Numerical pillar. How would you explain it using the Spatial-Manipulative pillar instead?" This helps students identify their own knowledge gaps by asking them to categorize concepts and then compare their assessment with the AI's model.
Response to suggestion 2: Yes, thank you for the inspiring ideas. We consider such an application beyond the scope of this article. Some ideas are mentioned in the Conclusion, but this seems to be an open space for future development and collaboration. A similar kind of map is used in the International Baccalaureate Middle Years Programme (MYP) curriculum and applied in textbooks, one of which we reference (28). When we tried to address this proposal within the article, it grew by two more pages and became too complex. We prefer to work on this suggestion over the coming months and possibly publish more comprehensive results of such cooperation later.
3. In the concluding remarks, the authors could discuss the MAXQDA AI Assist results using SRL as a reference; for example, the performance metrics of a reliable SRL tool could be mapped to MAXQDA AI Assist. The high precision of MAXQDA AI Assist is crucial for building student trust, while the hallucinations of general-purpose LLMs demonstrate a significant risk for such an application.
Response to suggestion 3: Yes, thank you, we highlighted this idea throughout the article and explicitly on lines 299-300.
Minor Points:
- The phrase "o4-mini" is likely a typo and should be standardized to "o1-mini" throughout
Response to minor point 1: Thank you for this note. We have made references to the AI technologies more consistent and clearer throughout the whole article.
- The conclusion could be strengthened by outlining a future research agenda that includes a pilot study with students using a Pillar-informed chatbot to measure its actual impact on their LLM behaviors and physics problem-solving skills.
Response to minor point 2: Thank you, we have added these ideas to the Abstract as well as to the Conclusion.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, the authors point out that using Large Language Models (LLMs) in line with the Five Pillars of the Mind theory could improve the use of LLMs in physics education. The paper presents a case study in which the authors used selected large language models to identify pillars in a selected eight-page teaching text on forces aimed at 12- to 14-year-old students. For this research, they used the LLMs ChatGPT-4o and ChatGPT4o-mini, as well as MAXQDA AI Assist.
General comments:
The authors tackled the interesting problem of how people learn, which stems from their previous research and publication [8]. They brought together an interesting theory whose core idea dates back to the ancient Greeks and which has been widely applied in various fields. Over the centuries, various derivatives of this theory have been created, from philosophical to neuroscientific and educational ones, up to more modern views such as neuroconstructivism, all reflecting people's desire to know themselves better. The theory of the five pillars of the mind is based on studies of the mind (psychology), the brain (neuroscience), and learning (education), where it primarily seeks to redefine curricula. However, the paper tells us very little about these various derivatives of the basic five-pillar theory, since the research relies exclusively on a single interpretation of it, namely the Tokuhama-Espinosa theory.
The article is thus missing two key, slightly more comprehensive explanations, namely:
- A slightly more detailed explanation of the Theory of the Five Pillars of the Mind, its historical background, and the conceptual transition to the version you use (Tokuhama-Espinosa's), including why you chose it and whether any similar versions exist (references needed); and
- A more consistent interpretation and use of the different LLMs, where your usage shows "real confusion". To be more specific: OpenAI is an organization (not a model) that has been developing LLMs from ChatGPT-1.0 to ChatGPT-5.0, in the form of a complete (purchased) version, a free (demo) version, and an SLM (mini) with a limited learning base, so consistent comparisons are very problematic; be careful about this. For example, when you talk about ChatGPT 4o or ChatGPT4o mini, you are talking about the same 4th-generation version, but the question is on what basis the model learned: a limited one (mini, SLM) or an extended one (LLM). I also don't understand what ChatGPT1 and ChatGPT2 mean, whether they are initial versions or just some kind of personal designation of yours. In summary, please label the versions consistently and check and define your conclusions accordingly (approx. lines 229-248).
Other, more detailed comments include:
- Make the Abstract more telling. The Abstract is the essence of the article and, together with the title, is what the reader encounters first. For example, highlight the research problem (perhaps in the form of research questions), how you approached the research, where the challenges were, and what conclusions you reached.
- Outline the Theory of 5 pillars in a little more detail, as we distinguish between psychological (mind), neuroscientific (brain), and educational pillars. In education, the theory is primarily intended for redesigning school curricula, so it would also be useful to consider what balanced learning material should look like. Add further references and explanations.
- 141 - "and five codes each representing one of the pillars – were defined" - explanation needed.
- 142 - "The definitions by Tokuhama-Espinosa [5] were used as the code description/.../" - additional explanation needed.
- 174 - "The authors identified 13 physics concepts (see Table 2) that are discussed within the /.../" - further explain this claim and link the concepts to the "Pillars".
- 202 - "identify Ohm's Law and Electromagnetic induction as physics concepts discussed in the material" - you mentioned that you deal with forces, etc. What's the connection?
- In the methodological part, it would be useful to add a diagram, a flowchart, and mind maps (algorithm) of the research to make the solved problem more transparent. From the written text alone, the conducted research is very difficult to follow.
- Figures 6, 7, and 8 need further explanation (what do the abscissa labels mean, what are Summary 1 and 2, etc.).
- 224 - "ChatGPT1 and ChatGPT2 are two summaries of the same version of ChatGPT; however, their results differ." - explain and analyse why (see introductory remark).
- 229-248 - "Conclusions drawn from this comparison are /.../" - further in-depth explanation is needed!
- Consider pointing out, at least in the Conclusion, whether and how textbooks (teaching materials) are appropriately designed and whether they ensure the correctness and balance of all five pillars, as this theory is primarily intended to transform the school curriculum.
In general, the language is fine; it probably just needs minor editorial corrections.
Author Response
Authors’ response to Review 2:
Changes in the manuscript can be seen in the attachment to this response.
General comment: We know very little about these various derivatives of the basic theory of the 5 pillars from the paper, since the research relies exclusively on only one interpretation of this theory, that is, the Tokuhama-Espinosa theory.
Response: Yes, we agree. In this article, the Theory of the Five Pillars is taken as an example to be used by LLMs for textbook analysis. To align the article with the scope of Digital, we aimed to avoid focusing on Education Sciences. We hope that the description of the Five Pillars theory, which exceeds one page, is sufficient to provide context for the use of AI tools. More about this theory can be found in references 5, 6, 7, and 26.
Comments 1: A slightly more detailed explanation of the Theory of the Five Pillars of the Mind, their historical background, and the conceptual transition to the version you use (Tokuhama-Espinosa's) (why, and whether there are any similar versions; reference needed)
Response 1: We agree. We have added reference 29, a comprehensive study of a similar theory, in which A. diSessa predicted around 1000 “phenomenological primitives”. At the end of the 20th century, digital technologies were not advanced enough to handle such a large number of items in mutual interactions. Additionally, we included reference 30 to illustrate the state of the art in 1979.
Comments 2: More consistent interpretation and use of different LLMs.
Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have changed the abbreviations to standard and consistent abbreviations, from line 91 to the end of the article.
We have also added an explanation of ChatGPT o4 1 and ChatGPT o4 2, line 229. We used the same chatbot to execute the same task twice, but the results were inconsistent. We used ChatGPT o4 to perform this task. We have adjusted the image and the following text to provide more transparency in this methodology. We have also renamed OpenAI o4-mini to ChatGPT o4-mini consistently throughout the paper.
Comments 3: Make the Abstract more telling.
Response 3: We agree. We have, accordingly, modified the whole Abstract. In the attachment, you can see the changes in green.
Comments 4: Outline the Theory of 5 pillars in a little more detail, as we distinguish between psychological (mind), neuroscientific (brain), and educational pillars. In education, the theory is primarily intended for redesigning school curricula, so it would also be useful to consider what balanced learning material should look like. Add further references and explanations.
Response 4: Thank you for this question. We have added some explanations and references. In Education Sciences, we aim to develop the neural networks of learners, and labelling these systems provides us with tools to manage them; we have incorporated these ideas into the abstract (lines 7-12). For readers from the field of education, we recommend reference 7, a newer and more comprehensive book focused on thinking and the brain.
Comment 5: 174 "The authors identified 13 physics concepts (see Table 2) that are discussed within the /.../" - further explain this claim and link the concepts to the "Pillars".
Response 5: Yes, thank you; we have tried to clarify this in Table 2 and on lines 179-195.
Comment 6: "identify Ohm's Law and Electromagnetic induction as physics concepts discussed in the material" - you mentioned that you deal with forces, etc. What's the connection?
Response 6: On line 208, there is a list of falsely identified concepts. Of course, there is no real link (at the level of the textbook analysed) between Forces and Electromagnetic Induction; however, we tried to hypothesise how the concept could be generated.
Comment 7: In the methodological part, it would be useful to add a diagram, a flowchart, and mind maps (algorithm) of the research to make the solved problem more transparent. From the written text, it is very complicated to follow the conducted research.
Response 7: We have added a concept map (Figure 2) illustrating how we obtained data from the LLMs in order to evaluate their performance on a specific task.
Comments 8 to 10: details relevant to Comment 2.
Response: Thanks for the details. We have made changes, as mentioned in Response 2.
Comment 11: Consider pointing out, at least in the Conclusion, whether and how textbooks (teaching materials) are appropriately designed and whether they ensure the correctness and balance of all five pillars, as this theory is primarily intended to transform the school curriculum.
Response 11: This is the focus of our previous article (reference 8). Here, we would like to avoid over-generalisation, so we have only added some ideas to the Conclusion, lines 285 to 295.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed all my previous review comments and the paper has improved significantly. I recommend acceptance.
Author Response
I appreciate your support; the first review really improved the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have improved their contribution, taking into account most of the comments and justifying their responses appropriately. The article could certainly be improved further, especially with regard to the newly added Figure 2.
In Figure 2, try to follow the whole process in more detail, using standardised symbols (https://studyglance.in/c/display.php?tno=5&topic=Algorithms-and-Flow-Chart). It would be necessary to add a colour legend. Why did you use different colours?
Author Response
Comments: The authors have improved their contribution, taking into account most of the comments and justifying their responses appropriately. The article could certainly be improved further, especially with regard to the newly added Figure 2.
In Figure 2, try to follow the whole process in more detail, using standardised symbols (https://studyglance.in/c/display.php?tno=5&topic=Algorithms-and-Flow-Chart). It would be necessary to add a colour legend. Why did you use different colours?
Response: Thank you for the review. We did not initially use standard symbols; we were probably too focused on the pedagogical aspect of the topic. Standardised symbols greatly improve the readability of the flowchart. The colours are not necessary in the flowchart, so we have removed them. I appreciate your support; the review has substantially increased the quality of the article. As we note in the Conclusion, we plan to apply the MAXQDA methods to analyse a whole textbook and, at the same time, to investigate which pillars (which neural networks) can reasonably be developed through physics education at various ages. We expect initial results in about one year; more comprehensive results will require longitudinal research.
