Automating Spatial Visualisation of Handwritten Vector Equations Using Large Vision Models in Pre-Tertiary Mathematics
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
A very interesting paper, carefully written and containing highly relevant information on the current (and future) state of mathematics education in the context of AI.
I must confess that I am not an expert in the tools (prompting techniques) that the authors describe and develop, but I consider the goals of the paper, the description of the experience, the results and the discussion to be highly relevant and well executed.
My only suggestion concerns Figures 9, 10, 11 and 12, which present "...some example visualisations of other vector operations...". In my opinion, the paper does not make clear whether the presented visualizations, which I regard as the final and most important output for helping students work with and visualize 3D geometry, offer any of the manipulation features that Dynamic Geometry programs usually provide, such as dragging a vector's origin, changing the perspective, or zooming in and out. I consider this information one of the most important points concerning "...visualising and manipulating objects in three-dimensional space based on abstract equations...", as the authors themselves recognize in the Introduction.
Author Response
Please see the attachment
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
1. Originality and Relevance
The manuscript targets a highly relevant and innovative intersection: using Large Vision Models (LVMs) to address core cognitive challenges in advanced secondary mathematics education. Specifically, it addresses a well-documented educational gap: the difficulty pre-tertiary students face when translating abstract symbolic notation for three-dimensional vectors into accurate spatial visualizations. Using computer vision to automatically generate immediate, interactive 3D graphical feedback from handwritten equations represents a highly original and practical pedagogical intervention.
2. Contribution to the Subject Area
Compared with existing published material that typically investigates general AI chat tools or pre-rendered virtual math simulations, this study provides a significant contribution by evaluating a substantial, custom dataset. Testing the system against 1,000 handwritten vector equations modeled directly after a standardized national curriculum (the Singapore-Cambridge GCE 'A' Level H2 Mathematics syllabus) gives this study high empirical value and contextual authenticity. Establishing GPT-4o as a capable baseline for interpreting handwritten syntax offers solid benchmarks for developers building multimodal educational software.
3. Consistency of Conclusions
The conclusions are consistent with the experimental evidence and arguments presented. The data adequately shows that immediate multimodal visual feedback bridges the abstract-geometric cognitive gap for students. The authors are appropriately cautious in acknowledging that while the LVM serves as an effective parsing engine, specific instructional guardrails are still necessary to completely optimize the tool for self-directed learning environments.
4. Appropriateness of References
The references are appropriate, well-targeted, and up-to-date. The bibliography cleanly connects foundational machine learning concepts (such as Vision Transformers and Chain-of-Thought prompting) with specialized handwriting recognition competitions (e.g., the CROHME datasets) and classic spatial reasoning pedagogy.
5. Additional Comments on Tables and Figures
The data presentation effectively demonstrates the model's accuracy and performance limits across different equation formats. To make the manuscript fully ready for publication, the following minor points should be addressed:
- Ensure that the performance metrics distinguishing correct, partially correct, and failed vision-model conversions use uniform notation and formatting across all analytical sections.
- Suggestion for Authors: While the textual description of the model pipeline is clear, adding a simple structural diagram of the execution workflow, from the user's raw handwritten vector input through the LVM segmentation layer to the generated 3D graphical visualization output, would greatly benefit readers from purely pedagogical backgrounds.
Minor Revisions Recommended:
- Error Characterization: Provide a brief breakdown, or qualitative examples, of the handwriting styles or symbolic syntax configurations that most commonly caused parsing failures in the baseline model, to assist future educational tool developers.
- Technical Environment Disclosure: Briefly state the exact parameters used during testing (such as temperature and API model version) to support future replication of the accuracy metrics.
Author Response
Please see the attachment
Author Response File: Author Response.docx

