Automating Spatial Visualisation of Handwritten Vector Equations Using Large Vision Models in Pre-Tertiary Mathematics
Abstract
1. Introduction
2. Objectives
3. Literature Review
3.1. The Pedagogical Importance of Spatial Visualisation Tools
3.2. Learning with Visualisations Helps: A Meta-Analysis of Visualisation Interventions in Mathematics Education
3.3. Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions
3.4. Handwritten Mathematical Expression Recognition Using Deep Learning Techniques
4. Methodology
4.1. Preparing Input Images of Handwritten Equations
4.2. Image Processing, GPT-4o’s Computer Vision & Prompt Engineering
- Listing 1. Python-based function (version 3.13.2) to Encode the Image.

- Listing 2. Python-based code snippet (version 3.13.2) using GPT-4o to Analyse the Input Image.

- (i)
- Zero-shot
- Listing 3. Prompt used for zero-shot prompting.

- (ii)
- Few-shot
- Listing 4. Prompt used for few-shot prompting.

- (iii)
- Chain-of-thought
- Listing 5. Prompt used for chain-of-thought prompting.

- (iv)
- Multi-shot
- Listing 6. Prompt used for multi-shot prompting.

4.3. Processing the Data and Visualisation
4.4. Solving Vector-Related Problems
- Listing 7. Snippet of Python-based OOP function (version 3.13.2) to calculate vector-related problems.

4.5. Performance Analysis
- Symbol and Number Recognition Accuracy: Correct identification of all symbols, numbers, and operators.
- Positional Accuracy: Correct spatial placement, ensuring the reconstructed equations maintained structural integrity.
- Formatting Compliance: Adherence to the specified equation format necessary for subsequent processing stages.
- Correct (Full Credit): The model perfectly identifies all vector components and mathematical operations, and the output is structurally identical to the strictly requested formatting (e.g., [[1, 2, 3], [4, 5, 6], ‘addition’]).
- Partially Correct (Partial Credit): The model correctly captures the majority of the mathematical intent but falls short of a perfect conversion. This includes outputs with minor transcription errors (e.g., misreading a single integer), correctly identifying the vectors but mislabelling the operation (or vice versa), or successfully extracting the correct mathematical data but failing to adhere to the strict Python list formatting requested by the prompt.
- Failed (No Credit): The model completely fails to recognise the handwritten input as a mathematical expression, hallucinates major structural components, or produces an output that cannot be reliably parsed by the subsequent programmatic stages (e.g., admitting inability to read the image or generating plain text without data structures).
4.6. Comparison with Other Models
5. Results
5.1. Pipeline
5.2. Performance of GPT-4o
5.3. Performance of Other Models
6. Discussion
6.1. Overview of Findings
6.1.1. Viability of the System
6.1.2. Cost and Effect
6.1.3. GPT-4o’s Position Amongst the Models
6.2. Limitations
- Handwriting ambiguities frequently lead to character misinterpretation. Negative signs are particularly vulnerable; they are often missed entirely or mistaken for stray marks, which alters the mathematical value significantly (e.g., interpreting [−1, −1, 0] as [1, −1, 0]). Additionally, cramped handwriting occasionally causes the model to confuse structurally similar digits, such as 0 and 8.
- When processing multiple vectors, the model applies wrapping brackets inconsistently. Instead of outputting the expected flat list structure (e.g., [[1, 2, 3], [4, 5, 6], ‘addition’]), it frequently over-nests the components (e.g., [[[1, 2, 3], [4, 5, 6]], ‘addition’]). In more complex syntactic structures, such as linear combinations, the model sometimes drops scalar multipliers entirely and extracts only the base vectors.
- If the original handwritten text is tightly cramped or faded at the edges, the model occasionally truncates three-dimensional vectors, mistakenly extracting a 3D coordinate like [1, 0, 0] as a 2D coordinate [1, 0].
- The model exhibits minor syntactic instability in its descriptive string labels. It frequently oscillates between plural and singular nouns (“vector” versus “vectors”) and swaps spaces for underscores (“cross product” versus “cross_product”). Such quirks require robust downstream parsing scripts to prevent strict string-matching algorithms from flagging them as complete failures.
6.3. Future Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Model | Input | Cached Input | Output |
|---|---|---|---|
| gpt-4.1 (14 April 2025) | $2.00 | $0.50 | $8.00 |
| gpt-4.1-mini (14 April 2025) | $0.40 | $0.10 | $1.60 |
| gpt-4.1-nano (14 April 2025) | $0.10 | $0.025 | $0.40 |
| o4-mini (16 April 2025) | $1.10 | $0.275 | $4.40 |
| gpt-4o-mini (18 July 2024) | $0.15 | $0.075 | $0.60 |
| gpt-4-turbo (29 April 2024) | $10.00 | NIL | $30.00 |
References
- Duval, R. A Cognitive Analysis of Problems of Comprehension in a Learning of Mathematics. Educ. Stud. Math. 2006, 61, 103–131. [Google Scholar] [CrossRef]
- Sabah, S. Science and engineering students’ difficulties in understanding vector concepts. Eurasia J. Math. Sci. Technol. Educ. 2023, 19, em2310. [Google Scholar] [CrossRef] [PubMed]
- Arcavi, A. The role of visual representations in the learning of mathematics. Educ. Stud. Math. 2003, 52, 215–241. [Google Scholar] [CrossRef]
- Battista, M. The development of geometrical and spatial thinking. In Second Handbook of Research on Mathematics Teaching and Learning; Emerald Publishing Limited: Cambridge, MA, USA, 2007; pp. 843–908. [Google Scholar]
- Resnick, I.; Harris, D.; Logan, T.; Lowrie, T. The relation between mathematics achievement and spatial reasoning. Math. Ed. Res. J. 2020, 32, 171–174. [Google Scholar] [CrossRef]
- Herrera, L.M.M.; Ordóñez, S.J.; Ruiz-Loza, S. Enhancing mathematical education with spatial visualisation tools. Front. Educ. 2024, 9, 1229126. [Google Scholar] [CrossRef]
- Schoenherr, J.; Strohmaier, A.R.; Schukajlow, S. Learning with visualisations helps: A meta-analysis of visualisation interventions in mathematics education. Educ. Res. Rev. 2024, 45, 100639. [Google Scholar] [CrossRef]
- Tomita, K.; Nishida, T.; Kitaguchi, Y.; Kitazawa, K.; Miyake, M. Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions. Clin. Ophthalmol. 2025, 19, 1557–1564. [Google Scholar] [CrossRef] [PubMed]
- Kalpana, Y.; Benita, P.S. Handwritten mathematical expression recognition using deep learning techniques. J. Neonatal Surg. 2025, 14, 516–523. [Google Scholar]
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Chi, E.; Le, Q.; Zhou, D. Chain-of-Thought prompting elicits reasoning in large language models. arXiv 2022, arXiv:2201.11903. [Google Scholar] [CrossRef]
- OpenAI. GPT-4 Technical Report. 2023. Available online: https://openai.com/research/gpt-4 (accessed on 20 June 2025).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Yasuhara, T.; Watanabe, T.; Yamaguchi, T. Handwritten mathematical symbol recognition system considering writing variability. Int. J. Pattern Recognit. Artif. Intell. 2017, 31, 1757002. [Google Scholar]
- Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling laws for neural language models. arXiv 2020, arXiv:2001.08361. [Google Scholar] [CrossRef]
- Xie, Y.; Mouchère, H.; Simistira, L.F.; Rakesh, S.; Saini, R.; Nakagawa, M.; Nguyen, C.T.; Truong, T.N. ICDAR 2023 CROHME: Competition on Recognition of Handwritten Mathematical Expressions [Data set]. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR 2023), San José, CA, USA, 21–26 August 2023. [Google Scholar] [CrossRef]
- Gervais, P.; Fadeeva, A.; Maksai, A. MathWriting: A dataset for handwritten mathematical expression recognition. arXiv 2024, arXiv:2404.10690. [Google Scholar] [CrossRef]
- Tengan, D.; Wang, H. Variability in mathematical notation across different cultures and curricula. Educ. Process. 2020, 7, 134–145. [Google Scholar]
- Zanibbi, R.; Blostein, D. Recognition and retrieval of mathematical expressions. IJDAR 2012, 15, 331–357. [Google Scholar] [CrossRef]
- Long, Y.; Wang, Z.; Huang, J. Handwritten mathematical expression recognition with deep learning: Challenges and prospects. Pattern Recognit. Lett. 2022, 157, 47–55. [Google Scholar]
- Sharma, A.; Singh, K.; Mishra, R. Noise-robust handwritten mathematical expression recognition: A survey. Image Vis. Comput. 2019, 86, 1–15. [Google Scholar] [CrossRef]
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 2023, 36, 34892–34916. [Google Scholar]
- Zhai, X.; Kolesnikov, A.; Houlsby, N.; Beyer, L. Scaling vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12104–12113. [Google Scholar] [CrossRef]




















| Operation Type | Addition | Cross Product |
|---|---|---|
| Original handwritten equation | ![]() | ![]() |
| Step 1: GPT-4o’s Interpretation | ![]() | ![]() |
| Step 2: Graph Produced | ![]() | ![]() |
| Step 3: Calculation | ![]() | ![]() |
| Prompting Technique | Average Input Tokens * | Average Output Tokens | Average Cost per Query ** | Average Response Time/s | Accuracy |
|---|---|---|---|---|---|
| Zero-shot *** | 447.40 | 44.38 | US$0.001562 | 2.504 | 48.2384% |
| Few-shot | 547.40 | 18.80 | US$0.001557 | 1.547 | 80.6667% |
| Multi-shot | 674.40 | 18.81 | US$0.001874 | 1.458 | 84.5833% |
| Chain-of-thought | 550.40 | 18.79 | US$0.001564 | 1.519 | 63.9167% |
| Model | Average Input Tokens * | Average Output Tokens | Average Cost ** | Average Response Time | Accuracy |
|---|---|---|---|---|---|
| GPT-4o-mini | 13,427.94 | 17.88 | US$0.002025 | 1.114 | 75.6667% |
| GPT-4o-turbo | 675.40 | 18.44 | US$0.007307 | 2.759 | 70.6667% |
| o4-mini | 692.93 | 236.11 | US$0.001801 | 3.319 | 51.5000% |
| GPT-4.1 | 675.40 | 18.92 | US$0.001502 | 1.174 | 89.5833% |
| GPT-4.1-nano | 872.34 | 18.73 | US$0.000117 | 0.467 | 70.9167% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lim, K.Y.T.; Le, N.T.M.; Chanoudam, S. Automating Spatial Visualisation of Handwritten Vector Equations Using Large Vision Models in Pre-Tertiary Mathematics. Multimodal Technol. Interact. 2026, 10, 68. https://doi.org/10.3390/mti10060068
Lim KYT, Le NTM, Chanoudam S. Automating Spatial Visualisation of Handwritten Vector Equations Using Large Vision Models in Pre-Tertiary Mathematics. Multimodal Technologies and Interaction. 2026; 10(6):68. https://doi.org/10.3390/mti10060068
Chicago/Turabian StyleLim, Kenneth Y. T., Nguyen Thanh Minh Le, and Sopheap Chanoudam. 2026. "Automating Spatial Visualisation of Handwritten Vector Equations Using Large Vision Models in Pre-Tertiary Mathematics" Multimodal Technologies and Interaction 10, no. 6: 68. https://doi.org/10.3390/mti10060068
APA StyleLim, K. Y. T., Le, N. T. M., & Chanoudam, S. (2026). Automating Spatial Visualisation of Handwritten Vector Equations Using Large Vision Models in Pre-Tertiary Mathematics. Multimodal Technologies and Interaction, 10(6), 68. https://doi.org/10.3390/mti10060068









