Education Sciences
  • This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

5 December 2025

Visual Translator: Bridging Students’ Handwritten Solutions and Automatic Diagnosis of Students’ Use of Number Lines to Represent Fractions

1 Graduate School of Education, Rutgers University, New Brunswick, NJ 08901, USA
2 Department of Computer Science, Rutgers University, New Brunswick, NJ 08901, USA
3 School of Education, University of Washington, Seattle, WA 98195, USA
4 Teachers College, Columbia University, New York, NY 10027, USA
Educ. Sci. 2025, 15(12), 1638; https://doi.org/10.3390/educsci15121638
(registering DOI)
This article belongs to the Special Issue Conceptual Understanding in Mathematics: Focusing on Students with Learning Disabilities or Difficulties

Abstract

Recent AI advances have created opportunities to develop automated scoring and diagnosis systems that interpret and evaluate students’ written solutions and assist teachers with grading and evaluation. However, detecting and describing the numerical values and spatial locations of key elements in students’ handwritten solutions to mathematics tasks remains a technical challenge for computer vision. This study reports the development and evaluation of an AI-based platform, called Visual Translator (VT), that automatically detects and describes the key visual information essential to the next step of auto-grading and diagnosis. The VT was trained on a private dataset of images of students’ handwritten solutions. Human experts annotated the key elements in these images to establish the ground truth. We evaluated VT by comparing its fraction value identification accuracy and location detection accuracy with those of available LLMs, using the human expert annotations as the reference. Results suggested that VT surpassed GPT and Grok in fraction value identification and outperformed Gemini, the only LLM that supports image segmentation, in location detection. This model serves as a first step toward the ultimate goal of classifying problem-solving strategies and error types in students’ handwritten solutions. Implications for computer vision research and for auto-grading and diagnosis in K-12 mathematics education are discussed.
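As an illustration only, the two evaluation measures named in the abstract could be computed along the following lines. The abstract does not specify how either accuracy was defined, so everything here is an assumption: the function names, the exact-match rule for fraction values, the use of intersection over union (IoU) for bounding boxes, and the 0.5 threshold are hypothetical choices, not the authors' method.

```python
def value_accuracy(predicted, annotated):
    """Share of annotated fraction labels that the model read exactly.

    Assumption: labels are strings such as "3/4", paired by position,
    and "2/4" is NOT counted as a correct reading of "1/2".
    """
    if not annotated:
        return 0.0
    hits = sum(
        1
        for p, a in zip(predicted, annotated)
        if p.replace(" ", "") == a.replace(" ", "")
    )
    return hits / len(annotated)


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Width and height of the overlapping region (zero if boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def location_accuracy(pred_boxes, true_boxes, threshold=0.5):
    """Share of annotated boxes matched by the paired prediction.

    Assumption: a prediction counts as correct when its IoU with the
    human-annotated box reaches the (hypothetical) threshold.
    """
    if not true_boxes:
        return 0.0
    hits = sum(
        1 for p, t in zip(pred_boxes, true_boxes) if iou(p, t) >= threshold
    )
    return hits / len(true_boxes)
```

Under these assumptions, each system (VT or an LLM) would be scored against the same human expert annotations, so the resulting accuracies are directly comparable across systems.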
