Article

Visual Translator: Bridging Students’ Handwritten Solutions and Automatic Diagnosis of Students’ Use of Number Lines to Represent Fractions

1 Graduate School of Education, Rutgers University, New Brunswick, NJ 08901, USA
2 Department of Computer Science, Rutgers University, New Brunswick, NJ 08901, USA
3 School of Education, University of Washington, Seattle, WA 98195, USA
4 Teachers College, Columbia University, New York, NY 10027, USA
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(12), 1638; https://doi.org/10.3390/educsci15121638
Submission received: 10 September 2025 / Revised: 14 November 2025 / Accepted: 21 November 2025 / Published: 5 December 2025

Abstract

The latest AI advancements have provided opportunities for developing automated scoring and diagnosis systems that interpret and evaluate students’ written solutions and assist teachers’ grading and evaluation. Yet computer vision remains a technical challenge in detecting and describing the numerical values and spatial locations of key elements in students’ handwritten solutions to mathematics tasks. This study reports the development and evaluation of an AI-based platform, called Visual Translator (VT), that automatically detects and describes the key visual information essential to the next step of auto-grading and diagnosis. The VT was trained on a private dataset of images of students’ handwritten solutions. Human experts annotated the key elements in students’ solution images to build the ground truth. We evaluated VT’s performance by comparing its fraction value identification accuracy and location detection accuracy with those of available LLMs against the human expert annotations. Results suggested that VT surpassed GPT and Grok in fraction value identification, and also outperformed Gemini, the only LLM tested that supports image segmentation, in location detection. This model serves as the first step toward the ultimate goal of classifying problem-solving strategies and error types in students’ handwritten solutions. Implications for computer vision research and for auto-grading and diagnosis in K-12 mathematics education are discussed.

1. Introduction

Recent advancements in AI have opened up new possibilities for developing automated scoring systems that can help interpret and evaluate students’ written solutions, offering teachers a faster and more efficient way to identify patterns in student thinking and common error types. Although diagnostic assessment is crucial for making individualized instructional decisions, it remains one of the core pedagogical practices in which most teachers fall short, especially when working with students with special needs (Overton, 2016; Zhang et al., 2022). One area of AI application that fuels increasing enthusiasm is how automated scoring can support teachers’ formative assessment practice when it comes to grading and diagnosing students’ handwritten solutions in mathematics problem solving. As the first step toward this ultimate goal, this study focuses on the development and evaluation of an AI-based platform, called Visual Translator, that automatically detects and describes the key visual information essential for the next step of grading and diagnosis. Specifically, we chose students’ handwritten representations of fractions on a number line as the target task.

1.1. Importance of Using Number Lines to Represent Fractions

Fraction instruction, as the first step for students to move forward from understanding whole numbers to understanding real numbers, is a critical component of K-12 mathematics instruction. The National Mathematics Advisory Panel (2008) identified proficiency with fractions (including decimals, percentages, and negative fractions) as the most important foundational skill to be developed. Unfortunately, many students struggle with learning fractions (Lortie-Forgues et al., 2015).
Number lines are highly recommended by the Institute of Education Sciences (Siegler et al., 2010) for teaching fractions. In basic mathematics, a number line is a picture of a straight line on which every point is assumed to correspond to a real number and every real number to a point (Stewart et al., 2008). As part of the real number system, any rational number, including positive and negative integers, fractions and decimals, can be represented on a number line. From an assessment perspective, a recent population-level study with 6484 Luxembourgish ninth graders suggested that the fraction estimation task on the number line is a valid tool for assessing mathematical achievement across the entire mathematical achievement spectrum (Nuraydin et al., 2023). From an instructional perspective, representing a rational number on a number line helps students link conceptual and procedural knowledge (Siegler et al., 2010). That is, number lines help students extend their understanding of the number system from whole numbers to rational numbers and help promote students’ understanding of fraction equivalence and translation among fractions, decimals, and percentages. Number lines also help students understand the infinity and density of rational numbers (e.g., there are infinite numbers between 1/2 and 1/3) (Kullberg, 2010) and are used to teach fraction calculation (e.g., 1/2 ÷ 1/4; 1/2 + (−1/3)) in many Asian countries (Noparit & Saengpun, 2013). Moreover, the use of number lines aids students’ later learning of coordinates and functional graphs (Earnest, 2015).
Although there has been adequate evidence showing the effectiveness of teaching students to use number lines to solve fraction problems, one should not assume it is easy for students to represent a fraction on a number line. As a method for representing fractions, the number line differs from other representation methods (e.g., sets or regions; Bright et al., 1988). Simply put, the number line is more abstract. According to Piaget et al. (1960), children initially develop their sense of fractions in situations where they must divide lengths and areas into halves (1/2) and then halves again (1/4); children therefore naturally develop the ability to represent a fraction in a closed region that has a clear boundary or ends to represent a unit or a whole. However, a number line is a continuous line of numbers; some students assume that the end of what is seen on the number line is the location of 1 unit or the whole (Tunç-Pekkan, 2015). Additionally, the number line requires the use of symbols to convey part of the intended meaning in the pictorial image; a number line is meaningless until two reference points are marked to indicate a scale (Bright et al., 1988; Pettersson-Berggren, 2015). This makes it hard for students to understand that a unit can be either very long or very short on the number line. These features make the learning of number lines a difficult task. Tunç-Pekkan (2015) documented that fourth and fifth graders exhibited superior performance on fraction problems represented with circles and rectangles, which require part-whole fractional reasoning, in comparison to their performance on items represented with number lines across all problem types.
Many teachers prefer to begin fraction instruction with other representation methods rather than number lines (Leinhardt & Smith, 1985); it is also not uncommon that students generate misconceptions in using number lines to solve fraction problems because of teachers’ ineffective pedagogical methods or unclear instruction (Izsák et al., 2008).

1.2. Student Strategies and Error Types When Representing Fractions with a Number Line

Siegler et al. (2011) examined how middle school students estimated the magnitude of fraction numbers on a 0–1 number line and a 0–5 number line. They classified students’ strategies into two primary categories: segmentation strategies (e.g., division into halves, division into whole number units, division into units corresponding to the denominator) and numerical transformation strategies (e.g., rounding, simplifying, or converting a fraction to a whole number or to a decimal). Their research suggests that eighth graders used numerical transformation strategies more frequently than sixth graders. Siegler and Pyke (2013) extended the above study to examine developmental and individual differences and related students’ fraction magnitude estimation with number lines to differences in understanding of whole number division, executive functioning, and metacognitive judgments.
The literature on solving fraction problems has also offered insights into students’ error patterns in using number lines. Bright et al. (1988) examined the error types in students when solving fraction problems using number lines and reported that using the wrong unit was the most common (42%) of the errors, followed by counting marks, rather than intervals, and presenting the inverse of fractions. They also reported students’ difficulties with partitioning and mentally removing partitioning points in understanding fraction equivalence. This difficulty was also found in a later study (Izsák et al., 2008) on teaching fraction addition using number lines.
In a previous study (Zhang et al., 2017), we examined the strategies used in number line estimation among 51 middle schoolers, including 27 students with mathematics disabilities. Participants were asked to estimate 10 fractions on a 0–1 number line and 10 fractions on a 0–5 number line and to explain their procedures. We identified two faulty strategies (i.e., the not-on-the-line and ruler-tick-mark counting strategies) and two execution mistakes (i.e., unequal segmentation and inaccurate numerical transformations) on both types of number-line estimation tasks. We also identified one additional faulty strategy (i.e., treating the 0–5 number line as a 0–1 number line) on the 0–5 number-line estimation tasks. Students with mathematics disabilities were significantly more likely to use the faulty strategies than their peers without mathematics disabilities. The patterns of faulty strategies, rather than execution mistakes, were consistent across the two number-line tasks and predicted students’ performance on fraction problem-solving tasks.

1.3. Challenges in Teachers’ Grading and Diagnosis

Taken together, the wide variation in strategies and error types that emerges in students’ problem-solving processes makes it challenging for teachers to effectively grade students’ number line use, diagnose their mistakes, or interpret their reasoning.
Although teachers may easily grade students’ responses by relying on an answer key, it is challenging for them to understand why students make a particular type of mistake and what misconceptions underlie it. Accurate diagnosis depends on content knowledge in mathematics (i.e., understanding of fractions and number lines) and pedagogical knowledge in teaching mathematics (i.e., knowledge of students’ misconceptions in fraction learning). Unfortunately, mathematics education research shows that teachers vary in how well they anticipate, recognize, and interpret student misconceptions, and these knowledge differences affect the quality of diagnosis and response (Hill et al., 2008).
Given the reality that many elementary teachers may lack substantive rational number knowledge (Cramer et al., 2002; Newton, 2008; Siegler & Lortie-Forgues, 2015), in-depth analysis of students’ error types, and the individualized instruction it should inform, is beyond the comfort zone of daily assessment practice for many elementary teachers. Specifically, Newton (2008) reported that many preservice teachers had a limited understanding of fractions as numbers on a number line, often interpreting them as parts of a whole rather than points on a continuous scale. Siegler and Lortie-Forgues (2015) emphasized that preservice teachers’ misconceptions mirror those of students, especially regarding the placement of fractions on number lines. Cramer et al. (2002) found that even experienced teachers often failed to recognize the conceptual errors students made in number line tasks, focusing instead on procedural correctness. This was echoed by a recent study (Kang, 2022) observing that preservice teachers were able to correctly answer only 33% of fraction multiplication problems with a proper fraction as the multiplicand and 29% of division problems with proper fractions as the dividend.

1.4. Auto-Grading: Opportunities Offered by AI Development

Auto-grading provides opportunities for parents, educators, and students themselves who are less than proficient in the content area or lack adequate pedagogical skills to identify students’ strategies and error types. In the case of representing fractions with a number line, auto-grading tools can assist teachers, especially those who lack a conceptual understanding of fractions and number lines and cannot assess or diagnose students’ work, by providing an in-depth understanding of students’ conceptual or procedural mistakes, such as whether students partitioned the interval correctly, identified the appropriate referent unit, or placed the fraction at the correct location relative to benchmark values. That is, AI-supported systems can guide teacher users toward a deeper understanding of the mathematical reasoning embedded in students’ diagrams. This, in turn, supports more informed instructional responses, enables more consistent feedback across different evaluators, and empowers students to reflect on and improve their own representational strategies. In the next sections, we review the development of AI techniques and the challenges in auto-grading and diagnosis. In particular, we review the literature on auto-grading of students’ handwritten work, as well as our prior research on using existing LLMs for auto-grading and auto-diagnosis of students’ fraction representations on number line tasks.

1.4.1. Text Mining

Text mining, also known as text analytics or natural language processing (NLP), is a crucial component of data science and artificial intelligence. Researchers and practitioners (Farrell et al., 2024; Ke et al., 2024) have made notable strides in various aspects of text mining. In education, text mining has been used in a variety of ways, including automated grading of assignments and essays. Natural language processing algorithms can assess the quality of student writing and provide instant feedback, a valuable time-saver for educators. However, challenges persist, including bias in NLP models, data privacy concerns, and the need for more interpretability and transparency in NLP systems. Moreover, imbalanced data with rarely seen error types and some children’s irregular handwriting present additional challenges, particularly for studying the written responses of students with learning difficulties. As the field advances, addressing these challenges becomes a priority.

1.4.2. Image Processing

In the past decade, there have been impressive advances in developing computer vision (CV) algorithms for different object recognition-related problems, including instance recognition, categorization, scene recognition and pose estimation (Ravneet & Elgammal, 2012). When AI researchers investigate an image, the proposed machine learning algorithms not only recognize its object category and scene category but can also infer various semantic classifications, such as cultural and historical styles. The ability of the machine to classify styles implies that the machine has learned an internal representation that encodes discriminative features through its visual analysis of a complex image. However, it is typical that those visual features extracted and used by machines are not necessarily interpretable by humans. Prior research has reported that using semantic-level information would be more suitable than lower-level visual features (e.g., color, shades, texture or edges) for the style classification of complex visual images such as fine-art genre classification (Ravneet & Elgammal, 2012).
Unfortunately, there is little research that applies these image-data-mining techniques to manually generated student responses in the field of K-12 STEM education. There are several unique challenges in this task. Unlike general object detection/classification tasks, for which abundant labeled data such as ImageNet exists, no public labeled data on students’ responses is available. Additionally, students’ response images are noisy. Moreover, the classifications are not as clear-cut as those in common image classification tasks, such as cat, dog, car, etc.

1.5. Prior Work on Auto-Grading with Students’ Drawn Images

Although accuracy has steadily improved since its debut in 2023, the computer vision capability of ChatGPT-4o remains unreliable in processing handwritten fraction representations on a number line. Prior research on ChatGPT’s grading of student-drawn science models (Lee & Zhai, 2023) reported accuracy ranging from 0.26 to 0.64. Students’ handwritten solution images differ widely in size, slant, neatness, and mathematical notation; students may also write in unconventional orders or include exploratory work. Not all correct answers follow a single path, making rigid pattern-matching approaches insufficient; by the same token, identifying why an error occurred (e.g., conceptual misunderstanding vs. calculation slip) requires deeper semantic understanding of the work.
In our prior research (Zhang et al., 2023, 2025), we identified that the most frequently cited reason for GPT’s failure to evaluate or diagnose students’ number line solutions is its poor computer vision, with GPT often claiming that the image’s resolution is inadequate. In particular, compared with reporting the handwritten number values in the images, GPT performed even more poorly on grading tasks that require the identification of locations. For instance, although GPT may identify the values of fractions, it often makes mistakes in describing whether a fraction is to the left or right of another fraction. Indeed, among the major LLMs (e.g., Grok, GPT), only Gemini-2.5 Pro currently supports image segmentation. However, merely reading the fraction labels does not provide the necessary and sufficient visual feature information to serve as the basis for GPT to perform the diagnosis.

2. The Present Research

To fill this research gap, this use-inspired research focuses on the interpretability of image-data mining. Taking a top-down approach, we draw on our prior studies and other relevant literature on the use of number lines and fractions to identify categories of student strategies, including less productive ones, and to delineate in detail the properties of each category, which can then be used to explore, evaluate, and select the features extracted from student drawings by machine learning algorithms. The entire auto-diagnosis process involves two major steps: (i) Computer Vision (CV), which describes what is presented in an image, achieved either with our self-developed visual processing model or with the visual model embedded in a given generative AI model by asking the LLM to describe the image; and (ii) Error Type Diagnosis based on the CV output, achieved either through our own programming, by defining the extracted visual features that indicate an error type, or by asking the LLM to evaluate students’ error types based on a researcher-provided rubric. This study focuses on the first step: it describes and reports our efforts to leverage computer vision models to correctly describe the key visual details in students’ work.

Research Questions

Thus, this study is the first step toward the ultimate goal of developing and evaluating an AI-based platform capable of classifying students’ problem-solving strategies and error types. It reports the development and evaluation of an AI-based platform, Visual Translator (VT). VT targets two functionalities needed to diagnose students’ responses: (1) improved image processing of students’ solutions when representing a rational number on a number line; and (2) producing text-formatted descriptions that can then serve as input to any generative AI tool so as to fully leverage its powerful reasoning capabilities.
Our research questions include:
(a)
To what extent does the VT model demonstrate effectiveness in reading students’ handwritten numerical values in comparison to the ground truth and other LLMs (i.e., GPT-o3, GPT-4o, Gemini-2.5 Pro, and Grok-2), and
(b)
To what extent does the VT model demonstrate effectiveness in location identification in comparison with the ground truth and the other LLMs?

3. Method

3.1. Data Source

The full Mathnet dataset, comprising 3.8 million images of students’ handwritten solutions to mathematics problems, was provided by the ASSISTments platform. The database included a number of variables for each image, such as image ID, unit ID, lesson ID, problem ID, math curriculum title, Common Core Standard codes, and content tag, along with student ID and teacher ID.
We first filtered the original dataset using keywords such as “fraction” in the associated JSON metadata, which contains attribute information for each student’s work. This yielded 139k images of elementary students’ responses to fraction problems.
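The keyword-filtering step can be sketched as follows; the field name `image_id` and the one-JSON-file-per-image layout are illustrative assumptions, since the Mathnet attribute schema is not public:

```python
import json
from pathlib import Path

def filter_by_keyword(metadata_dir, keyword="fraction"):
    """Collect image IDs whose JSON metadata mentions a keyword.

    Field names (e.g., 'image_id') are illustrative; the actual
    Mathnet attribute schema may differ.
    """
    matches = []
    for path in sorted(Path(metadata_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        # Scan every attribute value for the keyword, case-insensitively.
        if any(keyword in str(value).lower() for value in record.values()):
            matches.append(record.get("image_id", path.stem))
    return matches
```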

3.2. Procedures

3.2.1. Data Preparation

To efficiently locate images that included number lines within this subset of images on fractions, we adopted an iterative active learning strategy that combined manual labeling with model-assisted filtering. We first manually labeled about 200 seed images that visibly contained number lines and used this subset to train an initial YOLOv8 (Ultralytics, 2023)-based detector as a seed model. The trained detector was then applied to several batches of images to identify likely candidates, which were manually reviewed and verified by trained graduate assistants. The verified samples were added to the training set to retrain and refine the model. This filter–verify–retrain cycle was repeated iteratively until 20,000 images had been processed. Through this pipeline, we ultimately curated 1134 high-quality images featuring 0–1 fraction number lines, which served as the foundation for subsequent key-element annotation and model development. A visualization of the overall workflow is shown in Figure 1.
The curated dataset was then divided into training (70%), validation (15%), and test (15%) subsets, a common practice in computer vision research that balances model learning, hyperparameter tuning, and unbiased evaluation: the training set provides sufficient data diversity for fitting model parameters, the validation set supports hyperparameter tuning and early stopping without overfitting, and the independent test set serves as a final hold-out for assessing generalization performance on unseen data.
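The split itself reduces to a shuffle and two cut points. A minimal sketch (the seed and function name are illustrative):

```python
import random

def split_dataset(image_ids, seed=42):
    """Shuffle and split image IDs into 70% train / 15% val / 15% test."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for reproducibility
    n_train = int(0.70 * len(ids))
    n_val = int(0.15 * len(ids))
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```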

3.2.2. Expert Annotation

Expert annotation was intended to train the VT model to detect key elements associated with fraction representations on a number line, serving as the initial step toward generating a textual description of the essential visual information in a student’s handwritten solution. A review of the mathematics education literature on number line representations for fractions (Siegler et al., 2011) suggested that the essential visual information for determining students’ problem-solving strategies and accuracy includes the endpoints (e.g., 0 and 1 for a 0–1 number line), the locations of the ticks that segment the number line into equal parts, the fraction values linked to the ticks, and any commentary number sentences or verbal explanations students may have written. As the vast majority of the Mathnet data does not include students’ commentaries, the VT focused on identifying and describing ticks and fraction values and their locations.
Trained graduate research assistants annotated the key visual elements by drawing bounding boxes around the ticks, fractions, and whole-number digits (0–9) in each image. As illustrated in Figure 2, different colors were used to distinguish label types—red for ticks, light yellow for fractions, and light blue for the digit 0, among others. Specifically, rather than treating each fraction as a single atomic unit, we deliberately labeled its constituent digits (numerator and denominator) as separate elements. This design enables the VT model to recognize individual digits as fundamental visual components, while a post-processing algorithm subsequently reconstructs complete fraction values by combining the detected digits. Such a representation not only improves the model’s interpretability but also enhances its generalizability to a wider range of fraction formats beyond those explicitly observed during training.
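The digit-combination step can be sketched as follows, assuming each detected digit carries a box center and that digits above the fraction bar form the numerator; the mean y-center stands in for the detected bar position, which is a simplifying assumption for illustration:

```python
def reconstruct_fraction(digit_boxes):
    """Combine detected digit boxes into a (numerator, denominator) pair.

    digit_boxes: list of (digit, x_center, y_center) for the digits
    belonging to one fraction. Digits above the mean y-center are read
    left to right as the numerator; digits below, as the denominator.
    """
    mean_y = sum(y for _, _, y in digit_boxes) / len(digit_boxes)
    top = sorted((d for d in digit_boxes if d[2] < mean_y), key=lambda d: d[1])
    bottom = sorted((d for d in digit_boxes if d[2] >= mean_y), key=lambda d: d[1])
    numerator = int("".join(str(d[0]) for d in top))
    denominator = int("".join(str(d[0]) for d in bottom))
    return numerator, denominator
```

Labeling digits separately and recombining them this way is what lets the model generalize to multi-digit fractions (e.g., 3/12) never seen as whole units during training.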
Two trained graduate research assistants independently annotated the visual features with bounding boxes, numerical values, and textual labels, and then cross-checked 30% of each other’s work to assess inter-rater reliability. In total, we annotated 8199 fractions, 8385 ticks, 2474 zeros, 4447 ones, 2054 threes, and other digits.

3.2.3. Model Development

Using the annotated data, we first trained an original model to detect the key elements. We then integrated this model into our VT system, which interprets the detected key elements to construct fraction values and links their spatial locations along with tick marks, thereby generating textual descriptions of the visual elements. Finally, we made the system accessible through a web application for teachers and researchers.
Step 1: Training Model for Key Elements Detection. The training was conducted on the Roboflow platform, which provides computing resources for vision model training. The key elements to be detected were determined by the requirements of our subsequent diagnostic task. Specifically, we defined the key elements as tick marks, fractions, and digits (0 to 9). The labeled data was used to train an object detection model from the YOLOv8 series, which is well-suited for real-time detection and achieves high accuracy in small object recognition.
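The detector’s label space thus comprises tick marks, whole fractions, and the ten digits. A sketch of how a YOLOv8-style dataset configuration for this label space could be rendered (class indices, paths, and file layout are illustrative assumptions, not the actual Roboflow export):

```python
# Illustrative label space: tick marks, whole fractions, and digits 0-9.
CLASS_NAMES = ["tick", "fraction"] + [str(d) for d in range(10)]

def make_data_yaml(root="mathnet_numberlines"):
    """Render a YOLOv8-style dataset config as a string.

    The directory layout under `root` is an assumption for
    illustration; a Roboflow export may organize files differently.
    """
    names = "\n".join(f"  {i}: {name}" for i, name in enumerate(CLASS_NAMES))
    return (
        f"path: {root}\n"
        "train: images/train\n"
        "val: images/val\n"
        "test: images/test\n"
        f"names:\n{names}\n"
    )
```

With the Ultralytics package, training then reduces to a call such as `YOLO("yolov8n.pt").train(data="data.yaml", epochs=100, imgsz=640)`.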
Step 2. Deploying the Model. The trained model detects key elements—ticks, fractions, and digits—and returns their labels and coordinates. The model, deployed on Roboflow, can be accessed via an API, enabling seamless integration into downstream applications. An example of the VT generated description can be found in Figure 3.
Step 3. Web App Platform. To improve accessibility, we also deployed VT on Hugging Face Spaces, enabling users to interact with it via a web interface (https://huggingface.co/spaces/wzzanthony7/MathNet (accessed on 29 June 2025)). The system supports (a) uploading an image of a student’s work, (b) displaying the detection results overlaid on the uploaded image, (c) deriving the final fraction values by clustering the digits associated with each fraction into two groups corresponding to the numerator and denominator, (d) generating a textual summary of the detected key elements along with their coordinates, and (e) providing a downloadable JSON file containing the detection results. This deployment pipeline ensures that the model can be seamlessly integrated into automated grading and diagnostic workflows, providing both visual and structured data outputs for subsequent analysis.

3.2.4. Model Evaluation

We evaluated the VT by comparing the descriptions generated by VT with (i) descriptions generated by existing LLMs (i.e., GPT-4o, Grok-2, Gemini-2.5 Pro, and GPT-o3) and (ii) descriptions produced by human experts as ground truth.
In addition to the earlier annotation of key elements used for VT model development, we pursued two additional annotation steps to establish the ground truth. As the first step, our trained graduate assistants read each image and visually identified the fraction values in the handwritten images; these expert-read values were saved as ground truth for evaluating the VT-identified fraction values. An example is illustrated in Figure 4: the first line of text under the image presents the expert-read values of the students’ handwritten fractions. Expert graduate assistants manually tagged each fraction with the value the child wrote in the image (e.g., 1/8, 2/8, 3/8, etc.). In this way, we associated a specific expert-read fraction value (e.g., 1/8) with the corresponding region outlined with a box.
In the next step, we established the ground truth for linking fractions to their corresponding ticks (if present) using index pairs in the form of F0–T1, where F0 denotes the first fraction and T1 denotes the second tick. The indices of fractions (i.e., F0, F1, F2 …) and ticks (T0, T1, T2 …) were automatically generated based on the left-to-right order of the top-left coordinates of their bounding boxes. As illustrated in Figure 4, the second line of text below the image represents the human-read associations between the fractions and the ticks.
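The indexing convention can be sketched as follows; boxes are assumed to be (x, y, w, h) tuples with (x, y) the top-left corner, and only the automatic index assignment is shown, since the fraction-tick links themselves were annotated by hand:

```python
def index_elements(fraction_boxes, tick_boxes):
    """Assign labels F0, F1, ... and T0, T1, ... by the left-to-right
    order of each bounding box's top-left x coordinate.

    Boxes are (x, y, w, h) tuples; returns two dicts mapping index
    labels to boxes.
    """
    fracs = {f"F{i}": box
             for i, box in enumerate(sorted(fraction_boxes, key=lambda b: b[0]))}
    ticks = {f"T{i}": box
             for i, box in enumerate(sorted(tick_boxes, key=lambda b: b[0]))}
    return fracs, ticks
```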
An example of the ground truth description is illustrated in Figure 5.
Our evaluation metrics reflect the core objectives of our task, which emphasizes both value accuracy and spatial precision in interpreting students’ handwritten number-line responses. Unlike conventional vision benchmarks that focus solely on object localization, our goal is twofold: (1) to accurately recognize the fraction values written by students, and (2) to correctly identify the positions of key visual elements (e.g., ticks, digits, fractions) that reflect how those values are represented on the number line.
For fraction value detection, we compared the accuracy of the fractions recognized by VT and the LLMs against the human-annotated fraction values in sequence. To evaluate the model’s ability to detect key elements, we designed two accuracy indices: the Jac Index and the Seq Index. The main difference between them lies in whether the order of detected elements is considered.
(1) Jac Index—order-independent accuracy. The Jac Index measures the overlap between the set of detected elements and the set of ground-truth elements, ignoring the order in which they appear. For example, assume the ground truth is {0/4, 1/4, 2/4, 3/4, 4/4}. For predictions such as {0/4, 1/4, 2/4, 3/4, 4/4} or {1/4, 0/4, 2/4, 3/4, 4/4}, the accuracy is the same because all five elements are correctly detected.
(2) Seq Index—order-sensitive accuracy. The Seq Index evaluates the longest subsequence of correctly detected elements that also follows the ground-truth order (from left to right). This criterion is stricter than the Jac Index, as it penalizes out-of-order detections. For example, assume the ground truth is {0/4, 1/4, 2/4, 3/4, 4/4}. For the prediction {0/4, 1/4, 2/4, 3/4, 4/4}, all five elements are correctly detected in both value and order. In contrast, the prediction {1/4, 0/4, 2/4, 3/4, 4/4} contains all the correct values but places 1/4 and 0/4 in the wrong order, resulting in only three elements being counted as correct.
The Jac Index reflects overall detection completeness, while the Seq Index adds an ordering constraint, making it a more stringent measure.
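A minimal sketch of the two indices, assuming the Jac Index is the Jaccard overlap of the two sets and the Seq Index is a longest-common-subsequence count; the exact counting rules used in the study may differ in detail:

```python
def jac_index(predicted, truth):
    """Order-independent overlap: |P ∩ G| / |P ∪ G| over the two sets."""
    p, g = set(predicted), set(truth)
    return len(p & g) / len(p | g)

def seq_index(predicted, truth):
    """Order-sensitive count: length of the longest common subsequence
    of the predicted and ground-truth sequences, i.e., the most
    elements correct in both value and left-to-right order.
    """
    m, n = len(predicted), len(truth)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if predicted[i] == truth[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]
```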
For the coordinate information, we evaluated detection accuracy by calculating the Intersection over Union (IoU) between the bounding boxes detected by the VT model and those manually annotated in the ground truth. IoU is a standard metric in computer vision that measures the degree of overlap between detected and reference regions; a threshold of 0.50 is typically used to determine whether a detection is correct.
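The IoU computation itself is standard. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form might look like the following (function and variable names are ours, not the VT codebase's):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detected tick box vs. its annotation: IoU >= 0.50 counts as correct.
print(iou((10, 10, 30, 30), (15, 10, 35, 30)))  # 0.6, so a correct detection
```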

4. Results

4.1. Fraction Value Identification

For fraction value identification, we evaluated performance using precision and recall under both the Jac Index (order-independent) and the Seq Index (order-sensitive). Precision is defined as the proportion of correctly detected elements out of all elements predicted by the model. In our case,
Precision = (# of correctly detected fraction values) / (# of all detected fraction values)
This answers the question: “Of all the fraction values detected by the model, how many are correct?”
Recall is defined as the proportion of correctly detected fraction values among all ground-truth fraction values. Formally,
Recall = (# of correctly detected fraction values) / (# of ground-truth fraction values)
This answers the question: “Of all the fraction values present in a student’s work, how many are correctly detected by the model?”
For example, assume the ground-truth fractions are {0/4, 1/4, 2/4, 3/4, 4/4} and the model predicts {0/4, 1/4, 2/4, 3/4}. In this case, the model achieves a precision of 1.0, since all four of its predictions are correct, and a recall of 0.8, since it misses one ground-truth fraction, 4/4.
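Under the order-independent (Jac) criterion, the two formulas and the worked example above can be sketched as follows. This is a minimal illustration with hypothetical names, not the evaluation code used in the study:

```python
def precision_recall(predicted, truth):
    """Precision and recall under the order-independent criterion:
    a prediction is correct if its value appears in the ground truth."""
    correct = len(set(predicted) & set(truth))
    return correct / len(predicted), correct / len(truth)

# The worked example: one ground-truth fraction (4/4) is missed.
truth = ["0/4", "1/4", "2/4", "3/4", "4/4"]
predicted = ["0/4", "1/4", "2/4", "3/4"]
p, r = precision_recall(predicted, truth)
print(p, r)  # 1.0 0.8
```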
The results show that our VT model outperforms GPT-4o and Grok-2 under both metrics, but falls slightly behind GPT-o3 and Gemini-2.5 Pro. A detailed comparison of these results is provided in Table 1.
Under the Jac Index, VT achieved a precision of 0.741 and a recall of 0.701, exceeding GPT-4o (precision = 0.638, recall = 0.592) and Grok-2 (precision = 0.455, recall = 0.541), but lower than GPT-o3 (precision = 0.770, recall = 0.695) and Gemini-2.5 Pro (precision = 0.848, recall = 0.874).
Under the Seq Index, VT obtained a precision of 0.611 and a recall of 0.582, again outperforming GPT-4o (precision = 0.521, recall = 0.499) and Grok-2 (precision = 0.352, recall = 0.416), but trailing GPT-o3 (precision = 0.659, recall = 0.606) and Gemini-2.5 Pro (precision = 0.726, recall = 0.749).

4.2. Image Segmentation

In addition to fraction values, the location information of key elements is also crucial for subsequent diagnostic tasks. We evaluated location detection performance using mAP@50 (mean Average Precision at an Intersection over Union threshold of 0.50) across these key elements. This metric, widely used in object detection, measures the average precision of predicted bounding boxes whose overlap with the ground-truth boxes (IoU) is at least 50%. Among the compared models, only Gemini-2.5 Pro currently supports image segmentation; however, its mAP@50 is only 0.11, far below the VT model, which achieves an mAP@50 of up to 0.88. As summarized in Table 2, VT consistently outperforms Gemini-2.5 Pro in detecting the locations of all four categories.
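For context, full mAP@50 is computed from confidence-ranked detections and the area under the precision-recall curve; the sketch below illustrates only the core matching rule, namely that a detection counts as a true positive when it overlaps an as-yet-unmatched ground-truth box of the same class with IoU of at least 0.50. All names are ours, and detections are assumed pre-sorted by descending confidence:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def true_positives_at_50(detections, truths):
    """Greedy matching at IoU >= 0.50: each ground-truth box may be
    claimed by at most one detection (detections ordered by confidence)."""
    matched, tp = set(), 0
    for det in detections:
        best, best_iou = None, 0.5
        for k, gt in enumerate(truths):
            score = iou(det, gt)
            if k not in matched and score >= best_iou:
                best, best_iou = k, score
        if best is not None:
            matched.add(best)
            tp += 1
    return tp

# A perfect detection and a duplicate: only one true positive is counted.
print(true_positives_at_50([(0, 0, 10, 10), (1, 1, 11, 11)], [(0, 0, 10, 10)]))  # 1
```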

5. Discussion

The purpose of this study was to develop and evaluate an AI-based platform, Visual Translator, that provides improved image processing of students' written solutions and produces text-formatted descriptions, which can then serve as input for generative AI tools in order to fully leverage their powerful reasoning capabilities. Our comparison with major LLMs suggested that (a) in fraction value identification, VT outperformed GPT-4o and Grok-2, performed close to GPT-o3, and fell below Gemini-2.5 Pro; and (b) in location identification, VT outperformed Gemini-2.5 Pro, the only LLM that currently supports image segmentation. We therefore conclude that the VT, trained with our specialized educational dataset through expert annotation, outperformed the current LLMs in describing essential visual information.
The VT not only enables efficient and accurate interpretation of images by comprehensively describing the necessary and sufficient visual elements, but also translates the key image components into a text format. In the subsequent diagnosis step, this textual description serves as the input prompt for GPT, allowing it to fully leverage its powerful reasoning capabilities. We anticipate that with these enhancements in computer vision, a prerequisite for developing auto-grading and auto-diagnosis, the error diagnoses generated from our VT model's descriptions will surpass those of the leading LLMs, given that VT has already surpassed GPT-4o and Grok-2 in fraction value identification and exceeded all LLMs in spatial location identification.

5.1. Implications for Research

Our results shed light on developing effective auto-grading and auto-diagnosis by breaking the grading process into two sub-steps: (a) developing a computer vision model that converts visual information into text, and then (b) using LLMs to process that text to reach the diagnosis goal. The results suggest that despite being in the era of LLMs, it remains valuable to train a dedicated computer vision model, such as the Visual Translator (VT), on a private educational dataset annotated by human experts. While LLMs have achieved unprecedented performance in natural language processing (NLP) tasks, their visual processing capabilities are still limited, particularly in accurately detecting and describing spatial relationships within images. This limitation is especially pronounced in educational contexts, where visual information often contains nuanced, domain-specific details that are critical for assessment.
In our case, the educational dataset consisted of students’ handwritten work in mathematics problem-solving, annotated by experts to capture key elements and the spatial arrangement of visual elements (e.g., number lines, ending points, fractions, ticks, etc.). Such spatial cues are essential for understanding student thinking but are often overlooked or misinterpreted by current multimodal LLMs. For example, when a number is misplaced on a number line, these errors carry diagnostic significance for identifying misconceptions. However, LLMs, even when augmented with vision capabilities, tend to struggle with recognizing these fine-grained spatial patterns and translating them into accurate, structured descriptions.
By contrast, a purpose-built computer vision model trained on high-quality, expert-annotated educational data can be optimized to detect, classify, and describe such spatial information with greater precision. The VT, for instance, can systematically encode visual features that align with domain-specific instructional goals, enabling it to produce detailed and pedagogically relevant descriptions. This accuracy is crucial for downstream tasks such as auto-grading and auto-diagnosis, where the quality of the visual interpretation directly impacts the validity of the assessment and the effectiveness of personalized feedback.
We recognize that the results of this study suggest that the precision and recall of computer vision, whether with VT or any available LLM, still show a substantial gap compared with human visual abilities in identifying children's handwritten fraction values or in location detection. This result is consistent with the literature: humans still outperform computer vision at understanding complex visual scenes and novel stimuli that differ from training data (e.g., children's handwritten numbers), and at contextual and relational localization such as conceptual spatial mapping (e.g., math diagrams, number lines) (Firestone, 2020; Ullman, 2019).
However, the results of this study indicated that the purpose-built VT model outperformed current LLMs, suggesting a promising direction for addressing the technical challenge of computer vision. Our findings suggest that instead of relying on end-to-end LLM grading of raw handwritten images, a more effective approach is to decouple the visual extraction process from the reasoning process. In this framework, teachers who are interested in auto-grading can upload students’ solution images to our VT web app. The system first applies specialized vision tools to extract and translate visual information into structured text descriptions, capturing the key mathematical elements present in the student’s work, such as numerical values, diagram features, interval placements, and location coordinates. By converting messy, variable handwritten work into clean, standardized textual data, the VT model effectively “normalizes” the visual input and reduces the burden on general-purpose LLMs, which typically struggle with irregular handwriting and unconventional visual representations. Once these structured text descriptions are generated, teachers can then leverage the strong text-processing, pattern-matching, and verbal-reasoning capabilities of current LLMs to perform auto-grading or diagnostic analysis. In this two-step pipeline, the LLM is no longer required to interpret complex visual inputs; instead, it works from a clear, accurate textual representation of the student’s mathematical reasoning. This approach allows teachers to benefit from AI-supported grading while avoiding the most problematic limitations of existing computer vision technologies. Overall, this represents a hybrid human-AI workflow in which specialized visual tools extract key features and LLMs provide high-level reasoning. Such a workflow may offer a scalable and pedagogically meaningful pathway for integrating AI into classroom assessment practices, especially as the technology continues to evolve.
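The decoupled pipeline described above might be glued together as follows. This is a hypothetical sketch of the hand-off from vision extraction to LLM reasoning, with invented field names and prompt wording, not the VT web app's actual output format:

```python
def vt_description_to_prompt(elements, question="Place 3/4 on the number line."):
    """Render a hypothetical VT detection result (element type, value, and
    bounding box) as the structured text description handed to an LLM."""
    lines = [f"Task: {question}", "Detected elements (left to right):"]
    for e in sorted(elements, key=lambda e: e["box"][0]):
        x1, y1, x2, y2 = e["box"]
        lines.append(f"- {e['type']} '{e['value']}' at x={x1}-{x2}, y={y1}-{y2}")
    lines.append("Diagnose the student's strategy and any error types.")
    return "\n".join(lines)

# Hypothetical VT output for one student image:
elements = [
    {"type": "digit", "value": "0", "box": (12, 40, 20, 52)},
    {"type": "fraction", "value": "3/4", "box": (150, 38, 172, 60)},
    {"type": "digit", "value": "1", "box": (208, 40, 216, 52)},
]
print(vt_description_to_prompt(elements))
```

The resulting plain-text prompt can be passed to any LLM, which then reasons over a clean textual representation instead of the raw image.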
Moreover, relying on private, domain-specific datasets addresses another challenge—data privacy and security. Educational data often contains personally identifiable information or sensitive student work that cannot be shared openly. Training a specialized vision model in-house ensures that student data remains protected while still enabling the development of highly effective AI-based assessment tools.

5.2. Implications for Practice

AI development has moved beyond auto-grading student work for correctness and expanded to auto-diagnosis, which involves identifying specific reasoning patterns, error types, and conceptual misunderstandings. Auto-grading and diagnosis systems may drastically reduce the time teachers spend manually scoring assignments, quizzes, and exams. This efficiency is especially valuable for inexperienced teachers without adequate expertise in diagnosing students' error types, and for high-needs school districts that often experience a shortage of mathematics teachers. When auto-grading and auto-diagnosis are integrated into adaptive learning platforms, they enable personalized learning paths. Students can receive tailored practice exercises, resources, and challenges based on their performance data. This creates a more inclusive learning environment, ensuring that advanced learners are challenged while struggling students receive timely support. For mathematics education, especially at the K-12 level, these capabilities are crucial for timely feedback, formative assessment, and targeted intervention.
The rapid development of AI in education has expanded beyond grading multiple-choice and typed responses to tackling the more complex task of assessing students’ handwritten mathematics work. To be fair, we should point out that computer vision still lags significantly behind human visual abilities in many grading and diagnostic tasks that involve unusual or non-standard image inputs, such as students’ handwritten notes, unconventional number forms, erased or overwritten work, or diagrams drawn in idiosyncratic ways. Current computer vision models are typically trained on large datasets of regular, high-quality images that differ substantially from the messy, varied, and often ambiguous work students produce in real classrooms. As a result, even state-of-the-art large vision-language models (LVLMs) may struggle with recognizing atypical handwriting, distinguishing between intentional markings and stray pen strokes, or accurately interpreting diagram features that deviate from standard visual patterns. Given these limitations, teachers should exercise caution when relying solely on available LLMs or educational AI platforms, especially those that depend heavily on computer vision components, to perform auto-grading or auto-diagnosis of students’ handwritten work. While these models can provide valuable support and efficiency gains, they may also produce misinterpretations or overlook meaningful student strategies that a human teacher would immediately recognize. Therefore, AI-generated analyses should be used as assistive tools rather than unquestioned evaluators, with teachers maintaining an essential role in verifying outputs, interpreting student reasoning, and ensuring the accuracy and fairness of grading.
On the other hand, although technical bottlenecks remain, particularly the limited accuracy of the computer vision components in most LLMs, the results of this study nonetheless point to a notably promising direction for applying AI in auto-grading and auto-diagnosis in K12 education using students’ handwritten solutions. The hybrid approach demonstrated in this study (i.e., specialized vision tools first extract key mathematical features, and LLMs subsequently perform reasoning and grading) offers several important implications for classroom instruction. It presents a practical pathway for integrating AI into assessment, with the potential to enhance the diagnostic value of classroom assessments. Teachers often face substantial workloads when grading open-ended mathematical tasks, especially those requiring detailed evaluation of students’ reasoning, diagram use, or conceptual understanding. By using VT-generated text descriptions as input for LLM-based grading, teachers can obtain rapid, consistent analyses of students’ work.
Moreover, this model democratizes access to advanced assessment technologies. Schools with limited resources often cannot adopt fully engineered, commercial auto-grading platforms. The VT-to-LLM pipeline can be implemented with tools that are widely accessible, relatively low-cost, and continually improving. Even teachers without technical expertise can upload images, receive structured outputs, and use LLMs, many of which are available through user-friendly interfaces, to generate actionable insights about their students’ learning. This has implications for equity, particularly in under-resourced districts where assessment support may be limited.
Last, the approach encourages a more transparent and interpretable form of AI-supported grading. Because the VT-generated text descriptions explicitly list the extracted numerical values, coordinates, and diagram features, teachers can easily inspect and verify what the AI “saw.” This stands in contrast to black-box end-to-end models, where errors are often opaque and difficult to identify. Such transparency not only improves teacher trust but also supports responsible AI integration consistent with emerging ethical standards in education.

5.3. Limitations & Future Research

To further improve our VT model, we are developing a post-processing module based on reinforcement learning (RL). Unlike the initial detection stage, which identifies digits, fractions, and ticks, this RL-based module operates on the detected items to perform self-correction. For example, if a digit “4” is detected but appears after a sequence ranging from 1 to 5, it is likely a misrecognition and should be corrected to “6.” Similarly, a tick mark that is mistakenly recognized as “1” can be revised back to a tick.
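The planned module is RL-based; a purely rule-based stand-in can nonetheless illustrate the kind of context-driven correction intended, assuming the number line is labeled with consecutive integers (the function name and heuristic are ours, not the actual module):

```python
def correct_sequence(digits):
    """Rule-based stand-in for the planned RL self-correction step: if a
    detected digit breaks an otherwise increasing run of consecutive
    labels, replace it with the value the run predicts (e.g., a '4' read
    after 1..5 becomes '6')."""
    fixed = digits[:1]
    for d in digits[1:]:
        expected = fixed[-1] + 1
        # keep the detection when it matches, override a likely misread
        fixed.append(d if d == expected else expected)
    return fixed

print(correct_sequence([1, 2, 3, 4, 5, 4]))  # [1, 2, 3, 4, 5, 6]
```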
This study demonstrates the high potential of AI use in error diagnosis across the board in early mathematics. Our future research will extend the current work on auto-grading and diagnosis of proper fractions on a 0–1 number line to a broader range of rational numbers. Specifically, we are expanding our labeled training data and broadening the scope of our research from proper fractions on the 0–1 number line to include improper fractions, mixed fractions, decimals, and negative numbers, thereby making fuller use of number lines for understanding students’ numerical development.

Author Contributions

Conceptualization, D.Z. and M.L.; model development, Z.W.; model evaluation, Z.W., M.L. and Y.T.; writing—original draft preparation, D.Z. and Z.W.; writing—review and editing, M.L. and Y.T.; visualization, Z.W.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jaffe Foundation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Assistments and are available at https://www.assistments.org/ (accessed on 29 June 2025) with the permission of Assistments.

Acknowledgments

During the preparation of this manuscript, the authors used OpenAI tools for grammar checking and language editing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
API: Application Programming Interface
AI: Artificial Intelligence
CV: Computer Vision
GPT: Generative Pre-trained Transformer
GPT-4o: Generative Pre-trained Transformer 4 Omni
K12: Kindergarten to 12th grade
ID: Identifier
IoU: Intersection over Union
JSON: JavaScript Object Notation
LLM: Large Language Model
mAP@50: mean Average Precision at an Intersection over Union threshold of 0.50
RL: Reinforcement Learning
STEM: Science, Technology, Engineering, and Mathematics
VT: Visual Translator
YOLOv8: You Only Look Once, version 8

References

  1. Bright, G. W., Behr, M. J., Post, T. R., & Wachsmuth, I. (1988). Identifying fractions on number lines. Journal for Research in Mathematics Education, 19(3), 215–232.
  2. Cramer, K. A., Post, T. R., & delMas, R. C. (2002). Initial fraction learning by fourth- and fifth-grade students: A comparison of the effects of using commercial curricula with the Rational Number Project curriculum. Journal for Research in Mathematics Education, 33(2), 111–144.
  3. Earnest, D. (2015). From number lines to graphs in the coordinate plane: Investigating problem solving across mathematical representations. Cognition and Instruction, 33(1), 46–87.
  4. Farrell, M. J., Le Guillarme, N., Brierley, L., Hunter, B., Scheepens, D., Willoughby, A., Yates, A., & Mideo, N. (2024). The changing landscape of text mining: A review of approaches for ecology and evolution. Proceedings of the Royal Society B, 291, 20240423.
  5. Firestone, C. (2020). Performance vs. competence in human–machine comparisons. Proceedings of the National Academy of Sciences, 117(43), 26562–26571.
  6. Hill, H. C., Ball, D. L., & Schilling, S. G. (2008). Unpacking pedagogical content knowledge: Conceptualizing and measuring teachers’ topic-specific knowledge of students. Journal for Research in Mathematics Education, 39(4), 372–400.
  7. Izsák, A., Tillema, E., & Tunç-Pekkan, Z. (2008). Teaching and learning fraction addition on number lines. Journal for Research in Mathematics Education, 39(1), 33–62.
  8. Kang, H. J. (2022). Preservice elementary teachers’ understanding of fraction multiplication and division in multiple contexts. International Electronic Journal of Elementary Education, 15(2), 109–121.
  9. Ke, Z. T., Ji, P., Jin, J., & Li, W. (2024). Recent advances in text analysis. Annual Review of Statistics and Its Application, 11, 347–372.
  10. Kullberg, A. (2010). What is taught and what is learned: Professional insights gained and shared by teachers of mathematics. Department of Pedagogical, Curricular and Professional Studies, Institutionen för Didaktik och Pedagogisk Profession.
  11. Lee, G., & Zhai, X. (2023). NERIF: GPT-4V for automatic scoring of drawn models. arXiv.
  12. Leinhardt, G., & Smith, D. A. (1985). Expertise in mathematics instruction: Subject matter knowledge. Journal of Educational Psychology, 77(3), 247–271.
  13. Lortie-Forgues, H., Tian, J., & Siegler, R. S. (2015). Why is learning fraction and decimal arithmetic so difficult? Developmental Review, 38, 201–221.
  14. National Mathematics Advisory Panel. (2008). Foundations for success: The final report of the National Mathematics Advisory Panel. U.S. Department of Education.
  15. Newton, K. J. (2008). An extensive analysis of preservice elementary teachers’ knowledge of fractions. American Educational Research Journal, 45(4), 1080–1110.
  16. Noparit, T., & Saengpun, J. (2013). How student teachers use proportional number line to teach multiplication and division of fraction: Professional learning in context of lesson study and open approach. Creative Education, 4(8), 19–24.
  17. Nuraydin, S., Stricker, J., Ugen, S., Martin, R., & Schneider, M. (2023). The number line estimation task is a valid tool for assessing mathematical achievement: A population-level study with 6484 Luxembourgish ninth-graders. Journal of Experimental Child Psychology, 225, 105521.
  18. Overton, T. (2016). Assessing learners with special needs: An applied approach. Pearson.
  19. Pettersson-Berggren, M. B. G. (2015). Teachers developing teaching: A comparative study on critical features for pupils’ perception of the number line. International Journal for Lesson and Learning Studies, 4(4), 383–400.
  20. Piaget, J., Inhelder, B., & Szeminska, A. (1960). The child’s conception of geometry. Routledge.
  21. Ravneet, S. A., & Elgammal, A. (2012, November 11–15). Towards automated classification of fine-art painting style: A comparative study. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan.
  22. Siegler, R. S., Carpenter, T., Fennell, F., Geary, D., Lewis, J., Okamoto, Y., Thompson, L., & Wray, J. (2010). Developing effective fractions instruction for kindergarten through 8th grade: A practice guide (NCEE 2010-4039). National Center for Education Evaluation and Regional Assistance. Available online: https://ies.ed.gov/ncee/wwc/docs/practiceguide/fractions_pg_093010.pdf (accessed on 29 June 2025).
  23. Siegler, R. S., & Lortie-Forgues, H. (2015). Conceptual knowledge of fraction arithmetic. Journal of Educational Psychology, 107(3), 909.
  24. Siegler, R. S., & Pyke, A. A. (2013). Developmental and individual differences in understanding of fractions. Developmental Psychology, 49(10), 1994–2004.
  25. Siegler, R. S., Thompson, C. A., & Schneider, M. (2011). An integrated theory of whole number and fraction development. Cognitive Psychology, 62, 273–296.
  26. Stewart, J., Redlin, L., & Watson, S. (2008). Precalculus: Mathematics for calculus (5th ed.). Cengage Learning.
  27. Tunç-Pekkan, Z. (2015). An analysis of elementary school children’s fractional knowledge depicted with circle, rectangle, and number line representations. Educational Studies in Mathematics, 89, 419–441.
  28. Ullman, S. (2019). Using neuroscience to develop artificial intelligence. Science, 363(6428), 692–693.
  29. Ultralytics. (2023). Ultralytics YOLO: Real-time object detection and segmentation [Computer software]. Ultralytics. Available online: https://docs.ultralytics.com (accessed on 29 June 2025).
  30. Zhang, D., Li, M., Wang, Z., Lu, Q., & Deng, D. (2023, December 1). Automatic screening and diagnosis of students’ use of number lines to solve fraction problems. AI Education Summit, Notre Dame, Indiana.
  31. Zhang, D., Maher, C., & Wilkinson, L. (2022). What is meaningful assessment? In Y. Xin, R. Tzur, & H. Thouless (Eds.), Enabling mathematics learning of struggling students. Springer.
  32. Zhang, D., Stecker, P. M., & Beqiri, K. (2017). Understanding the faulty strategies in estimating fractions on number lines among students with and without mathematics disabilities. Learning Disability Quarterly, 40(4), 225–236.
  33. Zhang, D., Wang, Z., & Li, M. (2025, March 12–15). Automated classification of student problem-solving style in representing fractions with a number line. CEC 2025, Baltimore, MD, USA.
Figure 1. The workflow to identify images including 0–1 number lines.
Figure 2. Manual labeling of the fraction values in one example image of a student’s handwritten solution. Different colors were used to distinguish label types: red for ticks, light yellow for fractions, light blue for the digit 0, and additional colors for the other digits.
Figure 3. A screenshot of the VT-generated description. Different colors were used to distinguish label types: pink for ticks, orange for fractions, light green for the digit 0, and additional colors for the other digits.
Figure 4. Establishing ground truth for fraction values and fraction-tick link in one example image. Different colors were used to distinguish label types: orange for ticks, red for fractions, blue for the digit 0, and pink for the digit 1.
Figure 5. Description generated with ground truth.
Table 1. Fraction Value Identification Accuracy of VT and LLMs in Comparison against Ground Truth.

Model            Precision (Jac)   Recall (Jac)   Precision (Seq)   Recall (Seq)
VT               0.741             0.701          0.611             0.582
Grok-2           0.455             0.541          0.352             0.416
GPT-4o           0.638             0.592          0.521             0.499
GPT-o3           0.770             0.695          0.659             0.606
Gemini-2.5 Pro   0.848             0.874          0.726             0.749
Table 2. Location Detection Accuracy between VT and Gemini in Comparison against Ground Truth.

                        Precision (VT)   Recall (VT)   Precision (Gemini)   Recall (Gemini)
Location of ticks       0.761            0.756         0.180                0.176
Location of fractions   0.902            0.961         0.534                0.556
Location of ones        0.209            0.555         0                    0
Location of zeros       0.527            0.838         0                    0

Share and Cite

MDPI and ACS Style

Zhang, D.; Wang, Z.; Li, M.; Tao, Y. Visual Translator: Bridging Students’ Handwritten Solutions and Automatic Diagnosis of Students’ Use of Number Lines to Represent Fractions. Educ. Sci. 2025, 15, 1638. https://doi.org/10.3390/educsci15121638

