Article

CAC: A Learning Context Recognition Model Based on AI for Handwritten Mathematical Symbols in e-Learning Systems

1 Department of e-Learning, Graduate School, Korea National Open University, Seoul 03087, Korea
2 Department of Computer Science, Korea National Open University, Seoul 03087, Korea
3 Department of Computer Science and Engineering, Jeonju University, Jeonju 55069, Korea
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(8), 1277; https://doi.org/10.3390/math10081277
Submission received: 26 January 2022 / Revised: 7 April 2022 / Accepted: 9 April 2022 / Published: 12 April 2022

Abstract

The e-learning environment should support the handwriting of mathematical expressions and accurately recognize inputted handwritten mathematical expressions. To this end, expression-related information should be fully utilized in e-learning environments. However, pre-existing handwritten mathematical expression recognition models mainly utilize the shape of handwritten mathematical symbols, which limits their ability to improve the recognition accuracy of vaguely represented symbols. Therefore, in this paper, a context-aided correction (CAC) model is proposed that adjusts the output of handwritten mathematical symbol (HMS) recognition by additionally utilizing information related to the HMS in an e-learning system. The CAC model collects learning contextual data associated with the HMS and converts them into learning contextual information. Next, the contextual information is recognized through artificial intelligence to adjust the recognition output of the HMS. Finally, the CAC model is trained and tested using a dataset similar to that of a real learning situation. The experimental results show that the recognition accuracy of handwritten mathematical symbols is improved when using the CAC model.

1. Introduction

Numerous symbols with an explicit meaning are used in various mathematical expressions. However, symbols in handwritten mathematical expressions are often vaguely expressed for various reasons, including the handwriting style of the individual writer and the characteristics of the input tool. Therefore, even in datasets widely applied in handwritten mathematical expression recognition research, many vaguely expressed symbols exist.
Therefore, to recognize a handwritten mathematical symbol (HMS) more accurately, it is necessary to consider not only the shape of the HMS but also the data surrounding the HMS, that is, the contextual data. The contextual data of an HMS can be broadly divided into contextual data inside the expression and contextual data outside the expression. Figure 1 shows two examples of HMS recognition errors. Figure 1a shows a case in which the contextual data inside the expression must be considered: referring to the other symbols in the expression, the incorrectly recognized “v” can be corrected to “a”. By contrast, Figure 1b shows a case in which the contextual data outside the expression must be considered: referring to the symbols used in the first two entered expressions, the incorrectly recognized “u” can be corrected to “a”.
Human-related data are ambiguous and diverse; therefore, it is necessary to utilize contextual data to process them accurately. Accordingly, studies using contextual data have been conducted to accurately recognize complex data, such as human behavior and living environments [1,2]. However, no studies have been conducted on the recognition of an HMS that sufficiently consider contextual data in e-learning environments. This paper proposes the use of contextual data outside the expression, obtained from an e-learning system.
Throughout this paper, learning context (LC) refers to the environment that influences the learning, such as the learning contents and learning situations. Accordingly, the data in the e-learning system, which are related to the data generated by the learner during learning, are defined as learning contextual data (LC data). In addition, information converted to allow LC data to be used directly for functions including automatic computer recognition is defined as learning contextual information (LC information).
This paper describes a method for adjusting an output of HMS recognition (HMS output) by effectively using LC data. To this end, symbols in mathematical expressions extracted from the learning contents, together with system data regarding their input positions, are used as LC data. In addition, LC information is generated from the LC data so that it can be directly used to adjust the HMS output. By recognizing the LC information through artificial intelligence and correcting the HMS output, the effect of using the learning context is demonstrated. The symbols and range of the learning contents used in the implementation and experiment were limited to specific units of middle school mathematics, and the LC data were randomly generated but configured similarly to an actual workbook.

2. Related Work

2.1. Handwritten Mathematical Expression Recognition

Handwritten mathematical expressions refer to expressions written by a user by hand with a pen or similar tool. In an e-learning environment, handwritten mathematical expressions are generally stored as digital data in the form of images and can be broadly divided into offline handwritten mathematical expressions and online handwritten mathematical expressions. Offline handwritten mathematical expressions contain only pixel data, such as general photographic images, whereas online handwritten mathematical expressions include stroke data obtained through a stylus or finger touch, that is, both coordinates of the points and temporal sequence data [3]. This paper aims to identify online handwritten mathematical symbols that can utilize LC data. Therefore, the handwritten mathematical expressions and symbols mentioned in this paper refer to online ones.
As shown in Figure 2, the recognition process for handwritten mathematical expressions is divided into symbol segmentation, symbol recognition, and structural analysis. Whereas symbol segmentation is the process of grouping one or more strokes in a handwritten mathematical expression and dividing them into individual symbol images, symbol recognition is the process of recognizing each symbol image and converting the images into text-format data. A structural analysis is the process of identifying the spatial relations between symbols in consideration of their size and position [3,4,5,6].
In the initial studies on handwritten mathematical expression recognition, the recognition process shown in Figure 2 was carried out sequentially; however, this has the limitation that an incorrect output of one process affects the following process, and the contextual data inside the expression are not considered [5]. Owing to these difficulties, many studies have attempted to carry out all recognition processes simultaneously, as follows; however, a perfect level of accuracy has yet to be reached.
  • Geometric convex hull constraint, A-star completion estimate, book-keeping [7]
  • Simultaneous segmentation and recognition through hidden Markov model (HMM) approach [8]
  • Simultaneous segmentation and recognition through probabilistic context-free grammar [9]
  • Gaussian mixture model, bidirectional long short-term memory (BLSTM) and recurrent neural network (RNN), two-dimensional probabilistic context-free grammars [10]
  • BLSTM, Cocke–Younger–Kasami algorithm (CYK) [11]
In particular, studies using two-dimensional probabilistic context-free grammars, HMMs, and contextual information inside the expression have been conducted to solve the ambiguous symbol recognition problem. However, it has been difficult to obtain efficient recognition results because of symbols that have similar shapes but different semantics, such as {1, |, comma}, {P, p}, {S, s}, {C, c}, {X, x, ×}, {V, v}, and {o, 0} [12,13].

2.2. Pre-Existing Handwritten Mathematical Expression Recognition Models

The Competition on Recognition of Handwritten Mathematical Expressions (CROHME) is held to encourage handwritten mathematical expression recognition research. It provides available data and evaluates the system performance using the same platform and testing data. Numerous research teams have participated in six competitions from 2011 to 2019. Three tasks were applied at 2019 CROHME: online handwritten mathematical expression recognition (Task 1), offline handwritten mathematical expression recognition (Task 2), and the detection of expressions in document pages (Task 3). Among them, the subtasks of recognizing isolated symbols, Tasks 1a and 2a, and parsing expressions from the provided symbols, Tasks 1b and 2b, were added for Tasks 1 and 2, respectively. For these tasks, CROHME provided 12,178 expression data, 214,358 symbol data, 12,126 structure data, and 38,280 expression detection data [14]. The experiment conducted in this paper used the symbol dataset provided for the online single-symbol recognition task (Task 1a) at 2019 CROHME.
The online handwritten mathematical expression recognition task (Task 1) at 2019 CROHME involved eight research teams, as shown in Table 1 [14]. The team that obtained the highest recognition rate was USTC-iFLYTEK (USTC-NELSLIP and iFLYTEK Research), who achieved an accuracy of 80.73% in the simultaneous recognition of expression structures and symbols, whereas the recognition accuracy when considering only the expression structure and ignoring the symbol recognition result was 91.49%. The fact that the symbol recognition accuracy barely exceeded 80% means that, in a real learning environment, roughly one in five input expressions would contain a misrecognized symbol, a level at which learners can still feel uncomfortable.
Further analysis of the misrecognized results of all teams suggests that errors in the structure recognition commonly lead to errors in symbol recognition [14]. In addition, it can be interpreted that there are many handwritten mathematical expressions in which the information on the structure did not help in recognizing the symbols, even when the structure was properly recognized. Taking the results of the USTC-iFLYTEK team as an example, in 8.51% of all data, the structure was incorrectly recognized, and many structural errors caused errors in symbol recognition. In addition, in 10.76% of all data, although a correct structure recognition was achieved, an error occurred in the symbol recognition. In most cases, information on the structure recognition outputs was not utilized or was insufficient for recognition of an ambiguous symbol.
An RNN is a representative artificial neural network used to recognize handwritten mathematical expressions. Because RNNs are suitable for recognizing sequential data of variable lengths, they have been used in various studies, including document summarization and email traffic modeling [19,20]. Because the length of online handwritten mathematical expression data is not fixed, RNNs are typically used to recognize such data [15]. In particular, long short-term memory (LSTM), an improved RNN model, adds an input gate, a forget gate, and an output gate to the memory cells of the hidden layer. These gates remove unnecessary information from the cell state or add required information to it, based on the inputs and the hidden states. Information can be linked to other information with relatively large time intervals through the cell state [21,22]. As shown in Table 1, many of the teams participating in the online handwritten mathematical expression recognition task at 2019 CROHME used RNNs or LSTM.
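For reference, the gate mechanism summarized above can be written out explicitly. The following is one standard formulation of the LSTM cell from [21], in the notation popularized by [22], where $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \qquad \text{(forget gate)}$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \qquad \text{(input gate)}$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \qquad \text{(candidate cell state)}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad \text{(cell state update)}$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \qquad \text{(output gate)}$$
$$h_t = o_t \odot \tanh(c_t) \qquad \text{(hidden state)}$$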

3. LC Data

3.1. Composition of Learning Contents

As shown in Figure 3, the learning contents stored in the e-learning system described in this paper are composed of four types of learning parts: the learning topics, questions, solving processes, and correct answers. The $k$th learning topic $S^k$ contains expressions $\sigma_1^k, \sigma_2^k$, etc. The questions related to the learning topic $S^k$ are $Q_1^k, \ldots, Q_{N_k}^k$. The $l$th question $Q_l^k$ ($1 \le l \le N_k$) contains expressions $\rho(l)_1^k, \rho(l)_2^k$, etc. In addition, $W_l^k$, which is the solving process related to the question $Q_l^k$, contains expressions $\omega(l)_1^k, \omega(l)_2^k$, etc. Here, $A_l^k$, which is the correct answer related to the question $Q_l^k$, contains expressions $\alpha(l)_1^k, \alpha(l)_2^k$, etc.
The universal set $U$ of learning parts in the e-learning system and subsets of $U$ according to types of learning parts are defined as follows.

$U = \{P \mid P \text{ is a learning part in the e-learning system}\}$ (1)

$U_1 = \{P \in U \mid P \text{ is a learning topic}\} = \{S^1, S^2, \ldots\}$ (2)

$U_2 = \{P \in U \mid P \text{ is a question}\} = \{Q_1^1, Q_2^1, \ldots, Q_1^2, Q_2^2, \ldots\}$ (3)

$U_3 = \{P \in U \mid P \text{ is a solving process}\} = \{W_1^1, W_2^1, \ldots, W_1^2, W_2^2, \ldots\}$ (4)

$U_4 = \{P \in U \mid P \text{ is a correct answer}\} = \{A_1^1, A_2^1, \ldots, A_1^2, A_2^2, \ldots\}$ (5)
For a learning part $P \in U$, $L_1(P)$, $L_2(P)$, $L_3(P)$, and $L_4(P)$ are defined as the learning part sets of learning topics, questions, solving processes, and correct answers related to $P$, respectively.

$L_1(P) = \{P' \in U_1 \mid P' \text{ is a learning topic related to } P\}$ (6)

$L_2(P) = \{P' \in U_2 \mid P' \text{ is a question related to } P\}$ (7)

$L_3(P) = \{P' \in U_3 \mid P' \text{ is a solving process related to } P\}$ (8)

$L_4(P) = \{P' \in U_4 \mid P' \text{ is a correct answer related to } P\}$ (9)

For example, $L_1(Q_1^1) = \{S^1\}$, $L_2(S^1) = \{Q_1^1, Q_2^1, \ldots, Q_{N_1}^1\}$, $L_3(Q_1^1) = \{W_1^1\}$, and $L_4(S^1) = \{A_1^1, A_2^1, \ldots, A_{N_1}^1\}$.

For a learning part $P$, $F(P)$ is defined as the set of all expressions included in $P$.

$F(P) = \{\varphi \mid \varphi \text{ is an expression in } P\}$ (10)

For example, $F(W_1^1) = \{\omega(1)_1^1, \omega(1)_2^1, \ldots\}$.

3.2. Extracted Symbol and Input Position

The expressions in the learning contents of each learning part contain symbols. An extracted symbol is defined as a symbol extracted from the expressions in the learning contents. Table 2 lists an example of extracted symbols. Because the learner inputs mathematical expressions based on these symbols, it is necessary to use the extracted symbols as LC data for correcting the outputs of the ambiguously expressed symbols in the HMS recognition algorithm.
For an expression $\varphi$, $S(\varphi)$ is defined as the set of all extracted symbols in $\varphi$.

$S(\varphi) = \{s \mid s \text{ is an extracted symbol in } \varphi\}$ (11)

For example, $S(a \le -4) = \{a, \le, -, 4\}$.

For a symbol $s$ and a learning part set $X$, $\bar{P}(s, X)$ is defined as the set of all learning parts containing $s$ within $X$.

$\bar{P}(s, X) = \{P \in X \mid s \in \bigcup_{\varphi_k \in F(P)} S(\varphi_k)\}$ (12)

Therefore, $\bar{P}(s, U_1)$, $\bar{P}(s, U_2)$, $\bar{P}(s, U_3)$, and $\bar{P}(s, U_4)$ are the sets of learning topics, questions, solving processes, and correct answers, respectively, that include an expression containing symbol $s$.
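To make these set definitions concrete, the following minimal Python sketch implements $F(P)$, $S(\varphi)$, and $\bar{P}(s, X)$ over toy learning parts. All class, function, and variable names are illustrative rather than taken from the authors' implementation, and the character-level tokenizer is a simplification of real symbol extraction.

```python
from dataclasses import dataclass, field

@dataclass
class LearningPart:
    kind: str                          # "topic", "question", "process", or "answer"
    expressions: list = field(default_factory=list)

def F(P):
    """F(P): the set of all expressions included in learning part P (Equation (10))."""
    return P.expressions

def S(phi):
    """S(phi): all extracted symbols in expression phi (Equation (11)).
    Each non-space character counts as one symbol here, a simplification."""
    return {ch for ch in phi if not ch.isspace()}

def P_bar(s, X):
    """P_bar(s, X): learning parts in X whose expressions contain s (Equation (12))."""
    return [P for P in X if any(s in S(phi) for phi in F(P))]

# Worked example from the text: the correct answer "a <= -4" yields {a, <=, -, 4}.
answer = LearningPart(kind="answer", expressions=["a≤-4"])
print(S("a≤-4"))                  # {'a', '≤', '-', '4'}
print(len(P_bar("a", [answer])))  # 1: the answer part contains symbol 'a'
```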
The input position of the expression is also used as LC data in this paper. As shown in the example in Figure 4, there are two types of places where a learner enters an expression during learning: solving processes and answers. The symbols that learners primarily use at each position are different. In addition, even if the same symbol is used for each position, the meaning may be interpreted differently.

3.3. LC Data from e-Learning System

In this paper, it is assumed that learners try to write solving processes and answers as similar as possible to the model-solving processes and correct answers, respectively, with reference to the contents in the learning topics and questions, and that the following data can be obtained as LC data along with HMS $x$ from an e-learning system.
  • $P_1(x)$ is the learning topic that the learner is studying when $x$ is input.
  • $P_2(x)$ is the question that the learner is solving when $x$ is input.
  • $P_3(x)$ is the solving process of the question that the learner is solving when $x$ is input.
  • $P_4(x)$ is the correct answer of the question that the learner is solving when $x$ is input.
  • The input position $i(x)$ is the value indicating which type of learning part $x$ is input in:

$i(x) = \begin{cases} 0, & x \text{ is input in a solving process} \\ 1, & x \text{ is input in an answer} \end{cases}$ (13)
We define the symbol list $D$, an ordered list of all symbols used as an output of the HMS recognition. The symbol list size $n_D$ is the total number of symbols in $D$, and $d_i \in D$ ($1 \le i \le n_D$) represents the $i$th symbol of $D$, where $i$ is the index of symbol $d_i$.

The HMS information used in this paper is a row vector $[p_1\ p_2\ \cdots\ p_{n_D}]$. Each element $p_i$ ($1 \le i \le n_D$) of the HMS information is the probability that symbol $d_i \in D$ is the correct interpretation. Given HMS $x$, two vectors of HMS information are used: the HMS output $y_o(x) = [p_1^o\ p_2^o\ \cdots\ p_{n_D}^o]$, which is the recognition output of HMS $x$, and the context-applied output $y_r(x) = [p_1^r\ p_2^r\ \cdots\ p_{n_D}^r]$, which is the adjusted version of the HMS output $y_o(x)$ that reflects the LC information.
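As a toy illustration (with a hypothetical five-symbol list standing in for the full symbol list $D$), the HMS output is simply a probability vector over $D$:

```python
import numpy as np

D = ["a", "u", "v", "x", "2"]                    # illustrative stand-in for D
n_D = len(D)

# y_o(x): the recognizer's probability for each symbol in D (hypothetical values).
y_o = np.array([0.30, 0.38, 0.12, 0.15, 0.05])
assert y_o.shape == (n_D,) and np.isclose(y_o.sum(), 1.0)
print(D[int(y_o.argmax())])                      # "u": the shape-only guess
```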
The definitions of all symbols and functions in Section 3 are summarized in Appendix A.

4. CAC Model

4.1. Composition of CAC Model

In this paper, a context-aided correction (CAC) model is designed as a method for correcting the HMS output using the learning context. It consists of three parts: an LC data collection module, LC information generation module, and HMS output correction module. The composition and function of each module are shown in Figure 5.
First, the LC data collection module collects LC data related to the HMS, such as symbols included in the learning contents and the input position of the expression, from the e-learning system. Next, the LC information generation module converts the collected LC data into LC information so that it can be used in the artificial neural network. Finally, the HMS output correction module recognizes the LC information through an artificial neural network based on the LSTM and corrects the incorrect HMS output to improve the recognition accuracy.

4.2. LC Data Collection Module

The LC data collection module collects the four learning parts $P_1(x)$, $P_2(x)$, $P_3(x)$, and $P_4(x)$ and the input position $i(x)$ for the HMS $x$ from the e-learning system.

4.2.1. Extracted Symbol Matrix Generation

$E_n(x)$ ($1 \le n \le 4$) is the set of extracted symbols for each learning part $P_n(x)$:

$E_n(x) = \bigcup_{\varphi_k \in F(P_n(x))} S(\varphi_k)$ (14)

Therefore, $E_1(x)$, $E_2(x)$, $E_3(x)$, and $E_4(x)$ are the sets of extracted symbols within the learning topic, the question, the solving process, and the correct answer related to the question that the learner is solving when symbol $x$ is input, respectively.

The extracted symbol matrix $E$ is a matrix containing information about the symbols included in the expressions of each learning part. The $4 \times n_D$ matrix $E$ is obtained as follows:

$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1 n_D} \\ e_{21} & e_{22} & \cdots & e_{2 n_D} \\ e_{31} & e_{32} & \cdots & e_{3 n_D} \\ e_{41} & e_{42} & \cdots & e_{4 n_D} \end{bmatrix}$ (15)

where $e_{ni} = \begin{cases} 0, & d_i \notin E_n(x) \\ 1, & d_i \in E_n(x) \end{cases}$, and symbol $d_i$ is the $i$th symbol of the symbol list $D$.
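A minimal numpy sketch of Equation (15), assuming illustrative symbol-list and learning-part contents:

```python
import numpy as np

# Illustrative stand-ins: a short symbol list D and the four sets E_1(x)..E_4(x).
D = ["a", "x", "+", "=", "2", "3", "4", "≤"]
E_n = [
    {"x", "=", "2"},                  # E_1(x): symbols from the learning topic
    {"a", "x", "+", "=", "3"},        # E_2(x): symbols from the question
    {"a", "x", "+", "=", "2", "3"},   # E_3(x): symbols from the solving process
    {"a", "≤", "4"},                  # E_4(x): symbols from the correct answer
]

# Equation (15): e_ni = 1 if symbol d_i is in E_n(x), else 0.
E = np.array([[1 if d in part else 0 for d in D] for part in E_n])
print(E.shape)                        # (4, 8): one row per learning part
```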

4.2.2. Symbol Frequency Matrix Generation

Assuming that learners input expressions related to the learning contents during mathematics learning, symbols of the learning contents tend to be used frequently in the expressions input by learners. However, not all symbols have the same frequency. It is therefore necessary to obtain symbol frequency rates, which indicate how often symbols in one learning part are used in another learning part, and to reflect these when adjusting the HMS output. Symbol frequency rates can be obtained from the learning contents stored in an e-learning system using the statistical probability of how often the symbols of each learning part match those of the other learning parts.
For a symbol $s$ and a learning part set $X$, $\bar{L}_{Prc}(s, X)$ and $\bar{L}_{Ans}(s, X)$ are defined as the sets of solving processes and answers, respectively, related to all learning parts containing $s$ within $X$.

$\bar{L}_{Prc}(s, X) = \bigcup_{P_k \in \bar{P}(s, X)} L_3(P_k)$ (16)

$\bar{L}_{Ans}(s, X) = \bigcup_{P_k \in \bar{P}(s, X)} L_4(P_k)$ (17)

For example, $\bar{L}_{Prc}(s, U_2)$ is the set of solving processes related to all questions containing symbol $s$, and $\bar{L}_{Ans}(s, U_3)$ is the set of answers related to all solving processes containing symbol $s$.

For a symbol $s$ and a learning part set $X$, the symbol frequency rate $f_{Prc}(s, X)$ is defined as the frequency at which expressions containing $s$ are used in the solving processes related to learning parts including $s$ in $X$, and is calculated as follows:

$f_{Prc}(s, X) = \dfrac{\sum_{P_l \in \bar{P}(s,\ \bar{L}_{Prc}(s, X))} n(F(P_l))}{\sum_{P_k \in \bar{L}_{Prc}(s, X)} n(F(P_k))}$ (18)

where $n(A)$ is the number of elements in set $A$. For example, $f_{Prc}(s, U_2)$ is the number of expressions containing $s$ in all solving processes related to questions containing $s$, divided by the number of all expressions in those solving processes.

Similarly, for a symbol $s$ and a learning part set $X$, the symbol frequency rate $f_{Ans}(s, X)$ is defined as the frequency at which expressions containing $s$ are used in the correct answers related to learning parts including $s$ in $X$, and is calculated as follows:

$f_{Ans}(s, X) = \dfrac{\sum_{P_l \in \bar{P}(s,\ \bar{L}_{Ans}(s, X))} n(F(P_l))}{\sum_{P_k \in \bar{L}_{Ans}(s, X)} n(F(P_k))}$ (19)
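The following toy Python sketch computes $f_{Prc}(s, U_2)$ following the verbal description above, i.e., the fraction of expressions containing $s$ among all expressions in the solving processes related to questions containing $s$; the data and names are illustrative.

```python
def contains(s, expr):
    return s in expr                  # simplified symbol membership test

def f_prc(s, questions):
    """questions: id -> {"question": [expressions], "process": [expressions]}."""
    # L_bar_Prc(s, U_2): solving processes of questions whose expressions contain s.
    related = [q["process"] for q in questions.values()
               if any(contains(s, e) for e in q["question"])]
    exprs = [e for process in related for e in process]
    return sum(contains(s, e) for e in exprs) / len(exprs) if exprs else 0.0

toy = {
    "Q1": {"question": ["x-2=(x+a)/3"], "process": ["3x-6=x+a", "2x=a+6", "x=(a+6)/2"]},
    "Q2": {"question": ["y=2x"], "process": ["y=2x", "x=1"]},
}
print(f_prc("a", toy))   # 1.0: all 3 expressions in Q1's solving process contain 'a'
```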
Using the symbol frequency rates of symbols in each learning part, the symbol frequency matrices $R_{Prc}$ and $R_{Ans}$, each of size $4 \times n_D$, can be obtained. $R_{Prc}$ represents the symbol frequency rates in the solving process of symbols used in each learning part, and $R_{Ans}$ represents the symbol frequency rates in the correct answer of symbols used in each learning part, as shown in Equations (20) and (21):

$R_{Prc} = \begin{bmatrix} f_{Prc}(d_1, U_1) & f_{Prc}(d_2, U_1) & \cdots & f_{Prc}(d_{n_D}, U_1) \\ f_{Prc}(d_1, U_2) & f_{Prc}(d_2, U_2) & \cdots & f_{Prc}(d_{n_D}, U_2) \\ f_{Prc}(d_1, U_3) & f_{Prc}(d_2, U_3) & \cdots & f_{Prc}(d_{n_D}, U_3) \\ f_{Prc}(d_1, U_4) & f_{Prc}(d_2, U_4) & \cdots & f_{Prc}(d_{n_D}, U_4) \end{bmatrix}$ (20)

$R_{Ans} = \begin{bmatrix} f_{Ans}(d_1, U_1) & f_{Ans}(d_2, U_1) & \cdots & f_{Ans}(d_{n_D}, U_1) \\ f_{Ans}(d_1, U_2) & f_{Ans}(d_2, U_2) & \cdots & f_{Ans}(d_{n_D}, U_2) \\ f_{Ans}(d_1, U_3) & f_{Ans}(d_2, U_3) & \cdots & f_{Ans}(d_{n_D}, U_3) \\ f_{Ans}(d_1, U_4) & f_{Ans}(d_2, U_4) & \cdots & f_{Ans}(d_{n_D}, U_4) \end{bmatrix}$ (21)

In the LC data collection module, the symbol frequency matrix $R_o$ is selected as follows, reflecting the input position $i(x)$ of the expression, to adjust the recognition output of the input HMS $x$ efficiently:

$R_o = \begin{cases} R_{Prc}, & i(x) = 0 \text{ (when } x \text{ is in an expression of a solving process)} \\ R_{Ans}, & i(x) = 1 \text{ (when } x \text{ is in an expression of an answer)} \end{cases}$ (22)

4.3. LC Information Generation Module

The LC information generation module receives the extracted symbol matrix $E$ and the symbol frequency matrix $R_o$ from the LC data collection module and generates the expected symbol matrix $R$, which is the LC information.

The LC information used in the CAC model is the list of expected symbols for the input HMS together with the expected value of each expected symbol. For an HMS, an expected symbol is defined as a symbol that may be the correct interpretation, and the expected value is that symbol's probability.

The learner tends to use symbols related to the learning contents of each learning part when inputting an expression. Therefore, in the CAC model, the extracted symbols of each learning part are considered the expected symbols, and the expected value of each symbol is set to its symbol frequency rate. Accordingly, from the extracted symbol matrix $E$ (Equation (15)), generated from the extracted symbols of each learning part, and the symbol frequency matrix $R_o$ (Equation (22)), generated from the symbol frequency rates and the input position of the expression, the expected symbol matrix $R$ of size $4 \times n_D$, which is the LC information, is calculated as follows:

$R = R_o \odot E = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1 n_D} \\ r_{21} & r_{22} & \cdots & r_{2 n_D} \\ r_{31} & r_{32} & \cdots & r_{3 n_D} \\ r_{41} & r_{42} & \cdots & r_{4 n_D} \end{bmatrix}$ (23)

where $\odot$ stands for element-wise multiplication of matrices. Each element $r_{ti}$ of the expected symbol matrix $R$ is the expected value of the symbol $d_i \in D$ obtained from learning part $t$.

4.4. HMS Output Correction Module and Output

The HMS output correction module receives the HMS output $y_o(x)$ and the expected symbol matrix $R$ obtained from the LC information generation module, which are merged as follows into the LC information matrix $C$ of size $5 \times n_D$:

$C = \begin{bmatrix} y_o(x) \\ R \end{bmatrix} = \begin{bmatrix} p_1^o & p_2^o & \cdots & p_{n_D}^o \\ r_{11} & r_{12} & \cdots & r_{1 n_D} \\ r_{21} & r_{22} & \cdots & r_{2 n_D} \\ r_{31} & r_{32} & \cdots & r_{3 n_D} \\ r_{41} & r_{42} & \cdots & r_{4 n_D} \end{bmatrix}$ (24)
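Putting Equations (22)-(24) together, here is a toy numpy sketch; the matrices are filled with random stand-in values purely to show the shapes and operations:

```python
import numpy as np

n_D = 8                                        # toy symbol list size
E = np.random.randint(0, 2, size=(4, n_D))     # extracted symbol matrix (Eq. (15))
R_prc = np.random.rand(4, n_D)                 # symbol frequency matrices
R_ans = np.random.rand(4, n_D)                 # (Eqs. (20) and (21))
y_o = np.random.dirichlet(np.ones(n_D))        # HMS output: a probability vector

i_x = 0                                        # input position: a solving process
R_o = R_prc if i_x == 0 else R_ans             # Eq. (22): choose by input position
R = R_o * E                                    # Eq. (23): element-wise product
C = np.vstack([y_o[None, :], R])               # Eq. (24): 5 x n_D LC information
print(C.shape)                                 # (5, 8)
```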
The 2nd, 3rd, 4th, and 5th rows of matrix $C$, which are the rows of the expected symbol matrix $R$, are referred to as sub-contextual information 1, 2, 3, and 4, respectively. To apply these to the HMS output adjustment, one problem must first be solved: how much weight each piece of sub-contextual information should carry when combining the LC information to achieve the best results. An optimal weight is difficult to obtain, and even if it were obtained, the list of symbols used for each learning part and the symbol frequency rates differ depending on the learning range and contents; therefore, the values also change when the learning conditions change. In the CAC model, a complex algorithm for obtaining these variable weights is implemented using an artificial neural network.
Therefore, the role of the artificial neural network used to recognize the LC information is to improve the accuracy of the HMS output by assigning optimal weights to each piece of sub-contextual information. To this end, in the artificial neural network, the HMS output should be related to all sub-contextual information, and each weight should be applied appropriately. However, in matrix $C$, the sub-contextual information is listed sequentially after the HMS output; therefore, an appropriate method for linking the HMS output with all sub-contextual information is required. To solve this problem efficiently, LSTM was applied in this paper, as shown in Figure 6. The parameters of the LSTM play the role of the weights applied to each element of the sub-contextual information ($x_2$, $x_3$, $x_4$, and $x_5$) when it is combined with the HMS output ($x_1$). In detail, appropriate weights between the HMS output and all sub-contextual information are calculated through the cell state ($c_t$), which is responsible for long-term memory in the LSTM. In addition, the relationships among the pieces of sub-contextual information, carried by the hidden state ($h_t$) responsible for short-term memory along with the cell state, are also reflected in the weights. Matrix $C$ is transformed into the context-applied output $y_r(x) = [p_1^r\ p_2^r\ \cdots\ p_{n_D}^r]$, a row vector of size $1 \times n_D$, through the artificial neural network constructed using LSTM.
Finally, from the context-applied output $y_r(x)$, the index $\theta_r$ of the element with the maximum value is obtained; that is, $\theta_r = \operatorname{argmax}_{1 \le i \le n_D}(p_i^r)$. As a result, the symbol $d_{\theta_r} \in D$ with index $\theta_r$ becomes the final output of the CAC model.
The definitions of all symbols and functions in Section 4 are summarized in Appendix B.

5. Experiment

5.1. Experiment Environment

In this paper, the results of HMS recognition were compared according to whether the LC information was applied, using a dataset configured similarly to actual learning conditions. To this end, the units on rational numbers, the calculation of monomials, and the calculation of polynomials in a mathematics workbook [23] for middle school students were set up as experimental targets. Consistent with the composition of learning contents described in this paper, each question in the workbook is related to a topic, a solving process, and a correct answer. The learning contents within the units consisted of 11 topics and 557 questions related to those topics, and a total of 50 symbols were used. The list of all symbols used in these units is provided in Table 3.
In the analyzed data, there is a clear difference in symbol frequency rates between the solving processes and the correct answers. Assuming that learners studying these units write expressions similar to the model-solving processes, the symbols extracted from the learning topics, questions, and solving processes are, as shown in Appendix C, Appendix D, Appendix E and Appendix F, used more repeatedly in the expressions of the solving processes than in the expressions of the answers.
Accordingly, as shown in Table 4, 89,477 data points for 50 symbols among the datasets of the 2019 CROHME online symbol recognition task (Task 1a) were used for the experiment. Among them, 81,265 were training data, and 8212 were test data.
However, the CROHME dataset does not contain the LC data required for this experiment. Two methods can be considered to arbitrarily match the learning contents of the workbook with the CROHME dataset: (1) allocating the symbols of the CROHME dataset to the LC data constructed from the learning contents, or (2) allocating LC data to the symbols of the CROHME dataset so as to resemble the learning contents. With method (1), the LC data exactly match the actual learning contents, but many symbols of the CROHME dataset are omitted or duplicated. With method (2), all symbols of the CROHME dataset are used without omission or duplication, but the LC data do not completely match the learning contents. In this paper, method (2) was used as follows.
  • Input position: 16,821 data points of the training set and 1714 data points of the test set, randomly selected according to the ratio of pre-investigated statistics, were treated as symbols of expressions in the answer parts; that is, their input positions were set to answers. The input positions of the remaining data points were set to solving processes.
  • Extracted symbols: As shown in Table 5, for a given symbol, there are 16 cases (00 to 15) of designating the extracted symbols of the four learning parts, depending on which learning parts contain the symbol, for data where the symbol is the correct one. Similarly, there are 16 cases for data where the symbol is not the correct one. Therefore, all data can be divided into 32 cases for each symbol. For each symbol, we randomly partitioned the entire CROHME dataset according to the 32 ratios calculated from the numbers of symbols in the learning contents, to make the setting similar to the actual learning environment (see the sketch below). As can be seen in Table 6, which compares the ratios of extracted symbols for symbol ‘2’, in all cases of Table 5 we matched the ratios of extracted symbols assigned to the CROHME dataset to the ratios of the symbols in the learning contents. As a result, the symbol frequency rates of the CROHME dataset became the same as those of the learning contents.
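A minimal sketch of this sampling step, using the Table 6 workbook ratios for data whose correct symbol is ‘2’ (the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Workbook counts for cases 00..15 (the "correct symbol is 2" column of Table 6).
workbook_counts = np.array([58, 41, 136, 111, 244, 157, 980, 1042,
                            13, 4, 11, 9, 58, 48, 194, 178])
ratios = workbook_counts / workbook_counts.sum()

n_points = 5299                                  # CROHME training points for '2'
cases = rng.choice(16, size=n_points, p=ratios)  # case label assigned per point
print(np.bincount(cases, minlength=16))          # approximates Table 6's train column
```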
Table 7 shows samples in which input position and extracted symbols are arbitrarily assigned to data points.
In addition, the TAP model was used to recognize the HMS of all datasets. TAP, the model used by the USTC-iFLYTEK team, achieved the best results in the online handwritten mathematical expression recognition task (Task 1) at 2019 CROHME, and its source code is openly available for use in other studies [15].
As discussed in Section 4.4, the artificial neural network used in the HMS output correction module of the CAC model should be able to grasp the relationship between the HMS output and all sub-contextual information sequentially arranged in the LC information. Therefore, LSTM was used for the artificial neural network. For efficient training and adjustment of the outputs, dropout [24], fully connected [24], and softmax [25] layers were added to the artificial neural network, as shown in Table 8. The output dimension of each layer was set to 50, the total number of symbols used in this paper. To prevent overfitting, the dropout ratio was set to 0.5.
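As an illustration, the Table 8 stack could be rendered as follows in PyTorch; this is an assumed sketch for exposition, not the authors' code, and it treats the five rows of matrix $C$ as the five timesteps of the LSTM input.

```python
import torch
import torch.nn as nn

class CACNet(nn.Module):
    """Sketch of the Table 8 network: LSTM -> Dropout -> Dense -> Softmax."""
    def __init__(self, n_symbols: int = 50):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_symbols, hidden_size=n_symbols,
                            batch_first=True)
        self.dropout = nn.Dropout(p=0.5)          # dropout ratio from Table 8
        self.fc = nn.Linear(n_symbols, n_symbols)

    def forward(self, c):
        # c: (batch, 5, n_symbols) -- the rows of matrix C as five timesteps
        _, (h_n, _) = self.lstm(c)                # final hidden state summarizes C
        z = self.dropout(h_n.squeeze(0))
        return torch.softmax(self.fc(z), dim=-1)  # context-applied output y_r(x)

model = CACNet()
C = torch.rand(1, 5, 50)                          # one stand-in LC information matrix
theta_r = int(model(C).argmax(dim=-1))            # index of the final output symbol
```

The final argmax recovers $\theta_r$, and hence the output symbol $d_{\theta_r}$, as described in Section 4.4.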

5.2. Training and Testing

Two groups were used in the experiment. As shown in Table 9, in experiment group I, the TAP model was trained using 81,265 HMS data points and was tested on the testing set at every epoch. Subsequently, the entire training set was recognized again through the trained TAP model to obtain the HMS output dataset for experiment group II. In experiment group II, the CAC model was trained using the LC data constructed by the method discussed in Section 5.1, along with the obtained HMS output dataset. Here, 24,379 data points, 30% of the total training set, were used as the validation set, on which the CAC model of experiment group II was tested at every epoch. Training of each experiment group was stopped before the recognition accuracy began to decrease.
The model evaluation test measured the accuracy of the same testing set in both experiment groups I and II. The HMS recognition results of the testing set obtained using the trained TAP model of experiment group I were used as the HMS output data for experiment group II.

5.3. Results and Discussion

As shown in Table 10, the results for experiment group I showed that the accuracy of the TAP model, which recognized only the shape of the HMS, was 93.22%. On the other hand, the results of experiment group II showed that the recognition accuracy of the CAC model, which adjusts the HMS outputs of the TAP model by applying the LC information, was 97.15%, which is 3.93 percentage points higher than that of experiment group I. These results indicate that LC information recognition improves the accuracy of the HMS recognition results.
The recognition accuracies of the TAP model for HMSs in solving processes and answers are similar, at 93.20% and 93.29%, respectively, while the recognition accuracies of the CAC model differ, at 96.48% and 99.71%, respectively. This means that the effect of using LC information in the solving processes differs from that in the correct answers.
More specifically, the recognition results of experiment groups I and II were compared, as shown in Table 11. In total, 404 data points, ambiguously represented symbols misrecognized by the TAP model, were accurately adjusted through the CAC model. Conversely, 81 data points that were properly recognized by the TAP model were misrecognized after passing through the CAC model; however, these accounted for only 0.99% of the total data, a relatively small number.
Since the HMS recognition and LC information recognition processes are independent of each other, the CAC model can be linked not only with the TAP model used in the experiment but also with any recognition model that outputs a probability for each symbol as its HMS recognition result, and it should perform regardless of which model it is coupled with.
In the experiment, actual LC data from e-learning systems could not be tested. In addition, data covering a wider range of learning contents could not be tested. This is because sufficient LC data paired with HMSs could not be obtained. Future work will further refine the LC data and experiment with a wider range of data. In addition, some expressions entered by learners in solving processes and answers might not match the model-solving processes and correct answers, respectively. If learners use symbols different from the ones proposed in the learning contents, the LC data could worsen the recognition performance of the CAC model. A recognition method for HMSs entered by learners that are inconsistent with the LC data is a task to be studied in the future.
In this paper, a simple LSTM model is used as the method for recognizing learning context information in the CAC model. However, methods using a more elaborately configured LSTM or other artificial intelligence models (such as BLSTM) need to be studied in the future.

6. Conclusions

An e-learning system should allow learners studying mathematics to write mathematical expressions freely. However, handwritten mathematical expressions contain many ambiguous symbols. Most existing studies have mainly used the shape of the symbol to recognize the HMS. This approach has limitations in accurately identifying ambiguous symbols, which are difficult to determine from shape alone, even for humans.
In this paper, the CAC model was designed to use LC data and improve upon the results of existing studies in e-learning environments. In the CAC model, sufficient LC information is generated from data outside the expressions, i.e., LC data that are only indirectly related to the HMS. In the process of using LC information to adjust the output of the HMS recognition, an optimal weight is applied to each sub-contextual piece of LC information through an artificial neural network.
In the experiment, the existing and CAC models were trained and tested on a dataset similar to the actual learning environment. The results showed that the CAC model corrected the misrecognized results of the existing model, and the recognition accuracy improved. Therefore, it was found that the use of LC information proposed in this paper has a positive effect on improving the accuracy of HMS recognition.

Author Contributions

Conceptualization, S.-B.B. and J.-G.S.; methodology, S.-B.B., J.-G.S. and J.-S.P.; software, S.-B.B.; validation, S.-B.B., J.-G.S. and J.-S.P.; formal analysis, S.-B.B.; investigation, S.-B.B.; resources, S.-B.B.; data curation, S.-B.B.; writing—original draft preparation, S.-B.B.; writing—review and editing, S.-B.B., J.-G.S. and J.-S.P.; visualization, S.-B.B. and J.-G.S.; supervision, J.-G.S. and J.-S.P.; project administration, J.-G.S. and J.-S.P.; funding acquisition, J.-S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Definitions of Symbols and Functions in Section 3.

Section | Symbol/Function | Definition | Equation
Section 3.1 | $U$ | the universal set of learning parts in the e-learning system | (1)
 | $U_1$ | the set of all learning topics | (2)
 | $U_2$ | the set of all questions | (3)
 | $U_3$ | the set of all solving processes | (4)
 | $U_4$ | the set of all correct answers | (5)
 | $L_1(P)$ | the learning part set of learning topics related to a learning part $P$ | (6)
 | $L_2(P)$ | the learning part set of questions related to a learning part $P$ | (7)
 | $L_3(P)$ | the learning part set of solving processes related to a learning part $P$ | (8)
 | $L_4(P)$ | the learning part set of correct answers related to a learning part $P$ | (9)
 | $F(P)$ | the set of all expressions included in a learning part $P$ | (10)
Section 3.2 | $S(\varphi)$ | the set of all extracted symbols in an expression $\varphi$ | (11)
 | $\bar{P}(s, X)$ | the set of all learning parts containing a symbol $s$ within a learning part set $X$ | (12)
Section 3.3 | $P_1(x)$ | the learning topic that the learner is studying when HMS $x$ is input |
 | $P_2(x)$ | the question that the learner is solving when HMS $x$ is input |
 | $P_3(x)$ | the solving process of the question that the learner is solving when HMS $x$ is input |
 | $P_4(x)$ | the correct answer of the question that the learner is solving when HMS $x$ is input |
 | $i(x)$ | the value indicating in which type of learning part HMS $x$ is input | (13)
 | $D$ | the ordered list of all symbols used as an output of the HMS recognition |
 | $n_D$ | the total number of symbols in symbol list $D$ |
 | $d_i \in D$ | the $i$th symbol of symbol list $D$ |
 | $y_o(x)$ | the HMS output, which is the recognition output of HMS $x$ |
 | $y_r(x)$ | the context-applied output, which is the adjusted output of the HMS output $y_o(x)$ |

Appendix B

Table A2. Definitions of Symbols and Functions in Section 4.

Section | Symbol/Function | Definition | Equation
Section 4.2 | $E_n(x)$ | the set of extracted symbols for each learning part $P_n(x)$ | (14)
 | $E$ | the matrix containing information about the symbols included in the expressions of each learning part | (15)
 | $\bar{L}_{Prc}(s, X)$ | the set of solving processes related to all learning parts containing a symbol $s$ within a learning part set $X$ | (16)
 | $\bar{L}_{Ans}(s, X)$ | the set of answers related to all learning parts containing a symbol $s$ within a learning part set $X$ | (17)
 | $f_{Prc}(s, X)$ | the frequency at which expressions containing a symbol $s$ are used in the solving processes related to learning parts including $s$ in a learning part set $X$ | (18)
 | $f_{Ans}(s, X)$ | the frequency at which expressions containing a symbol $s$ are used in the correct answers related to learning parts including $s$ in a learning part set $X$ | (19)
 | $R_{Prc}$ | the matrix of symbol frequency rates in the solving process for symbols used in the learning topics, questions, solving processes, and correct answers | (20)
 | $R_{Ans}$ | the matrix of symbol frequency rates in the correct answer for symbols used in the learning topics, questions, solving processes, and correct answers | (21)
 | $R_o$ | the matrix selected from $R_{Prc}$ and $R_{Ans}$ by reflecting the input position $i(x)$ of the expression | (22)
Section 4.3 | $R$ | the expected symbol matrix, calculated as $R_o \odot E$ | (23)
Section 4.4 | $C$ | the LC information matrix, a merge of the HMS output $y_o$ and the expected symbol matrix $R$ | (24)
 | $\theta_r$ | the index of the element with the maximum value in the context-applied output $y_r(x)$ |

Appendix C

Table A3. Sample Symbol Frequency Rates in Solving Processes and Correct Answers for Symbols Extracted from Learning Topics.

Extracted Symbols of Learning Topics | Solving Process | Correct Answer
Numbers: 0 | 133/243 (55%) | 51/100 (51%)
Numbers: 1 | 145/243 (60%) | 41/100 (41%)
Numbers: 2 | 126/243 (52%) | 39/100 (39%)
Numbers: 5 | 84/243 (35%) | 35/100 (35%)
Signs: − | 682/1326 (51%) | 223/554 (40%)
Signs: ― (fraction) | 423/1326 (32%) | 123/554 (22%)
Signs: ( | 388/1083 (36%) | 10/454 (2%)
Signs: ) | 388/1083 (36%) | 10/454 (2%)

Appendix D

Table A4. Sample Symbol Frequency Rates in Solving Processes and Correct Answers for Symbols Extracted from Questions.

Extracted Symbols of Questions | Solving Process | Correct Answer
Numbers: 2 | 816/1048 (78%) | 244/423 (58%)
Numbers: 3 | 535/821 (65%) | 115/323 (36%)
Numbers: 1 | 406/606 (67%) | 98/216 (45%)
Numbers: 4 | 279/517 (54%) | 53/195 (27%)
Lowercases: x | 474/667 (71%) | 169/253 (67%)
Lowercases: a | 331/436 (76%) | 132/177 (75%)
Lowercases: y | 257/394 (65%) | 94/156 (60%)
Lowercases: b | 214/314 (68%) | 89/120 (74%)
Uppercases: A | 41/115 (36%) | 2/25 (8%)
Uppercases: B | 28/92 (30%) | 0/18 (0%)
Uppercases: C | 21/48 (44%) | 2/9 (22%)
Uppercases: S | 29/40 (73%) | 13/16 (81%)
Signs: − | 553/718 (77%) | 200/290 (69%)
Signs: ( | 265/664 (40%) | 7/295 (2%)
Signs: ) | 265/661 (40%) | 7/294 (2%)
Signs: + | 424/644 (66%) | 98/231 (42%)

Appendix E

Table A5. Sample Frequency Rates in Different Solving Processes and Correct Answers for Symbols Extracted from Such Processes.

Extracted Symbols of Solving Processes | Solving Process ¹ | Correct Answer
Numbers: 2 | 1744/2367 (74%) | 514/914 (56%)
Numbers: 1 | 980/1638 (60%) | 267/555 (48%)
Numbers: 3 | 928/1542 (60%) | 224/602 (37%)
Numbers: 4 | 492/1011 (49%) | 120/396 (30%)
Lowercases: x | 886/1297 (68%) | 313/524 (60%)
Lowercases: a | 560/786 (71%) | 232/337 (69%)
Lowercases: b | 390/620 (63%) | 146/224 (65%)
Lowercases: y | 318/558 (57%) | 155/260 (60%)
Uppercases: A | 270/488 (55%) | 2/105 (2%)
Uppercases: B | 96/259 (37%) | 0/49 (0%)
Uppercases: C | 46/137 (34%) | 4/25 (16%)
Uppercases: S | 68/82 (83%) | 26/30 (87%)
Signs: = | 2972/3269 (91%) | 73/1189 (6%)
Signs: − | 1350/1750 (77%) | 413/679 (61%)
Signs: + | 878/1391 (63%) | 223/529 (42%)
Signs: ― (fraction) | 710/1097 (65%) | 175/422 (41%)
¹ Only cases with two or more expressions in the solving process of one question were counted.

Appendix F

Table A6. Sample Symbol Frequency Rates in Solving Processes and Correct Answers for Symbols Extracted from Correct Answers.

Extracted Symbols of Correct Answers | Solving Process | Correct Answer
Numbers: 2 | 514/659 (78%) | 303/303 (100%)
Numbers: 1 | 267/494 (54%) | 196/196 (100%)
Numbers: 3 | 224/391 (57%) | 160/160 (100%)
Numbers: 4 | 120/287 (42%) | 128/128 (100%)
Lowercases: x | 313/350 (89%) | 173/173 (100%)
Lowercases: a | 232/257 (90%) | 134/134 (100%)
Lowercases: y | 155/178 (87%) | 96/96 (100%)
Lowercases: b | 146/172 (85%) | 91/91 (100%)
Uppercases: S | 26/33 (79%) | 13/13 (100%)
Uppercases: A | 2/6 (33%) | 2/2 (100%)
Uppercases: V | 4/6 (67%) | 3/3 (100%)
Uppercases: C | 4/4 (100%) | 2/2 (100%)
Signs: − | 413/498 (83%) | 223/223 (100%)
Signs: ― (fraction) | 175/279 (63%) | 123/123 (100%)
Signs: + | 223/261 (85%) | 133/133 (100%)
Signs: = | 73/79 (92%) | 33/33 (100%)

References

  1. Babli, M.; Rincon, J.A.; Onaindia, E.; Carrascosa, C.; Julian, V. Deliberative context-aware ambient intelligence system for assisted living homes. Hum.-Cent. Comput. Inf. Sci. 2021, 11, 19.
  2. Khowaja, S.A.; Yahya, B.N.; Lee, S.L. CAPHAR: Context-aware personalized human activity recognition using associative learning in smart environments. Hum.-Cent. Comput. Inf. Sci. 2020, 10, 35.
  3. Chan, K.; Yeung, D. Mathematical expression recognition: A survey. Int. J. Doc. Anal. Recogn. 2000, 3, 3–15.
  4. Chan, C.K. Stroke extraction for offline handwritten mathematical expression recognition. IEEE Access 2020, 8, 61565–61575.
  5. Zhang, T. New architectures for handwritten mathematical expressions recognition. In Image Processing; Université de Nantes: Nantes, France, 2017.
  6. Zhang, J.; Du, J.; Zhang, S.; Liu, D.; Hu, Y.; Hu, J.; Wei, S.; Dai, L. Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recognit. 2017, 71, 196–206.
  7. Miller, E.G.; Viola, P.A. Ambiguity and constraint in mathematical expression recognition. Am. Assoc. Artif. Intell. 1998, 784–791.
  8. Kosmala, A.; Rigoll, G. On-line handwritten formula recognition using statistical methods. Fourteenth Int. Conf. Pattern Recognit. 1998, 2, 1306–1308.
  9. Chou, P.A. Recognition of equations using a two-dimensional stochastic context-free grammar. Vis. Commun. Image Process. IV 1989, 1199, 852–863.
  10. Álvaro, F.; Sánchez, J.A.; Benedí, J.M. An integrated grammar-based approach for mathematical expression recognition. Pattern Recognit. 2016, 51, 135–147.
  11. Zhelezniakov, D.; Zaytsev, V.; Radyvonenko, O. Acceleration of online recognition of 2D sequences using deep bidirectional LSTM and dynamic programming. Adv. Comput. Intell. 2019, 11507, 438–449.
  12. Naik, S.A.; Metkewar, P.S.; Mapari, S.A. Recognition of ambiguous mathematical characters within mathematical expressions. In Proceedings of the 2017 International Conference on Electrical Computer and Communication Technologies, Coimbatore, India, 22–24 February 2017; pp. 1–4.
  13. Álvaro, F.; Sánchez, J.A.; Benedí, J.M. Offline features for classifying handwritten math symbols with recurrent neural networks. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 2944–2949.
  14. Mahdavi, M.; Zanibbi, R.; Mouchère, H.; Viard-Gaudin, C.; Garain, U. ICDAR 2019 CROHME + TFD: Competition on recognition of handwritten mathematical expressions and typeset formula detection. In Proceedings of the 2019 International Conference on Document Analysis and Recognition, Sydney, NSW, Australia, 20–25 September 2019.
  15. Zhang, J.; Du, J.; Dai, L. Track, attend and parse (TAP): An end-to-end framework for online handwritten mathematical expression recognition. IEEE Trans. Multimed. 2019, 21, 221–233.
  16. Degtyarenko, I.; Radyvonenko, O.; Bokhan, K.; Khomenko, V. Text/shape classifier for mobile applications with handwriting input. Int. J. Doc. Anal. Recogn. 2016, 19, 369–379.
  17. Wu, J.; Yin, F.; Zhang, Y.; Zhang, X.; Liu, C. Image-to-markup generation via paired adversarial learning. In Machine Learning and Knowledge Discovery in Databases; Springer International Publishing: Cham, Switzerland, 2019; pp. 18–34.
  18. Le, A.; Nakagawa, M. A system for recognizing online handwritten mathematical expressions by using improved structural analysis. Int. J. Doc. Anal. Recogn. 2016, 19, 305–319.
  19. Kim, H.C.; Lee, S.W. Document summarization model based on general context in RNN. J. Inf. Process. Syst. 2019, 15, 1378–1391.
  20. Om, K.; Boukoros, S.; Nugaliyadde, A.; McGill, T.; Dixon, M.; Koutsakis, P.; Wong, K. Modelling email traffic workloads with RNN and LSTM models. Hum.-Cent. Comput. Inf. Sci. 2020, 10, 1–16.
  21. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  22. Olah, C. Understanding LSTM Networks. 2015. Available online: https://colah.github.io/posts/2015-08-Understanding-LSTMs (accessed on 17 January 2022).
  23. Yang, T. Concept Plus Type Middle School Mathematics 2-1; Concept Volume; Visang Education: Seoul, Korea, 2011.
  24. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  25. Wood, T. Softmax Function. 2019. Available online: https://deepai.org/machine-learning-glossary-and-terms/softmax-layer (accessed on 17 January 2022).
Figure 1. Samples of HMS misrecognitions. (a) Missing contextual data inside the expression. (b) Missing contextual data outside the expression.
Figure 2. Recognition process of handwritten mathematical expressions.
Figure 3. Composition of learning contents stored in e-learning systems.
Figure 4. Input position of expression.
Figure 5. CAC model using LC information to recognize HMS.
Figure 6. Application of LSTM to LC information recognition. $t$ is the timestep; $x_t$, $c_t$, and $h_t$ are the input vector, cell state, and hidden state, respectively, at timestep $t$.
Table 1. Online handwritten mathematical expression recognition results.

No | Team | Model (Based Method) | Recognition Data | Accuracy (Structure + Symbol Labels) | Accuracy (Structure)
1 | USTC-iFLYTEK | TAP (RNN ¹) [15] | Online data | 80.73% | 91.49%
2 | Samsung R&D 1 | PCFG (RNN ¹, PCFG ²) [11] | Online data | 79.82% | 89.32%
3 | MyScript | MyScript Math recognizer (BLSTM ³, LSTM ⁴) [14] | Online data | 79.15% | 90.66%
4 | Sun Yat-Sen U. | MyScript Interactive Ink [4] | Online data extracted from images | 77.40% | 88.82%
5 | Samsung R&D 2 | Text/shape classifier (SVM ⁵) [16] | Online data | 65.97% | 82.82%
6 | PAL-v2 | PAL-v2 (LSTM ⁴) [17] | Images converted from online data | 62.55% | 79.15%
7 | MathType | MathType (LSTM ⁴) [14] | Images converted from online data | 60.13% | 79.15%
8 | TUAT | body box (LSTM ⁴, PCFG ², SVM ⁵) [18] | Online data and offline data (converted from online data) | 39.95% | 58.22%
¹ Recurrent neural network, ² probabilistic context-free grammar, ³ bidirectional long short-term memory, ⁴ long short-term memory, ⁵ support vector machine.
Table 2. Sample symbols extracted from learning contents.

Learning Part | Learning Contents | Expressions | Extracted Symbols
Learning topic | <Linear Inequality> When the terms on the right side of the inequality are transposed to the left side, the inequality that appears in either (Linear Expression) < 0, (Linear Expression) > 0, (Linear Expression) ≤ 0, and (Linear Expression) ≥ 0 is called the linear inequality. | (Linear Expression) < 0 | <, 0
 | | (Linear Expression) > 0 | >, 0
 | | (Linear Expression) ≤ 0 | ≤, 0
 | | (Linear Expression) ≥ 0 | ≥, 0
Question | Find the range of values of the constant $a$ when the root of the equation $x-2=\frac{x+a}{3}$ is not greater than 1. | $a$ | a
 | | $x-2=\frac{x+a}{3}$ | x, a, −, =, +, ―, 2, 3
 | | $1$ | 1
Solving process | $3x-6=x+a$; $2x=a+6$; $x=\frac{a+6}{2}$; since $x \le 1$, $\frac{a+6}{2} \le 1$; $a+6 \le 2$ | $3x-6=x+a$ | x, a, −, =, +, 3, 6
 | | $2x=a+6$ | x, a, =, +, 2, 6
 | | $x=\frac{a+6}{2}$ | x, a, =, +, ―, 6, 2
 | | $x \le 1$ | x, ≤, 1
 | | $\frac{a+6}{2} \le 1$ | a, +, ―, ≤, 6, 2, 1
 | | $a+6 \le 2$ | a, +, ≤, 6, 2
Correct answer | $a \le -4$ | $a \le -4$ | a, ≤, −, 4
Table 3. Symbols used in the experiment.

Index | Symbol | LaTeX | Index | Symbol | LaTeX | Index | Symbol | LaTeX
1 | 7 | 7 | 18 | b | b | 35 | c | c
2 | 1 | 1 | 19 | a | a | 36 | A | A
3 | × | \times | 20 | F | F | 37 | B | B
4 | t | t | 21 | C | C | 38 | [ | [
5 | − | - | 22 | 5 | 5 | 39 | ] | ]
6 | 2 | 2 | 23 | 9 | 9 | 40 | < | \lt
7 | x | x | 24 | 8 | 8 | 41 | L | L
8 | = | = | 25 | π | \pi | 42 | h | h
9 | n | n | 26 | d | d | 43 | E | E
10 | y | y | 27 | ÷ | \div | 44 | V | V
11 | z | z | 28 | 0 | 0 | 45 | s | s
12 | ) | ) | 29 | g | g | 46 | q | q
13 | ( | ( | 30 | p | p | 47 | l | l
14 | + | + | 31 | r | r | 48 | v | v
15 | 6 | 6 | 32 | m | m | 49 | M | M
16 | 3 | 3 | 33 | ≤ | \leq | 50 | I | I
17 | 4 | 4 | 34 | . | . | | |
Table 4. Composition of the dataset used in the experiment.

Purpose | Training Set (HMS Recognition and LC Information Recognition) | Testing Set | Total
Number of data points | 81,265 | 8212 | 89,477
Table 5. Classification of LC data according to whether the symbol is included in each learning part.

Learning Part | Whether to Include the Symbol (Case 00 to 15) ¹
Learning topic | × × × × × × × × ○ ○ ○ ○ ○ ○ ○ ○
Question | × × × × ○ ○ ○ ○ × × × × ○ ○ ○ ○
Solving process | × × ○ ○ × × ○ ○ × × ○ ○ × × ○ ○
Answer | × ○ × ○ × ○ × ○ × ○ × ○ × ○ × ○
¹ ○: the symbol is included in the learning part; ×: the symbol is not included in the learning part. The case number corresponds to the 4-bit inclusion pattern over the four learning parts, read top to bottom.
Table 6. Classification of LC data for symbol ‘2’ and the number of data.

Case | Workbook (Correct Is 2) | Train Dataset (Correct Is 2) | Test Dataset (Correct Is 2) | Workbook (Correct Is Not 2) | Train Dataset (Correct Is Not 2) | Test Dataset (Correct Is Not 2)
00 | 58 (1.8%) | 95 (1.8%) | 10 (1.9%) | 183 (10.4%) | 6150 (10.4%) | 620 (10.4%)
01 | 41 (1.2%) | 66 (1.2%) | 7 (1.3%) | 62 (3.5%) | 2084 (3.5%) | 210 (3.5%)
02 | 136 (4.1%) | 219 (4.1%) | 22 (4.1%) | 58 (3.3%) | 1949 (3.3%) | 196 (3.3%)
03 | 111 (3.4%) | 179 (3.4%) | 18 (3.4%) | 41 (2.3%) | 1378 (2.3%) | 139 (2.3%)
04 | 244 (7.4%) | 394 (7.4%) | 40 (7.4%) | 324 (18.4%) | 10,888 (18.4%) | 1097 (18.4%)
05 | 157 (4.8%) | 253 (4.8%) | 26 (4.8%) | 179 (10.2%) | 6015 (10.2%) | 606 (10.2%)
06 | 980 (29.8%) | 1581 (29.8%) | 160 (29.8%) | 244 (13.9%) | 8200 (13.9%) | 826 (13.9%)
07 | 1042 (31.7%) | 1681 (31.7%) | 170 (31.7%) | 157 (8.9%) | 5276 (8.9%) | 532 (8.9%)
08 | 13 (0.4%) | 21 (0.4%) | 2 (0.4%) | 227 (12.9%) | 7628 (12.9%) | 769 (12.9%)
09 | 4 (0.1%) | 6 (0.1%) | 1 (0.2%) | 47 (2.7%) | 1579 (2.7%) | 159 (2.7%)
10 | 11 (0.3%) | 18 (0.3%) | 2 (0.4%) | 13 (0.7%) | 437 (0.7%) | 44 (0.7%)
11 | 9 (0.3%) | 15 (0.3%) | 1 (0.2%) | 4 (0.2%) | 134 (0.2%) | 14 (0.2%)
12 | 58 (1.8%) | 94 (1.8%) | 9 (1.7%) | 56 (3.2%) | 1882 (3.2%) | 190 (3.2%)
13 | 48 (1.5%) | 77 (1.5%) | 8 (1.5%) | 59 (3.4%) | 1983 (3.4%) | 200 (3.4%)
14 | 194 (5.9%) | 313 (5.9%) | 32 (6.0%) | 58 (3.3%) | 1949 (3.3%) | 196 (3.3%)
15 | 178 (5.4%) | 287 (5.4%) | 29 (5.4%) | 48 (2.7%) | 1613 (2.7%) | 163 (2.7%)
Total | 3284 | 5299 | 537 | 1760 | 59,145 | 5961
Table 7. Sample data points assigned LC data.

HMS (CROHME Dataset) | Correct Symbol | Extracted Symbols (across the four learning parts) | Input Position
(HMS image) | m | ., +, [, ], 2, 7, 9, =, m, 2, =, 1, 3 | Solving process
(HMS image) | 7 | (, [, ], 2, 3, 8, b, x, 2, 3, 8, =, b, 0, 3, b | Solving process
Table 8. Configuration of the artificial neural network for LC information recognition.

No. | Layer | Setting
1 | LSTM | Output dimension = 50
2 | Dropout | Rate = 0.5
3 | Fully connected (dense) | Output dimension = 50
4 | Softmax | Output dimension = 50
Table 9. Training set used for each experimental group.

Experimental Group | Training Set | Validation Set | Artificial Neural Network to Train
I | HMS (81,265 data points) | - | TAP
II | HMS output data obtained using the trained model of experimental group I (56,886 data points), with LC data (56,886 data points) | 24,379 data points (30%) | CAC

The dataset for each group totals 81,265 data points.
Table 10. Experiment results.

Experimental Group | Model | Test Subject | Solving Processes (6498) | Answers (1714) | Solving Processes + Answers (8212)
I | TAP | Recognition of HMS | 93.20% (6056) | 93.29% (1599) | 93.22% (7655)
II | TAP + CAC | Recognition of HMS outputs and LC data | 96.48% (6269) | 99.71% (1709) | 97.15% (7978)

Accuracy is given per test subject, with the number of correctly recognized symbols in parentheses.
Table 11. Corrected and missed symbols using the CAC model.

Recognition Result (TAP → TAP + CAC) | Number of Data | Output of TAP → Output of CAC (Number of Data)
Error → Correct | 404 (4.92%) | × → x (47); C → c (20); x → × (17); t → + (15)
Correct → Error | 81 (0.99%) | x → × (22); 1 → ) (5); a → 9 (4); 2 → = (4)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
