Handwritten Character Recognition on Android for Basic Education Using Convolutional Neural Network

An international initiative called Education for All (EFA) aims to create an environment in which everyone in the world can get an education. Especially in developing countries, many children lack access to a quality education. Therefore, we propose an offline self-learning application for primary level students to learn written English and basic calculation. It can also be used as a supplement for teachers to make the learning environment more interactive and interesting. In our proposed system, handwritten characters or words written on tablets are saved as input images. We then perform character segmentation using our proposed character segmentation methods. For character recognition, a Convolutional Neural Network (CNN) is used to recognize the segmented characters. To build our own dataset, handwritten data were collected from primary level students in developing countries. The network model was trained on a high-end machine to reduce the workload on the Android tablet. Various types of classifiers (digits and special characters, uppercase letters, lowercase letters, etc.) were created in order to reduce incorrect classifications. According to our experimental results, the proposed system achieves 95.6% word accuracy on 1000 randomly selected words and 98.7% accuracy at the character level.


Introduction
Education is very important to human life. Education enables people to have a quality life by gaining knowledge and building character. It provides a broad-based view of life, instilling ideals and teaching adaptability to a changing environment. Therefore, all citizens have the right to a quality education [1]. However, according to the UIS global data for the school year ending in 2018 [2], over 59 million children of primary school age were not in school. Many children in rural areas sell goods on the street or work in the fields to supplement family income, as seen in [3]. The author also notes that students often walk more than 3 km to and from school every day, since few schools exist in rural areas. Moreover, the lack of resources and an insufficient number of teachers present major barriers to a quality education. How can we provide high quality education to all children equally? For these reasons, we propose an Android application for primary education. Using this application, children can study anytime, anywhere, and transportation is no longer an obstacle. The courses are developed according to quality standards, and since the application includes an automatic checking system, students can practice alone. The application is not only useful in rural areas: children confined to their homes for any reason can get an education, and parents can arrange for their children to study at home without anxiety.
Research findings in [4] indicated that using handwriting applications alongside traditional teaching can improve children's handwriting skills. The main contributions of this paper are as follows:

•	In the character segmentation part, we propose simple labeling and projection concepts to overcome the segmentation problems that are found after initial segmentation.
•	Since the network model is trained on a high-end machine and converted into a very light classifier that is deployed on mobile devices, the model can be loaded quickly for each character classification process.
•	The application provides an effective learning environment without requiring experienced teachers or an internet connection. It can be accessed anytime, anywhere, and the standard courses provide equally high quality for all users.
The rest of this paper is organized as follows. In Section 2, we review some related research on handwritten character segmentation and recognition on an Android system. Section 3 covers the steps of our proposed methods. The experimental results and discussion of the results are described in Section 4. The paper is concluded, and future work is discussed in Section 5.

Related Work
As a subset of optical character recognition (OCR), handwritten character recognition is a technology that still needs refinement and is the subject of research worldwide. In recent years, OCR concepts have been used in applications for banking, healthcare, and the legal industry, e.g., for invoice imaging [11]. The authors in [12] and [13] apply various projection methods for the segmentation of digits and CNNs for recognition. In another work, Kannada optical character recognition was implemented on the Android operating system for Kannada sign boards [14]; in that research, a Kohonen network was used to recognize the segmented characters. Other related research on recognizing English letters has been carried out using simple feature extraction and a neural network [15]. A neural network was also applied in research on converting English text into Marathi text, likewise on the Android platform: an image with English text is scanned with an Android camera, subjected to several image processing steps, recognized through a neural network, and then converted into Marathi text [16]. Beyond English, a Convolutional Neural Network (CNN) has been applied for character recognition in other languages, such as Malayalam in [17].
In Reference [18], the authors proposed a Deep Convolutional Recurrent Network (DCRN) to recognize handwritten Japanese text lines. Their model is composed of three parts: a Convolutional Neural Network (CNN) with a sliding window, used for feature extraction from the handwritten image; bidirectional long short-term memory (BLSTM), applied in the recurrent layers to make predictions; and a Connectionist Temporal Classification (CTC) decoder, used in the transcription layer to produce the recognition result. To extract textual information from medical laboratory reports, the authors in [19] proposed a system using a deep learning approach: Faster RCNN [20] combined with a patch-based training strategy is applied for text detection, and a Convolutional Neural Network with a concatenation structure is proposed for text recognition. For Arabic handwritten character recognition, the authors in [21] proposed a mobile application using cloud computing and Google APIs. The handwriting recognition part runs on a cloud computing platform with the approach proposed in [22]; the recognized text is then returned to the mobile phone and used with the Google Maps API to locate the Arabic address on the map, and with the Google Translate API to translate the Arabic text into English. In our prior work, we mainly focused on the character segmentation part, which can handle segmentation errors [23]. In this paper, we modify our prior segmentation method to solve over-segmentation errors and combine it with a character recognition part to develop the complete application. In the handwritten character recognition field, machine learning and deep learning algorithms play an important role. In Reference [24], the authors proposed a handwritten character recognition model using deep neural networks and TensorFlow libraries. To perform specific handwritten character recognition on mobile devices, a lightweight network model is designed to perform the character classification process in [25]; the authors also describe the steps applied in the data collection process using mobile devices.
Many useful applications can be developed using OCR on the Android platform, including our proposed system. Our system is developed with our own dataset for a practical application area: a teaching aid for children's education. Since OCR is more reliable and provides more accurate results with machine learning algorithms, massive amounts of training and testing data are needed. Our proposed system therefore performs the network training process on a high-end machine instead of loading massive amounts of training data onto the Android platform. Using simple image processing methods without a deep learning approach provides efficient and reliable segmentation results in our system. Our system also gives good recognition accuracy with a simple network with few learnable parameters, which is more suitable for mobile devices.

Proposed System
The handwritten image is segmented and recognized. In the segmentation phase, the input image is first pre-processed. Second, the connected components are labeled. Third, the individually labeled objects are segmented using the vertical projection method, because some words might be written cursively. Finally, some over-segmented parts are removed, and the image is segmented with the remaining segmentation points. The resulting segmented characters are then recognized with a Convolutional Neural Network (CNN) classifier. The classifier is trained on a high-end machine and converted from the TensorFlow model into an Android-compatible (.tflite) file. Finally, the classification result for each character is shown to the user. An overview of our target application is shown in Figure 1 and the flowchart is shown in Figure 2. This includes pre-processing for input characters or words, segmentation, character classification and the checking process.

In our current system, we do not implement the character- or word-checking process. In the word-checking step, the classified result is compared with the correct ground-truth answer, and any incorrectly written characters are highlighted. This step will be integrated in our future application.

Segmentation
Segmentation is the process of separating individual characters from the handwritten word. We perform pre-processing on the input image before extracting characters. We use a labeling process to obtain each connected object, and then combine the labels of multi-stroke characters such as i, j and ?. Since some characters might be connected to each other, they would be treated as one character if we used only the labeling approach; therefore, we apply the projection profile method to segment cursive characters.

Pre-Processing
In the pre-processing step, the four lines from the ruled screen in the saved image must be removed without affecting the handwritten word. Moreover, the saved image must also be binarized and inverted to facilitate subsequent steps of image processing. Therefore, we perform pre-processing by inverting black and white pixels. We invert all black pixels into 1 and the other pixels into 0. Although the background in processed images is black in our system, a white background is used in this paper for better visualization. Since vertical projection is used for segmentation, a morphological thinning process is performed on the image.
Step-by-step pre-processing for the word "birthday" is illustrated in Figure 3.
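
As a concrete illustration, the following is a minimal Python sketch of this pre-processing chain, assuming OpenCV and scikit-image. The color-masking step for removing the ruled guidelines (green in our interface), the color tolerance, the fixed threshold of 128, and the function names are our illustrative assumptions rather than the exact implementation.

```python
import cv2
import numpy as np
from skimage.morphology import thin

def preprocess(img_bgr, guide_color=(0, 255, 0), tol=40):
    """Remove ruled guidelines, binarize/invert, and thin the strokes.

    guide_color/tol are assumptions: the four guidelines are drawn in a
    known color and are removed here by simple color masking.
    """
    # Mask out pixels close to the guideline color.
    diff = np.abs(img_bgr.astype(int) - np.array(guide_color)).sum(axis=2)
    img_bgr = img_bgr.copy()
    img_bgr[diff < tol] = 255          # paint guidelines white (background)

    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Invert while binarizing: ink (dark) pixels become 1, background 0.
    _, binary = cv2.threshold(gray, 128, 1, cv2.THRESH_BINARY_INV)

    # Morphological thinning to one-pixel-wide strokes, which makes the
    # later vertical-projection values directly comparable.
    return thin(binary.astype(bool)).astype(np.uint8)
```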


Labeling

After the pre-processing step, objects in the image are labeled in a four-step process. In the first step, connected objects are labeled. Since some characters such as i, j, ? and ! are composed of more than one object, we combine labeled objects in the second step. After combining labels, we create a new image with specific spaces between each labeled object to remediate overlapping characters in the third step. Since the projection profile approach is mainly used to segment cursive words, we connect small disjoints in the fourth step to reduce over-segmented points.

In the proposed system, there are 72 characters (digits, letters and special characters). A connected object refers to a group of connected pixels, and each connected object is assigned a unique label number. For example, the character i consists of two objects, and these two objects must be combined under one label number, which is called a combined label. The labeling steps are described below and illustrated in Figure 4.

(a) Initial labeling using connected regions: The objects are labeled using the connected-component labeling concept. The label number, horizontal start and end points, and vertical start and end points are retained for each labeled object. Labels are sorted in ascending order based on the horizontal start point, which allows the characters to be analyzed according to the stroke order used in composing them. In the label-combination step, we use the horizontal and vertical start and end points of two labels to extract the location of each label from the image. Next, we replace the larger label with the smaller one. After that, we recalculate the connected-component objects in the modified data. Then, the label numbers, horizontal start and end points, and vertical start and end points are recalculated, and the labels are re-sorted for use in the next step. In this way, we correctly label characters.

(b) Label combination for some characters: Generally, we use four categories of objects in the label combination process: completely covered objects, partially covered objects, uncovered objects and left-open objects. Based on empirical results, we assume that an object with fewer than 600 pixels cannot be a complete character. For the first category, if two consecutively written objects are completely covered by each other and the pixel counts of both objects are less than 600, they are combined. For the second category, if two consecutively written labels are partially covered by each other by at least 20 pixels, they are combined.
For the third category, if an object with a pixel count of less than 600 is found, it is combined with its left or right label. If the object is the first component of a word, it must be combined with the next object to the right, and if it is the last component of a word, it is combined with the object to its left. If the object is neither the first nor the last component of a word, we must determine whether it is associated with the object to the left or to the right. To find the associated object, the center point of the found object is calculated, along with the center points of the objects to the left and right; the found object is then combined with the closest adjacent object. We should not combine labels if the character is a decimal point. Since the pixel count of a decimal point also matches the third-category constraint, it could be wrongly combined, so we add a further constraint for third-category objects: labels are not combined if the object is horizontally covered by both neighbors.
In the final category, since characters composed of non-contiguous strokes (left open) are not included in the set of recognizable characters in the proposed system, the left-open object is combined with the object to its left. Using the flowchart in Figure 5, we determine whether each labeled object is left open or not. We have illustrated the left-open detection process using the word "birthday" as a sample image, in which left-open data must be detected for the character b. In the sample image, only three objects (the left-open data, r and a) remain as possibilities after the first two conditions are satisfied, so we can confirm the existence of left-open data once the third condition is satisfied. For detecting characters such as s and g, we consider one final condition, as they might result in two separate objects if divided; however, they cannot have a gap greater than half the image height, and they might contain a hole after the line is added.
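
As an illustration of the initial labeling and the first two combination categories, here is a Python sketch using SciPy's connected-component labeling. The function names and data layout are ours; only the 600-pixel and 20-pixel thresholds come from the description above, and the covering test is a simplified reading of the rules.

```python
import numpy as np
from scipy import ndimage

MIN_CHAR_PIXELS = 600   # empirical: smaller objects are not complete characters
MIN_OVERLAP = 20        # horizontal overlap (pixels) for "partially covered"

def label_objects(binary):
    """Initial labeling: per-object stats sorted by horizontal start point."""
    labels, n = ndimage.label(binary)
    stats = []
    for lab in range(1, n + 1):
        ys, xs = np.nonzero(labels == lab)
        stats.append({"label": lab, "x0": xs.min(), "x1": xs.max(),
                      "y0": ys.min(), "y1": ys.max(), "pixels": len(xs)})
    # Sorting by horizontal start point follows the stroke order of writing.
    stats.sort(key=lambda s: s["x0"])
    return labels, stats

def should_combine(a, b):
    """Categories 1 and 2 of the label-combination rules (a sketch)."""
    overlap = min(a["x1"], b["x1"]) - max(a["x0"], b["x0"])
    completely_covered = a["x0"] >= b["x0"] and a["x1"] <= b["x1"]
    if completely_covered and a["pixels"] < MIN_CHAR_PIXELS \
            and b["pixels"] < MIN_CHAR_PIXELS:
        return True                      # e.g., the dot and body of "i"
    return overlap >= MIN_OVERLAP        # partially covered neighbours
```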

(c) Label separation
In the label separation process, we create a new image with the same height as the current image and a new width, calculated as in (1). We use a space pixel count of k = 20 in our system. We put the first labeled object into the new image, then place the next object 20 pixels away from the first labeled object's end point, and place the remaining objects similarly. We place a fixed segment point halfway across the space to the next object:

w(I_new) = Σ_{i=1}^{n} (x_ri − x_li) + (n − 1) · k, (1)

where w(I_new) is the width of the new image, x_ri is the rightmost x-coordinate of object i, x_li is the leftmost x-coordinate of object i, n is the total number of objects, and k is the space pixel count.
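
A minimal sketch of this placement follows, assuming full-height binary crops ordered left to right; the names and data layout are ours, and the width follows Equation (1) as reconstructed above with k = 20.

```python
import numpy as np

def separate_labels(objects, k=20):
    """Place full-height binary crops left to right with k blank columns
    between neighbours; Equation (1) gives the new width."""
    h = objects[0].shape[0]
    new_w = sum(o.shape[1] for o in objects) + (len(objects) - 1) * k  # Eq. (1)
    canvas = np.zeros((h, new_w), dtype=np.uint8)

    x, fixed_points = 0, []
    for i, obj in enumerate(objects):
        canvas[:, x:x + obj.shape[1]] = obj
        if i < len(objects) - 1:
            # Fixed segment point halfway across the inter-object space.
            fixed_points.append(x + obj.shape[1] + k // 2)
        x += obj.shape[1] + k
    return canvas, fixed_points
```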

(d) Small disjoint connection
In the small disjoint connecting process, we dilate and erode each labeled object with the same structuring element in order to preserve the object size while connecting small disjoints. If the number of holes in the resulting object is increased by exactly one, we replace the original object with the dilated and eroded object. Otherwise, we retain the original object.
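
A sketch of this check, assuming SciPy: the hole count is computed as the number of background regions not touching the border, and the 3 × 3 structuring element is our assumption, since the text does not give its size.

```python
import numpy as np
from scipy import ndimage

def count_holes(obj):
    """Holes = background regions that do not touch the image border."""
    bg, n = ndimage.label(obj == 0)
    border = set(bg[0, :]) | set(bg[-1, :]) | set(bg[:, 0]) | set(bg[:, -1])
    return n - len(border - {0})

def connect_small_disjoints(obj, ksize=3):
    """Dilation followed by erosion with the same structuring element
    (a morphological closing); keep the result only if it creates
    exactly one new hole, as described above. ksize is our assumption."""
    struct = np.ones((ksize, ksize), dtype=bool)
    closed = ndimage.binary_closing(obj.astype(bool), structure=struct)
    if count_holes(closed) == count_holes(obj) + 1:
        return closed.astype(obj.dtype)
    return obj
```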
All four steps in the labeling process are illustrated in Figure 6 using the word "birthday".


Segmentation with Projection

Segmentation in cursive words is performed using vertical projection [26], which is the summation of pixel values over all rows for each column. After connecting small disjoints, we perform projection segmentation for each segmented object of the new image. The steps of projection segmentation are as follows:

Step 1. Find the vertical projection value for each x-coordinate in the segmented image using (2), and take the points where the projection value is 1:

vp(x) = Σ_{y=1}^{m} f(x, y), (2)

where vp(x) is the vertical projection (histogram) value at x-coordinate x, y is the y-coordinate, m is the height of the image, and f(x, y) is the pixel value at (x, y) of the image.

Step 2. Replace adjacent points that are less than 7 pixels apart with their average point, since a character cannot exist in the part segmented by such adjacent points.

Step 3. Retain points that match the following two constraints as projection segment points:
•	The first point of the group of adjacent points is not the leftmost point of the partial image.
•	The last point of the group of adjacent points is not the rightmost point of the partial image.

This process is illustrated in Figure 7. After performing the above steps for all segments, all projection segment points are combined with the fixed segment points obtained in the label combination step.
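
The following sketch implements Steps 1-3 on one thinned segment, assuming NumPy; the grouping and edge rules follow the constraints above, while the function and variable names are ours.

```python
import numpy as np

def projection_segment_points(segment, min_gap=7):
    """Steps 1-3: columns whose vertical projection equals 1 are candidates;
    nearby candidates are averaged; groups touching either edge are dropped."""
    vp = segment.sum(axis=0)            # Equation (2): sum f(x, y) over y
    candidates = np.where(vp == 1)[0]

    groups, run = [], []
    for x in candidates:
        if run and x - run[-1] >= min_gap:   # Step 2: new group if gap >= 7
            groups.append(run)
            run = []
        run.append(x)
    if run:
        groups.append(run)

    w = segment.shape[1]                # Step 3: drop edge-touching groups
    return [int(np.mean(g)) for g in groups if g[0] != 0 and g[-1] != w - 1]
```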

Over-Segments' Removal

Over-segmented points are discarded based on the following three points of logic: closed character detection, left-open character detection, and pixel count detection [27]. Closed character detection is useful for removing over-segments because an over-segmented character cannot have a loop or a semi-loop in its nearest neighbor, so a segment line is discarded if neither of its neighbors contains a closed character. A closed character is one with a loop or a semi-loop, such as a, c, n and o [28]. A closed character is detected using four pairs of foreground pixel points, as illustrated in Figure 8: two vertical lines, from a1 to a2 and from b1 to b2, and two horizontal lines, from c1 to c2 and from d1 to d2. In this process, the empirical results indicate that a pixel distance of 20 is best used as the threshold. The test proceeds as follows (a sketch follows the list):

1. Find two points, a1 and a2, whose x-coordinates are equal and whose y-coordinates differ by more than the threshold.
2. Similarly, find two more points, b1 and b2. In finding these points, the difference between the x-coordinates of a1 and b1 must also exceed the threshold.
3. Find c1 and c2 in the same way as a1 and a2, and then find d1 and d2 in the same way as b1 and b2.
4. If none of these four points are zero, the object is determined to be a closed character.
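
A hedged Python sketch of this four-pair test follows. Our reading is that a closed character must have two sufficiently long vertical foreground chords far enough apart, and likewise two horizontal ones; the exact point-picking may differ from the original implementation.

```python
import numpy as np

def is_closed_character(obj, threshold=20):
    """Approximate the Figure 8 test: vertical chords a1-a2 / b1-b2 and
    horizontal chords c1-c2 / d1-d2, each longer than the threshold."""
    def long_columns():
        cols = []
        for x in range(obj.shape[1]):
            ys = np.nonzero(obj[:, x])[0]
            if ys.size >= 2 and ys.max() - ys.min() > threshold:
                cols.append(x)            # a chord like a1-a2 exists here
        return cols

    def long_rows():
        rows = []
        for y in range(obj.shape[0]):
            xs = np.nonzero(obj[y, :])[0]
            if xs.size >= 2 and xs.max() - xs.min() > threshold:
                rows.append(y)            # a chord like c1-c2 exists here
        return rows

    cols, rows = long_columns(), long_rows()
    # b1-b2 must lie more than `threshold` columns from a1-a2 (and d1-d2
    # from c1-c2), so require two qualifying columns/rows far apart.
    return (len(cols) >= 2 and cols[-1] - cols[0] > threshold and
            len(rows) >= 2 and rows[-1] - rows[0] > threshold)
```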


A left-open object is again detected using the process described by the flowchart in Figure 5, and an over-segmented point between the object and its neighbor to the left is removed. If the pixel count of an object is less than the threshold (a number too small to be a character), we remove the projection segment point for that segment of the object. All three points of logic used in removing over-segmented points are illustrated in Figure 9. In this figure, the blue segmentation lines are drawn using the labeled points, and the red lines are drawn using the projection points. The 1st segment line is removed using the left-open data constraint, the 6th line is removed using the closed character constraint, and the 7th line is removed using the pixel count constraint.

Classification
After the handwritten characters are segmented, the final step is to classify the handwritten characters. We applied a convolutional neural network (CNN) model for feature extraction and classification processes. To perform character recognition on mobile phones and Android tablets, we used a framework based on an open-source software library called TensorFlow Lite [29]. We set up the TensorFlow Lite model in our system as described in [30]. TensorFlow was designed and developed by Google Brain. One of the advantages of using TensorFlow Lite is that the learning model does not need to be trained on mobile devices. The learning model is trained on a high-end machine, and then converted into the TensorFlow Lite model file (a .tflite file) by the converter. By using this file, the classification process can be performed on mobile devices.
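
A minimal conversion sketch using the current TensorFlow 2.x API follows (the 1.13 converter we used exposes a slightly different entry point); the file names are placeholders.

```python
import tensorflow as tf

# Train on the high-end machine, then convert the saved Keras model
# ("classifier.h5" is a placeholder name) into a .tflite file that the
# Android application can load.
model = tf.keras.models.load_model("classifier.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("classifier.tflite", "wb") as f:
    f.write(tflite_model)
```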

Image Normalization
In preparing for training and testing, we first performed image size normalization: each segmented character is normalized to 64 × 64 pixels. We converted the input image into a binary image, extracted a region of interest (ROI), and then performed a resizing operation. To preserve the aspect ratio, we first checked the numbers of rows and columns in the input image. In the first step, we resized the larger side to 64 pixels. In the second step, we zero-padded the smaller side up to 64 pixels: padding is inserted on the left and right for an image whose width is smaller than its height, and on the top and bottom when the height is smaller than the width. While resizing an input image, degradation can occur because of the small number of object pixels; to overcome this problem, we applied a morphological operation called thickening to the image before resizing it. This process is shown in Figure 10.
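
A sketch of this normalization, assuming OpenCV: centering the padded side and nearest-neighbor interpolation are our choices, and the thickening step described above is omitted for brevity.

```python
import cv2
import numpy as np

def normalize(char_img, size=64):
    """Crop the ROI, resize the larger side to 64 px, then zero-pad the
    smaller side, preserving the aspect ratio."""
    ys, xs = np.nonzero(char_img)
    roi = char_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # ROI crop

    h, w = roi.shape
    scale = size / max(h, w)
    roi = cv2.resize(roi,
                     (max(1, round(w * scale)), max(1, round(h * scale))),
                     interpolation=cv2.INTER_NEAREST)

    out = np.zeros((size, size), dtype=roi.dtype)
    h, w = roi.shape
    top, left = (size - h) // 2, (size - w) // 2  # pad top/bottom or left/right
    out[top:top + h, left:left + w] = roi
    return out
```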


Initial Classification with Convolutional Neural Network
For the CNN model configuration, the first hidden layer applies 20 convolutional filters of size 3 × 3 with same padding, followed by a Rectified Linear Unit (ReLU) layer. The second hidden layer uses 40 filters of size 5 × 5 with a ReLU layer, and the third hidden layer uses 60 filters of size 5 × 5 with a ReLU layer, followed by two fully connected layers. The first fully connected layer has 1000 nodes; the size n of the second fully connected (classification) layer varies with the classifier type. To downsample the layer size, we performed max pooling after each convolution operation with a size of 2 × 2 and a stride of 2. We proposed different types of classifiers and set the number of nodes in the final classification layer for each character type as follows: digits and special characters (20), uppercase (26), lowercase (26), the combination of uppercase and lowercase (52), digits and lowercase (36), and the combination of all types (72). For the training parameters, we used the Adam optimizer [31] with a learning rate of 0.001 and 4 epochs. The network architecture is illustrated in Figure 11.
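
The architecture can be summarized as the following Keras sketch. The "same" padding on the second and third convolutions, the ReLU on the first fully connected layer, and the cross-entropy loss are our assumptions, since the text specifies them only partially.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_classifier(n_classes):
    """Keras sketch of the proposed CNN: 20/40/60 filters, 2x2 max pooling
    with stride 2 after each convolution, FC-1000, then FC-n."""
    return tf.keras.Sequential([
        layers.Conv2D(20, 3, padding="same", activation="relu",
                      input_shape=(64, 64, 1)),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(40, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(60, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(1000, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_classifier(n_classes=20)   # e.g., digits and special characters
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=4, validation_data=(x_test, y_test))
```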


Differentiating Uppercase and Lowercase Characters
The system is designed to recognize digits, uppercase characters, lowercase characters, and some special characters. In the final result, however, some characters are confused between uppercase and lowercase. To solve this problem, we rely on a strength of our proposed system: it provides four lines on a ruled screen as guidelines to help children write. We calculate a threshold value from the first two lines of the four-line area, using the midpoint between the first two lines as the threshold. If the start point of the character is under the threshold value, it is considered a lowercase letter; otherwise, it is considered uppercase. The purpose of differentiating uppercase and lowercase characters is to display them correctly in the proposed application. Our list of easily confused characters is the following: C, c, O, o, P, p, S, s, U, u, V, v, W, w, X, x, Y, y, Z and z. If the recognized object is on this list, we use this differentiation approach. This process is illustrated in Figure 12.
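
A sketch of this rule follows; the argument names and the image coordinate convention (y increasing downward) are our assumptions.

```python
AMBIGUOUS = set("CcOoPpSsUuVvWwXxYyZz")

def adjust_case(ch, char_top_y, line1_y, line2_y):
    """Decide case from the writing guidelines: the threshold is the
    midpoint of the first two ruled lines; a character starting below it
    is lowercase, otherwise uppercase."""
    if ch not in AMBIGUOUS:
        return ch                        # only easily confused letters
    threshold = (line1_y + line2_y) / 2
    # y grows downward, so "under the threshold" means a larger y value.
    return ch.lower() if char_top_y > threshold else ch.upper()
```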


Data Collection
All experimental work was performed on a self-collected handwritten dataset. These data were collected from students in the targeted countries, and their grade range was from kindergarten to primary school. Sample rough data are illustrated in Figure 13. We collected handwritten data in the traditional way from Nepal, Myanmar, and the Philippines in the same format. Data from Bangladeshi students were also collected in a traditional way, but by using a different format. We also collected data from students in Myanmar using an Android tablet. We created a simple application to collect data written on an Android tablet. This application features a screen on which to draw, and two buttons: save and clear. The amount of data in our dataset is shown in Table 1.
Table 1. Dataset information (number of samples).

Dataset                  Digits   Uppercase   Lowercase   Special Characters    Total
Bangladesh                5107       758         754            350              6969
Nepal                      221       600         385            125              1331
Myanmar                    451       788        1026            120              2385
Philippines                134       690         628            283              1735
Tablet-collected data     2450      6287        6292           2217            17,246
Total                     8363      9123        9085           3095            29,666

The proposed system aims to help children write the alphabet, digits, and some mathematical characters. Therefore, we used different types of classifiers for these characters. The classifier types and their respective ID numbers are listed in Table 2. We performed training and testing operations on each classifier, splitting the images in an 80:20 ratio between training and testing. The training accuracy for each classifier is also shown in Table 2. While performing the training and testing processes on our self-collected handwritten dataset, the uppercase/lowercase differentiation condition was not applied; this condition is only applied in the Android application of our proposed system.

Experimental Results
The proposed CNN model (Mp) is compared with two other CNN models: model 1 (M1) and model 2 (M2). Each model has a different architecture, filter size, and output feature map; the detailed architecture of each model is shown in Table 3. To downsample the image layer size, we performed max pooling on the ReLU layer after each convolution operation with a size of 2 × 2 and a stride of 2. The first fully connected layers are composed of 80 and 100 nodes for M1 and M2, respectively. For the second fully connected (classification) layer, the output node sizes are the same as in the proposed architecture. Table 4 provides the testing accuracies of the proposed system compared with those of models M1 and M2. According to the experimental results, the proposed model (Mp) gives higher accuracy than the other compared models, with a recognition accuracy of over eighty percent for each classifier type: digits and special characters, uppercase letters, lowercase letters, the combination of uppercase and lowercase letters, digits and lowercase letters, and the combination of all characters.
Our proposed model is also compared with other methods on our local handwritten dataset, namely Histogram of Oriented Gradients (HOG) [32] features with a multiclass Support Vector Machine (SVM) classifier, and AlexNet [33]. The same training and testing datasets are applied to both models. For HOG feature extraction, we empirically set a 4 × 4 cell size with 9 orientation bins to extract prominent features from each image. The AlexNet model was set up as described in [34]; for its implementation on our dataset images, we used stochastic gradient descent with momentum (SGDM) as the optimizer, a batch size of 64 images, and a learning rate of 0.0001 with 4 epochs. The testing accuracies of the proposed model and the compared models are shown in Table 5. According to the experimental results, the proposed system achieves better recognition accuracy than the compared models for all classifier types. The confusion matrix of the Type 1 classifier (digits and special characters) is shown in Figure 14.
We then developed an Android application for our proposed system. In our application, we used the Android Studio IDE (version 3.4.1). The OpenCV computer vision library (opencv-3.4.6-android-sdk) is imported into our system to incorporate built-in image processing modules. For the classification process, we used TensorFlow Lite (version 1.13.1) to transform the trained network file (.h5) into a file (.tflite) that can be applied on Android phones and tablets. The application interface is shown in Figure 15. Students can write in a ruled text area with four lines provided as writing guidelines. This ruled interface is useful in distinguishing similar uppercase and lowercase characters, such as C, c, O, o, P and p. The user can choose between the various types of classification by entering the classifier ID number as the classifier type (for example, 1 for digits and special characters). The user can write on the ruled screen with four green guidelines, and then push the classify button to get a classification result.
The classification results for each character and for the number of characters are shown to the user. After adding lecture contents to the application, inputting the classifier type will no longer be necessary.
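
On the tablet, the .tflite file is executed through the TensorFlow Lite interpreter; the Python sketch below shows the equivalent per-character call pattern, with placeholder file and variable names.

```python
import numpy as np
import tensorflow as tf

# Load the converted model once; each segmented character is then
# classified with a single invoke() call.
interpreter = tf.lite.Interpreter(model_path="classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify(char_img64):
    """char_img64: one normalized 64x64 character (float32, 0/1 pixels)."""
    x = char_img64.reshape(1, 64, 64, 1).astype(np.float32)
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return int(np.argmax(interpreter.get_tensor(out["index"])))
```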
We tested our proposed system using 1000 randomly selected words written by children. Table 6 provides the segmentation accuracy by word. The recognition accuracy is calculated in two ways: for words and for characters. The results are described in Table 7. Some correctly recognized words, together with their segmentation lines and experimental results, are shown in Figure 16.

Limitations
Based on these results, our system cannot correctly segment some characters, such as a disjoint U or d, because U and LI are easily confused, as are d and cl. Wrongly segmented and recognized words are shown in Figures 17 and 18, respectively. The recognition results for some messy images are shown in Figure 19. The overall accuracy of our proposed system on the 1000 randomly tested words was 95.60%, and the overall accuracy on the 4607 constituent characters was 98.76%.

Conclusions

In this paper, we proposed an application for handwritten characters that can be applied as a teaching aid for children. The application enables self-study, so that children can learn without experienced teachers. We also proposed a character segmentation process to overcome the segmentation problems. For character recognition, we applied the CNN classifier to each segmented character. The recognition time for one character is approximately 0.3 s on a mobile phone or tablet, which is an acceptable time. In future work, we will integrate the word-checking process into our current system so that answers can be checked immediately. Moreover, we will collect more handwritten data to cover a wider variety of handwriting styles.

Funding: This research was supported in part by the Strategic Foundational Technology Improvement Support Operation, Japan.