Evaluation and Recognition of Handwritten Chinese Characters Based on Similarities

To accurately recognize ordinary handwritten Chinese characters, it is necessary to evaluate how normative these characters are. This study proposes methods to quantitatively evaluate and recognize such characters based on their similarities to printed characters. Three types of similarity, namely the correlation coefficient, pixel coincidence degree, and cosine similarity, are calculated between handwritten characters and printed Song typeface characters. Eight features derived from these similarities are used to verify the evaluation performance, and an artificial neural network is used to recognize the character content. The results demonstrate that the proposed methods deliver satisfactory evaluation effectiveness and recognition accuracy (up to 98% to 100%). This indicates that the recognition accuracy of ordinary handwritten Chinese characters can be improved by evaluating their normative level and standardizing writing actions in advance. Our study may offer insights for developing methods to identify handwritten Chinese characters used in transaction processing activities.


Introduction
The field of text recognition can be roughly divided into two categories: text content identification and signature-based authorization identification. The method proposed in this paper serves both purposes, identifying the content and confirming authorization. This research therefore lies between the need to confirm information content and the need for authorization identification, an area that, to our knowledge, has not been studied by others. After continuous exploration in recent years, researchers have found many effective Chinese character recognition methods, and recognition accuracy has improved greatly. However, a major factor currently limiting further gains in Chinese character recognition accuracy is the presence of very scribbled characters in the test set. Characters written on a touch screen inevitably differ from those written on paper, especially during banking transactions: when a customer writes "同意办理" (meaning "agree to proceed") on a touch screen, the result differs from how we normally write these characters on paper. Sometimes such handwritten characters are difficult even for humans to recognize. If the neatness of handwritten Chinese characters can be standardized in advance, the recognition rate should improve considerably.
The recognition of off-line handwritten Chinese characters has received considerable research attention owing to its wide range of applications and its significant economic and social benefits [1]. Recognition is usually based on features derived from static two-dimensional (2D) images [2]. However, major challenges remain because of the large number of Chinese characters, their varied writing styles, the many similar and easily confused characters, and the lack of handwritten Chinese character datasets [3,4]. Although many methods have been developed to improve recognition accuracy, even the best model, which is based on the traditional modified quadratic discriminant function (MQDF), falls far behind human beings in recognition accuracy [5]. One reason may be that the algorithms have little experience with scrawled and non-standard handwritten Chinese characters. It is therefore important to develop a quantitative measure that evaluates handwritten Chinese characters according to their normative level. Such a measure not only helps to standardize writing actions and improve recognition accuracy but also facilitates the development of handwriting-based Chinese character identification methods.
To date, most Chinese character evaluation methods focus on calligraphy works, with the aim of helping learners improve their writing skills. Researchers have also created various datasets tailored to the characteristics of characters in different fields to study handwritten character recognition in depth. For example, Perez et al. used computer techniques to simulate human understanding of calligraphy and built a database containing a large number of handwritten character images labeled as "beautiful" or "ugly" by two calligraphy connoisseurs [6]. By training a classifier based on the K-nearest neighbors algorithm, they labeled the test images automatically, although without quantitative detail. Kusetogullari et al. [7] introduced a new image-based dataset of handwritten historical digits named Arkiv Digital Sweden (ARDIS). The images in ARDIS are drawn from 15,000 Swedish church records written in different handwriting styles by different priests in the 19th and 20th centuries. The dataset consists of three single-digit datasets and one digit-string dataset. Experimental results show that machine learning algorithms (including deep learning methods) trained on existing datasets and tested on ARDIS achieve low recognition accuracy: convolutional neural networks trained on MNIST and USPS and tested on ARDIS achieved the highest accuracy rates, only 58.80% and 35.44%, respectively. The NIST dataset represents a logical extension of the MNIST classification task, and Cohen et al. [8] proposed a method to convert it into a format directly compatible with classifiers built for MNIST. Balaha et al. [9] provided a large and complex Arabic handwritten character dataset (HMBD) for designing an Arabic handwritten character recognition (AHCR) system. Sixteen experiments were conducted on the system using HMBD and two other datasets, CMATER and AIA9k; with data augmentation, the best test accuracies on these two datasets were 100% and 99.0%, respectively. To recognize handwritten digit strings in images of Swedish historical documents with a deep learning framework, a large dataset of historical handwritten digits named DIDA was introduced, consisting of Swedish handwritten documents written between 1800 and 1940 [10]. Li et al. [11] proposed an image-only model for handwritten Chinese character recognition. The approach was motivated by the desire for a purely data-driven method that avoids the subjective decisions associated with feature extraction. However, the method requires computing fuzzy similarity relations, which is complex. The study showed that characters with frame and single structures were the easiest to classify, with a 100% classification rate, whereas characters with upper-lower and surrounding structures were more challenging, with accuracies of 85% and 83.33%, respectively.
Researchers continue to explore various methods for the accurate evaluation and recognition of handwritten Chinese characters. Xu et al. [12] first proposed a concept learning-based method for handwritten Chinese character recognition. Unlike existing recognition methods based on image representation, this method uses prior knowledge to build a meta-stroke library and then combines Chinese character stroke extraction with Bayesian program learning to build a conceptual model of Chinese characters based on stroke relationship learning. Ren et al. [13] proposed an end-to-end recognizer for online handwritten Chinese characters based on a new recurrent neural network (RNN). In this system, the RNN directly processed the raw sequence data after simple coordinate normalization, and the recognition accuracy reached 97.6%. To address the low quality of historical Chinese character images and the lack of annotated training samples, Cai et al. [14] proposed a transfer learning method based on a generative adversarial network (GAN). After fine-tuning with real target-domain training data, the final recognition accuracy for historical Chinese characters reached 84.95%. Liu et al. [15] proposed a method combining deep convolutional neural networks and support vector machines: the features of Chinese characters were automatically learned and extracted by a deep convolutional neural network and then classified by a support vector machine. Experiments showed that the deep convolutional neural network could extract features effectively, avoiding the shortcomings of manual feature extraction and further improving accuracy.
To achieve accurate evaluations, it is necessary to introduce more features and calculate indexes that describe the global or local features of characters. Sun et al. proposed 22 global shape features to describe the overall properties of characters and a new 10-dimensional feature vector to represent their layout information [16]. Based on these features, they developed an artificial neural network model to quantitatively evaluate handwritten Chinese characters, and its performance was comparable to that of human beings. Wang et al. presented a calligraphy evaluation system that analyzed the characters' direction and font shape [17]. Several parameters, including a roundness index, smooth index, width index, "Sumi" ratio, and stability index, were calculated to provide a quantitative reference for learners. Considering that Chinese calligraphy falls within the domain of visual art, Xu et al. proposed a numerical method to evaluate calligraphy works from an aesthetic point of view, covering stroke shape, spatial layout, style coherence, and the whole character [18]. However, the algorithm was complex, and for scrawled Chinese characters it even required manual marking of the strokes for feature recognition and evaluation. Wang et al. took another important perspective, comparing test calligraphy images with standard ones [19]. They combined disk B-spline curve vectorization with an iterative closest point algorithm to evaluate the similarity between whole characters and between strokes, and finally computed a composite evaluation score. Although these methods are effective for quantitatively evaluating calligraphy works, their evaluation perspectives and extracted features need further adjustment before they can assess the normative level of characters.
Similarity is a basis for classification; it measures the extent to which two elements resemble each other in terms of their features. Here, we consider normative Chinese characters to be those with no scribbling and with a structure and style close to those of characters printed in Song typeface or regular script. The degree of similarity between handwritten Chinese characters and printed Song typeface characters is used as a quantitative criterion for evaluating the normative level. To evaluate the writing neatness of off-line handwritten Chinese characters, this paper takes "同意办理" (agree to proceed), a phrase commonly used in banking business processing, as the research object and evaluates the neatness of handwritten characters using a small self-constructed dataset. The four handwritten characters "同", "意", "办", and "理", which are frequently used in banking and telecommunications transactions, are chosen for evaluation and recognition. We thoroughly compare three similarity measures: the correlation coefficient, pixel coincidence degree, and cosine similarity. We then extract eight features from the three similarities to differentiate characters written by different persons at different normative levels, and use an artificial neural network to recognize the character content. This strategy enables not only quantitative evaluation but also more accurate recognition of ordinary handwritten Chinese characters.
Our contributions in this work are summarized as follows:

1. To verify the accuracy of the similarity evaluation, we established a small handwritten library of "同意办理" (meaning "agree to proceed") written as required by five people. Each person wrote "同意办理" 20 times, giving a total of 5 × 20 × 4 = 400 Chinese characters.

2. To eliminate the influence of stroke thickness on the similarity calculation, we use the parallel iterative Zhang-Suen (Z-S) algorithm to extract the skeletons of the preprocessed Chinese character images. All pixels of the binary image are scanned one by one, and arithmetic and logical operations are performed in order on the eight neighbors of each pixel. Based on the results of these operations, the algorithm decides whether each pixel should be deleted. The resulting character skeletons help increase the diversity of the similarity features.

3. To quantitatively evaluate the neatness of handwritten Chinese characters, we apply three similarity coefficients: the correlation coefficient, the Tversky index, and cosine similarity. Eight similarity features between handwritten and template characters are extracted to distinguish characters written by different people at different normative levels, and an artificial neural network is used to identify the character content. The features required for the cosine similarity calculation are extracted through concentric circle segmentation, texture features, grid features, and image projection. Compared against manual neatness evaluation, the recognition accuracy on the self-built handwritten Chinese character dataset can exceed 90%.
The paper is organized into four sections. Section 1, the introduction, presents the research background and significance of the work, summarizes methods for handwritten Chinese character recognition and evaluation, and introduces several datasets. Section 2 presents the details of the proposed method, including image preprocessing and the different feature extraction methods. Section 3 evaluates and recognizes handwritten Chinese characters using different machine learning methods. Section 4 concludes the paper.

Experimental Method
To evaluate the neatness of a writer's handwriting, we process the images of the samples in the proposed Chinese character dataset and compare them with templates. As shown in Figure 1, the procedure is as follows:

1. Use an image projection algorithm to cut out each single Chinese character from the written phrase or sentence.

2. Convert the handwritten Chinese character image to grayscale using the weighted average method. Then obtain the optimal threshold from the peaks and troughs of the grayscale histogram, and segment the image with a global threshold. After binarization, apply a filter to eliminate isolated points in the background of the character image.

3. Remove the blank margins of the character image using the projection method, so that the position of the handwritten character within the image is as consistent as possible with that of the template character. Then use bicubic interpolation to unify the image size, scaling all cropped handwritten and template character images to 100 × 100 pixels.

4. Apply the parallel iterative Z-S skeleton extraction algorithm to the preprocessed character images. Process the selected template character images in the same way to obtain the set of character images to be tested.

5. Extract the similarity features of the handwritten Chinese characters, evaluate their normative degree based on these features, and compare the results with manual evaluation.
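Steps 1 to 3 above can be sketched in plain Python as follows. This is a minimal, hypothetical sketch rather than the authors' implementation: images are assumed to be lists of rows, the BT.601 grayscale weights (0.299, 0.587, 0.114) and the "tall and far" heuristic for locating the second histogram peak are our assumptions, and ink pixels are encoded as 1 after binarization. The projection cut of step 1 is applied here after binarization, because a projection needs an ink mask. Bicubic resizing to 100 × 100 and skeleton extraction are not shown; the former is usually delegated to an image library such as Pillow's `Image.resize` with bicubic resampling.

```python
# Hypothetical sketch of preprocessing steps 1-3 (not the authors' code).
# Images are lists of rows; after binarization, ink = 1 and background = 0.


def to_gray(rgb):
    """Weighted-average grayscale; the BT.601 weights are an assumption."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def valley_threshold(gray):
    """Return the histogram minimum between the two dominant peaks."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    p1 = max(range(256), key=lambda i: hist[i])
    # Assumed heuristic: the second peak is both tall and far from the first.
    p2 = max(range(256), key=lambda i: hist[i] * (i - p1) ** 2)
    lo, hi = sorted((p1, p2))
    return min(range(lo, hi + 1), key=lambda i: hist[i])


def binarize(gray, t):
    """Global thresholding: dark pixels (ink) become 1."""
    return [[1 if v < t else 0 for v in row] for row in gray]


def remove_isolated(img):
    """Filter out ink pixels that have no ink among their 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] and not any(
                    img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w):
                out[y][x] = 0
    return out


def split_columns(img):
    """Step 1: cut characters apart at empty columns (vertical projection)."""
    cols = [sum(col) for col in zip(*img)]
    parts, start = [], None
    for x, s in enumerate(cols + [0]):  # trailing 0 closes the last run
        if s and start is None:
            start = x
        elif not s and start is not None:
            parts.append([row[start:x] for row in img])
            start = None
    return parts


def crop(img):
    """Step 3: drop blank margins using row and column projections."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x, col in enumerate(zip(*img)) if any(col)]
    if not rows:
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]
```

For a scanned phrase, the helpers chain as `g = to_gray(rgb)`, then `parts = split_columns(remove_isolated(binarize(g, valley_threshold(g))))`, and finally `crop(p)` for each `p` in `parts`.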

Dataset Production
Various datasets have been created and used for the recognition and classification of handwritten characters in different languages. Below we give an overview of several of them: IAM, RIMES, SHIBR, Parzival, Washington, Saint Gall, GERMANA, Esposalles, and RODRIGO. The IAM dataset consists of 1539 pages of handwritten modern English text written by 657 writers and is divided into three subsets used for training, validation, and testing [20]. The RIMES database is composed of mail such as that sent by individuals to companies by fax or post [21]; 12,723 pages written by 1300 volunteers were collected, corresponding to 5605 mails. The SHIBR dataset is semi-annotated, which eases the development of automated and semi-automated machine learning methods for document analysis applications [22]. The Parzival dataset contains the epic poem Parzival, one of the most important epic works of the European Middle Ages, written with ink on parchment in Middle High German [23]. The George Washington database is a baseline database for text line segmentation, word spotting, and word recognition tasks; it consists of 20 historical handwritten document images written in English in longhand script with an ink pen in the eighteenth century [24]. RODRIGO is written entirely in old Castilian (Spanish) by a single author and is comparable in size to standard databases. It is an 853-page bound volume divided into 307 chapters describing chronicles of Spanish history [25]; most pages contain only a single text block of nearly calligraphed handwriting on well-separated lines. The Saint Gall database is made up of Latin manuscripts written in Carolingian script [26]. GERMANA is the result of digitizing and annotating a 764-page Spanish manuscript from 1891, in which most pages contain only nearly calligraphed text written on ruled sheets with well-separated lines [27]. The Esposalles database is a Spanish historical handwritten document image database consisting of 173 document images [28]. The documents were written between 1451 and 1905 and contain information from the marriage licenses of Spanish citizens. Table 1 compares these datasets.
To create our dataset, we invited five test subjects with large differences in writing style and level, identified as A, B, C, D, and E, to write Chinese characters. In banking and telecommunications business processing, we found that the four characters "同意办理" (meaning "agree to proceed") were used most frequently; this paper therefore selects these four characters as the research object. We focus on Chinese because other languages are rarely used in practical business applications in China. Each subject was asked to handwrite "同意办理" 20 times, so each person wrote 80 characters and the five subjects produced 400 characters in total. The four characters "同意办理" printed in italic Song typeface serve as the standard character template. This database has the following characteristics:

1. The characters in the database are the words most frequently used in banking and telecommunications business. Testing on these characters therefore reflects the effectiveness of the proposed method for both content recognition and authentication.

2. The samples in this dataset are intended for evaluating the neatness of Chinese characters through similarity feature extraction, and we consider 400 samples sufficient to verify the accuracy and reliability of the method.

3. The writing styles of the sample characters vary greatly, which makes manual judgment of writing quality straightforward. The neatness evaluation results in this paper are compared with the results of manual evaluation.

Image Capturing and Skeleton Extraction
Five persons were asked to write four ordinary Chinese characters, "同", "意", "办", and "理", on paper with a signature pen. We chose these four characters because they are the most commonly used in teller services. The written characters were then captured as digital images for subsequent comparison. The characters "同意办理" (meaning "agree to proceed") in Song typeface were adopted as the standard characters and captured as templates. Before image processing, eight senior handwriting experts evaluated the written characters, and their evaluations were later used for comparison. Figure 2 shows the images of the characters after preprocessing, which consisted of graying, binarization, denoising, and scaling. These preprocessing steps eliminate errors caused by background color differences while minimizing the influence of writing position and character size. Character skeleton extraction [29,30] is regarded as a key preprocessing step because it reflects the writing trace with less interference from contour noise. Hence, we extracted the skeletons of the characters before calculating the cosine similarity, as shown in Figure 3. Skeleton extraction started from the binary images in Figure 2, in which each pixel has a value of 0 (black) or 1 (white). First, the binary character images were inverted, so that the original white background turned black and the black character turned white. Then, isolated white pixels, which had no white neighbors, were removed by setting them to 0. Next, binary images with a black background and a white character skeleton were obtained. Finally, the images were inverted again to obtain black skeletons on a white background. The purpose of skeleton extraction was to investigate whether the thickness of Chinese characters affects the similarity calculation results.
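The Z-S thinning algorithm named in the method overview can be sketched as follows. This is a hedged, minimal implementation of the classical Zhang-Suen parallel thinning algorithm, not the authors' code; it assumes the inverted representation described above (character = 1, background = 0) and a blank one-pixel border, since border pixels are never examined. In each sub-iteration, every candidate pixel is tested against the same snapshot of the image and all deletions are applied together, which is what makes the algorithm parallel.

```python
def zhang_suen(img):
    """Thin a binary image (1 = stroke) with the Zhang-Suen algorithm."""
    img = [row[:] for row in img]  # work on a copy
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # B(P1): number of stroke neighbours
                    # A(P1): number of 0 -> 1 transitions in P2..P9, P2.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:  # apply deletions in parallel
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img
```

For example, a solid 3 × 3 block thins to its single centre pixel, while a one-pixel-wide line is left unchanged.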

Correlation Coefficient
The correlation coefficient [31] between two random variables is a measure of their linear dependence. If variables A and B each have N scalar observations, the Pearson correlation coefficient is defined as

$$\rho(A,B)=\frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{A_i-\mu_A}{\sigma_A}\right)\left(\frac{B_i-\mu_B}{\sigma_B}\right) \qquad (1)$$

where $\mu_A$ and $\sigma_A$ are the mean and standard deviation of A, and $\mu_B$ and $\sigma_B$ are the mean and standard deviation of B.
Figure 4 shows the flow chart for calculating the Pearson correlation coefficient between a test character and its template. After preprocessing, both the test and template images were resized to 100 rows by 100 columns. Vectors A and B were then defined as one-dimensional column vectors of 10,000 elements each, obtained by rearranging the pixel matrices of the template and test images as illustrated in Figure 4. The correlation coefficient between A and B was calculated using Formula (1); the higher the coefficient, the more similar the test image is to the template image.
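Formula (1) applied to two preprocessed 100 × 100 images can be sketched as below (plain Python; `flatten` and `pearson` are hypothetical helper names). We flatten column by column to mirror the column vectors of Figure 4; any fixed ordering yields the same coefficient as long as both images use it. The sketch uses sample standard deviations, matching the $1/(N-1)$ factor in Formula (1).

```python
import math


def flatten(img):
    """Rearrange a 2-D pixel matrix into one vector, column by column."""
    return [v for col in zip(*img) for v in col]


def pearson(a, b):
    """Formula (1): Pearson correlation with sample standard deviations."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("vectors must have the same length >= 2")
    mu_a, mu_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((x - mu_a) ** 2 for x in a) / (n - 1))
    sd_b = math.sqrt(sum((y - mu_b) ** 2 for y in b) / (n - 1))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return sum((x - mu_a) / sd_a * (y - mu_b) / sd_b
               for x, y in zip(a, b)) / (n - 1)
```

In use, `pearson(flatten(template), flatten(test))` returns a value in [-1, 1]; a blank (constant) image raises an error because its standard deviation is zero.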

Pixel Coincidence Degree
The Tversky index [32][33][34] is another measure of similarity between two sets; in this paper, it was used to calculate the degree of pixel coincidence between the test and template images for further comparison. Following this concept of similarity, a function F measuring the similarity between sets A and B was defined as shown in Formula (2), which is the Tversky index with both weighting parameters set to 1:

$$F(A,B)=\frac{|A\cap B|}{|A\cap B|+|A-B|+|B-A|}=\frac{|A\cap B|}{|A\cup B|} \qquad (2)$$

Here, A ∩ B is the intersection of A and B; A ∪ B is their union; A − B represents features belonging to A but not to B, and B − A represents features belonging to B but not to A. These relationships are illustrated schematically in Figure 5.
Figure 6 is a flow chart of the process of calculating the degree of pixel coincidence between the template and test images. First, the original template and test images were preprocessed following the procedure described in Section 2.2 and converted into binary images with black characters on white backgrounds. The images were then inverted to obtain white characters on black backgrounds. To calculate the similarity with Formula (2), the pixel matrix of the template image was defined as set A and that of the test image as set B. Next, the intersection and union of sets A and B were calculated; the corresponding images are shown in Figure 6. In binary images, each pixel can be treated as a logical variable taking the value 0 or 1. Hence, the intersection was obtained by performing an AND operation on every pair of pixels at the same position in the template and test images and counting the pixels with value 1. Likewise, the union was obtained by performing an OR operation on every pair of pixels at the same position and counting the pixels with value 1. Finally, the degree of pixel coincidence was calculated using Formula (2).
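The AND/OR counting described above can be sketched in a few lines of plain Python. This is a hypothetical sketch: `invert` and `pixel_coincidence` are illustrative names, the input images are assumed to be black-on-white binary matrices (character = 0) of equal size, and treating two blank images as identical is our own convention.

```python
def invert(img):
    """Black-on-white binary image (character = 0) -> character = 1."""
    return [[1 - v for v in row] for row in img]


def pixel_coincidence(template, test):
    """Formula (2): |A AND B| / |A OR B| over same-position pixel pairs."""
    inter = union = 0
    for row_a, row_b in zip(template, test):
        for a, b in zip(row_a, row_b):
            inter += a & b  # pixel belongs to both characters
            union += a | b  # pixel belongs to either character
    # Two blank images are treated as identical (our convention).
    return inter / union if union else 1.0
```

Typical use is `pixel_coincidence(invert(template_bw), invert(test_bw))`. Because the numerator counts only shared stroke pixels, a thick template stroke overlapping a thin handwritten stroke enlarges the union more than the intersection, which is consistent with the low values reported in the results.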

Cosine Similarity
Cosine similarity [35] measures the similarity between two vectors. It is calculated by Formula (3):

$$\cos\theta=\frac{A\cdot B}{\|A\|\,\|B\|}=\frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}} \qquad (3)$$

Here, A and B are the feature vectors, $A_i$ and $B_i$ are their elements, n is the vector length, and θ is the angle between A and B in the vector space.
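Formula (3) translates directly into code. The following minimal sketch (plain Python; the function name is illustrative) raises an error for a zero vector, for which the angle is undefined.

```python
import math


def cosine_similarity(a, b):
    """Formula (3): cos(theta) = (A . B) / (|A| |B|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because pixel sums and counts are non-negative, the feature vectors used here always give values between 0 (no overlap in any feature) and 1 (proportional vectors).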
To calculate the cosine similarity, feature vectors must first be extracted from the digital images; four different approaches were adopted here.
In the first approach, after image preprocessing and skeleton extraction, row and column feature vectors were obtained by simply summing the pixels in each row and each column.
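The first approach can be sketched as below. The helper names are hypothetical, and because the text does not say whether the row and column projections were concatenated or compared separately, concatenation is our assumption. The sketch also illustrates why character position matters for this feature: shifting a vertical stroke by one pixel leaves the row sums unchanged but moves the column sums.

```python
import math


def projection_features(img):
    """First approach: row sums followed by column sums (concatenated)."""
    return [sum(row) for row in img] + [sum(col) for col in zip(*img)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors (Formula (3))."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))
```

For example, a 3 × 3 image with a vertical stroke in column 0 and the same image with the stroke in column 1 give feature vectors [1, 1, 1, 3, 0, 0] and [1, 1, 1, 0, 3, 0], whose cosine similarity is only 0.25 although the strokes are identical in shape.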
In the second approach, the gray-level co-occurrence matrix was used to extract texture parameters as feature vectors, calculated by Formula (4), where P is the probability of a pixel pair, R is the normalization factor, N is the size of the character images, and θ is the scanning angle of the pixel pairs. After preprocessing, each image was converted and compressed into a grayscale image. The texture features included energy, contrast, entropy, mean, and variance, among others, which reflect slow variations or periodicity of the image surface. When calculating the co-occurrence matrix, pixel pairs can be taken in the horizontal direction, the vertical direction, along a 45-degree downward slope, or along a 45-degree upward slope [36][37][38].
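The second approach can be sketched as follows. The exact form of Formula (4) may differ; this sketch uses common textbook definitions of the gray-level co-occurrence matrix (GLCM) and its texture features, normalizing the counts by the total number of pixel pairs, which plays the role of R. The number of gray levels after compression, the pair distance of 1, and the direction names are our assumptions.

```python
import math

# Offsets (dy, dx) for the four scanning directions named in the text.
OFFSETS = {"horizontal": (0, 1), "vertical": (1, 0),
           "diag_down": (1, 1), "diag_up": (-1, 1)}


def quantize(gray, levels=8):
    """Compress 0-255 gray values into `levels` bins (bin count assumed)."""
    return [[v * levels // 256 for v in row] for row in gray]


def glcm(q, levels, direction="horizontal", distance=1):
    """Normalised co-occurrence matrix P; R is the total number of pairs."""
    dy, dx = (distance * d for d in OFFSETS[direction])
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(q), len(q[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[q[y][x]][q[y2][x2]] += 1
    r = sum(map(sum, counts))
    return [[c / r for c in row] for row in counts]


def texture_features(p):
    """Energy, contrast, entropy, mean, variance (common definitions)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    mean = sum(i * v for i, _, v in cells)
    variance = sum((i - mean) ** 2 * v for i, _, v in cells)
    return [energy, contrast, entropy, mean, variance]
```

A feature vector for one image would be `texture_features(glcm(quantize(gray), 8, "horizontal"))`, optionally concatenated over the four directions. Because these statistics summarize the whole image, small local differences between handwritten characters have little effect on them.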
In the third approach, after image preprocessing and skeleton extraction, each character image was divided into 24 regions as shown in Figure 7. The sum of the pixel values in each region was taken as one element of the feature vector, and the elements were arranged in the order shown in Figure 7b. In the fourth approach, a preprocessed binary image (100 × 100 pixels) was divided into 100 regions as shown in Figure 8a, and the sum of the pixel values in each region was calculated, compressing the image to 100 values. If the sum was less than 50, the whole region was set to 0; otherwise, it was set to 1. These values were then used as feature vector elements, arranged in the order shown in Figure 8b.
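The fourth approach compresses a 100 × 100 binary image into 100 binary values. The minimal sketch below assumes 10 × 10 cells, row-major cell ordering in place of the order shown in Figure 8b, and that the summed pixels have value 1 (with the paper's convention of 0 = black, the sums count white pixels). The function name is hypothetical.

```python
def grid_features(img, cell=10, min_sum=50):
    """Fourth approach: 100 x 100 binary image -> 100 binary cell values."""
    h, w = len(img), len(img[0])
    feats = []
    for y0 in range(0, h, cell):  # row-major cell order (assumed)
        for x0 in range(0, w, cell):
            s = sum(img[y][x]
                    for y in range(y0, min(y0 + cell, h))
                    for x in range(x0, min(x0 + cell, w)))
            feats.append(0 if s < min_sum else 1)  # threshold of 50
    return feats
```

The resulting vectors can be compared with Formula (3). Because each cell aggregates 100 pixels, a stroke shifted by a pixel or two usually stays in the same cell, which is why this approach is less sensitive to position than raw row and column sums.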

Correlation Coefficient Results
By calculating the correlation coefficient between the test and template character images, we evaluated the Chinese characters "同", "意", "办", and "理" (meaning "agree to proceed") written by five persons labeled "A", "B", "C", "D", and "E". The results are shown in Table 2. The largest correlation coefficient was 0.47 and the smallest only 0.01. The coefficients of the characters written by person A were no less than 0.32, whereas those of person E were no greater than 0.08. All values were below 0.5, probably because the template characters usually have thicker strokes than the handwritten ones; nevertheless, the results were largely consistent with our subjective impressions. Notably, the characters written by person E were the most scrawled and had the smallest correlation coefficients. Unfortunately, although experienced calligraphy experts have no trouble recognizing scrawled characters, performing such tasks based only on correlation coefficients is fairly challenging for a machine. This also suggests that characters written in a standard style are needed for the automatic recognition of electronic signatures, and that perfunctory writing should be avoided.

Pixel Coincidence Degree Results
The degree of pixel coincidence between the test and template characters was calculated using Formula (2), and the results are shown in Table 3. All images were preprocessed using the procedure described in Section 2.2. All pixel coincidence values were below 0.5, but they clearly reflect the normative level of the characters: the more normative the writing, the higher the value. Compared with the correlation coefficient method, the pixel coincidence method produced slightly higher similarity values for group E and thus delivered better recognition results for that group. This may be because the correlation coefficient method treats the pixel column as a random variable and yields a small coefficient when the test and template characters differ greatly, whereas the pixel coincidence method evaluates the degree of overlap between them, and the proportion of overlap is not extremely low even when the difference is large. Nevertheless, for characters written by persons A, B, C, and D, the pixel coincidence method differentiated less effectively than the correlation coefficient method.

Cosine Similarity Results
Feature vector extraction is one of the key steps in calculating the cosine similarity. Here, we obtained the feature vectors using the four approaches described in Section 2.5; the results are illustrated in Figure 9. In each subgraph, the x-axis labels "同", "意", "办", and "理" (meaning "agree to proceed") denote the four characters after image preprocessing, and "S同", "S意", "S办", and "S理" denote the four characters after skeleton extraction. "A", "B", "C", "D", and "E" denote the five subjects at different levels. Figure 9a compares the cosine similarity values calculated from the handwritten character images before and after skeleton extraction, with feature vectors computed using the first approach described in Section 2.5. As Figure 9a shows, the cosine similarity of the skeleton images was significantly lower than that of the preprocessed images. This is because, after skeleton extraction, differences in character position within the images had a greater impact on the result, leading to larger differences in the row pixel sums between the template and test characters.
Figure 9b shows the cosine similarity values calculated from the texture features. Texture features were extracted from grayscale images, whereas skeleton extraction was performed on binary images, so no skeleton extraction is involved here. Because texture features can hardly capture the small differences between handwritten Chinese characters, this approach cannot distinguish the characters effectively.
The results shown in Figure 9c were obtained by processing the character images in the third way described in Section 2.5. The data show that, when this way is used to extract feature vectors and calculate cosine similarity values, whether the skeleton is extracted has little influence on the similarity results. Specifically, dividing the characters into regions reduces the influence of character position differences on the similarity, and stroke thickness varies little among characters written with a hard pen. Consequently, skeleton extraction had minimal influence on the calculated similarity values.
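The position tolerance of regional segmentation can be sketched as follows. The sketch assumes that the 24 regions form a 4 × 6 grid and that each feature is the number of stroke pixels in one region. The actual partition in Section 2.5 may differ, and the 24 × 24 test images are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[int]]  # binary image: 1 = stroke pixel, 0 = background


def region_features(img: Image, grid_rows: int = 4, grid_cols: int = 6) -> List[int]:
    """ASSUMED 'third way': split the image into grid_rows x grid_cols regions
    (4 x 6 = 24 here) and count the stroke pixels in each region."""
    h, w = len(img), len(img[0])
    feats = []
    for i in range(grid_rows):
        r0, r1 = i * h // grid_rows, (i + 1) * h // grid_rows
        for j in range(grid_cols):
            c0, c1 = j * w // grid_cols, (j + 1) * w // grid_cols
            feats.append(sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return feats


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flatten(img: Image) -> List[int]:
    """Raw pixel vector, i.e. no segmentation at all."""
    return [p for row in img for p in row]


def vertical_bar(cols, size=24):
    """Hypothetical image: a vertical stroke occupying the given columns."""
    return [[1 if c in cols else 0 for c in range(size)] for _ in range(size)]


template_skeleton = vertical_bar({9})
test_skeleton = vertical_bar({10})       # same stroke, shifted one column
test_thick = vertical_bar({9, 10, 11})   # same stroke, written thicker

print(round(cosine_similarity(flatten(test_skeleton), flatten(template_skeleton)), 3))  # 0.0
print(round(cosine_similarity(region_features(test_skeleton),
                              region_features(template_skeleton)), 3))  # 1.0
print(round(cosine_similarity(region_features(test_thick),
                              region_features(template_skeleton)), 3))  # 1.0
```

A one-column shift of a skeleton stroke gives a raw pixel cosine of 0, but both strokes fall in the same region, so their region features are identical. A thicker stroke only scales the region counts, which leaves the cosine similarity unchanged. This is consistent with the observation that skeleton extraction hardly affects the results of the third way.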
Figure 9d shows the cosine similarity results obtained with feature vectors calculated in the fourth way described in Section 2.5. This way also reduces the influence of character position differences on the similarity results. To understand how stroke thickness influences the similarity results, we compared the similarity values calculated from images with and without skeleton extraction. The analysis shows that skeleton extraction magnifies the position difference between the test characters and the template, which is detrimental to similarity evaluation; however, appropriate segmentation can reduce the impact of this position deviation.
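The sketch below assumes that the fourth way sums the pixels of non-overlapping 10 × 10 tiles (100 pixels each). The caption of Figure 9d describes this way only as summing 100 pixels, so the tiling is an assumption. The 20 × 20 images are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[int]]  # binary image: 1 = stroke pixel, 0 = background


def block_sum_features(img: Image, block: int = 10) -> List[int]:
    """ASSUMED 'fourth way': sum each non-overlapping block x block tile
    (10 x 10 = 100 pixels); partial tiles at the edges are summed as well."""
    h, w = len(img), len(img[0])
    feats = []
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            tile = [img[r][c]
                    for r in range(r0, min(r0 + block, h))
                    for c in range(c0, min(c0 + block, w))]
            feats.append(sum(tile))
    return feats


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vertical_bar(cols, size=20):
    """Hypothetical image: a vertical stroke occupying the given columns."""
    return [[1 if c in cols else 0 for c in range(size)] for _ in range(size)]


def block_cosine(a: Image, b: Image) -> float:
    return cosine_similarity(block_sum_features(a), block_sum_features(b))


# Skeleton stroke shifted across the tile boundary between columns 9 and 10.
print(round(block_cosine(vertical_bar({10}), vertical_bar({9})), 3))                # 0.0
# Three-pixel stroke with the same one-column shift keeps partial overlap.
print(round(block_cosine(vertical_bar({9, 10, 11}), vertical_bar({8, 9, 10})), 3))  # 0.8
# Skeleton stroke shifted inside one tile: features unchanged.
print(round(block_cosine(vertical_bar({6}), vertical_bar({5})), 3))                 # 1.0
```

A shift inside one tile leaves the features unchanged. When a one-pixel skeleton stroke crosses a tile boundary, however, its features no longer overlap at all, whereas a three-pixel stroke keeps a cosine of 0.8. This illustrates how skeleton extraction magnifies position differences.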
Of these four ways, the third had the highest accuracy and the strongest discrimination ability. This is because its segmentation reflects differences in both stroke position and stroke writing direction among characters, and therefore fully and accurately captures the differences between the test and template characters.

Neural Network Test Results
In this section, we used a neural network [39][40][41][42][43][44] to evaluate how well the previously calculated similarity values support the evaluation of handwritten Chinese characters. First, we asked one subject to handwrite the four Chinese characters "同", "意", "办", and "理" (together meaning "agree to proceed") at five different normative levels, labeled "A", "B", "C", "D", and "E". Each character was written 20 times at each level, giving 100 samples each of "同", "意", "办", and "理". We then calculated the correlation coefficient, pixel coincidence degree, and cosine similarity between each of these handwritten characters and its template using the methods described above [45]. Note that the cosine similarity calculation produced six different results; together with the correlation coefficient and the coincidence degree, this gave a total of eight features for classifying the normative level of the characters [46]. We used a neural network to classify each set of 100 handwritten characters, and the results are shown in Figure 10. Figure 10a-d shows the classification results for the four handwritten characters. "A", "B", "C", "D", and "E" denote the normative levels; the numbers and percentages in the green grids are the numbers and percentages of characters correctly classified into the corresponding levels. In the gray grids, two percentages represent the total recognition accuracy (in green) and the error rate (in red). The total recognition accuracy rates of "同", "意", "办", and "理" were 94%, 98%, 93%, and 94%, respectively. These results verify that our similarity-based method distinguished the normative levels of characters written by one person well.
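This section does not give the configuration of the BP network (layer sizes, activation functions, learning rate, training schedule). The following is a minimal pure-Python sketch of a backpropagation network. It assumes one sigmoid hidden layer, a softmax output, a cross-entropy loss, and stochastic gradient descent. The class name `TinyBPNetwork`, all hyperparameters, and the synthetic eight-feature data are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyBPNetwork:
    """One sigmoid hidden layer + softmax output, trained by backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        self.rng = random.Random(seed)
        u = self.rng.uniform
        self.w1 = [[u(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[u(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x: Sequence[float]):
        hidden = [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        top = max(logits)  # subtract the max before exp for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return hidden, [e / total for e in exps]

    def fit(self, X: List[List[float]], y: List[int], epochs: int = 800, lr: float = 0.5):
        order = list(range(len(X)))
        for _ in range(epochs):
            self.rng.shuffle(order)  # stochastic gradient descent in random order
            for idx in order:
                x, label = X[idx], y[idx]
                hidden, probs = self._forward(x)
                # Gradient of softmax + cross-entropy w.r.t. the output logits.
                d_out = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
                # Backpropagate through w2 and the sigmoid derivative h * (1 - h).
                d_hid = [h * (1.0 - h) * sum(d_out[k] * self.w2[k][j]
                                             for k in range(len(d_out)))
                         for j, h in enumerate(hidden)]
                for k, dk in enumerate(d_out):
                    self.w2[k] = [w - lr * dk * h for w, h in zip(self.w2[k], hidden)]
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj
        return self

    def predict(self, X: List[List[float]]) -> List[int]:
        preds = []
        for x in X:
            _, probs = self._forward(x)
            preds.append(max(range(len(probs)), key=probs.__getitem__))
        return preds


# Synthetic 8-feature samples for three normative levels (0 = most normative).
data_rng = random.Random(1)
X, y = [], []
for level, centre in enumerate((0.9, 0.5, 0.1)):
    for _ in range(10):
        X.append([centre + data_rng.uniform(-0.05, 0.05) for _ in range(8)])
        y.append(level)
net = TinyBPNetwork(n_in=8, n_hidden=6, n_out=3).fit(X, y)
accuracy = sum(p == t for p, t in zip(net.predict(X), y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```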
To distinguish the normative levels of characters written by different persons, we asked another four subjects to handwrite "同", "意", "办", and "理" at the five normative levels "A", "B", "C", "D", and "E"; again, each character was written 20 times at each level. We then pooled all the characters of the same normative level written by the four persons for classification, so that this character set contained 80 samples each of "同", "意", "办", and "理" at each level. The identification results for the character "意" are shown in Figure 11. Of the 80 A-level characters, 79 were correctly identified as A-level, and the remaining 1 was mistakenly identified as B-level. Of the 80 B-level characters, 53 were correctly identified as B-level, while 2 were mistakenly identified as A-level, 11 as C-level, 4 as D-level, and 10 as E-level, so the recognition accuracy for B-level characters was only 66.3%. The recognition accuracy rates for C-, D-, and E-level characters were also unsatisfactory, mostly no greater than 60%. These results suggest that recognition accuracy was low when characters written by four subjects were identified together. This may be because non-normative characters written by different persons differ greatly; for example, B-level characters written by one subject may resemble C- or D-level characters written by other subjects.
Finally, we asked four subjects to handwrite "同", "意", "办", and "理" (together meaning "agree to proceed") at three different levels labeled "A", "B", and "C"; again, each character was written 20 times at each level, so there were 80 samples of each character at each level. The classification results for "同", "意", "办", and "理" are shown in Figure 12a-d, respectively. In Figure 12a, 69 of the 80 A-level "同" characters were correctly identified as A-level, while the remaining 11 were mistakenly identified as B-level; in other words, the recognition accuracy of A-level "同" was about 86.3%. The recognition accuracy rates of B- and C-level "同" were 93.8% and 89.2%, respectively, and the total accuracy for "同" across all levels was about 89.2%. From Figure 12b-d, the total recognition accuracy rates of "意", "办", and "理" were 83.3%, 83.8%, and 81.7%, respectively. These results were much better than those for characters written at five different levels.
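The per-level accuracies above follow directly from the rows of the confusion matrix. A minimal sketch using the counts reported for "意" written at five levels by four subjects follows. Rows are true levels and columns are predicted levels A to E; the function names are hypothetical.

```python
from typing import List, Sequence

LEVELS = ["A", "B", "C", "D", "E"]


def level_accuracy_percent(row: Sequence[int], true_index: int) -> float:
    """Percentage of one true level's samples that were predicted as that level."""
    return 100 * row[true_index] / sum(row)


def overall_accuracy_percent(confusion: List[List[int]]) -> float:
    """Diagonal (correct) count as a percentage of all samples; rows are true levels."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100 * correct / sum(sum(row) for row in confusion)


# Counts reported above for "意" (predicted A, B, C, D, E).
a_row = [79, 1, 0, 0, 0]    # true level A
b_row = [2, 53, 11, 4, 10]  # true level B
print(level_accuracy_percent(a_row, LEVELS.index("A")))  # 98.75
print(level_accuracy_percent(b_row, LEVELS.index("B")))  # 66.25
```

The B-level value of 66.25% corresponds to the 66.3% reported above. When every level has the same number of samples, as here, the overall accuracy also equals the mean of the per-level accuracies.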
To verify the effectiveness of our similarity-based method in differentiating the Chinese characters, we asked 20 persons to write "同", "意", "办", and "理" (together meaning "agree to proceed") and used the aforementioned similarities as features for classification. As described in Section 3.4, eight features were extracted: four cosine similarities calculated from non-skeleton images in the four different ways, two cosine similarities calculated from the skeleton images shown in Figure 9a,c, the correlation coefficient, and the pixel coincidence degree. The results of distinguishing each character from the other three are shown in Figure 13a-d. The recognition accuracy rates for "同", "意", and "理" were all 100%, and that for "办" was 98%. This shows that our similarity-based method distinguished each character from the others well [47].
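One way to organize the eight features and the one-versus-rest labels is sketched below. The feature names are hypothetical. The assignment of the two skeleton-based cosine similarities to the first and third ways is inferred from the reference to Figure 9a,c, and the sample values are made up.

```python
from typing import List, Sequence

# Hypothetical names for the eight features, in the order listed above.
FEATURE_NAMES = [
    "cos_plain_way1", "cos_plain_way2", "cos_plain_way3", "cos_plain_way4",
    "cos_skeleton_way1", "cos_skeleton_way3",
    "correlation", "pixel_coincidence",
]


def feature_vector(cos_plain: Sequence[float], cos_skeleton: Sequence[float],
                   correlation: float, coincidence: float) -> List[float]:
    """Concatenate the eight similarity features for one handwritten sample."""
    if len(cos_plain) != 4 or len(cos_skeleton) != 2:
        raise ValueError("expected 4 non-skeleton and 2 skeleton cosine values")
    return [*cos_plain, *cos_skeleton, float(correlation), float(coincidence)]


def one_vs_rest_labels(characters: Sequence[str], target: str) -> List[int]:
    """1 for samples of the target character, 0 for samples of the other characters."""
    return [1 if c == target else 0 for c in characters]


vec = feature_vector([0.91, 0.88, 0.95, 0.93], [0.62, 0.94], 0.71, 0.38)  # made-up values
print(len(vec) == len(FEATURE_NAMES))                        # True
print(one_vs_rest_labels(["同", "意", "办", "理", "办"], "办"))  # [0, 0, 1, 0, 1]
```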
Figure 13. The results of distinguishing one character from the others: (a) "同" vs. "意办理"; (b) "意" vs. "同办理"; (c) "办" vs. "同意理"; (d) "理" vs. "同意办" (together meaning "agree to proceed").
To compare the performance of different machine learning algorithms in classifying the four characters, we also applied SVM, K-NN, and CNN in place of the BP neural network. The results of classifying the same features with these methods are shown in Table 4. The neural networks (BP and CNN) achieved more than 90% accuracy, whereas the accuracy of SVM and K-NN was only about 60% to 70%. CNN achieved the highest average accuracy (95.38%).
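Table 4 does not specify how K-NN was configured. A minimal sketch of K-NN classification on the similarity features follows, assuming Euclidean distance, k = 3, and majority voting. `knn_predict` and the training points are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples nearest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps first-encountered order, so the nearer label wins.
    return votes.most_common(1)[0][0]


# Made-up eight-feature training samples for two characters.
train_X = [[0.9] * 8, [0.85] * 8, [0.3] * 8, [0.35] * 8]
train_y = ["同", "同", "意", "意"]
print(knn_predict(train_X, train_y, [0.8] * 8, k=3))  # 同
```

Because K-NN relies on raw distances, features with very different ranges are usually standardized before classification.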

Comparison with Human Evaluations
To evaluate handwritten Chinese characters accurately, researchers have used various methods, including feature similarity calculation, deep learning, rule-based methods, and fuzzy-matrix-based methods. A rule-based approach evaluates handwritten Chinese characters by formulating normative rules for features [48]. This method is easy to implement, but it relies on experts to formulate rules for each feature and requires new rules to be formulated for new strokes. Furthermore, it is only effective for extracting features of standard Chinese characters, and the judgment of the writing error type is determined entirely by the rules, which limits the accuracy and diversity of the evaluation. The fuzzy-matrix-based method uses the membership degrees of a fuzzy matrix to express features of handwritten Chinese characters that cannot be described precisely [49]. It can handle the vague concepts involved in handwritten Chinese characters, but its data acquisition relies on online devices, and it cannot provide a detailed evaluation of handwritten characters. For the small data set of handwritten Chinese characters used in banking transactions considered in this paper, we combine feature similarity calculation, which yields detailed evaluation results, with a machine learning approach that learns the evaluation information from the input data. However, current machine learning methods still rely on traditional feature extraction methods [50].
We compared the computer evaluation results with human evaluation results to demonstrate the accuracy of our proposed methods. Eight calligraphy experts evaluated the handwritten Chinese characters, rating them on a scale of one to five according to how similar they were to the standard template characters. The human evaluation results are shown in the last row of Table 3.
In Table 5, the first row shows the results of evaluation by the correlation coefficient method. First, we averaged the similarity values between the test characters written by each of the five subjects and the template characters. Then, the five averaged similarity values were divided by the largest of them and multiplied by 100 to obtain the handwriting evaluation scores. Using the same process, we obtained evaluation scores based on the pixel coincidence degree and the cosine similarity. For a reasonable evaluation, we arranged the evaluation results of the five subjects "A", "B", "C", "D", and "E" in descending order. The evaluation results for "同", "办", and "理" were consistent among the first six subjects, and those for "意" were consistent among all the subjects. Finally, we averaged the scores of all the experts to obtain the total human score.
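The score conversion just described can be written as a short function. This is a minimal sketch; the function name and the similarity values are made up for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def evaluation_scores(similarities: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average each subject's similarity values, divide by the largest average,
    and multiply by 100, as described above."""
    averages = {subject: mean(values) for subject, values in similarities.items()}
    best = max(averages.values())
    return {subject: 100 * avg / best for subject, avg in averages.items()}


# Made-up similarity values for two subjects, for illustration only.
scores = evaluation_scores({"A": [0.8, 0.8], "B": [0.6, 0.4]})
print({s: round(v, 1) for s, v in scores.items()})  # {'A': 100.0, 'B': 62.5}
```

The most similar subject always receives 100, so the scores express each subject's similarity relative to the best writer rather than on an absolute scale.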
To quantify the differences among the three methods in evaluating Chinese characters, the statistical distributions of the evaluation errors are illustrated by the box-and-whisker plot in Figure 14. The plot shows the deviation between the scores converted from the similarity values and those given by the experts. Both the correlation coefficient and the pixel coincidence degree had a maximum deviation of less than 20%, whereas the cosine similarity had a maximum deviation greater than 50%. This means that the results of the former two methods were closer to the human results than those of the third. The data in Table 5 and Figure 14 show that the correlation coefficient method performed best, with an evaluation ability closest to that of humans, and its results agreed well with the human results. In addition, the scores converted from the correlation coefficients differentiated well among the characters written by the five subjects with different writing levels.
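The deviation plotted in Figure 14 is not defined precisely here. The sketch below assumes a relative deviation, |machine score - human score| / human score in percent. It then computes the five-number summary that a box-and-whisker plot displays. The function names and scores are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def relative_deviations(machine: Sequence[float], human: Sequence[float]) -> List[float]:
    """ASSUMED error measure: |machine - human| / human, in percent."""
    return [100 * abs(m - h) / h for m, h in zip(machine, human)]


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Five-number summary of the kind drawn in a box-and-whisker plot."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


# Hypothetical converted scores versus averaged expert scores for three subjects.
devs = relative_deviations([100, 90, 80], [100, 80, 100])
summary = box_summary(devs)
print(devs)                                # [0.0, 12.5, 20.0]
print(summary["max"], summary["median"])   # 20.0 12.5
```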
Meanwhile, the scores converted from the pixel coincidence degree were also very close to the human scores. However, the maximum deviation of the pixel coincidence degree method was over 20%, slightly larger than that of the correlation coefficient method; nonetheless, this method can also be regarded as acceptable. The scores converted from the cosine similarity ranged from 85 to 100. The most scrawled characters, written by subject E, received an average score of 85, which was quite different from the score given by the experts. The scores for the characters written by subjects C, D, and E all differed considerably from the corresponding human scores. This means that the cosine similarity method had little ability to distinguish the normative level of Chinese characters.
Finally, the recognition results of the neural network provided a comprehensive verification. For the 100 characters at five different normative levels written by one person, the recognition accuracy was above 90%. For the 400 characters at three different normative levels written by four different persons, the recognition accuracy was above 80%.

Conclusions
Previous studies have demonstrated the effectiveness of computer technology in evaluating calligraphy works from an artistic perspective. However, these studies have not focused on evaluating the normative level of the handwritten Chinese characters that are frequently used in electronic transactions in China. In this study, we proposed a strategy for evaluating the normative level of handwritten characters based on three different similarities, namely the correlation coefficient, pixel coincidence degree, and cosine similarity, to improve the recognition accuracy of ordinary handwritten Chinese characters.
We found that the calculated similarities and related features are effective for evaluating the normative level of off-line handwritten characters and recognizing their content. The similarity-based methods facilitate the recognition of handwritten Chinese characters while helping to standardize writing actions and thereby avoid scrawled and non-standard characters. In addition, the calculation process was simple, and the algorithm ran quickly and automatically. Most notably, the evaluation results based on the correlation coefficient and pixel coincidence degree were close to those of human evaluation. Therefore, our proposed methods are practical in improving the recognition

Figure 2. The images of Chinese characters "同", "意", "办", and "理" (means "agree to proceed") written by five different persons labeled with A, B, C, D, and E in five normative levels and their Song typeface counterparts (the first column).

Figure 5. Graphical illustration of relationships between two features.

Figure 9. The cosine similarity results for the Chinese characters "同意办理" (meaning "agree to proceed"), where "S" denotes skeleton extraction. (a) Comparison of the cosine similarity values calculated before and after skeleton extraction. (b) Cosine similarity values calculated from the texture feature. (c) Results of processing the character images with the 24-region way. (d) Results of calculating the feature vectors with the way that sums 100 pixels.

Figure 11. The results of a neural network test in which four subjects wrote the Chinese character "意" at five different levels.

Figure 14. Statistical distributions of grading errors in the similarity evaluation results.

Table 1. Comparison of different datasets.

Table 4. Different methods for feature classification and recognition for the Chinese characters "同意办理" (meaning "agree to proceed").

Appl. Sci. 2022, 12, 8521

Table 5. Comparison of evaluation results.