Development of a Chinese Chess Robotic System for the Elderly Using Convolutional Neural Networks

Abstract: According to data from Alzheimer's Disease International (ADI) in 2018, an estimated 10 million new dementia patients will be added worldwide, and the global dementia population is estimated at 50 million. Owing to a declining birth rate and great progress in medical technology, the proportion of elderly people in Taiwan has risen annually. In fact, Taiwan has become one of the fastest-aging countries in the world. Consequently, problems related to aging societies will emerge. Dementia is one of the most prevalent aging-related diseases, with a great influence on daily life and a great economic burden. Dementia is not a single disease, but a combination of symptoms. There is currently no medicine that can cure dementia, so finding preventive measures has become a public concern. Older people should actively increase brain-protective factors and reduce risk factors in their lives to reduce the risk of dementia and even prevent its occurrence. Studies have shown that engaging in mental or creative activities that stimulate brain function is associated with a relative risk reduction of nearly 50%. Elderly people should develop the habit of life-long learning to strengthen effective neural bonds between brain cells and preserve cognitive functions. Playing chess is one of the suggested activities. This paper aimed to develop a Chinese chess robotic system for the elderly. It uses a camera to capture the contour of each Chinese chessman, recognizes the character and location of the chessman, and then transmits this information to the robotic arm, which grabs the chessman and places it in the appropriate position on the chessboard. The camera image is transmitted to MATLAB for image recognition. The character of the chessman is recognized by convolutional neural networks (CNNs). Forward and inverse kinematics are used to manipulate the robotic arm. Even if the chessmen are arbitrarily placed, the experiment showed that their coordinates can be found through the camera as long as they are located within the working scope of the camera and the robotic arm. Black chessmen are recognized correctly regardless of how many degrees they are rotated, while red ones are recognized 100% of the time within 90° of rotation and 98.7% of the time with more than a 90° rotation.


Introduction
Data from Alzheimer's Disease International (ADI) in 2018 estimate that 10 million new dementia patients will be added worldwide. This means that, on average, one person develops dementia every 3 s. The global dementia population is estimated to be 50 million, and by 2050 the …
Before convolutional neural networks (CNNs) were applied to the Chinese chess game, previous studies focused extensively on recognition and detection for applications such as healthcare, education, e-commerce, surveillance systems, and many others [9][10][11][12][13][14][15]. Regarding these strategies, Manwatkar et al. [16] introduced the process of converting images to text using document image analysis (DIA). A more sophisticated method was carried out by Lara et al. [17] and Jalal et al. [18] using human activity recognition (HAR). Although the method is not specifically used for text, its performance is quite good, with a recognition rate of 97.16%. In other papers [19,20], the hidden Markov model (HMM) was used to detect shapes and motion features. Thus, several methods for detecting moving targets have been introduced. In this paper, however, the target chessman is not moving. Its position is arbitrary, and its identity is determined by the Chinese character it bears.
General optical character recognition (OCR) began by recognizing printed numbers and letters and later developed to recognize printed text. Chinese chess has various font types and various characters. Wen [21] proposed a method for Chinese chessmen that compares features of the input image with a database; it consists of noise filtering, object extraction, normalization, feature calculation of the distance between the contour of the character and the center of the chessman, and a maximum energy slop algorithm. Seniman et al. [22] presented a feed-forward neural network trained with the backpropagation algorithm, together with a direction feature extraction method that iterates over the image and calculates the directions surrounding each pixel to obtain the features and recognize Chinese chess characters. The proposed method resisted noise, brightness changes and rotation, and was tested on five different fonts. In [23], image preprocessing and an advanced Hough transform were used to segment the image and to calculate the center of the chessman and the circular edge of the chessman, respectively. Fang [24] designed a machine vision system for Chinese chess-playing robots with two color cameras taking two images from different angles simultaneously. A hierarchical Hough transform algorithm was used to detect lines and circles in the binarized image, and a backpropagation neural network and ring intersection points were adopted to recognize the Chinese characters. Experimental results verified that the system works well with high reliability.
A CNN is used to recognize Chinese chessmen in this paper; several previous studies have applied CNNs to human activity recognition, face recognition, and text recognition [25][26][27]. Most recognition and segmentation work has involved hybrid methods, for tasks such as object recognition, human tracking, activity recognition, and human gait [28][29][30]. Meanwhile, [30] used depth images for face recognition, [26] adopted a similar technique but applied it to time attendance systems, and [31] proposed high multiplexing system performance to support face recognition over Wi-Fi. Action recognition, proposed by [32][33][34][35], is more complicated and can be solved using an RGB-D camera with intrinsic features. However, the whole identification process in these works used a fixed orientation. This differs greatly from Chinese chess, where a round piece can change its orientation when picked up by a gripper, which becomes an obstacle to recognition. For this reason, recognizing the targets requires an approach that resembles the human way of thinking.
Sustainability 2020, 12, 3980
The artificial neural network (ANN) is a model developed by imitating the structure and operation of the brain, and it forms the basis of the convolutional network. It can be used to simulate complex models and prediction problems. The traditional neural network consists of three parts: the input layer, the hidden layer, and the output layer. The hidden layer has many neurons, and each neuron in each layer is connected to all neurons in the next layer. A network with multiple hidden layers is called a multilayer perceptron (MLP) [36].
In computer vision, a convolutional neural network (CNN) [37] is mainly used for image classification and object recognition. The main difference between an MLP and a CNN is that only the last layer of a CNN is fully connected, while in an MLP each neuron is connected to every neuron of the next layer, resulting in a large increase in the number of parameters. For large images, an MLP generates large, complex input vectors; in addition, it flattens the image as input and thus ignores spatial information. J. Jin et al. [38] used CNNs to recognize traffic signs, training them with hinge loss stochastic gradient descent and evaluating them on the German traffic sign recognition benchmark. Chen et al. [39] presented a hybrid deep convolutional neural network (HDNN) to recognize vehicles in satellite images by dividing the maps of the last convolutional layer and the max-pooling layer of a DNN into multiple blocks of variable receptive field sizes or max-pooling field sizes, enabling the HDNN to extract variable-scale features. In addition to images, CNNs are also used for speech recognition. O. Abdel-Hamid et al. [40] used a limited-weight-sharing scheme to model speech features in CNNs. Compared with DNNs, the error rate of the proposed method is reduced by 6-10%.
In this paper, the chess piece is photographed by a camera and the picture is input to a convolutional neural network (CNN) for chess recognition. At the same time, the coordinates of the chessman are obtained by image processing and sent to a robot system, which grabs the target chessman using forward and inverse kinematics. The CNN is used to recognize the characters on the chessmen and to distinguish the front and back sides of the chessmen, even when they are randomly placed. The robot arm is controlled to accurately grasp the chessmen and place them at exact positions on the chessboard. The remainder of this paper is organized as follows: the Chinese chess robotic system is introduced in Section 2, and the convolutional neural network is described in Section 3. Experimental results are analyzed in Section 4, followed by the conclusions in Section 5.

Chinese Chess Robotic System
The chessmen are photographed by the camera and the image is processed. Then, the coordinate transformation for the chessmen is set up on the PC. Finally, the chessmen are randomly picked up by the gripper of the robot arm and placed at the proper positions on the chessboard, as shown in Figure 2. The robotic arm is Microbot's five-degree-of-freedom (5-DOF) TeachMover II, whose kinematic model variables are shown in Figure 3 [41], where the distances between the joints are represented by the constants H, L and LL, with values of 195.0, 177.8, and 96.5 mm, respectively. Table 1 lists the relation between the motor step and the actual joint rotation.
To define the coordinate system of the robotic arm, a coordinate frame is first established on each link, and the Denavit-Hartenberg (D-H) rule is used to determine the D-H transformation matrix of each link. These matrices transform between adjacent coordinate frames, from which the equations of forward and inverse kinematics are derived. Table 2 shows the D-H parameters of the robot, where $\alpha_i$, $a_i$, $d_i$, and $\theta_i$ represent the link twist, link length, link distance, and link angle, respectively.
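To make the role of the D-H parameters concrete, the following is a minimal sketch of how one link transform can be built and chained into a forward-kinematics result. It assumes the classic (distal) D-H convention, Rot_z(θ)·Trans_z(d)·Trans_x(a)·Rot_x(α); the function names and the two-row example table are illustrative assumptions and are not the values of Table 2.

```python
import math

def dh_matrix(alpha, a, d, theta):
    """4x4 homogeneous transform of one link, classic D-H convention (assumed):
    Rot_z(theta) * Trans_z(d) * Trans_x(a) * Rot_x(alpha)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    ct, st = math.cos(theta), math.sin(theta)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_kinematics(dh_rows):
    """Chain the link transforms; dh_rows is a list of (alpha, a, d, theta)."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for alpha, a, d, theta in dh_rows:
        T = matmul4(T, dh_matrix(alpha, a, d, theta))
    return T

# Hypothetical planar two-link chain (NOT the TeachMover II table):
# both links 177.8 mm long, first joint at +90 degrees, second at -90 degrees.
example_rows = [
    (0.0, 177.8, 0.0, math.pi / 2),
    (0.0, 177.8, 0.0, -math.pi / 2),
]
T_end = forward_kinematics(example_rows)
end_position = (T_end[0][3], T_end[1][3], T_end[2][3])
```

The last column of the chained matrix holds the end-effector position, so replacing the example rows with the actual D-H table of an arm would give its forward kinematics under the same convention.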
Inverse kinematics estimates the motion angle of each joint axis given the position of the end-effector of the robotic arm. The angle of each joint can also be derived geometrically. As shown in Figure 4, when the point P at the end-effector of the arm, whose coordinates are known, is projected onto the XY plane, the angle $\theta_1$ can be found. Referring to the picture and geometric figure of the robotic arm in Figure 5, the angles $\theta_2$, $\theta_3$ and $\theta_4$ can then be obtained, where d is the distance between points O and E, and p is the distance between points B and D.
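The geometric idea can be sketched in code. Only the base angle, obtained from the projection of P onto the XY plane, follows directly from the description above. The two-link solver is the generic textbook law-of-cosines solution for a planar chain, included as an assumption to illustrate the approach; it is not the paper's expressions for $\theta_2$, $\theta_3$ and $\theta_4$, and the link lengths in the example are placeholders.

```python
import math

def base_angle(px, py):
    """theta_1: direction of the end-effector point projected onto the XY plane."""
    return math.atan2(py, px)

def two_link_ik(r, z, l1, l2):
    """Generic planar two-link inverse kinematics (law of cosines).
    Returns (shoulder, elbow) in radians for a tip at (r, z); this is one of
    the two mirror-image solutions."""
    cos_elbow = (r * r + z * z - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target is outside the reachable workspace")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(z, r) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def two_link_fk(shoulder, elbow, l1, l2):
    """Forward kinematics of the same chain, used to check the solution."""
    r = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    z = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return r, z

# Placeholder target and link lengths (mm), chosen only for illustration.
theta1 = base_angle(150.0, 150.0)
shoulder, elbow = two_link_ik(200.0, 50.0, 177.8, 177.8)
r_check, z_check = two_link_fk(shoulder, elbow, 177.8, 177.8)
```

Feeding the solved angles back through the forward kinematics, as in the last line, is a simple way to check an inverse-kinematics derivation before sending angles to a real arm.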

Convolutional Neural Network
A convolutional neural network (CNN) consists of one or more convolutional layers followed by one or more fully connected (FC) layers, which are similar in structure to a conventional neural network. The structural design of the CNN takes the two-dimensional structure of the image as input and applies local connections, weight sharing, and then pooling, which gives the CNN translation-invariant features. Compared to standard neural networks with similarly sized layers, CNNs have fewer parameters and connections and are therefore easier to train. CNNs consist of many convolutional and pooling layers, and finally a fully connected layer. A convolutional layer takes an image as its input and applies a number of different filters (called convolution kernels), generally of size 3 × 3, to perform the convolution operation and produce different features.
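A short calculation illustrates why weight sharing reduces the parameter count. The layer sizes below (a 64 × 64 grayscale input, 100 hidden units, 16 filters) are hypothetical values chosen for the arithmetic and are not taken from the network used in this paper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a convolutional layer; the same kernel is
    reused at every image position, so the image size does not appear."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# Hypothetical sizes: 64x64 grayscale image, 100 hidden units, 16 3x3 filters.
dense_count = dense_params(64 * 64, 100)   # 409700
conv_count = conv_params(3, 3, 1, 16)      # 160
```

Under these assumed sizes, the fully connected layer needs 409,700 parameters while the convolutional layer needs 160, and the convolutional count stays the same however large the input image is.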
The convolution operation slides a small window from left to right and top to bottom to extract local features of the image as the inputs of the next layer. This sliding window is called a convolution kernel or filter. The matrix formed by sliding over the image and computing at each position is called a convolution feature or feature map. The feature map is passed to the next layer through a rectified linear unit (ReLU) activation function. Pooling is a type of downsampling: because it reduces the size of the data, the number of parameters and calculations is reduced, which speeds up the system operation, reduces the possibility of overfitting, and provides resistance to interference. After pooling, the outputs are input to the fully connected layer [23,24]. The fully connected layer is a general neural network for classification. It is also the easiest way to learn a non-linear combination of the features from the preceding convolutional and pooling layers. We flatten the feature map in the fully connected layer and update the weights in the neural network through backpropagation.
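The sliding-window computation, the ReLU activation, and the downsampling step can be sketched in a few lines of plain Python. This is an illustrative sketch, not the MATLAB implementation used in the paper: the 6 × 6 test image, the edge-detecting kernel, and the 2 × 2 max pooling window are assumptions, and the "convolution" is computed without flipping the kernel, as is common in deep learning libraries.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel left-to-right, top-to-bottom (no padding, stride 1)
    and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; halves each dimension when size is 2."""
    return [[max(feature_map[i + u][j + v]
                 for u in range(size) for v in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# Assumed example: an image whose right half is bright, and a kernel that
# responds to a dark-to-bright vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = relu(conv2d_valid(image, kernel))   # 4 x 4 map
pooled = max_pool(feature_map)                    # 2 x 2 map
```

The 6 × 6 input becomes a 4 × 4 feature map after the 3 × 3 kernel and a 2 × 2 map after pooling, which shows how the data size shrinks from layer to layer.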
The softmax function is used at the output of the fully connected layer. It converts an N-dimensional vector of arbitrary real numbers into another N-dimensional real vector in which each element lies between 0 and 1 and all elements sum to 1. The softmax function is described as
$$y_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}, \quad i = 1, \ldots, N,$$
where $x_i$ is the i-th element of the input vector and $y_i$ is the corresponding output. Since the output of the softmax function is between 0 and 1, it can be regarded as the predicted probability of a class.
The loss function is an important part of the artificial neural network. It measures the inconsistency between the predicted value and the actual label, and its output is a non-negative value. The robustness of the model increases as the value of the loss function decreases. This paper uses cross-entropy to calculate the loss function, shown in Equation (8),
$$\mathrm{loss} = -\sum_{i=1}^{N} \sum_{j=1}^{K} t_{ij} \ln y_{ij}, \tag{8}$$
where N is the number of samples, K is the number of classifications, $t_{ij}$ is the actual label, and $y_{ij}$ is the network output for sample i and class j.
This paper uses the stochastic gradient descent method to update the network parameters (weights) in each iteration, minimizing the loss function by moving in the direction of its negative gradient. The equation for updating parameters is as follows:
$$w_{l+1} = w_l - \alpha \nabla \mathrm{loss}(w_l),$$
where l is the number of iterations, $\nabla \mathrm{loss}(w)$ is the gradient of the loss function, and $\alpha$ is the learning rate. In the expression for the loss function gradient, j runs over all outputs and i denotes one of them.
The CNN architecture used in this paper is shown in Figure 6, including three convolutional layers, three pooling layers, and one connection layer.
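The three ingredients described above (softmax output, cross-entropy loss, and the gradient-descent update) can be written as a minimal Python sketch. The function names and the numbers in the example are illustrative assumptions; the loss is written in its standard summed form over samples and classes, and the max-subtraction inside the softmax is a common numerical safeguard rather than part of the definition.

```python
import math

def softmax(x):
    """Map a real vector to values in (0, 1) that sum to 1."""
    m = max(x)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, outputs):
    """loss = -sum_i sum_j t_ij * ln(y_ij), with one-hot rows in `targets`."""
    return -sum(t * math.log(y)
                for t_row, y_row in zip(targets, outputs)
                for t, y in zip(t_row, y_row)
                if t > 0)

def sgd_step(weights, grads, learning_rate):
    """One update w <- w - alpha * grad along the negative gradient."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# Assumed toy values, not data from the paper.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([[1, 0]], [softmax([0.0, 0.0])])   # equals ln 2
new_w = sgd_step([1.0, 2.0], [0.5, -1.0], 0.1)          # about [0.95, 2.1]
```

A prediction that splits its probability evenly between two classes gives a loss of ln 2 for one sample, and the loss falls toward zero as the probability assigned to the correct class approaches 1.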

Experimental Results
The proposed system includes a robotic arm and a camera. The camera communicates with the PC via USB, and the PC sends signals to the robotic arm controller via RS232 to complete the action. The camera is set up directly above the chessboard, and all chessmen are randomly placed on the chessboard. The camera captures the image of this area, and the image is then cut, from left to right and from bottom to top, into sub-images of individual chessmen. Each sub-image is input to the CNN for recognition, and this is repeated until all chessmen have been recognized. The coordinates where each chessman is currently located and where it should be placed are transmitted to the robotic arm. The arm then picks up the recognized chessman and places it in the correct position on the board. The system repeats this procedure until all the chessmen are placed. The system block diagram is shown in Figure 7, and the experimental environment, shown in Figure 8, includes a Logitech C310 camera and a PC with an Intel Core i5-3570 3.4 GHz CPU.
This paper uses MATLAB to integrate the programs of the camera and the robotic arm, as shown in Figure 9. Under the MATLAB GUI, the system performs basic actions such as picking up and placing chessmen. The upper left image is the original image, the lower left image is the binarized image, the upper right image is the chessman image, the lower right table shows the prediction result and its coordinates, and the system control block provides the keys for all operational functions.
Grasping an object requires its exact coordinates. However, the image taken by a camera is always distorted to some degree by the camera itself [42]. Therefore, the camera must be calibrated to obtain the intrinsic parameters, and the distortion is corrected through these parameters to obtain a correct image. The correction method uses a black and white 9 × 7 checkerboard with a grid size of 28 × 28 mm². After placing the checkerboard in front of the camera so that the camera captures the complete checkerboard, we change the orientation of the checkerboard relative to the camera for further captures. Logitech's C310 webcam with a resolution of 1280 × 960 pixels is used in this paper. The intrinsic and extrinsic parameters of the camera can be calculated by the cameraCalibrator function based on the images captured by the camera [43]. The matrix of intrinsic parameters is:

where $f_x$ and $f_y$ are the focal lengths in the X and Y directions of the image plane, and $c_x$ and $c_y$ are the coordinates of the reference point, which is ideally the center of the image.
The CNN learns the features of the various chessmen in order to recognize them. To obtain training data, a large number of images are obtained by rotating and translating the chessmen and are then binarized, as shown in Figure 10. Trained on these images, the CNN achieves high recognition accuracy and can accurately determine the characters of the chessmen. The training process is shown in Figure 11: the upper part depicts the change in accuracy during training, and the lower part shows the change in loss. The horizontal axis is the number of iterations. By the ninth training period (Epoch 9), the accuracy no longer changes much. The recognition time of a single piece is about 0.35 s, and it takes about 11 s to recognize all chessmen. During the recognition process, reflections can leave the chessman character incomplete, as shown in Figures 12 and 13.
Table 3 shows the recognition tests of chessmen rotated by 0°, 45°, 90°, 105°, 120°, and 180°. Black chessmen are recognized correctly no matter how many degrees they are rotated, while red ones are recognized 100% of the time within 90° of rotation, and some cannot reach a 100% recognition rate at more than a 90° rotation. The recognition of the red chessmen is noticeably worse than that of the black ones, mainly because the characters of the black chessmen are quite different from one another, whereas the red chessmen share the same radical, so their strokes are more likely to affect the recognition result. For the arbitrary placement test, the confusion matrices of the red and black chessmen are shown in Tables 4 and 5: the accuracy of black chessmen is 100%, and the accuracy of red chessmen is 98.7%. In this case, the three chessmen 俥, 傌, and 炮 are confused with each other, affecting the recognition.
Table 5. Confusion matrix of red chessmen.
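For readers unfamiliar with confusion matrices, the sketch below shows how an overall accuracy and per-class recognition rates are read off such a table. The 3 × 3 counts are hypothetical and are not the entries of Tables 4 and 5; the layout (rows as actual class, columns as predicted class) is also an assumption.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples (the diagonal)
    divided by all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def per_class_recall(confusion):
    """Fraction of each actual class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Hypothetical counts for three classes (rows: actual, columns: predicted).
example_confusion = [
    [10, 0, 0],
    [1, 9, 0],
    [0, 0, 10],
]
overall = accuracy(example_confusion)
recalls = per_class_recall(example_confusion)
```

Off-diagonal entries show which pairs of classes are mistaken for each other, which is how confusion among specific characters becomes visible.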

When the camera captures the image, the coordinates of the chessman differ from the actual ones because of the height of the chessman. Therefore, the coordinates of the chessman must be corrected to obtain an accurate grab, as shown in Figure 14. Point D is the camera position, point C is the actual position of the chessman, and point B is the position of the chessman estimated by the camera. The errors before and after the correction of the coordinates of the chessman are shown in Figure 15 and Table 6. Since the height h of the chessman is known, the actual coordinates of the chessman can be obtained through trigonometric relations after finding the camera position O and its height H.
Figure 14. Geometric representation for real and camera coordinates.
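The height correction can be illustrated with similar triangles. The sketch below assumes an ideal pinhole camera looking straight down from height H above the point O on the board, with image coordinates already mapped to the board plane; under those assumptions the top face of a chessman of height h appears displaced away from O by the factor H / (H − h). This is an illustrative reconstruction, not necessarily the exact expression used in the paper, and the numbers are hypothetical.

```python
def correct_for_height(observed_xy, camera_xy, camera_height, piece_height):
    """Pull the observed position back toward the point below the camera
    by the similar-triangles factor (H - h) / H."""
    scale = (camera_height - piece_height) / camera_height
    ox, oy = camera_xy
    bx, by = observed_xy
    return (ox + (bx - ox) * scale, oy + (by - oy) * scale)

# Hypothetical values in mm: camera 500 above the board, chessman 10 high.
corrected = correct_for_height((100.0, 50.0), (0.0, 0.0), 500.0, 10.0)
```

With these assumed values the correction moves the point by about 2 mm in x and 1 mm in y. The shift grows with distance from the point below the camera, so pieces near the edge of the field of view need the largest correction.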
Before recognition, the original image must be cut into images of individual chessmen. After the original image is binarized, connected-component labeling (CCL) [44] is used to find the position of each chessman, as shown in Figure 16. These chessmen are cut out and recognized using the CNN, as shown in Figure 17. The connected-component labeling algorithm scans the input binarized image and, when it encounters a pixel with a value of 1, examines its eight-connected neighbors; the labeling rules follow [44].
After image segmentation and CNN recognition, the character and coordinates of each chessman are known, and the coordinates are transferred to the robot arm for grabbing. Since the chessmen are randomly placed on the chessboard, grabbing begins with the chessmen placed on both sides of the chessboard and continues until all chessmen are placed, as shown in Figure 18. If there are back-side chessmen, the system identifies the situation and notifies the robotic arm to turn them over first, and then performs image recognition, as shown in Figure 19. Figure 20 depicts the complete process of chess placement, beginning with the interface, locating the first chessman at the side of the chessboard, turning over the back-side chessman, and finally finishing the placement.
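The segmentation step can be sketched as follows. This is a simple flood-fill variant of eight-connected component labeling written for illustration; it is not the specific set of labeling rules of [44], and the small binary image is a made-up example. The bounding box of each component is what would be used to cut out one sub-image per chessman.

```python
from collections import deque

def label_components(binary):
    """Label 8-connected regions of 1-pixels; returns (label image, count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1                      # start a new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):       # visit the eight neighbours
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] == 1
                                    and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count

def bounding_boxes(labels):
    """Map each label to (top, left, bottom, right), inclusive."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                top, left, bottom, right = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(top, r), min(left, c),
                              max(bottom, r), max(right, c))
    return boxes

# Made-up binary image with two blobs; the second is joined only diagonally.
binary_image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
label_image, n_components = label_components(binary_image)
boxes = bounding_boxes(label_image)
```

Because eight-connectivity treats diagonal neighbours as connected, the three pixels on the right form a single component; with four-connectivity they would still be joined here through the pixel at row 2, column 4, but purely diagonal contacts would be split.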
Figure 20. Process of chessmen placement.
(a) A random black chessman is picked up and placed in its correct location, (b) a random back-side chessman is recognized and picked up, (c) the back-side chessman is placed on the side and turned over, (d) the turned-over chessman is placed in front of the camera to be recognized, (e) the chessman is placed in its correct location, (f) another random red chessman is recognized and picked up, (g) the red chessman is placed in its correct location.
In summary, correct recognition leads to correct picking and placing of the chessmen. The failure cases have three sources: first, the chessmen 俥 and 傌 share the same Chinese character radical; second, the chessman 炮 may be confused with 俥 and 傌, which share that radical; and third, the stroke patterns of these three chessmen are more likely to affect the recognition result. An auxiliary recognition method may therefore be needed to eliminate these cases.

Conclusions
This paper proposes a system for chessman recognition and automatic placement. First, image processing and a convolutional neural network are used to recognize the character of each arbitrarily placed chessman and to find its position. If there are back-side chessmen, the system turns them over first and then performs image recognition. After the coordinates of a chessman are obtained and transformed, they are transmitted to the robot arm, which grabs the chessman and places it at the correct location on the chessboard. Comparing the proposed method with other approaches and improving its performance are left for future work. Both the hardware and the image vision and convolutional neural network functions can be improved to increase the recognition rate and speed and to enhance the robot's ability to play chess. If chess-playing software is then added, the system will no longer be only a chessman placement system but will also be able to play chess with people. The proposed system could thereby increase elderly users' interest in chess and help them develop the habit of playing, strengthening effective neural bonds between brain cells and preserving brain cognitive functions.