# Determining Chess Game State from an Image


## Abstract


## 1. Introduction

## 2. Previous Work

## 3. Dataset

## 4. Proposed Method

#### 4.1. Board Localisation

#### 4.1.1. Finding the Intersection Points

#### 4.1.2. Computing the Homography

The homography matrix **H** is found using a RANSAC-based algorithm that is robust even when lines are missing (or additional lines are detected from straight edges elsewhere in the image). It proceeds as follows:

1. Randomly sample four intersection points that lie on two distinct horizontal and vertical lines (these points describe a rectangle on the chessboard).
2. Compute the homography matrix **H** mapping these four points onto a rectangle of width $s_x = 1$ and height $s_y = 1$. Here, $s_x$ and $s_y$ are the horizontal and vertical scale factors, as illustrated in Figure 3.
3. Project all other intersection points using **H** and count the number of inliers; these are points explained by the homography up to a small tolerance $\gamma$ (i.e., the Euclidean distance from a given warped point $(x, y)$ to the point $(\mathrm{round}(x), \mathrm{round}(y))$ is less than $\gamma$).
4. If the size of the inlier set is greater than that of the previous iteration, retain this inlier set and homography matrix **H** instead.
5. Repeat from step 2 for $s_x = 2, 3, \dots, 8$ and $s_y = 2, 3, \dots, 8$ to determine how many chess squares the selected rectangle encompasses.
6. Repeat from step 1 until at least half of the intersection points are inliers.
7. Recompute the least-squares solution for the homography matrix **H** using all identified inliers.
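The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names (`fit_homography`, `warp`, `ransac_homography`), the default tolerance `gamma=0.15`, the iteration cap `max_iter`, and the input layout (an array `grid[i, j]` holding the intersection of the i-th horizontal and j-th vertical detected line) are all assumptions made for the example. The homography is estimated with the direct linear transform (DLT), which is one common way to obtain a least-squares fit.

```python
import itertools
import numpy as np


def fit_homography(src, dst):
    """Least-squares homography (DLT) mapping src -> dst; both are (N, 2), N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def warp(H, pts):
    """Apply H to (N, 2) points and convert back from homogeneous coordinates."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return p[:, :2] / p[:, 2:3]


def ransac_homography(grid, gamma=0.15, max_iter=200, seed=0):
    """grid: (n_h, n_v, 2) array of intersection points (assumed layout)."""
    rng = np.random.default_rng(seed)
    n_h, n_v, _ = grid.shape
    pts = grid.reshape(-1, 2)
    best_H, best_mask = None, np.zeros(len(pts), dtype=bool)
    for _ in range(max_iter):
        # Step 1: four points on two distinct horizontal and vertical lines.
        r0, r1 = sorted(rng.choice(n_h, size=2, replace=False))
        c0, c1 = sorted(rng.choice(n_v, size=2, replace=False))
        quad = np.array([grid[r0, c0], grid[r0, c1], grid[r1, c1], grid[r1, c0]])
        # Steps 2-5: try every scale factor pair (s_x, s_y) in 1..8.
        for sx, sy in itertools.product(range(1, 9), repeat=2):
            target = np.array([[0, 0], [sx, 0], [sx, sy], [0, sy]], dtype=float)
            H = fit_homography(quad, target)
            w = warp(H, pts)
            # Step 3: inliers land within gamma of an integer grid position.
            mask = np.linalg.norm(w - np.round(w), axis=1) < gamma
            # Step 4: keep the homography with the largest inlier set so far.
            if mask.sum() > best_mask.sum():
                best_H, best_mask = H, mask
        # Step 6: stop once at least half of the points are inliers.
        if best_mask.sum() >= len(pts) / 2:
            break
    # Step 7: refit on all inliers, using their rounded positions as targets.
    inliers = pts[best_mask]
    return fit_homography(inliers, np.round(warp(best_H, inliers))), best_mask
```

Because the comparison in step 4 is strict and the scale factors are tried in increasing order, the smallest scale factors that explain the most points are the ones retained; integer multiples of them would explain the same points but are never preferred.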

The image and the intersection points are warped using **H**, obtaining a result like that in Figure 4a. Because each chess square is now of unit length, the intersection points are quantised so that their x and y coordinates are whole numbers. Let $x_{\mathrm{min}}$ and $x_{\mathrm{max}}$ denote the minimum and maximum of the warped coordinates' x-components, and let $y_{\mathrm{min}}$ and $y_{\mathrm{max}}$ denote the corresponding values in the vertical direction.
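As a small illustration of the quantisation step, the snippet below rounds a handful of warped points and computes the coordinate extents. The sample coordinates are made up for the example; only the rounding and the min/max definitions come from the text.

```python
import numpy as np

# Hypothetical warped intersection points (unit-length squares, slight noise).
warped = np.array([
    [0.02, -0.01],
    [1.01, 0.00],
    [6.98, 0.03],
    [3.00, 5.99],
    [7.01, 7.97],
])

# Quantise to whole-number grid coordinates.
quantised = np.round(warped).astype(int)

x_min, y_min = quantised.min(axis=0)
x_max, y_max = quantised.max(axis=0)

# A span of 8 means all nine lines in that direction were found;
# a span of 7 means one line is still missing.
print("x span:", x_max - x_min, "y span:", y_max - y_min)
```

Here the horizontal span is 7 and the vertical span is 8, which corresponds to the situation in Figure 4 where only eight of the nine columns of points were detected.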

#### 4.2. Occupancy Classification

#### 4.3. Piece Classification

#### 4.4. Fine-Tuning to Unseen Chess Sets

## 5. Results and Discussion

#### 5.1. Board Localisation

#### 5.2. Occupancy and Piece Classification

#### 5.3. End-to-End Pipeline

#### 5.4. Unseen Chess Set

## 6. Summary and Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Wölflein, G.; Arandjelović, O. Dataset of Rendered Chess Game State Images; OSF, 2021.
2. Urting, D.; Berbers, Y. MarineBlue: A Low-Cost Chess Robot. In International Conference Robotics and Applications; IASTED/ACTA Press: Salzburg, Austria, 2003.
3. Banerjee, N.; Saha, D.; Singh, A.; Sanyal, G. A Simple Autonomous Chess Playing Robot for Playing Chess against Any Opponent in Real Time. In International Conference on Computational Vision and Robotics; Institute for Project Management: Bhubaneshwar, India, 2012.
4. Chen, A.T.Y.; Wang, K.I.K. Computer Vision Based Chess Playing Capabilities for the Baxter Humanoid Robot. In Proceedings of the International Conference on Control, Automation and Robotics, Hong Kong, China, 28–30 April 2016.
5. Khan, R.A.M.; Kesavan, R. Design and Development of Autonomous Chess Playing Robot. Int. J. Innov. Sci. Eng. Technol. **2014**, 1, 1–4.
6. Chen, A.T.Y.; Wang, K.I.K. Robust Computer Vision Chess Analysis and Interaction with a Humanoid Robot. Computers **2019**, 8, 14.
7. Gonçalves, J.; Lima, J.; Leitão, P. Chess Robot System: A Multi-Disciplinary Experience in Automation. In Spanish Portuguese Congress on Electrical Engineering; AEDIE: Marbella, Spain, 2005.
8. Sokic, E.; Ahic-Djokic, M. Simple Computer Vision System for Chess Playing Robot Manipulator as a Project-Based Learning Example. In Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Sarajevo, Bosnia and Herzegovina, 16–19 December 2008.
9. Wang, V.; Green, R. Chess Move Tracking Using Overhead RGB Webcam. In Proceedings of the International Conference on Image and Vision Computing New Zealand, Wellington, New Zealand, 27–29 November 2013.
10. Hack, J.; Ramakrishnan, P. CVChess: Computer Vision Chess Analytics. 2014. Available online: https://cvgl.stanford.edu/teaching/cs231a_winter1415/prev/projects/chess.pdf (accessed on 30 May 2021).
11. Ding, J. ChessVision: Chess Board and Piece Recognition. 2016. Available online: https://web.stanford.edu/class/cs231a/prev_projects_2016/CS_231A_Final_Report.pdf (accessed on 30 May 2021).
12. Danner, C.; Kafafy, M. Visual Chess Recognition. 2015. Available online: https://web.stanford.edu/class/ee368/Project_Spring_1415/Reports/Danner_Kafafy.pdf (accessed on 30 May 2021).
13. Xie, Y.; Tang, G.; Hoff, W. Chess Piece Recognition Using Oriented Chamfer Matching with a Comparison to CNN. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, NV, USA, 12–15 March 2018.
14. Czyzewski, M.A.; Laskowski, A.; Wasik, S. Chessboard and Chess Piece Recognition with the Support of Neural Networks. Found. Comput. Decis. Sci. **2020**, 45, 257–280.
15. Mehta, A.; Mehta, H. Augmented Reality Chess Analyzer (ARChessAnalyzer). J. Emerg. Investig. **2020**, 2.
16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM **2017**, 60, 84–90.
17. Tam, K.; Lay, J.; Levy, D. Automatic Grid Segmentation of Populated Chessboard Taken at a Lower Angle View. In Proceedings of the Digital Image Computing: Techniques and Applications, Canberra, Australia, 1–3 December 2008.
18. Neufeld, J.E.; Hall, T.S. Probabilistic Location of a Populated Chessboard Using Computer Vision. In Proceedings of the IEEE International Midwest Symposium on Circuits and Systems, Seattle, WA, USA, 1–4 August 2010.
19. Kanchibail, R.; Suryaprakash, S.; Jagadish, S. Chess Board Recognition. 2016. Available online: http://vision.soic.indiana.edu/b657/sp2016/projects/rkanchib/paper.pdf (accessed on 30 May 2021).
20. Xie, Y.; Tang, G.; Hoff, W. Geometry-Based Populated Chessboard Recognition. In International Conference on Machine Vision; SPIE: Munich, Germany, 2018.
21. Matuszek, C.; Mayton, B.; Aimi, R.; Deisenroth, M.P.; Bo, L.; Chu, R.; Kung, M.; LeGrand, L.; Smith, J.R.; Fox, D. Gambit: An Autonomous Chess-Playing Robotic System. In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011.
22. Wei, Y.A.; Huang, T.W.; Chen, H.T.; Liu, J. Chess Recognition from a Single Depth Image. In Proceedings of the IEEE International Conference on Multimedia and Expo, Hong Kong, China, 10–14 July 2017.
23. Hou, J. Chessman Position Recognition Using Artificial Neural Networks. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.4390&rep=rep1&type=pdf (accessed on 30 May 2021).
24. Bilalić, M.; Langner, R.; Erb, M.; Grodd, W. Mechanisms and Neural Basis of Object and Pattern Recognition. J. Exp. Psychol. **2010**, 139, 728.
25. Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. **1986**, 679–698.
26. Duda, R.O.; Hart, P.E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Commun. ACM **1972**, 15, 11–15.
27. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In International Conference on Knowledge Discovery and Data Mining; AAAI Press: Portland, OR, USA, 1996.
28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
29. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
31. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
32. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.

**Figure 3.** Four intersection points are projected onto the warped grid. The optimal values for the scale factors $s_x$ and $s_y$ are chosen based on how many other points would be explained by that choice, in order to determine the actual number of horizontal and vertical chess squares in the rectangular region from the original image. In this example, the algorithm finds $s_x = 3$ and $s_y = 2$.

**Figure 4.** Horizontal gradient intensities calculated on the warped image in order to detect vertical lines. The red dots overlaid on each image correspond to the intersection points found previously. Here, $x_{\mathrm{max}} - x_{\mathrm{min}} = 7$ because there are eight columns of points instead of nine (similarly, the topmost horizontal line will be corrected by looking at the vertical gradient intensities).

**Figure 5.** An example illustrating why an immediate piece classification approach is prone to reporting false positives. Consider the square marked in green. Its bounding box for piece classification (marked in white) must be quite tall to accommodate tall pieces like a queen or king (the box must be at least as tall as the queen in the adjacent square on the left). The resulting sample contains almost the entire rook of the square behind, leading to a false positive.

**Figure 6.** Samples for occupancy classification generated from the running example chessboard image. The squares are cropped with a 50% increase in width and height to include contextual information.

**Figure 7.** Architecture of the CNN $(100,3,3,3)$ network for occupancy classification. The input is a three-channel RGB image with $100\times 100$ pixels. The first two convolutional layers (yellow) have a kernel size of $5\times 5$ and stride 1, and the final convolutional layer has a kernel size of $3\times 3$. Starting with 16 filters in the first convolutional layer, the number of channels is doubled in each subsequent layer. Each convolutional layer uses the rectified linear unit (ReLU) activation function and is followed by a max pooling layer with a $2\times 2$ kernel and stride of 2. Finally, the output of the last pooling layer is reshaped to a 6400-dimensional vector ($64\times 10\times 10$) that passes through two fully connected ReLU-activated layers before reaching the final fully connected layer with softmax activation.
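The feature-map sizes implied by this caption can be checked with a few lines of arithmetic. The sketch below assumes unpadded convolutions and a stride of 1 for the third convolutional layer; neither is stated in the caption, and the helper names `conv_out` and `pool_out` are made up for the example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution (padding=0 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output side length of a square max pooling layer."""
    return (size - kernel) // stride + 1


size, channels = 100, 3               # 100 x 100 RGB input
layers = [(5, 16), (5, 32), (3, 64)]  # (kernel size, number of filters)

conv_params = 0
for kernel, filters in layers:
    # Weights plus one bias per filter.
    conv_params += (kernel * kernel * channels + 1) * filters
    size = pool_out(conv_out(size, kernel))
    channels = filters
    print(f"after conv {kernel}x{kernel} + pool: {channels} x {size} x {size}")

flattened = channels * size * size
print("flattened vector length:", flattened)
print("convolutional parameters:", conv_params)
```

Under these assumptions the side length shrinks from 100 to 48, 22 and finally 10, so the vector passed to the fully connected layers has $64 \times 10 \times 10 = 6400$ entries.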

**Figure 8.** A random selection of six samples of white queens in the training set. Notice that the square each queen is located on is always in the bottom left of the image and of uniform dimensions across all samples.

**Figure 9.** The two images of the unseen chess set used for fine-tuning the chess recognition system. The images require no labels because they show the starting position from each player’s perspective, so the chess position is known. Note that unlike the large dataset used for initial training, this dataset contains photos of a real chessboard, as opposed to rendered images.

**Figure 10.** The augmentation pipeline applied to an input image (left). Each output looks different due to the random parameter selection.

**Figure 11.** Confusion matrix of the per-square predictions on the test set. Non-zero entries are highlighted in grey. The final row/column represents empty squares. Chessboard samples whose corners were not detected correctly are ignored here.

**Figure 12.** Inference time benchmarks of the chess recognition pipeline on the test set, averaged per sample. The error bars indicate the standard deviation. All benchmarks were carried out on the same machine, although the data for the trial labelled "cpu" was gathered without GPU acceleration.

**Table 1.** Performance of the trained occupancy classifiers. Models prefixed with “CNN” are vanilla CNNs where the 4-tuple denotes the side length of the square input size in pixels, the number of convolution layers, the number of pooling layers, and the number of fully connected layers. The check mark in the left column indicates whether the input samples contained contextual information (cropped to include part of the adjacent squares). We report the total number of misclassifications on the validation set (consisting of 9346 samples) in the last column. The differences between training and validation accuracies indicate no overfitting.

| Context | Model | # Trainable Parameters | Train Accuracy | Val Accuracy | Val Errors |
|---|---|---|---|---|---|
| ✓ | ResNet [30] | $1.12 \times 10^{7}$ | 99.93% | 99.96% | 4 |
| ✓ | VGG [29] | $1.29 \times 10^{8}$ | 99.96% | 99.95% | 5 |
| ✗ | VGG [29] | $1.29 \times 10^{8}$ | 99.93% | 99.94% | 6 |
| ✗ | ResNet [30] | $1.12 \times 10^{7}$ | 99.94% | 99.90% | 9 |
| ✓ | AlexNet [16] | $5.7 \times 10^{7}$ | 99.74% | 99.80% | 19 |
| ✗ | AlexNet [16] | $5.7 \times 10^{7}$ | 99.76% | 99.76% | 22 |
| ✓ | CNN (100, 3, 3, 3) | $6.69 \times 10^{6}$ | 99.70% | 99.71% | 27 |
| ✓ | CNN (100, 3, 3, 2) | $6.44 \times 10^{6}$ | 99.70% | 99.70% | 28 |
| ✗ | CNN (100, 3, 3, 2) | $6.44 \times 10^{6}$ | 99.61% | 99.64% | 34 |
| ✓ | CNN (50, 2, 2, 3) | $4.13 \times 10^{6}$ | 99.62% | 99.59% | 38 |
| ✓ | CNN (50, 3, 1, 2) | $1.86 \times 10^{7}$ | 99.67% | 99.56% | 41 |
| ✓ | CNN (50, 3, 1, 3) | $1.88 \times 10^{7}$ | 99.66% | 99.56% | 41 |
| ✓ | CNN (50, 2, 2, 2) | $3.88 \times 10^{6}$ | 99.64% | 99.54% | 43 |
| ✗ | CNN (50, 2, 2, 3) | $4.13 \times 10^{6}$ | 99.57% | 99.52% | 45 |
| ✗ | CNN (100, 3, 3, 3) | $6.69 \times 10^{6}$ | 99.55% | 99.50% | 47 |
| ✗ | CNN (50, 3, 1, 2) | $1.86 \times 10^{7}$ | 99.44% | 99.50% | 47 |
| ✗ | CNN (50, 2, 2, 2) | $3.88 \times 10^{6}$ | 99.54% | 99.44% | 52 |
| ✗ | CNN (50, 3, 1, 3) | $1.88 \times 10^{7}$ | 99.41% | 99.39% | 57 |
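The validation accuracies in Table 1 follow directly from the error counts and the validation set size of 9346 samples given in the caption. The short sketch below reproduces a few of them; the function name `accuracy_from_errors` is chosen for the example.

```python
VAL_SAMPLES = 9346  # validation set size stated in the Table 1 caption


def accuracy_from_errors(errors, total=VAL_SAMPLES):
    """Validation accuracy in percent, rounded to two decimals."""
    return round(100.0 * (1 - errors / total), 2)


# (model, validation errors) pairs taken from Table 1.
for model, errors in [("ResNet, with context", 4),
                      ("AlexNet, with context", 19),
                      ("CNN (50, 3, 1, 3), without context", 57)]:
    print(f"{model}: {accuracy_from_errors(errors):.2f}%")
```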

| Model | # Trainable Parameters | Train Accuracy | Val Accuracy | Val Errors |
|---|---|---|---|---|
| InceptionV3 [32] | $2.44 \times 10^{7}$ | 99.98% | 100.00% | 0 |
| VGG [29] | $1.29 \times 10^{8}$ | 99.84% | 99.94% | 2 |
| ResNet [30] | $1.12 \times 10^{7}$ | 99.93% | 99.91% | 3 |
| AlexNet [16] | $5.71 \times 10^{7}$ | 99.51% | 99.02% | 31 |
| CNN (100, 3, 3, 2) | $1.41 \times 10^{7}$ | 99.62% | 96.94% | 97 |
| CNN (100, 3, 3, 3) | $1.44 \times 10^{7}$ | 99.49% | 99.49% | 98 |

**Table 3.** Performance of the chess recognition pipeline on the train, validation, and test datasets, as well as the fine-tuned pipeline on the unseen chess set.

| Metric | Rendered Dataset: Train | Rendered Dataset: Val | Rendered Dataset: Test | Unseen Chess Set: Train | Unseen Chess Set: Test |
|---|---|---|---|---|---|
| mean number of incorrect squares per board | 0.27 | 0.03 | 0.15 | 0.00 | 0.11 |
| percentage of boards predicted with no mistakes | 94.77% | 97.95% | 93.86% | 100.00% | 88.89% |
| percentage of boards predicted with ≤1 mistake | 99.14% | 99.32% | 99.71% | 100.00% | 100.00% |
| per-square error rate | 0.42% | 0.05% | 0.23% | 0.00% | 0.17% |
| per-board corner detection accuracy | 99.59% | 100.00% | 99.71% | 100.00% | 100.00% |
| per-square occupancy classification accuracy | 99.81% | 99.97% | 99.92% | 100.00% | 99.88% |
| per-square piece classification accuracy | 99.99% | 99.99% | 99.99% | 100.00% | 99.94% |
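The first and fourth rows of Table 3 are linked: a chessboard has 64 squares, so the per-square error rate is the mean number of incorrect squares per board divided by 64. The sketch below checks this against the reported values; the function name is chosen for the example, and because the table entries are themselves rounded, agreement is only expected to two decimals.

```python
SQUARES_PER_BOARD = 64


def per_square_error_rate(mean_incorrect_squares):
    """Per-square error rate in percent, rounded to two decimals."""
    return round(100.0 * mean_incorrect_squares / SQUARES_PER_BOARD, 2)


# Mean number of incorrect squares per board, taken from Table 3.
for split, mean_incorrect in [("rendered train", 0.27),
                              ("rendered val", 0.03),
                              ("rendered test", 0.15),
                              ("unseen test", 0.11)]:
    print(f"{split}: {per_square_error_rate(mean_incorrect):.2f}%")
```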


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wölflein, G.; Arandjelović, O. Determining Chess Game State from an Image. *J. Imaging* **2021**, *7*, 94.
https://doi.org/10.3390/jimaging7060094
