Streamlined Deep Learning Models for Move Prediction in Go-Game

Abstract: Due to the complexity of the search space and of move evaluation, the game of Go has been a long-standing challenge for artificial intelligence (AI) to achieve a high level of proficiency. It was not until DeepMind proposed the deep neural network and tree search algorithm AlphaGo in 2014 that an efficient learning algorithm was developed, marking a significant milestone in AI technology. In light of the key technologies in AI Computer Go, this work examines move prediction across different Go rankings and develops two deep learning models by combining and extending the feature extraction methods of AlphaGo. Specifically, effective modules for neural networks are proposed to guide learning through complicated Go situations based on the Inception module in GoogLeNet and the Convolutional Block Attention Module (CBAM). Subsequently, the two models are combined by ensemble learning to improve generalization, and these streamlined models reduce the number of model parameters to the scale of one hundred thousand. Experimental results show that our models achieve prediction accuracies of 46.9% and 50.8% on two different Go datasets, outperforming conventional models by significant margins. This work not only advances AI development in the Go-game but also offers an innovative approach to related studies.


Introduction
While improving the playing strength of Computer Go through AI technology is the primary focus, a research trend exploring move prediction has emerged [1][2][3]. Move prediction assigns to every legal position the probability that the next move, usually determined from expert play, will be made there. This work investigates the problem across different Go rankings. The output of move prediction can enhance the efficiency of game agents and can be extended to suggest the next move for a specific Go rank for learning purposes. Inspired by the Inception module in GoogLeNet [4], we developed modules to enhance the identification performance of different Go situations, with individual goals guiding the module modifications accordingly. Our model also draws on the attention mechanism of the Convolutional Block Attention Module (CBAM) [5], where the standard CBAM processes channel attention first and then spatial attention. However, we leverage these two attention modules separately and concatenate their results. Depending on the task and feature category, channel attention is sometimes utilized while spatial attention is excluded. Experimental results demonstrate that our model can efficiently and accurately capture the relationship between move positions and features, thereby improving prediction accuracy.
In addition, the limitations and performance of computing resources are crucial in practical applications. Therefore, the three neural network models developed in this work are lightweight, with the number of parameters kept below one million. The two proposed models show good performance in the experiments, and the ensemble of the two models performs even better. In summary, our main contributions are as follows:
(1) Feature extraction method: For move prediction in the game of Go, we customize the extracted feature planes, which include the current status of the Go board and territory, the last five moves, and the liberties (adjacent empty points of connected stones).
(2) Highly adaptive model: Based on the Inception module and CBAM, we develop models that are sensitive to different situations in the game of Go, thereby improving forecasting accuracy.
(3) Lightweight design: Taking into account the limitations of computing resources for wide applications, we optimize the neural network architecture and significantly reduce the number of model parameters. Consequently, our model can be trained efficiently even with limited computing power.

Background and Existing Literature
A long-standing challenge in the game of Go is to develop Computer Go with the quality of a professional player. In the early days of Computer Go, the Naive Bayes model provided a simplified probability-based approach to move prediction [1], computing the probability of each move to forecast the most likely one. Subsequently, the maximum entropy method generated the best move by analyzing the relative frequencies of local board patterns in game records [1]. To efficiently evaluate the current board situation, Bouzy and Helmstetter [6] presented the Monte Carlo method as an evaluation function for global search. However, this method was not attractive until it was combined with the Upper Confidence bounds applied to Trees (UCT) to balance exploration and exploitation in decision-making processes, resulting in the Monte Carlo Tree Search (MCTS) [7].
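The exploration/exploitation balance that UCT provides can be illustrated with the standard UCB1 score. The following is a minimal sketch, not taken from the cited works; the function names and the default constant `c` are assumptions chosen for illustration.

```python
import math

def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score of one child node (names are illustrative)."""
    if visits == 0:
        return math.inf  # unvisited children are explored first
    exploitation = total_value / visits                            # mean outcome so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)  # bonus for rarely tried moves
    return exploitation + exploration

def select_child(children, parent_visits):
    """children: list of (total_value, visits) pairs; returns the index to descend into."""
    scores = [uct_score(v, n, parent_visits) for v, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

A child with a high mean value is preferred, but a child that has been visited rarely receives a growing bonus, so the search does not commit too early to one move.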
The game of Go is challenging for computers due to its search complexity and intricate board situations. However, top human players assess the board situation visually, leading to research efforts that adopt deep Convolutional Neural Networks (CNNs) to decipher the Go board situation. Clark and Storkey [2] trained an 8-layer CNN using two Go datasets made by expert players and achieved an accuracy of 0.4437 in move prediction. Subsequently, Maddison et al. [8] created a 12-layer CNN to predict expert moves, reaching an accuracy of 0.5450, which is comparable to a 6-dan human player. Duc et al. [9] proposed a 5-layer CNN trained on approximately 600,000 board states to forecast the next move, and their work also suggested next moves for three player ranks, which is beneficial for novice players. Furthermore, AlphaGo [10] and AlphaGo Zero [11], developed by DeepMind, marked significant milestones in the study of AI Computer Go. These models not only represented major breakthroughs in AI but were also actively introduced into the training process of professional Go players [12]. The AlphaGo network model combines a 13-layer CNN with MCTS and continues to learn by playing against itself, achieving an accuracy of 0.57. The subsequent version, AlphaGo Zero, is even more advanced in strategy and technology, being entirely self-taught without relying on historical game records. The success of AlphaGo Zero shows the enormous potential of AI in exploring novel move patterns and strategies. At the same time, AlphaGo Zero reaches an accuracy of 0.6040 in move prediction.
The astonishing success of AlphaGo has spurred rapid development in Computer Go. Following AlphaGo, several open-source Computer Go programs such as MuGo [13], Minigo [14], and ELF OpenGo [15] have been launched. Leela Zero [16] leveraged the GPU computing power of volunteers and the AutoGTP program to participate in the distributed effort to recompute AlphaGo Zero weights. Subsequently, KataGo [17] employed a distributed training method to considerably reduce training time, making it one of the most powerful open-source Computer Go programs. In addition, residual networks have facilitated faster and deeper network training, with a 28-layer residual network achieving a 4-dan level [18]. Instead of using residual networks in standard reinforcement learning, Cazenave [19] improved mobile networks as an alternative to increase network depth for better results, reaching an accuracy of 0.6181 for move prediction in their experiments. Recent advances in neural networks have continued to introduce innovations for Computer Go [20,21].
On the other hand, more and more deep neural networks with complicated structures and a substantial number of parameters are proposed to improve accuracy. However, deploying such networks requires more powerful computing resources, which restricts their development in real-time and real-life applications. To address this issue, lightweight deep CNNs have been introduced. A lightweight deep CNN typically has a simpler network and can be deployed on devices with lower computing capabilities. These networks can be designed using methods such as parameter quantization, network pruning, and knowledge distillation to compress standard CNNs [22].
The MobileNet family is one of the typical lightweight deep CNNs that directly adopt a lightweight structure. Instead of traditional convolution, MobileNets [23] adopted the depth-wise separable convolution to reduce the number of parameters while maintaining accuracy. MobileNetV2 [24] introduced inverted residuals and linear bottleneck modules, improving performance by incorporating a point-wise convolution layer. To further increase detection speed, MobileNetV3 [25] combined platform-aware network architecture search (NAS) and the NetAdapt algorithm for block-wise and layer-wise search, respectively. Moreover, ShuffleNet [26] is another kind of lightweight structure that improves feature information through channel shuffle and proposes point-wise group convolution instead of the traditional 1 × 1 convolution operation to reduce computational complexity. Based on ShuffleNet, ShuffleNetV2 [27] introduced the channel split operation to balance execution performance and forecasting accuracy. In particular, EfficientNet [28] employed the compound scaling method to balance several dimensions of the network, achieving optimal performance under different computing constraints. The design concepts and innovative structures of lightweight deep CNNs are not only suitable for limited computing environments but also advance model compression and acceleration.
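To make the parameter saving of depth-wise separable convolution concrete, the sketch below counts weights for a standard convolution and for its separable counterpart. Biases are ignored, and the layer sizes in the usage note are illustrative assumptions, not values from the models in this work.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights of a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise
```

For example, with a 3 × 3 kernel, 64 input channels, and 128 output channels, `conv_params(3, 64, 128)` gives 73,728 weights while `separable_conv_params(3, 64, 128)` gives 8,768.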
In this work, streamlined deep learning models are developed for move prediction in the Go-game, with lightweight designs that considerably reduce the number of network parameters. The rest of this paper is organized as follows: Section 3 presents the data source, preprocessing methods, feature design, and constructed models. Subsequently, several experiments and evaluations are conducted in Section 4. Finally, Section 5 draws our conclusions.

Data Source and Preprocessing
Our dataset comes from competition and contains dan and kyu ranking data with 100,160 and 118,500 records, respectively. The raw data are stored in CSV format, and several preprocessing steps are applied to facilitate the training and testing process. First, the records in CSV format are transformed into the standard SGF (Smart Game Format) format [29]. To ensure the standardization of records and efficient parsing, we prepend each record with the message ";GM[1]FF[4]SZ[19]" as specified by the International Go Federation, where "GM[1]" denotes the Go-game, "FF[4]" is the file format, and "SZ[19]" defines a board size of 19 × 19. Recently, Gao et al. [30] presented the professional Go annotation dataset (PAGE), containing 98,525 games played by professional players and spans. In addition to extensive annotations, PAGE included a large amount of metadata and in-game statistics. However, it is not adopted as our dataset because its records are derived exclusively from professional players.
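The CSV-to-SGF step can be sketched as follows. The layout of the raw CSV rows is not described here, so the row format used below (a record identifier followed by comma-separated tokens such as `B[pd]`) is purely a hypothetical assumption, as are the function names; only the header string comes from the text above.

```python
import re

# One move token: color, then a two-letter coordinate (empty brackets denote a pass).
MOVE_TOKEN = re.compile(r"([BW])\[([a-s]{2}|)\]")

def to_sgf(moves, size=19):
    """Build an SGF game tree from (color, coord) pairs with the header from the text."""
    header = f";GM[1]FF[4]SZ[{size}]"
    body = "".join(f";{color}[{coord}]" for color, coord in moves)
    return f"({header}{body})"

def csv_row_to_sgf(row):
    """Hypothetical row layout: 'record_id,B[pd],W[dp],...'; non-move fields are skipped."""
    moves = []
    for field in row.strip().split(","):
        match = MOVE_TOKEN.fullmatch(field.strip())
        if match:
            moves.append(match.groups())
    return to_sgf(moves)
```

Calling `csv_row_to_sgf("g1,B[pd],W[dp]")` returns `(;GM[1]FF[4]SZ[19];B[pd];W[dp])`, which standard SGF parsers can read.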
Moreover, an issue is identified: the predicted move is sometimes the same as the last move, which may result from Pass moves. Ultimately, the original records are retained without further modification in the preprocessing steps because records with this issue are rare. After cleaning the dataset, Figures 1 and 2 show the distributions of the total number of moves in the records of the two datasets. In these figures, every record contains more than 100 moves; most records contain between 200 and 300 moves, and few contain more than 350.

Feature Design
Since the number of features greatly affects model performance, two groups with 18 and 10 feature planes (Table 1) are considered to develop streamlined deep learning models. Both groups contain common feature planes, including the predicted player color and the current board situation (i.e., empty positions and black and white stones), which are regarded as basic factors for analyzing board records. Moreover, our experiments show that the last five moves on the board are salient for move prediction, so they are integrated into a single feature. An excessive number of feature planes can significantly slow down training and convergence. Based on our experience, stones with more than six liberties are generally in less danger. Therefore, the eight liberty feature planes used in AlphaGo are compressed into six and one for our models, which balances forecasting performance with execution time.
A feature can be composed of several feature planes, and Figure 4 illustrates an example of 10 feature planes. Figure 4a,b represent whether the next move is a black or white stone. Figure 4c-e are the current board situation with empty, black, and white positions, respectively. The feature of the last five moves is depicted in Figure 4f. Based on our observations and evaluations, liberty is important in board analysis. Therefore, a feature plane for black/white stones is included to represent liberty (Figure 4g,h), where each stone on the board is replaced by its liberty value. In this manner, each position in the feature plane is represented by a number between 1 and 6, allowing us to identify dangerous areas. Furthermore, a feature is designed for the territory state, indicating areas certainly controlled by black/white stones in the current board situation, as shown in Figure 4i,j. The concepts of territory and influence in Go are crucial for representing a player's actual control power and potential impact on the board.
Figure: the Go board mapped onto a 19 × 19 matrix, where we label empty positions as 0, black stones as 1, and white stones as 2. The figure on the far right depicts the predicted move.
Figure 4. An example of the 10 feature planes in Table 1, where (a,b) are player colors, (c-e) illustrate board situations, (f) is the last five moves, (g,h) represent liberties, and (i,j) are territory states.
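The board-situation and liberty planes described above can be sketched with NumPy. This is an illustrative reconstruction under stated assumptions rather than our exact implementation: the 0/1/2 board encoding follows the figure caption, while the function names, the flood-fill grouping, and the default cap of six are choices made for this example.

```python
import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2  # board encoding from the figure caption

def board_planes(board):
    """Three binary planes: empty positions, black stones, white stones."""
    return np.stack([(board == v).astype(np.float32) for v in (EMPTY, BLACK, WHITE)])

def liberty_plane(board, color, cap=6):
    """Replace each stone of `color` by the liberty count of its group, capped at `cap`."""
    size = board.shape[0]
    plane = np.zeros(board.shape, dtype=np.float32)
    seen = np.zeros(board.shape, dtype=bool)
    for r in range(size):
        for c in range(size):
            if board[r, c] != color or seen[r, c]:
                continue
            # Flood fill the connected group and collect its distinct empty neighbors.
            group, liberties, stack = [], set(), [(r, c)]
            seen[r, c] = True
            while stack:
                y, x = stack.pop()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < size and 0 <= nx < size):
                        continue
                    if board[ny, nx] == EMPTY:
                        liberties.add((ny, nx))
                    elif board[ny, nx] == color and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
            for y, x in group:
                plane[y, x] = min(len(liberties), cap)
    return plane
```

Positions without a stone of the given color stay at 0, so low positive values in the plane mark the groups that are closest to capture.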

Model Construction
The model architecture constructed in this work largely consists of a customized combination of Inception and Attention modules. This design offers much flexibility in handling various features and tasks, allowing us to adjust parameters as necessary. We develop three models for the task of move prediction. The first is the Incep-Attention model, which combines the Inception and Attention modules (Figure 5). The multiscale design of the Inception module enables the model to learn information at different scales, while the Attention module improves the model's ability to focus on key features. In the second half of the feature extraction process, the final output is generated by connecting a convolution layer with a single filter to a Softmax layer, instead of using a dense layer. This design maintains the integrity of spatial features, enabling the model to understand both global situations and specific details on the board. Our first model contains a total of 867,551 parameters.
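The output stage described above, a single-filter convolution followed by Softmax, can be sketched in NumPy as follows. The 1 × 1 kernel size, the channel-last layout, and the function name are assumptions made for illustration; the kernel size actually used is not stated here.

```python
import numpy as np

def policy_head(features, weights, bias=0.0):
    """features: (H, W, C) map; weights: (C,) for one assumed 1 x 1 filter.

    Returns an (H, W) array of move probabilities that sums to 1.
    """
    logits = features @ weights + bias      # one logit per board position
    flat = logits.reshape(-1)
    flat = flat - flat.max()                # subtract the max for numerical stability
    probs = np.exp(flat) / np.exp(flat).sum()
    return probs.reshape(logits.shape)      # spatial layout of the board is preserved
```

Under these assumptions the head adds only C + 1 parameters, whereas a dense layer from the flattened 19 × 19 × C map to 361 outputs would add 361 · C · 361 + 361.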
The second is called the Up-Down model, whose design concept involves performing a dimensionality reduction after raising the dimensions, as shown in Figure 6. In this model, the number of parameters is significantly reduced to 106,257, which is about one-eighth of our first model. There are three points in our strategy to raise and then reduce the dimensions. Firstly, increasing the size of the feature map helps the model learn more complex and abstract features, thereby improving the overall representation of data. Secondly, performing dimensionality reduction at a deeper layer not only reduces the number of model parameters and the risk of overfitting but also improves computing and forecasting performance. Finally, by applying Channel Attention and Skip Connections, the model retains key features for move prediction while discarding noncritical information.
To further achieve a lightweight design, the number of filters in the Inception module of the Up-Down model is reduced to one-quarter of those in the Incep-Attention model. In addition, the Dropout value is set to a smaller level because, based on our experience, this helps the model extract features that contribute to robustness.
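For readers unfamiliar with the Channel Attention and Skip Connection combination mentioned above, the following NumPy sketch follows the channel-attention formulation of CBAM [5] (a shared two-layer MLP applied to average-pooled and max-pooled channel descriptors). The weight shapes, the absence of biases, and the function names are assumptions for illustration and do not reproduce the exact layers of the Up-Down model.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """x: (H, W, C); w1: (C, C // r) and w2: (C // r, C) are shared MLP weights."""
    avg = x.mean(axis=(0, 1))                 # average-pooled channel descriptor
    mx = x.max(axis=(0, 1))                   # max-pooled channel descriptor

    def mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2   # shared MLP with ReLU in between

    scale = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid gate per channel
    return x * scale                          # reweight every channel

def attention_with_skip(x, w1, w2):
    """Add the input back so information is kept even for down-weighted channels."""
    return x + channel_attention(x, w1, w2)
```

Each channel is scaled by a gate in (0, 1), so uninformative planes are suppressed while the skip path keeps the original signal available to later layers.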
Each of the two constructed models has its advantages in interpreting the board situation due to their different features and structures. Therefore, the third model combines the decisions of the two models to improve overall performance by ensemble learning, as shown in Figure 7. There are three main classes of ensemble learning methods: bagging, stacking, and boosting. The stacking method explores a space of multiple classification or regression models for the same problem, which is suitable here because we have two constructed models. Its concept is to build different learners to generate intermediate predictions and then combine these predictions using a meta-learner, a new model that learns from the intermediate predictions for the same target. Numerous experiments and evaluations are conducted using conventional machine learning models (e.g., Logistic regression, Random Forest, SVM, and KNN) as meta-learners. Ultimately, the soft voting method is employed because of its better prediction performance. Soft voting simply returns the move position as the argmax of the sum of prediction probabilities, reducing the risk of overfitting compared to a more complex meta-learner. Our experimental results reveal that the ensemble model can improve forecasting performance while hardly increasing the execution workload.
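Soft voting as described above amounts to a few lines of NumPy. In this minimal sketch the function name and the three-position toy distributions are illustrative; in our setting each probability vector would cover the board positions.

```python
import numpy as np

def soft_vote(prob_a, prob_b):
    """Return the argmax of the summed prediction probabilities of two models.

    Inputs may be single vectors or batches of shape (N, positions).
    """
    total = np.asarray(prob_a) + np.asarray(prob_b)
    return np.argmax(total, axis=-1)

# Toy example: model A alone would pick position 0, but the summed
# probabilities [0.6, 1.0, 0.4] favor position 1.
move = soft_vote([0.5, 0.4, 0.1], [0.1, 0.6, 0.3])
```

Because no additional parameters are trained, the only extra cost at inference time is one forward pass of the second model and a vector addition.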

Results
This work develops three models: the Incep-Attention model, the Up-Down model, and the ensemble model that combines the two former models, all with lightweight designs. Two testing datasets in the game of Go, including dan and 10 kyu ranks, are subsequently used to evaluate the accuracy of move prediction. Top1 and Top5 are the two evaluation metrics, where the former is the accuracy of the single predicted move and the latter is the fraction of cases in which the five highest-ranked predictions contain the ground truth. Table 2 summarizes the compared results.
In the training phase of Table 2, the IA model has better accuracy than the UD model under the same training data conditions (i.e., kyu and dan) and the same number of feature planes (i.e., 10 and 18). This is attributed to the more complex structure of the IA model. The best training accuracy of 0.5060 in Table 2 results from the IA model with kyu rank data and 10 feature planes. For the comparison using dan testing data, the IA model trained with dan data and 10 feature planes achieves the best Top1 and Top5 accuracies, with values of 0.4581 and 0.7838, respectively. For kyu testing data, the best Top1 accuracy is 0.4984, given by the IA model with 18 feature planes, and the IA models also exhibit better Top5 accuracies. Although the forecasting performance of the UD model is worse than that of the IA model, the two are close under the same conditions.
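The Top1 and Top5 metrics defined above can be computed as in the following sketch, which is a generic top-k accuracy routine rather than our evaluation script; the function name and array layout are assumptions.

```python
import numpy as np

def top_k_accuracy(probs, targets, k=1):
    """probs: (N, positions) predicted probabilities; targets: (N,) true move indices."""
    probs = np.asarray(probs)
    targets = np.asarray(targets)
    top_k = np.argsort(-probs, axis=1)[:, :k]        # k most probable positions per sample
    hits = (top_k == targets[:, None]).any(axis=1)   # is the true move among them?
    return float(hits.mean())
```

With `k=1` this is the usual accuracy, and with `k=5` it gives the Top5 score; evaluating it for increasing k yields TopN curves.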

model needs stronger generalization abilities to tackle the challenge. The main concept behind ensemble learning is to combine the outputs of diverse models to generate more precise predictions and improve generalization, and the ensemble models in Table 2 indeed achieve better performance.
Figure 8 compares the TopN performance of models in Table 2 on two testing datasets, using IA_kyu-10 + UD_dan-18 as the ensemble model. As N increases, the accuracy of the TopN prediction approaches 1, with a sharp increase occurring from Top1 to Top5. In both figures, the performance curves of the IA and UD models are close, indicating similar prediction performance. In particular, compared to the kyu testing data in Figure 8b, the model performances on dan data are more widely spread for smaller N, suggesting greater uncertainty in predicting the move position on dan data.
Experiments are conducted to compare our models with other models, including AlphaGo and the lightweight network MobileNet [31]. Table 3 summarizes the experimental results and the number of model parameters. For both datasets, our three models mostly outperform AlphaGo-Like and MobileNet, the exception being the Up-Down model on dan testing data. As expected, our ensemble model has the best forecasting performance on both datasets. Although the Up-Down model has the worst performance among our models, it has the smallest number of parameters, an order of magnitude fewer than MobileNet.
In a column, the bold score and value are the best accuracy and number of parameters, respectively.

Discussion
This work deals with the move prediction in the game of Go by three deep learning models.First, feature engineering is the most crucial process in developing a predictive learning model for the next move, and the considered features include the current statuses of the Go board and territory, the last five moves, and liberties.Two feature groups containing 18 and 10 feature planes are designed to emphasize the requirements of board record analysis and execution performance, respectively.
Next, two networks, called Incep-Attention and Up-Down models, are developed to concern different board situations and predict the next move in Go-game based on the classic Inception and Attention modules.Numerous experimental results across different Go rankings reveal good forecasting performances for both constructed models.The Incep-Attention model has a better performance than the other two compared models, indicating that our model can effectively capture salient information from Go records by designed feature planes.Moreover, the Up-Down model is the most lightweight design with around one hundred thousand parameters, an order of magnitude less than MobileNet.In our experiments, there is little difference in accuracy between the two lightweight models, which presents the great potential of lightweight networks in move prediction.
In addition, the third model is an ensemble created by selecting the combination with the best training accuracy from various pairings of the two models above. Instead of using a meta-learner, a simple soft voting method is employed to combine the predictions of the two models. An ensemble can be effective if the combined models have distinct yet complementary characteristics, allowing for a more comprehensive understanding of a board situation. Our experimental results show that the combined predictions of the Incep-Attention and Up-Down models are usually superior to those of either model alone.
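The soft voting step can be sketched as follows. This is a minimal illustration, assuming each model outputs a probability distribution over the 361 board points and that the two models are weighted equally; the function names `soft_vote` and `predict_move` and the equal weights are assumptions, not details taken from the experiments.

```python
import numpy as np


def soft_vote(prob_a, prob_b, weights=(0.5, 0.5)):
    """Combine two move distributions by a weighted average of probabilities."""
    prob_a = np.asarray(prob_a, dtype=np.float64)
    prob_b = np.asarray(prob_b, dtype=np.float64)
    combined = weights[0] * prob_a + weights[1] * prob_b
    # Renormalize so the result is again a probability distribution.
    return combined / combined.sum(axis=-1, keepdims=True)


def predict_move(combined, board_size=19):
    """Map the most probable flat index to (row, col) board coordinates."""
    index = int(np.argmax(combined))
    return divmod(index, board_size)


# Toy outputs of two models over the 19 * 19 = 361 board points.
model_a = np.zeros(361)
model_a[72], model_a[100] = 0.6, 0.4
model_b = np.zeros(361)
model_b[100], model_b[200] = 0.7, 0.3

combined = soft_vote(model_a, model_b)
move = predict_move(combined)  # point 100, i.e. row 5, column 5
```

In this toy case the first model alone would choose point 72, but the averaged distribution favors point 100, which both models rate highly. This is the sense in which complementary models can correct each other.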
The two modules and network architectures developed in this work can be further extended as small-scale networks to predict moves in similar board games. However, the feature planes have to be customized according to the characteristics of the game, e.g., board size, game rules, and board state. For example, each board state is encoded by three feature planes for move prediction in the Gomoku board game [32].

Conclusions
To tackle the move prediction problem, we develop three lightweight deep neural networks, each with fewer than one million parameters. Extensive experiments demonstrate their strong prediction performance. The lightweight design, combined with high accuracy, facilitates real-time and hardware-limited applications. Moreover, predicting the next move at different ranks can help Go players learn at their corresponding levels. Our approach to designing feature planes can also be beneficial in constructing AI for other board games. However, predicting the next move remains challenging because of the considerable uncertainty introduced by varying player levels. To further improve the accuracy of move prediction, it is crucial to design more diverse and representative feature planes. Combining distinct models with an appropriate strategy can help to capture the nuanced situations of board games. Future work will construct models based on record datasets with more rank variations. Additionally, since a feature can be composed of several feature planes, visualizing their activations can aid in error analysis and in assessing feature importance.

Figure 1. Distribution of the total number of moves in dan records.

Figure 2. Distribution of the total number of moves in kyu records.

Figure 3 demonstrates three board representations for a record. The leftmost panel in Figure 3 is decoded from a record in SGF format. The middle panel is its projection onto a 19 × 19 matrix, where empty positions are labeled 0, black stones 1, and white stones 2. The rightmost panel depicts the predicted move.
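The projection onto a 19 × 19 matrix can be sketched as follows. This is a simplified illustration, assuming standard SGF move properties such as `;B[pd]`, where the first letter gives the column and the second the row, counted from the top-left corner. The function name `sgf_moves_to_matrix` is hypothetical, and the sketch ignores captures, passes, handicap stones, and move-like text inside comments, so it is not a full SGF parser.

```python
import re

import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2
BOARD_SIZE = 19

# Matches move properties such as ";B[pd]" or ";W[dp]" (letters a to s).
MOVE_PATTERN = re.compile(r";\s*([BW])\[([a-s])([a-s])\]")


def sgf_moves_to_matrix(sgf_text):
    """Place every move of an SGF record on a 19 x 19 integer matrix."""
    board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=np.int8)
    for color, col_char, row_char in MOVE_PATTERN.findall(sgf_text):
        col = ord(col_char) - ord("a")
        row = ord(row_char) - ord("a")
        # Simplification: stones are only added, never captured.
        board[row, col] = BLACK if color == "B" else WHITE
    return board


record = "(;GM[1]SZ[19];B[pd];W[dp];B[pp])"
board = sgf_moves_to_matrix(record)
```

For real game records, a rules-aware library would be needed to remove captured stones before building the matrix.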

Figure 3. Diagram of a Go board record in the move prediction task.

Figure 4. Diagram of the 10 feature planes in Table 1, where (a,b) are player colors, (c-e) illustrate board situations, (f) is the last five moves, (g,h) represent liberties, and (i,j) are territory states.

Figure 5. Conceptual framework of the Incep-Attention model.

Figure 6. Conceptual framework of the Up-Down model.

Figure 8. Comparison of TopN accuracies for models on (a) dan and (b) kyu testing data separately.

Table 1. Two groups with 18 and 10 feature planes.

Name No. of Feature Planes 18 10

Table 2. Comparison of training and testing accuracies for constructed models.

Table 3. Comparison of accuracies and number of parameters for Computer Go models. In each column, the bold score and value are the best accuracy and number of parameters, respectively.
