Author Contributions
Conceptualization, Methodology, Writing—original draft, C.X. and R.Z.; Project administration, C.X.; Software, Writing—review and editing, K.W.; Data curation, M.Z.; Software, X.L.; Project administration, Writing—original draft, J.W. All authors have read and agreed to the published version of the manuscript.
Appendix A. Concepts and Definitions of the EdgeQuad Representations
For a Dots-and-Boxes board consisting of an $N \times N$ grid of boxes, each box has four edges, which correspond to four input channels, so the basic input is a tensor of shape $4 \times N \times N$. Across all EdgeQuad configurations, the input is structured as a spatial tensor $X \in \mathbb{R}^{C \times N \times N}$. Every model defines the foundational state of the four edges of a given box, namely right ($e_r$), up ($e_u$), left ($e_l$), and down ($e_d$), where each edge state $e_k \in \{0, 1\}$ is binary: 0 for unoccupied and 1 for occupied.
From this shared grid foundation, we evaluate three EdgeQuad variations, which differ in their channel depth (C) and contextual encoding:
EdgeQuad4 ($C = 4$)
This model uses only the four fundamental edge channels. Each box is represented by the binary states of its right, up, left, and down edges.
EdgeQuad9 ($C = 9$)
This variant appends five additional channels to the four base edge channels. These channels one-hot encode the degree $d \in \{0, 1, 2, 3, 4\}$ of the box, that is, the number of its occupied edges. For a box with degree $d$, the channel corresponding to $d$ is set to 1, while the other four degree channels are 0.
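As an illustration of the EdgeQuad4 and EdgeQuad9 encodings described above, the following NumPy sketch builds both tensors from a board. It assumes a board layout that the text does not specify: horizontal edges in an array `h` of shape (N+1, N) and vertical edges in an array `v` of shape (N, N+1), with row 0 at the top. The function name `edgequad` and the placement of the five degree channels after the four edge channels are also assumptions.

```python
import numpy as np


def edgequad(h: np.ndarray, v: np.ndarray, with_degree: bool = False) -> np.ndarray:
    """Hypothetical EdgeQuad4/EdgeQuad9 encoder (board layout assumed).

    h: (N+1, N) array of horizontal edges; h[i, j] is the top edge of box (i, j).
    v: (N, N+1) array of vertical edges; v[i, j] is the left edge of box (i, j).
    Entries are 0 (unoccupied) or 1 (occupied).
    """
    right = v[:, 1:]   # right edge of box (i, j) is v[i, j + 1]
    up = h[:-1, :]     # top edge of box (i, j) is h[i, j]
    left = v[:, :-1]   # left edge of box (i, j) is v[i, j]
    down = h[1:, :]    # bottom edge of box (i, j) is h[i + 1, j]
    x = np.stack([right, up, left, down]).astype(np.float32)  # (4, N, N)
    if not with_degree:
        return x  # EdgeQuad4
    degree = x.sum(axis=0).astype(np.int64)             # occupied edges per box, 0..4
    one_hot = np.eye(5, dtype=np.float32)[degree]       # (N, N, 5)
    return np.concatenate([x, one_hot.transpose(2, 0, 1)], axis=0)  # (9, N, N)


# Usage on a 2 x 2 board with two occupied edges.
h = np.zeros((3, 2), dtype=np.int64)
v = np.zeros((2, 3), dtype=np.int64)
h[0, 0] = 1  # top edge of box (0, 0)
v[0, 1] = 1  # shared edge: right of box (0, 0), left of box (0, 1)
x9 = edgequad(h, v, with_degree=True)
print(x9.shape)      # (9, 2, 2)
print(x9[4:, 0, 0])  # degree 2 -> [0. 0. 1. 0. 0.]
```

Because shared edges are read from the same entry of `h` or `v`, an edge occupied by one move is automatically reflected in both adjacent boxes.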
EdgeQuad44 ($C = 44$)
In this variant, each of the four edges of a box is described by an 11-dimensional vector, giving 44 channels per box, with the channel order determined by the encoding. Edge types are defined by the edge state and by the number of occupied edges in the boxes adjacent to the edge. To describe the board boundary, an outer layer of boxes is added around the $N \times N$ grid, and the boundary edges of these added boxes are set to 1. The 11 dimensions of each edge vector are:
Dimension 1: The edge state (0 for unoccupied, 1 for occupied).
Dimensions 2–6: The one-hot encoding of the number of occupied edges in the left or up adjacent box.
Dimensions 7–11: The one-hot encoding of the number of occupied edges in the right or down adjacent box.
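A minimal sketch of the 11-dimensional per-edge vector and its concatenation into the 44 box channels follows. It assumes the caller has already computed the occupied-edge counts of the two boxes adjacent to each edge (for an edge on the board boundary, the count of the added outer box). The helper names `edge_vector` and `box_vector` are hypothetical.

```python
import numpy as np


def edge_vector(state: int, deg_left_or_up: int, deg_right_or_down: int) -> np.ndarray:
    """Hypothetical EdgeQuad44 feature for a single edge (11 values)."""
    vec = np.zeros(11, dtype=np.float32)
    vec[0] = state                    # dimension 1: edge state
    vec[1 + deg_left_or_up] = 1.0     # dimensions 2-6: one-hot count, left/up box
    vec[6 + deg_right_or_down] = 1.0  # dimensions 7-11: one-hot count, right/down box
    return vec


def box_vector(edges) -> np.ndarray:
    """Concatenate four (state, deg_a, deg_b) edge descriptions into 44 channels.

    `edges` is assumed to be ordered right, up, left, down, as in EdgeQuad4.
    """
    return np.concatenate([edge_vector(*e) for e in edges])


# Occupied edge between a box with 2 occupied edges (left) and one with 1 (right).
print(edge_vector(1, 2, 1))  # [1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0.]
```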
The visual representations of EdgeQuad4, EdgeQuad9, and EdgeQuad44 are shown in Figure A1.
Figure A1.
Schematic of EdgeQuad Feature Channels for Board State Encoding. (a) Encoding scheme; (b) one occupied edge in a box; (c) similar to (b); (d) left box: one occupied edge, right box: two occupied edges; (e) similar to (d). Colors are for differentiation only. “NA” indicates not applicable.
Appendix B. Details of Training
Appendix B.1. Neural Network Training Process
We generate game data through self-play. The Dotsformer network produces the policy, value, initiative, and classification outputs simultaneously. These outputs are combined with MCTS for decision-making and policy improvement, and the model is trained iteratively until it converges. The network has four outputs: the predicted policy distribution $\mathbf{p}$, the predicted value $v$, the predicted initiative, and the predicted classification. We therefore train it with five loss terms: one for each output plus an L2 regularization term on the model parameters. To balance contributions from old and new data, an experience decay factor $\gamma$ is introduced, which assigns lower weights to samples generated earlier.
In the loss formulas, $B$ denotes the batch size and $A$ represents the size of the action space. The index $b$ refers to a sample in the batch. $\boldsymbol{\pi}_b$ denotes the target policy distribution of the $b$-th sample, while $\mathbf{p}_b$ denotes the predicted policy distribution. $z_b$ represents the target state value and $v_b$ denotes the predicted state value. $u$ indexes all trainable parameters of the model. $p_{b,y_b}$ denotes the predicted probability that sample $b$ belongs to its ground-truth class $y_b$, and $\alpha_{y_b}$ is the class-balancing coefficient used to alleviate class imbalance. The focal modulating term, which shrinks as $p_{b,y_b}$ approaches 1, encourages the training process to focus more on hard samples. The weighted cross-entropy loss is computed over valid actions only.
The sample weight $w_b$ is computed according to Equation (A6), where $t$ represents the current training iteration and $t_b$ denotes the iteration at which sample $b$ was generated.
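The definitions above can be illustrated with a small NumPy sketch of three ingredients: a policy cross-entropy, a focal-style classification loss with class-balancing coefficients, and per-sample decay weights. The exact forms are assumptions rather than the paper's equations: we take Equation (A6) to be $w_b = \gamma^{\,t - t_b}$ with the experience decay factor $\gamma = 0.9$ from Table A1, use the standard focal-loss modulating factor $(1 - p_{b,y_b})^{\beta}$ with a hypothetical focusing exponent $\beta$, and average the weighted per-sample losses with a $1/B$ normalization.

```python
import numpy as np


def decay_weights(t: int, t_gen, gamma: float = 0.9) -> np.ndarray:
    """Assumed form of Eq. (A6): w_b = gamma ** (t - t_b)."""
    return gamma ** (t - np.asarray(t_gen, dtype=np.float64))


def policy_cross_entropy(target_pi: np.ndarray, pred_p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-sample cross-entropy between target and predicted policies, shape (B,)."""
    return -(target_pi * np.log(pred_p + eps)).sum(axis=1)


def focal_loss(probs: np.ndarray, labels: np.ndarray, alpha: np.ndarray, beta: float = 2.0) -> np.ndarray:
    """Per-sample focal loss; beta is a hypothetical focusing exponent."""
    p_true = probs[np.arange(len(labels)), labels]  # p_{b, y_b}
    return -alpha[labels] * (1.0 - p_true) ** beta * np.log(p_true)


def weighted_mean(per_sample: np.ndarray, weights: np.ndarray) -> float:
    """Batch average of decay-weighted per-sample losses (1/B normalization assumed)."""
    return float(np.mean(weights * per_sample))


w = decay_weights(t=10, t_gen=[10, 9, 8])  # the newest sample gets weight 1
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 0, 1])
cls = focal_loss(probs, labels, alpha=np.array([1.0, 1.0]))
print(weighted_mean(cls, w))
```

Well-classified samples (such as the second one, with $p_{b,y_b} = 0.9$) contribute almost nothing to the focal term, which is the intended emphasis on hard samples.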
Table A1.
Hyperparameter settings used during training.
| Category | Parameter | Value | Description |
|---|---|---|---|
| MCTS search | | 800 | Number of MCTS simulations per move |
| | | 0.4 | Temperature parameter of the policy distribution |
| | | 1.0 | Exploration constant used in PUCT |
| Rollback strategy | | 32 | Step threshold at which incremental training begins |
| | | 1 | Step interval for sample generation |
| | | 10 | Iteration interval for sample generation |
| Experience replay | | 100 | Threshold on node visit counts |
| | | 0.8 | Threshold on the average node value |
| | | | Minimum number of states stored in the replay buffer |
| | | 5 | Minimum number of games per iteration |
| | | 0.9 | Experience decay factor |
| Optimizer | | | Initial learning rate |
| | | | L2 regularization coefficient |
| | | 80 | Iteration threshold for halving the learning rate |
| Training | Epochs | 5 | Number of training epochs per iteration |
| | | 1024 | Training batch size |
| Loss weights | | 1.0 | Weight of the policy loss |
| | | | Weight of the value loss |
| | | | Weight of the regularization term |
| | | | Weight of the initiative loss |
| | | 10.0 | Weight of the classification loss |
Each of the five loss terms (policy, value, regularization, initiative, and classification) is scaled by its own weighting coefficient. These weights and the other training parameters are listed in Table A1.
Appendix B.2. Experimental Environment
Our experiments were conducted on a workstation equipped with an AMD Ryzen 9 9900X CPU (12 cores, 24 threads), an NVIDIA GeForce RTX 4080 SUPER GPU (16 GB memory), and 64 GB of system memory. Training was carried out on Windows 11 (Version 22H2) with CUDA acceleration enabled, and the model was implemented and optimized in PyTorch (Version 2.6.0, PyTorch Foundation, San Francisco, CA, USA).
Appendix B.3. Dynamic Adjustment of Search Steps
To gradually increase the model’s decision depth during training, we adopt a dynamic adjustment strategy for search steps, inspired by backward training. The AlphaZero baseline used later in this work is also trained with backward training. This strategy monitors the accuracy of the model’s Q value predictions at the root node and at selected search steps, and adjusts the starting threshold of search steps during self-play accordingly, as summarized in Table A2.
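When the number of iterations since the previous update (the current iteration $t$ minus the iteration of the previous update) reaches the minimum iteration interval, and the starting search step satisfies its activation condition, the dynamic search-step adjustment mechanism is activated. This mechanism decides whether to move to a deeper search earlier by evaluating the accuracy of the Q value predictions. Specifically, if the prediction accuracy at the root node (the number of correct root-node Q value predictions divided by the total number of root-node predictions) exceeds the root-node accuracy threshold, and the prediction accuracy at the search-step nodes (computed in the same way) exceeds the search-step accuracy threshold, the current model is considered to meet the requirements for further progression. In this case, if the step condition also holds, the starting step is reduced by the adjustment step size (that is, the new starting step equals the current starting step minus the step size), and the update is recorded as meeting the Q constraint and moving one step forward. Otherwise, the starting step remains unchanged.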
Table A2.
Parameter definitions for adjusting the starting depth of search during the search process.
| Symbol | Value | Description |
|---|---|---|
| Counters and statistics | | |
| | - | Total number of Q value predictions at the root node |
| | - | Number of correct Q value predictions at the root node |
| | - | Total number of Q value predictions at search step nodes |
| | - | Number of correct Q value predictions at search step nodes |
| Thresholds and hyperparameters | | |
| | 0.7 | Threshold for updating based on the root node Q value |
| | 0.7 | Accuracy threshold at the root node required to allow rollback |
| | 0.7 | Threshold for updating based on the search step node Q value |
| | 0.7 | Accuracy threshold at search step nodes required to allow rollback |
| | 4 | Length of the step interval used for selecting search step nodes |
| | 32 | Initial search step |
| | 1 | Adjustment step size |
| | 10 | Minimum iteration interval |
| t | - | Current iteration index |
| | - | Iteration of the previous update |
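The adjustment rule in Appendix B.3 can be sketched as a small Python class. Default values follow Table A2; the class name, the hypothetical lower bound `min_step` standing in for the unspecified step condition, and the use of strict `>` comparisons for "exceeds the threshold" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SearchStepScheduler:
    """Hypothetical sketch of the dynamic search-step adjustment (Appendix B.3)."""
    start_step: int = 32    # initial search step (Table A2)
    delta: int = 1          # adjustment step size
    min_gap: int = 10       # minimum iteration interval
    acc_root: float = 0.7   # accuracy threshold at the root node
    acc_step: float = 0.7   # accuracy threshold at search-step nodes
    min_step: int = 1       # hypothetical floor replacing the unspecified step condition
    last_update: int = 0    # iteration of the previous update

    def update(self, t: int, root_correct: int, root_total: int,
               step_correct: int, step_total: int) -> int:
        """Return the (possibly reduced) starting search step for iteration t."""
        if t - self.last_update < self.min_gap:
            return self.start_step  # too soon since the last update
        if self.start_step - self.delta < self.min_step:
            return self.start_step  # cannot move any earlier
        root_ok = root_total > 0 and root_correct / root_total > self.acc_root
        step_ok = step_total > 0 and step_correct / step_total > self.acc_step
        if root_ok and step_ok:  # Q constraint met: move one step forward
            self.start_step -= self.delta
            self.last_update = t
        return self.start_step


sched = SearchStepScheduler()
print(sched.update(10, 80, 100, 75, 100))  # 31
```

In this sketch `last_update` advances only when the starting step actually moves, so the minimum interval is counted from the last successful rollback.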
Appendix C. Win Rate Statistics Under Opening Randomization
To evaluate the stability of the model’s strategy in non-standard opening positions, we designed a comparative experiment. We randomize the first 10, 15, or 20 opening moves, and do not test beyond 20 moves because excessive randomness could push the game toward a near-deterministic outcome, making it difficult to assess the model’s actual capabilities.
Table A3.
Comparison of win rates between Dotsformer and the baseline under different opening randomization settings.
| Random Opening Steps | Guided Random (with Pruning) | Pure Random (No Strategy) |
|---|---|---|
| 10 steps | 85.6% [83.4%, 87.8%] | 81.7% [79.3%, 84.1%] |
| 15 steps | 87.6% [85.6%, 89.6%] | 84.7% [82.5%, 86.9%] |
| 20 steps | 86.8% [84.7%, 88.9%] | 81.4% [79.0%, 83.8%] |
In the experiment, we used two methods of opening generation: the first applied a pruning strategy to the random moves within the first 10, 15, or 20 moves, while the second played completely random moves over the same number of moves without any rules. For each setting, the model played 1000 games against the baseline, and we recorded the win rates, which are summarized in Table A3. The model achieves a win rate above 80% in every setting. This suggests that even under highly irregular openings, the model remains robust and can rely on its mid- to late-game decision-making. In addition, pruning-based openings lead to higher win rates than fully random ones in all three settings, indicating that simple heuristics can help avoid unfavorable early positions.
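The bracketed intervals in Table A3 are consistent with a normal-approximation (Wald) 95% interval for a proportion over the 1000 games per setting. The text does not name the method, so this is an assumption; the sketch below reproduces the first entry.

```python
import math


def wald_interval(win_rate: float, n_games: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a win rate."""
    half_width = z * math.sqrt(win_rate * (1.0 - win_rate) / n_games)
    return win_rate - half_width, win_rate + half_width


lo, hi = wald_interval(0.856, 1000)
print(f"[{lo:.1%}, {hi:.1%}]")  # [83.4%, 87.8%]
```

With 1000 games the half-width is about 2.2 to 2.4 percentage points, so the gap between guided and purely random openings (roughly 2.9 to 5.4 points) is of the same order as the interval widths.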
Appendix D. Attention Map Analysis and Additional Interpretability Results
To improve interpretability, we include visualizations of two representative attention heads in Figure A2, where the attention maps align clearly with chain and loop structures. Since these structures are more prominent in the endgame, we select two late-stage board states for illustration.
The first attention head focuses on a small number of key edges on the board. These edges usually correspond to critical decision points in the current position. The second attention head shows a more distributed pattern. It attends not only to nearby edges, but also to edges that are spatially distant yet structurally related. In particular, it assigns relatively high weights to edges that belong to chains and loops. This behavior suggests that the model is able to capture long-range structural dependencies.
Figure A2.
Visualization of two representative attention heads in endgame states.
Appendix E. Search Algorithm
In board games, classical search algorithms such as minimax and its optimized variant, alpha-beta pruning, are commonly used for move generation. Minimax identifies the optimal move by assuming that the player acts to maximize their advantage while the opponent acts to minimize it, recursively evaluating all possible positions in the game tree. However, its cost grows exponentially with search depth: for branching factor $b$ and depth $d$, minimax examines $O(b^d)$ positions, which limits its effectiveness in complex games. Alpha-beta pruning addresses this by discarding branches that cannot affect the final decision, which can reduce the time complexity to $O(b^{d/2})$ under optimal move ordering. This optimization is essential for deeper searches, enabling better decisions within practical time limits. Still, when the branching factor is high or the board states are complex, these methods face exponential growth in computation and struggle to make efficient decisions within a limited time.
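To make the contrast concrete, here is a minimal sketch of minimax with alpha-beta pruning over a toy game tree given as nested lists, where leaves are static evaluations from the maximizing player's point of view. The tree representation is an assumption for illustration; in a real engine, children would come from a move generator.

```python
import math


def alphabeta(node, maximizing: bool, alpha: float = -math.inf, beta: float = math.inf,
              visited=None) -> float:
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a number (leaf evaluation) or a list of child nodes.
    `visited`, if given, collects the leaves actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: the minimizer will avoid this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cut-off: the maximizer already has a better option
    return value


tree = [[3, 5], [2, 9], [0, 1]]  # a max node over three min nodes
seen = []
print(alphabeta(tree, True, visited=seen))  # 3
print(seen)                                 # [3, 5, 2, 0]
```

Plain minimax would evaluate all six leaves; here leaves 9 and 1 are pruned because each min node is refuted as soon as it yields a value no better than the 3 already secured.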
We employ MCTS to select the optimal action. By performing extensive simulations, the algorithm evaluates potential outcomes through four main stages:
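Selection: Starting from the root, recursively select the child node with the highest value according to the UCT formula until reaching a leaf node. The UCT formula is:
$$\mathrm{UCT}(a) = \frac{Q(a)}{N(a)} + c\sqrt{\frac{\ln N_{\mathrm{parent}}}{N(a)}}$$
where $Q(a)/N(a)$ is the average reward of node $a$, $N(a)$ is the visit count of node $a$, $N_{\mathrm{parent}}$ is the total visit count of its parent node (the root node at the first level of selection), and $c$ is a constant that balances exploration and exploitation.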
Expansion: If the leaf node is not fully expanded, generate one or more child nodes to explore new possible board states.
Simulation: From the newly generated node, perform multiple random games until the end of the game to sample potential outcomes of that position.
Backpropagation: Propagate the simulation results up the search path, updating the visit counts N and cumulative rewards Q of all ancestor nodes.
By iterating this process, MCTS ultimately selects the root action with the highest visit count as the optimal strategy.
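The four stages can be illustrated with a compact UCT search on a toy take-away game (players alternately remove 1 or 2 stones; whoever takes the last stone wins), which stands in for Dots-and-Boxes here. It uses random rollouts for the simulation stage as described above, not the network evaluations and PUCT rule listed in Table A1; the game, the class and function names, and the default constant `c = 1.4` are assumptions.

```python
import math
import random


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones    # stones left in this state
        self.player = player    # player to move here (0 or 1)
        self.parent = parent
        self.move = move        # move that led from parent to this node
        self.children = []
        self.untried = [m for m in (1, 2) if m <= stones]
        self.N = 0              # visit count
        self.Q = 0.0            # cumulative reward for the player who moved into this node


def uct(child, parent_visits, c):
    return child.Q / child.N + c * math.sqrt(math.log(parent_visits) / child.N)


def rollout(stones, player):
    """Random playout; returns the player who takes the last stone."""
    while True:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        if stones == 0:
            return player
        player = 1 - player


def mcts(stones, player, iterations=3000, c=1.4):
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by maximum UCT.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.N, c))
        # 2. Expansion: add one untried child, if any.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.stones - m, 1 - node.player, node, m)
            node.children.append(child)
            node = child
        # 3. Simulation: a terminal node is won by whoever moved into it.
        if node.stones == 0:
            winner = node.parent.player
        else:
            winner = rollout(node.stones, node.player)
        # 4. Backpropagation: update N and Q along the path to the root.
        while node is not None:
            node.N += 1
            if node.parent is not None and winner == node.parent.player:
                node.Q += 1.0
            node = node.parent
    # Final decision: the root action with the highest visit count.
    return max(root.children, key=lambda ch: ch.N).move


random.seed(0)
print(mcts(4, player=0))  # taking 1 leaves 3 stones, a lost position for the opponent
```

Each node stores rewards from the perspective of the player who moved into it, so the maximization in the selection stage is always made from the viewpoint of the player choosing the move.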
Figure 1.
The figure shows the chain and loop structures and edge counts returned by Chain-Loop Detection. Numbers indicate the number of returned edges, and the numbering order is random. Gray lines represent existing edges on the board, and red lines represent newly placed edges.
Figure 2.
Illustration of the initiative in chain and loop structures, showing the difference between relinquishing and retaining the initiative. Black arrows indicate possible next positions; gray lines represent existing edges on the board; red lines represent newly placed edges.
Figure 5.
Network Architecture for Initiative and Classification Prediction.
Figure 6.
A visual example of Classification Prediction in Dots-and-Boxes. It shows the feature decomposition for the action classification prediction task. For a specific board state, the model classifies all potential actions into four classes and maps them to four independent channels for representation.
Figure 7.
Schematic diagram of the overall Dotsformer network architecture. (a) Input: The transformation in the input channel. (b) Network: The overall network architecture. (c) TransformerBlock: Detailed architecture of the Transformer block. (d) ResidualBlock: Structure of the Residual block used for feature extraction. (e) Policy Head: Architecture of the policy head. (f) Value Head: Architecture of the value head.
Figure 8.
The framework of data generation and model training in Dotsformer.
Figure 9.
Final ELO Score Evolution of the ResidualBlock–TransformerBlock Hybrid Architecture. Colored shadows represent variance bands, and colored lines represent the corresponding mean values.
Figure 10.
Attention Heatmap Visualization in Representative Game States. Red lines represent the degree of attention for each edge.
Figure 11.
Training curves of different loss functions.
Figure 12.
Dynamic Adjustment of Search Steps in Dotsformer and AlphaZero. This figure illustrates the distributions of the ratios of correct to total Q value predictions at the root node and at search step nodes for both the Dotsformer and AlphaZero models from step 22 to step 32. The box plots show the median and the upper and lower quartiles, while the overlaid scatter points show the raw sampling distribution.
Figure 13.
Final ELO score evolution in Ablation Studies.
Figure 14.
Win Rate Comparison in Ablation Studies, where the dotted line represents the 50% win rate.
Figure 15.
Relative ELO progression of Dotsformer against AlphaZero across four random seeds.
Table 1.
Comparison of our proposed approach with existing methods.
| Model | State Encoding | Architecture | Search | Supervision | Output | Game |
|---|---|---|---|---|---|---|
| AlphaGo [1] | Grid | CNN | MCTS | SL + RL | PH, VH | Go |
| AlphaGo Zero [2] | Grid | ResNet | MCTS | RL | PH, VH | Go |
| AlphaZero [3] | Grid | ResNet | MCTS | RL | PH, VH | Chess, Go, Shogi |
| Go Transformer [37] | Sequence | Causal Trans. | None | SL | LM Head | Go |
| Othello-GPT [38] | Sequence | Causal Trans. | None | SL | LM Head | Othello |
| ChessGPT [39] | Sequence | Causal Trans. | None | SL | LM Head | Chess |
| EfficientFormer [40] | Grid | Vision Trans. | MCTS | SL | PH, VH | Go |
| Chessformer [41] | Sequence | Transformer | None | SL | Multi PHs/VHs * | Chess |
| Tjong [8] | Grid | TIT | None | SL + RL | Hierarchical PH, VH | Mahjong |
| ResTNet [42] | Grid | ResNet + Trans. | MCTS | RL/SL | PH, VH | Go, Hex |
| GraphDQN [43] | Graph | GNN | None | RL | Q-values | Hex |
| GraphAra [43] | Graph | GNN | MCTS | SL + RL | PH, VH | Hex |
| AlphaGateau [44] | Graph | GNN | MCTS | RL | PH, VH | Chess |
| Dotsformer (Ours) | Grid + ChainLoop | MS-Trans | MCTS | RL | PH, VH, Init., Class. | Dots-and-Boxes |
Table 2.
Summary of key notations.
| Notation | Description |
|---|---|
| X | Input tensor |
| N | Board length: 5 |
| C | Number of channels of the input tensor |
| D | Input channels, set to 128 |
| | Feature vector at a given coordinate |
| | Vector at a given coordinate in channel k, for dots or edges |
| m | Red box count − Blue box count |
| | Vector at a given coordinate in channel l, for the chain-loop representation |
| | States of the four edges (right, up, left, down) of a box |
| | 5-dimensional one-hot vector representing the number of occupied edges in a box |
| | 5-dimensional one-hot vectors for the number of occupied edges in adjacent boxes |
| | Feature embedding function |
| | Vector input to the neural network |
| L | Sequence length after mapping the original board to the network input |
| B | Batch size |
| h | Number of attention heads, set to 4 |
| | Feature dimension of a single attention head |
| I | 3-dimensional one-hot vector indicating the current initiative |
| | 4-way categorical vector classifying the move |
Table 3.
Comparison of Baseline and Dotsformer configurations.
| Feature | Baseline (AlphaZero) | Dotsformer (Proposed) |
|---|---|---|
| Backbone | 6 residual blocks | MS-Trans (6-layer Hybrid TR-structure) |
| Parameters | 1.42 M | 1.94 M |
| Policy Head | 2× Res + Fusion + BN + Softmax | Same |
| Value Head | Res + Fusion + BN + Linear + BN + Tanh | Same |
| Search Budget | 800 simulations/move | Same |
| Training Budget | 4500 self-play games | Same |
Table 4.
Effects of Different Board Representation Configurations on the Efficiency of Baseline, EdgeQuad, and ChainLoop Models.
| Model | FLOPs (G) | Params (M) | Latency (ms) |
|---|---|---|---|
| Baseline | 0.172 | 1.42 | 1.67 [0.14, 0.23] |
| EdgeQuad4 | 0.035 | 1.42 | 2.89 [0.26, 0.54] |
| EdgeQuad9 | 0.035 | 1.42 | 2.84 [0.18, 0.34] |
| EdgeQuad44 | 0.036 | 1.47 | 2.86 [0.31, 0.43] |
| ChainLoop64 | 0.181 | 1.50 | 1.67 [0.16, 0.22] |
| ChainLoop6 | 0.172 | 1.42 | 1.66 [0.14, 0.22] |
Table 5.
Training Iterations Required for Stepwise Rollback of the MCTS Starting Step in Feature Representation Model.
| Model | Step 32 | Step 31 | Step 30 | Step 29 | Step 28 | Step 27 | Step 26 | Step 25 | Step 24 |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | 49 | 59 | 69 | 79 | 90 | 100 | stop | stop | stop |
| EdgeQuad4 | 12 | stop | stop | stop | stop | stop | stop | stop | stop |
| EdgeQuad9 | 77 | 106 | stop | stop | stop | stop | stop | stop | stop |
| EdgeQuad44 | 74 | 93 | 126 | stop | stop | stop | stop | stop | stop |
| ChainLoop64 | 23 | 33 | 46 | 56 | 66 | 76 | 106 | stop | stop |
| ChainLoop6 | 21 | 32 | 42 | 53 | 64 | 74 | 109 | 119 | stop |
Table 6.
ELO Score in Feature Representation Model.
| Model | ELO | Win Rate | 95% Confidence Interval |
|---|---|---|---|
| Baseline | 1746 | 73.28% | [71.55%, 75.01%] |
| ChainLoop6 | 1855 | 83.00% | [81.53%, 84.47%] |
| ChainLoop64 | 1784 | 75.96% | [74.29%, 77.63%] |
| EdgeQuad44 | 1259 | 28.56% | [26.79%, 30.33%] |
| EdgeQuad9 | 1207 | 23.44% | [21.78%, 25.10%] |
| EdgeQuad4 | 1148 | 15.76% | [14.33%, 17.19%] |
Table 7.
Detailed Win Rate in Feature Representation Model.
| Model | Baseline | ChainLoop6 | ChainLoop64 | EdgeQuad44 | EdgeQuad9 | EdgeQuad4 |
|---|---|---|---|---|---|---|
| Baseline | NA | 38.80% | 46.40% | 89.60% | 95.60% | 96.00% |
| ChainLoop6 | 61.20% | NA | 63.40% | 94.40% | 97.20% | 98.80% |
| ChainLoop64 | 53.60% | 36.60% | NA | 95.20% | 96.20% | 98.20% |
| EdgeQuad44 | 10.40% | 5.60% | 4.80% | NA | 57.40% | 64.60% |
| EdgeQuad9 | 4.40% | 2.80% | 3.80% | 42.60% | NA | 63.60% |
| EdgeQuad4 | 4.00% | 1.20% | 1.80% | 35.40% | 36.40% | NA |
Table 8.
Training Iterations Required for Stepwise Rollback of the MCTS Starting Step in ResidualBlock–TransformerBlock Hybrid Architecture Model.
| Model | Step 32 | Step 31 | Step 30 | Step 29 | Step 28 | Step 27 | Step 26 | Step 25 | Step 24 |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | 49 | 59 | 69 | 79 | 90 | 100 | stop | stop | stop |
| RT | 16 | 26 | 37 | 47 | 57 | 89 | 106 | 129 | stop |
| TR | 17 | 27 | 37 | 50 | 63 | 84 | 97 | 107 | stop |
| TT | 41 | 55 | 65 | 82 | stop | stop | stop | stop | stop |
Table 9.
Training Iterations Required for Stepwise Rollback of the MCTS Starting Step in MultiScale-Topo Model.
| Model | Step 32 | Step 31 | Step 30 | Step 29 | Step 28 | Step 27 | Step 26 | Step 25 | Step 24 | Step 23 | Step 22 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear-NoTopo | 10 | 20 | 30 | 44 | 54 | 81 | 97 | 107 | 129 | stop | stop |
| MultiScale2-Topo | 10 | 20 | 30 | 41 | 51 | 61 | 74 | 106 | 116 | stop | stop |
| MultiScale4-Topo | 11 | 21 | 31 | 41 | 51 | 62 | 73 | 95 | 121 | 139 | stop |
Table 10.
Performance Evaluation of MultiScale4-Topo, MultiScale2-Topo, and Linear-NoTopo: ELO Rating and Win Rate.
| Model | ELO | Win Rate | 95% Confidence Interval |
|---|---|---|---|
| MultiScale4-Topo | 1618 | 71.67% | [68.0%, 75.3%] |
| MultiScale2-Topo | 1562 | 63.33% | [59.3%, 67.3%] |
| Linear-NoTopo | 1420 | 45.00% | [40.8%, 49.2%] |
Table 11.
Training Iterations Required for Stepwise Rollback of the MCTS Starting Step in Ablation Studies.
| Model | Step 32 | Step 31 | Step 30 | Step 29 | Step 28 | Step 27 | Step 26 | Step 25 | Step 24 | Step 23 | Step 22 | Step 21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 49 | 59 | 69 | 79 | 90 | 100 | stop | stop | stop | stop | stop | stop |
| No-MS-Trans | 24 | 34 | 44 | 54 | 64 | 74 | 89 | 125 | stop | stop | stop | stop |
| No-ChainLoop6 | 14 | 25 | 35 | 45 | 55 | 87 | 98 | 117 | 141 | stop | stop | stop |
| No-Auxiliary | 10 | 20 | 30 | 40 | 50 | 68 | 80 | 104 | 114 | stop | stop | stop |
| Full | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 85 | 109 | 126 | 152 | stop |
Table 12.
Performance Evaluation of Ablation Studies: ELO Rating and Win Rate.
| Model | ELO | Win Rate | 95% Confidence Interval |
|---|---|---|---|
| Full | 1659 | 72.85% | [71.47%, 74.23%] |
| No-Auxiliary | 1629 | 68.65% | [67.21%, 70.09%] |
| No-MS-Trans | 1477 | 47.17% | [45.63%, 48.71%] |
| No-ChainLoop6 | 1416 | 36.05% | [34.56%, 37.54%] |
| Baseline | 1319 | 25.25% | [23.90%, 26.60%] |
Table 13.
Computational cost analysis of different model configurations.
| Model | FLOPs (G) | Params (M) | Latency (ms) |
|---|---|---|---|
| Full | 0.235 | 1.94 | 3.97 [0.34, 0.50] |
| No-Auxiliary | 0.224 | 1.85 | 3.73 [0.26, 0.39] |
| No-MS-Trans | 0.183 | 1.52 | 2.03 [0.09, 0.14] |
| No-ChainLoop6 | 0.235 | 1.94 | 3.98 [0.25, 0.52] |
| Baseline | 0.172 | 1.42 | 1.73 [0.14, 0.23] |