A Rule-Based AI Method for an Agent Playing Big Two

Big Two is a popular multiplayer card game in Asia. This research proposes a new method, named rule-based AI, for an agent playing Big Two. The novel method derives rules based on the number of cards the AI agent has left and the control position. The rules for two to four cards left select the card combination to discard based on the number of cards remaining in the agent's hand. The rules for more than four cards left conditionally prioritize discarding card combinations classified with lower priority. A winning strategy provides guidelines that guarantee the AI agent will win when a win is achievable within three moves. We also design rules for the AI agent without control for holding cards and splitting cards. The experimental results show that our proposed AI agent can play Big Two well and outperform randomized AI, conventional AI, and human players, with winning rates of 89.60%, 73.00%, and 55.05%, respectively, while maximizing the winning score and minimizing the number of cards left when the chance of winning is low.


Introduction
Developing an agent (artificial player) for playing a strategy game (e.g., chess, mahjong, or Big Two) is a crucial task [1]. The players' decision-making skills can significantly determine the outcome. The experience of playing strategy games against a developed agent can help human players improve their decision-making skills, and human players can practice and play the game anytime they want. A game also becomes more exciting and popular when it offers smart opponents [2]. The first famous agent to beat a reigning world champion in a game was the chess program Deep Blue [3]. Since then, research on developing agents that can play against human players has gained a lot of traction [4,5]. Big Two is a multiplayer card game that is incredibly popular in China and other Asian countries. Although the game has many rule variations, it has one primary aim: each player holds 13 cards and tries to discard them all as soon as possible using valid card combinations. This study proposes an approach for developing an artificial intelligence (AI) agent to play Big Two [6][7][8].
Previous research has explained the framework of our web-based Big Two game [6]. Big Two requires four players, so the game needs AI agents that can join when there are not enough human players. Preliminary research on developing AI agents has commonly used rule-based AI [9]. AI agents have successfully been developed for games such as first-person shooters [10], Super Mario Bros. [11], Ms. Pac-Man [12][13][14], samurai [15], Battleship [16], the Nomic game [17], Domineering [18], poker [19], and Xiangqi [20]. One advantage of rule-based AI is its flexibility when encoding and later modifying the system [21]. Although some studies have developed sophisticated AI agents for games such as Air War [22], Mythic RPG [23], Chinese Chess [24], Sudoku [25], a fighting game [26], Star Trek [27], and the Hanabi card game [28], they still used rule-based AI as an essential part of their new methods. By sophisticated AI agents, we mean agents built with sophisticated methods (e.g., fuzzy rules, deep learning) that involve computationally complex tasks.

Basic Rules of Big Two
Big Two is a card game with exactly four players [6][7][8]. Each player is dealt 13 cards and plays them using valid combinations. The valid combinations in this game are one card (a single), two cards (a pair), and five cards. The rank of the cards is 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < J < Q < K < A < 2. If two cards have the same rank, the higher card is the one with the higher suit. The rank of suits is diamond (D) < club (C) < heart (H) < spade (S). The rank of five-card combinations is straight (five consecutive cards) < flush (five cards of the same suit) < full house (three cards of one rank + a pair) < four of a kind (four cards of one rank + a single card) < straight flush (five consecutive cards of the same suit).
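These rankings can be captured with a single integer score per card, so that all comparisons reduce to integer comparisons. The following sketch is our illustration, not the paper's code; names such as `card_score` are our own:

```python
# Illustrative encoding of Big Two's card ranking: ranks 3 < 4 < ... < 2 and
# suits D < C < H < S map each card to a unique integer from 1 (3 of
# diamonds) to 52 (2 of spades).
RANKS = ["3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A", "2"]
SUITS = ["D", "C", "H", "S"]  # diamond < club < heart < spade

def card_score(card):
    """Map a (rank, suit) pair such as ('3', 'D') to an integer in 1..52."""
    rank, suit = card
    return RANKS.index(rank) * 4 + SUITS.index(suit) + 1

def beats(card_a, card_b):
    """True if card_a outranks card_b when played as a single."""
    return card_score(card_a) > card_score(card_b)
```

This encoding is consistent with the 1-to-52 card range (3-diamond to 2-spade) that the paper itself uses later for scoring pairs.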
The player who has the three of diamonds (3D) starts the game. A player can be in one of two positions during the game: in control or not in control. The player who wins a round is in control of the next round of play and can lay down any valid card combination. The other players can only discard a higher card combination of the type played or choose to pass. The winner's score is the total number of cards the opponents have left. For example, if Player 1 wins and the three opponents have two, five, and nine cards left, the winner's score is 16. Each opponent receives a negative score equal to their own remaining number of cards; in the same example, the opponents would score −2, −5, and −9.
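As a minimal illustration of this scoring rule (the helper name is our assumption, not from the paper):

```python
def round_scores(cards_left):
    """Score one finished round of Big Two.

    cards_left lists each player's remaining card count, with 0 marking the
    winner. The winner scores the sum of the opponents' remaining cards;
    each opponent scores minus their own remaining count.
    """
    total = sum(cards_left)
    return [total if n == 0 else -n for n in cards_left]
```

For the example above, `round_scores([0, 2, 5, 9])` yields `[16, -2, -5, -9]`.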

Agent-Based Gaming
Agents in a game are non-player characters (NPCs) that behave according to programmed algorithms [37]. In this article, we use the term "AI agent" to refer to a computer player (e.g., Deep Blue, AlphaGo) that uses particular algorithms to determine the best strategy to compete with human players. AI agents have individually owned variables that describe their internal state, and they conduct certain computations or run instructions. In multiplayer games, the game developer usually provides AI agents as opponents for the human players if the game does not have enough human players.
The algorithm that developers commonly use to build game agents is the Min-Max algorithm [38], which assumes that the opponent always performs optimal actions. In Tic-Tac-Toe [39], the Min-Max algorithm is used to predict moves and win matches: it minimizes the opponent's chances of winning while maximizing the AI agent's. A technique that can optimize the Min-Max algorithm is Alpha-Beta Pruning [40]. Two extra parameters are passed to the Min-Max function: alpha and beta. At each node in the game tree, alpha stores the maximum score and beta stores the minimum score. When applied to a Min-Max tree, this technique prunes away branches that cannot affect the final decision, saving computation time. However, the Min-Max algorithm and Alpha-Beta Pruning are typically used in perfect information games, where both players have the same information about the game, such as chess, Tic-Tac-Toe, and Go. Computing moves can be time-consuming for games with high branching factors.
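For readers unfamiliar with the technique, a minimal Min-Max with Alpha-Beta pruning on a toy game tree might look like the following. This is an illustrative sketch, not part of the proposed method:

```python
def minimax(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Min-Max with Alpha-Beta pruning on a toy tree.

    A node is either a numeric leaf score or a list of child nodes.
    """
    if not isinstance(node, list):        # leaf: return its score
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)      # alpha: best score found for the maximizer
            if beta <= alpha:             # prune: the minimizer will avoid this branch
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)            # beta: best score found for the minimizer
        if beta <= alpha:
            break
    return best
```

On the tree `[[3, 5], [2, 9]]` with the maximizer to move, the search returns 3: the minimizer would answer 3 in the left branch and 2 in the right, so the left branch is preferred.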
Rule-based AI [9] is the most widely used method for agent-based gaming, with applications in first-person shooters [10], Super Mario Bros. [11], Ms. Pac-Man [12][13][14], samurai [15], Battleship [16], the Nomic game [17], Domineering [18], poker [19], and Xiangqi [20]. In its simplest form, rule-based AI consists of a set of if-then-style rules used to make actionable decisions. The rules, primarily conditional statements, are stored in unordered clusters and matched against existing facts. The collection of knowledge and facts is a dynamic set, constantly being adjusted, that describes what is known about the current state of the game. In rule-based AI, the various conditions are carefully designed to achieve better performance. Rule-based AI is relatively easy to program and manage because the knowledge encoded in the rules is modular. This advantage gives some flexibility both when coding the game and when modifying it at a later time.

Related Work
Previous research has shown that our web-based Big Two is of good quality and able to run smoothly [6]. A critical issue concerning this game is the number of players: Big Two requires precisely four players in one room. If there are not enough players, the player must wait in the game lobby, so the game needs a lightweight AI agent that can handle this situation; if there are not enough players at the specified time, the AI agent plays against the human players. Several previous studies have succeeded in creating agents for playing first-person shooters [10], Super Mario Bros. [11], Ms. Pac-Man [12][13][14], samurai [15], Battleship [16], the Nomic game [17], Domineering [18], and Xiangqi [20] using rule-based AI. They chose rule-based AI because it is easy to implement and requires fewer computational resources. Rule-based AI uses a set of if-then statements to select actions based on conditions in the game [9]. In game AI, intelligence is built through adequate rules with effective execution. However, all of these studies were based on games with perfect information, in which all information about the actions and state of the game is available; the computer can therefore analyze all possible actions and strategies of the opponent [41]. Big Two is a multiplayer card game with imperfect information: the computer has incomplete information about the actions and state of the game [42,43]. The opponents' hands are hidden, so the computer has difficulty analyzing their possible actions and strategies.
Developing an AI agent for playing multiplayer games with imperfect information is challenging because such games involve multiplayer interactions, imperfect information, and computationally complex tasks. Rule-based AI has been used successfully to develop an AI agent for a type of poker called No-Limit Texas Hold'em [19]. During each turn, the agent follows three steps sequentially: (1) reading all the information from the poker table; (2) searching for the rules most suitable for the table circumstances; and (3) choosing the rule to follow. The AI agent successfully inferred strategies from the human player in one-on-one matches, but there were no consistent results in multiplayer games against humans.
We have successfully developed a rule-based AI agent for playing Taiwanese mahjong, and the AI is currently being used in the industry [5]. The proposed mahjong AI is easy to adjust for reducing players' waiting time and improving players' gaming experience. However, the main difference between mahjong and Big Two lies in the acceptable combinations. In mahjong, a combination can be a chow (three consecutive tiles of the same suit), a pong (three identical tiles), or a kong (four identical tiles). Big Two is more complex: a combination can be a five-card hand (straight, flush, full house, four of a kind, or straight flush), a pair, or a single. Moreover, although mahjong uses four copies of each tile in three suits (characters, dots, and bamboos) numbered from one to nine, there is no ranking (e.g., a kong of the nine of dots and a kong of the six of bamboos have the same value). In contrast, Big Two is very specific about ranking and card value. The rank of five-card combinations is straight < flush < full house < four of a kind < straight flush. The rank of the cards is 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < J < Q < K < A < 2. The rank of suits is diamonds < clubs < hearts < spades. The ranking system and the number of possible combinations make Big Two more computationally complex than mahjong.
The Big Two game is rarely studied, and thorough research is lacking in the literature. Although there are several servers where one can play Big Two online, servers featuring AI agents that play against humans are scarce, and the AI algorithms themselves are not open to the public. Some experts have tried to formalize how to play Big Two, describing a systematic approach [7] and some probability calculations [8]. The systematic approach is used to decide which play will maximize the chances of winning; however, it is narrative in form and unstructured. In this paper, we transform these strategies into a computational approach that serves as the conventional AI and the baseline player in the second experiment. Table 1 illustrates the complexity of Big Two depending on the number of cards left in hand. The formula for calculating tree size is B^D, where B is the total number of possible moves from all actions (i.e., five-card, pair, single, and pass) and D is the average length of a Big Two game. We observed 250 games and found that the average length of Big Two is 37.3725 turns. Big Two has a complicated action space with up to 1379 possible actions and a tree size of up to 2.15 × 10^117. Applying sophisticated methods is not suitable for game developers with limited resources due to computational resource usage. Rule-based AI therefore appears to be the right method for developing a lightweight AI agent that can play Big Two.
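The tree-size figure can be sanity-checked with a few lines of arithmetic, taking B = 1379 possible actions and D = 37.3725 average turns as reported above:

```python
import math

# Sanity check of the tree-size estimate B^D from Table 1, assuming
# B = 1379 possible actions and D = 37.3725 average turns as reported.
B, D = 1379, 37.3725
log10_size = D * math.log10(B)            # log10 of B**D, about 117.33
exponent = math.floor(log10_size)         # order of magnitude
mantissa = 10 ** (log10_size - exponent)  # leading digits of the estimate
```

Running this gives an exponent of 117 and a mantissa close to 2.15, matching the 2.15 × 10^117 figure in the text.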

Proposed Method
This section specifies the proposed method to develop an agent for playing Big Two. We used six primary rules with 29 sub-rules for the AI agent to play against human players. The six primary rules are the rules for two, three, and four cards left; a winning strategy for when a win is achievable within three moves; the rules for more than four cards left; and the rules for the agent when not in the control position. We have played thousands of games with the proposed AI to make sure that there are no conflicting rules. We managed all rules and defined their priority using two features: the number of cards left in the agent's hand and the agent's control position (control or no control). Figure 1 shows a flowchart of the proposed rule-based AI. If one combination (e.g., five-card, pair, single) is left, the AI agent automatically discards it and immediately wins. If more than one combination remains, the proposed rules are applied to output an action that the AI agent executes during the game. The first process is card classification, which classifies all cards in the agent's hand based on a list of cards that have been played (field history).
The agent can be in one of two positions during the game: in control or not in control. A player who has control has the opportunity to choose the type of card combination to play and can discard a low-card combination. Other players can only discard a higher card combination of the type played or choose to pass. Based on our observations, we found that the critical moment in the game is when the agent has control and fewer than five cards left in hand: the agent must be able to finish the game and anticipate the opponents' last card combinations. If an opponent has one card left, the agent discards a pair; if an opponent has two cards left, the agent discards a single card. The rules for situations in which two, three, or four cards are left are used by the agent to select a card combination to discard. When the agent has control with four or more cards, a win is sometimes achievable within three moves. For example, the agent could have 11 cards consisting of a straight, a full house, and a single card. In this condition, the agent uses the winning strategy. Under normal conditions, when in a control position, the agent uses the rules for more than four cards left. Additionally, we propose rules for the AI agent when it is not in the control position.

Card Classification
The first step in classifying a card is to generate all possible card combinations (single, pair, and five-card) of the opponents using Algorithm 1. The opponent's cards include all cards that are not in the agent's hand and have not yet been played. Note that we use pair and five-card functions to generate all pair and five-card combinations.
We classify card combinations that are in the agent's hand into four classes. A card combination which is of a higher rank than all of the opponent's card combinations will be classified in class A. Discarding a card combination of class A, the agent takes control while playing the game. By contrast, a card combination which is of a lower rank than all the opponent's card combinations will be classified in class D. The combinations of class D are impossible to play without control. The card combinations of class B have a high chance of being discarded and could serve as controls. Based on our observations, we classify the card combinations in class B if the rank of the combination is at least in the top 30% (approximately Queen) of the highest-ranking among the opponent's card combinations. Then, class C contains card combinations that are between class B and class D.
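A possible sketch of this four-class scheme, applying the observed 30% class-B cut-off over the spread of opponent combination strengths, is shown below. The function and parameter names are our assumptions, not the paper's code:

```python
def classify(combo_score, opponent_scores, top_fraction=0.30):
    """Classify one agent combination into class A, B, C, or D.

    combo_score: strength of an agent combination on a numeric scale;
    opponent_scores: strengths of all same-type opponent combinations;
    top_fraction: class-B cut-off (the paper observes roughly the top 30%).
    """
    if not opponent_scores or combo_score > max(opponent_scores):
        return "A"      # beats everything the opponents could play
    if combo_score < min(opponent_scores):
        return "D"      # cannot beat anything; unplayable without control
    spread = max(opponent_scores) - min(opponent_scores)
    threshold = max(opponent_scores) - top_fraction * spread
    return "B" if combo_score >= threshold else "C"
```

For example, on a 1-52 single-card scale with opponents holding scores 1-48, a combination of score 52 is class A, score 40 falls in class B, and score 20 in class C.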

Algorithm 1. Generate all opponent's card combinations
Input: a list of cards in the agent's hand Ca, a list of cards that have been played (field history) Cp.
1: opponent cards Co ← all cards ∉ (Ca ∪ Cp)
2: single combinations COs ← {x | for all x in Co}
3: pair combinations COp ← getPair(Co)
4: five-card combinations COfc ← getFivecard(Co)
5: CO = COs ∪ COp ∪ COfc
Output: all opponent's card combinations CO.

The getFivecard function generates all five-card combinations from Co:
straight ← getStraight(Co) //a function to get five consecutive cards from Co
flush ← getFlush(Co) //a function to get five cards of the same suit from Co
full-house ← getFullHouse(Co) //a function to get three cards of one rank + a pair from Co
four-of-a-kind ← getFourOfAKind(Co) //a function to get four cards of one rank + a single from Co
straight-flush ← getStraightFlush(Co) //a function to get five consecutive cards of the same suit from Co
return five-card {straight ∪ flush ∪ full-house ∪ four-of-a-kind ∪ straight-flush}

An example of the agent's hand is shown in Figure 2. Another example uses the same agent's hand but a different field history.

Rules for Two Cards Left
When the agent has control and only two cards remain in hand, the agent uses the rules listed in Algorithm 2, which takes three parameters as inputs. The first parameter is a list of the cards currently in the agent's hand, Ca; the second is the card classification results (only class A is needed here); and the third is the length (number of cards left) of each opponent's hand.

Algorithm 2. Rules for two cards left
Input: a list of cards in the agent's hand Ca, the card classification results (class-A), and the length of each opponent's hand (opponent 1, opponent 2, opponent 3).
1: pair ← getPair(Ca)
2: if pair exists: //two cards left is a pair
3:   return pair
4: else if class-A exists or length (opponent's card) == 1:
5:   return Ca[1] //the highest-ranking single card (index 1)
6: else:
7:   return Ca[0] //the lowest-ranking single card (index 0)
Output: a selected card combination that the agent will discard.

The getPair function generates any possible combination of pairs from the cards in the agent's hand. Lines 2-7 select a card combination to discard based on the following conditions: (1) If a pair exists, the agent discards the pair and wins the game; (2) If any single card of class A is in the agent's hand, or if an opponent has only one card left, the selected card is the highest-ranking single card (card index 1). This move ensures that the agent wins on the next turn, because the agent gains control and can then discard the last card; (3) If there is no single card in class A, the selected card is the lowest single card (card index 0).
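Under an assumed card representation of (rank index, suit index) tuples, Algorithm 2 could be rendered in Python roughly as follows. This is a sketch, not the authors' implementation:

```python
def rule_two_cards(hand, class_a_singles, opponent_lengths):
    """Sketch of Algorithm 2 (two cards left, agent in control).

    Cards are (rank index, suit index) tuples, with rank 0 = card 3 and
    rank 12 = card 2; class_a_singles lists the agent's class-A singles;
    opponent_lengths holds cards left per opponent. The representation is
    our assumption for illustration.
    """
    hand = sorted(hand)                    # ascending by (rank, suit)
    if hand[0][0] == hand[1][0]:           # the two cards form a pair
        return hand                        # discard the pair and win
    if class_a_singles or min(opponent_lengths) == 1:
        return [hand[1]]                   # highest single: regain control, win next turn
    return [hand[0]]                       # otherwise shed the lowest single
```

For instance, holding the singles 5D and 2S with 2S in class A, the rule discards 2S to take control and win on the following turn.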

Rules for Three Cards Left
The rules for three cards left (Algorithm 3) require all card classification results (class A, class B, class C, and class D) as input parameters. When the agent has a pair, the best combination to discard may not be a pair because the agent still has a single left in hand. Lines 4-12 are used to select the combination to discard based on the following conditions: (1) If a pair is in class A, then the selected combination is a pair. The agent still has control and will discard the last single card to win; (2) If a single card is in class A, then the agent discards the single card and will then discard a pair to win the game; (3) If an opponent has one card left, then the selected combination is a pair. This selection is a strategy to minimize losses; (4) However, if an opponent has two cards left, then the agent discards a single card; (5) If none of these conditions are fulfilled, then the agent discards a combination based on the lower class of card classification results.
If the agent does not have a pair, then the three cards left are singles. Lines 14-16 are used to select the single to discard based on the following conditions: (1) If the highest single card (card index 2) is in class A, the agent discards the middle single (card index 1) and will then discard the highest single on the next turn to gain control. The agent can then discard the lowest single to win the game; (2) If an opponent has one card left, then the agent discards the highest single to minimize losses; (3) If both of these conditions are not fulfilled, then the lowest single (card index 0) should be selected.

Rules for Four Cards Left
The rules for four cards left (Algorithm 4) are an expansion of the previous rules (Algorithm 3) and have the same input parameters. The agent may have two pairs, one pair and two single cards, or four single cards. If the agent has two pairs, Lines 2-7 are used to select a pair to discard based on the following conditions: (1) If the higher pair (pair index 1) is in class A, then the agent discards the higher pair and has control. The agent will then discard the last pair and win the game; (2) If an opponent has two cards left, the agent discards the higher pair to minimize losses; (3) If both of these conditions are not fulfilled, then the lower pair (pair index 0) is discarded.

Algorithm 3. Rules for three cards left
Input: a list of cards in the agent's hand Ca, the card classification results (class-A, class-B, class-C, class-D), and the length of each opponent's hand (opponent 1, opponent 2, opponent 3).
1: pair ← getPair(Ca)
2: if pair exists: //the agent has a pair
3: else: return Ca[0]
Output: a selected card combination that the agent will discard.

Algorithm 4. Rules for four cards left
Input: a list of cards in the agent's hand Ca, the card classification results (class-A, class-B, class-C, class-D), and the length of each opponent's hand (opponent 1, opponent 2, opponent 3).

When the agent has one pair, checking the two single cards is essential. Lines 7-18 are used to select a combination to discard based on the following conditions: (1) If the higher single is in class A, then the agent discards the lower single (single index 0). The agent will have three cards left and can then play using Algorithm 3; the agent gains control after discarding the highest single card and can then discard the last pair to win; (2) If an opponent has one card left, then the agent discards the pair; (3) If an opponent has two cards left, then the agent discards a single; (4) If the pair is in class A, then the lower single should be selected, since discarding the pair first is useless while the agent still holds two singles; (5) If none of these conditions are fulfilled, then the agent discards a combination based on the lowest class in the card classification results.
If the agent does not have a pair, then the four remaining cards are singles. Lines 20-22 provide guidelines to select a single card based on the following conditions: (1) If any single card is in class A, then the agent discards the card index 1; (2) If an opponent has one card left, then the agent discards the highest single (card index 3); (3) If both of these conditions are not fulfilled, the lowest single (card index 0) is selected.

Rules for More Than Four Cards Left
When the agent has control and has more than four cards in hand, the agent uses the rules for more than four cards left. These rules determine all possible moves of the agent (single, pair, and five-card). These rules (Algorithm 5) prioritize discarding the card combination in the lower class. The agent must discard a card combination of class D and class C as soon as possible; otherwise, these card combinations are difficult to play without control.  There are exceptions to this idea: (1) If an opponent has one card left, then the agent discards a five-card or a pair. The priority order for discarding is as follows: five-card, pair, and then single; (2) If the agent does not have a five-card but has many pairs instead, then the agent discards a pair. If these two exceptions are not fulfilled, then the agent discards a combination based on the lower class of card classification results.
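The class-priority idea and the one-card-left exception could be sketched as follows (the dictionary layout and names are our assumptions, not the paper's data structures):

```python
CLASS_PRIORITY = ["D", "C", "B", "A"]  # shed hard-to-play classes first

def choose_by_class(combos_by_class, opponent_lengths):
    """Sketch of the more-than-four-cards rule.

    combos_by_class maps a class letter to the agent's combinations of that
    class (each a list of cards); opponent_lengths holds the number of cards
    left per opponent.
    """
    if min(opponent_lengths) == 1:
        # Exception: an opponent is one card from winning, so avoid singles;
        # prefer a five-card combination, then a pair.
        for size in (5, 2):
            for cls in CLASS_PRIORITY:
                for combo in combos_by_class.get(cls, []):
                    if len(combo) == size:
                        return combo
    # Default: discard from the lowest (hardest-to-play) class available.
    for cls in CLASS_PRIORITY:
        if combos_by_class.get(cls):
            return combos_by_class[cls][0]
    return None
```

Under normal conditions the agent sheds class-D and class-C combinations first; when an opponent is down to one card, singles are skipped in favor of five-card hands and pairs.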

Winning Strategy
When the agent has control with more than four cards, we generate all possible moves and count how many moves are needed to win. For example, the agent may have 11 cards consisting of a straight, a full house, and a single card; the number of winning moves (WM) for this hand is three. The winning strategy (Algorithm 6) applies only when the agent has more than four cards and fewer than four moves to end the game. If the number of winning moves is at most two, it is essential to play a card combination of class A first and win immediately. If no class-A combination exists, the order for discarding is a five-card combination, then a pair, then a single card. This strategy not only minimizes risk but also maximizes the winning score. We apply the same strategy when the number of winning moves equals three, which guarantees a win if the agent holds two card combinations of class A: the agent discards a class-A combination twice and plays the last combination to win.
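The discard ordering of the winning strategy might be sketched like this (the combination and class representations are assumptions for illustration):

```python
def winning_strategy(combos, class_a):
    """Sketch of the winning-strategy discard order (Algorithm 6).

    combos: the agent's remaining combinations, each a plain list of cards;
    class_a: the subset of combos classified in class A. Play a class-A
    combination first if one exists; otherwise prefer a five-card
    combination, then a pair, then a single.
    """
    class_a_set = {tuple(c) for c in class_a}
    for combo in combos:
        if tuple(combo) in class_a_set:
            return combo                    # keep or seize control immediately
    for size in (5, 2, 1):                  # five-card, pair, single
        for combo in combos:
            if len(combo) == size:
                return combo
    return None
```

With a single, a pair, and a five-card combination left and the pair in class A, the sketch plays the pair first; with no class-A combination it leads with the five-card hand.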

Rules for the Agent, If Not in Control
If they are not in control, the agent can only discard a valid combination according to the type of card combination which is currently being played by the opponent. The agent cannot independently discard a card combination. For example, if the type of card combination played currently is a pair, the agent can only discard a higher pair. We will generate all valid combinations and always update them at every turn. The basic idea of these rules is to discard a card combination of lower-class classification. The critical action when the agent does not have control is to decide when to hold back a selected combination and when to split a card combination.
We designed the rules for the AI agent when not in control (Algorithm 7) around two essential functions: the hold-back function (Algorithm 8) and the split-card function (Algorithm 9). The agent uses the hold-back function to check whether the selected combination is worth holding; if the agent decides to hold back, it passes its turn. The split-card function splits a five-card combination or a pair to minimize loss when an opponent has only one card remaining to win. The hold-back function (Algorithm 8) handles three possible combination types based on the type of combination just played: a single card (lines 1-6), a pair (lines 8-12), and a five-card combination (lines 13-19). When the selected combination is a single card, lines 1-6 determine whether to hold based on the following conditions: (1) if the agent's hand contains at most two cards, the agent discards the selected card; (2) if an opponent has fewer than three cards left, the agent discards the selected card; (3) if class A has fewer card combinations than all other classes combined, or if all opponents have at least six cards, the highest single card (card index −1) is held to retain value in the agent's hand. In most cases, discarding the highest single during the early game, especially a non-class-A card, serves no purpose, because opponents can easily discard a higher single.
When the selected combination is a pair, lines 7-11 determine whether a pair should be held based on the following conditions: (1) if the agent's hand has at most three cards left, the agent discards the selected pair; (2) if the selected pair is a 2-pair, the agent holds it as long as all opponents have more than two cards left. The agent can split a 2-pair to beat two singles and take control twice. Typically, a 2-pair is classified in class A, but some conditions can cause a 2-pair to be classified in class B: if the 2-spade and one of the 2-diamond/2-club/2-heart are in the opponents' hands, the agent's 2-pair will be classified in class B. In this case, we can identify a 2-pair more accurately using a score. We use a card range of 1 to 52 (3-diamond to 2-spade) and take the score of the highest card in a pair. For example, a 3-pair [3D, 3S] has a score of 4; accordingly, a 2-pair has a score of at least 50. The crucial moment for five-card combinations is when the agent has two of them (e.g., a straight and a full house), which requires the agent to be aware of opponent passes at the beginning of a game. For instance, if the first opponent discards a low full house and the second and third opponents pass, the agent must determine whether discarding a full house is the right choice. The second or third opponent may hold a higher-ranked straight or flush, which would cause the agent to lose control after discarding a straight, leaving it without a full house. Such a situation usually occurs during the early period of the game (before turn 4).
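The 1-52 scoring trick for recognizing a 2-pair can be written directly from the numbers given above:

```python
def pair_score(pair):
    """Score a pair by its highest card on the 1-52 scale described above
    (3-diamond = 1, ..., 2-spade = 52); cards here are plain integers."""
    return max(pair)

def is_two_pair(pair):
    """On this scale, a pair of 2s always scores at least 50."""
    return pair_score(pair) >= 50
```

A 3-pair [3D, 3S] maps to [1, 4] and scores 4, while any pair of 2s, such as [2D, 2S] = [49, 52], scores at least 50.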
The function for splitting a five-card combination and a pair (Algorithm 9) is a closing strategy for minimizing losses. This function will be invoked when an opponent has one card remaining to win. If the type of combination played now is a pair, a pair can only be obtained by splitting a four-of-a-kind or a full house. In such a situation, the agent will discard a higher pair than a currently played pair. If a pair in hand does not meet that requirement, the agent will pass.

Algorithm 9. Split-card Function
Input: type of card combination played now COf, all possible moves of the agent according to the type of combination played now COpm, and all possible moves of the agent (pair and five-card).

If a single card is played, the agent can split a pair or any five-card combination to take control. The agent will prioritize splitting a pair, with the intent to play a five-card combination after taking control. However, if no single card can be played and a pair cannot be split, the agent will split a five-card combination to take control and play another pair or a single card before losing.

Additional Functions for Five-Card Combinations
We designed two additional functions for selecting five-card combinations. The bi-five-card function (Algorithm 10) checks whether the agent has two five-card combinations. This algorithm runs after a first five-card combination is found. If a second five-card combination exists, it is listed with the first combination by strength (e.g., a flush and a straight). The agent will prioritize discarding these two five-card combinations. The chance of winning increases because the agent can discard ten cards quickly, leaving few cards in hand.

Algorithm 10. Bi-five-card Function
Input: a list of cards in the agent's hand C_a, type of combination played now CO_f.
Output: double five-card combinations.
1: five-card combinations CO_fc ← getFivecard(C_a)
2: for all five-card in CO_fc:
3:     second five-card ← getFivecard(C_a \ five-card)
4:     if second five-card exists:
5:         add [five-card, second five-card] to CO_fc
6: return CO_fc
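The pairing step of Algorithm 10 can be sketched in Python, assuming the five-card combinations have already been enumerated from the hand (the paper's getFivecard); the function name and list-of-card-tuples encoding are ours:

```python
def bi_five_card(five_card_combos):
    """Pair up card-disjoint five-card combinations (Algorithm 10's idea).

    `five_card_combos` is the list of five-card combinations already found
    in the agent's hand, each a list of cards; a second combination counts
    only if it shares no card with the first."""
    doubles = []
    for i, first in enumerate(five_card_combos):
        used = set(first)
        for second in five_card_combos[i + 1:]:
            if used.isdisjoint(second):
                doubles.append((first, second))
    return doubles
```

Two combinations that reuse a card (e.g., a straight and a flush built around the same card) are correctly rejected, because discarding one would destroy the other.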
The best-five-card function (Algorithm 11) simulates and determines the best five-card combination to discard. The best combination should contain cards that are otherwise difficult to discard, and the cards remaining in hand should have the maximum value compared with the other possible five-card discards.
The first-hand contains the cards that remain after the first candidate five-card combination is removed from the hand; the second-hand contains the cards that remain after another candidate five-card combination is removed. This function calculates the remaining card scores from the first-hand (lines 4-11) and the second-hand (lines 12-19). Lines 20-30 select the best five-card combination for discard based on the following conditions: (1) the highest value of cards left; (2) the fewest number of card combinations left; and (3) the highest rank of the selected five-card combination. This function repeats the process for all five-card combinations until the best possible five-card discard is determined.
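The three selection criteria above amount to a lexicographic comparison, which can be sketched as follows; the dictionary field names are our assumptions, since the paper only states the criteria:

```python
def best_five_card(candidates):
    """Pick the best five-card discard by the criteria of Algorithm 11.

    Each candidate describes the simulated outcome of discarding one
    five-card combination:
      - "left_value":  total score of the cards that would remain in hand
      - "left_combos": number of combinations the remaining cards form
      - "rank":        strength of the discarded combination itself
    Ties on earlier criteria fall through to the later ones."""
    return max(
        candidates,
        key=lambda c: (c["left_value"], -c["left_combos"], c["rank"]),
    )
```

Negating "left_combos" inside the key makes `max` prefer the candidate with the fewest remaining combinations, matching criterion (2).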

Experimental Results and Discussion
We performed three experiments to evaluate the performance of our rule-based AI agent. Big Two is a four-player card game; therefore, in each experiment, two rule-based AI agents played against two opponents, and play log data were recorded. The data presented in this study are available online in our repository (see Supplementary Materials). For the first and second experiments, two types of AI opponents were built: randomized AI and conventional AI, which, like the rule-based agents, were implemented in Python. Note that we generated all possible moves of an agent (single, pair, and five-card) in each turn. Randomized AI randomly chooses a card combination to play from the possible moves. The first experiment was the baseline for verifying whether rule-based AI showed any intelligence in playing Big Two.
Conventional AI played the game according to the systematic approach [7] and probability calculations [8] that some experts have tried when formalizing how to play Big Two. The main idea of this AI is to play the lowest possible card combination to maximize the chances of winning. We generated all possible card combinations (single, pair, and five-card) of the agent. The value of each possible combination can be determined based on the probability of opponents having a single, pair, or five-card of a certain rank. In each turn, the agent selects the lowest possible card combination. If the agent does not have a valid combination to play, then it passes its turn. The second experiment was designed to evaluate whether rule-based AI outperforms these existing approaches.
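The conventional baseline's core rule, play the lowest valid combination or pass, can be sketched as follows; `combo_score` stands in for the probability-based valuation described above, and the function name is ours:

```python
def conventional_move(possible_moves, combo_score):
    """Conventional-AI baseline: play the lowest valid combination, else pass.

    `possible_moves` holds the combinations that legally beat the one on the
    field; `combo_score` maps a combination to its strength. Returns None
    for a pass. A sketch of the baseline's rule, not the paper's code."""
    if not possible_moves:
        return None  # no valid combination: skip the turn
    return min(possible_moves, key=combo_score)
```

Greedily keeping strong combinations for later is exactly what makes this baseline beatable by rules that also consider control and splitting, as the second experiment shows.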
For the third experiment, our rule-based AI was connected to our web-based Big Two [6] and played against human players. Figure 3 shows a screenshot of our web-based Big Two game (freely available to try using Google Chrome at http://web.bigtwo.gamelab.com.tw/, accessed on 28 December 2020). The green number (19) in Figure 3 is the countdown timer. If the active player does not discard a valid combination within the specified time (20 s), then it passes its turn. We selected 50 graduate students and ensured that there was no cheating between human players. Every two randomly chosen participants played against two rule-based AI agents in different game rooms. Before starting the experiment, each player played more than 20 games to become familiar with the game. Then, each player played 80 games against rule-based AI agents via our web-based Big Two.
We stated the following hypotheses:

Hypothesis 1 (H1). Rule-based AI significantly outperforms randomized AI.

Hypothesis 2 (H2). Rule-based AI significantly outperforms conventional AI.

Hypothesis 3 (H3). Rule-based AI significantly outperforms human players.
We used a Wilcoxon signed-rank test [44,45] to measure the effect size (r) and check whether each hypothesis was accepted and statistically significant using p-values. The hypothesis was accepted and statistically significant if the p-value < 0.01; otherwise, the hypothesis was rejected. Furthermore, we used Pearson's r to interpret the effect size [46] as small (0.10 < r < 0.30), medium (0.30 < r < 0.50), and large (r > 0.50).

Figure 4 shows a histogram of rule-based AI's win-loss records after playing 2000 games in each experiment against various opponents. The number above the blue bar indicates the number of wins, and the number above the orange bar indicates the number of losses. In the first experiment against randomized AI, the rule-based AI won 1792 games and lost 208 games. In the second experiment against conventional AI, the rule-based AI won 1460 games and lost 540 games.
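The significance test and effect size described above can be sketched in pure Python using the large-sample normal approximation; this is a minimal sketch for illustration, and in practice a library routine such as scipy.stats.wilcoxon would be used:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test with a normal approximation (large N).

    Returns (z, two-sided p-value, effect size r = |z| / sqrt(N))."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # Sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p, abs(z) / math.sqrt(n)
```

With paired per-game scores for the rule-based AI and an opponent, a p-value below 0.01 accepts the corresponding hypothesis, and r is interpreted on Pearson's scale as above.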
In the third experiment against human players, the rule-based AI won 1101 games and lost 899 games. Overall, rule-based AI played 6000 games, winning 4353 games and losing 1647 games. Rule-based AI demonstrated better performance than all of its different opponents.

Figure 5 shows that rule-based AI's performance was stable during the experiments, especially after 250 games. The x-axis denotes the total number of games played, and the y-axis denotes the winning percentages of the AI and the opponent. The winning percentage is defined as wins divided by the total number of games played. In the first experiment, the winning percentage of rule-based AI was 89.60%, and that of the randomized AI was 10.40%. In the second experiment, the winning percentage of rule-based AI was 73.00%, compared to 27.00% for the conventional AI. Figure 5c shows the performance curves for conventional AI and human players. It was observed that the winning percentage of conventional AI was 45.15%, and that of the human players was 54.85%. In the third experiment, the winning percentage of rule-based AI was 55.05%, and that of the human players was 44.95%. Based on the number of games won and the winning percentages, we can conclude that the rule-based AI agents played well because they outperformed all opponents in each experiment.

Experiment 1: Rule-Based AI versus Randomized AI
In the first experiment, two rule-based AI agents played 2000 games against two randomized AI agents; therefore, we gathered 4000 play logs from each side: 4000 from the two rule-based AI agents and 4000 from the two randomized AI agents. Table 2 shows the statistics of the play log data. The rule-based AI won 1792 games (89.60%) and lost 208 games (10.40%). We analyzed the frequency (f) of scores in the positive zone (when the AI won and achieved a positive score) and the frequency (f) of scores in the negative zone (when the AI lost and achieved a negative score). Rule-based AI finished with a positive score 1792 times and a negative score 2208 times. In contrast, randomized AI finished with a positive score 208 times and a negative score 3792 times.

Table 2. Statistics of play log data from 2000 games of rule-based AI versus randomized AI.

Rule-based AI's mean score when winning was 15.64 (SD = 6.43), and when losing it was −4.01 (SD = 2.57). Randomized AI's mean score when winning was 12.32 (SD = 6.53), and when losing it was −4.64 (SD = 2.53). When testing the first hypothesis at a confidence level of 95%, we obtained a p-value of 1.0669 × 10^−283. The statistics confirm that the results are statistically significant (p-value < 0.01). Thus, rule-based AI significantly outperforms randomized AI. The effect size (r) was 0.5921, which is a large effect (r > 0.50) according to Pearson's classification of effect sizes.
Figure 6 shows the frequency distribution of scores for the rule-based AI agents and randomized AI agents across 4000 scores. The x-axis denotes the score achieved in each game, and the y-axis denotes the frequency (number of games) of the players that achieved that score. In Big Two, each player receives 13 cards to play. Although each player can only lose 13 cards, they have a chance to win up to 39 cards from the three opponents. The winner's game score is determined by the total number of cards left by the three opponents. For example, if player 1 wins and the three opponents have 2, 6, and 9 cards left, then the scores will be 17, −2, −6, and −9. Note that at the end of each game, the winner achieves a positive score and the three losers achieve negative scores.

In Figure 6a, rule-based AI agents finished with a positive score in 1792 games and a negative score in 2208 games. In Figure 6b, randomized AI agents finished with a positive score in 208 games and a negative score in 3792 games. We compared the frequency distribution of all game scores between rule-based AI and randomized AI, as shown in Figure 6c. In the positive zone, the rule-based AI scored positively more often than the randomized AI: the percentage of rule-based AI was 89.60%, and the percentage of randomized AI was 10.40%. The rule-based AI agents ended the game with a positive score (won) 79.20% more often than the randomized AI agents.
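The scoring scheme described above can be sketched in one line of Python; the function name is ours:

```python
def game_scores(cards_left):
    """Score a finished game from each player's remaining card count.

    The winner (0 cards left) scores the total of the opponents' remaining
    cards; each loser scores minus their own remaining count."""
    total = sum(cards_left)
    return [total if n == 0 else -n for n in cards_left]
```

This reproduces the worked example: a winner facing opponents with 2, 6, and 9 cards left yields scores of 17, −2, −6, and −9, and the maximum possible winning score is 39.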
The rule-based AI agents tended to end a game with a positive score of 8 to 21, higher than the randomized AI agents with a positive score of 5 to 18. The rule-based AI agents outperformed the randomized AI agents, ending the game early and leaving the randomized AI agents with many cards. In the negative zone, the randomized AI scored negatively more often than the rule-based AI. The percentage of rule-based AI was 36.80%, and the percentage of randomized AI was 63.20%. The randomized AI agents lost with many cards left 26.40% more often than the rule-based AI agents. The rule-based AI agents tended to end a game with a negative score of −5 to −1, better than the randomized AI agents, which had negative scores of −7 to −2. These results indicate that the strategies used to minimize losses in rule-based AI are effective.

Experiment 2: Rule-Based AI versus Conventional AI
In the second experiment, two rule-based AI agents played 2000 games against two conventional AI agents. We gathered 4000 play logs from each side: 4000 from the two rule-based AI agents and 4000 from the two conventional AI agents. Table 3 shows the statistics of the play log data. The rule-based AI won 1460 games (73.00%) and lost 540 games (27.00%). Rule-based AI finished with a positive score 1460 times and a negative score 2540 times. Rule-based AI's mean score when winning was 16.08 (SD = 6.16), and when losing it was −4.22 (SD = 2.55). In contrast, conventional AI finished with a positive score 540 times and a negative score 3460 times. Conventional AI's mean score when winning was 15.16 (SD = 6.98), and when losing it was −4.63 (SD = 2.51). When testing the second hypothesis at a confidence level of 95%, we obtained a p-value of 2.5233 × 10^−100. The statistics confirm that the results are statistically significant (p-value < 0.01). Thus, rule-based AI significantly outperforms conventional AI. The effect size (r) was 0.3545, which is a medium effect (0.30 < r < 0.50).

Figure 7 shows the frequency distribution of scores for the rule-based AI agents and conventional AI agents across 4000 scores. In Figure 7a, rule-based AI agents finished with a positive score in 1460 games and a negative score in 2540 games. In Figure 7b, conventional AI agents finished with a positive score in 540 games and a negative score in 3460 games. We compared the frequency distribution of all game scores between rule-based AI and conventional AI, as shown in Figure 7c. In the positive zone, the percentage of rule-based AI was 73.00%, and the percentage of conventional AI was 27.00%. The rule-based AI agents ended the game with a positive score (won) 46.00% more often than the conventional AI agents. The rule-based AI agents tended to end a game with a positive score of 10 to 23, higher than the conventional AI agents, which had positive scores of 8 to 21.
The strategies to maximize the winning score worked because rule-based AI agents often achieved a high winning score. In the negative zone, the percentage of rule-based AI was 42.33%, and the percentage of conventional AI was 57.66%. The conventional AI agents lost with many cards left 15.33% more often than the rule-based AI agents. The rule-based AI agents tended to end a game with a negative score of −7 to −1, better than the conventional AI agents, which had negative scores of −8 to −1. The rule-based AI agents could minimize the number of cards left when the chance of winning was low. We can conclude that rule-based AI significantly outperforms conventional AI.

Experiment 3: Rule-Based AI versus Human Players
In the third experiment, two rule-based AI agents played 2000 games against two human players. We gathered 4000 play logs from each side: 4000 from the two rule-based AI agents and 4000 from the human players. Table 4 shows the statistics of the play log data. The rule-based AI won 1101 games (55.05%) and lost 899 games (44.95%). Rule-based AI finished with a positive score 1101 times and a negative score 2899 times. Human players finished with a positive score 899 times and a negative score 3101 times. Rule-based AI's mean score when winning was 18.25 (SD = 7.24), and when losing it was −4.72 (SD = 3.16). Human players' mean score when winning was 16.10 (SD = 7.71), and when losing it was −5.31 (SD = 3.33). When testing the third hypothesis at a confidence level of 95%, we obtained a p-value of 2.5325 × 10^−14. The statistics confirm that the results are statistically significant (p-value < 0.01). Thus, rule-based AI significantly outperforms human players. The effect size (r) was 0.1442, which is a small effect (0.10 < r < 0.30).

Figure 8 shows the frequency distribution of scores for the rule-based AI agents and human players across 4000 scores. In Figure 8a, rule-based AI finished with a positive score in 1101 games and a negative score in 2899 games. In Figure 8b, human players finished with a positive score in 899 games and a negative score in 3101 games. We compared the frequency distribution of all game scores between rule-based AI and human players, as shown in Figure 8c. In the positive zone, the percentage of rule-based AI was 55.05%, and the percentage of human players was 44.95%. The rule-based AI agents ended the game with a positive score (won) 10.10% more often than the human players. The rule-based AI agents tended to end a game with a positive score of 11 to 25, higher than the human players, who had positive scores of 8 to 22. The strategies to maximize the winning score worked because the rule-based AI agents often achieved a high winning score.
The rule-based AI agents outperformed the human players, ending games early and leaving the human players with many cards.
In the negative zone, the percentage of rule-based AI was 48.32%, and the percentage of human players was 51.68%. The human players lost with many cards left 3.36% more often than the rule-based AI agents. The rule-based AI agents tended to end a game with a negative score of −7 to −1, better than the human players, who had negative scores of −9 to −1. These results indicate that the strategies used to minimize losses in rule-based AI are effective. The rule-based AI agents could minimize the number of cards left when the chance of winning was low. We can conclude that rule-based AI significantly outperforms human players.
Overall, the experimental results show that our AI agent played well in all the experiments that have been conducted. In each experiment, the proposed AI agent ended the game with more wins than all the opponents. The strategies to maximize the winning score work because rule-based AI agents often achieved a high winning score when they won, ending the game early and leaving every opponent with many cards. When the chance of winning is low, the strategies used to minimize losses in rule-based AI are effective because rule-based AI agents can minimize the number of cards left.
We measured the average time taken by our rule-based AI to decide its next action. The server has a 3.40 GHz Intel Core i7-6700 CPU and 16 GB of random-access memory. Table 5 shows the results of this measurement. The evaluation was conducted under three different field conditions: one (single), two (pair), or five (five-card) cards on the field. As noted, the AI agent can only play the respective number of cards from its hand based on the field condition. If no valid combination was possible, then the AI agent skipped a turn. We took measurements in three different cases (13, 8, and 5 cards left in the agent's hand) for the average time the AI agent needed to find and choose the cards for its combination. As Table 5 shows, the AI agent took less than one second to decide on a card combination in each of its turns.

Furthermore, we analyzed the instances where our AI lost to the human players so that we could minimize its weaknesses and improve the algorithm of the proposed AI. We found two factors causing the AI agent to lose. The first is a bad initial set of cards, which greatly diminishes the agent's chances of winning the game. This occurs because cards are dealt at random at the beginning, so every player has a chance of drawing weak cards. The second factor is the large number of potential card combinations in the opponents' hands. The AI agents cannot see the cards in an opposing player's hand; thus, they can only attempt to predict them. Incorrect predictions can cause the AI to play the wrong card and eventually lose.

Conclusions
In this paper, we have proposed a rule-based AI method for an agent playing Big Two. The proposed method includes rules for two to four cards left, rules for more than four cards left, a winning strategy, and rules for the agent when it is not in the control position. The experimental results show that all hypotheses were accepted, because the proposed AI significantly outperformed randomized AI, conventional AI, and human players. The proposed AI achieved winning rates of 89.60%, 73.00%, and 55.05%, respectively, better than all opponents, whose winning rates were 10.40%, 27.00%, and 44.95%. When playing against human players, the proposed AI also exhibited better performance than conventional AI, with a winning percentage of 55.05% compared to 45.15%. Further research can use this rule-based AI as a baseline player. Furthermore, our AI agent could play Big Two well, with the capability to maximize the winning score and minimize the number of cards left when the chance of winning was low. This AI could potentially be used to give hints to beginners or to players hesitating about their turn. However, playing against an agent that is too strong is not fun; players want to play against opponents whose skills are close to their own. The proposed method can be used as a starting point for the next stage of research. We plan to develop a multiple-level AI agent that will make playing Big Two entertaining and enjoyable.

Data Availability Statement:
The data presented in this study are available online at https://github. com/sugiyantoID/RuleBasedAgents, accessed on 28 April 2021.