A Comparison of Evolutionary and Tree-Based Approaches for Game Feature Validation in Real-Time Strategy Games with a Novel Metric

Abstract: When it comes to game playing, evolutionary and tree-based approaches are the most popular approximate methods for decision making in the artificial intelligence field of game research. The evolutionary domain therefore draws its inspiration for the design of approximate methods from nature, while the tree-based domain builds an approximate representation of the world in a tree-like structure, and then a search is conducted to find the optimal path inside that tree. In this paper, we propose a novel metric for game feature validation in Real-Time Strategy (RTS) games. Firstly, the identification and grouping of Real-Time Strategy game features is carried out, and, secondly, groups are included into weighted classes with regard to their correlation and importance. A novel metric is based on the groups, weighted classes, and how many times the playtesting agent invalidated the game feature in a given game feature scenario. The metric is used in a series of experiments involving recent state-of-the-art evolutionary and tree-based playtesting agents. The experiments revealed that there was no major difference between evolutionary-based and tree-based playtesting


Introduction
Real-Time Strategy (RTS) games are designed as turn-based games where players, each following their own strategies, try to defeat one another through a series of turns. The term 'strategy' stands for the highest form of decision-making process, where the ultimate purpose is to defeat the opponent. Decisions are made between turns (a turn is a transition from the current game state to the next one), which are so short (i.e., in the range of milliseconds) that the game looks as though it is progressing in real time. After a decision is made, the actions are executed. The difference between RTS games and classical turn-based board games, of which probably the most well-known representative is the game of chess, is in the execution of the actions. Actions in RTS games are durative and simultaneous [1], as opposed to the instant moves, of which each player can make one per turn, in classical board games.
During the last decade, RTS games have become one of the best test beds for researching Artificial Intelligence (AI) for games [2,3]. The main reason for the growth in research is the fact that RTS games offer plenty of challenges for researchers. For example, RTS games are representatives of the highest class of computational complexity [4], which is due to their extremely large state-action spaces [5] (i.e., search space). Search space is often impossible to search exhaustively, because a specific game is a high-dimensional space of game variants (many different parameters are available), and it is also called game space [6]. Exploring the search space of games is often considered to be a difficult problem [7], and most of the complex optimization problems relating to games' search spaces cannot be solved using the exact methods that search for the optimal solution by enumerating all possible solutions. To solve these problems, various methods have emerged in the past decades that solve problems approximately. In recent times, researchers have been looking for inspiration for the design of these approximate algorithms/methods in nature, e.g., Darwin's evolutionary theory [8], the collective behavior of social living insects [9], the social behavior of some animal species [10,11], physical phenomena [12], and so on.
The bio-inspired computation field [13] is a field that covers all of the algorithms/methods that fall within the scope of these mentioned inspirations and is an extensively studied research area of AI. Nowadays, numerous algorithms exist that fall under the bio-inspired computation umbrella, such as the Artificial Bee Colony (ABC) Algorithm [14], Differential Evolution (DE) [15], Firefly Algorithm (FA), Genetic Algorithm (GA) [16], Monarch Butterfly Optimization (MBO) [17], etc. Due to the popularity of this subject, numerous unprecedented implications of these approaches exist among real-world applications [13]. Some of the application areas where bio-inspired computation approaches have been successfully applied include: antenna design [18], medicine [19], and dynamic data stream clustering [20].
In addition to the many different application areas, bio-inspired computation also plays an important role in the design and development of games. Bio-inspired computation approaches in games have been used for procedural content generation [21], the development of controllers that are able to play games [22], educational and serious games [23], intelligent gaming systems [24], evolutionary methods in board games [25], behavioral design of non-player characters [26], etc.
Gameplaying agents (algorithms) are made to play the game in question, with the game rules being hard-coded or self-obtained (general gameplaying), in a self-sustained way (i.e., no human input is needed during the (general) gameplay) [27]. The primary task of the gameplaying agent is to win games, and the secondary task is to win them with a higher score [28]. For the RTS gameplaying agent [29] to be able to cope with the high computational complexity of the game space, it has to be able to function inside different segments of the game, which are as follows: resource and production management (also categorized as economy) [30], strategical [31], tactical [32] and micromanagement [33] operations, scouting [34] and sometimes even diplomacy [35]. For one to be successful when playing an RTS game, a balanced combination of all those segments must be considered by the agent [36]. Since gameplaying agents are already built to operate and cover a variety of tasks in a given game space, they can be adapted to become playtesting agents.
Playtesting agents are meant to play through the game (or a slice of it) and try to explore the behavior that can generate data that would assist developers during the development phase of a game [37,38]. Game studios conduct countless tests on gameplaying with real players [39], but relying on humans for playtesting can result in higher costs and can also be inefficient [37]. The research on playtesting is, therefore, very important for the following two reasons: it has a huge economic potential and is of considerable interest to the game industry [40]. Further economic potential is also apparent in semi-related fields, like Gamification [41].
A Game Design Document (GDC) specifies core gameplay, game elements, necessary game features, etc. [42]. With this paper, we tackle the problem of the automatic validation of game features for the game space specified in the GDC. We also address research requirements raised in other articles (for instance, [43]), whose authors point out the need for scaling in games with a higher complexity.
In this article, we will try to find the answers to the following research questions:
• RQ1: How easy is it to adapt gameplaying agents as playtesting agents in RTS games?
• RQ2: Which RTS game definitions can be used to make a comparison between different playtesting agents?
• RQ3: How to evaluate playtesting agents based on RTS game definitions, and which are the most beneficial to them?
• RQ4: Is there a difference between evolutionary and the non-evolutionary approaches (like standard Monte Carlo tree searches [44]) with regard to playtesting abilities?
• RQ5: How does one define valid/invalid game features in the game space?
Altogether, the main contributions of this paper are as follows:
- A novel metric is proposed to make a comparison between different playtesting agents;
- A method is proposed for adapting gameplaying agents as playtesting agents in real-time strategy games; and
- The proposed metric is used in a series of experiments involving adapted evolutionary and tree-based state-of-the-art gameplaying agents.
The structure of the remainder of this paper is as follows. Section 2 outlines the game features of real-time strategy games and the microRTS simulation environment, while Section 3 presents the proposed novel metric that will allow for the comparison of different playtesting agents. Section 4 describes the experimental environment, adaptation of gameplaying agents as playtesting agents (including detailed descriptions of them) and the results of the experiments. A Discussion is provided in Section 5, and the conclusion is presented in Section 6.

Real-Time Strategy Games
This chapter briefly outlines the game features of RTS games, and a description of the microRTS environment is also provided.

Game Features of RTS Games
"Game feature" is a generic term used to refer to differences and similarities between games [45]. Game features are defined in GDC [46], and, after they are implemented, each game's features rely on the use of game mechanics. Game mechanics are methods invoked by agents in interacting with the game world (e.g., to obtain the health value of the unit) [47]. In [48], 18 general definitions of game features (hereinafter referred to as groups) can be found.
In the RTS game domain, different kinds of game feature subset groupings are possible (Economic, Military, Map Coverage, Micro Skill and Macro Skill) [49], but to the best of our knowledge, the RTS game features have not yet been placed into groups. The placement of RTS game features into groups is, in our opinion, important, because it allows for the possibility of comparing RTS game features with the features of other game genres in the future.

microRTS
There are many different RTS game worlds in existence. Not all of them are openly available, but even some of the commercial ones have been opened up for research purposes (e.g., StarCraft™ was opened through the programming interface). microRTS is a simple non-commercial simulation environment, which was created to test any theoretical ideas a game researcher might have.
This simulation environment follows standard RTS game genre game rules:
1. Players gather resources and use them to create structures and new mobile units;
2. The game goal is to defeat the opposing player in a battle for supremacy; and
3. Resources, structures, and mobile units must be cleverly used.
The microRTS environment includes the following features (seen in Figure 1): Workers are used to gather resources and build structures, and they also possess the ability to attack with limited firepower. Light, heavy and ranged are the main battle units used for attacks on opponent structures and mobile units. Battle units have different initial properties (i.e., a heavy battle unit can sustain more damage before being destroyed versus a light battle unit, and a ranged unit can shoot farther). Bases produce workers. Barracks are used to create battle units. The wall is used as a physical barrier in the map. microRTS allows configurable scenarios to be placed in the environment. Scenarios can be configured for varying map sizes (4 × 4, 8 × 8, 12 × 12, etc.) and with different starting positions for the unit types, structures, and resources (which can be placed anywhere on the map). The game can be played with visible features (graphical interface turned on for observations) or in the background (which allows for a faster execution of scenarios and quicker overall simulations, with fewer computer resources used). microRTS also already includes many gameplaying agents that can be used in experiments.

Proposal of a Metric for Game Feature Validation
Our motivation to create a metric came from the need to be able to differentiate easily between different playtesting agents' performances, when multiple game features need to be validated. In order to propose a novel metric for comparing playtesting agents, the following steps were considered in our study:
- STEP 1: The RTS game features are identified;
- STEP 2: The game features are grouped in precise game feature groups;
  (STEP 2.1): Classification of game feature groups according to their correlation (groups that are similar in description tend to be correlated, and this also allows single game features to be placed into multiple groups) and importance (some groups are of a higher importance, because they reflect and are essential to RTS gameplay, while some could be left out without jeopardizing the game's position in the RTS game genre);
- STEP 3: For empty groups in STEP 2, a further identification of the RTS game features is conducted by including more search strings and other search engines (e.g., Google Scholar); and
- STEP 4: The novel metric is proposed.
All steps are described in detail in the following subsections and are presented graphically in Figure 2.
Mathematics 2020, 8, x FOR PEER REVIEW 5 of 20

Identification of RTS Game Features
Game features are mentioned in many RTS game research works, but they are scattered across different subdomains and research agendas. Our goal was to use the pool of research articles and dissertations and to identify the game features included in this research. The pool was reviewed with the help of a literature search. The ISI Web of Science and ProQuest research search engines were used. A search query with the following search string was made: "game features" and "real-time strategy games", which returned 88 hits for the ISI Web of Science and 34 hits for ProQuest.
The results (articles and dissertations) were filtered to exclude non-RTS game research works. A manual search was conducted through the research work for mentions of the "feature" string (note: 14 works from the ISI Web of Science and 0 for ProQuest were located after a manual search). The located text was extracted and analyzed for surrounding context, then transformed into a compact format that could act as a short game feature description. The surrounding context was used to transform the text, because not all research work has game features that can be used as-is. Each short description was then added to the list of game feature descriptions. If a description was already on the list, and it was not adding additional information, it was omitted (note: seven works from the ISI Web of Science were omitted). Additionally, one research work can include more than just one mention of the string "feature" with the surrounding context.
Note: Future work could broaden the scope and include other search strings (like "aspect" or "feature") for a more in-depth survey of the general RTS features. Table 1 includes a list of short game feature descriptions, which were produced after the completion of the first step. The short game feature descriptions are accompanied by a reference.

Table 1 entries, listed as label, reference, game feature, and short description:
GF1_RG [50] (game feature and description missing)
GF2_EOBJ [51] Game engine features and objects: Game unit (battle unit) always hits with x points of damage.
GF3_DIFA [52] Game difficulty (aiding): The opponent is aided with x more units, resulting in a player losing every game. Note: such a feature can be part of an advanced mode, where non-advanced users must not/cannot win.
GF4_CONS [53] Game objective (construction): If the player tries to, it must be able to create x game structure(s) (e.g., barracks).
GF5_AST [54] Game assessment: Game score is calculated based on raw features (e.g., no. of workers) and must represent the game state status correctly when presented to the player.
GF6_SB [55] Stumbling block: The player cannot destroy the enemy in a specific part of the map due to stumbling blocks (e.g., a wall).
GF7_EXPL [56] Game exploration (unlocking new technologies): If the player tries discovery, it can create x game units (e.g., battle unit-light) through the usage of game structure(s) (e.g., barracks).
GF8_FANT [57] Special unit: The player is confronted with a special game unit (e.g., Super-Heavy with special features), which cannot be destroyed with the given resources.
GF9_PARI [58] Partial information (fog-of-war): The player cannot operate in a partially observable environment, so it therefore cannot destroy the opponent in such an environment.
GF10_DIFC [52] Game difficulty (challenge): The player cannot destroy x structures (e.g., barracks) guarded with y rushing game units (e.g., battle unit-heavy) with access to z units of A type resources.
GF11_GCMP [59] Game control (take over the map): The player can destroy all the structures on the map before the time runs out.
(label and reference missing) Interaction on a complex map: If the player controlling x battle units (e.g., a heavy battle unit) finds a static unit (e.g., barracks) in a maze (or complex map), the static unit is always destroyed.

Grouping the Game Features into Specific Groups
The grouping of game features into specific groups has two benefits: a group consists of game features with similar modus operandi (i.e., correlated and in the same context), and groups can serve as a basis for sharing research with other game genres.
We already mentioned (Section 2.1) that the literature search revealed 18 groups, which are formed independently of the specific game genre, and which we will use for grouping. These groups are: adaptation, assessment/rewards/scores, challenge, conflict, control, exploration, fantasy/location, interaction/interactivity (equipment), interaction (interpersonal/social), language/communication, motivation, mystery, pieces or players, progress and surprise, representation, rules/goals, safety and sensory stimuli. A detailed description of the meaning of each of the groups can be found in the tabular presentation in [62]. Table 2 presents the results after the completion of both steps, with references to the source of the compact description. Our goal was to have at least one game feature representative for each of the groups. If there was no game feature available in Table 1 for an empty group, we tried to locate research work for that group by searching via Google Scholar (STEP 3) using different search strings (e.g., "impassable terrain" for the conflict group) chosen with regard to the context of the group. The research works found went through the procedure described in STEP 1, and a short game feature description was included in Table 1 and in the corresponding empty group in Table 2. (Observation: we noticed that many research works on game feature descriptions originated from the domain of player/opponent modeling, RTS replay analysis, game balancing and strategy selection/prediction.) One game feature can belong to more than one group. For some groups, we could not find or create any viable game feature description that could be measured by the game mechanics. Such groups remained empty but were still included in the table, because future RTS research could produce game features for currently empty groups.

Classification of Feature Groups According to Their Correlation and Importance
As game features tend to be correlated, so do groups. One group can be, context wise, closely related to some groups but only loosely related to others. Additionally, some contexts are more important than others with regard to RTS gameplay. Table 3 presents the classification of feature groups into three importance classes:
• The high-importance class contains groups that represent the essence of RTS gameplay (based on our understanding of the RTS game worlds and their aspects [63]);
• Groups that operate on a game mechanics level (e.g., Interaction/Interactivity (Equipment) group) or are not essential to the game (they could potentially be left out, e.g., Mystery group) are in the medium-importance class; and
• Groups that, in Table 2, did not have a feature representative (empty of features) were included in the low-importance class.
The importance level of each of the groups is represented by a class. Regarding the game worlds, we allow for the possibility of different reconfigurations of the groups inside the classes. We also included the weight and mathematical description of the set. Weight is a numerical value that is set by the user of the metric. It represents how much the groups belonging to the specific class will count towards the metric score.

Proposal of the Metric
In this subchapter, we explain our metric for summarizing agents' performance while they validate game features in an RTS game space. The metric calculates its score based on how many times the playtesting agent invalidated the game feature out of a fixed number of repeats for a given scenario (the sum of validations and invalidations equals the number of scenario repeats).
If the playtesting agent during the execution of the scenario could not test the game feature, because it does not come into a situation, or it is not programmed to deal with the situation where validation can take place, then such a game feature is valid from this point of view. The number of successful validations is, therefore, omitted from the game score, since it is biased.
Consider a set of groups G_i, 1 ≤ i ≤ 18, where each group G_i holds a set of game features (GFs) (GF_j ∈ G_i, 0 ≤ j), and each GF_j holds a set of executable scenarios S (S_k ∈ GF_j, 1 ≤ k). The number of unsuccessful validations per scenario is defined by numInvalid_ijk, and the number of times the scenario is repeated is defined by numOfScenRep = n, 1 ≤ n. Equations (1) to (3) then apply as follows. In Equation (1), the number of invalidations of a given group (index i), game feature (index j) and scenario (index k) is divided by the total number of scenario repeats. In Equation (2), the score is calculated for all the game features and scenarios that the set of groups holds. In Equation (3), the scores of the set of groups are multiplied by their respective weights.
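The displayed formulas are not present in this text, so the following is only a plausible reconstruction from the verbal description of Equations (1) to (3). The symbol names r_ijk, score(G_i), metricScore, and the class sets C_H, C_M, C_L (high, medium and low importance, with weights W_H = W1, W_M = W2, W_L = W3) are assumptions introduced here for illustration and may differ from the authors' notation.

```latex
\begin{align}
r_{ijk} &= \frac{\mathit{numInvalid}_{ijk}}{n}, \tag{1}\\
\mathit{score}(G_i) &= \sum_{j}\sum_{k} r_{ijk}, \tag{2}\\
\mathit{metricScore} &= \sum_{c \in \{H, M, L\}} W_c \sum_{G_i \in C_c} \mathit{score}(G_i). \tag{3}
\end{align}
```

Under this reading, a scenario that is never invalidated contributes 0, and a scenario invalidated in every one of the n repeats contributes the full weight of its class.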

Experiments and Results
In this chapter, we present the specifications of hardware and software used for the experimental environment, as well as the results of the experiments.

Experimental Environment
Hardware: The experiment was carried out on an i7-3770k CPU computer @ 3.50 (turbo: 3.9) GHz, 4 cores (note: during the experimentation, only one core was used, since agents do not implement the multi-core support) and 16 GB RAM.
Software: OS Windows 10 Pro and Java Development Kit 13.0.2. The experiment was set in the latest version of the microRTS environment, acquired from an online source at the time of preparing this article [64]. The microRTS environment comes pre-loaded with the following gameplaying agents: RandomAI, RandomBiasedAI, MonteCarlo, IDRTMinimax, IDRTMinimaxRandomized, IDABCD, UCT, PuppetSearchMCTS, and NaiveMCTS. TiamatBot was acquired from the online source [65]. MixedBot (which includes an improved version of the TiamatBot source files) was acquired from the online source [66] and was included in the microRTS environment. Every gameplaying agent is used in the experiment as it was acquired from the online source of the original authors (i.e., no code or internal parameter was changed for experimental purposes). Table 4 shows the hyper-parameters used for the validation of every game feature presented in Table 1. These hyper-parameters are pre-set within the microRTS environment. The only parameter that we changed was iterations, which we set to 50 (previously, it was set to 10). The standard UnitTypeTable was used where necessary. Note: to validate GF9_PARI, we changed the environment from fully observable to partially observable.
The game feature descriptions presented in Table 1 were derived from related works and written independently of a specific game environment, i.e., they can be implemented in any RTS game engine. In Table 5, we present the same game features as those presented in Table 1, although the former are adapted to the microRTS environment and a specific scenario. All game features in Table 5 are written with the assumption that they are valid for the microRTS environment. If the playtesting agent actually manages to invalidate a game feature from the list, it will add to its metric score.

Adaptation of Gameplaying Agents as Playtesting Agents
To adapt a gameplaying agent to the playtesting task, we created a non-intrusive component. The component contains information about the scenario (map, position of units and the opponent) and controls the validation procedure by following the playtesting agents' progress (i.e., actions that it executes) and by accessing game environment information (e.g., current game state status). All the information is accessible through well-defined interfaces of the microRTS source code. One of the interface methods is the method that returns the best action for the given game state, and every gameplaying agent operating in the microRTS environment implements it.

GF3_DIFA: The opponent is aided by 5 more heavy battle units, resulting in the player losing every game. Map: basesWorkers8x8.xml (standard map with 5 heavy units added for the opponent).

GF4_CONS: If the player tries to, they must be able to create 1 barracks. Map: basesWorkers8x8.xml (standard map).

GF5_AST: The game score is calculated on the basis of raw features of the game state (no. of workers and no. of light, heavy and ranged units multiplied by their cost factors) and must represent the game state status correctly when presented to the player.

GF6_SB: The player cannot destroy the enemy in a specific part of the map due to a wall. Map: basesWorkers12x12.xml (standard map with a wall placed in the middle of the map).

GF7_EXPL: If the player tries discovery, it must be able to create 1 light battle unit through the usage of game barracks.

GF8_FANT: The player is confronted with a special game unit (Super-Heavy battle unit with ten-times the armor of a normal-Heavy one), which cannot be destroyed with the given resources. Map: basesWorkers8x8 (standard map with Super-Heavy battle units added to help the opponent).

GF9_PARI: The player cannot operate in a partially observable environment, so it therefore cannot destroy the opponent in such an environment. Map: basesWorkers12x12.xml (standard map with a partially observable environment enabled).

GF10_DIFC: The player cannot destroy 2 barracks guarded with 3 heavy rushing units with access to 60 units of resources.

GF11_GCMP: The player can destroy three barracks before the time runs out.

When the actions are executed in a game state, it cycles to the next one (i.e., actions change the inner state). During such cycles, our component tests if the Game Feature is valid or invalid. A Game Feature is invalid if the condition that is written in the validation procedure of the game feature in question is not fulfilled. The condition is tested against the information provided from the agent's executed action and the environment's current game state. The validation procedure checks the validity of the game feature, until either the maximum number of cycles is reached, or the game is over (i.e., one of the players has no more units left on the field).
For example, the game feature, GF8_FANT, is validated by checking if the resulting game state still holds this special unit after the agent has given an order to fire on it. If in any cycle the unit is destroyed, the game feature is invalid.
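The validation procedure described above can be sketched as a small loop. This is a minimal illustration and not the authors' component or the microRTS API: `ToyGameState`, `AlwaysFire`, `special_unit_survives` and `validate_feature` are hypothetical names, and the toy state only models the hit points of one special unit in the spirit of the GF8_FANT example.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class ToyGameState:
    """Hypothetical stand-in for a game state; not a microRTS class."""
    special_unit_hp: int
    cycle: int = 0
    game_over: bool = False

    def apply(self, action: str) -> None:
        # In this toy model a "fire" action removes one hit point per cycle.
        if action == "fire" and self.special_unit_hp > 0:
            self.special_unit_hp -= 1
        self.cycle += 1


class Agent(Protocol):
    """Anything that returns an action for a given game state."""
    def get_action(self, state: ToyGameState) -> str: ...


# A condition receives the resulting state and the executed action.
Condition = Callable[[ToyGameState, str], bool]


def validate_feature(agent: Agent, state: ToyGameState,
                     condition: Condition, max_cycles: int) -> bool:
    """Return True if the feature stayed valid, False once the condition fails."""
    while state.cycle < max_cycles and not state.game_over:
        action = agent.get_action(state)
        state.apply(action)
        # The condition is tested after every cycle, as in the text.
        if not condition(state, action):
            return False
    return True


class AlwaysFire:
    """Hypothetical agent that always orders an attack."""
    def get_action(self, state: ToyGameState) -> str:
        return "fire"


def special_unit_survives(state: ToyGameState, action: str) -> bool:
    """GF8_FANT-style condition: the special unit must still be on the field."""
    return state.special_unit_hp > 0
```

With a special unit of 3 hit points and a limit of 10 cycles, the sketch reports the feature as invalid, because the unit is destroyed in the third cycle. With 100 hit points the cycle limit is reached first and the feature counts as valid for that run.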

Playtesting Agents
The following gameplaying agents have been adapted as playtesting agents for the purposes of experimentation:

1. Basic (part of the microRTS package):
• RandomAI: The choice of actions is completely random;
• RandomBiasedAI: Based on RandomAI, but with a five times higher probability of choosing a fighting or harvesting action over other actions; and
• MonteCarlo: A standard Monte Carlo search algorithm.

2. Evolutionary Algorithm (online source):
• TiamatBot (original): Uses an evolutionary procedure to derive action abstractions (conducted as a preprocessing step [67]). The generation of action abstractions can be cast as a problem of selecting a subset of pure strategies from a pool of options. It uses Stratified Strategy Selection (SSS) to plan in real time in the space defined by the action abstraction thus generated [68]. It outperformed the best-performing methods in the 2017 microRTS competition [69] and is therefore considered one of the current state-of-the-art gameplaying agents.

4. Evolutionary and Tree-Based (online source):
• MixedBot: This bot integrates three bots into a single agent. The TiamatBot (improved original) was used for strategy decisions, Capivara was used for tactical decisions [73], and MicroRTSbot [74] included a mechanism that could dynamically change the time allocated to the two decision parts based on the number of close armies. MixedBot placed second in the 2019 microRTS (standard track) competition (first place went to the game bot that also uses offline/out-game learning [75]).
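To illustrate the biased choice attributed to RandomBiasedAI in the list above, the following sketch draws actions with weighted random sampling. It is not the microRTS implementation: the action names, the `FAVOURED` set, and both functions are hypothetical, and "five times higher probability" is interpreted here as a sampling weight of 5 against 1.

```python
import random
from typing import List, Optional

# Hypothetical action categories; microRTS defines its own action types.
FAVOURED = {"attack", "harvest"}


def action_weights(actions: List[str], bias: float = 5.0) -> List[float]:
    """Give favoured actions a weight of `bias`, all other actions a weight of 1."""
    return [bias if a in FAVOURED else 1.0 for a in actions]


def choose_action(
    actions: List[str],
    rng: Optional[random.Random] = None,
    bias: float = 5.0,
) -> str:
    """Pick one action at random, proportionally to its weight."""
    rng = rng or random.Random()
    return rng.choices(actions, weights=action_weights(actions, bias), k=1)[0]


available = ["move", "attack", "harvest", "idle"]
print(choose_action(available, random.Random(0)))
```

Setting `bias=1.0` reduces the sketch to a uniform choice, which corresponds to the behaviour described for RandomAI.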

Results of the Playtesting Agents
For each of the playtesting agents, Table 6 shows how many times the group's game feature representative was validated or invalidated, together with the metric score the agent acquired. The weights for calculating the metric score were set as follows: W1 = 1, W2 = 0.5 and W3 = 0. W3 was set to 0 because the groups of the CL class are devoid of features. Additionally, empty groups were omitted from the table. To present the results more clearly, we abbreviated the labels of the game feature representatives; e.g., group G1 with its GF3_DIFA Game Feature representative, if validated 50 times and invalidated 0 times, was shortened to G1GF3(50, 0).

Table 6. Playtesting agent results for feature validations and their metric score.

Table A1, which, due to its size, can be found in Appendix A, shows how the metric score changes for each of the playtesting agents over all combinations of W1 decreasing from 1 to 0.55 (in steps of 0.05) and W2 decreasing from 0.50 to 0.05 (also in steps of 0.05). Note: the data used for calculating the metric scores are the same as those presented in the second column of Table 6. RandomAI was omitted from Table A1, because its metric score is zero for all the combinations (it did not invalidate any of the features).
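The label abbreviation and the weight sweep behind Table A1 can be sketched as follows. The exact metric formula is defined elsewhere in the paper and is not reproduced in this section, so `metric_score` below is only a stand-in that assumes a weighted sum of invalidation counts per class; the function names and the example counts are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def label(group: int, feature: int, validated: int, invalidated: int) -> str:
    """Abbreviated label in the style used in Table 6, e.g. G1GF3(50, 0)."""
    return f"G{group}GF{feature}({validated}, {invalidated})"


def weight_grid() -> List[Tuple[float, float]]:
    """All (W1, W2) pairs: W1 from 1.00 down to 0.55, W2 from 0.50 down to 0.05."""
    w1_values = [round(1.0 - 0.05 * i, 2) for i in range(10)]
    w2_values = [round(0.50 - 0.05 * i, 2) for i in range(10)]
    return [(w1, w2) for w1 in w1_values for w2 in w2_values]


def metric_score(invalidations: Dict[str, int], weights: Dict[str, float]) -> float:
    """ASSUMPTION: weighted sum of invalidation counts, keyed by class weight name."""
    return sum(weights[name] * count for name, count in invalidations.items())


# Hypothetical invalidation counts per weighted class (not taken from Table 6).
example_counts = {"W1": 29, "W2": 50, "W3": 0}

for w1, w2 in weight_grid()[:3]:
    weights = {"W1": w1, "W2": w2, "W3": 0.0}
    print(w1, w2, metric_score(example_counts, weights))
```

Under this assumed form, an agent that invalidates nothing scores zero for every weight combination, which agrees with the reason given for omitting RandomAI from Table A1.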

Discussion
During the experimentation phase, the microRTS game engine environment performed as expected (i.e., without visible or known bugs). Our presumption from the start of the experiment was that all of the Game Features were valid, yet the experiments showed that two of the Game Features were actually invalid (GF3 and GF8). A closer inspection of the GF3 results, specifically its number of invalidations, revealed that not all of the playtesting agents caught the invalid game feature, and that some of them invalidated it in only a fraction of the tries. Additionally, if the number of scenario repeats were set lower than fifty, it is possible that only the better-performing playtesting agents would succeed in finding GF3 to be invalid.
GF3 was invalidated by eight playtesting agents, while GF8 was invalidated by all of them except the basic RandomAI. The difference in the number of playtesting agents that successfully invalidated the game features GF3 and GF8 shows that some game features are more sophisticated and require agents that intelligently explore and exploit the search space in question.
We discovered two important guidelines for validation testing:
1. An agent's good gameplaying performance is important, because it also reflects its playtesting performance; and
2. A greater number of scenario repeats increases the probability that an invalid game feature is detected, and therefore the confidence that a game feature that was never invalidated is in fact valid.
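The effect of the number of scenario repeats can be made concrete with a small probability sketch. It assumes, as a simplification that the experiments do not establish, that repeats are independent and that an agent invalidates an invalid feature with the same probability in every repeat; `detection_probability` is a hypothetical helper.

```python
def detection_probability(p_single: float, repeats: int) -> float:
    """Probability of at least one invalidation in `repeats` independent tries.

    ASSUMPTION: each repeat invalidates the feature independently with
    probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return 1.0 - (1.0 - p_single) ** repeats


# Compare a weak and a strong agent (illustrative probabilities only).
for p in (0.02, 0.5):
    for n in (5, 10, 50):
        print(f"p={p}, repeats={n}: {detection_probability(p, n):.3f}")
```

Under these assumptions, an agent that rarely triggers the invalid behaviour needs many repeats before a detection becomes likely, whereas a strong agent finds it within a few tries, which is in line with the observation about fewer than fifty repeats.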
Our purpose was not to judge the existing gameplaying agents created by the research community based on the score they achieved. We did, however, use the number of invalidations that they attained to calculate the metric score for the purpose of testing the metric. The results were encouraging. The state-of-the-art evolutionary and tree-based agents were good performers, not just for gameplaying, but also for playtesting. The line between basic agents (e.g., G1GF3(50, 0)) and advanced ones (e.g., G1GF3(21, 29)) can also be clearly seen. We did not measure the average time for an agent to complete a scenario, but during playtesting, we noticed that agents that were either basic (e.g., RandomAI) or very good performers (e.g., NaiveMCTS) completed the validations in the shortest time. We believe that this resulted from decisions being made quickly (whether bad or good).
At this point, we can also provide answers to the research questions presented in the Introduction.
RQ1: the adaptation of a gameplaying agent as a playtesting agent is straightforward, provided that the game engine follows good software design techniques (components, interfaces, etc.). In our estimation, this is very important, because it allows research discoveries in the gameplaying domain to be transferred to the playtesting domain, and probably also leads to higher adoption rates of such discoveries for commercial use.
RQ2: in comparing different playtesting agents, our metric relies on the groups presented in Table 2. The groups belong to different classes (Table 3), each with its own weight. Additional information for comparisons can also be obtained by calibrating these weights. For that purpose, Table A1 in Appendix A shows how the metric score changes as the weights change. In this way, we can give importance to a specific set of groups and achieve a greater differentiation between the playtesting agents covering them.
RQ3: playtesting agents are evaluated through game feature definitions using the created metric. The most beneficial Game Feature definitions are the ones that belong to the groups in the high-importance class, shown in Table 3.
RQ4: evolutionary and non-evolutionary approaches in the state-of-the-art segment both performed well, and their playtesting abilities are high. No major differences were detected for the game features and scenarios tested.
RQ5: the validity of a game feature is defined by the condition of the validation procedure inside the component used for the adaptation of the gameplaying agents.

Conclusions
The experiments provide encouraging results, and we confirmed our belief that playtesting with agents is important and worthy of further research. Playtesting agents can play the same scenario repeatedly and with good results, while repetitive play (e.g., playing the same scenario fifty or more times) is probably tiresome for human players, who are therefore more prone to making errors. We also confirmed that our novel metric performed as expected, because the metric scores revealed a certain consistency when traversing from basic to state-of-the-art playtesting agents. To the knowledge of the authors, such a metric (i.e., one that would evaluate playtesting game agents based on their game feature performance) does not yet exist. Its creation is necessary to establish common ground for the research conducted in the domain of game features and in the domain of playtesting agents.
Through a series of experiments, we were also interested in how different evolutionary-based playtesting agents explored the search space. The valuable information obtained in our experiments will serve as a stepping stone in the development of new playtesting agents that are based on modern Evolutionary Algorithms, as well as Swarm Intelligence algorithms.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Table A1. Metric scores with variable weights for all the playtesting agents.

Playtesting Agent
Metric Scores