Article

Games with a Purpose for Part-of-Speech Tagging and the Impact of the Applied Game Design Elements on Player Enjoyment and Games with a Purpose Preference

by Rosa Lilia Segundo Díaz 1,2,*, Gustavo Rovelo Ruiz 3, Miriam Bouzouita 4, Véronique Hoste 2 and Karin Coninx 1

1 Faculty of Sciences, HCI and eHealth, Hasselt University, 3590 Diepenbeek, Belgium
2 Faculty of Arts and Philosophy, Department of Translation, Interpreting and Communication, LT3, Ghent University, 9000 Ghent, Belgium
3 Digital Future Lab-Flanders Make, Hasselt University, 3590 Diepenbeek, Belgium
4 Institut für Romanistik, Humboldt-Universität zu Berlin, 10099 Berlin, Germany
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3561; https://doi.org/10.3390/app15073561
Submission received: 3 March 2025 / Revised: 18 March 2025 / Accepted: 20 March 2025 / Published: 25 March 2025

Abstract

Linguistic tasks such as Part-of-Speech (PoS) tagging can be tedious, but are crucial for the development of Natural Language Processing (NLP) tools. Games With A Purpose (GWAPs) aim to reduce the monotony of the task for native speakers and non-experts who contribute to crowdsourcing projects. This study focuses on revising and correcting PoS tags in the Corpus Oral y Sonoro del Español Rural (COSER), the largest collection of oral data in the Spanish-speaking world, to create a parsed corpus of European Spanish dialects. It also examines how game design elements (GDEs) affect players’ enjoyment. Three games—Agentes, Tesoros, and Anotatlón—were developed, incorporating different GDEs, such as rewards and challenges. The results show two levels of enjoyment: at the concept level with Anotatlón, and at the level of individual GDEs with Tesoros. This suggests that certain GDEs influence player enjoyment and, consequently, their preference for certain games. However, the study also shows the complexity of evaluating triggers for player enjoyment in games with more than one implemented GDE.

1. Introduction

Natural Language Processing (NLP) tools used for Part-of-Speech (PoS) tagging have been trained with samples from written standard language, due to the lack of specific linguistic resources for oral corpora or varieties. This is even the case for languages that are spoken worldwide, such as Spanish. Consequently, the PoS tagging accuracy tends to drop on text types other than those the PoS tagger has been trained on, necessitating manual correction of the tags. This manual correction is time-consuming and costly, even when several experts are involved. As a result, Games With A Purpose (GWAPs) implemented for this purpose have been shown to be a promising alternative approach for creating large annotated corpora, compared to high-cost and time-consuming manual data annotation or other data collection methods (e.g., Mechanical Turk) [1]. However, designing a GWAP for linguistics is still a challenge, as many linguistic tasks are complex, specialized, and tedious. Even the most straightforward tasks, such as assigning a PoS tag (e.g., verb, noun, adjective, pronoun) to each word in an input text, can be problematic, given that the terminology may be unfamiliar to players.
The corpus itself might present additional difficulties. To illustrate, let us consider the COSER corpus (Corpus Oral y Sonoro del Español Rural, ‘Audible Corpus of Spoken Rural Spanish’ [2]), on which the current research is based. COSER is the most extensive collection of oral data in the Spanish-speaking world and consists of transcripts of semi-directed interviews with rural elderly male and female informants who have little to no education and who have experienced low mobility [3]. As the corpus contains dialectal data, players from other geographic areas or different age groups might experience difficulty in identifying the words’ meanings and, therefore, the correct PoS tag.
Given these difficulties, this study aimed to investigate whether certain design elements can increase the engagement of players, to ensure that the linguistic tasks are completed and that corpora can be successfully created and annotated. To engage players, we explore enjoyment, defined as the positive experience players undergo throughout a gaming session [4]. From this point of view, the design should allow players to understand the task at hand but, above all, significantly strengthen the attractiveness of games.
The existing body of research on GWAPs has presented various games to address specific linguistic tasks and build diverse linguistic resources. However, whether the GWAPs and their design are enjoyable for the players has not been addressed. Verbosity [5] was designed to collect commonsense knowledge about words, with that knowledge built up of facts that many people recognize as true. Zombilingo [6] focuses on dependency syntax annotation, in which two words are related, one the governor and the other the dependent. WordClicker [7] focuses on collecting judgments about PoS tags. Other games are centered on semantic tasks, such as Phrase Detectives [8], TileAttack [9], and Wordrobe [10]. More concretely, Phrase Detectives aims to identify the referent in an anaphora relation, whereby the anaphor refers back semantically to an entity already introduced in the discourse. TileAttack targets a similar task called co-reference annotation, in which the task is to find the relationship between multiple terms with a common referent. Meanwhile, Wordrobe focuses on different levels of linguistic annotation such as PoS tagging, named entity tagging (locating and classifying named entities mentioned in a text), co-reference resolution, and word sense disambiguation (identifying which sense of a word is meant in a sentence).
As mentioned above, Wordrobe and WordClicker have already explored the game-based approach for PoS tagging, but in two different directions, the former using Gamification and the latter GWAPs. While Gamification uses GDEs in non-game contexts [11] to engage users with applications, the approach of GWAPs attempts to design video games that entertain and generate data as a side effect [12]. In that regard, in Wordrobe, researchers make use of ludic components, such as betting and awards, making it appear more like a gamified application. In contrast, in WordClicker, the researchers attempted to increase the enjoyability of the GWAP using a particular game mechanic, which consists of clicking [7]. If participants click on the correct PoS tag, they automatically produce and sell virtual cakes and earn virtual money that they can use to upgrade the bakery by buying new ingredients and equipment. Another difference in their approaches is that Wordrobe focuses on obtaining high-precision annotations, whereas WordClicker explores metrics to assess the enjoyability of the mechanics. However, neither of them analyzed or evaluated how the different GDEs used in their game design influence the user’s experience in terms of enjoyment, which is argued to be the main incentive in GWAPs [13]. Additionally, enjoyment has been identified as a critical factor of games, which not only motivates players [14] but helps to maintain their interest in the game [15]. Therefore, this study aimed to draw conclusions on these game characteristics in the context of GWAPs for linguistics by answering the following research questions:
  • RQ1: Do the implemented GDEs influence players’ enjoyment and, consequently, their preference for games? Based on the type and number of implemented GDEs (see Figure 1), the order of preference is hypothesized as follows: (1) Tesoros, (2) Anotatlón, and (3) Agentes.
  • RQ2: Are certain GDEs associated with higher values of player enjoyment?
Since variations in the design will obviously affect how participants perceive each of the games, answering RQ1 may reveal whether the integration of different GDEs increases player enjoyment and crowd participation. On the other hand, answering RQ2 will help to identify which GDEs are specifically responsible for increasing player enjoyment.

2. Materials and Methods

The study examined the implemented GDEs and their influence on the players’ preference for any of the three game concepts. Participants were split into three groups to evaluate their preference between two of three games that they played. They were separated this way to avoid biases due to the fatigue of performing more than two sessions of playing and evaluating. Participants played and evaluated their perceptions of the GWAPs by answering three questionnaires. Details of the experiment are described in the following sections.

2.1. Participants

As this study aimed to assess the GWAPs in terms of player enjoyment, the researchers’ networks were used to recruit potential players that met the inclusion criteria: (1) participants are native speakers or highly proficient in Spanish, with or without experience of PoS tagging; and (2) they are of legal age in the country where they have been recruited. Players did not need prior knowledge of PoS tagging, as the GWAPs included a training session to introduce or remind them of the different PoS tags. Fifty-four participants were randomly assigned to one of three groups (n = 18 in each group A, B, or C). Each participant played and evaluated two games. As a result, each game was played by 36 participants. In total, 30 male and 24 female participants took part in the study, of whom 42 were native Spanish speakers and 12 were highly proficient in Spanish.

2.2. GWAPs and the Implemented GDEs

In the context of this research, three game concepts to verify or correct pre-tagged PoS were developed: Agentes, Tesoros, and Anotatlón. Agentes is a roulette-style game where players categorize a word highlighted in a sentence by dragging or clicking on the correct PoS tag. Tesoros is a platform game in which players help an avatar named Gummy over obstacles, collect coins, and reach a treasure chest. Gummy moves automatically after the player assigns the PoS tag to the highlighted word. Anotatlón is a racing game where players navigate a car to avoid obstacles and then select the correct PoS tag for a highlighted word at the end of the race. Figure 2 shows the three game concepts. They include the same set of sentences extracted from the COSER corpus. As the COSER corpus comprises more than 4 million tokens, a sample of 12,402 sentences was extracted. These were automatically tokenized, lemmatized, and morpho-syntactically labeled using the spaCy library [16]. Additionally, the sentences were randomly presented to the players in each game.
For each GWAP, a selection of GDEs was implemented to enhance the player enjoyment, as shown in Figure 1. The design details are explained in [17,18].

2.3. Measurement

To investigate whether the implementation of GDEs influenced player enjoyment and whether there was a preference for certain games, both standardized questionnaires, such as the Intrinsic Motivation Inventory (IMI), and custom-designed ones were included. The IMI assists in measuring need satisfaction and evaluating intrinsic motivational characteristics, such as Interest/Enjoyment, Tension/Pressure, Competence, and Effort/Importance [19]. Two ad hoc questionnaires were designed in the context of this research: a post-game and a comparative questionnaire. The post-game questionnaire aimed to record players’ impressions of the game they had just played, and the comparative questionnaire sought to compare the two games played and their GDEs. The latter was assessed at two levels: the implementation, and preference/favorite.
The study also included an assessment of personality traits using the Ten-Item Personality Inventory (TIPI). These results fall outside the scope of this paper and are reported in [20]; the TIPI is mentioned here only because it was administered as part of the experimental procedure.

2.4. Procedure

The study was conducted between May and July 2021 after being approved by the Social and Societal Ethics Committee (SMEC) of UHasselt (approval number REC/SMEC/VRAI/201/115, granted on 25 March 2021).
The experiment was carried out online for three reasons. Firstly, the online setting made it possible to maintain social distancing during the COVID-19 pandemic. Secondly, as the research was developed in a non-Spanish-speaking country, the online format allowed more participants from Spanish-speaking countries to take part. Thirdly, conducting the study online was a natural choice, as the GWAPs will be released on a crowdsourcing platform. Figure 3 provides an overview of the entire experimental procedure.
Participants were recruited via convenience sampling using the researchers’ networks. To those interested, the facilitator sent an official invite, including the informed consent form, the scheduled appointment, and the link to connect to the meeting. Participants who attended the appointment were first welcomed and then instructed about the study. They were asked to share their screen during the gaming session and to stop sharing if they did not want to show their responses to the facilitator while answering the online questionnaires. After that, to formally initiate the playing and evaluation sessions, the facilitator sent a link that allocated the participant to one of the three experimental design groups. The first step was to accept the informed consent, and the second step consisted of completing a personality questionnaire. Then, the participants played the first game. The order of the two games was randomly assigned in such a way that both games were played the same number of times in each position within the group, n times as the first game and n times as the second. This counterbalancing gave each game equal weight in the evaluation and prevented the experience of playing a game first or second from affecting the results. Once the game session was finished (minimum 10 min to maximum 20 min), participants completed the IMI and post-game questionnaire, which were the fourth and fifth steps of the study, respectively. Afterwards, participants played the remaining game and evaluated it by answering the same questionnaires (steps 6–8). Finally, in the ninth step, participants completed the comparative questionnaire to identify differences and characteristics of the played games. At the end of the study, the facilitator acknowledged the players’ participation and concluded the study. Figure 3 also shows the multiple data sources used to collect the data, such as the questionnaires, observations of game sessions, and annotations (PoS tagging) obtained from the participants while they played the games.
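The counterbalanced assignment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used in the study: the mapping of game pairs to the group labels A, B, and C, the function name `build_schedule`, and the fixed random seed are all hypothetical. Only the group size (18), the two-games-per-participant design, and the balanced first/second order come from the text.

```python
import random
from itertools import combinations

GAMES = ["Agentes", "Tesoros", "Anotatlón"]
GROUP_SIZE = 18  # participants per group, as reported in Section 2.1


def build_schedule(seed=0):
    """Return one (group, first_game, second_game) tuple per participant."""
    rng = random.Random(seed)
    schedule = []
    # Assumption: which pair of games belongs to which group label is not
    # stated in the text; here the pairs are simply assigned in order.
    for group, (game_a, game_b) in zip("ABC", combinations(GAMES, 2)):
        half = GROUP_SIZE // 2
        # Half of the group starts with each game, so both play orders
        # occur equally often; the shuffle randomizes who gets which order.
        orders = [(game_a, game_b)] * half + [(game_b, game_a)] * half
        rng.shuffle(orders)
        schedule.extend((group, first, second) for first, second in orders)
    return schedule


schedule = build_schedule()
plays_per_game = {
    game: sum(game in (first, second) for _, first, second in schedule)
    for game in GAMES
}
first_per_game = {
    game: sum(first == game for _, first, _ in schedule) for game in GAMES
}
print(plays_per_game)
print(first_per_game)
```

With three groups of 18 and each game appearing in two of the three pairs, every game is played by 36 participants, which matches the figure reported in Section 2.1.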

3. Results

The following shows the results of the statistical analysis of the standardized questionnaires and the close-ended questions in the custom-designed questionnaires.

3.1. Game Concept Preference

The descriptive statistics, one-way analysis of variance (ANOVA) of each variable of the IMI, and the correlations between the IMI dimensions and the implemented GDEs in the games helped to answer the first research question (RQ1). For RQ1, it was hypothesized that, based on the number of implemented GDEs, the game preference order would be (1) Tesoros, (2) Anotatlón, and (3) Agentes. However, when participants were asked which game they would play again, the results showed a different order of preference: Anotatlón came out as the preferred game for 24 of the 36 participants who had played it. Tesoros came second (15/36 participants), and Agentes third, with only 5/36 participants. In order to explain the participants’ reasoning for choosing one game concept over another, the dimensions of the IMI were analyzed. First, the descriptive statistics in Table 1 show that the Interest/Enjoyment and Effort/Importance levels were higher for the game Anotatlón. For Tesoros, Competence was higher, and the Tension/Pressure was lower. One-way ANOVAs revealed significant differences for Interest/Enjoyment and Competence. Post hoc comparisons using Tukey’s test were conducted to investigate which pair of games presented these differences. The perceptions of Interest/Enjoyment ($F_{2,105} = 4.827$, $p < 0.001$) showed a significant difference between Anotatlón and Agentes. Similarly, perceived Competence ($F_{2,105} = 3.941$, $p < 0.01$) differed significantly between Tesoros and Anotatlón. Effort/Importance and Tension/Pressure did not show significant differences.
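To make the degrees of freedom in the reported F statistics concrete, the following sketch computes a one-way ANOVA by hand. It is an illustration only: the function `one_way_anova` and the randomly generated scores are hypothetical, and the study's actual data and statistics software are not given in the text. What it does show is that three games with 36 ratings each yield 2 and 105 degrees of freedom, as in the values reported above.

```python
import random
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variability: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variability: spread of the scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 7-point ratings, 36 per game; NOT the study's data.
rng = random.Random(1)
fake_scores = [[rng.uniform(1, 7) for _ in range(36)] for _ in range(3)]
f_stat, df_between, df_within = one_way_anova(fake_scores)
print(df_between, df_within)
```

A significant F value only indicates that at least one group mean differs; identifying which pair of games differs requires a post hoc procedure such as Tukey's test, which is not implemented in this sketch.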
Additionally, a logistic regression analysis was conducted to determine whether the IMI factors influenced the concept preference between the two games played by the players. As shown in Table 2, perceived Competence was a significant predictor for preferring Anotatlón and, in fact, the only construct with an effect on game preference. Thus, the results in Table 1 and Table 2 show that, to some extent, the Competence and Interest/Enjoyment influenced the concept preference. However, to discover whether those IMI levels resulted from implementing the GDEs, correlations among IMI constructs and the GDEs were analyzed to determine the differences in respondents’ preferences. The following section shows these results.

3.2. Game Design Elements

The analysis of correlations between the IMI dimensions and the GDEs implemented in the games helped to a certain degree to answer RQ2, given that some GDEs were positively associated with high values of player enjoyment. Concretely, leaderboards, challenges, and levels were evaluated as the best implemented GDEs, with challenges, adaptation, and theme as the favorite GDEs. The results in this section are presented following the two factors that were evaluated, i.e., Implementation and Preference/Favorite.
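As a rough illustration of what such a correlation analysis computes, the sketch below pairs each participant's Interest/Enjoyment score with their rating of one GDE and measures how strongly the two vary together. The text does not state which coefficient was used, so Pearson's r is an assumption here, and the function `pearson_r` and all values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-participant values; NOT the study's data.
enjoyment = [5.1, 6.0, 4.2, 6.5, 3.8, 5.5]  # IMI Interest/Enjoyment score
challenge_rating = [4, 5, 3, 5, 2, 4]       # rating of the "challenges" GDE
r = pearson_r(enjoyment, challenge_rating)
print(round(r, 2))
```

A positive coefficient means that participants who rated the GDE higher also tended to report more enjoyment; it does not by itself show that the GDE caused the enjoyment.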

3.2.1. Implementation

As shown in Table 3, the GDEs in Tesoros presented more correlations with Interest/Enjoyment and Effort/Importance in comparison to Anotatlón, which was the best-rated game. One-way ANOVA was performed to compare the GDE implementations, but no significant difference was found between the games. This table also provides the correlations with the other dimensions of IMI, in which only two GDEs (i.e., adaptation and customization) were correlated with Competence and Tension/Pressure.

3.2.2. Preference/Favorite

As concerns the preferences for the implemented GDEs, challenges and adaptation were consistently the top preferences across the three games, as can be seen in Table 4. However, the result for adaptation was rather unexpected, given that adaptation had only been implemented in Tesoros. One-way ANOVA followed by Tukey post hoc comparisons was used to identify the differences between the games. These comparisons revealed a significant difference for rewards ($F_{2,104} = 3.351$, $p < 0.05$), and post hoc tests located this difference between Agentes and Anotatlón. Similarly, there was a significant difference for storyline ($F_{2,78} = 6.257$, $p < 0.05$). Here, post hoc tests located the differences between Tesoros and Agentes and between Anotatlón and Agentes.

4. Discussion

This study was designed to evaluate three game concepts and obtain insights into concept preference and the effect of GDEs on player enjoyment. Analysis of the experimental results yielded the following findings.

4.1. Game Concept Preference

Regarding the game concept preference (RQ1), participants preferred the Anotatlón game. This contradicts our hypothesis that Tesoros would be the preferred game. The high level of Interest/Enjoyment shows that this construct was a factor in the selection of this game. The finding is supported by the players’ comments (the comments were made in Spanish; the translations are ours) on why they would play this game again. One participant argued that ‘…is more enjoyable, the concept is the same, but the part before the classification of the word makes it less tedious and more interactive (…es mas divertido, el concepto es el mismo pero la parte previa a la clasificacion de la palabra lo hace menos tedioso y mas interactivo), P2476 (a unique code is used to identify research participants and keep them anonymous)’. Another factor in choosing Anotatlón as a game to replay in the future was the high level of perceived competence. This is confirmed by the comments of participants, such as ‘…I felt challenged while playing by avoiding the obstacles, it motivated me to play again. (…me sentí retado mientras jugaba al esquivar los obstáculos, me motivaba a jugar de nuevo.), P1103’.
The second most preferred game was Tesoros, which also showed a high value of perceived Competence. This is corroborated by the qualitative results, which demonstrated that Tesoros was the game that provided a feeling of achievement for various participants. For example, one participant mentioned the following: ‘perform perfect rounds and win jumps (Hacer rondas perfectas, y ganar saltos), P3957’; another commented ‘try to solve everything correctly to gain lives and coins, and thus position myself in the top 5 (intentar resolver todo de manera correcta para ganar vidas y monedas, y así posicionarme en el top 5), P4162’. That Tesoros came in second could be explained by the fact that five implemented GDEs (i.e., adaptation, challenges, levels, rewards, and theme) were found to be related to enjoyment in this game. This was corroborated by the participants’ comments, such as ‘it is more enjoyable because there is a goal: to collect gold (es más divertido, porque hay un objetivo: colectar oro), P3530’.
Finally, although Agentes was not rated as an enjoyable game, the study offered some interesting results on it. One of these relates to the text and phrases presented in the game: participants pointed out that they liked to see a variety of words and sentences. To illustrate this with concrete comments, ‘I liked discovering the words that are used… (Me gusto descubrir las palabras que se usan…), P3772’ and ‘the infinite sentences that it shows, haha they are nice, never repeat the same ones (las infinitas oraciones que muestra, jaja son muy padres nunca repiten las mismas), P0603’. Additionally, participants considered that the game facilitates learning, e.g., ‘They help you a lot to strengthen basic knowledge of the Spanish language that can be forgotten over time (Te ayudan bastante a fortalecer conocimientos basicos del idioma Español que pueden olvidarse con el tiempo), P0239’. These findings for Agentes might be the result of players being exposed to a larger number of words and phrases than in Tesoros and Anotatlón. Agentes shows ten phrases per round, Tesoros five phrases (which gradually increase as the game progresses), and Anotatlón only three.
Regarding annotations, Agentes produced 54% (3742/6893) of the total annotations compared with 33% (2306/6893) coming from Tesoros, and only 12% (845/6893) from Anotatlón [18]. Designers can benefit from our experience with the three GWAPs we have developed. Agentes demonstrated a more effective game mechanic for collecting PoS tags. In contrast, Anotatlón saw a reduction in data collection due to its additional task—the car race—which was designed to increase player enjoyment. This effect is consistent with the recommendations formulated in Tuite (2014) [21], which suggested only including game mechanics that contribute to the main goal (e.g., the PoS tagging), because additional game mechanics (e.g., car races) can negatively affect players’ ability to complete the main task. Although the tasks in Anotatlón are not performed simultaneously, the time and skills required for the driving task certainly affected the amount of data collected with this GWAP. In terms of balance between annotation collection efficiency and user enjoyment, Tesoros can be positioned between Agentes and Anotatlón. This GWAP received fewer annotations than Agentes, but overall it was rated as more enjoyable. Compared to Anotatlón, Tesoros was less enjoyable but collected more annotations.
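The annotation shares above follow directly from the reported counts, and the same counts give a rough per-player yield. The short sketch below reproduces this arithmetic; the per-player figures are a derived illustration that assumes each game was played by 36 participants (Section 2.1) and ignores differences in playing time.

```python
# Annotation counts per game as reported in the text.
annotations = {"Agentes": 3742, "Tesoros": 2306, "Anotatlón": 845}
PLAYERS_PER_GAME = 36  # each game was played by 36 participants

total = sum(annotations.values())
shares = {game: count / total for game, count in annotations.items()}
per_player = {game: count / PLAYERS_PER_GAME for game, count in annotations.items()}

print(f"total annotations: {total}")
for game in annotations:
    print(f"{game}: {shares[game]:.1%} of all annotations, "
          f"about {per_player[game]:.0f} per player")
```

Under these assumptions, a player of Agentes contributed several times as many annotations as a player of Anotatlón, which makes the trade-off between collection efficiency and enjoyment discussed above easy to quantify.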
The following section discusses whether the implemented GDEs influenced the levels of the dimensions of the IMI to answer RQ2.

4.2. Game Design Elements

To answer RQ2, the present study showed correlations between Interest/Enjoyment and seven GDEs, namely, adaptation, challenges, customization, leaderboards, levels, rewards, and theme. This is consistent with previous studies that also demonstrated relationships between these GDEs and enjoyment. However, precise comparisons with our results cannot be made, given that most previous research assessed individual GDEs or game concepts that differ too much from ours.
Rewards was rated as the best implemented GDE in Tesoros and Anotatlón, and, even though it was not the most preferred, its implementation translated into a significant difference in GDE preference between Agentes and Anotatlón. Further, for Tesoros, rewards showed a high level of Interest/Enjoyment. A possible explanation for these results might be that the implementation of rewards is similar for Anotatlón and Tesoros, but not for Agentes. For Agentes, the rewards consist of points that eventually become badges. In contrast, in Tesoros and Anotatlón, the rewards are coins that players can use to buy items, e.g., unlock new categories or power-ups (i.e., jumps and shields) that help them to continue playing. In that regard, participants mentioned ‘the second [Anotatlón] seems better to me with a little more entertainment, better rewards which you can use in the same game as the shields. (el segundo [Anotatlón] me parece mejor con un poco mas de entretenimiento, mejores recompensas las cuales puedes usar en el mismo juego como los escudos.), P3772’. The results on the use of points in Agentes are thus consistent with those from previous studies, which suggest that rather than simply using a scoring system, designers should contextualize this reward to the main purpose of the game, helping to connect and better motivate players in the game [22]. Our findings are also in line with the observations by Johnson et al. [23], who showed that a higher number and diversity of rewards (as in Tesoros and Anotatlón) lead to more enjoyment.
Participants consistently rated challenges as the preferred GDE for the three games. A correlation between Interest/Enjoyment and the preference for this GDE was not observed, but a correlation between Interest/Enjoyment and its implementation was found for the three games. The qualitative analysis also corroborated these quantitative results, as participants stated that challenges made the game more enjoyable and motivating, e.g., ‘It has obstacles and that makes it more fun. (Tiene obstáculos y eso lo hace más divertido), P2274’, and ‘…there is a challenge and it is more dynamic (…hay un reto y es más dinámico), P3015’. Additionally, Tesoros and Anotatlón presented a higher correlation value between challenges and Interest/Enjoyment compared to Agentes, which might be explained by the fact that there were more challenging elements that the players could encounter, such as completing the race or performing perfect rounds. These results align with [24], which indicated that optimal levels of challenge result in better values of competence and flow. In contrast, in Agentes, the goal of obtaining more than 100 points to be validated as a human agent seemed easy, so it did not represent a challenge for the participants.
Adaptation was also rated as one of the preferred GDEs. The reason for this may be the inherent variability of the sentences that were presented throughout the games, leading participants to believe that all three games adapt their difficulty to the player as they play. However, this GDE was only implemented explicitly in Tesoros based on player performance, which led to higher values of Interest/Enjoyment in Tesoros and a positive correlation between adaptation and Interest/Enjoyment. These results are consistent with findings from [25,26], who observed increased flow state and scores when playing a game that adapted to players’ performance.
Other GDEs that had a particular effect on the game concept preference were the avatar and storyline. For Tesoros, the avatar implementation showed a correlation to Effort/Importance, and avatar preference correlated to Tension/Pressure. These results might have been due to people’s empathy for the avatar, so they tried a little harder to help him, e.g., ‘one feels enthusiastic helping the little character (uno se siente entusiasmado ayudando el Monito), P0343’ and ‘Seeing the avatar move may motivate to keep playing… (El ir viendo cómo se mueve el avatar quizá motiva a seguir jugando…), P2918’. This aligns with the study of Birk et al. [27], which claimed that greater identification leads to an increase in intrinsic motivation. Storyline presented a significant difference in GDE preference between Tesoros and Agentes and between Anotatlón and Agentes, and a negative correlation with Tension/Pressure for Anotatlón. The first result can be explained because the storyline in Agentes was developed explicitly, including the participant as an agent inside the story. Participants were aware of this and rated it positively compared to Tesoros and Anotatlón, in which no storyline explained why they have to collect coins or win a race. The second result could be because participants who experienced less Tension/Pressure rated the storyline as a less preferred GDE. In contrast to previous studies [28,29], the implementation of this GDE in Agentes did not increase enjoyment as much as desired.
Levels, leaderboards, and customization also provided positive outcomes in the different constructs. On the contrary, progress did not yield results. This may have been because players did not perceive progress, or they could associate it with other GDEs, such as levels. Taken together, the results suggest that there is indeed an association between GDE implementation and concept preference. Figure 4 shows a visual summary of the results. The quadrants represent each of the constructs of the IMI. The circles represent the game concepts, and their size indicates the values of the IMI constructs for each game. The squares show the GDEs that had an impact on each of the constructs of the IMI, arranged in order from the highest to the lowest level of influence, as indicated by the colored triangle. As seen, Anotatlón obtained the highest Interest/Enjoyment and Effort/Importance scores. Tesoros showed the highest feeling of Competence, and Agentes presented the highest level of Tension/Pressure. The arrow from Competence indicates that this construct was the only one that predicted a player’s decision to play again, in this case, Anotatlón. Concerning the relationships between the GDEs and the IMI constructs, Tesoros showed the highest number of relationships with Effort/Importance and Interest/Enjoyment. In second place was Anotatlón, with GDEs mainly related to Interest/Enjoyment. These results suggest that preference for a game is influenced by the overall game concept and the perception of the different GDEs implemented. Furthermore, the GDEs demonstrated their ability to influence positive or negative outcomes in the satisfaction of basic needs.

4.3. Limitations and Future Work

Due to COVID-19, the experiment was conducted remotely to maintain social distancing. The games were designed to be played online, so a remote study was a natural extension. However, some observations during playing sessions may have been missed, which could have provided additional insights into participants’ experiences. Another limitation is a possible bias in the data due to the differences in time played, which could have affected the results. Some players spent 10 min, while others spent 20 min, mainly depending on whether the session was supervised or unsupervised. In the unsupervised sessions, it could not be determined whether players explored all game features and played effectively for 10 min. The analysis of time and enjoyment showed a tendency for enjoyment to increase as the participant played longer. However, as the time variable was not controlled, it would be interesting for future experiments to control it and explore whether it is a factor in increasing enjoyment.
Follow-up work included improving the games based on the study results and releasing them to the crowd. Once the games were freely accessible, another study was conducted. That study was less controlled than the one presented in this paper: it required only informed consent, and the questionnaires were presented at two points (i.e., at the end of training and at the end of one level). The goal was to monitor long-term player enjoyment and to continue collecting PoS tags. From a linguistic point of view, additional data analysis evaluated the accuracy of the annotations collected; these results are reported in [30]. That study highlighted the effectiveness of GWAPs for verifying PoS tags and improving accuracy rates. It also showed that the education level, field of study, and geographical background of the GWAP players had a significant impact on the accuracy of PoS tagging. A linguistically proficient annotator would naturally provide more PoS tags per unit of time than a GWAP player (with or without linguistic knowledge). However, given the scale of the corpus, relying on expert annotators may become expensive in the long term. In addition, linguistic tasks such as PoS tagging can be tedious and may not appeal to participants unless they are gamified. For these reasons, GWAP approaches based on the principle of crowdsourcing are promising, as they focus on scalability and enjoyment rather than on the efficiency of any single contributor.

5. Conclusions

This study evaluated three games (Agentes, Tesoros, and Anotatlón) that focus on the linguistic task of revising and correcting the pre-assigned PoS tags of a corpus. Although many GWAPs have been designed before, including games for PoS tagging, this work aimed to ground its design decisions in order to explore the effects of GDEs on enjoyment and concept preference. The evaluation results showed that, for this linguistic annotation task, the best-rated games in terms of enjoyment were Anotatlón and Tesoros. The former was the most enjoyable game as a whole concept, while the latter elicited enjoyment through its individual GDEs, such as levels, challenges, and rewards. The findings indicate that the GDEs contributed to positive outcomes for enjoyment. We therefore encourage game designers to draw on these findings and to explore similar or alternative GDEs in their own GWAPs or other types of serious games. At the same time, we encourage designers and researchers to thoroughly examine the effect of these game concepts and GDEs in different contexts, to avoid generalizations that lack demonstrated benefits.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app15073561/s1.

Author Contributions

Conceptualization, R.L.S.D.; Methodology, R.L.S.D.; Software, R.L.S.D.; Writing—original draft, R.L.S.D.; Writing—review & editing, G.R.R., M.B., V.H. and K.C.; Visualization, R.L.S.D.; Supervision, G.R.R., M.B. and K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was carried out as part of a medium-scale research infrastructure project funded by the Flemish Research Fund (Fonds voor Wetenschappelijk Onderzoek, FWO) entitled “A Respeaking and Collaborative Game-Based Approach to Building a Parsed Corpus of European Spanish Dialects” (project reference: I000418N).

Institutional Review Board Statement

The study was approved by the Ethics Committee (SMEC) of UHasselt with approval number REC/SMEC/VRAI/201/115, granted on 25 March 2021.

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the embargo on the PhD dissertation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chamberlain, J.; Fort, K.; Kruschwitz, U.; Lafourcade, M.; Poesio, M. Using Games to Create Language Resources: Successes and Limitations of the Approach. In The People’s Web Meets NLP: Collaboratively Constructed Language Resources; Springer: Berlin/Heidelberg, Germany, 2013; pp. 3–44.
  2. COSER = Inés Fernández-Ordóñez (dir.) (2005-): Corpus Oral y Sonoro del Español Rural. Available online: http://www.corpusrural.es (accessed on 24 February 2025).
  3. Chambers, J.K.; Trudgill, P. Dialectology; Cambridge University Press: Cambridge, UK, 1998.
  4. Caroux, L.; Isbister, K.; Le Bigot, L.; Vibert, N. Player-video game interaction: A systematic review of current concepts. Comput. Hum. Behav. 2015, 48, 366–381.
  5. von Ahn, L.; Kedia, M.; Blum, M. Verbosity: A Game for Collecting Common-Sense Facts. In CHI’06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Montréal, QC, Canada, 22–27 April 2006; Association for Computing Machinery: New York, NY, USA, 2006; pp. 75–78.
  6. Guillaume, B.; Fort, K.; Lefebvre, N. Crowdsourcing Complex Language Resources: Playing to Annotate Dependency Syntax. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 3041–3052.
  7. Madge, C.; Bartle, R.; Chamberlain, J.; Kruschwitz, U.; Poesio, M. Incremental Game Mechanics Applied to Text Annotation. In CHI PLAY’19: Proceedings of the Annual Symposium on Computer-Human Interaction in Play, Barcelona, Spain, 22–25 October 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 545–558.
  8. Poesio, M.; Chamberlain, J.; Kruschwitz, U.; Robaldo, L.; Ducceschi, L. Phrase detectives: Utilizing collective intelligence for internet-scale language resource creation. ACM Trans. Interact. Intell. Syst. 2013, 3, 1–44.
  9. Madge, C.; Yu, J.; Chamberlain, J.; Kruschwitz, U.; Paun, S.; Poesio, M. Crowdsourcing and aggregating nested markable annotations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 28 July–2 August 2019; pp. 797–807.
  10. Venhuizen, N.J.; Basile, V.; Evang, K.; Bos, J. Gamification for Word Sense Labeling. In Proceedings of the 10th International Conference on Computational Semantics, IWCS 2013, Long Papers, Potsdam, Germany, 19–22 March 2013; pp. 397–403.
  11. Deterding, S.; Dixon, D.; Khaled, R.; Nacke, L. From Game Design Elements to Gamefulness. In MindTrek’11: Proceedings of the 15th International Academic MindTrek Conference on Envisioning Future Media Environments, Tampere, Finland, 28–30 September 2011; Association for Computing Machinery: New York, NY, USA, 2011; p. 9.
  12. Von Ahn, L.; Dabbish, L. Designing games with a purpose. Commun. ACM 2008, 51, 58–67.
  13. Poesio, M.; Chamberlain, J.; Kruschwitz, U. Crowdsourcing. In Handbook of Linguistic Annotation; Springer: Dordrecht, The Netherlands, 2017; pp. 277–295.
  14. Mekler, E.D.; Bopp, J.A.; Tuch, A.N.; Opwis, K. A Systematic Review of Quantitative Studies on the Enjoyment of Digital Entertainment Games. In CHI’14: Proceedings of the Conference on Human Factors in Computing Systems, Toronto, ON, Canada, 26 April–1 May 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 927–936.
  15. Boyle, E.A.; Connolly, T.M.; Hainey, T.; Boyle, J.M. Engagement in digital entertainment games: A systematic review. Comput. Hum. Behav. 2012, 28, 771–780.
  16. Honnibal, M.; Montani, I.; Van Landeghem, S.; Boyd, A. spaCy: Industrial-Strength Natural Language Processing in Python; Zenodo: Honolulu, HI, USA, 2020.
  17. Segundo Díaz, R.L.; Rovelo, G.; Bouzouita, M.; Hoste, V.; Coninx, K. The Influence of Personality Traits and Game Design Elements on Player Enjoyment: A Demo on GWAPs for Part-of-Speech Tagging. In Serious Games: Proceedings of the 9th Joint International Conference, JCSG 2023, Dublin, Ireland, 26–27 October 2023; Haahr, M., Rojas-Salazar, A., Göbel, S., Eds.; Springer: Cham, Switzerland, 2023; Volume 14309, pp. 353–361.
  18. Segundo Díaz, R.L.; Bonilla, J.E.; Bouzouita, M.; Rovelo Ruiz, G. Juegos con propósito para la anotación del Corpus Oral Sonoro del Español rural. Dialectologia et Geolinguistica 2023, 31, 135–164.
  19. McAuley, E.D.; Duncan, T.; Tammen, V.V. Psychometric properties of the intrinsic motivation inventory in a competitive sport setting: A confirmatory factor analysis. Res. Q. Exerc. Sport 1989, 60, 48–58.
  20. Segundo Díaz, R.L.; Rovelo Ruiz, G.; Bouzouita, M.; Hoste, V.; Coninx, K. The Influence of Personality Traits and Game Design Elements on Player Enjoyment: An Empirical Study on GWAPs for Linguistics. In Games and Learning Alliance: Proceedings of the 12th International Conference, GALA 2023, Dublin, Ireland, 29 November–1 December 2023; Dondio, P., Rocha, M., Brennan, A., Schönbohm, A., de Rosa, F., Koskinen, A., Bellotti, F., Eds.; Springer: Cham, Switzerland, 2023; Volume 14475, pp. 204–213.
  21. Tuite, K. GWAPs: Games with a Problem. In Proceedings of the 9th International Conference on the Foundations of Digital Games, FDG 2014, Liberty of the Seas, Caribbean, 3–7 April 2014.
  22. Jia, Y.; Xu, B.; Karanam, Y.; Voida, S. Personality-targeted Gamification: A Survey Study on Personality Traits and Motivational Affordances. In CHI’16: Proceedings of the Conference on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 2001–2013.
  23. Johnson, D.; Klarkowski, M.; Vella, K.; Phillips, C.; McEwan, M.; Watling, C.N. Greater rewards in videogames lead to more presence, enjoyment and effort. Comput. Hum. Behav. 2018, 87, 66–74.
  24. Yildirim, I.G. Time Pressure as Video Game Design Element and Basic Need Satisfaction. In CHI EA’16: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 2005–2011.
  25. Alves, T.; Gama, S.; Melo, F.S. Flow Adaptation in Serious Games for Health. In Proceedings of the 2018 IEEE 6th International Conference on Serious Games and Applications for Health (SeGAH), Vienna, Austria, 16–18 May 2018; pp. 1–8.
  26. Mildner, P.; Stamer, N.; Effelsberg, W. From Game Characteristics to Effective Learning Games. In Serious Games: Proceedings of the First Joint International Conference, JCSG 2015, Huddersfield, UK, 3–4 June 2015; Göbel, S., Ma, M., Baalsrud Hauge, J., Oliveira, M.F., Wiemeyer, J., Wendel, V., Eds.; Springer: Cham, Switzerland, 2015; pp. 51–62.
  27. Birk, M.V.; Atkins, C.; Bowey, J.T.; Mandryk, R.L. Fostering Intrinsic Motivation through Avatar Identification in Digital Games. In CHI’16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 2982–2995.
  28. Prestopnik, N.R.; Tang, J. Points, stories, worlds, and diegesis: Comparing player experiences in two citizen science games. Comput. Hum. Behav. 2015, 52, 492–506.
  29. Wang, X.; Goh, D.H.L.; Lim, E.P.; Vu, A.W.L. Aesthetic experience and acceptance of human computation games. In Digital Libraries: Providing Quality Information: Proceedings of the 17th International Conference on Asia-Pacific Digital Libraries, ICADL 2015, Seoul, Republic of Korea, 9–12 December 2015; Allen, R.B., Hunter, J., Zeng, M.L., Eds.; Springer: Cham, Switzerland, 2015; Volume 9469, pp. 264–273.
  30. Bonilla, J.E.; Segundo Díaz, R.L.; Bouzouita, M. Using GWAPs for Verifying PoS Tagging of Spoken Dialectal Spanish. In Proceedings of the 2023 10th International Conference on Behavioural and Social Computing (BESC), Larnaca, Cyprus, 30 October–1 November 2023; pp. 1–7.
Figure 1. The common GDEs implemented for all games are shown in green, and within the boxes in blue, the specific GDEs implemented for each game. Grey GDEs were not implemented.
Figure 2. Screenshots of the three game concepts used in the present work. (a) Agentes, (b) Tesoros, and (c) Anotatlón. GWAPs are in Spanish. The images have been translated for ease of understanding. For your reference, the original images can be found in the Supplementary Material.
Figure 3. Experimental procedure. Black lines show the steps followed by participants during the experiment, and grey lines show activities performed by the facilitator and the storage of annotations by the games.
Figure 4. Visual overview of the relationships among games, GDEs, and IMI constructs. Quadrants represent IMI constructs. Circles depict game concepts, with size reflecting IMI construct values. Squares indicate GDEs that had an impact on each IMI construct. Visuals: Earth for Agentes, red avatar for Tesoros, and yellow car for Anotatlón.
Table 1. Mean scores and standard deviations for participant perceptions in each game regarding IMI constructs (1—Not at all true, 7—Very true).
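| IMI | Agentes Mean | Agentes SD | Tesoros Mean | Tesoros SD | Anotatlón Mean | Anotatlón SD |
|---|---|---|---|---|---|---|
| Interest/Enjoyment | 4.49 | 1.57 | 5.18 | 1.30 | 5.46 | 1.15 |
| Competence | 5.37 | 1.15 | 5.79 | 0.89 | 5.07 | 1.22 |
| Effort/Importance | 4.08 | 1.36 | 3.74 | 1.26 | 4.21 | 1.30 |
| Tension/Pressure | 2.80 | 0.91 | 2.62 | 0.81 | 2.70 | 1.06 |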
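Construct scores such as those in Table 1 are typically obtained by averaging the items of each IMI subscale per participant and then summarizing across participants. The minimal sketch below illustrates this under stated assumptions: the item names, the responses, and the set of reverse-scored items are hypothetical, and the reversal rule (subtracting the response from 8 on a 1 to 7 scale) is the standard IMI convention rather than a detail confirmed for this study.

```python
from statistics import mean, stdev

SCALE_MIN, SCALE_MAX = 1, 7  # response scale given in the Table 1 caption


def reverse_score(score):
    """Reverse a response on the 1-7 scale (1 <-> 7, 2 <-> 6, ...)."""
    return SCALE_MAX + SCALE_MIN - score


def construct_score(responses, reversed_items=frozenset()):
    """Average one participant's item responses for a single construct."""
    adjusted = []
    for item, score in responses.items():
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"{item}: response {score} is outside the scale")
        adjusted.append(reverse_score(score) if item in reversed_items else score)
    return mean(adjusted)


def summarize(scores):
    """Mean and sample standard deviation across participants."""
    return mean(scores), stdev(scores)


# Hypothetical responses of three participants to three hypothetical items.
participants = [
    {"enjoy_1": 6, "enjoy_2": 5, "enjoy_3_r": 2},
    {"enjoy_1": 4, "enjoy_2": 4, "enjoy_3_r": 4},
    {"enjoy_1": 7, "enjoy_2": 6, "enjoy_3_r": 1},
]
REVERSED = {"enjoy_3_r"}  # assumed negatively worded item

scores = [construct_score(p, REVERSED) for p in participants]

if __name__ == "__main__":
    m, sd = summarize(scores)
    print(f"mean = {m:.2f}, SD = {sd:.2f}")
```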
Table 2. Logistic regression analysis of participants playing Anotatlón. * indicates p < 0.05, ** indicates p < 0.01.
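| Predictor | Estimate | Std. Error | z Value | Pr(>\|z\|) | Sig. | OR |
|---|---|---|---|---|---|---|
| (Intercept) | −14.8777 | 5.6344 | −2.6400 | 0.0083 | ** | 0.0000 |
| Interest/Enjoyment | 0.9423 | 0.6939 | 1.3580 | 0.1745 | | 2.5660 |
| Competence | 1.3923 | 0.6153 | 2.2630 | 0.0237 | * | 4.0243 |
| Effort/Importance | 0.4474 | 0.5094 | 0.8780 | 0.3799 | | 1.5642 |
| Tension/Pressure | 0.8707 | 0.7750 | 1.1230 | 0.2612 | | 2.3886 |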
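The columns of Table 2 are linked by the usual Wald relationships for a logistic regression: the z value is the estimate divided by its standard error, the two-sided p-value follows from the standard normal distribution, and the odds ratio (OR) is the exponential of the estimate. The minimal sketch below recomputes these quantities from the reported estimates and standard errors. It assumes that a Wald test was used, which the table layout suggests but which is not stated here, and the function names are illustrative only.

```python
import math

# Estimates and standard errors as reported in Table 2.
TABLE_2 = {
    "(Intercept)": (-14.8777, 5.6344),
    "Interest/Enjoyment": (0.9423, 0.6939),
    "Competence": (1.3923, 0.6153),
    "Effort/Importance": (0.4474, 0.5094),
    "Tension/Pressure": (0.8707, 0.7750),
}


def wald_z(estimate, std_error):
    """Wald statistic: coefficient divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def odds_ratio(estimate):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(estimate)


def summarize(table):
    """Recompute z, p, and OR for every row of the table."""
    rows = {}
    for name, (estimate, std_error) in table.items():
        z = wald_z(estimate, std_error)
        rows[name] = {"z": z, "p": two_sided_p(z), "OR": odds_ratio(estimate)}
    return rows


if __name__ == "__main__":
    for name, row in summarize(TABLE_2).items():
        print(f"{name:20s} z={row['z']:7.3f}  p={row['p']:.4f}  OR={row['OR']:.4f}")
```

Read this way, the OR of about 4.02 for Competence means that each one-point increase in the Competence score multiplies the odds of deciding to play Anotatlón again by roughly four, with the other constructs held fixed.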
Table 3. Descriptive statistics and correlation matrix between the GDE implementation and the IMI. Bold numbers show the highest-rated GDE implementations. INT.ENJ = Interest/Enjoyment, COMP = Competence, EFF.IMP = Effort/Importance, TEN.PRES = Tension/Pressure. * indicates p < 0.05, ** indicates p < 0.01, *** indicates p < 0.001.
GDEAgentesTesorosAnotatlón
MeanSDINT.ENJCOMPMeanSDINT.ENJEFF.IMPMeanSDINT.ENJCOMPEFF.IMPTEN.PRES
Adaptation6.972.34 0.50 *7.612.370.49 **0.39 *7.532.20
Avatar2.742.49 6.722.95 0.42 *6.053.55
Challenges7.432.130.36 * 7.312.240.57 ***0.60 ***7.912.080.60 *** 0.49 **
Customization3.292.88 5.003.560.46 * 8.032.990.41 * −0.39 *
Leaderboards7.911.84 7.472.57 0.46 **7.752.410.37 *0.41 *
Levels7.152.580.45 ** 7.312.840.60 ***0.38 *7.402.780.36 * 0.40 *
Progress5.883.09 7.572.68 6.503.06
Rewards6.892.76 8.691.720.36 *0.38 *8.391.68
Storyline5.582.88 5.112.59 4.532.84
Theme6.472.550.42 * 7.692.100.56 *** 7.292.97
Table 4. Descriptive statistics and correlation matrix between the GDE preferences and the IMI. Bold numbers show the best rated GDEs. INT.ENJ = Interest/Enjoyment, COMP = Competence, TEN.PRES = Tension/Pressure. * indicates p < 0.05.
GDEAgentesTesorosAnotatlón
MeanSDTEN.PRESMeanSDINT.ENJCOMPTEN.PRESMeanSDINT.ENJCOMPTEN.PRES
Adaptation3.442.16 4.062.46 3.892.26
Avatar8.051.99 5.692.88 −0.37 *6.103.16
Challenges3.401.82 3.281.88 0.35 *0.38 *3.032.16 0.36 *
Customization7.102.30 7.452.54 5.832.76
Leaderboards5.232.35 6.222.400.40 * 5.892.140.38 *0.35 *
Levels4.592.300.38 *4.812.18 4.601.97 0.38 *
Progress5.761.69 5.432.17 6.152.02
Rewards5.372.28 4.422.32 4.002.26
Storyline5.243.39 7.412.44 7.792.70 −0.49 *
Theme4.032.97 4.002.86 4.572.84

Share and Cite

MDPI and ACS Style

Segundo Díaz, R.L.; Rovelo Ruiz, G.; Bouzouita, M.; Hoste, V.; Coninx, K. Games with a Purpose for Part-of-Speech Tagging and the Impact of the Applied Game Design Elements on Player Enjoyment and Games with a Purpose Preference. Appl. Sci. 2025, 15, 3561. https://doi.org/10.3390/app15073561
