Behavioural Effects of Spatially Structured Scoring Systems in Location-Based Serious Games—A Case Study in the Context of OpenStreetMap

Location-based games have become popular in recent years, with Pokémon Go and Ingress being two very prominent examples. Some location-based games, known as Serious Games, go beyond entertainment and serve additional purposes such as data collection. Such games are also found in the OpenStreetMap context and playfully enrich the project’s geodatabase. Examples include Kort and StreetComplete. This article examines the role of spatially structured scoring systems as a motivational element. We analyse how spatial structure in scoring systems correlates with changes observed in game behaviour. For this purpose, our study included two groups of subjects who played a modified game based on StreetComplete in a real urban environment. One group played the game with a spatially structured scoring system and the other with a spatially random scoring system. We evaluated different indicators and analysed the players’ GPS trajectories. In addition, the players filled out questionnaires to investigate whether they had become aware of the scoring system they were playing. The results show that players who are confronted with a spatially structured scoring system are more likely to be in areas with high scores, have a longer playing time, walk longer distances and are more willing to take detours. Furthermore, discrepancies between the perception of a possible system in the scoring system and corresponding actions were revealed. The results are informative for game design, but also for a better understanding of how players interact with their geographical context during location-based games.


Introduction
The ubiquitous deployment of mobile devices like smartphones equipped with positioning technology has made possible the proliferation of location-based games. Popular examples include Pokémon Go and Ingress, which conflate the physical and the digital sphere through augmented reality and the experience in real-world spaces. Location-based games have sparked interest in academic research in various fields, such as geography [1][2][3] and human-computer interaction [4][5][6]. Researchers thereby leverage digital games to investigate a range of complex mechanisms, including knowledge sharing [7], social collaboration [8], and negotiation processes in the digital realm [9]. The research presented in this article focuses on the game element of scoring systems. These are used to reward players for completing in-game tasks, and are hence an element of gamification that has an impact on the motivation of players. In location-based serious games, players explore real-world and thus, geographical playing fields. Therefore, we assume that a systematic spatial pattern in scoring systems might be a relevant factor affecting the behaviour of players during the game. Our research describes a case study conducted in Heidelberg, Germany, and is based on a modified version of the game StreetComplete. Two groups of players were presented with two different versions of the game, each featuring a different scoring system. One group played the game with a spatially clustered scoring system, whereas a control group was confronted with spatially randomised scores. The tasks to be solved were real OSM-related assignments. These were controlled for a comparable degree of difficulty in order to avoid any distorting effects external to the scoring systems used. We tested the hypothesis that the spatial structuring of a scoring system has a significant effect on player behaviour and motivation.
In order to test this hypothesis, we assessed several indicators, all of which are related to motivational aspects of games. We further tracked the players' GPS trajectories to investigate the geographical exploration of the playing field. In addition, the study was accompanied by questionnaires the subjects were asked to complete. This allowed us to gain additional insights into how the players experienced the game under their respective scoring system. Note that in this study, we do not address how behavioural responses of players to spatial structure in scores interact with geographical features like different street layouts or the local building morphology. Instead, this study concerns the influence of spatial structure in comparison to the complete absence of a spatial pattern in scores. In addition to interesting findings regarding a relatively unexplored area of gamification, this study offers results that are also of practical interest for the targeted design of location-based games.

Literature Review
Games can be defined as "rule-based, formal systems [...] in which different outcomes are assigned different values" ([33], p. 35). These systems can be described as spatiotemporal spheres largely separate from everyday life, with their own internal logic, rules, objectives, virtual goods and other characteristics [34,35]. In summary, these characteristics are referred to as gameplay, which has to be considered holistically and is essential for the experienced enjoyment of a game [36]. All elements of gameplay are closely linked and mutually interrelated [37]. This work focuses on the motivating aspect of spatial patterns in scoring systems, one specific element of gameplay. The following subsections provide an overview of motivation in games in general, as well as of particularly geographic motivational factors.
It should be noted that the term motivation is not used consistently in the literature. Some authors use the term to refer to pre-game incentives that make a potential player play. Others, however, use motivation to refer to elements that keep players playing during a game, which is sometimes also referred to as engagement [38,39]. This distinction is seldom made explicit, and we therefore use the term motivation to refer to both types of activation interchangeably in this work.

Motivation in Games
Motivations for playing games can be manifold: escape from daily routine, the creative act of building virtual worlds (e.g., cities), the competitive fighting of battles, the experience of high levels of interaction, or a relaxed atmosphere in which patterns and structures must be disclosed (e.g., solving puzzles) [40,41]. These types of motivations discriminate games from more linear forms of entertainment such as television or reading books [42]. Playing games is thereby considered an active engagement whereby players constantly have to make decisions [43]. Therefore, one central issue that makes motivational elements in games important is the constant confrontation of a player with conflicting goals. For example, players often have to weigh between safe options with lower rewards and riskier ones offering more lucrative benefits [44]. The resolution of such conflicts psychologically results in an important aspect of motivation in games: self-determination. According to self-determination theory, games with a high degree of self-determination are perceived as particularly motivating [45]. The theory of self-determination thereby consists of three components, each of which satisfies one psychological basic need: competence, autonomy and the feeling of social inclusion.
Solving challenges leads to experiences of competence and is central to many games. Challenges are usually embedded in an interactive narrative that adds a specific theme and makes the challenge seem more meaningful [37]. The game design should create a game experience tailored to the player's competence. In this way, the game retains attention and keeps the player in the game [46]. When the level of difficulty of a game is optimally adjusted to the level of competence of the player, the player enters a so-called flow, a long period of concentrated engagement with the game [37,47]. In order to achieve this, it is necessary to communicate a clear game objective, to avoid distraction and to optimally couple both mental and physical challenges. However, the greatest influence on a player's competence experience is feedback [48,49]. Feedback can be both informative and performance-enhancing in nature. In both cases, feedback leads to a positive reinforcement of the experienced competence when it is immediate, a mechanism known as operant conditioning [50,51].
In addition to the quest for challenges, the player's autonomous coping with them is just as important for the motivation of a player. Players want to choose challenges or ways to solve them independently. In doing so, they want to make meaningful decisions that affect their success in the game and thus enable the experience of autonomy [49,52]. Players experience autonomy when their actions in a game are strongly aligned with their own intrinsic values and interests [49,52]. Often, a single choice in a game can be sufficient to generate a satisfactory level of autonomy experience [45,52]. In this sense, supportive feedback that regularly endorses the actions of players and confirms their autonomous decisions is an important design feature of successful games [53]. What reinforces this effect further is social embedding in the form of interaction with friends or other players. This interaction can take either a cooperative or competitive form [45]. The highest degree of social inclusion can be achieved through a cooperative team-oriented game mode, which also partly explains the current success of so-called Massively Multiplayer Online Role-Playing Games [42,52,54].

Geographic Motivational Factors
In contrast to other types of games, the players of location-based games move in real geographical space [55][56][57], so that the latter becomes an important game element [58]. The virtual world, which is created by a game in its own virtual space-time sphere, is now blended with the context of an actual physical environment. The player interacts with both simultaneously and thus, real actions influence the course of a game [59,60]. This blurring of two previously separate universes requires the consideration of a number of additional motivational factors.
One geographic factor that influences the motivation of players is the popularity of the playing field. It is known that players of location-based games, as well as OSM contributors, prefer urban playing fields, including tourist and popular areas [20,28]. The reason for this is not only the correlation of such areas with popularity, but also the often easier accessibility of the playing field, which emphasises the importance of the further embedding of the latter into its wider geographical context. In addition to this macro view, the local physical design of a playing field can also influence the motivation of players on a smaller scale. For example, OSM authors prefer to map features they consider either more popular or more relevant (e.g., traffic signs and pedestrian crossings), and this may well be the case for location-based serious games in the same OSM context [61]. Similarly, the salience and complexity of features influence the willingness of players to engage with them in location-based games [62], which, in turn, influences the level of detail at which they are captured or mapped.
An additional geographic motivating factor is the possibility of integrating a location-based game into everyday life. People interact with their own everyday activity spaces on a regular basis. Location-based games offer the possibility to experience the same familiar and everyday spaces in new ways. This can motivate some players to interact with their everyday environment in different ways than they are used to [31,59,60]. Likewise, the distributions of tasks or the ways in which scoring systems are used can be adapted to local conditions. Thus, the exploration of an area can be enriched by additional sensory experiences (e.g., tasks that require the perception of striking colours or the ringing of church bells) or the stronger examination of the history of a supposedly known place [56,63]. Such local adaptations create unique game experiences that have the potential to motivate players, especially when one considers that people conceptualise their everyday geographies differently [64][65][66][67][68][69]. At the same time, this focus on small-scale places can also limit the geographical reach of players, as playing in a very local environment means spending more time in a very confined space [63,70], which may affect the players' geo-literacy if not compensated otherwise [71,72]. Time-wise, most games follow an asynchronous game mode [73]. This means that players can enter the game and the playing field at any time independently of other players, which makes it easier to integrate a game into everyday life [60].
Other motivating factors related to geographic aspects are those linked to the players' ability and willingness to explore geographical areas. Many players of location-based games are classified as so-called explorers [74]. The majority of players, therefore, perceive the exploration of a geographical space as a rewarding, motivating element of location-based games with high intrinsic value [1,31,32,38,75,76]. In particular, the combination with high degrees of autonomy, as outlined above, is able to increase the motivational potential of extensive territorial exploration [60,63]. Similarly to the exploration of areas, many players also describe the discovery of hidden features at specified coordinates in a geocaching sense as motivating and satisfying, especially in combination with solving tasks adapted to local geographic conditions [31,77,78]. A side effect of playing that is also an effective motivating element is an increase in the range of movement and daily physical activity, which has positive health effects [79]. To round off this overview, the following non-exhaustive list names further geographical factors influencing the motivation of players. These factors are important for the design of games but are considered less relevant in the context of the study presented in this paper:
• weather conditions [31,80],
• time of day including related factors such as pedestrian traffic [81],
• ambient stress exposure [82],
• morphology of places and street networks (e.g., frequent dead ends) [83,84],
• availability of landmarks for orientation and navigation [85,86],
• environmental stress factors like air pollution [82,87].

Methodology
Our methodology uses triangulation, an approach to increase the trustworthiness of research results [31]. We observed the behaviour of the players during the game, cross-checked the results with those obtained from questionnaires and analysed the recorded GPS tracks. In this way, we could identify indications of players' awareness of the spatial structure in the scoring system. The following subsections describe the study's methodological approach and the geographical, sociodemographic, and technical context.

Modified Game
The game we used in our experiment is based on the StreetComplete application. This application allows users to collect OSM data in a playful way according to the typical crowdsourcing procedure of task definition, task execution and task solving [78]. The game StreetComplete uses very little jargon, making it accessible to a wide audience, including players without OSM experience. This is further facilitated by visual support for the players, for instance, by providing supporting images illustrating possible answers to the tasks presented (Figure 1). These design features reduce the risk of erroneous contributions and harmonise the otherwise often ambiguous and heterogeneous semantic classifications of OSM features resulting from the underlying folksonomy [88]. In terms of motivation and reward, StreetComplete is open-ended, meaning that players altruistically collect new or refine existing data. Still, one possible way in which players are rewarded is a counter of tasks solved, although this is a non-competitive leaderboard element of the game not visible to other players. For our experiment, the accessible design of the game is important as this reduces potential interferences that may arise from different prior experiences with OSM among our subjects. The game was modified in various ways in order to allow testing of our hypothesis. This is possible because StreetComplete is open source and its code is available online (https://github.com/westnordost/StreetComplete).
One modified aspect is the set of tasks presented to the players. All tasks used are real tasks taken from the actual game StreetComplete. The subset of tasks we used meets the following criteria:

1. Simple tasks were preferred to complicated ones in order to minimise the risk of external distorting effects beyond the scoring system,
2. all tasks must be fully accessible to ensure that they can be solved throughout the game (e.g., indoor tasks are not included),
3. the tasks should be uniformly distributed across the map to avoid spatial bias from local clusters,
4. and the walking distance between tasks should be about 100 m, as pretests carried out by us have shown that this leads to the most neutral visual result on the map at all relevant zoom levels.
In order to compile a corresponding subset of tasks, we overlaid the tasks available from StreetComplete with a regular grid of 100 m in cell side length. The tasks closest to the grid nodes were taken into account for the game. For cases of equal distances, we preferred simple geometric elements such as points to more complex ones such as polygons. We also preferred simpler tasks such as binary yes/no questions over more complex free text input. This led to a useful distribution of tasks, but there is a slight discrepancy between the grid used and the topography of the playing field, which is unavoidable when using real tasks associated with real-world features. This selection procedure led to a set of 132 tasks overall that were used in the game.
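The grid-based selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `select_tasks`, the dictionary layout of a task, the use of projected coordinates in metres and the `GEOMETRY_RANK` ordering are all assumptions made for this example. The preference for simpler question types is omitted for brevity.

```python
import math

# Assumed tie-breaking order: simpler geometries are preferred (lower rank wins).
GEOMETRY_RANK = {"point": 0, "line": 1, "polygon": 2}


def select_tasks(tasks, x_min, y_min, x_max, y_max, cell=100.0):
    """Pick, for every grid node, the closest task (ties: simpler geometry first).

    `tasks` is a list of dicts with projected coordinates in metres, e.g.
    {"id": "a", "x": 12.0, "y": 40.0, "geometry": "point"} (hypothetical layout).
    """
    selected = {}
    n_cols = int((x_max - x_min) // cell) + 1
    n_rows = int((y_max - y_min) // cell) + 1
    for row in range(n_rows):
        for col in range(n_cols):
            node_x = x_min + col * cell
            node_y = y_min + row * cell
            best = min(
                tasks,
                key=lambda t: (
                    # Rounded distance so that numerically equal distances tie.
                    round(math.hypot(t["x"] - node_x, t["y"] - node_y), 6),
                    GEOMETRY_RANK[t["geometry"]],
                ),
            )
            selected[best["id"]] = best  # a task chosen by two nodes is kept once
    return list(selected.values())
```

Because a real task rarely sits exactly on a grid node, the result only approximates a regular 100 m spacing, which matches the slight discrepancy between grid and topography noted above.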
Other modifications made to the game include disabling the upload of data to the OSM server and integrating a mechanism that prevents players from cheating. Since the game was played 40 times by different players sharing the same tasks, uploading data to the actual OSM database would have resulted in potentially incorrect or inconsistent contributions. This had to be avoided in order not to interfere negatively with the OSM project. More important in the light of our experiment is the anti-fraud feature. We have created a 50-m buffer around tasks to ensure that players must be near tasks to solve them. Otherwise, unmotivated subjects could have interfered with the results by solving tasks without visiting them. Such a distance-based filter can, of course, lead to frustration in certain situations. For example, if a player is asked to enter the number of floors of a building and could do so from a distance, this can lead to dissatisfaction with the game mode. In most cases, however, we believe that the mechanism will help to ensure the validity of the results obtained. Another modification made concerns the recording of GPS tracks during the game. We recorded GPS coordinates alongside the walking speed at 10-second intervals. These were stored into a database hosted on a server in real time.
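A distance-based filter of this kind could look as follows. The sketch assumes that both the player and the task are represented by a single WGS84 coordinate pair and that distance is measured along the great circle. How the modified app measures distance to line or polygon features is not stated here, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def may_solve(player_pos, task_pos, buffer_m=50.0):
    """True if the player stands within the buffer around the task.

    Both positions are (lat, lon) tuples; 50 m is the buffer named in the text.
    """
    return haversine_m(*player_pos, *task_pos) <= buffer_m
```

A call such as `may_solve(last_gps_fix, task_location)` would then gate the answer dialog, where both argument names stand for values the app would have to supply.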
All collected information was merged with a unique player ID into one data record. This includes the completed tasks and scores, the GPS tracks and the completed questionnaires (see Section 3.6). In order to make the records anonymous and to be able to reuse the same device, we used randomly generated Universally Unique Identifiers (UUIDs) instead of alternatives such as hardware-bound serial numbers. In practical applications like our experiments, the generated UUIDs are almost guaranteed to be unique [89].
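As an illustration of such a record, the following sketch creates an empty data record keyed by a random version 4 UUID from Python's standard library. The field names are hypothetical and do not reflect the actual database schema of the study.

```python
import uuid


def new_player_record():
    """Create an empty record keyed by a random (version 4) UUID."""
    return {
        "player_id": str(uuid.uuid4()),  # random, not derived from the device
        "answers": [],        # completed tasks with the scores awarded
        "gps_track": [],      # GPS fixes recorded during the game
        "questionnaire": {},  # questionnaire answers keyed by question
    }
```

Since the identifier is drawn at random for every game, the same device can be handed to the next subject without linking the two records.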

Spatial Scoring System
Our experiment used two scoring systems. The first scoring system shows a strong spatial pattern in the form of spatial clustering, while the second system is spatially random. The choice of a random pattern for comparison is based on the idea of eliminating all systematic spatial variation in the game played by the control group. Other configurations would have been possible too but would have represented hypotheses other than the ones we want to test in this paper. For instance, a uniform pattern without any local score variation would have been another interesting case with an extreme level of autocorrelation (i.e., with a spatial autocorrelation of 1). The way we actually set up the scores, however, allows us to investigate the influence of spatial structure in comparison to its complete absence, which is exactly what we want to test. Both scoring systems used were generated from simultaneous spatial autoregressive (SAR) models [90] of the form
$$y = \rho V y + \varepsilon,$$
where y is the vector of generated values, V is a matrix of pairwise spatial weights and ρ denotes the so-called spatial autoregressive parameter. The vector ε represents the residual errors. The transformation of the model into the evaluable so-called reduced form results in
$$y = (I - \rho V)^{-1} \varepsilon,$$
where I is the n × n identity matrix. The SAR generating operator $(I - \rho V)^{-1}$, obtained by matrix inversion, generates simultaneous autoregressive random variables and was used to generate the results we use in both scenarios of our experiment. The parameter ρ was used to control the degree of systematic spatial structuring in the resulting set of random variables. For the case of spatially structured scores, we set ρ to 0.99, while ρ was set to 0.01 for the randomised variant. For the residual vector ε we drew normal variates from N(10, 50). The left multiplication of this column vector with the matrix of the autoregressive variables gives us raw values which are not yet scaled to a suitable and human readable range.
We scaled the final scores to the interval [5,100] and rounded all the scores to five steps. Both scoring systems are shown in Figure 2. Figure 2a shows that the clustered scoring system led to an alternating radial distribution arranged around the start point located in the centroid of the playing field. This arrangement guarantees that in principle, all clusters can be reached with similar effort. This way, the set-up generated supports our hypothesis testing by not favouring any particular part of the map systematically. One minor limitation caused by the local topography of the playing field, however, is that the high-score clusters can be reached slightly more easily from the start point coordinates. It is also worth noting that the map view of the modified app does not reveal the scores to the players a priori. This is important to avoid visually affecting and distracting players, which could have distorted our hypothesis testing. Instead, players are shown the scores achieved only after solving tasks in the game.
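The generation of both scoring systems from the reduced form of the SAR model can be sketched with NumPy as follows. Several details are assumptions of this example and not taken from the study: the weights matrix is a row-standardised rook-contiguity matrix on a regular 12 × 11 grid (chosen only because it has 132 cells), the second parameter of N(10, 50) is read as a standard deviation, min-max scaling is used, and the rounding is read as rounding to multiples of five. Moran's I is added purely as a check that the two settings of ρ behave differently.

```python
import numpy as np


def grid_weights(n_rows, n_cols):
    """Row-standardised rook-contiguity weights for a regular grid (assumed V)."""
    n = n_rows * n_cols
    V = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    V[i, rr * n_cols + cc] = 1.0
    return V / V.sum(axis=1, keepdims=True)


def sar_scores(V, rho, rng, mean=10.0, sd=50.0):
    """Draw scores from the reduced form y = (I - rho V)^-1 eps."""
    n = V.shape[0]
    eps = rng.normal(mean, sd, size=n)  # assumption: 50 is a standard deviation
    # Solving the linear system is equivalent to applying the inverse operator.
    raw = np.linalg.solve(np.eye(n) - rho * V, eps)
    # Assumed min-max scaling to [5, 100], then rounding to multiples of five.
    scaled = 5.0 + 95.0 * (raw - raw.min()) / (raw.max() - raw.min())
    return (5 * np.round(scaled / 5)).astype(int)


def morans_i(values, V):
    """Moran's I of `values` under the weights V."""
    z = values - values.mean()
    return (len(z) / V.sum()) * (z @ V @ z) / (z @ z)


V = grid_weights(12, 11)            # illustrative layout with 132 cells
rng = np.random.default_rng(0)
clustered = sar_scores(V, 0.99, rng)
randomised = sar_scores(V, 0.01, rng)
```

With ρ close to 1, neighbouring cells receive similar scores and Moran's I is high, whereas ρ close to 0 leaves the scores nearly uncorrelated in space.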

Subjects
The experimental design of our study is based on the observation of actual players playing the two variants of StreetComplete under real conditions. We recruited 40 volunteers to act as test persons. Of these, 28 were geography students enrolled at Heidelberg University. The remaining 12 subjects also came from academic backgrounds and were therefore largely comparable with the students involved. All the subjects were recruited either via a student mailing list or through personal contacts. The youngest participant was 19 years old, while the oldest participant was 41 years old. The average age was 27 years. In gender terms, 25 of our participants were male and 15 female. Overall, the group was relatively homogeneous in terms of demographics, educational level and technical knowledge. This is not only beneficial for the consistency of the results achieved. The characteristics also make our group of test persons representative of the OSM community, which is considered to be predominantly male, 20-40 years old and more technically proficient than average [25,30,91]. The latter is important because it may allow the transfer of our results to other location-based serious games in the OSM ecosystem.
All the subjects were randomly assigned to one of two equally sized groups: a treatment group and a control group. This allocation was done using a Python script to avoid any potential subconscious subjectivity. The treatment group was exposed to the modified game equipped with the spatially structured, non-random scoring system as described in Section 3.2. In contrast, the control group was confronted with the spatially randomised arrangement of scores under ceteris paribus conditions.
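The script used for the allocation is not reproduced here. A minimal sketch of how such a random split into two equally sized groups can be done with Python's standard library is given below, with `assign_groups` and its `seed` parameter being names chosen for this example.

```python
import random


def assign_groups(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equally sized groups."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}
```

For 40 subjects, `assign_groups(range(40))` yields 20 identifiers per group without any manual choice by the experimenter.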

Playing Field
The game was played on a real playing field in Heidelberg. The playing field comprises parts of three different urban neighbourhoods of the city: Bergheim, Weststadt and Altstadt (the Old Town). While Bergheim is dominated by a large historic university campus, the other two areas are distinguished by their respective dominant architectural styles. The western part of the city is dominated by Wilhelminian-style houses from around 1900, while the Old Town of Heidelberg is characterised by a uniform ensemble of Baroque architecture. Figure 3 illustrates the character of the playing field. What the three districts have in common is that they are all perceived as pleasant and offer a high quality of stay. This is an important feature for our study, because spatial heterogeneity in the attractiveness of places across the field could have led to a possible bias towards the more attractive areas. The entire area of the playing field is further similarly urban. Heterogeneity with respect to this characteristic might have impacted the internal attractiveness of the area as was described in Section 2.2. The division of our chosen area minimises this risk. The playing field is also more or less uniform with respect to the street network morphology. This ensures a fair distribution of the scores across the map for both low and high-value clusters, as well as similar attractiveness and exploration capability in all parts of the field.
The playing field covers an area of 1.3 square kilometres, which is close to the optimal size of 1.5 square kilometres recommended in previous studies [56] for a playing time of 30-60 min. A playing time of 30-60 min is considered suitable here, as this length does not exhaust the players and at the same time allows sufficient exploration of the playing field. All players started playing the game in a park called Schwanenteich (indicated as 'Start' in the maps in Figure 2). This starting point is located near the centroid of the area and is about the same distance from all the attractive areas mentioned above. Like the balanced attractiveness of the entire area, the chosen starting point minimises the risk of spatial distortion in the direction of a single above-average attractive part of the playing field. The study was carried out during the semester break. This reduces the probability that students, in particular, might be burdened by routine follow-up activities and thus, less motivated to participate seriously. Further, all participants were advised to play the game for at least 20 min.

Indicators
In order to test the hypothesis of whether a spatially structured scoring system has an influence on gaming behaviour, meaningful indicators are required. We have identified a number of indicators that reflect different aspects of player motivation and behaviour during the game. These can be grouped into two classes.
One class of indicators we consider is indicative of the influence of spatial structure on the intrinsic and extrinsic properties of the game. Examples include the attractiveness of the tasks (intrinsic) and the general interest in playing location-based serious games (extrinsic). We hypothesise that the recognition of structures and thus of a meaningful underlying system leads to a strengthening of the positive (and reduction of negative) motivational characteristics. Indicators classified in this class include (top of Table 1): game duration, distance walked, average walking speed and number of tasks completed. To ensure comparability, we normalised the number of answers given and the distance walked over the game duration. In particular, the combination of these indicators is helpful to identify the effects we are interested in.
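To make the normalisation concrete, the following sketch derives the four indicators of this class from one recorded game. It assumes that each GPS fix is a tuple of time in seconds, projected x and y coordinates in metres and the recorded speed in metres per second. The tuple layout, the function name and the dictionary keys are all hypothetical.

```python
import math


def global_indicators(track, tasks_completed):
    """Summarise one game from its time-ordered GPS fixes.

    Each fix is (t_seconds, x_m, y_m, speed_m_s) in a projected,
    metre-based coordinate system (an assumption of this sketch).
    """
    duration_min = (track[-1][0] - track[0][0]) / 60.0
    # Path length as the sum of straight segments between consecutive fixes.
    distance_m = sum(
        math.dist(a[1:3], b[1:3]) for a, b in zip(track, track[1:])
    )
    mean_speed = sum(fix[3] for fix in track) / len(track)
    return {
        "duration_min": duration_min,
        "distance_per_min": distance_m / duration_min,    # normalised distance
        "mean_speed_m_s": mean_speed,                     # from recorded speeds
        "tasks_per_min": tasks_completed / duration_min,  # normalised answers
    }
```

Dividing by the game duration keeps players with long and short sessions comparable, which is the purpose of the normalisation described above.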
Another class of indicators that we evaluated is more directly linked to the geographical exploration of the playing field. The exploration behaviour allows us to draw conclusions about the potential of spatially structured scoring systems to guide players' movements across the playing field. For example, players can consciously or unconsciously move towards clusters of high scores or actively avoid clusters of low scores by moving away from them. We tested these effects using (bottom of Table 1): the area and linearity (the ratio of the movement along the X and Y axes) of the standard deviation ellipses of the locations visited by the players (these should indicate the radius of interaction and the target orientation of the players), a detour factor (the ratio of the actual path to the shortest path, calculated with the OpenRouteService API (https://openrouteservice.org/) using the median coordinates as an additional waypoint to handle circle-shaped paths) and the mix of road types traversed (which reflects the diversity of the area covered). In this way, we covered a number of important aspects, allowing us to draw meaningful conclusions.

Table 1. Comparison of average values for the tested indicators. The first four measures listed are indicators of intrinsic and extrinsic properties of the game. The second half of the table gives indicators of the geographical exploration of the playing field. Note: * and ** indicate statistically significant mean differences at α = 0.1 and α = 0.05 respectively. We tested for differences in means using the non-parametric Mann-Whitney U test with the exception of the tasks completed. The latter variable was tested with Student's t test instead, following a Shapiro-Wilk test for normality.
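Two of the exploration indicators can be sketched as follows. The ellipse is derived here from the eigenvalues of the sample covariance matrix of the visited coordinates, which is one common variant of the standard deviation ellipse. Linearity is expressed as the ratio of the major to the minor axis, so that a value of 1 corresponds to a circle. Both choices are assumptions of this example, as is the use of projected coordinates in metres. The shortest path length would come from a routing service and is simply passed in as a number.

```python
import numpy as np


def standard_deviation_ellipse(xy):
    """Area and linearity of the standard deviation ellipse of visited points.

    `xy` is an (n, 2) array-like of projected coordinates in metres.
    """
    xy = np.asarray(xy, dtype=float)
    cov = np.cov(xy, rowvar=False)     # 2 x 2 sample covariance of x and y
    eigvals = np.linalg.eigvalsh(cov)  # ascending order: minor axis first
    minor, major = np.sqrt(np.maximum(eigvals, 0.0))
    area = float(np.pi * major * minor)
    # Assumed convention: major over minor axis, larger means more linear.
    linearity = float(major / minor) if minor > 0 else float("inf")
    return area, linearity


def detour_factor(actual_path_m, shortest_path_m):
    """Ratio of the walked path length to the shortest path length."""
    return actual_path_m / shortest_path_m
```

Under this convention, a player who walks back and forth along a single street produces a high linearity, while a player who roams laterally produces a value closer to 1.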

Questionnaire
Upon game completion, all the subjects were asked to complete a questionnaire (see Appendix A). This questionnaire consists of a combination of closed and partially open questions. The latter offer the possibility of writing an individual answer in addition to the existing answer options [92]. The individual questions do not appear in arbitrary order in the questionnaire but are grouped into coherent thematic blocks.
The first section of the questionnaire mostly deals with the entertainment value and handling of the game. Although these questions have little relevance for the hypothesis of the investigation, they facilitate easy entry into the questionnaire through their direct relation to the game. More importantly, these initial questions leave the respondent unclear as to the objective of the investigation, which supports unbiased and truthful answers to the following questions. Furthermore, this section asks about motivating factors and any difficulties that may have arisen. The preset answer options are based on previous empirical studies on player motivation [31,63,75,77].
The second section focuses on the player's already existing experience with OSM and location-based games. These factors have a strong influence on the expectations of the players and the gaming experience [93,94]. In addition, some questions deal specifically with the scoring system. Among other things, it is asked whether the player is of the opinion that clearly defined areas with high or low scores were present and whether detours or longer walking distances have been considered acceptable for achieving higher scores. In addition, questions from this section of the questionnaire aim to better understand the spatial decision-making of the players.

Results and Discussion
We organised our results into three main subsections. The first part describes the differences between the two scoring systems tested with respect to overall global characteristics (i.e., the indicators introduced in Section 3.5). We then provide a localised overview of spatial variations in the movement patterns of both tested groups of players. A third section introduces results on the conscious awareness of the players regarding the presence of an underlying spatial scoring system.

Global Indicators
We assessed the indicators introduced in Section 3.5 for both groups playing the two versions of the game (Table 1). An interesting result is that most indicators point to a motivating effect of the spatial structure found in one of the scoring systems used. Players from the treatment group showed, on average, 29% longer playing times than the players from the control group. Moreover, the same players were more likely to walk longer distances, even after normalisation by game duration. These results indicate an increased willingness to explore the playing field and the underlying geographical area more fully. At the same time, it was found that the players in the treatment group also moved more slowly, although this effect is rather weak. More relevant seems to be the larger average number of tasks solved by the treatment group. This result may indicate a higher motivation to participate proactively in the location-based game if a certain logic, in this case a spatial one, exists in the scoring system.
In addition to the indicators mentioned above, we also evaluated measures that include game characteristics more closely related to the geography of the playing field. The areas of the standard deviation ellipses calculated from the players' GPS trajectories, however, did not differ substantially between the two groups tested. Both groups seem to have explored similarly large areas. Yet, the standard deviation ellipses give only a rough estimate of the areas explored. More revealing is the difference in the linearities and hence the shapes of the ellipses. The treatment group generated trajectories that led to more bulbous ellipses, indicating lateral movements and a more complete exploration of local areas. In contrast, the control group shows more directional ellipses that suggest a more linear exploration of the area. This result is further supported by the observation that the treatment group, on average, is more likely to take detours rather than direct routes. The detour factor is one higher for the treatment group, which means that the players in this group added the length of a full shortest path to the distance that would have been necessary to traverse the area. In addition, the treatment group also crossed a wider range of road types, further substantiating the results discussed in this paragraph.
The results presented above are important indications and thus supporting evidence for the possible effects of spatially structured scoring systems. Nevertheless, it is important to critically review their scope and validity. A problem that often occurs in studies such as the one presented here, in which human subjects are asked to spend a certain amount of their time, is the limited number of available test persons. The p-values given in Table 1 should therefore be treated with caution, as the statistical power of the estimators is negatively influenced by our relatively small sample size [95]. A low statistical power leads to a lower likelihood of discovering effects when they are actually present. In turn, this means that for larger experiments, the results should be even clearer than ours in terms of p-values, whereas our interpretation of the effect sizes might be a bit too optimistic. The p-values given are therefore likely to be overestimated, which is an interesting observation given the considerable absolute differences that have been revealed. It is also important to consider the nature of the results presented. These are quantified correlations and not causal relationships. The results should therefore be considered as indicative, not confirmatory, in nature.
Taking into account not only the differences in mean values but also other characteristics of the data distributions largely supports our above results. Figure 4 shows box plots and histograms for the game duration and the detour factor, for which the mean values differ significantly between the two groups tested. An important observation in the box plots is that both variables exhibit similar outlier behaviours for the control and treatment groups. This reinforces the validity of our results on these variables reported above, since the mean differences are unlikely to be due to outliers. Another observation made in the box plots is that the median, especially that of the game duration variable, is much higher for the treatment group. The absolute difference between the two medians is 9 min, which is close to the mean difference reported in Table 1. This further validates the mean difference, as medians are less susceptible to outlier effects. The histograms also show that in both cases, the distributions for the treatment group are concentrated further to the right, supporting the above results and interpretations. Similar findings can be made for the other, non-significant mean differences from Table 1, taking into account the associated distributions. The corresponding charts can be found in Appendix B (Figure A1).

Local Movement Patterns
We studied the local movement patterns of the players to obtain more detailed insight into their use of the playing field. The maps in Figure 5 show clusters of the numbers of visits to tasks for both scenarios of the game. The clusters were generated using Getis-Ord Gi*, a method that reveals geographic concentrations in the extremes of a distribution [96,97]. In the maps presented, coldspots (i.e., concentrations of small values) indicate areas where visited and unvisited task locations are closely adjacent (unvisited tasks were not included in the cluster calculation). Although both maps show hotspots (i.e., accumulations of high values) in the centre of the field, the main difference between the results for the treatment group and those for the control group is that players from the latter group move more into areas with low values. This effect is most evident in the eastern part of the playing field. While there are a number of coldspots in that part of the map for the control group, the same area shows a large number of unvisited tasks for the treatment group. At the same time, the players of the treatment group visit more locations in the central parts, which spread out to the north and south. These are the more rewarding areas in this scenario, so it seems plausible to observe players from the treatment group visiting these parts of the playing field. What also stands out is that the players in the control group left 51 tasks unvisited, whereas the players in the treatment group left only 43 tasks unvisited. These results indicate a possible motivating effect of the geographically clustered high scores.
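The hotspot statistic named above can be sketched in a few lines. The example below is a minimal illustration of the standard Getis-Ord Gi* z-score, not the authors' implementation: it assumes binary distance-band weights (each location counts itself and every location within a hypothetical `bandwidth`), projected coordinates, and made-up visit counts. Strongly positive values point to hotspots, strongly negative values to coldspots.

```python
import math


def getis_ord_gi_star(coords, values, bandwidth):
    """Return the Gi* z-score for every location.

    Assumptions: binary weights within `bandwidth` (self included),
    coordinates in a projected system, values not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for ci in coords:
        w = [1.0 if math.dist(ci, cj) <= bandwidth else 0.0 for cj in coords]
        sw = sum(w)
        sw2 = sum(wij * wij for wij in w)
        numerator = sum(wij * v for wij, v in zip(w, values)) - mean * sw
        denominator = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        # A zero denominator occurs when the band covers all locations.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical task locations along a street with made-up visit counts.
task_coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
task_visits = [10, 9, 8, 1, 1, 0]
gi_star = getis_ord_gi_star(task_coords, task_visits, bandwidth=1.0)
```

The choice of the weighting scheme and the bandwidth strongly affects which clusters appear, so these parameters would need to be reported alongside any map produced this way.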
The results described above are further backed up by the results of a bivariate spatial analysis of the task visits and the scores offered. We measured the spatial correlation between the two variables using the bivariate local Moran's I coefficient [98], computed from the number of visits and the geographically surrounding scores. Figure 6 shows that for the treatment group, we find a very strong correlation between visits made and spatially surrounding scores. This is reflected in a large number of high-high links, that is, high numbers of visits found in areas with high scores at the same time. The corresponding global bivariate spatial correlation coefficient is 0.39 and was found to be highly significant. In contrast, there is no such spatial correlation for the control group, which has a non-significant global coefficient of −0.01, close to the expected value of this statistic. There are almost no high-high links, and the overall pattern of the different types of clustering seems to be more random than with the treatment group. These results support the findings from the hotspot clustering analysis presented above and also provide an indication of a possible correlation between gaming behaviour and the underlying structure of the scoring system.
Our results concerning local movement patterns must be considered in the light of certain limitations. Using the same starting point for all subjects may have caused some spatial coincidence in the visits of tasks. We also calculated the bivariate local Moran's I coefficient described above between the numbers of visits in the two versions of the game. After removing three outliers, the correlation is still 0.29, indicating a possible effect of our choice for the starting point. Future research should explore a more suitable configuration that takes account of both player behaviour and score clusters. Moreover, the large road just south of the starting point may have caused a larger number of subjects to initially move north instead of south. Nevertheless, we believe that our results, obtained with a reproducible method, provide evidence that will stimulate further research on the effects of spatially structured scoring systems.
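The bivariate spatial correlation used in this subsection can be illustrated with a short sketch. It follows the common definition of the bivariate Moran's I, in which the standardised value of one variable at a location is multiplied by the average standardised value of the other variable in its neighbourhood. The distance-band neighbourhood, the row-standardised weights, the function names and the sample data are assumptions made for illustration only and are not taken from the study.

```python
import math


def standardise(values):
    """Convert values to z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def bivariate_moran(coords, visits, scores, bandwidth):
    """Return (global_I, local_I) for visits versus surrounding scores.

    Assumptions: neighbours are all other locations within `bandwidth`,
    weights are row-standardised, coordinates are projected.
    """
    zx = standardise(visits)
    zy = standardise(scores)
    local = []
    for i, ci in enumerate(coords):
        nbrs = [j for j, cj in enumerate(coords)
                if j != i and math.dist(ci, cj) <= bandwidth]
        # Spatial lag: mean standardised score among the neighbours.
        lag = sum(zy[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local.append(zx[i] * lag)
    # With z-scored visits, the sum of squares equals n.
    return sum(local) / len(coords), local


# Hypothetical tasks on a line: many visits where surrounding scores are high.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
visits = [5, 4, 5, 1, 0, 1]
scores = [9, 8, 9, 2, 1, 2]
global_i, local_i = bivariate_moran(coords, visits, scores, bandwidth=1.0)
reversed_i, _ = bivariate_moran(coords, visits, scores[::-1], bandwidth=1.0)
```

A location contributes a high-high link in this sketch when both its own standardised visit count and its spatial lag of scores are positive. Significance would normally be assessed with a permutation test, which is omitted here.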

Scoring System Awareness
The evaluation of the questionnaires and the players' GPS tracks allows us to investigate the players' in-game awareness of the existence of a structured scoring system. The visual inspection of the mapped trajectories (Figure 7) shows no obvious differences between the two groups tested. The players in the treatment group seem to have explored the southern part, which is located in one of the high score clusters, in more detail. In contrast, the trajectories of the players from the control group show a slightly more expansive, less coherent geometric pattern. However, no clear and systematic differences between the two groups can be deduced from looking at the trajectory maps and their geometric arrangements. A closer look at the trajectories, taking into account the temporal aspects and the questionnaires, reveals a more differentiated picture. It turns out that 25% of the players from the treatment group show strong indications of at least subconscious perception of a systematic pattern in the scoring system they played. The trajectories reveal that these players returned to high score clusters several times after leaving them, and only after they had previously solved at least four tasks within those clusters. The same players also stated in their questionnaires that they noticed a systematic pattern in their scores, although some of them suspected that this pattern was related to the social and cultural relevance of the areas (e.g., the university buildings in the northern central part of the playing field). This behaviour was not observed in the control group, which gives rise to the conjecture that more players from the treatment group became aware of an underlying system in the scores than from the control group.
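The return behaviour described above could, in principle, be checked programmatically. The following sketch is purely illustrative and is not the procedure used in the study: it assumes a hypothetical chronological event list per player, where each event records whether the player was inside a high score cluster and whether a task was solved at that moment, and it counts re-entries that happen after at least four tasks have been solved inside the cluster.

```python
def count_informed_returns(events, min_solved=4):
    """Count re-entries into a high score cluster after enough solved tasks.

    Assumption: `events` is a chronological list of
    (inside_cluster, solved_task) boolean pairs for one player.
    """
    solved_inside = 0
    returns = 0
    was_inside = False
    has_left = False
    for inside, solved in events:
        # A return is an entry that follows an earlier exit from the cluster.
        if inside and not was_inside and has_left and solved_inside >= min_solved:
            returns += 1
        if was_inside and not inside:
            has_left = True
        if inside and solved:
            solved_inside += 1
        was_inside = inside
    return returns


# Hypothetical player: solves four tasks inside, then leaves and returns twice.
informed = [(True, True)] * 4 + [(False, False), (True, False),
                                 (False, False), (True, False)]
# Hypothetical player: returns once, but after only two solved tasks.
uninformed = [(True, True)] * 2 + [(False, False), (True, False)]

informed_returns = count_informed_returns(informed)
uninformed_returns = count_informed_returns(uninformed)
```

Such a count would only be one side of the triangulation, since the study combines the trajectories with the questionnaire answers.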
The conjecture that players may have become aware of a system underlying the scores is further supported by a look at the questionnaires. Players from the control group frequently stated that they suspected they were playing a random system. The same players also said that this had led to a certain degree of frustration. Furthermore, twice as many players from the control group rated the number of points awarded for solving tasks as low, while players from the treatment group showed a more positive assessment of the scoring system (Figure 8a). With the exception of only two outliers, all players from the treatment group consistently evaluated their played scoring system in a positive light. In contrast, and with little overlap with the responses from the treatment group, the players from the control group stated the scores awarded were too low. These findings underline the importance of an understandable system in the game mode and add a geospatial component to previous results. However, it should be noted that not only players from the treatment group thought they had found a systematic pattern in the scores. Some players from the control group also speculated about possible systems in terms of the traffic significance of roads, the attribute completeness of OSM features, the geometric distance of a player from a task at the time of solving it, and even the speed at which the tasks were solved.
Another important outcome from the questionnaires is that 65% of all players stated that they would play the game even without any scoring system. This may be related to the fact that most of our subjects come from a geographic or spatial background, which reinforces their natural interest in geographic serious games. Looking at the responses to the respective question in more detail, however, reveals an important difference between the two tested groups (Figure 8b). The players from the control group show a certain degree of indifference towards the scores awarded. These players frequently indicated that they would also play the game without any scoring system in place. The players from the treatment group, in contrast, show a much more balanced response behaviour on this question. The number of "yes" answers to the question is almost identical to the number of "no" answers. It seems that the positive experience of the scoring system played has had an influence on the mindset of players from the treatment group, making them less willing to play for completely altruistic reasons. The latter result is important for a better understanding of the potential long-term consequences of using spatially structured scoring systems, and hopefully motivates corresponding long-term studies. Overall, the results obtained reveal that only players from the treatment group show systematic and consistent behaviour across the questionnaires and their actual movements as shown in the trajectories.

Synopsis
Our results reveal a number of relevant theoretical aspects. Some players stated that they have become aware of a systematically structured scoring system, even when this was actually not the case. Our triangulation approach allowed us to identify those players who recognised the underlying system and acted accordingly. This shows that there is a certain discrepancy between the perception of the structure of scoring systems and the actual control of actions through these systems, an aspect that deserves more attention in future research. Furthermore, our results show that the established player types as outlined in [74] may not be complete for the case of location-based games. Some of our subjects have shown characteristics of both achievers (players who try to maximise the number of points scored) and explorers (players who have a genuine interest in discovering the geography of the playing field). In contrast, some other players have shown a tendency to be what one might call path optimisers, as they tried to cross the field quickly and efficiently. For location-based games, therefore, a review of existing player categorisations might be necessary.
Our results complement the existing knowledge about the importance of comprehensible scoring systems. Scoring systems have the ability to stimulate and activate players and make them play longer [100]. They also serve as instruments for self-evaluation and comparison [101]. For games that require the development of decision strategies, it is important to what extent the players become aware of the functioning of the underlying scoring system [102]. This is the case in location-based serious games, where players must operate in a real, physical environment. Optimising scoring systems in such games requires a thorough knowledge of how scoring systems work. The results presented here contribute a spatial perspective that is relevant for the geographical context of location-based serious games. The knowledge gained will thus be valuable for game designers to better utilise spatial structure when designing comprehensible scoring systems.

Conclusions
In this article, empirical findings on the influence of spatially structured scoring systems on the behaviour of players of location-based serious games are presented. The study was conducted in a real urban environment in the city of Heidelberg in Germany. Two groups of players played two different versions of the game StreetComplete, one of which was equipped with a spatially structured scoring system. The other version included a random scoring system. The results obtained show that many players at least subconsciously adapted their playing behaviour to the presence and structure of a scoring system. Players from the treatment group tended to move more frequently to areas where high scores were accumulating. In contrast, players from the control group reported a certain level of frustration caused by a perceived randomness of the scoring system played. Players who were confronted with a spatially structured scoring system also showed a longer average playing time, walked longer distances and tended to take detours to explore the area more extensively. The evidence revealed in this study is a relevant addition to the knowledge about location-based serious games.
Some of the insights gained in this study may be applicable to games in contexts other than OSM. The so-called triangle of shared data sources presented in [103] shows that user-generated datasets like Wikipedia or the Citizen Science Project eBird share principles similar to those of the OpenStreetMap project. In the light of the present study, this is promising, as it could allow other researchers to apply our findings to the motivational aspects of other related types of datasets. In practical terms, our findings on the effects of spatial structure could be valuable for targeted data collection, for example, to fill data gaps or to encourage users to react quickly in time-critical campaigns. In this sense, our results are particularly interesting for the use of spatial patterns in the personalisation of games, which further increases the motivation of players [104,105]. Overall, the results we achieved here are likely to be relevant also beyond the specific OSM context.

Future Research
Future research on the role of spatial structure in scoring systems should extend our approach to other cultural and geographical contexts. Our results are only valid for a Western and especially European context and are also limited by a certain local topography. It will be interesting to see how the results vary as the context conditions change. This includes systematically testing interactions between spatial structure in scores and geographical features of a playing field. Similarly, and as already critically noted, future studies should also vary the implementation of the spatial structure in scoring systems (e.g., in terms of the arrangement of the clusters and their size) as well as the placement of the starting points. This will allow conclusions to be drawn about possible arrangement and scaling effects in the recognition of applied systems. It should also be investigated to what extent the spatial structure in a scoring system can influence data quality, an aspect that is particularly important in the context of serious games. Another interesting research area that can be related to our results is the topic of formalising and operationalising place in a human-geographical sense. GIScience has recently begun to investigate this topic more comprehensively [66,69,106,107], and the motivational aspect of this study could also inform explanations of how motivating environments influence the perception and conceptualisation of places. Further, in order to overcome the issue of investigating rather small cohorts of players for pragmatic reasons, it would be very interesting to combine gaming elements with event sampling and other in situ survey techniques [108]. This way, it might be possible to investigate larger groups of players through better integration of the game into everyday routines.