On Evaluating Social Learning Outcomes of Serious Games to Collaboratively Address Sustainability Problems: A Literature Review

Serious games are increasingly explored as collaborative tools to enhance social learning on sustainable management of land and natural resources. A systematic literature review was conducted to examine the current state of the art of the different methods and procedures used to assess social learning outcomes of collaborative serious games. Forty-two publications were identified and included in the review following study selection and quality assessment steps. Extracted data from the publications were categorized in relation to five research questions. Approaches that were used to assess cognitive, normative, and relational learning outcomes of collaborative serious games were subsequently identified based on the categorizations. As a result, these approaches distinguished between the nature of learning in the assessment of collaborative serious games. Combined, these approaches provide an overview of how to assess social learning outcomes of collaborative serious games, including the methods and procedures that can be used, and may serve as a reference for scholars designing and evaluating collaborative serious games.


Introduction
Environmental sustainability problems are typically complex and multi-scale, involve inherent uncertainty, and affect multiple stakeholders and agencies. Solving sustainability problems concerns the management of land and natural resources in a way that creates and maintains prosperous social, economic, and ecological systems [1]. To address such problems, decision-making needs to be adaptive to deal with the uncertainties and needs to include the diversity of knowledge and values of all affected stakeholders. To achieve this, scholars have advocated active experimentation and continuous evaluation, summarized as learning-by-doing, in natural resources management [2–5]. Central to these approaches is collaboration between and learning among researchers, resource managers, and resource users in order to find sustainable solutions [1,6–8].
Learning, in particular social learning, is therefore seen as a prominent driver and normative goal in natural resources management [2,9–12]. Although a commonly shared definition of social learning is still debated (see e.g., [3,11,13,14]), scholars increasingly agree that social learning has occurred when a change in understanding (related to, for example, the system, the problem at hand, agreement, and collective action) is achieved through interaction in collaborative and participatory settings [11,13,15–17]. Social learning outcomes therefore require deliberative interactions, where multiple stakeholders work together and build relationships, which should ultimately lead to collective action [14,15,18]. Baird et al. [17] define three types of learning outcomes in relation to social learning: cognitive, the acquisition of new or restructuring of existing knowledge; normative, a shift in viewpoints, values or paradigms; and relational, an improved understanding of others' mind-sets and enhanced trust and ability to cooperate between stakeholders.
Serious games are increasingly explored as a method to establish social learning on sustainable natural resources management and urban planning [19–22]. Serious games are generally referred to as games that have a primary purpose other than entertainment, such as educating, training or informing players [23–25]. More specifically towards policy-making, Mayer ([26], p. 825) defines games as "experi(m)ent(i)al, rule-based, interactive environments, where players learn by taking actions and by experiencing their effects through feedback mechanisms that are deliberately built into and around the game". A recognized strength of such serious games is that they can include both the techno-physical complexity (the underlying physical elements of the system and its uncertainties) and the socio-political complexity (the strategic interactions between stakeholders in the policy arena) by combining role-play with in-game feedback mechanisms [20,21,26,27]. Therefore, serious games fit well with the learning-by-doing approach in natural resources management; serious games offer stakeholders a place to negotiate, deliberate, exchange perspectives used in decision-making and learn about the trade-offs between decisions in the safe experimentation environment of a game [26,27]. Moreover, multiplayer and multi-role serious games offer the collaborative and participatory stakeholder interactions that are needed to establish social learning [20,22]. The term collaborative serious games is used in the remainder of the paper to refer to serious games that follow the above definition by Mayer [26], but are particularly focused on collaboratively exploring sustainable management strategies, that is, through mutual engagement of stakeholders in a coordinated effort to solve the problem ([28], p. 70).
The assumption is that any learning that occurs from playing collaborative serious games is transferable to the world outside the game [26,29]. A collaborative serious game can therefore be seen as an intervention or a transitional object [29] that may lead to social learning outcomes. While calls exist for more systematic assessment of learning through serious games (see e.g., [30]), assessing social learning outcomes of collaborative serious games is challenging. On a practical level, gathering data occurs in a collaborative setting that introduces many confounding variables such as players' prior relations, players' attitudes towards serious gaming, and facilitators who have to make decisions on the spot while guiding sessions [30–32]. On a higher level, the lack of consensus on the definition of social learning itself has made it difficult to assess [8], and indeed few studies empirically and directly assess the learning effects of interventions [3,12,17].
The research presented in this paper addresses this gap by answering the main research question: What is the current state of the art of the different methods and procedures used to assess social learning outcomes of collaborative serious games? To this end, a systematic literature review was conducted to survey the empirical assessment of social learning through collaborative serious games. As such, the contributions of this paper include: (1) summarizing and categorizing different evaluation procedures applied and methods used to assess social learning outcomes of collaborative serious games; and (2) presenting a state-of-the-art overview of approaches to assess social learning outcomes of collaborative serious games.
The next section presents the method of the systematic review and describes all the steps taken in detail. Sections 3 and 4 cover the search and categorization results of the review, respectively. Section 5 discusses approaches used to assess cognitive, normative, and relational learning outcomes of collaborative serious games. Finally, Section 6 sums up the conclusions drawn from the review and provides an overview of the identified assessment approaches.

Materials and Methods
To perform the literature review, the guidelines by Kitchenham and Charters [33] (an updated version of Kitchenham [34]) were used. These guidelines are an established procedure for conducting systematic reviews, particularly in software engineering. Although collaborative serious games are not exclusively digital (a board game can also be used to explore sustainable management solutions), the use of these guidelines is appropriate as many serious games do make use of software and the guidelines are based on guidelines used in other disciplines.
Kitchenham and Charters [33] prescribe three phases of a systematic review: planning, conducting, and reporting. The planning phase covers confirming the need for a review, defining the research questions, and producing a review protocol. The conducting phase involves developing a search strategy, executing as well as documenting the search, performing a study selection, and extracting data relevant to the research questions. The reporting phase then covers reporting and disseminating the results of the study. In the next subsections, all the steps taken in performing the systematic literature review are described.

Related Work
As a starting point in the study, a search was executed to confirm the need for a review. One literature review by Calderón and Ruiz [35] on the different methods and procedures used to evaluate serious games was identified in this step. Notably, the serious games included in that review are mostly evaluated using a questionnaire as the main assessment method and are mostly evaluated on educational effectiveness, defined by Calderón and Ruiz [35] as the learning outcomes, usability, and user's experience. The study is similar in terms of its scope and research questions covered, but lacks the focus on social learning and the policy setting. The study was therefore used as a reference for the overall search strategy, such as to sharpen the research questions and to formulate inclusion and exclusion criteria. Other reviews, both literature and meta reviews, focused on the effectiveness of serious games [36–40], not on the assessment procedures.
In addition to these studies, five review papers were identified which focus on serious games in relation to sustainable development [41], climate change [42–44], and sustainable water resources management [45]. While these do not review the assessment approaches of learning outcomes, the serious games reviewed in these papers do all relate to sustainability and may therefore be of interest to this review. These studies were therefore used to inform the search strategy, for example to define keywords, and as snowball resources.
In conclusion, no review was found that focuses specifically on evaluating collaborative serious games; this study therefore adds an in-depth review with this explicit focus to the literature.

Research Questions
To examine the current state of the art of the different methods and procedures used to assess social learning outcomes of collaborative serious games, the following research questions were addressed to analyse publications:
RQ1. How is learning through collaborative serious games conceptualized?
RQ2. When is data collected in the evaluation of learning through collaborative serious games?
RQ3. What methods are used in the evaluation of learning through collaborative serious games?
RQ4. Do evaluations of learning through collaborative serious games use quantitative, qualitative or a combination of quantitative and qualitative data?
RQ5. What are the learning effects of collaborative serious games according to their evaluations in relation to social learning?
Learning is used over social learning in these questions to not exclude publications that use a different learning conceptualization but do fall within the scope of this review. For all research questions, data was extracted from the publications and subsequently categorized, which is further explained in the data extraction and data analysis Sections 2.7 and 2.8.

Search Strategy
The search strategy defines the systematic approach to identify publications to include in the literature review. Two categories of keywords were used to create search combinations. The first category (A) covered three keywords to identify publications that covered a game-based approach: serious gam* (A1), simulation gam* (A2), and policy gam* (A3). The asterisk was used in each keyword to cover both game and gaming. Simulation and policy gaming were added to the search terms as pilot searches showed that these terms, unlike alternatives such as applied gaming, returned search results that were relevant to the scope.
The keywords were subsequently used in search strings to search for all possible combinations between one keyword from category A and one keyword from category B. Using Boolean statements, the search strings therefore covered "(A1 OR A2 OR A3) AND (B1 OR B2 ... OR B18)".
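The combination scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category B keywords are not listed in this section, so the names B1 to B18 below are placeholders, and the function names `pairwise_queries` and `combined_query` are hypothetical. The exact query syntax of each database is not reproduced here.

```python
from itertools import product

# Category A keywords as given in the text.
category_a = ["serious gam*", "simulation gam*", "policy gam*"]
# Category B keywords are not listed in this section; B1..B18 are placeholders.
category_b = [f"B{i}" for i in range(1, 19)]


def pairwise_queries(a_terms, b_terms):
    """Return one 'A AND B' query for every (A, B) keyword pair."""
    return [f'"{a}" AND "{b}"' for a, b in product(a_terms, b_terms)]


def combined_query(a_terms, b_terms):
    """Return the single Boolean string (A1 OR ...) AND (B1 OR ...)."""
    a_part = " OR ".join(f'"{a}"' for a in a_terms)
    b_part = " OR ".join(f'"{b}"' for b in b_terms)
    return f"({a_part}) AND ({b_part})"


queries = pairwise_queries(category_a, category_b)
print(len(queries))  # 3 category A keywords x 18 category B keywords
print(combined_query(category_a, category_b))
```

The single combined string and the set of pairwise queries cover the same keyword combinations; which form is practical depends on the query length a database accepts.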
Searches were limited to publications published since 2007 to review the state of the art; 2007 was chosen in order to cover the last ten years and because pilot searches found few publications before 2007 that provided sufficient information for the scope of the review, consistent with the results of Calderón and Ruiz [35].
Six databases were similarly determined through pilot searches for conducting the review: ACM Digital Library, IEEE Xplore Digital Library, ISI Web of Science, ScienceDirect, Scopus and SpringerLink. Searches were executed on title, abstract, and keywords.

Study Selection
To narrow down the publications obtained following the search strategy, an initial selection and a final selection step were performed following inclusion and exclusion criteria (Table 1). In the initial selection, each study was reviewed based on its title, abstract, and keywords. This way, publications clearly not covering the scope of the literature review (for example, a publication describing the economy and markets as a serious game) were excluded. In the final selection, the publications included in the initial selection were reviewed more rigorously according to the same criteria by scanning through each study and reading its conclusion. The entire study was read if necessary to determine inclusion or exclusion. Publications that were only available as an abstract were still reviewed: if the abstract seemed applicable, an attempt was made to obtain the entire publication, and exclusion occurred if this was not possible.

Study Quality Assessment
Following the study selection steps, the selected publications were examined in more detail in relation to the research questions in a quality assessment step. Specifically, questions that can be answered with yes or no were formulated in relation to the research questions:
1. Does the publication provide a conceptualization of learning? (RQ1)
2. Does the publication state when data is collected in the evaluation of the collaborative serious game? (RQ2)
3. Does the publication discuss a method, technique or theory used to evaluate the collaborative serious game? (RQ3)
4. Does the publication discuss whether the evaluation used a quantitative or qualitative approach, or both, in the evaluation of the collaborative serious game? (RQ4)
5. Does the publication describe learning outcomes of the collaborative serious game following evaluation? (RQ5)
The publications included in the study selection step were assessed on each of these questions as yes or no, corresponding to a score of 1 or 0. If a publication included all relevant aspects in regard to the scope of the review, it would therefore score five out of five. Publications that scored four or higher were deemed of sufficient quality and included in the review. Publications that scored exactly three, and were thus excluded, were checked again after completing the quality assessment to confirm exclusion was the correct decision.
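The scoring rule above can be expressed as a short Python sketch. It is a minimal illustration, not the procedure as implemented by the authors: the function names `quality_score` and `decide`, the dictionary layout, and the decision labels are assumptions made for this example.

```python
# One yes/no question per research question, each worth 1 (yes) or 0 (no).
QUESTIONS = ("RQ1", "RQ2", "RQ3", "RQ4", "RQ5")
INCLUDE_THRESHOLD = 4  # four or higher: sufficient quality
RECHECK_SCORE = 3      # exactly three: excluded, but checked again


def quality_score(answers):
    """Sum the yes answers; a missing answer counts as no."""
    return sum(1 for q in QUESTIONS if answers.get(q, False))


def decide(answers):
    """Return (score, decision) for one publication."""
    score = quality_score(answers)
    if score >= INCLUDE_THRESHOLD:
        return score, "include"
    if score == RECHECK_SCORE:
        return score, "exclude (recheck)"
    return score, "exclude"


# Hypothetical publication that answers every question except RQ1.
example = {"RQ1": False, "RQ2": True, "RQ3": True, "RQ4": True, "RQ5": True}
print(decide(example))
```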

Snowballing
To cover relevant publications missed in the database searches, a snowballing approach was used to identify further relevant publications [46]. Firstly, backward snowballing was applied by identifying possibly relevant publications from the reference lists of publications passing the quality assessment step. Secondly, forward snowballing was applied by using Google Scholar to check all publications citing the publications deemed of sufficient quality. Thirdly, the five review papers on sustainable development, climate change, and sustainable water resources management [41–45] identified earlier were used to snowball from, as serious games covered in these reviews may be of interest to the scope of this review. All papers identified through snowballing were subsequently reviewed following the same study selection and quality assessment steps.

Data Extraction
In the data extraction step, all the publications included in the study selection and deemed of sufficient quality were read in their entirety. Data was extracted from the publications in regard to the research questions. All extracted data, for all publications, were stored in a spreadsheet. At this point, if publications were identified covering the same collaborative serious game without providing additional insight into its evaluation (for example, a conference paper covering a limited evaluation and a journal paper covering a more extensive evaluation by the same authors), the publication with the lower quality assessment was excluded from the review.

Data Analysis and Categorization
All data was subsequently categorized in regard to each research question. An inductive approach was used to create categories based on the data retrieved by identifying common themes, for example applied methods as surveyed in RQ3. On the first research question, for example, in regard to the conceptualization of learning, categories included social learning and experiential learning. For RQ5, the typology on social learning by Baird et al. [17] (see Table 2) was used as a framework to define categories that differentiate between the three types of learning. From an assessment perspective, the typology is beneficial as it views learning outcomes from their nature (cognitive, normative or relational) rather than their perceived value. Moreover, the typology separates relational learning as an explicit learning outcome, which is of particular interest in the multi-stakeholder context of sustainable land and natural resources management.
The categorizations were subsequently used to identify assessment approaches used in the evaluation of learning outcomes of collaborative serious games.

Search Results
The entire review, from developing the scope and performing the search, selection, data analysis, and synthesis, was executed between October 2017 and November 2018. Initial database searches were executed on or a few days before the 7th of January 2018, and all publications extracted through these searches were analysed. After processing all retrieved publications, all database searches were updated on the 7th of November 2018. Figure 1 provides an overview of the number of publications found and subsequently included after each step.
As Figure 1 shows, 78 publications retrieved through the database searches were included in the selection steps following the inclusion criteria, of which 41 publications were included following quality assessment. Sixty-six additional publications were identified through snowballing: 45 from the publications identified through database searches and 21 from the five review papers [41–45]. Six of these 66 publications passed the same selection steps and quality assessment. Five publications were subsequently removed from the total as these covered the same collaborative serious game without providing additional insight into its evaluation. Forty-two publications were therefore included in the review.
Figure 2a shows an overview of how many publications included in the review were retrieved through each database as well as through snowballing. Some publications appeared in multiple databases and are counted under each of these databases in the figure, which explains why the numbers add up to more than 42. Scopus, as can be expected, provided the most publications included in the review. No publications retrieved from the IEEE database passed the selection and quality assessment steps.
Figure 2b shows an overview of the publications included in the review in regard to their year of publication. The figure suggests an increasing trend in the use of collaborative serious games to explore sustainable management strategies. It must be noted that 2018 has the highest number of publications included in the review (10 publications), even though the review only covers the first 10 months of 2018.
Table 3 provides an overview of the publications included in the review and the game each publication evaluates.
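The publication counts reported in this section can be checked with simple arithmetic. The sketch below only restates the figures given in the text; the variable names are chosen for this illustration.

```python
# Counts as reported in the text.
database_after_quality = 41   # database results passing quality assessment
snowball_from_database = 45   # snowballed from database-identified publications
snowball_from_reviews = 21    # snowballed from the five review papers
snowball_after_quality = 6    # snowballed publications passing all steps
same_game_removed = 5         # same game, no additional evaluation insight

snowball_identified = snowball_from_database + snowball_from_reviews
included = database_after_quality + snowball_after_quality - same_game_removed

print(snowball_identified)  # 66 additional publications identified
print(included)             # 42 publications included in the review
```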

Categorization Results
This section covers the categorized findings in relation to the formulated research questions. Through an inductive approach, between three and 13 categories were identified for each research question. A complete overview of all the publications and assigned categories can be found in Table A1 in Appendix A. An overview of the underlying data can be found as a spreadsheet in the Supplementary Materials.
The next subsections present the results of categorization for each research question.

How Is Learning through Collaborative Serious Games Conceptualized?
The aim of this question was to obtain an overview of how learning through collaborative serious games is conceptualized. Thirteen categories were identified in the analysis. In most cases, publications mentioned the categories explicitly and included relevant citations, while in some cases publications lacked these citations. In the latter case, publications were still assigned to the category if a definition or description was included that matches the category definition. Only the categories social learning and experiential learning covered more than two publications. Table 4 provides the categories, their definitions, and the publications assigned to these categories. Categories covering only one or two publications are grouped into the other category in Table 4, but can be found in full in Table A1.

When Is Data Collected in the Evaluation of Learning through Collaborative Serious Games?
The aim of this research question was to get an overview of when data is gathered in the assessment of learning through collaborative serious games. Categorizations were chosen based on game sessions as a reference point, with pre, during, post, and post-post data collection. Table 5 provides the categories, their definitions, and the publications assigned to these categories. Notably, 37 out of 42 publications gather data immediately after game sessions (post). Twenty-five of these 37 publications combine the post data collection with data collection during game sessions, while 21 of these combine it with data collection before game sessions (pre) and 15 combine all three. Only 10 publications collected data well after game sessions (post-post).

What Methods Are Used in the Evaluation of Learning through Collaborative Serious Games?
The aim of this research question was to obtain an overview of the methods and techniques that are used to evaluate collaborative serious games. Table 6 provides the categories used to represent different methods, their definitions, and the publications assigned to these categories. Thirty-one out of the 42 publications used multiple methods, that is, combinations of the five most used methods: questionnaires, observations, debriefings, interviews, and data logging.

Do Evaluations of Learning through Collaborative Serious Games Use Quantitative, Qualitative or a Combination of Quantitative and Qualitative Data?
The aim of this research question was to see whether the evaluation of collaborative serious games relied on qualitative data, quantitative data, or both. Table 7 provides an overview of which publications based evaluations on each of these three categories. Thirty-seven out of the 42 publications used qualitative data. Sixteen of these 37 publications combined the qualitative data with quantitative data. Qualitative data resulted from questionnaires (open questions), observations (unstructured observations by researchers), debriefings, and interviews. Quantitative data, on the other hand, resulted from questionnaires (closed questions using Likert scales), observations (structured observations, e.g., noting and counting certain behaviour), and data logging (e.g., player choices and group decisions).

What Are the Learning Effects of Collaborative Serious Games According to Their Evaluations in Relation to Social Learning?
The aim of this research question was to get an overview of the learning outcomes of collaborative serious games. Categories were formulated based on the data extracted from the publications and in relation to the learning typology by Baird et al. [17]. Table 8 provides the categories, their definitions and relation to the social learning typology, and the publications assigned to these categories. Thirty-seven out of 42 publications report on cognitive learning outcomes: increased system understanding, raised awareness, or both. Only five publications report that playing a collaborative serious game led to a change in views, the category linked to normative learning. Seventeen publications report on relational learning outcomes in the form of understanding other perspectives, building relationships and trust, or both.
To formulate and assign categories for this research question, the choice was made to assign publications to categories based only on the text as written in the publications (the data), regardless of whether or not publications described their results as representative for the entire population or as highly or weakly significant. For example, Keijser et al. [65] and Lawrence and Haasnoot [74] both noted results related to the category increased system understanding. Although Keijser et al. only noted it for those less familiar with the topic, both publications were assigned to this category. Making distinctions would otherwise add a selection mechanism susceptible to subjective interpretations of the publications. The results as presented here should therefore only be viewed in relation to the learning outcomes that collaborative serious games can have in general and should not be used to draw direct conclusions on individual publications.
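Counts of the kind reported in this section (how many publications fall into a category, and how many combine several categories) follow from a simple tally over the category assignments. The sketch below illustrates this under stated assumptions: the function names `tally` and `count_with_all` are hypothetical, and the records P1 to P4 are toy data invented for the example, not data from the review.

```python
from collections import Counter


def tally(assignments):
    """Count how many publications are assigned to each category."""
    counts = Counter()
    for categories in assignments.values():
        counts.update(set(categories))
    return counts


def count_with_all(assignments, required):
    """Count publications whose categories include every required category."""
    required = set(required)
    return sum(1 for cats in assignments.values() if required <= set(cats))


# Hypothetical toy records (publication -> data collection moments).
toy = {
    "P1": {"pre", "during", "post"},
    "P2": {"post"},
    "P3": {"during", "post"},
    "P4": {"during"},
}

print(tally(toy)["post"])                              # publications with post
print(count_with_all(toy, {"during", "post"}))         # during and post
print(count_with_all(toy, {"pre", "during", "post"}))  # all three moments
```

Because a publication can belong to several categories at once, the per-category counts may sum to more than the number of publications.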

Discussion
This section covers the findings on the approaches to assess social learning outcomes of collaborative serious games. The findings are discussed in relation to the three types of learning as defined by Baird et al. [17]: cognitive, relational, and normative learning. In addition, as social learning should lead to collective action, the real-world impact of collaborative serious games, as reported in the reviewed publications, is discussed. The section ends with a reflection on the review and its limitations.
Forty-two publications were included in the review. In relation to the first research question, social learning and experiential learning are the only conceptualizations of learning used by more than two publications. While some publications provide learning conceptualizations related to social learning, such as collective learning [49,63] and boundary crossing [48], other publications provide conceptualizations that do not explicitly include the relational learning component of social learning. This is reflected in the review as the latter publications, as well as the publications that do not provide an identifiable conceptualization of learning, mostly report on cognitive learning outcomes. The next subsections discuss the assessments of cognitive, relational, and normative learning outcomes of collaborative serious games in detail.

Cognitive Learning
Cognitive learning relates to the acquisition of new or the restructuring of existing knowledge. Almost all of the publications in the review (37 out of 42) report on cognitive learning outcomes of collaborative serious games. Two categories were linked to cognitive learning: increased system understanding and raised awareness, although both can relate to acquiring new knowledge and restructuring existing knowledge. Three common approaches were identified in the assessment: (1) self-reflective questions after game sessions; (2) pre-post measurements of self-reported knowledge on issues; and (3) observed acquisition or restructuring of knowledge.
Firstly, 21 publications describe an approach where participants were asked self-reflective questions in questionnaires, interviews or debriefings after game sessions, sometimes in combination with other methods [49,50,52,55,57,58,62–64,67,68,72–74,76–78,80,82,86,89]. For example, Dionnet et al. [77] assessed their TADLA approach, which aims to facilitate farmers in the collective modernization of their irrigation system, by combining a questionnaire after game sessions with follow-up interviews. They show that their approach helped farmers to acquire a more comprehensive understanding of how the different system components work together. Becu et al. [62] asked self-reflective questions on what local policymakers learned during their structured, round-table debriefing of LottoSim, a game to support social learning on coastal risk prevention measures. According to their evaluation, the participants especially learned about the dynamics of water expansion in flooding events as a result of the game's integrated simulations. Moreover, the game helped participants to discuss long-term strategies, although participants did not pursue prevention measures other than those already applied in practice. Keijser et al. [65] in turn assessed the ability of the Marine Spatial Planning Challenge to communicate the dynamic and complex interactions between shipping and spatial planning in the maritime environment. Questionnaires after game sessions were applied to assess self-reported learning through closed questions and a five-point Likert scale. Their results show that the game worked well as an introduction to marine spatial planning where participants gain a grasp of its complexities, particularly for those with limited knowledge on the topic.
Secondly, 12 publications analysed self-reported learning through pre-post measurements of participants' knowledge on the issues addressed in the game [58–60,65,70,76,78,80–82,87,88]. Questionnaires were the common method in this approach, complemented with other methods, although Salvini et al. [81] and Haug et al. [82] instead used interviews and concept maps, respectively. Salvini et al. [81] assessed the effects on social learning of their game, which aims to induce Brazilian farmers to explore agroforestry practices, by conducting interviews with participants. The interviews covered the same questions before and after game sessions and included questions on the farmers' practices, knowledge of different systems, and opinions on forming cooperatives. The farmers learned about the technical aspects of agroforestry, such as the amount of investment needed for new infrastructure, and its benefits, such as increased productivity, product quality, and profitability. Haug et al. [82] assessed social learning outcomes through a policy game on EU climate change policy and the member states' sharing arrangements. Participants were asked to draw concept maps [92] before and after game sessions. The analysis of the concept maps was complemented with self-reflective questions asked in questionnaires and interviews after game sessions. Their results show there were some cognitive learning outcomes as certain issues became more central and specific in the post-session concept maps, an indication of restructuring of knowledge.
Pre-post measurements of cognitive learning outcomes through questionnaires were based on qualitative assessment of open questions, quantitative assessment of closed questions using Likert scales, or a combination of the two. For example, Douven et al. [70] assessed the impact on both raising awareness and upgrading knowledge of playing the Shariva game, which aims to stimulate collaboration among water and related professionals and to resolve transboundary river basin issues. They used questionnaires with both open and closed questions before and after game sessions in this assessment. Their results show that participants acquired knowledge on addressing and resolving transboundary river basin issues. Onencan and Van de Walle [87] assessed whether or not participants gained increased situational awareness (the perception of system elements, comprehension of their meaning, and projection of their near-future status) from playing WeShareIt Nzoia. To this end, they applied questionnaires based on SART [95], using 10-dimensional subjective ratings, before and after game sessions in combination with in-game performance measures to measure the game's effects. Their results show an increase in participants' situational awareness, on all dependent variables, between the pre-test and post-test of local policymakers.
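To make the quantitative variant of a pre-post measurement concrete, the following is a minimal sketch of how paired Likert ratings could be compared. It is not the analysis used in any of the reviewed publications: the ratings are hypothetical, the function names `paired_change` and `sign_test_p` are illustrative, and the exact sign test is only one of several tests that suit ordinal paired data.

```python
from math import comb
from statistics import mean


def paired_change(pre, post):
    """Return per-participant differences (post minus pre)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [after - before for before, after in zip(pre, post)]


def sign_test_p(diffs):
    """Two-sided exact sign test p-value; zero differences are ignored."""
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    n = positive + negative
    if n == 0:
        return 1.0
    k = min(positive, negative)
    # Probability of k or fewer successes under a fair coin, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical five-point Likert ratings, for illustration only.
pre = [2, 3, 2, 4, 3, 2, 3, 1]
post = [3, 4, 2, 5, 4, 3, 4, 3]

diffs = paired_change(pre, post)
mean_change = mean(diffs)
p_value = sign_test_p(diffs)
print(f"mean change: {mean_change}, sign test p = {p_value:.4f}")
```

A sign test only uses the direction of each change, which avoids treating Likert steps as equally spaced. With larger samples a Wilcoxon signed-rank test would be a common alternative.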
Thirdly, 10 publications assessed cognitive learning outcomes without asking participants, but rather assessed observed acquisition or restructuring of knowledge by analysing observations and data resulting from game sessions [51,53,54,56,61,71,79,83–85]. To assess whether or not participants improved their understanding of the complexity of planning for a sustainable future through their game Futura, Antle et al. [54] analysed field notes on participant behaviour and quotes by identifying themes through coding. Their results show that participants did not necessarily acquire new knowledge from playing the game, but did obtain a better sense of the sheer complexity of sustainable development. Sausse et al. [84] developed a game for local stakeholders to explore management of both genetically modified and conventional crops in an area. To analyse how farmers manage the coexistence of both crop types, they analysed data collected from gameplay, such as the simulation results, player actions and maps drawn by the participants, complemented with observations from notes and audio recordings, to understand why players made certain decisions and why certain events occurred. They show that farmers learned about how to technically implement management measures and to better assess the risk of coexistence of both genetically modified and conventional crops in fields next to each other.

Normative Learning
Normative learning relates to a shift in viewpoints, values or paradigms. Assessing changes in viewpoints and values is particularly challenging, as these are difficult to measure and require reflection, and thus time, to occur. How normative learning has been assessed in collaborative serious games is therefore of particular interest. Five publications report on normative learning outcomes [60,74,75,78,88]. In addition, Haug et al. [82] did assess normative learning, but did not find evidence of it taking place. Their assessment was based on pre-post measurements, with participants rating the extent to which they agreed or disagreed with propositions before and after game sessions, in combination with participant interviews after game sessions.
Three of the five publications that report on normative learning outcomes used either pre-post measurements or self-reflective questions after gameplay to assess individual changes in views and values. In the evaluation of Ter'Aguas, a game used to simulate negotiations related to land-use planning in a Brazilian municipality, Ducrot et al. [78] applied a short-term and a long-term learning assessment. In the long-term assessment, through interviews conducted eight months after the game sessions, they found that participants had changed their opinion on their role and position in relation to water management issues. For example, a participant noted that (s)he no longer threw away oil after becoming aware of a water quality issue through playing the game.
Two publications applied pre-post measurements that asked participants to indicate to what extent they agreed with propositions, an approach similar to Haug et al. [82]. Meya and Eisenack [60] assessed how playing the board game KEEP COOL, which aims to enhance public understanding of climate change science and to raise awareness among public, scientific and environmental organizations, changes participants' beliefs on international climate politics. To this end, they asked participants to rate their opinions on, for example, personal responsibility towards climate change mitigation and the confidence in politics to act against climate change on a five-point Likert scale in questionnaires before and after game sessions. Their results show that participants perceived increased responsibility and became more confident in the potential of politics to act. Similarly, Sterman et al. [88] used pre-post measurements with closed-question questionnaires to assess changes in attitudes towards climate change from playing the game WORLD CLIMATE, which aims to help participants understand the dynamics and geopolitical implications of climate change. Their results show that participants became more worried about climate change, believed it to be more personally important, and were more likely to urge immediate action.
The other two publications that report normative learning outcomes cover different versions of the Sustainable Delta Game, which aims to help participants learn about preparing water management strategies for an uncertain future. Lawrence and Haasnoot [74] applied an adapted version of the game in a regional water management project in New Zealand to test how dynamic pathways, that is, pro-active planning to adapt to uncertain future developments, can be adopted in decision-making. Based on observations during game sessions, complemented with analyses of the debriefings and interviews with participants after game sessions, they conclude that the game led to group convergence on the necessity of making decisions in uncertain conditions. Van der Wal et al. [75] in turn explicitly assessed such group convergence by measuring changes in group perspectives. They applied perspective mapping, a method rooted in Cultural Theory to classify, interpret, and analyse different individual perspectives [91]. Specifically, participants were asked to select statements they agreed with in relation to a topic during each phase of the game. The method is therefore quite similar to the agreement or disagreement ratings applied by Haug et al. [82], Meya and Eisenack [60] and Sterman et al. [88]. Group convergence was subsequently assessed by comparing the agreement on statements between participants. The analysis was complemented with an analysis of recordings of game sessions to understand why perspectives changed, and with participant interviews after the game to validate that participants found the game and its underlying model credible. Their results show that individual perspectives changed during gameplay, leading to group convergence in most game sessions.
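The idea of assessing group convergence by comparing agreement on statements between participants can be illustrated with a small sketch. This is not the perspective mapping method of Van der Wal et al. [75]; it swaps in a generic measure, the mean pairwise Jaccard similarity of the statement sets that participants select, and all participant labels, statement labels and selections below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of selected statements that two participants have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_agreement(selections):
    """Mean pairwise Jaccard similarity across all participants."""
    pairs = list(combinations(selections.values(), 2))
    if not pairs:
        raise ValueError("need at least two participants")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical statement selections at the start and end of a session.
phase_start = {
    "p1": {"s1", "s2"},
    "p2": {"s3", "s4"},
    "p3": {"s1", "s4"},
}
phase_end = {
    "p1": {"s1", "s4"},
    "p2": {"s1", "s4"},
    "p3": {"s1", "s2", "s4"},
}

agreement_start = group_agreement(phase_start)
agreement_end = group_agreement(phase_end)
converged = agreement_end > agreement_start
print(f"start: {agreement_start:.2f}, end: {agreement_end:.2f}, converged: {converged}")
```

An increase in the agreement score between phases would be read as convergence of group opinion. Whether an observed increase is meaningful still depends on the qualitative evidence, such as session recordings, that explains why perspectives changed.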

Relational Learning
Relational learning relates to obtaining an improved understanding of others' mind-sets, enhanced trust and ability to cooperate. Around half of the publications (17 out of 42) report on relational learning outcomes. Two categories were identified in regard to relational learning outcomes: understanding other roles and perspectives, and building relationships and trust. The first category, reported by 13 publications, relates to participants acquiring an increased understanding of other participants' roles, mind-sets, and points of view [48,49,56,58,63,64,66,70,74,78,80,82,89]. The second category, reported by seven publications, instead relates to enhancing participants' ability to collaborate by building relationships and enhancing trust [48,49,52,55,69,80,81]. Both learning outcomes were assessed mostly qualitatively, based on self-reflective questions in questionnaires, interviews and debriefings conducted after game sessions or well after game sessions (post-post). However, building relationships and trust was reported more often when collaborative serious games were used in a community or an existing project where participants addressed a shared problem that directly impacts them.
For example, in the same long-term assessment by Ducrot et al. [78] described in the previous section, participants indicated in the interviews that they learned about the relationships between issues and stakeholders, and obtained a better understanding of other stakeholders' positions. Similarly, Rumore et al. [58] interviewed participants four to six weeks after game sessions in the assessment of a tailored role-playing game to enhance engagement in the NECAP project, a project that aims to increase the climate change adaptation readiness of coastal communities in New England. In the interviews, participants showed increased empathy and appreciation for other perspectives as a result of playing the game. In particular, participants noted these learning outcomes as a result of taking on another role, looking at the issue from another perspective, and engaging openly with other stakeholders' points of view. Different from these examples, Carson et al. [80] used questionnaires both directly after game sessions and three months later to assess the effect of the Multi Hazard Tournament on decision-making, social learning, and relationship building in watershed management. The post-game questionnaire showed that learning about other stakeholders' perspectives was the main learning outcome. Moreover, participants indicated in the follow-up questionnaire that game sessions had led to them pursuing potential projects with other participants.
An exception to this approach is Jean et al. [48], who combined self-reflective questions with interaction and social network analysis to assess how playing Aqua Republica, a game focused on sustainable watershed management, enhanced collaboration and knowledge co-creation. In particular, participants were asked to rate the collaboration in their game session both during the session (after completed phases) and after the session. Interaction analysis and social network analysis were applied to session recordings of the game phases to analyse the number of interactions between participants, the quality of interactions and the effect pre-existing relationships had on group dynamics. Their analysis shows that participants shared ideas and forged relationships during game sessions, providing evidence that collaborative serious games can help to enhance connections and develop a mutual understanding between stakeholders.
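As a rough illustration of what a quantitative analysis of interactions during a game phase might involve, the sketch below builds an undirected interaction network from coded exchanges and derives two basic social network measures: density and degree. The coding scheme, the event list and the function names are assumptions made for this example; the review does not describe the specific measures or tooling used by Jean et al. [48].

```python
from collections import Counter


def interaction_network(events):
    """Count undirected interactions per pair of participants."""
    ties = Counter()
    for speaker, addressee in events:
        if speaker != addressee:
            ties[frozenset((speaker, addressee))] += 1
    return ties


def density(ties, participants):
    """Share of possible participant pairs that interacted at least once."""
    n = len(participants)
    possible = n * (n - 1) // 2
    return len(ties) / possible if possible else 0.0


def degree(ties):
    """Number of distinct interaction partners per participant."""
    partners = Counter()
    for pair in ties:
        for person in pair:
            partners[person] += 1
    return partners


# Hypothetical coded exchanges (speaker, addressee) from one game phase.
events = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("B", "C")]
participants = ["A", "B", "C", "D"]

ties = interaction_network(events)
phase_density = density(ties, participants)
partners = degree(ties)
print(f"density: {phase_density:.2f}, partners: {dict(partners)}")
```

Comparing such measures across game phases, or between participants with and without pre-existing relationships, is one way the amount of interaction could be tracked. The quality of interactions would still require qualitative coding of the recordings.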

Collective Action
Apart from cognitive, relational, and normative learning outcomes, social learning should ultimately lead to collective action [14,15,18]. Arguably, the best indicator of collective action is showing that using a collaborative serious game led to real-world impact. Indeed, a few publications describe how game sessions within a project or community led to real-world initiatives and decisions. For example, the application of the Sustainable Delta Game in a water management project in New Zealand by Lawrence and Haasnoot [74] led to the adoption of an adaptive pathways strategy in the project's climate adaptation strategy. In the interviews conducted before game sessions by Salvini et al. [81], farmers indicated that they had had negative experiences with farmer cooperatives in the past. In interviews after game sessions, farmers individually indicated that they were considering creating a new cooperative together with other participants.
One publication, however, set out to measure the real-world impact of using a collaborative serious game. Meinzen-Dick et al. [83] used two different follow-ups after game sessions that aimed to improve local community understanding of groundwater interrelationships and to stimulate collective governance of groundwater in India: (1) interviewing locals who did not participate in game sessions to measure whether or not there were any spill-over learning effects; and (2) gathering community-level data of different communities in the region, both communities where the collaborative serious game was used and communities where it was not used, to evaluate whether or not the game led to changes in community practices. In relation to the latter, more of the communities where the game was used adopted lessons from the game, a statistically significant difference, leading Meinzen-Dick et al. [83] to conclude that playing the game led to real-world impact.
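A comparison of adoption between communities with and without game sessions can be sketched as a test on a two-by-two table. The counts below are invented for illustration and the one-sided Fisher exact test is an assumed choice; the review does not report which statistical test Meinzen-Dick et al. [83] applied, and their design may have controlled for additional factors.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Returns the probability of observing a or more adopters in the first
    row if adoption were unrelated to group membership.
    """
    row1 = a + b          # communities where the game was used
    col1 = a + c          # communities that adopted lessons
    total = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, x) * comb(total - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


# Hypothetical counts, for illustration only.
game_adopted, game_not_adopted = 12, 8
control_adopted, control_not_adopted = 4, 16

p_value = fisher_exact_one_sided(
    game_adopted, game_not_adopted, control_adopted, control_not_adopted
)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value would indicate that the higher adoption rate in game communities is unlikely under chance alone. It does not by itself rule out other differences between the two groups of communities.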

Reflection and Limitations
All literature reviews aim to provide a complete overview of relevant publications and the state of the art. There are, however, some threats and limitations to note in that regard. Firstly, a literature review is always limited by the keywords that it uses in the document search. In this case, a combination of generic and specific keywords was used in relation to sustainability and disciplines dealing with sustainability issues. The generic keywords aimed to make sure collaborative serious games on a variety of sustainability topics were found. The specific keywords were added based on relevant review papers and collaborative serious games known to the researchers. A threat here is that this prior knowledge results in a bias. However, the generic keywords, in combination with snowballing from identified publications, limited any influence of such a bias.
Another threat is that the analysis of the identified publications, particularly during the study selection and quality assessment steps, contains some subjectivity. After all, the majority of these analysis steps were conducted by a single researcher, the first author of this paper. To limit this threat, a systematic approach to the review was applied, following the guidelines by Kitchenham and Charters [33]. Firstly, the study selection protocol enabled an initial selection that excluded publications clearly out of scope of the review, but included publications that required more scrutiny and in-depth assessment. Secondly, the quality assessment step added a structured way of selecting the publications relevant to include in the review. While answering the quality assessment questions may still have included some subjectivity, this could only affect publications that scored just short of the threshold and were thus excluded. These publications were carefully looked at again after completing the entire quality assessment to check whether or not exclusion was the correct conclusion. None of these publications were subsequently added to the review.

Conclusions
Serious games are increasingly explored as a method to establish social learning on sustainable natural resources management and urban planning. The aim of this review was to answer the main research question: What is the current state of the art of the different methods and procedures used to assess social learning outcomes of collaborative serious games? A systematic literature review was conducted, which reviewed 42 publications that applied serious gaming to collaboratively explore sustainable management strategies of land and natural resources. The publications were analysed in order to determine how three types of learning outcomes (cognitive, normative, and relational learning) were assessed. To this end, data were extracted from the publications in relation to five research questions and categorized following an inductive approach. The categorizations made it possible to identify approaches used in relation to assessing cognitive, normative, and relational learning outcomes of collaborative serious games.
Most evaluations of collaborative serious games focus on assessing cognitive learning, which relates to the acquisition of new or the restructuring of existing knowledge, for example whether or not participants obtain a better understanding of the complexities of climate change. Few evaluations focus explicitly on normative learning, which relates to a shift in viewpoints, values or paradigms, for example on an individual's feeling of responsibility towards climate change mitigation. That few evaluations assess normative learning is not surprising, as it generally requires experience, reflection, and time before a person would shift viewpoints, values or paradigms. Nevertheless, the reviewed publications do show that collaborative serious games can lead to normative learning. About half of the reviewed publications report on relational learning outcomes, which relate to obtaining an improved understanding of others' mind-sets, enhanced trust, and ability to cooperate. That only half assess this learning type seems to relate to whether or not relational learning is seen as part of the conceptualization of learning and whether or not the collaborative serious game is used in an existing project or community.
Table 9 provides an overview of the approaches used in the assessment of the three learning outcomes, including the common methods and procedures used. The review shows that three common approaches are used to assess cognitive learning outcomes. Firstly, asking participants self-reflective questions, such as what they learned, after game sessions, either in a qualitative assessment through open questions or a quantitative assessment using closed questions. Secondly, applying measurements of participants' knowledge before and after game sessions, also based on self-reporting by asking open or closed questions to participants. Thirdly, assessing observed acquisition or restructuring of knowledge by analysing observations, either qualitatively or quantitatively, and data collected in-game.
Two approaches are used to assess normative learning outcomes. Firstly, asking participants to rate their views on propositions in pre-post measurements, optionally complemented with such ratings during game sessions, through closed-question questionnaires. Next to assessing individual shifts in viewpoints, pre-post measured individual shifts can be compared within the group of participants to determine whether or not convergence of group opinion occurs. Secondly, interviews with participants well after game sessions (weeks to months) can be used to reflect on the impact of collaborative serious games on normative learning.
Two approaches are also used to assess relational learning outcomes. Firstly, asking participants self-reflective questions following game sessions (the approach used in all but one publication reviewed) in order to determine whether or not participants gain a better understanding of other stakeholders. Secondly, asking participants to self-report the level of collaboration in combination with analysing participants' interactions during game sessions. This approach quantitatively analyses the group dynamics during game sessions in order to determine whether or not relationships are enhanced and a mutual understanding is developed.
Of course, evaluation approaches are not mutually exclusive. Many of the reviewed publications in fact combine approaches or use multiple methods to assess learning outcomes. A multi-method evaluation approach is also recognized to fit well with serious gaming in general. However, explicitly assessing cognitive, normative, and relational learning may help to separate the nature of learning through collaborative serious games. The overview provided in this paper may therefore serve as a reference for scholars designing and evaluating collaborative serious games, particularly those that address sustainability problems, but may also be useful in relation to serious games that aim to collaboratively address problems in complex systems in general.

Appendix A
Table A1. Overview of all publications included in the review, the quality assessment and the categorization of the data in relation to the research questions.

Sustainability 2018, 27
Figure 1. Number of publications included after each step. Flow chart based on the PRISMA guidelines [47].

Figure 2. Overview of the publications retrieved and included in the review: (a) overview of publications retrieved per database or through snowballing (note: some publications appeared in more than one database); (b) overview of the number of publications included in the review per year of publication.

Table 1. Inclusion and exclusion criteria.

Table 4. Overview of categorization for Research Question 1 (RQ1) on the conceptualization of learning through collaborative serious games.

Table 5. Overview of categorization for RQ2 on when data is collected in the evaluation of collaborative serious games.
4.3. What Methods Are Used in the Evaluation of Learning through Collaborative Serious Games?

Table 6. Overview of categorization for RQ3 on the methods used in the evaluation of collaborative serious games.
4.4. Do Evaluations of Learning through Collaborative Serious Games Use Quantitative, Qualitative or a Combination between Quantitative and Qualitative Data?

Table 7. Overview of categorization for RQ4 on use of qualitative and/or quantitative data in the evaluation of collaborative serious games.
4.5. What Are the Learning Effects of Applying Collaborative Serious Games According to Their Evaluations in Relation to Social Learning?

Table 8. Overview of categorization for RQ5 on the learning effects of applying collaborative serious games according to their evaluations in relation to social learning.

Table 9. Overview of approaches used to assess cognitive, relational and normative learning through collaborative serious games.