Measuring the Change Towards More Sustainable Mobility: MUV Impact Evaluation Approach

Urban areas can be considered the ground for the challenges related to the UN’s Sustainable Development Goals (SDGs). The objective of shaping cities as human settlements that will see a more inclusive, safe, resilient, and sustainable future is often argued in the literature to depend on the behavioral change of inhabitants in urban areas. In this paper, the authors question whether experimental applications based on gamification can co-produce more sustainable neighborhoods, using an impact evaluation method that starts from individual choices within the complexity of urban mobility. This investigation is carried out within MUV (Mobility Urban Values), an EU research and innovation project that aims to trigger more sustainable urban mobility in six pilot cities. This article describes the method of validation, an impact assessment of the MUV experimental gamification in the pilot cities, intended to provide evidence for future urban strategies. This methodological approach is based on an evaluation structured on indicators of both impact and process that are suitable for urban contexts. Because it is based on six pilot cities, with possibilities for transferability to other contexts and scalability to other cities, the method represents a reference work for the evaluation of similar experimental applications.


Introduction
In recent years, urban sustainability has received special attention in policy and research. This is also shown by the United Nations Sustainable Development Goals (SDGs) [1], which dedicate an entire objective of the 2030 Agenda for Sustainable Development to cities, specifically to 'make cities and human settlements inclusive, safe, resilient, and sustainable' (goal 11). Urban areas are touched in one way or another by almost all the UN's SDGs, which also requires addressing issues such as road traffic (e.g., goal 3.6) and air pollution in cities (e.g., goals 3.9 and 13.2). In parallel, researchers, private companies, and public bodies are nowadays focusing on sustainability approaches that turn to participative and people-oriented solutions for future cities. Approaches of this kind include questions on how people can adopt positive attitudes and routines regarding the use of non-renewable resources. Moreover, the literature on sustainable change often focuses on how people can individually be oriented, and sometimes persuaded, to use fewer resources, for example by raising awareness of the benefits of achieving more systemic and collective sustainable transitions. Such a movement seems to take as a basic principle what Stephen Wendel pointed out [2]: people are reluctant to change their habits. Individuals rarely recognize their power to affect and improve the liveability of the context they inhabit, be it their own neighborhood, city, region, or planet, and their own life. It is indeed difficult to believe in the 'butterfly effect' (i.e., that our individual routines have an impact on an entire ecosystem), and thus to recognize as fundamental and relevant the causal link between every small change in daily routines and its impact on the systemic dimension of the SDGs.
In an attempt to enhance this awareness, studies on behavior change highlight gamification as an effective perspective [3]. Gamification is often defined as the use of game design elements that structure playful activities [4] in non-game contexts [5]. Game design elements are generally driven by a user perspective that leverages individual motivation and/or perception to provide more effective, efficient, engaging, enduring, and entertaining experiences [6,7]. Points, ranks, levels, competitions, challenges, rewards, badges, or reputations are designed to keep users, as players, in the game [8]. Gamification has been widely used as a tool for user-oriented design of products or services, but also in business, with the purpose of motivating individual behavior change.
Since the beginning of the twentieth century, psychologists, anthropologists, and philosophers have studied the function of playing for human beings. Karl Groos, in his "The play of man" [9], besides acknowledging that the instinct of playing has deep physiological bases, also argued that "play" has a fundamentally social function: it offers humans (and animals) a tool to master those activities that bring prosperity to their species. Later, Bernard Suits [10] argued that when individuals engage in gamification, they temporarily change the perception of their individual experience, sometimes breaking habits: "playing a game is the voluntary attempt to overcome unnecessary obstacles". Further studies on gamification have pointed to diverse psychological motivations (e.g., social motivation, intrinsic benefits, monetary and/or personal rewards), which have proven to have the power to change people's choices about their own behavior, as shown, for example, in Mihaly Csikszentmihalyi's study [11]. Gamification can be seen as a means to facilitate change at the level of individual unsustainable habits by promoting more informed, enjoyable, and environmentally and socially friendly choices.
In the last decade, physical and virtual urban games have been designed all over the world to test these hypotheses, but how can the gamification of urban mobility experiences become effective and valuable?
MUV (Mobility Urban Values) is a research and innovation action (2017-2020) based on an experimental approach to gamification applied in six European neighborhoods (Buitenveldert in Amsterdam, Sant Andreu in Barcelona, the historic district of the Portuguese county of Fundao, Muide-Meulestede in the harbour of Ghent, the new area of Jätkäsaari in Helsinki, and the Centro Storico in Palermo). The MUV project [12] is a concrete attempt to experiment with a gamification approach, with the main objective of engaging people to promote a shift towards more sustainable and healthy urban mobility. The MUV engagement strategy of co-creation and co-design [13] is based on urban governance and participatory design theories [14,15] that aim at triple-loop learning [16]. Public involvement of citizens and policy actors occurs progressively through thin and thick levels of participation [17]. It serves the co-creation of game communities and the co-design of game solutions that aim to enhance interaction and to transform urban policy through 'conversational planning' among a variety of actors (from communities to public authorities and vice versa).
The MUV method of co-creation and co-design develops socially through a direct dialogue with the local communities and stakeholders, and technically through a mobile app, a network of air monitoring stations, and dashboards, all designed through participatory and collaborative methods.
The effort to raise individual and collective awareness of car dependency and air pollution aims to provide citizens with an enjoyable game experience, while feeding an evidence-based approach to urban policies with the people-centered mobility data collected. The active involvement of citizens and other local stakeholders can consequently result, in the long term, in more efficient and cost-effective urban planning processes, while contributing to global sustainability goals.
The MUV app enables an activity-based game, previously discussed in [18], through a sports metaphor that connects diverse kinds of users: (i) citizens, as MUVers, play as athletes to get rewards for their sustainable mobility choices, i.e., walking, biking, and using public transportation; (ii) public authorities, as trainers, provide training sessions to coach athletes to improve their sustainable mobility skills; and (iii) local business communities, as sponsors, have the opportunity to promote their brand and their products through the athletes' best achievements. The MUV app (Figure 1) connects all these users when MUVers press a start button, choosing their sustainable transport mode at the beginning of their journey, and press it again when they arrive at the destination. As a result of playing MUV, highly valuable spatio-temporal mobility data are collected and used for impact assessment purposes and for supporting mobility planners and the processes they can follow to design new mobility policies. One of the main objectives of MUV with respect to the SDGs is also to provide a sensitive impact structure that will validate the MUV assessment method. The impact assessment method is structured as a systemic measure of the SDGs, starting from the tracking data collected. MUV impact assessment is based on the following research questions: The MUV assessment method is framed in this paper around the systemic evaluation structure of MUV impacts, according to the diverse spheres of sustainability.

Materials and Methods
The impact assessment framework has been developed to validate and measure the added value of the MUV experimental gamification approach, with a view to structuring a method that is both scalable and replicable in other contexts, thus ensuring that other cities can easily adopt the whole evaluation approach.

The Impact Assessment Framework
In accordance with the CIVITAS SATELLITE approach [19], the MUV assessment framework covers not only impacts (impact evaluation) but also the process (process evaluation). The final aim of the latter is to understand the barriers to MUV implementation, the actions to overcome such barriers, and the drivers to leverage. The process evaluation is, therefore, synergistic with the impact assessment and instrumental in answering the research questions set above.
MUV's assessment approach, based on several steps, develops a method suitable for impact and process assessment in urban contexts, accounting for the peculiarities of places and spaces while, at the same time, allowing comparability of the results across the six pilots (2017-2020). To obtain a transparent and correct understanding of the impact and the measure, the evaluation in each individual city/neighborhood must follow the same evaluation guidelines. In particular:
• the indicators for measuring the MUV impacts have been selected to be comparable across all the pilot cities. This selection does not prevent cities from having their own additional local indicators that are important for the impact assessment (outside the scope of the project), but only the proposed set of impact indicators guarantees consistency across all the cities;
• the methods for measuring indicators in the cities are aligned, so that differences in results can be revealed.
Special attention is paid to the identification of a set of indicators for impact evaluation (see Section 2.3 for details) according to well-grounded guiding principles. These principles concern (i) the transformation of MUV objectives from general indicators to comprehensive indicators, (ii) the coverage of the indicators, conceived in relation to the meaningful impacts derived from the MUV experimental gamification, and (iii) the selection of indicators, which considers both the availability of existing indicators and data and the derivation of indicators from MUV's objectives themselves.
Since indicators are measured to orient progress toward goals, an overall guiding principle is that suitable indicators have been selected by capturing the essence of MUV objectives, regardless of whether such indicators are already available or in use for a targeted phenomenon. As detailed in [20], it is essential to link the indicators to the objectives in view of future monitoring and evaluation activities; without clear objectives, it is not possible to monitor and evaluate whether an innovative action is on track.
With this in mind, the indicators for impact evaluation have been selected according to the following MUV objectives:
• OBJ 1: Sustainable urban mobility/new mobility culture: MUV promotes a shift towards more sustainable mobility in urban contexts; individual choices are at the core of an impact evaluation based on a behavioral change approach to reducing urban vehicle traffic;
• OBJ 2: Better health and environment: MUV raises citizens' awareness of the quality of the urban environment and promotes healthier mobility choices, leading to a better environment;
• OBJ 3: Evidence-based and human-centered urban mobility planning: MUV promotes the integration of people and personal mobility data into urban policy-making and planning processes at the neighborhood level;
• OBJ 4: Foster local development: MUV is likely to generate positive spillover effects on the whole neighborhood and its surroundings, even at the city level, involving local businesses and stimulating an innovative environment.
Moreover, the proposed assessment framework envisages the following four impact areas, which are well-grounded in the literature on impact evaluation in smart cities [19]:
• IA-1 Society-People: it refers to the effects of the measure on the citizens living in the neighborhood and in the city, in terms of acceptability, mobility habits, perceived wellbeing, and new opportunities at the community level.
• IA-2 Society-Governance: it refers to the effects of the measure on the way society is organized in terms of governance, e.g., planning and urban mobility policies.
• IA-3 Economy: it focuses on the effectiveness and/or benefits derived from the measure in relation to the costs associated with its preparation, implementation, and operation, together with the economic spillover effects of MUV implementation on local development.
• IA-4 Environment: it relates to the effects on the environment of reducing the use of private motorized transport thanks to the measure, covering both polluting emissions and energy consumption.
The relationships between objectives and impact areas are the guiding principle for defining the impact indicators: they guarantee that the resulting set of indicators measures the effects that are meaningful to MUV and covers different perspectives of the same result.
An overview table providing the list of impact indicators for each impact area and sub-area is shown in Table 2. The whole assessment framework is subject to a continuous review and refinement along the project lifetime in order to assess the feasibility of the baseline computation and of successive monitoring and evaluation; this flexible approach ensures a regular check and a continuous adjustment of the framework, catching new data opportunities, new trends at the societal level, and new policy and planning objectives of the six administrations.

Impact Evaluation
MUV impact evaluation is prospective, since it has been developed at the same time as MUV action has been designed, and baseline data have been collected prior to the implementation. Prospective evaluations have the best chance to generate valid counterfactuals since, at the design stage, alternative ways to estimate a valid counterfactual can be considered. The resulting impact evaluation is, thus, more likely to produce strong and credible evaluation results.
The impact evaluation follows a before-and-after comparison, as in Figure 2. The before-and-after comparison attempts to establish the impact of MUV by tracking changes in outcomes for program participants over time. The 'before' situation is described by the baseline (see Section 2.4). Since the impact is the difference in outcomes for the same individual with and without participation in MUV, and since it is impossible to measure the same person in two different states at the same time (at any given moment, an individual either participated in MUV or did not), we encounter the so-called "counterfactual problem": how do we measure what would have happened in the absence of MUV? Although we can observe and measure the outcome for MUV participants, there are no data to establish what their outcomes would have been in the absence of MUV, that is, the counterfactual.
Information about the counterfactual (i.e., what the outcome would have been in the six neighborhoods had the MUV action not been implemented) is necessary in order to isolate the MUV impacts from the observed changes. Due to practical constraints, the counterfactual estimation in MUV does not use a control group and envisages two different options, one of which is selected for each impact indicator:
A. constructing, together with the involved stakeholders (e.g., pilot managers, mobility experts), a reference scenario that, starting from the same baseline, could provide a likely counterfactual; or
B. using the pre-MUV values at the baseline (t0) to estimate the post-MUV outcomes (counterfeit counterfactual).
A slight adjustment of the second option has been introduced as a third alternative (option C) for the indicators related to the kms traveled on frequent routes (see Section 2.3 for more details about these indicators). In this case, the counterfactual estimate is given by the travel behavior of all the pilot's players on their most frequent routes, as declared when registering those routes in the MUV app (thus, not necessarily at t0). In this way, any behavioral change in their mobility patterns can be estimated as the difference between the observed behavior and the counterfactual (i.e., the travel behavior they declared when registering their frequent routes). Table 1 summarizes the possible options for the counterfactual estimate of MUV impact indicators; the choice made for each impact indicator is shown in the dedicated column of Table A1.

Option    Type of counterfactual estimate
A         Constructing a reference scenario together with the involved stakeholders (e.g., pilot managers, mobility experts, local decision makers).
B         Using the pre-MUV values at the baseline (t0) to estimate the post-MUV outcomes (counterfeit counterfactual).
C         Using as reference scenario the travel behavior information provided by the player during the registration of each frequent route (not necessarily at t0).

During the monitoring phase, all the impact indicators defined in Table 2 are being collected and analyzed building on a defined monthly monitoring plan, allowing the evaluators to adjust and refine the impact assessment framework according to data and evidence collected.
Finally, in the 'after' evaluation, the impacts will be evaluated at the end of MUV action, by comparing outcomes with the counterfactual.
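The comparison between observed outcomes and the counterfactual can be illustrated with a minimal Python sketch. It is not the project's implementation: the class, field, and function names are hypothetical, and the only elements taken from the text are the three options (A, B, C) and the idea that the impact is the observed value minus the counterfactual estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndicatorObservation:
    """One impact indicator in one pilot (all field names are illustrative)."""
    name: str
    option: str                                  # "A", "B" or "C", as in Table 1
    observed: float                              # value measured after the action
    reference_scenario: Optional[float] = None   # option A: stakeholder scenario
    baseline_t0: Optional[float] = None          # option B: pre-MUV value at t0
    declared_at_registration: Optional[float] = None  # option C: declared behavior


def counterfactual(obs: IndicatorObservation) -> float:
    """Return the counterfactual value that matches the chosen option."""
    sources = {
        "A": obs.reference_scenario,
        "B": obs.baseline_t0,
        "C": obs.declared_at_registration,
    }
    if obs.option not in sources:
        raise ValueError(f"unknown counterfactual option: {obs.option!r}")
    value = sources[obs.option]
    if value is None:
        raise ValueError(f"no counterfactual data for option {obs.option}")
    return value


def impact(obs: IndicatorObservation) -> float:
    """Impact = observed outcome minus the counterfactual estimate."""
    return obs.observed - counterfactual(obs)


# Invented figures: weekly km by car on frequent routes, option C.
example = IndicatorObservation(
    name="km traveled by car on frequent routes",
    option="C",
    observed=30.0,
    declared_at_registration=50.0,
)
car_km_impact = impact(example)  # negative means fewer km by car than declared
```

Keeping the option as an explicit field mirrors the dedicated column of Table A1, where each indicator carries its own type of counterfactual estimate.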

Impact and Context Indicators
The proposed assessment framework envisages two types of indicators: impact indicators (measuring the impacts generated by the action) and context indicators (providing information describing the geographical context of interest, i.e., the neighborhood/the city). Concerning the process evaluation, such assessment does not envisage process indicators for now, but it deals with a qualitative assessment of the whole process of implementation, investigating the enabling and inhibiting factors.
We are interested in impact indicators, rather than performance indicators, since, in order to answer the above-mentioned research questions, the focus should be on the extent to which a specific initiative, i.e., MUV, has had an impact on different aspects (e.g., society, economy, environment). Since there is no ideal performance target to be achieved, targets will not be defined for MUV impact indicators.
As far as the temporal classification of results is concerned, i.e., the results chain, the decision is to merge results into the single category 'impact'. The choice of aggregating the chains of causality is due to several reasons [21]: (i) after detailing the chain's structure, it becomes evident that some chains (considered as minor) could be deleted; (ii) to be practical, the number of categories should not be excessive; (iii) a temporal classification of the results would have added greater complexity to the assessment framework. We are aware that output and outcome indicators could be affected within the project lifetime, while impact indicators are likely to be affected only after the project has been implemented and is in full use, which might take a few years. Nevertheless, (long-term) impacts are included among the MUV indicators since they are a fundamental measure for reaching the project objectives, making it clear how progress toward strategic objectives will be assessed. Thus, from now on, we refer to impact indicators as indicators of the short-, medium-, and long-term effects generated by the measure.
Various institutes and authorities have developed mobility indicators. Even though there is consensus on meeting the 'triple bottom line' (i.e., environmental, social, and economic sustainability), different indicator sets have been used to evaluate mobility measures in an urban context [22,23]. MUV impact indicators come from different initiatives, such as CIVITAS [19], CITYKeys [22], and TrafficO2 [24]; whenever necessary, tailor-made indicators have been designed. Figure 3 summarizes the main data sources for MUV impact indicators: the MUV app, pilot managers, MUV monitoring stations, and local decision makers. Table 2 presents the set of indicators, which is detailed in Table A1, including a column specifying the data source of each indicator. The impact indicators have been classified into the four impact areas introduced in the impact assessment framework (see Section 2.1); moreover, some impact sub-areas have been identified to better organize the set of indicators. Since the method aims to measure the impacts on the whole system (i.e., city/neighborhood), the impact indicators do not focus on a single individual. The computation of some indicators does need individual data (e.g., the indicators whose data source is the app), but their final value indicates an impact of the action on the neighborhood/city. Many impact indicators whose data source is the MUV app relate to the kms traveled on frequent routes by the players of each pilot. The choice of computing such indicators only in relation to the frequent routes traveled relies on the fact that a real behavioral change occurs only if the MUVers change their daily mobility behaviors (i.e., the ones on their frequent routes), and not if they are only occasionally sustainable. Consider, for instance, an employee who goes to the workplace from Monday to Friday in his private car. Suppose that, on the weekend, he goes jogging and has an occasional bike ride, thus accumulating a lot of points on the MUV app. This alone does not make his mobility habits 'sustainable'. A real change in his mobility habits will rather be seen when he changes the mode of transport he uses to go to the workplace every day, leaving his car at home and going to work, for example, by bike.
Where possible, the choice has been to define ratio indicators, that is, measures normalized to facilitate comparisons (e.g., per-year, per-capita, per-mile, per-trip, per-vehicle-year). Some indicators have been proxied by the app's data, even though other indicators would have been more appropriate had they been available. During the continuous update of indicators, some indicators can be added or modified if new data sources become available.
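As a small illustration of why normalization helps comparison, the following Python sketch divides an absolute total by a number of units. The function name, the pilot labels, and all figures are invented for the example and are not MUV data.

```python
def per_unit(total: float, units: float) -> float:
    """Normalize an absolute total by a number of units (players, trips, ...)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total / units


# Invented figures: (km traveled by bike in one month, number of active players).
pilots = {
    "pilot_x": (1200.0, 300),
    "pilot_y": (900.0, 150),
}

km_per_player = {
    name: per_unit(total_km, players)
    for name, (total_km, players) in pilots.items()
}
```

In this made-up case the absolute total is higher in pilot_x, while the per-player value is higher in pilot_y, which is the kind of difference a ratio indicator is meant to expose.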
Together with the impact indicators, a set of context indicators has also been developed. Context indicators provide information describing the geographical context of interest (the neighborhood/the city), and they capture the peculiar socio-demographic features of each neighborhood. They are introduced to facilitate the understanding of the neighborhood's situation and, thus, of its impact evaluation. According to the European Commission [25], context indicators usually deal with economic and financial fields (e.g., GDP, trade flows), social fields (e.g., demography, occupation, gender), and specific important sectors (e.g., education, health, environment). In the MUV case, special emphasis is put on transport and mobility.
The classification of MUV context indicators reflects this taxonomy, covering the following four areas (Table 3): socio-economic (C1-C19), transport (C20-C59), environment (C60-C61), and institutional (C62-C66). MUV context indicators have been collected in a collaborative way, directly involving the pilot coordinators both in the choice of indicators and in the collection of the data. Table A2 details the context indicators that have been collected in each MUV pilot. For each context indicator (row), its geographic level is provided (city/neighborhood/...) together with the corresponding data source. The involvement of the local stakeholders and of the pilot managers during this phase has been crucial, and we expect a continuous involvement of such actors during the evaluation so that the impact assessment results can help policy and decision makers in understanding the real impacts of the mobility measure.
Since, due to their nature, such indicators are not likely to change drastically during the project lifetime, the values reported for each pilot in Table A3 will be updated during the project lifetime only if necessary.

The Baseline
The MUV baseline defines the situation before MUV is implemented in each neighborhood. Operatively, the MUV baseline consists of a set of values for each impact indicator in each neighborhood before MUV comes into force. Appropriate baseline data is always critical for impact evaluation, as it is impossible to measure changes without reliable data on the situation before the innovative action begins.
One remark should be made about the baseline of the impact indicators computed from the app data on the average kms traveled by MUV players in each transport mode in each neighborhood (in IA-1 'Society-People' and IA-4 'Environment'). The baseline for such indicators is provided by data collected during the registration of the players in the MUV app, during which the following information is mandatory:
• the modal split of each user on his/her most frequent route(s);
• the length of such route(s);
• the number of times per week he/she travels such route(s).
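The three pieces of registration data listed above are enough to derive a weekly distance per transport mode for each frequent route. The Python sketch below shows one way to do this; it is an assumption-laden illustration, not the project's code. In particular, the function and variable names are hypothetical, the modal split is assumed to be expressed as shares of distance that sum to 1, and each weekly repetition is assumed to cover the registered route length once.

```python
# Transport modes named in the paper; the identifiers themselves are illustrative.
MODES = ("walk", "bike", "public_transport", "carpooling", "car", "motorbike")


def weekly_km_by_mode(route_length_km, trips_per_week, modal_split):
    """Split the declared weekly distance on one frequent route across modes.

    modal_split maps a mode to its share of the route (assumed to sum to 1).
    """
    unknown = set(modal_split) - set(MODES)
    if unknown:
        raise ValueError(f"unknown transport modes: {sorted(unknown)}")
    if abs(sum(modal_split.values()) - 1.0) > 1e-6:
        raise ValueError("modal split shares must sum to 1")
    if route_length_km < 0 or trips_per_week < 0:
        raise ValueError("length and frequency must be non-negative")

    weekly_total_km = route_length_km * trips_per_week
    return {mode: weekly_total_km * modal_split.get(mode, 0.0) for mode in MODES}


# Invented example: a 5 km route traveled 10 times a week, mostly by car.
declared = weekly_km_by_mode(5.0, 10, {"car": 0.8, "bike": 0.2})
```

Averaging such per-player values over all players of a neighborhood would then give a baseline figure per mode, under the same assumptions.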
Furthermore, for the impact indicators in IA-4 'Environment', further information is requested from the player to estimate his/her personal contribution to air emissions (i.e., CO2, CO, NOx, PMs):
• if the player uses a car on the registered frequent routes, the kind of car she/he generally uses (segment: mini/small/medium/large; fuel: petrol/diesel/petrol hybrids/LPG/CNG/electric; EURO standard); then, the COPERT model [26] is used to estimate the corresponding emissions in her/his urban context;
• if the player uses a motorbike on the frequent routes, the kind of motorbike she/he usually drives (engine: 2 stroke/4 stroke; cubic capacity: <50 cm³/51-250 cm³/251-750 cm³/>750 cm³; EURO standard); then, the COPERT model [26] is used to estimate the corresponding emissions in her/his urban context.
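Once emission factors are available for a vehicle profile, the personal contribution can be computed as distance times factor. The Python sketch below shows only this last step and does not reproduce COPERT: the names are hypothetical, the lookup table is assumed to be filled from COPERT outputs for the relevant urban context, and the numeric factors are placeholders chosen for readability, not real emission values.

```python
from typing import Dict, NamedTuple


class CarProfile(NamedTuple):
    """Car description requested at registration (field names are illustrative)."""
    segment: str        # mini / small / medium / large
    fuel: str           # petrol / diesel / petrol hybrids / LPG / CNG / electric
    euro_standard: str  # e.g. "EURO 5"


# Assumed to come from COPERT for the relevant urban context.
# PLACEHOLDER numbers in grams per km, NOT real emission factors.
FACTORS_G_PER_KM: Dict[CarProfile, Dict[str, float]] = {
    CarProfile("small", "petrol", "EURO 5"): {"CO2": 100.0, "NOx": 0.5},
}


def weekly_emissions_g(profile: CarProfile, weekly_car_km: float) -> Dict[str, float]:
    """Weekly emissions in grams per pollutant: distance times emission factor."""
    if weekly_car_km < 0:
        raise ValueError("distance must be non-negative")
    if profile not in FACTORS_G_PER_KM:
        raise KeyError(f"no emission factors available for {profile}")
    factors = FACTORS_G_PER_KM[profile]
    return {pollutant: weekly_car_km * f for pollutant, f in factors.items()}


# Invented example: 40 km per week by car on the registered frequent routes.
emissions = weekly_emissions_g(CarProfile("small", "petrol", "EURO 5"), 40.0)
```

A motorbike profile (engine type, cubic capacity class, EURO standard) could be handled the same way with its own lookup table.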
The baseline values presented here (Table A1) are based on the data collected in each pilot up to 31 December 2018, considering that the app was officially launched three months earlier.
The missing values (NAs) in the baseline are due to the fact that the monitoring stations envisaged by the project have not been installed yet, and thus the environmental data coming from such stations have not been collected.

Process Evaluation
As introduced in Section 2.1, while the impact evaluation aims to quantify the added value of the MUV action in the six neighborhoods along specific impact dimensions, the process evaluation, performed side by side with the impact evaluation, ensures a real understanding of the role the measure can have in a sustainable mobility strategy and provides insight into which elements are crucial to the observed impact. MUV process evaluation is a qualitative analysis of the planning, implementation, and operation activities of MUV. It seeks to understand whether and why MUV has succeeded or failed in each of the six neighborhoods, with the final aim of understanding the barriers to MUV implementation, the strategies to overcome such barriers, and the drivers to leverage. Together with the results of the impact evaluation, the outcome of the process evaluation will be the basis for the recommendations for other European cities interested in joining the sustainable mobility action.
The involvement of local stakeholders and pilot managers is crucial in the process evaluation, thanks to their thorough knowledge of the local context. Detailed process evaluation tables have been defined by the evaluators and are being collaboratively filled in by the pilot managers, in order to obtain a comprehensive understanding of the MUV process in each pilot, considering each neighborhood's peculiarities. A monitoring plan has been defined, consisting of three measurements, roughly one every six months, after the introduction of the app. Figure 4 shows an illustrative template table that pilot managers have to fill in during the ongoing process evaluation: for each step (activation of the community and management of the community), enabling and inhibiting factors are investigated for each stakeholder in terms of drivers, barriers, and lessons learned to be turned into action at the following iteration. Section 3 provides some preliminary results of MUV process evaluation; a complete assessment will be performed at the end of the action, so as to provide recommendations useful for other cities and/or for other sustainable mobility actions.

Results
In the previous section, the general approach to evaluating a sustainable mobility measure has been discussed, using the MUV action as an example of application. In this section, the preliminary results are discussed in terms of the set of impact indicators, the criteria for their measurement, the baseline, and the counterfactual estimate. Since the monitoring of the action is still ongoing, the (ex-post) impact evaluation results cannot be presented yet; they will be the subject of future work by the authors.
Regarding the context indicators, many indicators have been requested at the neighborhood level but, due to the unavailability of data at that geographical level, in some cases they have been collected at a higher level (e.g., city). Not all requested data were available in each pilot; thus, some missing values are present (labeled as 'unknown'). Some remarks can be made by observing some relevant context indicators in the different pilots (Table A3):
• the six neighborhoods have a similar age structure of residents (C7, C8), with Ghent and Palermo having a slightly higher percentage of young population than the other neighborhoods;
• the average available income, measured as the disposable per-capita income of private households, is expressed in PPS (Purchasing Power Standard) to make the figures comparable between regions (Figure 5). The differences in this indicator (C9) are considerable, varying from 11,600 PPS in Fundao to 18,900 PPS in Ghent;
• the social structure also differs in terms of employment (C16), with the unemployment rate going from 3.4% in Ghent to 21.5% in Palermo (Figure 5);
• the rate of smartphone ownership (C17) will be an important context indicator to take into account during evaluation activities, since using a smartphone is a prerequisite for installing the MUV app. Apart from Fundao (67%), this rate is greater than 70% in all the pilots, with a peak of 87% in Amsterdam and Barcelona (Figure 6);
• even though its smartphone ownership rate is quite aligned with that of the other pilots, Palermo is an exception for the use of the internet on the move (C18, C19). Only 72% of individuals aged 16-24 use mobile devices via a mobile or wireless connection in Palermo, against over 92% in the other sites (Figure 6). The difference is even more evident for older age groups, with 11% of individuals aged 55-74 using the internet on the move, against over 50% in the other pilots (with the exception of Fundao);
• considering the road safety dimension (C55-C59), the numbers show a huge difference in people killed in road accidents across the six sites (Figure 8). Helsinki is the safest city among the pilots (0.048 people killed per 10,000 population), while Fundao is the least safe (1.07 people killed per 10,000 population). Unfortunately, for many pilots, it was not possible to collect information from the pilot managers about the quantity and type of road accidents (e.g., car-car/car-bike/car-people);
• as shown in Figure 9, the six cities differ greatly in their weather conditions (C60, C61), which could be a relevant factor in the mobility choice of each individual (e.g., walk, bike). Palermo is the hottest pilot and among the least rainy cities (average annual temperature: 18 °C; average number of days with precipitation per year: 74.3). In Ghent, it rains almost two days out of three (221 days of precipitation in one year), while Helsinki is the coldest city (average annual temperature: 5 °C).
Regarding impact indicators, Table A1 provides an overview of MUV impact indicators, together with their updated baseline values and the type of counterfactual estimate established for each indicator (for the counterfactual legend, please refer to Table 1). More detailed indicator definition sheets have been developed to serve as practical information and usage guidelines for each impact indicator.
Python code has been written for the computation of impact indicators, so that it can be integrated within the architecture framework. The code is composed of three modules. The first module returns, for each most frequent route registered by a player, (a) the latitude and longitude of the origin and of the destination; and (b) the weekly distance traveled [km] and travel time [minutes] in each transport mode (walk, bike, public transport, carpooling, car, motorbike) on that route, in accordance with the frequency and the modal split declared by the player when registering the frequent route. These values are subsequently used as the counterfactual (reference scenario). The second module computes the weekly value of the basic indicators related to the km traveled each week on frequent routes (e.g., km traveled by car in each pilot). It implements an algorithm that estimates the km traveled in each transport mode by each player on the registered frequent routes. Only sustainable transport modes can be tracked through the MUV app (i.e., walk, bike, public transport); it is, thus, necessary to estimate the km traveled by each player by private car and/or motorbike on these routes. Furthermore, the algorithm applies some adjustments to compensate for trips that players forgot to track. Such estimates are based on the information provided by the user while registering each most frequent route. The last module runs every month (according to a defined monitoring plan) and returns the monthly values of the impact indicators. This module also performs the counterfactual estimate for the indicators related to the km traveled on frequent routes in the pilots.
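The logic of the first two modules can be illustrated with a minimal sketch. It is not the MUV implementation: the names FrequentRoute, weekly_reference_km and estimate_private_km are hypothetical, and the rule used to estimate private-mode km (the declared weekly km not covered by tracked trips, shared out in proportion to the declared private-mode shares) is an assumption made only for illustration.

```python
from dataclasses import dataclass

MODES = ("walk", "bike", "public_transport", "carpooling", "car", "motorbike")
TRACKABLE = ("walk", "bike", "public_transport")  # modes the app can track
PRIVATE = tuple(m for m in MODES if m not in TRACKABLE)


@dataclass
class FrequentRoute:
    """Hypothetical record of what a player declares at registration."""
    distance_km: float     # one-way length of the route
    trips_per_week: int    # declared frequency
    modal_split: dict      # declared share of trips per mode, summing to 1


def weekly_reference_km(route):
    """Reference scenario: weekly km per mode implied by the declaration."""
    if abs(sum(route.modal_split.values()) - 1.0) > 1e-6:
        raise ValueError("declared modal split must sum to 1")
    weekly_km = route.distance_km * route.trips_per_week
    return {m: weekly_km * route.modal_split.get(m, 0.0) for m in MODES}


def estimate_private_km(route, tracked_km):
    """Assumed rule: weekly km not covered by tracked trips are attributed
    to private modes in proportion to their declared shares."""
    weekly_km = route.distance_km * route.trips_per_week
    tracked_total = sum(tracked_km.get(m, 0.0) for m in TRACKABLE)
    residual = max(weekly_km - tracked_total, 0.0)
    private_share = sum(route.modal_split.get(m, 0.0) for m in PRIVATE)
    if private_share == 0.0:
        # The player declared no private-mode use on this route.
        return {m: 0.0 for m in PRIVATE}
    return {
        m: residual * route.modal_split.get(m, 0.0) / private_share
        for m in PRIVATE
    }


# Illustrative (made-up) route: 5 km, 10 trips per week, 60% car, 40% bike.
example_route = FrequentRoute(5.0, 10, {"car": 0.6, "bike": 0.4})
reference = weekly_reference_km(example_route)
estimated = estimate_private_km(example_route, {"bike": 25.0})
```

Comparing `estimated` with `reference` for the same week gives the kind of difference against the reference scenario that the monthly module would aggregate; how forgotten trips are corrected in the actual code is not specified here and is left out of the sketch.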
Regarding the baseline presented in Table A1, the authors are aware that the users of the MUV app are not a representative sample of the target population; nevertheless, some important considerations emerge. An indicative example is Palermo's indicator IA1.S2.3 "Modal split", defined as the percentage of km traveled using each transport mode (private car, walk, bike, public transport, motorbike, carpooling) on frequent routes by Palermo's players. Figure 10 shows the large difference between the modal split of Palermo provided by the city administration (representative of the city) and the one derived from the mobility habits declared by Palermo's players during registration to the MUV app (baseline). People providing the baseline for MUV have very different mobility habits from the Palermo population, at least on frequent routes. Palermo's MUV players exhibit highly sustainable mobility habits (IA1.S2.1 "Sustainable mobility habits" = 53.97%), and their use of private motorized transport (car = 32.22%; moto = 11.05%; carpooling = 2.76%; 46.03% in total) is much lower than in Palermo in general (private transport = 78%). This suggests that, based on the data currently available, MUV players are likely to be more sensitive to sustainable mobility issues and therefore constitute a sample that deviates from the average population of the city.
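As an illustration of how a modal split of this kind can be derived from km traveled per mode, a minimal sketch follows. The function names and the km values are hypothetical and chosen only to make the arithmetic easy to follow; treating the sum of the walk, bike, and public transport shares as the "sustainable" share is an assumption consistent with the figures reported above, not the formal definition of IA1.S2.1.

```python
SUSTAINABLE_MODES = ("walk", "bike", "public_transport")


def modal_split(km_by_mode):
    """Percentage of total km traveled in each transport mode."""
    total_km = sum(km_by_mode.values())
    if total_km <= 0:
        raise ValueError("total km must be positive")
    return {mode: 100.0 * km / total_km for mode, km in km_by_mode.items()}


def sustainable_share(split):
    """Assumed aggregate: sum of the shares of walk, bike, public transport."""
    return sum(split.get(mode, 0.0) for mode in SUSTAINABLE_MODES)


# Made-up weekly km on frequent routes for a group of players.
example_km = {
    "walk": 10.0,
    "bike": 20.0,
    "public_transport": 20.0,
    "car": 40.0,
    "motorbike": 5.0,
    "carpooling": 5.0,
}
example_split = modal_split(example_km)
example_sustainable = sustainable_share(example_split)
```

Applied once to the km declared at registration (baseline) and once to figures representative of the city, the same computation yields the two distributions that are compared in a chart such as Figure 10.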
As far as the process evaluation is concerned, the first preliminary findings relate to the step of activation of the neighborhood community, since the management of the community itself has not been investigated yet due to the timing of the monitoring plan. In this regard, Figure 11 summarizes the most prominent results of the first monitoring in the pilots. As depicted in Figure 11, the success factors (i.e., the drivers) shared by almost all the pilots belong to the social and political/institutional spheres. They show, on the one hand, the great relevance of effective communication activities to make people aware of and involved in the sustainable mobility initiative and, on the other, the importance of a strong involvement of the municipality to successfully engage the community. By contrast, the current barriers to the implementation of the initiative are mainly technological: users report that the app enrolment is too long and too complicated, that too many bugs have been detected, and that the 'yet-another-app' syndrome is difficult to overcome. These factors have left many users frustrated, which could drive them away from the community. Thanks also to the surveys carried out within the process evaluation, in the first months after its launch the MUV app has undergone numerous releases to improve the app enrolment and to fix some of the detected bugs, precisely to address the issues raised by the users. Furthermore, each pilot has defined some priority takeaways to be turned into action in the following iteration, in accordance with the strong points and the peculiarities of each local context.
The monitoring of the MUV process and impact indicators is ongoing, and future works will deal with the results of ex-post impact and process evaluation within the framework described here.

Discussion
The MUV evaluation approach described here addresses, and can potentially answer, the questions raised at the end of Section 1 for several practical reasons.
The first is that the overarching evaluation framework covers both impact and process evaluation, leading to an assessment that is both quantitative and qualitative: it attempts to quantify the outcomes (e.g., reduction of urban vehicle traffic) and to capture what is learned from implementing the solution (e.g., how supporting activities could affect MUV outcomes). Moreover, the relationships between MUV objectives and impact areas (Society-People, Society-Governance, Economy, and Environment) are used as a guiding principle to define the impact indicators. This results in a set of indicators that are meaningful for urban contexts and suitable for measuring the effects of experimental gamification that are considered relevant. For example, in MUV, these outcomes generate data that will potentially cover different perspectives of the same result. In addition, the continuous involvement of the local communities during the evaluation process opens the possibility of co-monitoring the impact assessment with local stakeholders, so as to respond to their expectations. Thus, MUV impact evaluation results can lead to concrete and effective insights for urban policy and decision makers and for urban mobility planners. Furthermore, the users' mobility data and the data on environmental conditions collected through the MUV initiative enable more informed policy decisions for urban futures.
The MUV context indicators show how the six pilots are inherently different and, thus, indicate a likely difference in their response to the MUV initiative. Such considerations about context indicators will be used qualitatively by impact evaluators to interpret and explain the outcomes that will be measured in each pilot. In this way, it will also be possible to take into account ongoing mobility initiatives (and/or infrastructure changes) within the neighborhood and the city (context indicators C65, C66). The combination of the baseline and the context indicators provides an overall picture of the situation in each pilot before the MUV initiatives come into action. Such a picture frames the data 'before' the implementation of MUV; at the end of monitoring, MUV outcomes can be quantified and, thus, impacts can be evaluated.
The set of 40 MUV impact indicators (Table 2) is also intended to be used during the MUV co-creation process to support both the pilot managers and the policy-makers in each pilot city towards a 'conversational planning' with the citizens.
As detailed in the previous sections, the evaluation approach described here is scalable and replicable, allowing consistency and comparability among pilot cities. Several other European cities have already been chosen to join the MUV initiative. The method illustrated here is meant to be applied to other urban contexts in order to reflect on the real effects of these experimental gamification experiences, with the aim of feeding a sustainable movement for the future.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
This appendix contains details about MUV impact indicators (Table A1) and MUV context indicators (Table A2) in the six neighborhoods.
The impact indicator codes of Table A1 refer to Table 2, while the counterfactual estimate column refers to the legend provided in Table 1. The context indicator codes of Table A2 refer to Table 3.