Promoting Crowdsourcing for Urban Research: Cycling Safety Citizen Science in Four Cities

Abstract: People generate massive volumes of data on the Internet about cities. Researchers may engage these crowds to fill data gaps and better understand and inform planning decisions. Crowdsourced tools for data collection must be supported by outreach; however, researchers typically have limited experience with marketing and promotion. Our goal is to provide guidance on effective promotion strategies. We evaluated promotion efforts for BikeMaps.org, a crowdsourced tool for cycling collisions, near misses, hazards, and thefts. We analyzed website use (sessions) and incidents reported, and how they related to promotion medium (social, traditional news, or in-person), intended audience (cyclists or general), and community context (cycling mode share, cycling facilities, and a survey in the broader community). We compared four Canadian cities, three with active promotion, and one without, over eight months. High-use events were identified in time periods with above average web sessions. We found that promotion was essential for use of the project. Targeting cycling-specific audiences resulted in more data submitted, while targeting general audiences resulted in greater age and gender diversity. We encourage researchers to use tools to monitor and adapt to promotion medium, audience, and community context. Strategic promotion may help achieve more diverse representation in crowdsourced data.


Introduction
Cities are ideal environments for crowdsourcing geographic information. Within cities, there is access to digital tools (e.g., cellular data networks) and people with expertise and motivation to contribute [1]. Data generated through crowdsourcing can help our understanding of travel behaviour [2], inventory the built environment [3], monitor and identify improvement opportunities [4], and potentially improve public perceptions of new infrastructure projects [5,6]. Within cities, crowdsourced data cover a diverse range of sources and topics, including, but not limited to, city-launched applications for civic services [7], researcher and citizen collaborations to monitor and address public health concerns [8], and widespread contextual geographic information about current events in social media [9]. In particular, crowdsourced tools are providing solutions to a lack of active transportation and mobility data. For example, global positioning systems (GPS) data collected from fitness applications can document bicycle and foot trips [7], information from bike share stations can

Promotion
BikeMaps.org was launched in Victoria, BC in October 2014 and promotional strategies were originally developed in that city. The approaches take several forms: in person at events, guerrilla marketing, earned traditional media, and social media. An example of in-person promotion at events is Bike to Work Week, where local bike advocacy groups organize booths from local organizations, and usually, free food or other services (e.g., bike repairs) where cyclists congregate. The BikeMaps.org team coordinates or partners with these organizations to set up a table, distribute promotional material (branded water bottles, stickers, saddle rain covers, pamphlets, etc.), and attend the booth to answer questions and talk with participants. Guerrilla marketing tactics include low-cost and unconventional marketing approaches. For BikeMaps.org, these have entailed distributing branded water bottles and saddle rain covers on parked bikes. Earned traditional media is associated with articles by journalists in local newspapers. For BikeMaps.org, these usually follow press releases, in this case associated with the launch of promotion activities in the community, but can also occur with the release of BikeMaps.org data products or by providing data to inform local interest. Finally, social media, including Facebook (http://Facebook.com) and Twitter (http://twitter.com), are used to engage with individuals and cycling organizations in BikeMaps.org cities. Since day-to-day social media use is frequent and ongoing, in this study we focused on notable social media events outside of day-to-day use, mostly by outside organizations.
Over the course of the study period, these marketing approaches were used in the three intervention cities. In this paper, we considered all promotion events led by the BikeMaps.org team and all external media that we were aware of. We searched for outside promotion during all periods of above average website use. For each promotion event we recorded: the city, the date, the medium (in-person, social, or traditional news), and the intended audience (people whose primary interest was cycling, or more general audiences).

Web Sessions and Users
Information about web sessions and users at the BikeMaps.org website was obtained from Google Analytics (http://analytics.google.com). The number of sessions and unique users on a daily basis were queried by city. Sessions are defined by Google as a series of temporally contiguous and meaningful interactions with a website (i.e., connections where no interaction takes place do not count) (Google Analytics 2017). Users are tracked based on Google user accounts and web browser metadata (Google Analytics 2017). Demographic information for users is obtained from voluntary social media information, where available, or profiling and classification based on web activity, where not available (Google Analytics 2017). Demographic information for users was queried by city over the entire study duration. Data were obtained using R Version 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria) and the package RGoogleAnalytics Version 0.1-5 (https://cran.r-project.org/web/packages/RGoogleAnalytics/index.html).

Incident Reporting
All incidents were extracted from the BikeMaps.org database for the CDs over the time period. This included the time and location, type of incident, health or ridership impacts, optional demographic information, and open-ended text descriptions. Spatial analyses were completed using R Version 3.3.2 and the package rgdal Version 1.2-5 (https://cran.r-project.org/web/packages/rgdal/index.html).

High-Use Events
To allow comparison between promotion efforts, sessions, and incidents reported, we used web sessions to identify high-use events. Based on visual evaluation of web sessions, we developed the following definition of high-use events: starting on the first day with more than twice the mean number of daily sessions, and lasting until the first day that the number of daily sessions returns below the mean for at least two consecutive days (this accounts for the observation that sessions declined on mid-week statutory holidays and then resumed after). Promotion events, web sessions, and incident reporting were attributed to high-use events based on concurrent timing. All analyses were completed using R Version 3.3.2.
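The definition above can be sketched in code. The following is a minimal Python illustration, not the R code used in the study. It assumes that the mean is taken over the whole series supplied, and that an event ends on the day before the first run of two consecutive below-mean days (the text leaves the exact end day open). The names `find_high_use_events`, `start_factor`, and `quiet_days`, and the example session counts, are hypothetical.

```python
from statistics import mean


def find_high_use_events(daily_sessions, start_factor=2.0, quiet_days=2):
    """Return inclusive (start_index, end_index) pairs for high-use events.

    Assumption: the baseline is the mean of the full series passed in.
    """
    baseline = mean(daily_sessions)
    n = len(daily_sessions)
    events = []
    day = 0
    while day < n:
        if daily_sessions[day] > start_factor * baseline:
            start = day
            end = n - 1  # if use never settles, the event runs to the end
            probe = start + 1
            while probe < n:
                window = daily_sessions[probe:probe + quiet_days]
                # The event closes only when `quiet_days` consecutive days
                # fall below the mean, so one isolated low day (e.g., a
                # mid-week holiday) does not split the event.
                if len(window) == quiet_days and all(s < baseline for s in window):
                    end = probe - 1
                    break
                probe += 1
            events.append((start, end))
            day = end + 1
        else:
            day += 1
    return events


# Hypothetical daily session counts for one city (mean = 16.5).
sessions = [10, 10, 50, 30, 5, 30, 5, 5, 10, 10]
print(find_high_use_events(sessions))  # [(2, 5)]
```

In this example the single low day at index 4 does not end the event, because sessions rise above the mean again on the following day.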

Mode Share
Mode share was obtained from Statistics Canada 2011 National Household Survey for journey to work mode share by city [24]. These data represent the proportion of workers using each mode of travel for most trips to or from work. The data do not sum to 100% because of workers who do not commute or use other means of travel.

Cycling Facilities
Cycling facilities were acquired from OpenStreetMap (OSM). OSM is a crowdsourced project to create and maintain global street mapping data [3]. OSM features are assigned tags to store attributes. Queries were written to interpret tags related to bicycle facilities for OSM ways (line features) (Table 1). Features were identified as separated bike lane, painted bike lane, shared street bikeway (shared with automobiles), or multi-use trail (shared with pedestrians). We chose to use OSM because of the ability to obtain data for all areas from a single source. We intended for this dataset to be indicative of the nature of local cycling facilities, rather than an exact description, as there may be slight deviations in coding, boundaries, local definitions, and completeness between cities [25], and the results depend on the specific queries used. Queries were run using R Version 3.3.2 and the package overpass Version 0.2.0.9 (https://github.com/hrbrmstr/overpass), and data were downloaded in XML format. Using the package rgdal Version 1.2-5, the data were projected to Statistics Canada Lambert Conformal Conic Projection and clipped to CD boundaries, and the distance of each feature was calculated in kilometers. The data were acquired 4 May 2017.
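As an illustration of the final step only (summing feature lengths by facility type), the following Python sketch may help. It is not the rgdal workflow used in the study. It assumes that each feature has already been classified and projected to a planar coordinate system with units of metres, so that segment lengths can be computed with the Euclidean distance. The function names and the example coordinates are hypothetical.

```python
from collections import defaultdict
from math import hypot


def line_length_km(coords_m):
    """Length of a polyline given as (x, y) pairs in projected metres."""
    segments = zip(coords_m, coords_m[1:])
    return sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in segments) / 1000.0


def total_km_by_type(features):
    """Sum lengths per facility type.

    `features` is an iterable of (facility_type, coords_m) pairs; the
    facility types are assumed to come from an earlier tag-based query.
    """
    totals = defaultdict(float)
    for facility_type, coords_m in features:
        totals[facility_type] += line_length_km(coords_m)
    return dict(totals)


# Hypothetical features in projected metres.
example_features = [
    ("painted bike lane", [(0, 0), (3000, 4000)]),
    ("multi-use trail", [(0, 0), (0, 1000), (1000, 1000)]),
    ("painted bike lane", [(0, 0), (0, 2000)]),
]
print(total_km_by_type(example_features))
```

Lengths measured this way depend on the projection chosen, which is one reason the totals are best read as indicative rather than exact.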

Questionnaires
The survey was designed by the Traffic Injury Research Foundation (TIRF) and fielded by Nielsen using Harris Panel participants, including third-party panel providers. Panelists were invited to participate by email between 17 October 2016 and 31 October 2016. Invites were sent proportionately to the general Canadian population and the final results were weighted to represent the general population of the targeted cities based on Statistics Canada's population counts in the 2011 Canadian census. The overall response rate was 22%. Panel members were rewarded for their participation with points that could be exchanged for merchandise. Previous studies by members of our research group have used and validated this recruitment approach and the representativeness of the resulting samples in transportation research [26,27].
For this study, we selected questionnaire items to report on attitudes about cycling safety, cycling infrastructure, barriers to cycling for non-cyclists, and what would need to change to start cycling. For the discrete questions, we used a chi-squared test of proportions to look at differences in these outcomes across cities. The null hypothesis was that the proportions for each response were equal for all cities, and the alternative hypothesis was that the proportions were not equal. For the open-ended questions "I do not ride a bicycle because" and "For me to ride a bicycle, the following would need to change", words in the responses were stemmed to their root word, and stop words and words with ambiguous meanings were removed using the R packages tm Version 0.7-1 (text mining) (https://cran.r-project.org/web/packages/tm/index.html) and NLP Version 0.1-10 (natural language processing) (https://cran.r-project.org/web/packages/NLP/index.html). Items mentioned more than three times were tallied into the following themes: (1) physical ability; (2) safety; (3) the built environment (e.g., bike facilities); (4) convenience (e.g., too far to ride, or need to use car for job); (5) the natural environment (e.g., hills or weather); (6) access to a bicycle; (7) social (only observed for the question about what would need to change to ride a bicycle; e.g., "respect" or "education"); and (8) other. A chi-squared test of multiple proportions was completed with the null hypothesis that the city samples were drawn from populations with the same distribution and the alternative hypothesis that the city samples were drawn from populations with differing distributions of responses. All processing was completed using R Version 3.3.2.
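For readers less familiar with the chi-squared test of proportions, the statistic can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. It computes only the Pearson statistic and degrees of freedom for a table of counts (rows as cities, columns as response categories) and assumes that every row and column contains at least one response. The function name and the counts in the example are hypothetical and are not survey results.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows (one per city), each a list of counts
    (one per response category). Assumes no row or column sums to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis that all cities
            # share the same response proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: two cities, two response categories (agree, disagree).
statistic, dof = chi_squared_statistic([[10, 20], [20, 10]])
print(round(statistic, 3), dof)  # 6.667 1
```

The statistic is then compared with a chi-squared distribution with the stated degrees of freedom to obtain a p-value; statistical packages typically perform this step directly.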

Promotion
In the study cities, the most promotion events were in Edmonton (12), followed by Victoria (11) and Ottawa (6). One event happened in Kelowna, unrelated to the team, where a social media posting by an outdoor retailer with national popularity resulted in a rise in web sessions in all cities (Tables 2 and 3). In-person promotion was most frequent (18 events), followed by social media (8), and print (4). The majority of the promotion events targeted cyclists (20), rather than general audiences (10).

BikeMaps.org Use
Peaks in the number of web sessions coincided with promotion events (Figure 2). Incidents reported also had peaks coinciding with promotion events, but were more sustained over time (Figure 1). Ottawa had the highest peaks in web sessions, while in Victoria, web sessions were more ongoing. There were more web sessions in the spring through the fall than the winter. In Edmonton, web sessions and incidents reported coincided with the earlier promotion events, were lower for later promotion events, and were very low when promotion did not occur. Considering the ratio of unique visitors to total website sessions, Victoria had more repeat users, while Ottawa, Edmonton, and Kelowna were closer to a 1:1 ratio (many unique visitors) (Table 4). Other than Kelowna, all cities had the majority of use by males, with similar proportions by gender. Also, for all cities, the majority of website users were greater than 35 years of age.
Figure caption (incidents reported): Table 1 for colour key. Note longer y-axis for site-wide incidents.
Figure caption (web sessions): Table 1 for colour key. Note longer y-axis for site-wide sessions.
Urban Sci. 2017, 1, 21
In Victoria, more people viewed the website without submitting data, while in Ottawa it was more common to actively partake in submitting data, and this was indicated by the ratio of website views to incidents mapped (Table 5). The response rates for complete age and gender reporting were consistent across cities. The median age for incidents with complete gender and age information was higher in Victoria and Kelowna than Ottawa and Edmonton. Finally, the percent of incidents reported by people over 35 years of age was similar to the web sessions for Edmonton, Kelowna, and Ottawa, while a higher percentage of people over 35 reported incidents than viewed the webpage in Victoria. The people who submitted incidents in Victoria over this time period were older than in the previous data, where 35 was the approximate median age [28].

High-Use Events
Fifty-three high-use events were identified, and there were differences in reported age and gender depending on the medium and audience (Tables 6 and A1). Often there were multiple coincident promotions, or at other times, rises in website traffic occurred without promotion. In general, traditional print media corresponded with incidents reported by people with higher median ages and a high proportion of males; events that targeted cyclists corresponded with incidents reported by people with lower median ages and by males (notably, the Edmonton Bike Club had higher female participation); and social media posts by outdoor retailers corresponded with higher median ages and females, though active participation rates (i.e., the ratio of views to incidents reported) were lower than for other media. Spontaneous high-use events tended to occur in cities where previous BikeMaps.org use had occurred, during peak times for cycle commuting (i.e., in the spring and late summer in Victoria, and in the late summer in Ottawa). Since high-use events were identified relative to normal use, this measure was less useful in Kelowna due to low overall use.

Mode Share
Victoria had the highest active transportation mode share (bicycling and walking), approximately double that of Ottawa and Kelowna, and nearly six times that of Edmonton (Table 7). Ottawa had much higher public transit mode share, and motorized vehicle use was highest in Kelowna.

Bicycle Facilities
Edmonton had very few on-street bicycle facilities, but abundant multi-use trails (Table 8).
The distance estimated in this project was larger than reported by the city (160 km, from https://www.edmonton.ca/activities_parks_recreation/parks_rivervalley/trailsystem.aspx), because the definitions used in this study included unmaintained trails, neighbouring communities, and other types of urban paths shared between cyclists and pedestrians. Nonetheless, these figures were indicative of the large population in the CD, very few on-street bike facilities, and many multi-use trails (e.g., the River Valley Trail System) not principally designed for bicycle transportation. In contrast, Kelowna had a much smaller population, numerous painted bike lanes, and some multi-use trails. We may have underestimated the availability of multi-use trails in Kelowna, since we did not include gravel surfaced trails, such as the Kettle Valley Railway and others, which may be used for bicycle transportation in the city. Both Ottawa and Victoria had painted bike lanes and multi-use trails. At the time of writing, only Ottawa had a separated bike lane, while both cities have plans to expand in the future. None of the cities had neighbourhood greenways, i.e., shared lanes in combination with reduced speed limits and traffic calming measures.

Attitudes towards Cycling
The majority of respondents to the survey were male, with a median age of 57 years (Table 9). Respondents in Victoria and Ottawa were slightly more frequently in agreement that bicycling is unsafe. Respondents in Edmonton were more frequently in agreement with the negative view that bicycling lanes cause congestion, while people in the more rural community of Kelowna were less concerned about bicycle lanes causing congestion. For all cities, the most frequently mentioned reasons for not cycling were physical abilities followed by safety, the availability of bike lanes or suitability of roads for cycling, and reasons of practicality or convenience (e.g., long distances). Across all cities, improvements to bike facilities were the most frequently mentioned change that would be needed for people who do not ride bikes to start riding.

Discussion
In this study of the relationships between promotion and use of an urban crowdsourcing project, we found a link between promotion and the periods with the highest website use and incident reporting. Incident reporting only occurred in sufficient numbers to be informative for city planning or research in cities where the project was promoted actively. Incident reporting and website use corresponded with individual promotion events, with obvious peaks in web traffic immediately following promotions and more delayed responses in terms of incidents submitted. Periods of high-use also occurred spontaneously, usually during peak cycle commuting periods in cities where BikeMaps.org had been previously promoted and use was established. Additionally, we found that different cities showed different potential for ongoing use, with more responsiveness to crowdsourced cycling safety tools where there was higher cycling mode share and more bike facilities.
There was a period of above average sessions that occurred in Ottawa in late August and early September and did not correspond with active promotion by the team. This surge in activity followed a series of three serious collisions between cyclists and automobiles between 29 August 2016 and 1 September 2016, including a fatal incident following the opening of a new bike lane that received international media attention. Shortly before this, another high-use event had occurred, when the City of Ottawa included BikeMaps.org in an email newsletter. In research related to disaster preparedness, Monroe et al. [29] found a critical window with highest salience and action mobilization for community engagement in wildfire preparedness immediately after being impacted by a fire event. Similarly, promoting crowdsourcing tools may build latent interest that is later realized in response to community need.
Victoria was the anchoring community where the technology was developed (at the University of Victoria) and the project was launched. The local connection to the project team may have led to wider general interest, resulting in sessions by non-cyclists who may not have had anything to report. As well, in the year prior to the launch, the city of Victoria was engaged about testing a prototype, which may have primed interest in the project. High-use events occurred here in the spring and late summer (times with high cycle-commuting traffic) without promotion, indicating a high level of community awareness of the project. In Victoria, a greater proportion of collisions and near-misses were mapped compared to Ottawa, where a greater proportion of hazards were mapped. In Kelowna, where no intentional promotion took place, reporting was limited, and all incidents with gender reported had male gender indicated, despite a number of web sessions likely by females. With the limited reporting, given the higher rates of male reporting site-wide, it is possible that all contributors who provided gender information were male. Previously mapped data may have influenced how people interacted with the project [30]. This can be described as a network effect, where a product is "valuable to the extent that other people are using it as well" [31]. As a result, different promotion approaches may be successful at different stages of project use, with an emphasis on filling the map with hazards at early stages as a low-barrier entry, and reporting more serious incidents and visualizing data at later stages.
One of the main findings of this work was related to community context, with cities with higher mode share and more cycling facilities achieving more web sessions and submitting more data. More current mode share numbers will be released in the near future, but we expect a similar trend based on experiences in the community and the corroborating cycling facilities data. People in cities with higher cycling mode share and more cycling facilities somewhat more frequently expressed concerns about cycling safety. Additionally, where higher quality cycling facilities were available, cyclists may have held higher expectations for safety. In contrast, despite most of the promotion activity being deployed in Edmonton, sustained use was not achieved. With lower cycling mode share, there are fewer potential mappers. Additionally, local cycling advocacy groups highlighted other activities as priorities for their efforts. Overall, in Edmonton cycling facilities were the most limited, and attitudes towards building new cycling facilities were the most negative in the broader community (i.e., causing congestion for personal automobiles). In contrast, in Ottawa, people were very responsive to a city email promoting BikeMaps.org, generating the second largest high-use event in this study. The email, which was part of the "Cycling in the City" newsletter on updates for city-led bicycling facility improvements, included a heading titled "Help make cycling safer-BikeMaps.org" along with a project description. The large response to this message was likely indicative of receptiveness by the cycling community to communication by the city about cycling. Otherwise, the survey responses, which were mostly by non-cyclists, did not differ very much by city. These findings suggest that in order to achieve regular use, there needs to be sufficient underlying interest, previous positive outcomes from civic participation processes, and support from local groups.
In previous research, Robson et al. [32] found that social media was effective for sharing knowledge in non-profits; however, for generating volunteered citizen science data, partnering with existing organizations was more effective. Research by Cardoso et al. [33] found that eBird, a citizen science project with massive participation, had wider and more diverse social networks compared to other projects with more limited use (diversity over density). We found that promotion events targeted to cyclists were associated with more data submitted, while events targeted to general audiences resulted in more diversity in terms of age and gender. We suggest that both types of promotion and engagement are important. Engaging special interest groups is helpful to generate masses of data and start positive network effects, while later, engaging more diverse audiences benefits the longevity of the project. Beyond age and gender there are other types of diversity to consider, such as representing new cyclists or different socioeconomic classes, and a future research priority is to consider more advanced measures of representation.
Two key considerations for the use of crowdsourced data to inform active transportation planning decisions are data quality and representation [8]. Several crowd-based mechanisms that can help ensure data quality also depend on volumes of use and therefore can be aided by promotion; for example, with many people viewing the data, mistakes can be found and reported, or trusted individuals in the cycling community may help moderate the data [12]. Both of these mechanisms have been observed in BikeMaps.org, with participants contacting the team to report small fixes to improve data quality. Regarding representation, Haklay [30] emphasized, "When using and analyzing crowdsourced information, consider the implications of participation inequality on the data and take them into account in the analysis". A goal for active transportation is to achieve facilities that are safe and appealing to new riders and older people in order to grow the cycling mode share [34]. However, in Western countries with growing cycling populations, often the majority of cyclists are still male [35]. Likewise, for online volunteer mapping efforts, such as OpenStreetMap [36], and also for bike-specific data collection [37], the majority of use has been by young, educated males who are experienced cyclists (in the case of cycling data). Therefore, these data may under-represent the interests of target groups for cycling growth such as older individuals, females, and new cyclists. In this work, we observed a shift towards greater use by older populations in Victoria compared to earlier periods (i.e., higher median age). We also observed different demographic cohorts responding to different types of promotion events. Targeted promotion may provide a tool that can be used in combination with the design of the crowdsourcing tools to engage and improve the representation of target populations.
This research focused on promotion events for uptake of new contributors, which is critical for launching in new cities and because cycling near-misses and crashes are infrequent events that require large reporting populations. Ongoing in-person and social media communication were also used to maintain interest, engagement, and build community amongst dedicated users (in particular, we play an active role in Twitter social networks). In this project, participation by individuals beyond reporting has taken the form of grassroots promotion and championing derived data products. While these activities are harder to measure, they have been essential to the project's success.
With growing interest and rapid advances in crowdsourcing to provide data to meet information needs for research and planning, understanding effective promotion is critical. Several researchers have evaluated the motivations of regular participants in citizen science projects, often using surveys [38,39,40]. This work is strongly complementary, since it can be applied directly to promotion events. We offer three recommendations for researchers promoting crowdsourced projects. The first is to reflect on project goals, develop metrics that indicate these goals, and use the metrics to monitor the crowd's responses to promotion efforts. The second is to carefully consider community context and tailor crowdsourcing tools and promotion to community needs. The third and final recommendation is to use feedback from monitoring to adapt promotion efforts as participants' needs and project goals evolve over time. These actions, in combination with experience gained on-the-ground, can help project coordinators collect volumes of high quality crowdsourced data that represent populations of interest.

Conclusions
We found that promotion was critical for the uptake and use of a crowdsourced cycling tool. Community context was an important consideration, with cities with higher cycling mode share and more cycling facilities being more responsive to promotion of a crowdsourced cycling safety project. We observed that promotion to cyclists resulted in more incidents reported, generally by younger cohorts; traditional media targeting general audiences was associated with incidents reported by older males; and social media targeting general audiences was associated with more diverse data in terms of gender and age. We encourage project promoters to consider project goals, develop metrics for monitoring, and adapt and respond over time. Targeted promotion may be one tool to work towards better representation of all cohorts in crowdsourced data.