Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists

School of Geographical Sciences and Urban Planning, Arizona State University, 975 S Myrtle Ave, COOR Hall, 5th Floor, Tempe, AZ 85281, USA
Faculty of Health Sciences, Simon Fraser University, Blusson Hall, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
Author to whom correspondence should be addressed.
Urban Sci. 2019, 3(2), 62
Received: 21 April 2019 / Revised: 27 May 2019 / Accepted: 29 May 2019 / Published: 4 June 2019
Traditional methods of counting bicyclists are resource-intensive and generate data with sparse spatial and temporal detail. Previous research suggests that big data from crowdsourced fitness apps offer a new source of bicycling data with high spatial and temporal resolution. However, crowdsourced bicycling data are biased because they oversample recreational riders. Our goals are to quantify geographical variables that can help correct bias in crowdsourced data, and to develop a generalized method for correcting bias in big crowdsourced data on bicycle ridership in different settings, in order to generate street-level maps for cities that are representative of all bicyclists. We used street-level ridership data for 2016 from a crowdsourced fitness app (Strava), geographical covariate data, and official counts from 44 locations across Maricopa County, Arizona, USA (training data) and from 60 locations in the city of Tempe, within Maricopa County (test data). First, we quantified the relationship between Strava and official ridership data volumes. Second, we used a multi-step approach, with variable selection using LASSO followed by Poisson regression, to integrate geographical covariates, Strava, and training data to correct bias. Finally, we predicted bias-corrected average annual daily bicyclist counts for Tempe and evaluated the model's accuracy using the test data. We found a correlation between the annual ridership data from Strava and official counts (R² = 0.76) in Maricopa County for 2016. The significant variables for correcting bias were the proportion of white population, median household income, traffic speed, distance to residential areas, and distance to green spaces. The model could correct bias in crowdsourced data from Strava in Tempe, with 86% of road segments predicted within a margin of ±100 average annual bicyclists. Our results indicate that it is possible to map ridership for cities at the street level by correcting bias in crowdsourced bicycle ridership data, given access to adequate data from official count programs and geographical covariates at a comparable spatial and temporal resolution.
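The two summary statistics quoted in the abstract (an R² between Strava and official counts, and the share of road segments predicted within a fixed margin) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that R² is the squared Pearson correlation and that the margin is inclusive, neither of which the abstract states. The function names `r_squared` and `share_within_margin` are hypothetical, and the toy numbers are made up for illustration only.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: the abstract does not say how its R2 was computed;
    the squared Pearson correlation is used here for illustration.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = fsum(x) / len(x)
    mean_y = fsum(y) / len(y)
    cov = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = fsum((a - mean_x) ** 2 for a in x)
    var_y = fsum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_x * var_y)


def share_within_margin(predicted, observed, margin=100):
    """Fraction of locations whose prediction lies within +/- margin
    of the observed count (inclusive bound, an assumption)."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= margin)
    return hits / len(predicted)


# Made-up toy values, for illustration only (not data from the study).
strava_counts = [1, 2, 3, 4]
official_counts = [2, 4, 6, 8]
predicted_counts = [100, 250, 400, 90]
observed_counts = [150, 200, 600, 100]

print(r_squared(strava_counts, official_counts))
print(share_within_margin(predicted_counts, observed_counts))
```

With real data, the first function would take paired Strava and official counts at the count locations, and the second would take model predictions and held-out official counts for the test segments.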
Keywords: bias correction; LASSO; active transportation; big data; crowdsourcing
MDPI and ACS Style

Roy, A.; Nelson, T.A.; Fotheringham, A.S.; Winters, M. Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists. Urban Sci. 2019, 3, 62.

AMA Style

Roy A, Nelson TA, Fotheringham AS, Winters M. Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists. Urban Science. 2019; 3(2):62.

Chicago/Turabian Style

Roy, Avipsa, Trisalyn A. Nelson, A. Stewart Fotheringham, and Meghan Winters. 2019. "Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists" Urban Science 3, no. 2: 62.
