1. Introduction
Over the last two decades, researchers have provided much evidence of the benefits of cycling as a health-enhancing physical activity [1,2,3,4,5,6,7]. Recently, volunteered geographic information (VGI), user-generated content (UGC) and crowdsourced data have become promising data sources for transport and health research [8,9]. Traditional methods of collecting cycling data, including manual counts, stated preference surveys [10,11] and annual average daily bicycle (AADB) volumes [12], are expensive and time-consuming. Each of these methods has its advantages, but each is almost impossible to apply over a broad area simultaneously, which is why crowdsourced methods are gaining interest in planning [9]. With the spread of Global Positioning System (GPS) technology, new methods for collecting detailed cycling route information have emerged [13]. GPS-enabled mobile devices, such as smartphones, allow individuals to track and map their cycling routes [13,14,15,16]. More recently, crowdsourced cycling data have been used to analyze cycling behavior [13,17] and to make associations between cycling and health [9]. Strava is a popular website used to track users’ cycling and running activity via GPS-enabled devices, such as smartphones and smart watches. Millions of people upload their rides and runs to Strava every week via their smartphones or other GPS devices [18]. Strava launched a data service called Strava Metro that offers data sets produced by anonymizing and aggregating individuals’ GPS traces. In earlier studies that use traditional data collection methods, research on the role of cycling for health through physical activity has been limited by the lack of information on where bicyclists ride [9]. With their high spatial resolution, Strava Metro data provide new opportunities for research into active travel, sustainable travel and public health. This could benefit studies of cycling behavior and public health, and further help policymakers in urban planning, especially in designing urban infrastructure that aims to make urban residents healthier and cities more sustainable. For instance, knowing where people like to cycle could help policymakers to improve cycling infrastructure more effectively (e.g., availability of cycle parking in areas of high demand) and promote road safety by giving priority to roads where there are more cycling trips. In several recent studies, Strava data have been used to map ridership over a city [13], evaluate the impact of bicycle infrastructure on cycling behavior [17], and investigate the impacts of residential and employment density, land use diversity, cycling facilities and terrain on cycling behavior [9].
The impact of outdoor and traffic-related air pollution on health is an important issue in transport and health [19,20,21,22,23,24,25,26,27,28]. Typically, the impacts of active travel (cycling and walking) and inactive travel (traveling by car, bus or train) on health are compared [22,23,24,25,26,27]. Although earlier studies offer much evidence on the health benefits of cycling or walking due to increased physical activity [1,2,3,4,5,6,7], other studies reveal that cycling also carries potential health risks, including air pollution, accidents and noise [29,30,31]. One of the most important risks is from poor air quality [29,30]. Exposure to air pollution is harmful to human health [32,33,34,35,36,37,38,39,40,41,42,43,44], and more than 80% of people living in urban areas that monitor air pollution are exposed to air quality levels that exceed World Health Organization (WHO) safe limits [32]. Cyclists riding in urban areas are therefore likely to be exposed to high levels of air pollution. Recent studies use health impact modelling (HIM) to estimate the health benefits and risks of active travel (cycling, walking), and reveal that the total benefits of active travel outweigh the risks [28,45,46]. In particular, a very recent study reveals that the benefits of active travel outweigh the harm caused by air pollution in all but the most extreme air pollution concentrations [28]. It is also becoming widely accepted that increasing cycling time tends to increase health benefits. However, many such studies assess cyclists’ exposure to air pollution based on city-level air pollution values when, in fact, air pollution levels vary spatially over a city. Relating cycling activities to air pollution at a larger scale (e.g., street level) could improve the assessment of air pollution exposure when it is known where and when cyclists ride in a city. Ideally, urban planners and policymakers could use this knowledge of where cyclists ride to devise cycling and walking routes that minimize the risks faced by active commuters, and to decrease the volume of cyclists riding in the environments that are associated with the highest exposures [29,47].
Moreover, Strava Metro data also indicate the purpose (commuting or non-commuting) of cycling activities. Researchers might make use of this in studies of cycling purpose and health. In this paper, we explore the potential of Strava Metro in research on active travel and health by using the data to investigate spatial patterns of non-commuting cycling activities and associations between cycling purpose (commuting and non-commuting) and air pollution exposure at a large scale. Additionally, as some cycling trip data sets (e.g., crowdsourced GPS trajectories or bike-sharing origin-destination trips) lack trip purpose, we cannot directly relate cycling purpose to air pollution exposure when using those data sets. However, we might estimate the number of non-commuting cycling trips based on the number of all-purpose trips and environmental characteristics that affect cycling behavior. In this paper, we attempt such an estimation, because the estimation model can be validated against the Strava Metro data, in which trip purpose is known. If this method is shown to perform well, we may then use it to estimate the number of non-commuting cycling trips in other data sets where trip purpose is unknown.
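The estimation idea can be illustrated with an ordinary least-squares fit. This is only a minimal sketch under stated assumptions, not the models used in this paper: the node-level arrays, the single environmental covariate (`distance_to_centre_km`) and every value below are hypothetical placeholders chosen to show the mechanics.

```python
import numpy as np


def fit_linear_model(features, target):
    """Fit target ~ intercept + features by ordinary least squares."""
    design = np.column_stack([np.ones(len(features)), features])
    coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefficients


def predict(coefficients, features):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coefficients


# Hypothetical node-level values (illustrative only, not Strava Metro data).
all_purpose_trips = np.array([120.0, 80.0, 200.0, 50.0, 150.0, 90.0])
distance_to_centre_km = np.array([1.0, 4.0, 0.5, 8.0, 2.0, 6.0])
non_commuting_trips = np.array([40.0, 45.0, 55.0, 38.0, 52.0, 60.0])

features = np.column_stack([all_purpose_trips, distance_to_centre_km])
coefficients = fit_linear_model(features, non_commuting_trips)
estimated = predict(coefficients, features)

# In-sample coefficient of determination, used here only as a sanity check.
residual = non_commuting_trips - estimated
total_variation = np.sum((non_commuting_trips - non_commuting_trips.mean()) ** 2)
r_squared = 1.0 - (residual @ residual) / total_variation
print(f"In-sample R^2 on the toy data: {r_squared:.3f}")
```

In practice, the fit would be assessed on held-out nodes where the trip purpose is known, which is the role the Strava Metro data play in the validation described above.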
In this paper, we use the Strava Metro data for Glasgow, UK to carry out an empirical analysis. First, in order to explore spatial patterns of non-commuting cycling activities, we investigate where non-commuting cycling activities are more likely to occur than commuting cycling activities by identifying clusters with high rates of non-commuting cycling activities. Next, to associate cycling purpose with air pollution exposure at a large scale (i.e., the street intersection level), we investigate whether cyclists riding for recreation and other purposes (excluding commuting) are more likely to be exposed to relatively low levels of air pollution than cyclists riding for commuting. Note that levels of air pollution might also vary over time in the study area. We focus on spatial variations of air pollution levels rather than spatio-temporal variations, because the temporal resolution of the air pollution data is one year. In this study, we focus on the difference in air pollution exposure during cycling, not the difference in the health effects of cycling. Additionally, we strive to improve the estimation of the number of non-commuting cycling activities by using different regression methods (linear and non-linear) as the estimation models.
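One simple way to express the comparison of exposure by cycling purpose is a trip-weighted mean concentration over street intersections. The sketch below is an assumption made for illustration and is not the statistical procedure of this paper; the function name, the concentration values and the trip counts are all hypothetical.

```python
def trip_weighted_mean(concentrations, trip_counts):
    """Mean pollutant concentration weighted by the number of trips at each node."""
    total_trips = sum(trip_counts)
    if total_trips == 0:
        raise ValueError("no trips recorded")
    weighted_sum = sum(c * n for c, n in zip(concentrations, trip_counts))
    return weighted_sum / total_trips


# Hypothetical intersections: annual mean concentration (micrograms per cubic
# metre) and trip counts by purpose. Values are illustrative only.
concentration = [12.0, 18.0, 9.0, 15.0]
commuting_trips = [30, 120, 10, 60]
non_commuting_trips = [80, 40, 90, 30]

commuting_exposure = trip_weighted_mean(concentration, commuting_trips)
non_commuting_exposure = trip_weighted_mean(concentration, non_commuting_trips)
print(f"commuting: {commuting_exposure:.2f}, non-commuting: {non_commuting_exposure:.2f}")
```

Because the concentrations are annual means, a comparison of this kind captures only where trips take place, not when they take place.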
4. Conclusions
In this study, we investigate spatial patterns of cycling activities and associations between cycling purpose and air pollution exposure in Glasgow, UK by using Strava Metro data. Empirical results reveal that (1) compared with commuting cycling activities, non-commuting cycling activities are more likely to be located in the outskirts of the city; (2) spatially speaking, cyclists riding for recreation and other purposes are more likely to be exposed to relatively low levels of air pollution than cyclists riding for commuting; and (3) the method for estimating the number of non-commuting cycling activities works well in this study. The results suggest that (1) policymakers might consider how to improve cycling infrastructure and road safety in the outskirts of cities; and (2) we may be able to estimate the number of non-commuting cycling activities when the trip purpose of cycling data is unknown. We conclude that this study is a good starting point for using crowdsourced cycling data in studies of cycling and air pollution exposure.
4.1. Limitations
This paper has several limitations. First, a census output area is used as the areal unit in identifying clusters. The modifiable areal unit problem (MAUP) might therefore influence the cluster identification in this study. Second, although the estimation of non-commuting cycling activities is good at the node level, its performance at the edge (street) level is unknown. Third, although the models work well in estimating non-commuting cycling activities, there is likely room to improve the estimation. Ideally, we could improve it by incorporating more attributes, such as land use mix, residential density, traffic count, road type and road width, into the models. We are not able to include those attributes due to present data availability. Fourth, instantaneous assessment of air pollution is used in this study, whereas cumulative assessment of air pollution is more relevant to studies of the health effects of active travel. Ideally, the inhaled dose of air pollution during a cycling trip should be assessed according to not only where the trip takes place but also the time spent travelling. Furthermore, the long-term air pollution exposure of a cyclist should also take account of the number of his or her commuting and non-commuting trips within a longer period (e.g., one year or more). As Strava Metro does not offer individual-level trips, we know neither how long a commuting or non-commuting cycling trip takes nor how many commuting or non-commuting cycling trips each cyclist takes in one year. Thus, we are not able to assess the cumulative exposure of a cyclist to air pollution. Finally, since VGI, UGC and crowdsourced data are collected and shared by individuals, there are arguments about the quality and fitness for use of such data in projects [63]. While we are aware of this issue, we do not tackle it in this study, as it requires a separate study.
4.2. Future Work
In the future, we plan to extend this study in several ways. First, as the Strava database offered by the Urban Big Data Centre provides the number of cycling activities in distinct daily time slots, we could model spatio-temporal variations in non-commuting cycling activities according to spatio-temporal characteristics. Second, we will incorporate other air pollutants, such as sulphur dioxide, nitrogen dioxide and ozone, into the analysis. In addition, although Strava Metro does not offer individual-level trips, we may be able to assess cumulative air pollution exposure by using the ‘Streets’ sub-data set in Strava Metro. Based on the number of trips on each street segment, we could use the length of the street segment to represent the length of each sub-trip, and then estimate the duration of each sub-trip based on the average cycling speed for commuting or for recreation and other purposes in Glasgow, a figure we could possibly obtain from Strava Metro or other data sources (e.g., travel surveys). Accordingly, we could estimate the total air pollution exposure of all of Glasgow’s Strava cyclists when riding for commuting or for recreation and other purposes during one year, and further estimate the annual average air pollution exposure of one cyclist when riding for each purpose. This would represent a substantial amount of future work, with the potential to yield a much greater understanding of the impact of city air pollution.
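The segment-based calculation outlined above can be sketched as follows. This is a rough illustration under stated assumptions rather than an implemented method: the `StreetSegment` fields, the average speed, the number of cyclists and all numeric values are hypothetical, and exposure is expressed simply as concentration multiplied by time.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    length_km: float      # length of the street segment
    concentration: float  # annual mean pollutant concentration on the segment
    trips: int            # yearly trip count on the segment for one purpose


def segment_exposure(segment, speed_kmh):
    """Concentration x time, summed over all trips on one segment."""
    hours_per_trip = segment.length_km / speed_kmh
    return segment.trips * hours_per_trip * segment.concentration


def total_exposure(segments, speed_kmh):
    """Total exposure (concentration-hours) over all segments for one purpose."""
    if speed_kmh <= 0:
        raise ValueError("average speed must be positive")
    return sum(segment_exposure(s, speed_kmh) for s in segments)


# Hypothetical inputs (illustrative values only).
segments = [
    StreetSegment(length_km=0.5, concentration=20.0, trips=1000),
    StreetSegment(length_km=1.5, concentration=10.0, trips=400),
]
assumed_speed_kmh = 20.0  # assumed average cycling speed for this purpose
assumed_cyclists = 50     # assumed number of cyclists contributing trips

yearly_total = total_exposure(segments, assumed_speed_kmh)
per_cyclist = yearly_total / assumed_cyclists
print(f"total: {yearly_total:.1f}, per cyclist: {per_cyclist:.1f} concentration-hours")
```

Converting such concentration-hours into an inhaled dose would additionally require a ventilation rate, which is outside the scope of this sketch.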