1. Introduction
Understanding urban flows and dynamics is important for uncovering hidden knowledge in spatial and social systems. For example, Batty [1] argues that cities are built around flows of money, information and resources, as well as people, across urban spaces. Exploring how individual citizens move around urban spaces can potentially shed new light on both urban space characteristics and, critically, their dynamics and complexities [1,2].
Knowing how, where, when and why people travel in cities, particularly on a large and comprehensive scale, remains a challenge for researchers. Traditional travel surveys [3,4,5,6] are simply not responsive enough to capture the dynamics of population flows within cities and, critically, how patterns of movement change temporally as well as spatially. Transport system smart card data (SCD) are passively collected by automated fare collection systems in stations or on vehicles. They record individual-level details of where and when travellers enter (tap-in) and leave (tap-out) the transit system. They capture the dynamics of individual mobility within the city and provide opportunities to generate new insights into travel flows and mobility behaviours. However, such data contain no information on traveller socioeconomic status or trip purpose [7]. New forms of micro-level (big) data, such as those from social media, have been found to contain rich information about place semantics and individual interactions with the physical world [8]. Combining such information with SCD presents an opportunity to generate a more holistic picture of urban flows through inference of where, when and why individuals move through cities. These understandings can also benefit related urban and infrastructure planning, for example by contributing to the development of the “liveable city” [9].
In this context, the aims of this paper were (i) to link metro SCD with land use inferred from social media check-ins at points of interest (POIs), thereby (ii) to generate travel profiles from trip origins and destinations and from the time and day of travel, and to infer trip purpose, and finally (iii) to analyse travel flows of different groups and sub-groups of travellers to generate new insights into how individuals interact with and use urban space. The paper is organised as follows: Section 2 presents an overview of the issues around understanding urban mobility, and Section 3 presents the data. The methods are presented in Section 4, analysis and results are in Section 5, and a critical discussion of limitations and areas for further work is given in Section 6 and Section 7.
2. Behaviour from Smart Card Data
The analysis of mobility patterns within public transit systems can reveal new insights into the spatiotemporal features of daily urban life. An improved understanding of the mobility patterns of transit riders from different socio-economic backgrounds can support the evaluation of different aspects of current public transit services by authorities and policy makers. This allows, for example, targeted marketing strategies, decision making to improve services and explorations of the resilience and efficiency of transport infrastructures.
Historically, such activities have been informed by travel behaviour research based on questionnaires and travel surveys [3,4,5,6]. Whilst survey data commonly contain personal demographic and socioeconomic details of survey subjects, they have a number of drawbacks. First, the representativeness and generalizability of the information from surveys may be limited, with a small number of respondents typically sampled. Second, surveys may not be conducted at the same places and can have short temporal currency, particularly in cities that have been subject to rapid urbanisation over recent decades [10]. Travel surveys may fail to adequately represent these situations. For these reasons, there has been an upsurge in research interest exploring the opportunities afforded by the many new forms of big data, including travel smart card data and social media.
Smart card data (SCD) are event-triggered: transactions are recorded only when the traveller swipes their card to board a vehicle or access a station. SCD have been used by researchers to investigate patterns of urban flows, including commuting, mobility and travel areas [11,12,13,14,15]. These studies have focussed on identifying the spatiotemporal patterns within the SCD in order to inform and support transportation planning. Such studies are plentiful, and typically they evaluate the spatiotemporal patterns of trips through the transit system [14] in order to quantify and predict individual mobility [14] and to examine route choices [14], the scales of regular and explicable travel behaviours [16] and temporal changes in the spatial structure of urban movement [12]. Comprehensive reviews of the technologies, applications and methodologies of SCD analyses and the evolution of thinking in this area are provided by Bagchi and White [17], Pelletier et al. [7] and Li et al. [18].
One of the main difficulties experienced in research and analyses of SCD is how to link the observed variations in urban flows and spatiotemporal dynamics with individual socioeconomic attributes and thereby infer the purpose of trips. Some studies have been able to classify travellers into different groups and have analysed these separately in order to gain a better understanding of cardholders’ travel behaviour. For example, Huang [19] studied the diversity of spatial and temporal mobility patterns of different age groups (child/student, adult and senior citizen). Wang et al. [20] and Long et al. [6] analysed university students and those making unusually long, early, late or daily trips (“extreme transit commuters”). Other research has identified peak travel times for specific groups, such as students [19]. Although these studies included demographic dimensions and have advanced understanding, they all concluded that a lack of socioeconomic and demographic details, and in particular an absence of data on the purpose of journeys, presented a major barrier to more in-depth and useful studies. Others have sought to infer such characteristics from the time, origin and destination of trips, but inferring trip purpose presents a challenge. In many cases some kind of service area or catchment has been used, defined as either a buffer (fixed distance or isochrone) around metro stations or administrative polygons [21]. Such areas have also been used to characterise the origin or destination areas, frequently through land use designations. For example, Wolf [22] suggested that matching land use information with trip origin and destination could give greater insight into individual motivations, providing context for specific trips and thereby potentially supporting inferences of trip purpose. Lee and Hickman [23] and Devillaine et al. [24] used a combination of decision trees and heuristic rules to infer trip purpose from trip temporal characteristics, socioeconomic and land use information. Their method is highly dependent on the duration of activity, which is based on the assumption that users do not use any other transit modes in their trips. This assumption is unlikely to hold for all travellers, especially occasional travellers. The work of Medina [25] combined household travel surveys and high quality public transport data to infer whether the purpose of bus and metro trips was going home, to work or to study. Despite its effectiveness, this approach may be difficult to apply in other places. Few cities in developing countries (e.g., China) conduct household travel surveys regularly, and their bus systems often do not record both the boarding and alighting information needed to construct a bus and metro travel chain, as proposed in [25]. Liu et al. [26] studied the dynamics of inhabitants’ daily mobility patterns using smart card data in Shenzhen. Using detailed examples, they identified morning metro tap-ins as being close to specific residential areas and afternoon tap-ins as being close to large working areas. However, this research only analysed specific locations, provided no system-wide analysis and ignored other potential land use types.
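As an illustration of the fixed-distance buffer form of catchment described above, the following minimal Python sketch selects the POIs lying within a given great-circle distance of a metro station. The 500 m radius, the coordinates and the tuple layout are hypothetical choices made for this example and are not taken from any of the cited studies; an isochrone or administrative polygon would need a different test.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def pois_in_buffer(station, pois, radius_m=500.0):
    """Return the POIs within radius_m of the station.

    station: (lat, lon); pois: list of (name, lat, lon) tuples.
    Both layouts are assumptions made for this sketch.
    """
    lat, lon = station
    return [p for p in pois if haversine_m(lat, lon, p[1], p[2]) <= radius_m]


# Hypothetical station and POIs (illustrative coordinates only).
station = (22.5431, 114.0579)
pois = [
    ("near", 22.5440, 114.0579),  # roughly 100 m north of the station
    ("far", 22.6000, 114.0579),   # several kilometres north of the station
]
nearby = pois_in_buffer(station, pois)
```

A fixed radius treats every station identically; in practice the radius would presumably be tuned to local walking distances or replaced by a network-based catchment.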
Another study in Shenzhen [27] sought to link bus, metro and taxi trips using a spectral clustering approach to analyse transit mode. This was used to delineate five urban space categories, for which the urban function was manually inferred, and to suggest mass transit patterns from the socioeconomic characteristics of each category.
A number of similar or improved methods (e.g., a probabilistic model [28]) have been proposed [29,30], combining detailed GPS tracking data with land use information to detect both transportation mode and trip purpose. Whilst providing a richer overview of movements, this approach is limited by the number of individual study participants. Nonetheless, land use describes socioeconomic activities and provides a prism through which to infer trip purpose.
The problem encountered by previous research is that land use at any given origin or destination is unlikely to be unique: multiple land uses co-exist in space and time (see Fisher et al. [31] for a full treatment of this issue). Thus, although a number of methods have been developed for inferring trip purpose from land use [22,23,29,30], they all face the same problem of how to identify the important land use entities in different parts of the transportation system. Analysis of social media check-in data can provide an indication of this.
Social media data analysis can be used to provide an understanding of local sentiment, as well as where individuals go and why [8]. Much land use-related information is also recorded both directly and indirectly in social media. Indirect information may come through descriptions of the activities being undertaken, and direct land use information is available through point of interest (POI) check-ins. These record the presence of social media users at specific labelled locations. POI data have been found to enrich spatiotemporal semantic information in analyses of urban space [8,32,33] by supporting inference of people’s activity in physical space. POI or parcel level land use data have been used to enrich information around origins or destinations [22,23,24,30]. In previous research, POIs or land uses have been assumed to have the same potential to originate or attract trips, a simplification which biases trip purpose inference [22]. For example, a large residential POI may be more important than a small shopping mall in originating trips, and this should be reflected in different weights within a trip purpose inference model.
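The effect of weighting POIs rather than treating them equally can be illustrated with a minimal Python sketch. It compares land use shares around a single station computed with equal weights against shares weighted by check-in counts. The category names, the check-in counts and the (category, checkins) layout are hypothetical values chosen for illustration, and the simple proportional weighting is an assumption rather than the exact scheme used in any cited work.

```python
from collections import defaultdict


def land_use_shares(pois, use_checkins=True):
    """Return the share of each land use category among the given POIs.

    pois: iterable of (category, checkins) tuples (hypothetical layout).
    use_checkins: if True, each POI counts in proportion to its check-ins;
    if False, every POI counts once (the equal-potential simplification).
    """
    totals = defaultdict(float)
    for category, checkins in pois:
        totals[category] += checkins if use_checkins else 1.0
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: value / grand_total for category, value in totals.items()}


# One large residential POI and two small commercial POIs (made-up counts).
station_pois = [("residential", 900), ("commercial", 60), ("commercial", 40)]

equal_shares = land_use_shares(station_pois, use_checkins=False)
weighted_shares = land_use_shares(station_pois, use_checkins=True)
```

With equal weights the two small commercial POIs dominate the catchment, whereas with check-in weights the residential POI accounts for most of the share, which is the kind of difference the weighting argument is concerned with.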
In summary, socioeconomic information can support deeper understandings about trip purpose, thereby providing richer analyses of urban flows and dynamics. Some research has shown that trip purpose can be inferred from trip pattern and regularity [34]. Land uses at trip origins and destinations allow a degree of socioeconomic and purpose characterisation. Where land use information is lacking or uncertain, it can be inferred from POI check-ins in social media data. Current studies using SCD normally focus on general mobility behaviours, such as travel frequency, travel distance and regular origin-destination (OD) pairs. The potential for contextual information derived from low-cost social media has not been fully exploited. Similarly, much of the literature focuses on methods to infer an individual trip’s purpose and fails to shed light on the overall trip purpose pattern in the whole transit system. The research presented in this paper addresses these and a number of other gaps: social media POI check-in data are used to quantify POI weights, allowing a more accurate description of land use information to be derived, and changes in trip purpose patterns for individuals are evaluated to shed light on when and why different people travel within the city.
6. Discussion
The results of this analysis infer travel behaviours for different groups of metro system users based on the temporal and spatial patterns of their trips, as recorded in smart card data (SCD) and linked to social media data. Land use at origin and destination locations was used to infer trip purpose, providing details and explanations of trips for different users at different times. The SCD allowed different groups of users to be identified based on their fare reductions (students) and travel times (commuters), along with the land use derived from social media point of interest (POI) check-ins. These groups were further explored to identify different clusters of student travellers based on the temporal profile of their metro use.
This analysis of new sources of big data to examine travel behaviour addresses the obvious drawbacks and limitations of traditional travel survey data. The work generates similarity metrics for individual traveller profiles based on average trip times and their variance, and it is generalizable to other studies that classify users based on usage or interaction patterns. Such studies may benefit from applying the per land use change rate approach and from transforming the travel data into “traveller profiles” to determine clusters of users. These approaches could also be applied on repetitive timescales, for example. Moreover, this work used POI data to infer land use-related contextual information around metro stations. Potential land use activities were weighted by the number of check-ins at each POI to avoid the potential bias of treating all POIs equally. The proposed “change rate” measure is capable of supporting related visualisation analysis by providing clear trip purpose interpretation from flow-associated POIs.
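The idea of comparing traveller profiles built from average trip times and their variance can be sketched in a few lines of Python. This is a minimal illustration only: the profile is reduced to the mean and population variance of tap-in hours, and the Euclidean distance between profiles is an assumed stand-in, since the exact similarity metric is not restated in this section. All tap-in times below are made up.

```python
from math import hypot
from statistics import mean, pvariance


def traveller_profile(tap_in_hours):
    """Summarise a card's tap-in times (decimal hours) as (mean, variance).

    Note: a plain mean ignores wrap-around at midnight, which is acceptable
    for this sketch but would matter for late-night travellers.
    """
    return mean(tap_in_hours), pvariance(tap_in_hours)


def profile_distance(a, b):
    """Euclidean distance between two (mean, variance) profiles.

    Assumption for illustration; in practice the two components would
    probably be standardised first, as they are on different scales.
    """
    return hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical cards: two with tight morning peaks, one with a split pattern.
card_a = traveller_profile([8.0, 8.5, 8.0, 8.5])
card_b = traveller_profile([8.0, 9.0, 8.0, 9.0])
card_c = traveller_profile([7.0, 16.0, 7.0, 16.0])

d_ab = profile_distance(card_a, card_b)
d_ac = profile_distance(card_a, card_c)
```

Under these assumptions the two morning-peak cards lie much closer to each other than either does to the split-pattern card, so a clustering step applied to such distances would tend to group them together.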
Both the smart card data and social media check-in data may be subject to sampling bias, with the impacts of biases in the POI data potentially more serious. Here, only 70,000 POIs were used, a very limited proportion of the total number of POIs in Shenzhen, and their time stamps fell between 2011 and 2014, so they are potentially subject to land use changes in that time. Therefore, a post-hoc validation exercise was undertaken to quantify the potential for bias in the Weibo check-in data. A sample of 1000 Weibo POIs was overlaid with reclassified Baidu map data, with the Baidu API used to extract Baidu POI information in June 2018. The overall correspondence was 0.87, and the Type I error rates for commercial, education, leisure, working and residential were 0.19, 0.14, 0.17, 0.04 and 0.17, respectively. These indicate the proportion of times a POI land use label used in this study was incorrect (false positives). These error rates suggest that the broad findings about trip destination and purpose are reliable but with varying degrees of uncertainty. The rapid development of Shenzhen may contribute to rapid change in urban land use in different areas, resulting in more uncertainty when using POI check-in data aggregated over a long period. Ideally, using temporally matched contextual information would lead to a more accurate interpretation of local land use and trip purpose inference.
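The two validation quantities reported above can be expressed as a short Python sketch. It assumes that overall correspondence is the proportion of sampled POIs whose study label matches the reference label, and that the Type I error rate for a class is the proportion of POIs given that label in the study whose reference label differs. This reading follows the description in the text but is an interpretation, and the labels below are toy data rather than the study sample.

```python
from collections import Counter


def validation_summary(study_labels, reference_labels):
    """Return (overall correspondence, per-class false positive rate).

    study_labels: land use labels assigned in the study, one per sampled POI.
    reference_labels: labels for the same POIs from the reference source.
    """
    if len(study_labels) != len(reference_labels) or not study_labels:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(study_labels, reference_labels))
    agreement = sum(s == r for s, r in pairs) / len(pairs)

    labelled = Counter(study_labels)               # POIs given each label
    wrong = Counter(s for s, r in pairs if s != r)  # of which mislabelled
    error_rates = {label: wrong[label] / n for label, n in labelled.items()}
    return agreement, error_rates


# Toy example with five sampled POIs (hypothetical labels).
study = ["residential", "residential", "commercial", "commercial", "education"]
reference = ["residential", "commercial", "commercial", "commercial", "education"]

overall, type_one = validation_summary(study, reference)
```

In the toy example one of the two POIs labelled residential disagrees with the reference, so the residential rate is one half while the other classes have no false positives.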
There are a number of limitations to this study and areas for further work. First, the analysis and identification of traveller segments, temporal variation in mobility patterns and trip purpose evaluated commuters and students against all travellers. These groups could be expanded, for example by examining the socioeconomic properties of the areas from which travellers originate (in the manner of geo-demographic classifications). This would support probabilistic inferences of more nuanced and detailed trip purposes for a wider number of traveller groups and potentially would allow more precise and explanatory analyses of travel behaviours, expanding the results presented here. Such analyses would also allow the potential biases in the representativeness of SCD to be quantified, as travel smart cards may not be used equally by all social groups. They also offer a potential avenue to understand the sample biases associated with POI check-ins, which may over-represent particular types of activities; for example, people may be more likely to post micro-blogs while undertaking leisure activities than domestic ones. Second, land use is not static; rather, multiple, alternate and dynamic land use attributes may be present [31], as the socioeconomic activities associated with any given location (origins and destinations in this study) may change over the course of a day. There is a need for studies of urban flows and dynamics to accommodate the dynamic nature of the concept of land use, which may have specific functions at different times during the day. Here, the land use from the POI data was inferred from the weighted aggregate check-ins to each POI in each zone. The timestamps of individual check-ins were not considered; using them would allow the land use associated with each zone to be inferred dynamically. Such temporal refinements would extend and improve analysis of trip purpose.
Smart card data in the metro system are only one kind of mobility big data that contain travel information of urban inhabitants. Other similar data include bike-sharing transactions and taxi trip records. Smart card data from buses and the metro can generally capture flows and dynamics in mid- to long-distance trips but may be less suitable than bike-sharing data for describing more localised travel and activities. It should be noted that some other kinds of consumer data [36] can also be used to derive mobility-related information; examples include cell phone tracking records, social media data (e.g., geo-tagged Twitter) and retail transaction records. These data all have their own shortcomings, for example sampling bias due to their varied attractiveness to different kinds of urban inhabitants. Cell phone tracking records are considered to have high representativeness of users and may suffer least of all from the bias problem, but they have another shortcoming: the data may fail to record every OD pair to represent travel flow because the location of the user is only tracked when the phone is being used. These data can, to some extent, reveal urban dynamics and how people interact and utilise urban space, but combining data from different sources may contribute to a more comprehensive picture of urban flows. Although varied in structure and spatiotemporal granularity, such data and derived travel flows always benefit from external contextual information. Incorporating contextual data (e.g., land use) can lead to deeper understanding of the flows and activities. The obtained insight on flows of individuals and groups of people at different time periods also reveals the complexity and diversity within city life.