Investigating the Spatiotemporal Dynamics of Urban Vitality Using Bicycle-Sharing Data

Abstract: In recent decades, the availability of diverse location-based service (LBS) data has greatly stimulated research on individual human mobility. However, less attention has been paid to the intra-city movement of cyclists coupled with their spatiotemporal dynamics. To fill this knowledge gap, drawing on bicycle-sharing data over one week in Shanghai, China, this study investigates the dynamics of bicycle-sharing users at two spatial scales (i.e., city level and subdistrict level) and explores the intra-city spatial interactions of those cyclists. At the city level, by applying the analysis of variance (ANOVA) test and the Wilcoxon signed-rank test, this study examines the temporal variation of cyclists across a seven-day period. At the subdistrict level, we develop a new index to capture urban vitality using bicycle-sharing data, taking into account trip flow allied with spatial weights. Based on the computed urban vitality over the course of a day, 98 subdistricts are partitioned into 7 groups by using K-means clustering. In addition, spatial autocorrelation and hot spot analysis are applied to examine the spatial features of urban vitality at different periods. Our results reveal that urban vitality exhibits clear spatial clustering and that this clustering varies markedly over the course of a day. By shedding new light on intra-city movement, we argue that our results are important in informing urban planners on how to better allocate public facilities and increase bicycle usage as a way to progress towards more sustainable urban areas.


Introduction
In recent decades, with the rapid development and popularity of location-based services, location-based service (LBS) data, as a kind of passive data, have drawn growing interest in investigating individual human mobility owing to their unique advantages in accuracy, timeliness, ubiquity, and affordability [1][2][3][4][5]. So far, various types of LBS data, such as mobile phone data, transit smart card data, taxi trajectories, and social media data, have been applied to establish a thorough understanding of individuals' mobility, especially their travel behaviour features [6][7][8][9]. As a consequence, an abundance of salient findings has come forth and fuelled a wave of mobility studies. For instance, drawing on the records of Wi-Fi-based location logging, Rekimoto et al. visualised Wi-Fi users' activity patterns and probability density in Tokyo, Japan [10]. By analysing transit smart card data, Kusakabe and Asakura developed a data fusion methodology to capture transit users' behavioural attributes and trip purpose [11]. Using massive taxi trajectory data, Zheng et al. studied the spatial-temporal characterisation of urban residents' travel patterns and identified the hot paths of taxi users in Chongqing, China [12].
Despite a fruitful line of research on unveiling individual human mobility, less attention has been paid to the features of urban dynamics characterised by those individuals' movement [13]. From urban function and urban structure to land-use intensity, all of the features of an urban system are intrinsically connected with people's presence and movement [14][15][16][17]. LBS data, with the potential of revealing individual mobility features, as such, provide a promising opportunity for us to better understand the urban system in a timely fashion. Through a few but significant previous studies, we can see that a number of fresh perspectives on urban dynamics have been revealed by utilising the emergent LBS data. For example, drawing on seven-day taxi trajectory data in Shanghai, China, Liu et al. analysed the temporal variations of both pick-ups and drop-offs and identified six types of land use by classifying those temporal variations of taxi passengers [13]. In a similar study conducted in Rome, Italy, Sevtsuk and Ratti applied mobile phone data as probes for estimating the intensity of urban activities and their evolution through space and time; they also found that call-volume patterns are associated with the demographic, economic, and (built) environment indicators [17].
In the small body of LBS data-urban dynamics literature, a particular paucity is the application of bicycle-sharing data [18,19]. Set against the backdrop of increasingly serious traffic congestion in modern cities, a bicycle-sharing system is considered an effective solution to improve traffic efficiency and reduce air pollution [20]. In addition to those benefits concerning traffic efficiency and environmental sustainability, a bicycle-sharing system also supports multimodal transport connections by acting as a "last mile" connection to public transport [21]. In recent years, with the rapid implementation of bicycle-sharing systems across the globe, growing interest has been drawn to the application of bicycle-sharing data [21][22][23]. Caulfield et al. examined the usage patterns of bicycle-sharing in Cork, Ireland, and found that bicycle-sharing usage was associated with weather conditions and travel distance [24]. Bakogiannis et al. analysed the efficiency of a bicycle-sharing system in Rethimno, Greece, and reported that the bicycle-sharing system was primarily used for short-distance trips, and that traffic safety concerns coupled with service limitations were the two factors affecting bicycle-sharing usage [25]. Most studies focus on the bicycle-sharing usage pattern allied with its corresponding influence factors [20][21][22][23][26,27], less often paying attention to the interaction between bicycle-sharing users and urban dynamics. It follows that if we attempt to encourage more bicycle usage as a way to progress towards a more sustainable transport system, there is a compelling need to investigate the intra-city movement of cyclists as well as the urban dynamics characterised by those cyclists.
Among various features describing urban dynamics, urban vitality, proposed in Jane Jacobs' book The Death and Life of Great American Cities in 1961, is a manifestation of diversity produced by interactions between human activities and urban space, and is considered an effective indicator for measuring the attractiveness of an urban area to a diverse population in relation to city living [28][29][30]. The notion of urban vitality has inspired a number of planning strategies, such as New Urbanism and smart growth [31]. As pointed out by Jane Jacobs [32], "On successful city streets, people must appear at different times. This is time considered on a small scale, at different times throughout the day." Individuals' activities and trip flows at a finer temporal scale are, therefore, two main aspects needed to maintain the "vitality" of cities. In light of the concept of urban vitality, this paper, drawing on bicycle-sharing data, develops a new index to capture urban vitality with a particular focus on bicycle-sharing system users. Learning from accessibility modelling and land-use mix measures, this new index incorporates two important aspects of spatial interaction, namely, spatial distance and spatial diversity.
In an attempt to bridge the aforementioned gaps and shed new light on understanding urban vitality, this study draws on bicycle-sharing data over a seven-day period (15 Aug 2016-21 Aug 2016) and investigates the spatiotemporal dynamics of urban vitality in Shanghai, China. Our investigation incorporates two spatial scales, namely, city level and subdistrict level. At the city level, by using the ANOVA test and Wilcoxon signed-rank test, the temporal variations of the bicycle-sharing system users are examined. At the subdistrict level, the urban vitality index is calculated for each selected subdistrict; K-means clustering is then applied to partition the subdistricts into different groups in terms of their dynamic urban vitality. In addition, both spatial autocorrelation analysis and hotspot analysis are employed to reveal the spatial distribution features of the urban vitality over the course of a day.
The remainder of this paper is organised as follows. Section 2 introduces the study context and data employed. Section 3 presents a new urban vitality index coupled with a series of analytic approaches to investigate its spatiotemporal dynamics. Section 4 reports the modelling results. Section 5 discusses these results and points out fruitful avenues for further research based on the outcomes provided in this paper.

Study Area
Shanghai, one of the four municipalities (equal in status to a province) in China, is the most populous city in China with a population in excess of 26 million. Sitting on the southern estuary of the Yangtze River, Shanghai is a national centre for finance, commerce, and transportation with the world's largest port, the Port of Shanghai. As the most developed city in China, Shanghai had a GDP of US$494 billion in 2018, ranking in the top 15 globally [33]. In Shanghai, the station-less bicycle-sharing system was first established in 2016, and after a year of explosive growth, by July 2017, the number of station-less shared bicycles exceeded 1 million with 13 million registered users [34].
Because station-less shared bicycles are not equally distributed across the city, the subdistricts with fewer than 5 daily bicycle-sharing records are excluded to avoid severe numeric bias. Thus, in this study, 98 subdistricts nested within 11 districts are selected ( Figure 1 Tong), and CN (Chang Ning). The subdistricts are coded by the abbreviation of their district name plus an ordinal number; for example, JA8 is a subdistrict in JA (Jing An) with the number 8. It is worth mentioning that the subdistrict is the smallest spatial unit in the Chinese statistical standard and is called "Jie Dao" in Chinese, which literally means "street."

Bicycle-sharing Data
The bicycle-sharing data employed in this study are derived from Mobike, the world's largest shared bicycle operator [35]. From 15 Aug 2016 to 21 Aug 2016, 243,864 transaction records were obtained over this seven-day period. On every weekday, the number of daily shared-bicycle transactions exceeds 35,000. Table 1 shows an example of the original bicycle-sharing data, from which we find that a completed transaction record contains spatiotemporal information (i.e., time and coordinates) on both trip origin and trip destination. Figure 2 and Figure 3 demonstrate the bicycle-sharing usage at the daily and hourly level, respectively. Detailed analyses of the temporal variations of the bicycle-share ridership are provided in the following sections.
Drawing on the accurate temporal information of the bicycle-sharing data, the bicycle-sharing usage can be aggregated to the daily and hourly level (as shown in Figure 2 and Figure 3). To investigate the temporal variations of the bicycle ridership, two statistical tests are applied, namely, the analysis of variance (ANOVA) and the Wilcoxon signed-rank test.
ANOVA, grounded in the law of total variance, is used to test the differences between two or more groups by analysing the differences among group means in a sample [36]. As such, ANOVA is applied to estimate the daily bicycle ridership variation over the study period.
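To illustrate how such a test compares ridership across days, the one-way ANOVA F statistic can be sketched as follows. This is a minimal sketch rather than the computation used in this study: it assumes that each day forms one group whose observations are hourly ride counts, and the function name `one_way_anova_f` and the sample values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ride counts for three days (illustrative values, not study data).
days = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]
f_stat, df_b, df_w = one_way_anova_f(days)
```

The F statistic is the ratio of between-day to within-day variability; judging its significance additionally requires the F distribution with (df_between, df_within) degrees of freedom.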
Aggregated to the hourly level, the bicycle ridership shows a "two-peak" distribution on weekdays; on weekends, however, it turns out to be quite dissimilar (Figure 3). In order to capture this difference, the Wilcoxon signed-rank test is used. The Wilcoxon signed-rank test, as a non-parametric statistical test, is designed to estimate the differences between two matched samples and determine whether the two samples have the same distribution [37]. In comparison with the paired Student's t-test, the Wilcoxon signed-rank test is more appropriate for samples that are not normally distributed [37]. Given the non-normal distribution of the hourly bicycle ridership (Figure 3), the Wilcoxon signed-rank test is employed to quantitatively reveal the variations of the hourly bicycle usage across weekdays and weekends.
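The mechanics of the signed-rank statistic can be sketched as follows. This is a minimal sketch under stated assumptions: the two samples are hourly ride counts for two days matched hour by hour, zero differences are dropped, and tied absolute differences share their average rank. The function name `wilcoxon_signed_rank` and the sample values are hypothetical.

```python
def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus), the positive and negative rank sums."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Extend [pos, end] over a run of tied absolute differences.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical hourly counts for one weekday and one weekend day.
weekday = [10, 40, 25, 20, 45, 15]
weekend = [12, 30, 28, 25, 30, 15]
w_plus, w_minus = wilcoxon_signed_rank(weekday, weekend)
```

When the two days share the same distribution, the two rank sums are expected to be of similar size; a strongly unbalanced pair points to a systematic difference. Converting the statistic to a p-value requires its null distribution, which is omitted here.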

Urban Vitality Index
In light of gravity-based accessibility measures [38,39] coupled with the land-use mix index [40,41], we developed a new index to capture urban vitality by using bicycle-sharing data. Rather than merely considering the bicycle-share ridership, two important aspects of spatial interaction, namely, spatial distance and spatial diversity, are both incorporated into our new urban vitality index. The formula is expressed as follows: where $V_i$ represents the urban vitality of place $i$, and a larger value indicates a stronger vitality of an urban area.
$T_{ii}$ represents the bicycle-share ridership with the trip ending at place $i$ and starting from the same place, namely, $i = j$. Similarly, $T_{ij}$ represents the bicycle-share ridership with the trip ending at place $i$ but starting from place $j$. $d_{ij}$ denotes the travel distance between place $i$ and place $j$; $d_{\max}$ represents the maximum value of travel distance from other places to place $i$. $n$ is the number of places (place $i$ excluded) with trips ending at place $i$.
It is noteworthy that the calculation of the urban vitality index includes not only the daily features, but also the hourly variations. At the hourly level, according to the temporal characteristics of hourly bicycle-sharing ridership on weekdays, the urban vitality index is calculated for four time periods: AM peak (7:00-9:00), Midday off-peak hours (9:00-16:00), PM peak (16:00-19:00), and Night hours (19:00-22:00).
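The assignment of a trip to one of the four periods can be sketched as follows. This is a minimal sketch that assumes half-open intervals, so that a boundary hour such as 9:00 belongs to the later period; the text does not specify how boundaries are treated, and the function name `vitality_period` is hypothetical.

```python
def vitality_period(hour):
    """Map an hour of day (0-23) to one of the four study periods, or None."""
    # (name, start hour inclusive, end hour exclusive)
    periods = [
        ("AM peak", 7, 9),
        ("Midday off-peak", 9, 16),
        ("PM peak", 16, 19),
        ("Night", 19, 22),
    ]
    for name, start, end in periods:
        if start <= hour < end:
            return name
    # Hours outside 7:00-22:00 fall in none of the four periods.
    return None
```

For example, `vitality_period(8)` returns `"AM peak"`, while an hour of 23 returns `None`.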

K-Means Clustering
K-means clustering is an unsupervised machine learning algorithm for partitioning n observations into a set of k groups (i.e., k clusters). By computing the distance or the (dis)similarity between each pair of observations, K-means minimises the distances, identifies which observations are alike, and classifies them into a set of groups. In K-means clustering, within the same group, observations are as similar as possible with the nearest distance to the group centre [42]. In this study, with the subdistrict as the observation, five urban vitalities are considered in K-means clustering, including daily urban vitality and hourly urban vitality during the four periods (i.e., AM peak, Midday off-peak hours, PM peak, and Night hours). To prevent large numeric variation from influencing the clustering, for each subdistrict, all five computed urban vitalities are normalised by using Min-Max feature scaling [43]. Although many distance measures exist, such as Manhattan distance and Pearson correlation distance, we choose the classical Euclidean distance to compute the distance matrix from the subdistricts' vitality features. Euclidean distance is formulated as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $x$ and $y$ are two subdistricts, $x_i$ and $y_i$ represent the urban vitality at a specific time $i$ for subdistrict $x$ and $y$, respectively, and $n$ is the number of temporal features considered (five in this study).
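The scaling, distance, and clustering steps described above can be sketched as follows. This is a minimal sketch, not the implementation used in this study: it assumes that Min-Max scaling is applied per feature across subdistricts, it seeds the centres with the first k observations (a simplification of the usual random initialisation), and the function names and sample values are hypothetical.

```python
import math


def min_max_scale(rows):
    """Rescale each column of rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centres (assumption)."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical vitality rows: (daily, AM peak, midday, PM peak, night).
vitality = [
    [10, 2, 4, 3, 1],
    [12, 3, 4, 4, 1],
    [90, 20, 30, 25, 15],
    [100, 22, 33, 28, 17],
]
scaled = min_max_scale(vitality)
labels, centres = k_means(scaled, k=2)
```

With these illustrative values, the two low-vitality rows end up in one cluster and the two high-vitality rows in the other. K-means is sensitive to its initial centres, so practical use typically repeats the procedure from several initialisations.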

Examining the Spatial Distribution of the Dynamic Urban Vitality
Whereas K-means clustering is employed to categorise the subdistricts in terms of their overall urban vitality, this part introduces two analytic approaches used to capture the spatial distribution of the dynamic urban vitality. The first one is spatial autocorrelation. The given object's location is coupled with its value, and spatial autocorrelation measures the correlation of the object's value with itself through space and accordingly identifies whether the pattern expressed is clustered, dispersed, or random. Spatial autocorrelation calculates Moran's I index, z-score, and p-value. A positive Moran's I index indicates a clustering tendency, while a negative Moran's I index indicates a tendency towards dispersion. The z-score and p-value are calculated to evaluate the significance of Moran's I index. The calculation of spatial autocorrelation is formulated as follows:

$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} z_i z_j}{\sum_{i=1}^{n} z_i^2}$$

where $I$ is Moran's I index, $z_i$ is the deviation of an attribute for object $i$ from its mean ($x_i - \bar{X}$), $w_{i,j}$ is the spatial weight between object $i$ and $j$, $n$ is equal to the total number of objects, and $S_0$ is the aggregate of all the spatial weights:

$$S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j}$$

The z-score is given as:

$$z_I = \frac{I - E[I]}{\sqrt{V[I]}}$$

where $E[I] = -1/(n-1)$ and $V[I] = E[I^2] - E[I]^2$.

The second spatial analytic approach used to examine the spatial distribution of dynamic urban vitality is hotspot analysis. By calculating the Getis-Ord Gi* statistic coupled with z-score and p-value, hotspot analysis measures whether the spatial clustering is high-value clustering or low-value clustering. A high z-score and small p-value for an object indicate a spatial clustering of high values, while a low negative z-score and small p-value indicate a low-value clustering. The confidence level bin of Getis-Ord Gi* is also included; features in the +/-3 bins imply significance at the 0.01 level, while +/-2 bins imply significance at the 0.05 level and +/-1 bins reflect a 0.1 significance level.

The relevant calculations are computed as:

$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left( \sum_{j=1}^{n} w_{i,j} \right)^2}{n-1}}}$$

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$

where $x_j$ is the attribute value for object $j$, $w_{i,j}$ is the spatial weight between object $i$ and $j$, and $n$ is equal to the total number of features.
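Both statistics can be illustrated with a short computation. This is a minimal sketch using the standard definitions of global Moran's I and Getis-Ord Gi*, not the software used in this study: it assumes binary contiguity weights for four places arranged in a row, and the function names, weight matrices, and attribute values are hypothetical. Significance testing of Moran's I is omitted.

```python
import math


def morans_i(values, weights):
    """Global Moran's I for attribute values and a spatial weight matrix."""
    n = len(values)
    mean_x = sum(values) / n
    z = [x - mean_x for x in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)     # aggregate of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def getis_ord_gi_star(values, weights, i):
    """Getis-Ord Gi* z-score for object i (row i of the weight matrix)."""
    n = len(values)
    mean_x = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean_x ** 2)
    w_sum = sum(weights[i])
    w_sq_sum = sum(w * w for w in weights[i])
    numerator = sum(w * x for w, x in zip(weights[i], values)) - mean_x * w_sum
    denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical vitality values for four places arranged in a row.
vitality = [10, 9, 2, 1]
# Binary contiguity weights: neighbours in the row are linked.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Gi* counts each object as part of its own neighbourhood.
w_star = [[1 if i == j else w[i][j] for j in range(4)] for i in range(4)]

moran = morans_i(vitality, w)
gi_first = getis_ord_gi_star(vitality, w_star, 0)
gi_last = getis_ord_gi_star(vitality, w_star, 3)
```

In this illustrative layout, high values sit next to high values and low next to low, so Moran's I is positive, while Gi* is positive at the high-value end of the row and negative at the low-value end.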

Temporal Variation of Bicycle-Sharing Usage
At the city level, both the daily and hourly temporal variations of bicycle-sharing ridership are investigated. Table 2 shows that, on the basis of the ANOVA test, the daily bicycle-sharing ridership does not vary significantly. On the one hand, among weekdays, the daily variation is shown to be negligible. On the other hand, even on the weekends, the overall usage of the bicycle-sharing system remains as high as it is on weekdays.
However, when it comes to the hourly distribution, the Wilcoxon signed-rank test shows that the temporal variation differs markedly between weekdays and weekends (Table 3). From Monday to Friday, no significant variation is reported. This result is also in accord with what is shown in Figure 3. However, with respect to Saturday (20 Aug 2016) and Sunday (21 Aug 2016), the results turn out to be quite dissimilar from those of weekdays. Both Saturday and Sunday show a distinctive temporal distribution compared to the other six days. In comparison with Sunday, Saturday to a certain extent demonstrates a temporal distribution more similar to that of the weekdays.
Drawing on the results of the ANOVA and Wilcoxon signed-rank test, we arrive at the following findings: across weekdays and weekends, the daily bicycle-sharing usage is consistent, while, at the hourly level, the five weekdays all show a classical "two-peak" temporal distribution, which does not appear on either Saturday or Sunday.

Urban Vitality Clustering
In line with the formula proposed in Section 3.2, urban vitality has been calculated in terms of five temporal features, including daily urban vitality, urban vitality during AM peak (7:00-9:00), urban vitality during Midday off-peak hours (9:00-16:00), urban vitality during PM peak (16:00-19:00), and urban vitality during Night hours (19:00-22:00). With these urban vitality indices assigned to every subdistrict, K-means clustering is employed to categorise the subdistricts into 7 clusters, a number chosen by checking a suite of cluster validity indices including the Silhouette index, Tau index, and Point-Biserial index.
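As an illustration of how one such validity index scores a candidate partition, the Silhouette index can be sketched as follows. This is a minimal sketch using the standard definition with Euclidean distance, not the procedure used in this study: it assumes at least two clusters, assigns a width of 0 to single-member clusters by convention, and the function name `silhouette_score` and the sample points are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is included in the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Hypothetical two-dimensional observations forming two tight pairs.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # pairs kept together
bad = silhouette_score(points, [0, 1, 0, 1])   # pairs split apart
```

A partition that keeps similar observations together scores close to 1, whereas one that splits them scores below 0; comparing the score across candidate values of k is one way to choose the number of clusters.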
Firstly, Figure 4 shows the distance between each pair of subdistricts. As the distance value decreases from blue to green, an obvious blue belt located at the bottom demonstrates that this group of subdistricts is quite distant from or dissimilar to other subdistricts in terms of urban vitality performance. This group contains YP8, BS3, YP11, BS1, and YP5. YP5 in particular has a notable blue stripe in contrast with most of the other subdistricts, demonstrating its unique performance of urban vitality. In addition, there are also some other blue stripes in Figure 4, such as BS4 and PT1. Some green patterns should also be noted, as their low distance values indicate that the corresponding subdistricts could be clustered into a group.
Secondly, grounded in the results of the paired distance, Figure 5 reveals the exact clustering of those subdistricts. It shows that the 98 subdistricts are partitioned into 7 clusters, each shown in a different colour. Five of them (cluster 1, cluster 2, cluster 3, cluster 4, and cluster 7) are relatively close to each other, demonstrating that those subdistricts have a relatively similar performance of urban vitality. The other two clusters, cluster 5 and cluster 6, are isolated. Similar to the results in Figure 4, YP5, one subdistrict representing one cluster, is found to be quite distant from other subdistricts. Cluster 5, containing five subdistricts (YP8, BS3, YP11, BS1, and PT1), is shown to sit in the intermediate zone between YP5 and the other clusters with distinctive urban vitality among the 98 subdistricts.
In accordance with the results of K-means clustering shown in Figure 4 and Figure 5, we find that, when categorised into 7 clusters, most of the 98 subdistricts have relatively similar urban vitality performances. However, there also exist 5 subdistricts forming two clusters that show quite a different performance, among which the unique YP5 deserves more investigation.

Spatial Distribution of the Dynamic Urban Vitality
In terms of the overall urban vitality, K-means clustering is applied to categorise the subdistricts into a set of groups. To capture the spatial clustering of subdistricts' urban vitality, spatial autocorrelation analysis and hot spot analysis are used. Both analyses are consolidated into four time periods: AM peak, Midday off-peak, PM peak, and Night hours.
Table 4 presents the results of spatial autocorrelation at different time periods. What stands out in the table is that the spatial clustering of urban vitality is not fixed over the course of a day.
During AM peak, urban vitality shows a significantly clustered spatial distribution according to Moran's I and the p-value, while during Midday off-peak hours, Moran's I is near 0 and the p-value is not small enough to reject the null hypothesis. Thereby, the urban vitality during this time period is randomly distributed across the study area, and neither clustering nor dispersion can be verified. With respect to PM peak and Night hours, the spatial clustering is found to emerge again. By comparing the values of the z-score, the spatial clustering is shown to be strongest during AM peak and weakest during PM peak.
Since spatial clustering has been identified for different time periods, hotspot analysis is used to further evaluate the exact type of clustering. From Figure 6, it is apparent that the clustering patterns differ across the three time periods. During AM peak, both high-value clustering and low-value clustering can be observed. The high-value clustering patterns are mainly located in the northeast part and northwest part, while the low-value clustering lies in the central area to the middle east part. It is interesting to point out that the subdistricts highlighted in the northeast high-value clustering are partitioned into cluster 5 and cluster 6, which include YP5, BS3, YP11, and YP8. As for the northwest high-value clustering, it is mainly formed by PT5 and BS4, which belong to cluster 4 and cluster 7 in terms of K-means clustering.
Regarding PM peak and Night hours, the clustering features are found to be quite similar. High-value clusters still exist, whereas low-value clustering cannot be observed. The high-value clusters comprise two parts: one is located in a northeast area similar to that found during AM peak, and the other appears in the northwest part. This northwest part contains BS4 and PT5, which are classified into cluster 5 in Section 4.2. Thereby, all the subdistricts within the two notable clusters (i.e., cluster 5 and cluster 6) highlighted in Section 4.2 are identified in the spatial distribution analyses. YP5, BS3, YP11, and YP8 form the core of the northeast high-value clustering, which is significant over the three time periods (i.e., AM peak, PM peak, and Night hours), while the remaining two subdistricts in cluster 5, BS4 and PT5, represent the northwest high-value clustering during PM peak and Night hours.

Discussion & Conclusion
With the prevalence of location-based services in our daily life, LBS data have become a solid data source to motivate the research in individuals' mobility as well as urban dynamics characterised by those individuals' movement. Drawing on bicycle-sharing data over a seven-day period, this study takes Shanghai, China, as the study context and investigates the spatiotemporal features of urban vitality represented by those bicycle-sharing system users. In light of the accessibility model coupled with the land-use mix index, a new index is proposed to describe the urban vitality with the consideration of both spatial distance and spatial diversity. In accordance with this urban vitality index, K-means clustering, spatial autocorrelation analysis, and hotspot analysis are applied to reveal a new aspect of the 98 subdistricts. Besides, the ANOVA and Wilcoxon signed-rank test are also employed to examine the temporal variations of bicycle-sharing ridership. By utilising these analytic approaches, we have achieved a series of findings with the capacity to deepen our understanding of the cyclists' movement and our urban system.
Firstly, our results on the temporal variation of bicycle-sharing ridership show that the hourly ridership varies significantly between weekdays and weekends. Although at the daily level the overall ridership remains consistent across the seven-day period, the Wilcoxon signed-rank test reveals that the hourly ridership is quite similar among the five weekdays, whereas both Saturday and Sunday show a different temporal distribution in comparison with the other five days.
Secondly, we find that two notable clusters consisting of six subdistricts have a distinctive performance in terms of urban vitality. Of the 7 clusters identified by using K-means clustering, two (i.e., cluster 5 and cluster 6) are shown to be quite dissimilar to the other five. In particular, YP5, the single subdistrict that constitutes cluster 6, is found to have a unique urban vitality among the 98 subdistricts.
Thirdly, our results of spatial autocorrelation analysis show that during AM peak, PM peak, and Night hours, the subdistricts' urban vitality across the study area is clustered, while, during Midday off-peak hours, the spatial distribution is random. Moreover, in terms of the clustering intensity, the spatial clustering is shown to be strongest during AM peak and weakest during PM peak.
Fourthly, our results of hotspot analysis elucidate the high/low clustering of the subdistricts' urban vitality at different time periods. During AM peak, both high-value and low-value clusters are identified. During PM peak and Night hours, only high-value clusters can be observed. More importantly, we find that a high-value cluster expressed across AM peak, PM peak, and Night hours is composed of most of cluster 5 and cluster 6, and another high-value cluster highlighted during PM peak and Night hours is composed of the remaining part of cluster 5.
In spite of the fruitful findings provided in this paper, we suggest that two aspects may form avenues for future research. Firstly, as this study only incorporates one kind of LBS data, namely, bicycle-sharing data, further efforts could focus on different LBS data to establish a more thorough understanding of urban vitality. Secondly, the built environment is proven to have an effect on human travel behaviour, and it would be interesting to investigate the interaction between the built environment and urban vitality, which could help to inform city planners on how to better design a more vital neighbourhood. To conclude, the findings of this study enrich the small body of LBS data-urban dynamics literature, revealing a new aspect of the bicycle-sharing data application and the urban vitality featured by those cyclists, which is beneficial for urban planners seeking to better allocate public facilities and increase bicycle usage as a way to progress towards more sustainable urban areas.