In recent years, trajectory data have been a hot topic in the emerging domain of data science [1], of interest to data scientists in diverse fields including geography, statistics, computer science, physics, and biology. This can be attributed to the availability of massive geo-tagged data [2], thanks to advancements in technologies such as global positioning systems, Web 2.0, and telecommunications. A typical example of trajectory data is floating car data, which are actively collected by GPS receivers installed in vehicles for navigation or monitoring. Much work has drawn on floating car data to uncover novel patterns. For instance, some studies segmented trajectories using an arbitrarily specified speed threshold [4] or elapsed time threshold [5]. Some studies estimated CO2 emissions from trajectories and applied them to sustainable location planning [6] and market analysis for the retail sector [7]. A recent study explored the relationship between human activities and landscape patterns [8], and studies more relevant to our work focused on mining human activity patterns [9].
Among the studies on human activity patterns, some developed methods to extract activity hotspots [14]. These activity hotspots are the clusters of activity locations in space and time. Not only do they display dynamics in terms of their structure and lifetime [13], but they also contain rich semantic information in terms of urban functionality, which refers to the actual land use information such as residential area, commercial area, recreation, etc. However, it is not straightforward to infer their urban functionality using the methods available in the literature. On the one hand, point-of-interest (POI) data have been directly used to reveal the functionality of urban land use [19], but the uncovered urban functionality does not reflect reality well. This is because some POIs may be visited many more times than others, and hence the distribution of POIs alone may lead to an unreliable estimation of urban functionality. Besides, no activity hotspots were detected in those studies. On the other hand, POI data have been indirectly used to infer the activity type of a stopping location or the purpose of a trip using the simple distance matching method [23], the decision tree model [25], the random forest method [26], the probability model [27], or even the visual analytic technique [30]. The functionality of the activity hotspot was then derived by merging the activity types of all stopping locations within its spatial range [31]. In this way, the revealed urban functionality considered the importance of different POIs, but the functionality of activity hotspots with very few stopping locations cannot be estimated reliably. In other words, the relationship between the POIs and the stopping locations is overlooked in the literature. Hence, this study aims to examine this issue by employing the underlying scaling pattern.
The scaling pattern can be understood from two perspectives. First, in physics, the scaling pattern of an entity may refer to a power law distribution of its size or quantity [34], which indicates invariance under contraction or dilation and is often thought to be a signature of hierarchies. Second, the scaling of an entity could be regarded as an allometric scaling relationship among its properties [37]. This pattern has been observed in the biological world [38] and in human society [39]. For instance, a recent study suggested that human interactions in terms of communication activities scale superlinearly with the size of administrative divisions such as statistical cities, urban zones, or municipalities [40]. However, very few urban studies have recognized the importance of this scaling pattern. Sutton [41] observed the allometric scaling pattern between urban area and urban population for cities in the US, which helped him to identify sprawling cities; Jiang et al. [42] investigated the scaling pattern of geographic space and applied this finding to the process of map generalization for many geographical entities, including street networks, coastlines, and drainage networks. To our knowledge, applying the scaling pattern of human activity hotspots to understand their urban functionality remains unexamined.
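As an illustration of the second perspective, an allometric relationship of the form y = c * x^beta can be estimated by ordinary least squares on log-transformed values, with beta < 1 indicating sublinear and beta > 1 superlinear scaling. The following is a minimal sketch rather than the estimation procedure used in this study; the function names and the tolerance `tol` are hypothetical choices made for the example.

```python
import math


def fit_allometric(x, y):
    """Fit y = c * x**beta by least squares on (log x, log y).

    Both inputs must contain strictly positive values of equal length.
    Returns the pair (c, beta).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta = sxy / sxx                       # slope in log-log space
    c = math.exp(mean_y - beta * mean_x)   # intercept mapped back to linear space
    return c, beta


def scaling_regime(beta, tol=0.05):
    """Label the exponent; `tol` is an arbitrary band around 1 (assumption)."""
    if beta > 1 + tol:
        return "superlinear"
    if beta < 1 - tol:
        return "sublinear"
    return "approximately linear"
```

In practice, a goodness-of-fit measure would normally be reported alongside the exponent, since a straight line in log-log space can be fitted to data that do not actually follow a power law.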
Therefore, this study is dedicated to investigating the scaling pattern of activity hotspots and applying this rule to understand their urban functionality. It differs from previous studies in three main aspects. Firstly, we use a head/tail break rule [43] to extract a large number of stopping locations from trajectory data, which are regarded as the activity locations for living, working, or shopping. These locations are further aggregated into activity hotspots using a newly designed temporal city clustering algorithm (TCCA). Secondly, we investigate the allometric scaling relationship between the number of POIs and the number of stopping locations, which helps to design an allometric ruler to identify the activity hotspots whose functionality could be reliably estimated using the specified number of stopping locations. This comes from the idea that humans can collectively interact with urban land use in an efficient way known as economies of scale [37]. Thirdly, a Bayesian learning model is developed to infer the urban functionality of the identified activity hotspots; it utilizes temporal and spatial information on stopping locations covering 75 days. As a result, the benefits of this work are two-fold. First, it expands the application of scaling patterns, which contributes to a quantitative understanding of the relationship between humans and urban environments. Second, it contributes a novel method to infer the urban functionality of activity hotspots, and the results will be useful in many urban applications. For example, urban planners could use them as decision support for verifying, updating, and compiling city land use plans.
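For readers unfamiliar with head/tail breaks, the general idea is to split heavy-tailed data around its mean, keep the values above the mean (the head), and repeat while the head remains a minority. The sketch below is a generic version under stated assumptions: the 40% head limit is a commonly used convention assumed here, and this passage does not specify which trajectory attribute the rule is applied to, so the input is simply a list of numbers.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the successive mean values used as break points.

    The data are split at their mean; the part above the mean (the head)
    is split again as long as it holds at most `head_limit` of the values.
    `head_limit` is an assumed convention, not a value taken from the text.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        # Stop when there is no head or the head is no longer a minority.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(mean)
        data = head
    return breaks
```

For data without a heavy tail (for example, evenly spread values) the first head already exceeds the limit and no break is produced, which is the intended behaviour of the rule.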
The remainder of this paper is organized as follows. In Section 2, we describe the datasets and the procedures to derive stopping locations and human activity hotspots. In Section 3, we report the scaling pattern of human activity hotspots and the allometric ruler to identify those whose urban functionality could be reliably estimated. In Section 4, we propose the Bayesian learning model to infer the urban functionality of the identified activity hotspots. The limitations are discussed in Section 5. Conclusions are drawn in Section 6.
The underlying scaling pattern of geographic entities might be examined from two perspectives, namely the power law distribution of one quantity and the allometric relationship between two quantities. The allometric relationship could be further divided into sublinear and superlinear scaling. For the sublinear case, a city is typically observed to resemble an organism: the energy or resources it consumes scale sublinearly with its population size, reflecting economies of scale. For the superlinear case, the number of patents or innovations in a city has been found to scale superlinearly with its population size, reflecting the requirement of wealth creation [37]. The scaling pattern of geographical entities might have many potential implications, but only very few urban studies in the literature have realized its importance, such as the studies on urban sprawl [41] and map generalization [42]. Hence, the novelty of our study is to infer the urban functionality of human activity hotspots using their allometric scaling relationship with urban land use, which to our knowledge has rarely been investigated in the literature.
Nonetheless, this study has several limitations:
Firstly, to estimate the urban functionality of activity hotspots, we use taxi trajectories spanning 75 days over three months. A major concern is the reliability of the reported results, because taxis represent only one travel mode among many chosen by urban residents, which limits the observed types of human activities. Moreover, it is very difficult to estimate the resulting statistical bias owing to the absence of transportation survey data in Wuhan, which is a common problem in investigations of human activity or mobility patterns using taxi trajectories. How to resolve this issue is therefore still an open question. A possible solution is to integrate other types of human activity or movement data, such as the integrated circuit card data collected by a bus or metro system, the movement data of private vehicles or bicycles, or even pedestrian movement data. This requires further study.
Secondly, we investigate the quality of the GPS trajectory dataset from two aspects. Spatially, we examine the reliability of the assignment of GPS locations to the Voronoi cells of a specified type of POI. To do so, we calculate the service radius (r) of a Voronoi cell as the radius of a circle with the same spatial extent. As shown in Figure 8, we find that 99% of Voronoi cells have an r value greater than 7 m and 90% of Voronoi cells have an r value greater than 13 m. Hence, via a simple comparison with the horizontal accuracy of GPS locations (5 m), it can be generally assumed that the accuracy of GPS locations has a limited influence on their assignments to the Voronoi cells. Temporally, the major concern is that the temporal resolution of our GPS locations is relatively low, and it is very difficult to estimate stops with a duration of less than one minute. In other words, one can hardly know what happens between two consecutive GPS locations. However, we use a massive dataset in order to compensate for lost stopping locations. Besides, it is reasonable that each taxi contributes on average 42 stopping locations (or 41 trips) in one day (24 h), because the estimated monthly income of one taxi driver of 4613 Yuan (considering the facts of two drivers sharing a taxi, the 15 Yuan charge for a trip, and the profit rate of 0.5 in Wuhan, China) is very close to the amount (5000 Yuan) published in the income survey report [52].
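Both checks in this paragraph reduce to short calculations, sketched below. The sketch assumes that "the same spatial extent" means equal area, so that r = sqrt(A / pi), and that a month has 30 working days; neither is stated explicitly here, and the function names are hypothetical.

```python
import math


def service_radius(cell_area_m2):
    """Radius (m) of the circle whose area equals the Voronoi cell area (m^2).

    Assumes "same spatial extent" means equal area.
    """
    return math.sqrt(cell_area_m2 / math.pi)


def monthly_income_per_driver(trips_per_day, fare_yuan, profit_rate,
                              drivers_per_taxi, days_per_month=30):
    """Estimated monthly income (Yuan) of one driver.

    `days_per_month=30` is an assumption made for this example.
    """
    daily_profit = trips_per_day * fare_yuan * profit_rate
    return daily_profit * days_per_month / drivers_per_taxi
```

With the figures quoted above (41 trips per day, 15 Yuan per trip, a profit rate of 0.5, and two drivers per taxi), the second function gives 4612.5 Yuan, which rounds to the 4613 Yuan reported in the text.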
Thirdly, in the Bayesian inference model we set the damping factor at 0.85, which means that approximately 85% of stopping locations are matched to the POIs in the local activity hotspot, and approximately 15% of them are matched to the POIs in the entire study area. Setting a damping factor lets us mimic real situations by supplementing the urban functionality of the activity hotspot with that of its wider surroundings. However, it is not easy to choose the value of the damping factor that gives the best simulation, and this requires further study.
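One way to read the damping factor is as a mixing weight between the POI type shares inside a hotspot and the shares in the whole study area. The sketch below illustrates that reading only; it is an assumption about how the factor enters the model, not the formulation used in this paper, and all names are hypothetical.

```python
def damped_distribution(local_counts, global_counts, damping=0.85):
    """Blend local and study-area POI type shares.

    local_counts / global_counts: dicts mapping POI type -> number of POIs.
    Returns a dict mapping POI type -> probability (sums to 1).
    Interpreting the damping factor as a mixing weight is an assumption.
    """
    global_total = sum(global_counts.values())
    if global_total == 0:
        raise ValueError("global_counts must contain at least one POI")
    local_total = sum(local_counts.values())
    types = sorted(set(local_counts) | set(global_counts))
    global_share = {t: global_counts.get(t, 0) / global_total for t in types}
    if local_total == 0:
        # No local POIs: fall back entirely on the study-area shares.
        return global_share
    return {
        t: damping * local_counts.get(t, 0) / local_total
        + (1 - damping) * global_share[t]
        for t in types
    }
```

Under this reading, a POI type that is absent from a hotspot still receives a small non-zero probability from the study-area term, which is the supplementing effect described above.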
Fourthly, to select the human activity hotspots whose functionality could be reliably estimated, we empirically set the DP value to 0.92 and the stops value to 1000 for the purpose of illustration. This is a potential limitation, because these thresholds determine how many activity hotspots are selected. How to select the activity hotspots is therefore still an open question; the choice should depend on the field of application and be made by decision makers.
Fifthly, the inferred urban functionality is validated using OpenStreetMap (OSM) [53], an online map service aimed specifically at creating and providing free geographic data such as street maps to anyone. The validation procedure starts by visually observing the major urban functionality of the activity hotspot on the map and then compares it with the major inferred urban functionality. For some activity hotspots, it is difficult to identify the functionality from OSM due to the absence of geographic data, in which case the validation is conducted in the field. However, this procedure is subjective, can be applied to only a few samples, and may yield inaccurate results. A more objective and accurate validation method is thus needed, which requires further study.
In this paper, we aim to identify and infer the urban functionality of human activity hotspots in the light of the underlying scaling pattern. First, we derive a large number of stopping locations from massive taxi trajectory data using the head/tail break rule, and we regard these as proxies of human activity locations. Second, these stopping locations are aggregated into human activity hotspots using a temporal city clustering algorithm, and the human activity hotspots are found to display the scaling pattern over time. With the scaling pattern, we devise an allometric ruler to identify the activity hotspots whose functionality can be reliably estimated using the specified number of stopping locations. Eventually, 67 activity hotspots are identified, which correspond to six regions in space.
A Bayesian inference model is then proposed to infer their urban functionality. The model uses both temporal and spatial information in stopping locations covering 75 days, where temporal information denotes time of day and spatial information refers to spatial coordinates. Specifically, a probability curve depicting how each type of POI functions during different times of day is empirically calculated. Additionally, a prior probability of functionality distribution is calculated as the percentage of each type of POI in the study area. Thereafter, each stopping location is supplied to the model to be matched with the most likely POI type, which eventually leads to a reliable functionality distribution for each activity hotspot. Overall, our findings suggest that most of the activity hotspots could be understood well in terms of their functionality distribution. The results will be useful for decision-making on urban land use planning and transportation in general, and they will be beneficial for monitoring the change of urban land use and the dynamics of urban transport in particular. Future work will concentrate on applying our method to other types of human movement data, such as the integrated circuit card data collected by a bus or metro system.
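The matching step summarized above can be sketched with Bayes' rule: the posterior probability of a POI type given the hour of a stop is proportional to the prior share of that type multiplied by the value of its time-of-day curve at that hour. The sketch below covers only the temporal part and leaves out the spatial information; the data structures and function names are hypothetical and do not reproduce the model as specified in this paper.

```python
from collections import Counter


def most_likely_type(hour, prior, hourly_curve):
    """Return (best_type, posterior) for a stop observed at `hour` (0-23).

    prior: dict mapping POI type -> prior probability.
    hourly_curve: dict mapping POI type -> list of 24 likelihood values.
    """
    scores = {t: prior[t] * hourly_curve[t][hour] for t in prior}
    total = sum(scores.values())
    if total == 0:
        # No type is active at this hour: fall back on the prior.
        posterior = dict(prior)
    else:
        posterior = {t: s / total for t, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


def functionality_distribution(stop_hours, prior, hourly_curve):
    """Share of stops assigned to each POI type within one hotspot."""
    if not stop_hours:
        raise ValueError("stop_hours must not be empty")
    counts = Counter(
        most_likely_type(h, prior, hourly_curve)[0] for h in stop_hours
    )
    return {t: counts.get(t, 0) / len(stop_hours) for t in prior}
```

Because each stop contributes a single vote, a hotspot with very few stops yields a coarse distribution, which is consistent with the motivation for first selecting hotspots with enough stopping locations.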