Understanding the Functionality of Human Activity Hotspots from Their Scaling Pattern Using Trajectory Data

Abstract: Human activity hotspots are the clusters of activity locations in space and time, and a better understanding of their functionality would be useful for urban land use planning and transportation. In this article, using trajectory data, we aim to infer the functionality of human activity hotspots from their scaling pattern in a reliable way. Specifically, a large number of stopping locations are extracted from trajectory data and then aggregated into activity hotspots. Activity hotspots are found to display scaling patterns in the form of sublinear scaling relationships between the number of stopping locations and the number of points of interest (POIs), which indicates economies of scale in human interactions with urban land use. Importantly, this scaling pattern remains stable over time. This finding inspires us to devise an allometric ruler to identify the activity hotspots whose functionality can be reliably estimated using the stopping locations. Thereafter, a novel Bayesian inference model is proposed to infer their urban functionality, which examines the spatial and temporal information of stopping locations covering 75 days. Experimental results suggest that the functionality of the identified activity hotspots, such as the railway station, is reliably inferred from stopping locations.


Introduction
In recent years, trajectory data have been a hot topic in the emerging domain of data science [1], of interest to data scientists in diverse fields including geography, statistics, computer science, physics, and biology. This can be attributed to the availability of massive geo-tagged data [2,3], thanks to advancements in technologies such as global positioning systems, Web 2.0, and telecommunications. A typical example of trajectory data is floating car data, which are actively collected by GPS receivers installed in vehicles for navigation or monitoring. Much work has used floating car data to uncover novel patterns. For instance, some studies segmented trajectories using an arbitrarily specified speed threshold [4] or elapsed time threshold [5]. Others estimated CO2 emissions from trajectories and applied them to sustainable location planning [6] and market analysis for the retail sector [7]. A recent study explored the relationship between human activities and landscape patterns [8], and studies more relevant to our work focused on mining human activity patterns [9–13].
Among the studies on human activity patterns, some developed methods to extract activity hotspots [14–16]. These activity hotspots are the clusters of activity locations in space and time. Not only do they display dynamics in terms of their structure and lifetime [13,17,18], but they also contain rich semantic information in terms of urban functionality, which refers to the actual land use information such as residential area, commercial area, recreation, etc. However, it is not straightforward to infer their urban functionality using the available literature. On the one hand, POI data have been directly used to reveal the functionality of urban land use [19–22], but the uncovered urban functionality does not reflect reality well. This is because some POIs may be visited many more times than others, and hence the distribution of POIs alone may lead to an unreliable estimation of urban functionality. Besides, no activity hotspots were detected in those studies. On the other hand, POI data have been indirectly used to infer the activity type of a stopping location or the purpose of a trip using the simple distance matching method [23,24], the decision tree model [25], the random forest method [26], the probability model [27–29], or even the visual analytic technique [30]. The functionality of the activity hotspot was then derived by merging the activity types of all stopping locations within its spatial range [31–33]. In this way, the revealed urban functionality considered the importance of different POIs, but the functionality of activity hotspots with very few stopping locations cannot be estimated reliably. In other words, the relationship between the POIs and the stopping locations is overlooked in the literature. Hence, this study aims to examine this issue by employing the underlying scaling pattern.
ISPRS Int. J. Geo-Inf. 2017, 6, 341
The scaling pattern can be understood from two perspectives. First, in physics, the scaling pattern of an entity may refer to a power law distribution of its size or quantity [34–36], which indicates invariance under contraction or dilation and is often thought to be a signature of hierarchies. Second, the scaling of an entity could be regarded as an allometric scaling relationship among its properties [37]. This pattern has been observed in the biological world [38] and in human society [39]. For instance, a recent study suggested that human interactions in terms of communication activities scale superlinearly with the size of administrative divisions such as statistical cities, urban zones, or municipalities [40]. However, in urban studies, very few researchers have recognized the importance of this scaling pattern. Sutton [41] observed the allometric scaling pattern between urban area and urban population for cities in the US, which helped him to identify sprawling cities; Jiang et al. [42] investigated the scaling pattern of geographic space and further applied this finding to the process of map generalization for many geographical entities including street networks, coastlines, and drainage networks. To our knowledge, applying the scaling pattern of human activity hotspots to understand their urban functionality is still unexamined.
Therefore, this study is dedicated to investigating the scaling pattern of activity hotspots and further applying this rule to understand their urban functionality. It differs from the previous studies in three main aspects. Firstly, we use a head/tail break rule [43] to extract a large number of stopping locations from trajectory data, which are regarded as the activity locations for living, working, or shopping. These locations are further aggregated into activity hotspots using a newly designed temporal city clustering algorithm (TCCA). Secondly, we investigate the allometric scaling relationship between the number of POIs and the number of stopping locations, which helps to design an allometric ruler to identify the activity hotspots whose functionality could be reliably estimated using the specified number of stopping locations. This comes from the idea that humans can collectively interact with urban land use in an efficient way known as economies of scale [37]. Thirdly, a Bayesian learning model is developed to infer the urban functionality of the identified activity hotspots, which utilizes temporal and spatial information on stopping locations covering 75 days. As a result, the benefits of this work are two-fold. First, it expands the application of scaling patterns, which contributes to a quantitative understanding of the relationship between humans and urban environments. Second, it contributes a novel method to infer the urban functionality of activity hotspots, and the results will be useful in many urban applications. For example, urban planners could use them as decision support for verifying, updating, and compiling city land use plans.
The remainder of this paper is organized as follows. In Section 2, we describe the datasets and the procedures to derive stopping locations and human activity hotspots. In Section 3, we report the scaling pattern of human activity hotspots and the allometric ruler to identify those whose urban functionality could be reliably estimated. In Section 4, we propose the Bayesian learning model to infer the urban functionality of the identified activity hotspots. The limitations are discussed in Section 5. Conclusions are drawn in Section 6.

GPS Trajectory Dataset
The GPS trajectory dataset is acquired from the transportation agency of Wuhan, which shares its dataset with the research community for noncommercial use. It is composed of 830,062,777 records with a size of 46 GB, contributed by 16,787 taxis over almost three months (January, March, and August of 2013). Each record in the dataset contains the ID number of the taxi and the information logged when the GPS signal is received, at an interval of 60 s or less. The information includes the longitude (x) and latitude (y), the time (t), and the velocity (v) of the taxi. It should be noted that the longitude and latitude are referenced to the World Geodetic System 84 and measured with a horizontal accuracy of 5 m. Records with missing or incorrect information in terms of time or position are removed from the dataset, resulting in 93.1% of the records being adopted in this study. Spatially, the dataset not only includes trips to nearby cities but also covers the downtown area and suburbs of our study area (Figure 1a). Temporally, it covers only 81% of the collection days owing to business security concerns; this is still good temporal coverage, although the percentage varies by month: 81% in January, 65% in March, and 97% in August. Besides, each taxi on average contributes approximately 1200 records per day, which indicates the consistency of data collection of each taxi in each day across the three months (Figure 1c). Hence, this trajectory dataset with such a large volume is chosen to ensure the robustness of our study.

POI Dataset
The POI dataset is obtained from the company NavInfo, a provider of navigation services in Mainland China with the most detailed POI dataset. It has a total of 59,876 POIs in 16 categories relating to most of our daily activities. The names of these categories are: airport, commercial building, bus station, residential district, education, financial site, gas station, healthcare, hotel, leisure, parking, railway station, restaurant, shopping, toll, and tourist site (Figure 1b). Among the POIs, shopping sites have the highest total number (21,630), followed by restaurants and leisure sites, which total 8382 and 6586 respectively. There are three airports and 27 railway stations. POI categories can reflect land use types, and hence they can be used to estimate the functionality of urban regions. It is POIs that attract human activities in neighborhoods and the whole city. Therefore, the POI dataset is chosen for understanding its relationship with human activities and for supplying a prior distribution of urban functionality.

Stopping Locations
Stopping locations denote the origins and destinations of the taxi trips. They are likely to contain implicit semantic information related to human activities [29,31–33] and are regarded as proxies of human activity locations for working, living, shopping, etc. Hence, in a continuous trajectory, stopping locations tend to display significant dissimilarities with other moving locations. From the perspective of time, stopping locations refer to adjacent locations with large time intervals between them, and they are used to demarcate the continuous trajectories into individual trips. They can be seen in Figure 2c, where the red dots indicate stopping locations with large time intervals. Some trajectory datasets, like the New York taxi travel records, are accompanied by tag information that indicates whether the taxi is picking up or dropping off a fare. This information can be directly used to extract the stopping locations; however, there are many other trajectory datasets, like ours, which do not have such valuable information. Hence, it is not a trivial task to derive stopping locations, because of the ambiguity of setting threshold values for the time intervals. In this study, we take the arithmetic mean value of the time intervals of each taxi in each month as its own time threshold value, which is different from a previous study [36] where only one mean value is used. This strategy can be justified by the great differences among taxis in their mean values of time intervals, which obey remarkable power law distributions with respect to the three months as shown in Figure 2a. Besides, the reason for adopting the mean value as the time threshold for each taxi is the power-law-like distribution of the time interval of each taxi (Figure 2b), which motivates the use of the head/tail break method [43–47] to demarcate the trajectory locations into two parts: the head (minority) of stopping locations and the tail (majority) of moving locations. Bearing the threshold values in mind, we extract 52,633,829 stopping locations within our study area (Figure 2c), which means each taxi generates around 42 stopping locations per day on average. This is reasonable for the normal business of taxi service.
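The thresholding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(taxi_id, month, t, x, y)` and the function name `extract_stops` are hypothetical, and treating both endpoints of an above-mean gap as stopping locations is an assumption based on the phrase "adjacent locations with large time intervals between them".

```python
from collections import defaultdict

def extract_stops(records):
    """Flag stopping locations with a per-taxi, per-month mean-interval threshold.

    records: iterable of (taxi_id, month, t_seconds, x, y) tuples (hypothetical layout).
    Returns a list of records whose neighbouring time gap exceeds the mean gap.
    """
    # Group the records of each taxi in each month.
    groups = defaultdict(list)
    for taxi_id, month, t, x, y in records:
        groups[(taxi_id, month)].append((t, x, y))

    stops = []
    for (taxi_id, month), points in groups.items():
        points.sort()  # chronological order
        gaps = [b[0] - a[0] for a, b in zip(points, points[1:])]
        if not gaps:
            continue
        # Head/tail break: the arithmetic mean splits gaps into head and tail.
        threshold = sum(gaps) / len(gaps)
        flagged = set()
        for i, gap in enumerate(gaps):
            if gap > threshold:
                # Assumption: both endpoints of a long gap count as stops.
                flagged.update((i, i + 1))
        for i in sorted(flagged):
            t, x, y = points[i]
            stops.append((taxi_id, month, t, x, y))
    return stops
```

Because the threshold is computed separately for every (taxi, month) pair, a taxi that reports sparsely is not penalized by the reporting rate of the rest of the fleet.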

Human Activity Hotspots
We propose a temporal city clustering algorithm (TCCA), which is an extension of the city clustering algorithm (CCA) [15], to aggregate the individual stopping locations "from the bottom up" into human activity hotspots. Activity hotspots refer to the clusters within an urban area where most of the human activities take place. Specifically, this method starts from a randomly selected location, on which a cylinder is drawn to check whether other locations are encompassed. The cylinder is constructed by drawing a circle with a specified spatial radius r, which is then extruded along the time dimension with a specified time resolution τ. This process continues recursively until no locations are within the cylinder or the contained locations have already been checked.
How to specify the values of the two parameters requires further investigation. In this study, we intuitively set the spatial radius r at 200 m, which is the approximate average length of road segments in the study area. This setting is consistent with the argument that human activities are mostly constrained to city blocks delineated by road networks [35]. On the other hand, for a better understanding of the effects of time resolution on the scaling pattern, we vary the time resolution τ from 5 min to 30 min in 5 min increments. This results in 6 groups of activity hotspots with respect to different time resolutions in each month. As shown in Figure 3a, the number of activity hotspots decreases as the time resolution increases. Besides, we show one activity hotspot in Figure 3b, which clearly depicts its spatio-temporal process.
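The cylinder-growing procedure can be illustrated with the sketch below. It is only an approximation of TCCA under stated assumptions: coordinates are taken to be already projected to metres, the cylinder is interpreted as "within r in space and within τ in time" of the current location, seeds are visited in input order rather than at random, and the brute-force neighbour search would need a spatial index at the scale of the dataset described here. The function name `tcca` is hypothetical.

```python
import math
from collections import deque

def tcca(stops, r=200.0, tau=900.0):
    """Cluster stops by growing space-time cylinders.

    stops: list of (x, y, t) with x, y in metres and t in seconds (assumed units).
    r: spatial radius in metres; tau: time resolution in seconds (900 s = 15 min).
    Returns one integer cluster label per stop.
    """
    labels = [-1] * len(stops)  # -1 means "not yet checked"
    cluster = 0
    for seed in range(len(stops)):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        # Keep drawing cylinders around newly absorbed stops.
        while queue:
            i = queue.popleft()
            xi, yi, ti = stops[i]
            for j, (xj, yj, tj) in enumerate(stops):
                if labels[j] != -1:
                    continue
                in_circle = math.hypot(xi - xj, yi - yj) <= r
                in_time = abs(ti - tj) <= tau  # assumed reading of the extrusion
                if in_circle and in_time:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels
```

A larger τ lets cylinders bridge longer temporal gaps, so clusters merge and their number falls, which matches the trend reported for Figure 3a.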

Scaling Pattern of Human Activity Hotspots
Human activity hotspots are good subjects to test the inherent scaling pattern. Hence, we investigate the scaling relationship between the number of stopping locations and the number of POIs, say $y = ax^{b}$, where x is the number of stopping locations, y is the number of POIs, a is the constant, and b is the scaling exponent. In this context, the number of stopping locations relates to the potential needs of the people visiting the activity hotspots, which reflects the importance of the activity hotspots. The number of POIs relates to the extent of urban land use that can satisfy the needs of the people, which reflects the potential functionality of the activity hotspots. As shown in Figure 4 and Table 1, the findings agree with our initial assumption, showing a clear sublinear allometric scaling relationship for the three months irrespective of the time clustering resolution. This consistency implies that the universal law of economies of scale dominates human interaction with urban land use. In addition, these allometric scaling relationships display very high goodness of fit in terms of R-square values, and they are all statistically significant, with very small p-values approaching 0 using the F-test. Hence, the goodness of fit test and the significance test support the reliability of these scaling patterns.
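One common way to estimate a and b is ordinary least squares on the log-transformed counts, since $y = ax^{b}$ becomes $\log y = \log a + b \log x$. The sketch below assumes that approach (the text does not state the fitting procedure) and assumes hotspots with zero POIs or zero stops are excluded beforehand; `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares in log-log space.

    x, y: sequences of positive numbers (stops and POIs per hotspot).
    Returns (a, b, r_squared) for the log-log regression.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                      # scaling exponent (slope)
    a = math.exp(my - b * mx)          # constant (from the intercept)
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared
```

An exponent b below 1 corresponds to the sublinear regime discussed above: doubling the number of stopping locations less than doubles the number of POIs.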

Table 1. Allometric scaling relationships between stops and POIs in different time resolutions for the three months (Note: all these models are statistically significant with p-values of 0 using the F-test).

Identification of the Reliable Human Activity Hotspots
The above scaling pattern of human activity hotspots suggests that we could devise an allometric ruler, the de facto power regression line between the number of stops and POIs, to identify the activity hotspots whose functionality can be reliably estimated using stopping locations. Specifically, from the allometric ruler, we can define the deviation percentage (DP) for an activity hotspot i as in Equation (1), where Observed(i) is the number of POIs in activity hotspot i and Estimated(i) is the number of POIs estimated from the stopping locations. As shown in Figure 5a, activity hotspots are separated into two parts by the allometric ruler, where the upper ones in gray have DP values less than 0 and the lower ones in red have DP values greater than 0. Besides, the number of stopping locations (stops) is also used to identify the reliable activity hotspots. Intuitively, the larger the values of DP and stops, the more reliably the functionality of an activity hotspot can be estimated using stopping locations.
As an example, the ruler is applied to activity hotspots with a time resolution of 15 min in January. In Figure 5b, they are colored according to the DP values using a jet colormap. In this study, for the purpose of illustration, an activity hotspot is selected if its DP value is greater than 0.92 and its stops value is greater than 1000. The former ensures a reliable estimation of urban functionality, while the latter guarantees an important activity hotspot. Using the ruler, a total of 67 activity hotspots are identified as the reliable ones (enclosed within boxes in Figure 5b). In the city, they are spatially related to six urban regions as shown in Figure 5c. Some regions are associated with a large number of activity hotspots, while others are related to a very small number. Intuitively, a popular region, say, a railway station, would correspond to a large number of activity hotspots. However, for ease of analysis, we take only the activity hotspot with the largest number of stops as the representative of each region.
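The selection rule can be sketched as below. The DP formula itself is not reproduced in this text, so the form used here, (Estimated − Observed) / Estimated, is an assumption: it is chosen only because it is negative for hotspots above the ruler, positive for those below it, and bounded by 1, which is consistent with the 0.92 threshold. The names `deviation_percentage` and `select_reliable` and the dictionary keys are hypothetical.

```python
def deviation_percentage(observed, estimated):
    """Assumed DP form: positive when observed POIs fall below the ruler."""
    return (estimated - observed) / estimated

def select_reliable(hotspots, a, b, dp_min=0.92, stops_min=1000):
    """Keep hotspots whose DP and stop count both exceed the thresholds.

    hotspots: list of dicts with 'stops' and 'pois' counts (hypothetical layout).
    a, b: parameters of the fitted ruler y = a * x**b.
    """
    selected = []
    for h in hotspots:
        estimated = a * h["stops"] ** b   # POIs predicted by the ruler
        dp = deviation_percentage(h["pois"], estimated)
        if dp > dp_min and h["stops"] > stops_min:
            selected.append(h)
    return selected
```

Under this assumed form, a DP of 0.92 means the hotspot contains only 8% of the POIs that the ruler predicts for its number of stops.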

Construct the Bayesian Inference Model
Once the activity hotspots are identified, the main question is how to construct a model to infer their urban functionality using the stopping locations. To solve this problem, we use a Bayesian inference model, which employs Bayes' theorem to update the probability of a hypothesis with observed evidence. In this study, the distribution of POIs is considered as the hypothesized functionality distribution, whereas the stopping locations act as observed evidence to infer the real functionality distribution. Specifically, the proposed Bayesian inference model is designed to determine the most likely POI type, given a stopping location's spatial coordinate and occurrence time. The model is shown in Equation (2), where Pr((x, y) | type) is the probability of observing one stopping location at coordinate (x, y), given a POI type; Pr(t | type) is the probability of observing one stopping location around time t, given a POI type; and Pr(type) is the probability of observing a certain type of POI in the study area. Note that the occurrence of a stopping location at coordinate (x, y) is assumed to be independent of the time t, given a POI type.
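For orientation, the three terms named above combine through Bayes' theorem and the stated conditional independence as follows. This is a restatement implied by the description, not a reproduction of the paper's Equation (2), whose exact notation is not shown in this text.

```latex
\[
\Pr\bigl(\mathit{type} \mid (x, y), t\bigr) \;\propto\;
\Pr\bigl((x, y) \mid \mathit{type}\bigr)\,
\Pr\bigl(t \mid \mathit{type}\bigr)\,
\Pr(\mathit{type}),
\qquad
\widehat{\mathit{type}} \;=\; \arg\max_{\mathit{type}}\;
\Pr\bigl((x, y) \mid \mathit{type}\bigr)\,
\Pr\bigl(t \mid \mathit{type}\bigr)\,
\Pr(\mathit{type}).
\]
```

The normalizing constant is the same for every POI type, so it can be dropped when only the most likely type is needed.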
In detail, this model is composed of three steps. The first step is to derive the empirical probability of observing one stopping location around time t, given a POI type (Equation (3)). To achieve this, we decompose the entire region into Voronoi cells using each type of POI, which is similar to the assumption in central place theory [49], so that each POI acts as a nodal point for the distribution of goods or services to the surrounding Voronoi cell. Consumers are assumed to act as homo economicus and visit the nearest POI, given that travel effort is equal in all directions. Then, stopping locations falling inside each Voronoi cell are aggregated according to time of day, which is discretized into 24 hourly bins. Note that the aggregation process does not rely on simply counting the number of stopping locations, but weights each stopping location geographically by its inverse distance to the POI. In other words, near stopping locations are more important than far ones. Finally, a further aggregation is performed over all the Voronoi cells regarding the time of day, which leads to the empirical probability of observing one stopping location at a certain time of day, given each type of POI. In Equation (3), $SL_i$ is the ith stopping location, $VN_j$ is the jth Voronoi cell, $N_{type}$ is the number of POIs of a certain type, $n_j^t$ is the number of stopping locations within the jth Voronoi cell around a certain time t of the day, and $n_j$ is the number of stopping locations within the jth Voronoi cell.
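A rough sketch of this first step for a single POI type is given below. It relies on the fact that assigning a stop to its nearest POI is equivalent to locating the Voronoi cell it falls in. Several details are assumptions rather than the paper's Equation (3): the small offset `eps` that avoids division by zero, the skipping of cells that contain no stops, and the averaging of per-cell profiles; the function name `temporal_likelihood` is hypothetical.

```python
import math

def temporal_likelihood(pois, stops, eps=1.0):
    """Approximate Pr(t | type) as a 24-bin hourly profile.

    pois: list of (x, y) for ONE POI type, in metres (assumed units).
    stops: list of (x, y, hour) with hour in 0..23.
    eps: distance offset in metres to keep weights finite (assumption).
    """
    cell_hist = [[0.0] * 24 for _ in pois]
    for x, y, hour in stops:
        # The nearest POI identifies the Voronoi cell containing the stop.
        dists = [math.hypot(x - px, y - py) for px, py in pois]
        j = min(range(len(pois)), key=dists.__getitem__)
        cell_hist[j][hour] += 1.0 / (dists[j] + eps)  # inverse-distance weight

    profile = [0.0] * 24
    used = 0
    for hist in cell_hist:
        total = sum(hist)
        if total == 0.0:
            continue  # empty cell: skipped here (assumption)
        used += 1
        for h in range(24):
            profile[h] += hist[h] / total  # weighted analogue of n_j^t / n_j
    return [p / used for p in profile] if used else profile
```

Normalizing inside each cell before averaging keeps a single very busy POI from dominating the profile of its whole category.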
The second step focuses on deriving the empirical probability of observing one stopping location at coordinate (x, y), given a POI type. It is very difficult to derive this probability directly from observations, so we assume that it is proportional to the gravity force between the POI and the stopping location [33]. The gravity force can be approximated by the product of the attractiveness value of the POI and its inverse distance to the stopping location. In Equation (4), A_type is the attractiveness value of a POI type, specified according to its physical size or importance, and µ is the distance decay factor, which is set to 1.5 in this study [33].
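As a rough illustration of the gravity-based term, the sketch below scores each POI type by A_type divided by distance raised to the power µ = 1.5. The function name `spatial_likelihood`, the use of the nearest POI of each type, the `eps` distance floor, and the final normalization across types are assumptions made for this example; the normalization rescales every type by the same constant and therefore does not affect which type scores highest.

```python
import math

MU = 1.5  # distance decay factor reported in the text


def spatial_likelihood(stop, pois_by_type, attractiveness, eps=1.0):
    """Gravity-style score proportional to Pr((x, y) | type).

    stop           : (x, y) of the stopping location
    pois_by_type   : {type: [(x, y), ...]} POI coordinates per type
    attractiveness : {type: A_type} attractiveness value per type
    eps            : assumed lower bound on distance
    Returns {type: score}, rescaled so that the scores sum to 1.
    """
    sx, sy = stop
    scores = {}
    for poi_type, pois in pois_by_type.items():
        # Assumption: the nearest POI of each type represents that type.
        d = min(math.hypot(sx - px, sy - py) for px, py in pois)
        scores[poi_type] = attractiveness[poi_type] / max(d, eps) ** MU
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}
```

With equal attractiveness, a POI four times closer receives a score 4^1.5 = 8 times larger, which shows how strongly µ = 1.5 favours nearby POIs.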
The third step is to infer the activity type of a stopping location as the type with the maximum probability, according to naïve Bayesian theory. To further mimic the real urban functionality of activity hotspots, we add the damping factor β to the inference model. It was originally used in the PageRank algorithm [50] to model the probability that a web surfer would continue clicking on links from the current web page without jumping to a randomly chosen one. In this context, it allows a stopping location, with a certain probability, to be matched with a POI type from the entire study area, which supplements the urban functionality of an activity hotspot. The formula is given in Formula (5), where Φ_global is the set of POI types in the study area, Φ_local is the set of POI types in the activity hotspot, and β_threshold is the threshold value of the damping factor.
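One way to read the damping step is as a random switch between the local and global candidate sets before taking the maximum. The sketch below follows that reading; the function name `infer_type`, the uniform random draw compared against β_threshold, and the fallback to the global set when no local type is available are assumptions, since Formula (5) is not reproduced here.

```python
import random


def infer_type(posterior, local_types, beta_threshold=0.85, rng=random):
    """Pick the most probable POI type for one stopping location.

    posterior      : {type: unnormalized Pr(type | (x, y), t)} over the
                     global set of POI types in the study area
    local_types    : set of POI types present in the activity hotspot
    beta_threshold : threshold value of the damping factor
    rng            : object with a random() method returning a float in [0, 1)
    """
    beta = rng.random()
    if beta < beta_threshold:
        # Restrict the candidates to the types found in the hotspot.
        candidates = [t for t in posterior if t in local_types]
    else:
        # Allow any type in the study area.
        candidates = list(posterior)
    if not candidates:  # assumed fallback when the hotspot has no usable type
        candidates = list(posterior)
    return max(candidates, key=posterior.get)
```

For example, with a posterior of 0.2 for airport, 0.5 for hotel, and 0.3 for school, and a hotspot that contains only airport and school POIs, a local draw returns school while a global draw returns hotel.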

Train the Model and Infer the Urban Functionality
Once the model is constructed, we train it with the stopping locations. The training process is carried out in two steps. First, we calculate the probability of observing a stopping location given a POI type (the likelihood function), which amounts to training Equation (3). Equation (4), by contrast, does not need the empirical stopping locations, because it can be calculated on the fly. Specifically, the training process uses all the stopping locations from the three months. To test stability, stopping locations are supplied to the model gradually in daily increments for a total of 75 days, which eventually leads to a saturated likelihood function curve, shown as the red dotted line in Figure 6a. Figure 6 shows how each type of POI functions at different times of day; generally, the probability of human activities decreases from midnight to morning and displays different patterns in the daytime for different types of POI. For example, in Figure 6d, we can observe roughly two peaks in human activities: going to work from 06:00 to 08:00 and returning home from 16:00 to 18:00. Second, we calculate the probability of each POI type before stopping locations are available, which amounts to training the prior probability, namely Pr(type). Hence, this training process simply computes the percentage of POIs of each type in the study area.
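The prior and the stability check described above are simple enough to sketch directly. In the example below, `prior_from_pois` computes the share of each POI type, and `curve_change` is a hypothetical helper that measures how much an hourly likelihood curve moves when one more day of data is added; the text does not state which saturation criterion was actually used.

```python
from collections import Counter


def prior_from_pois(poi_types):
    """Pr(type) as the share of POIs of each type in the study area.

    poi_types : iterable with one type label per POI
    """
    counts = Counter(poi_types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def curve_change(previous, current):
    """Largest absolute change between two hourly likelihood curves.

    A small value after adding another day of stopping locations would
    suggest that the likelihood curve is close to saturation.
    """
    return max(abs(p - c) for p, c in zip(previous, current))
```

Feeding the likelihood curves for day 1, days 1 to 2, and so on up to 75 days through `curve_change` would give a numeric counterpart to the visual convergence shown in Figure 6.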
ISPRS Int. J. Geo-Inf. 2017, 6, 341
Then, the urban functionality of the identified activity hotspots is inferred using stopping locations. Specifically, for each activity hotspot i, we randomly select as many stopping locations as there are estimated POIs, and these stopping locations are supplied to Formula (5) to determine the best-matching POI types. These newly derived POI types are then aggregated to infer the functionality of the hotspot by either reweighting the importance among POIs or adding new types of POIs. It should be noted that we set the damping factor to 0.85, a value generally used in many studies on human mobility patterns and models [36,51]. In this context, it means that approximately 85% of stopping locations are matched with POIs in the local activity hotspot and approximately 15% of them are assigned to POI types in the global study area. To avoid sampling bias, this procedure is repeated 100 times for each activity hotspot, and the final average value is used. Compared with the prior probability of functionality distribution (Figure 7a), the inferred distribution characterizes the urban functionality of the activity hotspots more reliably, as shown in Figure 7b.
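The repeated sampling procedure can be sketched as follows. The function name `infer_functionality`, the `classify` callback (standing in for the per-location inference of Formula (5)), and the fixed random seed are assumptions introduced for this illustration.

```python
import random
from collections import Counter


def infer_functionality(stops, n_pois, classify, n_repeats=100, seed=0):
    """Average POI-type shares over repeated random samples of stops.

    stops     : list of stopping locations in one activity hotspot
    n_pois    : estimated number of POIs, used as the sample size
    classify  : function mapping one stopping location to a POI type
    n_repeats : number of repetitions (100 in the text)
    seed      : assumed seed, only to make the sketch reproducible
    """
    rng = random.Random(seed)
    k = min(n_pois, len(stops))  # cannot sample more stops than exist
    if k == 0:
        return {}
    totals = Counter()
    for _ in range(n_repeats):
        sample = rng.sample(stops, k)  # sampling without replacement
        counts = Counter(classify(s) for s in sample)
        for poi_type, c in counts.items():
            totals[poi_type] += c / k
    # Mean share of each POI type across all repetitions.
    return {t: v / n_repeats for t, v in totals.items()}
```

The returned shares sum to one and can be compared directly with the prior distribution of POI types in the same hotspot.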
To put the results in detail:
• Activity hotspot 1: Its functionality is enhanced by raising the importance of the airport type; that is, many stopping locations are matched with the airport POI. Hence, it is considered to provide aviation services, which agrees with the real situation as verified by visual check in Figure 7c.

Discussion
The underlying scaling pattern of geographic entities might be examined from two perspectives, namely the power law distribution of one quantity and the allometric relationship between two quantities. The allometric relationship could be further divided into sublinear and superlinear scaling relationships. For the sublinear case, it is typically observed that a city resembles an organism and its population size scales sublinearly with the consumed energy or resources, reflecting economies of scale. For the superlinear case, it was found that the population size of a city scales superlinearly with the number of patents or innovations, reflecting the requirement of wealth creation [37]. The scaling pattern of geographical entities might have many potential implications, but only very few urban studies in the literature have recognized its importance, such as the studies on urban sprawl [41] and map generalization [42]. Hence, the novelty of our study is to infer the urban functionality of human activity hotspots using their allometric scaling relationship with urban land use, which to our knowledge has rarely been investigated in the literature.
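To make the sublinear versus superlinear distinction concrete, an allometric relationship y = a·x^b can be fitted by ordinary least squares on the log-transformed quantities, and the exponent b compared with 1. The sketch below is a generic illustration; the function name `fit_power_law` and the plain least-squares estimator are assumptions and not necessarily the fitting procedure used in this study.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log(x) and log(y).

    xs, ys : sequences of positive numbers of equal length
    Returns (a, b); b < 1 indicates sublinear and b > 1 superlinear scaling.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope on the log-log plot
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back to y = a * x**b
    return a, b
```

An exponent below 1 means that doubling x less than doubles y, which is the economies-of-scale behaviour described for the sublinear case.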
Nonetheless, this study has several limitations. First, to estimate the urban functionality of activity hotspots, we use taxi trajectories spanning 75 days over three months. A major concern is the reliability of the reported results, because taxis represent only one travel mode among many chosen by urban residents, which limits the observed types of human activities. Moreover, it is very difficult to estimate the resulting statistical bias owing to the absence of transportation survey data in Wuhan, which is a common problem of investigations on human activity or mobility patterns using taxi trajectories. Therefore, how to resolve this issue is still an open question. A possible solution is to integrate other types of human activity or movement data, such as the integrated circuit card data collected by a bus or metro system, the movement data of private vehicles or bicycles, or even pedestrian movement data. This requires further study.
Secondly, we investigate the quality of the GPS trajectory dataset from two aspects. Spatially, we examine the reliability of the assignment of GPS locations to the Voronoi cells of a specified type of POI. To do so, we calculate the service radius (r) of a Voronoi cell as the radius of a circle with the same area. As shown in Figure 8, we find that 99% of Voronoi cells have an r value greater than 7 m and 90% of Voronoi cells have an r value greater than 13 m. Hence, from a simple comparison with the horizontal accuracy of GPS locations (5 m), it can generally be assumed that the accuracy of GPS locations has a limited influence on their assignment to the Voronoi cells. Temporally, the major concern is that the temporal resolution of our GPS locations is relatively low, and it is very difficult to estimate stops with a duration of less than one minute. In other words, it is hardly possible to know what happens between two consecutive GPS locations. However, we use a massive dataset in order to compensate for lost stopping locations. Besides, it is reasonable that each taxi contributes on average 42 stopping locations (or 41 trips) in one day (24 h), because the estimated monthly income of one taxi driver, 4613 Yuan (assuming two drivers sharing a taxi, a 15 Yuan charge per trip, and a profit rate of 0.5 in Wuhan, China), is very close to the amount (5000 Yuan) published in the income survey report [52].
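Both plausibility checks in this paragraph reduce to short calculations, sketched below. The function names are hypothetical, and the 30-day month in the income estimate is an assumption; with that assumption the arithmetic gives 4612.5 Yuan, consistent with the 4613 Yuan quoted in the text.

```python
import math


def service_radius(cell_area_m2):
    """Radius (m) of the circle whose area equals that of a Voronoi cell."""
    return math.sqrt(cell_area_m2 / math.pi)


def monthly_income_per_driver(trips_per_day=41, fare_yuan=15.0,
                              profit_rate=0.5, drivers_per_taxi=2,
                              days_per_month=30):
    """Estimated monthly income (Yuan) of one taxi driver.

    days_per_month = 30 is an assumption; the other defaults are the
    figures quoted in the text.
    """
    daily_profit = trips_per_day * fare_yuan * profit_rate
    return daily_profit * days_per_month / drivers_per_taxi
```

A cell needs an area of only about 154 m² to reach the 7 m service radius, which gives a sense of how small the finest Voronoi cells are relative to the 5 m GPS accuracy.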
Thirdly, in the Bayesian inference model we set the damping factor to 0.85, which means that approximately 85% of stopping locations are matched to the POIs in the local activity hotspot, and approximately 15% of them are matched to the POIs in the entire study area. By setting a damping factor, we can mimic real situations by supplementing the urban functionality of the activity hotspot. However, it is not easy to choose the value of the damping factor that gives the best simulation, and this requires further study.
Fourthly, to select the human activity hotspots whose functionality could be reliably estimated, we empirically set the DP value to 0.92 and the stops value to 1000 for the purpose of illustration. This is a potential limitation, because these choices affect the number of activity hotspots selected. In other words, how to select the activity hotspots is still an open question; the choice should depend on the field of application and be determined by decision makers.
Fifthly, the inferred urban functionality is validated using OpenStreetMap (OSM) [53], an online map service aimed specifically at creating and providing free geographic data such as street maps to anyone. The validation procedure starts by visually observing the major urban functionality of the activity hotspot on the map, and then compares it with the major inferred urban functionality. For some activity hotspots, it is difficult to identify the functionality from OSM due to the absence of geographic data, and the validation is conducted in the field. However, this procedure is subjective, can be applied to only a few samples, and may yield inaccurate results. A more objective and accurate validation method is thus needed, which requires further study.

Figure 1. Datasets used in this study: (a) trajectories in March overlaid on the urban street network; (b) Points of Interest (POI) dataset overlaid on the urban street network; (c) average number of records per taxi during each day in January, March, and August, where x-axis is the day of month, y-axis labels the average number of records per taxi, and the day with negative number implies the absence of records.

Figure 2. Extracting stopping locations: (a) a log-log plot for the power law distributions of the mean values of time intervals of individual taxis with respect to January, March, and August, where the significance tests are conducted with p-values as 0.5, 0.5, and 0.25 for the three months respectively (please refer to [48] for the details on calculating the p-value); (b) a log-log plot for the power-law-like distribution of the time interval values of one taxi; (c) an illustration on extracting stopping locations from a continuous trajectory.

Figure 3. Human activity hotspots for the three months: (a) the number of activity hotspots decreases with the increasing value of time resolution from 5 min to 30 min; (b) 3D boundary of an activity hotspot lasting from 1 January 2013 08:07:26 to 1 January 2013 22:19:14, where its spatial area expands at approximately 13:00.

Figure 4. Allometric scaling relationship between the number of stops and POIs for activity hotspots with time resolution of 15 min in: (a) January; (b) March; (c) August.

Figure 5. Usage of the allometric ruler: (a) schematic illustration of the allometric ruler; (b) application of the allometric ruler to human activity hotspots with a time resolution of 15 min in January; (c) map of six regions corresponding to the identified activity hotspots.

Figure 6. The probability of observing one stopping location at times of day for (a) airport, (b) commercial buildings, (c) bus facilities, (d) residential districts, (e) education sites, (f) financial sites, (g) gas station, (h) healthcare facilities, (i) hotel, (j) leisure sites, (k) parking sites, (l) railway stations, (m) restaurants, (n) shopping sites, (o) toll sites and (p) tourist sites, where x-axis is in hour and y-axis is the probability. (Note: In this figure, the curves with color changing from black to white are calculated by using the stopping locations from one day to 75 days respectively, and the curve with the red dotted line indicates the saturated one calculated by using all stopping locations of 75 days; this figure suggests that different types of POI might function differently in the time of day, for instance, airport, bus station and railway station present a relatively stable visiting pattern during daytime, residential district or commercial building displays roughly two peaks in the morning and in the afternoon, and leisure sites show a clear peak around 22:00 at night).

Figure 7. Urban functionality of the activity hotspots and their validations on OpenStreetMap (available at: www.openstreetmap.org): (a) prior probability of functionality distribution; (b) inferred probability of functionality distribution; (c) activity hotspot 1, providing aviation services with an international airport; (d) activity hotspot 2, providing educational services with driving schools; (e) activity hotspot 3, providing recreation and healthcare services with an ocean park and an infectious disease hospital, which cannot be inferred by our model; (f) activity hotspot 4, serving as a residential area with many shopping sites; (g) activity hotspot 5, providing railway services with a train station; (h) activity hotspot 6, serving as an educational area with several institutions or colleges.

Table 1. Allometric scaling relationships between stops and POIs in different time resolutions for the three months (Note: All these models are statistically significant with p-values of 0 using F-test).