As shown in Figure 1, the proposed methodological framework consists of three steps. In the first step, the whole city area is segmented into contiguous patches (i.e., basic units) using appropriate levels of the traffic network. In the second step, the POI data and traffic patterns of each basic unit are fed into the DMR model, which allows both data sources to be combined. Related models such as TF-IDF, LSI, and LDA are also applied for comparison. The basic units, now carrying functional meaning, are then grouped with K-means clustering, resulting in functional zones with similar functions and roles in the urban system. Lastly, given the functional zones created in the second step, graph clustering is applied to the traffic patterns of these zones to generate urban regions, each encompassing a subset of distinct functions, such that the number of intra-region trips is as large as possible and the number of inter-region trips is as small as possible. Each region can be considered a cluster featuring a set of closely complementary functions and strongly interconnected traffic relations. The distinguishing features of this framework are the embedding of traffic flows and the integration of topic models: traffic flows are used twice, and the performance of prevailing topic models is compared.
3.3.2. Discovery of Functional Zones
To satisfy the input data requirements of the topic models, both the taxi OD data and the POI data are processed into the requisite format. Each record in the taxi OD dataset contains the origin point, the destination point, and their corresponding times. For a particular basic unit $u_i$, two kinds of mobility patterns (arriving and leaving) can be derived. The arriving pattern corresponds to trips that originate from other units and terminate at $u_i$, while the leaving pattern corresponds to trips that originate from $u_i$ and terminate at any other unit. The traffic flow patterns of each unit are formulated on an hourly basis and, because mobility patterns differ between weekdays and weekends, they are constructed by averaging the number of trips on weekdays and weekends separately. Therefore, the mobility patterns of $u_i$ can be built as a matrix with rows denoting basic units, columns representing time bins, and each cell indicating the average number of trips originating from or terminating at $u_i$ during the corresponding time bin. Further, the leaving and arriving cuboids are obtained by concatenating the corresponding mobility matrices of all units.
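As a minimal sketch of this construction, the following Python code builds the leaving and arriving cuboids from a list of trips. The record layout (origin unit, destination unit, departure time, arrival time), the integer unit ids, and the use of 48 time bins (24 weekday hours followed by 24 weekend hours) are assumptions made for illustration; the function name `build_cuboids` is hypothetical.

```python
from datetime import datetime

HOURS = 24  # hourly bins; weekday bins come first, weekend bins second

def build_cuboids(trips, n_units):
    """trips: list of (origin, destination, depart_time, arrive_time).

    Returns (leaving, arriving); leaving[i][j][t] is the average number of
    trips from unit i to unit j in time bin t, and arriving[i][j][t] is the
    average number of trips arriving at unit i from unit j in bin t.
    """
    trips = list(trips)
    # Collect the distinct weekday and weekend dates so counts can be averaged.
    dates = {False: set(), True: set()}  # key: is_weekend
    for _, _, t_out, t_in in trips:
        for t in (t_out, t_in):
            dates[t.weekday() >= 5].add(t.date())

    def empty():
        return [[[0.0] * (2 * HOURS) for _ in range(n_units)]
                for _ in range(n_units)]

    leaving, arriving = empty(), empty()
    for origin, dest, t_out, t_in in trips:
        if origin == dest:
            continue  # assumption: trips within a single unit are ignored
        weekend = t_out.weekday() >= 5
        leaving[origin][dest][int(weekend) * HOURS + t_out.hour] += 1.0 / len(dates[weekend])
        weekend = t_in.weekday() >= 5
        arriving[dest][origin][int(weekend) * HOURS + t_in.hour] += 1.0 / len(dates[weekend])
    return leaving, arriving
```

Each slice `leaving[i]` is then the leaving matrix of unit $u_i$, with rows indexed by the counterpart unit and columns by time bin.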
Each record in the POI dataset refers to a single entity with properties such as name, position, and category. According to the 20 categories listed in Table 1, the POIs within each basic unit can be formulated as a vector representing their categorical distribution. For basic unit $u_i$, the POI vector can be expressed as $V_i = (p_1, p_2, p_3, \ldots, p_j, \ldots, p_{20})$, in which $p_j$ is the proportion of POIs of the $j$th category among all POIs in the unit, i.e., $p_j$ = (number of POIs of the $j$th category)/(total number of POIs). Similarly, the POI vectors of all units can be concatenated into the POI matrix.
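The POI vector can be sketched in a few lines of Python. The encoding of categories as integer ids 0 to 19 and the function names are assumptions for illustration; a unit without any POI is mapped to an all-zero vector, which is also an assumption.

```python
from collections import Counter

def poi_vector(categories, n_categories=20):
    """categories: list of category ids (0..n_categories-1) of the POIs in one unit."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_categories  # assumption: empty units get a zero vector
    return [counts.get(j, 0) / total for j in range(n_categories)]

def poi_matrix(pois_by_unit, n_categories=20):
    """Stack the POI vectors of all units, one row per basic unit."""
    return [poi_vector(cats, n_categories) for cats in pois_by_unit]
```

For example, a unit containing two POIs of category 0, one of category 3, and one of category 19 yields proportions 0.5, 0.25, and 0.25 in the corresponding positions.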
Similar to the logic of topic inference for documents, topic models can be applied to discover the function distribution of a region. As shown in Table 2, a basic region unit, its functions, and its traffic flow patterns are analogous to a document, its latent topics, and its words, respectively, in text mining. Specifically, POI data serve as metadata in the DMR model, playing the role that bibliographic information (e.g., author, date, or institution) plays in topic extraction. As shown in Figure 2, the process of DMR is:
First, for each region function $f$,
Second, for each region $r$,
    for each function $f$, calculate the prior parameter: $a_{rf} = \exp(\lambda_f)$
    draw the function distribution: $\theta_r \sim \mathrm{Dir}(a)$
    for each mobility pattern $m_{rn}$ in region $r$,
where R represents regions, F denotes functions, P is the feature of POI, and β is the Dirichlet prior. Following the procedure of DMR, the function assignment for each basic unit can be discovered based on the traffic flow patterns and POI data.
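For orientation, the generative process can be sketched in Python under the standard DMR formulation of Mimno and McCallum, which is assumed here and may differ in detail from the variant used in this framework. In particular, the Gaussian draw of the regression weights, the Dirichlet draw of each function's pattern distribution, the form $\exp(x_r^{\top}\lambda_f)$ of the prior parameter with $x_r$ the POI feature vector of region $r$, and the two multinomial draws per mobility pattern are taken from the standard model rather than from the steps listed above. All names and default values are hypothetical.

```python
import math
import random

def dirichlet(alpha, rng):
    """Sample from Dir(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_dmr(poi_features, n_functions, n_patterns, n_draws,
                 sigma=1.0, beta=0.5, seed=0):
    """poi_features: one POI feature vector x_r per region r."""
    rng = random.Random(seed)
    n_feat = len(poi_features[0])
    # Step 1 (assumed): for each function f, draw weights lambda_f ~ N(0, sigma^2 I)
    # and a distribution over mobility patterns phi_f ~ Dir(beta).
    lam = [[rng.gauss(0.0, sigma) for _ in range(n_feat)] for _ in range(n_functions)]
    phi = [dirichlet([beta] * n_patterns, rng) for _ in range(n_functions)]
    regions = []
    for x in poi_features:
        # Step 2: region-specific prior from the POI metadata, exp(x_r . lambda_f).
        alpha = [math.exp(sum(xi * li for xi, li in zip(x, lam_f))) for lam_f in lam]
        theta = dirichlet(alpha, rng)  # function distribution of the region
        patterns = []
        for _ in range(n_draws):
            z = rng.choices(range(n_functions), weights=theta)[0]   # function of this pattern
            m = rng.choices(range(n_patterns), weights=phi[z])[0]   # observed mobility pattern
            patterns.append(m)
        regions.append((theta, patterns))
    return regions
```

Inference reverses this process: given the observed mobility patterns and POI features, the weights and the function distribution of each region are estimated.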
For the TF-IDF model, the number of POIs of a certain category in a basic region unit plays the role of the term frequency in a document. Therefore, a POI feature vector can be built for each basic unit, and a POI matrix can be derived in which each cell represents the POI percentage of a certain category. As in the discovery of high-frequency terms in a regular document, the algorithm assigns a weight to each term and identifies those that are most notable and significant. These selected terms represent the elements underpinning the functions of each unit.
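The weighting can be illustrated with a short Python sketch. The plain variant used here, term frequency multiplied by $\log(N/\mathrm{df})$ without smoothing, is an assumption, since the exact TF-IDF variant is not specified; the function name is hypothetical.

```python
import math

def tf_idf(count_matrix):
    """count_matrix[u][j]: number of POIs of category j in basic unit u."""
    n_units = len(count_matrix)
    n_cat = len(count_matrix[0])
    # Document frequency: number of units containing at least one POI of category j.
    df = [sum(1 for row in count_matrix if row[j] > 0) for j in range(n_cat)]
    idf = [math.log(n_units / df[j]) if df[j] else 0.0 for j in range(n_cat)]
    weighted = []
    for row in count_matrix:
        total = sum(row)
        tf = [c / total if total else 0.0 for c in row]  # POI proportion per category
        weighted.append([tf[j] * idf[j] for j in range(n_cat)])
    return weighted
```

A category present in every unit receives a weight of zero, so the remaining weights highlight the categories that distinguish one unit from the others.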
As a topic model, LSI enables the latent topics of each basic unit to be extracted from traffic flow patterns or POI data, which is its most distinguishing feature compared with TF-IDF. Following the standard process, the POI vector or the traffic flow patterns of each basic unit can be treated analogously to the words of a document. Singular value decomposition (SVD) is then carried out on the constructed POI or traffic flow matrix, and the two approaches are designated LSI_POI and LSI_MP, respectively. The dimensions or features representing the functions of each basic unit are acquired by selecting a particular number of entries in the diagonal matrix.
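To make the truncation step concrete, the sketch below projects each basic unit onto the leading singular directions of a unit-by-feature matrix. It is only an illustration: power iteration with deflation on the Gram matrix stands in for a library SVD routine, and the function name, the start vector, and the iteration count are assumptions.

```python
import math

def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lsi_embed(A, k, n_iter=500):
    """A: matrix with one row per basic unit. Returns (coords, sigmas), where
    coords[u] holds the k latent coordinates of unit u and sigmas holds the
    k largest singular values."""
    n = len(A[0])
    # Gram matrix G = A^T A; its eigenvectors are the right singular vectors of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    vectors, sigmas = [], []
    for c in range(k):
        v = [1.0 / (1 + i + c) for i in range(n)]  # deterministic start vector
        for _ in range(n_iter):
            w = _matvec(G, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:          # nothing left to extract
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _matvec(G, v)))
        vectors.append(v)
        sigmas.append(math.sqrt(max(eigenvalue, 0.0)))
        # Deflation: remove the component just found before the next round.
        G = [[G[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    coords = [[sum(a * b for a, b in zip(row, v)) for v in vectors] for row in A]
    return coords, sigmas
```

Keeping only the first `k` singular values corresponds to selecting `k` entries of the diagonal matrix; the resulting coordinates serve as the function representation of each unit.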
Analogous to discovering the latent topics of a document, LDA can be employed to find the functions of the basic units in the urban area. In this scheme, a unit is equivalent to a document in a corpus, mobility patterns or POIs to words, and functions to topics. The traffic flow patterns and the POI data are loaded into the LDA model in LDA_MP and LDA_POI, respectively, and two types of function distribution are obtained from the corresponding data sources.
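The unit-as-document analogy can be illustrated with a small LDA sketch. Collapsed Gibbs sampling is used here as one common inference scheme; the inference method, the hyperparameter values, and the encoding of mobility patterns or POI categories as integer word ids are all assumptions, and the function name is hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """docs[u]: list of word ids observed in basic unit u.
    Returns theta, where theta[u][k] is the weight of function k in unit u."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial assignment of every word occurrence to a function.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                z = assign[d][n]
                # Remove the current assignment, then resample it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                           / (topic_total[k] + n_words * beta)
                           for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed function distribution of each unit.
    return [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
            for row, doc in zip(doc_topic, docs)]
```

Feeding discretized mobility patterns yields the LDA_MP representation, while feeding POI categories yields LDA_POI.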
As we have obtained six kinds of function representations using corresponding algorithms (i.e., TF-IDF, LSI_MP, LSI_POI, LDA_MP, LDA_POI, and DMR), K-means clustering is then applied to these function representations. As a consequence, the functional zones with certain functional topics are obtained.
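The clustering step can be sketched with a plain implementation of Lloyd's algorithm. The random initialization from the data points, the squared Euclidean distance, and the function names are assumptions for illustration; the number of clusters `k` has to be chosen by the analyst.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """points: list of function representations, one per basic unit.
    Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initial centers
    labels = None
    for _ in range(n_iter):
        # Assignment step: each unit joins its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Units sharing a label form one functional zone; applying the same routine to each of the six representations gives six alternative zonings.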
3.3.3. Aggregation of Urban Functional Clusters
The adjacent functional zones across the urban area are aggregated to form urban functional regions, a notion equivalent to the polycentric urban region (PUR) in urban studies. A PUR can be defined as a collection of settlements that are closely interconnected in terms of both functions and geographical locations [31,51]. In our case, the aim is to build PURs such that the traffic volume within each centric area is as large as possible and that between centric areas is as small as possible, suggesting strong connections among zones in the same centric area. As zones are characterized by functions, the PUR is inherently underpinned by functional attributes.
As a widely used graph analysis method, graph clustering groups the vertices of a graph into clusters, also called communities [52], with the objective of having more edges within each cluster and relatively fewer between clusters [53]. Following this logic, graph clustering is performed in the analysis to yield geographically contiguous regions, which possess the various functions embodied in the functional zones. In this framework, each functional zone is further separated into several nonadjacent sub-zones, which are taken as the nodes. The presence of trips between two sub-zones defines an edge, and the number of trips determines the edge weight. Therefore, using inter-zone trips, we can establish a weighted graph that depicts the morphological correlations among functional zones.
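The graph construction and the clustering objective can be sketched as follows. The code merges trips in both directions into one undirected edge and scores a candidate partition with weighted modularity; both choices are assumptions, since neither the treatment of direction nor the specific graph clustering algorithm is fixed here, and the function names are hypothetical.

```python
from collections import defaultdict

def build_trip_graph(trips):
    """trips: iterable of (origin_subzone, destination_subzone).
    Returns {(a, b): weight} with a < b, where weight is the number of trips."""
    edges = defaultdict(float)
    for origin, destination in trips:
        if origin != destination:  # assumption: trips inside a sub-zone are ignored
            edges[(min(origin, destination), max(origin, destination))] += 1.0
    return dict(edges)

def modularity(edges, community):
    """community: {node: cluster label}. Higher values indicate more trips
    within clusters than expected from the node strengths alone."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    inside = defaultdict(float)   # edge weight inside each cluster
    degree = defaultdict(float)   # summed weighted degree of each cluster
    for (a, b), w in edges.items():
        degree[community[a]] += w
        degree[community[b]] += w
        if community[a] == community[b]:
            inside[community[a]] += w
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)
```

A partition that keeps heavily connected sub-zones together scores higher than one that merges everything into a single region, which matches the aim of maximizing intra-region trips while minimizing inter-region trips.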
Graph clustering is implemented on the traffic graph constructed from the zones derived in the previous step. Given the six distinct sets of clustered urban functional zones, urban polycentricity can be evaluated from the corresponding graph clustering results.