Detecting and Analyzing Urban Centers Based on the Localized Contour Tree Method Using Taxi Trajectory Data: A Case Study of Shanghai

Abstract: Urban structure is of vital importance to urban planning, transportation, economics and other applications. Since detecting and analyzing urban centers is crucial for understanding urban structure, many studies on urban center extraction have been performed. In this paper, we propose an analysis framework that identifies urban centers from taxi trajectory data. The proposed approach differs from previous methods in that it models taxi trajectory data as a topographic surface. We extracted pick-up and drop-off spots from taxi trajectory data and employed the localized contour tree method to delineate the boundaries and hierarchies of urban centers. The experiments show that the proposed method can successfully detect urban centers and analyze their temporal patterns in different periods in Shanghai, China.


Introduction
Urban structure is defined as the spatial arrangement of different land uses in urban areas [1]. It is an important part of urban geography and has major effects on urban planning, economics, transportation and many other aspects of cities. Given the acceleration of urbanization in recent years, the study of urban structure is of great importance to government decision-making and policy implementation.
In 1925, Burgess proposed the first monocentric model to describe urban structure [2]. Since then, many researchers have studied urban structure, and most of them have concentrated on structure models, urban centers, land use, functional areas, urban density, etc. (detailed in Section 2.1).
Since remote sensing can capture a wide range of urban data, it has been used to measure urban density [19] and polycentricity [20]. In 2017, Chen et al. employed nighttime light (NTL) data to analyze urban structure using a localized contour tree method [18]. They successfully identified most of the centers specified in the master plan of Shanghai. In addition, they drew an inventive analogy between NTL intensity and a topographic surface. They then constructed a hierarchy of the detected urban centers and calculated the attributes of each center. Although this is an innovative and effective approach, using NTL data still has some limitations. First, the spatial and temporal resolutions of NTL data are not high. The NPP-VIIRS NTL data used by Chen et al. [18] have a resolution of only 500 m. Although the latest Luojia-1 data have a spatial resolution of 130 m, the revisit cycle is 3-5 days, which cannot capture details at finer spatial and temporal scales [21]. Second, NTL satellites only capture nighttime data, so they can only reflect activities at night; daytime activities cannot be inferred from NTL data. Third, the correlation between nighttime light and human activity is rather weak in certain parts of the urban area; for instance, streetlights and many other urban illuminations are independent of human flow. In sum, NTL data cannot fully reveal the dynamics of urban structure.
Urban centers and their functions are closely connected to the social and economic activities of a city. Because humans are the main actors in social and economic activities, employing data that are more closely related to human mobility, e.g., taxi trajectory data, to identify urban centers is more reasonable and reliable. Taxis account for a large proportion of urban traffic in most large cities and can make up as much as 50-60% of the traffic in certain key areas [22]. Taxi trajectory data provide precise information on citizens' preferred departure sites and destinations, and they are an important type of digital footprint for extracting human activities [23]. Furthermore, taxi trajectory data have the advantages of wide coverage, high sampling density, high positional accuracy, large volume and rich information [24,25]. Compared with NTL data, taxi data have flexible spatial and temporal resolutions, are more closely related to human activities and can reflect these activities throughout a 24-h period.
Previous studies using taxi trajectory data related to urban structure have mostly focused on human mobility [26][27][28][29][30][31] and on land use and functional regions [32][33][34][35][36][37] (detailed in Section 2.2). Notably, Hu et al. used kernel density estimation (KDE) to convert taxi data into a continuous surface, proposed an algorithm to extract geometric features from this surface and successfully detected and analyzed hotspots in Shanghai [38]. However, as in many previous studies, the specific boundaries of the detected hotspots were not delineated. Although researchers have delineated the boundaries of regions by using a clustering algorithm [30], census tracts [28] or predefined partitions [29,31,35-37], we did not find any research that used taxi data to analyze the hierarchy and attributes of urban centers.
Pick-up and drop-off spots extracted from taxi trajectory data have been shown to be related to urban land use [32,35]. Therefore, we hypothesized that taxi trajectory data could improve the identification of urban centers. Inspired by the work of Chen et al. [18], in this study, we modeled taxi data as a topographic surface and employed the localized contour tree method to identify urban centers, together with their boundaries, in Shanghai. In addition, we analyzed the hierarchy of the detected centers and calculated their attributes.
In this work, the following research questions (RQ) are addressed: RQ1: Can taxi trajectory data replace NTL data for the purpose of identifying urban centers? RQ2: What are the advantages of using taxi trajectory data in urban center identification? The remainder of this paper is organized as follows. A short review of previous research on urban structure is provided in the next section. We introduce our proposed methodology for identifying and analyzing urban centers in Section 3. In Section 4, we present our analysis results, followed by a discussion of our results and methods in Section 5. The conclusion of our work is given in Section 6.

Related Work
This section provides an overview of existing research approaches related to this work: urban structure research (Section 2.1) and taxi trajectory data research related to urban structure (Section 2.2).

Urban Structure Research
A city is a spatial combination of different land uses and functional areas. Initially, researchers proposed monocentric models to describe spatial distributions in a city. In 1925, Burgess introduced the concentric zone model [2], the first model to explain urban structure. Five concentric functional zones, namely, the central business district (CBD), the zone of transition, the zone of workers' homes, the residential area and the commuter zone, were defined to describe the urban structure of Chicago. This concentric model matched the pattern of American cities shaped by the Industrial Revolution [39]. However, some urban areas became more active and continuously grew and expanded. Therefore, in 1939, Hoyt proposed the sector model [40]. The sector model also has a single center, but some functional areas can extend from the CBD to the urban edge. Harris and Ullman developed the multiple nuclei model in 1945 [1]. According to this model, a city can contain several centers: the CBD no longer dominates as it did before, and several sub-centers emerge. Nevertheless, with urban development and population growth in recent decades, urban structure has become so complex that it cannot be described simply by any traditional model.
Scholars employ different kinds of data to discover structures within cities. Census data, such as employment data, population data and commuting flows, are the most popular and are often used in urban structure analysis. In 1987, McDonald suggested that gross employment density and the employment-population ratio were the best measures of employment sub-centers [3]. These measures were then applied to identify sub-centers in Chicago [3,4]. Giuliano and Small developed a method for identifying employment sub-centers in Los Angeles using journey-to-work data [5]. They defined a sub-center as a contiguous set of zones whose employment density and total employment exceed preset thresholds. They then analyzed the functions, distribution and commuting flows associated with the detected centers. Following their approach, numerous threshold-based works have been reported [6,7,9,10,12]. As noted by McMillen, the regression approach is less sensitive to the analysis unit [11]; therefore, he used a two-stage regression procedure to identify employment sub-centers in six cities. In contrast to the parametric regression model proposed by McDonald and Prather [8], McMillen's procedure is non-parametric, and its identification of employment sub-centers adapts better to the reality of urban structure [15]. In another work, McMillen proposed an algorithm that uses the standard properties of contiguity matrices to identify sub-centers [13]; this algorithm combines the regression and threshold methods. Further work by Redfearn used the topography of employment density obtained by non-parametric regression to detect both sub-centers and their areas of influence [14,15]. Census data are still commonly used in recent studies. Hu et al. detected the urban spatial structure using employment data and characterized the dynamic commuting patterns in Beijing, China [16]. Shen and Batty detected functional regions perceived by people with different occupations using disaggregated flow data and thus showed differences across occupations [17].
Instead of census data, 3D building models derived from remote sensing data were applied to measure the urban density [19] and analyze polycentricity [20] in four German city regions. Since nighttime light (NTL) data are receiving increasing attention, Chen et al. applied NTL data to delimit urban sub-centers and determine the topological relationships between them by adopting a graph-theory-based localized contour tree method [18]. In addition, as GPS is widely used, location-based data have become a research hotspot. Location-based social network data were used to identify city centers with precise boundaries [41] and distinguish urban functions [42]. Taxi trajectories are also popular location-based data for studying urban structure (detailed in Section 2.2).

Taxi Trajectory Data Research Related to Urban Structure
Taxi trajectory data record vehicle routes collected by GPS devices. Such data have significant spatial and temporal characteristics and are closely related to human mobility, socioeconomic activities and urban layout. Therefore, many studies have used taxi trajectory data to study human mobility, functional regions and land use.
In their work, Peng et al. analyzed taxi trips in Shanghai and found that people traveled for three main purposes on workdays: commuting between home and the workplace, traveling from workplace to workplace and traveling for other activities such as leisure [26]. Zhang et al. estimated origin-destination (OD) flows and mined their semantics based on periodic patterns [27]. In 2014, Scholz and Lu defined pick-up or drop-off spots as activity locations, and they proposed a method to detect the dynamic patterns of activity hotspots in a city [28]. After that, Tang et al. divided Harbin city into 400 transportation districts [29]. They clustered pick-up and drop-off locations using the DBSCAN algorithm and analyzed human mobility on weekdays and weekends. Employing road nodes to generate Thiessen polygons, Shen et al. analyzed the spatial distribution of pick-up and drop-off spots and used Moran's I index to explore the spatial autocorrelation of these regions, and hotspots were found by clustering [30]. Moreover, Kong et al. proposed a time-location relationship to find human mobility patterns in functional regions by analyzing pick-up and drop-off locations [31].
Since Zheng et al. used taxi trajectory data to detect flaws in urban planning in 2011 [43], many researchers have used taxi data to study urban structure. Combined with points of interest (POIs), trajectory data were used to identify the functions of regions partitioned by major roads [34]. The association between traffic patterns extracted from pick-up and drop-off spots and urban land uses in Shanghai has also been revealed [35]. In addition, Qi et al. demonstrated that the social function of an urban region could be characterized by the temporal variation in the number of pick-ups/drop-offs [32]. Pan et al. presented an improved DBSCAN algorithm for clustering pick-up and drop-off spots [33]; they then designed six features for the regions extracted by clustering and classified these regions into different land uses. Using three large taxi datasets from three cities, Zhao et al. proposed a new quad-tree region division method and then used association rules to identify four kinds of functional regions based on the human mobility patterns revealed by taxi trips [36]. In 2016, Tong et al. used Voronoi decomposition to divide the city into regions, which were then classified into different functional areas by incorporating pick-up and drop-off information [37].
Thus far, previous research has used pick-up and drop-off spots (origins and destinations) to analyze human mobility and functional areas, since these locations reflect people's travel preferences [26-29,31-33,36,37]. From this point of view, it is reasonable to expect that pick-up and drop-off hotspots extracted from taxi trajectory data can directly indicate urban structure and can therefore be used to detect urban centers.

Detecting Urban Centers by Using Taxi Trajectory Data
The main innovation of our proposed method is the application of a remote sensing analysis method to taxi trajectory data to extract results with finer details. Compared with NTL data, taxi trajectory data are more closely related to human mobility and activities, as mentioned in Section 1, so they can better reflect human activity intensity in urban areas. In our analysis framework, we built a statistical grid of pick-up/drop-off spots and regarded it as a topographic surface of human activity intensity. In this analogy, a center of human activity corresponds to a peak on the topographic surface. Since a peak is represented by concentric contour lines, an urban center can be detected by generating contour lines from the statistical surface (statistical grid). Based on this analogy, we applied a localized contour tree to our data to identify urban centers, analyze their hierarchy and calculate their attributes. Our approach consists of five steps: taxi trajectory data preprocessing, contour map generation, localized contour tree construction and simplification, urban center identification and attribute calculation. The complete analysis framework is presented in Figure 1.

Taxi Trajectory Data Preprocessing
Taxi trajectory data are sets of points recorded by GPS devices. Each point has several fields, including GPSID, timestamp, coordinates, speed, direction and passenger status. During basic data cleaning, null and out-of-range values were removed. The coordinates of the remaining records were then projected into a planar coordinate system.
As pick-up and drop-off locations reflect people's departure places and destinations, we chose them to represent human activities. The passenger status in the taxi trajectory data is 0 (vacant) or 1 (occupied). To extract pick-up and drop-off spots, we applied the following rule to the records of each GPSID sorted in chronological order: if the passenger status of two adjacent records changes from 0 to 1, the record with a passenger status of 0 is taken as a pick-up spot; conversely, if the passenger status changes from 1 to 0, the record with a passenger status of 1 is taken as a drop-off spot.
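The extraction rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (gps_id, t, x, y, state) and the function name extract_spots are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def extract_spots(records):
    """Return (pickups, dropoffs) following the 0->1 / 1->0 rule.

    records: iterable of (gps_id, t, x, y, state) tuples (assumed layout),
    where state is 0 (vacant) or 1 (occupied).
    """
    pickups, dropoffs = [], []
    # Sort so that records of the same taxi are consecutive and chronological.
    ordered = sorted(records, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        taxi = list(group)
        for prev, curr in zip(taxi, taxi[1:]):
            if prev[4] == 0 and curr[4] == 1:
                # 0 -> 1: the vacant record just before boarding is the pick-up spot.
                pickups.append(prev)
            elif prev[4] == 1 and curr[4] == 0:
                # 1 -> 0: the last occupied record is the drop-off spot.
                dropoffs.append(prev)
    return pickups, dropoffs
```

Sorting by GPSID and timestamp before grouping ensures that status changes are only detected between records of the same taxi.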
Following this rule, we extracted pick-up and drop-off spots from the taxi trajectory data. Then, with the cell size playing the role of spatial resolution, we overlaid a square grid on the study area and counted the total number of pick-up and drop-off spots in each grid cell. We then calculated the hourly average of this total in each cell, so that each cell has an attribute representing the human activity intensity within it. The resulting grid resembles a topographic surface with peaks.
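A minimal sketch of the grid statistics, assuming projected coordinates in meters and a grid origin at (x0, y0); the helper names cell_of and hourly_intensity and the origin parameters are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE = 500.0  # meters, the grid size used in this study


def cell_of(x, y, x0=0.0, y0=0.0, size=CELL_SIZE):
    """Map projected coordinates (meters) to a (column, row) grid index.

    The grid origin (x0, y0) is a hypothetical parameter.
    """
    return (math.floor((x - x0) / size), math.floor((y - y0) / size))


def hourly_intensity(spots, n_hours, x0=0.0, y0=0.0, size=CELL_SIZE):
    """Count pick-up and drop-off spots per cell, then divide by the number
    of hours in the period to obtain the hourly average per cell."""
    counts = Counter(cell_of(x, y, x0, y0, size) for x, y in spots)
    return {cell: n / n_hours for cell, n in counts.items()}
```

Cells without any spot are simply absent from the result and can be treated as zero intensity.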
In order to compare this study with the work of Chen et al. [18], we chose a grid size of 500 m, which is equivalent to the pixel size of their NTL data. Pick-up and drop-off spot statistics in different periods were also computed to analyze the temporal patterns of urban centers.

Contour Map Generation
As we treated the statistical grid as topography, we used it to generate contour lines. To smooth noisy data, we applied a 5 × 5 Gaussian filter with a sigma value of 1 to our statistical grid. The initial contour value was set to 1, and the contour interval was set to 1.
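The smoothing step can be illustrated with NumPy. The sketch below builds a normalized 5 × 5 Gaussian kernel with sigma = 1, applies it to the intensity grid and lists the contour values implied by a start value of 1 and an interval of 1. Border cells are handled by replicating edge values, which is an assumption because the boundary treatment is not specified; the function names are illustrative. Tracing the contour lines themselves would typically be left to a GIS tool or a marching-squares routine such as skimage.measure.find_contours, which is not reproduced here.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian kernel (5 x 5, sigma = 1 as in the text)."""
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def smooth(grid, size=5, sigma=1.0):
    """Filter the intensity grid with the Gaussian kernel.

    Borders are padded by replicating edge cells (an assumption). Because the
    kernel is symmetric, this sliding-window sum equals a convolution.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    padded = np.pad(grid.astype(float), half, mode="edge")
    rows, cols = grid.shape
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def contour_levels(grid, start=1.0, interval=1.0):
    """Contour values start, start + interval, ... up to the grid maximum."""
    top = float(grid.max())
    if top < start:
        return []
    n = int(np.floor((top - start) / interval)) + 1
    return [start + k * interval for k in range(n)]
```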
Since taxis mainly pick up and drop off passengers near entrances and exits, a center can be fragmented into small areas. However, areas that are too small cannot reasonably be considered centers. Given these considerations, we set the minimum area of a detected center to 1 square kilometer and therefore removed all closed contours enclosing an area of less than 1 square kilometer. We expect that all major urban centers can be detected with these preset parameters.
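The area filter can be sketched with the shoelace formula, assuming each closed contour is given as a list of (x, y) vertices in a projected, meter-based coordinate system such as UTM; the function names are illustrative.

```python
MIN_AREA_M2 = 1_000_000.0  # 1 square kilometer


def polygon_area(coords):
    """Shoelace formula for a closed ring of (x, y) vertices in meters.

    Works whether or not the first vertex is repeated at the end.
    """
    area2 = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
    return abs(area2) / 2.0


def drop_small_contours(contours, min_area=MIN_AREA_M2):
    """Keep only closed contours whose enclosed area is at least min_area."""
    return [c for c in contours if polygon_area(c) >= min_area]
```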

Localized Contour Tree Construction and Simplification
The contour tree was first used to store contour maps [44,45]. Later, in 1994, Kweon and Kanade built a contour map from a digital elevation model (DEM) and created a connectivity tree to extract topographic features [46], which they called a topographic change tree. The localized contour tree method was first described by Wu et al. to detect and delineate surface depressions [47,48]. They generated a contour tree from a LiDAR DEM and then applied a localized fast priority search algorithm over the contour tree to identify depressions.
A contour tree is composed of nodes and links. Nodes represent contour lines, while links represent topological inclusion relationships between adjacent contour lines. Following Wu et al. [47], the localized contour tree is constructed in a bottom-up manner. Sample contour lines are shown in Figure 2, and the corresponding contour trees are shown in Figure 3a. First, we identified seed contour lines. A seed contour line contains no other contour lines and is a leaf node in a contour tree. In Figure 2, contours g, h, i, j are seed contour lines; in Figure 3a, nodes g, h, i, j are leaf nodes. Starting from the seed contour lines, the rest of the contour trees can be constructed. If an enclosing contour line contains only one contour line or branch, it is considered to be at the same level as that contour line or branch. Otherwise, the enclosing contour line is placed at a higher level. As shown in Figures 2 and 3a, contours g, h, i, j are at level 1. Contours d, e, f each contain only one contour line, so they are also at level 1. However, contour c contains more than one branch, so it is at level 2. Similarly, contour a contains two branches, and contour b contains only one, so they are at level 3 and level 1, respectively. For each branch, the outermost contour line is the root node. Following these rules, the contour trees were constructed as shown in Figure 3a.
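The level rule can be expressed as a short recursive function. The nesting used in the test is hypothetical: it is one arrangement of contours a to j consistent with the description of Figure 2 (d, e and f each contain one contour, c and a contain two branches, b contains one), not a reproduction of the figure. We also read "a higher level" as one level above the highest child, which reproduces the levels stated for c and a; the function name is illustrative.

```python
def assign_levels(children, roots):
    """Assign contour levels bottom-up.

    children: dict mapping each contour to the contours it directly contains.
    A seed (leaf) contour is level 1; a contour with exactly one child shares
    that child's level; a contour with two or more branches is placed one
    level above its highest child (our reading of "a higher level").
    """
    levels = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            levels[node] = 1
        elif len(kids) == 1:
            levels[node] = visit(kids[0])
        else:
            levels[node] = max(visit(k) for k in kids) + 1
        return levels[node]

    for root in roots:
        visit(root)
    return levels
```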
To find the hierarchy of contour lines and detect urban centers, the contour trees were simplified. Adjacent contour lines at the same level were considered to be different locations of one peak, so we only kept the outermost one, i.e., the highest node of each same-level run in the contour tree. Therefore, as shown in Figure 3b, contours a, b, c, d, f, g are retained in the simplified contour tree.
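Simplification can be sketched as follows, assuming each contour's enclosing parent and its level are already known. A node is kept if it is a root or if its level differs from its parent's, and each kept node is attached to its nearest kept ancestor. The test reuses a hypothetical nesting consistent with the description of Figure 2; with it, the retained contours are a, b, c, d, f and g. The function name and input format are illustrative.

```python
def simplify(parent, levels):
    """Keep the outermost contour of each run of same-level contours.

    parent: dict mapping each contour to its enclosing contour (None for roots).
    levels: dict mapping each contour to its level.
    Returns (roots, simplified) where simplified maps kept nodes to kept children.
    """
    kept = {n for n, p in parent.items() if p is None or levels[p] != levels[n]}
    simplified = {n: [] for n in kept}
    roots = []
    for n in sorted(kept):
        # Walk up until the nearest retained ancestor (or the top of the tree).
        p = parent[n]
        while p is not None and p not in kept:
            p = parent[p]
        if p is None:
            roots.append(n)
        else:
            simplified[p].append(n)
    return roots, simplified
```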

Urban Center Identification
As mentioned above, we used the hourly average number of pick-up and drop-off spots to reflect human activity intensity. Urban centers are usually areas of high human activity intensity. Since contour lines connect locations of equal activity intensity, the closed contours around intensity peaks can be used to delineate urban centers.
ISPRS Int. J. Geo-Inf. 2021, 10, 220 7 of 26
The simplified contour tree represents the entire hierarchy of all contour lines. In Figure 3b, the simplified contour tree consists of two sub-trees, and the root node of each sub-tree is its outermost contour line. Some sub-trees have several branches, while others contain only one node, as shown in Figure 3b. In this paper, we regard each leaf node, including one-node sub-trees, as a monocentric urban center (MUC). Sub-trees that contain more than one leaf node or branch are considered polycentric urban centers (PUCs). The MUCs in a PUC are its sub-centers, and the largest sub-center is the main center of the PUC. For example, Figure 3 contains one MUC and one PUC with three sub-centers. We consider the main center of the largest contour tree to be the main center of the city, and the number of levels can serve as an indicator of the complexity of a PUC.
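The MUC/PUC classification can be sketched as a traversal of the simplified forest. The input format (a list of roots plus a child map) and the function name are assumptions; selecting the main center of a PUC would additionally require the area of each sub-center, which is omitted here.

```python
def classify_centers(roots, simplified):
    """Split a simplified contour forest into MUCs and PUCs.

    Each leaf of the simplified tree is a monocentric urban center. A sub-tree
    with a single leaf is reported as a stand-alone MUC; a sub-tree with more
    than one leaf is a PUC whose leaves are its sub-centers.
    """
    def leaves(node):
        kids = simplified.get(node, [])
        if not kids:
            return [node]
        return [leaf for k in kids for leaf in leaves(k)]

    mucs, pucs = [], {}
    for root in roots:
        found = leaves(root)
        if len(found) == 1:
            mucs.append(found[0])
        else:
            pucs[root] = found
    return mucs, pucs
```

With the hypothetical simplified tree in the test, g is a stand-alone MUC and a is a PUC with three sub-centers, mirroring the counts given for Figure 3.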

Calculation of Urban Center Attributes
To better understand the characteristics of urban centers, we calculated several attributes for each one. Since we used the hourly average number of pick-up and drop-off spots to represent human activity intensity, the minimum intensity (MinI), maximum intensity (MaxI), total intensity (TI), average intensity (AveI) and standard deviation of intensity (StdI) were calculated as statistical attributes of each MUC. In addition, the area (S), urban development orientation (Φ), compactness index (C) and elongatedness (E) [18] were calculated for MUCs as morphometric attributes to analyze their geometric characteristics [49,50]. Moreover, we used the slope of the topographic surface to calculate the gradient (G) of activity intensity, which describes how intensity changes from the core to the periphery of a center. For each PUC, we calculated a polycentric indicator (PI) to quantify polycentricity [51,52]. The detailed definitions of the attributes are listed in Table 1.
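The statistical attributes of a MUC reduce to summary statistics over the hourly intensities of its grid cells. A sketch, assuming the population standard deviation for StdI (Table 1 holds the authors' exact definitions); the function name is illustrative.

```python
import statistics


def intensity_attributes(values):
    """Statistical attributes of one MUC.

    values: hourly average pick-up/drop-off counts x_i of the n grid cells
    inside the center. StdI uses the population standard deviation (assumed).
    """
    n = len(values)
    total = sum(values)
    return {
        "MinI": min(values),
        "MaxI": max(values),
        "TI": total,
        "AveI": total / n,
        "StdI": statistics.pstdev(values),
    }
```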
In Table 1, $n$ is the number of grid cells in a center, and $x_i$ is the hourly average number of pick-up and drop-off spots in the $i$th grid cell. The compactness index (C) quantifies the compactness of a shape, and $p$ is the perimeter of the center boundary. A circle is the most compact shape in Euclidean space, and its C value is 1 [17]. Elongatedness (E) also indicates the compactness of a shape; Len and Wid are the length and width of the MBR (Minimum Bounding Rectangle) of the center. A center with a high C and a low E is regarded as compact; otherwise, it is non-compact. When calculating the polycentric indicator (PI), we consider all of the sub-centers in a PUC. $n_c$ is the number of sub-centers in a PUC. $Std_c$ is the standard deviation, and $\overline{AveI}$ is the average, of the AveI values of these sub-centers. $AveI_i$ is the AveI of the $i$th sub-center. $AveI_{max}$ is the maximum AveI of all sub-centers, and $Std_{max}$ is the standard deviation between $AveI_{max}$ and an ideal sub-center with an AveI of 0. PI ranges from 0 to 1: a PI of 0 indicates that the PUC has a dominant main center, while a PI of 1 indicates that all sub-centers have an equal impact on the PUC.
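Because Table 1 is not reproduced here, the following sketch uses common forms that agree with the properties stated above; they are assumptions, not the authors' formulas: C = 2√(πS)/p (equal to 1 for a circle), E = Len/Wid, and PI = 1 - Std_c/Std_max with population standard deviations. Under this form, PI is 0 when the spread of sub-center intensities equals that of the most unequal two-center case and 1 when all sub-centers share the same AveI.

```python
import math
import statistics


def compactness(area, perimeter):
    """Assumed form C = 2 * sqrt(pi * S) / p, which equals 1 for a circle."""
    return 2.0 * math.sqrt(math.pi * area) / perimeter


def elongatedness(length, width):
    """Assumed form E = Len / Wid of the minimum bounding rectangle."""
    return length / width


def polycentric_indicator(ave_intensities):
    """Assumed form PI = 1 - Std_c / Std_max.

    Std_c: population standard deviation of the sub-centers' AveI values.
    Std_max: population standard deviation of the pair (AveI_max, 0), i.e. the
    most unequal case. AveI_max is assumed to be positive.
    """
    std_c = statistics.pstdev(ave_intensities)
    std_max = statistics.pstdev([max(ave_intensities), 0.0])
    return 1.0 - std_c / std_max
```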

Case Study and Results
This section presents the results of our case study in Shanghai, China. We first introduce our study area and data, and then we present the results of the localized contour tree and urban center detection. Finally, we analyze the attributes and temporal patterns of urban centers in Shanghai.

Study Area and Data
Researchers have verified that the polycentric planning strategy is beneficial to urban economics in China [53]. Among Chinese metropolises, Shanghai has been demonstrated to be a polycentric city, which is consistent with its master plan [18,54]. Shanghai is located in eastern China and is an international center of economics, trade, science and technology. As shown in Figure 4, Shanghai has 16 districts (Huangpu, Xuhui, Changning, Jing'an, Putuo, Hongkou, Yangpu, Minhang, Baoshan, Jiading, Pudong, Jinshan, Songjiang, Qingpu, Fengxian and Chongming) with a total area of 6340.5 square kilometers, and its population exceeded 24 million in 2019 according to national statistical data [55]. Shanghai has three ring roads, which are connected by several roads and expressways. According to the Shanghai master plan, the central area of Shanghai lies within the outer ring road [56]. According to the Shanghai Annual Report on Urban Traffic Operation in 2018, the daily average number of passenger trips by cruising taxis was around 1.75 million, and that by online car-hailing was around 1.05 million. Considering the daily average number of passenger trips via underground transportation (around 10.16 million) and public buses (around 5.76 million), taxis accounted for more than one-tenth of public transportation trips in Shanghai in 2018.
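As a worked check of the share quoted above, assuming that "taxis" covers both cruising taxis and online car-hailing and that the four modes form the total (daily trips in millions):

```latex
\frac{1.75 + 1.05}{1.75 + 1.05 + 10.16 + 5.76} = \frac{2.80}{18.72} \approx 0.150
```

Counting cruising taxis alone against cruising taxis, underground transportation and buses gives 1.75/17.67 ≈ 0.099, so the "more than one-tenth" figure depends on including online car-hailing.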
In this case study, we used taxi trajectory data in Shanghai provided by a local company to verify our analysis framework. Our data include around 5500 taxis and cover an ordinary week in 2018 (from Monday 4th of June to Sunday 10th of June). Qiangsheng, one of the largest taxi companies, claimed to operate about 12,000 taxis in 2018, accounting for 25 percent of the taxis in Shanghai. Based on this information, our data cover about one-tenth of the taxis in Shanghai. The data include information on each taxi, and the GPS sampling interval is 10 s. Table 2 shows sample records from our taxi trajectory data. The field 'Time' ranges from 0 to 235,959 and represents the time of day in 24-h format. The field 'GPSID' is the unique identifier of each taxi. The fields 'Lon' and 'Lat' are the longitude and latitude of the taxi. The fields 'S' and 'D' are the instantaneous speed and direction of the taxi. The field 'State' indicates whether there are passengers in the taxi and has a value of 0 (vacant) or 1 (occupied).
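The 'Time' field can be decoded as shown below. We read the packed integer as HHMMSS, which is an inference from its 0 to 235,959 range rather than a documented format; the function names are illustrative. Values that do not decode to a valid time of day can be treated as out-of-range records during cleaning.

```python
def time_to_seconds(value):
    """Convert the 'Time' field (assumed HHMMSS packed into one integer,
    0-235959) to seconds after midnight."""
    hh, rest = divmod(int(value), 10000)
    mm, ss = divmod(rest, 100)
    if not (0 <= hh < 24 and 0 <= mm < 60 and 0 <= ss < 60):
        raise ValueError(f"invalid Time value: {value}")
    return hh * 3600 + mm * 60 + ss


def hour_of_day(value):
    """Hour bin (0-23) that can be used for hourly statistics."""
    return time_to_seconds(value) // 3600
```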
According to the Shanghai Annual Report on Urban Traffic Operation in 2018, cruising taxis carried around 1.75 million passenger trips per day on average, and online car-hailing carried around 1.05 million. Compared with the daily averages for underground transportation (around 10.16 million trips) and public buses (around 5.76 million trips), taxis accounted for more than one-tenth of public transportation in Shanghai in 2018.
We removed missing values and outliers from the raw data and converted the coordinates to a projected coordinate system, WGS 1984 UTM Zone 51N in this case study. The pick-up and drop-off spots were extracted following the method described in Section 3.1, and each pick-up or drop-off record was converted to a point in shapefile format. Square grids with 500 m sides covering the whole study area were then drawn, as shown in Figure 5, and the total number of pick-up and drop-off spots in each grid cell was counted. We repeated these steps for each of the seven days and separated the results into workday (Monday to Friday) and weekend (Saturday and Sunday) groups. For each group, we calculated the hourly average of the total number of spots in each grid cell. This yielded the workday grids and weekend grids, which represent human activity intensity on workdays and weekends in Shanghai, respectively. As an example, Figure 5 shows the statistical results for weekdays using graduated colors.
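The preprocessing steps above can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes each record has already been projected to metres and reduced to a hypothetical tuple (gpsid, time_hhmmss, x, y, state), and it assumes that a change of 'State' from 0 to 1 marks a pick-up and from 1 to 0 a drop-off; this is a common convention but may differ from the exact rule in Section 3.1. The grid origin (x0, y0) and all function names are likewise hypothetical.

```python
from collections import Counter
from datetime import date

CELL_SIZE = 500.0  # grid side length in metres, as in the text


def hhmmss_to_seconds(t):
    """Decode the integer 'Time' field (HHMMSS, e.g. 235959) to seconds after midnight."""
    hours, rest = divmod(int(t), 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def extract_spots(records):
    """Return (kind, x, y) for every change of 'State' of every taxi.

    Assumed record layout: (gpsid, time_hhmmss, x, y, state), x/y in metres.
    Assumed rule: 0 -> 1 is a pick-up, 1 -> 0 is a drop-off.
    """
    spots = []
    last_state = {}
    ordered = sorted(records, key=lambda r: (r[0], hhmmss_to_seconds(r[1])))
    for gpsid, _t, x, y, state in ordered:
        prev = last_state.get(gpsid)
        if prev is not None and prev != state:
            spots.append(("pickup" if state == 1 else "dropoff", x, y))
        last_state[gpsid] = state
    return spots


def grid_counts(spots, x0, y0, cell=CELL_SIZE):
    """Count spots per square cell, keyed by (row, col) from a hypothetical origin (x0, y0)."""
    counts = Counter()
    for _kind, x, y in spots:
        counts[(int((y - y0) // cell), int((x - x0) // cell))] += 1
    return counts


def hourly_average(daily_counts, hours_per_day=24):
    """Average number of spots per hour in each cell over a group of days."""
    total = Counter()
    for counts in daily_counts:
        total.update(counts)
    n_hours = len(daily_counts) * hours_per_day
    return {cell: n / n_hours for cell, n in total.items()}


def is_workday(d):
    """Monday (0) to Friday (4) are workdays; Saturday and Sunday are weekend days."""
    return d.weekday() < 5
```

Calling grid_counts once per day and passing the five workday tables (or the two weekend tables) to hourly_average gives one workday (or weekend) grid; is_workday can route each day to its group.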

Identification and Analysis of Urban Centers
We generated contour lines from the resulting statistical grids. In this case study, we smoothed the noisy data with a 5 × 5 Gaussian filter (sigma = 1) and set both the initial contour value and the contour interval to 1. A contour value of 1 means that a grid cell has an average of 1 pick-up/drop-off spot per hour; similarly, a contour value of 60 means an average of 60 pick-up/drop-off spots per hour. Figure 6 shows the contour map of the weekday grids as an example. Because the contour lines are dense in some areas, the outlined area is enlarged in the lower-left corner of the figure. Contour values increase as the color changes from yellow to blue, and the contour lines are much denser where the contour values are high.
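A minimal NumPy sketch of the smoothing step and the choice of contour values follows. The edge-replication padding at the grid border is an assumption, since the text does not state how borders are handled, and the function names are hypothetical. Tracing the contour lines themselves is left to a library; for example, skimage.measure.find_contours(image, level) traces iso-lines at a given level.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Normalised size x size Gaussian kernel (5 x 5, sigma = 1 in the text)."""
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(grid, size=5, sigma=1.0):
    """Convolve a 2-D grid with the Gaussian kernel, replicating edge values as padding."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    padded = np.pad(np.asarray(grid, dtype=float), half, mode="edge")
    rows, cols = np.shape(grid)
    out = np.zeros((rows, cols))
    # The kernel is symmetric, so this shifted weighted sum equals a convolution.
    for di in range(size):
        for dj in range(size):
            out += kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


def contour_values(smoothed, start=1.0, interval=1.0):
    """Contour values start, start + interval, ... up to the surface maximum."""
    top = float(np.max(smoothed))
    if top < start:
        return []
    # Small tolerance so a maximum of 2.9999999999 still yields the level 3.
    count = int(np.floor((top - start) / interval + 1e-9)) + 1
    return [start + k * interval for k in range(count)]
```

If SciPy is available, scipy.ndimage.gaussian_filter with sigma=1 and truncate=2.0 uses the same 5 × 5 support, although its default boundary mode ('reflect') differs from the edge replication used here.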
As mentioned in Section 3.2, we removed closed contour lines with areas less than 1 km² to eliminate potential centers that were too small. We then applied the localized contour tree method to generate contour trees. Taking the simplified contour tree for weekdays as an example, we obtained 30 sub-trees with 76 leaf nodes in total, as shown in Figure 7a. Of these leaf nodes, 27 formed one-node sub-trees (yellow features in Figure 7a), and the others belonged to three complex sub-trees (blue features in Figure 7a). One complex sub-tree had two levels, and another had three levels. The largest sub-tree had 11 levels and covered most of Shanghai's central area (within the outer ring road), as shown in Figure 7b.
Because the largest sub-tree is too complex to show in detail (Figure 7b) and space is limited, we use the sub-tree in Fengxian District to illustrate the tree construction (Figure 7c). This sub-tree is a 3-level contour tree. Leaf nodes 2, 3 and 4 at level 1 are three MUCs. Node 1 contains nodes 3 and 4 and represents a PUC with two sub-centers at level 2. Node 0 is the root of this small sub-tree and represents a more complex PUC than node 1. Although nodes 2, 3 and 4 are all at level 1, Figure 7c shows that nodes 3 and 4 are more closely related because both belong to node 1. On the map, we found a canal between nodes 1 and 2, which shows that the extracted hierarchy is consistent with the actual situation. The hierarchy within the detected PUCs was analyzed following the same rules.
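The tree construction can be illustrated with a simplified Python sketch. It is not the full localized contour tree algorithm of [18]. It assumes each closed contour is given as a polygon ring in projected metres together with its contour value, that contours from the same surface never cross (so testing one vertex suffices for containment), and it links each contour to the highest-valued enclosing contour. Leaves are reported as candidate MUCs and nodes with two or more children as candidate PUCs; the MUC boundary is taken as the outermost contour that still encloses only that peak. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Contour:
    cid: int
    level: float
    ring: list  # closed ring [(x, y), ...] in projected metres
    children: list = field(default_factory=list, repr=False)
    parent: object = field(default=None, repr=False)


def ring_area(ring):
    """Polygon area by the shoelace formula (square metres for metre coordinates)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def point_in_ring(pt, ring):
    """Even-odd ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_forest(contours, min_area=1_000_000.0):
    """Drop rings under min_area (1 km^2), link each ring to its parent, return the roots."""
    kept = [c for c in contours if ring_area(c.ring) >= min_area]
    for c in kept:
        enclosing = [p for p in kept
                     if p.level < c.level and point_in_ring(c.ring[0], p.ring)]
        if enclosing:
            parent = max(enclosing, key=lambda p: p.level)
            c.parent = parent
            parent.children.append(c)
    return [c for c in kept if c.parent is None]


def leaves(node):
    """Peaks of the surface: candidate MUCs."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def branching_nodes(node):
    """Nodes with two or more children: candidate PUCs."""
    found = [node] if len(node.children) >= 2 else []
    for child in node.children:
        found.extend(branching_nodes(child))
    return found


def muc_boundary(leaf):
    """Walk up while the parent encloses only this branch; return the outermost such contour."""
    node = leaf
    while node.parent is not None and len(node.parent.children) == 1:
        node = node.parent
    return node


def square(cx, cy, half):
    """Axis-aligned square ring, used only for the toy example."""
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]


# Toy surface (made-up geometry): one level-1 ring enclosing two peaks,
# plus a 0.36 km^2 ring that the area filter removes.
toy = [
    Contour(0, 1.0, square(0, 0, 3000)),
    Contour(1, 2.0, square(-1500, 0, 1000)),
    Contour(2, 3.0, square(-1500, 0, 600)),
    Contour(3, 2.0, square(1500, 0, 1000)),
    Contour(4, 2.0, square(0, 2500, 300)),
]
roots = build_forest(toy)
```

In this toy surface, the left peak's MUC boundary is its level-2 ring rather than its level-3 ring, because the level-2 ring still encloses only that one peak.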
To verify the accuracy of the proposed method, we compared our results with those of Chen et al. [18] and with the Shanghai City Master Plan (2017-2035) [56]. The plan describes Shanghai's four-level system of public activity centers: (a) the main center (central activity area); (b) the sub-center; (c) the district center; and (d) the community center. Table 3 lists all MUCs detected on weekdays, the locations they cover and the corresponding planned activity centers in Shanghai.
As Table 3 shows, most of the 33 centers identified by Chen et al. [18] from NTL data are included in our results. Because the centers obtained from taxi data are more detailed, several MUCs commonly correspond to one NTL center. For example, the main center detected from NTL data was defined as the center of Lujiazui, People's Square and Shanghai Railway Station, whereas six of our MUCs cover this area, including these landmarks.
Compared with Shanghai's four-level system of public activity centers, our results cover most of the key locations in the planned central activity area (marked "a" in Table 3) and include 9 of the 16 planned sub-centers (marked "b" in Table 3) and 19 district centers (marked "c" in Table 3).
Most of the undetected centers listed in the two references are in suburban areas, such as Jiading District, Qingpu District, the northern part of Baoshan District, Jinshan District, the southern part of Pudong New District and Chongming District. These areas lie outside Shanghai's outer ring road and are economically underdeveloped, industrial or newly planned development zones. Human activity intensity is low there, and taxis are not the most economical mode of transportation in remote areas. Since taxi drivers can choose their operating range independently, they may prefer to avoid these areas.
Furthermore, we detected some centers that are not mentioned in one or both references, as listed in Table 3. For example, Shanghai Disney Resort (MUC 16), Jing'an Temple (MUC 73), Yangpu Riverside (MUC 60), Fengxian (MUCs 71, 75 and 76) and several towns are not included in the NTL results. Shanghai Disney Resort is an amusement park that closes around 8:30 pm, so it is not illuminated for long at night and cannot be captured by NTL data. Taxi data, which capture human activity over the full 24 h, can identify such centers.
Some small towns, as well as Shanghai Disney Resort (MUC 16), Hongqiao Railway Station and Hongqiao International Airport (MUCs 39 and 40) and Pudong International Airport (MUCs 12 and 17), are not mentioned by name in the first three levels of Shanghai's public activity center system. Although Shanghai Disney Resort is not explicitly named in the Shanghai City Master Plan, it lies near Chuansha New Town, which is planned as an activity center. Because railway stations and airports are traffic hubs, we consider them urban centers with transportation functions. Moreover, some MUCs do not cover any distinct landmark; they either include residential areas and small hospitals, such as the Pediatric Hospital of Fudan University (MUC 49), or surround major roads, such as the outer ring highway (MUC 15). All of these are destinations or routes that people use frequently in their daily activities.
Overall, except in suburban areas, our results are reliable, and most urban centers are detected in more detail than with NTL data.

Attributes of Urban Centers
The five statistical attributes and the areas of MUCs on weekdays are illustrated by box plots in Figure 9a. Additionally, MUC 32 covers a large region that encompasses residential areas in Xinzhuang Town, and MUC 54 covers residential areas and a section of the north-south viaduct.
Among the three PUCs on weekdays, PUC C has an area of 511.04 km² and covers most of the area inside the outer ring. According to the statistical data of Shanghai, the central area within the outer ring is around 660 km² [56]. Therefore, PUC C covers around 77% of the central area of Shanghai.
For the other four morphometric attributes, the Compactness Index (C) and Elongatedness (E) together reflect the compactness of MUCs: a large C value combined with a small E value indicates a relatively compact shape. As shown in Figure 10a, most MUCs on weekdays are not very compact, and the most compact MUC on weekdays is MUC 17 (Pudong International Airport). The Urban Development Orientation (Φ) reflects the development direction of MUCs. Figure 11a shows that most urban centers develop along roads or around intersections, as typified by MUCs 15 and 48. Furthermore, the shape and development direction of an urban center are related to the shape of its buildings (e.g., MUC 39, Hongqiao Railway Station) or of the town or residential area it covers (e.g., MUC 22, Shachuan New Town). The Intensity Gradient (G) reflects the change in activity intensity within each urban center. Figure 11a also shows that G is relatively low in most urban centers, indicating that activity intensity does not vary much inside them. The three transportation centers (MUC 70 (Shanghai Railway Station), MUCs 39 and 40 (Hongqiao Airport and Hongqiao Railway Station) and MUC 17 (Pudong International Airport)), some commercial centers (e.g., MUCs 71, 75 and 76 (Fengxian)) and financial centers (e.g., MUC 68 (Lujiazui)) have high G values, which means that activity intensity changes sharply within these centers.
The Polycentric Indicator (PI) was calculated for the three PUCs on weekdays. PUC A has a PI value of 0.66, and the largest, PUC C, has the lowest PI value of 0.47. We believe that the polycentricity of PUC C is affected by certain MUCs with high intensity. PUC B has a high PI value of 0.85, which means that its three MUCs (MUCs 41, 42 and 43) have a nearly equal impact.

Temporal Patterns of Urban Centers
We analyzed the urban centers detected on weekends and compared them with those on weekdays. Figure 12a illustrates the urban centers on weekends; compared with the weekday centers (Figure 12a), their overall extent does not change significantly. Table 4 shows basic information on the weekday and weekend urban centers. There are 34 contour trees on weekends (more than on weekdays), but the highest level of the largest contour tree is 10 (less than on weekdays), which means that the largest contour tree on weekends is less complex than that on weekdays. There are fewer MUCs on weekends than on weekdays, but the number of PUCs remains the same.
However, PUCs A' and C' on weekends are smaller than PUCs A and C on weekdays. In addition, PUC B' on weekends lies within the area of PUC C on weekdays, and PUC B on weekdays becomes a smaller MUC on weekends (MUC 24, Zhangjiang Hi-tech Park, in Figure 12a). Many MUCs on weekends have almost the same positions and boundaries as those on weekdays, for example, MUCs 18 and 13 (Pudong International Airport) on weekends, which correspond to MUCs 17 and 12 on weekdays. These MUCs represent urban centers, such as transportation centers, that do not change between weekdays and weekends. Other MUCs on weekdays disappear on weekends (e.g., MUCs 14, 15, 19, 24, 29, 56, 57, 59, 62 and 68 on weekdays), and some MUCs emerge on weekends that do not exist on weekdays (e.g., MUCs 3, 15, 19, 26, 31, 32, 34, 44, 57, 59 and 62 on weekends). Lujiazui, Changfeng Ecological Business District and Hongqiao Development Zone (MUCs 68, 59 and 65 on weekdays) all disappear on weekends, as do some residential areas, such as Qibao Town (MUC 34 on weekdays). The centers that disappear on weekends are mostly employment centers (e.g., Lujiazui) and some residential areas.
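One way to make the weekday/weekend comparison above reproducible is to match centers by boundary overlap. The sketch below is a hypothetical illustration, not the procedure used in this study: each MUC is represented by the set of 500 m grid cells it covers, two MUCs are treated as the same center when their Jaccard overlap reaches a threshold (0.5 here, an arbitrary choice), and unmatched MUCs are reported as disappeared or emerged.

```python
def jaccard(a, b):
    """Intersection over union of the grid cells covered by two centers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_periods(first, second, threshold=0.5):
    """Match MUCs between two periods by cell overlap.

    first, second: dict mapping MUC id -> set of (row, col) grid cells.
    Returns (matches, disappeared, emerged). Matching is greedy per MUC in
    `first`, so two MUCs in `first` may map to the same MUC in `second`.
    """
    matches = {}
    for fid, cells in first.items():
        best_id, best_score = None, 0.0
        for sid, other in second.items():
            score = jaccard(cells, other)
            if score > best_score:
                best_id, best_score = sid, score
        if best_id is not None and best_score >= threshold:
            matches[fid] = best_id
    disappeared = sorted(set(first) - set(matches))
    emerged = sorted(set(second) - set(matches.values()))
    return matches, disappeared, emerged


# Toy example with made-up ids and cells (not data from this study).
weekday = {1: {(0, 0), (0, 1)}, 2: {(5, 5)}}
weekend = {10: {(0, 0), (0, 1), (1, 1)}, 11: {(9, 9)}}
matches, gone, new = compare_periods(weekday, weekend)
```

A one-to-one assignment (for example, resolving conflicts by highest overlap first) would be a natural refinement when a center splits or merges between periods.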
Three urban centers that are planned in the Shanghai City Master Plan (2017-2035) but not detected on weekdays are delineated on weekends: Zhenru, Luxun Park and Sichuan North Road, and Gaoqiao Town (MUCs 44, 62 and 32 on weekends). The new weekend centers are mostly residential areas (e.g., Gaoqiao Town) and parks (e.g., Luxun Park). Some urban centers with commercial and leisure functions, for instance, People's Square, Shanghai Government, The Bund, Nanjing Road, Jing'an Temple and Jing'an Park (MUCs 69 and 70 on weekdays), are larger on weekends than on weekdays. The most prominent one, MUC 64 on weekends, extends from Xujiahui to Zhongshan Park. We attribute this to a significant decrease in commuting demand and an increase in leisure activities on weekends.
As shown in Figure 9a, the statistical attributes of MUCs on weekends mostly fall in a lower range than those on weekdays. However, the total intensity of all MUCs on weekends (3953.94) is higher than that on weekdays (3797.13). Some MUCs keep the same locations and boundaries on weekends as on weekdays but have higher activity intensity (e.g., People's Square). Moreover, the total area of MUCs and PUCs is larger on weekends than on weekdays. For the morphometric attributes, the most compact center on weekends is still Pudong International Airport, and MUCs with high G values on weekdays have even higher G values on weekends. Overall, the distributions of the statistical attributes show that many urban centers are less active on weekends, whereas several vital centers, such as transportation and commercial centers, are even more active than on weekdays. This further reflects the increased demand for leisure and entertainment on weekends.
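To quantify "higher", the relative difference in total MUC intensity between weekends and weekdays can be computed directly from the two totals reported above:

```latex
\[
\frac{3953.94 - 3797.13}{3797.13} = \frac{156.81}{3797.13} \approx 0.041
\]
```

That is, total weekend intensity is roughly 4% higher than total weekday intensity.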
To extract finer temporal patterns, we analyzed urban centers on Monday and Sunday between 7:00 and 10:00, the typical morning commuting period. The detected urban centers on Monday and Sunday are shown in Figure 12b. The centers clearly shrink on Sunday morning compared with Monday, and the number and total area of MUCs decrease markedly, as shown in Table 5. Many MUCs present on Monday morning disappear on Sunday morning, and most of those that remain shrink. The MUCs that disappear on Sunday morning are mostly residential areas, towns, major roads, intersections and viaducts. Some new MUCs also appear on Sunday morning, for instance, Zhenru (MUC 33), Zhongshan Park (MUC 67), Shanghai Botanical Garden (MUC 54) and Shanghai World Expo Park (MUC 55), which shows that people tend to travel to leisure and entertainment destinations on Sunday. The statistical and morphometric attributes of MUCs on Monday and Sunday morning are illustrated in Figures 9b, 10b and 11c,d. The statistical attributes have higher mean values on Monday than on Sunday. The total activity intensity on Monday morning (5061.33) is much higher than that on Sunday morning (3526), which means that Shanghai is much more active on Monday morning. The compactness, urban development orientation and intensity gradient of the centers do not vary much between Monday and Sunday. Overall, commuting demand produces more centers on Monday morning, whereas vital urban centers, such as transportation centers, are largely unaffected by the time period.
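The size of the Monday/Sunday gap follows from the ratio of the two reported morning totals:

```latex
\[
\frac{5061.33}{3526} \approx 1.435
\]
```

Total activity intensity on Monday morning is therefore about 43.5% higher than on Sunday morning.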

Discussion
As the case study in Section 4 shows, applying our analysis framework (Section 3) successfully detected and analyzed the urban centers of Shanghai. We focused on the attributes and locations of monocentric urban centers (MUCs) and also revealed the hierarchy and polycentricity of polycentric urban centers (PUCs). There were fewer urban centers on weekends than on weekdays, but their total area was larger and their total activity intensity higher. In addition, we detected urban centers during the morning commuting period (7:00-10:00) on Monday and Sunday. Both the number and the area of centers were greater on Monday than on Sunday, which is consistent with weekday commuting patterns. The results also suggest that major urban centers are largely unaffected by temporal factors.
We applied the proposed analysis framework to taxi trajectory data to detect urban centers. The detected urban centers were mostly small but large enough to cover key hotspots, which makes the approach suitable for analyzing urban centers at fine scales. Our results covered more than half of the 33 urban centers detected from nighttime light (NTL) data [18], as listed in Table 3. We identified a greater number of centers; for instance, 76 MUCs were detected on weekdays. Most of the delineated centers were relatively small compared with the NTL results, and several of our centers often corresponded to one NTL center, so our results were more detailed. Some of our detected centers, such as Fengxian New City in Fengxian District and several residential areas, were not included in the NTL results, as explained in Section 4.2 and listed in Table 3.
The centers that we did not identify were all located in the suburban areas of Shanghai or along the coastline, such as Jiading District, Qingpu District and the northern part of Baoshan District. We attribute this omission to the scarcity of taxis in suburban areas. Since taxi drivers can choose their operating areas independently, they naturally choose areas with high human activity intensity so that they can serve more passengers. This, in turn, raises taxi numbers in these areas, making them more likely to be detected as urban centers.
Our results demonstrate the advantages of taxi trajectory data: flexible resolution, high sampling frequency and rich semantic information. First, we used the average number of pick-up/drop-off spots over a week to extract urban centers, which reduces deviations caused by special events on a particular day; averaging over a longer period yields more generalizable results. Second, some urban centers were found to be time-dependent. For example, Zhenru (MUC 44 on weekends in Figure 12a and MUC 33 on Sunday in Figure 12b) was detected only on weekends and on Sunday morning. In addition, the results for Monday and Sunday morning show that most urban centers, except for some major centers such as transportation centers, are strongly affected by time. Therefore, dynamic urban centers can be analyzed by dividing taxi trajectory data into finer time ranges, which is not possible with NTL data.
However, there are still limitations in our method. This paper is based on the premise that our taxi data can represent all taxis in Shanghai and that taxis travel to every corner of the city. Unfortunately, as shown in our results, taxis are distributed unevenly and cannot cover the city completely. The taxis tend to be sparse in suburban areas. This is why we detected few centers in suburban areas. In addition, taxis are limited by road networks and buildings, so some places are inaccessible to them. Thus, further work should be conducted to address these limitations.

Conclusions and Future Work
In this paper, we propose an analysis framework to detect urban centers and their boundaries from taxi trajectory data. We treated the intensity of taxi pick-ups and drop-offs derived from the trajectory data as a topographic surface and generated contour maps from it. We then employed the localized contour tree method to delineate the boundaries of urban centers and analyze the hierarchy within polycentric urban centers. We calculated the attributes of the detected urban centers and compared the results between different periods. By combining a method developed for remote sensing data with taxi trajectory data, we provide a new perspective for analyzing urban centers.
The work in this paper provides an affirmative answer to RQ1 raised in the Introduction: taxi trajectory data can serve as an alternative to NTL data for detecting urban centers. We identified 76 monocentric urban centers (MUCs) on weekdays, most of which agree with Shanghai's actual situation and the master plan, as illustrated in Section 4.2. The results are more detailed than those obtained from NTL data, and we applied our method to different time periods to analyze the temporal patterns of urban centers. Although urban centers in suburban areas were rarely detected because of the limitations of taxi trajectory data, as discussed in Section 5, the results are still reliable and representative.
In answer to RQ2, taxi trajectory data offer clear advantages for urban center detection. The topographic surface derived from taxi trajectory data has a flexible spatial and temporal resolution. We used grids with 500 m sides in this analysis, but in principle our analysis framework can be applied with grids of any side length. In addition, taxi trajectory data allow urban centers to be analyzed in more detail, as detected centers with small areas are more likely to correspond to actual landmarks. Moreover, the detection of urban centers is not limited to a particular time of day, since taxi trajectory data reflect human activity throughout the 24-h period; urban centers can therefore be extracted at finer temporal scales. Finally, owing to the rich semantic information in taxi trajectory data, different statistical parameters can be designed for different purposes, yielding different simulated topographic surfaces from which centers with different meanings can be extracted.
Several directions remain for future work. The effects of the limitations of taxi trajectory data can be reduced by adding more taxis, using data from a longer period and including data on other forms of transportation. Beyond addressing these limitations, dynamic urban centers could be analyzed by applying our analysis framework over shorter periods. Urban functional areas could also be detected directly from the spatiotemporal variation in urban centers, and the degree to which the city master plan has been implemented could be evaluated by analyzing urban centers in different years. Overall, the proposed analysis framework can be applied to study urban structure, urban transportation and many other aspects of cities.