Identification of Urban Functional Regions in Chengdu Based on Taxi Trajectory Time Series Data

Abstract: Overall scientific planning of urbanization layout is an important component of land spatial planning policy in the new period. Defining the main functions of different spaces and dividing urban functional areas are of great significance for optimizing the land development pattern. This article identifies and analyses urban functional areas from the perspective of data mining, and the results are consistent with the actual situation. Representative taxi trajectory data are selected as the research basis for identifying urban functional areas. First, based on trajectory data from Didi Chuxing within the beltway surrounding Chengdu, we generated trajectory time series data and used the dynamic time warping (DTW) algorithm to generate a time series similarity matrix. Second, we utilized the K-medoid clustering algorithm to generate preliminary results of land clustering and selected the results with high classification accuracy as the training samples. Then, the k-nearest neighbour (KNN) classification algorithm based on DTW was performed to classify and identify the urban functional areas. Finally, with the help of point-of-interest (POI) auxiliary analysis, the final functional layout in Chengdu was obtained. The results show that the spatial structure of Chengdu is complex and that the urban functions are interlaced, but there are still rules that are followed. Moreover, traffic volume and inflow data reflect the travel patterns of residents better than simple pickup and drop-off counts. The original DTW calculation has high time complexity, and its cost can be reduced by normalizing the time series and reducing their dimensionality. The semi-supervised learning classification method is also applicable to trajectory data, and it is best to select training samples from the results of unsupervised learning.
This method can provide a theoretical basis for urban land planning and has auxiliary and guiding value for urbanization layout in the context of land spatial planning policies in the new era.


Introduction
An urban functional area is a product of nature, society and the economy that enables a city to simultaneously support a variety of human activities, such as work, life, dining and entertainment [1]. Overall scientific planning for urbanization is an important part of the national spatial planning policy in the new era. Determining and dividing the main functions of different spaces is the basis for determining the direction of urban land development, which is of great significance for standardizing the process and optimizing the patterns of national land development.
The distribution of urban spatial structures and functions has consistently been an important topic in urban research. Traditional studies on urban spatial structure are based on the analysis of the interaction between urban functions and social processes [2]. With the development of remote sensing technology, the use of remote sensing images to detect land-use change has become an important means to study urban functions [3][4][5]. Many scholars have applied data with location information to the study of urban space. Ahas et al. [6] first proposed that mobile phone location data could be applied to the study of urban spatial structure. Subsequently, Ratti et al. [7], building on the work of Ahas, analysed the spatial and temporal changes of urban activities by applying heat maps to mobile phone data, opening a research field in which mobile positioning data support a large-sample, large-range and dynamic understanding of urban systems.
In recent years, the information age has deepened, and the analysis of urban functional areas from the perspective of location-based service (LBS) data mining has become mainstream. Various types of social media data often contain location information. Cranshaw [8] introduced a clustering model and research methodology for studying the structure and composition of a city on a large scale based on the social media its residents generate. Kling et al. [9] explored the use of textual and event-based citizen-generated data from services such as Twitter and Foursquare to study urban dynamics. Steiger [10] introduced the utilization of social media data to investigate urban environments. Dong [11] extracted population flow information from WeChat data to explain the urban space.
With the emergence of smartphones with positioning functions, the location information contained in mobile phone data represents the law of human activities. Therefore, many scholars use mobile phone data to reveal the laws related to urban functions. Phithakkitnukoon et al. [12] studied the travel patterns of mobile phone users in Japan using mobile data. Victor Soto et al. [13] designed an automatic land use identification system based on the signal generated by the cell phone base station network. Becker R.A. et al. [14] analysed the population flow in New York City and its suburbs by using cellular data (CDR).
As spatiotemporal trajectory data become increasingly easy to obtain, scholars have studied user behaviour patterns, traffic volume, regional population density, occupation and housing distribution and other issues by analysing the trajectory and then drawing conclusions related to the urban functional structure. For example, Brockman [15] used travel bugs to understand human mobility patterns. Doyle [16] used cell phone data combined with the Markov chain to analyse population density and thus to draw conclusions about the behaviour of urban residents. Yuan et al. [17] proposed a framework named DRoF (Discover Regions of different Functions) to divide Beijing into nine functional areas using this framework.
With regard to tracking data, public transportation data play an important role in expressing residents′ behavioural activities. Therefore, this type of data has become a practical tool to extract residents′ activity rules and summarize urban functions. Sun et al. [18] used smart card data to extract the space-time density of human activities and the track of trains. Zhong et al. [19] proposed a method to infer urban functions at the building level using transportation data obtained from surveys and smart card systems. Han et al. [20] used bus card swipe data to extract the rules of public transportation use by residents and then identified the functional areas of Beijing. Point-of-interest (POI) data are often used as auxiliary interpretation data for urban function analysis in scholars′ studies [21].
As the main mode of transport for citizens, taxi travel largely reflects the spatial and temporal patterns of urban population movement, so taxi trajectory data are widely applied to study urban functional areas. Liu [22] analysed the global spatial-temporal pattern of trips and explored urban land use with GPS-enabled taxi data. Pan et al. [23] used time series extracted from taxi track data to classify urban functional areas. Chen et al. [24] used GPS data of floating vehicles to identify functional areas of Guangzhou from the perspective of semantic analysis. However, previous studies on the temporal characteristics of mined data are not sufficient. In addition, the direct clustering method, namely, unsupervised learning, was used in most time series mining studies, and there may be many inaccuracies in category definitions in unsupervised learning because of the lack of accurate data labels for the training sample.
Therefore, this paper adopts a semi-supervised learning method that combines clustering and classification to perform data mining. First, K-medoid clustering is conducted based on the similarity of time series data, which is calculated by the dynamic time warping (DTW) algorithm. Accurate results are selected as training samples. Then, k-nearest neighbour (KNN) classification is performed on similar time series data based on the training samples generated in the previous step, and accurate classification results are obtained. Finally, POI analysis is used to analyse the specific functions of various functional areas. This paper provides a new way of thinking about the application of trajectory data mining in the identification of urban function. In the context of national spatial planning policies in the new era, these results have reference value for the systematic understanding of urban spatial structure and decision support for the scientific planning of urbanization layout.

Study Area
Chengdu, the capital of Sichuan Province, is a crucial national high-tech industrial base and an important central city in the western region. According to the official website of the People′s Government of Chengdu (http://www.chengdu.gov.cn/), by 2018, Chengdu covered an area of 14,335 square kilometres and had a permanent population of 16.33 million and a GDP of 1534.277 billion yuan. The research area of this paper is within the beltway of Chengdu, including the urban land of Qingyang District, Jinniu District, Chenghua District, Jinjiang District, Wuhou District and other districts and counties. As a modern metropolis, Chengdu has a complicated urban land and spatial structure, and various urban functions are staggered, but there are still rules to follow. This paper studies the complex distribution of functions in Chengdu from the perspective of big data mining. The study area is shown in Figure 1.

Study Data and Preprocessing
The main data used in this paper are Chengdu road network data, vehicle trajectory data and POI data. The unified coordinate system is CGCS2000_3_Degree_GK_CM_105E.
Chengdu road network data, which were collected from Gaode Map, were used to generate traffic analysis zones (TAZs). The TAZ is a useful tool for analysing complex urban traffic networks because there are similar characteristics and strong correlations of traffic in the same TAZ [8]. Therefore, the TAZ was taken as the basic spatial unit in the analysis of urban functional structure in this paper. After removing unnecessary details (overpasses, roundabouts, etc.), five grades of roads (highways, urban expressways, national highways, provincial highways, and urban trunk roads) were used to divide the study area into 422 TAZs (Figure 2).
Vehicle trajectory data were acquired from the GAIA open data initiative (https://gaia.didichuxing.com) of Didi Chuxing. Didi Chuxing, which is a travel platform for taxi, private car, express, driving and bus services, has changed the traditional way of taking a taxi and led the development of modern travel in the era of mobile Internet. In this study, a total of 3,298,395 records were selected from the order data for 14 days from November 7, 2016 (Monday), to November 20, 2016 (Sunday). After cleaning and preprocessing the original data, each record contained a unique identifier (ID), the order start and end times, and the start and end point locations (latitude and longitude).
POI data were acquired from the open platform of Gaode Map (https://lbs.amap.com), a leading provider of digital map content, navigation and location services in China. This study used this open platform to collect POI data in December 2016 within the beltway of Chengdu, with a total of 560,369 records. The original POI data were classified into various categories covering a variety of subcategories, and there were overlapping problems in different categories, so it was necessary to delete and reclassify the POI data. Referring to the latest version from 2011 of the "Standard of urban land classification and planning construction land of China (GB50137-2011)" and considering the type and attributes of urban functional areas, this paper divided the POI data into the following categories: catering, shopping services, leisure services, accommodation services, science and education services, healthcare services, dwellings, companies, government agencies and social organizations, and tourist attractions.

Methods
In the study of time series data, clustering is a common data mining method. However, due to the lack of accurate data labels for the training samples, there may be many inaccurate category definitions in the results of the clustering algorithm. Therefore, this paper adopts a semi-supervised learning method that combines clustering and classification to perform data mining. The core method of this paper is based on DTW calculation and the KNN classification method to classify urban functional areas. Since there were no training samples in the current situation, the training samples were first selected by combining K-medoid clustering based on DTW and analysing the baselines of the time series data. Then, KNN classification based on DTW was performed, and finally, POI analysis was used to analyse the specific functions of various functional areas. The specific research process is shown in Figure 3.

Methods of Time Series Generation
To understand the travel patterns of residents, the average hourly numbers of pickups and drop-offs on weekdays and weekends in the study area were counted (Figure 4a). On this basis, the hourly traffic volume (pickup + drop-off) and inflow (drop-off - pickup) on weekdays and weekends were calculated (Figure 4b). Figure 4 shows that there are large differences in the travel patterns of residents on working days and rest days, so working days and rest days should be treated separately. The pickup and drop-off points on weekdays and weekends were intersected with the TAZ data. Then, the numbers of pickups and drop-offs within each hour and each TAZ were counted. Finally, the average numbers for each of the 24 hours of a day on weekdays and weekends were calculated. We obtained 4 sets of data for the 422 TAZs: pickup quantity on weekdays, drop-off quantity on weekdays, pickup quantity on weekends and drop-off quantity on weekends [25].
Figure 4. Hourly travel quantities in the study area: (a) quantity of pickups/drop-offs on workdays/weekends; (b) quantity of traffic volume/inflow on workdays/weekends. E represents the traffic volume on workdays; F represents the traffic volume on weekends; G represents the inflow on workdays; H represents the inflow on weekends.
It can be seen from Figure 4 that analysing the pickup and drop-off quantities independently does not adequately reflect the travel pattern. Therefore, traffic volume (pickup + drop-off) and inflow (drop-off - pickup) were added to the time series, since they better reflect the travel characteristics of residents. In addition, because the research unit is a TAZ generated by cutting along roads rather than a regular grid cell, the areas of the research units differ greatly, so the density (quantity per unit area) of each unit should be used to create the time series.
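In summary, the time series ultimately generated for each research unit has 96 dimensions and consists of a 48-dimensional traffic volume sub-series followed by a 48-dimensional inflow sub-series. In the traffic volume sub-series, indices 0~23 represent the traffic volume per unit area on working days and indices 24~47 represent the traffic volume per unit area on rest days; in the inflow sub-series, indices 0~23 represent the inflow per unit area on working days and indices 24~47 represent the inflow per unit area on rest days. Time series for 422 plots were generated.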
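The construction of the 96-dimensional series can be illustrated with a minimal Python sketch. The function name `build_taz_series`, the argument names and the toy counts are assumptions made for illustration only; the order of the four 24-hour segments (volume on working days, volume on rest days, inflow on working days, inflow on rest days) follows the description above.

```python
import numpy as np

def build_taz_series(pick_wd, drop_wd, pick_we, drop_we, area):
    """Build one 96-dimensional series for a single TAZ (illustrative helper).

    Each count argument holds 24 average hourly counts; `area` is the TAZ
    area in any consistent unit, used to turn counts into densities.
    """
    pick_wd, drop_wd, pick_we, drop_we = (
        np.asarray(a, dtype=float) for a in (pick_wd, drop_wd, pick_we, drop_we)
    )
    volume_wd = (pick_wd + drop_wd) / area  # traffic volume, working days
    volume_we = (pick_we + drop_we) / area  # traffic volume, rest days
    inflow_wd = (drop_wd - pick_wd) / area  # inflow, working days
    inflow_we = (drop_we - pick_we) / area  # inflow, rest days
    return np.concatenate([volume_wd, volume_we, inflow_wd, inflow_we])

# Toy input with constant hourly counts (made-up numbers, not study data).
series = build_taz_series([2] * 24, [4] * 24, [1] * 24, [1] * 24, area=2.0)
```

Dividing by the area before concatenation keeps large and small TAZs comparable, which is the reason given above for working with densities.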

Dynamic Time Warping
In essence, the clustering problem of time series involves how to better measure the similarity of 2 time series [26]. The methods for measuring the similarity of time series can be generally divided into three categories: time, shape and variation [27]. This paper focuses on the time series of taxi traffic volume, whose shape characteristics (rise, fall, extremum) are the response to residents′ travel patterns. In the methods of the time series similarity measurement based on the shape, the DTW algorithm applies the constraints of the structured time dimension to find the best correspondence between the 2 observed sequences and can mine similarities and differences of the time sequences with maximum flexibility. Therefore, DTW has practically become the best distance measurement method of time series similarity calculation [28].
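To eliminate offset and scaling effects introduced in the process of data collection, Z-normalization is first performed on the original time series. It is assumed that $T = \{t_1, t_2, \ldots, t_n\}$ is the original time series and $Z = \{z_1, z_2, \ldots, z_n\}$ is the Z-normalized time series. Then,

$$z_i = \frac{t_i - \mu}{\sigma}, \quad i = 1, 2, \ldots, n,$$

where $\mu$ and $\sigma$ are the arithmetic mean and standard deviation of sequence $T$, respectively:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - \mu\right)^2}.$$

To improve the efficiency of data processing, the piecewise aggregate approximation (PAA) algorithm is adopted here to reduce the dimensionality of the data:

$$\bar{z}_j = \frac{1}{P}\sum_{i=P(j-1)+1}^{Pj} z_i, \quad j = 1, 2, \ldots, n/P.$$

In this algorithm, every $P$ consecutive data points in the sequence are averaged, and each resulting value is a sampling point of the new, dimensionally reduced sequence [29].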
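A minimal Python sketch of these two preprocessing steps is given below. The function names `z_normalize` and `paa`, the use of the population standard deviation, and the requirement that the series length be a multiple of the segment length are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def z_normalize(t):
    """Subtract the mean and divide by the (population) standard deviation."""
    t = np.asarray(t, dtype=float)
    sigma = t.std()
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        return np.zeros_like(t)
    return (t - t.mean()) / sigma

def paa(z, p):
    """Average every p consecutive points (piecewise aggregate approximation)."""
    z = np.asarray(z, dtype=float)
    if len(z) % p != 0:
        raise ValueError("series length must be a multiple of p in this sketch")
    return z.reshape(-1, p).mean(axis=1)

# Toy series (made-up values): 8 points reduced to 4.
z = z_normalize([1, 2, 3, 4, 5, 6, 7, 8])
reduced = paa(z, p=2)
```

Because DTW cost grows with the product of the two series lengths, halving each length in this way cuts the number of matrix cells to roughly a quarter.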
After dimension reduction, DTW was conducted to measure the similarity of 2 sequences. It is assumed that both $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_m\}$ are time series after Z-normalization and PAA dimension reduction. The DTW algorithm first establishes an n×m matrix MAR, in which each element $\mathrm{MAR}(i, j) = d(a_i, b_j)$ represents the distance between point $a_i$ and point $b_j$ [30]. DTW then finds the shortest path in the matrix MAR from the element at the bottom left to the element at the top right, satisfying 3 constraints: the boundary condition, the continuity condition and the monotonicity condition. The boundary condition means that the starting point of the path is the element in the lower left corner of the matrix and the ending point is the element in the upper right corner. The continuity condition means that each element of the path, except for the starting and ending points, must be adjacent to both its predecessor and its successor on the path. The monotonicity condition requires that the next element on the path lie to the right of and/or above the previous element and not skip over an element. Among all the paths that meet the above three constraints, DTW selects the shortest path:

$$D(i, j) = d(a_i, b_j) + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\},$$

where $D(i, j)$ refers to the minimum cumulative distance up to the current element $(i, j)$, with $D(0, 0) = 0$ and $D(0, j) = D(i, 0) = \infty$.
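The cumulative-distance recurrence can be sketched in Python as follows. The point distance is assumed here to be the absolute difference, since the exact form of the point distance is not recoverable from the text; `dtw_distance` and `dtw_matrix` are illustrative names.

```python
import numpy as np

def dtw_distance(a, b):
    """Minimum cumulative warping distance between two 1-D series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Row 0 and column 0 are the infinite borders; the path starts at (0, 0).
    cum = np.full((n + 1, m + 1), np.inf)
    cum[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # assumed point distance
            cum[i, j] = cost + min(cum[i - 1, j], cum[i, j - 1], cum[i - 1, j - 1])
    return float(cum[n, m])

def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW distances for a list of series."""
    k = len(series)
    dist = np.zeros((k, k))
    for p in range(k):
        for q in range(p + 1, k):
            dist[p, q] = dist[q, p] = dtw_distance(series[p], series[q])
    return dist

# Toy check (made-up values): a repeated point is absorbed by warping.
d_same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
d_shifted = dtw_distance([0, 0], [1, 1])
```

The double loop visits every cell of the matrix once, which is why shortening the series with PAA beforehand reduces the running time so noticeably.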

K-Medoids
For a large amount of data without labels, semi-supervised learning usually adopts manual methods to mark a small number of data labels with typical characteristics and uses them as training samples to train most of the remaining data without labels [31]. In this paper, training samples are generated by combining unsupervised learning with manual labelling. K-medoid clustering based on DTW calculation was adopted, and typical and accurate data were selected as training samples for semi-supervised classification according to the time series baseline in the clustering results.
Using the DTW algorithm, we can obtain the plot distance matrix, that is, the similarity matrix of the time series of taxi traffic volume for the 422 plots. Based on this matrix, we can distinguish the differences between the plot types [32]. In the generation of training samples, the clustering method adopted in this paper is K-medoids, which is a common choice for clustering analysis of large data sets. The difference between K-medoids and the K-means algorithm lies in the choice of the centre point: in K-means, the centre is the centroid (mean) of all points in the current cluster, whereas in K-medoids the centre is an actual point of the current cluster, namely the one for which the sum of the distances to all other points in the cluster is the smallest. As a result, K-medoids is less affected by outliers [33].
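As an illustration of clustering directly on a precomputed distance matrix, the following Python sketch uses a simple alternating assign-and-update variant of K-medoids. It is not necessarily the variant used in the paper; the function name `k_medoids`, the random initialization and the toy distance matrix are assumptions.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating K-medoids on a precomputed (n x n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(dist), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster becomes empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy distance matrix from six points on a line (made-up values).
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy_dist = np.abs(points[:, None] - points[None, :])
medoids, labels = k_medoids(toy_dist, k=2)
```

Because only pairwise distances are needed, the same routine works unchanged when the matrix holds DTW distances between time series rather than Euclidean distances between points.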
The size of the data set, the purpose of classification and the validity of the clustering effect should be considered comprehensively to determine the number of clusters, K [25]. To evaluate the impact of different numbers of clusters on the reliability of the clustering results, the silhouette coefficient is calculated for each candidate K by repeatedly conducting clustering operations [33]:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$

where $a(i)$ represents the mean DTW distance between sample point $i$ and the other sample points in the same cluster, and $b(i)$ represents the minimum, over the other clusters, of the mean DTW distance between sample point $i$ and the points of that cluster. The larger the value of $s(i)$ is, the better the match between the sample point and the existing clustering result. When $s(i)$ is negative, the sample point should be assigned to a neighbouring cluster.
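The silhouette coefficient can be computed from the same precomputed distance matrix, as in the following Python sketch. The function name `silhouette_values`, the convention that a singleton cluster receives a value of 0, and the toy data are assumptions for illustration; at least two clusters are required.

```python
import numpy as np

def silhouette_values(dist, labels):
    """Per-sample silhouette values from a precomputed distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            continue  # singleton cluster: value left at 0 by convention
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s

# Toy comparison (made-up values): a sensible and a poor labelling.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy_dist = np.abs(points[:, None] - points[None, :])
good = silhouette_values(toy_dist, [0, 0, 0, 1, 1, 1])
bad = silhouette_values(toy_dist, [0, 1, 0, 1, 0, 1])
```

Averaging the per-sample values for each candidate K and comparing the averages is one common way to read off a suitable number of clusters.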

K-Nearest Neighbour
The k-nearest neighbour (KNN) classification algorithm is a simple but effective algorithm in data mining classification technology. The basic idea of this algorithm is that the sample to be classified is assigned to the class that is most common among its k nearest neighbours in the set of already classified samples. The time series data of traffic volume belong to a data type with many overlapping class domains. Therefore, compared with classification methods that rely on discriminating class domains, KNN, which relies mainly on a limited number of neighbouring samples as the classification basis, has better applicability.
In this paper, the KNN method based on DTW calculation was adopted to classify the time series data. DTW calculates the minimum warping path between time series curves, and this distance was used to replace the Euclidean distance in the KNN algorithm for time series classification [34]. Previous studies show that when combined with DTW, the nearest neighbour algorithm works best when K = 1 (i.e., 1-nearest-neighbour classification) [35,36].
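A minimal Python sketch of nearest-neighbour classification on precomputed distances is shown below. It assumes that the DTW distances from each unlabelled series to each training series have already been computed; the function name `knn_classify`, the toy distances and the toy labels are illustrative assumptions.

```python
import numpy as np

def knn_classify(dist_to_train, train_labels, k=1):
    """Majority vote among the k nearest training samples.

    dist_to_train has shape (n_unlabelled, n_train); entry [i, j] is the
    precomputed distance from unlabelled sample i to training sample j.
    """
    dist_to_train = np.asarray(dist_to_train, dtype=float)
    train_labels = np.asarray(train_labels)
    nearest = np.argsort(dist_to_train, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        values, counts = np.unique(train_labels[row], return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)

# Toy example (made-up distances): two unlabelled series, three training series.
toy_dist = [[0.2, 3.0, 2.5],
            [4.0, 0.5, 0.1]]
predicted = knn_classify(toy_dist, ["C1", "C3", "C3"], k=1)
```

With k = 1 the vote reduces to copying the label of the single closest training series, which matches the 1-nearest-neighbour setting described above.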

POI Auxiliary Analysis
After K-medoid clustering and KNN classification of the time series, the differences in urban land types are determined. Then, the specific functional type of each category should be identified. In this paper, the function was mainly defined according to the POI types in the clusters. POIs contain semantic information on urban functions and play an important role in understanding the spatial and temporal utilization of urban space [1]. The frequency density (FD) and category ratio (CR) of POIs are usually used to determine the specific function of a region [37]:

$$FD_{i,j} = \frac{n_{i,j}}{S_j} \quad (i = 1, 2, \ldots, 10;\ j = 1, 2, \ldots, k),$$

where $k$ refers to the number of plot types after clustering, $n_{i,j}$ refers to the number of POIs of type $i$ in cluster $j$, $S_j$ refers to the total area of cluster $j$, and therefore $FD_{i,j}$ represents the density of POI type $i$ in cluster $j$. However, from the preprocessing of POI data, it can be found that there is a large difference in the number of POIs of different categories. To eliminate this impact, min-max standardization should be carried out on $FD_{i,j}$ [38,39]:

$$F(i, j) = \frac{FD_{i,j} - FD_{i,\min}}{FD_{i,\max} - FD_{i,\min}} \quad (i = 1, 2, \ldots, 10;\ j = 1, 2, \ldots, k),$$

where $FD_{i,\min}$ and $FD_{i,\max}$ are the minimum and maximum frequency densities of POI type $i$ over the $k$ clusters. Then, the normalized frequency density is used to calculate the category ratio:

$$CR_{i,j} = \frac{F(i, j)}{\sum_{i=1}^{10} F(i, j)} \times 100\%.$$
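The frequency density and category ratio can be sketched in Python as follows. The sketch assumes that min-max standardization is applied per POI type across clusters and that the category ratio is the share of each normalized density within a cluster; both choices, the function names and the toy numbers are assumptions rather than details confirmed by the paper.

```python
import numpy as np

def frequency_density(counts, areas):
    """POI counts (types x clusters) divided by each cluster's total area."""
    return np.asarray(counts, dtype=float) / np.asarray(areas, dtype=float)

def category_ratio(fd):
    """Min-max normalize each POI type across clusters, then take the
    share of each type within every cluster (columns sum to 1)."""
    fd = np.asarray(fd, dtype=float)
    lo = fd.min(axis=1, keepdims=True)
    hi = fd.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    norm = (fd - lo) / span
    totals = norm.sum(axis=0, keepdims=True)
    return norm / np.where(totals > 0, totals, 1.0)

# Toy example (made-up numbers): 2 POI types, 3 clusters.
counts = [[10, 40, 90],
          [300, 100, 150]]
areas = [1.0, 2.0, 3.0]
fd = frequency_density(counts, areas)
cr = category_ratio(fd)
```

In the toy data the second POI type has far larger raw counts, yet after normalization it no longer dominates every cluster, which is the effect the standardization step is meant to achieve.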

Generation of the Training Sample
To obtain training samples for semi-supervised learning, the K-medoids algorithm was utilized to cluster the preprocessed time series data. First, through repeated clustering operations, the reliability of the number of clusters was evaluated by the silhouette coefficient. The change in the silhouette value with the number of clusters K is shown in Figure 5. The larger the silhouette value is, the better the clustering effect will be. The clustering results are shown in Figure 6. At this point, based on the trajectory time series data, the land is divided into six clusters (C1~C6), but the specific functions of each cluster are still unknown. However, we can see from the figure that the functional distribution of the study area is a ring structure. Then, the baseline of the time series of each cluster was taken as the evaluation standard, and the 20% of the data with the best clustering effect were selected as the training samples. The results are shown in Figure 7.
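One possible way to automate the selection of training samples is sketched below in Python. It assumes that "good clustering effect" is measured by the distance of a sample to its cluster medoid, which is only one reading of the baseline criterion described above; the function name `select_training_samples` and the toy data are illustrative.

```python
import numpy as np

def select_training_samples(dist, labels, medoids, fraction=0.2):
    """Keep, per cluster, the given fraction of members closest to the medoid.

    dist is a precomputed (n x n) distance matrix, labels[i] is the cluster
    index of sample i, and medoids[c] is the sample index of cluster c's medoid.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    selected = []
    for c, medoid in enumerate(medoids):
        members = np.flatnonzero(labels == c)
        n_keep = max(1, int(round(fraction * members.size)))
        order = np.argsort(dist[members, medoid])
        selected.extend(members[order[:n_keep]].tolist())
    return sorted(selected)

# Toy example (made-up values): two clusters of five points on a line.
points = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 20.0, 21.0, 22.0, 23.0, 24.0])
toy_dist = np.abs(points[:, None] - points[None, :])
toy_labels = [0] * 5 + [1] * 5
train_idx = select_training_samples(toy_dist, toy_labels, medoids=[2, 7])
```

The remaining samples, i.e. those not in `train_idx`, would then be passed to the classification step as unlabelled data.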

Results of KNN-DTW Classification
The above classification results of functional regions were obtained by direct K-medoid clustering of time series data. Direct clustering is an unsupervised learning method that may lead to inaccurate classification in some regions. To make the results more reliable, this study selected data with good classification effects from the above clustering results as training samples according to the baselines of the time series of each cluster (Figure 7). KNN-DTW classification based on the training samples was then conducted on the remaining data to obtain the final result of the functional area classification (Figure 8). The baselines of the time series of each cluster provide the traffic volume and inflow per unit time and area on working days and rest days, from which the travel patterns of residents in the various plots can be determined (Figure 9).

Results of POI Auxiliary Analysis
To define the specific functions of each cluster, this study calculated the frequency density (FD) and category ratio (CR) of POI data of Cluster C1 to Cluster C6 (Table 1). By analysing the residents′ travel characteristics and POI distribution characteristics of each cluster, the specific functions can be defined as follows.
C1 is distributed at the edge of the study area, and its POI frequency density is the lowest, so its function should be determined mainly from residents' travel characteristics. Figure 9a shows that the population flow in this area is not obvious on weekdays. However, on rest days, population inflow occurs in the morning, and there are relatively high population outflows in the afternoon and evening, which is consistent with the pattern of people going on outings and visiting relatives in suburban areas on weekends. Therefore, C1 is judged to be the suburban tourism area. C2 and C3 are distributed in the transition area between the suburbs and the city. In terms of the proportion of POI types, both C2 and C3 have the service facilities necessary for residents' lives, such as shopping, science and education, and medical treatment, and C3 has relatively more residential POIs. In terms of travel characteristics, the outflow and inflow peaks of C2 on working days are at 8:00 and 24:00, respectively; these peaks are at 8:00 and 19:00, respectively, in C3. Compared with C2, C3 therefore shows the more obvious residential commuting pattern on working days, while C2 lies in the transitional region between C1 and C3. Accordingly, C3 is determined to be the urban residential area and C2 the residential/tourism mixed area.
POIs in the office class (including science and education, medical treatment, corporations and government agencies) account for higher proportions in C4 and C5 than in other areas. From the perspective of travel patterns, C4 experiences population outflow during the daytime (7:00-19:00) on working days. On rest days, the cut-off point between population inflow and outflow is 13:00, and the peak population outflow occurs at 8:00. The traffic volume of C5 fluctuates frequently within a day, generating many extreme values. After 8:00 on weekdays, the inflow of the population gradually begins, while the outflow of the population reaches its peak at 18:00. Compared with C4, C5 has more obvious commuting-time characteristics, so it is judged to be the office area. The characteristics of weekday population inflow in C4 are similar to those in residential areas, and the variation trend of the population inflow on rest days is the same as that on weekdays. However, the variation range is small, which may be because people's shopping activities on rest days offset the original trend to some extent. Therefore, C4 is concluded to be the residential/commercial mixed area.
C6 is only distributed in the central area of the city. From the perspective of travel characteristics, the population inflow on weekends and weekdays reaches a peak at approximately 10:00, but the population outflow starts at 11:00 on rest days and 16:00 on working days. The traffic volume on rest days and working days peaks at 10:00 and 18:00, respectively, which is highly similar to people′s shopping times. According to the distribution characteristics of POIs, there are many types and quantities of POIs in this area, and all types of facilities are high-quality, so this area is judged to be the mature business area.

Discussion
In this paper, trajectory data were converted into time series data, and information mining was performed. After the similarity of the time series was obtained using the DTW calculation method, K-medoid clustering was performed, and the results with good clustering effects were selected as the training samples. Then, KNN classification was performed based on the training samples to obtain the final identification of urban functional areas. This study can provide a new idea for data classification without training samples. Although the method is common, it is very suitable for machine learning of trajectory time series. As long as this idea is followed, replacing K-medoids and KNN with other methods is also suitable for big data mining.
To verify the identification effect of this method, the results were compared with Google Earth images, Gaode Maps, and real photos of landmark areas. The comparison results in some typical regions are shown in Table 2. The data below the Google image and the Gaode Map image are the locations (latitude and longitude) of the centre point of the captured image. Since the reference coordinate systems of Google Earth and Gaode Map are different, there will be some deviation in the latitude and longitude data of the same spot. In addition, to get as close as possible to the time of the trajectory data and POI data, Google images of 2017/4/13 were selected. The capture time of the Gaode Map was 2019/8/17. The capture time of the real photos varied from place to place, so it is marked below each image in Table 2.
The landmark area in the first group is Qinglong Wetland Park, which is the largest wetland park and a famous historical and cultural scenic spot in Chengdu. The second group is the Chengdu Research Base of Giant Panda Breeding, a famous institution for giant panda research, breeding and protection in China and the world; it is also a national AAAA tourist attraction. As famous tourist attractions, these areas are in line with the urban functions of suburban tourism. In the third and fourth groups, the landmark areas are Chengdu Happy Valley and East Lake Park, which are also famous scenic spots in Chengdu city. However, in these areas, in addition to tourist attractions, there are some moderately dense residential buildings. In groups 5 and 6, a large number of residential buildings with high density and orderly arrangement are distributed. Furthermore, this area is close to the city centre, which is consistent with the positioning of the "Urban Residential Area" functional area. From the comparison of groups 1-6, it can be found that the living, travelling and transition areas can be well separated in this study.
The area in group 7 is Jinli, which contains a variety of specialty food and beverage shops and themed commodity shops with prominent commercial functions. There are also some residential buildings similar to those in C3 in this area. The area in the eighth group is People's Park, which is the centre of the old city of Chengdu. Although this area has certain tourist value, the distribution of old shops with long histories and low, densely populated old houses around the area is more prominent. In summary, both of these areas are in line with their classification as "Residential/Commercial Mixed Area". The landmarks in groups 9 and 10 are the People's Government of Sichuan Province and the Chengdu 339 TV Tower, which not only have prominent office functions but are also well supplemented by a large number of office buildings distributed around them. The landmark area in the 11th group is Chunxi Road, which is the busiest and most prosperous commercial street in Chengdu and a distinctive commercial street that is famous all over the country. The area in the 12th group is Tianfu Square, which is located in central Chengdu. This area has consistently been the symbol of Chengdu and even Sichuan Province, and it is a city landmark. These two areas are located in the centre of the city, where commercial functions occupy almost the entire area. There are few buildings for other functions around them, so they are in line with the functional positioning of "Mature Commercial Areas". The comparison of groups 7-12 reveals that this study has a good ability to distinguish mature commercial areas from residential/commercial areas. The spatial distribution of office areas is similar to that of the two adjacent functional areas, but their functions are quite different. This study can also identify these patterns.
In summary, the method used in this study can effectively identify the main functions and their distribution in Chengdu city with good accuracy. According to the identification results, the distribution of functional areas in the study area is basically circular and follows the distribution mode of tourism-residence-commerce, which is consistent with the generally recognized urban structure of Chengdu.
In a recent study of Chengdu urban functional regions, Gao et al. [37] also used trajectory data and adopted a combination of the Gaussian mixture model (GMM) and the Pearson correlation coefficient (PCC). Their results are similar to the results of this study: there are some differences in the nomenclature and regional definitions of functional areas, but few differences in the overall urban functions and their distribution. In addition, Gao's results can distinguish Chunxi Road and Chengdu Railway Station from other functional areas and identify them separately, which is not achieved by the method in this paper. These two sites each have a single, prominent function, so this point has reference value for this paper. The training sample selection method used in this paper and the idea of semi-supervised learning also have reference value for data mining models that use clustering alone.

Conclusion
With the deepening of urbanization, the spatial structures of cities present complex but regular characteristics. This paper analyses the urban spatial structure from the perspective of big data mining. The results of this method are consistent with the actual situation, and the findings of this study are as follows. Traffic volume and inflow reflect residents' travel patterns better than simple pickup and drop-off counts. The original DTW method has high time complexity, and its cost can be reduced by normalizing the time series and reducing their dimensionality. The semi-supervised learning classification method is applicable to trajectory data, and it is better to select typical results of unsupervised learning as the training samples. This method can provide a theoretical basis for urban land planning, administrative division adjustments, urban resource allocation and other fields, and it has auxiliary and guiding value for the overall scientific planning of land use and urbanization layout in the context of national spatial planning policies in the new era.
There is still much work to be done in this area of study in the future. In this paper, taxis are taken as representative of residents' travel, and other means of travel, such as public transportation and bicycles, are not considered. In addition, LBS big data, such as WeChat Moments data and Weibo check-in data, have important reference value for the interpretation and classification of urban land use. Therefore, multi-source urban big data should be integrated in future studies to make the classification results more detailed and reliable. Furthermore, after reliable classification results of urban functional areas are obtained, the spatial structure of each functional area and the degree of correlation between areas can be analysed. Then, the degree to which urban space is reasonably utilized can be evaluated in order to provide effective optimization suggestions.