Where Urban Youth Work and Live: A Data-Driven Approach to Identify Urban Functional Areas at a Fine Scale

As a major part of the urban labor force, young people provide a strong driving force for urban innovation and development and contribute to urban industrial upgrading and restructuring. In addition, with the acceleration of urbanization in China, the young floating population has increased rapidly, causing over-urbanization and creating certain social problems. It is important to analyze the needs of urban youth and promote their social integration. With the development of the mobile Internet and the improvement of urban express delivery systems, ordering food delivery has become a popular and convenient way to dine, especially in China. Food delivery data have a distinctive user attribute: most delivery customers are under 35 years old. In this paper, we introduce food delivery data as a new data source for urban functional zone detection and propose a time-series-based clustering approach to discover the urban hotspot areas of young people. The work and living areas were effectively identified according to the human behavioral characteristics of ordering food delivery. Furthermore, we analyzed the relationship between young people and the industry structure of Hangzhou and discovered that the geographical distribution of the identified work areas was similar to that of Internet and e-commerce companies. The characteristics of the identified living areas were also analyzed in combination with the distribution of subway lines and residential communities. The living areas were mainly distributed along subway lines, and urban villages appeared in the living hotspot regions, indicating that transportation and living cost are two important factors in the choice of residential location for young people. The findings of this paper can support urban industrial and residential planning and the management of the young population.


Introduction
The youth population is a group with strong creativity and high consumption power, who provide a strong driving force for urban innovation and development. With the acceleration of urbanization in China, a large number of young people have migrated from villages to cities. The fast-growing young floating population causes over-urbanization, which leads to urban congestion and insufficient infrastructure, and creates certain social problems, especially with respect to housing, employment, college campuses, and large industrial parks, and can provide a more accurate dataset for functional area extraction. This paper proposes a time-series-based clustering approach to discover the urban hotspot areas of young people. A case study of Hangzhou, China, which is famous for its Internet economy, was implemented to discuss the relationship between urban youth and the urban industrial structure and the factors that affect the residential location choice of young people.
The remainder of this paper is organized as follows: Section 2 describes the study area and data source and reports summary statistics of the dataset. Section 3 presents the method for extracting urban functional areas from food delivery data and verifies the results against official urban planning maps. Section 4 analyzes the spatial distribution of the urban youth population in combination with the industrial structure and the distribution of residential areas in Hangzhou. Section 5 concludes with a brief summary and discussion.

Study Area and Data
Hangzhou, capital of East China's Zhejiang Province, is famous for its Internet and digital economy, and it is also home to Chinese Internet and e-commerce giant Alibaba and a number of other world-leading high-tech companies. According to the statistics in 2018, the main business income of Hangzhou's digital economy exceeded 1 trillion yuan, and its added value reached 335.6 billion yuan, accounting for 24.8% of the city's GDP [36]. The study area of this paper is the main city area of Hangzhou, which covers 74 street districts with an area of over 1600 km² (Figure 1).
ISPRS Int. J. Geo-Inf. 2020, 9, 42

The dataset used in this paper was provided by the Chinese logistics company "Dianwoda". It records food delivery electric scooter trips that occurred in Hangzhou from 28 July 2017 to 26 September 2017, with user-related information removed for privacy protection. The original delivery data contain real-time GPS information of the electric scooters during the entire food delivery process. The fields of the dataset mainly include the order ID, rider ID, status, timestamp, latitude, and longitude. The order ID is a unique identifier for a delivery order, while the rider ID is a unique identifier for a delivery rider. Real-time GPS location information is converted to latitude and longitude and accompanied by a timestamp. The field "status" represents the different stages of the food delivery process: dispatch, arriving, arrive, leave, delivering, and finish. In this study, the origin and destination (OD) pairs of each delivery order are our main concern, where the origin of a delivery order is the location of the restaurant and the destination is the location of the delivery customer. The record at which the status changes from arrive to leave was extracted as the origin of a delivery order, and the record at which the status changes from delivering to finish was extracted as the destination.
In this way, we could obtain the delivery OD dataset by extracting and reorganizing the delivery electric scooter trip data.
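The OD extraction rule described here (origin at the arrive-to-leave transition, destination at the delivering-to-finish transition) can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the dictionary keys (`order_id`, `status`, `timestamp`, `lat`, `lon`) are hypothetical names for the fields listed in the text, and taking the first record that carries the new status as the transition point is an assumption.

```python
from collections import defaultdict


def extract_od(records):
    """Build one origin/destination pair per order from status-tagged GPS records.

    Each record is a dict with the hypothetical keys
    order_id, status, timestamp, lat, lon.
    """
    by_order = defaultdict(list)
    for rec in records:
        by_order[rec["order_id"]].append(rec)

    od_pairs = {}
    for order_id, recs in by_order.items():
        recs.sort(key=lambda r: r["timestamp"])
        origin = destination = None
        # Walk over consecutive record pairs and look for status transitions.
        for prev, curr in zip(recs, recs[1:]):
            if prev["status"] == "arrive" and curr["status"] == "leave":
                origin = curr  # rider leaves the restaurant
            elif prev["status"] == "delivering" and curr["status"] == "finish":
                destination = curr  # order handed to the customer
        # Orders missing either transition are skipped (assumption).
        if origin is not None and destination is not None:
            od_pairs[order_id] = {
                "origin": (origin["lat"], origin["lon"]),
                "destination": (destination["lat"], destination["lon"]),
                "departure_time": origin["timestamp"],
                "arrival_time": destination["timestamp"],
            }
    return od_pairs
```

Orders whose trace lacks one of the two transitions are dropped in this sketch; how the study handled such incomplete traces is not stated in the text.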
In the delivery OD dataset, each record includes the order ID, the departure and arrival times, the origin and destination locations, the delivery trip distance, and the delivery time. The delivery OD dataset contained 7,480,402 orders. Table 1 shows a small subset of these data for reference, and Figure 2 shows the location distribution of the food delivery destination points in the study area.
As with many large datasets, the delivery OD dataset contains errors. Some errors are easy to identify, such as point coordinates outside the research area or order trips with a negative distance or time. Table 2 shows the thresholds used in each filter as well as the number of orders that violate each threshold. Orders with a negative trip distance were removed because such a distance is geometrically impossible, and orders whose trip distance exceeded 20 km were also removed because of their low incidence. Only orders with a delivery time between 1 min and 90 min were retained. Figure 3 shows the delivery duration and trip distance distributions of the delivery OD data after filtering.
We also calculated some statistics to understand the characteristics of the dataset. As shown in Figure 4, we counted the average number of delivery orders on each day of the week and the average number of delivery orders per hour of the day for both weekdays and weekends. It was observed that people ordered more deliveries on weekends than on weekdays. There were two peaks each day, one from 11:00 to 13:00 and another from 17:00 to 20:00, and this pattern was similar on weekdays and weekends.
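The filtering rules described above (coordinates inside the research area, trip distance from 0 to 20 km, delivery time from 1 to 90 min) could be expressed as a single predicate. The sketch below is illustrative only: the field names and the bounding-box representation of the research area are assumptions, and treating the thresholds as inclusive is also an assumption, since the exact boundary handling is given in Table 2 rather than in the text.

```python
def is_valid_order(order, bbox, max_km=20.0, min_minutes=1.0, max_minutes=90.0):
    """Return True if an OD record passes the distance, time, and area filters.

    order: dict with hypothetical keys origin, destination (lat, lon tuples),
           distance_km, and duration_min.
    bbox:  (lat_min, lat_max, lon_min, lon_max), a rectangular stand-in
           for the research area (assumption).
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    # Both end points must fall inside the research area.
    for lat, lon in (order["origin"], order["destination"]):
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            return False
    # Negative distances are impossible; trips over max_km are rare outliers.
    if not (0.0 <= order["distance_km"] <= max_km):
        return False
    # Keep only plausible delivery durations.
    return min_minutes <= order["duration_min"] <= max_minutes
```

A list of orders can then be filtered with `[o for o in orders if is_valid_order(o, bbox)]`.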

Methods
In this section, we aimed to explore the spatially significant areas of urban youth, and determine the functional types of the extracted areas. The process was divided into the following steps: (1) map the food delivery destination points into a spatiotemporal cube and divide the research area into equal grids; (2) calculate the delivery hot value by using the modified Getis-Ord statistic method, and construct the time series by combining the time pattern with the delivery hot values of each grid; and (3) cluster the time series of the delivery data with the unsupervised k-means method and determine the functional types of clusters based on the behavioral characteristics of ordering food delivery. Finally, the results were verified by official urban planning maps.


Spatiotemporal Cube Construction
In this paper, the delivery destination points are the main processing target. As shown in Table 1, the destination points carry longitude-latitude information and an arrival time, so they can be mapped into a three-dimensional Euclidean space. We constructed a spatiotemporal cube model in which the cubes have equal spatial size and equal time intervals. Each cube is represented as a three-dimensional array $c = (x_c, y_c, t_c)$, where $x_c$ is the number of the cube in the x dimension, $y_c$ is the number of the cube in the y dimension, and $t_c$ is the number of the cube in the time dimension. We set the time size to 1 h and the spatial size to 0.001 degrees (approximately 95 to 115 m). The research area was also divided into equal grids, as shown in Figure 5. After data mapping and aggregation, each spatiotemporal cube stored the count of the destination points in that cube as its attribute.
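The mapping from destination points to cubes can be sketched as a simple binning step. The code below is a minimal illustration under stated assumptions: the grid origin (`lon0`, `lat0`) and the start time `t0` are hypothetical parameters, and cubes are indexed from zero; only the cell sizes (0.001 degrees and 1 h) come from the text.

```python
import math
from collections import Counter
from datetime import datetime

CELL_DEG = 0.001      # spatial cell size in degrees (from the text)
SLOT_SECONDS = 3600   # temporal cell size of 1 h (from the text)


def cube_index(lon, lat, arrival, lon0, lat0, t0):
    """Return the zero-based (x_c, y_c, t_c) index of the cube containing a point."""
    x_c = math.floor((lon - lon0) / CELL_DEG)
    y_c = math.floor((lat - lat0) / CELL_DEG)
    t_c = int((arrival - t0).total_seconds() // SLOT_SECONDS)
    return x_c, y_c, t_c


def build_cubes(points, lon0, lat0, t0):
    """Count destination points per cube; points are (lon, lat, arrival) tuples."""
    counts = Counter()
    for lon, lat, arrival in points:
        counts[cube_index(lon, lat, arrival, lon0, lat0, t0)] += 1
    return counts
```

Only non-empty cubes are stored in the counter; empty cubes implicitly have a count of zero.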



Modified Getis-Ord Statistic Method
The well-known Getis-Ord statistic method was used in this paper to calculate the hot value of each cube. To be a statistically significant hotspot, a feature must have a high value and be surrounded by other features that also have high values. The method provides a z-score that allows users to determine where features with either high or low values are clustered spatially. Formally, based on the constructed spatiotemporal cube, the Getis-Ord value $G_i^*$ of each cube is defined as [37,38]:

\[ G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} a_j - \bar{a} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left( \sum_{j=1}^{n} w_{i,j} \right)^2}{n - 1}}}, \qquad \bar{a} = \frac{1}{n} \sum_{j=1}^{n} a_j, \qquad S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} a_j^2 - \bar{a}^2}, \]

where $n$ refers to the total number of cubes, and $a_j$ is the attribute value of cube $c_j$. $w_{i,j}$ is the spatial weight between cubes $i$ and $j$, and is defined as follows:

\[ w_{i,j} = \begin{cases} 1, & \text{if } c_j \text{ is a neighbor of } c_i, \\ 0, & \text{otherwise.} \end{cases} \]

Here, we define two cubes as neighbors only if the maximum distance between $i$ and $j$ in any of the three coordinates $(x_c, y_c, t_c)$ is at most 1. A cube is also its own neighbor. Then, to simplify the computation of the Getis-Ord statistic value, we use $N(c_i)$ to denote the set of neighboring cubes of $c_i$ and $|N(c_i)|$ to denote the number of such neighboring cubes. Therefore, we obtain the following simplified formula for the Getis-Ord statistic:

\[ G_i^* = \frac{\sum_{c_j \in N(c_i)} a_j - \bar{a}\,|N(c_i)|}{S \sqrt{\dfrac{n\,|N(c_i)| - |N(c_i)|^2}{n - 1}}}. \]
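With binary weights, the statistic only needs the sum of attribute values over a cube's neighborhood and the size of that neighborhood. The following sketch computes it for every cube of a small grid; it assumes a dense rectangular grid of known `shape`, zero counts for cubes absent from the input, and a score of 0.0 when the denominator vanishes. These choices are illustrative assumptions, not details taken from the paper.

```python
import math
from itertools import product


def getis_ord_scores(counts, shape):
    """Compute the simplified Getis-Ord value for every cube.

    counts: dict mapping (x, y, t) -> point count for non-empty cubes.
    shape:  (nx, ny, nt), the number of cubes along each dimension.
    """
    nx, ny, nt = shape
    n = nx * ny * nt
    values = list(counts.values())
    mean = sum(values) / n  # empty cubes contribute zero
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)

    scores = {}
    for x, y, t in product(range(nx), range(ny), range(nt)):
        neigh_sum = 0
        neigh_n = 0
        # Neighbors differ by at most 1 in each coordinate; the cube itself counts.
        for dx, dy, dt in product((-1, 0, 1), repeat=3):
            j = (x + dx, y + dy, t + dt)
            if 0 <= j[0] < nx and 0 <= j[1] < ny and 0 <= j[2] < nt:
                neigh_n += 1
                neigh_sum += counts.get(j, 0)
        denom = s * math.sqrt((n * neigh_n - neigh_n ** 2) / (n - 1))
        # Guard against a degenerate grid or constant data (assumption).
        scores[(x, y, t)] = (neigh_sum - mean * neigh_n) / denom if denom else 0.0
    return scores
```

Cubes at the edge of the grid have fewer than 27 neighbors, which is why the neighborhood size is counted per cube instead of being fixed.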

The Time Series Construction
It is well known that people usually have different behavior patterns on weekdays and on weekends; thus, we built a 48-h timeline that consisted of 24-h periods on weekdays and weekends. After the Getis-Ord statistic calculation was performed for all the cubes, a group function was used to gather the cubes with the same location and time period, and an aggregate function averaged the Getis-Ord values of the cubes in the same group. Finally, the synthesized time series were obtained as a linear combination of the 48-h timeline and the statistical value of delivery order quantity. Then, each grid would have a time series $TS_k = \{ g_k^t \mid t = 1, 2, \ldots, 48 \}$, $k = 1, 2, \ldots, n$, where $n$ is the total number of the grids, and $g_k^t$ is the hot value of grid $k$ at time period $t$ (1-24 for weekdays and 25-48 for weekends). As the Getis-Ord statistic method is a common method for hotspot analysis, we also obtained 48 delivery hotspot distribution maps that corresponded to 48 time periods. The delivery hotspot distribution during typical time periods (i.e., lunch time, dinner time, and midnight) is visualized in Figure 6.
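The grouping and averaging step can be sketched as below. The sketch assumes that cube time indices count hours from a known start time `t0` and that the Getis-Ord values are held in a dictionary keyed by `(x, y, t)`; both are illustrative assumptions. Slots are zero-based here, so slots 0-23 correspond to periods 1-24 (weekdays) and slots 24-47 to periods 25-48 (weekends).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_time_series(scores, t0):
    """Average hot values per grid cell into a 48-slot weekday/weekend profile.

    scores: dict mapping (x, y, t) -> Getis-Ord value, t in hours since t0.
    Returns a dict mapping (x, y) -> list of 48 averaged values.
    """
    sums = defaultdict(lambda: [0.0] * 48)
    hits = defaultdict(lambda: [0] * 48)
    for (x, y, t), g in scores.items():
        when = t0 + timedelta(hours=t)
        # Saturday and Sunday (weekday() >= 5) go to the second 24 slots.
        slot = when.hour + (24 if when.weekday() >= 5 else 0)
        sums[(x, y)][slot] += g
        hits[(x, y)][slot] += 1
    return {
        cell: [s / h if h else 0.0 for s, h in zip(sums[cell], hits[cell])]
        for cell in sums
    }
```

Slots with no observations are set to 0.0 in this sketch, which is a simplification.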

Time Series Classification and Identification
The statistical hot values in the time series represent the delivery quantity in one region during one period. That is, a higher hot value means that more delivery orders are placed in the area, and a low hot value generally means fewer delivery orders. To better identify functional areas, the grids whose hot values were negative in all time periods were removed. To emphasize the characteristics of the delivery data and weaken the impact of regional factors, the hot values of the time series were normalized for each grid.
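The two preprocessing steps described here, dropping grids that are never hot and normalizing each grid's series, might look like the following. The text does not state which normalization was used, so min-max scaling to [0, 1] is an assumption made purely for illustration.

```python
def normalize_series(series_by_grid):
    """Drop all-negative series and min-max scale the remaining ones per grid.

    series_by_grid: dict mapping a grid key -> list of hot values.
    Min-max scaling is an assumed choice; the paper does not name the method.
    """
    normalized = {}
    for cell, series in series_by_grid.items():
        # Grids with negative hot values in every period are removed.
        if all(v < 0 for v in series):
            continue
        lo, hi = min(series), max(series)
        span = hi - lo
        # A constant series has no shape to preserve; map it to zeros.
        normalized[cell] = [(v - lo) / span if span else 0.0 for v in series]
    return normalized
```

Scaling per grid removes differences in absolute order volume, so that clustering responds to the shape of the daily profile.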
The k-means clustering method was applied to classify the time series. The k-means method requires the number of clusters K to be specified beforehand. To determine an appropriate value for K, the ratio between the intra-cluster and inter-cluster distances was introduced as a validity index [39], which is defined as:

\[ \mathrm{validity} = \frac{\mathrm{intra}}{\mathrm{inter}}, \qquad \mathrm{intra} = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - z_i \rVert^2, \qquad \mathrm{inter} = \min_{i \neq j} \lVert z_i - z_j \rVert^2, \]

where $N$ is the number of instances; $K$ is the number of clusters; and $z_i$ is the centroid of cluster $C_i$. An ideal partition will minimize the intra-cluster distance and maximize the inter-cluster distance; therefore, the best partition will be the partition that minimizes the validity value. Figure 7 presents the validity values obtained by applying the k-means method to the synthesized time series for $K \in \{2, \ldots, 10\}$, and the minimum validity value was obtained when K = 6.
Figure 8 represents the cluster results of the time series by using the k-means method with K = 6, where the plotted curves (each consisting of 48 points) represent the cluster centers (red for weekdays and blue for weekends). In addition, the maximum and minimum values of each time period in each class were plotted as a light shaded band. Figure 9 shows the location distribution of the grids according to the cluster results.
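The validity index used to select K is the ratio of the mean squared distance of instances to their own centroid over the smallest squared distance between any two centroids. The sketch below evaluates it for a given partition; it assumes squared Euclidean distance and leaves the k-means step itself to any standard implementation. The function and argument names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def validity_index(points, labels, centroids):
    """Intra-cluster over inter-cluster distance ratio; lower is better.

    points:    list of vectors (e.g. 48-value time series).
    labels:    cluster index of each point.
    centroids: list of cluster centers, indexed by label.
    """
    n = len(points)
    # Mean squared distance from each point to its own centroid.
    intra = sum(
        squared_distance(p, centroids[label]) for p, label in zip(points, labels)
    ) / n
    # Smallest squared distance between two distinct centroids.
    inter = min(
        squared_distance(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )
    return intra / inter
```

To choose K, one would run k-means for each candidate K, evaluate `validity_index` on the resulting partition, and keep the K with the smallest value.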
Within different land use areas, people may demonstrate different routine activities. This allows us to determine the social functions of areas based on the customary characteristics of ordering delivery. The peaks of the cluster curves were concentrated in four dining periods (i.e., at lunch time and dinner time on weekdays and weekends). By comparing the curves of the cluster centroids, we found that six clusters could be divided into two categories based on their similar curve characteristics: one category contains clusters a-c, and another category contains clusters d-f.
Clusters a-c: This category is characterized by the fact that a significant peak appears during lunch time on weekdays. Delivery during weekdays is more than delivery during weekends, and delivery during lunch time is greater than delivery during dinner time both on weekdays and weekends. This category shows a work-related activity characteristic, and the hypothesis is that the areas included in this category are used as industrial parks, office buildings, and/or commercial-related areas.
Clusters d-f: This category is characterized by the fact that, on weekdays, delivery at dinner time is greater than delivery at lunch time, especially in cluster d. Delivery during weekends is greater than delivery during weekdays. These characteristics imply living areas, where people come back after work on weekdays and stay during weekends, which may include residential communities, apartments, and/or college dormitories.
Clusters in the same category also exhibited different characteristic strengths, with a > b > c and d > e > f. The factors that contribute to different cluster characteristic strengths may include the following: (1) the number of people in a region varies during different time periods; for example, in industrial parks, the workers are mainly concentrated during the daytime on weekdays; and (2) the dining time is short or limited. The peak time of ordering delivery is related to people's eating habits. White-collar workers, however, may be constrained by work arrangements, which concentrates their orders into a pronounced dining peak. A strong cluster characteristic also indicates that the social function of the area is significant.
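The qualitative criteria used above to separate work-type from living-type clusters can be written as a simple rule on a 48-value centroid curve. This is only an illustrative encoding of the description, not the authors' procedure, which relied on comparing the centroid curves; the hour windows (11:00-13:00 for lunch, 17:00-20:00 for dinner) follow the peaks reported for the dataset, and the use of peak values and totals is an assumption.

```python
LUNCH_HOURS = range(11, 13)    # 11:00-13:00
DINNER_HOURS = range(17, 20)   # 17:00-20:00


def label_centroid(curve):
    """Label a 48-value centroid (24 weekday + 24 weekend hours).

    Returns "work", "living", or "unclear" (hypothetical labels).
    """
    weekday, weekend = curve[:24], curve[24:]
    lunch_peak = max(weekday[h] for h in LUNCH_HOURS)
    dinner_peak = max(weekday[h] for h in DINNER_HOURS)
    weekday_total, weekend_total = sum(weekday), sum(weekend)
    # Work areas: weekday lunch dominates and weekdays outweigh weekends.
    if lunch_peak > dinner_peak and weekday_total > weekend_total:
        return "work"
    # Living areas: weekday dinner dominates and weekends outweigh weekdays.
    if dinner_peak > lunch_peak and weekend_total > weekday_total:
        return "living"
    return "unclear"
```

A rule of this kind is brittle for weakly expressed clusters such as c and f, which is consistent with the lower identification accuracy reported for them.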


Cluster Results Validation
To validate our hypotheses, we compared the cluster results against official urban planning maps released by the Hangzhou Planning Bureau, in which the land use types mainly include residential, commercial, industrial, educational, and green (Figure 10).
As shown in Figure 9, the cluster results are displayed in the form of grids on maps. To understand how well the clusters were identified as the work and living areas, we evaluated the percentage of overlap between the grids of the clusters and the official urban planning maps. This gives an understanding of the accuracy of our approach as well as of the differences between the results. It should be noted that this verification method is an approximate measure because of the different granularities of the maps and human factors.
Table 3 shows the percentage of overlap between the official planning map (columns) and the six cluster results (rows). Here, the official planning map was divided into six land use types: residential, commercial, industrial, educational, mixed commercial and residential, and other. Commercial and industrial types refer to work areas in which office buildings and industrial parks are located; residential type areas are obviously living areas; educational areas are mixed areas in which dormitories are living type areas and research institutes are work type areas.
We overlaid the cluster grids on the official planning map to see which land use type each grid belongs to. Each element in the table represents the percentage of grids in one cluster that belong to a given land use type.
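Once each grid has a cluster label and a land use type, the overlap table reduces to a row-normalized cross-tabulation. The sketch below assumes that the land use type of every grid has already been looked up from the planning map (the polygon overlay itself is not shown) and that grids without a match fall into "other"; the dictionary layout is hypothetical.

```python
from collections import Counter, defaultdict


def overlap_table(cluster_of, land_use_of):
    """Percentage of grids per land use type within each cluster.

    cluster_of:  dict mapping grid key -> cluster label (e.g. "a" to "f").
    land_use_of: dict mapping grid key -> land use type from the planning map.
    """
    counts = defaultdict(Counter)
    for cell, cluster in cluster_of.items():
        # Grids with no planning-map match are counted as "other" (assumption).
        counts[cluster][land_use_of.get(cell, "other")] += 1
    return {
        cluster: {
            land_use: 100.0 * c / sum(by_type.values())
            for land_use, c in by_type.items()
        }
        for cluster, by_type in counts.items()
    }
```

Each row of the result sums to 100, matching the row-wise reading of the table described above.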
According to our assumptions, clusters a-c represent work areas, while clusters d-f represent living areas. The commercial and industrial types are the two land use types that occupy the greatest proportion in clusters a-c, together accounting for 94.6%, 81.2%, and 66.4%, respectively. In cluster a, the industrial type accounts for 64.9%, more than twice the proportion of the commercial type, which suggests that large-scale industrial parks have a stronger industrial aggregation effect. The residential type occupies the greatest proportion in clusters d-f, accounting for 87.2%, 64.7%, and 59.0%, respectively. The educational type has the second largest proportion in clusters e and f, perhaps because a college dormitory area, as a type of living area, is a hotspot for delivery due to its large student user group. Overall, according to the distribution of the land use types in each cluster, the work and living areas extracted by the method of this paper are credible. In addition, the recognition accuracy of the regional function type is related to the strength of the cluster characteristics: the stronger the cluster characteristic, the higher the accuracy of the regional function type identification.

Work Areas
The work areas represent the location distribution of companies, enterprises, and industrial parks. As Hangzhou is famous for its Internet and digital economy, we obtained information on Internet and e-commerce companies from a recruitment website called lagouwang (https://www.lagou.com) to compare with the work areas extracted in this paper. The dataset included the company ID, company name, industry field, company scale, and company address, and all company addresses were converted to latitude and longitude coordinates. After filtering out the records that were outside the research area or that contained errors, we finally had 1005 Internet and e-commerce company items. Figure 11 shows the location distribution of the companies.
We used heat maps to visualize the hotspot regions of the work areas, helping to discover the distribution pattern and potential characteristics. The heat maps were made with the kernel density estimation (KDE) method, which calculates the density of the features in a neighborhood around these features. Before making heat maps, three parameters should be set, namely the output cell size, the search radius, and the population field. The cell size was set to 0.0001 degrees, and the search radius was set to 0.005 degrees. The population field value determines the number of times to count the feature, which could be used to weigh some features more heavily than other features or to allow one point to represent several observations. In this section, the population field was set based on the scale of the companies and the strength of the cluster results' characteristics, as shown in Table 4. estimation (KDE) method, which calculates the density of the features in a neighborhood around these features. Before making heat maps, three parameters should be set, namely the output cell size, the search radius, and the population field. The cell size was set to 0.0001 degrees, and the search radius was set to 0.005 degrees. The population field value determines the number of times to count the feature, which could be used to weigh some features more heavily than other features or to allow one point to represent several observations. In this section, the population field was set based on the scale of the companies and the strength of the cluster results' characteristics, as shown in Table 4.   Figure 12 shows the heat maps made from the Internet and e-commerce companies and the extracted work areas. We discovered that the two heat maps had similar hotspot region distributions. This shows that a large number of young people were working in the Internet-related industries areas. There is a mutual driving relationship between industry agglomeration and talent gathering [40]. 
Figure 12 shows the heat maps made from the Internet and e-commerce companies and from the extracted work areas. The two heat maps have similar hotspot region distributions, which shows that a large number of young people work in the areas where Internet-related industries are concentrated. There is a mutual driving relationship between industry agglomeration and talent gathering [40]. Regional industrial structure plays an important role in attracting talent or human capital [41]: a higher level of Internet-related industrial agglomeration promotes the gathering of young labor, and a higher level of young labor gathering in turn strengthens Internet-related industrial agglomeration in the region. Because the spatial distribution of the delivery customers' workplaces is similar to that of the Internet and e-commerce companies, we infer that Hangzhou's Internet-related industry has a well-planned structure and is developing well, which can provide sufficient employment and development opportunities for young people.
The layout of urban industrial planning affects the distribution of work hotspots. As shown in Figure 13, areas A, B, C, and D are the four high-tech areas planned by the government.
Area A is located in Hangzhou Future Sci-tech City, also called Haichuang Park, which has made great efforts to cultivate industries such as electronic information and new energy and materials. It hosts a series of new high-tech enterprises and parks such as the Alibaba Taobao Town and the Zhejiang Overseas High-level Talent Innovation Park.
Area B is located in the Xiasha District, which stands for the Hangzhou Economic and Technological Development Area. Xiasha is also a district of colleges and universities, 14 in total, which provide a large number of high-quality young talent. Several high-tech industrial parks are located here including the Hangzhou Singapore Science and Technology Park, Hangzhou Smart Valley Mobile Internet Pioneer Park, and Hangzhou High-Tech Enterprise Incubation Park.
Area C is located in the Hangzhou Binjiang High-Tech Industrial Development Zone, which has followed the leadership of the Scientific Outlook on Development and persisted in the "development of high technology and realization of industrialization". It has many famous Internet companies such as NetEase, Hikvision, Alibaba, and Huawei, and has large industrial parks such as the Hangzhou High-Tech Software Park, Xike Technology Park, and Shangfeng E-Commerce Industrial Park.
Area D is located in Hangzhou Northern New City, whose goal is to become the second largest Internet innovation center in Hangzhou. The large industrial parks located here include Hangzhou North Software Park and Paradise e Valley E-commerce Creative Industry Park.
High-tech zones are generally sited in the peripheral areas of the city. Establishing them helps the city gradually change from a single-center form to a multicenter form, and it drives the development of related industries by sharing resources and overcoming negative external effects, thus effectively promoting the formation of industrial clusters.
Area E in Figure 13 is the urban downtown area, where most of the traditional and prosperous business districts are located, such as Wulin, Huanglong, Qianjiang New Town, and Qianjiang Century City. These business districts contain many office buildings and have good transportation and communication conditions and high-quality supporting facilities, which attracts a large number of companies to settle here.

Living Areas
Like the work areas analysis, we made a heat map to localize the hotspot regions in living areas by using the KDE method. The parameters of the method were set as follows: (1) the cell size was set to 0.0001 degrees; (2) the search radius was set to 0.005 degrees; and (3) for the population field, the clusters d, e, and f were set to 7, 3, and 1, respectively. Figure 14 shows the heat map of the living areas.

Transportation is one of the important factors that affect the choice of residential location, because it determines commuting time. The subway, as a form of public transportation, has become a major means of commuting, especially in large cities. Hangzhou has opened subway lines Nos. 1, 2, and 4. We used the subway stations as centers to create buffer zones with a buffer range of 1.5 km and overlaid the buffer zones on the heat map of the living areas; the result is shown in Figure 15. The buffer analysis shows that the living areas are mainly distributed along the subway lines. Statistically, the grids that represent living areas within the buffer zones account for 81.24% of the total.
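The share of living-area grids inside the subway buffers can be computed as in the following minimal sketch. It assumes that each grid is represented by its center point and that distances are measured as great-circle (haversine) distances; the text does not specify either choice. The function names and the demo coordinates are hypothetical and do not correspond to real stations or to the grids of this study.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def share_within_buffer(cells, stations, buffer_km=1.5):
    """Fraction of grid centres lying within buffer_km of any station."""
    if not cells:
        return 0.0
    inside = sum(
        1 for clon, clat in cells
        if any(haversine_km(clon, clat, slon, slat) <= buffer_km
               for slon, slat in stations)
    )
    return inside / len(cells)

# Hypothetical demo: one station and four living-area grid centres.
stations = [(120.16, 30.25)]
cells = [
    (120.16, 30.26),  # about 1.1 km north of the station
    (120.16, 30.27),  # about 2.2 km north of the station
    (120.17, 30.25),  # about 1.0 km east of the station
    (120.19, 30.25),  # about 2.9 km east of the station
]
share = share_within_buffer(cells, stations)
print(f"{share:.2%} of the demo grids fall inside the 1.5 km buffers")
```

Applied to the living-area grids and the stations of lines 1, 2, and 4, the same ratio would correspond to the percentage reported above.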
Four hotspot regions of living areas stand out in the heat map, as shown in Figure 16e. A comparison with online maps such as Google Maps shows that a large number of residential communities and apartments appear in the hotspot regions, as shown in Figure 16a-d. In addition, we found that urban villages, called chengzhongcun in Chinese, appeared in all hotspot regions. Urban villages are formed when expanding modern urban districts encroach on rural settlements, which then become transitional neighborhoods under rapid urbanization [1,42].
Hangzhou has a huge job market for Internet-related industries, which has attracted a large number of young talent in recent years and produces a growing demand for housing, especially for low-rent housing. For hotspot region A, we plotted the center points of the grids from clusters d and e, which were identified as living areas and had the more significant functional characteristics. As shown in Figure 17, although residential communities and apartments were densely distributed in area A, the cluster grids were mostly concentrated in the urban village areas. This shows that urban villages, which provide low-rent housing and low-cost living spaces, have become one of the residential choices for many young people. However, the population of urban villages has strong mobility and instability [43], and many private rental accommodations in urban villages are not registered with the local house lease management office, which makes it difficult to monitor the floating population there. Thus, the method of this paper provides a new way to discover urban villages in cities, which will help to find the hotspot areas of floating populations and promote urban youth population management.

Conclusions
In this paper, food delivery data, as a new data source, were introduced into urban computing, and we proposed a time-series-based clustering method to discover the geographical distribution of urban youth. The synthetic time series of food delivery were constructed by combining the weekdays-weekends 48-h timeline with delivery hot values, which were calculated with the modified Getis-Ord statistic method to exploit the characteristics of the delivery data. Then, an unsupervised k-means clustering method was adopted to classify the time series into six classes. Based on the behavioral characteristics of ordering food delivery, which differ between job and housing areas, the six classes were identified as two functional types (i.e., work areas and living areas). The identification result was verified by comparing it with the official urban planning map: the accuracy of the identified work areas was 66.4-94.6%, while the accuracy of the identified living areas was 59-87.2%.
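The clustering step summarized above can be illustrated with a plain k-means (Lloyd's algorithm) sketch over 48-value series. It is not the implementation used in this study: the farthest-point initialization is an assumption chosen to keep the example deterministic, the hot values are synthetic rather than Getis-Ord statistics, and the demo uses two classes instead of six to stay short. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(series, k):
    """Deterministic start: first series, then repeatedly the farthest one."""
    centroids = [list(series[0])]
    while len(centroids) < k:
        nxt = max(series, key=lambda s: min(squared_distance(s, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(series, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(series, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every series.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(s, centroids[j]))
            for s in series
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def profile(peak_hours, level):
    """Synthetic 48-h series: hours 0-23 are a weekday, 24-47 a weekend day."""
    return [level if h in peak_hours else 0.0 for h in range(48)]

# Synthetic grids: lunch peaks on the weekday vs. evening and weekend peaks.
work_like = [profile({11, 12}, 3.0 + 0.1 * i) for i in range(3)]
home_like = [profile({18, 19, 36, 42}, 3.0 + 0.1 * i) for i in range(3)]
labels, centroids = kmeans(work_like + home_like, k=2)
print(labels)
```

Each resulting class would then be interpreted from the shape of its centroid, for example a class whose centroid peaks at weekday lunch hours suggests a work area.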
Focusing on the urban youth population, especially white-collar workers, the work and living areas were further analyzed by comparing them with the industrial structure and residential layout in Hangzhou. Heat maps were made with the kernel density estimation method to localize hotspot regions in the work and living areas, with the main findings as follows:
1. The spatial distribution of the delivery customers' workplaces was similar to the spatial distribution of the Internet and e-commerce companies of Hangzhou. This shows that a large number of young people are working in the Internet-related industry areas planned by the government, and that there is a symbiotic, mutually driving relationship between Hangzhou's Internet-related industrial agglomeration and young labor gathering. The establishment of high-tech zones has effectively attracted young people, and the young labor gathering also promotes the development of the Internet-related industry.
2. Transportation and living costs are the two important factors that affect the choice of residential location for young people. The hotspot living areas of urban youth are mostly located within 1.5 km of the subway stations, and the subway is becoming one of the most important modes of commuting. Urban villages, which provide low-rent housing and low-cost living spaces, have become one of the residential choices for many young people.

In addition, the findings of this paper can not only give a clearer view of the current state of the city's industrial development but also support floating population management by discovering and monitoring key areas such as urban villages.
In future studies, machine learning methods can be introduced to improve the classification method for food delivery data, and thus, the identified functional types can be divided into more land use categories. By comparing multi-year food delivery data, we can discover changes in the distribution of work and living areas to track the urbanization process.
Author Contributions: Yiming Yan was involved in the design of the study, interpretation of data, drafted the major revisions, and performed the experiments; Yuanyuan Wang contributed to the study design and algorithm improvement; Zhenhong Du drafted part of the manuscript; Feng Zhang conceived the experiments and improved the manuscript; Renyi Liu was involved in the data acquisition and analyses of the data and experiments; and Xinyue Ye was involved in the revision of the manuscript. All authors have read and agreed to the published version of the manuscript.