Exploring the Attractiveness of Residential Areas for Human Activities Based on Shared E-Bike Trajectory Data

Human activities generate diverse and sophisticated functional areas and may impact the existing planning of functional areas. Understanding the relationship between human activities and functional areas is key to identifying real-time urban functional areas based on trajectories. Few previous studies have analyzed the interactive information on humans and regions for functional area identification. The relationship between human activities and residential areas is the most representative for urban functional areas because residential areas cover a wide range and are closely connected with human life. The aim of this paper is to propose the CARA (Commuting Activity and Residential Area) model to quantify the correlation between human activities and urban residential areas. In this model, human activities are represented by hot spots extracted by the Gaussian Mixture Model algorithm, while residential areas are represented by POI (point of interest) data. The model shows that human activities and residential areas present a logarithmic relationship. The CARA model is further assessed by retrieving urban residential areas in Tengzhou City from shared e-bike trajectories. Compared with the actual map, the accuracy reaches 83.3%, thus demonstrating the model's reliability and feasibility. This study provides a new method for functional area identification based on trajectory data, which is helpful for formulating people-oriented urban policies.


Introduction
Human activities not only generate diverse and sophisticated functional areas, but may also change the existing planning of functional areas [1]. Understanding the relationship between human activities and urban regions is key to functional area identification [2]. In the big data era, various trajectory data can be easily obtained because of the popularity of location-aware devices and smart sensors in a city. These trajectory data not only convey the underlying information on people and cities, but also imply the interactive information of people and the urban environment [3,4]. Therefore, trajectory data provide a new opportunity for deeply mining the inner relationship between human activities and the surrounding environment. Exploring the inner relationship and identifying functional areas based on trajectory data are helpful for city planners and administrators to comprehend urban dynamics and evaluate urban environments timely and rapidly.
The study presented here is focused on "identifying residential areas" and "determining the relationships between human activities and functional areas." The related works on these two aspects are summarized.
Residential areas can be determined by identifying urban functional areas. Conventional urban functional area identification is largely based on the land use status and questionnaire surveys [21], which are time-consuming, labor-intensive, and cannot reflect the city structure in real time [22]. Some researchers also used POI (point of interest) data to identify functional areas [23,24]. This method ignores the inner correlation between human activities factors and urban functional areas; thus, the identified results cannot reflect the influence of human activities on the dynamic changes of region social function [25].
Big data provides an opportunity to identify functional areas by data-driven methods. Accompanied with the rapid development of LBS (location-based service) technology, various trajectory data can be easily obtained, such as taxi trajectories, bicycle trajectories, e-bike trajectories, and mobile phone data. Trajectory data include the geographic locations of objects that change over time [26]; thus, these data are a fine-scale representation of the spatiotemporal footprint of human activities. Thus, scholars have combined trajectory data and other data sources (POI, social media data, satellite image) to improve the identification results [27][28][29]. For instance, to fully exploit taxi trajectory and POI data, a topic-based inference model is applied to discover urban functional areas [1]. In the model, trajectory data reflect human movements from one place to another while the POIs located in a region reflect the function of that region. This method implicitly considers spatial interaction information rather than the influence of region social function on human mobility.
The collective dynamic of a whole city has a strong spatiotemporal pattern, which suggests that the population of each region in a city varies with time due to different human activities. These population variations can be represented by human activities identified via trajectory data, such as picking up a bicycle or making a phone call [2]. This temporal variation shows the interactive information between human activities and locations with certain social functions. Generally, regions with different social functions have different temporal activity variations. For instance, residential areas are characterized by a large travel volume, although travel is concentrated in the two commuting peak periods. In contrast, railway stations have a large traffic volume all day except late at night. Temporal activity variations can therefore be interpreted as indicators of social functions and are thus analogous to the reflection spectrum curve of objects in remote sensing images [11]. Therefore, temporal activity variation can be assessed by advanced machine learning algorithms, such as SVM (Support Vector Machine), EM (Expectation Maximization), deep learning, or random forest [8,9,30], and used to identify urban functional areas through trajectory data [31,32]. The temporal activity variation-based method of functional area identification considers the interactive information between human activities and functional areas, although the interactive information is only qualitatively described. However, the standard temporal activity variation pattern of each functional area is difficult to obtain; thus, the method is limited in applicability and robustness. In addition, the temporal activity variation pattern of each functional area is obtained based on prior knowledge or by training on a large number of precise samples [10].
Human activities and urban regions are closely intertwined [33]. Understanding the interactive information between human activities and functional areas is essential for functional area identification and land use analysis [23,34,35]. The relationship between human activities and functional areas has always been a hot topic in the urban transportation and urban planning fields, and plays an important role in public policymaking [36]. Many scholars have focused on the relationship between travel behavior and land use, and they found that land use patterns determine the need to travel, and travel behaviors also affect the land use pattern and lead to improved urban development [20]. Additionally, the influence of the urban built environment on some special travel behaviors was explored. For instance, Boarnet and Crane built a model to study the relationship between the urban built environment and nonwork travel behavior, and found significant linkages between land use and travel speed and distance in San Diego [33]. Handy et al. studied the influence of the built environment on commuting behavior and found that the influence was not only statistically significant but also practically important [37]. These studies were mainly carried out using travel survey data, which are time-consuming to collect and cannot describe human movements in real time. With the popularity of LBS technology, the relationship between travel behavior and the built environment was studied based on trajectory data. Ahmadreza et al. built a linear mixed model to explore the influence of the factors that affect bicycle flows (e.g., weather, temporal, bicycle infrastructure, land use, and built environment) based on bicycle trajectory data [38]. Wafic et al. studied the bike sharing demand under the impacts of the built environment and weather conditions in order to optimize the bicycle-sharing system [39].
In these studies, the associated models mainly represent the comprehensive influence of the external environmental factors on human travel behavior and cannot be used to infer region social function from trajectory data.
Previous studies on "functional area identification" and the "relationships between human activities and functional areas" have various shortcomings. First, most of the studies on functional area identification do not consider the interactions between human activities and regional social functions, which limits the accuracy and real-time performance of functional area identification. Second, the majority of the relationships between human activities and functional areas focus on how human travel behavior is affected by the external environment, which cannot be used to inversely identify functional regions. Third, most study data are concentrated on mobile phone data, taxi and bicycle trajectory data, and survey data, while few studies have focused on shared e-bike trajectory data. Thus, this paper is conducted based on shared e-bike trajectory data in Tengzhou City and aims to provide insights on the relationship between human activities and residential areas and guidance for residential area delineation.

Study Area and Dataset
Shared electric bikes (e-bikes for short) are an emerging green travel mode. Compared with bicycles, e-bikes have distinct advantages on long trips, during periods of poor air quality or weather, and in areas with challenging topography [40]. The popularity of shared e-bikes is beneficial to the sustainable development of urban transportation, regional socioeconomics, and the environment. With the use of an electronic fence, shared e-bikes can be picked up or returned freely within the station-free bike sharing system using a smart phone. Therefore, they are widely used in the downtown areas of Tengzhou City.
Tengzhou City, a prefecture-level city, is located in southwestern Shandong Province in eastern China. The geographic coordinates of the city are 116°49′-117°24′ E and 34°50′-35°17′ N. Tengzhou is one of the most beautiful ecotourism cities in China, and the city includes four blocks (located in the central area) and seventeen towns (located in the surrounding area) (see Figure 1).
Human activity data were obtained from the real-time captured GPS trajectory points from May to July 2018 in the urban area of Tengzhou City, Shandong Province. Each e-bike sends GPS information to a specified internet address every minute through an integrated GPS communication module. The trajectory points of shared e-bikes can be obtained by a high-frequency timer system based on an HTTP (hyper-text transfer protocol) data acquisition interface provided by the shared e-bike operator. The acquired trajectory data set contains 2,128,290 GPS points covering the downtown areas (Longquan block, Jinghe block, Shannan block, and Beixin block) of Tengzhou City, which involves 960 e-bikes. Each of the raw GPS points includes the e-bike ID (Station ID), data acquisition time (timestamp), geographic location (latitude and longitude coordinates), and predicted mileage (anticipated mileage).

Methodology
Commuting behavior links workplace areas and residential areas and is one of the important activities in human daily life. This paper proposes a model to quantify the inner correlation between human commuting activities and residential zones from trajectory data. The raw trajectories are cleaned to obtain OD (Origin-Destination) points, which imply that certain activities are carried out at certain times in a region. Based on these OD points, temporal mobility patterns are analyzed to form the e-bike trajectory data related to residential areas. Then, urban hot spots are detected by the GMM (Gaussian Mixture Model), a self-adapting machine learning algorithm, in which the Bayesian information criterion (BIC) is used to determine the key parameter, that is, the number of urban hot spots. POI data are point data used to describe physical entities in the real world, and their spatial distribution reflects the function of a region. Combining POI data and urban hot spots, the CARA model is constructed quantitatively to mine their inner correlation. The CARA model depicts the correlation between human commuting activities and residential zones. Finally, based on the CARA model, the residential zone and boundary in Tengzhou City are re-delineated through e-bike trajectory data to validate the feasibility of the model. The overall workflow is illustrated in Figure 2.

Gaussian Mixture Model
In the study, hot spot detection is a core technology. To accommodate the nonuniformity of trajectory data, a self-adapting machine learning algorithm, the GMM (Gaussian mixture model), is adopted. The GMM is a classical machine learning algorithm that is widely used for feature recognition, data classification, and image segmentation [41,42]. The GMM assumes that all the data are generated from a superposition of a finite number of Gaussian distributions with some unknown parameters. In fact, almost any density can be approximated to arbitrary accuracy if sufficient Gaussian components are used and the parameter values (mean, weight, and covariance values) of each Gaussian component in the linear combination are adjusted. In this paper, a GMM with two variables is applied to GPS trajectory data to detect activity hot spots. Let $x = \{x_i;\ i = 1, 2, \cdots, n\}$ and $y = \{y_i;\ i = 1, 2, \cdots, n\}$ denote the GPS trajectory data sets, where $(x_i, y_i)$ are the longitude and latitude of an arbitrary GPS point, respectively, $i$ is the index of the GPS points, and $n$ is the total number of trajectory points. To cluster a trajectory data set of $n$ points into $K$ labels, the GMM assumes that each observation $(x_i, y_i)$ is independent of the label $\Omega_k$, where $\Omega_k$ is the sample set of all the trajectory points with label $k$. The corresponding probability density can be expressed as follows:
$$p(x_i, y_i \mid \Pi, \Theta) = \sum_{k=1}^{K} \pi_k \, \Phi(x_i, y_i \mid \theta_k)$$
where $\Pi = \{\pi_k \mid k = 1, 2, \cdots, K\}$ is the set of weights for each component and satisfies the constraints $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. $\Phi(x_i, y_i \mid \theta_k)$ is a Gaussian component of the mixture model, parameterized by the symbol $\theta_k$. The symbol $\theta_k$ includes the mean vector $\mu_k \in \mathbb{R}^{2}$ and covariance matrix $\Sigma_k \in \mathbb{R}^{2 \times 2}$ of the $k$th component, which can be efficiently estimated with the expectation maximization (EM) algorithm.
$\Phi(x_i, y_i \mid \theta_k)$ describes the Gaussian distribution of the trajectory data, which can be expressed as follows:
$$\Phi(x_i, y_i \mid \theta_k) = \frac{1}{2\pi \, |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (z_i - \mu_k)^{\mathsf{T}} \Sigma_k^{-1} (z_i - \mu_k) \right), \qquad z_i = (x_i, y_i)^{\mathsf{T}}$$
$\Theta$ is the parameter set for all the components, and $\Theta = \{\theta_k \mid k = 1, 2, \cdots, K\}$. As the observations $(x_i, y_i)$ are modeled as independent, the joint conditional density of the trajectory data set $(x, y) = \{(x_i, y_i);\ i = 1, 2, \cdots, n\}$ can be modeled as follows:
$$p(x, y \mid \Pi, \Theta) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, \Phi(x_i, y_i \mid \theta_k)$$
The estimation of the model parameters $(\pi, \mu, \Sigma)$ is key to constructing a well-suited GMM from random data. In this paper, the EM algorithm is used to estimate the parameters with a predefined number $K$ of components until the iterative process converges [43].
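The mixture density and its EM estimation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the small regularization term `reg`, the starting values, and the synthetic two-cluster data are all hypothetical, and the number of components is fixed in advance as the text requires.

```python
import math
import random


def gaussian_pdf(x, y, mu, cov):
    """Bivariate normal density of one component with mean mu and 2x2 covariance cov."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    det = sxx * syy - sxy * sxy
    dx, dy = x - mu[0], y - mu[1]
    # Quadratic form (z - mu)^T Sigma^-1 (z - mu) using the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def log_likelihood(points, weights, mus, covs):
    """Log of the joint density: sum over points of log(sum_k pi_k * Phi_k)."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, y, m, c) for w, m, c in zip(weights, mus, covs)))
        for x, y in points
    )


def em_step(points, weights, mus, covs, reg=1e-6):
    """One EM iteration; reg is a small diagonal term that keeps covariances invertible."""
    # E-step: responsibility of each component for each point.
    resp = []
    for x, y in points:
        terms = [w * gaussian_pdf(x, y, m, c) for w, m, c in zip(weights, mus, covs)]
        total = sum(terms)
        resp.append([t / total for t in terms])
    # M-step: re-estimate the weight, mean, and covariance of each component.
    new_w, new_mu, new_cov = [], [], []
    for k in range(len(weights)):
        nk = sum(r[k] for r in resp)
        mx = sum(r[k] * p[0] for r, p in zip(resp, points)) / nk
        my = sum(r[k] * p[1] for r, p in zip(resp, points)) / nk
        sxx = sum(r[k] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nk + reg
        syy = sum(r[k] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nk + reg
        sxy = sum(r[k] * (p[0] - mx) * (p[1] - my) for r, p in zip(resp, points)) / nk
        new_w.append(nk / len(points))
        new_mu.append((mx, my))
        new_cov.append(((sxx, sxy), (sxy, syy)))
    return new_w, new_mu, new_cov


# Synthetic demo: two well-separated clusters stand in for trajectory origin points.
random.seed(0)
points = [(random.gauss(0.0, 0.5), random.gauss(0.0, 0.5)) for _ in range(200)]
points += [(random.gauss(5.0, 0.5), random.gauss(5.0, 0.5)) for _ in range(200)]

weights = [0.5, 0.5]                      # hypothetical starting values
mus = [(1.0, 1.0), (4.0, 4.0)]
covs = [((1.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]
ll_start = log_likelihood(points, weights, mus, covs)
for _ in range(30):
    weights, mus, covs = em_step(points, weights, mus, covs)
ll_end = log_likelihood(points, weights, mus, covs)
```

In practice, longitude and latitude would normally be projected to a planar coordinate system first, and a tested library implementation would be preferable to a hand-written loop for a data set of this size.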

Bayesian Information Criterion
The number of components in the GMM needs to be defined in advance. A good model can balance the relationship between complexity and descriptive ability. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are frequently used to find a good model for determining the number of components K in a bivariate GMM. Compared with the AIC, the BIC has advantages in modeling convergence for large data sets [44]. Therefore, the BIC is adopted in this paper to determine the number of components in the GMM.
The BIC, which was first proposed by Schwarz [45], uses a Bayesian framework for model selection to establish a model with the maximum posterior probability or maximum likelihood under certain conditions. This approach has been widely used in audio recognition and image processing and is characterized by high efficiency, high accuracy, and strong robustness [46]. The BIC can be used to determine the number of components in a GMM and is usually used in combination with the expectation maximization algorithm. The mathematical expression of the BIC is as follows:
$$\mathrm{BIC}(k) = k \ln(n) - 2 \ln L(\theta; x, y)$$
where the positive integer $k$ denotes the number of components in the GMM, $n$ denotes the size of the sample data set, and $L(\theta; x, y)$ is the likelihood function of the GMM. As $k$ increases from 1 to a large number, the best GMM is obtained when the BIC value is minimized, which can be written in the following form:
$$K = \arg\min_{k} \mathrm{BIC}(k)$$
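The selection rule can be sketched as follows, assuming the penalty term is k ln(n) with k the number of components, as the definitions in the text suggest. Many library implementations instead penalize the number of free parameters, so absolute values may differ. The function names and the log-likelihood values below are hypothetical and purely illustrative.

```python
import math


def bic(log_likelihood, k, n):
    """BIC in the form assumed from the text: k * ln(n) - 2 * ln L."""
    return k * math.log(n) - 2.0 * log_likelihood


def select_k(log_likelihoods, n):
    """Return the k with the smallest BIC, plus all scores.

    log_likelihoods maps each candidate k to the maximized log-likelihood
    of a GMM fitted with k components.
    """
    scores = {k: bic(ll, k, n) for k, ll in log_likelihoods.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical log-likelihoods for k = 1..3 on n = 1000 points (illustrative only).
best_k, scores = select_k({1: -1000.0, 2: -900.0, 3: -899.0}, n=1000)
```

Going from two to three components barely improves the likelihood here, so the penalty term makes the two-component model the minimizer.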

Temporal Mobility Pattern of Shared E-Bikes
As the raw GPS trajectory points are disordered, the raw trajectory data are cleaned first by the MCHLC (multirule-constrained homomorphic linear clustering) algorithm [47]. After data cleaning, 143,910 trips were obtained. The OD pairs of each trip were used to analyze citizens' travel behaviors. Figure 3A presents the aggregated hourly rhythm of shared e-bike usage. The variation in hourly rhythm indicates that shared e-bike travel is related to daily behaviors on different days of the week. There is a similar variational pattern from Monday to Friday and a distinct pattern on weekends.
On weekdays, there are usually three peak periods in a day: morning, midday, and evening. Among the three peaks, the evening peak is the highest, the morning peak is second, and the midday peak is the lowest. Dong (2018) noted a similar result, with three obvious peaks (morning, midday, and evening) of e-bike usage per day in Zhongshan City [17]. This pattern is also characteristic of travel behavior in small- and medium-sized cities in China. Although a similar pattern is observed on all weekdays, the total usage on Friday (especially after the evening rush hours) approaches that on the weekend. In contrast to weekdays, only the evening peak is obvious on weekends. E-bike usage fluctuates slightly from 7:00 a.m. to 3:00 p.m., and a small peak occurs at noon (Figure 3B).
There are obvious discrepancies in the travel patterns on weekdays and weekends. The travel pattern on weekdays suggests that the morning and evening rush periods for e-bikes in the central area of Tengzhou City are 7:00-8:00 and 17:00-19:00, when people are highly mobile because of commuting. Therefore, the trips during rush hours on weekdays are mainly home-to-work journeys, and the O-D points of these trips are strongly associated with residential and workplace areas, which supports the subsequent investigation of the correlation between commuting activities and residential areas.
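The hourly rhythm and the weekday rush-hour selection can be illustrated with a short sketch. The trip record layout (a dict with `start` and `origin` keys) and the function names are assumptions made for illustration; the 7:00-8:00 morning window follows the text.

```python
from collections import Counter
from datetime import datetime


def hourly_rhythm(trips):
    """Count trip starts by (is_weekend, hour of day)."""
    counts = Counter()
    for trip in trips:
        start = trip["start"]
        counts[(start.weekday() >= 5, start.hour)] += 1
    return counts


def morning_rush_origins(trips, first_hour=7, last_hour=8):
    """Origins of weekday trips that start in [first_hour:00, last_hour:00)."""
    return [
        trip["origin"]
        for trip in trips
        if trip["start"].weekday() < 5 and first_hour <= trip["start"].hour < last_hour
    ]


# Hypothetical trips: (longitude, latitude) origins with start times.
trips = [
    {"start": datetime(2018, 5, 7, 7, 30), "origin": (117.16, 35.08)},   # Monday morning
    {"start": datetime(2018, 5, 7, 18, 10), "origin": (117.17, 35.09)},  # Monday evening
    {"start": datetime(2018, 5, 5, 7, 15), "origin": (117.15, 35.07)},   # Saturday morning
]
rhythm = hourly_rhythm(trips)
origins = morning_rush_origins(trips)
```

Only the Monday-morning trip survives the filter; the Saturday trip falls in the same hour but is excluded as a weekend trip.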

Hot Spot Detection Based on the Gaussian Mixture Model
The travel pattern of shared e-bikes on weekdays shows that trips during morning and evening rushes are concentrated mainly on job-home journeys. This finding suggests that the origin point of trips during morning rush hour and the destination point of trips during evening rush hour are highly correlated with residential areas. To measure the correlation between commuting activity and residential area, the origin points of trips during morning rush hour on weekdays are selected.
The GMM is adopted to detect activity hot spots because the multiple components in the model have a nonuniform distribution. To detect activity hot spots with the GMM, the number of Gaussian components K is first determined based on the BIC. To obtain the best GMM, a series of BIC values are calculated for values of K varying from 1 to 100. Figure 4 shows that the best model is composed of 32 Gaussian components, that is, the value of K corresponding to the minimum BIC value is 32.
The distribution of origin points (green dots in Part A in Figure 5) in the morning rush hour period is simulated by the GMM with the best number of components at K = 32. The points with nonuniform densities can be well depicted by the GMM with different component distributions (see Part B in Figure 5).
The nonuniform density distribution implies that the activity degrees at different locations vary with the uneven population distribution. The probability of the GMM denotes the activity intensity, and the shape of each component reflects the dispersion degree, center position, and maximum activity intensity of each activity hot spot. Thirty-two Gaussian components represent the regions in which activities are locally concentrated during morning rush hours, and the peak value of each component reflects the degree of activity intensity in the local region. All the activity hot spots in the study area are divided into three levels based on the average index (Ave = 0.067) and average deviation index (Ave_Dev = 0.037) of the peak value distribution. Level 1 denotes the activity intensity of hot spots larger than the average index, indicating a large traffic flow for shared e-bikes in this region. Level 2 denotes an activity intensity value between the average index and average deviation index values, indicating a typical traffic flow in the region. Level 3 denotes an activity intensity lower than the average deviation index value, indicating that few people choose shared e-bikes in the region. The standard deviation indicator was applied to delimit the urban hotspot scope [48]. Notably, Borruso applied three standard deviation units to delimit the CBDs (central business districts) of two midsize urban areas in northeastern Italy [49]. To further examine the hotspot "cores," one standard deviation and two standard deviations were computed for the hotspot centrality. The result suggests that one standard deviation unit is the best indicator for hotspot scope delineation in our study. The hotspot "core" information, as an expression of human activity, is used to construct the model of the relationship between human activity and functional zone.
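The three-level grouping of hot spots by peak value can be written as a small function. The thresholds are the Ave and Ave_Dev values reported above; the handling of values exactly on a threshold and the example peak values are assumptions, since the text does not specify them.

```python
AVE = 0.067      # average index reported in the text
AVE_DEV = 0.037  # average deviation index reported in the text


def hot_spot_level(peak, ave=AVE, ave_dev=AVE_DEV):
    """Map a component's peak activity intensity to Level 1, 2, or 3."""
    if peak > ave:
        return 1  # above the average index: large shared e-bike flow
    if peak >= ave_dev:
        return 2  # between the two indices: typical flow (boundary handling assumed)
    return 3      # below the average deviation index: little e-bike use


# Hypothetical peak values for three Gaussian components.
peaks = [0.120, 0.050, 0.010]
levels = [hot_spot_level(p) for p in peaks]
```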
The activity hot spots at different levels are displayed in Figure 6 and the distribution of activity hot spots at different levels was analyzed with the Standard Error Ellipse function in ArcGIS software. The Level 1 hot spots are mainly distributed at the junction of the Beixin, Jinghe, and Longquan blocks. The flatness value of the error ellipse is 0.35, and the direction angle is 67.87°. The direction angle of the error ellipse indicates that the residents are more active in the east-west direction than in the north-south direction, which is consistent with the distribution of residential areas in the city center (residential areas are scattered along the main east-west roads, including Jiefang Road, Jinghe Road, and Xueyuan Road).
Compared with the Level 1 hot spots, the Level 2 hot spots display a more distinct trend, with a flatness value of 0.41 and direction angle of 25.75° for the error ellipse. The Level 2 hot spots are mainly located in the north and northeast parts of the city, especially in the emerging developing regions located in the northern part of the Longquan block. The Level 3 hot spots are located near the periphery of the city. It is noteworthy that the outer periphery of the city is characterized by small rural villages according to the actual Gaode Map.
The low activity intensity levels imply that the usage of shared e-bikes is less common in rural regions than in urban regions.
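The flatness and direction angle of an error ellipse can be approximated from the covariance of the hot spot locations. The sketch below is not the ArcGIS routine: it assumes planar (projected) coordinates, defines flatness as 1 - b/a for semi-axes a and b, and reports the major-axis direction clockwise from north. All three choices are assumptions, as the text does not define them.

```python
import math


def deviational_ellipse(points):
    """Return (semi_major, semi_minor, flatness, angle_from_north_deg)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Eigenvalues of the 2x2 covariance matrix give the squared semi-axes.
    half_trace = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    a = math.sqrt(half_trace + spread)
    b = math.sqrt(max(half_trace - spread, 0.0))
    # Major-axis direction, counterclockwise from the x (east) axis.
    phi = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Convert to an angle measured clockwise from north, folded into [0, 180).
    angle = (90.0 - math.degrees(phi)) % 180.0
    flatness = 1.0 - b / a  # assumed definition of "flatness"
    return a, b, flatness, angle


# Hypothetical hot spot centres stretched along the east-west axis.
east_west = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
a, b, flatness, angle = deviational_ellipse(east_west)
```

Under this convention, an angle near 90° corresponds to an east-west elongation and an angle near 0° to a north-south one, which matches how the 67.87° value is read in the text.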

CARA Model Construction
To verify the aforementioned hypothesis that the hot spots of origin points during morning rush hours are located in residential zones, POIs obtained from the Baidu Map API (http://lbsyun.baidu.com/) using the Python language are introduced and categorized in accordance with Baidu's internal POI standards (http://lbsyun.baidu.com/index.php?title=lbscloud/poitags). The POI data collected for this study include multiple categories, such as residential zones, shopping malls, transportation facilities, educational facilities, financial institutions, tourist attractions, medical facilities, restaurants, corporate facilities, and community services. These POIs are point features that describe entities including geolocation coordinates (latitude and longitude coordinates), names, and classes; thus, they not only reflect the basic activities of urban residents (e.g., living, working, commuting, and recreation) but also the function of a region [50]. In total, 16,626 data points were acquired and 2573 POIs were located in residential areas. A coordinate transformation was performed due to the encryption form of the POI data.
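The text does not state which coordinate transformation was applied. As one plausible illustration, Baidu coordinates are commonly delivered in the BD-09 system, and a widely circulated approximate conversion between BD-09 and GCJ-02 is sketched below. The constants are those of that community formula, not an official specification, and the function names are hypothetical; a further step would be needed to reach WGS-84.

```python
import math

X_PI = math.pi * 3000.0 / 180.0  # constant used by the commonly circulated formula


def bd09_to_gcj02(lng, lat):
    """Approximate conversion from BD-09 to GCJ-02 (degrees in, degrees out)."""
    x, y = lng - 0.0065, lat - 0.006
    z = math.hypot(x, y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def gcj02_to_bd09(lng, lat):
    """Approximate inverse of bd09_to_gcj02."""
    z = math.hypot(lng, lat) + 0.00002 * math.sin(lat * X_PI)
    theta = math.atan2(lat, lng) + 0.000003 * math.cos(lng * X_PI)
    return z * math.cos(theta) + 0.0065, z * math.sin(theta) + 0.006


# Hypothetical POI location near the study area, given in BD-09.
bd_lng, bd_lat = 117.16, 35.08
gcj_lng, gcj_lat = bd09_to_gcj02(bd_lng, bd_lat)
back_lng, back_lat = gcj02_to_bd09(gcj_lng, gcj_lat)
```

The two functions are only approximate inverses of each other, so a round trip reproduces the input to within a small tolerance rather than exactly.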
The POI data for residential zones were aggregated to an appropriate spatial analysis unit to reveal the hot spot distribution. The spatial analysis unit size ranged from 100 to 500 m with an interval of 50 m. The optimum unit size was found to be 200 m, which is within the range suggested by some urban geographers (200-300 m is suitable for fine-spatial-resolution data in urban centers) [51,52]. The density distribution of residential POIs in the study area is displayed with a regular grid cell (200 m × 200 m) in Figure 6. The density distribution of POIs indicates that residential zones in Tengzhou City are mainly located in the city center at the junction of the four jurisdictional blocks, and these zones extend to the north and east. Considering the distribution of residential POIs, the distribution of hot spots with different intensities verifies the speculation that activities during morning rush hours are concentrated in residential areas.
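The aggregation step can be illustrated with a short sketch. It assumes the POI coordinates have already been transformed and projected to metres; the function name `grid_counts` and the sample coordinates are hypothetical and serve only to show the binning.

```python
from collections import Counter
from math import floor

CELL_SIZE_M = 200.0  # optimum unit size reported in the text

def grid_counts(points, cell_size=CELL_SIZE_M):
    """Count points per square grid cell; points are (x, y) in metres."""
    counts = Counter()
    for x, y in points:
        # Integer (column, row) index of the cell that contains the point.
        counts[(floor(x / cell_size), floor(y / cell_size))] += 1
    return counts

# Hypothetical projected POI coordinates, for illustration only.
sample_pois = [(10.0, 20.0), (150.0, 199.0), (210.0, 30.0), (405.0, 410.0)]
poi_counts = grid_counts(sample_pois)
```

Repeating the count for cell sizes from 100 to 500 m in 50 m steps only requires changing the `cell_size` argument.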
We further model the activity hot spots (human activities) and the POIs (region function) to investigate the relationship between these factors and verify the feasibility of residential area identification based on the trajectories of shared e-bikes. As with the statistical distribution of POIs, the activity intensity of each spatial analysis unit can be calculated by the GMM. The spatial units with nonzero POI distributions in residential areas are selected for modeling, and these units are represented by gray dots in Figure 7.
Before modeling, noise points should be filtered out to reduce their influence on the model. The distributions of POIs and spatial grid units corresponding to different activity intensity values are statistically analyzed in Figure 8. The points in Part A of Figure 7 denote the spatial units with activity intensities close to zero (values in the range of 0-0.001), as denoted by the cyan bars in Figure 8. These spatial units are uncommon, accounting for only 2.37% of all units, and the POIs of these units account for only 3.05% of all POIs. Such spatial grid units are mainly located in the periphery of the city and characterized by low activity intensities and high POI values, suggesting that residents in these areas seldom select e-bikes for travel. Additionally, this finding indicates that shared e-bike users are only a limited portion of the population, which is a common issue for trajectory data. The points in Part B of Figure 7 denote the spatial grid units with activity intensity values in the range of 0.005-0.165, as denoted by the yellow bars in Figure 8. The statistical results indicate that the activity intensity values of these spatial units span a wide range and reach 69.7%; however, these grid units account for only 20.84% of all grid units, and only 20.74% of POIs are in these grid units. Such spatial grid units are mainly located in the downtown area of Tengzhou City, which is characterized by high intensity values but low POI values, indicating the mixed functionality of the downtown area. Therefore, the points in Parts A and B in Figure 7 cannot be reliably used to identify residential areas and are excluded from modeling.
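The filtering rule can be sketched as a simple partition of grid units by activity intensity. The band limits come from the text; how values that fall exactly on a limit are assigned is an assumption, as are the function name `classify_unit` and the sample units.

```python
PART_A_MAX = 0.001  # upper limit of the near-zero band (Part A)
MODEL_MAX = 0.005   # upper limit of the modeling band; Part B starts here

def classify_unit(intensity, poi_count):
    """Assign a grid unit to Part A, the modeling band, or Part B.

    Boundary handling (half-open intervals) is an assumption; the text
    only gives the ranges 0-0.001, 0.001-0.005, and 0.005-0.165.
    """
    if poi_count == 0:
        return "no_poi"  # units without residential POIs are not modeled
    if intensity < PART_A_MAX:
        return "part_a"
    if intensity < MODEL_MAX:
        return "model"
    return "part_b"

# Hypothetical (intensity, residential POI count) pairs, for illustration only.
units = [(0.0004, 12), (0.0021, 5), (0.0300, 2), (0.0040, 0)]
model_units = [u for u in units if classify_unit(*u) == "model"]
```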
The points with activity intensity values between 0.001 and 0.005 were used for modeling. Figure 7 shows that the activity intensities of the trajectory and POI distributions exhibit a many-to-one functional relationship because of the uncertainty in human activities. Thus, the activity intensity observations need to be normalized. To explore the maximum impact of the activity intensity on residential identification, a maximization method similar to the MVC (maximum value composite) approach of remote sensing is adopted. That is, the spatial grid units were divided into groups according to their activity intensity values, with an interval of 0.005. Then, the spatial grid unit with the maximum intensity value in each interval was selected for modeling, as denoted by the red triangles in Figure 7. The CARA model is expressed by a logarithmic function:

$$RA_{cnt} = 23.7036 + 1.36889 \ln(CA - 0.00362) \tag{6}$$

where $CA$ is the intensity value of the commuting activity of each spatial grid unit, which can be obtained from the hot spot distribution after GMM processing, and $RA_{cnt}$ is the number of residential POIs in the corresponding grid. The R-squared value is 0.876, indicating that the logarithmic model can effectively reflect the correlation between human activity and residential functional areas with high accuracy. It is noteworthy that the boundary of the residential area can also be delineated based on this specific logarithmic model. If the activity intensity value of a spatial unit is lower than 0.0036 (the green triangle in Figure 7), then the probability of the grid unit being classified as a residential area is negligible.
Based on the CARA model, the residential areas of Tengzhou City can be re-delineated, as shown in Figure 9. The result is consistent with the urban planning of Tengzhou City and further corroborates the aforementioned speculation that citizens start their daily activities in residential areas during the morning rush hour period. In addition, the rural villages located at the periphery of the city can be identified, even if the density of the POI data is low.
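As an illustration of how the fitted logarithmic relationship behaves, the sketch below evaluates it and its inverse using the coefficients 23.7036, 1.36889, and 0.00362 reported for the CARA model. The function names `cara_ra_cnt` and `cara_inverse` are hypothetical, and returning `None` where the logarithm is undefined is a design choice of this sketch, not part of the paper.

```python
import math

# Coefficients of the fitted CARA model as reported in the text.
A, B, CA0 = 23.7036, 1.36889, 0.00362

def cara_ra_cnt(ca):
    """Predicted number of residential POIs for commuting activity intensity ca.

    Returns None when ca <= CA0, where the logarithm is undefined; such units
    fall below the residential boundary discussed in the text.
    """
    if ca <= CA0:
        return None
    return A + B * math.log(ca - CA0)

def cara_inverse(ra_cnt):
    """Activity intensity at which the model predicts ra_cnt residential POIs."""
    return CA0 + math.exp((ra_cnt - A) / B)

boundary_intensity = cara_inverse(0.0)  # intensity where the prediction reaches zero
```

Because the logarithm falls off very steeply near its shift, the intensity at which the predicted POI count reaches zero differs from 0.00362 by less than 1e-7, which is consistent with using roughly 0.0036 as the boundary for residential areas.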

Delineated Result of Urban Residential Areas and Evaluation
By comparison with Baidu Map information, the function of each grid unit can be obtained through the geographic coordinate inverse calculation module. As a grid unit may cover two different land use parcels, a score is used to evaluate the relevance of each grid unit to the residential area. The score of each grid unit is calculated from the area index $A_{index}$:

$$A_{index} = \frac{Area_{intersection}}{Area_{grid}} \tag{7}$$

where $Area_{intersection}$ denotes the intersection area between the grid unit of the delineated results and the actual residential area, $Area_{grid}$ denotes the area of each grid unit, and $A_{index}$ denotes the relevance of the grid unit to the residential areas. If $A_{index}$ is less than 0.5, then the grid unit is considered independent of the residential areas; otherwise, the grid unit is regarded as part of a residential area. A score for each grid unit can be obtained from the following criteria:

$$Score = \begin{cases} 1, & A_{index} \geq 0.5 \\ 0, & A_{index} < 0.5 \end{cases} \tag{8}$$

$$Precision = \frac{\sum_{i=1}^{n} Score_i}{n} \tag{9}$$

where $n$ is the number of grid units in the identified result. The precision is the relevance of the delineated result to the residential areas and is calculated from the score of each grid unit.
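The scoring rule can be sketched in a few lines. The sketch assumes a binary score (1 when at least half of the grid unit overlaps an actual residential area, 0 otherwise) and takes the precision as the mean score over the identified grid units, a reading that reproduces the 12-of-15 result (80%) reported for Part A of Figure 9. The function names and the intersection areas are hypothetical.

```python
GRID_AREA_M2 = 200.0 * 200.0  # area of one 200 m x 200 m grid unit

def area_index(intersection_area, grid_area=GRID_AREA_M2):
    """Share of the grid unit covered by an actual residential area."""
    return intersection_area / grid_area

def score(a_index, threshold=0.5):
    """1 if the grid unit counts as residential, 0 otherwise (assumed binary)."""
    return 1 if a_index >= threshold else 0

def precision(intersection_areas, grid_area=GRID_AREA_M2):
    """Mean score over all grid units in the identified result."""
    scores = [score(area_index(a, grid_area)) for a in intersection_areas]
    return sum(scores) / len(scores)

# Hypothetical intersection areas (m^2) for 15 grid units, 12 of them
# at least half covered by residential land; for illustration only.
example_areas = [40000.0] * 6 + [26000.0] * 6 + [12000.0] * 3
example_precision = precision(example_areas)
```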
Compared with the actual map, four typical regions in Figure 9 are selected to further clarify the reliability of the results. These regions are located to the north, east, west, and south of the downtown area of the city. Part A lies in the emerging urban area in the northern part of the city; this area includes three large communities: Apple Garden, Tongsheng Garden, and Champion House. The identification result in Part A includes 15 grid units, twelve of which are highly associated with the aforementioned communities. The precision index reaches 80%.
Part B is a cluster of modern communities located in the eastern part of the city, including the Voyage First International Community, Hancui Garden, Moxiang Holy House, and Huilong Harmony Garden. Residents in these communities prefer to travel by shared e-bikes, thus forming high-intensity hot spots. All the grid units are associated with residential areas.
Part C is another community cluster located west of the downtown area. Here, communities of different scales are built next to each other, and the largest are Jinkongfu Community, Western Song Community, and Geological Home. These communities are not located along the major road in the area (Parallel Avenue), whereas the trajectories mainly follow this road. Thus, the precision index is comparatively low, at 78.6%.
Unlike Part C, Part D is a community cluster located along Datong Avenue, and it is composed of the different districts of Venice Community. Hence, the results obtained with the trajectory along the road are highly consistent with the actual scenario.
Based on the CARA model, the delineated residential areas from trajectory data encompass 480 grid units, including the villages at the periphery of the city. According to the actual map and the score index, the precision of the delineation of residential areas reaches 83.3%, which suggests that the proposed model is feasible.

Influencing Factors for the CARA Model
Commuting behaviors usually occur during the morning and evening rush hours, suggesting that the attractiveness of residential areas and workplaces to residents has distinct time characteristics. Therefore, the time characteristic is a factor that needs to be considered when building the CARA model. To further investigate the influence of the time factor on the CARA model, two additional data sets are used and their relevance for residential areas is compared. One is the set of destination points of trajectories in evening rush hours on weekdays, and the other is the mixed set of origin points of trajectories in morning rush hours and destination points of trajectories in evening rush hours on weekdays. Notably, the weekdays for evening rush hours do not include Friday, because the usage of shared e-bikes during the Friday evening peak is close to that on weekends. The hot spot results for the two data sets are shown in Figure 10, in which Part A is the result for the evening rush data, and Part B is the result based on the morning-evening rush data.
Compared with the result for the morning peak period, similar distributions of different-level hot spots are observed in the evening peak and morning-evening peak periods. However, as the usage of shared e-bikes increases, the number of hot spots, especially those with high activity intensity, increases significantly (Table 1). This finding indicates that the number of hot spots is related to the volume of trajectory data.
In Figure 10, some hot spots are not associated with residential areas, suggesting that residents in Tengzhou ride shared e-bikes to different destinations in the evening peak period. To verify this finding, the functions of hot spots are investigated based on actual maps. As shown in Figure 11, compared with the Baidu Map information, although most hot spots in the evening peak period are associated with residential areas (the green circles), some hot spots are associated with shopping malls (the red circles) and entertainment venues (the pink circles). This finding suggests that residents with available time enjoy riding shared e-bikes for relaxation. In addition, because shared e-bike usage in the evening peak period dominates the morning-evening peak trajectory data (the trajectory volume in the evening peak period is approximately 1.5 times that in the morning peak period), the hot spot results for the combined data are more similar to those for the evening peak period.
Figure 11. Diverse locations of hot spots in the evening peak period. The green circles indicate the hot spots associated with residential areas, the red circles indicate the hot spots associated with shopping malls, and the pink circles indicate the hot spots associated with entertainment venues.
We further investigate the functions of hot spots through the geographical coordinate inverse calculation module of Baidu Map. Here, the function types of hot spots include residential areas, shopping malls, entertainment venues, and villages. The statistical results in Table 1 show that 90.6% of the hot spots in the morning peak period are associated with residential areas, while only 77.8% of the hot spots in the evening peak period are associated with residential areas. The result suggests that the attractiveness of residential areas to human activities is strongest during the morning rush hours. The diversity of human travel during the evening peak period interferes with the construction of the CARA model. Therefore, compared with the data volume, the data time characteristic has a greater influence on model construction, because it may result in the heterogeneity of human activities.
Note: Morning, Evening, and Morning and Evening denote the trajectory data for the morning peak, evening peak, and combined morning-evening peak periods, respectively.

Conclusions
Understanding the relationship between human activities and functional areas is key to identifying urban functional areas. Trajectories carry rich information on human-region interactions, which is the basis for urban functional area identification. In this paper, the CARA model is proposed to quantify the correlation between commuting activities and residential areas. The residential areas are retrieved from the shared e-bike trajectory data to verify the quantitative model. The main conclusions are as follows.
(1) The GMM combined with BIC can accurately detect hot spots, including weak hot spots, when the density of trajectory data is heterogeneous. Based on the "activity degree" indicator, the hot spots can be classified into different levels.
(2) The CARA model is built to quantify the correlation between human commuting activity and residential areas, and the R² coefficient reaches 0.876. Based on the model, residential areas can be delineated with high precision (83.3%). The result validates the feasibility and reliability of the CARA model.
(3) Although human activities have distinct temporal characteristics, different human activities may occur at the same time. The diversity of human activities results in interference when modeling the correlation between certain human activities and functional areas. The attractiveness of residential areas to human activities is strongest during the morning rush hours on weekdays. The origins of e-bike trajectories during the morning rush period are most strongly correlated with urban residential areas.
(4) Compared with the amount of data, the data time characteristic can affect the homogeneity of human activities and has a more obvious influence on the model of correlation between human activities and functional areas.
Other functional areas, such as workplace and entertainment areas, can also be obtained from shared e-bike trajectories based on the concept provided in this study. In our study, each functional region is regarded as a region with only one social function; thus, mixed functional areas cannot be identified. Developing a method for identifying mixed functional areas is the focus of our future work.