Identification of Individual Mobility Anchor Places and Patterns Based on Mobile Phone GPS Data

Hao, Xuguang; Yin, Biao; Liu, Liu

doi:10.3390/futuretransp4040063

by Xuguang Hao ¹, Biao Yin ²,* and Liu Liu ³

¹ College of Physics and Electronic Engineering, Hainan Normal University, Haikou 571158, China

² SATIE, Université Gustave Eiffel, 77420 Champs-sur-Marne, France

³ MATRiS, CY Cergy Paris Université, 95000 Cergy, France

* Author to whom correspondence should be addressed.

Future Transp. 2024, 4(4), 1318-1333; https://doi.org/10.3390/futuretransp4040063

Submission received: 9 August 2024 / Revised: 17 October 2024 / Accepted: 29 October 2024 / Published: 1 November 2024

Abstract

Studying individual mobility patterns improves our understanding of the spatiotemporal characteristics of people’s travel behavior and social activities. Mobile phone GPS data are advantageous because of their large coverage. This paper aims to identify individual activity anchor places and to analyze the related patterns based on GPS data collected from thousands of mobile phone users over four months in Greater Paris, France. We propose a method to refine the identification of home and secondary activities. On this basis, the spatial mobility characteristics are aggregated by applying a three-stage clustering method. The clusters of activity types, the daily mobility patterns (day types), and the user groups with similar daily mobility patterns are thus obtained stage by stage. This allows us to analyze the obtained clusters in a cascading manner at three levels: activity level, day level, and individual level. Conversely, the mobility characteristics of each user group are interpreted through the day types and, in turn, the activity types. These interpretable clusters make it easier to find the differences in daily mobility between user groups across weekdays and weekends and across transport modes, as well as the mobility variability over the study period.

Keywords: individual mobility; clustering; activity type; daily mobility pattern; variability

1. Introduction

Understanding human mobility patterns is useful for policy-making in urban planning and mobility management. For example, through mobility spatial and usage pattern analysis, proper locations can be found for the deployments of car sharing [1] and e-charging stations [2]. In recent years, to meet the mobility needs of some targeted groups of individuals, demand-responsive transport services have developed, especially in rural and peri-urban areas [3].

With rapid socio-economic development, people’s activities and travel behavior have changed considerably. More comfortable, dedicated, and efficient mobility is expected, together with a growing awareness of sustainability and environmental protection. Traditional household travel surveys play an important role in understanding mobility demand and performance. However, these surveys have limited samples and normally cover only one day of each individual’s mobility. The survey update period is also relatively long (e.g., 5 to 10 years) because of the large time and labor costs, so the current state of mobility is not known. Information and communication technologies (ICT) give us opportunities to acquire comprehensive and timely mobility information from massive sets of geolocated points, covering a large area and a long time period. Over the last decade, mobile phone data, such as GPS traces, have been successfully used for detecting users’ travels and activities. A large number of studies focus on trip trajectories and mode inference [4,5], as well as origin-destination matrices for travel demand modeling [6]. The labeling of activities is a challenge, as only geographic information can be used to identify the anchor places (such as home, work, study, and other frequently visited places). In recent years, activity-based or agent-based models have attracted much attention thanks to their flexible and user-centric mobility modeling compared to trip-based models. For such models, individual mobility patterns with spatiotemporal features have the potential to provide preliminary knowledge for demand modeling at the individual level, especially when considering mobility variability over a long time window, such as multiple days [7,8].

In our study, we aim to identify individual mobility patterns at the day level based on the travel and activity characteristics from mobile phone GPS data. To achieve that, we first refine the identification process of principal activity places. Travel and activity features are then grouped by applying a three-stage clustering method. In detail, representative features are chosen for the clustering at three levels: activity level, day level, and individual level. Stage by stage, the clusters of activity types, the daily mobility patterns (day types), and the user groups with similar daily mobility patterns are obtained. Because the multi-level clustering is cascaded, the mobility characteristics of each user group can conversely be interpreted through the day types and then the activity types. The individual mobility patterns at the day level are further used to analyze the relationship between the patterns per user group and the days of the week, as well as mode use. Moreover, we measure the mobility variability over the study period and assess the overall mobility stability across the user groups.

The rest of this paper is organized as follows. Section 2 introduces the previous work. Section 3 presents the data and methods, including data preparation, activity inference, and the 3-stage clustering process with the interpretation of related clusters. The applied analysis of the obtained patterns by user groups is given in Section 4, where we analyze the pattern differences regarding weekdays and weekends, transportation modes, as well as mobility variability. We draw the conclusion in the last section.

2. Previous Work

Part of the literature is dedicated to extracting mobility features and patterns from mobile phone data, such as GPS and Call Detail Record (CDR) data. Early studies focused on trip segmentation centered on transport mode inference, especially with GPS data [9,10]. In recent years, the detection of activity locations and the labeling of their types (i.e., activity inference) have attracted much interest because of their applicability to activity-based models [11]. The detection of activity locations is regarded as data pre-processing, in which clustering methods are normally used to group recorded points into locations, and basic attributes of locations, such as the start or end time, duration, or visit frequency, can be extracted [11,12,13,14]. Within the scope of this paper, we only review related work on primary and secondary activity inference, such as home and work, and on the subsequent activity pattern detection.

For activity inference based on detected locations, the applied methods can be categorized as rule-based methods, learning-based methods, and hybrid methods. In [15], a rule-based algorithm was proposed to identify home locations according to where users’ activities were most frequent in the 9 p.m.–9 a.m. interval and where the most activity time was spent during the week. The authors in [11] defined home and work as the places with the longest stays in the intervals from midnight to 6 a.m. and from 1 p.m. to 5 p.m., respectively, considering both early and late workers. In [16], different supervised learning models such as SVM, Decision Trees, and Random Forests were used to classify activities into “home”, “work/school”, “social visit”, “leisure”, etc. Based on an ensemble of the models, the authors achieved a prediction accuracy of 69.7%. In [4], the authors developed a hybrid method to identify activity places: a model-based clustering method to calculate clusters, a logistic regression model to distinguish between activity and travel clusters, and a set of rule-based algorithms to identify the types of locations visited (home, work, and other places).

For mobility pattern analysis based on inferred activities, only a limited number of studies address trip chain or activity-travel pattern detection at the day level [17,18,19]. In [17], GPS data from a round-trip car-sharing system were collected, and partitioning around medoids (PAM) clustering was used to obtain contrasting trip chain characteristics, represented by four clusters (day types). These clusters are distributed differently over weekdays, days off, and holidays. In [18], the authors presented several kinds of activity-travel topologies (daily motifs) based on mobile phone users’ CDR data. In our previous study [19], we proposed a pipeline to reconstruct two weeks of daily mobility diagrams for 600 mobile phone users from raw mobile phone data. The day-level mobility patterns were investigated with a set of aggregated activity types performed in one day. That work did not take into account the mobility variability regarding day types over a long-term study period.

For readers’ convenience, a comparison table (see Table 1) between methodologies for main activity inference (i.e., home and work) and the related pattern analysis is given below.

From the previous work, we found that some studies emphasized identifying activity locations with various methods but paid less attention to the subsequent day-level pattern analysis. Moreover, the rule-based labeling of activities (e.g., home and work) still has room for improvement in identification accuracy. The subsequent patterns obtained by machine learning algorithms reflect aggregate characteristics, but they are not sufficiently interpreted with the aid of disaggregate features, especially at the day level. It is also challenging to measure individual mobility variability in terms of day types over a long time window (e.g., several months). In this paper, we address these issues and focus on identifying individual mobility patterns at the day level, using GPS data from thousands of mobile phone users who were tracked over a four-month study period in 2020, during which the COVID-19 pandemic occurred.

3. Data and Methods

In this section, we first introduce the data source (Section 3.1) and then present the methods for primary and secondary activity inference (Section 3.2) and for the identification of individual daily mobility patterns (Section 3.3). The framework of the method is illustrated in Figure 1.

3.1. Data Preparation

3.1.1. Overview

The study area is the Greater Paris region in France, which comprises eight departments: the city of Paris (75), the small ring (92, 93, and 94), and the large ring (77, 78, 91, and 95). A previous unpublished study detected users’ trips from the mobile phone GPS tracking data provided by the company KISIO. In the trip-based dataset, each trip has the following attributes: date, anonymous individual id, mode used, the spatiotemporal GPS points (timestamps and coordinates) of the departure and the destination, and the standard analysis zones associated with these points. The original dataset covers the trips made by 10,000 mobile phone users in four months of 2020 (February, May, September, and November). After a data quality analysis, only 7348 individuals have at least 15 days of data in each month. Notably, these months correspond to (1) the period before COVID-19 was officially confirmed in France (February), (2) the first national confinement (part of May), (3) the period after the first confinement was lifted (September), and (4) the second national confinement (November). All individuals only have trips inside the study region; people with trips to or from outside the region (for example, long-distance travel or temporary holiday trips) are not included in this study.

3.1.2. Data Filtering

At first, the timestamps of all trips in the dataset are modified to fit in the daytime window between 04:00:00 (the current day) and 04:00:00 (the next day).
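The day window can be illustrated with a minimal sketch. It assumes that "fitting" a timestamp into the 04:00 to 04:00 window means assigning trips made between midnight and 04:00 to the previous analysis day; the function name `service_day` is hypothetical and not taken from the study.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 4  # the 04:00 boundary stated in the text

def service_day(ts: datetime):
    """Return the analysis day of a timestamp under a 04:00-to-04:00 window."""
    # Shifting by 4 h maps 04:00 to midnight, so .date() gives the window's day.
    return (ts - timedelta(hours=DAY_START_HOUR)).date()

early = datetime(2020, 2, 4, 2, 30)  # 02:30 on 4 February
late = datetime(2020, 2, 4, 4, 0)    # 04:00 on 4 February
print(service_day(early))  # 2020-02-03
print(service_day(late))   # 2020-02-04
```

With this convention, a late-night return home at 02:30 stays in the same daily diagram as the trips of the preceding evening.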

It is assumed that individuals had regular weekly travel and activity habits before the impact of COVID-19. To ensure that each individual’s dataset has the same time dimension, two weeks of data are selected from each month. However, it is difficult to guarantee data on 14 consecutive days per month, as data are likely missing on some days. To limit analysis bias, we arrange the data in the chronological order of the days of the week (Monday, Tuesday, …, Sunday). To do so, we pick 14 consecutive days in each month (starting on a Monday and ending on a Sunday) such that the number of individuals with complete two-week datasets is maximized. For the remaining individuals, days with missing data are substituted by the same days of the week from other weeks in that month. After this filtering process, 2019 individuals with two-week datasets in each month remain. A large number of individuals are filtered out because they have no complete two-week data even with the substitution. In total, the filtered dataset contains 379,156 trips. Figure 2 shows the spatial distribution of the trip start points. Mobility activities occur much more frequently in the city of Paris (75) and its neighboring departments in the small ring (92, 93, and 94), where the population density is high, as well as along the radial railway lines to the suburbs.
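The substitution rule can be sketched as follows for one user and one month. This is an illustration under stated assumptions, not the authors' implementation: the 14-day window is taken as already chosen, a substitute day is used at most once, and the earliest available candidate is preferred. The function name `fill_two_weeks` is hypothetical.

```python
from datetime import date, timedelta

def fill_two_weeks(window_start, observed_days):
    """Map each of the 14 window days to a day with data, or return None.

    window_start  -- the Monday that opens the selected 14-day window
    observed_days -- set of dates (same month) on which the user has trips
    """
    window = [window_start + timedelta(days=i) for i in range(14)]
    # Days inside the window that have data represent themselves.
    mapping = {d: d for d in window if d in observed_days}
    used = set(mapping.values())
    for target in window:
        if target in mapping:
            continue
        # Assumption: a substitute is a not-yet-used day of the same month
        # that falls on the same day of the week.
        candidates = sorted(
            d for d in observed_days
            if d not in used
            and d.month == target.month
            and d.weekday() == target.weekday()
        )
        if not candidates:
            return None  # no complete two weeks: the user is filtered out
        mapping[target] = candidates[0]
        used.add(candidates[0])
    return mapping

start = date(2020, 2, 3)  # a Monday
observed = {start + timedelta(days=i) for i in range(14)}
observed.discard(date(2020, 2, 5))  # a Wednesday without data
observed.add(date(2020, 2, 19))     # a Wednesday from another week
filled = fill_two_weeks(start, observed)
print(filled[date(2020, 2, 5)])  # 2020-02-19
```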

3.2. Primary and Secondary Activity Inferences

3.2.1. Trip and Activity Attributes

From the filtered dataset of two-week trips over four months, we can directly calculate trip characteristics such as the (Euclidean) trip distance and the trip time. To obtain activity information, two consecutive trips are used. An activity place is recorded as the zone in which the trip destination lies. The activity duration is the time difference between the departure time of the next trip and the arrival time of the trip that led to the activity. To make the detected activities more realistic, we only consider activities whose duration reaches a threshold of 15 min. Moreover, we calculate the accumulated activity duration and the activity frequency, counted as the number of days on which a place is visited. These two characteristics are used to identify anchor places, such as home and secondary activities, in the following section.
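A minimal sketch of this derivation is given below. The trip record layout (`dep`, `arr`, `dest_zone`) is a hypothetical stand-in for the dataset's attributes, and treating the 15 min threshold as inclusive is an assumption.

```python
from datetime import datetime

MIN_ACTIVITY_MIN = 15  # threshold from the text; inclusive bound is an assumption

def activities_from_trips(trips):
    """Derive activities from chronologically sorted trips of one day.

    Each trip is a dict with 'dep' and 'arr' (datetime) and 'dest_zone'.
    """
    activities = []
    for prev, nxt in zip(trips, trips[1:]):
        # Time spent at prev's destination before the next trip departs.
        minutes = (nxt["dep"] - prev["arr"]).total_seconds() / 60
        if minutes >= MIN_ACTIVITY_MIN:
            activities.append({"zone": prev["dest_zone"], "duration_min": minutes})
    return activities

day = [
    {"dep": datetime(2020, 2, 3, 8, 0), "arr": datetime(2020, 2, 3, 8, 30), "dest_zone": "A"},
    {"dep": datetime(2020, 2, 3, 17, 0), "arr": datetime(2020, 2, 3, 17, 20), "dest_zone": "B"},
    {"dep": datetime(2020, 2, 3, 17, 30), "arr": datetime(2020, 2, 3, 18, 0), "dest_zone": "H"},
]
acts = activities_from_trips(day)
print(acts)  # [{'zone': 'A', 'duration_min': 510.0}]
```

The 10 min stop in zone B falls below the threshold and is therefore not counted as an activity.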

3.2.2. Identification of the Primary and Secondary Places

We define the following synthetic index (SI) to measure the importance of activity places:

$$SI = W \cdot D + (1 - W) \cdot F, \qquad (1)$$

where $D$ and $F$ are both normalized values ($D, F \in [0, 1]$) at the individual level, corresponding to the accumulated activity duration and the visit frequency at the same place over the four months, respectively. The weight $W \in [0, 1]$ assigns the relative importance of $D$ and $F$ in the $SI$ calculation.

Based on the $SI$ values calculated per individual, we sort each individual’s activity places (only the ids of the places) in decreasing order. The top activity place of each individual is recognized as the primary anchor place, i.e., home. The second-ranked place is marked as the secondary activity place, which stands for a habitually visited place, such as a work or study place. The remaining places are other activities; they are categorized later in the first clustering stage (Section 3.3.1).
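The ranking by Equation (1) can be sketched as follows. The normalization scheme is not specified in the text, so dividing by each individual's maximum is an assumption, as are the function names and the sample values, which are illustrative only.

```python
def synthetic_index(places, w):
    """Compute SI per place for one individual.

    places -- {place_id: (accumulated_duration_h, visit_days)}
    w      -- weight W in [0, 1] given to the duration term
    """
    # Assumption: D and F are normalized by the individual's maximum values.
    max_d = max(d for d, _ in places.values())
    max_f = max(f for _, f in places.values())
    return {
        pid: w * (d / max_d) + (1 - w) * (f / max_f)
        for pid, (d, f) in places.items()
    }

def rank_places(places, w):
    """Return place ids sorted by decreasing SI (first = primary, second = secondary)."""
    si = synthetic_index(places, w)
    return sorted(si, key=si.get, reverse=True)

# Illustrative values only, not taken from the dataset.
places = {"z1": (600.0, 40), "z2": (700.0, 20), "z3": (30.0, 5)}
print(rank_places(places, w=0.0))  # ['z1', 'z2', 'z3']
print(rank_places(places, w=1.0))  # ['z2', 'z1', 'z3']
```

The two calls show how the weight can swap the primary and secondary places: with W = 0 only the visit frequency counts, with W = 1 only the accumulated duration.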

After the home identification for each individual, we calculate each activity distance from home, representing activity space. Also, we complete individual daily travel–activity diagrams with the derived home information in the case where the first or the last activity place in the daily diagram is missing.

Assume that, in a daily diagram, the first trip departs from home and the last trip arrives at home. The duration of the first activity (staying at home) is then estimated as the departure time of the first trip plus 24 h minus the arrival time of the last trip. However, this assumption only holds when complete home-based trip chains were detected. For chains lacking the first or last home-related trip, the first-activity duration calculated in this way is overestimated, which influences the identification of home and secondary places.
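As a worked illustration of this estimate and of the overestimation problem, consider the small sketch below. Times are expressed as hours of the day, and the function name and the sample times are hypothetical.

```python
def first_activity_hours(first_departure_h, last_arrival_h):
    """Estimate the home stay as first departure + 24 h - last arrival."""
    return first_departure_h + 24.0 - last_arrival_h

# Complete chain: leaves home at 08:00, returns home at 19:30.
print(first_activity_hours(8.0, 19.5))  # 12.5

# Incomplete chain: the evening trip home is missing, so the last detected
# arrival (13:00) is not at home and the estimate is inflated.
print(first_activity_hours(8.0, 13.0))  # 19.0
```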

To address this issue, we use the individual’s first departure or last arrival information from other days on which trip chains were completely detected. We then calculate the average departure/arrival times to replace the missing times (see the distributions of departure and arrival times in Figure 3). The whole procedure for correcting the first-activity duration is as follows. First, home and secondary places are pre-identified by the rule defined in Equation (1). For home-based trips, we calculate the average departure time of the first home-related trip and the average arrival time of the last one. Afterward, we correct the first-activity duration with these average times when home-related trips are missing. The same applies if the first activity is identified as the secondary place. For a first activity that is neither home nor secondary, the duration is set to 15 min. After this revision per individual, we identify home and secondary places again. After several iterations, the accuracy of the activity identification and of the first-activity duration should improve. In this way, we only correct the intermediate first-activity duration on each day to identify the anchor places, instead of modifying the whole dataset by adding the lost trips.

3.2.3. Comparison with the 2018 Household Travel Survey

The total durations of the identified home and secondary activities on an average day are compared to the 2018 regional household travel survey (EGT-2018). From the EGT-2018, we only consider active people aged 15 to 60. Their average home duration is 15.2 h/day, and their average work or study duration is 7.8 h/day. For the KISIO dataset, different values of the weight $W$ in Section 3.2.2 are tested. The results are shown in Figure 4. Overall, the average home and secondary activity durations, both for February and for the four months in total, are shorter than those from the EGT-2018. A possible explanation is that long stays at home, especially on weekends (when there are fewer trips than on weekdays), are not completely detected in the KISIO dataset (see Figure 5). In addition, the identified secondary activities may not all be linked to work or study; some may be other frequently visited places with short durations. We observe that both home and secondary activity durations decrease as the weight increases, and this is even more evident for the latter. The setting $W = 0$, which means that only the activity frequency is considered in the identification, brings the average daily durations closest to those from the EGT-2018, according to the minimum of the total Root Mean Square Error (RMSE) over the comparison pairs. Note that, for a study period of four months, the activity frequency seems to be the determining factor, compared to the accumulated activity duration. In our study, we choose $W = 0$ for the later analysis.

Table 2 shows the statistical description of the average daily trip features, in terms of the daily number of trips, daily travel distance, and daily travel time. In the month of February (before COVID-19), one individual completed about 3.6 trips, traveled 25.1 km, and took 2.0 h for those trips on an average day. They are mostly close to the values from the EGT-2018.

Table 3 shows the statistical description of the activity features. In February, there are 2.8 daily activities (including home) per individual. The activities are represented by the unique places with a staying time of more than 15 min. For repeated activities at the same place at different times of the day, only one activity counts. After the home is identified for each individual, we can calculate the daily outdoor activity duration of about 6.8 h on average. The activity space from home is 8.8 km.

Comparing the above results, we see that the statistics from February are close to those from the EGT-2018, which indicates a satisfactory inference of travel and activity characteristics. The statistics from the other months reflect, to some degree, the mobility variation caused by COVID-19. The following section analyzes the mobility features in aggregate with our proposed clustering method.

3.3. Three-Stage Clustering Process

We aggregate travel and activity characteristics in three stages, corresponding to the activity level, day level, and individual level. At each stage, representative features are selected and clustered using the k-means++ algorithm, which chooses the initial seeds of the k-means algorithm so that cluster centers minimizing the intra-class variance are found more easily. The clusters obtained at a lower stage serve as features for the clustering at the next stage. This cascading only applies to the second and third stages.
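For readers unfamiliar with k-means++, the following is a compact standard-library sketch of the seeding step followed by ordinary Lloyd iterations. It is a generic illustration on toy points, not the implementation used in the study; it assumes at least k distinct points, and the function names are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(points, k, rng):
    """First seed uniformly at random; each next seed with probability
    proportional to its squared distance to the nearest chosen seed."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, rng, iters=50):
    """Lloyd iterations started from k-means++ seeds."""
    centers = kmeanspp_seeds(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
    return centers, labels

# Two well-separated toy groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2, random.Random(0))
print(labels)
```

Because far-away points are favored as seeds, the two toy groups are very likely to receive one seed each, which is the practical benefit of the seeding over uniform random initialization.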

3.3.1. Stage 1: Activity Types

In this stage, we select three activity features to identify different activity types: activity duration, activity distance (Euclidean distance from home), and activity frequency (number of visiting days over the four months). Note that the home and the secondary places were already detected in the data preparation. Here, the clustering process only concerns the remaining activities (i.e., other anchor places). We choose k = 4 clusters according to the Elbow method, in which the optimal k (x-axis) is the point where the curve of within-cluster sum of squares (WSS) values (y-axis) forms an elbow, as shown in Figure 6. Together with the home and secondary activities, a total of six activity types are obtained; their average aggregate features are shown in Table 4.
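The quantity plotted on the y-axis of an elbow curve can be computed as in the sketch below. The partitions are assigned by hand on a toy one-dimensional dataset purely to show how WSS behaves as k grows; the function name `wss` and the data are illustrative assumptions.

```python
def wss(points, labels):
    """Within-cluster sum of squares for a given partition."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        center = [sum(c) / len(members) for c in zip(*members)]
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, center))
    return total

pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(wss(pts, [0, 0, 0, 0]))  # 101.0  (k = 1)
print(wss(pts, [0, 0, 1, 1]))  # 1.0    (k = 2)
print(wss(pts, [0, 1, 2, 2]))  # 0.5    (k = 3)
```

The sharp drop from k = 1 to k = 2 followed by a marginal gain at k = 3 is the "elbow" shape: here it would suggest k = 2 for the toy data.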

As we can see, AT₁, AT₂, AT₃, and AT₄ have contrasting characteristics across the three activity features. Specifically, AT₁ accounts for the largest share of the activities (35%), with a short duration, a short distance from home, and a low visit frequency in days. This activity type could correspond to common activities such as shopping or going to restaurants. AT₂ has the longest distance from home (29.1 km) but only one or two visits during the four months. It accounts for only 6.7% of the total activities. This type of activity might be a rare visit to a far-away entertainment or leisure place. AT₃, with a short duration and a medium frequency, could be sports activities. AT₄ could represent long-duration activities with a low frequency, such as part-time jobs. The primary activity type AT_P has an average duration of 14.3 h and is detected on about 43 of the 56 days in the study period. The secondary activity type AT_S has an average duration of 6.6 h and is detected on about 25.5 days. They indicate, to some degree, the home (AT_P) and the work/study (AT_S) activity types. Moreover, AT_S gives a commuting distance of 9.6 km.

3.3.2. Stage 2: Day Types

In the second stage, we use two categories of mobility features at the day level to infer the mobility day types. The first category includes two mobility features: (1) the daily travel distance (DTD) and (2) the daily number of trips (DNT). The second category is the number of activities of each type per day from the first stage, except the primary activity type (i.e., home). A typical home day (for example, more than 23 h spent at home) is treated as a separate, predefined day type. We therefore cluster the selected mobility features and activity type counts at the day level for all days except the predefined home days. Finally, we obtain eight clusters (including home days), named daily mobility patterns (DMPs). The results for these clusters are shown in Table 5.
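One way to assemble the day-level feature vector is sketched below. The feature order, the string labels for activity types, and the function name are assumptions made for illustration; the text does not state whether features are scaled before clustering, so no scaling is applied here.

```python
HOME_DAY_HOURS = 23.0  # example threshold for a predefined home day, from the text
NON_HOME_TYPES = ["AT1", "AT2", "AT3", "AT4", "AT_S"]  # hypothetical labels

def day_features(trip_km, activity_types, home_hours):
    """Return None for a predefined home day, else
    [DTD, DNT, count of each non-home activity type]."""
    if home_hours > HOME_DAY_HOURS:
        return None  # handled as the separate home-day type, not clustered
    counts = [activity_types.count(t) for t in NON_HOME_TYPES]
    return [sum(trip_km), len(trip_km)] + counts

# Illustrative commuting day with one extra short activity.
print(day_features([10.0, 2.5, 10.0], ["AT_S", "AT1"], home_hours=14.0))
# [22.5, 3, 1, 0, 0, 0, 1]
print(day_features([], [], home_hours=24.0))  # None
```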

From the above table, we can infer the day types by the typical features from the first category of daily mobility and the second category of the importance of activity types. In this way, it is easy to know the principal characteristics of the daily patterns. For example, we learn that DMP₃ is most likely a commuting day, while DMP₅ has the activity type AT₃ (like sports) in addition. The difference between DMP₁ and DMP₆ is that the first type has relatively low importance of AT₁ and short daily travel distance (9.5 km), while the latter type has relatively high importance of AT₁ and medium daily travel distance (24.4 km), although they have similar trip distances of about 5 km calculated from DTD/DNT.

3.3.3. Stage 3: Individual Mobility Patterns and Groups

In the third stage, the individual mobility pattern is constituted by the number of days of each day type during the study period (i.e., 7 × 2 × 4 days). Figure 7 shows one individual’s daily mobility pattern with different day types, distinguished by the four months. The most frequent day types of this individual are DMP₅ in Feb. and DMP₃ in Oct., with 7 days each, followed by DMP₁ in May and DMP₃ in Nov. with 5 days. The evolution shows that this individual has fewer DMP₅ days and more DMP₃ days in May, Oct., and Nov. (when measures were taken due to COVID-19). To interpret this daily pattern, we can use the information on the associated activity and day types inferred in the previous two stages (see Table 4 and Table 5).
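An individual pattern of this kind can be encoded as a vector of day counts, as in the sketch below. Keeping one count per month and day type is an assumption about the feature layout, and the labels and sample days are illustrative only.

```python
from collections import Counter

DMPS = ["DMP1", "DMP2", "DMP3", "DMP4", "DMP5", "DMP6", "DMP7", "DMP_P"]

def pattern_vector(day_labels, months):
    """Count the days of each day type per month.

    day_labels -- list of (month, dmp) pairs, one per observed day
    months     -- month keys in the order used for the vector
    """
    counts = Counter(day_labels)
    return [counts[(m, d)] for m in months for d in DMPS]

# Illustrative labels for two months of 14 days each.
labels = (
    [("Feb", "DMP5")] * 7 + [("Feb", "DMP3")] * 7
    + [("May", "DMP1")] * 5 + [("May", "DMP_P")] * 9
)
vec = pattern_vector(labels, ["Feb", "May"])
print(len(vec), sum(vec))  # 16 28
```

With four months, the same function yields a 32-element vector per individual, which can then be passed to the clustering step.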

Each individual can have a mobility pattern like the one above. To facilitate the mobility analysis, we apply the clustering algorithm again to group the individuals with similar numbers of days of each DMP over the four months. The aggregate mobility patterns at the day level can thus be found by group. As shown in Figure 8, each user group has its own pattern of combined day types, and there are typical day types with high frequencies. For example, the typical day type in Group 1 (Figure 8a) is DMP₁ in each month, which means that users mainly traveled short daily distances for short-duration and low-frequency activities (see the related inference in Table 5, then Table 4), and this behavior seems relatively stable over the four months. The other groups can be interpreted in the same way.

4. Applied Analysis of Daily Mobility Patterns

In this section, we analyze the individual DMPs at the group level to see their differences across weekdays/weekends and modal shares. Mobility variability and stability are also presented.

4.1. Importance of Weekdays and Weekends

In each group, we calculate the ratio of the average number of weekdays to the average number of weekend days associated with the same DMP. The reference ratio is 10 (weekdays)/4 (weekend days) = 2.5, marked as the baseline.
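The comparison against the baseline amounts to the small computation below. The function name and the sample averages are hypothetical; returning infinity when a DMP never occurs on weekend days is a choice made for this sketch.

```python
BASELINE = 10 / 4  # weekdays / weekend days in a two-week window

def weekday_weekend_ratio(avg_weekdays, avg_weekend_days):
    """Ratio of average weekday count to average weekend-day count for one DMP."""
    if avg_weekend_days == 0:
        return float("inf")  # the DMP never occurs on weekend days
    return avg_weekdays / avg_weekend_days

# Illustrative group averages for one DMP.
ratio = weekday_weekend_ratio(6.0, 1.5)
print(ratio, ratio > BASELINE)  # 4.0 True
```

A ratio above 2.5 indicates a DMP that is over-represented on weekdays relative to a uniform spread over the week.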

In Figure 9, the ratios of DMP₃, DMP₄, and DMP₅ are mostly above the baseline. This is consistent with their pattern inference (see Table 5), which relates them to commuting days that mostly fall on weekdays. In contrast, DMP₁, DMP₂, DMP₇, and DMP_P are below the baseline for all groups. This means that the activities that mostly happen on weekends rather than weekdays are, respectively, the numerous activities near home (DMP₁ and DMP₇), the long-distance and low-frequency activities (DMP₂), and the home-staying activities (DMP_P).

4.2. Modal Shares

The modal shares of all trips per user group are shown in Figure 10. The categories of public transport (PT), car (C), and active modes (AM), mainly walking, are taken into account. Regarding the global shares over the four months, Groups 1 and 7 have relatively larger shares of PT and AM. The typical day types in Groups 1 and 7 are DMP₁ and DMP₄, respectively (see Figure 8a,g). This means that cars are used less on days with short- or medium-distance trips to low-frequency places. The individuals in Group 2 use cars relatively more, as they travel long distances (see DMP₇ in Figure 8b and Table 5). In Group 4, the individuals prefer using their cars for short-distance trips and home-based activities. Regarding the shares per month, all groups have more PT and AM trips before COVID-19 (February) than during COVID-19 (May, Oct., and Nov.). The pattern is reversed, particularly in May, as people shifted to private cars for self-protection due to the sharp increase in COVID-19 cases.

4.3. Mobility Variability

In this section, we first investigate the month-by-month mobility variability in terms of the DMP frequencies and then check the mobility stability of each user group.

As illustrated in Figure 7, individual mobility patterns may differ between months, especially during an epidemic. To measure the monthly variability, we use the squared Euclidean distance (SED) between DMP frequencies. Each month is taken in turn as the reference month, with which the other three months are compared. We calculate the SED for each individual as follows:

$$SED_{i,j} = \sum_{k} (f_{i,k} - f_{j,k})^{2}, \qquad (2)$$

where i is the compared month, j is the reference month, k (k = 1, 2, …, 7, P) is the index of the DMP, and f is the DMP frequency (i.e., the number of days).
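Equation (2) can be checked on a small example. The monthly frequencies below are made up for illustration (each month sums to 14 days) and are not results from the study; missing DMP keys are treated as zero days.

```python
def sed(f_i, f_j):
    """Squared Euclidean distance between two months of DMP frequencies.

    f_i, f_j -- {dmp: number_of_days}; absent DMPs count as 0 days
    """
    keys = set(f_i) | set(f_j)
    return sum((f_i.get(k, 0) - f_j.get(k, 0)) ** 2 for k in keys)

# Illustrative frequencies for one individual.
feb = {"DMP1": 2, "DMP3": 7, "DMP5": 5}
nov = {"DMP1": 5, "DMP3": 3, "DMP_P": 6}
print(sed(nov, feb))  # 86  = 3**2 + 4**2 + 5**2 + 6**2
```

The measure is symmetric, so swapping the compared and reference months gives the same value for a given pair.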

Table 6 gives the average SED per individual. Compared to the reference month of February, the largest mobility variation occurs in November, reaching 53.5. Compared to the reference month of November, individuals have the smallest pattern difference in October, with SED = 31.9. COVID-19 indeed has an impact on mobility behavior, and the impact grows with the confinement measures taken by the government.

For the stability per user group, we use another variability coefficient $h_{p}$, calculated as the sum over DMPs of the standard deviation divided by the mean of the DMP frequency over the four months. Each individual p thus obtains a coefficient $h_{p}$, and the average value per group, $C_{g}$, is obtained by

$$h_{p} = \sum_{k} \frac{sd_{k}}{\bar{x}_{k}}, \qquad C_{g} = \frac{1}{N_{g}} \sum_{p = 1}^{N_{g}} h_{p}, \qquad (3)$$

where k is the index of the DMP and $N_{g}$ is the total number of individuals in group g. A smaller $C_{g}$ means less mobility variability among the individuals in the group. The resulting coefficients are shown in Figure 11. Over the study period, individuals in Groups 1, 2, 4, and 6 have less variance in their mobility patterns than those in Groups 3, 5, and 7. The most stable group is Group 6, where individuals have more commuting days (see DMP₃ in Figure 8g) compared to other groups.
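A minimal sketch of Equation (3) is given below. Two points are assumptions because the text does not specify them: the population standard deviation is used, and DMPs that an individual never exhibits (mean of zero) are skipped to avoid division by zero. The sample frequencies are illustrative.

```python
from statistics import mean, pstdev

def variability(freqs_by_dmp):
    """h_p: sum over DMPs of (standard deviation / mean) of monthly frequencies."""
    h = 0.0
    for freqs in freqs_by_dmp.values():
        m = mean(freqs)
        if m > 0:  # assumption: skip DMPs never observed for this individual
            h += pstdev(freqs) / m
    return h

def group_coefficient(individuals):
    """C_g: average of h_p over the individuals of one group."""
    return mean(variability(p) for p in individuals)

# Illustrative monthly day counts (four months, 14 days per month).
stable = {"DMP1": [4, 4, 4, 4], "DMP3": [10, 10, 10, 10]}
varying = {"DMP1": [4, 4, 4, 4], "DMP3": [8, 6, 4, 6], "DMP_P": [2, 4, 6, 4]}
print(variability(stable))                    # 0.0
print(round(variability(varying), 3))         # 0.589
print(round(group_coefficient([stable, varying]), 3))  # 0.295
```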

5. Conclusions

This study strengthened the identification of spatial anchor places and the analysis of mobility patterns based on mobile phone GPS data in Greater Paris, France. Activity types, day types, and individual patterns were obtained progressively through the three-stage clustering method. The resulting clusters were interpretable in terms of the mobility characteristics at each aggregation level. We also found an intrinsic relationship between the user groups and the day types of the week, as well as mode use. The variability of individual mobility patterns across months and their stability within groups were measured. In particular, changes in mobility behavior before and during the COVID-19 confinements were observed from the identified patterns.

Our research proposed a complete pipeline for identifying anchor places and individual mobility patterns at the day level. The resulting clusters group individuals with similar behavior, which can be used for model-based mobility generation as well as for validation. Moreover, they can be re-measured as users' behavior changes, which benefits discrete choice modeling of activity time, location, etc.

However, there are some limitations. First, comparing only the average daily activity duration with the HTS is not sufficient to determine the weight used in activity inference; comparisons on other variables are needed for a proper choice of weight. Second, we only considered users who had outdoor activities with executed trips. Users who stayed at home for some days were probably filtered out (not enough mobility days) or had those days substituted with days from other weeks, in order to satisfy the chronological order of two entire weeks. Third, the day types were calculated from the frequencies of activity types, and the activity transitions or sequential order were ignored. This would limit the application of day types to sequential activity generation in the time dimension.

In future work, we will address the above shortcomings and examine the representativeness of the identified patterns and their scalability to a synthetic population. It would also be interesting to implement potential applications in transport and mobility management based on our results, for example, facilitating ridesharing services for targeted user groups who have similar activity profiles with long-distance travel.
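The third limitation can be made concrete with a minimal sketch. It assumes a day is represented as an ordered list of activity-type labels; the label strings, the helper `frequency_features`, and the two example days are hypothetical and are not taken from the study data. The sketch shows that two days with different activity orders collapse to the same frequency vector.

```python
from collections import Counter

# Hypothetical labels mirroring the AT_1..AT_4 and AT_S naming of Table 4.
ACTIVITY_TYPES = ["AT1", "AT2", "AT3", "AT4", "ATS"]


def frequency_features(day_sequence):
    """Count how often each activity type occurs in one day, ignoring order."""
    counts = Counter(day_sequence)
    return [counts.get(activity, 0) for activity in ACTIVITY_TYPES]


# Two made-up days containing the same activities in a different order.
day_a = ["ATS", "AT1", "AT3"]
day_b = ["AT3", "ATS", "AT1"]

features_a = frequency_features(day_a)
features_b = frequency_features(day_b)

print(features_a, features_b)
print("same frequency vector:", features_a == features_b)
print("same sequence:", day_a == day_b)
```

Because both days map to an identical feature vector, a clustering built on such features cannot separate them, which is why sequential activity generation would need transition or order information in addition to frequencies.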

Author Contributions

Conceptualization, B.Y. and X.H.; methodology, X.H. and B.Y.; validation, B.Y. and L.L.; formal analysis, X.H., B.Y. and L.L.; writing—original draft preparation, X.H. and B.Y.; writing—review and editing, B.Y. and L.L.; visualization, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to collaboration purposes.

Acknowledgments

This work is supported by the Chair of ENPC-IDFM and the Hainan Provincial Natural Science Foundation of China [620RC606]. We thank the mobility consulting company for providing us with the mobile phone data and also thank Leurent for his advice on this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Caulfield, B.; Kehoe, J. Usage patterns and preference for car sharing: A case study of Dublin. Case Stud. Transp. Policy 2021, 9, 253–259.
2. Asamer, J.; Reinthaler, M.; Ruthmair, M.; Straub, M.; Puchinger, J. Optimizing charging station locations for urban taxi providers. Transp. Res. A Policy Pract. 2016, 85, 233–246.
3. Gkavra, R.; Susilo, Y.O.; Klementschitz, R. Determinants of usage and satisfaction with demand responsive transport systems in rural areas. Transp. Res. Rec. 2024, 2678, 667–680.
4. Chen, C.; Bian, L.; Ma, J. From traces to trajectories: How well can we guess activity locations from mobile phone traces? Transp. Res. C Emerg. Technol. 2014, 46, 326–337.
5. Rasmussen, T.K.; Ingvardson, J.B.; Halldórsdóttir, K.; Nielsen, O.A. Improved methods to deduct trip legs and mode from travel surveys using wearable GPS devices: A case study from the greater Copenhagen area. Comput. Environ. Urban Syst. 2015, 54, 301–313.
6. Ge, Q.; Fukuda, D. Updating origin-destination matrices with aggregated data of GPS traces. Transp. Res. C Emerg. Technol. 2016, 69, 291–312.
7. Guo, Y.; Yang, F.; Yan, H.; Xie, S.; Liu, H.; Dai, Z. Activity-based model based on multi-day cellular data: Considering the lack of personal attributes and activity type. IET Intell. Transp. Syst. 2023, 17, 2474–2492.
8. Luo, N.; Nara, A.; Khoo, H.L.; Chen, M. An integration modeling framework for individual-scale daily mobility estimation. Travel Behav. Soc. 2024, 34, 100650.
9. Reddy, S.; Mun, M.; Burke, J.; Estrin, D.; Hansen, M.; Srivastava, M. Using mobile phones to determine transportation modes. ACM Trans. Sens. Netw. 2010, 6, 1–27.
10. Byon, Y.J.; Abdulhai, B.; Shalaby, A. Real-time transportation mode detection via tracking global positioning system mobile devices. J. Intell. Transp. Syst. 2009, 13, 161–170.
11. Yin, M.; Sheehan, M.; Feygin, S.; Paiement, J.F.; Pozdnoukhov, A. A generative model of urban activities from cellular data. IEEE Trans. Intell. Transp. Syst. 2017, 19, 1682–1696.
12. Chrétien, J.; Le Néchet, F.; Leurent, F.; Yin, B. Chapter 3—Using mobile phone data to observe and understand mobility behavior, territories, and transport usage. In Urban Mobility and the Smartphone: Transportation, Travel Behavior and Public Policy; Aguilera, A., Boutueil, V., Eds.; Elsevier: Amsterdam, The Netherlands, 2019; pp. 79–141.
13. Ashbrook, D.; Starner, T. Using GPS to learn significant locations and predict movement across multiple users. Pers. Ubiquit. Comput. 2003, 7, 275–286.
14. Montoliu, R.; Blom, J.; Gatica-Perez, D. Discovering places of interest in everyday life from smartphone data. Multimed. Tools Appl. 2013, 62, 179–207.
15. Toader, B.; Sprumont, F.; Faye, S.; Popescu, M.; Viti, F. Usage of smartphone data to derive an indicator for collaborative mobility between individuals. ISPRS Int. J. Geo-Inf. 2017, 6, 62.
16. Liu, F.; Janssens, D.; Wets, G.; Cools, M. Annotating mobile phone location data with activity purposes using machine learning algorithms. Expert Syst. Appl. 2013, 40, 3299–3311.
17. Hui, Y.; Ding, M.; Zheng, K.; Lou, D. Observing trip chain characteristics of round-trip carsharing users in China: A case study based on GPS data in Hangzhou city. Sustainability 2017, 9, 949.
18. Jiang, S.; Ferreira, J.; Gonzalez, M.C. Activity-based human mobility patterns inferred from mobile phone data: A case study of Singapore. IEEE Trans. Big Data 2017, 3, 208–219.
19. Yin, B.; Leurent, F. Exploring individual activity-travel patterns based on geolocation data from mobile phones. Transp. Res. Rec. 2021, 2675, 771–783.
20. Andrade, T.; Cancela, B.; Gama, J. Discovering locations and habits from human mobility data. Ann. Telecommun. 2020, 75, 505–521.

Figure 1. Method framework.

Figure 2. Distribution of start points of the trips.

Figure 3. Distributions of (a) the first departure time from home, and (b) the last arrival time at home.

Figure 4. Activity duration comparison by different weights.

Figure 5. Distributions of daily duration at home in (a) KISIO (W = 0) and (b) EGT-2018.

Figure 6. Elbow method to choose the optimal number of clusters.

Figure 7. Example of the individual mobility pattern over the four months.

Figure 8. Seven user groups: (a) Group 1, (b) Group 2, (c) Group 3, (d) Group 4, (e) Group 5, (f) Group 6, (g) Group 7.

Figure 9. Ratio of days on weekdays and on weekends (the dashed line represents the reference ratio).

Figure 10. Modal shares by user groups.

Figure 11. Average coefficient of variability per user group.

Table 1. Key elements for activity inference and mobility pattern analysis.

Paper	Activity Inference			Activity Pattern Detection
Paper	Rule-Based	Learning-Based	Hybrid	Activity (Trip) Level	Day Level	Individual Level
[11]	x			x
[13]		x		x
[14]			x	x		x
[15]	x					x
[16]		x
[17]		x		x	x
[18]	x				x	x
[19]	x			x	x	x
[20]		x		x		x

Table 2. Statistics on trips.

	Feb.	May	Oct.	Nov.	EGT-2018 (Age of 15–60)
Ave. daily number of trips	3.6	3.0	3.6	3.1	4.2
Ave. daily travel distance (km)	25.1	18.7	26.3	22.0	24.8
Ave. daily travel time (min)	118	89	155	126	111

Table 3. Statistics on activities.

	Feb.	May	Oct.	Nov.	EGT-2018 (Age of 15–60)
Daily number of activities (>15 min, unique places)	2.8	2.4	2.7	2.4	2.7
Daily outdoor activity duration (min)	403	321	371	336	420
Activity distance from home (km)	8.8	7.6	9.0	8.4	8.9
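As a reading aid for Tables 2 and 3, the following sketch computes the signed relative deviation of each monthly GPS-based statistic from the EGT-2018 reference. The numbers are copied from the two tables; the percentage-deviation measure and the names `STATS` and `relative_deviation` are illustrative assumptions, not a comparison procedure described by the authors.

```python
MONTHS = ["Feb.", "May", "Oct.", "Nov."]

# (monthly values, EGT-2018 reference), copied from Tables 2 and 3.
STATS = {
    "daily number of trips": ([3.6, 3.0, 3.6, 3.1], 4.2),
    "daily travel distance (km)": ([25.1, 18.7, 26.3, 22.0], 24.8),
    "daily travel time (min)": ([118, 89, 155, 126], 111),
    "daily number of activities": ([2.8, 2.4, 2.7, 2.4], 2.7),
    "daily outdoor activity duration (min)": ([403, 321, 371, 336], 420),
    "activity distance from home (km)": ([8.8, 7.6, 9.0, 8.4], 8.9),
}


def relative_deviation(value, reference):
    """Signed deviation from the survey reference, in percent."""
    return 100.0 * (value - reference) / reference


deviations = {
    name: [round(relative_deviation(v, ref), 1) for v in values]
    for name, (values, ref) in STATS.items()
}

for name, devs in deviations.items():
    print(f"{name}: {dict(zip(MONTHS, devs))}")
```

Looking at several variables side by side in this way is one possible route to the broader comparison with the HTS that the conclusions call for.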

Table 4. Aggregate results per activity type.

Type (%)	Activity Duration (h)	Activity Distance (km)	Activity Frequency (Days)	Inference
AT₁ (35.0)	1.1	5.0	3.5	Low duration, short distance, and low frequency activity (e.g., shopping)
AT₂ (6.7)	1.7	29.1	3.5	Low duration, far away, and low frequency activity (e.g., leisure)
AT₃ (8.9)	1.8	4.0	18.7	Low duration, short distance, and medium frequency activity (e.g., sports)
AT₄ (6.7)	8.3	9.7	6.1	Medium duration, medium distance, and low frequency activity (e.g., part-time jobs)
AT_P (28.1)	14.3	0.1	42.8	High duration, nearby home, and high frequency activity (e.g., home staying)
AT_S (14.5)	6.6	9.6	25.5	Medium for all (e.g., work or study)

Table 5. Aggregate results per day type.

Type	DTD	DNT	AT₁	AT₂	AT₃	AT₄	AT_S	Inference
DMP₁ (22.9%)	9.5	1.8	0.9	0.1	0.0	0.0	0.0	Short daily travel distance + AT₁
DMP₂ (8.9%)	12.0	2.9	0.7	0.0	1.2	0.0	0.0	AT₃
DMP₃ (23.3%)	19.2	2.6	0.5	0.0	0.0	0.0	1.0	Commuting day
DMP₄ (15.1%)	21.4	2.8	0.6	0.1	0.1	1.0	0.1	AT₄
DMP₅ (6.6%)	21.9	3.9	0.6	0.0	1.1	0.0	1.0	Commuting day + AT₃
DMP₆ (10.8%)	24.4	5.1	3.0	0.1	0.2	0.1	0.3	Medium daily travel distance + AT₁
DMP₇ (8.3%)	62.9	3.9	0.6	1.6	0.1	0.1	0.3	AT₂
DMP_P (4.1%)	6.4	2.0	0.4	0.0	0.1	0.0	0.1	Home day

Note: For AT₁, AT₂, …, AT_S, the bold values are the maximum ones either by column or by row.

Table 6. Monthly variability measured by SED.

SED	Feb.	May	Oct.	Nov.
Feb.	-	42.5	43.1	53.5
May	42.5	-	48.1	47.7
Oct.	43.1	48.1	-	31.9
Nov.	53.5	47.7	31.9	-
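The matrix in Table 6 can be inspected with a short sketch that checks its symmetry and extracts the most and least similar month pairs. The values are copied from the table; the sketch makes no assumption about how SED itself is computed, and the variable names are illustrative.

```python
from itertools import combinations

MONTHS = ["Feb.", "May", "Oct.", "Nov."]

# Values copied from Table 6; the diagonal ("-") is stored as None.
SED = [
    [None, 42.5, 43.1, 53.5],
    [42.5, None, 48.1, 47.7],
    [43.1, 48.1, None, 31.9],
    [53.5, 47.7, 31.9, None],
]

index_pairs = list(combinations(range(len(MONTHS)), 2))

# A pairwise distance table is expected to be symmetric.
is_symmetric = all(SED[i][j] == SED[j][i] for i, j in index_pairs)

# Map each unordered month pair to its SED value.
pairs = {(MONTHS[i], MONTHS[j]): SED[i][j] for i, j in index_pairs}
closest = min(pairs, key=pairs.get)
farthest = max(pairs, key=pairs.get)

print("symmetric:", is_symmetric)
print("most similar months:", closest, pairs[closest])
print("least similar months:", farthest, pairs[farthest])
```

Under this reading, October and November are the closest pair of months, while February and November are the farthest apart.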

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

