#### 2.1. Trip Extraction

Wuhan is the largest city in Central China. Many taxi companies there equip their vehicles with GPS receivers to monitor the operation of each taxi. Taxis equipped with GPS are called "floating" cars, and they can be used to monitor real-time traffic conditions. With the help of GPS devices, the historical trajectory of a taxi can be recorded as a series of locations sampled at short, regular intervals. In this research, the dataset includes records from more than 2050 floating cars in Wuhan. The data cover seven consecutive days, from Monday, 2 June to Sunday, 8 June 2014. For each taxi, the longitude, latitude, time stamp, instantaneous velocity, azimuth angle and occupancy are automatically collected approximately every 40 s, so each taxi reports nearly 2160 GPS sample points per day. In practice, the number of GPS records is slightly smaller, as GPS receivers are sometimes shut down by drivers or become disconnected. The accumulated observations create a very large dataset for research, averaging 2,485,000 records per day. A GPS record can be denoted by

$p\left(t,id,lat,lon,v,h,s\right)$, where $t$ denotes the time instant corresponding to the current position of the taxi, $id$ denotes the taxi identification number, $lat$ and $lon$ represent the position of the taxi, namely latitude and longitude, $v$ represents the instantaneous velocity of the taxi, $h$ represents the driving direction of the taxi and $s$ indicates the service status of the taxi, vacant or occupied.

Table 1 shows several consecutive sampling points of a taxi trajectory used in this research. As shown in the table, a taxi is occupied when the status equals one; otherwise, it is vacant.
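The record $p\left(t,id,lat,lon,v,h,s\right)$ can be sketched as a small data structure. The following is a minimal illustration, not the authors' implementation: the class name `GpsRecord`, the field name `taxi_id`, the helper `group_by_taxi` and the units noted in the comments are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GpsRecord:
    """One GPS sample p(t, id, lat, lon, v, h, s)."""
    t: datetime    # time stamp of the sample
    taxi_id: str   # taxi identification number (id in the text)
    lat: float     # latitude, assumed to be in decimal degrees
    lon: float     # longitude, assumed to be in decimal degrees
    v: float       # instantaneous velocity (unit not stated in the text)
    h: float       # driving direction (azimuth angle)
    s: int         # service status: 1 = occupied, any other value = vacant


def group_by_taxi(records):
    """Return {taxi_id: [records sorted by time stamp]}."""
    by_taxi = defaultdict(list)
    for record in records:
        by_taxi[record.taxi_id].append(record)
    for samples in by_taxi.values():
        samples.sort(key=lambda r: r.t)
    return dict(by_taxi)
```

Grouping the samples by taxi and sorting them by time gives one time-ordered sequence per vehicle, which is the form needed before occupancy changes can be read off.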

The GPS information of a taxi not only reflects the running state of traffic, but can also be used to investigate human mobility patterns based on taxi occupancy [10,11,12]. Each extracted trip can therefore be simplified to a vector, $<{t}_{i},\left({x}_{i1},{y}_{i1}\right),\left({x}_{i2},{y}_{i2}\right)>$, where ${t}_{i}$ denotes the time instant corresponding to the trip origin of the taxi, $\left({x}_{i1},{y}_{i1}\right)$ represents the geographic coordinates of the trip origin and $\left({x}_{i2},{y}_{i2}\right)$ indicates the geographic coordinates of the trip destination. Trips taken during different periods can thus be extracted to study human mobility patterns.

**Definition 1:** In this research, a trip is simplified to a vector, $<{t}_{i},\left({x}_{i1},{y}_{i1}\right),\left({x}_{i2},{y}_{i2}\right)>$, where ${t}_{i}$ denotes the time instant corresponding to the trip origin of the taxi, $\left({x}_{i1},{y}_{i1}\right)$ represents the geographic coordinates of the trip origin and $\left({x}_{i2},{y}_{i2}\right)$ indicates the geographic coordinates of the trip destination.

**Definition 2:** A taxi trajectory consists of a series of trip tuples, $<{t}_{i},\left({x}_{i1},{y}_{i1}\right),\left({x}_{i2},{y}_{i2}\right),{t}_{i+1},\left({x}_{i+1,1},{y}_{i+1,1}\right),\left({x}_{i+1,2},{y}_{i+1,2}\right),\dots ,{t}_{m},\left({x}_{m,1},{y}_{m,1}\right),\left({x}_{m,2},{y}_{m,2}\right),\dots ,{t}_{n},\left({x}_{n,1},{y}_{n,1}\right),\left({x}_{n,2},{y}_{n,2}\right)>$.

Table 2 summarizes the statistics of the dataset that we selected.

Figure 1 shows a map of Chinese cities, indicating where Wuhan is located. Figure 2a plots a taxicab trajectory within a loop highway on 2 June 2014. Based on the characteristics of taxicab trajectories, in particular the recorded GPS points where anonymous passengers were picked up and dropped off during different time periods, we extracted a large number of trips from more than 2050 taxis on different days. Each trip was then simplified to a point pair and a trip distance, represented by a Pick-Up Point (PUP), a Drop-Off Point (DOP) and the Euclidean distance between the two points. The PUP and DOP can be viewed as the origin and destination of a trip, respectively. A trip destination therefore represents the purpose of a trip and reflects human mobility. It is worth noting, however, that trips shorter than a certain distance should be removed, as they are often caused by driver operating errors or data errors. In this research, the distance threshold was set to 0.5 km.
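The extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the pick-up point is taken as the first occupied sample after a vacant one, the drop-off point as the first vacant sample after an occupied one, $x$ is taken to be longitude and $y$ latitude, and the Euclidean distance is computed on an equirectangular projection of the coordinates. The function names `euclidean_km` and `extract_trips` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
MIN_TRIP_KM = 0.5  # distance threshold stated in the text


def euclidean_km(lat1, lon1, lat2, lon2):
    """Planar distance in km after an equirectangular projection (assumption)."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_KM
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_KM
    return math.hypot(dx, dy)


def extract_trips(samples, min_km=MIN_TRIP_KM):
    """Extract trips from one taxi's time-ordered (t, lat, lon, s) samples.

    Returns a list of (t_i, (x_i1, y_i1), (x_i2, y_i2), distance_km),
    with x = longitude and y = latitude.
    """
    trips = []
    origin = None
    prev_status = None  # unknown before the first sample
    for t, lat, lon, s in samples:
        if s == 1 and prev_status is not None and prev_status != 1:
            # vacant -> occupied: pick-up point (PUP)
            origin = (t, lat, lon)
        elif s != 1 and prev_status == 1 and origin is not None:
            # occupied -> vacant: drop-off point (DOP)
            t0, lat0, lon0 = origin
            dist = euclidean_km(lat0, lon0, lat, lon)
            if dist >= min_km:  # drop trips below the threshold
                trips.append((t0, (lon0, lat0), (lon, lat), dist))
            origin = None
        prev_status = s
    return trips
```

A sequence that begins while the taxi is already occupied yields no trip in this sketch, because its pick-up point was not observed.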

Figure 2b shows the spatial distribution of all taxicab trajectories on 2 June 2014; yellow points and red points denote the positions of trip origins and destinations, respectively, which correspond to pick-up points and drop-off points. Figure 2 illustrates the spatial distribution of pick-ups and drop-offs from the perspective of the number of taxicabs. In Figure 2, the numbers of red points and yellow points appear to differ because points cover one another. In fact, the number of red points equals the number of yellow points, because each pick-up point corresponds to exactly one drop-off point.

#### 2.2. Distribution of Trips

In this paper, we explore similarities in human behavior in terms of the temporal distribution of trip origins and of trip distance. The number of trip origins in each hour of each day can be easily obtained and represents the characteristics of human activities over time. Consistent with many previous studies, Figure 3 shows strong daily rhythms and day-to-day trip similarities [33,34,35,36,37]. People take more trips during the day than at night, and the temporal patterns on weekends are significantly different from those on workdays; mobility patterns therefore differ between workdays and non-workdays. On weekends, entertainment, parties, shopping and other recreational activities account for a large proportion of trip purposes. Between 00:00 and 05:00, there were more trips on weekends than on other days.
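The hourly occurrence counts described above can be obtained with a simple tally. The sketch below is illustrative only; the function names `hourly_origin_counts` and `is_weekend` are hypothetical, and trip origin times are assumed to be available as `datetime` objects.

```python
from collections import Counter
from datetime import datetime


def hourly_origin_counts(origin_times):
    """Count trip origins per (calendar date, hour of day)."""
    counts = Counter()
    for t in origin_times:
        counts[(t.date(), t.hour)] += 1
    return counts


def is_weekend(day):
    """True for Saturday and Sunday (date.weekday() returns 5 and 6)."""
    return day.weekday() >= 5
```

Comparing the per-hour counts of days for which `is_weekend` is true against the remaining days is one way to separate workday from non-workday patterns.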

Each trip can be viewed as a displacement in an individual trajectory, and the distance distribution reflects the mobility patterns of people. The observed distribution of extracted trip distances on 2 June is drawn in Figure 4; a previous study used an exponentially truncated power law distribution to fit the distance distribution of taxi trips [23]. The figure shows that short trips are more frequent than long trips. To better reflect mobility patterns, Figure 4 uses one hundred meters as the distance unit instead of one thousand meters [23], because with the latter unit negative values would appear. People travel different distances for different purposes; we divided distance into four categories at the elbows where the distribution changes sharply, namely 1 km, 7 km and 20 km, as depicted in Figure 4. This grouping of taxi trips by distance is also supported by the differences in their temporal variations. As depicted in Figure 5, although the temporal variations of PUPs differ among the four distance groups, the distribution of trip distance also presents day-to-day similarity within the week, implying similarity in daily human activity.
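The four distance categories can be assigned with a lookup on the three breakpoints. This is a minimal sketch: the breakpoints 1 km, 7 km and 20 km come from the text, while the category labels, the function name `distance_category` and the choice to place a trip lying exactly on a breakpoint in the upper category are assumptions.

```python
from bisect import bisect_right

BREAKS_KM = (1.0, 7.0, 20.0)  # elbows reported in the text
LABELS = ("under 1 km", "1 to 7 km", "7 to 20 km", "20 km and over")


def distance_category(distance_km):
    """Map a trip distance in km to one of the four categories."""
    # bisect_right counts how many breakpoints are <= distance_km,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BREAKS_KM, distance_km)]
```

Tallying `distance_category` per hour for each group is one way to reproduce the kind of per-group temporal comparison the text describes.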