## 1. Introduction

With the development of urbanization, people place increasing demands on urban traffic and transportation. Taxis, owing to their flexibility and convenience, have become one of the most popular modes of urban transportation [1]. Taxis are an indispensable component of the urban transportation system and meet the travel demands of a great number of people. In many cities, however, people who want to take a taxi are used to hailing a vacant one by “roadside beckoning”. Therefore, the locations where taxi drivers pick up their passengers are highly random [2]. Both prospective passengers and taxi drivers lack information on the best location for a pick-up, so drivers have difficulty finding passengers while passengers find it difficult to locate a vacant taxi. Vacant taxis in urban road networks not only consume energy (i.e., oil and gas) unnecessarily but also occupy road space, causing traffic congestion and air pollution [3,4].

In recent years, taxis have been widely equipped with global positioning system (GPS) sensors, which are mobile devices that can monitor taxi locations and statuses at regular intervals. Therefore, a large number of GPS trajectories with spatio-temporal information have been collected. Such a large amount of tracking data provides an unprecedented opportunity to discover implicit information and understand taxi drivers’ driving behaviors, human mobility, and the dynamics of street networks [5,6]. The mining of taxi GPS trajectories has received increasing attention from the data mining, intelligent transportation, movement patterns and ubiquitous computing communities [2,7,8]. Several studies have considered the historical GPS trajectories of taxis, such as understanding human mobility [9,10,11,12], estimating traffic emissions [13,14,15,16], planning routes [17,18,19,20], and formulating taxi/passenger search strategies [3,6,21,22,23].

Unlike other transportation modes such as buses and the subway, which only operate along fixed lines, taxis are more flexible because drivers are able to plan their pick-up destinations and cruising routes. Historical pick-up/drop-off information can help drivers find their next passengers. Several studies have applied different approaches to taxi route-planning based on the historical GPS trajectories of taxis [1,2,3,6,8,21,22,23,24,25,26,27,28].

On the one hand, many studies focus on drivers’ strategies for finding passengers [8,21,23,24,27,28]. Examining the cruising patterns of experienced drivers can help increase cruising efficiency and reduce unnecessary cruising time. Zhang et al. described both efficient and inefficient taxi service strategies based on a large-scale GPS historical database [8]. Hu et al. discussed the characteristics of urban taxi drivers’ activity distributions at different temporal and spatial levels to identify taxi drivers’ operation patterns and searching-behavior patterns and thereby help taxi drivers reduce searching time [23]. Liu et al. presented a framework including a series of models to study how a taxi driver gathers and learns information in an uncertain environment; they found that drivers not only learn from their own experiences but also communicate with other drivers [28].

On the other hand, many studies tend to concentrate on determining the hot spots of pick-up locations [1,2,6,29]. Hot spots of pick-up locations exist on road networks at different times in a single day [3,22]. Experienced drivers usually know where they are more likely to quickly pick up their next passengers after dropping passengers off instead of remaining vacant on the road network. Moreira et al. presented a novel application using time-series forecasting techniques to predict the taxi-passenger demand at taxi stands at 30-minute intervals to improve taxi-driver mobility intelligence [2]. Liu et al. discussed the crowdedness of moving objects and explored hot spots from the crowdedness using historical GPS trajectories [9]. Hwang et al. proposed a grid-based clustering approach considering four factors, including waiting time, distance, average revenue and the probability of finding passengers when clustering, to recommend the next pick-up locations to taxi drivers [30].

In this paper, we consider the clustering of pick-up locations to explore the hot spots of recommended locations. In addition, we consider computing the probability of picking up the next passengers (along the routes or at pick-up locations) and how to allocate the cruising routes to more than one taxi driver in a small-scale region of neighboring pick-up locations. We integrate the abovementioned factors into a spatio-temporal trajectory (STT) model. The framework of this paper is illustrated in Figure 1.

## 3. Spatio-Temporal Trajectory Model

In this study, a spatio-temporal trajectory model is proposed to recommend to taxi drivers locations where they can quickly pick up passengers and suitable routes to these locations. Three factors have been considered in the STT model: the pick-up locations cluster, average taxi travel speed and cruising routes.

#### 3.1. Cluster of Pick-Up Locations

After passengers get out, or while a taxi cruises the road network without a fare, the taxi is vacant and the driver must find the next passengers. Experienced drivers tend to go to locations where they are more likely to pick up passengers. Especially during off-peak periods, experienced drivers usually wait at locations such as shopping centers or bus and train stations.

Pick-up locations usually represent the hot spots where there are high taxi travel demands [2,21,28,32]. Thus, we can combine the historical pick-up information to explore high-demand areas and analyze the distribution patterns of pick-up hot spots. Clustering is a feasible and meaningful approach to identify hot spots of moving vehicles in an urban area [9]. In particular, using a density-based clustering method, we can discover clusters with arbitrary shapes and avoid the adverse impacts of noise and unusual points. Moreover, density-based clustering has a strong ability to address geographical characteristics [37]. In this work, pick-up locations were explored by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm [38], which is one of the most common clustering algorithms cited in the scientific literature [39,40,41,42]. Given a set of pick-up points in a study area, it groups together points that are closely packed, marking points that lie alone in low-density regions as outliers. DBSCAN requires two parameters: epsilon (Eps) and minimum points (MinPts). In addition, weekdays have different patterns from weekends [8]. In our experiment, the parameters were different for weekdays and weekends (Table 2).
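As an illustration of the clustering step, the following is a minimal pure-Python sketch of DBSCAN on projected pick-up coordinates. The function names, the toy coordinates (assumed to be in metres) and the values `eps=130` and `min_pts=3` are assumptions made for this example only; they are not the parameters listed in Table 2, and the brute-force neighbour search would not scale to a full trajectory data set.

```python
from collections import deque

def manhattan(a, b):
    """Manhattan (city-block) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def dbscan(points, eps, min_pts, dist=manhattan):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force region query; the result includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical pick-up points: two dense groups and one isolated point.
pickups = [(0, 0), (50, 20), (30, 60), (80, 40),
           (1000, 1000), (1040, 990), (1010, 1050), (1060, 1030),
           (5000, 5000)]
labels = dbscan(pickups, eps=130, min_pts=3)
print(labels)
```

In this toy input the two dense groups receive separate cluster ids and the isolated point is marked as noise, which is the behaviour the paragraph above relies on to suppress unusual points.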

#### 3.2. Average Taxi Travel Speed and Travel Time

Taxis operate throughout the day and cover most road segments in the road network; therefore, we can derive the average taxi travel speed and travel time from taxi trajectory data [5]. The average travel speed was computed by averaging the travel speeds of all taxis passing through a road segment $r$ within each specific time period $t$, as shown in Equation (1) [20].

where $m$ is the total number of taxis, ${n}_{i}$ records the total number of points of taxi $i$ on road segment $r$ within time period $t$, ${V}_{\left(i,j\right)}$ denotes the ${j}^{th}$ instantaneous travel speed of taxi $i$, and ${n}_{r}\left(t\right)$ denotes the total number of points of all taxis and is equal to $\sum_{i=1}^{m}{n}_{i}$. Then, the average travel time on road segment $r$ during time period $t$ can be computed as follows:

where ${L}_{r}$ is the length of road segment $r$.
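Read literally, the definitions above describe a point-weighted mean of the instantaneous speeds recorded on segment $r$ during period $t$, and a travel time equal to the segment length divided by that mean. The sketch below implements this reading; the function names, the dictionary layout and the units (km/h, km, minutes) are assumptions made for illustration.

```python
def average_speed(speeds_by_taxi):
    """Point-weighted mean of instantaneous speeds on one segment in one period.

    speeds_by_taxi maps a taxi id to the list of instantaneous speeds (km/h)
    of its GPS points on the segment; each list length plays the role of n_i.
    """
    total_points = sum(len(v) for v in speeds_by_taxi.values())  # n_r(t)
    if total_points == 0:
        return None  # no observations for this segment and period
    return sum(sum(v) for v in speeds_by_taxi.values()) / total_points

def travel_time_minutes(length_km, avg_speed_kmh):
    """Average travel time on a segment of length L_r, in minutes."""
    if not avg_speed_kmh:
        return None  # speed unknown or zero
    return 60.0 * length_km / avg_speed_kmh

# Hypothetical observations: two taxis, three GPS points in total.
speeds = {"taxi_1": [20.0, 30.0], "taxi_2": [40.0]}
avg = average_speed(speeds)               # (20 + 30 + 40) / 3
minutes = travel_time_minutes(0.5, avg)   # 0.5 km at the average speed
print(avg, minutes)
```

Because every GPS point carries equal weight, a taxi that reports more points on the segment influences the average more than one that reports fewer.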

#### 3.3. Cruising Routes

Based on the clustered pick-up locations and average taxi travel speed, we are then able to explore the cruising routes to pick-up locations. We can obtain multiple (at least one) routes from the current point to any pick-up location within the road network. In the literature, most studies have considered the best route choice from the start to the pick-up locations using different approaches. In this paper, we consider how to allocate the cruising routes to the neighboring pick-up locations among more than one taxi driver in a small-scale region. For example, using our method, taxi driver $A$ is given one route from the current location ${L}_{a}$ to pick-up location $P$, while taxi driver $B$, not far from $A$, might be recommended the same route to $P$. If all taxi drivers in a small-scale region follow the same route, the road network may become overloaded, which would increase congestion. In the STT model, we first explore multiple optimal routes using the $k$ shortest path routing algorithm. Then, the pick-up probability of each selected route is considered. Finally, based on the cruising routes and their pick-up probabilities, we use a load balancing technique to allocate the cruising routes to taxi drivers. The details of these steps are described in the following sections.

The $k$ shortest path routing algorithm is an extension of the shortest path routing algorithm [43], in which more than one route between two locations is obtained in the road network. The algorithm not only finds the shortest path but also finds $k-1$ other paths in increasing order of cost [44,45]. Parameter $k$ denotes the number of shortest paths to find.
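To make the idea of returning paths in increasing order of cost concrete, here is a small sketch that enumerates loopless paths with a best-first search over partial paths. This is a naive substitute chosen for brevity, not the specific algorithm of [44,45], and it can be slow on large networks. The graph, the node names and the edge costs are hypothetical.

```python
import heapq

def k_shortest_paths(graph, source, target, k):
    """Return up to k loopless paths as (cost, [nodes]) in increasing cost order.

    graph maps a node to a dict {neighbour: non-negative edge cost}.
    """
    found = []
    heap = [(0.0, [source])]  # partial paths ordered by accumulated cost
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path:  # never revisit a node, so paths stay loopless
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found

# Hypothetical road network; costs could be average travel times in minutes.
roads = {
    "A": {"B": 2, "C": 4},
    "B": {"C": 1, "P": 7},
    "C": {"P": 3},
    "P": {},
}
routes = k_shortest_paths(roads, "A", "P", k=3)
for cost, path in routes:
    print(cost, " -> ".join(path))
```

With non-negative edge costs, a complete path is popped from the heap only after every cheaper partial path has been expanded, so the results come out sorted by cost.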

Our STT model is designed to provide taxi drivers with pick-up locations and the cruising routes to these locations. However, determining how to define a “good” cruising route is an important issue. In this work, we computed the probability of picking up the next passengers using the method proposed by Yuan et al. [3]. We use this method because it favors a high probability (along the route or at the recommended location) of picking up passengers, a short waiting time, a short queue length at the pick-up location and a long distance for the next trip.

A cruising route depends on a certain pick-up location $P$ and a route $R$ to $P$ (route $R$ is divided into a number of connected road segments, i.e., $R=\left\{{r}_{1},{r}_{2},{r}_{3},\dots ,{r}_{n}\right\}$, where $n$ denotes the total number of road segments). Taxi drivers may pick up passengers on $R$ or at $P$; however, the worst situation is when the taxi driver fails to pick up passengers after waiting at $P$ for a time ${t}_{p}$ (${t}_{p}$ is divided into a sequence of time segments with interval $\tau $, i.e., ${t}_{p}=m\tau $, where $m$ denotes the total number of time segments). Let $E$ be the event that the driver succeeds in picking up a passenger if he/she selects route $R$. Then,

where $Pr\left(E\right)$ is the probability of $E$, $p\left({r}_{i}\right)$ denotes the probability of picking up passengers on road segment ${r}_{i}$, and $p\left({p}_{j}\right)$ denotes the probability of picking up passengers at $P$ after waiting for $j\tau $ time.

In addition, the conditional expectations of the duration (denoted by $T$) and distance (denoted by $D$) from current time ${t}_{0}$ to the beginning of the next trip are computed for the purpose of making comparisons between the recommended routes and historical routes. Based on Bayes’ rule,

where $E(T|E)$ is the conditional expectation of the duration, and ${t}_{i}$ denotes the driving time to arrive at road segment ${r}_{i}$. $Pr\left({E}_{i}\right)$ denotes the probability of the event that a driver succeeds in picking up a passenger at ${r}_{i}$, and

$\mathrm{Pr}({E}_{i})=\left\{\begin{array}{ll}p({r}_{1}), & i=1\\ p({r}_{i})\times \prod_{k=1}^{i-1}(1-p({r}_{k})), & i=2,3,\dots ,n\end{array}\right.$

The conditional expectation of the distance is as follows:

where ${d}_{i}$ denotes the driving distance to arrive at the road segment ${r}_{i}$.
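The piecewise definition of $\mathrm{Pr}({E}_{i})$ can be sketched in a few lines. The code covers only the road-segment part of the route; waiting at $P$ is not modelled here. The function name and the probabilities in the example are hypothetical.

```python
def segment_success_probabilities(p):
    """Pr(E_i) for each segment: the first pick-up happens on segment i.

    p is the list [p(r_1), ..., p(r_n)] of per-segment pick-up probabilities.
    """
    probs = []
    miss_so_far = 1.0  # probability of no pick-up on segments 1 .. i-1
    for p_i in p:
        probs.append(p_i * miss_so_far)
        miss_so_far *= 1.0 - p_i
    return probs

# Hypothetical per-segment probabilities for a three-segment route.
p_segments = [0.1, 0.2, 0.5]
pr_e = segment_success_probabilities(p_segments)
on_route = sum(pr_e)  # probability of a pick-up somewhere along the route
print(pr_e, on_route)
```

Since the events $E_i$ are mutually exclusive, their probabilities sum to one minus the probability of missing on every segment, which gives a quick sanity check on the per-segment inputs.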

Load balancing is a critical method to improve the distribution of workloads across multiple computing resources in computing fields [46]. This method aims to optimize resource use, maximize throughput, and avoid the overloading of any single resource. One of the most commonly used applications of load balancing is to provide a single Internet service from multiple servers [47]. In the present study, the load balancing method was used to allocate the cruising routes to the neighboring pick-up locations among more than one taxi driver in a small-scale region. Specifically, the weighted round-robin scheduling algorithm [48], one of the most commonly used load balancing algorithms [49], was adopted. In this method, the servers and connections are two fundamental elements. Each server should be assigned a weight representing its processing capacity. Servers with higher weights receive new connections before those with smaller weights. In this work, we defined vacant taxis as connections, cruising routes as servers and the probability of picking up the next passengers as the processing capacity of the servers. The pseudocode is provided in Algorithm 1.

**Algorithm 1:** The Weighted Round-Robin Scheduling Algorithm

**Inputs:** Candidate cruising routes set as $\mathbf{R}=\left\{R_{0},R_{1},\dots ,R_{n-1}\right\}$; $W\left(R_{i}\right)$ indicates the probability of picking up a new passenger at $R_{i}$; $i$ indicates the route selected last time, and $i$ is initialized with −1; $cw$ is the current weight in scheduling and is initialized with 0; $max\left(R\right)$ is the maximum weight of all the routes in $\mathbf{R}$; $gcd\left(R\right)$ is the greatest common divisor of all route weights in $\mathbf{R}$.

**Output:** The allocated route $R_{i}$

while true do
  $i$ ← ($i+1$) mod $n$;
  if $i==0$ then
    $cw$ ← $cw$ − $gcd\left(R\right)$;
    if $cw\le 0$ then
      $cw$ ← $max\left(R\right)$;
      if $cw==0$ then
        return NULL;
      end if
    end if
  end if
  if $W\left(R_{i}\right)\ge cw$ then
    return $R_{i}$;
  end if
end while

An example of this method is given in Figure 3. Assuming that there are three recommended pick-up locations (${P}_{1}$, ${P}_{2}$, and ${P}_{3}$) and four recommended cruising routes ${R}_{1}^{1}$, ${R}_{2}^{1}$, ${R}_{3}^{1}$ and ${R}_{3}^{2}$ with the weights 60%, 50%, 40% and 30%, respectively, 10 taxis (${C}_{i},i\in \left[1,10\right]$) are to be allocated. Based on our method, ${R}_{1}^{1}$ will be assigned to ${C}_{1}$, ${C}_{2}$, ${C}_{4}$ and ${C}_{7}$; ${R}_{2}^{1}$ will be assigned to ${C}_{3}$, ${C}_{5}$ and ${C}_{8}$; ${R}_{3}^{1}$ will be assigned to ${C}_{6}$ and ${C}_{9}$; and ${R}_{3}^{2}$ will be assigned to ${C}_{10}$.
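The allocation in this example can be reproduced with a short sketch of Algorithm 1 written as a Python generator. Expressing the weights as integer percentages (so that a greatest common divisor is defined), the function name and the route labels are assumptions made for this illustration.

```python
from functools import reduce
from itertools import islice
from math import gcd

def weighted_round_robin(weights):
    """Yield route indices indefinitely, following Algorithm 1.

    weights is a non-empty list of non-negative integers
    (here, pick-up probabilities expressed in percent).
    """
    n = len(weights)
    step = reduce(gcd, weights)   # gcd(R)
    max_w = max(weights)          # max(R)
    i, cw = -1, 0
    while True:
        i = (i + 1) % n
        if i == 0:
            cw -= step
            if cw <= 0:
                cw = max_w
                if cw == 0:
                    return  # all weights are zero: nothing to allocate
        if weights[i] >= cw:
            yield i

# Routes and weights from the example above; taxis C1..C10 arrive in order.
routes = ["R1^1", "R2^1", "R3^1", "R3^2"]
weights = [60, 50, 40, 30]
picks = list(islice(weighted_round_robin(weights), 10))
allocation = {f"C{t + 1}": routes[r] for t, r in enumerate(picks)}
print(allocation)
```

A generator keeps the scheduler state ($i$ and $cw$) between calls, so each newly vacant taxi simply takes the next yielded route.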

## 4. Results

Pick-up time statistics are presented in Figure 4 and Figure 5. Note that all the time periods or hours in this study refer to Beijing time. The pick-up patterns look similar from Monday to Friday, and the patterns on the two weekend days are also similar to each other. On both weekdays and weekends, demand is lowest between 4 a.m. and 5 a.m. In addition, the pick-up times on weekends are always lower than those on weekdays. With respect to peak times, there are four peak times in one day during weekdays (9 a.m. to 10 a.m., 2 p.m. to 3 p.m., 5 p.m. to 6 p.m. and 8 p.m. to 9 p.m.), while three peak times occur on weekends (10 a.m. to 12 p.m., 2 p.m. to 3 p.m., and 9 p.m. to 10 p.m. on Saturday and 5 p.m. to 6 p.m. on Sunday). Moreover, people seem to go out later on weekends than on weekdays, but the duration of time that people are away is usually longer.

The average duration and average distance from the current time to the beginning of the next trip vary at different parts of the day (Figure 6). There are rarely passengers before dawn regardless of the day of the week, and thus, drivers must spend substantial time and drive a long way to find passengers; drivers spend less time searching for passengers during high-demand periods. Compared with weekdays, drivers spend less time searching for passengers on weekends before dawn. In addition, people spend more time away from home on weekends. Furthermore, drivers spend less time searching for passengers on weekdays before 10 a.m.

The cluster counts of pick-up locations varied at different times in one day (Table 3). We observed that in the early morning, whether on weekdays or weekends, the number of clusters was the lowest. During this time, few people went out, and there was a low demand for taxis. As the number of people going out increased, taxi demand grew, and the number of clusters began to increase. There was little difference in the number of clusters on weekends or weekdays in the morning. In the afternoon, however, the number of clusters on weekdays was more than double that on the weekend, which might be because people spend more time away from home on weekends than on weekdays. During the evening, the demand for taxis gradually reduced, and the number of hot spot demand areas began to decrease.

The regional distribution of the cluster counts of pick-up locations at different times in one day is presented in Figure 7 (weekdays) and Figure 8 (weekends). Note that the legend time represents the average time spent picking up a passenger. We found the following:

In the early morning, high-demand areas were mainly in the Dongcheng District and Chaoyang District on weekdays (Figure 7a) and especially on weekends (Figure 8a). Eastern Beijing has both foreign affairs areas and business districts, and because many business centers are located there, taxi demand occurred in the early morning. Moreover, as people stay out longer on weekends, there was also greater demand on weekends.

At 8 a.m. or 9 a.m., demand was centered around public transportation stations, such as the Beijing Station and Beijing West Station (Figure 7b and Figure 8b).

Later in the day, the number of high-demand areas began to increase. Chaoyang is the largest district in terms of population, followed by the Haidian District and Fengtai District. Approximately 45.6% of Beijing’s population lives in these districts. Figure 7 and Figure 8 show that high-demand areas were concentrated in these districts. There are many good colleges and high schools in the Haidian District. Manufacturing and scientific research groups are concentrated between West Second Ring Road and West Third Ring Road. Moreover, most of the population is employed in the Haidian District and Fengtai District. Therefore, during the day, the number of high-demand areas in the Haidian District and Fengtai District was greater than that in the Chaoyang District; this difference was particularly pronounced on the weekends (Figure 7c–e and Figure 8c,d).

During the evening, the number of high-demand areas in the Chaoyang District changed more slowly than in the Haidian District and Fengtai District, particularly on the weekends. The major reason might be that there are several catering and entertainment businesses in the Chaoyang District (Figure 7f–h and Figure 8e,f).

Traffic resistance was detected using average taxi travel speed and travel time on weekdays and weekends (Figure 9 and Figure 10).

According to the Evaluation Index System of Urban Road Traffic Management provided by the Ministry of Public Security, a travel speed of >30 km/h indicates no congestion, a travel speed of between 20 km/h and 30 km/h indicates mild congestion, a travel speed of between 10 km/h and 20 km/h indicates congestion, and a travel speed of <10 km/h indicates strong congestion. We observed that before 5 a.m., traffic conditions were at their best regardless of the day of the week. Almost all the roads were clear (Figure 9a,b and Figure 10a,b), because during this period, there were fewer people and cars going out. Moreover, on weekdays, roads became increasingly congested from 6 a.m. onwards. During the peak morning hours, the average speed was approximately 25 km/h (Figure 9c). However, the traffic conditions were slightly better in the early afternoon (Figure 9d,e). During the evening peak (5 p.m. to 7 p.m.), roads again became congested, and the congestion worsened (Figure 9f); the average speed was only approximately 22.9 km/h. After the evening peak, the traffic began to clear, and at midnight, there were no jammed roads (Figure 9g,h). On the weekends, traffic conditions were better than on weekdays. There was no morning peak or evening peak (Figure 10c,f), but the roads were still congested during the day (Figure 10d,e). After the evening peak, the traffic began to clear. This pattern shows that people’s activities affect traffic conditions.
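The speed thresholds quoted above translate directly into a small lookup. The text does not say which class the boundary values of exactly 10, 20 and 30 km/h belong to, so their treatment below is an assumption, as is the function name.

```python
def congestion_level(speed_kmh):
    """Map an average travel speed (km/h) to the congestion class in the text.

    Boundary handling is an assumption: exactly 30 km/h counts as mild
    congestion, exactly 20 km/h as mild congestion, exactly 10 km/h as congestion.
    """
    if speed_kmh > 30:
        return "no congestion"
    if speed_kmh >= 20:
        return "mild congestion"
    if speed_kmh >= 10:
        return "congestion"
    return "strong congestion"

# Average speeds reported for the weekday morning and evening peaks.
morning_peak = congestion_level(25.0)
evening_peak = congestion_level(22.9)
print(morning_peak, evening_peak)
```

Under these thresholds, both reported weekday peak averages fall in the mild congestion band, even though individual road segments may be more heavily congested.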

In a simulation experiment, separate statistical comparisons were conducted for time and distance to verify the effect of the cruising routes recommended by the STT model. Specifically, we divided one day (24 h) into 24 time periods. Next, we randomly extracted 1000 historical taxi routes in each time period. Then, for each route, the historical and expected time spent and driving distance from the drop-off location to the pick-up location of the next passenger were estimated. Statistical comparisons are shown in Figure 11. We observed that the historical times spent were mostly greater than the expected times; that is, taxi drivers using our recommended route to find the next passengers can significantly reduce the time spent. Likewise, taxi drivers on the historical routes traveled greater distances than those on our recommended routes. In conclusion, taxi drivers using cruising routes recommended by our STT model can significantly reduce the waiting time (by an average of about 2.98 min on weekdays and 3.81 min on weekends) and travel less distance (by an average of about 2.92 km on weekdays and 2.86 km on weekends) to quickly find their next passengers.

In the STT model, the $k$ shortest path routing algorithm was presented, and $k$ cruising routes were obtained. Then, the load balancing method was presented to allocate these $k$ cruising routes to the neighboring pick-up locations among more than one taxi driver in a small-scale region. However, if these $k$ cruising routes and their pick-up probabilities were offered directly to taxi drivers, who then chose for themselves, most drivers would choose the cruising route with the highest probability. This would cause a large number of taxi drivers to select the same road to find their next passengers and would reduce the accuracy of the probability computed in our model. Considering this, we compared the results with and without the load balancing method (Figure 12). We observed that the load balancing strategy significantly alleviates road loads. When the load balancing method was not used in the STT model, the maximum percentage was close to 50% in the off-peak period (approximately 5 a.m. to 7 a.m.). However, the maximum percentage fell to just 30% when the load balancing method was used in our STT model.

## 5. Discussion

In this work, we present an STT model based on the historical GPS trajectories of taxis to explore spatio-temporal traffic patterns and guide taxi navigation. Furthermore, the STT model can provide useful information for taxi drivers to quickly pick up their next passengers, further saving time and gas and increasing profits. In addition, the taxi fleet management method in our model could help ease urban traffic problems.

Numerous previous studies have explored ways to optimize cruising routes and make recommendations to taxi drivers. One such study used time series forecasting techniques to predict the spatio-temporal distribution in real time and developed an online recommendation system for taxi stand choice in the city of Porto, Portugal [50]. A fleet equipped with this recommendation system can reduce the average waiting time by 5%, but the study did not compare weekends with weekdays. Another study proposed a Time-Location-Relationship combined taxi service recommendation model that uses Gaussian Process Regression and statistical approaches to improve taxi drivers’ profits. Using taxi GPS data from Beijing, the authors compared their model with ARIMA, SVM and other models and found that it predicted more accurately [51]. Moreover, Wong et al. emphasized that taxi drivers’ cruising decisions are significantly affected by the probability of successfully picking up passengers along the cruising route [4]. Yuan et al. focused on extracting passenger waiting areas for taxi drivers and computing the probability of picking up their next passengers based on the time spent, road segment information and accessibility to waiting areas [3,22]. Qian proposed a method that transformed the taxi-routing issue into a Markov decision process of pick-up locations [52].

However, these taxi route-planning studies have a limited quantitative focus on the association of cruising routes with traffic resistance and have paid less attention to taxi fleet management. The present article fills this gap by estimating the average taxi travel speed using the historical GPS trajectories of taxis and by using a load balancing method, which is widely used in computing fields. Three factors, namely the pick-up locations cluster, average taxi travel speed and cruising routes, have been considered in the STT model. The STT model is multi-integrated and mainly designed for macroscopic taxi fleet management; as a simple instance, cruising route recommendation for taxi drivers was demonstrated on the basis of this global fleet management. As a result, our study shows that taxi drivers using cruising routes recommended by our STT model can significantly reduce the average waiting time and travel less distance to quickly find their next passengers, and that the load balancing strategy significantly alleviates road loads. Our results support a growing body of recent literature underlining the advantage of combining statistical methodologies with load balancing algorithms in transportation pattern mining.

With regard to the clustering of pick-up locations, the Manhattan distance, rather than the Euclidean distance, is used to measure the similarity between two clusters. Based on the different pick-up patterns in different periods, we divided a day (24 h) into 8 time periods on weekdays and 6 time periods on weekends (Table 3). Then, pick-up cluster analysis in the different time periods was conducted. With respect to the parameters of the DBSCAN algorithm, MinPts and Eps were determined through an interactive process by examining the sorted K-dist graph [53]. Generally, we took the value on the y-axis (distance) corresponding to the turning point as Eps [37]. The K-dist graph is shown in Figure 13. We observed that the distances corresponding to the turning point were approximately 130 m on weekdays and 140 m on weekends.
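The sorted K-dist graph mentioned above plots, for every point, the distance to its $k$-th nearest neighbour in descending order. A minimal sketch of how such a curve could be computed is given below; the function names and the toy coordinates (assumed to be in metres) are hypothetical, and the turning point itself is still read off the plotted curve by inspection.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two 2-D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def sorted_k_dist(points, k, dist=manhattan):
    """Distance from each point to its k-th nearest neighbour, sorted descending.

    Requires more than k points. Brute force, so intended for small samples only.
    """
    k_dists = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(d[k - 1])
    return sorted(k_dists, reverse=True)

# Hypothetical sample: three nearby pick-ups and one far-away outlier.
sample = [(0, 0), (10, 0), (0, 10), (500, 500)]
curve = sorted_k_dist(sample, k=2)
print(curve)
```

In the toy sample, the outlier produces one very large value at the start of the curve, followed by a sharp drop to the small distances of the dense group; that drop is the kind of turning point used to choose Eps.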

Then, average taxi travel speed and travel time were estimated to explore traffic flow patterns. Moreover, these speeds were considered a critical factor when computing the pick-up probability of recommended routes.

The $k$ shortest path routing algorithm was presented to explore multiple optimal routes. In the STT model, parameter $k$ was dynamically determined by estimating the travel time of finding the next passengers. Specifically, a cruising route was retained if its travel time was less than β times that of the shortest path. β was defined as the cruising routes threshold in our model. Because people throughout the study area should be able to take taxis, the cruising routes should cover most roads. Therefore, we examined the relationship between β and road coverage in different time periods (Figure 14). We observed that at any time, whether on weekends or on weekdays, if only one path is planned, it cannot cover all the main roads, especially in the early morning. When β is 1.5, almost all of the time intervals have a coverage of more than 99%. When β is 1.6, almost all of the time intervals have a coverage of more than 99.5%. Thus, when β changes from 1.5 to 1.6, the coverage is not significantly improved, but the number of calculations increases greatly. Therefore, β was set to 1.5.

However, several improvements should be made in future studies. Firstly, the DBSCAN clustering algorithm was used to explore high-demand areas from the historical pick-up information. With DBSCAN, we can discover pick-up clusters with arbitrary shapes, avoid the adverse impacts of noise and unusual points, and address geographical characteristics. However, the lack of a deterministic and adaptive way of setting its parameters was not considered and will be examined in future studies. Secondly, owing to limited access to taxi GPS trajectory data, only one month of data was collected and used in our experiment, so the sample size was small. We therefore recognize that this is only a preliminary analysis of transportation pattern mining and that it has limitations. The results of our study highlight the usefulness of our model as an exploratory data analysis tool for trajectory mining. Moreover, as another shortcoming, this study does not consider certain variables, such as the handling of emergent traffic situations and the quantitative effect on energy consumption, because these data were lacking. It will be very interesting to extend the approach to consider these variables. Finally, while the data used in this study are somewhat dated (i.e., 2012) and are not a real-time feed, it is possible to extend the method to enable real-time recommendations based on immediate past or historical records.