Revealing Recurrent Urban Congestion Evolution Patterns with Taxi Trajectories

Urban congestion can be classified into two types: Recurrent Congestion (RC) and Non-Recurrent Congestion (NRC). RC is more regular than NRC, having fixed and long-standing patterns. Mining urban recurrent congestion evolution patterns can assist with congestion cause analysis and the creation of alleviating strategies. Most existing methods for analyzing urban congestion patterns are based on traditional traffic detector data, which is inflexible and expensive. Additionally, prior research primarily focused on the microscopic model, which simulated congestion propagation based on theoretical models and hypothetical networks. As such, most previous models and methods are difficult to apply to real case scenarios. Therefore, we investigated recurrent congestion patterns by mining historical taxi trajectory data that were collected in Harbin, China. A three-step method is proposed to reveal urban recurrent congestion evolution patterns. Firstly, a grid-based congestion detection method is presented by calculating the change in taxi global positioning system (GPS) trajectory patterns. Secondly, a customized cluster algorithm is applied to measure the recurrent congestion area. Finally, a series of indicators are proposed to reflect RC evolution patterns. A case study was competed in the Harbin urban area to evaluate the main methods. Finally, RC cause analysis and alleviating strategy are discussed. The results study are expected to provide a better understanding of urban RC evolution patterns.


Introduction
Recurrent congestion (RC) and Non-Recurrent Congestion (NRC) are two typical types of congestion occurring in urban areas [1].RC is usually caused by insufficient traffic capacity, excess travel demand, and poor signal control [2,3], to name a few.This type of congestion is regular with typical fixed features, such as having the same generating period, similar spatio-temporal influence scope, duration, etc. [4].Compared with NRC, RC has a more deleterious effect on the travel time of urban commuters and travelers.Revealing recurrent congestion patterns can help city traffic managers improve signal controllers [5] and urban planners rebuild inferior traffic infrastructures [6], to further alleviate urban recurrent congestion [7].
With the adoption of location-based services (LBS), increasing amounts of multiple-source locating information can be obtained in a city.For example, an increasing number of taxi vehicles have been equipped with GPS modules, so large amounts of taxi GPS data can be collected.In China, taxis are part of public transit, with large numbers of vehicles travelling the urban road network at all times.For example, more than 13,000 taxis operate in Harbin, China, and most of the vehicles are equipped with a GPS module [8].With a sampling frequency of 30 s, almost 30 million GPS records are obtained every 24 h for managing and operating the taxis.These locating data can reflect the traffic state of the urban environment.Since the late 2000s, a broad range of transportation research has been conducted based on taxi trajectory data, such as building a taxi recommender system [9], predicting link and route travel time [10], and detecting urban traffic congestion [11].In contrast to traditional traffic detectors, such as loops, microwaves, and video detectors, using taxi GPS data has the obvious advantages of mobility, extensive range coverage, and being inexpensive [12].Therefore, using taxi GPS trajectory data to study urban RC patterns is reasonable and valid, and the method can be widely applied at a lower cost in comparison with methods based on traditional traffic detectors.
Many existing studies have been completed on urban congestion evolution patterns, and most were based on microscopic traffic flow models, including a cell transmission model (CTM) [13] and a car-following model [14].For example, Zhang et al. [15] proposed a bi-level programming model based on CTM to investigate the spatio-temporal distribution of traffic flow and the location of variable message signs (VMS) under non-recurrent congestion conditions.Chu et al. [16] proposed a modified CTM (MCTM) to depict the temporal-spatial evolution of traffic congestion on urban freeways.Yang et al. [17] used the MCTM model to investigate the traffic congestion formation and diffusion process due to road capacity drop-down after traffic incidents.Additionally, some studies used car-following models to demonstrate urban congestion evolution patterns.For example, Chen et al. [18] presented a behavioral car-following model based on empirical trajectory data that was able to reproduce spontaneous congestion formation and the ensuing propagation of stop-and-go waves in traffic.Zhu et al. [19] used an enhanced car-following model to estimate delay and emissions for urban signalized intersections.Papathanasopoulou and Antoniou [20] presented an innovative methodological framework based on a data-driven approach to estimate car-following models, which could be used for measuring congestion evolution patterns.Besides, some studies proposed a congestion pattern measuring model based on Dynamic Bayesian Network [21] and other machine learning methods.
These studies have provided valuable insights into urban congestion patterns.However, few studies have focused on urban recurrent congestion patterns.Compared with NRC, RC has more regular patterns, such as fixed periodicity, frequency, and fixed co-occurrence.Additionally, most existing studies used the microscopic traffic flow model to depict urban congestion evolution patterns.These studies simulated congestion propagation based on the theoretical model and hypothetical two-way grid network, which complicated the application of these theories and methods to real life scenarios.In addition, to the best of our knowledge, few studies researched recurrent congestion evolution patterns on a macroscopic level.Therefore, we modeled urban recurrent congestion patterns based on taxi GPS trajectory data at a macroscopic level.
The primary purpose of this study is to present an effective method to explore the RC pattern in an urban area.We had three additional auxiliary goals: to develop an urban congestion detection method based on GPS data at the grid level, to discover the RC area, and to illustrate the RC patterns.
The remainder of this paper is organized as follows: Section 2 introduces GPS trajectory data and the study area with divided grids, and also presents the main methods, including detecting grid congestion, discovering the RC area, and measuring the RC evolution patterns.Section 3 presents the experiment for the main methods and discusses the results, and Section 4 provides our conclusions.

Data Collection
Generally speaking, taxi GPS data contain information such as GPS device identification (ID) including taxi ID, location information (i.e., latitude and longitude), timestamp, instantaneous velocity, and direction [22].In this paper, taxi ID, latitude, longitude, timestamp, and instantaneous velocity were required for measuring the taxi trajectory pattern.In this section, using Harbin taxi GPS data as an example, details about the data are as follows.
Due to the high latitude location, the winter in Harbin lasts from November to March, for a total of 5 months [23].In winter, the snowy weather and snow accumulation on the road strongly affects city travelers, aggravating urban congestion.As such, we conducted our study during the winter of 2015.Since over 20 million GPS records are created in one day, 5 weeks corresponding to the 5 months were chosen as the sampling period.The GPS data's sampling rate is 30 s, and the total number of samples for the 25 days was 506 million.Table 1 illustrates a sample Harbin taxi GPS record.In this paper, oracle 11 g was used to manage the GPS data sets, and ArcGIS 10.0 was used for the GPS interface to manipulate and visualize the GPS data.In addition, the core of the analytical inspection and graphics were fully executed in the R 3.3.3environment.

Study Area
Taxi vehicles usually travel around the road network in an urban area.We chose a study area located within the urban area to ensure sufficient taxi GPS locating data would be obtained.Besides, the urban area should be divided into equal-sized grids, which is very simple and easy to implement [22].In this section, the Harbin urban area was divided into grids as an example.
Harbin is the capital of Heilongjiang Province and is located in Northeast China.Harbin was ranked as one of the most seriously congested cities in China in 2015 [24], leading us to choose Harbin as the study city.Additionally, the second-ring-road area, which includes Daxin Street, Beixin Street, and Nanzhi Road, was chosen as the specific study area in this paper (Figure 1a).The Harbin second-ring-road is 30 km long and covers 67.2 km 2 , which was sufficiently large to research recurrent urban patterns.
velocity were required for measuring the taxi trajectory pattern.In this section, using Harbin taxi GPS data as an example, details about the data are as follows.
Due to the high latitude location, the winter in Harbin lasts from November to March, for a total of 5 months [23].In winter, the snowy weather and snow accumulation on the road strongly affects city travelers, aggravating urban congestion.As such, we conducted our study during the winter of 2015.Since over 20 million GPS records are created in one day, 5 weeks corresponding to the 5 months were chosen as the sampling period.The GPS data's sampling rate is 30 s, and the total number of samples for the 25 days was 506 million.Table 1 illustrates a sample Harbin taxi GPS record.In this paper, oracle 11 g was used to manage the GPS data sets, and ArcGIS 10.0 was used for the GPS interface to manipulate and visualize the GPS data.In addition, the core of the analytical inspection and graphics were fully executed in the R 3.3.3environment.

Study Area
Taxi vehicles usually travel around the road network in an urban area.We chose a study area located within the urban area to ensure sufficient taxi GPS locating data would be obtained.Besides, the urban area should be divided into equal-sized grids, which is very simple and easy to implement [22].In this section, the Harbin urban area was divided into grids as an example.
Harbin is the capital of Heilongjiang Province and is located in Northeast China.Harbin was ranked as one of the most seriously congested cities in China in 2015 [24], leading us to choose Harbin as the study city.Additionally, the second-ring-road area, which includes Daxin Street, Beixin Street, and Nanzhi Road, was chosen as the specific study area in this paper (Figure 1a).The Harbin second-ring-road is 30 km long and covers 67.2 km 2 , which was sufficiently large to research recurrent urban patterns.This paper examined urban recurrent congestion evolution patterns on a macroscopic level.Thus, the study area was divided into many square grids.Generally, smaller square grids capture more urban congestion details.However, the sampling frequency of Harbin GPS data is 30 s, and the average velocity of a Harbin taxi is about 23 km/h [25].Therefore, on average, a taxi vehicle travels about 190 m in each sampling interval.As such, if the square grid size is too small, one grid cannot capture a taxi location point, even if the vehicle passed through the grid.Therefore, the gird size in this paper was determined to be 200 m.
The Harbin second-ring-road area ranges from 45.72 to 45.79 longitude and 126.59 to 126.7 latitude, which is nearly a 10 × 10 km area.We adjusted this range somewhat to 45.712555-45.799049longitude and 126.582505-26.706473,so that the study area could be divided into integer numbers by 200 m, and the divided grids would still cover the entire second-ring-road area.The divided grids are shown in Figure 1b.There were 2500 grids in total, and 1680 of them were used as the study grids, as shown by the shaded area in Figure 1.The urban area was divided into n × m square grids, and G i, j was defined as one specific grid, where i = 1, 2, . . ., n and j = 1, 2, . . ., m.In Figure 1b, n and m were both 50.A taxi GPS trajectory was defined as a set of continuous locating points: P 1 → P 2 → . . .→ P n , and each locating point was defined as: P DevID, Lat, Lng, TS, v , where DevID, Lat, Lng, TS, and v represent taxi ID, latitude, longitude, timestamp, and instantaneous velocity.Respectively (Table 1).The relationship between the divided grids and a set of GPS trajectories is shown in Figure 2.For the Time Interval [TS 1 , TS n ], this specific taxi vehicle travels via a set of grids including and G i + 2, j + 2 .If only one taxi trajectory exists, each of these 8 grids obtain only one trajectory in [TS 1 , TS n ], and only grids G i − 2, j − 1 and G i − 1, j − 2 obtain one trajectory in [TS 1 , TS 2 ], whereas the other 6 grids obtain none.
This paper examined urban recurrent congestion evolution patterns on a macroscopic level.Thus, the study area was divided into many square grids.Generally, smaller square grids capture more urban congestion details.However, the sampling frequency of Harbin GPS data is 30 s, and the average velocity of a Harbin taxi is about 23 km/h [25].Therefore, on average, a taxi vehicle travels about 190 m in each sampling interval.As such, if the square grid size is too small, one grid cannot capture a taxi location point, even if the vehicle passed through the grid.Therefore, the gird size in this paper was determined to be 200 m.
The Harbin second-ring-road area ranges from 45.72 to 45.79 longitude and 126.59 to 126.7 latitude, which is nearly a 10 × 10 km area.We adjusted this range somewhat to 45.712555-45.799049longitude and 126.582505-26.706473,so that the study area could be divided into integer numbers by 200 m, and the divided grids would still cover the entire second-ring-road area.The divided grids are shown in Figure 1b.There were 2500 grids in total, and 1680 of them were used as the study grids, as shown by the shaded area in Figure 1.The urban area was divided into nm  square grids, and , G i j was defined as one specific grid, where .In Figure 1b, n and m were both 50.A taxi GPS trajectory was defined as a set of continuous locating points: 12 ...
, and each locating point was defined as: , , , , P DevID Lat Lng TS v , where DevID , Lat , Lng , TS , and v represent taxi ID, latitude, longitude, timestamp, and instantaneous velocity.Respectively (Table 1).The relationship between the divided grids and a set of GPS trajectories is shown in Figure 2.For the Time Interval   1 , n TS TS , this specific taxi vehicle travels via a set of grids including Illustration of relationship between grids and GPS trajectory.

Detecting Grid Congestion
Each grid was considered a special traffic detector that can collect traffic flow information.In this study, the number of taxi GPS trajectories (N) and the mean velocity of these GPS trajectories (V) in a specific Time Interval ( TI ) were obtained for each grid.The mean velocity of the taxi GPS trajectories was calculated as follows:

Detecting Grid Congestion
Each grid was considered a special traffic detector that can collect traffic flow information.In this study, the number of taxi GPS trajectories (N) and the mean velocity of these GPS trajectories (V) in a specific Time Interval (TI) were obtained for each grid.The mean velocity of the taxi GPS trajectories was calculated as follows: where v i is the mean velocity of the ith trajectory, and i = 1, 2, . . ., N, which is calculated as follow: if there is only one locating point in the grid Combined with taxi GPS trajectory and divided grids, three kinds of grid are possible, which are described as follows: • Type 1: no road is included in this kind of grid, which means no GPS trajectory can be obtained in this grid.As shown in Figure 3 ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 5 of 18 where i v is the mean velocity of the i th trajectory, and 1, 2,..., iN  , which is calculated as follow: , if there is only one locating point in the grid 1 ... , else 1 Combined with taxi GPS trajectory and divided grids, three kinds of grid are possible, which are described as follows: • Type 1: no road is included in this kind of grid, which means no GPS trajectory can be obtained in this grid.As shown in Figure 3, grid .Including low-grade road, this kind of grid usually obtains a lower value of N.
Due to the randomness of the small number of vehicles, V varies considerably, as shown in    Due to the complicated distribution of different grid types, determining a fixed threshold to detect congestion in all kinds of grids was complicated.When congestion typically occurs, a large amount of lower speed vehicles gather in a specific area [5,26], creating an abnormal N and V pattern As such, grid congestion was detected by calculating the change in taxi trajectory pattern, including N and V .Therefore, a Grid State ( GS ) was defined as

 
, GS N V to reflect the taxi trajectory patterns in a specific grid.
The Euclidean Distance is used to measure the difference between the Grid State Three times the standard deviation value is usually used for filtering anomalous values in the field of statistics [27].This was the cut off used in this paper to measure abnormal Grid States (i.e., abnormal taxi trajectory patterns in a specific grid), as follows: where was calculated as follow: combined with the lower V information: A specific grid in time interval i TI was identified as being congested if Equations ( 4) and ( 6) were both satisfied.Due to the complicated distribution of different grid types, determining a fixed threshold to detect congestion in all kinds of grids was complicated.When congestion typically occurs, a large amount of lower speed vehicles gather in a specific area [5,26], creating an abnormal N and V pattern As such, grid congestion was detected by calculating the change in taxi trajectory pattern, including N and V. Therefore, a Grid State (GS) was defined as GS(N, V) to reflect the taxi trajectory patterns in a specific grid.
The Euclidean Distance is used to measure the difference between the Grid State GS TI 1 in TI 1 and the Grid State GS TI 2 , as follows: Three times the standard deviation value is usually used for filtering anomalous values in the field of statistics [27].This was the cut off used in this paper to measure abnormal Grid States (i.e., abnormal taxi trajectory patterns in a specific grid), as follows: where GS avg[TI 1 ,TI i−1 ] was calculated as follow: combined with the lower V information: A specific grid in time interval TI i was identified as being congested if Equations ( 4) and ( 6) were both satisfied.

RC Area Identification
After detecting the congestion of all grids, the RC area (i.e., a set of grids) is discovered in this section.
As shown in Figure 5, for a 5 × 5 grids area, after identifying grid congestion, the congestion frequency was obtained for a specific period, such as one week or one month.For example, grid G i, j was congested during 12 time intervals, so has a congestion frequency (c f ) of 12.The spatial adjacent grids that had greater congestion frequency (c f ) value were labeled recurrent congestion areas, which were considered during the clustering process [6].In Figure 5, a 5-grid area with greater congestion frequency was clustered as a recurrent congestion area, consisting of G i, j − 2 , G i, j − 1 , G i, j , G i, j + 1 and G i − 1, j + 1 .
For a 50 × 50 grids area, the exact number of clusters (i.e., RC areas) cannot be known in advance, and the shape of the cluster can be random and non-convex.As such, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was applied to solve this problem.DBSCAN identifies random and non-convex shape clusters and determining the number of clusters in advance is not required [28,29].We used Liu's [30] customized DBSCAN algorithm.

RC Area Identification
After detecting the congestion of all grids, the RC area (i.e., a set of grids) is discovered in this section.
As shown in Figure 5, for a 55  grids area, after identifying grid congestion, the congestion frequency was obtained for a specific period, such as one week or one month.For example, grid , G i j was congested during 12 time intervals, so has a congestion frequency ( cf ) of 12.The spatial adjacent grids that had greater congestion frequency ( cf ) value were labeled recurrent congestion areas, which were considered during the clustering process [6].In Figure 5, a 5-grid area with greater congestion frequency was clustered as a recurrent congestion area, consisting of ,2 For a 50 50  grids area, the exact number of clusters (i.e., RC areas) cannot be known in advance, and the shape of the cluster can be random and non-convex.As such, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was applied to solve this problem.DBSCAN identifies random and non-convex shape clusters and determining the number of clusters in advance is not required [28,29].We used Liu's [30] customized DBSCAN algorithm.

Measuring the RC Evolution Pattern
We first explain the process of one traffic jam. Figure 6 illustrates the five typical congestion states in the congestion process: Congestion Start (CS), Congestion End (CE), Congestion Peak (CP), Congestion Propagation (CPr), and Congestion Dissipation (CD).In Figure 6, the red-shaded grids represent congestion in the corresponding time interval.
The five congestion states are described as follows: • Congestion Start ( CS ) represents the beginning of a traffic jam in the specific RC area.

CS
T represents the start grids of this jam.In Figure 6, at least one grid is congested in 1 TI , and none of the 7 grids is congested in the 3 previous time intervals (i.e.,

CE
T is the end time of the jam, which corresponds to a single time interval.In Figure 6, at least one grid is congested in n TI , and none of the 7 grids are congested in the 3 latter time intervals,

Measuring the RC Evolution Pattern
We first explain the process of one traffic jam. Figure 6 illustrates the five typical congestion states in the congestion process: Congestion Start (CS), Congestion End (CE), Congestion Peak (CP), Congestion Propagation (CPr), and Congestion Dissipation (CD).In Figure 6, the red-shaded grids represent congestion in the corresponding time interval.
The five congestion states are described as follows: • Congestion Start (CS) represents the beginning of a traffic jam in the specific RC area.T CS represents the start time of a jam, which corresponds to a single time interval.The congested grids {G CS } in T CS represents the start grids of this jam.In Figure 6, at least one grid is congested in TI 1 , and none of the 7 grids is congested in the 3 previous time intervals (i.e., TI −2 , TI 1 and TI 0 ), meaning this jam started in TI 1 .

•
Congestion End (CE) is the end of a traffic jam in the specific RC area.T CE is the end time of the jam, which corresponds to a single time interval.In Figure 6, at least one grid is congested in TI n , and none of the 7 grids are congested in the 3 latter time intervals, TI n+1 , TI n+2 and TI n+3 , which means this jam ended in TI n .
• Congestion Peak (CP) is the peak with the maximum number of congested grids in a specific RC area.T CP is the peak time of a jam, which corresponds to at least one time interval.In Figure 6, all the 7 grids are congested in TI i , which means this jam reach peak in TI i .Notably, the particular jam reached a peak in several time intervals, thus the congestion peak contains all states from the first peak to the last.

•
Congestion Propagation (CPr) is the state between CS and CP.T CPr represents the propagating time of a jam, which corresponds to several time intervals between T CS and T CP .In Figure 6, this state lasts from TI 2 to TI i−1 .

•
Congestion Dissipation (CD) is the states between CP and CE.T CD represents the dissipating time for a jam that corresponds to several time intervals between T CP and T CD .In Figure 6, this state lasts from TI i+1 to TI n−1 .
For a long period such as one week or one month, there are a total of n traffic jams occurring in the RC area.The recurrent congestion spatial-temporal evolution pattern details are described in the following section.
ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 8 of 18 • Congestion Peak ( CP ) is the peak with the maximum number of congested grids in a specific RC area.

CP
T is the peak time of a jam, which corresponds to at least one time interval.In Figure 6, all the 7 grids are congested in i TI , which means this jam reach peak in i TI .Notably, the particular jam reached a peak in several time intervals, thus the congestion peak contains all states from the first peak to the last.

•
Congestion Propagation ( CPr ) is the state between CS and CP .In Figure 6, this state lasts from For a long period such as one week or one month, there are a total of n traffic jams occurring in the RC area.The recurrent congestion spatial-temporal evolution pattern details are described in the following section.

RC Temporal Evolution Pattern
Three temporal indicators of the RC are defined as follows: • RC End time RCD T is calculated as: • RC time

RC
T is calculated as:

RC Temporal Evolution Pattern
Three temporal indicators of the RC are defined as follows: • RC Start time T RCS is calculated as: • RC End time T RCD is calculated as: • RC time T RC is calculated as:

RC Spatial Evolution Pattern
Two spatial indicators of the RC are as follows: • RC Start Grid is the set of congested grids in T CS of all traffic jams.

•
RC Key Grid is a grid that propagates to the adjacent grids with most frequency.For adjacent grids G 1 and G 2 , if grid G 1 is congested and G 2 is not congested in TI 1 , and then G 2 is congested in TI 2 , it means that traffic jam propagate from G 1 to G 2 for one time.In Figure 6, the traffic jam propagates from G i, j to G i, j + 1 and G i + 1, j in TI 2 .

Detecting Grid Congestion
The detecting method was applied for the three types of grids, including G 12, 39 (type 2), G 38, 40 (type 3) and G 9, 46 (type 4).One-week data, from 16-22 November 2015, were used to verify this detecting method.The time interval was set to 10 min, which adequately reflects the change in traffic state in each grid, to ensure each grid obtains enough data to apply the methods [6].In Table 2, Equations ( 4) and ( 6) were both satisfied from 4:50 to 6:20 p.m., for a total of nine time intervals, which means grid G 12, 39 was congested during these intervals.As shown in Figure 7a, the N and V distrbutions appear to be abnoraml in these time intervals, with greater N and lower V.
The N and V distributions present regularly, as shown in Figure 8a.The congestion of grid G 12, 39 occurred during the moring and evening peak hours of each day from 16-22 December 2015, demonstrating periodicity.
In Table 3, Equation ( 4) and ( 6) were both satisfied from 07:40 to 08:10 a.m., for a total if three time intervals, which means grid G 38, 40 ws congested during these intervals.As shown in Figure 7b, the N and V distrbutions appear to be abnormal in these time intervals, with greater N and lower V.The N and V distrbutions of grid G 38, 40 present regularly, as shown in Figure 8b.The congestion of grid G 38, 40 occurred at the peak morning hours of each day for 16-22 December 2015, demonstrating periodicity.
In Table 4, Equations ( 4) and ( 6) were both satisfied from 6:50 to 7:20 p.m., for a total of three time intervals, which means grid G 9, 46 was congested during these intervals.As shown in Figure 7c, the N and V distributions appear to be abnormal in these time intervals, with greater N and lower V.

Measuring the RC Evolution Pattern
The data from the 25 days were applied in the customized DBSCAN algorithm [30].A total 13 recurrent congestion areas were discovered in Harbin second-ring road area at morning peak hours, which are shown in Figure 9.The threshold of congestion frequency was set to 375, which means that all 13 recurrent congestion grids were detected to be congested in more than 15 time intervals for each day on average.That is to say, the RC T values of all the recurrent congestion were more than 150 min.The 13 RC areas included 62 grids, 4.8 grids on average.
The details of spatial evolution pattern of the recurrent congestion occur in the 13 RC areas are illustrated in Table 5.

Measuring the RC Evolution Pattern
The data from the 25 days were applied in the customized DBSCAN algorithm [30].A total 13 recurrent congestion areas were discovered in Harbin second-ring road area at morning peak hours, which are shown in Figure 9.The threshold of congestion frequency was set to 375, which means that all 13 recurrent congestion grids were detected to be congested in more than 15 time intervals for each day on average.That is to say, the T RC values of all the recurrent congestion were more than 150 min.The 13 RC areas included 62 grids, 4.8 grids on average.
The details of spatial evolution pattern of the recurrent congestion occur in the 13 RC areas are illustrated in Table 5.The temporal evolution pattern of the recurrent congestion occur in the 13 RC areas are illustrated in Figure 11.
• Some remarkable characteristics of the RC temporal evolution patterns were observed during the winter in Harbin at peak morning hours.The start time of most RC is earlier than morning peak hours.Generally speaking, the urban morning peak hours were from 7:00 to 9:00 a.m., with 77% (N = 10) of the RC appearing earlier than 7:00 a.m.On average, the RC start time was 6:45 a.m.The earliest RC start time was observed in Cluster 6 at 6:00 a.m.additionally, the end times for all RC were later than the morning peak hours.All the RC disappeared after 9:00 a.m., and 92% (N = 12) disappeared after 10:00 a.m.The RC that occurred in Clusters 4 and 10 did not end until afternoon.We also noted that the duration of all RC was longer than the morning peak hours.All the RC lasted for more than two hours.The average congestion time for all RC was 4.7 h.
The temporal evolution pattern of the recurrent congestion occur the 13 RC areas are illustrated in Figure 11.
• Some remarkable characteristics of the RC temporal evolution patterns were observed during the winter in Harbin at peak morning hours.The start time of most RC is earlier than morning peak hours.Generally speaking, the urban morning peak hours were from 7:00 to 9:00 a.m., with 77% (N = 10) of the RC appearing earlier than 7:00 a.m.On average, the RC start time was 6:45 a.m.The earliest RC start time was observed in Cluster 6 at 6:00 a.m.additionally, the end times for all RC were later than the morning peak hours.All the RC disappeared after 9:00 a.m., and 92% (N = 12) disappeared after 10:00 a.m.The RC that occurred in Clusters 4 and 10 did not end until afternoon.We also noted that the duration of all RC was longer than the morning peak hours.All the RC lasted for more than two hours.The average congestion time for all RC was 4.7 h.

Discussion
One of the aims of this paper was to determine how understanding the RC evolution patterns would help operators and traffic planners to analyze causes and create alleviating strategies.Due to the complicated situation of each RC area, the causal factors and alleviating strategies for all 13 RC areas are different.Therefore, a typical RC area, Cluster 4, was selected as an example.
As shown in Figure 12, Cluster 4 has three grids, along with Shitoudao Street.The grid that most frequently propagates to the adjacent grids (i.e., the RC Key Grid) was 16,17 G .

Discussion
One of the aims of this paper was to determine how understanding the RC evolution patterns would help operators and traffic planners to analyze causes and create alleviating strategies.Due to the complicated situation of each RC area, the causal factors and alleviating strategies for all 13 RC areas are different.Therefore, a typical RC area, Cluster 4, was selected as an example.
As shown in Figure 12

Discussion
One of the aims of this paper was to determine how understanding the RC evolution patterns would help operators and traffic planners to analyze causes and create alleviating strategies.Due to the complicated situation of each RC area, the causal factors and alleviating strategies for all 13 RC areas are different.Therefore, a typical RC area, Cluster 4, was selected as an example.
As shown in Figure 12    In addition, administration and supervision should be strengthened to reduce the temporary roadside parking.Also, Shitoudao Street (from east to west) should be rebuilt increase traffic capacity.

Conclusions
Taxi GPS trajectories are valuable resources that can be used to measure urban recurrent congestion and evolution patterns.Based on taxi GPS trajectories, we proposed a novel stepwise method to measure the recurrent urban congestion evolution patterns.We also completed a case study in the urban Harbin area, using real GPS data and digital maps rather than simulation data.In understanding the RC evolution patterns, the city traffic manager can visualize recurrent urban congestion at the macroscopic level.The main contributions of this paper can be summarized as follows: (1) The main method proposed is based on taxi trajectory data, which has obvious advantages including extensive coverage and lower cost in comparison with the data collected from traditional traffic detectors, such as loops, microwaves, and video detectors.(2) We created the grid congestion detection method using multivariate trajectory pattern analysis, including the number of taxi trajectories and their average velocity.Then, a 3 − σ rule was used to detect abnormal patterns in the grid.Combined with the lower average velocity information, a specific grid is identified as being congested.(3) An integrated stepwise method is proposed to measure urban RC evolution patterns at the macroscopic level, including detecting grid congestion, determining RC areas, and measuring the RC evolution patterns.The method provides a new solution for Intelligent Transportation System application.(4) Based on the proposed methods, a case study was completed in Harbin, China with real GPS data and a digital map, rather than simulation data, to evaluate the stepwise method.A total of 13 RC areas were detected that occur in the Harbin second ring road area in winter during peak morning hours.In addition, some remarkable spatial-temporal evolution pattern characteristics of RC were revealed in this paper.
Further studies may expand the data source to other GPS-equipped vehicles.In addition, by using the results of the RC evolution pattern, congestion prediction method could be proposed.

Figure 1 .
Figure 1.Illustration of the study area: (a) The second-ring-road in Harbin, China, and (b) The study area divided into 50 50  grids.

Figure 1 .
Figure 1.Illustration of the study area: (a) The second-ring-road in Harbin, China, and (b) The study area divided into 50 × 50 grids.

Figure 2 .
Figure 2. Illustration of relationship between grids and GPS trajectory.
, grid G 14, 26 is this type.A total of 255 grids of this kind are shown in Figure 3. • Type 2: at least one intersection with signal control is included in this kind of grid, such as G 12, 39 .Including high-grade road, this kind of grid usually has greater value of N and V. Due to the signal control, the value of V varies minimally, as shown in Figure 4. • Type 3: at least one intersection with no signal control is included in this kind of grid, such as grid G 38, 4 .Including low-grade road, this kind of grid usually obtains a lower value of N. Due to the randomness of the small number of vehicles, V varies considerably, as shown in Figure 4. • Type 4: no intersection is included in this kind of grid, as per grid G 9, 46 .Compared with the other 2 kinds of grids, the variation in N and V vary considerably, as shown in Figure 4.
Figure 4. • Type 4: no intersection is included in this kind of grid, as per grid 9, 46 G .Compared with the other 2 kinds of grids, the variation in N and V vary considerably, as shown in Figure 4.

Figure 3 .
Figure 3. Illustration of three kinds of grid.Figure 3. Illustration of three kinds of grid.

Figure 3 .
Figure 3. Illustration of three kinds of grid.Figure 3. Illustration of three kinds of grid.

Figure 4 .
Figure 4.The N -V scatterplot of different grid types (the data are collected on 21 December 2015, and the time interval is 10 min of nature time.).

Figure 4 .
Figure 4.The N-V scatterplot of different grid types (the data are collected on 21 December 2015, and the time interval is 10 min of nature time.).

Figure 5 .
Figure 5.The example of congestion frequency of 55  grids.
time of a jam, which corresponds to a single time interval.The congested grids   CS G in

2 TI  , 1 TI and 0 TI
), meaning this jam started in 1 TI .•CongestionEnd ( CE ) is the end of a traffic jam in the specific RC area.
 , which means this jam ended in n TI .
CD ) is the states between CP and CE .

Figure 6 .
Figure 6.A typical congestion process occurring in a seven-grid area.

Figure 6 .
Figure 6.A typical congestion process occurring in a seven-grid area.

•
Some remarkable characteristics of the RC spatial evolution patterns were observed during the winter in Harbin at morning peak hours.Firstly, the RC spatial distribution is extremely uneven.In the second-rind road area of Harbin, the 13 RC areas are concentrated in two main regions, as shown in Figure 10.The first area is a two-kilometer radius circular region, centered on Harbin Train Station.This region is in the northwest part of the urban Harbin area, containing eight RC areas.The second area is also a two-kilometer radius circular region, centered on the Heilongjiang Provincial Government Office Building.This region is in the southern part of the Harbin urban area, containing 4 RC areas.Secondly, the RC areas are very different from each other.The RC range varied considerably.For example, Cluster 12 included 13 grids, with a value almost eight times that of those found in Clusters 2, 3, 7, and 11.The Start Grid (or Key Grid) types are different.For example, the Start Grid in Cluster 12 (G 30, 35 ) is type 2, whereas it is type 3 in Cluster 7 (G 20, 24 ).ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 13 of 18

Figure 9 .
Figure 9.The 13 recurrent congestion areas in the Harbin second-ring road area.• Some remarkable characteristics of the RC spatial evolution patterns were observed during the winter in Harbin at morning peak hours.Firstly, the RC spatial distribution is extremely uneven.In the second-rind road area of Harbin, the 13 RC areas are concentrated in two main regions, as shown in Figure 10.The first area is a two-kilometer radius circular region, centered on Harbin Train Station.This region is in the northwest part of the urban Harbin area, containing eight RC areas.The second area is also a two-kilometer radius circular region, centered on the Heilongjiang Provincial Government Office Building.This region is in the southern part of the Harbin urban area, containing 4 RC areas.Secondly, the RC areas are very different from each other.The RC range varied considerably.For example, Cluster 12 included

Figure 9 .
Figure 9.The 13 recurrent congestion areas in the Harbin second-ring road area.

Figure 10 .
Figure 10.The spatial distribution of the recurrent congestion (RC) areas.Figure 10.The spatial distribution of the recurrent congestion (RC) areas.

Figure 10 .
Figure 10.The spatial distribution of the recurrent congestion (RC) areas.Figure 10.The spatial distribution of the recurrent congestion (RC) areas.

Figure 11 .
Figure 11.The temporal evolution pattern of the 13 recurrent congestion areas.

Figure 11 .
Figure 11.The temporal evolution pattern of the 13 recurrent congestion areas.

18 Figure 11 .
Figure 11.The temporal evolution pattern of the 13 recurrent congestion areas.
, Maimai, and Yimian Streets are in these three grids.The RC that occurs in this area usually starts at 7:30 a.m.The start grid is 16Street.The grid that most frequently propagates to the adjacent grids (i.e., the RC Key Grid) was 16,17 G .

Figure 12 .
Figure 12.The three grids of Cluster 4.Figure 12.The three grids of Cluster 4.

Figure 12 .
Figure 12.The three grids of Cluster 4.Figure 12.The three grids of Cluster 4.

3. 3 . 1 .
RC Cause Analysis Harbin No. 1 Middle School and Harbin No. 1 Hospital are in this RC area, which place high demands on the roadway infrastructure.The school and hospital both open at 8:00 a.m., and heavy traffic travels along with Shitoudao Street from east to west, where drivers must turn right to reach the entrance of the school and hospital.The entrances are in G 16, 17 on Diduan Street, which is a north-south one-way street.Shitoudao Street is the only way to reach the entrances, which has only one east-west lane.Therefore, the RC usually first appears in G 16, 17 when heavy traffic exceeds the road capacity.Then the RC spreads to upstream areas, such as G 17, 17 and G 18, 17 ).3.3.2.RC Alleviating Strategies Due to the lack of parking spaces for Harbin No.1 Middle School and Harbin No. 1 Hospital, many roadside temporary parking spaces exist along Shitoudao and Diduan Streets.Parking lots should be constructed for Harbin No. 1 Middle School and Harbin No. 1 Hospital to meet the parking demands.

Table 1 .
A typical global positioning system (GPS) taxi records.

Table 1 .
A typical global positioning system (GPS) taxi records.

Table 2 .
Sample of the calculation results for the type 2 grid.

Table 3 .
Sample of the calculation results of the type 3 grid.

Table 4 .
Sample of the calculation results for the type 4 grid.

Table 5 .
The details of spatial evolution pattern of all recurrent congestion.

Table 5 .
The details of spatial evolution pattern of all recurrent congestion.