Grid Mapping for Spatial Pattern Analyses of Recurrent Urban Traffic Congestion Based on Taxi GPS Sensing Data

Traffic congestion is one of the most serious problems that impact urban transportation efficiency, especially in big cities. Identifying traffic congestion locations and occurring patterns is a prerequisite for urban transportation managers in order to take proper countermeasures for mitigating traffic congestion. In this study, the historical GPS sensing data of about 12,000 taxi floating cars in Beijing were used for pattern analyses of recurrent traffic congestion based on the grid mapping method. Through the use of ArcGIS software, 2D and 3D maps of the road network congestion were generated for traffic congestion pattern visualization. The study results showed that three types of traffic congestion patterns were identified, namely: point type, stemming from insufficient capacities at the nodes of the road network; line type, caused by high traffic demand or bottleneck issues in the road segments; and region type, resulting from multiple high-demand expressways merging and connecting to each other. The study illustrated that the proposed method would be effective for discovering traffic congestion locations and patterns and helpful for decision makers to take corresponding traffic engineering countermeasures in order to relieve the urban traffic congestion issues.


Introduction
As contemporary GPS sensor technology enables us to track vehicle trajectories in a traffic network, it provides an alternative way to monitor traffic operation performance in a large traffic network with low cost but high efficiency. In particular, taxi floating car data (FCD) collected from installed GPS equipment presents an opportunity for governments and scholars to detect and describe traffic congestion locations and patterns across the whole traffic network, which were previously difficult to identify due to the lack of traffic data [1].
FCD technology is a new approach for gathering traffic information, which is one of the most significant aspects in the field of Intelligent Transportation Systems (ITS). In essence, taxi FCD are a random sample of the entire urban road network. Thousands of taxis drive every day on the roads of large cities, such as New York, Boston, and Beijing [2]. More than half of the taxis have been equipped with GPS data recorders, and the majority of them work for the whole day. Taxi FCD technology has advantages in three aspects. Firstly, real-time traffic data can be automatically collected and sent to a processing center, which facilitates the extraction of information about traffic conditions. Secondly, the entire road network can be treated as a collection of monitored areas. Neither fixed sensor surveillance nor loop detectors can cover a large-scale roadway network [3]. In contrast, owing to the flexibility and number of floating vehicles, it is possible to monitor the majority of roads in a roadway network. Finally, high-quality data can be collected at minimum cost via GPS equipment in vehicles [4].
Previous studies have successfully applied GPS sensor devices to survey traffic for individuals [5], freeways [6], and signalized intersections [7]. These devices are also an important data collector for location-based services (LBS) and deliver traffic data for vehicle navigation services [8]. The research and application of FCD can be summarized at three levels, namely the macro, meso, and micro levels. At the macro level, FCD analysis results can be used to evaluate the effectiveness of traffic planning implementation and assist city planners in identifying problems that were not expected during the planning stages [9]. At the meso level, the characteristics of travel speed can reflect the features of road networks [10], and the tracks of floating vehicles can even be used to discover driving route distributions and update the network structures or attributes [11]. At the micro level, FCD can be applied to urban traffic incident detection, segment demand-capacity analyses, intersection delay, and so on. For example, by monitoring traffic flow from the different approaches of signalized intersections, the real-time traffic operation status at intersections can be established [12].
From the perspective of traffic management, taxi GPS data are of significant value for traffic operation performance assessment, particularly for traffic congestion identification. There are two types of traffic congestion in urban road networks: recurrent congestion and non-recurrent congestion. Non-recurrent congestion occurs owing to random events, such as traffic crashes, roadside breakdowns, or other incidents. In contrast, recurrent congestion regularly takes place at fixed locations once traffic demand is higher than capacity [13]. If the locations, times, and intensities of recurrent congestion can be identified, urban transportation managers can take proper countermeasures according to the congestion patterns. Drivers can also use this knowledge to avoid frequently delayed routes or regions and reduce travel delay.
The traditional methods of traffic congestion identification are based on fixed sensor data and traffic flow analyses of the relationship among highway demands, free-flow speeds, and capacities [14]. Compared with traditional traffic detectors (e.g., loop detectors, traffic cameras, remote transportation microwave sensors), floating cars have several obvious advantages such as lower cost, wider coverage, and higher mobility. Thus, FCD-based methods have received considerable attention in non-recurrent congestion identification during the last decade. A temporary incident can be detected by analyzing floating vehicles' segment travel times and acceleration noise through statistical models [15]. An abnormal traffic condition can also be reflected by the travel speed in the road segments [16], with the average speed of floating vehicles in blocked or congested segments usually below 10 km/h [17]. By employing speed and temporal features of the segments identified on the road network, unique traffic patterns on each road can be characterized and the traffic states can be described on a segment-by-segment basis [18]. The typical algorithms that have been explored for incident or non-recurrent congestion detection include density-based clustering [19], support vector machine modeling [20], neural network-based modeling [21], etc.
FCD analyses give substantial knowledge about traffic operation patterns of urban road networks [22]. However, few studies have focused on recurrent congestion estimation using FCD. In fact, historical taxi FCD are especially appropriate for assessing the level of traffic congestion in a city [23] and for scanning the spatial and temporal patterns of traffic congestion in urban road networks, because the large samples collected in short time periods help provide a complete picture of the traffic situation in the network. In this study, we processed taxi FCD starting with the elimination of implausible (e.g., mismatched GPS positions) and irrelevant (e.g., from taxis waiting for customers) data. Then, the trajectories of floating vehicles were used to generate road networks based on the method of grid mapping, and a map matching process was conducted by matching the FCD to the cells of the map. Further, the method of Density-Based Spatial Clustering of Applications with Noise was adapted to fit traffic grid modeling analyses and cluster the congested cells. Finally, GIS visualization technology was used for traffic congestion pattern analyses.

Taxi GPS Trajectories Data Preprocessing
The data used in this paper were derived from the taxi-FCD system, which has been installed as the standard taxi GPS equipment in Beijing. Through the system, the data were sent to the FCD server at the taxi company's headquarters. The time interval of taxi position collection was limited by the bandwidth of the communication channel and varied between 10 and 120 s, depending on the status of the individual taxi. Every 10 min, the data collected in each taxi were uploaded to the remote servers of the headquarters.
The data used in this study include the spatial trajectories of 12,000 taxis, which accounted for 1.3% of all motor vehicles in Beijing. Due to their high travel frequency and long operation time, taxis actually made up as much as 10 percent of the whole traffic volume in Beijing [24]. The original data were encoded in ASCII and stored in text files. The total size of the data was 15.1 GB, segmented into 4884 TXT sub-files by 2-min time slices. The data were collected during the seven days from November 1st to November 7th, 2012.
The obtained data were saved as TXT files, in which each row is a record at one recording time and each column is an attribute. Table 1 lists the detailed data attributes of the FCD system. The attributes include vehicle number, taxi operation state, GPS recording time, longitude, latitude, vehicle speed, and GPS state. In particular, longitude and latitude give the taxi positions, which are the basic information used to track the taxi trajectories in the World Geodetic System 1984 (WGS84) coordinate system.

The original data's quality depends on a number of issues. Typical GPS errors may be caused by either blockage of the GPS signal or hardware/software bugs during the data collection process [25]. Previous research also found that drivers behave differently when taxis are in different operation statuses. When carrying passengers, drivers tend to drive faster and choose an optimal travel path. Conversely, without passengers, drivers drive slowly and choose a path through higher travel-demand regions to look for potential passengers [26]. Figure 1 shows that the speeds of taxis carrying passengers were consistently higher than the average speeds of taxis without passengers. Therefore, the data of taxis carrying passengers are more applicable for the traffic analyses.

Figure 2 shows the proportion and number of taxis by operation status. The blue line represents the proportion of taxis with passengers in the total number of taxis at different times. The results indicate that on workdays the proportion peaked at 8:00 a.m. and 6:00 p.m., whereas on weekends it peaked at 11:00 a.m. and remained largely continuous. Travel peaks on weekdays mainly occur during the morning and evening commuting periods, while on weekends the morning peak occurs from 10:00 a.m. to 12:00 noon. The proportion was always higher than 50%, indicating that at least 6000 taxis were carrying passengers at any time during the whole day.

It should be noted that even when the taxis are in operation status (serving passenger = 1), their behavior could differ from that of private passenger cars. In particular, around passenger pick-up at origins and drop-off at destinations, a taxi's speed could be continuously lower than the speed of adjacent passenger cars. In this study, if a taxi's speed was close to zero in more than three consecutive observations while other vehicles' speeds were above 10 km/h, the observations were treated as outliers and removed from the analyses.
Sustainability 2017, 9, 533

Ideally, the data acquisition frequency should be stable and concentrated. However, some errors inevitably occur in the process of data collection and transformation due to the instability of the recording medium. Thus, it is necessary to analyze the data acquisition frequency to prepare for the following analyses. The vehicle number was used as the key to construct a dictionary searching method. As Figure 3 shows, taxi sampling intervals of 10-15 s accounted for about 57% of the sample. The other sampling intervals were mostly more than 55 s. Intervals above 50 s indicate that the taxi service status was changing, so they cannot reflect the real traffic conditions at the time. On the other hand, only 19% of the samples have a sampling interval of less than 10 s. Therefore, the 10-15 s sampling intervals were the most available and reliable for the traffic analyses.
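The interval analysis described above can be illustrated with a minimal Python sketch. It assumes each record has already been reduced to a (vehicle number, timestamp in seconds) pair; the function names `sampling_intervals` and `share_in_range` are hypothetical and are not taken from the original FCD system.

```python
from collections import defaultdict


def sampling_intervals(records):
    """Return all gaps (in seconds) between consecutive records of each vehicle.

    records: iterable of (vehicle_id, timestamp_seconds) pairs, in any order.
    """
    by_vehicle = defaultdict(list)  # dictionary keyed by vehicle number
    for vehicle_id, ts in records:
        by_vehicle[vehicle_id].append(ts)
    intervals = []
    for times in by_vehicle.values():
        times.sort()
        intervals.extend(b - a for a, b in zip(times, times[1:]))
    return intervals


def share_in_range(intervals, low=10, high=15):
    """Fraction of intervals with low <= interval <= high seconds."""
    if not intervals:
        return 0.0
    return sum(low <= d <= high for d in intervals) / len(intervals)
```

Grouping by vehicle number first keeps the gaps of different taxis from being mixed, which is the role the dictionary plays in the text.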
Based on the above analyses, data cleansing was conducted in the following five steps: step 1 removes the data that contain invalid values owing to GPS errors (97% of the total data left); step 2 keeps the data of taxis that are carrying passengers in operation status (39% of the total data left); step 3 eliminates the data in which the taxi speeds are outside the range from 0 to 100 km/h (34.3% of the total data left); step 4 removes the data in which the taxis' trajectories are beyond the traffic analysis region (we focused on the road network in the core area of Beijing, namely the region within the Fifth Ring Road) (29.7% of the total data left); step 5 selects the data with sampling intervals of 10-15 s. Finally, 25.2% of the original data was left after completing the five steps, resulting in 48.76 million records in total.
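As an illustration only, the first four cleansing steps can be sketched as a chain of filters in Python. The `Record` fields mirror the attributes listed for the FCD system, but their names, the `REGION` bounding box, and the `cleanse` function are assumptions of this sketch; in particular, the box is a hypothetical stand-in for the analysis region, not the boundary used in the study. Step 5 needs per-vehicle time gaps and is left out.

```python
from dataclasses import dataclass


@dataclass
class Record:
    vehicle_id: str
    occupied: bool     # taxi operation state: True when carrying passengers
    timestamp: int     # GPS recording time, in seconds
    lon: float
    lat: float
    speed_kmh: float
    gps_valid: bool    # GPS state flag


# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat)
REGION = (116.20, 39.75, 116.55, 40.03)


def in_region(r, box=REGION):
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat


def cleanse(records):
    """Apply steps 1-4 in order; report the share of records left after each."""
    steps = [
        ("valid GPS", lambda r: r.gps_valid),
        ("carrying passengers", lambda r: r.occupied),
        ("speed within 0-100 km/h", lambda r: 0 <= r.speed_kmh <= 100),
        ("inside analysis region", in_region),
    ]
    total = len(records)
    kept = list(records)
    report = []
    for name, keep in steps:
        kept = [r for r in kept if keep(r)]
        report.append((name, len(kept) / total if total else 0.0))
    return kept, report
```

The returned report plays the same role as the percentages quoted in the text: each entry is the share of the original records still present after that step.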

Grid Mapping and Cell Size Identification
The traffic grid model in this research comes from the theory of city management grid modeling [27]. The city management area can be divided into grids of certain sizes either statically (the cell size is fixed for the long term) or dynamically (the cell size is varied and adjusted for different purposes of city management). The facilities within the grids are then managed and assigned to categories so as to improve management effectiveness. Since we focused on a traffic operation performance analysis of steady road facility systems, the grid mapping and cell size should be static for uncovering recurring congestion patterns in road networks. Although linearly assigning each taxi's trajectory to the road network is popular for traffic analyses, traffic grid models have their own advantages in two aspects. First, the sample rates are allowed to vary within a certain range, and grid modeling does not require complicated road matching algorithms, which involve subtle accuracy issues. Second, traffic grid models can be used to generate a cell-based road network map without the need for a high-quality vector GIS map.
In this study, the road network in the core traffic area of Beijing, which contains massive amounts of floating car data, was segmented by a grid into cells of the same size. Choosing an appropriate cell size according to regional road properties and floating car data was essential to ensure enough data for every cell while avoiding the double counting of trips. On the one hand, the road attributes in the grid, including expressways, arterials, sub-arterials, and collector streets, should be reflected at the cell level; if the cell size is too large, a cell may contain too many road types and the characteristics of different roads cannot be reflected. On the other hand, if the cell size is too small, it may cause offsets and leaps in continuous trajectories owing to the systematic uncertainties in the collection process; meanwhile, the taxi trajectory samples would be insufficient for some cells during some time periods, which would make the calculation of travel speed and trajectory positions difficult.
Specifically, for the Beijing road network, a large cell size, such as over 500 m, would let a single cell cover two parallel freeways with the same direction, which would cause the method to fail because the traffic conditions on different freeway segments could not be distinguished. With a small cell size, such as tens of meters, the GPS data falling into the cells during a 10 min time interval would not be dense enough, and the cell size would be smaller than the width of freeways. Additionally, the sampling intervals of the FCD applied in this study are 10-15 s, which means that with a cell size of tens of meters, one vehicle's speed measurement points would not fall continuously into adjacent cells. Balancing the various factors, a cell size of 100 × 100 m² was recommended for the grid modeling of the Beijing road network. Accordingly, the research area was divided into a grid of 300 × 300 cells. Thus, each segment in the major road network can be expressed with continuous cells and the trajectory data can be sufficiently sampled in each cell. The geographic coordinates of the boundary of the research area and the spatial grid modeling map are shown in Figure 4. The red grid mesh represents the road segments with speed limits above 70 km/h, which are mainly expressways. In comparison, the green grid represents the road segments with speed limits under 70 km/h, which mainly comprise arterial roads, collectors, and local roads. As the network grid was developed, the grid's attribute data were structured as shown in Table 2.

By extracting the characteristics of traffic attributes and visually displaying them on the grid map, the traffic operation performance patterns can be identified. The procedure includes map matching and traffic parameter estimation [8]. Map matching links the coordinates of vehicles with digital maps in order to locate the positions of vehicles. Traffic parameter estimation uses the trajectory information, including time and speed, to identify parameters that assess the traffic congestion levels based on certain criteria, such as average speed, travel time, etc.
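To make the cell-based map matching concrete, the following Python sketch converts a WGS84 position to a cell of a 300 × 300 grid with 100 m cells. The south-west corner `ORIGIN_LON`, `ORIGIN_LAT` is a hypothetical placeholder (the actual boundary coordinates are given only in Figure 4), and the equirectangular approximation is an assumption of this sketch rather than the projection used in the study.

```python
import math

CELL_SIZE_M = 100.0   # cell edge length, from the text
GRID_DIM = 300        # 300 x 300 cells, from the text
# Hypothetical south-west corner of the research area (WGS84 degrees)
ORIGIN_LON, ORIGIN_LAT = 116.20, 39.76
EARTH_RADIUS_M = 6371000.0


def cell_index(lon, lat):
    """Return (row, col) of the cell containing the point, or None if outside."""
    # Equirectangular approximation: adequate over a few tens of kilometres
    dx = (math.radians(lon - ORIGIN_LON) * EARTH_RADIUS_M
          * math.cos(math.radians(ORIGIN_LAT)))
    dy = math.radians(lat - ORIGIN_LAT) * EARTH_RADIUS_M
    col = math.floor(dx / CELL_SIZE_M)
    row = math.floor(dy / CELL_SIZE_M)
    if 0 <= row < GRID_DIM and 0 <= col < GRID_DIM:
        return row, col
    return None


def cell_id(row, col):
    """Flatten (row, col) into one index in 0..89,999 (n = 90,000 cells)."""
    return row * GRID_DIM + col
```

Because each GPS point is assigned by arithmetic on its coordinates alone, no road geometry is needed, which is the advantage of grid modeling noted earlier.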
Firstly, the trajectory data were matched to the cells based on the GPS data, which are defined as follows:

• Cell set: $R = \{r_1, r_2, r_3, \ldots, r_n\}$, $n = 90{,}000$
• Time slice set: $T = \{t_1, t_2, t_3, \ldots, t_m\}$, in terms of every 15 min
• GPS trajectory speed data set in cell $r_i$ during time $t_j$: $C_{i,j} = \{c_{i,j}\}$, the speed records observed in $r_i$ during $t_j$

Secondly, the vehicle speed performance index of cells was established for traffic operation analyses and assessment. When evaluating road traffic congestion, speed is an intuitive factor reflecting traffic congestion [28]. Therefore, we define a variable $cp(r_i)$ to represent the vehicle speed performance index of $r_i$, as shown in Equation (1), which is equal to the cell's free flow speed divided by the average operation speed for each 15 min:

$$cp(r_i) = \frac{r_i(\mu_F)}{\text{average operation speed of } r_i \text{ in the 15 min slice}} \qquad (1)$$

As the operation speeds of cells can be easily obtained from the GPS data, the free flow speed $r_i(\mu_F)$ is identified as the 95th percentile of the operation speed distribution.
In order to intuitively demonstrate the congestion degree of the road network, Min-Max normalization was used to map the congestion indexes into the range between 0 and 100, in which a value of 100 indicates the most congested condition and a value close to 0 indicates the free flow condition.
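The index and its normalization can be sketched in a few lines of Python. The sketch assumes speeds in km/h and uses the nearest-rank rule for the 95th percentile, since the text does not say which percentile estimator was applied; the function names are hypothetical.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def speed_performance_index(slice_speeds, all_speeds):
    """cp = free flow speed / mean speed of the cell in one time slice."""
    free_flow = percentile(all_speeds, 95)
    mean_speed = sum(slice_speeds) / len(slice_speeds)
    return free_flow / mean_speed  # the caller must ensure mean_speed > 0


def min_max_scale(values, low=0.0, high=100.0):
    """Map values linearly so that the minimum maps to low and the maximum to high."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]
```

Since a lower average speed gives a larger index, the largest index maps to 100, in line with the convention that 100 marks the most congested condition.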

Density-Based Spatial Clustering Algorithm for Traffic Congestion Pattern Analyses
In this study, the method of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was applied to cluster the congested cells, and GIS visualization technology was used for traffic congestion pattern analyses. DBSCAN finds a number of clusters starting from the estimated density distribution of the corresponding nodes [29]. It groups the cells that are closely reachable and separates the groups with different attributes. The DBSCAN method can be applied to data without prior knowledge of the number of clusters. The key definitions of DBSCAN are as follows:

• ε neighborhood: a circle with a radius of ε around a data point; ε is also called the threshold between data points.
• minPts: the minimum number of data points needed within an ε neighborhood.
• Core points: data points are defined as core points when the number of valid data points within the ε neighborhood is larger than the value of minPts.
• Border points: data points are defined as border points when they are reachable from certain core points but the number of data points within their ε neighborhood is smaller than the value of minPts.
• Outlier: a data point that is neither a core point nor a border point.
In this study, we adapted the DBSCAN algorithm to fit traffic grid modeling analyses. As cells are analyzed in the grid-based network, the data points in DBSCAN are redefined as cells. First, the ε neighborhood is defined as the number of adjacent cells per direction around the core cell, which is equivalent to the radius of the traffic analysis zone. Second, we used the sum of the speed performance indexes within an ε neighborhood, $SCI(r_i)$, as the quantity compared against minPts, as shown in Equation (2), where for a cell $r_i$, all the neighbor cells within the radius ε compose the set $\{r_{i1}, r_{i2}, r_{i3}, \ldots, r_{ip}\}$:

$$SCI(r_i) = \sum_{k=1}^{p} cp(r_{ik}) \qquad (2)$$

As illustrated in Figure 5, minPts is defined as the threshold value of $SCI(r_i)$. Then, the Core Cells are the cells whose $SCI(r_i)$ is higher than minPts; the Border Cells are the cells whose $SCI(r_i)$ is lower than minPts but that are reachable from Core Cells; and the Outlier Cells are those not reachable from any Core Cell.
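A small Python sketch may help to fix these definitions. It assumes a square window of ε cells per direction and includes the centre cell in the sum; the text does not state either choice explicitly, so both are assumptions, as are the function names `sci_grid` and `classify_cells`.

```python
def sci_grid(cp, eps):
    """Sum cp over the (2*eps+1) x (2*eps+1) window around every cell.

    cp is a list of rows. Including the centre cell in the sum is an
    assumption of this sketch.
    """
    rows, cols = len(cp), len(cp[0])
    sci = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for a in range(max(0, i - eps), min(rows, i + eps + 1)):
                for b in range(max(0, j - eps), min(cols, j + eps + 1)):
                    total += cp[a][b]
            sci[i][j] = total
    return sci


def classify_cells(sci, eps, min_pts):
    """Label each cell as 'core', 'border', or 'outlier'."""
    rows, cols = len(sci), len(sci[0])
    core = [[sci[i][j] > min_pts for j in range(cols)] for i in range(rows)]
    labels = []
    for i in range(rows):
        row = []
        for j in range(cols):
            if core[i][j]:
                row.append("core")
                continue
            # A non-core cell is a border cell if a core cell lies in its window
            near_core = any(
                core[a][b]
                for a in range(max(0, i - eps), min(rows, i + eps + 1))
                for b in range(max(0, j - eps), min(cols, j + eps + 1))
            )
            row.append("border" if near_core else "outlier")
        labels.append(row)
    return labels
```

The nested loops keep the sketch readable; for a 300 × 300 grid a summed-area table or a convolution would be a faster alternative.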
The process of the algorithm is shown in Figure 6. After the road network is segmented into grid cells, $SCI(r_i)$ is assigned to each cell, and iterative procedures are needed to categorize the cells into different clusters Class_ID($r_i$).

• Step 1: Class_ID($r_i$) is initialized as −1 and each cell is marked as unvisited, isVisited($r_i$) = false.
• Step 2: Traverse the whole grid; if $SCI(r_i)$ > minPts, the cell $r_i$ is identified as a core cell.
• Step 3: If $r_i$ is a core cell and isVisited($r_i$) = false, traverse the core cells within its ε neighborhood and set isVisited to true for all of the cells within the ε neighborhood.
• Step 4: Move to the next core cell within the current ε neighborhood and update its neighborhood. Repeat Step 3, traversing all of the potential core cells, until isVisited($r_i$) is true for all of them. Assign Class_ID($r_i$) = Class_ID for all the visited cells.
• Step 5: Identify any unvisited core cell, update Class_ID = Class_ID + 1, and repeat Step 3 and Step 4 until isVisited($r_i$) is true for all of the core cells throughout the grid.
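The five steps above can be sketched as a breadth-first expansion over the grid. This is a minimal illustration rather than the authors' implementation: it assumes a square ε window, numbers the clusters from 0, and assigns a border cell to the first cluster that reaches it. The function name `cluster_cells` is hypothetical.

```python
from collections import deque


def cluster_cells(sci, eps, min_pts):
    """Return a grid of cluster ids; -1 marks cells that belong to no cluster."""
    rows, cols = len(sci), len(sci[0])
    class_id = [[-1] * cols for _ in range(rows)]          # Step 1
    visited = [[False] * cols for _ in range(rows)]
    is_core = [[sci[i][j] > min_pts for j in range(cols)]  # Step 2
               for i in range(rows)]

    def neighbourhood(i, j):
        for a in range(max(0, i - eps), min(rows, i + eps + 1)):
            for b in range(max(0, j - eps), min(cols, j + eps + 1)):
                yield a, b

    current = 0
    for i in range(rows):
        for j in range(cols):
            if not is_core[i][j] or visited[i][j]:
                continue
            # Steps 3-4: grow one cluster from this unvisited core cell
            visited[i][j] = True
            class_id[i][j] = current
            queue = deque([(i, j)])
            while queue:
                ci, cj = queue.popleft()
                for a, b in neighbourhood(ci, cj):
                    if visited[a][b]:
                        continue
                    visited[a][b] = True
                    class_id[a][b] = current
                    if is_core[a][b]:   # only core cells keep expanding
                        queue.append((a, b))
            current += 1                # Step 5: move on to the next cluster id
    return class_id
```

Cells that keep the initial value −1 correspond to the Outlier Cells, since no core cell ever reached them.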

Visualization of Traffic Operation Performance Index
In this study, the ArcGIS software was applied to produce both 2D and 3D maps of the road network by assigning the traffic operation performance indexes to the cells, following previous research [13]. Figure 7 displays the traffic operation performance distribution patterns over six 2 h periods: 7:00-9:00, 9:00-11:00, 14:00-16:00, 17:00-19:00, 19:00-21:00, and 22:00-24:00. The traffic distribution in the network during the morning peak (7:00-9:00) is visibly different from that during the afternoon peak (17:00-19:00), during which the expressway system is more congested, especially in the west and east segments of the second ring road and the east segment of the third ring road. During the daytime non-peak hours (9:00-11:00 and 14:00-16:00), the spatial patterns of traffic operation performance are similar and traffic congestion is scattered across the networks within the fourth ring road. During the nighttime, the traffic conditions gradually become smooth from the period of 19:00-21:00 to the period of 22:00-24:00.

Further, Figure 8 displays a 3D map of the average traffic operation performance distribution of Beijing's expressway network over the whole day. The height and color in the grid map represent the different congestion levels of cells, which can be easily used to find the severely congested areas. In particular, the higher and darker a cell is, the higher its congestion level. The map shows that the closer the road segments are to the center of the expressway network, the higher the traffic operation indexes are. Although the analyses focused on using historical FCD to assess traffic operation performance, the technique applied in this study can be transferred to dynamically monitoring the network's traffic conditions if the taxi FCD are input into the system in real time.

Parameter Analyses of ε and minPts
After obtaining the traffic operation performance indexes for all cells, the DBSCAN algorithm was applied to detect traffic congestion patterns. DBSCAN has two key parameters, ε and minPts, which determine whether cells should be clustered together. How these two parameters were set is discussed below.
The value of ε determines how homogeneous the traffic attributes within a cluster are. If ε is too small, such as a one-cell neighborhood (a 100 m radius around a core cell), cells with very similar traffic characteristics are classified into different clusters. For example, the cells representing the approaches of one intersection from different directions could be separated into several small discrete clusters rather than grouped into one. On the other hand, if ε is too large, adjacent parallel road segments are classified into one cluster, so that many irrelevant cells fall into one giant congestion region. For example, when ε was set to an eight-cell neighborhood, the scope of a congestion region could be as large as 10 km, as shown in Figure 9. Eventually, it was found that with ε set to a two-cell neighborhood, the clusters both reflect the traffic features and identify the detailed traffic congestion occurrence patterns.
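To make the effect of ε concrete, the sketch below counts how many grid cells fall inside an ε-cell neighborhood of a core cell. It assumes a Euclidean radius measured between cell centres, in cell units; the source does not state the neighborhood shape, so this is an illustrative assumption, and the function name is hypothetical.

```python
CELL_SIZE_M = 100  # cell edge length in metres, as reported in the study


def neighbourhood_offsets(eps_cells):
    """Offsets (dx, dy) of all cells whose centres lie within eps_cells of a core cell.

    Assumes a circular (Euclidean) neighbourhood; the core cell itself is included.
    """
    r = int(eps_cells)
    return [
        (dx, dy)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        if dx * dx + dy * dy <= eps_cells * eps_cells
    ]


for eps in (1, 2, 8):
    cells = len(neighbourhood_offsets(eps))
    print(f"eps = {eps} cells ({eps * CELL_SIZE_M} m radius): {cells} cells in neighbourhood")
```

Under this assumption, the neighborhood grows from 5 cells at ε = 1 to 13 cells at ε = 2 and 197 cells at ε = 8, which is consistent with the observation that a large ε merges parallel road segments into one region.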

A small value of minPts causes more adjacent cells to be clustered into one class, so the total number of classes is small. As the value of minPts increases, some of the original classes are split and the total number of classes increases. However, when the value of minPts continues to increase, more cells are identified as outliers, so that only the cells with high SCI are kept in the clusters and the total number of classes decreases.
To select the appropriate value of minPts, the number of clusters corresponding to each value of minPts was calculated, as shown in Figure 10. The results show that one class is generated when the value of minPts is more than 370. Then, as the value of minPts increases, the number of classes increases. When minPts is within the range of 890-1050, the number of classes is stable in the range of 50-55. When minPts reaches 1050, the number of classes starts to drop. Finally, no classes are generated once the value exceeds 1550. When minPts is equal to 960, the number of generated classes reaches its maximum of 55. In this study, we chose the value giving the maximum number of clusters in order to identify the different types of traffic congestion patterns present in the road network.
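The parameter sweep described above can be sketched as follows. This is a minimal illustration, not the pseudo-code of Figure 6: it assumes that a cell is a core cell when the summed SCI of the cells within its ε neighborhood reaches minPts (a weighted variant of DBSCAN), that the neighborhood is circular, and that `sci` maps (column, row) cell indices to SCI values. All function names are hypothetical.

```python
from collections import deque


def neighbours(cell, cells, eps_cells):
    """Cells present in `cells` whose centres lie within eps_cells of `cell` (itself included)."""
    cx, cy = cell
    r = int(eps_cells)
    found = []
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            other = (cx + dx, cy + dy)
            if dx * dx + dy * dy <= eps_cells * eps_cells and other in cells:
                found.append(other)
    return found


def cluster_cells(sci, eps_cells, min_pts):
    """Weighted DBSCAN over grid cells; returns {cell: cluster_id}, omitting outliers."""
    labels = {}
    next_id = 0
    for cell in sorted(sci):
        if cell in labels:
            continue
        near = neighbours(cell, sci, eps_cells)
        if sum(sci[c] for c in near) < min_pts:
            continue  # not a core cell; it may still join a cluster as a border cell
        labels[cell] = next_id
        queue = deque(near)
        while queue:
            other = queue.popleft()
            if other in labels:
                continue
            labels[other] = next_id
            other_near = neighbours(other, sci, eps_cells)
            if sum(sci[c] for c in other_near) >= min_pts:
                queue.extend(other_near)  # only core cells expand the cluster
        next_id += 1
    return labels


def count_clusters(sci, eps_cells, min_pts_values):
    """Number of clusters obtained for each candidate minPts value."""
    return {
        m: len(set(cluster_cells(sci, eps_cells, m).values()))
        for m in min_pts_values
    }
```

On a toy corridor of two heavily congested stretches joined by a weakly congested one, the sweep reproduces the qualitative behavior reported in the text: one class at a low threshold, more classes as the weak link drops out, and none once the threshold exceeds every neighborhood sum.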

Traffic Congestion Pattern Analyses
With ε set to a two-cell neighborhood and minPts set to an SCI value of 960, Figure 11 shows the spatial distribution of the 55 traffic congestion clusters in the grid map. The traffic congestion patterns can be categorized into three types, namely point type, line type, and region type, as summarized in Table 3.

Spatial Pattern | Critical Causation | Typical Examples
Point | Insufficient capacities at the nodes of the road network. | (image)
Line | High traffic demand or bottleneck issues in the road segments. | (image)
Region | Multiple high-demand expressways merge and connect to each other. | (image)
Point-type congestion occurs at intersections of arterials or interchanges of expressways. This kind of congestion is usually caused by insufficient capacities at the nodes of the road network, and it is independent and isolated from the other congested areas. Improving the geometric design of interchanges and optimizing intersection signal timing can both be effective countermeasures to relieve point-type congestion [30]. Serious point congestion may also lead to line congestion, which typically emerges on segments of urban expressways or arterials. Line congestion is often caused by high traffic demand or bottleneck issues. To relieve line congestion, an advanced traffic guidance system that delivers traffic information to drivers in real time can be applied to balance the traffic demand distribution in the local road network [31]. Region congestion usually occurs where multiple high-demand expressways merge and connect to each other, and it covers the whole local road network. In the Beijing road network, the most typical region congestion exists in the areas where the third ring road and fourth ring road intersect with the expressway to the National Airport. Guiding traffic from the airport expressway to alternative routes, or encouraging travelers to or from the airport to use mass transit, would be suitable ways to relieve region congestion [28].
Furthermore, Figure 12 shows the distribution of the number of cells in the 55 clusters. Most of the clusters contain 10-14 cells, representing 1-1.4 km of congestion mileage; some heavily congested clusters contain 26-32 cells, representing 2.6-3.2 km of congestion mileage, which occurs frequently in the road network. As shown in Figure 13, a strong correlation is found between SCI and the number of cells. Intuitively, the total SCI within a cluster increases with the number of cells. However, Figure 14 shows that the total SCI of most clusters is less than 18,000, while only six clusters have a total SCI larger than 18,000 and, at the same time, an average SCI larger than 900. Interestingly, using a total SCI of 18,000 and an average SCI of 900 as the division lines, the 55 clusters can be separated into three groups, which match and help explain the spatial patterns of point, line, and region congestion. The results indicate that point-type clusters have a more intensive degree of congestion but a smaller scope of influence, whereas region-type clusters are both intensive and wide in scope.
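The grouping by the two division lines can be sketched as a small rule. The thresholds (total SCI of 18,000 and average SCI of 900) and the 100 m cell size come from the text; the exact assignment of the three groups to patterns is an assumption inferred from the last sentence above (region: high total and high average; point: high average but low total; line: the remainder), and the function name is hypothetical.

```python
CELL_SIZE_M = 100          # cell edge length in metres, as reported in the study
TOTAL_SCI_LINE = 18_000    # division line on total SCI per cluster
AVERAGE_SCI_LINE = 900     # division line on average SCI per cluster


def summarise_cluster(sci_values):
    """Summarise one cluster from the SCI values of its cells.

    The pattern labels follow an assumed reading of the two division lines.
    """
    n_cells = len(sci_values)
    total = sum(sci_values)
    average = total / n_cells
    if total > TOTAL_SCI_LINE and average > AVERAGE_SCI_LINE:
        pattern = "region"   # intensive and wide in scope
    elif average > AVERAGE_SCI_LINE:
        pattern = "point"    # intensive but limited in scope
    else:
        pattern = "line"     # assumed: the remaining group
    return {
        "cells": n_cells,
        "length_km": n_cells * CELL_SIZE_M / 1000,  # congestion mileage
        "total_sci": total,
        "average_sci": average,
        "pattern": pattern,
    }
```

For instance, a 12-cell cluster corresponds to 1.2 km of congestion mileage, in line with the 10-14 cell clusters that dominate Figure 12.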

Conclusions
To provide an efficient method for analyzing recurrent traffic congestion in an urban road network, we developed a cell-based model using taxi FCD and the DBSCAN algorithm. Taking the Beijing road network as an example, the study illustrated how to map the grid network by identifying the proper cell size. It was found that a cell size of 100 × 100 m² can represent the major road network well with sufficient taxi FCD samples. We then adapted the DBSCAN algorithm to the cell-based traffic congestion analyses by assigning the traffic operation performance index to the grid cells. The grid modeling results showed that the traffic congestion patterns can be categorized into three types, namely: point type, stemming from insufficient capacities at the nodes of the road network; line type, caused by high traffic demand or bottleneck issues in the road segments; and region type, resulting from multiple high-demand expressways merging and connecting to each other. Through the application of ArcGIS software, 3D maps of the road network congestion were easily generated for either dynamic operation performance visualization or location identification for different types of congestion. The proposed method would be useful for traffic management departments in identifying the most suitable engineering countermeasures to relieve network traffic congestion.
In future studies, bus FCD and mobile sensor data can also be applied to the grid model in order to increase the sample size for each cell and the estimation accuracy of traffic operation performance. In particular, if urban POI (point of interest) data are available, the land use properties of the cells can be integrated with the traffic analyses to uncover the underlying causes of congestion formation from the perspective of drivers' travel purposes and behavior.

Figure 1. Average speed (km/h) of taxis under two different operating conditions.

Figure 2. The number and proportion of service taxis on workdays and weekends.

Figure 3. Distribution of the taxis' sampling time intervals.

Figure 4. Spatial scale of the city in this research.

Figure 5. Schematics of the variable definition.

Figure 6. Pseudo-code of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm for clustering cells.

Figure 7. Traffic operation performance distribution patterns during 2 h.

Figure 8. 3D visualization of the traffic operation performance index in Beijing.

Figure 9. Congestion region scope with different values of ε.

Figure 10. Clustering classes and value of minPts.

Figure 11. 3D visualization of congested cells clustering result.

Figure 12. Number distribution of cells in each class.

Figure 13. Cells number and total SCI of class.

Figure 14. Average SCI of class and total SCI of class.

Table 1. The detailed data attributes of the floating car data (FCD) system.

Table 2. Structure and details of the cell data.

Table 3. Spatial patterns of point, line, and region congestion.