Interactive, Multiscale Urban-Traffic Pattern Exploration Leveraging Massive GPS Trajectories

Urban traffic pattern reflects how people move and how goods are transported, which is crucial for traffic management and urban planning. With the development of sensing techniques, accumulated sensor data are captured for monitoring vehicles, which also present the opportunities of big transportation data, especially for real-time interactive traffic pattern analysis. We propose a three-layer framework for the recognition and visualization of multiscale traffic patterns. The first layer computes the middle-tier synopses at fine spatial and temporal scales, which are indexed and stored in a geodatabase. The second layer uses synopses to efficiently extract multiscale traffic patterns. The third layer supports real-time interactive visual analytics for intuitive explorations by end users. An experiment in Shenzhen on taxi GPS trajectories that were collected over one month was conducted. Multiple traffic patterns are recognized and visualized in real-time. The results show the satisfactory performance of proposed framework in traffic analysis, which will facilitate traffic management and operation.


Introduction
Understanding urban traffic dynamics facilitates the lives of urban residents, the operations of transport managers, and the alleviation of air pollutions. With the development of information and communication technology (ICT), the Internet of things (IoTs), and the global positioning system (GPS), vehicle GPS trajectories have been widely collected in many cities [1][2][3]. As vehicles, especially taxis, move around the city, their GPS trajectories become important data sources for advanced traffic information systems and advanced traffic management systems [4,5]. Currently, raw GPS data are processed for real-time traffic monitoring or vehicle dispatching when they are streamed in. Sometimes, these data are stored for an additional 2 to 6 months or discarded directly. Due to the large volume of vehicle GPS data, it is difficult to process massive GPS data to perceive and explore the hidden traffic patterns in real-time. Consequently, it is essential to develop a new approach that has low storage and computing costs for exploring citywide traffic patterns.
Many studies have been conducted on the extraction of transport information. Several transport attributes have been recognized from various data sources, for example, road network [6], travel speed [7], travel volume [8], and traffic congestion [9]. A set of effective approaches has been developed for analyzing road traffic, which includes statistics-based methods [10][11][12][13] and clustering-based methods [14][15][16]. For example, Zou et al. [17] examined road traffic using long-term vehicle trajectories and identified the spatial dependency of the traffic state via spatial autocorrelation. Guo et al. [18] developed an effective computing system for investigating regular and abnormal traffic flow patterns at road intersections. These studies provide an effective analysis of traffic at various places. However, these methods are computation-intensive and, thus, have difficulty in real-time exploration analysis. In addition, due to the cascaded nature of urban traffic [17], congestion at an intersection may produce a vehicle queue at another road. The exploration of multiscale urban traffic patterns from one road to the city area is expected.
Meanwhile, due to its satisfactory performance on big data computing and prediction, machine learning methods have been applied in transport data analysis, such as tensor factorization [19], artificial neural networks [20], and graph neural networks (GNNs) [21]. Although complex traffic patterns have been extracted, a gap between the computed results and the associated human understanding remains. A more friendly interactive tool for the intuitive illustration of urban traffic is needed to enable users to understand citywide traffic efficiently.
We present a three-layer interactive framework for the exploration of multiscale traffic patterns leveraging massive GPS trajectory and the intuitive visualizations for effective perception. We propose a middle-tier data structure, namely, the synopses, which connects the data processing layer and the pattern recognition layer to increase the efficiency to enable real-time traffic exploration analysis. According to the end users' interests, multiple traffic patterns are effectively recognized. In the visual analytic layer, a web-based system is designed to facilitate end users visualizing, perceiving, and exploring citywide traffic dynamics in real-time. Finally, using GPS data that were collected in Shenzhen City over one month, we conducted an experiment to evaluate the performance of the proposed framework.
The main contributions of this study are summarized as follows: • A three-layer framework is proposed for real-time interactive traffic pattern exploration analysis on massive GPS trajectories.

•
The synopses are proposed to constitute a middle-tier data structure to accelerate pattern recognition and support real-time exploration. • A friendly interactive visual analytics system is developed for exploring urban road traffic dynamics intuitively.
The remainder of this article is organized as follows: Section 2 reviews the related literature. Section 3 introduces the study area and the dataset. In Section 4, we describe the proposed framework and method in detail. Section 5 reports a case study and presents several examples. In Section 6, we discuss the performance of the developed system. Finally, we present our conclusions in Section 7.

Data-Driven Traffic Monitoring
Traffic monitoring has been a popular topic as it facilitates transport management and planning. Due to the advantage of passive sensing, increasingly many sensors are being used in the transport sector, such as probe vehicles, loops, and cameras [22][23][24][25]. These sensors produce abundant data that capture the urban traffic status, thereby enabling traffic monitoring. Consequently, data-driven methods are widely developed for monitoring traffic attributes, such as the speed, the flow, the occupancy, and the travel time [22,[26][27][28]. These results are useful for forecasting traffic status, detecting traffic incidents, and planning transportation facilities [29][30][31][32].
Data-driven traffic monitoring studies contain two main parts: real-time traffic sensing and traffic pattern extraction. Real-time traffic sensing processes massive sensed data and produces traffic information for travelers and transport managers [30,33]. The main challenge is to infer the accurate traffic status despite the inhomogeneity and the sparseness of the sensed data.
Traffic pattern extraction is another task of data-driven traffic monitoring. Three categories of methods have been developed for this task: probability-based methods, clustering methods, and neural network (NN) methods. The probability-based methods compute traffic properties such as speed and flow [10,13,34]. The distribution, randomness, and relationship among these properties are used to estimate parameters that describe the traffic patterns [11,12]. The clustering methods group road links according to their similarity in terms of traffic properties. The clustering methods have wide applications in traffic pattern extraction as they can consider various traffic features [14][15][16]. To obtain more accurate results, several NN methods are applied in traffic data analysis, such as local artificial neural networks [20], fuzzy neural networks [8], and sequence graph neural networks [21]. However, these methods focused mostly on fixed temporal and spatial scales; hence, they cannot explore dynamic traffic intuitively.
Road traffic changes with space and time. Several studies represented road traffic as time-series sequences and examined the trends and recurrent patterns using the autoregressive integrated moving average model (ARIMA) and the Markov model [15,[35][36][37]. Furthermore, spatial autocorrelations and cross-correlations are computed to generate relationships among road links [17,38,39]. Overall, these advanced studies have demonstrated the spatial-temporal variation of road traffic in the city. This study developed an efficient computing framework for supporting friendly and intuitive traffic exploration using massive GPS trajectories.

Traffic Visual Analytics
The revealed traffic patterns are typically complex; thus, they are difficult for users to understand. Recently, visual analytics of temporal, spatial, numerical, and categorical properties were presented for facilitating the interpretation of complex results by users [40]. Vehicle trajectories are represented as dots or polylines on a map [41,42]. Traffic visualization techniques are categorized into three main groups: direct depiction, derived data visualization, and extracted pattern visualizations [43]. For example, in combination with typical two-dimensional spatial maps, studies modeled the temporal dimension as the z-axis to depict the temporal properties [44,45]. The novel route-zooming technique [46] embeds the spatiotemporal information into a map, to avoid the visual clutter that arises in direct depiction, which enables us to display spatial information and attributes at the same time. For visualizing derived data, aggregation is a popular approach. Kernel density estimation is mostly used to derive the density of trajectories [47]. The hotspot map is another tool for indicating significant places behind massive traffic data [48]. By bundling massive vehicle trajectories, the changing flow visualization is proposed for mapping massive mobility flows [49]. Furthermore, studies have focused on the visualization of traffic patterns. For example, the propagation of a traffic jam is displayed on a spatial map to show how the traffic jam evolves from one road to another [50].
Visualization depicts the results using various visualization forms, whereas visual analytics focuses on methods "for an effective understanding, reasoning and decision making" [51]. Visual analytics is a useful and powerful analysis method for big data due to the advantage of combining human intelligence and machine computing. Various studies have proved the performance of visual analytics on transport problems. For example, a friendly trip visualization system, namely, TripVista [18], was developed for examining the movement behaviors at intersections, such as the object types, travel speeds, and travel directions. Traffic events are detected and displayed, such as traffic jam [50] and human gathering activities [52]. Visual analytics on the route diversity facilitate the determination of the usages of urban roads [53,54]. These studies of visual analytics enable the intuitive exploration of massive transport data. Following this direction, we put forward the mission designing visual analytics with knowledge of urban traffic to develop a data-driven interactive visual analysis system for exploring traffic pattern friendly and intuitively.

Study Area and Datasets
We conducted this study in Shenzhen, China. Shenzhen is one of the most developed cities in southern China. It covers a total area of approximately 2000 km 2 . After fast development over the past forty years, Shenzhen has become a global city with 20 million residents and 3.4 million vehicles.
Daily traffic has become a challenging issue for transport management and operations. We considered two categories of data: road network and vehicle GPS records. The road network of Shenzhen was provided by the Transport Commission of Shenzhen Municipality. The road network is represented by a graph of interconnected lines and points, in which lines denotes road links and points denotes the intersections of road links. The road links also contain several attributes, such as the name, the level, and the local administrative region. The road network is organized according to the topology.
The vehicle GPS records were also provided by the Transport Commission of Shenzhen Municipality. This dataset contains the GPS records of 14,692 taxis from January 2015. It was collected by GPS modules that were equipped in the taxies. The total number of GPS records is approximately 5.5 billion, which require storage of up to 100 G. The vehicle GPS records are described in Table 1. Each GPS record includes six fields: vehicle ID, time stamp, longitude, latitude, speed, and occupation status.

Methodology
We propose a three-layer framework for extracting and exploring traffic patterns, which is illustrated in Figure 1. In the first data processing layer, raw GPS records are processed to produce the synopses, which constitute a middle-tier data structure that represents traffic states of fine-grained road segments. In the second traffic pattern recognition layer, typical traffic patterns are recognized. The visual analytic layer provides interactions and intuitive visualization for the real-time exploration of traffic patterns. These layers are described in detail below. is represented by a graph of interconnected lines and points, in which lines denotes road links and points denotes the intersections of road links. The road links also contain several attributes, such as the name, the level, and the local administrative region. The road network is organized according to the topology. The vehicle GPS records were also provided by the Transport Commission of Shenzhen Municipality. This dataset contains the GPS records of 14,692 taxis from January 2015. It was collected by GPS modules that were equipped in the taxies. The total number of GPS records is approximately 5.5 billion, which require storage of up to 100 G. The vehicle GPS records are described in Table 1. Each GPS record includes six fields: vehicle ID, time stamp, longitude, latitude, speed, and occupation status.

Methodology
We propose a three-layer framework for extracting and exploring traffic patterns, which is illustrated in Figure 1. In the first data processing layer, raw GPS records are processed to produce the synopses, which constitute a middle-tier data structure that represents traffic states of finegrained road segments. In the second traffic pattern recognition layer, typical traffic patterns are recognized. The visual analytic layer provides interactions and intuitive visualization for the realtime exploration of traffic patterns. These layers are described in detail below.

Data Processing
Due to the dynamic nature of traffic, the traffic state varies over space and time. Instead of extracting traffic patterns from raw GPS records directly, we process raw GPS data into middle-tier synopses to accelerate the subsequent exploration. Formally, the synopsis is a traffic state description of a road segment during a period. The synopsis can be defined as a quadruples M = {p, t, P, s}, where p is the road segment, t is the time, P is the set of GPS records within the road segment p, and s is the corresponding traffic state. The length of the road segment and the time interval of the sequential synopsis are important as the fine-grained spatial and temporal units will significantly increase the required data storage. According to the first law in geography, "near things are more related than distant things" [55]. Traffic states on shorter road links are more homogeneous. Therefore, we segmented long road links into shorter segments. Similarly, a smaller time interval is also suggested. After intensive experimentation, we set the maximum length of a road segment to 200 m. If the length of a road link exceeds 200 m, we divide it to several segments. We set the time interval to 15 min, which is a typical value in transport studies and applications [56,57]. Regarding the set of GPS records, P, we filter the related records of a synopsis by the location of road segment p and the time t. Noted that taxis travel slowly or stop while looking for passengers, which will underestimate road traffic. Thus, we firstly filter out GPS records with the occupation status un-occupied. Raw GPS data are then related to the corresponding road segment by map-matching. As the used parallel map-matching method [7] is very fast, we can produce the synopsis on-the-fly when GPS records are streamed into the taxi monitoring system or with the post-processing mode.
We regard the average travel speed as the traffic state. The value of the traffic state s can be computed using the velocity vector s = [v 1 , v 2 , . . . , v n ], as in Equation (1), where v i is the traffic speed from the GPS records, and n is the number of matched records. It is also essential to evaluate the reliability of the synopsis. Here, we present the variance of the speed to indicate the trustworthiness of the traffic state value, which is calculated via Equation (2). A large variance indicator suggests high uncertainty in the value of the traffic state. If no record is available, we set the value to "−1", and the road link cannot be sensed during the corresponding 15 min using taxi GPS data.
Finally, after computing all synopses, we can represent road traffic dynamic with a sequence of synopses. We stored all synopses in a geodatabase for the subsequent traffic pattern recognition, visualization, and exploration. Considering the imbalance of the GPS records, we created the spatial index and temporal index on synopses to accelerate the data query and retrieving.

Pattern Recognition
We use time sequential synopsis to evaluate the dynamic variations of road traffic. Similarity measures have been widely used in pattern recognition, such as the Manhattan distance and the Pearson distance. Here, an algorithm, named "Trend and Value Distance (TVD)", is presented for evaluating the similarity of road traffic sequence. In contrast to many similarity measures that consider only the numeric values or the sequence order, the presented TVD algorithm integrates both to examine the similarity of two sequences. For a specified road and a time period from t p to t q , let SEQ {p, t p , t q } = s {p, t p } , s {p, t p+1 } , . . . , s {p, t q } be the sequential synopses. The TVD is described in detail in Algorithm 1. The numeric differences between two sequential synopses are firstly computed (Lines 6 to 13). The Pearson distance is calculated and used to examine the similarity of trends (Line 15). These are combined as the final TVD result (Line 16). The value of the TVD ranges from 0 to 1. The larger the TVD value, the higher the similarity of two synopses. Here, the TVD is used to evaluate the similarity of traffic dynamic of one road link between periods. It is also used to evaluate the similarity of the traffic dynamic of multiple road links.

Algorithm 1 TVD
Input: SEQ1, SEQ2: two time sequential synopses µ: adjusting parameter Output: TVD: distance value 1: Length of sequential synopses: n = SEQ1.length 2: Manhattan distance of SEQ1 and SEQ2: manhattan = 0 3: Maximum Manhattan distance: max = 0 4: Vector of the difference of SEQ1 and SEQ2: if abs i > max then 11: max = abs i 12: end if 13: end for 14: For a road segment, if the similarity of two sequential synopses exceeds a threshold, it indicates a possible regular traffic pattern. We develop another algorithm, namely, "Traffic Regular Pattern (TRP)", for identifying the hidden pattern, which is presented as Algorithm 2. There are three steps in TRP. Parameter duration determines the period of interest. The parameter type filters the days of the specified type during the period. In our system, there are three types to be used: workday, weekend, and all (workday and weekend). The first step (Lines 1-8) is to retrieve sequential synopses from the geodatabase according to the temporal query conditions that are specified by the end users. For example, the setting duration = "From 2015-01-01 to 2015-01-31" and type = "weekend" specifies that the synopses of all the weekends in January will be used to recognize a potential traffic pattern. The second step (Line 9) is to evaluate whether the traffic dynamics of a road in the specified period are a regular traffic pattern. We use DBSCAN [58] to cluster the sequential synopses. The TVD value is used to measure the similarity of two sequences. There are two parameters, namely, "eps" and "minpts", in DBSCAN. The parameter "eps" is set according to experimentation. "Minpts" is not a constant; it is related to sequences that are filtered by parameters duration and type. The third step is to generalize a typical traffic pattern for each cluster from the sequential synopses (Lines 10-21).

Interactive Traffic Pattern Explorative Analysis
A web-based visual analytics system is developed for the real-time exploration of traffic patterns by end users. The interface is designed to comply with the classic visual exploration mantra, "Overview first, zoom and filter, details on demand" [59]. As illustrated in Figure 2, the designed interface consists of views (a, b) for global spatial and temporal pattern overview and views (c, d) for detailed exploration of roads of interest.
For the overview navigation, a map view (a) is presented in the top left, with a digital map as a spatial reference to illustrate traffic state and traffic patterns by coloring the roads. Conventional map operations such as zooming and dragging are supported. End users can click or box select roads in the map view and set time conditions (i.e., duration, type, etc.) by the time components at the right-bottom of the map view to interactively explore traffic patterns of various roads according to their individual interests. Parameters used in the traffic pattern recognition layer are also set by end users. The extracted traffic patterns with high frequency are enumerated in the pattern view on the top-right (b) to show the typical cases of how the traffic state of a road segment changes over time. End users can click a pattern and observe its spatial distribution in the map view. The developed system also enables end users to add their patterns of interest as the target patterns by clicking on the roads.
Once traffic pattern of a specified road link has been interactively identified, it can be examined in detail from both the spatial and temporal perspectives. Spatially, the road traffic states of the adjacent 10 upstream links and 10 downstream links of the target link are displayed (c). Multiscale temporal dynamics of the road link are visually expressed by a line chart and radial charts (d). The line chart is able to better support numerical reading tasks, namely, users can easily perceive the speed at a time point. Radial charts are provided to support the cycle comparison tasks, e.g., over workdays or weekends. The radial layout is composed of several concentric rings, in which one ring displays one day traffic pattern in the clockwise direction. Examples with details will be presented in Section 5.

Case Study
We conducted an experiment to evaluate the performance of the developed system. When raw GPS data streamed in, the synopses were computed and stored in the MongoDB database. The visual analytic system was implemented in JavaScript as a web-based system. We ran the experiment on a Dell desktop PC with a 3.40 GHz CPU and 8 G of memory. We reported several user cases to demonstrate the exploration of urban road traffic patterns using the developed interactive visualization system.

Regularity of Traffic States
One function of the developed system is to reveal the highly frequent traffic patterns of road links. The first case adopts the Visual Information Seeking Mantra "overview-zoom/filter-detail" [59] as the exploration strategy. For one road link, the TVD indicator was calculated to evaluate the similarity of the daily sequential traffic states in the workdays and the weekends. Table 2 reports the descriptive statistics of road links with high self-similarity, which demonstrates the regularity of the road traffic dynamic. Figure 3 presents the spatial distribution of road links that differ in terms of self-similarity. The results demonstrate that 15,266 roads have a TVD indicator that exceeds 0.5, which cover 2294.2 km. These links cover most road links in the downtown areas, and main roads in the outer suburb. The TVD values were concentrated in the range of 0.8 to 0.9, with 9000 road segments in the workday and 8098 road segments in the weekend, respectively. In section of 0.9-1.0, the weekend mode has more road links, and according to the figure, it has more light-blue links. From the spatial perspective, many road links with high self-similarity are located in the urban center. A few roads with low self-similarity are located in the east Shenzhen and north Shenzhen. These results indicate the regularity of the road link traffic state. Therefore, it is possible to extract regular travel traffic patterns from massive vehicle GPS trajectories.

Case Study
We conducted an experiment to evaluate the performance of the developed system. When raw GPS data streamed in, the synopses were computed and stored in the MongoDB database. The visual analytic system was implemented in JavaScript as a web-based system. We ran the experiment on a Dell desktop PC with a 3.40 GHz CPU and 8 G of memory. We reported several user cases to demonstrate the exploration of urban road traffic patterns using the developed interactive visualization system.

Regularity of Traffic States
One function of the developed system is to reveal the highly frequent traffic patterns of road links. The first case adopts the Visual Information Seeking Mantra "overview-zoom/filter-detail" [59] as the exploration strategy. For one road link, the TVD indicator was calculated to evaluate the similarity of the daily sequential traffic states in the workdays and the weekends. Table 2 reports the descriptive statistics of road links with high self-similarity, which demonstrates the regularity of the road traffic dynamic. Figure 3 presents the spatial distribution of road links that differ in terms of self-similarity. The results demonstrate that 15,266 roads have a TVD indicator that exceeds 0.5, which cover 2294.2 km. These links cover most road links in the downtown areas, and main roads in the outer suburb. The TVD values were concentrated in the range of 0.8 to 0.9, with 9000 road segments in the workday and 8098 road segments in the weekend, respectively. In section of 0.9-1.0, the weekend mode has more road links, and according to the figure, it has more light-blue links. From the spatial perspective, many road links with high self-similarity are located in the urban center. A few roads with low self-similarity are located in the east Shenzhen and north Shenzhen. These results indicate the regularity of the road link traffic state. Therefore, it is possible to extract regular travel traffic patterns from massive vehicle GPS trajectories.

Temporal Dynamic of Road Traffic
We zoom-in on the road links to investigate the temporal dynamics of traffic patterns. Upon clicking a link of interest on the map, its daily, weekly, and monthly traffic patterns are shown in the temporal view. Figure 4 shows an example of road links with typical traffic states. These two-road links are located on the same road but correspond to opposite driving directions. Due to large commuting volume, the road traffic significantly deteriorates during rush hour. The line chart in Figure 4a demonstrates that the travel speed decreases from 58 km/h to 18 km/h. The concentric circles are used to display weekly and monthly traffic patterns. One circle represents the traffic dynamic in one day. For the weekly pattern, the inner circle corresponds to Monday and the outer circle to Sunday. For the monthly pattern, the inner circle corresponds to the first week and the outer circle to the last week. The traffic pattern of the road in Figure 4a is stable on workdays and weekends. However, the monthly variation demonstrates that the road traffic deteriorates since more hours correspond to low travel speed week by week. Figure 4b suggests that the traffic dynamics differ between road directions. Both the weekly and monthly traffic patterns change minimally. The traffic patterns show high self-similarity on workdays and weekends from the weekly perspective, and stable traffic patterns are observed from the weekly perspective.

Temporal Dynamic of Road Traffic
We zoom-in on the road links to investigate the temporal dynamics of traffic patterns. Upon clicking a link of interest on the map, its daily, weekly, and monthly traffic patterns are shown in the temporal view. Figure 4 shows an example of road links with typical traffic states. These two-road links are located on the same road but correspond to opposite driving directions. Due to large commuting volume, the road traffic significantly deteriorates during rush hour. The line chart in Figure 4a demonstrates that the travel speed decreases from 58 km/h to 18 km/h. The concentric circles are used to display weekly and monthly traffic patterns. One circle represents the traffic dynamic in one day. For the weekly pattern, the inner circle corresponds to Monday and the outer circle to Sunday. For the monthly pattern, the inner circle corresponds to the first week and the outer circle to the last week. The traffic pattern of the road in Figure 4a is stable on workdays and weekends. However, the monthly variation demonstrates that the road traffic deteriorates since more hours correspond to low travel speed week by week. Figure 4b suggests that the traffic dynamics differ between road directions. Both the weekly and monthly traffic patterns change minimally. The traffic patterns show high self-similarity on workdays and weekends from the weekly perspective, and stable traffic patterns are observed from the weekly perspective.

Traffic Pattern of Local Road Networks
We further examined the traffic pattern of local road networks. Upon clicking a road link, traffic patterns of its adjacent links from both upstream and downstream are visualized. Figure 5 shows an example and the associated daily traffic dynamic. At the selected road link, two roads (W and N) merge. The traffic patterns of adjacent road segments are similar on the long road, according to the traffic of the upstream and downstream roads. Furthermore, the topology influences the road traffic. Due to the merging of the travel flow at the selected roads, more vehicles gather at this intersection, thereby leading to changes in road traffic. According to the traffic of the downstream road in Figure 5b, traffic in these segments is worse than in the upstream segments. This friendly interactive tool enables us to examine the traffic dynamics of local road networks of interest.

Traffic Pattern of Local Road Networks
We further examined the traffic pattern of local road networks. Upon clicking a road link, traffic patterns of its adjacent links from both upstream and downstream are visualized. Figure 5 shows an example and the associated daily traffic dynamic. At the selected road link, two roads (W and N) merge. The traffic patterns of adjacent road segments are similar on the long road, according to the traffic of the upstream and downstream roads. Furthermore, the topology influences the road traffic. Due to the merging of the travel flow at the selected roads, more vehicles gather at this intersection, thereby leading to changes in road traffic. According to the traffic of the downstream road in Figure  5b, traffic in these segments is worse than in the upstream segments. This friendly interactive tool enables us to examine the traffic dynamics of local road networks of interest.

Traffic Pattern Exploration
The relationships of the traffic patterns among road links are also visualized. With the effective synopses, the proposed system supports efficient query of roads with similar patterns. Firstly, we answer the following question: "Which types of patterns are the most common?" We computed traffic pattern of each road link using one-month data and type "all". Via the TRP Algorithm, the pattern of the largest cluster is identified, as shown in Figure 6a. This pattern, in which the speed fluctuates

Traffic Pattern of Local Road Networks
We further examined the traffic pattern of local road networks. Upon clicking a road link, traffic patterns of its adjacent links from both upstream and downstream are visualized. Figure 5 shows an example and the associated daily traffic dynamic. At the selected road link, two roads (W and N) merge. The traffic patterns of adjacent road segments are similar on the long road, according to the traffic of the upstream and downstream roads. Furthermore, the topology influences the road traffic. Due to the merging of the travel flow at the selected roads, more vehicles gather at this intersection, thereby leading to changes in road traffic. According to the traffic of the downstream road in Figure  5b, traffic in these segments is worse than in the upstream segments. This friendly interactive tool enables us to examine the traffic dynamics of local road networks of interest.

Traffic Pattern Exploration
The relationships of the traffic patterns among road links are also visualized. With the effective synopses, the proposed system supports efficient query of roads with similar patterns. Firstly, we answer the following question: "Which types of patterns are the most common?" We computed traffic pattern of each road link using one-month data and type "all". Via the TRP Algorithm, the pattern of the largest cluster is identified, as shown in Figure 6a. This pattern, in which the speed fluctuates

Traffic Pattern Exploration
The relationships of the traffic patterns among road links are also visualized. With the effective synopses, the proposed system supports efficient query of roads with similar patterns. Firstly, we answer the following question: "Which types of patterns are the most common?" We computed traffic pattern of each road link using one-month data and type "all". Via the TRP Algorithm, the pattern of the largest cluster is identified, as shown in Figure 6a. This pattern, in which the speed fluctuates around 30 km/h all day, is the most common pattern. According to the spatial view in Figure 6b, most of the roads are branch roads in the urban center (Futian, Luohu, and Nanshan) and subcenter (Baoan, Longhua, and Longgang), where the travel speeds are limited by the transport law or travel flow.
Facilitated by the pattern view, the question of "which roads share a similar pattern?" is further answered. Here, due to the importance of rush hour in the transport domain, we selected the morning rush hour for case study. We selected one road link with one typical morning rush hour traffic state as the target and queried roads with similar patterns. By specifying various similarity degrees, we obtained various sets of road links. Figure 7 shows the roads that were returned in response to input TV distance values, of 0.88, 0.86, 0.84, and 0.82. The spatial view displays the distribution of roads that have similar patterns. As the similarity degree is relaxed, increasingly, many roads appear. Meanwhile, the temporal change of the traffic states is amplified. From the line charts in the figure, the duration and the drop extent of the morning rush hour differ, although the deterioration of the traffic state in the morning rush hour is maintained.
Sensors 2020, 20, 1084 11 of 16 around 30 km/h all day, is the most common pattern. According to the spatial view in Figure 6b, most of the roads are branch roads in the urban center (Futian, Luohu, and Nanshan) and subcenter (Baoan, Longhua, and Longgang), where the travel speeds are limited by the transport law or travel flow. Facilitated by the pattern view, the question of "which roads share a similar pattern?" is further answered. Here, due to the importance of rush hour in the transport domain, we selected the morning rush hour for case study. We selected one road link with one typical morning rush hour traffic state as the target and queried roads with similar patterns. By specifying various similarity degrees, we obtained various sets of road links. Figure 7 shows the roads that were returned in response to input TV distance values, of 0.88, 0.86, 0.84, and 0.82. The spatial view displays the distribution of roads that have similar patterns. As the similarity degree is relaxed, increasingly, many roads appear. Meanwhile, the temporal change of the traffic states is amplified. From the line charts in the figure, the duration and the drop extent of the morning rush hour differ, although the deterioration of the traffic state in the morning rush hour is maintained. Next, we select patterns with "morning rush hour" and "evening rush hour" as the targets to explore roads that have similar patterns. Figure 8 displays the obtained results in the spatial view. According to Figure 8, many primary roads show the typical "morning rush hour" pattern, especially the main road in south Shenzhen. That is because many people who are living in north Shenzhen travel to work in the south. In contrast, the road with the evening traffic rush pattern is more spatially scattered.  Facilitated by the pattern view, the question of "which roads share a similar pattern?" is further answered. Here, due to the importance of rush hour in the transport domain, we selected the morning rush hour for case study. We selected one road link with one typical morning rush hour traffic state as the target and queried roads with similar patterns. By specifying various similarity degrees, we obtained various sets of road links. Figure 7 shows the roads that were returned in response to input TV distance values, of 0.88, 0.86, 0.84, and 0.82. The spatial view displays the distribution of roads that have similar patterns. As the similarity degree is relaxed, increasingly, many roads appear. Meanwhile, the temporal change of the traffic states is amplified. From the line charts in the figure, the duration and the drop extent of the morning rush hour differ, although the deterioration of the traffic state in the morning rush hour is maintained. Next, we select patterns with "morning rush hour" and "evening rush hour" as the targets to explore roads that have similar patterns. Figure 8 displays the obtained results in the spatial view. According to Figure 8, many primary roads show the typical "morning rush hour" pattern, especially the main road in south Shenzhen. That is because many people who are living in north Shenzhen travel to work in the south. In contrast, the road with the evening traffic rush pattern is more spatially scattered. Next, we select patterns with "morning rush hour" and "evening rush hour" as the targets to explore roads that have similar patterns. Figure 8 displays the obtained results in the spatial view. According to Figure 8, many primary roads show the typical "morning rush hour" pattern, especially the main road in south Shenzhen. That is because many people who are living in north Shenzhen travel to work in the south. In contrast, the road with the evening traffic rush pattern is more spatially scattered. 6. Discussion

Computing Performance
One of the key challenges in traffic pattern extraction is how to meet the need for real-time

Computing Performance
One of the key challenges in traffic pattern extraction is how to meet the need for real-time exploration leveraging accumulated big GPS data. Instead of processed massive GPS trajectories directly, the presented framework used the middle-tier synopses to extract traffic patterns for real-time exploration. The synopses can be computed on-the-fly when GPS records are streamed in or in a pre-processing stage. Therefore, the computation in the data processing layer does not affect the performance of the proposed interactive visual analytics system. In terms of storage, if the total length of the road network is L meters, the finest spatial scale unit is m meters, and the finest temporal scale unit is n minutes, then the number of synopsis in one day is computed as As m and n decreases, the number of synopses increases, and more storage space is needed. Table 3 presents an example of the storage space that is needed based on the road network data of Shenzhen City with various values of m and n. The raw GPS data size for this day is 3221 M. According to the table, if the spatiotemporal unit is "100 m and 5 min", the storage space for the synopses is larger than that for the raw GPS data. Therefore, if the applications do not require the finest scale unit, the storage space will be reduced in most cases. The number of synopses does affect the performance of real-time exploration in the traffic pattern recognition layer and the visual analytic layer. The finer the spatial and temporal units are selected, the bigger the number of synopses is computed. For the traffic pattern recognition layer, the computation performance depends on the size of synopses. Based on end users' interactive setting including roads, duration, and type in TRP, synopses are retrieved from the geodatabase. The retrieving performance is related to the database used. Because the time complexity of the filtering is O(1), retrieving time on large volume of data will not be very long. Suppose m road links and n days are filtered by the specified parameters. Line 9 in the TRP calls the DBSCAN algorithm, which has a time complexity of O((m*n) 2 ). The time complexity of Lines 10-21 is also O((m*n) 2 ). In conclusion, the time complexity of the presented method is O((m*n) 2 ).
In addition, both synopses computing in the data processing layer and in the traffic pattern recognition layer can be decomposed into independent subproblems, which can be solved by advanced distributed computing. Therefore, more computing cells will further improve the computing performance.

Scalability
The presented framework can be reused and extended. First, the framework is general to GPS data collected by many types of vehicles. In the proposed framework, raw GPS data are processed into synopses according to the location (including longitude and latitude) and the recording time. Traffic state is denoted by travel speed. Therefore, any GPS data that contains the fields "longitude, latitude, time stamp, and speed" can be accepted. GPS trajectories of various vehicles including bus, logistics vehicles, and private vehicles are potential data sources. According to official statistics, by the end of December 2018, the number of motor vehicles in Shenzhen was about 3.35 million (http://sztqb.sznews.com/PC/content/201812/10/content_525436.html). It shows the great potential of our framework for better urban traffic understanding.
Second, the interactive visual analytics system can be extended for other transport applications. Due to the efficient middle-tier synopsis, the developed framework integrates the machines' computing ability and the end users' experience, supporting real-time interactive urban traffic exploration analysis. Although this study focuses on traffic, the developed framework can be easily extended to other traffic applications, i.e., reconstructing the evolution of traffic in multiple road intersections, revealing the traffic pattern under events, etc. Furthermore, combined with professional models [60], the developed system can also support innovative transport tasks. For example, by introducing the vehicle emission model, the developed system can be used in fine-grained transport carbon emission and air pollution analysis.

Conclusions
Understanding urban traffic patterns benefits traffic management and urban planning. This study proposes a three-layer interactive framework for the exploration of urban traffic patterns. Massive raw taxi GPS data are assigned to the synopses in the first data processing layer. The computed synopses are stored in geodatabase for quick retrieving to recognize traffic patterns in the traffic pattern recognition layer and the visual analytics layer. User-friendly interfaces are developed for the real-time exploration of complex urban traffic in a mega-city in the spirit of visual analytics. An experiment in Shenzhen was conducted using taxi GPS data that were collected over one month. Following end users' interests, traffic patterns of both the specified area and the related period were computed and visualized in real-time. The proposed visualization interfaces show various traffic patterns in different forms intuitively, which facilitates comparison, perception, and understanding of citywide road traffic. The results demonstrate that the presented framework provides useful tools for road traffic analysis and future transport planning.