1. Introduction
In classical static analysis, sensitivity information on link travel cost mainly supports decisions about which new road links to add to a transportation network [1,2,3,4]. In the current era of big traffic data, large volumes of historical and real-time traffic data are generated by Intelligent Transportation Systems (ITS), and data-driven methods are widely applied to traffic analysis and forecasting [5]. The ability to predict traffic information from big open data is considered one of the important building blocks for improving the effectiveness of dynamic traffic control strategies. However, existing studies rely heavily on macroscopic, aggregate views of traffic data patterns.
In this paper, the authors propose a microscopic approach [6], based on the temporal-spatial recording of vehicle passages along a freeway trip, to extract key traffic patterns using a recently developed method [7]. The number of timestamped gantry sequences collected automatically on the freeway is very large, so a scalable approach is needed to make the extraction of significant temporal patterns practical. The method for extracting the significant temporal patterns and computing their class frequency distributions is adapted from previous work [7], for which a U.S. patent application has been filed: “Wang, Ching-Tu. Method for Extracting Maximal Repeat Patterns and Computing Frequency Distribution Tables. Patent Application Serial Number 15/208,994. 13 July 2016.”
In this study, a significant pattern is defined as a maximal repeat [8] extracted from the gantry timestamp sequences, that is, a repeated pattern that does not always occur as a subsequence of the same longer pattern. The extraction of significant travel time patterns builds on previous work [7], a scalable approach based on the Hadoop MapReduce programming model [9]. Maximal repeats are extracted in two passes. The first pass verifies the left and right boundaries of each candidate maximal repeat, and the second pass accepts a candidate as a maximal repeat if it passes both the left and the right boundary verification. In [10], Wang adopted an external-memory approach using a single general-purpose computer with limited memory. However, the computation time, e.g., several weeks, was too long to be practical. In [7], Wang used Hadoop, a distributed computing platform, to speed up the computation. Note that the Hadoop MapReduce programming model is well known for its scalability in solving big data problems [11,12,13].
To illustrate the concept of this study briefly, Figure 1, Figure 2 and Figure 3 are described below. Figure 1 shows the passages of five vehicles, travelling from left to right on a freeway, with their origin and destination interchanges. A detection gantry, denoted as GID in Figure 1, is located on the mainline section between two adjacent interchanges so that the complete passage of each vehicle is captured. Figure 2 gives the corresponding gantry sequences with timestamps for each trip; a timestamp is recorded for a gantry at the moment a vehicle passes that gantry. For simplicity, the timestamps of the five vehicles at the same gantry are assumed to be identical in Figure 2. The flow of vehicles passing consecutive gantries can therefore be obtained by counting the gantry sequences whose timestamps at each of these gantries are the same. Figure 3 gives, as an example, the significant travel time patterns extracted from the gantry sequences in Figure 2. Note that the pattern “GID_2 GID_3” with specific timestamps is not generated from the sequences in Figure 2, because “GID_2 GID_3” is always followed by “GID_4” and is therefore not a maximal repeat. On the other hand, the pattern “GID_3 GID_4” is a maximal repeat because it is not always preceded by “GID_2”: the passage of the vehicle “VT31-1” does not include “GID_2”.
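The boundary argument above can be made concrete with a small sketch. The Python code below is a naive in-memory illustration, not the Hadoop MapReduce implementation of [7]. The five gantry sequences are hypothetical stand-ins for Figure 2 (timestamps are omitted because they are assumed to be identical at each gantry), and the function name `maximal_repeats` is an assumption introduced here. A pattern is kept only if it occurs at least twice and has more than one distinct neighbour on both its left and its right side.

```python
from collections import defaultdict


def maximal_repeats(sequences, min_count=2):
    """Return {pattern: count} for every maximal repeat in the sequences.

    Naive illustration: every contiguous run is enumerated, so this only
    suits small inputs. Each sequence boundary counts as a unique neighbour.
    """
    count = defaultdict(int)
    left = defaultdict(set)   # distinct left neighbours of each pattern
    right = defaultdict(set)  # distinct right neighbours of each pattern

    for sid, seq in enumerate(sequences):
        n = len(seq)
        for i in range(n):
            for j in range(i + 1, n + 1):
                pattern = tuple(seq[i:j])
                count[pattern] += 1
                left[pattern].add(seq[i - 1] if i > 0 else ("start", sid))
                right[pattern].add(seq[j] if j < n else ("end", sid))

    # Boundary verification: a repeat is maximal only if it cannot be
    # extended to the left or to the right without losing occurrences.
    return {
        p: c
        for p, c in count.items()
        if c >= min_count and len(left[p]) > 1 and len(right[p]) > 1
    }


# Hypothetical gantry sequences of five vehicles (assumed, not from Figure 2).
trips = [
    ["GID_1", "GID_2", "GID_3", "GID_4"],
    ["GID_2", "GID_3", "GID_4"],
    ["GID_3", "GID_4"],  # no GID_2, like the vehicle "VT31-1" in the text
    ["GID_2", "GID_3", "GID_4", "GID_5"],
    ["GID_1", "GID_2"],
]

repeats = maximal_repeats(trips)
for pattern, freq in sorted(repeats.items()):
    print(" ".join(pattern), freq)
```

Under these assumed sequences, “GID_2 GID_3” is rejected because its only right neighbour is “GID_4”, whereas “GID_3 GID_4” is kept with a frequency of 4 because one of its occurrences is not preceded by “GID_2”.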
To demonstrate the practicality of this work, experimental data were downloaded from the Traffic Data Collection System (TDCS), one of the Taiwanese government’s open data platforms. The experimental results include frequency distributions of vehicles passing through a selected gantry sequence for each of the 24 h of the day; these statistics are expected to give experts hints for investigating traffic congestion and to help drivers avoid traffic jams. In the future, this approach is expected to provide metadata on traffic patterns that can improve the effectiveness of dynamic traffic control strategies.
The remainder of this paper is organized as follows. Section 2 describes the gantry timestamp sequence data and the scalable approach to maximal repeat extraction. Section 3 presents the experimental results. Section 4 discusses future work, and Section 5 presents the conclusions.
4. Discussion
Analysis models in traffic flow theory [15,16,17,18,19,20] focus on describing the temporal and spatial evolution of traffic variables. In earlier studies [5,21], statistics of traffic variables were collected and counted in aggregate from single-detector observations. In modern traffic systems, the so-called intelligent transportation systems, vehicles are detected automatically through the transaction data of electronic devices, for which vehicle identification is widely available. With these records and the approach proposed in this paper, one can not only analyze the characteristics of traffic flow but also study how traffic congestion propagates upstream, and thus observe traffic flow more precisely from a microscopic point of view.
Owing to the scalability of the previous work [7], our approach is believed to be able to handle a larger number of timestamped gantry sequences collected over a longer period, e.g., several years. With such long-term, fine-grained statistics of vehicle travel time patterns, domain experts can observe vehicle behavior precisely and trace back why particular historical events resulted in traffic jams or congestion. They can then propose new approaches or policies to prevent such situations from recurring.
There is still considerable room for improving this study. First, an interactive data visualization interface, especially one integrated with a geographical map, is needed to promote the use of these statistics and to help users comprehend them intuitively. Second, Hadoop MapReduce is an external-memory method, whereas Spark, an in-memory distributed computing framework, is reported to run up to 100× faster than Hadoop MapReduce. If these travel time pattern statistics could be computed consistently with Spark and shared in a timely manner through cloud computing, this work could be offered as a software package that provides current traffic information and is integrated with the IoT (Internet of Things) to make smart or driverless cars more intelligent in the future.
5. Conclusions
This paper presents a novel approach that adopts previous work based on the Hadoop MapReduce programming model to extract significant travel time patterns from gantry timestamp sequences and, at the same time, to compute the class frequency distributions of these patterns. The classes can be derived from combinations of timestamp and vehicle information, according to the kind of distribution users wish to observe or analyze. The experimental data comprise the timestamped gantry sequences of vehicles over the five months from 2016/11 to 2017/3, downloaded from the Traffic Data Collection System (TDCS), one of the Taiwanese government’s open data platforms. The longest trip on Freeway No. 5, including six gantries in both the southbound and northbound directions, is selected for demonstration. Many kinds of class frequency distributions of significant travel time patterns are computed for different combinations of time unit and vehicle information. The statistics computed from these class frequency distributions reveal interesting and valuable information about traffic and transportation issues for further research and analysis.
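As a rough illustration of how classes derived from timestamp and vehicle information lead to different frequency distributions, the following Python sketch counts the trips that contain a selected gantry sequence and groups them by a user-supplied class function. It is a simplified, single-machine sketch and not the MapReduce implementation used in this work. The record layout, the vehicle type labels, the sample trips, and the names `find_run` and `class_frequency` are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def find_run(gantries, pattern):
    """Return the index of the first contiguous occurrence of pattern, or None."""
    pattern = list(pattern)
    m = len(pattern)
    for i in range(len(gantries) - m + 1):
        if gantries[i:i + m] == pattern:
            return i
    return None


def class_frequency(trips, pattern, class_of):
    """Count trips containing pattern, grouped by class_of(trip, entry_time).

    entry_time is the timestamp at the first gantry of the matched pattern.
    """
    dist = Counter()
    for trip in trips:
        gantries = [g for g, _ in trip["passages"]]
        start = find_run(gantries, pattern)
        if start is not None:
            entry_time = trip["passages"][start][1]
            dist[class_of(trip, entry_time)] += 1
    return dist


# Hypothetical trips: vehicle type labels and timestamps are made up.
trips = [
    {"vehicle_type": "31", "passages": [
        ("GID_1", datetime(2017, 1, 1, 7, 50)),
        ("GID_2", datetime(2017, 1, 1, 7, 58)),
        ("GID_3", datetime(2017, 1, 1, 8, 5))]},
    {"vehicle_type": "31", "passages": [
        ("GID_2", datetime(2017, 1, 1, 8, 10)),
        ("GID_3", datetime(2017, 1, 1, 8, 17))]},
    {"vehicle_type": "41", "passages": [
        ("GID_2", datetime(2017, 1, 1, 8, 40)),
        ("GID_3", datetime(2017, 1, 1, 8, 49)),
        ("GID_4", datetime(2017, 1, 1, 9, 1))]},
    {"vehicle_type": "31", "passages": [
        ("GID_3", datetime(2017, 1, 1, 17, 0)),
        ("GID_4", datetime(2017, 1, 1, 17, 10))]},
]

selected = ["GID_2", "GID_3"]

# Class = hour of day (24 classes per day).
by_hour = class_frequency(trips, selected, lambda trip, t: t.hour)

# Class = (hour of day, vehicle type).
by_hour_and_type = class_frequency(
    trips, selected, lambda trip, t: (t.hour, trip["vehicle_type"])
)

print(dict(by_hour))
print(dict(by_hour_and_type))
```

Changing only the class function switches between distributions, for example from an hourly distribution to one broken down by hour and vehicle type, without touching the counting logic.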