Growing traffic congestion on America’s roadways has negative impacts on mobility, the environment, and the economy. According to a Texas A&M Transportation Institute report, the total congestion cost for 471 U.S. urban areas in 2014 was $160 billion, and congestion caused travelers to waste 6.9 billion hours and more than 3 billion gallons of fuel [1]. Congestion can result from excessive traffic demand, the presence of physical bottlenecks, traffic incidents, work zones, adverse weather conditions, and special events. To improve transportation network performance, it is important to understand the factors that contribute to congestion development and to implement strategies that alleviate congestion.
Practices for transportation data collection, management, and governance vary from agency to agency. A comprehensive synthesis of practice published in 2017 summarizes transportation agency data management practices based on a literature review, a two-phase online survey, and follow-up interviews with transportation agency representatives [2]. The study recommended the development of a framework for integrating data within transportation agencies; case studies to assess the magnitude and complexity of data managed by transportation agencies; and the development of methods and case studies for mining archived data at these agencies [2].
Systematic collection of traffic data is of great importance for congestion monitoring but has proven to be a costly and challenging process. In the past, only a limited number of public agencies had comprehensive data collection programs that could generate reliable estimates of congestion performance measures, as the high costs associated with extensive data collection deterred many states from investing in such programs [3]. Recognizing the value of traffic data availability, in 2013 the US federal government acquired a national data set of average travel times called the National Performance Management Research Data Set (NPMRDS) and made it available to States and Metropolitan Planning Organizations (MPOs) for use in their transportation performance management activities [4]. NPMRDS is a vehicle probe-based travel time data set whose records are collected from a variety of sources. The database contains hundreds of billions of records covering the entire National Highway System (NHS), which includes all interstates and US highways.
While the benefits of gaining access to a comprehensive database such as NPMRDS are tremendous, MPOs, practitioners, and researchers have reported challenges in using the NPMRDS data set to develop performance measures and generate reports for congestion monitoring. Among them was the Wisconsin Traffic Operations and Safety Laboratory, one of the first institutes to use probe data for transportation performance monitoring. In 2014, they developed a performance measurement process that describes the steps to be taken for data processing and for developing mobility measures such as Travel Time Reliability and Vehicle Delay by integrating hourly volume into NPMRDS [5]. Regarding data management, they noted that working with the data set required database and scripting skills. They also studied travel time data distributions and confirmed the presence of outliers and data gaps in the data set.
The University of Minnesota and Minnesota DOT produced another valuable report on the performance analysis of 38 freight corridors, using the NPMRDS database and Structured Query Language (SQL) scripts for data processing [6]. This work demonstrated the feasibility of using travel time records obtained from freight trucks as a data source for studying speed variation and truck delay during peak hours.
In another study, the American Transportation Research Institute (ATRI) reported on the cost of delay and congestion experienced by the freight industry [7]. The University of Maryland conducted a validation analysis between NPMRDS and the I-95 Corridor Coalition’s Vehicle Probe Project (VPP) data. The researchers pointed out that comparing different data sources is complicated because every data collection source uses a different segmentation for collecting traffic data, so the differences in segments require careful consideration [8].
Another research institute that performed a validation analysis was the Upper Midwest Reliability Resource. They reported that the travel time records in the NPMRDS data set displayed a higher variation and a lower mean travel time than records from the INRIX data set. PostgreSQL and Psycopg were used to store the data set, and the data analysis was performed with code written in Python [9].
In May 2014, Iteris Inc. offered a training module called “MAP-21 Module” to help agencies meet the reliability and congestion mitigation reporting requirements established by MAP-21 [10]. To overcome the issue of handling big data, this module stored NPMRDS data in a series of databases, which enabled users to query the data through a web interface and to develop performance measures and maps for visualization purposes [10].
To date, the majority of published research on generating transportation performance measures from NPMRDS has relied on complex programming languages and databases and has been performed by experts in those fields. However, staff at small and mid-size MPOs and transportation agencies have encountered difficulties in using the NPMRDS data set for congestion monitoring because they lack experience in database management and big data analytics. To address this issue, this study developed an automated process to manage and store NPMRDS data for the Birmingham, AL region. Moreover, the study used traffic data analytics and statistical analysis to extract travel time reliability and other congestion performance measures. These measures were used to determine the extent and severity of congestion and to guide the optimization of operations along the study corridors.
5. Conclusions and Recommendations
This study was undertaken to (a) showcase the development of an automated process to facilitate the management, storage, and processing of big transportation data sets such as NPMRDS for congestion monitoring applications, (b) use traffic data analytics and statistical analysis to extract travel time reliability performance measures in a Birmingham case study, and (c) use reliability performance measures to determine the congestion extent and severity and guide optimization of traffic operations in the Birmingham region.
The case study used the NPMRDS data set to quantify congestion in the Birmingham region over a one-year period (2015) along four major freeways, namely I-65, I-20, I-59, and I-20/I-59. A relational database management system (RDBMS) was employed as an efficient and economical tool for data management, and SQL was used to extract data and perform the analysis. A range of performance measures was calculated to quantify the location, level, and extent of congestion, and was used to prioritize freeway segment needs with respect to congestion. The performance measures calculated were the Travel Time Index (TTI), the Duration of Congestion (DOC), the 85th Percentile Congestion Intensity, and the 85th Percentile Speed-drop. In addition, an Impact Factor was proposed and used for ranking the congested segments. Such rankings offer a systematic, data-driven method for prioritizing resource allocations for operational improvements. The analysis revealed that segments 17 and 23, which have relatively high values of 85th Percentile Congestion Intensity and Speed-drop, are the most unreliable segments in the study area and thus require close attention.
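As a rough illustration of how an RDBMS and SQL can produce a measure such as the TTI, the following minimal Python sketch loads a few travel time records into an in-memory SQLite database and ranks segments by TTI. It is not the study's implementation: the table name, column names, segment labels, sample values, and the helper `travel_time_index` are all hypothetical, and the TTI is taken here in its commonly used form (mean peak-period travel time divided by free-flow travel time), with overnight hours assumed as a stand-in for free-flow conditions.

```python
import sqlite3

# Hypothetical records for illustration only; not the study's schema or data.
# Each row: (segment_id, hour_of_day, travel_time_seconds)
ROWS = [
    ("seg_A", 3, 60.0),
    ("seg_A", 4, 60.0),
    ("seg_A", 7, 90.0),
    ("seg_A", 8, 90.0),
    ("seg_B", 3, 100.0),
    ("seg_B", 4, 100.0),
    ("seg_B", 7, 110.0),
    ("seg_B", 8, 110.0),
]


def travel_time_index(rows, peak_hours=(7, 8), free_flow_hours=(3, 4)):
    """Return {segment_id: TTI}, ordered from most to least congested.

    Assumption: TTI = mean peak travel time / mean free-flow travel time,
    with free flow approximated by the overnight hours passed in.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE travel_times ("
        "segment_id TEXT, hour_of_day INTEGER, travel_time_seconds REAL)"
    )
    conn.executemany("INSERT INTO travel_times VALUES (?, ?, ?)", rows)

    # One "?" placeholder per hour so the hour lists are bound as parameters.
    peak_marks = ",".join("?" * len(peak_hours))
    free_marks = ",".join("?" * len(free_flow_hours))

    # CASE without ELSE yields NULL for non-matching rows, and AVG skips NULLs,
    # so each AVG only sees the rows from its own time window.
    query = f"""
        SELECT segment_id,
               AVG(CASE WHEN hour_of_day IN ({peak_marks})
                        THEN travel_time_seconds END)
             / AVG(CASE WHEN hour_of_day IN ({free_marks})
                        THEN travel_time_seconds END) AS tti
        FROM travel_times
        GROUP BY segment_id
        ORDER BY tti DESC
    """
    result = conn.execute(query, (*peak_hours, *free_flow_hours)).fetchall()
    conn.close()
    return dict(result)


if __name__ == "__main__":
    for segment, tti in travel_time_index(ROWS).items():
        print(f"{segment}: TTI = {tti:.2f}")
```

With the sample rows, the peak travel time on `seg_A` is 1.5 times its assumed free-flow travel time, so it ranks ahead of `seg_B`. Keeping the aggregation in SQL means the same query pattern could run against a full database table rather than requiring all records to be loaded into memory first.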
Overall, the study findings can guide transportation professionals and agencies on how to use big transportation databases such as NPMRDS to quantify the level and extent of congestion and to generate performance-based measures. Such performance measures can, in turn, be used as an initial screening process for congestion management and help identify locations where implementing congestion mitigation initiatives has the best potential return on investment.
Future work could validate the proposed approach using a larger sample size. In addition, further studies are recommended to investigate in greater depth the effect of outliers on Travel Time Reliability measures. It is also desirable to extend the work to additional data sources, such as volume data, incident data, weather events, and work zone presence information, in order to improve the understanding of the causes of uncertainty in travel time and to quantify recurrent and non-recurrent congestion more accurately.