An Open GMNS Dataset of a Dynamic Multi-Modal Transportation Network Model of Melbourne, Australia

Abstract: Simulation-based dynamic traffic assignment models are increasingly used in urban transportation systems analysis and planning. They replicate traffic dynamics across transportation networks by capturing the complex interactions between travel demand and supply. However, their application, particularly to large-scale networks, has been hindered by the challenges associated with the collection, parsing, development, and sharing of data-intensive inputs. In this paper, we develop and share an open dataset for reproduction of a dynamic multi-modal transportation network model of Melbourne, Australia. The dataset is developed consistently with the General Modeling Network Specification (GMNS), enabling software-agnostic human and machine readability. GMNS is a standard readable format for sharing routable transportation network data that is designed to be used in multimodal static and dynamic transportation operations and planning models.


Summary
The emergence of dynamic traffic assignment (DTA) since the late 1970s [1] is largely due to the fact that traffic networks are generally not in a steady state, as depicted by static traffic assignment. Thus, two important extensions were made: (i) travel times may change due to varying traffic conditions (i.e., experienced travel time); and (ii) travel times on used routes should be equal for the same departure time interval [2][3][4]. A recent methodological review of DTA can be found in [5]. Due to their dynamic nature, DTA models are able to capture and replicate complex behavior of formation, propagation, and dissipation of traffic congestion in cities [6] and are often used for transportation network design and planning purposes such as corridor management [7], downtown traffic management [8], road pricing [9], and emergency management [10]. A comprehensive summary of DTA applications can be found in [2].
More recently, a growing number of studies have unraveled the potential benefits of big data in understanding and modeling of urban transportation networks, including use of mobile phone data [11,12] and, more generally, floating car data [13,14]. Nevertheless, simulation-based DTA has remained the dominant and more widely used methodology in understanding and predicting both the short- and long-term dynamics of traffic congestion in urban networks.
Data 2021, 6, 21
Applications of simulation-based DTA models have been growing in urban transportation systems operations and planning, but development of such models requires a large number of data inputs, which often makes their real-world application a practical challenge. These inputs generally fall into two categories: demand side and supply side [2]. The former typically includes time-dependent origin-destination (TDOD) matrices and the parameters of traveler behavior models, while the latter consists of network geometry, traffic control information, traffic flow parameters, and others. Some of these inputs are relatively easy to acquire, such as network geometry (e.g., via OpenStreetMap), whereas others are difficult to observe in reality, such as TDOD matrices. Thus, calibration and validation of simulation-based DTA models are integral to their deployment to ensure that models accurately reflect the ground truth [15][16][17][18][19].
The overall deployment procedure of simulation-based DTA models is data demanding, and the lack of a globally used data standard has limited the reproducibility and transferability of large-scale travel demand models. In a recent and timely initiative by the Zephyr Transport Foundation [20], a standard format for sharing routable transportation network data, known as the General Modeling Network Specification (GMNS), has been developed [21]. GMNS is designed to facilitate sharing tools and datasets for development of static and dynamic multi-modal transportation networks. In this paper, we develop and share an open GMNS dataset for reproduction of a dynamic multi-modal transportation network model of Melbourne, Australia (see [19] for model deployment details).

Data Description
In this section, we follow the GMNS data sharing standards, which include two general data elements: basic and advanced. The basic data elements are sufficient to generate a routable transportation network, while the advanced data elements provide more information on the time-dependent features of the network, including movements and traffic signal controls. We also include additional data elements, not included in the GMNS, for sharing travel demand origin-destination matrices and observed traffic volume data for model validation and benchmarking purposes. See Figure 1 for an illustration of the relational structure of the developed GMNS dataset.

Basic Network Data Elements
The basic network data elements include information on nodes and links.
The node data table (node.csv) consists of 2077 rows and nine columns in which each row represents an individual node, and the columns contain information on the geometry of the nodes including node ID, latitude and longitude, type of node (e.g., external, centroid, etc.), and node control type (e.g., signal) (see Table 1). Note that all latitude and longitude information included in the table below and in all subsequent tables is given in the UTM (Universal Transverse Mercator) coordinate system.
The link data table (link.csv) includes 4223 rows and 22 columns in which each row represents an individual edge in the network and the columns contain information including link ID, link name, parent link ID, from node ID, to node ID, link type (e.g., directed), geometry ID, geometry, direction flag, length, grade, facility type, capacity, free flow speed, number of lanes, bike facility, pedestrian facility, parking, allowed uses, toll, jurisdiction, and row width. Some of the columns are left blank as we did not have the associated information to include in the table (see Table 2).
2967  60399  60400  9000  100  5
3008  60436  60437  1800  100  1
3507  60903  60904  1000   50  1
4905  61867  60903   950   90  1
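As a minimal sketch of how the two basic tables combine into a routable network, the Python example below parses node and link records and builds an adjacency list. The column names (node_id, x_coord, y_coord, from_node_id, to_node_id, free_speed, and so on) are assumptions based on common GMNS usage, and the inline records are invented stand-ins for node.csv and link.csv, not rows from the dataset.

```python
import csv
import io

# Hypothetical excerpts standing in for node.csv and link.csv.
# Column names are assumed; all IDs, coordinates, and attributes are invented.
NODE_CSV = """node_id,x_coord,y_coord,node_type,ctrl_type
1,320000.0,5810000.0,,signal
2,320500.0,5810200.0,,
3,321000.0,5810100.0,centroid,
"""

LINK_CSV = """link_id,from_node_id,to_node_id,capacity,free_speed,lanes
10,1,2,1800,60,1
11,2,3,1000,50,1
12,3,1,950,50,1
"""


def read_table(text):
    """Parse CSV text into a list of dict rows (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def build_adjacency(nodes, links):
    """Return {from_node: [(to_node, link_id), ...]} for directed links.

    Raises ValueError if a link refers to a node missing from the node table.
    """
    node_ids = {row["node_id"] for row in nodes}
    adjacency = {node_id: [] for node_id in node_ids}
    for link in links:
        a, b = link["from_node_id"], link["to_node_id"]
        if a not in node_ids or b not in node_ids:
            raise ValueError(f"link {link['link_id']} references an unknown node")
        adjacency[a].append((b, link["link_id"]))
    return adjacency


nodes = read_table(NODE_CSV)
links = read_table(LINK_CSV)
adjacency = build_adjacency(nodes, links)
```

With the real files, the two inline strings would be replaced by the contents of node.csv and link.csv, after checking the actual header names.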

Advanced Network Data Elements
The advanced network data elements include information on locations, movements, signal controllers, signal timing plans, signal timing phases, and signal phase movements.
The location data table (location.csv) consists of 4289 rows and 10 columns in which each row represents a specific location in the network (e.g., bus stops and detectors). Its attributes are nearly the same as those for a node, except that a location includes an associated link. The zone ID field enables the network to be loaded via locations (similar to what is done in existing commercial traffic simulation software). The table also includes location ID, link ID, reference node ID, LR number (used if link geometry exists; otherwise, the link geometry is assumed to be the straight-line distance between the from node and to node), x, y, and z coordinates, location type, and zone ID (see Table 3).
The movement data table (movement.csv) describes how inbound and outbound links connect at an intersection. The table contains 6879 rows and 14 columns in which each row represents a movement at a node and columns consist of movement ID, node ID, movement name, inbound link ID, start_ib_lane (innermost lane number the movement applies to at the inbound end), end_ib_lane (outermost lane number the movement applies to at the inbound end), outbound link ID, start_ob_lane (innermost lane number the movement applies to at the outbound end), end_ob_lane (outermost lane number the movement applies to at the outbound end), movement type (e.g., left, right or thru), penalty (in seconds), capacity, and control type (e.g., no control, signal, yield, and stop) (see Table 4).
Table 4. Example of movement data (not all columns are included).
MVMT_ID  Node_ID  Ib_Link_ID  Start_Ib_Lane  End_Ib_Lane  Ob_Link_ID  Start_Ob_Lane  End_Ob_Lane
85940    63750    39211       1              1            8149        1              2
85941    63750    39211       1              1            8150        1              2
85942    63750    43991       1              1            8148        1              1
85943    63750    43991       1              2            8150        1              2
The signal controller data table (signal_controller.csv) contains 1 row and 1 column, which is the controller ID. The signal controller is associated with an intersection or a cluster of intersections. Here, all traffic signals are coded as actuated.
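The movement table can be read as a turn table: for each inbound link at a node, it lists the outbound links that a vehicle may enter. The following Python sketch illustrates this using the four rows from the Table 4 excerpt. The lower-case header names are an assumption about the CSV file, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the Table 4 excerpt. The lower-case header names are an
# assumption; the last two columns are taken to be the outbound lane range.
MOVEMENT_CSV = """mvmt_id,node_id,ib_link_id,start_ib_lane,end_ib_lane,ob_link_id,start_ob_lane,end_ob_lane
85940,63750,39211,1,1,8149,1,2
85941,63750,39211,1,1,8150,1,2
85942,63750,43991,1,1,8148,1,1
85943,63750,43991,1,2,8150,1,2
"""


def allowed_outbound_links(movement_rows):
    """Map (node_id, inbound link ID) to the sorted outbound link IDs
    reachable through at least one coded movement."""
    turns = defaultdict(set)
    for row in movement_rows:
        turns[(row["node_id"], row["ib_link_id"])].add(row["ob_link_id"])
    return {key: sorted(values) for key, values in turns.items()}


rows = list(csv.DictReader(io.StringIO(MOVEMENT_CSV)))
turn_table = allowed_outbound_links(rows)
print(turn_table[("63750", "39211")])  # ['8149', '8150']
```

A router that respects turning restrictions would consult such a table when expanding a path from one link to the next, instead of assuming that every outbound link at a node is reachable.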
The signal timing plan data table (signal_timing_plan.csv) includes 372 rows and 4 columns in which each row represents a particular signal timing plan. Columns consist of time plan ID, controller ID, time day, and cycle length. This data table establishes timing plans for signalized nodes (see Table 5).
The signal timing phase data table (signal_timing_phase.csv) includes 2756 rows and 13 columns. Each row describes an individual timing phase. Columns consist of timing phase ID, timing plan ID, signal phase number, min green, max green, extension, clearance, walk time, pedestrian clearance, ring, barrier, and position. This data table provides signal timing information and establishes phases that may run concurrently for signalized nodes (see Table 6).
The signal phase movement data table (signal_phase_mvmt.csv) includes 3959 rows and 8 columns. Each individual row represents a signal phase. Columns consist of signal phase movement ID, controller ID, timing phase ID, signal phase number, timing plan ID, movement ID, link ID, and protection. This data table associates movements and pedestrian links (e.g., crosswalks if any) with signal phases. A signal phase may be associated with several movements. A movement may also run on more than one phase (see Table 7).
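The many-to-many relation between signal phases and movements can be sketched as a pair of lookups, one in each direction. In the Python example below, the (timing phase ID, movement ID) pairs are invented for illustration and the function name is hypothetical. In practice the pairs would be read from signal_phase_mvmt.csv.

```python
from collections import defaultdict

# Hypothetical (timing_phase_id, movement_id) pairs; all IDs are invented.
PHASE_MOVEMENTS = [
    (1, 101),
    (1, 102),
    (2, 102),
    (2, 103),
]


def index_phase_movements(pairs):
    """Build both directions of the phase-movement relation.

    Returns (movements served by each phase, phases serving each movement).
    """
    by_phase = defaultdict(list)
    by_movement = defaultdict(list)
    for phase_id, movement_id in pairs:
        by_phase[phase_id].append(movement_id)
        by_movement[movement_id].append(phase_id)
    return dict(by_phase), dict(by_movement)


movements_of_phase, phases_of_movement = index_phase_movements(PHASE_MOVEMENTS)
```

In this toy data, phase 1 serves movements 101 and 102, while movement 102 runs on both phases 1 and 2, which mirrors the two cases described above.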

Additional Network Data Elements
The GMNS does not provide any formatting standard for sharing the time-dependent origin-destination demand matrices that are often used in transportation network models. Here, we use a long format commonly used in the literature for sharing network weight matrices, in which each row records the travel demand (number of vehicle trips) going from one node to another, identified by their node IDs. The travel demand moves from one centroid to another centroid. Note that the connectors between centroids and links are not provided. Users can connect centroids to links or nodes as appropriate in their application. We provide 16 origin-destination demand matrices, one for each 15 min time interval from 6:00 am to 10:00 am. All matrices are calibrated using empirical data as discussed in detail in [15] (see Table 8).
We also provide observed traffic volume data (observed_traffic_volume.csv) on 448 links across the network for every 15 min time interval, obtained from loop detector data on both freeways and arterials. These data can be used for benchmarking and model validation if needed. Note that GMNS does not provide any formatting standard for sharing observed traffic volume data (see Table 9).
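To illustrate the demand format, the Python sketch below enumerates the 15 min intervals between 6:00 am and 10:00 am and reads one long-format matrix into a dictionary keyed by origin and destination. The column names (origin, destination, demand) and all values are assumptions made for illustration. The actual file headers should be checked before use.

```python
import csv
import io
from datetime import datetime, timedelta


def interval_starts(start="06:00", end="10:00", step_min=15):
    """List the start times of consecutive intervals covering [start, end)."""
    current = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    labels = []
    while current < stop:
        labels.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step_min)
    return labels


# Hypothetical long-format excerpt of a single 15 min matrix.
# Column names and demand values are invented.
OD_CSV = """origin,destination,demand
1,2,12.5
1,3,4.0
2,3,7.5
"""


def read_od(text):
    """Return {(origin, destination): demand} from long-format CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {(r["origin"], r["destination"]): float(r["demand"]) for r in reader}


starts = interval_starts()
od = read_od(OD_CSV)
total_trips = sum(od.values())
```

The interval list has 16 entries, running from 06:00 to 09:45, which matches the number of matrices provided. Summing the demand over all matrices read this way gives the total number of trips in the peak period.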

Methods
The presented network data consists of 2077 nodes, 4223 links, and a time-dependent origin-destination demand matrix with 330,000 commuting trips in a 4 h long morning peak period. The spatial configuration of the road network is obtained from the Victoria Integrated Transport Model (VITM) [22] consisting of 416 traffic zones (see Figure 2). Road links are grouped into several classes based on their physical attributes, including the number of lanes, capacity, and free flow speed. Nodes include two key attributes: permitted turning movements and signal control parameters. The network also includes 372 actuated signal controls that are set up using Sydney Coordinated Adaptive Traffic System (SCATS) data containing information on the cycle time, the turning movements associated with each signal phase, and the maximum and minimum green times (see Figure 3). The network also includes 2483 bus stops specified as points associated with the links (see Figure 4). The dataset also includes traffic count data from hundreds of loop detectors across the network that are map matched to the road network using the commonly available spatial analysis tool in ArcGIS [23]. Note that the presented model and the time-dependent origin-destination matrices are validated using a large set of path travel times obtained from the publicly available Google distance matrix API [24]. The time-dependent origin-destination demand matrices are calibrated based on an initial static demand matrix obtained from the VITM and re-estimated using traffic counts from freeways and signalized intersections.
The obtained and collated data are manually checked against other publicly available datasets such as Google Maps and OpenStreetMap to ensure that the locations of nodes and links are correct. Capacity and free flow speed information is directly obtained from VITM and has not been quality controlled against any other source, on the assumption that information already in the VITM is of high quality, as the model has been widely used by the government of Victoria and many consulting firms in the transportation profession for many years. There may be some disparity in the number and location of bus stops across the network, given that the public transportation system in the area might have gone through occasional network design updates. However, the change is expected to be small, without significant impact on the overall bus network. The signal timing information is also directly obtained from available SCATS data prior to 2015. Given the nature of SCATS, the signal timing data may have also changed over time since then. However, the impact on the overall performance of the network, given its large scale and its strategic long-term application, is expected to be minimal.
Further details of the deployment, calibration and validation of the network model can be found in [19,25]. The developed network model has already been used in various transportation network design, traffic operations, and optimization applications [26][27][28][29][30][31][32]. For more information on GMNS specifications, please refer to [21].