A Data-Driven Quasi-Dynamic Traffic Assignment Model Integrating Multi-Source Traffic Sensor Data on the Expressway Network

Abstract: Static traffic assignment (STA) models have been widely utilized in the field of strategic transport planning. However, STA models cannot fully represent dynamic road conditions and suffer from inaccurate assignment during traffic congestion. At the same time, an increasing number of installed sensors have become an important means of detecting dynamic road conditions. To address the shortcomings of STA models, we integrate multi-source traffic sensor datasets and propose a novel data-driven quasi-dynamic traffic assignment model, named DQ-DTA. In this model, records of toll stations are used for time-varying travel demand estimation. GPS trajectory datasets of vehicles are further used to calculate the dynamic link costs of the road network, replacing the imprecise Bureau of Public Roads (BPR) function. Moreover, license plate recognition (LPR) data are used to design a statistical probability-based multipath assignment method to capture travelers’ route choices. The expressway network in the Hunan province is selected as the study area, and several classic STA models are also chosen for performance comparison. Experimental results demonstrate that the accuracy of the proposed DQ-DTA model is about 6% higher than that of the chosen STA models.


Introduction
The use of the expressway has become the favorite choice for inter-city travel due to its high capacity and low time cost. Obtaining traffic flow with high accuracy plays an important role in time-critical traffic planning applications of expressway, including emergency evacuations, incident management, etc. In most regions, traffic flow data are usually collected by a range of traffic observation devices; however, the installation and maintenance of these devices tend to be very costly. It is therefore impossible to directly obtain complete traffic flow in large-scale road networks through the use of sparsely installed measurement devices. Accordingly, the development of theoretical traffic assignment models is of great importance to the calculation of accurate and complete flow information for large-scale road networks [1][2][3][4].
Traffic assignment models translate time-varying travel demand (i.e., origin-destination (OD) pairs) through route assignment into the link flows of the road network, hence deriving the detailed traffic flow of each road segment under various scenarios. To date, a collection of traffic assignment models has been proposed with different computing efficiency and computing resource requirements; these can be divided into static traffic assignment (STA) and dynamic traffic assignment (DTA) models.
STA models, which were first proposed in the 1950s, assume that the travel demand for each origin-destination pair is uniformly distributed over time during traffic assignment [5]. Classical STA models include All-or-Nothing (AON) [6,7], Incremental [8,9],
1. We propose a data-driven quasi-dynamic traffic assignment model (DQ-DTA). This approach performs traffic assignment at a low computational cost, comparable to that of traditional STA models, while achieving higher assignment accuracy in the large-scale expressway network context.
2. Utilizing fine-grained temporal segmentation, a dynamic link cost calculation method, named DLC, is designed to calculate the time-dependent link cost from GPS trajectory data. This direct cost expression reflects dynamic traffic congestion and improves the accuracy of the link travel cost.
3. To model the multipath choice of travelers, a multipath assignment method based on statistical probability (named MSP) is proposed to capture user path choices from historical travel records. It uses massive amounts of travel history data to generate the statistical probability of the selected paths, and thereby achieves more realistic path assignment than purely mathematical logit models.
4. We conduct extensive experiments on a real large-scale expressway network. Our experimental results show that the DQ-DTA model can achieve about 6% higher accuracy than the classical STA models.
The remainder of this paper is organized as follows. Section 2 introduces the study area and the data involved. Section 3 describes the methodology of the proposed DQ-DTA model and illustrates its core procedures in detail. Section 4 then compares the accuracy of the proposed DQ-DTA model with that of several classical STA models. Finally, the conclusion and recommendations for future research are presented in Section 5.

Materials
The study area chosen is the Hunan province, which is located in the central south area of China at 108°47′∼114°13′ E, 24°39′∼30°08′ N (see Figure 1). It has a total area of 211,800 km², which covers 14 cities and 122 counties. The Hunan province is also an important inland transportation hub in China.
Moreover, as can be seen from Figure 2, a collection of related datasets was obtained from the traffic management bureau of the Hunan province: these include the expressway network, toll records, real-time surveillance data (license plate recognition devices, traffic flow observation stations, etc.), and GPS trajectory datasets. All data cover the entire month of January 2018; during this period, severe traffic congestion occurred on the expressway due to extensive holiday-related travel (e.g., on New Year's Day).
ISPRS Int. J. Geo-Inf. 2020, 9, x FOR PEER REVIEW

The Expressway Network
At the end of 2018, the total mileage of the expressway in the Hunan province of China had reached 6724.5 kilometers. Since the expressway is a closed network, and all vehicles travelling through this network can only enter and exit through toll stations, the expressway network can be abstracted into a directed graph structure, i.e., the expressway network G = (V, E), with a set of nodes V and directed edges E.
The simplified expressway graph contains 530 edges and 490 vertices, as illustrated in Figure 2. Each edge e = (r, s) ∈ E has one property, the cost d(r, s), which represents the length of the edge.

The Travel Toll Records
Time-varying traffic demand data have an important impact on the results of traffic assignment. Existing methods for traffic demand data collection can be classified into two types: survey-based [40] and traffic surveillance [41][42][43][44]. Survey-based collection has gradually been replaced by traffic surveillance because of its high labor costs, limited coverage, and strongly subjective nature.
Here, expressway toll records are used to estimate the required time-varying traffic demand. Each record has six attributes: vehicle ID, entry and exit station codes, entry and exit times, and vehicle class. There are about 23 million toll records for January 2018, three samples of which are listed in Table 1. Since a trip may extend past midnight, two cases in the toll records need to be processed separately when extracting the traffic demand of a given day. In the first case, the entry and exit times fall within the same day, and all such records are extracted as valid OD pairs. In the second case, the entry and exit times fall on different days; here, records for which less than half of the total travel time elapses on the day in question are discarded.
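The day-assignment rule for toll records can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that a record is kept for a day when at least half of its travel time falls on that day (the text only states that records with less than half are discarded).

```python
from datetime import date, datetime, time, timedelta


def share_of_trip_on_day(entry: datetime, exit_: datetime, day: date) -> float:
    """Fraction of the trip's travel time that elapses on the given calendar day."""
    day_start = datetime.combine(day, time.min)
    day_end = day_start + timedelta(days=1)
    total = (exit_ - entry).total_seconds()
    if total <= 0:
        return 0.0  # malformed record: exit not after entry
    overlap = (min(exit_, day_end) - max(entry, day_start)).total_seconds()
    return max(overlap, 0.0) / total


def is_valid_od_for_day(entry: datetime, exit_: datetime, day: date) -> bool:
    """Assumed rule: keep the record if at least half of the trip falls on `day`."""
    return share_of_trip_on_day(entry, exit_, day) >= 0.5
```

A same-day trip has a share of 1.0 and is always kept; a trip from 23:00 to 00:30 is kept for the entry day (two thirds of the travel time) and discarded for the exit day.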

Real-Time Surveillance Data
Two types of real-time surveillance data are involved here: namely, license plate recognition (LPR) data and link flow observation data.
The LPR data are obtained by traffic surveillance cameras located on selected links of the expressway network (shown in Figure 2). An LPR record is generated when a vehicle's license plate is recognized by an installed camera. The record consists of the vehicle ID, site code, record time, and driving direction. There are about 72 million LPR records for January 2018, three of which are presented in Table 2.
A total of 38 traffic flow observation stations are installed in the study area (shown in Figure 2). The link flow data obtained by these stations can be used to validate the accuracy of the proposed assignment models. All traffic flow records for the different vehicle classes are converted into a single unit, the passenger car unit (PCU); the observed daily traffic volume of the links ranges between 1000 and 200,000.

The GPS Trajectory Dataset
Road traffic conditions reflect the congestion level of each road segment, which can in turn be used to calculate the time-dependent link cost (i.e., travel time) with reference to the relationship between road conditions and free flow speed. With the development of intelligent navigation maps, historical or real-time road condition data can now be easily obtained from the remote access interface of online map systems such as Google Maps, etc.
Here, the GPS trajectories collected from 37,221 special-purpose vehicles (e.g., buses, trucks) during January 2018 are used to calculate the time-dependent link cost. There are approximately 1.84 billion records in total, and the data frequency is about one GPS point every 10~30 s. Each record contains six attributes: vehicle ID, timestamp, longitude, latitude, instantaneous velocity, and direction.
Before the GPS trajectory data are imported into our model, the dataset must first be processed. Initially, the duplicated GPS points in each GPS trajectory are removed; subsequently, the GPS points are matched to the road network using the ST-matching algorithm [45]. Finally, only GPS trajectories comprising more than five points are retained, to avoid the interference caused by GPS points that are not located on the selected expressway.
Figure 3 outlines the framework of the proposed DQ-DTA model, which is extended from the general framework of a macroscopic DTA model. It appends two novel components: dynamic link cost calculation (DLC) and multipath assignment based on statistical probability (MSP).

Dynamic Link Cost Calculation
BPR functions are typically used to estimate travel time from known traffic flow. However, the travel times do not follow a convex function with respect to flow [18], which leads to inaccurate travel time estimation, especially during traffic congestion. To overcome this shortcoming of BPR, GPS trajectory data can be employed to effectively reflect real-time traffic conditions; thus, they are used here to calculate the time-dependent travel time of the road network, i.e., DLC, which replaces the BPR function for the link cost calculation.
Following the fine-grained temporal segmentation in DTA, one day is divided into l time intervals, and the GPS trajectory dataset is divided into l parts accordingly. Within each part, the vehicle speed attribute is used to calculate the average travel time of every link in that period. For example, if k vehicles travel on link E(a, b) from a to b during a given time period, the average travel speed and travel time of the link in this direction for that period can be expressed by Equations (1) and (2), respectively:

$$V_{ab} = \frac{1}{k}\sum_{j=1}^{k}\left(\frac{1}{m_j}\sum_{i=1}^{m_j} v_{ji}\right), \quad a, b \in [1, n] \tag{1}$$
$$T_{ab} = \frac{D_{ab}}{V_{ab}}, \quad a, b \in [1, n] \tag{2}$$
where $V_{ab}$ is the average travel speed of the link from node a to node b; $v_{ji}$ is the speed of vehicle j at GPS point i; $m_j$ is the total number of GPS points of vehicle j on link E(a, b); k is the total number of vehicles on link E(a, b); n is the total number of nodes in the expressway network; $T_{ab}$ is the average travel time of link E(a, b); finally, $D_{ab}$ is the length of E(a, b). Hence, two average travel time vectors (i.e., upward and downward) can be obtained for each link and each day over the l time intervals. For example, for the link E(a, b), the format of the time vector is shown in Table 3.

Table 3. Travel time vector of one link E(a, b) by time interval.
Link | Direction | Travel Time Vector
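The DLC computation for one link direction can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: map-matched GPS points are assumed to be available as (vehicle ID, timestamp, speed in km/h) tuples, the link length is in km, each vehicle's mean point speed is averaged over the vehicles seen in an interval, and intervals without observations are left as None. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def link_travel_time_vector(points, length_km, l=24):
    """Return l average travel times (hours) for one link direction.

    points: iterable of (vehicle_id, timestamp, speed_kmh) already matched
    to this link and direction (assumed input layout).
    """
    width = 86400 / l  # seconds per time interval
    # speeds[interval][vehicle_id] -> list of point speeds
    speeds = defaultdict(lambda: defaultdict(list))
    for vehicle_id, ts, speed_kmh in points:
        sec = ts.hour * 3600 + ts.minute * 60 + ts.second
        speeds[int(sec // width)][vehicle_id].append(speed_kmh)

    vector = [None] * l  # None marks intervals with no usable observation
    for interval, by_vehicle in speeds.items():
        # Mean speed per vehicle, then mean over the k vehicles in the interval
        per_vehicle = [sum(v) / len(v) for v in by_vehicle.values()]
        v_ab = sum(per_vehicle) / len(per_vehicle)
        if v_ab > 0:
            vector[interval] = length_km / v_ab  # travel time = length / speed
    return vector
```

How empty intervals are filled (for example, with the free-flow travel time) is not specified in the text, so the sketch leaves them unset.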

Multipath Assignment Based on Statistical Probability
To overcome the shortcomings of logit-related models in multipath traffic assignment, LPR data can be used to constrain the path choice for OD pairs, because the multipath sets of constrained OD pairs are much closer to realistic path choices. Accordingly, utilizing the massive amounts of travel history data provided by OD pairs and LPR records, a multipath assignment method based on statistical probability (MSP) can be defined to capture user route choices.
There are two types of OD pairs, namely those with and those without internal LPR records. The decomposition and probability generation procedure is applied for the former, while multipath restoration is used for the latter. Ultimately, all the results of prior processing are used for traffic assignment.
For OD pairs with internal LPR records, each pair is decomposed by LPR points according to the temporal order. In Figure 4, points S1 and S2 are the LPR points of one OD pair, where S1 belongs to link E(a, b) and S2 belongs to link E(p, q). The OD pair can thus be decomposed into five sub-OD pairs (o-a, a-b, b-p, p-q, and q-d), so that a detailed path of OD (o-a-b-p-q-d) can be derived.
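The decomposition step can be sketched in Python as follows. This is a minimal sketch: the function name and the representation of LPR points as (from_node, to_node) link tuples in temporal order are assumptions made for illustration.

```python
def decompose_od(origin, destination, lpr_links):
    """Split one OD pair into sub-OD pairs using its LPR links.

    lpr_links: list of (from_node, to_node) links on which the vehicle was
    recognized, sorted by record time (assumed input layout).
    """
    nodes = [origin]
    for a, b in lpr_links:
        for node in (a, b):
            if node != nodes[-1]:  # avoid zero-length sub-OD pairs
                nodes.append(node)
    if destination != nodes[-1]:
        nodes.append(destination)
    sub_od_pairs = list(zip(nodes, nodes[1:]))
    return nodes, sub_od_pairs
```

For the example of Figure 4, `decompose_od("o", "d", [("a", "b"), ("p", "q")])` yields the node sequence o-a-b-p-q-d and the five sub-OD pairs o-a, a-b, b-p, p-q, and q-d.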
Since the travel path choices differ depending on vehicle class, the OD pairs and the later path statistics are classified by vehicle class. After the OD pairs containing LPR points have been decomposed, the paths of vehicles with the same class and the same OD are counted. For example, for the OD pair (r, s) with class i, each path $h_{i,j}^{r,s}$ of the multipath set $h_i^{r,s}$ and its probability $P_{i,j}^{r,s}$ can be obtained, as expressed in Equations (3)-(5) below. Moreover, the probability generation results of the OD pair (r, s) with class i are listed in Table 4.
$$h_i^{r,s} = \{h_{i,1}^{r,s}, \ldots, h_{i,m}^{r,s}\}, \quad r \in N,\ s \in N,\ i \in C \tag{3}$$
$$P_{i,j}^{r,s} = \frac{N_{i,j}^{r,s}}{\sum_{j=1}^{m} N_{i,j}^{r,s}} \tag{4}$$
$$\sum_{j=1}^{m} P_{i,j}^{r,s} = 1 \tag{5}$$
Here, N is the set of all nodes on the expressway network; C is the set of all vehicle classes; m is the total path count for the OD pair (r, s) with vehicle class i; $h_i^{r,s}$ is the multipath set of the OD pair (r, s) with vehicle class i, while $N_{i,j}^{r,s}$ is the count of vehicles of class i on path $h_{i,j}^{r,s}$ of the OD pair (r, s).
For OD pairs without internal LPR records, the results of probability generation (i.e., multipath sets and statistical probability) can be used to restore traveler route choices. These OD pairs are grouped into different sets depending on the vehicle class and counted by OD. The route choices of each OD set can then be restored from the multipath sets, such that the count of each restored path is the product of the total count and the corresponding statistical probability. For example, vehicles of class A have 500 counts of OD pair (r, s); the results of multipath restoration are listed in Table 5.
During the traffic assignment, since the OD pairs are decomposed into a detailed path comprising certain sub-OD pairs, each sub-OD pair can be treated as an independent OD pair for processing. Initially, each sub-OD pair is divided into l discrete time intervals according to its entry and exit time.
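Probability generation and multipath restoration for one OD pair and one vehicle class can be sketched in Python as follows. This is a sketch under assumptions: the function names are hypothetical, paths are represented as node tuples, and the probability of a path is taken as its share of the observed path counts. The numbers in the usage note are invented for illustration and are not the values of Tables 4 and 5.

```python
from collections import Counter


def path_probabilities(observed_paths):
    """Statistical probability of each distinct path for one (OD, class) group.

    observed_paths: list of node sequences derived from decomposed OD pairs.
    """
    counts = Counter(tuple(p) for p in observed_paths)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


def restore_paths(total_count, probabilities):
    """Restored count per path = total OD count x statistical probability."""
    return {path: total_count * p for path, p in probabilities.items()}
```

For instance, if three of four observed vehicles used path r-a-s and one used r-b-s, the probabilities are 0.75 and 0.25, and 500 vehicles without LPR records would be restored as 375 and 125 on the two paths. Restored counts can be fractional; whether they are rounded is not stated in the text.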
Next, the l weights, one per time interval, are calculated for each sub-OD pair. Each weight $w_i$ equals the ratio of the travel time that falls within time interval i to the total travel time. For example, consider a vehicle that enters the expressway at 13:30 on 1 January 2018 and exits at 15:30. If l is set to 24, the total duration of this sub-OD pair is 120 minutes; accordingly, $w_{13} = 0.25$, $w_{14} = 0.5$, and $w_{15} = 0.25$, while all other weights are 0.
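The weight calculation can be sketched in Python as follows. This is a minimal sketch, assuming that the day is split into l equal intervals starting at midnight and indexed from 0, and that travel time past midnight is folded onto the same l intervals; the function name is hypothetical.

```python
from datetime import datetime


def interval_weights(entry: datetime, exit_: datetime, l: int = 24):
    """Share of a trip's travel time that falls in each of the l daily intervals."""
    day_start = entry.replace(hour=0, minute=0, second=0, microsecond=0)
    start = (entry - day_start).total_seconds()
    end = (exit_ - day_start).total_seconds()
    total = end - start
    if total <= 0:
        raise ValueError("exit time must be after entry time")

    width = 86400 / l  # seconds per interval
    weights = [0.0] * l
    idx = int(start // width)
    while idx * width < end:
        lo, hi = idx * width, (idx + 1) * width
        overlap = min(end, hi) - max(start, lo)
        weights[idx % l] += overlap / total  # idx % l folds next-day time back
        idx += 1
    return weights
```

Calling `interval_weights(datetime(2018, 1, 1, 13, 30), datetime(2018, 1, 1, 15, 30))` reproduces the example in the text: the weights at indices 13, 14, and 15 are 0.25, 0.5, and 0.25, and all others are 0.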
To find the minimum cost path of each sub-OD pair, the link cost $d_e$ of all links on the expressway needs to be calculated for the time period of the sub-OD pair. The time-dependent link cost $d_e$ is obtained as the weighted sum of the entries of the travel time vector $T_e$, using the weights w of the sub-OD pair. For example, for the sub-OD pair (r, a), the link cost $d_e^{r,a}$ of every link can be calculated using Equation (6) below. The Dijkstra algorithm [6] is then used to find the path with minimum cost, after which all links of the path are assigned the same traffic flow at the same time. Finally, the above steps are repeated for all sub-OD pairs.
$$d_e^{r,a} = \sum_{i=1}^{l} w_i \, T_e^{i} \tag{6}$$
Here, $w_i$ is the time weight of the sub-OD pair (r, a) in time interval i; l is the number of time intervals; $T_e^{i}$ is the travel time of link e in time interval i; finally, $d_e^{r,a}$ is the dynamic link cost of edge e over the time intervals of the sub-OD pair (r, a).
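The link cost update and minimum-cost path search can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes that the dynamic link cost is the weighted sum of the link's travel time vector, that every vector has a value for each interval, and that links are keyed by (from_node, to_node). The function names are hypothetical, and the search is a standard heap-based Dijkstra.

```python
import heapq


def dynamic_link_costs(weights, travel_time_vectors):
    """Weighted sum of each link's travel time vector (assumed form of Eq. (6))."""
    return {
        link: sum(w * t for w, t in zip(weights, vector))
        for link, vector in travel_time_vectors.items()
    }


def shortest_path(costs, source, target):
    """Dijkstra search on a directed graph given as {(u, v): cost}."""
    adjacency = {}
    for (u, v), c in costs.items():
        adjacency.setdefault(u, []).append((v, c))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adjacency.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

In a full assignment loop, the flow of each sub-OD pair would then be added to every link on the returned path, once per sub-OD pair.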
In addition, since the structure of the expressway network is a directed graph, the assigned traffic volumes on the link also have directions (up and down) on the expressway network. Consequently, the total traffic volume of a link is the summation of all traffic volumes assigned in both directions of the link.

Results and Discussion
In order to evaluate the performance of our DQ-DTA model, we apply the proposed approach to the real large-scale expressway network in the Hunan province. Two experiments are conducted to compare the accuracy of different traffic assignment models, in which the proposed DQ-DTA model and four classical STA models (AON, Incremental, STOCH, and UE) are implemented.

Performance Comparison with Classical STA Models
Classical STA models provided by the TransCAD software package were used to assign the traffic flow, specifically AON, Incremental, STOCH, and UE. The assigned traffic flows of these STA models were then compared with the real traffic volumes recorded by the traffic flow observation stations, and the accuracy of the assignment was evaluated using the mean relative error (MRE), derived according to Equation (7):
$$\mathrm{MRE} = \frac{1}{m}\sum_{i=1}^{m}\frac{\left|y_i - y_{ir}\right|}{y_{ir}} \tag{7}$$
where m is the number of real observations, i.e., 38 here; moreover, $y_i$ is the assigned traffic flow of link i, while $y_{ir}$ is its real traffic flow. In this experiment, the number of time intervals l of the DQ-DTA model is provisionally set to 24; the sensitivity to the value of l is analyzed in more detail below. The MRE values for these five models are plotted in Figure 5, from which it can be seen that the DQ-DTA model achieves the highest accuracy, with an MRE of about 0.08. The MREs of the four classical STA models are close to one another (around 0.15); the UE model performs best among them, with an MRE of about 0.143. Thus, the MRE of the proposed DQ-DTA model is about 0.06 to 0.07 lower, i.e., its accuracy is about 6~7% higher than that of these STA models.
In addition, the relative errors (RE) of all the chosen links for these five models are shown in Figure 6a,b. In Figure 6a, the relative errors of the four classical STA models generally fall between 0.04 and 0.2, while their maximum relative error is close to 0.78. By contrast, the relative errors of the DQ-DTA model are mostly between 0.03 and 0.1, indicating that the MRE has been greatly improved. At the same time, the maximum relative error drops from 0.78 to 0.32. In the violin plots in Figure 6b, the relative errors of nearly all the links with the DQ-DTA model fall below 0.1, which indicates that the assignment quality has been greatly improved by the DQ-DTA model.
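The accuracy measure can be sketched in Python as follows. This is a minimal sketch, assuming the MRE is the mean of the absolute relative deviations between assigned and observed link flows; the function name and the sample values are illustrative only.

```python
def mean_relative_error(assigned, observed):
    """Mean of |assigned - observed| / observed over the observed links."""
    if len(assigned) != len(observed) or not assigned:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - yr) / yr for y, yr in zip(assigned, observed)) / len(assigned)
```

For example, assigned flows of 110 and 90 PCU against observed flows of 100 PCU each give relative errors of 0.1 on both links and therefore an MRE of 0.1.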

Ablation Study on the DQ-DTA
The UE model, which achieves the best performance among the classical STA models, is chosen for comparison with the DLC, MSP, and DQ-DTA models in order to analyze the improvement achieved by these approaches. For the DLC, all initial OD pairs are assigned directly along the minimum-cost path, without decomposition of the OD pairs or multipath restoration. Likewise, for the MSP, the dynamic link cost calculation is replaced by calculating the shortest path according to the distance attribute of the expressway network.
The mean relative error and relative error distribution of the assigned results for the UE, DLC, MSP, and DQ-DTA models are plotted in Figure 7 and Figure 8a,b. As the figures show, the DLC, MSP, and DQ-DTA models all improve on the accuracy of the UE model; moreover, the DQ-DTA model, which combines both the DLC and MSP, scores the highest in terms of accuracy.
Two links (379 and 367) are chosen for further improvement analysis. The relative errors on these two links for UE, DLC, MSP, and DQ-DTA are shown in Figure 9. First, MSP substantially reduces the relative error of link 379 from 0.467 to 0.338, while the DLC achieves only a slight improvement. Second, DLC greatly reduces the relative error of link 367 from 0.414 to 0.169; by contrast, MSP achieves only moderate improvement. Finally, the DQ-DTA model obtains the lowest relative error for both links 379 and 367.
To explain the difference in the contributions of DLC and MSP between links 379 and 367, the road network connectivity and traffic congestion distribution of these two links are compared visually in Figure 10. For link 379, the connectivity degrees of the two endpoints are higher in the road topology, which means that there are more route choices through these intersections during the multipath traffic assignment; thus, the MSP improves the assigned result more significantly. At the same time, the traffic conditions of link 379 are less congested than those of link 367, as shown in Figure 10b, so the DLC achieves very little improvement. Likewise, for link 367, the DLC improves the assigned result more significantly because the traffic conditions are much poorer. In addition, although the connectivity degree of link 367's right endpoint is still high in the road topology, the route choices may be much reduced due to poor traffic conditions; consequently, while the MSP model achieves some improvement, it does not improve the assigned result as much as the DLC.
Overall, the ablation study of DLC, MSP, and DQ-DTA indicates that the DLC and MSP models improve the accuracy relative to the classical STA models to varying extents, while the DQ-DTA model combines the strengths of both the DLC and MSP models to obtain the most accurate results.

Sensitivity Analysis
The sensitivity analysis examines how the MRE of the assignment models changes with the time interval size used in the DQ-DTA model. Six time granularities are chosen for the experiments: 1/4, 1/2, 1, 2, 3, and 6 hours. The sensitivity analysis results are shown in Figure 11.
The mean relative errors of the DLC and DQ-DTA models grow roughly in proportion to the time interval size, which is consistent with the time-varying nature of DTA. As l (the number of time intervals per day) increases, the MRE of the assigned results of DLC and DQ-DTA decreases rapidly while l is less than 24, then decreases more slowly once l exceeds 24. It can accordingly be concluded that l can be set to 24 (i.e., one hour per time interval), which is appropriate for the actual large-scale expressway network. Furthermore, this value strikes a balance between computation time and accuracy.
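The relationship between the tested interval sizes and l can be sketched as follows. This is a minimal illustration that assumes l denotes the number of equal-length intervals per day, as implied by l = 24 corresponding to one hour per interval; the function names are hypothetical and are not taken from the DQ-DTA implementation.

```python
HOURS_PER_DAY = 24
SECONDS_PER_DAY = 86400

# Interval sizes (in hours) tested in the sensitivity analysis.
interval_hours = [0.25, 0.5, 1, 2, 3, 6]


def intervals_per_day(size_hours: float) -> int:
    """Number of equal-length time intervals l that cover one day."""
    l = HOURS_PER_DAY / size_hours
    if l != int(l):
        raise ValueError("interval size must divide 24 hours evenly")
    return int(l)


def interval_index(seconds_since_midnight: float, l: int) -> int:
    """0-based index of the interval that contains a given time of day."""
    if not 0 <= seconds_since_midnight < SECONDS_PER_DAY:
        raise ValueError("time of day must lie within [0, 86400) seconds")
    return int(seconds_since_midnight * l // SECONDS_PER_DAY)


l_values = [intervals_per_day(h) for h in interval_hours]
print(l_values)  # [96, 48, 24, 12, 8, 4]

# A record at 08:30 falls into interval 8 when l = 24 (hourly intervals).
print(interval_index(8.5 * 3600, 24))  # 8
```

With this mapping, the recommended setting l = 24 sits in the middle of the tested range: finer granularities (l = 48 or 96) multiply the number of assignment periods, while coarser ones (l = 12, 8, or 4) average out more of the time-varying traffic conditions.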

Conclusions
In summary, a data-driven quasi-dynamic traffic assignment (DQ-DTA) model is proposed that integrates multi-source traffic sensor data for a large-scale expressway network. The expressway network, real-time toll records, LPR data, traffic flow observation station data, and GPS trajectory data are combined to accurately obtain the traffic demand, waypoints, real traffic volumes, and time-varying traffic conditions. Furthermore, a dynamic link cost calculation method (DLC) based on GPS trajectory data is designed to replace the BPR function for calculating the link cost. Moreover, by combining the OD pairs and LPR data, a multipath assignment method based on statistical probability (MSP) is proposed to help precisely restore users' path choices. The results of the DQ-DTA model are verified using real link flow data from traffic flow observation stations; according to our experiments, the mean relative error is improved by nearly 6% compared to classical STA models. Furthermore, the model has low computing resource requirements, and its computational efficiency remains close to that of the STA models because it requires no iterative calculation.
However, there are still some limitations of the DQ-DTA model that will be addressed in the near future.

1. Our proposed approach considers only the time cost of traffic congestion during the link cost update. Other factors influencing travel costs, e.g., toll fees [7], weather conditions, and differences between weekdays and weekends, should also be considered later.
2. This model is currently applied only to closed expressway networks. We will further apply our model to urban road networks, in which the openness of the road topology and signal light control [46,47] would increase the difficulty of both travel demand estimation and travel-time cost calculation. In addition, the complexity of the road topological structure will also increase the computational complexity of traffic assignment models.
3. With the development of the Internet of Things, other complex data-driven methods (e.g., deep learning) can be explored to address traffic assignment problems.