Fast L2 Calibration for Inexact Highway Traffic Flow Systems

Jingru Huang; Yan Wang; Mei Han

doi:10.3390/electronics11223710

¹ School of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing 100124, China

² College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

\* Author to whom correspondence should be addressed.

Electronics 2022, 11(22), 3710; https://doi.org/10.3390/electronics11223710

This article belongs to the Special Issue Next-Generation Sensing, Computing, and System Engineering for Large-Scale Connected and Automated Vehicle and Transportation: Celebrating the 70th Anniversary of Nanjing University of Aeronautics and Astronautics

Abstract

Transportation systems need more accurate predictions to further optimize traffic network design as autonomous driving technology develops and spreads. In this article, we focus on highway traffic flow systems, which are often simulated by the modified Greenshields model. However, this model cannot perfectly match the true traffic flow due to its underlying simplifications and assumptions, implying that it is inexact. Specifically, some parameters affect the simulation accuracy of the modified Greenshields model, and tuning these parameters to improve the model’s accuracy is called model calibration. The parameters obtained using the $L_{2}$ calibration have the advantages of high accuracy and small variance for an inexact model. However, the method is computationally intensive, requiring optimization of an integral loss function. Since traffic flow data are often massive, this paper proposes a fast $L_{2}$ calibration framework to calibrate the modified Greenshields model. Specifically, the suggested method selects a sub-design containing more information on the calibration parameters, and then the empirical loss function obtained from the optimal sub-design is utilized to approximate the integral loss function. A case study highlights that the proposed method preserves the advantages of $L_{2}$ calibration and significantly reduces the running time.

Keywords: traffic flow system; modified Greenshields model; sequential sub-design; $L_{2}$ calibration; uncertainty quantification

1. Introduction

To keep up with rapidly growing travel demand, urban traffic management systems must be continuously updated. Connected and automated vehicles (CAVs) are considered a highly promising technology and the future direction of the global automotive industry. CAVs need more accurate traffic dynamics at the network level to secure transport infrastructure and to prevent traffic congestion [1]. Thus, it is necessary to analyze the characteristics of spatio-temporal travel patterns more extensively for traffic flow analysis at the network level. Given that the dynamic traffic assignment (DTA) system is often utilized to simulate real traffic flows [2,3], the DTA’s dual-regime modified Greenshields traffic flow model can be employed to simulate highway traffic based on previous experience [4]; in the computer experiments literature, such a model is called a computer model or deterministic simulator. The model can be expressed as a piecewise function:

$$v_{l} = \begin{cases} u_{f}, & 0 < k_{l} < k_{bp}, \\ v_{0} + (v_{f} - v_{0}) \left(1 - \frac{k_{l}}{k_{jam}}\right)^{\alpha}, & k_{bp} < k_{l} < k_{jam}, \end{cases} \tag{1}$$

where $v_{l}$ is the speed on link $l$, on which we are focusing, $k_{l}$ is the density on link $l$, indirectly determined by the ratio of flow rate to speed, i.e., $k_{l} = f_{l} / v_{l}$, and $f_{l}$ denotes the total carriageway flow on link $l$. The density is the input variable of interest, referred to as the design in computer experiments. Moreover, $u_{f}$, $v_{0}$, and $v_{f}$ are the free-flow speed, minimum speed, and intercept speed on link $l$, respectively, $k_{bp}$ and $k_{jam}$ are the breakpoint density and the jam density on link $l$, and $\alpha$ is a shape parameter. Figure 1 illustrates the modified Greenshields model.

Figure 1. The dual-regime modified Greenshields traffic flow model of DTA.

In practice, vehicle speeds differ over time even when the highway is not jammed, i.e., $v_{0}$ varies. Additionally, the speed changes with the weather and terrain, and the dual-regime modified Greenshields model rests on idealized assumptions and simplifications compared with the real traffic flow system [5,6,7]. In other words, the modified Greenshields model is inexact.
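As an illustration, the piecewise speed function in (1) can be sketched in Python as follows. This is a minimal sketch rather than the DTA implementation: the names `GreenshieldsParams` and `greenshields_speed` are hypothetical, and the treatment of the boundary points, which the strict inequalities in (1) leave open, is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreenshieldsParams:
    """Calibration parameters of the dual-regime model in (1)."""
    k_bp: float   # breakpoint density
    u_f: float    # free-flow speed
    v_f: float    # intercept speed
    alpha: float  # shape parameter
    v_0: float    # minimum speed
    k_jam: float  # jam density


def greenshields_speed(k: float, p: GreenshieldsParams) -> float:
    """Speed predicted by the dual-regime modified Greenshields model."""
    if k < 0:
        raise ValueError("density must be non-negative")
    # Assumption: the breakpoint itself is assigned to the free-flow regime.
    if k <= p.k_bp:
        return p.u_f
    # Assumption: at or beyond the jam density the speed is the minimum speed.
    if k >= p.k_jam:
        return p.v_0
    return p.v_0 + (p.v_f - p.v_0) * (1.0 - k / p.k_jam) ** p.alpha
```

For example, with the made-up values `k_bp=20`, `u_f=110`, `v_f=130`, `alpha=2`, `v_0=10`, and `k_jam=150`, a density of 75 lies in the congested regime and gives 10 + 120 × 0.5² = 40.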

Let $\theta = (k_{bp}, u_{f}, v_{f}, \alpha, v_{0}, k_{jam})^{T}$ be unknown or unobservable in the traffic flow system, with $\theta$ typically affecting the reliability and credibility of the simulator’s outputs. The process of adjusting these parameters using real traffic flow data is called the calibration of computer models, and these parameters are the calibration parameters [8]. For further details on computer model calibration, the reader is referred to [9,10]. Many studies have sought accurate and rapid estimates of the calibration parameters, with much of the effort focused on obtaining a consistent estimator, which is then applied to the traffic flow framework. Current methods include the KO calibration [9], the least squares (LS) calibration [11,12,13], the weighted least squares (WLS) method [14], and optimization-based model calibration [3,15].

In [16,17], the authors defined the “true values” of the calibration parameters by minimizing the distance between the computer model and the physical system. A follow-up work, [16], proposed the $L_{2}$ calibration method, which is one of the most widely used in practice [18,19]. This method is proven to have good statistical properties, including high accuracy and small variance, making it an appealing calibration solution that requires only small sample sizes. A brief review of this method is presented in Section 2. However, for the large-scale traffic flow frameworks encountered in practice, the $L_{2}$ calibration procedure involves complex integration operations that are computationally intensive. Therefore, to apply the $L_{2}$ calibration framework to the traffic flow model, we develop a fast $L_{2}$ calibration framework to obtain the estimates and variances of the calibration parameters from real data. Our contributions can be summarized as follows:

We propose a fast $L_{2}$ calibration framework to estimate the calibration parameters. The suggested method finds the optimal sub-design containing more information about the calibration parameters. Then, the empirical $L_{2}$ loss function constructed from the sub-design is used to approximate the integral $L_{2}$ loss function.
We develop the algorithm to generate sequential optimal sub-designs based on the proposed criterion.
A bootstrap method is adopted to quantify the uncertainty of the calibration parameters.

The remainder of this article is organized as follows. Section 2 briefly reviews the $L_{2}$ calibration approach proposed in [16]. Then, this section introduces the proposed fast $L_{2}$ framework utilizing a sub-design criterion and empirical loss function. The algorithm generating the sequential optimal sub-design and the bootstrap method to quantify the uncertainty are also provided. Section 3 applies the proposed method to the traffic flow model of the M25 motorway in London, and finally, Section 4 concludes this work and discusses the findings.

2. Optimal Sub-Design for the $L_{2}$ Calibration

In Section 2.1, we review the $L_{2}$ calibration, and in Section 2.2, we explain how to develop the experimental sub-designs and how to quickly estimate the calibration parameters of the dual-regime modified Greenshields model. The algorithm for generating sequentially optimal designs is provided in Section 2.3, and finally, Section 2.4 suggests a bootstrap method to quantify the uncertainty.

2.1. A Review of the $L_{2}$ Calibration

We assume that the recorded traffic flow data $V = (v_{1}^{t}, \dots, v_{n}^{t})^{T}$ are collected at the design points $K = (k_{1}, \dots, k_{n})^{T}$, where $k_{i} \in K$ is a design value of the density. Suppose that $\zeta(\cdot)$ is the real traffic flow system, which is unknown. Since measurement error always exists, the data can be written as:

$$v_{i}^{t} = \zeta(k_{i}) + \varepsilon_{i}, \quad i = 1, \dots, n, \tag{2}$$

where the $\varepsilon_{i}$’s are independent and identically distributed random variables with zero mean and finite variance $\tau^{2} > 0$.

Since we are concerned with the traffic flows on only one link, the subscript $l$ in the notation of the dual-regime modified Greenshields model (1) is dropped hereafter. Let $v(k, \theta)$ be the output of the dual-regime modified Greenshields model, where $k \in K$ indicates the density and $\theta = (k_{bp}, u_{f}, v_{f}, \alpha, v_{0}, k_{jam})^{T} \in \Theta$ is the vector of calibration parameters. The calibration process aims to find the estimates of $\theta$ so that the modified Greenshields model outputs are as close as possible to the recorded data. Since the modified Greenshields model is inexact, there is a “distance” between $\zeta(\cdot)$ and $v(\cdot, \cdot)$, called the discrepancy function. Therefore, the relationship between the simulation model and the discrepancy function can be established as follows:

$$\zeta(\cdot) = v(\cdot, \theta^{\star}) + \delta(\cdot), \tag{3}$$

where $\delta(\cdot)$ is the model discrepancy, which is an unknown function. In this article, we consider the observation error and model uncertainty during the calibration and prediction process of the traffic flow simulation model. Moreover, $\theta^{\star} \in \Theta$ is the “true value” or the optimal calibration parameter, defined as [17]:

$$\theta^{\star} = \underset{\theta \in \Theta}{\arg\min} \int_{K} \left(\zeta(k) - v(k, \theta)\right)^{2} dk. \tag{4}$$

This loss function is called the $L_{2}$ loss function. In [16], the authors proposed the $L_{2}$ calibration method, in which the calibration parameter estimate is defined as:

$$\hat{\theta}^{L_{2}} = \underset{\theta \in \Theta}{\arg\min} \int_{K} \left(\hat{\zeta}(k) - v(k, \theta)\right)^{2} dk, \tag{5}$$

where the objective function is denoted as the $L_{L_{2}}$ loss function and $\hat{\zeta}(\cdot)$ is a nonparametric estimator of the traffic flow system fitted to the recorded data. Frequently used estimators include Gaussian process models [20,21], kernel ridge regression [16], and smoothing spline regression [22].
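To make the cost of (5) concrete, the following minimal Python sketch evaluates the integral loss on a one-dimensional density interval with the trapezoidal rule and picks the best parameter from a finite candidate list. It is an illustration under stated assumptions, not the procedure of [16]: the function names are hypothetical, the surrogate `zeta_hat` and the model are passed in as plain callables, and the candidate-list search merely stands in for a continuous optimizer.

```python
def l2_loss(zeta_hat, model, theta, k_min, k_max, n_intervals=1000):
    """Trapezoidal approximation of the integral of
    (zeta_hat(k) - model(k, theta))**2 over [k_min, k_max]."""
    h = (k_max - k_min) / n_intervals
    total = 0.0
    for i in range(n_intervals + 1):
        k = k_min + i * h
        squared_gap = (zeta_hat(k) - model(k, theta)) ** 2
        # End points carry half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, n_intervals) else 1.0
        total += weight * squared_gap
    return total * h


def l2_calibrate(zeta_hat, model, theta_candidates, k_min, k_max):
    """Return the candidate with the smallest integral loss.
    Assumption: a finite candidate list replaces a continuous optimizer."""
    return min(
        theta_candidates,
        key=lambda theta: l2_loss(zeta_hat, model, theta, k_min, k_max),
    )
```

Every loss evaluation visits the whole integration grid, which is why repeating it at each optimizer step becomes expensive for large problems.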

2.2. Optimal Sub-Design Criterion

$L_{2}$ calibration requires optimizing functions that contain integrals. When the gradient descent algorithm is used, we must calculate the gradient and the Hessian matrix of the $L_{2}$ loss function to estimate the calibration parameters. These calculations involve complex integration operations because the integration needs to be recomputed at each update step. Thus, (4) poses a very challenging optimization process, especially for large-scale network systems such as traffic flow. To overcome this concern, the MCMC method is used to approximate the integral in the $L_{L_{2}}$ loss. Specifically, a discrete set $\{\xi_{1}, \dots, \xi_{M}\}$ is generated from the design region $K$, and the approximate loss is obtained as follows:

$$\hat{L}_{L_{2}}(\theta) = \frac{1}{M} \sum_{i = 1}^{M} \left(\hat{\zeta}(\xi_{i}) - v(\xi_{i}, \theta)\right)^{2}. \tag{6}$$

The minimizer of $\hat{L}_{L_{2}}$ within $\Theta$ is denoted $\hat{\theta}$, which can be made arbitrarily close to the minimizer of $L_{L_{2}}$. However, evaluating the approximate loss function imposes a significant computational burden, since $M$ is a large number. Therefore, we aim to design efficient samples to adjust the calibration parameters accurately under a certain criterion. In other words, we need to search for the optimal sub-design in the design region so that $\hat{\theta}$ is as close as possible to $\theta^{\star}$.
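The approximation in (6) is simply an average of squared gaps over a finite point set, as the short Python sketch below shows. The function names are hypothetical, and plain uniform random draws are used here as a stand-in for whatever sampling scheme generates $\{\xi_{1}, \dots, \xi_{M}\}$; that choice is an assumption made for illustration.

```python
import random


def sample_design_region(k_min, k_max, m, seed=0):
    """Draw m points from [k_min, k_max].
    Assumption: uniform draws stand in for the paper's sampling scheme."""
    rng = random.Random(seed)
    return [rng.uniform(k_min, k_max) for _ in range(m)]


def approximate_l2_loss(points, zeta_hat, model, theta):
    """Average squared gap between surrogate and model over the points, as in (6)."""
    return sum((zeta_hat(x) - model(x, theta)) ** 2 for x in points) / len(points)
```

With uniform draws, the average differs from the integral only by the constant length of the interval, so both are minimized by the same parameter value. The cost of one evaluation grows linearly in the number of points, which motivates replacing a large point set by a small, well-chosen sub-design.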

Our motivation is derived from the least trimmed squares (LTS) [23] concept, which uses only a portion of the samples, selected by sorting the absolute values of the residuals. The proposed approach employs a similar idea to select the design points with a large discrepancy over the region $K$, affording more robust calibration parameter estimates. Thus, the optimal sub-design can be obtained by maximizing the discrepancy function for a given value of $\theta$. Suppose that we have a design of $N$ runs to estimate the calibration parameters efficiently. The nonparametric approximation of the highway traffic flow system is estimated from the recorded flow data, which allows the system to be treated as a known model. Let the optimal sequential design be $k^{\star} = \{k_{1}^{\star}, \dots, k_{N}^{\star}\} \in K^{N}$; then the first optimality criterion is:

$$k^{\star} = \underset{k \in K^{N}}{\arg\max} \left\| \hat{\zeta}(k) - v(k, \theta^{\star}) \right\|, \tag{7}$$

where $\| \cdot \|$ denotes the Euclidean distance. Additionally, the optimal sub-design is expected to contain more information about the calibration parameters, a concept that is commonly used in experimental design [20,24,25]. Specifically, placing as many points as possible where the information about $\theta$ is large affords robust and accurate estimates. Furthermore, based on the information maximization criterion, [24] suggested that the Fisher information matrix (FIM) of $\theta$ is obtained by:

$$I(k, \theta) = \sum_{i = 1}^{N} \nabla v(k_{i}, \theta) \nabla v(k_{i}, \theta)^{T}, \tag{8}$$

where $\nabla v(k_{i}, \theta) = \left(\frac{\partial v(k_{i}, \theta)}{\partial k_{bp}}, \dots, \frac{\partial v(k_{i}, \theta)}{\partial k_{jam}}\right)^{T}$. The FIM is inversely related to the variance of the calibration parameters, so it is natural to place design points where a large amount of information exists. Thus, the second criterion involves maximizing the determinant of $I(k, \theta)$ for a given $\theta$, which has been proven to be the approximate locally D-optimal design [24]. The optimal FIM criterion is:

$$k^{\star} = \underset{k \in K^{N}}{\arg\max} \left| I(k, \theta^{\star}) \right|, \tag{9}$$

where $|A|$ is the determinant of matrix $A$. By considering the above two objectives and aiming to obtain robust and accurate estimates, the design criterion becomes:

$$k^{\star} = \underset{k \in K^{N}}{\arg\max} \left\{ \left\| \hat{\zeta}(k) - v(k, \theta^{\star}) \right\| + \lambda \left| I(k, \theta^{\star}) \right| \right\}, \tag{10}$$

where $\lambda > 0$ is a hyperparameter, selected as described in Section 2.3.
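The combined criterion in (10) can be evaluated for a given design as in the following Python sketch. It is illustrative only: all function names are hypothetical, the gradient in (8) is approximated by central finite differences rather than derived analytically, and the determinant is computed with a small Gaussian-elimination routine so that the example needs no external libraries.

```python
import math


def gradient(model, k, theta, h=1e-6):
    """Central finite-difference gradient of model(k, theta) with respect to theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grad.append((model(k, up) - model(k, down)) / (2.0 * h))
    return grad


def fisher_information(design, model, theta):
    """Sum of gradient outer products over the design points, as in (8)."""
    q = len(theta)
    info = [[0.0] * q for _ in range(q)]
    for k in design:
        g = gradient(model, k, theta)
        for a in range(q):
            for b in range(q):
                info[a][b] += g[a] * g[b]
    return info


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det


def design_criterion(design, zeta_hat, model, theta, lam):
    """Euclidean norm of the discrepancy plus lam times det(FIM), as in (10)."""
    gap = math.sqrt(sum((zeta_hat(k) - model(k, theta)) ** 2 for k in design))
    return gap + lam * determinant(fisher_information(design, model, theta))
```

For a toy linear model `theta[0] + theta[1] * k` on the design `[0, 1, 2]`, the information matrix is `[[3, 3], [3, 5]]` with determinant 6. Note that a parameter to which the model output is insensitive on the design produces a zero row and hence a zero determinant.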

2.3. Algorithm for Generating a Sequential Optimal Sub-Design

Since $\theta^{\star}$ is unknown in (10), $k$ and $\theta$ must be optimized simultaneously, and the most common solution is to update $k$ and $\theta$ iteratively using sequential methods. First, assume that $D_{0} = \{k_{1}, \dots, k_{n_{0}}\}$ is the initial design selected using space-filling methods and that the current density set is $D_{i} = \{k_{1}, \dots, k_{i}\}$. Then $\theta$ is estimated by optimizing the empirical $L_{2}$ loss function on $D_{i}$:

$$\hat{\theta}_{i} = \underset{\theta \in \Theta}{\arg\min} L_{f}(D_{i}, \theta), \tag{11}$$

where $L_{f}(D_{i}, \theta) = \frac{1}{i} \sum_{r = 1}^{i} \left(\hat{\zeta}(k_{r}) - v(k_{r}, \theta)\right)^{2}$ is the empirical $L_{2}$ loss function. Additionally, by replacing $\theta^{\star}$ in (10) with $\hat{\theta}_{i}$, holding $D_{i}$ fixed, and optimizing, we obtain $k_{i + 1} = \underset{k \in K}{\arg\max} \left\{ | \hat{\zeta}(k) - v(k, \hat{\theta}_{i}) | + \lambda | I(k, \hat{\theta}_{i}) | \right\}$. It is widely believed that the design points should be evenly spread in the experimental space to achieve a comprehensive exploration. Thus, we use the grid search method to find the optimal sub-design, which avoids clustering many design points in one neighborhood; space-filling designs are typically used in grid search methods to generate lattice points that are robust to the modeling choices. In this article, we illustrate the method using the maximin Latin hypercube design (maximin LHD) [20,26], although the experimenter remains free to choose another design. Let the candidate points generated using the maximin LHD be $K^{c} = \{k_{1}^{c}, \dots, k_{M}^{c}\}$; the optimal density is generated by the following equation according to the sequential criterion:

$$k_{i + 1} = \underset{k \in K^{c}}{\arg\max} \left\{ | \hat{\zeta}(k) - v(k, \hat{\theta}_{i}) | + \lambda | I(k, \hat{\theta}_{i}) | \right\}. \tag{12}$$

This article uses the grid search method to select $\lambda$ dynamically. Assuming that the initial alternative values of $\lambda$ are $\lambda_{1}, \dots, \lambda_{t}$, each alternative is substituted into (12) to obtain $t$ optimal sub-designs. Let the optimal sub-designs obtained at the $i$th step under the different hyperparameters be $D_{i1}, \dots, D_{it}$. Applying them to the $L_{f}$ loss, the hyperparameter is selected by minimizing:

$$\lambda_{i}^{\star} = \underset{j \in \{1, \dots, t\}}{\arg\min} L_{f}(D_{ij}, \hat{\theta}_{i}), \tag{13}$$

where $L_{f}(D_{ij}, \hat{\theta}_{i}) = \frac{1}{i} \sum_{r = 1}^{i} \left(\hat{\zeta}(k_{rj}) - v(k_{rj}, \hat{\theta}_{i})\right)^{2}$. Since $\lambda$ is reselected at each sequential step, we call this dynamic hyperparameter selection.

Finally, Algorithm 1 summarizes the proposed method for generating the optimal sub-design.

Algorithm 1: Generating the sequential optimal sub-design for the $L_{2}$ calibration.

Input: Initial design $D_{0} = \{k_{1}, \dots, k_{n_{0}}\}$, traffic flow data $K = (k_{1}, \dots, k_{n})^{T}$ and $V = (v_{1}^{t}, \dots, v_{n}^{t})^{T}$, candidate design set $K^{c}$, alternative hyperparameters $\{\lambda_{1}, \dots, \lambda_{t}\}$, number of sequential additional points $m$.

Initialize: $\hat{\zeta}(\cdot)$ is fitted based on $K$ and $V$; $\hat{\theta}_{0} = \underset{\theta \in \Theta}{\arg\min} L_{f}(D_{0}, \theta)$ as in (11).

for $i = 1$ to $m$ do

for $j = 1$ to $t$ do

$k_{ij} \leftarrow \underset{k \in K^{c}}{\arg\max} \left\{ | \hat{\zeta}(k) - v(k, \hat{\theta}_{i - 1}) | + \lambda_{j} | I(k, \hat{\theta}_{i - 1}) | \right\}$,

$D_{ij} \leftarrow D_{i - 1} \cup \{k_{ij}\}$.

end for

$l^{\star} \leftarrow \underset{l \in \{1, \dots, t\}}{\arg\min} L_{f}(D_{il}, \hat{\theta}_{i - 1})$, $\lambda_{i}^{\star} \leftarrow \lambda_{l^{\star}}$,

$k_{i}^{\star} \leftarrow k_{i l^{\star}}$,

$K^{c} \leftarrow K^{c} \setminus \{k_{i}^{\star}\}$,

$D_{i} \leftarrow D_{i - 1} \cup \{k_{i}^{\star}\}$,

$\hat{\theta}_{i} \leftarrow \underset{\theta \in \Theta}{\arg\min} L_{f}(D_{i}, \theta)$.

end for

Output: Optimal sub-design $D_{m}$ and calibration parameter estimate $\hat{\theta}_{m}$.
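The loop structure of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: all names are hypothetical; the minimization over $\Theta$ is replaced by a search over a finite list `theta_grid`; the information term is supplied by the caller as `info_det(design, theta)` and is evaluated on the current design augmented with the candidate point, which is one possible reading of $| I(k, \hat{\theta}) |$; and ties are broken in favor of the first candidate.

```python
def empirical_loss(design, zeta_hat, model, theta):
    """Empirical L2 loss L_f(D, theta): mean squared gap over the design."""
    return sum((zeta_hat(k) - model(k, theta)) ** 2 for k in design) / len(design)


def fit_theta(design, zeta_hat, model, theta_grid):
    """Assumption: a finite grid of candidate parameters replaces the optimizer."""
    return min(theta_grid, key=lambda th: empirical_loss(design, zeta_hat, model, th))


def sequential_subdesign(d0, candidates, lambdas, m,
                         zeta_hat, model, theta_grid, info_det):
    """Add up to m points to the initial design d0, reselecting lambda each step."""
    design = list(d0)
    pool = [k for k in candidates if k not in design]
    theta = fit_theta(design, zeta_hat, model, theta_grid)
    for _ in range(m):
        if not pool:
            break
        best = None  # (loss, point) of the best lambda alternative so far
        for lam in lambdas:
            # Candidate maximizing discrepancy + lam * information term.
            k_new = max(
                pool,
                key=lambda k: abs(zeta_hat(k) - model(k, theta))
                + lam * info_det(design + [k], theta),
            )
            # Dynamic selection: keep the alternative with the smallest loss.
            loss = empirical_loss(design + [k_new], zeta_hat, model, theta)
            if best is None or loss < best[0]:
                best = (loss, k_new)
        design.append(best[1])
        pool.remove(best[1])
        theta = fit_theta(design, zeta_hat, model, theta_grid)
    return design, theta
```

A caller would pass the fitted surrogate as `zeta_hat`, the simulator as `model`, and a function returning the determinant of the information matrix as `info_det`. Each iteration costs one pass over the candidate pool per hyperparameter alternative plus one refit of the parameters.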

2.4. Uncertainty Quantification of the Calibration Parameters

In practice, we aim not only to obtain the point estimates of the calibration parameters, but also to obtain the variance of the parameter estimates to quantify uncertainty. Since the dual-regime modified Greenshields model is deterministic, the model’s uncertainty originates from $\hat{\theta}$. Among frequentist methods, the bootstrap has been widely used to calculate the variance and confidence intervals of parameters [27]. The data are resampled $T$ times, and the calibration parameters are re-estimated each time using the proposed method. The specific steps are presented below:

Step 1: $K' = (k_{1}', \dots, k_{n}')^{T}$ and the corresponding $V' = (v_{1}^{t\prime}, \dots, v_{n}^{t\prime})^{T}$ are obtained by sampling with replacement from the real traffic flow data $K$ and $V$.
Step 2: The surrogate model $\hat{\zeta}'(\cdot)$ of the traffic flow system is estimated according to $K'$ and $V'$.
Step 3: The calibration parameters are estimated according to Algorithm 1.
Step 4: Repeat the above steps $T$ times to obtain $\{\hat{\theta}_{1}, \dots, \hat{\theta}_{T}\}$, and compute their variance and empirical confidence interval.
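The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, Steps 2 and 3 (refitting the surrogate and running Algorithm 1) are abstracted into a user-supplied callable `estimate(ks, vs)` that returns a single scalar parameter, and the confidence interval uses a simple nearest-rank percentile rule.

```python
import random
import statistics


def bootstrap_estimates(ks, vs, estimate, n_boot, seed=0):
    """Steps 1 to 4: resample (k, v) pairs with replacement and re-estimate."""
    rng = random.Random(seed)
    n = len(ks)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        k_star = [ks[i] for i in idx]
        v_star = [vs[i] for i in idx]
        # Assumption: estimate() wraps the surrogate fit and Algorithm 1.
        estimates.append(estimate(k_star, v_star))
    return estimates


def summarize(estimates, level=0.95):
    """Sample variance and nearest-rank percentile interval of the estimates."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    lower = ordered[int((1.0 - level) / 2.0 * last)]
    upper = ordered[int(round((1.0 + level) / 2.0 * last))]
    return statistics.variance(ordered), (lower, upper)
```

For a vector of calibration parameters, the same summary would be applied to each component separately.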

3. Case Study

This section investigates the performance of the proposed method (abbreviated as Fast-$L_{2}$ calibration) on the traffic flow system of the M25 motorway in London. The M25, or London Orbital motorway, is a circular highway around London; we select it for our study because it is the busiest motorway in the UK and its congestion is relatively severe [28]. Section 3.1 presents the sources and description of the traffic flow data, Section 3.2 introduces the settings of various calibration methods, and Section 3.3 presents the corresponding calibration results.

3.1. Data Source of the Traffic Flow Model

The primary source of traffic data is loop detectors installed in the highway lanes, and such data are available from several web-based data archiving systems. This work utilizes real and simulated data downloaded from http://tris.highwaysengland.co.uk/detail/trafficflowdata (accessed on 24 September 2022), which contains historical traffic data at 15 min aggregation intervals on the M25 motorway in London from 1 to 5 June 2021. Figure 2 illustrates the distribution of the selected loop detector locations in the study area, and Figure 3 depicts the scatterplot of the recorded traffic flow data. Figure 3 highlights that when the density is relatively small, the vehicle speed remains around 110 km/h. As the density increases, traffic jams occur, and the speed gradually decreases to the minimum value.

Figure 2. Maps of the selected detector locations on the M25 motorway in London, where the big red pin indicates the detector.

Figure 3. Speed vs. density scatterplot from 1 to 5 June 2021.

In [6,29], the authors used the modified Greenshields model to calibrate highways in the United States. Based on their experience and scatter plots, the value regions for the six calibration parameters are reported in Table 1.

Table 1. Value regions for the calibration parameters.

3.2. The Settings of the Calibration Methods

To verify our proposed method’s performance, we compare it against the $L_{2}$, KO [9], and LS [11] calibration methods. However, due to the computational burden of the $L_{2}$ calibration, in the comparisons we use the projected $L_{2}$ calibration (Proj-$L_{2}$) [30] variant, which is the $L_{2}$ calibration method under a Bayesian framework. Given that the true calibration parameter values cannot be calculated in this case study, to evaluate the performance of the calibration methods, we first use the relative prediction discrepancy (RPD) as the statistical criterion to compare the different approaches. The RPD measures the prediction accuracy of the calibrated computer model and is defined as follows:

$$\mathrm{RPD} = \frac{1}{M} \sum_{i = 1}^{M} \left\{ \frac{1}{n_{test}} \sum_{j = 1}^{n_{test}} \left| \frac{v_{j}^{t} - v(k_{j}, \hat{\theta}_{ij})}{v_{j}^{t}} \right| \right\}, \tag{14}$$

where $M = 50$ is the number of repetitions and $| \cdot |$ is the absolute value. $\{k_{1}, \dots, k_{n_{test}}\}$ and $\{v_{1}^{t}, \dots, v_{n_{test}}^{t}\}$ are the testing sets, with $n_{test}$ being the sample size.
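The RPD in (14) is a mean absolute relative error averaged over repetitions, as the following Python sketch illustrates. The function name and the input layout (one list of observed and predicted speed pairs per repetition) are assumptions made for this example.

```python
def relative_prediction_discrepancy(repetitions):
    """RPD as in (14): for each repetition, average |obs - pred| / |obs|
    over the test set, then average over repetitions.

    `repetitions` is a list with one entry per repetition; each entry is a
    list of (observed_speed, predicted_speed) pairs with observed_speed != 0.
    """
    per_repetition = []
    for pairs in repetitions:
        errors = [abs((obs - pred) / obs) for obs, pred in pairs]
        per_repetition.append(sum(errors) / len(errors))
    return sum(per_repetition) / len(per_repetition)
```

For instance, one repetition with pairs (100, 90) and (50, 55) has a mean relative error of 0.1; combined with a second repetition that predicts perfectly, the RPD is 0.05.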

The initial design has the same sample size in every replication, but its points are redrawn at each replication to calculate the RPD. For the proposed method, the initial design size is set to $n_{0} = 2q$ and is obtained using the maximin LHD method from $K$, where $q$ is the dimension of the calibration parameters. The number of additional sequential points, obtained by Algorithm 1, is $m = 5q$. For a fair comparison, the sample size is set to $N = 7q$ for the KO, LS, and Proj-$L_{2}$ calibration methods, which is the same as the total sample size after adding points for the proposed method. $n_{test}$ is set to $n_{0}$, and the testing data are selected randomly from the real traffic flow data. For the Fast-$L_{2}$ calibration, we use the scaled Gaussian process [21] to estimate the real traffic flow system, and the RobustGaSP package [31] in R is employed to build the scaled Gaussian process model. The variance and running time are also used as guidelines for comparing the performances of different calibration methods. Since the Fast-$L_{2}$ and LS are frequentist methods, we use 500 bootstrap samples to measure their variances and running times. For the KO and Proj-$L_{2}$ calibrations, the prior density of $\theta$ is set as an uninformative prior. Additionally, the $r(\cdot, \cdot)$ in the benchmark methods is set to the Matérn kernel function with smoothness parameter $\nu = 5/2$, and the scaling parameter $\psi$ is fixed to $1/2$. The variances of KO and Proj-$L_{2}$ are calculated using posterior samples of the calibration parameters.

3.3. The Results

Table 2 reports the RPD, the mean standard deviation (mSD) of $\hat{\theta}$, and the runtime, which are used to compare the prediction discrepancy and computational efficiency of the four methods.

Table 2. Summary statistics of $\hat{\theta}$ for different calibration methods.

Because of its idealized assumptions and simplifications, the modified Greenshields model is inexact, so the RPDs of all four calibration methods are relatively large. Among the four methods, the RPD of the Fast-$L_{2}$ calibration method is the smallest, affording the best prediction accuracy. According to (14), the RPD of LS should in theory be smaller than that of Fast-$L_{2}$; however, because Fast-$L_{2}$ chooses more efficient sample points according to the proposed optimality criteria, its RPD is smaller than that of LS. The RPD of the KO calibration is larger than those of Fast-$L_{2}$ and LS because KO does not converge to the true value when the discrepancy function exists [17]. The RPD and mSD of the Proj-$L_{2}$ calibration are the largest due to the inaccurate Gaussian process estimation. Moreover, Fast-$L_{2}$ has the smallest mSD, indicating that it provides the smallest uncertainty in $\hat{\theta}$. Finally, LS requires the shortest time due to its simple calculations, and Fast-$L_{2}$ requires only 14.28 s, far less than the Proj-$L_{2}$ calibration.

To further compare the uncertainty of each calibration parameter, Figure 4 illustrates the box plots of $\hat{\theta}$ using different calibration methods. Combined with the mSD in Table 2, it highlights that the proposed calibration parameter estimate has the smallest variance. That is, the uncertainty provided by the Fast-$L_{2}$ calibration is smaller than that of the competing calibration methods. The estimated value of the Fast-$L_{2}$ is close to that of the LS, and although the true value of $\theta$ is unknown, judging by the RPD and the $\theta$ estimates, the estimates of Fast-$L_{2}$ and LS are more accurate than those of the other methods.

Figure 4. Box plots for four calibration methods in case study.

Figure 5 depicts the predictions and confidence interval of the modified Greenshields model after calibration, according to the optimal sub-design criterion. The results indicate that the computer model fits the observations well and that the 95% interval is narrow. Additionally, the scatter plots of the testing data and the predicted values are uniformly distributed around $y = x$, with the coefficient of determination $R^{2} = 0.98$.

Figure 5. (Left): Computer model after calibration and the 95% prediction confidence interval, using the proposed method. (Right): Observations vs. predictions of the calibrated computer model.
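For reference, a coefficient of determination between observations and predictions can be computed as in the short Python sketch below. It uses the common definition of one minus the ratio of the residual sum of squares to the total sum of squares; whether the reported value was computed with exactly this definition is an assumption, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observations are not all identical (SS_tot > 0).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the points lie close to the line $y = x$, while predicting the mean of the observations everywhere gives 0.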

4. Conclusions

This work proposes a fast $L_{2}$ calibration framework suitable for the inexact traffic flow system. The proposed method first suggests an optimal sub-design criterion for the $L_{2}$ calibration based on the discrepancy function and FIM, which reduces the computational time and preserves the advantages of $L_{2}$ calibration. To keep the design space-filling, we employ the grid search method to find the additional points sequentially. Then, we develop an algorithm to generate the optimal design, and a standard bootstrap method is utilized to quantify the uncertainty of the calibration parameter estimates. Finally, we apply the proposed method to a case study of the M25 motorway in London. The results demonstrate that the prediction accuracy obtained with the calibration parameters estimated under our optimal design criterion is better than that of the current calibration methods. Furthermore, the suggested method significantly improves the computational efficiency of the $L_{2}$ calibration and reduces the calibration parameters’ uncertainty. The results demonstrate that the proposed method applies to inexact traffic flow models.

Future research directions are multifaceted. First, since most data have periodicity, which is not considered in our paper, an optimal design criterion for periodic data can be developed in the future. Second, a Bayesian version of the optimal design criterion can be considered, as it would be more convenient for quantifying the uncertainty.

Author Contributions

Conceptualization, Y.W.; writing—original draft preparation, J.H.; writing—review and editing, J.H., Y.W. and M.H.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

Wang’s research was supported by the Natural Science Foundation of Beijing Municipality (1214019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Guo, Q.; Li, L.; Jeff Ban, X. Urban traffic signal control with connected and automated vehicles: A survey. Transp. Res. Part C Emerg. Technol. 2019, 101, 313–334. [Google Scholar] [CrossRef]
Ngoduy, D.; Maher, M. Calibration of second order traffic models using continuous cross entropy method. Transp. Res. Part C Emerg. Technol. 2012, 24, 102–121. [Google Scholar] [CrossRef]
Hale, D.K.; Antoniou, C.; Brackstone, M.; Michalaka, D.; Moreno, A.T.; Parikh, K. Optimization-based assisted calibration of traffic simulation models. Transp. Res. Part C Emerg. Technol. 2015, 55, 100–115. [Google Scholar] [CrossRef]
Mahmassani, H.S.; Dong, J.; Kim, J.; Chen, R.B.; Park, B. Incorporating Weather Impacts in Traffic Estimation and Prediction Systems; US Department of Transport: Washington, DC, USA, 2009; Volume 108.
Payne, H.J. Discontinuity in Equilibrium Freeway Traffic Flow. Transp. Res. Rec. 1984, 1, 140–146. [Google Scholar]
Gu, Z.; Saberi, M.; Sarvi, M.; Liu, Z. A Big Data Approach for Clustering and Calibration of Link Fundamental Diagrams for Large-Scale Network Simulation Applications. Transp. Res. Part C Emerg. Technol. 2017, 23, 901–921. [Google Scholar] [CrossRef]
Alfelor, R.; Mahmassani, H.; Dong, J. Incorporating Weather Impacts in Traffic Estimation and Prediction Systems; Institute of Transportation Engineers Annual Meeting and Exhibit: San Antonio, TX, USA, 2009; Volume 1, pp. 443–457.
Box, G.E.; Hunter, W.G. A useful method for model-building. Technometrics 1962, 4, 301–318.
Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2001, 63, 425–464.
Li, Z.; Tan, M.H.Y. A Gaussian process emulator based approach for Bayesian calibration of a functional input. Technometrics 2022, 64, 299–311.
Wong, R.K.; Storlie, C.B.; Lee, T.C. A frequentist approach to computer model calibration. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2017, 79, 635–648.
Schultz, L.; Sokolov, V. Bayesian Optimization for Transportation Simulators. Procedia Comput. Sci. 2018, 130, 973–978.
Sha, D.; Ozbay, K.; Ding, Y. Applying Bayesian Optimization for Calibration of Transportation Simulation Models. Transp. Res. Rec. 2020, 2674, 215–228.
Qu, X.; Wang, S.; Zhang, J. On the fundamental diagram for freeway traffic: A novel calibration approach for single-regime models. Transp. Res. Part B Methodol. 2015, 73, 91–102.
Lee, G.; Kim, W.; Oh, H.; Youn, B.D.; Kim, N.H. Review of statistical model calibration and validation—from the perspective of uncertainty structures. Struct. Multidiscip. Optim. 2019, 60, 1619–1644.
Tuo, R.; Wu, C.J. Efficient calibration for imperfect computer models. Ann. Stat. 2015, 43, 2331–2352.
Tuo, R.; Wu, C.J. A theoretical framework for calibration in computer models: Parametrization, estimation and convergence properties. SIAM/ASA J. Uncertain. Quantif. 2016, 4, 767–795.
Sung, C.L.; Hung, Y.; Rittase, W.; Zhu, C.; Jeff Wu, C. A generalized Gaussian process model for computer experiments with binary time series. J. Am. Stat. Assoc. 2020, 115, 945–956.
Liu, B.; Yue, X.; Byon, E.; Al Kontar, R. Parameter calibration in wake effect simulation model with stochastic gradient descent and stratified sampling. Ann. Appl. Stat. 2022, 16, 1795–1821.
Santner, T.J.; Williams, B.J.; Notz, W.I. The Design and Analysis of Computer Experiments; Springer: Berlin/Heidelberg, Germany, 2003; Volume 1.
Gu, M.; Wang, L. Scaled Gaussian stochastic process for computer model calibration and prediction. SIAM/ASA J. Uncertain. Quantif. 2018, 6, 1555–1583.
Eubank, R. Nonparametric Regression and Spline Smoothing; CRC Press: Boca Raton, FL, USA, 1999.
Čížek, P.; Víšek, J.Á. Least Trimmed Squares. In XploRe®—Application Guide; Springer: Berlin/Heidelberg, Germany, 2000; pp. 49–63.
Krishna, A.; Joseph, V.R.; Ba, S.; Brenneman, W.A.; Myers, W.R. Robust experimental designs for model calibration. J. Qual. Technol. 2022, 54, 441–452.
Diao, H.; Wang, Y.; Wang, D. A D-Optimal Sequential Calibration Design for Computer Models. Mathematics 2022, 10, 1375.
Fang, K.; Li, R.; Sudjianto, A. Design and Modeling for Computer Experiments; Chapman and Hall/CRC: Boca Raton, FL, USA, 2005.
Wu, C.F.J. Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. Ann. Stat. 1986, 14, 1261–1295.
Wang, C.; Quddus, M.A.; Ison, S.G. Impact of traffic congestion on road accidents: A spatial analysis of the M25 motorway in England. Accid. Anal. Prev. 2009, 41, 798–808.
Hou, T.; Mahmassani, H.S.; Alfelor, R.M.; Kim, J.; Saberi, M. Calibration of traffic flow models under adverse weather and application in mesoscopic network simulation. Transp. Res. Rec. 2013, 2391, 92–104.
Xie, F.; Xu, Y. Bayesian projected calibration of computer models. J. Am. Stat. Assoc. 2021, 116, 1965–1982.
Gu, M.; Palomo, J.; Berger, J.O. RobustGaSP: Robust Gaussian stochastic process emulation in R. arXiv 2018, arXiv:1801.01874.

Figure 1. The dual-regime modified Greenshields traffic flow model of DTA.

Figure 2. Maps of the selected detector location on the M25 motorway in London; the large red pin marks the detector.

Figure 3. Speed vs. density scatterplot from 1 to 5 June 2021.

Figure 4. Box plots for the four calibration methods in the case study.

Figure 5. (Left): Computer model after calibration and the 95% prediction confidence interval, using the proposed method. (Right): Observations vs. predictions of the calibrated computer model.

Table 1. Value regions for the calibration parameters.

$θ$	$k_{bp}$	$u_{f}$	$v_{f}$	$α$	$v_{0}$	$k_{jam}$
Value region	[10, 30]	[80, 130]	[170, 220]	[0, 10]	[0, 5]	[200, 220]

Table 2. Summary statistics of $\hat{θ}$ for different calibration methods.

Calibration Methods	Fast-$L_{2}$	KO	LS	Proj-$L_{2}$
RPD	0.7039	2.6187	0.8345	6.7197
mSD	1.3377	3.0554	2.1423	6.8666
Runtime	14.28 s	40.67 s	0.22 s	637.55 s

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
