1. Introduction
The inherent accuracy and uncertainty of travel forecasting models are receiving increasing attention from the scholarly and practicing communities. As an example of this attention, the Standing Committee on Transport Forecasting of the Transportation Research Board has made uncertainty one of its primary research agenda issues [1], following a major report from the Federal Highway Administration on the topic [2]. Given that such models are used in the allocation of billions of dollars of infrastructure financing each year, the financial risks of inaccurate or imprecise forecasts are high [3,4]. Systematic under- or over-prediction could lead to substantial under- or over-investment in the highway network [5].
Transportation demand forecasting models, like other mathematical-statistical models, might be abstracted into the following basic form:

$$y = f(X, \beta)$$

where $y$ is the variable being predicted based on input data $X$, moderated through a specific functional form $f$ and parameters $\beta$. Three general sources of error may cause a forecast value $\hat{y}$ to differ from the “true” or “actual” value of $y$ [6]: (1) the input data $X$, (2) the functional form $f$, and (3) the parameters $\beta$.
Of these potential sources of error, only the third is substantively addressed in classical statistics: in theory, the standard errors of the parameter estimates go a long way toward quantifying parameter uncertainty. Yet even this source of uncertainty has been largely ignored in transportation forecasts, and model development documentation often omits the variance of these estimates completely [8]. Zhao and Kockelman [9] examined the effects of this parameter uncertainty in a trip-based model of a contrived 25-zone region, but a systematic analysis of this uncertainty in a practical model is not common.
In this research, we investigate the uncertainty in traffic forecasts resulting from plausible parameter uncertainty in an advanced trip-based transportation demand model. Using a Latin hypercube sampling (LHS) methodology, we simulate one hundred potential parameter sets for a combined mode and destination choice model in Roanoke, VA, USA. We then assign the resulting trip matrices to the highway network for the region and evaluate the PM and daily assigned traffic volumes alongside the variation in implied impedance and accessibility.
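As a rough illustration of the sampling step, the sketch below draws Latin hypercube samples for a few choice-model coefficients: each parameter's probability range is cut into 100 equal strata, one value is drawn from each stratum, and the strata are shuffled independently across parameters. The coefficient names and values are hypothetical, and the normal marginal with a coefficient of variation of 0.30 is an assumption borrowed from the prior studies reviewed in Section 2; the sampling setup actually used in this research is described in Section 3.

```python
import random
from statistics import NormalDist


def latin_hypercube(means, cv=0.30, n_draws=100, seed=42):
    """Latin hypercube sample of n_draws parameter sets.

    Assumption: each marginal is normal, centred on the point estimate,
    with standard deviation |mean| * cv.
    """
    rng = random.Random(seed)
    draws = [{} for _ in range(n_draws)]
    for name, mu in means.items():
        dist = NormalDist(mu, abs(mu) * cv)
        # One probability inside each of n_draws equal-width strata of (0, 1);
        # the tiny margins keep every value strictly inside (0, 1) for inv_cdf.
        probs = [(k + rng.uniform(1e-9, 1 - 1e-9)) / n_draws for k in range(n_draws)]
        # Shuffle independently per parameter so strata are paired at random
        rng.shuffle(probs)
        for i, p in enumerate(probs):
            draws[i][name] = dist.inv_cdf(p)
    return draws


# Hypothetical point estimates for a mode and destination choice model
point_estimates = {"b_time": -0.05, "b_cost": -0.30, "asc_transit": -1.20}
parameter_sets = latin_hypercube(point_estimates)
print(len(parameter_sets))  # 100
```

Compared with simple random sampling, the stratification guarantees that every parameter's full range, including both tails, is represented even with only 100 draws.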
This paper proceeds with a review of related literature in Section 2 and a description of the model design and simulation sampling methodology in Section 3, followed by a discussion of the variations in mode, destination, and traffic performance measures in Section 4. The paper concludes in Section 5 with a summary of the key findings and a discussion of limitations and directions for future research.
2. Literature Review
Uncertainty has been examined in various ways over the last two decades and is becoming increasingly important for researchers. This review considers why uncertainty is important to evaluate in transportation demand models and discusses research that has been conducted to evaluate it. Rasouli and Timmermans [6] presented an extensive literature review on this topic. An overview of the literature and the sources of uncertainty each study evaluated can be found in Table 1.
Concern for model accuracy motivates the study of uncertainty in input data and parameter estimates. Travel forecasters have always been cognizant of the uncertainty in their forecasts, especially because project decisions, often with high financial impacts, are made using these models.
Flyvbjerg et al. [3] collected traffic forecasts from a range of transportation projects, with an emphasis on rail projects, and compared the forecast values for a given year with the actual values observed in the same year. They found that the difference between forecast and actual values was statistically significant. Rail passenger forecasts were overestimated by an average of 106%, and half of road projects had traffic forecast errors of more than ±20%. They did not identify the source of this inaccuracy but noted that it was an important subject for future research.
Armoogum et al. [12] examined uncertainty within a forecasting model for the Paris and Montreal metropolitan regions. The sources of uncertainty analyzed were the calibration of the model, the behavior of future generations, and demographic projections. Rather than sampling methods, they used a jackknife technique over multiple years of analysis to estimate confidence intervals for each source of error. The jackknife reduces the bias of an estimator and yields variance estimates from which confidence intervals can be constructed. They found that the longer the forecasting period, the larger the uncertainty. The model forecasts were generally within 10–15%, with wider ranges for variables with small values or small sample sizes.
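For readers unfamiliar with the jackknife, the sketch below shows the textbook delete-one version for a generic statistic: the statistic is recomputed with each observation left out in turn, and the spread of those recomputations gives the bias and standard error. It is an illustration only, not Armoogum et al.'s procedure, and the observations are made up.

```python
import math
from statistics import mean, pvariance, variance


def jackknife(data, statistic):
    """Delete-one jackknife: return (bias-corrected estimate, standard error)."""
    n = len(data)
    theta_full = statistic(data)
    # Recompute the statistic n times, each time leaving out one observation
    theta_loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = mean(theta_loo)
    bias = (n - 1) * (theta_bar - theta_full)
    se = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in theta_loo))
    return theta_full - bias, se


# Made-up observations, e.g. ratios of observed to forecast trips by year
obs = [0.92, 1.05, 0.97, 1.10, 0.88, 1.01, 0.95, 1.08]
est, se = jackknife(obs, pvariance)
print(est, variance(obs))  # the two values agree
```

With `pvariance` (the biased plug-in variance) as the statistic, the bias-corrected estimate recovers the unbiased sample variance exactly, a simple demonstration of the bias reduction described above.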
Welde and Odeck [14] compared actual and forecast traffic for 25 toll and 25 toll-free roads in Norway to evaluate the accuracy of Norwegian transportation planning models over time. Traffic models generally overestimate traffic. In this study, toll projects overestimated traffic on average, but only by 2.5%, whereas toll-free projects underestimated traffic by an average of 19%. They concluded that Norwegian toll projects have been fairly accurate, probably because of the scrutiny that planners face when developing a toll project. Similar scrutiny should therefore also be applied to toll-free projects, whose forecasts are considerably less accurate.
These articles show that model errors can affect traffic projections substantially, but they did not quantitatively identify the source of the error. The most researched source of error has been model form, but that research is largely excluded from this review because it is not the main focus of this research. The second most researched source has been input data. Chronologically, Rodier and Johnston [10], Zhao and Kockelman [9], Clay and Johnston [11], Duthie et al. [13], Yang et al. [15], Manzo et al. [16], and Petrik et al. [17] have all researched input errors, with all but Rodier and Johnston also examining parameter estimate errors. Parameter estimation error has been the least researched source of uncertainty, and no studies have focused on it alone. Petrik et al. [18] examined parameter estimates but with a focus on model form error. The details of each study are described below in chronological order.
Rodier and Johnston [10] examined uncertainty in socioeconomic projections (population and employment, household income, and petroleum prices) at the county level for the Sacramento, California region. They wanted to know whether the uncertainty in the range of plausible socioeconomic values was a significant source of error in projections of future travel patterns and vehicle emissions. They identified ranges for population and employment, household income, and petroleum prices for two scenario years (2005 and 2015); the ranges varied by scenario year and socioeconomic variable. Varying one variable at a time, they ran the model 19 times for 2005 and 21 times for 2015. Their results indicated that errors in household income and petroleum price projections are not significant sources of uncertainty, whereas error ranges in population and employment projections are significant sources of change in travel and emissions.
Zhao and Kockelman [9] examined the propagation of uncertainty from inputs and parameters through each step of a trip-based travel model. The analysis used a traditional four-step urban transportation planning process (trip generation, trip distribution, mode split, and trip assignment) on a 25-zone sub-model of the Dallas-Fort Worth metropolitan region. Monte Carlo simulation was used to vary the input and parameter values, all of which were assigned a coefficient of variation (CV) of 0.30. The four-step model was run 100 times with 100 different sets of input and parameter values. Uncertainty increased through the first three steps of the model, and the final assignment step reduced the compounded uncertainty, although not below the level of input uncertainty. The authors concluded that uncertainty propagation from changes in inputs and parameters was significant, but that the final step brought output uncertainty back to nearly the assumed level (a CV of 0.31 in the trip assignment results versus the assumed CV of 0.30).
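The compounding that Zhao and Kockelman observed in the early steps can be seen in a toy calculation. The sketch below is our illustration, not their model: it multiplies three independent factors (hypothetically households, trip rate, and mode share), each drawn from a normal distribution with a CV of 0.30, and compares the simulated output CV with the exact value for a product of independent factors, √(∏(1 + CVᵢ²) − 1), which is about 0.54 for three such factors.

```python
import math
import random
from statistics import mean, stdev


def monte_carlo_cv(factor_means, cv=0.30, n_runs=100, seed=1):
    """Output CV when independent normal draws (sd = |mean| * cv) are multiplied.

    The product is a toy stand-in for a chain of model steps.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        y = 1.0
        for mu in factor_means:
            y *= rng.gauss(mu, abs(mu) * cv)
        outputs.append(y)
    return stdev(outputs) / mean(outputs)


def analytic_cv(cvs):
    """Exact CV of a product of independent factors: sqrt(prod(1 + cv_i^2) - 1)."""
    return math.sqrt(math.prod(1 + c * c for c in cvs) - 1)


# Hypothetical chain: households x trip rate x mode share
factors = [500.0, 2.0, 0.4]
print(monte_carlo_cv(factors, n_runs=100), analytic_cv([0.30] * 3))
```

The toy has no capacity-constrained assignment step, so it reproduces only the growth of uncertainty, not the damping the authors attributed to assignment.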
Clay and Johnston [11] also examined input data uncertainty. They varied three inputs and one parameter to analyze output uncertainty in a fully integrated land use and travel demand model of six counties in the Sacramento, California region. The variables analyzed were exogenous productions, commercial trip generation rates, the perceived out-of-pocket cost of travel for single-occupant vehicles, and a concentration parameter. Exogenous productions, commercial trip generation rates, and the concentration parameter were each varied by ±10, 25, and 50%, while the cash cost of driving was varied by ±50 and 100%. This resulted in 23 model runs: one for each variable change and one for the base scenario. Their research showed that any uncertainty in the inputs resulted in a large difference in vehicle miles traveled, although the percentage difference was smaller than the percentage uncertainty in the input.
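The count of 23 runs follows from the one-at-a-time design: three variables at ±10, 25, and 50% give 3 × 3 × 2 = 18 runs, the cash cost of driving at ±50 and 100% adds 2 × 2 = 4, and the base scenario adds 1. The short enumeration below checks that arithmetic; the variable labels are our own shorthand, not names from the original model.

```python
def one_at_a_time_runs():
    """Enumerate a one-at-a-time design: each run perturbs a single variable."""
    # Labels are shorthand for the variables described in the text
    perturbations = {
        "exogenous_productions": [10, 25, 50],
        "commercial_trip_rates": [10, 25, 50],
        "concentration_parameter": [10, 25, 50],
        "auto_cash_cost": [50, 100],
    }
    runs = [("base", 0)]
    for variable, percents in perturbations.items():
        for pct in percents:
            runs.append((variable, +pct))  # increase; others held at base values
            runs.append((variable, -pct))  # decrease; others held at base values
    return runs


runs = one_at_a_time_runs()
print(len(runs))  # 23
```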
Duthie et al. [13] evaluated uncertainty at a different level. They used a small, generic gravity-based land use model with the traditional four-step approach and, following Zhao and Kockelman [9], a coefficient of variation of 0.30 for both inputs and parameters, although they used antithetic sampling. In this sampling method, pairs of negatively correlated realizations of the uncertain parameters are used to estimate the expected value of the output function, typically with lower variance than independent draws. The uncertainty was evaluated in terms of the rankings of various transportation improvement projects. They found a few significant differences in project rankings when input and parameter values changed and concluded that neglecting uncertainty can lead to suboptimal network improvement decisions.
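Antithetic sampling is easy to illustrate outside a land use model. The sketch below is a generic example, not Duthie et al.'s setup: it estimates the expected value of a binary logit probability whose utility coefficient is uncertain (the mean of 1.0 and standard deviation of 0.3 are hypothetical), once with independent draws and once with mirrored pairs (z, −z), using the same number of function evaluations in both cases.

```python
import math
import random
from statistics import mean, variance


def logit_prob(z, beta=1.0, sd=0.3):
    """Binary logit probability when the utility coefficient is beta + sd * z."""
    return 1.0 / (1.0 + math.exp(-(beta + sd * z)))


def plain_estimate(g, n_pairs, rng):
    # 2 * n_pairs independent standard normal draws
    return mean(g(rng.gauss(0.0, 1.0)) for _ in range(2 * n_pairs))


def antithetic_estimate(g, n_pairs, rng):
    # n_pairs draws, each mirrored as (z, -z): same number of evaluations
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        total += 0.5 * (g(z) + g(-z))
    return total / n_pairs


def estimator_variance(estimator, g, reps=300, n_pairs=50, seed=7):
    """Variance of an estimator across independent replications."""
    rng = random.Random(seed)
    return variance([estimator(g, n_pairs, rng) for _ in range(reps)])


v_plain = estimator_variance(plain_estimate, logit_prob)
v_anti = estimator_variance(antithetic_estimate, logit_prob)
print(v_plain, v_anti)
```

For a smooth, monotone function like this one, each mirrored pair cancels most of the first-order noise, so the antithetic estimator's variance is far smaller; for non-monotone functions the gain can disappear.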
Yang et al. [15] conducted a quantitative uncertainty analysis of a combined travel demand model, examining input and parameter uncertainty with a coefficient of variation of 0.30. Rather than random sampling, they used a systematic framework based on a variance–covariance matrix. They found that the coefficient of variation of the outputs is similar to that of the inputs and that parameter uncertainty generally has a larger effect on output uncertainty than input uncertainty does. This finding contradicts that of Zhao and Kockelman [9]. Because the impact of parameter uncertainty exceeded that of input uncertainty in most steps of the model, the authors concluded that improving the accuracy of parameter estimation is more effective than improving input estimates.
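Yang et al.'s framework is not detailed here, but one common sampling-free approach built on a variance–covariance matrix Σ is first-order (delta-method) propagation, Var[f(β̂)] ≈ ∇fᵀ Σ ∇f. The sketch below is our illustration of that general idea, not a reconstruction of their model; the logit share, coefficients, and covariance values are hypothetical.

```python
import math


def delta_method_variance(f, beta, cov, h=1e-6):
    """First-order approximation Var[f(beta)] ~= grad' * cov * grad."""
    k = len(beta)
    grad = []
    for j in range(k):
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        # Central finite difference for the j-th partial derivative
        grad.append((f(up) - f(down)) / (2 * h))
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))


def logit_share(b, x=(-10.0, 2.0)):
    """Hypothetical binary logit share; x holds time and cost differences."""
    return 1.0 / (1.0 + math.exp(-(b[0] * x[0] + b[1] * x[1])))


beta_hat = [-0.05, -0.30]  # hypothetical time and cost coefficients
cov_hat = [[1.0e-4, 3.6e-4], [3.6e-4, 3.6e-3]]  # hypothetical covariance matrix
sd_share = math.sqrt(delta_method_variance(logit_share, beta_hat, cov_hat))
print(logit_share(beta_hat), sd_share)
```

For a linear f the approximation is exact; for nonlinear models such as logit, it is accurate only when the parameter uncertainty is small relative to the curvature of f.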
Manzo et al. [16] examined uncertainty in model inputs and parameters for a trip-based transportation demand model of a small Danish town. They used triangular distributions with LHS to generate parameter ranges; following Zhao and Kockelman [9], they also used a coefficient of variation of 0.30 and 100 draws, choosing these values because they had been used previously. Their addition to the research on uncertainty was to examine it under different levels of congestion. They found that input and parameter uncertainty has an impact on model output that requires attention in planning, and that model output uncertainty was not sensitive to the level of congestion.
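If the triangular distribution is assumed to be symmetric around the point estimate c with half-width a, its variance is a²/6, so a target CV implies a = CV·√6·|c|; a CV of 0.30 therefore corresponds to a range of roughly ±73% around c. The sketch below applies that relationship and draws LHS samples through the triangular inverse CDF. Symmetry, the coefficient value, and the function names are our assumptions; the text does not say how Manzo et al. parameterized their distributions.

```python
import math
import random
from statistics import mean, stdev


def triangular_half_width(center, cv=0.30):
    """Half-width a of a symmetric triangular distribution on [c - a, c + a]
    whose coefficient of variation equals cv (its variance is a**2 / 6)."""
    return cv * math.sqrt(6.0) * abs(center)


def triangular_inv_cdf(u, lo, mode, hi):
    """Inverse CDF of the triangular distribution (lo, mode, hi)."""
    split = (mode - lo) / (hi - lo)
    if u < split:
        return lo + math.sqrt(u * (hi - lo) * (mode - lo))
    return hi - math.sqrt((1.0 - u) * (hi - lo) * (hi - mode))


def lhs_triangular(center, cv=0.30, n_draws=100, seed=3):
    """Latin hypercube draws for one parameter with a symmetric triangular marginal."""
    rng = random.Random(seed)
    a = triangular_half_width(center, cv)
    # One probability per equal-width stratum, in random order
    probs = [(k + rng.random()) / n_draws for k in range(n_draws)]
    rng.shuffle(probs)
    return [triangular_inv_cdf(p, center - a, center, center + a) for p in probs]


draws = lhs_triangular(-0.30)  # hypothetical cost coefficient
print(stdev(draws) / abs(mean(draws)))  # close to 0.30
```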
Petrik et al. [17] evaluated uncertainty in mode shift predictions arising from input parameters, socioeconomic data, and alternative-specific constants. The study was based on a high-speed rail project in Portugal, a component of the Trans-European Transport Network. The authors collected survey data, developed discrete choice models, and estimated their own parameter values, obtaining the mean or “best” value and the corresponding t-statistic from the surveys. From these, they generated 10,000 samples each of parameter values, socioeconomic inputs, and alternative-specific constants, using bootstrap resampling, Monte Carlo sampling, and triangular distributions, respectively. They found that variance in alternative-specific attributes is the major contributor to output uncertainty, compared with variance in parameters or socioeconomic factors. Socioeconomic data contributed least to overall output variance, and variability in parameters produced a relatively insignificant mode shift.
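Bootstrap resampling, the first of those three methods, can be sketched generically: resample the observations with replacement, recompute the statistic on each resample, and read an interval off the resulting replicates. Exactly how Petrik et al. applied it is not described here; in a discrete choice setting the statistic would be a full re-estimation of the model, whereas this simplified sketch uses the sample mean of hypothetical survey responses.

```python
import random
from statistics import mean


def bootstrap(data, statistic, n_boot=2000, seed=5):
    """Nonparametric bootstrap: recompute the statistic on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(replicates, level=0.95):
    """Percentile confidence interval read from sorted bootstrap replicates."""
    reps = sorted(replicates)
    tail = (1.0 - level) / 2.0
    lo = reps[int(tail * len(reps))]
    hi = reps[min(len(reps) - 1, int((1.0 - tail) * len(reps)))]
    return lo, hi


# Hypothetical survey responses
responses = [8.5, 12.0, 9.8, 15.2, 11.1, 7.4, 13.6, 10.3, 9.1, 14.8]
replicates = bootstrap(responses, mean)
print(percentile_interval(replicates))
```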
Petrik et al. [18] used an activity-based microsimulation travel demand model for Singapore, with 22 sub-models and 817 parameters, to evaluate model form and parameter uncertainty. The authors identified the parameters to which the sub-models were most sensitive and applied a full sensitivity analysis to the top 100, preserving correlations. Using the mean value and standard deviation available for each parameter, they applied Latin hypercube sampling with 100 draws to examine the outcomes of changes in parameter values. They also considered different sample sizes of the model population. For the 100 most sensitive parameters, the coefficient of variation of the outcomes ranged from 3% to 49%. Because the variation in the parameters themselves did not exceed 19%, the output uncertainty attributable to the parameters was larger than the variation in the parameters. They also found that parameter uncertainty produced larger output variation than simulation uncertainty.
In transportation demand modeling, most uncertainty research to date has focused on input uncertainty or model form rather than on parameter estimate uncertainty [6]. Of the 12 articles in this review, two focus only on input data, three focus on model form, one examines both model form and parameter estimate uncertainty, and six examine both input data and parameter estimate uncertainty. No researchers have examined parameter estimate uncertainty as the only source of error in their models. When parameter uncertainty has been examined, it has usually been in conjunction with input errors or in small, non-practice models, and no studies that we could identify have used real models in practice for their analyses. Uncertainty research is needed because transportation demand models supply estimates and forecasts to decision-makers and policymakers, and an inaccurate model or large output variance could change which decisions are made and when [1]. Thus, there is a critical need for a detailed exploration of parameter estimation uncertainty in a practical travel model.
5. Conclusions
The results of this research show that despite large variations in mode and destination choice parameters, and consequently in accessibility, the impact of this variation on assigned highway volumes is limited. To our knowledge, this is the first systematic evaluation of parameter uncertainty in a practical travel model in the literature; prior research has been limited to toy networks (e.g., [9]). The resulting uncertainty in the output forecasts was generally smaller than the input parameter variance, confirming the results of Petrik et al. [18] in a different context. In this application, at least, the variation in mode and destination choice probabilities appears to be constrained by the capacities and procedures of the highway network assignment.
Several limitations of this research must be noted. First, we did not address the statistical uncertainty in trip production estimates; these may play a substantially larger role than the destination and mode choice parameters, because lower trip rates may lower traffic volumes globally in a way that the traffic assignment could not “correct.” Second, a different sampling method might have produced different results at the extremes than LHS did. Third, the relatively sparse network of the RVTPO model region, which lacks parallel high-capacity highway facilities, may have caused the network assignment process to converge to a similar solution regardless of modest changes to the trip matrix. If the number of paths between nodes is limited and constrained by highway capacity, there are only so many solutions to any highway assignment problem. In a larger network with more path redundancy or more alternative transit services, the assignment may be less effective at constraining the forecast volumes. More generally, these findings are based on a specific trip-based travel demand model and may not apply to all contexts or regions; the results might differ significantly in other geographic areas or under different modeling frameworks, such as activity-based models. Repeating this research with a variety of models and geographic regions would be a valuable research priority.
In this research, we had only the point estimates of the statistical coefficients and therefore had to assume a coefficient of variation to generate variation in the sampling procedure. It would be better if model user guides and development documentation regularly reported the standard errors of model parameters. Ideally, they would provide variance–covariance matrices for the estimated models, enabling researchers to preserve the covariance relationships between sampled parameters. Future research might repeat the present experiment while allowing for correlation between parameter values.
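If a variance–covariance matrix Σ were reported, correlated parameter sets could be drawn by factoring Σ = LLᵀ (Cholesky decomposition) and setting β = β̂ + Lz with z standard normal. A minimal sketch follows; the two coefficients, their standard errors (0.01 and 0.06), and their correlation of 0.6 are hypothetical. It uses plain Monte Carlo draws; keeping the correlation within a Latin hypercube design would need an extra step, such as rank-based reordering (for example, the Iman–Conover method).

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def correlated_draws(means, cov, n_draws, seed=11):
    """Draw parameter vectors beta = means + L z, with z standard normal."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(means)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append([means[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return draws


beta_hat = [-0.05, -0.30]  # hypothetical time and cost coefficients
# Hypothetical covariance: standard errors 0.01 and 0.06, correlation 0.6
cov_hat = [[1.0e-4, 3.6e-4], [3.6e-4, 3.6e-3]]
parameter_sets = correlated_draws(beta_hat, cov_hat, n_draws=100)
print(parameter_sets[0])
```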
Notwithstanding these limitations, statistical parameter variance does not appear to be the largest source of uncertainty in travel forecasting; there are likely more important factors that planners and government agencies should address. Research on all sources of uncertainty is somewhat limited, in large part because of the burdensome computational requirements of many modern travel models [4]. This research methodology benefited from a lightweight travel model that could be re-run repeatedly with dozens of sampled choice parameters. One strategy for applying this methodology to larger models may involve the relatively recent TMIP-EMAT exploratory modeling toolkit [23]. A better understanding of the other sources of uncertainty, model specification and input accuracy, might also benefit from lightweight models built for transparency and flexibility rather than heavily constrained models emphasizing precise spatial detail and strict behavioral constraints. This might allow forecasts to be made with an ensemble approach [24], identifying preferred policies as the consensus of multiple plausible model specifications.