Article

Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks

1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China
2 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China
3 Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
4 Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen University, Shenzhen 518060, China
* Authors to whom correspondence should be addressed.
Sensors 2017, 17(12), 2822; https://doi.org/10.3390/s17122822
Submission received: 2 November 2017 / Revised: 28 November 2017 / Accepted: 5 December 2017 / Published: 6 December 2017

Abstract

Travel times in congested urban road networks are highly stochastic. Providing travel time distribution information, including both the mean and the variance, can help travelers make reliable path choices that ensure a higher probability of on-time arrival. To this end, a heterogeneous data fusion method is proposed to estimate travel time distributions by fusing data from point and interval detectors. In the proposed method, link travel time distributions are first estimated from point detector observations. The travel time distributions of links without point detectors are imputed based on their spatial correlations with links that have point detectors. The estimated link travel time distributions are then fused with path travel time distributions obtained from the interval detectors using Dempster-Shafer evidence theory. Based on the fused path travel time distribution, an optimization technique is further introduced to update the link travel time distributions and their spatial correlations. A case study using real-world data from Hong Kong showed that the proposed method obtained accurate and robust estimates of link and path travel time distributions in congested road networks.

1. Introduction

Accurate and robust estimation of travel time distribution, including mean and variance, is a crucial requirement for advanced traveler information systems (ATIS). Provision of travel time distribution information through ATIS enables travelers to make reliable path choice decisions, ensuring a higher chance of on-time arrival [1,2,3]. The provided distribution information also allows operators to evaluate network performance and reliability, and identify bottlenecks for proactively deploying effective controls to improve overall traffic conditions [4,5].
Recent advances in information and communication technologies (ICTs) have produced a variety of spatiotemporal big data for travel time estimation [6]. Existing data collection techniques can be classified into point detection, interval detection, and floating car systems [7,8]. Point detectors (such as loop detectors and video image detectors) are generally deployed at specific road segment locations to collect vehicle point speeds. Interval detectors consist of a pair of devices deployed in road networks to directly measure travel times between the device pair. Typical interval detectors include automatic vehicle identification (AVI), Bluetooth, and license plate recognition devices. In contrast to these fixed detectors, floating car systems use a fleet of probe vehicles, typically taxis, equipped with global positioning system (GPS) devices. The probe vehicle locations and speeds are collected at given time intervals to estimate network traffic conditions [9]. These data collection techniques have generated complementary heterogeneous data sources with distinct data quality and network coverage.
Accurately and robustly estimating travel time distributions from heterogeneous data sources is challenging in congested road networks. First, although rich traffic observations from multiple data sources are beneficial, the variability in data quality across sources is a serious impediment to robust estimation of travel time distributions. Data quality variability may arise for a variety of reasons, such as detector failures, measurement errors, and sample size variations [10,11,12,13]. Consequently, traffic observations from different sources can be inconsistent or even conflicting, and robust data fusion techniques that are relatively insensitive to the data quality of heterogeneous sources are required.
Second, traffic data from multiple sources have improved data availability for major roads in a network, but limited coverage across the whole network is a significant obstacle to accurate estimation of travel time distributions. Point detector data cover all vehicles at the deployed locations and have very good temporal sampling, but their spatial coverage is restricted to the (relatively few) deployed locations. Floating car and interval detector data have relatively better spatial coverage on major roads, but data remain sparse for many arterial roads [14,15]. Therefore, effective techniques for imputing spatially missing data are also required.
This paper proposes an effective method to estimate travel time distributions from heterogeneous data sources with missing data. The remainder of this paper is organized as follows. Section 2 reviews the literature on the travel time distribution estimation methods. Section 3 briefly introduces Dempster-Shafer (D-S) evidence theory to provide the necessary background. Section 4 presents the proposed heterogeneous data fusion method. Section 5 reports a case study using real-world data from Hong Kong. Section 6 provides conclusions and recommendations for further research.

2. Literature Review

Travel time estimation has been studied intensively for over three decades. Early studies proposed various methods to estimate deterministic travel times, e.g., mean travel time, using a single data source [16,17,18,19,20]. A complete survey of these methods is beyond the scope of this paper; interested readers can refer to the comprehensive reviews by Mori et al. and Vlahogianni et al. [8,21].
In the last decade, many research efforts have focused on data fusion techniques that use multiple data sources to improve the accuracy and robustness of deterministic travel time estimation. Current data fusion techniques can be broadly classified into statistical, artificial cognition, and probabilistic techniques [22]. Statistical techniques, such as simple convex combination algorithms, use statistical information on data quality to determine the weights of different data sources [16,23]. They are relatively widely used because they are simple to implement. However, they are not well suited to fusing data sources that are inconsistent or even conflicting. Artificial cognition techniques combine multiple data sources using artificial intelligence methods, such as neural networks or genetic algorithms [11]. They can tackle complex data fusion problems but require large training datasets, which are unavailable in many real-world applications. Probabilistic techniques typically employ Bayesian and/or D-S evidence theory to provide mathematical reasoning rules for fusing inaccurate and inconsistent data from multiple sources [24,25,26]. D-S evidence theory can be regarded as a generalization of Bayesian theory that does not require prior knowledge. Nevertheless, most existing studies using D-S evidence theory are restricted to estimating traffic states (i.e., very congested, congested, medium, smooth, or very smooth) rather than precise numerical travel times [27,28,29].
Since no single data source covers the whole network, researchers have also investigated missing data imputation techniques to improve data completeness. These techniques can be broadly classified as prediction-based and spatial interpolation-based. Prediction-based techniques use travel time prediction models, such as K-nearest neighbors, kernel regression, and autoregressive integrated moving average models, to forecast the missing data from historical data [30,31,32]. Spatial interpolation techniques impute the missing data of a link using established statistical relationships between the link and its adjacent links [14,15,16,33]. For both categories, incorporating travel time correlations is well recognized as an effective way to improve imputation performance [32]. However, most imputation techniques assume fixed travel time correlations, which cannot represent the dynamic nature of traffic conditions.
The above studies focused on estimating deterministic travel times only, ignoring travel time variances. More recent studies have investigated methods to estimate travel time distributions (including means and variances) using a single data source. Dion and Rakha [34] proposed an exponential smoothing filter to estimate travel time distributions from interval detector data. Jenelius and Koutsopoulos [35] developed a maximum likelihood approach to estimate travel time distributions from floating car data. Rahmani et al. [36] used the same data type and proposed a nonparametric approach to estimate travel time distributions. Hans et al. [37] used point detector data and proposed a kinematic wave approach for estimating travel time distributions at signalized intersections. Estimating travel time distributions accurately is more challenging than estimating mean travel times, because more data of higher quality are required to obtain reliable mean and variance information. Including multiple data sources should benefit the accuracy and robustness of travel time distribution estimates, but to the best of our knowledge, few studies have estimated travel time distributions by fusing multiple data sources.
To fill this gap, the current study proposes a heterogeneous data fusion method that estimates travel time distributions by fusing interval and point detector data. In the proposed method, link travel time distributions are first estimated from point detector observations. To address the spatially missing data of point detectors, the travel time distributions of links without point detectors are imputed based on their spatial correlations with links that have point detectors. The link travel time distributions estimated from point detector data are then fused with path travel time distributions obtained from interval detectors. To fuse these two path distributions, a D-S distribution fusion algorithm built on D-S evidence theory is proposed. An optimization technique is further introduced to update link travel time distributions and their spatial correlations according to the fused travel time distribution.

3. Brief Introduction of the D-S Evidence Theory

D-S evidence theory was initially developed by Dempster [38] and later extended and refined by Shafer [39]. It can be regarded as a generalization of Bayesian inference for reasoning under uncertainty based on incomplete information [40,41]. In contrast to Bayesian inference, D-S evidence theory does not assign prior probabilities to unknown propositions or states [42]; probabilities are assigned only when supporting evidence is available [43]. This provides a flexible framework for decision making by combining cumulative evidence, and the theory has broad applications in many areas, such as expert systems [40,44], artificial intelligence [45,46], fault diagnosis [47,48], target recognition [49,50,51], decision-making [52], and information fusion [53].
Let $\Omega = \{S_1, \ldots, S_n\}$ be a collectively exhaustive and mutually exclusive set of states, also called the frame of discernment. The frame of discernment contains every possible state of a system. A basic probability assignment (BPA), also called a belief structure, is a function $m: 2^{\Omega} \to [0, 1]$ satisfying $m(\phi) = 0$ and $\sum_{A \subseteq \Omega} m(A) = 1$, where $A$ is a subset of $\Omega$ and $2^{\Omega} = \{A \mid A \subseteq \Omega\}$ is the power set of $\Omega$, consisting of all subsets of $\Omega$. The assigned probability $m(A)$ measures the belief exactly assigned to $A$. All assigned probabilities sum to unity, and there is no belief in the empty set $\phi$. For notational consistency, boldface letters denote vectors or matrices throughout the paper.
Multiple independent pieces of evidence can be fused using the traditional Dempster’s combination rule [43,44,45,46,47,48,51,52]. For the BPAs $m_1$ and $m_2$ of two independent pieces of evidence, the combination rule is defined as:
$$m_{fus}(C) = m_1 \oplus m_2 = \frac{\sum_{A \cap B = C \neq \phi,\; A, B \subseteq \Omega} m_1(A) \times m_2(B)}{1 - \eta}, \qquad m_{fus}(\phi) = 0 \tag{1}$$
where $\eta$ is the conflict factor, which ranges from 0 to 1 and represents the degree of total conflict between the evidence $m_1$ and $m_2$, and $1/(1 - \eta)$ is the normalization factor that ensures the fused BPAs sum to one. $\eta$ is given by:
$$\eta = \sum_{A \cap B = \phi,\; A, B \subseteq \Omega} m_1(A) \times m_2(B) \tag{2}$$
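As an illustration only, Equations (1) and (2) can be sketched in Python. The sketch assumes BPAs are stored as dictionaries that map focal elements (frozensets of state labels) to masses; this representation, the function name `dempster_combine`, and the toy masses are hypothetical choices for the example, not part of the method described here.

```python
from itertools import product


def dempster_combine(m1, m2, tol=1e-12):
    """Fuse two BPAs with Dempster's combination rule.

    m1, m2: dicts mapping frozenset focal elements to masses (each sums to 1).
    Returns (fused BPA, conflict factor eta).
    """
    fused = {}
    eta = 0.0  # Equation (2): mass that falls on empty intersections
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            fused[c] = fused.get(c, 0.0) + mass_a * mass_b
        else:
            eta += mass_a * mass_b
    if 1.0 - eta <= tol:
        # Complete conflict: the normalization factor 1 / (1 - eta) is undefined.
        raise ValueError("complete conflict between the two pieces of evidence")
    # Equation (1): renormalize the non-conflicting mass.
    return {c: v / (1.0 - eta) for c, v in fused.items()}, eta


# Toy frame {S1, S2, S3}; masses are invented for illustration.
m1 = {frozenset({"S1"}): 0.5, frozenset({"S1", "S2"}): 0.5}
m2 = {frozenset({"S1", "S2"}): 0.6, frozenset({"S2"}): 0.4}
fused, eta = dempster_combine(m1, m2)
print(eta)     # conflict comes only from the pair ({S1}, {S2})
print(fused)
```

Storing focal elements as frozensets lets the intersection $A \cap B$ be computed with the `&` operator, so non-singleton focal elements are handled without extra code.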
Dempster’s combination rule, Equation (1), provides effective reasoning for fusing evidence with low or moderate conflict. However, for highly or completely conflicting evidence (i.e., $\eta$ approaching 1), traditional D-S evidence theory may produce unreasonable fusion results. An effective way to reduce the degree of conflict is to modify the evidence. A common technique [54,55] is to introduce an unknown state, $\Theta$, into the frame of discernment, so that it becomes $\{S_1, \ldots, S_n, \Theta\}$, where $\Theta$ represents the unknown part of the evidence.
Alternatively, several researchers have argued that high conflict is mainly caused by unreliable evidence, and have therefore proposed methods to identify and correct unreliable evidence before combination [48,51,56]. Overall, D-S evidence theory provides mathematical reasoning rules for fusing inaccurate and incomplete data from multiple sources. In Section 4.2.2, D-S evidence theory is employed to fuse travel time distributions from different data sources, which may be in high or even complete conflict.

4. Travel Time Distributions Estimated by Fusing Heterogeneous Data Sources

4.1. Problem Statement

Let $G = (N, E)$ be a directed network consisting of a set of nodes, $N$, and a set of links, $E$. A link $a_{ij} \in E$ is defined as the road section between two adjacent nodes $n_i \in N$ and $n_j \in N$. The travel time of the link is a random variable, $T_{ij}$, with mean $t_{ij}$ and standard deviation (STD) $\sigma_{ij}$. The vector of mean travel times for all links is $\mathbf{t} = [\ldots, t_{ij}, \ldots]^T$, and $\mathbf{K}$ is the variance-covariance matrix of link travel times: its diagonal elements are the variances of link travel times, and its off-diagonal elements are the travel time covariances between pairs of links.
Let $p^{od}$ be a path between the starting and ending nodes, $n_o$ and $n_d$, respectively, consisting of $\lambda$ consecutive links. Let $x_{ij}^{od}$ be a link-path incidence variable, where $x_{ij}^{od} = 1$ if link $a_{ij}$ is on $p^{od}$ and $x_{ij}^{od} = 0$ otherwise. Let $\mathbf{X} = [\ldots, x_{ij}^{od}, \ldots]^T$ be the vector of link-path incidence variables. The path travel time, $T^{od}$, is the sum of the link travel times along the path:
$$T^{od} = \sum_{a_{ij} \in E} T_{ij}\, x_{ij}^{od} \tag{3}$$
Let $t^{od}$ and $\sigma^{od}$ be the mean and STD of the path travel time distribution, respectively; then:
$$t^{od} = \mathbf{X}^T \mathbf{t} \tag{4}$$
$$(\sigma^{od})^2 = \mathbf{X}^T \mathbf{K} \mathbf{X} \tag{5}$$
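A minimal pure-Python sketch of Equations (4) and (5) follows; the three-link network, its mean travel times (in seconds), and its covariances are invented for illustration. It shows how a positive covariance between two links on the path adds to the path variance.

```python
import math


def path_moments(x, t, K):
    """Mean and STD of a path travel time from link moments.

    x: 0/1 link-path incidence vector, t: link mean travel times,
    K: link variance-covariance matrix as a list of rows.
    """
    n = len(x)
    mean = sum(x[i] * t[i] for i in range(n))        # X^T t
    var = sum(x[i] * K[i][j] * x[j]                   # X^T K X
              for i in range(n) for j in range(n))
    return mean, math.sqrt(var)


t = [30.0, 45.0, 60.0]
K = [[16.0, 4.0, 6.0],
     [4.0, 25.0, 5.0],
     [6.0, 5.0, 36.0]]
x = [1, 0, 1]  # the path uses links 0 and 2
mean, std = path_moments(x, t, K)
print(mean, std)  # prints 90.0 8.0 (variance = 16 + 36 + 2 * 6 = 64)
```

Ignoring the covariance term would give a path variance of 52 instead of 64, which is why the off-diagonal elements of $\mathbf{K}$ matter for reliability estimates.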
To obtain travel time distribution information, detectors of different types may be deployed in the network, as shown in Figure 1 for a simple network with interval and point detectors. A pair of interval detectors, e.g., AVI devices, is installed at $n_o$ and $n_d$ of $p^{od}$ to record the vehicles passing them. The path travel time of each recorded vehicle is the time difference between entering and leaving the path, and the path travel time distribution, denoted $T_{int}^{od}$, can be estimated directly from these data. However, the travel time distributions of the individual links along the path are unknown, and interval detector data cover only a portion of vehicles with relatively poor temporal sampling.
In real applications, point detectors, e.g., loop detectors, are generally deployed on only a subset of network links, e.g., $a_{o1}^r$ and $a_{23}^r$ in Figure 1, whereas other links, e.g., $a_{12}^e$, $a_{34}^e$, and $a_{4d}^e$, have no detectors. Thus, only the travel time distributions of links with point detectors, e.g., $T_{o1}^r$ and $T_{23}^r$, can be estimated directly, while those of links without point detectors, e.g., $T_{12}^e$, $T_{34}^e$, and $T_{4d}^e$, are unknown. Nevertheless, point detector data tend to have good temporal sampling, since these detectors can generally collect the speeds of all vehicles passing them.
Obviously, interval and point detector data have distinct spatial and temporal characteristics. Fusing heterogeneous data from both interval and point detectors could be beneficial for estimating travel time distributions for the path and all links either with or without point detectors.

4.2. Proposed Heterogeneous Data Fusion Method

This section presents the proposed heterogeneous data fusion method for estimating the travel time distributions of the path and of all links, with or without point detectors. Empirical studies have found that travel times can be well represented by normal, gamma, or lognormal distributions [10,39]. Therefore, to simplify the problem and present the essential concept, all link and path travel time distributions are assumed to follow the normal distribution [57,58]. Under this normality assumption, the proposed method estimates the mean and STD of the travel time distributions of the path and all links.
As shown in Figure 2, the framework of the proposed heterogeneous data fusion method consists of three steps, described in detail in the following sections. The first step, data preprocessing, estimates path travel time distributions from interval and point detector data separately. The second step, distribution fusion, fuses the estimated path travel time distributions using D-S evidence theory. The last step, posterior update, updates the link travel time distributions and their correlations based on the fused distribution.

4.2.1. Data Preprocessing Step

This step estimates path travel time distributions from interval and point detector data independently. The path travel time distribution $T_{int}^{od}$ can be estimated directly from interval detector data. Since interval detectors record only vehicles equipped with electronic tags, the limited sample size becomes a critical issue in the estimation, especially when market penetration is low. Outlier observations can also significantly affect the accuracy of the path travel time distribution; for example, some vehicles may stop or detour along the path, producing atypical travel times. To estimate the path travel time distribution from interval detector data, this study adopted the data filtering algorithm proposed by Dion and Rakha [34]. This algorithm uses a series of low-pass filters to remove outlier observations outside a dynamically varying validity window. It performs well in both stable and unstable traffic conditions at low levels of market penetration, and has been successfully applied in the real-time traveler information system (RTIS) in Hong Kong [14]. Thus, accurate and robust estimates of the mean, $t_{int}^{od}$, and STD, $\sigma_{int}^{od}$, of the path travel time distribution can be obtained from interval detector data.
As discussed above, the path travel time distribution cannot be estimated directly from point detector data, because only a few links have point detectors. To estimate it, links are divided into those with and without point detectors, so that the vector of mean travel times at time interval $\tau$ comprises two parts, $\mathbf{t}_{poi} = [\mathbf{t}_{poi}^{r}, \mathbf{t}_{poi}^{e}]^T$, where $\mathbf{t}_{poi}^{r}$ and $\mathbf{t}_{poi}^{e}$ are the mean travel times for links with and without point detectors, respectively. The variance-covariance matrix is divided into four sub-matrices,
$$\mathbf{K}_{poi} = \begin{bmatrix} \mathbf{K}_{poi}^{rr} & \mathbf{K}_{poi}^{re} \\ \mathbf{K}_{poi}^{er} & \mathbf{K}_{poi}^{ee} \end{bmatrix},$$
where $\mathbf{K}_{poi}^{rr}$ is the variance-covariance matrix for links with point detectors; $\mathbf{K}_{poi}^{ee}$ is the variance-covariance matrix for links without point detectors; $\mathbf{K}_{poi}^{er}$ is the covariance matrix between links without and with point detectors; and $\mathbf{K}_{poi}^{re} = (\mathbf{K}_{poi}^{er})^T$ is the covariance matrix between links with and without point detectors. Let $\mathbf{v}_{poi}^{r}$ and $\mathbf{v}_{poi}^{e}$ be the vectors of travel time variances for links with and without point detectors, respectively; they are the diagonal elements of $\mathbf{K}_{poi}^{rr}$ and $\mathbf{K}_{poi}^{ee}$, respectively.
For a link $a_i^r$ with a point detector, the mean, $t_i^r$, and STD, $\sigma_i^r$, of its travel time distribution can be obtained from the data collected in the current time interval $\tau$; that is, $\mathbf{t}_{poi}^{r}$ and $\mathbf{K}_{poi}^{rr}$ can be determined from point detector data. However, the mean travel times for links without point detectors, $\mathbf{t}_{poi}^{e}$, must be estimated indirectly. Following Tam and Lam [14], $\mathbf{t}_{poi}^{e}$ is estimated using the spatial correlations between links with and without point detectors:
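$$\mathbf{t}_{poi}^{e} = \mathbf{t}_{upd}^{e,\tau-1} + \mathbf{K}_{upd}^{er,\tau-1} (\mathbf{K}_{poi}^{rr})^{-1} (\mathbf{t}_{poi}^{r} - \mathbf{t}_{upd}^{r,\tau-1}) \tag{6}$$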
where $\mathbf{t}_{upd}^{r,\tau-1}$ and $\mathbf{t}_{upd}^{e,\tau-1}$ are the mean travel times for links with and without point detectors, respectively, estimated at the previous time interval $\tau-1$; $\mathbf{K}_{upd}^{er,\tau-1}$ is the covariance matrix between links without and with point detectors estimated at the previous time interval $\tau-1$; and $(\mathbf{K}_{poi}^{rr})^{-1}$ is the inverse of $\mathbf{K}_{poi}^{rr}$.
Similar to Equation (6), $\mathbf{v}_{poi}^{e}$ is also estimated indirectly using the spatial correlations between links with and without point detectors:
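$$\mathbf{v}_{poi}^{e} = \mathbf{v}_{upd}^{e,\tau-1} + \mathbf{K}_{upd}^{er,\tau-1} (\mathbf{K}_{poi}^{rr})^{-1} (\mathbf{v}_{poi}^{r} - \mathbf{v}_{upd}^{r,\tau-1}) \tag{7}$$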
where $\mathbf{v}_{upd}^{r,\tau-1}$ and $\mathbf{v}_{upd}^{e,\tau-1}$ are the travel time variances of links with and without point detectors, respectively, at the previous time interval $\tau-1$. Therefore, the diagonal elements of $\mathbf{K}_{poi}^{ee}$ and all elements of $\mathbf{K}_{poi}^{rr}$ are estimated in the current time interval $\tau$. It is assumed that $(k_{poi}^{ee})_{ij} = (k_{upd}^{ee,\tau-1})_{ij}$ for $i \neq j$ and $\mathbf{K}_{poi}^{er} = \mathbf{K}_{upd}^{er,\tau-1}$; that is, the off-diagonal elements of $\mathbf{K}_{poi}^{ee}$ and all elements of $\mathbf{K}_{poi}^{er}$ are carried over from the previous interval $\tau-1$. These two matrices, $\mathbf{K}_{poi}^{ee}$ and $\mathbf{K}_{poi}^{er}$, are updated in the posterior update step (Section 4.2.3).
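The imputation in Equations (6) and (7) can be sketched as follows. This is an illustrative pure-Python version, assuming small dense matrices stored as lists of rows; it solves a linear system by Gaussian elimination instead of forming $(\mathbf{K}_{poi}^{rr})^{-1}$ explicitly, which is numerically preferable. The helper names `solve` and `impute` and all numbers in the example are hypothetical.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def impute(prev_e, K_er_prev, K_rr, cur_r, prev_r):
    """prev_e + K_er_prev K_rr^{-1} (cur_r - prev_r), as in Equations (6)-(7).

    Pass means (t) to obtain t_poi^e, or variances (v) to obtain v_poi^e.
    """
    gain = solve(K_rr, [c - p for c, p in zip(cur_r, prev_r)])
    return [pe + sum(k * g for k, g in zip(row, gain))
            for pe, row in zip(prev_e, K_er_prev)]


# Two links with detectors, one link without; all values are hypothetical.
K_rr = [[16.0, 4.0], [4.0, 25.0]]   # K_poi^rr at interval tau
K_er_prev = [[8.0, 5.0]]            # K_upd^{er, tau-1}
t_e = impute([50.0], K_er_prev, K_rr, cur_r=[38.0, 45.0], prev_r=[30.0, 45.0])
print(t_e)  # the undetected link's mean rises because detector link 0 slowed down
```

When the detector readings do not change between intervals, the correction term vanishes and the previous estimate is carried over unchanged.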
Once $\mathbf{t}_{poi} = [\mathbf{t}_{poi}^{r}, \mathbf{t}_{poi}^{e}]^T$ and $\mathbf{K}_{poi}$ are determined, the mean, $t_{poi}^{od}$, and STD, $\sigma_{poi}^{od}$, of the path travel time distribution can be calculated. The vector of link-path incidence variables is divided into two groups, $\mathbf{X} = [\mathbf{X}_{poi}^{r}, \mathbf{X}_{poi}^{e}]^T$, where $\mathbf{X}_{poi}^{r}$ and $\mathbf{X}_{poi}^{e}$ are the link-path incidence variables for links with and without point detectors, respectively. Using Equations (4)–(7), $t_{poi}^{od}$ and $\sigma_{poi}^{od}$ can then be expressed as:
$$t_{poi}^{od} = (\mathbf{X}_{poi}^{r})^T \mathbf{t}_{poi}^{r} + (\mathbf{X}_{poi}^{e})^T \mathbf{t}_{poi}^{e} \tag{8}$$
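$$(\sigma_{poi}^{od})^2 = (\mathbf{X}_{poi}^{r})^T \mathbf{K}_{poi}^{rr} \mathbf{X}_{poi}^{r} + (\mathbf{X}_{poi}^{e})^T \mathbf{K}_{poi}^{ee} \mathbf{X}_{poi}^{e} + 2 (\mathbf{X}_{poi}^{e})^T \mathbf{K}_{poi}^{er} \mathbf{X}_{poi}^{r} \tag{9}$$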
Substituting Equation (6) into Equation (8), the mean travel time can be expressed as:
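$$t_{poi}^{od} = (\mathbf{X}_{poi}^{r})^T \mathbf{t}_{poi}^{r} + (\mathbf{X}_{poi}^{e})^T \mathbf{t}_{upd}^{e,\tau-1} + (\mathbf{X}_{poi}^{e})^T \mathbf{K}_{upd}^{er,\tau-1} (\mathbf{K}_{poi}^{rr})^{-1} (\mathbf{t}_{poi}^{r} - \mathbf{t}_{upd}^{r,\tau-1}) \tag{10}$$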
Therefore, the path travel time distribution, $T_{poi}^{od}$, can be determined from point detector data.
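Equations (8) and (9) evaluate Equations (4) and (5) block by block. The sketch below, with hypothetical function names and invented numbers, computes $t_{poi}^{od}$ and $\sigma_{poi}^{od}$ from the partitioned quantities and checks that the variance matches the unpartitioned quadratic form $\mathbf{X}^T \mathbf{K} \mathbf{X}$.

```python
import math


def quad(a, M, b):
    """Return a^T M b for vectors a, b and a matrix M given as lists of rows."""
    return sum(a[i] * M[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def poi_path_moments(x_r, x_e, t_r, t_e, K_rr, K_ee, K_er):
    """Mean and STD of T_poi^od from the partitioned blocks (Equations (8)-(9))."""
    mean = (sum(x * t for x, t in zip(x_r, t_r))
            + sum(x * t for x, t in zip(x_e, t_e)))
    var = quad(x_r, K_rr, x_r) + quad(x_e, K_ee, x_e) + 2 * quad(x_e, K_er, x_r)
    return mean, math.sqrt(var)


# Path covering two detector links (r) and one undetected link (e); hypothetical.
x_r, x_e = [1, 1], [1]
K_rr = [[16.0, 4.0], [4.0, 25.0]]
K_ee = [[36.0]]
K_er = [[8.0, 5.0]]
mean, std = poi_path_moments(x_r, x_e, [38.0, 45.0], [53.75], K_rr, K_ee, K_er)

# Same path with the full matrix [[K_rr, K_re], [K_er, K_ee]] and X = [1, 1, 1].
K_full = [[16.0, 4.0, 8.0], [4.0, 25.0, 5.0], [8.0, 5.0, 36.0]]
assert abs(std ** 2 - quad([1, 1, 1], K_full, [1, 1, 1])) < 1e-9
print(mean, std)
```

The factor 2 on the cross term accounts for both off-diagonal blocks, $\mathbf{K}_{poi}^{er}$ and $\mathbf{K}_{poi}^{re} = (\mathbf{K}_{poi}^{er})^T$.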

4.2.2. Distribution Fusion Step

This step fuses the two path travel time distributions, $T_{int}^{od}$ and $T_{poi}^{od}$, estimated from interval and point detectors, respectively, using a fusion algorithm built on D-S evidence theory. In this study, the frame of discernment, $\Omega$, is defined as a set of mutually exclusive travel time ranges, $\{S_1, \ldots, S_i, \ldots, S_n\}$, where each range $S_i = [l_i, u_i]$ is defined by a lower bound $l_i$ and an upper bound $u_i$. The mean travel time for range $S_i$ is taken as its midpoint:
$$E(S_i) = \frac{u_i + l_i}{2}$$
The path travel time distributions estimated from interval and point detectors can be regarded as two independent pieces of evidence. Based on the defined travel time ranges, these two distributions are discretized into discrete distributions, i.e., histograms, as illustrated in Figure 3a. The resulting discrete distributions, $m_{int}$ and $m_{poi}$, are modelled as the BPAs for $T_{int}^{od}$ and $T_{poi}^{od}$, respectively. Then, $m(S_i)$ (either $m_{int}(S_i)$ or $m_{poi}(S_i)$) is the probability of travel time range $S_i$ and can be expressed as:
$$m(S_i) = \int_{l_i}^{u_i} f(x)\, dx$$
where $f(x)$ is the probability density function of $T^{od}$ (either $T_{int}^{od}$ or $T_{poi}^{od}$). When path travel time distributions follow normal distributions, $m(S_i)$ can be expressed as:
$$m(S_i) = \Phi_{snd}\left(\frac{u_i - t^{od}}{\sigma^{od}}\right) - \Phi_{snd}\left(\frac{l_i - t^{od}}{\sigma^{od}}\right)$$
where $\Phi_{snd}(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution. Hart’s formula [59] provides a good numerical approximation of $\Phi_{snd}(\cdot)$:
$$\Phi_{snd}(x) \approx \frac{1}{2} - \frac{1}{x\sqrt{2\pi}} \left\{ e^{-x^2/2} - \left[ \frac{\pi x^2}{2} + \frac{(1 + 0.282455\, x^2)^{1/2}}{1 + 0.212024\, x^2}\, e^{-x^2/2} \right]^{1/2} \right\}$$
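Hart’s approximation is easy to check numerically. The sketch below codes the approximation quoted in this section and compares it with the exact standard normal CDF computed from Python’s `math.erf`. One guard is our own addition, not part of the formula: the expression is 0/0 at $x = 0$ and loses precision to cancellation very close to zero, so for $|x| < 10^{-6}$ the sketch falls back to the first-order Taylor expansion $\Phi_{snd}(x) \approx 1/2 + x/\sqrt{2\pi}$.

```python
import math


def hart_cdf(x):
    """Hart's approximation of the standard normal CDF."""
    if abs(x) < 1e-6:
        # Guard (not part of Hart's formula): avoid 0/0 and cancellation near 0.
        return 0.5 + x / math.sqrt(2.0 * math.pi)
    e = math.exp(-x * x / 2.0)
    g = math.sqrt(1.0 + 0.282455 * x * x) / (1.0 + 0.212024 * x * x)
    root = math.sqrt(math.pi * x * x / 2.0 + g * e)
    return 0.5 - (e - root) / (x * math.sqrt(2.0 * math.pi))


def exact_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, -0.5, 0.0, 1.0, 1.96):
    print(f"{x:6.2f}  hart={hart_cdf(x):.6f}  exact={exact_cdf(x):.6f}")
```

Because the formula depends on $x$ only through $x^2$ except for the $1/x$ factor, it satisfies $\Phi_{snd}(-x) = 1 - \Phi_{snd}(x)$ exactly, as the true CDF does.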
The BPA $m$ satisfies $m(\phi) = 0$ and, provided the travel time ranges jointly cover practically all of the probability mass, $\sum_{S_i \in \Omega} m(S_i) = 1$.
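The discretization into a histogram-shaped BPA can be sketched with `statistics.NormalDist` (Python 3.8+), whose `cdf` method gives the normal CDF directly. The ranges below are those of the Figure 3 example; the mean of 12.5 and STD of 2 are hypothetical, as are the function names. Because the normal distribution has unbounded support, the masses on finite ranges sum to slightly less than one, which is why the ranges should cover practically all of the probability mass.

```python
from statistics import NormalDist


def discretize(mean, std, ranges):
    """m(S_i) = Phi((u_i - mean) / std) - Phi((l_i - mean) / std) for each range."""
    dist = NormalDist(mean, std)
    return [dist.cdf(u) - dist.cdf(l) for l, u in ranges]


def range_midpoints(ranges):
    """E(S_i) = (u_i + l_i) / 2."""
    return [(l + u) / 2.0 for l, u in ranges]


ranges = [(5, 8), (8, 11), (11, 14), (14, 17), (17, 20)]
m = discretize(12.5, 2.0, ranges)
print([round(v, 4) for v in m], round(sum(m), 5))
print(range_midpoints(ranges))
```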
Figure 3 illustrates three typical situations of conflict between the interval detector evidence and the point detector evidence. The two path travel time distributions are discretized into five travel time ranges, (5, 8), (8, 11), (11, 14), (14, 17), and (17, 20), which constitute the frame of discernment $\Omega = \{S_1, S_2, S_3, S_4, S_5\}$. The corresponding BPAs of the two path travel time distributions are shown in Table 1. Figure 3a shows Case 1, in which the two bodies of evidence have a high belief level and a low degree of conflict, with largely overlapping histograms. Figure 3b shows Case 2, in which the two bodies of evidence have a low belief level and a high degree of conflict, with only a small overlap between the histograms. Figure 3c shows Case 3, in which the two bodies of evidence are in complete conflict, with no overlap between the histograms.
Table 1 shows the results of evidence fusion using Dempster’s combination rule, Equation (1). This rule provides a good estimate of the path travel time distribution for Case 1, which has a relatively low conflict factor, $\eta = 1 - \sum_{i=1}^{5} m_{int}(S_i) \times m_{poi}(S_i) = 1 - (0.2 \times 0.3 + 0.4 \times 0.4 + 0.2 \times 0.3) = 0.72$. The fused BPA is calculated as $m_{fus}(S_i) = m_{int}(S_i) \times m_{poi}(S_i) / (1 - \eta)$ (e.g., $m_{fus}(S_3) = 0.4 \times 0.4 / (1 - 0.72) = 0.5714$). After fusion, the travel time ranges $S_2$, $S_3$, and $S_4$, which are supported by both bodies of evidence, are strengthened in a reasonable way.
However, for Case 2, which has a high conflict factor, $\eta = 1 - 0.1 \times 0.1 = 0.99$, Dempster’s combination rule leads to a counterintuitive fusion result, $m_{fus}(S_3) = (0.1 \times 0.1)/(1 - 0.99) = 1$, even though both bodies of evidence give little support to $S_3$. This situation is known in the literature as Zadeh’s paradox. Further, Dempster’s combination rule cannot be applied to Case 3, the complete conflict situation: $m_{int}$ and $m_{poi}$ cannot be fused because $\eta = 1$, so the normalization factor $1/(1 - \eta)$ is undefined.
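When every focal element is a single travel time range, as in Figure 3, Dempster’s rule reduces to elementwise products. The sketch below reproduces the Case 1 arithmetic and shows the behavior in Cases 2 and 3. Only $m_{int}(S_2), m_{int}(S_3), m_{int}(S_4) = 0.2, 0.4, 0.2$, $m_{poi}(S_2), m_{poi}(S_3), m_{poi}(S_4) = 0.3, 0.4, 0.3$, and the 0.1 overlap in Case 2 are taken from the text; the remaining masses are hypothetical fillers placed on ranges where the other BPA has zero mass, so they do not affect $\eta$ or the fused values. The function name is also hypothetical.

```python
def dempster_histograms(m1, m2, tol=1e-12):
    """Dempster's rule for BPAs whose focal elements are single ranges S_i.

    m1, m2: masses on S_1..S_n (each list sums to 1).
    Returns (fused masses, conflict factor eta).
    """
    agree = [a * b for a, b in zip(m1, m2)]  # S_i meets only S_i; other pairs conflict
    support = sum(agree)                      # equals 1 - eta
    if support <= tol:
        raise ValueError("complete conflict: 1 / (1 - eta) is undefined")
    return [a / support for a in agree], 1.0 - support


# Case 1 (the S1 and S5 masses of m_int are hypothetical fillers)
case1, eta1 = dempster_histograms([0.1, 0.2, 0.4, 0.2, 0.1], [0.0, 0.3, 0.4, 0.3, 0.0])
# Case 2 (only the 0.1 overlap on S3 is from the text)
case2, eta2 = dempster_histograms([0.0, 0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.9, 0.0])
print(round(eta1, 4), [round(v, 4) for v in case1])
print(round(eta2, 4), [round(v, 4) for v in case2])  # all mass jumps to S3
```

Calling the function with two histograms that share no range (Case 3) raises `ValueError`, mirroring the undefined normalization factor.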
To reduce the degree of conflict, this study adopts a generalized combination rule that introduces the unknown state into the frame of discernment, $\Omega = \{S_1, \ldots, S_i, \ldots, S_n, \Theta\}$. To construct a BPA $m$ (either $m_{int}$ or $m_{poi}$), a predefined small probability $\alpha_\Theta = m(\Theta)$ (e.g., $\alpha_\Theta = 0.05$) is assigned to the unknown state $\Theta$. The path travel time distribution between $t^{od} + \Phi_{snd}^{-1}(\alpha_\Theta/2)\,\sigma^{od}$ and $t^{od} + \Phi_{snd}^{-1}(1 - \alpha_\Theta/2)\,\sigma^{od}$ is then discretized to obtain $m(S_i)$, where $\Phi_{snd}^{-1}(\cdot)$ is the inverse CDF of the standard normal distribution (e.g., $\Phi_{snd}^{-1}(0.025) = -1.96$ and $\Phi_{snd}^{-1}(0.975) = 1.96$).
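One way to build such a BPA is sketched below using `statistics.NormalDist` (Python 3.8+), whose `inv_cdf` method gives $\Phi_{snd}^{-1}$. The function name is hypothetical, and clipping each range to the central interval is our reading of the procedure: if the ranges cover that interval, the range masses sum to $1 - \alpha_\Theta$ and the whole BPA sums to one.

```python
from statistics import NormalDist


def bpa_with_unknown(mean, std, ranges, alpha=0.05):
    """Return (m(S_1..S_n), m(Theta)) for a normal path travel time.

    Mass alpha goes to the unknown state Theta; the central (1 - alpha)
    interval of N(mean, std^2) is discretized over the ranges.
    """
    z = NormalDist()  # standard normal
    lo = mean + z.inv_cdf(alpha / 2) * std
    hi = mean + z.inv_cdf(1 - alpha / 2) * std
    dist = NormalDist(mean, std)
    masses = []
    for l, u in ranges:
        a, b = max(l, lo), min(u, hi)  # part of S_i that lies inside [lo, hi]
        masses.append(dist.cdf(b) - dist.cdf(a) if b > a else 0.0)
    return masses, alpha


ranges = [(5, 8), (8, 11), (11, 14), (14, 17), (17, 20)]
m, m_theta = bpa_with_unknown(12.5, 2.0, ranges)  # hypothetical mean and STD
print([round(v, 4) for v in m], m_theta)
```

With these values the central 95% interval is roughly [8.58, 16.42], so the outer ranges (5, 8) and (17, 20) receive no mass.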
High and complete conflict usually arise from differences in data quality between detectors. To differentiate data sources of varying quality, this study adopts an information quality parameter [48] that assigns higher weight to data sources with better information quality. Let $w_{int}$ and $w_{poi}$ be the information quality weights for the path travel time distributions from interval and point detectors, respectively. In this study, $w_{int}$ and $w_{poi}$ are expressed as functions of sample size and travel time variance:
$$w_{int} = 1 - (1 - \beta_{int})^{N_{int}/(\sigma_{int}^{od})^2}$$
$$w_{poi} = 1 - (1 - \beta_{poi})^{N_{poi}/(\sigma_{poi}^{od})^2}$$
where $N_{int}$ is the sample size collected by the interval detectors; $N_{poi}$ is the average sample size of all point detectors along the path; and $\beta_{int}$ and $\beta_{poi}$ are sensitivity parameters for interval and point detectors, respectively, which should be calibrated independently. Other information quality functions could also be used in practice.
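The behavior of the information quality weights can be checked with a short sketch. It assumes the weight reads $w = 1 - (1 - \beta)^{N/\sigma^2}$ (our reading of the formula above); the function name and the example values of $\beta$, $N$, and $\sigma^2$ are hypothetical. Under this form, for $0 < \beta < 1$ the weight lies in $[0, 1)$, increases with sample size, and decreases with travel time variance.

```python
def info_quality_weight(beta, n, var):
    """w = 1 - (1 - beta) ** (n / var), assuming 0 < beta < 1, n >= 0, var > 0.

    beta: sensitivity parameter (calibrated per detector type),
    n: sample size, var: travel time variance of the path distribution.
    """
    if not 0.0 < beta < 1.0 or n < 0 or var <= 0.0:
        raise ValueError("expected 0 < beta < 1, n >= 0 and var > 0")
    return 1.0 - (1.0 - beta) ** (n / var)


# Hypothetical values: 40 tagged vehicles vs. 300 point-detector samples.
w_int = info_quality_weight(0.1, 40, 16.0)
w_poi = info_quality_weight(0.01, 300, 25.0)
print(round(w_int, 3), round(w_poi, 3))
```

Since $\beta$ is calibrated separately for each detector type, a source with many samples (here the point detectors) does not automatically receive the larger weight.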
Given the weights $w_{int}$ and $w_{poi}$, each BPA $m$ (either $m_{int}$ or $m_{poi}$) is adjusted using the following formula [48]:
$$\bar{m} = \begin{cases} \bar{m}(S_i) = \dfrac{w}{w_{max}}\, m(S_i), & S_i \in \Omega \\ \bar{m}(\Theta) = 1 - \sum_{i=1}^{n} \bar{m}(S_i) \end{cases}$$
where $w$ is the weight of the source being adjusted ($w_{int}$ or $w_{poi}$) and $w_{max} = \max(w_{int}, w_{poi})$ is the larger of the two weights. Substituting the adjusted BPAs into Equations (1) and (2), the fused BPA, $m_{fus}$, is determined by the generalized combination rule:
$$m_{fus} = \begin{cases} m_{fus}(S_i) = \dfrac{\bar{m}_{int}(S_i)\,\bar{m}_{poi}(S_i) + \bar{m}_{int}(S_i)\,\bar{m}_{poi}(\Theta) + \bar{m}_{int}(\Theta)\,\bar{m}_{poi}(S_i)}{1 - \eta}, & S_i \in \Omega \\ m_{fus}(\Theta) = \dfrac{\bar{m}_{int}(\Theta)\,\bar{m}_{poi}(\Theta)}{1 - \eta} \\ \eta = 1 - \bar{m}_{int}(\Theta)\,\bar{m}_{poi}(\Theta) - \sum_{i=1}^{n} \left[ \bar{m}_{int}(S_i)\,\bar{m}_{poi}(S_i) + \bar{m}_{int}(\Theta)\,\bar{m}_{poi}(S_i) + \bar{m}_{poi}(\Theta)\,\bar{m}_{int}(S_i) \right] \end{cases} \tag{16}$$
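The BPA adjustment and the generalized combination rule can be sketched as below, assuming each BPA is stored as a list of masses on $S_1, \ldots, S_n$ plus a separate mass on $\Theta$; the function names and the two histograms are hypothetical. $\Theta$ acts as total ignorance, so its intersection with any $S_i$ is $S_i$, and only pairs of different ranges conflict. The example uses the weights $w_{int} = 0.8$ and $w_{poi} = 0.6$ from the worked example, recovers $\bar{m}_{poi}(\Theta) = 1 - 0.95 \times 0.6/0.8$, and fuses two completely conflicting histograms, which Dempster’s rule alone cannot do.

```python
def adjust(m_ranges, w, w_max):
    """Scale range masses by w / w_max and move the removed mass to Theta."""
    scaled = [w / w_max * m for m in m_ranges]
    return scaled, 1.0 - sum(scaled)


def combine_with_unknown(a, a_theta, b, b_theta):
    """Generalized combination rule with an unknown state Theta.

    S_i receives mass from the (S_i, S_i), (S_i, Theta) and (Theta, S_i) pairs;
    Theta receives mass only from (Theta, Theta). Returns (fused, fused_theta, eta).
    """
    raw = [x * y + x * b_theta + a_theta * y for x, y in zip(a, b)]
    raw_theta = a_theta * b_theta
    support = sum(raw) + raw_theta  # equals 1 - eta
    eta = 1.0 - support
    return [r / support for r in raw], raw_theta / support, eta


w_int, w_poi = 0.8, 0.6  # weights from the worked example
w_max = max(w_int, w_poi)
# Completely conflicting histograms (hypothetical), each with m(Theta) = 0.05.
m_int, m_poi = [0.95, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.95]
a, a_theta = adjust(m_int, w_int, w_max)  # unchanged because w_int = w_max
b, b_theta = adjust(m_poi, w_poi, w_max)  # b_theta = 1 - 0.95 * 0.6 / 0.8
fused, fused_theta, eta = combine_with_unknown(a, a_theta, b, b_theta)
print(round(b_theta, 4), round(eta, 4), [round(v, 4) for v in fused], round(fused_theta, 4))
```

Because $\bar{m}_{int}(\Theta)\,\bar{m}_{poi}(\Theta) > 0$, the normalization $1 - \eta$ stays positive even under complete conflict, and the range favored by the higher-weight source receives most of the fused mass.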
Table 2 illustrates distribution fusion with the generalized combination rule using the same example as in Table 1. In this example, $m_{int}(\Theta) = m_{poi}(\Theta) = 0.05$, and the two BPAs, $m_{int}$ and $m_{poi}$, are modified accordingly. Information quality weights $w_{int} = 0.8$ and $w_{poi} = 0.6$ are used for interval and point detectors, respectively. In all cases, $m_{poi}$ is therefore adjusted to $\bar{m}_{poi}(S_i) = m_{poi}(S_i) \times 0.6/0.8$, $S_i \in \Omega$, and $\bar{m}_{poi}(\Theta) = 1 - \sum_{i=1}^{5} \bar{m}_{poi}(S_i) = 1 - 0.95 \times 0.6/0.8 \approx 0.288$. The generalized combination rule, Equation (16), is then applied to fuse the path travel time distributions.
Table 2 shows that the generalized combination rule gives a reasonable outcome for Case 1 (the low conflict situation). More importantly, it handles the distribution fusion problem well for Case 2 (the high conflict situation). Introducing $\Theta$ significantly reduces the conflict factor $\eta$ to 0.6614. The probability of $S_3$, which has little support from either body of evidence, is only slightly strengthened: $m_{fus}(S_3) = (0.075 \times 0.056 + 0.075 \times 0.288 + 0.05 \times 0.056)/(1 - 0.6614) \approx 0.0845$. The probabilities of the other travel time ranges, $S_1$, $S_2$, $S_4$, and $S_5$, are reduced, but higher weight is given to the data source with better data quality (i.e., $m_{int}$). The generalized combination rule also handles Case 3 (the complete conflict situation), which cannot be fused using Dempster’s combination rule.
From the fused BPA $m_{fus}$, the corresponding mean and STD can be expressed as:
$$\theta = \frac{1}{1 - m_{fus}(\Theta)}$$
$$t_{fus}^{od} = \sum_{i=1}^{n} \theta\, m_{fus}(S_i)\, E(S_i)$$
$$\sigma_{fus}^{od} = \sqrt{\sum_{i=1}^{n} \left(E(S_i) - t_{fus}^{od}\right)^{2} \theta\, m_{fus}(S_i)}$$
where θ is the adjustment parameter that redistributes the probability of the unknown state among the travel time ranges in proportion to their fused masses. Thus, the proposed D-S distribution fusion algorithm can estimate path travel time distributions by fusing the two path travel time distributions from interval and point detector data, even in cases of extreme conflict between the data sets.
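The following Python sketch computes θ, $t_{fus}^{od}$, and $\sigma_{fus}^{od}$ from a fused BPA. It assumes, for illustration only, that $E(S_i)$ is the midpoint of travel time range $S_i$; the BPA values, range centers, key name, and function name are hypothetical.

```python
import math
from typing import Dict, Tuple

THETA = "Theta"  # hypothetical key for the unknown state Θ


def fused_mean_std(fused: Dict[str, float], centers: Dict[str, float]) -> Tuple[float, float]:
    """Mean and STD of a fused BPA, spreading m_fus(Θ) over the ranges via θ."""
    theta = 1.0 / (1.0 - fused[THETA])  # adjustment parameter θ
    mean = sum(theta * fused[s] * centers[s] for s in centers)
    var = sum((centers[s] - mean) ** 2 * theta * fused[s] for s in centers)
    return mean, math.sqrt(var)


# Illustrative fused BPA over three ranges; E(S_i) assumed to be range midpoints (s).
fused = {"S1": 0.2, "S2": 0.5, "S3": 0.2, THETA: 0.1}
centers = {"S1": 300.0, "S2": 360.0, "S3": 420.0}
mean, std = fused_mean_std(fused, centers)
print(round(mean, 6), round(std, 6))  # 360.0 40.0
```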

4.2.3. Posterior Update Step

This step updates the link travel time distributions and their spatial correlations based on the fused path travel time distribution. An optimization technique is proposed to update the travel time means (i.e., $\mathbf{t}_{poi} = [\mathbf{t}_{poi}^{r}, \mathbf{t}_{poi}^{e}]^{T}$) and the variance-covariance matrix (i.e., $\mathbf{K}_{poi} = \begin{bmatrix} \mathbf{K}_{poi}^{rr} & \mathbf{K}_{poi}^{re} \\ \mathbf{K}_{poi}^{er} & \mathbf{K}_{poi}^{ee} \end{bmatrix}$) estimated in the data preprocessing step.
Let $\mathbf{t}_{upd} = [\mathbf{t}_{upd}^{r}, \mathbf{t}_{upd}^{e}]^{T}$ and $\mathbf{K}_{upd} = \begin{bmatrix} \mathbf{K}_{upd}^{rr} & \mathbf{K}_{upd}^{re} \\ \mathbf{K}_{upd}^{er} & \mathbf{K}_{upd}^{ee} \end{bmatrix}$ be the updated travel time means and covariance matrix, respectively, where $(k_{upd}^{ee})_{ij}$ is the element at row $i$ and column $j$ of $\mathbf{K}_{upd}^{ee}$. This study uses $\mathbf{t}_{upd}^{r} = \mathbf{t}_{poi}^{r}$ and $\mathbf{K}_{upd}^{rr} = \mathbf{K}_{poi}^{rr}$, because $\mathbf{t}_{poi}^{r}$ and $\mathbf{K}_{poi}^{rr}$ are obtained directly from point detector data and are assumed to be accurate. Therefore, only the $\mathbf{K}_{upd}^{ee}$ and $\mathbf{K}_{upd}^{er}$ sub-matrices of the link travel time covariance matrix need to be updated, since $\mathbf{K}_{upd}^{re} = (\mathbf{K}_{upd}^{er})^{T}$. Accordingly, the problem of updating the spatial correlations is formulated as the following nonlinear programming problem:
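$$
\text{M1:}\quad \min \; \sum_{i}\sum_{j}\left((k_{upd}^{ee})_{ij} - (k_{poi}^{ee})_{ij}\right)^{2} + \sum_{i}\sum_{j} 2\left((k_{upd}^{er})_{ij} - (k_{poi}^{er})_{ij}\right)^{2} \tag{22}
$$
Subject to:
$$
t_{fus}^{od} = (\mathbf{X}_{poi}^{r})^{T}\mathbf{t}_{upd}^{r} + (\mathbf{X}_{poi}^{e})^{T}\mathbf{t}_{upd}^{e,\ell-1} + (\mathbf{X}_{poi}^{e})^{T}\mathbf{K}_{upd}^{er}(\mathbf{K}_{poi}^{rr})^{-1}\left(\mathbf{t}_{poi}^{r} - \mathbf{t}_{upd}^{r,\ell-1}\right) \tag{23}
$$
$$
(\sigma_{fus}^{od})^{2} = (\mathbf{X}_{poi}^{r})^{T}\mathbf{K}_{poi}^{rr}\mathbf{X}_{poi}^{r} + (\mathbf{X}_{poi}^{e})^{T}\mathbf{K}_{upd}^{ee}\mathbf{X}_{poi}^{e} + 2(\mathbf{X}_{poi}^{e})^{T}\mathbf{K}_{upd}^{er}\mathbf{X}_{poi}^{r} \tag{24}
$$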
Problem M1 has a convex objective function and two linear constraints. To keep $\mathbf{K}_{upd}$ stable over time, objective function (22) minimizes the total squared difference between the updated and prior elements of the $\mathbf{K}_{upd}^{ee}$ and $\mathbf{K}_{upd}^{er}$ sub-matrices; the factor of 2 counts each element of $\mathbf{K}_{upd}^{er}$ once more for its transpose in $\mathbf{K}_{upd}^{re}$. Constraints (23) and (24), derived from Equations (9) and (10), ensure that the path travel time mean and variance implied by the corresponding link travel time distributions equal those of the fused path travel time distribution, i.e., $t_{fus}^{od}$ and $(\sigma_{fus}^{od})^{2}$. M1 is therefore a typical quadratic programming problem, and a unique solution can be determined using efficient algorithms, such as the quadprog function in MATLAB.
Once $\mathbf{K}_{upd}$ is determined, the vector of travel time means for links without point detectors, $\mathbf{t}_{upd}^{e}$, is updated as:
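$$
\mathbf{t}_{upd}^{e} = \mathbf{t}_{upd}^{e,\ell-1} + \mathbf{K}_{upd}^{er}(\mathbf{K}_{poi}^{rr})^{-1}\left(\mathbf{t}_{poi}^{r} - \mathbf{t}_{upd}^{r,\ell-1}\right) \tag{25}
$$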
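Because M1 has a separable quadratic objective and only the two equality constraints (23) and (24), it can also be solved in closed form from its KKT conditions rather than with a general solver such as quadprog. The pure-Python sketch below takes that route (a swapped-in technique, not the authors' code) and then applies Equation (25). It assumes that $\mathbf{X}_{poi}^{r}$ and $\mathbf{X}_{poi}^{e}$ are 0/1 link-path incidence vectors, stores matrices as nested lists, and uses hypothetical function names and illustrative numbers. Like M1 as stated, it does not enforce positive semi-definiteness of the updated covariance matrix.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def quad(u: Vec, K: Mat, v: Vec) -> float:
    """Bilinear form u^T K v."""
    return sum(u[i] * K[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def solve(A: Mat, b: Vec) -> Vec:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def update_spatial_correlations(K_ee0, K_er0, K_rr, x_e, x_r,
                                t_poi_r, t_r_prev, t_e_prev, t_fus, sigma_fus):
    """Solve M1 through its KKT system, then update t_upd^e with Eq. (25)."""
    ne, nr = len(x_e), len(x_r)
    # g = (K_poi^rr)^{-1} (t_poi^r - t_upd^{r,l-1})
    g = solve(K_rr, [a - b for a, b in zip(t_poi_r, t_r_prev)])
    # Decision vector z = [vec(K_ee), vec(K_er)], both row-major.
    a_mean = [0.0] * (ne * ne) + [x_e[i] * g[j] for i in range(ne) for j in range(nr)]
    a_var = ([x_e[i] * x_e[j] for i in range(ne) for j in range(ne)]
             + [2.0 * x_e[i] * x_r[j] for i in range(ne) for j in range(nr)])
    A = [a_mean, a_var]
    # Right-hand sides of (23) and (24), using t_upd^r = t_poi^r.
    b = [t_fus - dot(x_r, t_poi_r) - dot(x_e, t_e_prev),
         sigma_fus ** 2 - quad(x_r, K_rr, x_r)]
    z0 = [v for row in K_ee0 for v in row] + [v for row in K_er0 for v in row]
    w = [1.0] * (ne * ne) + [2.0] * (ne * nr)  # weights from objective (22)
    # KKT: z = z0 + W^{-1} A^T nu, with (A W^{-1} A^T) nu = b - A z0.
    resid = [bi - dot(ai, z0) for ai, bi in zip(A, b)]
    G = [[sum(ai[k] * aj[k] / w[k] for k in range(len(w))) for aj in A] for ai in A]
    nu = solve(G, resid)
    z = [z0[k] + (A[0][k] * nu[0] + A[1][k] * nu[1]) / w[k] for k in range(len(w))]
    K_ee = [z[i * ne:(i + 1) * ne] for i in range(ne)]
    off = ne * ne
    K_er = [z[off + i * nr: off + (i + 1) * nr] for i in range(ne)]
    # Eq. (25): conditional-mean update for links without point detectors.
    t_e = [t_e_prev[i] + dot(K_er[i], g) for i in range(ne)]
    return K_ee, K_er, t_e


# Illustrative path: 2 links with point detectors, 3 without, all on the path.
x_r, x_e = [1.0, 1.0], [1.0, 1.0, 1.0]
K_rr = [[400.0, 100.0], [100.0, 300.0]]
K_er0 = [[50.0, 50.0] for _ in range(3)]
K_ee0 = [[200.0, 20.0, 20.0], [20.0, 250.0, 20.0], [20.0, 20.0, 150.0]]
K_ee, K_er, t_e = update_spatial_correlations(
    K_ee0, K_er0, K_rr, x_e, x_r,
    t_poi_r=[60.0, 80.0], t_r_prev=[55.0, 70.0], t_e_prev=[40.0, 50.0, 45.0],
    t_fus=281.0, sigma_fus=48.0)
```

After the update, the path mean and variance recomputed from the link quantities match $t_{fus}^{od}$ and $(\sigma_{fus}^{od})^{2}$, which is exactly what constraints (23) and (24) require.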
The updated $\mathbf{t}_{upd}$ and $\mathbf{K}_{upd}$ are used to estimate the travel time distributions of links without point detectors in the subsequent time interval. The detailed steps of the proposed method are summarized in Algorithm 1.
Algorithm 1
Step 1. Data preprocessing stage:
Estimate $T_{int}^{od}$ from interval detector data at current interval ℓ.
Estimate $\mathbf{t}_{poi}^{r}$ and $\mathbf{K}_{poi}^{rr}$ for links with point detectors at current interval ℓ.
Deduce $\mathbf{t}_{poi}^{e}$ and $\mathbf{v}_{poi}^{e}$ for links without point detectors using Equations (6) and (7).
Estimate $T_{poi}^{od}$ using Equations (9) and (10).
Step 2. Distribution fusion stage:
Estimate $T_{fus}^{od}$ by fusing $T_{int}^{od}$ and $T_{poi}^{od}$ based on Equations (11)–(21).
Step 3. Posterior update stage:
Update $\mathbf{K}_{upd}$ using Equations (22)–(24), and update $\mathbf{t}_{upd}$ using Equation (25).
Set $\mathbf{K}_{upd}^{\ell-1} = \mathbf{K}_{upd}$ and $\mathbf{t}_{upd}^{\ell-1} = \mathbf{t}_{upd}$.
Go to Step 1 for the next time interval.

5. Numerical Experiments

The performance of the proposed heterogeneous data fusion method was investigated using real-world data from Hong Kong, as shown in Figure 4. A path from the Aberdeen Tunnel on Hong Kong Island to the Cross Harbor Tunnel (CHT) in the Kowloon urban area was selected for this case study. CHT is the most congested of the three tunnels connecting the Kowloon urban area and Hong Kong Island. The total travel distance of the chosen path was 3.7 km, with a free-flow travel time of 3.6 min. The chosen path comprised 11 links, only two of which (Links 1 and 5) were equipped with Autoscope video image detectors (VIDs), a popular type of point detector. Two AVI devices for automatic toll collection were installed at the beginning and end of the chosen path. The market penetration of the AVI system was approximately 40%. Real-time AVI data were also used to implement the real-time traveler information system (RTIS) in Hong Kong [14]; details of this AVI system are given in Tam and Lam [14].
Traffic data from both interval and point detectors were collected from 07:00 to 23:00 on a typical weekday (Wednesday, 20 August 2014). An offline link travel time covariance matrix was obtained from RTIS [14] as the initial $\mathbf{K}_{fus}$. To evaluate the performance of the proposed heterogeneous data fusion method, a manual license plate matching survey was performed. Video recording equipment was set up at the start and end nodes of the chosen path to record the license plates of vehicles, and the vehicles recorded at the two nodes were matched manually. The path travel times of matched vehicles were computed as the ground truth for accuracy validation.

5.1. Evaluation Metrics

Two widely accepted metrics, mean absolute percentage error (MAPE) and root mean square error (RMSE), were adopted to evaluate the accuracy of the estimated mean of path travel time distributions:
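$$MAPE_t = \frac{100\%}{n}\sum_{\ell=1}^{n}\frac{\left|t_{fus,\ell}^{od} - t_{obs,\ell}^{od}\right|}{t_{obs,\ell}^{od}}$$
$$RMSE_t = \sqrt{\frac{1}{n}\sum_{\ell=1}^{n}\left(t_{fus,\ell}^{od} - t_{obs,\ell}^{od}\right)^{2}}$$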
where $n$ is the number of time intervals in the period of interest, and $t_{obs,\ell}^{od}$ is the ground-truth mean travel time observed in the field survey at time interval ℓ. Smaller $MAPE_t$ and $RMSE_t$ values indicate higher accuracy of the estimated mean travel time.
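The MAPE and RMSE concepts were extended to evaluate the accuracy of the estimated STD of the path travel time:
$$MAPE_{\sigma} = \frac{100\%}{n}\sum_{\ell=1}^{n}\frac{\left|\sigma_{fus,\ell}^{od} - \sigma_{obs,\ell}^{od}\right|}{\sigma_{obs,\ell}^{od}}$$
$$RMSE_{\sigma} = \sqrt{\frac{1}{n}\sum_{\ell=1}^{n}\left(\sigma_{fus,\ell}^{od} - \sigma_{obs,\ell}^{od}\right)^{2}}$$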
where $\sigma_{obs,\ell}^{od}$ represents the ground-truth travel time STD observed in the field survey at time interval ℓ.
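A minimal Python sketch of the MAPE and RMSE definitions over per-interval series of length $n$; passing STD series instead of means gives $MAPE_{\sigma}$ and $RMSE_{\sigma}$. The function names and the three-interval data are illustrative only.

```python
import math
from typing import Sequence


def mape(est: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute percentage error (%) over n time intervals."""
    if len(est) != len(obs) or not obs:
        raise ValueError("need equal-length, non-empty series")
    return 100.0 * sum(abs(e - o) / o for e, o in zip(est, obs)) / len(obs)


def rmse(est: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error over n time intervals."""
    if len(est) != len(obs) or not obs:
        raise ValueError("need equal-length, non-empty series")
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


# Illustrative per-interval mean travel times (s); not the case-study data.
t_fus = [330.0, 300.0, 420.0]
t_obs = [300.0, 300.0, 400.0]
print(round(mape(t_fus, t_obs), 6))  # 5.0
print(round(rmse(t_fus, t_obs), 4))  # 20.8167
```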
For many transportation applications, it is meaningful to construct a travel time interval at a given confidence level from the estimated travel time distribution [60,61]. The accuracy of this travel time interval reflects the combined accuracy of the estimated mean and STD. Two metrics were adopted to evaluate it: the probability outside the predicted (estimated) time interval (POPI) and the probability outside the observed time interval (POOI) [62]. POPI measures the percentage of observed data outside the estimated travel time interval, while POOI measures the percentage of the estimated distribution outside the observed travel time interval.
Let $l_{fus} = \Phi_{fus}^{-1}(\alpha/2)$ and $u_{fus} = \Phi_{fus}^{-1}(1-\alpha/2)$ be the lower and upper bounds of the estimated travel time interval, respectively, at confidence level $1-\alpha$, where $\Phi_{fus}^{-1}(\cdot)$ is the inverse CDF of the estimated path travel time distribution. Then:
$$POPI = \frac{100\%}{n}\sum_{\ell=1}^{n}\left(1 - \frac{\Phi_{obs}(u_{fus}) - \Phi_{obs}(l_{fus})}{1-\alpha}\right)$$
where $\Phi_{obs}(\cdot)$ is the CDF of the observed travel time distribution. The POPI value ranges from 0 to 1, and a smaller POPI indicates that a larger proportion of the observed data is captured, i.e., higher accuracy of the estimated travel time interval. As noted by Shi et al. [62], the POPI metric is very useful but tends to be biased when travel time intervals are wide because of large STD errors.
As an alternative, the POOI metric measures the percentage of the estimated distribution outside the observed travel time interval. Let $l_{obs} = \Phi_{obs}^{-1}(\alpha/2)$ and $u_{obs} = \Phi_{obs}^{-1}(1-\alpha/2)$ denote the lower and upper bounds of the observed travel time interval, respectively, at confidence level $1-\alpha$, where $\Phi_{obs}^{-1}(\cdot)$ is the inverse CDF of the observed path travel time distribution, and $\Phi_{fus}(\cdot)$ denotes the CDF of the estimated travel time distribution. Then:
$$POOI = \frac{100\%}{n}\sum_{\ell=1}^{n}\left(1 - \frac{\Phi_{fus}(u_{obs}) - \Phi_{fus}(l_{obs})}{1-\alpha}\right)$$
POOI also ranges from 0 to 1, and a larger POOI indicates lower accuracy of the estimated travel time interval, because a larger proportion of the estimated distribution lies outside the observed travel time interval. The POPI and POOI metrics are therefore complementary for evaluating the accuracy of the estimated path travel time distribution.

5.2. Experimental Results

This section reports the experimental results of the case study using the proposed heterogeneous data fusion method. Travel time distributions for the chosen path and its links were estimated every 2 min. The probability of the unknown state for both interval and point detectors was set as $\alpha_{\Theta} = m_{int}(\Theta) = m_{poi}(\Theta) = 0.05$, and the sensitivity parameters in Equations (15) and (16) were set as $\beta_{int} = 0.2$ and $\beta_{poi} = 0.8$, according to the sensitivity analysis results of Dion and Rakha [34]. Setting $\beta_{poi} = 4\beta_{int}$ assigns a higher level of information quality to interval detector data than to point detector data for the same sample size.
Figure 5 shows the two path travel time distributions, $T_{int}^{od}$ and $T_{poi}^{od}$, estimated from interval and point detectors, respectively, in the data preprocessing step. Travel time intervals were constructed at the 95% confidence level, i.e., $\alpha_{\Theta} = 0.05$, for both interval and point detectors, and are shown in blue and red, respectively. Observed data from the field survey, shown as green dots, were used only for accuracy validation. As shown in the figure, both estimated travel time intervals cover most of the observed data during the period of interest. The two estimated travel time distributions are highly consistent during off-peak periods (21:00–23:00 and 07:00–07:30), slightly inconsistent during the inter-peak period (10:00–16:00), and highly inconsistent during peak periods (07:30–10:00 and 16:00–21:00). In general, $T_{int}^{od}$ tended to be more accurate than $T_{poi}^{od}$. This was expected, since $T_{int}^{od}$ was estimated directly from interval detector data, whereas $T_{poi}^{od}$ was estimated from point detector data through spatial interpolation. This result also supports the chosen sensitivity parameters, which reflect the higher information quality of the interval detector data.
Figure 6 shows the path travel time distribution obtained by fusing the two path travel time distributions in Figure 5. A confidence level of 80%, i.e., α = 0.2, was used to construct the travel time interval and calculate the POPI and POOI metrics. The proposed heterogeneous data fusion method provided an accurate and robust estimate of the mean travel time, $t_{fus}^{od}$, throughout the period of interest, with $MAPE_t = 7.1\%$. However, the relatively large $MAPE_{\sigma} = 17.9\%$ shows that the proposed method overestimated the path travel time STD, $\sigma_{fus}^{od}$, over the period of interest. This highlights the challenge of accurately estimating $\sigma_{fus}^{od}$ in congested road networks. One major reason may be the difficulty of estimating the population STD $\sigma_{obs}^{od}$ from biased samples of varying data quality. Fortunately, this STD overestimation could be beneficial to most travelers with risk-averse attitudes toward travel time uncertainty. $POPI = 15.7\%$, somewhat better than the target (20%), indicates that a high proportion (84.3%) of the observations was covered by the estimated path travel time interval. The figure also shows that the estimated interval was not too wide, despite the relatively large STD error. $POOI = 25.6\%$ was somewhat larger than the target (20%). Overall, the POPI and POOI metrics confirm that the proposed heterogeneous data fusion method can obtain accurate and robust estimates of the path travel time interval (i.e., the path travel time distribution) by fusing heterogeneous interval and point detector data.

5.3. Comparison of Data Fusion and Single Data Source Results

In this section, the effectiveness of the proposed heterogeneous data fusion method was investigated by comparing the data fusion results with those estimated from a single data source. The path travel time distribution estimated from interval detector data alone (i.e., $T_{int}^{od}$) is shown in blue in Figure 5. The path travel time distribution estimated from point detector data alone (denoted by $\tilde{T}_{poi}^{od}$) is shown in blue in Figure 7; it differs from the $T_{poi}^{od}$ estimate shown in Figure 5. Note that $\tilde{T}_{poi}^{od}$ was generated using fixed offline spatial correlations obtained from RTIS, whereas $T_{poi}^{od}$ was generated by the proposed heterogeneous data fusion method using the updated spatial correlations.
Figure 7 shows the travel time intervals of $\tilde{T}_{poi}^{od}$ and $T_{poi}^{od}$ in blue and red, respectively, for comparison. The 80% confidence level was used for constructing the travel time intervals and calculating the POPI and POOI metrics. As illustrated, using updated spatial correlations significantly improved the accuracy of the path travel time distribution estimated from point detector data. The $MAPE_t$, $MAPE_{\sigma}$, POPI, and POOI metrics were reduced by 46.4% (i.e., 1 − 24.9%/46.5%), 78.9%, 21.1%, and 22.1%, respectively. This validates the effectiveness of the proposed optimization technique for updating travel time spatial correlations. This result also highlights the need to consider the dynamic nature of travel time spatial correlations in congested road networks, and implies that current spatial interpolation techniques [14,15] built on fixed spatial correlations may lead to considerable errors when imputing missing data.
Table 3 summarizes the evaluation metrics for all path travel time distributions estimated from point detector data, $\tilde{T}_{poi}^{od}$; interval detector data, $T_{int}^{od}$; and fused data, $T_{fus}^{od}$. Among these three distributions, $\tilde{T}_{poi}^{od}$ was the least accurate, with $MAPE_t = 46.5\%$ and $MAPE_{\sigma} = 61.6\%$. Its POPI of 85.9% indicates that a large proportion (85.9%) of the observations fell outside its travel time interval, and its POOI of 92.0% shows that almost the whole travel time range of $\tilde{T}_{poi}^{od}$ lay outside the observed time interval. $T_{int}^{od}$ was more accurate on all metrics except $MAPE_{\sigma}$, with $MAPE_t = 17.1\%$, $MAPE_{\sigma} = 76.9\%$, $POPI = 26.4\%$, and $POOI = 48.9\%$. $T_{fus}^{od}$, obtained with the proposed data fusion method, was the best on all evaluation metrics. By fusing interval and point detector data, the $MAPE_t$, $MAPE_{\sigma}$, POPI, and POOI metrics were reduced by 58.5% (i.e., 1 − 7.1%/17.1%), 76.7%, 40.5%, and 47.6%, respectively, compared with those of $T_{int}^{od}$. Thus, the proposed heterogeneous data fusion method can significantly improve the accuracy of path travel time distribution estimates from interval and point detectors.
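The reductions quoted above are relative changes, i.e., 1 minus the ratio of the fused metric to the baseline metric. The short Python check below recomputes the four reductions with respect to $T_{int}^{od}$ from the metric values reported in this section; the function name and dictionary keys are illustrative.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (1.0 - improved / baseline)


# Metrics (%) for T_int^od (baseline) and T_fus^od as reported in the text.
t_int_metrics = {"MAPE_t": 17.1, "MAPE_sigma": 76.9, "POPI": 26.4, "POOI": 48.9}
t_fus_metrics = {"MAPE_t": 7.1, "MAPE_sigma": 17.9, "POPI": 15.7, "POOI": 25.6}
reductions = {k: round(relative_reduction(t_int_metrics[k], t_fus_metrics[k]), 1)
              for k in t_int_metrics}
print(reductions)  # {'MAPE_t': 58.5, 'MAPE_sigma': 76.7, 'POPI': 40.5, 'POOI': 47.6}
```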
Fusing interval and point detector data can also improve the accuracy of travel time distributions for links without point detectors. When only point detector data were used, the travel time distributions for links without point detectors were estimated indirectly through the fixed spatial correlations. Fusing interval and point detector data provided better estimates of these link travel time distributions through the updated spatial correlations. Figure 8 compares individual link travel time distributions estimated from point detector data alone and from the proposed data fusion method. Ground truth data for these link travel time distributions were not available for a quantitative analysis of estimation accuracy. Nevertheless, the link travel time distributions estimated by the proposed heterogeneous data fusion method better capture dynamic traffic conditions, with more distinct peaks during the morning and evening peak periods. The much higher accuracy of the path travel time distribution estimates (see Table 3) also supports this visual observation, because the path travel time is the sum of the corresponding link travel times along the path.

5.4. Comparison of Different Distribution Fusion Algorithms

This section investigates the effectiveness of the proposed D-S distribution fusion algorithm built on the D-S evidence theory. To further evaluate and benchmark the proposed algorithm, a linear combination fusion algorithm was also implemented. The linear combination approach (or simple convex combination approach) has been widely used as a simple and effective technique for fusing two independent estimates of mean travel time [11]:
$$t_{fus}^{od} = \frac{w_{int}}{w_{int}+w_{poi}}\,t_{int}^{od} + \frac{w_{poi}}{w_{int}+w_{poi}}\,t_{poi}^{od}$$
where $w_{int}$ and $w_{poi}$ are the data quality parameters of the interval and point detectors, respectively, as defined in Equations (15) and (16). This study extended the linear combination approach to fuse two independent STD estimates as follows:
$$\sigma_{fus}^{od} = \frac{w_{int}}{w_{int}+w_{poi}}\,\sigma_{int}^{od} + \frac{w_{poi}}{w_{int}+w_{poi}}\,\sigma_{poi}^{od}$$
Assuming normal distributions, this extended linear combination fusion algorithm can be used to fuse path travel time distributions from interval and point detectors.
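For comparison, the benchmark linear combination fusion algorithm reduces to two convex combinations. The Python sketch below is a minimal illustration under the normality assumption above; the function name and input estimates are hypothetical, and $w_{int} = 0.8$, $w_{poi} = 0.6$ are simply borrowed from the Table 2 example rather than computed from Equations (15) and (16).

```python
from typing import Tuple


def linear_combination_fusion(t_int: float, s_int: float, t_poi: float, s_poi: float,
                              w_int: float, w_poi: float) -> Tuple[float, float]:
    """Convex combination of two (mean, STD) estimates weighted by data quality."""
    a = w_int / (w_int + w_poi)
    b = w_poi / (w_int + w_poi)
    return a * t_int + b * t_poi, a * s_int + b * s_poi


# Hypothetical path estimates (s): interval detector (420, 70), point detector (350, 35).
t, s = linear_combination_fusion(420.0, 70.0, 350.0, 35.0, w_int=0.8, w_poi=0.6)
print(round(t, 6), round(s, 6))  # 390.0 55.0
```

Because both outputs are convex combinations, the fused mean and STD always lie between the two inputs, regardless of how strongly the two distributions conflict.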
In this study, the same set of input data was used to evaluate the proposed D-S distribution fusion algorithm and the linear combination fusion algorithm. The path travel time distributions of interval and point detectors obtained in the data preprocessing step, shown in Figure 5, were used as the input data. Figure 9 reports the fused path travel time distributions obtained using these two algorithms. As shown, the proposed D-S distribution fusion algorithm produced better path travel time distribution estimates than the linear combination fusion algorithm, reducing the $MAPE_t$, $MAPE_{\sigma}$, POPI, and POOI metrics by 58.6%, 15.3%, 37.2%, and 38.0%, respectively. This result indicates that the D-S evidence theory is effective for fusing inaccurate and inconsistent distribution data from multiple sources under various information conflict situations, including highly consistent, slightly inconsistent, and highly inconsistent situations.

6. Conclusions and Future Research

Provision of travel time distribution information is a crucial requirement for travelers to make reliable path choice decisions incorporating travel time uncertainties. With advances in information and communication technologies, interval detectors (such as automatic vehicle identification devices) and point detectors (such as loop detectors) are being increasingly deployed in road networks. These interval and point detectors generate heterogeneous data sources with distinct characteristics of data quality and network coverage. Fusing these heterogeneous data can be beneficial for robust and accurate estimation of travel time distribution information.
This paper proposed a heterogeneous data fusion method to estimate travel time distributions by fusing heterogeneous data from point and interval detectors. The proposed method consisted of three steps. The first step, data preprocessing, estimated path travel time distributions separately from interval and point detector data. It addressed the spatially missing data of point detectors by imputing the travel time distributions of links without point detectors based on their spatial correlations with links that had point detectors. The second step, distribution fusion, fused the two path travel time distributions estimated from interval and point detectors; a D-S distribution fusion algorithm built on the Dempster-Shafer evidence theory was proposed to fuse path travel time distributions from data sources with different information qualities. The third step, posterior update, updated the link travel time distributions and their spatial correlations. The problem of updating spatial correlations was formulated and solved as a quadratic programming problem with a convex objective function and two linear constraints.
To validate the accuracy of the proposed heterogeneous data fusion method, a case study was performed using real-world data from RTIS in Hong Kong. The results showed that the proposed method can obtain robust and accurate estimates of path travel time distributions in congested road networks. Compared with using either interval or point detectors alone, the proposed data fusion method significantly reduced the estimation errors of path travel time distributions with respect to the $MAPE_t$, $MAPE_{\sigma}$, POPI, and POOI metrics. The proposed D-S distribution fusion algorithm was also compared with a linear combination algorithm in the same case study, and it generated robust and accurate fusion of travel time distributions over the whole period of interest, covering highly consistent, slightly inconsistent, and highly inconsistent situations for the different data sources. This confirms that the D-S distribution fusion algorithm is effective for fusing travel time distributions from different data sources under various information conflict situations. Furthermore, the case study indicated that the proposed optimization technique can effectively update travel time spatial correlations under dynamic traffic conditions, and that incorporating the updated spatial correlations greatly enhanced the estimation accuracy of path travel time distributions and yielded travel time distributions for links without point detectors that better capture dynamic traffic conditions.
There are several worthwhile directions for future research. First, travel times in this study were assumed to follow normal distributions. However, several previous studies have found that travel times in congested road networks could be better represented by asymmetric distributions with strong positive skew, e.g., lognormal, gamma, or Burr distributions [10,57]. The proposed heterogeneous data fusion method can be easily extended to other two-parameter distributions, e.g., lognormal or gamma, by replacing Equation (14) with the corresponding methods to calculate the cumulative distribution function. Second, the spatial interpolation proposed by Tam and Lam [14] was adopted in this study for imputing the travel time distributions of links without point detectors. However, other effective spatial interpolation techniques have been proposed, such as Kriging [15]. Integrating these alternative spatial interpolation techniques into the proposed heterogeneous data fusion method warrants further study. Third, the proposed data fusion method only considered heterogeneous data from point and interval detectors. How to extend the proposed method to incorporate floating car data needs further investigation. Fourth, the case study only involved a specific path. Extending the proposed method to fuse travel time distributions of multiple paths between a pair of nodes is an interesting topic for further investigation. Last but not least, travel time distributions were estimated in this study for the current time interval. Extending the proposed data fusion method to short-term travel time distribution prediction is another interesting topic for further study.

Acknowledgments

The work described in this paper was supported by research grants from the National Key Research and Development Program of China (No. 2017YFB0503604), the National Natural Science Foundation of China (Nos. 41231171 and 41571149), the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 152628/16E), the Research Institute for Sustainable Urban Development of the Hong Kong Polytechnic University (1-ZVEW), and the Natural Science Foundation of Hubei Province (2016CFB568).

Author Contributions

Chaoyang Shi, Bi Yu Chen, William H. K. Lam and Qingquan Li provided the idea of this study; Chaoyang Shi implemented the proposed method, carried out the case study; Chaoyang Shi and Bi Yu Chen wrote the manuscript; and William H. K. Lam and Qingquan Li made important comments and suggestions for improvement of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, B.Y.; Lam, W.H.K.; Sumalee, A.; Li, Q.Q.; Shao, H.; Fang, Z.X. Finding reliable shortest paths in road networks under uncertainty. Netw. Spat. Econ. 2013, 13, 123–148. [Google Scholar] [CrossRef]
  2. Chen, B.Y.; Li, Q.Q.; Lam, W.H.K. Finding the k reliable shortest paths under travel time uncertainty. Transp. Res. B Methodol. 2016, 94, 189–203. [Google Scholar] [CrossRef]
  3. Yang, L.; Zhou, X. Optimizing on-time arrival probability and percentile travel time for elementary path finding in time-dependent transportation networks: Linear mixed integer programming reformulations. Transp. Res. B Methodol. 2017, 96, 68–91. [Google Scholar] [CrossRef]
  4. Zhong, R.; Sumalee, A.; Maruyama, T. Dynamic marginal cost, access control, and pollution charge: A comparison of bottleneck and whole link models. J. Adv. Transp. 2012, 46, 191–221. [Google Scholar] [CrossRef]
  5. Zhao, J.; Ma, W.; Liu, Y.; Han, K. Optimal operation of freeway weaving segment with combination of lane assignment and on-ramp signal control. Transp. A 2016, 12, 413–435. [Google Scholar] [CrossRef]
  6. Chen, B.Y.; Yuan, H.; Li, Q.Q.; Shaw, S.L.; Lam, W.H.K.; Chen, X. Spatiotemporal data model for network time geographic analysis in the era of big data. Int. J. Geogr. Inf. Sci. 2016, 30, 1041–1071. [Google Scholar] [CrossRef]
  7. Lim, S.; Lee, C. Data fusion algorithm improves travel time predictions. IET Intell. Transp. Syst. 2011, 5, 302–309. [Google Scholar] [CrossRef]
  8. Mori, U.; Mendiburu, A.; Alvarez, M.; Lozano, J.A. A review of travel time estimation and forecasting for Advanced Traveller Information Systems. Transp. A 2015, 11, 119–157. [Google Scholar] [CrossRef]
  9. Chen, B.Y.; Yuan, H.; Li, Q.Q.; Lam, W.H.K.; Shaw, S.L.; Yan, K. Map matching algorithm for large-scale low-frequency floating car data. Int. J. Geogr. Inf. Sci. 2014, 28, 22–38. [Google Scholar] [CrossRef]
  10. Du, L.; Peeta, S.; Kim, Y.H. An adaptive information fusion model to predict the short-term link travel time distribution in dynamic traffic networks. Transp. Res. B Methodol. 2012, 46, 235–252. [Google Scholar] [CrossRef]
  11. Bachmann, C.; Abdulhai, B.; Roorda, M.J.; Moshiri, B. A comparative assessment of multi-sensor data fusion techniques for freeway traffic speed estimation using microsimulation modeling. Transp. Res. C Emerg. Technol. 2013, 26, 33–48. [Google Scholar] [CrossRef]
  12. Bachmann, C.; Roorda, M.J.; Abdulhai, B.; Moshiri, B. Fusing a bluetooth traffic monitoring system with loop detector data for improved freeway traffic speed estimation. J. Intell. Transp. Syst. 2013, 17, 152–164. [Google Scholar] [CrossRef]
  13. Deng, W.; Lei, H.; Zhou, X. Traffic state estimation and uncertainty quantification based on heterogeneous data sources: A three detector approach. Transp. Res. B Methodol. 2013, 57, 132–157. [Google Scholar] [CrossRef]
  14. Tam, M.L.; Lam, W.H.K. Using automatic vehicle identification data for travel time estimation in Hong Kong. Transportmetrica 2008, 4, 179–194. [Google Scholar] [CrossRef]
  15. Zou, H.; Yue, Y.; Li, Q.; Yeh, A.G.O. An improved distance metric for the interpolation of link-based traffic data using kriging: A case study of a large-scale urban road network. Int. J. Geogr. Inf. Sci. 2012, 26, 667–689. [Google Scholar] [CrossRef]
  16. El Esawey, M.; Sayed, T. Travel time estimation in urban networks using limited probes data. Can. J. Civil. Eng. 2011, 38, 305–318. [Google Scholar] [CrossRef]
  17. Liu, H.X.; Ma, W. A virtual vehicle probe model for time-dependent travel time estimation on signalized arterials. Transp. Res. C Emerg. Technol. 2009, 17, 11–26. [Google Scholar] [CrossRef]
  18. Liu, H.X.; Ma, W.; Wu, X.; Hu, H. Real-time estimation of arterial travel time under congested conditions. Transportmetrica 2012, 8, 87–104. [Google Scholar] [CrossRef]
  19. Ndoye, M.; Totten, V.F.; Krogmeier, J.V.; Bullock, D.M. Sensing and signal processing for vehicle reidentification and travel time estimation. IEEE Trans. Intell. Transp. Syst. 2011, 12, 119–131. [Google Scholar] [CrossRef]
  20. Yu, B.; Lam, W.H.K.; Tam, M.L. Bus arrival time prediction at bus stop with multiple routes. Transp. Res. C Emerg. Technol. 2011, 19, 1157–1170. [Google Scholar] [CrossRef]
  21. Vlahogianni, E.I.; Golias, J.C.; Karlaftis, M.G. Short-term traffic forecasting: Overview of objectives and methods. Transp. Rev. 2004, 24, 533–557. [Google Scholar] [CrossRef]
  22. El Faouzi, N.E. Data fusion in road traffic engineering: An overview. In Proceedings of the SPIE—The International Society for Optical Engineering, Orlando, FL, USA, 14–15 April 2004. [Google Scholar]
  23. Choi, K.; Chung, Y. A data fusion algorithm for estimating link travel time. J. Intell. Transp. Syst. 2002, 7, 235–260. [Google Scholar] [CrossRef]
  24. El Faouzi, N.E. Bayesian and evidential approaches for traffic data fusion: Methodological issues and case study. In Proceedings of the Transportation Research Board 85th Annual Meeting (No. 06–1510), Washington, DC, USA, 22–26 January 2006. [Google Scholar]
  25. El Faouzi, N.E.; Klein, L.A.; De Mouzon, O. Improving travel time estimates from inductive loop and toll collection data with Dempster-Shafer data fusion. Transport. Res. Rec. 2009, 2129, 73–80. [Google Scholar] [CrossRef]
  26. Kong, Q.J.; Li, Z.; Chen, Y.; Liu, Y. An approach to urban traffic state estimation by fusing multisource information. IEEE Trans. Intell. Transp. Syst. 2009, 10, 499–511. [Google Scholar] [CrossRef]
  27. Kong, Q.J.; Chen, Y.; Liu, Y. A fusion-based system for road-network traffic state surveillance: A case study of Shanghai. IEEE Intell. Transp. Syst. 2009, 1, 37–42. [Google Scholar] [CrossRef]
  28. Nantes, A.; Dong, N.; Bhaskar, A.; Miska, M.; Chung, E. Real-time traffic state estimation in urban corridors from heterogeneous data. Transp. Res. C Emerg. Technol. 2015, 66, 99–118. [Google Scholar] [CrossRef]
  29. Shan, Z.; Xia, Y.; Hou, P.; He, J. Fusing Incomplete Multisensor Heterogeneous Data to Estimate Urban Traffic. IEEE Multimed. 2016, 23, 56–63. [Google Scholar] [CrossRef]
  30. Lederman, R.; Wynter, L. Real-time traffic estimation using data expansion. Transp. Res. B Methodol. 2011, 45, 1062–1079. [Google Scholar] [CrossRef]
  31. Haworth, J.; Cheng, T. Non-parametric regression for space-time forecasting under missing data. Comput. Environ. Urban. 2012, 36, 538–550. [Google Scholar] [CrossRef]
32. Li, L.; Li, Y.B.; Li, Z.H. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transp. Res. C Emerg. Technol. 2013, 34, 108–120.
33. Chan, K.S.; Lam, W.H.K.; Tam, M.L. Real-time estimation of arterial travel times with spatial travel time covariance relationships. Transp. Res. Rec. 2009, 2121, 102–109.
34. Dion, F.; Rakha, H. Estimating dynamic roadway travel times using automatic vehicle identification data for low sampling rates. Transp. Res. B Methodol. 2006, 40, 745–766.
35. Jenelius, E.; Koutsopoulos, H.N. Travel time estimation for urban road networks using low frequency probe vehicle data. Transp. Res. B Methodol. 2013, 53, 64–81.
36. Rahmani, M.; Jenelius, E.; Koutsopoulos, H.N. Non-parametric estimation of route travel time distributions from low-frequency floating car data. Transp. Res. C Emerg. Technol. 2015, 58, 343–362.
37. Hans, E.; Chiabaut, N.; Leclercq, L. Applying variational theory to travel time estimation on urban arterials. Transp. Res. B Methodol. 2015, 78, 169–181.
38. Dempster, A.P. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Stat. 1967, 38, 325–339.
39. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976.
40. Beynon, M.; Cosker, D.; Marshall, D. An expert system for multi-criteria decision making using Dempster Shafer theory. Expert Syst. Appl. 2001, 20, 357–367.
41. Hegarat-Mascle, S.L.; Richard, D.; Ottle, C. Multi-scale data fusion using Dempster-Shafer evidence theory. Integr. Comput. Aided Eng. 2003, 10, 9–22.
42. Gong, Y.; Wang, Y. Application Research on Bayesian Network and D-S Evidence Theory in Motor Fault Diagnosis. In Proceedings of the 6th International Conference on Intelligent Networks and Intelligent Systems (ICINIS), Shenyang, China, 1–3 November 2013.
43. Khaleghi, B.; Khamis, A.; Karray, F.O.; Razavi, S.N. Multisensor data fusion: A review of the state-of-the-art. Inf. Fusion 2013, 14, 28–44.
44. Deng, Y.; Chan, F.T.S. A new fuzzy dempster MCDM method and its application in supplier selection. Expert Syst. Appl. 2011, 38, 9854–9861.
45. Su, S.Y.; Deng, Y.; Mahadevan, S.; Bao, Q.L. An improved method for risk evaluation in failure modes and effects analysis of aircraft engine rotor blades. Eng. Fail. Anal. 2012, 26, 164–174.
46. Parikh, C.R.; Pont, M.J.; Jones, N.B. Application of Dempster–Shafer theory in condition monitoring applications: A case study. Pattern Recogn. Lett. 2001, 22, 777–785.
47. Dou, Z.; Xu, X.; Lin, Y.; Zhou, R. Application of D-S evidence fusion method in the fault detection of temperature sensor. Math. Probl. Eng. 2014, 2014, 1–6.
48. Fan, X.; Zuo, M.J. Fault diagnosis of machines based on D-S evidence theory. Part 1: D–S evidence theory and its improvement. Pattern Recogn. Lett. 2006, 27, 366–376.
49. Hu, Y.; Fan, X.; Zhao, H.; Hu, B. The Research of Target Identification Based on Neural Network and D-S Evidence Theory. In Proceedings of the International Asia Conference on Informatics in Control, Bangkok, Thailand, 1–2 February 2009.
50. Dong, G.; Kuang, G. Target recognition via information aggregation through Dempster–Shafer’s evidence theory. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1247–1251.
51. Li, Y.; Chen, J.; Ye, F.; Liu, D. The improvement of DS evidence theory and its application in IR/MMW target recognition. J. Sens. 2016, 2016, 1–15.
52. Dymova, L.; Sevastjanov, P. An Interpretation of Intuitionistic Fuzzy Sets in the Framework of the Dempster-Shafer Theory: Decision making aspect. Knowl. Based Syst. 2010, 23, 772–782.
53. Chen, N.; Sun, F.; Ding, L.; Wang, H. An adaptive PNN-DS approach to classification using multi-sensor information fusion. Neural Comput. Appl. 2009, 18, 455–467.
54. Yager, R.R. On the Dempster-Shafer framework and new combination rules. Inf. Sci. 1987, 41, 93–137.
55. Smets, P. The Combination of Evidence in the Transferable Belief Model. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 447–458.
56. Murphy, C.K. Combining belief functions when evidence conflicts. Decis. Support Syst. 2000, 29, 1–9.
57. Chen, B.Y.; Shi, C.; Zhang, J.; Lam, W.H.K.; Li, Q.Q.; Xiang, S. Most reliable path-finding algorithm for maximizing on-time arrival probability. Transp. B 2016, 5, 204–221.
58. Lomax, T.; Schrank, D.; Turner, S.; Margiotta, R. Selecting Travel Reliability Measures; Texas Transportation Institute Monograph: College Station, TX, USA, 2003.
59. Hart, R.G. A close approximation related to the error function. Math. Comput. 1966, 20, 600–602.
60. Khosravi, A.; Mazloumi, E.; Nahavandi, S.; Creighton, D.; Van Lint, J.W.C. A genetic algorithm-based method for improving quality of travel time prediction intervals. Transp. Res. C Emerg. Technol. 2011, 19, 1364–1376.
61. Khosravi, A.; Mazloumi, E.; Nahavandi, S.; Creighton, D.; Van Lint, J.W.C. Prediction intervals to account for uncertainties in travel time prediction. IEEE Trans. Intell. Transp. Syst. 2011, 12, 537–547.
62. Shi, C.; Chen, B.Y.; Li, Q. Estimation of Travel Time Distributions in Urban Road Networks Using Low-Frequency Floating Car Data. ISPRS Int. J. Geo-Inf. 2017, 6, 253.
Figure 1. Illustrative example of the heterogeneous data fusion problem.
Figure 2. Framework of the proposed heterogeneous data fusion method.
Figure 3. Typical information conflict situations of interval and point detectors: (a) low conflict, (b) high conflict, (c) complete conflict.
Figure 4. Study area location.
Figure 5. Two path travel time distributions obtained in the data preprocessing step.
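The fusion examples in Tables 1 and 2 operate on masses assigned to discrete travel time ranges (S1 to S5), whereas the abstract describes travel time distributions by their mean and variance. One simple way to turn such a distribution into range masses, assumed here and not confirmed by the text, is to treat it as normal and give each range the probability it covers. The function names `normal_cdf` and `range_masses`, the cut points, and the example mean and standard deviation are all hypothetical.

```python
import math
from typing import Sequence


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative probability P(T <= x) for T ~ Normal(mean, std**2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def range_masses(mean: float, std: float, cuts: Sequence[float]) -> list[float]:
    """Probability of each travel time range defined by increasing cut points.

    k cut points give k + 1 ranges. The first range is open to the left and the
    last is open to the right, so the masses sum to 1. (Assumption: a normal
    distribution; its negligible mass below 0 min falls into the first range.)
    """
    cdf = [0.0] + [normal_cdf(c, mean, std) for c in cuts] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


if __name__ == "__main__":
    # Hypothetical path travel time ~ N(13 min, 1.5 min), split at 10, 12, 14
    # and 16 min into five ranges playing the role of S1..S5.
    masses = range_masses(13.0, 1.5, [10.0, 12.0, 14.0, 16.0])
    print([round(m, 4) for m in masses])
```

Because the example distribution is centred inside the middle range, the resulting masses are symmetric (S1 matches S5, S2 matches S4), with most of the probability in S3.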
Figure 6. Fused path travel time distribution during the period of interest.
Figure 7. Path travel time distributions estimated from point detector data by using updated and fixed spatial correlations.
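Figure 7 contrasts path travel time distributions obtained with fixed and with updated spatial correlations. Assuming a path travel time is the sum of its link travel times, the correlations enter through the standard identities E[T] = Σ μ_i and Var[T] = Σ_i Σ_j ρ_ij σ_i σ_j, so they change the spread of the path distribution but not its mean. The sketch below illustrates only this identity; `path_moments` is a hypothetical helper, and the three-link values and correlation level are made up, not taken from the Hong Kong case study.

```python
import math
from typing import Sequence


def path_moments(means: Sequence[float], stds: Sequence[float],
                 corr: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Mean and standard deviation of a path travel time T = sum of link times.

    E[T] = sum_i mu_i and Var[T] = sum_i sum_j rho_ij * sigma_i * sigma_j,
    where corr[i][j] = rho_ij and corr[i][i] must equal 1.
    """
    n = len(means)
    mean = sum(means)
    var = sum(corr[i][j] * stds[i] * stds[j] for i in range(n) for j in range(n))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical three-link path (minutes); values are illustrative only.
    means = [2.0, 3.0, 4.0]
    stds = [0.5, 0.8, 1.0]
    independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    correlated = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    for label, corr in (("independent", independent), ("rho = 0.5", correlated)):
        mean, std = path_moments(means, stds, corr)
        print(f"{label}: mean = {mean:.2f} min, std = {std:.2f} min")
```

With these numbers the path mean stays at 9 min in both cases, while the standard deviation grows from about 1.37 min (independent links) to about 1.89 min (ρ = 0.5), which is why the choice of spatial correlations visibly changes the estimated path distribution.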
Figure 8. Individual link travel time distribution estimated from point detector data and fused data.
Figure 9. Path travel times of different methods during the period of interest.
Table 1. Simple example of distribution fusion using Dempster’s combination rule.
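| Travel time range | Case 1 m_int(·) | Case 1 m_poi(·) | Case 1 m_fus(·) | Case 2 m_int(·) | Case 2 m_poi(·) | Case 2 m_fus(·) | Case 3 m_int(·) | Case 3 m_poi(·) | Case 3 m_fus(·) |
|---|---|---|---|---|---|---|---|---|---|
| S1 | 0.1 | 0 | 0 | 0.3 | 0 | 0 | 0.4 | 0 | - |
| S2 | 0.2 | 0.3 | 0.2143 | 0.6 | 0 | 0 | 0.6 | 0 | - |
| S3 | 0.4 | 0.4 | 0.5714 | 0.1 | 0.1 | 1 | 0 | 0 | - |
| S4 | 0.2 | 0.3 | 0.2143 | 0 | 0.6 | 0 | 0 | 0.7 | - |
| S5 | 0.1 | 0 | 0 | 0 | 0.3 | 0 | 0 | 0.3 | - |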
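Table 1 can be checked with Dempster's combination rule in the special case where every mass sits on a single travel time range. Two distinct ranges then have an empty intersection, so the fused mass reduces to m_fus(S) = m_int(S)·m_poi(S) / (1 - K), with conflict coefficient K = 1 - Σ_S m_int(S)·m_poi(S). The sketch below is written under that singleton assumption; the function name `dempster_singletons` is hypothetical. Case 3 has K = 1, where the normalization is undefined, which appears to be what the dashes in the table denote.

```python
from typing import Optional, Sequence


def dempster_singletons(m1: Sequence[float], m2: Sequence[float]) -> Optional[list[float]]:
    """Dempster's rule for two mass functions whose focal elements are singletons.

    m1[i] and m2[i] are the masses each source assigns to travel time range S(i+1).
    Returns the fused masses, or None when the sources are in complete conflict.
    """
    # Only identical ranges intersect, so agreement comes from matching entries.
    joint = [a * b for a, b in zip(m1, m2)]
    agreement = sum(joint)  # equals 1 - K, where K is the conflict coefficient
    if agreement == 0.0:    # K = 1: the normalization by 1 - K is undefined
        return None
    return [j / agreement for j in joint]


if __name__ == "__main__":
    # Masses copied from Table 1 (m_int, m_poi) for the three cases.
    cases = {
        "Case 1": ([0.1, 0.2, 0.4, 0.2, 0.1], [0.0, 0.3, 0.4, 0.3, 0.0]),
        "Case 2": ([0.3, 0.6, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.6, 0.3]),
        "Case 3": ([0.4, 0.6, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.7, 0.3]),
    }
    for name, (m_int, m_poi) in cases.items():
        fused = dempster_singletons(m_int, m_poi)
        conflict = 1.0 - sum(a * b for a, b in zip(m_int, m_poi))
        shown = "undefined" if fused is None else [round(x, 4) for x in fused]
        print(f"{name}: K = {conflict:.2f}, m_fus = {shown}")
```

Case 1 reproduces the fused values 0.2143, 0.5714 and 0.2143. Case 2 shows the well-known weakness of Dempster's rule under high conflict: with K = 0.99, all fused mass lands on S3 even though each source gives S3 only 0.1.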
Table 2. Simple example of distribution fusion using the generalized combination rule.
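| Travel time range | Case 1 m_int(·) | Case 1 m_poi(·) | Case 1 m_fus(·) | Case 2 m_int(·) | Case 2 m_poi(·) | Case 2 m_fus(·) | Case 3 m_int(·) | Case 3 m_poi(·) | Case 3 m_fus(·) |
|---|---|---|---|---|---|---|---|---|---|
| S1 | 0.075 | 0 | 0.0410 | 0.275 | 0 | 0.2415 | 0.375 | 0 | 0.3337 |
| S2 | 0.2 | 0.275 | 0.2075 | 0.6 | 0 | 0.5270 | 0.575 | 0 | 0.5116 |
| S3 | 0.4 | 0.4 | 0.4756 | 0.075 | 0.075 | 0.0874 | 0 | 0 | 0.0000 |
| S4 | 0.2 | 0.275 | 0.2075 | 0 | 0.6 | 0.0687 | 0 | 0.675 | 0.0783 |
| S5 | 0.075 | 0 | 0.0410 | 0 | 0.275 | 0.0315 | 0 | 0.275 | 0.0319 |
| Θ | 0.05 | 0.05 | 0.0273 | 0.05 | 0.05 | 0.0439 | 0.05 | 0.05 | 0.0445 |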
Table 3. The accuracy of data fusion results and single data source results.
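| Data source | Estimated mean: MAPE | Estimated mean: RMSE (min) | Estimated STD: MAPE | Estimated STD: RMSE (min) | POPI | POOI |
|---|---|---|---|---|---|---|
| Point detectors | 46.5% | 2.32 | 61.6% | 0.75 | 85.9% | 92.0% |
| Interval detectors | 17.1% | 1.42 | 76.9% | 1.01 | 26.4% | 48.9% |
| Data fusion | 7.1% | 0.85 | 17.9% | 0.35 | 15.7% | 25.6% |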
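Table 3 scores each data source by the MAPE and RMSE of the estimated mean and standard deviation of travel time. The sketch below uses the textbook definitions of these two metrics; whether the paper aggregates them in exactly this way (for example, over time intervals or paths) is an assumption, and the helper names `mape` and `rmse` and the sample values are hypothetical. POPI and POOI are defined elsewhere in the paper and are not reproduced here.

```python
import math
from typing import Sequence


def mape(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    total = sum(abs(e - o) / abs(o) for e, o in zip(estimated, observed))
    return 100.0 * total / len(observed)


def rmse(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (minutes in Table 3)."""
    total = sum((e - o) ** 2 for e, o in zip(estimated, observed))
    return math.sqrt(total / len(observed))


if __name__ == "__main__":
    # Hypothetical observed vs. estimated mean travel times (minutes).
    observed = [10.0, 12.0, 15.0]
    estimated = [11.0, 12.0, 12.0]
    print(f"MAPE = {mape(estimated, observed):.1f}%, RMSE = {rmse(estimated, observed):.2f} min")
```

For these sample values the errors are +1, 0 and -3 min, giving a MAPE of 10.0% and an RMSE of about 1.83 min. MAPE is scale-free and so comparable across links of different length, while RMSE keeps the units of travel time and penalizes large errors more heavily.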

Share and Cite

MDPI and ACS Style

Shi, C.; Chen, B.Y.; Lam, W.H.K.; Li, Q. Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks. Sensors 2017, 17, 2822. https://doi.org/10.3390/s17122822

AMA Style

Shi C, Chen BY, Lam WHK, Li Q. Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks. Sensors. 2017; 17(12):2822. https://doi.org/10.3390/s17122822

Chicago/Turabian Style

Shi, Chaoyang, Bi Yu Chen, William H. K. Lam, and Qingquan Li. 2017. "Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks" Sensors 17, no. 12: 2822. https://doi.org/10.3390/s17122822

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
