A Survey of Methods and Technologies for Congestion Estimation Based on Multisource Data Fusion

Abstract: Traffic congestion occurs when traffic demand is greater than the available network capacity. It is characterized by lower vehicle speeds, increased travel times, arrival unreliability, and longer vehicular queueing. Congestion can also negatively affect society by decreasing the quality of life through increased pollution, especially in urban areas. To mitigate the congestion problem, traffic engineers and scientists need quality, comprehensive, and accurate data to estimate the state of traffic flow. Data collection technologies differ in their advantages and disadvantages as well as in data characteristics, such as accuracy, sampling frequency, and geospatial coverage. Multisource data fusion increases the accuracy and provides a comprehensive estimation of the performance of traffic flow on a road network. This paper presents a literature overview related to the estimation and prediction of congestion based on data collected from multiple sources. An overview of data fusion methods and congestion indicators used in the literature for traffic state and congestion estimation is given. Results of these methods are analyzed, and an analysis of the advantages and disadvantages of the surveyed methods is presented.


Introduction
Congestion on the road network is, by one definition, reduced quality of service caused by a network carrying more vehicles than it can handle. Congestion estimation is an important task: it supports measures against lower vehicle speeds and increased time delays and thereby helps to mitigate air pollution. If the congestion intensity in the network is known, appropriate optimization can be applied to reduce the negative side effects of traffic.
Estimation of traffic congestion can be significantly improved by combining data from various sensors and other sources. In the literature, this process is called data fusion (DF). DF of multisource traffic data can be defined as the process of combining data or information to estimate or predict traffic flow conditions more accurately. According to Qing [1] and Dailey et al. [2], multisource DF is a technique in which data from multiple sensors are combined to create a synergic process for providing comprehensive and accurate information; in this context, comprehensive information means improved, less expensive, and higher quality information. Therefore, DF is a group of techniques by which information from multiple sources is combined to make the data more usable and to provide a better description of the traffic scenario. A general analysis of the traffic state tries to identify the traffic flow conditions at which the road would be operating at an optimal level and defines congestion as the difference between the actual (observed) and the optimal conditions. To determine a realistic and accurate state of the traffic flow and predict the travel time, traffic engineers need comprehensive spatiotemporal traffic data. Since every traffic data collection technology has its disadvantages, multisource data collection is often used to compensate for the disadvantages and harness the advantages of the technologies used.

• A systematic literature review through a keyword-based search of academic research databases and a systematic identification of highly relevant papers from the search results, according to their impact on the scientific community.
• Coverage of traffic congestion estimation studies in urban networks based on the fusion of data from multiple sensors.
• Analysis of the objectives of congestion estimation and data fusion, e.g., improving efficiency or accuracy, together with the different data fusion methods used, according to their performance.
In this paper, an overview of the most used multisource data collection technologies is given, and methods for DF and congestion indicators for congestion or traffic flow estimation are presented. Its aim is to fill a research gap in the review of methods and technologies for congestion estimation based on multisource data fusion. The main question to be answered is: which data collection technologies were used for data fusion, and which data fusion methods are mostly used according to their performance? Section 2 gives the background of the research presented in this paper. In Section 3, a short overview of commonly used quantitative congestion indicators is given. Section 4 gives an overview of commonly used traffic data collection technologies for DF and the advantages and disadvantages of each technology. In Section 5, different DF methods and techniques used for traffic congestion estimation are presented. In Section 6, a discussion of the recently proposed methods is given, and, finally, Section 7 brings concluding remarks.

Review Methodology
As mentioned in the introduction, we used a keyword-based search and selected work published mostly during the past twenty years, from 2000 to 2020. We searched the most widely used scientific research databases that cover the field of this review: Scopus, IEEE Xplore, Web of Science (WoS), and Google Scholar. Relevant articles were identified and filtered using the following keywords: traffic congestion; multisource data fusion; data fusion; traffic flow modeling; congestion estimation; traffic state estimation. Some older sources are also mentioned to provide information on previously developed methods. The main part of this research covers 30 papers, which are thoroughly discussed; their methods and results are summarized in the following sections.

Congestion Definition
Traffic congestion occurs when the volume of traffic exceeds the available network capacity [3,4]. There is no complete consensus on the definition of congestion, nor on how it should be estimated and represented to cover every situation, because the contexts differ. Capacity, which is the key factor in defining and estimating congestion, is subject to randomness and is difficult to detect and measure exactly in the field. However, congestion is characterized by lower vehicle speeds, increased time delays in planning trips, arrival unreliability, and vehicular queueing longer than in free flow conditions [3]. Congestion is a non-linear phenomenon: as a road approaches its maximum capacity, small changes in traffic volume can cause disproportionately larger delays. Congestion can also be recurrent or non-recurrent. Recurrent congestion occurs on a daily, weekly, or annual basis and is caused by recurring bottlenecks, which arise when traffic demand is greater than the capacity of network elements, i.e., road segments and intersections. Non-recurrent congestion is unexpected and usually unpredictable, or predictable only to a small degree; it is caused by traffic incidents, vehicle breakdowns, work zones, weather, and special events [5]. It is important to distinguish these two main forms of congestion depending on whether the goal of the research is to find a pattern or to estimate the frequency and predictability of congestion. However, even recurrent congestion can show a large degree of randomness, especially in its duration and severity [3].
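The non-linear growth of delay near capacity can be illustrated with a volume-delay function. The sketch below uses the widely known Bureau of Public Roads (BPR) form, which is not taken from the surveyed sources; the parameters alpha = 0.15 and beta = 4 and the example link values are assumptions chosen only for illustration.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Link travel time under the BPR volume-delay function.

    ASSUMPTION: the BPR form and its default parameters are used here only
    to illustrate non-linearity; they are not prescribed by the survey.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


if __name__ == "__main__":
    t0 = 10.0      # hypothetical free flow travel time (min)
    cap = 2000.0   # hypothetical link capacity (veh/h)
    for vol in (1000.0, 1800.0, 2000.0, 2200.0):
        delay = bpr_travel_time(t0, vol, cap) - t0
        print(f"volume={vol:6.0f} veh/h  delay={delay:5.2f} min")
```

With these assumed values, the same increase of 200 veh/h adds far more delay close to capacity than at half capacity, which is the behavior described above.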

Congestion Problem and Impact on Society
Many traffic experts and companies worldwide deal with congestion issues and try to find solutions to mitigate this problem. Congestion is a major issue in most cities around the world and significantly affects citizen mobility. For example, in 2019, according to INRIX Research, the top five most congested cities were (1) Moscow, (2) Istanbul, (3) Bogota, (4) Mexico City, and (5) São Paulo [6]; three of these five cities also appear in HERE's ranking of the top five most congested cities [7]. INRIX Research calculated the congestion impact rank based on a city's population and the delay attributable to congestion. Ranking by hours lost in congestion shows that 8 of the top 10 congested cities globally are in Europe. The reason probably lies in old downtown areas with narrow, low-capacity streets, which can be traced back to the Roman period [6]. According to TomTom's ranking, the top five most congested cities worldwide are (1) Mexico City, (2) Bangkok, (3) Jakarta, (4) Chongqing, and (5) Bucharest. TomTom calculated the congestion level as the increase in overall travel times compared to the free flow or uncongested situation. In the top five cities, citizens spend approximately 50% more time traveling in congested conditions than in free flow conditions [8]. Travel speeds, congestion level, and time loss correlate positively with the population and population density of a city [6]. However, the authors in [9] warn that the INRIX and TomTom congestion indices can be pessimistic. These indices are based on speed data collected from subscribers, who tend to drive in congestion more than the average driver, and so they exaggerate the congestion level experienced by the average driver. Large cities usually generate a larger GDP and are able to develop adequate infrastructure. The authors in [10] concluded that cities with higher GDP generally build more infrastructure, which helps in reducing overall congestion.
Traffic congestion is not just a problem for commuters; it also has an impact on the economy [11] and the environment [12]. Hao et al. [13] established a framework for traffic-related evaluation of air pollution using mobile crowd-sourced data, such as cellular network data and Global Navigation Satellite System (GNSS) data. The economic impact combines direct costs, such as travel time delay and wasted fuel, with indirect ones, such as pollution from generated emissions, reduced life satisfaction, and driver stress [9]. INRIX Research estimates the congestion costs for the top 10 most congested cities to be around USD 46B in the USA, GBP 6.8B in the UK, and EUR 5.2B in Germany [6]. The congestion costs depend on the labor market, industrial sector, mode of transport, trip distance, and travel conditions. To assess these costs, public sector managers need accurate operational models. Woensel and Cruz [14] used stochastic queueing models for calculating congestion costs. Ali et al. [15] used three components for estimating the congestion cost: opportunity costs, vehicle operating costs, and fuel consumption quantity. Opportunity costs use the delay, the number of vehicles, the average vehicle occupancy for a specific mode of transport, and the value of time for a specific mode of transport, estimated using a socio-economic survey. Vehicle operating cost consists of the fuel cost for a specific mode of transport and the delay. Fuel consumption quantity includes the fuel price of specific fuel types (gasoline, diesel, compressed natural gas). In [16], the authors analyzed traffic congestion costs related to the road transport of passengers. They presented an analysis of two cost factors of road traffic congestion: productivity time loss and fuel energy consumption. The results indicate that personalized transport contributes significantly to traffic congestion and imposes significant losses on the economy.
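The cost components listed for Ali et al. [15] can be sketched as simple per-mode calculations. The text names the inputs but not the formulas, so the multiplicative forms below, the field names, and the example values are assumptions of this sketch, not the authors' published model.

```python
from dataclasses import dataclass


@dataclass
class ModeInputs:
    """Hypothetical per-mode inputs; names and units are illustrative."""
    delay_h: float            # average delay per vehicle (h)
    vehicles: int             # number of delayed vehicles
    occupancy: float          # average persons per vehicle
    value_of_time: float      # currency units per person-hour
    fuel_rate_l_per_h: float  # fuel burned per hour of delay (l/h)
    fuel_price: float         # currency units per litre


def opportunity_cost(m: ModeInputs) -> float:
    # ASSUMPTION: product of delay, vehicles, occupancy, and value of time.
    return m.delay_h * m.vehicles * m.occupancy * m.value_of_time


def fuel_quantity(m: ModeInputs) -> float:
    # ASSUMPTION: extra fuel is proportional to the total delay.
    return m.delay_h * m.vehicles * m.fuel_rate_l_per_h


def vehicle_operating_cost(m: ModeInputs) -> float:
    # ASSUMPTION: operating cost is reduced to the cost of the extra fuel.
    return fuel_quantity(m) * m.fuel_price


if __name__ == "__main__":
    cars = ModeInputs(0.5, 100, 1.5, 10.0, 1.2, 2.0)
    print(opportunity_cost(cars), fuel_quantity(cars), vehicle_operating_cost(cars))
```

Keeping the three components as separate functions makes it easy to replace any one of them with a survey-calibrated formula without touching the others.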
To mitigate the congestion problem, traffic engineers use a variety of solutions, such as congestion pricing, encouraging multimodal and public transport, and developing Intelligent Transport Systems (ITS) applications [17,18]. Hensher [19] argues that switching to the sharing economy and relinquishing private car ownership, or switching to connected autonomous vehicles, can potentially reduce congestion. A different strategy to reduce congestion can be achieved by adopting teleworking [20], which includes various programs and activities that substitute physical travel with new telecommunication technologies. Teleworking is defined as work performed from a distance, typically online, which eliminates travel to work offices. This can be tested especially in times of COVID-19 [21], where a complete or partial lockdown causes a reduced number of trips to work offices. In addition, the authors in [22] suggest that moving consumers from physical stores to online stores can potentially alleviate traffic congestion by preventing consumers from driving to physical stores.

Data Fusion in ITS
Data fusion is applied in many areas: intelligent transport systems, bioinformatics, cheminformatics, geospatial information systems, oceanography, and wireless sensor networks. Several papers review DF in ITS. El Faouzi in [23] provided a survey of how DF is used in different areas of ITS, such as Advanced Traveler Information Systems, Automatic Incident Detection, Advanced Driver Assistance, Network Control, Crash Analysis and Prevention, Traffic Demand Estimation, Traffic Forecasting, and Monitoring. El Faouzi introduced two levels of DF. The first level involves fusion that provides only raw and uncorrelated data to the end-user; its main methods are data association and positional estimation using the Kalman filter. The second level delivers meaningful information from raw data for guiding human decision-making; its methods include pattern recognition using adaptive neural networks and clustering methods, and identity fusion using Bayesian decision theory and Dempster-Shafer evidential reasoning.
Mihaylova et al. in [24] divided the DF process into a six-level hierarchy based on DF goals. Level 0 encompasses source preprocessing to address estimation and compression of input data. Level 1 includes data analysis from all appropriate sources, e.g., point, point-to-point, and area-wide sensors. Level 2 includes a complete and timely assessment of the observed data and information from external sources (weather reports, seasonal traffic patterns, special events, construction schedules, and others) by incorporating relations among the entities of interest. Level 3 evaluates traffic flow patterns and external sources to assess the occurrence of an incident, travel time delays, and other events that influence traffic flow. Level 4 improves the effect of the fusion process through traffic planning and control. Level 5 processing focuses on issues related to human support in cognitive decision making and action taking based on the fused information.
El Faouzi et al. in [25] reviewed the state of practice and prospects for DF in the management of travel demand. They proposed system architecture requirements and several DF models, and gave a brief review of major relevant industry players in data provision, data aggregation, and delivery to end users. El Faouzi and Klein in [26] warn that challenges remain in this field, including the need to obtain data with the accuracy necessary for dynamic and real-time ITS applications. They see potential in the development of methods that combine traffic sensor data and human-generated data.

Quantitative Congestion Indicators
Traffic engineers use different approaches to present and describe the state of traffic flow and to estimate the degree of congestion. The common approach is to describe the state of traffic flow using traffic flow parameters and the fundamental diagram proposed in 1935 by Greenshields [27]. Another approach relies on Lighthill-Whitham-Richards (LWR) models, first introduced in 1955 by Lighthill and Whitham [28], and independently in 1956 by Richards [29]. The three-phase traffic theory, developed by B. Kerner between 1996 and 2002 [30][31][32], proposes a division of congested traffic into two distinct phases, synchronized flow and wide moving jam. Some traffic flow parameters, or combinations of them, can be used as quantitative congestion indicators. According to [33], the fundamental traffic parameters are flow rate q (veh/h), density ρ (veh/km), and speed v (km/h), with the basic relationship q = ρ v. Other parameters used to describe the state of traffic flow are given in Table 1 [1,27,34]. The throughput is the number of vehicle-kilometers driven for a given length of road and a given time period. In [35], the authors state that the key performance indicators for urban roads are delay, density, and Level of Service (LOS). LOS is usually measured by speed, density, and volume/capacity ratio [36], and it is a quality indicator of the road network service.
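The basic relationship q = ρ v and the volume/capacity ratio used for LOS can be written as a few small helper functions. This is a minimal sketch; the function names and the example values are illustrative assumptions.

```python
def flow_rate(density_veh_per_km, speed_km_per_h):
    """Flow q (veh/h) from density rho (veh/km) and speed v (km/h): q = rho * v."""
    return density_veh_per_km * speed_km_per_h


def density_from_flow(flow_veh_per_h, speed_km_per_h):
    """Density rho (veh/km) recovered from flow and speed: rho = q / v."""
    if speed_km_per_h <= 0:
        raise ValueError("speed must be positive")
    return flow_veh_per_h / speed_km_per_h


def volume_capacity_ratio(flow_veh_per_h, capacity_veh_per_h):
    """Volume/capacity ratio, one of the measures commonly used for LOS."""
    if capacity_veh_per_h <= 0:
        raise ValueError("capacity must be positive")
    return flow_veh_per_h / capacity_veh_per_h


if __name__ == "__main__":
    q = flow_rate(25.0, 60.0)  # hypothetical: 25 veh/km moving at 60 km/h
    print(q, density_from_flow(q, 60.0), volume_capacity_ratio(q, 2000.0))
```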
There are also hybrid measures, which can be fused by combining two or more measures. The authors in [37] combined flow, measured using loop detectors, and travel time, measured using GNSS probe taxi vehicles, to calculate the link density and determine the shape of the Macroscopic Fundamental Diagram (MFD).
There are also other indicators regarding the state of traffic flow, which can be provided by GNSS probe data, such as Proportion Stopped Time (PST) and Acceleration Noise (AN). The authors in [38][39][40] used several indexes, such as the travel time index, space mean speed index, acceleration noise index, buffer index, and planning time index, to estimate congestion on a link or network. Indexes for transport link and network congestion estimation are also mentioned in the survey [35]. In Table 2, a short overview of congestion indexes is given per [41]. Hybrid indicators that combine two or more parameters can also be used for congestion description.

Table 2. Overview of congestion indexes [41].
Index | Abbreviation | Description
Speed Reduction Index | SRI | The ratio of the difference between the free flow speed $v_0$ and the actual speed $v$ to the free flow speed
(index name not recoverable) | - | The extra time that travelers must add to their average travel time when planning trips to ensure on-time arrival
Travel Rate Index | TRI | The additional time that is required to make a trip because of congested conditions on the roadway
Proportion Stopped Time | PST | The ratio of stopped time $T_s$ to the total journey time $T_r$ (running time)
Acceleration Noise | AN | Induced fluctuation in speed, computed from the speed changes $\Delta v_i$, the time intervals $\Delta t_i$ taken for each speed change, and the vehicle running time $T_r$
Acceleration Noise Index | ANI | The ratio between the actual acceleration noise and the acceleration noise in a free flow condition, $ANI = AN / AN_0$
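Several of the probe-based indexes above can be computed directly from their verbal definitions. The sketch below is a hedged illustration: the function names are hypothetical, and the acceleration noise expression, the square root of the sum of (∆v_i)^2 / ∆t_i divided by T_r, is a commonly cited approximation assumed here rather than a formula quoted from Table 2.

```python
import math


def speed_reduction_index(free_flow_speed, actual_speed):
    """SRI: (v0 - v) / v0, following the verbal definition in Table 2."""
    return (free_flow_speed - actual_speed) / free_flow_speed


def proportion_stopped_time(stopped_time, running_time):
    """PST: stopped time T_s divided by total journey (running) time T_r."""
    return stopped_time / running_time


def acceleration_noise(speed_changes, time_intervals, running_time):
    """AN from speed changes dv_i, their durations dt_i, and running time T_r.

    ASSUMPTION: AN = sqrt((1 / T_r) * sum(dv_i**2 / dt_i)).
    """
    total = sum(dv * dv / dt for dv, dt in zip(speed_changes, time_intervals))
    return math.sqrt(total / running_time)


def acceleration_noise_index(an, an_free_flow):
    """ANI = AN / AN_0, the ratio to free flow acceleration noise."""
    return an / an_free_flow


if __name__ == "__main__":
    an = acceleration_noise([2.0, -2.0], [1.0, 1.0], 8.0)  # hypothetical trace
    print(speed_reduction_index(60.0, 45.0),
          proportion_stopped_time(120.0, 600.0),
          an,
          acceleration_noise_index(an, 0.5))
```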

Data Collection Technologies
According to [42], traffic data collection technologies can be classified into three groups based on functionality: point sensors, point-to-point sensors, and area-wide sensors. Here, the word sensor refers to a traffic flow sensor, i.e., device or system that can collect traffic flow data. Point sensors include various technologies such as: inductive loops, piezoelectric sensors, video image sensors, radars, infrared sensors, acoustic sensors, pneumatic road tubes, and magnetic sensors. These sensors are usually limited in spatial coverage and are used for measuring traffic volume, speed, occupancy, and other traffic flow parameters [43][44][45].
Point-to-point sensors detect vehicles at multiple locations throughout the network and are often referred to as automated vehicle identification (AVI) sensors. Major technologies used for point-to-point detection are Bluetooth, Wi-Fi, RFID, and Automatic License Plate Recognition (ALPR). These technologies are suitable for computing travel times, route choice fractions, paths, and origin-destination (O-D) flows [46][47][48].
Some technologies do not necessarily belong to only one group and can be used as either point or point-to-point sensors. In several papers, researchers use inductive loops for vehicle reidentification and travel time estimation [49][50][51]. Cameras with video and image processing are used for collecting traffic data as both point and point-to-point sensors [52][53][54][55].
Area-wide sensors include data collection technologies that allow tracking of vehicles over a large area. The most promising are Floating Car Data (FCD) and Cellular Floating Car Data (CFCD). FCD, also known as GNSS probe data, are generated by smartphones or vehicles equipped with GNSS receivers. These vehicles are mostly part of a fleet, e.g., a taxi service, a public transport service, or a company fleet. FCD and CFCD are used for calculating a wide range of useful traffic parameters, such as space mean speed, travel time, O-D matrices [56][57][58][59], and queue length [60][61][62]. Crowdsourced data, a term often used in the literature, are data gathered from social media [63]. Posts from social media (e.g., Facebook, Twitter, WeChat, Sina Weibo) must be geotagged to be used for collecting traffic events, such as traffic accidents and jams. Such posts usually have low reliability, because users can generate random content in random places at random times and provide mixed, inaccurate information. Still, if properly handled, they can be easily combined with conventional information sources. Technologies that collect data from connected vehicles [64] and from airborne and satellite imagery also belong to area-wide sensors [65][66][67].
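Travel time estimation from point-to-point detections amounts to matching the same identifier at two locations. The following sketch assumes detections are given as (device_id, timestamp) pairs, for example hashed Bluetooth addresses; the data layout, the function name, and the outlier threshold are illustrative assumptions.

```python
from statistics import median


def match_travel_times(upstream, downstream, max_travel_time_s=1800.0):
    """Match identifiers seen at two detectors and return travel times (s).

    upstream, downstream: lists of (device_id, timestamp_s) detections.
    Matches that are non-positive or longer than max_travel_time_s are
    dropped as likely stops, detours, or mismatches (assumed threshold).
    """
    first_seen = {}
    for device_id, ts in upstream:
        # Keep the earliest upstream detection of each device.
        if device_id not in first_seen or ts < first_seen[device_id]:
            first_seen[device_id] = ts

    travel_times = []
    for device_id, ts in downstream:
        start = first_seen.get(device_id)
        if start is None:
            continue  # device was never seen upstream
        travel_time = ts - start
        if 0 < travel_time <= max_travel_time_s:
            travel_times.append(travel_time)
    return travel_times


if __name__ == "__main__":
    up = [("a", 0.0), ("b", 10.0), ("c", 20.0)]
    down = [("a", 100.0), ("b", 130.0), ("d", 50.0), ("c", 5000.0)]
    times = match_travel_times(up, down)
    print(times, median(times))
```

The median is used in the usage example because matched travel times typically contain outliers; a production system would need a more careful filter.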
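Space mean speed, one of the parameters derived from FCD, is the total distance traveled on a segment divided by the total time spent on it. A minimal sketch follows, assuming each probe traversal is available as a (distance, time) pair; the function name and the sample values are illustrative.

```python
def space_mean_speed(traversals):
    """Space mean speed (km/h) from probe traversals of a road segment.

    traversals: iterable of (distance_km, time_h) pairs, one per probe run.
    For runs over the same distance, this equals the harmonic mean of the
    individual speeds.
    """
    runs = list(traversals)
    total_time = sum(time_h for _, time_h in runs)
    if total_time <= 0:
        raise ValueError("total travel time must be positive")
    total_distance = sum(distance_km for distance_km, _ in runs)
    return total_distance / total_time


if __name__ == "__main__":
    # Two hypothetical probes on a 1 km segment, at 60 km/h and 30 km/h.
    print(space_mean_speed([(1.0, 1.0 / 60.0), (1.0, 1.0 / 30.0)]))
```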
Based on location of mounting, fixed sensors can be classified into two categories: intrusive and non-intrusive sensors [68]. Intrusive sensors are installed on or in pavement, which include inductive loops, magnetic sensors and piezosensors. Non-intrusive sensors are placed above or next to the road, e.g., on poles or consoles. The main advantages of intrusive over non-intrusive sensors are high accuracy in detecting vehicles and the negligible impact of weather conditions. The main advantages of non-intrusive over intrusive sensors are faster and cheaper installation, often without interruption of traffic flow, and multiple flexible detection zones with most of these sensors.
Every technology has its advantages and disadvantages, with several characteristics that can affect its effectiveness and suitability for certain purposes: for example, the ability to collect the traffic flow parameters of interest, accuracy, reliability, price, installation cost, privacy, etc. In Table 3, the main advantages and disadvantages of certain sensors are shown. Traffic information from only one data source is insufficient to meet the need for real-time traffic congestion estimation in a large city [69]. For many ITS applications, the information provided by individual sensors is incomplete, inaccurate, and/or unreliable [70]. Using multiple sensors for data collection helps to overcome the disadvantages of individual sensor technologies and allows for a better assessment of the traffic states. Furthermore, multisource data can be useful for modeling people's travel trajectories and inferring potential mobility patterns [70].
According to [71], the DF model can include people, web bots, or data fused at lower levels as data sources in addition to the sensors themselves. Table 4 summarizes the literature review with the aim to present the data sources used in DF for congestion estimation needs.
Wang et al. in [71] combined social media data and GNSS probe data to understand urban traffic congestion better. As an extension of this work, in [69] the authors model the traffic congestion using GNSS probe data and social media data (Twitter). To model the traffic congestion more accurately, they also extracted rich auxiliary information such as social events, physical features of the road, point of interest features, and weather information. The discovered traffic co-congestion patterns are then used to detect anomalies in the arterial network and to better estimate the traffic conditions of a large arterial network. Kong et al. [72] developed a real-time fusion-based system using GNSS probe taxi vehicle data and loop data from the Sydney Coordinated Adaptive Traffic System (SCATS). In the review [73], the authors divided the multisource data used for modeling and predicting human mobility patterns into traditional research data (survey data, bank notes, call detail records) and popular urban data (GNSS data, public transport data, social media check-in data). Zhu et al. in [74] used three different data sources: GNSS data obtained from buses, inductive loop data, and mobile phone network data, while automatic number plate recognition data were used as the ground truth. For traffic speed prediction in [75], data were obtained from inductive loop devices in combination with weather data. The study in [64] proposed a combined method for estimating traffic conditions quickly and accurately by using connected vehicles together with stationary detectors. Croce et al. [76] proposed a more accurate visualization of traffic conditions using specific elements of Transport System Models, namely zones and graphs. The procedure included DF of GNSS and traditional survey data. In [77], the authors developed a framework for traffic state estimation using data collected from point detectors (loops, cameras, and radar) and probe data (GNSS and Bluetooth).
The framework in [77] can filter, fuse, and process data from various sources in real time, providing a reliable Advanced Traveler Information Service (ATIS). The authors in [78] developed a new convex optimization framework for the route flow estimation problem using a fusion approach for loop detectors and cellular signal traces.
Most of the authors used GNSS probe data and inductive loop data for DF. The inductive loop is a well-known technology, and when it is combined with GNSS probe data, the two sources can compensate for each other's shortcomings. Most authors use GNSS probe data because of their large spatial coverage and potential for real-time traffic state estimation.

Representation of Methods Used in Data Fusion
The challenges of DF are imperfection, inconsistency, conflict, alignment and correlation, and heterogeneity of data types. Researchers address these challenges to improve efficiency and quality in traffic estimation, classification, and prediction tasks. In [69] and [71], to combine multisource data, the authors proposed a coupled matrix and tensor factorization scheme named TCE_R. A method called search tree-based pattern mining is proposed to efficiently discover which road segments, geographically close to each other, are likely to experience the co-occurrence of traffic congestion. Recursive Kalman filter-based approaches provide a solution for traffic state estimation and DF. However, when data cannot be straightforwardly aligned over space and time, the equations become computationally expensive. Therefore, the authors in [86] propose three alternative DF approaches that solve this problem and are tailored to fuse different traffic sensor data. The so-called PISCIT, FlowResTD, and extended generalized Treiber-Helbing filter (EGTF) are able to fuse multiple data sources, as long as it is possible to estimate, for each source, under which traffic conditions the data were collected (congested or free flow) [86].
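The recursive Kalman filter idea mentioned above can be sketched for a single link, where the state is the link speed and readings from two sensor types arrive in the same interval. This is a generic textbook scalar filter with an assumed random-walk model; it is not PISCIT, FlowResTD, EGTF, or the method of any surveyed paper, and all variances below are hypothetical.

```python
def kalman_predict(speed, variance, process_noise):
    """Random-walk prediction: keep the estimate, inflate its uncertainty."""
    return speed, variance + process_noise


def kalman_update(speed, variance, measurement, measurement_variance):
    """Standard scalar Kalman measurement update."""
    gain = variance / (variance + measurement_variance)
    new_speed = speed + gain * (measurement - speed)
    new_variance = (1.0 - gain) * variance
    return new_speed, new_variance


def fuse_interval(speed, variance, measurements, process_noise=4.0):
    """One time step: predict, then apply each sensor reading in turn.

    measurements: list of (value, variance) pairs, e.g. one loop detector
    speed and one GNSS probe speed (variances are assumed, not calibrated).
    """
    speed, variance = kalman_predict(speed, variance, process_noise)
    for value, sensor_variance in measurements:
        speed, variance = kalman_update(speed, variance, value, sensor_variance)
    return speed, variance


if __name__ == "__main__":
    estimate, uncertainty = 50.0, 100.0          # hypothetical prior (km/h)
    readings = [(40.0, 25.0), (44.0, 9.0)]       # loop reading, probe reading
    print(fuse_interval(estimate, uncertainty, readings))
```

The sketch presumes that both readings refer to the same link and interval; the alignment problem described above is exactly the case in which this assumption fails.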
Wang et al. [71] used GPS probes and traffic-related information collected from social media to estimate urban traffic congestion more accurately. Additionally, they extracted auxiliary information, including road congestion correlations, social events, road features, and points of interest. The results were evaluated on a real arterial network and show the effectiveness and efficiency of the proposed method. Okawa et al. [87] use the Deep Mixture Point process for event prediction in urban areas. This approach can use high-dimensional and multisource data to benefit from the rich urban context. Salanova [77] proposes a framework for data collection, filtering, and fusion to obtain real-time traffic estimation and short-term travel time prediction. The framework for DF uses the Data Expansion Algorithm explained in [88], where the authors merged estimated traffic flows and used the proposed link-based volume-delay function to assign travel times and average speed values to links. Wu et al. [78] proposed a fusion of cellular network and traffic sensor data for route flow estimation. These two very different types of data are highly uncorrelated because cellular network data cannot be exactly mapped onto the road network. To overcome this lack of spatial information, caused by a rather sparse network of cell towers, the authors introduced cellpaths, where a cellpath is defined as the trajectory a vehicle travels between two cell towers. Route flow prediction is finally achieved through a convex optimization formulation using a map, cellpath flow, link flow, and O-D flow data. The difficulty in this case is the division of the cellular network into cells; this was achieved with so-called Voronoi cells, which partition the area according to the coverage of a certain cell tower located in the center of each cell.
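A Voronoi partition assigns every location to its nearest cell tower, so mapping a trace to cells reduces to a nearest-neighbor search. The sketch below assumes planar coordinates and approximates a cellpath as the ordered sequence of cells a trace passes through; this reading, the tower layout, and the function names are illustrative assumptions and not the formulation of Wu et al. [78].

```python
import math


def nearest_tower(point, towers):
    """Return the name of the tower whose Voronoi cell contains the point.

    towers: dict mapping tower name to (x, y) in planar coordinates (assumed).
    """
    return min(towers, key=lambda name: math.dist(point, towers[name]))


def cell_sequence(trace, towers):
    """Ordered sequence of Voronoi cells visited by a trace of (x, y) points.

    Consecutive points in the same cell are collapsed into one entry.
    """
    sequence = []
    for point in trace:
        cell = nearest_tower(point, towers)
        if not sequence or sequence[-1] != cell:
            sequence.append(cell)
    return sequence


if __name__ == "__main__":
    towers = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (10.0, 10.0)}
    trace = [(1.0, 0.0), (4.0, 0.0), (6.0, 1.0), (9.0, 4.0), (10.0, 8.0)]
    print(cell_sequence(trace, towers))
```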
Wu et al. [78] proposed different models for the combined usage of cellular and loop-based data, but they were limited by the availability of real cellular data, which is restricted for privacy reasons. This probably remains the largest problem in similar approaches, because different countries have different restrictions on sharing even anonymized cellular data. Kong et al. [89] propose an online information fusion approach for urban traffic state estimation. The approach consists of three algorithmic parts: the evidential fusion, the data processing of loop detectors, and the data processing of GPS probe vehicles. In the evidential fusion model, loop detector data and GPS probe vehicle data are integrated so that the traffic states can be estimated more comprehensively and more accurately than when using only one of them. The proposed approach balances the requirements of accuracy and real-time performance well. Toole et al. in [90] presented a flexible, modular, and computationally efficient software system. The system estimates multiple aspects of travel demand using Call Detail Records (CDRs) from mobile phones in conjunction with open and crowd-sourced geospatial data, census records, and surveys. The authors used algorithms to construct O-D matrices and to route trips through a road network. Additionally, they presented an online, interactive visualization platform to communicate these results to researchers, policymakers, and the public. The flexibility of the system was tested on multiple cities around the globe.
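Evidential fusion of the kind described for loop detector and GPS probe data is commonly built on Dempster's rule of combination. The sketch below applies that rule to two mass functions over the states "free" and "congested"; the mass values are hypothetical, and the code is a generic illustration rather than the algorithm of Kong et al. [89].

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1.
    Mass assigned to conflicting (disjoint) pairs is removed and the
    remainder is renormalized.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


FREE = frozenset({"free"})
CONGESTED = frozenset({"congested"})
UNKNOWN = FREE | CONGESTED  # mass left on the whole frame expresses ignorance

if __name__ == "__main__":
    loop_evidence = {CONGESTED: 0.6, UNKNOWN: 0.4}               # hypothetical
    probe_evidence = {CONGESTED: 0.7, FREE: 0.1, UNKNOWN: 0.2}   # hypothetical
    for hypothesis, mass in dempster_combine(loop_evidence, probe_evidence).items():
        print(sorted(hypothesis), round(mass, 4))
```

Leaving part of the mass on the whole frame lets a sparse or unreliable source state that it does not know, instead of forcing it to vote for one traffic state.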

Statistic Methods
Wang et al. [79] reported a study on how to explore social media as an auxiliary data source and combine it with GNSS probe data to enhance the estimation of traffic congestion. The authors extensively collected tweets that report various traffic events such as congestion, accidents, and road construction. Next, they proposed an extended Coupled Hidden Markov Model, which can effectively integrate GNSS probe readings and traffic-related tweets to estimate the traffic conditions of an arterial network more accurately. The experimental results demonstrated the superior performance of the model in comparison with previous methods. Zhu et al. [74] used three different data sources, namely bus-based GPS data, inductive loop detector data, and mobile phone network data, which were combined using three different DF techniques. In fusing multiple data sources, the hybrid method outperformed the weighted mean approach and artificial neural networks and produced more accurate travel times. The results indicate that fusing multiple data sources does not necessarily enhance the accuracy of travel time estimation. Travel time estimation depends on the reliability of the individual data sources, and fusing highly correlated data sources can lead to a worse result. The results also show that even in dense urban areas, GPS data, when combined with inductive loop detector data, can provide reasonable travel time estimates of the general traffic stream under different traffic states. Zheng et al. [84] used a slightly different approach and combined data from social media platforms with GPS data collected from taxis. The authors collected data from both sources for 19 days in 2014. The social media data were collected from Sina Weibo, the largest micro-blog platform in China, and were filtered through a list of keywords related to transport in general.
The first step was to match the GPS data on the road network and detect the path with anomalous travel time according to the historical measurements. After that they applied DF of GPS and social media data by selecting the proper messages related to names of nearby roads and landmarks. The social media data, of course, do not contain a proper geolocation, and to solve that problem authors used a rectangular search area of one square kilometer. After statistical analysis, it was concluded that after a nonrecurring traffic incident a large number of messages is created, making them a good tool for detection and analysis of unexpected events in the traffic flow. Li et al. in [85] proposed an extended generalized filter algorithm for the urban expressway traffic state estimation. To estimate the traffic state, they used multiple sources of data from fixed sensor data (inductive loops or radar data) and GNSS probe vehicle data. Patire et al. [80] proposed a hybrid data framework; they incorporated GNSS data with loop detector data for real-time travel time estimation and concluded that using fused data gave better results. The authors concluded that better travel time estimation might be achieved by fusing a relatively small amount of probe data other than by doubling the number of loop detectors. Sohn et al. [91] introduced DF to classify the severity of road traffic accidents. The authors described various fusion methods (Dempster-Shafer, Bayesian and Logistic methods) and ensemble algorithms (Bagging and Arcing) to improve the classification accuracy and discrimination power. Fusion algorithms (Bayesian procedure showed best results) display better discrimination power than a single classifier. Dempster-Shafer algorithm appears to improve the classification performance in terms of classification accuracy. Results indicate that a clustering-based classification algorithm works best for road traffic accident classification in Korea. Jiang et al. 
[81] aimed to modify the extended generalized Treiber-Helbing filter (EGTF) to fuse GNSS data from probe vehicles with traditional traffic data from loop detectors and thereby obtain more accurate estimates of traffic states (speed, travel time) and emissions on urban expressways. Choi et al. [92] proposed an algorithm for fusing travel times from loop detector data and GPS probe vehicle data. The algorithm involves a voting technique, fuzzy regression, and a Bayesian pooling method. The results showed that the fused travel time is more accurate, reliable, and realistic than the travel time obtained with the pure arithmetic mean method. To estimate the MFD more accurately, the authors of [37,82] fused loop detector data and GNSS-FCD. In [23], the authors used a simulation of an abstract grid network to validate results. Seven multi-sensor DF-based estimation techniques were investigated in [70]. All methods were implemented and compared in terms of their ability to fuse data from loop detectors and probe vehicles (Bluetooth and GNSS) to accurately estimate freeway traffic speeds. The results show that most DF techniques improve accuracy over single-sensor approaches and that the improvement depends on the technique, the number of probe vehicles, and the traffic conditions. Mil and Piantanakulchai [93] used spurious data and traffic conditions to estimate the travel time. They used a Bayesian DF approach combined with a Gaussian mixture model to fuse travel time data originating from different types of sensors and thereby improve accuracy, precision, and completeness in terms of spatial and temporal distribution. Two elements were added to the model: the difference in traffic conditions, classified using the Gaussian mixture model, and a bias estimate for each individual sensor, introduced as a non-zero-mean Gaussian distribution learned from the training dataset. 
To prove the concept, the authors used measured data as input to a simulator with multiple types of simulated sensors: loop detector, GPS, and virtual trip line. Travel time was modeled using Gaussian mixture models with two or three components, depending on the traffic flow states characteristic of a given road. The authors reported an improvement in accuracy of at least 16.3% in mean absolute percentage error over the baseline model, with the standard deviation reduced by up to 6.03%.
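The Bayesian fusion idea described above can be illustrated with a minimal sketch. It assumes independent Gaussian sensor errors, a flat prior, and a single traffic regime, so it omits the Gaussian mixture classification of traffic conditions and is not the implementation of [93]. The class `SensorReading`, its fields, and the numeric values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    travel_time_s: float  # reported travel time in seconds
    bias_s: float         # assumed mean error of this sensor (learned offline)
    std_s: float          # assumed standard deviation of this sensor's error


def fuse_travel_times(readings):
    """Fuse bias-corrected readings by precision (inverse-variance) weighting.

    For independent Gaussian errors and a flat prior, the result is the
    posterior mean and standard deviation of the true travel time.
    """
    precisions = [1.0 / r.std_s ** 2 for r in readings]
    total_precision = sum(precisions)
    fused = sum(p * (r.travel_time_s - r.bias_s)
                for p, r in zip(precisions, readings)) / total_precision
    return fused, total_precision ** -0.5


def mape(estimates, truths):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(e - t) / t for e, t in zip(estimates, truths)) / len(truths)


# Hypothetical example: a loop detector that overestimates by 10 s and a
# GPS probe that underestimates by 2 s but is less noisy.
loop = SensorReading(travel_time_s=130.0, bias_s=10.0, std_s=20.0)
gps = SensorReading(travel_time_s=108.0, bias_s=-2.0, std_s=10.0)
fused_time, fused_std = fuse_travel_times([loop, gps])
```

Under these assumptions the fused standard deviation is always smaller than that of the best single sensor, which is the mechanism behind the accuracy gains reported for fusion. The gain shrinks when the sensor errors are correlated, in line with the observation that fusing highly correlated sources can be counterproductive.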

DNN Methods
Essien et al. [75] investigated the influence of weather conditions on traffic speed in urban conditions using the Long Short-Term Memory Neural Network (LSTM-NN). In contrast to classical artificial neural networks, the LSTM-NN can "forget" or store information over a longer period of time. This feature made models based on the LSTM-NN superior to SVM, the Kalman filter, and ARIMA in predicting speed. The authors used data obtained from inductive loops together with weather data (rainfall and temperature). As a test scenario, they chose an urban arterial road in Greater Manchester and obtained the best prediction results, in terms of minimal absolute error, with the model that combined weather data and inductive loop data. This method outperformed ARIMA by a couple of orders of magnitude. Chou et al. [94] proposed a deep learning-based framework, called Deep Ensemble stacked Long Short-Term Memory (DE-SLSTM), that integrates road network, weather, and traffic data to predict the long-term traffic time. To address the difficulty of predicting the traffic time during congestion, they adopted a "cost-sensitive" mechanism that improves the prediction accuracy during rush hours. The proposed framework fits the ground truth better than Google Maps and demonstrates good performance. Rodrigues et al. [95] used deep learning architectures to combine text information with time-series data and applied this approach to the problem of taxi demand forecasting in New York. By fusing two complementary cross-modal sources of information, the authors showed that the proposed models can significantly reduce the forecast error. Using textual information is difficult because the text must first be converted into a form that the neural network can process. 
The authors' approach is to convert words into sorted integers, so that closely related words lie closer together in the number space. Using standard performance measures, the authors empirically demonstrated that fusing these two very different data sources leads to significant reductions in forecasting error.
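The ability of an LSTM to "forget" or store information, mentioned at the start of this paragraph, comes from its gates. The following is a minimal sketch of one step of a standard LSTM cell reduced to scalar inputs and states. The function names and the weight layout are illustrative assumptions and do not reproduce the models of [75] or [94].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    w maps each gate name ("f", "i", "o", "g") to a tuple
    (input weight, recurrent weight, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: share of the old cell state to keep
    i = sigmoid(pre("i"))    # input gate: share of the candidate to write
    o = sigmoid(pre("o"))    # output gate: share of the cell state to expose
    g = math.tanh(pre("g"))  # candidate value computed from the new input
    c = f * c_prev + i * g   # new cell state (the long-term memory)
    h = o * math.tanh(c)     # new hidden state (the output)
    return h, c


# Hypothetical weights: a cell that stores its memory and one that overwrites it.
store = {"f": (0, 0, 10.0), "i": (0, 0, -10.0), "o": (0, 0, 0.0), "g": (1.0, 0, 0.0)}
forget = {"f": (0, 0, -10.0), "i": (0, 0, 10.0), "o": (0, 0, 0.0), "g": (1.0, 0, 0.0)}
_, c_stored = lstm_step(0.5, 0.0, 0.8, store)        # stays close to the old state 0.8
_, c_overwritten = lstm_step(0.5, 0.0, 0.8, forget)  # moves close to tanh(0.5)
```

In a trained network the gate weights are learned, so the cell decides from the data when an old observation, for example the onset of rainfall, should still influence the current speed prediction.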
In the image processing domain, one of the most used approaches is multiple image fusion or multiple feature fusion extraction. Li et al. [96] preprocessed original images of pavement cracks using different values of the standard deviation for the Gaussian blur method. Pavement cracks are detected in every preprocessed image, and the results are then fused into the final image. CNNs are often used after fusion for classification or feature extraction. Ke et al. [97] applied a CNN to traffic camera images to automatically extract features for estimating the occupancy and the traffic flow. The results are then fused with images that represent density and velocity estimates to achieve the final goal of traffic state estimation. Hu et al. [98] trained a CNN with multiple features extracted from images of drivers in the car. The result was an algorithm for driving behavior recognition that can detect dangerous actions of the driver. Guan et al. [99] used a CNN to fuse visible-spectrum and multispectral images. The CNN was trained to detect pedestrians in images captured in different illumination scenarios.
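The blur, detect, and fuse pipeline described for pavement cracks can be sketched on a single row of pixel intensities. This is only an illustration under strong assumptions: the detector is a plain darkness threshold and the fusion rule is a pixel-wise vote, neither of which is taken from [96]. All function names and values are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Gaussian blur of a 1D signal; border pixels are repeated (clamped)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def detect_dark(signal, threshold):
    """Assumed detector: a pixel is a crack candidate if it is darker than threshold."""
    return [v < threshold for v in signal]


def fuse_by_vote(masks, min_votes):
    """Assumed fusion rule: keep a pixel if at least min_votes scales flag it."""
    return [sum(column) >= min_votes for column in zip(*masks)]


# Hypothetical row: bright pavement, a five-pixel crack, and one dark speckle.
row = [200.0] * 8 + [60.0] * 5 + [200.0] * 8
row[2] = 60.0
masks = [detect_dark(blur(row, sigma), threshold=130.0) for sigma in (0.5, 1.0, 2.0)]
fused = fuse_by_vote(masks, min_votes=2)
```

In this example the wide crack survives blurring at all three scales, while the isolated speckle is flagged only at the finest scale and is therefore rejected by the vote. This illustrates why fusing detections across several blur levels can suppress noise.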
Some authors use a single dataset but fuse the results of multiple methods. In [100], the authors fuse the results of the K-Nearest Neighbor (KNN) method with those of the LSTM. KNN is used to capture spatial features and detect the most similar locations, and an LSTM is then used to predict the traffic flow at the observed locations. In [101], the authors combine Support Vector Regression (SVR) and the LSTM to detect abnormal passenger flow on the observed urban rail transit network. The SVR is used to compute a steady passenger flow volume series and the LSTM to model the large fluctuations in the traffic flow; the results are then combined into one framework. In many papers, authors use artificial intelligence, advanced filters (Kalman filter, Treiber-Helbing filter), or matrix factorization to fuse the data or the decisions based on the collected data. Table 5 presents the DF methods and shows that two approaches dominate, namely statistical and deep learning methods. Statistical methods were applied over the whole period from 2002 to 2020, whereas deep learning methods have been applied since 2019, as expected given the recent popularity of deep learning. DF can also decrease costs because several low-budget sensors, combined with correct data processing, can achieve the same level of accuracy as a single, more expensive sensor. For multisource DF, authors mostly used a combination of inductive loops, as point sensors, and GNSS data, as area-wide sensors. The most accurate estimation of the state of traffic flow can potentially be obtained by using at least one sensor from each group: point, point-to-point, and area-wide sensors. 
For example, point sensors (radars or inductive loops) can be used to capture an accurate situation on a road segment or network link, point-to-point sensors (Bluetooth) to enrich the area-wide data, and area-wide sensors (GNSS data) to capture a more accurate situation in the traffic network.
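The KNN step mentioned for [100], in which the locations most similar to a target are selected before a temporal model is applied, can be sketched as follows. The use of Euclidean distance between historical flow series is an assumption, and the function name and station labels are hypothetical; the similarity measure of the original work may differ.

```python
import math


def k_most_similar(target, candidates, k):
    """Return the names of the k candidate stations closest to the target.

    target is a list of flow values; candidates maps a station name to a
    list of the same length. Similarity is Euclidean distance (assumed).
    """
    def distance(series):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(target, series)))

    ranked = sorted(candidates, key=lambda name: distance(candidates[name]))
    return ranked[:k]


# Hypothetical flow series (vehicles per interval) for a target and three stations.
target_flow = [10, 20, 30]
stations = {"A": [11, 19, 31], "B": [50, 60, 70], "C": [10, 22, 30]}
neighbors = k_most_similar(target_flow, stations, k=2)
```

The flow series of the selected neighbors would then be passed, together with the target series, to the temporal model that produces the actual prediction.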
From the survey of the selected papers, several potential research directions that address the limitations of the existing methods were identified. The two dominant approaches to congestion estimation used lately are statistical and deep learning methods:
• Statistical methods can provide insights into the traffic flow conditions but fail when dealing with complex and highly nonlinear data. A statistical method is used either to offer insights into the relationships within the data and its structure or to create a model that can predict future traffic states. Statistical methods have solid and widely accepted mathematical foundations, which makes them more "understandable" than some deep learning methods.
• Much exploited, deep learning approaches create models that are "intelligent" and use a large amount of data to gain useful insights into the traffic flow and detect various patterns. Deep learning is more flexible than statistics, but there is not always a mathematical explanation of why a certain approach works better. Two approaches in using deep neural network (DNN) methods can be observed: (i) a combination of image processing-related methods, which use convolutional neural networks, and (ii) time-series analysis, which uses the long short-term memory network. DNNs have been widely applied to various transportation problems, partly because they are very generic, accurate, and convenient mathematical models able to easily simulate numerical model components. They have mainly been used as a data analytics method because of their ability to work with massive amounts of multidimensional and multisource data. DNN methods are more flexible than statistical methods; the functional form is approximated via learning and not assumed a priori, as is the case with statistics [105]. On the other hand, DNN-based models can be computationally and memory expensive [106].
Comparing the efficiency and performance of these methods is very difficult, if not impossible, because different authors tested their methods and approaches in completely different scenarios. All referenced methods claim rather high accuracy and good performance in the given environment, but it is not clear how they would perform under different conditions. To overcome that, it might be beneficial to test methods using synthetic data that contain various modalities (sensor data). Only that kind of test would yield a good comparison of different approaches aiming to solve the same problem.
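A synthetic test of the kind suggested above could look like the following minimal sketch. It generates one ground-truth speed profile, derives two noisy "sensor" views from it, and scores different fusion rules against the same truth. The speed profile, the noise levels, and all function names are assumptions chosen for illustration, not a proposed standard benchmark.

```python
import math
import random


def make_benchmark(n, seed=0, loop_std=3.0, probe_std=4.0):
    """Create a synthetic speed profile and two noisy sensor views of it."""
    rng = random.Random(seed)
    # Hypothetical ground truth: about 60 km/h with a periodic slowdown.
    truth = [60.0 - 25.0 * max(0.0, math.sin(2 * math.pi * t / 288)) for t in range(n)]
    loop = [v + rng.gauss(0.0, loop_std) for v in truth]
    probe = [v + rng.gauss(0.0, probe_std) for v in truth]
    return truth, loop, probe


def rmse(estimate, truth):
    """Root mean square error between an estimate and the ground truth."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


def simple_mean(loop, probe):
    return [(a + b) / 2.0 for a, b in zip(loop, probe)]


def inverse_variance_fusion(loop, probe, loop_std=3.0, probe_std=4.0):
    """Weight each source by the inverse of its (assumed known) error variance."""
    w_loop, w_probe = 1.0 / loop_std ** 2, 1.0 / probe_std ** 2
    return [(w_loop * a + w_probe * b) / (w_loop + w_probe) for a, b in zip(loop, probe)]


truth, loop, probe = make_benchmark(2000)
scores = {
    "loop only": rmse(loop, truth),
    "probe only": rmse(probe, truth),
    "simple mean": rmse(simple_mean(loop, probe), truth),
    "inverse variance": rmse(inverse_variance_fusion(loop, probe), truth),
}
```

Because every method is scored against the same known truth, the resulting errors are directly comparable, which is not the case when each method is evaluated on its own field dataset.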
To estimate performance indicators of the traffic flow and to determine congestion, authors mostly used travel time or flow speed. At the time of writing this paper, no work had tried to fuse the congestion indexes listed in Table 2 with the abovementioned methods. It remains to be explored whether congestion can be estimated, and more realistic and accurate information about the performance of the traffic flow obtained, from congestion indexes fused with multi-sensor technologies. Multisource DF provides information about the state of traffic on network links at a higher resolution, which can then be incorporated into the overall picture of a traffic network. For example, point detectors (radar, video, inductive loops) can measure traffic flow in each traffic lane separately. The traffic volume can differ significantly from one lane to another, and that is information that point-to-point or area-wide sensors cannot provide. This means that only a combination of sensors can provide such detailed, high-resolution information about the traffic flow performance on a network link. This detailed information can be used in advanced ITS applications, such as dynamic traffic management (DTM), advanced traveler information systems, adaptive intersection control, routing, and traffic information services.

Conclusions
Mobility is one of the essential human needs, and traffic congestion is a negative side effect of it. Traffic engineers use various approaches to solve or reduce this negative side effect. To achieve the best results, they need to accurately estimate the traffic flow performance or the level of congestion. One approach to accurate estimation of the traffic flow performance is to use data collected from multiple sensors and find a clever way to fuse these data. This review gives an overview of papers in which the authors tried to determine the traffic state or predict congestion using a multisource DF approach. Additionally, it presents the different DF methods and techniques used for traffic congestion estimation. The review aimed to provide a basic analysis of recently used approaches in data fusion and to guide researchers in this field by helping them determine which approaches are most likely to provide good results. The dominant data fusion approaches used from 2010 to 2020 are statistical methods, and among the most recent approaches DNNs dominate. It can be concluded that estimation of the traffic flow should generally be based on data from multiple different sources to provide resistance to the biases that some data collection technologies can introduce. Another beneficial tool in the development of prediction methods, such as those utilizing data mining, would be a standardized testing dataset composed of various multisource data, which would provide actual numerical evidence of how successful an approach is, given the available data.
Author Contributions: The conceptualization of the study was done by D.C., M.M., L.T., and N.J., who also did the funding acquisition. The writing of the original draft and preparation of the paper was done by D.C., and M.M. All authors contributed to the writing review and final editing. The supervision was done by N.J. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been financially supported by the University of Zagreb and Faculty of Transport and Traffic Sciences.