An Innovative and Cost-Effective Traffic Information Collection Scheme Using the Wireless Sniffing Technique

Lee, Wei-Hsun; Liang, Teng-Jyun; Wang, Hsuan-Chih

doi:10.3390/vehicles4040054


by Wei-Hsun Lee ^1,2,*, Teng-Jyun Liang ^1 and Hsuan-Chih Wang ^1

^1 Department of Transportation & Communication Management Science, National Cheng Kung University, No. 1 University Road, Tainan 701, Taiwan

^2 Center for Innovative FinTech Business Models, National Cheng Kung University, No. 1 University Road, Tainan 701, Taiwan

^* Author to whom correspondence should be addressed.

Vehicles 2022, 4(4), 996-1011; https://doi.org/10.3390/vehicles4040054

Submission received: 30 July 2022 / Revised: 9 September 2022 / Accepted: 20 September 2022 / Published: 30 September 2022

(This article belongs to the Topic Information Sensing Technology for Intelligent/Driverless Vehicle)


Abstract:

In recent years, the wireless sniffing technique (WST) has become an emerging technique for collecting real-time traffic information. The spatiotemporal variations in wireless signal collection from vehicles provide various types of traffic information, such as travel time, speed, traveling path, and vehicle turning proportion at an intersection, which can be widely used for traffic management applications. However, three problems challenge the applicability of the WST to traffic information collection: the transportation mode classification problem (TMP), lane identification problem (LIP), and multiple devices problem (MDP). In this paper, a WST-based intelligent traffic beacon (ITB) with machine learning methods, including SVM, KNN, and AP, is designed to solve these problems. Several field experiments are conducted to validate the proposed system: three sensor topologies (X-type, rectangle-type, and diamond-type topologies) with two wireless sniffing schemes (Bluetooth and Wi-Fi). Experiment results show that X-type has the best performance among all topologies. For sniffing schemes, Bluetooth outperforms Wi-Fi. With the proposed ITB solution, traffic information can be collected in a more cost-effective way.

Keywords:

vehicle sensing; wireless sniffing; machine learning; intelligent traffic beacon; vehicular network

1. Introduction

The planning of transportation policies and strategies relies heavily on traffic information. Without comprehensive traffic information, transportation engineers and practitioners cannot precisely design transportation plans, traffic signal plans, and so forth. Common techniques for collecting traffic information include vehicle detectors (VD), automatic vehicle identification (AVI), GPS-based vehicle probing (GVP), ETC-based vehicle probing (EVP), and cellular-based vehicle probing (CVP). However, these techniques have several limitations:

Traffic information for transportation modes other than vehicles, such as walking and biking, is not easily collected with these techniques.
The costs of installation and maintenance of these techniques are high.
Penetration rate is low due to installation cost. For instance, EVP systems require the installation of an on-board unit (OBU) and road-side unit (RSU), and four ETC gantries are required to detect the turning proportion of vehicles at an intersection.

Wireless signal analysis technology, named the wireless sniffing technique (WST), is an emerging technology for collecting traffic information. The main idea of the WST is to sniff Wi-Fi or Bluetooth (BT) wireless packets broadcast from in-vehicle mobile devices, such as smartphones, smartwatches, Android Auto, or Apple CarPlay. The sniffed packets expose some basic public information about the mobile devices, such as the media access control (MAC) address, received signal strength indicator (RSSI), and timestamp. Since the MAC address is a unique identifier (ID) for each mobile device, the number of mobile devices, average speed, origin–destination (OD), travel path information, and vehicle turning proportion at an intersection can be easily obtained by the WST. If the Wi-Fi or BT of a mobile device is turned on, the device can be detected by sensors without interrupting its use. The advantages of the WST are summarized as follows.

Immediacy: The time needed from data collection to uploading to the cloud can be achieved in almost real time.
High penetration: A high percentage of vehicles have been equipped with wireless automobile systems such as Android Auto or Apple CarPlay. It is reasonable to assume that every driver or passenger may have smartphones that can be detected by the WST.
Low cost: The costs of a single microcomputer, a wireless network card, and the maintenance fee are low.
Two-way transmission: The scanning mechanisms, such as Wi-Fi or BT, support two-way data transmission.

Although the WST is a promising technique for traffic information collection, there are three challenging problems: the transportation mode classification problem (TMP), the lane identification problem (LIP), and the multiple devices problem (MDP). The TMP is the problem of correctly identifying the transportation mode (e.g., car, bike, or walking) of the owner of the sniffed mobile device. The LIP refers to the problem of correctly identifying the lane that the sniffed mobile device is traveling in. The MDP refers to the double counting problem caused by multiple devices in the same vehicle.

Several previous studies have proposed methods to address the TMP [1,2,3,4,5], but few studies aim at solving the LIP or MDP. Duarte and Hu [2] first investigated the classifying of vehicles using wireless distributed sensor networks. The study proposed that vehicle types can be classified by applying machine learning techniques to collected acoustic, seismic, and infrared signal data. Other studies indicated channel state information collected by Wi-Fi can be applied to classify vehicle types by analyzing the spatiotemporal correlations of CSI amplitude and identified phase data [1, 3–5]. Nonetheless, a Wi-Fi CSI traffic information collection scheme requires direct transmitting of data from mobile devices, the applicability of which is also limited to the penetration rate of devices. On the other hand, the WST sniffs signals from in-vehicle devices instead of requiring data transmission, which enables the WST to be applied in a multi-lane scenario and be deployed flexibly.

An intelligent traffic beacon (ITB) was first proposed by Fan [6] with a Z-type topology, as shown in Figure 1. The ITB applies the WST for traffic information collection featuring seven heuristic algorithms to address the TMP, LIP, and MDP. However, there are several limitations in the study. First, the system in Fan’s study [6] assumed no lane-changing behaviors. The experiments were conducted in a freeway tunnel located in Taiwan, where lane-changing behavior is not allowed. Second, the study assumed a relatively simple traffic environment, and only included limited transportation modes. Types of vehicles traveling on freeways are limited to passenger cars, buses, and trucks. However, other transportation modes, such as scooters, bikes, or walking, are commonly seen in any urban scenario, which are not supported in the ITB proposed by Fan [6]. Third, spacing between two vehicles is larger and more stable in freeway tunnels than in general highways due to strict regulations on freeways. Moreover, Fan [6] only tested the Z-type topology, where each ITB would identify the lane information when a vehicle passed by, and lane position of each vehicle would be decided by voting mechanisms. The advantage of the voting algorithm used in the freeway tunnel scenario is that it prevents misidentification of some ITBs. However, this mechanism is supported only in a no lane-changing scenario, and it may fail and yield incorrect results once a driver changes lanes. Furthermore, the Z-type topology with a voting mechanism has been proved not to be a cost-effective solution. The assumptions mentioned above are not practical for most general roads.

The ITB introduced in Fan [6] has many limitations and can only be applied to a freeway tunnel scenario. The motivation and goal of this paper are to reinvent the ITB and propose an integrated solution to the LIP, TMP, and MDP for urban road scenarios. Since the three problems are independent, machine learning models, including the hierarchical support vector machine (SVM), k-nearest neighbor (KNN), and affinity propagation (AP), are integrated to solve them. The ITB supports dual communication schemes (i.e., Bluetooth and Wi-Fi), which enable detection results to be more accurate and stable. To evaluate the accuracy of different ITB deployment topologies, three types are studied by field experiments: X-type, rectangle-type, and diamond-type topologies. The performance of these topologies is evaluated and discussed in Section 5. The contributions of this study are summarized as follows:

Three traffic information collection problems in the wireless sniffing technique—TMP, LIP, and MDP—are defined and extended to urban scenarios.
Three machine learning models, and two communication schemes, BT and Wi-Fi, are integrated in an ITB to solve these problems.
The performance of three ITB topologies, the X, rectangle, and diamond topologies, is evaluated in field experiments.

The remainder of this paper is organized as follows. Section 2 discusses the related works in the literature. Section 3 defines the three problems and discusses ITB deploying topologies and scenarios. Section 4 details the proposed solutions and ITB system design. Section 5 presents the experiment design, solution methodology, and discusses the results. Section 6 concludes this paper and discusses future works.

2. Literature Review

Alessandrini et al. [7] deployed 20 detectors and the distribution of human flow was analyzed by Wi-Fi signal and big data. Du et al. [8] collected the data of people flow via sniffing Wi-Fi. The results show that the detection rate of Wi-Fi is usually lower than other methods. It may be triggered by the proportion of those turning on Wi-Fi, the decline in the RSSI affected by multiple paths, and the variability of different devices. Dunlap et al. [9] installed a mobile phone on the bus and used the app to detect the surrounding BT and Wi-Fi signals. Meanwhile, GPS data were collected to analyze the passengers’ origin, destination, and transfer information at different sites. El-Tawab et al. [10] installed Raspberry Pi at a bus stop and calculated the waiting time of the passengers at each station. Jiang et al. [11] set up a Wi-Fi router on a bus for passengers to connect and analyze passenger behavior. Mikkelsen et al. [12] installed Raspberry Pi on a bus and collected the people flow data in different days and directions. The sniffing frequency may be affected by the network card, driver, mobile devices, operating system, and other applications. Oransirikul et al. [13] used Raspberry Pi to detect the Wi-Fi signal of passengers at a bus station and estimated the number of passengers.

Ding et al. [14] used Wi-Fi signals to observe the speed and traffic flow of the highway field, and VD data were used as the ground truth. The result shows the traffic flow would be overestimated or underestimated in some cases. Friesen et al. [15] used BT and Xbee communication technology to observe the traffic flow and analyzed traffic data at intersections. Fang et al. [16] used the big data from built-in sensors of mobile devices, such as magnetometers, gyroscopes, and accelerometers, to classify the transportation modes. Machine learning models such as decision tree, KNN, and SVM were used, and the number of features were amplified from 7 to 14 to explore whether the accuracy could be increased or not. Goodall et al. [17] conducted several tests under Wi-Fi and BT communication, including the detection range, transmission frequency, vehicle speed, and transmission rate. The results of the study show Wi-Fi packets are transmitted at a lower frequency and have a lower probability of successful transmission. In the collection of data, the Wi-Fi detector detects more MAC addresses than the BT detector, but the effect is worse when the vehicle passes through more than three consecutive detectors. Due to the low frequency of broadcasting, Wi-Fi signals are suitable for use on low-speed vehicles or roads with lower traffic flow. Jahangiri et al. [18] used sensors in mobile devices such as accelerometers, gyroscopes, and rotational vector sensors to collect data and identify five transportation modes by KNN, SVM, and decision tree-based related models. Won et al. [5] installed a Wi-Fi router (transmitter) on one side of the road and a notebook (receiver) on the opposite side of the road. Signals transmitted from the router would vary when a vehicle passed by. The signal variation would be transmitted to the notebook and identify its transportation mode by SVM. However, it is difficult to put into practice due to the high cost. 
Yang [19] applied a genetic algorithm model (GANN), KNN, and SVM to BT data to identify transportation modes (motor vehicles, bicycles, pedestrians). GPS data were collected as the ground truth. This research illustrates the difficulty of collecting traffic information, such as traffic flow, direction, lane identification, and transportation modes, with BT technology.

For Wi-Fi and BT data, the common applications to traffic are travel time estimation and transportation mode identification, especially BT data. As for Wi-Fi data, further applications are analyzing human flow or indoor positioning [20]. For vehicle detection, many studies use GPS and sensors in mobile devices, such as a gyroscope, magnetometer, and accelerometer, to identify transportation modes. However, it is not easy to obtain the data mentioned above, and GPS and sensors are not as extensive as Wi-Fi or BT. For traffic information, image recognition is commonly used to identify lane information [21]. Moreover, due to the development of artificial intelligence, lane information based on deep learning is also used [22]. In addition, Aliari [23] mentions that only 2%–3.4% of the traffic flow could be detected via BT communication, and it is not suitable to collect lane information via BT communication, either. Ding et al. [14] mentioned cars with multiple mobile devices or no device would cause the overestimation or underestimation of traffic flow, and this is a significant problem needing to be solved in the future. Except for the attempt of Fan [6], there is no research demonstrating an ability to solve the three issues effectively based on the WST, especially the LIP and MDP.

3. Problem Definition

3.1. Transportation Mode Classification Problem (TMP)

Normally, mobile devices with wireless communication schemes on road sections can be sniffed; for instance, smartphones carried by pedestrians or cyclists and static devices in surrounding areas such as laptops or Wi-Fi access points. It is crucial to identify the transportation mode of the collected wireless signal data. In this study, four transportation modes are to be identified: passenger car, scooter, bike, and pedestrian. The TMP is defined in Equation (1), where $O_{x}$ denotes object x (vehicle, bike, or passenger) and $C_{y}$ denotes transportation mode y.

$$\text{Determine } O_{x} \in C_{y} \tag{1}$$

3.2. Lane Identification Problem (LIP)

The objective of solving the LIP is to determine which lane the vehicle is traveling in, and whether the detected vehicle has lane-changing behavior in the sensing area. For example, if a road consists of two lanes in each direction, when a driver changes lanes, the proposed model should identify this behavior and the lane-changing direction (i.e., from inner lane to outer lane or from outer lane to inner lane). The definition of the LIP is shown in Equation (2), where $M_{i}$ is the i-th mobile device, and $L_{ii}$, $L_{oo}$, $L_{oi}$, and $L_{io}$ are the interior lane, the exterior lane, changing from the exterior lane to the interior lane, and changing from the interior lane to the exterior lane, respectively.

$$\text{Determine } M_{i} \in \{L_{ii}, L_{oo}, L_{oi}, L_{io}\} \tag{2}$$

3.3. Multiple Devices Problem (MDP)

Identifying the number of pedestrians by sniffing mobile devices is intuitive since in most cases one person carries only one mobile device. However, it is a huge challenge to estimate the number of vehicles from the number of sniffed devices (the count of MAC addresses) because there might be more than one device in a vehicle. The MDP is the problem of counting the vehicles from the collected sniffed data, and it is defined in Equation (3), where $M_{i}$ denotes the i-th sniffed mobile device and $O_{x}$ refers to an object x (vehicle, bike, or passenger) that carries $M_{i}$.

$$\text{Determine } M_{i} \in O_{x} \tag{3}$$

3.4. Topologies and Scenario

Three topologies were designed to evaluate the WST: the X, rectangle, and diamond topologies, as shown in Figure 2. Data collected with two communication technologies, Wi-Fi and BT, are evaluated in these topologies. Experiments and observations showed that the transmission distances of Wi-Fi and BT were both about 100 m. As a result, for the X-type and diamond-type topologies, the distance between ITBs was 100 m in the LIP experiment to ensure that the sniffing ranges of the ITBs overlapped. The rectangle-type topology was compared with the X-type topology to examine whether the middle ITB influences performance. The distance between ITBs for the rectangle-type topology was 200 m. In the TMP and MDP experiments, due to experimental limitations, the distance between ITBs was 50 m for the X-type and diamond-type topologies and 100 m for the rectangle-type topology.

The road configuration used in this study is summarized as follows. In each direction, there are two normal lanes, a low-speed vehicle lane, and a sidewalk. As shown in Figure 3, cars are permitted to travel in both normal lanes, scooters and bikes use the low-speed vehicle lane, and pedestrians walk on the sidewalk.

Four assumptions were made in this paper. First, at least one mobile device has turned on either Wi-Fi or BT. Second, there are four transportation modes in this scenario, including car, scooter, bike, and walking. Third, the number of packets and the strength of RSSI will not be affected by the traffic peak hours and off-peak hours. Last, since normal urban traffic might be influenced by traffic jams, traffic signals, weather, or other conditions, the average free flow speed for vehicles on these two normal lanes is set as about 30–40 km/h.

4. System Design

4.1. Transportation Mode Classification Problem (TMP)

When a vehicle is passing through two ITBs, the speed of a vehicle can be calculated from the distance between the two ITBs divided by vehicle’s travel time. In this study, the KNN model (k-nearest neighbors) was applied to classify four transportation modes: cars, scooters, bikes, and pedestrians. The ideal sniffed signal patterns from a moving mobile device passing through one ITB are illustrated in Figure 4. It is obvious that the contacting window (i.e., time length of a signal pattern) will be short if a vehicle has high speed (e.g., car) and will be long if moving speed is low (e.g., pedestrian).
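The speed calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `estimate_speed_kmh` and the sample timestamps are hypothetical, and it assumes each ITB reports a single detection time per MAC address (for example, the time of the strongest RSSI).

```python
from datetime import datetime

def estimate_speed_kmh(distance_m, t_first, t_second):
    """Average speed between two ITBs from the detection times of one MAC address."""
    travel_time_s = (t_second - t_first).total_seconds()
    if travel_time_s <= 0:
        raise ValueError("second detection must be later than the first")
    return distance_m / travel_time_s * 3.6  # convert m/s to km/h

# Hypothetical detections of the same device at two ITBs placed 50 m apart.
t_itb1 = datetime(2022, 1, 1, 8, 0, 0)
t_itb2 = datetime(2022, 1, 1, 8, 0, 5)
speed = estimate_speed_kmh(50.0, t_itb1, t_itb2)
print(round(speed, 1))
```

A device that needs 5 s to cover 50 m is moving at car or scooter speed, whereas a pedestrian would need well over 30 s for the same distance.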

4.2. Lane Identification Problem (LIP)

In an ideal case, RSSI signal strength has a negative correlation with the distance between a mobile device and an ITB, which suggests that this relationship can be used to identify the lane position of a vehicle. That is, one can infer which lane a vehicle is traveling in from the variations in RSSI. However, using absolute RSSI in this way is not practically feasible: RSSI patterns vary from device to device, so it is hard to estimate the distance between a device and an ITB from signal strength alone. Assuming that there is no variability within each device and ITB pair, it is still possible to identify the lane information by comparing the RSSI data sniffed by different ITBs in the topology. All sniffed data from each ITB in this study were uploaded to a cloud platform and processed by machine learning models to classify the lane information for each device.

An example of the LIP in which ITB deployment follows the topology in Figure 3 is illustrated in Figure 5a. When the vehicle is traveling in the inner lane, theoretically, ITB3 and ITB6 would sniff the highest strength RSSI, and ITB2 and ITB5 would sniff the lowest strength RSSI. The case of vehicle with lane-changing behavior is illustrated in Figure 5b. If the vehicle changes from the inner lane to the outer lane, ITB4 would sniff the highest strength RSSI, and ITB5 and ITB6 would sniff the lower RSSI, with ITB5 detecting the lowest one.

4.3. Multiple Devices Problem (MDP)

Similar information was used to identify whether several devices were located in the same car. Mobile devices in the same vehicle have similar RSSI patterns, such as similar detection times and peak periods, as illustrated in Figure 6b. On the other hand, the collected RSSI signal data may diverge into several groups if the mobile devices are in different vehicles, as shown in Figure 6a. Therefore, an unsupervised clustering model should be applied to group signals with similar spatiotemporal patterns. In this work, affinity propagation [24] was chosen as the proposed solution for the MDP.

4.4. Framework

The framework for solving the three problems in the WST proposed in this work is shown in Figure 7. First, the wireless signal data collected by the ITB was uploaded to the MySQL database. During data preprocessing, datasets were transformed into features as the input of models and normalized after outlier filtering. Ground truths were labeled in the datasets for solving the LIP and TMP. For the LIP, lane information was labeled with the collected data, such as the inner lane or the outer lane. The ground truth of vehicle type, such as car, scooter, bike, or walking, was labeled for application to the TMP.

The basic features are the maximum RSSI, the minimum RSSI, and the count of packets for each mobile device; the detailed features are discussed with each problem in Section 5. The classification or clustering accuracy for each problem is presented for the different topologies and communication technologies.
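As a rough illustration of how the basic features could be derived from raw sniffed packets, the sketch below groups records by device and ITB and computes the three statistics. The record layout `(MAC, ITB id, RSSI)`, the function name `basic_features`, and the sample values are assumptions made for this example; outlier filtering and normalization are omitted.

```python
from collections import defaultdict

def basic_features(packets):
    """Group sniffed packets by (MAC, ITB) and compute max RSSI, min RSSI, packet count."""
    rssi_by_pair = defaultdict(list)
    for mac, itb_id, rssi in packets:
        rssi_by_pair[(mac, itb_id)].append(rssi)
    return {
        pair: {"max_rssi": max(v), "min_rssi": min(v), "count": len(v)}
        for pair, v in rssi_by_pair.items()
    }

# Hypothetical records: (MAC address, ITB id, RSSI in dBm).
packets = [
    ("aa:bb:cc:00:00:01", "ITB1", -72),
    ("aa:bb:cc:00:00:01", "ITB1", -55),
    ("aa:bb:cc:00:00:01", "ITB2", -80),
    ("aa:bb:cc:00:00:02", "ITB1", -64),
]
features = basic_features(packets)
print(features[("aa:bb:cc:00:00:01", "ITB1")])
```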

4.5. Hardware and Software

The ITB hardware was a customized LTE (4G) router with Bluetooth, Wi-Fi, and LTE interfaces, where the receiver sensitivities were −85, −76, and −72 dBm for the Wi-Fi 802.11 b/g/n interfaces, respectively. The antenna was dual band (2.4 G/5 G), about 14.5 cm long, with a gain of 2 dBi. A packet analyzer ran under the Linux operating system: the ‘tcpdump’ command was used to sniff the wireless signal, and the collected data were uploaded to the cloud server and saved in the MySQL database. Python was used for raw data preprocessing, and Scikit-learn was used for model training and testing. Jupyter Notebook, a web application, was used to share documents and write Python programs.

An example of the collected raw signal data is shown in Figure 8. In this scenario, six smartphones (three iOS and three Android) with Wi-Fi turned on were carried in one vehicle traveling at 40 km/h. The contact period was about 24 s, and the RSSI signal strength ranged from −80 to −50 dBm.

5. Experiments

5.1. Transportation Mode Classification Problem (TMP)

The field experiment for the TMP was performed on the campus of National Cheng Kung University, as shown in Figure 9, in an enclosed field with a road width of 8 m. Four testers drove a car, rode a scooter, rode a bike, or walked through different ITB topologies several times. Drivers did not change lanes during driving. Ten mobile devices were carried in the testing vehicles. There were 355 samples for both the rectangle-type topology and X-type topology with BT, 335 samples for the diamond-type topology with BT, 392 samples for both the rectangle-type topology and X-type topology with Wi-Fi, and 397 samples for the diamond-type topology with Wi-Fi. The speed of the car and scooter was about 35 km/h, the speed of the bike was about 15 km/h, and the pedestrians walked at 3–5 km/h.

Identifying transportation modes is more challenging in the urban scenario than in the freeway scenario since there are more vehicle types in the urban area. Since the number of vehicle types is finite, the TMP can be treated as a classification problem. The k-nearest neighbors (KNN) method, a supervised machine learning model, was used to classify the transportation modes. It finds the k samples in the training dataset that are closest to a new sample and classifies the new sample by a majority vote over the categories of these k samples. The value of k is user-defined, and the KNN model was trained and tested with different k values. In the TMP, both 10-fold cross-validation and confusion matrices were used to measure the model performance. The dataset was split into a training set (75%) and a testing set (25%). Four kinds of features were selected for the TMP: minimum RSSI, maximum RSSI, packet count, and time duration for each device and ITB pair.
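The majority-vote rule of KNN can be illustrated with a small pure-Python sketch. It is not the Scikit-learn pipeline used in the study: the function `knn_predict`, the Euclidean distance, and the toy feature vectors `[max RSSI, min RSSI, packet count, duration in s]` are assumptions for illustration, and the normalization step is skipped for brevity.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training indices by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical samples: [max RSSI, min RSSI, packet count, duration in s].
train_x = [
    [-50, -85, 12, 10], [-52, -84, 10, 11], [-48, -86, 14, 9],
    [-55, -88, 60, 120], [-53, -87, 65, 110], [-54, -89, 58, 130],
]
train_y = ["car", "car", "car", "walking", "walking", "walking"]

fast_device = knn_predict(train_x, train_y, [-51, -85, 11, 10], k=3)
slow_device = knn_predict(train_x, train_y, [-54, -88, 62, 115], k=3)
print(fast_device, slow_device)
```

In this toy data the long contact duration and high packet count of the slow device dominate the distance, which is why unscaled features would normally be normalized first.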

The size of the sniffed sample was 355 by Bluetooth and 392 by Wi-Fi. Three TMP confusion matrices for BT are shown in Figure 10. In both X-type topology and rectangle-type topology, one scooter was misidentified as a bike. For the diamond-type topology, three cars were misidentified as scooters, and three scooters were misidentified as cars. Figure 11 shows the confusion matrices for Wi-Fi. In the X-type topology, one bike was misidentified as walking, three cars were misidentified as scooters, and a scooter was misidentified as a car. In the rectangle-type topology, one bike was misidentified as walking, a car was misidentified as a scooter, and three scooters were misidentified as cars. In the diamond-type topology, a car was misidentified as a scooter and a scooter was misidentified as a car.

Figure 12 shows the accuracy of 10-fold cross-validation in three topologies, where k means the k value in the KNN method, and the value of k in each topology was chosen from the best k value by trial and error from k = 1 to 15. The results show that the accuracy of BT data was 98.9% and Wi-Fi data was 94.9%. For both the X-type topology and rectangle-type topology, the performances of BT were better than Wi-Fi. On the contrary, the performance of Wi-Fi was better than BT in the diamond-type topology.

5.2. Lane Identification Problem (LIP)

The experiment collected wireless signals in different lanes and routes. The experiment field was on Chengnan Rd., Annan Dist., Tainan, Taiwan (as shown in Figure 13). It is a semi-enclosed field with two lanes in each direction. In this experiment, a car carrying ten mobile devices with Wi-Fi and BT turned on passed by the ITBs several times at a speed of 30 km/h, which represents the average travel speed of a road section in an urban scenario. Two scenarios were performed in this experiment.

Driving on a fixed lane (no lane-changing behavior): Driving in the interior lane and the exterior lane several times and collecting Wi-Fi and BT signals.
Driving and changing lanes: There were two routes for each topology, changing lanes from the interior one to the exterior one, and vice versa.

For the LIP, the lane information was distinguishable and so it was a classification problem. In addition, there were about 200–500 samples (small sample sets) for both BT and Wi-Fi with three topologies. The SVM, a widely used supervised learning model, was applied to solve the LIP because it is more intuitive to this classification problem compared to other machine learning models. In a high-dimensional space, the SVM model develops a hyperplane to separate the samples to achieve the classification effect. The process of obtaining the best hyperplane can be regarded as an optimization problem. The formulas are as follows:

$$\min_{w} \; \frac{1}{2} \|w\|_{2}^{2} + C \sum_{i=1}^{N} \epsilon_{i} \tag{4}$$

$$\text{subject to } \; y_{i} (w^{T} x_{i} + b) \geq 1 - \epsilon_{i}, \quad \epsilon_{i} \geq 0, \quad \forall x_{i} \tag{5}$$

In (4), maximizing the margin $\frac{2}{\|w\|}$ is equivalent to minimizing $\frac{\|w\|}{2}$, which is written in the squared form $\frac{1}{2}\|w\|_{2}^{2}$; $y_{i}$ equals 1 or −1 and is the class label of sample $x_{i}$; $w^{T} x + b = 0$ defines the separating hyperplane; $\epsilon_{i}$ is the slack variable, and C is a regularization parameter which is used to prevent overfitting.
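To make the roles of the slack variables and C concrete, the following minimal Python sketch evaluates the objective in (4) for a given w and b. It assumes that each slack takes its smallest value allowed by (5), that is, max(0, 1 − y_i(w^T x_i + b)); the function name `soft_margin_objective` and the toy samples are hypothetical and are not taken from the experiments.

```python
def soft_margin_objective(w, b, xs, ys, c):
    """Return (1/2)*||w||^2 + C*sum(slack) and the slacks, with slack_i = max(0, 1 - y_i*(w.x_i + b))."""
    slacks = []
    for x, y in zip(xs, ys):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - y * score))
    norm_sq = sum(wj * wj for wj in w)
    return 0.5 * norm_sq + c * sum(slacks), slacks

# Toy two-dimensional samples with labels +1 / -1 (illustrative only).
xs = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
ys = [1, 1, -1]
value, slacks = soft_margin_objective([1.0, 0.0], 0.0, xs, ys, c=1.0)
print(value, slacks)
```

Only the second sample lies inside the margin, so it is the only one with a nonzero slack; a larger C would penalize that violation more heavily.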

A two-layered hierarchical architecture comprising three independent SVM models was proposed to solve the LIP, as shown in Figure 14. The top-layer SVM identifies whether a vehicle is changing lanes. Two lower-level SVMs were designed: one determines the lane-changing direction if the vehicle changes lanes; the other determines which lane the vehicle is traveling in if it does not change lanes. The signal data were collected based on three topologies: X-type topology, rectangle-type topology, and diamond-type topology. Since the variations in RSSI for each mobile device and each ITB are the critical pieces of information, three features were selected for training the SVM models: packet count, maximum RSSI, and minimum RSSI. The three SVM models were constructed as follows:

SVM 1: Classifying lane-changing behavior.

SVM 2: If the vehicle changes lanes, determine which route the vehicle took.

SVM 3: If the vehicle is in a fixed lane, determine if it is in the interior or the exterior lane.
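The routing logic of the two-layer hierarchy can be sketched as follows. The three classifiers are passed in as plain callables; the stub functions below are arbitrary placeholder rules standing in for trained SVMs, and the feature keys, thresholds, and label strings are all hypothetical.

```python
def classify_lane(features, svm1, svm2, svm3):
    """Route one sample through the two-layer hierarchy and return a lane label."""
    if svm1(features) == "changing":
        return svm2(features)  # lane-changing direction
    return svm3(features)      # fixed lane: interior or exterior

# Placeholder stand-ins for trained models (not learned, not physically meaningful).
def stub_svm1(f):
    return "changing" if f["max_rssi"] - f["min_rssi"] > 30 else "fixed"

def stub_svm2(f):
    return "inner_to_outer" if f["packet_count"] > 20 else "outer_to_inner"

def stub_svm3(f):
    return "inner" if f["max_rssi"] > -60 else "outer"

fixed_case = classify_lane(
    {"max_rssi": -50, "min_rssi": -70, "packet_count": 10}, stub_svm1, stub_svm2, stub_svm3
)
changing_case = classify_lane(
    {"max_rssi": -40, "min_rssi": -90, "packet_count": 30}, stub_svm1, stub_svm2, stub_svm3
)
print(fixed_case, changing_case)
```

Because each sample passes through the top-layer model first, an error in SVM 1 cannot be recovered by the lower-level models.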

The average collected signal strengths were −85.44, −85.74, and −84.15 for RSSI using Bluetooth sniffers, and −17.46, −20.0, and −24.91 for RSSI using Wi-Fi on rectangle, X-type, and diamond-type topologies, respectively. The accuracies for the LIP, obtained by 10-fold cross-validation, are shown in Figure 15. The overall accuracy can be estimated as the composite of the top-layer and lower-level SVM accuracies (the average of SVM1 × SVM2 and SVM1 × SVM3). For BT, the accuracy in identifying four cases of lane-changing behaviors was about 45.4%, 44.9%, and 39.6% in the X-type, rectangle-type, and diamond-type topologies, respectively. For Wi-Fi, the corresponding accuracy was 34%, 27.3%, and 45.4%. The results show that the X-type topology is superior to the others for BT. Among the three classifiers, the accuracy in identifying the fixed lane (SVM 3) was about 80%, showing that the signal of the fixed lane varies regularly and could be classified easily.
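The composite accuracy described in this paragraph is simple arithmetic, shown here as a short Python sketch. The input accuracies are made-up values chosen only to illustrate the calculation; they are not results reported in this study, and the function name `overall_lip_accuracy` is hypothetical.

```python
def overall_lip_accuracy(acc_svm1, acc_svm2, acc_svm3):
    """Average of the two path accuracies SVM1*SVM2 and SVM1*SVM3."""
    return (acc_svm1 * acc_svm2 + acc_svm1 * acc_svm3) / 2

# Hypothetical per-model accuracies.
overall = overall_lip_accuracy(0.60, 0.70, 0.80)
print(round(overall, 3))
```

The product form shows why the overall figure can be far below the accuracy of any single classifier, such as the roughly 80% observed for SVM 3.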

5.3. Multiple Devices Problem (MDP)

There is no fixed quantitative relationship between the number of mobile phones and the number of vehicles. In traditional k-center clustering methods, such as k-means, the number of clusters (k) has to be determined first, and the clustering result may be incorrect. For example, suppose two cars (two clusters) pass the ITB at a certain time and the ITB sniffs three devices. If all three devices are actually in one car, a traditional k-center clustering model will still split them into two clusters once k = 2 is specified. As a result, traffic information may be misidentified.

The affinity propagation (AP) algorithm is suitable for solving the MDP because the number of clusters does not need to be specified in advance. AP is a clustering algorithm proposed by Frey and Dueck [24] that is based on passing messages between data points. For n points, the pairwise similarities form an n × n similarity matrix. In the AP algorithm, each point is a candidate cluster center, called an exemplar. Two messages, the responsibility R(i, k) and the availability A(i, k), determine whether point k becomes a cluster center. The former is the degree to which point k is suited to serve as the cluster center for data point i, and the latter is the degree to which it is appropriate for data point i to select point k as its cluster center. The larger these values are, the more likely point k is to be a cluster center. The R and A values are updated iteratively. When the cluster centers remain unchanged for a number of iterations or the maximum number of iterations is reached, the cluster centers are obtained and the data are clustered. Three statistical features were prepared for AP: the maximum RSSI, the minimum RSSI, and the count of sniffed packets for each mobile device collected by each ITB.
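The message-passing loop can be sketched in plain Python. The responsibility and availability updates follow the standard rules of Frey and Dueck [24]. The remaining choices are assumptions made for this sketch, since the paper does not state them: negative squared Euclidean distance as the similarity, the median similarity as the preference, a damping factor of 0.5, and the stopping rule. The device feature values are made up, and the function assumes at least two points.

```python
from statistics import median
from typing import List, Sequence


def affinity_propagation(points: Sequence[Sequence[float]], damping: float = 0.5,
                         max_iter: int = 200, stable_iter: int = 15) -> List[int]:
    """Return, for each point, the index of the exemplar it selects."""
    n = len(points)
    # Similarity matrix: negative squared Euclidean distance (assumed choice).
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Preference on the diagonal: median off-diagonal similarity (assumed choice).
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    labels: List[int] = []
    unchanged = 0
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # self-availability a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Each point selects the k that maximizes a(i,k) + r(i,k).
        new_labels = [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
        unchanged = unchanged + 1 if new_labels == labels else 0
        labels = new_labels
        if unchanged >= stable_iter:
            break
    return labels


# Hypothetical per-device features: (maximum RSSI, minimum RSSI, packet count).
devices = [(-60, -80, 20), (-61, -81, 21), (-59, -79, 19), (-75, -95, 8), (-76, -96, 9)]
labels = affinity_propagation(devices)
print("estimated number of vehicles:", len(set(labels)))
```

The number of distinct exemplars gives the estimated number of vehicles, and devices sharing an exemplar are assigned to the same vehicle. RSSI and packet counts are on different scales, so a real pipeline would probably normalize the features first; that step is omitted here.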

The experiment scenario designed for the MDP is illustrated in Figure 16. Two vehicles, a car and a scooter, were tested and were equipped with two and three mobile devices, respectively. The two vehicles passed through the ITB at normal speed for each topology. There were two scenarios in this experiment: in Figure 16a the vehicles are in parallel, and in Figure 16b the vehicles are in tandem.

The average collected signal strengths (RSSI) were −85.21, −85.55, and −87.41 using Bluetooth sniffers and −22.03, −31.86, and −36.1 using Wi-Fi on the rectangle-type, X-type, and diamond-type topologies, respectively. The accuracy of the proposed solution for the MDP is shown in Figure 17, where Figure 17a,b present the experiment results for the signal data collected with Bluetooth and Wi-Fi, respectively. The results of Scenario 1 (vehicles in parallel) and Scenario 2 (vehicles in tandem) show that the accuracy of estimating the number of vehicles was 100% except for the rectangle-type topology with Wi-Fi. This indicates that most of the mobile devices could be clustered into two groups (two vehicles) regardless of the scenario, sensor topology, and communication scheme.

5.4. Discussion

For the evaluation of the TMP, the proposed solution performed well, with an average accuracy of 96.9% for BT and 95.9% for Wi-Fi. This was mainly because the moving patterns of motorized and non-motorized vehicles were clearly different. However, cars and scooters were misidentified as each other more frequently. The reason may be that the moving speed patterns of these two types of vehicles are similar. Moreover, the misidentification of these two vehicle types may be caused by hardware and software variations among smartphone brands, which make the patterns of maximum RSSI irregular.

In the field experiment of the LIP, the best accuracy of the proposed solution for classifying the four lane-moving cases (inner, outer, inner to outer, outer to inner) was 45.4%, both in BT and Wi-Fi. This is not good enough, because the overall performance is the product of the accuracies of the upper-level and lower-level SVMs. However, if we assume that vehicles do not change lanes within such a short area, then the LIP can be simplified to a single SVM (SVM 3), which had a higher accuracy of 81% (BT) and 71.3% (Wi-Fi). Compared with the heuristic algorithm proposed by Fan [6], whose accuracies were 91.5% (BT) and 33.8% (Wi-Fi), the solution proposed in this work is much more stable across the two sniffing schemes.

In both the TMP and the LIP results, BT data perform better than Wi-Fi data. The reason may be that the communication distance of BT is shorter than that of Wi-Fi, so the possibility of misidentifying the lane position is lower. In terms of topology, the X-type topology performs better than the rectangle-type and diamond-type topologies. A possible explanation is that the X-type topology combines symmetrical and asymmetrical ITB placement, which helps capture the variation in the sniffed signal as a vehicle moves.

In the MDP experiment, the number of vehicles can be precisely estimated; however, the assignment of each device to a vehicle may be wrong. The average accuracy of device clustering was 66.7%, which indicates that a device has a 33.3% chance of being assigned to the wrong vehicle. The main reason may be that the selected features do not provide sufficient information for the clustering model. Two ideas could be introduced to improve the accuracy of device assignment: more features could be developed, and time series deep learning models could be tried. Both are planned for future work.

6. Conclusions

Due to the uniqueness of the MAC address, traffic information such as speed, origin–destination estimation, and intersection turning proportions can be obtained by the WST cost-effectively in real time. In this work, an intelligent traffic beacon (ITB) was designed as an integrated solution for traffic information collection, which makes it a promising traffic information source compared to traditional vehicle detector technologies. The proposed ITB integrates three machine learning models, hierarchical SVM, KNN, and affinity propagation, to solve the three problems, LIP, TMP, and MDP, respectively.

Field experiments with three sensor topologies (X-type, rectangle-type, and diamond-type) and two wireless sniffing schemes (Bluetooth and Wi-Fi) were conducted in urban scenarios. The results show that the X-type topology outperforms the others in all three problems, and the diamond-type topology yields the worst and most unstable performance. The reason may be that both the X-type and rectangle-type topologies have symmetrical ITBs, and each pair of opposite ITBs can sniff similar signal variations. For the communication scheme, Bluetooth performs better than Wi-Fi because of the existence of outliers in Wi-Fi scenarios.

For future work, it is possible to improve the performance of the ITB by combining the advantages of the WST and other techniques to collect more comprehensive traffic information. Suggestions for future research are listed below:

More wireless features, such as channel state information (CSI), could be explored for models to learn the implied information.
More machine learning or deep learning models could be evaluated and compared to enhance the accuracy for these issues.
More traffic scenarios could be designed in field experiments, such as traffic congestion or overtaking driving behaviors.
In the TMP, the accuracies of the three topologies with the two communication technologies are all greater than 90%. However, some cars and scooters were misidentified as each other due to similar driving speeds. Such misidentification may be caused by the selected features, which carry insufficient information for data clustering.
Some other emerging traffic information collecting technologies, such as vehicle detection on video data by deep learning (such as YOLO [25]), can be integrated into the ITB.

Author Contributions

Conceptualization, W.-H.L.; methodology, W.-H.L. and T.-J.L.; software, T.-J.L.; validation, W.-H.L., T.-J.L. and H.-C.W.; formal analysis, T.-J.L.; investigation, W.-H.L.; data curation, W.-H.L.; writing— W.-H.L. and T.-J.L.; writing—review and editing, W.-H.L. and H.-C.W.; visualization, T.-J.L.; supervision, W.-H.L.; project administration, W.-H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Ministry of Science and Technology under contracts “MOST 111-2124-M-006-002, MOST 110-2410-H-006-064”, and “Center for Innovative FinTech Business Models” of National Cheng Kung University (NCKU), Taiwan, R.O.C.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The researchers acknowledge support from MAXWIN Technology and OS Lab team, CSIE, NCKU in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chen, C.; Ota, K.; Dong, M.; Yu, C.; Jin, H. WITM: Intelligent Traffic Monitoring Using Fine-Grained Wireless Signal. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 4, 206–215.
2. Duarte, M.F.; Hu, Y.H. Vehicle classification in distributed sensor networks. J. Parallel Distrib. Comput. 2004, 64, 826–838.
3. Sliwa, B.; Piatkowski, N.; Wietfeld, C. The Channel as a Traffic Sensor: Vehicle Detection and Classification based on Radio Fingerprinting. IEEE Internet Things J. 2020, 7, 7392–7406.
4. Won, M.; Zhang, S.; Son, S.H. WiTraffic: Low-cost and non-intrusive traffic monitoring system using WiFi. In Proceedings of the 26th International Conference on Computer Communication and Networks (ICCCN), Vancouver, BC, Canada, 31 July–3 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–9.
5. Won, M.; Sahu, S.; Park, K.J. DeepWiTraffic: Low cost WiFi-based traffic monitoring system using deep learning. In Proceedings of the IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Monterey, CA, USA, 4–7 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 476–484.
6. Fan, Y.-H. A Spatiotemporal Seamless Traffic Information Collection Framework by Intelligent Vehicle Probing. Master’s Thesis, National Cheng Kung University, Tainan, Taiwan, 2017; pp. 1–86.
7. Alessandrini, A.; Gioia, C.; Sermi, F.; Sofos, I.; Tarchi, D.; Vespe, M. WiFi positioning and Big Data to monitor flows of people on a wide scale. In Proceedings of the European Navigation Conference (ENC), Lausanne, Switzerland, 9–12 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 322–328.
8. Du, Y.; Yue, J.; Ji, Y.; Sun, L. Exploration of optimal Wi-Fi probes layout and estimation model of real-time pedestrian volume detection. Int. J. Distrib. Sens. Netw. 2017, 13, 1550147717741857.
9. Dunlap, M.; Li, Z.; Henrickson, K.; Wang, Y. Estimation of origin and destination information from BT and Wi-Fi sensing for transit. Transp. Res. Rec. J. Transp. Res. Board 2016, 2595, 11–17.
10. El-Tawab, S.; Oram, R.; Garcia, M.; Johns, C.; Park, B.B. Data analysis of transit systems using low-cost IoT technology. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), Kona, HI, USA, 13–17 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 497–502.
11. Jiang, M.; Fan, X.; Zhang, F.; Xu, C.; Mao, H.; Liu, R. Characterizing On-Bus WiFi Passenger Behaviors by Approximate Search and Cluster Analysis. In Proceedings of the 7th International Conference on Cloud Computing and Big Data (CCBD), Macau, China, 16–18 November 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 31–36.
12. Mikkelsen, L.; Buchakchiev, R.; Madsen, T.; Schwefel, H.P. Public transport occupancy estimation using WLAN probing. In Proceedings of the 8th International Workshop on Resilient Networks Design and Modeling (RNDM), Halmstad, Sweden, 13–15 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 302–308.
13. Oransirikul, T.; Nishide, R.; Piumarta, I.; Takada, H. Feasibility of analyzing Wi-Fi activity to estimate transit passenger population. In Proceedings of the IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), Crans-Montana, Switzerland, 23–25 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 362–369.
14. Ding, F.; Chen, X.; He, S.; Shou, G.; Zhang, Z.; Zhou, Y. Evaluation of a Wi-Fi signal-based system for freeway traffic states monitoring: An exploratory field test. Sensors 2019, 19, 409.
15. Friesen, M.; Jacob, R.; Grestoni, P.; Mailey, T.; McLeod, R.D. Vehicular traffic monitoring using BT. In Proceedings of the 26th Annual IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, Canada, 5–8 May 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–6.
16. Fang, S.H.; Liao, H.H.; Fei, Y.X.; Chen, K.H.; Huang, J.W.; Lu, Y.D.; Tsao, Y. Transportation modes classification using sensors on smartphones. Sensors 2016, 16, 1324.
17. Goodall, N.J. Fundamental characteristics of Wi-Fi and wireless local area network re-identification for transportation. IET Intell. Transp. Syst. 2016, 11, 37–43.
18. Jahangiri, A.; Rakha, H.A. Applying Machine Learning Techniques to Transportation Mode Recognition Using Mobile Phone Sensor Data. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2406–2417.
19. Yang, S.; Wu, Y.J. Travel mode identification using BT technology. J. Intell. Transp. Syst. 2018, 22, 407–421.
20. Yang, C.; Shao, H.R. WiFi-based indoor positioning. IEEE Commun. Mag. 2015, 53, 150–157.
21. Kim, Z. Robust lane detection and tracking in challenging scenarios. IEEE Trans. Intell. Transp. Syst. 2008, 9, 16–26.
22. Li, J.; Mei, X.; Prokhorov, D.; Tao, D. Deep neural network for structural prediction and lane detection in traffic scene. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 690–703.
23. Aliari, Y.; Haghani, A. BT sensor data and ground truth testing of reported travel times. Transp. Res. Rec. J. Transp. Res. Board 2012, 2308, 167–172.
24. Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science 2007, 315, 972–976.
25. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.

Figure 1. The Z-type topology (Fan [6]).

Figure 2. The topologies of ITB: (a) X-type; (b) rectangle-type; (c) diamond-type.

Figure 3. Research scenario.

Figure 4. The variation of RSSI in TMP.

Figure 5. Variation of RSSI for LIP in two cases (numbers 1–6 indicate the ITB numbers shown in Figure 3): (a) driving without changing lanes; (b) driving and changing lanes.

Figure 6. RSSI of MDP. (a) Mobile devices in different vehicles. (b) Multiple devices in the same vehicle.

Figure 7. Framework for the collection of traffic data by the WST.

Figure 8. An observation of collected Wi-Fi signal raw data of six smartphones.

Figure 9. Experiment field of TMP (in NCKU campus).

Figure 10. Confusion matrixes of three topologies for BT.

Figure 11. Confusion matrixes of three topologies for Wi-Fi.

Figure 12. TMP accuracy.

Figure 13. LIP experiment field (in Annan district, Tainan City, Taiwan).

Figure 14. The hierarchical SVM structure proposed for solving LIP.

Figure 15. Accuracy of LIP comparison on different topologies: (a) accuracy of three SVMs on Bluetooth; (b) accuracy of three SVMs on Wi-Fi.

Figure 16. (a) Scenario 1: the vehicles are in parallel; (b) Scenario 2: the vehicles are in tandem.

Figure 17. Accuracy of MDP comparison on different topologies: (a) accuracy of MDP in Bluetooth; (b) accuracy of MDP in Wi-Fi.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

