An Innovative and Cost-Effective Traffic Information Collection Scheme Using the Wireless Sniffing Technique

Abstract: In recent years, the wireless sniffing technique (WST) has become an emerging technique for collecting real-time traffic information. The spatiotemporal variations in wireless signal collection from vehicles provide various types of traffic information, such as travel time, speed, traveling path, and vehicle turning proportion at an intersection, which can be widely used for traffic management applications. However, three problems challenge the applicability of the WST to traffic information collection: the transportation mode classification problem (TMP), lane identification problem (LIP), and multiple devices problem (MDP). In this paper, a WST-based intelligent traffic beacon (ITB) with machine learning methods, including SVM, KNN, and AP, is designed to solve these problems. Several field experiments are conducted to validate the proposed system: three sensor topologies (X-type, rectangle-type, and diamond-type topologies) with two wireless sniffing schemes (Bluetooth and Wi-Fi). Experiment results show that X-type has the best performance among all topologies. For sniffing schemes, Bluetooth outperforms Wi-Fi. With the proposed ITB solution, traffic information can be collected in a more cost-effective way.


Introduction
The planning of transportation policies and strategies relies heavily on traffic information. Without comprehensive traffic information, transportation engineers and practitioners are unable to precisely design transportation plans, traffic signal plans, and so forth. Common techniques for collecting traffic information include vehicle detectors (VD), automatic vehicle identification (AVI), GPS-based vehicle probing (GVP), ETC-based vehicle probing (EVP), and cellular-based vehicle probing (CVP). However, these techniques have several limitations:

1. Traffic information for transportation modes other than vehicles, such as walking and biking, is not easily collected with these techniques.
2. The costs of installation and maintenance of these techniques are high.
3. Penetration rate is low due to installation cost. For instance, EVP systems require the installation of an on-board unit (OBU) and road-side unit (RSU), and four ETC gantries are required to detect the turning proportion of vehicles at an intersection.
Wireless signal analyzing technology, named the wireless sniffing technique (WST), is an emerging technology for collecting traffic information. The main idea of the WST is to sniff Wi-Fi or Bluetooth (BT) wireless packets broadcast from in-vehicle mobile devices, such as smartphones, smartwatches, Android Auto, or Apple CarPlay. In the sniffed packets, some basic public information from mobile devices can be learned, such as the media access control (MAC) address, received signal strength indicator (RSSI), and scenario, and it may fail and yield incorrect results once a driver changes lanes. Furthermore, the Z-type topology with a voting mechanism has been proved not to be a cost-effective solution. The assumptions mentioned above are not practical for most general roads.
Figure 1. The Z-type topology (Fan [6]).
The ITB introduced in Fan [6] has many limitations and can only be applied to a freeway tunnel scenario. The motivation and goal of this paper are to reinvent the ITB and propose an integrated solution to the LIP, TMP, and MDP for urban road scenarios. Since the three problems are independent, machine learning models, including the hierarchical support vector machine (SVM), k-nearest neighbor (KNN), and affinity propagation (AP), are integrated to solve them. The ITB supports dual communication schemes (i.e., Bluetooth and Wi-Fi), which enable detection results to be more accurate and stable. To evaluate the accuracy of different ITB deployments, three deployment topologies are studied in field experiments: X-type, rectangle-type, and diamond-type topologies. The performance of these topologies is evaluated and discussed in Section 5. The contributions of this study are summarized as follows:

1. Three traffic information collection problems in the wireless sniffing technique (TMP, LIP, and MDP) are defined and extended to urban scenarios.
2. Three machine learning models and two communication schemes, BT and Wi-Fi, are integrated in an ITB to solve these problems.
3. The performance of three ITB topologies (X, rectangle, and diamond) is evaluated in field experiments.

The remainder of this paper is organized as follows. Section 2 discusses the related work in the literature. Section 3 defines the three problems and discusses ITB deployment topologies and scenarios. Section 4 details the proposed solutions and ITB system design. Section 5 presents the experiment design and solution methodology and discusses the results. Section 6 concludes this paper and discusses future work.

Literature Review
Alessandrini et al. [7] deployed 20 detectors and analyzed the distribution of human flow using Wi-Fi signals and big data. Du et al. [8] collected people-flow data by sniffing Wi-Fi. The results show that the detection rate of Wi-Fi is usually lower than that of other methods. This may be caused by the proportion of people who have Wi-Fi turned on, the decline in RSSI due to multipath effects, and the variability of different devices. Dunlap et al. [9] installed a mobile phone on a bus and used an app to detect the surrounding BT and Wi-Fi signals. Meanwhile, GPS data were collected to analyze the passengers' origin, destination, and transfer information at different sites. El-Tawab et al. [10] installed a Raspberry Pi at a bus stop and calculated the waiting time of the passengers at each station. Jiang et al. [11] set up a Wi-Fi router on a bus for passengers to connect to and analyzed passenger behavior. Mikkelsen et al. [12] installed a Raspberry Pi on a bus and collected people-flow data on different days and in different directions. The sniffing frequency may be affected by the network card, driver, mobile devices, operating system, and other applications.
Oransirikul et al. [13] used Raspberry Pi to detect the Wi-Fi signal of passengers at a bus station and estimated the number of passengers.
Ding et al. [14] used Wi-Fi signals to observe speed and traffic flow on a highway, with VD data used as the ground truth. The result shows that the traffic flow would be overestimated or underestimated in some cases. Friesen et al. [15] used BT and Xbee communication technology to observe traffic flow and analyzed traffic data at intersections. Fang et al. [16] used big data from the built-in sensors of mobile devices, such as magnetometers, gyroscopes, and accelerometers, to classify transportation modes. Machine learning models such as decision tree, KNN, and SVM were used, and the number of features was increased from 7 to 14 to explore whether the accuracy could be improved. Goodall et al. [17] conducted several tests under Wi-Fi and BT communication, covering the detection range, transmission frequency, vehicle speed, and transmission rate. The results of the study show that Wi-Fi packets are transmitted at a lower frequency and have a lower probability of successful transmission. In data collection, the Wi-Fi detector detects more MAC addresses than the BT detector, but the effect is worse when the vehicle passes through more than three consecutive detectors. Due to the low broadcasting frequency, Wi-Fi signals are suitable for low-speed vehicles or roads with lower traffic flow. Jahangiri et al. [18] used sensors in mobile devices, such as accelerometers, gyroscopes, and rotational vector sensors, to collect data and identified five transportation modes by KNN, SVM, and decision tree-based models. Won et al. [5] installed a Wi-Fi router (transmitter) on one side of the road and a notebook (receiver) on the opposite side. Signals transmitted from the router varied when a vehicle passed by. The signal variation was received by the notebook, and the transportation mode was identified by SVM. However, this approach is difficult to put into practice due to its high cost.
Yang [19] used BT data with the genetic algorithm model (GANN), KNN, and SVM to identify transportation modes (motor vehicles, bicycles, pedestrians). GPS data were collected as the ground truth. This research illustrates the difficulty BT technology has in collecting traffic information, such as traffic flow, direction, lane identification, and transportation modes.
For Wi-Fi and BT data, the common applications to traffic are travel time estimation and transportation mode identification, especially BT data. As for Wi-Fi data, further applications are analyzing human flow or indoor positioning [20]. For vehicle detection, many studies use GPS and sensors in mobile devices, such as a gyroscope, magnetometer, and accelerometer, to identify transportation modes. However, it is not easy to obtain the data mentioned above, and GPS and sensors are not as extensive as Wi-Fi or BT. For traffic information, image recognition is commonly used to identify lane information [21]. Moreover, due to the development of artificial intelligence, lane information based on deep learning is also used [22]. In addition, Aliari [23] mentions that only 2%-3.4% of the traffic flow could be detected via BT communication, and it is not suitable to collect lane information via BT communication, either. Ding et al. [14] mentioned cars with multiple mobile devices or no device would cause the overestimation or underestimation of traffic flow, and this is a significant problem needing to be solved in the future. Except for the attempt of Fan [6], there is no research demonstrating an ability to solve the three issues effectively based on the WST, especially the LIP and MDP.

Transportation Mode Identification Problem (TMP)
Normally, mobile devices with wireless communication schemes on road sections can be sniffed; for instance, smartphones carried by pedestrians or on bikes and static devices in surrounding areas such as laptops or Wi-Fi access points. It is crucial to identify the transportation mode of the collected wireless signal data. In this study, four transportation modes, namely passenger car, scooter, bike, and pedestrian, are required to be identified. The definition of the TMP is given in (1), where $O_x$ denotes object $x$ (vehicle, bike, or passenger) and $C_y$ is transportation mode $y$.

Determine $O_x \in C_y$ (1)

Lane Identification Problem (LIP)
The objective of solving the LIP is to determine which lane the vehicle is traveling in, and whether the detected vehicle has lane-changing behavior in the sensing area. Determine

Multiple Devices Problem (MDP)
Identifying the number of pedestrians by sniffing mobile devices is intuitive since in most cases one person carries only one mobile device. However, it is a huge challenge to estimate the number of vehicles from the number of sniffed devices (i.e., the count of MAC addresses) because there might be more than one device in a vehicle. The MDP is the problem of counting vehicles from the collected sniffed data, as defined in (3), where $M_i$ denotes sniffed mobile device $i$ and $O_x$ refers to the object $x$ (vehicle, bike, or passenger) that carries $M_i$. Determine

Topologies and Scenario
Three topologies were designed to evaluate the WST: the X, rectangle, and diamond topologies, as shown in Figure 2. Data collected from two communication technologies, Wi-Fi and BT, are evaluated in these topologies. Experiments and observations showed that the transmission distances of Wi-Fi and BT were both about 100 m. As a result, for the X-type and diamond-type topologies, the distance between each ITB was 100 m in the LIP experiment to ensure that the sniffing ranges of the ITBs overlapped. The rectangle-type topology was compared with the X-type topology to examine whether the middle ITB influences performance. The distance between each ITB for the rectangle-type topology was 200 m. In the TMP and MDP experiments, due to the limitation of the experiment, the distance between each ITB was 50 m for the X-type and diamond-type topologies and 100 m for the rectangle-type topology.
The road configuration used in this study is summarized as follows. There are two lanes in each direction: one is the low-speed vehicle lane, and the other is the sidewalk for each direction. As shown in Figure 3, cars are permitted to travel in both lanes, scooters and bikes are in the low-speed vehicle lane, and pedestrians walk on the sidewalk.
Four assumptions were made in this paper. First, at least one mobile device has turned on either Wi-Fi or BT. Second, there are four transportation modes in this scenario: car, scooter, bike, and walking. Third, the number of packets and the strength of RSSI are not affected by traffic peak hours and off-peak hours. Last, since normal urban traffic might be influenced by traffic jams, traffic signals, weather, or other conditions, the average free-flow speed for vehicles on these two normal lanes is set as about 30-40 km/h.

Transportation Mode Problem (TMP)
When a vehicle passes through two ITBs, its speed can be calculated as the distance between the two ITBs divided by the vehicle's travel time. In this study, the KNN model (k-nearest neighbors) was applied to classify four transportation modes: cars, scooters, bikes, and pedestrians. The ideal sniffed signal patterns from a moving mobile device passing through one ITB are illustrated in Figure 4. The contact window (i.e., the time length of a signal pattern) is short for a fast-moving object (e.g., a car) and long for a slow-moving one (e.g., a pedestrian).
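The speed calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimate_speed_kmh` is hypothetical, and using one representative detection timestamp per ITB (for example, the time of the strongest RSSI) is an assumption.

```python
from datetime import datetime


def estimate_speed_kmh(distance_m, t_first, t_second):
    """Average speed between two ITBs: spacing divided by travel time.

    t_first / t_second are the representative detection times of the same
    MAC address at the first and second ITB (an assumed choice).
    """
    travel_time_s = (t_second - t_first).total_seconds()
    if travel_time_s <= 0:
        raise ValueError("second detection must come after the first")
    return distance_m / travel_time_s * 3.6  # m/s -> km/h


# Hypothetical example: ITBs 100 m apart, detections 10 s apart.
t1 = datetime(2022, 1, 1, 12, 0, 0)
t2 = datetime(2022, 1, 1, 12, 0, 10)
speed = estimate_speed_kmh(100.0, t1, t2)
print(round(speed, 1))  # 36.0
```

A value in this range would fall within the 30-40 km/h free-flow speed assumed for cars and scooters in this study.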

Lane Identification Problem (LIP)
In an ideal case, RSSI signal strength is negatively correlated with the distance between a mobile device and an ITB, which suggests that this relationship can be used to identify the lane position of a vehicle. That is, one can infer which lane a vehicle is traveling in according to the variations of RSSI. However, estimating the absolute distance between a device and an ITB from RSSI is not practically feasible because RSSI patterns vary from device to device. Assuming that there is no variability on each device and ITB pair, it is possible to identify the lane information by comparing the RSSI data sniffed by different ITBs in the topologies. All sniffed data from each ITB in this study were uploaded to a cloud platform and processed by machine learning models to classify the lane information for each device.
An example of the LIP in which ITB deployment follows the topology in Figure 3 is illustrated in Figure 5a. When the vehicle is traveling in the inner lane, theoretically, ITB3 and ITB6 would sniff the highest strength RSSI, and ITB2 and ITB5 would sniff the lowest strength RSSI. The case of a vehicle with lane-changing behavior is illustrated in Figure 5b. If the vehicle changes from the inner lane to the outer lane, ITB4 would sniff the highest strength RSSI, and ITB5 and ITB6 would sniff lower RSSI, with ITB5 detecting the lowest one.
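The idea of comparing RSSI across ITBs, rather than reading absolute distances from it, can be illustrated with a small sketch. It is not the classifier used in this study: a nearest-centroid rule stands in for the machine learning model, subtracting each device's mean RSSI is an assumed way to reduce device-to-device offsets, and all RSSI values (including the outer-lane pattern) are invented for illustration.

```python
from statistics import mean


def relative_profile(peak_rssi):
    """Subtract the per-pass mean so only differences between ITBs remain."""
    m = mean(peak_rssi.values())
    return {itb: rssi - m for itb, rssi in peak_rssi.items()}


def nearest_centroid(profile, centroids):
    """Return the lane whose centroid profile is closest (squared distance)."""
    def sq_dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in a)
    return min(centroids, key=lambda lane: sq_dist(profile, centroids[lane]))


# Hypothetical labeled passes: peak RSSI (dBm) seen by each ITB.
training = {
    "inner": [{"ITB2": -78, "ITB3": -55, "ITB5": -80, "ITB6": -57}],
    "outer": [{"ITB2": -56, "ITB3": -77, "ITB5": -58, "ITB6": -79}],
}

# One centroid per lane: the average relative profile of its training passes.
centroids = {}
for lane_label, passes in training.items():
    profiles = [relative_profile(p) for p in passes]
    centroids[lane_label] = {
        itb: mean(pr[itb] for pr in profiles) for itb in profiles[0]
    }

# A device with a weaker radio: every reading is roughly 10 dB lower.
new_pass = {"ITB2": -88, "ITB3": -66, "ITB5": -89, "ITB6": -67}
lane = nearest_centroid(relative_profile(new_pass), centroids)
print(lane)  # inner
```

Because only the differences between ITBs are compared, the weaker device is still assigned to the inner lane, where ITB3 and ITB6 hear it best.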

Multiple Devices Problem (MDP)
Similar information was used to identify whether several devices were located in the same car. Mobile devices that are present in the same vehicle will have similar RSSI patterns, such as similar detection times and peak periods, as illustrated in Figure 6b. On the other hand, the collected RSSI signal data may diverge into several groups if the mobile devices are in different vehicles, as shown in Figure 6a. Therefore, an unsupervised clustering model should be applied to group signals with similar spatiotemporal patterns. In this work, affinity propagation [24] was chosen as the proposed solution for the MDP.
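The grouping idea can be sketched with a much simpler rule than affinity propagation. The example below is a stand-in, not the method used in this study: it links devices whose first-detection and peak times both differ by at most a tolerance and counts the resulting groups as vehicles. The function name, the tolerance `tol_s`, and all observations are hypothetical.

```python
def group_devices(devices, tol_s=2.0):
    """Group devices whose first-seen and peak times differ by <= tol_s.

    Single-linkage grouping via union-find; a simple stand-in for a
    clustering model, NOT affinity propagation.
    """
    macs = list(devices)
    parent = {m: m for m in macs}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for i, a in enumerate(macs):
        for b in macs[i + 1:]:
            first_a, peak_a = devices[a]
            first_b, peak_b = devices[b]
            if abs(first_a - first_b) <= tol_s and abs(peak_a - peak_b) <= tol_s:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for m in macs:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())


# Hypothetical observations: MAC -> (first_seen_s, peak_rssi_time_s)
observations = {
    "aa:01": (0.0, 12.0),
    "aa:02": (0.5, 12.4),
    "aa:03": (0.8, 11.7),
    "bb:01": (30.0, 41.0),
    "bb:02": (30.6, 41.5),
}
vehicles = group_devices(observations)
print(len(vehicles))  # 2
```

Unlike this threshold rule, affinity propagation does not need the number of clusters or a fixed tolerance specified in advance, which is one plausible reason to prefer it when the number of vehicles is unknown.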


Framework
The framework proposed in this work for solving the three problems in the WST is shown in Figure 7. First, the wireless signal data collected by the ITB were uploaded to the MySQL database. During data preprocessing, the datasets were transformed into features as model inputs and normalized after outlier filtering. Ground truths were labeled in the datasets for solving the LIP and TMP. For the LIP, the collected data were labeled with lane information, such as the inner lane or the outer lane. For the TMP, the ground truth of the transportation mode, such as car, scooter, bike, or walking, was labeled. The basic features are the maximum RSSI, the minimum RSSI, and the count of packets for each mobile device; the detailed features are discussed with each problem in Section 5. The classification or clustering accuracy for each problem is presented for the different topologies and communication technologies.
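The three basic features can be derived from raw sniffed packets as in the sketch below. It is illustrative only: the packet tuples are made up, the function names are hypothetical, and min-max scaling is an assumed choice because the normalization method is not specified here.

```python
from collections import defaultdict


def basic_features(packets):
    """Compute max RSSI, min RSSI, and packet count per MAC address.

    packets: iterable of (mac, rssi_dbm) tuples from one ITB.
    """
    rssi_by_mac = defaultdict(list)
    for mac, rssi in packets:
        rssi_by_mac[mac].append(rssi)
    return {
        mac: {
            "max_rssi": max(values),
            "min_rssi": min(values),
            "packet_count": len(values),
        }
        for mac, values in rssi_by_mac.items()
    }


def min_max_normalize(values):
    """Scale values to [0, 1]; an assumed normalization, not the paper's."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sniffed packets: (MAC address, RSSI in dBm)
packets = [
    ("aa:01", -70), ("aa:01", -55), ("aa:01", -62),
    ("bb:01", -80), ("bb:01", -75),
]
features = basic_features(packets)
counts = [f["packet_count"] for f in features.values()]
print(features["aa:01"])
print(min_max_normalize(counts))
```

In practice the same computation would be repeated per device and ITB pair, so that each sample carries one feature group per ITB in the topology.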


Hardware and Software
The ITB hardware used was a customized LTE (4G) router with Bluetooth, Wi-Fi, and LTE interfaces, where the receiver sensitivities were −85, −76, and −72 dBm for Wi-Fi 802.11 b/g/n interfaces, respectively. The antenna was dual band (2.4 G/5 G), with a length of about 14.5 cm and a gain of 2 dBi. A packet analyzer ran under the Linux operating system: the 'tcpdump' command was used to sniff the wireless signal, and the collected data were uploaded to the cloud server and saved in the MySQL database. Python was used for raw data preprocessing, and Scikit-learn was used for training and testing. The Jupyter Notebook, a web application, was used to share documents and write Python programs.
An example of the collected raw signal data is shown in Figure 8, where six smartphones (three iOS and three Android) with Wi-Fi enabled were carried in one vehicle traveling at 40 km/h. The contact period was about 24 s, and the RSSI signal strength ranged between −80 and −50 dBm.
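As a rough arithmetic check on these figures, and assuming the vehicle held a constant 40 km/h throughout the contact period, the distance traveled while the devices remained within sniffing range is approximately

$$ d = v\,t = \frac{40}{3.6}\ \text{m/s} \times 24\ \text{s} \approx 267\ \text{m}. $$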
Vehicles 2022, 4, FOR PEER REVIEW 9


Transportation Mode Classification Problem (TMP)
The field experiment of the TMP was performed on the campus of National Cheng Kung University, as shown in Figure 9, in an enclosed field with a road width of 8 m. Four testers drove a car, rode a scooter, rode a bike, or walked through the different ITB topologies several times. Drivers did not change lanes during driving. Ten mobile devices were carried in the testing vehicles. There were 355 samples for both the rectangle-type topology and X-type topology with BT, 335 samples for the diamond-type topology with BT, 392 samples for both the rectangle-type topology and X-type topology with Wi-Fi, and 397 samples for the diamond-type topology with Wi-Fi. The speed of the car and scooter was about 35 km/h, the speed of the bike was about 15 km/h, and the pedestrians walked at 3~5 km/h.

Identifying transportation modes is more challenging in the urban scenario than in the freeway scenario since there are more vehicle types in urban areas. Since the number of vehicle types is fixed, the TMP can be treated as a classification problem. The k-nearest neighbors (KNN) method, a supervised machine learning model, was used to classify the transportation modes. It finds the k samples in the training dataset that are closest to a new data point and classifies the point according to these neighbors. The value of k is user-defined, and the KNN model was evaluated on the training and testing datasets for different k values.
After k is chosen, the model classifies new data by majority vote over the categories of the k nearest neighbors. For the TMP, both 10-fold cross-validation and confusion matrices were used to measure model performance. The dataset was split into a training set (75%) and a testing set (25%). Four kinds of features were selected for the TMP: minimum RSSI, maximum RSSI, packet count, and time duration for each device and ITB pair.
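The KNN procedure described above (majority vote among the k nearest training samples, with k picked by 10-fold cross-validation over k = 1 to 15) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the synthetic feature values, and the unscaled Euclidean distance are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k, folds=10, seed=0):
    """Mean k-NN accuracy over `folds` cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test_idx = idx[f::folds]                      # every folds-th sample is held out
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds

def make_synthetic_samples(n_per_mode=30, seed=1):
    """Hypothetical (min RSSI, max RSSI, packet count, duration in s); not measured data."""
    rng = random.Random(seed)
    centers = {"car": (-90, -60, 8, 4), "scooter": (-88, -58, 9, 5),
               "bike": (-85, -55, 20, 12), "walk": (-80, -50, 60, 40)}
    X, y = [], []
    for mode, center in centers.items():
        for _ in range(n_per_mode):
            X.append([v + rng.gauss(0, 2) for v in center])
            y.append(mode)
    return X, y

X, y = make_synthetic_samples()
# Trial and error over k = 1..15, keeping the k with the best cross-validated accuracy.
best_k = max(range(1, 16), key=lambda k: cross_val_accuracy(X, y, k))
print("best k:", best_k, "accuracy:", round(cross_val_accuracy(X, y, best_k), 3))
```

In this synthetic data the car and scooter centers are deliberately close, so most errors fall between those two modes, mirroring the kind of confusion the experiment reports.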
The size of the sniffed sample was 355 by Bluetooth and 392 by Wi-Fi. Three TMP confusion matrices for BT are shown in Figure 10. In both the X-type topology and the rectangle-type topology, one scooter was misidentified as a bike. For the diamond-type topology, three cars were misidentified as scooters, and three scooters were misidentified as cars. Figure 11 shows the confusion matrices for Wi-Fi. In the X-type topology, one bike was misidentified as walking, three cars were misidentified as scooters, and a scooter was misidentified as a car. In the rectangle-type topology, one bike was misidentified as walking, a car was misidentified as a scooter, and three scooters were misidentified as cars. In the diamond-type topology, a car was misidentified as a scooter and a scooter was misidentified as a car.
Figure 12 shows the accuracy of 10-fold cross-validation in the three topologies, where k is the k value in the KNN method; for each topology, k was chosen as the best value found by trial and error from k = 1 to 15. The results show that the accuracy of BT data was 98.9% and that of Wi-Fi data was 94.9%. For both the X-type topology and the rectangle-type topology, BT performed better than Wi-Fi. In contrast, Wi-Fi performed better than BT in the diamond-type topology.

Lane Identification Problem (LIP)
The experiment collected wireless signals in different lanes and routes. The field of the experiment was on Chengnan Rd., Annan Dist., Tainan, Taiwan (as shown in Figure 13). It is a semi-enclosed field with two lanes in each direction. In this experiment, ten mobile devices with Wi-Fi and BT turned on were placed in a car. The car passed the ITB several times at a speed of 30 km/h, which represents the average travel speed on a road section in an urban scenario. Two scenarios were performed in this experiment. In the first, the car stayed in a fixed lane. In the second, driving and changing lanes, there were two routes for each topology: changing lanes from the interior lane to the exterior lane, and vice versa.
For the LIP, the lane information was distinguishable, and so it was a classification problem. In addition, there were only about 200-500 samples (small sample sets) for both BT and Wi-Fi with the three topologies. The SVM, a widely used supervised learning model, was applied to solve the LIP because it is a more intuitive fit for this classification problem than other machine learning models. In a high-dimensional space, the SVM model constructs a hyperplane that separates the samples to achieve the classification effect. The process of obtaining the best hyperplane can be regarded as an optimization problem. The formulas are as follows:
$$\min_{w,\,b,\,\xi}\ \|w\|^{2} + C\sum_{i}\xi_{i} \quad \text{subject to} \quad y_{i}\left(w^{T}x_{i} + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0 \tag{4}$$
In (4), maximizing the margin $2/\|w\|$ is converted into minimizing $\|w\|^{2}$; $y_{i}$ equals 1 or −1 and is the class label of sample $x_{i}$; $w^{T}x + b = 0$ defines the hyperplane; $\xi_{i}$ is the slack variable; and C is a regularization parameter used to prevent overfitting.
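To make the roles of the margin term, the slack variables, and C concrete, the sketch below minimizes the soft-margin objective (squared norm of w plus C times the total slack) for a linear SVM with plain batch subgradient descent. It is an illustration under stated assumptions only: the paper does not say which solver or kernel was used, and the function names, learning rate, and two-feature synthetic data are hypothetical.

```python
import random

def decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def hinge_objective(w, b, X, y, C):
    """||w||^2 + C * sum(slack_i), with slack_i = max(0, 1 - y_i (w.x_i + b))."""
    slack = [max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y)]
    return sum(wj * wj for wj in w) + C * sum(slack)

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Batch subgradient descent on the soft-margin objective (assumed solver)."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * wj for wj in w]        # gradient of ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # margin violated, so the slack is active
                for j in range(dim):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

# Hypothetical, already standardized two-feature samples for two lanes (+1 / -1).
rng = random.Random(0)
X = [[2 + rng.gauss(0, 0.5), 2 + rng.gauss(0, 0.5)] for _ in range(20)]
X += [[-2 + rng.gauss(0, 0.5), -2 + rng.gauss(0, 0.5)] for _ in range(20)]
y = [1] * 20 + [-1] * 20

w, b = train_linear_svm(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
print("training accuracy:", accuracy)
```

A larger C penalizes margin violations more heavily and fits the training samples more tightly, while a smaller C tolerates more slack in exchange for a wider margin.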
A two-layered hierarchical architecture comprising three independent SVM models was proposed to solve the LIP, as shown in Figure 14. The top-layer SVM identifies whether a vehicle is changing lanes. Two lower-level SVMs were designed: one determines the direction of the lane change if the vehicle changes lanes, and the other determines which lane the vehicle is traveling in if it does not change lanes. The signal data were collected with three topologies: X-type, rectangle-type, and diamond-type. Since the variations in RSSI for each mobile device and each ITB are the critical pieces of information, three features were selected for training the SVM models: packet count, maximum RSSI, and minimum RSSI. The three SVM models in the hierarchical structure of Figure 14 are as follows:
1. SVM1 (top layer): identifies whether the vehicle changes lanes.
2. SVM2 (lower level): identifies the direction of the lane change.
3. SVM3 (lower level): identifies the lane of a vehicle that stays in a fixed lane.
The average collected signal strengths were −85.44, −85.74, and −84.15 for RSSI using Bluetooth sniffers, and −17.46, −20.0, and −24.91 for RSSI using Wi-Fi on the rectangle-type, X-type, and diamond-type topologies, respectively. The accuracies for the LIP, obtained by 10-fold cross-validation, are shown in Figure 15. The overall accuracy is estimated by combining the upper-level and lower-level accuracies, namely the average of SVM1 × SVM2 and SVM1 × SVM3. For BT, the accuracy in identifying the four cases of lane-changing behavior was about 45.4%, 44.9%, and 39.6% in the X-type, rectangle-type, and diamond-type topologies, respectively. For Wi-Fi, the corresponding accuracy was 34%, 27.3%, and 45.4%. The results show that the X-type topology is superior to the others with BT. Among the three classifiers, the accuracy in identifying the fixed lane (SVM3) was about 80%, showing that the signal of a vehicle in a fixed lane varies regularly and can be classified easily.
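The two-layer routing and the composite accuracy (the average of SVM1 × SVM2 and SVM1 × SVM3) can be written out in a few lines. The sketch below is only illustrative: the three classifier stand-ins read precomputed fields instead of running trained models, and the accuracy values passed in are hypothetical numbers, not results from the experiment.

```python
def composite_accuracy(acc_svm1, acc_svm2, acc_svm3):
    """Average of (SVM1 x SVM2) and (SVM1 x SVM3); all accuracies are in [0, 1]."""
    return (acc_svm1 * acc_svm2 + acc_svm1 * acc_svm3) / 2.0

def classify_lane_case(sample, svm1, svm2, svm3):
    """Top layer decides lane change or not, then the matching lower-level model decides the case."""
    if svm1(sample):
        return svm2(sample)   # direction of the lane change
    return svm3(sample)       # lane of a vehicle that stays in its lane

# Hypothetical stand-ins for the three trained SVMs.
def svm1(sample):
    return sample["changing"]

def svm2(sample):
    return sample["direction"]

def svm3(sample):
    return sample["lane"]

fixed = {"changing": False, "direction": None, "lane": "inner"}
moving = {"changing": True, "direction": "inner to outer", "lane": None}
print(classify_lane_case(fixed, svm1, svm2, svm3))
print(classify_lane_case(moving, svm1, svm2, svm3))
print(composite_accuracy(0.6, 0.5, 0.8))
```

Because every final decision passes through SVM1 first, the composite accuracy can never exceed the top-layer accuracy, which is why a moderate top-layer error pulls the four-case accuracy well below that of SVM3 alone.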

Multiple Devices Problem (MDP)
There is no fixed quantitative relationship between the number of mobile phones and the number of vehicles. In traditional k-center clustering methods, such as k-means, the number of clusters (k) has to be determined first, and the clustering result may be incorrect. For example, suppose two cars (two clusters) pass through the ITB at a certain time and the ITB sniffs three devices. If all three devices are actually in one car, a traditional k-center clustering model will still split these devices into two clusters once k = 2 is specified. As a result, traffic information may be misidentified.
The affinity propagation (AP) algorithm was suitable for solving the MDP since it does not require the number of clusters to be specified. AP is a clustering algorithm proposed by Frey and Dueck [24], which is based on the concept of message passing between data points. If there are n points, the pairwise similarities between them form an n × n similarity matrix. In the AP algorithm, each point is a possible cluster center, called an exemplar. Responsibility R(i, k) and availability A(i, k) are the two messages used to decide whether a point is a cluster center. The former measures how well suited point k is to serve as the cluster center for data point i, and the latter measures how appropriate it is for data point i to select point k as its cluster center. The larger these values are, the more likely point k is to be a cluster center. The R and A values are updated iteratively. When the cluster centers stop changing for a given number of iterations, or the maximum number of iterations is reached, the cluster centers are obtained and the data are clustered. Three feature statistics were prepared for AP: maximum RSSI, minimum RSSI, and the count of sniffed packets for each mobile device collected by each ITB.
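The responsibility and availability updates described above can be sketched in pure Python as follows. This is a minimal illustration of the standard Frey and Dueck update rules, not the authors' code. Several choices are assumptions: similarity is the negative squared Euclidean distance, the preference (self-similarity) is the median of the other similarities, the damping factor is 0.5, the loop runs a fixed number of iterations instead of testing convergence, and the six device feature vectors are hypothetical.

```python
import statistics

def affinity_propagation(points, damping=0.5, iterations=200):
    """Cluster points without fixing the number of clusters in advance."""
    n = len(points)
    # Similarity s(i, k): negative squared Euclidean distance (assumed metric).
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    # Assumption: the preference s(k, k) is the median of the off-diagonal similarities.
    preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility: how strongly i favors k over its best competing candidate.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: how much support candidate k has gathered from other points.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(support) - support[k]
            for i in range(n):
                if i == k:
                    new = others
                else:
                    new = min(0.0, R[k][k] + others - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels

# Hypothetical (max RSSI, min RSSI, packet count) for six devices in two vehicles.
devices = [(-60, -90, 10), (-62, -91, 12), (-65, -93, 11),
           (-45, -75, 30), (-47, -78, 33), (-50, -79, 31)]
exemplars, labels = affinity_propagation(devices)
print("estimated number of vehicles:", len(exemplars))
print("exemplar index per device:", labels)
```

The number of exemplars is the estimated number of vehicles, and each device's label is the index of the exemplar it is grouped with. In practice the features would need to be scaled, since packet counts and RSSI values live on different ranges.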
The experiment scenario designed for the MDP is illustrated in Figure 16. Two vehicles, a car and a scooter, were tested and were equipped with two and three mobile devices, respectively. The two vehicles passed through the ITB at normal speed for each topology. There were two scenarios in this experiment: in Figure 16a the vehicles are in parallel, and in Figure 16b the vehicles are in tandem.
The average collected signal strengths were −85.21, −85.55, and −87.41 for RSSI using Bluetooth sniffers, and −22.03, −31.86, and −36.1 for RSSI using Wi-Fi on the rectangle-type, X-type, and diamond-type topologies, respectively. The accuracy of the proposed solution for the MDP is shown in Figure 17, where Figure 17a,b present the results for the signal data collected with Bluetooth and Wi-Fi, respectively. The results of Scenario 1 (vehicles in parallel) and Scenario 2 (vehicles in tandem) show that the accuracy of estimating the number of vehicles was 100% except for the rectangle-type topology with Wi-Fi. This indicates that most of the mobile devices could be clustered into two groups (two vehicles) regardless of the scenario, sensor topology, and communication scheme.

Discussion
For the evaluation of the TMP, the performance of the proposed solution was good: the average accuracy was 96.9% for BT and 95.9% for Wi-Fi. This was mainly because the moving patterns of motorized and non-motorized modes differ in ways that are easy to distinguish. However, cars and scooters were misidentified as each other more frequently. The reason may be that the speed patterns of these two types of vehicles are similar. The misidentification may also be caused by hardware and software variations across smartphone brands, which make the patterns of maximum RSSI irregular.
In the field experiment of the LIP, the best accuracy of the proposed solution for classifying the four lane-moving cases (inner, outer, inner to outer, outer to inner) was 45.4%, for both BT and Wi-Fi. This figure is low because the overall accuracy is the product of the accuracies of the upper-level and lower-level SVMs. However, if we assume that vehicles do not change lanes within such a short area, the LIP can be reduced to a single SVM (SVM3), which had a higher accuracy of 81% (BT) and 71.3% (Wi-Fi). Compared with the heuristic algorithm proposed in Fan [6], whose accuracies were 91.5% (BT) and 33.8% (Wi-Fi), the solution proposed in this work is much more stable across the two sniffing schemes.
In both sets of results, BT data perform better than Wi-Fi data. The reason may be that the communication distance of BT is shorter than that of Wi-Fi, so the possibility of misidentifying the lane position is lower. In terms of topology, the X-type topology performs better than the rectangle-type and diamond-type topologies. It can be inferred that the X-type topology has both symmetrical and asymmetrical ITB deployment, which helps identify the variation in sniffed signals as the vehicle moves.
In the MDP experiment, the number of vehicles can be precisely estimated; however, the vehicle in which each device is located may be misidentified. The average accuracy of device clustering was 66.7%, which indicates that a device has a 33.3% chance of being assigned to the wrong vehicle. The main reason may be that the selected features do not provide sufficient information for the clustering model. Two ideas could improve the device assignment accuracy: developing more features, and trying time-series deep learning models. Both are planned for future work.
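The distinction drawn here, between getting the number of vehicles right and getting each device's vehicle right, can be illustrated with two small metrics. The sketch below is hypothetical: the function names are invented for the example, the labels are made-up values rather than experimental data, and scoring the assignment by the best cluster-to-vehicle mapping is an assumption about how such an accuracy could be computed.

```python
from itertools import permutations

def vehicle_count(cluster_labels):
    """Estimated number of vehicles: one vehicle per distinct cluster."""
    return len(set(cluster_labels))

def assignment_accuracy(true_vehicles, cluster_labels):
    """Fraction of devices placed in the right vehicle under the best cluster-to-vehicle mapping."""
    vehicles = sorted(set(true_vehicles))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(vehicles):
        # Extra clusters map to a placeholder that never matches a real vehicle.
        vehicles = vehicles + [None] * (len(clusters) - len(vehicles))
    best_hits = 0
    for perm in permutations(vehicles, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_vehicles))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_vehicles)

# Hypothetical outcome: two clusters are found, but one scooter device joins the car cluster.
truth = ["car", "car", "scooter", "scooter", "scooter"]
found = [0, 0, 0, 1, 1]
print(vehicle_count(found))
print(assignment_accuracy(truth, found))
```

In this made-up case the vehicle count is correct even though one of the five devices is grouped with the wrong vehicle, which is the situation described in the paragraph above.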

Conclusions
Due to the uniqueness of the MAC address, traffic information such as speed, origin-destination estimation, and intersection turning proportions can be obtained by the WST cost-effectively in real time. In this work, an intelligent traffic beacon (ITB) was reinvented as an integrated solution for traffic information collection, which makes it a promising traffic information source compared to traditional vehicle detector technologies. The proposed ITB integrates three machine learning models, hierarchical SVM, KNN, and affinity propagation, to solve the three problems, LIP, TMP, and MDP, respectively.
Field experiments with three sensor topologies (X-type, rectangle-type, and diamond-type) and two wireless sniffing schemes (Bluetooth and Wi-Fi) were conducted in urban scenarios. The results show that the X-type topology outperforms the others in all three problems, and the diamond-type topology yields the worst and most unstable performance. The reason may be that both the X-type topology and the rectangle-type topology have symmetrical ITBs, and a pair of opposite ITBs can sniff similar signal variations. For the communication scheme, Bluetooth performs better than Wi-Fi because of the existence of outliers in the Wi-Fi scenarios.
For future work, it is possible to improve the performance of the ITB by combining the advantages of the WST and other techniques to collect more comprehensive traffic information. Suggestions for future research are listed below:
1. More wireless features, such as channel state information (CSI), could be explored for models to learn the implied information.
2. More machine learning or deep learning models could be evaluated and compared to enhance the accuracy for these issues.
3. More traffic scenarios could be designed in field experiments, such as traffic congestion or overtaking driving behaviors.
4. In the TMP, the accuracies of the three topologies with two communication technologies are all greater than 90%. However, some cars or scooters were misidentified as each other due to similar driving speeds. Such differences may be caused by the selected features, which entail insufficient information for data clustering.
5. Some other emerging traffic information collecting technologies, such as vehicle detection on video data by deep learning (such as YOLO [25]), can be integrated into the ITB.