Crowdsourcing-Assisted Radio Environment Database for V2V Communication

In order to realize reliable Vehicle-to-Vehicle (V2V) communication systems for autonomous driving, the recognition of radio propagation becomes an important technology. However, in the current wireless distributed network systems, it is difficult to accurately estimate the radio propagation characteristics because of the locality of the radio propagation caused by surrounding buildings and geographical features. In this paper, we propose a measurement-based radio environment database for improving the accuracy of the radio environment estimation in the V2V communication systems. The database first gathers measurement datasets of the received signal strength indicator (RSSI) related to the transmission/reception locations from V2V systems. By using the datasets, the average received power maps linked with transmitter and receiver locations are generated. We have performed measurement campaigns of V2V communications in the real environment to observe RSSI for the database construction. Our results show that the proposed method has higher accuracy of the radio propagation estimation than the conventional path loss model-based estimation.


Introduction
Along with the rapid development of wireless communication technology, the number of mobile terminals has significantly increased during the last decade. As can be seen from a prediction by Cisco [1], 50 billion terminals will be connected to the Internet by 2020, and an enormous number of wireless terminals will communicate in various environments under the finite spectrum resources. In order to efficiently utilize the limited spectrum, it is important to appropriately select communication parameters such as frequency and transmission power according to the radio environment characteristics between terminals. Therefore, a radio propagation estimation becomes a crucial issue in the future wireless communication systems.
The use of empirical propagation models is the most fundamental method for radio propagation estimation because these models have been constructed through extensive discussion and large measurement campaigns in various environments. Although the accuracy of path loss models is limited, their error characteristics are well known; these models allow the communication efficiency of a system to be investigated simply. In addition, in the discussion of the accuracy of the proposed method, a comparison with other path loss models, namely the Okumura-Hata model and the two-ray path loss model, is newly added.
Figure 1 shows an overview of the radio environment database for wireless distributed networks. The database splits the communication area into two-dimensional meshes. By regarding each mesh as a transmission location, the database creates a radio environment map (REM) linked with each mesh: at most N maps are stored when the area is divided into N meshes. Although there are several methods for map construction, such as empirical path loss models and ray tracing, we focus on the measurement-based approach. In this way, it becomes possible to grasp statistical radio environment characteristics based on measurement data, and the radio environment can be estimated more accurately than with existing propagation models.
As shown in Figure 2, the transmitting vehicle first transmits a signal to the surrounding receiving vehicles. Here, we assume that the packet includes the transmission position and the ID of the transmitter. This is a reasonable assumption in V2V systems because packets in IEEE 802.11p usually include this information for traffic safety support. The receiver extracts this information and stores it together with the reception position, the received signal power, the reception time, the operating frequency, and the ID of the receiver. The stored information is reported to the database when the vehicle can go online through some means such as cellular or Wi-Fi.
In a practical situation, we should consider the effect of the upload procedure on communication traffic. To reduce the upload cost, it is better to upload the measurement data during periods of low communication traffic, for example via home Wi-Fi at night. After the database collects a massive number of communication results, we create the REMs from the datasets. When multiple reported values fall in the same mesh, the database calculates their average received power for that mesh.
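The per-mesh averaging described above can be sketched as a running mean kept for each pair of transmission and reception meshes, so that later reports are folded in without reprocessing earlier ones. This is a minimal illustration, not the authors' implementation: the class name, the mesh labels, and the choice to average directly in the dBm domain are assumptions.

```python
class MeshPairAverager:
    """Running mean of RSSI per (tx_mesh, rx_mesh) pair (hypothetical helper)."""

    def __init__(self):
        # (tx_mesh, rx_mesh) -> (sample_count, mean_rssi_dbm)
        self._stats = {}

    def add(self, tx_mesh, rx_mesh, rssi_dbm):
        """Fold one reported RSSI value into the mean of its mesh pair."""
        count, mean = self._stats.get((tx_mesh, rx_mesh), (0, 0.0))
        count += 1
        # Incremental mean update; assumption: averaging is done in dBm.
        mean += (rssi_dbm - mean) / count
        self._stats[(tx_mesh, rx_mesh)] = (count, mean)

    def average(self, tx_mesh, rx_mesh):
        """Return the mean RSSI of the pair, or None if nothing was reported."""
        entry = self._stats.get((tx_mesh, rx_mesh))
        return None if entry is None else entry[1]


averager = MeshPairAverager()
averager.add("T1", "R1", -70.0)
averager.add("T1", "R1", -74.0)
averager.add("T1", "R2", -80.0)
```

A pair with no reports returns None, which corresponds to an empty mesh in the map.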

Overview of Proposed Radio Environment Database
The datasets accumulated after the initial database construction should also be used for averaging. This is because the average characteristic of radio propagation, i.e., the sum of path loss and shadowing, is generally static over time. We anticipate that the averaging over multipath fading becomes more accurate as the number of samples increases. On the other hand, in a realistic situation, the shadowing effect will change when the surrounding structures change. One candidate countermeasure is to detect changes in the structures by time series analysis of the observed values, for example by comparing the latest observation result with the probability density function (PDF) built from past observation results. If a change in the structures is detected, we can reconstruct an accurate database by removing the past observation results from the database construction or by taking a weighted average with forgetting factors. Because this problem requires new algorithms and is far beyond the focus of this paper, we leave it as future work.
After the database construction, the transmitter can adjust the communication parameters by utilizing the statistical information stored in the database. When the transmitter accesses the database, the database provides the map corresponding to the transmitter position. By using the map, the transmitting vehicle can check the statistical communication quality at an arbitrary reception location and can modify the communication parameters. For example, when the estimated received signal power is lower than the desired value, the transmitter can improve the packet reception probability by using higher transmission power, lower-rate modulation schemes, or relay communications.
In addition, if the estimated quality is higher than the desired value, the transmitter can choose the high rate communication with higher-order modulation or the interference mitigation with lower transmission power. The effect of the database-assisted communication parameter setting is discussed in Section 4.4.
Note that this paper does not consider the use of spatial interpolation for the REM construction. This is because the main purpose of this paper is to show that the proposed database can estimate the site-specific propagation attenuation using meshes related to the pair of transmission and reception locations. It should be noted that the spatial interpolation can be implemented by the path loss model that is fitted to the actually observed datasets. This is discussed in Section 4.
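As an illustration of the weighted averaging with forgetting factors mentioned in this section as a candidate for tracking structural changes, the following sketch uses an exponentially weighted update. The paper leaves the actual algorithm as future work, so the function name, the update rule, and the default factor of 0.9 are purely assumptions.

```python
def update_with_forgetting(previous_avg_dbm, new_rssi_dbm, forgetting_factor=0.9):
    """Exponentially weighted average of received power (illustrative only).

    A factor close to 1 keeps a long memory; a smaller factor lets the
    stored average follow environmental changes more quickly.
    """
    if not 0.0 <= forgetting_factor <= 1.0:
        raise ValueError("forgetting_factor must lie in [0, 1]")
    if previous_avg_dbm is None:
        # First observation for this mesh pair: nothing to forget yet.
        return new_rssi_dbm
    return (forgetting_factor * previous_avg_dbm
            + (1.0 - forgetting_factor) * new_rssi_dbm)


average = None
for observed in (-70.0, -80.0):
    average = update_with_forgetting(average, observed, forgetting_factor=0.5)
```

With a factor of 0.5, each new observation carries the same weight as the whole stored history, which is deliberately aggressive and chosen only to make the example easy to follow.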

Measurement Campaign
In order to construct a radio environment database for wireless distributed networks and evaluate the accuracy of the database, V2V communications were conducted in Chofu City and Mitaka City, typical suburban areas in Tokyo, Japan over three days in January 2017. Measurement conditions are shown in Table 1. Note that the measurement setup, procedure, and the database structure are the same as in our initial work [23].

Measurement Equipment
Three vehicles, shown in Figure 3, were prepared, and each vehicle was equipped with an in-vehicle device. To gather the radio environment information, the vehicles communicated with each other while traveling on the route shown in Figure 4. The speed of the vehicles was 40 km/h because this is the legal speed of vehicles in the metropolitan area in Japan. ARIB STD-T109 standard transmitters (Denso Corporation, Kariya, Japan) and receivers were used for V2V communication. ARIB STD-T109 is the Japanese V2V communication standard, which is based on IEEE 802.11p and operates in the 760 MHz band. The modulation format is Orthogonal Frequency Division Multiplexing (OFDM), the access scheme is Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA), and the physical layer is based on IEEE 802.11p. A signal is transmitted from the onboard device every 100 ms, and the other vehicles record the received signal power, the reception time, the IDs of both transmitter and receiver, and the transmission and reception positions. The transmission position was extracted from the transmitted packet, and the reception position and time were obtained from the GPS receiver connected to the onboard device. A Garmin GPS 18x, a USB-connected device, was used as the GPS receiver. It acquires the location and time information once per second, and the position accuracy is within 15 m with 95% probability.
Here, specifications of the onboard device are shown in Table 2. The transmission time of one packet is 232 µs, the transmission cycle is 100 ms, and the position information updating period is 200 ms. Therefore, the vehicle device updates the position information every two messages. In addition, monopole antennas are used for both the reception and transmission. Table 3 shows the antenna characteristics.

Statistical Processing with Radio Environment Database
A radio environment database was constructed by statistically processing the measured datasets. Figure 5 shows an overview of the database. The database runs on CentOS 7 and MySQL 5.7, and the statistical processing is implemented by Hypertext Preprocessor (PHP) programs. After the measurement campaign, the statistical processing is performed as follows:
1. The measured datasets are reported from each vehicle to the database.
2. The database server registers the reported datasets in a table in the MySQL server.
3. The database server performs the statistical processing of the datasets. In this scheme, the database calculates the average RSSI for each pair of transmission and reception meshes using a PHP program. The calculated statistical data is then stored in the table for statistical data.
4. In response to a request from a terminal, the database server retrieves the statistical data from the MySQL server.
5. The terminal obtains the radio environment maps.
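Steps 2 to 4 above can be sketched with an in-memory SQLite database standing in for the MySQL server and Python standing in for the PHP programs. The table names, column names, and sample rows below are hypothetical; the actual schema is the one given in Tables 4 and 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 2: register the reported datasets (hypothetical columns).
conn.execute(
    "CREATE TABLE registration (tx_mesh TEXT, rx_mesh TEXT, rssi_dbm REAL)"
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?, ?)",
    [("T1", "R1", -70.0), ("T1", "R1", -74.0), ("T2", "R1", -90.0)],
)

# Step 3: average the RSSI for each pair of transmission and reception meshes.
conn.execute(
    "CREATE TABLE statistical_data AS "
    "SELECT tx_mesh, rx_mesh, AVG(rssi_dbm) AS avg_rssi_dbm, COUNT(*) AS samples "
    "FROM registration GROUP BY tx_mesh, rx_mesh"
)


def radio_environment_map(connection, tx_mesh):
    """Step 4: return {rx_mesh: average RSSI in dBm} for one transmission mesh."""
    rows = connection.execute(
        "SELECT rx_mesh, avg_rssi_dbm FROM statistical_data WHERE tx_mesh = ?",
        (tx_mesh,),
    )
    return dict(rows.fetchall())


rem_t1 = radio_environment_map(conn, "T1")
```

Keeping the raw registration table separate from the derived statistics mirrors the description above and makes it possible to recompute the averages whenever new reports arrive.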
In our measurement campaign, the receiving vehicles record the above information in a comma-separated values (CSV) file whenever a packet is successfully decoded. The stored CSV file is uploaded to the database server offline after the measurement campaign. In the database server, the RSSI is stored for each pair of transmission and reception meshes, and statistical processing is performed. Here, the transmission and reception meshes are calculated by normalizing latitude and longitude in the database server. The statistical data, in which the RSSI is averaged for each pair of transmission and reception meshes, is then downloaded to a local PC as a CSV file. Figure 6 shows an example of the statistical data. Finally, we specify any transmission mesh and create the radio environment map from the downloaded CSV file by using a Python script. If new datasets are observed after the initial creation of the map, they are added to the accumulation table in the database server, and the average RSSI for each pair of transmission and reception meshes is calculated again.
Tables 4 and 5 show the registration data and the statistical data in the MySQL server, respectively. The mesh code (First, Second, Third, ...) indicates the scale of the communication area: the smaller the mesh, the more detailed the radio environment that can be grasped. The mesh code is calculated from latitude and longitude and is used to construct REMs. The transmitter ID and receiver ID are used to delete unnecessary datasets. The statistical data represents the radio environment in the communication area, and REMs are constructed for each transmitter mesh by using the transmitter mesh code, the receiver mesh code, and the averaged received power in dBm.
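To make the idea of normalizing latitude and longitude into meshes concrete, the sketch below maps a position to a square grid cell of a given size. It is not the mesh code scheme of Table 4: the local origin, the spherical Earth approximation, and the function name are assumptions made only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def mesh_index(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg, mesh_size_m=10.0):
    """Return (east_index, north_index) of the mesh containing a position.

    Offsets from a hypothetical local origin are converted to meters with an
    equirectangular approximation, which is adequate over a few kilometers.
    """
    north_m = math.radians(lat_deg - origin_lat_deg) * EARTH_RADIUS_M
    east_m = (math.radians(lon_deg - origin_lon_deg)
              * EARTH_RADIUS_M
              * math.cos(math.radians(origin_lat_deg)))
    return (math.floor(east_m / mesh_size_m), math.floor(north_m / mesh_size_m))


# Example origin chosen arbitrarily for illustration.
origin = (35.65, 139.54)
cell = mesh_index(35.6501, 139.54, *origin)
```

Changing `mesh_size_m` to 2.0 or 5.0 gives the finer meshes evaluated later in the paper, at the cost of fewer samples per mesh.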

Experimental Results
In this section, we discuss the constructed database. After several maps are shown as examples, the accuracy of the database is evaluated by using the root mean squared error (RMSE).
Figures 7b,c show examples of the constructed maps for the area shown in Figure 7a. In these figures, the datasets were processed with a 10 m-squared mesh. Each map has a different transmission position, and we can confirm the effect of obstacles on the radio environment characteristics. As can be seen from the maps, the received signal power characteristics clearly change when the reference transmission mesh changes. For example, a difference of around 20 dB can be confirmed between the two maps on the north side of the area. In Figure 7b, since the transmission position is located in front of the structure, the communication between the transmitter and the north area becomes a non-line-of-sight (NLOS) environment. On the other hand, line-of-sight (LOS) communications can be conducted in Figure 7c: the obstacle-dependent signal attenuation does not occur in this case. These facts suggest that the measurement-based maps can capture the effect of structures in the radio propagation estimation. The accuracy of the constructed database is evaluated in Section 4.3.
Here, because average RSSIs exist only along the route where the vehicles traveled, the map has gaps. In actual operation, on the other hand, a massive number of communication logs will be obtained from many crowdsourcing vehicles. This means that there will be no missing data on the roads where vehicles usually drive.
In addition, we evaluated the calculation time for the statistical processing of the datasets. The statistical processing is implemented in PHP 7.0.12 on CentOS 7 with an Intel(R) Xeon(R) CPU E5-2407 @ 2.20 GHz. Table 6 summarizes examples of the calculation times for the statistical processing. The first row of Table 6 corresponds to all datasets in this measurement campaign.

Propagation Characteristics
We derive the shadowing characteristics in this experimental environment to show that they match well with those of a general suburban area. First, the statistical processing with 10 m-squared meshes is performed by using all datasets. To derive the path loss characteristic, the link distances in the datasets were obtained from the transmitter and receiver position information. The logarithm of the link distance d (m) was then calculated, and the scatter diagram was obtained. Note that the noise floor of our equipment in the communication bandwidth is about −96.0 dBm. In order to eliminate the effects of the noise floor, we limit the maximum link distance in the performance evaluation to 100 m. If we process the datasets over longer communication distances, the average received power is overestimated because we cannot obtain RSSIs below the noise floor. Because this is a challenging problem, we leave it for future work. A similar problem is pointed out in some papers (e.g., Reference [24]); drawing on such discussions, the overestimation could be mitigated.
In this paper, we model the distance attenuation characteristics by Equation (1), where a_1 and a_2 are the location-dependent path loss indices and b_1 and b_2 are constants that contain the transmission power and antenna effects. R_b is the distance from the transmission position to the break point and is given by Equation (2) [25], where h_b and h_m are the transmitter and receiver heights including the antenna height; both are 1.485 m. The parameters a_1, a_2, b_1, and b_2 were obtained from the scatter diagram by the least squares method, and the path loss characteristic was thereby estimated. The resulting distance attenuation characteristic in this communication area is given by Equation (3). From Equation (3), the path loss index is 3.1925 for d > R_b. The shadowing components were then obtained by subtracting the estimated path loss from the average received power in each mesh. As shown in Figure 8, a lognormal-like distribution can be confirmed. Here, the logarithmic mean is µ = 0.0005198 dB, and the standard deviation is σ = 3.776 dB. These values match well with the empirical values in typical suburban areas [26][27][28].
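The least squares fitting and the shadowing extraction described above can be sketched for one distance segment as follows. The sketch assumes the received power is modeled as a straight line in log10(d); for the two-slope model it would be applied separately to the samples on each side of the break point R_b. The function names and the synthetic samples are assumptions, not values from the measurement campaign.

```python
import math


def fit_log_distance(samples):
    """Least squares fit of P_dBm = slope * log10(d) + intercept.

    `samples` is a list of (distance_m, average_rx_power_dbm) pairs.
    """
    xs = [math.log10(d) for d, _ in samples]
    ys = [p for _, p in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def shadowing_residuals(samples, slope, intercept):
    """Per-mesh shadowing: average received power minus the fitted value."""
    return [p - (slope * math.log10(d) + intercept) for d, p in samples]


# Synthetic samples lying on P = -30 * log10(d) - 40 (illustrative only).
samples = [(10.0, -70.0), (100.0, -100.0), (1000.0, -130.0)]
slope, intercept = fit_log_distance(samples)
path_loss_exponent = -slope / 10.0
residuals = shadowing_residuals(samples, slope, intercept)
```

With real data the residuals would not vanish; their mean and standard deviation are what the lognormal-like distribution in Figure 8 summarizes.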

Estimation Accuracy
Next, we evaluate the accuracy of the constructed database. We constructed the database using the datasets observed on days 1 and 2. Statistical processing was performed with 2 m-, 5 m-, and 10 m-squared meshes, and the average received power value for each mesh was calculated. The datasets observed on day 3 were treated as the instantaneous signal power, and RMSE was calculated from the difference between the constructed database and the instantaneous values.
For comparison, a path loss-based estimation method is also evaluated. The model in this communication area was calculated from the datasets observed on days 1 and 2. Although there are more complex path loss models that consider a larger number of variables (e.g., terrain models), they do not necessarily make better predictions [2]. Thus, the fitted path loss is a simple and accurate candidate for the compared path loss model. The scatter diagram is shown in Figure 9. The path loss was derived by considering the break point. From the figure, the path loss with the datasets was estimated as Equation (4). Note that the parameters in Equation (4) differ from those in Equation (3) because Equation (3) is estimated by using the datasets observed on all the measurement days, whereas Equation (4) uses only the datasets obtained on days 1 and 2.
In the path loss-based method, the estimated received power is derived by subtracting the propagation loss corresponding to Equation (4) from the transmission power. Then, we calculate the RMSE by taking the difference between the estimated received power and the datasets observed on day 3.
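The RMSE used in this evaluation can be sketched as follows, where each estimate (from the database or from a path loss model) is paired with the instantaneous value observed on day 3. The function name and the sample values are assumptions for illustration only.

```python
import math


def rmse_db(estimated_dbm, observed_dbm):
    """Root mean squared error in dB between estimates and observations."""
    if len(estimated_dbm) != len(observed_dbm) or not observed_dbm:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - o) ** 2 for e, o in zip(estimated_dbm, observed_dbm)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values: errors of 3 dB and 4 dB.
example_rmse = rmse_db([-70.0, -80.0], [-73.0, -84.0])
```

Because the observations are instantaneous values, the RMSE includes the multipath fading component even when the average received power is estimated perfectly.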
[Figure 9: received power value [dBm] versus logarithm of inter-link distance]
In addition, we compare the estimation accuracy with that of ITU-R P.1411 [25]. In this evaluation, we calculated the attenuation based on ITU-R P.1411 by using Equation (5), where d [m] is the inter-link distance, L_b is the propagation loss from the transmission position to the break point, and R_b is the distance from the transmission position to the break point. L_b and R_b are given by Equations (6) and (7), respectively. The estimated received power was derived by subtracting the propagation loss given by Equation (5) from the transmission power, and the RMSE was calculated from the difference between the estimated received power and the day 3 datasets.
In addition, we compare the estimation accuracies of the two-ray path loss model [29] and the Okumura-Hata model [30]. Figure 10 shows the RMSE characteristics. The RMSE of the fitted path loss model was 5.546 dB. On the other hand, the RMSE of the constructed database is 4.241 dB with the 2 m-squared mesh, 4.210 dB with the 5 m-squared mesh, and 4.489 dB with the 10 m-squared mesh. The RMSE with the 2 m-squared mesh is slightly larger than that with the 5 m-squared mesh because the smaller mesh size reduced the number of data hits in the search, so that a sufficient amount of data could not be obtained. Considering the standard deviation of the shadowing component and the instantaneous fading component, we consider the RMSE obtained in this evaluation to be generally a good value. The RMSEs of the two-ray path loss model, the Okumura-Hata model, and ITU-R P.1411 were 12.34 dB, 13.70 dB, and 5.67 dB, respectively. From the above results, we can conclude that more detailed signal fluctuation can be predicted by using the measurement-based radio environment database in V2V communication systems. The proposed database will enable the improvement of the communication efficiency in V2V communications.
Here, it should be noted that the fitted path loss can be used for estimating the received signal power where there is no statistical data. Thus, the RMSE of the fitted path loss also shows the performance of fitted path loss-aided interpolation.

Power Control Based on the Proposed Database
Finally, we discuss the effect of the proposed database-assisted communications. In unicast communication, unlike broadcast communication, the transmitted signal becomes interference at vehicles other than the destination. Thus, suppressing the transmission power to the minimum, i.e., suppressing the spatial interference power, improves the communication efficiency. Motivated by this fact, we consider a simple power control scenario in which the transmitter sets its own transmission power according to the database so as to guarantee the outage probability, i.e., the probability that the received signal power falls below the desired value. The power control is evaluated according to the following procedure:
1. The database with a 10 m mesh is constructed from the datasets observed on days 1 and 2.
2. The empirical cumulative distribution function (CDF) of the multipath fading factor in the evaluated area is estimated by taking the difference between each instantaneous dataset used for the database construction and the average received signal power in the database. We assume that the shape of the CDF is static in the evaluation area.
3. The datasets observed on day 3 are treated as the instantaneous true received signal power. The instantaneous received power at the reception mesh is estimated via the constructed database.
4. The database calculates the transmission power from the estimated average received signal power and the empirical CDF of the multipath fading so that the outage probability is guaranteed. Note that we limit the maximum transmission power to 19.2 dBm, which is the rated output of our equipment.
5. This power control is applied to all the instantaneous datasets observed on day 3.
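Steps 2 and 4 of the procedure above can be sketched as follows. The sketch assumes that the averages in the database were measured at a known reference transmission power, so that the link gain can be derived from them, and it takes the fading margin from an empirical quantile of the fading residuals. The function names, the quantile rule, and the sample values are assumptions; only the 19.2 dBm limit and the −82.0 dBm desired power come from the text.

```python
MAX_TX_POWER_DBM = 19.2  # rated output of the equipment, as stated in the text


def fading_quantile(residuals_db, outage_probability):
    """Empirical quantile q such that at most a fraction `outage_probability`
    of the fading residuals lies strictly below q."""
    if not residuals_db or not 0.0 <= outage_probability < 1.0:
        raise ValueError("need residuals and an outage probability in [0, 1)")
    ordered = sorted(residuals_db)
    return ordered[int(outage_probability * len(ordered))]


def required_tx_power_dbm(avg_rx_dbm, reference_tx_dbm, desired_rx_dbm,
                          residuals_db, outage_probability):
    """Transmission power that keeps the outage probability at the target."""
    # Assumption: avg_rx_dbm was observed at reference_tx_dbm.
    link_gain_db = avg_rx_dbm - reference_tx_dbm
    # Deep fades are negative residuals, so the margin is the negated quantile.
    margin_db = -fading_quantile(residuals_db, outage_probability)
    tx_dbm = desired_rx_dbm - link_gain_db + margin_db
    return min(tx_dbm, MAX_TX_POWER_DBM)


# Illustrative fading residuals (instantaneous minus average), in dB.
residuals = [-6.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
tx = required_tx_power_dbm(-70.0, 19.2, -82.0, residuals, 0.125)
```

When the link is too weak for the target to be met within the rated output, the result saturates at 19.2 dBm, which is one of the reasons the measured outage probability can exceed the setting value.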
The evaluation is performed around the intersection shown in Figure 7. For the performance comparison, we also perform the empirical path loss-based power control with the same procedure. In this method, the average received power is estimated via the measurement-based path loss model fitted to Equation (1). By fitting the model with the datasets from days 1 and 2, we estimated the path loss as Equation (8). Note that the parameters in Equation (8) differ from those in Equations (3) and (4) because Equation (8) is estimated with the datasets obtained in the intersection shown in Figure 7, whereas both Equations (3) and (4) are estimated with the datasets obtained over the entire measurement area.
We performed the power control with a permissible outage probability of 0.1-0.2 and a desired received power of −82.0 dBm. Figure 11 shows the outage probability. Because the path loss-based estimation contains the uncertainties of both shadowing and multipath fading effects, the method overestimates the transmission power, and the outage probability is much lower than the desired value. On the other hand, the performance of the proposed method nearly equals the desired value. Here, we can confirm that the outage probability of the proposed method slightly exceeds the setting value when the value is small. This is due to the following reasons:
• Uncertainty in estimating the tail of the empirical CDF. Although achieving a high packet delivery ratio requires accurate information about the tail, it is difficult to obtain such information from an empirical CDF with limited samples.
• Limitation of the maximum transmission power.
In order to keep the permissible outage probability under the above conditions, we add a margin to the transmission power. The dotted line in Figure 11 shows the outage probability with the margin, and Figure 12 shows the average transmission power including the margin. In addition, for the region where the proposed method does not satisfy the desired value, the margin that satisfies the desired value is evaluated separately, and its performance is plotted with the dotted line.
The proposed method can achieve the desired value with a small margin of less than 1 dB and, even if the margin is taken into consideration, the average transmission power is lower than that of the path loss-based method. These results mean that the proposed method can achieve the desired communication quality with lower transmission power, so that spectral efficiency can be improved.

Conclusions
A measurement-based radio environment database was proposed for the V2V communication environment. The evaluation results showed that the proposed database estimates the radio environment fluctuations more accurately than several path loss models, including the measurement-based path loss model and ITU-R P.1411. We conclude that the measurement-based radio environment database can be used for predicting radio environment characteristics not only in conventional applications where the transmitter is fixed, but also in V2V communication systems.
In addition, we have shown that the proposed database can improve power control in the unicast situation, and the proposed database will also improve the communication efficiency in V2V systems.