BSM-Data Reuse Model Based on In-Vehicular Computing

Basic Safety Messages (BSMs) that are frequently generated by multiple connected vehicles can play a primary role in providing transport data, given the credible and reliable information they contain. However, given the way BSMs are currently treated, multiple deficiencies prevent them from constituting a valuable data source. Data become more useful the more widely they are used, which is the exact opposite of what happens with BSMs: they exist only temporarily, are used locally, are considered disposable, and are never stored. In this paper, we introduce a data reuse model that retains collected BSMs and stores and processes them inside the vehicle, constituting a continuous data source that holds retained snapshots along the roadway. Our model provides a primary data source available on a large scale, which can serve as a dataset for machine learning tasks and can be used to visualize different traffic-related indicators to enhance analytics and support decision-making. In the study case, we set up an in-vehicle data platform, where we achieved an 80% reduction in BSM size and provided a rich set of APIs to serve applications. We also adopted Artificial Neural Networks (ANN) as an information processing paradigm for traffic volume prediction, where the obtained results reached over 99% accuracy.


Introduction
The increasing volume of traffic that cities currently face is associated with many unpleasant phenomena, such as accidents, time delays, and emergencies, as well as high pollution and degradation of quality of life. Nevertheless, with the digital age constantly moving forward, a revolution in the transportation network is being spurred by advancements in communication technologies. Governments, academia, and industry have therefore made tremendous efforts to reinforce the evolution of what is commonly named the intelligent transportation system (ITS). Meanwhile, cars are getting increasingly intelligent. A contemporary car comprises over 20,000 components, about 40 microprocessors, and dozens of sensors. In addition, an eclectic selection of technologies strives to offer different vehicular communication models known as vehicle-to-everything (V2X). This progress in sensing technologies is opening up new possibilities, such as connected vehicles (CV). As one of the most heavily researched automotive technologies, CV technology aims at improving the safety and efficiency of the transportation system and roads. It then has the possibility to ameliorate ongoing activities, modify transportation system management,

Related Works
If connected vehicles are to form the center of future intelligent transportation systems, one of the most attractive areas, and one that presents many challenges as we progress, is transport data sources. There is no doubt that data have great potential and power, and data are the blood that will run through the veins of connected vehicles [8]. Connected vehicles' data can make it possible to meet the modern interests of drivers, for instance "What is the most fuel-efficient route?", and go beyond the classical inquiries about the shortest-distance or earliest-arrival route [9]. In this respect, the increasing focus of the ITS research community on data in the connected vehicle environment has led to a flurry of works introducing different approaches to the problem of data collection.
Data collection based on the recruitment of connected vehicles was recently studied in [10], where an optimal set of vehicles must be identified and recruited to carry out urban data collection on behalf of a service provider that serves users on city streets. This incentive-based method, named REVERS, exploits game theory to fairly and optimally select the best vehicles under the desired coverage, redundancy, and quality requirements. Similarly, to perform urban sensing for a desired coverage given a limited budget, the authors of [11] proposed the recruitment of high-reputation users. Additionally, the authors of [12] focused on how to maximize message coverage in urban vehicular networks through the optimal deployment of roadside units (RSUs). In their message coverage maximization algorithm (MCMA), the authors considered the traffic stream and the delay constraints of applications to carefully identify appropriate sites for RSUs.
Other studies have addressed connected vehicle data transmission issues. The authors of [13] attempted to improve cooperative data dissemination performance by formulating the cooperative data scheduling decision in hybrid vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) networks as a maximum weighted independent set (MWIS) problem; solving the MWIS maximizes the number of vehicles that retrieve their requested data. In the same way, the work in [14] dealt with connected vehicle data distribution tasks, where the authors designed an infrastructure to enable large-scale message delivery by labeling and customizing unstructured data into topics in order to serve a wide range of consumers. The authors of [15] addressed the problem of delay-tolerant data traffic delivery over connected vehicles. They introduced an architecture where delay-tolerant traffic is offloaded from the data networks to the connected vehicle networks, without extra infrastructure or hardware deployment, and proposed a distributed data hopping mechanism to allow delay-tolerant data routing over CV networks. In another paper [16], for large-scale vehicular content distribution in urban areas, the authors proposed deploying a multitude of wireless buffer devices on the roadside, namely roadside buffers (RSBs), to extend the distribution of local content to vehicles in the urban area. The work in [17] discussed the possibility of substituting RSUs with city buses widely distributed across the city area to improve intra-cluster BSM dissemination, and also proposed an allocation mechanism for intra-cluster message distribution. Regarding vehicular datasets, the authors of [18] mimicked the ordinary daily road activity of a 400 km² region and generated a realistic artificial vehicular mobility dataset.
Regarding the aforementioned studies, it is important to note that data collection approaches remain ineffective in a dynamic environment, since they are not based on stable criteria for selecting candidate vehicles that are important and available enough to meet the relevant user interests in the network. In addition, they do not scale data collection under given coverage and budget constraints [11,12]. Although the authors of [10] adopted the Information-Centric Networking (ICN) concept, the importance of information in their work remains location-based, depends on content popularity, and is computed by observing the amount and frequency of user interests received, while neglecting whether the receiver is satisfied with the content. Moreover, the authors of [11] used no proper metric to classify and identify participants' eligibility. In [12], the proposed deployment strategy can be notably influenced by the large-scale mobility model and road layout.
With regard to data transmission works, the centralized model presented in [13] is limited to single-hop V2V communication and does not match the SDN concept of "logically centralized" control in a distributed network. In [14], although the authors respected the recommended latency requirements for CV applications, the latency considered in their experiments includes only the delivery time of a message and not processing times, such as the time required for aggregation and complex data transformation. The infrastructure proposed in [16] cannot be extended to arbitrary network sizes, so it is not capable of supporting Internet-enabled content distribution. Although the mobility model and road vehicle density can severely influence intra-cluster message dissemination, they were not considered in the clustering process in [17]. The authors of [18] described the generation process and outlined the impact of the generated dataset on the simulative evaluation of vehicular networks. Nevertheless, a real-world dataset remains necessary to allow for a more rigorous validation of the mobility models.
On the data side, previous works did not all use realistic, reliable, and large-scale data sources, such as BSMs; instead, each system defined its own proprietary data formats and collected its required data to provide its services. Consequently, different applications in these systems cannot complement each other by reusing and sharing data, as they are unable to understand each other's data. This hinders cross-application data reuse and optimizations (e.g., reducing data traffic by reducing redundancy).
Besides, advances in vehicular communications technology are making content sharing within vehicular networks more effective and increasingly popular [19]. Some other works have explored the potential of interworking between DSRC and cellular network technologies for efficient V2X communications in favor of data sharing. The work presented in [20] investigated the possibility of leveraging DSRC and cellular interworking for successful V2X transmissions and examined possible combined DSRC and cellular architectures. Another study [21] took advantage of the potential of LTE-based V2X communications to introduce a device-to-device content-sharing approach, in which V2V and V2I link planning takes into account both data diversity and link quality.
As can be seen, although connected vehicle data have recently come into the limelight of the research community, much previous and ongoing research has focused on data collection and sharing, while the reuse of data has not been in focus. Notably, BSM reuse, which is of salient importance, has been disregarded in previous research. In addition, the realities of data reuse are not yet straightforward. Some of the fundamental issues are technical, from identifying what datasets are available and the size of those data, to the wireless connections suitable for transmission.
In fact, not only is the BSM used specifically for safety purposes, but it also becomes outdated and useless and is deleted after its first use. However, on the one hand, safety, mobility, and efficiency are arguably not separate aspects: traffic accidents can breed traffic congestion and increase CO2 emissions. On the other hand, although the BSM was initially limited to safety purposes, it can be used outside safety. If BSM data collected from multiple connected vehicles are cached, grouped, stored, and widely diffused, they can help boost safety, decrease fuel consumption, reduce traffic jams, and facilitate people's travel overall [6]. By way of illustration, a BSM reporting a risk is used locally by a safety application. However, at the broadest level, the same BSM can be seen by efficiency applications as an input to estimate alternative paths.
As far as we know, our work is the first to consider the reuse of the BSM, a structured dataset, to provide reliable, useful, and wide-scale data sources for connected vehicles. This BSM reuse approach can provide a data source to a variety of data-related services that support multi-modal transportation applications, not only for safety but also for efficiency and mobility.

Connected Vehicles and Data Availability
The connected vehicle solution is one of the many technological innovations currently jostling for attention. This new technology has revolutionized the automotive industry and built the cornerstone of the internet of vehicles (IoV). According to [22], by 2020, the internet will be integrated into around 90% of modern cars, compared with less than 10% in 2013, which can certainly help support next-generation intelligent transportation systems. Vast research work and various industrial efforts have accelerated the achievement of connected vehicle technology. Different countries, such as the U.S. (in California, New York, Arizona, Florida, and Michigan), China, Germany, the U.K., and others, have established connected vehicle testbeds and pilot programs [23,24]. That is why the latter is commonly regarded as an area of development where applications find prosperous ground in the IoV epoch. By way of example, Figure 2 summarizes the main categories of connected vehicle applications in the U.S. Indeed, great endeavors are being made by researchers towards innovative and cost-effective vehicular applications. Moreover, several applications that are proposed or under investigation are mainly related to safety, mobility, efficiency, and infotainment, such as emergency warning, traffic management, and weather information. Two aspects are therefore most important for connected vehicles to succeed technologically. First, large amounts of data need to be collected from diverse systems and sources. Second, these data should be processed and widely diffused through various communication technologies, such as DSRC, WiFi, 5G, and cellular.
Against this background, the general understanding is that better availability of data source is of the utmost importance to feed the plethora of connected vehicle applications and provide an intelligent transportation system management.
As the main thrust behind connected vehicles is traffic safety, the BSM was initially designed to be the main message used by safety applications to share data among connected vehicles. Considered "heartbeat" messages, BSMs tend to constitute the overwhelming majority of CV data. Unfortunately, this valuable CV data source remains restricted in context, time, and space.

Vehicular Data Representation
Vehicles are getting more and more intelligent. An average car today contains more than 20,000 components, including about 40 microprocessors and an important set of embedded sensors that can number up to 200 per vehicle in 2020 [25]. Modern vehicles hinge on these considerable sets of sensors in order to generate and exchange vehicle motion and status data. Thereby, in a connected vehicle scenario, a rich data source is the vehicle itself. Nevertheless, most data generated by a vehicle are primarily of a technical nature; they differ from carmaker to carmaker, and even within a carmaker, from model to model. For connected vehicle technology, which aims at sharing some of these data with third parties, a variety of data representations known as message sets have been proposed to support interoperability and enable data exchange across the connected vehicle network. The Society of Automotive Engineers (SAE) has developed the J2735 standard, which specifies a Message Set Dictionary, explicitly to support interoperability among applications based on DSRC [26]. The SAE J2735 standard defines approximately 150 standard data elements and 70 standard data frames, and describes the 15 types of application message sets listed in Table 1. A message is a combination of two kinds of structure, named data frames and data elements. A data frame is a complex data structure that contains one or more data elements and possibly other data frames. As stated in [27], among the fifteen messages described in J2735, the BSM is considered to be the most important.

Basic Safety Message Data
Connected vehicle safety applications are greatly dependent on the BSM to exchange the core data that describe vehicle status, position, and motion among vehicles, as well as between vehicles and infrastructure. The BSM has been designed with two parts (see Figure 3 for the format of the BSM). Part I contains the core data and is transmitted regularly. Part II consists of other data elements that differ according to the vehicle model. Table 2 groups the Part I data elements, using the official data element and data frame terminology from the standard. The AccelerationSet4Way and VehicleSize items are based on data frames, and the remaining items are based on data elements [27].
In this paper, we focus on studying and considering the BSM as an original and rich data source. BSMs, which contain position and motion data and state information of the vehicles (e.g., latitude, longitude, elevation, heading, speed, acceleration, lights, brakes, wipers, timestamp, path history), exist only temporarily, are used locally, and are never stored.
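To make the frame and element distinction concrete, the following is a minimal sketch of a BSM Part I record. It is not the J2735 encoding: the field names, types, and units are illustrative assumptions, only a subset of the Part I items is shown, and the hypothetical `flatten` helper simply shows how nested data frames reduce to individual data elements.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AccelerationSet4Way:
    """Data frame: groups four acceleration-related data elements (units assumed)."""
    longitudinal: float
    lateral: float
    vertical: float
    yaw_rate: float


@dataclass(frozen=True)
class VehicleSize:
    """Data frame: groups two size data elements (units assumed)."""
    width_cm: int
    length_cm: int


@dataclass(frozen=True)
class BsmPartI:
    """Illustrative subset of BSM Part I; names are assumptions, not the standard's."""
    msg_count: int      # MsgCount data element
    temporary_id: str   # TemporaryID data element
    d_second: int       # DSecond data element (assumed: ms within the current minute)
    latitude: float
    longitude: float
    elevation: float
    speed: float
    heading: float
    accel_set: AccelerationSet4Way  # data frame
    size: VehicleSize               # data frame


def flatten(bsm: BsmPartI) -> dict:
    """Return a flat mapping of element name -> value, expanding nested frames."""
    flat = {}
    for name, value in asdict(bsm).items():
        if isinstance(value, dict):  # a nested data frame
            for sub_name, sub_value in value.items():
                flat[f"{name}.{sub_name}"] = sub_value
        else:  # a plain data element
            flat[name] = value
    return flat
```

Flattening is one possible way to reach the individual elements inside a frame; other representations would work equally well.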

Problem Specification
BSMs frequently generated by multiple connected vehicles can play a primary role in providing transport data, given the credible and reliable information they contain. However, given the way that BSMs are considered and treated, multiple deficiencies prevent them from constituting a valuable connected vehicle data source. These deficiencies are listed below.
Context: presently, basic safety messages are used mainly, even exclusively, for safety purposes. In contrast, a large share of their data elements is needed by a considerable collection of applications not related to safety. At the least, the BSM Part I data elements can conveniently provide the basic vehicle information required by several applications. Arguably, safety, mobility, and efficiency are not separate aspects: traffic accidents can breed traffic congestion and increase CO2 emissions.
Validity: BSMs as safety data are overall regarded as snapshot data that give an idea of the state of the system at a definite time. From the first use, the BSM becomes outdated, useless, and is deleted.
Undoubtedly, BSMs collected from multiple connected vehicles can be cached, grouped and stored to construct continuous data streams that can supply almost real-time metrics, while they evolve over space and time. Indeed, data are then exploited in a better-connected form to improve safety, decrease fuel consumption, reduce traffic jams, as well as facilitate people's travel overall.
Range: as is well known, safety applications between vehicles require local broadcast of BSMs within the limits of DSRC. However, connected vehicle applications also require extra-vehicular data exchange to permit collaborative sensing and action at scale. Thus, if transmitted over technologies other than DSRC, such as Cellular V2X, continuous data obtained through BSM processing can widely feed applications with the requested vehicle information.

Basic Concepts
To overcome the aforementioned difficulties, we propose a new BSM reuse model that makes use of a three-stage life cycle process. The new model is described below and graphically illustrated in Figure 4, which represents the different stages of the BSM life cycle. Given our primary goal of not constraining BSM use contextually and geographically, the main idea is that no BSM would be destroyed; all captured BSMs should be maintained, processed, and reused in different ways to create value from BSMs and treat them as a rich data source outside their baseline design, out of the safety context, and beyond DSRC.

Data Capture
The data capture stage comprises two different parts: the generation and the acquisition of data. Data acquisition is the collection of extra-vehicular data in the form of BSMs, while data generation is the creation of intra-vehicular data through local sensor observations. An additional real-time pre-processing task is also conducted in order to classify and filter captured data.

Data Maintain and Processing
When it comes to the maintaining and processing stage, a series of actions would be performed on raw BSMs to model, clean, compress, aggregate, organize, store, and extract data in an appropriate output form for subsequent use.

Data Reuse
The last stage aims at opening up new possibilities for endless reuse of stored BSMs. In practice, different data consumer use cases may require different data delivery types. For example, a safety application or an emergency vehicle service may require a real-time dataset when an accident takes place. In contrast, a data analytics company might opt for historical car data in order to understand traffic trends. This stage relies on different data delivery and visualization methods to cater to these different use case requirements, as well as for the purpose of knowledge production.

In-Vehicle Computing: Advantages
Vehicles are getting more intelligent and better equipped. Emerging intelligent vehicles will possess sufficient storage and computing resources to perform tasks locally, thus reducing the network load and delays. Contemporary vehicles can carry an on-board computer, an industrial edge computer designed to sustain the rigors of vehicular environments while capturing, storing, and analyzing data from the various sensors and devices required for Intelligent Transportation System applications. In-Vehicle Computing (IVC) will then become paramount to substitute the classical Vehicle Cloud Computing (VCC). Table 3 highlights the differences between IVC and VCC according to different features. In addition, the key advantages of IVC can be summarized as follows.
A Storage: contrary to the centralized topology, IVC permits data storage inside the vehicle, in the vicinity of the source of generation. This provides timely access to stored data and decreases the remote storage load.
B Bandwidth: in the era of the connected vehicle, the amount of generated data is growing explosively and content demands will become further varied. Given the distance from users in a centralized topology, cloud computing cannot assure the bandwidth required for delivering and remotely processing such a large amount of data. By mounting the computation and storage resources on vehicles, IVC is able to properly mitigate the high-bandwidth pressure.
C Response time: the processing time and the delivery time together make up the response time. In a centralized topology, the response time is considerable due to the delivery delay. In our decentralized IVC topology, the mounted computer, which acts as the processing unit, is inside the vehicle. Thus, the response time is significantly lower, which enables connected vehicles to respond with more efficiency, better service, and further innovation through new applications.
D Contextual data: in a decentralized topology, users are able to obtain real-time information related to the behavior and location of vehicles, traffic conditions, the network environment, etc. Accordingly, different applications would be improved. For instance, real-time information can be delivered to various vehicular users in accordance with their interests.
In accordance with the foregoing, and taking advantage of emerging vehicle-mounted computing technology, we propose our IVC-based model, which relies on data storage and processing inside the vehicle and can help to address the costs of bandwidth and enable more efficient real-time applications that require fast processing and response.

Architecture Design
In our architecture, an in-vehicle computer serves as an edge computer and permits data to be processed and stored close to their source. The captured BSMs do not need to travel to a central data center, as they would in a traditional cloud-based architecture; processing therefore remains considerably faster, which keeps latency low.
As illustrated in Figure 5, a vehicle performs the aforementioned three stages as follows. At the first stage, to manage the data transfer process, we adopt our previous Request-To-Receive approach [5], shown in Figure 6, to address the blind exchange of BSMs and reduce the average number of collected data elements. Performing this type of validation early on has a positive impact on the bandwidth capacity.
A Categorization: further, we introduce a new concept of 'data temperature' to categorize the captured raw data. Hot data represent real-time BSMs and necessitate real-time processing to be most beneficial (i.e., less than a second from receipt to action). Hot data are also cached in the database shown in red in Figure 5. Cold data denote offline BSMs and are stored in the database shown in blue. Flextime processes can operate on cold data that have been stored. Hot data are delivered simultaneously to safety applications and, along with offline data, to the storage function.
B Filtration: without any type of filtration, vehicles could easily get flooded with data. Data filtration addresses the issue of uninformative content in received raw BSMs. Non-informative content can be detected in real time just by checking whether a received BSM holds new data or not. If it does, the BSM is stored; otherwise, it is a non-informative BSM and is discarded. For instance, if a vehicle travels at more or less the same speed and heading during the entire trajectory, the data will be essentially unchanged, so discarding the repeated messages would most probably cause no loss of information.
Therefore, a two-step algorithm is performed in order to fulfill the categorization and filtration tasks. The algorithm saves valuable information and discards the rest.
Step 1: the algorithm examines the DSecond data element to determine whether the message holds real-time or offline data. The period of the message transmission is supervised by the DSecond data element. The latter provides a time value for when a BSM is populated with data; there may be a lag between the time the data are collected and the time they are populated in the BSM. BSMs are then grouped into two categories: hot and cold. Each of these two groups of data requires different kinds of processing and storage functions.
Step 2: the algorithm inspects the values of two data elements, TemporaryID and MsgCount, of every hot BSM. It discards each BSM having old content, and passes each BSM having new content simultaneously to local safety applications and to storage.
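The two steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implemented algorithm: DSecond is assumed to be expressed in milliseconds within the current minute, a BSM is treated as hot when its lag is at most one second (the `HOT_MAX_LAG_MS` threshold is hypothetical), and "old content" is interpreted as a repeated (TemporaryID, MsgCount) pair.

```python
from dataclasses import dataclass, field

HOT_MAX_LAG_MS = 1_000   # assumption: "hot" means at most one second old
MINUTE_MS = 60_000       # assumption: DSecond counts milliseconds within a minute


@dataclass
class BsmClassifier:
    """Categorizes BSMs as hot or cold and filters out repeated hot BSMs."""
    # Last MsgCount seen for each TemporaryID (hypothetical bookkeeping).
    last_count: dict = field(default_factory=dict)

    def handle(self, temporary_id: str, msg_count: int,
               d_second: int, now_ms: int) -> str:
        """Return 'hot', 'cold', or 'discard' for one received BSM.

        now_ms is the receiver clock, also in milliseconds within the minute.
        """
        # Step 1: derive the lag from DSecond; the modulo handles minute rollover.
        lag_ms = (now_ms - d_second) % MINUTE_MS
        if lag_ms > HOT_MAX_LAG_MS:
            return "cold"  # offline data: goes to storage only

        # Step 2: a hot BSM repeating the last MsgCount of this sender is old.
        if self.last_count.get(temporary_id) == msg_count:
            return "discard"
        self.last_count[temporary_id] = msg_count
        return "hot"  # new content: safety applications and storage
```

A caller would route "hot" results to both the safety applications and the storage function, and "cold" results to storage only.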

Data Maintain and Processing
Maintaining the captured BSM is the main task in our approach focused on preparing data for analysis and further reuse. It refers to data storage, modeling, reduction, and aggregation.
A Data storage and modeling: Messages, Frames, or Elements? After capturing the raw data, there is a requirement to transmit the data to suitable in-vehicle data storage systems for further processing and reuse. Accordingly, consideration should be given to how data are stored.
Data are received as messages. The messages contain frames, and the frames contain data elements. Connected vehicles and other related system functions usually require the use of data elements, but each element needs a location in time to make it useful.
• Messages: storing the data as messages would require any future use of this information to query and access data elements across multiple messages. Furthermore, these messages contain multiple elements that are not used by the function, so they may be inefficient to access, use, and/or transmit.
• Frames: storing the data as frames alone would not suffice, since many of the data elements are not necessarily in a frame.
• Elements: if data are stored as elements alone, then any information concerning the association between data elements is lost. For example, if a BSM is divided into its elements, relationships between windshield wiper activation and temperature in a particular vehicle would not be known. However, such associations between data contained in messages generated by an individual vehicle can generally be accommodated by using the temporary ID assigned to message sets to associate data for access or use. Because the association of data with one vehicle can be accommodated, it may be beneficial to store the data as elements.
Overall, storing BSMs as elements that are associated with each other would allow each function or request to access only the data elements it requires. Accordingly, we propose the relationship model shown in Figure 7.
This approach allows elements of the same type from different messages to be grouped together. For example, if the vehicle is responding to an inquiry from another connected vehicle requesting weather data, it would be able to read elements such as temperature and windshield wiper activations with a single query, even though the data had been retrieved from multiple messages.
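As an illustration of element-wise storage with associations preserved through the temporary ID, the sketch below uses an in-memory SQLite table. The schema, the element names (`temperature`, `wiper_status`), and the helper functions are assumptions made for this example; they are not the relationship model of Figure 7.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per data element (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE element (
               temporary_id TEXT    NOT NULL,  -- associates elements of one vehicle
               msg_count    INTEGER NOT NULL,  -- associates elements of one message
               d_second     INTEGER NOT NULL,  -- locates the element in time
               name         TEXT    NOT NULL,
               value        REAL    NOT NULL)"""
    )
    db.execute("CREATE INDEX idx_element_name ON element(name)")
    return db


def store_bsm(db, temporary_id, msg_count, d_second, elements):
    """Split one BSM into its elements and store each as a separate row."""
    db.executemany(
        "INSERT INTO element VALUES (?, ?, ?, ?, ?)",
        [(temporary_id, msg_count, d_second, name, value)
         for name, value in elements.items()],
    )


def read_elements(db, names):
    """Read only the requested element types, across all stored messages."""
    marks = ",".join("?" * len(names))  # one placeholder per requested name
    query = (
        "SELECT temporary_id, d_second, name, value FROM element "
        f"WHERE name IN ({marks}) ORDER BY temporary_id, d_second, name"
    )
    return db.execute(query, list(names)).fetchall()
```

A weather-related request would then call `read_elements(db, ["temperature", "wiper_status"])` and never touch speed or heading rows.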
B Data reduction: data reduction decreases data storage requirements and communications bandwidth, and a range of techniques may be appropriate for that purpose in our model.
1 Compression: BSMZip (lossless compression for basic safety messages). Multiple compression techniques, both lossy and lossless, could be applied to connected vehicle data. Lossless compression ensures data integrity and is more suitable for BSM data. In our model, we consider the application of run-length encoding (RLE) to a stream of BSM data. To the best of our knowledge, it has not yet been applied to data handling in the automotive domain.
2 Aggregation: different data aggregation strategies are appropriate to allow our model to perform a wide range of data processing, summarization, and display. Our model supports complex aggregations over particular data elements, geo-fences, and parameters for particular routes and areas of interest for end-users. Some connected vehicle applications may need to adopt a geo-fencing technique to help limit the data to be exchanged. This technique defines an area of interest by drawing a geographic boundary; specific data processing functions can then be applied to the BSMs inside this area. For instance, a speed detector on a highway may be a rectangle covering all lanes, in which the speed of any BSM qualifies for further processing. Moreover, contextual aggregation of stored BSMs is performed using our previous ALFA scheme [5] to open up new possibilities for using BSMs outside the safety context.
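The two reduction ideas above can be illustrated with a short sketch. It shows generic run-length encoding over the values of one data element and a rectangular geo-fence used for a simple aggregation; it is not the BSMZip implementation, and the record keys and the `(south, west, north, east)` fence layout are assumptions.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Invert rle_encode; the round trip is lossless."""
    return [value for value, count in runs for _ in range(count)]


def in_geofence(lat, lon, south, west, north, east):
    """True if the position lies inside the rectangular area of interest."""
    return south <= lat <= north and west <= lon <= east


def mean_speed_in_fence(records, fence):
    """Average the speed of the records located inside the geo-fence.

    records: dicts with 'latitude', 'longitude', 'speed' keys (assumed layout).
    fence:   (south, west, north, east) tuple in degrees (assumed layout).
    Returns None when no record falls inside the fence.
    """
    speeds = [r["speed"] for r in records
              if in_geofence(r["latitude"], r["longitude"], *fence)]
    return sum(speeds) / len(speeds) if speeds else None
```

RLE pays off when consecutive values of an element repeat, which is the situation described for a vehicle holding a steady speed and heading.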

Data Reuse
Next to the data maintaining function discussed above, access to data is a critical enabler for the efficient and wide reuse of BSMs into a multimodal transportation system. Hence, many data consumer use cases that may require different types of data delivery should be considered. Our model provides several data formats and it relies on different data delivery mechanisms to cater to these different use case requirements.
A Data reshaping formats: we can imagine and design a bunch of solutions such as a suite of APIs, portals, and apps to turn the passive stored BSMs into an active and actionable dataset and make every data element count. Hence, to support that and make this valuable dataset available for sharing and consumption, we count on a handful of data formats, as shown in Figure 8, including JSON, XML, and CSV in order to provide standard data that developers, systems, and applications can easily reuse. Neither type is better than the other, we simply provide the ability for developers, systems, and applications to select the one that meets their requirements the best. B Data delivery:as we mentioned earlier, our model serves two types of vehicle data, hot data that refers to real-time data and cold data represented in offline data. Data delivery is a sort of service that allows for different transport agents to re-use the stored BSMs across the following methods.
1 Streaming: hot data are usually better served using a push mechanism, which minimizes delay and packet loss. Streaming is therefore the ideal delivery mechanism for applications that require hot, rich vehicle data. To keep streaming efficient, we rely on our previous RTR approach, which permits requesters to define filters, such as a data element list, a geo-fence, and a maximum latency, so that they receive only the data they want, in a timely manner.
2 Data query: we rely on this retrieval technique to let different data consumers make requests on our database to obtain the desired data. Data query is, therefore, a pull mechanism that provides hot or cold data in response to data requests.
C Data analysis: another way to reuse stored BSMs is to apply emerging data analytics methods such as machine learning. Machine-learning techniques can inspect our BSM dataset and enable pattern recognition (such as real-time vehicle traffic and driver behavior under different road traffic conditions), decision-making, and/or forecasting of future trends.
D Data visualization: from another perspective, the BSM dataset can feed various data visualization tools and techniques to provide monitoring data about vehicles on the road and to support decision-making, significantly improving the efficiency of transport system operations. For instance, different data elements can be visualized according to time and geo-fencing limits.
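As an illustration of the data reshaping formats in item A, the same stored records can be rendered as JSON, CSV, or XML using only the Python standard library. This is a minimal sketch, not the platform's serializer: the function names, the XML tag names (`bsms`, `bsm`), and the record fields are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialize a list of flat dictionaries as a JSON array."""
    return json.dumps(records)


def to_csv(records):
    """Serialize records as CSV with a header row taken from the first record."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records, root_tag="bsms", item_tag="bsm"):
    """Serialize records as XML, one child element per data element."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, item_tag)
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative record; field names are assumptions.
records = [{"vehicle_id": 1, "speed_mph": 30.5}]
print(to_json(records))
print(to_csv(records))
print(to_xml(records))
```

Keeping one serializer per format behind a common record structure is what lets a consumer choose the format without affecting how the BSMs are stored.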

Study Case: In-Vehicle BSM Data Services Platform
In the interest of delivering quick responses to end-users and enabling rapid storage and real-time data analysis, which are vital features for connected and autonomous vehicle applications, we carry out a real-world implementation of a vehicle data platform based on the reuse of collected and stored BSMs. To take advantage of this valuable data, our study case focuses only on the second and third stages of our proposed model. We also treat data reuse as further consumption of the stored data in different cases and across multiple scenarios.

Platform Implementation
A Vehicle server (hardware): to implement our data services platform, we mounted a laptop as an in-vehicle computer, along with a 4G LTE Wi-Fi router.

Data Preprocessing
A large amount of BSM data is available from the Safety Pilot Model Deployment (SPMD) project (https://catalog.data.gov/dataset/safety-pilot-model-deployment-data), carried out in Ann Arbor, Michigan. The field test includes 75 miles of instrumented roadway. Approximately twenty-six roadside units (roadside equipment, RSE), which are capable of communicating with appropriately equipped vehicles and devices via DSRC, were installed throughout the network, as presented in Figure 9. Approximately 3000 instrumented vehicles participated in this study, including light/passenger vehicles, heavy/commercial trucks, and buses. We constructed our relational database from this rich, real-world connected vehicle dataset, which is available as comma-separated values (.csv) files. Table 5 shows the result of our analysis of the available BSM file and lists some summary measures for data collected on 11 April 2013. Most of the data elements in this dataset are collected at a frequency of 10 Hz. This frequency makes a number of the tables very large, which restricts their ease of use. The "No. of Rows" column in Table 5 shows how much data a single day generates.
Using this dataset, we observed that, most of the time, the majority of vehicles continue straight ahead at the same speed. As a result, given that a basic safety message is broadcast at 10 Hz, most of the data elements keep their values during normal traffic flow, as shown in Table 6. If we consider a time period of two seconds with 20 messages, the only element likely to change over most of the two-second fragment is the position (longitude and latitude). Most often, messages 2 through 19 provide no useful data. Nevertheless, when a significant change does occur in the data, it is crucial that the proper application receives it with minimum delay.
Table 6 reveals clear opportunities for a significant reduction in storage and bandwidth requirements, and any data element may also have an individual compression technique applied to it. In our work, we perform individual compression on the speed data element. Data compression methods such as run-length encoding have proven their importance in such cases. However, traditional database systems (i.e., row-oriented databases) do not widely apply data compression techniques. In contrast, column-oriented databases provide more opportunities for data compression because the values of the same attribute are stored consecutively [28]. Using a columnar database such as MonetDB, we found that RLE is an attractive approach for compressing data in a column-store. Run-length encoding compresses consecutive duplicate values in a column into a single compact representation. For example, RLE compresses k consecutive duplicates whose value is t into one tuple (t, k), i.e., a (value, count) pair. RLE is widely used in column-oriented databases, where attributes are stored consecutively and runs of the same value are common. For run-length encoding to be applied to a column, the column should have the following features: 1. the column is sorted; 2. the fanout of the column is high. The first requirement is easy to understand: if the column is not sorted, elements with the same value are not grouped together. The second requirement measures the average number of duplicates: run-length encoding pays off only when the number of duplicates is large. The definition of fanout is provided in the following. Figure 10 gives an example of applying run-length encoding to the (.csv) dataset collected in April 2013, which is stored in a MonetDB database and initially contains more than 1.9 × 10 rows.
The size of the original dataset is 219,990,043,384 bytes (219.99 GB), while the size of the encoded dataset (using standard run-length encoding) is only 43,998,008,676.8 bytes (43.99 GB), which is 20% of the uncompressed size. Using run-length encoding, we thus succeeded in reducing the bandwidth and storage required for BSMs by about 80 percent.
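The (value, count) encoding described above can be illustrated with a few lines of Python. This is a minimal sketch of generic run-length encoding on an in-memory list, not the MonetDB column-store implementation used in our platform; the sample speed column is made up to mimic two seconds of 10 Hz data.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal consecutive values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence (lossless)."""
    return [value for value, count in pairs for _ in range(count)]


# Two seconds of a hypothetical 10 Hz speed column (20 samples, in mph).
speeds = [35.0] * 12 + [35.5] * 5 + [36.0] * 3
encoded = rle_encode(speeds)
print(encoded)  # [(35.0, 12), (35.5, 5), (36.0, 3)]
print(len(encoded), "pairs instead of", len(speeds), "values")
```

Decoding the pairs restores the original column exactly, which is why the size reduction comes at no loss of information; the gain depends entirely on how long the runs are.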

Data Delivery
This functionality provides a rich set of APIs that serve not only safety applications but also the different needs of other connected vehicle applications (mobility, weather, ...). To build RESTful web services with Spring Boot, Kafka, and Postgres, we first downloaded a Maven project from the Spring Initializr page (https://start.spring.io), shown in Figure 11. Subsequently, we imported the downloaded project into our Java IDE and began the configuration.

Data Sets
We consider a data set to be a logical store of related vehicle data elements. Each data set has its own related API endpoints and data elements. The available data sets are:
• Points data set: holds kinematic vehicle data. This data consists of data points, which are time-stamped vehicle records that contain one or more of the available vehicle data elements, such as location and speed. The points data set is generated from the data elements of the BSM Part One file.
• Trips data set: contains calculated vehicle trips. An algorithm is used to detect trips from the points data. Each trip includes details such as trip start and end times, total trip distance, and location. The trip summary file also captures the distance driven while the vehicle speed was greater than 25 mph. This data element is of interest not only because it further details the trip, but also because it provides a sense of the conditions under which the data for a particular trip were collected.
The trips data set table contains 11 fields. Table 7 lists these fields with a brief description of each, Table 8 provides a few summary measures of the trip data set table from 11 April 2013, and Table 9 provides a sample of 10 calculated trips from the trip data set file.
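The trip detection algorithm itself is not detailed here, so the following is only a sketch of one plausible approach: time-ordered points of a single vehicle are split into trips wherever the gap between consecutive points exceeds a threshold, and the distance driven above a speed threshold is accumulated alongside the total distance. The 300 s gap, the record fields (`t`, `lat`, `lon`, `speed_mph`), and the function names are assumptions made for illustration.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def detect_trips(points, max_gap_s=300.0, speed_threshold_mph=25.0):
    """Split time-ordered points into trips wherever the time gap exceeds max_gap_s."""
    trips = []
    current = None
    prev = None
    for p in points:
        if prev is None or p["t"] - prev["t"] > max_gap_s:
            # A long silence (or the first point) starts a new trip.
            current = {"start": p["t"], "end": p["t"],
                       "distance_mi": 0.0, "distance_over_threshold_mi": 0.0}
            trips.append(current)
        else:
            step = haversine_miles(prev["lat"], prev["lon"], p["lat"], p["lon"])
            current["distance_mi"] += step
            if p["speed_mph"] > speed_threshold_mph:
                current["distance_over_threshold_mi"] += step
            current["end"] = p["t"]
        prev = p
    return trips


# Illustrative points for one vehicle; timestamps are in seconds.
points = [
    {"t": 0, "lat": 42.280, "lon": -83.740, "speed_mph": 20.0},
    {"t": 1, "lat": 42.281, "lon": -83.740, "speed_mph": 30.0},
    {"t": 1000, "lat": 42.300, "lon": -83.740, "speed_mph": 10.0},
    {"t": 1001, "lat": 42.301, "lon": -83.740, "speed_mph": 40.0},
]
trips = detect_trips(points)
print(len(trips), trips[0])
```

The gap between t = 1 and t = 1000 exceeds the threshold, so the four points yield two trips rather than one.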

Data Delivery Methods
Diverse data consumer use cases may require different data delivery types. For example, an emergency car service may require a real-time event when an accident takes place. On the other hand, a usage-based insurance service may pull a car's odometer reading once a week. Lastly, a data analytics company might opt for historical car data in order to understand traffic trends. Our data platform provides different data delivery methods to cater to these different use case requirements. Table 10 summarizes the data delivery methods, while Table 11 summarizes all available historical data APIs.
• Streaming: a 'push' mechanism that continuously streams hot data to a data consumer. Streaming uses HTTP POST requests and can send both aggregate and simple data elements. A stream is created by subscribing to it. A stream subscription defines one or more data filters, such as the desired vehicle area (e.g., a city) and the maximal point latency. Streaming is optimal for applications that require real-time, rich vehicle data.
• Historical data reports: reports in multiple formats that contain cold data. Historical data reports are triggered by a RESTful API call with parameters that define a region (e.g., a city) and a time span for the report. Report generation may take from minutes to hours to complete. Several historical reports exist for different data elements (e.g., speed, brake, ...).
• Events: an event is defined by a logical rule on one or more data elements. When a rule evaluates to true, an event message is launched and sent to the data consumer. The system "remembers" that an event has been sent according to a specific rule and will only send it once. Using the braking pressure data element, an example may be a maintenance application that receives an event whenever a vehicle traveling within a certain radius of a maintenance station crosses a 200-bar braking pressure level (higher pressures are unlikely; the maximum pressure that the caliper can withstand before breakage is in the range of 250-300 bar). Events are a great way for applications to save processing power and network bandwidth and to get only the data they need in real time.
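The event mechanism described above can be sketched as a rule object that evaluates a predicate on each incoming record and remembers what it has already reported. This is an illustration, not the platform's implementation: the `EventRule` class, the record fields, the 5 km radius, and the choice to remember fired events per vehicle are all assumptions.

```python
class EventRule:
    """A named predicate over a record that fires at most once per vehicle."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self._fired = set()  # vehicle ids for which the event was already sent

    def check(self, record):
        """Return an event message the first time the rule holds, else None."""
        vid = record["vehicle_id"]
        if vid in self._fired or not self.predicate(record):
            return None
        self._fired.add(vid)
        return {"event": self.name, "vehicle_id": vid, "t": record["t"]}


# Hypothetical maintenance rule: braking pressure above 200 bar near a station.
rule = EventRule(
    "high_brake_pressure",
    lambda r: r["brake_pressure_bar"] > 200 and r["dist_to_station_km"] <= 5.0,
)

stream = [
    {"vehicle_id": 7, "t": 0, "brake_pressure_bar": 150, "dist_to_station_km": 2.0},
    {"vehicle_id": 7, "t": 1, "brake_pressure_bar": 210, "dist_to_station_km": 2.0},
    {"vehicle_id": 7, "t": 2, "brake_pressure_bar": 220, "dist_to_station_km": 2.0},
]
events = [e for e in (rule.check(r) for r in stream) if e is not None]
print(events)
```

Although the rule holds at both t = 1 and t = 2, only one message is produced, which is what saves bandwidth and processing on the consumer side.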

Simulation and Discussion
To check how the API that retrieves vehicle data behaves, we simulate vehicle trips according to the following steps:
• Step 1: Create a trip: the first step in simulating trip data is to configure a route. The starting and ending locations may be anywhere; however, our available dataset is limited to Ann Arbor, Michigan, as illustrated in Figure 12. Our trip runs 12.67 miles from the start point at "S State St, Ann Arbor, MI 48108, USA" to the endpoint at "M-14, Ann Arbor, MI 48105, USA". Immediately after the starting and ending locations are entered, the map provides a visual representation of the driver's route, as shown in Figure 12.
• Step 2: Configure events: we may select different events to simulate during each test run. An event might occur several times within the test but will always occur at least once. At present, three events are available, as shown in Figure 12.
• Step 3: Run the simulation: as depicted in Figure 13, once the simulation begins, data automatically arrive every 3 s in the "Point Dataset" tab located on the left side. The vehicle progresses along its route, and the map indicates the location of the driver. Once an event takes place, a small circle appears on the map. Additionally, the corresponding Point dataset timestamp has a red dot next to it to indicate that an event took place at this time. To see a list of all the events that took place during the simulation, press the "Events" tab located on the left side.
Figures 12 and 13 show the importance of maintaining BSM data and making them available through multiple delivery methods. Hot data are delivered in real time over a continuous streaming mechanism, while cold data are delivered at regular intervals to increase their value and utility. Cold data may not be valuable for traffic safety applications such as collision avoidance, but they may be useful in other applications, such as those related to road planning. The following gains can be achieved, to name but a few: 1 It gives data consumers the possibility of remotely reusing BSM data. 2 It allows developers to begin developing their applications without actually having any connected vehicles. 3 Real-time traffic information and navigation services and apps can use the APIs to highlight areas of congestion and help drivers find the fastest routes.

Data Forecasting
To take our model a step further, we suggest applying a machine learning (ML) method to our database to determine whether preserved BSMs can support road traffic volume (TV) prediction. Our study applies the Artificial Neural Network (ANN) approach to predict the traffic volume using past BSM data. We follow the basic ML steps: get the data from the database, prepare it, choose a model, train it, evaluate it, export it, and make the predictions available for use. Our development environment relies on TensorFlow as the framework. For the model generation part, we used Python with Jupyter Notebooks, and for the prediction serving part, we adopted Java with Spring Boot.

Data Extraction
The SPMD was conducted in Ann Arbor, Michigan. The field test includes 75 miles of instrumented roadway. Our dataset sample covers five days of traffic activity over 6 h, during the period from 6:00 a.m. to 12:00 p.m. (including congested and smooth traffic regimes), across a distance of 11.2 miles, as shown in Figure 14. Vehicles that participated in this study include light/passenger vehicles, heavy/commercial trucks, and buses. Based on the DE_VehicleType data element, vehicles were classified into three main categories: Passenger car, Bus, and Trailer.
Traffic volume is determined by counting the number of vehicles that pass a point on a road segment during a specific time period, and it is expressed in vehicles per hour (v/h).
We manually calculated the TV using the Position data frame by counting the number of vehicles traversing the road segment illustrated in Figure 14. It is noteworthy that we did not examine the Heading data element because we took both directions into account. The data extraction displayed in Table 12 was performed in 5 min intervals.
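The counting procedure above can be sketched as follows. This is a minimal illustration, assuming that each observation is a (timestamp in seconds, vehicle identifier) pair already restricted to the road segment, that a vehicle is counted once per interval, and that each 5 min count is scaled to vehicles per hour; the scaling step and the function name are assumptions, since the volumes here were computed manually.

```python
from collections import defaultdict


def traffic_volume(observations, interval_s=300):
    """Count distinct vehicles per interval and scale the count to vehicles per hour."""
    seen = defaultdict(set)
    for t, vehicle_id in observations:
        # Integer index of the interval that contains this timestamp.
        seen[int(t // interval_s)].add(vehicle_id)
    per_hour = 3600 / interval_s
    return {idx * interval_s: len(ids) * per_hour
            for idx, ids in sorted(seen.items())}


# Illustrative observations: vehicle "a" is seen twice in the first interval.
observations = [(10, "a"), (20, "a"), (100, "b"), (310, "c")]
print(traffic_volume(observations))  # {0: 24.0, 300: 12.0}
```

Because BSMs arrive at 10 Hz, counting distinct vehicle identifiers rather than messages is what prevents a single vehicle from inflating the volume.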

Model Development
We performed the traffic volume prediction using the Multilayer Perceptron (MLP), a feedforward artificial neural network. This kind of ANN relies on backpropagation for the training stage. It consists of multiple layers of nodes connected as a directed graph, from an input layer to an output layer. The analyzed dataset contains the following features, shown in Table 13: 1-Date, 2-Time, 3-Number of Passenger car, 4-Number of Bus, 5-Number of Trailer, 6-Average speed of Passenger car, 7-Average speed of Bus, 8-Average speed of Trailer, and 9-Traffic density. All inputs are significant except for the AST (average speed of Trailer) input, as shown in Table 13 and observable in Figure 15.
Traffic density, generally measured in vehicles per mile (v/m), refers to the number of vehicles on a road segment. As a preprocessing step, we first randomized our dataset and then divided it into three sub-sets. The first sub-set represented 10% of the whole dataset and was used for training. The second sub-set, also 10%, was used for cross-validation. The remaining part, which represents 80% of the dataset, was used for testing purposes. Distinct ANN architectures were designed in order to determine an efficient network. Table 14 presents the ANN models that were prepared for training on the dataset. The testing and cross-validation data sets were used to assess the performance of every ANN architecture. In addition, other parameters were used to evaluate the predicted results, namely the Normalized Mean Square Error (NMSE), the coefficient of correlation (r), and the Mean Absolute Error (MAE). According to Table 14, the preferred neural network is the one with three hidden neurons. Figure 16 shows, in detail, the ANN structure used in our work.
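The three evaluation parameters named above can be computed as in the following sketch. The formulas are assumptions in one respect: NMSE has several definitions in the literature, and this sketch normalizes the mean square error by the variance of the real values; r is the Pearson correlation coefficient. The sample volumes are made up and are not results from Table 15.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmse(actual, predicted):
    """Mean square error normalized by the variance of the actual values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    var_a = sum((a - mean_a) ** 2 for a in actual) / n
    return mse / var_a


def pearson_r(actual, predicted):
    """Pearson coefficient of correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sa * sp)


# Illustrative traffic volumes in v/h (not measured data).
actual = [100, 120, 140, 160]
predicted = [102, 118, 141, 159]
print(mae(actual, predicted), nmse(actual, predicted), pearson_r(actual, predicted))
```

A value of r close to 1 together with small NMSE and MAE indicates that the predictions track the real volumes closely, which is the criterion used to compare the candidate architectures.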

Results and Discussion
The linear correlation between the predicted and real volume values illustrated in Figure 17, as well as the output summary of the selected ANN test stage presented in Table 15, confirm that our developed model gives accurate results. The scatterplot shown in Figure 17 reveals the strong relationship between the predicted traffic volume (Y) and the real traffic volume (X), confirmed by a correlation coefficient very close to 1. This means that the predicted traffic volume values fit the real traffic volume values closely. Bearing in mind our primary goal, on the one hand, and considering the good value of the coefficient of correlation and the very small errors, on the other hand, it can be argued that our developed ANN has successfully reused stored BSMs as past data during the training, cross-validation, and testing stages to accurately predict future traffic volume, albeit over a short time horizon. Thus, stored BSMs can serve as a rich dataset for machine learning in the transportation field.

Data Visualization
Data visualization involves visually displaying information to present a point or perspective on specific data. To make the development of a web-based visualization platform easier and faster, we chose suitable development frameworks. Adopting the Model-View-Controller (MVC) design philosophy, we set up our data monitoring service by integrating MyBatis at the Data Access Object (DAO) level and ECharts 4.8 as the front-end visualization controller with Spring Boot.
ECharts is an open-source, web-based, cross-platform framework that offers powerful functions, a friendly interface, and excellent performance for data visualization. This development tool helps front-end developers present huge amounts of data to users in an appropriate way, so that users can extract valuable information from charts. Using ECharts, we can visualize a range of information based on data elements and data frames, or according to time and geo-fencing.
BSMs are stored according to data elements and data frames, as shown in Figure 7. An overview of the visualization of different elements is shown in Figures 18a,b and 19a,b. This element visualization provides a ready means to tell stories from the atomic data and supports analysis at various levels of detail. The red color in Figure 18b gives an idea of speeding vehicles over time in multiple geographic areas, which allows executives to drill down into specific locations to see what is being done well or poorly.

Conclusions
Given our primary goal of not constraining BSM use contextually and geographically, in this paper we introduced a new philosophy that aims at conserving collected BSMs and adopting the in-vehicle computing paradigm in order to create a reliable and useful transport data source. We then proposed our new BSM reuse model, which is based on a three-stage process. In the first stage, the model captures generated and acquired BSMs. In the second stage, it performs a series of procedures on the raw BSMs to make them storable. In the third stage, the model opens up new possibilities for the continued reuse of stored BSMs.
In our study case, we built an embedded data platform that adopts the Model-View-Controller design commonly used for developing user interfaces. This platform has accomplished several purposes: data reduction, delivery, and visualization. We were able to perform lossless data compression and considerably reduce the data size, which reduced bandwidth and storage requirements by about 80%. We also achieved different forms of data delivery according to the pull and push mechanisms to cater to the different data consumer use case requirements. Adopting the ANN paradigm, we obtained an accuracy of 0.9988 in traffic volume prediction. We also visualized some data elements to enhance analytics and support decision-making for transportation.
Our work has certain limitations that should be recognized. First, the proposed model was partially developed using pre-collected BSMs; forthcoming work should focus on the data capture stage. Second, our in-vehicle platform was restricted to an isolated vehicle far from real-world traffic, so the number of vehicles, as well as real-world traffic conditions, should be considered in future work. Third, our in-vehicle platform was tested in a private wide area network by a limited number of users; further work is required to correct this deficiency by making our platform publicly available.
Author Contributions: K.B. was involved in all parts of the study, including conceptualization, methodology, software, investigation, visualization, original draft preparation, reviewing, and editing. S.B. was mainly involved in conceptualization, interpretation, and discussion of results, reviewing, and editing. A.M. was mainly involved in supervision and validation. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the University of MEDEA with the LESIA laboratory of the University of Biskra.