GIS Mapping of Driving Behavior Based on Naturalistic Driving Data

Abstract: Naturalistic driving can generate huge datasets with great potential for research. However, analyzing the data collected in naturalistic driving trials is complex and difficult, especially considering that these studies are commonly conducted by research groups with somewhat limited resources. These studies therefore often implement strategies for thinning and/or reducing the data volumes initially collected. Unfortunately, this constrains the great potential of these datasets to specific situations, events, and contexts. For this reason, appropriate strategies for visualizing these data, at any scale, are becoming increasingly necessary. Mapping naturalistic driving data with Geographic Information Systems (GIS) allows for a deeper understanding of driving behavior and a broader perspective of whole datasets. GIS mapping allows many of the drawbacks of traditional methodologies for analyzing naturalistic driving data to be overcome. In this article, we analyze the main assets related to GIS mapping of such data. These assets are dominated by the powerful graphical interface and the great operational capacity of GIS software.


Introduction
Currently, road traffic is one of the most widely discussed social topics. The continuous growth in mobility is a direct consequence of the increasing number of vehicles, levels of urbanization, and new social trends. Thus, there is an extensive literature dealing with issues related to increasing traffic flows [1-3], such as pollution [4], congestion [5], the inefficiency of transportation in some regions [6], and even how transport systems articulate territories [7], among others. These problems carry severe socio-economic and environmental costs [8,9]. Although all these phenomena are global in nature, there are huge differences depending on the spatial scale considered. They are more evident in urban areas, especially in larger ones [10,11], where some signs of diseconomies of agglomeration are becoming apparent [12,13]. For this reason, research into intelligent and sustainable mobility systems to meet the challenges ahead is acquiring more relevance and urgency [14,15].
The present article is structured as follows. A review of major strategies and software tools for analyzing naturalistic driving data is presented in Section 2. Data and study area are introduced in Section 3. The methodology for the visual representation of the data, in addition to some figures, is shown in Section 4. Section 5 lists the most relevant benefits related to GIS mapping. Lastly, a brief discussion about the main aspects introduced herein is presented in Section 6.

Strategies and Tools
The largest naturalistic driving projects carried out to date are the 100-Car study and SHRP-2, both in the USA, in addition to UDRIVE and PROLOGUE in Europe [30-32]. Most studies that analyze these data have developed strategies for summarizing and/or thinning the datasets. Some of them split the datasets into single road sections. Other studies restricted their focus to specific events or incidents (such as crashes and near-crashes) or to areas where some anomalous behavior (such as acceleration and braking actions) was found. For example, Christoph et al. [33] restricted their study area to some specific intersections to analyze conflicts between drivers and riders. Wang et al. [34] studied the occurrence of some incidents on horizontal curves along rural two-lane highways, and Xiong et al. [35] analyzed how some drivers used their cell phones at crossings.
The majority of these studies refrain from analyzing whole datasets because they are unable to cope with their large size. Most previous studies used standard software for mathematical and statistical analysis, implementing scripts tuned for each specific task. Lassarre et al. [36] recommended the implementation of algorithms for computing some relevant indicators. For this, they used software packages such as Matlab, Excel, and SPSS. Gatscha et al. [37] developed a software tool for analyzing small datasets that could be integrated into SPSS. Jovanis et al. [38] proposed a system for predicting the decisions of drivers, which was implemented in MLwiN 2.0. Val and Küfen [39] implemented their own software tool in Matlab, while Dozza [40] did the same (SAFER100car) for analyzing the data of the 100-Car study. A more detailed review of these software tools can be found in Balsa-Barreiro [41].
The literature shows that software tools for analyzing naturalistic driving data within GIS are still rare. In fact, only Gordon et al. [42] considered the relationship between naturalistic driving and GIS, although they simply located and mapped some discrete events (traffic hotspots) over time and space.

Data and Study Area
The data presented here were obtained in PROLOGUE, one of the most ambitious attempts to carry out a naturalistic driving trial at European level [43]. INTRAS (University Research Institute on Traffic and Road Safety) conducted the pilot study of PROLOGUE in Spain. This Spanish pilot trial was carried out in the surroundings of the city of Valencia, the third most populated city in Spain, during the months of June and July 2010. In short, five drivers participated for four days each. They drove a fully monitored vehicle, called Argos, for approximately two hours per day.
Around 80 parameters related to the driving performance of the different subjects were continuously recorded using different instruments and devices. Thus, the kinematic data related to speed, acceleration, and braking force were recorded at all times. Furthermore, cameras recorded both the interior and the surroundings of the vehicle, including driving activities such as how often the driver interacted with any device. All these parameters were recorded at different temporal frequencies, depending on the technical specifications of each measuring device. The parameters related to kinematics were recorded at sampling intervals of 10⁻² s and 10⁻³ s. A more detailed description of the measuring devices used by Argos in the Spanish trial is given in Valero-Mora et al. [44].
The data plotted in the following figures relate exclusively to the kinematic parameters in specific sections of the V-21 motorway, which links the city of Valencia and Puzol (Figure 1). This is a route of 15.9 km, which was traveled in both directions. It is the same route as the one considered in previous articles by the same authors.

Methodology and Results
Mapping of kinematic parameters is based on a working methodology different from what is commonly used to map events that are spatially fixed. These kinematic data permit the modeling of how one subject really drives under real world conditions. These data display the variations caused by the surrounding conditions at any time and/or place.
GIS software provides three basic geometries or entities for mapping: points, (poly)lines, and polygons. All of them are used for mapping our data. However, this procedure is not straightforward and some preprocessing steps are required. For this purpose, a point array corresponding to the road section traveled is initially set up. GNSS and INS devices provide the coordinates of each point. Once a point array is set up over the road section, it can be easily defined in the software as a linear entity.
The spatial distribution of points with data along the road path (actual point array) is irregular (Figure 2a), as it depends on the position and speed of the vehicle at any time. However, positioning data are not always available, and consequently some strategies for estimating these coordinates must be implemented, such as the one presented in [27]. When that happens, we must draw a common road path, which coincides with the central axis of the road.
Representing these data and performing geoprocessing operations requires points to be regularly distributed along the road path. For this, we define a theoretical point array along it (Figure 2b). Because the two point arrays are distributed differently, values must be transferred from the points of the actual array to those of the theoretical one. We do so by interpolating values from the actual point array into the regular one with the Inverse Distance Weighting algorithm, a two-dimensional interpolation function [45]. Because the points are sequentially distributed along the same linear entity, the value of each point in the theoretical point array (A) may be computed from the values of the nearby points in the actual array (B), weighted by the inverse of their distance. Once data values are assigned to the theoretical point array, geoprocessing operations can be carried out.
We use two different attributes for mapping: geometries and colors. These attributes may be used in isolation or in combination. Magnitudes are represented by geometries, namely polygonal entities projected onto one or both sides of the road path. These polygons define shapes that emerge from the axis of the road. The distance between the axis and the external contour of the shape represents the magnitude of a given kinematic parameter. In this way, several parameters can be plotted simultaneously by overlapping data layers (Figure 3a). With regard to the color attributes, tone changes depict variations in magnitude for any parameter (Figure 3b).
The aforementioned methodology is illustrated in Figure 4. The road section shown is located in the first part of the route (study area A in Figure 1) and is about 600 m long. It comprises a curve to the east and an uphill segment with a steep slope preceding the curve. Figure 4 depicts the speed of one of the drivers who participated in the trial.
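The transfer of values from the actual point array to the regular one can be illustrated with a short sketch. It is not the implementation used in the study: it assumes that every point is described by its distance along the road axis (chainage, in meters), and it uses the standard inverse distance weighting form with an exponent of 2 and the two nearest actual points. The function name `idw_along_path`, the parameters `power` and `k`, and the sample values are hypothetical.

```python
def idw_along_path(actual, targets, power=2.0, k=2):
    """Interpolate values onto regularly spaced chainages with IDW.

    actual  : list of (chainage_m, value) pairs, irregularly spaced (array B)
    targets : list of chainages in meters for the regular array (array A)
    power, k: assumed IDW exponent and number of neighbors (not from the source)
    """
    result = []
    for s in targets:
        # the k actual points closest to the target, measured along the path
        nearest = sorted(actual, key=lambda p: abs(p[0] - s))[:k]
        exact = [v for c, v in nearest if c == s]
        if exact:
            # a target that coincides with a measurement takes its value
            result.append(exact[0])
            continue
        weights = [1.0 / abs(c - s) ** power for c, _ in nearest]
        total = sum(w * v for w, (_, v) in zip(weights, nearest))
        result.append(total / sum(weights))
    return result


# Hypothetical speed samples (km/h) at irregular chainages (m)
actual_points = [(0.0, 80.0), (7.0, 84.0), (19.0, 90.0), (24.0, 88.0)]
regular_chainages = [0.0, 10.0, 20.0]
speeds = idw_along_path(actual_points, regular_chainages)
```

Working with chainage reduces the two-dimensional problem to one dimension, which is reasonable only because all points lie on the same linear entity. On a winding road, straight-line distances between coordinates could differ noticeably from distances along the path.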
These maps show a continuous variability in speed as well as drastic fluctuations over very short distances. These fluctuations could be due to intermittent traffic jams that may have blocked the traffic at that time. However, their shape and slope, in addition to their frequency, indicate that they were probably due to random malfunctions in some instruments, as described in [41]. Notice that the range scale is adapted to the zoom level selected by the user. In addition to geometries, color attributes help to obtain a clearer representation of speed values, although they do not offer any additional information. The first subfigure, Figure 4a, plots how the speed values vary by using only geometric attributes. To do so, we use a single-colored polygonal shape projected onto the left side of the road path; the color attribute is used only for aesthetic purposes. The magnitude of the speed values is inferred in relation to the blue (60 km/h) and orange (120 km/h) contours, which are drawn over points with the same values (i.e., isolines). These speed values correspond to the minimum and maximum legal speeds on Spanish motorways, respectively.
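A shape projected onto one side of the road path, like the one in Figure 4a, can be sketched by offsetting each axis vertex along its local normal by a distance proportional to the magnitude. This is only an illustrative sketch under stated assumptions: coordinates are in a projected system (meters), consecutive vertices are distinct, and the function name `side_polygon` and the `scale` factor (map meters per unit of magnitude) are hypothetical. GIS packages provide their own tools for this step.

```python
import math


def side_polygon(axis, values, scale=1.0, side="left"):
    """Build a polygon whose width from the road axis encodes a magnitude.

    axis   : list of (x, y) vertices of the road axis, in projected meters
    values : one magnitude per vertex (e.g. speed in km/h)
    scale  : assumed map meters drawn per unit of magnitude
    """
    sign = 1.0 if side == "left" else -1.0
    last = len(axis) - 1
    outer = []
    for i, ((x, y), v) in enumerate(zip(axis, values)):
        # local direction of travel, taken from the neighboring vertices
        x0, y0 = axis[max(i - 1, 0)]
        x1, y1 = axis[min(i + 1, last)]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        # unit normal pointing to the left of the direction of travel
        nx, ny = -dy / length, dx / length
        outer.append((x + sign * nx * v * scale, y + sign * ny * v * scale))
    # closed ring: along the axis, back along the offset contour, then close
    return list(axis) + outer[::-1] + [axis[0]]


# Hypothetical straight axis heading east, with speeds in km/h
road_axis = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
ring = side_polygon(road_axis, [60.0, 90.0, 120.0], scale=0.1)
```

Calling the function twice, once with `side="left"` and once with `side="right"`, gives the two halves of a shape projected onto both sides of the path.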
The polygonal shapes plotted in Figure 4b were generated using the tool for multiple buffering. These lines run parallel to the route path. Each of the internal lines represents a difference in speed values of 10 km/h with regard to the previous one. In addition, a discrete palette with 12 single colors is attached to the geometries. Thus, the speed at any point of the road path may be inferred by observing both geometric and color attributes.
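The classification behind such a banded map can be sketched as a lookup from speed to band and color. The 10 km/h step and the 12 colors come from the description above; everything else is an assumption made for illustration: the bands are taken to start at 0 km/h, out-of-range speeds reuse the end bands, and the palette entries are placeholder names rather than the colors of the figure.

```python
def speed_band(speed_kmh, step=10.0, n_bands=12):
    """Return the 0-based index of the band that contains a speed value."""
    index = int(speed_kmh // step)
    # clamp so that speeds outside the assumed 0-120 km/h range reuse the end bands
    return max(0, min(index, n_bands - 1))


# Hypothetical palette: placeholder names standing in for the 12 map colors
palette = [f"color_{i:02d}" for i in range(12)]

sample_speeds = [58.0, 95.5, 120.0, 131.0]
bands = [speed_band(v) for v in sample_speeds]
band_colors = [palette[b] for b in bands]
```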
Geometries and color attributes are also combined in Figure 4c. Both attributes represent speed values, although in a different way than in the previous figures. Here, color attributes are vertically displayed in relation to the road path. The color palette is the same as the one used in the previous figure. However, in this case, some additional intermediate colors were generated through a gradient between the colors located at the extremes. This gradient effect is achieved with a raster-based representation. The next figure, Figure 4d, plots a single-colored polygonal shape. It was drawn using a buffer projected onto both sides of the road path. The aim of this map is to show an augmented view of the speed values based solely on geometries. Thus, the color attribute is not intended to provide any additional information. Finally, Figure 4e uses the opposite approach. In this figure, speed variations are represented solely by color attributes, while geometries do not provide any additional information. These colors are vertically displayed in relation to the road path. The color palette is the same raster-based one used in Figure 4c.

Benefits
GIS mapping of naturalistic driving data offers advantages and great potential at two levels of detail: (a) the macro-scale and (b) the micro-scale. The most relevant are listed below.

(a) At a macro-level of detail:

1. The facilitation of a more general perspective of driver performance, allowing standard and non-standard (or anomalous) driving behavior to be derived.
2. The rapid detection of anomalous behavior, such as high-speed driving, abnormal speed changes, slow and/or wrong reactions to hazardous situations, etc.
3. The influence of the road path on certain kinematic parameters, such as braking, acceleration, and deceleration actions.
4. The analysis of driving performance under certain conditions related to particular environments, traffic congestion, weather, etc.
5. The establishment of common driving patterns based on different factors, such as time of day, gender, socioeconomic status, etc.
6. The implementation of strategies for quality control of data, taking into account that misleading data may appear due to occasional malfunctions in some of the devices [28,29].

(b) At a micro-level of detail:

1. The achievement of a holistic view of the driving performance of any subject. Thus, it is possible to analyze in detail how certain maneuvers (related to entering/exiting motorways, overtaking, and interactions between drivers and/or pedestrians) really happen. This is shown in Figure 5, where more data layers are sequentially included from left (two in Figure 5a) to right (four in Figure 5c).
2. The evaluation of the level of compliance with road signs and the degree of efficiency of awareness campaigns.
3. The detection of road sections that are potentially dangerous not only in terms of crashes, but also in terms of other incidents, such as near-crashes or anomalous behavior.
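As an illustration of how anomalous values could be screened once speeds are attached to a regular point array, the following minimal sketch flags points outside the legal speed range mentioned earlier (60 and 120 km/h) and points with abrupt changes relative to the previous one. It is a sketch, not a method from the study: the function name `flag_points`, the `max_jump` threshold of 15 km/h between consecutive points, and the sample values are hypothetical and would need to be tuned to the point spacing and the data.

```python
def flag_points(speeds_kmh, v_min=60.0, v_max=120.0, max_jump=15.0):
    """Return, for each point, the list of reasons why it looks anomalous."""
    flags = []
    for i, v in enumerate(speeds_kmh):
        reasons = []
        if v > v_max:
            reasons.append("above_max")
        if v < v_min:
            reasons.append("below_min")
        # an abrupt change may reflect hard braking or an instrument fault
        if i > 0 and abs(v - speeds_kmh[i - 1]) > max_jump:
            reasons.append("abrupt_change")
        flags.append(reasons)
    return flags


# Hypothetical speeds (km/h) along consecutive points of the regular array
flags = flag_points([100.0, 102.0, 55.0, 101.0, 125.0])
```

A flagged point is only a candidate for inspection. Whether it reflects driver behavior, traffic conditions, or a device malfunction still has to be judged in its spatial context, which is where the map helps.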

Discussion and Conclusions
Despite the evident spatial character of most traffic events, limited research has been conducted in road safety analysis to account for spatial correlation [46]. This reveals that GIS tools and geographical analysis still have very limited relevance and application to road traffic. Studies that used GIS, though not necessarily relying on naturalistic driving data, focused on the representation of discrete events related to different traffic hotspots [47]. Aguero-Valverde and Jovanis [46] indicated the importance of including spatial correlation in road crash models at the segment level. They used a full Bayes hierarchical approach with conditional autoregressive effects for the spatial correlation terms. Jia, Khadka and Kim [48] presented a method using kernel density estimation with spatial clustering for explaining the macro distribution of traffic crashes using data related to points of interest. Zou et al. [49] implemented negative binomial models with K mixture components and varying weight parameters for crash hotspot identification. Fawcett et al. [50] proposed a Bayesian hierarchical model for predicting accident counts. All these studies apply different methodologies for identifying hotspots and for assessing how these are spatially distributed. Their results are confined to mapping the spatial distribution of discrete events (usually crashes) and/or highlighting some vulnerable road sections by using simple entities, such as points and (poly)lines. Their results represent macroscopic traffic flows, the data for which were collected using traditional methodologies.
In turn, as this article tries to exploit the potential of naturalistic driving data, its goal and methodology are different from those of these previous studies. To this end, some strategies for mapping naturalistic driving data were shown. The figures presented depict different kinematic (non-discrete) parameters, which allow the actual driving behavior to be inferred at a micro-level of analysis.
Naturalistic driving shares some limitations with Big Data. In fact, the capacity for collecting data is much greater than our ability to process and analyze them. This leads to an imbalance between the data we can collect and the knowledge that we can gain from them. For this reason, new software tools, research focuses, and work methodologies are increasingly in demand for expanding our capacity to exploit the data that are currently available.
Data from naturalistic driving trials present a great potential for assessing driving behavior in real traffic environments. Ideally, these studies require the integration of multidisciplinary work teams, which can apply different work methodologies and/or software tools. One of the biggest advantages of GIS systems is that they allow for these data to be visualized, analyzed, and geoprocessed in a smarter way than with other tools. This allows for a more comprehensive analysis of data by adding more information (data layers) and adjusting the spatial scale (zoom view). Thus, the conclusions are more trustworthy.
To date, most software tools used in naturalistic driving present important restrictions, such as unfriendly user interfaces, limited processing capacity, and scarcely intuitive displays, among others. GIS tools allow most of these drawbacks to be overcome. They can handle and analyze very large datasets, such as those resulting from naturalistic driving trials. Indeed, they incorporate toolboxes for managing large volumes of data by aggregating millions of records into simple entities, such as the multipoint tool for LiDAR point clouds [51]. This avoids the need for final datasets to be thinned or limited to some specific road sections and/or events. In addition, the geoprocessing tools implemented in GIS allow datasets to be managed and adjusted to the specific requirements at any time.
The very nature of the naturalistic driving method requires a greater data collection effort in both quantitative (more indicators) and qualitative terms (at higher temporal frequencies). However, the volume collected is often excessive for most research purposes, which complicates the subsequent tasks of handling, managing, processing, and analyzing the data. Therefore, we recommend introducing a preliminary phase in any study to give careful consideration to the right balance between the research aims and how the vehicle should be instrumented [22].
GIS mapping allows for a bigger picture of the driving performance and the whole scenario to be obtained. Thus, it is possible to evaluate not only how (and how much) a specific parameter varies in a road section, but also its location. Spatial context is one of the factors that can explain, at least partially, how drivers behave in some road sections. This is, again, a clear advantage of naturalistic driving data when considering an additional dimension (the spatial factor) that is usually determinant for the purposes of understanding the data.
The resulting maps can clearly represent the data, which simplifies and facilitates the subsequent phases related to the analysis and interpretation of these data. In addition, GIS systems provide a simple and friendly interface, which can be easily managed by users with a basic level of expertise in mapping. This approach enables experts from different fields of knowledge to be brought together in multidisciplinary work teams, which encourages the development of creative solutions for dealing with the multi-faceted problem that is road safety [19].
This study is a further step towards understanding our driving behavior better. In the case of naturalistic driving, GIS mapping presents great advantages in applications for quality control of data, for extracting driving patterns, and for detecting events, among others. Future studies should focus both on these research topics, and also on strategies about how to achieve a more efficient exploitation of naturalistic driving data.