Article

POLIDriving: A Public-Access Driving Dataset for Road Traffic Safety Analysis

by Pablo Marcillo *, Cristian Arciniegas-Ayala, Ángel Leonardo Valdivieso Caraguay, Sandra Sanchez-Gordon and Myriam Hernández-Álvarez
Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Ladrón de Guevara E11-25 y Andalucía, Edificio de Sistemas, Quito 170525, Ecuador
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 6300; https://doi.org/10.3390/app14146300
Submission received: 8 April 2024 / Revised: 27 April 2024 / Accepted: 29 April 2024 / Published: 19 July 2024

Abstract

Current driving datasets are largely exclusive to autonomous driving applications and offer limited diversity in sources of information and number of attributes. This paper therefore presents a novel driving dataset that contains information from several heterogeneous sources and targets road traffic safety applications. We used an acquisition module based on software and hardware to collect information from a vehicle scanner and a health monitor. This module also consumes information from a weather web service and from databases on traffic accidents and road geometric characteristics. For the acquisition sessions, drivers of different ages and genders drove vehicles on two routes at different times of day and in different weather conditions. POLIDriving contains around 18 h of driving data, more than 61k observations, and 32 attributes. Unlike related datasets, which include information on vehicle and road conditions, POLIDriving also includes information on the driver, weather conditions, traffic accidents, and road geometric characteristics. The dataset was tested in learning models that predict the risk level of suffering a traffic accident. We built two learning models: a Gradient Boosting Machine (GBM) and a Multilayer Perceptron (MLP). GBM reached an accuracy of 95.6%, and MLP reached an accuracy of 98.6%. POLIDriving will contribute to research on traffic accident prevention by providing a novel, large, and diverse driving dataset.

1. Introduction

Determining the real cause of a traffic accident is complicated because there is often not enough information at the time. Generally, the cause is attributed to the driver’s carelessness or negligence; however, the real cause can be another factor or a combination of factors. The Global Status Report on Road Safety [1] from the World Health Organization (WHO) focuses on only five key risk factors, leaving other factors aside. For instance, the real cause of an accident can be a mechanical failure of the vehicle, adverse weather conditions, poor road design, or, much worse, a combination of these. The more information is available, the easier it is to determine the real cause of an accident.
In traffic accident prevention, specifically in the research area of traffic accident prediction, it is essential to have a driving dataset that correlates and integrates information from heterogeneous sources. According to Marcillo et al. [2], the data sources commonly used in this area are driver’s data, vehicle data, weather conditions, traffic accidents, traffic flow, traffic events, light conditions, and road infrastructure. Thus, the key to a new driving dataset aimed at this area is the inclusion of information from most of these sources.
There are well-known driving datasets such as A2D2 [3], KITTI [4], BDD100K [5], or ApolloScape [6]; however, most of them target autonomous driving applications. There are also datasets such as comma.ai [7] or comma2k19 [8] that target autonomous driving and road safety applications, but these datasets include information from one or at most two sources. Vehicular traffic flow and accident prevention applications require driving datasets that include as much information as possible from various sources. Since our work focuses on traffic accident prevention, specifically on predicting the risk level of suffering a traffic accident, generating a driving dataset that fulfills the requirements of this type of application is essential.
Our main contribution is to provide the research community with a driving dataset that correlates and integrates information from heterogeneous sources. Through POLIDriving, we provide 15 h of driving data from five real drivers and three extra hours of synthetic driving data from a fictitious driver with risky driving behavior. We also describe the process of data labeling through semi-supervised and ensemble learning. Finally, we provide two supervised learning models to predict the risk level of suffering a traffic accident.
The rest of this article is organized as follows: Section 2 presents related driving datasets, Section 3 describes the process of generating the POLIDriving dataset, Section 4 presents the results of this work, Section 5 discusses the most relevant implications of the dataset, and Section 6 presents the conclusions of this work.

2. Related Work

Upon review, we found driving datasets for autonomous driving and for driving behavior. Generally, they provide high-resolution images, distance and proximity measurements, vehicle parameters, and geolocation. Since the target of POLIDriving is the identification of risky driving patterns, we considered the datasets that mainly provide vehicle data and geolocation. According to these requirements, the following are the most relevant driving datasets.
COMMA.AI [7] is a 7-h driving dataset that contains images of the road and sensor measurements. This dataset includes the driving records of three drivers using one vehicle. It was generated using a frontal camera, a Light Detection and Ranging (LiDAR) sensor, and a Positioning and Orientation System (POS). Pictures were taken at 20 Hz, and the measurements were integrated at 100 Hz. COMMA.AI includes information on the vehicle through the sensor measurements and on the road conditions through the pictures of the road.
COMMA2K19 [8] is a 33-h driving dataset that contains road and in-vehicle images and sensor measurements. This dataset includes the driving record of one driver using two vehicles. It was generated using a set of cameras (two frontal and one internal), an On-Board Diagnostics (OBD)-II scanner, a Global Navigation Satellite System (GNSS) receiver, and a 9-axis Inertial Measurement Unit (IMU). Pictures were taken at 20 Hz and the measurements at 100 Hz. COMMA2K19 includes information on vehicle and road conditions.
PREVENTION [9] is a 6-h driving dataset that contains front and rear images of the road and measurements of sensors. This dataset includes the driving records of three drivers using one vehicle. It was generated using two high-resolution cameras (one frontal and one rear-facing), a LiDAR sensor, three long-range radars (one narrow-field and two broad-field), a GNSS receiver, and a Controller Area Network (CAN) bus scanner. PREVENTION includes information on vehicle and road conditions.
AUTOMOTIVE OBD-II [10] is a 6-h driving dataset containing sensor measurements. This dataset includes the driving records of ten vehicles. It was generated using only an OBD-II scanner and a mobile app. Sensor measurements were taken at 10 Hz. AUTOMOTIVE OBD-II includes only information on the vehicle, such as engine coolant temperature, rpm, speed, throttle position (the position of the throttle valve, which controls the air entering the engine), intake air temperature, and others.
A2D2 [3] is a driving dataset that contains images from six cameras and sensor measurements. It contains 392,556 observations (frames), and each frame contains images and data files. A2D2 contains the driving record of one driver using one vehicle. It was generated using six cameras, six LiDAR sensors, one bus scanner, and one Global Positioning System (GPS) receiver. A2D2 includes information on vehicle and road conditions.
In contrast, we present POLIDriving, an 18-h driving dataset that includes information related to the driver, vehicle, weather conditions, traffic accidents, and road geometric characteristics. It includes information from drivers of different ages and genders, with or without medical conditions or traffic violations, and information from vehicles of different brands, models, types, and years of fabrication. POLIDriving differs from other datasets in the number of drivers, vehicles, and data sources used for the data generation. Table 1 and Table 2 present the features of the reviewed driving datasets.

3. Materials and Methods

This section presents the design of the acquisition module and the driver profiles, vehicles, devices, and services used in the acquisition sessions. Additionally, it presents the data sources and attributes included in the dataset, the selection of routes, and the geolocation of control points.

3.1. Acquisition Module

An acquisition module based on software and hardware was built to generate the driving dataset. The module consists of a mobile app that works with an OBD-II vehicle scanner, a GPS receiver, and a health monitor. Vehicle parameters and the vehicle location are taken from the vehicle scanner and the geolocation device through the mobile app. The module also includes software that consumes information from a weather service and retrieves information from a traffic accident database, a road geometric characteristics database, and the health monitor. The weather conditions are taken from the weather service, the number of deaths from the traffic accident database, the design speeds from the road geometric characteristics database, and some of the driver’s health parameters from the health monitor. Figure 1 presents the design of the acquisition module.

3.2. Drivers and Vehicles

The following driver profiles and vehicles were used in this experiment.
  • One woman and four men between 25 and 43 years old of different body constitutions, with or without medical conditions, traffic violations, and driving experience.
  • Crossover utility vehicle (CUV), pickup, and sedan-type vehicles of different brands, models, and years of fabrication.
Table 3 and Table 4 present information on the drivers and vehicles.

3.3. Devices and Services

We equipped the vehicles used in the acquisition sessions with Veepeak OBDCheck BLE [11] scanners and Android cellphones, and their drivers wore Garmin Vivosmart 5 [12] health monitors. OBDCheck BLE is a Bluetooth vehicle scanner that supports OBD-II protocols such as CAN [13], KWP2000 [14], ISO9141-2 [15], J1850 VPW [16], and J1850 PWM [16]. The scanner receives measurements from different Electronic Control Units (ECUs) every 100 ms, which are then merged to obtain observations at 1 Hz. An ECU is an embedded system that controls electrical systems in a vehicle; together, the ECUs form the vehicle computer. Vivosmart 5 is a sensor-based health monitor that provides parameters such as heart rate, body temperature, body battery, and stress level.
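The merging of 100 ms measurements into 1 Hz observations can be illustrated with a minimal sketch. The text does not state how the samples are merged, so averaging per second is an assumption here, and the function name `merge_to_1hz` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def merge_to_1hz(samples):
    """Average sub-second samples into one observation per second.

    samples: iterable of (timestamp_ms, {attribute: value}) pairs.
    Returns a list of (second, {attribute: mean value}) sorted by second.
    Averaging is an assumption; the paper only says samples are "merged".
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp_ms, readings in samples:
        second = timestamp_ms // 1000  # whole second the sample belongs to
        for name, value in readings.items():
            buckets[second][name].append(value)
    return [
        (second, {name: mean(values) for name, values in attrs.items()})
        for second, attrs in sorted(buckets.items())
    ]


# Hypothetical readings: eleven samples spaced 100 ms apart.
samples = [(i * 100, {"speed": 50 + i, "rpm": 2000}) for i in range(11)]
observations = merge_to_1hz(samples)
```

The first ten samples fall into second 0 and are averaged into a single observation; the eleventh starts second 1.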
The acquisition module uses the AccuWeather service [17] to obtain weather information, a remote database from the National Transit Agency (ANT) [18] to obtain traffic accidents, and our own database (Section 3.6) to obtain road geometric characteristics. The AccuWeather service provides the weather conditions for a specific location every hour. The Locations API returns a location key, which the CurrentConditions API uses to return the current conditions as a JSON object. Based on historical information about traffic accidents, the acquisition module determines the number of accidents around a specific location. Similarly, it determines the safe speed (design speed) at a specific location from the closest control point.
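The two location-based lookups described above (design speed from the closest control point, and number of accidents around a location) can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: the haversine distance, the search radius, the design speeds, and the accident coordinates are all hypothetical; only the coordinates of control points P001 and P008 come from Table A2.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_control_point(lat, lon, control_points):
    """Return the control point closest to the vehicle position."""
    return min(
        control_points,
        key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]),
    )


def accidents_within(lat, lon, accidents, radius_km):
    """Count historical accidents within radius_km of the position."""
    return sum(
        1 for a_lat, a_lon in accidents
        if haversine_km(lat, lon, a_lat, a_lon) <= radius_km
    )


# Coordinates from Table A2; the design speeds are hypothetical values.
control_points = [
    {"id": "P001", "lat": -0.29755, "lon": -78.46091, "design_speed": 90},
    {"id": "P008", "lat": -0.29065, "lon": -78.46514, "design_speed": 70},
]
# Hypothetical accident locations and vehicle position.
accidents = [(-0.29710, -78.46100), (-0.25000, -78.48000)]
vehicle_lat, vehicle_lon = -0.29700, -78.46120

point = nearest_control_point(vehicle_lat, vehicle_lon, control_points)
nearby = accidents_within(vehicle_lat, vehicle_lon, accidents, radius_km=0.5)
```

Here the vehicle is a few tens of metres from P001, so it inherits that point's design speed, and only the first accident falls inside the 0.5 km radius.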

3.4. Data Sources and Attributes

Based on previous studies [2,19,20], we chose the most relevant data sources and attributes for POLIDriving. Our dataset thus contains information from different data sources: driver, vehicle, weather conditions, traffic accidents, and road geometric characteristics. Regarding the attributes of each data source, the vehicle data includes attributes such as the steering angle, speed, rpm, acceleration, throttle position, engine temperature, and the vehicle id; the driver’s data includes the driver’s id and heart rate; the weather conditions data includes the current weather, visibility, and precipitation; the traffic accidents data includes the number of accidents on site; and the road geometric characteristics data includes the design speed. Table 5 presents a detailed list of the attributes of the POLIDriving dataset.

3.5. Routes and Timetables

We considered roads with high traffic accident rates and roads with heavy traffic in the urban area of Quito, Ecuador. The experiment treated Simón Bolívar avenue as a road with a high accident rate, and the General Rumiñahui highway and the Velasco Ibarra, 6 de Diciembre, Galo Plaza Lasso, and Amazonas avenues as high-traffic roads. The experiment used the two routes described below. Figure 2 presents the routes on the maps.
  • Route 1 is 59 km long. It begins on the General Rumiñahui highway, crosses Velasco Ibarra, Ladrón de Guevara, Patria, 6 de Diciembre, and Galo Plaza Lasso avenues, returns by Galo Plaza Lasso, Amazonas, Patria, and Velasco Ibarra avenues, and finishes on the General Rumiñahui highway.
  • Route 2 is 96 km long. It begins on the General Rumiñahui highway, continues along Simón Bolívar avenue in the south-north direction, returns along Simón Bolívar avenue in the north-south direction, and finishes on the General Rumiñahui highway.

3.6. Control Points

We georeferenced several control points along the two routes. Both routes were divided into segments, each with a start point, an end point, and several control points [21]. These points helped us calculate the number of traffic accidents around them and determine the design speeds along the route. Route 1 has 90 segments and 557 control points, and Route 2 has 191 segments and 615 control points. Table 6 presents a sample of control points for Routes 1 and 2. Table A2 and Table A3 in Appendix A present extended samples for Routes 1 and 2, respectively. Figure 3 presents a sample of the control points on the map.

4. Results

The POLIDriving dataset contains data from seven data acquisition sessions, in which five drivers and four vehicles took part. Additionally, we generated two extra sessions with synthetic data based on real data. POLIDriving contains around 18 h of driving data and 32 attributes from five heterogeneous sources. All the raw and processed data files are stored in a public GitHub repository [22]. Table 7 presents a sample of the processed data file for the Alonso user for Route 1.
To test POLIDriving, we selected the traffic accident prediction research area, specifically the prediction of risk levels by identifying risky driving patterns. To that end, we built two learning models, one based on neural networks and the other on decision trees. First, we performed data integration to join data from the AccuWeather web service and the traffic accident and road geometric characteristics databases. Then, we performed data cleaning and feature selection on POLIDriving, which reduced it from 32 attributes to 14. Low-variance filters and correlation and mutual information matrices were used to select the most relevant attributes. Figure 4 presents the correlation matrix before and after the feature selection.
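As an illustration of two of the filters mentioned above, the following minimal sketch drops near-constant attributes (low-variance filter) and attributes that are almost perfectly correlated with one already kept. The thresholds, the function names, and the toy columns are assumptions for illustration; the mutual-information step is not covered, and the sketch does not reproduce the authors' actual selection.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(columns, min_variance, max_abs_corr):
    """Keep attributes that vary and are not redundant with kept ones.

    columns: dict mapping attribute name to a list of numeric values.
    Attributes are examined in insertion order, so the first attribute
    of a highly correlated pair is the one that survives.
    """
    kept = {}
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # near-constant attribute carries no information
        if any(abs(pearson(values, other)) >= max_abs_corr
               for other in kept.values()):
            continue  # redundant with an attribute already kept
        kept[name] = values
    return list(kept)


# Hypothetical toy columns, not taken from the dataset.
speed_kmh = [40, 55, 60, 72, 80]
columns = {
    "speed": speed_kmh,
    "speed_mph": [v * 0.621 for v in speed_kmh],  # rescaled copy of speed
    "fuel_type": [1, 1, 1, 1, 1],                 # constant attribute
    "heart_rate": [78, 95, 70, 101, 88],
}
selected = select_features(columns, min_variance=0.0, max_abs_corr=0.9)
```

In this toy example the constant column and the rescaled copy of speed are discarded, leaving speed and heart rate.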
Once preprocessing was performed, we manually labeled a very small portion of the observations. Then, we used semi-supervised techniques and a voting ensemble to label the rest. For manual labeling, we identified threshold values for the attributes and established ranges and their penalties. Table 8 presents a sample of the threshold values and ranges used by experts for manual data labeling, and Table A1 in Appendix A presents the whole list of threshold values and ranges. We determined the risk level (low, medium, high, or very high) from the number of penalties. Since few observations were labeled with high risk levels, extra observations with synthetic data presenting risky driving patterns were added. This permitted labeling 8.5% of the total observations (23,152). Of the labeled observations (1,980), we used 75% (1,485) for training and 25% (495) for testing. Unlabeled observations (21,172) were labeled using a voting ensemble over the labels generated by semi-supervised learning methods such as label propagation, label spreading, and self-training based on Multilayer Perceptron, Random Forest, and Gradient Boosting Machine. Table 9 presents the results of the data labeling.
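The voting step can be illustrated with a minimal majority-vote sketch. The text does not specify the voting rule or how ties are broken, so hard (majority) voting with ties resolved in favor of the first method listed is an assumption; the function names and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; the earliest voter wins ties."""
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:  # iterate in voter order to break ties
        if counts[label] == top:
            return label


def ensemble_labels(*label_lists):
    """Combine per-method label lists observation by observation."""
    return [majority_vote(votes) for votes in zip(*label_lists)]


# Hypothetical labels proposed by three semi-supervised methods
# for the same four unlabeled observations.
label_propagation = ["low", "medium", "high", "very high"]
label_spreading = ["low", "high", "high", "medium"]
self_training = ["medium", "high", "high", "very high"]

final = ensemble_labels(label_propagation, label_spreading, self_training)
```

Each observation receives the label that at least two of the three methods agree on.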
Despite the strategies applied to add observations, the dataset was still unbalanced, so we applied the oversampling technique known as SMOTE [23], which balanced all the minority classes with the majority class. Thus, the dataset reached 12,839 observations for every risk level. Considering the most common algorithms used in prediction models for traffic accident prevention reported in [2], we built two models; the first used a Gradient Boosting Machine (GBM) with 100 estimators, a learning rate of 0.1, a maximum depth of 3, and relu as the activation function, and the second one used a Multilayer Perceptron (MLP) with three hidden layers, 100 neurons for each layer, and the hyperbolic tangent as the activation function. Table 10 presents the configurations for the learning models. Both models were trained and evaluated using cross-validation with ten folds. Table 11 presents the evaluation of the learning models.
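The core idea of SMOTE, creating synthetic minority observations by interpolating between a minority observation and one of its nearest minority neighbors, can be sketched in a few lines. This is a simplified illustration only: the function name `smote_samples`, the parameters, and the data points are hypothetical, and the text does not say which SMOTE implementation was used.

```python
import random
from math import dist


def smote_samples(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority points.

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class observations as (speed, rpm) pairs.
minority = [(100.0, 3200.0), (105.0, 3400.0), (98.0, 3100.0), (110.0, 3600.0)]
synthetic = smote_samples(minority, n_new=6, k=2)
```

Because every synthetic point is an interpolation of two real minority points, it stays inside the region spanned by the minority class rather than duplicating existing observations.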
Finally, Table 12 presents a random sample of observations and their predicted classes. The observation marked with (*15) can be interpreted as follows. It received the risk level 'very high' because the driver drove at 81 km/h, exceeding the design speed of 80 km/h. The vehicle worked in normal conditions, with an engine temperature of 94 °C and an engine speed of 2950 rpm, both at the upper end of the normal range. However, the driver presented a slight tachycardia (101 bpm), probably due to anxiety or stress, while driving in adverse rain conditions with low visibility (3.2 km) and precipitation of 8 mm. Finally, the driver was on a road section with a moderate number of traffic accidents (12) and a moderate number of accidents at that specific hour (3).
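The penalty scheme of Table A1 can be applied to the observation described above with a short sketch. The ranges and penalties are copied from Table A1; the attribute keys, the function names, and the reading of the "design speed" ranges as km/h driven above the design speed are assumptions. The cut-offs that map the total penalty to a risk level are not given in the text, so the sketch stops at the total.

```python
# (low, high, penalty) triples copied from Table A1.
RANGES = {
    "rpm": [(0, 1500, 1), (1501, 3000, 0), (3001, 5000, 2), (5001, 8000, 3)],
    "engine_temperature": [(0, 82, 1), (83, 94, 0), (95, 104, 1), (105, 200, 2)],
    "heart_rate": [(0, 59, 2), (60, 80, 1), (81, 100, 2),
                   (101, 120, 3), (121, 180, 4)],
    "visibility": [(0.0, 0.0, 4), (0.1, 2.4, 3), (2.5, 10.0, 2),
                   (10.1, 50.0, 1), (50.1, 100.0, 0)],
    "precipitation": [(0.0, 0.0, 0), (0.1, 2.4, 1), (2.5, 10.0, 2),
                      (10.1, 50.0, 3), (50.1, 100.0, 4)],
    "accidents_on_site": [(0, 0, 0), (1, 8, 1), (9, 30, 2),
                          (31, 132, 3), (133, 300, 4)],
    # Assumption: km/h driven above the design speed.
    "speed_excess": [(0, 0, 0), (1, 10, 1), (11, 20, 2),
                     (21, 40, 3), (41, 100, 4)],
    "accidents_time": [(0, 0, 0), (1, 2, 1), (3, 9, 2), (10, 100, 3)],
}
WEATHER_PENALTY = {
    "sunny": 1, "mostly sunny": 1, "partly sunny": 1, "hazy sunshine": 1,
    "mostly cloudy": 2, "cloudy": 2, "clouds and sun": 2,
    "partly cloudy": 3, "fog": 3, "rain": 4,
}


def range_penalty(attribute, value):
    """Look up the penalty of the Table A1 range that contains value."""
    for low, high, penalty in RANGES[attribute]:
        if low <= value <= high:
            return penalty
    raise ValueError(f"{attribute}={value} is outside the ranges of Table A1")


def score(observation):
    """Return the penalty of every attribute of one observation."""
    penalties = {name: range_penalty(name, observation[name]) for name in RANGES}
    penalties["weather"] = WEATHER_PENALTY[observation["weather"]]
    return penalties


# Values of observation (*15) as described in the text.
observation_15 = {
    "rpm": 2950, "engine_temperature": 94, "heart_rate": 101,
    "weather": "rain", "visibility": 3.2, "precipitation": 8.0,
    "accidents_on_site": 12, "speed_excess": 81 - 80, "accidents_time": 3,
}
penalties = score(observation_15)
total_penalty = sum(penalties.values())
```

Under these assumptions, the vehicle attributes contribute no penalty, while the heart rate, weather, visibility, precipitation, accident counts, and the slight speed excess account for the whole total.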

5. Discussion

Although several driving datasets are available, most target the autonomous driving area, and the rest are limited by their small number of data sources and attributes; moreover, the best ones are not freely accessible to the public. This motivated the creation of a new dataset. As mentioned, this work aimed to create a driving dataset that targets the driving behavior area and correlates and integrates information from heterogeneous sources. Thus, we created a public driving dataset, which we named POLIDriving.
In related datasets, the vehicles used in the acquisition sessions were equipped with high-resolution cameras, LiDAR sensors, long-range radars, OBD-II scanners, GNSS receivers, and IMU devices. In our dataset, the vehicles were equipped with an acquisition module that consists of a GNSS receiver, an OBD-II scanner, and a smartphone. This option was adopted because POLIDriving is intended to focus on driving behavior, not autonomous driving applications. Although adding advanced equipment to vehicles to improve the dataset sounds tempting, a dataset that includes information from fully equipped vehicles is not viable because of resource limitations and, mainly, because such a dataset is beyond the aim of this study.
In numbers, POLIDriving includes information from five heterogeneous data sources, in contrast to the two data sources of the related datasets. POLIDriving has roughly the same number of attributes as the related datasets (around 40); however, its attributes do not come from only one or two sources. Instead, POLIDriving has 13 attributes related to vehicle data, three to driver’s data, 13 to weather data, two to traffic accidents, and one to road geometric characteristics.
As mentioned above, POLIDriving focuses on driving behavior, so it could be used, for instance, in applications that identify risky driving behaviors. Having information from different drivers is desirable in those applications. Accordingly, we recruited several drivers to participate in the acquisition sessions. Compared with the related datasets, POLIDriving used more drivers, five to be exact, of different ages, genders, and driving experience. Finally, we decided that every driver should use the vehicle they drive daily to avoid unusual driving behaviors.
Something to consider is how driving behavior influences the risk of suffering a traffic accident and how prone a driver with aggressive driving behavior is to accidents. According to the United Nations Economic Commission for Europe (UNECE), typical aggressive driving behavior includes speeding, not respecting traffic signals, or changing lanes inappropriately. We therefore considered it unnecessary to look for further evidence that aggressive driving behavior is closely related to a high probability of accidents. Instead, we added to POLIDriving synthetic data for a fictitious driver (furious) based on a real drive. This driver speeds, drives at very high rpm, and experiences anxiety and stress, reflected by a very high heart rate.
Once POLIDriving was released, it was tested in a model to predict the risk level of suffering a traffic accident. We combined semi-supervised and ensemble learning techniques for data labeling. This allowed us to label 91.5% of the observations using only the 8.5% of manually labeled observations, with an accuracy of 82.0%. This result is notable considering the accuracies of 71.0% and 75.0% obtained by the label propagation and label spreading methods. The other notable result was the accuracies of 95.6% and 98.6% obtained by the supervised learning models, which were achieved by using cross-validation to tune the hyperparameters of the learning models. These results, together with the result of an audit method applied to a representative number of observations, confirmed the good quality of the POLIDriving dataset.
Since POLIDriving uses different data sources, potential biases related to them must be analyzed and resolved. Biases concerning the weather service include the insufficient spatial resolution of the weather model and the updating frequency of only one observation per hour. In other words, the current conditions of a location can be better fit using the conditions of other nearby locations. Similarly, conditions can change rapidly in highly variable climates, so the updating frequency (1 sample/hour) would need to be higher to avoid erroneous current conditions. A possible solution could be installing a portable weather station in the vehicle.
Future work should integrate information from other data sources, such as traffic flow or traffic events, into POLIDriving. For instance, traffic flow could contribute attributes such as the number of vehicles and occupancy, and traffic events could contribute attributes such as closures, broken vehicles, congestion, and blocked lanes [2]. Furthermore, POLIDriving could be improved in future acquisition sessions by including more women and senior adults as drivers, passenger transport (taxis), and emergency vehicles, as well as by designing new routes that include roads in rural areas and highways. Finally, based on previous studies [24,25], POLIDriving could be improved by installing a front-facing camera to identify gestures or grimaces associated with aggressive behavior, or by using the already available attributes to recognize risky driving styles such as aggressive, distracted, or drunk driving.

6. Conclusions

We obtained a public driving dataset that targets driving behavior, specifically road traffic safety, and stands out for its heterogeneity. Our inexpensive, easy-to-install acquisition module allowed us to use different types of vehicles in the acquisition sessions. Thus, we could engage more drivers with their own vehicles to avoid unusual driving behaviors. The lack of heterogeneous driving datasets motivated us to create a dataset with as many different sources as possible. Thus, we also integrated data from external databases and web services related to traffic accidents, road geometric characteristics, and weather conditions.
Once built, the POLIDriving dataset allowed us to design and test learning models for road traffic safety. We tested the dataset with our models to predict the risk level of suffering an accident. The performance of a learning model depends largely on the quality of the dataset used to train and test it. The results therefore support the good quality of POLIDriving, and we expect that other authors will use our dataset in their applications. We believe the POLIDriving dataset will contribute to research on road traffic safety and will be a useful asset to the community.
However, our dataset is not without its limitations, notably in the representation of gender and age demographics among participants and the variety of driving conditions tested. Future enhancements will address these gaps by incorporating a more balanced participant pool and designing studies that simultaneously analyze driving behaviors across diverse routes.
We strongly advocate for the continued expansion and refinement of POLIDriving. The dataset can offer even deeper insights into driving behaviors and traffic safety by including broader demographic and situational diversity. We invite the research community to explore POLIDriving for their projects, believing that collaborative efforts will propel forward our shared goal of improving road safety.

Author Contributions

Conceptualization, P.M.; methodology, P.M. and C.A.-A.; investigation, P.M.; writing—original draft preparation, P.M.; writing—review and editing, P.M., Á.L.V.C., S.S.-G. and M.H.-Á.; supervision, Á.L.V.C., S.S.-G. and M.H.-Á. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Escuela Politécnica Nacional grant number PIS 22-20 (Development of learning models to predict risk levels of suffering traffic accidents using AI and ML and its application in a system of alerts for mobile devices).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset presented in this work is publicly available at https://github.com/laboratorioAI/polidriving (accessed on 27 April 2024).

Acknowledgments

We thank the VIIV (Vicerrectorado de Investigación, Innovación y Vinculación) of Escuela Politécnica Nacional.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LiDAR: Light Detection and Ranging
GNSS: Global Navigation Satellite System
OBD: On-Board Diagnostic
IMU: Inertial Measurement Unit
CAN: Controller Area Network
LP: Label Propagation
LS: Label Spreading
SVM: Support Vector Machine
MLP: Multilayer Perceptron
RF: Random Forest
GBM: Gradient Boosting Machine
SMOTE: Synthetic Minority Over-Sampling Technique

Appendix A

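Table A1. Threshold values and ranges for manual data labeling.
| # | Attribute | Item | ID | Value Range | Penalty |
|---|---|---|---|---|---|
| 1 | rpm | low | | [0–1500] | 1 |
| 2 | | normal | | [1501–3000] | 0 |
| 3 | | high | | [3001–5000] | 2 |
| 4 | | very high | | [5001–8000] | 3 |
| 5 | engine temperature | low | | [0–82] | 1 |
| 6 | | normal | | [83–94] | 0 |
| 7 | | high | | [95–104] | 1 |
| 8 | | overheating | | [105–200] | 2 |
| 9 | heart rate | bradycardia | | [0–59] | 2 |
| 10 | | sinus zone a | | [60–80] | 1 |
| 11 | | sinus zone b | | [81–100] | 2 |
| 12 | | tachycardia slight | | [101–120] | 3 |
| 13 | | tachycardia severe | | [121–180] | 4 |
| 14 | weather types | sunny | 1 | | 1 |
| 15 | | mostly sunny | 2 | | 1 |
| 16 | | partly sunny | 3 | | 1 |
| 17 | | hazy sunshine | 5 | | 1 |
| 18 | | mostly cloudy | 6 | | 2 |
| 19 | | cloudy | 7 | | 2 |
| 20 | | clouds and sun | 9 | | 2 |
| 21 | | partly cloudy | 35 | | 3 |
| 22 | | fog | 11 | | 3 |
| 23 | | rain | 18 | | 4 |
| 24 | visibility | bad | | [0.0–0.0] | 4 |
| 25 | | poor | | [0.1–2.4] | 3 |
| 26 | | moderate | | [2.5–10.0] | 2 |
| 27 | | good | | [10.1–50.0] | 1 |
| 28 | | excellent | | [50.1–100.0] | 0 |
| 29 | precipitation | none | | [0.0–0.0] | 0 |
| 30 | | light | | [0.1–2.4] | 1 |
| 31 | | moderate | | [2.5–10.0] | 2 |
| 32 | | heavy | | [10.1–50.0] | 3 |
| 33 | | violent | | [50.1–100.0] | 4 |
| 34 | accidents on site | none | | [0–0] | 0 |
| 35 | | low | | [1–8] | 1 |
| 36 | | moderate | | [9–30] | 2 |
| 37 | | high | | [31–132] | 3 |
| 38 | | very high | | [133–300] | 4 |
| 39 | design speed | normal | | [0–0] | 0 |
| 40 | | slight | | [1–10] | 1 |
| 41 | | moderate | | [11–20] | 2 |
| 42 | | serious | | [21–40] | 3 |
| 43 | | very serious | | [41–100] | 4 |
| 44 | accidents time | none | | [0–0] | 0 |
| 45 | | low | | [1–2] | 1 |
| 46 | | moderate | | [3–9] | 2 |
| 47 | | high | | [10–100] | 3 |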
Table A2. Extended sample of control points for Route 1.
| Road ID | Segment | Start ID | Start Latitude | Start Longitude | End ID | End Latitude | End Longitude |
|---|---|---|---|---|---|---|---|
| AGR | S01 | P001 | −0.29755 | −78.46091 | P008 | −0.29065 | −78.46514 |
| AGR | S03 | P016 | −0.28166 | −78.47105 | P018 | −0.27999 | −78.47296 |
| AGR | S05 | P022 | −0.27837 | −78.48127 | P024 | −0.27687 | −78.48597 |
| AGR | S07 | P028 | −0.27093 | −78.48937 | P030 | −0.26995 | −78.48797 |
| AGR | S09 | P034 | −0.26585 | −78.48679 | P035 | −0.26472 | −78.4874 |
| AGR | S11 | P039 | −0.25961 | −78.48721 | P041 | −0.2572 | −78.48505 |
| AGR | S13 | P045 | −0.25224 | −78.48292 | P047 | −0.24996 | −78.48282 |
| AGR | S15 | P050 | −0.24518 | −78.48492 | P052 | −0.24281 | −78.48542 |
| AGR | S17 | P055 | −0.23902 | −78.48485 | P057 | −0.23649 | −78.48426 |
| AGR | S19 | P061 | −0.23038 | −78.48455 | P062 | −0.22925 | −78.485 |
| AGR | S21 | P066 | −0.2267 | −78.48815 | P069 | −0.2271 | −78.49302 |
| AGR | S23 | P073 | −0.22974 | −78.4974 | P076 | −0.23261 | −78.5013 |
| AVI | S25 | P079 | −0.23224 | −78.50294 | P085 | −0.22943 | −78.50077 |
| AVI | S27 | P096 | −0.22479 | −78.49702 | P104 | −0.22108 | −78.4952 |
| AVI | S29 | P112 | −0.21797 | −78.49151 | P121 | −0.21309 | −78.48891 |
| DMQ | S31 | P125 | −0.21252 | −78.48894 | P136 | −0.21047 | −78.49363 |
| DMQ | S33 | P142 | −0.20853 | −78.4953 | P155 | −0.20255 | −78.48677 |
| DMQ | S35 | P167 | −0.19188 | −78.48108 | P184 | −0.17915 | −78.47825 |
| DMQ | S37 | P203 | −0.16393 | −78.47518 | P212 | −0.15587 | −78.47696 |
| DMQ | S39 | P231 | −0.13687 | −78.47347 | P246 | −0.12179 | −78.47898 |
| DMQ | S41 | P252 | −0.10668 | −78.47647 | P256 | −0.10049 | −78.47192 |
| DMQ | S43 | P278 | −0.12921 | −78.48137 | P287 | −0.13844 | −78.48264 |
| DMQ | S45 | P304 | −0.15324 | −78.48467 | P309 | −0.15792 | −78.48439 |
| DMQ | S47 | P335 | −0.16911 | −78.48446 | P343 | −0.17452 | −78.48541 |
| DMQ | S49 | P356 | −0.18705 | −78.4876 | P363 | −0.19128 | −78.48837 |
| DMQ | S51 | P370 | −0.19677 | −78.4896 | P375 | −0.19962 | −78.49078 |
| DMQ | S53 | P385 | −0.20248 | −78.49399 | P392 | −0.20498 | −78.49547 |
| DMQ | S55 | P400 | −0.20767 | −78.49708 | P403 | −0.20847 | −78.49612 |
| DMQ | S57 | P408 | −0.20969 | −78.49451 | P413 | −0.21136 | −78.49354 |
| DMQ | S59 | P419 | −0.21286 | −78.49219 | P426 | −0.21485 | −78.49106 |
| AVI | S61 | P431 | −0.21767 | −78.49135 | P434 | −0.21874 | −78.49304 |
| AVI | S63 | P441 | −0.22278 | −78.4959 | P447 | −0.22604 | −78.49801 |
| AVI | S65 | P452 | −0.22743 | −78.50016 | P456 | −0.22973 | −78.50097 |
| AGR | S67 | P461 | −0.23214 | −78.50301 | P465 | −0.23355 | −78.50293 |
| AGR | S69 | P470 | −0.22983 | −78.49735 | P473 | −0.22788 | −78.49466 |
| AGR | S71 | P478 | −0.22679 | −78.48822 | P480 | −0.22803 | −78.48646 |
| AGR | S73 | P484 | −0.23039 | −78.48466 | P487 | −0.23437 | −78.48442 |
| AGR | S75 | P491 | −0.23897 | −78.48499 | P493 | −0.24082 | −78.48563 |
| AGR | S77 | P498 | −0.24522 | −78.48505 | P501 | −0.24799 | −78.48381 |
| AGR | S79 | P506 | −0.25222 | −78.48304 | P509 | −0.25514 | −78.4842 |
| AGR | S81 | P514 | −0.25952 | −78.4873 | P517 | −0.26273 | −78.48834 |
| AGR | S83 | P521 | −0.26591 | −78.48694 | P524 | −0.26888 | −78.48705 |
| AGR | S85 | P528 | −0.27084 | −78.48943 | P531 | −0.27565 | −78.48944 |
| AGR | S87 | P536 | −0.27851 | −78.48131 | P539 | −0.27954 | −78.47542 |
| AGR | S89 | P544 | −0.28173 | −78.47117 | P550 | −0.28893 | −78.46642 |
AGR: General Rumiñahui Highway. DMQ: Metropolitan District of Quito. AVI: Velasco Ibarra Avenue.
Table A3. Extended sample of control points for Route 2.
Road ID | Segment | Start ID | Start Latitude | Start Longitude | End ID | End Latitude | End Longitude
AGR | S001 | P001 | −0.29755 | −78.46091 | P008 | −0.29065 | −78.46514
AGR | S003 | P016 | −0.28166 | −78.47105 | P018 | −0.27999 | −78.47296
AGR | S005 | P022 | −0.27837 | −78.48127 | P024 | −0.27687 | −78.48597
AGR | S007 | P028 | −0.27093 | −78.48937 | P030 | −0.26995 | −78.48797
AGR | S009 | P034 | −0.26585 | −78.48679 | P035 | −0.26472 | −78.4874
AGR | S011 | P039 | −0.25961 | −78.48721 | P041 | −0.2572 | −78.48505
AGR | S013 | P045 | −0.25224 | −78.48292 | P047 | −0.24996 | −78.48282
AGR | S015 | P050 | −0.24518 | −78.48492 | P052 | −0.24281 | −78.48542
ASB | S017 | P055 | −0.23945 | −78.48496 | P059 | −0.24315 | −78.48256
ASB | S019 | P063 | −0.24625 | −78.47657 | P065 | −0.24202 | −78.47433
ASB | S021 | P069 | −0.23615 | −78.47298 | P071 | −0.2347 | −78.47095
ASB | S023 | P074 | −0.23041 | −78.46985 | P075 | −0.22942 | −78.46914
ASB | S025 | P081 | −0.21675 | −78.46567 | P082 | −0.21539 | −78.46449
ASB | S027 | P085 | −0.21146 | −78.46253 | P087 | −0.20665 | −78.46084
ASB | S029 | P090 | −0.20322 | −78.45855 | P092 | −0.20021 | −78.45669
ASB | S031 | P095 | −0.19828 | −78.46070 | P096 | −0.19839 | −78.46164
ASB | S033 | P099 | −0.19860 | −78.46491 | P100 | −0.19808 | −78.46644
ASB | S035 | P104 | −0.19546 | −78.46405 | P106 | −0.19399 | −78.46052
ASB | S037 | P109 | −0.19199 | −78.45866 | P110 | −0.19187 | −78.45795
ASB | S039 | P116 | −0.18772 | −78.45412 | P118 | −0.18511 | −78.45273
ASB | S041 | P121 | −0.18184 | −78.45117 | P122 | −0.18123 | −78.45135
ASB | S043 | P126 | −0.18105 | −78.45391 | P127 | −0.18037 | −78.45582
ASB | S045 | P132 | −0.17279 | −78.45193 | P134 | −0.16996 | −78.44987
ASB | S047 | P137 | −0.16384 | −78.44792 | P139 | −0.16106 | −78.44722
ASB | S049 | P143 | −0.15681 | −78.44634 | P145 | −0.15338 | −78.44721
ASB | S051 | P148 | −0.15191 | −78.45150 | P149 | −0.15133 | −78.45193
ASB | S053 | P152 | −0.14980 | −78.45068 | P153 | −0.14874 | −78.44897
ASB | S055 | P156 | −0.14762 | −78.44748 | P158 | −0.14553 | −78.44521
ASB | S057 | P162 | −0.14044 | −78.44466 | P165 | −0.13719 | −78.44722
ASB | S059 | P169 | −0.12963 | −78.44828 | P170 | −0.12791 | −78.44828
ASB | S061 | P173 | −0.11840 | −78.45088 | P176 | −0.11325 | −78.45675
ASB | S063 | P179 | −0.11025 | −78.45799 | P181 | −0.10953 | −78.45877
ASB | S065 | P184 | −0.11193 | −78.45749 | P185 | −0.11529 | −78.45485
ASB | S067 | P188 | −0.11850 | −78.45096 | P189 | −0.1214 | −78.44807
ASB | S069 | P192 | −0.12649 | −78.44817 | P194 | −0.12866 | −78.44853
ASB | S071 | P197 | −0.13510 | −78.44764 | P199 | −0.13834 | −78.44709
ASB | S073 | P202 | −0.14055 | −78.44478 | P204 | −0.14354 | −78.44382
ASB | S075 | P209 | −0.14750 | −78.44751 | P210 | −0.14772 | −78.44819
ASB | S077 | P215 | −0.15078 | −78.45246 | P216 | −0.15157 | −78.45197
ASB | S079 | P219 | −0.15268 | −78.44880 | P220 | −0.15434 | −78.44667
ASB | S081 | P223 | −0.16034 | −78.44692 | P226 | −0.16671 | −78.44792
ASB | S083 | P232 | −0.17604 | −78.45459 | P233 | −0.17624 | −78.45568
ASB | S085 | P236 | −0.17890 | −78.45535 | P238 | −0.18152 | −78.45518
ASB | S087 | P241 | −0.18048 | −78.45218 | P243 | −0.18321 | −78.45104
ASB | S089 | P246 | −0.18528 | −78.45344 | P247 | −0.18683 | −78.45409
ASB | S091 | P252 | −0.19196 | −78.45678 | P256 | −0.19365 | −78.46142
ASB | S093 | P262 | −0.19875 | −78.46491 | P268 | −0.20245 | −78.45792
ASB | S095 | P274 | −0.20831 | −78.46164 | P276 | −0.2107 | −78.46236
ASB | S097 | P279 | −0.21265 | −78.46373 | P280 | −0.21386 | −78.46407
ASB | S099 | P283 | −0.21886 | −78.46671 | P285 | −0.22464 | −78.4689
ASB | S101 | P288 | −0.23032 | −78.46996 | P289 | −0.23107 | −78.47046
ASB | S103 | P294 | −0.23612 | −78.47313 | P301 | −0.24518 | −78.47624
ASB | S105 | P306 | −0.24405 | −78.48218 | P307 | −0.2426 | −78.48259
ASB | S107 | P315 | −0.23294 | −78.48880 | P316 | −0.2324 | −78.48933
ASB | S109 | P319 | −0.23372 | −78.49296 | P320 | −0.23427 | −78.49205
ASB | S111 | P325 | −0.23918 | −78.49250 | P326 | −0.23962 | −78.49314
ASB | S113 | P329 | −0.24229 | −78.49592 | P331 | −0.24285 | −78.49992
ASB | S115 | P334 | −0.24686 | −78.50143 | P336 | −0.24954 | −78.50271
ASB | S117 | P340 | −0.25545 | −78.50350 | P341 | −0.25677 | −78.50286
ASB | S119 | P344 | −0.26252 | −78.50514 | P345 | −0.26535 | −78.50739
ASB | S121 | P348 | −0.26910 | −78.50768 | P350 | −0.2717 | −78.50844
ASB | S123 | P353 | −0.27629 | −78.51247 | P357 | −0.28423 | −78.51812
ASB | S125 | P360 | −0.28624 | −78.51907 | P362 | −0.28904 | −78.51979
ASB | S127 | P369 | −0.30256 | −78.52161 | P377 | −0.31294 | −78.52237
ASB | S129 | P383 | −0.32332 | −78.52019 | P384 | −0.3278 | −78.5191
ASB | S131 | P388 | −0.33475 | −78.52013 | P390 | −0.33721 | −78.52059
ASB | S133 | P393 | −0.34284 | −78.52297 | P396 | −0.34739 | −78.52354
ASB | S135 | P399 | −0.34887 | −78.52331 | P403 | −0.35486 | −78.5252
ASB | S137 | P409 | −0.35741 | −78.53344 | P410 | −0.35913 | −78.53468
ASB | S139 | P414 | −0.36507 | −78.52912 | P415 | −0.36585 | −78.52861
ASB | S141 | P428 | −0.38228 | −78.53171 | P432 | −0.38417 | −78.5321
ASB | S143 | P437 | −0.37681 | −78.53043 | P444 | −0.36893 | −78.52879
ASB | S145 | P449 | −0.36497 | −78.52902 | P452 | −0.36132 | −78.53296
ASB | S147 | P456 | −0.35756 | −78.53338 | P457 | −0.35719 | −78.53073
ASB | S148 | P458 | −0.35664 | −78.52849 | P459 | −0.35615 | −78.52749
ASB | S149 | P460 | −0.35530 | −78.52579 | P463 | −0.35066 | −78.52291
ASB | S151 | P466 | −0.34804 | −78.52329 | P468 | −0.3455 | −78.52364
ASB | S153 | P473 | −0.33478 | −78.51997 | P475 | −0.33226 | −78.51952
ASB | S155 | P480 | −0.32329 | −78.52007 | P483 | −0.31841 | −78.52146
ASB | S157 | P491 | −0.30257 | −78.52146 | P496 | −0.29606 | −78.52032
ASB | S159 | P499 | −0.29047 | −78.51954 | P501 | −0.28754 | −78.51955
ASB | S161 | P504 | −0.28501 | −78.51836 | P509 | −0.27542 | −78.51148
ASB | S163 | P512 | −0.26912 | −78.50753 | P513 | −0.26859 | −78.50741
ASB | S165 | P516 | −0.26259 | −78.50504 | P518 | −0.25822 | −78.50242
ASB | S167 | P521 | −0.25015 | −78.50296 | P522 | −0.24903 | −78.50217
ASB | S169 | P525 | −0.24465 | −78.50078 | P528 | −0.24266 | −78.4972
ASB | S171 | P531 | −0.24073 | −78.49421 | P532 | −0.23996 | −78.4934
ASB | S173 | P538 | −0.23428 | −78.49155 | P539 | −0.23394 | −78.49249
ASB | S175 | P542 | −0.23151 | −78.49158 | P543 | −0.2319 | −78.49016
AGR | S177 | P552 | −0.24168 | −78.4858 | P555 | −0.24447 | −78.48527
AGR | S179 | P560 | −0.24873 | −78.48341 | P563 | −0.25167 | −78.4829
AGR | S181 | P568 | −0.25581 | −78.48447 | P571 | −0.25897 | −78.48675
AGR | S183 | P576 | −0.26355 | −78.48809 | P578 | −0.26529 | −78.48724
AGR | S185 | P583 | −0.26953 | −78.4876 | P585 | −0.27054 | −78.48902
AGR | S187 | P590 | −0.27625 | −78.48834 | P593 | −0.27811 | −78.48255
AGR | S189 | P598 | −0.27964 | −78.47458 | P601 | −0.28115 | −78.4716
AGR | S191 | P609 | −0.29028 | −78.46552 | P615 | −0.29759 | −78.46097
ASB—Simón Bolívar Avenue.
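As a rough illustration of how a GPS reading could be related to the control points listed above, the sketch below finds the closest control point by great-circle (haversine) distance. Nearest-point matching is an assumption made for illustration and is not necessarily the procedure used to build POLIDriving; the function names are hypothetical. The three control points are copied from Table A3, and the GPS reading is the first row of the sample in Table 7.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# A few control points copied from Table A3 (Route 2, Simón Bolívar Avenue).
CONTROL_POINTS = {
    "P100": (-0.19808, -78.46644),
    "P104": (-0.19546, -78.46405),
    "P106": (-0.19399, -78.46052),
}


def nearest_control_point(lat, lon, points=CONTROL_POINTS):
    """Return (point_id, distance_m) for the control point closest to (lat, lon)."""
    return min(
        ((pid, haversine_m(lat, lon, plat, plon)) for pid, (plat, plon) in points.items()),
        key=lambda item: item[1],
    )


# GPS reading taken from the first row of the Table 7 sample.
point_id, distance_m = nearest_control_point(-0.195041, -78.463115)
print(point_id, round(distance_m))
```

For short distances like these a flat-earth approximation would also work; the haversine form is used here only because it stays valid for any pair of coordinates.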

References

  1. World Health Organization. WHO Global Status Report on Road Safety 2023; WHO: Geneva, Switzerland, 2023.
  2. Marcillo, P.; Valdivieso Caraguay, Á.L.; Hernández-Álvarez, M. A Systematic Literature Review of Learning-Based Traffic Accident Prediction Models Based on Heterogeneous Sources. Appl. Sci. 2022, 12, 4529.
  3. Geyer, J.; Kassahun, Y.; Mahmudi, M.; Ricou, X.; Durgesh, R.; Chung, A.S.; Hauswald, L.; Pham, V.H.; Mühlegg, M.; Dorn, S.; et al. A2d2: Audi autonomous driving dataset. arXiv 2020, arXiv:2004.06320.
  4. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
  5. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2636–2645.
  6. Huang, X.; Wang, P.; Cheng, X.; Zhou, D.; Geng, Q.; Yang, R. The apolloscape open dataset for autonomous driving and its application. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2702–2719.
  7. Santana, E.; Hotz, G. Learning a driving simulator. arXiv 2016, arXiv:1608.01230.
  8. Schafer, H.; Santana, E.; Haden, A.; Biasini, R. A commute in data: The comma2k19 dataset. arXiv 2018, arXiv:1812.05752.
  9. Izquierdo, R.; Quintanar, A.; Parra, I.; Fernández-Llorca, D.; Sotelo, M. The prevention dataset: A novel benchmark for prediction of vehicles intentions. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3114–3121.
  10. Weber, M. Automotive OBD-II Dataset. 2023. Available online: https://radar.kit.edu/radar/en/dataset/bCtGxdTklQlfQcAq (accessed on 27 April 2024).
  11. Veepeak. OBDCheck BLE+. Available online: https://www.veepeak.com/product/obdcheck-ble-plus/ (accessed on 27 April 2024).
  12. Garmin. Vivosmart 5. Available online: https://www.garmin.com/en-US/p/782585 (accessed on 27 April 2024).
  13. ISO 11898-1:2015; Road Vehicles—Controller Area Network (CAN). Part 1: Data Link Layer and Physical Signalling. International Organization for Standardization: Geneva, Switzerland, 2015. Available online: https://www.iso.org/standard/63648.html (accessed on 27 April 2024).
  14. ISO 14230-1:2012; Road Vehicles—Diagnostic Communication over K-Line (DoK-Line). International Organization for Standardization: Geneva, Switzerland, 2012. Available online: https://www.iso.org/standard/55591.html (accessed on 27 April 2024).
  15. ISO 9141-2:1994; Road Vehicles—Diagnostic Systems. Part 2: CARB Requirements for Interchange of Digital Information. International Organization for Standardization: Geneva, Switzerland, 1994. Available online: https://www.iso.org/standard/16738.html (accessed on 27 April 2024).
  16. J1850_202212; Class B Data Communications Network Interface (STABILIZED Dec 2022). SAE International: Warrendale, PA, USA, 2022. Available online: https://www.sae.org/standards/content/j1850_202212/ (accessed on 27 April 2024).
  17. Accuweather. Accuweather. Available online: https://www.accuweather.com/ (accessed on 27 April 2024).
  18. Transit National Agency (ANT). National Accident Rate Viewer. Available online: https://www.ant.gob.ec/visor-de-siniestralidad-estadisticas/ (accessed on 27 April 2024).
  19. Yan, Y.; Zhang, Y.; Yang, X.; Hu, J.; Tang, J.; Guo, Z. Crash prediction based on random effect negative binomial model considering data heterogeneity. Phys. A Stat. Mech. Its Appl. 2020, 547, 123858.
  20. Bao, J.; Liu, P.; Ukkusuri, S.V. A spatiotemporal deep learning approach for citywide short-term crash risk prediction with multi-source data. Accid. Anal. Prev. 2019, 122, 239–254.
  21. Heredia Silva, C.A. Desarrollo de potenciales aplicaciones móviles aplicables al estudio de velocidades seguras en vías. Caso de estudio: Avenida Simón Bolívar. Bachelor’s Thesis, PUCE-Quito, Quito, Ecuador, 2019.
  22. Pablo Marcillo. POLIDriving. Available online: https://github.com/laboratorioAI/polidriving (accessed on 27 April 2024).
  23. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  24. Shahverdy, M.; Fathy, M.; Berangi, R.; Sabokrou, M. Driver behavior detection and classification using deep convolutional neural networks. Expert Syst. Appl. 2020, 149, 113240.
  25. Kovaceva, J.; Isaksson-Hellman, I.; Murgovski, N. Identification of aggressive driving from naturalistic data in car-following situations. J. Saf. Res. 2020, 73, 225–234.
Figure 1. Design of the acquisition module.
Figure 2. Routes for data acquisition. (a) Route 1 (b) Route 2.
Figure 3. Control points on the map.
Figure 4. Correlation matrices for POLIDriving. (a) Initial correlation matrix. (b) Final correlation matrix.
Table 1. Datasets features—Part 1.
Authors | Name | Duration [h] | Frequency of Acquisition [Hz] | Drivers | Vehicles | Sensors/Devices | Applications (Auton. Driving, Driving Behavior)
Santana et al. [7]comma.ai7.25pictures at 20 and measures at 100311 frontal camera, 1 LiDAR sensor, and 1 POS device
Schafer et al. [8]comma2k1933pictures at 20 and measures at 10122 frontal cameras, 1 internal camera, 1 OBD-II scanner, 1 GNSS receiver, and 1 9-axis IMU device
Izquierdo et al. [9]PREVENTION6laser at 10, radars at 33, and location receiver at 20311 frontal camera, 1 rear-facing camera, 1 LiDAR sensor, 3 long-range radars, 1 GNSS receiver, and 1 CAN bus scanner
Weber et al. [10]AUTOMOTIVE OBD-II6measures at 101101 OBD-II scanner-
Geyer et al. [3]A2D2-not mentioned116 cameras, 6 LiDAR sensors, 1 GPS, 1 IMU, and 1 bus scanner
Marcillo et al.POLIDriving18measures at 1541 OBD-II scanner, 1 GPS receiver, and 1 health monitor-
Table 2. Datasets features—Part 2.
Name | Data sources (Driver’s Data, Vehicle Data, Weather Conditions, Traffic Accidents, Geometric Charact., Road Conditions) | No. Attributes | Data Labeling | No. Classes
comma.ai----140--
comma2k19----1≈45--
PREVENTION----131-
AUTOMOTIVE OBD-II-----11--
A2D2----122--
POLIDriving-3224
¹ Pictures of the road. ² A part of the dataset.
Table 3. Drivers’ information.
ID | Name | Gender | Age [years] | Weight [kg] | Height [cm] | Medical Conditions | Driver’s License Points [/30] | Driving Experience [years]
1 | Pablo | Male | 40 | 59 | 165 | None | 30 | 13
2 | Andres | Male | 25 | 69 | 163 | None | 30 | 7
3 | Richard | Male | 37 | 74 | 170 | None | 30 | 19
4 | Alonso | Male | 43 | 77 | 170 | None | 30 | 20
5 | Yolanda | Female | 43 | 62 | 155 | None | 30 | 23
Table 4. Vehicles information.
ID | Brand | Model | Type | Year of Manufacture | Kilometers Travelled [×1000] | Last Maintenance [year] | Number of Airbags
1 | Kia | Sportage | CUV | 2018 | 31 | 2023 | 2
2 | Kia | Soluto | Sedan | 2022 | 75 | 2022 | 2
3 | Chevrolet | DMAX | Pickup | 2013 | 350 | 2023 | 2
4 | Chevrolet | Cavalier | Sedan | 2018 | 160 | 2021 | 2
Table 5. Data dictionary.
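# | Attribute | Class | Units | Data Source | Sensor/Device
1 | time | Timestamp | - | Vehicle data | GPS receiver
2 | speed | Numeric | km/h | Vehicle data | OBD-II scanner
3 | revolutions per minute | Numeric | rpm | Vehicle data | OBD-II scanner
4 | acceleration | Numeric | m/s² | Vehicle data | OBD-II scanner
5 | throttle position | Numeric | % | Vehicle data | OBD-II scanner
6 | engine temperature | Numeric | °C | Vehicle data | OBD-II scanner
7 | system voltage | Numeric | volts | Vehicle data | OBD-II scanner
8 | distance traveled | Numeric | km | Vehicle data | OBD-II scanner
9 | engine load value ¹ | Numeric | % | Vehicle data | OBD-II scanner
10 | latitude | Numeric | - | Vehicle data | GPS receiver
11 | longitude | Numeric | - | Vehicle data | GPS receiver
12 | altitude | Numeric | m | Vehicle data | GPS receiver
13 | id vehicle | Numeric | - | Vehicle data | Database
14 | heart rate | Numeric | bpm | Driver’s data | Health monitor
15 | body temperature | Numeric | °C | Driver’s data | Health monitor
16 | id driver | Numeric | - | Driver’s data | Database
17 | current weather | Categorical | - | Weather data | Web service
18 | has precipitation | Boolean | - | Weather data | Web service
19 | is day time | Boolean | - | Weather data | Web service
20 | temperature | Numeric | °C | Weather data | Web service
21 | wind speed | Numeric | km/h | Weather data | Web service
22 | wind direction | Numeric | - | Weather data | Web service
23 | relative humidity | Numeric | % | Weather data | Web service
24 | visibility | Numeric | km | Weather data | Web service
25 | uv index ² | Numeric | - | Weather data | Web service
26 | cloud cover | Numeric | - | Weather data | Web service
27 | ceiling ³ | Numeric | m | Weather data | Web service
28 | pressure | Numeric | mb | Weather data | Web service
29 | precipitation | Numeric | mm | Weather data | Web service
30 | accidents on site | Numeric | number | Traffic accidents | Database
31 | design speed | Numeric | km/h | Road geometric characteristics | Database
32 | accidents time | Numeric | number | Traffic accidents | Database
¹ It refers to the quantity of air that an engine consumes. ² It refers to the level of ultraviolet radiation. ³ It refers to the height from the surface to the lowest layer of clouds.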
Table 6. Sample of control points for Routes 1 and 2.
Route | Road ID | Segment | Start ID | Start Latitude | Start Longitude | End ID | End Latitude | End Longitude
1 | AGR | S01 | P001 | −0.29755 | −78.46091 | P008 | −0.29065 | −78.46514
1 | AGR | S21 | P066 | −0.2267 | −78.48815 | P069 | −0.2271 | −78.49302
1 | DMQ | S41 | P252 | −0.10668 | −78.47647 | P256 | −0.10049 | −78.47192
1 | AVI | S61 | P431 | −0.21767 | −78.49135 | P434 | −0.21874 | −78.49304
1 | AGR | S89 | P544 | −0.28173 | −78.47117 | P550 | −0.28893 | −78.46642
2 | AGR | S001 | P001 | −0.29755 | −78.46091 | P008 | −0.29065 | −78.46514
2 | ASB | S041 | P121 | −0.18184 | −78.45117 | P122 | −0.18123 | −78.45135
2 | ASB | S081 | P223 | −0.16034 | −78.44692 | P226 | −0.16671 | −78.44792
2 | ASB | S121 | P348 | −0.26910 | −78.50768 | P350 | −0.2717 | −78.50844
2 | ASB | S173 | P538 | −0.23428 | −78.49155 | P539 | −0.23394 | −78.49249
Table 7. Sample of data file.
Time | Speed | rpm | Acceleration | Throttle Position | Engine Temperature | System Voltage | Distance Travelled | Engine Load Value
15:33:15 | 65 | 2306 | −0.7279 | 26.2745 | 97 | 12.6 | 18.4204 | 17.6470
15:33:16 | 62 | 2246 | −0.7488 | 32.9411 | 97 | 12.6 | 18.4346 | 47.0588
15:33:17 | 61 | 2217 | −0.2281 | 36.4705 | 97 | 12.6 | 18.4452 | 49.4117
15:33:18 | 61 | 2201 | 0.0 | 40.0 | 96 | 12.6 | 18.4673 | 65.0980
15:33:19 | 61 | 2225 | 0.0 | 72.5490 | 96 | 12.6 | 18.4788 | 76.8627
15:33:20 | 62 | 2258 | 0.0 | 81.9607 | 96 | 12.7 | 18.4992 | 80.7843
Time | Altitude | id Vehicle | Latitude | Longitude | id Driver | Heart Rate | Body Temperature | Current Weather
15:33:15 | 2586.61 | 4 | −0.195041 | −78.463115 | 4 | 64 | 29 | Clouds and sun
15:33:16 | 2587.17 | 4 | −0.194989 | −78.462963 | 4 | 64 | 29 | Clouds and sun
15:33:17 | 2587.40 | 4 | −0.19493 | −78.462816 | 4 | 64 | 29 | Clouds and sun
15:33:18 | 2588.94 | 4 | −0.194862 | −78.462684 | 4 | 64 | 29 | Clouds and sun
15:33:19 | 2589.78 | 4 | −0.194783 | −78.46256 | 4 | 64 | 29 | Clouds and sun
15:33:20 | 2590.04 | 4 | −0.194683 | −78.462441 | 4 | 64 | 29 | Clouds and sun
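Time | Has Precipitation | Is Day Time | Temperature | Wind Speed | Wind Direction | Relative Humidity | Visibility | uv Index
15:33:15 | FALSE | TRUE | 19.5 | 14.5 | 0 | 62 | 8 | 2
15:33:16 | FALSE | TRUE | 19.5 | 14.5 | 0 | 62 | 8 | 2
15:33:17 | FALSE | TRUE | 19.5 | 14.5 | 0 | 62 | 8 | 2
15:33:18 | FALSE | TRUE | 19.5 | 14.5 | 0 | 62 | 8 | 2
15:33:19 | FALSE | TRUE | 19.5 | 14.5 | 0 | 62 | 8 | 2
15:33:20 | FALSE | TRUE | 19.5 | 14.5 | 0 | 62 | 8 | 2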
Time | Cloud Cover | Ceiling | Pressure | Precipitation | Accidents Onsite | Design Speed | Accidents Time
15:33:157431391019.62.48503
15:33:167431391019.62.47703
15:33:177431391019.62.48703
15:33:187431391019.62.48703
15:33:197431391019.62.48703
15:33:207431391019.62.49703
Table 8. Sample of threshold values and ranges for manual data labeling.
# | Attribute | Item | ID | Value Range | Penalty
1 | rpm | very high | - | [5001–8000] | 3
2 | engine temperature | overheating | - | [105–200] | 2
3 | heart rate | tachycardia severe | - | [121–180] | 4
4 | weather types | rain | 18 | - | 4
5 | visibility | bad | - | [0.0–0.0] | 4
6 | precipitation | violent | - | [50.1–100.0] | 4
7 | accidents on site | very high | - | [133–300] | 4
8 | design speed | very serious | - | [41–100] | 4
9 | accidents time | high | - | [10–100] | 3
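To make the threshold idea concrete, the following is a minimal sketch of a penalty lookup built from a subset of the sample rows in Table 8. Several things are assumptions: the ranges are treated as inclusive, the dictionary keys and the function name are hypothetical, and the visibility and design speed rows are left out because the table alone does not say how their ranges are interpreted. The sketch stops at the per-attribute lookup, since the rule that turns penalties into a risk label is not given in the table.

```python
# Sample thresholds copied from Table 8: attribute -> (item, low, high, penalty).
# Treating both ends of each range as inclusive is an assumption.
RANGE_PENALTIES = {
    "rpm": ("very high", 5001, 8000, 3),
    "engine_temperature": ("overheating", 105, 200, 2),
    "heart_rate": ("tachycardia severe", 121, 180, 4),
    "precipitation": ("violent", 50.1, 100.0, 4),
    "accidents_on_site": ("very high", 133, 300, 4),
    "accidents_time": ("high", 10, 100, 3),
}

# Categorical sample row: weather type "rain" (ID 18) carries penalty 4.
WEATHER_PENALTIES = {"rain": 4}


def sample_penalties(observation):
    """Return {attribute: penalty} for every sample threshold the observation hits."""
    hits = {}
    for attribute, (_item, low, high, penalty) in RANGE_PENALTIES.items():
        value = observation.get(attribute)
        if value is not None and low <= value <= high:
            hits[attribute] = penalty
    weather = observation.get("current_weather")
    if weather in WEATHER_PENALTIES:
        hits["current_weather"] = WEATHER_PENALTIES[weather]
    return hits


# Hypothetical observation, not taken from the dataset.
obs = {"rpm": 5400, "engine_temperature": 94, "heart_rate": 125, "current_weather": "rain"}
penalties = sample_penalties(obs)
print(penalties)
```

Only the attributes that fall inside a listed range appear in the result, so an engine temperature of 94 °C contributes nothing here.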
Table 9. Evaluation of data labeling.
# | Method | Hyperparameters | Accuracy
1 | Label propagation (LP) | alpha = 0.2, gamma = 0.1, kernel = knn, number_neighbors = 10, and maximum_iterations = 5000 | 0.71
2 | Label spreading (LS) | gamma = 0.1, kernel = knn, number_neighbors = 15, and maximum_iterations = 5000 | 0.75
3 | Self training (SVM) | kernel = rbf, probability = True, and gamma = 0.1 | 0.62
4 | Self training (MLP) | activation_function = relu, hidden_layers = 3, neurons_per_layer = 30, learning_rate = constant, maximum_iterations = 5000, and solver = adam | 0.84
5 | Self training (RF) | number_estimators = 50, maximum_depth = None, minimum_samples_leaf = 1, maximum_features = sqrt, and minimum_samples_split = 2 | 0.83
6 | Self training (GBM) | learning_rate = 0.8, maximum_depth = 30, number_estimators = 100, minimum_samples_leaf = 1, maximum_features = None, and minimum_samples_split = 2 | 0.82
7 | Ensemble | estimators = [LP, LS, MLP, RF, GBM] and voting = hard | 0.82
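The hyperparameter names in Table 9 resemble scikit-learn's, so the sketch below shows how rows 2 and 4 might be expressed with that library. Using scikit-learn is an assumption, not something the table states. The data are synthetic stand-ins rather than POLIDriving, the 70% share of hidden labels is arbitrary, and `random_state` is added only for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.semi_supervised import LabelSpreading, SelfTrainingClassifier

# Synthetic stand-in data with four classes; NOT the POLIDriving dataset.
X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# Hide roughly 70% of the labels; scikit-learn marks unlabeled samples with -1.
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.7
y_partial = np.where(unlabeled, -1, y)

# Row 2 of Table 9: label spreading with a k-NN kernel.
label_spreading = LabelSpreading(kernel="knn", gamma=0.1, n_neighbors=15, max_iter=5000)

# Row 4 of Table 9: self-training around an MLP with three hidden layers of 30 neurons.
mlp = MLPClassifier(activation="relu", hidden_layer_sizes=(30, 30, 30),
                    learning_rate="constant", max_iter=5000, solver="adam",
                    random_state=0)
self_training = SelfTrainingClassifier(mlp)

results = {}
for name, model in [("LS", label_spreading), ("Self training (MLP)", self_training)]:
    model.fit(X, y_partial)
    # Score only on the samples whose labels were hidden during fitting.
    results[name] = float((model.predict(X[unlabeled]) == y[unlabeled]).mean())
    print(f"{name}: {results[name]:.2f}")
```

The accuracies printed here describe the synthetic data only and are not comparable with the values in Table 9.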
Table 10. Configuration of the learning models.
# | Algorithm | Hyperparameters
1 | Gradient Boosting Machine (GBM) | learning_rate = 0.8, loss_function = log_loss, maximum_depth = 30, maximum_features = sqrt, minimum_samples_split = 0.5, and number_estimators = 100
2 | Multilayer Perceptron (MLP) | activation_function = tanh, hidden_layers = 3, neurons_per_layer = 100, learning_rate = adaptive, maximum_iterations = 1000, and solver = lbfgs
Table 11. Evaluation of the learning models.
# | Algorithm | Folds | Results | Avg. Accuracy
1 | GBM | 10 | [0.95713157, 0.95673647, 0.9589014, 0.95238095, 0.95751828, 0.9608773, 0.95573997, 0.95593756, 0.95692551, 0.95732069] | 0.956
2 | MLP | 10 | [0.98656657, 0.98636902, 0.98794705, 0.9832049, 0.98656392, 0.98755187, 0.9869591, 0.98676151, 0.9869591, 0.98794705] | 0.986
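As a quick check of the averages in Table 11, the per-fold results can be averaged directly. The fold means come out at about 0.9569 for GBM and 0.9867 for MLP, which match the reported 0.956 and 0.986 when truncated to three decimals (rounding would give 0.957 and 0.987). The helper name `truncate` is ours.

```python
import math
from statistics import mean

# Per-fold accuracies copied from Table 11.
gbm_folds = [0.95713157, 0.95673647, 0.9589014, 0.95238095, 0.95751828,
             0.9608773, 0.95573997, 0.95593756, 0.95692551, 0.95732069]
mlp_folds = [0.98656657, 0.98636902, 0.98794705, 0.9832049, 0.98656392,
             0.98755187, 0.9869591, 0.98676151, 0.9869591, 0.98794705]


def truncate(value, decimals=3):
    """Drop (not round) everything after the given number of decimals."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


for name, folds in [("GBM", gbm_folds), ("MLP", mlp_folds)]:
    avg = mean(folds)
    print(f"{name}: mean={avg:.6f} truncated={truncate(avg)} rounded={round(avg, 3)}")
```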
Table 12. Sample of observations and their predicting classes.
Hour | Speed | rpm | Accel. | Throttle Pos. | Eng. Temp. | Eng. Load Value | Heart Rate | Curr. Weather | Visib. | Precip. | Acdnt. on Site | Design Speed | Acdnt. That Time | Risk Level
151143251−0.6417.69423.5100rain3.2810901very high
152639341.4874.19622.7102rain3.28105906very high
152635401.876.19794.5102rain3.28105906very high
15 *8129500.22609458118rain3.2812803very high
164420990.4116.19222101cloudy8242488015very high
1612936940.1431.89187.596cloudy884800high
191023895049949184cloudy6.44.828701high
15782873−0.2734.19485.9118rain3.2810803high
16612223−0.1330.69491.8106cloudy82437601high
157621840.120.49457.394rain3.202549010high
1612635730.2965.19486.794cloudy887900medium
195844130.46409181.284cloudy6.4082903medium
158424020.2616.59121.6113cloudy805600medium
16782820076.99490.2104cloudy82411700medium
168424100.2543.19526.367mostly cloudy16.11.32539014medium
2012234660.4928.6927189cloudy8038901low
205441770.4740.49391.485cloudy802700low
159838320.5678.89263.9104cloudy8015901low
166623160.3418.89134.598cloudy8245900low
167426800.5482958074hazy sunshine16.102469016low
* See the explanation of this observation in Section 4.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Marcillo, P.; Arciniegas-Ayala, C.; Valdivieso Caraguay, Á.L.; Sanchez-Gordon, S.; Hernández-Álvarez, M. POLIDriving: A Public-Access Driving Dataset for Road Traffic Safety Analysis. Appl. Sci. 2024, 14, 6300. https://doi.org/10.3390/app14146300
