Informatics
  • Article
  • Open Access

30 January 2023

The Prediction of Road-Accident Risk through Data Mining: A Case Study from Setubal, Portugal

1 Portuguese Military Academy, Rua Gomes Freire, 1169-203 Lisbon, Portugal
2 Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
3 Military Academy Research Center (CINAMIL), Rua Gomes Freire, 1169-203 Lisbon, Portugal
4 Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics (LIBPhys-UC), 3000-370 Coimbra, Portugal

Abstract

This work proposes a tool to predict the risk of road accidents. The developed system consists of three steps: data selection and collection, preprocessing, and the use of mining algorithms. The data were imported from the Portuguese National Guard database, and they related to accidents that occurred from 2019 to 2021. The results allowed us to conclude that the highest concentration of accidents occurs during the time interval from 17:00 to 20:00, and that rain is the meteorological factor with the greatest effect on the probability of an accident occurring. Additionally, we concluded that Friday is the day of the week on which more accidents occur than on other days. These results are of importance to the decision makers responsible for planning the most effective allocation of resources for traffic surveillance.

1. Introduction

Road accidents cause multiple deaths each year and result in economic and physical damage to their victims; additionally, they incur the loss of public resources. Preventive action by the security forces has focused on what is known as Information-Guided Policing [1]. Since accident-related data are stored in the National Guard database, it is possible to discover patterns correlated with the occurrence of accidents and to create knowledge that is useful in decision-making. Data-mining techniques have evolved significantly in recent decades and are being widely applied to several real-world problems. Current data-mining methods can be used on a database to rapidly extract knowledge that can help to guide policing methods and thus improve accident-prevention techniques and awareness campaigns produced by the security forces.
This work aims to develop a tool to aid Information-Guided Policing in traffic management. Several data mining algorithms were applied to different types of datasets, including the National Guard database, which contains multiple accident reports. To complement the data provided by the National Guard, other publicly available databases were explored, such as meteorological data sources and the annual calendar.
This work is one of the limited number of research projects carried out by Portuguese researchers using data from the Portuguese National Guard to analyze and predict road accidents. One of the objectives of this work is to provide statistical and predictive information on traffic accidents for the National Guard and other researchers.
This investigation is original because, unlike other works that use categorical variables to identify the variables that most influence the severity of accidents, it sets out to predict the number of accidents likely to occur in a future time frame. One of the main objectives of this work is to make possible the prediction of accidents using categorical variables, combining a number of factors from past events with anticipated future data (e.g., meteorological conditions) to forecast the places where there will be a higher risk of traffic accidents occurring. A further objective of this work is to compare road accidents occurring prior to the COVID-19 pandemic with those occurring during the pandemic.
This work is divided into five sections. The first section introduces the theme explored in this work and describes related topics. The second section examines the state of the art and is divided into two parts: the first part analyzes classical classification methods, and the second part analyzes deep neural network methods. The third section presents the theoretical concepts, including knowledge discovery in databases and its steps: filtering and selecting the relevant data, preprocessing them for the data mining algorithms (mainly classifiers), and evaluating performance with appropriate metrics. In the fourth section the results are presented, analyzed, and compared in order to identify the most effective algorithm for the intended task. The fifth section sets out our conclusions and summarizes the main information extracted during this work.

3. Theoretical Framework

Current technology allows the storage of large and multiple databases. The analysis of these data is often useful; however, it is impractical without the aid of computational tools. The knowledge discovery in databases (KDD) process uses computational tools to identify valid and potentially useful patterns in the data and to generate knowledge [20,21,22,23,24]. Typically, this process includes the following steps:
Data selection/Problem definition: the domain of available data is defined, as are the information and data that are relevant and the knowledge-discovery objectives.
Preprocessing: this aims to prepare the data for the algorithms of the next stage. This involves performing data cleaning, data integration, data reduction, and data transformation/normalization.
Data Mining: the algorithms are applied to the data in search of knowledge and in order to extract patterns from the data. The choice of algorithm to be applied depends on the type of task to be performed.
Evaluation and representation of results: the models produced are interpreted, and evaluation metrics are used to estimate the quality of the results. Tools are used to visualize the data produced as output.
We aim to solve a regression problem in which the target variable is the number of accidents that occur on each road in a range of time periods. The learning is supervised, since annotated accident data are already available to train the model. The input data are categorical and the target variable is numeric.
Supervised learning occurs when data already have an associated output. As is the case with our data, we will only implement algorithms that fit this profile. For example, if the objective of a data mining problem is to predict male or female gender from the image of a face, it is necessary to have a set of faces with the gender already correctly identified. It is important to distinguish regression problems, where the data for which we want to predict the value are numerical values, from classification problems, where the data are categorical values [23,25,26,27,28].
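The regression target described above (the number of accidents for each combination of categorical inputs) can be illustrated with a minimal sketch. The records, the road labels, and the field layout below are hypothetical placeholders, not the attributes of the National Guard database.

```python
from collections import Counter

# Hypothetical accident records: (road, day of week, time interval).
# These values are illustrative only and do not come from the dataset.
records = [
    ("road_1", "Fri", "17-20"),
    ("road_1", "Fri", "17-20"),
    ("road_1", "Sat", "08-11"),
    ("road_2", "Fri", "17-20"),
]


def count_accidents(rows):
    """Map each categorical combination to its number of accidents."""
    return Counter(rows)


# Each key is a categorical input; each value is the numeric target.
targets = count_accidents(records)
for key, n in sorted(targets.items()):
    print(key, n)
```

Each distinct combination becomes one training example whose label is a count, which is what makes the task a regression problem rather than a classification problem.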
Different techniques were analyzed in [26] and it was concluded that decision trees, naive Bayes, and support vector machines are the most frequently used techniques. Other frequently used supervised learning algorithms are k-nearest neighbors (kNN) [25,26,27,29] and the artificial neural network (ANN) [25,27,30]. Based on this information, these algorithms were implemented.
The most important attributes for road traffic accidents [5,31,32,33,34,35,36,37] were divided into three groups and listed in Table 1.
Table 1. Attributes considered important for traffic accidents that were found in the literature.
For the selection of attributes, it is important to analyze the correlation between the different variables and the target variable. The Pearson correlation coefficient is often used to compute the linear correlation between continuous numeric variables. However, a different metric is needed to compute the correlation between categorical variables, as is the case with our dataset. Cramer's V is used to compute the correlation between nominal categorical variables with more than two (non-binary) values [38].
Cramer's V is defined as [39]:
$$c = \sqrt{\frac{X^{2}}{N\,(k-1)}}$$
where $c$ is the value of Cramer's V, $X^{2}$ is the chi-square value, $N$ is the number of samples, and $k$ is the number of categories of the variable with the smallest number of categories. The chi-square value is defined as:
$$X^{2} = \sum_{i,j} \frac{(o_{ij} - e_{ij})^{2}}{e_{ij}}$$
where $e_{ij}$ is the expected frequency and $o_{ij}$ is the observed frequency of a combination of two values, one of variable $i$, the other of variable $j$. The expected frequency can be computed as
$$e_{ij} = \frac{o_{i} \cdot o_{j}}{N}$$
and represents the expected frequency of a combination of two values (one of $i$, the other of $j$). In the previous formula, $o_{i}$ is the marginal frequency of one of the values of the variable $i$, $o_{j}$ is the marginal frequency of one of the values of $j$, and $N$ is the total number of samples.
The interpretation of the strength of the correlation between two nominal categorical variables as a function of Cramer’s V is given in Table 2 [36].
Table 2. Interpretation of Cramer’s V coefficient.
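As an illustration of the definition of Cramer's V given above, the following sketch computes it from a contingency table of observed frequencies. It is a minimal example, assuming that every row and column contains at least one observation (otherwise an expected frequency would be zero); the example tables are invented.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)           # total number of samples N
    row_tot = [sum(row) for row in table]        # marginal frequencies o_i
    col_tot = [sum(col) for col in zip(*table)]  # marginal frequencies o_j
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o_ij in enumerate(row):
            e_ij = row_tot[i] * col_tot[j] / n   # expected frequency
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    k = min(len(row_tot), len(col_tot))          # smallest number of categories
    return math.sqrt(chi2 / (n * (k - 1)))


# Invented tables: perfect association and complete independence.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```

The two invented tables mark the extremes of the scale: a table in which each value of one variable occurs with exactly one value of the other gives the maximum value, and a table with identical rows gives zero.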
To achieve a universal standard for deleting attributes with low correlation values, it is important that all calculated correlations be comparable. The Kruskal-Wallis statistic is equivalent to the chi-square statistic that is also used in Cramer's V, so the values obtained with the two measures can be compared. The expression for the Kruskal-Wallis test [29,40,41] is given by:
$$H = (N-1)\,\frac{\sum_{i=1}^{g} n_{i}\,(\bar{r}_{i} - \bar{r})^{2}}{\sum_{i=1}^{g}\sum_{j=1}^{n_{i}} (r_{ij} - \bar{r})^{2}}$$
where $N$ is the total number of samples across all groups, $g$ is the number of groups, $n_{i}$ is the number of samples in group $i$, $r_{ij}$ is the rank (among all samples) of sample $j$ in group $i$, $\bar{r}_{i} = \frac{1}{n_{i}}\sum_{j=1}^{n_{i}} r_{ij}$ is the mean rank of all observations $j$ in group $i$, and $\bar{r} = \frac{1}{2}(N+1)$ is the mean of all ranks $r_{ij}$, i.e., the expected value of the mean rank of each group.
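The Kruskal-Wallis expression can be checked numerically with the short sketch below. It follows the formula term by term, assigns tied values the mean of their ranks, and assumes that the pooled values are not all identical (otherwise the denominator is zero). The function names and the sample groups are illustrative.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic computed from the rank-based expression given above."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    r_bar = (n_total + 1) / 2                    # mean of all ranks
    numerator = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        r_bar_i = sum(group_ranks) / len(g)      # mean rank of group i
        numerator += len(g) * (r_bar_i - r_bar) ** 2
        start += len(g)
    denominator = sum((r - r_bar) ** 2 for r in ranks)
    return (n_total - 1) * numerator / denominator


# Invented example: accident counts observed under two categories.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
print(h)
```

In this setting, each group would hold the accident counts observed for one value of a nominal variable, so a larger H indicates that the counts differ more between categories.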
Relief-based feature selection (RBA) and sequential backward selection (SBS) were used for the selection of features [42,43,44,45]. Starting from an empty set of features, the SBS gradually adds features chosen according to a performance measure that quantifies the extent to which each feature improves or worsens the result of a mining method. At each iteration, the feature to be added is chosen from the candidate features that have not yet been selected.
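The greedy search described above, which starts from an empty set and adds one feature per iteration, can be sketched as follows. This is not the implementation used in this work: the function name, the feature names, and the scoring function are hypothetical, and in practice the score would come from evaluating a mining method (for example, a negated validation error).

```python
def sequential_selection(features, score, n_select):
    """Greedily add the feature that most improves score(selected)."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        # Candidate that gives the best score when added to the current set.
        best = max(remaining, key=lambda f: score(selected + [f]))
        if selected and score(selected + [best]) <= score(selected):
            break  # no remaining feature improves the score
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical usefulness of each feature, with a small cost per feature.
WEIGHTS = {"time_of_day": 0.50, "weather": 0.30, "weekday": 0.15, "holiday": 0.01}


def toy_score(subset):
    """Stand-in for a model-based performance measure (assumption)."""
    return sum(WEIGHTS[f] for f in subset) - 0.05 * len(subset)


chosen = sequential_selection(list(WEIGHTS), toy_score, n_select=4)
print(chosen)
```

With this toy score, the last candidate adds less than it costs, so the search stops before all four features are selected.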
To evaluate the different mining algorithms, we use the mean absolute error (MAE), which is an error measurement that averages the absolute error between the observations and the values obtained by the model. The mean squared error was not used, because the number of accidents has many outliers that significantly bias this metric. The MAE is given by the following equation:
$$MAE = \frac{1}{n}\sum_{j=1}^{n} \left| \bar{y}_{j} - y_{j} \right|$$
where $\bar{y}_{j}$ is the value obtained by the model for sample $j$, $y_{j}$ is the corresponding observation, and $n$ is the number of samples.
As the purpose of this work is to present the risk of accidents rather than to predict the exact value of accidents, the predicted values and the actual values are grouped into three risk groups: low, medium, and high. After making this grouping we can compute the classification accuracy:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
The classification accuracy measures the ratio of correct predictions to the total number of instances evaluated, where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives [29].
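The two evaluation steps can be illustrated with a small sketch: the MAE on the predicted counts, followed by the accuracy after grouping counts into risk classes. The class boundaries used here are hypothetical placeholders (the intervals actually used are those of Table 5), and the observed and predicted values are invented.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def risk_class(count, low_max=1, medium_max=3):
    """Map a number of accidents to a risk class (hypothetical boundaries)."""
    if count <= low_max:
        return "low"
    if count <= medium_max:
        return "medium"
    return "high"


def class_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted risk class matches the actual one."""
    hits = sum(risk_class(t) == risk_class(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


observed = [0, 2, 5, 1]            # invented accident counts
predicted = [0.4, 2.6, 3.0, 1.8]   # invented model outputs

print(mean_absolute_error(observed, predicted))
print(class_accuracy(observed, predicted))
```

With three classes, the accuracy is simply the share of samples assigned to the correct class, which reduces to the expression above when there are only two classes.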

4. Results and Discussion

In this section, the results produced by the methodology are presented and analyzed. When more than one technique is presented, these are compared in order to assess which technique best suits the task in question.
The implemented methodology was developed on a Lenovo IdeaPad 3 computer with an AMD Ryzen 5 5500U processor and AMD Radeon graphics. The Python language was used through Jupyter in Anaconda. The Python libraries used were Keras, NumPy, Scikit-learn, Matplotlib, and Pandas.

4.1. Dataset

The data provided by the Portuguese National Guard correspond to the years 2019 to 2021 in the district of Setubal, on the periphery of Lisbon, the capital of Portugal. This information includes road accidents and also data on administrative offenses, which contain the number of inspections carried out, the number of drivers who had consumed excessive alcohol, the number of drivers who were speeding, and other administrative offenses. In the present work, only information relating to traffic accidents was considered relevant. With regard to data selection, Table 3 presents all the attributes selected from the accident reports.
Table 3. Selected attributes from the National Guard Database.
The data distributed among several time intervals is presented in Figure 1.
Figure 1. Box plots of the frequency of accidents that occurred at the different time intervals (according to the TIME field of Table 3).
With regard to meteorological factors, it was possible to categorize the accidents according to the different weather conditions in which they occurred. Let P(C) denote the probability of rain and P(A) the probability of having an accident. The graph in Figure 2 allows us to extract the probability that it was raining given that an accident occurred, i.e., P(C|A). This makes it possible to compare the probability of having an accident in rainy weather conditions, P(A|C), with the probability of having an accident in fine weather conditions, P(A|B), where B denotes fine weather. To make this comparison, the month of December was used as an example. Thus, P(C|A) for the month of December is given by:
$$P(C \mid A) = \frac{144 + 126 + 132}{404 + 300 + 346} = 0.38$$
where the numbers in the numerator are the numbers of accidents on rainy days in December 2019, 2020, and 2021, and the numbers in the denominator are the total numbers of accidents in the same months.
Figure 2. Number of accidents grouped by month, year, and type of weather condition.
As the average number of rainy days for the month of December in the Setubal district is 8.5 (information extracted from the Weather Spark website, https://pt.weatherspark.com/y/32195/Clima-caracter%C3%ADstico-em-Set%C3%BAbal-Portugal-durante-o-ano#Sections-Precipitation, accessed on 1 November 2022), we have:
$$P(C) = \frac{8.5}{31} = 0.27$$
Using Bayes’ theorem, we can compute:
$$P(A \mid C) = \frac{P(C \mid A) \times P(A)}{P(C)} = 1.4 \times P(A)$$
Using the same procedure for good weather, it can be concluded that the probability of an accident when it is raining is greater than the probability of an accident when the weather is good:
$$P(A \mid B) = 0.85 \times P(A) < 1.4 \times P(A) = P(A \mid C)$$
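The arithmetic behind this comparison can be reproduced with a few lines of Python. The accident counts and the average number of rainy days are the December values quoted above; treating fine weather B as the complement of rain C is an assumption of this sketch, made because it reproduces the factor 0.85.

```python
# December accident counts on rainy days and in total (2019, 2020, 2021).
rainy_accidents = [144, 126, 132]
total_accidents = [404, 300, 346]

# P(C|A): share of December accidents that happened on rainy days.
p_c_given_a = sum(rainy_accidents) / sum(total_accidents)

# P(C): average of 8.5 rainy days out of the 31 days of December.
p_c = 8.5 / 31

# Bayes' theorem: P(A|C) = factor_rain * P(A).
factor_rain = p_c_given_a / p_c

# Assumption: fine weather B is the complement of rain C.
factor_fine = (1 - p_c_given_a) / (1 - p_c)

print(f"P(C|A) = {p_c_given_a:.2f}, P(C) = {p_c:.2f}")
print(f"P(A|C) = {factor_rain:.2f} x P(A)")
print(f"P(A|B) = {factor_fine:.2f} x P(A)")
```

Because P(A) multiplies both results, the comparison does not require the absolute probability of an accident: only the two factors matter.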
The location type of the accident (i.e., inside or outside an urban region) was grouped by month and year.
Figure 3 represents the number of traffic accidents grouped by day of the week and by year.
Figure 3. Grouping of the number of accidents by day of the week and year.
Figure 3 shows that Friday is the day of the week with the most accidents (it is also the day on which most traffic congestion occurs [46]), and that Saturday and Sunday are the days with the fewest accidents.
Figure 4 presents data on road accidents grouped by month to facilitate the comparison of accidents between different years but in the same month.
Figure 4. Number of monthly accidents before COVID-19 (2019) and during the COVID-19 pandemic (2020 and 2021).
The information presented in Figure 4 shows an approximate average value of 550 accidents per month in 2019. In early 2020, COVID-19 expanded worldwide, leading to a pandemic being declared in March 2020, and many countries declared a lockdown that affected almost all activities. As a result, the number of accidents reached a minimum in April 2020, with fewer than 200 accidents. The monthly number of accidents gradually increased, with slight reductions in November 2020 and February 2021 due to government measures to encourage remote working, owing to concern about the peaks in the prevalence of the disease in Europe.

4.2. Selection of Attributes

Correlation values were computed for pairs consisting of a nominal categorical variable and a numerical variable (using the Kruskal-Wallis test) and for pairs of nominal categorical variables (using Cramer's V).
In Figure 5 we can see that the variables most strongly correlated with the variable "counting", which represents the accident count, are the time of day, the type of place, the location, and the meteorological factors. The type of accident, which here represents its severity, was included only to check whether the severity of the accident is correlated with the number of accidents that occurred; the correlation for this pair of variables is low (0.18).
Figure 5. Correlations of Cramer V and Kruskal-Wallis test for the most relevant pairs of variables.
RBA and SBS were applied to feature selection using only motorway data, since it was concluded that a credible accident-prediction model can be obtained only for motorways. Although the results depend on the classification algorithm used, the two algorithms agreed on several attributes (see Table 4).
Table 4. Relevance of features for the creation of predictive models obtained with RBA and SBS for incidents that occur on motorways.

4.3. Data Mining

Owing to the importance of the location of accidents, it was decided to group the data by their location: motorways; national roads or itineraries; and village roads. To achieve an accident-risk evaluation, we decided to divide the risk into classes, as shown in Table 5.
Table 5. Intervals of number of accidents that correspond to the different classes of risk.
The data were grouped according to accidents on motorways, on village roads, on itineraries, on national roads, and in municipalities. It was decided to divide the classes using the intervals defined in Table 5. The purpose of keeping the classification range was to facilitate an understanding of the behavior of each individual model for each type of road.
Accidents on motorways represented 9.3% of all accidents; accidents on itineraries or national roads represented 30% of all accidents; and accidents outside the previous two categories, including those in village streets, represented 60.7% of all accidents.
Starting with the motorway dataset, the best models produced for each algorithm according to the metrics used are represented in Table 6, Table 7 and Table 8.
Table 6. Results for motorways: 9.3% of total accidents.
Table 7. Results for itineraries or national roads: 30.0% of total accidents.
Table 8. Results for village roads: 60.7% of total accidents.
This option was chosen based on the information set out in the box plots shown in Figure 6. As can be seen, the variance of the values is greater for the datasets relating to motorways and village roads.
Figure 6. Box plots for the values of the frequency of accidents on the different types of roads.
In the dataset of itineraries or national roads, there is, in most cases, only one accident in each of the time intervals; therefore, a higher frequency of accidents would be necessary for the model to be of use.
From Table 9 we see that the motorway, despite being the location with the lowest number of accidents (for the district of Setubal), is the location with the highest concentration of accidents per area when compared with villages, and with the highest concentration of accidents per individual motorway when compared with the concentration of accidents per individual national road. The motorway is the type of road where there are more injuries and deaths per accident; this can be seen in Table 10.
Table 9. Summary of the best results from Table 6, Table 7 and Table 8.
Table 10. Information relating to the number of injuries and deaths per accident.
Additionally, the motorway is the location where it is possible for the National Guard to carry out more effective surveillance; village roads, by contrast, are extremely numerous, and there is a large area where accidents can occur.

5. Conclusions

In this work, data-mining methods for the prediction of the risk of road accidents were analyzed. Data on accident reports were made available by the National Guard and related to accidents that occurred in the Setubal region from 2019 to 2021. We describe the process followed to develop accident-prediction methods. This process consists of three modules: (i) data selection and collection, (ii) pre-processing, and (iii) the use of mining algorithms.
Through a preliminary data analysis, it was concluded that the highest concentration of accidents is seen between 17:00 and 20:00. It was also possible to conclude that rain is the meteorological factor that most increases the risk of an accident. A further conclusion is that Friday is the day of the week on which the most accidents occur. These conclusions are consistent with the literature [47].
Through an analysis of the correlation between the different variables, it was possible to conclude that location is the variable that most influences the frequency of accidents. Following on from this conclusion, the information characterizing the accidents was grouped according to the type of road where the accidents occurred. For this reason, it was necessary to create different models for each set. In addition to the location, the correlation between variables also highlighted other factors that influenced the frequency of accidents, such as the time of day, the meteorological conditions, and whether the accident occurred in a village or elsewhere. After dividing the data set into the three types of location (motorways, national roads or itineraries, and villages), it was possible, using the feature-selection algorithms, to understand which features most influence each type of accident location.
The data-mining problem was approached as a regression problem, since the target variable was the frequency of accidents in the defined time range. The mining algorithms tested were kNN, simple linear regression, Lasso and Ridge, the Decision Tree for regression, and the traditional neural network, both for the initial dataset and for the datasets divided by location in the following sets: motorways, national roads or itineraries, and villages. The best result was achieved through the neural network. However, for each set, different models were produced, with different architectures (number of nodes, training periods, etc.). The best result occurred for the motorway dataset. The motorway, despite being the location with the lowest number of accidents, is the one with the highest density of accidents per area when compared with villages; it also features the highest density of accidents per road, when compared with the concentration of accidents on national routes or roads. In addition, the motorway is the location where there are more injuries and deaths per accident. The motorway is also the location where it is possible for the National Guard to carry out more effective surveillance, since in the villages there are a large number of roads, and consequently there is a vast area where accidents can occur; however, the density of accidents on village roads is low.
This work is of value because good results were obtained for the prediction of the risk of accidents on motorways using only variables that can be known or forecast for a future time frame. For example, it is possible today to make a weather forecast for the next week; we can distinguish the different days of the week in the future; we know which days will be holidays, etc. By using input data that relate only to future events, we are able to obtain an accident-risk result for a day in the future and thus enable the police to improve their forward planning.
In future work, the first step would be to improve data collection to ensure that the geolocation of accidents was acquired, making it possible to opt for more complex approaches. Another important variable to obtain would be the level of human mobility; it would be possible to acquire this by using applications such as Google Maps or Waze, or simply by recording the speed at which Uber taxis or other companies’ vehicles travel.

Author Contributions

J.S.S. and A.B. proposed the idea and concept; D.D. developed the software under the supervision of J.S.S. and A.B.; all authors revised and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Military Academy Research Center (CINAMIL) and by FCT through the projects UID/FIS/04559/2019, HAVATAR (PTDC/EEI-ROB/1155/2020) and LARSyS (UIDB/50009/2020).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this work were imported from the private National Guard database. Researchers who intend to use these data in the future must submit a written request for access to the database to the National Guard, which will decide on a case-by-case basis.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hengst, M.D.; Mors, J.T. Community of Intelligence: The Secret Behind Intelligence-Led Policing. In Proceedings of the 2012 European Intelligence and Security Informatics Conference, Odense, Denmark, 22–24 August 2012; pp. 22–29.
  2. Castro, Y.; Kim, Y.J. Data mining on road safety: Factor assessment on vehicle accidents using classification models. Int. J. Crashworthiness 2016, 21, 104–111.
  3. Kashyap, J.; Chandra, A.; Singh, P. Mining Road Traffic Accident Data to Improve Safety on Road-related Factors for Classification and Prediction of Accident Severity. Int. Res. J. Eng. Technol. 2016, 10, 2395–2456.
  4. Hussain, S.; Muhammad, L.J.; Ishaq, F.S.; Yakubu, A.; Mohammed, I.A. Performance evaluation of various data mining algorithms on road traffic accident dataset. Smart Innov. Syst. Technol. 2019, 106, 67–78.
  5. Kumeda, B.; Zhang, F.; Zhou, F.; Hussain, S.; Almasri, A.; Assefa, M. Classification of road traffic accident data using machine learning Algorithms. In Proceedings of the 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN), Chongqing, China, 12–15 June 2019; pp. 682–687.
  6. Chen, Q.; Song, X.; Yamada, H.; Shibasaki, R. Learning deep representation from big and heterogeneous data for traffic accident inference. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI 2016), Phoenix, AZ, USA, 21 February 2016; pp. 338–344.
  7. Yuan, Z.; Zhou, X.; Yang, T. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, United Kingdom, 19 August 2018; Volume 18, pp. 984–992.
  8. Krukowicz, T.; Firląg, K.; Chrobot, P. Spatiotemporal analysis of road crashes with animals in Poland. Sustainability 2022, 14, 1253.
  9. Billah, K.; Sharif, H.O.; Dessouky, S. How Gender Affects Motor Vehicle Crashes: A Case Study from San Antonio, Texas. Sustainability 2022, 14, 7023.
  10. Saveliev, A.; Lebedeva, V.; Lebedev, I.; Uzdiaev, M. An approach to the automatic construction of a road accident scheme using UAV and deep learning methods. Sensors 2022, 22, 4728.
  11. Tajnik, S.; Luin, B. Impact of Driver, Vehicle, and Environment on Rural Road Crash Rate. Sustainability 2022, 14, 15744.
  12. Bokaba, T.; Doorsamy, W.; Paul, B.S. Comparative study of machine learning classifiers for modelling road traffic accidents. Appl. Sci. 2022, 12, 828.
  13. Islam, M.K.; Gazder, U.; Akter, R.; Arifuzzaman, M. Involvement of Road Users from the Productive Age Group in Traffic Crashes in Saudi Arabia: An Investigative Study Using Statistical and Machine Learning Techniques. Appl. Sci. 2022, 12, 6368.
  14. Islam, M.K.; Reza, I.; Gazder, U.; Akter, R.; Arifuzzaman, M.; Rahman, M.M. Predicting Road Crash Severity Using Classifier Models and Crash Hotspots. Appl. Sci. 2022, 12, 11354.
  15. Mesquitela, J.; Elvas, L.B.; Ferreira, J.C.; Nunes, L. Data Analytics Process over Road Accidents Data: A Case Study of Lisbon City. ISPRS Int. J. Geo-Inf. 2022, 11, 143.
  16. Guido, G.; Shaffiee Haghshenas, S.; Shaffiee Haghshenas, S.; Vitale, A.; Astarita, V.; Park, Y.; Geem, Z.W. Evaluation of Contributing Factors Affecting Number of Vehicles Involved in Crashes Using Machine Learning Techniques in Rural Roads of Cosenza, Italy. Safety 2022, 8, 28.
  17. Kim, H.; Kim, J.-T.; Shin, S.; Lee, H.; Lim, J. Prediction of Run-Off Road Crash Severity in South Korea’s Highway through Tree Augmented Naïve Bayes Learning. Appl. Sci. 2022, 12, 1120.
  18. Rodionova, M.; Skhvediani, A.; Kudryavtseva, T. Prediction of crash severity as a way of road safety improvement: The case of Saint Petersburg, Russia. Sustainability 2022, 14, 9840.
  19. Infante, P.; Jacinto, G.; Afonso, A.; Rego, L.; Nogueira, V.; Quaresma, P.; Saias, J.; Santos, D.; Nogueira, P.; Silva, M. Comparison of statistical and machine-learning models on road traffic accident severity classification. Computers 2022, 11, 80.
  20. Goldschmidt, R.; Passos, E.; Bezerra, E. Data Mining, Conceitos Técnicas, Algoritmos, Orientações e Aplicações; Elsevier: Rio de Janeiro, Brazil, 2015.
  21. Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P. Knowledge Discovery and Data Mining: Towards a Unifying Framework. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 82–88.
  22. Hendrickx, T.; Cule, B.; Meysman, P.; Naulaerts, S.; Laukens, K.; Goethals, B. Mining association rules in graphs based on frequent cohesive itemsets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, 19–22 May 2015; pp. 637–648.
  23. Agarwal, S. Data mining: Data mining concepts and techniques. In Proceedings of the 2013 International Conference on Machine Intelligence and Research Advancement, Katra, India, 21 December 2013; pp. 203–207.
  24. Zhang, S.; Zhang, C.; Yang, Q. Data preparation for data mining. Appl. Artif. Intell. 2003, 17, 375–381.
  25. Mueller, J.P.; Massaron, L. Deep Learning for Dummies; John Wiley & Sons: Hoboken, NJ, USA, 2019.
  26. Berry, M.W.; Mohamed, A.; Yap, B.W. Supervised and Unsupervised Learning for Data Science; Springer: Cham, Switzerland, 2020.
  27. Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2014.
  28. Sen, P.C.; Hajra, M.; Ghosh, M. Emerging Technology in modelling and graphics. Singap. Springer Singap. 2020, 937, 99.
  29. Belanche, L.A.; González, F.F. Review and evaluation of feature selection algorithms in synthetic problems. arXiv 2011, arXiv:1101.2320.
  30. Indrakumari, R.; Poongodi, T.; Singh, K. Introduction to Deep Learning. In Advanced Deep Learning for Engineers and Scientists; Springer: Cham, Switzerland, 2021; pp. 1–22.
  31. Eisenberg, D. The mixed effects of precipitation on traffic crashes. Accid. Anal. Prev. 2004, 36, 637–647.
  32. Hayat, R.B.; Debbarh, M.; Antoniou, C.; Yannis, G. Explaining the road accident risk: Weather effects. Accid. Anal. Prev. 2013, 1, 456–465.
  33. Tamerius, J.D.; Zhou, X.; Mantilla, R.; Greenfield-Huitt, T. Precipitation effects on motor vehicle crashes vary by space, time, and environmental conditions. Weather Clim. Soc. 2016, 8, 399–407.
  34. Febres, J.D.; García-Herrero, S.; Herrera, S.; Gutiérrez, J.M.; López-García, J.R.; Mariscal, M.A. Influence of seat-belt use on the severity of injury in traffic accidents. Eur. Transp. Res. Rev. 2020, 12, 1–12.
  35. Musile, G.; Pigaiani, N.; Sorio, D.; Colombari, M.; Bortolotti, F.; Tagliaro, F. Alcohol-associated traffic injuries in Verona territory: A nine-year survey. Med. Sci. Law 2021, 61, 7–13.
  36. Song, Y.; Kou, S.; Wang, C. Modeling crash severity by considering risk indicators of driver and roadway: A Bayesian network approach. J. Saf. Res. 2021, 76, 64–72.
  37. Martín-delosReyes, L.M.; Martínez-Ruiz, V.; Rivera-Izquierdo, M.; Jiménez-Mejías, E.; Lardelli-Claret, P. Is driving without a valid license associated with an increased risk of causing a road crash? Accid. Anal. Prev. 2021, 149, 1–7.
  38. Zhang, Z.; McDonnell, K.T.; Zadok, E.; Mueller, K. Visual correlation analysis of numerical and categorical data on the correlation map. IEEE Trans. Vis. Comput. Graph. 2015, 21, 289–303.
  39. Bhattacharya, A.; Dunson, D.B. Simplex factor models for multivariate unordered categorical data. J. Am. Stat. Assoc. 2012, 107, 362–377.
  40. Leon, A.C. Descriptive and Inferential Statistics. Compr. Clin. Psychol. 1998, 3, 243–285.
  41. Sun, J. The Microbiome in Health and Disease Preface; Academic Press: Cambridge, MA, USA, 2020; Volume 171, pp. XV–XVI.
  42. Urbanowicz, R.J.; Meeker, M.; La Cava, W.; Olson, R.S.; Moore, J.H. Relief-based feature selection: Introduction and review. J. Biomed. Inform. 2018, 85, 189–203.
  43. Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
  44. Marcano-Cedeño, A.; Quintanilla-Domínguez, J.; Cortina-Januchs, M.; Andina, D. Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network. In Proceedings of the IECON 2010, 36th Annual Conference on IEEE Industrial Electronics Society, Glendale, AZ, USA, 7 November 2010; pp. 2845–2850.
  45. Molina, L.C.; Belanche, L.; Nebot, À. Feature selection algorithms: A survey and experimental evaluation. In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, 9 December 2002; pp. 306–313.
  46. SeguroPorDias. O Congestionamento nas Estradas da Cidade do Porto (Congestion on the Roads of the City of Porto). Available online: https://seguropordias.pt/blog/tr%C3%A2nsito-porto-portugal (accessed on 29 December 2022).
  47. Ren, H.; Song, Y.; Wang, J.; Hu, Y.; Lei, J. A Deep Learning Approach to the Citywide Traffic Accident Risk Prediction. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3346–3351.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
