Development of a Binary Classification Model to Assess Safety in Transportation Systems Using GMDH-Type Neural Network Algorithm

Evaluating road safety is an enduring research topic in Infrastructure and Transportation Engineering. Predicting crash risk is important for preventing further crashes and safeguarding road users. In this context, knowing the number of vehicles involved in an accident contributes greatly to safety analysis; hence, it is necessary to predict it. The main aim of this study is to develop a binary model for predicting the number of vehicles involved in an accident using Neural Networks and the Group Method of Data Handling (GMDH). For this purpose, 775 accident cases from the urban and rural areas of Cosenza in southern Italy were recorded and evaluated. Daylight, Weekday, Type of accident, Location, Speed limit and Average speed were considered as input data, and the number of vehicles involved in an accident was considered as output. A total of 581 cases were selected randomly from the dataset to train the binary model, and the rest were used to test it. A confusion matrix and a Receiver Operating Characteristic curve were used to investigate the performance of the proposed model. According to the obtained results, the accuracy values of the prediction model were 83.5% and 85.7% for testing and training, respectively. It can be concluded that the developed binary model can be applied as a reliable tool for predicting the number of vehicles involved in an accident.


Introduction
Since the 20th century, road safety researchers considered accidents as unexpected and unpredictable events [1]. This fatalist notion was overcome by the scientific concept that tries to detect the potential factors that affect the likelihood of road accident occurrence [2]. Traffic safety analysis was traditionally based on historic crash data, which has several shortcomings due to the limited availability, unreliability and poor quality of collision data [3,4]. Many scientists have made considerable efforts to analyze the impacts of various risk factors [5][6][7][8][9] and road safety measures [10][11][12]. For this reason, they have developed a great number of statistical methodologies to approach crash prediction problems [13]. Mathematical models have been the most popular technique for analyzing crash data [14]. The most commonly used methods are based on Logistic Regression [15][16][17][18][19][20][21]; Ordered Choice Models, for severity modeling of crash injury data [22][23][24][25]; Bayesian Hierarchical Models [26][27][28][29][30][31][32]; Bivariate Models [33]; Nested Logit Models [34]; Multinomial Logit Models [35][36][37][38][39], to address the heterogeneity of crash outcomes; or Mixed Logit Models [40][41][42][43][44][45], to analyze crash injury severities. [...] and predict crash risk. For this purpose, they used Artificial Intelligence and Volume, Speed, and Sensor Occupancy data collected from roadside radar sensors along an Interstate in Iowa. Similar to this research, Xie et al. [76] utilized rich information generated from connected vehicles to obtain surrogate safety measures (SSMs) for risk identification. In particular, they proposed time to collision with disturbance (TTCD) for risk identification in order to capture rear-end conflict risks in various car-following scenarios, even when the leading vehicle has a higher speed.
As can be seen from the literature review, the Artificial Neural Network (ANN) methodology is a robust tool for investigating complex phenomena without assuming any preliminary hypotheses on the model. The main aim of this research is to develop a binary model for predicting the number of vehicles involved in an accident through the use of Neural Networks and the Group Method of Data Handling (GMDH). The authors, applying a multi-scale approach, collected and evaluated 775 accident cases from urban and rural areas in the Province of Cosenza, in southern Italy. Several notable parameters were considered as input data of the model, including Daylight, Weekday, Type of accident, Location, Speed limit and Average speed. The number of vehicles involved in an accident was considered as output. For the training stage, 581 accident cases were selected randomly from the dataset. The rest were used to test the developed binary model. A confusion matrix and a Receiver Operating Characteristic curve were used to investigate the performance of the proposed model.
The paper is organized as follows: the methodology is presented in Section 2, with the theoretical description of the GMDH type of neural network, the functional form of the binary model and a correlation analysis among data; in Section 3, a case study is described, binary classification models are constructed and the best model is selected; the results of the best model are discussed in Section 4; and in Section 5, the conclusion is presented and some recommendations for future studies are suggested.

Methods
To predict the number of vehicles involved in an accident, a binary model was developed. The model is based on Neural Networks and, in particular, makes use of the Group Method of Data Handling (GMDH) technique. In this study, 775 accident cases were analyzed, employing a portion of the database for the training phase and the rest for testing the binary model. The performance of the proposed model was investigated using a confusion matrix and a Receiver Operating Characteristic curve. The flowchart of the research steps is shown in Figure 1.

Group Method of Data Handling (GMDH) Type of Neural Network
To assess complex problems and systems, artificial intelligence and machine learning methods can be applied as a powerful alternative to classical methods. These methods are widely used in a variety of scientific fields and have played a vital role in the development of the sciences [77][78][79][80][81][82]. As one of the most important artificial intelligence and machine learning methods, the Group Method of Data Handling (GMDH) type of neural network is a reliable computer-based mathematical modeling tool for identifying and assessing complex phenomena. GMDH belongs to the family of Inductive Algorithms and was first introduced by Ivakhnenko [83,84]. He proposed that an iterative and incremental algorithm could be used instead of building the estimation model all at once. This approach can tolerate imprecision and uncertainty and can deal with the vagueness of complex and unstructured systems to reach reliable modeling. In this approach, polynomial neurons are produced as simple structures and added step by step, and a complex system is then formed by combining these simple structures. Natural selection patterns, as in evolutionary algorithms, and gradual model construction give this approach an advantage over classical regression methods in obtaining a high-order input and output relationship [85]. The Polynomial Neural Network (PNN) is known as one of the most basic and important algorithms for building a GMDH model. The general form of GMDH is a self-organized, feed-forward (unilateral) neural network that maps input data to output data; this map is also called the Ivakhnenko polynomial. The basic neural network map is based on Equation (1) [86,87].
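In its commonly cited form, the Ivakhnenko (Kolmogorov-Gabor) polynomial can be written as below. The coefficient symbols $a_0$, $a_i$, $a_{ij}$ and $a_{ijk}$ are notation assumed here for illustration.

```latex
y = a_0 + \sum_{i=1}^{m} a_i X_i
      + \sum_{i=1}^{m} \sum_{j=1}^{m} a_{ij} X_i X_j
      + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} a_{ijk} X_i X_j X_k + \cdots
```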
where m indicates the amount of data for values X_1, X_2, X_3, ..., X_m for an output such as y.
By combining the quadratic polynomials of all the neurons based on Equation (2), output ŷ with an approximate function f̂ for a set of inputs such as X = (X_i1, X_i2, X_i3, ..., X_im) with the least possible error compared to output y was obtained [88].
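A common choice for the partial description computed by a single two-input neuron is the quadratic polynomial below, whose six coefficients are usually fitted by least squares. The pairing of inputs $X_i$, $X_j$ and the coefficient names $a_0, \dots, a_5$ are assumptions made for illustration.

```latex
\hat{y} = \hat{f}(X_i, X_j)
        = a_0 + a_1 X_i + a_2 X_j + a_3 X_i X_j + a_4 X_i^2 + a_5 X_j^2
```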
GMDH is made up of several layers. Data are initially entered in the first layer and, after processing and combination, are passed to the second layer as new inputs. This process continues, and it stops when the algorithm reaches an optimal convergence in layer (n + 1) compared to layer (n). According to Figure 2, the data set is divided randomly into two parts: a training part and a testing (checking) part [89].
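The layer-wise procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes two-input quadratic neurons fitted by least squares on the training part and ranked by RMSE on the checking part, and it replaces the selection-pressure rule with a plain "keep the best `max_neurons`" rule. All function names and the toy data are hypothetical.

```python
import itertools
import random


def quad_features(x1, x2):
    """Terms of the two-input quadratic polynomial of one neuron."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_neuron(rows, y, i, j, ridge=1e-8):
    """Least-squares fit of one neuron on inputs i and j (normal equations).

    The small ridge term is an assumption added to keep the system solvable
    when features are collinear (for example, binary inputs).
    """
    phi = [quad_features(r[i], r[j]) for r in rows]
    k = len(phi[0])
    ata = [[sum(p[a] * p[b] for p in phi) + (ridge if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    aty = [sum(p[a] * t for p, t in zip(phi, y)) for a in range(k)]
    return solve(ata, aty)


def predict(w, rows, i, j):
    return [sum(wk * fk for wk, fk in zip(w, quad_features(r[i], r[j])))
            for r in rows]


def rmse(pred, y):
    return (sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)) ** 0.5


def gmdh_layer(train_x, train_y, check_x, check_y, max_neurons):
    """Fit every input pair, rank by RMSE on the checking set, keep the best."""
    candidates = []
    for i, j in itertools.combinations(range(len(train_x[0])), 2):
        w = fit_neuron(train_x, train_y, i, j)
        err = rmse(predict(w, check_x, i, j), check_y)
        candidates.append((err, i, j, w))
    candidates.sort(key=lambda c: c[0])
    return candidates[:max_neurons]


# Hypothetical toy data: the target depends only on inputs 0 and 1.
rng = random.Random(0)
data = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
target = [1.0 + 2.0 * r[0] * r[1] for r in data]
layer = gmdh_layer(data[:40], target[:40], data[40:], target[40:], max_neurons=2)
best_err, best_i, best_j, _ = layer[0]
print(best_i, best_j, best_err)
```

In a full GMDH model, the outputs of the retained neurons would become the inputs of the next layer, and layers would be added until the checking error stops improving.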

GMDH has been used successfully for complex system modeling, pattern recognition and knowledge discovery; hence, in this study, GMDH was applied to assess safety in a road transportation system.

Correlation Analysis
Before the binary classification modeling, the pairwise correlation between the independent input variables should be calculated and checked. Although the input variables in this study were selected with the contribution of experts and the literature review, the correlation analysis is necessary to prevent misleading results. Hence, the Pearson correlation coefficient was used as one of the popular and practical approaches to measure the linear correlation between two variables. It is also called the Pearson product-moment correlation coefficient or bivariate correlation coefficient. Equations (3)-(6) demonstrate the mathematical relations of Pearson's correlation coefficient [90,91].
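One consistent way to write these relations is given below. It assumes that $SS_X$ and $SS_Y$ denote standard deviations and $SP_{D_{XY}}$ the covariance over $n$ paired observations; the symbols $n$, $\bar{X}$, $\bar{Y}$ and the order of the expressions are assumptions made for illustration.

```latex
\begin{aligned}
SS_X &= \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(X_k - \bar{X}\right)^2}, \\
SS_Y &= \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(Y_k - \bar{Y}\right)^2}, \\
SP_{D_{XY}} &= \frac{1}{n}\sum_{k=1}^{n}\left(X_k - \bar{X}\right)\left(Y_k - \bar{Y}\right), \\
\rho(r) &= \frac{SP_{D_{XY}}}{SS_X \, SS_Y}
\end{aligned}
```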
in which X and Y are the independent parameters, SS_X and SS_Y are the standard deviations of X and Y, respectively, and SP_DXY is the covariance of X and Y. ρ(r) is called Pearson's correlation coefficient and lies in the interval from −1 to +1. The absolute value of the coefficient indicates the strength of the relation, while the positive and negative signs only show a direct and an inverse relation between the two independent variables, respectively. If the absolute value of the correlation is close to 1, there is a strong relation between the two independent parameters, and if it is close to 0, there is a weak relation between them. In addition, a negative correlation indicates that as one variable increases, the other decreases, and vice versa [92,93].

Sustainability 2020, 12, 6735

Binary Modeling
The main goal of the binary classification model is to recognize a pattern and relation between the input dataset, including daylight, type of accident, weekday, location, speed limit and average speed, and the number of vehicles as the dependent variable (output). In constructing an optimum binary model for predicting the number of vehicles, determining the control parameters and performance indices of the algorithm contributes greatly to increasing its convergence speed. Hence, as a first step, the confusion matrix is considered as one of the practical performance indices for determining the accuracy and reliability of binary classification results, in supervised or unsupervised learning. Figure 3 shows the general form of the confusion matrix for a two-class problem. In addition, according to the parameters defined in Equations (7) and (8), the values of accuracy (ACC) and error are calculated, respectively. As mentioned earlier, determining the control parameters is the most notable step for increasing the convergence speed of the algorithm. There are no specific equations for them: some of these parameters are taken from previous studies, and others are usually determined based on the experience of experts and trial and error [94][95][96].
Hence, in the second step, the binary classification models are constructed based on three of the most important control parameters of the algorithm: selection pressure (SP), maximum number of layers (MNL) and maximum number of neurons in a layer (MNNL). The SP, a dimensionless parameter that influences the sensitivity of the modeling error, is set to 0.6 based upon previous studies [85,97], while the MNL and MNNL are selected according to the experience of experts and trial and error. The MNL takes the values 5, 10, 15, 20 and 30 and the MNNL takes the values 5, 10, 20 and 30, so that a total of 20 models were constructed for predicting the number of vehicles. It is worth mentioning that there are some recommendations for the ratio of training and testing data from the whole dataset.
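The 20 control-parameter combinations can be enumerated as in the short sketch below. The model numbering (MNL varying slowest, MNNL fastest) is an assumption; it is chosen because it reproduces the settings later reported for the 16th and 19th models.

```python
import itertools

SP = 0.6                        # selection pressure, fixed in the study
MNL_VALUES = [5, 10, 15, 20, 30]  # maximum number of layers
MNNL_VALUES = [5, 10, 20, 30]     # maximum number of neurons in a layer

# One (SP, MNL, MNNL) tuple per model; the ordering is an assumption.
configs = [(SP, mnl, mnnl)
           for mnl, mnnl in itertools.product(MNL_VALUES, MNNL_VALUES)]

for number, (sp, mnl, mnnl) in enumerate(configs, start=1):
    print(f"model {number:2d}: SP={sp}, MNL={mnl}, MNNL={mnnl}")
```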

Data Collection and Preparation
The dataset was extracted from the Italian ACI-ISTAT database [98] with reference to the years 2017 and 2018. ISTAT is the Italian National Institute of Statistics, the main supplier of official statistical information in Italy. It collects and produces information on the Italian economy and society and makes it available for study and decision-making purposes. ISTAT works in cooperation with the Automobile Club of Italy (ACI) to standardize the accident data, collecting police reports. Statistical information on accidents is collected by ISTAT by means of a comprehensive monthly survey of all road accidents occurring in the entire national territory that have caused injuries to people (dead or injured). The ACI actively collaborates in this investigation. The survey is carried out by the authority that intervened on the site (traffic police, carabinieri, municipal police), which fills in the ISTAT CTT/INC form, called "Road accidents", for each road accident involving a vehicle circulating on the road network and causing injuries. Therefore, accidents that caused no injuries to people, accidents that did not occur in public traffic areas and accidents in which no vehicles were involved are excluded from the survey.
In order to parameterize the contents of the survey, the following definitions are used:
- Road accidents: those that occur in a road open to public traffic, as a result of which one or more people were injured or killed and in which at least one vehicle was involved;
- Dead: people who died instantly (within 24 h) or those who died from the second to the thirtieth day, starting with that of the accident included;
- Injured: people who suffered injuries as a result of the accident. Given the difficulty of defining objective criteria on the level of severity of the injuries suffered, there is no distinction between serious or light injuries.
A total of 775 accident cases were recorded and evaluated from urban and rural areas of Cosenza in southern Italy (Figure 4). These accidents were grouped into several categories (Table 1). The ISTAT database was matched with traffic surveys on the same rural and urban roads, from which average vehicle speeds and average traffic volumes were derived. The surveys were carried out in October 2019 using Bluetooth radar sensors to acquire vehicle speeds and traffic volumes (Figure 5). Radar sensors were located on the road sections with observed crashes. After analyzing the statistical trends of traffic volumes and speed values over a ten-year period, and considering the social, economic, demographic and travel demand characteristics of the study area, traffic volumes and vehicle speed values were considered invariant over the last five years. Radar sensors were positioned in a segment where homogeneous flow and speed conditions could be assumed for the entire length. For example, when a sensor was positioned on a link with homogeneous geometric characteristics greater than 2 miles in length, a circular buffer of 2 miles diameter around the location of the radar sensor (1 mile upstream and downstream) was traced [99]. The geometric homogeneity of a road segment was defined by taking into account the number of lanes, lane and shoulder width, speed limit, median type and median width.

Correlation Analysis
In this section, after selecting and preparing a dataset including daylight, type of accident, weekday, location, speed limit and average speed, a correlation analysis was conducted based on Pearson's correlation coefficient using the Statistical Package for the Social Sciences (SPSS) software. The obtained results are shown in Table 2. According to Table 2, the correlation between the input variables is weak, which makes them suitable for modeling; a correlation coefficient with |ρ| > 0.85 is defined as "strong", which is inappropriate for modeling. For example, the value of Pearson's correlation between Daylight and Average speed is 0.01, which means that they are practically independent of each other and that the slight relation between them is direct: when one of them increases or decreases, the other tends to increase or decrease, respectively. Additionally, Daylight and Weekday are independent of each other, with a correlation equal to −0.12, and they have an inverse relation. In addition, Type of accident is independent of the other variables and has an inverse relation with them. It is worth mentioning that although the correlation coefficient between Speed limit and Average speed is high, at about 0.85, this value can be accepted considering the nature of these two variables. Consequently, it can be concluded that the value of ρ is acceptable for all variables in this study, which shows that they were properly selected.
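The screening step described above can be sketched as follows. The study used SPSS; this pure-Python version is only an illustration of the same check, and the column values are hypothetical, not taken from Table 2. The 0.85 threshold is the one quoted in the text.

```python
import itertools
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def strong_pairs(columns, threshold=0.85):
    """Return the variable pairs whose |rho| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in itertools.combinations(columns.items(), 2):
        if abs(pearson(a, b)) > threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical example values, for illustration only.
columns = {
    "daylight": [1, 0, 1, 0, 1, 0],
    "speed_limit": [50, 50, 70, 90, 90, 110],
    "average_speed": [48, 55, 66, 85, 95, 104],
}
flagged = strong_pairs(columns)
print(flagged)
```

Pairs returned by `strong_pairs` are candidates for closer inspection rather than automatic removal, in line with the treatment of Speed limit and Average speed in the text.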

Binary Modeling
In this study, 775 accident cases were evaluated and recorded from the urban and rural areas of Cosenza in southern Italy. Based on the suggestion proposed in Looney's study, 0.75 of the dataset (581 cases) was selected randomly for training, and the rest (0.25 of the dataset) was used to test the developed binary model [100]. As mentioned before, three control parameters were considered for constructing the models: the SP was set to 0.6 based upon previous studies [85,97], the MNL took the values 5, 10, 15, 20 and 30, and the MNNL took the values 5, 10, 20 and 30; hence, a total of 20 models were constructed for forecasting the number of vehicles. The obtained results of the 20 models are shown in Table 3.
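A random 75/25 split of the 775 cases can be sketched as below. The seed and the function name are assumptions for illustration; the study does not state how the random selection was implemented.

```python
import random


def split_indices(n_cases, train_fraction, seed):
    """Shuffle case indices and cut them into training and testing parts."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_train = int(n_cases * train_fraction)  # 775 * 0.75 = 581.25, truncated to 581
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_indices(775, 0.75, seed=0)
print(len(train_idx), len(test_idx))
```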
Finally, after constructing the models, the simple ranking method introduced by Zorlu et al. [101] was used to rank each model and select the best one. The results of this ranking are shown in Table 4.
According to the results in Table 4, the 16th and 19th models have the highest and lowest ranks among the developed models; their SP, MNL and MNNL values are 0.6, 20 and 30, and 0.6, 30 and 20, respectively.
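A simple ranking of this kind can be sketched as follows, under the assumption that each model receives, for every performance index, a rank that grows with performance, and that the ranks are then summed. The accuracy values and model names below are hypothetical and are not the entries of Tables 3 and 4.

```python
def simple_ranking(scores):
    """Sum per-index ranks; a higher index value gives a higher rank.

    scores maps a model name to a dict of performance indices.
    Models with equal values on an index share the same rank.
    """
    index_names = list(next(iter(scores.values())).keys())
    totals = {name: 0 for name in scores}
    for index in index_names:
        levels = sorted({model[index] for model in scores.values()})
        for name, model in scores.items():
            totals[name] += levels.index(model[index]) + 1
    return totals


# Hypothetical accuracies, for illustration only.
scores = {
    "A": {"train_acc": 0.80, "test_acc": 0.78},
    "B": {"train_acc": 0.86, "test_acc": 0.83},
    "C": {"train_acc": 0.84, "test_acc": 0.82},
}
totals = simple_ranking(scores)
best = max(totals, key=totals.get)
print(totals, best)
```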

Results and Discussion
As mentioned above, the 16th model indicates the best performance among the 20 developed models, whose MNL value of optimum is 30. Figure 6 shows the value of the root mean square error (RMSE) in each layer. Although the difference in RMSE between consecutive layers, from the second layer to the end, shows the desired precision level, this value is fixed from the 28th layer to the 30th, which demonstrates the suitable speed of convergence and flexibility of the algorithm.
According to Figure 6 and Equations (7) and (8), the results of the confusion matrix for the 16th model are calculated and shown in Figure 7 for training (a), testing (b) and all data (c). For the training data, the optimum model classified 106 cases of class "0" (number of vehicles involved in the accident = 1) correctly and 3 wrongly, an accuracy of 97.2%, and it classified 392 cases of class "1" (number of vehicles involved in the accident > 1) correctly and 80 wrongly, an accuracy of 83.1%. The total accuracy obtained for the training data was 85.7%. For the testing data, 27 cases of class "0" were correctly predicted and 1 case was wrongly predicted, while 31 cases of class "1" were wrongly predicted as class "0" and 135 cases of this class were correctly estimated. Finally, the confusion matrix of all data shows that classes "0" and "1" were predicted with 97.1% and 82.6% accuracy, respectively; consequently, the accuracy for the total data reaches a highly acceptable level of 85.2%.
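The reported percentages can be checked directly from the counts quoted above. The sketch assumes the usual definitions, accuracy = correct / total and error = 1 − accuracy, since the exact form of Equations (7) and (8) is not reproduced here; the dictionary layout and function names are illustrative.

```python
# (correct, wrong) counts per class for the 16th model, as quoted in the text.
counts = {
    "training": {"class0": (106, 3), "class1": (392, 80)},
    "testing": {"class0": (27, 1), "class1": (135, 31)},
}
# "all" is the element-wise sum of the training and testing counts.
counts["all"] = {
    cls: tuple(counts["training"][cls][k] + counts["testing"][cls][k]
               for k in range(2))
    for cls in ("class0", "class1")
}


def summarize(split):
    """Per-class accuracy, overall accuracy and error for one data split."""
    correct0, wrong0 = counts[split]["class0"]
    correct1, wrong1 = counts[split]["class1"]
    total = correct0 + wrong0 + correct1 + wrong1
    acc = (correct0 + correct1) / total
    return {
        "class0_acc": correct0 / (correct0 + wrong0),
        "class1_acc": correct1 / (correct1 + wrong1),
        "acc": acc,
        "error": 1.0 - acc,
        "n_correct": correct0 + correct1,
        "n_total": total,
    }


for split in ("training", "testing", "all"):
    s = summarize(split)
    print(split, s["n_correct"], s["n_total"], round(100 * s["acc"], 1))
```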
For further evaluation, the results were assessed by three other performance indices, namely Precision, Recall and F1 score [102]. Figure 8 shows the evaluation of the confusion matrix by accuracy and in comparison with other techniques. Although recall is lower than the other performance indices, it should be considered together with precision; taken together, the results show that the optimum developed model can provide the desired performance in estimating the number of vehicles involved in an accident.
In classification problems, the receiver operating characteristic (ROC) curve, which is a probability-based curve, can play a key role in analyzing results. Hence, the ROC curve was also used to evaluate the results provided by the 16th model. Figure 9 shows the results for training, testing and all data based on the ROC curve. The threshold was set at 0.5, which is a common value in this case. Consistent with its better performance, the area under the curve (AUC) of the 16th model is higher than that of the other developed models. The value of the AUC used for evaluating the performance of the developed binary classification model ranges between 0 and 1. An AUC equal to or less than 0.5 indicates that the performance of the developed model is not acceptable, while this value is higher than 0.5 for the training, testing and total ROC curves.
Furthermore, based on these analyses, the following remarks and results can be highlighted:
- The correlation analysis showed that the input data, including Daylight, Weekday, Type of accident, Location, Speed limit and Average speed, were correctly considered for the binary classification;
- Figures 6-9 show that the GMDH algorithm has a high capability to train and develop the model, which correctly predicts 660 of the 775 cases (total) in the first and second classes. Additionally, on the basis of the confusion matrices, the results were assessed by the other three performance indices, which indicated that the proposed model can provide a high performance capacity in the evaluation of safety in transportation systems;
- Consequently, it can be concluded that the proposed binary classification model based on the GMDH algorithm is a reliable alternative to the classical model, with an acceptable degree of accuracy in predicting the number of vehicles involved in an accident. This may lead transportation engineers toward greater accuracy and robustness in the design and planning of roads, by eventually investigating appropriate countermeasures to reduce the safety risk;
- It is worth mentioning that the binary classification model presented in this study was developed for the road network of the Cosenza area, and a more in-depth analysis is required to transfer it to other contexts;
- Although the developed model was reliable for the evaluation of safety in the transportation system of this case study, it is not capable of investigating safety in transportation systems with incomplete data.
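The additional indices discussed above can be computed as in the sketch below. It assumes that class "1" (more than one vehicle) is treated as the positive class, which the text does not state, and it uses the all-data counts quoted for the 16th model. The AUC function uses the pairwise (rank-based) definition, and the scores passed to it are hypothetical, since the model outputs are not available here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def auc_pairwise(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# All-data counts with class "1" assumed positive:
# 527 class-1 cases correct, 111 class-1 cases wrong,
# 4 class-0 cases wrongly assigned to class 1.
precision, recall, f1 = precision_recall_f1(tp=527, fp=4, fn=111)
print(round(precision, 3), round(recall, 3), round(f1, 3))

# Hypothetical model scores, for illustration only.
toy_auc = auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3])
print(toy_auc)
```

Under this assumption, recall comes out lower than precision, which agrees with the observation made in the text.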

Conclusions
Assessing safety is not an easy task because of the ambiguity and uncertainty in the parameters affecting accidents. Hence, artificial intelligence (AI) and machine learning (ML) are effective methods for evaluating some recurring problems in transportation engineering, especially in road safety assessment. The main aim of this study was to predict the number of vehicles involved in an accident, in order to assess safety, using the GMDH algorithm. This was accomplished using 775 accident cases obtained from the urban and rural areas of Cosenza in southern Italy. Several important parameters, namely Daylight, Weekday, Type of accident, Location, Speed limit and Average speed, were selected as input data, and the number of vehicles involved in an accident was considered as output. In total, 20 models were constructed based on three control parameters of the algorithm: selection pressure, maximum number of layers and maximum number of neurons in a layer. In addition, 75% of the whole dataset was selected for training and the rest was used as the testing dataset, and the accuracy of each model was determined according to the confusion matrix. Finally, the 16th model, with 85.7% and 83.5% accuracy for the training and testing datasets, respectively, was selected as the best developed binary classification model. Furthermore, the authors intend to compare the results obtained for the analyzed case study with those obtained for other contexts and to provide a robust analysis of the model transferability. More effort is needed to investigate other parameters affecting the number of vehicles involved in an accident based on the available dataset, also for other regions or other countries.
It is worth mentioning that road safety depends on the concurrence of three main factors: human behavior, infrastructure and environment. It is therefore necessary to model, as well as possible, the complex relationships existing among latent and real variables by coupling AI and ML techniques with other classic techniques such as Structural Equation Models (SEM). In future works, it is recommended to examine the effectiveness of other types of artificial intelligence and machine learning methods for binary classification, such as Learning Vector Quantization (LVQ) and the Naive Bayes (NB) algorithm, and then to compare the results with a logit model.