Analysis and Prediction of Risky Driving Behaviors Using Fuzzy Analytical Hierarchy Process and Machine Learning Techniques

Abstract: Driver behavior plays a pivotal role in road safety, as it is a significant factor in preventing traffic crashes. Although extensive research has been conducted on this topic in developed countries, there is a notable gap in understanding driver behavior in developing countries, such as Pakistan. Cultural nuances, law enforcement practices, and government investments in traffic safety in Pakistan differ significantly from those in other regions. Recognizing this disparity, this study aims to comprehensively understand risky driving behaviors in Peshawar, Pakistan. To achieve this goal, a Driver Behavior Questionnaire (DBQ) was designed, and responses were collected using Google Forms, resulting in 306 valid responses. The study employs a Fuzzy Analytical Hierarchy Process (Fuzzy AHP) framework to rank the driver behavior criteria and weight the contributing factors. This framework assigns relative weights to different criteria and captures the uncertainty in drivers' judgments. Additionally, machine learning (ML) techniques, including support vector machine, decision tree, Naïve Bayes, random forest, and an ensemble model, were used to predict driver behavior, enhancing the reliability and accuracy of the predictions. The results showed that the ensemble approach outperformed the others, with a prediction accuracy of 0.84. In addition, the findings revealed that the three most significant risky driving attributes were violations, errors, and lapses. Factors such as clear road signage and driver attention were identified as important in improving drivers' risk perception. This study serves as a benchmark for policymakers, offering valuable insights for formulating effective policies to improve traffic safety.


Introduction
Globally, over 1.35 million people die in traffic crashes each year. These crashes not only cause significant harm to human lives and physical infrastructure but also have a considerable adverse impact on social and economic environments, accounting for nearly 3% of gross national product [1][2][3]. They affect not only the victims and their acquaintances but also society more broadly [3]. Traffic-related mortalities are disproportionately distributed between developed and developing countries. For example, 90% of all traffic-related mortalities are reported to occur in low- and middle-income developing countries [4]. Implementing effective countermeasures to enhance road traffic safety is therefore paramount for road safety agencies in developing countries.
Compared to drivers in other countries, drivers in Pakistan may have different perceptions of road safety, which could affect the criteria used to assess a driver's behavior.
The current study is a continuation of a recent study that analyzed driver behavior in Peshawar by focusing on a set of 14 factors using the Fuzzy AHP technique [25]. The number of factors was increased from 14 to 25 to cover additional significant behaviors, such as angered responses, aggressive maneuvers near traffic lights, proximity to other vehicles, and non-adherence to speed limits. Considering the limitations of the previous study, these factors were incorporated based on empirical evidence of their critical impact on driving behavior, which the previous study had not included.
Research in developed countries has demonstrated that these factors play a vital role in influencing driver behavior. The previous study considered a sample of around 120 respondents, which was deemed adequately representative of the study area. The present study collected data from 306 participants. Participants were counted only if they reported driving more than 150 km per week within the study context. Furthermore, the sample size was determined based on various factors, such as population size, confidence intervals, and a desirable margin of error.
The objectives of this study are as follows. First, the study conducts a driver behavior questionnaire, including key factors related to road crashes and the socioeconomic characteristics of drivers, to understand driver behavior and its impact on traffic safety in Peshawar, Pakistan. Given the context of a developing country and the absence of comprehensive nationwide driver behavior data, this study employed a DBQ for data collection. Second, the study employs Fuzzy AHP to rank the criteria for the most influential factors in road crashes, as research in this area is scarce within the study region. The purpose of using Fuzzy AHP is to analyze the rank and weight of driver behavior criteria. This study specifically considered three main factors (violations, errors, and lapses) and their sub-criteria. Third, this study uses ML techniques, including Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree Classifier (DT), Random Forest (RF), and Ensemble Model (EM), to predict risky driving behaviors by incorporating the three main causes of crashes. The results of this study may assist stakeholders and policymakers in developing appropriate approaches to address driving behavior, especially in low-income countries like Pakistan and countries with similar demographic and social characteristics, such as India, Sri Lanka, Nepal, and Bangladesh.
The remaining four sections of the article are organized as follows. The second section contains a comprehensive literature review. The third section, on methodology, contains information about the method, research location, data collection, and expressed preference questionnaire. Section four, the results and discussion, describes the study's descriptive statistics and model estimations. Section five contains conclusions, limitations, and suggestions for further research.

Literature Review
Driver behavior plays a crucial role in road safety and efficient transportation. It is influenced by several factors, such as speeding and violating traffic laws, which increase the likelihood of crashes and have negative social, financial, and economic effects [26,27]. For more than two decades, numerous publications have addressed the creation, modification, and evaluation of methods for measuring it. In 1990, Reason et al. [28] introduced a 50-item, self-report DBQ, in which drivers rated how frequently they engaged in risky behaviors while driving. Another study identified almost 200 studies that have since used the DBQ in part or in its entirety [29]. According to af Wåhlberg et al. [30], the DBQ is one of the most popular ways to assess driving behavior. In the initial study by Reason, Manstead, Stradling, Baxter, and Campbell [28], more than 500 drivers aged 20 years and older participated. According to principal components analysis, three factors (violations, errors, and lapses) accounted for 33% of the variance in responses. A violation is an act of disobeying a law or a norm of conduct that society regards as acceptable. The authors describe violations as willful deviations from the conduct required to preserve the safety of a potentially harmful system. Typical violations include speeding and operating a vehicle while under the influence of drugs or alcohol. Errors, on the other hand, occur when planned actions fail to produce the desired results. Parker et al. [31] used the eight items with the strongest component loadings for violations, errors, and lapses from past studies. This condensed questionnaire was administered to a sample of 1656 British drivers, ranging in age from 17 to 70. The authors found that they had successfully recovered the same three-component structure [28]. Many researchers have since examined the creation, improvement, and evaluation of this instrument. The 50-item DBQ, developed by Reason [32], asks respondents to describe their driving behaviors, allowing drivers to assess how frequently they engaged in dangerous activities while driving. Since its conception, the DBQ has been used in approximately 200 studies, either in full or in part [29].
The majority of crashes are due to human factors [33]. Driving skills and driving style, that is, the driver's performance and attitude, are two important aspects of human factors in driving. Driving style is shaped by motivations, opinions, and personality factors, whereas driving skills are linked to information handling and motor abilities [34]. In practice, driving style and skills may interact to affect collision risk, the use of safety features [35,36], the frequency of failures [37], and recovery from errors [38]. Research on aberrant driving has thus been regarded as a probable major step toward a comprehensive framework of everyday driving behaviors. The DBQ was developed from a theoretical classification of abnormal behaviors [39]; the initial study found that errors and violations were two empirically separate types of behavior encompassing three variables (intentional violations, hazardous errors, and 'silly' errors). Later, Parker et al. [40] verified the three-factor structure of the DBQ in further research. Researchers have also demonstrated that the tool is very consistent over time. The consensus in the literature is that the original three- or four-factor structure, which includes errors, lapses, and aggressive and ordinary violations, has been replicated in numerous studies conducted in different countries, including the UK [37], Australia [41], Brazil [42], China [43], Greece [44], Finland [45], New Zealand [46], Sweden [47], and Turkey [48]. The DBQ has also demonstrated cross-cultural reliability, with a high level of resemblance among the four-factor structures found in large samples of British, Finnish, and Dutch drivers. However, some studies have identified a different number of factors. The number of DBQ factors has occasionally been greater (e.g., five among older drivers [40] and six in a workplace setting [49]) and seldom lower (e.g., two among professional drivers [50]).
Regarding the methodological application of the DBQ in the literature, multi-criteria decision-making methods, such as AHP and Fuzzy AHP, have been used to rank and weight criteria. Researchers have employed various techniques alongside the AHP method to mitigate uncertainty and inconsistency. These techniques include interconnections [51], frequency ratio [52], sensitivity and uncertainty analysis [53], interval calculations [54], modified analytical hierarchy process [55], and the weights-of-evidence bivariate statistical model [56]. Additionally, some scholars have integrated fuzzy theory with AHP to handle the ambiguity in comparisons [57,58]. Fuzzy AHP is considered a more accurate alternative to AHP, particularly in the realm of human thought and behavior, and exhibits enhanced precision and accuracy when compared to AHP [59]. For instance, a study on driver behavior in Pakistan has been conducted using the Fuzzy AHP methodology. The AHP model has also been used in earlier research on driver behavior [60].
In recent years, advancements in artificial intelligence, particularly in ML and deep learning, have significantly transformed various fields, including traffic safety. These technologies are increasingly recognized as superior to traditional models for evaluating and predicting driver behaviors [61][62][63][64][65]. ML models come in different types, such as SVM, RF, DT, and NB. The SVM has handled classification problems successfully, particularly with small sample sets, and its use in transportation research has been expanding significantly [66][67][68][69][70]. One study assessed the effectiveness of SVM models for forecasting automobile crashes; according to its results, SVMs fared better at predicting crashes than negative binomial models and back-propagation neural networks. Researchers [71] have presented a hybrid strategy that used particle swarm optimization and SVM to predict traffic safety outcomes. Once the base classifiers are built, they are combined into ensemble classifiers using a variety of ensemble rules; ten alternative ensemble rules were used to merge the base classifiers in the ensemble procedure described in [72]. The RF methodology is a predictive modeling approach that combines several randomized DTs. It seeks to overcome the high variance of individual DTs by averaging the results of numerous DT estimators. Because RF aggregates the forecasts of an ensemble of randomized DTs, the resulting parallel estimators effectively lower the variance component of the model, and the predictions move closer to those of an ideal model. An ensemble learning strategy [73] has also been used to identify aggressive driver behavior. By splitting a single class-imbalance problem into several class-balanced problems, this ensemble learning approach effectively manages datasets with class imbalances. The study emphasizes the difficulty of simulating and forecasting driver behavior with the ultimate aim of foreseeing driver behaviors before potentially hazardous circumstances arise.
In summary, various methods have been employed to analyze the DBQ. However, these studies are subject to limitations, particularly concerning the number of participants and the scope of the questionnaire. For example, a previous study that compared driver behavior across Pakistan, China, Turkey, and Hungary on a broad scale had a limited sample size (70 participants in each country) and a constrained set of questions. To address these limitations, the current study narrows its focus to a single-country level, specifically the city of Peshawar in Pakistan. Peshawar was chosen due to its marked differences in culture and social norms, making it an intriguing and valuable locale for this research. The questionnaire has been refined based on a thorough review of the literature, with a specific focus on increasing the number of participants and including more questions. These modifications aim to comprehensively capture driver behaviors related to traffic safety. This study addresses the aforementioned gaps by employing Fuzzy AHP to rank criteria, which highlights the significance of influential factors contributing to traffic crashes. Additionally, ML techniques are used to predict driver behavior regarding traffic safety. The synergy between these two methods enables a comprehensive evaluation of driver behavior, which can be directly applied to the study area to prevent road traffic crashes. Furthermore, it provides valuable insights for decision makers and stakeholders in the development of effective driver behavior regulations and programs.

Materials and Methods
This study adopts a framework comprising several methods. Initially, this study identified the pivotal criteria essential for evaluating driver behavior. These criteria were incorporated into the DBQ, which was distributed to the study participants. Subsequently, we developed the levels of decision making necessary to understand the influence of these criteria on driver conduct. To gain a more comprehensive understanding, this study analyzed the criteria in various ways. Additionally, we employed Fuzzy AHP (FAHP), which facilitates ranking the criteria and assessing their importance or weight in shaping driver behavior. Finally, this study used ML models, including NB, DT classifier, SVM, RF, and EM, to predict driver behavior, as depicted in Figure 1.

Methodology
This study gathered information on drivers by distributing a document-based questionnaire through emails and other tools to over 1000 randomly selected individuals aged 18 and above. To account for the absence of younger drivers in the sample, we also collected information on this demographic by surveying university students from several institutions in Peshawar. The study included participants who currently possess a valid driver's license, as well as those who do not [74].
The final dataset for this study consisted of 306 valid volunteer responses collected through a Google Form questionnaire distributed on various social media platforms. Data cleaning was performed to remove duplicates, handle missing values, and correct formatting errors. The cleaned dataset, containing 25 different input characteristics, was used to predict driving behavior using ML models. The 25 items were used as input features, and the Likert-scale rating of the questionnaire (1 = never to 6 = nearly all the time) was treated as the output. The driving behaviors were categorized into six classes based on degrees of hostility, and the models were trained on this dataset to classify and describe driver behaviors.

Participants' Statistics
According to the collected responses, the study received a response rate of 30% (N = 306), with 29% of the respondents being university students and 71% being engineers, doctors, businesspeople, etc. Furthermore, 73% of the participants had valid driver's licenses, compared to 27% who did not. Table 1 shows the age and gender distribution of the sample, with women making up 25.9% of the respondents. Drivers aged 20 to 30 years represented 49.5% of the sample. In addition, participants reported traveling 208.5 km per week on average. Aberrant driving behaviors were measured using the expanded version of the DBQ [60]. This list contains three aggressive violations, eight ordinary violations, seven errors, and seven lapses. The respondents were asked to assess how frequently they had engaged in each of the 25 behaviors over the preceding year using a six-point scale (1, never; 6, nearly all the time).

Demographic Measures
Respondents provided information on their age, gender, driving experience, occupation, weekly distance driven, and whether they held a full driving license, as shown in Table 1.

Driver Behavior Perception Evaluation Criteria
The research used Fuzzy AHP to compare and assess the DBQ for various traffic cultures, using the well-known major driver behavior characteristics developed on the basis of the AHP framework [60]. Such driving behavior criteria have a significant impact on traffic safety and are considered crucial to other road users' ability to move safely. According to [75], driving behavior is undoubtedly the most important element affecting traffic safety as a whole. To examine each criterion thoroughly, the driving behavior criteria were arranged in a three-level hierarchical structure and sorted alphabetically. The primary driver behavior criteria, namely violations, lapses, and errors, make up the first level. Figure 2 illustrates how these primary driver behavior criteria are divided into sub-criteria.

Cronbach's Alpha-Survey Questionnaire Reliability Test
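The survey conducted in Peshawar city, Pakistan, with 306 participants yielded a Cronbach's alpha score of 0.81, as shown in Table 2, indicating good internal consistency and reliability of the survey questionnaire, as depicted in Table 3. This result suggests that the items in the questionnaire consistently measured the intended construct. The obtained Cronbach's alpha score falls within the range of values commonly accepted as indicating good internal consistency reliability. The formula for Cronbach's alpha is shown in Equation (1).

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_T^2}\right) \tag{1}$$

where $k$ is the number of questionnaire items, $\sigma_i^2$ is the variance of the responses to item $i$, and $\sigma_T^2$ is the variance of the respondents' total scores.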
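As a rough illustration of how a Cronbach's alpha score of this kind can be computed, the sketch below applies the standard form of the statistic, k/(k − 1) · (1 − Σσᵢ²/σ_T²), to a small table of Likert responses. The function name `cronbach_alpha` and the toy responses are assumptions made for illustration only; they are not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of questionnaire items
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to four items (1 = never ... 6 = nearly all the time).
toy_responses = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [6, 5, 6, 5],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Values closer to 1 indicate that the items vary together, which is the sense in which a score such as 0.81 is read as good internal consistency.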

Several Fuzzy AHP applications and approaches have been used by different scholars. For instance, triangular membership functions were used in the first Fuzzy AHP research [57]. On the other hand, [76] developed an extent analysis method to handle the synthetic extent values of the Fuzzy AHP in pairwise comparisons. Through a comparison scale, the latter facilitates the estimation of priorities in the hierarchical structure. The effectiveness of Fuzzy AHP modeling as a decision-making tool is well accepted [36,77].
The Fuzzy AHP approach was used in this study to compute the weights of the driver behavior criteria and to identify the most important ones. To assess the factors affecting road safety more accurately, a fuzzy scale was adopted in the design of the driver behavior questionnaire. Fuzzy numbers based on pairwise comparisons were used to rank the driver behavior criteria and sub-criteria within a hierarchical structure. Pairwise comparison was applied to the questionnaire survey data obtained from the assessors of particular traffic cultures, and global scores were generated. A consistency test was carried out to verify that the data on driver behavior were reliable. This section gives a brief overview of fuzzy hierarchical evaluation concepts. Fuzzy logic was applied by designing the questionnaire survey with triangular fuzzy numbers as the pairwise comparison scale.
In fuzzy logic, the algebraic operations of addition, subtraction, multiplication, division, and reciprocation on two positive triangular fuzzy numbers $\tilde{A}_1 = (l_1, m_1, u_1)$ and $\tilde{A}_2 = (l_2, m_2, u_2)$ can be expressed as in Equations (2) to (6):

$$\tilde{A}_1 \oplus \tilde{A}_2 = (l_1 + l_2,\; m_1 + m_2,\; u_1 + u_2) \tag{2}$$

$$\tilde{A}_1 \ominus \tilde{A}_2 = (l_1 - u_2,\; m_1 - m_2,\; u_1 - l_2) \tag{3}$$

$$\tilde{A}_1 \otimes \tilde{A}_2 = (l_1 l_2,\; m_1 m_2,\; u_1 u_2) \tag{4}$$

$$\tilde{A}_1 \oslash \tilde{A}_2 = (l_1 / u_2,\; m_1 / m_2,\; u_1 / l_2) \tag{5}$$

$$\tilde{A}_1^{-1} = (1/u_1,\; 1/m_1,\; 1/l_1) \tag{6}$$

According to the method of extent analysis, $m$ extent analysis values are obtained for each of the $n$ criteria, as in Equation (7):

$$M_{g_i}^{1}, M_{g_i}^{2}, \ldots, M_{g_i}^{m}, \quad i = 1, 2, \ldots, n \tag{7}$$

where all $M_{g_i}^{j}$ ($j = 1, 2, 3, 4, 5, \ldots, m$) are triangular fuzzy numbers given in Table 4. The steps of Chang's analysis [78] can be described as follows.

Step 1. The fuzzy synthetic extent value ($S_i$) with respect to the $i$th criterion is defined as in Equation (8).

$$S_i = \sum_{j=1}^{m} M_{g_i}^{j} \otimes \left[\sum_{i=1}^{n}\sum_{j=1}^{m} M_{g_i}^{j}\right]^{-1} \tag{8}$$

where $l$ is the lower limit value, $m$ is the most promising value, and $u$ is the upper limit value.

Step 2. The degree of possibility of $S_2 = (l_2, m_2, u_2) \ge S_1 = (l_1, m_1, u_1)$ can be defined as

$$V(S_2 \ge S_1) = \sup_{y \ge x}\left[\min\left(\mu_{S_1}(x), \mu_{S_2}(y)\right)\right]$$

where $x$ and $y$ represent the values on the axis of the membership function of each criterion. This expression can be written equivalently as Equation (9).

$$V(S_2 \ge S_1) = \mu_{S_2}(d) = \begin{cases} 1, & \text{if } m_2 \ge m_1 \\ 0, & \text{if } l_1 \ge u_2 \\ \dfrac{l_1 - u_2}{(m_2 - u_2) - (m_1 - l_1)}, & \text{otherwise} \end{cases} \tag{9}$$

where $d$ is the ordinate of the highest intersection point between $\mu_{S_1}$ and $\mu_{S_2}$; the graphical presentation can be seen in Figure 3. To compare $S_1$ and $S_2$, both $V(S_1 \ge S_2)$ and $V(S_2 \ge S_1)$ are required.
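The triangular fuzzy number arithmetic referred to in Equations (2) to (6) can be sketched in a few lines. This is a minimal illustration rather than the study's implementation: the `TFN` type and the function names are hypothetical, and multiplication, division, and reciprocation use the usual approximate forms that hold for positive triangular fuzzy numbers.

```python
from typing import NamedTuple


class TFN(NamedTuple):
    """Triangular fuzzy number: lower (l), most promising (m), upper (u)."""
    l: float
    m: float
    u: float


def add(a: TFN, b: TFN) -> TFN:
    return TFN(a.l + b.l, a.m + b.m, a.u + b.u)


def sub(a: TFN, b: TFN) -> TFN:
    # The lower bound subtracts the other number's upper bound, and vice versa.
    return TFN(a.l - b.u, a.m - b.m, a.u - b.l)


def mul(a: TFN, b: TFN) -> TFN:
    # Approximate product, valid for positive fuzzy numbers.
    return TFN(a.l * b.l, a.m * b.m, a.u * b.u)


def reciprocal(a: TFN) -> TFN:
    # The bounds swap places because 1/x is decreasing for x > 0.
    return TFN(1 / a.u, 1 / a.m, 1 / a.l)


def div(a: TFN, b: TFN) -> TFN:
    return mul(a, reciprocal(b))


a1, a2 = TFN(1, 2, 3), TFN(2, 3, 4)
print(add(a1, a2), sub(a1, a2), mul(a1, a2))
```

Note that subtraction and division widen the spread of the result, which is how the fuzzy scale carries the uncertainty of a pairwise judgment through the calculation.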
Step 3. The degree of possibility for a convex fuzzy number $S$ to be greater than $k$ convex fuzzy numbers $S_i$ ($i = 1, 2, 3, \ldots, k$) can be defined as in Equation (10).

$$V(S \ge S_1, S_2, \ldots, S_k) = \min_{i} V(S \ge S_i), \quad i = 1, 2, 3, \ldots, k \tag{10}$$

Assume that $d'(A_i) = \min V(S_i \ge S_k)$ for $k = 1, 2, 3, \ldots, n$; $k \ne i$. The weight vector is then given in Equation (11) as

$$W' = \left(d'(A_1), d'(A_2), \ldots, d'(A_n)\right)^{T} \tag{11}$$

Step 4. Via normalization, the normalized weight vector is given in Equation (12) as

$$W = \left(d(A_1), d(A_2), \ldots, d(A_n)\right)^{T} \tag{12}$$

where $W$ is a non-fuzzy number. In fuzzy logic, a non-fuzzy number is a precise numerical value, as opposed to a fuzzy number, which represents a range of values with varying degrees of membership.
In other words, $W$ is not subject to fuzziness or uncertainty: it is a crisp value that can be used in mathematical operations without ambiguity.
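To make Steps 1 to 4 concrete, the sketch below runs an extent analysis of the kind described above on a small fuzzy pairwise comparison matrix. It is an illustration under stated assumptions: the function names are hypothetical, and the 3 × 3 matrix of triangular judgments for violations, errors, and lapses is invented for the example and is not taken from Table 4.

```python
def synthetic_extents(matrix):
    """Step 1: fuzzy synthetic extent S_i for each row of a TFN comparison matrix."""
    row_sums = [tuple(sum(t[k] for t in row) for k in range(3)) for row in matrix]
    total = tuple(sum(r[k] for r in row_sums) for k in range(3))
    # Multiplying by the inverse of the total swaps the bounds: (l/U, m/M, u/L).
    return [(r[0] / total[2], r[1] / total[1], r[2] / total[0]) for r in row_sums]


def possibility(s2, s1):
    """Step 2: degree of possibility V(S2 >= S1) for two triangular fuzzy numbers."""
    l1, m1, u1 = s1
    l2, m2, u2 = s2
    if m2 >= m1:
        return 1.0
    if l1 >= u2:
        return 0.0
    return (l1 - u2) / ((m2 - u2) - (m1 - l1))


def fahp_weights(matrix):
    """Steps 3 and 4: minimum possibility per criterion, then normalization."""
    s = synthetic_extents(matrix)
    n = len(s)
    d_prime = [min(possibility(s[i], s[k]) for k in range(n) if k != i)
               for i in range(n)]
    total = sum(d_prime)
    return [d / total for d in d_prime]


# Hypothetical judgments: rows and columns are violations, errors, lapses.
comparisons = [
    [(1, 1, 1), (1, 2, 3), (2, 3, 4)],
    [(1 / 3, 1 / 2, 1), (1, 1, 1), (1, 2, 3)],
    [(1 / 4, 1 / 3, 1 / 2), (1 / 3, 1 / 2, 1), (1, 1, 1)],
]
weights = fahp_weights(comparisons)
print([round(w, 3) for w in weights])
```

The returned list is the crisp weight vector: its entries sum to one, so the criteria can be ranked directly by weight.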

Machine Learning for Driver Behavior Modeling
In this study, we employ various ML techniques to predict driver behavior using a comprehensive DBQ [79,80]. By applying ML algorithms, we aim to uncover patterns and insights that may be hidden within the collected data. The driver behavior classification system in our study comprises four fundamental steps: data collection and preprocessing, feature and output separation, data partitioning for testing and training, and application of models (Figure 4). Each step is outlined in the subsequent sections, with information on its execution and significance within the classification framework. The models used for this dataset are SVM, NB, DT, RF, and EM, as shown in Figure 4. The dataset was randomly partitioned into training (80%) and testing (20%) subsets for each class. Partitioning within each class helps reduce bias and supports a fair estimate of model accuracy and performance. We evaluate the performance of the different models, each trained on the questionnaire responses, in predicting and classifying driver behavior. Through this analysis, we examine the varying accuracies of these models and their potential implications for enhancing driver safety and developing targeted interventions.
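The per-class 80/20 partition described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `stratified_split`, the fixed seed, and the toy label counts are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), holding out test_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within the class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical behavior classes for 100 respondents.
labels = [1] * 50 + [2] * 30 + [3] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting inside each class keeps the class proportions of the test subset close to those of the full dataset, which matters when some behavior classes are rare.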

Data Preprocessing and Hyperparameter Tuning
Prior to the development of the ML models, the essential steps of data preprocessing were followed. During preprocessing, observations with null or duplicate values, as well as outliers, were removed. The data were then normalized to ensure uniformity and consistency. To optimize the performance of the ML models and to reduce overfitting, a number of strategies, including feature selection, hyperparameter tuning, and selection of ensemble methods, were considered. Choosing relevant features that have a significant impact on the output variable (driver behavior categorized under various classes) can reduce the model's complexity, thereby limiting overfitting. Similarly, hyperparameter tuning aims to find the optimal balance between model complexity and generalization performance. An ensemble model can also reduce overfitting by combining the strengths of the individual base models and offsetting their weaknesses. Specifically, we employed a random search procedure in which hyperparameters were iteratively adjusted until an optimized model performance was obtained. Random search was selected because it can offer computationally acceptable solutions in high-dimensional space with relatively fewer iterations than the grid search technique.
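The random search idea can be sketched independently of any particular model. In the illustration below, `random_search`, the search space, and `toy_score` are all hypothetical: `toy_score` merely stands in for a validation accuracy that would, in practice, come from training and evaluating a classifier with the sampled hyperparameters.

```python
import random


def random_search(score_fn, space, n_iter=50, seed=0):
    """Sample hyperparameter combinations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently of the others.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in objective (assumption): peaks at max_depth=6, n_trees=200."""
    return -((params["max_depth"] - 6) ** 2) - ((params["n_trees"] - 200) / 100) ** 2


search_space = {"max_depth": [2, 4, 6, 8, 10], "n_trees": [50, 100, 200, 400]}
best_params, best_score = random_search(toy_score, search_space, n_iter=30)
print(best_params, best_score)
```

Unlike grid search, the number of evaluations here is fixed by `n_iter` rather than by the product of the grid sizes, which is why the approach scales better as more hyperparameters are added.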

Support Vector Machine (SVM)
The SVM classifier used in the present study performed well in categorizing driving behaviors. The SVM is widely recognized for its high accuracy in predicting class labels, and it proved effective in examining driving behavior in this study. Despite its computational complexity, SVM remains a solid choice for handling high-dimensional data and mitigating overfitting. The model's capability to accurately categorize driving behaviors underscores its significance in this study [81]. Unlike perceptrons, SVM identifies the hyperplane (H) with the maximum separation margin [82,83], as defined in Equation (13).
To classify a data point as negative or positive, a decision rule needs to be defined. The decision rule for SVM is given in Equation (14).
By replacing −c with b, the decision rule takes the form given in Equation (15). Consequently, the classification output y can be defined as in Equation (16).
If the value of w · x + b is greater than zero, the point is classified as positive; otherwise, it is classified as negative. The objective is to find the values of the weight vector w and the bias b that maximize the margin distance, denoted as d.
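The decision rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the weight vector and bias are hypothetical values rather than parameters fitted in this study, and the margin width 2/||w|| is the standard textbook expression for a linear SVM in canonical form, which is assumed here.

```python
import math

def decision_value(w, x, b):
    """Signed score w . x + b for a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Return +1 when the score is greater than zero, otherwise -1."""
    return 1 if decision_value(w, x, b) > 0 else -1

def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w|| (canonical form)."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
w, b = [3.0, 4.0], -5.0
print(classify(w, [2.0, 1.0], b))  # score = 6 + 4 - 5 = 5, positive class
print(classify(w, [0.0, 0.0], b))  # score = -5, negative class
print(margin_width(w))             # ||w|| = 5, so the width is 0.4
```

Maximizing the margin is therefore equivalent to minimizing the norm of w subject to all training points being classified correctly.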

Naïve Bayes
The Naïve Bayes (NB) classifier is known for its simplicity and its assumption of feature independence. It has been shown to be effective in classifying driver behavior, making it a reasonable choice for this task. Although its accuracy may not be as high as that of other models, NB offers interpretability and computational efficiency, which are advantageous for this study. The analysis conducted in this research demonstrates the model's ability to predict driver behavior [84,85].
The Naïve Bayes classifier is based on Bayes' theorem, a fundamental concept in probability theory. Bayes' theorem allows the belief in the occurrence of an event (A) to be updated given the evidence (B).
Equation (17) gives the posterior probability, P(A|B):

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (17)$$

Here, P(A|B) represents the probability of event A occurring given evidence B, P(B|A) is the probability of observing evidence B given that event A has occurred, P(A) represents the prior probability of event A occurring, and P(B) is the probability of observing evidence B.
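A short worked example may help to make the update concrete. The Python sketch below applies Bayes' theorem and expands P(B) with the law of total probability; the event definitions and all probabilities are hypothetical and are not taken from the questionnaire data.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A|B) via Bayes' theorem, with P(B) expanded by total probability."""
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical numbers: A = "driver belongs to a risky class",
# B = "driver reports frequent speeding".
p = posterior(prior_a=0.2, likelihood_b_given_a=0.7, likelihood_b_given_not_a=0.1)
print(round(p, 4))  # 0.14 / (0.14 + 0.08) = 0.6364
```

In this example, observing B raises the probability of A from the prior of 0.2 to roughly 0.64. A Naïve Bayes classifier repeats this update for every feature, treating the features as independent given the class.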

Decision Tree Classifier
The DT classifier is a commonly used algorithm in ML for classification tasks, such as analyzing driver behavior. In the present study, the DT classifier was effective in classifying instances in the test data correctly [27]. Although accuracy is frequently used to measure performance, precision, recall, and F1 score should also be taken into account for a thorough evaluation, especially when dealing with unbalanced datasets or unequal misclassification costs. The DT is described mathematically through the Gini index, which quantifies node impurity. The Gini index is given in Equation (18).
Here, J represents the number of classes or categories in the classification problem, P_i denotes the probability of an instance belonging to class i, and P_k represents the probability of an instance belonging to any class other than i.
The Gini index is commonly used as a criterion for deciding how to split the data during the construction of the DT. By calculating the Gini index, one can assess the impurity or homogeneity of a node and determine the optimal splits.
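The symbol definitions above match the standard pairwise form of the Gini index. Assuming that form, and assuming the class probabilities sum to one, it reduces to the familiar "one minus the sum of squares" expression, as the following short derivation shows:

```latex
G \;=\; \sum_{i=1}^{J} P_i \sum_{k \neq i} P_k
  \;=\; \sum_{i=1}^{J} P_i \,(1 - P_i)
  \;=\; 1 - \sum_{i=1}^{J} P_i^{2}
```

For example, a node with two equally likely classes has G = 1 − (0.5² + 0.5²) = 0.5, whereas a node containing a single class has G = 0.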

Random Forest Classifier
The RF classifier was chosen because of its ability to correctly categorize instances of driving behavior. Its ensemble nature, which combines several DTs, enables it to identify intricate links and patterns in the data and to average the predictions of the trees for the final classifier, given in Equation (19). This improves predictive performance and provides informative and trustworthy results for the analysis of driver behavior.
For the Gini index, given in Equation (20):
• C represents the number of classes or categories in the dataset.
• P_i represents the probability of an instance belonging to class i.
• The equation calculates the squared probabilities of each class, sums them up, and subtracts the result from 1.
• A lower Gini value indicates less impurity or a more homogeneous distribution of instances among the classes.
For entropy, given in Equation (21):
• C represents the number of classes or categories in the dataset.
• P_i represents the probability of an instance belonging to class i.
• The equation calculates the product of the probability of each class and its logarithm (base 2), sums them up, and assigns a negative sign.
• A lower entropy value indicates less impurity or a more homogeneous distribution of instances among the classes.
Both the Gini index and entropy are used as impurity measures in the RF algorithm to determine the quality of a split during the construction of each DT. The objective is to find the split that maximizes the separation between different classes, resulting in more accurate predictions.
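To illustrate how these two impurity measures rank candidate splits, the sketch below implements them exactly as described in words above (one minus the sum of squared class probabilities, and the negative sum of each probability times its base-2 logarithm). The class labels and the two candidate splits are hypothetical, and the size-weighted average of child impurities is the usual way to score a split, assumed here.

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Relative frequency of each class present in a node."""
    counts = Counter(labels)
    total = len(labels)
    return [c / total for c in counts.values()]

def gini(labels):
    """1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def entropy(labels):
    """Negative sum of p * log2(p) over the classes present."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))

def weighted_impurity(left, right, measure):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

parent = [0, 0, 0, 1, 1, 1]           # hypothetical class labels at a node
good_split = ([0, 0, 0], [1, 1, 1])   # separates the two classes completely
poor_split = ([0, 0, 1], [0, 1, 1])   # leaves both children mixed

print(gini(parent), entropy(parent))
print(weighted_impurity(*good_split, gini))
print(weighted_impurity(*poor_split, gini))
```

The split with the lower weighted impurity (here, the one that separates the classes completely) is the one a tree would prefer under either measure.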

Ensemble Model
Ensemble models integrate several individual models to obtain higher performance in ML tasks, particularly with high-dimensional data and complicated interactions. SVM, DT, and RF are frequently used as base algorithms in ensemble models. Techniques such as bagging or boosting can be used to combine SVM classifiers, and RF is itself an ensemble of DTs designed to reduce the overfitting of a single tree. Additionally, boosting methods such as AdaBoost, XGBoost, and LightGBM, which have demonstrated strong performance in a variety of domains, may be included in ensemble models [73].
The function f(x, y) is defined as

$$f(x, y) = \begin{cases} 1, & x \ge y \\ 0, & x < y \end{cases}$$

The final classification result is obtained based on the ensemble rules shown in Equation (22).
In this study, our objective is to explore the effectiveness of ensemble models utilizing SVM, DT, and RF for enhancing accuracy in predicting driver behavior. By harnessing the complementary strengths of these models and exploiting their diversity, we expect to achieve improved performance and robustness compared to individual models. To achieve this, our investigation involves rigorous experimentation, model selection, and evaluation. Through these efforts, we aim to gain insight into the efficacy of ensemble models for the targeted task and to contribute to the advancement of predictive modeling techniques in ML.
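One common way to combine base classifiers is hard (majority) voting. The sketch below illustrates that rule under explicit assumptions: it is not necessarily the ensemble rule used in this study, the predictions of the three base models are hypothetical, and the use of the indicator f(x, y) to select the label with the highest vote count is one plausible reading of its role.

```python
from collections import Counter

def indicator(x, y):
    """f(x, y) from the text: 1 when x >= y, otherwise 0."""
    return 1 if x >= y else 0

def majority_vote(predictions):
    """Return the label chosen by the most base models for one sample.

    Ties are broken in favour of the smallest label (an arbitrary choice).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    # A label wins when its vote count is at least the maximum count.
    return min(label for label, count in counts.items() if indicator(count, top))

def ensemble_predict(per_model_predictions):
    """Combine prediction lists (one per model) sample by sample."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]

# Hypothetical class predictions (0 to 5) from three base models on four samples.
svm_pred = [0, 2, 3, 5]
dt_pred = [0, 1, 3, 4]
rf_pred = [1, 2, 3, 4]
combined = ensemble_predict([svm_pred, dt_pred, rf_pred])
print(combined)  # [0, 2, 3, 4]
```

Each sample receives the label that at least two of the three models agree on, which is how an ensemble can correct an isolated mistake made by a single base model.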

Fuzzy AHP Results and Discussion
This study used the Fuzzy AHP method to rank the critical criteria in driver behavior, with a view to improving traffic management in Peshawar, Pakistan. The study identified three main criteria: violations, errors, and lapses. When assessing driver behavior, violations fall into the most serious category because they involve deliberate disregard for safety legislation or traffic laws, such as careless driving or exceeding the posted speed limit. Errors form the second category; they are unintentional mistakes made by drivers, such as misjudging distances or not checking blind spots, which may lead to crashes and put lives at risk. Third, lapses are instances of inattention or forgetfulness when driving, such as neglecting to switch on the headlights or to yield at a stop sign. Even though lapses may not immediately pose a threat, they still contribute to unsafe driving conditions and raise the likelihood of crashes. Therefore, it is crucial to take all three factors, shown in Table 5, into account when assessing driver behavior and putting policies in place to encourage safe and responsible driving practices.

The violation criterion was subdivided into two categories: aggressive violations and ordinary violations. Regarding aggressive violations, the analysis revealed that "honking to indicate annoyance to others" stands out as the most prevalent form of aggressive driving behavior in Peshawar, Pakistan. This behavior poses a significant risk of conflicts and crashes on the road. The second-ranking behavior, "indicating aggression towards other drivers", indicates that some drivers employ verbal or nonverbal cues to express their aggression, which may escalate into physical altercations. Lastly, "chasing other drivers" was the least common aggressive behavior, as depicted in Table 6, with "AV" representing aggressive violations.

Furthermore, the ordinary violations were classified into eight sub-criteria: disregarding speed limits on residential streets, changing lanes at the last minute, disregarding speed limits on motorways, overtaking slow drivers, pulling out of junctions that other drivers have stopped at, avoiding traffic lights to beat other drivers, driving too close to other vehicles, and crossing junctions to avoid traffic lights. The most prevalent ordinary violation among drivers in Peshawar was "overtaking slow drivers", followed by "changing lanes at the last minute", "pulling out of junctions that other drivers have stopped at", "disregarding speed limits on residential streets", "avoiding traffic lights to beat other drivers", "disregarding speed limits on motorways", and "driving too close to other vehicles"; the least prevalent ordinary violation was "crossing junctions to avoid traffic lights", as shown in Table 7, where "OV" stands for ordinary violations.

Secondly, this study investigated errors in driver behavior, which pertain to mistakes made by drivers due to a lack of knowledge, skills, or training. Seven sub-criteria for errors were identified, with "don't use a seat belt during driving" receiving the highest rank, suggesting that some drivers in Peshawar do not consider the use of seat belts important, despite it being a basic safety measure. Ranked second was "underestimating the speed of oncoming vehicles", stressing the importance of drivers being able to judge the speed of oncoming vehicles accurately. "Use mobile during the driving" ranked third, which is a concern because using a mobile device while driving may cause a crash. "Fail to notice pedestrians crossing" ranked fourth, indicating the importance of drivers being alert to pedestrians and giving them the right of way. "Changing lanes without checking the rear-view mirror" ranked fifth, indicating that this is a crucial error committed by drivers in Peshawar with respect to traffic safety. "Overtaking & didn't notice signaling" ranked sixth, implying a tendency among drivers to overtake other vehicles without noticing signals or indicators. Finally, "miss give way sign" ranked seventh, highlighting drivers' failure to give way to other vehicles as required by traffic signs or signals. These findings are detailed in Table 8, with "E" denoting errors. Tackling these risky behaviors is essential for improving Peshawar's overall traffic safety.
Finally, this study analyzed lapses in driver behavior, which are unintentional mistakes, such as forgetting to signal or failing to check blind spots. While not as dangerous as violations, lapses are still common among drivers in Peshawar. The results show that the sub-criterion "wrong Lane approach roundabout junction in the last position" received the highest ranking, followed by "hit something while reversing", "using third gear while away from traffic light", "switch one thing and on another", "no recollection of the road along you are traveling", "misread the signs", and "forgot where your park the car". These rankings indicate the relative importance of the sub-criteria in terms of their impact on driver behavior and traffic management in Peshawar. The presence of "misread the signs" among the ranked lapses highlights the need for clear and easily understandable road signage.

In contrast to the previous study, the outcomes of the present study revealed significant differences. These differences can be attributed to several factors. Firstly, unlike previous studies that employed a shorter questionnaire and had a limited number of participants, the present study used a modified questionnaire to collect more comprehensive data and included a larger, adequate sample. Additionally, the present study focused exclusively on the city of Peshawar, Pakistan, while the previous study encompassed a broader geographical scope, covering an entire country with only 70 participants [36]. Another significant difference lies in the methodology used for ranking driver behavior. The previous studies relied on Kendall's agreement test to determine the weights, whereas the present study employed Fuzzy AHP. By using Fuzzy AHP, the present study was able to account for the inherent uncertainties and imprecision associated with human behavior, resulting in a more comprehensive ranking of driver behavior. These distinctions in methodology and context account for the observed variation in the study outcomes, highlighting the importance of adopting a tailored approach and considering local factors when analyzing driver behavior in a specific location.

Comparison by Age Group and Ranking
The rankings presented in Table 10 play a crucial role in assessing and understanding driver behavior across different age groups. Respondents assigned rankings on a scale from one to eight, with one indicating a higher likelihood of engaging in specific driving behaviors and eight signifying a lower likelihood. Essentially, this scale quantifies the frequency of certain driving-related actions within each age group.
Segmenting these rankings into distinct age groups (18-20, 21-30, 31-40, 41-50, 51-60) provides insight into how driver behavior evolves with age. When a particular behavior receives a lower ranking (a number closer to eight) within an age group, it suggests that individuals in that group are generally more cautious and compliant with traffic regulations. Conversely, a higher ranking (a number closer to one) within the same age group may indicate a propensity for riskier or less rule-abiding behavior.
What is particularly interesting is the variance in rankings between the age groups. The rankings for age groups 18-20 and 21-30 are quite similar, suggesting a commonality in behavior between these two groups. However, as driver age increases and driving experience accumulates, variations in behavior begin to appear. Age groups 31-40, 41-50, and 51-60 exhibit different rankings, indicating shifts in driver behavior, as shown in Table 10. This suggests that as drivers gain more experience and maturity, their behavior on the road may change.
Note: DBQ = driver behavior questionnaire, AV = aggressive violation, OV = ordinary violation, E = error, L = lapses.
These rankings are useful for making comparisons across age groups and identifying trends in driver behavior. They offer the potential to understand how age influences driving habits. Policymakers, advocates for driving safety, and educators can use these insights to develop targeted interventions and educational programs that enhance road safety and promote responsible driving behavior tailored to specific age demographics.
Moreover, this dataset serves as a foundation for evidence-based strategies to improve road safety and implement age-appropriate driver training programs. It helps pinpoint areas where increased awareness and education may be needed, ultimately contributing to safer road conditions across all age groups.
Based on the study results, several policy implications may be suggested. Using the findings of the current research, law enforcement agencies can launch targeted enforcement schemes to deter unsafe driving practices and ensure compliance with traffic laws. Public awareness campaigns aimed at educating drivers about the risks associated with aberrant driving behavior could also be organized. Rigorous driver training and education that covers the common risky behaviors should be mandated. Furthermore, infrastructure interventions, such as redesigning intersections and implementing traffic-calming measures in high-risk areas, can deter risky driving behavior. Likewise, technological innovation and integration, such as in-vehicle monitoring systems or advanced driver assistance, may be considered to alert drivers to unsafe driving conditions.

Likert Scale Data Analysis
Table 11 is used to reach a decision based on the perceptions of the respondents. For this purpose, a weighted average value is calculated using Equation (23).
This value is determined from the mean values of the individual criteria. In this case, the calculation results in a weighted average value of 2.633. Consequently, all values above 2.633 are considered to indicate a "High Perception" decision, suggesting a higher likelihood of disobeying traffic laws. Values below 2.633 are considered to indicate a "Low Perception", implying a higher likelihood of adhering to traffic laws as a driver.
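The thresholding procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of Equation (23) or Table 11: the response counts are hypothetical, the six response options are assumed to be scored from 1 to 6, and the threshold is taken as the plain average of the item means.

```python
def weighted_mean(frequencies, scores=(1, 2, 3, 4, 5, 6)):
    """Mean Likert score for one item: sum(score * count) / sum(count)."""
    total = sum(frequencies)
    return sum(s * f for s, f in zip(scores, frequencies)) / total

def classify_items(item_frequencies):
    """Label each item High or Low relative to the average of all item means."""
    means = {item: weighted_mean(freq) for item, freq in item_frequencies.items()}
    threshold = sum(means.values()) / len(means)
    labels = {
        item: "High Perception" if mean > threshold else "Low Perception"
        for item, mean in means.items()
    }
    return means, threshold, labels

# Hypothetical response counts per item, ordered as (N, H.E, O, Q.O, F, N.AT).
responses = {
    "AV1": [10, 20, 30, 20, 10, 10],
    "AV2": [50, 30, 10, 5, 3, 2],
}
means, threshold, labels = classify_items(responses)
print(means, round(threshold, 3), labels)
```

With these hypothetical counts, the item means are 3.30 and 1.87, the threshold is 2.585, and the first item falls in the "High Perception" group while the second falls in the "Low Perception" group.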

1. High Perception Criteria (AV1, OV2, OV4, OV8, E1, E2, E3, E4, E7, L4, L6): On average, respondents have a "High Perception" regarding these aspects of driver behavior. That is, respondents believe that in these areas there is a higher likelihood of drivers not following the rules or exhibiting poor behavior.
2. Low Perception Criteria (AV2, AV3, OV1, OV3, OV5, OV6, OV7, E5, E6, L1, L2, L3, L5, L7): For these criteria, respondents hold a "Low Perception". That is, respondents believe that in these areas drivers are more likely to follow the rules and behave well.

In short, the data analysis suggests that respondents perceive rule violations or mismanagement as more likely for the "High Perception" criteria, while they believe that drivers are more likely to follow rules and exhibit good behavior for the "Low Perception" criteria. These findings can help identify areas where improvements or interventions may be needed to enhance overall driver behavior and promote adherence to rules.
Note: N = never, H.E = hardly ever, O = occasionally, Q.O = quite often, F = frequently, and N.AT = nearly all the time.

Machine Learning Model Results and Discussion
ML techniques were applied to the information gathered through the survey in order to predict driving behavior directly, yielding a measure of "driver behavior". In this process, Fuzzy AHP is first used to rank the criteria relevant to driver behavior. Subsequently, ML algorithms are applied to the dataset, which consists of 25 features, to predict driver behavior on a scale ranging from one to six.
The current study benchmarked several ML models on the prelabeled dataset. The evaluation results presented in Table 12 show accuracies ranging from 66.30% to 84.40%, suggesting that the stronger models could be suited to assessing new data on driver behavior.

Starting with the NB model, it exhibits a validation loss of 1.92, an accuracy of 66.30%, a precision of 0.60, an F1 score of 0.38, and a recall of 0.46, as shown in Table 12. These metrics suggest that the NB model has moderate accuracy but struggles with precision and F1 score, indicating challenges in correctly classifying positive cases, although it captures a reasonable portion of true positive cases. The DT classifier shares a validation loss of 1.92 with NB but performs slightly better, with an accuracy of 68.40%, a precision of 0.67, an F1 score of 0.67, and a recall of 0.67. This model achieves a better balance between precision and recall than NB. The SVM model outperforms the previous two with a lower validation loss of 1.28, a higher accuracy of 76.08%, a precision of 1.00, an F1 score of 0.76, and a recall of 0.75, making it effective in classifying positive cases. The RF model shows a validation loss of 1.09, an accuracy of 80.30%, a precision of 0.84, an F1 score of 0.72, and a recall of 0.77. It demonstrates a strong balance between precision and recall, making it an effective choice for classification tasks. Finally, the ensemble model stands out with the lowest validation loss of 0.70, an accuracy of 84.40%, a precision of 0.84, an F1 score of 0.95, and a recall of 0.89. This model combines the strengths of multiple models (SVM, DT, RF), resulting in the best performance in classifying positive cases. The validation loss values across these ML algorithms suggest the absence of overfitting or underfitting during the training process. These insights collectively highlight the strengths and trade-offs of each algorithm in classifying driver behavior data.

This analysis was performed using a real-world dataset, and multiple performance metrics were considered, including accuracy, precision, recall, and F1 score. The results indicate that the ensemble model attained the highest accuracy at 84.4%, outperforming the other models. Furthermore, the ensemble model exhibited superior precision and recall values, signifying its effectiveness in correctly classifying both positive and negative instances.

These results underscore the trade-offs inherent in selecting an ML model, which involve model complexity, interpretability, and predictive performance. The choice of the most suitable model hinges on the specific requirements of the application at hand. Where high accuracy is paramount and computational resources are plentiful, the ensemble model is a strong option. Conversely, when interpretability and the ability to explain decisions to stakeholders are crucial, the DT model proves valuable. RF offers a robust middle ground, balancing performance and interpretability while mitigating the risk of overfitting. SVM, despite its strength in high-dimensional spaces, may not be the most judicious choice in scenarios constrained by computational resources.
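For readers who wish to see how the reported metrics relate to one another, the following sketch computes accuracy together with per-class and macro-averaged precision, recall, and F1 score for a multi-class problem. The labels are hypothetical, and macro averaging is an assumption, since the averaging scheme used for the values above is not stated here.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))

# Hypothetical true and predicted classes for six samples and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = per_class_metrics(y_true, y_pred, labels=[0, 1, 2])
macro_p, macro_r, macro_f1 = macro_average(metrics)
print(accuracy(y_true, y_pred), macro_p, macro_r, macro_f1)
```

Because each per-class F1 score is the harmonic mean of that class's precision and recall, inspecting the per-class values alongside the averages helps to show which behavior classes drive the overall figures.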

Receiver Operating Characteristic (ROC) Curve
The AUC (Area Under the Curve) values associated with each class in the ROC curves of a classification model offer insight into the model's performance and its ability to differentiate between classes. The AUC values in Figure 5 indicate the model's quality and its capacity to make class distinctions:
1. Never: Class 1 (AUC = 0.95): An AUC of 0.95 for Class 1 signifies the model's effectiveness in accurately identifying instances of Class 1 and distinguishing them from other classes.

Higher AUC values are associated with better performance, suggesting a greater ability of the model to distinguish that particular class. In this context:
• The occasionally class exhibits the most outstanding model performance.
• The never, hardly ever, and quite often classes also demonstrate strong performance.
• The frequently class's performance is reasonable, though not as robust as that of the other classes.
• The nearly all the time class stands out with a perfect AUC, showcasing exceptional model performance in identifying Class 6.
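A per-class AUC of this kind can be obtained by treating one class as positive and all others as negative. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one); the labels and scores are hypothetical and are not the values behind Figure 5.

```python
def binary_auc(scores, is_positive):
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, class_scores, target):
    """AUC for one class, treating every other class as negative."""
    return binary_auc(class_scores, [t == target for t in y_true])

# Hypothetical true classes and model scores for class 0 on six samples.
y_true = [0, 0, 1, 2, 1, 0]
score_class0 = [0.9, 0.6, 0.7, 0.2, 0.1, 0.8]
auc0 = one_vs_rest_auc(y_true, score_class0, target=0)
print(round(auc0, 4))  # 8 of 9 positive-negative pairs are ordered correctly
```

An AUC of 1.00 therefore means that every sample of the class is scored above every sample of the other classes, while 0.5 corresponds to random ordering.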

Precision-Recall Curve
These Average Precision (AP) scores correspond to the precision-recall curves for the various classes shown in Figure 6:
• The occasionally class exhibits the highest precision-recall performance with an AP of 0.94, indicating that the model effectively balances precision and recall for this class.
• The never class and hardly ever class also display strong performance with AP values of 0.93 and 0.80, respectively. These classes demonstrate a robust trade-off between precision and recall.
• The quite often class achieves moderate performance with an AP of 0.66, indicating a reasonable balance between precision and recall but not as strong as Classes 1, 2, and 3.
• The frequently class has the lowest performance with an AP of 0.23, suggesting that the model encounters challenges in achieving both high precision and high recall for this class.
• The nearly all the time class stands out with a perfect AP of 1.00, indicating that the model attains the highest precision while maintaining full recall for this class.

In short, these AP scores provide insights into the precision-recall performance for each class. Higher AP values imply better trade-offs between precision and recall, indicating the model's effectiveness in classifying those classes.
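As a minimal sketch of how an AP score summarizes a precision-recall curve, the snippet below ranks samples by score and averages the precision obtained at the rank of each positive sample. The scores and labels are hypothetical, and the sketch assumes there are no tied scores.

```python
def average_precision(scores, is_positive):
    """AP: mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, flag) in enumerate(ranked, start=1):
        if flag:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall step
    return sum(precisions) / hits if hits else 0.0

# Hypothetical one-vs-rest scores for a single class on five samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
is_positive = [True, False, True, True, False]
ap = average_precision(scores, is_positive)
print(round(ap, 4))  # (1/1 + 2/3 + 3/4) / 3
```

An AP of 1.00 arises only when every positive sample is ranked above every negative one, whereas a low AP indicates that positives are scattered among higher-scored negatives.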
The superior accuracy of the ensemble model can be attributed to its ability to leverage the diverse strengths and perspectives of each constituent model. By combining the predictions of the SVM, DT, and RF models, the ensemble model achieved more accurate predictions than any individual model. This highlights the advantage of ensemble models in achieving higher accuracy and improving the robustness of predictions.
Driving behaviors were categorized based on levels of hostility, which were separated into six classes: never (0), hardly ever (1), occasionally (2), quite regularly (3), regularly (4), and nearly always (5). The model was trained on the dataset, and its outputs were used to classify and explain driving behaviors in several respects. When the predicted outcome for a specific instance is 0, the driver's behavior is primarily good and exhibits few signs of hostility. In contrast, if the predicted outcome is 3, the driver's overall behavior is assessed as "Quite regularly". An output of 3 implies that the driver's behavior falls in the middle of the scale, showing a moderate frequency of risky driving behaviors across all criteria and sub-criteria. The classification system thus provides an overview of the range of driving behavior and enables a detailed examination of the different levels of aggression displayed by drivers.
Building upon the foundation laid by previous studies that employ state-of-the-art ML for evaluating driver behavior, this research adds to a rapidly advancing field. For instance, one study evaluated driving behavior using real vehicle experiments with 16 drivers, validating aggressive driving behaviors. That study employed the Sum Rule classifier and achieved an F1 score of 90.02% [86]. Another study developed an ensemble learning method for vehicle behavior prediction, where both single-task and multi-task approaches generally outperformed others, with accuracies ranging from 0.72 to 0.94 [87]. Similarly, an urban driving perception method was developed using high-resolution maps and data-driven models, which enhanced maneuver prediction accuracy by up to 56% compared to a comparative approach [88]. Additionally, a comprehensive study contributed to driving behavior research by referencing various studies and models and presented a comparison table of approaches and algorithms for driving behavior recognition, with accuracies ranging from 73% to 99.7% [89]. Moreover, a study focused on driving behavior modeling in Mexico using ML models reported accuracies ranging from 0.82 to 0.96, offering insights into predicting driving behavior within the Mexican context [90].
The findings of the current study demonstrate the potential of ensemble models, which combine SVM, DT, and RF, to enhance prediction accuracy. The ensemble model achieved the highest accuracy, reaching 84.4%, thereby surpassing the other models. Furthermore, the ensemble model exhibited superior precision and recall values, emphasizing its effectiveness in classifying both positive and negative instances. These results underscore the value of ensemble models as a robust approach for enhancing prediction accuracy in ML tasks aimed at predicting risky driving behaviors.

Conclusions
This study provides valuable insights into the factors that contribute to driver behavior and improved traffic management in Peshawar.Fuzzy AHP is used to rank the driver behavior with pairwise comparison and sub-criteria subsequently.Additionally, ML is used to predict the driver behavior while using different models.
The study identified three main criteria for assessing driver behavior: violations, errors, and lapses.Violations involve intentional disregard for safety laws, such as careless driving or speeding.Errors are unintentional mistakes, like misjudging distances or neglecting blind spots.Lapses refer to instances of inattention or forgetfulness, such as forgetting to signal or check blind spots.The study also found specific sub-criteria within each category, highlighting common behaviors, such as overtaking slow drivers, changing lanes without checking mirrors, and using third gear away from traffic lights.Clear road signage and driver attention were emphasized as important factors for improving behavior.
The results obtained from ML techniques contribute to the prediction of driver behavior.ML itself is employed as a means to forecast and anticipate driver behavior patterns.The aim of this method is a thorough assessment of several ML models that have been specially adapted for the task at hand.With an impressive accuracy rate of 76.8%, the results place the SVM model in the lead.The DT classifier obtains an accuracy of 68.4%, slightly better than the NB model's 66.30%, which follows closely behind.With a high accuracy rate of 80.30%, the RF classifier stands above its peers.An ensemble model is built by combining the capabilities of the SVM, DT, and RF models in order to further improve the forecast accuracy.The accuracy rating for this ensemble model is significantly higher at 80.40%.The majority of a driver's behavior is positive and shows little signs of antagonism, as is the case when the predicted result for a particular incident is 0. If the predicted outcome, however, is four, it indicates that the driver regularly exhibits violent behavior in a range of circumstances.The classification system provides a thorough overview of the range of driving behavior and enables a thorough examination of the different levels of aggression displayed by drivers.
The present study has certain limitations, particularly in how it reached drivers in Peshawar. As many taxi drivers in the area lack formal education, an online survey administered through Google Forms may not accurately capture their driving behavior. Future studies should collect data through a paper-based survey that specifically targets local taxi drivers, given the significant number of uneducated drivers in this occupation in Peshawar. One potential approach is to apply Structural Equation Modeling (SEM), which can capture latent attributes, and to revise the survey based on the SEM results so that it reflects more realistically the relationships among the factors affecting driving behavior.
This study provides a baseline for policymakers, stakeholders, and government bodies to formulate effective, evidence-based policies for sustainable urban mobility. Traffic management authorities can use the results to develop targeted interventions and policies that address these behaviors, ultimately leading to improved road safety, fewer traffic incidents, and better prediction of driver behavior. The study suggests that "honking and indicating aggression towards other drivers" is an important area of focus for aggressive violations, while "overtaking slow drivers", disregarding speed limits on residential roads and motorways, and pulling out of junctions at which other drivers have stopped are areas of focus for ordinary violations. The study also highlights the critical errors committed by drivers in Peshawar that can be detrimental to traffic safety. These findings can be used to prioritize road safety measures and interventions that address the most pressing issues. Ultimately, a collaborative effort between researchers, policymakers, and relevant authorities can lead to significant improvements in road safety and overall transportation efficiency in Peshawar.

Figure 2. The hierarchical structure of the driver behavior criteria.

Figure 3. The intersection of fuzzy numbers.

Figure 4. Workflow chart of machine learning.

$$\text{Weighted average value} = \frac{\text{Sum of mean values}}{\text{Total number of items}} = \frac{65.8137}{25} \approx 2.63$$

Figure 5. Receiver Operating Characteristic (ROC) curve.

The AUC values indicate the model's proficiency in classifying each class. A higher AUC value means that the model is better able to distinguish that particular class from the others. In this context:

2. Hardly ever: Class 2 (AUC = 0.91): With an AUC of 0.91, the model demonstrates a strong ability to differentiate Class 2 from the other classes, indicating good performance.
3. Occasionally: Class 3 (AUC = 0.99): The remarkably high AUC of 0.99 highlights the model's exceptional proficiency in recognizing Class 3 and distinguishing it from the other classes.
4. Quite often: Class 4 (AUC = 0.93): The AUC of 0.93 reflects the model's competence in distinguishing Class 4 from the other classes, indicating a good level of performance.
5. Frequently: Class 5 (AUC = 0.85): The AUC of 0.85 suggests that the model is reasonably effective in distinguishing Class 5 from the other classes, indicating decent performance.
6. Nearly all the time: Class 6 (AUC = 1.0): An AUC of 1.0 signifies that the model identifies Class 6 and distinguishes it from the other classes without error.
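Per-class AUC values such as those reported with Figure 5 are commonly obtained with a one-vs-rest reading of a multiclass problem, where each class in turn is treated as the positive class. The sketch below assumes that scheme, which this section does not confirm. The scores are illustrative, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    y_true: Sequence[int], class_scores: Dict[int, List[float]]
) -> Dict[int, float]:
    """AUC per class, treating that class as positive and all others as negative."""
    return {
        c: binary_auc([1 if y == c else 0 for y in y_true], scores)
        for c, scores in class_scores.items()
    }


# Illustrative (assumed) true classes and per-class model scores for six samples.
y_true = [1, 1, 2, 2, 3, 3]
class_scores = {
    1: [0.9, 0.6, 0.3, 0.7, 0.1, 0.2],
    2: [0.1, 0.2, 0.8, 0.7, 0.3, 0.1],
    3: [0.0, 0.2, 0.1, 0.1, 0.6, 0.7],
}
auc = one_vs_rest_auc(y_true, class_scores)
print(auc)
```

An AUC of 1.0 for a class means that every sample of that class received a higher score than every sample of the other classes. With few samples in a class this can happen easily, so a perfect AUC is best read alongside the class size.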

Table 1. Sample characteristics of participants.

Table 4. TFN of linguistic comparison matrix.

Table 5. Priority ranking of criteria for driver behavior.

Table 6. Priority ranking of criteria for aggressive violations.
Note: AV stands for aggressive violations.

Table 7. Priority ranking of criteria for ordinary violations.
Note: OV stands for ordinary violations.

Table 10. Driver behavior ranking by different age groups.

Table 12. Accuracy of machine learning models.
Note: accuracy, precision, recall, and F1-score are common metrics employed in classification tasks to assess the performance of each model.