Improving the Level of Responsibility Classification for Pedestrian Crashes with the Multilayer Perceptron Model

Moreno-Sanfélix, Alejandro; Gragera-Peña, F. Consuelo; Jaramillo-Morán, Miguel A.

doi:10.3390/urbansci10020068

by Alejandro Moreno-Sanfélix ¹,²,*, F. Consuelo Gragera-Peña ¹ and Miguel A. Jaramillo-Morán ¹

¹ Escuela de Ingenierías Industriales, Universidad de Extremadura, Avenida de Elvas s/n, 06006 Badajoz, Spain

² Judicial Traffic Police of the Local Police of Badajoz, St. Gaspar Méndez, 2, 06011 Badajoz, Spain

* Author to whom correspondence should be addressed.

Urban Sci. 2026, 10(2), 68; https://doi.org/10.3390/urbansci10020068

Submission received: 10 December 2025 / Revised: 15 January 2026 / Accepted: 20 January 2026 / Published: 23 January 2026

Abstract

Pedestrian crashes cause the most injuries of all types of traffic crashes. Despite their direct judicial and societal impact, the automatic classification of legal responsibility remains largely unexplored. This work addresses this gap by formulating the responsibility assessment problem as a supervised multi-class classification task and proposing a Multilayer Perceptron (MLP) based decision-support system. The objective is to establish the basis for a “robot judge” application that assists the Judicial Traffic Police (JTP), Courts, and Prosecutors in identifying cases with a clear level of responsibility in pedestrian crashes. This study draws on real-world data from reports by the Local Police of Badajoz (LPB) and judicial decisions of the Spanish Judiciary (SJ). After rigorous data preprocessing, 14 meaningful binary variables were identified. The level of responsibility in a pedestrian crash depends on these 14 variables, which constitute the feature space used to model responsibility as a five-category output variable. Using the MLP model, we were able to reclassify the categories of each pedestrian crash and improve the metrics, so that more precise levels of responsibility could be determined. This would help the JTP and the Courts make more efficient and objective final decisions in similar cases. It would also enable them to focus their efforts on more complex cases requiring further investigation by human specialists. In turn, policymakers could take new measures to reduce pedestrian crashes by analyzing influential variables.

Keywords:

traffic; responsibility; pedestrian crash; multilayer perceptron; robot judge

1. Introduction

The number of people killed or injured in road traffic accidents worldwide is increasing every day [1]. Recent studies estimate that road traffic crashes will be the fifth leading cause of death worldwide by 2030 [2]. Pedestrians tend to be the most injured of all road users [3,4]. It is necessary to reduce the impact of accidents, prevent them, and determine the real responsibility of each person involved in these events. Despite various road safety interventions, the prevention of fatal and serious traffic accidents remains a significant challenge, causing human suffering and economic costs.

Table 1 shows statistics from 2018 to 2024 from regions such as the United States, Europe, and Asia [5,6,7,8]. These statistics are alarming, and all authorities must make greater efforts to reduce these figures. Understanding the multicausal nature of road crashes, where multiple conditions and factors interact, is critical [9].

To address these issues, Artificial Intelligence (AI) is beginning to be implemented as an innovative tool that allows these problems to be tackled in a more accurate and efficient manner. AI is also being used to analyze traffic accidents. It is a powerful tool for assessing road safety and identifying factors that should be addressed by policymakers [10]. AI has also been used to predict traffic injuries and severity in transportation studies [11,12]. Other studies try to predict where new accidents may occur [13,14].

Focusing on pedestrian crashes, there are also many works in the scientific literature that analyze this type of traffic accident using AI-based methods. Subasish, D. et al. (2021) [15] reviewed the characteristics of pedestrian fatalities in the US and the associated crash scenarios. They applied rule mining to four subgroups identified based on the frequency of fatal crash scenarios. Yasir, A. et al. (2023) [16] proposed a novel, real-time, vehicle-pedestrian crash risk modeling framework for signalized intersections. Using the Bayesian generalized extreme value modeling approach, they estimated crash risk in real time from traffic conflicts captured by post-encroachment time. The proposed framework processed 144 h of video data of traffic movement from three signalized intersections in Queensland, Australia. Sengupta, A. et al. (2024) [17] evaluated the reliability of automatically generated surrogates in predicting confirmed conflicts without human supervision. They employed advanced data-driven models, such as logistic regression and tree-based algorithms, to examine the distinctions between significant variables in identifying bicycle and pedestrian conflicts. Eaysir, A. et al. (2025) [18] proposed a game-theoretic approach that frames interaction as a simultaneous, non-cooperative, two-player game at unsignalized pedestrian crossings in Brisbane, Australia. Videos were analyzed using AI and modeled using binary logit models to understand pedestrian crossing and yield decisions. Qingwen, P. et al. (2025) [19] aimed to model the collision avoidance behavior of vehicles interacting with pedestrians in near-miss scenarios, contributing to the development of collision avoidance systems and safety-conscious traffic simulations. The researchers used unmanned aerial vehicles to collect high-resolution vehicle-pedestrian trajectory data at urban intersections. Then, they developed an effective algorithm that considers predicted trajectories and collision types to compute CurvTTC. Li, X. et al. (2025) [20] proposed a novel dual social graph attention network (DSGAT) that systematically models multi-level interactions for pedestrian trajectory prediction. This framework is specifically designed to enhance the extraction of pedestrian interaction features within the environment, thereby improving the trajectory prediction accuracy. Kim, M. et al. (2025) [21] studied the connection between aggressive driving and traffic accidents involving elderly pedestrians. They applied eXplainable Artificial Intelligence (XAI) methods in a study conducted in Seoul, South Korea.

The investigation of any traffic accident is a complex process involving several influencing factors. This is the kind of problem for which AI is particularly well adapted, a fact that has led many researchers to use it as the tool of choice for traffic accident analysis. Likewise, the judicial process is also a complex task that requires a precise vision and in-depth analysis of each case [22]. For this reason, the work of the members of the judiciary (e.g., the Judicial Traffic Police (JTP), judges, and prosecutors) is arduous [23,24]. Due to this complexity, there is great interest in the use of AI-based automation tools in judicial processes to improve efficiency [22]. There are growing expectations for the creation of robotic judges (also known as “AI judges” and “algorithmic judges”) that can replace human judges and make decisions automatically based on large amounts of data [25,26]. The concept of a “robot judge” or “AI judge” is still new, but there are signs that it is steadily growing. Many studies and efforts are underway [27].

In this work, AI tools and the concept of the “robot judge” are used to determine the level of responsibility of the actors involved in a pedestrian crash (pedestrian and driver) by analyzing the influencing variables. AI can make decisions quickly in areas involving millions of variables, or help determine an appropriate punishment for a crime [28]. This is where the concept of the “robot judge” comes in. It is beginning to be mentioned in the scientific literature, and it will help both the JTP and the courts to determine civil or criminal liability with responsibility rates that are as close to reality as possible. Moreover, the JTP and judges may benefit from this technology as a tool for reducing effort costs and increasing accuracy [29].

The integration of AI into judicial systems offers significant benefits in terms of efficiency, access to justice, and accuracy of decision-making [30]. All of these advances have already had an impact on the way different countries approach their judicial processes [31,32]. However, human judges must also develop and adapt their roles to these new AI systems in order to build alliances and expand judicial efficiency, increase their effectiveness, and improve the quality of their final decisions [24].

In this line of research on “robot judges”, there are works such as that of Sourdin, T. (2018) [33], which notes that the impact of AI on the justice system is significant, with the technology helping to inform, support and advise people involved in the justice system, replacing functions and activities previously performed by humans, and ultimately changing the way judges work and deliver very different forms of justice. Ulenaers, J. (2020) [34] examined the potential impact of AI on the right to a fair trial, focusing on how it is used in the courtroom. On the one hand, judges’ decision-making processes can be assisted by “AI assistants” through the prediction and preparation of judicial decisions. On the other hand, “robot judges” have the capacity to substitute for human judges, determining cases independently in court proceedings that are entirely automated. Chronowski, N. et al. (2021) [35] conducted an in-depth analysis of the practical implementation of AI as a tool to assist human judges, as a means to assist in the drafting of judicial decisions, and as an automated judge. Barysė, D., and Sarel, R. (2023) [29] investigated how society perceives the utilization of algorithms by judges in judicial proceedings. The process was divided into four stages: information acquisition, information analysis, decision selection, and decision implementation. Matić-Bošković, M. (2024) [30] addressed the question of whether the use of AI by judicial professionals poses a threat to the right to a fair trial. The study also examined whether some of the tools that have been introduced, especially those that predict court decisions and recidivism, go against fundamental rights. Jin, Y., and He, H. (2020) [22] examined and evaluated the AI-based automation program implemented in China’s judicial system, highlighting that the extraction of information and the generation of rationales for judges might represent the subsequent phase in the implementation of AI-based automation instruments within the judicial apparatus. Aboelazm, K.S. et al. (2024) [27] examined the controversy surrounding the replacement of human judges with robotic judges supported by AI technology, powerful algorithms and big data. They used an analytical approach to extract the working mechanisms of AI in this area and identified cases of bias based on practice and the experience of some countries, such as the United States of America and China. Watamura, E. et al. (2025) [36] investigated whether jurors are more inclined to acquiesce to the judgments rendered by human judges or AI, particularly in cases involving mitigating circumstances in which human-like reasoning may be esteemed. Fine, A. et al. (2025) [37] examined the role of AI in judicial decision-making, focusing on bail and sentencing contexts.

As previously mentioned, many studies use AI models to analyze the most influential variables in road crashes. These works primarily focus on analyzing injury severity among the actors involved or on predicting crash occurrence in order to identify high-risk locations, commonly referred to as “black spots”. Other studies apply multiclass classifiers based on deep learning and explainable AI to analyze the importance of each feature, interpret results, and offer reliable tools to guide future policy development and decision-making for improving road safety [13,38,39,40,41].

However, the scientific literature contains very limited research on the automatic classification of responsibility levels in pedestrian crashes based on influential variables using AI-based approaches. These techniques can be employed to facilitate the work of judges and police officers when analyzing each specific case and are analogous to a robot judge. In this context, the Multilayer Perceptron (MLP) model-based classifier has the potential to provide critical information, automate decision-making, and assist in selecting multiple alternatives [42]. Nevertheless, as discussed in this work, there are cases where the results obtained require review by a human expert because they are not clearly defined, raising doubts. This makes the work useful as a support tool for research and legal adjudication [35]. Consequently, policymakers and courts should deliberate on the prospective hazards of algorithmic bias and the necessity for stringent oversight mechanisms to guarantee the ethical integration of AI into the legal framework [36]. In this way, could the arduous and complex work of the authorities responsible for determining responsibility in traffic crashes be optimized? Would these investigations improve traffic safety in our cities by focusing on the most influential variables?

Figure 1 shows a research flow chart of the present work. It provides a clear overview of our study. The process begins with the identification of the problem. Next, real and reliable data are collected from authorized databases, namely those of the Local Police of Badajoz (LPB) and the Spanish Judiciary (SJ). The data are preprocessed to determine the 14 input variables and five output categories. Then, the data are processed using the MLP75-25 model to analyze the metrics and their output parameters. After identifying the match and mismatch patterns, the data are reprocessed and optimized by determining different reliability bands (Band 1/Reliable, Band 2/Unreliable, and Band 3/Unclear) and reassigning mismatches. Finally, the optimized MLP75-25.opt is evaluated, and key insights and practical implications are discussed.

The rest of the paper is organized as follows. Section 2 explains and analyzes material and methods. Section 3 provides the results, while Section 4 discusses all the results. Finally, the conclusions are presented in Section 5.

2. Material and Methods

2.1. Project Summary

The objective of the present study is to establish the foundation for the development of an application analogous to a “robot judge” that can help JTP, courts, and prosecutors to identify cases where the level of responsibility is clearer and dedicate their time and effort to complex and challenging cases, leading to greater efficiency and effectiveness in their decision-making in the sense that Volok points out [43]. The result is a robust algorithm that, when implemented with big data, will improve its capacity and help reduce the cost of adjudications [27]. As a result, responsibility is automatically assigned to each person involved in the crash, making the process more objective and free from human subjectivity and bias. In other words, decisions are independent of subjective criteria, such as those of police officers and judges, who may be influenced by criteria unrelated to the objectivity of the process when classifying the level of responsibility, depending, for example, on the zone of the city, the economic capacity or fame of the people involved, or other non-objective factors. The final decision that determines the level of responsibility is not based on the severity of the victims’ injuries, either. Otherwise, the analysis of responsibility would also be subjective if injury severity determined the final decision. If human interpretation is required, a recommendation will be issued. This also enables local governments to implement new, efficient measures to reduce traffic crashes, as the study of pedestrian crash variables will be objectively analyzed.

Law enforcement agencies, including JTP and courts, will be equipped with the capacity to systematically and objectively examine pedestrian crashes. This will facilitate the identification of crashes where the level of responsibility is evident and those where it is ambiguous, enabling these agencies to concentrate their efforts on the latter category. Law enforcement will produce more accurate reports, so judicial authorities will no longer require in-person attendance at straightforward cases for clarification or ratification. Law enforcement will only attend trials for complex cases where the level of responsibility is unclear and requires a more thorough analysis. The final decisions of law enforcement, first, and the courts, second, will be consistent with these policies. Consequently, works of this nature can play a pivotal role in supporting human judges, as their function is not to substitute for but rather to assist them. This contributes substantially and effectively to the objectivity and impartiality of judicial rulings, the quality of these rulings, and the conservation of time and effort [27]. Moreover, it has the ability to prevent inconsistent final decisions in analogous cases and eliminate the uncertainty arising from the discretion of the human judge [44]. As Gabriel, I. (2022) [45] asserts, “similar cases should be resolved in a similar way”.

2.2. Data

Data related to pedestrian crashes were extracted from the reports of the JTP of the LPB in Spain and judicial decisions of the SJ from 2015 to 2024. The selected pedestrian crashes are those that resulted in at least one victim (minor, serious, or fatal) in Badajoz, and the case was tried in court. To ensure data homogeneity, pedestrian crashes involving more than one victim (whether minor, serious, or fatal) were excluded. Crashes involving more than one vehicle or driver were also excluded. These crashes are both less common and more complex to analyze. Additionally, the selected crashes are closed cases with no possibility of further legal action.

This study examined 510 crashes involving pedestrians. Of these, 428 were selected from the JTP of the LPB and 82 from the SJ. The analysis identified the most important factors that the JTP and the Courts had considered when determining the degree of liability. The level of responsibility determined in the JTP reports (428) corresponds with the rulings of the Courts of First Instance. The remaining 82 pedestrian crashes were selected from Courts with a higher level of hierarchy than the Courts of First Instance. In both cases, the samples are authentic, which ensures the validity of the results. The samples are also recognized as being highly credible in road traffic safety research.

All information was organized into an analysis matrix to identify the variables present in both court rulings and traffic police reports. Since the information in the original dataset (LPB and SJ) was presented as binary values (“yes” or “no”), these variables were also assigned binary values (“1” or “0”). This assignment is suitable for the problem at hand because assigning non-binary values would result in more complex models susceptible to subjective human interpretation. Thus, using binary values makes it easier to identify their influence on levels of responsibility.

Of all the variables identified, 14 were common to all 510 pedestrian crashes. These 14 variables were used to measure the liability index of each party involved. Therefore, we chose these 14 variables to include in the study. This set of variables represents the most important factors in determining the responsibility levels of those involved in pedestrian crashes. Each variable was assigned a binary value.

Table 2 shows the 14 selected binary variables and their assigned values (0 or 1). These variables are related to the four subsystems involved in a traffic accident, as established in the Sequential Event Model (MOSES) [46] and in the improvement of the Sequential Event Model (i-MOSES) [47], which are used to investigate and reconstruct traffic crashes based on the sequential event conceptual system. Both models are formulated from General Systems Theory and Accident Evolution Theory, where any road accident is divided into phases (pre-trip, trip, pre-impact, impact, and post-impact), and each phase can be considered to be composed of four subsystems (the human subsystem, the technological subsystem, the normative subsystem, and the structural subsystem). The human subsystem has six variables: H-1, attention to driving, determined by the coincidence of the possible perception position (PPP) and the real perception position (RPP); H-2, reaction time (RT) of the normal person; H-3, alcohol in driver; H-4, drugs in driver; H-5, alcohol in pedestrian; and H-6, drugs in pedestrian. The technological subsystem has two variables: T-1, periodic technical inspection of the vehicle, and T-2, pedestrian clothing. The normative subsystem has four variables: N-1, driving license; N-2, speed limit; N-3, driving using mobile; and N-4, crossing using mobile or with music headphones. Finally, the structural subsystem has two variables: S-1, location of the pedestrian crash, and S-2, lighting conditions. Fog, glare, slippery surfaces, and other factors are not considered influential in determining responsibility because police reports and court rulings do not address them. According to current regulations, road users (drivers and pedestrians) must take all necessary precautions in adverse conditions, including stopping the vehicle or leaving the road. Therefore, these factors do not absolve any party of responsibility.
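As an illustration of the binary encoding described above, the following minimal sketch maps a set of “yes”/“no” answers to a 14-component input vector. The variable codes are those listed in the text; the function name `encode_crash`, the ordering of the codes, and the example answers are assumptions introduced here for illustration, and the meaning of “yes” for each variable is the one fixed in Table 2, which is not reproduced in the code.

```python
# Variable codes as listed in the text (human, technological, normative, structural).
# The order of the codes within the vector is an assumption made for this sketch.
VARIABLE_CODES = [
    "H-1", "H-2", "H-3", "H-4", "H-5", "H-6",
    "T-1", "T-2",
    "N-1", "N-2", "N-3", "N-4",
    "S-1", "S-2",
]


def encode_crash(answers):
    """Map a dict of 'yes'/'no' answers to a 14-component binary vector.

    'yes' is encoded as 1 and 'no' as 0, as stated in the text.
    """
    missing = [code for code in VARIABLE_CODES if code not in answers]
    if missing:
        raise ValueError(f"missing variables: {missing}")
    return [1 if answers[code] == "yes" else 0 for code in VARIABLE_CODES]


# Invented example: every variable answered 'no' except H-3 and N-2.
example_answers = {code: "no" for code in VARIABLE_CODES}
example_answers["H-3"] = "yes"
example_answers["N-2"] = "yes"

pattern = encode_crash(example_answers)
print(pattern)  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

Keeping the code order fixed matters more than the particular order chosen, since every crash must place the same variable in the same vector position.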

Table 3 shows the five levels of responsibility for drivers and pedestrians. These levels are based on the reports from the JTP of the LPB and Court decisions from the SJ. In category A, the level of responsibility in a pedestrian crash would be 100% driver and 0% pedestrian, while in category B it would be the opposite, 0% driver and 100% pedestrian responsibility. In category C the level of responsibility would be 75% driver and 25% pedestrian, while in category D it would be the opposite, 25% driver and 75% pedestrian responsibility. Finally, in category E the driver and pedestrian are considered equally responsible for the crash, 50% each.

The legal standards for assigning responsibility may differ in other jurisdictions, which could limit the generalizability of this research. This would necessitate adapting the research to each specific regulatory system. For example, speed and alcohol concentration limits vary by country. Furthermore, the assigned responsibility percentages could also vary. For example, they could be 60-40/40-60 or 80-20/20-80 instead of 75-25/25-75. However, categories A, B, and E would be fixed.
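The adaptation to other jurisdictions mentioned above can be sketched as a small configuration function in which only the intermediate split changes while categories A, B, and E remain fixed. The function name `responsibility_levels` and its parameter `major_share` are hypothetical names used only for this example.

```python
def responsibility_levels(major_share=75):
    """Return {category: (driver %, pedestrian %)} for a given intermediate split.

    Categories A, B, and E are fixed; only C and D depend on `major_share`
    (75 in this study; 60 or 80 are the alternatives mentioned in the text).
    """
    if not 50 < major_share < 100:
        raise ValueError("major_share must lie strictly between 50 and 100")
    minor_share = 100 - major_share
    return {
        "A": (100, 0),
        "B": (0, 100),
        "C": (major_share, minor_share),
        "D": (minor_share, major_share),
        "E": (50, 50),
    }


levels = responsibility_levels()           # 75-25 / 25-75, as in Table 3
alternative = responsibility_levels(60)    # 60-40 / 40-60 variant
print(levels["C"], alternative["C"])
```

Under this sketch the number of output categories stays at five in every variant, so the classifier structure would not need to change, only the interpretation of categories C and D.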

2.3. Methodology

2.3.1. The MLP Neural Network Model

In this work, a very simple but powerful model was chosen to perform a supervised classification process: the well-known MLP [48]. This classifier was chosen because it is ideal for identifying the categories into which the data must be classified (five well-defined categories, as mentioned above). Furthermore, the MLP model has the advantage of allowing adjustments since it does not provide the classification directly like other methods do. It provides degrees of membership, and we can adjust the membership to the classes by manipulating them.

Additionally, it can be stated that the MLP is a suitable alternative to a rule-based expert system that the nature of the problem at hand may suggest: a classification of accidents based on whether or not certain items are accomplished. Defining such a system with a high number of variables to be considered (14 in this case) will be difficult because it will require knowledge of legal rules and an extensive trial-and-error process to develop the classification tool. The MLP model provides a reliable classification tool by learning from data without human intervention through an automated and fast algorithmic process.

The MLP is well-suited for classification tasks because it can learn the organization of a dataset by adjusting its internal parameters (weights) during the training phase, in which each element of the dataset is assigned to one of several predefined categories (supervised learning). Provided that enough data is used to train the network and its structure is correctly defined, it can provide very reliable classifications [49,50,51]. In fact, an MLP can approximate any continuous function with a single hidden layer containing enough neurons [52,53].

The MLP model consists of multiple layers of neurons, and each neuron is connected to all neurons in the preceding layer. The model consists of an input layer, one or more hidden layers, and an output layer. The input layer does not represent an actual layer. It represents the set of external inputs. The hidden layers process all the outputs each receives from the preceding one. The output layer provides the network’s prediction or output [54]. Each neuron performs the weighted sum of all the inputs it receives, which is modified by a simple activation function [55].

y_{k} = \sigma\left(\sum_{j=1}^{n} w_{kj} x_{j} + \theta_{k}\right) \qquad (1)

In this equation x_j represents the j-th input of the k-th neuron, w_kj stands for the strength (synaptic weight) of the connections between the k-th neuron and the j-th one in the previous layer, y_k is the neuron’s output and θ_k represents a bias constant. σ(.) is the neural activation function that provides its output. It is usually a nonlinear function, which allows the neural network to learn the nonlinear behavior of complex systems. Bounded functions, such as Gaussian, hyperbolic tangent, or sigmoid, are usually used to give the neural outputs their bounded behavior, as in natural neurons. Typically, a linear function is used in the output layer to provide a response that matches the values of the expected output data.
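Equation (1) can be illustrated with a minimal sketch of a single neuron. The sigmoid is used here only as one example of the bounded activation functions mentioned above; the function names and the numerical weights are invented for the example.

```python
import math


def sigmoid(z):
    """Bounded activation function, one of the options named in the text."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, inputs, bias, activation=sigmoid):
    """Eq. (1): weighted sum of the inputs plus the bias, passed through sigma."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Invented weights and a binary input pattern with three components.
y = neuron_output([0.5, -0.5, 1.0], [1, 0, 1], bias=-1.5)
print(y)  # weighted sum is 0.5 + 1.0 - 1.5 = 0, so the sigmoid returns 0.5
```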

MLPs, like all neural models, must be trained before they can be used to perform the task for which they were designed. Therefore, the whole dataset (inputs and desired outputs) must be organized into two subsets: one for training and the other for validation. During training, the neural network learns the behavior of the system represented by the data; in the problem at hand, it learns the organization of these data. To perform the training process, all inputs (the patterns) are presented to the network, which provides the corresponding output (the category). These outputs are then compared to the desired responses to obtain a prediction error. This error is used to modify the neuron weights so as to minimize its value, which is how the network learns the system behavior. This is achieved by backpropagating the output error to the previous layers and adjusting the corresponding synaptic weights. This process is repeated iteratively until a predefined minimum error is reached. Several algorithms have been proposed for this process, of which the Levenberg–Marquardt (LM) algorithm is one of the most commonly used. After training the neural network, its performance can be tested using validation data.
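The training cycle described above (forward pass, error computation, backpropagation, weight update, repetition until an error goal is met) can be sketched as follows. This is not the Levenberg–Marquardt algorithm: for brevity, plain per-pattern gradient descent on the squared error is swapped in, with a hyperbolic tangent hidden layer and a linear output layer. The toy data, learning rate, error goal, and all names are assumptions made for illustration.

```python
import math
import random


def train_mlp(patterns, targets, n_hidden=3, lr=0.1, max_epochs=2000, goal=1e-3, seed=0):
    """Train a one-hidden-layer MLP by backpropagation with gradient descent."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0]), len(targets[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    history = []  # mean squared error accumulated over each epoch
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, t in zip(patterns, targets):
            # Forward pass: tanh hidden layer, linear output layer.
            h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(n_hidden)]
            y = [sum(w * hj for w, hj in zip(w2[k], h)) + b2[k] for k in range(n_out)]
            # Prediction error against the desired response.
            err = [yk - tk for yk, tk in zip(y, t)]
            squared_error += sum(e * e for e in err)
            # Backpropagate the output error to the hidden layer
            # (computed before the output weights are changed).
            delta_h = [(1.0 - h[j] ** 2) * sum(err[k] * w2[k][j] for k in range(n_out))
                       for j in range(n_hidden)]
            # Adjust the synaptic weights and biases of both layers.
            for k in range(n_out):
                for j in range(n_hidden):
                    w2[k][j] -= lr * err[k] * h[j]
                b2[k] -= lr * err[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h[j] * x[i]
                b1[j] -= lr * delta_h[j]
        mse = squared_error / (len(patterns) * n_out)
        history.append(mse)
        if mse <= goal:  # stop once the predefined minimum error is reached
            break
    return (w1, b1, w2, b2), history


# Toy binary problem (logical OR), used only to exercise the loop.
toy_patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
toy_targets = [[0], [1], [1], [1]]
weights, history = train_mlp(toy_patterns, toy_targets)
print(f"first-epoch MSE {history[0]:.4f}, last-epoch MSE {history[-1]:.4f}")
```

Gradient descent was chosen for the sketch because it fits in a few lines; a second-order method such as LM follows the same forward/error/update cycle but computes the weight update differently.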

2.3.2. Performance Metrics

To assess the performance of the classification process, five evaluation metrics were calculated. Defining them requires four concepts that describe whether a pattern is correctly classified into a category; these must be evaluated for each category. They are: True Positive (TP), a pattern is classified into the category it belongs to; False Positive (FP), a pattern that does not belong to the category is classified into it; True Negative (TN), a pattern that does not belong to the category is not classified into that category; False Negative (FN), a pattern that belongs to the category is classified into a category it does not belong to.

Accuracy (ACC) represents the ratio of correctly classified patterns to the total number of patterns. In other words, it is the proportion of correctly classified patterns for a specific category [56]. It is obtained as:

\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \qquad (2)

Precision (P) represents the ratio of patterns classified in the category to which they belong against the sum of correctly and incorrectly classified patterns in the category [57]. This is given by:

P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \qquad (3)

Recall (R), also known as Sensitivity, is the ratio of patterns classified in the category to which they belong against all patterns included in the category [57]. It is obtained as:

R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (4)

Specificity (S) represents the ratio of the patterns that do not belong to the category and are not classified into it to all the patterns that do not belong to the category. It is given by:

S = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \qquad (5)

F1Score (F1S) represents the harmonic mean of P and R [57], that is to say:

\mathrm{F1S} = 2 \, \frac{P \cdot R}{P + R} \qquad (6)
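The per-category definitions in Equations (2) to (6) can be illustrated with a short sketch that derives TP, FP, TN, and FN from a multi-class confusion matrix. The 3 × 3 matrix below is invented toy data, not a result of this study, and the function names are hypothetical.

```python
def per_class_counts(confusion, k):
    """Return (TP, FP, TN, FN) for category k.

    confusion[i][j] is the number of patterns of true category i
    that were classified into category j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                        # rest of row k
    fp = sum(confusion[i][k] for i in range(n)) - tp   # rest of column k
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero (empty category)."""
    return a / b if b else 0.0


def class_metrics(confusion, k):
    """Eqs. (2)-(6) evaluated for category k."""
    tp, fp, tn, fn = per_class_counts(confusion, k)
    p = safe_div(tp, tp + fp)
    r = safe_div(tp, tp + fn)
    return {
        "ACC": safe_div(tp + tn, tp + tn + fp + fn),
        "P": p,
        "R": r,
        "S": safe_div(tn, tn + fp),
        "F1S": safe_div(2 * p * r, p + r),
    }


# Invented 3-category confusion matrix (rows: true, columns: predicted).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
metrics_0 = class_metrics(toy_confusion, 0)
print(metrics_0)
```

For category 0 of the toy matrix, TP = 8, FP = 2, FN = 2, and TN = 18, which gives P = R = F1S = 0.8 and S = 0.9.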

Additionally, given the severe class imbalance of the original data in the actual cases analyzed, other metrics are useful for analyzing the fit of unbalanced data sets, such as Macro F1Score (MF1S), Balanced Accuracy (BACC), and Cohen’s kappa coefficient (κ).

MF1S [58] and BACC [59] are the averages, taken over all categories, of the F1S and R metrics, respectively.

The Cohen’s kappa coefficient (κ) measures the agreement between the observed and predicted classes of cases in a testing dataset [60]. It is obtained as:

\kappa = \frac{\mathrm{ACC} - P_{e}}{1 - P_{e}} \qquad (7)

where P_e is the hypothetical probability of chance agreement, and it is estimated using the values in the confusion matrix to calculate the probabilities of randomly choosing each class.
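The three imbalance-aware metrics can be computed from a confusion matrix as in the following sketch. The chance-agreement term of Equation (7) is estimated here from the row and column totals, which is the usual way of obtaining it from a confusion matrix; the matrix is invented toy data and the function names are hypothetical.

```python
def recall_and_f1(confusion, k):
    """Return (R, F1S) for category k of a confusion matrix (rows: true classes)."""
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return r, f1


def balanced_accuracy(confusion):
    """BACC: mean of the per-category recall values."""
    n = len(confusion)
    return sum(recall_and_f1(confusion, k)[0] for k in range(n)) / n


def macro_f1(confusion):
    """MF1S: mean of the per-category F1 scores."""
    n = len(confusion)
    return sum(recall_and_f1(confusion, k)[1] for k in range(n)) / n


def cohen_kappa(confusion):
    """Eq. (7): (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[k][k] for k in range(n)) / total
    row_totals = [sum(confusion[k]) for k in range(n)]
    col_totals = [sum(row[k] for row in confusion) for k in range(n)]
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - chance) / (1 - chance)


# Invented 3-category confusion matrix (rows: true, columns: predicted).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
print(balanced_accuracy(toy_confusion), macro_f1(toy_confusion), cohen_kappa(toy_confusion))
```

For the toy matrix, the observed agreement is 23/30 and the chance agreement is 1/3, so the kappa coefficient equals 0.65.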

In order to provide a graphical comparison of the performance of each category, the Receiver Operating Characteristic (ROC) curve is also shown for each category. All of these metrics are commonly used in classification problems. They are defined for each category, i.e., they refer to a specific category.

The ROC represents the curve for a given category describing the relationship between TP and FP rates as an average to evaluate the performance of the classification model [61]. The closer the ROC curve to the upper left corner, the better the performance of the model. The Area Under the Curve (AUC) value represents the area of the graph enclosed by the abscissa and ordinate axes below the ROC curve. It ranges from 0 to 1. The closer the AUC value is to 1, the better the model performance [62].
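A one-category ROC curve and its AUC can be sketched by sweeping a decision threshold over the output values of the neuron associated with that category and integrating the resulting curve with the trapezoidal rule. The scores and labels below are invented, and the function names are hypothetical; the sketch assumes that both positive and negative patterns are present.

```python
def roc_points(scores, labels):
    """Return the (FP rate, TP rate) points of the ROC curve for one category.

    scores: output of the neuron for that category, one value per pattern.
    labels: 1 if the pattern belongs to the category, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step adds patterns to the "positive" side.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area below the ROC curve; it ranges from 0 to 1."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented membership degrees for one category and the true memberships.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(toy_scores, toy_labels)
auc_value = area_under_curve(curve)
print(curve[-1], auc_value)
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, so the AUC equals 8/9.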

2.4. Model Structure

2.4.1. Neural Network Structure

The dataset used in this work is organized into 510 pairs of pattern-class binary vectors: the first is associated with the 14 variables that define the circumstances of the crash (Table 2) and the second represents the category (Table 3) to which this crash is assigned. Therefore, the first vector has 14 components (one for each variable) and the second vector has 5 components (one for each category). In this category vector, only one component will be “1” (the category to which the pattern belongs) and the rest will be “0”.
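The pattern-class pair described above can be sketched as follows. The category order A to E within the 5-component vector and the function names are assumptions made for illustration, and the input pattern is invented.

```python
CATEGORIES = ["A", "B", "C", "D", "E"]  # assumed order of the output components


def one_hot(category):
    """Return the 5-component class vector with a single 1 for `category`."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return [1 if c == category else 0 for c in CATEGORIES]


def make_pair(pattern, category):
    """Build one (pattern, class) pair after checking the pattern is binary."""
    if len(pattern) != 14 or any(v not in (0, 1) for v in pattern):
        raise ValueError("pattern must contain 14 binary components")
    return list(pattern), one_hot(category)


# Invented crash pattern assigned to category C.
example_pattern = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
inputs, target = make_pair(example_pattern, "C")
print(target)  # [0, 0, 1, 0, 0]
```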

Once the whole dataset is organized into vectors they are randomly divided into two sets: one to train the MLP classifier (training) and another to validate its performance (validation). Several divisions were tested, and the best performance was obtained with 75% for training and 25% for validation.
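A random 75/25 division of the 510 pairs can be sketched as below. The seed, the truncation used to obtain the training-set size, and the placeholder pairs are assumptions of this example; the exact subset sizes used in the study are not stated in the text.

```python
import random


def split_dataset(pairs, train_fraction=0.75, seed=0):
    """Randomly divide the pairs into a training set and a validation set."""
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)            # reproducible shuffle
    n_train = int(train_fraction * len(pairs))      # truncation is an assumption
    training = [pairs[i] for i in indices[:n_train]]
    validation = [pairs[i] for i in indices[n_train:]]
    return training, validation


# Placeholder pairs standing in for the 510 (pattern, class) vectors.
placeholder_pairs = [(i, i % 5) for i in range(510)]
training_set, validation_set = split_dataset(placeholder_pairs)
print(len(training_set), len(validation_set))  # 382 128
```

Other divisions, such as those tested by the authors, can be tried by changing `train_fraction`.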

The MLP used had only one hidden layer because, as mentioned above, this is enough to approximate the behavior of any system. However, there is no algorithmic method to determine the optimal number of neurons in this layer. This number must be determined through trial and error. The best results were obtained with 5 neurons. This is a key step in defining the structure of the neural network because an insufficient number of neurons will prevent the network from accurately reproducing the system’s behavior. Conversely, a network with too many neurons will, at best, behave correctly but with excessive computational load. At worst, it will only memorize the input patterns and will be unable to generalize the acquired knowledge to classify unlearned patterns (overfitting).

The number of inputs (first layer) is determined by the number of components of the input pattern, 14 in this problem. The number of neurons in the output layer depends on the number of categories into which the input patterns will be classified, 5 in this case.
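The resulting 14-5-5 structure can be illustrated with the forward pass sketched below. The transfer functions are assumptions: a hyperbolic tangent is used in the hidden layer and a softmax in the output layer, the latter being compatible with the example output vectors reported in Section 2.4.2, whose components sum to 1. The weights are random and untrained, so the sketch only shows the dimensions involved.

```python
import numpy as np

N_INPUTS, N_HIDDEN, N_OUTPUTS = 14, 5, 5

def init_params(seed=0):
    """Random, untrained weights with the shapes of a 14-5-5 network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(N_HIDDEN, N_INPUTS)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.5, size=(N_OUTPUTS, N_HIDDEN)),
        "b2": np.zeros(N_OUTPUTS),
    }

def forward(params, x):
    """Membership degrees for one input pattern.

    Assumption: tanh hidden layer and softmax output layer.
    """
    hidden = np.tanh(params["W1"] @ x + params["b1"])
    z = params["W2"] @ hidden + params["b2"]
    e = np.exp(z - z.max())          # subtract the maximum for numerical stability
    return e / e.sum()

params = init_params()
x = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0], dtype=float)  # hypothetical pattern
memberships = forward(params, x)
```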

The hyperparameters defining the neural network model are summarized in Table 4. These hyperparameters were selected because they produced the best metrics after validations were performed with different hyperparameters, such as changing the number of neurons and layers, transfer functions, and training and validation percentages.

2.4.2. Classification Process

Once the neural network was defined and then trained with the training dataset it was used to classify the elements of the validation one. Both the training and validation processes were carried out in the Matlab environment (R2025a).

When the network classifies one input pattern, each output neuron gives the membership degree of this pattern to the category each neuron represents. Therefore, rather than providing binary values (0s or 1s), the output neurons will provide values between these extremes if they detect similarities with more than one category. In these cases, the network provides the membership degree of the input pattern to each category represented by the output neurons. It may be assumed that the input pattern belongs to the category represented by the neuron with the highest value.

Ideally, one of the output neurons should give a value close to “1” (the category to which the pattern belongs) while the rest will give values close to “0” (the pattern does not belong to those categories). This indicates that the network has clearly identified the pattern as belonging to one category. However, this is not always the case. Sometimes, the highest neural output is not close to “1”, and in other cases, two or more outputs are close to each other. While processing the training dataset, the training algorithm will adjust the neurons’ weights so that only one output neuron gives a value close to “1” while the others remain close to “0”. When processing the validation dataset, however, the membership degrees provided by the output neurons cannot be modified, and a decision must be made about which category the pattern belongs to. It is reasonable to assume that the input pattern should be assigned to the category represented by the neuron with the highest output. This is what we have done in this work.
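This assignment rule amounts to taking the index of the largest output, as in the short sketch below. The output vector is the one reported for Pattern 243 later in this section; the function name `assign_category` is illustrative.

```python
import numpy as np

CATEGORIES = ["A", "B", "C", "D", "E"]

def assign_category(output_vector):
    """Return the category with the highest membership degree and that degree."""
    output_vector = np.asarray(output_vector, dtype=float)
    winner = int(np.argmax(output_vector))
    return CATEGORIES[winner], float(output_vector[winner])

# Output vector reported for Pattern 243
pattern_243 = [0.0157, 0.1462, 0.0519, 0.7073, 0.0789]
category_243, degree_243 = assign_category(pattern_243)
```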

The first step in improving the ACC of the classification process was to analyze its results by identifying matching and mismatching assignments. We found that the higher the classification ACC, the fewer the mismatches. All patterns with values greater than 0.85 were matches. Therefore, when the neural network is able to clearly identify a pattern as belonging to a category, it does not make errors. However, when the ACC falls, mismatches appear. This could be interpreted in two ways: either the network fails to classify those patterns, in which case the model works poorly and should be modified, or the assignment to a certain category by police officers or judges could be inaccurate.

Sometimes it can be difficult to decide which category the pedestrian crash should be assigned to. This is supported by the fact that discrepancies only appear in classifications with low membership degrees, those with values lower than 0.85, for which even the neural model has doubts about the right classification.

For this reason, the second step was to analyze the mismatched patterns (all those patterns found with values lower than 0.85). For these cases, three bands are defined according to the level of precision to simplify the study: band 1, or “reliable”, patterns were initially well-classified pedestrian crashes, but they certainly cannot be considered accurate since no output vector value exceeds 0.85 (the highest value of a vector component is between 0.75 and 0.85); band 2, or “unreliable”, patterns were inaccurate (when no value of the vector components exceeds 0.60); and band 3, or “unclear”, pedestrian crashes that must be studied more exhaustively by a human specialist (the highest value of one of the components of the vector is between 0.60 and 0.75).
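The band definitions can be summarized in a small function such as the one sketched below. How values lying exactly on a threshold are treated is not specified in the text, so the comparisons used here are an assumption, as are the function name and the label “match” for outputs above 0.85.

```python
def confidence_band(output_vector, match=0.85, reliable=0.75, unclear=0.60):
    """Assign an output vector to a band according to its highest component.

    Assumption: a value exactly on a threshold falls in the upper band,
    except for 0.85, which must be strictly exceeded to count as a match.
    """
    top = max(output_vector)
    if top > match:
        return "match"
    if top >= reliable:
        return "band 1 (reliable)"
    if top >= unclear:
        return "band 3 (unclear)"
    return "band 2 (unreliable)"

# Hypothetical output vectors, for illustration only
print(confidence_band([0.02, 0.03, 0.90, 0.03, 0.02]))
print(confidence_band([0.05, 0.05, 0.80, 0.05, 0.05]))
print(confidence_band([0.10, 0.65, 0.10, 0.10, 0.05]))
print(confidence_band([0.30, 0.05, 0.35, 0.10, 0.20]))
```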

This work assumes that the classifications provided by police officers or judges may not be as accurate as desired because they could be influenced by subjective factors due to a lack of resources [63,64] or specialization [65,66]. Thus, two similar crashes that occurred far apart in time or geography could be evaluated differently. These small differences could cause judges or police officers to assign different levels of responsibility (i.e., categories) to these crashes. For example, they could assign them to Category E (50-50 attribution of responsibility) instead of Category D (25-75) or Category A (100-0) instead of Category C (75-25). It is also worth noting that these possible misclassifications may be promoted by the fact that each accident must be classified into one of five “crisp” categories, even though the circumstances of an accident may lead police officers or judges to assign responsibility levels different from those established for the five categories defined in this study, for example, 60-40/40-60 or 80-20/20-80.

In order to try to overcome the distortion caused by these subjective decisions a reprocessing of the training dataset is proposed. The inaccurate patterns are identified and the classifications assigned in the JTP reports of the LPB and the SJ judicial decisions are modified to align with the categories identified by the MLP model. In other words, when a mismatch is detected, the classification is changed to that provided by the neural model.

The classification of patterns within band 1 (reliable) keeps the original assignment, since these crashes are considered initially well classified, although not as accurately as desired (the highest output vector value is not greater than 0.85). For example, the output vector for Pattern 243 is (A_0.0157, B_0.1462, C_0.0519, D_0.7073, E_0.0789). According to this, the correct category for this attribution would be Category D. However, this interpretation is not entirely precise according to the network, since the value of 0.7073 is less than 0.85. After reprocessing, the D component of the output vector exceeds 0.85, which suggests that this classification may be correct.

The classification of band 2 patterns (unreliable) is modified according to the values of the output vector. If the values are close to each other and far from 0.60, they are not reclassified because the neural network does not provide a reference value for accurate reclassification. For example, in Pattern 22, the output vector is (A_0.3211, B_0.0366, C_0.3524, D_0.0870, E_0.2029), indicating that the network is uncertain about the classification between Categories A, C, and E. Conversely, if the highest output vector value is close to 0.60, but below it, the pattern is reclassified as the category that corresponds to this highest value. For example, Pattern 306 has an output vector of (A_0.0167, B_0.5693, C_0.0393, D_0.2144, E_0.1603) and is reclassified from Category D to Category B.

Finally, following the same logic for reclassification, patterns in band 3 (unclear) with values close to 0.75 are left unchanged. However, those with values close to 0.60 are modified.
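The reclassification rules for bands 2 and 3 can be sketched as follows. The text does not quantify what “close to 0.60” means, so the tolerance `margin` (set here to 0.05) is a hypothetical parameter introduced only for illustration, as is the function name `reprocess_label`. The sketch changes a label only when the highest output lies within that tolerance of 0.60 and keeps the original label otherwise.

```python
CATEGORIES = ["A", "B", "C", "D", "E"]

def reprocess_label(original, output_vector, threshold=0.60, margin=0.05):
    """Return the label to use in the reprocessed training dataset.

    Assumption: "close to 0.60" is modeled as |top - threshold| <= margin;
    the margin value is hypothetical and not taken from the study.
    """
    values = list(output_vector)
    top = max(values)
    predicted = CATEGORIES[values.index(top)]
    if abs(top - threshold) <= margin:
        return predicted      # adopt the category suggested by the network
    return original           # keep the label from the report or ruling

# Pattern 306 (original Category D, as reported in the text)
label_306 = reprocess_label("D", [0.0167, 0.5693, 0.0393, 0.2144, 0.1603])

# Pattern 22: its original label is not given in the text, "E" is a placeholder
label_22 = reprocess_label("E", [0.3211, 0.0366, 0.3524, 0.0870, 0.2029])
```

With this tolerance, Pattern 306 is reclassified to Category B, while Pattern 22 keeps its original label because its highest output (0.3524) is far from 0.60.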

It is worth noting that several values close to the three selected thresholds (0.85, 0.75 and 0.60) were also tested to try to improve the accuracy of the classification process after reprocessing the training dataset. However, no significant improvement was obtained.

Thereby, a new “optimized” training dataset is obtained, which is subsequently used to train the neural model. Figure 2 shows the process developed in the second step of this study. Once the new training dataset has been built, the neural network is trained with it. Then the validation dataset is classified again. The results obtained are compared to those obtained with the original training dataset. They are shown in Table 5 and Table 6, where “MLP75-25” identifies the results obtained with the original training dataset, and “MLP75-25.opt” identifies the results obtained with the “optimized” one.

3. Results

The results obtained after processing the validation dataset show (Table 5) that the model has relatively good performance: an overall ACC (total number of correctly classified patterns to total number of patterns) of 0.8538 for the training dataset and 0.7266 for the validation one. These values can be considered acceptable for an automatic classification tool, although further improvement would be desirable. Therefore, in order to achieve better performance, the reprocessing of the training dataset described above is performed with the aim of improving the classification metrics. To further justify selecting the MLP model over other machine learning tools, the results obtained with the MLP model have been compared with those obtained with other conventional models. In all cases, the accuracy was lower with the other models, such as the linear discriminant (62.48%), the multinomial logistic regression (65.31%), the k-nearest neighbors algorithm (64.29%), or the support vector machine (64.70%).

Table 5. Overall ACC of the two models studied.

Model	ACC (Train)	ACC (Validation)
MLP75-25	0.8538	0.7266
MLP75-25.opt	0.9084	0.8906

Table 5 shows the overall ACC obtained with the original training dataset, “MLP75-25”, as well as that achieved with the same neural model when processing the “optimized” training dataset, “MLP75-25.opt”. These values show that modifying the training dataset as proposed in Section 2.4 has significantly improved ACC in the classification process. The overall ACC obtained with the training dataset increased from 0.8538 to 0.9084, an increase that, although not very high, shows that the network learned the classification represented by the new training dataset better. However, the increase achieved with the validation dataset is much more significant: the overall ACC has risen from 0.7266 to 0.8906, an increase of 0.164 (16.4% when referred to the perfect classification: overall ACC 1). These results show that the new model can match in nine out of ten cases, whereas the old model could only match approximately seven out of ten. The optimized model is more accurate and can positively influence the final decisions of police officers and judges. This is especially beneficial for those involved in pedestrian crashes.

These results deserve to be analyzed in more detail by studying all the metrics defined to better evaluate the ACC of the classification process. The metrics obtained are shown in Table 6, and the analysis of these values shows that almost all metrics improved in the MLP75-25.opt model compared to the MLP75-25 model (the only exception is the S of Category D, which decreases from 0.9669 to 0.9474). This demonstrates that the MLP can provide efficient classifications with proper analysis of the output data for retraining. The performance of all metrics was relatively high in the MLP75-25.opt model. However, the metrics for Categories B, C, D, and E are worse than those for Category A. One possible reason for this is that there were not enough labeled samples in the dataset for these output categories. For categories with few members, such as B, C, D, and E, even minor discrepancies can significantly impact the R, P, and F1S values. This suggests that the model would be more accurate if the datasets were balanced. If we focus on the F1S metric, which is especially useful in the case of unbalanced categories, we can see that the result for the majority category, Category A, improves in the optimized model (from 0.91 to 0.99). The improvement is even more significant in the minority categories, especially in Category D, which goes from 0.17 to 0.71.

Table 6. Metrics for MLP75-25 model vs. MLP75-25.opt model.

Category	ACC (MLP75-25)	ACC (MLP75-25.opt)	R (MLP75-25)	R (MLP75-25.opt)	P (MLP75-25)	P (MLP75-25.opt)	S (MLP75-25)	S (MLP75-25.opt)	F1S (MLP75-25)	F1S (MLP75-25.opt)
A	0.8672	0.9844	0.8817	0.9880	0.9318	0.9880	0.8286	0.9778	0.9061	0.9880
B	0.8750	0.9375	0.7500	0.7857	0.3000	0.6875	0.8833	0.9561	0.4286	0.7333
C	0.8828	0.9688	0.1538	0.7286	0.3333	1.0000	0.9652	1.0000	0.2105	0.6000
D	0.9219	0.9297	0.1429	0.7857	0.2000	0.6471	0.9669	0.9474	0.1667	0.7097
E	0.9062	0.9609	0.2857	0.7000	0.2222	0.7778	0.9421	0.9831	0.2500	0.7368
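To make the relationship between the columns of Table 6 explicit, the sketch below computes the five per-category metrics from one-vs-rest counts, assuming the standard definitions of ACC, R, P, S, and F1S. The counts used in the example are not reported in the study: they are values inferred here because they reproduce the Category B row of the MLP75-25.opt model, and they should be read as an illustration only.

```python
def one_vs_rest_metrics(tp, fp, fn, tn):
    """Per-category metrics from one-vs-rest confusion counts (standard definitions)."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "R": recall, "P": precision, "S": specificity, "F1S": f1}

# Inferred (not reported) counts consistent with Category B, MLP75-25.opt
metrics_b = one_vs_rest_metrics(tp=11, fp=5, fn=3, tn=109)
rounded_b = {name: round(value, 4) for name, value in metrics_b.items()}
```

Because F1S is the harmonic mean of P and R, a category with few members can show a large change in F1S when only a handful of patterns move between cells of the confusion matrix.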

Examining the ROC curves shown in Figure 3, for the MLP75-25.opt model, all curves tend to have an almost rectangular shape near the upper left corner of the graph for all cases, demonstrating reasonably accurate classification. In addition, the area under the curve (AUC) is greater than 90% for all categories. These values show that the learning performance is at a high level once the initial model (MLP75-25 model) has been optimized.

Analyzing the transition of the curves from one model to the other, in the MLP75-25 model, the ROC curve of Category E (green line) first moves significantly away from this upper left corner. This fact may indicate that Category E, which represents a balance of responsibility between the driver and the pedestrian in the pedestrian crash, is not correctly identified by the model because it does not attribute responsibility to one of the actors and is therefore more difficult to identify. Therefore, the model presents greater uncertainty in Category E, as shown in Table 6. This requires an exhaustive analysis by a human specialist in the field. In addition, the small number of patterns assigned to this category makes it difficult for the models to learn how to correctly classify patterns that could eventually belong to this category. This could be corrected if a larger number of these patterns were made available for training, or by trying other techniques that help improve model performance in multi-class classification with unbalanced data. To better analyze the results, three other metrics, which are particularly well-suited for classifying imbalanced data, were also tested: MF1S, Cohen’s kappa coefficient (κ), and BACC. All three metrics achieved very satisfactory values, with scores higher than 0.75 [67], which suggests a reliable classification system. These values are shown in Table 7.

4. Discussion

Overall, the MLP model benefits from a simple architecture, is easy to use, and has been used in other work related to road safety. Previous studies, such as those mentioned above, have shown that well-trained MLP models can produce satisfactory ACC in classification tasks. This work applies the MLP model as a method to improve the classification of the level of responsibility of those involved in a pedestrian crash. This responsibility is modeled through 14 input variables.

As shown in Table 8, the initial analysis of the results obtained by the MLP75-25 model allows us to identify categories that were matched (some output vector value greater than 0.75) by the LPB’s JTP (77.71%) or SJ (89.97%) as explained in Section 2.4.2. Additionally, we can identify patterns of mismatches whose categories are band 2/unreliably classified by the MLP model (LPB: 12.46%; SJ: 4.86%) or that are band 3/unclear or not perfectly defined (LPB: 9.83%; SJ: 5.17%).

Eliminating questionable attributions will be virtually impossible. The optimized model tends to more accurately define match patterns (the highest output vector value is greater than 0.85) and unreliable patterns (all output vector values remain far from 0.60). Once identified, the latter could be modified to become a match pattern. However, the model includes patterns that it cannot accurately classify. These patterns will need to be interpreted and evaluated by experts in the field (e.g., certain Category E patterns with equal sharing of responsibilities between driver and pedestrian or patterns that are difficult to categorize, such as those between Categories A and C, or B and D, among others).

An in-depth analysis of the output vectors for each category in the first reprocessing iteration shows that, despite the attempt to eliminate them, questionable attributions have not decreased considerably and have even increased slightly for some categories (Table 9). Furthermore, the reliable patterns now display values above 0.85, while unreliable patterns (far from 0.60) persist in showing values below 0.60. This leads us to conclude that the model correctly identified patterns initially well classified by police officers and judges, as well as those that were inaccurate, and those that require further analysis.

Moreover, Table 9 shows the questionable attributions as a percentage of the total dataset for each category. According to the analysis of the outputs of the MLP75-25 and MLP75-25.opt models, this percentage increases slightly in some categories. Categories C and E have the highest percentages of questionable attributions, at 7.70% and 31.35%, respectively. As already shown in the analysis of the MLP75-25 model, Category E requires a special effort to be correctly identified by the competent authorities. This has been verified with the study of the MLP75-25.opt model. On the other hand, Category C also requires special analysis because the classification of a pedestrian crash as Category A or Category C can easily be confused depending on the initial variables. However, this is not the case for Categories B and D. There are determining variables, such as variable S-1 (location of the pedestrian crash), that make it easier to determine responsibility. In addition, it is observed that reprocessing tends to balance the categories, which could also be studied in future research.

Although good metrics were obtained in the first iteration with MLP75-25.opt, subsequent iterations could increase the number of match patterns by analyzing the results. This would lead to a much more accurate and simplified model. Patterns would be identified according to their classification ACC, falling into one of two bands: well-classified patterns, or matches, and patterns that require human analysis, or questionable attributions. This would refine the metrics obtained in this study. Moreover, it would eliminate inaccurate classifications by police officers and judges in their reports and court rulings, respectively, and improve future decision-making.

5. Conclusions

Traffic accidents pose a major threat to road users, and pedestrian crashes remain the most severe category, causing the highest number of casualties in urban environments. Therefore, a detailed and objective assessment of the responsibility levels of the actors involved is a fundamental task for the competent authorities. Firstly, this analysis allows the judicial authorities to impose fair sentences. Secondly, it enables the authorities to draw conclusions that facilitate decision-making to reduce the problem.

In this study, a dataset of 510 records of pedestrian crashes was analyzed, including 428 reports from the JTP of the LPB, and 82 judicial decisions of the Spanish Judiciary recorded between 2015 and 2024.

We developed an MLP-based classification model to help competent authorities make decisions about responsibility for pedestrian crashes. The aim was to provide traffic police, judges, and prosecutors with a versatile data-driven decision support toolset to address pedestrian crashes more effectively and efficiently.

This issue was addressed by analyzing the 14 common variables extracted from 510 pedestrian crashes. These features were decisive in determining the real level of liability according to the JTP of the LPB and the courts. The data sets were first processed by an MLP model (MLP75-25). The results obtained showed that patterns with membership values higher than 0.85 always represented correct classifications, while those with lower values showed mismatches. Based on these findings three confidence bands were established: band 1, or “reliable”, patterns were initially well-classified crashes, but they certainly cannot be considered accurate (the highest value of a vector component is between 0.75 and 0.85); band 2, or “unreliable”, patterns were inaccurate (when no value of the vector components exceeds 0.60); and band 3, or “unclear”, pedestrian crashes that must be studied more exhaustively by a human specialist (the highest value of one of the components of the vector is between 0.60 and 0.75).

After analyzing the output vectors obtained, the responsibility levels of the pedestrian crashes were reclassified by adjusting them to the category identified by the model. With this reprocessed training dataset, the MLP model was retrained (MLP75-25.opt) and then used to classify the validation dataset, providing significantly better results than the previous model in nearly all its metrics. These improved results yield two key insights. First, in some patterns, the final decisions of the Traffic Police or the Courts determined levels of responsibility for the parties involved that did not correspond to the initial data. Second, the model helps us identify cases that require further specialist evaluation. This would bring us closer to the concept of a “robot judge”, as described in scientific literature.

The proposed model accurately classifies responsibility levels and assists authorities in making informed decisions. The model can also identify cases where the attribution is doubtful and requires a thorough investigation by a specialist in the field. This model uses real data, and our goal is to develop a tool that assists police officers and judges in making decisions without requiring them to alter their work procedures. We want the tool to adapt to its users, allowing them to work more efficiently and focus their efforts on analyzing the most complex cases. In addition to the classification capacity, our model can provide the competent authorities with an analysis of the most influential variables in a pedestrian crash. With this information, local governments can take measures to reduce road crashes.

Although neural networks, like many other classification models, and therefore this MLP model, do not explain how they assign class membership (hence the term “black box”), the MLP model generates a membership index for each class that users can consult at any time to help interpret the classification. Thus, the model identifies those classifications that are unreliable, leaving the final decision to its potential users. The purpose of this study is to create a tool to assist police officers and judges, not replace them. For this reason, it is important to indicate which classifications are reliable and which are not. The MLP makes this possible and is a suitable model for classifying the level of responsibility in a pedestrian crash and predicting the decision of the competent authorities. Given its versatility and performance, the MLP proves to be an effective tool for classifying responsibility levels and anticipating outcomes.

Although our research makes important contributions, it is crucial to acknowledge its limitations. First, this study is based on a limited data set (JTP of LPB and SJ), excluding other JTPs of other Spanish cities, which may limit generalizability. Secondly, the analysis is based on available data, which may not fully reflect other variables and not capture the full complexity of responsibility attribution. Therefore, further research is needed to systematize the study and generalize the classification of pedestrian crash liability using neural networks. This would involve developing larger, publicly available data sets, standardizing the database, and conducting further comparative studies. In many cases, the datasets used are small and not publicly available. This complicates the models because the neural network cannot train itself sufficiently to obtain better results. A larger data set would be preferable, but obtaining authorization to use datasets containing personal information is often very complex due to restrictive norms about personal data protection. This study focuses on a specific city using data with the same amount of reliable information. This allows us to extract a homogeneous set of variables (inputs) to define traffic accidents and levels of responsibility (outputs). However, external validation using data from other cities or countries would improve robustness. Another limitation is that reducing complex behaviors to binary values oversimplifies reality. Consequently, some nuances in the attribution of responsibility may be lost.

Future research could incorporate non-binary values for the variables, provided these values are available in the initial databases. If the information to be processed includes confidence levels for continuous values for both the input patterns and output classes, instead of binary values, the MLP architecture maintains its performance. Other future research should also extend the scope of this analysis to other types of traffic crashes and integrate additional data sources. The integration of other supervised ML techniques could also be explored to compare and improve the detection of complex patterns. In summary, the model developed in this paper is interesting because it is considered applicable to other types of crash analyses involving different input variables and could also be extrapolated to other fields. Additionally, the initial data could be analyzed using an intrinsic category imbalance treatment, and overfitting could be monitored. Techniques, such as SMOTE (Synthetic Minority Over-sampling Technique) or ADASYN (Adaptive Synthetic Sampling) [68,69,70], could be applied to address imbalance-related issues and further improve model performance.

Author Contributions

Conceptualization, A.M.-S., F.C.G.-P. and M.A.J.-M.; methodology, A.M.-S. and M.A.J.-M.; software, A.M.-S. and M.A.J.-M.; validation, A.M.-S., F.C.G.-P. and M.A.J.-M.; formal analysis, A.M.-S.; investigation, A.M.-S.; resources, A.M.-S.; data curation, A.M.-S.; writing—original draft preparation, A.M.-S.; writing—review and editing, A.M.-S., F.C.G.-P. and M.A.J.-M.; visualization, A.M.-S., F.C.G.-P. and M.A.J.-M.; supervision, A.M.-S., F.C.G.-P. and M.A.J.-M.; project administration, A.M.-S., F.C.G.-P. and M.A.J.-M.; funding acquisition, A.M.-S., F.C.G.-P. and M.A.J.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This publication has been co-financed at 85% by the European Union, European Regional Development Fund (FEDER “una manera de hacer Europa”), and the Government of Extremadura, grant number GR24104, Management Authority. Ministry of Finance.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some of the data related to this study have not been deposited in a public repository, are confidential and are available on request in the database of the Judicial Traffic Police of the Local Police of Badajoz. The rest of the data are available in the database of the Spanish Judiciary (https://www.poderjudicial.es/search/indexAN.jsp, accessed on 13 February 2025).

Acknowledgments

The authors would like to thank the Badajoz Local Police and the Spanish Judiciary. We would also like to thank the reviewers for their helpful comments. While preparing this manuscript, the authors exclusively used Microsoft 365 Copilot Chat (Premium) to collect statistics on road crashes in the U.S., Europe, and Asia from 2018 to 2024. The authors reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MLP	Multilayer Perceptron
JTP	Judicial Traffic Police
LPB	Local Police of Badajoz
SJ	Spanish Judiciary
AI	Artificial Intelligence
LM	Levenberg–Marquardt
ROC	Receiver Operating Characteristic
AUC	Area Under Curve
ACC	Accuracy
R	Recall
P	Precision
S	Specificity
F1S	F1Score
MF1S	Macro F1Score
BACC	Balanced Accuracy
κ	Cohen’s kappa coefficient

References

Shaik, E.; Islam, M.; Quazi Sazzad, H. A review on neural network techniques for the prediction of road traffic accident severity. Asian Transp. Stud. 2021, 7, 100040.
Sameen, M.I.; Pradhan, B. Assessment of the effects of expressway geometric design features on the frequency of accident crash rates using high-resolution laser scanning data and GIS. Geomat. Nat. Hazards Risk 2016, 8, 733–747.
Casado-Sanz, N.; Guirao, B.; Galera, A.L.; Attard, M. Investigating the Risk Factors Associated with the Severity of the Pedestrians Injured on Spanish Crosstown Roads. Sustainability 2019, 11, 5194.
European Road Safety Observatory. Annual Statistical Report on Road Safety in the EU 2022; European Commission: Brussels, Belgium, 2023; Available online: https://transport.ec.europa.eu/background/road-safety-statistics-2023_en (accessed on 7 January 2025).
World Health Organization (WHO). Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023; Available online: https://www.who.int/teams/social-determinants-of-health/safety-and-mobility/global-status-report-on-road-safety-2023?utm_source=copilot.com (accessed on 3 January 2026).
United States Department of Transportation. National Highway Traffic Safety Administration. Available online: https://crashstats.nhtsa.dot.gov/ (accessed on 3 January 2026).
European Road Safety Observatory. Annual Accident Report. Available online: https://road-safety.transport.ec.europa.eu/european-road-safety-observatory/statistics-and-analysis-archive/annual-accident-report_en (accessed on 3 January 2026).
Asian Transport Observatory. ATO National Database Masterlist of Indicators. Available online: https://asiantransportobservatory.org/snd-masterlist/?indicator=INF-CFP-001 (accessed on 3 January 2026).
Fian, T.; Hauger, G. Identifying High-Risk Patterns in Single-Vehicle, Single-Occupant Road Traffic Accidents: A Novel Pattern Recognition Approach. Appl. Sci. 2024, 14, 8902.
Abdulhafedh, A. Road Crash Prediction Models: Different Statistical Modeling Approaches. J. Transport. Technol. 2017, 7, 190–205.
Soto, B.G.; De Bumbacher, A.; Deublein, M.; Adey, B.T. Predicting road traffic accidents using artificial neural network models. Infrastruct. Asset Manag. 2018, 5, 132–144.
Chakraborty, A.; Mukherjee, D.; Mitra, S. Development of pedestrian crash prediction model for a developing country using artificial neural network. Int. J. Inj. Control Saf. Promot. 2019, 26, 283–293.
Iljoon, C.; Park, H.; Hong, E.; Jaeduk, L.; Kwon, N. Predicting effects of built environment on fatal pedestrian accidents at location-specific level: Application of XGBoost and SHAP. Accid. Anal. Prev. 2022, 166, 106545.
Dongyu, W.; Zhang, Y.; Qiaojun, X. Geographically weighted random forests for macro-level crash frequency prediction. Accid. Anal. Prev. 2024, 194, 107370.
Subasish, D.; Tamakloe, R.; Zubaidi, H.; Obaid, I.; Alnedawi, A. Fatal pedestrian crashes at intersections: Trend mining using association rules. Accid. Anal. Prev. 2021, 160, 106306.
Yasir, A.; Mazharul, H.; Mannering, F. A Bayesian generalised extreme value model to estimate real-time pedestrian crash risks at signalised intersections using artificial intelligence-based video analytics. Anal. Methods Accid. Res. 2023, 38, 100264.
Sengupta, A.; Guler, S.I.; Gayah, V.V.; Warchol, S. Evaluating the reliability of automatically generated pedestrian and bicycle crash surrogates. Accid. Anal. Prev. 2024, 203, 107614.
Eaysir, A.; Sherrie-Anne, K.; Schroeter, R.; Mazharul, H. A game theoretical model to examine pedestrian behaviour and safety on unsignalised slip lanes using AI-based video analytics. Accid. Anal. Prev. 2025, 217, 108034.
Qingwen, P.; Kun, X.; Hongyu, G.; Yuan, Z. Modeling crash avoidance behaviors in vehicle-pedestrian near-miss scenarios: Curvilinear time-to-collision and Mamba-driven deep reinforcement learning. Accid. Anal. Prev. 2025, 214, 107984.
Li, X.; Liang, Y.; Yang, Z.; Li, J. Pedestrian Trajectory Prediction Based on Dual Social Graph Attention Network. Appl. Sci. 2025, 15, 4285.
Kim, M.; Kim, D.; Shim, J. The Association Between Aggressive Driving Behaviors and Elderly Pedestrian Traffic Accidents: The Application of Explainable Artificial Intelligence (XAI). Appl. Sci. 2025, 15, 1741.
Jin, Y.; He, H. An Artificial-Intelligence-Based Semantic Assist Framework for Judicial Trials. Asian J. Law Soc. 2020, 7, 531–540.
Cataleta, M.S. Humane Artificial Intelligence: The Fragility of Human Rights Facing AI; East-West Center: Honolulu, HI, USA, 2020.
Kavanagh, C. Artificial Intelligence. In New Tech, New Threats, and New Governance Challenges: An Opportunity to Craft Smarter Responses? Carnegie Endowment for International Peace: Washington, DC, USA, 2019; pp. 13–23.
Jongbloed, A.W.; Nakad-Westrate, H.J.; Herik, H.; Salem, A.M. The rise of the robotic judge in modern court proceedings. In Proceedings of the ICIT 2015 The 7th International Conference on Information Technology, Amman, Jordan, 12–15 May 2015; pp. 59–67. [Google Scholar]
Wang, N. Black box justice: Robot judges and AI-based judgment processes in China’s court system. In 2020 IEEE International Symposium on Technology and Society (ISTAS); IEEE: New York, NY, USA, 2020; pp. 58–65. [Google Scholar] [CrossRef]
Aboelazm, K.S.; Dganni, K.M.; Tawakol, F.; Sharif, H. Robotic Judges: A New Step Towards Justice or the Exclusion of Humans? J. Lifestyle SDGs Rev. 2024, 4, e02515. [Google Scholar] [CrossRef]
Edwards, L. Data Protection and e Privacy: From Spam and Cookies to Big Data, Machine Learning and Profiling. In Law, Policy and the Internet; Lilian, E., Ed.; Hart Publishing: Oxford, UK, 2019; pp. 119–164. [Google Scholar]
Barysė, D.; Sarel, R. Algorithms in the court: Does it matter which part of the judicial decision-making is automated? Art. Intellig. Law 2023, 32, 117–146. [Google Scholar] [CrossRef]
Matić-Bošković, M. Impact of artificial intelligence on practicing judicial professions. Sociol. Pregl. 2024, 58, 481–499. [Google Scholar] [CrossRef]
Chinadaily. China Uses AI Assistive Tech on Court Trial for First Time. Available online: http://www.chinadaily.com.cn/a/201901/24/WS5c4959f9a3106c65c34e64ea.html (accessed on 1 April 2025).
El Confidencial. Maite.ai: La IA capaz de Dictar Sentencias Revoluciona los Despachos de Abogados. Available online: https://www.elconfidencial.com/espana/cataluna/2025-04-19/inteligencia-artificial-juridico-maite-1hms_4108936 (accessed on 20 April 2025).
Sourdin, T. Judge v Robot? Artificial intelligence and judicial decision-making. Univ. N. S. W. Law J. 2018, 41, 1114–1133. [Google Scholar] [CrossRef]
Ulenaers, J. The Impact of Artificial Intelligence on the Right to a Fair Trial: Towards a Robot Judge? Asian J. Law Econom. 2020, 11, 20200008. [Google Scholar] [CrossRef]
Chronowski, N.; Kálmán, K.; Szentgáli-Tóth, B. Artificial Intelligence, Justice, and Certain Aspects of Right to a Fair Trial. Acta Univer. Sapient. Leg. Stud. 2021, 10, 169–189. [Google Scholar] [CrossRef]
Watamura, E.; Liu, Y.; Ioku, T. Judges versus artificial intelligence in juror decision-making in criminal trials: Evidence from two pre-registered experiments. PLoS ONE 2025, 20, e0318486. [Google Scholar] [CrossRef]
Fine, A.; Berthelot, E.R.; Marsh, S. Public Perceptions of Judges’ Use of AI Tools in Courtroom Decision-Making: An Examination of Legitimacy, Fairness, Trust, and Procedural Justice. Behav. Sci. 2025, 15, 476. [Google Scholar] [CrossRef]
Zhang, X.; Xue, Q.; Guo, W.; Tan, J. Enhancing Model Transparency: A Comparative Analysis of SHAP and LIME in Explaining Traffic Accident Prediction Models. In The Proceedings of 2024 International Conference on Artificial Intelligence and Autonomous Transportation (AIAT 2024); Jia, L., Yao, D., Ma, F., Zhang, L., Chen, Y., Xue, Q., Eds.; Springer Nature: Singapore, 2025; Volume 1391, pp. 48–56. [Google Scholar]
Alanazi, F.; Umar, I.K.; Yosri, A.M.; Okail, M.A. Comparative evaluation of deep learning and traditional models for predicting traffic accident severity in Saudi Arabia. Sci. Rep. 2025, 15, 32568. [Google Scholar] [CrossRef]
Vicent, J.F.; Curado, M.; Oliver, J.L.; Pérez-Sala, L. A novel approach to predict the traffic accident assistance based on deep learning. Neural Comput. Applic. 2025, 37, 5343–5368. [Google Scholar] [CrossRef]
Bahador Parsa, A.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid. Anal. Prev. 2020, 136, 105405. [Google Scholar] [CrossRef]
Power, D. Decision Support Systems: Concepts and Resources for Managers; Greenwood Publishing Group: Santa Barbara, CA, USA, 2002. [Google Scholar]
Volokh, E. Chief Justice Robots. Duke Law J. 2019, 68, 1134–1192. [Google Scholar]
Zhu, F.Y.; Liu, Y.D.; Gao, F.; Wang, K. Research on constructing an artificial intelligence judicial database based on graph fusion. J. Yangzhou Univ. 2019, 23, 89–96. [Google Scholar]
Gabriel, I. Toward a Theory of Justice for Artificial Intelligence. Daedalus 2022, 151, 218–231. [Google Scholar] [CrossRef]
Campón Domínguez, J.A. El Diseño de Una Base de Datos de Investigaciones en Profundidad Sobre Atropellos a Peatones. Ph.D. Thesis, Universidad Carlos III, Madrid, Spain, 2015. [Google Scholar]
Moreno-Sanfélix, A.; Gragera-Peña, F.C.; Jaramillo-Morán, M.A. An improvement of the conceptual system of the sequential events model of road crashes (i-MOSES). Heliyon 2024, 10, e37268. [Google Scholar] [CrossRef]
Haykin, S. Neural Networks. A Comprehensive Foundation, 2nd ed.; Pentince-Hall: Saddle River, NJ, USA, 2001. [Google Scholar]
Rather, A.M.; Agarwal, A.; Sastry, V.N. Recurrent neural network and a hybrid model for prediction of stock returns. Expert. Syst. Applicat. 2015, 42, 3234–3241. [Google Scholar] [CrossRef]
Göçken, M.; Özçalıcı, M.; Boru, A.; Dosdogru, A.T. Integrating metaheuristics and Artificial Neural Networks for improved stock price prediction. Expert. Syst. Applicat. 2016, 44, 320–331. [Google Scholar] [CrossRef]
Ilie, C.; Ploae, C.; Melnic, L.V.; Cotrumba, M.; Gurau, A.; Alexandra, C. Sustainability through the use of modern simulation methods applied artificial intelligence. Sustainability 2019, 11, 2384. [Google Scholar] [CrossRef]
Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks Are Universal Approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Math. Control Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
Zhang, D.; Zeng, L.; Cao, K.; Wang, M.; Peng, S.; Zhang, Y.; Zhao, W. All Spin Artificial Neural Networks Based on Compound Spintronic Synapse and Neuron. IEEE Trans. Biomed. Circ. Syst. 2016, 10, 828–836. [Google Scholar] [CrossRef]
Gardner, M.W.; Dorling, S.R. Artificial neural networks (the multilayer perceptron)—A review of applications in the atmospheric sciences. Atm. Environ. 1998, 32, 2627–2636. [Google Scholar] [CrossRef]
Kulkarni, A.; Chong, D.; Batarseh, F.A. Foundations of data imbalance and solutions for a data democracy. In Data Democracy; Batarseh, F.A., Yang, R., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 83–106. [Google Scholar] [CrossRef]
Shung, K.P. Accuracy, Precision, Recall, or F1? 2018. Available online: https://medium.com/data-science/accuracy-precision-recall-or-f1-331fb37c5cb9 (accessed on 15 January 2026).
Yang, Y.; Liu, X. A re-examination of text categorization methods. In Proceedings of the SIGIR ‘99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 1 August 1999; pp. 42–49. [Google Scholar]
Brodersen, K.H.; Ong, C.S.; Stephan, K.E.; Buhmann, J.M. The Balanced Accuracy and Its Posterior Distribution. In Proceedings of the 20th International Conference on Pattern Recognition, Istanbul, Turkey, 7 October 2010; pp. 3121–3124. [Google Scholar]
Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Measurem. 1960, 20, 37–46. [Google Scholar] [CrossRef]
Jeremiah, R.; Way, P.D.; Firat, C.; Thanh-Nam, D.; Sartipi, M. Modeling and predicting vehicle accident occurrence in Chattanooga, Tennessee. Accid. Anal. Prev. 2021, 149, 105860. [Google Scholar] [CrossRef]
Wang, Y.; Zhai, H.; Cao, X.; Geng, X. Cause Analysis and Accident Classification of Road Traffic Accidents Based on Complex Networks. Appl. Sci. 2023, 13, 12963. [Google Scholar] [CrossRef]
de Ibiza, D. Situación “caótica” en la Unidad de Atestados de la Guardia Civil de Tráfico de Ibiza: Esta Semana se Queda Bajo Mínimos. Available online: https://www.diariodeibiza.es/ibiza/2024/02/19/situacion-caotica-unidad-atestados-guardia-98371856.html (accessed on 6 April 2025).
Asociación Unificada de Guardias Civiles. AUGC Denuncia la Precariedad Laboral de los Equipos de Atestados de Tráfico en la Provincia de Badajoz. Available online: https://www.augc.org/actualidad/augc-denuncia-precariedad-laboral-equipos-atestados-trafico-en-provincia-badajoz_21957_102.html (accessed on 22 March 2025).
Noticias de Navarra. Denuncian Falta de Personal y Formación en la Brigada de Atestados de Policía Foral. Available online: https://www.noticiasdenavarra.com/sucesos/2024/06/20/denuncian-falta-personal-formacion-brigada-8379901.html (accessed on 7 June 2025).
del Río Montesdeoca, L. Necesidad De Una Fiscalía Especializada En Seguridad Vial. Logos Guard. Civil 2024, 29, 13–50. [Google Scholar]
Garcia, C.; Viallon, V.; Bouaoun, L.; Martin, J.-L. Prediction of responsibility for drivers and riders involved in injury road crashes. J. Saf. Res. 2019, 70, 159–167. [Google Scholar] [CrossRef] [PubMed]
Yihong, L.; Yunpeng, W.; Tao, L.; Beibei, L.; Xiaolong, L. SP-SMOTE: A novel space partitioning based synthetic minority over-sampling technique. Knowl. Based Syst. 2021, 228, 107269. [Google Scholar] [CrossRef]
Kaur, R.; Sharma, R.; Dhaliwal, M.K. Evaluating Performance of SMOTE and ADASYN to Classify Falls and Activities of Daily Living. In Proceedings of the 12th International Conference on Soft Computing for Problem Solving. (SocProS 2023); Pant, M., Deep, K., Nagar, A., Eds.; Springer Nature: Singapore, 2024; Volume 995, pp. 315–324. [Google Scholar]
Suja, A.A. Imbalanced data learning using SMOTE and deep learning architecture with optimized features. Neural Comput. Applic. 2025, 37, 967–984. [Google Scholar] [CrossRef]

Figure 1. Overall flow chart of the MLP model implementation structure.

Figure 2. Process flow following an in-depth analysis of the output vectors of the MLP model. Questionable attributions are patterns whose highest reprocessed output value lies between 0.60 and 0.85. The model cannot classify them reliably, so they require detailed analysis by a human specialist.
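
The 0.60 to 0.85 band described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that each output vector holds five scores in category order A to E and that the band limits are inclusive; the helper names `is_questionable` and `predicted_category` are hypothetical and are not taken from the authors' implementation.

```python
# Hypothetical helpers illustrating the 0.60-0.85 band from the Figure 2 caption.
# Assumptions: five scores per pattern, ordered A..E; band limits are inclusive.
CATEGORIES = ["A", "B", "C", "D", "E"]


def predicted_category(output_vector):
    """Return the category whose output score is the highest."""
    if len(output_vector) != len(CATEGORIES):
        raise ValueError("expected one score per category")
    return CATEGORIES[output_vector.index(max(output_vector))]


def is_questionable(output_vector, low=0.60, high=0.85):
    """Flag a pattern whose highest output value falls inside the band."""
    return low <= max(output_vector) <= high


# Made-up output vector, for illustration only.
example = [0.05, 0.02, 0.72, 0.15, 0.06]
print(predicted_category(example), is_questionable(example))
```

A pattern flagged this way would be routed to a human specialist rather than accepted as a clear case.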

Figure 3. ROC curves and AUC values for the five output categories of the MLP model with 14 input variables: (a) MLP75-25 model; (b) MLP75-25.opt model.

Table 1. Traffic crash rates, social costs, and pedestrian fatality trends for the U.S., Europe, and Asia (2018–2024).

Year	Region	Crash Rate (Deaths per 100,000 Inhabitants)	Social Costs	Pedestrian Fatality Trends
2018	U.S.	12.4	240,000 M$	Slight increase (3%)
	Europe	5.0	100,000 M€	Balanced
	Asia	20.0	400,000 M$	Increase (5–7%)
2019	U.S.	13.0	245,000 M$	Slight decrease (1–2%)
	Europe	4.8	105,000 M€	Slight decrease (2%)
	Asia	18.5	420,000 M$	Balanced
2020	U.S.	12.5	230,000 M$	Balanced
	Europe	4.7	110,000 M€	Balanced
	Asia	19.0	430,000 M$	Increase (6%)
2021	U.S.	14.0	260,000 M$	Increase (6–7%)
	Europe	4.5	115,000 M€	Balanced
	Asia	18.0	440,000 M$	Increase (4%)
2022	U.S.	14.3	265,000 M$	Increase (5%)
	Europe	4.4	120,000 M€	Slight decrease (1%)
	Asia	17.5	450,000 M$	Increase (3%)
2023	U.S.	13.9	270,000 M$	Balanced
	Europe	4.3	125,000 M€	Balanced
	Asia	16.5	460,000 M$	Balanced
2024	U.S.	12.8	280,000 M$	Slight decrease (1–2%)
	Europe	4.2	130,000 M€	Slight decrease (1%)
	Asia	16.0	470,000 M$	Balanced

Table 2. Variable description and binary values.

Subsystem	Num	Description	Value
Human	H-1	PPP and RPP match. The driver is attentive while driving	1
	H-1	PPP and RPP do not match. The driver is inattentive while driving	0
	H-2	RT ≤ 0.75 s (average of a normal person)	1
	H-2	RT > 0.75 s (average of a normal person)	0
	H-3	Alcohol rate (driver) ≤ 0.25 mg/L (Limit in Spain)	1
	H-3	Alcohol rate (driver) > 0.25 mg/L (Limit in Spain)	0
	H-4	Driver without drugs in their system	1
	H-4	Driver with drugs in their system	0
	H-5	Alcohol rate (pedestrian) ≤ 0.25 mg/L	1
	H-5	Alcohol rate (pedestrian) > 0.25 mg/L	0
	H-6	Pedestrian without drugs in their system	1
	H-6	Pedestrian with drugs in their system	0
Technological	T-1	Expired periodic technical inspection of the vehicle	1
	T-1	Current periodic technical inspection of the vehicle	0
	T-2	Pedestrian clothing color with high visibility	1
	T-2	Pedestrian clothing color with low visibility	0
Structural	S-1	At a pedestrian crossing or its influence area (approx. 5 m)	1
	S-1	Outside pedestrian crossing or its influence area (approx. 5 m)	0
	S-2	During the day and/or without glare and/or sufficiently illuminated road	1
	S-2	At night and/or with glare and/or insufficiently illuminated road	0
Normative	N-1	Driving license expired or not held	1
	N-1	Driving license current and held	0
	N-2	Vehicle speed ≤ Speed limit of the road	1
	N-2	Vehicle speed > Speed limit of the road	0
	N-3	Driver not using a mobile phone	1
	N-3	Driver using a mobile phone when the pedestrian crash occurs, or moments before	0
	N-4	Pedestrian crosses while using a mobile phone or wearing music headphones	1
	N-4	Pedestrian crosses without using a mobile phone or wearing music headphones	0
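
As a minimal sketch of how the 14 binary variables in Table 2 could be assembled into a feature vector, the following assumes that each crash is stored as a dictionary keyed by the variable codes (H-1 to N-4) and that the vector follows the row order of the table. The function name `encode_crash` and the example record are hypothetical.

```python
# Variable codes in the order they appear in Table 2.
# The dictionary-based record format is an assumption for illustration.
VARIABLES = [
    "H-1", "H-2", "H-3", "H-4", "H-5", "H-6",
    "T-1", "T-2",
    "S-1", "S-2",
    "N-1", "N-2", "N-3", "N-4",
]


def encode_crash(record):
    """Turn a {code: 0 or 1} record into a 14-element binary feature vector."""
    missing = [name for name in VARIABLES if name not in record]
    if missing:
        raise ValueError(f"missing variables: {missing}")
    vector = []
    for name in VARIABLES:
        value = record[name]
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        vector.append(int(value))
    return vector


# Made-up record: every variable set to 1 except T-1, N-1, and N-4.
record = {name: 1 for name in VARIABLES}
record.update({"T-1": 0, "N-1": 0, "N-4": 0})
print(encode_crash(record))
```

Validating that every code is present and binary before encoding keeps malformed reports from silently reaching the classifier.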

Table 3. Categories and levels of responsibility used in this study from the original dataset.

Category	Driver Responsibility (%)	Pedestrian Responsibility (%)
A	100	0
B	0	100
C	75	25
D	25	75
E	50	50

Table 4. MLP network features.

Description	Features
Input variables	14
Output variables	5
Number of Layers	3
Hidden Layers	1
Number of neurons in each layer	14, 5, 5
Training Type	Supervised
Training Algorithm	LM
Transfer Function	Log-Sigmoid
Train	75% (382)
Validation	25% (128)
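
The architecture in Table 4 (14 inputs, one hidden layer of 5 neurons, 5 outputs, log-sigmoid transfer function) can be sketched as a forward pass. This is an illustration under stated assumptions, not the authors' implementation: the weights are random placeholders rather than trained values, the log-sigmoid is assumed to apply to both the hidden and the output layer, and the training step with the LM algorithm is not shown. All function names are hypothetical.

```python
import math
import random

# Layer sizes taken from Table 4.
N_INPUTS, N_HIDDEN, N_OUTPUTS = 14, 5, 5


def logsig(x):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and biases (NOT trained values)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def layer_forward(inputs, weights, biases):
    """Weighted sum plus bias for each neuron, passed through the log-sigmoid."""
    return [
        logsig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(features, hidden, output):
    """Forward pass: 14 binary inputs -> 5 hidden neurons -> 5 output scores."""
    if len(features) != N_INPUTS:
        raise ValueError("expected 14 input variables")
    hidden_activations = layer_forward(features, *hidden)
    return layer_forward(hidden_activations, *output)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)

# Made-up binary feature vector, for illustration only.
scores = mlp_forward([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0], hidden, output)
print(scores)
```

Each of the five output scores lies strictly between 0 and 1, one per responsibility category.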

Table 7. Metrics for the MLP75-25.opt model.

Model	MF1S	BACC	κ
MLP75-25.opt	0.7536	0.7976	0.7991
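
The three metrics in Table 7 can be computed from a confusion matrix. The sketch below assumes that MF1S denotes the macro-averaged F1 score, BACC the balanced accuracy (mean per-class recall), and κ Cohen's kappa. The function name `classification_metrics` and the toy matrix are hypothetical; the matrix is not the paper's data.

```python
def classification_metrics(cm):
    """Compute macro F1, balanced accuracy, and Cohen's kappa.

    cm[i][j] is the number of patterns of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    f1_scores, recalls = [], []
    for k in range(n):
        tp = cm[k][k]
        precision = tp / pred_counts[k] if pred_counts[k] else 0.0
        recall = tp / true_counts[k] if true_counts[k] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(true_counts[k] * pred_counts[k] for k in range(n)) / total ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "macro_f1": sum(f1_scores) / n,
        "balanced_accuracy": sum(recalls) / n,
        "kappa": kappa,
    }


# Toy two-class confusion matrix, for illustration only.
toy = [[8, 2],
       [1, 9]]
print(classification_metrics(toy))
```

Macro averaging and balanced accuracy weight every class equally, so a dominant class does not mask poor results on a rare one.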

Table 8. General evaluations of MLP75-25 model results.

Data Set	Matches (%)	Mismatches, Band 2/Unreliable (%)	Mismatches, Band 3/Unclear (%)
LPB	77.71	12.46	9.83
SJ	89.97	4.86	5.17

Table 9. Evaluation of results for the MLP75-25 and the MLP75-25.opt models.

Category	MLP75-25: Original Data Set (%)	MLP75-25: Questionable Attributions About Original Data Set (%)	MLP75-25.opt: Reprocessing Data Set (%)	MLP75-25.opt: Questionable Attributions About Reprocessing Data Set (%)
A	61.71	1.38	58.45	0.96
B	10.51	0.61	11.66	0.67
C	9.67	3.91	8.62	7.70
D	10.37	1.32	12.06	1.48
E	7.75	30.24	9.21	31.35

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Moreno-Sanfélix, A.; Gragera-Peña, F.C.; Jaramillo-Morán, M.A. Improving the Level of Responsibility Classification for Pedestrian Crashes with the Multilayer Perceptron Model. Urban Sci. 2026, 10, 68. https://doi.org/10.3390/urbansci10020068
