Review Reports - Weight Feedback-Based Harmonic MDG-Ensemble Model for Prediction of Traffic Accident Severity

Round 1

Reviewer 1 Report

The reviewer would like to thank the authors for their effort in improving the manuscript. Despite this being a revision, I did not receive a rebuttal letter with responses to the reviewer points. This makes it very difficult to understand the reasoning behind specific changes or responses by the authors. I stress the two previous major points that prevent me from passing the paper once again.

One major overall comment is that the paper and the proposed methodology disregard the road safety scientific aspect, which emphasizes causation of high severity by specific factors. The practical application is also unfeasible to future prediction with completely new datasets as all time-related variables will provide no insights then. The computer science aspect is valid and good, however it needs to be integrated through a realistic road safety point of view.

In particular, the reviewer is very skeptical of Figure 5. There are many variables that show large MDG which are very circumstantial and cannot be used for prediction. These are time-related variables (Hour, Month, Year, Day, Minute, Date). Any model created using these is not fit for prediction in future, unknown datasets. In a sense, these time-related variables overfit the data, as predictions are conducted with data from the same periods. The authors are urged to create datasets separated temporally, not randomly, and try to assess model performance then (e.g. 5 years for training and 2 years for testing data).

Author Response

Reviewer#1

Q1. One major overall comment is that the paper and the proposed methodology disregard the road safety scientific aspect, which emphasizes causation of high severity by specific factors. The practical application is also unfeasible to future prediction with completely new datasets as all time-related variables will provide no insights then. The computer science aspect is valid and good, however it needs to be integrated through a realistic road safety point of view.

Author response: Thank you for your kind comment. I gave it a great deal of thought. Computer science is indeed a valid and valuable field, and I used it to think carefully about realistic road safety perspectives.

Author action: First, we used the open data of traffic accident information provided by the Korea Highway Transportation Corporation. It consists of 27 variables, such as the year of occurrence, month, day of the week, and the numbers of deaths and injuries, and includes statistical numerical information. The method proposed in the paper therefore aims to discover, from these statistical data, the causes of the traffic accidents that occur most frequently. This makes it possible to devise countermeasures and solutions for accidents when they occur.

Q2. In particular, the reviewer is very skeptical of Figure 5. There are many variables that show large MDG which are very circumstantial and cannot be used for prediction. These are time-related variables (Hour, Month, Year, Day, Minute, Date). Any model created using these is not fit for prediction in future, unknown datasets. In a sense, these time-related variables overfit the data, as predictions are conducted with data from the same periods. The authors are urged to create datasets separated temporally, not randomly, and try to assess model performance then (e.g. 5 years for training and 2 years for testing data).

Author response: Thank you for your kind comment. I repeated the pre-processing according to your comment.

Author action 1: However, since Korea has four distinct seasons, it is necessary to consider the variables for year and month. In addition, traffic congestion occurs during commuting hours, when the accident rate is high. Therefore, I think it is necessary to consider time.

However, for the analysis, the data were separated temporally into training data, verification data, and test data, rather than randomly with k-fold splitting.
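
The temporal separation requested by the reviewer can be illustrated with a minimal sketch. This is not the authors' code: the `Year` and `Severity` field names, the toy records, and the 2012 to 2016 / 2017 to 2018 cut (mirroring the reviewer's "5 years for training and 2 years for testing" example) are assumptions made only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, int]


def temporal_split(
    records: List[Record],
    train_years: Iterable[int],
    test_years: Iterable[int],
) -> Tuple[List[Record], List[Record]]:
    """Split accident records by year of occurrence instead of at random."""
    train_set, test_set = set(train_years), set(test_years)
    if train_set & test_set:
        raise ValueError("training and test years must not overlap")
    train = [r for r in records if r["Year"] in train_set]
    test = [r for r in records if r["Year"] in test_set]
    return train, test


# Toy records (hypothetical): four accidents per year, severity coded 1 to 3.
records = [
    {"Year": year, "Severity": 1 + (year + i) % 3}
    for year in range(2012, 2019)
    for i in range(4)
]

# Assumed cut: five earlier years for training, the two latest years for testing.
train, test = temporal_split(records, range(2012, 2017), range(2017, 2019))

# Every training record precedes every test record in time.
assert max(r["Year"] for r in train) < min(r["Year"] for r in test)
```

With a split of this kind, the test years are never seen during training, so a time-related variable can only help if its effect carries over to later periods.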

Accordingly, the variable importance obtained with the MDG coefficient, shown in Fig. 5 and Table 2, has changed slightly, as follows.

Figure 5. Variable importance using the MDG coefficient.

Table 2. The final result of the variable importance using the MDG coefficient.

Variable Name	MDG	Variable Name	MDG
Hour	1000.54	Date	665.693849
Casualty_type	1302.854	Violation_law	557.599871
Month	1058.541	Light	667.584
Attacker_type	1512.122	Location	1256.67162
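
As an illustration of how the values in Table 2 can be ranked and filtered, the following minimal sketch sorts the variables by MDG and keeps those above a cut-off. The MDG values are copied from Table 2. The cut-off of 500 is an assumption taken from the threshold quoted in the earlier review round, and the sketch is not the authors' selection procedure.

```python
# MDG values copied from Table 2.
mdg = {
    "Hour": 1000.54,
    "Casualty_type": 1302.854,
    "Month": 1058.541,
    "Attacker_type": 1512.122,
    "Date": 665.693849,
    "Violation_law": 557.599871,
    "Light": 667.584,
    "Location": 1256.67162,
}

# Assumed cut-off; the revised manuscript may use a different value.
THRESHOLD = 500.0

# Rank variables from most to least important.
ranked = sorted(mdg.items(), key=lambda item: item[1], reverse=True)

# Keep only the variables whose MDG exceeds the cut-off.
selected = [name for name, score in ranked if score > THRESHOLD]

for name, score in ranked:
    print(f"{name:15s} {score:10.3f}")
```

Under this assumed cut-off, all eight variables in Table 2 are retained, including the time-related ones (Hour, Month, Date).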

Author action 2: We also modified the contents of the paper (Sections 3.1, 3.2, etc.) as follows:

As shown in Table 1, 13 variables and 18,700 transactions were extracted through pre-processing. Under ‘Traffic Accident Severity’, ‘1’ means ‘Slight-moderate’; ‘2’ means ‘Serious’; ‘3’ means ‘Very serious’. ‘Year’ is categorized according to the year of occurrence. ‘Violation_law’, a broad category of law violations, is coded from ‘0’ to ‘2’: ‘0’ means a pedestrian’s negligence; ‘1’ means a driver’s violation; ‘2’ means poor maintenance of the vehicle. ‘Road_type’, a broad category of road types, is coded from ‘0’ to ‘4’: ‘0’ means crossroad; ‘1’ means others/undefined; ‘2’ means one way only; ‘3’ means parking lot; ‘4’ means railroad crossing. However, based on the pre-processing results alone, it is difficult to identify the factors that influence severity. For this reason, the important variables that influence severity are extracted. In addition, to prevent over-fitting, the data are divided into 70% training data, 20% test data, and 10% validation data.

However, since Korea has four distinct seasons, it is necessary to consider the variables for year and month. In addition, traffic congestion occurs during commuting hours, when the accident rate is high. Therefore, it is necessary to consider time. In the variable importance results obtained with the MDG coefficient in Fig. 5, variables with relatively low importance values are also extracted.

In Fig. 6, the model has three steps. The first is to construct an ensemble model. The pre-processed traffic accident data are split into 70% training data, 20% test data, and 10% verification data. Various single classification models are trained with the training data. Afterward, the ensemble model is generated using the weighted-voting ensemble method, which combines these single models.
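
The weighted-voting step can be sketched as follows. This is a generic weighted vote under stated assumptions, not the authors' weight-feedback procedure: the model names and weights are hypothetical, and in practice the weights might be derived from each single model's performance on the verification data.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(predictions: Dict[str, int], weights: Dict[str, float]) -> int:
    """Return the severity class with the largest total weight.

    predictions maps a model name to its predicted class (1, 2, or 3);
    weights maps the same model names to non-negative voting weights.
    Ties are resolved in favour of the smallest class label.
    """
    scores: Dict[int, float] = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical single models and weights, for illustration only.
weights = {"model_a": 0.6, "model_b": 0.2, "model_c": 0.2}

# Two weaker models agree on class 3, but the stronger model outweighs them.
outweighed = weighted_vote({"model_a": 1, "model_b": 3, "model_c": 3}, weights)

# When the strongest model is joined by another, their class wins clearly.
agreed = weighted_vote({"model_a": 2, "model_b": 1, "model_c": 2}, weights)
```

Unlike a simple majority vote, the first call returns class 1 because the single heavier model carries more weight than the two lighter ones combined.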

Author Response File: Author Response.docx

Reviewer 2 Report

Overall, I enjoyed reading this paper. Although it is very technical, it is well-written and relatively easy to follow. The topic of crash severity is of interest and particular importance. I have some comments which I hope will help the authors further improve the paper.

- The authors are advised to add more metrics, e.g. AUC.

- A very important topic in crash severity modelling is unobserved heterogeneity. Unfortunately, this cannot be addressed by machine learning models. I would encourage the authors to add a short discussion. I also feel that the authors have to discuss the limitations of the black box methods they are applying.

- Algorithms could be placed in an Appendix.

- A thorough language and spelling check needs to be undertaken by the authors, as there are a few typos and minor errors (Fig. 1, for example).

Author Response

Reviewer#2

Q1. The authors are advised to add more metrics, e.g. AUC.

Author response: Thank you for your kind comment. I added AUC according to your comment, which improved the quality of our paper.

Author action: The evaluation of the model using AUC was added to the performance evaluation section as follows:

The degree of accident risk differs according to the threshold applied to traffic accident severity. Therefore, the ROC curve is used to evaluate the proposed classification model as the threshold value changes. The ROC curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various classification thresholds. The quality of the model is judged from the ROC curve by the AUC, the area under the curve. The AUC ranges from 0 to 1, and the larger the value, the better the performance. Figure 10 shows the ROC curve and AUC of the proposed model. In the evaluation, the AUC was 0.87.

Figure 10. ROC curve and AUC area of the model
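
For readers unfamiliar with the metric, the following minimal sketch shows how ROC points and the AUC can be computed for a binary problem. It is an illustration only, not the authors' evaluation code: the labels and scores are made up, and collapsing the three severity classes into a single positive class (for example, ‘Serious’ or worse) is an assumption.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per distinct score threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a case is predicted positive if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example: 1 = positive class, 0 = negative class.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
area = auc(curve)
```

An AUC of 1 corresponds to a perfect ranking of positive over negative cases, and 0.5 to a random ranking.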

Q2. A very important topic in crash severity modelling is unobserved heterogeneity. Unfortunately, this cannot be addressed by machine learning models. I would encourage the authors to add a short discussion. I also feel that the authors have to discuss the limitations of the black box methods they are applying.

Author response: Thank you for your kind comment. I gave it a great deal of thought. However, we think that current machine learning models can also discover heterogeneity that is not sufficiently observed, because their performance is improving. We use machine learning because the amount of data used is not vast. Also, for classification, machine learning models are more effective than deep learning models.

Author action: Therefore, we have written the following to discuss the limitations of the black box technique and the difference between deep learning and machine learning.

Configurable models include machine learning-based models and deep learning-based models. In a machine learning-based model, the user directly selects and inputs the features of the data, and the model learns the patterns by itself. Accordingly, new knowledge and information can be obtained. On the other hand, a deep learning-based model learns by itself without user intervention, from feature extraction to pattern discovery. It can also extract results from complex and vast data. However, since it has the characteristics of a black box, it has the disadvantage that only the final result of the analysis is available, without the intermediate process being visible. In this paper, in order to improve the efficiency of classification, the model is constructed by extracting the variables necessary for analysis through MDG coefficients. Therefore, a machine learning-based model is constructed.

Q3. Algorithms could be placed in an Appendix.

Author response: Thank you for your kind comment. I considered moving the algorithm to an appendix.

Author action: My conclusion on placing it in the appendix is as follows.

If the algorithm is omitted from the text, the flow of the paper can be disrupted. The algorithm contains the most important content in our paper and allows you to grasp the whole flow.

Q4. A thorough language and spelling check is required to be undertaken by the authors as there are a few typos and minor errors (fig 1 for example).

Author response: Thank you for your kind comment. We were able to carefully review the paper once more.

Author action: We carefully checked the English grammar in all the contents and figures of the paper. In addition, we had it corrected by an expert. Accordingly, the figures (Figures 1, 2, 5, 6) have been modified as follows.

Figure 1.

Figure 2

Figure 4

Figure 6

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have taken into account the previous comments and they have addressed them decently. I believe that the manuscript has improved since its previous iteration and could therefore be accepted.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

The paper deals with an interesting subject: the creation of an ensemble model for the prediction of accident severity. It uses a novel methodology and sophisticated approach. The effort of the authors is apparent and acknowledged. There are, however, critical issues in the paper that are overlooked and render it unsuitable for publication.

One major overall comment is that the paper and the proposed methodology disregard the road safety scientific aspect, which emphasizes causation of high severity by specific factors. The practical application is also unfeasible to future prediction with completely new datasets as all time-related variables will provide no insights then. The computer science aspect is valid and good, however it needs to be integrated through a realistic road safety point of view.

In particular, the reviewer is very skeptical of Figure 5. There are many variables that show large MDG which are very circumstantial and cannot be used for prediction. These are time-related variables (Hour, Month, Year, Day, Minute, Date). Any model created using these is not fit for prediction in future, unknown datasets. In a sense, these time-related variables overfit the data, as predictions are conducted with data from the same periods. The authors are urged to create datasets separated temporally, not randomly, and try to assess model performance then (e.g. 5 years for training and 2 years for testing data).

Since the MDG acronym is in the title, it should be defined in the abstract. The acronym XGBTree needs definition as well.

The claim of the authors, both in the abstract and in the introduction (lines 42-43), that “…the process for determining the severity includes an expert’s subjective opinion and is time-consuming” is inaccurate. There are numerous road-safety research papers that have developed predictive injury severity models, just as the authors endeavor to do. There are also systems, developed in both academia and industry, that assess injury severity without subjectivity and time consumption. Since this is a manuscript in road safety research, the authors are advised to perform additional literature review on that aspect, and not limit themselves to methodological issues of computer science.

The statement in lines 55-56 that “learning occurs with the use of limited data and a model causes over-fitting for new data” is inaccurate. There are several techniques to avoid over-fitting that could be mentioned (cross-validation, neuron deactivation in neural networks, etc.).

The first paragraph of section 2.1 between lines 85-97 can be shortened as it describes well-known principles in modelling.

In lines 208-209, it is mentioned that “…variables are selected based on the threshold ‘500’ where, through repeated testing, cut-off occurs.” However, in Figure 4 (step 2), an indication of “select 100 over MDG” is written. Please clarify this point, or correct as necessary.

Lines 194-200 and 235-240 have some repetition. Furthermore, the authors should provide an overview of severity classification (not just a reference), since this classification is critical for the results of the study. The authors mention (line 240) that “This is calculated by adding 70% to the numbers of deaths and severe injuries, and 30% to the number of minor injuries.” Is that addition or multiplication? Please clarify.

The type of assailant/attacker variable needs explanation in the manuscript. The sample of variables is not described adequately (e.g. mean/st.dev/min/max attributes for all continuous variables or percentages for categorical variables).

In line 289 it is mentioned that “This is because accidents often occur at night or during rush hours.” This interpretation is lacking and unsupported by references. Accidents occur based on exposure, and the influence of night/day is unclear, as several contradicting results have been published.

In line 400, the terms ‘moderately severe’ and ‘extremely severe’ are misleading. More proper terms would be ‘slight-moderate injuries’ and ‘severe (or serious) injuries’.

Reviewer 2 Report

The authors analyze how to establish objective criteria for determining the severity of traffic accidents, describing a real-world study.

The proposed approach is interesting, but the authors should better describe some points of their approach.

In particular, the authors should better describe the definitions of their approach with respect to the important variables. The effect of some variables, such as Minute, Year, Day, and Date, on increasing the efficiency of the approach should be explained clearly, instead of emphasizing the higher accuracy or precision obtained by using a new method. Furthermore, the authors should provide more details about the limitations of their study.

Fig. 4, Step 2 shows 14 important variables, but Table 2 shows only 11 important variables.
Lines 233-237: ‘The traffic accident information data used in this study is based on the 2012 to 2018 fatal traffic accident information…’ What does ‘fatal traffic accident information’ mean? If all the information was selected from fatal traffic accidents, how is traffic accident severity classified as slight, serious, and fatal?
Lines 276-277: ‘The vertical axis represents the mean decrease in Gini. The horizontal axis represents the independent variables of traffic accident information data.’ This description cannot refer to Fig. 5.
Lines 282-283: Why is the threshold cut-off of 500 selected? This should be described clearly.
Lines 490-491: ‘multiple regression method uses 10 independent variables.’ What are the values (R², VIF, and collinearity) of the 10 independent variables?

Reviewer 3 Report

The present paper focuses on the prediction of traffic accident severity with data provided by the Korean Road Traffic Authority. A harmonic MDG-ensemble model is proposed with the weight feedback-based approach. I find the study valuable, nevertheless, there are some major issues to be fixed.

Extensive language improvement is required for this paper. Low readability detracts from the quality of the work. For example, the authors try to explain which gaps are filled by this study in lines 54-68. However, readers cannot follow the explanation because of the poor English. Therefore, I suggest sending the article to a professional language service after revising.

The major issues of this article are related to paper structure, integrity, and traceability.

I suggest three subsections for the introduction to increase traceability and integrity: background, literature review, and the aim of the article and paper structure. You should explain more clearly which research gaps will be filled under the subsection ‘the aim of the article and paper structure’, in connection to the literature review. The literature review on the prediction of traffic accident severity is insufficient; please expand it.

Section 2 and Section 3 should be explained in a more compact way. I suggest a clear methodology section for this study. The applied methods should be presented holistically as a framework at the beginning of the section, and then explained briefly and separately under subsections. The readers should have an integrated view before going deeper. Also, add a subsection to explain data and software usage.

Please also explain all technical terms used in this study clearly, as readers may not be aware of these terms. Also, do not use abbreviations before explaining them (e.g. line 47: IoT).

Furthermore, do not repeat the same things explained in sections 2 and 3 for the results section. Highlight the fundamental gains of this study for the road traffic accident literature in the Conclusions.