Article
Peer-Review Record

Weakness Evaluation on In-Vehicle Violence Detection: An Assessment of X3D, C2D and I3D against FGSM and PGD

Electronics 2022, 11(6), 852; https://doi.org/10.3390/electronics11060852
by Flávio Santos 1,2,*, Dalila Durães 1,*, Francisco S. Marcondes 1, Niklas Hammerschmidt 3, José Machado 1,4 and Paulo Novais 1,4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 10 January 2022 / Revised: 22 February 2022 / Accepted: 26 February 2022 / Published: 9 March 2022

Round 1

Reviewer 1 Report

This paper is an extension of the authors' previously published paper [9]. The authors discuss limitations of the method proposed in [9]. What is missing is a discussion of why exactly the three methods proposed in the previous paper are worth further examination, and why the discussion is limited to them alone. There are plenty of works by other authors in this field of research; the query "car + violence + detection" in Google Scholar returns ~6150 hits since 2021.
The authors state that "The research is currently at an early stage when explorations are being undergone and feasibility is being evaluated." If so, why are the results even worth publishing?
Line 95 - reference missing
Adversarial attacks and white box attacks are now well-known concepts and in this respect the work does not bring anything new.
I do not see enough new elements in the work to make it worth publishing as a new and stand-alone article.  

Author Response

We are grateful for all your comments on our paper. All the questions raised are commented on in our response, and the suggestions have been integrated into the new version of the paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper presents three different architecture models for the analysis of violence recognition inside a vehicle. An adversarial-robustness evaluation has been performed for each architecture. The authors assessed each model's weakness and sensitivity both on the complete video and frame by frame.

The principal drawback of this paper is that the authors did not state explicitly the principal contributions of their method.

Comments:

  1. Please insert the number of the reference cited on page 3, line 95. Also, correct several grammatical errors, principally in verb forms and comma usage.
  2. Algorithms 1 and 2 presented by the authors are known, as are the three architectures used; please state explicitly the principal contributions of your method.
  3. The database created is very small, only 20 scenarios: 12 of them are violent and the remaining 8 are non-violent. How do you justify the confidence of your experimental results on such a small dataset?
  4. You mentioned that using the full videos does not give good results. You wrote, "we decided to cut all violence videos and keep only the seconds which have violence situations". Please justify how such cut videos would be prepared in a real situation involving a violent scenario.
  5. Please explain and present much more detail when discussing Figures 7 and 8 on model convergence (the cross-entropy and accuracy curves during training). You should present all parameters of the experimental setup. Also, please provide experimental accuracy curves for the test stage for the three chosen architectures.
  6. When you analyzed the adversarial noise added to each frame (video level), the results in Table 2 confirmed that none of the architectures resisted the adversarial noise attacks. Please comment on this fact and propose possible adjustments or recommendations for the architectures used (the attack family in question is sketched after this list).
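
For context on the attacks referenced in Comment 6 and in the paper's title, the following is a minimal, hypothetical PyTorch sketch of frame-wise FGSM and PGD perturbations. The model interface, the [0, 1] input range, and the epsilon, step-size, and step-count values are illustrative assumptions, not settings taken from the paper under review.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, clip, label, epsilon=8 / 255):
    """Single-step FGSM: shift every frame of the clip along the sign
    of the input gradient of the classification loss."""
    clip = clip.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(clip), label)
    loss.backward()
    adv = clip + epsilon * clip.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # assumes inputs scaled to [0, 1]

def pgd_attack(model, clip, label, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """PGD: iterated FGSM steps, projected back into the L-infinity
    epsilon-ball around the original clip after every step."""
    orig = clip.clone().detach()
    adv = orig.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), label)
        loss.backward()
        with torch.no_grad():
            adv = adv + alpha * adv.grad.sign()                 # signed-gradient step
            adv = orig + (adv - orig).clamp(-epsilon, epsilon)  # project into ball
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```

Because both attacks perturb every frame independently under the same budget, a video classifier whose prediction is driven by per-frame features can be pushed across the decision boundary, which is consistent with the behaviour Comment 6 asks the authors to discuss.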

 

Author Response

We are grateful for all your comments on our paper. All the questions raised are commented on in our response, and the suggestions have been integrated into the new version of the paper.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

1. The authors did not highlight in the draft what exactly they have changed in their paper. Please do so and resubmit for review.
2. I can agree with the statements that "This paper is highly specialized in a very specific subject", that "specialization is unavoidable", and that surveillance, as it relates to people's security, cannot be tackled "in general". The proposed title of the paper, however, is very general. This is not a paper about "the Weakness of In-Car Violence Recognition Models" but rather about the weakness of three selected recognition methods against FGSM and PGD attacks, evaluated in the in-car violence recognition domain. Please retitle the paper to be more specific (my suggested title is too clumsy).

Author Response

See attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report

1. This reviewer did not find an answer to Comment 3: "The database created is very small, only 20 scenarios: 12 of them are violent and the remaining 8 are non-violent. How do you justify the confidence of your experimental results on such a small dataset?"

The authors answered: "As in total we have 16 pairs, we recorded 640 scenes. This is similar to general violence recognition datasets, such as Real-life violent scenes dataset." This answer does not guarantee the confidence of your experimental results. You should reflect this fact in the paper text.

2. This reviewer did not find in the text of the revised paper an answer to Comment 4. You mentioned that using the full videos does not give good results and wrote: "we decided to cut all violent videos in slices of 5 seconds and keep only the slices that have violent situations. In order to keep only the slices with violent behaviour, we re-watch all the slices and keep only the violent ones. After cutting all videos, we retrained all models with the new dataset, and the result was a non-highly sensitive and more accurate model in the deployed scenario." This reviewer's comment asked: "Please justify how such cut videos would be prepared in a real situation involving a violent scenario." The authors did not answer this last question of the comment.

3. You should attend to Comment 5: "Please explain and present much more detail when discussing Figures 7 and 8 on model convergence (the cross-entropy and accuracy curves during training). You should present all parameters of the experimental setup. Also, please provide experimental accuracy curves for the test stage for the three chosen architectures." You did not answer this comment. The only change was the phrase (lines 219-220): "All the models were trained during 200 epochs, using the Adam optimizer with a learning rate 0.001 and batch size of 16 videos." There is no explanation: why did you use these parameter values? You also ignored the other question in the comment: "provide experimental curves for accuracy in the test stage for these three chosen architectures". (The quoted training setting is sketched below for reference.)
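
For reference, the training setting quoted in Comment 3 (Adam, learning rate 0.001, batch size 16, 200 epochs) corresponds to a loop of roughly the following shape. This is a hypothetical reconstruction: the toy classifier and random clips are placeholders, not the authors' models or dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-ins for the authors' X3D/C2D/I3D models and the
# in-vehicle violence dataset (random 8-frame, 32x32 RGB clips).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 2))
data = TensorDataset(torch.rand(64, 3, 8, 32, 32),
                     torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)  # batch size 16

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr 0.001
for epoch in range(200):  # 200 epochs, as stated on lines 219-220
    for clips, labels in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(clips), labels)
        loss.backward()
        optimizer.step()
```

The reviewer's point stands regardless of the exact loop: the paper reports these values without justification, and typical practice would be to state whether they were tuned on a validation set or adopted as common defaults.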

This reviewer recommends following the standard practice of highlighting all changes in the revised paper in a different color in the text. This reviewer also asks that the point-by-point response file reflect all changes made, quoting the parts of the manuscript text that were changed.

Author Response

See attached file.

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report

The authors have addressed all my remarks. In my opinion, the paper can be accepted as it is.

Reviewer 2 Report

No comments.
