Peer-Review Record

Moving Object Detection in Traffic Surveillance Video: New MOD-AT Method Based on Adaptive Threshold

ISPRS Int. J. Geo-Inf. 2021, 10(11), 742; https://doi.org/10.3390/ijgi10110742

by Xiaoyue Luo^1,2, Yanhui Wang^1,2,*, Benhe Cai^1,2 and Zhanxing Li^1,2

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 24 August 2021 / Revised: 21 October 2021 / Accepted: 23 October 2021 / Published: 1 November 2021

Round 1

Reviewer 1 Report

This paper proposes a moving object detection algorithm for video surveillance systems based on dynamic threshold computation. The authors describe their method in detail and compare it with existing methods. The results show that the proposed algorithm is more accurate. The paper is also well structured. However, there are still some problems that the authors should address before the paper can be considered for publication.

1. The threshold computation requires the camera parameters. How are videos from cameras without this information handled? How is the threshold determined for such videos?

2. The proposed projection calculation can only handle objects moving along planar ground. What about objects at different heights or not on planar ground, such as on a hill or on steps? Furthermore, if an object moves in 3D space, such as a drone, how is the threshold calculated?

3. The authors should give a performance evaluation of the proposed method. The accuracy at different frame rates should also be evaluated, since it may increase the performance of moving object detection.

Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Moving object detection in traffic surveillance video: A new MOD-AT method based on adaptive threshold” (No. ijgi-1373318). The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the reviewers’ comments are as follows.

Reviewer #1:

Threshold computation for videos without camera parameters.

Response: Thank you for the comments. This is a constructive suggestion. The threshold calculation method proposed in the paper cannot be completed without camera parameters: we need to know the three-dimensional coordinates of the camera center point and the homography matrix H. The homography matrix H can be obtained using the method in Section 3.2.1 of the article, by finding four or more corresponding points in the video image and the high-definition remote sensing image. The specific method of threshold calculation is described in detail in Section 3.2. For cameras without the above parameters, the threshold cannot be calculated. We have also mentioned future work on improving the algorithm in the discussion section of the revision.
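As an illustration of how a homography can be recovered from point correspondences, the following is a minimal sketch and not the procedure of Section 3.2.1 itself. It assumes exactly four correspondences and fixes the last entry of H to 1, which gives an 8x8 linear system; with more than four points a least-squares fit would normally be used instead. The function names and the sample coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Estimate a 3x3 homography (h33 fixed to 1) from four point pairs."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence (x, y) -> (u, v).
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical correspondences: pixel coordinates in the video frame and
# planar map coordinates read from the remote sensing image.
image_pts = [(100.0, 400.0), (540.0, 400.0), (420.0, 120.0), (220.0, 120.0)]
map_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 30.0), (0.0, 30.0)]
H = estimate_homography(image_pts, map_pts)
```

Once H is available, `apply_homography(H, pixel)` maps any pixel on the ground plane to map coordinates; without such correspondences the mapping, and therefore the threshold, cannot be computed.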

How to calculate thresholds for objects at different heights or not on planar ground, and for objects moving in 3D space.

Response: The reviewer has made an excellent point here. The method in this paper is mainly oriented to flat monitoring areas. For objects at different heights or not on planar ground, the method is similar and follows the camera model, provided that the internal and external parameters of the camera are available.

Specifically, the camera model can be expressed as $x = PX$ with $P = K[R \mid T]$, where $x$ represents image coordinates, $X$ represents geographic three-dimensional coordinates, $P$ is a $3 \times 4$ matrix, $K$ is the internal parameter matrix, $R$ is the rotation matrix, and $T$ is the translation matrix. Any point in geographic space can then be mapped to image space through the matrix $P$.

At the same time, when objects are at different heights, the camera parameters need to be accurately calibrated, and DSM data of the surveillance scene need to be collected. Due to the limitations of the experimental data, the current article focuses on planar ground. The reviewer’s suggestions will be explored in future research. We have added this to the discussion and revised the relevant content in Section 3.2.2.
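The mapping from geographic space to image space described in this response can be sketched as follows, assuming the standard pinhole form x = PX with P = K[R | T]. The calibration values below (focal length, principal point, camera pose) are hypothetical and serve only to show that a point raised above the ground plane projects to a different pixel, which is why height data would be needed for non-planar scenes.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def projection_matrix(k, r, t):
    """Build the 3x4 matrix P = K [R | T]."""
    rt = [r[i] + [t[i]] for i in range(3)]  # append T as a fourth column
    return matmul(k, rt)


def project(p, world_point):
    """Project a 3D world point to pixel coordinates through P."""
    xw, yw, zw = world_point
    hom = [[xw], [yw], [zw], [1.0]]  # homogeneous column vector
    u, v, w = (row[0] for row in matmul(p, hom))
    return (u / w, v / w)


# Hypothetical calibration: focal length 1000 px, principal point (640, 360),
# no rotation, scene origin 10 units in front of the camera.
K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 10.0]
P = projection_matrix(K, R, T)

on_plane = project(P, (1.0, 0.0, 0.0))    # point with zero third coordinate
off_plane = project(P, (1.0, 0.0, 10.0))  # same x, y with a different third coordinate
```

In this toy setup the two points share the same x and y but land on different pixels, so a threshold derived under a planar-ground assumption would not transfer directly to objects at other heights.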

Evaluation of algorithm performance at different frame rates.

Response: Thank you for the reminder. We have added performance evaluation experiments on the effect of frame rate. Specifically, video #1 and video #2 are the experimental data used in this paper; the frame rate of video #1 is 25 fps and that of video #2 is 30 fps.

To analyze the impact of different frame rates on the performance of the MOD-AT algorithm, video #1 and video #2 were each converted to 10 fps, 20 fps, 30 fps, 40 fps, and 50 fps. Then, the time efficiency and CPU usage of the algorithm were compared. As shown in the figure below, video #1 and video #2 show the following patterns across frame rates. First, for every 10 fps increase in frame rate, CPU usage increases by 15.4%-47.5%, and the time efficiency increases by 18.7%-43.7%. In addition, at the same frame rate, the more objects each frame contains, the longer the processing time and the higher the CPU usage.

We have made the corrections in this revision as requested, and they are stated concisely in Section 4.4 of the revision.

Other changes

To improve the manuscript, we also checked the whole paper and corrected its grammar and spelling, as shown in the new manuscript.

In short, we tried our best to improve the manuscript and made some changes to it. These changes do not affect the content or framework of the paper. We have not listed them here, but they are marked in red in the resubmitted paper.

We sincerely appreciate the Editors’ and Reviewers’ dedicated work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions.

Author Response File: Author Response.docx

Reviewer 2 Report

The abstract has too many themes. It should explain the central issue of the paper briefly. For example, "the traditional moving object detection method ignores the threshold differences of image spatial caused by camera imaging characteristics"-- the description is unclear.
There is no need to provide too many equations. The research paper should focus on literature and its contribution.
The authors should provide a reason for the chosen baseline. For example, GGMM is a bio-inspired neural network for target search by multi-agents. Is it fair to use the method as a measure of comparison?
English grammar and style must be improved significantly.

Author Response

Dear Editors and Reviewers:

Reviewer #2:

Making the theme concise.

Response: The reviewer has made a very good point here. The abstract has been revised as suggested to clarify the topic of the paper. The details can be seen in the revised version.

Reducing unnecessary equations.

Response: Thank you for the reminder. The original paper did include many common-knowledge or procedural formulas. Some formulas in the original paper have been deleted or merged so that only concise and necessary formulas remain. These changes are reflected in the revision; see the new manuscript.

The rationality of the method comparison.

Response: Thank you for the comments. GGMM in the manuscript denotes the improved Gaussian mixture model; it is the abbreviation of the method used in reference [34] (line 102), rather than "the traditional moving object detection method ignores the threshold differences of image spatial caused by camera imaging characteristics." To avoid ambiguity, the abbreviation of the GGMM algorithm in the paper has been changed to IGMM, i.e., improved Gaussian mixture model, and the corresponding abbreviation in the experimental part has been changed accordingly.

In this paper, three methods, GVIBE, IGMM, and NPCM, were selected for comparison with MOD-AT in single-frame and multi-frame accuracy. GVIBE and IGMM are typical traditional moving object detection algorithms. Both use a single uniform threshold across the image space to filter out interference from the external environment, which results in an unreasonable threshold setting and thereby affects the accuracy of moving object detection. NPCM is a new algorithm based on a nonlinear perspective correction model. It considers only the linear or nonlinear characteristics of the object in the image. It does not consider the projection distortion of the object size caused by the camera imaging mechanism, and it ignores the differences in imaging geometric characteristics of moving objects at different positions in the video frame. Its threshold setting is not very specific, which affects the accuracy of moving object detection.

Improving the English grammar and style.

Response: Thank you for the reminder. We have revised the language of the paper as suggested. First, we unified the formula symbols to improve readability. In addition, the content and captions of the charts now highlight the main points to be expressed. We have also corrected the inappropriate language and grammar throughout the paper. The details can be seen in the revised version.

Other changes

To improve the manuscript, we also checked the whole paper and corrected its grammar and spelling, as shown in the new manuscript.

We sincerely appreciate the Editors’ and Reviewers’ dedicated work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions.

Author Response File: Author Response.docx

Reviewer 3 Report

The manuscript addresses the moving object detection problem for surveillance purposes. It detects cars and pedestrians in motion. The problem itself is interesting, and the idea of using adaptive threshold values along with the GMM-based background subtraction approach for moving object detection sounds novel. As shown in the experiments section, the designed MOD-AT method surpasses its contemporaries.

However, the following issues remain to be addressed:

Section 3 is hard to follow; the method itself could be presented more clearly, without confusing statements. The authors may consider:

Equations:

Inconsistency: LT, LJ or L_T, L_J.
Definitions: The definitions of several variables in the equations are missing, e.g. u₀, v₀. Is D(X_cam, Y_cam, 0) the origin? In Eq. 12, I think X_n is misspelled.
The current notation is confusing and hard to follow. The authors could choose notation that clearly separates variables in image space (coordinates) from those in world space (coordinates).

Figures:

Figure 1: The blocks of the overall process are expected to have corresponding subsections in the text, where each block of the figure is clearly described.
The high-definition remote sensing images should be clearly explained: do you generate them in a simulation environment, or are the images real?
Figure 2: This figure itself is confusing. It is relatively hard to separate the variables belonging to the world coordinates from those belonging to the image coordinates. You can refine this figure. For example, take width (W) and length (T): which one is along X and which one is along Y?
When you calculate the projected width, the width values for upper and lower sides are different. Why do you calculate only one width value?
For all figures throughout the manuscript, you can use proper captions, e.g., the caption of Figure 2 does not make sense. You can briefly describe the figures in the captions.

Assumptions:

In Subsection 3.2.2, lines 223-229, it is stated that the projection width of the object on the ground will not change. Do you consider that the objects are rotationally symmetric?
In Subsection 3.2.3, lines 281-286, how do you justify the range of variation of the object width and height? Does the method work if we change the scale factor from 3/2 to 2? Can you experimentally showcase that 3/2 is a more appropriate scale factor than 2?

The manuscript needs to be linguistically refined.

The singular/plural nouns, e.g. line (11-12), the traditional moving object detection method(s),
References, line (332-333), e.g. based on the background mixture method [?].
Abbreviations, e.g. equations (27-30) MP, VP, MR, VR.

Author Response

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have revised the paper according to the suggestions. The responses address the reviewers' concerns. The paper can be published after a language check.

Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Moving object detection in traffic surveillance video: A new MOD-AT method based on adaptive threshold” (No. ijgi-1373318). The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have sent the paper to a professional English editing service for polishing. We also checked the whole paper and corrected its grammar and spelling; these corrections are marked in red in the paper. We tried our best to improve the manuscript and made some changes to it. These changes do not affect the content or framework of the paper.

We sincerely appreciate the Editors’ and Reviewers’ dedicated work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions.

Reviewer 2 Report

The authors have addressed my concerns. I am satisfied to accept this paper after text editing.

Author Response

Dear Editors and Reviewers:

We sincerely appreciate the Editors’ and Reviewers’ dedicated work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions.

Reviewer 3 Report

The revised manuscript addresses the requested revisions well, and this version can be published. The authors may consider giving the manuscript a final language and spelling check.

Author Response

Dear Editors and Reviewers:

We sincerely appreciate the Editors’ and Reviewers’ dedicated work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions.
