Article
Peer-Review Record

Detection of Abnormal Pedestrian Flows with Automatic Contextualization Using Pre-Trained YOLO11n

Math. Comput. Appl. 2025, 30(2), 44; https://doi.org/10.3390/mca30020044
by Adrián Núñez-Vieyra, Juan C. Olivares-Rojas *, Rogelio Ferreira-Escutia, Arturo Méndez-Patiño, José A. Gutiérrez-Gnecchi and Enrique Reyes-Archundia
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 3 February 2025 / Revised: 1 April 2025 / Accepted: 15 April 2025 / Published: 17 April 2025
(This article belongs to the Special Issue New Trends in Computational Intelligence and Applications 2024)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors present results on an automatic process for analysing/detecting abnormal pedestrian flows in videos. The deep-learning-based proposal relies on the second central statistical moment, but some details should be addressed in order to improve the paper.

 

-The proposal is based on using existing models to detect and count pedestrians, in order to decide the kind of flow from statistical values. Some details about the selected deep model are missing, such as a specific description of the model and the parameter values used for training the pre-trained model. Some experimental parameters (tracker, iou, conf) are mentioned in the document but never explained.

-When comparing against other methods (Table 3), it is not clear whether the experimentation was carried out under the same conditions: running on the same hardware and using the same training/testing sets for all methods, and how the results were validated (cross-validation?); the recall, specificity, and F1 metrics are also missing. In addition, the differences/advantages of the proposal must be clearly presented with respect to other proposals in the SOTA; otherwise the proposal seems to be merely an application of a Python library.

-In Table 3, additional more recent works must be considered, and for real-time methods the runtimes must be reported.

-When describing the hardware used for the experiments, the processor speed is missing.

Comments on the Quality of English Language

It is acceptable.

Author Response

The authors appreciate the reviewers' comments and observations on our work, which have allowed us to substantially improve it. We also appreciate the time spent reviewing and editing it.

 

The responses to the comments are presented below:

 

The authors present results on an automatic process for analysing/detecting abnormal pedestrian flows in videos. The deep-learning-based proposal relies on the second central statistical moment, but some details should be addressed in order to improve the paper.

 

  1. The proposal is based on using existing models to detect and count pedestrians, in order to decide the kind of flow from statistical values. Some details about the selected deep model are missing, such as a specific description of the model and the parameter values used for training the pre-trained model. Some experimental parameters (tracker, iou, conf) are mentioned in the document but never explained.

In response, the parameters whose default values were modified have been added and explained on lines 251-269.
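For reference, a minimal sketch (using the Ultralytics Python API, with illustrative values rather than the exact ones reported on lines 251-269) of how the tracker, iou, and conf parameters are passed to a pre-trained YOLO11n tracker:

from ultralytics import YOLO

# Load the pre-trained YOLO11n weights (no additional training required).
model = YOLO("yolo11n.pt")

# Track only the "person" class (COCO class 0) in a video.
# tracker : tracker configuration file (e.g., ByteTrack or BoT-SORT).
# conf    : minimum detection confidence for a box to be kept.
# iou     : IoU threshold used for non-maximum suppression.
results = model.track(
    source="pedestrians.mp4",   # illustrative path, not from the paper
    tracker="bytetrack.yaml",
    conf=0.25,
    iou=0.5,
    classes=[0],
    persist=True,
    stream=True,
)

for frame_result in results:
    ids = frame_result.boxes.id  # tracking IDs assigned in this frame (or None)
    print(0 if ids is None else len(ids), "pedestrians tracked")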

 

  2. When comparing against other methods (Table 3), it is not clear whether the experimentation was carried out under the same conditions: running on the same hardware and using the same training/testing sets for all methods, and how the results were validated (cross-validation?); the recall, specificity, and F1 metrics are also missing. In addition, the differences/advantages of the proposal must be clearly presented with respect to other proposals in the SOTA; otherwise the proposal seems to be merely an application of a Python library.

 

In response to this comment, we agree with what was expressed, and a brief explanation has been added to the document on lines 306 to 313. In addition, we want to point out that our contributions focus on how we implement automatic contextualization, something not found in the cited research, and on how we measure the accuracy of the tracking algorithm, which is closely related to more robust pedestrian counting.

 

  3. In Table 3, additional more recent works must be considered, and for real-time methods the runtimes must be reported.

 

In response to your comment, some references have been added to Table 3, highlighting the accuracy and the model used. As for the runtimes you mention, it is difficult to compare them because, as noted in the previous point, the equipment, videos, and context are different and directly influence the results, especially the timing. In our case, what matters about the processing time is maintaining a certain synchronization with the original video, so as to preserve the feeling of real-time processing and the natural fluidity of the scenes, considering that the database is also updated and that processing and storage are done over a local wireless network.

 

  4. When describing the hardware used for the experiments, the processor speed is missing.

In response to this observation, this information was added on line 247.

Reviewer 2 Report

Comments and Suggestions for Authors

The article is very interesting; the summary is methodologically well conceived, as is the entire structure of the paper. However, I would like to raise the following objections:

1. In Figure 1, in case F, after the condition where the arrow reaches the node, it is unclear where the information sequence is directed. This needs to be better graphically presented. In addition, the goal of the diagram (recognizing the room) is unclear and it needs to be presented more clearly.
2. In general, the model needs to be presented more clearly according to the description, which is very well written.
3. It is necessary to state the method of measuring the values shown in Tables 1 and 2.
4. According to the results in Table 3, why is your method better according to the values obtained, or what is its advantage over the others?

Author Response

The authors appreciate the reviewers' comments and observations on our work, as they have allowed us to substantially improve it. We appreciate the time spent reviewing and editing it.

 

The responses to the comments are presented below:

The article is very interesting; the summary is methodologically well conceived, as is the entire structure of the paper. However, I would like to raise the following objections:

  1. In Figure 1, in case F, after the condition where the arrow reaches the node, it is unclear where the information sequence is directed. This needs to be better graphically presented. In addition, the goal of the diagram (recognizing the room) is unclear and it needs to be presented more clearly.

 

Taking this observation into account, the algorithm has been organized into two parts (main process and registration process) for clarity. It has also been modified and a description has been added, highlighted in yellow.

 

  2. In general, the model needs to be presented more clearly according to the description, which is very well written.

 

As mentioned in the previous point, the model's description has been modified in addition to the diagram, with the changes highlighted in yellow (lines 124-145, 149, 155-157, 163).

 

  3. It is necessary to state the method of measuring the values shown in Tables 1 and 2.

To address this observation, a description of how the measurements were performed was added for Table 1 and Table 2, which are marked in yellow on lines 242-252. 

 

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

After this revised version and considering the previous comments, it is very difficult to highlight the relevance of the proposal, mainly because of the unfair comparison in Table 3. Considering the four methods added, it seems the proposal is not competitive.

Related to the automatic contextualization, it seems to be a common feature in tracking systems based on historic log files, where the long content could be analysed via graphic, summarized visualization techniques. The proposal takes into account the specific IDs found, and this last point could be an interesting additional detail in a log file.

As Python experimental content, the paper could be relevant to a specific readership, but not to the one related to the journal's scope.

Author Response

Reviewer 1

The authors present results on an automatic process for analysing/detecting abnormal pedestrian flows in videos. The deep-learning-based proposal relies on the second central statistical moment, but some details should be addressed in order to improve the paper.

 

  1. After this revised version and considering the previous comments, it is very difficult to highlight the relevance of the proposal, mainly because of the unfair comparison in Table 3. Considering the four methods added, it seems the proposal is not competitive.

 

We have reviewed your comments and greatly appreciate this type of feedback. To address it, we have expanded the document to highlight the objectives of this research, emphasizing three aspects that significantly contribute to the relevance of this manuscript: automatic contextualization, the evaluation of the strategies used for person tracking, and the generation of alerts for abnormal pedestrian flows. These topics are highlighted with new yellow text in the revised document (lines 125-134, 224-228, 313-352).

 

  1. Automatic contextualization with its respective stages of recording, feedback, and alert generation based on the degree of abnormality in pedestrian flows. Additional information has been added to this topic (lines 125-134 and 224-228).
  2. Evaluation of the tracking method, which takes into account the assignment of IDs upon detection of a person and only considers the count reliable if the person maintains that ID until they leave the frame. This section is described on page 5 and is marked in green.
  3. Generation of alerts for abnormal pedestrian flows, which feeds back from the data generated during automatic contextualization and whose accuracy has been measured considering the retention of the identifiers assigned while tracking people. Information on this topic has been added (lines 224-228 and 336-339). However, it is detailed on page 10, where it is highlighted in green, and an example of alert issuance is shown in Figure 6.

 

Therefore, the counting system itself (person detection and classification) is only part of the solution and becomes meaningful together with the automatic contextualization and the evaluation of the tracking method. On this same topic, the accuracy of the count is affected because people are tracked from the moment they enter the frame until they leave, as long as they maintain the same identification.
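To make this counting criterion concrete, the following is a hypothetical Python sketch (the function name, data layout, and example values are ours, not taken from the paper): a pedestrian is counted as correctly tracked only if a single tracker ID was kept from entering the frame until leaving it.

def counting_accuracy(tracks, ground_truth_count):
    # tracks maps each ground-truth pedestrian to the list of tracker IDs
    # assigned to them while crossing the scene; a single distinct ID means
    # the identification was retained from entry to exit.
    correctly_counted = sum(1 for id_list in tracks.values() if len(set(id_list)) == 1)
    return correctly_counted / ground_truth_count

# Example: three pedestrians, one of whom suffered an ID switch mid-scene.
tracks = {"p1": [7, 7, 7], "p2": [9, 9, 12], "p3": [15, 15]}
print(counting_accuracy(tracks, ground_truth_count=3))  # 0.666...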

 

Regarding Table 3 and its results, we have made some changes:

- A column has been added to distinguish studies that consider tracking as part of the people counting system (use the ID to calculate accuracy).

- Two rows have been added with two specific cases of our proposal, in which fewer hours of video are used in the tests and the pedestrian flows have less occlusion. A sample of the videos used, as well as the Python application code and a script with the database, are available in a repository (https://itecm-my.sharepoint.com/:f:/g/personal/adrian_nv_morelia_tecnm_mx/EgUaRneiQYJOm12KLQlaZ7kBpYQ__zSl0PyrBUK9v5MMMQ?e=GhXx0t). This information was also included in the document in the footnote on page 6.

 

Therefore, it is important to clarify that the research referenced in Table 3 calculates the accuracy of the detection algorithm, as that is its proposal, and the tracking accuracy is calculated separately. In our case, we calculate the accuracy of people counting, including the correct identification retention during tracking. Some research in Table 3 does not even incorporate the tracking process, as people counting is performed on individual images rather than a sequence of images.

 

These changes are explained starting from lines 313-352 and are highlighted in yellow.

 

  2. Related to the automatic contextualization, it seems to be a common feature in tracking systems based on historic log files, where the long content could be analysed via graphic, summarized visualization techniques. The proposal takes into account the specific IDs found, and this last point could be an interesting additional detail in a log file.

 

As mentioned in the answer to the previous point, automatic contextualization not only includes data recording but also continuous automatic feedback, which reinforces the generation of alerts when abnormal pedestrian flows are detected. Regarding the use of identification, counting accuracy considers tracking people from the moment they enter the area until they leave, provided they maintain the same identification.
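As a purely illustrative sketch of how such feedback could drive an alert (the threshold, data layout, and names below are assumptions, not the implementation described in the paper), an abnormality test based on the second central moment of the logged counts might look like this:

import statistics

def is_abnormal_flow(historical_counts, current_count, k=2.0):
    # historical_counts: pedestrian counts previously logged for this context
    # (e.g., the same camera, day of the week, and time slot).
    # The current count is flagged as abnormal when it deviates from the
    # historical mean by more than k standard deviations.
    mean = statistics.mean(historical_counts)
    std = statistics.pstdev(historical_counts)  # square root of the second central moment
    return abs(current_count - mean) > k * std

# Example: a sudden crowd of 40 people against a typical flow of about 10.
history = [8, 11, 9, 12, 10, 9, 11]
print(is_abnormal_flow(history, 40))  # True -> issue an alert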

 

  3. As Python experimental content, the paper could be relevant to a specific readership, but not to the one related to the journal's scope.

 

We once again appreciate the observations that enrich this research. In this case, we submitted our manuscript to the Special Issue “New Trends in Computational Intelligence and Applications 2024” (https://www.mdpi.com/journal/mca/special_issues/7TRJ7T3H82), and we consider that it fits some of the topics proposed in that Special Issue: Automatic Image Processing, Machine Learning, and Statistical Learning.

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

The revised version is acceptable.
