Article
Peer-Review Record

A Computer Vision-Based Pedestrian Flow Management System for Footbridges and Its Applications

Infrastructures 2025, 10(9), 247; https://doi.org/10.3390/infrastructures10090247
by Can Zhao, Yiyang Jiang and Jinfeng Wang *
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 19 June 2025 / Revised: 12 September 2025 / Accepted: 15 September 2025 / Published: 17 September 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors
  1. It is necessary to rewrite the introduction to emphasize the novelty of your study. Compared with existing research, what is the advancement in your study? What is the objective of your study?
  2. Carefully check the entire manuscript. The same paragraph should use the same tense. For example, Lines 144-147, “The combined dataset, restructured into YOLO-compatible format, comprises diverse occlusion scenarios and was randomly partitioned into training, validation, and test sets at a 7:2:1 ratio. A 300-epoch data-augmented training regimen was implemented.” Two different tenses are used. Line 204, “Where” should be lowercase. Lines 68-71:””
  3. The reference style should be uniform and follow the requirements of the journal.
  4. Line 413, “monitoring are” should be “monitoring area”. In line 467, “It qachieves” should be “It achieves”. Please carefully check all grammar errors in the whole paper. The description of the article is very poor.
  5. How many images are used in this study? The test data is based on how many repetitions?
  6. Is Equation (4) correct? As far as I know, Gauss's area formula has a form that is easier to understand.
  7. How to obtain the measured values of acceleration? Which type of acceleration sensor was used? It is necessary to provide the specific index of the sensor. The existing test data is highly questionable.
  8. In lines 387-389, “This difference can be attributed primarily to the target being closer to the camera initially during upward movement, resulting in more significant occlusion, reduced system reaction time, and consequently, detection errors.” This sentence is confusing and ambiguous. As you refer in line 239, two cameras and two acceleration sensors are installed on each of the main towers on the east and west sides. It can be concluded that two cameras are installed in bilateral symmetry mode. When the target is closer to one camera, the other camera can capture the object.
  9. The font in the figures (Figures 7-9, 11-14, 17) should be modified according to the requirements of the journal template.
Comments on the Quality of English Language

The Quality of the English Language is poor.

Author Response

For research article

 

 

Response to Reviewer 1 Comments

 

1. Summary

 

 

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.

 

2. Questions for General Evaluation

Reviewer’s Evaluation

Response and Revisions

Does the introduction provide sufficient background and include all relevant references?

Not applicable

Rewrite introduction part

Is the research design appropriate?

Must be improved

 

Are the methods adequately described?

Must be improved

 

Are the results clearly presented?

Must be improved

 

Are the conclusions supported by the results?

Not applicable

Rewrite the conclusion part

Are all figures and tables clear and well-presented?

Can be improved

Revise the fonts in the diagrams

 

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: It is necessary to rewrite the introduction to emphasize the novelty of your study. Compared with existing research, what is the advancement in your study? What is the objective of your study?

Response 1: Thank you for your helpful comments. In the revised introduction, we emphasize the increasing vulnerability of modern footbridges to vibrations caused by pedestrian activity and the challenges of real-time monitoring using traditional methods. We also clarify the importance of key metrics—flow rate, density, and speed—and how classical models can lead to inaccuracies, particularly in high-density conditions.

The novelty of our study lies in:

  1. Integrated Multi-Algorithm Pipeline: Combining YOLOv8, ByteTrack, and monocular depth estimation for precise real-time pedestrian flow measurement.
  2. Real-Time Structural Risk Assessment: Linking crowd dynamics to structural safety, triggering warnings for risks like overcrowding or excessive running.
  3. Validation for Structural Input: Demonstrating strong correlation between crowd data and bridge vibration simulations for structural assessment.
  4. Practical Implementation: Using lightweight models and standard cameras for easy deployment and upgrades on existing bridges.

These revisions highlight the advancements of our approach and its contributions to pedestrian flow management and structural safety. We appreciate your feedback, which has helped improve the manuscript.

 

Comments 2: Carefully check the entire manuscript. The same paragraph should use the same tense. For example, Lines 144-147, “The combined dataset, restructured into YOLO-compatible format, comprises diverse occlusion scenarios and was randomly partitioned into training, validation, and test sets at a 7:2:1 ratio. A 300-epoch data-augmented training regimen was implemented.” Two different tenses are used. Line 204, “Where” should be lowercase. Lines 68-71:””

Response 2: Thank you for your suggestion. In response, we have revised the manuscript to ensure consistency in tense usage throughout the document. Specifically, we carefully checked the entire manuscript and ensured that the same tense is applied within each paragraph.

 

Comments 3: The reference style should be uniform and follow the requirements of the journal.

Response 3: Thank you for your insightful comment. We have carefully reviewed the manuscript and ensured the reference style is uniform. We appreciate your attention to detail.

 

Comments 4: Line 413, “monitoring are” should be “monitoring area”. In line 467, “It qachieves” should be “It achieves”. Please carefully check all grammar errors in the whole paper. The description of the article is very poor.

Response 4: We have also carefully reviewed the entire manuscript for any other grammatical errors and improved the overall language quality. We appreciate your feedback, which has helped enhance the clarity and professionalism of the manuscript.

 

Comments 5: How many images are used in this study? The test data is based on how many repetitions?

Response 5: Thank you for your question. For training the YOLO model, we used a total of 18,200 images. These images form a diverse dataset that helped improve detection and tracking performance in complex scenarios. As for the “repetitions”, if you are referring to the number of times the tests were repeated, every valid test was repeated at least 3 times. We hope this clarifies your inquiry, and we have updated the manuscript to state these details more explicitly. Thank you again for your valuable feedback.

 

Comments 6: Is Equation (4) correct? As far as I know, Gauss's area formula has a form that is easier to understand.

Response 6: Thank you for your observation. Equation (4) is written in matrix form, which is mathematically equivalent to the conventional Gauss's area formula. We chose the matrix form because it aligns better with our computational approach, although it represents the same concept as the conventional formula. Thank you for pointing this out, and we appreciate your input in improving the manuscript.
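
For readers comparing the two forms, the equivalence can be sketched as follows. This is a minimal illustration, assuming the polygon vertices are given in order as (x, y) pairs. The function names and the sample rectangle are hypothetical, and the exact matrix layout of Equation (4) in the manuscript is not reproduced here.

```python
def area_shoelace(pts):
    """Conventional Gauss (shoelace) form:
    0.5 * |sum(x_i * y_{i+1} - x_{i+1} * y_i)| over consecutive vertices."""
    n = len(pts)
    total = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def det2(a, b):
    """Determinant of the 2x2 matrix whose rows are the points a and b."""
    return a[0] * b[1] - a[1] * b[0]


def area_determinant(pts):
    """Determinant (matrix) form: half the absolute sum of the 2x2
    determinants formed by each pair of consecutive vertices."""
    n = len(pts)
    return abs(sum(det2(pts[i], pts[(i + 1) % n]) for i in range(n))) / 2.0


# Hypothetical 4 m x 3 m rectangular zone, vertices listed counter-clockwise.
rect = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(area_shoelace(rect), area_determinant(rect))
```

Both functions evaluate the same sum term by term, so they agree for any simple polygon; the determinant form only groups each term as a small matrix.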

 

Comments 7: How to obtain the measured values of acceleration? Which type of acceleration sensor was used? It is necessary to provide the specific index of the sensor. The existing test data is highly questionable.

Response 7: Thank you for your insightful comment. The test data collected from these sensors has been thoroughly validated, and we have ensured that the measurements align with the expected performance. We appreciate your concern about the reliability of the test data, and we have included additional details in the manuscript to reinforce the validity and accuracy of the sensor data. We hope this clarifies the data acquisition process, and thank you for your valuable input in improving the transparency of the manuscript.

 

Comments 8: In lines 387-389, “This difference can be attributed primarily to the target being closer to the camera initially during upward movement, resulting in more significant occlusion, reduced system reaction time, and consequently, detection errors.” This sentence is confusing and ambiguous. As you refer in line 239, two cameras and two acceleration sensors are installed on each of the main towers on the east and west sides. It can be concluded that two cameras are installed in bilateral symmetry mode. When the target is closer to one camera, the other camera can capture the object.

Response 8: Thank you for your comment and for pointing out the confusion in the sentence. As noted in the manuscript, the two cameras were indeed installed on the main towers on the east and west sides. However, due to the relatively long span of the bridge and the distance between the cameras, they were not able to capture the same target simultaneously. In this study, the two cameras were treated as independent units, effectively functioning as monocular vision systems. We have revised the manuscript to clarify this point and better explain the impact of camera positioning on detection accuracy.

 

Comments 9: The font in the figures (Figures 7-9, 11-14, 17) should be modified according to the requirements of the journal template.

Response 9: Thank you for your observation. We have updated the font in the figures to Times New Roman, as required by the journal's template. We appreciate your attention to this detail.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

General Comment:

This paper presents a lightweight, fully automated computer vision system for real-time monitoring of pedestrian dynamics on footbridges. By integrating object detection, multi-target tracking, and monocular depth estimation, it quantifies key metrics such as pedestrian flow rate, density, and velocity. Field tests validate the system’s effectiveness, showing a strong correlation (84.3%) between predicted vibration patterns and measured acceleration data.

The study is original, well-structured, and presents a promising technological framework. The validation results demonstrate good alignment between the values acquired by the proposed framework and the ground truth. However, several key aspects require further elaboration and clarification to increase scientific rigor and reproducibility. I therefore recommend major revision. Please refer to the specific comments below for detailed suggestions.

Specific Comments

Comment 1 – Literature Review

The paper overlooks several recent contributions to pedestrian detection and tracking using computer vision, implementing methods similar to those proposed in this study. For example, https://doi.org/10.3390/rs15082088 integrated YOLO with Deep SORT for real-time pedestrian and vehicle tracking in complex urban environments, demonstrating improved accuracy. Similarly, https://doi.org/10.1016/j.trip.2025.101366 proposed a comprehensive tracking framework combining YOLO, coordinate transformations, Kalman filtering, cost matrices, and re-identification strategies to track pedestrians and vehicles and extract surrogate safety metrics.

The authors should acknowledge these contributions to enhance the value and positioning of the paper.

Comment 2 – Contribution

The authors highlight key limitations in current pedestrian monitoring systems—namely limited real-time processing, high costs, suboptimal efficiency, and delayed response. However, the introduction should more clearly articulate how the proposed system addresses each of these issues.

For instance, what specific technical optimisations reduce implementation cost? How does the proposed architecture enable faster or more reliable real-time analysis compared to other state-of-the-art solutions? Addressing these questions explicitly would strengthen the contribution and enhance the paper’s impact.

Comment 3 – Methodological Framework

Several refinements are needed in the methodological section:

  • Formal Consistency: All mathematical variables, indices, and acronyms must be clearly defined before use. Once introduced, notation must remain consistent across the manuscript. For example, the symbol “S” is used for area in Equation 4, while “A” appears elsewhere for the same quantity. Such inconsistencies hinder comprehension and reduce formal clarity. Consistency ensures that readers—especially those replicating the study—can follow the derivation and methodology with ease.
  • Visual Detection Filtering: It should be clarified whether a confidence-based filtering mechanism has been applied in the visual perception layer to exclude unreliable pedestrian detections from YOLO. Such filtering is essential in real-world applications, where noise and false positives (e.g., due to low visibility) can otherwise degrade the robustness of tracking and downstream metric estimation. Including even a simple thresholding mechanism helps stabilise performance and increase trust in the system’s outputs.
  • Definition of Flow Rate: The manuscript defines pedestrian flow rate as the number of people passing through an area. This is conceptually inaccurate. According to the hydrodynamic analogy common in pedestrian modelling, flow rate is typically measured across a line or section—not across an area. The implementation appears correct, as the authors apply a detection line to estimate the flow. However, the text should be revised to ensure terminological accuracy and prevent confusion.
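
The simple thresholding mechanism mentioned in the second point can be sketched as below. This is only an illustration, assuming each detection carries a bounding box, a confidence score, and a class label. The `Detection` type, the `filter_detections` helper, and the 0.5 threshold are hypothetical and are not taken from the manuscript.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence in [0, 1]
    label: str    # predicted class name


def filter_detections(dets, min_score=0.5, label="person"):
    """Keep only detections of the wanted class whose confidence
    reaches the threshold (0.5 here is an assumed value)."""
    return [d for d in dets if d.label == label and d.score >= min_score]


# Hypothetical detector output for one frame.
frame = [
    Detection((10, 20, 50, 120), 0.91, "person"),
    Detection((200, 40, 230, 110), 0.32, "person"),   # low confidence, dropped
    Detection((300, 60, 380, 140), 0.88, "bicycle"),  # wrong class, dropped
]
kept = filter_detections(frame)
print(len(kept), kept[0].score)
```

A fixed cut-off trades missed pedestrians against false positives, so the threshold would normally be tuned on validation data for the deployment site.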

Comment 4 – Case study

In the case study, the technical specifications of the sensing hardware should be provided. For the cameras, this includes resolution, frame rate, etc. For the accelerometers, measurement range, sensitivity, and sampling rate should be specified.

Providing these details is essential not only for replicability but also to evaluate the cost-efficiency of the solution. Given that the paper claims to address prohibitive implementation costs (a common shortcoming in prior work), the authors should clearly demonstrate the affordability and accessibility of the hardware used. Including a short cost estimate, or benchmarking it against alternative technologies (e.g., LIDAR or high-end sensor networks), would further support this claim.

Comment 5 – Discussion

The discussion should be enriched by addressing external environmental factors—such as lighting conditions, weather variability, and visibility—affecting system performance. These aspects are critical in outdoor deployment scenarios and often introduce significant variability in detection accuracy.

A brief sensitivity analysis or at least a qualitative commentary on how the system is expected to perform in low-light or adverse weather (rain, fog) would add robustness to the evaluation and offer practical insight for real-world applications.

Comment 6 – Future developments

The study opens compelling directions for future research. While it focuses on pedestrian bridge management, similar structural integrity concerns are increasingly relevant for vehicular bridges, especially under rising traffic volumes and cargo loads. Recent literature (e.g., https://doi.org/10.1080/15732479.2018.1496119; https://doi.org/10.1080/15732479.2022.2059525; https://doi.org/10.1109/TITS.2024.3371265) highlights the growing risks of overloading.

Weigh-in-motion (WIM) systems have proven effective for monitoring bridge loads, but their high cost often limits deployment (e.g., https://doi.org/10.1016/j.autcon.2021.103844; https://doi.org/10.1016/j.measurement.2021.110408; https://doi.org/10.1016/j.cstp.2023.101023). An innovative development could involve fusing WIM data with potentially lower-cost, computer vision-derived traffic information, applying a framework similar to that of this study to vehicular bridges.

Acknowledging this path—and drawing from recent literature in the field of bridge overloading—would demonstrate the adaptability of the proposed system to broader infrastructure management challenges and elevate the strategic impact of the research.

Author Response

For research article

 

 

Response to Reviewer 2 Comments

 

1. Summary

 

 

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.

 

2. Questions for General Evaluation

Reviewer’s Evaluation

Response and Revisions

Does the introduction provide sufficient background and include all relevant references?

Can be improved

 

Is the research design appropriate?

Yes

 

Are the methods adequately described?

Can be improved

 

Are the results clearly presented?

Can be improved

 

Are the conclusions supported by the results?

Yes

 

Are all figures and tables clear and well-presented?

Yes

 

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: Literature Review

The paper overlooks several recent contributions to pedestrian detection and tracking using computer vision, implementing methods similar to those proposed in this study. For example, https://doi.org/10.3390/rs15082088 integrated YOLO with Deep SORT for real-time pedestrian and vehicle tracking in complex urban environments, demonstrating improved accuracy. Similarly, https://doi.org/10.1016/j.trip.2025.101366 proposed a comprehensive tracking framework combining YOLO, coordinate transformations, Kalman filtering, cost matrices, and re-identification strategies to track pedestrians and vehicles and extract surrogate safety metrics.

The authors should acknowledge these contributions to enhance the value and positioning of the paper.

 

Response 1: Thank you for pointing this out. We agree with this comment and have rewritten the Introduction to include the relevant contributions, such as those from DOI: 10.3390/rs15082088 and DOI: 10.1016/j.trip.2025.101366.

 

Comments 2: Contribution

The authors highlight key limitations in current pedestrian monitoring systems—namely limited real-time processing, high costs, suboptimal efficiency, and delayed response. However, the introduction should more clearly articulate how the proposed system addresses each of these issues.

For instance, what specific technical optimisations reduce implementation cost? How does the proposed architecture enable faster or more reliable real-time analysis compared to other state-of-the-art solutions? Addressing these questions explicitly would strengthen the contribution and enhance the paper’s impact.

Response 2: We appreciate your suggestion to explicitly clarify the contributions of our system. We have now revised the Introduction to directly address how our system optimizes cost and improves real-time analysis.

 

Comments 3: Methodological Framework

Several refinements are needed in the methodological section:

  • Formal Consistency: All mathematical variables, indices, and acronyms must be clearly defined before use. Once introduced, notation must remain consistent across the manuscript. For example, the symbol “S” is used for area in Equation 4, while “A” appears elsewhere for the same quantity. Such inconsistencies hinder comprehension and reduce formal clarity. Consistency ensures that readers—especially those replicating the study—can follow the derivation and methodology with ease.
  • Visual Detection Filtering: It should be clarified whether a confidence-based filtering mechanism has been applied in the visual perception layer to exclude unreliable pedestrian detections from YOLO. Such filtering is essential in real-world applications, where noise and false positives (e.g., due to low visibility) can otherwise degrade the robustness of tracking and downstream metric estimation. Including even a simple thresholding mechanism helps stabilise performance and increase trust in the system’s outputs.
  • Definition of Flow Rate: The manuscript defines pedestrian flow rate as the number of people passing through an area. This is conceptually inaccurate. According to the hydrodynamic analogy common in pedestrian modelling, flow rate is typically measured across a line or section—not across an area. The implementation appears correct, as the authors apply a detection line to estimate the flow. However, the text should be revised to ensure terminological accuracy and prevent confusion.

Response 3: We have carefully reviewed the manuscript and ensured that all variables, indices, and acronyms are clearly defined before use. We also checked for consistency across the manuscript. The symbol inconsistency for the area in Equation (4) has been addressed, and "S" has been replaced with "A" to ensure consistency. Additionally, we have clarified that a confidence-based filtering mechanism is applied in the Visual Perception Layer to improve detection reliability, as requested. The definition of flow rate has also been revised to refer to pedestrians who “cross through a counting line”.
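
As an illustration of the line-based definition, flow across a counting line can be estimated from tracked positions as sketched below. This is a minimal sketch under stated assumptions: each track is a list of (x, y) centroids in frame order, the counting line is treated as infinite, and each track ID is counted at most once. The function names and sample tracks are hypothetical and do not reproduce the implementation in the manuscript.

```python
def side(p, a, b):
    """Sign of the 2D cross product: tells which side of the line a->b
    the point p lies on (positive, negative, or zero on the line)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def count_crossings(tracks, a, b):
    """Count tracks whose consecutive positions fall on opposite sides
    of the counting line through a and b. Each track counts once."""
    count = 0
    for positions in tracks.values():
        for prev, curr in zip(positions, positions[1:]):
            if side(prev, a, b) * side(curr, a, b) < 0:
                count += 1
                break  # do not count the same pedestrian twice
    return count


def flow_rate(crossings, seconds):
    """Pedestrians per second crossing the line during the interval."""
    return crossings / seconds


# Hypothetical tracks (track ID -> centroid history) and a line at y = 5.
tracks = {
    1: [(2, 3), (2, 4), (2, 6)],  # crosses upward
    2: [(5, 1), (5, 2)],          # never reaches the line
    3: [(8, 7), (8, 4)],          # crosses downward
}
line_a, line_b = (0, 5), (10, 5)
n = count_crossings(tracks, line_a, line_b)
print(n, flow_rate(n, 10.0))
```

Counting by track ID rather than by raw detections is what ties the flow estimate to the tracker's identity assignments.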

 

Comments 4: Case study

In the case study, the technical specifications of the sensing hardware should be provided. For the cameras, this includes resolution, frame rate, etc. For the accelerometers, measurement range, sensitivity, and sampling rate should be specified.

Providing these details is essential not only for replicability but also to evaluate the cost-efficiency of the solution. Given that the paper claims to address prohibitive implementation costs (a common shortcoming in prior work), the authors should clearly demonstrate the affordability and accessibility of the hardware used. Including a short cost estimate, or benchmarking it against alternative technologies (e.g., LIDAR or high-end sensor networks), would further support this claim.

Response 4: We have added the specific technical specifications for both the cameras and accelerometers in the Case Study section. This includes camera resolution, frame rate, and the accelerometers' measurement range, sensitivity, and sampling rate. We also include a brief cost comparison with other sensor technologies to provide insight into the affordability of the solution.

 

Comments 5: Discussion

The discussion should be enriched by addressing external environmental factors—such as lighting conditions, weather variability, and visibility—affecting system performance. These aspects are critical in outdoor deployment scenarios and often introduce significant variability in detection accuracy.

A brief sensitivity analysis or at least a qualitative commentary on how the system is expected to perform in low-light or adverse weather (rain, fog) would add robustness to the evaluation and offer practical insight for real-world applications.

Response 5: We have expanded the Discussion to include a qualitative analysis of how environmental factors, such as lighting and weather conditions, can affect system performance. We acknowledge that factors like low-light conditions or rain could reduce detection accuracy, and we briefly discuss potential future developments.

 

Comments 6: Future developments

The study opens compelling directions for future research. While it focuses on pedestrian bridge management, similar structural integrity concerns are increasingly relevant for vehicular bridges, especially under rising traffic volumes and cargo loads. Recent literature (e.g., https://doi.org/10.1080/15732479.2018.1496119; https://doi.org/10.1080/15732479.2022.2059525; https://doi.org/10.1109/TITS.2024.3371265) highlights the growing risks of overloading.

Weigh-in-motion (WIM) systems have proven effective for monitoring bridge loads, but their high cost often limits deployment (e.g., https://doi.org/10.1016/j.autcon.2021.103844; https://doi.org/10.1016/j.measurement.2021.110408; https://doi.org/10.1016/j.cstp.2023.101023). An innovative development could involve fusing WIM data with potentially lower-cost, computer vision-derived traffic information, applying a framework similar to that of this study to vehicular bridges.

Acknowledging this path—and drawing from recent literature in the field of bridge overloading—would demonstrate the adaptability of the proposed system to broader infrastructure management challenges and elevate the strategic impact of the research.

Response 6: We have revised the Future Developments section to include a discussion of the potential application of the system to vehicular bridges. We also believe that further research is needed before the approach can be applied to WIM, because visual methods have inherent limitations in estimating vehicle weight accurately.

 

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a computer vision system for real-time monitoring of crowd dynamics on pedestrian bridges. It integrates object detection, multi-object tracking, and monocular depth estimation to accurately quantify key crowd metrics, including pedestrian flow, density, and speed. The effectiveness of the system is demonstrated through field testing with real-world case studies. However, the manuscript contains several issues that require revision, including those found in the results analysis section. After careful review, I recommend a major revision. The specific issues are as follows:

  1. The literature review in the introduction should be moved to a separate Section 2 dedicated to literature review. A good introduction should also succinctly answer the question “what do we know already?”
  2. The content of Section 2 should be the "Research Methodology" section.
  3. Line 211: The term “linear Kalman filter” should be explained in detail, and the theoretical rationale for choosing this algorithm should be provided.
  4. The four evaluation metrics used for the model in Figure 7 and Table 2 (Precision, Recall, mAP50, and mAP50-95) should be briefly introduced and explained.
  5. Line 274: What does “four columns” refer to?
  6. Line 294: Why were these three flow scenarios (“15-person queue formation, 30-person queue formation, and 30-person random movement”) chosen?
  7. Table 8 shows the peak acceleration occurs at the 1/8 position, whereas the text states it is at the 1/4 position, which is contradictory.
  8. In line 454, the phrase “and the overall structural response trend is within a reasonable range,” and in line 456, “with a relative error of 17.68%, but still within a reasonable range,” lack a clear definition of what is a “reasonable range.” A specific evaluation criterion or benchmark should be provided.
  9. In Section 5 (Conclusion), the practical significance and limitations of the study are missing, and the discussion on future research directions is insufficient.

Author Response

For research article

 

 

Response to Reviewer 3 Comments

 

1. Summary

 

 

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in the re-submitted files.

 

2. Questions for General Evaluation

Reviewer’s Evaluation

Response and Revisions

Does the introduction provide sufficient background and include all relevant references?

Must be improved

Rewrite introduction part

Is the research design appropriate?

Must be improved

 

Are the methods adequately described?

Must be improved

 

Are the results clearly presented?

Must be improved

 

Are the conclusions supported by the results?

Must be improved

Rewrite the conclusion part

Are all figures and tables clear and well-presented?

Must be improved

Revise the diagrams

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: The literature review in the introduction should be moved to a separate Section 2 dedicated to literature review. A good introduction should also succinctly answer the question “what do we know already?”

Response 1: Agreed. Thank you for your comment. We have rewritten the introduction section accordingly.

 

Comments 2: The content of Section 2 should be the "Research Methodology" section.

Response 2: Thank you for pointing this out. We have renamed the section “Research Methodology”.

 

Comments 3: Line 211: The term “linear Kalman filter” should be explained in detail, and the theoretical rationale for choosing this algorithm should be provided.

Response 3: Thank you for highlighting this gap. We have added a detailed explanation of the linear Kalman filter (LKF) in Section 2.2.2, including our justification for selecting the LKF over nonlinear variants on the grounds of computational efficiency and the approximately linear motion of the targets.
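
To make the predict and update steps of a linear Kalman filter concrete, a one-dimensional constant-velocity version is sketched below. It is a simplification for illustration only: the state is [position, velocity], only position is measured, and the noise values `q` and `r`, the initial covariance, and all function names are assumed rather than taken from Section 2.2.2 of the manuscript.

```python
def predict(x, P, dt, q):
    """Predict step with F = [[1, dt], [0, 1]] and Q = q * I."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    # P_pred = F P F^T + Q, written out for the 2x2 case.
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def update(x, P, z, r):
    """Update step with H = [1, 0] (position is the only measurement)."""
    y = z - x[0]                     # innovation
    s = P[0][0] + r                  # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
    x_new = [x[0] + k0 * y, x[1] + k1 * y]
    # P_new = (I - K H) P, written out for the 2x2 case.
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new


def track(measurements, dt=1.0, q=1e-3, r=0.1):
    """Run the filter over a sequence of position measurements."""
    x = [measurements[0], 0.0]        # start at rest at the first position
    P = [[10.0, 0.0], [0.0, 10.0]]    # large initial uncertainty (assumed)
    for z in measurements[1:]:
        x, P = predict(x, P, dt, q)
        x, P = update(x, P, z, r)
    return x


# Hypothetical target moving one unit per frame.
pos, vel = track([float(i) for i in range(20)])
print(round(pos, 2), round(vel, 2))
```

Because both the motion model and the measurement model are linear, every step reduces to a handful of multiplications and additions, which is the computational-efficiency argument in its simplest form.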

 

Comments 4: The four evaluation metrics used for the model in Figure 7 and Table 2 (Precision, Recall, mAP50, and mAP50-95) should be briefly introduced and explained.

Response 4: We appreciate this constructive suggestion. A dedicated paragraph explaining the four evaluation metrics has been added at the beginning of Section 3.1.1.
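
For reference, the quantities behind these metrics can be sketched as follows. Precision and recall are computed from counts of true positives, false positives, and false negatives, and a detection is matched to a ground-truth box by intersection over union (IoU). mAP50 is the average precision at an IoU threshold of 0.5, and mAP50-95 averages it over thresholds from 0.50 to 0.95 in steps of 0.05. The function names and sample values below are hypothetical and are not results from the study.

```python
def box_area(b):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical prediction and ground-truth boxes that overlap by half.
overlap = iou((0, 0, 2, 2), (1, 0, 3, 2))
is_match_at_50 = overlap >= 0.5  # would this count as a true positive for mAP50?
p, r = precision_recall(tp=8, fp=2, fn=2)
print(round(overlap, 3), is_match_at_50, p, r)
```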

 

Comments 5: Line 274: What does “four columns” refer to?

Response 5: Thank you for highlighting this ambiguity. “Four columns” refers to the spatial sampling columns on the bridge deck, that is, the longitudinal sampling zones along the two walkways:

1. Eastern walkway: left and right edges,

2. Western walkway: left and right edges.

This gives four columns in total. The layout captures positional variance across the full deck width, which is critical for monocular ranging calibration. The revised text in Section 3.1.2 explicitly defines these zones to eliminate confusion.

 

Comments 6: Line 294: Why were these three flow scenarios (“15-person queue formation, 30-person queue formation, and 30-person random movement”) chosen?

Response 6: We appreciate the reviewer's inquiry about our experimental design. The selection of three specific scenarios (15-person queue, 30-person queue, and 30-person random movement) was based on:

  1. Density coverage: In our 35 m² monitoring area, 15-person formations give 0.43 pers/m², aligning with EN03 Level 1 (Sparse Traffic). 30-person configurations reach 0.86 pers/m², matching EN03 Level 2 (Busy Traffic) thresholds.
  2. Occlusion testing: Queues generate predictable linear occlusion patterns, whereas random movement creates chaotic multi-directional occlusions.
  3. Ecological validity: All scenarios reflect actual pedestrian distributions observed during field surveys.

This rationale is now explicitly detailed in Section 3.2.1.
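
The density figures quoted under the first point follow directly from the stated group sizes and monitoring area, as the short check below shows. The constant and function names are illustrative.

```python
AREA_M2 = 35.0  # monitoring area stated in Response 6


def density(n_people, area_m2=AREA_M2):
    """Pedestrian density in persons per square metre."""
    return n_people / area_m2


# Group sizes used in the three test scenarios.
densities = {n: round(density(n), 2) for n in (15, 30)}
print(densities)
```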

 

Comments 7: Table 8 shows the peak acceleration occurs at the 1/8 position, whereas the text states it is at the 1/4 position, which is contradictory.

Response 7: We sincerely appreciate the reviewer's meticulous observation. This was indeed a documentation error. The peak acceleration value of 0.095 m/s² occurs at the 1/4 section, not the 1/8 section as initially mislabeled in Table 8.

 

Comments 8: In line 454, the phrase “and the overall structural response trend is within a reasonable range,” and in line 456, “with a relative error of 17.68%, but still within a reasonable range,” lack a clear definition of what is a “reasonable range.” A specific evaluation criterion or benchmark should be provided.

Response 8: Thank you for your insightful comment. We have decided to remove the reference to the term “reasonable range” and instead focus on explaining the relative errors for the two sets of measured data.

After supplementing the experimental data and the acceleration sensor’s index, we confirmed that there are two sets of measured data for this point: 0.081 m/s² and 0.090 m/s², with a theoretical value of 0.095 m/s². The relative errors between the theoretical and measured data are 17.48% for the first set (0.081 m/s²) and 5.56% for the second set (0.090 m/s²).

The differences in the relative errors between the two data sets can be attributed to the actual positions of pedestrians during the test. Specifically, the two measurement points are located on opposite sides of the same section, approximately 4 meters apart. The actual pedestrian excitation was likely closer to the second measurement point (Data Point 2), which explains why the peak acceleration at this point is higher and closer to the theoretical value.

It is worth noting that both the measured and theoretical acceleration values for these two points are below the threshold of 0.15 m/s², as specified in the EN03 standard for pedestrian-induced vibrations. Therefore, despite the relative error of 17.48% for the first set of data, both sets of measurements remain within the acceptable limits set by the EN03 standard, ensuring that the acceleration values do not exceed the safety threshold.
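
The comparison above can be reproduced with a short calculation. The sketch below assumes the relative error is normalised by the measured value, which is the convention that yields the stated 5.56% for the second point; the function and variable names are illustrative only. With the rounded accelerations quoted here, the first point evaluates to about 17.28%, slightly below the stated 17.48%, a gap that may stem from rounding of the reported values.

```python
def relative_error_pct(theoretical: float, measured: float) -> float:
    """Relative error in percent, normalised by the measured value (assumed convention)."""
    return abs(theoretical - measured) / measured * 100.0


THEORETICAL_PEAK = 0.095          # m/s², theoretical peak acceleration
MEASURED_PEAKS = (0.081, 0.090)   # m/s², the two measured data sets
COMFORT_LIMIT = 0.15              # m/s², threshold cited from EN03 in the response

for measured in MEASURED_PEAKS:
    err = relative_error_pct(THEORETICAL_PEAK, measured)
    below_limit = max(measured, THEORETICAL_PEAK) < COMFORT_LIMIT
    print(f"measured {measured:.3f} m/s²: error {err:.2f}%, below limit: {below_limit}")
```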

We have revised the manuscript to reflect this clarification and have removed the mention of a "reasonable range," focusing instead on the accuracy of the experimental data and its compliance with the EN03 standard.

 

Comments 9: In Section 5 (Conclusion), the practical significance and limitations of the study are missing, and the discussion on future research directions is insufficient.

Response 9: Thank you for your insightful feedback. We have made significant revisions to address the practical significance, limitations, and future research directions of our system. The system was deployed at a total cost of around $200, including $150 for camera installation and $50 for network setup, making it more affordable than alternative methods such as LiDAR. However, we acknowledge that environmental factors, such as lighting conditions and visibility, could affect detection accuracy, particularly in adverse weather or low-light conditions. Future work will focus on improving system robustness under challenging conditions by using multiple cameras for cross-validation and integrating additional sensors such as radar or thermal cameras.

Thank you for helping us improve the manuscript.

 

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The paper presents a promising approach, the Computer Vision-Based Pedestrian Flow Management System for Footbridges and their Applications.

Authors are suggested to clarify the following comments:

  1. The title of the paper needs to be rewritten due to grammatical errors in the syntax.
  2. Contribution points are not clear. Need to increase the novelty of work. In Figure 2, the authors should include their architecture, not just YOLOv8.
  3. In Table 3, the comparative algorithm dataset (15-person/30-person queue) is not adequate.
  4. A different benchmark dataset needs to be included in the experiment part with the comparison result.
Comments on the Quality of English Language

The quality of the English writing in this paper needs to improve.

Author Response

For research article

 

 

Response to Reviewer 4 Comments

 

1. Summary

 

 

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.

 

2. Questions for General Evaluation

Question | Reviewer’s Evaluation | Response and Revisions
Does the introduction provide sufficient background and include all relevant references? | Can be improved |
Is the research design appropriate? | Can be improved |
Are the methods adequately described? | Must be improved |
Are the results clearly presented? | Can be improved |
Are the conclusions supported by the results? | Can be improved |
Are all figures and tables clear and well-presented? | Can be improved |


3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: The title of the paper needs to be rewritten due to grammatical errors in the syntax.

Response 1: Thank you for this comment. We agree with your suggestion and have rewritten the title for clarity and grammatical correctness. The revised title is: "A Computer Vision-Based Pedestrian Flow Management System for Footbridges and Its Applications." This change can be found in the first section of the revised manuscript.

 

Comments 2: Contribution points are not clear. Need to increase the novelty of work. In Figure 2, the authors should include their architecture, not just YOLOv8.

Response 2: We appreciate this feedback and agree with your suggestion. To clarify the contribution of this work and highlight its novelty, we have restructured the Contribution section to better define the problem and how our system addresses the challenges of pedestrian flow management on footbridges. However, we have not made adjustments to the YOLOv8 architecture itself. Instead, we have focused on enhancing its performance by utilizing an updated dataset for more robust training. The updated dataset includes more diverse pedestrian scenarios, allowing for improved accuracy.

 

Comments 3: In Table 3, the comparative algorithm dataset (15-person/30-person queue) is not adequate.

Response 3: Thank you for pointing this out. We agree that the dataset used for evaluating the algorithm should be comprehensive. To address this, we have incorporated a specially designed dataset that focuses on the density and distribution of pedestrians on footbridges. This dataset was created specifically to validate the accuracy of the proposed algorithm under various pedestrian densities and distribution patterns. We have also provided a detailed explanation of the design rationale and why this dataset was selected in the revised manuscript.

Furthermore, we have conducted extensive validation using a larger dataset in the field test, which is included in the updated results and discussion sections. These field tests provide strong empirical evidence supporting the algorithm's performance in real-world scenarios.

We appreciate your input in improving the manuscript.

 

Comments 4: A different benchmark dataset needs to be included in the experiment part with the comparison result.

Response 4: Thank you for your comment. In response, we have focused on comparing the results of our system with manually measured data during the feasibility validation. This comparison was conducted to demonstrate the practicality and reliability of the system specifically for the target footbridge. While we have not included additional benchmark datasets in this particular experiment, we acknowledge the benefit of expanding the dataset in future research. We plan to incorporate a wider range of datasets in upcoming experiments to further validate the system’s performance across diverse conditions.

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I carefully checked the revised manuscript in the attached version. The revisions respond to my comments point by point and are highlighted in yellow, so I could easily find the places that have been modified. I agree to accept the article.

Author Response

Thank you very much for your time and effort in reviewing our manuscript and for confirming that the revisions have addressed your comments. We greatly appreciate your positive feedback and are pleased to hear that you agree to accept the article.

We are also grateful for your suggestion regarding the highlighting of modifications. In accordance with your advice, all changes have been marked in yellow in the revised manuscript to facilitate easy identification of the updated content.

Thank you once again for your valuable input, which has significantly improved the quality of our paper.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors adequately addressed my previous comments.

Author Response

Thank you very much for your time and effort in reviewing our manuscript and for confirming that the revisions have addressed your comments. We greatly appreciate your positive feedback and are pleased to hear that you agree to accept the article.

We are also grateful for your suggestion regarding the highlighting of modifications. In accordance with your advice, all changes have been marked in yellow in the revised manuscript to facilitate easy identification of the updated content.

Thank you once again for your valuable input, which has significantly improved the quality of our paper.

Reviewer 3 Report

Comments and Suggestions for Authors

The research questions and objectives of the article have been clearly defined, and the introduction is relatively complete, with clear content and comprehensive expression. At present, the structure of the article is rigorous, and the logic is coherent.

However, one major issue remains unaddressed: the lack of an independent literature review section. The review of research in this field is not comprehensive. Please supplement the paper with a standalone literature review section to provide a thorough review of extant studies.

Author Response

Comments 1: However, one major issue remains unaddressed: the lack of an independent literature review section. The review of research in this field is not comprehensive. Please supplement the paper with a standalone literature review section to provide a thorough review of extant studies.

Response 1: We sincerely thank the reviewer for this insightful comment. We agree that a comprehensive and independent literature review is essential for situating our work within the broader research context. In direct response to this comment, we have added a new, standalone section titled "2. Literature Review".

 

Reviewer 4 Report

Comments and Suggestions for Authors

The paper presents a promising approach, A Computer Vision-Based Pedestrian Flow Management System for Footbridges and Its Applications.

Authors are suggested to clarify the following comments:

  1. In the experiment part benchmark dataset comparison is missing.
  2. Related work is missing.

Author Response

Comments 1: In the experiment part benchmark dataset comparison is missing.

Response 1: We sincerely appreciate the reviewer's valuable comment. We agree that benchmarking against standard datasets is crucial for validating the model's performance. In response, we have now conducted extensive comparative experiments on the CrowdHuman dataset, a widely recognized benchmark for crowded pedestrian detection. The results, now included in the new Table 5 in Section 4.1.1,

Comments 2: Related work is missing.

Response 2: We sincerely thank the reviewer for this important feedback. We also note that another reviewer similarly pointed out the need for a more comprehensive literature review. In direct response to these valuable comments, we have substantially expanded and reorganized the related work section to provide a more thorough and systematic review of extant studies.

Specifically, we have now added a dedicated Section 2: "Literature Review and Related Works".
