Article
Peer-Review Record

Multi-Camera Machine Vision for Detecting and Analyzing Vehicle–Pedestrian Conflicts at Signalized Intersections: Deep Neural-Based Pose Estimation Algorithms

Appl. Sci. 2025, 15(19), 10413; https://doi.org/10.3390/app151910413
by Ahmed Mohamed * and Mohamed M. Ahmed
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 14 August 2025 / Revised: 15 September 2025 / Accepted: 16 September 2025 / Published: 25 September 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. As stated in "Data Preparation", real-time and previously recorded video clips spanning 72 hours were extracted. For these 72 hours, a specific time distribution should be provided, such as the proportion of weekday/weekend hours and the duration of peak time periods.

2. In the text, among the OpenPifPaf, YOLOv7, and CenterTracks algorithms, OpenPifPaf was ultimately selected. However, the text only provides a qualitative description of the advantages of OpenPifPaf. It is recommended that the authors supplement comparative data (such as accuracy) of the three algorithms on the same test set.

3. In the "Pedestrian Detection Assessment" paragraph on Page 8 of the article, the detection accuracies of Cameras 1, 2, and 3 are introduced. The results show that the detection performance of Camera 3 is relatively poor. Although a solution is provided later in the text, no specific analysis is conducted on the phenomenon that the detection performance of Camera 3 is unsatisfactory.

4. In the experimental evaluation section of the paper, only two metrics, Precision and IoU, were reported. It is recommended that two further metrics, Recall and F1-score, also be reported and analyzed.

5. In the section "Conclusions and Discussions", it is recommended that the text discuss whether there is an optimal number of multi-cameras, specifically addressing why the fields of view (FOVs) of 3 cameras were integrated instead of more.
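
The metrics named in comment 4 can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' evaluation code; the function names, the (x1, y1, x2, y2) box format, and the example counts are assumptions made for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one camera: 8 correct detections, 2 false alarms, 2 misses.
precision, recall, f1 = detection_metrics(tp=8, fp=2, fn=2)
```

Reporting recall alongside precision shows how many pedestrians a camera misses, which precision alone cannot reveal.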

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study examined pedestrian and vehicle trajectories at signalized intersections by integrating three surveillance cameras with overlapping FOVs. A detection framework combining two pose estimation algorithms improved accuracy and addressed issues like background blending, occlusions, and inconsistent tracking. Vehicle key points missed by a single camera were recovered, and a correction procedure reduced perspective distortions. Key points were clustered to reconstruct trajectories, using 10-pixel thresholds for pedestrian feet and 25-pixel thresholds for vehicles. Validation analyzed 33 pedestrian conflicts, classified as eastbound right-turning and northbound left-turning vehicles, using minimum-distance calculations, speed profiles, and TTC values to confirm framework accuracy.
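
Two of the validation quantities mentioned in the summary above, minimum distance and TTC, can be illustrated with a short sketch. The record does not state how the manuscript computes them, so the code below assumes a common simplification: both road users are treated as points, trajectories are sampled at the same frames, and TTC is the current gap divided by the current closing speed. All function and variable names are hypothetical.

```python
import math


def min_distance(traj_a, traj_b):
    """Smallest Euclidean distance between two trajectories sampled at the same frames."""
    return min(math.dist(p, q) for p, q in zip(traj_a, traj_b))


def time_to_collision(pos_a, vel_a, pos_b, vel_b):
    """Gap divided by closing speed for two point objects; None if they are not approaching."""
    rel_pos = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    rel_vel = (vel_b[0] - vel_a[0], vel_b[1] - vel_a[1])
    gap = math.hypot(*rel_pos)
    if gap == 0:
        return 0.0  # already at the same point
    # Rate of change of the gap; negative means the gap is shrinking.
    gap_rate = (rel_pos[0] * rel_vel[0] + rel_pos[1] * rel_vel[1]) / gap
    if gap_rate >= 0:
        return None  # gap is constant or growing, so no collision course
    return gap / -gap_rate


# Hypothetical example: a vehicle at the origin moving at 1 unit/s toward
# a stationary pedestrian 10 units away.
ttc = time_to_collision((0, 0), (1, 0), (10, 0), (0, 0))
```

A lower TTC or a smaller minimum distance indicates a more severe interaction; any thresholds used to label an event as a conflict would need to come from the manuscript itself.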

The major problem with this paper is that the concept of traffic conflict is not clearly defined, and the scope of the paper does not include analyzing traffic conflicts. If existing research targets traffic conflicts, please state the definition in the related work; otherwise, give a scientifically grounded definition. I suggest revising the scope of the paper to focus on object identification and trajectory reconstruction using multiple cameras.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript makes a significant contribution to traffic safety research, particularly in the context of vehicle–pedestrian interactions at signalized intersections. The topic is highly relevant and of practical importance, given that traditional safety assessment methods based on crash history have substantial limitations. The authors developed an innovative framework that integrates footage from multiple surveillance cameras and applies convolutional neural networks for detecting and tracking both pedestrians and vehicles. The results clearly demonstrate advantages over single-camera systems and highlight the strong potential for advancing proactive risk assessment tools at intersections. However, before publication, I recommend minor revision.

  • The manuscript is not fully formatted according to the journal’s author guidelines. Please review the structure of section titles, figure formatting, in-text citations, and inconsistencies in font style.
  • The introduction is very detailed. I recommend extending it to cover the following aspects and adding more recent references, such as: https://doi.org/10.3390/infrastructures9120215, https://doi.org/10.56578/mits030104, https://doi.org/10.56578/mits030201.
  • The methodology section is comprehensive and informative, but at times overly lengthy and technically dense. It would be advisable to summarize the main elements in the text and move technical specifications and supplementary procedures to appendices or supporting material, or remove them where appropriate. This would improve the readability of the manuscript.
  • The discussion should be separated from the conclusion.
  • The discussion is well grounded in data, but it should include stronger comparisons with existing studies on conflict detection and multi-camera systems. I understand that the number of such studies is limited, but alternative works on other types of detection (or even double-camera setups) could be referenced. This would broaden the literature base and strengthen the discussion section.
  • The conclusion, in its current form, largely repeats the results. It should be refocused to emphasize practical implications, such as usability for local authorities, traffic engineers, and the development of intelligent monitoring systems.
  • Some sentences are unnecessarily long, and figure captions are also too extensive. Please consider shortening them where possible for clarity.

Overall, the manuscript is well written and represents a valuable contribution to the field. My recommendation is minor revision.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This manuscript proposes a multi-camera fusion framework for signalized intersections. However, the following issues should be addressed before it is considered for publication.

  1. Vision fusion is a mature topic, so how this manuscript is distinguished from existing research should be clearly explained. The contribution and innovation should be reorganized to emphasize the major advantages of the proposed method, not just analyze the shortcomings of existing approaches.
  2. How do the authors account for the robustness and reliability of the camera sensors, given that there are many disturbances (noise), stochastic faults (such as ref. doi: 10.1109/TITS.2025.3565857), and possible attacks (such as ref. doi: 10.1109/TASE.2023.3337073)?
  3. I am curious about the "vehicle-pedestrian interaction", because it seems that there is only one-way sensing, namely the work done by the cameras. How does the pedestrian react in the proposed framework?
  4. What is the definition of conflict? Please give a clear definition and more explanation, since it is one of the contributions of this manuscript.
  5. The resolution of Fig. 1 and Fig. 2 should be improved.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

  This paper presents a detailed framework for detecting and analyzing vehicle-pedestrian conflicts at signalized intersections using multiple camera feeds and deep neural network-based pose estimation algorithms. The authors propose a multi-camera system that enhances detection precision by addressing challenges such as occlusions and low resolution. Results show significant improvements in detection accuracy, especially for pedestrians, compared to single-camera systems. The study also demonstrates the framework's potential for improving traffic safety assessments. There are some areas that need improvement.

  1. The methodology section is comprehensive but could benefit from a clearer explanation of how the integration of multiple camera systems impacts real-time processing. Consider simplifying the description of the multi-camera integration process to ensure readers understand its practical application.
  2. The choice to use three cameras is mentioned, but the paper would benefit from a more explicit discussion of why this specific number was chosen. Does adding more cameras further improve accuracy, or are there diminishing returns? 
  3. While the data collection methodology is detailed, the analysis of different environmental conditions (e.g., snowy or rainy weather) could be expanded. A deeper comparison of the framework's performance under these challenging conditions would enhance the paper's robustness.
  4. The paper highlights various challenges in pedestrian and vehicle detection. Further insights into how the selected algorithms (e.g., YOLOv7, OpenPifPaf) can be optimized or adapted for different traffic scenarios would be valuable, especially for real-world applications in diverse urban environments.
  5. The accuracy of the detection algorithms is compared to ground truth data, but the paper could improve by providing a more thorough explanation of how this ground truth data was obtained. Was it manually annotated, or was a more automated method used for validation?
  6. The beginning of the background should highlight the rapid development of connected and automated vehicles, thereby enhancing the research value of this work. The related work “A MAS-Based Hierarchical Architecture for the Cooperation Control of Connected and Automated Vehicles, IEEE Transactions on Vehicular Technology, vol. 72, no. 2, pp. 1559-1573, Feb. 2023” can be referenced.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The revised version gives a clear definition of traffic conflicts as situations where two or more road users move in such a way that their paths would intersect, and a collision would occur unless one or both take evasive action. It also states the scope of the work more clearly: the framework developed here represents the first step in a multi-level safety assessment process.

Author Response

"Please see the attachment."

Author Response File: Author Response.docx

Reviewer 4 Report

Comments and Suggestions for Authors

No comments since all my concerns are resolved.

Author Response

"Please see the attachment."

Author Response File: Author Response.docx

Reviewer 5 Report

Comments and Suggestions for Authors

The authors have addressed my concerns well. I would recommend the publication of this paper. However, in addition to adding relevant descriptions at the beginning of the background, the authors should also include the related reference citations. For the trend of rapid development in CAVs, the authors can also refer to the recent work “Enhancing High-Speed Cruising Performance of Autonomous Vehicles Through Integrated Deep Reinforcement Learning Framework, IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 1, pp. 835-848, Jan. 2025”. Furthermore, more description can be added to highlight the core innovations of this paper.

Author Response

"Please see the attachment."

Author Response File: Author Response.docx
