Peer-Review Record

A Novel Machine Vision-Based Collision Risk Warning Method for Unsignalized Intersections on Arterial Roads

Electronics 2025, 14(6), 1098; https://doi.org/10.3390/electronics14061098
by Zhongbin Luo 1,2, Yanqiu Bi 3,4,*, Qing Ye 2, Yong Li 1 and Shaofei Wang 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 February 2025 / Revised: 3 March 2025 / Accepted: 5 March 2025 / Published: 11 March 2025
(This article belongs to the Special Issue Computer Vision and Image Processing in Machine Learning)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Due to the significant number of traffic incidents involving pedestrians and cyclists, the topic of this study is timely and its presentation is interesting. The structure of the article is correct. The purpose has been formulated and justified.
The purpose of the conducted research was to develop a predictive collision warning system for unsignalized intersections. The developed system uses YOLOv8 to detect objects and Deep SORT to accurately track pedestrians and non-motorized vehicles, and then predicts their trajectories using a Bi-LSTM network. The structure of the reviewed article is thoughtful and clear. The text consists of six chapters.
A bibliography is included at the end, introducing the reader to the sources the authors used in working on the topic. The literature is a useful supplement for readers interested in further exploration of the presented subject matter.
The manuscript does not contain shortcomings that necessarily need to be corrected before publication. However, the authors have not identified the research gap that this article fills. Instead, in the conclusion, the capabilities of the developed system are presented, but they are not compared with other such (similar) systems.


Notes on tables and figures:
The paper uses different numbering styles, “(1)” as well as “1.” – I propose to standardize.
The figure caption “Figure 10. Bi-LSTM model schematic diagram.” should be moved from page 14 to page 13.
Line 497 reads “Equation (23).” It should be Equation (20).
The table number on page 16 is incorrect: “Table A1” should be “Table 1.”
I ask the authors to answer some questions:
Can the system be applied to traffic involving autonomous vehicles?
Can the system be applied to traffic involving emergency vehicles (fire department vehicles, ambulance vehicles, police vehicles)?
Do weather conditions affect the operation of the collision warning system at unsignalized intersections, and if so, what effect do they have?

Author Response

Thank you very much for your comments and suggestions.

These comments are all valuable and very helpful for revising and improving our paper, and they carry important guiding significance for our research. We have studied the comments carefully and have made corrections that we hope meet with your approval. Revised portions are marked in red in the paper. The main corrections and our responses to the reviewer’s comments are as follows.

 

  1. Comment: The paper uses different numbering style “(1) as well as 1.” - I propose to standardize.

Response: Thanks for your suggestion. All numbering in the text has been standardized to the format “(1), (2), (3), etc.”

 

  2. Comment: The figure caption “Figure 10. Bi-LSTM model schematic diagram.” should be moved from page 14 to page 13.

Response: Thanks for your suggestion. The caption for “Figure 10. Bi-LSTM model schematic diagram” has been moved to page 13.

 

  3. Comment: Line 497 reads “Equation (23).” It should be Equation (20).

Response: Thanks for your suggestion. The incorrect reference to “Equation (23)” in line 497 has been corrected to Equation (20).

 

  4. Comment: The table number on page 16 is incorrect: “Table A1” should be “Table 1.”

Response: Thanks for your suggestion. The table labeled “Table A1” on page 16 has been renumbered as Table 1.

 

  5. Comment: Can the system be applied to traffic involving autonomous vehicles?

Response: Yes, our system is inherently compatible with autonomous vehicles (AVs). The collision risk warning framework—comprising real-time object detection, tracking, and trajectory prediction—can serve as a supplementary safety layer for AVs. By integrating the system’s output (e.g., PCRA risk levels) into AV decision-making modules, autonomous vehicles could dynamically adjust their motion planning to avoid conflicts at unsignalized intersections. However, further testing would be required to validate interoperability with specific AV communication protocols (e.g., V2X).
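As a purely illustrative sketch of the integration idea above, an AV planner could map the warning system’s risk levels to speed adjustments. The level names and the scaling policy here are assumptions for illustration, not the paper’s actual PCRA interface.

```python
# Hypothetical sketch: consuming intersection risk levels in an AV's
# motion planner. Risk-level names and scale factors are illustrative.

def speed_scale_for_risk(risk_level: str) -> float:
    """Map a warning-system risk level to a target speed multiplier."""
    scales = {"low": 1.0, "medium": 0.6, "high": 0.3, "critical": 0.0}
    if risk_level not in scales:
        raise ValueError(f"unknown risk level: {risk_level}")
    return scales[risk_level]

def adjusted_speed(current_speed_mps: float, risk_level: str) -> float:
    """Speed an AV planner might target given the current risk level."""
    return current_speed_mps * speed_scale_for_risk(risk_level)
```

A real integration would of course feed these adjustments through the AV’s own trajectory optimizer rather than scaling speed directly.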

 

  6. Comment: Can the system be applied to traffic involving emergency vehicles (fire department vehicles, ambulance vehicles, police vehicles)?

Response: Our system can detect and track emergency vehicles using the same YOLOv8+Deep SORT framework, as it is trained to recognize general vehicle categories. However, to prioritize emergency vehicles, additional logic would need to be implemented. For instance: (1) Priority Tagging: Emergency vehicles could be identified via specific visual markers (e.g., flashing lights) or acoustic signatures (sirens) using auxiliary sensors. (2) Warning Adjustments: The collision risk algorithm could dynamically elevate warnings for other road users when emergency vehicles are detected, ensuring timely right-of-way clearance. This extension would require dataset augmentation with emergency vehicle annotations and scenario-specific training.

To further enhance prioritization and reliability, V2X (Vehicle-to-Everything) communication could be integrated into the system. If roadside units (RSUs) and vehicles are equipped with V2X capabilities, the system could directly receive real-time information about emergency vehicle types, trajectories, and priority status via standardized protocols (e.g., SAE J2735).

(1) V2X-Based Identification: Emergency vehicles could broadcast their classification (e.g., "ambulance," "fire truck") and operational status (e.g., "en route to emergency") through dedicated short-range communication (DSRC) or cellular-V2X (C-V2X). This would allow the system to bypass reliance on visual detection alone, ensuring accurate identification even in low-visibility conditions.

(2) Dynamic Warning Optimization: Upon receiving V2X signals, the system could immediately issue high-priority warnings to other road users, clearing paths for emergency vehicles. For example, variable message signs (VMS) could display "Emergency Vehicle Approaching" alerts, while non-emergency vehicles receive speed adjustment recommendations via in-vehicle interfaces.

(3) Hybrid Detection Framework: Combining V2X data with the existing vision-based tracking system would create redundancy, improving robustness. For instance, if a V2X-equipped emergency vehicle is occluded visually, its trajectory could still be tracked via V2X-reported coordinates.
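The hybrid-detection redundancy described in (3) can be sketched as a simple source-preference rule: use a V2X-reported position when one is available, and fall back to the vision tracker otherwise. The field names and message format below are assumptions for illustration only.

```python
# Illustrative sketch of the hybrid V2X + vision redundancy: prefer a
# V2X-reported position (survives visual occlusion), fall back to the
# vision track (covers non-equipped road users). Message fields are
# hypothetical.

from typing import Optional

def fused_position(v2x_msg: Optional[dict],
                   vision_track: Optional[dict]) -> Optional[tuple]:
    """Return an (x, y) position for a road user from the best source."""
    if v2x_msg is not None:
        return (v2x_msg["x"], v2x_msg["y"])
    if vision_track is not None:
        return (vision_track["x"], vision_track["y"])
    return None  # neither source observed this road user
```

A production system would additionally cross-validate the two sources and smooth the fused estimate over time rather than switching hard between them.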

 

  7. Comment: Do weather conditions affect the operation of the collision warning system at unsignalized intersections, and if so, what effect do they have?

Response: Weather conditions (e.g., rain, fog, low-light) can impact the system’s detection and tracking accuracy, as the current implementation relies on visual cameras. To mitigate this:

(1) Robustness Enhancements: The model already employs data augmentation techniques (e.g., HSV adjustments, Mosaic augmentation) to improve resilience to lighting variations and partial occlusions.

(2) Multi-Sensor Fusion: In future work, integrating LiDAR or radar data could complement visual inputs under adverse weather.

In field tests (Section 5.3), the system demonstrated stable performance in moderate rain and overcast conditions, but heavy fog or extreme low-light scenarios may require additional hardware or algorithmic adaptations.
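The HSV-style augmentation mentioned in (1) can be sketched as drawing a random multiplicative gain per channel, of the kind YOLOv8-family training pipelines apply to harden detectors against lighting variation. The gain ranges below are illustrative defaults, not the values used in the paper.

```python
# Minimal sketch of HSV-style augmentation: each channel gain is drawn
# from a small symmetric range around 1.0. Ranges are illustrative.

import random

def hsv_gains(h_frac: float = 0.015, s_frac: float = 0.7,
              v_frac: float = 0.4, rng: random.Random = None) -> tuple:
    """Draw per-channel (hue, saturation, value) multiplicative gains."""
    rng = rng or random.Random()
    return tuple(1.0 + rng.uniform(-f, f) for f in (h_frac, s_frac, v_frac))
```

In practice these gains would be applied to the HSV representation of each training image (e.g., via OpenCV) before conversion back to RGB.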

 

Additional Note on Research Gap Clarification:

We have added a paragraph in the Introduction (Section 1.2) to explicitly highlight the research gap addressed by this work: "While existing studies focus on trajectory prediction for individual road users (e.g., vehicles or pedestrians alone), this work uniquely addresses the interaction dynamics and collision risks between heterogeneous road users (vehicles, pedestrians, non-motorized vehicles) at unsignalized intersections. Furthermore, unlike prior systems optimized for signalized intersections, our framework integrates real-time risk estimation tailored to uncontrolled environments, filling a critical gap in proactive safety solutions for such high-risk scenarios."

 

Thank you for your valuable feedback, which has significantly strengthened the clarity and impact of our manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

The article concerns the development of a modern warning system for the risk of collisions at unsignalized intersections of arterial roads. The authors use a combination of YOLOv8 technology (for object detection), Deep SORT (for trajectory tracking) and Bi-LSTM (for traffic prediction). The article presents a detailed analysis of the problem, methodology and experimental results, which demonstrate the high effectiveness of the system.

I really like the introduction, divided into: Background and Motivation, Problem Statement, Organization of the Paper. The Literature Review and the rest of the study are presented against this background. The structure of the article is therefore clear and the study is well justified. The work is well-structured, contains solid theoretical foundations, and the research methodology has been reliably described. The experiments confirm the effectiveness of the proposed solution, which makes the article a valuable contribution to the field of road safety and intelligent transport systems.

I appreciate the use of a modern technological approach, good justification of the problem based on specific statistical data on road accidents, which emphasizes the importance of the discussed issue. The most important thing is solid experiments conducted in real conditions and a thorough analysis of the results.

I have only a few comments:
1. Lack of analysis of the system limitations - although the article mentions some challenges, such as variable lighting conditions, there was no in-depth analysis of the potential limitations of the model.

2. Also add a few sentences about the possibility of extending it to other factors affecting road traffic.

3. The article should be improved editorially and visually in matters such as:
Font size - for example, compare formula 21 and 22
Blur, stretching in the figure, e.g. fig.7,

Double dots, e.g. line 483
Missing spaces, e.g. figure 11, line 336

Author Response

Thank you very much for your comments and suggestions.

These comments are all valuable and very helpful for revising and improving our paper, and they carry important guiding significance for our research. We have studied the comments carefully and have made corrections that we hope meet with your approval. Revised portions are marked in red in the paper. The main corrections and our responses to the reviewer’s comments are as follows.

 

  1. Comment: Lack of analysis of the system limitations - although the article mentions some challenges, such as variable lighting conditions, there was no in-depth analysis of the potential limitations of the model.

Response: Thanks for your suggestion. A dedicated subsection titled "5.4 Limitations" has been added to the Results and Discussion section (lines 566-577 in the revised manuscript).

 

  2. Comment: Also add a few sentences about the possibility of extending it to other factors affecting road traffic.

Response: We sincerely thank the reviewer for highlighting the importance of discussing the system’s extensibility. In response to this suggestion, we have added the following paragraph to the Conclusions section (lines 612-623 in the revised manuscript):

"The proposed framework is inherently adaptable to address diverse traffic scenarios beyond unsignalized intersections. For instance, integrating real-time geofencing data (e.g., via V2X or digital twin platforms) could enable dynamic risk estimation for temporary traffic configurations such as construction zones or detours. Additionally, incorporating pedestrian intent recognition modules—leveraging gaze estimation or skeletal pose prediction—would enhance trajectory forecasting accuracy in cases of abrupt pedestrian direction changes. Future work will explore fusion with crowd-sourced traffic data (e.g., trajectory patterns, near-miss incidents) to refine situational awareness in complex urban environments. This extensibility underscores the system’s potential as a foundational component of next-generation intelligent transportation systems."

This addition explicitly addresses the system’s adaptability to external traffic factors and outlines actionable pathways for future research, aligning with the reviewer’s recommendation to broaden the discussion on extensibility.

 

  3. Comment: The article should be improved editorially and visually in matters such as:

Font size - for example, compare formula 21 and 22

Blur, stretching in the figure, e.g. fig.7,

Double dots, e.g. line 483

Missing spaces, e.g. figure 11, line 336

Response: Thanks for your suggestion. All formatting issues have been resolved.

 

Thank you for this insightful feedback, which has strengthened the practical relevance and forward-looking perspective of our work.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper deals with the design of a warning system (based on audible and visual alarms) for unsignalized locations of intersecting traffic roads in order to increase their safety.

The basic problem here is vehicle and pedestrian detection. The authors build on the Yolov8 detector and implement a Bi-LSTM network for trajectory prediction.

They used a large dataset of 30,000 images to train the network.

The effectiveness of the proposed system has been verified in operation in September 2024 on the G310 motorway with 97% accuracy achieved.    

Some wording could be clarified or added. 

  • Page 10, lines 345-346: How to determine the time interval between adjacent frames? Obviously, this value should depend on the speed of the vehicles, shouldn't it?
  • 10, l. 343: Is it appropriate to assume linearity of change between consecutive video frames?
  • On page 15, you list five metrics for object detection and, in part, their advantages or disadvantages. But which one do you recommend as the most appropriate to use in a given application?
  • In Table A1 you present the results of the metrics with the system gradually modified with additional features, but the comparison is within the stages of your design, lacking the comparison with the results in the papers of other authors that you cite in Section 1. Does the final version of the system give better results than competing solutions?
  • The 2024 tests showed a prediction accuracy of 97%. Will the proposed system be deployed in practice? What will be the economic cost per installation?
  • Can a system tuned on the dataset used be considered universal for all unsignalized crossings? The intersections may vary in the number of lanes on the road, and in larger numbers there may be additional problems with shadowing by vehicles in another lane. 
  • How does such accuracy affect the number of accidents? Is it possible to estimate the number of crashes, and possibly the number of injured, and quantify the decrease in these values?  

Typos, format:

The typography of the equations does not seem to use LaTeX style.

  1. 2, l. 85-86: “vision technologies offers new opportunities“ – “… technologies offer …“
  2. 4, l. 141-142: “unsignalized intersections that integrates“ – “… integrate“
  3. 3, l. 126: “[18]and“ – “[18] and“ (inserted space)
  4. 4, l. 144: “tracking of Pedestrians“ – “tracking of pedestrians“
  5. 5, l. 187: “prevention. many studies“ – “prevention. Many studies“
  6. 8, l. 288: Deep SORT (Simple Online and Realtime Tracking) algorithm – this abbreviation has been mentioned many times on previous pages of the text and its meaning should therefore be given much earlier than on page 8

There should be a space in the headings after the section numbers 3.4, 3.4.1, 3.4.2

  1. 10, l. 370: “t)are“ – “ t) are“
  2. 11: the font size in Equation (9) is different from Eq. (10) and (11); similarly for Eq. (21) and (22) on page 15
  3. 14: The label of Figure 10 should not be on the other side
  4. 15, l. 483: “variability..“ – “variability.“
  5. 15, l. 583: “classes..“ – “classes.“

Author Response

Thank you very much for your comments and suggestions.

These comments are all valuable and very helpful for revising and improving our paper, and they carry important guiding significance for our research. We have studied the comments carefully and have made corrections that we hope meet with your approval. Revised portions are marked in red in the paper. The main corrections and our responses to the reviewer’s comments are as follows.

 

  1. Comment: Page 10, lines 345-346: How to determine the time interval between adjacent frames? Obviously, this value should depend on the speed of the vehicles, shouldn't it?

Response: Thanks for your suggestion. The time interval between adjacent frames (Δt) is determined by the camera’s frame rate, which is fixed at 25 fps in our implementation. This value is hardware-dependent and standardized for traffic monitoring systems to ensure temporal consistency. While vehicle speed influences displacement between frames, the fixed Δt simplifies real-time processing and aligns with industry practices for video-based traffic analysis. For high-speed scenarios (e.g., highways), higher frame-rate cameras could be adopted in future deployments, though this would require recalibrating the coordinate mapping model.
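The frame-interval arithmetic above can be sketched in a few lines: with a fixed frame rate, Δt is constant, and a road user's speed follows from its ground-plane displacement between adjacent frames. The values are illustrative.

```python
# Sketch of the fixed frame-interval relationship: at 25 fps the
# interval between adjacent frames is 1/25 = 0.04 s, independent of
# vehicle speed; speed is recovered from displacement over that interval.

def frame_interval(fps: float) -> float:
    """Time between adjacent frames in seconds (0.04 s at 25 fps)."""
    return 1.0 / fps

def speed_estimate(displacement_m: float, fps: float) -> float:
    """Speed in m/s from the displacement between two adjacent frames."""
    return displacement_m / frame_interval(fps)
```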

 

  2. Comment: 10, l. 343: Is it appropriate to assume linearity of change between consecutive video frames?

Response: Thanks for your suggestion. The linear motion assumption between consecutive frames is a simplification valid for short time intervals (Δt = 0.04 s at 25 fps). Over such brief periods, vehicle trajectories can be approximated as linear with minimal error. For non-linear maneuvers (e.g., sharp turns), the piecewise linearization approach (Equation 6) ensures compatibility by segmenting trajectories into quasi-linear segments. This method balances computational efficiency and accuracy, as validated by the 97% prediction accuracy in field tests.
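The piecewise-linear idea above amounts to interpolating position linearly between two consecutive frame observations, so a curved path is approximated by short straight segments. A minimal, purely illustrative sketch:

```python
# Linear interpolation between two frame observations p0 and p1:
# for 0 <= alpha <= 1, the position is assumed to move in a straight
# line, which is the per-segment assumption of piecewise linearization.

def interpolate(p0: tuple, p1: tuple, alpha: float) -> tuple:
    """Position at fraction alpha of the way from p0 to p1."""
    return tuple(a + alpha * (b - a) for a, b in zip(p0, p1))
```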

 

  3. Comment: On page 15, you list five metrics for object detection and, in part, their advantages or disadvantages. But which one do you recommend as the most appropriate to use in a given application?

Response: Thanks for your suggestion. In Table 1 (formerly Table A1), we prioritize mAP50-95 as the most comprehensive metric for evaluating detection performance, as it averages precision across IoU thresholds from 0.5 to 0.95, reflecting robustness to localization errors. While mAP50 is widely used in traffic applications for its focus on moderate overlaps, mAP50-95 better aligns with safety-critical systems requiring precise bounding boxes.
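As a small illustration of how mAP50-95 aggregates the per-threshold metric: given AP values computed at the ten IoU thresholds 0.50, 0.55, ..., 0.95, the score is their plain mean. The AP values themselves come from the detector's precision-recall curves and are not reproduced here.

```python
# mAP50-95 is the mean of AP over the 10 IoU thresholds 0.50..0.95
# (COCO-style evaluation). Input: one AP value per threshold.

def map50_95(ap_per_threshold: list) -> float:
    """Average AP across the 10 IoU thresholds from 0.50 to 0.95."""
    if len(ap_per_threshold) != 10:
        raise ValueError("expected one AP value per IoU threshold (10 total)")
    return sum(ap_per_threshold) / len(ap_per_threshold)
```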

 

 

  4. Comment: In Table A1 you present the results of the metrics with the system gradually modified with additional features, but the comparison is within the stages of your design, lacking the comparison with the results in the papers of other authors that you cite in Section 1. Does the final version of the system give better results than competing solutions?

Response: Thanks for your suggestion. Our system’s performance has been benchmarked against two state-of-the-art methods: YOLOv7+FairMOT and EfficientDet+CenterTrack. The proposed framework achieves a 4.2% higher mAP50-95 (84.5% vs. 80.3%) and reduces pedestrian trajectory prediction errors by 18%, demonstrating superior accuracy and robustness in unsignalized intersection scenarios. These improvements stem from architectural enhancements such as the ReContext gradient composition feature pyramid and optimized Bi-LSTM temporal modeling, which better handle occlusions and multi-scale object detection compared to prior works.

 

  5. Comment: The 2024 tests showed a prediction accuracy of 97%. Will the proposed system be deployed in practice? What will be the economic cost per installation?

Response: Thanks for your suggestion. The system has been deployed at some unsignalized intersections in the G310 project. Each installation includes: (1) Hardware: 2 radar units, 2 cameras, 1 edge computing device (NVIDIA Jetson AGX Orin), 2 variable message signs (VMS), and 2 audio-visual warning devices. (2) Cost: approximately ¥130,000 (USD 18,000) per intersection, covering hardware procurement (radars, cameras, edge device: ¥50,000; VMS and warning devices: ¥80,000), installation, and calibration. (3) Maintenance: annual costs average ¥20,000 (USD 2,800) for sensor recalibration, software updates, and device upkeep.

  6. Comment: Can a system tuned on the dataset used be considered universal for all unsignalized crossings? The intersections may vary in the number of lanes on the road, and in larger numbers there may be additional problems with shadowing by vehicles in another lane.

Response: Thanks for your suggestion. Your concern about the system’s applicability to diverse unsignalized intersections is well-founded. While our current model achieves high accuracy in the tested scenarios (1~2 lanes per direction), performance in multi-lane intersections with complex occlusion patterns may require additional optimizations. To address this, we explicitly acknowledge this limitation in the revised Discussion section and propose two mitigation strategies: (1) Multi-Camera Fusion: Deploying cameras at multiple angles to reduce blind spots caused by lane-specific occlusions. (2) Adaptive Training: Fine-tuning the model with lane-count-specific datasets to enhance generalization.

 

  7. Comment: How does such accuracy affect the number of accidents? Is it possible to estimate the number of crashes, and possibly the number of injured, and quantify the decrease in these values?

Response: Thanks for your suggestion. While direct accident reduction metrics require long-term observation, preliminary data from the G310 deployment (3-month post-installation) indicate a 42% decrease in near-miss incidents (e.g., sudden braking, evasive maneuvers) and improved road user awareness, as reported in driver/pedestrian surveys. These results suggest that the system’s 97% prediction accuracy effectively mitigates collision risks, though longitudinal studies (planned over 2 years) will quantify casualty reductions.

 

  8. Comment: Typos, format:

 

The typography of the equations does not seem to use LaTeX style.

 

2, l. 85-86: “vision technologies offers new opportunities“ – “… technologies offer …“

4, l. 141-142: “unsignalized intersections that integrates“ – “… integrate“

3, l. 126: “[18]and“ – “[18] and“ (inserted space)

4, l. 144: “tracking of Pedestrians“ – “tracking of pedestrians“

5, l. 187: “prevention. many studies“ – “prevention. Many studies“

8, l. 288: Deep SORT (Simple Online and Realtime Tracking) algorithm – this abbreviation has been mentioned many times on previous pages of the text and its meaning should therefore be given much earlier than on page 8

There should be a space in the headings after the section numbers 3.4, 3.4.1, 3.4.2

 

10, l. 370: “t)are“ – “ t) are“

11: the font size in Equation (9) is different from Eq. (10) and (11); similarly for Eq. (21) and (22) on page 15

14: The label of Figure 10 should not be on the other side

15, l. 483: “variability..“ – “variability.“

15, l. 583: “classes..“ – “classes.“

Response: Thanks for your suggestion. All noted issues have been corrected. In addition, the equations were formatted using Microsoft Word’s native equation editor to comply with the journal’s Word template requirements. If LaTeX-style formatting is preferred, we will gladly revise all equations accordingly.

 

Thank you for this insightful feedback, which has strengthened the practical relevance and forward-looking perspective of our work.

Author Response File: Author Response.docx
