Wearable AR System for Real-Time Pedestrian Conflict Alerts Using Live Roadside Data
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Review of the article “Wearable AR System for Real-Time Pedestrian Conflict Alerts Using Live Roadside Data”
Manuscript ID electronics-3327016
Augmented Reality (AR) is a powerful technology that brings a range of advantages to real-life applications. AR-based navigation tools offer real-time guidance by overlaying directions onto the live view of streets or indoor spaces, making it easier for users to navigate complex environments. This seamless integration of digital directions into the real world enhances convenience and reduces the stress of getting lost, especially in unfamiliar locations.
The proposed article opens up other possibilities for AR, namely as a means of protecting pedestrians from vehicular hazards on the roads. I find this research relevant, interesting, and useful. However, I have a few general and specific comments about the publication.
General comments
1. It is not entirely clear from the text of the article whether the test object was moving or static, which is important for determining the relevance, accuracy, and delay of the data transmitted to the user. If the test was performed on a static object, then I believe such testing is not sufficient to assert the accuracy of the digital representation of a real-world object. If the test was performed on a moving object, then this fact should be clearly stated in the article.
2. In addition, it is important to find out how an increase in the number of users of such an application will affect the speed of data transfer (and therefore its relevance); no testing has been conducted in this direction.
3. Section 2.2.2, which describes one of the key parts of the system's functioning, mentions many methods and algorithms only descriptively. In my opinion, it would be good to add references to the literature and a diagram explaining the workflow of this part.
Specific comments
1. It would be better to combine sections 2.1 and 2.1.1 into one. In particular, Figures 1 and 2 should be merged into a single figure, combining the devices with their corresponding functionality for a more holistic perception and understanding.
2. In formula (3) there is an error. It is written as receivedPointArray[framePointCount].y = cse × elevationAnglesCos[laserID], but it should be receivedPointArray[framePointCount].y = cse × azimuthAnglesCos[azimuthIndex]. This format for presenting formulas (1)-(4) is, in my opinion, not easy to read; it would be better to write them in accepted mathematical notation.
3. Figure 3 needs to be supplemented with projections on the coordinate axes.
4. Equation (5) specifies the coordinates of the displacement vector, so it probably also requires corresponding mathematical notation.
5. Line 356: “At first, the point cloud was displayed in a grayscale format, but a new material was later found to provide a colored point cloud.” What does this phrase mean? Which materials?
6. Line 368: “Determining the sensor’s location in the real world is done by the user, who can manipulate a digital sphere object in the AR world to place it at the LiDAR sensor’s location. Once they are ready, the camera will be digitally shifted so that the point cloud lines up with the environment.” I do not understand what this means.
Comments for author File: Comments.pdf
Author Response
Review 1:
Comment 1: It is not entirely clear from the text of the article whether the test object was moving or static, which is important for determining the relevance, accuracy, and delay of the data transmitted to the user. If the test was performed on a static object, then I believe such testing is not sufficient to assert the accuracy of the digital representation of a real-world object. If the test was performed on a moving object, then this fact should be clearly stated in the article.
Response 1: Thank you for pointing this out. We have accordingly revised the results to better emphasize this point at lines 423 – 428 on page 13: “The objects sensed by the LiDAR sensors and visualized through the AR devices in this study are all moving objects, such as conflicting vehicles. These moving objects are of particular interest because they may pose potential conflicts with vulnerable road users. The proposed LiDAR-AR system was specifically developed to detect these moving objects and provide real-time visualization alerts via AR devices, ensuring timely and accurate responses.”
Comment 2: In addition, it is important to find out how an increase in the number of users of such an application will affect the speed of data transfer (and therefore its relevance); no testing has been conducted in this direction.
Response 2: Thank you for your comment. The proposed LiDAR-AR system involves several key steps: LiDAR sensing, transmitting sensing results to a data server, geolocating moving objects on the server, communicating geolocated object information to individual AR devices, and visualizing the sensed conflict objects on those devices. Of these steps, the increase in users would primarily impact the communication of geolocated object information to individual AR devices.
Our prototype leverages WebSocket technology, which is commonly employed in real-time systems such as multiplayer gaming and financial data platforms. These applications effectively handle thousands of simultaneous users with robust performance under heavy loads. While testing to determine the specific impact of increased user numbers on data transfer speeds and system relevance has not yet been conducted, the underlying technology demonstrates the system's capability to scale and accommodate a growing user base effectively.
Given the proven scalability of WebSocket-based systems, we are confident that our system can also manage multiple users without significant impact on data transfer speed. Nonetheless, we agree that conducting direct performance tests in a multi-user environment would provide valuable data to further validate the system's scalability and efficiency.
We have chosen to more clearly emphasize this point at lines 219 – 227 on page 6 in the system design section: “In addition, the system is designed with multi-user compatibility, though for the prototype this was not actually implemented. While specific testing has not yet been conducted to evaluate the impact of increasing the number of users on data transfer speeds and system relevance, similar systems that utilize WebSocket technology—such as those used in multiplayer gaming and real-time financial platforms—have demonstrated the ability to handle thousands of simultaneous users effectively. These established use cases suggest that our system would remain scalable and efficient in multi-user scenarios. However, further testing in a multi-user environment would provide valuable insights and validate this scalability under real-world conditions.”
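For illustration, below is a minimal C# sketch of the fan-out step that multi-user load would stress (the class and member names are hypothetical and do not reflect our prototype's actual implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public class ObjectBroadcaster
{
    // Open sockets for all connected AR devices.
    private readonly List<WebSocket> clients = new List<WebSocket>();

    public void Register(WebSocket socket)
    {
        lock (clients) { clients.Add(socket); }
    }

    // Fan out one geolocated-object message to every connected client.
    public async Task BroadcastAsync(string geolocatedObjectJson)
    {
        var payload = new ArraySegment<byte>(Encoding.UTF8.GetBytes(geolocatedObjectJson));
        var sends = new List<Task>();
        lock (clients)
        {
            foreach (var ws in clients)
                if (ws.State == WebSocketState.Open)
                    sends.Add(ws.SendAsync(payload, WebSocketMessageType.Text,
                                           endOfMessage: true, CancellationToken.None));
        }
        // Per-message cost grows linearly with the number of clients;
        // this is the step a multi-user performance test would quantify.
        await Task.WhenAll(sends);
    }
}
```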
Comment 3: Section 2.2.2, which describes one of the key parts of the system's functioning, mentions many methods and algorithms only descriptively. In my opinion, it would be good to add references to the literature and a diagram explaining the workflow of this part.
Response 3: Thank you for your comment. We agree with your point, and have accordingly found and added related citations as follows on lines 235 – 265 on page 7: “Feature selection is crucial in supervised classification. With LiDAR sensors adept at capturing object surfaces, seven dimension-centric features were derived: 3D distance, point count, direction, height and its variance, as well as 2D length and area. The interaction between these features, showcasing varied distribution patterns for different road users, was assessed. On closer scrutiny of feature importance, the 2D length feature often emerged as pivotal in classification. However, the traditional calculation of this feature was influenced by several external factors. To address this, we utilized continuous tracking trajectories, refining the 2D length feature to better represent an object's actual dimensions. In the classification stage, we employed four supervised classification techniques: Artificial Neural Network (ANN), Random Forest (RF), Adaptive Boosting (AdaBoost), and Random Undersampling Boosting (RUSBoost). Each method brings distinct benefits: ANN refines its predictions using error feedback, RF employs multiple decision trees, AdaBoost prioritizes previously misclassified data, and RUSBoost targets underrepresented classes in the training dataset [28, 29, 30]. To classify pedestrians and skateboards/scooters, the speed pattern was added to the previous seven features. To predict trajectory, we used LSTM with Bayesian Optimization. Long Short-Term Memory (LSTM) networks, a subset of the standard RNN, excel at managing extended sequence data [31]. They employ memory cells and gates to process and refresh trajectory information. Four distinct architectures have been explored: Linear, Densely Connected, Multi-Branch, and Feature Pyramid. These networks utilize historical data, typically spanning 20 frames, to anticipate future vehicle positions. A set of seven essential hyperparameters, relating to both architecture and training, is optimized using Bayesian techniques. This approach supplants exhaustive search methods with a more streamlined strategy, ideal for intricate tasks with numerous dimensions. The objective function seeks to minimize the Root Mean Square Error (RMSE) between the actual and projected x and y coordinates of vehicle trajectories. Bayesian optimization, employing Gaussian Process regression and the Expected Improvement (EI) acquisition function, identifies optimal hyperparameters without resorting to exhaustive search methods [32]. By harnessing real-time LiDAR data, this system presents a comprehensive strategy to augment safety at intersections and other vital roadway segments. With continuous enhancements, such an infrastructure holds the potential to markedly elevate safety provisions for all road users.”
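For reference, the RMSE objective mentioned in this passage takes the usual form over N predicted trajectory points (our rendering for this response; the typeset version in the manuscript may use different symbols):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]}$$

where $(x_i, y_i)$ are the observed vehicle coordinates and $(\hat{x}_i, \hat{y}_i)$ the LSTM predictions; Bayesian optimization selects the hyperparameters that minimize this value.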
Comment 4: It is better to combine paragraphs 2.1. and 2.1.1 into one. In particular, it is better to make figures 1 and 2 into one, combining the devices together with the corresponding functionality for a more holistic perception and understanding.
Response 4: We agree with this point, and to address this, we have combined figures 1 and 2 into one figure, and have merged 2.1 and 2.1.1 into one section called 3.1 System Design; this can be found from lines 184 – 227 on pages 5 and 6.
Comment 5: In formula (3) there is an error. It is written as receivedPointArray[framePointCount].y = cse × elevationAnglesCos[laserID], but it should be receivedPointArray[framePointCount].y = cse × azimuthAnglesCos[azimuthIndex]. This format for presenting formulas (1)-(4) is, in my opinion, not easy to read; it would be better to write them in accepted mathematical notation.
Response 5: Thank you for your comment. Regarding the equation, the calculation aligns with the specific orientation of Unity's game-engine coordinate system. To correctly display the point cloud, the y-coordinate is determined by multiplying the measured distance with the cosine of the elevation angle for the corresponding laserID. As for the formatting, we have simplified the equations to enhance clarity by using straightforward x, y, and z notations and explicitly highlighting the values where sine calculations are applied.
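For readers following this exchange, here is a minimal sketch of a standard spherical-to-Cartesian conversion in Unity-style C# (illustrative only; as noted above, the manuscript's mapping of the vertical coordinate follows Unity's conventions and may differ from this generic form, and the angle-table names are hypothetical):

```csharp
using UnityEngine;

public static class LidarPointConverter
{
    // Convert one LiDAR return (range, azimuth, elevation) to a Unity point.
    // Unity uses a left-handed, y-up coordinate system.
    public static Vector3 ToUnityPoint(float distance, float azimuthDeg, float elevationDeg)
    {
        float az = azimuthDeg * Mathf.Deg2Rad;
        float el = elevationDeg * Mathf.Deg2Rad;
        float horizontal = distance * Mathf.Cos(el); // the "cse" term: horizontal projection
        return new Vector3(
            horizontal * Mathf.Sin(az),  // x: lateral offset
            distance * Mathf.Sin(el),    // y: height above the sensor
            horizontal * Mathf.Cos(az)); // z: forward offset
    }
}
```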
Comment 6: Figure 3 needs to be supplemented with projections on the coordinate axes.
Response 6: Agree. To address this comment, we have modified the figure to incorporate projections to respective coordinate axes on page 9.
Comment 7: Equation (5) specifies the coordinates of the displacement vector, so it probably also requires corresponding mathematical notation.
Response 7: We agree with this comment. To address it, the notation of the formula has been changed accordingly to indicate that it is a displacement vector on page 11.
Comment 8: Line 356: “At first, the point cloud was displayed in a grayscale format, but a new material was later found to provide a colored point cloud.” What does this phrase mean? Which materials?
Response 8: Thank you for your comment. In Unity, a "material" refers to a graphical property that defines how an object’s surface appears, including its color, texture, and visual effects. We agree that the language is ambiguous, and as such have made the following modification to the sentence at lines 371 – 377 on page 11 to better describe this: “In Unity, a "material" refers to a graphical property that determines how an object’s surface appears, including its color, texture, and visual effects. Initially, the point cloud was displayed using Unity's default material, which rendered it in grayscale. To enhance the visualization, a custom Unity setting was implemented to assign colors to the points based on their attributes (e.g., height, intensity). This update enabled the point cloud to be displayed in color, making it more informative and visually intuitive for users.”
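As an illustration of the kind of setup described (our sketch only; the paper's actual material and shader configuration is not reproduced here), per-point colors can be assigned as vertex colors in Unity:

```csharp
using UnityEngine;

public static class PointCloudColoring
{
    // Assign each point a color based on its height; a vertex-color
    // shader on the point cloud's material then renders these colors.
    public static void ColorByHeight(Mesh pointCloud, float minY, float maxY)
    {
        Vector3[] vertices = pointCloud.vertices;
        Color[] colors = new Color[vertices.Length];
        for (int i = 0; i < vertices.Length; i++)
        {
            float t = Mathf.InverseLerp(minY, maxY, vertices[i].y); // 0 at minY, 1 at maxY
            colors[i] = Color.Lerp(Color.blue, Color.red, t);       // low points blue, high points red
        }
        pointCloud.colors = colors;
    }
}
```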
Comment 9: Line 368: “Determining the sensor’s location in the real world is done by the user, who can manipulate a digital sphere object in the AR world to place it at the LiDAR sensor’s location. Once they are ready, the camera will be digitally shifted so that the point cloud lines up with the environment.” I do not understand what this means.
Response 9: We agree that the language used in that section was ambiguous, so we modified the paragraph to clarify the process on lines 387 – 399 on pages 11 - 12: “When using a new AR device for the first time with the proposed LiDAR-AR system, the AR user must calibrate the AR visualization coordinate system to align with the LiDAR coordinate system. This calibration process involves adjusting a virtual 'LiDAR Sensor digital sphere object' within the AR view, which is automatically generated by the system for this purpose. The AR user manually moves this digital sphere object in the AR environment to align it with the physical LiDAR sensor, which is visible through the device's lens. This step utilizes a general AR functionality that allows users to manipulate virtual objects within a 3D space, ensuring the digital and physical systems are synchronized. Once the digital sphere is correctly positioned, the system uses this alignment to digitally adjust the AR camera. This ensures the point cloud data aligns accurately with the physical environment, providing a realistic and precise representation within the AR interface (Figure 5).”
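A conceptual sketch of this calibration step in Unity-style C# (field names are hypothetical; the manuscript describes the procedure only in prose):

```csharp
using UnityEngine;

public class LidarAlignment : MonoBehaviour
{
    public Transform calibrationSphere; // digital sphere the user places on the physical sensor
    public Transform pointCloudOrigin;  // where the point cloud is currently anchored
    public Transform cameraRig;         // parent transform of the AR camera

    // Called once the user confirms the sphere sits on the LiDAR sensor.
    public void Align()
    {
        // How far the rendered cloud is from where it should appear.
        Vector3 offset = calibrationSphere.position - pointCloudOrigin.position;
        // Shifting the camera rig by the opposite amount is equivalent to
        // shifting the point cloud onto the real sensor location.
        cameraRig.position -= offset;
    }
}
```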
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, the authors propose a novel approach to enhancing pedestrian safety using a wearable augmented reality (AR) system integrated with live roadside light detection and ranging (LiDAR) sensor data. The effectiveness of this system has been demonstrated and evaluated through various tests, including latency measurements. The authors also discuss the potential of this system with respect to vehicle-to-vehicle (V2V) systems and other societal benefits. Overall, the research is meaningful and interesting. However, the following drawbacks should be clarified.
(1) There is no doubt that the research is innovative in engineering. However, it seems to be less innovative in scientific research. Please clarify the main innovation points of this paper in terms of scientific research.
(2) Equations (1)-(4) should be expressed using physical quantities.
(3) The physical meaning of Eq. (5) is unclear.
(4) The authors state that the LSTM with Bayesian Optimization is employed to predict trajectory. However, there is almost no description of it in the paper. This method should be described in detail in the paper. Also, the trajectory prediction should be discussed.
(5) The conclusion part is missing.
(6) The experimental results are not comprehensive. The results of processing the raw LiDAR data and the trajectory prediction are not discussed. Also, comparisons with other methods should be conducted.
Comments on the Quality of English Language
The Quality of English Language is OK.
Author Response
Review 2:
Comment 1: There is no doubt that the research is innovative in engineering. However, it seems to be less innovative in scientific research. Please clarify the main innovation points of this paper in terms of scientific research.
Response 1:
We appreciate your thoughtful feedback and would like to clarify the contributions of our paper. While this work is primarily an engineering-focused study, aimed at developing and implementing a novel system for real-world applications, it also makes contributions at the scientific level, as detailed below:
Integration of Engineering and Scientific Approaches: The paper primarily emphasizes the practical engineering development of a wearable AR system integrated with roadside LiDAR data for pedestrian safety. However, the integration of AR and LiDAR, coupled with the innovative data processing pipeline, contributes to the scientific understanding of how these technologies can work together to solve complex real-world problems.
Human-Centric Visualization for Collision Risk Mitigation: From a scientific perspective, our system advances the understanding of how AR visualization can effectively deliver real-time, spatially located safety alerts to vulnerable road users. This human-centric design highlights an innovative approach to combining data processing, visualization, and behavioral safety in real-world contexts.
Broader Implications and Cross-Disciplinary Contributions: While the paper is rooted in engineering, the methodologies and findings contribute to scientific fields such as transportation safety, urban computing, and human-machine interaction. For instance, the insights on real-time data visualization and spatial awareness may inform future research in other domains, including industrial safety, emergency response, and smart city planning.
Comment 2: Equations (1)-(4) should be expressed using physical quantities.
Response 2: Thank you for your comment. We appreciate your suggestion. To enhance the clarity of Equations (1)-(4), we have revised them on page 8 to explicitly represent the physical quantities involved. Specifically, we use symbols such as α and β to denote the elevation and azimuth angles, respectively. Additionally, we have simplified the terms to "x," "y," and "z" to maintain a clear and concise representation.
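For the reader's convenience, with $d$ the measured distance, $\alpha$ the elevation angle, and $\beta$ the azimuth angle, a conventional sensor-frame form of such equations is (a hedged reconstruction only; the published equations map these quantities onto Unity's axes and may assign coordinates differently):

$$x = d\cos\alpha\,\sin\beta, \qquad y = d\cos\alpha\,\cos\beta, \qquad z = d\sin\alpha$$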
Comment 3: The physical meaning of Eq. (5) is unclear.
Response 3: Thank you for your comment. We recognize that Equation 5 may initially seem ambiguous in its current context. To address this, we have updated the figure caption on lines 367 and 368 to explicitly reference the equation, clarifying that the coordinate shift in the augmented reality plane is determined by the displacement vector defined in Equation 5. Additionally, we have revised the structure of the equation on page 11, incorporating a clearer set of parentheses to improve the representation of vector notation.
Comment 4: The authors state that the LSTM with Bayesian Optimization is employed to predict trajectory. However, there is almost no description of it in the paper. This method should be described in detail in the paper. Also, the trajectory prediction should be discussed.
Response 4: Thank you for your comment. We would like to point out that coauthor Zhihui Chen will be publishing a dissertation that extensively covers the LiDAR data processing methods employed in this project. As such, the focus of this manuscript is not on developing new LiDAR processing techniques but on leveraging pre-existing methods to create an innovative system for pedestrian safety.
Regarding the LSTM with Bayesian Optimization, we acknowledge that its introduction and details are minimal in this manuscript. This is because the trajectory prediction component using LSTM is part of a separate manuscript currently under preparation, which will delve into its methodology and analysis in greater detail. Including the full description of the LSTM model here would extend beyond the intended scope of this manuscript.
Comment 5: The conclusion part is missing.
Response 5: Thank you for your comment. Our discussion section was intended to also serve as our conclusion; to make this clearer, we have revised the heading to “Discussion and Conclusion” on line 441 of page 14.
Comment 6: The experimental results are not comprehensive. The results of processing the raw LiDAR data and the trajectory prediction are not discussed. Also, comparisons with other methods should be conducted.
Response 6: We appreciate the reviewer’s comment and the opportunity to address the perceived gaps in our experimental results.
The primary focus of this paper is to present an innovative engineering solution that integrates augmented reality (AR) with real-time roadside LiDAR sensing to enhance pedestrian safety by enabling advanced vision capabilities for users to perceive occluded risks. While trajectory prediction and LiDAR data processing are integral components of the system, the detailed methodologies and experimental evaluations for these aspects are documented in previously published works, including the aforementioned dissertation by the co-authors. The trajectory prediction was mainly used for object tracking within the LiDAR sensing range, which can then be used to calculate the speeds and moving directions of detected objects.
To provide clarity and context, we have added references to the relevant works where the detailed algorithms and experimental results on LiDAR data processing and trajectory prediction are extensively discussed. This allows us to maintain the focus of this paper on the unique contribution of integrating AR and LiDAR to address safety challenges while acknowledging the foundational methodologies that support our system.
As for comparisons with other methods, the primary innovation of this work lies in the integration and application of AR and real-time LiDAR for pedestrian safety, which has not been directly addressed in existing solutions. However, we recognize the value of benchmarking and will consider adding comparative analysis in future studies to further validate the system’s efficacy.
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposes a method for pedestrian conflict alerts. I have some concerns:
1. The authors should release the code and data.
2. The quality of Figure 4 should be improved.
3. The language of this paper should be improved.
4. The following papers can be used to improve the proposed method:
1) Generative adversarial and self-supervised dehazing network
2) Semantic-aware dehazing network with adaptive feature fusion
Comments on the Quality of English Language
The language of this paper should be improved.
Author Response
Review 3:
Comment 1: The authors should release the code and data.
Response 1: We appreciate the reviewer’s suggestion to release the code and data. The innovation presented in this paper is currently undergoing an internal intellectual property (IP) review at the University of Nevada, Reno, which restricts us from sharing the code at this stage. However, we can provide sample data collected from our intersection tests to support reproducibility and further exploration by interested researchers.
Additionally, we want to clarify that in the United States, publishing a research paper does not conflict with submitting a patent application. Innovators have up to one year after publishing a related research paper to file a patent application. This ensures that the dissemination of academic knowledge and the pursuit of IP protection can coexist.
Comment 2: The quality of Figure 4 should be improved.
Response 2: Thank you for your feedback. We agree that the quality of Figure 4 could be improved. To address this, we have replaced the original figure on page 9 with a higher-quality image of a LiDAR-generated point cloud, which now includes clearly defined boxes highlighting objects such as vehicles and pedestrians.
Comment 3: The language of this paper should be improved.
Response 3: Thank you for your comment. We agree that some parts of the paper's language can be improved, and so we have revised various parts throughout the paper to enhance clarity.
Comment 4: The following papers can be used to improve the proposed method:
1) Generative adversarial and self-supervised dehazing network
2) Semantic-aware dehazing network with adaptive feature fusion
Response 4: Thank you for your comment. We agree that these works could be used to improve our proposed method, so we address them within the discussion and conclusions section on lines 476 – 483 on pages 14 and 15: “While the current system performs well in standard conditions, its effectiveness in environments with reduced visibility, such as haze or fog, could be further optimized. Future work could explore integrating advanced dehazing methods, such as the generative adversarial and self-supervised dehazing network [30], which improves the relationship between hazy and haze-free data, or the semantic-aware dehazing network [31], which enhances clarity through adaptive feature fusion. These approaches could enable the system to maintain high accuracy and usability across a wider range of environmental scenarios.”
Reviewer 4 Report
Comments and Suggestions for Authors
This paper proposes a technical method to predict pedestrian collision risk based on LiDAR data and visualize it in real time using AR. In particular, it is noteworthy that the introduction of 5G-based high-speed data transmission technology minimizes the processing burden on AR devices and enables an efficient system structure that processes key data on the server.
In addition, the analysis and prediction of road user movements with trajectory prediction algorithms based on LSTM and Bayesian Optimization is considered an important contribution that suggests the possibility of developing advanced traffic safety systems. This is an attempt to effectively resolve driver dependency and environmental constraints, which have been pointed out as limitations of existing traffic safety systems, and it has great academic and practical significance.
As a major strength, the integrated use of LiDAR data and AR technology is likely to produce results that effectively improve risk recognition capabilities in pedestrian safety systems. This is considered an important approach that can overcome the existing limitations of smart intersections and pedestrian safety management systems.
However, the proposed method has the following limitations.
1. The proposed system has high implementation complexity, including AR devices and LiDAR sensors, so it may be difficult to commercialize. In particular, the cost of these technologies can be a significant obstacle to practical application.
2. The proposed system has not been verified for performance in a situation where multiple users use it simultaneously. Data collisions and processing delays in a multi-user environment are important challenges in the commercialization stage.
3. The size and weight of the AR device, and the user experience associated with its wearability, are factors that can limit the commercialization of this system. Lightweighting and design improvements are necessary to increase long-term usability.
Therefore, the following considerations are required for improvement of the proposed method.
1. In order to solve the high cost of LiDAR sensors and AR devices, it is necessary to seek technological alternatives such as low-cost sensors or cloud-based data processing methods.
2. It is necessary to develop a method to minimize data processing delays and collisions through performance tests in an environment where multiple users use the system simultaneously.
3. Additional research is needed to improve usability by reducing the weight and wearing comfort of the AR device and developing a design that meets the needs of various users.
This paper makes an innovative and meaningful contribution as a technological approach to improve pedestrian safety. However, additional research and verification are needed for improvements for commercialization, and more field tests and technical perfection are required for the proposed method to develop into a practical traffic safety system.
Author Response
Review 4:
Comment 1: In order to solve the high cost of LiDAR sensors and AR devices, it is necessary to seek technological alternatives such as low-cost sensors or cloud-based data processing methods.
Response 1: We agree with this comment, and as such we have added an additional paragraph on lines 448 – 453 on page 14 that addresses this limitation: “In addition, addressing the cost of LiDAR sensors and AR devices will be essential for widespread adoption. Exploring technological alternatives, such as low-cost sensors or cloud-based data processing methods, could help reduce overall implementation expenses without compromising performance. By leveraging augmented reality technology, this system can increase situational awareness for vulnerable road users in a way that is more intuitive and harder to overlook, paving the way for safer and more efficient streets.”
Comment 2: It is necessary to develop a method to minimize data processing delays and collisions through performance tests in an environment where multiple users use the system simultaneously.
Response 2: We agree with this comment. Therefore, we have more clearly emphasized this point at lines 210 – 218 on page 6: “In addition, the system is designed with multi-user compatibility, though for the prototype this was not actually implemented. While specific testing has not yet been conducted to evaluate the impact of increasing the number of users on data transfer speeds and system relevance, similar systems that utilize WebSocket technology—such as those used in multiplayer gaming and real-time financial platforms—have demonstrated the ability to handle thousands of simultaneous users effectively. These established use cases suggest that our system would remain scalable and efficient in multi-user scenarios. However, further testing in a multi-user environment would provide valuable insights and validate this scalability under real-world conditions.”
Comment 3: Additional research is needed to improve usability by reducing the weight and wearing comfort of the AR device and developing a design that meets the needs of various users.
Response 3: We agree with this comment, and as such we have added an additional paragraph on lines 469 – 475 on page 14 that addresses this limitation: “Moreover, further research is needed to improve the usability of AR devices by reducing their weight and enhancing wearing comfort. Developing a design that meets the diverse needs of users will be instrumental in ensuring long-term adoption and effectiveness in real-world applications. These improvements will contribute to creating a solution that is not only functional but also user-friendly and accessible. However, it should be noted that the design and implementation of new AR devices are beyond the scope of this specific research paper.”
Reviewer 5 Report
Comments and Suggestions for Authors
This paper proposes an AR system that integrates live LiDAR data to deliver spatially accurate and timely warnings to pedestrians, thereby mitigating accident risks.
The paper is nice and I enjoyed reading it; however, I have several concerns:
1. Figure 1 is unclear. Why does the Wifi router appear twice? What is the elongated purple object? A detailed explanation is needed.
2. Currently, the introduction and Literature Review are combined into a single section. It would be beneficial to separate these sections for improved clarity.
3. In Table 1, everything goes according to the formula: Laser*1800+azimuth_angle. It would be better to explicitly write this formula.
4. In equations 1-3, please specify what "cse" is. What do these letters stand for?
5. The authors write that equation 5 provides "the distance between the origin and the sensor location"; however, this equation gives three values, so what is the distance?
6. In Figure 6, how do the authors decide how much and in which direction to move the camera?
7. As individual reaction times differ significantly, this variability may influence the accuracy and effectiveness of the proposed model. In Y. Wiseman, "Autonomous vehicles will spur moving budget from railroads to roads", International Journal of Intelligent Unmanned Systems, Vol. 12(1), pp. 19-31, 2024, available online at: https://u.cs.biu.ac.il/~wisemay/ijius2024.pdf, the author writes: "Different people have different reaction times. For example, older people have a longer reaction time on average; however, age is not the only factor that affects reaction time. The average reactive time of human beings is 1.3s ... the reactive time of an autonomous vehicle is about 100 milliseconds." I would encourage the authors to cite this paper and explain how all these different reaction times go into the suggested model, at least as future work.
8. Xu, H., Huang, S., Yang, Y., Chen, X., & Hu, S., "Deep Learning-Based Pedestrian Detection Using RGB Images and Sparse LiDAR Point Clouds" IEEE Transactions on Industrial Informatics, Vol. 20, pp. 7149-7161, 2024 proposes a multimodal platform combining RGB cameras, sparse LiDAR, and data processing modules for pedestrian data acquisition. I would encourage the authors to compare their system with this system.
9. Section 4 should be called "discussion and conclusions".
10. It would be helpful to include a discussion on the potential shortcomings and avenues for enhancing the proposed method.
11. The format of references should be consistent.
Author Response
Review 5:
Comment 1: Figure 1 is unclear. Why does the Wifi router appear twice? What is the elongated purple object? A detailed explanation is needed.
Response 1: Agree. The diagram does appear confusing in its current context, so it was redesigned for improved clarity on page 6.
Comment 2: Currently, the introduction and Literature Review are combined into a single section. It would be beneficial to separate these sections for improved clarity.
Response 2: Thank you for your comment. We agree that separating the introduction and literature review will improve the clarity of the paper, so we have separated the literature review into its own section titled “2. Literature Review”.
Comment 3: In Table 1, everything goes according to the formula: Laser*1800+azimuth_angle. It would be better to explicitly write this formula.
Response 3: Thank you for your comment. However, the table is intended solely to illustrate how the array is structured for storing azimuths and laserIDs, without involving any calculations. Therefore, we believe it is unnecessary to include an additional formula in the paper.
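For the reader of this response, the indexing the reviewer refers to can nonetheless be written compactly (illustrative only; the paper presents the layout as a table rather than a formula):

```csharp
// Flat-array layout of Table 1: 1800 azimuth bins per laser channel (illustrative).
static int FlatIndex(int laserID, int azimuthIndex) => laserID * 1800 + azimuthIndex;
```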
Comment 4: In equations 1-3, please specify what "cse" is. What do these letters stand for?
Response 4: Thank you for your comment. In this case, “cse” is merely a stand-in term for the horizontal projection of the measurement onto the x and y axes for a given point. To make this clearer, we have revised the equations to use an explicit symbol in place of “cse” on page 8.
Comment 5: The authors write that equation 5 provides "the distance between the origin and the sensor location"; however, this equation gives three values, so what is the distance?
Response 5: Agree. Equation 5 provides a vector, not a distance, so we replaced the term “distance” with “displacement” on lines 374 - 375 of page 11: “Such a shift would be bound by the displacement between the origin and the sensor location, as given by Equation 5:”
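In vector notation, the revised Equation 5 therefore has the general form (our reconstruction for this response; the symbols in the manuscript may differ):

$$\vec{s} = \mathbf{p}_{\mathrm{sensor}} - \mathbf{p}_{\mathrm{origin}} = (x_s - x_o,\; y_s - y_o,\; z_s - z_o)$$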
Comment 6: In Figure 6, how do the authors decide how much and in which direction to move the camera?
Response 6: Thank you for your comment. In our paper, we state in lines 374 - 375 that the shift of the camera is bound by the displacement vector in Equation 5. To make this clearer, we have revised the caption of Figure 6 on page 12 to refer directly to Equation 5: “Coordinates of objects in the augmented reality plane: (a) Before Camera Shift; (b) After Camera Shift using the displacement vector in Equation 5.”
Comment 7: As individual reaction times differ significantly, this variability may influence the accuracy and effectiveness of the proposed model. In Y. Wiseman, "Autonomous vehicles will spur moving budget from railroads to roads", International Journal of Intelligent Unmanned Systems, Vol. 12(1), pp. 19-31, 2024, available online at: https://u.cs.biu.ac.il/~wisemay/ijius2024.pdf, the author writes: "Different people have different reaction times. For example, older people have a longer reaction time on average; however, age is not the only factor that affects reaction time. The average reactive time of human beings is 1.3s ... the reactive time of an autonomous vehicle is about 100 milliseconds." I would encourage the authors to cite this paper and explain how all these different reaction times go into the suggested model, at least as future work.
Response 7: Thank you for your feedback. We agree that this aspect is something that should be considered for future work, so we have added an additional paragraph on lines 475 - 483, pages 13 and 14 in the discussion and conclusions section to incorporate this idea: “Beyond technical considerations, individual differences in reaction times among users may also influence the accuracy and effectiveness of the proposed system. Research by Y. Wiseman highlights that reaction times vary significantly based on factors such as age, with the average human reaction time being approximately 1.3 seconds, compared to the near-instantaneous reaction time of autonomous systems (approximately 100 milliseconds) [32]. Accounting for this variability is crucial, as delayed user responses to alerts could diminish the system's overall safety benefits. Future iterations of the model could incorporate dynamic timing adjustments to tailor warnings based on predicted user response times.”
Comment 8: Xu, H., Huang, S., Yang, Y., Chen, X., & Hu, S., "Deep Learning-Based Pedestrian Detection Using RGB Images and Sparse LiDAR Point Clouds" IEEE Transactions on Industrial Informatics, Vol. 20, pp. 7149-7161, 2024 proposes a multimodal platform combining RGB cameras, sparse LiDAR, and data processing modules for pedestrian data acquisition. I would encourage the authors to compare their system with this system.
Response 8: Thank you for your comment. While the corresponding author's team has published several papers on LiDAR sensing and its applications for object detection, classification, and tracking, this specific research paper focuses on the LiDAR-AR system for visualizing 3D conflicting moving objects detected by LiDAR sensors. It is important to note that the proposed LiDAR-AR system is designed to integrate with any sensing technology or system capable of providing accurate 3D spatial localization of detected objects. Therefore, we did not perform a direct comparison of the LiDAR sensing system with other sensing systems in this paper, as the emphasis is on the AR visualization framework rather than the specific sensing technology used. We have clarified this point on lines 445 – 449.
Comment 9: Section 4 should be called "discussion and conclusions".
Response 9: Thank you for your comment. We agree with this idea and have revised the heading accordingly to “discussion and conclusions” on page 14.
Comment 10: It would be helpful to include a discussion on the potential shortcomings and avenues for enhancing the proposed method.
Response 10: Thank you for your feedback. We agree that more discussion of limitations is warranted, so we have added several paragraphs on lines 433 - 491 on pages 14 and 15 to address this.
Comment 11: The format of references should be consistent.
Response 11: Thank you for your comment. After examining each of the references, we have ensured that they are all consistent.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The revised paper can be accepted for publication.
Comments on the Quality of English Language
The quality of the English language is OK.
Author Response
Comment 1: The revised paper can be accepted for publication.
Response 1: Thank you for reviewing our efforts and all of the valuable comments you have given us!
Reviewer 3 Report
Comments and Suggestions for Authors
I still have some concerns:
1. As shown in Fig. 1, why was the data sent to a notebook computer?
2. Fig. 2 has some errors.
3. I hope the authors can release some data.
Author Response
Comment 1: As shown in Fig. 1, why was the data sent to a notebook computer?
Response 1: Thank you for your comment. The figure misrepresented where the data is sent: data from a LiDAR sensor does not necessarily have to travel to a notebook computer; rather, it can be sent to any edge computer. To clarify this, we have modified the diagram on page 6, line 208 to feature an image of an edge computer instead of a notebook computer.
Comment 2: Fig. 2 has some errors.
Response 2: Thank you for your comment. We agree that there were some red spellcheck markings in the figure, and we have replaced the image with a cleaned version without these red markings on page 8, line 297.
Comment 3: I hope the authors can release some data.
Response 3: Thank you for your comment. We agree that some data can be released, so here is a link to sample LiDAR data: https://nevada.box.com/v/UNRsamplelidardata. We have also placed this link within the paper at lines 432 and 433 on page 14.
Reviewer 5 Report
Comments and Suggestions for Authors
The authors made a decent effort and the paper is certainly publishable, so I would recommend accepting the paper.
Author Response
Comment 1: The authors made a decent effort and the paper is certainly publishable so I would recommend accepting the paper.
Response 1: Thank you for reviewing our efforts and all of the valuable comments you have given us!