Article
Peer-Review Record

Dynamic Queries through Augmented Reality for Intelligent Video Systems

Appl. Syst. Innov. 2024, 7(1), 1; https://doi.org/10.3390/asi7010001
by Josue-Rafael Montes-Martínez 1,*, Hugo Jiménez-Hernández 2,*, Ana-Marcela Herrera-Navarro 2, Luis-Antonio Díaz-Jiménez 2, Jorge-Luis Perez-Ramos 2 and Julio-César Solano-Vargas 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 October 2023 / Revised: 3 December 2023 / Accepted: 7 December 2023 / Published: 19 December 2023
(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript titled: "Dynamic Queries through Augmented Reality for Intelligent Video Systems" addresses the challenges of distributed zone monitoring systems, focusing on the complexities of information analysis, communication, interoperability, and heterogeneous interpretation. The authors introduce a framework aiming to establish homogeneity in a distributed monitoring system by mapping trajectories from various cameras into a global reference space, allowing for comparison and analysis of heterogeneous information. They also propose a novel similarity metric for querying information from diverse sources. The evaluation involves the development of applications using an Augmented Reality system based on realistic environments and historical data recovery through a client-server model. The detailed review comments are listed below.

Strengths

·      Innovative Framework: The introduction of a framework for achieving homogeneity in a distributed monitoring system is a notable step forward.

·        Global Reference Space: Creating a global reference space for trajectories from multiple cameras could potentially address the challenge of interoperability.

·        Novel Similarity Metric: The proposed metric for querying information from diverse sources might enhance system flexibility and usability.

·        Practical Evaluation: Developing applications in an Augmented Reality system for evaluation purposes suggests a practical approach to assess the system's performance.

Improvement Suggestions:

1.      Specific Criteria: The manuscript mentions the absence of specific criteria for exchanging and analyzing information. It would be beneficial if the authors addressed this gap and proposed standardized criteria for information exchange in distributed systems.

2.      Implementation: While the manuscript introduces the concept of a global reference space, it lacks detail on its technical implementation. Therefore, it is suggested to add some details for clarity and ease of understanding.

3.      Methodology used: More information is needed on the methodology used for evaluating the proposed system. Details about the parameters, metrics, and criteria used for assessment would enhance the credibility of the results.

4.      Real-World Viability: While the use of an Augmented Reality system for evaluation is promising, the real-world applicability and scalability of the proposed framework need further discussion and evidence.

Comments on the Quality of English Language

Minor

Author Response

Reviewer 1:

  1. Specific Criteria: The manuscript mentions the absence of specific criteria for exchanging and analyzing information. It would be beneficial if the authors addressed this gap and proposed standardized criteria for information exchange in distributed systems.

R.- Thank you for the comment. This point is now addressed in new paragraphs in the Discussion section. They describe how the distributed system communicates with the server through plain-text, JSON-like files; the server uses the global space as the reference framework to store and query the historical information.
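
As an illustration of the kind of plain-text, JSON-like exchange described in this response, the sketch below serializes one trajectory and reads it back. It is not the authors' message format: the field names (`camera_id`, `frame`, `points`, `t`, `x`, `y`) and the helper names are assumptions made only for this example.

```python
import json


def encode_trajectory(camera_id, points):
    """Serialize a trajectory as JSON text (hypothetical schema).

    points: iterable of (t, x, y) tuples, with x and y already expressed
    in the global reference space.
    """
    message = {
        "camera_id": camera_id,   # assumed field name
        "frame": "global",        # assumed tag for the reference space
        "points": [{"t": t, "x": x, "y": y} for t, x, y in points],
    }
    return json.dumps(message)


def decode_trajectory(text):
    """Parse the JSON text back into (camera_id, [(t, x, y), ...])."""
    message = json.loads(text)
    points = [(p["t"], p["x"], p["y"]) for p in message["points"]]
    return message["camera_id"], points


payload = encode_trajectory("cam-1", [(0.0, 1.5, 2.0), (0.5, 1.7, 2.4)])
camera, trajectory = decode_trajectory(payload)
```

Because the payload is plain text, any client able to parse JSON (a camera node or a handheld device) could exchange trajectories with the server without sharing binary formats.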

  2. Implementation: While the manuscript introduces the concept of a global reference space, it lacks detail on its technical implementation. Therefore, it is suggested to add some details for clarity and ease of understanding.

R.- Thank you for the comment; you are right that the global space was vaguely defined. The text has been improved by adding a paragraph to clarify it. This paragraph explains that the global space uses UTM coordinates to locate the building position, which becomes the origin; displacements are expressed relative to this origin. At a macro level, the UTM coordinates define a 2D plane in which the positions and orientations of all cameras are placed. These cameras monitor movement, which is expressed as trajectories projected from the different devices. Each camera is oriented and projected into the global space through specific color markers in the scenario. The process is as follows:

  • In the real plane, reference points called active marks (red, blue, green, and yellow boxes) are strategically placed so that they enter the viewing space of the cameras to be used.
  • A sample image from each camera is segmented to obtain a binary image containing only the active marks.
  • The centroid of each mark is found and taken as a reference in the image plane.
  • A homography is computed between the references in the real plane and the references in the image plane; in this way, the global reference plane is constructed.
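
The steps above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`centroid`, `homography_from_marks`, `to_global`), the nested-list mask format, and the example coordinates are all assumptions. It assumes exactly four marks, no three of them collinear, and plane coordinates given as offsets from the chosen origin.

```python
def centroid(mask):
    """Centroid (x, y) of the nonzero pixels of a binary mask (list of rows)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate mark configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_marks(image_pts, plane_pts):
    """3x3 homography mapping four image centroids to four plane references."""
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, plane_pts):
        # Two linear equations per correspondence, with h33 fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def to_global(h, point, origin=(0.0, 0.0)):
    """Project an image point to the plane and offset it by the origin."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (origin[0] + u, origin[1] + v)


# Hypothetical mark centroids in pixels and their plane positions in metres.
image_marks = [(100, 400), (540, 400), (420, 200), (220, 200)]
plane_marks = [(0, 0), (10, 0), (10, 10), (0, 10)]
H = homography_from_marks(image_marks, plane_marks)
```

Once `H` is known for a fixed camera, every tracked image point can be passed through `to_global`, so trajectories from different cameras end up in the same 2D plane.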

  3. Methodology used: More information is needed on the methodology used for evaluating the proposed system. Details about the parameters, metrics, and criteria used for assessment would enhance the credibility of the results.

R.- Thank you. To clarify this point, process diagrams and a table of the parameters used have been added (Figures 15, 16, and 17, and Table 3).

  4. Real-World Viability: While the use of an Augmented Reality system for evaluation is promising, the real-world applicability and scalability of the proposed framework need further discussion and evidence.

R.- Experimental evidence has been added in Sections 7 and 8, in paragraphs 404-529.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper provides a technological solution to the problem of scene re-construction using trajectories produced by different video-cameras with overlapping field of view, and used by a distributed system. The problem of data fusion in this setting is well known and the related works part of the introduction lacks some references, e.g., to solutions for moving person/object re-identification by different cameras (2D, 3D and augmented reality) and their use in distributed surveillance systems, even for large public spaces. References to middleware solutions for data interoperability in distributed sensor-based systems are also not cited. The advancement and novelty of the presented approach is not clear, as well as the benefit of the proposed architecture. Why is the visualization of the merged trajectories through AR relevant? The authors should better explain the following sentence: "Currently, no specific criteria exist for exchanging and analyzing information in a distributed system." The proof of 4.2 is not clear. The experimental part is not strong enough as the case study is limited and performances of the architecture, e.g., of the similarity metric approach, is not discussed. The impact of the work should be discussed as well. I suggest that the authors improve the paper and re-submit it.

 

Comments on the Quality of English Language

Usage of punctuation should be reviewed as, for example, the sentence from Line 116 to Line 127 is too long. Many sentences should be simplified.

Author Response

Reviewer 2:

  1. The related works part of the introduction lacks some references, e.g., to solutions for moving person/object re-identification by different cameras (2D, 3D and augmented reality) and their use in distributed surveillance systems. References to middleware solutions for data interoperability in distributed sensor-based systems are also not cited.

R.- Thank you. This point has been fixed, and references 59–63 have been added.

  2. The advancement and novelty of the presented approach is not clear, as well as the benefit of the proposed architecture.

R.- The discussion and conclusion have been rewritten for better clarity. In short, the novelty is twofold:

  • The construction of a database of trajectories from different camera sources, which are mapped into a global reference provided by the UTM coordinates.
  • Storage in a global reference becomes relevant when an external device, such as a smartphone, can query the information. The second contribution is therefore a similarity criterion for matching trajectories. The effectiveness of this metric is shown; in addition, queries can be performed by external devices, which allows the trajectory of interest to be projected into the device's 2D field of view and displayed on the device.

Another technical novelty is the use of dynamic augmented reality to display trajectories without depending on a static label in the scenario.

  3. Why is the visualization of the merged trajectories through AR relevant?

R.- This is relevant because AR systems typically use only a static mark to display information. In this context, the system shows an example of dynamic AR in which the device acts as a virtual spatial and temporal window onto the information, and anyone can navigate through the spatially referenced data. This technology is essential for recreating past events through monitoring systems.

  4. The authors should better explain the following sentence: "Currently, no specific criteria exist for exchanging and analyzing information in a distributed system."

R.- Thank you for the comment. This point is now addressed in new paragraphs in the Discussion section. They describe how the distributed system communicates with the server through plain-text, JSON-like files; the server uses the global space as the reference framework to store and query the historical information.

 

  5. The proof of 4.2 is not clear.

R.- The proof complements the work by justifying that the proposed expression for measuring similarity is a metric. The section aims to show that, in the specific case of the linear approximation, the metric is equivalent to the L1 metric, so it can be rewritten to reduce its complexity and make it easier to compute. The paragraph has been rewritten, and comments have been added for clarity.

  6. The experimental part is not strong enough as the case study is limited and performances of the architecture, e.g., of the similarity metric approach, is not discussed. The impact of the work should be discussed as well.

R.- Experimental evidence has been added in Section 7, in paragraphs 404-529.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The article presents a study regarding analysis of information collected from distributed cameras. The information from at least two cameras is transformed into trajectories, which are collected into a global space, a kind of repository. Using an Augmented Reality app on a handheld device, sample trajectories can be recorded and used for identifying the most similar trajectories in the global space. For measuring the similarity of trajectories, a metric has been defined.

The article is well-written and well-structured. The topic is current and a valuable contribution to the field.

There are a few issues which require improvements:

·        The abstract does not provide an overview for readers not familiar with the topic. It starts with the technical term “Distributed zone monitoring systems“,  which might not be known by any reader checking the abstract if the article is worth reading for that person.

·        It would be beneficial to motivate the work using a concrete use case … when do you need this work?

·        Table 1: The columns need to be described, e.g. what does “Result” mean? What does “Training rule” mean?

·        L24 – L28: “Some” and “etc.” at the end: I consider the “etc.” at the end superfluous, as “some” already represents a selection: “each one requires and the results each delivers.”

·       L89 – 94: This statement needs to be substantiated with a reference.

·       Table 2: The formulas are quite hard to comprehend, as the variables are not well-known to a person, who might be interested, but is not an expert in the field. One option would be to include the formula and the symbols into describing text in the table.

·       L 352 “The computational complexity of the last expression is in the order of O(k) where k is a constant.” Is O(constant) not equivalent to O(1)? What is described by k?

·        L373: trajectories: “the five most similar to the reference are selected” most similar trajectories?

·        Figure 11: data repositor*y*

·        Figure 12 and Figure 11: font size should be the same. The caption Server and Client confuses when it is visualized in a box like the other elements: It might be sufficient if the image caption would be: “Server: Communication Process”?

·        L392: isolated line

·        Figure 15: “first consult”? What does it mean? Typo: result?

·       L409: “some settings in the application’s programming” sounds a bit strange -> “settings in the application”?

Comments on the Quality of English Language

Mostly fine, comprehensible

Author Response

Reviewer 3:

 

  1. The abstract does not provide an overview for readers not familiar with the topic. It starts with the technical term “Distributed zone monitoring systems“,  which might not be known by any reader checking the abstract if the article is worth reading for that person.

R.- Thank you for the comment. A brief description has been added in the abstract to give context to the reader.

  2. It would be beneficial to motivate the work using a concrete use case … when do you need this work?

R.- Thank you for the comment. In Section 7, we have added particular cases for the application of our work.

  3. Table 1: The columns need to be described, e.g. what does “Result” mean? What does “Training rule” mean?

R.- Thank you for the comment. Table 1 has been modified with a more precise description of the points discussed in it.

  4. L24 – L28: “Some” and “etc.” at the end: I consider the “etc.” at the end superfluous, as “some” already represents a selection: “each one requires and the results each delivers.”

R.- Thank you for the comment. The “etc.” has been removed for clarity.

  5. L89 – 94: This statement needs to be substantiated with a reference.

R.- A reference has been added:

Carmigniani, J., Furht, B., Anisetti, M., Ceravolo, P., Damiani, E., & Ivkovic, M. (2011). Augmented reality technologies, systems and applications. Multimedia tools and applications, 51(1), 341-377.

  6. Table 2: The formulas are quite hard to comprehend, as the variables are not well-known to a person, who might be interested, but is not an expert in the field. One option would be to include the formula and the symbols into describing text in the table.

R.- Thank you for the comment. A description for each method has been added to Table 2.

  7. L 352 “The computational complexity of the last expression is in the order of O(k) where k is a constant.” Is O(constant) not equivalent to O(1)? What is described by k?

R.- Thank you for the comment. Yes, you are correct. In this context, we want to emphasize that the complexity of the measure depends on the order of the polynomial used to fit the data. In the particular case in which each chunk of information is approximated by a second-order function, the similarity criterion works with lines, which simplifies the computation, and the metric can easily be expressed as a single algebraic expression of constant complexity.

  8. L373: trajectories: “the five most similar to the reference are selected” most similar trajectories?

R.- Thank you for the comment. Yes, the five trajectories most similar to the reference are displayed through augmented reality (now in L400).

  9. Figure 11: data repositor*y*

R.- Thank you for the observation. The figure has been modified, and it is now Figure 12.

  10. Figure 12 and Figure 11: font size should be the same. The caption Server and Client confuses when it is visualized in a box like the other elements: It might be sufficient if the image caption would be: “Server: Communication Process”?

R.- Thank you for the observation. Figures 11 and 12 have been modified, and they are now Figures 12 and 13.

  11. Figure 15: “first consult”? What does it mean? Typo: result?

R.- Thank you for the comment. It refers to the first result of the query; we have changed the sentence and the caption of Figure 15, which is now Figure 19.

  12. L409: “some settings in the application’s programming” sounds a bit strange -> “settings in the application”?

R.- Thank you for the comment. By changing the source code, the number of trajectories returned can easily be set, from only the most similar trajectory to all the trajectories contained in the data repository. The sentence has been changed.
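
The configurable number of returned trajectories can be illustrated with a small sketch. It is hypothetical and not the application's code: the names `most_similar` and `l1_distance`, the dictionary-based repository, and the plain point-wise L1 distance (used here only as a stand-in for the paper's similarity metric) are assumptions, and trajectories are assumed to be resampled to the same number of points.

```python
import heapq


def l1_distance(a, b):
    """Point-wise L1 distance between two equal-length 2D trajectories."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))


def most_similar(reference, repository, k=5, distance=l1_distance):
    """Return the k (id, trajectory) pairs closest to the reference.

    k=1 returns only the most similar trajectory; k=len(repository)
    returns every stored trajectory, ordered from most to least similar.
    """
    return heapq.nsmallest(
        k, repository.items(), key=lambda item: distance(reference, item[1])
    )


reference = [(0, 0), (1, 1), (2, 2)]
repository = {
    "a": [(0, 0), (1, 1), (2, 3)],
    "b": [(5, 5), (6, 6), (7, 7)],
    "c": [(0, 1), (1, 2), (2, 3)],
}
best = most_similar(reference, repository, k=1)
ranked = most_similar(reference, repository, k=len(repository))
```

Exposing `k` as a parameter, instead of a value fixed in the source code, would let the client choose how many trajectories to overlay.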

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have provided a point-to-point reply to my previous comments and improved the description of the usefulness of the approach. I would suggest to update the references to other more recent works related to distributed video-based (and AR) systems, where components use a global reference system, and those applied to security, as quite a lot of references are old. Here is one example, but I am sure there are other works.

Villani, M.L.; De Nicola, A.; Bouma, H.; van Rooijen, A.; Räsänen, P.; Peltola, J.; Toivonen, S.; Guarneri, M.; Stifini, C.; De Dominicis, L. A Modular Architecture of Command-and-Control Software in Multi-Sensor Systems Devoted to Public Security. Information 2023, 14, 162. https://doi.org/10.3390/info14030162

Therefore, some assumptions on the novelty may appear strong. Performance of the system in a real environment would add more value.  Drawbacks of using AR technology in real applications are not mentioned.

Comments on the Quality of English Language

There are some long sentences that are hard to follow. Language should be simplified.

Author Response

Answers for the reviewer:

  1. I would suggest to update the references to other more recent works related to distributed video-based (and AR) systems

R.- Thanks for the comment. The following references have been added:

Distributed video-based systems:

Villani, M. L., De Nicola, A., Bouma, H., van Rooijen, A., Räsänen, P., Peltola, J., ... & De Dominicis, L. (2023). A Modular Architecture of Command-and-Control Software in Multi-Sensor Systems Devoted to Public Security. Information, 14(3), 162. DOI: https://doi.org/10.3390/info14030162

Bouma, H., Villani, M. L., van Rooijen, A., Räsänen, P., Peltola, J., Toivonen, S., ... & De Dominicis, L. (2022). An integrated fusion engine for early threat detection demonstrated in public-space trials. Sensors, 23(1), 440. DOI: https://doi.org/10.3390/s23010440

Qiu, S., Zhao, H., Jiang, N., Wang, Z., Liu, L., An, Y., ... & Fortino, G. (2022). Multi-sensor information fusion based on machine learning for real applications in human activity recognition: State-of-the-art and research challenges. Information Fusion, 80, 241-265. DOI: https://doi.org/10.1016/j.inffus.2021.11.006

Li, Y., Yang, G., Su, Z., Li, S., & Wang, Y. (2023). Human activity recognition based on multienvironment sensor data. Information Fusion, 91, 47-63. DOI: https://doi.org/10.1016/j.inffus.2022.10.015

Chen, K. Y., Chou, L. W., Lee, H. M., Young, S. T., Lin, C. H., Zhou, Y. S., ... & Lai, Y. H. (2021). Human motion tracking using 3d image features with a long short-term memory mechanism model—an example of forward reaching. Sensors, 22(1), 292. DOI: https://doi.org/10.3390/s22010292

 

 

Augmented reality

Arafa, A., Sheerah, H. A., & Alsalamah, S. (2023). Emerging Digital Technologies in Healthcare with a Spotlight on Cybersecurity: A Narrative Review. Information, 14(12), 640. DOI: https://doi.org/10.3390/info14120640

Christoff, N., Neshov, N. N., Tonchev, K., & Manolova, A. (2023). Application of a 3D Talking Head as Part of Telecommunication AR, VR, MR System: Systematic Review. Electronics, 12(23), 4788. DOI: https://doi.org/10.3390/electronics12234788

Nunes, J. S., Almeida, F. B., Silva, L. S., Santos, V. M., Santos, A. A., de Senna, V., & Winkler, I. (2023). Three-dimensional coordinate calibration models for augmented reality applications in indoor industrial environments. Applied Sciences, 13(23), 12548. DOI: https://doi.org/10.3390/app132312548

Stappung, Y., Aliaga, C., Cartes, J., Jego, L., Reyes-Suárez, J. A., Barriga, N. A., & Besoain, F. (2023). Developing 360° Virtual Tours for Promoting Tourism in Natural Parks in Chile. Sustainability, 15(22), 16043. DOI: https://doi.org/10.3390/su152216043

Bhang, K. J., & Huh, J. R. (2023). Effectiveness of Fine Dust Environmental Education on Students’ Awareness and Attitudes in Korea and Australia Using AR Technology. Sustainability, 15(22), 16039. DOI: https://doi.org/10.3390/su152216039

Kleftodimos, A., Evagelou, A., Gkoutzios, S., Matsiola, M., Vrigkas, M., Yannacopoulou, A., ... & Lappas, G. (2023). Creating Location-Based Augmented Reality Games and Immersive Experiences for Touristic Destination Marketing and Education. Computers, 12(11), 227. DOI: https://doi.org/10.3390/computers12110227

 

  2. Performance of the system in a real environment would add more value.

R.- Thank you for your comment. System performance is affected at the following stages:

1.- Communication: The complete system communicates over Ethernet, and the video cameras have Ethernet interfaces. Each camera streams at 1 Mbps. The Ethernet infrastructure is a 1 Gbps Ethernet VPN. We process two cameras simultaneously, which in peak situations produce around 20 Mbps sustained in continuous streaming.

2.- Computation of data for storage: The backend server executes the tracking algorithm and stores the result in the global reference space. The transformation into the global reference space is complex because the cameras are fixed and the projection is based on the active markers in the scenario. Once the projection matrix has been computed for a camera, it remains fixed until the camera position or orientation changes.

3.- Computation of data for querying: This process involves the client (a personal device) and requires sampling the field of view to determine its orientation in the scene. A client app captures the trajectory for the query after locating the active markers. The projection is computed from the active markers to transform the trajectory into the global reference space. This operation is performed once, after capturing the trajectory to be used as the query reference.

4.- Displaying process: This is the most resource-intensive stage because the position and orientation are checked continuously; the trajectory in the reference space is continuously projected to the current camera orientation, and the server is queried for similar information. The display is a 3D view that uses the client's GPU or graphics interface. This operation is limited by the client's capabilities and by the complexity of the trajectories displayed on the device. The backend server, in turn, computes the similarity of the trajectories and sends the trajectory points to be shown on the device.

  3. Drawbacks of using AR technology in real applications are not mentioned.

R.- Thank you for your comments. The use of augmented reality has grown significantly in the last decade thanks to technological advances. However, the technology requires prior study before it can be used in a specific space, since it needs references to match the real information with the virtual information to be added. In addition, mid- to high-range devices are still required to display information fluidly. A typical behavior of this type of technology is that the system loses precision and accuracy when it loses its references, called active marks, so recalibration is required for it to work properly.

As future work, we are working on dynamically selecting markers to orient the system. Markers selected dynamically from the camera views would allow the orientation to be determined automatically. On the client side, selecting the best markers helps to generate the projection matrix.

Author Response File: Author Response.pdf
