Edge-Enhanced TempoFuseNet: A Two-Stream Framework for Intelligent Multiclass Video Anomaly Recognition in 5G and IoT Environments
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This article presents a neural network architecture for anomaly classification on low-resolution videos in 5G and IoT environments. It uses super-resolution to upsample low-resolution videos, then feeds the higher-resolution videos to a two-stream neural network architecture for anomaly classification.
The topic is interesting, and the proposed approach makes some sense. However, this article does not explain its contributions clearly and lacks some technical details.
Some comments are below:
Line 71: "This study focuses on the technical limitations caused by low-quality videos, specifically poor lighting and low spatial resolution." However, the article does not explain how poor lighting is handled, and no evaluation of it is provided.
The super-resolution-based approach is presented as a contribution of this article. However, neural-network-based super-resolution is a well-studied area. Could you explain more about the uniqueness and novelty of the super-resolution approach used in this article? The article mentions "modified GLEAN" a few times. Could you provide some details of the modifications?
Line 245: "The Libswscale is a highly optimized python implementation that can be used for scaling, colorspace, and pixel format transformation operations." I think libswscale is a library implemented in C. I guess you mean the Python binding. Please double-check this.
Section 5 replicates information discussed in the previous sections. For example, lines 371 to 390 can be significantly simplified. Lines 454 to 457 repeat information discussed before.
Some thoughts on this article's conclusion. Downscaling videos is for experimental purposes only, so discussing it in the conclusion section seems misleading: readers may think downsampling is a step of the system. The article mentions anomaly classification on low-quality videos several times, but the research is limited to low-resolution videos, so the conclusion is not so convincing. Low quality can mean many different things, such as poor quality due to sensors, information loss due to data loss or compression, etc.
Some minor issues:
- Equation (2) needs a comma.
- Table 1. More accurately, row 7 is "storage", not "memory".
Comments on the Quality of English Language
Please remove or simplify some repetitive sentences.
Author Response
The response to Reviewer 1's comments is attached as a PDF file.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper titled "Edge-Enhanced TempoFuseNet: A Two-Stream Framework for Intelligent Multiclass Video Anomaly Recognition in 5G and IoT Environments" by Gulshan Saleem et al. proposes a novel framework for the classification of anomalies in surveillance videos, particularly in the context of 5G and IoT environments. This review will address various aspects of the paper, including its novelty, methodology, experimental setup, and overall significance in the field.
Novelty
The study introduces the Edge-Enhanced TempoFuseNet, a two-stream architecture combining spatial and temporal features for video anomaly recognition. This framework stands out for its use of a novel super-resolution technique that is based on an encoder-bank-decoder configuration and leverages StyleGAN for feature enhancement. The innovative approach to handling low-quality video surveillance footage, particularly footage with poor lighting and low spatial resolution, is a significant advancement in the field.
Recommendations
1. Clarify and Expand Methodological Details
- Enhance Methodological Explanations: Provide more detailed explanations or step-by-step breakdowns of complex processes, especially the novel aspects of the TempoFuseNet architecture and the super-resolution technique. Clearer explanations can aid in the reproducibility of your results.
- Algorithm and Pseudocode: Include algorithmic representations or pseudocode for key components of your methodology, particularly for the two-stream architecture and the super-resolution process.
2. Improve Presentation and Readability
- Figures and Visuals: Enhance or add more figures and visuals, especially for the architecture and the workflow of the system. Visual aids can significantly improve reader comprehension.
- Formatting and Structure: Ensure that the paper is well-structured and formatted according to the journal's guidelines. Clear subheadings, consistent terminology, and an organized flow of content are essential.
3. Future Work
- Future Work: Clearly outline potential areas for future research. This can include expanding the model’s capabilities, testing on more diverse datasets, or integrating the model into real-world surveillance systems.
4. Proofreading and References
- Thorough Proofreading: Ensure the paper is free from typographical and grammatical errors. This improves the paper's professionalism and readability.
- Update and Verify References: Make sure all references are up-to-date and correctly cited. Ensure that all relevant recent works are included in the literature review.
Author Response
The response to Reviewer 2's comments is attached as a PDF file.
Author Response File: Author Response.pdf