Article
Peer-Review Record

Cyber-Physical System Security Based on Human Activity Recognition through IoT Cloud Computing

Electronics 2023, 12(8), 1892; https://doi.org/10.3390/electronics12081892
by Sandesh Achar 1,*,†, Nuruzzaman Faruqui 2,*,†, Md Whaiduzzaman 3,*,†, Albara Awajan 4 and Moutaz Alazab 4
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 9 March 2023 / Revised: 2 April 2023 / Accepted: 13 April 2023 / Published: 17 April 2023

Round 1

Reviewer 1 Report

The proposed human activity recognition-based approach, which uses a GoogleNet-BiLSTM network to enhance cyber-physical security without human intervention, looks interesting.

Here are a few suggestions to improve the article:

1. Mentioning the existing approaches and justifying the use of GoogleNet-BiLSTM (i.e., how this hybridization helps to improve the accuracy) in the introduction will add strength to the paper.

2. Providing a brief explanation about sequence folding will be helpful to the readers.

3. In section 3.2.2, please provide the reason for using only 19 layers.

4. A brief explanation of the algorithms will be helpful to the readers.

5. How many VMs does Algorithm 1 use?

6. Explaining the variables used in equations (2) - (4) will be helpful to the readers.

7. Does the 2-D subtraction happen in the Edge node or on the cloud? Maybe explaining figure 5 will be helpful.

8. How does the network controller decide on the quality of the network, and how does this approach use the network quality?

9. Explaining how the cost of providing security is computed as $4.29 will be helpful.


Author Response

Suggestion 1

Mentioning the existing approaches and justifying the use of GoogleNet-BiLSTM (i.e., how this hybridization helps to improve the accuracy) in the introduction will add strength to the paper.

Authors’ Response

We would like to express our gratitude for such a valuable suggestion. We have added the following paragraph to the introduction according to this suggestion, along with the necessary citations:

BiLSTM networks are well-known for their excellent capability to classify time-dependent variables; however, they are limited in their feature extraction capabilities. GoogleNet, on the other hand, is an excellent CNN for extracting features, but its computational complexity poses a challenge in time-dependent classification. Combining GoogleNet and a BiLSTM network to recognize activities from real-time video streams compensates for the weaknesses of each and makes the classifier more effective.
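To make the hybridization concrete, the following minimal PyTorch sketch shows one way such a pipeline can be wired together. It is an illustrative assumption on our part (the frozen backbone, hidden size, and classification from the final time step are all our choices), not the exact network in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models

class GoogleNetBiLSTM(nn.Module):
    """Sketch: a frozen GoogleNet extracts per-frame features,
    and a BiLSTM classifies the resulting time-dependent sequence."""
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        self.backbone = models.googlenet(weights="DEFAULT")
        self.backbone.fc = nn.Identity()      # expose the 1024-d pooled features
        for p in self.backbone.parameters():  # freeze: GoogleNet only extracts
            p.requires_grad = False
        self.bilstm = nn.LSTM(input_size=1024, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):                 # clips: (B, T, 3, 224, 224)
        b, t = clips.shape[:2]
        self.backbone.eval()                  # keep dropout off in the extractor
        with torch.no_grad():
            feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.bilstm(feats)           # forward and backward passes
        return self.head(out[:, -1])          # classify from the final time step
```

Freezing the backbone reflects the division of labor described above: GoogleNet only extracts features, while the BiLSTM learns the time-dependent classification.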

It improved the quality of the paper. Thank you, respected reviewer, for this valuable suggestion. The modifications have been highlighted in the revised paper.

Suggestion 2

Providing a brief explanation about sequence folding will be helpful to the readers.

Authors’ Response

We are truly grateful for this valuable suggestion. We explained the sequence folding as follows:

A BiLSTM is a recurrent neural network (RNN) that processes sequential data by capturing both past and future context. Sequence folding speeds up and improves the training of RNNs, including BiLSTMs. In sequence folding, the input sequence is split into smaller, fixed-length subsequences, or "folds." The BiLSTM, which comprises two independent LSTMs (a forward LSTM and a backward LSTM), processes these folds concurrently. The forward LSTM reads the subsequences from left to right, and the backward LSTM reads them from right to left. Concatenating the hidden states from both LSTMs at each time step then gives a more complete representation of the data.
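As a simplified illustration (the fold length and feature size here are arbitrary, and leftover frames are simply dropped), sequence folding can be sketched as follows:

```python
import torch

def fold_sequence(x, fold_len):
    """Split a (T, F) feature sequence into fixed-length folds
    of shape (num_folds, fold_len, F); trailing frames that do
    not fill a complete fold are dropped in this sketch."""
    t = (x.shape[0] // fold_len) * fold_len
    return x[:t].reshape(-1, fold_len, x.shape[1])

feats = torch.randn(100, 1024)             # e.g., 100 frames of features
folds = fold_sequence(feats, fold_len=10)  # (10, 10, 1024)
bilstm = torch.nn.LSTM(1024, 128, batch_first=True, bidirectional=True)
out, _ = bilstm(folds)                     # folds processed concurrently as a batch
print(out.shape)                           # (10, 10, 256): forward+backward states
```

Because the folds form a batch, the forward and backward LSTMs process them concurrently, which is the source of the speed-up described above.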

We added citations to support our explanation. This modification has improved the readability of our paper. We would like to thank the esteemed reviewer again for this suggestion. The modification has been highlighted in the revised paper.


Suggestion 3

In section 3.2.2, please provide the reason for using only 19 layers.

Authors’ Response

The esteemed reviewer has very keen observation skills, and we are honored to have such a knowledgeable reviewer for our paper. We have explained the reason for taking the input from the 19th layer as follows:

According to the GoogleNet architecture, the 19th layer is responsible for averaging the extracted features. The research approach used in this paper utilizes GoogleNet for feature extraction. That is why the input to the BiLSTM network has been taken from the 19th layer of GoogleNet.

It has been added to the paper with a reference. The modification has been highlighted in the revised paper.

Suggestion 4

A brief explanation of the algorithms will be helpful to the readers.

Authors’ Response

We thank the respected reviewer for this valuable suggestion. We have explained both algorithms accordingly, and the paper is now easier to understand.

Explanation of the Algorithm 1

The number of VMs depends on the requests and the Service Level Agreement (SLA) with the service provider. This paper initializes a single VM to construct the feature vector. Algorithm 1 takes GoogleNet and the corresponding frames as inputs. Initially, it resizes each frame according to the GoogleNet input layer size and stores the resized image in the Ls variable. After that, the features are extracted from the video frames in a loop. In every iteration, the features are added to a feature vector Fs. When no more frames remain, the algorithm saves the feature vector.
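A minimal Python sketch of this loop may help; the variable names Ls and Fs mirror the algorithm, while the preprocessing details (OpenCV capture, 224x224 resizing, and simple scaling in place of full ImageNet normalization) are our own assumptions:

```python
import cv2
import torch
from torchvision import models

def build_feature_vector(video_path, out_path="features.pt"):
    """Sketch of Algorithm 1: extract one GoogleNet feature per frame."""
    backbone = models.googlenet(weights="DEFAULT")
    backbone.fc = torch.nn.Identity()         # take the pooled (19th-layer) features
    backbone.eval()
    cap = cv2.VideoCapture(video_path)
    Fs = []                                   # feature-vector accumulator
    while True:
        ok, frame = cap.read()
        if not ok:                            # no more frames remain: stop
            break
        Ls = cv2.resize(frame, (224, 224))    # match GoogleNet's input layer size
        x = torch.from_numpy(Ls).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            Fs.append(backbone(x).squeeze(0)) # one 1024-d feature per frame
    cap.release()
    torch.save(torch.stack(Fs), out_path)     # save the feature vector
```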

Explanation of the Algorithm 2

The HAR-CPS algorithm takes the CCTV video stream and an HTTP Live Streaming (HLS) request as inputs. Initially, it initializes a variable i, reads frames from the video stream, and stores the first frame in the F[i] array. While frames are available, the while loop remains active. In this loop, the HLS request is accepted for each frame, and the frames are continuously read and stored in the F[i] array. The frame difference is calculated by taking the difference between two successive frames. If there is more than a 70% difference between two frames, the proposed HAR-CPS algorithm sends the frame to the GoogleNet-BiLSTM hybrid network. This network classifies the frame and returns the predicted class with a confidence score. If the confidence score is higher than 80%, an alarm is generated according to the identified action. Otherwise, the HAR-CPS algorithm does not take any action.
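The control flow can be sketched as follows. The two thresholds come from the paper, whereas the difference metric (the fraction of changed pixels) and the callback interfaces are our own assumptions:

```python
import numpy as np

DIFF_THRESHOLD = 0.70   # the paper's 70% frame-difference threshold
CONF_THRESHOLD = 0.80   # the paper's 80% confidence threshold

def har_cps(stream, classify, raise_alarm):
    """Sketch of the HAR-CPS loop: `stream` yields frames as numpy
    arrays and `classify` wraps the GoogleNet-BiLSTM network."""
    prev = None
    for frame in stream:                         # one HLS request per frame
        if prev is not None:
            diff = np.abs(frame.astype(np.int16) - prev.astype(np.int16))
            changed = (diff > 0).mean()          # fraction of pixels that changed
            if changed > DIFF_THRESHOLD:         # >70%: send frame to the cloud
                label, confidence = classify(frame)
                if confidence > CONF_THRESHOLD:  # >80%: act on the prediction
                    raise_alarm(label)
        prev = frame
```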

We again thank the esteemed reviewer for this valuable suggestion. The modification has been highlighted in the revised paper.

Suggestion 5

How many VMs does Algorithm 1 use?

Authors’ Response

This is a very good question, and we believe the readers will have it as well. That is why we have explained the number of VMs in Algorithm 1. In the experimental setup, it initializes a single VM; however, the number is determined by the Service Level Agreement (SLA) between the user and the cloud service provider. In our case, it is one.

The modification has been highlighted in the revised paper.

Suggestion 6

Explaining the variables used in equations (2) - (4) will be helpful to the readers.

Authors’ Response

Thank you for this outstanding suggestion. We have explained each and every variable used in equations 2, 3, and 4 and updated the manuscript accordingly.

The modification has been highlighted in the revised paper.

Suggestion 7

Does the 2-D subtraction happen in the Edge node or on the cloud? Maybe explaining figure 5 will be helpful.

Authors’ Response

Thank you for the thoughtful question. There is no doubt that the readers will have the same question. At the same time, we would like to thank the esteemed reviewer for suggesting that we explain figure 5.

The 2-D subtraction is done at the Edge node, and if the difference crosses the threshold, the frame is sent to the GoogleNet-BiLSTM hybrid network, which runs on the cloud server.

In response to the suggestion to explain figure 5, we have added the following explanation to the paper with the necessary citations:

The Pi Server in figure 5 is the subscriber in the subscriber-publisher messaging system. It uses the Remote Procedure Call (RPC) protocol to communicate with the Edge server through the Broker model. The same communication protocol is used in the Pi Camera Node (PCN), which is the publisher of the messaging system. The IoT camera node also uses the Broker model to communicate with the Edge server. The Edge server has a persistent storage and log management system, which stores threshold values, network quality information, and every event log. The Network Latency Controller (NLC) in figure 5 is connected to an NTA00002B Nemo Outdoor 5G NR Drive Test sensor manufactured by Keysight Technologies, Inc. It senses 5G network parameters, including bandwidth, throughput, latency, traffic volume, signal intensity, discontinuity, and interference. Depending on the bandwidth demand, availability, and current throughput, the NLC adjusts the knob values of the Mez to maintain a quality-latency trade-off.
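The subscriber-publisher pattern in figure 5 can be illustrated with a toy in-process broker. This is conceptual only; the actual system communicates over the RPC protocol through the Broker model rather than Python callbacks:

```python
from collections import defaultdict

class Broker:
    """Toy broker: routes published messages to topic subscribers."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
# The Pi Server plays the subscriber role...
broker.subscribe("pcn/frames", lambda m: print("Edge server received:", m))
# ...and the Pi Camera Node (PCN) plays the publisher role.
broker.publish("pcn/frames", "frame-0001")
```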

The modification has been highlighted in the revised paper.

Suggestion 8

How does the network controller decide on the quality of the network, and how does this approach use the network quality?

Authors’ Response

We are glad to receive such thought-provoking questions from the reviewer. We believe the readers will have the same question in their minds. That is why we have added the following explanation, which answers the question:

The Network Latency Controller (NLC) in figure 5 is connected to an NTA00002B Nemo Outdoor 5G NR Drive Test sensor manufactured by Keysight Technologies, Inc. It senses 5G network parameters, including bandwidth, throughput, latency, traffic volume, signal intensity, discontinuity, and interference. Depending on the bandwidth demand, availability, and current throughput, the NLC adjusts the knob values of the Mez to maintain a quality-latency trade-off.
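As a rough illustration of the quality-latency trade-off, the NLC's behavior can be thought of along the following lines; the decision rules, thresholds, and knob values shown here are our own assumptions, not Mez's actual interface:

```python
def adjust_knobs(bandwidth_mbps, demand_mbps):
    """Toy sketch: lower the streaming-quality knobs when the measured
    bandwidth falls short of the current demand."""
    headroom = bandwidth_mbps / max(demand_mbps, 1e-9)
    if headroom >= 1.0:
        return {"resolution": "1080p", "fps": 30}  # full quality
    if headroom >= 0.5:
        return {"resolution": "720p", "fps": 20}   # trade quality for latency
    return {"resolution": "480p", "fps": 10}       # prioritize low latency
```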

The modification has been highlighted in the revised paper.

Suggestion 9

Explaining how the cost of providing security is computed as $4.29 will be helpful.

Authors’ Response

We would like to thank the respected reviewer once again for such a valuable suggestion. We have explained how the cost was computed as $4.29 as follows and added it to the manuscript:

According to the SLA with the cloud service provider, and based on the computational resource usage in table 7 of the manuscript, the predicted monthly cost of providing cyber-physical security using the proposed system is only $4.29. This prediction follows the pay-as-you-go payment system and carries a deviation of plus or minus 6.82%.
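The deviation bound translates into a simple cost interval, as this arithmetic shows:

```python
cost = 4.29                          # predicted monthly cost in USD
deviation = 0.0682                   # plus-or-minus 6.82%
low, high = cost * (1 - deviation), cost * (1 + deviation)
print(f"${low:.2f} to ${high:.2f}")  # $4.00 to $4.58
```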

The modification has been highlighted in the revised paper.

We have attached the PDF version of these responses as well. 

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper combines a GoogleNet-BiLSTM network-based classifier with the proposed algorithm and cloud computing to provide security with minimal human intervention.


1) The authors should discuss: what is HAR? What is Cyber-Physical System Security? And what is IoT-Cloud Computing?

2) From this paper, I do not understand why these components are combined or how they are adapted to one another. Please explain clearly.

3) Some approaches based on machine learning have been proposed (in this paper, the MLP [42], basic ML) which can reach an accuracy of ~75%. Please check and update it.

Author Response

Question 1

The authors should discuss: what is HAR? What is Cyber-Physical System Security? And what is IoT-Cloud Computing?

Authors’ Response

We appreciate the reviewer's valuable question. We agree with the reviewer that we should have defined these terminologies in the paper. We have added short definitions of HAR, Cyber-Physical System Security, and IoT-Cloud Computing in the context of the paper as follows:

HAR: HAR is a field in computer vision and machine learning that focuses on recognizing and classifying different human activities.

Cyber-Physical System Security: Cyber-physical system (CPS) security describes safeguarding systems comprising physical and computational resources.

IoT-Cloud Computing: IoT-Cloud Computing is the combination of Internet of Things (IoT) devices and cloud computing services to process, analyze, and store data from IoT devices in a way that is more scalable, flexible, and efficient.

The modifications have been highlighted in the revised manuscript.

Suggestion 1

From this paper, I do not understand why these components are combined or how they are adapted to one another. Please explain clearly.

Authors’ Response

We would like to express our gratitude for such a valuable suggestion. We have added the following paragraph to the introduction to explain the reason behind combining GoogleNet and the BiLSTM network:

BiLSTM networks are well-known for their excellent capability to classify time-dependent variables; however, they are limited in their feature extraction capabilities. GoogleNet, on the other hand, is an excellent CNN for extracting features, but its computational complexity poses a challenge in time-dependent classification. Combining GoogleNet and a BiLSTM network to recognize activities from real-time video streams compensates for the weaknesses of each and makes the classifier more effective.

It improved the quality of the paper. Thank you, respected reviewer, for this valuable suggestion. The modifications have been highlighted in the revised paper.

Suggestion 2

Some approaches based on machine learning have been proposed (in this paper, the MLP [42], basic ML) which can reach an accuracy of ~75%. Please check and update it.

Authors’ Response

We are truly amazed at the keen observation of the esteemed reviewer, and we are delighted to have such a knowledgeable reviewer for our paper. Your questions and suggestions have improved the quality of the paper.

The MLP paper mentioned at citation number 42, which is now citation number 50 in the revised manuscript, defines the Multilayer Perceptron (MLP) model, and we have used that model with our dataset. The accuracy of machine learning models varies with the dataset, feature selection, and training process. To create a level playing field for all of the models in the experiment, we used the same dataset and features for each. In our implementation, the accuracy of the MLP is 65.71%.

We agree with the reviewer that the accuracy of the MLP in the mentioned paper reaches up to 75%. However, that figure is for the experiment conducted in the paper by M. Riedmiller. The accuracy observed in our experiment with the proposed dataset differs from the paper mentioned, and both performance measurements are accurate: depending on the dataset, features, and training method, the accuracy of the same machine-learning model can vary.
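The following illustrative snippet, which uses a public scikit-learn dataset rather than our dataset, demonstrates the point: the same MLP architecture yields different accuracies when the features it is given change:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
for n_features in (64, 8):               # all features vs. a crude subset
    Xtr, Xte, ytr, yte = train_test_split(
        X[:, :n_features], y, test_size=0.3, random_state=0)
    mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                        random_state=0).fit(Xtr, ytr)
    print(n_features, "features -> accuracy:", round(mlp.score(Xte, yte), 3))
```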

The PDF of these responses to the respected reviewer has been attached as well.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

My comments were addressed clearly.
