Article
Peer-Review Record

Detection of Shoplifting on Video Using a Hybrid Network

Computation 2022, 10(11), 199; https://doi.org/10.3390/computation10110199
by Lyudmyla Kirichenko 1,2, Tamara Radivilova 3, Bohdan Sydorenko 1 and Sergiy Yakovlev 4,5,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 11 October 2022 / Revised: 31 October 2022 / Accepted: 1 November 2022 / Published: 6 November 2022

Round 1

Reviewer 1 Report

The paper deals with an interesting topic: shoplifting cases and their detection. Such events happen everywhere, and good control of such cases is important. Therefore, I find the paper to be needed.

The paper consists of six main parts. The first is the Introduction, where the Authors present the background, and the second is a short review of references. In my opinion, these two chapters can be joined, as the first one is very short. The review itself is well done. It presents various approaches to detecting shoplifting cases by means of different methods.

The Authors end these two chapters by stating the problem. In my opinion, this also does not have to be done in a separate chapter. It can be added to the Introduction or to the next one – Materials and Methods.

The fourth chapter – Materials and Methods – presents the core of the methodology used in the paper. First, the input data set is introduced and presented. Next, the Authors chose a neural network to classify video. They selected convolutional and recurrent neural networks, as both are needed to take different features into consideration. The following subchapter deals with the assessment of classification accuracy; the Authors propose fractional expressions for that purpose. In my opinion, the formulas here lack numbering. This should be added so that certain observations can be easily related to certain equations.

The fifth part presents the experiment. The main way the proposed algorithm works is shown, and the Authors briefly discuss each step of the methodology. In my opinion, this chapter can be joined with the previous one, as it still presents the methods and the way the experiment was conducted.

The sixth part is dedicated to results and discussion. The chapter is well presented and the obtained results are of good quality. Therefore, it can be stated that the proposed algorithm can be an efficient tool to detect shoplifting cases based on video analysis.

The final part is the conclusions, which should be chapter 7 but is numbered 5 in the text. However, after reformatting the content, these numbers will change.

Generally, the paper is interesting and the presented topic is important. My main concern relates to the paper's structure: there are too many chapters, and some information is not in the right one. However, after a minor revision the paper will be suitable for publication.

Author Response

Dear Reviewer!

Thank you very much for your thorough study of our article and your comments. We have tried to take all of them into account when revising the text.

We strictly adhered to the rules for structuring articles in the journal. If you don't mind, we'd like to leave the section structure as it is.

Best wishes,

Lyudmyla Kirichenko

Tamara Radivilova

Bohdan Sydorenko

Sergiy Yakovlev


Author Response File: Author Response.pdf

Reviewer 2 Report

Dear authors, 

Thanks for submitting to the journal in these uncertain times. I applaud your commitments. 

Summary: The paper proposes a CNN+RNN-based approach for detecting shoplifting in real time. The CNN model is used to extract feature vectors, which are then fed to the RNN to make a time-series decision on whether the activity is shoplifting or not. The task is a binary classification problem, and the authors customized the dataset, as the number of available samples was not large.

Questions: 

1) How are features extracted from the CNN model? What layer in the CNN model is used? How is it reduced to (30,960) dimension before feeding to the RNN model?

2) What are the hyperparameters used for training the network? This disclosure is a must.

3) What optimizer was used for training? When was the decision to stop training taken? Have you considered overfitting?


Author Response

Dear Reviewer!

Thank you very much for your thorough study of our article and your comments. We have tried to take all of them into account when revising the text.


1) How are features extracted from the CNN model? What layer in the CNN model is used? How is it reduced to (30,960) dimension before feeding to the RNN model?

Transfer learning was used to extract features; specifically, the MobileNetV3Large CNN from Keras (TensorFlow API), pretrained on the ImageNet dataset. Since we use a pretrained model, its weights are loaded along with the model architecture, and these pretrained parameters reduce the dimension from 30x224x224x3 (30 frames at a resolution of 224x224 with a three-color (RGB) model) to 30x960. For feature extraction, the neural network is used without the fully connected layer at the top, so we obtain the set of features before they would go into a prediction.
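This step can be sketched as follows (a minimal illustration, not our exact code; in practice weights="imagenet" would be passed, it is set to None here only so the sketch runs without downloading the pretrained weights):

```python
# Feature extraction with MobileNetV3Large, as described above:
# the classification head is removed (include_top=False) and global
# average pooling maps each 224x224x3 frame to a 960-feature vector.
import numpy as np
import tensorflow as tf

extractor = tf.keras.applications.MobileNetV3Large(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the fully connected top layer
    pooling="avg",       # global average pooling -> 960 features per frame
    weights=None,        # use weights="imagenet" in practice
)

clip = np.random.rand(30, 224, 224, 3).astype("float32")  # one 30-frame clip
features = extractor.predict(clip, verbose=0)
print(features.shape)  # (30, 960)
```

Treating the 30 frames as a batch is what turns one 30x224x224x3 clip into a 30x960 feature sequence for the recurrent classifier.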

The list of layers used in the MobileNetV3Large neural network is presented in the table below, taken from the original article: https://arxiv.org/pdf/1905.02244.pdf.


2) What are the hyperparameters used for training the network. This disclosure is a must.

One of the few hyperparameters of the convolutional neural network available for selection was the dimension of the input video fragment; it was chosen based on the parameters of the input layer. We resized all instances to 224x224x3, i.e., a resolution of 224x224 with a three-color (RGB) model. Also, as previously specified, we removed the top layer in order to perform feature extraction. At the output, we have a vector of dimension 1x960 per frame, that is, 960 features, which is a fixed value for this neural network.

The following hyperparameters were selected through a large number of experiments: batch_size = 64, epochs = 60, a train/test data ratio of 70/30, and accuracy as the training quality metric.

Another important and necessary part of hyperparameter selection is the construction of the architecture of the sequence model (the recurrent classifier); its layers are as follows:

  1. At the input we have a GRU layer with 32 units.
  2. Next comes the GRU layer with 16 units.
  3. Dropout layer with a rate of 0.4 to reduce overfitting.
  4. Fully connected layer with 8 neurons with relu activation function.
  5. And the output is a fully connected layer with 1 neuron and sigmoid activation function, which gives the probability of theft.

Using the Adam optimizer (Adaptive Moment Estimation) from Keras (TensorFlow API), we have a learning rate of 0.001 by default, and the stochastic gradient descent momentum is adjusted adaptively, as determined by the Adam optimizer.

Since we have only two classes, binary_crossentropy was chosen as the loss for the recurrent neural network.
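The five layers above, together with the Adam optimizer and binary_crossentropy loss, can be sketched in Keras as follows (a minimal illustration assuming the Sequential API; our exact code may differ):

```python
# Recurrent classifier over 30-frame sequences of 960 CNN features,
# following steps 1-5 above.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(30, 960)),          # 30 frames x 960 features each
    layers.GRU(32, return_sequences=True),  # 1. GRU with 32 units
    layers.GRU(16),                         # 2. GRU with 16 units
    layers.Dropout(0.4),                    # 3. dropout (rate 0.4) against overfitting
    layers.Dense(8, activation="relu"),     # 4. fully connected, 8 neurons, relu
    layers.Dense(1, activation="sigmoid"),  # 5. probability of theft
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
print(model.output_shape)  # (None, 1)
```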

3) What optimizer was used for training? When was the decision to stop training taken? Have you considered overfitting?

The Adam optimizer was chosen to train the recurrent neural network. Using the checkpoints that the TensorFlow API provides, the weights with the best value of the metric (accuracy) on the validation set were saved. Further, on the accuracy-per-epoch plot, we identified the region in which the metric values on the validation set no longer grew, while on the training set they approached ~1. Based on this, it was decided to reduce the number of epochs to 60; in this way we avoided overfitting.
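The checkpointing described here can be sketched with a Keras callback (a minimal illustration; the output filename is hypothetical, and the fit call is shown only as a comment):

```python
# Save the weights with the best validation accuracy seen during
# training, as described above.
import tensorflow as tf

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best.weights.h5",        # hypothetical output path
    monitor="val_accuracy",   # track accuracy on the validation set
    save_best_only=True,      # keep only the best-performing weights
    save_weights_only=True,
)

# model.fit(x_train, y_train, validation_split=0.3,
#           batch_size=64, epochs=60, callbacks=[checkpoint])
```

Comparing the saved best epoch against the accuracy-per-epoch plot is what let us cut training to 60 epochs before the training/validation curves diverged.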

Most of the answers have been added to the article.

Best wishes,

Lyudmyla Kirichenko

Tamara Radivilova

Bohdan Sydorenko

Sergiy Yakovlev

Author Response File: Author Response.pdf
