Shoplifting Detection Using Hybrid Neural Network CNN-BiLSMT and Development of Benchmark Dataset
Abstract
1. Introduction
1.1. Contribution of This Study
1.2. Organization of This Study
2. Related Work
3. Dataset Development Process
3.1. Data Collection
- Picking up items and placing them in their pockets.
- Boys tucking items into their shirts.
- Boys putting items into their jackets.
- Boys placing items inside their college bags.
3.2. Data Annotation and Statistics
3.3. Images from Dataset
3.4. Comparison with Existing Dataset
4. Methods
4.1. Baseline Methods
4.1.1. Two-Dimensional Convolutional Neural Network
4.1.2. Three-Dimensional Convolutional Neural Network
4.2. Proposed Method
5. Experimental Setup
6. Results and Discussion
- Inception V3 processes images at multiple scales. This allows the network to capture features at different levels of abstraction, which helps improve its accuracy.
- Inception V3 combines convolutional layers with different kernel sizes and pooling layers to reduce the number of parameters in the network while still maintaining high accuracy. This efficient use of parameters allows the network to train faster, even with less data.
- Inception V3 includes auxiliary classifiers that are used during training to encourage the network to learn more useful features, which helps improve the overall accuracy of the network.
- Inception V3 uses various regularization techniques, such as dropout and weight decay, to prevent overfitting and improve the generalization performance of the network.
- BiLSTMs can capture long-term dependencies in sequential data by using recurrent connections that allow information to flow through the network from one time step to the next.
- BiLSTMs process the input sequence in both forward and backward directions, allowing the network to capture information from both past and future contexts.
- BiLSTMs use memory cells to store information for later use. This allows the network to selectively remember or forget information based on the current input and the state of the network.
- BiLSTMs can be trained using techniques such as gradient clipping and dropout to prevent overfitting and improve generalization performance. These techniques help to prevent the network from becoming too specialized for the training data and allow it to perform well on new, unseen data.
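The BiLSTM properties listed above (gated memory cells, and forward plus backward passes over the sequence) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the scalar cell, the toy weights `W_FWD` and `W_BWD`, and the `frame_features` sequence are assumptions made only so the example runs, with each number standing in for a per-frame CNN feature.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w):
    """One LSTM step with scalar input x, hidden state h, and memory cell c."""
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate memory
    c_new = f * c + i * g          # selectively forget old and store new content
    h_new = o * math.tanh(c_new)   # expose part of the memory as the hidden state
    return h_new, c_new


def run_lstm(seq, w):
    """Run the cell over seq and return the hidden state at every time step."""
    h, c, states = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, w)
        states.append(h)
    return states


def bilstm(seq, w_fwd, w_bwd):
    """Pair each step's forward (past) state with its backward (future) state."""
    forward = run_lstm(seq, w_fwd)
    backward = run_lstm(seq[::-1], w_bwd)[::-1]  # re-align to the original order
    return list(zip(forward, backward))


# Hypothetical toy weights, chosen only so the example runs.
W_FWD = {"wf": 0.5, "uf": 0.1, "bf": 0.0, "wi": 0.6, "ui": 0.2, "bi": 0.0,
         "wo": 0.7, "uo": 0.3, "bo": 0.0, "wg": 0.8, "ug": 0.4, "bg": 0.0}
W_BWD = {k: v * 0.9 for k, v in W_FWD.items()}

frame_features = [0.2, -0.1, 0.4, 0.9]  # stand-in for per-frame CNN features
outputs = bilstm(frame_features, W_FWD, W_BWD)
```

Changing only the last element of `frame_features` leaves the forward state at the first step untouched but alters the backward state there, which is the "future context" effect described in the list.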
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Acknowledgments
Conflicts of Interest
References
- Arroyo, R.; Yebes, J.J.; Bergasa, L.M.; Daza, I.G.; Almazán, J. Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls. Expert Syst. Appl. 2015, 42, 7991–8005.
- Ansari, M.A.; Singh, D.K. An expert eye for identifying shoplifters in mega stores. In International Conference on Innovative Computing and Communications: Proceedings of ICICC 2021; Springer: Singapore, 2021; Volume 3, pp. 107–115.
- Kirichenko, L.; Radivilova, T.; Sydorenko, B.; Yakovlev, S. Detection of Shoplifting on Video Using a Hybrid Network. Computation 2022, 10, 199.
- Gaur, K.D. Textbook on the Indian Penal Code; Universal Law Publishing: Boca Raton, FL, USA, 2009.
- Singh, D.K. Human action recognition in video. In Proceedings of the Advanced Informatics for Computing Research: Second International Conference, ICAICR 2018, Shimla, India, 14–15 July 2018; Revised Selected Papers, Part I 2. pp. 54–66.
- Singh, D.K.; Kushwaha, D.S. Tracking movements of humans in a real-time surveillance scene. In Proceedings of Fifth International Conference on Soft Computing for Problem Solving; SocProS 2015; Thapar University: Patiala, India, 2016; Volume 2, pp. 491–500.
- Kirichenko, L.; Radivilova, T. Analyzes of the distributed system load with multifractal input data flows. In Proceedings of the 2017 14th IEEE International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), Lviv, Ukraine, 21–25 February 2017; pp. 260–264.
- Szentannai, K.; Al-Afandi, J.; Horváth, A. Mimosanet: An unrobust neural network preventing model stealing. arXiv 2019, arXiv:1907.01650.
- Sultani, W.; Chen, C.; Shah, M. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 14–19 June 2020; pp. 6479–6488.
- Arunnehru, J.; Chamundeeswari, G.; Bharathi, S.P. Human action recognition using 3D convolutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput. Sci. 2018, 133, 471–477.
- Martínez-Mascorro, G.A.; Abreu-Pederzini, J.R.; Ortiz-Bayliss, J.C.; Garcia-Collantes, A.; Terashima-Marín, H. Criminal intention detection at early stages of shoplifting cases by using 3D convolutional neural networks. Computation 2021, 9, 24.
- Amin, J.; Anjum, M.A.; Ibrar, K.; Sharif, M.; Kadry, S.; Crespo, R.G. Detection of anomaly in surveillance videos using quantum convolutional neural networks. Image Vis. Comput. 2023, 135, 104710.
- Ansari, M.A.; Singh, D.K. ESAR, An Expert Shoplifting Activity Recognition System. Cybern. Inf. Technol. 2022, 22, 190–200.
- Yamato, Y.; Fukumoto, Y.; Kumazaki, H. Security camera movie and ERP data matching system to prevent theft. In Proceedings of the 2017 14th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2017; pp. 1014–1015.
- Tsushita, H.; Zin, T.T. A study on detection of abnormal behavior by a surveillance camera image. In Proceedings of the First International Conference on Big Data Analysis and Deep Learning; University of Miyazaki Japan: Miyazaki, Japan, 2019; pp. 284–291.
- Nasaruddin, N.; Muchtar, K.; Afdhal, A.; Dwiyantoro, A.P.J. Deep anomaly detection through visual attention in surveillance videos. J. Big Data 2020, 7, 87.
- Zhiqiang, W.; Jun, L. A review of object detection based on convolutional neural network. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 11104–11109.
- Chung, H.-Y.; Chung, Y.-L.; Tsai, W.-F. An efficient hand gesture recognition system based on deep CNN. In Proceedings of the 2019 IEEE International Conference on Industrial Technology (ICIT), Melbourne, VIC, Australia, 13–15 February 2019; pp. 853–858.
- Wu, Y.; Zheng, B.; Zhao, Y. Dynamic gesture recognition based on LSTM-CNN. In Proceedings of the 2018 Chinese Automation Congress (CAC), Xi’an, China, 30 November–2 December 2018; pp. 2446–2450.
- Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 2021, 10, 2470.
- Li, Q.; Cai, W.; Wang, X.; Zhou, Y.; Feng, D.D.; Chen, M. Medical image classification with convolutional neural network. In Proceedings of the 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Singapore, 10–12 December 2014; pp. 844–848.
- Gao, H.; Cheng, B.; Wang, J.; Li, K.; Zhao, J.; Li, D. Object classification using CNN-based fusion of vision and LIDAR in autonomous vehicle environment. IEEE Trans. Ind. Inform. 2018, 14, 4224–4231.
- Sharma, S.; Sharma, S.; Athaiya, A. Activation functions in neural networks. Towards Data Sci. 2017, 6, 310–316.
- Gong, X.; Tang, B.; Zhu, R.; Liao, W.; Song, L. Data augmentation for electricity theft detection using conditional variational auto-encoder. Energies 2020, 13, 4291.
- Debella-Gilo, M.; Gjertsen, A.K. Mapping seasonal agricultural land use types using deep learning on Sentinel-2 image time series. Remote Sens. 2021, 13, 289.
- Li, L.; Ota, K.; Dong, M. Sustainable CNN for robotic: An offloading game in the 3D vision computation. IEEE Trans. Sustain. Comput. 2018, 4, 67–76.
- Ouyang, X.; Xu, S.; Zhang, C.; Zhou, P.; Yang, Y.; Liu, G.; Li, X. A 3D-CNN and LSTM based multi-task learning architecture for action recognition. IEEE Access 2019, 7, 40757–40770.
- Eckle, K.; Schmidt-Hieber, J. A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw. 2019, 110, 232–242.
- Niyas, S.; Vaisali, S.C.; Show, I.; Chandrika, T.; Vinayagamani, S.; Kesavadas, C.; Rajan, J. Segmentation of focal cortical dysplasia lesions from magnetic resonance images using 3D convolutional neural networks. Biomed. Signal Process. Control. 2021, 70, 102951.
- He, X.; Yang, X.; Zhang, S.; Zhao, J.; Zhang, Y.; Xing, E.; Xie, P. Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. medRxiv 2020, medRxiv:2020.04.13.20063941.
- Sam, S.M.; Kamardin, K.; Sjarif, N.N.A.; Mohamed, N. Offline signature verification using deep learning convolutional neural network (CNN) architectures GoogLeNet inception-v1 and inception-v3. Procedia Comput. Sci. 2019, 161, 475–483.
- Tao, D.; Wen, Y.; Hong, R. Multicolumn bidirectional long short-term memory for mobile devices-based human activity recognition. IEEE Internet Things J. 2016, 3, 1124–1134.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Ansari, M.A.; Singh, D.K. Optimized Parameter Tuning in a Recurrent Learning Process for Shoplifting Activity Classification. Cybern. Inf. Technol. 2023, 23, 141–160.
- Zhang, Z. Improved adam optimizer for deep neural networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Oslo, Norway, 4–6 June 2018; pp. 1–2.
- Tong, W.; Xie, Q.; Hong, H.; Shi, L.; Fang, H.; Perkins, R. Assessment of prediction confidence and domain extrapolation of two structure–activity relationship models for predicting estrogen receptor binding activity. Environ. Health Perspect. 2004, 112, 1249–1254.
- Akhtar, N.; Saddique, M.; Asghar, K.; Bajwa, U.I.; Hussain, M.; Habib, Z. Digital video tampering detection and localization: Review, representations, challenges and algorithm. Mathematics 2022, 10, 168.
- Saddique, M.; Asghar, K.; Bajwa, U.I.; Hussain, M.; Aboalsamh, H.A.; Habib, Z. Classification of authentic and tampered video using motion residual and parasitic layers. IEEE Access 2020, 8, 56782–56797.
- Asghar, K.; Sun, X.; Rosin, P.L.; Saddique, M.; Hussain, M.; Habib, Z. Edge–texture feature-based image forgery detection with cross-dataset evaluation. Mach. Vis. Appl. 2019, 30, 1243–1262.
Datasets | Shoplifting | Non-Shoplifting | Number of Videos | Dataset Length | Average Frames/s |
---|---|---|---|---|---|
Arroyo et al. [1] | 155 | 755 | 910 | 2730 s | 10 |
Ansari et al. [2] | 87 | 88 | 175 | 1750 s | 15 |
UCF-Crime [3] | 28 | 1872 | 1900 | 460,800 s | 30 |
Developed Dataset | 450 | 450 | 900 | 2700 s | 30 |
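To make the comparison easier to read, the short sketch below derives two quantities from the table: the share of shoplifting videos and the mean clip length. The numbers are copied from the rows above; the helper name `summarize` and the dictionary layout are illustrative choices, not part of the study.

```python
# (shoplifting videos, non-shoplifting videos, total length in seconds),
# copied from the dataset comparison table above.
datasets = {
    "Arroyo et al. [1]": (155, 755, 2730),
    "Ansari et al. [2]": (87, 88, 1750),
    "UCF-Crime [3]": (28, 1872, 460800),
    "Developed Dataset": (450, 450, 2700),
}


def summarize(shoplifting, non_shoplifting, length_s):
    """Derive class balance and mean clip length for one dataset row."""
    total = shoplifting + non_shoplifting
    return {
        "videos": total,
        "shoplifting_share": shoplifting / total,
        "mean_clip_s": length_s / total,
    }


summary = {name: summarize(*row) for name, row in datasets.items()}

for name, s in summary.items():
    print(f"{name}: {s['videos']} videos, "
          f"{s['shoplifting_share']:.1%} shoplifting, "
          f"{s['mean_clip_s']:.1f} s per clip")
```

On these figures the developed dataset is exactly balanced and the dataset of Ansari et al. is nearly so, whereas shoplifting clips make up under 2% of UCF-Crime.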
Methods | Convolutional Layers | Activation Units | Training Accuracy (%) | Validation Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | AUC |
---|---|---|---|---|---|---|---|---|
2D CNN | 4 | ReLU, Softmax | 50.00 | 45.00 | 51.00 | 50.00 | 50.40 | 0.49 |
3D CNN | 8 | ReLU, Softmax | 60.85 | 55.38 | 66.60 | 58.80 | 61.80 | 0.57 |
Proposed Method | 16 | ReLU, Softmax | 82.01 | 81.00 | 88.80 | 78.40 | 83.01 | 0.88 |
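As a reading aid for the F1 column, the sketch below recomputes F1 as the harmonic mean of the precision and recall reported in the table. This is the standard definition; whether the study used exactly this formula or a different averaging convention is an assumption, and the names `reported`, `f1_score`, and `gaps` are illustrative.

```python
# (precision %, recall %, reported F1 %), copied from the results table above.
reported = {
    "2D CNN": (51.00, 50.00, 50.40),
    "3D CNN": (66.60, 58.80, 61.80),
    "Proposed Method": (88.80, 78.40, 83.01),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recomputed = {m: f1_score(p, r) for m, (p, r, _) in reported.items()}
gaps = {m: abs(recomputed[m] - reported[m][2]) for m in reported}

for method in reported:
    print(f"{method}: recomputed F1 = {recomputed[method]:.2f}, "
          f"reported = {reported[method][2]:.2f}")
```

The recomputed values land within one percentage point of the reported column; small differences could come from rounding or from how the scores were averaged.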
Actual | Predicted Shoplifting | Predicted Non-Shoplifting |
---|---|---|
Shoplifting | 230 | 220 |
Non-Shoplifting | 230 | 220 |

Actual | Predicted Shoplifting | Predicted Non-Shoplifting |
---|---|---|
Shoplifting | 300 | 150 |
Non-Shoplifting | 210 | 240 |

Actual | Predicted Shoplifting | Predicted Non-Shoplifting |
---|---|---|
Shoplifting | 400 | 50 |
Non-Shoplifting | 110 | 340 |
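The three confusion matrices above can be turned into summary rates with a few lines of Python. This is a minimal sketch: the counts are copied from the matrices in the order they appear, and the function name `rates` and its keys are illustrative. If rows are actual classes and columns are predictions, as the headers suggest, `row_rate` is the recall and `column_rate` the precision for the Shoplifting class; with the axes transposed the two swap.

```python
# Each matrix is (row 1, row 2), with rows and columns ordered
# Shoplifting, Non-Shoplifting, copied from the matrices above.
matrices = [
    ((230, 220), (230, 220)),
    ((300, 150), (210, 240)),
    ((400, 50), (110, 340)),
]


def rates(matrix):
    """Accuracy plus row-wise and column-wise rates for the first class."""
    (a, b), (c, d) = matrix
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "row_rate": a / (a + b),     # share of first-row samples in column 1
        "column_rate": a / (a + c),  # share of first-column samples in row 1
    }


results = [rates(m) for m in matrices]

for index, r in enumerate(results, start=1):
    print(f"matrix {index}: accuracy = {r['accuracy']:.1%}, "
          f"row rate = {r['row_rate']:.1%}, "
          f"column rate = {r['column_rate']:.1%}")
```

The resulting accuracies (50.0%, 60.0%, and about 82.2%) line up, in order, with the training accuracies reported for the 2D CNN, the 3D CNN, and the proposed method.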
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Muneer, I.; Saddique, M.; Habib, Z.; Mohamed, H.G. Shoplifting Detection Using Hybrid Neural Network CNN-BiLSMT and Development of Benchmark Dataset. Appl. Sci. 2023, 13, 8341. https://doi.org/10.3390/app13148341