Review on Deep Learning Approaches for Anomaly Event Detection in Video Surveillance
Abstract
1. Introduction
- A review of the most relevant state-of-the-art contributions from the last four years that apply DL to the AD problem.
- A detailed categorization of existing AD methods according to the specific DL methods and the architectural models adopted for AD.
- A comprehensive analysis of the DL architectures used in AD, intended to help researchers choose the approach best suited to a particular AD application.
- A discussion of the performance of the reviewed methodologies in terms of datasets and performance measures.
- A discussion of the current challenges and needs in the domain of DL applied to AD.
- A description of new trends in DL-based AD, offering several ideas to be considered in future research.
Survey Methodology
2. Classifications of Anomaly Detection
2.1. Image-Based and Video-Based Detection
2.2. Single-Point Anomaly and Group Anomalies
3. DL Methods of AD
3.1. Supervised Learning-Based AD for Video Streaming
3.2. Semi-Supervised Learning-Based AD for Video Streaming
3.3. Unsupervised Learning-Based AD for Video Streaming
3.4. Transfer Learning-Based AD for Video Streaming
3.5. Deep Active Learning-Based AD for Video Streaming
3.6. Deep Reinforcement Learning-Based AD
3.7. Deep Hybrid Models-Based AD for Video
4. DL Architectures for AD
4.1. Two-Stream Convolutional Architecture (Dual-Stream CNNs)
4.2. 3D Convolution Architecture (3D-ConvNet)
4.3. ConvLSTM Architecture
4.4. Using Human Skeleton Data
4.5. Miscellaneous Architectures
5. Benchmark Datasets
6. Anomaly Detection Approach Performance Metrics
7. Applications of AD for Video
7.1. Autonomous Driving
7.2. Automated Surveillance
7.3. Industrial Automation
7.4. Medical AD
8. Research Challenges in DL-Based AD Approaches
8.1. Anomaly Characteristics
8.2. Anomaly Definition
8.3. Environmental Factors
8.4. Division of Dataset
8.5. Data Diversity
8.6. Data Annotation
8.7. Feature Normalization
8.8. Model Generalization
8.9. Real Time Systems
8.10. Lightweight Models
9. Future Directions
9.1. Aerial Surveillance
9.2. AD from Moving Cameras
9.3. Self-Supervised Learning in Video
9.4. Human–Robot Collaboration
9.5. Ensemble Approaches
10. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Dávila-Montero, S.; Dana-Lê, J.A.; Bente, G.; Hall, A.T.; Mason, A.J. Review and Challenges of Technologies for Real-Time Human Behavior Monitoring. IEEE Trans. Biomed. Circuits Syst. 2021, 15, 2–28.
- Ren, J.; Xia, F.; Liu, Y.; Lee, I. Deep Video Anomaly Detection: Opportunities and Challenges. In Proceedings of the 2021 International Conference on Data Mining Workshops (ICDMW), Auckland, New Zealand, 7–10 December 2021; pp. 959–966.
- Ruff, L.; Kauffmann, J.R.; Vandermeulen, R.A.; Montavon, G.; Samek, W.; Kloft, M.; Dietterich, T.G.; Müller, K.-R. A Unifying Review of Deep and Shallow Anomaly Detection. Proc. IEEE 2021, 109, 756–795.
- Al-Dhamari, A.; Sudirman, R.; Mahmood, N.H. Transfer Deep Learning along with Binary Support Vector Machine for Abnormal Behavior Detection. IEEE Access 2020, 8, 61085–61095.
- Yuan, J.; Wu, X.; Yuan, S. A Rapid Recognition Method for Pedestrian Abnormal Behavior. In Proceedings of the 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL), Chongqing, China, 10–12 July 2020; pp. 241–245.
- Bian, C.; Wang, L.; Gu, H.; Zhou, F. Abnormal Behavior Recognition Based on Edge Feature and 3D Convolutional Neural Network. In Proceedings of the 2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Zhanjiang, China, 16–18 October 2020; pp. 1–6.
- Gorodnichev, M.G.; Gromov, M.D.; Polyantseva, K.A.; Moseva, M.S. Research and Development of a System for Determining Abnormal Human Behavior by Video Image Based on Deepstream Technology. In Proceedings of the 2022 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), Sankt Petersburg, Russia, 31 May–4 June 2022; pp. 1–9.
- Cao, B.; Xia, H.; Liu, Z. A Video Abnormal Behavior Recognition Algorithm Based on Deep Learning. In Proceedings of the 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 18–20 June 2021; Volume 4, pp. 755–759.
- Vrskova, R.; Hudec, R.; Kamencay, P.; Sykora, P. Recognition of Human Activity and Abnormal Behavior Using Deep Neural Network. In Proceedings of the 2022 14th International Conference Elektro, Krakow, Poland, 23–26 May 2022; pp. 1–4.
- Fan, B.; Li, P.; Jin, S.; Wang, Z. Anomaly Detection Based on Pose Estimation and GRU-FFN. In Proceedings of the 2021 IEEE Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 23–25 December 2021; pp. 3821–3825.
- Traoré, A.; Akhloufi, M.A. Violence Detection in Videos Using Deep Recurrent and Convolutional Neural Networks. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 December 2020; pp. 154–159.
- Emad, M.; Ishack, M.; Ahmed, M.; Osama, M.; Salah, M.; Khoriba, G. Early-Anomaly Prediction in Surveillance Cameras for Security Applications. In Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 26–27 May 2021; pp. 124–128.
- Chexia, Z.; Tan, Z.; Wu, D.; Ning, J.; Zhang, B. A Generalized Model for Crowd Violence Detection Focusing on Human Contour and Dynamic Features. In Proceedings of the 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Taormina, Italy, 16–19 May 2022; pp. 327–335.
- Zhang, W.; Miao, Z.; Xu, W. A Video Anomalous Behavior Detection Method Based on Multi-Task Learning. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp. 396–400.
- Alkanat, T.; Groot, H.G.J.; Zwemer, M.; Bondarev, E.; de Peter, H.N. Towards Scalable Abnormal Behavior Detection in Automated Surveillance. In Proceedings of the 2021 4th International Conference on Artificial Intelligence for Industries (AI4I), Laguna Hills, CA, USA, 20–22 September 2021; pp. 21–24.
- Tang, X.; Astle, Y.S.; Freeman, C. Deep Anomaly Detection with Ensemble-Based Active Learning. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 1663–1670.
- Kabir, M.M.; Safir, F.B.; Shahen, S.; Maua, J.; Binte Awlad, I.A.; Mridha, M.F. Human Abnormality Classification Using Combined CNN-RNN Approach. In Proceedings of the HONET 2020—IEEE 17th International Conference on Smart Communities: Improving Quality of Life using ICT, IoT and AI, Charlotte, NC, USA, 14–16 December 2020; pp. 204–208.
- Heo, T.; Nam, W.; Paek, J.; Ko, J. Autonomous Reckless Driving Detection Using Deep Learning on Embedded GPUs. In Proceedings of the 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Delhi, India, 10–13 December 2020; pp. 464–472.
- Xiao, Y.; Wang, Y.; Li, W.; Sun, M.; Shen, X.; Luo, Z. Monitoring the Abnormal Human Behaviors in Substations Based on Probabilistic Behaviours Prediction and YOLO-V5. In Proceedings of the 2022 7th Asia Conference on Power and Electrical Engineering (ACPEE), Hangzhou, China, 15–17 April 2022; pp. 943–948.
- Shi, Y.; Guo, B.; Xu, Y.; Xu, Z.; Huang, J.; Lu, J.; Yao, D. Recognition of Abnormal Human Behavior in Elevators Based on CNN. In Proceedings of the 2021 26th International Conference on Automation and Computing (ICAC), Portsmouth, UK, 2–4 September 2021; pp. 1–6.
- Pawar, K.; Attar, V. Application of Deep Learning for Crowd Anomaly Detection from Surveillance Videos. In Proceedings of the 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 28–29 January 2021; pp. 506–511.
- Wang, Z.; Jiang, K.; Hou, Y.; Dou, W.; Zhang, C.; Huang, Z.; Guo, Y. A Survey on Human Behavior Recognition Using Channel State Information. IEEE Access 2019, 7, 155986–156024.
- Li, J.; Xie, H.; Zang, Z.; Wang, G. Real-Time Abnormal Behavior Recognition and Monitoring System Based on Panoramic Video. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7129–7134.
- Marsiano, A.F.D.; Soesanti, I.; Ardiyanto, I. Deep Learning-Based Anomaly Detection on Surveillance Videos: Recent Advances. In Proceedings of the 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA), Yogyakarta, Indonesia, 20–21 September 2019; pp. 1–6.
- Chalapathy, R.; Chawla, S. Deep Learning for Anomaly Detection: A Survey. arXiv 2019, arXiv:1901.03407.
- Pawar, K.; Attar, V. Deep Learning Approaches for Video-Based Anomalous Activity Detection. World Wide Web 2019, 22, 571–601.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 1–74.
- Nayak, R.; Pati, U.C.; Das, S.K. A Comprehensive Review on Deep Learning-Based Methods for Video Anomaly Detection. Image Vis. Comput. 2021, 106, 104078.
- Alzubaidi, L.; Al-Shamma, O.; Fadhel, M.A.; Farhan, L.; Zhang, J.; Duan, Y. Optimizing the Performance of Breast Cancer Classification by Employing the Same Domain Transfer Learning from Hybrid Deep Convolutional Neural Network Model. Electronics 2020, 9, 445.
- Ali, L.R.; Jebur, S.A.; Jahefer, M.M.; Shaker, B.N. Employing Transfer Learning for Diagnosing COVID-19 Disease. Int. J. Onl. Eng. 2022, 18, 31–42.
- Alzubaidi, L.; Al-Amidie, M.; Al-Asadi, A.; Humaidi, A.J.; Al-Shamma, O.; Fadhel, M.A.; Zhang, J.; Santamaría, J.; Duan, Y. Novel Transfer Learning Approach for Medical Imaging with Limited Labeled Data. Cancers 2021, 13, 1590.
- Alzubaidi, L.; Fadhel, M.A.; Al-Shamma, O.; Zhang, J.; Santamaría, J.; Duan, Y.; Oleiwi, S.R. Towards a Better Understanding of Transfer Learning for Medical Imaging: A Case Study. Appl. Sci. 2020, 10, 4523.
- Liu, Y.; Li, Z.; Zhou, C.; Jiang, Y.; Sun, J.; Wang, M.; He, X. Generative Adversarial Active Learning for Unsupervised Outlier Detection. IEEE Trans. Knowl. Data Eng. 2019, 32, 1517–1528.
- Pimentel, T.; Monteiro, M.; Veloso, A.; Ziviani, N. Deep Active Learning for Anomaly Detection. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
- Pang, G.; van den Hengel, A.; Shen, C.; Cao, L. Deep Reinforcement Learning for Unknown Anomaly Detection. arXiv 2020, arXiv:2009.06847.
- Aberkane, S.; Elarbi, M. Deep Reinforcement Learning for Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the 2019 6th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria, 24–25 November 2019; pp. 1–5.
- Zhao, Y.; Deng, B.; Shen, C.; Liu, Y.; Lu, H.; Hua, X.-S. Spatio-Temporal Autoencoder for Video Anomaly Detection. In Proceedings of the 25th ACM international conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1933–1941.
- Naik, A.J.; Gopalakrishna, M.T. Deep-Violence: Individual Person Violent Activity Detection in Video. Multimed. Tools Appl. 2021, 80, 18365–18380.
- Lin, C.-B.; Dong, Z.; Kuan, W.-K.; Huang, Y.-F. A Framework for Fall Detection Based on Openpose Skeleton and Lstm/Gru Models. Appl. Sci. 2020, 11, 329.
- Khayrat, A.; Malak, P.; Victor, M.; Ahmed, S.; Metawie, H.; Saber, V.; Elshalakani, M. An Intelligent Surveillance System for Detecting Abnormal Behaviors on Campus Using YOLO and CNN-LSTM Networks. In Proceedings of the 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 8–9 May 2022; pp. 104–109.
- Vrskova, R.; Hudec, R.; Kamencay, P.; Sykora, P. A New Approach for Abnormal Human Activities Recognition Based on ConvLSTM Architecture. Sensors 2022, 22, 2946.
- Ali, M.A.; Hussain, A.J.; Sadiq, A.T. Deep Learning Algorithms for Human Fighting Action Recognition. Int. J. Online Biomed. Eng. 2022, 18, 71–87.
- Simonyan, K.; Zisserman, A. Two-Stream Convolutional Networks for Action Recognition in Videos. Adv. Neural Inf. Process. Syst. 2014, 27, 1–11.
- Huang, X.; He, P.; Rangarajan, A.; Ranka, S. Intelligent Intersection: Two-Stream Convolutional Networks for Real-Time near-Accident Detection in Traffic Video. ACM Trans. Spat. Algorithms Syst. 2020, 6, 1–28.
- Hao, W.; Zhang, R.; Li, S.; Li, J.; Li, F.; Zhao, S.; Zhang, W. Anomaly Event Detection in Security Surveillance Using Two-Stream Based Model. Secur. Commun. Netw. 2020, 2020, 8876056.
- Jamadandi, A.; Kotturshettar, S.; Mudenagudi, U. Two Stream Convolutional Neural Networks for Anomaly Detection in Surveillance Videos. In Smart Computing Paradigms: New Progresses and Challenges; Springer: Singapore, 2020; pp. 41–48.
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3d Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497.
- Abdali, A.-M.R.; Al-Tuma, R.F. Robust Real-Time Violence Detection in Video Using Cnn and Lstm. In Proceedings of the 2019 2nd Scientific Conference of Computer Sciences (SCCS), Baghdad, Iraq, 27–28 March 2019; pp. 104–108.
- Lin, F.-C.; Ngo, H.-H.; Dow, C.-R.; Lam, K.-H.; Le, H.L. Student Behavior Recognition System for the Classroom Environment Based on Skeleton Pose Estimation and Person Detection. Sensors 2021, 21, 5314.
- Li, S.; Yi, J.; Farha, Y.A.; Gall, J. Pose Refinement Graph Convolutional Network for Skeleton-Based Action Recognition. IEEE Robot. Autom. Lett. 2021, 6, 1028–1035.
- Ali, M.A.; Hussain, A.J.; Sadiq, A.T. Human Fall Down Recognition Using Coordinates Key Points Skeleton. Int. J. Online Biomed. Eng. 2022, 18, 88–104.
- Lathifah, N.; Lin, H.-I. A Brief Review on Behavior Recognition Based on Key Points of Human Skeleton and Eye Gaze To Prevent Human Error. In Proceedings of the 2022 13th Asian Control Conference (ASCC), Jeju Island, Republic of Korea, 4–7 May 2022; pp. 1396–1403.
- Zhang, F.; Bazarevsky, V.; Vakunov, A.; Tkachenka, A.; Sung, G.; Chang, C.-L.; Grundmann, M. Mediapipe Hands: On-Device Real-Time Hand Tracking. arXiv 2020, arXiv:2006.10214.
- Jia, J.-G.; Zhou, Y.-F.; Hao, X.-W.; Li, F.; Desrosiers, C.; Zhang, C.-M. Two-Stream Temporal Convolutional Networks for Skeleton-Based Human Action Recognition. J. Comput. Sci. Technol. 2020, 35, 538–550.
- Agahian, S.; Negin, F.; Köse, C. An Efficient Human Action Recognition Framework with Pose-Based Spatiotemporal Features. Eng. Sci. Technol. Int. J. 2020, 23, 196–203.
- Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A Large Video Database for Human Motion Recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563.
- Patil, P.W.; Murala, S. MSFgNet: A Novel Compact End-to-End Deep Network for Moving Object Detection. IEEE Trans. Intell. Transp. Syst. 2018, 20, 4066–4077.
- Patil, P.W.; Biradar, K.M.; Dudhane, A.; Murala, S. An End-to-End Edge Aggregation Network for Moving Object Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8149–8158.
- Patil, N.; Biswas, P.K. A Survey of Video Datasets for Anomaly Detection in Automated Surveillance. In Proceedings of the 2016 Sixth International Symposium on Embedded Computing and System Design (ISED), Patna, India, 15–17 December 2016; pp. 43–48.
- Fisher, R.B. The PETS04 Surveillance Ground-Truth Data Sets. In Proceedings of the 6th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Prague, Czech Republic, 10 May 2004; pp. 1–5.
- Mehran, R.; Oyama, A.; Shah, M. Abnormal Crowd Behavior Detection Using Social Force Model. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 935–942.
- Mahadevan, V.; Li, W.; Bhalodia, V.; Vasconcelos, N. Anomaly Detection in Crowded Scenes. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1975–1981.
- Blunsden, S.; Fisher, R.B. The BEHAVE Video Dataset: Ground Truthed Video for Multi-Person Behavior Classification. Ann. BMVA 2010, 4, 4.
- Bermejo Nievas, E.; Deniz Suarez, O.; Bueno García, G.; Sukthankar, R. Violence Detection in Video Using Computer Vision Techniques. In Computer Analysis of Images and Patterns; Springer: Cham, Switzerland, 2011; pp. 332–339.
- Hassner, T.; Itcher, Y.; Kliper-Gross, O. Violent Flows: Real-Time Detection of Violent Crowd Behavior. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 1–6.
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild. arXiv 2021, arXiv:1212.0402.
- Lu, C.; Shi, J.; Jia, J. Abnormal Event Detection at 150 Fps in Matlab. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2720–2727.
- Caba Heilbron, F.; Escorcia, V.; Ghanem, B.; Carlos Niebles, J. Activitynet: A Large-Scale Video Benchmark for Human Activity Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 961–970.
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P. The Kinetics Human Action Video Dataset. arXiv 2017, arXiv:1705.06950.
- Luo, W.; Liu, W.; Gao, S. A Revisit of Sparse Coding Based Anomaly Detection in Stacked Rnn Framework. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 341–349.
- Sultani, W.; Chen, C.; Shah, M. Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6479–6488.
- Soliman, M.M.; Kamal, M.H.; Nashed, M.A.E.-M.; Mostafa, Y.M.; Chawky, B.S.; Khattab, D. Violence Recognition from Videos Using Deep Learning Techniques. In Proceedings of the 2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 8–10 December 2019; pp. 80–85.
- Mandal, M.; Vipparthi, S.K. An Empirical Review of Deep Learning Frameworks for Change Detection: Model Design, Experimental Frameworks, Challenges and Research Needs. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6101–6122.
- Lindemann, B.; Maschler, B.; Sahlab, N.; Weyrich, M. A Survey on Anomaly Detection for Technical Systems Using LSTM Networks. Comput. Ind. 2021, 131, 103498.
- Kalsotra, R.; Arora, S. A Comprehensive Survey of Video Datasets for Background Subtraction. IEEE Access 2019, 7, 59143–59171.
- Wu, P.; Liu, J.; Shen, F. A Deep One-Class Neural Network for Anomalous Event Detection in Complex Scenes. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2609–2622.
- Zhao, Y.; Man, K.L.; Smith, J.; Guan, S.-U. A Novel Two-Stream Structure for Video Anomaly Detection in Smart City Management. J. Supercomput. 2022, 78, 3940–3954.
- Ullah, W.; Ullah, A.; Hussain, T.; Muhammad, K.; Heidari, A.A.; Del Ser, J.; Baik, S.W.; De Albuquerque, V.H.C. Artificial Intelligence of Things-Assisted Two-Stream Neural Network for Anomaly Detection in Surveillance Big Video Data. Futur. Gener. Comput. Syst. 2022, 129, 286–297.
- Ohgushi, T.; Horiguchi, K.; Yamanaka, M. Road Obstacle Detection Method Based on an Autoencoder with Semantic Segmentation. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
- Nitsch, J.; Itkina, M.; Senanayake, R.; Nieto, J.; Schmidt, M.; Siegwart, R.; Kochenderfer, M.J.; Cadena, C. Out-of-Distribution Detection for Automotive Perception. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, Indianapolis, IN, USA, 19–22 September 2021; Volume 2021, pp. 2938–2943.
- Ryan, C.; Murphy, F.; Mullins, M. End-to-End Autonomous Driving Risk Analysis: A Behavioural Anomaly Detection Approach. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1650–1662.
- Lindemann, B.; Fesenmayr, F.; Jazdi, N.; Weyrich, M. Anomaly Detection in Discrete Manufacturing Using Self-Learning Approaches. Procedia CIRP 2019, 79, 313–318.
- Maschler, B.; Knodel, T.; Weyrich, M. Towards Deep Industrial Transfer Learning for Anomaly Detection on Time Series Data. In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vasteras, Sweden, 7–10 September 2021; pp. 1–8.
- Aboah, A. A Vision-Based System for Traffic Anomaly Detection Using Deep Learning and Decision Trees. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4207–4212.
- Samuel, D.J.; Cuzzolin, F. Unsupervised Anomaly Detection for a Smart Autonomous Robotic Assistant Surgeon (SARAS) Using a Deep Residual Autoencoder. IEEE Robot. Autom. Lett. 2021, 6, 7256–7261.
- Breitenstein, J.; Termöhlen, J.-A.; Lipinski, D.; Fingscheidt, T. Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches. arXiv 2021, arXiv:2102.05897.
- Ferreira, R.S.; Guérin, J.; Guiochet, J.; Waeselynck, H. SiMOOD: Evolutionary Testing Simulation with Out-Of-Distribution Images. In Proceedings of the 27th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2022), Beijing, China, 22 November–1 December 2022.
- Siddique, A.; Afanasyev, I. Deep Learning-Based Trajectory Estimation of Vehicles in Crowded and Crossroad Scenarios. In Proceedings of the 2021 28th Conference of Open Innovations Association (FRUCT), Moscow, Russia, 27–29 January 2021; pp. 413–423.
- Prati, A.; Shan, C.; Wang, K.I.-K. Sensors, Vision and Networks: From Video Surveillance to Activity Recognition and Health Monitoring. J. Ambient Intell. Smart Environ. 2019, 11, 5–22.
- Bakunah, R.A.; Baneamoon, S.M. A Hybrid Technique for Intelligent Bank Security System Based on Blink Gesture Recognition. J. Phys. Conf. Ser. 2021, 1962, 12001.
- Rego, A.; Ramírez, P.L.G.; Jimenez, J.M.; Lloret, J. Artificial Intelligent System for Multimedia Services in Smart Home Environments. Cluster Comput. 2022, 25, 2085–2105.
- Roth, K.; Pemula, L.; Zepeda, J.; Schölkopf, B.; Brox, T.; Gehler, P. Towards Total Recall in Industrial Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 14318–14328.
- Fernando, T.; Gammulle, H.; Denman, S.; Sridharan, S.; Fookes, C. Deep Learning for Medical Anomaly Detection–A Survey. ACM Comput. Surv. 2021, 54, 1–37.
- Fernando, T.; Denman, S.; Ahmedt-Aristizabal, D.; Sridharan, S.; Laurens, K.R.; Johnston, P.; Fookes, C. Neural Memory Plasticity for Medical Anomaly Detection. Neural Netw. 2020, 127, 67–81.
- Xu, K.; Jiang, X.; Sun, T. Anomaly Detection Based on Stacked Sparse Coding with Intraframe Classification Strategy. IEEE Trans. Multimed. 2018, 20, 1062–1074.
- Akilan, T.; Wu, Q.J.; Safaei, A.; Huo, J.; Yang, Y. A 3D CNN-LSTM-Based Image-to-Image Foreground Segmentation. IEEE Trans. Intell. Transp. Syst. 2019, 21, 959–971.
- Maschler, B.; Weyrich, M. Deep Transfer Learning for Industrial Automation: A Review and Discussion of New Techniques for Data-Driven Machine Learning. IEEE Ind. Electron. Mag. 2021, 15, 65–75.
- Vu, H.; Phung, D.; Nguyen, T.D.; Trevors, A.; Venkatesh, S. Energy-Based Models for Video Anomaly Detection. arXiv 2017, arXiv:1708.05211.
- Miki, D.; Chen, S.; Demachi, K. Unnatural Human Motion Detection Using Weakly Supervised Deep Neural Network. In Proceedings of the 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), Irvine, CA, USA, 21–23 September 2020; pp. 10–13.
- Mehmood, A. LightAnomalyNet: A Lightweight Framework for Efficient Abnormal Behavior Detection. Sensors 2021, 21, 8501.
- Osifeko, M.O.; Hancke, G.P.; Abu-Mahfouz, A.M. SurveilNet: A Lightweight Anomaly Detection System for Cooperative IoT Surveillance Networks. IEEE Sens. J. 2021, 21, 25293–25306.
- Chang, S.; Li, Y.; Shen, S.; Feng, J.; Zhou, Z. Contrastive Attention for Video Anomaly Detection. IEEE Trans. Multimed. 2021, 24, 4067–4076.
- Mandal, M.; Kumar, L.K.; Vipparthi, S.K. Mor-Uav: A Benchmark Dataset and Baselines for Moving Object Recognition in Uav Videos. In Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 12–16 October 2020; pp. 2626–2635.
- Chen, X.; Li, Z.; Yang, Y.; Qi, L.; Ke, R. High-Resolution Vehicle Trajectory Extraction and Denoising from Aerial Videos. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3190–3202.
- Jiang, C.; Paudel, D.P.; Fofi, D.; Fougerolle, Y.; Demonceaux, C. Moving Object Detection by 3d Flow Field Analysis. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1950–1963.
- Fang, Z.; Jain, A.; Sarch, G.; Harley, A.W.; Fragkiadaki, K. Move to See Better: Self-Improving Embodied Object Detection. arXiv 2020, arXiv:2012.00057.
- Xu, D.; Xiao, J.; Zhao, Z.; Shao, J.; Xie, D.; Zhuang, Y. Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10334–10343.
- Han, T.; Xie, W.; Zisserman, A. Video Representation Learning by Dense Predictive Coding. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27 October–2 November 2019.
- Al-amri, R.; Murugesan, R.K.; Man, M.; Abdulateef, A.F.; Al-Sharafi, M.A.; Alkahtani, A.A. A Review of Machine Learning and Deep Learning Techniques for Anomaly Detection in IoT Data. Appl. Sci. 2021, 11, 5320.

| [Ref.], Year | Type of Network | Proposed Architecture | Dataset (Accuracy) | Examples of Anomalies | 
|---|---|---|---|---|
| [5], 2020 | CNN | Human skeleton, YOLOv3, multi-scale information fusion network | UCF-101, HMDB51, and camera (96.3%) | Run, fall, fight | 
| [4], 2020 | CNN | VGGNet-19 pretrained network, binary SVM | UMN (97.44%), UCSD-ped1 (86.69%) | Carts, bikers, skateboarders, running, person walking over the grass | 
| [17], 2020 | CNN, RNN | Combined CNN-RNN | NAHFE (89.5%) | Drug addiction, autism, criminal mentality | 
| [6], 2020 | CNN | Canny edge detection algorithm, 3D-ConvNet | HMDB51 and Hollywood2 (93%) | Climbing, fighting, falling | 
| [11], 2020 | CNN, RNN | ConvLSTMs | Hockey (99%), Violent Flow (93.75%), RLV (96.74%) | Violence | 
| [18], 2020 | CNN | YOLOv2 | Camera (99.8%) | Reckless driving | 
| [21], 2021 | LSTM, AEs | Convolution AE and sequence to sequence LSTM | UMN (87%) | Sudden running | 
| [12], 2021 | CNN, GAN | 3D-ConvNet | CUHK Avenue (68.94%), ShanghaiTech (88.26%) | Crime | 
| [8], 2021 | CNN | 3D-ConvNet | Behave (91.75%), Caviar (92.86%) | Robbery, fight | 
| [10], 2021 | GRU, FFN | Human skeleton, GRU-FFN | ShanghaiTech (82.6%), Avenue (91.7%) | Running, falling down, robbing, fighting | 
| [38], 2021 | CNN, LSTM | Human skeleton, ConvLSTM | Weizmann (73.1%), KTH (93.4%), private (86.5%) | Punching, kicking | 
| [39], 2021 | RNN | Human skeleton, LSTM, GRU | UR Fall Detection and Fall Detection (98.2%) | Fall | 
| [7], 2022 | RNN | LSTM and GRU | Camera (84%) | Fall, fight | 
| [13], 2022 | RNN, CNN | 3D-ConvNet, LSTM | RLVS (96.5%), Hockey (97%), Violent Flow (93.2%) | Violence | 
| [20], 2022 | CNN | Human skeleton, ConvLSTMs | Camera (85%) | Door blocking, door picking | 
| [9], 2022 | CNN | ConvLSTM | Abnormal Activities (97.64%) | Robbery, fight, hijack, harassment | 
| [40], 2022 | CNN, LSTM | YOLOv5, ConvLSTM | Hockey fight (93.5%), Cigarette smoker (90%), Playing cards (93.8%) | Smoking, playing cards, fighting | 
| [19], 2022 | CNN | YOLOv5 | Private (91%) | Not wearing safety helmet, entering dangerous area, smoking | 
| [41], 2022 | CNN, LSTM | ConvLSTM | Abnormal Activities (96.19%) | Begging, drunkenness, fight, harassment, hijack, knife hazard, robbery, and terrorism | 
| [42], 2022 | CNN | Human skeleton, YOLOv3, VGG16 pre-trained network | Camera (95%) | Walking, hugging, fighting | 
| Dataset [Ref.] | Year | Description | No. of Videos | Resolution | Example Anomalies | URL | 
|---|---|---|---|---|---|---|
| CAVIAR [60] | 2004 | It includes videos of two different situations. The sequences are ground truth labeled frame-by-frame with bounding boxes and a semantic description of the activity in each frame. There are 28 video sequences grouped into 6 different activity scenarios. | 28 | 384 × 288 | Fighting and leaving a package in a public place | https://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/ (accessed on 11 November 2022) | 
| UMN [61] | 2006 | It’s a collection of 11 videos depicting various escape scenarios across three indoor and outdoor scenes. Each clip starts with examples of normal behavior and then turns into abnormal examples. | 11 | 320 × 240 | People running (escape) | |
| UCSD-PED1 [62] | 2010 | It consists of clips of groups of people walking towards and away from the camera, and some amount of perspective distortion. There are 34 training and 36 testing videos, each containing 36 frames. | 70 | 158 × 238 | Movement of bikers, skaters, cyclist, small carts, people in a wheelchair | http://www.svcl.ucsd.edu/projects/anomaly/dataset.html (accessed on 11 November 2022) | 
| UCSD-PED2 [62] | 2010 | It consists of a scene where most pedestrians move horizontally. The video footage of the scene is sliced into clips of 120–200 frames. There are 16 training videos and 12 testing ones. | 28 | 240 × 360 | Movement of bikers, skaters, cyclists, small carts, and people in wheelchairs | http://www.svcl.ucsd.edu/projects/anomaly/dataset.html (accessed on 11 November 2022) | 
| BEHAVE [63] | 2010 | It focuses on aberrant behavior associated with criminal activity. It has around 90,000 frames of humans identified by bounding boxes, with interacting groups classified into one of 6 different behaviors. | 4 | 640 × 480 | Chase, fight, and run | |
| Hockey fight [64] | 2011 | It is collected from hockey games and scenes from action movies to describe violent behavior in ice hockey matches. Each clip, consisting of 50 frames, is manually labeled as “fight” or “non-fight”. | 1000 | 720 × 576 | Fight | https://academictorrents.com/details/38d9ed996a5a75a039b84cf8a137be794e7cee89 (accessed on 15 November 2022) | 
| HMDB-51 [56] | 2011 | It is collected from a variety of sources ranging from digitized movies to YouTube videos. In total, there are 51 action categories. | 6766 | Variable resolution | Shoot gun, climbing and falling | http://serre-lab.clps.brown.edu/resources/HMDB/ (accessed on 15 November 2022) | 
| Violent Flow [65] | 2012 | Data is compiled from various sources to characterize the actions of crowds in public areas like parks, streets, and squares. | 246 | 320 × 240 | Violence | http://www.openu.ac.il/home/hassner/data/violentflows/ (accessed on 14 November 2022) | 
| UCF-101 [66] | 2012 | It includes a total of 27 h of footage covering 101 different action categories. The database consists of user-uploaded videos with realistic camera movement and cluttered backgrounds. | 13,320 | 320 × 240 | Robbery, hijack, harassment, explosions, and fight | http://crcv.ucf.edu/data/UCF101.php (accessed on 14 November 2022) | 
| CUHK Avenue [67] | 2013 | It contains 16 training and 21 testing video clips with a total of 30,652 frames, which capture the movement and behavior of pedestrians, cars, and cyclists. | 37 | 640 × 360 | Running, throwing objects, and loitering | http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html (accessed on 14 November 2022) | 
| ActivityNet [68] | 2015 | It provides 203 activity classes, with an average of 137 videos per class, for a grand total of 849 video hours. | 27,801 | 1280 × 720 | | http://www.activity-net.org (accessed on 15 November 2022) | 
| Kinetics [69] | 2017 | It contains 400 human action classes, with 400–1150 clips for each action, each from a unique video. The clips average roughly 10 s in length and are all collected from various videos available on YouTube. | 306,245 | Variable resolution | Violence | https://www.deepmind.com/open-source/kinetics (accessed on 16 November 2022) | 
| ShanghaiTech Campus [70] | 2017 | It has 13 scenes with complex lighting conditions and camera angles. It contains 130 abnormal events and over 270,000 training frames. | 330 | 846 × 480 | Brawling, chasing, skaters, bikers, and trolleys on the pedestrian walkways | https://svip-lab.github.io/dataset/campus_dataset.html (accessed on 16 November 2022) | 
| UCF-Crime [71] | 2018 | It contains 1900 long, untrimmed real-world surveillance videos totaling 128 h, covering 13 realistic anomalies as well as normal activities. | 1900 | Variable resolution | Abuse, arrest, arson, assault, accident, burglary, explosion, fighting, robbery, shooting, stealing, shoplifting, and vandalism | http://crcv.ucf.edu/projects/real-world/ (accessed on 16 November 2022) | 
| RLVS [72] | 2019 | It consists of violent clips that involve fights in many different environments, such as the street, jails, and schools. The nonviolent videos also feature human activities, including playing sports, exercising, and eating. | 2000 | Average size of 397 × 511 | Fight and Violence | https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset (accessed on 17 November 2022) | 

| Metric | Definition | Equation | 
|---|---|---|
| Accuracy | It measures the proportion of anomalous and normal instances that are correctly classified out of the whole dataset. Accuracy is a useful measure only when the classes in the dataset are roughly balanced. Here TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with anomalies as the positive class. | $\dfrac{TP + TN}{TP + TN + FP + FN}$ |
| Equal Error Rate (EER) | It is the error rate at the decision threshold where the false positive rate equals the false negative rate; a lower EER indicates better performance. It is also widely used to report biometric performance. | $EER = FPR(\tau^{*}) = FNR(\tau^{*})$, where $\tau^{*}$ is the threshold at which the two rates are equal |
| Recall (Sensitivity) (True Positive Rate) | It is the ratio of correctly detected anomalies to all actual anomalies. Recall is widely used when events that actually occurred must not be missed. | $\dfrac{TP}{TP + FN}$ |
| Precision (Detection rate) | It is the ratio of real anomalies to all instances flagged as anomalous. It measures how reliable the positive predictions are. | $\dfrac{TP}{TP + FP}$ |
| Specificity (True Negative Rate) | It is the percentage of normal samples that are correctly labeled as normal. Specificity is important when the objective is to minimize the number of normal (negative) examples that are incorrectly classified. | $\dfrac{TN}{TN + FP}$ |
| False Positive Rate (FPR) | It is the ratio of normal instances that are incorrectly classified as anomalous to all normal instances. | $\dfrac{FP}{FP + TN}$ |
| False Negative Rate (FNR) | It is the ratio of anomalous instances that are incorrectly classified as normal to all anomalous instances. | $\dfrac{FN}{FN + TP}$ |
| F1-Score | It is the harmonic mean of recall and precision. The greater the F1-Score, the better the performance of the model. It is often used when the class distribution is uneven. | $\dfrac{2 \cdot Precision \cdot Recall}{Precision + Recall}$ |
| J Score | It is a single statistic (Youden's J) that captures the performance of a binary classification test. | $Sensitivity + Specificity - 1$ |
| Percentage of Wrong Classifications (PWC) | It is the ratio of the number of incorrect predictions to the total number of predictions, expressed as a percentage. | $\dfrac{100\,(FP + FN)}{TP + TN + FP + FN}$ |
| Receiver operating characteristic curve (ROC) | It is a curve that plots the true positive rate against the false positive rate as the decision threshold varies. | |
| Area under ROC curve (AUC) | It is the area under the ROC curve, i.e., the plot of TPR against FPR, and takes values in [0, 1]. A higher value indicates a better ability to separate anomalous from normal instances. It is most informative when the observations are balanced between the classes. | |
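
As a rough illustration of how the metrics in the table above could be computed, the following Python sketch uses the standard confusion-matrix definitions with anomalies as the positive class (label 1). The function names, the toy labels and scores, and the 0.5 threshold are assumptions made for this example and do not come from any of the reviewed works. The EER is approximated by the ROC point where FPR and FNR are closest, which is a simplification of the usual interpolation.

```python
def confusion_counts(labels, preds):
    """Count TP, TN, FP, FN with anomaly = 1 (positive) and normal = 0."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(tp, tn, fp, fn):
    """Metrics that depend on a single decision threshold."""
    total = tp + tn + fp + fn
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    return {
        "accuracy": _ratio(tp + tn, total),
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
        "f1": _ratio(2 * precision * recall, precision + recall),
        "j": recall + specificity - 1,          # Youden's J statistic
        "pwc": 100 * _ratio(fp + fn, total),    # percentage of wrong classifications
    }


def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score.

    Assumes both classes are present in `labels`.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # An instance is flagged anomalous when its score >= threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def equal_error_rate(points):
    """Approximate EER: the ROC point where FPR and FNR = 1 - TPR are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2


if __name__ == "__main__":
    # Toy frame-level ground truth and anomaly scores (made up for illustration).
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
    preds = [1 if s >= 0.5 else 0 for s in scores]

    print(threshold_metrics(*confusion_counts(labels, preds)))
    curve = roc_points(labels, scores)
    print("AUC:", auc(curve), "EER:", equal_error_rate(curve))
```

In practice, a library routine such as those in scikit-learn would normally replace the hand-written ROC sweep; the explicit loops are kept here only to make each definition visible.
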

| Application Type | Technique Used | Ref. | 
|---|---|---|
| Automated surveillance | CNN | [77] | 
| Automated surveillance | CNN and LSTM | [78] | 
| Autonomous driving | Autoencoder + semantic segmentation | [79] | 
| Autonomous driving | GAN + post hoc statistics | [80] | 
| Autonomous driving | CNN + Gaussian processes | [81] | 
| Industrial automation | LSTM and autoencoder | [82] | 
| Industrial automation | LSTM, CNN, autoencoder | [83] | 
| Intelligent traffic monitoring | YOLOv5 and decision tree | [84] | 
| Surgical robotics | Deep residual autoencoder | [85] | 

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L.; Santamaría, J. Review on Deep Learning Approaches for Anomaly Event Detection in Video Surveillance. Electronics 2023, 12, 29. https://doi.org/10.3390/electronics12010029