Detecting Abnormal Behavior Events and Gatherings in Public Spaces Using Deep Learning: A Review
Abstract
1. Introduction
2. Research Methodology
2.1. Prior Research
2.2. Research Goals
- TI=(crowd gathering) OR TI=(spontaneous demonstration) AND TI=(crowd aggregation) AND TI=(abnormal aggregation) AND PY=(2017-2025);
- TITLE-ABS-KEY ((crowd OR gathering) AND ((movement AND patterns) OR (weapon AND detection)) AND analysis AND (human OR person OR people)) AND PUBYEAR > 2016;
- TITLE-ABS-KEY ("public safety") AND TITLE-ABS-KEY ("computer vision") AND TITLE-ABS-KEY ("deep learning") AND PUBYEAR > 2016.
2.3. Inclusion and Exclusion Criteria
2.4. Selection Results and Data Extraction
2.5. Temporary Chart of Publications
2.6. Most Representative Keywords
3. Findings and Quality Assessment
4. Analysis of the Results
4.1. Q1. What Is Considered a Gathering of People?
4.2. Q2. Can AI Recognize What a Gathering of People Is?
4.3. Q3. How Can We Detect with Videos the Difference Between a Peaceful and a Non-Peaceful Gathering?
- Example 1: Movement in the Opposite Direction to the Crowd Flow: A typical anomaly in video surveillance scenarios involves detecting an individual walking in the opposite direction to the prevailing movement of the crowd. While this behavior does not necessarily imply violence, it does constitute a significant deviation from the usual collective dynamics. Such anomalies may indicate potential risk situations, such as attempted escapes, suspicious behavior, or the possible onset of an incident.
- Example 2: Sudden Transition from Normal to Violent Behavior: Another paradigmatic situation involves a group of people who, initially, are behaving in a completely peaceful or routine manner. However, at a given moment, two individuals begin to fight or engage in physical aggression. These events highlight the importance of automated systems being able not only to identify atypical patterns but also to detect, at an early stage, the transition between normal behaviors and scenarios that may become dangerous.
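Example 1 can be made concrete with a minimal sketch. It assumes that an upstream tracker already provides one displacement vector per person between two frames; the function name `flag_counter_flow` and the parameters `cos_threshold` and `min_speed` are hypothetical and chosen only for illustration. The dominant crowd direction is approximated by the mean velocity of all moving people, and a person is flagged when the cosine similarity between their velocity and that direction falls below the threshold. This is not a method taken from any of the reviewed studies.

```python
import numpy as np

def flag_counter_flow(velocities, cos_threshold=-0.5, min_speed=0.1):
    """Return a boolean mask marking people who move against the crowd flow.

    velocities: (N, 2) array-like of per-person displacement vectors (dx, dy).
    cos_threshold and min_speed are illustrative values, not tuned constants.
    """
    v = np.asarray(velocities, dtype=float)
    speeds = np.linalg.norm(v, axis=1)
    moving = speeds > min_speed          # ignore people who are standing still
    if not moving.any():
        return np.zeros(len(v), dtype=bool)

    # Dominant flow: direction of the mean velocity of all moving people.
    mean_v = v[moving].mean(axis=0)
    mean_norm = np.linalg.norm(mean_v)
    if mean_norm < 1e-9:                 # no prevailing direction in the scene
        return np.zeros(len(v), dtype=bool)
    dominant = mean_v / mean_norm

    # Cosine similarity between each moving person and the dominant flow.
    cos = np.zeros(len(v))
    cos[moving] = (v[moving] @ dominant) / speeds[moving]
    return moving & (cos < cos_threshold)

# Four people walk to the right, one walks to the left (hypothetical data).
velocities = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [1.0, 0.05], [-1.0, 0.0]]
mask = flag_counter_flow(velocities)
```

In this toy scene only the last person is flagged. A deployed system would need a more robust estimate of the dominant flow (for example, per-region optical flow), since a single global mean breaks down when several streams cross.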
4.4. Q4. What Are the Different Types of Deep Learning Networks and Architectures Used for Abnormal Event and Gathering Detection?
4.5. Q5. What Are the Performance and Accuracy of These Methods on Public Security Datasets?
4.6. Q6. What Key Elements of the Datasets Can Provide Evidence of the Outcome of the Gathering, and Which Datasets Are Most Commonly Used to Monitor Gatherings Using Deep Learning?
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Haghani, M.; Coughlan, M.; Crabb, B.; Dierickx, A.; Feliciani, C.; van Gelder, R.; Geoerg, P.; Hocaoglu, N.; Laws, S.; Lovreglio, R.; et al. A roadmap for the future of crowd safety research and practice: Introducing the Swiss Cheese Model of Crowd Safety and the imperative of a Vision Zero target. Saf. Sci. 2023, 168, 106292.
- Kuppusamy, P.; Bharathi, V. Human abnormal behavior detection using CNNs in crowded and uncrowded surveillance – A survey. Meas. Sens. 2022, 24, 100510.
- Azorin-Lopez, J.; Saval-Calvo, M.; Fuster-Guillo, A.; Garcia-Rodriguez, J.; Mora-Mora, H. Constrained self-organizing feature map to preserve feature extraction topology. Neural Comput. Appl. 2017, 28, 439–459.
- Azorin-Lopez, J.; Saval-Calvo, M.; Fuster-Guillo, A.; Garcia-Rodriguez, J.; Cazorla, M.; Signes-Pont, M.T. Group activity description and recognition based on trajectory analysis and neural networks. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1585–1592.
- Azorín-López, J.; Saval-Calvo, M.; Fuster-Guilló, A.; García-Rodríguez, J. Human behaviour recognition based on trajectory analysis using neural networks. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; pp. 1–7.
- Wang, L.; Zhou, Y.; Li, R.; Ding, L. A fusion of a deep neural network and a hidden Markov model to recognize the multiclass abnormal behavior of elderly people. Knowl.-Based Syst. 2022, 252, 109351.
- Abdalla, M.; Javed, S.; Radi, M.A.; Ulhaq, A.; Werghi, N. Video Anomaly Detection in 10 Years: A Survey and Outlook. arXiv 2024, arXiv:2405.19387.
- Azorin-Lopez, J.; Saval-Calvo, M.; Fuster-Guillo, A.; Garcia-Rodriguez, J. A novel prediction method for early recognition of global human behaviour in image sequences. Neural Process. Lett. 2016, 43, 363–387.
- Hu, X.; Lian, J.; Zhang, D.; Gao, X.; Jiang, L.; Chen, W. Video anomaly detection based on 3D convolutional auto-encoder. Signal Image Video Process. 2022, 16, 1885–1893.
- Borja-Borja, L.F.; Azorin-Lopez, J.; Saval-Calvo, M.; Fuster-Guillo, A.; Sebban, M. Architecture for automatic recognition of group activities using local motions and context. IEEE Access 2022, 10, 79874–79889.
- Borja-Borja, L.F.; Azorin-Lopez, J.; Saval-Calvo, M.; Fuster-Guillo, A. Deep learning architecture for group activity recognition using description of local motions. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
- Wang, L.; Tan, H.; Zhou, F.; Zuo, W.; Sun, P. Unsupervised anomaly video detection via a double-flow ConvLSTM variational autoencoder. IEEE Access 2022, 10, 44278–44289.
- Feng, X.; Song, D.; Chen, Y.; Chen, Z.; Ni, J.; Chen, H. Convolutional transformer based dual discriminator generative adversarial networks for video anomaly detection. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 5546–5554.
- Yu, G.; Wang, S.; Cai, Z.; Zhu, E.; Xu, C.; Yin, J.; Kloft, M. Cloze test helps: Effective video anomaly detection via learning to complete video events. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 583–591.
- Sun, S.; Hua, J.; Feng, J.; Wei, D.; Lai, B.; Gong, X. TDSD: Text-driven scene-decoupled weakly supervised video anomaly detection. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; pp. 5055–5064.
- Chen, W.; Ma, K.T.; Yew, Z.J.; Hur, M.; Khoo, D.A.A. TEVAD: Improved video anomaly detection with captions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5549–5559.
- Bendali-Braham, M.; Weber, J.; Forestier, G.; Idoumghar, L.; Muller, P.A. Recent trends in crowd analysis: A review. Mach. Learn. Appl. 2021, 4, 100023.
- Ansari, M.A.; Singh, D.K. Human detection techniques for real time surveillance: A comprehensive survey. Multimed. Tools Appl. 2021, 80, 8759–8808.
- Tyagi, B.; Nigam, S.; Singh, R. A Review of Deep Learning Techniques for Crowd Behavior Analysis. Arch. Comput. Methods Eng. 2022, 29, 5427–5455.
- Bhuiyan, M.R.; Abdullah, J.; Hashim, N.; Farid, F.A. Video analytics using deep learning for crowd analysis: A review. Multimed. Tools Appl. 2022, 81, 27895–27922.
- Zhai, Z.; Liu, P.; Zhao, L.; Qian, J.; Cheng, B. An efficiency-enhanced deep learning model for citywide crowd flows prediction. Int. J. Mach. Learn. Cybern. 2021, 12, 1879–1891.
- Elias, P.; Macko, M.; Sedmidubsky, J.; Zezula, P. Tracking subjects and detecting relationships in crowded city videos. Multimed. Tools Appl. 2022, 83, 15339–15361.
- Kim, S.; Hwang, S.; Hong, S.H. Identifying shoplifting behaviors and inferring behavior intention based on human action detection and sequence analysis. Adv. Eng. Inform. 2021, 50, 101399.
- Emad, M.; Ishack, M.; Ahmed, M.; Osama, M.; Salah, M.; Khoriba, G. Early-Anomaly Prediction in Surveillance Cameras for Security Applications. In Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference, MIUCC 2021, Cairo, Egypt, 26–27 May 2021; pp. 124–128.
- Lin, W.; Gao, J.; Wang, Q.; Li, X. Learning to detect anomaly events in crowd scenes from synthetic data. Neurocomputing 2021, 436, 248–259.
- Zheng, Z.; Xia, Y.; Chen, X.; Yao, J. Security alert: Generalized deep multi-view representation learning for crime forecasting. Comput. Intell. 2022, 39, 4–17.
- Guo, H.; Zhang, D.; Jiang, L.; Poon, K.W.; Lu, K. ASTCN: An Attentive Spatial-Temporal Convolutional Network for Flow Prediction. IEEE Internet Things J. 2022, 9, 3215–3225.
- Khaire, P.; Kumar, P. A semi-supervised deep learning based video anomaly detection framework using RGB-D for surveillance of real-world critical environments. Forensic Sci. Int. Digit. Investig. 2022, 40, 301346.
- Fang, J.; Zhang, X.; Yang, B.; Chen, S.; Li, B. An Attention-based U-Net Network for Anomaly Detection in Crowded Scenes. In Proceedings of the 2022 IEEE 14th International Conference on Computer Research and Development, ICCRD 2022, Shenzhen, China, 7–9 January 2022; pp. 202–206.
- Galab, M.K.; Taha, A.; Zayed, H.H. Adaptive Technique for Brightness Enhancement of Automated Knife Detection in Surveillance Video with Deep Learning. Arab. J. Sci. Eng. 2021, 46, 4049–4058.
- Fernandez-Carrobles, M.M.; Deniz, O.; Maroto, F. Gun and Knife Detection Based on Faster R-CNN for Video Surveillance. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2019; Volume 11868, LNCS.
- Boltes, M.; Schumann, J.; Salden, D. Gathering of data under laboratory conditions for the deep analysis of pedestrian dynamics in crowds. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, Lecce, Italy, 29 August–1 September 2017.
- Yang, D.S.; Liu, C.Y.; Liao, W.H.; Ruan, S.J. Crowd gathering and commotion detection based on the stillness and motion model. Multimed. Tools Appl. 2020, 79, 19435–19449.
- Alqaysi, H.H.; Sasi, S. Detection of Abnormal behavior in Dynamic Crowded Gatherings. In Proceedings of the 2013 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 23–25 October 2013.
- Fitwi, A.; Chen, Y.; Sun, H.; Harrod, R. Estimating interpersonal distance and crowd density with a single-edge camera. Computers 2021, 10, 143.
- Zhou, X.; Wang, X.; Brown, G.; Wang, C.; Chin, P. Mixed Spatio-Temporal Neural Networks on Real-time Prediction of Crimes. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 1749–1754.
- Huang, Z.; Wang, P.; Zhang, F.; Gao, J.; Schich, M. A mobility network approach to identify and anticipate large crowd gatherings. Transp. Res. Part B Methodol. 2018, 114, 147–170.
- Nauman, M.A.; Shoaib, M. Identification of anomalous behavioral patterns in crowd scenes. Comput. Mater. Contin. 2022, 71, 925–939.
- Yang, C.L.; Wu, T.H.; Lai, S.H. Moving-Object-Aware Anomaly Detection in Surveillance Videos. In Proceedings of the 2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Washington, DC, USA, 16–19 November 2021.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
- Sultani, W.; Chen, C.; Shah, M. Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Zhou, S.; Shi, R.; Wang, L. Extracting macroscopic quantities in crowd behaviour with deep learning. Phys. Scr. 2024, 99, 065213.
- Qaraqe, M.K.; Elzein, A.; Basaran, E.; Yang, Y.; Varghese, E.B.; Costandi, W.; Rizk, J.; Alam, N. PublicVision: A Secure Smart Surveillance System for Crowd Behavior Recognition. IEEE Access 2024, 12, 26474–26491.
- Peng, Y.; Hao, H.; Zhou, T.; Han, B.; Yin, W. Research on the Detection Algorithm for Abnormal Crowd Behaviors Based on an Enhanced SlowFast Model. In Proceedings of the 2024 9th International Conference on Computer and Communication Systems (ICCCS), Xi’an, China, 19–22 April 2024; pp. 67–72.
- Chaudhary, R.; Kumar, M. Optimized deep maxout for crowd anomaly detection: A hybrid optimization-based model. Network 2024, 36, 148–173.
- Wu, Y.; Qiu, L.; Wang, J.; Feng, S. The use of convolutional neural networks for abnormal behavior recognition in crowd scenes. Inf. Process. Manag. 2025, 62, 103880.
- Jadhav, N.; Rangdale, S.; Solav, S.; Gayake, N.; Nanware, S.; Ranjan, N.M. Utilizing YOLO V5 and deep learning Approach to detect and manage crowds through advanced computational methodologies and Twilio Programmable Messaging API. In Proceedings of the 2024 OPJU International Technology Conference (OTCON), Raigarh, India, 5–7 June 2024; pp. 1–5.
- Rajitha, B. Intelligent Vision-Based Systems for Public Safety and Protection via Machine Learning Techniques; IGI Global Scientific Publishing: Hershey, PA, USA, 2021.
- Luo, L.; Xie, S.; Yin, H.; Peng, C.; Ong, Y. Detecting and Quantifying Crowd-Level Abnormal Behaviors in Crowd Events. IEEE Trans. Inf. Forensics Secur. 2024, 19, 6810–6823.
- Yan, D.; Ding, G.; Huang, K.; Bai, C.; He, L.; Zhang, L. Enhanced Crowd Dynamics Simulation with Deep Learning and Improved Social Force Model. Electronics 2024, 13, 934.
- Sultani, W.; Chen, C.; Shah, M. Real-world Anomaly Detection in Surveillance Videos. arXiv 2019, arXiv:1801.04264.
- Landi, F.; Snoek, C.G.; Cucchiara, R. Anomaly Locality in Video Surveillance. arXiv 2019, arXiv:1901.10364.
- Shao, S.; Zhao, Z.; Li, B.; Xiao, T.; Yu, G.; Zhang, X.; Sun, J. CrowdHuman: A Benchmark for Detecting Human in a Crowd. arXiv 2018, arXiv:1805.00123.
- Ferryman, J.; Shahrokni, A. PETS2009: Dataset and challenge. In Proceedings of the 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Snowbird, UT, USA, 7–9 December 2009; pp. 1–6.
- Mehran, R.; Oyama, A.; Shah, M. Abnormal crowd behavior detection using social force model. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 935–942.
- Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 589–597.
- Chan, A.B.; Morrow, M.; Vasconcelos, N. Analysis of Crowded Scenes using Holistic Properties. In Proceedings of the Eleventh IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2009), Miami, FL, USA, 7–12 December 2009.
- Cheng, M.; Cai, K.; Li, M. RWF-2000: An Open Large Scale Video Database for Violence Detection. arXiv 2020, arXiv:1911.05913.
- Lu, Z.; Guan, X.; Yan, D.; Li, Y.; Huang, T. Crowd behavior reconstruction with deep group feature learning. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 3–6 December 2024; pp. 5699–5706.
- Yahuarcani, I.O.; Diaz, J.E.G.; Satalaya, A.M.N.; Noriega, A.A.D.; Cachique, F.X.L.; Llaja, L.A.S.; Pezo, A.R.; Rojas, A.E.L. Recognition of violent actions on streets in urban spaces using Machine Learning in the context of the Covid-19 pandemic. In Proceedings of the 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Cape Town, South Africa, 9–10 December 2021.
- Zhou, Y.; Qu, Y.; Xu, X.; Shen, F.; Song, J.; Shen, H. BatchNorm-based Weakly Supervised Video Anomaly Detection. arXiv 2023, arXiv:2311.15367.
- Liu, W.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Diversity-Measurable Anomaly Detection. arXiv 2023, arXiv:2303.05047.
- Xiao, J.; Liu, T.; Ji, G. Divide and Conquer in Video Anomaly Detection: A Comprehensive Review and New Approach. arXiv 2023, arXiv:2309.14622.
- Hachiuma, R.; Sato, F.; Sekii, T. Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling. arXiv 2023, arXiv:2303.15270.
- Chen, Y.; Liu, Z.; Zhang, B.; Fok, W.; Qi, X.; Wu, Y.C. MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection. arXiv 2022, arXiv:2211.15098.
- Naji, Y.; Setkov, A.; Loesch, A.; Gouiffès, M.; Audigier, R. Spatio-temporal predictive tasks for abnormal event detection in videos. arXiv 2023, arXiv:2210.15741.
- Reiss, T.; Hoshen, Y. Attribute-based Representations for Accurate and Interpretable Video Anomaly Detection. arXiv 2022, arXiv:2212.00789.
- Mohammadi, H.; Nazerfard, E. Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model. arXiv 2022, arXiv:2202.02212.
- Wu, J.C.; Hsieh, H.Y.; Chen, D.J.; Fuh, C.S.; Liu, T.L. Self-Supervised Sparse Representation for Video Anomaly Detection. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022.
- Georgescu, M.I.; Ionescu, R.; Khan, F.S.; Popescu, M.; Shah, M. A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4505–4523.
- Hirschorn, O.; Avidan, S. Normalizing Flows for Human Pose Anomaly Detection. arXiv 2023, arXiv:2211.10946.
- Garcia-Cobo, G.; SanMiguel, J.C. Human skeletons and change detection for efficient violence detection in surveillance videos. Comput. Vis. Image Underst. 2023, 233, 103739.
- Lv, H.; Zhou, C.; Cui, Z.; Xu, C.; Li, Y.; Yang, J. Localizing Anomalies From Weakly-Labeled Videos. IEEE Trans. Image Process. 2021, 30, 4505–4515.
- Wang, G.; Wang, Y.; Qin, J.; Zhang, D.; Bao, X.; Huang, D. Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles. arXiv 2022, arXiv:2207.10172.
- Islam, Z.; Rukonuzzaman, M.; Ahmed, R.; Kabir, M.H.; Farazi, M. Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021.
- Su, Q.; Shu, L.; Hancke, G.P.; Huang, K.; Nurellari, E.; Zhao, Q.; Choudhury, N.; Hazarika, A. Camera planning for physical safety of outdoor electronic devices: Perspective and analysis. IEEE/CAA J. Autom. Sin. 2025.
- Pereira, S.S.; Maia, J.E.B. MC-MIL: Video surveillance anomaly detection with multi-instance learning and multiple overlapped cameras. Neural Comput. Appl. 2024, 36, 10527–10543.
- Xie, Z.; Ni, Z.; Yang, W.; Zhang, Y.; Chen, Y.; Zhang, Y.; Ma, X. A robust online multi-camera people tracking system with geometric consistency and state-aware re-id correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024.
- Wu, H.; Zeng, Q.; Guo, C.; Zhao, T.; Chen, C.W. Target-Aware Camera Placement for Large-Scale Video Surveillance. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 13338–13348.
- Jakob, P.; Madan, M.; Schmid-Schirling, T.; Valada, A. Multi-perspective anomaly detection. Sensors 2021, 21, 5311.
- Hsieh, Y.H.; Kao, C.C.; Lai, C.H.; Lin, K.P.; Yang, S.Y.; Yuan, S.M. Low-FPS Multi-Object Multi-Camera Tracking via Deep Learning. Electronics 2025, 14, 1373.
- Cob-Parro, A.C.; Losada-Gutiérrez, C.; Marrón-Romera, M.; Gardel-Vicente, A.; Bravo-Muñoz, I. Smart video surveillance system based on edge computing. Sensors 2021, 21, 2958.
- Dharan, A.M.; Mukhopadhyay, D. A comprehensive survey on machine learning techniques to mobilize multi-camera network for smart surveillance. Innov. Syst. Softw. Eng. 2025, 21, 313–332.
- Li, C.; Li, J.; Xie, Y.; Nie, J.; Yang, T.; Lu, Z. Multi-camera joint spatial self-organization for intelligent interconnection surveillance. Eng. Appl. Artif. Intell. 2022, 107, 104533.
- Zhu, L.; Wang, L.; Raj, A.; Gedeon, T.; Chen, C. Advancing video anomaly detection: A concise review and a new dataset. arXiv 2024, arXiv:2402.04857.
- Wang, Y.; Zhao, Y.; Huo, Y.; Lu, Y. Multimodal anomaly detection in complex environments using video and audio fusion. Sci. Rep. 2025, 15, 1–22.
- Cao, C.; Lu, Y.; Wang, P.; Zhang, Y. A new comprehensive benchmark for semi-supervised video anomaly detection and anticipation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20392–20401.
- Ramachandra, B.; Jones, M. Street scene: A new dataset and evaluation protocol for video anomaly detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2569–2578.
Questions | Route |
---|---|
Q1. What is considered to be a gathering of people? | The foundational concept of this research is the notion of a gathering. A useful working criterion is to determine the head count accurately and to confirm that a group exceeds a specified threshold. It is also crucial to be able to track individuals and their interactions. |
Q2. Can AI recognize what a gathering of people is? | Deep learning techniques combined with well-trained models are central to real-time crowd scene monitoring. Using AI techniques, carefully curated training datasets, and mathematical models, it becomes possible to distinguish organized congregations of individuals from mere streams of people occupying the same vicinity. |
Q3. How can we detect with videos the difference between a peaceful and a nonpeaceful gathering? | When a gathering may lead to violent actions, the system should be able to detect and even anticipate them. Once the number of participating individuals has been established, the next step is to determine whether the behavior of the crowd can be considered normal or whether, on the contrary, actions outside the norm are evident. |
Q4. What are the different types of deep learning networks and architectures used for abnormal event and gathering detection? | This question is addressed by examining several learning mechanisms frequently employed in anomaly detection, including autoencoders, GANs, and variations of CNNs. |
Q5. What are the performance and accuracy of these methods on public security datasets? | The performance and accuracy of the CNNs and autoencoders examined under Q4 can be influenced by several factors: the size and quality of the datasets, the complexity of the problem under consideration, the architectural design of the models, and the choice of evaluation metrics. Consequently, our research focuses on identifying the situations in which one AI technique is more advantageous or suitable than another. |
Q6. What key elements of the datasets can provide evidence of the outcome of the gathering, and which datasets are most commonly used to monitor gatherings using deep learning? | Studies must analyze the actions within a gathering while also considering other determining factors, such as whether a bladed weapon or blunt object is present in the crowd. In these cases, deep learning is needed to detect these additional factors. |
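The working criterion given for Q1 (a head count that exceeds a threshold among people who are close to one another) can be illustrated with a minimal sketch. It assumes that a person detector has already produced one ground-plane position per person; the function name `find_gatherings` and the parameters `max_dist` and `min_people` are hypothetical, and the distance-based grouping is only one possible way to operationalize "a group", not a definition taken from the reviewed studies.

```python
import numpy as np

def find_gatherings(positions, max_dist=2.0, min_people=5):
    """Group people linked by chains of short distances (single linkage)
    and return the groups that reach the head-count threshold.

    positions: list of (x, y) coordinates, one per detected person.
    Returns a list of groups, each a list of indices into positions.
    """
    pts = np.asarray(positions, dtype=float)
    n = len(pts)
    parent = list(range(n))              # union-find forest over people

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of people standing closer than max_dist.
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pts[i] - pts[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # A gathering is a group whose head count reaches the threshold.
    return [g for g in groups.values() if len(g) >= min_people]

# Five people standing together and two isolated passers-by (hypothetical data).
positions = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (10, 10), (20, 5)]
gatherings = find_gatherings(positions, max_dist=1.5, min_people=5)
```

Here the first five people form a single gathering and the two passers-by are ignored. Distinguishing an organized congregation from a stream of people who merely share the same space, as Q2 requires, would additionally need motion information over time.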
Decisions | Related to |
---|---|
Inclusive Decisions | Related to people detection |
Inclusive Decisions | Related to the detection of objects |
Inclusive Decisions | Related to public safety |
Inclusive Decisions | Related to behavioral patterns |
Exclusive Decisions | Not related to deep learning techniques |
Exclusive Decisions | Not related to camera analysis (computer vision) |
Exclusive Decisions | Not related to engineering (computer science) |
Keywords | Count | Keywords | Count | Keywords | Count | Keywords | Count |
---|---|---|---|---|---|---|---|
detection | 903 | video | 569 | datasets | 534 | crowd | 512 |
training | 443 | prediction | 442 | network | 430 | object | 349 |
flow | 264 | anomaly | 259 | camera | 204 | crime | 201 |
ratio | 197 | abnormal | 186 | performance | 176 | count | 167 |
people | 160 | convolutional | 152 | framework | 144 | public | 136 |
distance | 128 | tracking | 114 | density | 108 | recognition | 108 |
deep learning | 90 | architecture | 89 | target | 86 | behavior | 80 |
pattern | 76 | anomalous | 75 |
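Keyword frequencies of this kind can be obtained with a simple whole-word count over the collected texts. The sketch below is only an assumption about how such a tally might be produced; the source does not describe the tool that was used, and the function name `keyword_counts` and the two sample abstracts are hypothetical.

```python
import re
from collections import Counter

def keyword_counts(texts, keywords):
    """Count whole-word (or whole-phrase) occurrences of each keyword."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for kw in keywords:
            # \b keeps "anomaly" from matching inside longer words.
            pattern = r"\b" + re.escape(kw.lower()) + r"\b"
            counts[kw] += len(re.findall(pattern, lowered))
    return counts

# Hypothetical abstracts, used only to exercise the function.
abstracts = [
    "Deep learning for crowd anomaly detection in video.",
    "Crowd density estimation and abnormal behavior detection.",
]
counts = keyword_counts(abstracts, ["detection", "crowd", "deep learning", "anomaly"])
```

Because matching is exact, related forms such as "anomaly", "anomalous", and "abnormal" are counted separately, which is consistent with their appearing as distinct entries in the table.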
Category | Primary Studies |
---|---|
Crowd aggregation | [21,22,25,27,28,32,33,34,35,36,37] |
Abnormal crowd aggregation | [20,23,24,25,28,30,31,38,39,40,41,42,43,44,45,46,47] |
Critical abnormal aggregation | [26,39,48,49,50] |
Dataset | Type | Nº Videos/Images | Method | Model |
---|---|---|---|---|
UCF-Crime [51] | Surveillance | 2047 | Abnormal event detection | Autoencoders, CNN, 3D-VAE, transfer learning. |
UCF-Crime2Local [52] | Surveillance | 13,662 | Abnormal event detection | CNN, Spatiotemporal Autoencoder, transfer learning. |
CrowdHuman [53] | Crowd | 15,000 images, 5000 annotated | Pedestrian detection, multi-person tracking | Faster R-CNN, RetinaNet, YOLOv3, Cascade R-CNN |
PETS2009 [54] | Surveillance | 24,259 | Object Tracking | Multiple Object Tracking (MOT), R-CNN, DeepSORT |
UMN Dataset [55] | Surveillance | 1200 | Object detection | Faster R-CNN, YOLOv3, SSD, Mask R-CNN |
ShanghaiTech [56] | Crowd | 1119 | Crowd counting | CNNs, DCNN, MCNN, FCN |
UCSD [57] | Pedestrian | Over 2000 | Anomaly detection | CNN, VAEs, GANs, Transfer learning |
RWF-2000 [58] | Surveillance | 2000 | Abnormal event detection | LSTM, CNN, Reinforcement Learning |
PublicVision [43] | Surveillance crowd | 1413 videos | Behavior and violence-level recognition | Swin Transformer |
Method | Application | Metric | Value | Reference |
---|---|---|---|---|
Optimized deep maxout | Anomaly detection | Accuracy | 97.28% | [45] |
Enhanced SlowFast | Detection of 5 behaviors in public spaces | Processing speed (FPS) | 40.5 FPS | [44] |
YOLOv5 + Twilio API | Overcrowding detection | Real-time response (latency) | Not defined | [47] |
ACSAM | Anomaly detection in dense crowds | Accuracy improvement over previous methods | >12% | [46] |
PublicVision (Swin Transformer + encrypted VPN) | Crowd behavior classification | Global accuracy; mean Average Precision (mAP); inference latency | 89.76%; 93.3%; ∼20 frames/inference | [43] |
Reference | Objective | Model and Method | Tool | Dataset | Site |
---|---|---|---|---|---|
[33] | Determine the stillness state in every different place and situation. | RNN. Stillness level model and the leaky bucket model (LBM) | Not defined | PETS2009 | Outdoor and indoor |
[34] | Generate an alarm for the security personnel to take appropriate actions | DADCG algorithm. MHI and Optical Flow techniques. | CVST—Matlab R2012a | YouTube | Outdoor and indoor |
[25] | Reduce over-fitting because of scarcity of data. | CNN: C3D, GAN, N3D, ResNet | SHADE | UCSD, UCF-Crime, UMN | Outdoor |
[26] | Develop a unified model to predict crime, leveraging latent relationships from social media | Smote-TomekLinks iterative, 1D-CNN | Not defined | Twitter, 4-month city-wide dataset | Outdoor |
[28] | Detection of anomalies in real-world surveillance sites | CNN MobileNet | Not defined | RGB+D, UCF-Crime2Local | Outdoor and indoor |
[60] | Detect violent actions on streets in urban spaces | CNN | A conventional laptop, VLC tool | Created by the authors | Outdoor and indoor |
[31] | Develop two novel weapon detectors applying deep learning | Faster R-CNN, RPN | Not defined | COCO 2017, Gupta dataset, Open Images dataset | Outdoor and indoor |
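Category | Sub-Category | Accuracy |
---|---|---|
Crowd aggregation | Liquor law violation | 59.8% [26] |
Crowd aggregation | Narcotics | 76.8% [26] |
Crowd aggregation | Prostitution | 86.9% [26] |
Crowd aggregation | Deceptive practice | 71.0% [26] |
Crowd aggregation | Grappling | 95.0% [60] |
Crowd aggregation | Arrest | 63.5% [25] |
Crowd aggregation | Chase | 59.1% [25] |
Crowd aggregation | Run | 63.2% [25] |
Abnormal crowd aggregation | Robbery | 77.3% [26] |
Abnormal crowd aggregation | Theft | 74.7% [26] |
Abnormal crowd aggregation | Assault | 75.1% [26] |
Abnormal crowd aggregation | Punching | 92.0% [60] |
Abnormal crowd aggregation | Kicking | 93.0% [60] |
Abnormal crowd aggregation | Fight | 55.1% [25] |
Abnormal crowd aggregation | Knife Detection | 85.44% [31] |
Critical abnormal aggregation | Homicide | 70.0% [26] |
Critical abnormal aggregation | Gun Detection | 46.68% [31] |
Critical abnormal aggregation | Shoot | 83.2% [25] |
Critical abnormal aggregation | Strangulation | 90.0% [60] |
Critical abnormal aggregation | Kidnapping | 52.1% [26] |
Critical abnormal aggregation | Arson | 54.6% [26] |
Critical abnormal aggregation | Weapons violation | 77.3% [26] |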
Method (UCF-Crime) | AUC (%) | Method (UCSD) | AUC (%) | Method (ShanghaiTech) | AUC (%) | Method (RWF-2000) | Accuracy (%) |
---|---|---|---|---|---|---|---|
[61] | 87.24 | [62] | 99.7 | [63] | 87.72 | [64] | 93.4 |
[65] | 86.98 | [66] | 98.9 | [67] | 85.94 | [68] | 90.4 |
[69] | 85.99 | [70] | 98.7 | [71] | 85.9 | [72] | 90.25 |
[73] | 85.38 | – | – | [74] | 84.3 | [75] | 89.75 |
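The two metrics in this comparison can be illustrated with a minimal sketch, assuming frame-level binary labels (1 for anomalous, 0 for normal) and one anomaly score per frame. AUC is computed here from its pairwise definition: the fraction of (anomalous, normal) frame pairs in which the anomalous frame receives the higher score, with ties counted as one half. The function names, the decision threshold, and the sample data are hypothetical; the evaluation protocols of the cited studies may differ in detail.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]                 # scores of anomalous frames
    neg = scores[~labels]                # scores of normal frames
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one anomalous and one normal frame")
    # Compare every anomalous score with every normal score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of frames classified correctly at a fixed score threshold."""
    labels = np.asarray(labels, dtype=bool)
    preds = np.asarray(scores, dtype=float) >= threshold
    return float((preds == labels).mean())

# Six frames with hypothetical anomaly scores.
labels = [0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.7, 0.8, 0.6, 0.3]
auc = roc_auc(labels, scores)            # 7 of 8 pairs ranked correctly: 0.875
acc = accuracy(labels, scores)           # 5 of 6 frames correct
```

AUC is independent of any threshold, whereas accuracy depends on the chosen operating point, which is one reason the values in the AUC columns and the accuracy column are not directly comparable.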
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rodrigo-Guillen, R.; Garcia-D’Urso, N.; Mora-Mora, H.; Azorin-Lopez, J. Detecting Abnormal Behavior Events and Gatherings in Public Spaces Using Deep Learning: A Review. J. Sens. Actuator Netw. 2025, 14, 69. https://doi.org/10.3390/jsan14040069