Campus Violence Detection Based on Artificial Intelligent Interpretation of Surveillance Video Sequences
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Video-Based Physical Violence Detection
3.1.1. Data Gathering and Pre-Processing
3.1.2. Feature Extraction
3.1.3. Classifier Design
- (1) Randomly select m samples and their corresponding labels from the training set.
- (2) Calculate the gradient value and error, and update the gradient accumulation r.
- (3) Update the parameters according to r and the gradient values.
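These three steps describe a minibatch update with a gradient accumulator r, which matches an AdaGrad-style optimizer; the following is a minimal sketch under that assumption (names such as `grad_fn`, `lr`, and `epochs` are illustrative, not from the paper):

```python
import numpy as np

def adagrad_step(params, r, grads, lr=0.01, eps=1e-8):
    """Steps (2)-(3): accumulate squared gradients in r, then take a
    per-parameter scaled gradient step."""
    r += grads ** 2                             # update the gradient accumulation r
    params -= lr * grads / (np.sqrt(r) + eps)   # update parameters according to r
    return params, r

def train(params, data, labels, grad_fn, m=32, epochs=100, lr=0.01):
    """Minibatch training loop following steps (1)-(3)."""
    r = np.zeros_like(params)                   # gradient accumulator
    for _ in range(epochs):
        # Step (1): randomly select m samples and their labels.
        idx = np.random.choice(len(data), size=m, replace=False)
        # Step (2): gradient of the loss on the minibatch.
        grads = grad_fn(params, data[idx], labels[idx])
        # Steps (2)-(3): update r and the parameters.
        params, r = adagrad_step(params, r, grads, lr)
    return params
```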
3.2. Audio-Based Bullying Emotion Detection
3.2.1. Audio Databases and Acoustic Features
3.2.2. Classifier Design
3.3. Improved D–S Fusion Algorithm
3.3.1. Classic D–S Fusion Algorithm
- (1) physical violence = true and bullying emotion = true: this is a typical campus violence scene, and is exactly what the authors want to detect;
- (2) physical violence = true and bullying emotion = false: this can be a playing or sports scene with physical confrontation. According to the authors' observations, campus violence events are usually accompanied by bullying emotions, so this case is classified as non-violence in this paper;
- (3) physical violence = false and bullying emotion = true: this can be an argument or a criticism scene. In this paper, the authors focus on physical violence, so they also classify this case as non-violence;
- (4) physical violence = false and bullying emotion = false: this is a typical non-violent scene.
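Taken together, cases (1)–(4) reduce the fused decision to a logical AND of the two detectors; as a toy illustration (the function name `label_scene` is ours):

```python
def label_scene(physical_violence: bool, bullying_emotion: bool) -> str:
    """Only the conjunction of physical violence and bullying emotion
    is labeled as campus violence (case 1); cases 2-4 are non-violence."""
    if physical_violence and bullying_emotion:
        return "violence"
    return "non-violence"
```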
However, the classic D–S combination rule (a minimal sketch follows this list) has several known shortcomings:

- If there is serious conflict between the pieces of evidence, the fusion result is unsatisfactory;
- It is difficult to quantify the degree of fuzziness;
- The fusion result is strongly influenced by the values of the basic probability assignment (BPA) functions.

To address these issues, the authors improve the algorithm in two ways:

- Improve the BPA functions of certain pieces of evidence on certain hypotheses;
- Take the confidence levels and conflict levels of the evidence into consideration.
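For reference, here is a minimal sketch of the classic Dempster combination rule over the binary frame {violence, non-violence} (our own illustration, not the authors' implementation). The normalization by 1 − K, where K is the total mass of conflict, is what makes highly conflicting evidence problematic:

```python
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two basic probability assignments (BPAs) over the same
    frame of discernment. Keys are frozensets of hypotheses."""
    combined: dict = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    # Normalizing by 1 - K amplifies the surviving masses; when K is
    # close to 1, small residual masses dominate the result.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Singleton hypotheses, with the video/audio BPAs from the first
# fusion example in Section 4.3:
V, N = frozenset({"violence"}), frozenset({"non-violence"})
print(dempster_combine({V: 0.98, N: 0.02}, {V: 0.10, N: 0.90}))
# -> masses of roughly 0.84 for violence and 0.16 for non-violence
```

These outputs match the D–S fusion column of the first example table in Section 4.3.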
3.3.2. Improvement on BPA Functions
3.3.3. Improvement on Fusion Rules
4. Results
4.1. Video-Based Physical Violence Classification Results
4.2. Audio-Based Bullying Emotion Classification Results
4.3. Improved D–S Fusion Classification Results
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Database | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Video database | 92.00 | 95.65 | 88.00 | 91.67 |
Database | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
CASIA database | 91.67 | 94.12 | 88.89 | 91.43 |
Finnish database | 95.00 | 95.00 | 95.00 | 95.00 |
Chinese database | 88.33 | 89.66 | 86.67 | 88.14 |
First fusion example:

Result | Video | Audio | D–S Fusion |
---|---|---|---|
Violence | 0.98 | 0.10 | 0.84 |
Non-violence | 0.02 | 0.90 | 0.16 |

Second fusion example:

Result | Video | Audio | D–S Fusion |
---|---|---|---|
Violence | 0.98 | 0.01 | 0.33 |
Non-violence | 0.02 | 0.99 | 0.67 |
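As an arithmetic check (our notation: $m_v$ and $m_a$ denote the video and audio BPAs), the fused values in the second example follow from the classic Dempster rule of Section 3.3.1; note how the large conflict coefficient $K$ inflates the small residual masses:

```latex
K = m_v(V)\,m_a(N) + m_v(N)\,m_a(V) = 0.98 \times 0.99 + 0.02 \times 0.01 = 0.9704
m(V) = \frac{m_v(V)\,m_a(V)}{1-K} = \frac{0.0098}{0.0296} \approx 0.33, \qquad
m(N) = \frac{m_v(N)\,m_a(N)}{1-K} = \frac{0.0198}{0.0296} \approx 0.67
```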
(a)

Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Yager | 86.21 | 91.30 | 91.30 | 91.30 |
Section 3.3.2 | 95.38 | 97.30 | 94.74 | 96.00 |
Section 3.3.3 | 94.00 | 97.83 | 90.00 | 93.75 |

(b)

Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Yager | 86.21 | 91.30 | 91.30 | 91.30 |
Section 3.3.2 | 95.31 | 97.22 | 94.59 | 95.89 |
Section 3.3.3 | 97.00 | 97.96 | 96.00 | 96.97 |

(c)

Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Yager | 80.00 | 90.91 | 80.00 | 85.11 |
Section 3.3.2 | 84.75 | 95.45 | 72.41 | 82.35 |
Section 3.3.3 | 92.00 | 97.73 | 86.00 | 91.49 |
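For reference, the metrics reported in these tables follow the standard definitions (TP, TN, FP, FN denote true/false positives and negatives):

```latex
\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP+FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
```

For example, the Yager row in (c) gives $F_1 = 2 \times 90.91 \times 80.00 / (90.91 + 80.00) \approx 85.11$, consistent with the table.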