1. Introduction
In law enforcement patrol tasks, police robots present a compelling alternative to human personnel, offering all-weather operation, efficiency, accuracy, and remote monitoring capabilities. This paper highlights the growing significance of police robots in patrol work and their positive contribution to community security and stability. Deployable in diverse settings such as streets, stations, airports, and large-scale events, police robots face complex and dynamic urban environments characterized by numerous obstacles, including pedestrians, vehicles, and buildings, as well as fluctuating weather and lighting conditions. These challenges demand rapid data processing and precise target detection. To meet these demands, efficient computer vision algorithms are employed to analyze real-time video data acquired during patrol operations. However, traditional You Only Look Once (YOLO) algorithms suffer from limitations such as insufficient real-time performance, susceptibility to environmental factors, and suboptimal detection accuracy. This paper explores strategies for overcoming these challenges and enhancing the effectiveness of police patrol robots in ensuring public safety and security.
Security patrols often encounter challenging scenarios, such as violent altercations involving weapons, necessitating rapid and accurate detection methods. This paper addresses the complexities inherent in detecting such behaviors, including the diversity of actions, real-time decision-making requirements, occlusion-induced feature incompleteness, and increased misjudgments. To mitigate these challenges, researchers have devised a variety of techniques.
Mostafa et al. [1] proposed a YOLO-based deep learning C-Mask model for real-time mask detection and recognition in public places through drone monitoring, greatly improving mask detection performance in terms of drone mobility and camera orientation adjustment. Zhou et al. [2] designed an efficient multitasking model that fully mines information, reduces redundant image processing calculations, and utilizes an efficient framework to solve problems such as pedestrian detection, tracking, and multi-attribute recognition, improving computational efficiency and decision-making accuracy. Wang et al. [3] applied object detection to the fields of security and counter-terrorism and proposed a Closed-Circuit Television (CCTV) autonomous weapon real-time detection method based on You Only Look Once version 4 (YOLOv4), which greatly improved accuracy, real-time performance, and robustness. Alvaro et al. [4] proposed a dual recognition system based on license plate recognition and visual encoding in the field of recognition and detection. They evaluated the performance of both the public and proposed datasets using a multi-network architecture, making significant contributions to vehicle recognition. Azevedo et al. [5] combined the YOLOR-CSPNet (YOLOR-CSP) architecture with the DeepSORT tracker, which greatly improved inference efficiency. Pal et al. [6] proposed a new end-to-end vision-based detection, tracking, and classification architecture that enables robots to assist human fruit pickers in fruit picking operations. Ji et al. [7] proposed a small object detection algorithm called MCS-YOLOV4 based on YOLOv4 and the Corrected Intersection over Union (CIoU) loss function, which introduces an extended perception block domain and improves the attention mechanism and loss function, resulting in significant improvements in detecting small objects. Garcia-Cobo et al. [8] proposed a new deep learning architecture that uses Convolutional Long Short-Term Memory (ConvLSTM), a convolutional alternative to standard Long Short-Term Memory (LSTM), to accurately and efficiently detect violent crimes in surveillance videos. In their research on violence detection, Rendón-Segador et al. [9] proposed the CrimeNet network, which combines a Vision Transformer (ViT) with a neural network trained with adversarial Neural Structured Learning (NSL) to improve detection accuracy.
The algorithmic improvements mentioned above have raised target detection accuracy to varying degrees. However, when detecting police incidents such as fights and knife- or gun-wielding behavior, problems remain: limited real-time performance, difficulty handling occluded persons, and frequent false detections, so the accuracy and speed of incident recognition do not yet meet practical requirements. To improve the recognition of dangerous incidents by police patrol robots during security patrols, this paper improves the algorithm in three respects: the backbone network, the attention mechanism, and the loss function. Targeted optimization is carried out from the perspectives of real-time performance, accuracy, and misjudgment, in order to improve target detection performance and robustness in complex environments and to provide a powerful, efficient solution for target detection by police patrol robots in security patrol scenarios.
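To make the backbone-network direction concrete, the sketch below illustrates the partial convolution (PConv) operator that underlies the lightweight FasterNet backbone used in this work: a regular 3 × 3 convolution is applied to only the first 1/n_div of the channels, while the remaining channels pass through unchanged, reducing computation and memory access. This is a minimal NumPy illustration under our own simplifications, not the paper's implementation; the function name `pconv` is hypothetical, and a real FasterNet block additionally follows PConv with pointwise convolutions.

```python
import numpy as np

def pconv(x, weight, n_div=4):
    """Partial convolution (PConv): a 3x3 convolution over only the
    first C // n_div channels; the remaining channels are passed
    through untouched (identity).
    x:      (C, H, W) input feature map
    weight: (C_p, C_p, 3, 3) kernels, where C_p = C // n_div
    """
    c, h, w = x.shape
    cp = c // n_div
    out = x.copy()                       # untouched channels kept as-is
    # zero-pad the spatial dims of the convolved slice ("same" padding)
    xp = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))
    conv = np.zeros((cp, h, w))
    for o in range(cp):                  # naive 3x3 convolution
        for i in range(cp):
            for dy in range(3):
                for dx in range(3):
                    conv[o] += weight[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + w]
    out[:cp] = conv
    return out
```

With n_div = 4, the convolution touches only a quarter of the channels, so its FLOPs are roughly (1/4)² = 1/16 of a full convolution over all channels, which is the source of FasterNet's speed advantage.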
4. Conclusions
At large security events, in airport lobbies, station waiting rooms, and other densely populated areas with high crowd mobility, a target person may be partially or completely occluded, and dangerous behavior can occur in an instant. Robots are therefore required to have high frame-rate capture and real-time processing capabilities. In addition, criminals may deliberately conceal weapons or use the crowd as cover, making it difficult for robots to capture key frames. The improved model identifies dangerous behaviors such as knife- and gun-wielding efficiently and accurately while the police patrol robot moves through densely populated areas.
The enhanced algorithm proposed in this research employs the lightweight FasterNet network to replace the original backbone network. This substitution increases the detection speed of the model while preserving detection accuracy. In addition, the BiFormer attention mechanism further improves the model's accuracy, and adopting Wise-IoU (WIoU) as the loss function improves the bounding box regression performance of the detection model. The dataset for this work was created by combining a publicly available dataset with relevant images and video footage captured during real-life police incidents.
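As a sketch of how the WIoU loss weights bounding box regression, the plain-Python functions below compute IoU and the WIoU v1 loss: the plain IoU loss is scaled by an exponential penalty on the distance between the predicted and ground-truth box centers, normalized by the size of the smallest enclosing box. This is an illustrative reimplementation from the published WIoU v1 formulation, not the authors' training code; in an autograd framework the enclosing-box term is detached from the gradient, a detail that does not arise in this gradient-free sketch.

```python
import math

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def wiou_v1_loss(pred, target):
    """WIoU v1: L = R_WIoU * (1 - IoU), where R_WIoU = exp(d^2 / s^2)
    penalizes center distance d, normalized by the diagonal size s of
    the smallest box enclosing both pred and target."""
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # width/height of the smallest enclosing box
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r_wiou = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * (1.0 - iou(pred, target))
```

A perfectly aligned prediction yields a loss of 0, while a non-overlapping prediction yields a loss greater than 1, since the distance penalty amplifies the plain IoU loss for poorly localized boxes.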
The experimental results of the improved model show that, compared with the original YOLOv8, in terms of the two evaluation indexes of average detection accuracy and detection speed, recognition of fighting and assault behavior improved by 2.42% and 5.83%, respectively; recognition of knife-wielding behavior improved by 2.87% and 4.67%, respectively; and recognition of gun-wielding behavior improved by 3.01% and 4.91%, respectively. The model is thus capable of performing the target detection task of police incident recognition during security patrols. In the future, we will continue to optimize the model so that it achieves higher detection accuracy and faster detection speed when facing more complex security patrol scenes and incidents.
At present, police patrol robots have improved significantly in functionality, but they still need to undergo practical testing in complex scenarios. In addition, in terms of laws and regulations, it remains to be clarified whether police patrol robots can serve as independent law enforcement entities and whether their law enforcement procedures comply with current legal provisions.