A Systematic Review of Using Machine Learning and Natural Language Processing in Smart Policing

Abstract: Smart policing refers to the use of advanced technologies such as artificial intelligence to enhance policing activities in terms of crime prevention or crime reduction. Artificial intelligence tools, including machine learning and natural language processing, have widespread applications across various fields, such as healthcare, business, and law enforcement. By means of these technologies, smart policing enables organizations to efficiently process and analyze large volumes of data. Some examples of smart policing applications are fingerprint detection, DNA matching, CCTV surveillance, and crime prediction. While artificial intelligence offers the potential to reduce human errors and biases, it is still essential to acknowledge that the algorithms reflect the data on which they are trained, which are inherently collected by human inputs. Considering the critical role of the police in ensuring public safety, the adoption of these algorithms demands careful and thoughtful implementation. This paper presents a systematic literature review focused on exploring the machine learning techniques employed by law enforcement agencies. It aims to shed light on the benefits and limitations of utilizing these techniques in smart policing and provide insights into the effectiveness and challenges associated with the integration of machine learning in law enforcement practices.


Introduction
Artificial intelligence (AI) is becoming increasingly popular for tackling tasks that can be time-consuming for humans. Machine learning (ML) algorithms act as the key technology enabler in many fields, such as healthcare, business, law enforcement, and policing [1]. Police agencies, crime labs, and courts employ algorithms for various purposes, including administrative tools, facial recognition programs, surveillance cameras, DNA matching, and bail and sentencing [2]. These technologies are expected to achieve quicker results while minimizing human prejudices. However, they still have the potential to reflect human biases, because training an ML algorithm involves learning patterns in labeled training data, typically generated by humans [3].
In recent years, due to the increasing number of reported criminal incidents, accompanied by the growing amount of crime data, which are difficult for humans to process manually, the use of tools provided by smart policing has become more common. Additionally, the main priority of police departments is to prevent crime so as to increase cities' safety. As a result, predictive policing has been introduced as a research field, which involves a range of technologies, such as crime documentation, predictive crime maps, advanced computer software, and artificial intelligence algorithms. These tools enable the police to utilize predictive analytics, making forecasts regarding the probable occurrence of future crimes and identifying potential perpetrators and victims. The underlying rationale behind these predictions lies in the assumption that criminal behavior and crime patterns can be predicted by drawing on criminological research and theories like rational choice and deterrence theories, routine activities theory, and broken windows theory [4].
The main contribution of this paper is providing a systematic literature review (SLR) [5] of various AI frameworks based on ML and natural language processing (NLP) that have been proposed and used in smart policing, while bringing transparency to their methods, especially those with statistical reliability to generate consistent data over multiple uses of a model or algorithm. The objective of a systematic review is to collect and provide a summary of studies that address a formulated research question [5].
There are surveys in the literature that cover different topics on ML and NLP in smart policing. One survey explored 15 studies to evaluate the possibilities of leveraging massive data repositories to scrutinize crime incidents and their correlation with different socioeconomic factors. This study suggests developing efficient computational models for crime prediction by identifying outliers, categorizing crime patterns, and employing advanced data mining and machine learning techniques [6].
Another paper presents an evaluation of several relational extraction systems based on NLP techniques according to their effectiveness in identifying semantic relations within criminal police reports, encompassing both English and Portuguese documents. The study provides valuable guidance for further research and the design of relational extraction systems for relevant domains [7].
One review paper presents a comprehensive and in-depth analysis of data mining applications in the context of crime by examining over one hundred applications. These applications are systematically listed in chronological order, providing a historical perspective of the evolution of data mining in crime analysis. With the growing applications of data mining techniques and the emergence of big data, the paper also addresses the need for increased training and investment in educating and empowering the youth with knowledge of the advantages, developments, and practical uses of data mining techniques [8].
Another systematic review analyzed over 150 studies, investigating the application of machine learning and deep learning algorithms in crime prediction. The study provides trends and factors associated with criminal activities by examining the algorithms and datasets used in crime prediction research [9].
As there is a lack of a holistic understanding of the financial cybercrime ecosystem, a survey tried to address this gap by studying the financial cybercrime ecosystem based on four factors: different fraud methods adopted by criminals; relevant systems, algorithms, drawbacks, constraints, and metrics used to combat each fraud type; the relevant personas and stakeholders involved; and open and emerging problems in the financial cybercrime domain [10].
One paper also conducted an extensive investigation into different approaches employed globally for crime prediction. The methods were systematically categorized, and their effectiveness was assessed based on precision and accuracy [11].
The present study aims to comprehensively explore the research papers within the field of crime prediction, encompassing the utilization of both ML and NLP techniques in this domain. Additionally, it seeks to shed light on the ethical challenges associated with the deployment of these methodologies. It is noteworthy that our study was carried out as a part of a research project for Mobile Innovations Corporation, which offers an application designed to empower police officers to write incident reports more quickly.
The remainder of this paper is organized as follows: Section 2 summarizes the relevant background, terminologies, and definitions necessary to understand the paper. Section 3 provides details on the method used to conduct the systematic literature review, including the research questions, and the search process used to identify the primary studies. Section 4 presents the findings and results, which consist of the list of primary studies found and the answers to the research questions. This section also presents a detailed discussion of the limitations of existing methods and the ethical challenges. Section 5 discusses a use case of large language models in the smart policing application provided by Mobile Innovations Corporation, and Section 6 provides directions for future research. Finally, Section 7 concludes the paper.

Background
The rise in crime rates and the challenges that they present have sparked a need for effective crime forecasting and preventive measures. Smart policing has rapidly emerged as a response to the pressing need for innovative solutions in law enforcement, particularly in light of various high-profile cases of police misconduct and growing public demands for reform [12]. Due to the growing amount of crime data, law enforcement agencies and police departments consider the use of advanced technologies such as smart policing to process this large volume of data, offering promising avenues for crime prediction, prevention, and improved efficiency [13].
In general, smart policing refers to the application of data, analytics, and innovative technologies, such as AI and big data, to enhance law enforcement activities and ensure public safety [12,14]. It involves the development of various technologies for predicting and preventing crimes, leveraging accumulated security data and AI. Data analysis and pattern recognition play a crucial role in identifying emerging patterns and trends in criminal activities, enabling authorities to take proactive actions [15]. Additionally, smart policing tools can produce results in less time while mitigating human prejudices. Studies show that law enforcement and police departments use ML and NLP techniques for multiple tasks, such as administrative tasks, forensics, analyzing crime statistics, creating crime maps, CCTV surveillance, license plate recognition, facial recognition, speech-to-text reporting, and crime documentation [13,16].
There are also different types of tools adopted by police departments for analyzing crime data and predictive policing, which refers to technologies that use ML algorithms and statistical analysis methods to predict criminal activities and their location, date and time, type of crime, and victims of future crimes based on both historical and real-time crime data [17]. These predictions can assist law enforcement agencies in making decisions more efficiently, particularly regarding resource deployment. In theory, predictive policing is based on the assumption that crimes do not happen randomly; instead, they are followed by local environmental situations and the situational decision-making of victims [4,18]. Therefore, these technologies will help find crime patterns and aid in police intervention and prevention.
Unlike traditional policing methods that primarily rely on criminal data, predictive policing considers a broader range of data sources. These technologies use data mining methods to collect and analyze a wide range of data, including structured and unstructured data. The employed methods help law enforcers to identify crime trends, and they facilitate resource deployment and decision-making.
The shift toward predictive policing happened in the late 2000s. Before this change, a form of smart policing known as statistically informed policing, which includes intelligence-led or data-driven policing, emerged in the 1990s when Jack Maple, a New York City transit police officer, developed a crime mapping system by visualizing the locations where crimes happened repeatedly. The New York City Police Department later adopted this system. This approach, called CompStat, is now widely used by police departments worldwide [19]. It helps identify and analyze crime patterns and hotspots, measure and incentivize police activity, and allocate police resources effectively; therefore, it plays the role of a crime control and prevention method as well [20].
With the deployment of such analytical platforms, classical public statistics could now be replaced by algorithmic practices that focus on prediction by identifying clusters and patterns [21]. In recent years, the rise of algorithms has led to increased interest in studying algorithms in the social sciences. As a result, accountability, transparency, and audit have become crucial aspects of public debates about algorithms [22].
Predictive policing can also be viewed as a form of preemptive policing based on statistical data. This implies that law enforcement can collaborate with various societal actors to address the main factors that lead to criminal behavior and promote shared safety.

Methodology
Following the guidelines suggested by [23] and the PRISMA method [5], this systematic review adheres to a structured approach. This section elaborates on the methodology employed to carry out the literature review, encompassing various intricate procedures. It comprehensively outlines the completion of each stage of the process.

Research Questions
This study aims to answer the following research questions (RQs):

• RQ1: What methods in ML and NLP have been proposed for processing crime data and for predictive policing?
• RQ2: What are the strengths and limitations of the current proposed methods, and how can they be addressed?
Addressing these issues helps to gain a deeper comprehension of the present shortcomings within the field, which will lead to investigating potential solutions for the limitations of current predictive policing algorithms and devising approaches to handle text data more effectively.

Research Process
The purpose of this study was to find published papers related to the applications of AI used in policing, how it can be helpful for predictive policing, its challenges, and the proposed solutions to address them.
Following the PRISMA method, we performed our search on IEEE and Google Scholar, which indexes a wide range of scholarly publications, in the "incognito mode" of Google Chrome to prevent any interference from cookies. We used the following string in search engines to find the related studies:

("Artificial intelligence" OR "Machine Learning" OR "Natural language processing" OR "Deep Learning") AND ("Policing" OR "Law enforcement" OR "Predictive policing")

Publications written in English from journals or conference proceedings were selected, and the last search was conducted on 25 July 2023. Google Scholar offered about 25,000 results, and the first 30 pages were evaluated to identify the most relevant literature. The selection of primary studies involved reviewing the titles, keywords, and abstracts, in addition to briefly scanning the main contents of the papers to gain insights into the conceptualization of using AI in smart policing and predictive policing, its benefits, and potential drawbacks and challenges. To ensure an unbiased selection of primary studies, a set of inclusion criteria was defined. Primary studies had to fulfill at least one of the following inclusion criteria (ICs):

• IC1: Provides/lists the ML methods or frameworks used in smart policing;
• IC2: States the challenges of using AI in smart policing and how to address them.
Finally, 45 papers, including 12 papers from IEEE, were considered as primary studies. To make sure that we covered the most relevant related works, we also used the snowballing technique to find additional related works by examining the references of the primary studies [24]. Using this technique, we added 58 papers to our review; therefore, 103 papers were reviewed in total.

Findings
The results and findings presented in this section respond to the proposed research questions. The algorithms used and the tools proposed in smart policing are identified, and the limitations and benefits of these methods are presented in this section.
Through the search process, 46 primary studies were identified, in addition to 33 papers that were added during the snowballing. In the following text, these studies serve as the basis to answer the proposed research questions in this systematic literature review.

Addressing RQ1
In general, crime can be associated with individuals or places, leading to the categorization of smart policing technologies into two main groups. The first group involves location-based approaches that predict where and when a crime is likely to be committed, with a focus on relevant factors of criminal activities and environmental features [25]. They usually use mapping systems to split the map into small segments or grids and then calculate the probability of a crime being committed based on the features of each segment, so that risk profiles are generated for different locations. These methods are also useful to forecast the timing of officer patrols for detecting and deterring criminal activity [26].
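The grid-based idea can be illustrated with a minimal sketch. It assumes that incidents are given as planar (x, y) coordinates in meters and uses the share of past incidents per cell as a crude risk score; the function name `grid_risk`, the cell size, and the sample coordinates are hypothetical, and the reviewed systems rely on richer features than raw counts.

```python
from collections import Counter


def grid_risk(incidents, cell_size):
    """Return the share of incidents falling in each square grid cell.

    incidents: iterable of (x, y) coordinates (assumed to be in meters).
    cell_size: side length of a grid cell, in the same unit.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in incidents
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


# Made-up incident locations, for illustration only.
incidents = [(10, 20), (30, 40), (120, 40), (130, 60), (140, 80), (450, 450)]
risk = grid_risk(incidents, cell_size=100)
highest_risk_cell = max(risk, key=risk.get)
```

Each key of `risk` is a (column, row) cell index, so the resulting dictionary plays the role of a simple location risk profile.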
The second group is person-based approaches, or offender-based models. These strategies focus on identifying the people who are most likely to be criminals or victims based on their personal information assessment or their history of criminal behavior. These models generate risk profiles of people within the criminal justice system, which are then used by police departments and law enforcement agencies to determine the appropriate actions [25].
Throughout our systematic literature review, we organized the studies based on the techniques that they used in smart policing. These techniques fall into three groups: mapping techniques that involve using statistics, ML, and NLP.

Mapping Techniques
Various mapping techniques are employed to identify crime hotspots, which can be regarded as a basic form of crime prediction. As listed in [27], these techniques include point mapping, thematic mapping of geographic areas, spatial ellipses, grid thematic mapping, and kernel density estimation (KDE).
Spatial ellipses include tools that locate dense concentrations of crime points on a map, known as hot clusters, and then fit a "standard deviational ellipse" to each cluster. These ellipses provide information about the nature of the underlying crime clusters based on their size and alignment [28]. However, criticisms arise due to the need for users to understand the software's routines, as the lack of guidance on parameter values can lead to ambiguity and variable results. Additionally, the representation of hotspots as ellipses may not accurately reflect the distribution of crime, potentially leading to misleading interpretations [29,30].
Geographic boundary thematic mapping is a method for representing spatial distributions of crime events that involves aggregating crime incidents into predefined geographic units and shading these areas based on the number of crimes within them. However, thematic shading based on boundaries may fail to reveal patterns across and within these units [31]. Despite the limitations, this mapping system is still widely applied in various contexts, including analysis of vehicle theft in relation to land use and crime pattern analysis [29].
Kernel density estimation (KDE) is considered to be the most suitable method for visualizing crime data, due to its availability and accuracy in identifying hotspots, as well as its aesthetic appeal [29,32]. KDE divides the study area into a regular grid of cells and aggregates the point data within a specified search radius, using a kernel function to estimate the probability density of actual crime incidents for each cell. This results in a heatmap that represents the density or rate of criminal events across the study area without being constrained by geometric shapes like ellipses [33].
Despite the popularity of KDE, the selection of a thematic range can be problematic, as agencies often prioritize visual appeal over the validity of the map. This can lead to variations in maps created from the same data. There are also concerns that maps can be misleading when they are created based on small amounts of data [29].
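The KDE procedure described above can be sketched as follows. This is a minimal illustration that assumes a two-dimensional Gaussian kernel, with the bandwidth playing the role of the search radius, and that evaluates the density at the center of each grid cell; the function name `gaussian_kde_grid`, the parameter values, and the sample points are hypothetical and not taken from the reviewed studies.

```python
import math


def gaussian_kde_grid(points, bandwidth, cell_size, n_cols, n_rows):
    """Evaluate a 2-D Gaussian kernel density estimate at each grid-cell center."""
    # Normalizing constant of the 2-D Gaussian kernel, averaged over all points.
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    grid = []
    for row in range(n_rows):
        cy = (row + 0.5) * cell_size
        values = []
        for col in range(n_cols):
            cx = (col + 0.5) * cell_size
            total = sum(
                math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2.0 * bandwidth ** 2))
                for x, y in points
            )
            values.append(norm * total)
        grid.append(values)
    return grid


# Made-up incident coordinates: three close together and one isolated.
points = [(1.5, 1.5), (1.4, 1.6), (1.6, 1.4), (8.5, 8.5)]
density = gaussian_kde_grid(points, bandwidth=1.0, cell_size=1.0, n_cols=10, n_rows=10)
hot_row, hot_col = max(
    ((r, c) for r in range(10) for c in range(10)),
    key=lambda rc: density[rc[0]][rc[1]],
)
```

Changing `bandwidth` or the color thresholds applied to `density` changes the resulting heatmap, which is the source of the map-to-map variation noted above.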

Machine Learning
In 2012, PredPol, Inc. introduced a predictive analysis platform that provides real-time crime risk information with a precision of 200 meters. This startup gained prominence in predictive policing by offering more than traditional crime hotspot maps [34]. Their method was inspired by earthquake prediction techniques, as researchers observed similarities between crime propagation dynamics and earthquakes [35]. PredPol utilizes stochastic point processes, a statistical physics approach, and a machine learning algorithm to make predictions modeling the distribution of events in time and space. The algorithm is trained on historical event datasets for each city, and it is updated daily with new events from the police department [36]. Many similar platforms are in use by police departments throughout the nation, as listed and summarized in Table 1.
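To give an intuition for earthquake-style point-process models, the sketch below computes the conditional intensity of a purely temporal self-exciting (Hawkes) process, in which each past event temporarily raises the expected rate of new events. The exponential kernel, the parameter values, and the function name `hawkes_intensity` are assumptions made for illustration; this is not PredPol's implementation, whose details are not given in the reviewed sources.

```python
import math


def hawkes_intensity(t, past_events, mu, alpha, beta):
    """Conditional intensity at time t: background rate plus decaying boosts.

    mu:    constant background rate of events.
    alpha: expected number of follow-on events triggered by one event.
    beta:  decay rate of the triggering effect.
    """
    triggered = sum(
        alpha * beta * math.exp(-beta * (t - ti)) for ti in past_events if ti < t
    )
    return mu + triggered


# Made-up event times (for example, in days).
events = [1.0, 1.5, 4.0]
lam_quiet = hawkes_intensity(0.5, events, mu=0.2, alpha=0.5, beta=1.0)
lam_after = hawkes_intensity(1.6, events, mu=0.2, alpha=0.5, beta=1.0)
```

The intensity equals the background rate before any event, rises right after a cluster of events, and decays back toward the background rate, which mirrors the aftershock analogy.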
More recently, data mining and ML algorithms have played an important role in crime prediction tasks, including predicting crime hotspots and crime categories or identifying criminals and victims. According to studies, predictive policing relies on many data mining and ML techniques, such as classification, clustering, and regression, but not all of these techniques perform equally effectively.

Table 1 (excerpt). Pre-crime observation system that predicts crimes by mainly consulting the near-repeat hypothesis and a rational-choice-framed conception of offenders that can be translated into algorithms for classifying and evaluating crime risk in geographic areas; used by police departments in Switzerland and Germany.

Various studies have compared different algorithms or designed frameworks with the utilization of ML in this context. These algorithms include support-vector machine (SVM), naïve Bayes (NB), artificial neural networks, k-nearest neighbors (KNN), decision tree (DT), and random forest (RF). Table 2 shows a brief summary of these studies, including the algorithms and methods that they have utilized. Table 3 provides further details of these studies, including the datasets used and the prediction accuracy of each model as reported in the studies.
Accuracy is considered, as most of the methods in the reviewed studies are classification algorithms that are used for predicting crime categories/types or crime hotspots. Random forest, one of the popular methods, is an ensemble learning method used for classification or regression tasks. It generates multiple decision trees by training them on different subsets of the dataset by using the bagging method and random selection of attribute sets. The individual predictions of these trees are then combined and determined by the voting of tree classifiers to generate a final output and reduce the risk of overfitting [48,49]. Clustering methods such as k-means are also popular in crime prediction and analysis. One study provided an ML framework for crime prediction and prevention in big cities using k-means clustering and the naïve Bayes classifier [54]. They showed that, using k-means clustering, they could learn the behavior of the corresponding entity to identify the geospatial region to which it belongs. Using these methods, they identified regions with the highest rates of crime to predict where the next crimes would happen.
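As an illustration of the clustering step only, the following sketch runs a plain k-means (Lloyd's algorithm) on made-up incident coordinates and reports the cluster with the most incidents. The function name `kmeans`, the initial centroids, and the data are hypothetical; the framework in [54] additionally uses a naïve Bayes classifier, which is not reproduced here.

```python
import math


def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: assign each point to its nearest centroid, then update."""
    centroids = list(centroids)
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Made-up incident coordinates forming two well-separated groups.
incidents = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (9.0, 9.0), (9.2, 8.7)]
centroids, labels = kmeans(incidents, centroids=[(0.0, 0.0), (10.0, 10.0)])
busiest_cluster = max(set(labels), key=labels.count)
```

The centroid of the busiest cluster can then be read as the approximate center of the region with the highest number of recorded incidents.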
Regression methods are used when the objective is to estimate the value of a variable by considering the value of known predictor features. One study performed a comparison between regression models, negative binomial, and Poisson regression, showing that all three of these models perform similarly [58]. In another study [53], the ARIMA model was compared with the exponential smoothing methods SES and HES, where ARIMA showed higher accuracy for crime prediction based on the time series of crime data (including robberies, thefts, and burglaries) derived from the 110 computer-aided dispatch (CAD) recordings of the local police station.
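For reference, simple exponential smoothing, taken here to be what SES denotes, can be sketched in a few lines. The smoothing factor `alpha` and the weekly counts are made-up values, and the function name is hypothetical; the study in [53] does not report its implementation.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last level is the one-step-ahead forecast."""
    level = series[0]  # initialize the level with the first observation
    levels = [level]
    for value in series[1:]:
        # New level is a weighted average of the new value and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


# Made-up weekly incident counts, for illustration only.
weekly_counts = [10, 12, 11, 15, 14]
levels = simple_exponential_smoothing(weekly_counts, alpha=0.5)
forecast_next_week = levels[-1]
```

Because the forecast is a single flat level, this method cannot follow trends, which is one reason why trend-aware models can outperform it on crime time series.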
As a relatively new area of application, smart policing technologies are mainly dominated by different types of neural networks, commonly referred to as deep learning methods [69,73-75]. Various types of deep learning methods exist for specific purposes. Convolutional neural networks (CNNs) were designed for image classification tasks like facial recognition, and they can also be applied to spatial data, such as maps, by treating them as images. In this preprocessing step, CNNs extract essential features from the images, which are then used as predictors in a neural network. Recurrent neural networks (RNNs) were developed to handle sequential data, such as pooled cross-section data, enabling the exploitation of temporal structures within the data. Additionally, generative adversarial networks (GANs) can be employed as target hardeners to enhance the security of algorithms that are vulnerable to hacking [76].
In one study [74], researchers used a combination of deep learning and ML methods to design a policing system in Sri Lanka as a mobile application. This system has an automated video surveillance monitoring component that can analyze human activities to identify suspicious behaviors using a CNN model. Also, pretrained state-of-the-art models, including VGG16, InceptionV3, and ResNet-50, are used to obtain high-level feature maps from the final pooling layer output. These extracted features are then fed into an LSTM network to perform the final behavior classifications. In addition, the crime prediction component of the app involves classification algorithms like SVM, DT, RF, and logistic regression (LR) to visually display locations on a map where there is a higher probability of crime occurrence.
Moreover, several studies are dedicated to the application of these algorithms in surveillance technology. These systems, found in public and private locations, allow for simultaneous monitoring of various locations and have evolved significantly over the years given the rising global concerns related to crime and terrorism [77,78]. In [72], the researchers introduced a crime detection system that involves an aerial spy vehicle that resembles the shape of a bird and constantly flies in the sky, capturing images and detecting unusual activities. It relies on deep neural networks (DNNs) to analyze video data and predict future frames. The process involves converting raw video into individual frames, which are then transformed into grayscale images. CNNs and RNNs are applied to extract features from these images and classify them. The system aims to minimize human intervention and help law enforcement authorities catch criminals effectively.
Furthermore, many studies have focused on the customization of deep neural networks for the real-time detection and classification of weapons during surveillance of criminal activities. These efforts highlight the growing demand for automatic systems in policing, given the increasing rate of crime and the frequent use of handheld weapons like pistols and revolvers in illegal or criminal activities [77,79-86]. Another study proposed a model to detect handguns based on the individual's pose, utilizing CNNs [87]. Using different architectures of CNN is a common practice for weapon detection in images, as it has shown exceptional performance in object recognition tasks [88]. One of the popular methods used for weapon detection is the YOLO (You Only Look Once) family of CNNs, which has evolved through versions YOLOv1 to YOLOv4. In YOLOv1, there is a single CNN for predicting object bounding boxes in grids [89]. YOLOv2 improved its accuracy with techniques like batch normalization and anchor boxes [90]. YOLOv3 incorporated multilabel classification, prediction of different bounding boxes, and feature pyramid networks. It also introduced the Darknet-53 feature extractor [91]. YOLOv4 further enhanced learning with cross-stage partial connections, cross mini-batch normalization, Mish activation, mosaic data augmentation, DropBlock regularization, and CIoU loss for bounding box regression, resulting in improved accuracy and speed [92].
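Bounding box regression losses such as CIoU build on the intersection over union (IoU) between a predicted box and a ground-truth box; CIoU adds penalty terms for center distance and aspect ratio on top of it. The sketch below covers only the plain IoU, assuming boxes in (x_min, y_min, x_max, y_max) format; the function name and the sample boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes: a predicted weapon box and its ground-truth annotation.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)
```

A detection is typically counted as correct when its IoU with the annotation exceeds a chosen threshold, so this quantity also underlies the accuracy figures reported for detectors.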
One study introduced a Raspberry Pi- and cloud-assisted face recognition system for law enforcement agencies, enabling them to securely detect and recognize faces in real-time scenarios. A portable wireless camera was attached to a police officer's uniform to capture videos, which were processed by the Raspberry Pi for facial detection and recognition. The method employs a bag-of-words model for feature extraction and an SVM for identifying suspects [93]. ML algorithms such as CNN and SVM are applicable in facial recognition, which is a critical area of research, and its applications extend to security, law enforcement, and public surveillance [93-95]. While these algorithms have shown promise in facial recognition, their practicality and effectiveness in real-world law enforcement scenarios remain relatively unexplored.

Natural Language Processing
Most police departments use electronic systems for crime reporting that have replaced the traditional paper-based crime reports. When a crime is recorded by police, situational and behavioral details describing the incident are documented in a free-text narrative report. These crime reports typically contain information such as the type of crime, date/time, location, and information about the suspect, victim, and witness(es), in addition to the narrative or description of the crime. The challenge in mining crime data often comes from the narrative part, as converting them into data mining attributes is not always an easy job [50]. Some studies have shown that, by means of NLP, these documents can be more useful for administrative and investigative tasks in smart policing [106]. NLP is a subset of AI and ML that includes approaches to analyze natural language in text or speech [107].
Police narrative reports are noisy, as they include grammatical mistakes, misspellings, acronyms, and informal language. Also, as other entities such as crime type names or vehicles also exist in these reports, general named-entity recognition tools may not be effective. Additionally, they include sensitive data, including the personal information of victims or criminals. Therefore, NLP models to analyze crime data in these reports should be trained on various data addressing these challenges. Additionally, police agencies often lack the expertise and resources to conduct detailed analyses or securely share data for academic research.
As mentioned earlier, ML has been recognized as a valuable tool in the field of criminology. However, according to a recent review on the intersection of crime and AI, there is a lack of research specifically focusing on NLP in this literature, particularly in relation to police free-text data analysis. In this section, our focus is specifically on analyzing free-text police data [108].
Existing analysis of free-text crime data often revolves around unsupervised learning and crime linkage [8]. Crime linkage aims to identify crimes committed by the same individual(s). Notable studies using unsupervised learning and NLP with police free-text data include [109] and [110], which explored how crimes can be grouped based on their characteristics and how they were committed. Other studies, like [111,112], use unsupervised NLP techniques to cluster crimes to inform policing strategies in different areas.
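One simple way to compare free-text narratives without labels is to weight words by TF-IDF and measure the cosine similarity between reports; pairs with high similarity are candidates for grouping or linkage. The sketch below is an illustrative assumption rather than the method of the cited studies: it uses whitespace tokenization, the hypothetical helpers `tfidf_vectors` and `cosine`, and invented report texts.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build one TF-IDF vector (token -> weight) per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            tok: (cnt / len(tokens)) * (math.log(n_docs / doc_freq[tok]) + 1.0)
            for tok, cnt in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented narratives, for illustration only.
reports = [
    "suspect forced rear window and took jewellery",
    "rear window forced jewellery and cash taken",
    "vehicle stolen from car park overnight",
]
vectors = tfidf_vectors(reports)
sim_similar = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
```

Real police narratives would first need the normalization steps implied above (spelling, acronyms, informal language) before such similarity scores become meaningful.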
There have also been efforts to extract specific information directly from police free-text data, such as exploring the relationship between mental health and types of domestic violence through rule-based information extraction [113,114]. However, this approach requires substantial effort in building rules and dictionaries, making it challenging for routine adoption.
Additionally, many studies focus on analyzing data from social media platforms such as Twitter. For example, a study has hypothesized that language usage on Twitter can be a valuable measure to predict crime rates in cities. They used the WEKA preprocessing toolkit and SVM to analyze and classify Twitter data [115].
Other studies have explored Twitter-based prediction of criminal incidents, specifically focusing on hit-and-run crimes [97,116]. Their approach involved semantic analysis of tweets through semantic role labeling to extract events mentioned in tweets. They then employed latent Dirichlet allocation for event-based topic extraction, revealing hidden relationships between major events and observable events reported in tweets. The predictive model itself relies on a generalized linear regression framework to predict whether an incident will occur on the following day based on the information gleaned from tweets.
One of the common NLP techniques used in smart policing, especially in writing police reports, is information extraction, including named-entity recognition (NER), which aims to detect named entities such as people, places, organizations, and dates and to extract specific crime elements from reports and data. NER enables better problem grouping and improves the availability of information that is often lacking in structured formats. By automating the extraction of detailed information from crime reports, NER significantly reduces the analysis time, allowing police analysts to respond effectively. Studies that utilize information extraction and NER for crime data analysis are listed in Table 4.
In [117], four main approaches for NER are listed (lexical lookup, rule-based, statistics-based, and ML), and most existing NER systems combine more than one of these approaches. Another study proposed an information extraction method using NER that outperformed Linguakit, a multilingual NLP toolkit that includes NER, and RAPPORT, a Portuguese question-answering system that uses NLP and NER [118].
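To make the first two approaches concrete, the following is a minimal sketch that combines lexical lookup with rule-based matching. It is not the extractor from [117] or [118]; the gazetteer entries, the regular-expression rules, the function name `extract_entities`, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteer for the lexical-lookup approach (illustrative entries only).
GAZETTEER = {
    "phoenix": "LOCATION",
    "charlottesville": "LOCATION",
    "heroin": "NARCOTIC",
}

# Hypothetical hand-written patterns for the rule-based approach.
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("PERSON", re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\.\s+[A-Z][a-z]+")),
]


def extract_entities(text):
    """Return (label, surface string, start offset) tuples sorted by position."""
    entities = []
    # Lexical lookup: compare every alphabetic token with the gazetteer.
    for match in re.finditer(r"[A-Za-z]+", text):
        label = GAZETTEER.get(match.group().lower())
        if label:
            entities.append((label, match.group(), match.start()))
    # Rule-based matching: apply each pattern to the full text.
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            entities.append((label, match.group(), match.start()))
    return sorted(entities, key=lambda entity: entity[2])


# Invented sentence, not taken from any police report.
report = "On 2023-05-14, Mr. Smith was found with heroin in Phoenix."
entities = extract_entities(report)
for label, surface, _ in entities:
    print(label, surface)
```

A hybrid of this kind is cheap to build but, as noted above for rule-based extraction, it depends on maintaining the dictionaries and rules by hand; statistics-based and ML approaches trade that effort for annotated training data.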

Reference: [115]
Description: Text analysis and classification using the WEKA toolkit and SVM

Reference: [116]
Data source(s): Twitter data in addition to 290 incident records collected from local law enforcement agencies in Charlottesville, Virginia
Description: Explores Twitter-based prediction of criminal incidents, with a focus on hit-and-run crimes, using NLP techniques such as sentiment analysis and event extraction. A linear regression model is also used to predict if a crime will occur in the following days based on information extracted from the tweets
Result: The model's performance was evaluated using a receiver operating characteristic (ROC) curve. The results indicated that data from social media platforms such as Twitter could be a valuable resource for predicting criminal incidents, but there are areas for improvement and further research, especially considering the temporal aspect of event descriptions and feature selection methods

Reference: [118]
Data source(s): Portuguese narrative police reports
Description: Presents a system that uses information retrieval techniques to extract, transform, clean, load, and find a connection between police reports collected from different sources to identify relevant entities within the extracted information
Result: The proposed model outperformed Linguakit and RAPPORT in terms of the F-score

Reference: [119]
Data source(s): Mozenda Web Screen Scraper tool and 4 online newspapers: Otago Daily Times, New Zealand Herald, Sydney Morning Herald, and The Hindu
Description: Proposed a crime information extraction system using NER and a conditional random fields (CRF) machine learning approach to identify locations in sentences and classify them based on online newspapers by focusing on information related to the theft crime
Result: The model was evaluated based on four newspaper articles from three countries, resulting in accuracy of 84% to 90% for articles from New Zealand and 73% to 75% for articles from India and Australia

Reference: [120]
Data source(s): Malaysian newspapers and social media sites
Description: Introduced an ensemble framework for crime information extraction from the web using NER and classification algorithms including NB, SVM, and KNN, along with a weighted voting ensemble method to combine them
Result: The proposed model outperformed the baseline models, with an F-score of 89.48% for identifying crime types and 93.36% for extracting crime-related entities

Reference: [121]
Data source(s): News articles related to identity theft on the internet found by search engines and annual identity theft reports
Description: Proposed an approach to analyze criminal behaviors and predict future trends of identity theft and fraud using NLP methods and information extraction, including NER and part-of-speech tagging based on raw text from news articles on the web. The Identity Threat Assessment and Prediction (ITAP) algorithm, designed in a modular pipeline, collects news stories, preprocesses them, extracts named entities, categorizes them, and creates identity theft records
Result: Around 3500 identity theft news stories were collected, their text was cleaned, and named entities were extracted and categorized. These categories formed identity theft records, which were then used for various analyses, such as identifying affected groups, assessing risk for specific PII attributes, tracking occurrence frequency across different sectors and locations, evaluating potential financial impacts, and tracking changes over time

Reference: [122]
Data source(s): A set of crime reports related to internet fraud on the official website of the Dutch police (each report contains 1-5 sentences and 85 tokens on average)
Description: Evaluates the standard NER algorithm, named Frog, for the Dutch language based on a manually annotated corpus collected from 250 complaints reports from the Dutch police; it discusses confusion in entity type assignment and recall errors, and proposes ways to improve performance
Result: The current Dutch NER algorithm performs inadequately on unedited free-entry data. The significance of this depends on the purpose of entity recognition, e.g., law enforcement seeks relevant information, while linguistics aims for named-entity identification, so different types and assignments matter, and domain-specific roles demand further processing

Table 4. Cont.

Reference: [123]
Data source(s): -
Description: Proposed a method for extracting valuable information about suspects' hard drives and social networks to discover criminal communities and analyze their relations
Result: The method efficiently identified criminal communities and their interlinked subgroups, offering a detailed view of network structure, crucial for criminal network analysis; it also received positive feedback from a Canadian law enforcement unit's digital forensics team

Reference: [117]
Data source(s): A set of police narrative reports provided from the Phoenix Police Department database
Description: Presents a neural-network-based entity extractor by using NER techniques to detect valuable entities such as person names, addresses, narcotic drugs, and vehicle names in police reports
Result: The system achieved promising precision and recall rates for person names and narcotic drugs but performed less effectively for addresses and personal properties

Reference: [124]
Data source(s): Texts on the web
Description: Proposed a semantic NLP model to develop systems that extract crime information from unstructured text in a collaborative web environment. The framework centers around a semantic inferential model (SIM)-based NLP module
Result: This framework's performance was demonstrated through the creation of "WikiCrimesIE," a tool for extracting crime-related information from text on the web, which gained an F-score of 78% for crime extraction and 70% for crime type identification

Reference: [125]
Data source(s): Chinese criminal investigation notes, online news on the internet, and litigation data
Description: Introduces a method for criminal information analysis and relation visualization by utilizing entity extraction techniques and part-of-speech (POS) tagging based on Chinese criminal text
Result: By forming term networks based on documents from sources like criminal investigation notes, news, and litigation data, this method enhances the visualization of detailed information and hidden relationships, enabling efficient exploration of potential criminal activities

Reference: [126]
Data source(s): 65 Arabic crime articles with a total of 13,300 words
Description: Introduces a rule-based NER to identify and classify named entities in Arabic crime text as it applies syntactical rules such as sentence splitting, tokenization, and POS tagging
Result: The system achieved 90% accuracy, showing effectiveness and satisfactory performance. The paper outlines plans to integrate the rule-based system with machine learning techniques and embed it within a crime analysis framework

Reference: [127]
Data source(s): Crime news articles represented in HTML format collected from the Malaysian National News Agency (BERNAMA)
Description: Introduces a method to extract information on nationalities from crime news in Malaysia by applying NER using gazetteers and rule-based extraction. The system is composed of three modules: direct extraction, indirect extraction, and victim-suspect reference identification
Result: The method's performance was evaluated based on a manual extraction system and showed an F-score of 70%. The authors also highlighted challenges with punctuation and nationality indicators causing the system to miss certain references or extract incorrectly, as well as difficulties in identifying implicit state markers for victims or suspects

Reference: [128]
Data source(s): Crime news from online sources and crime records for 2001 to 2014 provided by the National Crime Records Bureau
Description: Presents an Android application called Reach 360, designed to offer alerts and support in dangerous situations, including features such as alerting contacts, demonstrating crime hotspots via heatmaps, and forecasting crimes based on crime news using machine learning. NLP tasks such as sentence segmentation, word tokenization, POS, and NER are used to process crime news
Result: Multilayer perceptron performed better than logistic regression and RF in terms of accuracy for crime forecasting. The study does not provide other specific details on the performance evaluation of their application; it mainly focuses on introducing its features, the methodology behind it, and its potential to address safety concerns and forecast crimes

Result: Various models were applied to forecast hate crime trends, and the results were compared. Regressive models outperformed the ARIMA model, with models including event-related variables performing better

Reference: [131]
Data source(s): Twitter posts by users in the United Kingdom between October 2015 and October 2016
Description: Presents a comprehensive study of online antagonistic content on Twitter that involved data collection from Twitter
Result: The authors developed a supervised machine learning classifier with a bag-of-words model to identify antisemitic content, providing an analysis of the production and propagation of antagonistic content

Reference: [132]
Data source(s): A corpus of two million downloaded tweets
Description: Introduces an intelligent system used by the Spanish National Office Against Hate Crimes to identify and monitor hate speech on Twitter. The system makes use of NLP methods including lemmatization, stop-word removal, and POS tagging for preprocessing tweets, and then classifies them using MLP and LSTM
Result: The authors evaluated 19 different strategies, each comprising various combinations of features and classification models. Ultimately, the top-performing model, achieving an AUC of 0.828, leveraged word embeddings, emojis, and token expressions and further enhanced them through term frequency-inverse document frequency. This approach outperformed the existing models in the literature

Reference: [71]
Data source(s): Twitter posts
Description: Applies ML algorithms to Twitter posts for text classification and sentiment analysis to analyze hate speech and hateful sentiment in the context of spiritual belief
Result: SVM outperformed NB and KNN in terms of F-score, precision, and recall for sentiment classification and religion classification

An online reporting system was developed in [133], combining information extraction and named-entity recognition with the principles of the cognitive interview to retrieve information from police and witness narrative reports. It achieved a precision of 94% for police narratives and 96% for witness narratives, and a recall of 85% for police narratives and 90% for witness narratives. The authors emphasized that utilizing information extraction methods such as named-entity recognition in crime data can help investigators to effectively collect and extract more information, especially from individuals who may be hesitant or embarrassed to report incidents [134].
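Because several of the studies in Table 4 report an F-score while [133] is summarized here by precision and recall, the two can be related through the standard harmonic-mean formula. The short sketch below applies that formula to the precision and recall values quoted above; the resulting F1 values are our own arithmetic and are not figures reported in [133], and the function name `f1_score` is simply a label chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score, F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the text for [133].
police_f1 = f1_score(0.94, 0.85)   # approximately 0.893
witness_f1 = f1_score(0.96, 0.90)  # approximately 0.929

print(f"police narratives:  F1 = {police_f1:.3f}")
print(f"witness narratives: F1 = {witness_f1:.3f}")
```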
Several studies demonstrate the effective fusion of NLP and ML techniques [115,116,128,129]. In [128], with a primary emphasis on enhancing women's safety, the authors introduce an Android mobile application that alerts users, through a heatmap visualization, to locations where a crime has recently occurred. They use NLTK to extract information from the web through NLP tasks such as NER, part-of-speech tagging, and tokenization. Additionally, they use the MLP algorithm to forecast crime. Another use-case study provides a comprehensive approach to crime analysis by integrating mapping techniques such as KDE and hotspot analysis, ML models, and NLP to understand crime patterns and forecast crime occurrence [129]. The authors applied NLP methods such as topic modeling and sentiment analysis to tweets related to crime.
Hate speech detection is another application of smart policing in which NLP techniques are commonly used alongside ML methods. One study introduced a framework to address the problem of hate speech on social media and its connection to hate crimes [130]. The authors used NLP techniques for event extraction and a regression model based on multi-instance learning to extract hate crime events from the New York Times. In [131], the researchers combined an SVM model with a bag-of-words representation to classify Twitter posts and analyze patterns of online antisemitism, emphasizing the value of collective efficacy in countering online hate speech.
Another paper introduced "HaterNet", a novel classification approach that combines an LSTM neural network with an MLP and achieves an AUC of 0.828 for identifying and monitoring hate speech on Twitter [132]. The authors used NLP methods including lemmatization, stop-word removal, and POS tagging to preprocess tweets, so that each tweet was represented as a vector of unigrams based on frequency and word embeddings. While detecting hate speech as a crime falls under the broad umbrella of smart policing, it is a specific focus area due to the unique challenges and consequences associated with hate speech. Smart policing can help authorities respond more effectively to hate speech, prevent escalation to hate crimes, and maintain public safety.
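The frequency-based part of such a representation can be sketched in a few lines. The example below is a simplified illustration, not the HaterNet pipeline: it covers only tokenization, stop-word removal, and unigram counting, and omits lemmatization, POS tagging, and word embeddings, which would need an NLP library. The stop-word list, the sample posts, and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real systems use much larger, language-specific lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_vector(text, vocabulary):
    """Count each vocabulary term in the text after stop-word removal."""
    counts = Counter(token for token in tokenize(text) if token not in STOP_WORDS)
    return [counts[term] for term in vocabulary]


# Invented sample posts used only to exercise the functions.
posts = [
    "This is an example post",
    "The example of a post and a reply to the post",
]

# The vocabulary is every non-stop-word token seen in the corpus, in sorted order.
vocabulary = sorted(
    {token for post in posts for token in tokenize(post) if token not in STOP_WORDS}
)
vectors = [unigram_vector(post, vocabulary) for post in posts]

print(vocabulary)
print(vectors)
```

Vectors of this kind can be passed directly to a classifier such as an MLP; sequence models such as an LSTM usually take the ordered tokens or their embeddings instead.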
Furthermore, with the recent rise in popularity of generative AI and large language models such as GPT-3.5, their use in smart policing and other applications has become controversial, yet only a few studies consider the use of generative AI or the customization of language models for smart policing purposes. Built on NLP models, these AI tools have shown significant success in various tasks and domains, including healthcare and medicine [135], while reducing the need for extensive preprocessing of text. As mentioned in [106], while large language models hold promise for supporting policing through NLP, they also raise ethical challenges. In the following section, we address RQ2 by explaining these challenges and concerns of using AI.

Addressing RQ2
Several studies have analyzed the use of AI technologies in smart policing from different perspectives. Researchers have pointed out that AI is changing policing just like other aspects of society, but the concerns and challenges may differ due to the special role of police in societies [16,136]. Therefore, it is essential to evaluate the proposed solutions and the extent to which law enforcement agencies are going to use them. Despite the powerful and fast tools that AI offers for policing tasks, utilizing them still raises ethical concerns about possible biases. In light of these concerns, experts have argued that predictive algorithms are tools that assist law enforcement officers by enhancing their judgment, not tools that replace them [137,138]. Additionally, studies suggest that the legal and ethical complexities of using ML algorithms in smart policing demand continuous attention. Therefore, a collaborative, multidisciplinary approach involving policing, computer science, law, and ethics experts should address the challenges and operational requirements of using such algorithms in smart policing by defining standards for transparency, intelligibility, and ethical considerations [139,140].
Transparency, in this context, refers to the visibility and accessibility of the source code and parameters of the algorithm in use [141]. Intelligibility, on the other hand, pertains to the degree to which the code or disclosed information sufficiently explains how the model operates in practice, while auditability allows human observers to retroactively examine how the tool arrived at a certain decision [140].
Studies also recommend that for AI algorithms to be valuable in smart policing and law enforcement, they must not only improve efficiency and accuracy but also be perceived as fair in their recommendations or decision-making [142]. What these algorithms return depends on the data that they are fed or trained on. As [143] explains, police play a special role in creating their data. The algorithms are trained on historical datasets, which means that they can learn the biases and patterns in data created by human decisions; if there is a bias in the data themselves, this bias also exists in the functionality of the algorithm. On the other hand, another study argues that depending solely on human oversight of automated systems, known as "human-in-the-loop" approaches, is deficient. Instead, it emphasizes the importance of transparency and accountability in the training phase of machine learning algorithms, especially during their parameterization. In addition, it explains that, with such methods, the traditional accountability linked to a public official's decision-making shifts to those who design machine learning systems, collect the datasets, and implement the system within the framework. In other words, accurate predictions alone do not necessarily lead to improved smart policing performance. The authors of [139] highlighted the need to evaluate the fairness of algorithms and AI tools used in smart policing.
According to several studies [139,142,144], bias in AI algorithms arises when data or algorithmic outputs lead to unethical discriminatory effects on individuals and communities, or when the collected data are insufficient or unrepresentative. Crime data themselves may be biased, reflecting past police actions rather than true crime patterns, so striking the right balance between predictive power and fairness is challenging. Developing and implementing fair and transparent algorithms requires interdisciplinary collaboration between police, mathematicians, computer scientists, data scientists, and legal experts.
Additionally, using AI in smart policing raises questions about proportionality and the balance between individual rights and public purposes, and about how much the police should inform the public about the AI that they use [16]. In any event, each case of algorithmic implementation must be carefully reviewed, and ongoing attention and vigilance are needed to ensure fairness as the datasets are continually updated and revised.
The authors of [106,145] identify three main sources of bias. The first is data coverage: police may not be aware of all crimes because not all crimes are reported to them, and this reporting gap may lead to biases in specific regions (for example, regions with a higher police presence may have a higher recorded rate of crime or arrests) [146]. The second is data richness: the accuracy of the data extracted from police free-text relies heavily on the quality of the original reports, and systematic imbalances in that quality across different areas and communities can lead to biases in AI algorithms. The third is algorithmic bias, which arises when certain crime descriptions are not well understood by certain models, especially if the original training data for the language models lack exposure to reports with such unusual language.
To address these challenges, [106] suggests conducting research on the richness and quality of the information that is recorded about a crime incident, criminals, and victims, in addition to reviewing and considering all available models for different crimes and incidents to make sure that information is not misrepresented to the algorithms. In addition, it suggests that the technical teams should work closely with police partners, who would share their data, under additional security measures, along with their concerns. This approach provides a promising start to understanding the potential utility of AI in smart policing.
Moreover, another study [142] outlined different metrics proposed by [3] and [147–152] to measure fairness, which should be considered while designing predictive systems and algorithms in smart policing. These metrics include classification parity, which considers an algorithm to be fair if it predicts positive classification equally for both privileged and disadvantaged groups; calibration, which assesses the fairness of the algorithms by ensuring that subjects in both groups have the same likelihood of positive classification for any predicted probability; equalized odds, which requires an equal likelihood of positive classification for both groups among subjects with the same actual outcome (that is, equal true-positive and false-positive rates); and equal opportunity, which ensures that the predictor predicts positive classification with the same likelihood for both groups among subjects whose actual outcome is positive. It also defines fairness through awareness, which focuses on treating similar individuals with similar outcomes, and counterfactual fairness, which ensures that a prediction algorithm treats an individual equally regardless of the group they belong to.
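Three of these group-level metrics can be computed directly from a model's predictions. The sketch below is a minimal illustration using common formulations, which we assume here rather than take from [142]: classification parity as the gap in positive-prediction rates, equal opportunity as the gap in true-positive rates, and equalized odds as the larger of the true-positive and false-positive rate gaps. The toy labels, the group names "A" and "B", and all function names are hypothetical.

```python
def rate(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def group_rates(y_true, y_pred, groups, group):
    """Positive-prediction rate, true-positive rate, and false-positive rate for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    preds_for_actual_pos = [p for t, p in rows if t == 1]
    preds_for_actual_neg = [p for t, p in rows if t == 0]
    return {
        "positive_rate": rate(sum(p for _, p in rows), len(rows)),
        "tpr": rate(sum(preds_for_actual_pos), len(preds_for_actual_pos)),
        "fpr": rate(sum(preds_for_actual_neg), len(preds_for_actual_neg)),
    }


def fairness_gaps(y_true, y_pred, groups, group_a, group_b):
    """Absolute gaps between two groups; 0.0 means the criterion is met exactly."""
    a = group_rates(y_true, y_pred, groups, group_a)
    b = group_rates(y_true, y_pred, groups, group_b)
    tpr_gap = abs(a["tpr"] - b["tpr"])
    fpr_gap = abs(a["fpr"] - b["fpr"])
    return {
        "classification_parity_gap": abs(a["positive_rate"] - b["positive_rate"]),
        "equal_opportunity_gap": tpr_gap,
        "equalized_odds_gap": max(tpr_gap, fpr_gap),
    }


# Toy data: 1 = positive outcome or prediction, 0 = negative.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gaps = fairness_gaps(y_true, y_pred, groups, "A", "B")
print(gaps)
```

Calibration, fairness through awareness, and counterfactual fairness are not covered by this sketch, since they require predicted probabilities, a similarity measure between individuals, and a causal model, respectively.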
In general, most studies emphasize the need for careful consideration regarding AI and its application in smart policing, so as to prevent unjust and unethical impacts. Understanding the computational techniques and datasets used in designing such systems is crucial, as biases within the data can lead to unfair outcomes. Moreover, as mentioned in [153], the use of AI tools can influence people's beliefs and practices, potentially prejudicing and disrespecting individual rights and dignity. Therefore, the authors proposed the ethics-of-care approach in AI system design to address these issues, aiming to mitigate significant flaws and potential harms in AI systems that can affect people's lives and societies. This ethical approach can extend beyond smart policing and find relevance in various applications of AI for consequential decision-making.

Next-Generation Smart Policing
Mobile Innovations Corporation offers an electronic pocket notebook (EPNB) application, designed and implemented on Microsoft Azure, that empowers police officers by replacing traditional pen-and-paper methods and enhancing interconnectivity among law enforcement professionals. This solution facilitates the secure and comprehensive collection of data through mobile devices, allowing officers to integrate text, audio, pictures, statements, and tickets with their narrative reports, thereby creating a documentation system.
With the development of generative AI tools, large language models, and chatbots such as ChatGPT, there is a need for research that delves into their applications in the realm of smart policing. Specifically, the focus of this research was on the integration of these advanced technologies within the EPNB framework. Such integration holds the potential to improve how law enforcement operates.
For this purpose, we used Azure OpenAI Studio to fine-tune the OpenAI API. The OpenAI API includes a set of models with different features, and they can be customized for specific tasks with few-shot prompting and fine-tuning. The Azure OpenAI Service provides REST API access to language models such as GPT-35-Turbo, which is optimized for conversational interfaces. We tried different examples and scenarios to test the named-entity recognition and summarization abilities of this model. Then, by means of prompt engineering, we customized the tasks for the chatbot on the Chat Playground of Azure OpenAI Studio using the following system message: "You are an AI assistant that helps police officers to fill report template files. You will be given a narrative report from a police officer, you have to extract the name of the criminal, the name of the victim, their age, race, sex, type of incident/occurrence, charges, the amount of the charges, date and time, location, addresses, and other related name entities and statue of the case and put them into a JSON format. For example: [{ "criminal name": "Value1", "criminal age": "Value2", "location": "Value3", # ... }] If any of this information is not defined in the report just leave it blank. You also need to summarize the narrative report and put it into "summary" in the same JSON format file."
This chatbot receives a narrative report as input and extracts information such as the date/time, location, criminal's name, and victim's name. It then generates JSON-formatted text containing this information, and the output JSON is used to fill the incident report template. Figure 1 shows a schema of the application. In the scenarios that we tested, the assistant's output compared well with the corresponding police reports. This is a test case of using large language models in smart policing and of how AI can be used in document management; however, there is still room for improvement and further research, given the lack of available data.
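The step from the chatbot's JSON output to a filled template can be sketched as follows. This is an illustrative sketch only, not the EPNB implementation: it makes no call to the Azure OpenAI Service, the reply string is an invented example, and the field list, the plain-text template layout, and the function names are assumptions. The key names follow the example in the system message, and missing fields are left blank, as the system message requests.

```python
import json

# Hypothetical subset of template fields; the full field list used in the
# application is not given in the text.
TEMPLATE_FIELDS = ["criminal name", "criminal age", "victim name", "location", "summary"]


def parse_reply(reply_text):
    """Parse the assistant's JSON reply into a record that has every template field."""
    data = json.loads(reply_text)
    # The example in the system message wraps the object in a list.
    if isinstance(data, list):
        data = data[0] if data else {}
    record = {}
    for field in TEMPLATE_FIELDS:
        value = data.get(field)
        record[field] = "" if value is None else str(value)  # blank when not defined
    return record


def fill_template(record):
    """Render the record as a simple 'Field: value' report, one line per field."""
    return "\n".join(f"{field.title()}: {record[field]}" for field in TEMPLATE_FIELDS)


# Invented reply used only to exercise the functions.
reply = (
    '[{"criminal name": "John Doe", "location": "Main Street", '
    '"summary": "Theft reported."}]'
)
record = parse_reply(reply)
report = fill_template(record)
print(report)
```

In practice, a reply that is not valid JSON would raise `json.JSONDecodeError`, so a deployed system would need to catch that error and retry or flag the report for manual entry.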

Discussion
The landscape of AI and its applications in smart policing presents opportunities that hold the potential to enhance law enforcement practices. Each application within smart policing presents its own unique set of challenges, and exploring all of these challenges and their solutions comprehensively within a single study is inherently difficult. Therefore, our focus was on reviewing methods with proven statistical reliability, aiming to cover a comprehensive and representative range within the scope of a literature review. Based on our survey, most of the current studies focus on using machine learning algorithms in smart policing applications such as crime prediction and the detection of suspicious activities through surveillance cameras. Additionally, some studies present NLP methods such as sentiment analysis, text classification, and information extraction systems to detect hate speech crimes and analyze social media data for efficient crime analysis and criminal document management. However, future research could explore more advanced NLP techniques for extracting information from social media and police narrative reports. Leveraging deep learning models such as transformer-based architectures or large language models could enhance the accuracy and depth of information extraction from free-text data.
In this study, we showed a test case of large language models in smart policing, but the integration of large language models into smart policing raises an urgent need for comprehensive evaluation in terms of technical feasibility and ethical considerations. As these models learn from a large amount of data, there is a potential for providing biased results that reflect the biases present in the data. Therefore, it is suggested to initiate a process of gathering feedback from police officers, law enforcement agencies, and even the related communities, so that we can evaluate these models with more confidence. The feedback-driven evaluation would aid in understanding the possible biases in AI tools and the ethical implications associated with their deployment in smart policing. Future research in this context could focus on the development of algorithms that actively detect biases in smart policing algorithms. These algorithms could be designed to recognize and minimize impacts on social groups, suggesting fair results and reducing the potential for discriminatory outcomes. By designing such smart systems, researchers can pave the way for the responsible and effective integration of AI in law enforcement practices. This would not only enhance the credibility of smart policing but also build trust between law enforcement and the communities that they serve.
Ethical considerations can affect the future of AI in smart policing, so creating specific ethical frameworks that address the challenges posed by generative AI and large language models is crucial. As discussed in several studies, these frameworks should center around transparency, accountability, and fairness by incorporating human-centered design principles that engage both law enforcement personnel and the involved social communities, so that the resulting technologies align with their real-world needs. This approach will guide the development and deployment of AI tools, thus mitigating concerns of overreliance on automated systems in the decision-making process.
Moreover, a global perspective is essential in understanding how each country adopts AI strategies within its own legal and social contexts for smart policing. A comparative analysis would shed light on successful practices, potential pitfalls, and cultural norms that influence the adoption and implementation of AI technologies. Such insights could aid in the creation of adaptable and context-aware frameworks that consider the complexities of different jurisdictions and societies.

Conclusions
This systematic review explored studies that propose ML and NLP approaches for use in policing, in addition to summarizing the potential challenges and issues regarding the use of these methods. Predictive policing and other AI technologies have shown the potential to be faster than traditional response-based policing, as exemplified by their effectiveness in crime analysis, which could be helpful in monitoring criminal activities and allocating safety resources more accurately. However, their success in producing accurate results and their impact on crime rates depend on considering ethical concerns and understanding crime incidents, which can be resource-intensive to extract from police administrative free-text data. Based on our systematic literature review, ML and NLP offer possible solutions to ease the analytical burden for police, enabling their wider adoption. This widespread and careful adoption of smart policing could have a positive impact on society by reducing opportunities for crime and the resulting harm from victimization and offending.
While ML and NLP show promise, there are challenges, including the technical expertise required to use such models and the need to consider ethical issues and address potential biases. We listed the measures defined to evaluate the ethics of using ML and NLP. Police agencies often lack the necessary expertise, and private companies may prioritize protecting their technologies over transparency. Therefore, studies suggest that it falls upon the academic community to explore how these technologies can support policing efforts and address these challenges to avoid negative outcomes. If implemented properly, AI can empower policing techniques, especially predictive policing, leading to more efficient monitoring of criminal activities and mitigation of the associated harms.

Figure 1. A simplified schema of the designed application for Mobile Innovations. The EPNB application is integrated into Microsoft Azure's cloud, which also offers services to use advanced technologies such as OpenAI and the Azure Language Service through API calls.

Table 1. Platforms that are in use by police departments throughout the nation.

Table 2. Description summary of studies that use ML algorithms.
KNN, RF, SVM, NB, CNN, long short-term memory (LSTM) Presents a comprehensive comparison of ML algorithms for crime hotspot prediction based on historical data of a large city in Southeast China from 2015 to 2018; they also used built environment data such as road network density and points of interest LSTM performed better than other models as it extracted the patterns and regularity from historical crime data more accurately; built environment data improved the performance as well

Table 3. Detailed measurements of studies that use ML algorithms.

Table 4. Studies that use NLP techniques.